1.  Retrotransposon Proliferation Coincident with the Evolution of Dioecy in Asparagus 
G3: Genes|Genomes|Genetics  2016;6(9):2679-2685.
Current phylogenetic sampling reveals that dioecy and an XY sex chromosome pair evolved once, or possibly twice, in the genus Asparagus. Although there appear to be some lineage-specific polyploidization events, the base chromosome number of 2n = 2x = 20 is relatively conserved across the Asparagus genus. Regardless, dioecious species tend to have larger genomes than hermaphroditic species. Here, we test whether this genome size expansion in dioecious species is related to a polyploidization and subsequent chromosome fusion, or to retrotransposon proliferation in dioecious species. We first estimate genome sizes, or use published values, for four hermaphrodites and four dioecious species distributed across the phylogeny, and show that dioecious species typically have larger genomes than hermaphroditic species. Utilizing a phylogenomic approach, we find no evidence for ancient polyploidization contributing to increased genome sizes of sampled dioecious species. We do find support for an ancient whole genome duplication (WGD) event predating the diversification of the Asparagus genus. Repetitive DNA content of the four hermaphroditic and four dioecious species was characterized based on randomly sampled whole genome shotgun sequencing, and common elements were annotated. Across our broad phylogenetic sampling, Ty-1 Copia retroelements, in particular, have undergone a marked proliferation in dioecious species. In the absence of a detectable WGD event, retrotransposon proliferation is the most likely explanation for the precipitous increase in genome size in dioecious Asparagus species.
PMCID: PMC5015926  PMID: 27342737
Asparagus; dioecy; sex chromosomes; transposons; genome size
2.  Multiple Polyploidy Events in the Early Radiation of Nodulating and Nonnodulating Legumes 
Molecular Biology and Evolution  2014;32(1):193-210.
Unresolved questions about evolution of the large and diverse legume family include the timing of polyploidy (whole-genome duplications; WGDs) relative to the origin of the major lineages within the Fabaceae and to the origin of symbiotic nitrogen fixation. Previous work has established that a WGD affects most lineages in the Papilionoideae and occurred sometime after the divergence of the papilionoid and mimosoid clades, but the exact timing has been unknown. The history of WGD has also not been established for legume lineages outside the Papilionoideae. We investigated the presence and timing of WGDs in the legumes by querying thousands of phylogenetic trees constructed from transcriptome and genome data from 20 diverse legumes and 17 outgroup species. The timing of duplications in the gene trees indicates that the papilionoid WGD occurred in the common ancestor of all papilionoids. The earliest diverging lineages of the Papilionoideae include both nodulating taxa, such as the genistoids (e.g., lupin), dalbergioids (e.g., peanut), phaseoloids (e.g., beans), and galegoids (=Hologalegina, e.g., clovers), and clades with nonnodulating taxa including Xanthocercis and Cladrastis (evaluated in this study). We also found evidence for several independent WGDs near the base of other major legume lineages, including the Mimosoideae–Cassiinae–Caesalpinieae (MCC), Detarieae, and Cercideae clades. Nodulation is found in the MCC and papilionoid clades, both of which experienced ancestral WGDs. However, there are numerous nonnodulating lineages in both clades, making it unclear whether the phylogenetic distribution of nodulation is due to independent gains or a single origin followed by multiple losses.
PMCID: PMC4271530  PMID: 25349287
nodulation; polyploidy; legume; symbiotic nitrogen fixation; Papilionoideae; Mimosoideae
3.  Plant Comparative and Functional Genomics 
PMCID: PMC4736190  PMID: 26881194
4.  Gas exchange and leaf anatomy of a C3–CAM hybrid, Yucca gloriosa (Asparagaceae) 
Journal of Experimental Botany  2015;67(5):1369-1379.
Physiological and anatomical study of a C3–CAM hybrid species, which has the ability to use CAM under drought and has significant anatomical differences from both C3 and CAM parental species.
While the majority of plants use the typical C3 carbon metabolic pathway, ~6% of angiosperms have adapted to carbon limitation as a result of water stress by employing a modified form of photosynthesis known as Crassulacean acid metabolism (CAM). CAM plants concentrate carbon in the cells by temporally separating atmospheric carbon acquisition from fixation into carbohydrates. CAM has been studied for decades, but the evolutionary progression from C3 to CAM remains obscure. In order to better understand the morphological and physiological characteristics associated with CAM photosynthesis, phenotypic variation was assessed in Yucca aloifolia, a CAM species, Yucca filamentosa, a C3 species, and Yucca gloriosa, a hybrid species derived from these two yuccas exhibiting intermediate C3–CAM characteristics. Gas exchange, titratable leaf acidity, and leaf anatomical traits of all three species were assayed in a common garden under well-watered and drought-stressed conditions. Yucca gloriosa showed intermediate phenotypes for nearly all traits measured, including the ability to acquire carbon at night. Using the variation found among individuals of all three species, correlations between traits were assessed to better understand how leaf anatomy and CAM physiology are related. Yucca gloriosa may be constrained by a number of traits which prevent it from using CAM to as high a degree as Y. aloifolia. The intermediate nature of Y. gloriosa makes it a promising system in which to study the evolution of CAM.
PMCID: PMC4762382  PMID: 26717954
Crassulacean acid metabolism; gas exchange; hybrid; leaf anatomy; photosynthesis; physiology.
5.  Resolving ancient radiations: can complete plastid gene sets elucidate deep relationships among the tropical gingers (Zingiberales)? 
Annals of Botany  2013;113(1):119-133.
Background and Aims
Zingiberales comprise a clade of eight tropical monocot families including approx. 2500 species and are hypothesized to have undergone an ancient, rapid radiation during the Cretaceous. Zingiberales display substantial variation in floral morphology, and several members are ecologically and economically important. Deep phylogenetic relationships among primary lineages of Zingiberales have proved difficult to resolve in previous studies, representing a key region of uncertainty in the monocot tree of life.
Methods
Next-generation sequencing was used to construct complete plastid gene sets for nine taxa of Zingiberales, which were added to five previously sequenced sets in an attempt to resolve deep relationships among families in the order. Variation in taxon sampling, process partition inclusion and partition model parameters were examined to assess their effects on topology and support.
Key Results
Codon-based likelihood analysis identified a strongly supported clade of ((Cannaceae, Marantaceae), (Costaceae, Zingiberaceae)), sister to (Musaceae, (Lowiaceae, Strelitziaceae)), collectively sister to Heliconiaceae. However, the deepest divergences in this phylogenetic analysis comprised short branches with weak support. Additionally, manipulation of matrices resulted in differing deep topologies in an unpredictable fashion. Alternative topology testing allowed statistical rejection of some of the topologies. Saturation fails to explain observed topological uncertainty and low support at the base of Zingiberales. Evidence for conflict among the plastid data was based on a support metric that accounts for conflicting resampled topologies.
Conclusions
Many relationships were resolved with robust support, but the paucity of character information supporting the deepest nodes and the existence of conflict suggest that plastid coding regions are insufficient to resolve and support the earliest divergences among families of Zingiberales. Whole plastomes will continue to be highly useful in plant phylogenetics, but the current study adds to a growing body of literature suggesting that they may not provide enough character information for resolving ancient, rapid radiations.
PMCID: PMC3864734  PMID: 24280362
Tropical gingers; Zingiberales; plastome; next-generation sequencing; Illumina; phylogeny; evolution; monocots; support; phylogenomics; ancient radiation; plastid gene set
6.  Data access for the 1,000 Plants (1KP) project 
GigaScience  2014;3:17.
The 1,000 plants (1KP) project is an international multi-disciplinary consortium that has generated transcriptome data from over 1,000 plant species, with exemplars for all of the major lineages across the Viridiplantae (green plants) clade. Here, we describe how to access the data used in a phylogenomics analysis of the first 85 species, and how to visualize our gene and species trees. Users can develop computational pipelines to analyse these data, in conjunction with data of their own that they can upload. Computationally estimated protein-protein interactions and biochemical pathways can be visualized at another site. Finally, we comment on our future plans and how they fit within this scalable system for the dissemination, visualization, and analysis of large multi-species data sets.
PMCID: PMC4306014  PMID: 25625010
Viridiplantae; Biodiversity; Transcriptomes; Phylogenomics; Interactions; Pathways
7.  Gene families as soft cliques with backbones: Amborella contrasted with other flowering plants 
BMC Genomics  2014;15(Suppl 6):S8.
Chaining is a major problem in constructing gene families.
We define a new kind of cluster on graphs with strong and weak edges: soft cliques with backbones (SCWiB). This differs from other definitions in how it controls the "chaining effect", by ensuring clusters satisfy a tolerant edge density criterion that takes into account cluster size. We implement algorithms for decomposing a graph of similarities into SCWiBs. We compare examples of output from SCWiB and the Markov Cluster Algorithm (MCL), and also compare some curated Arabidopsis thaliana gene families with the results of automatic clustering. We apply our method to 44 published angiosperm genomes with annotation, and discover that Amborella trichopoda is distinct from all the others in having substantially and systematically smaller proportions of moderate- and large-size gene families.
We offer several possible evolutionary explanations for this result.
PMCID: PMC4240082  PMID: 25572777
gene families; clustering; S-plex; angiosperms; Amborella trichopoda
8.  Advancing data reuse in phyloinformatics using an ontology-driven Semantic Web approach 
BMC Medical Genomics  2013;6(Suppl 3):S5.
Phylogenetic analyses can resolve historical relationships among genes, organisms or higher taxa. Understanding such relationships can elucidate a wide range of biological phenomena, including, for example, the importance of gene and genome duplications in the evolution of gene function, the role of adaptation as a driver of diversification, or the evolutionary consequences of biogeographic shifts. Phyloinformaticists are developing data standards, databases and communication protocols (e.g. Application Programming Interfaces, APIs) to extend the accessibility of gene trees, species trees, and the metadata necessary to interpret these trees, thus enabling researchers across the life sciences to reuse phylogenetic knowledge. Specifically, Semantic Web technologies are being developed to make phylogenetic knowledge interpretable by web agents, thereby enabling intelligently automated, high-throughput reuse of results generated by phylogenetic research. This manuscript describes an ontology-driven, semantic problem-solving environment for phylogenetic analyses and introduces artefacts that can support phyloinformatic efforts to promote accessibility of trees and underlying metadata. PhylOnt is an extensible ontology with concepts describing tree types and tree building methodologies including estimation methods, models and programs. In addition we present the PhylAnt platform for annotating scientific articles and NeXML files with PhylOnt concepts. The novelty of this work is the annotation of NeXML files and phylogenetics-related documents with the PhylOnt ontology. This approach advances data reuse in phyloinformatics.
PMCID: PMC3980757  PMID: 24565381
9.  Characterization of the basal angiosperm Aristolochia fimbriata: a potential experimental system for genetic studies 
BMC Plant Biology  2013;13:13.
Previous studies in basal angiosperms have provided insight into the diversity within the angiosperm lineage and helped to polarize analyses of flowering plant evolution. However, there is still not an experimental system for genetic studies among basal angiosperms to facilitate comparative studies and functional investigation. It would be desirable to identify a basal angiosperm experimental system that possesses many of the features found in existing plant model systems (e.g., Arabidopsis and Oryza).
We have considered all basal angiosperm families for general characteristics important for experimental systems, including availability to the scientific community, growth habit, and membership in a large basal angiosperm group that displays a wide spectrum of phenotypic diversity. Most basal angiosperms are woody or aquatic, thus are not well-suited for large scale cultivation, and were excluded. We further investigated members of Aristolochiaceae for ease of culture, life cycle, genome size, and chromosome number. We demonstrated self-compatibility for Aristolochia elegans and A. fimbriata, and transformation with a GFP reporter construct for Saruma henryi and A. fimbriata. Furthermore, A. fimbriata was easily cultivated with a life cycle of just three months, could be regenerated in a tissue culture system, and had one of the smallest genomes among basal angiosperms. An extensive multi-tissue EST dataset was produced for A. fimbriata that includes over 3.8 million 454 sequence reads.
Aristolochia fimbriata has numerous features that facilitate genetic studies and is suggested as a potential model system for use with a wide variety of technologies. Emerging genetic and genomic tools for A. fimbriata and closely related species can aid the investigation of floral biology, developmental genetics, biochemical pathways important in plant-insect interactions as well as human health, and various other features present in early angiosperms.
PMCID: PMC3621149  PMID: 23347749
10.  Homoploid hybrid origin of Yucca gloriosa: intersectional hybrid speciation in Yucca (Agavoideae, Asparagaceae) 
Ecology and Evolution  2012;2(9):2213-2222.
There is a growing appreciation for the importance of hybrid speciation in angiosperm evolution. Here, we show that Yucca gloriosa (Asparagaceae: Agavoideae) is the product of intersectional hybridization between Y. aloifolia and Y. filamentosa. These species, all named by Carl Linnaeus, exist in sympatry along the southeastern Atlantic coast of the United States. Yucca gloriosa was found to share a chloroplast haplotype with Y. aloifolia in all populations sampled. In contrast, nuclear gene-based microsatellite markers in Y. gloriosa are shared with both parents. The hybrid origin of Y. gloriosa is supported by multilocus analyses of the nuclear microsatellite markers including principal coordinates analysis (PCO), maximum-likelihood hybrid index scoring (HINDEX), and Bayesian cluster analysis (STRUCTURE). The putative parental species share only one allele at a single locus, suggesting there is little to no introgressive gene flow occurring between these species and Y. gloriosa. At the same time, diagnostic markers are segregating in Y. gloriosa populations. Lack of variation in the chloroplast of Y. aloifolia, the putative maternal parent, makes it difficult to rule out multiple hybrid origins of Y. gloriosa, but allelic variation at nuclear loci can be explained by a single hybrid origin of Y. gloriosa. Overall, these data provide strong support for the homoploid hybrid origin of Y. gloriosa.
PMCID: PMC3488672  PMID: 23139880
Hybrid index; hybrid speciation; microsatellite analysis; Yucca
11.  A genome triplication associated with early diversification of the core eudicots 
Genome Biology  2012;13(1):R3.
Although it is agreed that a major polyploidy event, gamma, occurred within the eudicots, the phylogenetic placement of the event remains unclear.
To determine when this polyploidization occurred relative to speciation events in angiosperm history, we employed a phylogenomic approach to investigate the timing of gene set duplications located on syntenic gamma blocks. We populated 769 putative gene families with large sets of homologs obtained from public transcriptomes of basal angiosperms, magnoliids, asterids, and more than 91.8 gigabases of new next-generation transcriptome sequences of non-grass monocots and basal eudicots. The overwhelming majority (95%) of well-resolved gamma duplications was placed before the separation of rosids and asterids and after the split of monocots and eudicots, providing strong evidence that the gamma polyploidy event occurred early in eudicot evolution. Further, the majority of gene duplications was placed after the divergence of the Ranunculales and core eudicots, indicating that the gamma appears to be restricted to core eudicots. Molecular dating estimates indicate that the duplication events were intensely concentrated around 117 million years ago.
The rapid radiation of core eudicot lineages that gave rise to nearly 75% of angiosperm species appears to have occurred coincidentally or shortly following the gamma triplication event. Reconciliation of gene trees with a species phylogeny can elucidate the timing of major events in genome evolution, even when genome sequences are only available for a subset of species represented in the gene trees. Comprehensive transcriptome datasets are valuable complements to genome sequences for high-resolution phylogenomic analysis.
PMCID: PMC3334584  PMID: 22280555
12.  Taking the First Steps towards a Standard for Reporting on Phylogenies: Minimal Information about a Phylogenetic Analysis (MIAPA) 
In the eight years since phylogenomics was introduced as the intersection of genomics and phylogenetics, the field has provided fundamental insights into gene function, genome history and organismal relationships. The utility of phylogenomics is growing with the increase in the number and diversity of taxa for which whole genome and large transcriptome sequence sets are being generated. We assert that the synergy between genomic and phylogenetic perspectives in comparative biology would be enhanced by the development and refinement of minimal reporting standards for phylogenetic analyses. Encouraged by the development of the Minimum Information About a Microarray Experiment (MIAME) standard, we propose a similar roadmap for the development of a Minimal Information About a Phylogenetic Analysis (MIAPA) standard. Key in the successful development and implementation of such a standard will be broad participation by developers of phylogenetic analysis software, phylogenetic database developers, practitioners of phylogenomics, and journal editors.
PMCID: PMC3167193  PMID: 16901231
13.  A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure 
Genome Biology  2011;12(5):R48.
Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome.
Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella.
When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution.
PMCID: PMC3219971  PMID: 21619600
14.  Meeting Report from the Second “Minimum Information for Biological and Biomedical Investigations” (MIBBI) workshop 
Standards in Genomic Sciences  2010;3(3):259-266.
This report summarizes the proceedings of the second workshop of the ‘Minimum Information for Biological and Biomedical Investigations’ (MIBBI) consortium held on Dec 1-2, 2010 in Rüdesheim, Germany through the sponsorship of the Beilstein-Institute. MIBBI is an umbrella organization uniting communities developing Minimum Information (MI) checklists to standardize the description of data sets, the workflows by which they were generated and the scientific context for the work. This workshop brought together representatives of more than twenty communities to present the status of their MI checklists and plans for future development. Shared challenges and solutions were identified and the role of MIBBI in MI checklist development was discussed. The meeting featured some thirty presentations, wide-ranging discussions and breakout groups. The top outcomes of the two-day workshop as defined by the participants were: 1) the chance to share best practices and to identify areas of synergy; 2) defining a series of tasks for updating the MIBBI Portal; 3) reemphasizing the need to maintain independent MI checklists for various communities while leveraging common terms and workflow elements contained in multiple checklists; and 4) revision of the concept of the MIBBI Foundry to focus on the creation of a core set of MIBBI modules intended for reuse by individual MI checklist projects while maintaining the integrity of each MI project. Further information about MIBBI and its range of activities can be found at
PMCID: PMC3035314  PMID: 21304730
15.  Floral homeotic C function genes repress specific B function genes in the carpel whorl of the basal eudicot California poppy (Eschscholzia californica) 
EvoDevo  2010;1:13.
The floral homeotic C function gene AGAMOUS (AG) confers stamen and carpel identity and is involved in the regulation of floral meristem termination in Arabidopsis. Arabidopsis ag mutants show complete homeotic conversions of stamens into petals and carpels into sepals as well as indeterminacy of the floral meristem. Gene function analyses in model core eudicots and the monocots rice and maize suggest a conserved function for AG homologs in angiosperms. At the same time, gene phylogenies reveal a complex history of gene duplications and repeated subfunctionalization of paralogs.
EScaAG1 and EScaAG2, duplicate AG homologs in the basal eudicot Eschscholzia californica, show a high degree of similarity in sequence and expression, although EScaAG2 expression is lower than EScaAG1 expression. Functional studies employing virus-induced gene silencing (VIGS) demonstrate that knockdown of EScaAG1 and 2 function leads to homeotic conversion of stamens into petaloid structures and defects in floral meristem termination. However, carpels are transformed into petaloid organs rather than sepaloid structures. We also show that a reduction of EScaAG1 and EScaAG2 expression leads to significantly increased expression of a subset of floral homeotic B genes.
This work presents expression and functional analysis of the two basal eudicot AG homologs. The reduction of EScaAG1 and 2 functions results in the change of stamen to petal identity and a transformation of the central whorl organ identity from carpel into petal identity. Petal identity requires the presence of the floral homeotic B function and our results show that the expression of a subset of B function genes extends into the central whorl when the C function is reduced. We propose a model for the evolution of B function regulation by C function suggesting that the mode of B function gene regulation found in Eschscholzia is ancestral and the C-independent regulation as found in Arabidopsis is evolutionarily derived.
PMCID: PMC3012024  PMID: 21122096
16.  Identification of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various taxonomic levels 
Although the overwhelming majority of genes found in angiosperms are members of gene families, and both gene- and genome-duplication are pervasive forces in plant genomes, some genes are sufficiently distinct from all other genes in a genome that they can be operationally defined as 'single copy'. Using the gene clustering algorithm MCL-tribe, we have identified a set of 959 genes that are shared single copy genes in the genomes of Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa. To characterize these genes, we have performed a number of analyses examining GO annotations, coding sequence length, number of exons, number of domains, presence in distant lineages, such as Selaginella and Physcomitrella, and phylogenetic analysis to estimate copy number in other seed plants and to demonstrate their phylogenetic utility. We then provide examples of how these genes may be used in phylogenetic analyses to reconstruct organismal history, both by using extant coverage in EST databases for seed plants and de novo amplification via RT-PCR in the family Brassicaceae.
There are 959 single copy nuclear genes shared in Arabidopsis, Populus, Vitis and Oryza ["APVO SSC genes"]. The majority of these genes are also present in the Selaginella and Physcomitrella genomes. Public EST sets for 197 species suggest that most of these genes are present across a diverse collection of seed plants, and appear to exist as single or very low copy genes, though exceptions are seen in recently polyploid taxa and in lineages where there is significant evidence for a shared large-scale duplication event. Genes encoding proteins localized in organelles are more commonly single copy than expected by chance, but the evolutionary forces responsible for this bias are unknown.
Regardless of the evolutionary mechanisms responsible for the large number of shared single copy genes in diverse flowering plant lineages, these genes are valuable for phylogenetic and comparative analyses. Eighteen of the APVO SSC single copy genes were amplified in the Brassicaceae using RT-PCR and directly sequenced. Alignments of these sequences provide improved resolution of Brassicaceae phylogeny compared to recent studies using plastid and ITS sequences. An analysis of sequences from 13 APVO SSC genes from 69 species of seed plants, derived mainly from public EST databases, yielded a phylogeny that was largely congruent with prior hypotheses based on multiple plastid sequences. Whereas single gene phylogenies that rely on EST sequences have limited bootstrap support as the result of limited sequence information, concatenated alignments result in phylogenetic trees with strong bootstrap support for already established relationships. Overall, these single copy nuclear genes are promising markers for phylogenetics, and contain a greater proportion of phylogenetically-informative sites than commonly used protein-coding sequences from the plastid or mitochondrial genomes.
Putatively orthologous, shared single copy nuclear genes provide a vast source of new evidence for plant phylogenetics, genome mapping, and other applications, as well as a substantial class of genes for which functional characterization is needed. Preliminary evidence indicates that many of the shared single copy nuclear genes identified in this study may be well suited as markers for addressing phylogenetic hypotheses at a variety of taxonomic levels.
PMCID: PMC2848037  PMID: 20181251
17.  Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project 
Nature Biotechnology  2008;26(8):889-896.
The Minimum Information for Biological and Biomedical Investigations (MIBBI) project provides a resource for those exploring the range of extant minimum information checklists and fosters coordinated development of such checklists.
PMCID: PMC2771753  PMID: 18688244
18.  Comparison of next generation sequencing technologies for transcriptome characterization 
BMC Genomics  2009;10:347.
We have developed a simulation approach to help determine the optimal mixture of sequencing methods for the most complete and cost-effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis.
The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics.
NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms.
PMCID: PMC2907694  PMID: 19646272
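The read-mapping breakdown quoted in the abstract of entry 18 can be rechecked with a few lines of arithmetic. The sketch below is only an illustration: it uses the counts exactly as printed in the abstract, and the names TOTAL_READS, CATEGORY_COUNTS and percent_of_total are hypothetical labels chosen here, not anything taken from the paper or its software.

```python
# Counts copied from the abstract of entry 18; all names below are
# hypothetical and exist only for this illustration.
TOTAL_READS = 134_791
CATEGORY_COUNTS = {
    "exact match to known exons": 119_518,
    "introns": 1_117,
    "spanning intron/exon boundaries": 11_524,
    "beyond annotated UTR ends": 3_066,
}


def percent_of_total(count, total=TOTAL_READS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Recompute each category's share of the stated read total.
percentages = {
    name: percent_of_total(count) for name, count in CATEGORY_COUNTS.items()
}
# Sum the category counts to compare against the stated total.
category_sum = sum(CATEGORY_COUNTS.values())

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
print(f"sum of categories: {category_sum} (stated total: {TOTAL_READS})")
```

Run against the stated total, three of the four shares match the quoted figures (88.7%, 0.8%, 2.3%), while the boundary-spanning share comes out at about 8.5% instead of the quoted 8.6%. The four counts also add up to 135,225, which is 434 more than the stated total. The abstract does not explain either difference; overlapping categories or a different denominator are possible explanations, but that is an assumption.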
19.  Parallel Loss of Plastid Introns and Their Maturase in the Genus Cuscuta 
PLoS ONE  2009;4(6):e5982.
Plastid genome content and arrangement are highly conserved across most land plants and their closest relatives, streptophyte algae, with nearly all plastid introns having invaded the genome in their common ancestor at least 450 million years ago. One such intron, within the transfer RNA trnK-UUU, contains a large open reading frame that encodes a presumed intron maturase, matK. This gene is missing from the plastid genomes of two species in the parasitic plant genus Cuscuta but is found in all other published land plant and streptophyte algal plastid genomes, including that of the nonphotosynthetic angiosperm Epifagus virginiana and two other species of Cuscuta. By examining matK and plastid intron distribution in Cuscuta, we add support to the hypothesis that its normal role is in splicing seven of the eight group IIA introns in the genome. We also analyze matK nucleotide sequences from Cuscuta species and relatives that retain matK to test whether changes in selective pressure in the maturase are associated with intron deletion. Stepwise loss of most group IIA introns from the plastid genome results in substantial change in selective pressure within the hypothetical RNA-binding domain of matK in both Cuscuta and Epifagus, either through evolution from a generalist to a specialist intron splicer or due to loss of a particular intron responsible for most of the constraint on the binding region. The possibility of intron-specific specialization in the X-domain is implicated by evidence of positive selection on the lineage leading to C. nitida in association with the loss of six of seven introns putatively spliced by matK. Moreover, transfer RNA gene deletion facilitated by parasitism combined with an unusually high rate of intron loss from remaining functional plastid genes created a unique circumstance on the lineage leading to Cuscuta subgenus Grammica that allowed elimination of matK in the most species-rich lineage of Cuscuta.
PMCID: PMC2694360  PMID: 19543388
20.  The minimum information about a genome sequence (MIGS) specification 
Nature biotechnology  2008;26(5):541-547.
With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the ‘transparency’ of the information contained in existing genomic databases.
PMCID: PMC2409278  PMID: 18464787
21.  The Amborella genome: an evolutionary reference for plant biology 
Genome Biology  2008;9(3):402.
The nuclear genome sequence of Amborella trichopoda, the sister species to all other extant angiosperms, will be an exceptional resource for plant genomics.
PMCID: PMC2397498  PMID: 18341710
22.  Insights into the Musa genome: Syntenic relationships to rice and between Musa species 
BMC Genomics  2008;9:58.
Musa species (Musaceae, Zingiberales) including bananas and plantains are collectively the fourth most important crop in developing countries. Knowledge concerning Musa genome structure and the origin of distinct cultivars has greatly increased over the last few years. Until now, however, no large-scale analyses of Musa genomic sequence have been conducted. This study compares genomic sequence in two Musa species with orthologous regions in the rice genome.
We produced 1.4 Mb of Musa sequence from 13 BAC clones, annotated and analyzed them along with 4 previously sequenced BACs. The 443 predicted genes revealed that Zingiberales genes share GC content and distribution characteristics with eudicot and Poaceae genomes. Comparison with rice revealed microsynteny regions that have persisted since the divergence of the Commelinid orders Poales and Zingiberales at least 117 Mya. The previously hypothesized large-scale duplication event in the common ancestor of major cereal lineages within the Poaceae was verified. The divergence time distributions for Musa-Zingiber (Zingiberaceae, Zingiberales) orthologs and paralogs provide strong evidence for a large-scale duplication event in the Musa lineage after its divergence from the Zingiberaceae approximately 61 Mya. Comparisons of genomic regions from M. acuminata and M. balbisiana revealed highly conserved genome structure, and indicated that these genomes diverged circa 4.6 Mya.
These results point to the utility of comparative analyses between distantly related monocot species such as rice and Musa for improving our understanding of monocot genome evolution. Sequencing the genome of M. acuminata would provide a strong foundation for comparative genomics in the monocots. In addition, a genome sequence would aid genomic and genetic analyses of cultivated Musa polyploid genotypes in research aimed at localizing and cloning genes controlling important agronomic traits for breeding purposes.
PMCID: PMC2270835  PMID: 18234080
23.  PlantTribes: a gene and gene family resource for comparative genomics in plants 
Nucleic Acids Research  2007;36(Database issue):D970-D976.
The PlantTribes database is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. We used the graph-based clustering algorithm MCL [Van Dongen (Technical Report INS-R0010 2000) and Enright et al. (Nucleic Acids Res. 2002; 30: 1575–1584)] to classify all of these species’ protein-coding genes into putative gene families, called tribes, using three clustering stringencies (low, medium and high). For all tribes, we have generated protein and DNA alignments and maximum-likelihood phylogenetic trees. A parallel database of microarray experimental results is linked to the genes, which lets researchers identify groups of related genes and their expression patterns. Unified nomenclatures were developed, and tribes can be related to traditional gene families and conserved domain identifiers. SuperTribes, constructed through a second iteration of MCL clustering, connect distant, but potentially related gene clusters. The global classification of nearly 200 000 plant proteins was used as a scaffold for sorting ∼4 million additional cDNA sequences from over 200 plant species. All data and analyses are accessible through a flexible interface allowing users to explore the classification, to place query sequences within the classification, and to download results for further study.
PMCID: PMC2238917  PMID: 18073194
24.  Adaptive evolution of chloroplast genome structure inferred using a parametric bootstrap approach 
Genome rearrangements influence gene order and configuration of gene clusters in all genomes. Most land plant chloroplast DNAs (cpDNAs) share a highly conserved gene content and with notable exceptions, a largely co-linear gene order. Conserved gene orders may reflect a slow intrinsic rate of neutral chromosomal rearrangements, or selective constraint. It is unknown to what extent observed changes in gene order are random or adaptive. We investigate the influence of natural selection on gene order in association with increased rate of chromosomal rearrangement. We use a novel parametric bootstrap approach to test if directional selection is responsible for the clustering of functionally related genes observed in the highly rearranged chloroplast genome of the unicellular green alga Chlamydomonas reinhardtii, relative to ancestral chloroplast genomes.
Ancestral gene orders were inferred and then subjected to simulated rearrangement events under the random breakage model with varying ratios of inversions and transpositions. We found that adjacent chloroplast genes in C. reinhardtii were located on the same strand much more frequently than in simulated genomes that were generated under a random rearrangement process (increased sidedness; p < 0.0001). In addition, functionally related genes were found to be more clustered than those evolved under random rearrangements (p < 0.0001). We report evidence of co-transcription of neighboring genes, which may be responsible for the observed gene clusters in C. reinhardtii cpDNA.
Simulations and experimental evidence suggest that both selective maintenance and directional selection for gene clusters are determinants of chloroplast gene order.
PMCID: PMC1421436  PMID: 16469102
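The sidedness test summarized in entry 24 can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear rather than circular gene order, applies inversions only (the abstract also mentions transpositions in varying ratios), and uses a made-up ancestral order. All function names are hypothetical.

```python
import random

def sidedness(strands):
    """Fraction of adjacent gene pairs that lie on the same strand."""
    pairs = list(zip(strands, strands[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)

def random_inversion(genome, rng):
    """Reverse a randomly chosen segment and flip the strand of each gene in it.

    genome is a list of (gene_name, strand) tuples with strand in {+1, -1}.
    Breakpoints are drawn uniformly, a simplified stand-in for random breakage.
    """
    i, j = sorted(rng.sample(range(len(genome) + 1), 2))
    segment = [(gene, -strand) for gene, strand in reversed(genome[i:j])]
    return genome[:i] + segment + genome[j:]

def simulate_sidedness(ancestor, n_inversions, n_replicates, seed=0):
    """Sidedness of genomes evolved from `ancestor` by random inversions."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_replicates):
        genome = list(ancestor)
        for _ in range(n_inversions):
            genome = random_inversion(genome, rng)
        values.append(sidedness([strand for _, strand in genome]))
    return values

def empirical_p_value(observed, simulated):
    """One-sided p-value: share of simulated values at least as large as
    the observed one, with the usual +1 correction."""
    hits = sum(value >= observed for value in simulated)
    return (hits + 1) / (len(simulated) + 1)

# Hypothetical input: 20 genes, all starting on the forward strand.
ancestor = [(f"gene{k}", 1) for k in range(20)]
null_values = simulate_sidedness(ancestor, n_inversions=30, n_replicates=1000)
p = empirical_p_value(observed=0.9, simulated=null_values)
```

Here `observed=0.9` is a placeholder rather than a value from the study; in practice it would be the sidedness measured in the genome of interest, compared against the simulated null distribution.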
25.  ChloroplastDB: the Chloroplast Genome Database 
Nucleic Acids Research  2005;34(Database issue):D692-D696.
The Chloroplast Genome Database (ChloroplastDB) is an interactive, web-based database for fully sequenced plastid genomes, containing genomic, protein, DNA and RNA sequences, gene locations, RNA-editing sites, putative protein families and alignments. With recent technical advances, the rate of generating new organelle genomes has increased dramatically. However, the established ontology for chloroplast genes and gene features has not been uniformly applied to all chloroplast genomes available in the sequence databases. For example, annotations for some published genome sequences have not evolved with gene naming conventions. ChloroplastDB provides unified annotations, gene name search, BLAST and download functions for chloroplast encoded genes and genomic sequences. A user can retrieve all orthologous sequences with one search regardless of gene names in GenBank. This feature alone greatly facilitates comparative research on sequence evolution including changes in gene content, codon usage, gene structure and post-transcriptional modifications such as RNA editing. Orthologous protein sets are classified by TribeMCL and each set is assigned a standard gene name. Over the next few years, as the number of sequenced chloroplast genomes increases rapidly, the tools available in ChloroplastDB will allow researchers to easily identify and compile target data for comparative analysis of chloroplast genes and genomes.
PMCID: PMC1347418  PMID: 16381961