Failure to archive published data can impede reproducibility and inhibit downstream synthesis. Alarmingly, we estimate that ∼70% of existing DNA sequence alignments/phylogenetic trees, representing much of the underpinning of modern phylogenetic analysis, are no longer accessible. The evolutionary biology community needs to adopt policies ensuring that data are publicly archived upon publication.
GSK3 (glycogen synthase kinase 3) genes encode signal transduction proteins with roles in a variety of biological processes in eukaryotes. In contrast to the low copy numbers observed in animals, GSK3 genes have expanded into a multi-gene family in land plants (embryophytes), and have also evolved functions in diverse plant-specific processes, including floral development in angiosperms. However, despite previous efforts, the phylogeny of land plant GSK3 genes remains unclear. Here, we analyze genes from a representative sample of phylogenetically pivotal taxa, including basal angiosperms, gymnosperms, and monilophytes, to reconstruct the evolutionary history and functional diversification of the GSK3 gene family in land plants.
Maximum Likelihood phylogenetic analyses resolve a gene tree with four major gene duplication events that coincide with the emergence of novel land plant clades. The single GSK3 gene inherited from the ancestor of land plants was first duplicated along the ancestral branch to extant vascular plants, and three subsequent duplications produced three GSK3 loci in the ancestor of euphyllophytes, four in the ancestor of seed plants, and at least five in the ancestor of angiosperms. A single gene in the Amborella trichopoda genome may be the sole survivor of a sixth GSK3 locus that originated in the ancestor of extant angiosperms. Homologs of two Arabidopsis GSK3 genes with genetically confirmed roles in floral development, AtSK11 and AtSK12, exhibit floral preferential expression in several basal angiosperms, suggesting evolutionary conservation of their floral functions. Members of other gene lineages appear to have independently evolved roles in plant reproductive tissues in individual taxa.
Our phylogenetic analyses provide the most detailed reconstruction of GSK3 gene evolution in land plants to date and offer new insights into the origins, relationships, and functions of family members. Notably, the diversity of this “green” branch of the gene family has increased in concert with the increasing morphological and physiological complexity of land plant life forms. Expression data for seed plants indicate that the functions of GSK3 genes have also diversified during evolutionary time.
GSK3; Land plant evolution; Gene duplication; Gene expression
Comparative phylogeography can elucidate the influence of historical events on current patterns of biodiversity and can identify patterns of co-vicariance among unrelated taxa that span the same geographic areas. Here we analyze temporal and spatial divergence patterns of cloud forest plant and animal species and relate them to the evolutionary history of naturally fragmented cloud forests, which are among the most threatened vegetation types in northern Mesoamerica. We used comparative phylogeographic analyses to identify patterns of co-vicariance in taxa that share geographic ranges across cloud forest habitats and to elucidate the influence of historical events on current patterns of biodiversity. We document temporal and spatial genetic divergence of 15 species (including seed plants, birds and rodents), and relate them to the evolutionary history of the naturally fragmented cloud forests. We used fossil-calibrated genealogies, coalescent-based divergence time inference, and estimates of gene flow to assess the permeability of putative barriers to gene flow. We also used the hierarchical Approximate Bayesian Computation (HABC) method implemented in the program msBayes to test simultaneous versus non-simultaneous divergence of the cloud forest lineages. Our results show shared phylogeographic breaks that correspond to the Isthmus of Tehuantepec, Los Tuxtlas, and the Chiapas Central Depression, with the Isthmus representing the most frequently shared break among taxa. However, dating analyses suggest that the phylogeographic breaks corresponding to the Isthmus occurred at different times in different taxa. Current divergence patterns are therefore consistent with the hypothesis of broad vicariance across the Isthmus of Tehuantepec derived from different mechanisms operating at different times.
This study, coupled with existing data on the divergence of cloud forest species, indicates that the evolutionary history of contemporary cloud forest lineages is complex and often lineage-specific, and thus difficult to capture in a simple conservation strategy.
Incarvillea sinensis is widely distributed from Southwest China to Northeast China and in the Russian Far East. The distribution of this species was thought to be influenced by the uplift of the Qinghai-Tibet Plateau and Quaternary glaciation. To reveal the imprints of geological events on the spatial genetic structure of Incarvillea sinensis, we examined two cpDNA segments (trnH-psbA and trnS-trnfM) in 705 individuals from 47 localities.
A total of 16 haplotypes was identified, and significant genetic differentiation was revealed (GST = 0.843, NST = 0.975, P < 0.05). The survey detected two highly divergent cpDNA lineages connected by a deep gap with allopatric distributions: the southern lineage with higher genetic diversity and differentiation in the eastern Qinghai-Tibet Plateau, and the northern lineage in the region outside the Qinghai-Tibet Plateau. The divergence between these two lineages was estimated at 4.4 MYA. A correlation between the genetic and the geographic distances indicates that genetic drift was more influential than gene flow in the northern clade with lower diversity and divergence. However, a scenario of regional equilibrium between gene flow and drift was shown for the southern clade. The spatial distribution of genetic diversity in the southern lineage suggests that allopatric fragmentation was dominant in the collections from the eastern Qinghai-Tibet Plateau.
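As a rough illustration of what a statistic such as GST summarizes, the sketch below computes Nei's classical G_ST = (H_T - H_S) / H_T from per-population haplotype lists. The abstract does not say which estimator or software was used, so this is an assumption-laden toy rather than the authors' procedure; the function names and the example data are hypothetical.

```python
from collections import Counter


def gene_diversity(freqs):
    """Nei's gene diversity: 1 minus the sum of squared haplotype frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def gst(populations):
    """Classical G_ST for a list of populations, each a list of haplotype labels."""
    haplotypes = sorted({h for pop in populations for h in pop})
    # Haplotype frequencies within each population
    freq_tables = []
    for pop in populations:
        counts = Counter(pop)
        freq_tables.append([counts[h] / len(pop) for h in haplotypes])
    n_pops = len(freq_tables)
    # H_S: mean within-population diversity
    h_s = sum(gene_diversity(f) for f in freq_tables) / n_pops
    # H_T: diversity of the unweighted mean frequencies across populations
    mean_freqs = [sum(f[i] for f in freq_tables) / n_pops
                  for i in range(len(haplotypes))]
    h_t = gene_diversity(mean_freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical toy data, not taken from the study
fixed_differences = [["A"] * 10, ["B"] * 10]      # each population fixed for one haplotype
no_structure = [["A", "B"] * 5, ["A", "B"] * 5]   # identical frequencies everywhere

print(gst(fixed_differences))
print(gst(no_structure))
```

Values near 1, such as the 0.843 reported here, mean that most of the haplotype diversity lies among populations rather than within them.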
The results revealed that the uplift of the Qinghai-Tibet Plateau likely resulted in the significant divergence between the lineage in the eastern Qinghai-Tibet Plateau and the other one outside this area. The diverse niches in the eastern Qinghai-Tibet Plateau created a wide spectrum of habitats to accumulate and accommodate new mutations. The genetic diversity of populations outside the eastern Qinghai-Tibet Plateau seemed to reveal the imprints of extinction during glacial periods and of interglacial and postglacial recolonization. Our study is a typical case of the significance of the uplift of the Qinghai-Tibet Plateau and Quaternary glaciation in the spatial genetic structure of eastern Asian plants, and sheds new light on the evolution of biodiversity in the Qinghai-Tibet Plateau at the intraspecies level.
Spatial genetic pattern; cpDNA variations; Phylogeography; Eastern Asian plant
Ploidy has been well studied and used extensively in the genus Opuntia to determine species boundaries, detect evidence of hybridization, and infer evolutionary patterns. We carried out chromosome counts for all members of the Humifusa clade to ascertain whether geographic patterns are associated with differences in ploidy. We then related chromosomal data to observed morphological variability, polyploid formation, and consequently the evolutionary history of the clade. We counted chromosomes of 277 individuals from throughout the ranges of taxa included within the Humifusa clade, with emphasis placed on the widely distributed species, Opuntia humifusa (Raf.) Raf., 1820 s.l. and Opuntia macrorhiza Engelm., 1850 s.l. We also compiled previous counts made for species in the clade along with our new counts to plot geographic distributions of the polyploid and diploid taxa. A phylogeny using nuclear ribosomal ITS sequence data was reconstructed to determine whether ploidal variation is consistent with cladogenesis. We discovered that diploids of the Humifusa clade are restricted to the southeastern United States (U.S.), eastern Texas, and southeastern New Mexico. Polyploid members of the clade, however, are much more widely distributed, occurring as far north as the upper midwestern U.S. (e.g., Michigan, Minnesota, Wisconsin). Morphological differentiation, although sometimes cryptic, is commonly observed among diploid and polyploid cytotypes, and such morphological distinctions may be useful in diagnosing possible cryptic species. Certain polyploid populations of Opuntia humifusa s.l. and Opuntia macrorhiza s.l., however, exhibit introgressive morphological characters, complicating species delineations. Phylogenetically, the Humifusa clade forms two subclades that are distributed, respectively, in the southeastern U.S. (including all southeastern U.S. diploids, polyploid Opuntia abjecta Small, 1923, and polyploid Opuntia pusilla (Haw.) Haw., 1812) and the southwestern U.S. (including all southwestern U.S. diploids and polyploids). In addition, tetraploid Opuntia humifusa s.l., which occurs primarily in the eastern U.S., is resolved in the southwestern diploid clade instead of with the southeastern diploid clade that includes diploid Opuntia humifusa s.l. Our results not only provide evidence for the polyphyletic nature of Opuntia humifusa and Opuntia macrorhiza, suggesting that each of these represents more than one species, but also demonstrate the high frequency of polyploidy in the Humifusa clade and the major role that genome duplication has played in the diversification of this lineage of Opuntia s.s. Our data also suggest that the southeastern and southwestern U.S. may represent glacial refugia for diploid members of this clade and that the clade as a whole should be considered a mature polyploid species complex. Widespread polyploids are likely derivatives of secondary contact among southeastern and southwestern diploid taxa as a result of the expansion and contraction of suitable habitat during the Pleistocene following glacial and interglacial events.
Cactaceae; chromosome numbers; Opuntia humifusa; Opuntia macrorhiza; Pleistocene refugia; polyploid complex; polyploidy
Polyploidy (whole genome duplication) has played an important role in speciation and genome evolution in diverse organisms, particularly in plants. However, much of our current understanding of polyploidy is based on analyses of crop species using genomic and transcriptomic tools. Here, we examined the proteomes of the natural allopolyploid Tragopogon mirus and its diploid parents (T. dubius and T. porrifolius) on a per cell basis. Using iTRAQ LC-MS/MS, we identified 476 proteins among the three species. Parental proteomic analyses revealed that only T. dubius is variable (α=1%) within the Pullman-1 population. We found 68 differentially expressed proteins between the naturally occurring and synthetic (S1) allopolyploid T. mirus and its diploid parents, T. dubius and T. porrifolius, and 408 proteins showed proteomic additivity. Importantly, this study revealed that 32 proteins changed in expression level immediately after hybridization, based on the examination of F1 hybrids between T. dubius and T. porrifolius. An additional 22 proteins were differentially expressed after genomic doubling (based on the analysis of the synthetic T. mirus S1 generation). Furthermore, an additional 14 proteins changed their expression patterns during the past 40 generations, based on the proteomic profiles of naturally occurring T. mirus. In addition, we found six instances of novel expression in hybrids, S1 synthetics, and T. mirus. Taken together, these data indicate that hybridization has a greater impact on the proteome than does the immediate effect of polyploidization. Moreover, the proteome of one parent, T. dubius, has contributed more to allopolyploid formation, especially at the early stage of polyploid evolution, than the other diploid parent, T. porrifolius. Furthermore, we detected two proteins (out of 476) that showed homeolog-specific expression, based on molecular mass differentiation between the parental proteomes.
Comparison of protein expression changes with the previous gene transcription study revealed no good correlation between transcript and protein expression levels in T. mirus. This study shows the utility of proteomics in the quantitative analysis of protein expression differences between a polyploid species and its parental species.
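The protein counts reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract; the variable and function names are hypothetical, and the per-stage shares are a simple derived summary, not a statistic reported by the authors.

```python
# Counts as stated in the abstract
TOTAL_IDENTIFIED = 476
ADDITIVE = 408
CHANGED_BY_STAGE = {
    "hybridization (F1)": 32,
    "genome doubling (S1)": 22,
    "subsequent generations (natural T. mirus)": 14,
}


def stage_shares(counts):
    """Return each stage's fraction of all differentially expressed proteins."""
    total = sum(counts.values())
    return {stage: n / total for stage, n in counts.items()}


differential = sum(CHANGED_BY_STAGE.values())
shares = stage_shares(CHANGED_BY_STAGE)

print("differentially expressed:", differential)
print("differential + additive:", differential + ADDITIVE)
for stage, share in shares.items():
    print(f"{stage}: {share:.1%}")
```

The three stage counts sum to the 68 differentially expressed proteins, and adding the 408 additive proteins recovers the 476 identified. Hybridization accounts for the largest single share, consistent with the abstract's conclusion.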
Hybridization and polyploidization are now recognized as major phenomena in the evolution of plants, promoting genetic diversity, adaptive radiation and speciation. Modern molecular techniques have recently provided evidence that allopolyploidy can induce several types of genetic and epigenetic events that are of critical importance for the evolutionary success of hybrids: (1) chromosomal rearrangements within one or both parental genomes contribute toward proper meiotic pairing and isolation of the hybrid from its progenitors; (2) demethylation and activation of dormant transposable elements may trigger insertional mutagenesis and changes in local patterns of gene expression, facilitating rapid genomic reorganisation; (3) rapid and reproducible loss of low copy DNA sequence appears to result in further differentiation of homoeologous chromosomes; and (4) organ-specific up- or down-regulation of one of the duplicated genes results in unequal expression or silencing of one copy. All these alterations also have the potential, while stabilizing allopolyploid genomes, to produce novel expression patterns and new phenotypes, which together with increased heterozygosity and gene redundancy might confer on hybrids an elevated evolutionary potential, with effects at scales ranging from molecular to ecological. Although important advances have been made in understanding genomic responses to allopolyploidization, further insights are still expected to be gained in the near future, such as the direction and nature of the diploidization process, functional relevance of gene expression alterations, molecular mechanisms that result in adaptation to different ecologies/habitats, and ecological and evolutionary implications of recurrent polyploidization.
adaptation; cDNA-AFLP; epigenetic changes; evolution; gene expression; hybridization; microarrays; MSAP; polyploidy; transcriptome
Tragopogon mirus and T. miscellus are allotetraploids (2n = 24) that formed repeatedly during the past 80 years in eastern Washington and adjacent Idaho (USA) following the introduction of the diploids T. dubius, T. porrifolius, and T. pratensis (2n = 12) from Europe. In most natural populations of T. mirus and T. miscellus, there are far fewer 35S rRNA genes (rDNA) of T. dubius than there are of the other diploid parent (T. porrifolius or T. pratensis). We studied the inheritance of parental rDNA loci in allotetraploids resynthesized from diploid accessions. We investigate the dynamics and directionality of these rDNA losses, as well as the contribution of gene copy number variation in the parental diploids to rDNA variation in the derived tetraploids.
Using Southern blot hybridization and fluorescent in situ hybridization (FISH), we analyzed copy numbers and distribution of these highly reiterated genes in seven lines of synthetic T. mirus (110 individuals) and four lines of synthetic T. miscellus (71 individuals). Variation among diploid parents accounted for most of the observed gene imbalances detected in F1 hybrids but cannot explain frequent deviations from repeat additivity seen in the allotetraploid lines. Polyploid lineages involving the same diploid parents differed in rDNA genotype, indicating that conditions immediately following genome doubling are crucial for rDNA changes. About 19% of the resynthesized allotetraploid individuals had equal rDNA contributions from the diploid parents, 74% were skewed towards either T. porrifolius or T. pratensis-type units, and only 7% had more rDNA copies of T. dubius-origin compared to the other two parents. Similar genotype frequencies were observed among natural populations. Despite directional reduction of units, the additivity of 35S rDNA locus number is maintained in 82% of the synthetic lines and in all natural allotetraploids.
Uniparental reductions of homeologous rRNA gene copies occurred in both synthetic and natural populations of Tragopogon allopolyploids. The extent of these rDNA changes was generally higher in natural populations than in the synthetic lines. We hypothesize that locus-specific and chromosomal changes in early generations of allopolyploids may influence patterns of rDNA evolution in later generations.
Although polyploidy has long been recognized as a major force in the evolution of plants, most of what we know about the genetic consequences of polyploidy comes from the study of crops and model systems. Furthermore, although many polyploid species have formed repeatedly, patterns of genome evolution and gene expression are largely unknown for natural polyploid populations of independent origin. We therefore examined patterns of loss and expression in duplicate gene pairs (homeologs) in multiple individuals from seven natural populations of independent origin of Tragopogon mirus (Asteraceae), an allopolyploid that formed repeatedly within the last 80 years from the diploids T. dubius and T. porrifolius.
Using cDNA-AFLPs, we found differential band patterns that could be attributable to gene silencing, novel expression, and/or maternal/paternal effects between T. mirus and its diploid parents. Subsequent cleaved amplified polymorphic sequence (CAPS) analyses of genomic DNA and cDNA revealed that 20 of the 30 genes identified through cDNA-AFLP analysis showed additivity, whereas nine of the 30 exhibited the loss of one parental homeolog in at least one individual. Homeolog loss (versus loss of a restriction site) was confirmed via sequencing. The remaining gene (ADENINE-DNA GLYCOSYLASE) showed ambiguous patterns in T. mirus because of polymorphism in the diploid parent T. dubius. Most (63.6%) of the homeolog loss events were of the T. dubius parental copy. Two genes, NUCLEAR RIBOSOMAL DNA and GLYCERALDEHYDE-3-PHOSPHATE DEHYDROGENASE, showed differential expression of the parental homeologs, with the T. dubius copy silenced in some individuals of T. mirus.
Genomic and cDNA CAPS analyses indicated that plants representing multiple populations of this young natural allopolyploid have experienced frequent and preferential elimination of homeologous loci. Comparable analyses of synthetic F1 hybrids showed only additivity. These results suggest that loss of homeologs and changes in gene expression are not the immediate result of hybridization, but are processes that occur following polyploidization, occurring during the early (<40) generations of the young polyploid. Both T. mirus and a second recently formed allopolyploid, T. miscellus, exhibit more homeolog losses than gene silencing events. Furthermore, both allotetraploids undergo biased loss of homeologs contributed by their shared diploid parent, T. dubius. Further studies are required to assess whether the results for the 30 genes so far examined are representative of the entire genome.
Polyploidy (whole-genome duplication) is an important speciation mechanism, particularly in plants. Gene loss, silencing, and the formation of novel gene complexes are some of the consequences that the new polyploid genome may experience. Despite the recurrent nature of polyploidy, little is known about the genomic outcome of independent polyploidization events. Here, we analyze the fate of genes duplicated by polyploidy (homoeologs) in multiple individuals from ten natural populations of Tragopogon miscellus (Asteraceae), all of which formed independently from T. dubius and T. pratensis less than 80 years ago.
Of the 13 loci analyzed in 84 T. miscellus individuals, 11 showed loss of at least one parental homoeolog in the young allopolyploids. Two loci were retained in duplicate for all polyploid individuals included in this study. Nearly half (48%) of the individuals examined lost a homoeolog of at least one locus, with several individuals showing loss at more than one locus. Patterns of loss were stochastic among individuals from the independently formed populations, except that the T. dubius copy was lost twice as often as T. pratensis.
This study represents the most extensive survey of the fate of genes duplicated by allopolyploidy in individuals from natural populations. Our results indicate that the road to genome downsizing and ultimate genetic diploidization may occur quickly through homoeolog loss, but with some genes consistently maintained as duplicates. Other genes consistently show evidence of homoeolog loss, suggesting repetitive aspects to polyploid genome evolution.
Phylogenetic analyses of angiosperm relationships have used only a small percentage of available sequence data, but phylogenetic data matrices often can be augmented with existing data, especially if one allows missing characters. We explore the effects on phylogenetic analyses of adding 378 matK sequences and 240 26S rDNA sequences to the complete 3-gene, 567-taxon angiosperm phylogenetic matrix of Soltis et al.
We performed maximum likelihood bootstrap analyses of the complete, 3-gene 567-taxon data matrix and the incomplete, 5-gene 567-taxon data matrix. Although the 5-gene matrix has more missing data (27.5%) than the 3-gene data matrix (2.9%), the 5-gene analysis resulted in higher levels of bootstrap support. Within the 567-taxon tree, the increase in support is most evident for relationships among the 170 taxa for which both matK and 26S rDNA sequences were added, and there is little gain in support for relationships among the 119 taxa having neither matK nor 26S rDNA sequences. The 5-gene analysis also places the enigmatic Hydrostachys in Lamiales (BS = 97%) rather than in Cornales (BS = 100% in 3-gene analysis). The placement of Hydrostachys in Lamiales is unprecedented in molecular analyses, but it is consistent with embryological and morphological data.
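For readers unfamiliar with how a missing-data percentage for a matrix is obtained, the sketch below computes the fraction of missing cells in a small aligned matrix. It assumes missing characters are coded as "?", which is a common convention but is not stated in the abstract; the function name and the toy matrix are hypothetical.

```python
def missing_fraction(matrix, missing_symbol="?"):
    """Fraction of cells in an aligned matrix coded as missing.

    `matrix` maps taxon names to aligned sequences of equal length.
    """
    lengths = {len(seq) for seq in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    total_cells = sum(len(seq) for seq in matrix.values())
    missing_cells = sum(seq.count(missing_symbol) for seq in matrix.values())
    return missing_cells / total_cells


# Hypothetical toy matrix: two taxa each lack one of two 4-bp "genes"
toy_matrix = {
    "taxon1": "ACGTACGT",
    "taxon2": "ACGT????",
    "taxon3": "????ACGT",
    "taxon4": "ACGTACGT",
}

print(f"{missing_fraction(toy_matrix):.1%} missing")
```

Adding genes that are sampled for only some taxa raises this fraction, which is the trade-off the 3-gene versus 5-gene comparison examines.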
Adding available, and often incomplete, sets of sequences to existing data sets can be a fast and inexpensive way to increase support for phylogenetic relationships and produce novel and credible new phylogenetic hypotheses.
Polyploidy, frequently termed “whole genome duplication”, is a major force in the evolution of many eukaryotes. Indeed, most angiosperm species have undergone at least one round of polyploidy in their evolutionary history. Despite enormous progress in our understanding of many aspects of polyploidy, we essentially have no information about the role of chromosome divergence in the establishment of young polyploid populations. Here we investigate synthetic lines and natural populations of two recently and recurrently formed allotetraploids Tragopogon mirus and T. miscellus (formed within the past 80 years) to assess the role of aberrant meiosis in generating chromosomal/genomic diversity. That diversity is likely important in the formation, establishment and survival of polyploid populations and species.
Applications of fluorescence in situ hybridisation (FISH) to natural populations of T. mirus and T. miscellus suggest that chromosomal rearrangements and other chromosomal changes are common in both allotetraploids. We detected extensive chromosomal polymorphism between individuals and populations, including (i) plants monosomic and trisomic for particular chromosomes (perhaps indicating compensatory trisomy), (ii) intergenomic translocations and (iii) variable sizes and expression patterns of individual ribosomal DNA (rDNA) loci. We even observed karyotypic variation among sibling plants. Significantly, translocations, chromosome loss, and meiotic irregularities, including quadrivalent formation, were observed in synthetic (S0 and S1 generations) polyploid lines. Our results not only provide a mechanism for chromosomal variation in natural populations, but also indicate that chromosomal changes occur rapidly following polyploidisation.
These data shed new light on previous analyses of genome and transcriptome structures in de novo and establishing polyploid species. Crucially, our results highlight the necessity of studying karyotypes in young (<150 years old) polyploid species and synthetic polyploids that resemble natural species. The data also provide insight into the mechanisms that perturb inheritance patterns of genetic markers in synthetic polyploids and populations of young natural polyploid species.
Although the flower is the central feature of the angiosperms, little is known of its origin and subsequent diversification. The ABC model has long been the unifying paradigm for floral developmental genetics, but it is based on phylogenetically derived eudicot models. Synergistic research involving phylogenetics, classical developmental studies, genomics and developmental genetics has afforded valuable new insights into floral evolution in general, and the early flower in particular.
Scope and Conclusions
Genomic studies indicate that basal angiosperms, and by inference the earliest angiosperms, had a rich tool kit of floral genes. Homologues of the ABCE floral organ identity genes are also present in basal angiosperm lineages; however, C-, E- and particularly B-function genes are more broadly expressed in basal lineages. There is no single model of floral organ identity that applies to all angiosperms; there are multiple models that apply depending on the phylogenetic position and floral structure of the group in question. The classic ABC (or ABCE) model may work well for most eudicots. However, modifications are needed for basal eudicots and, the focus of this paper, basal angiosperms. We offer ‘fading borders’ as a testable hypothesis for the basal-most angiosperms and, by inference, perhaps some of the earliest (now extinct) angiosperms.
ABC model; floral identity genes; perianth evolution; basal angiosperms; fading borders model
The nuclear genome sequence of Amborella trichopoda, the sister species to all other extant angiosperms, will be an exceptional resource for plant genomics.
Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae).
More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions.
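To give a sense of scale for the reported error rates, the sketch below converts between an error percentage and an error count over a validated region. The abstract says only that "over 45 kilobases" were validated per genome, so the 45,000-base figure used here is an assumed lower bound, and the resulting counts are rough implied values rather than numbers reported by the authors. Function names are hypothetical.

```python
def error_rate_percent(n_errors, n_bases):
    """Error rate as a percentage of compared bases."""
    return 100.0 * n_errors / n_bases


def implied_errors(rate_percent, n_bases):
    """Number of errors implied by a percentage rate over n_bases."""
    return rate_percent / 100.0 * n_bases


# Assumption: exactly 45,000 validated bases (the abstract says "over 45 kilobases")
VALIDATED_BASES = 45_000
REPORTED_RATES = {"Nandina": 0.043, "Platanus": 0.031}  # percent, from the abstract

for genome, rate in REPORTED_RATES.items():
    errors = implied_errors(rate, VALIDATED_BASES)
    # More than 97% of errors were associated with homopolymer runs
    homopolymer_errors = 0.97 * errors
    print(f"{genome}: at least ~{errors:.1f} errors, "
          f"of which ~{homopolymer_errors:.1f} or more in homopolymer runs")
```

Under this assumption the rates correspond to roughly one to two dozen discrepancies per genome across the validated regions.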
Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. More importantly, the high accuracy observed in the GS 20 plastid genome sequence was achieved with a significant reduction in time and cost relative to traditional shotgun-based genome sequencing techniques, although with approximately half the coverage of previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically.
The glycogen synthase kinase 3 (GSK3)/SHAGGY-like kinases (GSKs) are non-receptor serine/threonine protein kinases that are involved in a variety of biological processes. In contrast to the two members of the GSK3 family in mammals, plants appear to have a much larger set of divergent GSK genes. Plant GSKs are encoded by a multigene family; analysis of the Arabidopsis genome revealed the existence of 10 GSK genes that fall into four major groups. Here we characterized the structure of Arabidopsis and rice GSK genes and conducted the first broad phylogenetic analysis of the plant GSK gene family, covering a taxonomically diverse array of algal and land plant sequences.
We found that the structure of GSK genes is generally conserved in Arabidopsis and rice, although we documented examples of exon expansion and intron loss. Our phylogenetic analyses of 139 sequences revealed four major clades of GSK genes that correspond to the four subgroups initially recognized in Arabidopsis. ESTs from basal angiosperms were represented in all four major clades; GSK homologs from the basal angiosperm Persea americana (avocado) appeared in all four clades. Gymnosperm sequences occurred in clades I, III, and IV, and a sequence of the red alga Porphyra was sister to all green plant sequences.
Our results indicate that (1) the plant-specific GSK gene lineage was established early in the history of green plants, (2) plant GSKs began to diversify prior to the origin of extant seed plants, (3) three of the four major clades of GSKs present in Arabidopsis and rice were established early in the evolutionary history of extant seed plants, and (4) diversification into four major clades (as initially reported in Arabidopsis) occurred either just prior to the origin of the angiosperms or very early in angiosperm history.
The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants.
Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms.
Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and functional divergence, and analyses of adaptive molecular evolution. Since not all genes in the floral transcriptome will be associated with flowering, these EST resources will also be of interest to plant scientists working on other functions, such as photosynthesis, signal transduction, and metabolic pathways.
The early-diverging eudicot order Trochodendrales contains only two monospecific genera, Tetracentron and Trochodendron. Although an extensive fossil record indicates that the clade is perhaps 100 million years old and was widespread throughout the Northern Hemisphere during the Paleogene and Neogene, the two extant genera are both narrowly distributed in eastern Asia. Recent phylogenetic analyses strongly support a clade of Trochodendrales, Buxales, and Gunneridae (core eudicots), but complete plastome analyses do not resolve the relationships among these groups with strong support. However, plastid phylogenomic analyses have not included data for Tetracentron. To better resolve basal eudicot relationships and to clarify when the two extant genera of Trochodendrales diverged, we sequenced the complete plastid genome of Tetracentron sinense using Illumina technology. The Tetracentron and Trochodendron plastomes possess the typical gene content and arrangement that characterize most angiosperm plastid genomes, but both genomes have the same unusual ∼4 kb expansion of the inverted repeat region to include five genes (rpl22, rps3, rpl16, rpl14, and rps8) that are normally found in the large single-copy region. Maximum likelihood analyses of an 83-gene, 88-taxon angiosperm data set yield a tree topology identical to that of previous plastid-based trees, and moderately support the sister relationship between Buxaceae and Gunneridae. Molecular dating analyses suggest that Tetracentron and Trochodendron diverged between 44 and 30 million years ago, which is congruent with the fossil record of Trochodendrales and with previous estimates of the divergence time of these two taxa. We also characterize 154 simple sequence repeat loci from the Tetracentron sinense and Trochodendron aralioides plastomes that will be useful in future studies of population genetic structure for these relict species, both of which are of conservation concern.
Although it is agreed that a major polyploidy event, gamma, occurred within the eudicots, the phylogenetic placement of the event remains unclear.
To determine when this polyploidization occurred relative to speciation events in angiosperm history, we employed a phylogenomic approach to investigate the timing of gene set duplications located on syntenic gamma blocks. We populated 769 putative gene families with large sets of homologs obtained from public transcriptomes of basal angiosperms, magnoliids, asterids, and more than 91.8 gigabases of new next-generation transcriptome sequences of non-grass monocots and basal eudicots. The overwhelming majority (95%) of well-resolved gamma duplications was placed before the separation of rosids and asterids and after the split of monocots and eudicots, providing strong evidence that the gamma polyploidy event occurred early in eudicot evolution. Further, the majority of gene duplications was placed after the divergence of the Ranunculales and core eudicots, indicating that gamma is restricted to core eudicots. Molecular dating estimates indicate that the duplication events were intensely concentrated around 117 million years ago.
The rapid radiation of core eudicot lineages that gave rise to nearly 75% of angiosperm species appears to have coincided with or shortly followed the gamma triplication event. Reconciliation of gene trees with a species phylogeny can elucidate the timing of major events in genome evolution, even when genome sequences are only available for a subset of species represented in the gene trees. Comprehensive transcriptome datasets are valuable complements to genome sequences for high-resolution phylogenomic analysis.
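As a rough illustration of how reconciling a gene tree with a species tree can place a duplication on a particular branch, the sketch below uses simple LCA mapping: a gene-tree node is called a duplication when it maps to the same species-tree clade as one of its children. This is a generic textbook technique, not the pipeline used in the study; the nested-tuple tree encoding, the function names, the `Species_copy` leaf labels, and the toy trees are all assumptions made for the example.

```python
def clades(tree):
    """List every clade of a nested-tuple species tree as a frozenset of species."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result, members = [], frozenset()
    for child in tree:
        sub = clades(child)
        result.extend(sub)
        members |= sub[-1]  # the last entry is the child's own clade
    result.append(members)
    return result


def lca(species, all_clades):
    """Smallest species-tree clade that contains every species in the set."""
    return min((c for c in all_clades if species <= c), key=len)


def duplications(gene_tree, all_clades, found=None):
    """Return (clade this node maps to, clades at which duplications were inferred)."""
    if found is None:
        found = []
    if isinstance(gene_tree, str):
        # Hypothetical leaf label convention: "<Species>_<copy number>"
        return frozenset([gene_tree.split("_")[0]]), found
    child_maps = [duplications(child, all_clades, found)[0] for child in gene_tree]
    here = lca(frozenset().union(*child_maps), all_clades)
    if here in child_maps:  # node maps to the same clade as a child: duplication
        found.append(here)
    return here, found


# Toy species tree: two core eudicots, one Ranunculales member, one monocot.
species_tree = ((("Vitis", "Populus"), "Eschscholzia"), "Oryza")
# Toy gene tree with two paralogous copies shared by the core eudicots only.
gene_tree = (((("Vitis_1", "Populus_1"), ("Vitis_2", "Populus_2")),
              "Eschscholzia_1"), "Oryza_1")

_, dups = duplications(gene_tree, clades(species_tree))
print(dups)
```

In this toy case the single duplication maps to the Vitis plus Populus clade, that is, after the divergence from the Ranunculales representative. A real analysis would also need to weigh branch support and missing gene copies, which this sketch ignores.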
In the eight years since phylogenomics was introduced as the intersection of genomics and phylogenetics, the field has provided fundamental insights into gene function, genome history and organismal relationships. The utility of phylogenomics is growing with the increase in the number and diversity of taxa for which whole genome and large transcriptome sequence sets are being generated. We assert that the synergy between genomic and phylogenetic perspectives in comparative biology would be enhanced by the development and refinement of minimal reporting standards for phylogenetic analyses. Encouraged by the development of the Minimum Information About a Microarray Experiment (MIAME) standard, we propose a similar roadmap for the development of a Minimal Information About a Phylogenetic Analysis (MIAPA) standard. Key in the successful development and implementation of such a standard will be broad participation by developers of phylogenetic analysis software, phylogenetic database developers, practitioners of phylogenomics, and journal editors.
Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome.
Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella.
When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution.
We have developed a simulation approach to help determine the optimal mixture of sequencing methods for the most complete and cost-effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis.
The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc (http://fgp.huck.psu.edu/NG_Sims/ngsim.pl), an online web tool that allows users to explore the results of this study by specifying individualized costs and sequencing characteristics.
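The idea behind this kind of simulation can be sketched in a few lines: draw reads at random in proportion to transcript abundance and count how many distinct genes are tagged at a given depth. The example below is a minimal illustration only and is not ESTcalc or the authors' model; the 1,000-gene transcriptome, the Zipf-like abundance curve, the exponent used to mimic library normalization, and the function name are all hypothetical.

```python
import random


def genes_tagged(abundances, n_reads, seed=0):
    """Count distinct genes hit when n_reads reads are drawn in proportion to abundance."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same draw
    reads = rng.choices(range(len(abundances)), weights=abundances, k=n_reads)
    return len(set(reads))


# Hypothetical transcriptome: 1,000 genes with a steep, Zipf-like abundance curve.
raw = [1.0 / rank for rank in range(1, 1001)]
# Crude stand-in for library normalization: compress the abundance range.
normalized = [a ** 0.25 for a in raw]

for depth in (500, 2000, 8000):
    print(depth, genes_tagged(raw, depth), genes_tagged(normalized, depth))
```

Under these assumptions the flatter, normalized abundance distribution tags more genes at the same read depth, which mirrors the qualitative pattern reported above. Read length, error rates, and per-platform costs would have to be added before such a sketch could guide a real project.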
NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms.
The endemic Hawaiian mints represent a major island radiation that likely originated from hybridization between two North American polyploid lineages. In contrast with the extensive morphological and ecological diversity among taxa, ribosomal DNA sequence variation has been found to be remarkably low. In the past few years, expressed sequence tag (EST) projects on plant species have generated a vast amount of publicly available sequence data that can be mined for simple sequence repeats (SSRs). However, these EST projects have largely focused on crop or otherwise economically important plants, and so far only a few studies have been published on the use of intragenic SSRs in natural plant populations. We constructed an EST library from developing fleshy nutlets of Stenogyne rugosa principally to identify genetic markers for the Hawaiian endemic mints.
The Stenogyne fruit EST library consisted of 628 unique transcripts derived from 942 high quality ESTs, with 68% of unigenes matching Arabidopsis genes. Relative frequencies of Gene Ontology functional categories were broadly representative of the Arabidopsis proteome. Many unigenes were identified as putative homologs of genes that are active during plant reproductive development. A comparison between unigenes from Stenogyne and tomato (both asterid angiosperms) revealed many homologs that may be relevant for fruit development. Among the 628 unigenes, a total of 44 potentially useful microsatellite loci were predicted. Several of these were successfully tested for cross-transferability to other Hawaiian mint species, and at least five of these demonstrated interesting patterns of polymorphism across a large sample of Hawaiian mints as well as close North American relatives in the genus Stachys.
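Predicting microsatellite loci from unigene sequences usually comes down to scanning for short motifs repeated in tandem. The sketch below shows one simple way to do this with a regular expression. It is not the tool or the thresholds used in the study; the function name and the minimum repeat counts (6 for di-, 5 for tri-, and 4 for tetranucleotide motifs) are assumptions chosen only for illustration.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in a DNA string."""
    if min_repeats is None:
        # Hypothetical thresholds: motif length -> minimum number of tandem copies.
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for motif_len, min_n in sorted(min_repeats.items()):
        # One motif of motif_len bases followed by at least (min_n - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "AGAG"),
            # so mononucleotide runs and dinucleotide repeats are not reported twice.
            if motif in (motif + motif)[1:-1]:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Toy sequence with an (AG)7 and a (CAT)5 repeat.
example = "GGC" + "AG" * 7 + "TTC" + "CAT" * 5 + "GA"
print(find_ssrs(example))
```

Start positions are 0-based. Candidate loci found this way would still need flanking sequence for primer design and empirical testing for polymorphism, as was done for the loci described above.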
Analysis of this relatively small EST library illustrated a broad GO functional representation. Many unigenes could be annotated as involved in reproductive development. Furthermore, first tests of microsatellite primer pairs have proven promising for the use of Stenogyne rugosa EST SSRs for evolutionary and phylogeographic studies of the Hawaiian endemic mints and their close relatives. Given that allelic repeat length variation in developmental genes of other organisms has been linked with morphological evolution, these SSRs may also prove useful for analyses of phenotypic differences among Hawaiian mints.