hybridization; polyploidy; speciation; floral/morphological evolution
The 1,000 plants (1KP) project is an international multi-disciplinary consortium that has generated transcriptome data from over 1,000 plant species, with exemplars for all of the major lineages across the Viridiplantae (green plants) clade. Here, we describe how to access the data used in a phylogenomics analysis of the first 85 species, and how to visualize our gene and species trees. Users can develop computational pipelines to analyse these data, in conjunction with data of their own that they can upload. Computationally estimated protein-protein interactions and biochemical pathways can be visualized at another site. Finally, we comment on our future plans and how they fit within this scalable system for the dissemination, visualization, and analysis of large multi-species data sets.
Viridiplantae; Biodiversity; Transcriptomes; Phylogenomics; Interactions; Pathways
Ferns are the only major lineage of vascular plants not represented by a sequenced nuclear genome. This lack of genome sequence information significantly impedes our ability to understand and reconstruct genome evolution not only in ferns, but across all land plants. Azolla and Ceratopteris are ideal and complementary candidates to be the first ferns to have their nuclear genomes sequenced. They differ dramatically in genome size, life history, and habit, and thus represent the immense diversity of extant ferns. Together, this pair of genomes will facilitate myriad large-scale comparative analyses across ferns and all land plants. Here we review the unique biological characteristics of ferns and describe a number of outstanding questions in plant biology that will benefit from the addition of ferns to the set of taxa with sequenced nuclear genomes. We explain why the fern clade is pivotal for understanding genome evolution across land plants, and we provide a rationale for how knowledge of fern genomes will enable progress in research beyond the ferns themselves.
Azolla; Ceratopteris; Comparative analyses; Ferns; Genomics; Land plants; Monilophytes
Hybridization coupled with whole-genome duplication (allopolyploidy) leads to a variety of genetic and epigenetic modifications in the resultant merged genomes. In particular, gene loss and gene silencing are commonly observed post-polyploidization. Here, we investigated DNA methylation as a potential mechanism for gene silencing in Tragopogon miscellus (Asteraceae), a recent and recurrently formed allopolyploid. This species, which also exhibits extensive gene loss, was formed from the diploids T. dubius and T. pratensis.
Comparative bisulfite sequencing revealed CG methylation of parental homeologs for three loci (S2, S18 and TDF-44) that were previously identified as silenced in T. miscellus individuals relative to the diploid progenitors. One other locus (S3) examined did not show methylation, indicating that other transcriptional and post-transcriptional mechanisms are likely responsible for silencing that homeologous locus.
These results indicate that Tragopogon miscellus allopolyploids employ diverse mechanisms, including DNA methylation, to respond to the potential shock of genome merger and doubling.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-701) contains supplementary material, which is available to authorized users.
Allopolyploidy; DNA methylation; Gene silencing; Tragopogon; Whole-genome duplication
Ploidy has been well studied and used extensively in the genus Opuntia to determine species boundaries, detect evidence of hybridization, and infer evolutionary patterns. We carried out chromosome counts for all members of the Humifusa clade to ascertain whether geographic patterns are associated with differences in ploidy. We then related chromosomal data to observed morphological variability, polyploid formation, and consequently the evolutionary history of the clade. We counted chromosomes of 277 individuals from throughout the ranges of taxa included within the Humifusa clade, with emphasis placed on the widely distributed species, Opuntia humifusa (Raf.) Raf., 1820 s.l. and Opuntia macrorhiza Engelm., 1850 s.l. We also compiled previous counts made for species in the clade along with our new counts to plot geographic distributions of the polyploid and diploid taxa. A phylogeny using nuclear ribosomal ITS sequence data was reconstructed to determine whether ploidal variation is consistent with cladogenesis. We discovered that diploids of the Humifusa clade are restricted to the southeastern United States (U.S.), eastern Texas, and southeastern New Mexico. Polyploid members of the clade, however, are much more widely distributed, occurring as far north as the upper midwestern U.S. (e.g., Michigan, Minnesota, Wisconsin). Morphological differentiation, although sometimes cryptic, is commonly observed among diploid and polyploid cytotypes, and such morphological distinctions may be useful in diagnosing possible cryptic species. Certain polyploid populations of Opuntia humifusa s.l. and Opuntia macrorhiza s.l., however, exhibit introgressive morphological characters, complicating species delineations. Phylogenetically, the Humifusa clade forms two subclades that are distributed, respectively, in the southeastern U.S. (including all southeastern U.S. diploids, polyploid Opuntia abjecta Small, 1923, and polyploid Opuntia pusilla (Haw.) Haw., 1812) and the southwestern U.S. (including all southwestern U.S. diploids and polyploids). In addition, tetraploid Opuntia humifusa s.l., which occurs primarily in the eastern U.S., is resolved in the southwestern diploid clade instead of with the southeastern diploid clade that includes diploid Opuntia humifusa s.l. Our results not only provide evidence for the polyphyletic nature of Opuntia humifusa and Opuntia macrorhiza, suggesting that each of these represents more than one species, but also demonstrate the high frequency of polyploidy in the Humifusa clade and the major role that genome duplication has played in the diversification of this lineage of Opuntia s.s. Our data also suggest that the southeastern and southwestern U.S. may represent glacial refugia for diploid members of this clade and that the clade as a whole should be considered a mature polyploid species complex. Widespread polyploids are likely derivatives of secondary contact among southeastern and southwestern diploid taxa as a result of the expansion and contraction of suitable habitat during the Pleistocene following glacial and interglacial events.
Cactaceae; chromosome numbers; Opuntia humifusa; Opuntia macrorhiza; Pleistocene refugia; polyploid complex; polyploidy
The Norway spruce genome provides key insights into the evolution of plant genomes, leading to testable new hypotheses about conifer, gymnosperm, and vascular plant evolution.
The effect of glaciation on the levels and patterns of genetic variation has been well studied in the Northern Hemisphere. However, although glaciation has undoubtedly shaped the genetic structure of plants in the Southern Hemisphere, fewer studies have characterized the effect, and almost none of them have used microsatellites. In particular, complex patterns of genetic structure might be expected in areas such as the Andes, where both latitudinal and altitudinal glacial advance and retreat have molded modern plant communities. We therefore studied the population genetics of three closely related, hybridizing species of Nothofagus (N. obliqua, N. alpina, and N. glauca, all of subgenus Lophozonia; Nothofagaceae) from Chile. To estimate population genetic parameters and infer the influence of the last ice age on the spatial and genetic distribution of these species, we examined and analyzed genetic variability at seven polymorphic microsatellite DNA loci in 640 individuals from 40 populations covering most of the ranges of these species in Chile. Populations showed no significant inbreeding and exhibited relatively high levels of genetic diversity (HE = 0.502–0.662) and slight, but significant, genetic structure (RST = 8.7–16.0%). However, in N. obliqua, the small amount of genetic structure was spatially organized into three well-defined latitudinal groups. Our data may also suggest some introgression of N. alpina genes into N. obliqua in the northern populations. These results allowed us to reconstruct the influence of the last ice age on the genetic structure of these species, suggesting several centers of genetic diversity for N. obliqua and N. alpina, in agreement with the multiple refugia hypothesis.
Chile; Nothofagus alpina; Nothofagus glauca; Nothofagus nervosa; Nothofagus obliqua; SSR
Next-generation sequencing has provided a wealth of plastid genome sequence data from an increasingly diverse set of green plants (Viridiplantae). Although these data have helped resolve the phylogeny of numerous clades (e.g., green algae, angiosperms, and gymnosperms), their utility for inferring relationships across all green plants is uncertain. Viridiplantae originated 700-1500 million years ago and may comprise as many as 500,000 species. This clade represents a major source of photosynthetic carbon and contains an immense diversity of life forms, including some of the smallest and largest eukaryotes. Here we explore the limits and challenges of inferring a comprehensive green plant phylogeny from available complete or nearly complete plastid genome sequence data.
We assembled protein-coding sequence data for 78 genes from 360 diverse green plant taxa with complete or nearly complete plastid genome sequences available from GenBank. Phylogenetic analyses of the plastid data recovered well-supported backbone relationships and strong support for relationships that were not observed in previous analyses of major subclades within Viridiplantae. However, there also is evidence of systematic error in some analyses. In several instances we obtained strongly supported but conflicting topologies from analyses of nucleotides versus amino acid characters, and the considerable variation in GC content among lineages and within single genomes affected the phylogenetic placement of several taxa.
Analyses of the plastid sequence data recovered a strongly supported framework of relationships for green plants. This framework includes: i) the placement of Zygnematophyceae as sister to land plants (Embryophyta), ii) a clade of extant gymnosperms (Acrogymnospermae) with cycads + Ginkgo sister to remaining extant gymnosperms and with gnetophytes (Gnetophyta) sister to non-Pinaceae conifers (Gnecup trees), and iii) within the monilophyte clade (Monilophyta), Equisetales + Psilotales are sister to Marattiales + leptosporangiate ferns. Our analyses also highlight the challenges of using plastid genome sequences in deep-level phylogenomic analyses, and we provide suggestions for future analyses that will likely incorporate plastid genome sequence data for thousands of species. We particularly emphasize the importance of exploring the effects of different partitioning and character coding strategies.
Composition bias; Phylogenomics; Plastid genome sequences; Plastomes; RY-coding; Viridiplantae
Failure to archive published data can impede reproducibility and inhibit downstream synthesis. Alarmingly, we estimate that ∼70% of existing DNA sequence alignments/phylogenetic trees, representing much of the underpinning of modern phylogenetic analysis, are no longer accessible. The evolutionary biology community needs to adopt policies ensuring that data are publicly archived upon publication.
GSK3 (glycogen synthase kinase 3) genes encode signal transduction proteins with roles in a variety of biological processes in eukaryotes. In contrast to the low copy numbers observed in animals, GSK3 genes have expanded into a multi-gene family in land plants (embryophytes), and have also evolved functions in diverse plant specific processes, including floral development in angiosperms. However, despite previous efforts, the phylogeny of land plant GSK3 genes is currently unclear. Here, we analyze genes from a representative sample of phylogenetically pivotal taxa, including basal angiosperms, gymnosperms, and monilophytes, to reconstruct the evolutionary history and functional diversification of the GSK3 gene family in land plants.
Maximum Likelihood phylogenetic analyses resolve a gene tree with four major gene duplication events that coincide with the emergence of novel land plant clades. The single GSK3 gene inherited from the ancestor of land plants was first duplicated along the ancestral branch to extant vascular plants, and three subsequent duplications produced three GSK3 loci in the ancestor of euphyllophytes, four in the ancestor of seed plants, and at least five in the ancestor of angiosperms. A single gene in the Amborella trichopoda genome may be the sole survivor of a sixth GSK3 locus that originated in the ancestor of extant angiosperms. Homologs of two Arabidopsis GSK3 genes with genetically confirmed roles in floral development, AtSK11 and AtSK12, exhibit floral preferential expression in several basal angiosperms, suggesting evolutionary conservation of their floral functions. Members of other gene lineages appear to have independently evolved roles in plant reproductive tissues in individual taxa.
Our phylogenetic analyses provide the most detailed reconstruction of GSK3 gene evolution in land plants to date and offer new insights into the origins, relationships, and functions of family members. Notably, the diversity of this “green” branch of the gene family has increased in concert with the increasing morphological and physiological complexity of land plant life forms. Expression data for seed plants indicate that the functions of GSK3 genes have also diversified during evolutionary time.
GSK3; Land plant evolution; Gene duplication; Gene expression
The early-diverging eudicot order Trochodendrales contains only two monospecific genera, Tetracentron and Trochodendron. Although an extensive fossil record indicates that the clade is perhaps 100 million years old and was widespread throughout the Northern Hemisphere during the Paleogene and Neogene, the two extant genera are both narrowly distributed in eastern Asia. Recent phylogenetic analyses strongly support a clade of Trochodendrales, Buxales, and Gunneridae (core eudicots), but complete plastome analyses do not resolve the relationships among these groups with strong support. However, plastid phylogenomic analyses have not included data for Tetracentron. To better resolve basal eudicot relationships and to clarify when the two extant genera of Trochodendrales diverged, we sequenced the complete plastid genome of Tetracentron sinense using Illumina technology. The Tetracentron and Trochodendron plastomes possess the typical gene content and arrangement that characterize most angiosperm plastid genomes, but both genomes have the same unusual ∼4 kb expansion of the inverted repeat region to include five genes (rpl22, rps3, rpl16, rpl14, and rps8) that are normally found in the large single-copy region. Maximum likelihood analyses of an 83-gene, 88-taxon angiosperm data set yield a tree topology identical to that of previous plastid-based trees, and moderately support the sister relationship between Buxaceae and Gunneridae. Molecular dating analyses suggest that Tetracentron and Trochodendron diverged between 44 and 30 million years ago, which is congruent with the fossil record of Trochodendrales and with previous estimates of the divergence time of these two taxa. We also characterize 154 simple sequence repeat loci from the Tetracentron sinense and Trochodendron aralioides plastomes that will be useful in future studies of population genetic structure for these relict species, both of which are of conservation concern.
Comparative phylogeography can elucidate the influence of historical events on current patterns of biodiversity and can identify patterns of co-vicariance among unrelated taxa that span the same geographic areas. Here we analyze temporal and spatial divergence patterns of cloud forest plant and animal species and relate them to the evolutionary history of naturally fragmented cloud forests–among the most threatened vegetation types in northern Mesoamerica. We used comparative phylogeographic analyses to identify patterns of co-vicariance in taxa that share geographic ranges across cloud forest habitats and to elucidate the influence of historical events on current patterns of biodiversity. We document temporal and spatial genetic divergence of 15 species (including seed plants, birds and rodents), and relate them to the evolutionary history of the naturally fragmented cloud forests. We used fossil-calibrated genealogies, coalescent-based divergence time inference, and estimates of gene flow to assess the permeability of putative barriers to gene flow. We also used the hierarchical Approximate Bayesian Computation (HABC) method implemented in the program msBayes to test simultaneous versus non-simultaneous divergence of the cloud forest lineages. Our results show shared phylogeographic breaks that correspond to the Isthmus of Tehuantepec, Los Tuxtlas, and the Chiapas Central Depression, with the Isthmus representing the most frequently shared break among taxa. However, dating analyses suggest that the phylogeographic breaks corresponding to the Isthmus occurred at different times in different taxa. Current divergence patterns are therefore consistent with the hypothesis of broad vicariance across the Isthmus of Tehuantepec derived from different mechanisms operating at different times. 
This study, coupled with existing data on divergence in cloud forest species, indicates that the evolutionary history of contemporary cloud forest lineages is complex and often lineage-specific, and thus difficult to capture in a simple conservation strategy.
• Premise of the study: We explored a targeted enrichment strategy to facilitate rapid and low-cost next-generation sequencing (NGS) of numerous complete plastid genomes from across the phylogenetic breadth of angiosperms.
• Methods and Results: A custom RNA probe set including the complete sequences of 22 previously sequenced eudicot plastomes was designed to facilitate hybridization-based targeted enrichment of eudicot plastid genomes. Using this probe set and an Agilent SureSelect targeted enrichment kit, we conducted an enrichment experiment including 24 angiosperms (22 eudicots, two monocots), which were subsequently sequenced on a single lane of the Illumina GAIIx with single-end, 100-bp reads. This approach yielded nearly complete to complete plastid genomes with exceptionally high coverage (mean coverage: 717×), even for the two monocots.
• Conclusions: Our enrichment experiment was highly successful even though many aspects of the capture process employed were suboptimal. Hence, significant improvements to this methodology are feasible. With this general approach and probe set, it should be possible to sequence more than 300 essentially complete plastid genomes in a single Illumina GAIIx lane (achieving ∼50× mean coverage). However, given the complications of pooling numerous samples for multiplex sequencing and the limited number of barcodes (e.g., 96) available in commercial kits, we recommend 96 samples as a current practical maximum for multiplex plastome sequencing. This high-throughput approach should facilitate large-scale plastid genome sequencing at any level of phylogenetic diversity in angiosperms.
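The multiplexing estimate above can be reproduced with back-of-the-envelope arithmetic. The sketch below is not the authors' calculation; it assumes that total lane yield and enrichment efficiency stay constant and that reads are split evenly among pooled samples. The function names are hypothetical.

```python
# Illustrative arithmetic only: assumes constant lane yield and an even
# split of reads among samples (an idealization, not the authors' method).

def max_samples_per_lane(samples_run, mean_coverage, target_coverage):
    """Number of samples one lane could hold at the target mean coverage."""
    total_coverage_budget = samples_run * mean_coverage
    return int(total_coverage_budget // target_coverage)


def expected_coverage(samples_run, mean_coverage, pooled_samples):
    """Mean coverage per sample if the same lane is shared by pooled_samples."""
    return samples_run * mean_coverage / pooled_samples


# Values reported in the abstract: 24 samples at 717x mean coverage.
theoretical_max = max_samples_per_lane(24, 717, 50)
# The recommended practical maximum of 96 samples (barcode limit).
coverage_at_96 = expected_coverage(24, 717, 96)

print(theoretical_max)   # samples per lane at ~50x
print(coverage_at_96)    # mean coverage per sample with 96 barcodes
```

Under these assumptions the lane budget supports more than 300 samples at roughly 50× coverage, consistent with the figure quoted above, while pooling only 96 samples leaves considerably more coverage per plastome.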
next-generation sequencing; phylogenomics; plastid genomes
Incarvillea sinensis is widely distributed from Southwest China to Northeast China and in the Russian Far East. The distribution of this species was thought to be influenced by the uplift of the Qinghai-Tibet Plateau and Quaternary glaciation. To reveal the imprints of geological events on the spatial genetic structure of Incarvillea sinensis, we examined two cpDNA segments (trnH-psbA and trnS-trnfM) in 705 individuals from 47 localities.
A total of 16 haplotypes was identified, and significant genetic differentiation was revealed (GST = 0.843, NST = 0.975, P < 0.05). The survey detected two highly divergent cpDNA lineages connected by a deep gap with allopatric distributions: the southern lineage with higher genetic diversity and differentiation in the eastern Qinghai-Tibet Plateau, and the northern lineage in the region outside the Qinghai-Tibet Plateau. The divergence between these two lineages was estimated at 4.4 MYA. A correlation between the genetic and the geographic distances indicates that genetic drift was more influential than gene flow in the northern clade with lower diversity and divergence. However, a scenario of regional equilibrium between gene flow and drift was shown for the southern clade. The spatial distribution of genetic diversity in the southern lineage possibly indicates that allopatric fragmentation was dominant in the collections from the eastern Qinghai-Tibet Plateau.
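For readers unfamiliar with the differentiation statistic reported above, the following sketch computes a simplified, unweighted form of Nei's GST = (HT − HS)/HT from haplotype labels. It is an illustration only: the toy data are hypothetical, the function names are assumptions, and the estimator the authors actually used (which may include sample-size corrections) is not specified in the abstract. NST additionally weights haplotypes by their sequence divergence and is not computed here.

```python
from collections import Counter


def gene_diversity(freqs):
    """Expected diversity 1 - sum(p_i^2) for a list of haplotype frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(populations):
    """Simplified Nei's GST for a list of populations.

    Each population is a list of haplotype labels. Populations are weighted
    equally and no small-sample correction is applied (a simplification).
    """
    haplotypes = sorted({h for pop in populations for h in pop})
    pop_freqs = []
    for pop in populations:
        counts = Counter(pop)
        pop_freqs.append([counts[h] / len(pop) for h in haplotypes])

    # HS: mean within-population diversity.
    hs = sum(gene_diversity(f) for f in pop_freqs) / len(pop_freqs)
    # HT: diversity computed from the mean haplotype frequencies.
    mean_freqs = [
        sum(f[i] for f in pop_freqs) / len(pop_freqs)
        for i in range(len(haplotypes))
    ]
    ht = gene_diversity(mean_freqs)
    if ht == 0:
        raise ValueError("GST is undefined when total diversity is zero")
    return (ht - hs) / ht


# Hypothetical toy data: two populations that share almost no haplotypes.
toy_populations = [["A"] * 9 + ["B"], ["B"] * 10]
print(round(nei_gst(toy_populations), 3))
```

Values near 1, such as the GST reported above, mean that most haplotype diversity lies among rather than within populations.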
The results revealed that the uplift of the Qinghai-Tibet Plateau likely resulted in the significant divergence between the lineage in the eastern Qinghai-Tibet Plateau and the other one outside this area. The diverse niches in the eastern Qinghai-Tibet Plateau created a wide spectrum of habitats to accumulate and accommodate new mutations. The patterns of genetic diversity in populations outside the eastern Qinghai-Tibet Plateau appear to reveal the imprints of extinction during glacial periods and of interglacial and postglacial recolonization. Our study is a typical case of the significance of the uplift of the Qinghai-Tibet Plateau and Quaternary glaciation for the spatial genetic structure of eastern Asian plants, and sheds new light on the evolution of biodiversity in the Qinghai-Tibet Plateau at the intraspecies level.
Spatial genetic pattern; cpDNA variations; Phylogeography; Eastern Asian plant
Although it is agreed that a major polyploidy event, gamma, occurred within the eudicots, the phylogenetic placement of the event remains unclear.
To determine when this polyploidization occurred relative to speciation events in angiosperm history, we employed a phylogenomic approach to investigate the timing of gene set duplications located on syntenic gamma blocks. We populated 769 putative gene families with large sets of homologs obtained from public transcriptomes of basal angiosperms, magnoliids, asterids, and more than 91.8 gigabases of new next-generation transcriptome sequences of non-grass monocots and basal eudicots. The overwhelming majority (95%) of well-resolved gamma duplications was placed before the separation of rosids and asterids and after the split of monocots and eudicots, providing strong evidence that the gamma polyploidy event occurred early in eudicot evolution. Further, the majority of gene duplications was placed after the divergence of the Ranunculales and core eudicots, indicating that the gamma event appears to be restricted to core eudicots. Molecular dating estimates indicate that the duplication events were intensely concentrated around 117 million years ago.
The rapid radiation of core eudicot lineages that gave rise to nearly 75% of angiosperm species appears to have occurred coincident with or shortly after the gamma triplication event. Reconciliation of gene trees with a species phylogeny can elucidate the timing of major events in genome evolution, even when genome sequences are only available for a subset of species represented in the gene trees. Comprehensive transcriptome datasets are valuable complements to genome sequences for high-resolution phylogenomic analysis.
Polyploidy (whole genome duplication) has played an important role in speciation and genome evolution in diverse organisms, particularly in plants. However, much of our current understanding of polyploidy is based on analyses of crop species using genomic and transcriptomic tools. Here, we examined the proteomes of the natural allopolyploid Tragopogon mirus and its diploid parents (T. dubius and T. porrifolius) on a per cell basis. Using iTRAQ LC-MS/MS, we identified 476 proteins among the three species. Parental proteomic analyses revealed that only T. dubius is variable (α = 1%) within the Pullman-1 population. We found 68 differentially expressed proteins between the naturally occurring and synthetic (S1) allopolyploid T. mirus and its diploid parents, T. dubius and T. porrifolius, and 408 proteins showed proteomic additivity. Importantly, this study revealed that 32 proteins changed in expression level immediately after hybridization, based on the examination of F1 hybrids between T. dubius and T. porrifolius. An additional 22 proteins were differentially expressed after genomic doubling (based on the analysis of the synthetic T. mirus S1 generation). Furthermore, an additional 14 proteins changed their expression patterns during the past 40 generations, based on the proteomic profiles of naturally occurring T. mirus. In addition, we found six instances of novel expression in the hybrids, S1 synthetics, and T. mirus. Taken together, these data indicate that the effect of hybridization is more important in impacting the proteome than is the immediate impact of polyploidization. Moreover, the proteome of one parent, T. dubius, has contributed more to allopolyploid formation, especially at the early stage of polyploid evolution, than the other diploid parent, T. porrifolius. Furthermore, we detected two proteins (out of 476) that showed homeolog-specific expression, based on molecular mass differentiation between the parental proteomes.
Comparison of protein expression changes with the previous gene transcription study revealed a poor correlation between transcript and protein expression levels in T. mirus. This study shows the utility of proteomics in the quantitative analysis of protein expression differences between a polyploid species and its parental species.
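The protein counts reported in this abstract can be cross-checked with simple arithmetic. The short sketch below only restates numbers given in the text; the variable names and stage labels are illustrative, not the authors' terminology.

```python
# Counts as reported in the abstract; labels are illustrative.
changes_by_stage = {
    "after hybridization (F1 hybrids)": 32,
    "after genome doubling (synthetic S1)": 22,
    "in natural T. mirus (past ~40 generations)": 14,
}
additive_proteins = 408

# Stage-wise changes should add up to the differentially expressed total.
differential_total = sum(changes_by_stage.values())
# Differential plus additive proteins should equal all identified proteins.
identified_total = differential_total + additive_proteins

for stage, count in changes_by_stage.items():
    share = 100 * count / differential_total
    print(f"{stage}: {count} proteins ({share:.0f}% of differential)")
print(differential_total, identified_total)
```

The stage-wise counts sum to the 68 differentially expressed proteins, and together with the 408 additive proteins they account for all 476 identified proteins. Hybridization contributes the largest single share of the changes, which is the basis for the conclusion that its effect on the proteome outweighs the immediate effect of genome doubling.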
In the eight years since phylogenomics was introduced as the intersection of genomics and phylogenetics, the field has provided fundamental insights into gene function, genome history and organismal relationships. The utility of phylogenomics is growing with the increase in the number and diversity of taxa for which whole genome and large transcriptome sequence sets are being generated. We assert that the synergy between genomic and phylogenetic perspectives in comparative biology would be enhanced by the development and refinement of minimal reporting standards for phylogenetic analyses. Encouraged by the development of the Minimum Information About a Microarray Experiment (MIAME) standard, we propose a similar roadmap for the development of a Minimal Information About a Phylogenetic Analysis (MIAPA) standard. Key in the successful development and implementation of such a standard will be broad participation by developers of phylogenetic analysis software, phylogenetic database developers, practitioners of phylogenomics, and journal editors.
Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome.
Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella.
When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution.
Hybridization and polyploidization are now recognized as major phenomena in the evolution of plants, promoting genetic diversity, adaptive radiation and speciation. Modern molecular techniques have recently provided evidence that allopolyploidy can induce several types of genetic and epigenetic events that are of critical importance for the evolutionary success of hybrids: (1) chromosomal rearrangements within one or both parental genomes contribute toward proper meiotic pairing and isolation of the hybrid from its progenitors; (2) demethylation and activation of dormant transposable elements may trigger insertional mutagenesis and changes in local patterns of gene expression, facilitating rapid genomic reorganisation; (3) rapid and reproducible loss of low copy DNA sequence appears to result in further differentiation of homoeologous chromosomes; and (4) organ-specific up- or down-regulation of one of the duplicated genes results in unequal expression or silencing of one copy. All these alterations also have the potential, while stabilizing allopolyploid genomes, to produce novel expression patterns and new phenotypes, which together with increased heterozygosity and gene redundancy might confer on hybrids an elevated evolutionary potential, with effects at scales ranging from molecular to ecological. Although important advances have been made in understanding genomic responses to allopolyploidization, further insights are still expected to be gained in the near future, such as the direction and nature of the diploidization process, functional relevance of gene expression alterations, molecular mechanisms that result in adaptation to different ecologies/habitats, and ecological and evolutionary implications of recurrent polyploidization.
adaptation; cDNA-AFLP; epigenetic changes; evolution; gene expression; hybridization; microarrays; MSAP; polyploidy; transcriptome
Tragopogon mirus and T. miscellus are allotetraploids (2n = 24) that formed repeatedly during the past 80 years in eastern Washington and adjacent Idaho (USA) following the introduction of the diploids T. dubius, T. porrifolius, and T. pratensis (2n = 12) from Europe. In most natural populations of T. mirus and T. miscellus, there are far fewer 35S rRNA genes (rDNA) of T. dubius than there are of the other diploid parent (T. porrifolius or T. pratensis). We studied the inheritance of parental rDNA loci in allotetraploids resynthesized from diploid accessions. We investigated the dynamics and directionality of these rDNA losses, as well as the contribution of gene copy number variation in the parental diploids to rDNA variation in the derived tetraploids.
Using Southern blot hybridization and fluorescent in situ hybridization (FISH), we analyzed copy numbers and distribution of these highly reiterated genes in seven lines of synthetic T. mirus (110 individuals) and four lines of synthetic T. miscellus (71 individuals). Variation among diploid parents accounted for most of the observed gene imbalances detected in F1 hybrids but cannot explain frequent deviations from repeat additivity seen in the allotetraploid lines. Polyploid lineages involving the same diploid parents differed in rDNA genotype, indicating that conditions immediately following genome doubling are crucial for rDNA changes. About 19% of the resynthesized allotetraploid individuals had equal rDNA contributions from the diploid parents, 74% were skewed towards either T. porrifolius or T. pratensis-type units, and only 7% had more rDNA copies of T. dubius-origin compared to the other two parents. Similar genotype frequencies were observed among natural populations. Despite directional reduction of units, the additivity of 35S rDNA locus number is maintained in 82% of the synthetic lines and in all natural allotetraploids.
Uniparental reductions of homeologous rRNA gene copies occurred in both synthetic and natural populations of Tragopogon allopolyploids. The extent of these rDNA changes was generally higher in natural populations than in the synthetic lines. We hypothesize that locus-specific and chromosomal changes in early generations of allopolyploids may influence patterns of rDNA evolution in later generations.
The convergence of distinct lineages upon interspecific hybridisation, including when accompanied by increases in ploidy (allopolyploidy), is a driving force in the origin of many plant species. In plant breeding too, both interspecific hybridisation and allopolyploidy are important because they facilitate introgression of alien DNA into breeding lines enabling the introduction of novel characters. Here we review how fluorescence in situ hybridisation (FISH) and genomic in situ hybridisation (GISH) have been applied to: 1) studies of interspecific hybridisation and polyploidy in nature, 2) analyses of phylogenetic relationships between species, 3) genetic mapping and 4) analysis of plant breeding materials. We also review how FISH is poised to take advantage of next-generation sequencing (NGS) technologies, helping the rapid characterisation of the repetitive fractions of a genome in natural populations and agricultural plants.
cytogenetics; ISH; polyploidy
Although polyploidy has long been recognized as a major force in the evolution of plants, most of what we know about the genetic consequences of polyploidy comes from the study of crops and model systems. Furthermore, although many polyploid species have formed repeatedly, patterns of genome evolution and gene expression are largely unknown for natural polyploid populations of independent origin. We therefore examined patterns of loss and expression in duplicate gene pairs (homeologs) in multiple individuals from seven natural populations of independent origin of Tragopogon mirus (Asteraceae), an allopolyploid that formed repeatedly within the last 80 years from the diploids T. dubius and T. porrifolius.
Using cDNA-AFLPs, we found differential band patterns that could be attributable to gene silencing, novel expression, and/or maternal/paternal effects between T. mirus and its diploid parents. Subsequent cleaved amplified polymorphic sequence (CAPS) analyses of genomic DNA and cDNA revealed that 20 of the 30 genes identified through cDNA-AFLP analysis showed additivity, whereas nine of the 30 exhibited the loss of one parental homeolog in at least one individual. Homeolog loss (versus loss of a restriction site) was confirmed via sequencing. The remaining gene (ADENINE-DNA GLYCOSYLASE) showed ambiguous patterns in T. mirus because of polymorphism in the diploid parent T. dubius. Most (63.6%) of the homeolog loss events were of the T. dubius parental copy. Two genes, NUCLEAR RIBOSOMAL DNA and GLYCERALDEHYDE-3-PHOSPHATE DEHYDROGENASE, showed differential expression of the parental homeologs, with the T. dubius copy silenced in some individuals of T. mirus.
Genomic and cDNA CAPS analyses indicated that plants representing multiple populations of this young natural allopolyploid have experienced frequent and preferential elimination of homeologous loci. Comparable analyses of synthetic F1 hybrids showed only additivity. These results suggest that loss of homeologs and changes in gene expression are not the immediate result of hybridization, but are processes that follow polyploidization and occur during the early (<40) generations of the young polyploid. Both T. mirus and a second recently formed allopolyploid, T. miscellus, exhibit more homeolog losses than gene silencing events. Furthermore, both allotetraploids undergo biased loss of homeologs contributed by their shared diploid parent, T. dubius. Further studies are required to assess whether the results for the 30 genes so far examined are representative of the entire genome.
We have developed a simulation approach to help determine the optimal mixture of sequencing methods for the most complete and cost-effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis.
The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc (http://fgp.huck.psu.edu/NG_Sims/ngsim.pl), an online web tool that allows users to explore the results of this study by specifying individualized costs and sequencing characteristics.
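The effect of library normalization on the number of genes tagged can be illustrated with a toy sampling model. The sketch below is not the simulator described here and is not parameterized with the Arabidopsis mappings. It assumes a hypothetical transcriptome of 1,000 genes, a rank-based (1/rank) abundance distribution for the non-normalized library, and perfectly even abundances for an idealized normalized library. The function name and all parameter values are illustrative assumptions.

```python
import random


def simulate_tagged_genes(expression, n_reads, seed=0):
    """Draw n_reads reads in proportion to expression and
    return the number of distinct genes hit at least once."""
    rng = random.Random(seed)
    genes = range(len(expression))
    hits = rng.choices(genes, weights=expression, k=n_reads)
    return len(set(hits))


# Hypothetical transcriptome (assumption, not from the study).
n_genes = 1000
# Non-normalized library: a few genes dominate the read pool.
skewed = [1.0 / (rank + 1) for rank in range(n_genes)]
# Idealized normalized library: every gene equally abundant.
flat = [1.0] * n_genes

for n_reads in (500, 2000, 8000):
    tagged_skewed = simulate_tagged_genes(skewed, n_reads)
    tagged_flat = simulate_tagged_genes(flat, n_reads)
    print(n_reads, tagged_skewed, tagged_flat)
```

Under these assumptions the even library tags more distinct genes at a given read depth, because fewer reads are spent resampling the most abundant transcripts. Real libraries are only partially normalized, so the gap would be smaller in practice.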
NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, NG sequencing is a dramatic advance over capillary-based sequencing, but it also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms.
Polyploidy (whole-genome duplication) is an important speciation mechanism, particularly in plants. Gene loss, silencing, and the formation of novel gene complexes are some of the consequences that the new polyploid genome may experience. Despite the recurrent nature of polyploidy, little is known about the genomic outcome of independent polyploidization events. Here, we analyze the fate of genes duplicated by polyploidy (homoeologs) in multiple individuals from ten natural populations of Tragopogon miscellus (Asteraceae), all of which formed independently from T. dubius and T. pratensis less than 80 years ago.
Of the 13 loci analyzed in 84 T. miscellus individuals, 11 showed loss of at least one parental homoeolog in the young allopolyploids. Two loci were retained in duplicate for all polyploid individuals included in this study. Nearly half (48%) of the individuals examined lost a homoeolog of at least one locus, with several individuals showing loss at more than one locus. Patterns of loss were stochastic among individuals from the independently formed populations, except that the T. dubius copy was lost twice as often as the T. pratensis copy.
This study represents the most extensive survey of the fate of genes duplicated by allopolyploidy in individuals from natural populations. Our results indicate that the road to genome downsizing and ultimate genetic diploidization may occur quickly through homoeolog loss, but with some genes consistently maintained as duplicates. Other genes consistently show evidence of homoeolog loss, suggesting repetitive aspects to polyploid genome evolution.
Phylogenetic analyses of angiosperm relationships have used only a small percentage of available sequence data, but phylogenetic data matrices often can be augmented with existing data, especially if one allows missing characters. We explore the effects on phylogenetic analyses of adding 378 matK sequences and 240 26S rDNA sequences to the complete 3-gene, 567-taxon angiosperm phylogenetic matrix of Soltis et al.
We performed maximum likelihood bootstrap analyses of the complete, 3-gene 567-taxon data matrix and the incomplete, 5-gene 567-taxon data matrix. Although the 5-gene matrix has more missing data (27.5%) than the 3-gene data matrix (2.9%), the 5-gene analysis resulted in higher levels of bootstrap support. Within the 567-taxon tree, the increase in support is most evident for relationships among the 170 taxa for which both matK and 26S rDNA sequences were added, and there is little gain in support for relationships among the 119 taxa having neither matK nor 26S rDNA sequences. The 5-gene analysis also places the enigmatic Hydrostachys in Lamiales (BS = 97%) rather than in Cornales (BS = 100% in 3-gene analysis). The placement of Hydrostachys in Lamiales is unprecedented in molecular analyses, but it is consistent with embryological and morphological data.
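The taxon counts reported above can be cross-checked with inclusion-exclusion. The short sketch below uses only the numbers given in the text. The gene-level fraction of empty cells it computes assumes that all three original genes are present for every taxon, and it is not the same quantity as the reported 27.5% missing data, which this sketch does not attempt to reproduce. Variable names are illustrative.

```python
# Counts taken from the text.
n_taxa = 567
with_matk = 378   # taxa with an added matK sequence
with_26s = 240    # taxa with an added 26S rDNA sequence
with_both = 170   # taxa with both added sequences

# Inclusion-exclusion: taxa with at least one added sequence.
with_either = with_matk + with_26s - with_both
with_neither = n_taxa - with_either

# Empty cells in a taxon-by-gene grid of 5 genes, assuming
# (illustrative assumption) the 3 original genes are complete.
empty_cells = (n_taxa - with_matk) + (n_taxa - with_26s)
empty_fraction = empty_cells / (n_taxa * 5)

print(with_either, with_neither, round(empty_fraction, 3))
```

The result of 119 taxa with neither added sequence matches the figure stated in the text.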
Adding available, and often incomplete, sets of sequences to existing data sets can be a fast and inexpensive way to increase support for phylogenetic relationships and produce novel and credible phylogenetic hypotheses.