There is an increasing awareness that as a result of structural variation, a reference sequence representing a genome of a single individual is unable to capture all of the gene repertoire found in the species. The large number of genes affected by presence/absence and copy number variation suggests that this variation may contribute to phenotypic and agronomic trait diversity. Here we show by analysis of the Brassica oleracea pangenome that nearly 20% of genes are affected by presence/absence variation. Several genes displaying presence/absence variation are annotated with functions related to major agronomic traits, including disease resistance, flowering time, glucosinolate metabolism and vitamin biosynthesis.
Brassica oleracea is a single species that includes diverse crops such as cabbage, broccoli and Brussels sprouts. Here, the authors identify genes not captured in existing B. oleracea reference genomes by the assembly of a pangenome and show variations in gene content that may be related to important agronomic traits.
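As an illustration of how a presence/absence variation fraction such as the one reported above might be tallied, the following is a minimal sketch. It assumes a hypothetical table that maps each gene to the set of accessions in which it was detected; the gene names, accession labels and the rule "variable means absent from at least one accession" are assumptions made for the example, not the authors' pipeline.

```python
def classify_genes(presence, accessions):
    """Split genes into core (found in every accession) and variable (missing from at least one).

    presence   -- dict mapping a gene name to the accessions in which it was detected
    accessions -- iterable of all accession names in the pangenome
    """
    all_accessions = set(accessions)
    core, variable = [], []
    for gene, found_in in presence.items():
        if all_accessions <= set(found_in):
            core.append(gene)
        else:
            variable.append(gene)
    return core, variable


def variable_fraction(presence, accessions):
    """Fraction of genes showing presence/absence variation."""
    if not presence:
        raise ValueError("presence table is empty")
    _, variable = classify_genes(presence, accessions)
    return len(variable) / len(presence)


# Hypothetical toy data: five genes scored across three accessions.
toy_accessions = ["acc1", "acc2", "acc3"]
toy_presence = {
    "geneA": {"acc1", "acc2", "acc3"},
    "geneB": {"acc1", "acc2", "acc3"},
    "geneC": {"acc1", "acc2"},  # absent from acc3, so variable
    "geneD": {"acc1", "acc2", "acc3"},
    "geneE": {"acc1", "acc2", "acc3"},
}

print(variable_fraction(toy_presence, toy_accessions))  # 0.2
```

In practice, calling a gene present or absent depends on read-mapping coverage thresholds, which this sketch deliberately leaves out.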
Steroid alkaloids have been shown to elicit a wide range of pharmacological effects that include anticancer and antifungal activities. Understanding the biosynthesis of these molecules is essential to bioengineering for sustainable production. Herein, we investigate the biosynthetic pathway to cyclopamine, a steroid alkaloid that shows promising antineoplastic activities. Supply of cyclopamine is limited, as the current source is solely derived from wild collection of the plant Veratrum californicum. To elucidate the early stages of the pathway to cyclopamine, we interrogated a V. californicum RNA-seq dataset using the cyclopamine accumulation profile as a predefined model for gene expression with the pattern-matching algorithm Haystack. Refactoring candidate genes in Sf9 insect cells led to discovery of four enzymes that catalyze the first six steps in steroid alkaloid biosynthesis to produce verazine, a predicted precursor to cyclopamine. Three of the enzymes are cytochromes P450 while the fourth is a γ-aminobutyrate transaminase; together they produce verazine from cholesterol.
Verazine; Veratrum californicum; California corn lily; Haystack; steroid alkaloids; cyclopamine; KJ869252; KJ869253; KJ869254; KJ869255; KJ869256; KJ869257; KJ869262; KJ869263; KJ869264; KJ869258; KJ869261; KJ869260; KJ869259
Current phylogenetic sampling reveals that dioecy and an XY sex chromosome pair evolved once, or possibly twice, in the genus Asparagus. Although there appear to be some lineage-specific polyploidization events, the base chromosome number of 2n = 2× = 20 is relatively conserved across the Asparagus genus. Regardless, dioecious species tend to have larger genomes than hermaphroditic species. Here, we test whether this genome size expansion in dioecious species is related to a polyploidization and subsequent chromosome fusion, or to retrotransposon proliferation in dioecious species. We first estimate genome sizes, or use published values, for four hermaphrodites and four dioecious species distributed across the phylogeny, and show that dioecious species typically have larger genomes than hermaphroditic species. Utilizing a phylogenomic approach, we find no evidence for ancient polyploidization contributing to increased genome sizes of sampled dioecious species. We do find support for an ancient whole genome duplication (WGD) event predating the diversification of the Asparagus genus. Repetitive DNA content of the four hermaphroditic and four dioecious species was characterized based on randomly sampled whole genome shotgun sequencing, and common elements were annotated. Across our broad phylogenetic sampling, Ty-1 Copia retroelements, in particular, have undergone a marked proliferation in dioecious species. In the absence of a detectable WGD event, retrotransposon proliferation is the most likely explanation for the precipitous increase in genome size in dioecious Asparagus species.
Asparagus; dioecy; sex chromosomes; transposons; genome size
Panicoideae are the second largest subfamily in Poaceae (grass family), with 212 genera and approximately 3316 species. Previous studies have begun to reveal relationships within the subfamily, but largely lack resolution and/or robust support for certain tribal and subtribal groups. This study aims to resolve these relationships and to characterize a putative mitochondrial insert in one lineage.
35 newly sequenced Panicoideae plastomes were combined in a phylogenomic study with 37 other species: 15 Panicoideae and 22 from outgroups. A robust Panicoideae topology largely congruent with previous studies was obtained, but with some incongruences with previously reported subtribal relationships. A mitochondrial DNA (mtDNA) to plastid DNA (ptDNA) transfer was discovered in the Paspalum lineage.
The phylogenomic analysis returned a topology that largely supports previous studies. Five previously recognized subtribes appear on the topology to be non-monophyletic. Additionally, evidence for mtDNA to ptDNA transfer was identified in both Paspalum fimbriatum and P. dilatatum, and suggests a single rare event that took place in a common progenitor. Finally, the framework from this study can guide larger whole plastome sampling to discern the relationships in Cyperochloeae, Steyermarkochloeae, Gynerieae, and other incertae sedis taxa that are weakly supported or unresolved.
Electronic supplementary material
The online version of this article (doi:10.1186/s12870-016-0823-3) contains supplementary material, which is available to authorized users.
Grasses; mtDNA; Next generation sequencing; Panicoideae; Paspalum; Phylogenomics; Plastome; Poaceae; ptDNA; Subtribal systematics
The ATP-binding cassette (ABC) transporter gene superfamily is ubiquitous among extant organisms and prominently represented in plants. ABC transporters act to transport compounds across cellular membranes and are involved in a diverse range of biological processes. Thus, the applicability to biotechnology is vast, including cancer resistance in humans, drug resistance among vertebrates, and herbicide and other xenobiotic resistance in plants. In addition, plants appear to harbor the highest diversity of ABC transporter genes compared with any other group of organisms. This study applied transcriptome analysis to survey the kingdom-wide ABC transporter diversity in plants and to suggest biotechnology applications of this diversity.
We utilized sequence similarity-based informatics techniques to infer the identity of ABC transporter gene candidates from 1295 phylogenetically-diverse plant transcriptomes. A total of 97,149 putative (approximately 25 % were full-length) ABC transporter gene members were identified; each RNA-Seq library (plant sample) had 88 ± 30 gene members. As expected, simpler organisms, such as algae, had fewer unique members than vascular land plants. Differences were also noted in the richness of certain ABC transporter subfamilies. Land plants had more unique ABCB, ABCC, and ABCG transporter gene members on average (p < 0.005), and green algae, red algae, and bryophytes had significantly more ABCF transporter gene members (p < 0.005). Ferns had significantly fewer ABCA transporter gene members than all other plant groups (p < 0.005).
We present a transcriptomic overview of ABC transporter gene members across all major plant groups. An increase in the number of gene family members present in the ABCB, ABCC, and ABCG transporter subfamilies may indicate an expansion of the ABC transporter superfamily among green land plants, which include all crop species. The striking difference in the number of ABCA subfamily transporter gene members between ferns and other plant taxa is surprising and merits further investigation. We discuss the potential exploitation of ABC transporters in plant biotechnology, with an emphasis on crops.
Electronic supplementary material
The online version of this article (doi:10.1186/s12896-016-0277-6) contains supplementary material, which is available to authorized users.
ABC transporter; Transcriptomics; Computational biology; Taxonomic diversity
Comparisons of flowering plant genomes reveal multiple rounds of ancient polyploidy characterized by large intragenomic syntenic blocks. Three such whole-genome duplication (WGD) events, designated as rho (ρ), sigma (σ), and tau (τ), have been identified in the genomes of cereal grasses. Precise dating of these WGD events is necessary to investigate how they have influenced diversification rates, evolutionary innovations, and genomic characteristics such as the GC profile of protein-coding sequences. The timing of these events has remained uncertain due to the paucity of monocot genome sequence data outside the grass family (Poaceae). Phylogenomic analysis of protein-coding genes from sequenced genomes and transcriptome assemblies from 35 species, including representatives of all families within the Poales, has resolved the timing of rho and sigma relative to speciation events and placed tau prior to divergence of Asparagales and the commelinids but after divergence with eudicots. Examination of gene family phylogenies indicates that rho occurred just prior to the diversification of Poaceae and sigma occurred before early diversification of Poales lineages but after the Poales-commelinid split. Additional lineage-specific WGD events were identified on the basis of the transcriptome data. Gene families exhibiting high GC content are underrepresented among those with duplicate genes that persisted following these genome duplications. However, genome duplications had little overall influence on lineage-specific changes in the GC content of coding genes. Improved resolution of the timing of WGD events in monocot history provides evidence for the influence of polyploidization on functional evolution and species diversification.
whole-genome duplication; grasses; monocots; GC content
Whereas de novo assemblies of RNA-Seq data are being published for a growing number of species across the tree of life, there are currently no broadly accepted methods for evaluating such assemblies. Here we present a detailed comparison of 99 transcriptome assemblies, generated with 6 de novo assemblers including CLC, Trinity, SOAP, Oases, ABySS and NextGENe. Controlled analyses of de novo assemblies for Arabidopsis thaliana and Oryza sativa transcriptomes provide new insights into the strengths and limitations of transcriptome assembly strategies. We find that the leading assemblers generate reassuringly accurate assemblies for the majority of transcripts. At the same time, we find a propensity for assemblers to fail to fully assemble highly expressed genes. Surprisingly, the incidence of true chimeric assemblies is very low for all assemblers. Normalized libraries are reduced in highly abundant transcripts, but they also lack 1000s of low abundance transcripts. We conclude that the quality of de novo transcriptome assemblies is best assessed through consideration of a combination of metrics: 1) proportion of reads mapping to an assembly, 2) recovery of conserved, widely expressed genes, 3) N50 length statistics, and 4) the total number of unigenes. We provide benchmark Illumina transcriptome data and introduce SCERNA, a broadly applicable modular protocol for de novo assembly improvement. Finally, our de novo assembly of the Arabidopsis leaf transcriptome revealed ~20 putative Arabidopsis genes lacking in the current annotation.
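Of the metrics listed above, the N50 length statistic is the most straightforward to compute directly from an assembly. The sketch below uses the common definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled length). The function name and input format are assumptions for illustration and are not taken from the SCERNA protocol.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or unigene) lengths.

    Contigs are sorted from longest to shortest and accumulated until the
    running total reaches at least half of the summed length; the length of
    the contig that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Total length is 54, so half is 27; 10 + 9 + 8 = 27 reaches it.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Because N50 rewards long contigs regardless of correctness, it is most informative alongside the other three metrics rather than on its own.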
Brassicaceae is one of the most diverse and economically valuable angiosperm families with widely cultivated vegetable crops and scientifically important model plants, such as Arabidopsis thaliana. The evolutionary history, ecological, morphological, and genetic diversity, and abundant resources and knowledge of Brassicaceae make it an excellent model family for evolutionary studies. Recent phylogenetic analyses of the family revealed three major lineages (I, II, and III), but relationships among and within these lineages remain largely unclear. Here, we present a highly supported phylogeny with six major clades using nuclear markers from newly sequenced transcriptomes of 32 Brassicaceae species and large data sets from additional taxa for a total of 55 species spanning 29 out of 51 tribes. Clade A consisting of Lineage I and Macropodium nivale is sister to combined Clade B (with Lineage II and others) and a new Clade C. The ABC clade is sister to Clade D with species previously weakly associated with Lineage II and Clade E (Lineage III) is sister to the ABCD clade. Clade F (the tribe Aethionemeae) is sister to the remainder of the entire family. Molecular clock estimation reveals an early radiation of major clades near or shortly after the Eocene–Oligocene boundary and subsequent nested divergences of several tribes of the previously polytomous Expanded Lineage II. Reconstruction of ancestral morphological states during the Brassicaceae evolution indicates prevalent parallel (convergent) evolution of several traits over deep times across the entire family. These results form a foundation for future evolutionary analyses of structures and functions across Brassicaceae.
ancestral character reconstruction; Brassicaceae; divergence time estimation; orthologous nuclear gene; phylogeny; transcriptome
By mapping translated metagenomic reads to a microbial metabolic network, we show that ruminal ecosystems that are rather dissimilar in their taxonomy can be considerably more similar at the metabolic network level. Using a new network bi-partition approach for linking the microbial network to a bovine metabolic network, we observe that these ruminal metabolic networks exhibit properties consistent with distinct metabolic communities producing similar outputs from common inputs. For instance, the closer in network space a microbial reaction is to a reaction found in the host, the lower the variability of its enzyme copy number across hosts. Similarly, microbial enzymes that are near host nodes are also higher in copy number than more distant enzymes. Collectively, these results demonstrate a widely expected pattern that, to our knowledge, has not been explicitly demonstrated in microbial communities: namely that there can exist different community metabolic networks that have the same metabolic inputs and outputs but differ in their internal structure.
Long non-coding RNAs (LncRNAs) have been identified as gene regulatory elements that influence the transcription of their neighbouring protein-coding genes. The discovery of LncRNAs in animals has stimulated genome-wide scans for these elements across plant genomes. Recently, 6480 LincRNAs were putatively identified in Arabidopsis thaliana (Brassicaceae); however, there is limited information on their conservation.
Using a phylogenomics approach, we assessed the positional and sequence conservation of these LncRNAs by analyzing the genomes of the basal Brassicaceae species Aethionema arabicum and Tarenaya hassleriana of the sister-family Cleomaceae. Furthermore, we generated transcriptomes for another three Aethionema species and one other Cleomaceae species to validate their transcriptional activity. We show that a subset of LncRNAs are highly diverged at the nucleotide level, but conserved by position (syntenic). Positionally conserved LncRNAs that are expressed neighbour important developmental and physiological genes. Interestingly, >65 % of the positionally conserved LncRNAs are located within 2.5 Mb of telomeres in Arabidopsis thaliana chromosomes.
These results highlight the importance of analysing not only sequence conservation, but also positional conservation of non-coding genetic elements in plants including LncRNAs.
Electronic supplementary material
The online version of this article (doi:10.1186/s12870-015-0603-5) contains supplementary material, which is available to authorized users.
Plastome sequences for 18 species of the PACMAD grasses (subfamilies Panicoideae, Aristidoideae, Chloridoideae, Micrairoideae, Arundinoideae, Danthonioideae) were analyzed phylogenomically. Next generation sequencing methods were used to provide complete plastome sequences for 12 species. Sanger sequencing was performed to determine the plastome of one species, Hakonechloa macra, to provide a reference for annotation. These analyses were conducted to resolve deep subfamilial relationships within the clade. Divergence estimates were assessed to determine potential factors that led to the rapid radiation of this lineage and its dominance of warmer open habitats.
New plastomes were completely sequenced and characterized for 13 PACMAD species. An autapomorphic ~1140 bp deletion was found in Hakonechloa macra putatively pseudogenizing rpl14 and eliminating rpl16 from this plastome. Phylogenomic analyses support Panicoideae as the sister group to the ACMAD clade. Complete plastome sequences provide greater support at deep nodes within the PACMAD clade. The initial diversification of PACMAD subfamilies was estimated to occur at 32.4 mya.
Phylogenomic analyses of complete plastomes provide resolution for deep relationships of PACMAD grasses. The divergence estimate of 32.4 mya at the crown node of the PACMAD clade coincides with the Eocene-Oligocene Transition (EOT). The EOT was a period of global cooling and drying, which led to forest fragmentation and the expansion of open habitats now dominated by these grasses. Understanding how these grasses are related and determining a cause for their rapid radiation allows for future predictions of grassland distribution in the face of a changing global climate.
Electronic supplementary material
The online version of this article (doi:10.1186/s12870-015-0563-9) contains supplementary material, which is available to authorized users.
Complete plastome; Divergence estimates; PACMAD Clade; Panicoideae; Phylogenomics; Rapid radiation
Protection of Earth’s ecosystems requires identification of geographical areas of greatest biodiversity. Assessment of biodiversity begins with knowledge of the evolutionary histories of species in a geographic area. Multiple phylogenetic diversity (PD) metrics have been developed to describe biodiversity beyond species counts, but sufficient empirical studies, particularly at fine phylogenetic scales, have not been conducted to provide conservation planners with evidence for incorporating PD metrics into selection of priority regions. We review notable studies that are contributing to a growing database of empirical results, we report on the effect of using high-throughput sequencing to estimate the phylogenies used to calculate PD metrics, and we discuss difficulties in selecting appropriate diversity indices. We focused on two of the most speciose angiosperm families in prairies—Asteraceae and Fabaceae—and compared 12 PD metrics and four traditional measures of biodiversity between three North American prairie sites. The varying results from the literature and from the current data reveal the wide range of applications of PD metrics and the necessity for many more empirical studies. The accumulation of results from further investigations will eventually lead to a scientific understanding upon which conservation planners can make informed decisions about where to apply limited preservation funds.
angiosperms; Asteraceae; biodiversity assessment; conservation prioritization; Fabaceae; next-generation sequencing
Whole plastid genomes (plastomes) are being sequenced rapidly from across the green plant tree of life, and phylogenetic analyses of these are increasing resolution and support for relationships that were unresolved in earlier studies. The cool-season grass subfamily, Pooideae, includes important temperate cereals, turf grasses and forage species, yet some aspects of deep phylogeny in the lineage are unresolved. We newly sequenced 25 Pooideae plastomes, and conducted phylogenomic analyses of these and 20 existing plastomes from the subfamily. Most aspects of deep relationship in Pooideae are maximally supported in our analyses, including those among early-diverging tribes.
Whole plastid genomes are being sequenced rapidly from across the green plant tree of life, and phylogenetic analyses of these are increasing resolution and support for relationships that have varied among or been unresolved in earlier single- and multi-gene studies. Pooideae, the cool-season grass lineage, is the largest of the 12 grass subfamilies and includes important temperate cereals, turf grasses and forage species. Although numerous studies of the phylogeny of the subfamily have been undertaken, relationships among some ‘early-diverging’ tribes conflict among studies, and some relationships among subtribes of Poeae have not yet been resolved. To address these issues, we newly sequenced 25 whole plastomes, which showed rearrangements typical of Poaceae. These plastomes represent 9 tribes and 11 subtribes of Pooideae, and were analysed with 20 existing plastomes for the subfamily. Maximum likelihood (ML), maximum parsimony (MP) and Bayesian inference (BI) robustly resolve most deep relationships in the subfamily. Complete plastome data provide increased nodal support compared with protein-coding data alone at nodes that are not maximally supported. Following the divergence of Brachyelytrum, Phaenospermateae, Brylkinieae–Meliceae and Ampelodesmeae–Stipeae are the successive sister groups of the rest of the subfamily. Ampelodesmeae are nested within Stipeae in the plastome trees, consistent with their hybrid origin between a phaenospermatoid and a stipoid grass (the maternal parent). The core Pooideae are strongly supported and include Brachypodieae, a Bromeae–Triticeae clade and Poeae. Within Poeae, a novel sister group relationship between Phalaridinae and Torreyochloinae is found, and the relative branching order of this clade and Aveninae, with respect to an Agrostidinae–Brizinae clade, is discordant between MP and ML/BI trees.
Maximum likelihood and Bayesian analyses strongly support Airinae and Holcinae as the successive sister groups of a Dactylidinae–Loliinae clade.
Chloroplast genome; core Pooideae; phylogenetics; phylogenomics; plastome; Poeae; Schedonorus arundinaceus
Mutualistic symbioses between eukaryotes and beneficial microorganisms of their microbiome play an essential role in nutrition, protection against disease, and development of the host. However, the impact of beneficial symbionts on the evolution of host genomes remains poorly characterized. Here we used the independent loss of the most widespread plant–microbe symbiosis, arbuscular mycorrhization (AM), as a model to address this question. Using a large phenotypic approach and phylogenetic analyses, we present evidence that loss of AM symbiosis correlates with the loss of many symbiotic genes in the Arabidopsis lineage (Brassicales). Then, by analyzing the genome and/or transcriptomes of nine other phylogenetically divergent non-host plants, we show that this correlation occurred in a convergent manner in four additional plant lineages, demonstrating the existence of an evolutionary pattern specific to symbiotic genes. Finally, we use a global comparative phylogenomic approach to track this evolutionary pattern among land plants. Based on this approach, we identify a set of 174 highly conserved genes and demonstrate enrichment in symbiosis-related genes. Our findings are consistent with the hypothesis that beneficial symbionts maintain purifying selection on host gene networks during the evolution of entire lineages.
Symbiotic associations between eukaryotes and microbes play essential roles in the nutrition, health and behavior of both partners. It is well accepted that hosts control and shape their associated microbiome. In this study, we provide evidence that symbiotic microbes also participate in the evolution of host genomes. In particular, we show that the independent loss of a symbiosis in several plant lineages results in a convergent modification of non-host genomes. Interestingly, a significant fraction of genes lost in non-hosts play an important role in this symbiosis, supporting the use of comparative genomics as a powerful approach to identify undiscovered gene networks.
The internal transcribed spacers of the nuclear ribosomal RNA gene cluster, termed ITS1 and ITS2, are the most frequently used nuclear markers for phylogenetic analyses across many eukaryotic groups including most plant families. The reasons for the popularity of these markers include: 1.) Ease of amplification due to high copy number of the gene clusters, 2.) Available cost-effective methods and highly conserved primers, 3.) Rapidly evolving markers (i.e. variable between closely related species), and 4.) The assumption (and/or treatment) that these sequences are non-functional, neutrally evolving phylogenetic markers. Here, our analyses of ITS1 and ITS2 for 50 species suggest that both sequences are instead under selective constraints to preserve proper secondary structure, likely to maintain complete self-splicing functions, and thus are not neutrally-evolving phylogenetic markers. Our results indicate the majority of sequence sites are co-evolving with other positions to form proper secondary structure, which has implications for phylogenetic inference. We also found that the lowest energy state and total number of possible alternate secondary structures are highly significantly different between ITS regions and random sequences with an identical overall length and Guanine-Cytosine (GC) content. Lastly, we review recent evidence highlighting some additional problematic issues with using these regions as the sole markers for phylogenetic studies, and thus strongly recommend additional markers and cost-effective approaches for future studies to estimate phylogenetic relationships.
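The comparison against random sequences of identical length and GC content described above needs a set of matched control sequences. One simple way to build such controls, shown here as a minimal sketch and not necessarily the procedure the authors used, is to shuffle the bases of the observed sequence, which preserves both its length and its exact base composition. The example sequence is made up, and the folding step (computing the lowest-energy secondary structure of each control with an RNA folding program) is intentionally left out.

```python
import random


def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def shuffled_controls(seq, n, seed=0):
    """Return n random permutations of seq.

    Each control has the same length and the same base counts as the
    input, and therefore the same GC content. A fixed seed keeps the
    controls reproducible.
    """
    rng = random.Random(seed)
    controls = []
    for _ in range(n):
        bases = list(seq)
        rng.shuffle(bases)
        controls.append("".join(bases))
    return controls


# Hypothetical spacer fragment, used only to demonstrate the call.
example = "GGCCATATGCGGCTTAACGC"
for control in shuffled_controls(example, 3, seed=42):
    print(control, len(control), gc_fraction(control))
```

Simple shuffling preserves mononucleotide composition only; if dinucleotide frequencies also need to match, a dinucleotide-preserving shuffle would be the stricter null model.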
Brassica oleracea is a valuable vegetable species that has contributed to human health and nutrition for hundreds of years and comprises multiple distinct cultivar groups with diverse morphological and phytochemical attributes. In addition to this phenotypic wealth, B. oleracea offers unique insights into polyploid evolution, as it results from multiple ancestral polyploidy events and a final Brassiceae-specific triplication event. Further, B. oleracea represents one of the diploid genomes that formed the economically important allopolyploid oilseed, Brassica napus. A deeper understanding of B. oleracea genome architecture provides a foundation for crop improvement strategies throughout the Brassica genus.
We generate an assembly representing 75% of the predicted B. oleracea genome using a hybrid Illumina/Roche 454 approach. Two dense genetic maps are generated to anchor almost 92% of the assembled scaffolds to nine pseudo-chromosomes. Over 50,000 genes are annotated and 40% of the genome predicted to be repetitive, thus contributing to the increased genome size of B. oleracea compared to its close relative B. rapa. A snapshot of both the leaf transcriptome and methylome allows comparisons to be made across the triplicated sub-genomes, which resulted from the most recent Brassiceae-specific polyploidy event.
Differential expression of the triplicated syntelogs and cytosine methylation levels across the sub-genomes suggest residual marks of the genome dominance that led to the current genome architecture. Although cytosine methylation does not correlate with individual gene dominance, the independent methylation patterns of triplicated copies suggest epigenetic mechanisms play a role in the functional diversification of duplicate genes.
Background and Aims
Brassica rapa and B. oleracea are the progenitors of oilseed rape B. napus. The addition of each chromosome of B. oleracea to the chromosome complement of B. rapa results in a series of monosomic alien addition lines (MAALs). Analysis of MAALs determines which B. oleracea chromosomes carry genes controlling specific phenotypic traits, such as seed colour. Yellow-seeded oilseed rape is a desirable breeding goal both for food and livestock feed end-uses that relate to oil, protein and fibre contents. The aims of this study included developing a missing MAAL to complement an available series, for studies on seed colour control, chromosome homoeology and assignment of linkage groups to B. oleracea chromosomes.
A new batch of B. rapa–B. oleracea aneuploids was produced to generate the missing MAAL. Seed colour and other plant morphological features relevant to differentiation of MAALs were recorded. For chromosome characterization, Snow's carmine, fluorescence in situ hybridization (FISH) and genomic in situ hybridization (GISH) were used.
The final MAAL was developed. Morphological traits that differentiated the MAALs comprised cotyledon number, leaf morphology, flower colour and seed colour. Seed colour was controlled by major genes on two B. oleracea chromosomes and minor genes on five other chromosomes of this species. Homoeologous pairing was largely between chromosomes with similar centromeric positions. FISH, GISH and a parallel microsatellite marker analysis defined the chromosomes in terms of their linkage groups.
A complete set of MAALs is now available for genetic, genomic, evolutionary and breeding perspectives. Defining chromosomes that carry specific genes, physical localization of DNA markers and access to established genetic linkage maps contribute to the integration of these approaches, manifested in the confirmed correspondence of linkage groups with specific chromosomes. Applications include marker-assisted selection and breeding for yellow seeds.
Brassica rapa var. trilocularis; B. oleracea var. alboglabra; MAALs; characterization of C chromosomes; plant morphology; seed colour control; FISH; GISH; chromosome homoeology; chromosome structural changes; linkage groups; crop plant breeding
Next-generation sequencing plays a central role in the characterization and quantification of transcriptomes. Although numerous metrics are purported to quantify the quality of RNA, there have been no large-scale empirical evaluations of the major determinants of sequencing success. We used a combination of existing and newly developed methods to isolate total RNA from 1115 samples from 695 plant species in 324 families, which represents >900 million years of phylogenetic diversity from green algae through flowering plants, including many plants of economic importance. We then sequenced 629 of these samples on Illumina GAIIx and HiSeq platforms and performed a large comparative analysis to identify predictors of RNA quality and the diversity of putative genes (scaffolds) expressed within samples. Tissue types (e.g., leaf vs. flower) varied in RNA quality, sequencing depth and the number of scaffolds. Tissue age also influenced RNA quality but not the number of scaffolds ≥1000 bp. Overall, 36% of the variation in the number of scaffolds was explained by metrics of RNA integrity (RIN score), RNA purity (OD 260/230), sequencing platform (GAIIx vs HiSeq) and the amount of total RNA used for sequencing. However, our results show that the most commonly used measures of RNA quality (e.g., RIN) are weak predictors of the number of scaffolds because Illumina sequencing is robust to variation in RNA quality. These results provide novel insight into the methods that are most important in isolating high quality RNA for sequencing and assembling plant transcriptomes. The methods and recommendations provided here could increase the efficiency and decrease the cost of RNA sequencing for individual labs and genome centers.
Although it is agreed that a major polyploidy event, gamma, occurred within the eudicots, the phylogenetic placement of the event remains unclear.
To determine when this polyploidization occurred relative to speciation events in angiosperm history, we employed a phylogenomic approach to investigate the timing of gene set duplications located on syntenic gamma blocks. We populated 769 putative gene families with large sets of homologs obtained from public transcriptomes of basal angiosperms, magnoliids, asterids, and more than 91.8 gigabases of new next-generation transcriptome sequences of non-grass monocots and basal eudicots. The overwhelming majority (95%) of well-resolved gamma duplications was placed before the separation of rosids and asterids and after the split of monocots and eudicots, providing strong evidence that the gamma polyploidy event occurred early in eudicot evolution. Further, the majority of gene duplications was placed after the divergence of the Ranunculales and core eudicots, indicating that the gamma appears to be restricted to core eudicots. Molecular dating estimates indicate that the duplication events were intensely concentrated around 117 million years ago.
The rapid radiation of core eudicot lineages that gave rise to nearly 75% of angiosperm species appears to have coincided with or shortly followed the gamma triplication event. Reconciliation of gene trees with a species phylogeny can elucidate the timing of major events in genome evolution, even when genome sequences are only available for a subset of species represented in the gene trees. Comprehensive transcriptome datasets are valuable complements to genome sequences for high-resolution phylogenomic analysis.
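As an illustration of how reconciling a gene tree with a species phylogeny can place a duplication on a particular branch, the following is a minimal sketch of the standard last-common-ancestor (LCA) mapping rule. It is not the pipeline used in the study. The species topology, the node labels (`angiosperms`, `eudicots`, `core_eudicots`), the choice of representative taxa, and the function names are all hypothetical and chosen only for this toy example.

```python
# Toy species tree, stored as child -> parent. This simplified topology and
# its node labels are assumptions made for illustration only.
SPECIES_PARENT = {
    "Oryza": "angiosperms",        # a monocot
    "eudicots": "angiosperms",
    "Aquilegia": "eudicots",       # a member of Ranunculales
    "core_eudicots": "eudicots",
    "Vitis": "core_eudicots",      # a rosid-lineage representative
    "Solanum": "core_eudicots",    # an asterid representative
}


def ancestors(node, parent):
    """Return the path from a species-tree node up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a, b, parent):
    """Return the last common ancestor of two species-tree nodes."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes do not share a root")


def place_duplications(gene_tree, parent):
    """Map each gene-tree node to a species-tree node and list duplications.

    The gene tree is a nested 2-tuple whose leaves are species names.
    An internal node is called a duplication when it maps to the same
    species-tree node as at least one of its children (the LCA rule).
    """
    duplications = []

    def visit(node):
        if isinstance(node, str):          # leaf: a gene sampled from a species
            return node
        left, right = (visit(child) for child in node)
        here = lca(left, right, parent)
        if here in (left, right):          # the two children overlap in species
            duplications.append(here)
        return here

    visit(gene_tree)
    return duplications


# Hypothetical gene family: two paralogous copies in Vitis and Solanum,
# a single copy in Aquilegia and in Oryza.
gene_tree = (((("Vitis", "Solanum"), ("Vitis", "Solanum")), "Aquilegia"), "Oryza")
duplication_nodes = place_duplications(gene_tree, SPECIES_PARENT)
print(duplication_nodes)
```

In this toy family the single duplication maps to the `core_eudicots` node, that is, after the split from the Ranunculales representative and before the two core eudicot taxa separate. Real analyses would also need to account for branch support and missing gene copies, which this sketch ignores.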
The well-supported gene dosage hypothesis predicts that genes encoding proteins engaged in dose-sensitive interactions cannot be reduced back to single copies once all interacting partners are simultaneously duplicated in a whole genome duplication. The genomes of extant flowering plants are the result of many sequential rounds of whole genome duplication, yet the fraction of genomes devoted to encoding complex molecular machines does not increase as fast as expected through multiple rounds of whole genome duplications. Using parallel interspecies genomic comparisons in the grasses and crucifers, we demonstrate that genes retained as duplicates following a whole genome duplication have only a 50% chance of being retained as duplicates in a second whole genome duplication. Genes which fractionated to a single copy following a second whole genome duplication tend to be the member of a gene pair with less complex promoters, lower levels of expression, and to be under lower levels of purifying selection. We suggest the copy with lower levels of expression and less purifying selection contributes less to effective gene-product dosage and therefore is under less dosage constraint in future whole genome duplications, providing an explanation for why flowering plant genomes are not overrun with subunits of large dose-sensitive protein complexes.
polyploidy; gene dosage; gene loss; genome evolution; comparative genomics; crucifers; grasses
The evolutionary origins of the multitude of duplicate genes in plant genomes are still incompletely understood. To gain an appreciation of the potential selective forces acting on these duplicates, we phylogenetically inferred the set of metabolic gene families from 10 flowering plant (angiosperm) genomes. We then compared the metabolic fluxes for these families, predicted using the Arabidopsis thaliana and Sorghum bicolor metabolic networks, with the families' duplication propensities. For duplications produced by both small-scale duplication and whole-genome duplication, there is a significant association between the flux and the tendency to duplicate. Following this global analysis, we made a more fine-scale study of the selective constraints observed on plant sodium and phosphate transporters. We find that the different duplication mechanisms give rise to differing selective constraints. However, the exact nature of this pattern varies between the gene families, and we argue that the duplication mechanism alone does not define a duplicated gene's subsequent evolutionary trajectory. Collectively, our results argue for the interplay of history, function, and selection in shaping duplicate gene evolution in plants.
dosage selection; genome duplication; gene duplication
Evolution of the Brassica species has been recursively affected by polyploidy events, and comparison to their relative, Arabidopsis thaliana, provides means to explore their genomic complexity.
A genome-wide physical map of a rapid-cycling strain of B. oleracea was constructed by integrating high-information-content fingerprinting (HICF) of Bacterial Artificial Chromosome (BAC) clones with hybridization to sequence-tagged probes. Using 2907 contigs of two or more BACs, we performed several lines of comparative genomic analysis. Interspecific DNA synteny is much better preserved in euchromatin than heterochromatin, showing the qualitative difference in evolution of these respective genomic domains. About 67% of contigs can be aligned to the Arabidopsis genome, with 96.5% corresponding to euchromatic regions, and 3.5% (shown to contain repetitive sequences) to pericentromeric regions. Overgo probe hybridization data showed that contigs aligned to Arabidopsis euchromatin contain ~80% of low-copy-number genes, while genes with high copy number are much more frequently associated with pericentromeric regions. We identified 39 interchromosomal breakpoints during the diversification of B. oleracea and Arabidopsis thaliana, a relatively high level of genomic change since their divergence. Comparison of the B. oleracea physical map with Arabidopsis and other available eudicot genomes showed appreciable 'shadowing' produced by more ancient polyploidies, resulting in a web of relatedness among contigs which increased genomic complexity.
A high-resolution genetically-anchored physical map sheds light on Brassica genome organization and advances positional cloning of specific genes, and may help to validate genome sequence assembly and alignment to chromosomes.
All the physical mapping data are freely shared at a WebFPC site (http://lulu.pgml.uga.edu/fpc/WebAGCoL/brassica/WebFPC/; temporarily password-protected: account: pgml; password: 123qwe123).
Comparative genomics; polyploidy; Arabidopsis thaliana
In the eight years since phylogenomics was introduced as the intersection of genomics and phylogenetics, the field has provided fundamental insights into gene function, genome history and organismal relationships. The utility of phylogenomics is growing with the increase in the number and diversity of taxa for which whole genome and large transcriptome sequence sets are being generated. We assert that the synergy between genomic and phylogenetic perspectives in comparative biology would be enhanced by the development and refinement of minimal reporting standards for phylogenetic analyses. Encouraged by the development of the Minimum Information About a Microarray Experiment (MIAME) standard, we propose a similar roadmap for the development of a Minimal Information About a Phylogenetic Analysis (MIAPA) standard. Key to the successful development and implementation of such a standard will be broad participation by developers of phylogenetic analysis software, phylogenetic database developers, practitioners of phylogenomics, and journal editors.
Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome.
Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella.
When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution.