Parasitic plants, represented by several thousand species of angiosperms, use modified structures known as haustoria to tap into photosynthetic host plants and extract nutrients and water. As a result of their direct plant-plant connections with their host plant, parasitic plants have special opportunities for horizontal gene transfer, the nonsexual transmission of genetic material across species boundaries. There is increasing evidence that parasitic plants have served as recipients and donors of horizontal gene transfer (HGT), but the long-term impacts of eukaryotic HGT in parasitic plants are largely unknown.
Here we show that a gene encoding an albumin 1 KNOTTIN-like protein is present in Phelipanche aegyptiaca and related parasitic species of the family Orobanchaceae. The gene is closely related to the albumin 1 genes known only from papilionoid legumes, where they serve dual roles as food storage proteins and insect toxins, and phylogenetic analyses indicate that it was likely acquired by a Phelipanche ancestor via HGT from a legume host. The KNOTTINs are well known for their unique “disulfide through disulfide knot” structure and have been extensively studied in various contexts, including drug design. Genomic sequences from nine related parasite species were obtained, and 3D protein structure simulation tests and evolutionary constraint analyses were performed. The parasite gene we identified here retains the intron structure and the six highly conserved cysteine residues necessary to form a KNOTTIN protein, and displays levels of purifying selection like those seen in legumes. The albumin 1 xenogene has evolved through >150 speciation events over ca. 16 million years, forming a small family of differentially expressed genes that may confer novel functions in the parasites. Moreover, further data show that a distantly related parasitic plant, Cuscuta, obtained two copies of albumin 1 KNOTTIN-like genes from legumes through a separate HGT event, suggesting that legume KNOTTIN structures have been repeatedly co-opted by parasitic plants.
The HGT-derived albumins in Phelipanche represent a novel example of how plants can acquire genes from other plants via HGT that then go on to duplicate, evolve, and retain the specialized features required to perform a unique host-derived function.
Parasitic plants; Horizontal gene transfer; Phelipanche; Orobanche; Legume; KNOTTIN; Albumin 1; Evolution
Next-generation sequencing plays a central role in the characterization and quantification of transcriptomes. Although numerous metrics are purported to quantify the quality of RNA, there have been no large-scale empirical evaluations of the major determinants of sequencing success. We used a combination of existing and newly developed methods to isolate total RNA from 1115 samples from 695 plant species in 324 families, which represents >900 million years of phylogenetic diversity from green algae through flowering plants, including many plants of economic importance. We then sequenced 629 of these samples on Illumina GAIIx and HiSeq platforms and performed a large comparative analysis to identify predictors of RNA quality and the diversity of putative genes (scaffolds) expressed within samples. Tissue types (e.g., leaf vs. flower) varied in RNA quality, sequencing depth and the number of scaffolds. Tissue age also influenced RNA quality but not the number of scaffolds ≥1000 bp. Overall, 36% of the variation in the number of scaffolds was explained by metrics of RNA integrity (RIN score), RNA purity (OD 260/230), sequencing platform (GAIIx vs HiSeq) and the amount of total RNA used for sequencing. However, our results show that the most commonly used measures of RNA quality (e.g., RIN) are weak predictors of the number of scaffolds because Illumina sequencing is robust to variation in RNA quality. These results provide novel insight into the methods that are most important in isolating high quality RNA for sequencing and assembling plant transcriptomes. The methods and recommendations provided here could increase the efficiency and decrease the cost of RNA sequencing for individual labs and genome centers.
Although it is agreed that a major polyploidy event, gamma, occurred within the eudicots, the phylogenetic placement of the event remains unclear.
To determine when this polyploidization occurred relative to speciation events in angiosperm history, we employed a phylogenomic approach to investigate the timing of gene set duplications located on syntenic gamma blocks. We populated 769 putative gene families with large sets of homologs obtained from public transcriptomes of basal angiosperms, magnoliids, asterids, and more than 91.8 gigabases of new next-generation transcriptome sequences of non-grass monocots and basal eudicots. The overwhelming majority (95%) of well-resolved gamma duplications was placed before the separation of rosids and asterids and after the split of monocots and eudicots, providing strong evidence that the gamma polyploidy event occurred early in eudicot evolution. Further, the majority of gene duplications was placed after the divergence of the Ranunculales and core eudicots, indicating that gamma appears to be restricted to core eudicots. Molecular dating estimates indicate that the duplication events were intensely concentrated around 117 million years ago.
The rapid radiation of core eudicot lineages that gave rise to nearly 75% of angiosperm species appears to have occurred coincidentally or shortly following the gamma triplication event. Reconciliation of gene trees with a species phylogeny can elucidate the timing of major events in genome evolution, even when genome sequences are only available for a subset of species represented in the gene trees. Comprehensive transcriptome datasets are valuable complements to genome sequences for high-resolution phylogenomic analysis.
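The idea of placing duplications relative to speciation events can be illustrated with a minimal sketch. It uses the species-overlap rule, a common simplification of full gene-tree/species-tree reconciliation, and is not the pipeline used in the study. The nested-tuple tree encoding, the function names and the toy tree are all assumptions made for illustration.

```python
from typing import Union

# Assumed encoding: a gene tree is either a leaf (species label as a string)
# or a 2-tuple of subtrees.
GeneTree = Union[str, tuple]


def species_in(tree: GeneTree) -> frozenset:
    """Return the set of species labels found at the leaves of a gene tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return species_in(left) | species_in(right)


def duplication_nodes(tree: GeneTree) -> list:
    """Return, for each inferred duplication node, the species shared by
    its two child clades.

    A node is called a duplication when both child clades contain at least
    one species in common (species-overlap rule).
    """
    if isinstance(tree, str):
        return []
    left, right = tree
    found = duplication_nodes(left) + duplication_nodes(right)
    shared = species_in(left) & species_in(right)
    if shared:
        found.append(shared)
    return found


# Hypothetical toy gene tree: two eudicot paralog clades, one monocot outgroup.
toy_tree = ((("Vitis", "Populus"), ("Vitis", "Populus")), "Oryza")
duplications = duplication_nodes(toy_tree)
```

In this toy tree the only duplication node is shared by Vitis and Populus but not Oryza, so it would be placed after the monocot-eudicot split and before the Vitis-Populus divergence.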
The rapidly increasing number of available plant genomes opens up almost unlimited prospects for biology in general and molecular phylogenetics in particular. A recent study took advantage of this data and identified a set of nuclear genes that occur in single copy in multiple sequenced angiosperms. The present study is the first to apply genomic sequence of one of these low copy genes, agt1, as a phylogenetic marker for species-level phylogenetics. Its utility is compared to the performance of several coding and non-coding chloroplast loci that have been suggested as most applicable for this taxonomic level. As a model group, we chose Tildenia, a subgenus of Peperomia (Piperaceae), one of the largest plant genera. Relationships are particularly difficult to resolve within these species rich groups due to low levels of polymorphisms and fast or recent radiation. Therefore, Tildenia is a perfect test case for applying new phylogenetic tools.
We show that the nuclear marker agt1, and in particular the agt1 introns, provide a significantly increased phylogenetic signal compared to chloroplast markers commonly used for low level phylogenetics. 25% of aligned characters from agt1 intron sequence are parsimony informative. In comparison, the introns and spacer of several common chloroplast markers (trnK intron, trnK-psbA spacer, ndhF-rpl32 spacer, rpl32-trnL spacer, psbA-trnH spacer) provide less than 10% parsimony informative characters. The agt1 dataset provides a deeper resolution than the chloroplast markers in Tildenia.
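The percentage of parsimony informative characters can be computed directly from an alignment. The sketch below uses the standard definition: a column is parsimony informative when at least two character states each occur in at least two sequences. The function name and the toy alignment are hypothetical, and treating every non-ACGT symbol (gaps, ambiguity codes) as missing data is an assumption.

```python
from collections import Counter


def parsimony_informative_fraction(alignment: list) -> float:
    """Fraction of alignment columns that are parsimony informative.

    `alignment` is a list of equal-length aligned nucleotide strings.
    Symbols other than A, C, G, T are treated as missing (an assumption).
    """
    if not alignment or len(alignment[0]) == 0:
        return 0.0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    informative = 0
    for column in zip(*alignment):
        counts = Counter(b.upper() for b in column if b.upper() in "ACGT")
        # Informative: at least two states, each seen in at least two sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return informative / length


# Hypothetical toy alignment: columns 2 and 4 are informative, the rest are not.
toy_alignment = ["ACGTA", "ACGTT", "ATGCA", "ATGCG"]
fraction = parsimony_informative_fraction(toy_alignment)
```

Applied separately to intron and spacer alignments, a function like this would yield the kind of per-marker percentages compared above.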
Single (or very low) copy nuclear genes are of immense value in plant phylogenetics. Compared to other nuclear genes that are members of gene families of all sizes, lab effort, such as cloning, can be kept to a minimum. They also provide regions with different phylogenetic content deriving from coding and non-coding parts of different length. Thus, they can be applied to a wide range of taxonomic levels from family down to population level. As more plant genomes are sequenced, we will obtain increasingly precise information about which genes return to single copy most rapidly following gene duplication and may be most useful across a wide range of plant groups.
Vascular plants appeared ~410 million years ago then diverged into several lineages of which only two survive: the euphyllophytes (ferns and seed plants) and the lycophytes (1). We report here the genome sequence of the lycophyte Selaginella moellendorffii (Selaginella), the first non-seed vascular plant genome reported. By comparing gene content in evolutionary diverse taxa, we found that the transition from a gametophyte- to sporophyte-dominated life cycle required far fewer new genes than the transition from a non-seed vascular to a flowering plant, while secondary metabolic genes expanded extensively and in parallel in the lycophyte and angiosperm lineages. Selaginella differs in post-transcriptional gene regulation, including small RNA regulation of repetitive elements, an absence of the tasiRNA pathway and extensive RNA editing of organellar genes.
In the eight years since phylogenomics was introduced as the intersection of genomics and phylogenetics, the field has provided fundamental insights into gene function, genome history and organismal relationships. The utility of phylogenomics is growing with the increase in the number and diversity of taxa for which whole genome and large transcriptome sequence sets are being generated. We assert that the synergy between genomic and phylogenetic perspectives in comparative biology would be enhanced by the development and refinement of minimal reporting standards for phylogenetic analyses. Encouraged by the development of the Minimum Information About a Microarray Experiment (MIAME) standard, we propose a similar roadmap for the development of a Minimal Information About a Phylogenetic Analysis (MIAPA) standard. Key in the successful development and implementation of such a standard will be broad participation by developers of phylogenetic analysis software, phylogenetic database developers, practitioners of phylogenomics, and journal editors.
Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome.
Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella.
When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution.
Perennial woody species, such as poplar (Populus spp.), must acquire necessary heavy metals like zinc (Zn) while avoiding potential toxicity. Poplar contains genes with sequence homology to genes HMA4 and PCS1 from other species which are involved in heavy metal regulation. While basic genomic conservation exists, poplar does not have a hyperaccumulating phenotype. Poplar has a common indicator phenotype in which heavy metal accumulation is proportional to environmental concentrations but excesses are prevented. This phenotype is partly affected by regulation of HMA4 and PCS1 transcriptional abundance. Wild-type poplar down-regulates several transcripts in its Zn-interacting pathway at high Zn levels. Also, overexpressed PtHMA4 and PtPCS1 genes result in varying Zn phenotypes in poplar; specifically, there is a doubling of Zn accumulation in leaf tissues in an overexpressed PtPCS1 line. The genomic complement and regulation of poplar highlighted in this study support a role of HMA4 and PCS1 in Zn regulation dictating its phenotype. These genes can be altered in poplar to change its interaction with Zn. However, other poplar genes in the surrounding pathway may maintain the phenotype by inhibiting drastic changes in heavy metal accumulation with a single gene transformation.
Heavy metal; heavy metal transporter; phytochelatin synthase; poplar nutrition
This review bridges functional and evolutionary aspects of plastid chromosome architecture in land plants and their putative ancestors. We provide an overview on the structure and composition of the plastid genome of land plants as well as the functions of its genes in an explicit phylogenetic and evolutionary context. We will discuss the architecture of land plant plastid chromosomes, including gene content and synteny across land plants. Moreover, we will explore the functions and roles of plastid encoded genes in metabolism and their evolutionary importance regarding gene retention and conservation. We suggest that the slow mode at which the plastome typically evolves is likely to be influenced by a combination of different molecular mechanisms. These include the organization of plastid genes in operons, the usually uniparental mode of plastid inheritance, the activity of highly effective repair mechanisms as well as the rarity of plastid fusion. Nevertheless, structurally rearranged plastomes can be found in several unrelated lineages (e.g. ferns, Pinaceae, multiple angiosperm families). Rearrangements and gene losses seem to correlate with an unusual mode of plastid transmission, abundance of repeats, or a heterotrophic lifestyle (parasites or myco-heterotrophs). While only a few functional gene gains and more frequent gene losses have been inferred for land plants, the plastid Ndh complex is one example of multiple independent gene losses and will be discussed in detail. Patterns of ndh-gene loss and functional analyses indicate that these losses are usually found in plant groups with a certain degree of heterotrophy, which might render plastid encoded Ndh1 subunits dispensable.
Plastid genome; Land plants; Genome evolution; Plastid gene function; Gene retention
Because of their phylogenetic position and unique characteristics of their biology and life cycle, ferns represent an important lineage for studying the evolution of land plants. Large and complex genomes in ferns combined with the absence of economically important species have been a barrier to the development of genomic resources. However, high throughput sequencing technologies are now being widely applied to non-model species. We leveraged the Roche 454 GS-FLX Titanium pyrosequencing platform in sequencing the gametophyte transcriptome of bracken fern (Pteridium aquilinum) to develop genomic resources for evolutionary studies.
681,722 quality and adapter trimmed reads totaling 254 Mbp were assembled de novo into 56,256 unique sequences (i.e. unigenes) with a mean length of 547.2 bp and a total assembly size of 30.8 Mbp with an average read-depth coverage of 7.0×. We estimate that 87% of the complete transcriptome has been sequenced and that all transcripts have been tagged. 61.8% of the unigenes had blastx hits in the NCBI nr protein database, representing 22,596 unique best hits. The longest open reading frame in 52.2% of the unigenes had positive domain matches in InterProScan searches. We assigned 46.2% of the unigenes with a GO functional annotation and 16.0% with an enzyme code annotation. Enzyme codes were used to retrieve and color KEGG pathway maps. A comparative genomics approach revealed a substantial proportion of genes expressed in bracken gametophytes to be shared across the genomes of Arabidopsis, Selaginella and Physcomitrella, and identified a substantial number of potentially novel fern genes. By comparing the list of Arabidopsis genes identified by blast with a list of gametophyte-specific Arabidopsis genes taken from the literature, we identified a set of potentially conserved gametophyte specific genes. We screened unigenes for repetitive sequences to identify 548 potentially-amplifiable simple sequence repeat loci and 689 expressed transposable elements.
This study is the first comprehensive transcriptome analysis for a fern and represents an important scientific resource for comparative evolutionary and functional genomics studies in land plants. We demonstrate the utility of high-throughput sequencing of a normalized cDNA library for de novo transcriptome characterization and gene discovery in a non-model plant.
Although overall pollinator populations have declined over the last couple of decades, the honey bee (Apis mellifera) malady, colony collapse disorder (CCD), has caused major concern in the agricultural community. Among honey bee pathogens, RNA viruses are emerging as a serious threat and are suspected as major contributors to CCD. Recent detection of these viral species in bumble bees suggests a possible wider environmental spread of these viruses with potential broader impact. It is therefore vital to study the ecology and epidemiology of these viruses in the hymenopteran pollinator community as a whole. We studied the viral distribution in honey bees, in their pollen loads, and in other non-Apis hymenopteran pollinators collected from flowering plants in Pennsylvania, New York, and Illinois in the United States. Viruses in the samples were detected using reverse transcriptase-PCR and confirmed by sequencing. For the first time, we report the molecular detection of picorna-like RNA viruses (deformed wing virus, sacbrood virus and black queen cell virus) in pollen pellets collected directly from forager bees. Virus was detected in pollen pellets from several uninfected forager bees, indicating that pollen itself may harbor viruses. The viruses in the pollen and honey stored in the hive were demonstrated to be infective, with the queen becoming infected and laying infected eggs after these virus-contaminated foods were given to virus-free colonies. These viruses were detected in eleven other non-Apis hymenopteran species, ranging from many solitary bees to bumble bees and wasps. This finding further expands the viral host range and implies a possible deeper impact on the health of our ecosystem. Phylogenetic analyses support that these viruses are disseminating freely among the pollinators via the flower pollen itself.
Notably, in cases where honey bee apiaries affected by CCD harbored honey bees with Israeli Acute Paralysis virus (IAPV), nearby non-Apis hymenopteran pollinators also had IAPV, while those near apiaries without IAPV did not. In containment greenhouse experiments, IAPV moved from infected honey bees to bumble bees and from infected bumble bees to honey bees within a week, demonstrating that the viruses could be transmitted from one species to another. This study adds to our present understanding of virus epidemiology and may help explain bee disease patterns and pollinator population decline in general.
Molecular genetic studies of floral development have concentrated on several core eudicots and grasses (monocots), which have canalized floral forms. Basal eudicots possess a wider range of floral morphologies than the core eudicots and grasses and can serve as an evolutionary link between core eudicots and monocots, and provide a reference for studies of other basal angiosperms. Recent advances in genomics have enabled researchers to profile gene activities during floral development, primarily in the eudicot Arabidopsis thaliana and the monocots rice and maize. However, our understanding of floral developmental processes among the basal eudicots remains limited.
Using a recently generated expressed sequence tag (EST) set, we have designed an oligonucleotide microarray for the basal eudicot Eschscholzia californica (California poppy). We performed microarray experiments with an interwoven-loop design in order to characterize the E. californica floral transcriptome and to identify differentially expressed genes in flower buds with pre-meiotic and meiotic cells, four floral organs at pre-anthesis stages (sepals, petals, stamens and carpels), developing fruits, and leaves.
Our results provide a foundation for comparative gene expression studies between eudicots and basal angiosperms. We identified whorl-specific gene expression patterns in E. californica and examined the floral expression of several gene families. Interestingly, most E. californica homologs of Arabidopsis genes important for flower development, except for genes encoding MADS-box transcription factors, show different expression patterns between the two species. Our comparative transcriptomics study highlights the unique evolutionary position of E. californica compared with basal angiosperms and core eudicots.
Although the overwhelming majority of genes found in angiosperms are members of gene families, and both gene- and genome-duplication are pervasive forces in plant genomes, some genes are sufficiently distinct from all other genes in a genome that they can be operationally defined as 'single copy'. Using the gene clustering algorithm MCL-tribe, we have identified a set of 959 genes that are single copy in each of the genomes of Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa. To characterize these genes, we have performed a number of analyses examining GO annotations, coding sequence length, number of exons, number of domains, presence in distant lineages, such as Selaginella and Physcomitrella, and phylogenetic analysis to estimate copy number in other seed plants and to demonstrate their phylogenetic utility. We then provide examples of how these genes may be used in phylogenetic analyses to reconstruct organismal history, both by using extant coverage in EST databases for seed plants and de novo amplification via RT-PCR in the family Brassicaceae.
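The selection step that follows clustering can be sketched as a simple filter: keep the clusters that contain exactly one gene from each of the four genomes. The sketch below assumes the clustering output has already been parsed into dictionaries. The data layout, function name and gene identifiers are hypothetical and do not reflect the actual MCL-tribe output format.

```python
from collections import Counter


def shared_single_copy(clusters: dict, gene_to_species: dict, required: set) -> list:
    """Return IDs of clusters with exactly one gene from each required species.

    `clusters` maps a cluster ID to a list of gene IDs; `gene_to_species`
    maps each gene ID to its species. Genes from species outside `required`
    are ignored when deciding whether a cluster qualifies.
    """
    selected = []
    for cluster_id, genes in clusters.items():
        counts = Counter(gene_to_species[g] for g in genes)
        if all(counts.get(species, 0) == 1 for species in required):
            selected.append(cluster_id)
    return selected


# Hypothetical toy input: only fam1 is single copy in all four genomes.
toy_species = {
    "At1": "Arabidopsis", "At2": "Arabidopsis", "At3": "Arabidopsis", "At4": "Arabidopsis",
    "Pt1": "Populus", "Pt2": "Populus", "Pt3": "Populus",
    "Vv1": "Vitis", "Vv2": "Vitis", "Vv3": "Vitis",
    "Os1": "Oryza", "Os2": "Oryza",
}
toy_clusters = {
    "fam1": ["At1", "Pt1", "Vv1", "Os1"],         # one copy per genome
    "fam2": ["At2", "At3", "Pt2", "Vv2", "Os2"],  # duplicated in Arabidopsis
    "fam3": ["At4", "Pt3", "Vv3"],                # missing from Oryza
}
apvo = {"Arabidopsis", "Populus", "Vitis", "Oryza"}
single_copy_families = shared_single_copy(toy_clusters, toy_species, apvo)
```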
There are 959 single copy nuclear genes shared in Arabidopsis, Populus, Vitis and Oryza ["APVO SSC genes"]. The majority of these genes are also present in the Selaginella and Physcomitrella genomes. Public EST sets for 197 species suggest that most of these genes are present across a diverse collection of seed plants, and appear to exist as single or very low copy genes, though exceptions are seen in recently polyploid taxa and in lineages where there is significant evidence for a shared large-scale duplication event. Genes encoding proteins localized in organelles are more commonly single copy than expected by chance, but the evolutionary forces responsible for this bias are unknown.
Regardless of the evolutionary mechanisms responsible for the large number of shared single copy genes in diverse flowering plant lineages, these genes are valuable for phylogenetic and comparative analyses. Eighteen of the APVO SSC genes were amplified in the Brassicaceae using RT-PCR and directly sequenced. Alignments of these sequences provide improved resolution of Brassicaceae phylogeny compared to recent studies using plastid and ITS sequences. An analysis of sequences from 13 APVO SSC genes from 69 species of seed plants, derived mainly from public EST databases, yielded a phylogeny that was largely congruent with prior hypotheses based on multiple plastid sequences. Whereas single gene phylogenies that rely on EST sequences have limited bootstrap support as the result of limited sequence information, concatenated alignments result in phylogenetic trees with strong bootstrap support for already established relationships. Overall, these single copy nuclear genes are promising markers for phylogenetics, and contain a greater proportion of phylogenetically-informative sites than commonly used protein-coding sequences from the plastid or mitochondrial genomes.
Putatively orthologous, shared single copy nuclear genes provide a vast source of new evidence for plant phylogenetics, genome mapping, and other applications, as well as a substantial class of genes for which functional characterization is needed. Preliminary evidence indicates that many of the shared single copy nuclear genes identified in this study may be well suited as markers for addressing phylogenetic hypotheses at a variety of taxonomic levels.
We have developed a simulation approach to help determine the optimal mixture of sequencing methods for the most complete and cost-effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis.
The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc (http://fgp.huck.psu.edu/NG_Sims/ngsim.pl), an online web tool that allows users to explore the results of this study by specifying individualized costs and sequencing characteristics.
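The core of such a simulation, estimating how many genes are tagged by a given number of reads, can be sketched as weighted random sampling. This is a minimal illustration and not the ESTcalc model. It assumes each read comes from a gene with probability proportional to that gene's expression level, and the function name, parameters and toy expression values are all hypothetical.

```python
import random


def simulate_genes_tagged(expression: dict, n_reads: int, seed: int = 0) -> int:
    """Count distinct genes hit by at least one of `n_reads` simulated reads.

    `expression` maps gene IDs to non-negative relative expression levels.
    Each read is drawn independently with probability proportional to
    expression (an assumed, simplified sampling model).
    """
    if n_reads <= 0:
        return 0
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    genes = list(expression)
    weights = [expression[g] for g in genes]
    reads = rng.choices(genes, weights=weights, k=n_reads)
    return len(set(reads))


# Hypothetical toy transcriptome: a few abundant genes and many rare ones.
toy_expression = {f"gene{i}": (1000 if i < 5 else 1) for i in range(200)}
tagged_shallow = simulate_genes_tagged(toy_expression, 500)
tagged_deep = simulate_genes_tagged(toy_expression, 50_000)
```

Under this model, a normalized library could be mimicked by flattening the weights before sampling, which is one way to explore why normalization tags more genes at the same depth.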
NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms.
Plastid genome content and arrangement are highly conserved across most land plants and their closest relatives, streptophyte algae, with nearly all plastid introns having invaded the genome in their common ancestor at least 450 million years ago. One such intron, within the transfer RNA trnK-UUU, contains a large open reading frame that encodes a presumed intron maturase, matK. This gene is missing from the plastid genomes of two species in the parasitic plant genus Cuscuta but is found in all other published land plant and streptophyte algal plastid genomes, including that of the nonphotosynthetic angiosperm Epifagus virginiana and two other species of Cuscuta. By examining matK and plastid intron distribution in Cuscuta, we add support to the hypothesis that its normal role is in splicing seven of the eight group IIA introns in the genome. We also analyze matK nucleotide sequences from Cuscuta species and relatives that retain matK to test whether changes in selective pressure in the maturase are associated with intron deletion. Stepwise loss of most group IIA introns from the plastid genome results in substantial change in selective pressure within the hypothetical RNA-binding domain of matK in both Cuscuta and Epifagus, either through evolution from a generalist to a specialist intron splicer or due to loss of a particular intron responsible for most of the constraint on the binding region. The possibility of intron-specific specialization in the X-domain is implicated by evidence of positive selection on the lineage leading to C. nitida in association with the loss of six of seven introns putatively spliced by matK. Moreover, transfer RNA gene deletion facilitated by parasitism combined with an unusually high rate of intron loss from remaining functional plastid genes created a unique circumstance on the lineage leading to Cuscuta subgenus Grammica that allowed elimination of matK in the most species-rich lineage of Cuscuta.
With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the ‘transparency’ of the information contained in existing genomic databases.
Genome evolution is shaped not only by nucleotide substitutions, but also by structural changes including gene and genome duplications, insertions, deletions and gene order rearrangements. The most popular methods for reconstructing phylogeny from genome rearrangements include GRAPPA and MGR. However, these methods are limited to cases where equal gene content or few deletions can be assumed. Since conserved duplicated regions are present in many chloroplast genomes, the inference of inverted repeats is needed in chloroplast phylogeny analysis and ancestral genome reconstruction.
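The equal-gene-content restriction is easy to see in the simplest rearrangement measure, the breakpoint count between two signed gene orders. The sketch below is only an illustration of that measure and is not the GRAPPA, MGR or GRAPPA-IR algorithm. It treats genomes as linear rather than circular for brevity, and the function name and toy gene orders are hypothetical.

```python
def breakpoint_count(order_a: list, order_b: list) -> int:
    """Count adjacencies of `order_a` that are not conserved in `order_b`.

    Gene orders are lists of signed integers (sign = strand). An adjacency
    (x, y) is conserved if `order_b` contains x followed by y, or -y followed
    by -x (the same pair read on the opposite strand). Both genomes must
    contain exactly the same genes, each once.
    """
    if sorted(abs(g) for g in order_a) != sorted(abs(g) for g in order_b):
        raise ValueError("gene orders must have equal gene content")

    adjacencies_b = set()
    for left, right in zip(order_b, order_b[1:]):
        adjacencies_b.add((left, right))
        adjacencies_b.add((-right, -left))

    return sum(1 for pair in zip(order_a, order_a[1:]) if pair not in adjacencies_b)


# Hypothetical toy genomes: genes 2 and 3 are inverted in the second order.
reference = [1, 2, 3, 4, 5]
inverted = [1, -3, -2, 4, 5]
n_breakpoints = breakpoint_count(reference, inverted)  # a single inversion creates 2
```

A genome carrying a duplicated region, such as an inverted repeat, violates the equal-content check, which is the kind of case that motivates explicit handling of inverted repeats.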
We extend GRAPPA and develop a new method, GRAPPA-IR, to handle chloroplast genomes. A test of GRAPPA-IR using divergent chloroplast genomes from land plants and green algae recovers a phylogeny congruent with prior studies, while analyses that do not consider IR structure fail to obtain the accepted topology. Our extensive simulation study also confirms that GRAPPA has better accuracy than the existing methods.
Tests on a biological and simulated dataset show GRAPPA-IR can accurately recover the genome phylogeny as well as ancestral gene orders. Close analysis of the ancestral genome structure suggests that genome rearrangement in chloroplasts is probably limited by inverted repeats with a conserved core region. In addition, the boundaries of inverted repeats are hot spots for gene duplications or deletions. The new GRAPPA-IR is available from .
The nuclear genome sequence of Amborella trichopoda, the sister species to all other extant angiosperms, will be an exceptional resource for plant genomics.
MicroRNAs (miRNAs) are small RNAs (sRNA) ~21 nucleotides in length that negatively control gene expression by cleaving or inhibiting the translation of target gene transcripts. miRNAs have been extensively analyzed in Arabidopsis and rice and partially investigated in other non-model plant species. To date, 109 and 62 miRNA families have been identified in Arabidopsis and rice respectively. However, only 33 miRNAs have been identified from the genome of the model tree species (Populus trichocarpa), of which 11 are Populus specific. The low number of miRNA families previously identified in Populus, compared with the number of families identified in Arabidopsis and rice, suggests that many miRNAs still remain to be discovered in Populus. In this study, we analyzed expressed small RNAs from leaves and vegetative buds of Populus using high throughput pyrosequencing.
Analysis of almost eighty thousand small RNA reads allowed us to identify 123 new sequences belonging to previously identified miRNA families as well as 48 new miRNA families that could be Populus-specific. Comparison of the organization of miRNA families in Populus, Arabidopsis and rice showed that miRNA family sizes were generally expanded in Populus. The putative targets of non-conserved miRNA include both previously identified targets as well as several new putative target genes involved in development, resistance to stress, and other cellular processes. Moreover, almost half of the genes predicted to be targeted by non-conserved miRNAs appear to be Populus-specific. Comparative analyses showed that genes targeted by conserved and non-conserved miRNAs are biased mainly towards development, electron transport and signal transduction processes. Similar results were found for non-conserved miRNAs from Arabidopsis.
Our results suggest that while there is a conserved set of miRNAs among plant species, a large fraction of miRNAs vary among species. The non-conserved miRNAs may regulate cellular, physiological or developmental processes specific to the taxa that produce them, as appears likely to be the case for those miRNAs that have only been observed in Populus. Non-conserved and conserved miRNAs seem to target genes with similar biological functions, indicating that similar selection pressures are acting on both types of miRNAs. The expansion in the number of most conserved miRNAs in Populus relative to Arabidopsis may be linked to the recent genome duplication in Populus, the slow evolution of the Populus genome, or differences in the selection pressure on duplicated miRNAs in these species.
Some of the most difficult phylogenetic questions in evolutionary biology involve identification of the free-living relatives of parasitic organisms, particularly those of parasitic flowering plants. Consequently, the number of origins of parasitism and the phylogenetic distribution of the heterotrophic lifestyle among angiosperm lineages are unclear.
Here we report the results of a phylogenetic analysis of 102 species of seed plants designed to infer the position of all haustorial parasitic angiosperm lineages using three mitochondrial genes: atp1, coxI, and matR. Overall, the mtDNA phylogeny agrees with independent studies in terms of non-parasitic plant relationships and reveals at least 11 independent origins of parasitism in angiosperms, eight of which consist entirely of holoparasitic species that lack photosynthetic ability. From these results, it can be inferred that modern-day parasites have disproportionately evolved in certain lineages and that the endoparasitic habit has arisen by convergence in four clades. In addition, reduced taxon, single gene analyses revealed multiple horizontal transfers of atp1 from host to parasite lineage, suggesting that parasites may be important vectors of horizontal gene transfer in angiosperms. Furthermore, in Pilostyles we show evidence for a recent host-to-parasite atp1 transfer based on a chimeric gene sequence that indicates multiple historical xenologous gene acquisitions have occurred in this endoparasite. Finally, the phylogenetic relationships inferred for parasites indicate that the origins of parasitism in angiosperms are strongly correlated with horizontal acquisitions of the invasive coxI group I intron.
Collectively, these results indicate that the parasitic lifestyle has arisen repeatedly in angiosperm evolutionary history and results in increasing parasite genomic chimerism over time.
Members of the genus Cuscuta L. (Convolvulaceae), commonly known as dodders, are epiphytic vines that invade the stems of their hosts with haustorial feeding structures at the points of contact. Although they lack expanded leaves, some species are noticeably chlorophyllous, especially as seedlings and in maturing fruits. Some species are reported as crop pests of worldwide distribution, whereas others are extremely rare and have local distributions and apparent niche specificity. A strong phylogenetic framework for this large genus is essential to understand the interesting ecological, morphological and molecular phenomena that occur within these parasites in an evolutionary context.
Here we present a well-supported phylogeny of Cuscuta using sequences of the nuclear ribosomal internal transcribed spacer and plastid rps2, rbcL and matK from representatives across most of the taxonomic diversity of the genus. We use the phylogeny to interpret morphological and plastid genome evolution within the genus. At least three currently recognized taxonomic sections are not monophyletic and subgenus Cuscuta is unequivocally paraphyletic. Plastid genes are extremely variable with regard to evolutionary constraint, with rbcL exhibiting even higher levels of purifying selection in Cuscuta than in photosynthetic relatives. Nuclear genome size is highly variable within Cuscuta, particularly within subgenus Grammica, and in some cases may indicate the existence of cryptic species in this large clade of morphologically similar species.
Some morphological characters traditionally used to define major taxonomic splits within Cuscuta are homoplastic and are of limited use in defining true evolutionary groups. The chloroplast genome seems to have evolved in a punctuated fashion, with episodes of loss involving suites of genes or tRNAs followed by stabilization of gene content in major clades. Nearly all species of Cuscuta retain some photosynthetic ability, most likely for nutrient apportionment to their seeds, while complete loss of photosynthesis and possible loss of the entire chloroplast genome is limited to a single small clade of outcrossing species found primarily in western South America.
The PlantTribes database (http://fgp.huck.psu.edu/tribe.html) is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. We used the graph-based clustering algorithm MCL [Van Dongen (Technical Report INS-R0010 2000) and Enright et al. (Nucleic Acids Res. 2002; 30: 1575–1584)] to classify all of these species’ protein-coding genes into putative gene families, called tribes, using three clustering stringencies (low, medium and high). For all tribes, we have generated protein and DNA alignments and maximum-likelihood phylogenetic trees. A parallel database of microarray experimental results is linked to the genes, which lets researchers identify groups of related genes and their expression patterns. Unified nomenclatures were developed, and tribes can be related to traditional gene families and conserved domain identifiers. SuperTribes, constructed through a second iteration of MCL clustering, connect distant, but potentially related gene clusters. The global classification of nearly 200 000 plant proteins was used as a scaffold for sorting ∼4 million additional cDNA sequences from over 200 plant species. All data and analyses are accessible through a flexible interface allowing users to explore the classification, to place query sequences within the classification, and to download results for further study.
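To make the clustering step more concrete, here is a minimal pure-Python sketch of the Markov Cluster (MCL) procedure: alternating expansion (matrix squaring) and inflation (elementwise powering followed by column normalization) on a similarity graph. It is not the PlantTribes pipeline. The function names, the toy graph, the iteration count and the inflation value are all illustrative assumptions, and treating clustering stringency as the inflation parameter is likewise an assumption rather than something stated above.

```python
def normalize_columns(m):
    """Rescale each column of a square matrix so that it sums to 1."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=50):
    """Cluster a graph given as a symmetric adjacency (similarity) matrix."""
    n = len(adjacency)
    # Self-loops keep every column sum positive and stabilize the iteration.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: two-step random-walk probabilities (matrix squaring).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: strengthen strong flows, weaken weak ones, renormalize.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Read clusters from the nonzero entries of each surviving row.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy similarity graph (hypothetical): two disconnected triangles.
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(graph))
```

In MCL, larger inflation values generally yield finer-grained clusters, so running the same graph at several inflation settings is one plausible way to obtain nested classifications of differing stringency.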
The magnoliids with four orders, 19 families, and 8,500 species represent one of the largest clades of early diverging angiosperms. Although several recent angiosperm phylogenetic analyses supported the monophyly of magnoliids and suggested relationships among the orders, the limited number of genes examined resulted in only weak support, and these issues remain controversial. Furthermore, considerable incongruence resulted in phylogenetic reconstructions supporting three different sets of relationships among magnoliids and the two large angiosperm clades, monocots and eudicots. We sequenced the plastid genomes of three magnoliids, Drimys (Canellales), Liriodendron (Magnoliales), and Piper (Piperales), and used these data in combination with 32 other angiosperm plastid genomes to assess phylogenetic relationships among magnoliids and to examine patterns of variation of GC content.
The Drimys, Liriodendron, and Piper plastid genomes are very similar in size at 160,604 bp, 159,886 bp, and 160,624 bp, respectively. Gene content and order are nearly identical to many other unrearranged angiosperm plastid genomes, including Calycanthus, the other published magnoliid genome. Overall GC content ranges from 34% to 39%, and coding regions have a substantially higher GC content than non-coding regions. Among protein-coding genes, GC content varies by codon position, with 1st position > 2nd position > 3rd position, and it varies by functional group, with photosynthetic genes having the highest percentage and NADH genes the lowest. Phylogenetic analyses using parsimony and likelihood methods and sequences of 61 protein-coding genes provided strong support for the monophyly of magnoliids, and two strongly supported groups were identified, the Canellales/Piperales and the Laurales/Magnoliales. Strong support is reported for monocots and eudicots as sister clades, with magnoliids diverging before the monocot-eudicot split. The trees also provided moderate or strong support for the position of Amborella as sister to a clade including all other angiosperms.
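The codon-position comparison of GC content can be illustrated with a short sketch. The function below is a hypothetical helper, not the authors' code, and it assumes an in-frame coding sequence given as a plain DNA string. The example sequence is a toy, not data from the genomes reported here.

```python
def gc_by_codon_position(cds):
    """Return the GC fraction at codon positions 1, 2 and 3.

    Assumes `cds` is an in-frame coding sequence; a trailing partial
    codon, if present, is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    fractions = []
    for offset in range(3):
        bases = cds[offset:usable:3]  # every third base from this offset
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return tuple(fractions)


# Toy sequence (hypothetical): codons ATG GCT GCA GGT TAA.
print(gc_by_codon_position("ATGGCTGCAGGTTAA"))  # (0.6, 0.6, 0.2)
```

Applying such a function to each gene, then grouping genes by functional class, would give the kind of per-position and per-group summary described above.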
Evolutionary comparisons of three new magnoliid plastid genome sequences, combined with other published angiosperm genomes, confirm that GC content is unevenly distributed across the genome by location, codon position, and functional group. Furthermore, phylogenetic analyses provide the strongest support so far for the hypothesis that the magnoliids are sister to a large clade that includes both monocots and eudicots.
Genome rearrangements influence gene order and configuration of gene clusters in all genomes. Most land plant chloroplast DNAs (cpDNAs) share a highly conserved gene content and, with notable exceptions, a largely co-linear gene order. Conserved gene orders may reflect a slow intrinsic rate of neutral chromosomal rearrangements, or selective constraint. It is unknown to what extent observed changes in gene order are random or adaptive. We investigate the influence of natural selection on gene order in association with increased rate of chromosomal rearrangement. We use a novel parametric bootstrap approach to test if directional selection is responsible for the clustering of functionally related genes observed in the highly rearranged chloroplast genome of the unicellular green alga Chlamydomonas reinhardtii, relative to ancestral chloroplast genomes.
Ancestral gene orders were inferred and then subjected to simulated rearrangement events under the random breakage model with varying ratios of inversions and transpositions. We found that adjacent chloroplast genes in C. reinhardtii were located on the same strand much more frequently than in simulated genomes that were generated under a random rearrangement process (increased sidedness; p < 0.0001). In addition, functionally related genes were found to be more clustered than those evolved under random rearrangements (p < 0.0001). We report evidence of co-transcription of neighboring genes, which may be responsible for the observed gene clusters in C. reinhardtii cpDNA.
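The logic of this simulation test can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: it applies inversions only (no transpositions), draws breakpoints uniformly along a linear signed gene order, and measures sidedness as the fraction of adjacent gene pairs on the same strand. All function names and parameter values are hypothetical.

```python
import random


def sidedness(order):
    """Fraction of adjacent gene pairs on the same strand (same sign)."""
    pairs = list(zip(order, order[1:]))
    same = sum((a > 0) == (b > 0) for a, b in pairs)
    return same / len(pairs)


def invert(order, i, j):
    """Copy of `order` with genes i..j (inclusive) reversed and strand-flipped."""
    return order[:i] + [-g for g in reversed(order[i:j + 1])] + order[j + 1:]


def simulate(ancestor, n_inversions, rng):
    """Apply `n_inversions` inversions with uniformly chosen breakpoints."""
    order = list(ancestor)
    for _ in range(n_inversions):
        i, j = sorted(rng.randrange(len(order)) for _ in range(2))
        order = invert(order, i, j)
    return order


def empirical_p(observed, ancestor, n_inversions, n_reps=1000, seed=0):
    """Share of simulated genomes whose sidedness is >= the observed value."""
    rng = random.Random(seed)
    hits = sum(
        sidedness(simulate(ancestor, n_inversions, rng)) >= observed
        for _ in range(n_reps)
    )
    return (hits + 1) / (n_reps + 1)  # add-one correction avoids p = 0


# Hypothetical ancestor of 20 genes, all on one strand.
ancestor = list(range(1, 21))
print(empirical_p(observed=0.9, ancestor=ancestor, n_inversions=30))
```

A small p-value here would mean that random inversions alone rarely produce as much same-strand clustering as observed, which is the shape of the argument made above for C. reinhardtii.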
Simulations and experimental evidence suggest that both selective maintenance and directional selection for gene clusters are determinants of chloroplast gene order.
The Chloroplast Genome Database (ChloroplastDB) is an interactive, web-based database for fully sequenced plastid genomes, containing genomic, protein, DNA and RNA sequences, gene locations, RNA-editing sites, putative protein families and alignments. With recent technical advances, the rate of generating new organelle genomes has increased dramatically. However, the established ontology for chloroplast genes and gene features has not been uniformly applied to all chloroplast genomes available in the sequence databases. For example, annotations for some published genome sequences have not evolved with gene naming conventions. ChloroplastDB provides unified annotations, gene name search, BLAST and download functions for chloroplast encoded genes and genomic sequences. A user can retrieve all orthologous sequences with one search regardless of gene names in GenBank. This feature alone greatly facilitates comparative research on sequence evolution including changes in gene content, codon usage, gene structure and post-transcriptional modifications such as RNA editing. Orthologous protein sets are classified by TribeMCL and each set is assigned a standard gene name. Over the next few years, as the number of sequenced chloroplast genomes increases rapidly, the tools available in ChloroplastDB will allow researchers to easily identify and compile target data for comparative analysis of chloroplast genes and genomes.