Several members of the R2R3-MYB family of transcription factors act as regulators of lignin and phenylpropanoid metabolism during wood formation in angiosperm and gymnosperm plants. The angiosperm Arabidopsis has over one hundred R2R3-MYB genes; however, only a few members of this family have been discovered in gymnosperms.
We isolated and characterised full-length cDNAs encoding R2R3-MYB genes from the gymnosperms white spruce, Picea glauca (13 sequences), and loblolly pine, Pinus taeda L. (five sequences). Sequence similarities and phylogenetic analyses placed the spruce and pine sequences in diverse subgroups of the large R2R3-MYB family, although several of the sequences clustered closely together. We searched the highly variable C-terminal region of diverse plant MYBs for conserved amino acid sequences and identified 20 motifs in the spruce MYBs, nine of which have not previously been reported and three of which are specific to conifers. The number and length of the introns in spruce MYB genes varied significantly, but their positions were well conserved relative to angiosperm MYB genes. Quantitative RT-PCR of MYB gene transcript abundance in root and stem tissues revealed diverse expression patterns; three MYB genes were preferentially expressed in secondary xylem, whereas others were preferentially expressed in phloem or were ubiquitous. The MYB genes expressed in xylem, and three others, were up-regulated in the compression wood of leaning trees within 76 hours of induction.
Our survey of 18 conifer R2R3-MYB genes clearly showed a gene family structure similar to that of Arabidopsis. Three of the sequences are likely to play a role in lignin metabolism and/or wood formation in gymnosperm trees, including a close homolog of the loblolly pine PtMYB4, shown to regulate lignin biosynthesis in transgenic tobacco.
Transcription factors play a fundamental role in plants by orchestrating temporal and spatial gene expression in response to environmental stimuli. Several R2R3-MYB genes of the Arabidopsis subgroup 4 (Sg4) share a C-terminal EAR motif signature recently linked to stress response in angiosperm plants. It is reported here that nearly all Sg4 MYB genes in the conifer trees Picea glauca (white spruce) and Pinus taeda (loblolly pine) form a monophyletic clade (Sg4C) that expanded following the split of gymnosperm and angiosperm lineages. Deeper sequencing in P. glauca identified 10 distinct Sg4C sequences, indicating over-representation of Sg4 sequences compared with angiosperms such as Arabidopsis, Oryza, Vitis, and Populus. The Sg4C MYBs share the EAR motif core. Many of them had stress-responsive transcript profiles after wounding, jasmonic acid (JA) treatment, or exposure to cold in P. glauca and P. taeda, with MYB14 transcripts accumulating most strongly and rapidly. Functional characterization was initiated by expressing the P. taeda MYB14 (PtMYB14) gene in transgenic P. glauca plantlets with a tissue-preferential promoter (cinnamyl alcohol dehydrogenase) and a ubiquitous gene promoter (ubiquitin). Histological, metabolite, and transcript (microarray and targeted quantitative real-time PCR) analyses of PtMYB14 transgenics, coupled with mechanical wounding and JA application experiments on wild-type plantlets, allowed identification of PtMYB14 as a putative regulator of an isoprenoid-oriented response that leads to the accumulation of sesquiterpene in conifers. Data further suggested that PtMYB14 may contribute to a broad defence response implicating flavonoids. This study also addresses the potential involvement of closely related Sg4C sequences in stress responses and plant evolution.
Gene family expansion; gymnosperms; isoprenoid metabolism; MYB transcription factors; microarray RNA profiling; Picea glauca; plant evolution; stress response; terpenes; tissue-specific expression
Comparative genomics can inform us about the processes of mutation and selection across diverse taxa. Among seed plants, gymnosperms have been lacking in genomic comparisons. Recent EST and full-length cDNA collections for two conifers, Sitka spruce (Picea sitchensis) and loblolly pine (Pinus taeda), together with full genome sequences for two angiosperms, Arabidopsis thaliana and poplar (Populus trichocarpa), offer an opportunity to infer the evolutionary processes underlying thousands of orthologous protein-coding genes in gymnosperms compared with an angiosperm orthologue set.
Based upon pairwise comparisons of 3,723 spruce and pine orthologues, we found an average synonymous genetic distance (dS) of 0.191, and an average dN/dS ratio of 0.314. Using a fossil-established divergence time of 140 million years between spruce and pine, we extrapolated a nucleotide substitution rate of 0.68 × 10^-9 synonymous substitutions per site per year. When compared to angiosperms, this indicates a dramatically slower rate of nucleotide substitution in conifers: on average 15-fold. Coincidentally, we found a three-fold higher dN/dS for the spruce-pine lineage compared to the poplar-Arabidopsis lineage. This joint occurrence of a slower evolutionary rate in conifers with higher dN/dS, and possibly positive selection, showcases the uniqueness of conifer genome evolution.
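The rate extrapolation can be checked with a short calculation. This is a minimal sketch that assumes the usual pairwise formula, rate = dS / (2T), where the factor of 2 reflects substitutions accumulating along both lineages since their split. The function and variable names are illustrative and do not come from the study.

```python
def synonymous_rate(ds: float, divergence_years: float) -> float:
    """Substitutions per synonymous site per year, assuming rate = dS / (2 * T)."""
    return ds / (2.0 * divergence_years)

mean_ds = 0.191            # average spruce-pine dS reported in the text
divergence_years = 140e6   # fossil-based divergence time reported in the text

rate = synonymous_rate(mean_ds, divergence_years)
print(f"{rate:.2e}")  # 6.82e-10, i.e. about 0.68 x 10^-9
```

Under this assumption the result agrees with the reported value of 0.68 × 10^-9 synonymous substitutions per site per year.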
Our results are in line with documented reduced nucleotide diversity, conservative genome evolution and low rates of diversification in conifers on the one hand and numerous examples of local adaptation in conifers on the other hand. We propose that reduced levels of nucleotide mutation in large and long-lived conifer trees, coupled with large effective population size, were the main factors leading to slow substitution rates but retention of beneficial mutations.
In conifers, terpene synthases (TPSs) of the gymnosperm-specific TPS-d subfamily form a diverse array of mono-, sesqui-, and diterpenoid compounds, which are components of the oleoresin secretions and volatile emissions. These compounds contribute to defence against herbivores and pathogens and perhaps also protect against abiotic stress.
The availability of extensive transcriptome resources in the form of expressed sequence tags (ESTs) and full-length cDNAs in several spruce (Picea) species allowed us to estimate that a conifer genome contains at least 69 unique and transcriptionally active TPS genes. This number is comparable to the number of TPSs found in any of the sequenced and well-annotated angiosperm genomes. We functionally characterized a total of 21 spruce TPSs: 12 from Sitka spruce (P. sitchensis), 5 from white spruce (P. glauca), and 4 from hybrid white spruce (P. glauca × P. engelmannii), which included 15 monoterpene synthases, 4 sesquiterpene synthases, and 2 diterpene synthases.
The functional diversity of these characterized TPSs parallels the diversity of terpenoids found in the oleoresin and volatile emissions of Sitka spruce and provides a context for understanding this chemical diversity at the molecular and mechanistic levels. The comparative characterization of Sitka spruce and Norway spruce diterpene synthases revealed the natural occurrence of TPS sequence variants between closely related spruce species, confirming a previous prediction from site-directed mutagenesis and modelling.
Members of the pine family (Pinaceae), especially species of spruce (Picea spp.) and pine (Pinus spp.), dominate many of the world's temperate and boreal forests. These conifer forests are of critical importance for global ecosystem stability and biodiversity. They also provide the majority of the world's wood and fiber supply and serve as a renewable resource for other industrial biomaterials. In contrast to angiosperms, functional and comparative genomics research on conifers, or other gymnosperms, is limited by the lack of a relevant reference genome sequence. Sequence-finished full-length (FL)cDNAs and large collections of expressed sequence tags (ESTs) are essential for gene discovery, functional genomics, and for future efforts of conifer genome annotation.
As part of a conifer genomics program to characterize defense against insects and adaptation to local environments, and to discover genes for the production of biomaterials, we developed 20 standard, normalized or full-length enriched cDNA libraries from Sitka spruce (P. sitchensis), white spruce (P. glauca), and interior spruce (P. glauca-engelmannii complex). We sequenced and analyzed 206,875 3'- or 5'-end ESTs from these libraries, and developed a resource of 6,464 high-quality sequence-finished FLcDNAs from Sitka spruce. Clustering and assembly of 147,146 3'-end ESTs resulted in 19,941 contigs and 26,804 singletons, representing 46,745 putative unique transcripts (PUTs). The 6,464 FLcDNAs were all obtained from a single Sitka spruce genotype and represent 5,718 PUTs.
This paper provides detailed annotation and quality assessment of a large EST and FLcDNA resource for spruce. The 6,464 Sitka spruce FLcDNAs represent the third largest sequence-verified FLcDNA resource for any plant species, behind only rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana), and the only substantial FLcDNA resource for a gymnosperm. Our emphasis on capturing FLcDNAs and ESTs from cDNA libraries representing herbivore-, wound- or elicitor-treated induced spruce tissues, along with incorporating normalization to capture rare transcripts, resulted in a rich resource for functional genomics and proteomics studies. Sequence comparisons against five plant genomes and the non-redundant GenBank protein database revealed that a substantial number of spruce transcripts have no obvious similarity to known angiosperm gene sequences. Opportunities for future applications of the sequence and clone resources for comparative and functional genomics are discussed.
Transcription factors of the basic leucine zipper (bZIP) family control important processes in all eukaryotes. In plants, bZIPs are regulators of many central developmental and physiological processes including photomorphogenesis, leaf and seed formation, energy homeostasis, and abiotic and biotic stress responses. Here we performed a comprehensive phylogenetic analysis of bZIP genes from algae, mosses, ferns, gymnosperms and angiosperms.
We identified 13 groups of bZIP homologues in angiosperms, three more than known before, that represent 34 Possible Groups of Orthologues (PoGOs). The 34 PoGOs may correspond to the complete set of ancestral angiosperm bZIP genes that participated in the diversification of flowering plants. Homologous genes dedicated to seed-related processes and ABA-mediated stress responses originated in the common ancestor of seed plants, and three groups of homologues emerged in the angiosperm lineage, of which one group plays a role in optimizing the use of energy.
Our data suggest that the ancestor of green plants possessed four bZIP genes functionally involved in oxidative stress and unfolded protein responses that are bZIP-mediated processes in all eukaryotes, but also in light-dependent regulations. The four founder genes amplified and diverged significantly, generating traits that benefited the colonization of new environments.
Background and Aims
During embryo development in most gymnosperms, the establishment of the shoot apical meristem (SAM) occurs concomitantly with the formation of a crown of cotyledons surrounding the SAM. It has previously been shown that the differentiation of cotyledons in somatic embryos of Picea abies is dependent on polar auxin transport (PAT). In the angiosperm model plant, Arabidopsis thaliana, the establishment of cotyledonary boundaries and the embryonal SAM is dependent on PAT and the expression of the CUP-SHAPED COTYLEDON (CUC) genes, which belong to the large NAC gene family. The aim of this study was to characterize CUC-like genes in a gymnosperm, and to elucidate their expression during SAM and cotyledon differentiation, and in response to PAT.
Sixteen Picea glauca NAC sequences were identified in GenBank and assigned to different clades within the NAC gene family using maximum parsimony analysis and Bayesian inference. Motifs conserved between angiosperms and gymnosperms were analysed using the motif discovery tool MEME. Expression profiles during embryo development were produced using quantitative real-time PCR. Protein conservation was analysed by introducing a P. abies CUC orthologue into the A. thaliana cuc1cuc2 double mutant.
Two full-length CUC-like cDNAs denoted PaNAC01 and PaNAC02 were cloned from P. abies. PaNAC01, but not PaNAC02, harbours previously characterized functional motifs in CUC1 and CUC2. The expression profile of PaNAC01 showed that the gene is PAT regulated and associated with SAM differentiation and cotyledon formation. Furthermore, PaNAC01 could functionally substitute for CUC2 in the A. thaliana cuc1cuc2 double mutant.
The results show that CUC-like genes with distinct signature motifs existed before the separation of angiosperms and gymnosperms approx. 300 million years ago, and suggest a conserved function between PaNAC01 and CUC1/CUC2.
Angiosperm; CUP-SHAPED COTYLEDONS (CUC); embryo patterning; gymnosperm; NAC; Picea abies; P. glauca; polar auxin transport (PAT); shoot apical meristem (SAM); somatic embryogenesis
Homeodomain-leucine zipper (HD-ZIP) proteins are plant-specific transcription factors known to play crucial roles in plant development. Although sequence phylogeny analysis of Populus HD-ZIPs was carried out in a previous study, no systematic analysis incorporating genome organization, gene structure, and expression compendium has been conducted in the model tree species Populus thus far.
In this study, a comprehensive analysis of the Populus HD-ZIP gene family was performed. Sixty-three full-length HD-ZIP genes were found in the Populus genome. These Populus HD-ZIP genes were phylogenetically clustered into four distinct subfamilies (HD-ZIP I–IV) and predominantly distributed across 17 linkage groups (LG). Fifty genes from 25 Populus paralogous pairs were located in the duplicated blocks of the Populus genome and were then preferentially retained during subsequent evolution. Genomic organization analyses indicated that purifying selection has played a pivotal role in the retention and maintenance of the Populus HD-ZIP gene family. Microarray analysis showed that 21 Populus paralogous pairs were differentially expressed across different tissues and under various stresses, with five paralogous pairs showing nearly identical expression patterns, 13 paralogous pairs being partially redundant and three paralogous pairs diversifying significantly. Quantitative real-time RT-PCR (qRT-PCR) analysis performed on 16 selected Populus HD-ZIP genes in different tissues and under both drought and salinity stresses confirmed their tissue-specific and stress-inducible expression patterns.
Genomic organization indicated that segmental duplications contributed significantly to the expansion of the Populus HD-ZIP gene family. Exon/intron organization and conserved motif composition of Populus HD-ZIPs are highly conserved within each subfamily, suggesting that members of the same subfamily may also have conserved functions. Microarray and qRT-PCR analyses showed that 89% (56 out of 63) of Populus HD-ZIPs were duplicate genes that might have been retained by substantial subfunctionalization. Taken together, these observations may lay the foundation for future functional analysis of Populus HD-ZIP genes to unravel their biological roles.
Genome evolution in the gymnosperm lineage of seed plants has given rise to many of the most complex and largest plant genomes; however, the elements involved are poorly understood.
Gymny is a previously undescribed retrotransposon family in Pinus that is related to Athila elements in Arabidopsis. Gymny elements are dispersed throughout the modern Pinus genome and occupy a physical space at least the size of the Arabidopsis thaliana genome. In contrast to previously described retroelements in Pinus, the Gymny family was amplified or introduced after the divergence of pine and spruce (Picea). If retrotransposon expansions are responsible for genome size differences within the Pinaceae, as they are in angiosperms, then they have yet to be identified. In contrast, molecular divergence of Gymny retrotransposons together with other families of retrotransposons can account for the large genome complexity of pines along with protein-coding genic DNA, as revealed by massively parallel DNA sequence analysis of Cot fractionated genomic DNA.
Most of the enormous genome complexity of pines can be explained by divergence of retrotransposons; however, the elements responsible for genome size variation are yet to be identified. Genomic resources for Pinus, including those reported here, should assist in further defining whether and how the roles of retrotransposons differ in the evolution of angiosperm and gymnosperm genomes.
Seed plants are composed of angiosperms and gymnosperms, which diverged from each other around 300 million years ago. While much light has been shed on the mechanisms and rate of genome evolution in flowering plants, such knowledge remains conspicuously meagre for the gymnosperms. Conifers are key representatives of gymnosperms and the sheer size of their genomes represents a significant challenge for characterization, sequencing and assembling.
To gain insight into the macro-organisation and long-term evolution of the conifer genome, we developed a genetic map involving 1,801 spruce genes. We designed a statistical approach based on kernel density estimation to analyse gene density and identified seven gene-rich isochores. Groups of co-localizing genes were also found that were transcriptionally co-regulated, indicative of functional clusters. Phylogenetic analyses of 157 gene families for which at least two duplicates were mapped on the spruce genome indicated that ancient gene duplicates shared by angiosperms and gymnosperms outnumbered conifer-specific duplicates by a ratio of eight to one. Ancient duplicates were much more translocated within and among spruce chromosomes than conifer-specific duplicates, which were mostly organised in tandem arrays. Both high synteny and collinearity were also observed between the genomes of spruce and pine, two conifers that diverged more than 100 million years ago.
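As an illustration of how kernel density estimation can highlight gene-rich regions along a linkage group, the following is a minimal sketch using a Gaussian kernel with a fixed bandwidth. The gene positions, grid, and bandwidth are hypothetical values chosen for illustration; the statistical approach actually used in the study, including its kernel, bandwidth selection, and significance testing, is not described here and may differ.

```python
import math

def gaussian_kde(positions, grid, bandwidth):
    """Gaussian kernel density estimate of `positions`, evaluated at each grid point."""
    n = len(positions)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for x in grid:
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
        density.append(norm * total)
    return density

# Hypothetical gene positions (cM) on one linkage group; not data from the study.
gene_positions_cm = [2.0, 3.1, 3.4, 3.9, 4.2, 20.5, 41.0, 42.3, 42.8, 43.5, 44.1, 80.0]
grid_cm = [float(x) for x in range(0, 101, 5)]
density = gaussian_kde(gene_positions_cm, grid_cm, bandwidth=3.0)

# The grid point with the highest estimated density marks the most gene-rich region.
peak_position = grid_cm[density.index(max(density))]
print(peak_position)
```

Regions where the estimated density clearly exceeds what a uniform gene distribution would give are candidates for gene-rich segments; deciding what counts as "clearly" requires a null model, which this sketch omits.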
Taken together, these results indicate that much genomic evolution has occurred in the seed plant lineage before the split between gymnosperms and angiosperms, and that the pace of evolution of the genome macro-structure has been much slower in the gymnosperm lineage leading to extant conifers than that seen for the same period of time in flowering plants. This trend is largely congruent with the contrasted rates of diversification and morphological evolution observed between these two groups of seed plants.
Angiosperm; duplication; evolution; gene families; genetic map; gymnosperm; phylogenomics; Picea; spruce; structural genomics
Lignin is a phenolic heteropolymer in secondary cell walls that plays a major role in the development of plants and their defense against pathogens. The biosynthesis of monolignols, which represent the main component of lignin, involves many enzymes. Cinnamyl alcohol dehydrogenase (CAD) is a key enzyme in lignin biosynthesis as it catalyzes the final step in the synthesis of monolignols. The CAD gene family has been studied in Arabidopsis thaliana, Oryza sativa and partially in Populus. This is the first comprehensive study on the CAD gene family in woody plants including genome organization, gene structure, phylogeny across land plant lineages, and expression profiling in Populus.
The phylogenetic analyses showed that CAD genes fall into three main classes (clades), one of which is represented by CAD sequences from gymnosperms and angiosperms. The other two clades are represented by sequences only from angiosperms. All Populus CAD genes, except PoptrCAD 4, are distributed in Class II and Class III. CAD genes associated with xylem development (PoptrCAD 4 and PoptrCAD 10) belong to Class I and Class II. Most of the CAD genes are physically distributed on duplicated blocks and are still in conserved locations on the homeologous duplicated blocks. Promoter analysis of CAD genes revealed several motifs involved in gene expression modulation under various biological and physiological processes. The CAD genes showed different expression patterns in poplar, with only two genes preferentially expressed in xylem tissues during lignin biosynthesis.
The phylogeny of CAD genes suggests that the radiation of this gene family may have occurred in the early ancestry of angiosperms. Gene distribution on the chromosomes of Populus showed that both large scale and tandem duplications contributed significantly to the CAD gene family expansion. The duplication of several CAD genes seems to be associated with a genome duplication event that happened in the ancestor of Salicaceae. Phylogenetic analyses associated with expression profiling and results from previous studies suggest that CAD genes involved in wood development belong to Class I and Class II. The other CAD genes from Class II and Class III may function in plant tissues under biotic stresses. The conservation of most duplicated CAD genes, the differential distribution of motifs in their promoter regions, and the divergence of their expression profiles in various tissues of Populus plants indicate that genes in the CAD family have evolved tissue-specialized expression profiles and may have divergent functions.
Phenylalanine ammonia lyase (PAL) is a key enzyme of the phenylpropanoid pathway that catalyzes the deamination of phenylalanine to trans-cinnamic acid, a precursor for the lignin and flavonoid biosynthetic pathways. To date, PAL genes have been less extensively studied in gymnosperms than in angiosperms. Our interest in PAL genes stems from their potential role in the defense responses of Pinus taeda, especially with respect to lignification and production of low molecular weight phenolic compounds under various biotic and abiotic stimuli. In contrast to all angiosperms for which reference genome sequences are available, P. taeda has previously been characterized as having only a single PAL gene. Our objective was to re-evaluate this finding, assess the evolutionary history of PAL genes across major angiosperm and gymnosperm lineages, and characterize PAL gene expression patterns in Pinus taeda.
We compiled a large set of PAL genes from the largest transcript dataset available for P. taeda and other conifers. The transcript assemblies for P. taeda were validated through sequencing of PCR products amplified using gene-specific primers based on the putative PAL gene assemblies. Verified PAL gene sequences were aligned and a gene tree was estimated. The resulting gene tree was reconciled with a known species tree and the time points for gene duplication events were inferred relative to the divergence of major plant lineages.
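The reconciliation step can be illustrated with the standard lowest-common-ancestor (LCA) mapping, in which a gene-tree node is called a duplication when it maps to the same species-tree node as one of its children. The sketch below assumes this simple LCA approach; the reconciliation software and settings used in the study are not stated here, and the species tree and gene tree shown are hypothetical toy examples.

```python
# Toy species tree as child -> parent; node names are illustrative only.
SPECIES_PARENT = {
    "Pinus": "conifers",
    "Picea": "conifers",
    "Arabidopsis": "angiosperms",
    "Populus": "angiosperms",
    "conifers": "seed_plants",
    "angiosperms": "seed_plants",
    "seed_plants": None,
}

def ancestors(node):
    """Return the path from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = SPECIES_PARENT[node]
    return path

def lca(a, b):
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")

def reconcile(gene_tree, duplications):
    """Map a gene-tree node onto the species tree and record duplication nodes.

    A gene tree is either a leaf (a species name) or a 2-tuple of subtrees.
    """
    if isinstance(gene_tree, str):
        return gene_tree
    left, right = (reconcile(child, duplications) for child in gene_tree)
    here = lca(left, right)
    if here in (left, right):
        # Node maps to the same species-tree node as a child: duplication.
        duplications.append(here)
    return here

# Hypothetical gene tree: one pine copy groups with an angiosperm gene,
# while two further pine copies form their own clade.
toy_gene_tree = (("Pinus", "Arabidopsis"), ("Pinus", "Pinus"))
duplications = []
reconcile(toy_gene_tree, duplications)
print(duplications)  # ['Pinus', 'seed_plants']
```

In this toy case one duplication is placed before the angiosperm-gymnosperm split and one within the pine lineage, which is the kind of inference the timing of duplication events relies on.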
In contrast to angiosperms, gymnosperms have retained a diverse set of PAL genes distributed among three major clades that arose from gene duplication events predating the divergence of these two seed plant lineages. Whereas multiple PAL genes have been identified in sequenced angiosperm genomes, all characterized angiosperm PAL genes form a single clade in the PAL gene tree, suggesting they are derived from a single gene in an ancestral angiosperm genome. The five distinct PAL genes detected and verified in P. taeda were derived from a combination of duplication events predating and postdating the divergence of angiosperms and gymnosperms.
Gymnosperms have a more phylogenetically diverse set of PAL genes than angiosperms. This inference has contrasting implications for the evolution of PAL gene function in gymnosperms and angiosperms.
Background and Aims
The closely related NAC family genes NO APICAL MERISTEM (NAM) and CUP-SHAPED COTYLEDON3 (CUC3) regulate the formation of boundaries within and between plant organs. NAM is post-transcriptionally regulated by miR164, whereas CUC3 is not. To gain insight into the evolution of NAM and CUC3 in the angiosperms, we analysed orthologous genes in early-diverging ANA-grade angiosperms and gymnosperms.
We obtained NAM- and CUC3-like sequences from diverse angiosperms and gymnosperms by a combination of reverse transcriptase PCR, cDNA library screening and database searching, and then investigated their phylogenetic relationships by performing maximum-likelihood reconstructions. We also studied the spatial expression patterns of NAM, CUC3 and MIR164 orthologues in female reproductive tissues of Amborella trichopoda, the probable sister to all other flowering plants.
Separate NAM and CUC3 orthologues were found in early-diverging angiosperms, but not in gymnosperms, which contained putative orthologues of the entire NAM + CUC3 clade that possessed sites of regulation by miR164. Multiple paralogues of NAM or CUC3 genes were noted in certain taxa, including Brassicaceae. Expression of NAM, CUC3 and MIR164 orthologues from Am. trichopoda was found to co-localize in ovules at the developmental boundary between the chalaza and nucellus.
The NAM and CUC3 lineages were generated by duplication, and CUC3 subsequently lost regulation by miR164, prior to the last common ancestor of the extant angiosperms. However, the paralogous NAM clade genes CUC1 and CUC2 were generated by a more recent duplication, near the base of Brassicaceae. The function of NAM and CUC3 in defining a developmental boundary in the ovule appears to have been conserved since the last common ancestor of the flowering plants, as does the post-transcriptional regulation in ovule tissues of NAM by miR164.
CUP-SHAPED COTYLEDON; CUC; NO APICAL MERISTEM; NAM; NAC; MIR164; Amborella trichopoda; Cabomba aquatica; Ginkgo biloba; angiosperm; gymnosperm
The angiosperm radiation has been linked to sharp declines in gymnosperm diversity and the virtual elimination of conifers from the tropics. The conifer family Podocarpaceae stands as an exception with highest species diversity in wet equatorial forests. It has been hypothesized that efficient light harvesting by the highly flattened leaves of several podocarp genera facilitates persistence with canopy-forming angiosperms, and the angiosperm ecological radiation may have preferentially favoured the diversification of these lineages. To test these ideas, we develop a molecular phylogeny for Podocarpaceae using Bayesian-relaxed clock methods incorporating fossil time constraints. We find several independent origins of flattened foliage types, and that these lineages have diversified predominantly through the Cenozoic and therefore among canopy-forming angiosperms. The onset of sustained diversification of podocarps with flattened foliage is coincident with a declining diversification rate of scale/needle-leaved lineages and also with ecological and climatic transformations linked to angiosperm foliar evolution. We demonstrate that climatic range evolution is contingent on the underlying state for leaf morphology. Taken together, our findings imply that as angiosperms came to dominate most terrestrial ecosystems, competitive interactions at the foliar level have profoundly shaped podocarp geography and, as a consequence, rates of lineage diversification.
Podocarpaceae; molecular phylogeny; divergence time estimates; plant evolution
Forested ecosystems diversified more than 350 Ma to become major engines of continental silicate weathering, regulating the Earth's atmospheric carbon dioxide concentration by driving calcium export into ocean carbonates. Our field experiments with mature trees demonstrate intensification of this weathering engine as tree lineages diversified in concert with their symbiotic mycorrhizal fungi. Preferential hyphal colonization of the calcium silicate-bearing rock, basalt, progressively increased with advancement from arbuscular mycorrhizal (AM) to later, independently evolved ectomycorrhizal (EM) fungi, and from gymnosperm to angiosperm hosts with both fungal groups. This led to ‘trenching’ of silicate mineral surfaces by AM and EM fungi, with EM gymnosperms and angiosperms releasing calcium from basalt at twice the rate of AM gymnosperms. Our findings indicate mycorrhiza-driven weathering may have originated hundreds of millions of years earlier than previously recognized and subsequently intensified with the evolution of trees and mycorrhizas to affect the Earth's long-term CO2 and climate history.
biological weathering; arbuscular mycorrhiza; ectomycorrhiza; land plant evolution; silicate mineral weathering; global change ecology
The genus Populus includes poplars, aspens and cottonwoods, which will be collectively referred to as poplars hereafter unless otherwise specified. Poplars are the dominant tree species in many forest ecosystems in the Northern Hemisphere and are of substantial economic value in plantation forestry. Poplar has been established as a model system for genomics studies of growth, development, and adaptation of woody perennial plants including secondary xylem formation, dormancy, adaptation to local environments, and biotic interactions.
As part of the poplar genome sequencing project and the development of genomic resources for poplar, we have generated a full-length (FL)-cDNA collection using the biotinylated CAP trapper method. We constructed four FLcDNA libraries using RNA from xylem, phloem and cambium, and green shoot tips and leaves from the P. trichocarpa Nisqually-1 genotype, as well as insect-attacked leaves of the P. trichocarpa × P. deltoides hybrid. Following careful selection of candidate cDNA clones, we used a combined strategy of paired end reads and primer walking to generate a set of 4,664 high-accuracy, sequence-verified FLcDNAs, which clustered into 3,990 putative unique genes. Mapping FLcDNAs to the poplar genome sequence combined with BLAST comparisons to previously predicted protein coding sequences in the poplar genome identified 39 FLcDNAs that likely localize to gaps in the current genome sequence assembly. Another 173 FLcDNAs mapped to the genome sequence but were not included among the previously predicted genes in the poplar genome. Comparative sequence analysis against Arabidopsis thaliana and other species in the non-redundant database of GenBank revealed that 11.5% of the poplar FLcDNAs display no significant sequence similarity to other plant proteins. By mapping the poplar FLcDNAs against transcriptome data previously obtained with a 15.5 K cDNA microarray, we identified 153 FLcDNA clones for genes that were differentially expressed in poplar leaves attacked by forest tent caterpillars.
This study has generated a high-quality FLcDNA resource for poplar and the third largest FLcDNA collection published to date for any plant species. We successfully used the FLcDNA sequences to reassess gene prediction in the poplar genome sequence, perform comparative sequence annotation, and identify differentially expressed transcripts associated with defense against insects. The FLcDNA sequences will be essential to the ongoing curation and annotation of the poplar genome, in particular for targeting gaps in the current genome assembly and further improvement of gene predictions. The physical FLcDNA clones will serve as useful reagents for functional genomics research in areas such as analysis of gene functions in defense against insects and perennial growth. Sequences from this study have been deposited in NCBI GenBank under the accession numbers EF144175 to EF148838.
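As a quick consistency check, the size of the deposited accession range can be compared with the number of FLcDNAs reported. This sketch assumes the range is contiguous, inclusive at both ends, and uses a two-letter prefix followed by a numeric part; the helper name is illustrative.

```python
def accession_count(first: str, last: str, prefix_len: int = 2) -> int:
    """Number of accessions in an inclusive, contiguous range sharing one prefix."""
    if first[:prefix_len] != last[:prefix_len]:
        raise ValueError("accessions do not share a prefix")
    return int(last[prefix_len:]) - int(first[prefix_len:]) + 1

n_accessions = accession_count("EF144175", "EF148838")
print(n_accessions)  # 4664
```

Under these assumptions the range contains 4,664 accessions, matching the 4,664 sequence-verified FLcDNAs described in the study.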
Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer.
We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4), from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing.
We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes.
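To give a feel for how sequencing depth relates to assembly completeness, the following sketch applies the idealized Lander-Waterman model to the depths and scaffold lengths reported above. It is not the in silico simulation used in the study: it assumes uniformly random reads, no repeats, and a hypothetical read length (`READ_BP`), none of which are stated in the abstract.

```python
import math


def fraction_covered(depth: float) -> float:
    """Expected fraction of target bases covered by at least one read."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 1.0 - math.exp(-depth)


def expected_contigs(target_bp: float, read_bp: float, depth: float) -> float:
    """Expected contig count N * exp(-c), ignoring any minimum-overlap requirement."""
    n_reads = depth * target_bp / read_bp
    return n_reads * math.exp(-depth)


READ_BP = 700  # hypothetical read length, chosen only for illustration

# (target, scaffold length in bp, reported fold coverage)
targets = [("3CAR", 172_000, 15.6), ("CYP720B4", 94_000, 16.0)]
for name, target_bp, depth in targets:
    print(name, fraction_covered(depth), expected_contigs(target_bp, READ_BP, depth))
```

In this idealized model both targets would be expected to assemble into a single contig at the reported depths. Real assemblies deviate from it, particularly where repeats make up a large share of the sequence, as they do in these BAC clones.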
Growing evidence of morphological diversity in angiosperm flowers, seeds and pollen from the mid Cretaceous and the presence of derived lineages from increasingly older geological deposits both imply that the timing of early angiosperm cladogenesis is older than fossil-based estimates have indicated. An alternative to fossils for calibrating the phylogeny comes from divergence in DNA sequence data. Here, angiosperm divergence times are estimated using non-parametric rate smoothing and a three-gene dataset covering ca. 75% of all angiosperm families recognized in recent classifications. The results provide an initial hypothesis of angiosperm diversification times. Using an internal calibration point, an independent evaluation of angiosperm and eudicot origins is performed. The origin of the crown group of extant angiosperms is indicated to be Early to Middle Jurassic (179-158 Myr), and the origin of eudicots is resolved as Late Jurassic to mid Cretaceous (147-131 Myr). Both estimates, despite a conservative calibration point, are older than current fossil-based estimates.
Genome level analyses have enhanced our view of phylogenetics in many areas of the tree of life. With the production of whole genome DNA sequences of hundreds of organisms and large-scale EST databases, a large number of candidate genes for inclusion into phylogenetic analysis have become available. In this work, we exploit the burgeoning genomic data being generated for plant genomes to address one of the more important plant phylogenetic questions concerning the hierarchical relationships of the several major seed plant lineages (angiosperms, Cycadales, Ginkgoales, Gnetales, and Coniferales), which continues to be a work in progress, despite numerous studies using single, few or several genes and morphology datasets. Although most recent studies support the notion that gymnosperms and angiosperms are monophyletic and sister groups, they differ on the topological arrangements within each major group.
We exploited the EST database to construct a supermatrix of DNA sequences (over 1,200 concatenated orthologous gene partitions for 17 taxa) to examine non-flowering seed plant relationships. This analysis employed programs that offer rapid and robust orthology determination of novel, short sequences from plant ESTs based on reference seed plant genomes. Our phylogenetic analysis retrieved an unbiased (with respect to gene choice), well-resolved and highly supported phylogenetic hypothesis that was robust to various outgroup combinations.
We evaluated character support and the relative contribution of numerous variables (e.g. gene number, missing data, partitioning schemes, taxon sampling and outgroup choice) on tree topology, stability and support metrics. Our results indicate that while missing characters and order of addition of genes to an analysis do not influence branch support, inadequate taxon sampling and limited choice of outgroup(s) can lead to spurious inference of phylogeny when dealing with phylogenomic scale data sets. As expected, support and resolution increase significantly as more informative characters are added, until reaching a threshold, beyond which support metrics stabilize, and the effect of adding conflicting characters is minimized.
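The supermatrix construction described above can be illustrated with a minimal sketch: aligned gene partitions are concatenated per taxon, and a taxon missing from a partition is padded with a missing-data symbol. This is an assumption-laden toy, not the pipeline used in the study; the function name `build_supermatrix`, the toy sequences, and the partition names are all hypothetical.

```python
def build_supermatrix(partitions, taxa, missing="?"):
    """Concatenate aligned gene partitions; pad absent taxa with `missing`.

    partitions: dict mapping partition name -> {taxon: aligned sequence}
    taxa: list of taxon names that define the rows of the matrix
    Returns (matrix, bounds), where bounds holds 1-based inclusive
    (name, start, end) column ranges for each partition.
    """
    rows = {taxon: [] for taxon in taxa}
    bounds = []
    pos = 0
    for name, alignment in partitions.items():
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"partition {name!r} is empty or not aligned")
        length = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(alignment.get(taxon, missing * length))
        bounds.append((name, pos + 1, pos + length))
        pos += length
    matrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    return matrix, bounds


# Toy data: Cycas lacks an ortholog in the second partition.
toy_partitions = {
    "geneA": {"Pinus": "ATGC", "Cycas": "ATGA", "Ginkgo": "ATGG"},
    "geneB": {"Pinus": "GGT", "Ginkgo": "GGA"},
}
toy_matrix, toy_bounds = build_supermatrix(toy_partitions, ["Pinus", "Cycas", "Ginkgo"])
for taxon, seq in toy_matrix.items():
    print(taxon, seq)
print(toy_bounds)
```

Recording the column range of each partition keeps the option of applying partition-specific models later, which is one of the variables the analysis above evaluates.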
The recent determination of complete chloroplast (cp) genomic sequences of various plant species has enabled numerous comparative analyses as well as advances in plant and genome evolutionary studies. In angiosperms, the complete cp genome sequences of about 70 species have been determined, whereas those of only three gymnosperm species, Cycas taitungensis, Pinus thunbergii, and Pinus koraiensis have been established. The lack of information regarding the gene content and genomic structure of gymnosperm cp genomes may severely hamper further progress of plant and cp genome evolutionary studies. To address this need, we report here the complete nucleotide sequence of the cp genome of Cryptomeria japonica, the first in the Cupressaceae sensu lato of gymnosperms, and provide a comparative analysis of their gene content and genomic structure that illustrates the unique genomic features of gymnosperms.
The C. japonica cp genome is 131,810 bp in length, with 112 single copy genes and two duplicated (trnI-CAU, trnQ-UUG) genes that give a total of 116 genes. Compared to other land plant cp genomes, the C. japonica cp has lost one of the relevant large inverted repeats (IRs) found in angiosperms, fern, liverwort, and gymnosperms, such as Cycas and Ginkgo, and additionally has completely lost its trnR-CCG, partially lost its trnT-GGU, and shows diversification of accD. The genomic structure of the C. japonica cp genome also differs significantly from those of other plant species. For example, we estimate that a minimum of 15 inversions would be required to transform the gene organization of the Pinus thunbergii cp genome into that of C. japonica. In the C. japonica cp genome, direct repeat and inverted repeat sequences are observed at the inversion and translocation endpoints, and these sequences may be associated with the genomic rearrangements.
The observed differences in genomic structure between C. japonica and other land plants, including pines, strongly support the theory that the large IRs stabilize the cp genome. Furthermore, the deleted large IR and the numerous genomic rearrangements that have occurred in the C. japonica cp genome provide new insights into both the evolutionary lineage of coniferous species among the gymnosperms and the evolution of the cp genome.
Homeodomain-leucine zipper (HD-Zip) proteins are transcription factors unique to plants and are encoded by more than 25 genes in Arabidopsis thaliana. Based on sequence analyses, these proteins have been classified into four distinct groups: HD-Zip I–IV. HD-Zip proteins are characterized by the presence of two functional domains: a homeodomain (HD) responsible for DNA binding and a leucine zipper domain (Zip) located immediately C-terminal to the homeodomain and involved in protein-protein interaction. Despite sequence similarities, HD-Zip proteins participate in a variety of processes during plant growth and development. HD-Zip I proteins are generally involved in responses related to abiotic stress, abscisic acid (ABA), blue light, de-etiolation and embryogenesis. HD-Zip II proteins participate in light response, shade avoidance and auxin signalling. Members of the third group (HD-Zip III) control embryogenesis, leaf polarity, lateral organ initiation and meristem function. HD-Zip IV proteins play significant roles during anthocyanin accumulation, differentiation of epidermal cells, trichome formation and root development.
homeodomain-leucine zipper; development; structure; function; signaling; embryogenesis
The developmental mechanisms regulating cell differentiation and patterning during the secondary growth of woody tissues are poorly understood. Class III HD ZIP transcription factors are evolutionarily ancient and play fundamental roles in various aspects of plant development. Here we investigate the role of a Class III HD ZIP transcription factor, POPCORONA, during secondary growth of woody stems. Transgenic Populus (poplar) trees expressing either a miRNA-resistant POPCORONA or a synthetic miRNA targeting POPCORONA were used to infer function of POPCORONA during secondary growth. Whole plant, histological, and gene expression changes were compared for transgenic and wild-type control plants. Synthetic miRNA knock down of POPCORONA results in abnormal lignification in cells of the pith, while overexpression of a miRNA-resistant POPCORONA results in delayed lignification of xylem and phloem fibers during secondary growth. POPCORONA misexpression also results in coordinated changes in expression of genes within a previously described transcriptional network regulating cell differentiation and cell wall biosynthesis, and hormone-related genes associated with fiber differentiation. POPCORONA illustrates another function of Class III HD ZIPs: regulating cell differentiation during secondary growth.
The impact of transgenic white spruce [Picea glauca (Moench) Voss] containing the endochitinase gene (ech42) on soil fungal biomass and on the ectendomycorrhizal fungi Wilcoxina spp. was tested using a greenhouse trial. The measured level of endochitinase in roots of transgenic white spruce was up to 10 times higher than that in roots of nontransformed white spruce. The level of endochitinase in root exudates of three of four ech42-transformed lines was significantly greater than that in controls. Analysis of soil ergosterol showed that the amount of fungal biomass in soil samples from control white spruce was slightly larger than that in soil samples from ech42-transformed white spruce, but the difference was not statistically significant. The rates of mycorrhizal colonization of transformed lines and controls were similar. Sequencing the internal transcribed spacer rRNA region revealed that the root tips were colonized by the ectendomycorrhizal fungi Wilcoxina spp. and the dark septate endophyte Phialocephala fortinii. Colonization of root tips by Wilcoxina spp. was monitored by real-time PCR to quantify the fungus present during the development of ectendomycorrhizal symbiosis in ech42-transformed and control lines. The numbers of Wilcoxina molecules in the transformed lines and the controls were not significantly different (P > 0.05, as determined by analysis of covariance), indicating that in spite of higher levels of endochitinase expression, mycorrhization was not inhibited. Our results indicate that the higher levels of chitinolytic activity in root exudates and root tissues from ech42-transformed lines did not alter the soil fungal biomass or the development of ectendomycorrhizal symbiosis involving Wilcoxina spp.
Background and Aims
ADP-glucose pyrophosphorylase (AGPase) is a key enzyme of starch biosynthesis. In the green plant lineage, it is composed of two large (LSU) and two small (SSU) sub-units encoded by paralogous genes, as a consequence of several rounds of duplication. First, our aim was to detect specific patterns of molecular evolution following duplication events and the divergence between monocotyledons and dicotyledons. Secondly, we investigated coevolution between amino acids both within and between sub-units.
A phylogeny of each AGPase sub-unit was built using all gymnosperm and angiosperm sequences available in databases. Accelerated evolution along specific branches was tested using the ratio of the non-synonymous to the synonymous substitution rate. Coevolution between amino acids was investigated taking into account compensatory changes between co-substitutions.
We showed that SSU paralogues evolved under high functional constraints during angiosperm radiation, with a significant level of coevolution between amino acids that participate in SSU major functions. In contrast, in the LSU paralogues, we identified residues under positive selection (1) following the first LSU duplication that gave rise to two paralogues mainly expressed in angiosperm source and sink tissues, respectively; and (2) following the emergence of grass-specific paralogues expressed in the endosperm. Finally, we found coevolution between residues that belong to the interaction domains of both sub-units.
Our results support the view that coevolution among amino acid residues, especially those lying in the interaction domain of each sub-unit, played an important role in AGPase evolution. First, within SSU, coevolution allowed compensating mutations in a highly constrained context. Secondly, the LSU paralogues probably acquired tissue-specific expression and regulatory properties via the coevolution between sub-unit interacting domains. Finally, the pattern we observed during LSU evolution is consistent with repeated sub-functionalization under ‘Escape from Adaptive Conflict’, a model rarely illustrated in the literature.
Angiosperms; monocotyledons; dicotyledons; paralogue genes; molecular evolution; coevolution; neofunctionalization; subfunctionalization; starch synthesis; AGPase
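The test statistic mentioned in the AGPase methods above, the ratio of the non-synonymous to the synonymous substitution rate (often written ω = dN/dS), can be sketched as simple arithmetic. The branch labels and rate values below are hypothetical and chosen only for illustration. Studies of this kind typically estimate ω with likelihood-based codon models and assess significance statistically; the sketch only shows how a given ratio is read.

```python
def omega(dn: float, ds: float) -> float:
    """Ratio of non-synonymous (dN) to synonymous (dS) substitution rates."""
    if ds <= 0:
        raise ValueError("dS must be positive for the ratio to be defined")
    return dn / ds


def interpret(w: float) -> str:
    """Conventional reading of omega; not a significance test."""
    if w < 1:
        return "purifying selection (functional constraint)"
    if w > 1:
        return "positive selection (accelerated protein evolution)"
    return "neutral evolution"


# Hypothetical branch-wise rates: (dN, dS)
example_branches = {"constrained branch": (0.02, 0.20), "accelerated branch": (0.30, 0.15)}
for label, (dn, ds) in example_branches.items():
    w = omega(dn, ds)
    print(label, round(w, 2), interpret(w))
```

A ratio above 1 on a branch following a duplication is the kind of signal the LSU analysis above looks for, whereas strongly constrained paralogues such as the SSU would be expected to show ratios well below 1.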
High-throughput re-sequencing is rapidly becoming the method of choice for studies of neutral and adaptive processes in natural populations across taxa. As re-sequencing the genome of large numbers of samples is still cost-prohibitive in many cases, methods for genome complexity reduction have been developed in attempts to capture most ecologically-relevant genetic variation. One of these approaches is sequence capture, in which oligonucleotide baits specific to genomic regions of interest are synthesized and used to retrieve and sequence those regions.
We used sequence capture to re-sequence most predicted exons, their upstream regulatory regions, as well as numerous random genomic intervals in a panel of 48 genotypes of the angiosperm tree Populus trichocarpa (black cottonwood, or ‘poplar’). A total of 20.76Mb (5%) of the poplar genome was targeted, corresponding to 173,040 baits. With 12 indexed samples run in each of four lanes on an Illumina HiSeq instrument (2x100 paired-end), 86.8% of the bait regions were on average sequenced at a depth ≥10X. Few off-target regions (>250bp away from any bait) were present in the data, but on average ~80bp on either side of the baits were captured and sequenced to an acceptable depth (≥10X) to call heterozygous SNPs. Nucleotide diversity estimates within and adjacent to protein-coding genes were similar to those previously reported in Populus spp., while intergenic regions had higher values consistent with a relaxation of selection.
Our results illustrate the efficiency and utility of sequence capture for re-sequencing highly heterozygous tree genomes, and suggest design considerations to optimize the use of baits in future studies.
Poplar; Sureselect; Exome; Population genomics
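The capture design figures reported above can be cross-checked with a few lines of arithmetic. The sketch assumes that baits tile the targeted sequence without overlap and that the quoted 5% is a rounded share of the genome, so the implied genome size is approximate; the variable names are hypothetical.

```python
# Figures quoted in the abstract.
TARGETED_BP = 20.76e6      # 20.76 Mb of targeted sequence
N_BAITS = 173_040          # number of baits
TARGET_FRACTION = 0.05     # about 5% of the genome (rounded in the source)
SAMPLES_PER_LANE = 12
N_LANES = 4

# Mean targeted span per bait, assuming non-overlapping tiling.
mean_bp_per_bait = TARGETED_BP / N_BAITS

# Genome size implied by the rounded 5% figure (approximate).
implied_genome_mb = TARGETED_BP / TARGET_FRACTION / 1e6

# Multiplexing: indexed samples per lane times lanes.
total_samples = SAMPLES_PER_LANE * N_LANES

print(round(mean_bp_per_bait, 1), round(implied_genome_mb, 1), total_samples)
```

Under these assumptions each bait covers close to 120 bp of target, and the multiplexing scheme accounts for all 48 genotypes in the panel.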