1.  Conifer R2R3-MYB transcription factors: sequence analyses and gene expression in wood-forming tissues of white spruce (Picea glauca) 
BMC Plant Biology  2007;7:17.
Several members of the R2R3-MYB family of transcription factors act as regulators of lignin and phenylpropanoid metabolism during wood formation in angiosperm and gymnosperm plants. The angiosperm Arabidopsis has over one hundred R2R3-MYB genes; however, only a few members of this family have been discovered in gymnosperms.
We isolated and characterised full-length cDNAs encoding R2R3-MYB genes from the gymnosperms white spruce, Picea glauca (13 sequences), and loblolly pine, Pinus taeda L. (five sequences). Sequence similarities and phylogenetic analyses placed the spruce and pine sequences in diverse subgroups of the large R2R3-MYB family, although several of the sequences clustered closely together. We searched the highly variable C-terminal region of diverse plant MYBs for conserved amino acid sequences and identified 20 motifs in the spruce MYBs, nine of which have not previously been reported and three of which are specific to conifers. The number and length of the introns in spruce MYB genes varied significantly, but their positions were well conserved relative to angiosperm MYB genes. Quantitative RT-PCR of MYB gene transcript abundance in root and stem tissues revealed diverse expression patterns; three MYB genes were preferentially expressed in secondary xylem, whereas others were preferentially expressed in phloem or were ubiquitous. The MYB genes expressed in xylem, and three others, were up-regulated in the compression wood of leaning trees within 76 hours of induction.
Our survey of 18 conifer R2R3-MYB genes clearly showed a gene family structure similar to that of Arabidopsis. Three of the sequences are likely to play a role in lignin metabolism and/or wood formation in gymnosperm trees, including a close homolog of the loblolly pine PtMYB4, shown to regulate lignin biosynthesis in transgenic tobacco.
PMCID: PMC1851958  PMID: 17397551
2.  The cinnamyl alcohol dehydrogenase gene family in Populus: phylogeny, organization, and expression 
BMC Plant Biology  2009;9:26.
Lignin is a phenolic heteropolymer in secondary cell walls that plays a major role in the development of plants and their defense against pathogens. The biosynthesis of monolignols, which represent the main component of lignin, involves many enzymes. Cinnamyl alcohol dehydrogenase (CAD) is a key enzyme in lignin biosynthesis as it catalyzes the final step in the synthesis of monolignols. The CAD gene family has been studied in Arabidopsis thaliana, Oryza sativa and partially in Populus. This is the first comprehensive study on the CAD gene family in woody plants including genome organization, gene structure, phylogeny across land plant lineages, and expression profiling in Populus.
The phylogenetic analyses showed that CAD genes fall into three main classes (clades), one of which is represented by CAD sequences from gymnosperms and angiosperms. The other two clades are represented by sequences only from angiosperms. All Populus CAD genes, except PoptrCAD 4, are distributed in Class II and Class III. CAD genes associated with xylem development (PoptrCAD 4 and PoptrCAD 10) belong to Class I and Class II. Most of the CAD genes are physically distributed on duplicated blocks and are still in conserved locations on the homeologous duplicated blocks. Promoter analysis of CAD genes revealed several motifs involved in gene expression modulation under various biological and physiological processes. The CAD genes showed different expression patterns in poplar with only two genes preferentially expressed in xylem tissues during lignin biosynthesis.
The phylogeny of CAD genes suggests that the radiation of this gene family may have occurred in the early ancestry of angiosperms. Gene distribution on the chromosomes of Populus showed that both large scale and tandem duplications contributed significantly to the CAD gene family expansion. The duplication of several CAD genes seems to be associated with a genome duplication event that happened in the ancestor of Salicaceae. Phylogenetic analyses associated with expression profiling and results from previous studies suggest that CAD genes involved in wood development belong to Class I and Class II. The other CAD genes from Class II and Class III may function in plant tissues under biotic stresses. The conservation of most duplicated CAD genes, the differential distribution of motifs in their promoter regions, and the divergence of their expression profiles in various tissues of Populus plants indicate that genes in the CAD family have evolved tissue-specialized expression profiles and may have divergent functions.
PMCID: PMC2662859  PMID: 19267902
3.  Analysis of 4,664 high-quality sequence-finished poplar full-length cDNA clones and their utility for the discovery of genes responding to insect feeding 
BMC Genomics  2008;9:57.
The genus Populus includes poplars, aspens and cottonwoods, which will be collectively referred to as poplars hereafter unless otherwise specified. Poplars are the dominant tree species in many forest ecosystems in the Northern Hemisphere and are of substantial economic value in plantation forestry. Poplar has been established as a model system for genomics studies of growth, development, and adaptation of woody perennial plants including secondary xylem formation, dormancy, adaptation to local environments, and biotic interactions.
As part of the poplar genome sequencing project and the development of genomic resources for poplar, we have generated a full-length (FL)-cDNA collection using the biotinylated CAP trapper method. We constructed four FLcDNA libraries using RNA from xylem, phloem and cambium, and green shoot tips and leaves from the P. trichocarpa Nisqually-1 genotype, as well as insect-attacked leaves of the P. trichocarpa × P. deltoides hybrid. Following careful selection of candidate cDNA clones, we used a combined strategy of paired end reads and primer walking to generate a set of 4,664 high-accuracy, sequence-verified FLcDNAs, which clustered into 3,990 putative unique genes. Mapping FLcDNAs to the poplar genome sequence combined with BLAST comparisons to previously predicted protein coding sequences in the poplar genome identified 39 FLcDNAs that likely localize to gaps in the current genome sequence assembly. Another 173 FLcDNAs mapped to the genome sequence but were not included among the previously predicted genes in the poplar genome. Comparative sequence analysis against Arabidopsis thaliana and other species in the non-redundant database of GenBank revealed that 11.5% of the poplar FLcDNAs display no significant sequence similarity to other plant proteins. By mapping the poplar FLcDNAs against transcriptome data previously obtained with a 15.5 K cDNA microarray, we identified 153 FLcDNA clones for genes that were differentially expressed in poplar leaves attacked by forest tent caterpillars.
This study has generated a high-quality FLcDNA resource for poplar and the third largest FLcDNA collection published to date for any plant species. We successfully used the FLcDNA sequences to reassess gene prediction in the poplar genome sequence, perform comparative sequence annotation, and identify differentially expressed transcripts associated with defense against insects. The FLcDNA sequences will be essential to the ongoing curation and annotation of the poplar genome, in particular for targeting gaps in the current genome assembly and further improvement of gene predictions. The physical FLcDNA clones will serve as useful reagents for functional genomics research in areas such as analysis of gene functions in defense against insects and perennial growth. Sequences from this study have been deposited in NCBI GenBank under the accession numbers EF144175 to EF148838.
PMCID: PMC2270264  PMID: 18230180
4.  A conifer genomics resource of 200,000 spruce (Picea spp.) ESTs and 6,464 high-quality, sequence-finished full-length cDNAs for Sitka spruce (Picea sitchensis) 
BMC Genomics  2008;9:484.
Members of the pine family (Pinaceae), especially species of spruce (Picea spp.) and pine (Pinus spp.), dominate many of the world's temperate and boreal forests. These conifer forests are of critical importance for global ecosystem stability and biodiversity. They also provide the majority of the world's wood and fiber supply and serve as a renewable resource for other industrial biomaterials. In contrast to angiosperms, functional and comparative genomics research on conifers, or other gymnosperms, is limited by the lack of a relevant reference genome sequence. Sequence-finished full-length (FL)cDNAs and large collections of expressed sequence tags (ESTs) are essential for gene discovery, functional genomics, and for future efforts of conifer genome annotation.
As part of a conifer genomics program to characterize defense against insects and adaptation to local environments, and to discover genes for the production of biomaterials, we developed 20 standard, normalized or full-length enriched cDNA libraries from Sitka spruce (P. sitchensis), white spruce (P. glauca), and interior spruce (P. glauca-engelmannii complex). We sequenced and analyzed 206,875 3'- or 5'-end ESTs from these libraries, and developed a resource of 6,464 high-quality sequence-finished FLcDNAs from Sitka spruce. Clustering and assembly of 147,146 3'-end ESTs resulted in 19,941 contigs and 26,804 singletons, representing 46,745 putative unique transcripts (PUTs). The 6,464 FLcDNAs were all obtained from a single Sitka spruce genotype and represent 5,718 PUTs.
This paper provides detailed annotation and quality assessment of a large EST and FLcDNA resource for spruce. The 6,464 Sitka spruce FLcDNAs represent the third largest sequence-verified FLcDNA resource for any plant species, behind only rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana), and the only substantial FLcDNA resource for a gymnosperm. Our emphasis on capturing FLcDNAs and ESTs from cDNA libraries representing herbivore-, wound- or elicitor-treated induced spruce tissues, along with incorporating normalization to capture rare transcripts, resulted in a rich resource for functional genomics and proteomics studies. Sequence comparisons against five plant genomes and the non-redundant GenBank protein database revealed that a substantial number of spruce transcripts have no obvious similarity to known angiosperm gene sequences. Opportunities for future applications of the sequence and clone resources for comparative and functional genomics are discussed.
PMCID: PMC2579922  PMID: 18854048
5.  Slow but not low: genomic comparisons reveal slower evolutionary rate and higher dN/dS in conifers compared to angiosperms 
Comparative genomics can inform us about the processes of mutation and selection across diverse taxa. Among seed plants, gymnosperms have been lacking in genomic comparisons. Recent EST and full-length cDNA collections for two conifers, Sitka spruce (Picea sitchensis) and loblolly pine (Pinus taeda), together with full genome sequences for two angiosperms, Arabidopsis thaliana and poplar (Populus trichocarpa), offer an opportunity to infer the evolutionary processes underlying thousands of orthologous protein-coding genes in gymnosperms compared with an angiosperm orthologue set.
Based upon pairwise comparisons of 3,723 spruce and pine orthologues, we found an average synonymous genetic distance (dS) of 0.191, and an average dN/dS ratio of 0.314. Using a fossil-established divergence time of 140 million years between spruce and pine, we extrapolated a nucleotide substitution rate of 0.68 × 10⁻⁹ synonymous substitutions per site per year. When compared to angiosperms, this indicates a dramatically slower rate of nucleotide substitution in conifers: on average 15-fold slower. Coincidentally, we found a three-fold higher dN/dS for the spruce-pine lineage compared to the poplar-Arabidopsis lineage. This joint occurrence of a slower evolutionary rate in conifers with higher dN/dS, and possibly positive selection, showcases the uniqueness of conifer genome evolution.
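The substitution rate quoted above can be reproduced from the two reported figures. The sketch below is a minimal illustration, assuming the conventional per-lineage formula (rate = dS / 2T, since the two lineages together accumulate changes over twice the divergence time); the abstract does not state the formula, and the function name is hypothetical.

```python
# Values quoted in the abstract.
MEAN_DS = 0.191            # average synonymous distance, spruce vs. pine
DIVERGENCE_YEARS = 140e6   # fossil-established divergence time


def substitution_rate(ds: float, divergence_years: float) -> float:
    """Per-lineage substitution rate, assuming rate = dS / (2 * T).

    The factor of 2 is an assumption: both lineages have been
    diverging independently for T years since their common ancestor.
    """
    return ds / (2.0 * divergence_years)


rate = substitution_rate(MEAN_DS, DIVERGENCE_YEARS)
print(f"{rate:.2e} synonymous substitutions per site per year")
```

Under this assumption the result rounds to the reported 0.68 × 10⁻⁹, which suggests the per-lineage convention is the one behind the published figure.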
Our results are in line with documented reduced nucleotide diversity, conservative genome evolution and low rates of diversification in conifers on the one hand and numerous examples of local adaptation in conifers on the other hand. We propose that reduced levels of nucleotide mutation in large and long-lived conifer trees, coupled with large effective population size, were the main factors leading to slow substitution rates but retention of beneficial mutations.
PMCID: PMC3328258  PMID: 22264329
6.  The phenylalanine ammonia lyase (PAL) gene family shows a gymnosperm-specific lineage 
BMC Genomics  2012;13(Suppl 3):S1.
Phenylalanine ammonia lyase (PAL) is a key enzyme of the phenylpropanoid pathway that catalyzes the deamination of phenylalanine to trans-cinnamic acid, a precursor for the lignin and flavonoid biosynthetic pathways. To date, PAL genes have been less extensively studied in gymnosperms than in angiosperms. Our interest in PAL genes stems from their potential role in the defense responses of Pinus taeda, especially with respect to lignification and production of low molecular weight phenolic compounds under various biotic and abiotic stimuli. In contrast to all angiosperms for which reference genome sequences are available, P. taeda has previously been characterized as having only a single PAL gene. Our objective was to re-evaluate this finding, assess the evolutionary history of PAL genes across major angiosperm and gymnosperm lineages, and characterize PAL gene expression patterns in Pinus taeda.
We compiled a large set of PAL genes from the largest transcript dataset available for P. taeda and other conifers. The transcript assemblies for P. taeda were validated through sequencing of PCR products amplified using gene-specific primers based on the putative PAL gene assemblies. Verified PAL gene sequences were aligned and a gene tree was estimated. The resulting gene tree was reconciled with a known species tree and the time points for gene duplication events were inferred relative to the divergence of major plant lineages.
In contrast to angiosperms, gymnosperms have retained a diverse set of PAL genes distributed among three major clades that arose from gene duplication events predating the divergence of these two seed plant lineages. Whereas multiple PAL genes have been identified in sequenced angiosperm genomes, all characterized angiosperm PAL genes form a single clade in the PAL gene tree, suggesting they are derived from a single gene in an ancestral angiosperm genome. The five distinct PAL genes detected and verified in P. taeda were derived from a combination of duplication events predating and postdating the divergence of angiosperms and gymnosperms.
Gymnosperms have a more phylogenetically diverse set of PAL genes than angiosperms. This inference has contrasting implications for the evolution of PAL gene function in gymnosperms and angiosperms.
PMCID: PMC3394424  PMID: 22759610
7.  Bioinformatic and phylogenetic analysis of the CLAVATA3/EMBRYO-SURROUNDING REGION (CLE) and the CLE-LIKE signal peptide genes in the Pinophyta 
BMC Plant Biology  2014;14:47.
There is a rapidly growing awareness that plant peptide signalling molecules are numerous and varied and they are known to play fundamental roles in angiosperm plant growth and development. Two closely related peptide signalling molecule families are the CLAVATA3-EMBRYO-SURROUNDING REGION (CLE) and CLE-LIKE (CLEL) genes, which encode precursors of secreted peptide ligands that have roles in meristem maintenance and root gravitropism. Progress in peptide signalling molecule research in gymnosperms has lagged behind that of angiosperms. We therefore sought to identify CLE and CLEL genes in gymnosperms and conduct a comparative analysis of these gene families with angiosperms.
We undertook a meta-analysis of the GenBank/EMBL/DDBJ gymnosperm EST database and the Picea abies and P. glauca genomes and identified 93 putative CLE genes and 11 CLEL genes among eight Pinophyta species, in the genera Cryptomeria, Pinus and Picea. The predicted conifer CLE and CLEL protein sequences had close phylogenetic relationships with their homologues in Arabidopsis. Notably, perfect conservation of the active CLE dodecapeptide in presumed orthologues of the Arabidopsis CLE41/44-TRACHEARY ELEMENT DIFFERENTIATION (TDIF) protein, an inhibitor of tracheary element (xylem) differentiation, was seen in all eight conifer species. We cloned the Pinus radiata CLE41/44-TDIF orthologues. These genes were preferentially expressed in phloem in planta as expected, but unexpectedly, also in differentiating tracheary element (TE) cultures. Surprisingly, transcript abundances of these TE differentiation-inhibitors sharply increased during early TE differentiation, suggesting that some cells differentiate into phloem cells in addition to TEs in these cultures. Applied CLE13 and CLE41/44 peptides inhibited root elongation in Pinus radiata seedlings. We show evidence that two CLEL genes are alternatively spliced via 3′-terminal acceptor exons encoding separate CLEL peptides.
The CLE and CLEL genes are found in conifers and they exhibit at least as much sequence diversity in these species as they do in other plant species. Only one CLE peptide sequence has been 100% conserved between gymnosperms and angiosperms over 300 million years of evolutionary history, the CLE41/44-TDIF peptide and its likely conifer orthologues. The preferential expression of these vascular development-regulating genes in phloem in conifers, as they are in dicot species, suggests close parallels in the regulation of secondary growth and wood formation in gymnosperm and dicot plants. Based on our bioinformatic analysis, we predict a novel mechanism of regulation of the expression of several conifer CLEL peptides, via alternative splicing resulting in the selection of alternative C-terminal exons encoding separate CLEL peptides.
PMCID: PMC4016512  PMID: 24529101
CLE peptide ligands; CLEL peptide ligands; Pinophyta; Conifers; Phylogenetic analysis; Pine tracheary element system
8.  A spruce gene map infers ancient plant genome reshuffling and subsequent slow evolution in the gymnosperm lineage leading to extant conifers 
BMC Biology  2012;10:84.
Seed plants are composed of angiosperms and gymnosperms, which diverged from each other around 300 million years ago. While much light has been shed on the mechanisms and rate of genome evolution in flowering plants, such knowledge remains conspicuously meagre for the gymnosperms. Conifers are key representatives of gymnosperms and the sheer size of their genomes represents a significant challenge for characterization, sequencing and assembling.
To gain insight into the macro-organisation and long-term evolution of the conifer genome, we developed a genetic map involving 1,801 spruce genes. We designed a statistical approach based on kernel density estimation to analyse gene density and identified seven gene-rich isochors. Groups of co-localizing genes were also found that were transcriptionally co-regulated, indicative of functional clusters. Phylogenetic analyses of 157 gene families for which at least two duplicates were mapped on the spruce genome indicated that ancient gene duplicates shared by angiosperms and gymnosperms outnumbered conifer-specific duplicates by a ratio of eight to one. Ancient duplicates were much more translocated within and among spruce chromosomes than conifer-specific duplicates, which were mostly organised in tandem arrays. Both high synteny and collinearity were also observed between the genomes of spruce and pine, two conifers that diverged more than 100 million years ago.
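The abstract does not describe the kernel density approach in detail. As a rough illustration of the general idea only, the sketch below applies a plain Gaussian kernel density estimate to gene positions on one linkage group and reports where the density peaks. The map positions, the bandwidth, and the function names are all hypothetical, and this is not the authors' statistical procedure.

```python
import math


def gaussian_kde(positions, bandwidth):
    """Return a function that estimates gene density at map position x.

    Each gene contributes a Gaussian bump of width `bandwidth`; the
    result is normalised so that the density integrates to 1.
    """
    n = len(positions)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions
        )

    return density


# Hypothetical gene positions (cM) on a single linkage group.
positions_cm = [2.0, 2.5, 3.0, 3.2, 3.9, 20.0, 41.0, 60.0, 61.0, 80.0]
density = gaussian_kde(positions_cm, bandwidth=2.0)

# Evaluate on a coarse 5 cM grid and locate the densest grid point.
grid = [float(x) for x in range(0, 85, 5)]
peak = max(grid, key=density)
print(f"highest estimated gene density near {peak} cM")
```

A real analysis would also need a significance test against a null distribution of gene positions before calling a region gene-rich; the choice of bandwidth strongly affects how many peaks appear.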
Taken together, these results indicate that much genomic evolution has occurred in the seed plant lineage before the split between gymnosperms and angiosperms, and that the pace of evolution of the genome macro-structure has been much slower in the gymnosperm lineage leading to extant conifers than that seen for the same period of time in flowering plants. This trend is largely congruent with the contrasted rates of diversification and morphological evolution observed between these two groups of seed plants.
PMCID: PMC3519789  PMID: 23102090
Angiosperm; duplication; evolution; gene families; genetic map; gymnosperm; phylogenomics; Picea; spruce; structural genomics
9.  Poplar GTL1 Is a Ca2+/Calmodulin-Binding Transcription Factor that Functions in Plant Water Use Efficiency and Drought Tolerance 
PLoS ONE  2012;7(3):e32925.
Diminishing global fresh water availability has focused research to elucidate mechanisms of water use in poplar, an economically important species. A GT-2 family trihelix transcription factor that is a determinant of water use efficiency (WUE), PtaGTL1 (GT-2 like 1), was identified in Populus tremula × P. alba (clone 717-IB4). Like other GT-2 family members, PtaGTL1 contains both N- and C-terminal trihelix DNA binding domains. PtaGTL1 expression, driven by the Arabidopsis thaliana AtGTL1 promoter, suppressed the higher WUE and drought tolerance phenotypes of an Arabidopsis GTL1 loss-of-function mutation (gtl1-4). Genetic suppression of gtl1-4 was associated with increased stomatal density due to repression of Arabidopsis STOMATAL DENSITY AND DISTRIBUTION1 (AtSDD1), a negative regulator of stomatal development. Electrophoretic mobility shift assays (EMSA) indicated that a PtaGTL1 C-terminal DNA trihelix binding fragment (PtaGTL1-C) interacted with an AtSDD1 promoter fragment containing the GT3 box (GGTAAA), and this GT3 box was necessary for binding. PtaGTL1-C also interacted with a PtaSDD1 promoter fragment via the GT2 box (GGTAAT). PtaSDD1 encodes a protein with 60% primary sequence identity with AtSDD1. In vitro molecular interaction assays were used to determine that Ca2+-loaded calmodulin (CaM) binds to PtaGTL1-C, which was predicted to have a CaM-interaction domain in the first helix of the C-terminal trihelix DNA binding domain. These results indicate that, in Arabidopsis and poplar, GTL1 and SDD1 are fundamental components of stomatal lineage. In addition, PtaGTL1 is a Ca2+-CaM binding protein, which implies a mechanism by which environmental stimuli can induce Ca2+ signatures that would modulate stomatal development and regulate plant water use.
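Locating candidate GT3 (GGTAAA) and GT2 (GGTAAT) boxes in a promoter is a simple string search on both DNA strands. The sketch below shows one way to do it; the promoter sequence is made up for illustration and is not the AtSDD1 or PtaSDD1 promoter, and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def find_motif(seq, motif):
    """Return sorted (0-based start, strand) hits for motif on both strands.

    A '-' hit means the reverse complement of the motif occurs at that
    position on the given (forward) sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


GT3_BOX = "GGTAAA"
GT2_BOX = "GGTAAT"

# Hypothetical promoter fragment, invented for this example.
promoter = "ACGTGGTAAATTCCATTACCGGA"

print("GT3 box hits:", find_motif(promoter, GT3_BOX))
print("GT2 box hits:", find_motif(promoter, GT2_BOX))
```

A sequence match only flags a candidate site; binding itself was established in the study by EMSA, not by sequence inspection.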
PMCID: PMC3292583  PMID: 22396800
10.  Large-scale screening of transcription factor–promoter interactions in spruce reveals a transcriptional network involved in vascular development 
Journal of Experimental Botany  2014;65(9):2319-2333.
This research aimed to investigate the role of diverse transcription factors (TFs) and to delineate gene regulatory networks directly in conifers at a relatively high-throughput level. The approach integrated sequence analyses, transcript profiling, and development of a conifer-specific activation assay. Transcript accumulation profiles of 102 TFs and potential target genes were clustered to identify groups of coordinately expressed genes. Several different patterns of transcript accumulation were observed by profiling in nine different organs and tissues: 27 genes were preferential to secondary xylem both in stems and roots, and other genes were preferential to phelloderm and periderm or were more ubiquitous. A robust system has been established as a screening approach to define which TFs have the ability to regulate a given promoter in planta. Trans-activation or repression effects were observed in 30% of TF–candidate gene promoter combinations. As a proof of concept, phylogenetic analysis and expression and trans-activation data were used to demonstrate that two spruce NAC-domain proteins most likely play key roles in secondary vascular growth as observed in other plant species. This study tested many TFs from diverse families in a conifer tree species, which broadens the knowledge of promoter–TF interactions in wood development and enables comparisons of gene regulatory networks found in angiosperms and gymnosperms.
PMCID: PMC4036505  PMID: 24713992
Conifer; expression pattern; Picea glauca; secondary cell wall; somatic embryogenesis; trans-activation assay; transcription factor; xylem
11.  Expression analysis of LIM gene family in poplar, toward an updated phylogenetic classification 
BMC Research Notes  2012;5:102.
Plant LIM domain proteins may act as transcriptional activators of lignin biosynthesis and/or as actin binding and bundling proteins. Plant LIM genes have evolved in phylogenetic subgroups differing in their expression profiles: in the whole plant or specifically in pollen. However, several poplar PtLIM genes belong to uncharacterized monophyletic subgroups and the expression patterns of the LIM gene family in a woody plant have not been studied.
In this work, the expression pattern of the twelve duplicated poplar PtLIM genes has been investigated by semi-quantitative RT-PCR in different vegetative and reproductive tissues. As in other plant species, poplar PtLIM genes were widely expressed in the tree or in particular tissues. In particular, the PtXLIM1a, PtXLIM1b and PtWLIM1b genes were preferentially expressed in the secondary xylem, suggesting a specific function in wood formation. Moreover, the expression of these genes and of the PtPLIM2a gene was increased in tension wood. Western-blot analysis confirmed the preferential expression of PtXLIM1a protein during xylem differentiation and tension wood formation. Genes classified within the pollen-specific PLIM2 and PLIM2-like subgroups were all strongly expressed in pollen but also in cottony hairs. Interestingly, pairs of duplicated PtLIM genes exhibited different expression patterns, indicating subfunctionalisations in specific tissues.
The strong expression of several LIM genes in cottony hairs and germinating pollen, as well as in xylem fibers suggests an involvement of plant LIM domain proteins in the control of cell expansion. Comparisons of expression profiles of poplar LIM genes with the published functions of closely related plant LIM genes suggest conserved functions in the areas of lignin biosynthesis, pollen tube growth and mechanical stress response. Based on these results, we propose a novel nomenclature of poplar LIM domain proteins.
PMCID: PMC3392731  PMID: 22339987
12.  Transcriptome profiling in conifers and the PiceaGenExpress database show patterns of diversification within gene families and interspecific conservation in vascular gene expression 
BMC Genomics  2012;13:434.
Conifers have very large genomes (13 to 30 Gigabases) that are mostly uncharacterized although extensive cDNA resources have recently become available. This report presents a global overview of transcriptome variation in a conifer tree and documents conservation and diversity of gene expression patterns among major vegetative tissues.
An oligonucleotide microarray was developed from Picea glauca and P. sitchensis cDNA datasets. It represents 23,853 unique genes and was shown to be suitable for transcriptome profiling in several species. A comparison of secondary xylem and phelloderm tissues showed that preferential expression in these vascular tissues was highly conserved among Picea spp. RNA-Sequencing strongly confirmed tissue preferential expression and provided a robust validation of the microarray design. A small database of transcription profiles called PiceaGenExpress was developed from over 150 hybridizations spanning eight major tissue types. In total, transcripts were detected for 92% of the genes on the microarray, in at least one tissue. Non-annotated genes were predominantly expressed at low levels in fewer tissues than genes of known or predicted function. Diversity of expression within gene families may be rapidly assessed from PiceaGenExpress. In conifer trees, dehydrins and late embryogenesis abundant (LEA) osmotic regulation proteins occur in large gene families compared to angiosperms. Strong contrasts and low diversity were observed in the dehydrin family, while diverse patterns suggested a greater degree of diversification among LEAs.
Together, the oligonucleotide microarray and the PiceaGenExpress database represent the first resource of this kind for gymnosperm plants. The spruce transcriptome analysis reported here is expected to accelerate genetic studies in the large and important group comprised of conifer trees.
PMCID: PMC3534630  PMID: 22931377
13.  Generation and analysis of expressed sequence tags from six developing xylem libraries in Pinus radiata D. Don 
BMC Genomics  2009;10:41.
Wood is a major renewable natural resource for the timber, fibre and bioenergy industry. Pinus radiata D. Don is the most important commercial plantation tree species in Australia and several other countries; however, genomic resources for this species are very limited in public databases. Our primary objective was to sequence a large number of expressed sequence tags (ESTs) from genes involved in wood formation in radiata pine.
Six developing xylem cDNA libraries were constructed from earlywood and latewood tissues sampled at juvenile (7 yrs), transition (11 yrs) and mature (30 yrs) ages, respectively. These xylem tissues represent six typical development stages in a rotation period of radiata pine. A total of 6,389 high quality ESTs were collected from 5,952 cDNA clones. Assembly of 5,952 ESTs from 5' end sequences generated 3,304 unigenes including 952 contigs and 2,352 singletons. About 97.0% of the 5,952 ESTs and 96.1% of the unigenes have matches in the UniProt and TIGR databases. Of the 3,174 unigenes with matches, 42.9% were not assigned GO (Gene Ontology) terms and their functions are unknown or unclassified. More than half (52.1%) of the 5,952 ESTs have matches in the Pfam database and represent 772 known protein families. About 18.0% of the 5,952 ESTs matched cell wall related genes in the MAIZEWALL database, representing all 18 categories, 91 of all 174 families and possibly 557 genes. Fifteen cell wall-related genes are ranked in the 30 most abundant genes, including CesA, tubulin, AGP, SAMS, actin, laccase, CCoAMT, MetE, phytocyanin, pectate lyase, cellulase, SuSy, expansin, chitinase and UDP-glucose dehydrogenase. Based on the PlantTFDB database 41 of the 64 transcription factor families in the poplar genome were identified as being involved in radiata pine wood formation. Comparative analysis of GO term abundance revealed a distinct transcriptome in juvenile earlywood formation compared to other stages of wood development.
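The assembly figures above are internally consistent, and a few derived quantities follow directly from them. The sketch below is a simple bookkeeping check; the redundancy and mean contig depth are derived here for illustration and are not reported in the abstract.

```python
# Counts quoted in the abstract.
ests_assembled = 5952         # 5'-end ESTs that went into the assembly
contigs = 952
singletons = 2352
unigenes_with_matches = 3174  # unigenes with UniProt/TIGR matches

# Every unigene is either a contig or a singleton.
unigenes = contigs + singletons

# Share of unigenes with a database match (the abstract reports 96.1%).
match_percent = 100.0 * unigenes_with_matches / unigenes

# Derived values (not in the abstract): ESTs not left as singletons
# must belong to contigs, which gives the mean contig depth.
ests_in_contigs = ests_assembled - singletons
mean_ests_per_contig = ests_in_contigs / contigs
redundancy_percent = 100.0 * (1 - unigenes / ests_assembled)

print(f"unigenes: {unigenes}")
print(f"unigenes with matches: {match_percent:.1f}%")
print(f"mean ESTs per contig: {mean_ests_per_contig:.2f}")
print(f"library redundancy: {redundancy_percent:.1f}%")
```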
The first large scale genomic resource in radiata pine was generated from six developing xylem cDNA libraries. Cell wall-related genes and transcription factors were identified. Juvenile earlywood has a distinct transcriptome, which is likely to contribute to the undesirable properties of juvenile wood in radiata pine. The publicly available resource of radiata pine will also be valuable for gene function studies and comparative genomics in forest trees.
PMCID: PMC2636829  PMID: 19159482
14.  The Populus Class III HD ZIP Transcription Factor POPCORONA Affects Cell Differentiation during Secondary Growth of Woody Stems 
PLoS ONE  2011;6(2):e17458.
The developmental mechanisms regulating cell differentiation and patterning during the secondary growth of woody tissues are poorly understood. Class III HD ZIP transcription factors are evolutionarily ancient and play fundamental roles in various aspects of plant development. Here we investigate the role of a Class III HD ZIP transcription factor, POPCORONA, during secondary growth of woody stems. Transgenic Populus (poplar) trees expressing either a miRNA-resistant POPCORONA or a synthetic miRNA targeting POPCORONA were used to infer function of POPCORONA during secondary growth. Whole plant, histological, and gene expression changes were compared for transgenic and wild-type control plants. Synthetic miRNA knock down of POPCORONA results in abnormal lignification in cells of the pith, while overexpression of a miRNA-resistant POPCORONA results in delayed lignification of xylem and phloem fibers during secondary growth. POPCORONA misexpression also results in coordinated changes in expression of genes within a previously described transcriptional network regulating cell differentiation and cell wall biosynthesis, and hormone-related genes associated with fiber differentiation. POPCORONA illustrates another function of Class III HD ZIPs: regulating cell differentiation during secondary growth.
PMCID: PMC3046250  PMID: 21386988
15.  The Role of bZIP Transcription Factors in Green Plant Evolution: Adaptive Features Emerging from Four Founder Genes 
PLoS ONE  2008;3(8):e2944.
Transcription factors of the basic leucine zipper (bZIP) family control important processes in all eukaryotes. In plants, bZIPs are regulators of many central developmental and physiological processes including photomorphogenesis, leaf and seed formation, energy homeostasis, and abiotic and biotic stress responses. Here we performed a comprehensive phylogenetic analysis of bZIP genes from algae, mosses, ferns, gymnosperms and angiosperms.
Methodology/Principal Findings
We identified 13 groups of bZIP homologues in angiosperms, three more than known before, that represent 34 Possible Groups of Orthologues (PoGOs). The 34 PoGOs may correspond to the complete set of ancestral angiosperm bZIP genes that participated in the diversification of flowering plants. Homologous genes dedicated to seed-related processes and ABA-mediated stress responses originated in the common ancestor of seed plants, and three groups of homologues emerged in the angiosperm lineage, of which one group plays a role in optimizing the use of energy.
Our data suggest that the ancestor of green plants possessed four bZIP genes functionally involved in oxidative stress and unfolded protein responses that are bZIP-mediated processes in all eukaryotes, but also in light-dependent regulations. The four founder genes amplified and diverged significantly, generating traits that benefited the colonization of new environments.
PMCID: PMC2492810  PMID: 18698409
16.  Subgroup 4 R2R3-MYBs in conifer trees: gene family expansion and contribution to the isoprenoid- and flavonoid-oriented responses 
Journal of Experimental Botany  2010;61(14):3847-3864.
Transcription factors play a fundamental role in plants by orchestrating temporal and spatial gene expression in response to environmental stimuli. Several R2R3-MYB genes of the Arabidopsis subgroup 4 (Sg4) share a C-terminal EAR motif signature recently linked to stress response in angiosperm plants. It is reported here that nearly all Sg4 MYB genes in the conifer trees Picea glauca (white spruce) and Pinus taeda (loblolly pine) form a monophyletic clade (Sg4C) that expanded following the split of gymnosperm and angiosperm lineages. Deeper sequencing in P. glauca identified 10 distinct Sg4C sequences, indicating over-representation of Sg4 sequences compared with angiosperms such as Arabidopsis, Oryza, Vitis, and Populus. The Sg4C MYBs share the EAR motif core. Many of them had stress-responsive transcript profiles after wounding, jasmonic acid (JA) treatment, or exposure to cold in P. glauca and P. taeda, with MYB14 transcripts accumulating most strongly and rapidly. Functional characterization was initiated by expressing the P. taeda MYB14 (PtMYB14) gene in transgenic P. glauca plantlets with a tissue-preferential promoter (cinnamyl alcohol dehydrogenase) and a ubiquitous gene promoter (ubiquitin). Histological, metabolite, and transcript (microarray and targeted quantitative real-time PCR) analyses of PtMYB14 transgenics, coupled with mechanical wounding and JA application experiments on wild-type plantlets, allowed identification of PtMYB14 as a putative regulator of an isoprenoid-oriented response that leads to the accumulation of sesquiterpene in conifers. Data further suggested that PtMYB14 may contribute to a broad defence response implicating flavonoids. This study also addresses the potential involvement of closely related Sg4C sequences in stress responses and plant evolution.
PMCID: PMC2935864  PMID: 20732878
Gene family expansion; gymnosperms; isoprenoid metabolism; MYB transcription factors; microarray RNA profiling; Picea glauca; plant evolution; stress response; terpenes; tissue-specific expression
17.  A new genomic resource dedicated to wood formation in Eucalyptus 
BMC Plant Biology  2009;9:36.
Renowned for their fast growth, valuable wood properties and wide adaptability, Eucalyptus species are amongst the most planted hardwoods in the world, yet they are still at the early stages of domestication because conventional breeding is slow and costly. Thus, there is huge potential for marker-assisted breeding programs to improve traits such as wood properties. To this end, the sequencing, analysis and annotation of a large collection of expressed sequences tags (ESTs) from genes involved in wood formation in Eucalyptus would provide a valuable resource.
We report here the normalization and sequencing of a cDNA library from developing Eucalyptus secondary xylem, as well as the construction and sequencing of two subtractive libraries (juvenile versus mature wood and vice versa). A total of 9,222 high quality sequences were collected from about 10,000 cDNA clones. The EST assembly generated a set of 3,857 wood-related unigenes including 2,461 contigs (Cg) and 1,396 singletons (Sg) that we named 'EUCAWOOD'. About 65% of the EUCAWOOD sequences produced matches with poplar, grapevine, Arabidopsis and rice protein sequence databases. BlastX searches of the Uniref100 protein database allowed us to allocate gene ontology (GO) and protein family terms to the EUCAWOOD unigenes. This annotation of the EUCAWOOD set revealed key functional categories involved in xylogenesis. For instance, 422 sequences matched various gene families involved in biosynthesis and assembly of primary and secondary cell walls. Interestingly, 141 sequences were annotated as transcription factors, some of them being orthologs of regulators known to be involved in xylogenesis. The EUCAWOOD dataset was also mined for genomic simple sequence repeat markers, yielding a total of 639 putative microsatellites. Finally, a publicly accessible database was created, supporting multiple queries on the EUCAWOOD dataset.
In this work, we have identified a large set of wood-related Eucalyptus unigenes called EUCAWOOD, thus creating a valuable resource for functional genomics studies of wood formation and molecular breeding in this economically important genus. This set of publicly available annotated sequences will be instrumental for candidate gene approaches, custom array development and marker-assisted selection programs aimed at improving and modulating wood properties.
PMCID: PMC2670833  PMID: 19327132
18.  Comparative in silico analysis of EST-SSRs in angiosperm and gymnosperm tree genera 
BMC Plant Biology  2014;14(1):220.
Simple Sequence Repeats (SSRs) derived from Expressed Sequence Tags (ESTs) belong to the expressed fraction of the genome and are important for gene regulation, recombination, DNA replication, cell cycle and mismatch repair. Here, we present a comparative analysis of the SSR motif distribution in the 5′UTR, ORF and 3′UTR fractions of ESTs across selected genera of woody trees representing gymnosperms (17 species from seven genera) and angiosperms (40 species from eight genera).
Our analysis supports a modest contribution of EST-SSR length to genome size in gymnosperms, while EST-SSR density was not associated with genome size in either angiosperms or gymnosperms. Multiple factors seem to have contributed to the lower abundance of EST-SSRs in gymnosperms that has resulted in a non-linear relationship with genome size diversity. The AG/CT motif was found to be the most abundant in SSRs of both angiosperms and gymnosperms, with a relative increase in AT/AT in the latter. Our data also reveal a higher abundance of hexamers across the gymnosperm genera.
Our analysis provides the foundation for future comparative studies at the species level to unravel the evolutionary processes that control the SSR genesis and divergence between angiosperm and gymnosperm tree species.
Electronic supplementary material
The online version of this article (doi:10.1186/s12870-014-0220-8) contains supplementary material, which is available to authorized users.
PMCID: PMC4160553  PMID: 25143005
Angiosperms; Gymnosperms; Expressed sequence tags; Simple sequence repeats (SSR); Microsatellites
19.  Phylogenetic Study of Plant Q-type C2H2 Zinc Finger Proteins and Expression Analysis of Poplar Genes in Response to Osmotic, Cold and Mechanical Stresses 
Plant Q-type C2H2 zinc finger transcription factors play an important role in plant tolerance to various environmental stresses such as drought, cold, osmotic stress, wounding and mechanical loading. To carry out an improved analysis of the specific role of each member of this subfamily in response to mechanical loading in poplar, we identified 16 two-fingered Q-type C2H2-predicted proteins from the poplar Phytozome database and compared their phylogenetic relationships with 152 two-fingered Q-type C2H2 protein sequences belonging to more than 50 species isolated from the NR protein database of NCBI. Phylogenetic analyses of these Q-type C2H2 protein sequences classified them into two groups, G1 and G2, and conserved motif distributions of interest were established. These two groups differed essentially in their signatures at the C-terminus of their two QALGGH DNA-binding domains. Two additional conserved motifs, MALEAL and LVDCHY, were found only in sequences from Group G1 or from Group G2, respectively. The functional significance of these phylogenetic divergences was assessed by studying transcript accumulation of six poplar C2H2 Q-type genes in response to abiotic stresses, but no group specificity was found in any organ. Further expression analyses focused on PtaZFP1 and PtaZFP2, the two genes strongly induced by mechanical loading in poplars. The results revealed that these two genes were regulated by several signalling molecules including hydrogen peroxide and the phytohormone jasmonate.
PMCID: PMC3077037  PMID: 21367962
C2H2; phylogenetic analysis; abiotic stress; mechanical loading
20.  The Poplar MYB Master Switches Bind to the SMRE Site and Activate the Secondary Wall Biosynthetic Program during Wood Formation 
PLoS ONE  2013;8(7):e69219.
Wood is mainly composed of secondary walls, which constitute the most abundant stored carbon produced by vascular plants. Understanding the molecular mechanisms controlling secondary wall deposition during wood formation is not only an important issue in plant biology but also critical for providing molecular tools to custom-design wood composition suited for diverse end uses. Past molecular and genetic studies have revealed a transcriptional network encompassing a group of wood-associated NAC and MYB transcription factors that are involved in the regulation of the secondary wall biosynthetic program during wood formation in poplar trees. Here, we report the functional characterization of poplar orthologs of MYB46 and MYB83 that are known to be master switches of secondary wall biosynthesis in Arabidopsis. In addition to the two previously-described PtrMYB3 and PtrMYB20, two other MYBs, PtrMYB2 and PtrMYB21, were shown to be MYB46/MYB83 orthologs by complementation and overexpression studies in Arabidopsis. The functional roles of these PtrMYBs in regulating secondary wall biosynthesis were further demonstrated in transgenic poplar plants showing an ectopic deposition of secondary walls in PtrMYB overexpressors and a reduction of secondary wall thickening in their dominant repressors. Furthermore, PtrMYB2/3/20/21 together with two other tree MYBs, the Eucalyptus EgMYB2 and the pine PtMYB4, were shown to differentially bind to and activate the eight variants of the 7-bp SMRE consensus sequence, composed of ACC(A/T)A(A/C)(T/C). Together, our results indicate that the tree MYBs, PtrMYB2/3/20/21, EgMYB2 and PtMYB4, are master transcriptional switches that activate the SMRE sites in the promoters of target genes and thereby regulate secondary wall biosynthesis during wood formation.
PMCID: PMC3726746  PMID: 23922694
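The consensus quoted in the abstract above, ACC(A/T)A(A/C)(T/C), has three two-way ambiguous positions, which is where the "eight variants" come from (2 × 2 × 2). The following is a minimal Python sketch of that enumeration, plus a naive single-strand scan of a sequence for exact matches. The function names and the example sequence are hypothetical illustrations and are not taken from the cited study, which determined binding and activation experimentally.

```python
from itertools import product

# Per-position allowed bases for the 7-bp SMRE consensus ACC(A/T)A(A/C)(T/C),
# exactly as quoted in the abstract.
SMRE_POSITIONS = ["A", "C", "C", "AT", "A", "AC", "TC"]


def expand_consensus(positions):
    """Return every concrete sequence allowed by a per-position list of bases."""
    return ["".join(bases) for bases in product(*positions)]


smre_variants = expand_consensus(SMRE_POSITIONS)


def find_smre_sites(sequence):
    """Return (0-based start, matched variant) for each exact SMRE match.

    Only the given strand is scanned; the reverse complement is not checked.
    The input sequence is a hypothetical example, not a real promoter.
    """
    sequence = sequence.upper()
    width = len(SMRE_POSITIONS)
    allowed = set(smre_variants)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if window in allowed:
            hits.append((start, window))
    return hits


if __name__ == "__main__":
    print(len(smre_variants), smre_variants)
    print(find_smre_sites("GGACCTACCTT"))
```

Exact string matching is used here only for clarity; it says nothing about the differential binding affinities among the variants that the study reports.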
21.  Diversification of the C-TERMINALLY ENCODED PEPTIDE (CEP) gene family in angiosperms, and evolution of plant-family specific CEP genes 
BMC Genomics  2014;15(1):870.
Small, secreted signaling peptides work in parallel with phytohormones to control important aspects of plant growth and development. Genes from the C-TERMINALLY ENCODED PEPTIDE (CEP) family produce such peptides which negatively regulate plant growth, especially under stress, and affect other important developmental processes. To illuminate how the CEP gene family has evolved within the plant kingdom, including its emergence, diversification and variation between lineages, a comprehensive survey was undertaken to identify and characterize CEP genes in 106 plant genomes.
Using a motif-based system developed for this study to identify canonical CEP peptide domains, a total of 916 CEP genes and 1,223 CEP domains were found in angiosperms and for the first time in gymnosperms. This defines a narrow band for the emergence of CEP genes in plants, from the divergence of lycophytes to the angiosperm/gymnosperm split. Both CEP genes and domains were found to have diversified in angiosperms, particularly in the Poaceae and Solanaceae plant families. Multispecies orthologous relationships were determined for 22% of identified CEP genes, and further analysis of those groups found selective constraints upon residues within the CEP peptide and within the previously little-characterized variable region. An examination of public Oryza sativa RNA-Seq datasets revealed an expression pattern that links OsCEP5 and OsCEP6 to panicle development and flowering, and CEP gene trees reveal these emerged from a duplication event associated with the Poaceae plant family.
The characterization of the plant-family specific CEP genes OsCEP5 and OsCEP6, the association of CEP genes with angiosperm-specific development processes like panicle development, and the diversification of CEP genes in angiosperms provide further support for the hypothesis that CEP genes have been integral to the evolution of novel traits within the angiosperm lineage. Beyond these findings, the comprehensive set of CEP genes and their properties reported here will be a resource for future research on CEP genes and peptides.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-870) contains supplementary material, which is available to authorized users.
PMCID: PMC4197245  PMID: 25287121
C-terminally encoded peptide; Gene family; Signaling peptides; GC-biased gene conversion; Panicle development; Orthology detection; Angiosperm evolution
22.  A Functional Phylogenomic View of the Seed Plants 
PLoS Genetics  2011;7(12):e1002411.
A novel result of the current research is the development and implementation of a unique functional phylogenomic approach that explores the genomic origins of seed plant diversification. We first use 22,833 sets of orthologs from the nuclear genomes of 101 genera across land plants to reconstruct their phylogenetic relationships. One of the more salient results is the resolution of some enigmatic relationships in seed plant phylogeny, such as the placement of Gnetales as sister to the rest of the gymnosperms. In using this novel phylogenomic approach, we were also able to identify overrepresented functional gene ontology categories in genes that provide positive branch support for major nodes prompting new hypotheses for genes associated with the diversification of angiosperms. For example, RNA interference (RNAi) has played a significant role in the divergence of monocots from other angiosperms, which has experimental support in Arabidopsis and rice. This analysis also implied that the second largest subunit of RNA polymerase IV and V (NRPD2) played a prominent role in the divergence of gymnosperms. This hypothesis is supported by the lack of 24nt siRNA in conifers, the maternal control of small RNA in the seeds of flowering plants, and the emergence of double fertilization in angiosperms. Our approach takes advantage of genomic data to define orthologs, reconstruct relationships, and narrow down candidate genes involved in plant evolution within a phylogenomic view of species' diversification.
Author Summary
Understanding the genetic and genomic basis of plant diversification has been a major goal of evolutionary biologists since Darwin first pondered his “abominable mystery,” the rapid diversification of the angiosperms in the fossil record. We develop and deploy a functional phylogenomic approach that helps identify genes and biological processes putatively involved in species diversification. We assembled a matrix of 22,833 orthologs from 150 species to reconstruct seed plant phylogenetic relationships and to identify gene sets with a unique evolutionary signal. Our analysis of overrepresented biological processes in these sets narrowed down possible genetic mechanisms underlying plant adaptation and diversification. The phylogenetic relationships we uncovered support the hypothesis that gnetophytes are closely related to the rest of the gymnosperms at the base of the living seed plants. We also found that genes involved in post-transcriptional silencing via RNA interference (RNAi)—increasingly important in understanding plant evolution—are significantly represented early in angiosperm and gymnosperm divergence, with an apparent loss of specific classes of small interfering RNAs (siRNA) in gymnosperms. Our functional phylogenomic approach can be applied to any taxa with available sequences to enhance our knowledge of the evolutionary processes underlying biodiversity in general.
PMCID: PMC3240601  PMID: 22194700
23.  Genome-wide analysis and expression profile of the bZIP transcription factor gene family in grapevine (Vitis vinifera) 
BMC Genomics  2014;15:281.
The basic leucine zipper (bZIP) transcription factor gene family is one of the largest and most diverse families in plants. Current studies have shown that the bZIP proteins regulate numerous growth and developmental processes and biotic and abiotic stress responses. Nonetheless, knowledge concerning the specific expression patterns and evolutionary history of plant bZIP family members remains very limited.
We identified 55 bZIP transcription factor-encoding genes in the grapevine (Vitis vinifera) genome, and divided them into 10 groups according to the phylogenetic relationship with those in Arabidopsis. The chromosome distribution and the collinearity analyses suggest that expansion of the grapevine bZIP (VvbZIP) transcription factor family was driven largely by segmental/chromosomal duplications, which may be associated with the grapevine genome fusion events. Nine intron/exon structural patterns within the bZIP domain and the additional conserved motifs were identified among all VvbZIP proteins, and showed a high group-specificity. The predicted specificities on DNA-binding domains indicated that some highly conserved amino acid residues exist across each major group in the tree of land plant life. The expression patterns of VvbZIP genes across the grapevine gene expression atlas, based on microarray technology, suggest that VvbZIP genes are involved in grapevine organ development, especially seed development. Expression analysis based on qRT-PCR indicated that VvbZIP genes are extensively involved in drought- and heat-responses, with possibly different mechanisms.
The genome-wide identification, chromosome organization, gene structures, evolutionary and expression analyses of grapevine bZIP genes provide an overall insight into this gene family and its potential involvement in growth, development and stress responses. This will facilitate further research on the bZIP gene family regarding their evolutionary history and biological functions.
PMCID: PMC4023599  PMID: 24725365
bZIP transcription factor family; Grapevine; Gene expression; Drought response; Heat stress response
24.  Genome-wide analysis of eukaryote thaumatin-like proteins (TLPs) with an emphasis on poplar 
BMC Plant Biology  2011;11:33.
Plant inducible immunity includes the accumulation of a set of defense proteins during infection called pathogenesis-related (PR) proteins, which are grouped into families termed PR-1 to PR-17. The PR-5 family is composed of thaumatin-like proteins (TLPs), which are responsive to biotic and abiotic stress and are widely studied in plants. TLPs were also recently discovered in fungi and animals. In the poplar genome, TLPs are over-represented compared with annual species and their transcripts strongly accumulate during stress conditions.
Our analysis of the poplar TLP family suggests that the expansion of this gene family was followed by diversification, as differences in expression patterns and predicted properties correlate with phylogeny. In particular, we identified a clade of poplar TLPs that cluster to a single 350 kb locus of chromosome I and that are up-regulated by poplar leaf rust infection. A wider phylogenetic analysis of eukaryote TLPs - including plant, animal and fungal sequences - shows that TLP gene content and diversity increased markedly during land plant evolution. Mapping the reported functions of characterized TLPs to the eukaryote phylogenetic tree showed that antifungal or glycan-lytic properties are widespread across eukaryote phylogeny, suggesting that these properties are shared by most TLPs and are likely associated with the presence of a conserved acidic cleft in their 3D structure. Also, we established an exhaustive catalog of TLPs with atypical architectures such as small-TLPs, TLP-kinases and small-TLP-kinases, which have potentially developed alternative functions (such as putative receptor kinases for pathogen sensing and signaling).
Our study, based on the most recent plant genome sequences, provides evidence for TLP gene family diversification during land plant evolution. We have shown that the diverse functions described for TLPs are not restricted to specific clades but seem to be universal among eukaryotes, with some exceptions likely attributable to atypical protein structures. In the perennial plant model Populus, we unravelled the TLPs likely involved in leaf rust resistance, which will provide the foundation for further functional investigations.
PMCID: PMC3048497  PMID: 21324123
25.  Insights from ANA-grade angiosperms into the early evolution of CUP-SHAPED COTYLEDON genes 
Annals of Botany  2011;107(9):1511-1519.
Background and Aims
The closely related NAC family genes NO APICAL MERISTEM (NAM) and CUP-SHAPED COTYLEDON3 (CUC3) regulate the formation of boundaries within and between plant organs. NAM is post-transcriptionally regulated by miR164, whereas CUC3 is not. To gain insight into the evolution of NAM and CUC3 in the angiosperms, we analysed orthologous genes in early-diverging ANA-grade angiosperms and gymnosperms.
We obtained NAM- and CUC3-like sequences from diverse angiosperms and gymnosperms by a combination of reverse transcriptase PCR, cDNA library screening and database searching, and then investigated their phylogenetic relationships by performing maximum-likelihood reconstructions. We also studied the spatial expression patterns of NAM, CUC3 and MIR164 orthologues in female reproductive tissues of Amborella trichopoda, the probable sister to all other flowering plants.
Key Results
Separate NAM and CUC3 orthologues were found in early-diverging angiosperms, but not in gymnosperms, which contained putative orthologues of the entire NAM + CUC3 clade that possessed sites of regulation by miR164. Multiple paralogues of NAM or CUC3 genes were noted in certain taxa, including Brassicaceae. Expression of NAM, CUC3 and MIR164 orthologues from Am. trichopoda was found to co-localize in ovules at the developmental boundary between the chalaza and nucellus.
The NAM and CUC3 lineages were generated by duplication, and CUC3 subsequently lost regulation by miR164, prior to the last common ancestor of the extant angiosperms. However, the paralogous NAM clade genes CUC1 and CUC2 were generated by a more recent duplication, near the base of Brassicaceae. The function of NAM and CUC3 in defining a developmental boundary in the ovule appears to have been conserved since the last common ancestor of the flowering plants, as does the post-transcriptional regulation in ovule tissues of NAM by miR164.
PMCID: PMC3108802  PMID: 21320879
CUP-SHAPED COTYLEDON; CUC; NO APICAL MERISTEM; NAM; NAC; MIR164; Amborella trichopoda; Cabomba aquatica; Ginkgo biloba; angiosperm; gymnosperm

