The mechanical properties of wood are largely determined by the orientation of cellulose microfibrils in secondary cell walls. Several genes and their allelic variants have previously been found to affect microfibril angle (MFA) and wood stiffness; however, the molecular mechanisms controlling microfibril orientation and mechanical strength are largely uncharacterised. In the present study, cDNA microarrays were used to compare gene expression in developing xylem with contrasting stiffness and MFA in juvenile Pinus radiata trees in order to gain further insights into the molecular mechanisms underlying microfibril orientation and cell wall mechanics.
Juvenile radiata pine trees with higher stiffness (HS) had lower MFA in the earlywood and latewood of each ring compared to low stiffness (LS) trees. Approximately 3.4 to 14.5% of the 3,320 xylem unigenes on cDNA microarrays were differentially regulated in juvenile wood with contrasting stiffness and MFA. Greater variation in MFA and stiffness was observed in earlywood compared to latewood, suggesting earlywood contributes most to differences in stiffness; however, 3-4 times more genes were differentially regulated in latewood than in earlywood. A total of 108 xylem unigenes were differentially regulated in juvenile wood with HS and LS in at least two seasons, including 43 unigenes with unknown functions. Many genes involved in cytoskeleton development and secondary wall formation (cellulose and lignin biosynthesis) were preferentially transcribed in wood with HS and low MFA. In contrast, several genes involved in cell division and primary wall synthesis were more abundantly transcribed in LS wood with high MFA.
Microarray expression profiles in Pinus radiata juvenile wood with contrasting stiffness have shed more light on the transcriptional control of microfibril orientation and the mechanical properties of wood. The identified candidate genes provide an invaluable resource for further gene function and association genetics studies aimed at deepening our understanding of cell wall biomechanics with a view to improving the mechanical properties of wood.
Renowned for their fast growth, valuable wood properties and wide adaptability, Eucalyptus species are amongst the most planted hardwoods in the world, yet they are still at the early stages of domestication because conventional breeding is slow and costly. Thus, there is huge potential for marker-assisted breeding programs to improve traits such as wood properties. To this end, the sequencing, analysis and annotation of a large collection of expressed sequence tags (ESTs) from genes involved in wood formation in Eucalyptus would provide a valuable resource.
We report here the normalization and sequencing of a cDNA library from developing Eucalyptus secondary xylem, as well as the construction and sequencing of two subtractive libraries (juvenile versus mature wood and vice versa). A total of 9,222 high quality sequences were collected from about 10,000 cDNA clones. The EST assembly generated a set of 3,857 wood-related unigenes including 2,461 contigs (Cg) and 1,396 singletons (Sg) that we named 'EUCAWOOD'. About 65% of the EUCAWOOD sequences produced matches with poplar, grapevine, Arabidopsis and rice protein sequence databases. BlastX searches of the Uniref100 protein database allowed us to allocate gene ontology (GO) and protein family terms to the EUCAWOOD unigenes. This annotation of the EUCAWOOD set revealed key functional categories involved in xylogenesis. For instance, 422 sequences matched various gene families involved in biosynthesis and assembly of primary and secondary cell walls. Interestingly, 141 sequences were annotated as transcription factors, some of them being orthologs of regulators known to be involved in xylogenesis. The EUCAWOOD dataset was also mined for genomic simple sequence repeat markers, yielding a total of 639 putative microsatellites. Finally, a publicly accessible database was created, supporting multiple queries on the EUCAWOOD dataset.
In this work, we have identified a large set of wood-related Eucalyptus unigenes called EUCAWOOD, thus creating a valuable resource for functional genomics studies of wood formation and molecular breeding in this economically important genus. This set of publicly available annotated sequences will be instrumental for candidate gene approaches, custom array development and marker-assisted selection programs aimed at improving and modulating wood properties.
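The microsatellite mining step mentioned above can be illustrated with a minimal sketch. It assumes perfect repeats only, motif lengths of 2 to 6 bp, and hypothetical minimum repeat counts; the abstract does not state which tool or thresholds were actually used, and the function name `find_ssrs` is introduced here purely for illustration.

```python
import re

# Hypothetical thresholds: minimum number of tandem copies per motif length (bp).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) tuples for perfect SSRs in a DNA sequence."""
    seq = seq.upper()
    hits = []
    for motif_len, min_copies in sorted(min_repeats.items()):
        # One motif of motif_len bases followed by at least (min_copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves built from a shorter unit
            # (e.g. "AGAG" is really the dinucleotide "AG").
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits


if __name__ == "__main__":
    example = "GGC" + "AG" * 8 + "TTC"  # made-up sequence, not from EUCAWOOD
    print(find_ssrs(example))
```

Applied to every unigene in turn, a scan of this kind yields a list of putative microsatellites; real pipelines usually also handle compound and imperfect repeats, which this sketch ignores.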
Formation of compression (CW) and opposite wood (OW) in branches and bent trunks is an adaptive feature of conifer trees in response to various displacement forces, such as gravity, wind, snow and artificial bending. Several previous studies have characterized tracheids, wood and gene transcription in artificially or naturally bent conifer trunks. These studies have provided a molecular basis for reaction wood formation in response to bending forces and gravity stimulus. However, little is known about reaction wood formation and gene transcription in conifer branches under gravity stress. In this study, SilviScan® technology was used to characterize tracheid and wood traits in radiata pine (Pinus radiata D. Don) branches, and genes differentially transcribed in CW and OW were investigated using cDNA microarrays.
CW drastically differed from OW in tracheids and wood traits with increased growth, thicker tracheid walls, larger microfibril angle (MFA), higher density and lower stiffness. However, CW and OW tracheids had similar diameters in either radial or tangential direction. Thus, gravity stress largely influenced wood growth, secondary wall deposition, cellulose microfibril orientation and wood properties, but had little impact on primary wall expansion. Microarray gene transcription revealed about 29% of the xylem transcriptomes were significantly altered in CW and OW sampled in both spring and autumn, providing molecular evidence for the drastic variation in tracheid and wood traits. Genes involved in cell division, cellulose biosynthesis, lignin deposition, and microtubules were mostly up-regulated in CW, conferring its greater growth, thicker tracheid walls, higher density, larger MFA and lower stiffness. However, genes with roles in cell expansion and primary wall formation were differentially transcribed in CW and OW, respectively, implicating their similar diameters of tracheid walls and different tracheid lengths. Interestingly, many genes related to hormone and calcium signalling as well as various environmental stresses were exclusively up-regulated in CW, providing important clues for earlier molecular signatures of reaction wood formation under gravity stimulus.
The first comprehensive investigation of tracheid characteristics, wood properties and gene transcription in branches of a conifer species revealed more accurate and new insights into reaction wood formation in response to gravity stress. The identified differentially transcribed genes with diverse functions conferred or implicated drastic CW and OW variation observed in radiata pine branches. These genes are excellent candidates for further research on the molecular mechanisms of reaction wood formation with a view to plant gravitropism.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-14-768) contains supplementary material, which is available to authorized users.
Compression wood; Tracheid; Conifers; Transcriptome; Microarray; Plant gravitropism; Microfibril angle (MFA); Wood stiffness
Grosmannia clavigera is a bark beetle-vectored fungal pathogen of pines that causes wood discoloration and may kill trees by disrupting nutrient and water transport. Trees respond to attacks from beetles and associated fungi by releasing terpenoid and phenolic defense compounds. It is unclear which genes are important for G. clavigera's ability to overcome antifungal pine terpenoids and phenolics.
We constructed seven cDNA libraries from eight G. clavigera isolates grown under various culture conditions, and Sanger sequenced the 5' and 3' ends of 25,000 cDNA clones, resulting in 44,288 high quality ESTs. The assembled dataset of unique transcripts (unigenes) consists of 6,265 contigs and 2,459 singletons that mapped to 6,467 locations on the G. clavigera reference genome, representing ~70% of the predicted G. clavigera genes. Although only 54% of the unigenes matched characterized proteins at the NCBI database, this dataset extensively covers major metabolic pathways, cellular processes, and genes necessary for response to environmental stimuli and genetic information processing. Furthermore, we identified genes expressed in spores prior to germination, and genes involved in response to treatment with lodgepole pine phloem extract (LPPE).
We provide a comprehensively annotated EST dataset for G. clavigera that represents a rich resource for gene characterization in this and other ophiostomatoid fungi. Genes expressed in response to LPPE treatment are indicative of fungal oxidative stress response. We identified two clusters of potentially functionally related genes responsive to LPPE treatment. Furthermore, we report a simple method for identifying contig misassemblies in de novo assembled EST collections caused by gene overlap on the genome.
Pinus pinaster is an economically and ecologically important species that is becoming a woody gymnosperm model. Its enormous genome size makes whole-genome sequencing approaches hard to apply. Therefore, the expressed portion of the genome has to be characterised and the results and annotations have to be stored in dedicated databases.
EuroPineDB is the largest sequence collection available for a single pine species, Pinus pinaster (maritime pine), since it comprises 951 641 raw sequence reads obtained from non-normalised cDNA libraries and high-throughput sequencing from adult (xylem, phloem, roots, stem, needles, cones, strobili) and embryonic (germinated embryos, buds, callus) maritime pine tissues. Using open-source tools, sequences were optimally pre-processed, assembled, and extensively annotated (GO, EC and KEGG terms, descriptions, SNPs, SSRs, ORFs and InterPro codes). As a result, a 10.5× P. pinaster genome was covered and assembled in 55 322 UniGenes. A total of 32 919 (59.5%) of P. pinaster UniGenes were annotated with at least one description, revealing at least 18 466 different genes. The complete database, which is designed to be scalable, maintainable, and expandable, is freely available at: http://www.scbi.uma.es/pindb/. It can be retrieved by gene libraries, pine species, annotations, UniGenes and microarrays (i.e., the sequences are distributed in two-colour microarrays; this is the only conifer database that provides this information) and will be periodically updated. Small assemblies can be viewed using a dedicated visualisation tool that connects them with SNPs. Any sequence or annotation set shown on-screen can be downloaded. Retrieval mechanisms for sequences and gene annotations are provided.
The EuroPineDB with its integrated information can be used to reveal new knowledge, offers an easy-to-use collection of information to directly support experimental work (including microarray hybridisation), and provides deeper knowledge on the maritime pine transcriptome.
Members of the pine family (Pinaceae), especially species of spruce (Picea spp.) and pine (Pinus spp.), dominate many of the world's temperate and boreal forests. These conifer forests are of critical importance for global ecosystem stability and biodiversity. They also provide the majority of the world's wood and fiber supply and serve as a renewable resource for other industrial biomaterials. In contrast to angiosperms, functional and comparative genomics research on conifers, or other gymnosperms, is limited by the lack of a relevant reference genome sequence. Sequence-finished full-length (FL)cDNAs and large collections of expressed sequence tags (ESTs) are essential for gene discovery, functional genomics, and for future efforts of conifer genome annotation.
As part of a conifer genomics program to characterize defense against insects and adaptation to local environments, and to discover genes for the production of biomaterials, we developed 20 standard, normalized or full-length enriched cDNA libraries from Sitka spruce (P. sitchensis), white spruce (P. glauca), and interior spruce (P. glauca-engelmannii complex). We sequenced and analyzed 206,875 3'- or 5'-end ESTs from these libraries, and developed a resource of 6,464 high-quality sequence-finished FLcDNAs from Sitka spruce. Clustering and assembly of 147,146 3'-end ESTs resulted in 19,941 contigs and 26,804 singletons, representing 46,745 putative unique transcripts (PUTs). The 6,464 FLcDNAs were all obtained from a single Sitka spruce genotype and represent 5,718 PUTs.
This paper provides detailed annotation and quality assessment of a large EST and FLcDNA resource for spruce. The 6,464 Sitka spruce FLcDNAs represent the third largest sequence-verified FLcDNA resource for any plant species, behind only rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana), and the only substantial FLcDNA resource for a gymnosperm. Our emphasis on capturing FLcDNAs and ESTs from cDNA libraries representing herbivore-, wound- or elicitor-treated induced spruce tissues, along with incorporating normalization to capture rare transcripts, resulted in a rich resource for functional genomics and proteomics studies. Sequence comparisons against five plant genomes and the non-redundant GenBank protein database revealed that a substantial number of spruce transcripts have no obvious similarity to known angiosperm gene sequences. Opportunities for future applications of the sequence and clone resources for comparative and functional genomics are discussed.
There is a rapidly growing awareness that plant peptide signalling molecules are numerous and varied and they are known to play fundamental roles in angiosperm plant growth and development. Two closely related peptide signalling molecule families are the CLAVATA3-EMBRYO-SURROUNDING REGION (CLE) and CLE-LIKE (CLEL) genes, which encode precursors of secreted peptide ligands that have roles in meristem maintenance and root gravitropism. Progress in peptide signalling molecule research in gymnosperms has lagged behind that of angiosperms. We therefore sought to identify CLE and CLEL genes in gymnosperms and conduct a comparative analysis of these gene families with angiosperms.
We undertook a meta-analysis of the GenBank/EMBL/DDBJ gymnosperm EST database and the Picea abies and P. glauca genomes and identified 93 putative CLE genes and 11 CLEL genes among eight Pinophyta species, in the genera Cryptomeria, Pinus and Picea. The predicted conifer CLE and CLEL protein sequences had close phylogenetic relationships with their homologues in Arabidopsis. Notably, perfect conservation of the active CLE dodecapeptide in presumed orthologues of the Arabidopsis CLE41/44-TRACHEARY ELEMENT DIFFERENTIATION (TDIF) protein, an inhibitor of tracheary element (xylem) differentiation, was seen in all eight conifer species. We cloned the Pinus radiata CLE41/44-TDIF orthologues. These genes were preferentially expressed in phloem in planta as expected, but unexpectedly, also in differentiating tracheary element (TE) cultures. Surprisingly, transcript abundances of these TE differentiation-inhibitors sharply increased during early TE differentiation, suggesting that some cells differentiate into phloem cells in addition to TEs in these cultures. Applied CLE13 and CLE41/44 peptides inhibited root elongation in Pinus radiata seedlings. We show evidence that two CLEL genes are alternatively spliced via 3′-terminal acceptor exons encoding separate CLEL peptides.
The CLE and CLEL genes are found in conifers and they exhibit at least as much sequence diversity in these species as they do in other plant species. Only one CLE peptide sequence has been 100% conserved between gymnosperms and angiosperms over 300 million years of evolutionary history, the CLE41/44-TDIF peptide and its likely conifer orthologues. The preferential expression of these vascular development-regulating genes in phloem in conifers, as they are in dicot species, suggests close parallels in the regulation of secondary growth and wood formation in gymnosperm and dicot plants. Based on our bioinformatic analysis, we predict a novel mechanism of regulation of the expression of several conifer CLEL peptides, via alternative splicing resulting in the selection of alternative C-terminal exons encoding separate CLEL peptides.
CLE peptide ligands; CLEL peptide ligands; Pinophyta; Conifers; Phylogenetic analysis; Pine tracheary element system
Wood quality can be defined in terms of a particular end use and involves several traits. Over the last fifteen years, researchers have assessed wood quality traits in forest trees. Wood quality has been categorised into cell wall biochemical traits and fibre properties, the latter including microfibril angle, density and stiffness in loblolly pine. A user-friendly, open-access database named the Wood Gene Database (WGDB) has been developed to describe wood genes along with information on their proteins and published research articles. It contains 720 wood genes from species including Pinus and Deodar and from fast-growing trees such as Poplar and Eucalyptus. WGDB is designed to encompass the majority of publicly accessible genes coding for cellulose, hemicellulose and lignin in tree species that are associated with wood formation and quality. It is an interactive platform for collecting, managing and searching specific wood genes; it also enables data mining related to genomic information, specifically in Arabidopsis thaliana, Populus trichocarpa, Eucalyptus grandis, Pinus taeda, Pinus radiata, Cedrus deodara and Cedrus atlantica. For user convenience, the database is cross-linked with the public databases NCBI, EMBL and Dendrome and with the Google search engine to make it more informative, and it provides the bioinformatics tools BLAST and COBALT.
The database is freely available at www.wgdb.in
Wood; Cellulose; Pinus; Cedrus; Poplar; Eucalyptus
The Fagaceae family comprises about 1,000 woody species worldwide. About half belong to the genus Quercus. These oaks are often a source of raw material for biomass wood and fiber. Pedunculate and sessile oaks are among the most important deciduous forest tree species in Europe. Despite their ecological and economic importance, very few genomic resources have yet been generated for these species. Here, we describe the development of an EST catalogue that will support ecosystem genomics studies, where geneticists, ecophysiologists, molecular biologists and ecologists join their efforts for understanding, monitoring and predicting functional genetic diversity.
We generated 145,827 sequence reads from 20 cDNA libraries using the Sanger method. Unexploitable chromatograms and quality checking led us to eliminate 19,941 sequences. Finally, a total of 125,925 ESTs were retained from 111,361 cDNA clones. Pyrosequencing was also conducted for 14 libraries, generating 1,948,579 reads, from which 370,566 sequences (19.0%) were eliminated, resulting in 1,578,192 sequences. Following clustering and assembly using the TGICL pipeline, 1,704,117 EST sequences collapsed into 69,154 tentative contigs and 153,517 singletons, providing 222,671 non-redundant sequences (including alternative transcripts). We also assembled the sequences using MIRA and PartiGene software and compared the three unigene sets. Gene ontology annotation was then assigned to 29,303 unigene elements. A BLAST search against the SWISS-PROT database revealed putative homologs for 32,810 (14.7%) unigene elements, but a more extensive search with Pfam, Refseq_protein, Refseq_RNA and eight gene indices revealed homology for 67.4% of them. The EST catalogue was examined for putative homologs of candidate genes involved in bud phenology, cuticle formation, phenylpropanoid biosynthesis and cell wall formation. Our results suggest a good coverage of genes involved in these traits. Comparative orthologous sequences (COS) with other plant gene models were identified and allowed us to unravel the oak paleo-history. Simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) were searched, resulting in 52,834 SSRs and 36,411 SNPs. All of these are available through the Oak Contig Browser at http://genotoul-contigbrowser.toulouse.inra.fr:9092/Quercus_robur/index.html.
This genomic resource provides a unique tool to discover genes of interest, study the oak transcriptome, and develop new markers to investigate functional diversity in natural populations.
Pine wilt disease (PWD), caused by the pine wood nematode (PWN), Bursaphelenchus xylophilus, is one of the most destructive diseases of pine and poses a threat of serious economic losses worldwide. Although several of the mechanisms involved in disease progression have been discovered, the molecular response of Pinus massoniana to PWN infection has not been explored. We constructed four suppression subtractive hybridization cDNA libraries by taking time-course samples from PWN-inoculated Masson pine trees. One hundred and forty-four significantly differentially expressed sequence tags (ESTs) were identified, and 124 high-quality sequences with transcriptional features were selected for gene ontology (GO) and individual gene analyses. There were marked differences in the types of transcripts, as well as in the timing and levels of transcript expression in the pine trees following PWN inoculation. Genes involved in signal transduction, transcription and translation, and secondary metabolism were highly expressed after 24 h and 72 h, while stress response genes were highly expressed only after 72 h. Certain transcripts responding to PWN infection were discriminative; pathogenesis and cell wall-related genes were more abundant, while detoxification or redox process-related genes were less abundant. This study provides new insights into the molecular mechanisms that control the biochemical and physiological responses of pine trees to PWN infection, particularly during the initial stage of infection.
pine wilt disease; differentially expressed genes; suppression subtractive hybridization; Pinus massoniana
Peanut (Arachis hypogaea L.) is an important crop economically and nutritionally, and is one of the host crops most susceptible to colonization by Aspergillus parasiticus and subsequent aflatoxin contamination. Knowledge from molecular genetic studies could help to devise strategies for alleviating this problem; however, few peanut DNA sequences are available in the public database. In order to understand the molecular basis of host resistance to aflatoxin contamination, a large-scale project was conducted to generate expressed sequence tags (ESTs) from developing seeds to identify resistance-related genes involved in defense response against Aspergillus infection and subsequent aflatoxin contamination.
We constructed six different cDNA libraries derived from developing peanut seeds at three reproduction stages (R5, R6 and R7) from a susceptible and a resistant cultivated peanut genotype, 'Tifrunner' (susceptible to Aspergillus infection with higher aflatoxin contamination and resistant to TSWV) and 'GT-C20' (resistant to Aspergillus with reduced aflatoxin contamination and susceptible to TSWV). The developing peanut seed tissues were challenged by A. parasiticus and drought stress in the field. A total of 24,192 randomly selected cDNA clones from six libraries were sequenced. After removing vector sequences and quality trimming, 21,777 high-quality EST sequences were generated. Sequence clustering and assembling resulted in 8,689 unique EST sequences with 1,741 tentative consensus EST sequences (TCs) and 6,948 singleton ESTs. Functional classification was performed according to MIPS functional catalogue criteria. The unique EST sequences were divided into twenty-two categories. A similarity search against the non-redundant protein database available from NCBI indicated that 84.78% of total ESTs showed significant similarity to known proteins, of which 165 genes had been previously reported in peanuts. There were differences in overall expression patterns in different libraries and genotypes. A number of sequences were expressed throughout all of the libraries, representing constitutively expressed sequences. In order to identify resistance-related genes with significantly differential expression, a statistical analysis to estimate the relative abundance (R) was used to compare the relative abundance of each gene transcript in each cDNA library. Thirty-six and forty-seven unique EST sequences with a threshold of R > 4 from libraries of 'GT-C20' and 'Tifrunner', respectively, were selected for examination of temporal gene expression patterns according to EST frequencies.
Nine and eight resistance-related genes with significant up-regulation were obtained in 'GT-C20' and 'Tifrunner' libraries, respectively. Among them, three genes were common to both genotypes. Furthermore, a comparison of our EST sequences with other plant sequences in the TIGR Gene Indices libraries showed that the percentage of peanut ESTs matching Arabidopsis thaliana, maize (Zea mays), Medicago truncatula, rapeseed (Brassica napus), rice (Oryza sativa), soybean (Glycine max) and wheat (Triticum aestivum) ESTs ranged from 33.84% to 79.46% at a sequence identity ≥ 80%. These results revealed that peanut ESTs are more closely related to legume species than to cereal crops, and more homologous to dicot than to monocot plant species.
The developed ESTs can be used to discover novel sequences or genes, to identify resistance-related genes and to detect the differences among alleles or markers between these resistant and susceptible peanut genotypes. Additionally, this large collection of cultivated peanut EST sequences will make it possible to construct microarrays for gene expression studies and for further characterization of host resistance mechanisms. It will be a valuable genomic resource for the peanut community. The 21,777 ESTs have been deposited in the NCBI GenBank database with accession numbers ES702769 to ES724546.
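The abstract does not give the formula behind the relative abundance statistic R. One widely used choice for comparing EST counts across cDNA libraries is a log-likelihood ratio, and the sketch below assumes that form. The function name and the example counts are hypothetical and are not taken from the peanut dataset.

```python
import math


def r_statistic(counts, library_sizes):
    """Log-likelihood ratio R for one transcript across several cDNA libraries.

    counts[i]        -- ESTs of this transcript observed in library i
    library_sizes[i] -- total ESTs sequenced from library i
    Assumed form: R = sum_i x_i * ln(x_i / (N_i * f)), with f the pooled frequency.
    """
    if len(counts) != len(library_sizes):
        raise ValueError("counts and library_sizes must have the same length")
    total = sum(counts)
    if total == 0:
        return 0.0
    pooled_freq = total / sum(library_sizes)
    r = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:  # libraries with zero counts contribute nothing to the sum
            r += x * math.log(x / (n * pooled_freq))
    return r


if __name__ == "__main__":
    # Hypothetical transcript: evenly spread versus concentrated in one library.
    print(r_statistic([10, 10], [1000, 1000]))
    print(r_statistic([20, 0], [1000, 1000]))
```

Under this assumed definition, a transcript spread evenly across libraries scores close to zero, while one concentrated in a single library scores higher, which is the behaviour a cut-off such as R > 4 relies on.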
Our understanding of the contribution of Golgi proteins to cell wall and wood formation in any woody plant species is limited. Currently, little Golgi proteomics data exists for wood-forming tissues. In this study, we attempted to address this issue by generating and analyzing Golgi-enriched membrane preparations from developing xylem of compression wood from the conifer Pinus radiata. Developing xylem samples from 3-year-old pine trees were harvested for this purpose at a time of active growth and subjected to a combination of density centrifugation followed by free flow electrophoresis, a surface charge separation technique used in the enrichment of Golgi membranes. This combination of techniques was successful in achieving an approximately 200-fold increase in the activity of the Golgi marker galactan synthase and represents a significant improvement for proteomic analyses of the Golgi from conifers. A total of thirty known Golgi proteins were identified by mass spectrometry including glycosyltransferases from gene families involved in glucomannan and glucuronoxylan biosynthesis. The free flow electrophoresis fractions of enriched Golgi were highly abundant in structural proteins (actin and tubulin) indicating a role for the cytoskeleton during compression wood formation. The mass spectrometry proteomics data associated with this study have been deposited to the ProteomeXchange with identifier PXD000557.
Previous loblolly pine (Pinus taeda L.) genetic linkage maps have been based on a variety of DNA polymorphisms, such as AFLPs, RAPDs, RFLPs, and ESTPs, but only a few SSRs (simple sequence repeats), also known as simple tandem repeats or microsatellites, have been mapped in P. taeda. The objective of this study was to integrate a large set of SSR markers from a variety of sources and published cDNA markers into a composite P. taeda genetic map constructed from two reference mapping pedigrees. A dense genetic map that incorporates SSR loci will benefit complete pine genome sequencing, pine population genetics studies, and pine breeding programs. Careful marker annotation using a variety of references further enhances the utility of the integrated SSR map.
The updated P. taeda genetic map, with an estimated genome coverage of 1,515 cM (Kosambi) across 12 linkage groups, incorporated 170 new SSR markers and 290 previously reported SSR, RFLP, and ESTP markers. The average marker interval was 3.1 cM. Of 233 mapped SSR loci, 84 were from cDNA-derived sequences (EST-SSRs) and 149 were from non-transcribed genomic sequences (genomic-SSRs). Of all 311 mapped cDNA-derived markers, 77% were associated with NCBI Pta UniGene clusters, 67% with RefSeq proteins, and 62% with functional Gene Ontology (GO) terms. Duplicate (i.e., redundant accessory) and paralogous markers were tentatively identified by evaluating marker sequences by their UniGene cluster IDs, clone IDs, and relative map positions. The average gene diversity, He, among polymorphic SSR loci, including those that were not mapped, was 0.43 for 94 EST-SSRs and 0.72 for 83 genomic-SSRs. The genetic map can be viewed and queried at http://www.conifergdb.org/pinemap.
Many polymorphic and genetically mapped SSR markers are now available for use in P. taeda population genetics, studies of adaptive traits, and various germplasm management applications. Annotating mapped genes with UniGene clusters and GO terms allowed assessment of redundant and paralogous EST markers and further improved the quality and utility of the genetic map for P. taeda.
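Gene diversity (He) at a locus is conventionally computed from allele frequencies as 1 − Σ p_i². The following minimal sketch assumes that definition without any small-sample correction; the abstract does not say which estimator was used, and the allele labels are invented for illustration.

```python
from collections import Counter


def gene_diversity(alleles):
    """Expected heterozygosity He = 1 - sum(p_i^2) from a list of observed alleles."""
    counts = Counter(alleles)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("at least one allele observation is required")
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Hypothetical SSR locus scored as fragment sizes across sampled chromosomes.
    print(gene_diversity([152, 152, 156, 156]))
```

Averaging this value over loci gives per-class summaries of the kind reported for EST-SSRs and genomic-SSRs.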
The sequencing and analysis of ESTs is for now the only practical approach for large-scale gene discovery and annotation in conifers because their very large genomes are unlikely to be sequenced in the near future. Our objective was to produce extensive collections of ESTs and cDNA clones to support manufacture of cDNA microarrays and gene discovery in white spruce (Picea glauca [Moench] Voss).
We produced 16 cDNA libraries from different tissues and a variety of treatments, and partially sequenced 50,000 cDNA clones. High quality 3' and 5' reads were assembled into 16,578 consensus sequences, 45% of which represented full length inserts. Consensus sequences derived from 5' and 3' reads of the same cDNA clone were linked to define 14,471 transcripts. A large proportion (84%) of the spruce sequences matched a pine sequence, but only 68% of the spruce transcripts had homologs in Arabidopsis or rice. Nearly all the sequences that matched the Populus trichocarpa genome (the only sequenced tree genome) also matched rice or Arabidopsis genomes. We used several sequence similarity search approaches for assignment of putative functions, including blast searches against general and specialized databases (transcription factors, cell wall related proteins), Gene Ontology term assignation and Hidden Markov Model searches against PFAM protein families and domains. In total, 70% of the spruce transcripts displayed matches to proteins of known or unknown function in the Uniref100 database (blastx e-value < 1e-10). We identified multigenic families that appeared larger in spruce than in the Arabidopsis or rice genomes. Detailed analysis of translationally controlled tumour proteins and S-adenosylmethionine synthetase families confirmed a twofold size difference. Sequences and annotations were organized in a dedicated database, SpruceDB. Several search tools were developed to mine the data either based on their occurrence in the cDNA libraries or on functional annotations.
This report illustrates specific approaches for large-scale gene discovery and annotation in an organism that is very distantly related to any of the fully sequenced genomes. The ArboreaSet sequences and cDNA clones represent a valuable resource for investigations ranging from plant comparative genomics to applied conifer genetics.
Pigeonpea (Cajanus cajan (L.) Millsp) is one of the major grain legume crops of the tropics and subtropics, but biotic stresses [Fusarium wilt (FW), sterility mosaic disease (SMD), etc.] are serious challenges for sustainable crop production. Modern genomic tools such as molecular markers and candidate genes associated with resistance to these stresses offer the possibility of facilitating pigeonpea breeding for improved biotic stress resistance. The limited availability of genomic resources, however, is a serious bottleneck for molecular breeding in pigeonpea aimed at developing superior genotypes with enhanced resistance to the above-mentioned biotic stresses. With the objective of enhancing genomic resources in pigeonpea, this study reports the generation and analysis of a comprehensive resource of FW- and SMD-responsive expressed sequence tags (ESTs).
A total of 16 cDNA libraries were constructed from four pigeonpea genotypes that are resistant and susceptible to FW ('ICPL 20102' and 'ICP 2376') and SMD ('ICP 7035' and 'TTB 7') and a total of 9,888 (9,468 high quality) ESTs were generated and deposited in dbEST of GenBank under accession numbers GR463974 to GR473857 and GR958228 to GR958231. Clustering and assembly analyses of these ESTs resulted in 4,557 unique sequences (unigenes) including 697 contigs and 3,860 singletons. BLASTN analysis of the 4,557 unigenes showed significant identity with ESTs of different legumes (23.2-60.3%), rice (28.3%), Arabidopsis (33.7%) and poplar (35.4%). As expected, pigeonpea ESTs are more closely related to soybean (60.3%) and cowpea ESTs (43.6%) than to other plant ESTs. Similarly, BLASTX similarity results showed that only 1,603 (35.1%) out of 4,557 total unigenes correspond to known proteins in the UniProt database (E-value ≤ 1E-08). Functional categorization of the annotated unigene sequences showed that 153 (3.3%) genes were assigned to the cellular component category, 132 (2.8%) to biological process, and 132 (2.8%) to molecular function. Further, 19 genes were identified as differentially expressed between FW-responsive genotypes and 20 between SMD-responsive genotypes. The generated ESTs were compiled together with 908 ESTs available in the public domain at the time of analysis, and a set of 5,085 unigenes was defined and used for identification of molecular markers in pigeonpea. For instance, 3,583 simple sequence repeat (SSR) motifs were identified in 1,365 unigenes and 383 primer pairs were designed. Assessment of a set of 84 primer pairs on 40 elite pigeonpea lines showed polymorphism with 15 (28.8%) markers with an average of four alleles per marker and an average polymorphic information content (PIC) value of 0.40. Similarly, in silico mining of 133 contigs with ≥ 5 sequences detected 102 single nucleotide polymorphisms (SNPs) in 37 contigs.
As an example, a set of 10 contigs was used to confirm in silico predicted SNPs in a set of four genotypes using wet lab experiments. The occurrence of SNPs was confirmed for all 6 contigs for which scorable and sequenceable amplicons were generated. PCR amplicons were not obtained for the other 4 contigs. Recognition sites for restriction enzymes were identified for 102 SNPs in 37 contigs, which indicates the possibility of assaying SNPs in 37 genes using cleaved amplified polymorphic sequence (CAPS) assays.
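The abstract reports an average PIC value for the SSR markers but does not state how it was calculated. The sketch below assumes the commonly used definition, PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i at a marker. The function name and the allele frequencies in the usage example are hypothetical.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphic information content for one marker.

    Assumes the definition PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2,
    with allele frequencies that sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    pair_term = sum(2 * (p * p) * (q * q)
                    for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - pair_term


# Hypothetical markers: two equally frequent alleles, then four.
pic_two = pic([0.5, 0.5])
pic_four = pic([0.25, 0.25, 0.25, 0.25])
```

Under this definition a monomorphic marker has a PIC of 0, and the value rises as alleles become more numerous and more evenly distributed.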
The pigeonpea EST dataset generated here provides a transcriptomic resource for gene discovery and development of functional markers associated with biotic stress resistance. Sequence analyses of this dataset have shown conservation of a considerable number of pigeonpea transcripts across the legume and model plant species analysed, as well as some putative pigeonpea-specific genes. Validation of the identified biotic stress responsive genes should provide candidate genes for allele mining as well as candidate markers for molecular breeding.
Wood is a valuable natural resource and a major carbon sink. Wood formation is an important developmental process in vascular plants which played a crucial role in plant evolution. Although genes involved in xylem formation have been investigated, the molecular mechanisms of xylem evolution are not well understood. We use comparative genomics to examine evolution of the xylem transcriptome to gain insights into xylem evolution.
The xylem transcriptome is highly conserved in conifers, but considerably divergent in angiosperms. The functional domains of genes in the xylem transcriptome are moderately to highly conserved in vascular plants, suggesting the existence of a common ancestral xylem transcriptome. Compared to the total transcriptome derived from a range of tissues, the xylem transcriptome is relatively conserved in vascular plants. Of the xylem transcriptome, cell wall genes, ancestral xylem genes, known proteins and transcription factors are relatively more conserved in vascular plants. A total of 527 putative xylem orthologs were identified, which are unevenly distributed across the Arabidopsis chromosomes with eight hot spots observed. Phylogenetic analysis revealed that evolution of the xylem transcriptome has paralleled plant evolution. We also identified 274 conifer-specific xylem unigenes, all of which are of unknown function. These xylem orthologs and conifer-specific unigenes are likely to have played a crucial role in xylem evolution.
Conifers have highly conserved xylem transcriptomes, while angiosperm xylem transcriptomes are relatively diversified. Vascular plants share a common ancestral xylem transcriptome. The xylem transcriptomes of vascular plants are more conserved than the total transcriptomes. Evolution of the xylem transcriptome has largely followed the trend of plant evolution.
Optimal defense theory (ODT) predicts that the within-plant quantitative allocation of defenses is not random, but driven by the potential relative contribution of particular plant tissues to overall fitness. These predictions have been poorly tested on long-lived woody plants. We explored the allocation of constitutive and methyl-jasmonate (MJ) inducible chemical defenses in six half-sib families of Pinus radiata juveniles. Specifically, we studied the quantitative allocation of resin and polyphenolics (the two major secondary chemicals in pine trees) to tissues with contrasting fitness value (stem phloem, stem xylem and needles) across three parts of the plants (basal, middle and apical upper part), using nitrogen concentration as a proxy of tissue value. Concentration of nitrogen in the phloem, xylem and needles was found to be greater higher up the plant. As predicted by the ODT, the same pattern was found for the concentration of non-volatile resin in the stem. However, in leaf tissues the concentrations of both resin and total phenolics were greater towards the base of the plant. Two weeks after MJ application, the concentrations of nitrogen in the phloem, resin in the stem and total phenolics in the needles increased by roughly 25% compared with the control plants; inducibility was similar across all plant parts, and families differed in the inducibility of resin compounds in the stem. In contrast, no significant changes were observed either for phenolics in the stems, or for resin in the needles after MJ application. Concentration of resin in the phloem was double that in the xylem and MJ-inducible, with inducibility being greater towards the base of the stem. In contrast, resin in the xylem was not MJ-inducible and increased in concentration higher up the plant. The pattern of inducibility by MJ-signaling in juvenile P. radiata is tissue, chemical-defense and plant-part specific, and is genetically variable.
Cotton fiber is the world's leading natural fiber used in the manufacture of textiles. Gossypium is also the model plant in the study of polyploidization, evolution, cell elongation, cell wall development, and cellulose biosynthesis. G. barbadense L., with its superior fiber properties, is an ideal candidate for providing new genetic variation useful for improving fiber quality. However, little is known about the fiber development mechanisms of G. barbadense and only a few molecular resources are available in GenBank.
Methodology and Principal Findings
In total, 10,979 high-quality expressed sequence tags (ESTs) were generated from a normalized fiber cDNA library of G. barbadense. The ESTs were clustered and assembled into 5852 unigenes, consisting of 1492 contigs and 4360 singletons. The blastx result showed 2165 unigenes with significant similarity to known genes and 2687 unigenes with significant similarity to genes of predicted proteins. Functional classification revealed that unigenes were abundant in the functions of binding, catalytic activity, and metabolic pathways of carbohydrate, amino acid, energy, and lipids. Functional motifs and domains related to the cytoskeleton and redox homeostasis were enriched. Among the 5852 unigenes, 282 and 736 unigenes were identified as potential cell wall biosynthesis genes and transcription factors, respectively. Furthermore, the relationships among cotton species and between cotton and other model plant systems were analyzed. Some putative species-specific unigenes of G. barbadense were highlighted.
The ESTs generated in this study are from the first large-scale EST project for G. barbadense and significantly enhance the number of G. barbadense ESTs in public databases. This knowledge will contribute to cotton improvements by studying fiber development mechanisms of G. barbadense, establishing a breeding program using marker-assisted selection, and discovering candidate genes related to important agronomic traits of cotton through oligonucleotide array. Our work will also provide important resources for comparative genomics, polyploidization, and genome evolution among Gossypium species.
Expressed sequence tag (EST) databases represent a valuable resource for the identification of genes in organisms with uncharacterized genomes and for the development of molecular markers. One class of markers derived from EST sequences is simple sequence repeat (SSR) markers, also known as EST-SSRs. These are useful in plant genetic and evolutionary studies because they are located in transcribed genes and a putative function can often be inferred from homology searches. Another important feature of EST-SSR markers is their expected high level of transferability to related species, which makes them very promising for comparative mapping. In the present study we constructed a normalized EST library from floral tissue of Silene latifolia with the aim of identifying expressed genes and developing polymorphic molecular markers.
We obtained a total of 3662 high quality sequences from a normalized Silene cDNA library. These represent 3105 unigenes, with 73% of unigenes matching genes in other species. We found 255 sequences containing one or more SSR motifs. More than 60% of these SSRs were trinucleotides. A total of 30 microsatellite loci were identified from 106 ESTs having sufficient flanking sequences for primer design. The inheritance of these loci was tested via segregation analyses and their usefulness for linkage mapping was assessed in an interspecific cross. Tests for cross-amplification of the EST-SSR loci in other Silene species established their applicability to related species.
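The abstract does not give the criteria used to call an SSR motif. As a minimal sketch of how perfect di-, tri- and tetranucleotide repeats can be located in an EST, the code below uses regular expressions with hypothetical minimum repeat counts; the thresholds and function names are assumptions for illustration, not the authors' settings.

```python
import re

# Hypothetical thresholds: minimum number of repeats for each motif length.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    return not any(n % d == 0 and motif[:d] * (n // d) == motif
                   for d in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    found = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if not is_primitive(motif):
                continue  # e.g. skip AA or ATAT, which are shorter repeats
            found.append((m.start(), motif, len(m.group(1)) // motif_len))
    return sorted(found)


# Hypothetical sequence containing (AG)7 and (GAA)5.
example = "TTGC" + "AG" * 7 + "CCT" + "GAA" * 5 + "TC"
ssrs = find_ssrs(example)
```

The primitive-motif check prevents a dinucleotide repeat from also being reported as a tetranucleotide repeat, and excludes mononucleotide runs such as poly(A) tails.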
The newly characterized genes and gene-derived markers from our Silene EST library represent a valuable genetic resource for future studies on Silene latifolia and related species. The polymorphism and transferability of EST-SSR markers facilitate comparative linkage mapping and analyses of genetic diversity in the genus Silene.
There is no dedicated database available for Expressed Sequence Tags (EST) of the chili pepper (Capsicum annuum), although the interest in a chili pepper EST database is increasing internationally due to the nutritional, economic, and pharmaceutical value of the plant. Recent advances in high-throughput sequencing of the ESTs of chili pepper cv. Bukang have produced hundreds of thousands of complementary DNA (cDNA) sequences. Therefore, a chili pepper EST database was designed and constructed to enable comprehensive analysis of chili pepper gene expression in response to biotic and abiotic stresses.
We built the Pepper EST database to mine the complexity of chili pepper ESTs. The database was built on 122,582 sequenced ESTs and 116,412 refined ESTs from 21 pepper EST libraries. The ESTs were clustered and assembled into virtual consensus cDNAs and the cDNAs were assigned to metabolic pathway, Gene Ontology (GO), and MIPS Functional Catalogue (FunCat). The Pepper EST database is designed to provide a workbench for (i) identifying unigenes in pepper plants, (ii) analyzing expression patterns in different developmental tissues and under conditions of stress, and (iii) comparing the ESTs with those of other members of the Solanaceae family. The Pepper EST database is freely available at .
The Pepper EST database is expected to provide a high-quality resource, which will contribute to gaining a systemic understanding of plant diseases and facilitate genetics-based population studies. The database is also expected to contribute to analysis of gene synteny as part of the chili pepper sequencing project by mapping ESTs to the genome.
Theobroma cacao L. is a tree originating from the tropical rainforest of South America. It is one of the major cash crops for many tropical countries. T. cacao is mainly produced on smallholdings, providing resources for 14 million farmers. Disease resistance and T. cacao quality improvement are two important challenges for all actors of cocoa and chocolate production. T. cacao is seriously affected by pests and fungal diseases, which are responsible for yield losses of more than 40%; improvement of quality, both nutritional and organoleptic, is also important for consumers. An international collaboration was formed to develop an EST genomic resource database for cacao.
Fifty-six cDNA libraries were constructed from different organs, different genotypes and different environmental conditions. A total of 149,650 valid EST sequences were generated, corresponding to 48,594 unigenes (12,692 contigs and 35,902 singletons). A total of 29,849 unigenes shared significant homology with public sequences from other species.
Gene Ontology (GO) annotation was applied to distribute the ESTs among the main GO categories.
A specific information system (ESTtik) was constructed to process, store and manage this EST collection allowing the user to query a database.
To check the representativeness of our EST collection, we looked for the genes known to be involved in two different metabolic pathways extensively studied in other plant species and important for T. cacao qualities: the flavonoid and the terpene pathways. Most of the enzymes described in other crops for these two metabolic pathways were found in our EST collection.
A large collection of new genetic markers was provided by this EST collection.
This EST collection displays a good representation of the T. cacao transcriptome, suitable for analysis of biochemical pathways based on oligonucleotide microarrays derived from these ESTs. It will provide numerous genetic markers that will allow the construction of a high density gene map of T. cacao. This EST collection represents a unique and important molecular resource for T. cacao study and improvement, facilitating the discovery of candidate genes for important T. cacao trait variation.
Several members of the R2R3-MYB family of transcription factors act as regulators of lignin and phenylpropanoid metabolism during wood formation in angiosperm and gymnosperm plants. The angiosperm Arabidopsis has over one hundred R2R3-MYB genes; however, only a few members of this family have been discovered in gymnosperms.
We isolated and characterised full-length cDNAs encoding R2R3-MYB genes from the gymnosperms white spruce, Picea glauca (13 sequences), and loblolly pine, Pinus taeda L. (five sequences). Sequence similarities and phylogenetic analyses placed the spruce and pine sequences in diverse subgroups of the large R2R3-MYB family, although several of the sequences clustered closely together. We searched the highly variable C-terminal region of diverse plant MYBs for conserved amino acid sequences and identified 20 motifs in the spruce MYBs, nine of which have not previously been reported and three of which are specific to conifers. The number and length of the introns in spruce MYB genes varied significantly, but their positions were well conserved relative to angiosperm MYB genes. Quantitative RT-PCR of MYB gene transcript abundance in root and stem tissues revealed diverse expression patterns; three MYB genes were preferentially expressed in secondary xylem, whereas others were preferentially expressed in phloem or were ubiquitous. The MYB genes expressed in xylem, and three others, were up-regulated in the compression wood of leaning trees within 76 hours of induction.
Our survey of 18 conifer R2R3-MYB genes clearly showed a gene family structure similar to that of Arabidopsis. Three of the sequences are likely to play a role in lignin metabolism and/or wood formation in gymnosperm trees, including a close homolog of the loblolly pine PtMYB4, shown to regulate lignin biosynthesis in transgenic tobacco.
Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited.
We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences.
The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns.
Expressed Sequence Tag (EST) sequences are widely used in applications such as genome annotation, gene discovery and gene expression studies. However, some GenBank dbEST sequences have proven to be “unclean”. Identification of cDNA termini/ends and their structures in raw ESTs not only facilitates data quality control and accurate delineation of transcription ends, but also furthers our understanding of the potential sources of data abnormalities/errors present in the wet-lab procedures for cDNA library construction.
After analyzing a total of 309,976 raw Pinus taeda ESTs, we uncovered many distinct variations of cDNA termini, some of which prove to be good indicators of wet-lab artifacts, and characterized each raw EST by its cDNA terminus structure patterns. In contrast to the expected patterns, many ESTs displayed complex and/or abnormal patterns that represent potential wet-lab errors such as: a failure of one or both of the restriction enzymes to cut the plasmid vector; a failure of the restriction enzymes to cut the vector at the correct positions; the insertion of two cDNA inserts into a single vector; the insertion of multiple and/or concatenated adapters/linkers; the presence of 3′-end terminal structures in designated 5′-end sequences or vice versa; and so on. With a close examination of these artifacts, many problematic ESTs that have been deposited into public databases by conventional bioinformatics pipelines or tools could be cleaned or filtered by our methodology. We developed a software tool for Abnormality Filtering and Sequence Trimming for ESTs (AFST, http://code.google.com/p/afst/) using a pattern analysis approach. To compare AFST with other pipelines that submitted ESTs into dbEST, we reprocessed 230,783 Pinus taeda and 38,709 Arachis hypogaea GenBank ESTs. We found 7.4% of Pinus taeda and 29.2% of Arachis hypogaea GenBank ESTs are “unclean” or abnormal, all of which could be cleaned or filtered by AFST.
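Two of the artifact classes listed above, repeated adapters and 3′-end structures in designated 5′-end reads, can be illustrated with a simple heuristic. The sketch below is not the AFST implementation. The adapter sequence, the poly(A) length threshold and the function name are all hypothetical, and a real pipeline would need library-specific adapter and vector sequences.

```python
import re

ADAPTER = "GGCCGCTCTAGA"  # hypothetical adapter sequence, for illustration only


def flag_read(read, designated_end, adapter=ADAPTER, min_tail=12):
    """Return a list of heuristic abnormality flags for one raw EST read.

    designated_end is "5prime" or "3prime", as labeled by the sequencing
    project. Only two simple terminus patterns are checked.
    """
    read = read.upper()
    flags = []
    # More than one adapter copy suggests concatenated adapters or two inserts.
    if read.count(adapter) > 1:
        flags.append("multiple_adapters")
    # A long trailing poly(A) run is a 3'-end structure; in a read labeled 5'
    # it may indicate a mislabeled read (or a short insert read through).
    has_polya_tail = re.search(r"A{%d,}$" % min_tail, read) is not None
    if designated_end == "5prime" and has_polya_tail:
        flags.append("3prime_structure_in_5prime_read")
    return flags


# Hypothetical reads.
read_two_adapters = "ACGT" + ADAPTER + "ACGTACGT" + ADAPTER + "TTGCA"
read_polya = "ACGTTGCA" + "A" * 15
flags_a = flag_read(read_two_adapters, "5prime")
flags_b = flag_read(read_polya, "5prime")
flags_c = flag_read(read_polya, "3prime")
```

Returning a list of flags rather than a single label lets one read carry several abnormalities at once, which matches the observation that many ESTs display complex patterns.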
cDNA terminal pattern analysis, as implemented in the AFST software tool, can be utilized to reveal wet-lab errors such as restriction enzyme cutting abnormalities and chimeric EST sequences, detect various data abnormalities embedded in existing Sanger EST datasets, improve the accuracy of identifying and extracting bona fide cDNA inserts from raw ESTs, and therefore greatly benefit downstream EST-based applications.
cDNA terminus; cDNA library construction; Pattern analysis; Restriction enzyme cutting abnormality; Chimeric EST sequences
The mountain pine beetle (MPB, Dendroctonus ponderosae) epidemic has affected lodgepole pine (Pinus contorta) across an area of more than 18 million hectares of pine forests in western Canada, and is a threat to the boreal jack pine (Pinus banksiana) forest. Defence of pines against MPB and associated fungal pathogens, as well as other pests, involves oleoresin monoterpenes, which are biosynthesized by families of terpene synthases (TPSs). Volatile monoterpenes also serve as host recognition cues for MPB and as precursors for MPB pheromones. The genes responsible for terpene biosynthesis in jack pine and lodgepole pine were previously unknown.
We report the generation and quality assessment of assembled transcriptome resources for lodgepole pine and jack pine using Sanger, Roche 454, and Illumina sequencing technologies. Assemblies revealed transcripts for approximately 20,000 - 30,000 genes from each species and assembly analyses led to the identification of candidate full-length prenyl transferase, TPS, and P450 genes of oleoresin biosynthesis. We cloned and functionally characterized, via expression of recombinant proteins in E. coli, nine different jack pine and eight different lodgepole pine mono-TPSs. The newly identified lodgepole pine and jack pine mono-TPSs include (+)-α-pinene synthases, (-)-α-pinene synthases, (-)-β-pinene synthases, (+)-3-carene synthases, and (-)-β-phellandrene synthases from each of the two species.
In the absence of genome sequences, transcriptome assemblies are important for defence gene discovery in lodgepole pine and jack pine, as demonstrated here for the terpenoid pathway genes. The product profiles of the functionally annotated mono-TPSs described here can account for the major monoterpene metabolites identified in lodgepole pine and jack pine.
Conifer defence; Pine oleoresin; Terpenoid biosynthesis; Metabolite profile; Prenyl transferase; Cytochrome P450; Conifer genome