The complete genomic sequence of a new tobamovirus in tomatoes was determined through deep sequencing and assembly of small RNAs, then validated through Sanger sequencing. Based on the low sequence identity (≤85%) to known viruses and a close phylogenetic relationship to tobamoviruses, it was identified as a new species.
MicroRNAs play an important role in plant development and plant responses to various biotic and abiotic stimuli. As one of the most important ornamental crops, rose (Rosa hybrida) possesses several specific morphological and physiological features, including recurrent flowering and highly divergent flower shapes, colors and volatiles. Ethylene plays an important role in regulating petal cell expansion during rose flower opening. Here, we report the population and expression profiles of miRNAs in rose petals during flower opening and in response to ethylene, based on high-throughput sequencing. We identified a total of 33 conserved miRNAs and 47 putative novel miRNAs from rose petals. The conserved and novel targets of those miRNAs were predicted using the rose floral transcriptome database. Expression profiling revealed that expression of 28 known (84.8% of known miRNAs) and 39 novel (83.0% of novel miRNAs) miRNAs was substantially changed in rose petals during the earlier opening period. We also found that 28 known and 22 novel miRNAs showed expression changes in response to ethylene treatment. Furthermore, we performed integrative analysis of expression profiles of miRNAs and their targets. We found that ethylene-caused expression changes of five miRNAs (miR156, miR164, miR166, miR5139 and rhy-miRC1) were inversely correlated to those of their seven target genes. These results indicate that these miRNA/target modules might be regulated by ethylene and be involved in ethylene-regulated petal growth.
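The integrative step above, pairing each miRNA with its predicted targets and keeping the pairs whose expression changes run in opposite directions, can be illustrated with a minimal sketch. It assumes log2 fold changes are already available for both miRNAs and targets. The function name `inversely_regulated_pairs`, the target identifiers and all numeric values are hypothetical and are not taken from the study.

```python
def inversely_regulated_pairs(mirna_log2fc, target_log2fc, mirna_targets):
    """Return (miRNA, target) pairs whose log2 fold changes have opposite signs.

    mirna_log2fc  -- dict: miRNA name -> log2 fold change (treated vs. control)
    target_log2fc -- dict: target gene -> log2 fold change
    mirna_targets -- dict: miRNA name -> list of predicted target genes
    """
    pairs = []
    for mirna, targets in mirna_targets.items():
        m_fc = mirna_log2fc.get(mirna)
        if m_fc is None:
            continue  # no expression estimate for this miRNA
        for target in targets:
            t_fc = target_log2fc.get(target)
            # A negative product means one went up while the other went down.
            if t_fc is not None and m_fc * t_fc < 0:
                pairs.append((mirna, target))
    return pairs


# Toy values for illustration only (hypothetical, not measured data).
mirna_fc = {"miR156": -1.5, "miR164": 2.0}
target_fc = {"targetA": 1.2, "targetB": 0.8, "targetC": -0.4}
predicted = {"miR156": ["targetA"], "miR164": ["targetB", "targetC"]}

pairs = inversely_regulated_pairs(mirna_fc, target_fc, predicted)
```

A sign test like this only flags candidates; a fuller analysis would also require each change to pass an expression-change threshold or a statistical test.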
The SBP-box gene family is specific to plants and encodes a class of zinc finger-containing transcription factors with a broad range of functions. Although SBP-box genes have been identified in numerous plants including green algae, moss, silver birch, snapdragon, Arabidopsis, rice and maize, there is little information concerning the function of SBP-box genes, or of the corresponding miR156/157, in grapevine.
Eighteen SBP-box gene family members were identified in Vitis vinifera, twelve of which bore sequences that were complementary to miRNA156/157. Phylogenetic reconstruction demonstrated that plant SBP-domain proteins could be classified into seven subgroups, with the V. vinifera SBP-domain proteins being more closely related to SBP-domain proteins from dicotyledonous angiosperms than those from monocotyledonous angiosperms. In addition, synteny analysis between grape and Arabidopsis demonstrated that homologs of several grape SBP genes were found in corresponding syntenic blocks of Arabidopsis. Expression analysis of the grape SBP-box genes in various organs and at different stages of fruit development in V. quinquangularis ‘Shang-24’ revealed distinct spatiotemporal patterns. While the majority of the grape SBP-box genes lacking a miR156/157 target site were expressed ubiquitously and constitutively, most genes bearing a miR156/157 target site exhibited distinct expression patterns, possibly due to the inhibitory role of the microRNA. Furthermore, microarray data mining and quantitative real-time RT-PCR analysis identified several grape SBP-box genes that are potentially involved in the defense against biotic and abiotic stresses.
The results presented here provide a further understanding of SBP-box gene function in plants, and yield additional insights into the mechanism of stress management in grape, which may have important implications for the future success of this crop.
Chromoplasts are unique plastids that accumulate massive amounts of carotenoids. To gain a general and comparative characterization of chromoplast proteins, this study performed proteomic analysis of chromoplasts from six carotenoid-rich crops: watermelon, tomato, carrot, orange cauliflower, red papaya, and red bell pepper. Stromal and membrane proteins of chromoplasts were separated by 1D gel electrophoresis and analysed using nLC-MS/MS. A total of 953–2262 proteins from chromoplasts of different crop species were identified. Approximately 60% of the identified proteins were predicted to be plastid localized. Functional classification using MapMan bins revealed large numbers of proteins involved in protein metabolism, transport, amino acid metabolism, lipid metabolism, and redox in chromoplasts from all six species. Seventeen core carotenoid metabolic enzymes were identified. Phytoene synthase, phytoene desaturase, ζ-carotene desaturase, 9-cis-epoxycarotenoid dioxygenase, and carotenoid cleavage dioxygenase 1 were found in almost all crops, suggesting that they are relatively abundant among the carotenoid pathway enzymes. Chromoplasts from different crops contained abundant amounts of ATP synthase and adenine nucleotide translocator, which indicates an important role of ATP production and transport in chromoplast development. Distinctive abundant proteins were observed in chromoplasts from different crops, including capsanthin/capsorubin synthase and fibrillins in pepper, superoxide dismutase in watermelon, carrot, and cauliflower, and glutathione-S-transferase in papaya. The comparative analysis of chromoplast proteins among six crop species offers new insights into the general metabolism and function of chromoplasts as well as the uniqueness of chromoplasts in specific crop species. This work provides reference datasets for future experimental study of chromoplast biogenesis, development, and regulation in plants.
Carrot; cauliflower; chromoplast; papaya; pepper; proteomics; tomato; watermelon
Next generation DNA sequencing technologies are driving increasingly rapid, affordable and high resolution analyses of plant transcriptomes through sequencing of their associated cDNA (complementary DNA) populations; an analytical platform commonly referred to as RNA-sequencing (RNA-seq). Since entering the arena of whole genome profiling technologies only a few years ago, RNA-seq has proven itself to be a powerful tool with a remarkably diverse range of applications, from detailed studies of biological processes at the cell type-specific level, to providing insights into fundamental questions in plant biology on an evolutionary time scale. Applications include generating genomic data for heretofore unsequenced species, thus expanding the boundaries of what had been considered “model organisms,” elucidating structural and regulatory gene networks, revealing how plants respond to developmental cues and their environment, allowing a better understanding of the relationships between genes and their products, and uniting the “omics” fields of transcriptomics, proteomics, and metabolomics into a now common systems biology paradigm. We provide an overview of the breadth of such studies and summarize the range of RNA-seq protocols that have been developed to address questions spanning cell type-specific-based transcriptomics, transcript secondary structure and gene mapping.
RNA-seq; plant transcriptome; transcriptomics; systems biology; next generation sequencing
Deep sequencing is a powerful tool for novel small RNA discovery. Illumina small RNA sequencing library preparation requires a pre-adenylated 3’ end adapter containing a 5’,5’-adenyl pyrophosphoryl moiety. In the absence of ATP, this adapter can be ligated to the 3’ hydroxyl group of small RNA, while RNA self-ligation and concatenation are repressed. Pre-adenylated adapters are one of the most essential and costly components required for library preparation, and few are commercially available.
We demonstrate that a DNA oligo with 5’ phosphate and 3’ amine groups can be enzymatically adenylated by T4 RNA ligase 1 to generate customized pre-adenylated adapters. We have constructed and sequenced a small RNA library for tomato (Solanum lycopersicum) using the T4 RNA ligase 1 adenylated adapter.
We provide an efficient and low-cost method for small RNA sequencing library preparation, which takes two days to complete and costs around $20 per library. This protocol has been tested in several plant species for small RNA sequencing including sweet potato, pepper, watermelon, and cowpea, and could be readily applied to any RNA samples.
Small RNA sequencing; Directional mRNA sequencing; 3’ RNA adapter; Adenylation; T4 RNA ligase 1
The TIFY gene family constitutes a plant-specific group of genes with a broad range of functions. This family encodes four subfamilies of proteins, including ZML, TIFY, PPD and JASMONATE ZIM-Domain (JAZ) proteins. JAZ proteins are targets of the SCFCOI1 complex, and function as negative regulators in the JA signaling pathway. Recently, it has been reported in both Arabidopsis and rice that TIFY genes, and especially JAZ genes, may be involved in plant defense against insect feeding, wounding, pathogens and abiotic stresses. Nonetheless, knowledge concerning the specific expression patterns and evolutionary history of plant TIFY family members is limited, especially in a woody species such as grape.
A total of two TIFY, four ZML, two PPD and 11 JAZ genes were identified in the Vitis vinifera genome. Phylogenetic analysis of TIFY protein sequences from grape, Arabidopsis and rice indicated that the grape TIFY proteins are more closely related to those of Arabidopsis than those of rice. Both segmental and tandem duplication events have been major contributors to the expansion of the grape TIFY family. In addition, synteny analysis between grape and Arabidopsis demonstrated that homologues of several grape TIFY genes were found in the corresponding syntenic blocks of Arabidopsis, suggesting that these genes arose before the divergence of lineages that led to grape and Arabidopsis. Analyses of microarray and quantitative real-time RT-PCR expression data revealed that grape TIFY genes are not major players in the defense against biotrophic pathogens or viruses. However, many of these genes were responsive to JA and ABA, but not SA or ET.
The genome-wide identification, evolutionary and expression analyses of grape TIFY genes should facilitate further research of this gene family and provide new insights regarding their evolutionary history and regulatory control.
Small RNAs (sRNA), including microRNAs (miRNA) and small interfering RNAs (siRNA), are produced abundantly in plants and animals and function in regulating gene expression or in defense against virus or viroid infection. Analysis of siRNA profiles upon virus infection in plants may allow for virus identification, strain differentiation, and de novo assembly of virus genomes. In the present study, four suspected virus-infected tomato samples collected in the U.S. and Mexico were used for sRNA library construction and deep sequencing. Each library generated 5–7 million sRNA reads, of which more than 90% were from the tomato genome. Upon in-silico subtraction of the tomato sRNAs, the remaining highly enriched, virus-like siRNA pools were assembled with or without reference virus or viroid genomes. A complete genome was assembled for Potato spindle tuber viroid (PSTVd) using siRNA alone. In addition, a near-complete virus genome (98%) was also assembled for Pepino mosaic virus (PepMV). A common mixed infection of two strains of PepMV (EU and US1), which shared 82% genome nucleotide sequence identity, could also be differentially assembled into their respective genomes. Using de novo assembly, a novel potyvirus with less than 60% overall genome nucleotide sequence identity to other known viruses was discovered and its full genome sequence obtained. Taken together, these data suggest that the sRNA deep sequencing technology will likely become an efficient and powerful generic tool for virus identification in plants and animals.
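The in-silico subtraction step described above can be sketched as follows. This is a minimal illustration only: it assumes exact-match filtering of reads against a host sequence on both strands, whereas a real pipeline would normally use a short-read aligner against the full tomato genome. The function names `reverse_complement` and `subtract_host_reads` and the toy sequences are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def subtract_host_reads(reads, host_genome):
    """Keep only reads with no exact match to either strand of the host.

    Assumption: exact substring matching stands in for read alignment.
    """
    host = host_genome.upper()
    kept = []
    for read in reads:
        r = read.upper()
        if r in host or reverse_complement(r) in host:
            continue  # host-derived read: subtract it
        kept.append(read)
    return kept


# Toy example (hypothetical sequences, far shorter than real sRNA reads).
host = "ACGTACGTTTGACCA"
reads = ["ACGTAC", "TGGTCA", "GGGCCC"]
remaining = subtract_host_reads(reads, host)
```

In this toy case the first read matches the forward strand and the second matches the reverse strand, so only the third read remains as a candidate for assembly.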
The completion of the grape genome sequencing project has paved the way for novel gene discovery and functional analysis. Aldehyde dehydrogenases (ALDHs) comprise a gene superfamily encoding NAD(P)+-dependent enzymes that catalyze the irreversible oxidation of a wide range of endogenous and exogenous aromatic and aliphatic aldehydes. Although ALDHs have been systematically investigated in several plant species including Arabidopsis and rice, our knowledge concerning the ALDH genes, their evolutionary relationship and expression patterns in grape has been limited.
A total of 23 ALDH genes were identified in the grape genome and grouped into ten families according to the unified nomenclature system developed by the ALDH Gene Nomenclature Committee (AGNC). Members within the same grape ALDH families possess nearly identical exon-intron structures. Evolutionary analysis indicates that both segmental and tandem duplication events have contributed significantly to the expansion of grape ALDH genes. Phylogenetic analysis of ALDH protein sequences from seven plant species indicates that grape ALDHs are more closely related to those of Arabidopsis. In addition, synteny analysis between grape and Arabidopsis shows that homologs of a number of grape ALDHs are found in the corresponding syntenic blocks of Arabidopsis, suggesting that these genes arose before the speciation of grape and Arabidopsis. Microarray gene expression analysis revealed a large number of grape ALDH genes responsive to drought or salt stress. Furthermore, we found that a number of ALDH genes showed significantly changed expression in response to infection with different pathogens and during grape berry development, suggesting novel roles of ALDH genes in plant-pathogen interactions and berry development.
The genome-wide identification, evolutionary and expression analysis of grape ALDH genes should facilitate research in this gene family and provide new insights regarding their evolutionary history and functional roles in plant stress tolerance.
Powdery mildew (PM), caused by the fungus Erysiphe necator, is one of the most devastating diseases of grapevine. To better understand the grapevine-PM interaction and provide candidate resources for grapevine breeding, a suppression subtractive hybridization (SSH) cDNA library was constructed from E. necator-infected leaves of a resistant Chinese wild Vitis quinquangularis clone “Shang-24”. A total of 492 high quality expressed sequence tags (ESTs) were obtained and assembled into 266 unigenes. Gene ontology (GO) analysis indicated that 188 unigenes could be assigned at least one GO term in the biological process category, and 176 in the molecular function category. Sequence analysis showed that a large number of these genes were homologous to those involved in defense responses. Genes involved in metabolism, photosynthesis, transport and signal transduction were also enriched in the library. Expression analysis of 13 selected genes by qRT-PCR revealed that most were induced more quickly and intensely by E. necator infection in the resistant material “Shang-24” than in the sensitive V. pseudoreticulata clone “Hunan-1”. The ESTs reported here provide new clues for understanding the disease-resistance mechanism in Chinese wild grapevine species and may enable us to investigate E. necator-responsive genes involved in PM resistance in grapevine germplasm.
Chinese wild Vitis quinquangularis; Erysiphe necator; SSH; EST; qRT-PCR
Noncoding RNAs (ncRNA) are widely expressed in both prokaryotes and eukaryotes. Eukaryotic ncRNAs are commonly micro- and small-interfering RNAs (18–25 nt) involved in posttranscriptional gene silencing, whereas prokaryotic ncRNAs vary in size and are involved in various aspects of gene regulation. Given the prokaryotic origin of organelles, the presence of ncRNAs might be expected; however, the full spectrum of organellar ncRNAs has not been determined systematically. Here, strand-specific RNA-Seq analysis was used to identify 107 candidate ncRNAs from Arabidopsis thaliana chloroplasts, primarily encoded opposite protein-coding and tRNA genes. Forty-eight ncRNAs were shown to accumulate by RNA gel blot as discrete transcripts in wild-type (WT) plants and/or the pnp1-1 mutant, which lacks the chloroplast ribonuclease polynucleotide phosphorylase (cpPNPase). Ninety-eight percent of the ncRNAs detected by RNA gel blot had different transcript patterns between WT and pnp1-1, suggesting cpPNPase has a significant role in chloroplast ncRNA biogenesis and accumulation. Analysis of materials deficient for other major chloroplast ribonucleases, RNase R, RNase E, and RNase J, showed differential effects on ncRNA accumulation and/or form, suggesting specificity in RNase-ncRNA interactions. 5′ end mapping demonstrates that some ncRNAs are transcribed from dedicated promoters, whereas others result from transcriptional read-through. Finally, correlations between accumulation of some ncRNAs and the symmetrically transcribed sense RNA are consistent with a role in RNA stability. Overall, our data suggest that this extensive population of ncRNAs has the potential to underpin a previously underappreciated regulatory mode in the chloroplast.
RNA-Seq; posttranscriptional regulation; transcription; organelle; plastid
Expressed Sequence Tags (ESTs) have played significant roles in gene discovery and gene functional analysis, especially for non-model organisms. For organisms with no full genome sequences available, ESTs are normally assembled into longer consensus sequences for further downstream analysis. However, current de novo EST assembly programs often generate a large number of assembly errors that negatively affect the downstream analysis. In order to generate more accurate consensus sequences from ESTs, tools are needed to reduce or eliminate errors from de novo assemblies.
We present iAssembler, a pipeline that can assemble large-scale ESTs into consensus sequences with significantly higher accuracy than existing assemblers. iAssembler employs the MIRA and CAP3 assemblers to generate initial assemblies, followed by identifying and correcting two common types of transcriptome assembly errors: 1) ESTs from different transcripts (mainly alternatively spliced transcripts or paralogs) are incorrectly assembled into the same contigs; and 2) ESTs from the same transcripts fail to be assembled together. iAssembler can be used to assemble ESTs generated using the traditional Sanger method and/or the Roche-454 massive parallel pyrosequencing technology.
We compared the performance of iAssembler and several other de novo EST assembly programs using both Roche-454 and Sanger EST datasets. The comparison demonstrated that iAssembler generated significantly more accurate consensus sequences than the other assembly programs.
Chloroplasts are the green plastids where photosynthesis takes place. The biogenesis of chloroplasts requires the coordinate expression of both nuclear and chloroplast genes and is regulated by developmental and environmental signals. Despite extensive studies of this process, the genetic basis and the regulatory control of chloroplast biogenesis and development remain to be elucidated.
The green cauliflower mutant causes ectopic development of chloroplasts in the curd tissue of the plant, turning the otherwise white curd green. To investigate the transcriptional control of chloroplast development, we compared gene expression between green and white curds using the RNA-seq approach. Deep sequencing produced over 15 million reads with lengths of 86 base pairs from each cDNA library. A total of 7,155 genes were found to exhibit at least 3-fold changes in expression between green and white curds. These included light-regulated genes, genes encoding chloroplast constituents, and genes involved in chlorophyll biosynthesis. Moreover, we discovered that the cauliflower ELONGATED HYPOCOTYL5 (BoHY5) was more highly expressed in green curds than in white curds and that 2,616 HY5-targeted genes, including 1,600 up-regulated genes and 1,016 down-regulated genes, were differentially expressed in green in comparison to white curd tissue. All of these 1,600 up-regulated genes were HY5-targeted genes in the light.
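The "at least 3-fold change" criterion above can be illustrated with a minimal sketch. It assumes normalized expression values per gene for the two tissues and adds a pseudocount to avoid division by zero. The abstract does not state the normalization or any statistical test used, so those are not modeled here. The function name `fold_change_filter`, the pseudocount and the toy values are hypothetical.

```python
import math


def fold_change_filter(expr_a, expr_b, min_fold=3.0, pseudocount=1.0):
    """Return {gene: log2 fold change (a vs. b)} for genes changing >= min_fold.

    expr_a, expr_b -- dicts: gene -> normalized expression value
    pseudocount    -- assumed constant added to both values before the ratio
    """
    threshold = math.log2(min_fold)
    selected = {}
    # Only genes measured in both samples can be compared.
    for gene in expr_a.keys() & expr_b.keys():
        ratio = (expr_a[gene] + pseudocount) / (expr_b[gene] + pseudocount)
        log2fc = math.log2(ratio)
        if abs(log2fc) >= threshold:  # up- or down-regulated by >= min_fold
            selected[gene] = log2fc
    return selected


# Toy values for illustration only (hypothetical, not data from the study).
green = {"geneA": 399.0, "geneB": 50.0, "geneC": 9.0}
white = {"geneA": 99.0, "geneB": 49.0, "geneC": 39.0}
changed = fold_change_filter(green, white)
```

Working on the log2 scale treats up- and down-regulation symmetrically: a 4-fold increase and a 4-fold decrease give +2 and -2 respectively.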
The genome-wide profiling of gene expression by RNA-seq in green curds led to the identification of large numbers of genes associated with chloroplast development, and suggested the role of regulatory genes in the high hierarchy of light signaling pathways in mediating the ectopic chloroplast development in the green curd cauliflower mutant.
As more and more genomes are sequenced, genome annotation becomes increasingly important in bridging the gap between sequence and biology. Gene prediction, which is at the center of genome annotation, usually integrates various resources to compute consensus gene structures. However, many newly sequenced genomes have limited resources for gene predictions. In an effort to create high-quality gene models of the cucumber genome (Cucumis sativus var. sativus), based on the EVidenceModeler gene prediction pipeline, we incorporated the massively parallel complementary DNA sequencing (RNA-Seq) reads of 10 cucumber tissues into EVidenceModeler. We applied the new pipeline to the reassembled cucumber genome and included a comparison between our predicted protein-coding gene sets and a published set.
The reassembled cucumber genome, annotated with RNA-Seq reads from 10 tissues, has 23,248 identified protein-coding genes. Compared with the published prediction in 2009, approximately 8,700 genes reveal structural modifications and 5,285 genes only appear in the reassembled cucumber genome. All the related results, including genome sequence and annotations, are available at http://cmb.bnu.edu.cn/Cucumis_sativus_v20/.
We conclude that RNA-Seq greatly improves the accuracy of prediction of protein-coding genes in the reassembled cucumber genome. The comparison between the two gene sets also suggests that it is feasible to use RNA-Seq reads to annotate newly sequenced or less-studied genomes.
Cultivated watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] is an important agricultural crop worldwide. The fruit of watermelon undergoes distinct stages of development with dramatic changes in its size, color, sweetness, texture and aroma. In order to better understand the genetic and molecular basis of these changes and significantly expand the watermelon transcript catalog, we have selected four critical stages of watermelon fruit development and used Roche/454 next-generation sequencing technology to generate a large expressed sequence tag (EST) dataset and a comprehensive transcriptome profile for watermelon fruit flesh tissues.
We performed a half Roche/454 GS-FLX run for each of the four watermelon fruit developmental stages (immature white, white-pink flesh, red flesh and over-ripe) and obtained 577,023 high quality ESTs with an average length of 302.8 bp. De novo assembly of these ESTs together with 11,786 watermelon ESTs collected from GenBank produced 75,068 unigenes with a total length of approximately 31.8 Mb. Overall, 54.9% of the unigenes showed significant similarities to known sequences in the GenBank non-redundant (nr) protein database and around two-thirds of them matched proteins of cucumber, the most closely related species with a sequenced genome. The unigenes were further assigned gene ontology (GO) terms and mapped to biochemical pathways. More than 5,000 SSRs were identified from the EST collection. Furthermore, we carried out digital gene expression analysis of these ESTs and identified 3,023 genes that were differentially expressed during watermelon fruit development and ripening, which provided novel insights into watermelon fruit biology and a comprehensive resource of candidate genes for future functional analysis. We then generated profiles of several interesting metabolites that are important to fruit quality including pigmentation and sweetness. Integrative analysis of metabolite and digital gene expression profiles helped elucidate molecular mechanisms governing these important quality-related traits during watermelon fruit development.
We have generated a large collection of watermelon ESTs, which represents a significant expansion of the current transcript catalog of watermelon and a valuable resource for future studies on the genomics of watermelon and other closely-related species. Digital expression analysis of this EST collection allowed us to identify a large set of genes that were differentially expressed during watermelon fruit development and ripening, which provide a rich source of candidates for future functional analysis and represent a valuable increase in our knowledge base of watermelon fruit biology.
Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited.
We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences.
The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns.
Fruit development, maturation and ripening consist of a complex series of biochemical and physiological changes that in climacteric fruits, including apple and tomato, are coordinated by the gaseous hormone ethylene. These changes lead to final fruit quality, and understanding of the functional machinery underlying these processes is of both biological and practical importance. To date many reports have been made on the analysis of gene expression in apple. In this study we focused our investigation on the role of ethylene during apple maturation, specifically comparing transcriptomics of normal ripening with changes resulting from application of the hormone receptor competitor 1-methylcyclopropene.
To gain insight into the molecular process regulating ripening in apple, and to compare it to tomato (a model species for ripening studies), we utilized both homologous and heterologous (tomato) microarrays to profile transcriptome dynamics of genes involved in fruit development and ripening, emphasizing those which are ethylene regulated.
The use of both types of microarrays facilitated transcriptome comparison between apple and tomato (for the latter, using data previously published and available at the TED: tomato expression database) and highlighted genes conserved during ripening of both species, which in turn represent a foundation for further comparative genomic studies.
The cross-species analysis had the secondary aim of examining the efficiency of heterologous (specifically tomato) microarray hybridization for candidate gene identification as related to the ripening process. The resulting transcriptomics data revealed coordinated gene expression during fruit ripening of a subset of ripening-related and ethylene responsive genes, further facilitating the analysis of ethylene response during fruit maturation and ripening.
Our combined strategy based on microarray hybridization enabled transcriptome characterization during normal climacteric apple ripening, as well as definition of ethylene-dependent transcriptome changes. Comparison with tomato fruit maturation and ethylene responsive transcriptome activity facilitated identification of putative conserved orthologous ripening-related genes, which serve as an initial set of candidates for assessing conservation of gene activity across genomes of fruit bearing plant species.
The Tomato Functional Genomics Database (TFGD) provides a comprehensive resource to store, query, mine, analyze, visualize and integrate large-scale tomato functional genomics data sets. The database is functionally expanded from the previously described Tomato Expression Database by including metabolite profiles as well as large-scale tomato small RNA (sRNA) data sets. Computational pipelines have been developed to process the microarray, metabolite and sRNA data sets archived in the database, and TFGD provides downloads of all the analyzed results. TFGD is also designed to enable users to easily retrieve biologically important information through a set of efficient query interfaces and analysis tools, including improved array probe annotations as well as tools to identify co-expressed genes, significantly affected biological processes and biochemical pathways from gene expression data sets and miRNA targets, and to integrate transcript and metabolite profiles, and sRNA and mRNA sequences. The suite of tools and interfaces in TFGD allows intelligent data mining of recently released and continually expanding large-scale tomato functional genomics data sets. TFGD is available at http://ted.bti.cornell.edu.
To unravel the molecular mechanisms of drought responses in tomato, gene expression profiles of two drought-tolerant lines identified from a population of Solanum pennellii introgression lines, and the recurrent parent S. lycopersicum cv. M82, a drought-sensitive cultivar, were investigated under drought stress using tomato microarrays. Around 400 genes were identified as responsive to drought stress only in the drought-tolerant lines. These changes in gene expression are most likely caused by the two inserted chromosome segments of S. pennellii, which possibly contain drought-tolerance quantitative trait loci (QTLs). Among these genes are a number of transcription factors and signalling proteins which could be global regulators involved in the tomato responses to drought stress. Genes involved in organism growth and development processes were also specifically regulated by drought stress, including those controlling cell wall structure, wax biosynthesis, and plant height. Moreover, key enzymes in the pathways of gluconeogenesis (fructose-bisphosphate aldolase), purine and pyrimidine nucleotide biosynthesis (adenylate kinase), tryptophan degradation (aldehyde oxidase), starch degradation (β-amylase), methionine biosynthesis (cystathionine β-lyase), and the removal of superoxide radicals (catalase) were also specifically affected by drought stress. These results indicated that tomato plants could adapt to water-deficit conditions through decreasing energy dissipation, increasing ATP energy provision, and reducing oxidative damage. The drought-responsive genes identified in this study could provide further information for understanding the mechanisms of drought tolerance in tomato.
Drought stress; gene expression; introgression lines; microarray; tomato
Cucumber, Cucumis sativus L., is an economically and nutritionally important crop of the Cucurbitaceae family and has long served as a primary model system for sex determination studies. Recently, the sequencing of its whole genome has been completed. However, transcriptome information of this species is still scarce, with a total of around 8,000 Expressed Sequence Tag (EST) and mRNA sequences currently available in GenBank. In order to gain more insights into molecular mechanisms of plant sex determination and provide the community a functional genomics resource that will facilitate cucurbit research and breeding, we performed transcriptome sequencing of cucumber flower buds of two near-isogenic lines, WI1983G, a gynoecious plant which bears only pistillate flowers, and WI1983H, a hermaphroditic plant which bears only bisexual flowers.
Using Roche-454 massively parallel pyrosequencing technology, we generated a total of 353,941 high quality EST sequences with an average length of 175 bp, among which 188,255 were from gynoecious flowers and 165,686 from hermaphroditic flowers. These EST sequences, together with ~5,600 high quality cucumber EST and mRNA sequences available in GenBank, were clustered and assembled into 81,401 unigenes, of which 28,452 were contigs and 52,949 were singletons. The unigenes and ESTs were further mapped to the cucumber genome and more than 500 alternative splicing events were identified in 443 cucumber genes. The unigenes were further functionally annotated by comparing their sequences to different protein and functional domain databases and were assigned Gene Ontology (GO) terms. A biochemical pathway database containing 343 predicted pathways was also created based on the annotations of the unigenes. Digital expression analysis identified ~200 differentially expressed genes between flowers of WI1983G and WI1983H and provided novel insights into the molecular mechanisms of the plant sex determination process. Furthermore, a set of SSR motifs and high confidence SNPs between WI1983G and WI1983H were identified from the ESTs, which provided the material basis for future genetic linkage and QTL analysis.
A large set of EST sequences was generated from cucumber flower buds of two different sex types. Differentially expressed genes between these two different sex-type flowers, as well as putative SSR and SNP markers, were identified. These EST sequences provide valuable information to further understand the molecular mechanisms of the plant sex determination process and form a rich resource for future functional genomics analysis, marker development and cucumber breeding.
Epimedium sagittatum (Sieb. et Zucc.) Maxim, a traditional Chinese medicinal plant species, has been used extensively as genuine medicinal material. Certain Epimedium species are endangered due to commercial overexploitation, while sustainable application studies, conservation genetics, systematics, and marker-assisted selection (MAS) of Epimedium are less studied due to the lack of molecular markers. Here, we report a set of expressed sequence tags (ESTs) and simple sequence repeats (SSRs) identified in these ESTs for E. sagittatum.
cDNAs of E. sagittatum are sequenced using 454 GS-FLX pyrosequencing technology. The raw reads are cleaned and assembled into a total of 76,459 consensus sequences comprising 17,231 contigs and 59,228 singlets. About 38.5% (29,466) of the consensus sequences significantly match the non-redundant protein database (E-value < 1e-10), 22,295 of which are further annotated using Gene Ontology (GO) terms. A total of 2,810 EST-SSRs are identified from the Epimedium EST dataset. Trinucleotide SSRs are the dominant repeat type (55.2%), followed by dinucleotide (30.4%), tetranucleotide (7.3%), hexanucleotide (4.9%), and pentanucleotide (2.2%) SSRs. The dominant repeat motif is AAG/CTT (23.6%), followed by AG/CT (19.3%), ACC/GGT (11.1%), AT/AT (7.5%), and AAC/GTT (5.9%). Thirty-two SSR-ESTs are randomly selected and primer pairs are synthesized for testing the transferability across 52 Epimedium species. Eighteen primer pairs (85.7%) could be successfully transferred to Epimedium species, and sixteen of those show high genetic diversity, with an observed heterozygosity (Ho) of 0.35, an expected heterozygosity (He) of 0.65, and a high number of alleles per locus (11.9).
A large EST dataset with a total of 76,459 consensus sequences is generated, aiming to provide sequence information for deciphering secondary metabolism, especially the flavonoid pathway, in Epimedium. A total of 2,810 EST-SSRs are identified from the EST dataset and ~1580 EST-SSR markers are transferable. E. sagittatum EST-SSR transferability to the major Epimedium germplasm is up to 85.7%. Therefore, this EST dataset and these EST-SSRs will be a powerful resource for further studies such as taxonomy, molecular breeding, genetics, genomics, and secondary metabolism in Epimedium species.
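The EST-SSR survey described above can be illustrated with a minimal Python sketch. The abstract does not state which SSR-detection tool or repeat thresholds were used, so the minimum repeat counts in MIN_REPEATS, the function names, and the toy sequence below are all assumptions made for illustration only. Motifs are grouped into classes such as AAG/CTT by taking the alphabetically smallest rotation of the motif or of its reverse complement.

```python
import re
from collections import Counter

# Minimum number of repeats per motif length (ASSUMED thresholds, not from the source)
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

_COMP = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Collapse a motif to its class, e.g. CTT, TTC and GAA all map to AAG."""
    revcomp = motif.translate(_COMP)[::-1]
    rotations = [s[i:] + s[:i] for s in (motif, revcomp) for i in range(len(s))]
    return min(rotations)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures one motif; \1{n-1,} demands at least n copies in total
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. AA, ATAT)
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


# Toy sequence (hypothetical): one dinucleotide and one trinucleotide SSR
toy_est = "TTGC" + "AG" * 6 + "CCATC" + "AAG" * 5 + "TC"
toy_hits = find_ssrs(toy_est)

# Tally motif classes the way the abstract reports them (AAG/CTT, AG/CT, ...)
class_counts = Counter(canonical_motif(motif) for _, motif, _ in toy_hits)
```

Run over a full set of consensus sequences, the same tally divided by the total number of SSRs would give percentages comparable in kind to those reported, although the exact values depend on the thresholds chosen.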
Gene regulation is a key mechanism in higher eukaryotic cellular processes. One of the major challenges in gene regulation studies is to identify regulators affecting the expression of their target genes in specific biological processes. Despite their importance, regulators involved in diverse biological processes still remain largely unrevealed. In the present study, we propose a kernel-based approach to efficiently identify core regulatory elements involved in specific biological processes using gene expression profiles.
We developed a framework that can detect correlations between gene expression profiles and the upstream sequences on the basis of the kernel canonical correlation analysis (kernel CCA). Using a yeast cell cycle dataset, we demonstrated that upstream sequence patterns were closely related to gene expression profiles based on the canonical correlation scores obtained by measuring the correlation between them. Our results showed that the cell cycle-specific regulatory motifs could be found successfully based on the motif weights derived through kernel CCA. Furthermore, we identified co-regulatory motif pairs using the same framework.
Given expression profiles, our method was able to identify regulatory motifs involved in specific biological processes. The method could be applied to the elucidation of the unknown regulatory mechanisms associated with complex gene regulatory processes.
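The idea of correlating expression profiles with upstream sequence features through kernel CCA can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a standard regularized kernel CCA formulation, linear kernels for both views, a regularization value of 0.1, and synthetic data in which one "motif score" column shares a hidden signal with one expression column. All function and variable names are hypothetical.

```python
import numpy as np


def center_kernel(K):
    """Center a kernel matrix in feature space."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kernel_cca(Kx, Ky, reg=0.1):
    """First pair of regularized kernel CCA directions.

    Solves (Kx + reg*I)^-1 Ky (Ky + reg*I)^-1 Kx alpha = lambda^2 alpha
    and returns (alpha, beta, correlation between the two projections).
    """
    n = Kx.shape[0]
    Kx, Ky = center_kernel(Kx), center_kernel(Ky)
    I = np.eye(n)
    M = np.linalg.solve(Kx + reg * I, Ky) @ np.linalg.solve(Ky + reg * I, Kx)
    eigvals, eigvecs = np.linalg.eig(M)
    top = np.argmax(eigvals.real)          # leading canonical direction
    alpha = eigvecs[:, top].real
    beta = np.linalg.solve(Ky + reg * I, Kx @ alpha)
    u, v = Kx @ alpha, Ky @ beta           # projections of the two views
    return alpha, beta, float(np.corrcoef(u, v)[0, 1])


# Synthetic data (ASSUMED): 60 genes, a hidden signal z shared by both views
rng = np.random.default_rng(0)
n_genes = 60
z = rng.normal(size=n_genes)
expr = np.column_stack([z + 0.1 * rng.normal(size=n_genes),
                        rng.normal(size=n_genes)])
motifs = np.column_stack([rng.normal(size=n_genes),
                          z + 0.1 * rng.normal(size=n_genes),
                          rng.normal(size=n_genes)])

# Linear kernels: gene-by-gene similarity within each view
alpha, beta, corr = kernel_cca(expr @ expr.T, motifs @ motifs.T)

# With a linear kernel, per-motif weights can be read back from beta
motif_weights = (motifs - motifs.mean(axis=0)).T @ beta
```

In this toy setting the motif column that carries the shared signal receives the largest absolute weight, which mirrors the way motif weights are used in the abstract to pick out process-specific regulatory motifs.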
MicroRNAs (miRNAs) are small and noncoding RNAs that play important roles in various biological processes. They regulate target mRNAs post-transcriptionally through complementary base pairing. Since the changes of miRNAs affect the expression of target genes, the expression levels of target genes in specific biological processes could be different from those of non-target genes. Here we demonstrate that gene expression profiles contain useful information in separating miRNA targets from non-targets.
The gene expression profiles related to various developmental processes and stresses, as well as the sequences of miRNAs and mRNAs in Arabidopsis, were used to determine whether a given gene is a miRNA target. The determination is based on a model combining a support vector machine (SVM) classifier with a scoring method based on complementary base pairing between miRNAs and mRNAs. The proposed model yielded a low false positive rate and retrieved condition-specific candidate targets through a genome-wide screening.
Our approach provides a novel framework for screening target genes by considering the gene regulation of miRNAs. It can be broadly applied to identify condition-specific targets computationally by embedding information from gene expression profiles.
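The sequence half of such a model, scoring complementary base pairing between a miRNA and a candidate site, can be sketched in a few lines of Python. The abstract does not give the scoring scheme, so the penalties used here (1 per mismatch, 0.5 per G:U wobble, no gaps) are an assumption borrowed from common plant miRNA target-prediction practice, and the sequences are hypothetical toy examples. The expression-based SVM component is not shown.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
WOBBLE = {("G", "U"), ("U", "G")}


def reverse_complement(rna):
    """Reverse complement of an RNA sequence (the perfectly paired site)."""
    return "".join(COMPLEMENT[b] for b in reversed(rna.upper().replace("T", "U")))


def pairing_penalty(mirna, site):
    """Penalty for an ungapped miRNA:site duplex; lower means better pairing.

    Both sequences are written 5'->3' and must have the same length.
    ASSUMED scheme: mismatch = 1.0, G:U wobble = 0.5, match = 0.
    """
    mirna = mirna.upper().replace("T", "U")
    site = site.upper().replace("T", "U")
    if len(mirna) != len(site):
        raise ValueError("miRNA and site must have the same length")
    penalty = 0.0
    # The miRNA 5' end pairs with the 3' end of the target site
    for m, t in zip(mirna, reversed(site)):
        if COMPLEMENT[m] == t:
            continue
        penalty += 0.5 if (m, t) in WOBBLE else 1.0
    return penalty


def best_site(mirna, mrna):
    """Slide along the mRNA and return (lowest penalty, start position)."""
    k = len(mirna)
    if len(mrna) < k:
        raise ValueError("mRNA is shorter than the miRNA")
    return min((pairing_penalty(mirna, mrna[i:i + k]), i)
               for i in range(len(mrna) - k + 1))


# Hypothetical toy sequences, far shorter than a real ~21-nt miRNA
toy_mirna = "UGACAG"
toy_mrna = "AAAA" + reverse_complement(toy_mirna) + "GGGG"
toy_best = best_site(toy_mirna, toy_mrna)
```

A candidate would then be kept only if its pairing penalty falls below a chosen cutoff and the expression-based classifier also supports it; both the cutoff and the way the two scores are combined would need to follow the original method.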
The unique flavour of a tomato fruit is the sum of a complex interaction among sugars, acids, and a large set of volatile compounds. While it is generally acknowledged that the flavour of commercially produced tomatoes is inferior, the biochemical and genetic complexity of the trait has made breeding for improved flavour extremely difficult. The volatiles, in particular, present a major challenge for flavour improvement, being generated from a diverse set of lipid, amino acid, and carotenoid precursors. Very few genes controlling their biosynthesis have been identified. New quantitative trait loci (QTLs) that affect the volatile emissions of red-ripe fruits are described here. A population of introgression lines derived from a cross between the cultivated tomato Solanum lycopersicum and its wild relative, S. habrochaites, was characterized over multiple seasons and locations. A total of 30 QTLs affecting the emission of one or more volatiles were mapped. The data from this mapping project, combined with previously collected data on an IL population derived from a cross between S. lycopersicum and S. pennellii populations, were used to construct a correlational database. A metabolite tree derived from these data provides new insights into the pathways for the synthesis of several of these volatiles. One QTL is a novel locus affecting fruit carotenoid content on chromosome 2. Volatile emissions from this and other lines indicate that the linear and cyclic apocarotenoid volatiles are probably derived from separate carotenoid pools.
Apocarotenoids; flavour; metabolism; quantitative trait loci; Solanum lycopersicum
Cultivated watermelons form large fruits that are highly variable in size, shape, color, and content, yet have extremely narrow genetic diversity. Whereas a plethora of genes involved in cell wall metabolism, ethylene biosynthesis, fruit softening, and secondary metabolism during fruit development and ripening have been identified in other plant species, little is known of the genes involved in these processes in watermelon. A microarray and quantitative Real-Time PCR-based study was conducted in watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] in order to elucidate the flow of events associated with fruit development and ripening in this species. RNA from three different maturation stages of watermelon fruits, as well as from leaf, was collected from field-grown plants during three consecutive years and analyzed for gene expression using high-density photolithography microarrays and quantitative PCR.
High-density photolithography arrays, composed of probes of 832 EST-unigenes from a subtracted, fruit development, cDNA library of watermelon were utilized to examine gene expression at three distinct time-points in watermelon fruit development. Analysis was performed with field-grown fruits over three consecutive growing seasons. Microarray analysis identified three hundred and thirty-five unique ESTs that are differentially regulated by at least two-fold in watermelon fruits during the early, ripening, or mature stage when compared to leaf. Of the 335 ESTs identified, 211 share significant homology with known gene products and 96 had no significant matches with any database accession. Of the modulated watermelon ESTs related to annotated genes, a significant number were found to be associated with or involved in the vascular system, carotenoid biosynthesis, transcriptional regulation, pathogen and stress response, and ethylene biosynthesis. Ethylene bioassays, performed with a closely related watermelon genotype with a similar phenotype, i.e. seeded, bright red flesh, dark green rind, etc., determined that ethylene levels were highest during the green fruit stage followed by a decrease during the white and pink fruit stages. Additionally, quantitative Real-Time PCR was used to validate modulation of 127 ESTs that were differentially expressed in developing and ripening fruits based on array analysis.
This study identified numerous ESTs with putative involvement in the watermelon fruit developmental and ripening process, in particular the involvement of the vascular system and ethylene. The production of ethylene during fruit development in watermelon gives further support to the role of ethylene in fruit development in non-climacteric fruits.