Baculoviruses are important insect pathogens that have been developed as protein expression vectors in insect cells and as transduction vectors for mammalian cells. They have large double-stranded DNA genomes containing approximately 156 tightly spaced genes, and they present significant challenges for transcriptome analysis. In this study, we report the first comprehensive analysis of AcMNPV transcription over the course of infection in Trichoplusia ni cells, by a combination of strand-specific RNA sequencing (RNA-Seq) and deep sequencing of 5′ capped transcription start sites and 3′ polyadenylation sites. We identified four clusters of genes associated with distinctive patterns of mRNA accumulation through the AcMNPV infection cycle. A total of 218 transcription start sites (TSS) and 120 polyadenylation sites (PAS) were mapped. Only 29 TSS were associated with a canonical TATA box, and 14 initiated within or near the previously identified CAGT initiator motif. The majority of viral transcripts (126) initiated within the baculovirus late promoter motif (TAAG), and late transcripts initiated precisely at the second position of the motif. Analysis of 3′ ends showed that 92 (77%) of the 3′ PAS were located within 30 nucleotides (nt) downstream of a consensus termination signal (AAUAAA or AUUAAA). A conserved U-rich region was found approximately 2 to 10 nt downstream of the PAS for 58 transcripts. Twelve splicing events and an unexpectedly large number of antisense RNAs were identified, revealing new details of possible regulatory mechanisms controlling AcMNPV gene expression. Combined, these data provide an emerging global picture of the organization and regulation of AcMNPV transcription through the infection cycle.
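The 3′ end criterion described above (a polyadenylation site lying within 30 nt downstream of AAUAAA or AUUAAA) can be illustrated with a minimal sketch. It assumes the transcript strand is supplied as DNA, that the site is a 0-based index, and that the whole hexamer must fall inside the 30 nt preceding the site; the function name and these conventions are illustrative assumptions, not the procedure used in the study.

```python
import re

# Consensus signals named in the abstract, written as DNA (U -> T).
SIGNAL = re.compile(r"AATAAA|ATTAAA")

def has_upstream_signal(seq, pas, window=30):
    """Return True if a consensus signal lies fully within the
    `window` nt immediately upstream of position `pas`.

    seq: transcript-strand sequence, 5'->3', as DNA (assumed input form)
    pas: 0-based index of the polyadenylation site (assumed convention)
    """
    start = max(0, pas - window)
    region = seq[start:pas].upper()
    return SIGNAL.search(region) is not None
```

Applied to every mapped site, a check of this kind would yield a count comparable in spirit to the 92 of 120 sites reported, although the exact distance definition used by the authors is not stated in the abstract.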
Many fruits, including watermelon, are proficient in carotenoid accumulation during ripening. While most genes encoding steps in the carotenoid biosynthetic pathway have been cloned, few transcriptional regulators of these genes have been defined to date. Here we describe the identification of a set of putative carotenoid-related transcription factors resulting from fresh watermelon carotenoid and transcriptome analysis during fruit development and ripening. Our goal is to both clarify the expression profiles of carotenoid pathway genes and to identify candidate regulators and molecular targets for crop improvement.
Total carotenoids progressively increased during fruit ripening up to ~55 μg g-1 fw in red-ripe fruits. Trans-lycopene was the carotenoid that contributed most to this increase. Many of the genes related to carotenoid metabolism displayed changing expression levels during fruit ripening generating a metabolic flux toward carotenoid synthesis. Constitutive low expression of lycopene cyclase genes resulted in lycopene accumulation. RNA-seq expression profiling of watermelon fruit development yielded a set of transcription factors whose expression was correlated with ripening and carotenoid accumulation. Nineteen putative transcription factor genes from watermelon and homologous to tomato carotenoid-associated genes were identified. Among these, six were differentially expressed in the flesh of both species during fruit development and ripening.
Taken together the data suggest that, while the regulation of a common set of metabolic genes likely influences carotenoid synthesis and accumulation in watermelon and tomato fruits during development and ripening, specific and limiting regulators may differ between climacteric and non-climacteric fruits, possibly related to their differential susceptibility to and use of ethylene during ripening.
Carotenoid biosynthesis; Citrullus lanatus; Fruit ripening; Gene expression; Isoprenoids; Non-climacteric fruits; Transcription factors; Watermelon
Radish (Raphanus sativus L., 2n = 2× = 18) is an economically important vegetable crop worldwide. A large collection of radish expressed sequence tags (ESTs) has been generated but remains largely uncharacterized.
In this study, approximately 315,000 ESTs derived from 22 Raphanus cDNA libraries from 18 different genotypes were analyzed, for the purpose of gene and marker discovery and to evaluate large-scale genome duplication and phylogenetic relationships among Raphanus spp. The ESTs were assembled into 85,083 unigenes, of which 90%, 65%, 89% and 89% had homologous sequences in the GenBank nr, SwissProt, TrEMBL and Arabidopsis protein databases, respectively. A total of 66,194 (78%) could be assigned at least one gene ontology (GO) term. Comparative analysis identified 5,595 gene families unique to radish that were significantly enriched with genes related to small molecule metabolism, as well as 12,899 specific to the Brassicaceae that were enriched with genes related to seed oil body biogenesis and responses to phytohormones. The analysis further indicated that the divergence of radish and Brassica rapa occurred approximately 8.9-14.9 million years ago (MYA), following a whole-genome duplication event (12.8-21.4 MYA) in their common ancestor. An additional whole-genome duplication event in radish occurred at 5.1-8.4 MYA, after its divergence from B. rapa. A total of 13,570 simple sequence repeats (SSRs) and 28,758 high-quality single nucleotide polymorphisms (SNPs) were also identified. Using a subset of SNPs, the phylogenetic relationships of eight different accessions of Raphanus were inferred.
Comprehensive analysis of radish ESTs provided new insights into radish genome evolution and the phylogenetic relationships of different radish accessions. Moreover, the radish EST sequences and the associated SSR and SNP markers described in this study represent a valuable resource for radish functional genomics studies and breeding.
Radish; EST; SNP; SSR; Comparative analysis; Whole genome duplication; Phylogenetic relationship
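SSR discovery of the kind reported for the radish unigenes is often done by scanning sequences for short tandemly repeated motifs. The sketch below shows one simple way to do this with a regular expression. The motif lengths (2 to 6 nt), the minimum of four repeats, and the function name `find_ssrs` are illustrative assumptions; the abstract does not state which tool or thresholds were actually used.

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for each tandem repeat found.

    Thresholds are illustrative defaults, not those of the study.
    """
    # A lazily matched unit followed by at least (min_repeats - 1) copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits
```

For example, `find_ssrs("TTGC" + "AG" * 6 + "CCT" + "ATG" * 4 + "GA")` reports an AG repeat starting at position 4 and an ATG repeat starting at position 19.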
The complete genomic sequence of a new tobamovirus in tomatoes was determined through deep sequencing and assembly of small RNAs, then validated through Sanger sequencing. Based on the low sequence identity (≤85%) to known viruses and a close phylogenetic relationship to tobamoviruses, it was identified as a new species.
Chrysanthemum is one of the most important ornamental crops in the world and drought stress seriously limits its production and distribution. In order to generate a functional genomics resource and obtain a deeper understanding of the molecular mechanisms regarding chrysanthemum responses to dehydration stress, we performed large-scale transcriptome sequencing of chrysanthemum plants under dehydration stress using the Illumina sequencing technology.
Two cDNA libraries constructed from mRNAs of control and dehydration-treated seedlings were sequenced by Illumina technology. More than 100 million reads were generated and de novo assembled into 98,180 unique transcripts, which were further extensively annotated by comparing their sequences to different protein databases. Biochemical pathways were predicted from these transcript sequences. Furthermore, we performed gene expression profiling analysis upon dehydration treatment in chrysanthemum and identified 8,558 dehydration-responsive unique transcripts, including 307 transcription factors, 229 protein kinases, and many well-known stress-responsive genes. Gene ontology (GO) term enrichment and biochemical pathway analyses showed that dehydration stress caused changes in hormone response, secondary and amino acid metabolism, and light and photoperiod response. These findings suggest that drought tolerance of chrysanthemum plants may be related to the regulation of hormone biosynthesis and signaling, reduction of oxidative damage, stabilization of cell proteins and structures, and maintenance of energy and carbon supply.
Our transcriptome sequences can provide a valuable resource for chrysanthemum breeding and research and novel insights into chrysanthemum responses to dehydration stress and offer candidate genes or markers that can be used to guide future studies attempting to breed drought tolerant chrysanthemum cultivars.
Chrysanthemum; Dehydration stress; Gene expression; Pathways; RNA-seq; Transcriptome
MicroRNAs play an important role in plant development and plant responses to various biotic and abiotic stimuli. As one of the most important ornamental crops, rose (Rosa hybrida) possesses several specific morphological and physiological features, including recurrent flowering, highly divergent flower shapes, colors and volatiles. Ethylene plays an important role in regulating petal cell expansion during rose flower opening. Here, we report the population and expression profiles of miRNAs in rose petals during flower opening and in response to ethylene based on high throughput sequencing. We identified a total of 33 conserved miRNAs and 47 putative novel miRNAs from rose petals. The conserved and novel targets of those miRNAs were predicted using the rose floral transcriptome database. Expression profiling revealed that expression of 28 known (84.8% of known miRNAs) and 39 novel (83.0% of novel miRNAs) miRNAs was substantially changed in rose petals during the earlier opening period. We also found that 28 known and 22 novel miRNAs showed expression changes in response to ethylene treatment. Furthermore, we performed integrative analysis of expression profiles of miRNAs and their targets. We found that ethylene-caused expression changes of five miRNAs (miR156, miR164, miR166, miR5139 and rhy-miRC1) were inversely correlated with those of their seven target genes. These results indicate that these miRNA/target modules might be regulated by ethylene and involved in ethylene-regulated petal growth.
The SBP-box gene family is specific to plants and encodes a class of zinc finger-containing transcription factors with a broad range of functions. Although SBP-box genes have been identified in numerous plants including green algae, moss, silver birch, snapdragon, Arabidopsis, rice and maize, there is little information concerning SBP-box genes, or the corresponding miR156/157, function in grapevine.
Eighteen SBP-box gene family members were identified in Vitis vinifera, twelve of which bore sequences that were complementary to miRNA156/157. Phylogenetic reconstruction demonstrated that plant SBP-domain proteins could be classified into seven subgroups, with the V. vinifera SBP-domain proteins being more closely related to SBP-domain proteins from dicotyledonous angiosperms than those from monocotyledonous angiosperms. In addition, synteny analysis between grape and Arabidopsis demonstrated that homologs of several grape SBP genes were found in corresponding syntenic blocks of Arabidopsis. Expression analysis of the grape SBP-box genes in various organs and at different stages of fruit development in V. quinquangularis ‘Shang-24’ revealed distinct spatiotemporal patterns. While the majority of the grape SBP-box genes lacking a miR156/157 target site were expressed ubiquitously and constitutively, most genes bearing a miR156/157 target site exhibited distinct expression patterns, possibly due to the inhibitory role of the microRNA. Furthermore, microarray data mining and quantitative real-time RT-PCR analysis identified several grape SBP-box genes that are potentially involved in the defense against biotic and abiotic stresses.
The results presented here provide a further understanding of SBP-box gene function in plants and yield additional insights into the mechanism of stress management in grape, which may have important implications for the future success of this crop.
Chromoplasts are unique plastids that accumulate massive amounts of carotenoids. To gain a general and comparative characterization of chromoplast proteins, this study performed proteomic analysis of chromoplasts from six carotenoid-rich crops: watermelon, tomato, carrot, orange cauliflower, red papaya, and red bell pepper. Stromal and membrane proteins of chromoplasts were separated by 1D gel electrophoresis and analysed using nLC-MS/MS. A total of 953–2262 proteins from chromoplasts of different crop species were identified. Approximately 60% of the identified proteins were predicted to be plastid localized. Functional classification using MapMan bins revealed large numbers of proteins involved in protein metabolism, transport, amino acid metabolism, lipid metabolism, and redox in chromoplasts from all six species. Seventeen core carotenoid metabolic enzymes were identified. Phytoene synthase, phytoene desaturase, ζ-carotene desaturase, 9-cis-epoxycarotenoid dioxygenase, and carotenoid cleavage dioxygenase 1 were found in almost all crops, suggesting that they are relatively abundant among the carotenoid pathway enzymes. Chromoplasts from different crops contained abundant amounts of ATP synthase and adenine nucleotide translocator, which indicates an important role of ATP production and transport in chromoplast development. Distinctive abundant proteins were observed in chromoplasts from different crops, including capsanthin/capsorubin synthase and fibrillins in pepper, superoxide dismutase in watermelon, carrot, and cauliflower, and glutathione-S-transferase in papaya. The comparative analysis of chromoplast proteins among six crop species offers new insights into the general metabolism and function of chromoplasts as well as the uniqueness of chromoplasts in specific crop species. This work provides reference datasets for future experimental study of chromoplast biogenesis, development, and regulation in plants.
Carrot; cauliflower; chromoplast; papaya; pepper; proteomics; tomato; watermelon
Next generation DNA sequencing technologies are driving increasingly rapid, affordable and high resolution analyses of plant transcriptomes through sequencing of their associated cDNA (complementary DNA) populations; an analytical platform commonly referred to as RNA-sequencing (RNA-seq). Since entering the arena of whole genome profiling technologies only a few years ago, RNA-seq has proven itself to be a powerful tool with a remarkably diverse range of applications, from detailed studies of biological processes at the cell type-specific level, to providing insights into fundamental questions in plant biology on an evolutionary time scale. Applications include generating genomic data for heretofore unsequenced species, thus expanding the boundaries of what had been considered “model organisms,” elucidating structural and regulatory gene networks, revealing how plants respond to developmental cues and their environment, allowing a better understanding of the relationships between genes and their products, and uniting the “omics” fields of transcriptomics, proteomics, and metabolomics into a now common systems biology paradigm. We provide an overview of the breadth of such studies and summarize the range of RNA-seq protocols that have been developed to address questions spanning cell type-specific-based transcriptomics, transcript secondary structure and gene mapping.
RNA-seq; plant transcriptome; transcriptomics; systems biology; next generation sequencing
Deep sequencing is a powerful tool for novel small RNA discovery. Illumina small RNA sequencing library preparation requires a pre-adenylated 3’ end adapter containing a 5’,5’-adenyl pyrophosphoryl moiety. In the absence of ATP, this adapter can be ligated to the 3’ hydroxyl group of small RNA, while RNA self-ligation and concatenation are repressed. Pre-adenylated adapters are one of the most essential and costly components required for library preparation, and few are commercially available.
We demonstrate that a DNA oligo with 5’ phosphate and 3’ amine groups can be enzymatically adenylated by T4 RNA ligase 1 to generate customized pre-adenylated adapters. We have constructed and sequenced a small RNA library for tomato (Solanum lycopersicum) using the T4 RNA ligase 1 adenylated adapter.
We provide an efficient and low-cost method for small RNA sequencing library preparation, which takes two days to complete and costs around $20 per library. This protocol has been tested in several plant species for small RNA sequencing including sweet potato, pepper, watermelon, and cowpea, and could be readily applied to any RNA samples.
Small RNA sequencing; Directional mRNA sequencing; 3’ RNA adapter; Adenylation; T4 RNA ligase 1
The TIFY gene family constitutes a plant-specific group of genes with a broad range of functions. This family encodes four subfamilies of proteins, including ZML, TIFY, PPD and JASMONATE ZIM-Domain (JAZ) proteins. JAZ proteins are targets of the SCFCOI1 complex, and function as negative regulators in the JA signaling pathway. Recently, it has been reported in both Arabidopsis and rice that TIFY genes, and especially JAZ genes, may be involved in plant defense against insect feeding, wounding, pathogens and abiotic stresses. Nonetheless, knowledge concerning the specific expression patterns and evolutionary history of plant TIFY family members is limited, especially in a woody species such as grape.
A total of two TIFY, four ZML, two PPD and 11 JAZ genes were identified in the Vitis vinifera genome. Phylogenetic analysis of TIFY protein sequences from grape, Arabidopsis and rice indicated that the grape TIFY proteins are more closely related to those of Arabidopsis than those of rice. Both segmental and tandem duplication events have been major contributors to the expansion of the grape TIFY family. In addition, synteny analysis between grape and Arabidopsis demonstrated that homologues of several grape TIFY genes were found in the corresponding syntenic blocks of Arabidopsis, suggesting that these genes arose before the divergence of lineages that led to grape and Arabidopsis. Analyses of microarray and quantitative real-time RT-PCR expression data revealed that grape TIFY genes are not a major player in the defense against biotrophic pathogens or viruses. However, many of these genes were responsive to JA and ABA, but not SA or ET.
The genome-wide identification, evolutionary and expression analyses of grape TIFY genes should facilitate further research of this gene family and provide new insights regarding their evolutionary history and regulatory control.
Small RNAs (sRNA), including microRNAs (miRNA) and small interfering RNAs (siRNA), are produced abundantly in plants and animals and function in regulating gene expression or in defense against virus or viroid infection. Analysis of siRNA profiles upon virus infection in plants may allow for virus identification, strain differentiation, and de novo assembly of virus genomes. In the present study, four suspected virus-infected tomato samples collected in the U.S. and Mexico were used for sRNA library construction and deep sequencing. Each library generated 5–7 million sRNA reads, of which more than 90% were from the tomato genome. Upon in-silico subtraction of the tomato sRNAs, the remaining highly enriched, virus-like siRNA pools were assembled with or without reference virus or viroid genomes. A complete genome was assembled for Potato spindle tuber viroid (PSTVd) using siRNA alone. In addition, a near-complete virus genome (98%) was assembled for Pepino mosaic virus (PepMV). A common mixed infection of two strains of PepMV (EU and US1), which shared 82% genome nucleotide sequence identity, could also be differentially assembled into their respective genomes. Using de novo assembly, a novel potyvirus with less than 60% overall genome nucleotide sequence identity to other known viruses was discovered and its full genome sequence obtained. Taken together, these data suggest that the sRNA deep sequencing technology will likely become an efficient and powerful generic tool for virus identification in plants and animals.
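The in-silico subtraction step described above can be pictured with a toy sketch: reads that match the host sequence on either strand are discarded, and the remainder is passed on for assembly. This is only an illustration under simplifying assumptions (exact matching against a single in-memory host string; hypothetical function names). The abstract does not say which software was used, and a real analysis would rely on a dedicated short-read aligner and the full tomato genome.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def subtract_host(reads, host_seq):
    """Keep only reads with no exact match to the host on either strand.

    Exact substring matching is a stand-in for read alignment.
    """
    host = host_seq.upper()
    kept = []
    for read in reads:
        read = read.upper()
        if read in host or revcomp(read) in host:
            continue  # host-derived read: subtract it
        kept.append(read)
    return kept
```

The reads returned by `subtract_host` correspond to the enriched, virus-like pool that the abstract describes as the input to assembly.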
The completion of the grape genome sequencing project has paved the way for novel gene discovery and functional analysis. Aldehyde dehydrogenases (ALDHs) comprise a gene superfamily encoding NAD(P)+-dependent enzymes that catalyze the irreversible oxidation of a wide range of endogenous and exogenous aromatic and aliphatic aldehydes. Although ALDHs have been systematically investigated in several plant species including Arabidopsis and rice, our knowledge concerning the ALDH genes, their evolutionary relationship and expression patterns in grape has been limited.
A total of 23 ALDH genes were identified in the grape genome and grouped into ten families according to the unified nomenclature system developed by the ALDH Gene Nomenclature Committee (AGNC). Members within the same grape ALDH families possess nearly identical exon-intron structures. Evolutionary analysis indicates that both segmental and tandem duplication events have contributed significantly to the expansion of grape ALDH genes. Phylogenetic analysis of ALDH protein sequences from seven plant species indicates that grape ALDHs are more closely related to those of Arabidopsis. In addition, synteny analysis between grape and Arabidopsis shows that homologs of a number of grape ALDHs are found in the corresponding syntenic blocks of Arabidopsis, suggesting that these genes arose before the speciation of grape and Arabidopsis. Microarray gene expression analysis revealed a large number of grape ALDH genes responsive to drought or salt stress. Furthermore, we found that a number of ALDH genes showed significantly changed expression in response to infection with different pathogens and during grape berry development, suggesting novel roles of ALDH genes in plant-pathogen interactions and berry development.
The genome-wide identification, evolutionary and expression analysis of grape ALDH genes should facilitate research in this gene family and provide new insights regarding their evolutionary history and functional roles in plant stress tolerance.
Powdery mildew (PM), caused by fungus Erysiphe necator, is one of the most devastating diseases of grapevine. To better understand grapevine-PM interaction and provide candidate resources for grapevine breeding, a suppression subtractive hybridization (SSH) cDNA library was constructed from E. necator-infected leaves of a resistant Chinese wild Vitis quinquangularis clone “Shang-24”. A total of 492 high quality expressed sequence tags (ESTs) were obtained and assembled into 266 unigenes. Gene ontology (GO) analysis indicated that 188 unigenes could be assigned with at least one GO term in the biological process category, and 176 in the molecular function category. Sequence analysis showed that a large number of these genes were homologous to those involved in defense responses. Genes involved in metabolism, photosynthesis, transport and signal transduction were also enriched in the library. Expression analysis of 13 selected genes by qRT-PCR revealed that most were induced more quickly and intensely in the resistant material “Shang-24” than in the sensitive V. pseudoreticulata clone “Hunan-1” by E. necator infection. The ESTs reported here provide new clues to understand the disease-resistance mechanism in Chinese wild grapevine species and may enable us to investigate E. necator-responsive genes involved in PM resistance in grapevine germplasm.
Chinese wild Vitis quinquangularis; Erysiphe necator; SSH; EST; qRT-PCR
Noncoding RNAs (ncRNA) are widely expressed in both prokaryotes and eukaryotes. Eukaryotic ncRNAs are commonly micro- and small-interfering RNAs (18–25 nt) involved in posttranscriptional gene silencing, whereas prokaryotic ncRNAs vary in size and are involved in various aspects of gene regulation. Given the prokaryotic origin of organelles, the presence of ncRNAs might be expected; however, the full spectrum of organellar ncRNAs has not been determined systematically. Here, strand-specific RNA-Seq analysis was used to identify 107 candidate ncRNAs from Arabidopsis thaliana chloroplasts, primarily encoded opposite protein-coding and tRNA genes. Forty-eight ncRNAs were shown to accumulate by RNA gel blot as discrete transcripts in wild-type (WT) plants and/or the pnp1-1 mutant, which lacks the chloroplast ribonuclease polynucleotide phosphorylase (cpPNPase). Ninety-eight percent of the ncRNAs detected by RNA gel blot had different transcript patterns between WT and pnp1-1, suggesting cpPNPase has a significant role in chloroplast ncRNA biogenesis and accumulation. Analysis of materials deficient for other major chloroplast ribonucleases, RNase R, RNase E, and RNase J, showed differential effects on ncRNA accumulation and/or form, suggesting specificity in RNase-ncRNA interactions. 5′ end mapping demonstrates that some ncRNAs are transcribed from dedicated promoters, whereas others result from transcriptional read-through. Finally, correlations between accumulation of some ncRNAs and the symmetrically transcribed sense RNA are consistent with a role in RNA stability. Overall, our data suggest that this extensive population of ncRNAs has the potential to underpin a previously underappreciated regulatory mode in the chloroplast.
RNA-Seq; posttranscriptional regulation; transcription; organelle; plastid
Expressed Sequence Tags (ESTs) have played significant roles in gene discovery and gene functional analysis, especially for non-model organisms. For organisms with no full genome sequences available, ESTs are normally assembled into longer consensus sequences for further downstream analysis. However, current de novo EST assembly programs often generate a large number of assembly errors that negatively affect the downstream analysis. In order to generate more accurate consensus sequences from ESTs, tools are needed to reduce or eliminate errors from de novo assemblies.
We present iAssembler, a pipeline that can assemble large-scale ESTs into consensus sequences with significantly higher accuracy than existing assemblers. iAssembler employs MIRA and CAP3 assemblers to generate initial assemblies, followed by identifying and correcting two common types of transcriptome assembly errors: 1) ESTs from different transcripts (mainly alternatively spliced transcripts or paralogs) are incorrectly assembled into the same contigs; and 2) ESTs from the same transcripts fail to be assembled together. iAssembler can be used to assemble ESTs generated using the traditional Sanger method and/or the Roche-454 massive parallel pyrosequencing technology.
We compared the performance of iAssembler and several other de novo EST assembly programs using both Roche-454 and Sanger EST datasets. The comparison demonstrated that iAssembler generated significantly more accurate consensus sequences than other assembly programs.
Chloroplasts are the green plastids where photosynthesis takes place. The biogenesis of chloroplasts requires the coordinate expression of both nuclear and chloroplast genes and is regulated by developmental and environmental signals. Despite extensive studies of this process, the genetic basis and the regulatory control of chloroplast biogenesis and development remain to be elucidated.
The green cauliflower mutant causes ectopic development of chloroplasts in the curd tissue of the plant, turning the otherwise white curd green. To investigate the transcriptional control of chloroplast development, we compared gene expression between green and white curds using the RNA-seq approach. Deep sequencing produced over 15 million reads with lengths of 86 base pairs from each cDNA library. A total of 7,155 genes were found to exhibit at least 3-fold changes in expression between green and white curds. These included light-regulated genes, genes encoding chloroplast constituents, and genes involved in chlorophyll biosynthesis. Moreover, we discovered that the cauliflower ELONGATED HYPOCOTYL5 (BoHY5) was expressed at a higher level in green curds than in white curds and that 2,616 HY5-targeted genes, including 1,600 up-regulated genes and 1,016 down-regulated genes, were differentially expressed in green in comparison to white curd tissue. All of these 1,600 up-regulated genes were HY5-targeted genes in the light.
The genome-wide profiling of gene expression by RNA-seq in green curds led to the identification of large numbers of genes associated with chloroplast development, and suggested the role of regulatory genes in the high hierarchy of light signaling pathways in mediating the ectopic chloroplast development in the green curd cauliflower mutant.
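The 3-fold expression threshold used in the green versus white curd comparison can be illustrated with a short sketch. It assumes normalized per-gene expression values for each tissue and adds a pseudocount so that genes with zero expression in one tissue do not cause division by zero; the function name, the pseudocount, and the input format are assumptions for illustration, and the abstract does not describe the statistical procedure the authors applied.

```python
def fold_change_hits(green, white, min_fold=3.0, pseudo=1.0):
    """Label genes whose green/white expression ratio changes >= min_fold.

    green, white: dicts mapping gene ID -> normalized expression
                  (assumed input format)
    pseudo:       pseudocount added to both values (an assumption)
    """
    hits = {}
    for gene in green.keys() & white.keys():
        ratio = (green[gene] + pseudo) / (white[gene] + pseudo)
        if ratio >= min_fold:
            hits[gene] = "up"      # higher in green curd
        elif ratio <= 1.0 / min_fold:
            hits[gene] = "down"    # lower in green curd
    return hits
```

A ratio filter alone does not account for replicate variance, so in practice it would normally be paired with a significance test.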
As more and more genomes are sequenced, genome annotation becomes increasingly important in bridging the gap between sequence and biology. Gene prediction, which is at the center of genome annotation, usually integrates various resources to compute consensus gene structures. However, many newly sequenced genomes have limited resources for gene predictions. In an effort to create high-quality gene models of the cucumber genome (Cucumis sativus var. sativus), based on the EVidenceModeler gene prediction pipeline, we incorporated the massively parallel complementary DNA sequencing (RNA-Seq) reads of 10 cucumber tissues into EVidenceModeler. We applied the new pipeline to the reassembled cucumber genome and included a comparison between our predicted protein-coding gene sets and a published set.
The reassembled cucumber genome, annotated with RNA-Seq reads from 10 tissues, has 23,248 identified protein-coding genes. Compared with the published prediction in 2009, approximately 8,700 genes reveal structural modifications and 5,285 genes only appear in the reassembled cucumber genome. All the related results, including genome sequence and annotations, are available at http://cmb.bnu.edu.cn/Cucumis_sativus_v20/.
We conclude that RNA-Seq greatly improves the accuracy of prediction of protein-coding genes in the reassembled cucumber genome. The comparison between the two gene sets also suggests that it is feasible to use RNA-Seq reads to annotate newly sequenced or less-studied genomes.
Cultivated watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] is an important agriculture crop world-wide. The fruit of watermelon undergoes distinct stages of development with dramatic changes in its size, color, sweetness, texture and aroma. In order to better understand the genetic and molecular basis of these changes and significantly expand the watermelon transcript catalog, we have selected four critical stages of watermelon fruit development and used Roche/454 next-generation sequencing technology to generate a large expressed sequence tag (EST) dataset and a comprehensive transcriptome profile for watermelon fruit flesh tissues.
We performed a half Roche/454 GS-FLX run for each of the four watermelon fruit developmental stages (immature white, white-pink flesh, red flesh and over-ripe) and obtained 577,023 high quality ESTs with an average length of 302.8 bp. De novo assembly of these ESTs together with 11,786 watermelon ESTs collected from GenBank produced 75,068 unigenes with a total length of approximately 31.8 Mb. Overall, 54.9% of the unigenes showed significant similarities to known sequences in the GenBank non-redundant (nr) protein database, and around two-thirds of them matched proteins of cucumber, the most closely-related species with a sequenced genome. The unigenes were further assigned gene ontology (GO) terms and mapped to biochemical pathways. More than 5,000 SSRs were identified from the EST collection. Furthermore, we carried out digital gene expression analysis of these ESTs and identified 3,023 genes that were differentially expressed during watermelon fruit development and ripening, which provided novel insights into watermelon fruit biology and a comprehensive resource of candidate genes for future functional analysis. We then generated profiles of several interesting metabolites that are important to fruit quality, including pigmentation and sweetness. Integrative analysis of metabolite and digital gene expression profiles helped elucidate the molecular mechanisms governing these important quality-related traits during watermelon fruit development.
We have generated a large collection of watermelon ESTs, which represents a significant expansion of the current transcript catalog of watermelon and a valuable resource for future studies on the genomics of watermelon and other closely-related species. Digital expression analysis of this EST collection allowed us to identify a large set of genes that were differentially expressed during watermelon fruit development and ripening, which provide a rich source of candidates for future functional analysis and represent a valuable increase in our knowledge base of watermelon fruit biology.
Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited.
We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that the sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than those of many other dicot plants. Codon usage in melon full-length transcripts was largely similar to that of Arabidopsis coding sequences.
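The SSR identification step mentioned above can be illustrated with a minimal sketch. The abstract does not state which tool or thresholds were used, so the function name `find_ssrs` and the minimum repeat counts below are hypothetical choices made only for illustration; the sketch detects perfect repeats of 2 to 6 nt motifs with a regular expression.

```python
import re

# Hypothetical thresholds (not taken from the study):
# motif length -> minimum number of tandem copies to call an SSR.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq):
    """Return a sorted list of (start, motif, repeat_count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif of `size` bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AAAA" or "ATAT"), so each repeat is reported once.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence (invented): an (AT)7 and an (AAG)5 repeat with short flanks.
    toy = "GGC" + "AT" * 7 + "CCT" + "AAG" * 5 + "TT"
    for start, motif, copies in find_ssrs(toy):
        print(start, motif, copies)
```

Homopolymer runs are deliberately ignored here; a real pipeline would also need to decide how to treat compound or imperfect repeats.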
The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns.
Fruit development, maturation and ripening consist of a complex series of biochemical and physiological changes that in climacteric fruits, including apple and tomato, are coordinated by the gaseous hormone ethylene. These changes determine final fruit quality, and understanding the functional machinery underlying these processes is of both biological and practical importance. To date, many reports have been made on the analysis of gene expression in apple. In this study we focused our investigation on the role of ethylene during apple maturation, specifically comparing the transcriptomics of normal ripening with changes resulting from application of the hormone receptor competitor 1-methylcyclopropene.
To gain insight into the molecular processes regulating ripening in apple, and to compare them to those in tomato (the model species for ripening studies), we utilized both homologous and heterologous (tomato) microarrays to profile transcriptome dynamics of genes involved in fruit development and ripening, emphasizing those which are ethylene regulated.
The use of both types of microarrays facilitated transcriptome comparison between apple and tomato (for the latter, using data previously published and available at the Tomato Expression Database, TED) and highlighted genes conserved during ripening of both species, which in turn represent a foundation for further comparative genomic studies.
The cross-species analysis had the secondary aim of examining the efficiency of heterologous (specifically tomato) microarray hybridization for candidate gene identification as related to the ripening process. The resulting transcriptomics data revealed coordinated gene expression during fruit ripening of a subset of ripening-related and ethylene responsive genes, further facilitating the analysis of ethylene response during fruit maturation and ripening.
Our combined strategy based on microarray hybridization enabled transcriptome characterization during normal climacteric apple ripening, as well as definition of ethylene-dependent transcriptome changes. Comparison with tomato fruit maturation and ethylene responsive transcriptome activity facilitated identification of putative conserved orthologous ripening-related genes, which serve as an initial set of candidates for assessing conservation of gene activity across genomes of fruit bearing plant species.
The Tomato Functional Genomics Database (TFGD) provides a comprehensive resource to store, query, mine, analyze, visualize and integrate large-scale tomato functional genomics data sets. The database is functionally expanded from the previously described Tomato Expression Database by including metabolite profiles as well as large-scale tomato small RNA (sRNA) data sets. Separate computational pipelines have been developed to process the microarray, metabolite and sRNA data sets archived in the database, and TFGD provides downloads of all the analyzed results. TFGD is also designed to enable users to easily retrieve biologically important information through a set of efficient query interfaces and analysis tools, including improved array probe annotations as well as tools to identify co-expressed genes, significantly affected biological processes and biochemical pathways from gene expression data sets and miRNA targets, and to integrate transcript and metabolite profiles, and sRNA and mRNA sequences. The suite of tools and interfaces in TFGD allows intelligent data mining of recently released and continually expanding large-scale tomato functional genomics data sets. TFGD is available at http://ted.bti.cornell.edu.
To unravel the molecular mechanisms of drought responses in tomato, gene expression profiles of two drought-tolerant lines identified from a population of Solanum pennellii introgression lines, and the recurrent parent S. lycopersicum cv. M82, a drought-sensitive cultivar, were investigated under drought stress using tomato microarrays. Around 400 genes were identified as responsive to drought stress only in the drought-tolerant lines. These changes in gene expression are most likely caused by the two inserted chromosome segments of S. pennellii, which possibly contain drought-tolerance quantitative trait loci (QTLs). Among these genes are a number of transcription factors and signalling proteins which could be global regulators involved in the tomato responses to drought stress. Genes involved in organism growth and development processes were also specifically regulated by drought stress, including those controlling cell wall structure, wax biosynthesis, and plant height. Moreover, key enzymes in the pathways of gluconeogenesis (fructose-bisphosphate aldolase), purine and pyrimidine nucleotide biosynthesis (adenylate kinase), tryptophan degradation (aldehyde oxidase), starch degradation (β-amylase), methionine biosynthesis (cystathionine β-lyase), and the removal of superoxide radicals (catalase) were also specifically affected by drought stress. These results indicated that tomato plants could adapt to water-deficit conditions through decreasing energy dissipation, increasing ATP energy provision, and reducing oxidative damage. The drought-responsive genes identified in this study could provide further information for understanding the mechanisms of drought tolerance in tomato.
Drought stress; gene expression; introgression lines; microarray; tomato
Cucumber, Cucumis sativus L., is an economically and nutritionally important crop of the Cucurbitaceae family and has long served as a primary model system for sex determination studies. Recently, the sequencing of its whole genome has been completed. However, transcriptome information for this species is still scarce, with a total of around 8,000 Expressed Sequence Tag (EST) and mRNA sequences currently available in GenBank. In order to gain more insight into the molecular mechanisms of plant sex determination and to provide the community with a functional genomics resource that will facilitate cucurbit research and breeding, we performed transcriptome sequencing of cucumber flower buds of two near-isogenic lines: WI1983G, a gynoecious plant which bears only pistillate flowers, and WI1983H, a hermaphroditic plant which bears only bisexual flowers.
Using Roche-454 massively parallel pyrosequencing technology, we generated a total of 353,941 high quality EST sequences with an average length of 175 bp, among which 188,255 were from gynoecious flowers and 165,686 from hermaphroditic flowers. These EST sequences, together with ~5,600 high quality cucumber EST and mRNA sequences available in GenBank, were clustered and assembled into 81,401 unigenes, of which 28,452 were contigs and 52,949 were singletons. The unigenes and ESTs were further mapped to the cucumber genome and more than 500 alternative splicing events were identified in 443 cucumber genes. The unigenes were further functionally annotated by comparing their sequences to different protein and functional domain databases and were assigned Gene Ontology (GO) terms. A biochemical pathway database containing 343 predicted pathways was also created based on the annotations of the unigenes. Digital expression analysis identified ~200 differentially expressed genes between flowers of WI1983G and WI1983H and provided novel insights into the molecular mechanisms of the plant sex determination process. Furthermore, a set of SSR motifs and high confidence SNPs between WI1983G and WI1983H were identified from the ESTs, which provided the material basis for future genetic linkage and QTL analysis.
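As a rough illustration of what digital expression analysis involves, the sketch below normalizes per-gene EST counts by library size and computes a log2 ratio between the two flower types. Only the two library sizes come from the abstract; the function names, the pseudocount, and the example counts are assumptions for illustration, and the statistical test actually used to call differential expression is not described here and is not reproduced.

```python
import math

# Library sizes reported above (high quality ESTs per flower type).
LIB_SIZE = {"gynoecious": 188_255, "hermaphroditic": 165_686}


def per_million(count, library):
    """Scale a raw EST count to ESTs per million sequenced in that library."""
    return count * 1_000_000 / LIB_SIZE[library]


def log2_fold_change(count_g, count_h, pseudocount=1.0):
    """log2(gynoecious / hermaphroditic) after library-size normalization.

    The pseudocount (an assumption, not from the study) avoids division by
    zero for genes with no ESTs in one library.
    """
    g = per_million(count_g + pseudocount, "gynoecious")
    h = per_million(count_h + pseudocount, "hermaphroditic")
    return math.log2(g / h)


if __name__ == "__main__":
    # Hypothetical unigene with 40 ESTs in WI1983G and 5 in WI1983H.
    print(round(log2_fold_change(40, 5), 2))
```

A positive value indicates relatively more ESTs in the gynoecious library. With counts this small, a fold change alone is not evidence of differential expression, which is why such analyses pair it with a significance test and multiple-testing correction.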
A large set of EST sequences was generated from cucumber flower buds of two different sex types. Differentially expressed genes between these two sex-type flowers, as well as putative SSR and SNP markers, were identified. These EST sequences provide valuable information for further understanding the molecular mechanisms of the plant sex determination process and form a rich resource for future functional genomics analysis, marker development and cucumber breeding.