Quantification of a transcriptional profile is a useful way to evaluate the activity of a cell at a given point in time. Although RNA-Seq has revolutionized transcriptional profiling, the costs of RNA-Seq are still significantly higher than microarrays, and often the depth of data delivered from RNA-Seq is in excess of what is needed for simple transcript quantification. Digital Gene Expression (DGE) is a cost-effective, sequence-based approach for simple transcript quantification: by sequencing one read per molecule of RNA, this technique can be used to efficiently count transcripts while obviating the need for transcript-length normalization and reducing the total numbers of reads necessary for accurate quantification. Here, we present trieFinder, a program specifically designed to rapidly map, parse, and annotate DGE tags of various lengths against cDNA and/or genomic sequence databases.
The trieFinder algorithm maps DGE tags in a two-step process. First, it scans FASTA files of RefSeq, UniGene, and genomic DNA sequences to create a database of all tags that can be derived from a predefined restriction site. Next, it compares the experimental DGE tags to this tag database, taking advantage of the fact that the tags are stored as a prefix tree, or “trie”, which allows for linear-time searches for exact matches. DGE tags with mismatches are analyzed by recursive calls in the data structure. We find that, in terms of alignment speed, the mapping functionality of trieFinder compares favorably with Bowtie.
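The two-step mapping idea described above can be illustrated with a minimal sketch. It is not the trieFinder implementation: the class `TagTrie`, the helper `derive_tags`, the default restriction site `CATG`, and the tag length are all hypothetical names and values chosen for illustration. The sketch derives candidate tags from a reference sequence, stores them in a prefix tree, and then looks up an experimental tag with a bounded number of mismatches by recursive descent.

```python
from typing import Dict, List, Set


def derive_tags(sequence: str, site: str = "CATG", tag_length: int = 17) -> List[str]:
    """Return every site-plus-downstream tag found in `sequence`.

    The site and tag length are illustrative assumptions, not values
    taken from the trieFinder paper.
    """
    tags = []
    start = sequence.find(site)
    while start != -1:
        end = start + len(site) + tag_length
        if end <= len(sequence):  # skip tags truncated by the sequence end
            tags.append(sequence[start:end])
        start = sequence.find(site, start + 1)
    return tags


class TagTrie:
    """Prefix tree over tags; the node ending a tag stores its annotations."""

    def __init__(self) -> None:
        self.children: Dict[str, "TagTrie"] = {}
        self.annotations: Set[str] = set()

    def insert(self, tag: str, annotation: str) -> None:
        node = self
        for base in tag:
            node = node.children.setdefault(base, TagTrie())
        node.annotations.add(annotation)

    def search(self, tag: str, max_mismatches: int = 0) -> Set[str]:
        """Collect annotations of stored tags within `max_mismatches` substitutions."""
        hits: Set[str] = set()
        self._search(tag, 0, max_mismatches, hits)
        return hits

    def _search(self, tag: str, pos: int, budget: int, hits: Set[str]) -> None:
        if pos == len(tag):
            hits.update(self.annotations)
            return
        for base, child in self.children.items():
            cost = 0 if base == tag[pos] else 1
            if cost <= budget:  # follow a mismatching branch only while budget remains
                child._search(tag, pos + 1, budget - cost, hits)


trie = TagTrie()
for tag in derive_tags("GGCATGACGTTT", site="CATG", tag_length=4):
    trie.insert(tag, "hypothetical_gene_A")

exact = trie.search("CATGACGT")
one_off = trie.search("CATGACGA", max_mismatches=1)
```

With a mismatch budget of zero the lookup follows a single path, so its cost grows with the tag length rather than with the number of stored tags. Each allowed mismatch widens the recursion to sibling branches.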
trieFinder quickly annotates DGE tags against three sources simultaneously, which simplifies transcript quantification and novel transcript detection. It delivers the data in a simple parsed format, obviating the need to post-process the alignment results. trieFinder is available at http://research.nhgri.nih.gov/software/trieFinder/.
RNA-Seq; Transcriptional profiling; DGE; SAGE
Pear (Pyrus spp.) is an important fruit species worldwide; however, genetic and genomic information on it is limited. Combining Solexa/Illumina high-throughput RNA sequencing (RNA-seq) with Digital Gene Expression (DGE) analysis is a powerful approach for transcriptomic studies. This paper reports the transcriptome profiling analysis of Chinese white pear (P. bretschneideri) using RNA-seq and DGE to better understand the molecular mechanisms in fruit development and maturation of Chinese white pear.
De novo transcriptome assembly and gene expression analysis of Chinese white pear were performed at an unprecedented depth (5.47 gigabase pairs) using high-throughput Illumina RNA-seq combined with a tag-based Digital Gene Expression (DGE) system. Approximately 60.77 million reads were sequenced, trimmed, and assembled into 90,227 unigenes. These unigenes comprised 17,619 contigs and 72,608 singletons with an average length of 508 bp and had an N50 of 635 bp. Sequence similarity analyses against six public databases (Uniprot, NR, and COGs at NCBI, Pfam, InterPro, and KEGG) found that 61,636 unigenes can be annotated with gene descriptions, conserved protein domains, or gene ontology terms. By BLASTing all 61,636 unigenes in KEGG, a total of 31,215 unigenes were annotated into 121 known metabolic or signaling pathways in which a few primary, intermediate, and secondary metabolic pathways are directly related to pear fruit quality. DGE libraries were constructed for each of the five fruit developmental stages. Gene expression differed significantly among the developmental stages of pear fruit for a large number of unigenes.
Extensive transcriptome and DGE profiling data at five fruit developmental stages of Chinese white pear have been obtained from deep sequencing, which provides comprehensive gene expression information at the transcriptional level. This could facilitate understanding of the molecular mechanisms in fruit development and maturation. Such a database can also be used as a public information platform for research on molecular biology and functional genomics in pear and other related species.
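Several of these abstracts report an N50 alongside the mean unigene length. As a reminder of what that statistic measures, here is a minimal sketch under the usual definition: the length L such that sequences of length L or longer contain at least half of the total assembled bases. The function name `n50` and the example lengths are illustrative and not taken from any of the studies.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first length where the cumulative sum reaches half the total.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


example_n50 = n50([2, 3, 4, 5, 6])
```

Because long sequences dominate the cumulative sum, N50 usually exceeds the mean length when an assembly contains many short singletons, which matches the pattern reported above (mean 508 bp, N50 635 bp).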
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-14-823) contains supplementary material, which is available to authorized users.
Recent sequencing technologies that allow massive parallel production of short reads are the method of choice for transcriptome analysis. Particularly, digital gene expression (DGE) technologies produce a large dynamic range of expression data by generating short tag signatures for each cell transcript. These tags can be mapped back to a reference genome to identify new transcribed regions that can be further covered by RNA-sequencing (RNA-Seq) reads. Here, we applied an integrated bioinformatics approach that combines DGE tags, RNA-Seq, tiling array expression data and species-comparison to explore new transcriptional regions and their specific biological features, particularly tissue expression or conservation. We analysed tags from a large DGE data set (designated as ‘TranscriRef’). We then annotated 750 000 tags that were uniquely mapped to the human genome according to Ensembl. We retained transcripts originating from both DNA strands and categorized tags corresponding to protein-coding genes, antisense, intronic- or intergenic-transcribed regions and computed their overlap with annotated non-coding transcripts. Using this bioinformatics approach, we identified ∼34 000 novel transcribed regions located outside the boundaries of known protein-coding genes. As demonstrated using sequencing data from human pluripotent stem cells for biological validation, the method could be easily applied for the selection of tissue-specific candidate transcripts. DigitagCT is available at http://cractools.gforge.inria.fr/softwares/digitagct.
The brown planthopper (BPH) Nilaparvata lugens (Stål) is one of the most serious insect pests of rice in Asia. However, little is known about the mechanisms responsible for the development, wing dimorphism and sex difference in this species. Genomic information for BPH is currently unavailable, and, therefore, transcriptome and expression profiling data for this species are needed as an important resource to better understand the biological mechanisms of BPH.
In this study, we performed de novo transcriptome assembly and gene expression analysis using short-read sequencing technology (Illumina) combined with a tag-based digital gene expression (DGE) system. The transcriptome analysis assembles the gene information for different developmental stages, sexes and wing forms of BPH. In addition, we constructed six DGE libraries: eggs, second instar nymphs, fifth instar nymphs, brachypterous female adults, macropterous female adults and macropterous male adults. Illumina sequencing revealed 85,526 unigenes, including 13,102 clusters and 72,424 singletons. Transcriptome sequences larger than 350 bp were subjected to Gene Ontology (GO) and KEGG Orthology (KO) annotations. To analyze the DGE profiling, we mainly compared the gene expression variations between eggs and second instar nymphs; second and fifth instar nymphs; fifth instar nymphs and three types of adults; brachypterous and macropterous female adults as well as macropterous female and male adults. Thousands of genes showed significantly different expression levels based on the various comparisons. We also randomly selected some genes to confirm their altered expression levels by quantitative real-time PCR (qRT-PCR).
The obtained BPH transcriptome and DGE profiling data provide comprehensive gene expression information at the transcriptional level that could facilitate our understanding of the molecular mechanisms from various physiological aspects including development, wing dimorphism and sex difference in BPH.
Trichome hairs affect diverse agronomic characters such as seed weight and yield, prevent insect damage and reduce loss of water but their molecular control has not been extensively studied in soybean. Several detailed models for trichome development have been proposed for Arabidopsis thaliana, but their applicability to important crops such as cotton and soybean is not fully known.
Two high-throughput transcript sequencing methods, Digital Gene Expression (DGE) Tag Profiling and RNA-Seq, were used to compare the transcriptional profiles in wild-type (cv. Clark standard, CS) and a mutant (cv. Clark glabrous, i.e., trichomeless or hairless, CG) soybean isoline that carries the dominant P1 allele. DGE data and RNA-Seq data were mapped to the cDNAs (Glyma models) predicted from the reference soybean genome, Williams 82. Extending the model length by 250 bp at both ends resulted in significantly more matches of authentic DGE tags, indicating that many of the predicted gene models are prematurely truncated at the 5' and 3' UTRs. The genome-wide comparative study of the transcript profiles of the wild-type versus mutant line revealed a number of differentially expressed genes. One gene that is highly expressed in wild-type soybean, Glyma04g35130, was of interest because it has high homology to the cotton gene GhRDL1, a member of the BURP protein family that has been identified as being involved in cotton fiber initiation. Sequence comparison of Glyma04g35130 among Williams 82 with our sequences derived from CS and CG isolines revealed various SNPs and indels including addition of one nucleotide C in the CG and insertion of ~60 bp in the third exon of CS that causes a frameshift mutation and premature truncation of peptides in both lines as compared to Williams 82.
Although not a candidate for the P1 locus, a BURP family member (Glyma04g35130) from soybean has been shown to be abundantly expressed in the CS line and very weakly expressed in the glabrous CG line. RNA-Seq and DGE data are compared and provide experimental data on the expression of predicted soybean gene models as well as an overview of the genes expressed in young shoot tips of two closely related isolines.
Japanese scallop has been cultured on a large scale in China for many years. However, serious marine pollution in recent years has resulted in considerable loss to this industry. Moreover, due to the lack of genomic resources, limited research has been carried out on this species. To facilitate understanding of the immune and stress response mechanisms at the molecular level, extensive transcriptomic profiling of Japanese scallop was carried out and a digital gene expression (DGE) database upon cadmium exposure was constructed using the Illumina sequencing platform.
RNA-seq produced about 112 million sequencing reads from the tissues of adult Japanese scallops. These reads were assembled into 194,839 non-redundant sequences with open reading frames (ORFs), of which 14,240 putative amino acid sequences were assigned biological function annotation and were annotated with gene ontology and eukaryotic orthologous group terms. In addition, we identified 720 genes involved in response to stimulus and 302 genes involved in immune-response pathways. Furthermore, we investigated the transcriptomic changes in the gill and digestive gland of Japanese scallops following cadmium exposure using a tag-based DGE system. A total of 7,556 and 3,002 differentially expressed genes were detected in the gill and digestive gland, respectively, and functionally annotated with KEGG pathway annotations.
This study provides a comprehensive transcript sequence resource for the Japanese scallop and presents a survey of gene expression in response to heavy metal exposure in a non-model marine invertebrate via the Illumina sequencing platform. These results may contribute to the in-depth elucidation of the molecular mechanisms involved in bivalve responses to marine pollutants.
Systematic research on fish immunogenetics is indispensable in understanding the origin and evolution of immune systems. This has long been a challenging task because of the limited number of deep sequencing technologies and genome backgrounds of non-model fish available. The newly developed Solexa/Illumina RNA-seq and Digital gene expression (DGE) are high-throughput sequencing approaches and are powerful tools for genomic studies at the transcriptome level. This study reports the transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus using RNA-seq and DGE in an attempt to gain insights into the immunogenetics of marine fish.
RNA-seq analysis generated 169,950 non-redundant consensus sequences, among which 48,987 functional transcripts with complete or various length encoding regions were identified. More than 52% of these transcripts are possibly involved in approximately 219 known metabolic or signalling pathways, while 2,673 transcripts were associated with immune-relevant genes. In addition, approximately 8% of the transcripts appeared to be fish-specific genes that have never been described before. DGE analysis revealed that the host transcriptome profile of Vibrio harveyi-challenged L. japonicus is considerably altered, as indicated by the significant up- or down-regulation of 1,224 strong infection-responsive transcripts. Results indicated an overall conservation of the components and transcriptome alterations underlying innate and adaptive immunity in fish and other vertebrate models. Analysis suggested the acquisition of numerous fish-specific immune system components during early vertebrate evolution.
This study provided a global survey of host defence gene activities against bacterial challenge in a non-model marine fish. Results can contribute to the in-depth study of candidate genes in marine fish immunity, and help improve current understanding of host-pathogen interactions and evolutionary history of immunogenetics from fish to mammals.
Cymbidium sinense belongs to the Orchidaceae, which is one of the most abundant angiosperm families. C. sinense, a high-grade traditional potted flower, is most prevalent in China and some Southeast Asian countries. The control of flowering time is a major bottleneck in the industrialized development of C. sinense. Little is known about the mechanisms responsible for floral development in this orchid. Moreover, genome references for entire transcriptome sequences do not currently exist for C. sinense. Thus, transcriptome and expression profiling data for this species are needed as an important resource to identify genes and to better understand the biological mechanisms of floral development in C. sinense.
In this study, de novo transcriptome assembly and gene expression analysis using Illumina sequencing technology were performed. Transcriptome analysis assembled gene information related to vegetative and reproductive growth of C. sinense. Illumina sequencing generated 54,248,006 high quality reads that were assembled into 83,580 unigenes with an average sequence length of 612 base pairs, including 13,315 clusters and 70,265 singletons. A total of 41,687 (49.88%) unique sequences were annotated, 23,092 of which were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes (KEGG). Gene Ontology (GO) analysis of the annotated unigenes revealed that the majority of sequenced genes were associated with metabolic and cellular processes, cell and cell parts, catalytic activity and binding. Furthermore, 120 flowering-associated unigenes, 73 MADS-box unigenes and 28 CONSTANS-LIKE (COL) unigenes were identified from our collection. In addition, three digital gene expression (DGE) libraries were constructed for the vegetative phase (VP), floral differentiation phase (FDP) and reproductive phase (RP). The specific expression of many genes in the three development phases was also identified. Thirty-two genes with high differential expression among the three sub-libraries were selected as candidates connected with flower development.
RNA-seq and DGE profiling data provided comprehensive gene expression information at the transcriptional level that could facilitate our understanding of the molecular mechanisms of floral development at three development phases of C. sinense. These data could be used as an important resource for investigating the genetics of the flowering pathway and various biological mechanisms in this orchid.
Floral development; Flowering time; Digital gene expression; Transcriptome; Cymbidium sinense
The desiccation-tolerant moss Bryum argenteum is an important component of the Biological Soil Crusts (BSCs) found in the Gurbantunggut desert. Desiccation tolerance is defined as the ability to revive from the air dried state. To elucidate the molecular mechanisms related to desiccation tolerance, we employed RNA-Seq and digital gene expression (DGE) technologies to study the genome-wide expression profiles of the dehydration and rehydration processes in this important desert plant.
We applied a two-step approach to investigate the gene expression profile upon rehydration in the moss Bryum argenteum using the Illumina HiSeq 2000 sequencing platform. First, a total of 57,247 transcript assembly contigs (TACs) were obtained from 54.79 million reads by de novo assembly, with an average length of 863 bp and N50 of 1,372 bp. Among the reconstructed TACs, 36,916 (64.5%) revealed similarity with existing protein sequences in the public databases. 23,509 and 21,607 TACs were assigned GO and KEGG annotation information, respectively. Second, samples were taken from 3 hydration stages: desiccated (Dry), rehydrated 2 h (R2) and rehydrated 24 h (R24), and DGE libraries were constructed for Differentially Expressed Genes (DEGs) discovery. 4,081 and 6,709 DEGs were identified in R2 and R24, compared with Dry, respectively. Compared to the desiccated sample, up-regulated genes after two hours of hydration are primarily related to stress responses. GO function enrichment network, KEGG metabolic pathway and MapMan analyses support the idea of the rapid recovery of photosynthesis after 24 h of rehydration. We identified 770 transcription factors (TFs) which were classified into 50 TF families. 142 TF transcripts were up-regulated upon rehydration including 23 members of the ERF family.
In this study, we constructed a pioneering, high-quality reference transcriptome in B. argenteum and generated three DGE libraries to elucidate the changes of gene expression upon rehydration. Expression profiles consistent with the rapid recovery of photosynthesis (at R2) and the re-establishment of a positive carbon balance following rehydration (at R24) were observed. Our study will extend our knowledge of bryophyte transcriptomes and provide further insight into the molecular mechanisms related to rehydration and desiccation-tolerance.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1633-y) contains supplementary material, which is available to authorized users.
Transcriptome; Gene expression; Desiccation; Bryum; Physcomitrella; Biological soil crust
Tea is the most popular non-alcoholic health beverage in the world. The tea plant (Camellia sinensis (L.) O. Kuntze) needs to undergo a cold acclimation process to enhance its freezing tolerance in winter. Changes that occur at the molecular level in response to low temperatures are poorly understood in tea plants. To elucidate the molecular mechanisms of cold acclimation, we employed RNA-Seq and digital gene expression (DGE) technologies to study genome-wide expression profiles during cold acclimation in tea plants.
Using the Illumina sequencing platform, we obtained approximately 57.35 million RNA-Seq reads. These reads were assembled into 216,831 transcripts, with an average length of 356 bp and an N50 of 529 bp. In total, 1,770 differentially expressed transcripts were identified, of which 1,168 were up-regulated and 602 down-regulated. These include a group of cold sensor or signal transduction genes, cold-responsive transcription factor genes, plasma membrane stabilization related genes, osmosensing-responsive genes, and detoxification enzyme genes. DGE and quantitative RT-PCR analysis further confirmed the results from RNA-Seq analysis. Pathway analysis indicated that the “carbohydrate metabolism pathway” and the “calcium signaling pathway” might play a vital role in tea plants’ responses to cold stress.
Our study presents a global survey of transcriptome profiles of tea plants in response to low, non-freezing temperatures and yields insights into the molecular mechanisms of tea plants during the cold acclimation process. It could also serve as a valuable resource for relevant research on cold-tolerance and help to explore the cold-related genes in improving the understanding of low-temperature tolerance and plant-environment interactions.
Camellia Sinensis; Cold Acclimation; RNA-Seq; DGE; Genome-wide Expression Profiles; Tea Plants
Plants have evolved a sensitive defense response system that detects and recognizes various pathogen-associated molecular patterns (PAMPs) (e.g. flagellin) and induces immune responses to protect against invasion. Transcriptional responses in rice to PAMPs produced by Xanthomonas oryzae pv. oryzae (Xoo), the bacterial blight pathogen, have not yet been defined.
We characterized transcriptomic responses in rice inoculated with the wildtype (WT) Xoo and flagellin-deficient mutant ∆fliC through RNA-seq analysis. Digital gene expression (DGE) analysis based on Solexa/Illumina sequencing was used to investigate transcriptomic responses in 30 day-old seedlings of rice (Oryza sativa L. cv. Nipponbare). 1,680 genes were differentially-expressed (DEGs) in rice inoculated with WT relative to ∆fliC; among which 1,159 genes were up-regulated and 521 were down-regulated. Expression patterns of 12 randomly-selected DEGs assayed by quantitative real time PCR (qRT-PCR) were similar to those detected by DGE analyses, confirming reliability of the DGE data. Functional annotations revealed the up-regulated DEGs are involved in the cell wall, lipid and secondary metabolism, defense response and hormone signaling, whereas the down-regulated ones are associated with photosynthesis. Moreover, 57 and 21 specifically expressed genes were found after WT and ∆fliC treatments, respectively.
DEGs were identified in rice inoculated with WT Xoo relative to ∆fliC. These genes were predicted to function in multiple biological processes, including the defense response and photosynthesis in rice. This study provided additional insights into molecular basis of rice response to bacterial infection and revealed potential functions of bacterial flagellin in the rice-Xoo interactions.
Rice; Differentially-expressed genes (DEGs); Flagellin; Immune response; Xanthomonas oryzae pv. oryzae
While trade-offs between flight capability and reproduction are a common phenomenon in wing dimorphic insects, the molecular basis is largely unknown. In this study, we examined the transcriptomic differences between winged and wingless morphs of cotton aphids, Aphis gossypii, using a tag-based digital gene expression (DGE) approach. Ultra high-throughput Illumina sequencing generated 5.30 and 5.39 million raw tags, respectively, from winged and wingless A. gossypii DGE libraries. We identified 1,663 differentially expressed transcripts, among which 58 were highly expressed in the winged A. gossypii, whereas 1,605 were expressed at significantly higher levels in the wingless morphs. Bioinformatics tools, including Gene Ontology, Cluster of Orthologous Groups, euKaryotic Orthologous Groups and Kyoto Encyclopedia of Genes and Genomes pathways, were used to functionally annotate these transcripts. In addition, 20 differentially expressed transcripts detected by DGE were validated by quantitative real-time PCR. Comparative transcriptomic analysis of sedentary (wingless) and migratory (winged) A. gossypii not only advances our understanding of the trade-offs in wing dimorphic insects, but also provides a candidate molecular target for the genetic control of this agricultural insect pest.
Aphis gossypii; trade-off; migration; digital gene expression; wing polyphenism.
The Eastern hive honey bee, Apis cerana cerana is a native and widely bred honey bee species in China. Molecular biology research about this honey bee species is scarce, and genomic information for A. c. cerana is not currently available. Transcriptome and expression profiling data for this species are therefore important resources needed to better understand the biological mechanisms of A. c. cerana. In this study, we obtained the transcriptome information of A. c. cerana by RNA-sequencing and compared gene expression differences between queens and workers of A. c. cerana by digital gene expression (DGE) analysis.
Using high-throughput Illumina RNA sequencing we obtained 51,581,510 clean reads corresponding to 4.64 Gb total nucleotides from a single run. These reads were assembled into 46,999 unigenes with a mean length of 676 bp. Based on a sequence similarity search against the five public databases (NR, Swiss-Prot, GO, COG, KEGG) with a cut-off E-value of 10^-5 using BLASTX, a total of 24,630 unigenes were annotated with gene descriptions, gene ontology terms, or metabolic pathways. Using these transcriptome data as references we analyzed the gene expression differences between the queens and workers of A. c. cerana using a tag-based digital gene expression method. We obtained 5.96 and 5.66 million clean tags from the queen and worker samples, respectively. A total of 414 genes were differentially expressed between them, with 189 up-regulated and 225 down-regulated in queens.
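Because the two DGE libraries differ in size (5.96 versus 5.66 million clean tags), raw tag counts need to be scaled before they are compared. The abstract does not state which normalization or statistical test was used, so the following is only a generic sketch of one common approach: scaling to tags per million (TPM) and taking a log2 ratio. The function names, the pseudocount, and the example counts are assumptions for illustration.

```python
import math


def tags_per_million(count: float, library_size: int) -> float:
    """Scale a raw tag count to tags per million clean tags."""
    return count * 1_000_000 / library_size


def log2_fold_change(
    count_a: float,
    size_a: int,
    count_b: float,
    size_b: int,
    pseudocount: float = 0.5,
) -> float:
    """log2 ratio of normalized counts (A over B).

    The pseudocount is an assumed safeguard against division by zero
    when a gene has no tags in one library.
    """
    tpm_a = tags_per_million(count_a + pseudocount, size_a)
    tpm_b = tags_per_million(count_b + pseudocount, size_b)
    return math.log2(tpm_a / tpm_b)


# Hypothetical gene with 596 tags in a library of 5.96 million clean tags.
queen_tpm = tags_per_million(596, 5_960_000)
# Hypothetical four-fold difference between two equally sized libraries.
example_lfc = log2_fold_change(400, 1_000_000, 100, 1_000_000, pseudocount=0)
```

A positive value indicates higher normalized expression in library A. In practice, a significance test that accounts for sampling noise would be applied on top of the fold change.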
Our transcriptome data provide a comprehensive sequence resource for future A. c. cerana study, establishing an important public information platform for functional genomic studies in A. c. cerana. Furthermore, the DGE data provide comprehensive gene expression information for the queens and workers, which will facilitate our understanding of the molecular mechanisms of the different physiological aspects of the two castes.
Bud dormancy is a critical developmental process that allows perennial plants to survive unfavorable environmental conditions. Pear is one of the most important deciduous fruit trees in the world, but the mechanisms regulating bud dormancy in this species are unknown. Because genomic information for pear is currently unavailable, transcriptome and digital gene expression data for this species would be valuable resources to better understand the molecular and biological mechanisms regulating its bud dormancy.
We performed de novo transcriptome assembly and digital gene expression (DGE) profiling analyses of ‘Suli’ pear (Pyrus pyrifolia white pear group) using the Illumina RNA-seq system. RNA-Seq generated approximately 100 M high-quality reads that were assembled into 69,393 unigenes (mean length = 853 bp), including 14,531 clusters and 34,194 singletons. A total of 51,448 (74.1%) unigenes were annotated using public protein databases with a cut-off E-value of 10^-5. We mainly compared gene expression levels at four time-points during bud dormancy. Between Nov. 15 and Dec. 15, Dec. 15 and Jan. 15, and Jan. 15 and Feb. 15, 1,978, 1,024, and 3,468 genes were differentially expressed, respectively. Hierarchical clustering analysis arranged 190 significantly differentially-expressed genes into seven groups. Seven genes were randomly selected to confirm their expression levels using quantitative real-time PCR.
The new transcriptomes offer comprehensive sequence and DGE profiling data for a dynamic view of transcriptomic variation during bud dormancy in pear. These data provided a basis for future studies of metabolism during bud dormancy in non-model but economically-important perennial species.
‘Suli’ pear (Pyrus pyrifolia white pear group); Transcriptome; Bud dormancy; RNA-Seq
We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing.
Many insects enter a developmental arrest (diapause) that allows them to survive harsh seasonal conditions. Despite the well-established ecological significance of diapause, the molecular basis of this crucial adaptation remains largely unresolved. Sitodiplosis mosellana (Gehin), the orange wheat blossom midge (OWBM), causes serious damage to wheat throughout the northern hemisphere, and sporadic outbreaks occur in the world. Traits related to diapause appear to be important factors contributing to their rapid spread and outbreak. To better understand the diapause mechanisms of OWBM, we sequenced the transcriptome and determined the gene expression profile of this species.
In this study, we performed de novo transcriptome analysis using short-read sequencing technology (Illumina) and gene expression analysis with a tag-based digital gene expression (DGE) system. The sequencing results generated 89,117 contigs and 45,713 unigenes. These unigenes were annotated by Blastx alignment against the NCBI non-redundant (nr), Clusters of Orthologous Groups (COG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. 20,802 unigenes (45.5% of the total) matched proteins in the NCBI nr database. Two DGE libraries were constructed to determine differences in gene expression profiles during diapause and non-diapause developmental stages. Genes related to diapause were analyzed in detail and, in addition, nine diapause-related genes were analyzed by real-time PCR.
The OWBM transcriptome greatly improves our genetic understanding and provides a platform for functional genomics research of this species. The DGE profiling data provide comprehensive information at the transcriptional level that facilitates our understanding of the molecular mechanisms of various physiological aspects including development and diapause stages in OWBM. From this study it is evident that various genes coding for metabolic enzymes are crucial for diapause and metamorphosis.
The rice leaf folder (RLF), Cnaphalocrocis medinalis (Guenee) (Lepidoptera: Pyralidae), is one of the most destructive pests affecting rice in Asia. Although several studies have been performed on the ecological and physiological aspects of this species, the molecular mechanisms underlying its developmental regulation, behavior, and insecticide resistance remain largely unknown. Presently, there is a lack of genomic information for RLF; therefore, studies aimed at profiling the RLF transcriptome expression would provide a better understanding of its biological function at the molecular level.
De novo assembly of the RLF transcriptome was performed via the short read sequencing technology (Illumina). In a single run, we produced more than 23 million sequencing reads that were assembled into 44,941 unigenes (mean size = 474 bp) by Trinity. Through a similarity search, 25,281 (56.82%) unigenes matched known proteins in the NCBI Nr protein database. The transcriptome sequences were annotated with gene ontology (GO), cluster of orthologous groups of proteins (COG), and KEGG orthology (KO). Additionally, we profiled gene expression during RLF development using a tag-based digital gene expression (DGE) system. Five DGE libraries were constructed, and variations in gene expression were compared between collected samples: eggs vs. 3rd instar larvae, 3rd instar larvae vs. pupae, pupae vs. adults. The results demonstrated that thousands of genes were significantly differentially expressed during various developmental stages. A number of the differentially expressed genes were confirmed by quantitative real-time PCR (qRT-PCR).
The RLF transcriptome and DGE data provide a comprehensive and global gene expression profile that would further promote our understanding of the molecular mechanisms underlying various biological characteristics, including development, elevated fecundity, flight, sex differentiation, olfactory behavior, and insecticide resistance in RLF. Therefore, these findings could help elucidate the intrinsic factors involved in the RLF-mediated destruction of rice and offer sustainable insect pest management.
Trichinellosis is a typical food-borne zoonotic disease that is epidemic worldwide, and the nematode Trichinella spiralis is its main pathogen. The life cycle of T. spiralis comprises three developmental stages: adult worms, newborn larvae (newborn L1 larvae) and muscle larvae (infective L1 larvae). Stage-specific gene expression in the parasite has been investigated with various immunological and cDNA cloning approaches, whereas the genome-wide transcriptome and expression features of the parasite have remained largely unknown. The availability of the T. spiralis genome sequence has made it possible to dissect parasite biology in depth in association with global gene expression and pathogenesis.
Methodology and Principal Findings
In this study, we analyzed the global gene expression patterns in the three developmental stages of T. spiralis using digital gene expression (DGE) analysis. Almost 15 million sequence tags were generated with the Illumina RNA-seq technology, producing expression data for more than 9,000 genes, covering 65% of the genome. The transcriptome analysis revealed thousands of differentially expressed genes within the genome, and importantly, a panel of genes encoding functional proteins associated with parasite invasion and immuno-modulation was identified. More than 45% of the genes were found to be transcribed from both strands, indicating the importance of RNA-mediated gene regulation in the development of the parasite. Further, based on gene ontology analysis, over 3,000 genes were functionally categorized and biological pathways in the three life cycle stages were elucidated.
Conclusions and Significance
The global transcriptome of T. spiralis in three developmental stages has been profiled, and most gene activity in the genome was found to be developmentally regulated. Many metabolic and biological pathways have been revealed. The findings of the differential expression of several protein families facilitate understanding of the molecular mechanisms of parasite biology and the pathological aspects of trichinellosis.
Trichinellosis in humans and other mammals is caused by ingesting the parasite Trichinella spiralis in contaminated meat. It is a typical zoonotic disease that affects more than 10 million people worldwide. Parasites of the genus Trichinella are unique intracellular pathogens. Adult Trichinella parasites directly release newborn larvae, which invade striated muscle cells and cause disease. In this study, we profiled the global transcriptome in the three developmental stages of T. spiralis. The transcriptomic analysis revealed the global gene expression patterns from the newborn larval stage through the muscle larval stage to adults. Thousands of genes with stage-specific transcriptional patterns were described, and novel genes involved in host-parasite interaction were identified. More than 45% of the protein-coding genes showed evidence of transcription from both sense and antisense strands, which suggests the importance of RNA-mediated gene regulation in the parasite. This study presents a first deep analysis of the transcriptome of T. spiralis, providing insight into the biology of the parasite.
The diamondback moth Plutella xylostella has developed a high level of resistance to the recently introduced insecticide chlorantraniliprole. A better understanding of the mechanism of P. xylostella resistance to chlorantraniliprole is needed to develop effective approaches for insecticide resistance management.
To provide comprehensive insight into the resistance mechanisms of P. xylostella to chlorantraniliprole, transcriptome assembly and tag-based digital gene expression (DGE) profiling were performed using the Illumina HiSeq™ 2000. The transcriptome analysis of the susceptible strain (SS) provided 45,231 unigenes (ranging in size from 200 bp to 13,799 bp), which should be useful for analyzing the differences among chlorantraniliprole-resistant P. xylostella strains. DGE analysis indicated that a total of 1215 genes (189 up-regulated and 1026 down-regulated) were gradient differentially expressed among the susceptible strain (SS) and the chlorantraniliprole-resistant P. xylostella strains, namely the low-level resistance (GXA), moderate resistance (LZA) and high resistance (HZA) strains. A detailed analysis of the gradient differentially expressed genes revealed a phase-dependent divergence of biological investment at the molecular level. The genes related to insecticide resistance, such as P450, GST, the ryanodine receptor, and connectin, had different expression profiles in the different chlorantraniliprole-resistant DGE libraries, suggesting that these genes are involved in the development of P. xylostella resistance to chlorantraniliprole. To confirm the DGE results, the expression profiles of 4 genes related to insecticide resistance were further validated by qRT-PCR analysis.
The obtained transcriptome information provides a large gene resource for further study of the development of pesticide resistance in P. xylostella. The DGE data provide comprehensive insights into the gene expression profiles of the different chlorantraniliprole-resistant strains. The identified genes are specifically related to insecticide resistance, and their different expression profiles will facilitate the study of the role of each gene in the development of chlorantraniliprole resistance.
Geese exhibit strong broodiness and poor egg-laying performance. These characteristics are key issues that hinder the development of the goose industry. Yet little is known about the mechanisms responsible for follicle development due to the lack of genomic resources. Hence, studies based on high-throughput sequencing technologies are needed to produce a comprehensive and integrated genomic resource and to better understand the biological mechanisms of goose follicle development.
In this study, we performed de novo transcriptome assembly and gene expression analysis using short-read sequencing technology (Illumina). We obtained 67,315,996 short reads of 100 bp, which were assembled into 130,514 unique sequences using Trinity (mean size = 753 bp). Based on BLAST results against known proteins, these analyses identified 52,642 sequences at a cut-off E-value below 10⁻⁵. Assembled sequences were annotated with gene descriptions, gene ontology and clusters of orthologous group terms. In addition, we investigated the transcriptional changes during the goose laying/broodiness period using a tag-based digital gene expression (DGE) system. We obtained a sequencing depth of over 4.2 million tags per sample and identified a large number of genes associated with follicle development and reproductive biology, including the cholesterol side-chain cleavage enzyme gene and the dopamine beta-hydroxylase gene. We confirmed the altered expression levels of these two genes using quantitative real-time PCR (qRT-PCR).
The obtained goose transcriptome and DGE profiling data provide comprehensive gene expression information at the transcriptional level that could promote better understanding of the molecular mechanisms underlying follicle development and productivity.
Fruit color is one of the most important economic traits of the sweet cherry (Prunus avium L.). The red coloration of sweet cherry fruit is mainly attributed to anthocyanins. However, limited information is available regarding the molecular mechanisms underlying anthocyanin biosynthesis and its regulation in sweet cherry.
In this study, a reference transcriptome of P. avium L. was sequenced and annotated to identify the transcriptional determinants of fruit color. Normalized cDNA libraries from red and yellow fruits were sequenced using the next-generation Illumina/Solexa sequencing platform and assembled de novo. Over 66 million high-quality reads were assembled into 43,128 unigenes using a combined assembly strategy. Then a total of 22,452 unigenes were compared to public databases using homology searches, and 20,095 of these unigenes were annotated in the Nr protein database. Furthermore, transcriptome differences between the four stages of fruit ripening were analyzed using Illumina digital gene expression (DGE) profiling. Biological pathway analysis revealed that 72 unigenes were involved in anthocyanin biosynthesis. The expression patterns of unigenes encoding phenylalanine ammonia-lyase (PAL), 4-coumarate-CoA ligase (4CL), chalcone synthase (CHS), chalcone isomerase (CHI), flavanone 3-hydroxylase (F3H), flavonoid 3’-hydroxylase (F3’H), dihydroflavonol 4-reductase (DFR), anthocyanidin synthase (ANS) and UDP-glucose:flavonoid 3-O-glucosyltransferase (UFGT) during fruit ripening differed between red and yellow fruit. In addition, we identified some transcription factor families (such as MYB, bHLH and WD40) that may control anthocyanin biosynthesis. We confirmed the altered expression levels of eighteen unigenes that encode anthocyanin biosynthetic enzymes and transcription factors using quantitative real-time PCR (qRT-PCR).
The obtained sweet cherry transcriptome and DGE profiling data provide comprehensive gene expression information that lends insights into the molecular mechanisms underlying anthocyanin biosynthesis. These results will provide a platform for further functional genomic research on this fruit crop.
To understand how eyestalk ablation regulates the activities of the hepatopancreas, Illumina RNA-Seq and digital gene expression (DGE) analyses were performed to investigate the transcriptome of the eyestalk, Y-organ, and hepatopancreas of Eriocheir sinensis and to identify the genes associated with hepatopancreas metabolism that are differentially expressed under eyestalk ablation conditions.
A total of 58,582 unigenes were constructed from 157,168 contigs with SOAPdenovo. A BlastX search against the NCBI Nr database identified 21,678 unigenes at an E-value lower than 10⁻⁵. Using the Blast2GO and BlastAll software programs, 6,883 unigenes (11.75% of the total) were annotated to the Gene Ontology (GO) database, 7,386 (12.6%) unigenes were classified into 25 Clusters of Orthologous Groups of Proteins (COGs), 16,200 (27.7%) unigenes were assigned to 242 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and 1,846 unigenes were matched to “metabolism pathways”. The DGE analysis revealed that 1,416 unigenes were significantly differentially expressed in the hepatopancreas, of which 890 unigenes were up-regulated and 526 unigenes were down-regulated. Of the differentially expressed genes, 382 unigenes were annotated and 63 were classified into metabolism pathways. The results of the real-time polymerase chain reaction (PCR) analysis of four unigenes related to carbohydrate metabolism were consistent with those obtained from the DGE analysis, which demonstrates that the sequencing data were satisfactory for further gene expression analyses.
This paper reports the transcriptome of the eyestalk, Y-organ, and hepatopancreas of E. sinensis. DGE analysis identified the differentially expressed genes of the metabolic processes in the hepatopancreas that are affected by eyestalk ablation. These findings will facilitate further investigations into the mechanisms of the metabolism of organic substances during development and reproduction in crustaceans.
The genomes of three major mosquito vectors of human diseases, Anopheles gambiae, Aedes aegypti, and Culex pipiens quinquefasciatus, have been previously sequenced. C. p. quinquefasciatus has the largest number of predicted protein-coding genes, which partially results from the expansion of three detoxification gene families: cytochrome P450 monooxygenases (P450), glutathione S-transferases (GST), and carboxyl/cholinesterases (CCE). However, unlike An. gambiae and Ae. aegypti, which have large amounts of gene expression data, C. p. quinquefasciatus has limited transcriptomic resources. Knowledge of complete gene expression information is very important for the exploration of the functions of genes involved in specific biological processes. In the present study, the three detoxification gene families of C. p. quinquefasciatus were analyzed for phylogenetic classification and compared with those of three other dipteran insects. Gene expression during various developmental stages and the differential expression responsible for parathion resistance were profiled using the digital gene expression (DGE) technique.
A total of 302 detoxification genes were found in C. p. quinquefasciatus, including 71 CCE, 196 P450, and 35 cytosolic GST genes. Compared with three other dipteran species, gene expansion in Culex mainly occurred in the CCE and P450 families, where the genes of α-esterases, juvenile hormone esterases, and CYP325 of the CYP4 subfamily showed the most pronounced expansion on the genome. For the five DGE libraries, 3.5-3.8 million raw tags were generated and mapped to 13314 reference genes. Among 302 detoxification genes, 225 (75%) were detected for expression in at least one DGE library. One fourth of the CCE and P450 genes were detected uniquely in one stage, indicating potential developmentally regulated expression. A total of 1511 genes showed different expression levels between a parathion-resistant and a susceptible strain. Fifteen detoxification genes, including 2 CCEs, 6 GSTs, and 7 P450s, were expressed at higher levels in the resistant strain.
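The abstract does not state how tag counts from libraries of different depth (3.5-3.8 million raw tags each) were made comparable. A common convention in tag-based DGE is to scale counts to tags per million and compare libraries by log2 ratio; the sketch below illustrates that convention only and is not the authors' method. The function names, the `floor` value used for zero counts, and the example counts are all hypothetical.

```python
import math


def tags_per_million(counts):
    """Scale raw tag counts so that the library sums to one million."""
    total = sum(counts.values())
    return {gene: 1_000_000 * n / total for gene, n in counts.items()}


def log2_ratio(tpm_a, tpm_b, floor=0.01):
    """Return log2(b / a) per gene.

    Genes absent from a library, or with zero abundance, are given the
    value `floor` so the ratio is defined (an assumption of this sketch).
    """
    genes = set(tpm_a) | set(tpm_b)
    return {
        g: math.log2(max(tpm_b.get(g, 0.0), floor) / max(tpm_a.get(g, 0.0), floor))
        for g in genes
    }


if __name__ == "__main__":
    # Hypothetical raw tag counts for two libraries.
    susceptible = {"geneA": 100, "geneB": 900}
    resistant = {"geneA": 400, "geneB": 600}
    ratios = log2_ratio(tags_per_million(susceptible), tags_per_million(resistant))
    for gene in sorted(ratios):
        print(gene, round(ratios[gene], 3))
```

A positive ratio means higher relative abundance in the second library. Deciding which differences are significant requires a statistical test, which this sketch does not include.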
The results of the present study provide new insights into the functions and evolution of three detoxification gene families in mosquitoes and comprehensive transcriptomic resources for C. p. quinquefasciatus, which will facilitate the elucidation of molecular mechanisms underlying the different biological characteristics of the three major mosquito vectors.
Carboxyl/cholinesterases; Cytochrome P450 monooxygenases; Glutathione S-transferases; Insecticide resistance; Gene expansion; Gene expression
Sweet potato (Ipomoea batatas L. [Lam.]) ranks among the top six most important food crops in the world. It is widely grown throughout the world with high and stable yield, strong adaptability, rich nutrient content, and multiple uses. However, little is known about the molecular biology of this important non-model organism due to lack of genomic resources. Hence, studies based on high-throughput sequencing technologies are needed to get a comprehensive and integrated genomic resource and better understanding of gene expression patterns in different tissues and at various developmental stages.
Illumina paired-end (PE) RNA sequencing was performed, generating 48.7 million 75 bp PE reads. These reads were de novo assembled into 128,052 transcripts (≥100 bp), corresponding to 41.1 million base pairs, using a combined assembly strategy. Transcripts were annotated with Blast2GO: 51,763 transcripts had BLASTX hits, of which 39,677 had GO terms and 14,117 had EC numbers associated with 147 KEGG pathways. Furthermore, transcriptome differences among seven tissues were analyzed using Illumina digital gene expression (DGE) tag profiling, and numerous differentially and specifically expressed transcripts were identified. Moreover, the expression characteristics of genes involved in viral genomes, starch metabolism, and potential stress tolerance and insect resistance were also identified.
The combined de novo transcriptome assembly strategy can be applied to other organisms whose reference genomes are not available. The data provided here represent the most comprehensive and integrated genomic resources for cloning and identifying genes of interest in sweet potato. Characterization of sweet potato transcriptome provides an effective tool for better understanding the molecular mechanisms of cellular processes including development of leaves and storage roots, tissue-specific gene expression, potential biotic and abiotic stress response in sweet potato.
The fertile and sterile plants were derived from the self-pollinated offspring of the F1 hybrid between the novel restorer line NR1 and the Nsa CMS line in Brassica napus. To elucidate gene expression and regulation caused by the A and C subgenomes of B. napus, as well as the alien chromosome and cytoplasm from Sinapis arvensis during the development of young floral buds, we performed a genome-wide high-throughput transcriptomic sequencing for young floral buds of sterile and fertile plants.
In this study, equal amounts of total RNA taken from young floral buds of sterile and fertile plants were sequenced using the Illumina/Solexa platform. After filtering out low-quality data, a total of 2,760,574 and 2,714,441 clean tags remained in the two libraries, from which 242,163 (Ste) and 253,507 (Fer) distinct tags were obtained. All distinct sequencing tags were annotated using all possible CATG+17-nt sequences of the genome and transcriptome of Brassica rapa and those of Brassica oleracea as the reference sequences, respectively. In total, 3231 genes of B. rapa and 3371 genes of B. oleracea were detected with significantly different expression levels. GO and pathway-based analyses were performed to determine and further understand the biological functions of those differentially expressed genes (DEGs). In addition, there were 1089 specifically expressed unknown tags in Fer, which mapped to neither B. oleracea nor B. rapa, and these unique tags were presumed to arise mainly from the added alien chromosome of S. arvensis. Fifteen genes were randomly selected and their expression levels were checked by quantitative RT-PCR; fourteen of them showed expression patterns consistent with the digital gene expression (DGE) data.
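The annotation step described above, matching observed tags against every CATG+17-nt sequence that can be derived from a reference, can be sketched as follows. This is a minimal illustration in Python, assuming exact matching and a scan of both strands; the function names and the dictionary-based index are hypothetical and do not reproduce the pipeline used in the study.

```python
SITE = "CATG"   # anchor site of the CATG+17-nt tags described above
TAG_TAIL = 17   # nucleotides kept downstream of the anchor site

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def candidate_tags(seq):
    """Return every full-length SITE + TAG_TAIL-nt substring on either strand."""
    tags = set()
    tag_len = len(SITE) + TAG_TAIL
    forward = seq.upper()
    for strand in (forward, reverse_complement(forward)):
        start = strand.find(SITE)
        while start != -1:
            tag = strand[start:start + tag_len]
            if len(tag) == tag_len:  # skip sites too close to the sequence end
                tags.add(tag)
            start = strand.find(SITE, start + 1)
    return tags


def build_reference(sequences):
    """Map each candidate tag to the set of reference IDs it can derive from."""
    index = {}
    for name, seq in sequences.items():
        for tag in candidate_tags(seq):
            index.setdefault(tag, set()).add(name)
    return index


def annotate(observed_tags, index):
    """Split observed tags into mapped (tag -> reference IDs) and unmapped."""
    mapped = {t: index[t] for t in observed_tags if t in index}
    unmapped = [t for t in observed_tags if t not in index]
    return mapped, unmapped
```

Tags that end up in the unmapped list are the counterpart of the "unknown tags" reported above, which matched neither reference.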
A number of genes were differentially expressed between the young floral buds of sterile and fertile plants. Some of these genes may be candidates for future research on CMS in the Nsa line, and on fertility restoration and improved agronomic traits in the NR1 line. Further study of the unknown tags that were specifically expressed in Fer will help to explore desirable agronomic traits from wild species.