1.  trieFinder: an efficient program for annotating Digital Gene Expression (DGE) tags 
BMC Bioinformatics  2014;15(1):329.
Quantification of a transcriptional profile is a useful way to evaluate the activity of a cell at a given point in time. Although RNA-Seq has revolutionized transcriptional profiling, its costs are still significantly higher than those of microarrays, and the depth of data it delivers often exceeds what is needed for simple transcript quantification. Digital Gene Expression (DGE) is a cost-effective, sequence-based approach for simple transcript quantification: by sequencing one read per RNA molecule, this technique can efficiently count transcripts while obviating the need for transcript-length normalization and reducing the total number of reads needed for accurate quantification. Here, we present trieFinder, a program specifically designed to rapidly map, parse, and annotate DGE tags of various lengths against cDNA and/or genomic sequence databases.
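Because each DGE tag is assumed to represent one transcript molecule, tag counts can be compared across genes after scaling for library size alone, whereas RNA-Seq read counts are usually also divided by transcript length (as in RPKM). A minimal sketch of this tags-per-million scaling, using hypothetical gene names and counts (the abstract does not state which normalization trieFinder or downstream tools apply):

```python
def tags_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw DGE tag counts to tags per million.

    Assumes one tag per transcript molecule, so only library size (the
    total number of tags) is normalized; transcript length is ignored.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical library of 200,000 tags: a long and a short transcript with
# the same number of molecules get the same tag count, hence the same value.
library = {"geneA_long": 500, "geneB_short": 500, "geneC": 199_000}
print(tags_per_million(library))
```

In a read-based RNA-Seq library the long transcript would yield more reads than the short one for the same number of molecules, which is why a length term is needed there and not here.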
The trieFinder algorithm maps DGE tags in a two-step process. First, it scans FASTA files of RefSeq, UniGene, and genomic DNA sequences to create a database of all tags that can be derived from a predefined restriction site. Next, it compares the experimental DGE tags to this tag database, taking advantage of the fact that the tags are stored as a prefix tree, or “trie”, which allows exact matches to be found in time linear in the tag length. DGE tags with mismatches are handled by recursive traversal of the trie. We find that, in terms of alignment speed, the mapping functionality of trieFinder compares favorably with Bowtie.
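The trie-based matching described above can be sketched as follows. This is a generic illustration, not trieFinder's own code; the class and method names, the tag sequences and the annotation labels are all hypothetical. Exact lookup follows one child per base, so its cost grows with tag length rather than database size; the mismatch search recurses into every child but prunes any branch whose mismatch count exceeds the allowed budget.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # One child per nucleotide that extends the current prefix.
    children: dict[str, TrieNode] = field(default_factory=dict)
    # Annotations (e.g. source sequence IDs) for database tags ending here.
    annotations: list[str] = field(default_factory=list)


class TagTrie:
    """Hypothetical prefix-tree index of reference-derived DGE tags."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, tag: str, annotation: str) -> None:
        node = self.root
        for base in tag:
            node = node.children.setdefault(base, TrieNode())
        node.annotations.append(annotation)

    def find_exact(self, tag: str) -> list[str]:
        # One child lookup per base: cost is linear in len(tag).
        node = self.root
        for base in tag:
            child = node.children.get(base)
            if child is None:
                return []
            node = child
        return list(node.annotations)

    def find_with_mismatches(
        self, tag: str, max_mismatches: int
    ) -> list[tuple[str, str, int]]:
        """Return (database tag, annotation, mismatches) for equal-length hits."""
        hits: list[tuple[str, str, int]] = []

        def walk(node: TrieNode, depth: int, mismatches: int, prefix: str) -> None:
            if depth == len(tag):
                for annotation in node.annotations:
                    hits.append((prefix, annotation, mismatches))
                return
            for base, child in node.children.items():
                cost = mismatches + (base != tag[depth])
                if cost <= max_mismatches:  # prune branches over budget
                    walk(child, depth + 1, cost, prefix + base)

        walk(self.root, 0, 0, "")
        return hits


trie = TagTrie()
trie.insert("CATGACGTACGTACGTACGTA", "refseq_example_1")
trie.insert("CATGTTTTGGGGCCCCAAAAT", "genomic_example_2")

print(trie.find_exact("CATGACGTACGTACGTACGTA"))  # ['refseq_example_1']
print(trie.find_with_mismatches("CATGACGTACGTACGTACGTC", max_mismatches=1))
```

Keeping the mismatch budget small (0 or 1) keeps the search fast, since each extra allowed mismatch lets the recursion descend into more sibling subtrees before pruning.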
trieFinder can quickly annotate DGE tags against three sources simultaneously, simplifying transcript quantification and novel transcript detection. It delivers the data in a simple parsed format, obviating the need to post-process the alignment results. trieFinder is available at
PMCID: PMC4287429  PMID: 25311246
RNA-Seq; Transcriptional profiling; DGE; SAGE
2.  Transcriptome profiling of fruit development and maturation in Chinese white pear (Pyrus bretschneideri Rehd) 
BMC Genomics  2013;14(1):823.
Pear (Pyrus spp) is an important fruit species worldwide; however, genetic and genomic information on pear is limited. Combining Solexa/Illumina high-throughput RNA sequencing (RNA-seq) with Digital Gene Expression (DGE) analysis would be a powerful tool for transcriptomic study. This paper reports the transcriptome profiling analysis of Chinese white pear (P. bretschneideri) using RNA-seq and DGE to better understand the molecular mechanisms of fruit development and maturation in Chinese white pear.
De novo transcriptome assembly and gene expression analysis of Chinese white pear were performed at an unprecedented depth (5.47 gigabase pairs) using high-throughput Illumina RNA-seq combined with a tag-based Digital Gene Expression (DGE) system. Approximately 60.77 million reads were sequenced, trimmed, and assembled into 90,227 unigenes. These unigenes comprised 17,619 contigs and 72,608 singletons, with an average length of 508 bp and an N50 of 635 bp. Sequence similarity analyses against six public databases (Uniprot, NR, and COGs at NCBI, Pfam, InterPro, and KEGG) found that 61,636 unigenes could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. By BLAST searches of all 61,636 unigenes against KEGG, a total of 31,215 unigenes were assigned to 121 known metabolic or signaling pathways, a few of which (primary, intermediate, and secondary metabolic pathways) are directly related to pear fruit quality. DGE libraries were constructed for each of the five fruit developmental stages. A large number of unigenes were significantly differentially expressed across the developmental stages of pear fruit.
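The abstract reports both a mean unigene length (508 bp) and an N50 (635 bp). N50 is the length L such that sequences of length L or longer together contain at least half of all assembled bases, so it weights the summary towards longer sequences. A minimal sketch of the calculation, using made-up lengths rather than the pear assembly:

```python
def n50(lengths: list[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no sequences")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for a non-empty list")


# Total = 1,500 bases; 500 + 400 = 900 >= 750, so N50 = 400.
print(n50([100, 200, 300, 400, 500]))  # 400
```

The mean of the same five lengths is 300, which illustrates why an assembly's N50 is usually larger than its mean length.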
Extensive transcriptome and DGE profiling data at five fruit developmental stages of Chinese white pear were obtained by deep sequencing, providing comprehensive gene expression information at the transcriptional level. These data could facilitate understanding of the molecular mechanisms of fruit development and maturation. Such a database can also serve as a public information platform for research on molecular biology and functional genomics in pear and related species.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-14-823) contains supplementary material, which is available to authorized users.
PMCID: PMC4046828  PMID: 24267665
3.  Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome 
Nucleic Acids Research  2013;42(5):2820-2832.
Recent sequencing technologies that allow massively parallel production of short reads are the method of choice for transcriptome analysis. In particular, digital gene expression (DGE) technologies produce expression data with a large dynamic range by generating short tag signatures for each cellular transcript. These tags can be mapped back to a reference genome to identify new transcribed regions that can be further covered by RNA-sequencing (RNA-Seq) reads. Here, we applied an integrated bioinformatics approach that combines DGE tags, RNA-Seq, tiling array expression data and species comparison to explore new transcriptional regions and their specific biological features, particularly tissue expression or conservation. We analysed tags from a large DGE data set (designated as ‘TranscriRef’). We then annotated 750 000 tags that were uniquely mapped to the human genome according to Ensembl. We retained transcripts originating from both DNA strands, categorized tags corresponding to protein-coding genes, antisense, intronic- or intergenic-transcribed regions and computed their overlap with annotated non-coding transcripts. Using this bioinformatics approach, we identified ∼34 000 novel transcribed regions located outside the boundaries of known protein-coding genes. As demonstrated using sequencing data from human pluripotent stem cells for biological validation, the method could easily be applied to the selection of tissue-specific candidate transcripts. DigitagCT is available at
PMCID: PMC3950697  PMID: 24357408
4.  Transcriptome Analysis of the Brown Planthopper Nilaparvata lugens 
PLoS ONE  2010;5(12):e14233.
The brown planthopper (BPH) Nilaparvata lugens (Stål) is one of the most serious insect pests of rice in Asia. However, little is known about the mechanisms responsible for the development, wing dimorphism and sex difference in this species. Genomic information for BPH is currently unavailable, and, therefore, transcriptome and expression profiling data for this species are needed as an important resource to better understand the biological mechanisms of BPH.
Methodology/Principal Findings
In this study, we performed de novo transcriptome assembly and gene expression analysis using short-read sequencing technology (Illumina) combined with a tag-based digital gene expression (DGE) system. The transcriptome assembly captures gene information for different developmental stages, sexes and wing forms of BPH. In addition, we constructed six DGE libraries: eggs, second instar nymphs, fifth instar nymphs, brachypterous female adults, macropterous female adults and macropterous male adults. Illumina sequencing revealed 85,526 unigenes, including 13,102 clusters and 72,424 singletons. Transcriptome sequences longer than 350 bp were subjected to Gene Ontology (GO) and KEGG Orthology (KO) annotation. For the DGE profiling, we mainly compared gene expression between eggs and second instar nymphs; second and fifth instar nymphs; fifth instar nymphs and the three types of adults; brachypterous and macropterous female adults; and macropterous female and male adults. Thousands of genes showed significantly different expression levels in these comparisons. We confirmed the altered expression levels of a random selection of genes by quantitative real-time PCR (qRT-PCR).
The obtained BPH transcriptome and DGE profiling data provide comprehensive gene expression information at the transcriptional level that could facilitate our understanding of the molecular mechanisms from various physiological aspects including development, wing dimorphism and sex difference in BPH.
PMCID: PMC2997790  PMID: 21151909
5.  Transcript profiling reveals expression differences in wild-type and glabrous soybean lines 
BMC Plant Biology  2011;11:145.
Trichome hairs affect diverse agronomic characters such as seed weight and yield, prevent insect damage and reduce water loss, but their molecular control has not been extensively studied in soybean. Several detailed models for trichome development have been proposed for Arabidopsis thaliana, but their applicability to important crops such as cotton and soybean is not fully known.
Two high-throughput transcript sequencing methods, Digital Gene Expression (DGE) Tag Profiling and RNA-Seq, were used to compare the transcriptional profiles of wild-type (cv. Clark standard, CS) and mutant (cv. Clark glabrous, i.e., trichomeless or hairless, CG) soybean isolines, the latter carrying the dominant P1 allele. DGE and RNA-Seq data were mapped to the cDNAs (Glyma models) predicted from the reference soybean genome, Williams 82. Extending the model length by 250 bp at both ends resulted in significantly more matches of authentic DGE tags, indicating that many of the predicted gene models are prematurely truncated at the 5' and 3' UTRs. The genome-wide comparison of wild-type and mutant transcript profiles revealed a number of differentially expressed genes. One gene highly expressed in wild-type soybean, Glyma04g35130, was of interest because it is highly homologous to the cotton gene GhRDL1, which has been implicated in cotton fiber initiation and is a member of the BURP protein family. Comparison of the Williams 82 sequence of Glyma04g35130 with our sequences from the CS and CG isolines revealed various SNPs and indels, including an addition of one nucleotide (C) in CG and an insertion of ~60 bp in the third exon of CS; these cause frameshift mutations and premature truncation of the peptides in both lines relative to Williams 82.
Although not a candidate for the P1 locus, a BURP family member (Glyma04g35130) from soybean has been shown to be abundantly expressed in the CS line and very weakly expressed in the glabrous CG line. RNA-Seq and DGE data are compared and provide experimental data on the expression of predicted soybean gene models as well as an overview of the genes expressed in young shoot tips of two closely related isolines.
PMCID: PMC3217893  PMID: 22029708
6.  De Novo Characterization of Japanese Scallop Mizuhopecten yessoensis Transcriptome and Analysis of Its Gene Expression following Cadmium Exposure 
PLoS ONE  2013;8(5):e64485.
Japanese scallop has been cultured on a large scale in China for many years. However, serious marine pollution in recent years has caused considerable losses to this industry. Moreover, owing to the lack of genomic resources, limited research has been carried out on this species. To facilitate understanding of immune and stress response mechanisms at the molecular level, we carried out extensive transcriptomic profiling of the Japanese scallop and built a digital gene expression (DGE) database of its response to cadmium exposure using the Illumina sequencing platform.
RNA-seq produced about 112 million sequencing reads from the tissues of adult Japanese scallops. These reads were assembled into 194,839 non-redundant sequences with open reading frames (ORFs), of which 14,240 putative amino acid sequences were assigned biological function annotations and annotated with gene ontology and eukaryotic orthologous group terms. In addition, we identified 720 genes involved in response to stimulus and 302 genes involved in immune-response pathways. Furthermore, we investigated the transcriptomic changes in the gill and digestive gland of Japanese scallops following cadmium exposure using a tag-based DGE system. A total of 7,556 and 3,002 differentially expressed genes were detected in the gill and digestive gland, respectively, and functionally annotated with KEGG pathways.
This study provides a comprehensive transcript sequence resource for the Japanese scallop and presents a survey of gene expression in response to heavy metal exposure in a non-model marine invertebrate using the Illumina sequencing platform. These results may contribute to in-depth elucidation of the molecular mechanisms underlying bivalve responses to marine pollutants.
PMCID: PMC3669299  PMID: 23741332
7.  Deep sequencing-based transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus reveals insight into the immune-relevant genes in marine fish 
BMC Genomics  2010;11:472.
Systematic research on fish immunogenetics is indispensable for understanding the origin and evolution of immune systems. This has long been challenging because deep sequencing technologies and genomic backgrounds for non-model fish have been limited. The newly developed Solexa/Illumina RNA-seq and Digital Gene Expression (DGE) approaches are high-throughput sequencing methods and powerful tools for genomic studies at the transcriptome level. This study reports the transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus using RNA-seq and DGE in an attempt to gain insights into the immunogenetics of marine fish.
RNA-seq analysis generated 169,950 non-redundant consensus sequences, among which 48,987 functional transcripts with coding regions of complete or varying length were identified. More than 52% of these transcripts are possibly involved in approximately 219 known metabolic or signalling pathways, while 2,673 transcripts were associated with immune-relevant genes. In addition, approximately 8% of the transcripts appeared to be fish-specific genes that have not been described before. DGE analysis revealed that the host transcriptome profile of Vibrio harveyi-challenged L. japonicus is considerably altered, as indicated by the significant up- or down-regulation of 1,224 strongly infection-responsive transcripts. The results indicated overall conservation of the components and transcriptome alterations underlying innate and adaptive immunity in fish and other vertebrate models. The analysis also suggested the acquisition of numerous fish-specific immune system components during early vertebrate evolution.
This study provided a global survey of host defence gene activities against bacterial challenge in a non-model marine fish. Results can contribute to the in-depth study of candidate genes in marine fish immunity, and help improve current understanding of host-pathogen interactions and evolutionary history of immunogenetics from fish to mammals.
PMCID: PMC3091668  PMID: 20707909
8.  Transcriptome analysis of Cymbidium sinense and its application to the identification of genes associated with floral development 
BMC Genomics  2013;14:279.
Cymbidium sinense belongs to the Orchidaceae, which is one of the most abundant angiosperm families. C. sinense, a high-grade traditional potted flower, is most prevalent in China and some Southeast Asian countries. The control of flowering time is a major bottleneck in the industrialized development of C. sinense. Little is known about the mechanisms responsible for floral development in this orchid. Moreover, genome references for entire transcriptome sequences do not currently exist for C. sinense. Thus, transcriptome and expression profiling data for this species are needed as an important resource to identify genes and to better understand the biological mechanisms of floral development in C. sinense.
In this study, de novo transcriptome assembly and gene expression analysis were performed using Illumina sequencing technology. The transcriptome assembly captures gene information related to the vegetative and reproductive growth of C. sinense. Illumina sequencing generated 54,248,006 high-quality reads that were assembled into 83,580 unigenes with an average sequence length of 612 base pairs, including 13,315 clusters and 70,265 singletons. A total of 41,687 (49.88%) unique sequences were annotated, 23,092 of which were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes (KEGG). Gene Ontology (GO) analysis of the annotated unigenes revealed that the majority of sequenced genes were associated with metabolic and cellular processes, cell and cell parts, catalytic activity and binding. Furthermore, 120 flowering-associated unigenes, 73 MADS-box unigenes and 28 CONSTANS-LIKE (COL) unigenes were identified from our collection. In addition, three digital gene expression (DGE) libraries were constructed for the vegetative phase (VP), floral differentiation phase (FDP) and reproductive phase (RP). Genes specifically expressed in each of the three developmental phases were also identified. Thirty-two genes with high differential expression among the three sub-libraries were selected as candidates connected with flower development.
RNA-seq and DGE profiling data provided comprehensive gene expression information at the transcriptional level that could facilitate our understanding of the molecular mechanisms of floral development at three developmental phases of C. sinense. These data could be used as an important resource for investigating the genetics of the flowering pathway and various biological mechanisms in this orchid.
PMCID: PMC3639151  PMID: 23617896
Floral development; Flowering time; Digital gene expression; Transcriptome; Cymbidium sinense
9.  Transcriptome Analysis of Chlorantraniliprole Resistance Development in the Diamondback Moth Plutella xylostella 
PLoS ONE  2013;8(8):e72314.
The diamondback moth Plutella xylostella has developed a high level of resistance to the latest insecticide, chlorantraniliprole. A better understanding of the resistance mechanism of P. xylostella to chlorantraniliprole is needed to develop effective approaches for insecticide resistance management.
Principal Findings
To provide comprehensive insight into the resistance mechanisms of P. xylostella to chlorantraniliprole, transcriptome assembly and tag-based digital gene expression (DGE) profiling were performed using the Illumina HiSeq™ 2000. Transcriptome analysis of the susceptible strain (SS) yielded 45,231 unigenes (ranging in size from 200 bp to 13,799 bp), which would be useful for analyzing differences among chlorantraniliprole-resistant P. xylostella strains. DGE analysis indicated that a total of 1215 genes (189 up-regulated and 1026 down-regulated) were gradient differentially expressed between the susceptible strain (SS) and chlorantraniliprole-resistant strains with low-level (GXA), moderate (LZA) and high (HZA) resistance. A detailed analysis of the gradient differentially expressed genes revealed a phase-dependent divergence of biological investment at the molecular level. Genes related to insecticide resistance, such as P450, GST, the ryanodine receptor, and connectin, had different expression profiles in the different chlorantraniliprole-resistant DGE libraries, suggesting that they are involved in the development of chlorantraniliprole resistance in P. xylostella. To confirm the DGE results, the expression profiles of 4 genes related to insecticide resistance were further validated by qRT-PCR analysis.
The obtained transcriptome information provides a large gene resource for further study of resistance development in P. xylostella to pesticides. The DGE data provide comprehensive insights into the gene expression profiles of the different chlorantraniliprole-resistant strains. The genes specifically related to insecticide resistance, with their differing expression profiles, will facilitate study of the role of each gene in chlorantraniliprole resistance development.
PMCID: PMC3748044  PMID: 23977278
10.  Global transcriptome profiles of Camellia sinensis during cold acclimation 
BMC Genomics  2013;14:415.
Tea is the most popular non-alcoholic health beverage in the world. The tea plant (Camellia sinensis (L.) O. Kuntze) needs to undergo a cold acclimation process to enhance its freezing tolerance in winter. Changes that occur at the molecular level in response to low temperatures are poorly understood in tea plants. To elucidate the molecular mechanisms of cold acclimation, we employed RNA-Seq and digital gene expression (DGE) technologies to study genome-wide expression profiles during cold acclimation in tea plants.
Using the Illumina sequencing platform, we obtained approximately 57.35 million RNA-Seq reads. These reads were assembled into 216,831 transcripts, with an average length of 356 bp and an N50 of 529 bp. In total, 1,770 differentially expressed transcripts were identified, of which 1,168 were up-regulated and 602 down-regulated. These include a group of cold sensor or signal transduction genes, cold-responsive transcription factor genes, plasma membrane stabilization related genes, osmosensing-responsive genes, and detoxification enzyme genes. DGE and quantitative RT-PCR analysis further confirmed the results from RNA-Seq analysis. Pathway analysis indicated that the “carbohydrate metabolism pathway” and the “calcium signaling pathway” might play a vital role in tea plants’ responses to cold stress.
Our study presents a global survey of transcriptome profiles of tea plants in response to low, non-freezing temperatures and yields insights into the molecular mechanisms of the cold acclimation process in tea plants. It could also serve as a valuable resource for research on cold tolerance and help identify cold-related genes, improving our understanding of low-temperature tolerance and plant-environment interactions.
PMCID: PMC3701547  PMID: 23799877
Camellia Sinensis; Cold Acclimation; RNA-Seq; DGE; Genome-wide Expression Profiles; Tea Plants
11.  Differentially-expressed genes in rice infected by Xanthomonas oryzae pv. oryzae relative to a flagellin-deficient mutant reveal potential functions of flagellin in host–pathogen interactions 
Rice  2014;7(1):20.
Plants have evolved a sensitive defense response system that detects and recognizes various pathogen-associated molecular patterns (PAMPs) (e.g. flagellin) and induces immune responses to protect against invasion. Transcriptional responses in rice to PAMPs produced by Xanthomonas oryzae pv. oryzae (Xoo), the bacterial blight pathogen, have not yet been defined.
We characterized transcriptomic responses in rice inoculated with wild-type (WT) Xoo or the flagellin-deficient mutant ∆fliC through RNA-seq analysis. Digital gene expression (DGE) analysis based on Solexa/Illumina sequencing was used to investigate transcriptomic responses in 30-day-old seedlings of rice (Oryza sativa L. cv. Nipponbare). In total, 1,680 genes were differentially expressed (DEGs) in rice inoculated with WT relative to ∆fliC, of which 1,159 were up-regulated and 521 down-regulated. Expression patterns of 12 randomly selected DEGs assayed by quantitative real-time PCR (qRT-PCR) were similar to those detected by DGE analysis, confirming the reliability of the DGE data. Functional annotations revealed that the up-regulated DEGs are involved in cell wall, lipid and secondary metabolism, defense response and hormone signaling, whereas the down-regulated ones are associated with photosynthesis. Moreover, 57 and 21 specifically expressed genes were found after WT and ∆fliC treatments, respectively.
DEGs were identified in rice inoculated with WT Xoo relative to ∆fliC. These genes were predicted to function in multiple biological processes, including the defense response and photosynthesis in rice. This study provided additional insights into molecular basis of rice response to bacterial infection and revealed potential functions of bacterial flagellin in the rice-Xoo interactions.
PMCID: PMC4152760  PMID: 25187853
Rice; Differentially-expressed genes (DEGs); Flagellin; Immune response; Xanthomonas oryzae pv. oryzae
12.  Differential Gene Expression in the Siphonophore Nanomia bijuga (Cnidaria) Assessed with Multiple Next-Generation Sequencing Workflows 
PLoS ONE  2011;6(7):e22953.
We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing.
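The false negatives described above arise because a tag-based read can only be assigned to a reference that contains the tag site. A minimal sketch of deriving one tag per reference, assuming a SAGE-style protocol that anchors on the 3'-most CATG site (the NlaIII recognition sequence) and a 21-nt tag; the site, tag length and reference sequences are hypothetical choices, and the exact tag-site rules of SOLiD SAGE and Helicos DGE in this study may differ:

```python
from __future__ import annotations

# Assumptions for illustration only: an NlaIII-style anchoring site (CATG)
# and a 21-nt tag that includes the site. Real protocols may differ.
ANCHOR_SITE = "CATG"
TAG_LENGTH = 21


def extract_tag(
    transcript: str, site: str = ANCHOR_SITE, tag_length: int = TAG_LENGTH
) -> str | None:
    """Return the tag starting at the 3'-most `site` in `transcript`, or None.

    None means the reference cannot receive tag reads: either the site is
    absent (e.g. not yet sequenced in a partial reference) or the sequence
    ends before a full-length tag can be read.
    """
    seq = transcript.upper()
    pos = seq.rfind(site)  # 3'-most occurrence of the anchoring site
    if pos == -1 or pos + tag_length > len(seq):
        return None
    return seq[pos : pos + tag_length]


# Hypothetical references: complete, missing the site, and truncated.
references = {
    "complete_ref": "GGGAAACATGACGTACGTACGTACGTATTTTT",
    "no_site_ref": "ACGTACGTTTGGAACC",
    "truncated_ref": "AAACATGACGT",
}
for name, seq in references.items():
    print(name, extract_tag(seq))
```

A partial reference can also fail more subtly: if its true 3'-most site lies in the unsequenced region, this sketch returns a tag from an upstream site that real reads would not match.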
PMCID: PMC3146525  PMID: 21829563
13.  Transcriptome Analysis of the Asian Honey Bee Apis cerana cerana 
PLoS ONE  2012;7(10):e47954.
The Eastern hive honey bee, Apis cerana cerana, is a native and widely bred honey bee species in China. Molecular biology research on this species is scarce, and genomic information for A. c. cerana is not currently available. Transcriptome and expression profiling data for this species are therefore important resources needed to better understand the biological mechanisms of A. c. cerana. In this study, we obtained the transcriptome information of A. c. cerana by RNA sequencing and compared gene expression differences between queens and workers of A. c. cerana by digital gene expression (DGE) analysis.
Using high-throughput Illumina RNA sequencing, we obtained 51,581,510 clean reads corresponding to 4.64 Gb of total nucleotides from a single run. These reads were assembled into 46,999 unigenes with a mean length of 676 bp. Based on BLASTX similarity searches against five public databases (NR, Swiss-Prot, GO, COG, KEGG) with an E-value cut-off of 10⁻⁵, a total of 24,630 unigenes were annotated with gene descriptions, gene ontology terms, or metabolic pathways. Using these transcriptome data as references, we analyzed gene expression differences between queens and workers of A. c. cerana using a tag-based digital gene expression method. We obtained 5.96 and 5.66 million clean tags from the queen and worker samples, respectively. A total of 414 genes were differentially expressed between them, with 189 up-regulated and 225 down-regulated in queens.
Our transcriptome data provide a comprehensive sequence resource for future A. c. cerana study, establishing an important public information platform for functional genomic studies in A. c. cerana. Furthermore, the DGE data provide comprehensive gene expression information for the queens and workers, which will facilitate our understanding of the molecular mechanisms of the different physiological aspects of the two castes.
PMCID: PMC3480438  PMID: 23112877
14.  Gene Expression Profiling in Winged and Wingless Cotton Aphids, Aphis gossypii (Hemiptera: Aphididae) 
While trade-offs between flight capability and reproduction are a common phenomenon in wing-dimorphic insects, the molecular basis is largely unknown. In this study, we examined the transcriptomic differences between winged and wingless morphs of cotton aphids, Aphis gossypii, using a tag-based digital gene expression (DGE) approach. Ultra-high-throughput Illumina sequencing generated 5.30 and 5.39 million raw tags from the winged and wingless A. gossypii DGE libraries, respectively. We identified 1,663 differentially expressed transcripts, of which 58 were more highly expressed in winged A. gossypii, whereas 1,605 were expressed at significantly higher levels in the wingless morph. Bioinformatics tools, including Gene Ontology, Clusters of Orthologous Groups, euKaryotic Orthologous Groups and Kyoto Encyclopedia of Genes and Genomes pathways, were used to functionally annotate these transcripts. In addition, 20 differentially expressed transcripts detected by DGE were validated by quantitative real-time PCR. Comparative transcriptomic analysis of sedentary (wingless) and migratory (winged) A. gossypii not only advances our understanding of the trade-offs in wing-dimorphic insects, but also provides a candidate molecular target for the genetic control of this agricultural insect pest.
PMCID: PMC3957081  PMID: 24644424
Aphis gossypii; trade-off; migration; digital gene expression; wing polyphenism.
15.  Global Transcriptome Analysis of Orange Wheat Blossom Midge, Sitodiplosis mosellana (Gehin) (Diptera: Cecidomyiidae) to Identify Candidate Transcripts Regulating Diapause 
PLoS ONE  2013;8(8):e71564.
Many insects enter a developmental arrest (diapause) that allows them to survive harsh seasonal conditions. Despite the well-established ecological significance of diapause, the molecular basis of this crucial adaptation remains largely unresolved. Sitodiplosis mosellana (Gehin), the orange wheat blossom midge (OWBM), causes serious damage to wheat throughout the northern hemisphere, and sporadic outbreaks occur worldwide. Traits related to diapause appear to be important factors contributing to its rapid spread and outbreaks. To better understand the diapause mechanisms of OWBM, we sequenced the transcriptome and determined the gene expression profile of this species.
Methodology/Principal Findings
In this study, we performed de novo transcriptome analysis using short-read sequencing technology (Illumina) and gene expression analysis with a tag-based digital gene expression (DGE) system. Sequencing generated 89,117 contigs and 45,713 unigenes. These unigenes were annotated by BLASTX alignment against the NCBI non-redundant (nr), Clusters of Orthologous Groups (COG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. In total, 20,802 unigenes (45.5% of the total) matched proteins in the NCBI nr database. Two DGE libraries were constructed to determine differences in gene expression profiles between diapause and non-diapause developmental stages. Genes related to diapause were analyzed in detail; in addition, nine diapause-related genes were analyzed by real-time PCR.
The OWBM transcriptome greatly improves our genetic understanding of this species and provides a platform for its functional genomics research. The DGE profiling data provide comprehensive information at the transcriptional level that facilitates our understanding of the molecular mechanisms of various physiological aspects, including development and diapause, in OWBM. This study shows that various genes encoding metabolic enzymes are crucial for diapause and metamorphosis.
PMCID: PMC3733836  PMID: 23940768
16.  Global Gene Expression Analysis of the Zoonotic Parasite Trichinella spiralis Revealed Novel Genes in Host Parasite Interaction 
Trichinellosis is a typical food-borne zoonotic disease that is epidemic worldwide, and the nematode Trichinella spiralis is its main pathogen. The life cycle of T. spiralis comprises three developmental stages: adult worms, newborn larvae (newborn L1 larvae), and muscle larvae (infective L1 larvae). Stage-specific gene expression in the parasite has been investigated with various immunological and cDNA cloning approaches, whereas its genome-wide transcriptome and expression features have remained largely unknown. The availability of the T. spiralis genome sequence has made it possible to dissect parasite biology in depth in association with global gene expression and pathogenesis.
Methodology and Principal Findings
In this study, we analyzed the global gene expression patterns in the three developmental stages of T. spiralis using digital gene expression (DGE) analysis. Almost 15 million sequence tags were generated with Illumina RNA-seq technology, producing expression data for more than 9,000 genes, covering 65% of the genome. The transcriptome analysis revealed thousands of differentially expressed genes and, importantly, identified a panel of genes encoding functional proteins associated with parasite invasion and immunomodulation. More than 45% of the genes were found to be transcribed from both strands, indicating the importance of RNA-mediated gene regulation in the development of the parasite. Further, based on Gene Ontology analysis, over 3,000 genes were functionally categorized, and biological pathways in the three life-cycle stages were elucidated.
Conclusions and Significance
The global transcriptome of T. spiralis in three developmental stages has been profiled, and most gene activity in the genome was found to be developmentally regulated. Many metabolic and biological pathways have been revealed. The differential expression of several protein families will facilitate understanding of the molecular mechanisms of parasite biology and the pathological aspects of trichinellosis.
Author Summary
Trichinellosis in humans and other mammals is caused by ingesting the parasite Trichinella spiralis in contaminated meat. It is a typical zoonotic disease that affects more than 10 million people worldwide. Parasites of the genus Trichinella are unique intracellular pathogens. Adult Trichinella parasites directly release newborn larvae, which invade striated muscle cells and cause disease. In this study, we profiled the global transcriptome in the three developmental stages of T. spiralis. The transcriptomic analysis revealed global gene expression patterns from the newborn larval stage through the muscle larval stage to adults. Thousands of genes with stage-specific transcriptional patterns were described, and novel genes involved in host-parasite interaction were identified. More than 45% of the protein-coding genes showed evidence of transcription from both sense and antisense strands, which suggests the importance of RNA-mediated gene regulation in the parasite. This study presents the first deep analysis of the transcriptome of T. spiralis, providing insight into the parasite's biology.
PMCID: PMC3429391  PMID: 22953016
17.  Changes in the Organics Metabolism in the Hepatopancreas Induced by Eyestalk Ablation of the Chinese Mitten Crab Eriocheir sinensis Determined via Transcriptome and DGE Analysis 
PLoS ONE  2014;9(4):e95827.
To understand how eyestalk ablation regulates the activities of the hepatopancreas, Illumina RNA-Seq and digital gene expression (DGE) analyses were performed to investigate the transcriptomes of the eyestalk, Y-organ, and hepatopancreas of E. sinensis and to identify genes associated with hepatopancreas metabolism that are differentially expressed under eyestalk ablation conditions.
A total of 58,582 unigenes were constructed from 157,168 contigs with SOAPdenovo. A BLASTX search against the NCBI Nr database identified 21,678 unigenes with an E-value below 10⁻⁵. Using the Blast2GO and BlastAll software programs, 6,883 unigenes (11.75% of the total) were annotated to the Gene Ontology (GO) database, 7,386 (12.6%) unigenes were classified into 25 Clusters of Orthologous Groups of Proteins (COGs), 16,200 (27.7%) unigenes were assigned to 242 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and 1,846 unigenes were matched to “metabolism pathways”. The DGE analysis revealed that 1,416 unigenes were significantly differentially expressed in the hepatopancreas, of which 890 were up-regulated and 526 were down-regulated. Of the differentially expressed genes, 382 unigenes were annotated and 63 were classified into metabolism pathways. Real-time polymerase chain reaction (PCR) analysis of four unigenes related to carbohydrate metabolism gave results consistent with those of the DGE analysis, which demonstrates that the sequencing data were satisfactory for further gene expression analyses.
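Annotation counts like those above come from filtering BLAST hits on an E-value cut-off; by convention a cut-off of 10⁻⁵ keeps hits whose E-value is at or below that value. The sketch below illustrates that filter and the percentage arithmetic in Python. It assumes BLAST+ tabular output (`-outfmt 6`, where column 1 is the query ID and column 11 the E-value); the function names and sample rows are hypothetical, and this is not the authors' Blast2GO/BlastAll pipeline.

```python
"""Sketch: count unigenes with a BLASTX hit at or below an E-value cut-off.

Assumes BLAST+ tabular output (-outfmt 6): column 1 is the query ID and
column 11 is the E-value. The sample rows below are hypothetical.
"""
import csv
import io

E_VALUE_CUTOFF = 1e-5  # a hit is kept only if its E-value is <= 1e-5


def annotated_queries(blast_tsv: str, cutoff: float = E_VALUE_CUTOFF) -> set[str]:
    """Return IDs of queries with at least one hit passing the cut-off."""
    kept = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:  # skip blank lines
            continue
        query_id, evalue = row[0], float(row[10])
        if evalue <= cutoff:
            kept.add(query_id)
    return kept


def percent_of_total(n: int, total: int) -> float:
    """Percentage rounded to two decimals, e.g. 6,883 of 58,582 unigenes."""
    return round(100 * n / total, 2)


# Hypothetical 12-column rows: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
SAMPLE = (
    "unigene1\tgi|1\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t150\n"
    "unigene2\tgi|2\t30.0\t50\t35\t2\t1\t150\t10\t60\t0.5\t25\n"
    "unigene3\tgi|3\t60.0\t90\t36\t1\t1\t270\t5\t95\t2e-6\t60\n"
)
print(sorted(annotated_queries(SAMPLE)))  # ['unigene1', 'unigene3']
print(percent_of_total(6883, 58582))      # 11.75
```

Only the query ID and E-value columns are read, so the sketch does not depend on the other columns beyond their position.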
This paper reports the transcriptomes of the eyestalk, Y-organ, and hepatopancreas of E. sinensis. DGE analysis identified differentially expressed genes involved in metabolic processes in the hepatopancreas that are affected by eyestalk ablation. These findings will facilitate further investigation of the mechanisms of organic substance metabolism during development and reproduction in crustaceans.
PMCID: PMC3995808  PMID: 24755618
18.  Global Analysis of Transcriptome Responses and Gene Expression Profiles to Cold Stress of Jatropha curcas L. 
PLoS ONE  2013;8(12):e82817.
Jatropha curcas L., also called the physic nut, is an oil-rich shrub with multiple uses, including biodiesel production, and is currently exploited as a renewable energy resource in many countries. However, because it originates from the tropical Mid-American zone, J. curcas has an inherent but undesirable characteristic (low cold resistance) that may seriously restrict its large-scale cultivation. This adaptive flaw can be improved genetically by elucidating the mechanisms underlying plant tolerance to cold temperatures. Illumina HiSeq™ 2000 RNA-seq and Digital Gene Expression (DGE) are newly developed, deep, high-throughput approaches for transcriptome-level gene expression analysis; using them, we investigated the gene expression profiles of J. curcas in response to cold stress to gain insight into the molecular mechanisms of its cold response.
In total, 45,251 unigenes were obtained by assembling clean data generated by RNA-seq analysis of the J. curcas transcriptome. A total of 33,363 and 912 complete or partial coding sequences (CDSs) were determined by protein database alignments and ESTScan prediction, respectively. Among these unigenes, more than 41.52% were involved in approximately 128 known metabolic or signaling pathways, and 4,185 were possibly associated with cold resistance. DGE analysis was used to assess changes in gene expression after exposure to cold (12°C) for 12, 24, and 48 h. The results showed that 3,178 genes were significantly upregulated and 1,244 were downregulated under cold stress. These genes were then functionally annotated based on the transcriptome data from RNA-seq analysis.
This study provides a global view of transcriptome response and gene expression profiling of J. curcas in response to cold stress. The results can help improve our current understanding of the mechanisms underlying plant cold resistance and favor the screening of crucial genes for genetically enhancing cold resistance in J. curcas.
PMCID: PMC3857291  PMID: 24349370
19.  EXPRSS: an Illumina based high-throughput expression-profiling method to reveal transcriptional dynamics 
BMC Genomics  2014;15(1):341.
Next Generation Sequencing technologies have facilitated differential gene expression analysis through RNA-seq and Tag-seq methods. RNA-seq has biases associated with transcript length, lacks uniform coverage of regions in mRNA, and requires 10–20 times more reads than a typical Tag-seq. Most existing Tag-seq methods either have biases or are not high-throughput because they use restriction enzymes, enzymatic manipulation of the 5′ ends of mRNA, or RNA ligation.
We have developed EXpression Profiling through Randomly Sheared cDNA tag Sequencing (EXPRSS), which employs acoustic waves to randomly shear cDNA and generate sequence tags at a relatively defined position (~150-200 bp) from the 3′ end of each mRNA. Implementation of the method was verified through comparative analysis of expression data generated from EXPRSS, NlaIII-DGE, and Affymetrix microarrays, and through qPCR quantification of selected genes. EXPRSS is a strand-specific, restriction-enzyme-independent tag sequencing method that does not require cDNA length-based data transformations. EXPRSS is highly reproducible and high-throughput, and it also reveals alternative polyadenylation and polyadenylated antisense transcripts. It is cost-effective through barcoded multiplexing, avoids the biases of existing SAGE and derivative methods, and can reveal polyadenylation positions from paired-end sequencing.
EXPRSS Tag-seq provides sensitive and reliable gene expression data and enables high-throughput expression profiling with relatively simple downstream analysis.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-341) contains supplementary material, which is available to authorized users.
PMCID: PMC4035070  PMID: 24884414
Next generation sequencing; Tag-seq; High throughput expression profiling; RNA-seq; EXPRSS
20.  Transcriptomic analysis of ‘Suli’ pear (Pyrus pyrifolia white pear group) buds during the dormancy by RNA-Seq 
BMC Genomics  2012;13:700.
Bud dormancy is a critical developmental process that allows perennial plants to survive unfavorable environmental conditions. Pear is one of the most important deciduous fruit trees in the world, but the mechanisms regulating bud dormancy in this species are unknown. Because genomic information for pear is currently unavailable, transcriptome and digital gene expression data for this species would be valuable resources to better understand the molecular and biological mechanisms regulating its bud dormancy.
We performed de novo transcriptome assembly and digital gene expression (DGE) profiling analyses of ‘Suli’ pear (Pyrus pyrifolia white pear group) using the Illumina RNA-seq system. RNA-Seq generated approximately 100 M high-quality reads that were assembled into 69,393 unigenes (mean length = 853 bp), including 14,531 clusters and 34,194 singletons. A total of 51,448 (74.1%) unigenes were annotated using public protein databases with an E-value cut-off of 10⁻⁵. We mainly compared gene expression levels at four time points during bud dormancy. Between Nov. 15 and Dec. 15, Dec. 15 and Jan. 15, and Jan. 15 and Feb. 15, 1,978, 1,024, and 3,468 genes were differentially expressed, respectively. Hierarchical clustering analysis arranged 190 significantly differentially expressed genes into seven groups. Seven genes were randomly selected to confirm their expression levels using quantitative real-time PCR.
The new transcriptomes offer comprehensive sequence and DGE profiling data for a dynamic view of transcriptomic variation during bud dormancy in pear. These data provide a basis for future studies of metabolism during bud dormancy in non-model but economically important perennial species.
PMCID: PMC3562153  PMID: 23234335
‘Suli’ pear (Pyrus pyrifolia white pear group); Transcriptome; Bud dormancy; RNA-Seq
21.  Identification of Salt-Stress-Induced Genes from the RNA-Seq Data of Reaumuria trigyna Using Differential-Display Reverse Transcription PCR 
Next generation sequencing (NGS) technologies have been used to generate huge amounts of sequencing data from many organisms. However, when using these genetic resources, it is vital to choose candidate genes correctly and to avoid false-positive results in digital gene expression (DGE) analyses of RNA-seq data. We indirectly identified 18 salt-stress-induced Reaumuria trigyna transcripts from the transcriptome sequencing data using differential-display reverse transcription PCR (DDRT-PCR) combined with local BLAST searches. Highly consistent with the DGE results, the quantitative real-time PCR expression patterns of these transcripts showed strong upregulation by salt stress, suggesting that these genes may play important roles in R. trigyna's survival in high-salt environments. The method presented here successfully identified responsive genes from a massive amount of RNA-seq data. Thus, we suggest that DDRT-PCR could be employed to mine NGS data in a wide range of transcriptomic applications. In addition, the genes identified in the present study are promising candidates for further elucidation of the salt tolerance mechanisms in R. trigyna.
PMCID: PMC4322826
22.  Transcriptomic and phylogenetic analysis of Culex pipiens quinquefasciatus for three detoxification gene families 
BMC Genomics  2012;13:609.
The genomes of three major mosquito vectors of human diseases, Anopheles gambiae, Aedes aegypti, and Culex pipiens quinquefasciatus, have been previously sequenced. C. p. quinquefasciatus has the largest number of predicted protein-coding genes, which partially results from the expansion of three detoxification gene families: cytochrome P450 monooxygenases (P450), glutathione S-transferases (GST), and carboxyl/cholinesterases (CCE). However, unlike An. gambiae and Ae. aegypti, which have large amounts of gene expression data, C. p. quinquefasciatus has limited transcriptomic resources. Complete gene expression information is essential for exploring the functions of genes involved in specific biological processes. In the present study, the three detoxification gene families of C. p. quinquefasciatus were analyzed for phylogenetic classification and compared with those of three other dipteran insects. Gene expression during various developmental stages and the differential expression responsible for parathion resistance were profiled using the digital gene expression (DGE) technique.
A total of 302 detoxification genes were found in C. p. quinquefasciatus, including 71 CCE, 196 P450, and 35 cytosolic GST genes. Compared with three other dipteran species, gene expansion in Culex mainly occurred in the CCE and P450 families, where the genes for α-esterases, juvenile hormone esterases, and CYP325 of the CYP4 subfamily showed the most pronounced expansion in the genome. For the five DGE libraries, 3.5-3.8 million raw tags were generated and mapped to 13,314 reference genes. Of the 302 detoxification genes, 225 (75%) were detected as expressed in at least one DGE library. One fourth of the CCE and P450 genes were detected in only one stage, indicating potentially developmentally regulated expression. A total of 1,511 genes showed different expression levels between a parathion-resistant and a susceptible strain. Fifteen detoxification genes, including 2 CCEs, 6 GSTs, and 7 P450s, were expressed at higher levels in the resistant strain.
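Comparing expression between two DGE libraries requires putting raw tag counts on a common scale first, because each library yields a different total number of tags (here 3.5-3.8 million). The Python sketch below shows one common approach, assuming tags-per-million (TPM) normalization and a simple log2 fold-change threshold; the function names, pseudocount, threshold, and gene counts are hypothetical, and the abstract does not state which normalization or statistical test the authors used. Real analyses also apply a significance test, which this sketch omits.

```python
"""Sketch: tags-per-million (TPM) normalization and fold-change screening.

Hypothetical example: raw DGE tag counts per gene for two libraries
(e.g. a resistant vs a susceptible strain). Real studies also apply a
statistical test; this sketch filters on fold change only.
"""
import math


def tags_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw tag counts so that each library sums to one million."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def log2_fold_change(tpm_a: float, tpm_b: float, pseudocount: float = 0.01) -> float:
    """log2(A/B); a small pseudocount avoids division by zero for absent tags."""
    return math.log2((tpm_a + pseudocount) / (tpm_b + pseudocount))


def fold_change_hits(
    counts_a: dict[str, int],
    counts_b: dict[str, int],
    min_abs_log2fc: float = 1.0,  # hypothetical threshold: at least a 2-fold change
) -> list[str]:
    """Return genes whose |log2 fold change| between libraries meets the threshold."""
    tpm_a = tags_per_million(counts_a)
    tpm_b = tags_per_million(counts_b)
    hits = []
    for gene in sorted(set(tpm_a) | set(tpm_b)):
        lfc = log2_fold_change(tpm_a.get(gene, 0.0), tpm_b.get(gene, 0.0))
        if abs(lfc) >= min_abs_log2fc:
            hits.append(gene)
    return hits


resistant = {"geneA": 300, "geneB": 700}    # hypothetical counts
susceptible = {"geneA": 100, "geneB": 900}  # hypothetical counts
print(fold_change_hits(resistant, susceptible))  # ['geneA']
```

Genes missing from one library are treated as having zero tags there, so the pseudocount keeps their fold change finite rather than raising an error.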
The results of the present study provide new insights into the functions and evolution of three detoxification gene families in mosquitoes and comprehensive transcriptomic resources for C. p. quinquefasciatus, which will facilitate the elucidation of molecular mechanisms underlying the different biological characteristics of the three major mosquito vectors.
PMCID: PMC3505183  PMID: 23140097
Carboxyl/cholinesterases; Cytochrome P450 monooxygenases; Glutathione S-transferases; Insecticide resistance; Gene expansion; Gene expression
23.  Transcriptome and Gene Expression Analysis of the Rice Leaf Folder, Cnaphalocrosis medinalis 
PLoS ONE  2012;7(11):e47401.
The rice leaf folder (RLF), Cnaphalocrocis medinalis (Guenee) (Lepidoptera: Pyralidae), is one of the most destructive pests affecting rice in Asia. Although several studies have been performed on the ecological and physiological aspects of this species, the molecular mechanisms underlying its developmental regulation, behavior, and insecticide resistance remain largely unknown. Presently, there is a lack of genomic information for RLF; therefore, profiling the RLF transcriptome would provide a better understanding of its biological function at the molecular level.
Principal Findings
De novo assembly of the RLF transcriptome was performed using short-read sequencing technology (Illumina). In a single run, we produced more than 23 million sequencing reads that were assembled into 44,941 unigenes (mean size = 474 bp) by Trinity. Through a similarity search, 25,281 (56.82%) unigenes matched known proteins in the NCBI Nr protein database. The transcriptome sequences were annotated with gene ontology (GO), cluster of orthologous groups of proteins (COG), and KEGG orthology (KO). Additionally, we profiled gene expression during RLF development using a tag-based digital gene expression (DGE) system. Five DGE libraries were constructed, and variations in gene expression were compared between collected samples: eggs vs. 3rd instar larvae, 3rd instar larvae vs. pupae, and pupae vs. adults. The results demonstrated that thousands of genes were significantly differentially expressed during various developmental stages. A number of the differentially expressed genes were confirmed by quantitative real-time PCR (qRT-PCR).
The RLF transcriptome and DGE data provide a comprehensive, global gene expression profile that will further our understanding of the molecular mechanisms underlying various biological characteristics of RLF, including development, elevated fecundity, flight, sex differentiation, olfactory behavior, and insecticide resistance. These findings could therefore help elucidate the intrinsic factors involved in RLF damage to rice and support sustainable insect pest management.
PMCID: PMC3501527  PMID: 23185238
24.  Second-Generation Sequencing Supply an Effective Way to Screen RNAi Targets in Large Scale for Potential Application in Pest Insect Control 
PLoS ONE  2011;6(4):e18644.
The success of an RNAi approach for insect pest control depends mainly on careful target selection and a convenient delivery system. We adopted second-generation sequencing technology to screen RNAi targets: Illumina's RNA-seq and digital gene expression tag profiling (DGE-tag) technologies were used to screen for optimal RNAi targets in Ostrinia furnacalis. A total of 14,690 stage-specific genes that can be considered potential targets were obtained, and 47 were confirmed by qRT-PCR. Ten genes specifically expressed in the larval stage were selected for RNAi testing. When 50 ng/µl dsRNAs of the genes DS10 and DS28 were sprayed directly onto newly hatched larvae placed on filter paper, larval mortality was around 40-50%; when the dsRNAs of the ten genes were sprayed on the larvae along with artificial diet, mortality reached 73% to 100% at 5 d after treatment. qRT-PCR analysis verified the correlation between larval mortality and down-regulation of target gene expression. Topically applied fluorescent dsRNA confirmed that dsRNA penetrated the body wall and circulated in the body cavity. The combination of DGE-tag with RNA-seq thus appears to be a rapid, high-throughput, low-cost, and easy way to select candidate target genes for RNAi. More importantly, the results demonstrate that dsRNAs can penetrate the integument and stunt larval development and/or cause death in a lepidopteran insect. This finding greatly broadens RNAi target selection from gut-specific genes alone to targets throughout the whole insect and may lead to new strategies for designing RNAi-based technology against insect damage.
PMCID: PMC3073972  PMID: 21494551
25.  Transcriptome profile analysis of young floral buds of fertile and sterile plants from the self-pollinated offspring of the hybrid between novel restorer line NR1 and Nsa CMS line in Brassica napus 
BMC Genomics  2013;14:26.
The fertile and sterile plants were derived from the self-pollinated offspring of the F1 hybrid between the novel restorer line NR1 and the Nsa CMS line in Brassica napus. To elucidate gene expression and regulation caused by the A and C subgenomes of B. napus, as well as the alien chromosome and cytoplasm from Sinapis arvensis during the development of young floral buds, we performed a genome-wide high-throughput transcriptomic sequencing for young floral buds of sterile and fertile plants.
In this study, equal amounts of total RNA taken from young floral buds of sterile and fertile plants were sequenced using the Illumina/Solexa platform. After low-quality data were filtered out, 2,760,574 and 2,714,441 clean tags remained in the two libraries, from which 242,163 (Ste) and 253,507 (Fer) distinct tags were obtained. All distinct sequencing tags were annotated against all possible CATG+17-nt sequences of the genome and transcriptome of Brassica rapa and of those of Brassica oleracea as reference sequences. In total, 3,231 genes of B. rapa and 3,371 genes of B. oleracea showed significantly different expression levels. GO and pathway-based analyses were performed to determine and further understand the biological functions of these differentially expressed genes (DEGs). In addition, 1,089 unknown tags were specifically expressed in Fer and mapped to neither B. oleracea nor B. rapa; these unique tags were presumed to arise mainly from the added alien chromosome of S. arvensis. Fifteen genes were randomly selected and their expression levels were checked by quantitative RT-PCR; fourteen of them showed expression patterns consistent with the digital gene expression (DGE) data.
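Annotating tags against "all possible CATG+17-nt sequences" means first extracting a 21-nt virtual tag (the 4-nt NlaIII recognition site CATG plus the 17 nt downstream) at every CATG site in each reference, then looking up each sequenced tag. A minimal Python sketch under those assumptions follows; it scans both strands, uses a plain dictionary for exact matches only (no mismatch tolerance), and treats unmatched tags as "unknown", analogous to the 1,089 Fer-specific tags. The function names and sequences are hypothetical, not the authors' pipeline.

```python
"""Sketch: annotate DGE tags by exact match to CATG+17-nt virtual tags.

Assumption: each tag is the 4-nt NlaIII site CATG plus the 17 nt downstream
(21 nt total), taken from every CATG site on both strands of each
reference. Reference IDs and sequences below are hypothetical.
"""
from collections import defaultdict

SITE = "CATG"
TAG_LEN = len(SITE) + 17  # 21 nt
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def virtual_tags(seq: str) -> list[str]:
    """Return CATG+17 tags for every CATG site with 17 nt available downstream."""
    seq = seq.upper()
    tags = []
    start = seq.find(SITE)
    while start != -1:
        tag = seq[start:start + TAG_LEN]
        if len(tag) == TAG_LEN:  # drop sites too close to the sequence end
            tags.append(tag)
        start = seq.find(SITE, start + 1)
    return tags


def build_tag_index(references: dict[str, str]) -> dict[str, set[str]]:
    """Map each virtual tag to the set of reference IDs it occurs in."""
    index = defaultdict(set)
    for ref_id, seq in references.items():
        for strand in (seq, reverse_complement(seq)):
            for tag in virtual_tags(strand):
                index[tag].add(ref_id)
    return dict(index)


def annotate(observed: list[str], index: dict[str, set[str]]) -> dict[str, list[str]]:
    """Exact-match lookup; an empty list marks an 'unknown' tag."""
    return {tag: sorted(index.get(tag, set())) for tag in observed}


refs = {"geneX": "AACATGACGTACGTACGTACGTATT"}  # hypothetical reference
index = build_tag_index(refs)
# The first tag maps to geneX; the second matches nothing and stays unknown.
print(annotate(["CATGACGTACGTACGTACGTA", "CATG" + "T" * 17], index))
```

A tag that maps to several references would return all of their IDs, which is one simple way to flag ambiguous tags before counting.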
A number of genes were differentially expressed between the young floral buds of sterile and fertile plants. Some of these genes may be candidates for future research on CMS in Nsa line, fertility restoration and improved agronomic traits in NR1 line. Further study of the unknown tags which were specifically expressed in Fer will help to explore desirable agronomic traits from wild species.
PMCID: PMC3556089  PMID: 23324545

