Sweet potato [Ipomoea batatas (L.) Lam.], which belongs to the genus Ipomoea of the family Convolvulaceae, is widely grown around the world because of its strong adaptability, high and stable yield, rich nutrient content, low input requirements, ease of management, and multiple uses [1]–[3]. It has the highest energy yield per unit area per unit time among many plants and is the sixth most important food crop in the world in terms of production [4], [5]. More than 105 million metric tons are produced globally each year, 95% of which are grown in developing countries [5]. China usually accounts for 70% and 85% of the world's total sweet potato area and yield, respectively [6]. Sweet potato is a genetically challenging hexaploid (2n = 6x = 90) plant with a genome size of between 2,200 and 3,000 Mbp and can hardly be considered a model species for studies [6]–[8]. Compared with other main crops or model organisms, genomic resources for this crop remained deficient until 2010 because of its complex hexaploid genome [8]. Therefore, genomic data sources for sweet potato were urgently needed for gene discovery and functional studies.
High-throughput transcriptome sequencing and digital gene expression (DGE) tag profiling are efficient and economical choices for characterizing non-model organisms without a reference genome [9], [10]. Recently, two sweet potato transcriptomes were sequenced by the International Potato Center (CIP) and the Guangdong Academy of Agricultural Sciences of China using the Roche-454 pyrosequencing technology [11] and the Illumina/Solexa RNA-Seq technology [12], respectively. The former used leaves and stems of an African landrace as materials, and the latter used only the roots of a new edible variety. Although the former released a sweet potato gene index, it was assembled together with existing ESTs, and the sequences mainly contained short open reading frames (ORFs) [11]. The latter delivered only the original raw reads and reported the scaffolds and unigenes assembled with SOAPdenovo [12]. At about the same time, we obtained RNA-Seq data for transcriptome studies from another sweet potato cultivar, ‘Xushu 18’, which was released in 1972 but is still the leading variety in China in both annual hectarage and total root production [13]. However, we found many artifacts and defects in the assembled transcriptome data provided by the commercial assembly service. Many of the assembled unigenes could not be read through and/or were homologous to more than one gene. Therefore, we reassembled the reads to improve gene accuracy and coverage and to obtain a comprehensive and integrated description of the transcriptome and gene expression patterns of sweet potato.
Despite the rapid development of assemblers that can efficiently handle more reads, transcriptome assembly is still difficult [9], [14]. The quality of a de novo transcriptome assembly is highly dependent on the user-defined sequence overlap length [9]. Different assemblers have different applicability and performance [15], [16]. Researchers have usually chosen only one assembler to assemble a transcriptome [11], [12], [17]–[19]. However, newer assembly strategies, such as merging the contigs of multiple assemblies [20], [21] and trimming low-quality bases at the ends of reads [22], can give better assembly results [9]. We hypothesized that trimming bases at the 3′-end of reads to different lengths, assembling the trimmed reads with different assemblers, and then merging the assemblies with CAP3 could improve the de novo assembly.
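The read-trimming step of this strategy can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names, the retained lengths (40, 35 and 30 bases) and the minimum-length cutoff are hypothetical values chosen only for illustration. Each trimmed read set would then be passed to one or more external assemblers, and the resulting contigs merged with CAP3; those steps are not shown.

```python
from typing import Dict, Iterable, List, Tuple

# A read is represented as (name, sequence, quality string).
Read = Tuple[str, str, str]


def trim_3prime(read: Read, keep: int) -> Read:
    """Keep the first `keep` bases of a read, discarding its 3' end."""
    name, seq, qual = read
    return name, seq[:keep], qual[:keep]


def make_trimmed_sets(
    reads: Iterable[Read],
    keep_lengths: Iterable[int],
    min_len: int = 25,  # hypothetical cutoff for discarding very short reads
) -> Dict[int, List[Read]]:
    """Build one trimmed read set per retained length."""
    reads = list(reads)
    sets: Dict[int, List[Read]] = {}
    for keep in keep_lengths:
        trimmed = [trim_3prime(r, keep) for r in reads]
        # Drop reads that are too short to be useful for assembly.
        sets[keep] = [r for r in trimmed if len(r[1]) >= min_len]
    return sets


# Toy input: one 40-base read and one very short read.
reads = [
    ("read1", "ACGT" * 10, "I" * 40),
    ("read2", "TTGACCA", "I" * 7),
]

# Hypothetical retained lengths; each set would feed a separate assembly.
trimmed_sets = make_trimmed_sets(reads, keep_lengths=(40, 35, 30))
for keep, subset in sorted(trimmed_sets.items()):
    print(keep, [len(seq) for _, seq, _ in subset])
```

Producing several read sets that differ only in the retained length gives each assembler inputs with different proportions of error-prone 3′ bases, which is the rationale for merging the resulting assemblies afterwards.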
In the present study, to establish a useful database of transcriptome sequences and of genes differentially expressed in different tissues and at different developmental stages of sweet potato, we performed de novo transcriptome sequencing and DGE tag profiling using the Illumina next-generation sequencing (NGS) platform Genome Analyzer II (GAII). This platform generated over 3.6 billion base pairs of DNA sequence from RNA-Seq and an average of 3.7 million tags per tissue for seven tissues from DGE sequencing. We used a combined de novo transcriptome assembly strategy and obtained a comprehensive and integrated transcriptomic resource for sweet potato with 51,736 annotated transcripts and 147 associated Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Furthermore, we compared the gene expression profiles of the seven tissues using the DGE system and the assembled transcriptome, and identified numerous differentially and specifically expressed transcripts in different tissues and at different developmental stages of roots. This work represents a sweet potato transcriptome fully characterized across tissues and developmental stages through RNA-Seq. Our data should promote the understanding of the molecular mechanisms of cellular metabolism and provide a valuable resource for future genetic and genomic studies on sweet potato.