Full-Parasites (http://fullmal.hgc.jp/) is a transcriptome database of Apicomplexa parasites, which include Plasmodium and Toxoplasma species. The latest version of Full-Parasites contains a total of 105786 EST sequences from 12 parasites, of which 5925 full-length cDNAs have been completely sequenced. Full-Parasites also contains more than 30 million transcription start sites (TSS) for Plasmodium falciparum (Pf) and Toxoplasma gondii (Tg), which were identified using our novel oligo-capping-based protocol. Various types of cDNA data resources were interconnected with our original database functionalities. Specifically, in this update, we have included two unique RNA-Seq data sets consisting of 730 million mapped RNA-Seq tags. One is a data set of 16 time-lapse experiments of cultured bradyzoite differentiation for Tg. The other data set includes 31 clinical samples of Pf. Parasite RNA was extracted together with host human RNA, and the extracted mixed RNA was used for RNA sequencing, with the expectation that gene expression information from the host and parasite would be simultaneously represented. By providing the largest unique full-length cDNA and dynamic transcriptome data, Full-Parasites is useful for understanding host–parasite interactions and will help to eventually elucidate how monophyletic organisms have evolved to become parasites by adopting complex life cycles.
Parasites in the phylum Apicomplexa, which includes Plasmodium (malaria parasites) and Toxoplasma species, cause worldwide health problems that require immediate action. Because genome sequences and gene expression information have great potential to contribute to the understanding of the parasitism of Apicomplexa parasites, which could lead to better therapeutics and diagnostics, intensive international efforts have been made to conduct genome and transcriptome analyses of the parasites. The entire genome sequences of various Apicomplexa species, such as Plasmodium falciparum (Pf), P. vivax (Pv) and Toxoplasma gondii (Tg), have been reported (1–3). Additionally, for the transcriptome analysis, cDNA libraries were constructed, and cDNA sequences were analyzed for Pf, P. yoelii (Py), Pv, P. berghei (Pb), Tg, Cryptosporidium parvum (Cp) and Echinococcus multilocularis (Em) (1–4). We constructed a series of full-length cDNA libraries using our oligo-capping method, which selectively captures mRNAs containing a cap structure (5). The obtained full-length cDNA information, together with the physical cDNA clones, has been made publicly available from our database, Full-Parasites (http://fullmal.hgc.jp). Additionally, in the Comparasite sub-database, cDNA information is associated with putative mutually orthologous genes so that comparative genomic studies between different species are possible (6).
Recent massively parallel sequencing technologies, such as the Illumina GA sequencer system (7), have drastically reduced the sequencing cost per base. To complement common analytical methods, we developed several original procedures to utilize massively parallel sequence data in transcriptome analyses. We devised shotgun sequencing and genome-based assembly methods for determining the entire sequences of full-length cDNAs with a cost of less than one dollar per clone (8). We have also developed a method to generate numerous transcription start site (TSS) tags, which are short sequences immediately downstream of TSSs, by combining our oligo-capping method and Illumina GA technology [TSS-Seq; (9)].
In this update, in addition to expanding our full-length cDNA data set generated by the aforementioned methods, we have included two unique RNA-Seq (10) data sets (730 million mapped tags in total). One is the time-course expression profiling of bradyzoite differentiation for Tg. We used mixed RNA for the RNA sequencing, so that gene expression in the host and in the parasite could be analyzed simultaneously (~95% of the RNA was human, and ~5% was from parasites). The other RNA-Seq data set includes clinical samples of Pf. We extracted Pf RNA and the host human RNA from peripheral blood (~0.5–5% of the RNA was from parasites). We expected that expression changes in Pf and immune responses in humans could be simultaneously monitored by analyzing the generated mixed RNA-Seq tags. In addition, by comparing the RNA sequences, we were able to analyze genetic variation among the Pf samples as well. We believe this type of data, which represents the dynamic nature of the transcriptome, will provide the most biologically relevant information now that many of the basic genetic elements have been identified and catalogued.
With the expanded cDNA contents, the enhanced functionality of the databases, and the new type of dynamic transcriptome data, we believe that the updated Full-Parasites is a useful data resource for understanding host–parasite interactions. We believe that integrative analyses of both causative parasites and host human cells will prove to be crucial for the eventual development of an effective method for preventing infectious diseases. Full-Parasites is accessible at http://fullmal.hgc.jp/.
For the RNA-Seq analysis of in-culture bradyzoite differentiation for Tg, the Tg ME49 strain was cultured in a monolayer of human foreskin fibroblasts (HFF) with Dulbecco's modified Eagle's medium (GIBCO). For the in vitro induction of bradyzoites, 1.5×10⁶ HFF cells in each experimental condition were infected at an MOI of 0.5, followed by pre-culture for 24h. The culture medium was then replaced with RPMI 1640 medium (GIBCO) at pH 8.1 (adjusted with NaOH). The medium was exchanged every 2 days, and time-course sampling was carried out over 144h after the induction of differentiation. For each sample, total RNA was extracted from the infected cells using TRI reagent (Sigma). Approximately 20μg of total RNA was extracted, and 1μg was used to prepare the template for RNA-Seq using the RNA-Seq template preparation kit (Illumina), following the manufacturer's instructions. A single lane of 36-bp single-end sequencing (one-eighth split of a run) was performed, and at least 10 million sequence tags were generated for each sample. RNA-Seq tags were mapped to the reference genomes of humans (hg19) and parasites (ToxoDB Release 5.2), allowing two-base mismatches. No RNA tags were simultaneously mapped to the human and Tg genomes.
For the RNA-Seq analysis of clinical Pf samples, peripheral blood samples from patients infected by Pf in Indonesia, which were collected according to the protocol approved by the ethical committee of Sam Ratulangi University, were used. From 2.5ml of the blood sample, total RNA was extracted using an RNA tube and an RNA extractor (PAX Gene). polyA+ RNA was selected and was used as a template for RNA-Seq. A single lane of 36-bp single-end sequencing was performed, and at least 10 million sequence tags were generated per sample. The generated RNA-Seq tags were mapped to the reference genomes of humans (hg19) and Pf (PlasmoDB Release 6.0), allowing two-base mismatches. As in the case of Tg, no RNA tags mapped to both human and Pf genomes.
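The host/parasite assignment described in the two paragraphs above can be illustrated with a minimal sketch. It assumes that each tag has already been aligned separately to the host and parasite references and that the result is available as a pair of booleans; the function names and the toy data are hypothetical and are not part of the Full-Parasites pipeline.

```python
from collections import Counter

def classify_tag(maps_to_host: bool, maps_to_parasite: bool) -> str:
    """Assign one tag to an origin class from its two mapping results."""
    if maps_to_host and maps_to_parasite:
        return "ambiguous"   # the text reports that no such tags were observed
    if maps_to_host:
        return "host"
    if maps_to_parasite:
        return "parasite"
    return "unmapped"

def origin_fractions(tags):
    """Return (counts, fractions) for an iterable of (host, parasite) flags.

    Fractions are computed over uniquely assigned tags only.
    """
    counts = Counter(classify_tag(h, p) for h, p in tags)
    assigned = counts["host"] + counts["parasite"]
    if assigned == 0:
        return counts, {"host": 0.0, "parasite": 0.0}
    fractions = {
        "host": counts["host"] / assigned,
        "parasite": counts["parasite"] / assigned,
    }
    return counts, fractions

# Toy input (hypothetical): 95 host tags, 5 parasite tags, 10 unmapped tags.
toy_tags = [(True, False)] * 95 + [(False, True)] * 5 + [(False, False)] * 10
toy_counts, toy_fractions = origin_fractions(toy_tags)
```

With real data, a non-zero "ambiguous" count would indicate tags that cannot be attributed to a single organism and would need to be excluded or handled separately.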
For the TSS-Seq analysis of Tg, the Tg ME49 strain was cultured and differentiated in the same way as for the RNA-Seq analysis. One hundred and forty-four hours after bradyzoite induction, bradyzoites were purified by Arabic gum density-gradient centrifugation, which is a method for separating cysts of Tg from the infected mammalian cells by multi-layer centrifugation in an Arabic gum solution. Briefly, 10ml of Arabic gum solution having a specific gravity of 1.07 or 1.05 was layered in 50-ml tubes. Then, 16ml of suspended infected cells, which had been homogenized with a 23 G needle, was added and centrifuged for 10min at 800g at 20°C (11). Tachyzoites were purified by filtration through 5-μm pore membranes. Total RNA was extracted from the infected human cells using TRIzol reagent (Sigma). Approximately 200μg of total RNA was extracted and was used as a template for TSS-Seq. Template preparation for TSS-Seq analysis was carried out as previously described (9). Briefly, the 5′- and 3′-adaptor sequences necessary for the Illumina GA sequencing were introduced as the 5′-end oligo during the RNA ligation and as the random hexamer primer during the first-strand cDNA synthesis, respectively. For each sample, a single lane of 36-bp single-end sequencing was performed. Five to 10 million TSS tags were generated and mapped to the respective reference genome sequences. The position to which the 5′-end of the Illumina GA sequence tag was mapped was defined as a putative TSS. Statistics for TSS tags are shown in Table 2. The mapped TSS tags were clustered to identify putative promoter regions. Details of the analysis of the identified TSS will be described elsewhere.
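The definition of a putative TSS as the mapped position of the tag's 5′-end, and the subsequent clustering step, can be sketched as follows. The coordinate convention (1-based, leftmost mapped position), the single-linkage clustering rule and the `max_gap` value of 50 bases are assumptions made for illustration only; the clustering procedure actually used is not specified in this text.

```python
def tss_position(start: int, length: int, strand: str) -> int:
    """Genomic coordinate of the 5'-end of a mapped tag.

    start  : leftmost mapped coordinate (1-based, inclusive; assumed convention)
    length : tag length in bases (36 for the tags described above)
    strand : '+' or '-'
    """
    if strand == "+":
        return start
    if strand == "-":
        # On the minus strand the 5'-end is the rightmost mapped base.
        return start + length - 1
    raise ValueError(f"unknown strand: {strand!r}")

def cluster_tss(positions, max_gap=50):
    """Single-linkage clustering of TSS positions on one chromosome and strand.

    Consecutive positions no more than max_gap bases apart share a cluster.
    Returns a list of (cluster_start, cluster_end, tag_count) tuples.
    max_gap is a hypothetical parameter, not a value taken from the text.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return [(c[0], c[-1], len(c)) for c in clusters]

# Toy example (hypothetical coordinates).
toy_clusters = cluster_tss([100, 110, 135, 500, 520], max_gap=50)
```

In this sketch, positions from different chromosomes or strands would have to be clustered separately, and the tag count per cluster could serve as a rough measure of promoter activity.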
For the Sanger sequencing of cDNAs, oligo-cap cDNA libraries were constructed as previously described. Among the 10000 randomly sequenced 5′-ESTs, non-redundant cDNAs were selected and subjected to shotgun sequencing using an Illumina GA system (8). On average, 800 cDNA clones were mixed, and 20 million shotgun tags were generated per pool. Genome-based assemblies were carried out as described in ref. (8).
Since the last update in 2009, the data set of Full-Parasites has been extended to cover cDNA sequences for more Apicomplexa species, including Babesia, Neospora, Eimeria and Theileria species (Table 1). In the latest version, Full-Parasites contains 105786 ESTs, of which 5925 cDNAs were selected for complete sequencing by shotgun sequencing coupled with genome-based assembly on an Illumina GAII system (8). TSS-Seq analysis was also carried out for different parts of the life cycles of Pf and Tg (Table 2).
Taking advantage of our unique full-length cDNA sequence data, multiple kinds of transcript-based annotations are possible in Full-Parasites. Various types of cDNA data are linked together to allow integrative interpretation of the data. To allow users to take advantage of this resource, Full-Parasites implements various types of viewers:
In general, although the contents of the cDNA data were expanded and several new features and new viewers were introduced, the overall look and feel of the viewer remains unchanged from the previous version to avoid confusion.
To analyze the interactions among transcripts of infecting parasites and infected human cells, we applied an RNA-Seq analysis to the mixed RNAs isolated from infected human cells. The current version comprises two data sets containing a total of 730 million mapped RNA-Seq tags (Table 3). The first data set was obtained from Tg that were differentiated into bradyzoites in culture. We extracted human and parasite RNAs during the time course of bradyzoite differentiation. The collected RNAs were subjected to RNA-Seq analysis. At least 10 million RNA-Seq tags were generated for each sample. Of these RNA-Seq tags, ~95% originated from human RNA, and the rest were from Tg. Various gene expression patterns were observed during the differentiation process in both humans and Tg (details of the analysis will be published elsewhere).
In the Dynamic RNA-Seq Viewer (http://fullmal.hgc.jp/cgi-bin/dynamic.cgi), users can search for human and Tg genes by specifying the fold change in the gene expression level relative to that at time zero. Users can also use the absolute expression levels, which were evaluated by tag counts for the search. A search using overall gene expression patterns is also possible. As exemplified in the following section, users can search by whether the gene expression monotonically increased or decreased or whether there was an inflection point (details of how to set the search conditions, e.g. how to specify the standard point to define the relative expression levels, are described on the help page of the Dynamic RNA-Seq Viewer). Search results are linked to the cDNA viewer so that users can directly obtain the functional information for the genes that show particular expression patterns (left panel, Figure 2).
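The search criteria described above (fold change relative to time zero, monotonic increase or decrease, and a change of direction) can be illustrated with a minimal sketch. The pseudocount, the category names and the reading of "inflection point" as a single change of direction in the time course are assumptions for illustration; the actual search conditions are defined on the viewer's help page.

```python
import math

def fold_changes(counts, pseudocount=1.0):
    """log2 fold change of each time point relative to the first (time zero).

    counts are expression levels (for example, normalized tag counts).
    The pseudocount avoids division by zero and is an assumed choice.
    """
    base = counts[0] + pseudocount
    return [math.log2((c + pseudocount) / base) for c in counts]

def classify_pattern(counts):
    """Label a time course by the direction of its successive changes."""
    diffs = [b - a for a, b in zip(counts, counts[1:])]
    nonzero = [d for d in diffs if d != 0]
    if not nonzero:
        return "flat"
    if all(d > 0 for d in nonzero):
        return "increasing"
    if all(d < 0 for d in nonzero):
        return "decreasing"
    # Count how often the direction of change flips along the time course.
    flips = sum(1 for a, b in zip(nonzero, nonzero[1:]) if (a > 0) != (b > 0))
    return "single_turn" if flips == 1 else "complex"

# Toy time courses (hypothetical values).
toy_log2fc = fold_changes([1, 3, 7])
toy_label = classify_pattern([1, 5, 9, 4, 2])
```

Raw tag counts from samples of different sequencing depth would need to be normalized before such a comparison; that step is omitted here.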
The second RNA-Seq data set is from Pf field samples. For this data set, we used blood samples collected from 31 patients infected with Pf in Indonesia. At least 10 million RNA-Seq tags were generated for each sample. Among the collected RNA-Seq tags, ~0.5–5% were from Pf, depending on the parasitemia (infection rate of parasites in erythrocytes). The rest were from humans and provide information on the gene expression changes in human peripheral blood cells induced by malaria infection. Again, various expression patterns were observed in different host–parasite pairs. Users can search for genes showing particular expression patterns in different patients and infection stages. The basic search options are similar to those for the Tg data set. Clinical information on malaria symptoms, such as body temperature and suspected date of malaria infection, is also presented. In addition to the dynamic nature of human–parasite transcriptomes, we were also able to identify a large number of genetic variations, which were detected as cSNPs (cDNA SNPs), in the field Pf samples compared to the reference genome sequence (Table 3). Users can also search for these genetic variations using our database. Details of the biological analysis of the tag information will be published elsewhere. We expect that further extensive transcriptome analysis using a larger number of clinical samples will provide useful information for understanding the clinical symptoms of malaria infections in Indonesia.
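As a rough illustration of how a cSNP might be called from RNA-Seq tags covering one reference position, the following sketch compares the observed bases with the reference base. The depth and allele-fraction thresholds and the function name are hypothetical; the criteria actually used to detect cSNPs are not given in this text.

```python
from collections import Counter

def call_csnp(ref_base, observed_bases, min_depth=5, min_alt_fraction=0.8):
    """Return the variant base at one position, or None if no call is made.

    ref_base         : reference genome base at this position
    observed_bases   : bases from all tags covering the position (string or list)
    min_depth        : minimum number of covering tags (assumed threshold)
    min_alt_fraction : minimum fraction of tags supporting the variant (assumed)
    """
    ref = ref_base.upper()
    depth = len(observed_bases)
    if depth < min_depth:
        return None  # too few tags to make a call
    counts = Counter(b.upper() for b in observed_bases)
    # Keep only unambiguous bases that differ from the reference.
    alt_counts = {b: n for b, n in counts.items() if b != ref and b in "ACGT"}
    if not alt_counts:
        return None
    alt, n = max(alt_counts.items(), key=lambda kv: kv[1])
    return alt if n / depth >= min_alt_fraction else None

# Toy pileup (hypothetical): six tags support G, one supports the reference A.
toy_call = call_csnp("A", "GGGGGGA")
```

Because coverage in RNA-Seq data depends on expression level, weakly expressed genes would rarely reach the depth threshold in such a scheme.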
For an example of a search, follow these steps (Figure 1): from the Full-Parasites top page, select the species, P. falciparum, and specify the ‘Annotated gene ID’ as ‘PF11_0069’ (in the ‘Search Box’ shown in Figure 1). Evolutionary conservation patterns (Phylogenic Analysis Viewer) and expression patterns of the gene (Dynamic RNA-Seq viewer) can also be followed from the Annotation Viewer, which is linked from the model transcript (indicated by a green circle, Figure 1).
A detailed user manual and a list of technical terms, definitions and parameters for the annotations are provided in the ‘Glossary’ and ‘Experimental Procedure’ sections of our website (http://fullmal.hgc.jp/docs/glossary.html; http://fullmal.hgc.jp/docs/procedure.html). Users can follow the links for further information on each item displayed there. Statistics for the current database are also presented in the statistics section (http://fullmal.hgc.jp/docs/statistics.html). All of the short read sequences used for the database have been deposited in the NCBI Short Read Archives (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi). Newly generated RNA-Seq data have also been registered under the following accession numbers: DRA000224–DRA000273. The raw sequence data are publicly and freely available from the download site in our database (in the middle of the left frame on the top page). Additionally, the cDNA clones registered in the database are freely available.
Herein, we described an update of our Full-Parasites database with extensive cDNA data and a new type of dynamic RNA-Seq data. To visualize the newly generated short read sequences, we implemented a new version of the genome viewer and the RNA-Seq viewer. Particularly for the new type of RNA-Seq data sets, further enrichment of the data from additional clinical samples and more laboratory strain data during different life cycle stages and in different culture conditions is also being explored. Because it focuses on the dynamic nature of the transcriptome data and is based on various types of cDNA analyses, our database is different from other parasite databases, such as PlasmoDB (http://www.plasmodb.org/), CryptoDB (http://cryptodb.org/) and ToxoDB (http://www.toxodb.org/), whose main focus is on the static annotation of gene components. Through complementary use of our database and others, we believe that we will be able to lay a strong foundation for understanding how Apicomplexa parasites interact with host transcriptomes and achieve such complex life cycles with a limited number of genes.
RNA sequencing of malaria patients and Toxoplasma gondii-infected cells was funded by the grant Asia-Africa S & T Strategic Cooperation Promotion Program by the Special Coordination Funds for Promoting Science & Technology from the Ministry of Education, Culture, Sports, Science and Technology of Japan. Funding for open access charge: Asia-Africa S & T Strategic Cooperation Promotion Program by the Special Coordination Funds for Promoting Science and Technology from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
Conflict of interest statement. None declared.
This work was conducted over many years by Dr Junichi Watanabe, who recently passed away. We are grateful to Ms Etsuko Sekimori and Keiko Toya for excellent programming work. The full-length cDNA libraries were produced by Ms Kazumi Abe. Large-scale sequencing was performed by Ms Kiyomi Imamura, Rie Nishizawa and Mari Tsubaki. We are thankful to Mr Fah Sathirapongsasuti for critical reading of the manuscript. This database was constructed and is maintained with support from a Grant-in-Aid for Publication of Scientific Research Results from the Japan Society for the Promotion of Science and JST.