Conceived and designed the experiments: NDY. Performed the experiments: NDY. Analyzed the data: NDY BEC RSH ARJ CC. Contributed reagents/materials/analysis tools: TL WMS BS. Wrote the paper: NDY. Contributed to drafting the manuscript: ARJ TL PJB AL. Supervised the project: RBG.
The two parasitic trematodes, Clonorchis sinensis and Opisthorchis viverrini, have a major impact on the health of tens of millions of humans throughout Asia. The greatest impact is through the malignant cancer (cholangiocarcinoma) that these parasites induce in chronically infected people. Therefore, both C. sinensis and O. viverrini have been classified by the World Health Organization (WHO) as Group 1 carcinogens. Despite their impact, little is known about these parasites and their interplay with the host at the molecular level. Recent advances in genomics and bioinformatics provide unique opportunities to gain improved insights into the biology of parasites as well as their relationships with their hosts at the molecular level. The present study elucidates the transcriptomes of C. sinensis and O. viverrini using a platform based on next-generation (high-throughput) sequencing and advanced in silico analyses. From more than 500,000 sequences generated for each species, more than 50,000 sequences were assembled and categorized as biologically relevant based on homology searches, gene ontology and/or pathway mapping. The results of the present study could assist in defining molecules that are essential for the development, reproduction and survival of liver flukes and/or that are linked to the development of cholangiocarcinoma. This study also lays a foundation for future genomic and proteomic research of C. sinensis and O. viverrini and the cancers that they are known to induce, as well as novel intervention strategies.
The parasitic worms Clonorchis sinensis and Opisthorchis viverrini have a serious impact on the health of tens of millions of people throughout Asia. The greatest impact, however, is through the malignant, untreatable cancer (cholangiocarcinoma) that these parasites induce in chronically infected people. These liver flukes are officially classified by the World Health Organization (WHO) as Group 1 carcinogens. In spite of their massive impact on human health, little is known about these parasites and their relationship with the host at the molecular level. Here, we provide the first detailed insight into the transcriptomes of these flukes, laying a solid foundation for the molecular and -omic research required to understand their biology and, more importantly, to elucidate key aspects of the induction of cholangiocarcinoma. Although our focus has been on the parasites, the implications will extend far beyond the study of parasitic disease. Importantly, insights into the pathogenesis of the infection are likely to have major implications for the study and understanding of other cancers.
Liver flukes (Platyhelminthes: Digenea) include important food-borne eukaryotic pathogens of humans –. For example, the liver flukes Clonorchis sinensis and Opisthorchis viverrini, which cause the diseases clonorchiasis and opisthorchiasis, respectively, represent a substantial public health problem in many parts of Asia , , . Clonorchis sinensis is endemic predominantly in regions of China (including Hong Kong and Taiwan), Korea and North Vietnam , , whilst O. viverrini is endemic throughout Thailand, the Lao People's Democratic Republic, Vietnam and Cambodia . Both of these parasites cause immense suffering in tens of millions of people, and more than 600 million people are estimated to be at risk of infection , . Despite efforts to control these two liver flukes, the prevalence of infection can be as high as 70% in some regions, including the Guangxi province in China (C. sinensis) and Khon Kaen province in Thailand (O. viverrini) , . A related fluke, O. felineus, is endemic in Siberia and eastern regions of the former USSR, and causes a similar disease and disease burden to O. viverrini and C. sinensis .
The life cycles of C. sinensis and O. viverrini are similar –, involving an aquatic snail (order Mesogastropoda), in which asexual reproduction takes place, and freshwater cyprinid fishes or palaemonid shrimps (for C. sinensis only) as intermediate hosts. Fish-eating (piscivorous) mammals, including humans, dogs and cats, act as definitive hosts, in which sexual reproduction occurs. Clonorchiasis and opisthorchiasis are prevalent in geographical regions where raw cyprinid fish (C. sinensis and O. viverrini) and/or shrimp (C. sinensis) are a staple of the human diet , . Both parasites establish in the bile ducts of the liver as well as in the extrahepatic ducts and the gall bladder of the mammalian (definitive) host. These parasites are long-lived and cause chronic cholangitis, which can lead to periductal fibrosis, cholecystitis and cholelithiasis, obstructive jaundice, hepatomegaly and/or fibrosis of the periportal system –. Importantly, both experimental and epidemiological evidence – strongly implicates C. sinensis and O. viverrini infections in the etiology of cholangiocarcinoma, a malignant cancer of the bile ducts in humans, which has a very poor prognosis. Indeed, C. sinensis and O. viverrini are both categorized by the International Agency for Research on Cancer (IARC) as Group 1 carcinogens .
In humans, the onset of cholangiocarcinoma is associated with chronic clonorchiasis or opisthorchiasis and the accompanying hepatobiliary damage, inflammation, periductal fibrosis and/or cellular responses to antigens from the infecting fluke . These conditions predispose to cholangiocarcinoma, possibly through an enhanced susceptibility of DNA to damage by carcinogens , , –. Chronic hepatobiliary damage is reported to be multi-factorial and is considered to arise from continued mechanical irritation of the epithelium by the flukes (particularly by their suckers), from their metabolites and excreted/secreted antigens ,  and from immunopathological processes . In regions where O. viverrini is highly endemic, the incidence of cholangiocarcinoma is exceptionally high , . For instance, cholangiocarcinomas represent 15% of primary liver cancers worldwide, but in Thailand's Khon Kaen region this figure rises to 90%, and the region has the highest recorded incidence of this cancer in the world .
Currently, there is no effective chemotherapy against cholangiocarcinoma, so intervention strategies must rely on the prevention or treatment of liver fluke infection/disease. Although effective prevention could be readily achieved by persuading people (via education programs) to consume only cooked fish and shrimp, the ancient cultural custom of consuming raw, undercooked or freshly pickled fish and shrimp persists in endemic areas , , . Thus, the control of clonorchiasis/opisthorchiasis currently relies predominantly on anthelmintic treatment with praziquantel. Despite the efficacy of this compound, the lack of acquired immunity to infection leaves humans susceptible to reinfection in endemic regions , . In addition, under experimental conditions, short-term treatment of O. viverrini-infected hamsters with praziquantel (400 mg per kg of live weight) has been shown to induce a dispersion of parasite antigens which, following re-infection with O. viverrini, leads to adverse immunopathological changes associated with oxidative and nitrative stress , a process which has been proposed to initiate and/or promote the development of cholangiocarcinoma in humans . Given the current reliance on a single trematocidal drug against C. sinensis and O. viverrini, there is substantial merit in searching for new intervention methods, built on a detailed understanding of the interplay between the parasites and their hosts, and of the biology of the parasites themselves, at the molecular level. Furthermore, the characterization of the genes expressed in these parasites should assist in elucidating the molecular mechanisms by which clonorchiasis and opisthorchiasis (or the respective parasites) initiate and enhance the development of cholangiocarcinoma , .
To date, most molecular biological research of socioeconomically important trematodes has focused on the human blood flukes, Schistosoma mansoni and S. japonicum, recently culminating in the determination of their nuclear genome sequences , . These genomic data sets provide an invaluable resource to support the exploration of the fundamental biology and evolution of flukes as well as their host–parasite interactions . However, the biology of schistosomes, which live as dioecious adults in the blood stream of mammalian hosts, is vastly distinct from that of hermaphroditic liver flukes, such as C. sinensis and O. viverrini. Currently, a total of only ~8,000 expressed sequence tags (ESTs) are publicly available for C. sinensis – and O. viverrini , a dataset far too small to give sufficient insights into transcriptomes for the purpose of supporting genomic and other fundamental molecular research.
Some recent genomic, bioinformatic and proteomic studies , – indicate unique and exciting prospects for exploring key biochemical, physiological and biological pathways in liver flukes, and for predicting and prioritizing novel drug targets. In particular, the characterization of the transcriptome of the common liver fluke, Fasciola hepatica, using a next-generation sequencing-bioinformatic platform has identified numerous molecules of biological relevance, some of which are inferred to be involved in key biological processes or pathways and could serve as targets for new trematocidal drugs or vaccines . Using a similar platform, we characterized the transcriptomes of the adult stages of C. sinensis and O. viverrini, in order to provide essential resources for future genomic, proteomic, metabolomic and systems biological explorations of these important pathogens, and to underpin future efforts toward the improved intervention and control of cholangiocarcinoma.
Metacercariae were collected from naturally infected cyprinoid fish, using established methods , , in Jinju-si, Gyeongsangnam-do Province, South Korea (C. sinensis) and Khon Kaen Province, Thailand (O. viverrini). Helminth-free inbred Syrian golden hamsters (Mesocricetus auratus) were infected with metacercariae of each species as described previously , . Hamsters used in this study were maintained at the animal research facilities of the Faculty of Medicine, Khon Kaen University, Thailand and the School of Medicine, Gyeongsang National University, South Korea. All work was conducted in accordance with protocols approved by the animal ethics committees of the respective institutions. At 31 (C. sinensis) or 42 (O. viverrini) days after infection, adult flukes were collected from the bile ducts of hamsters and cultured in vitro to allow the worms to regurgitate caecal contents using an established procedure . Subsequently, all flukes were washed extensively in physiological saline, snap-frozen in liquid nitrogen and then stored at −80°C. The specific identity of the adult worms was verified by isolating genomic DNA  and conducting PCR-coupled, bidirectional sequencing (ABI 3730xl DNA analyzer, Applied Biosystems, California, USA) of the second internal transcribed spacer (ITS-2) of nuclear ribosomal DNA under optimized conditions .
The transcriptomes of both C. sinensis and O. viverrini were characterized by 454 sequencing (Roche) of normalized complementary DNA (cDNA) libraries (Eurofins MWG Operon, Ebersberg, Germany; www.eurofinsdna.com), following the approach applied to F. hepatica . For the construction of the libraries, total RNA was isolated from ~20 adult worms of each of C. sinensis and O. viverrini, and polyadenylated (polyA+) RNA was then purified from 25 µg of pooled total RNA. First-strand cDNA synthesis from polyA+ RNA was primed using a hybrid random hexamer (N6) oligonucleotide containing a specifically designed adapter (5′-TCGCAGTGAGTGACAGGCCA-3′) and carried out using M-MLV H− reverse transcriptase. After RNA hydrolysis, a specifically designed adapter primer (5′-AGTCAGGACCTTGGCTGTCACTC-3′) was attached to the 3′-end of the first-strand cDNA. The adapter sequences on both ends of the cDNA were then used to synthesize second-strand cDNA and to amplify (18 cycles) the cDNA, employing oligonucleotides complementary to the adapters, by long and accurate PCR (LA-PCR) . Subsequently, specific adapter sequences A (5′-CCATCTCATCCCTGCGTGTCTCCGACTCAG-3′) and B (3′-CTGAGACTGCCAAGGCACACAGGGGATAGG-5′) (FLX Titanium, Roche) were added to the 5′- and 3′-ends of the cDNA, respectively. Normalization was conducted using one cycle of denaturation and reassociation of the cDNA. Reassociated double-stranded cDNA was separated from the remaining single-stranded cDNA (ss-cDNA, normalized cDNA) by purification on a hydroxylapatite column . The ss-cDNA was amplified (13 cycles) using LA-PCR and then size-selected (500–700 bp) by agarose gel electrophoresis and excision from the gel. Size-selected cDNA was eluted from the preparative gel and sequenced on a Genome Sequencer™ (GS) FLX Titanium instrument (Roche Diagnostics) using a standard protocol . The 454 Life Sciences (Roche Diagnostics) software was used for image capture and signal processing.
For each transcriptomic data set, a single file containing the trace, “base-calling” and quality score data was generated and stored in a standard flowgram format (SFF) for subsequent bioinformatic processing and analyses.
An automated, in silico assembly pipeline (Eurofins MWG Operon) was used to assemble the sequence data de novo for each of C. sinensis and O. viverrini. High-quality, base-called and clipped reads from each data set were extracted from the SFF files and assembled into contigs using MIRA v.2.9 (http://chevreux.org/projects_mira.html) . Mean lengths ± standard deviations (in bases) were calculated for particular nucleotide sequence data subsets. A second assembly of each data set was conducted using sequence regions predicted to encode open reading frames (ORFs), to specifically cluster sequences with similar protein-coding regions . ORFs were predicted from the MIRA-assembled contigs and unassembled singletons using ESTScan with default settings . For each data set, sequences with ORFs were re-assembled into supercontigs using the Contig Assembly Program v.3 (CAP3) . To remove redundancy, nucleotide sequences were re-clustered using the BLASTclust program (BLAST v.2.2.20; ftp://ftp.ncbi.nlm.nih.gov/blast/executables/), allowing sequences to cluster if they aligned across >60% of their length and shared >95% amino acid residue identity.
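The redundancy-removal criterion described above (sequences cluster if they align across >60% of their length with >95% identity) amounts to single-linkage clustering. The Python sketch below illustrates that logic only. It does not reproduce BLASTclust, which computes its own alignments; instead it assumes that pairwise alignment lengths and percent identities are already available. The record layout, the function names and the choice of the longest sequence as each cluster's representative are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical pre-computed pairwise alignment record:
# (sequence_a, sequence_b, aligned_length, percent_identity)
Hit = Tuple[str, str, int, float]


def cluster_redundant(lengths: Dict[str, int], hits: Iterable[Hit],
                      min_coverage: float = 0.60,
                      min_identity: float = 95.0) -> List[List[str]]:
    """Single-linkage clustering: two sequences are linked if their alignment
    spans > min_coverage of BOTH sequence lengths and exceeds min_identity."""
    parent = {name: name for name in lengths}

    def find(x: str) -> str:
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, aln_len, identity in hits:
        spans_both = (aln_len > min_coverage * lengths[a]
                      and aln_len > min_coverage * lengths[b])
        if spans_both and identity > min_identity:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for name in lengths:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


def representatives(lengths: Dict[str, int], clusters: List[List[str]]) -> List[str]:
    """Keep the longest sequence of each cluster as its non-redundant representative."""
    return [max(members, key=lambda name: lengths[name]) for members in clusters]


lengths = {"c1": 300, "c2": 280, "c3": 500, "c4": 200}
hits = [("c1", "c2", 250, 98.0),   # linked: spans >60% of both, >95% identity
        ("c2", "c3", 260, 99.0),   # not linked: covers only 52% of c3
        ("c1", "c4", 190, 90.0)]   # not linked: identity too low
clusters = cluster_redundant(lengths, hits)
print(clusters)                           # [['c1', 'c2'], ['c3'], ['c4']]
print(representatives(lengths, clusters))  # ['c1', 'c3', 'c4']
```

Single linkage means that a chain of pairwise links can merge sequences that do not themselves satisfy the thresholds directly; this is a property of the sketch and may differ in detail from the behaviour of the program actually used.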
The transcriptome data sets for C. sinensis and O. viverrini were each annotated using a semi-automated bioinformatic pipeline  with stringent statistical criteria. In brief, sequences were subjected to BLASTn (searching for gene homology) and BLASTx (searching for protein homology) analyses against publicly available (December, 2009) sequences from GenBank (National Center for Biotechnology Information; http://www.ncbi.nlm.nih.gov/est/) for C. sinensis (n=2,970), O. viverrini (4,194) and non-redundant sequence databases; ENSEMBL (http://www.ensembl.org/); SchistoDB (http://schistodb.net/schistodb20/) for S. mansoni; and the Shanghai Centre for Life Science & Biotechnology Information (http://lifecenter.sgst.cn/sjapathdb/data.html) for S. japonicum, as well as a transcriptomic data set available for F. hepatica , using permissive (E-value: <1E−05), moderate (<1E−15) and/or stringent (<1E−30) search strategies. Homologues were identified in other eukaryotic organisms using the same permissive, moderate and stringent search strategies. ORFs were predicted from the final transcriptomic data sets for C. sinensis and O. viverrini using ESTScan, and proteins were inferred from ORFs by conceptual translation. Predicted proteins were classified functionally using InterProScan , employing the default search parameters. Based on their homology to conserved domains and protein families, predicted proteins of C. sinensis and O. viverrini were individually classified according to Gene Ontology (GO) categories and assigned parental (i.e. level 2) terms (http://www.geneontology.org/). Inferred proteins with homologues in organisms for which sequence data were available were subjected to analysis using the KEGG Orthology-Based Annotation System (KOBAS) , which predicts the biochemical pathways in which molecules are involved.
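The three search strategies differ only in the E-value cut-off applied to each hit, so the stricter tiers are nested within the more permissive ones. A minimal Python sketch of how matches might be tallied per tier follows. It assumes that the best (lowest) E-value per query sequence against a given database has already been parsed from BLAST output; the names used are hypothetical.

```python
from typing import Dict, Optional, Tuple

# E-value cut-offs for the three search strategies described in the text
TIERS: Dict[str, float] = {"permissive": 1e-5, "moderate": 1e-15, "stringent": 1e-30}


def tier_counts(best_evalue: Dict[str, Optional[float]],
                total: int) -> Dict[str, Tuple[int, float]]:
    """Count queries whose best hit passes each cut-off (None means no hit).
    Returns {tier: (number of matches, percentage of `total`)}."""
    summary = {}
    for tier, cutoff in TIERS.items():
        n = sum(1 for e in best_evalue.values() if e is not None and e < cutoff)
        summary[tier] = (n, round(100.0 * n / total, 1))
    return summary


best = {"s1": 1e-50, "s2": 1e-20, "s3": 1e-8, "s4": 1e-3, "s5": None}
print(tier_counts(best, total=len(best)))
# {'permissive': (3, 60.0), 'moderate': (2, 40.0), 'stringent': (1, 20.0)}
```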
Amino acid sequences were subjected to analysis using TMHMM (a membrane topology prediction program)  to predict transmembrane domains. Putative excretory/secretory (ES) proteins were predicted from inferred amino acid sequences representing C. sinensis and O. viverrini using a previously described bioinformatic pipeline . Briefly, ES proteins were selected based on the presence of a signal peptide at the N-terminus using SignalP 3.0  and the absence of transmembrane domains. To provide further support for their classification, predicted ES proteins of >50 amino acid residues in length were compared with known secreted proteins  and signal peptides  (http://www.signalpeptide.de/), and the subset of proteins with known homologues (BLASTn, E-value<1E−05) were retained and summarized based on the biochemical pathway inferred using KOBAS.
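The ES-protein selection combines several independent predictions into a single filter. The following Python sketch expresses that filter, assuming the SignalP, TMHMM and homology-search results have already been parsed into one record per protein. The record fields and function name are hypothetical, and none of the prediction tools is called here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ProteinPrediction:
    protein_id: str
    length_aa: int
    has_signal_peptide: bool               # e.g. a SignalP 3.0 positive call
    tm_helices: int                        # e.g. number of TMHMM-predicted helices
    best_secreted_evalue: Optional[float]  # best hit to known secreted proteins; None = no hit


def putative_es(preds: Iterable[ProteinPrediction], min_length: int = 50,
                max_evalue: float = 1e-5) -> List[str]:
    """Return IDs passing all filters: N-terminal signal peptide, no transmembrane
    helix, length > min_length residues and a homologue among known secreted proteins."""
    return [p.protein_id for p in preds
            if p.has_signal_peptide
            and p.tm_helices == 0
            and p.length_aa > min_length
            and p.best_secreted_evalue is not None
            and p.best_secreted_evalue < max_evalue]


preds = [
    ProteinPrediction("p1", 210, True, 0, 1e-30),   # passes every filter
    ProteinPrediction("p2", 180, True, 2, 1e-40),   # rejected: membrane protein
    ProteinPrediction("p3", 45, True, 0, 1e-12),    # rejected: too short
    ProteinPrediction("p4", 320, False, 0, 1e-50),  # rejected: no signal peptide
    ProteinPrediction("p5", 150, True, 0, None),    # rejected: no known homologue
]
print(putative_es(preds))  # ['p1']
```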
More than 500,000 sequences were generated for each of C. sinensis (n=574,448; 351±141 bases, i.e., mean ± standard deviation) and O. viverrini (642,918; 373±133 bases) (Table 1). Sequence data were deposited under accession number SRA012272 in the Sequence Read Archive of NCBI (http://www.ncbi.nlm.nih.gov/sra). BLASTn searches (E-value<1E−05) revealed that most (92–97%) of the sequences available in public databases for these flukes were contained within the respective data sets. As most (88–91%) of the sequences generated for each species were novel, only the present data were assembled (see Table 1). The assembly allowed ~84% of sequences to be clustered into >42,000 contigs per species. For C. sinensis, 42,179 contigs were 711±483 bases in length, with a mean depth of coverage of 10.8±20.0 reads per contig. For O. viverrini, 60,833 contigs were 680±438 bases in length, with a mean depth of coverage of 8.6±14.5 reads per contig. In total, 92,123 (279±161 bases; C. sinensis) and 101,654 (307±162 bases; O. viverrini) sequences remained as singletons, i.e., could not be assembled.
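The proportion of reads incorporated into contigs follows directly from the read and singleton counts reported above. A short Python check of that arithmetic, using only the figures given in the text (the helper name is hypothetical):

```python
def percent_assembled(total_reads: int, singletons: int) -> float:
    """Percentage of reads incorporated into contigs (i.e., not left as singletons)."""
    return round(100.0 * (total_reads - singletons) / total_reads, 1)


# (total reads, singletons) as reported in the text
datasets = {
    "C. sinensis": (574_448, 92_123),
    "O. viverrini": (642_918, 101_654),
}
for species, (reads, singles) in datasets.items():
    print(f"{species}: {percent_assembled(reads, singles)}% of reads in contigs")
# C. sinensis: 84.0% of reads in contigs
# O. viverrini: 84.2% of reads in contigs
```

Both values agree with the ~84% stated in the text.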
In total, 134,301 C. sinensis sequences (415±363 bases) and 162,487 O. viverrini sequences (447±348 bases) were retained for further analyses. From the MIRA-assembled data, ORFs were predicted for 88,714 (66.1%) of C. sinensis sequences (383±371 bases) and 107,217 (66.0%) of O. viverrini sequences (389±355 bases). CAP3 clustered approximately half of these ORFs into ORF-enriched supercontigs, equating to 12,050 sequences (980±747 bases) for C. sinensis and 14,698 sequences (939±731 bases) for O. viverrini, with an average depth of coverage of 3.6–3.7 reads per supercontig for each species. For each species, the average G+C content (~47±4%) was similar to the estimates for F. hepatica, a digenean trematode related to C. sinensis and O. viverrini , . From either data set, a small number (49–82) of redundant sequences were excluded following the re-clustering of the sequences using BLASTclust. In addition, sequences with similarity at the nucleotide (E-value<1E−05) and protein (E-value<1E−50) levels to potential host (M. auratus) molecules or microbial organisms were excluded. The ORFs of both clustered and unique sequences (singletons) were subjected to further analysis.
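G+C content can be summarized as a mean ± standard deviation over individual sequences. The Python sketch below assumes this per-sequence definition and excludes ambiguous bases (e.g. N) from the denominator. The text does not state exactly how the ~47±4% estimate was derived, so this is an illustration rather than the pipeline that was used; the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Iterable, Tuple


def gc_percent(seq: str) -> float:
    """G+C as a percentage of unambiguous bases (A, C, G, T); Ns are ignored."""
    s = seq.upper()
    called = sum(s.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * (s.count("G") + s.count("C")) / called


def gc_summary(seqs: Iterable[str]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-sequence G+C content
    (requires at least two sequences)."""
    values = [gc_percent(s) for s in seqs]
    return mean(values), stdev(values)


print(gc_percent("GCNNAT"))                   # 50.0
print(gc_summary(["GGCC", "ATAT", "GCAT"]))  # (50.0, 50.0)
```

A length-weighted (pooled) G+C estimate would give a different value whenever sequence lengths vary, which is one reason the definition should be stated alongside the figure.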
The transcriptomic data sets for C. sinensis and O. viverrini were each used to interrogate genomic databases (i.e. F. hepatica, NCBI non-redundant, S. mansoni and S. japonicum databases) using BLASTx (Table 2). Of the ORF-enriched sequences, 16,892 of 50,769 (33.3%) C. sinensis and 19,047 of 61,417 (31.0%) O. viverrini sequences matched known proteins at a cut-off value of <1E−05 (Table 2). Proteins inferred for each of C. sinensis and O. viverrini were compared with one another and with complete proteomic data sets for selected organisms: (i) Saccharomyces cerevisiae (yeast); (ii) F. hepatica, S. mansoni and S. japonicum (trematodes); (iii) Caenorhabditis elegans (nematode); (iv) Drosophila melanogaster (insect); (v) Danio rerio, Gallus gallus and Xenopus tropicalis (non-mammalian vertebrates); and (vi) Homo sapiens and Mus musculus (mammals) (Table 3). Proteins predicted for C. sinensis (n=50,769) and O. viverrini (61,417) had the highest homology to one another using the permissive (27,103–29,995 sequence matches, equating to 48.4–53.4%), moderate (21,036–22,216 matches; 36.2–41.4%) and stringent (15,769–16,324 matches; 26.6–31.1%) search strategies. Both C. sinensis and O. viverrini shared the greatest amino acid sequence similarity with proteins of the other members of the Trematoda considered here, resulting in 14,526–27,103 sequence matches (28.6–53.4%) for the former and 15,982–29,995 matches (26.0–48.8%) for the latter species (at E-value <1E−05). In agreement with the data available for Schistosoma spp. ,  and F. hepatica , both C. sinensis and O. viverrini shared greater amino acid sequence similarity (E-value: <1E−05) with mammalian proteins [10,164–11,238 sequence matches (18.3–20.1%)] than with those of C. elegans [8,029–8,951 sequence matches (14.6–15.8%)].
Within the class Trematoda, a high degree of protein sequence homology (21.5–24%) was shared (E-value <1E−05) amongst representatives of the families Fasciolidae (F. hepatica), Schistosomatidae (S. mansoni) and Opisthorchiidae (C. sinensis or O. viverrini) (Fig. 1). More proteins (9,527–10,835 matches; 17.6–18.8%) were uniquely shared between the two members of the family Opisthorchiidae than among representatives of different families. Protein conservation was also evident when C. sinensis and O. viverrini data sets were compared with the other trematodes included herein, using permissive (10,875–11,780 matches; 19.2–21.4%), moderate (7,164–7,660; 12.5–14.1%) and stringent (4,320–4,529; 7.4–8.5%) search strategies (Table 4). Relative conservation of inferred proteins was observed also when the C. sinensis and O. viverrini data sets were compared with those for mammals (mouse or human); 9,954–10,983 sequences (18–19.6%) had significant matches (E-value <1E−05). A significant percentage (6.1–6.8%; E-value <1E−05) of the proteins predicted for the two Asian liver flukes were conserved across the eukaryotic model organisms considered. These molecules included actin-like proteins, alpha and beta-tubulins, dynein-1-alpha heavy chain, elongation factor EF-2, enolase, glycogen synthase 1, heat shock protein 70, nucleosome assembly protein 1-like protein and ubiquitin-activating enzyme E1 (E-value <1E−100), of which most sequence matches (72.3–83.2%; E-value <1E−05) were to proteins inferred for S. cerevisiae (Table 3).
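Counts such as the proteins shared by all three families, or shared uniquely between the two opisthorchiids (Fig. 1), correspond to regions of a Venn diagram. A minimal Python sketch of how these regions might be computed follows. It assumes that, for one query species, the sets of protein identifiers with a significant match (e.g. E-value <1E−05) in each comparison taxon are already available; the identifiers and function name are hypothetical.

```python
from typing import Dict, Set, Tuple


def venn_summary(hits: Dict[str, Set[str]]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """hits maps each comparison taxon to the query proteins with a significant
    match in it. Returns (proteins matched in every taxon,
    {taxon: proteins matched only in that taxon})."""
    shared_by_all = set.intersection(*hits.values())
    exclusive = {}
    for taxon, ids in hits.items():
        matched_elsewhere = set().union(*(v for k, v in hits.items() if k != taxon))
        exclusive[taxon] = ids - matched_elsewhere
    return shared_by_all, exclusive


# Toy example: C. sinensis query proteins with hits in three other trematodes
hits = {
    "O. viverrini": {"p1", "p2", "p3", "p4"},
    "F. hepatica": {"p1", "p2", "p5"},
    "S. mansoni": {"p1", "p5", "p6"},
}
shared, only = venn_summary(hits)
print(sorted(shared))                # ['p1']
print(sorted(only["O. viverrini"]))  # ['p3', 'p4']
```

Here the proteins matched only in O. viverrini correspond to those "uniquely shared" between the two members of the Opisthorchiidae.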
When sequences of C. sinensis and O. viverrini with homology to those within non-redundant gene data sets (available from the S. mansoni, S. japonicum and ENSEMBL gene databases) were clustered (BLASTx, E-value <1E−05), the number of homologous sequences predicted to encode proteins was 1.4- to 2.4-fold greater than expected (see Table 5). The clustering of ORF-enriched sequences to unique genes resulted in a prediction of 22,824–31,054 genes for C. sinensis and 25,871–42,692 for O. viverrini.
To establish whether the transcriptomic data sets were representative of adult C. sinensis and O. viverrini, predicted proteins were summarized according to their inferred molecular function, cellular localization and association with biological pathways (Table 2). A substantial proportion (~18–19%) of each of the C. sinensis and O. viverrini transcriptomes was annotated using ~4,000 unique InterPro domain or protein family signatures. Based on this annotation of conserved motifs, 1,250 and 1,271 different GO categories could be defined for C. sinensis and O. viverrini, respectively. All parental (i.e. level 2) GO terms assigned to the data sets for F. hepatica  and S. mansoni (http://amigo.geneontology.org/; http://schistodb.net/schistodb20/) were represented in the transcriptomic data sets of the two Asian flukes (Table 6), including 19 linked to ‘biological process’, eight to ‘cellular component’ and 13 to ‘molecular function’. The GO profiles were similar between C. sinensis and O. viverrini; only two molecular function terms, metallo-chaperone activity and auxiliary transport protein activity, were each unique to one of the two data sets. Predicted proteins assigned to the term ‘biological process’ were associated predominantly with: (i) cellular processes (35–36%), such as protein amino acid phosphorylation, translation and regulation of transcription; (ii) metabolic processes (33–34%), such as proteolysis, carbohydrate metabolic process and oxidation reduction; and (iii) biological regulation processes (8%), such as regulation of transcription and signal transduction (Table 6).
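Percentages per parental GO term depend on the denominator chosen, which the text does not specify. The Python sketch below assumes that each protein contributes once to every level-2 term assigned to it and that percentages are taken over all term assignments; the function name and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def term_percentages(assignments: Dict[str, List[str]]) -> Dict[str, float]:
    """Percentage of all term assignments falling in each parental GO term.
    A protein annotated with two terms contributes to both; duplicate terms
    for the same protein are counted once."""
    counts = Counter(term for terms in assignments.values()
                     for term in dict.fromkeys(terms))  # order-preserving de-duplication
    total = sum(counts.values())
    return {term: round(100.0 * n / total, 1) for term, n in counts.most_common()}


go = {
    "p1": ["cellular process", "metabolic process"],
    "p2": ["metabolic process"],
    "p3": ["biological regulation", "cellular process"],
}
print(term_percentages(go))
# {'cellular process': 40.0, 'metabolic process': 40.0, 'biological regulation': 20.0}
```

Dividing by the number of annotated proteins instead would give percentages that can sum to more than 100%, so the two conventions are not interchangeable.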
Proteins assigned to the term ‘molecular function’ were mainly linked to: (i) the binding of ATP, zinc ion and protein (48–49%); (ii) catalytic activities (39%) of enzymes, including protein kinases and oxidoreductases; and, (iii) transporter activity (5%), including ATPase and amino acid transmembrane transporter activity and hydrolase activity catalyzing transmembrane movement of substances (Table 6). Predicted proteins were also mapped according to cellular components such as: (i) intracellular locations (60–62%), including the nucleus, membrane, cytoplasm, ribosome and microtubule; (ii) organelles (21–22%), including the nucleus, ribosome, microtubule, microtubule associated complex and cytoskeleton; and, (iii) macromolecular complexes (13–13.7%), including the ribosome, microtubule associated complex, dynein complex, membrane coat and the nucleosome (Table 6).
Significant similarity between protein sequences predicted for each C. sinensis and O. viverrini and those in the KOBAS database allowed 9,847 and 11,092 sequences to be assigned to 242 and 249 standardized (KEGG) biological pathway terms, respectively (Table 2). Like the functional annotation inferred using GO terms, biological pathways were similarly represented for the two transcriptomic data sets (Table 7). A significant proportion of molecules was associated with carbohydrate (7–9%) or amino acid (8%) metabolism, in agreement with the results of the GO-based analysis (Table 6). Cellular processing pathways were also frequently identified, including those associated with signal transduction (11–12%), cell communication (6–7%) and the endocrine (7–8%) and immune (4–5%) systems (Table 7). Importantly, 7–8% of predicted proteins from the two Asian liver flukes were linked to biological pathways that, when perturbed, can result in the development of cancer in humans (see Table 7), including molecules similar to integrins, regulatory GTPases, tyrosine and serine/threonine kinases and growth factors , .
Proteins inferred from the C. sinensis and O. viverrini transcriptomes were screened for signal peptides and transmembrane domains. ORF-enriched sequences predicted to encode signal peptides (3,305–4,246 sequences; 6.5–6.9%) and transmembrane motifs (3,453–4,382; 6.8–7.1%) were identified (Table 2). Based on the presence of signal peptide domains, the absence of transmembrane domains and homology to known signal peptide domains, putative ES proteins (1,143–1,470; 2.3–2.4%) were identified in each data set (Table 2). Functionality was predicted for putative ES proteins by assigning them to standardized (KEGG) protein families, biological pathways and GO-inferred biological processes (Fig. 2). The majority of these molecules were predicted to be (i) metabolic proteins (99–150; 8.7–10.0%), such as peptidases, glycosyltransferases and protein kinases; (ii) genetic information processing proteins (15–20; 1.3–1.4%), such as chaperones, folding catalysts and transcription factors; and, (iii) cellular processes and signalling proteins (28–31; 2.1–2.4%), such as cell adhesion molecule ligands, cell adhesion molecules and cellular antigens (Fig. 2A). ES proteins (171–284; 15.0–19.3%) mapped to 29 parental KEGG pathways (Fig. 2B). ES protein sequences linked to the endocrine and immune systems were the predominant cellular processes inferred. Signal transduction and interaction molecules were mostly represented in environmental information processing pathways. ES proteins associated with metabolic pathways were predominantly linked to carbohydrate, lipid, amino acid or glycan metabolism. ES proteins inferred for C. sinensis and (particularly) O. viverrini were linked with biological pathways which are recognized or considered to be linked to carcinogenesis (<20 matches) in humans. 
These molecules include homologues of human proteins, such as calmodulin, c-Jun N-terminal kinase (JNK), laminin and the Ras family GTPase Rap1, which, when deregulated, have tumorigenic potential –. ES proteins were also categorized according to biological processes inferred from homology with proteins for which GO information is available (259–382; 22.7–26.0%). Biological processes that were well represented included metabolic processes, biological regulation and localization (Fig. 2C).
The integrated genomic-bioinformatic approach used in the present study permitted a deep exploration of the transcriptomes of both C. sinensis and O. viverrini, with more than 50,000 unique sequences being identified for each species. More than 85% of the sequences generated from each of these transcriptomes (available via http://research.vet.unimelb.edu.au/gasserlab/index.html) were novel and thus represent a significant contribution to current databases –,  and to scientific communities investigating parasites and neglected tropical diseases (NTDs). Based on similarity searches (BLASTx), approximately half (48–53%) of the predicted protein sequences of C. sinensis and O. viverrini were inferred to be homologues of one another, reflecting the close biological  and phylogenetic  relationships between these species. Amongst the trematodes S. japonicum, S. mansoni (Schistosomatidae), F. hepatica (Fasciolidae), C. sinensis and O. viverrini (Opisthorchiidae), the latter three species shared the greatest (29–31%) protein sequence homology. Interestingly, ~70% of protein sequences in each of the data sets presented herein did not match those available in public databases; these may be specific to the biology of these liver flukes and their mode of existence in the hosts and, thus, could represent potential candidates for new drugs and/or vaccines. The considerable percentage of predicted protein sequences with greater similarity to those of mammals (~20%) than to those of nematodes (~15%) is interesting and in accordance with previous observations for parasitic trematodes , , . This sequence similarity to mammalian molecules might reflect the capacity of the parasite to regulate host responses at the biochemical and immunological levels , .
However, since the free-living platyhelminth, Schmidtea mediterranea, also shares a high degree (60%) of sequence homology to the proteome of various vertebrates , several other factors might contribute to this observation, including the early evolutionary divergence of acoelomate platyhelminths (Lophotrochozoa) from higher invertebrates, such as parasitic nematodes (e.g., Brugia, Ecdysozoa) .
The present and future transcriptomic data sets, incorporating a wider array of free-living and parasitic invertebrates, should assist in identifying genes linked specifically to parasitism and also contribute significantly to our understanding of the evolution of the Metazoa. For each of C. sinensis and O. viverrini, the numbers of sequences that clustered with genes from eukaryotic model organisms (ranging from vertebrates, such as humans and mice, to unicellular eukaryotes, such as yeast) were approximately two-fold greater than expected. The number of proteins expressed by an organism relates to the number of genes present as well as to transcriptional variation ; thus, it is possible that alternative splice forms contributed to the high number of genes predicted from the ORF-enriched data sets. Other factors, such as a degree of heterozygosity within or among the individual worms used for sequencing, a higher-than-usual representation of paralogous molecules, multiple non-clustered ORFs spanning one large gene and/or sequencing errors within homopolymeric regions , –, might also have contributed to this apparent overestimation. Independent data, generated using a short-read, deep-sequencing platform, such as the Illumina Genome Analyzer II , could be mapped to the present data sets to better define the complete transcriptomic profile and the number of genes for each species through enhanced assembly and annotation. Nonetheless, the present data sets for C. sinensis and O. viverrini are high-quality drafts, and the assignment of molecules encoded in the transcriptomes to molecular functions and biological pathways reveals a substantial diversity of terms, comparable with those predicted for other parasitic trematodes, including S. mansoni (http://amigo.geneontology.org/; http://schistodb.net/schistodb20/) and F. hepatica .
The transcriptomes inferred for C. sinensis and O. viverrini form a solid foundation and present unique opportunities for studying the developmental and reproductive biology of these parasites, parasite-host interactions and the pathogenesis of the diseases that these flukes cause in humans and animals. Importantly, the annotated data sets should also assist in testing current theories regarding the molecular basis of the pathogenesis of cholangiocarcinoma induced by chronic clonorchiasis or opisthorchiasis. For instance, molecules excreted/secreted by the parasites are known to induce proliferation of mammalian cells in vitro and have been suggested to play a role in the development of cholangiocarcinoma. Indeed, O. viverrini secretes a granulin-like molecule that causes host cells to proliferate and is thought to be intimately involved in the initiation of carcinogenesis. The present transcriptomic data will be a valuable resource to support detailed proteomic analyses of extracellular molecules with mitogen-like activity, in order to test and extend current hypotheses about the role of C. sinensis and O. viverrini in the development of cholangiocarcinoma. More broadly, the availability of these transcriptomic data sets will substantially enhance the identification of somatic, excretory/secretory (ES) and tegumental proteins of C. sinensis and O. viverrini by mass spectrometric analysis. These data, together with transcriptomic data for F. hepatica, contribute to a growing resource and a significant foundation for future comparative genomic, proteomic, metabolomic and pathophysiological investigations of liver flukes and the diseases that they cause.
The transcriptomes defined herein represent the adult stage of each of C. sinensis and O. viverrini. However, surprisingly little transcriptional information exists for other developmental stages, with only 419 ESTs available for the metacercarial stage of C. sinensis and none for O. viverrini. Currently, there are no sequence data for the other developmental stages (including miracidium, sporocyst, redia, cercaria and immature fluke) of these parasites. Future studies should therefore focus on the differential expression of genes through the multiple free-living and parasitic stages of the life cycle. The transcriptional data for adult C. sinensis and O. viverrini will underpin the characterization of transcriptional profiles of this stage, using next-generation sequencing, microarray and/or quantitative real-time PCR analyses, which will have important implications for understanding development, reproduction and parasite-host interactions at the molecular level. Importantly, elucidating the transcriptomes of both immature and adult flukes offers the prospect of exploring immunopathological changes, as well as carcinogenesis, in humans at different stages of clonorchiasis and opisthorchiasis.
In conclusion, the present transcriptomic data will assist fundamental molecular studies of development, reproduction and metabolic pathways by providing a foundation for new developments in functional genomics. Gene perturbation assays are available for S. mansoni and F. hepatica, which indicates potential for the functional characterization of a wide range of molecules encoded in the transcriptomes of members of the family Opisthorchiidae. In the near future, it might be possible to predict and then characterize the function of parasite genes by employing probabilistic functional networks, gene silencing and/or transgenesis. Bridging the gap between genomics and phenomics could provide unique insights into, for example, cellular differentiation, developmentally regulated gene expression, reproductive processes, signal transduction pathways linked specifically to parasitism, and parasite-host interactions. In addition, the transcriptomes characterized here could support the definition of molecular or biological markers for the early diagnosis of disease. Importantly, the present transcriptomic data sets will also be an essential resource for the future assembly of the nuclear genomes of C. sinensis and O. viverrini, as well as for the determination of gene structures, the prediction of alternative transcript splicing and the characterization of regulatory elements. The future annotation of the genomes of these two parasites should also provide a foundation for the prediction of drug targets, based on an improved understanding of global biochemical pathways and genetic interactions. The recent success of comparative analyses of nuclear genomes for inferring metabolic pathways in microbial organisms provides an intellectual precedent for sequencing the genomes of these flukes.
Coupled with extensive transcriptomic data sets for different developmental stages, genomic sequence data will enable extensive fundamental explorations and could facilitate the development of gene silencing and DNA-mediated transformation for these parasites, as well as gene expression profiling and large-scale proteomic studies. Such advances should provide a basis for applied outcomes, including the development of novel drugs and/or vaccines against C. sinensis and/or O. viverrini, as well as diagnostic tools, particularly for the early diagnosis of cholangiocarcinoma. Given the possible adverse effects of praziquantel treatment and the high risk of re-infection following treatment, an emphasis must be placed on finding new and innovative intervention strategies against clonorchiasis and opisthorchiasis.
The authors have declared that no competing interests exist.
Funding from the Australian Research Council is gratefully acknowledged (RBG). NDY was the grateful recipient of an Endeavour Award from the Australian Government. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.