1.  Simple re-instantiation of small databases using cloud computing 
BMC Genomics  2013;14(Suppl 5):S13.
Small bioinformatics databases, unlike institutionally funded large databases, are vulnerable to discontinuation and many reported in publications are no longer accessible. This leads to irreproducible scientific work and redundant effort, impeding the pace of scientific progress.
We describe a Web-accessible system, available online, for archival and future on-demand re-instantiation of small databases within minutes. Depositors can rebuild their databases by downloading a Linux live operating system, preinstalled with bioinformatics and UNIX tools. The database and its dependencies can be compressed into an ".lzm" file for deposition. End-users can search for archived databases and activate them on dynamically re-instantiated BioSlax instances, run as virtual machines over the two popular full virtualization standard cloud-computing platforms, Xen Hypervisor or vSphere. The system is adaptable to increasing demand for disk storage or computational load and allows database developers to use the re-instantiated databases for integration and development of new databases.
Herein, we demonstrate that a relatively inexpensive solution can be implemented for archival of bioinformatics databases and their rapid re-instantiation should the live databases disappear.
PMCID: PMC3852246  PMID: 24564380
Database archival; Re-instantiation; Cloud computing; BioSLAX; biodb100; MIABi
2.  Helminth secretome database (HSD): a collection of helminth excretory/secretory proteins predicted from expressed sequence tags (ESTs) 
BMC Genomics  2012;13(Suppl 7):S8.
Helminths are important socio-economic organisms, responsible for causing major parasitic infections in humans, other animals and plants. These infections impose a significant public health and economic burden globally. Exceptionally, some helminth organisms like Caenorhabditis elegans are free-living in nature and serve as model organisms for studying parasitic infections. Excretory/secretory (ES) proteins play an important role in parasitic helminth infections, which makes these proteins attractive targets for therapeutic use. In the case of helminths, a large volume of expressed sequence tags (ESTs) has been generated to understand parasitism at the molecular level and to predict ES proteins for developing novel strategies to tackle parasitic infections. However, most predicted ES proteins are not available for further analysis, and there is no repository for such predicted ES proteins. Furthermore, predictions have, in the main, focussed on classical secretory pathways while it is well established that helminth parasites also utilise non-classical secretory pathways.
We developed a free Helminth Secretome Database (HSD), which serves as a repository for ES proteins predicted using classical and non-classical secretory pathways, from EST data for 78 helminth species (64 nematodes, 7 trematodes and 7 cestodes) ranging from parasitic to free-living organisms. Approximately 0.9 million ESTs compiled from the largest EST database, dbEST, were cleaned, assembled and analysed by different computational tools in our bioinformatics pipeline, and the predicted ES proteins were submitted to HSD.
We report the large-scale prediction and analysis of classically and non-classically secreted ES proteins from diverse helminth organisms. All the Unigenes (contigs and singletons) and excretory/secretory protein datasets generated from this analysis are freely available. A BLAST server is available for checking the sequence similarity of new protein sequences against predicted helminth ES proteins.
PMCID: PMC3546426  PMID: 23281827
3.  An analysis of the transcriptome of Teladorsagia circumcincta: its biological and biotechnological implications 
BMC Genomics  2012;13(Suppl 7):S10.
Teladorsagia circumcincta (order Strongylida) is an economically important parasitic nematode of small ruminants (including sheep and goats) in temperate climatic regions of the world. Improved insights into the molecular biology of this parasite could underpin alternative methods required to control this and related parasites, in order to circumvent major problems associated with anthelmintic resistance. The aims of the present study were to define the transcriptome of the adult stage of T. circumcincta and to infer the main pathways linked to molecules known to be expressed in this nematode. Since sheep develop acquired immunity against T. circumcincta, there is some potential for the development of a vaccine against this parasite. Hence, we infer excretory/secretory molecules for T. circumcincta as possible immunogens and vaccine candidates.
A total of 407,357 ESTs were assembled, yielding 39,852 putative gene sequences. Conceptual translation predicted 24,013 proteins, which were then subjected to detailed annotation, including pathway mapping of predicted proteins (including 112 excreted/secreted [ES] and 226 transmembrane peptides), domain analysis and GO annotation using InterProScan along with BLAST2GO. Secretory signal peptides were further analysed using SignalP, and non-classical secretory pathways using SecretomeP.
For ES proteins, key pathways, including Fc epsilon RI, T cell receptor, and chemokine signalling as well as leukocyte transendothelial migration were inferred to be linked to immune responses, along with other pathways related to neurodegenerative diseases and infectious diseases, which warrant detailed future studies. KAAS could identify new and updated pathways like phagosome and protein processing in endoplasmic reticulum. Domain analysis for the assembled dataset revealed families of serine, cysteine and proteinase inhibitors which might represent targets for parasite intervention. InterProScan could identify GO terms pertaining to the extracellular region. Some of the important domain families identified included the SCP-like extracellular proteins which belong to the pathogenesis-related proteins (PRPs) superfamily along with C-type lectin, saposin-like proteins. The 'extracellular region' that corresponds to allergen V5/Tpx-1 related, considered important in parasite-host interactions, was also identified.
Six-cysteine motif (SXC1) proteins, transthyretin proteins, C-type lectins and activation-associated secreted proteins (ASPs), which could represent potential candidates for developing novel anthelmintics or vaccines, were among the other important findings. Of these, SXC1, protein kinase domain-containing protein, trypsin family protein, trypsin-like protease family member (TRY-1), putative major allergen and putative lipid binding protein have not been reported in the published T. circumcincta proteomics analysis.
Detailed analysis of 6,058 raw EST sequences from dbEST revealed 315 putatively secreted proteins. Amongst them, C-type single domain activation associated secreted protein ASP3 precursor, activation-associated secreted proteins (ASP-like protein), cathepsin B-like cysteine protease, cathepsin L cysteine protease, cysteine protease, TransThyretin-Related and Venom-Allergen-like proteins were the key findings.
We have annotated a large EST dataset of T. circumcincta and undertaken detailed comparative bioinformatics analyses. The results provide a comprehensive insight into the molecular biology of this parasite and disease manifestation, which provides a potential focal point for future research. We identified a number of pathways responsible for immune response. This type of large-scale computational scanning could be coupled with proteomic and metabolomic studies of this parasite, leading to novel therapeutic intervention and disease control strategies. We have also successfully affirmed the use of bioinformatics tools for the study of ESTs, which could now serve as a benchmark for the development of new computational EST analysis pipelines.
PMCID: PMC3521389  PMID: 23282110
4.  InCoB celebrates its tenth anniversary as first joint conference with ISCB-Asia 
BMC Genomics  2011;12(Suppl 3):S1.
In 2009 the International Society for Computational Biology (ISCB) started to roll out regional bioinformatics conferences in Africa, Latin America and Asia. The open and competitive bid for the first meeting in Asia (ISCB-Asia) was awarded to Asia-Pacific Bioinformatics Network (APBioNet) which has been running the International Conference on Bioinformatics (InCoB) in the Asia-Pacific region since 2002. InCoB/ISCB-Asia 2011 is held from November 30 to December 2, 2011 in Kuala Lumpur, Malaysia. Of 104 manuscripts submitted to BMC Genomics and BMC Bioinformatics conference supplements, 49 (47.1%) were accepted. The strong showing of Asia among submissions (82.7%) and acceptances (81.6%) signals the success of this tenth InCoB anniversary meeting, and bodes well for the future of ISCB-Asia.
PMCID: PMC3333168  PMID: 22369160
5.  In silico secretome analysis approach for next generation sequencing transcriptomic data 
BMC Genomics  2011;12(Suppl 3):S14.
Excretory/secretory proteins (ESPs) play a major role in parasitic infection as they are present at the host-parasite interface and regulate the host immune system. In the case of parasitic helminths, transcriptomics has been used extensively to understand the molecular basis of parasitism and for developing novel therapeutic strategies against parasitic infections. However, none of the transcriptomic studies have extensively covered ES protein prediction for identifying novel therapeutic targets, especially as parasites adopt non-classical secretion pathways.
We developed a semi-automated computational approach for prediction and annotation of ES proteins using transcriptomic data from next generation sequencing platforms. For the prediction of non-classically secreted proteins, we have used an improved computational strategy, together with homology matching to a dataset of experimentally determined parasitic helminth ES proteins. We applied this protocol to analyse 454 short reads of the parasitic nematode Strongyloides ratti. From 296231 reads, we derived 28901 contigs, which were translated into 20877 proteins. Based on our improved ES protein prediction pipeline, we identified 2572 ES proteins, of which 407 (1.9%) proteins have classical N-terminal signal peptides, 923 (4.4%) were computationally identified as non-classically secreted while 1516 (7.26%) were identified by homology to experimentally identified parasitic helminth ES proteins. Out of 2572 ES proteins, 2310 (89.8%) ES proteins had homologues in the free-living nematode Caenorhabditis elegans and 2220 (86.3%) in parasitic nematodes. We could functionally annotate 1591 (61.8%) ES proteins with protein families and domains and establish pathway associations for 691 (26.8%) proteins. In addition, we have identified 19 representative ES proteins, which have no homologues in the host organism but are homologous to lethal RNAi phenotypes in C. elegans, as potential therapeutic targets.
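The percentages quoted above use two different denominators, which is easy to miss on a first read. The following minimal Python sketch, using only counts stated in the abstract, recomputes five of them; the helper name `percent` and the dictionary labels are illustrative assumptions, not part of the published pipeline.

```python
# Counts taken directly from the abstract.
TRANSLATED_PROTEINS = 20877   # proteins translated from the contigs
ES_PROTEINS = 2572            # predicted excretory/secretory proteins

def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage (hypothetical helper)."""
    return 100.0 * part / whole

# The three prediction categories are reported relative to all translated proteins.
by_method = {
    "classical signal peptide": percent(407, TRANSLATED_PROTEINS),
    "non-classical secretion": percent(923, TRANSLATED_PROTEINS),
    "homology to known ES proteins": percent(1516, TRANSLATED_PROTEINS),
}

# Homologue counts are reported relative to the 2572 ES proteins.
by_homologue = {
    "C. elegans homologues": percent(2310, ES_PROTEINS),
    "parasitic nematode homologues": percent(2220, ES_PROTEINS),
}

for label, value in {**by_method, **by_homologue}.items():
    print(f"{label}: {value:.2f}%")
```

Read this way, the 1.9%, 4.4% and 7.26% figures are fractions of the 20877 translated proteins, while 89.8% and 86.3% are fractions of the 2572 ES proteins.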
We report a comprehensive approach using freely available computational tools for the secretome analysis of NGS data. This approach has been applied to S. ratti 454 transcriptomic data for in silico excretory/secretory proteins prediction and analysis, providing a foundation for developing new therapeutic solutions for parasitic infections.
PMCID: PMC3333173  PMID: 22369360
6.  A comparative structural bioinformatics analysis of inherited mutations in β-D-Mannosidase across multiple species reveals a genotype-phenotype correlation  
BMC Genomics  2011;12(Suppl 3):S22.
Lysosomal β-D-mannosidase is a glycosyl hydrolase that breaks down the glycosidic bonds at the non-reducing end of N-linked glycoproteins. Hence, it is a crucial enzyme in the polysaccharide degradation pathway. Mutations in the MANBA gene, which codes for lysosomal β-mannosidase, result in improper coding and malfunctioning of the protein, leading to β-mannosidosis. Studying the location of mutations on the enzyme structure is a rational approach to understanding the functional consequences of these mutations. Accordingly, the pathology and clinical manifestations of the disease could be correlated to the genotypic modifications.
The wild-type and inherited mutations of β-mannosidase were studied across four different species, human, cow, goat and mouse employing a previously demonstrated comprehensive homology modeling and mutational mapping technique, which reveals a correlation between the variation of genotype and the severity of phenotype in β-mannosidosis. X-ray crystallographic structure of β-mannosidase from Bacteroides thetaiotaomicron was used as template for 3D structural modeling of the wild-type enzymes containing all the associated ligands. These wild-type models subsequently served as templates for building mutational structures. Truncations account for approximately 70% of the mutational cases. In general, the proximity of mutations to the active site determines the severity of phenotypic expressions. Mapping mutations to the MANBA gene sequence has identified five mutational hot-spots.
Although restrained by a limited dataset, our comprehensive study suggests a genotype-phenotype correlation in β-mannosidosis. A predictive approach for detecting likely β-mannosidosis is also demonstrated where we have extrapolated observed mutations from one species to homologous positions in other organisms based on the proximity of the mutations to the enzyme active site and their co-location from different organisms. Apart from aiding the detection of mutational hotspots in the gene, where novel mutations could be disease-implicated, this approach also provides a way to predict new disease mutations. Higher expression of the exoglycosidase chitobiase is said to play a vital role in determining disease phenotypes in human and mouse. A bigger dataset of inherited mutations as well as a parallel study of β-mannosidase and chitobiase activities in prospective patients would be interesting to better understand the underlying reasons for β-mannosidosis.
PMCID: PMC3333182  PMID: 22369051
7.  Advancing standards for bioinformatics activities: persistence, reproducibility, disambiguation and Minimum Information About a Bioinformatics investigation (MIABi) 
BMC Genomics  2010;11(Suppl 4):S27.
The 2010 International Conference on Bioinformatics, InCoB2010, which is the annual conference of the Asia-Pacific Bioinformatics Network (APBioNet), has agreed to publish conference papers in compliance with the Minimum Information about a Bioinformatics investigation (MIABi), proposed in June 2009. Authors of the conference supplements in BMC Bioinformatics, BMC Genomics and Immunome Research have consented to cooperate in this process, which will include the procedures described herein, where appropriate, to ensure data and software persistence and perpetuity, database and resource re-instantiability and reproducibility of results, author and contributor identity disambiguation and MIABi-compliance. Wherever possible, datasets and databases will be submitted to depositories with standardized terminologies. As standards are evolving, this process is intended as a prelude to the 100 BioDatabases (BioDB100) initiative whereby APBioNet collaborators will contribute exemplar databases to demonstrate the feasibility of standards-compliance and participate in refining the process for peer-review of such publications and validation of scientific claims and standards compliance. This testbed represents another step in advancing standards-based processes in the bioinformatics community, which is essential to the growing interoperability of biological data, information, knowledge and computational resources.
PMCID: PMC3005918  PMID: 21143811
8.  Challenges of the next decade for the Asia Pacific region: 2010 International Conference in Bioinformatics (InCoB 2010) 
BMC Genomics  2010;11(Suppl 4):S1.
The 2010 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia’s oldest bioinformatics organisation formed in 1998, was organized as the 9th International Conference on Bioinformatics (InCoB), Sept. 26-28, 2010 in Tokyo, Japan. Initially, APBioNet created InCoB as a forum to foster bioinformatics in the Asia Pacific region. Given the growing importance of interdisciplinary research, InCoB2010 included topics targeting scientists in the fields of genomic medicine, immunology and chemoinformatics, supporting translational research. Peer-reviewed manuscripts that were accepted for publication in this supplement represent key areas of research interest that have emerged in our region. We also highlight some of the current challenges bioinformatics is facing in the Asia Pacific region and conclude our report with the announcement of APBioNet’s 100 BioDatabases (BioDB100) initiative. BioDB100 will comply with the database criteria set out earlier in our proposal for Minimum Information about a Bioinformatics Investigation (MIABi), setting the standards for biocuration and bioinformatics research, on which we will report at the next InCoB, Nov. 27 – Dec. 2, 2011 at Kuala Lumpur, Malaysia.
PMCID: PMC3005919  PMID: 21143792
9.  A multi-factor model for caspase degradome prediction 
BMC Genomics  2009;10(Suppl 3):S6.
Caspases belong to a class of cysteine proteases which function as critical effectors in cellular processes such as apoptosis and inflammation by cleaving substrates immediately after unique tetrapeptide sites. With hundreds of reported substrates and many more expected to be discovered, the elucidation of the caspase degradome will be an important milestone in the study of these proteases in human health and disease. Several computational methods for predicting caspase cleavage sites have been developed recently for identifying potential substrates. However, as most of these methods are based primarily on the detection of the tetrapeptide cleavage sites - a factor necessary but not sufficient for predicting in vivo substrate cleavage - prediction outcomes will inevitably include many false positives.
In this paper, we show that structural factors such as the presence of disorder and solvent exposure in the vicinity of the cleavage site are important and can be used to enhance results from cleavage site prediction. We constructed a two-step model incorporating cleavage site prediction and these factors to predict caspase substrates. Sequences are first predicted for cleavage sites using CASVM or GraBCas. Predicted cleavage sites are then scored, ranked and filtered against a cut-off based on their propensities for locating in disordered and solvent exposed regions. Using an independent dataset of caspase substrates, the model was shown to achieve greater positive predictive values compared to CASVM or GraBCas alone, and was able to reduce the false positives pool by up to 13% and 53% respectively while retaining all true positives. We applied our prediction model on the family of receptor tyrosine kinases (RTKs) and highlighted several members as potential caspase targets. The results suggest that RTKs may be generally regulated by caspase cleavage and in some cases, promote the induction of apoptotic cell death - a function distinct from their role as transducers of survival and growth signals.
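The second step of the two-step model (score, rank, filter against a cut-off) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Site` fields, the plain-mean scoring in `structural_score`, the 0.5 cut-off and the example values are all hypothetical, and the abstract does not specify how the published model combines disorder and solvent exposure.

```python
from dataclasses import dataclass

@dataclass
class Site:
    protein: str      # substrate identifier (made-up values below)
    position: int     # residue index of the predicted cleavage site
    disorder: float   # hypothetical disorder propensity in [0, 1]
    exposure: float   # hypothetical solvent-exposure propensity in [0, 1]

def structural_score(site: Site) -> float:
    """Combine the two structural factors; a plain mean is an assumption."""
    return (site.disorder + site.exposure) / 2

def filter_sites(sites, cutoff=0.5):
    """Score, rank (highest first) and keep sites at or above the cut-off."""
    ranked = sorted(sites, key=structural_score, reverse=True)
    return [s for s in ranked if structural_score(s) >= cutoff]

# Step 1 output would come from a cleavage-site predictor such as CASVM or
# GraBCas; the three sites here are invented purely for illustration.
predicted = [
    Site("P1", 120, 0.9, 0.8),
    Site("P2", 45, 0.2, 0.3),
    Site("P3", 300, 0.6, 0.5),
]

kept = filter_sites(predicted)
print([s.protein for s in kept])
```

In this toy input, the site in P2 lies in an ordered, buried region and is dropped, which is the kind of false positive the structural filter is meant to remove.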
As a step towards the prediction of in vivo caspase substrates, we have developed an accurate method incorporating cleavage site prediction and structural factors. The multi-factor model augments existing methods and complements experimental efforts to define the caspase degradome on a systems-wide basis.
PMCID: PMC2788393  PMID: 19958504
10.  A proposed minimum skill set for university graduates to meet the informatics needs and challenges of the "-omics" era 
BMC Genomics  2009;10(Suppl 3):S36.
The development of high-throughput experimental technologies has given rise to the "-omics" era, where terabyte-scale datasets for systems-level measurements of various cellular and molecular phenomena pose considerable challenges in data processing and extraction of biological meaning. Moreover, it has created an unmet need for the effective integration of these datasets to achieve insights into biological systems. While it has increased the demand for bioinformatics experts who can interface with biologists, it has also raised the requirement for biologists to possess a basic capability in bioinformatics and to communicate seamlessly with these experts. This may be achieved by embedding in their undergraduate and graduate life science education basic training in bioinformatics geared towards acquiring a minimum skill set in computation and informatics.
Based on previous attempts to define curricula suitable for addressing the bioinformatics capability gap, an initiative was taken during the Workshops on Education in Bioinformatics and Computational Biology (WEBCB) in 2008 and 2009 to identify a minimum skill set for the training of future bioinformaticians and molecular biologists with informatics capabilities. The minimum skill set proposed is cross-disciplinary in nature, involving a combination of knowledge and proficiency from the fields of biology, computer science, mathematics and statistics, and can be tailored to the needs of the "-omics".
The proposed bioinformatics minimum skill set serves as a guideline for biology curriculum design and development in universities at both the undergraduate and graduate levels.
PMCID: PMC2788390  PMID: 19958501
11.  A multi-species comparative structural bioinformatics analysis of inherited mutations in α-D-Mannosidase reveals strong genotype-phenotype correlation 
BMC Genomics  2009;10(Suppl 3):S33.
Lysosomal α-mannosidase is an enzyme that acts to degrade N-linked oligosaccharides and hence plays an important role in mannose metabolism in humans and other mammalian species, especially livestock. Mutations in the gene (MAN2B1) encoding lysosomal α-D-mannosidase cause improper coding, resulting in dysfunctional or non-functional protein, causing the disease α-mannosidosis. Mapping disease mutations to the structure of the protein can help in understanding the functional consequences of these mutations and thus indirectly, the finer aspects of the pathology and clinical manifestations of the disease, including phenotypic severity as a function of the genotype.
A comprehensive homology modeling study of all the wild-type and inherited mutations of lysosomal α-mannosidase in four different species, human, cow, cat and guinea pig, reveals a significant correlation between the severity of the genotype and the phenotype in α-mannosidosis. We used the X-ray crystallographic structure of bovine lysosomal α-mannosidase as template, containing only two disulphide bonds and some ligands, to build structural models of wild-type structures with four disulfide linkages and all bound ligands. These wild-type models were then used as templates for disease mutations. All the truncations and substitutions involving the residues in and around the active site and those that destabilize the fold led to severe genotypes resulting in lethal phenotypes, whereas the mutations lying away from the active site were milder in both their genotypic and phenotypic expression.
Based on the co-location of mutations from different organisms and their proximity to the enzyme active site, we have extrapolated observed mutations from one species to homologous positions in other organisms, as a predictive approach for detecting likely α-mannosidosis. Besides predicting new disease mutations, this approach also provides a way for detecting mutation hotspots in the gene, where novel mutations could be implicated in disease. The current study has identified five mutational hot-spot regions along the MAN2B1 gene. Structural mapping can thus provide a rational approach for predicting the phenotype of a disease, based on observed genotypic variations.
PMCID: PMC2788387  PMID: 19958498
12.  Genome-wide analysis of alternative splicing in cow: implications in bovine as a model for human diseases 
BMC Genomics  2009;10(Suppl 3):S11.
Alternative splicing (AS) is a primary mechanism of functional regulation in the human genome, with 60% to 80% of human genes being alternatively spliced. As part of the bovine genome annotation team, we have analysed 4567 bovine AS genes, compared to 16715 human and 16491 mouse AS genes, along with Gene Ontology (GO) analysis. We also analysed the two most important events, cassette exons and intron retention in 94 human disease genes and mapped them to the bovine orthologous genes. Of the 94 human inherited disease genes, a protein domain analysis was carried out for the transcript sequences of 12 human genes that have orthologous genes and have been characterised in cow.
Of the 21,755 bovine genes, 4,567 genes (21%) are alternatively spliced, compared to 16,715 (68%) in human and 16,491 (57%) in mouse. Gene-level analysis of the orthologous set suggested that bovine genes show fewer AS events compared to human and mouse genes. A detailed examination of cassette exons across human and cow for 94 human disease genes suggested that a majority of cassette exons in human were present and constitutive in bovine, as opposed to intron retention, which exhibited 50% of the exons as present and 50% as absent in cow. We observed that AS plays a major role in disease implications in human through manipulations of essential/functional protein domains. It was also evident that the majority of these 12 genes had conservation of all essential domains in their bovine orthologous counterparts for these human diseases.
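As a quick check of the bovine figure, the fraction of alternatively spliced genes can be recomputed from the two counts given above. The function name `as_fraction_percent` is a hypothetical helper; the human and mouse gene totals are not stated in the abstract, so only the bovine value is recomputed.

```python
def as_fraction_percent(as_genes: int, total_genes: int) -> float:
    """Percentage of genes that are alternatively spliced (hypothetical helper)."""
    return 100.0 * as_genes / total_genes

# Both counts are taken from the abstract.
bovine_pct = as_fraction_percent(4567, 21755)
print(f"bovine AS genes: {bovine_pct:.1f}%")
```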
While alternative splicing has the potential to create many mRNA isoforms from a single gene, in cow the majority of genes generate two to three isoforms, compared to six in human and four in mouse. Our analyses demonstrated that a smaller number of bovine genes show greater transcript diversity. GO definitions for bovine AS genes provided 38% more functional information than currently available in the sequence database. Our protein domain analysis helped us verify the suitability of using bovine as a model for human diseases and also recognize the contribution of AS towards the disease phenotypes.
PMCID: PMC2788363  PMID: 19958474
13.  Extending Asia Pacific bioinformatics into new realms in the "-omics" era 
BMC Genomics  2009;10(Suppl 3):S1.
The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation dating back to 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 7-11, 2009 at Biopolis, Singapore. Besides bringing together scientists from the field of bioinformatics in this region, InCoB has actively engaged clinicians and researchers from the area of systems biology, to facilitate greater synergy between these two groups. InCoB2009 followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India), Hong Kong and Taipei (Taiwan), with InCoB2010 scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. The Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and symposia on Clinical Bioinformatics (CBAS), the Singapore Symposium on Computational Biology (SYMBIO) and training tutorials were scheduled prior to the scientific meeting, and provided ample opportunity for in-depth learning and special interest meetings for educators, clinicians and students. We provide a brief overview of the peer-reviewed bioinformatics manuscripts accepted for publication in this supplement, grouped into thematic areas. In order to facilitate scientific reproducibility and accountability, we have, for the first time, introduced minimum information criteria for our publications, including compliance to a Minimum Information about a Bioinformatics Investigation (MIABi). As the regional research expertise in bioinformatics matures, we have delineated a minimum set of bioinformatics skills required for addressing the computational challenges of the "-omics" era.
PMCID: PMC2788361  PMID: 19958472
14.  Comprehensive splicing graph analysis of alternative splicing patterns in chicken, compared to human and mouse 
BMC Genomics  2009;10(Suppl 1):S5.
Alternative transcript diversity manifests itself as a prime cause of complexity in higher eukaryotes. Recently, transcript diversity studies have suggested that 60–80% of human genes are alternatively spliced. We have used a splicing pattern approach for the bioinformatics analysis of Alternative Splicing (AS) in chicken, human and mouse. Exons involved in splicing are subdivided into distinct and variant exons, based on the prevalence of the exons across the transcripts. Four possible permutations of these two different groups of exons were categorised as class I (distinct-distinct), class II (distinct-variant), class III (variant-distinct) and class IV (variant-variant). This classification quantifies the variation in transcript diversity in the three species.
In all, 3901 chicken AS genes have been compared with 16,715 human and 16,491 mouse AS genes, with 23% of chicken genes being alternatively spliced, compared to 68% in humans and 57% in mice. To minimize any gene structure bias in the input data, comparative genome analysis has been carried out on the orthologous subset of AS genes for the three species. Gene-level analysis suggested that chicken genes show fewer AS events compared to human and mouse. An event-level analysis showed that the percentage of AS events in chicken is similar to that of human, which implies that a smaller number of chicken genes show greater transcript diversity. Overall, chicken genes were found to have fewer transcripts per gene and shorter introns than human and mouse genes.
In chicken, the majority of genes generate only two or three isoforms, compared to almost eight in human and six in mouse. We observed that intron definition is expressed strongly when compared to exon definition for chicken genome, based on 3% intron retention in chicken, compared to 2% in human and mouse. Splicing patterns with variant exons account for 33% of AS chicken orthologous genes compared to 24% in human and 27% in mouse, providing a novel measure to describe the species-wise complexity due to alternative transcript diversity.
PMCID: PMC2709266  PMID: 19594882
15.  A transcriptomic analysis of the adult stage of the bovine lungworm, Dictyocaulus viviparus 
BMC Genomics  2007;8:311.
Lungworms of the genus Dictyocaulus (family Dictyocaulidae) are parasitic nematodes of major economic importance. They cause pathological effects and clinical disease in various ruminant hosts, particularly in young animals. Dictyocaulus viviparus, called the bovine lungworm, is a major pathogen of cattle, with severe infections being fatal. In this study, we provide first insights into the transcriptome of the adult stage of D. viviparus through the analysis of expressed sequence tags (ESTs).
Using our EST analysis pipeline, we estimate that the present dataset of 4436 ESTs is derived from 2258 genes based on cluster and comparative genomic analyses of the ESTs. Of the 2258 representative ESTs, 1159 (51.3%) had homologues in the free-living nematode C. elegans, 1174 (51.9%) in parasitic nematodes, 827 (36.6%) in organisms other than nematodes, and 863 (38%) had no significant match to any sequence in the current databases. Of the C. elegans homologues, 569 had observed 'non-wildtype' RNAi phenotypes, including embryonic lethality, maternal sterility, sterility in progeny, larval arrest and slow growth. We could functionally classify 776 (35%) sequences using the Gene Ontologies (GO) and established pathway associations to 696 (31%) sequences in Kyoto Encyclopedia of Genes and Genomes (KEGG). In addition, we predicted 85 secreted proteins which could represent potential candidates for developing novel anthelmintics or vaccines.
The bioinformatic analyses of EST data for D. viviparus have elucidated sets of relatively conserved and potentially novel genes. The genes discovered in this study should assist research toward a better understanding of the basic molecular biology of D. viviparus, which could lead, in the longer term, to novel intervention strategies. The characterization of the D. viviparus transcriptome also provides a foundation for whole genome sequence analysis and future comparative transcriptomic analyses.
PMCID: PMC2131760  PMID: 17784965

