1.  Cracking the nodule worm code advances knowledge of parasite biology and biotechnology to tackle major diseases of livestock 
Biotechnology advances  2015;33(6 0 1):980-991.
Many infectious diseases caused by eukaryotic pathogens have a devastating, long-term impact on animal health and welfare. Hundreds of millions of animals are affected by parasitic nematodes of the order Strongylida. Unlocking the molecular biology of representatives of this order, and understanding nematode-host interactions, drug resistance and disease using advanced technologies could lead to entirely new ways of controlling the diseases that they cause. Oesophagostomum dentatum (nodule worm; superfamily Strongyloidea) is an economically important strongylid nematode of swine worldwide. The present article reports recent advances made in biology and animal biotechnology through the draft genome and developmental transcriptome of O. dentatum, in order to support biological research of this and related parasitic nematodes as well as the search for new and improved interventions. This first genome of any member of the Strongyloidea is 443 Mb in size and predicted to encode 25,291 protein-coding genes. Here, we review the dynamics of transcription throughout the life cycle of O. dentatum, describe double-stranded RNA interference (RNAi) machinery and infer molecules involved in development and reproduction, and in inducing or modulating immune responses or disease. The secretome predicted for O. dentatum is particularly rich in peptidases linked to interactions with host tissues and/or feeding activity, and a diverse array of molecules likely involved in immune responses. This research progress provides an important resource for future comparative genomic and molecular biological investigations as well as for biotechnological research toward new anthelmintics, vaccines and diagnostic tests.
PMCID: PMC4746232  PMID: 26026709
Biotechnology; Genomics; Transcriptomics; Bioinformatics; Nodule worm disease; Livestock
2.  GIW and InCoB, two premier bioinformatics conferences in Asia with a combined 40 years of history 
BMC Genomics  2015;16(Suppl 12):I1.
Knowledge discovery in bioinformatics thrives on joint and inclusive efforts of stakeholders. Similarly, knowledge dissemination is expected to be more effective and scalable through joint efforts. Therefore, the International Conference on Bioinformatics (InCoB) and the International Conference on Genome Informatics (GIW) were organized as a joint conference for the first time in 13 years of coexistence. The Asia-Pacific Bioinformatics Network (APBioNet) and the Japanese Society for Bioinformatics (JSBi) collaborated to host GIW/InCoB2015 in Tokyo, September 9-11, 2015. The joint endeavour yielded 51 research articles published in seven journals, 78 poster and 89 oral presentations, showcasing bioinformatics research in the Asia-Pacific region. Encouraged by the results and reduced organizational overheads, APBioNet will collaborate with other bioinformatics societies in organizing co-located bioinformatics research and training meetings in the future. InCoB2016 will be hosted in Singapore, September 21-23, 2016.
PMCID: PMC4682400  PMID: 26679412
3.  Discrete structural features among interface residue-level classes 
BMC Bioinformatics  2015;16(Suppl 18):S8.
Protein-protein interaction (PPI) is essential for molecular functions in biological cells. Investigation on protein interfaces of known complexes is an important step towards deciphering the driving forces of PPIs. Each PPI complex is specific, sensitive and selective to binding. Therefore, we have estimated the relative difference in percentage of polar residues between surface and the interface for each complex in a non-redundant heterodimer dataset of 278 complexes to understand the predominant forces driving binding.
Our analysis showed that ~60% of protein complexes have surface polarity greater than interface polarity (designated as class A). However, a considerable number of complexes (~40%) have interface polarity greater than surface polarity (designated as class B), and differ significantly from class A (p-value of 1.66E-45). Comprehensive analyses of protein complexes show that interface features such as interface area, interface polarity abundance, solvation free energy gain upon interface formation, binding energy and the percentage of interface charged residue abundance distinguish between class A and class B complexes, while electrostatic visualization maps also help differentiate interface classes among complexes.
Class A complexes are classical with abundant non-polar interactions at the interface; however class B complexes have abundant polar interactions at the interface, similar to protein surface characteristics. Five physicochemical interface features analyzed from the protein heterodimer dataset are discriminatory among the interface residue-level classes. These novel observations find application in developing residue-level models for protein-protein binding prediction, protein-protein docking studies and interface inhibitor design as drugs.
PMCID: PMC4682381  PMID: 26679043
Protein-protein interaction (PPI); heterodimers; interface; surface; polarity
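The class A / class B assignment described in the abstract above can be illustrated with a minimal sketch. It assumes residues are supplied as one-letter codes and that "polar" covers both uncharged polar and charged side chains; the abstract does not define the polar set, so `POLAR`, `polar_percentage` and `interface_class` are hypothetical names chosen for illustration, not the authors' implementation.

```python
# Assumption: residues counted as polar (uncharged polar plus charged side chains).
# The abstract does not specify the exact residue set used by the authors.
POLAR = set("STNQYCHDEKR")


def polar_percentage(residues: str) -> float:
    """Return the percentage of residues (one-letter codes) that are polar."""
    if not residues:
        raise ValueError("residue string must not be empty")
    polar_count = sum(r in POLAR for r in residues.upper())
    return 100.0 * polar_count / len(residues)


def interface_class(surface: str, interface: str) -> str:
    """Class A: surface polarity > interface polarity; class B: the reverse."""
    diff = polar_percentage(surface) - polar_percentage(interface)
    if diff > 0:
        return "A"
    if diff < 0:
        return "B"
    return "undetermined"  # equal polarity is not covered by the abstract
```

A complex whose interface is richer in polar residues than its surface would be reported as class B under these assumptions.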
4.  Coherence analysis discriminates between retroviral integration patterns in CD34+ cells transduced under differing clinical trial conditions 
Unequivocal demonstration of the therapeutic utility of γ-retroviral vectors for gene therapy applications targeting the hematopoietic system was accompanied by instances of insertional mutagenesis. These events stimulated the ongoing development of putatively safer integrating vector systems and analysis methods to characterize and compare integration site (IS) biosafety profiles. Continuing advances in next-generation sequencing technologies are driving the generation of ever-more complex IS datasets. Available bioinformatic tools to compare such datasets focus on the association of integration sites (ISs) with selected genomic and epigenetic features, and the choice of these features determines the ability to discriminate between datasets. We describe the scalable application of point-process coherence analysis (CA) to compare patterns produced by vector ISs across genomic intervals, uncoupled from association with genomic features. To explore the utility of CA in the context of an unresolved question, we asked whether the differing transduction conditions used in the initial Paris and London SCID-X1 gene therapy trials result in divergent genome-wide integration profiles. We tested a transduction carried out under each condition, and showed that CA could indeed resolve differences in IS distributions. Existence of these differences was confirmed by the application of established methods to compare integration datasets.
PMCID: PMC4445430  PMID: 26029726
5.  A novel multiplexed immunoassay identifies CEA, IL-8 and prolactin as prospective markers for Dukes’ stages A-D colorectal cancers 
Clinical Proteomics  2015;12(1):10.
Current methods widely deployed for colorectal cancer (CRC) screening lack the necessary sensitivity and specificity required for population-based early disease detection. Cancer-specific protein biomarkers are thought to be produced either by the tumor itself or other tissues in response to the presence of cancers or associated conditions. Equally, known examples of cancer protein biomarkers (e.g., PSA, CA125, CA19-9, CEA, AFP) are frequently found in plasma at very low concentration (pg/mL-ng/mL). New sensitive and specific assays are therefore urgently required to detect the disease at an early stage when prognosis is good following surgical resection. This study was designed to meet the longstanding unmet clinical need for earlier CRC detection by measuring plasma candidate biomarkers of cancer onset and progression in a clinical stage-specific manner. EDTA plasma samples (1 μL) obtained from 75 patients with Dukes’ staged CRC or unaffected controls (age and sex matched with stringent inclusion/exclusion criteria) were assayed for expression of 92 human proteins employing the Proseek® Multiplex Oncology I proximity extension assay. An identical set of plasma samples was analyzed utilizing the Bio-Plex Pro™ human cytokine 27-plex immunoassay.
Similar quantitative expression patterns for 13 plasma antigens common to both platforms endorsed the potential efficacy of Proseek as an immune-based multiplex assay for proteomic biomarker research. Proseek found that expression of Carcinoembryonic Antigen (CEA), IL-8 and prolactin are significantly correlated with CRC stage.
CEA, IL-8 and prolactin expression were found to distinguish between control (unaffected), non-malignant (Dukes’ A + B) and malignant (Dukes’ C + D) stages.
Electronic supplementary material
The online version of this article (doi:10.1186/s12014-015-9081-x) contains supplementary material, which is available to authorized users.
PMCID: PMC4435647  PMID: 25987887
Multiplex immunoassay; Plasma biomarker; Colorectal cancer
6.  Genome of the human hookworm Necator americanus 
Nature genetics  2014;46(3):261-269.
The hookworm Necator americanus is the predominant soil-transmitted human parasite. Adult worms feed on blood in the small intestine, causing iron deficiency anaemia, malnutrition, growth and development stunting in children, and severe morbidity and mortality during pregnancy in women. Characterization of the first hookworm genome sequence (244 Mb, 19,151 genes) identified genes orchestrating the hookworm's invasion of the human host, genes involved in blood feeding and development, and genes encoding proteins that represent new potential drug targets against hookworms. N. americanus has undergone a considerable and unique expansion of immunomodulator proteins, some of which we highlight as potential novel treatments against inflammatory diseases. We also utilize a protein microarray to demonstrate a post-genomic application of the hookworm genome sequence. This genome provides an invaluable resource to boost ongoing efforts towards fundamental and applied post-genomic research, including the development of new methods to control hookworm and human immunological diseases.
PMCID: PMC3978129  PMID: 24441737
nematodes; hookworm; necatoriasis; blood-feeding; SCP/TAPS protein; immunoregulation; anti-inflammation; genome; RNA-Seq; protein microarray
7.  Deep insights into Dictyocaulus viviparus transcriptomes provides unique prospects for new drug targets and disease intervention 
Biotechnology advances  2010;29(3):10.1016/j.biotechadv.2010.11.005.
The lungworm, Dictyocaulus viviparus, causes parasitic bronchitis in cattle, and is responsible for substantial economic losses in temperate regions of the world. Here, we undertake the first large-scale exploration of available transcriptomic data for this lungworm, examine differences in transcription between different stages/both genders and identify and prioritize essential molecules linked to fundamental metabolic pathways, which could represent novel drug targets. Approximately 3 million expressed sequence tags (ESTs), generated by 454 sequencing from third-stage larvae (L3) as well as adult females and males of D. viviparus, were assembled and annotated. The assembly of these sequences yielded ~61,000 contigs, of which relatively large proportions encoded collagens (4.3%), ubiquitins (2.1%) and serine/threonine protein kinases (1.9%). Subtractive analysis in silico identified 6,928 nucleotide sequences as being uniquely transcribed in L3, and 5,203 and 7,889 transcripts as being exclusive to the adult female and male, respectively. Most peptides predicted from the conceptual translations were nucleoplasmins (L3), serine/threonine protein kinases (female) and major sperm proteins (male). Additional analyses allowed the prediction of three drug target candidates, whose Caenorhabditis elegans homologues were linked to a lethal RNA interference phenotype. This detailed exploration, combined with future transcriptomic sequencing of all developmental stages of D. viviparus, will facilitate future investigations of the molecular biology of this parasitic nematode as well as genomic sequencing. These advances will underpin the discovery of new drug and/or vaccine targets, focused on biotechnological outcomes.
PMCID: PMC3827682  PMID: 21182926
Dictyocaulus viviparus; Bovine lungworm; Next-generation sequencing; Bioinformatics; Transcriptome; Ancylostoma-secreted proteins; Drug target prediction
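The in silico subtractive analysis mentioned in the abstract above amounts, at its simplest, to keeping the transcripts seen in one stage and in no other. The following is a minimal sketch under that assumption; the function name `stage_exclusive` and the contig identifiers in the test are hypothetical, and the authors' actual procedure (which compares sequences rather than identifiers) is not reproduced here.

```python
from typing import Dict, Set


def stage_exclusive(transcripts_by_stage: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each stage, return the transcript IDs found in that stage only."""
    exclusive: Dict[str, Set[str]] = {}
    for stage, ids in transcripts_by_stage.items():
        # Union of transcript IDs observed in every other stage.
        others = set().union(
            *(v for s, v in transcripts_by_stage.items() if s != stage)
        )
        exclusive[stage] = ids - others
    return exclusive
```

In practice, transcripts from different stages would first need to be matched by sequence similarity before a set difference of this kind is meaningful.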
8.  InCoB2013 introduces Systems Biology as a major conference theme 
BMC Systems Biology  2013;7(Suppl 3):S1.
The Asia-Pacific Bioinformatics Network (APBioNet) held the first International Conference on Bioinformatics (InCoB) in Bangkok in 2002 to promote North-South networking. Commencing as a forum for Asia-Pacific researchers to interact with and learn from scientists of developed countries, InCoB has become a major regional bioinformatics conference, with participants from the region as well as North America and Europe. Since 2006, InCoB has selected the best submissions for publication in BMC Bioinformatics. In response to the growth and maturation of data-driven approaches, InCoB added BMC Genomics in 2009 and, with the introduction of this conference supplement, BMC Systems Biology to its journal choices for submitting authors. Co-hosting InCoB2013 with the second International Conference for Translational Bioinformatics (ICTBI) is in line with InCoB's support for the current trend in taking bioinformatics to the bedside, along with a systems approach to solving biological problems.
PMCID: PMC3816296  PMID: 24555777
9.  APBioNet—Transforming Bioinformatics in the Asia-Pacific Region 
PLoS Computational Biology  2013;9(10):e1003317.
PMCID: PMC3814852  PMID: 24204244
10.  Simple re-instantiation of small databases using cloud computing 
BMC Genomics  2013;14(Suppl 5):S13.
Small bioinformatics databases, unlike institutionally funded large databases, are vulnerable to discontinuation and many reported in publications are no longer accessible. This leads to irreproducible scientific work and redundant effort, impeding the pace of scientific progress.
We describe a Web-accessible system, available online at, for archival and future on demand re-instantiation of small databases within minutes. Depositors can rebuild their databases by downloading a Linux live operating system (, preinstalled with bioinformatics and UNIX tools. The database and its dependencies can be compressed into an ".lzm" file for deposition. End-users can search for archived databases and activate them on dynamically re-instantiated BioSlax instances, run as virtual machines over the two popular full virtualization standard cloud-computing platforms, Xen Hypervisor or vSphere. The system is adaptable to increasing demand for disk storage or computational load and allows database developers to use the re-instantiated databases for integration and development of new databases.
Herein, we demonstrate that a relatively inexpensive solution can be implemented for archival of bioinformatics databases and their rapid re-instantiation should the live databases disappear.
PMCID: PMC3852246  PMID: 24564380
Database archival; Re-instantiation; Cloud computing; BioSLAX; biodb100; MIABi
11.  First transcriptomic analysis of the economically important parasitic nematode, Trichostrongylus colubriformis, using a next-generation sequencing approach 
Trichostrongylus colubriformis (Strongylida), a small intestinal nematode of small ruminants, is a major cause of production and economic losses in many countries. The aims of the present study were to define the transcriptome of the adult stage of T. colubriformis, using 454 sequencing technology and bioinformatic analyses, and to predict the main pathways that key groups of molecules are linked to in this nematode. A total of 21,259 contigs were assembled from the sequence data produced from a normalized cDNA library; 7,876 of these contigs had known orthologues in the free-living nematode Caenorhabditis elegans, and encoded, amongst others, proteins with ‘transthyretin-like’ (8.8%), ‘RNA recognition’ (8.4%) and ‘metridin-like ShK toxin’ (7.6%) motifs. Bioinformatic analyses inferred that relatively high proportions of the C. elegans homologues are involved in biological pathways linked to ‘peptidases’ (4%), ‘ribosome’ (3.6%) and ‘oxidative phosphorylation’ (3%). Highly represented were peptides predicted to be associated with the nervous system, digestion of host proteins or inhibition of host proteases. Probabilistic functional gene networking of the complement of C. elegans orthologues (n = 2,126) assigned significance to particular subsets of molecules, such as protein kinases and serine/threonine phosphatases. The present study represents the first, comprehensive insight into the transcriptome of adult T. colubriformis, which provides a foundation for fundamental studies of the molecular biology and biochemistry of this parasitic nematode as well as prospects for identifying targets for novel nematocides. Future investigations should focus on comparing the transcriptomes of different developmental stages, both genders and various tissues of this parasitic nematode for the prediction of essential genes/gene products that are specific to nematodes.
PMCID: PMC3666958  PMID: 20692378
Trichostrongylus colubriformis; Transcriptome; Next-generation sequencing; Bioinformatics; Peptidases; Ancylostoma-secreted proteins
12.  Identification of ovarian cancer associated genes using an integrated approach in a Boolean framework 
BMC Systems Biology  2013;7:12.
Cancer is a complex disease whose molecular mechanism remains elusive. A systems approach is needed to integrate diverse biological information for prognosis and therapy risk assessment, using a mechanistic approach to understand gene interactions in pathways and networks, together with functional attributes, to unravel the biological behaviour of tumors.
We weighted the functional attributes based on various functional properties observed between cancerous and non-cancerous genes reported in the literature. This weighting schema was then encoded in a Boolean logic framework to rank differentially expressed genes. We have identified 17 genes to be differentially expressed from a total of 11,173 genes, where ten genes are reported to be down-regulated via epigenetic inactivation and seven genes are up-regulated. Here, we report that the overexpressed genes IRAK1, CHEK1 and BUB1 may play an important role in ovarian cancer. We also show that these 17 genes can be used to form an ovarian cancer signature, to distinguish normal from ovarian cancer subjects and that the set of three genes, CHEK1, AR, and LYN, can be used to classify good and poor prognostic tumors.
We provided a workflow using a Boolean logic schema for the identification of differentially expressed genes by integrating diverse biological information. This integrated approach resulted in the identification of genes as potential biomarkers in ovarian cancer.
PMCID: PMC3605242  PMID: 23383610
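The idea of ranking genes by weighted Boolean functional attributes, as outlined in the abstract above, can be sketched as follows. This is only an illustration under stated assumptions: the attribute names, the weights in `WEIGHTS` and the placeholder gene labels in the test are all hypothetical, since the abstract does not list the attributes or weights the authors used.

```python
from typing import Dict, List, Tuple

# Hypothetical attribute weights; the real attributes and weights are not
# given in the abstract.
WEIGHTS: Dict[str, float] = {
    "differentially_expressed": 3.0,
    "in_cancer_pathway": 2.0,
    "network_hub": 1.0,
}


def score(attributes: Dict[str, bool]) -> float:
    """Sum the weights of every attribute that is True for a gene."""
    return sum(w for name, w in WEIGHTS.items() if attributes.get(name, False))


def rank_genes(genes: Dict[str, Dict[str, bool]]) -> List[Tuple[str, float]]:
    """Return (gene, score) pairs, highest score first; ties broken by name."""
    scored = ((gene, score(attrs)) for gene, attrs in genes.items())
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Encoding each property as a Boolean keeps heterogeneous evidence (expression, pathway membership, network topology) on a common footing before weighting.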
13.  Helminth secretome database (HSD): a collection of helminth excretory/secretory proteins predicted from expressed sequence tags (ESTs) 
BMC Genomics  2012;13(Suppl 7):S8.
Helminths are important socio-economic organisms, responsible for causing major parasitic infections in humans, other animals and plants. These infections impose a significant public health and economic burden globally. Exceptionally, some helminth organisms like Caenorhabditis elegans are free-living in nature and serve as model organisms for studying parasitic infections. Excretory/secretory (ES) proteins play an important role in parasitic helminth infections, which makes these proteins attractive targets for therapeutic use. In the case of helminths, a large volume of expressed sequence tags (ESTs) has been generated to understand parasitism at the molecular level and for predicting excretory/secretory proteins for developing novel strategies to tackle parasitic infections. However, most predicted ES proteins are not available for further analysis and there is no repository available for such predicted ES proteins. Furthermore, predictions have, in the main, focussed on classical secretory pathways while it is well established that helminth parasites also utilise non-classical secretory pathways.
We developed a free Helminth Secretome Database (HSD), which serves as a repository for ES proteins predicted using classical and non-classical secretory pathways, from EST data for 78 helminth species (64 nematodes, 7 trematodes and 7 cestodes) ranging from parasitic to free-living organisms. Approximately 0.9 million ESTs compiled from the largest EST database, dbEST, were cleaned, assembled and analysed by different computational tools in our bioinformatics pipeline, and predicted ES proteins were submitted to HSD.
We report the large-scale prediction and analysis of classically and non-classically secreted ES proteins from diverse helminth organisms. All the Unigenes (contigs and singletons) and excretory/secretory protein datasets generated from this analysis are freely available. A BLAST server is available at, for checking the sequence similarity of new protein sequences against predicted helminth ES proteins.
PMCID: PMC3546426  PMID: 23281827
14.  TranSeqAnnotator: large-scale analysis of transcriptomic data 
BMC Bioinformatics  2012;13(Suppl 17):S24.
The transcriptome of an organism can be studied through the analysis of expressed sequence tag (EST) data sets, which offers a rapid and cost-effective approach supported by several new and updated bioinformatics approaches and tools for assembly and annotation. Such comprehensive analyses complement genome and proteome analyses in understanding an organism. With the advent of large-scale sequencing projects and the generation of sequence data at protein and cDNA levels, an automated analysis pipeline is necessary to store, organize and annotate ESTs.
TranSeqAnnotator is a workflow for large-scale analysis of transcriptomic data with the most appropriate bioinformatics tools for data management and analysis. The pipeline automatically cleans, clusters, assembles and generates consensus sequences, conceptually translates these into possible protein products and assigns putative function based on various DNA and protein similarity searches. Excretory/secretory (ES) proteins inferred from ESTs/short reads are also identified. TranSeqAnnotator accepts raw and quality ESTs in FASTA format, along with protein and short-read sequences, which are analysed with user-selected programs. After pre-processing and assembly, the dataset is annotated at the nucleotide, protein and ES protein levels.
TranSeqAnnotator has been developed in a Linux cluster, to perform an exhaustive and reliable analysis and provide detailed annotation. TranSeqAnnotator outputs gene ontologies and protein functional identifications in terms of mapping to protein domains and metabolic pathways. The pipeline has been applied to annotate large EST datasets, identifying several novel and known genes with therapeutic experimental validations that could serve as potential targets for parasite intervention. TranSeqAnnotator is freely available for the scientific community at
PMCID: PMC3521237  PMID: 23282024
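The conceptual translation step mentioned in the abstract above (turning assembled nucleotide sequences into possible protein products) can be illustrated with a six-frame translation. This is a minimal sketch assuming the standard genetic code and unambiguous DNA input; the function names are hypothetical and the abstract does not say which translation tool TranSeqAnnotator actually calls.

```python
from itertools import product
from typing import Dict

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq: str) -> Dict[str, str]:
    """Translate all three forward (+1..+3) and three reverse (-1..-3) frames."""
    forward = seq.upper()
    reverse = reverse_complement(forward)
    frames: Dict[str, str] = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

A pipeline would typically keep the longest open reading frame across the six frames as the candidate protein, although that selection rule is an assumption here rather than something the abstract states.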
15.  An analysis of the transcriptome of Teladorsagia circumcincta: its biological and biotechnological implications 
BMC Genomics  2012;13(Suppl 7):S10.
Teladorsagia circumcincta (order Strongylida) is an economically important parasitic nematode of small ruminants (including sheep and goats) in temperate climatic regions of the world. Improved insights into the molecular biology of this parasite could underpin alternative methods required to control this and related parasites, in order to circumvent major problems associated with anthelmintic resistance. The aims of the present study were to define the transcriptome of the adult stage of T. circumcincta and to infer the main pathways linked to molecules known to be expressed in this nematode. Since sheep develop acquired immunity against T. circumcincta, there is some potential for the development of a vaccine against this parasite. Hence, we infer excretory/secretory molecules for T. circumcincta as possible immunogens and vaccine candidates.
A total of 407,357 ESTs were assembled, yielding 39,852 putative gene sequences. Conceptual translation predicted 24,013 proteins, which were then subjected to detailed annotation, including pathway mapping of predicted proteins (including 112 excreted/secreted [ES] and 226 transmembrane peptides). Domain analysis and GO annotation were carried out using InterProScan along with BLAST2GO. Further analysis was carried out for secretory signal peptides using SignalP and for non-classical secretory pathways using SecretomeP.
For ES proteins, key pathways, including Fc epsilon RI, T cell receptor, and chemokine signalling as well as leukocyte transendothelial migration were inferred to be linked to immune responses, along with other pathways related to neurodegenerative diseases and infectious diseases, which warrant detailed future studies. KAAS could identify new and updated pathways like phagosome and protein processing in endoplasmic reticulum. Domain analysis for the assembled dataset revealed families of serine, cysteine and proteinase inhibitors which might represent targets for parasite intervention. InterProScan could identify GO terms pertaining to the extracellular region. Some of the important domain families identified included the SCP-like extracellular proteins which belong to the pathogenesis-related proteins (PRPs) superfamily along with C-type lectin, saposin-like proteins. The 'extracellular region' that corresponds to allergen V5/Tpx-1 related, considered important in parasite-host interactions, was also identified.
Six cysteine motif (SXC1) proteins, transthyretin proteins, C-type lectins and activation-associated secreted proteins (ASPs), which could represent potential candidates for developing novel anthelmintics or vaccines, were among the other important findings. Of these, SXC1, protein kinase domain-containing protein, trypsin family protein, trypsin-like protease family member (TRY-1), putative major allergen and putative lipid binding protein have not been reported in the published T. circumcincta proteomics analysis.
Detailed analysis of 6,058 raw EST sequences from dbEST revealed 315 putatively secreted proteins. Amongst them, C-type single domain activation associated secreted protein ASP3 precursor, activation-associated secreted proteins (ASP-like protein), cathepsin B-like cysteine protease, cathepsin L cysteine protease, cysteine protease, TransThyretin-Related and Venom-Allergen-like proteins were the key findings.
We have annotated a large dataset of ESTs of T. circumcincta and undertaken detailed comparative bioinformatics analyses. The results provide a comprehensive insight into the molecular biology of this parasite and disease manifestation, providing a potential focal point for future research. We identified a number of pathways responsible for immune response. This type of large-scale computational scanning could be coupled with proteomic and metabolomic studies of this parasite, leading to novel therapeutic intervention and disease control strategies. We have also successfully affirmed the use of bioinformatics tools for the study of ESTs, which could now serve as a benchmark for the development of new computational EST analysis pipelines.
PMCID: PMC3521389  PMID: 23282110
16.  The Transcriptome Analysis of Strongyloides stercoralis L3i Larvae Reveals Targets for Intervention in a Neglected Disease 
Strongyloidiasis is one of the most neglected diseases distributed worldwide with endemic areas in developed countries, where chronic infections are life threatening. Despite its impact, very little is known about the molecular biology of the parasite involved and its interplay with its hosts. Next generation sequencing technologies now provide unique opportunities to rapidly address these questions.
Principal Findings
Here we present the first transcriptome of the third larval stage of S. stercoralis using 454 sequencing coupled with semi-automated bioinformatic analyses. 253,266 raw sequence reads were assembled into 11,250 contiguous sequences, most of which were novel. 8037 putative proteins were characterized based on homology, gene ontology and/or biochemical pathways. Comparison of the transcriptome of S. stercoralis with those of other nematodes, including S. ratti, revealed similarities in transcription of molecules inferred to have key roles in parasite-host interactions. Enzymatic proteins, like kinases and proteases, were abundant. 1213 putative excretory/secretory proteins were compiled using a new pipeline which included non-classical secretory proteins. Potential drug targets were also identified.
Overall, the present dataset should provide a solid foundation for future fundamental genomic, proteomic and metabolomic explorations of S. stercoralis, as well as a basis for applied outcomes, such as the development of novel methods of intervention against this neglected parasite.
Author Summary
Strongyloides stercoralis (Nematoda) is an important parasite of humans, causing strongyloidiasis, considered one of the most neglected diseases, affecting more than 100 million people worldwide. Chronic infections in endemic areas can be maintained for decades through the autoinfective cycle with the L3 filariform larvae. In these areas, misdiagnosis, inadequate treatment and the facilitation of hyperinfection syndrome by immunosuppression are frequent and contribute to a high mortality rate. Among the affected areas, chronic patients have been described in the Valencian Mediterranean coastal region of Spain. Despite its serious impact, very little is known about this parasite and its relationship with its hosts at the molecular level, and more effective diagnostic tests and treatments are needed. Next generation sequencing technologies now provide unique opportunities to rapidly advance in these areas. In this study, we present the first transcriptome of S. stercoralis L3i using 454 sequencing followed by semi-automated bioinformatic analyses. Our study identifies 8037 putative proteins based on homology, gene ontology, and/or biochemical pathways, including putative excretory/secretory proteins as well as potential drug targets. The present dataset provides a useful resource and adds greatly to our understanding of a human parasite affecting both developed and developing countries.
PMCID: PMC3289599  PMID: 22389732
17.  Towards big data science in the decade ahead from ten years of InCoB and the 1st ISCB-Asia Joint Conference 
BMC Bioinformatics  2011;12(Suppl 13):S1.
The 2011 International Conference on Bioinformatics (InCoB), the annual scientific conference of the Asia-Pacific Bioinformatics Network (APBioNet), is hosted in Kuala Lumpur, Malaysia, and is co-organized with the first ISCB-Asia conference of the International Society for Computational Biology (ISCB). InCoB and the sequencing of the human genome are both celebrating their tenth anniversaries and InCoB’s goalposts for the next decade, implementing standards in bioinformatics and globally distributed computational networks, will be discussed and adopted at this conference. Of the 49 manuscripts (selected from 104 submissions) accepted to BMC Genomics and BMC Bioinformatics conference supplements, 24 are featured in this issue, covering software tools, genome/proteome analysis, systems biology (networks, pathways, bioimaging) and drug discovery and design.
PMCID: PMC3278825  PMID: 22372736
18.  In silico approach to screen compounds active against parasitic nematodes of major socio-economic importance 
BMC Bioinformatics  2011;12(Suppl 13):S25.
Infections due to parasitic nematodes are common causes of morbidity and fatality around the world, especially in developing nations. At present, however, there are only three major classes of drugs for treating human nematode infections. Additionally, the mechanism of action of these drugs and the reasons for resistance to them are poorly understood. Commercial incentives to design drugs for diseases that are endemic to developing countries are limited; therefore, virtual screening in academic settings can play a vital role in discovering novel drugs useful against neglected diseases. In this study we propose to build a robust machine learning model to classify and screen compounds active against parasitic nematodes.
A set of compounds active against parasitic nematodes was collated from various literature sources including PubChem, while the inactive set was derived from the DrugBank database. The support vector machine (SVM) algorithm was used for model development, and stratified ten-fold cross validation was used to evaluate the performance of each classifier. The best results were obtained using the radial basis function kernel. The SVM method achieved an accuracy of 81.79% on an independent test set. Using the model developed above, we were able to identify novel compounds with potential anthelmintic activity.
In this study, we successfully present the SVM approach for predicting compounds active against parasitic nematodes, which suggests the effectiveness of computational approaches for antiparasitic drug discovery. Although the accuracy obtained is lower than that previously reported in a similar study, we believe that our model is more robust because we intentionally employed stringent criteria to select the inactive dataset, thus making it difficult for the model to classify compounds. The method presents an alternative approach to the existing traditional methods and may be useful for predicting hitherto novel anthelmintic compounds.
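The evaluation protocol named in this abstract (stratified ten-fold cross validation, radial basis function kernel) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the `gamma` default and the toy labels are assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with near-equal class ratios.

    Hypothetical helper: indices of each class are shuffled and then
    dealt round-robin across the folds.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Toy labels only (30 "active", 70 "inactive"); not data from the study.
toy_labels = ["active"] * 30 + ["inactive"] * 70
folds = stratified_folds(toy_labels, k=10)
```

Each fold serves once as the held-out set while the remaining nine are used for training; stratification keeps the active/inactive ratio of every fold close to that of the full dataset.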
PMCID: PMC3278842  PMID: 22373185
19.  InCoB celebrates its tenth anniversary as first joint conference with ISCB-Asia 
BMC Genomics  2011;12(Suppl 3):S1.
In 2009 the International Society for Computational Biology (ISCB) started to roll out regional bioinformatics conferences in Africa, Latin America and Asia. The open and competitive bid for the first meeting in Asia (ISCB-Asia) was awarded to the Asia-Pacific Bioinformatics Network (APBioNet), which has been running the International Conference on Bioinformatics (InCoB) in the Asia-Pacific region since 2002. InCoB/ISCB-Asia 2011 is held from November 30 to December 2, 2011 in Kuala Lumpur, Malaysia. Of 104 manuscripts submitted to BMC Genomics and BMC Bioinformatics conference supplements, 49 (47.1%) were accepted. The strong showing of Asia among submissions (82.7%) and acceptances (81.6%) signals the success of this tenth InCoB anniversary meeting, and bodes well for the future of ISCB-Asia.
PMCID: PMC3333168  PMID: 22369160
20.  In silico secretome analysis approach for next generation sequencing transcriptomic data 
BMC Genomics  2011;12(Suppl 3):S14.
Excretory/secretory proteins (ESPs) play a major role in parasitic infection as they are present at the host-parasite interface and regulate the host immune system. In the case of parasitic helminths, transcriptomics has been used extensively to understand the molecular basis of parasitism and to develop novel therapeutic strategies against parasitic infections. However, none of the transcriptomic studies has extensively covered ES protein prediction for identifying novel therapeutic targets, especially as parasites adopt non-classical secretion pathways.
We developed a semi-automated computational approach for prediction and annotation of ES proteins using transcriptomic data from next generation sequencing platforms. For the prediction of non-classically secreted proteins, we have used an improved computational strategy, together with homology matching to a dataset of experimentally determined parasitic helminth ES proteins. We applied this protocol to analyse 454 short reads of the parasitic nematode Strongyloides ratti. From 296231 reads, we derived 28901 contigs, which were translated into 20877 proteins. Based on our improved ES protein prediction pipeline, we identified 2572 ES proteins, of which 407 (1.9%) proteins have classical N-terminal signal peptides, 923 (4.4%) were computationally identified as non-classically secreted, while 1516 (7.26%) were identified by homology to experimentally identified parasitic helminth ES proteins. Out of 2572 ES proteins, 2310 (89.8%) ES proteins had homologues in the free-living nematode Caenorhabditis elegans and 2220 (86.3%) in parasitic nematodes. We could functionally annotate 1591 (61.8%) ES proteins with protein families and domains and establish pathway associations for 691 (26.8%) proteins. In addition, we have identified 19 representative ES proteins, which have no homologues in the host organism but are homologous to lethal RNAi phenotypes in C. elegans, as potential therapeutic targets.
We report a comprehensive approach using freely available computational tools for the secretome analysis of NGS data. This approach has been applied to S. ratti 454 transcriptomic data for in silico excretory/secretory protein prediction and analysis, providing a foundation for developing new therapeutic solutions for parasitic infections.
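The percentages in this abstract use two different denominators, which a short worked calculation makes explicit. The sketch below uses only the counts quoted above; the reading that the three prediction routes are expressed relative to all 20877 translated proteins, and the homology and annotation figures relative to the 2572 ES proteins, is an assumption inferred from the arithmetic rather than something the abstract states.

```python
def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


TOTAL_PROTEINS = 20877  # proteins translated from the contigs
ES_PROTEINS = 2572      # proteins predicted to be excretory/secretory

# Counts per prediction route, as quoted in the abstract.
by_route = {
    "classical signal peptide": 407,
    "non-classical secretion": 923,
    "homology to known ES proteins": 1516,
}

# Assumed denominator: all translated proteins.
route_pct = {name: pct(n, TOTAL_PROTEINS) for name, n in by_route.items()}

# Assumed denominator: the predicted ES proteins.
c_elegans_pct = pct(2310, ES_PROTEINS)
parasite_pct = pct(2220, ES_PROTEINS)

# The route counts add up to more than 2572, which suggests that some
# proteins were identified by more than one route (an inference).
route_total = sum(by_route.values())
overlap_implied = route_total > ES_PROTEINS

for name, value in route_pct.items():
    print(f"{name}: {value:.2f}% of translated proteins")
print(f"C. elegans homologues: {c_elegans_pct:.1f}% of ES proteins")
```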
PMCID: PMC3333173  PMID: 22369360
21.  A comparative structural bioinformatics analysis of inherited mutations in β-D-Mannosidase across multiple species reveals a genotype-phenotype correlation  
BMC Genomics  2011;12(Suppl 3):S22.
Lysosomal β-D-mannosidase is a glycosyl hydrolase that breaks down the glycosidic bonds at the non-reducing end of N-linked glycoproteins. Hence, it is a crucial enzyme in the polysaccharide degradation pathway. Mutations in the MANBA gene, which codes for lysosomal β-mannosidase, result in improper coding and malfunctioning of the protein, leading to β-mannosidosis. Studying the location of mutations on the enzyme structure is a rational approach to understanding the functional consequences of these mutations. Accordingly, the pathology and clinical manifestations of the disease could be correlated to the genotypic modifications.
The wild-type and inherited mutations of β-mannosidase were studied across four different species (human, cow, goat and mouse), employing a previously demonstrated comprehensive homology modeling and mutational mapping technique, which reveals a correlation between the variation of genotype and the severity of phenotype in β-mannosidosis. The X-ray crystallographic structure of β-mannosidase from Bacteroides thetaiotaomicron was used as the template for 3D structural modeling of the wild-type enzymes containing all the associated ligands. These wild-type models subsequently served as templates for building mutational structures. Truncations account for approximately 70% of the mutational cases. In general, the proximity of mutations to the active site determines the severity of phenotypic expressions. Mapping mutations to the MANBA gene sequence has identified five mutational hot-spots.
Although restrained by a limited dataset, our comprehensive study suggests a genotype-phenotype correlation in β-mannosidosis. A predictive approach for detecting likely β-mannosidosis is also demonstrated where we have extrapolated observed mutations from one species to homologous positions in other organisms based on the proximity of the mutations to the enzyme active site and their co-location from different organisms. Apart from aiding the detection of mutational hotspots in the gene, where novel mutations could be disease-implicated, this approach also provides a way to predict new disease mutations. Higher expression of the exoglycosidase chitobiase is said to play a vital role in determining disease phenotypes in human and mouse. A bigger dataset of inherited mutations as well as a parallel study of β-mannosidase and chitobiase activities in prospective patients would be interesting to better understand the underlying reasons for β-mannosidosis.
PMCID: PMC3333182  PMID: 22369051
22.  Structural diversity of biologically interesting datasets: a scaffold analysis approach 
The recent public availability of the human metabolome and natural product datasets has revitalized "metabolite-likeness" and "natural product-likeness" as a drug design concept to design lead libraries targeting specific pathways. Many reports have analyzed the physicochemical property space of biologically important datasets, with only a few comprehensively characterizing the scaffold diversity in public datasets of biological interest. With large collections of high quality public data currently available, we carried out a comparative analysis of current day leads with other biologically relevant datasets.
In this study, we note a two-fold enrichment of metabolite scaffolds in the drug dataset (42%) as compared to currently used lead libraries (23%). We also note that only a small percentage (5%) of natural product scaffold space is shared by the lead dataset. We have identified specific scaffolds that are present in metabolites and natural products, with close counterparts in the drugs, but are missing in the lead dataset. To determine the distribution of compounds in physicochemical property space we analyzed the molecular polar surface area, the molecular solubility, the number of rings and the number of rotatable bonds in addition to four well-known Lipinski properties. Here, we note that, with only a few exceptions, most of the drugs follow Lipinski's rule. The average values of the molecular polar surface area and the molecular solubility are the highest in metabolites, while the number of rings is the lowest. In addition, we note that natural products contain more rings and rotatable bonds than any other dataset under consideration.
Currently used lead libraries make little use of the metabolite and natural product scaffold space. We believe that metabolites and natural products are recognized by at least one protein in the biosphere; therefore, sampling the fragment and scaffold space of these compounds, along with the knowledge of distribution in physicochemical property space, can result in better lead libraries. Hence, we recommend the greater use of metabolites and natural products while designing lead libraries. Nevertheless, metabolites have a limited distribution in chemical space that limits the usage of metabolites in library design.
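The abstract refers to the four well-known Lipinski properties without listing them. As a minimal sketch, assuming the usual rule-of-five thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) and the common convention of tolerating one violation, a compound can be checked as follows. The `Compound` class, the function names and the example values are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Hypothetical container for precomputed physicochemical properties."""
    name: str
    mol_weight: float   # daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(c):
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ]
    return sum(checks)


def passes_lipinski(c, max_violations=1):
    """Assumed convention: at most one violation is tolerated."""
    return lipinski_violations(c) <= max_violations


# Made-up property values for illustration only.
small = Compound("example-small", 300.0, 2.0, 1, 4)
large = Compound("example-large", 800.0, 7.0, 6, 12)
```

In practice the property values would come from a cheminformatics toolkit; here they are supplied by hand so that the check itself stays self-contained.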
PMCID: PMC3179739  PMID: 21824432
23.  Understanding TR Binding to pMHC Complexes: How Does a TR Scan Many pMHC Complexes yet Preferentially Bind to One 
PLoS ONE  2011;6(2):e17194.
Understanding the basis of the binding of a T cell receptor (TR) to the peptide-MHC (pMHC) complex is essential due to the vital role it plays in the adaptive immune response. We describe the use of computed binding (free) energy (BE), TR paratope, pMHC epitope, molecular surface electrostatic potential (MSEP) and calculated TR docking angle (θ) to analyse 61 TR/pMHC crystallographic structures to comprehend TR/pMHC interaction. In doing so, we have successfully demonstrated a novel/rational approach for θ calculation, obtained a linear correlation between BE and θ without any “codon” or amino acid preference, provided an explanation for the ability of a TR to scan many pMHC ligands yet specifically bind one, proposed a mechanism for pMHC recognition by TR leading to T cell activation and illustrated the importance of the peptide in determining TR specificity, challenging the “germline bias” theory.
PMCID: PMC3043089  PMID: 21364947
24.  Towards BioDBcore: a community-defined information specification for biological databases 
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources, and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
PMCID: PMC3017395  PMID: 21205783
25.  Meeting Report from the Second “Minimum Information for Biological and Biomedical Investigations” (MIBBI) workshop 
Standards in Genomic Sciences  2010;3(3):259-266.
This report summarizes the proceedings of the second workshop of the ‘Minimum Information for Biological and Biomedical Investigations’ (MIBBI) consortium held on Dec 1-2, 2010 in Rüdesheim, Germany through the sponsorship of the Beilstein-Institute. MIBBI is an umbrella organization uniting communities developing Minimum Information (MI) checklists to standardize the description of data sets, the workflows by which they were generated and the scientific context for the work. This workshop brought together representatives of more than twenty communities to present the status of their MI checklists and plans for future development. Shared challenges and solutions were identified and the role of MIBBI in MI checklist development was discussed. The meeting featured some thirty presentations, wide-ranging discussions and breakout groups. The top outcomes of the two-day workshop as defined by the participants were: 1) the chance to share best practices and to identify areas of synergy; 2) defining a series of tasks for updating the MIBBI Portal; 3) reemphasizing the need to maintain independent MI checklists for various communities while leveraging common terms and workflow elements contained in multiple checklists; and 4) revision of the concept of the MIBBI Foundry to focus on the creation of a core set of MIBBI modules intended for reuse by individual MI checklist projects while maintaining the integrity of each MI project. Further information about MIBBI and its range of activities can be found at
PMCID: PMC3035314  PMID: 21304730
