1.  West Nile Virus T-Cell Ligand Sequences Shared with Other Flaviviruses: a Multitude of Variant Sequences as Potential Altered Peptide Ligands 
Journal of Virology  2012;86(14):7616-7624.
Phylogenetic relatedness and cocirculation of several major human pathogen flaviviruses are recognized as a possible cause of deleterious immune responses to mixed infection or immunization and call for a greater understanding of the inter-Flavivirus protein homologies. This study focused on the identification of human leukocyte antigen (HLA)-restricted West Nile virus (WNV) T-cell ligands and characterization of their distribution in reported sequence data of WNV and other flaviviruses. H-2-deficient mice transgenic for either A2, A24, B7, DR2, DR3, or DR4 HLA alleles were immunized with overlapping peptides of the WNV proteome, and peptide-specific T-cell activation was measured by gamma interferon (IFN-γ) enzyme-linked immunosorbent spot (ELISpot) assays. Approximately 30% (137) of the WNV proteome peptides were identified as HLA-restricted T-cell ligands. The majority of these ligands were conserved in ∼≥88% of analyzed WNV sequences. Notably, only 51 were WNV specific, and the remaining 86, chiefly of E, NS3, and NS5, shared an identity of nine or more consecutive amino acids with sequences of 64 other flaviviruses, including several major human pathogens. Many of the shared ligands had an incidence of >50% in the analyzed sequences of one or more of six major flaviviruses. The multitude of WNV sequences shared with other flaviviruses as interspecies variants highlights the possible hazard of defective T-cell activation by altered peptide ligands in the event of dual exposure to WNV and other flaviviruses, by either infection or immunization. The data suggest the possible preferred use of sequences that are pathogen specific with minimum interspecies sequence homology for the design of Flavivirus vaccines.
doi:10.1128/JVI.00166-12
PMCID: PMC3416302  PMID: 22573867
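The abstract's criterion of "an identity of nine or more consecutive amino acids" can be illustrated with a small k-mer comparison. This is a minimal sketch, not the study's pipeline: the function names `kmers` and `shares_kmer` and the toy sequences are assumptions made up for illustration.

```python
def kmers(seq, k=9):
    """Return the set of all length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_kmer(peptide, other_sequences, k=9):
    """True if the peptide has k consecutive residues in common with any other sequence."""
    peptide_kmers = kmers(peptide, k)
    return any(peptide_kmers & kmers(s, k) for s in other_sequences)


# Toy sequences, invented for illustration only (not real viral proteins).
wnv_peptide = "MKTAYIAKQRQISFVK"
related = "GGGAYIAKQRQISGGG"      # shares the stretch AYIAKQRQIS with the peptide
unrelated = "PLLLLLLLLLLLLLLP"    # shares nothing of length 9

shared_with_related = shares_kmer(wnv_peptide, [related, unrelated])
shared_with_unrelated = shares_kmer(wnv_peptide, [unrelated])
```

A ligand for which `shares_kmer` is false against every other flavivirus sequence would count as pathogen specific under this simplified reading.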
2.  Advances in translational bioinformatics and population genomics in the Asia-Pacific 
BMC Genomics  2012;13(Suppl 7):S1.
The theme of the 2012 International Conference on Bioinformatics (InCoB) in Bangkok, Thailand was "From Biological Data to Knowledge to Technological Breakthroughs." Besides providing a forum for life scientists and bioinformatics researchers in the Asia-Pacific region to meet and interact, the conference also hosted thematic sessions on the Pan-Asian Pacific Genome Initiative and immunoinformatics. Over the seven years of conference papers published in BMC Bioinformatics and four years in BMC Genomics, we note that there is increasing interest in the applications of -omics technologies to the understanding of diseases, as a forerunner to personalized genomic medicine.
doi:10.1186/1471-2164-13-S7-S1
PMCID: PMC3521394  PMID: 23282089
3.  InCoB2012 Conference: from biological data to knowledge to technological breakthroughs 
BMC Bioinformatics  2012;13(Suppl 17):S1.
Ten years ago when Asia-Pacific Bioinformatics Network held the first International Conference on Bioinformatics (InCoB) in Bangkok its theme was North-South Networking. At that time InCoB aimed to provide biologists and bioinformatics researchers in the Asia-Pacific region a forum to meet, interact with, and disseminate knowledge about the burgeoning field of bioinformatics. Meanwhile InCoB has evolved into a major regional bioinformatics conference that attracts not only talented and established scientists from the region but increasingly also from East Asia, North America and Europe. Since 2006 InCoB yielded 114 articles in BMC Bioinformatics supplement issues that have been cited nearly 1,000 times to date. In part, these developments reflect the success of bioinformatics education and continuous efforts to integrate and utilize bioinformatics in biotechnology and biosciences in the Asia-Pacific region. A cross-section of research leading from biological data to knowledge and to technological applications, the InCoB2012 theme, is introduced in this editorial. Other highlights included sessions organized by the Pan-Asian Pacific Genome Initiative and a Machine Learning in Immunology competition. InCoB2013 is scheduled for September 18-21, 2013 at Suzhou, China.
doi:10.1186/1471-2105-13-S17-S1
PMCID: PMC3521245  PMID: 23281929
4.  Towards big data science in the decade ahead from ten years of InCoB and the 1st ISCB-Asia Joint Conference 
BMC Bioinformatics  2011;12(Suppl 13):S1.
The 2011 International Conference on Bioinformatics (InCoB), the annual scientific conference of the Asia-Pacific Bioinformatics Network (APBioNet), is hosted in Kuala Lumpur, Malaysia, and co-organized with the first ISCB-Asia conference of the International Society for Computational Biology (ISCB). InCoB and the sequencing of the human genome are both celebrating their tenth anniversaries, and InCoB’s goalposts for the next decade, implementing standards in bioinformatics and globally distributed computational networks, will be discussed and adopted at this conference. Of the 49 manuscripts (selected from 104 submissions) accepted to BMC Genomics and BMC Bioinformatics conference supplements, 24 are featured in this issue, covering software tools, genome/proteome analysis, systems biology (networks, pathways, bioimaging) and drug discovery and design.
doi:10.1186/1471-2105-12-S13-S1
PMCID: PMC3278825  PMID: 22372736
5.  InCoB celebrates its tenth anniversary as first joint conference with ISCB-Asia 
BMC Genomics  2011;12(Suppl 3):S1.
In 2009 the International Society for Computational Biology (ISCB) started to roll out regional bioinformatics conferences in Africa, Latin America and Asia. The open and competitive bid for the first meeting in Asia (ISCB-Asia) was awarded to Asia-Pacific Bioinformatics Network (APBioNet) which has been running the International Conference on Bioinformatics (InCoB) in the Asia-Pacific region since 2002. InCoB/ISCB-Asia 2011 is held from November 30 to December 2, 2011 in Kuala Lumpur, Malaysia. Of 104 manuscripts submitted to BMC Genomics and BMC Bioinformatics conference supplements, 49 (47.1%) were accepted. The strong showing of Asia among submissions (82.7%) and acceptances (81.6%) signals the success of this tenth InCoB anniversary meeting, and bodes well for the future of ISCB-Asia.
doi:10.1186/1471-2164-12-S3-S1
PMCID: PMC3333168  PMID: 22369160
6.  Advancing standards for bioinformatics activities: persistence, reproducibility, disambiguation and Minimum Information About a Bioinformatics investigation (MIABi) 
BMC Genomics  2010;11(Suppl 4):S27.
The 2010 International Conference on Bioinformatics, InCoB2010, which is the annual conference of the Asia-Pacific Bioinformatics Network (APBioNet) has agreed to publish conference papers in compliance with the proposed Minimum Information about a Bioinformatics investigation (MIABi), proposed in June 2009. Authors of the conference supplements in BMC Bioinformatics, BMC Genomics and Immunome Research have consented to cooperate in this process, which will include the procedures described herein, where appropriate, to ensure data and software persistence and perpetuity, database and resource re-instantiability and reproducibility of results, author and contributor identity disambiguation and MIABi-compliance. Wherever possible, datasets and databases will be submitted to depositories with standardized terminologies. As standards are evolving, this process is intended as a prelude to the 100 BioDatabases (BioDB100) initiative whereby APBioNet collaborators will contribute exemplar databases to demonstrate the feasibility of standards-compliance and participate in refining the process for peer-review of such publications and validation of scientific claims and standards compliance. This testbed represents another step in advancing standards-based processes in the bioinformatics community which is essential to the growing interoperability of biological data, information, knowledge and computational resources.
doi:10.1186/1471-2164-11-S4-S27
PMCID: PMC3005918  PMID: 21143811
7.  Challenges of the next decade for the Asia Pacific region: 2010 International Conference in Bioinformatics (InCoB 2010) 
BMC Genomics  2010;11(Suppl 4):S1.
The 2010 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia’s oldest bioinformatics organisation formed in 1998, was organized as the 9th International Conference on Bioinformatics (InCoB), Sept. 26-28, 2010 in Tokyo, Japan. Initially, APBioNet created InCoB as a forum to foster bioinformatics in the Asia Pacific region. Given the growing importance of interdisciplinary research, InCoB2010 included topics targeting scientists in the fields of genomic medicine, immunology and chemoinformatics, supporting translational research. Peer-reviewed manuscripts that were accepted for publication in this supplement represent key areas of research interest that have emerged in our region. We also highlight some of the current challenges bioinformatics is facing in the Asia Pacific region and conclude our report with the announcement of APBioNet’s 100 BioDatabases (BioDB100) initiative. BioDB100 will comply with the database criteria set out earlier in our proposal for Minimum Information about a Bioinformatics Investigation (MIABi), setting the standards for biocuration and bioinformatics research, on which we will report at the next InCoB, Nov. 27 – Dec. 2, 2011 at Kuala Lumpur, Malaysia.
doi:10.1186/1471-2164-11-S4-S1
PMCID: PMC3005919  PMID: 21143792
8.  InCoB2010 - 9th International Conference on Bioinformatics at Tokyo, Japan, September 26-28, 2010 
BMC Bioinformatics  2010;11(Suppl 7):S1.
The International Conference on Bioinformatics (InCoB), the annual conference of the Asia-Pacific Bioinformatics Network (APBioNet), is hosted in one of the countries of the Asia-Pacific region. The 2010 conference was awarded to Japan and has attracted more than one hundred high-quality research paper submissions. Thorough peer reviewing resulted in 47 (43.5%) accepted papers out of 108 submissions. Submissions from Japan, R.O. Korea, P.R. China, Australia, Singapore and U.S.A totaled 43.8% and contributed to 57.4% of accepted papers. Manuscripts originating from Taiwan and India added up to 42.8% of submissions and 28.3% of acceptances. The fifteen articles published in this BMC Bioinformatics supplement cover disease informatics, structural bioinformatics and drug design, biological databases and software tools, signaling pathways, gene regulatory and biochemical networks, evolution and sequence analysis.
doi:10.1186/1471-2105-11-S7-S1
PMCID: PMC2957677  PMID: 21106116
9.  Classification of Dengue Fever Patients Based on Gene Expression Data Using Support Vector Machines 
PLoS ONE  2010;5(6):e11267.
Background
Symptomatic infection by dengue virus (DENV) can range from dengue fever (DF) to dengue haemorrhagic fever (DHF); however, the determinants of DF or DHF progression are not completely understood. It is hypothesised that host innate immune response factors are involved in modulating the disease outcome and that the expression levels of genes involved in this response could be used as early prognostic markers for disease severity.
Methodology/Principal Findings
mRNA expression levels of genes involved in DENV innate immune responses were measured using quantitative real time PCR (qPCR). Here, we present a novel application of the support vector machines (SVM) algorithm to analyze the expression pattern of 12 genes in peripheral blood mononuclear cells (PBMCs) of 28 dengue patients (13 DHF and 15 DF) during acute viral infection. The SVM model was trained using gene expression data of these genes and achieved the highest accuracy of ∼85% with leave-one-out cross-validation. Through selective removal of gene expression data from the SVM model, we have identified seven genes (MYD88, TLR7, TLR3, MDA5, IRF3, IFN-α and CLEC5A) that may be central in differentiating DF patients from DHF, with MYD88 and TLR7 observed to be the most important. Though the individual removal of expression data of five other genes had no impact on the overall accuracy, a significant combined role was observed when the SVM model of the two main genes (MYD88 and TLR7) was re-trained to include the five genes, increasing the overall accuracy to ∼96%.
Conclusions/Significance
Here, we present a novel use of the SVM algorithm to classify DF and DHF patients, as well as to elucidate the significance of the various genes involved. It was observed that seven genes are critical in classifying DF and DHF patients: TLR3, MDA5, IRF3, IFN-α, CLEC5A, and the two most important MYD88 and TLR7. While these preliminary results are promising, further experimental investigation is necessary to validate their specific roles in dengue disease.
doi:10.1371/journal.pone.0011267
PMCID: PMC2890409  PMID: 20585645
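The evaluation scheme in this abstract, leave-one-out cross-validation combined with selective removal of individual genes, can be sketched without any machine-learning library. Note the substitution: the study used a support vector machine, whereas the sketch below swaps in a simple nearest-centroid classifier as a stand-in so that it stays self-contained. All function names and the expression values are invented for illustration and are not study data.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (not an SVM): assign x to the class with the closest mean vector."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(xs, ys, predict=nearest_centroid_predict):
    """Leave-one-out: hold out each sample once, train on the rest, return the hit rate."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)


def drop_feature(xs, j):
    """Remove feature column j, mimicking removal of one gene's expression data."""
    return [row[:j] + row[j + 1:] for row in xs]


# Invented expression values for two genes in six patients (illustration only).
xs = [[1.0, 0.2], [1.2, 0.9], [0.9, 0.5], [3.0, 0.4], [3.2, 0.8], [2.9, 0.1]]
ys = ["DF", "DF", "DF", "DHF", "DHF", "DHF"]

full_accuracy = loocv_accuracy(xs, ys)
accuracy_without_gene0 = loocv_accuracy(drop_feature(xs, 0), ys)
```

A drop in accuracy after removing a gene's column marks that gene as informative for the classifier, which is the logic behind the ablation described above. An SVM could be plugged in through the `predict` argument.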
10.  Computational Epigenetics: the new scientific paradigm 
Bioinformation  2010;4(7):331-337.
Epigenetics has recently emerged as a critical field for studying how non-gene factors can influence the traits and functions of an organism. At the core of this new wave of research is the use of computational tools that play critical roles not only in directing the selection of key experiments, but also in formulating new testable hypotheses through detailed analysis of complex genomic information that is not achievable using traditional approaches alone. Epigenomics, which combines traditional genomics with computer science, mathematics, chemistry, biochemistry and proteomics for the large-scale analysis of heritable changes in phenotype, gene function or gene expression that are not dependent on gene sequence, offers new opportunities to further our understanding of transcriptional regulation, nuclear organization, development and disease. This article examines existing computational strategies for the study of epigenetic factors. The most important databases and bioinformatic tools in this rapidly growing field have been reviewed.
PMCID: PMC2957762  PMID: 20978607
epigenetic informatics; epigenetics; epigenomics; bioinformatics
11.  A multi-factor model for caspase degradome prediction 
BMC Genomics  2009;10(Suppl 3):S6.
Background
Caspases belong to a class of cysteine proteases which function as critical effectors in cellular processes such as apoptosis and inflammation by cleaving substrates immediately after unique tetrapeptide sites. With hundreds of reported substrates and many more expected to be discovered, the elucidation of the caspase degradome will be an important milestone in the study of these proteases in human health and disease. Several computational methods for predicting caspase cleavage sites have been developed recently for identifying potential substrates. However, as most of these methods are based primarily on the detection of the tetrapeptide cleavage sites - a factor necessary but not sufficient for predicting in vivo substrate cleavage - prediction outcomes will inevitably include many false positives.
Results
In this paper, we show that structural factors such as the presence of disorder and solvent exposure in the vicinity of the cleavage site are important and can be used to enhance results from cleavage site prediction. We constructed a two-step model incorporating cleavage site prediction and these factors to predict caspase substrates. Sequences are first predicted for cleavage sites using CASVM or GraBCas. Predicted cleavage sites are then scored, ranked and filtered against a cut-off based on their propensities for locating in disordered and solvent exposed regions. Using an independent dataset of caspase substrates, the model was shown to achieve greater positive predictive values compared to CASVM or GraBCas alone, and was able to reduce the false positives pool by up to 13% and 53% respectively while retaining all true positives. We applied our prediction model on the family of receptor tyrosine kinases (RTKs) and highlighted several members as potential caspase targets. The results suggest that RTKs may be generally regulated by caspase cleavage and in some cases, promote the induction of apoptotic cell death - a function distinct from their role as transducers of survival and growth signals.
Conclusion
As a step towards the prediction of in vivo caspase substrates, we have developed an accurate method incorporating cleavage site prediction and structural factors. The multi-factor model augments existing methods and complements experimental efforts to define the caspase degradome on the systems-wide basis.
doi:10.1186/1471-2164-10-S3-S6
PMCID: PMC2788393  PMID: 19958504
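The second step of the two-step model (score, rank, and filter predicted cleavage sites by their propensity for disordered and solvent-exposed regions) might look roughly like the sketch below. It assumes the first step (CASVM or GraBCas) has already produced candidate sites. The `Site` fields, the equal-weight average, and the 0.5 cut-off are all hypothetical choices for illustration, since the abstract does not give the actual scoring function.

```python
from dataclasses import dataclass


@dataclass
class Site:
    position: int     # residue index of a predicted cleavage site (step 1 output)
    disorder: float   # hypothetical disorder propensity near the site, 0..1
    exposure: float   # hypothetical solvent-exposure propensity near the site, 0..1


def filter_sites(sites, cutoff=0.5):
    """Score each site, rank by score (highest first), keep those at or above the cut-off."""
    scored = [((s.disorder + s.exposure) / 2, s) for s in sites]  # assumed equal weighting
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored if score >= cutoff]


# Invented candidate sites for illustration.
candidates = [Site(10, 0.9, 0.8), Site(50, 0.1, 0.2), Site(120, 0.6, 0.7)]
kept_positions = [s.position for s in filter_sites(candidates)]
```

Sites buried in ordered regions score low and are dropped, which is how a structural filter of this kind reduces false positives from sequence-only prediction.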
12.  A proposed minimum skill set for university graduates to meet the informatics needs and challenges of the "-omics" era 
BMC Genomics  2009;10(Suppl 3):S36.
Background
The development of high-throughput experimental technologies has given rise to the "-omics" era, where terabyte-scale datasets for systems-level measurements of various cellular and molecular phenomena pose considerable challenges in data processing and extraction of biological meaning. Moreover, it has created an unmet need for the effective integration of these datasets to achieve insights into biological systems. While it has increased the demand for bioinformatics experts who can interface with biologists, it has also raised the requirement for biologists to possess a basic capability in bioinformatics and to communicate seamlessly with these experts. This may be achieved by embedding in their undergraduate and graduate life science education basic training in bioinformatics geared towards acquiring a minimum skill set in computation and informatics.
Results
Based on previous attempts to define curricula suitable for addressing the bioinformatics capability gap, an initiative was taken during the Workshops on Education in Bioinformatics and Computational Biology (WEBCB) in 2008 and 2009 to identify a minimum skill set for the training of future bioinformaticians and molecular biologists with informatics capabilities. The minimum skill set proposed is cross-disciplinary in nature, involving a combination of knowledge and proficiency from the fields of biology, computer science, mathematics and statistics, and can be tailored to the needs of the "-omics".
Conclusion
The proposed bioinformatics minimum skill set serves as a guideline for biology curriculum design and development in universities at both the undergraduate and graduate levels.
doi:10.1186/1471-2164-10-S3-S36
PMCID: PMC2788390  PMID: 19958501
13.  Extending Asia Pacific bioinformatics into new realms in the "-omics" era 
BMC Genomics  2009;10(Suppl 3):S1.
The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation dating back to 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 7-11, 2009 at Biopolis, Singapore. Besides bringing together scientists from the field of bioinformatics in this region, InCoB has actively engaged clinicians and researchers from the area of systems biology, to facilitate greater synergy between these two groups. InCoB2009 followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India), Hong Kong and Taipei (Taiwan), with InCoB2010 scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. The Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and symposia on Clinical Bioinformatics (CBAS), the Singapore Symposium on Computational Biology (SYMBIO) and training tutorials were scheduled prior to the scientific meeting, and provided ample opportunity for in-depth learning and special interest meetings for educators, clinicians and students. We provide a brief overview of the peer-reviewed bioinformatics manuscripts accepted for publication in this supplement, grouped into thematic areas. In order to facilitate scientific reproducibility and accountability, we have, for the first time, introduced minimum information criteria for our publications, including compliance to a Minimum Information about a Bioinformatics Investigation (MIABi). As the regional research expertise in bioinformatics matures, we have delineated a minimum set of bioinformatics skills required for addressing the computational challenges of the "-omics" era.
doi:10.1186/1471-2164-10-S3-S1
PMCID: PMC2788361  PMID: 19958472
14.  A comprehensive assessment of N-terminal signal peptides prediction methods 
BMC Bioinformatics  2009;10(Suppl 15):S2.
Background
Amino-terminal signal peptides (SPs) are short regions that guide the targeting of secretory proteins to the correct subcellular compartments in the cell. They are cleaved off upon the passenger protein reaching its destination. The explosive growth in sequencing technologies has led to the deposition of vast numbers of protein sequences necessitating rapid functional annotation techniques, with subcellular localization being a key feature. Of the myriad software prediction tools developed to automate the task of assigning the SP cleavage site of these new sequences, we review here the performance and reliability of commonly used SP prediction tools.
Results
The available signal peptide data has been manually curated and organized into three datasets representing eukaryotes, Gram-positive and Gram-negative bacteria. These datasets are used to evaluate thirteen prediction tools that are publicly available. SignalP (both the HMM and ANN versions) maintains consistency and achieves the best overall accuracy in all three benchmarking experiments, ranging from 0.872 to 0.914 although other prediction tools are narrowing the performance gap.
Conclusion
The majority of the tools evaluated in this study encounter no difficulty in discriminating between secretory and non-secretory proteins. The challenge clearly remains with pinpointing the correct SP cleavage site. The composite scoring schemes employed by SignalP may help to explain its accuracy. The prediction task is divided into a number of separate steps, thus allowing each score to tackle a particular aspect of the prediction.
doi:10.1186/1471-2105-10-S15-S2
PMCID: PMC2788353  PMID: 19958512
15.  Conservation and Variability of West Nile Virus Proteins 
PLoS ONE  2009;4(4):e5352.
West Nile virus (WNV) has emerged globally as an increasingly important pathogen for humans and domestic animals. Studies of the evolutionary diversity of the virus over its known history will help to elucidate conserved sites, and characterize their correspondence to other pathogens and their relevance to the immune system. We describe a large-scale analysis of the entire WNV proteome, aimed at identifying and characterizing evolutionarily conserved amino acid sequences. This study, which used 2,746 WNV protein sequences collected from the NCBI GenPept database, focused on analysis of peptides of length 9 amino acids or more, which are immunologically relevant as potential T-cell epitopes. Entropy-based analysis of the diversity of WNV sequences revealed the presence of numerous evolutionarily stable nonamer positions across the proteome (entropy value of ≤1). The representation (frequency) of nonamers variant to the predominant peptide at these stable positions was, generally, low (≤10% of the WNV sequences analyzed). Eighty-eight fragments of length 9–29 amino acids, representing ∼34% of the WNV polyprotein length, were identified to be identical and evolutionarily stable in all analyzed WNV sequences. Of the 88 completely conserved sequences, 67 are also present in other flaviviruses, and several have been associated with the functional and structural properties of viral proteins. Immunoinformatic analysis revealed that the majority (78/88) of conserved sequences are potentially immunogenic, while 44 contained experimentally confirmed human T-cell epitopes. This study identified a comprehensive catalogue of completely conserved WNV sequences, many of which are shared by other flaviviruses, and the majority of which are potential epitopes. The complete conservation of these immunologically relevant sequences through the entire recorded WNV history suggests they will be valuable as components of peptide-specific vaccines or other therapeutic applications, for sequence-specific diagnosis of a wide range of Flavivirus infections, and for studies of homologous sequences among other flaviviruses.
doi:10.1371/journal.pone.0005352
PMCID: PMC2670515  PMID: 19401763
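The entropy-based measure of nonamer diversity mentioned in this abstract can be illustrated with Shannon entropy over the 9-mers found at one alignment position. This is a sketch under stated assumptions: entropy is taken in bits (log base 2), the sequences are assumed to be pre-aligned and gap-free, and the function name and toy alignment are invented for illustration.

```python
import math
from collections import Counter


def nonamer_entropy(aligned_seqs, start, k=9):
    """Shannon entropy (bits) of the k-mers starting at `start` across aligned sequences."""
    counts = Counter(seq[start:start + k] for seq in aligned_seqs)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Toy alignment, invented for illustration: only the last residue varies.
alignment = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKM", "ACDEFGHIKL"]

entropy_pos0 = nonamer_entropy(alignment, 0)  # all four nonamers identical
entropy_pos1 = nonamer_entropy(alignment, 1)  # three copies of one nonamer, one variant
```

A position where every sequence carries the same nonamer has entropy 0, and entropy grows as variant nonamers become more numerous and more evenly represented.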
16.  Emerging strengths in Asia Pacific bioinformatics 
BMC Bioinformatics  2008;9(Suppl 12):S1.
The 2008 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998, was organized as the 7th International Conference on Bioinformatics (InCoB), jointly with the Bioinformatics and Systems Biology in Taiwan (BIT 2008) Conference, Oct. 20–23, 2008 at Taipei, Taiwan. Besides bringing together scientists from the field of bioinformatics in this region, InCoB is actively involving researchers from the area of systems biology, to facilitate greater synergy between these two groups. Marking the 10th Anniversary of APBioNet, this InCoB 2008 meeting followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India) and Hong Kong. Additionally, tutorials and the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) immediately prior to the 20th Federation of Asian and Oceanian Biochemists and Molecular Biologists (FAOBMB) Taipei Conference provided ample opportunity for inducting mainstream biochemists and molecular biologists from the region into a greater level of awareness of the importance of bioinformatics in their craft. In this editorial, we provide a brief overview of the peer-reviewed manuscripts accepted for publication herein, grouped into thematic areas. As the regional research expertise in bioinformatics matures, the papers fall into thematic areas, illustrating the specific contributions made by APBioNet to global bioinformatics efforts.
doi:10.1186/1471-2105-9-S12-S1
PMCID: PMC2638166  PMID: 19091008
17.  Bioinformatics research in the Asia Pacific: a 2007 update 
BMC Bioinformatics  2008;9(Suppl 1):S1.
We provide a 2007 update on the bioinformatics research in the Asia-Pacific from the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998. From 2002, APBioNet has organized the first International Conference on Bioinformatics (InCoB) bringing together scientists working in the field of bioinformatics in the region. This year, the InCoB2007 Conference was organized as the 6th annual conference of the Asia-Pacific Bioinformatics Network, on Aug. 27–30, 2007 at Hong Kong, following a series of successful events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea) and New Delhi (India). Besides a scientific meeting at Hong Kong, satellite events organized are a pre-conference training workshop at Hanoi, Vietnam and a post-conference workshop at Nansha, China. This Introduction provides a brief overview of the peer-reviewed manuscripts accepted for publication in this Supplement. We have organized the papers into thematic areas, highlighting the growing contribution of research excellence from this region, to global bioinformatics endeavours.
doi:10.1186/1471-2105-9-S1-S1
PMCID: PMC2259402  PMID: 18315840
18.  Rule-based knowledge aggregation for large-scale protein sequence analysis of influenza A viruses 
BMC Bioinformatics  2008;9(Suppl 1):S7.
Background
The explosive growth of biological data provides opportunities for new statistical and comparative analyses of large information sets, such as alignments comprising tens of thousands of sequences. In such studies, sequence annotations frequently play an essential role, and reliable results depend on metadata quality. However, the semantic heterogeneity and annotation inconsistencies in biological databases greatly increase the complexity of aggregating and cleaning metadata. Manual curation of datasets, traditionally favoured by life scientists, is impractical for studies involving thousands of records. In this study, we investigate quality issues that affect major public databases, and quantify the effectiveness of an automated metadata extraction approach that combines structural and semantic rules. We applied this approach to more than 90,000 influenza A records, to annotate sequences with protein name, virus subtype, isolate, host, geographic origin, and year of isolation.
Results
Over 40,000 annotated Influenza A protein sequences were collected by combining information from more than 90,000 documents from NCBI public databases. Metadata values were automatically extracted, aggregated and reconciled from several document fields by applying user-defined structural rules. For each property, values were recovered from ≥88.8% of records, with accuracy exceeding 96% in most cases. Because of semantic heterogeneity, each property required up to six different structural rules to be combined. Significant quality differences between databases were found: GenBank documents yield values more reliably than documents extracted from GenPept. Using a simple set of semantic rules and a reasoner, we reconstructed relationships between sequences from the same isolate, thus identifying 7640 isolates. Validation of isolate metadata against a simple ontology highlighted more than 400 inconsistencies, leading to over 3,000 property value corrections.
Conclusion
To overcome the quality issues inherent in public databases, automated knowledge aggregation with embedded intelligence is needed for large-scale analyses. Our results show that user-controlled intuitive approaches, based on combination of simple rules, can reliably automate various curation tasks, reducing the need for manual corrections to approximately 5% of the records. Emerging semantic technologies possess desirable features to support today's knowledge aggregation tasks, with a potential to bring immediate benefits to this field.
doi:10.1186/1471-2105-9-S1-S7
PMCID: PMC2259408  PMID: 18315860
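The idea of combining several structural rules per property, each looking at a different document field, can be sketched as an ordered list of regular expressions tried until one matches. The field names (`subtype`, `organism`, `collection_date`), the patterns, and the sample record below are hypothetical and do not reproduce the rules or record schema used in the study.

```python
import re

# Hypothetical rules: for each property, (record field, regex with one capture group),
# tried in order until one matches.
RULES = {
    "subtype": [
        ("subtype", r"^(H\d+N\d+)$"),
        ("organism", r"\((H\d+N\d+)\)"),
    ],
    "year": [
        ("collection_date", r"(\d{4})"),
        ("organism", r"/(\d{4})\s*\("),
    ],
}


def extract(record, rules=RULES):
    """Return the first value recovered for each property, or None if no rule matches."""
    result = {}
    for prop, prop_rules in rules.items():
        result[prop] = None
        for field, pattern in prop_rules:
            match = re.search(pattern, record.get(field, ""))
            if match:
                result[prop] = match.group(1)
                break
    return result


# Invented record: the dedicated fields are missing, so the fallback rules
# recover both values from the organism string.
record = {"organism": "Influenza A virus (A/Hong Kong/1/1968(H3N2))"}
metadata = extract(record)
```

Ordering the rules from most to least reliable field lets a fallback fill in values that the preferred field lacks, which is one way to cope with the annotation inconsistencies the abstract describes.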
19.  Identification of human-to-human transmissibility factors in PB2 proteins of influenza A by large-scale mutual information analysis 
BMC Bioinformatics  2008;9(Suppl 1):S18.
Background
The identification of mutations that confer unique properties to a pathogen, such as host range, is of fundamental importance in the fight against disease. This paper describes a novel method for identifying amino acid sites that distinguish specific sets of protein sequences, by comparative analysis of matched alignments. The use of mutual information to identify distinctive residues responsible for functional variants makes this approach highly suitable for analyzing large sets of sequences. To support mutual information analysis, we developed the AVANA software, which utilizes sequence annotations to select sets for comparison, according to user-specified criteria. The method presented was applied to an analysis of influenza A PB2 protein sequences, with the objective of identifying the components of adaptation to human-to-human transmission, and reconstructing the mutation history of these components.
Results
We compared over 3,000 PB2 protein sequences of human-transmissible and avian isolates, to produce a catalogue of sites involved in adaptation to human-to-human transmission. This analysis identified 17 characteristic sites, five of which have been present in human-transmissible strains since the 1918 Spanish flu pandemic. Sixteen of these sites are located in functional domains, suggesting they may play functional roles in host-range specificity. The catalogue of characteristic sites was used to derive sequence signatures from historical isolates. These signatures, arranged in chronological order, reveal an evolutionary timeline for the adaptation of the PB2 protein to human hosts.
Conclusion
By providing the most complete elucidation to date of the functional components participating in PB2 protein adaptation to humans, this study demonstrates that mutual information is a powerful tool for comparative characterization of sequence sets. In addition to confirming previously reported findings, several novel characteristic sites within PB2 are reported. Sequence signatures generated from the catalogue of characteristic sites concisely describe the adaptation characteristics of individual isolates. Evolutionary timelines derived from signatures of early human influenza isolates suggest that characteristic variants emerged rapidly and remained remarkably stable through subsequent pandemics. In addition, the signatures of human-infecting H5N1 isolates suggest that this avian subtype has low pandemic potential at present, although it presents more human adaptation components than most avian subtypes.
doi:10.1186/1471-2105-9-S1-S18
PMCID: PMC2259419  PMID: 18315849
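The mutual information comparison described in this abstract can be illustrated with a minimal sketch. It is not the AVANA implementation: the function names, the 1-based site numbering, and the `threshold` value are assumptions made for illustration. It scores each alignment column by the mutual information (in bits) between the residue at that column and the group label of the sequence.

```python
import math
from collections import Counter

def mutual_information(residues, labels):
    """Mutual information (bits) between the residues at one alignment
    column and the group labels of the sequences they come from."""
    n = len(residues)
    joint = Counter(zip(residues, labels))
    res_counts = Counter(residues)
    lab_counts = Counter(labels)
    mi = 0.0
    for (res, lab), count in joint.items():
        p_joint = count / n
        # p(res, lab) / (p(res) * p(lab)), written with raw counts
        ratio = count * n / (res_counts[res] * lab_counts[lab])
        mi += p_joint * math.log2(ratio)
    return mi

def characteristic_sites(group_a, group_b, threshold=0.4):
    """Return (1-based position, MI) for columns that separate two sets
    of aligned, equal-length sequences. The threshold is hypothetical."""
    sequences = list(group_a) + list(group_b)
    labels = ["A"] * len(group_a) + ["B"] * len(group_b)
    sites = []
    for pos in range(len(sequences[0])):
        column = [seq[pos] for seq in sequences]
        mi = mutual_information(column, labels)
        if mi >= threshold:
            sites.append((pos + 1, mi))
    return sites

# Toy alignment (invented data): only column 2 differs between groups.
human_like = ["MKT", "MKS"]
avian_like = ["MET", "MES"]
sites = characteristic_sites(human_like, avian_like)
```

A column that is identical in both groups, or that varies independently of the group label, scores zero; a column that splits two equal-sized groups perfectly scores one bit.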
20.  A systematic bioinformatics approach for selection of epitope-based vaccine targets 
Cellular immunology  2007;244(2):141-147.
Epitope-based vaccines provide a new strategy for prophylactic and therapeutic application of pathogen-specific immunity. A critical requirement of this strategy is the identification and selection of T-cell epitopes that act as vaccine targets. This study describes current methodologies for the selection process, with dengue virus as a model system. A combination of publicly available bioinformatics algorithms and computational tools is used to screen and select antigen sequences as potential T-cell epitopes of supertype HLA alleles. The selected sequences are tested for biological function by their activation of T-cells of HLA transgenic mice and of pathogen-infected subjects. This approach provides an experimental basis for the design of pathogen-specific, T-cell epitope-based vaccines that are targeted to the majority of the genetic variants of the pathogen and are effective across a broad range of human leukocyte antigen differences in the global human population.
doi:10.1016/j.cellimm.2007.02.005
PMCID: PMC2041846  PMID: 17434154
T-cell epitopes; epitope-based vaccines; bioinformatics; pathogens; immune system; entropy; conserved sequences; immunological hotspots; altered-ligand effect; supertypes
21.  In silico characterization of immunogenic epitopes presented by HLA-Cw*0401 
Immunome Research  2007;3:7.
Background
HLA-C locus products are poorly understood in part due to their low expression at the cell surface. Recent data indicate that these molecules serve as major restriction elements for human immunodeficiency virus type 1 (HIV-1) cytotoxic T lymphocyte (CTL) epitopes. We report here a structure-based technique for the prediction of peptides binding to Cw*0401. The models were rigorously trained, tested and validated using experimentally verified Cw*0401 binding and non-binding peptides obtained from biochemical studies. A new scoring scheme facilitates the identification of immunological hot spots within antigens, based on the sum of predicted binding energies of the top four binders within a window of 30 amino acids.
Results
High predictivity is achieved when tested on the training (r² = 0.88, s = 3.56 kJ/mol, q² = 0.84, s_press = 5.18 kJ/mol) and test (A_ROC = 0.93) datasets. Characterization of the predicted Cw*0401 binding sequences indicates that amino acids at key anchor positions share common physico-chemical properties, which correlate well with existing experimental studies.
Conclusion
The analysis of predicted Cw*0401-binding peptides showed that anchor residues may not be restrictive and that the Cw*0401 binding pockets may accommodate a wide variety of peptides with common physico-chemical properties. The potential Cw*0401-specific T-cell epitope repertoires for HIV-1 p24gag and gp160gag glycoproteins are well distributed throughout both glycoproteins, with thirteen and nine immunological hot spots, respectively. These findings provide new insights into HLA-C peptide selectivity, indicating that pre-selection of candidate HLA-C peptides may occur at the TAP level, prior to peptide loading in the endoplasmic reticulum.
doi:10.1186/1745-7580-3-7
PMCID: PMC2040137  PMID: 17705876
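The hot-spot scoring scheme in this abstract (the sum of predicted binding energies of the top four binders within a 30-residue window) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes 9-residue peptides, that a peptide counts only if it lies entirely inside the window, and that a lower (more negative) energy means stronger binding. The function name and the input layout are hypothetical.

```python
def hotspot_scores(energies, peptide_len=9, window=30, top_n=4):
    """Score every window of `window` residues along an antigen.

    energies[i] is the predicted binding energy of the peptide that
    starts at residue i (assumed: lower energy = stronger binder).
    Returns a list of (window_start, summed_energy_of_top_binders).
    """
    # Number of peptides that fit entirely inside one window.
    per_window = window - peptide_len + 1
    scores = []
    for start in range(len(energies) - per_window + 1):
        in_window = energies[start:start + per_window]
        strongest = sorted(in_window)[:top_n]  # most negative first
        scores.append((start, sum(strongest)))
    return scores

# Invented energies for a 30-residue stretch (22 overlapping 9-mers).
example = [0.0] * 22
example[3], example[7], example[10], example[20], example[5] = -5.0, -4.0, -3.0, -2.0, -1.0
scores = hotspot_scores(example)
```

Windows with the lowest summed energy would be the candidate hot spots under these assumptions; an antigen shorter than one window yields an empty list.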
22.  Establishing bioinformatics research in the Asia Pacific 
BMC Bioinformatics  2006;7(Suppl 5):S1.
In 1998, the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation, was set up to champion the advancement of bioinformatics in the Asia Pacific. By 2002, APBioNet had gained sufficient critical mass to initiate the first International Conference on Bioinformatics (InCoB), bringing together scientists working in the field of bioinformatics in the region. This year, the InCoB2006 Conference was organized as the 5th annual conference of the Asia-Pacific Bioinformatics Network, on Dec. 18–20, 2006 in New Delhi, India, following a series of successful events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand) and Busan (South Korea). This Introduction provides a brief overview of the peer-reviewed manuscripts accepted for publication in this Supplement. It offers a snapshot of the growing research excellence in bioinformatics in the region as we embark on a trajectory of establishing a solid bioinformatics research culture in the Asia Pacific that is able to contribute fully to the global bioinformatics community.
doi:10.1186/1471-2105-7-S5-S1
PMCID: PMC1764485
23.  SVM-based prediction of caspase substrate cleavage sites 
BMC Bioinformatics  2006;7(Suppl 5):S14.
Background
Caspases belong to a class of cysteine proteases which function as critical effectors in apoptosis and inflammation by cleaving substrates immediately after unique sites. Prediction of such cleavage sites will complement structural and functional studies on substrate cleavage as well as the discovery of new substrates. Recently, different computational methods have been developed to predict the cleavage sites of caspase substrates with varying degrees of success. As the support vector machine (SVM) algorithm has been shown to be useful in several biological classification problems, we have implemented an SVM-based method to investigate its applicability to this domain.
Results
A set of unique caspase substrate cleavage sites was obtained from the literature and used to evaluate the SVM method. Three datasets were tested: (i) the tetrapeptide cleavage sites; (ii) the tetrapeptide cleavage sites augmented by the two adjacent residues, P1' and P2'; and (iii) the tetrapeptide cleavage sites with ten additional upstream and downstream flanking sequences (where available). The SVM method achieved accuracies ranging from 81.25% to 97.92% on independent test sets and successfully predicted the cleavage of a novel caspase substrate and its mutants.
Conclusion
This study presents an SVM approach for predicting caspase substrate cleavage sites based on the cleavage sites and the downstream and upstream flanking sequences. The method shows an improvement over existing methods and may be useful for predicting hitherto undiscovered cleavage sites.
doi:10.1186/1471-2105-7-S5-S14
PMCID: PMC1764470  PMID: 17254298
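The three dataset types in this abstract differ only in how much sequence around the cleavage site is kept. The sketch below shows one way such windows might be extracted before any SVM training. It is an assumption-laden illustration: the function name and dictionary keys are hypothetical, the flank is read as ten residues on each side of the tetrapeptide, and windows are truncated at the sequence ends to reflect "where available".

```python
def cleavage_windows(seq, p1, flank=10):
    """Return three sequence windows around a cleavage site.

    p1 is the 0-based index of the P1 residue; cleavage is taken to
    occur immediately after it. The tetrapeptide is P4 to P1.
    """
    if p1 < 3 or p1 >= len(seq):
        raise ValueError("P4-P1 tetrapeptide does not fit in the sequence")
    start, end = p1 - 3, p1 + 1  # slice bounds for P4..P1
    return {
        # (i) tetrapeptide only
        "tetrapeptide": seq[start:end],
        # (ii) tetrapeptide plus P1' and P2'
        "with_p1p2_prime": seq[start:end + 2],
        # (iii) tetrapeptide plus up to `flank` residues on each side
        "with_flanks": seq[max(0, start - flank):end + flank],
    }

# Invented toy sequence containing a DEVD motif; P1 is the second D.
toy = "MKTAGDEVDGSWLQ"
windows = cleavage_windows(toy, p1=8)
```

Each window would still need to be encoded numerically (for example, one-hot per residue) before being passed to a classifier; that step is outside this sketch.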
24.  Prediction of desmoglein-3 peptides reveals multiple shared T-cell epitopes in HLA DR4- and DR6- associated Pemphigus vulgaris 
BMC Bioinformatics  2006;7(Suppl 5):S7.
Background
Pemphigus vulgaris (PV) is a severe autoimmune blistering skin disorder that is strongly associated with major histocompatibility complex class II alleles DRB1*0402 and DQB1*0503. The target antigen of PV, desmoglein 3 (Dsg3), is crucial for initiating T-cell response in early disease. Although a number of T-cell specificities within Dsg3 have been reported, the number is limited and the role of T-cells in the pathogenesis of PV remains poorly understood. We report here a structure-based model for the prediction of peptide binding to DRB1*0402 and DQB1*0503. The scoring functions were rigorously trained, tested and validated using experimentally verified peptide sequences.
Results
High predictivity is obtained for both DRB1*0402 (r² = 0.90, s = 1.20 kJ/mol, q² = 0.82, s_press = 1.61 kJ/mol) and DQB1*0503 (r² = 0.95, s = 1.20 kJ/mol, q² = 0.75, s_press = 2.15 kJ/mol) models, compared to experimental data. We investigated the binding patterns of Dsg3 peptides and illustrate the existence of multiple immunodominant epitopes that may be responsible for both disease initiation and propagation in PV. Further analysis reveals that DRB1*0402 and DQB1*0503 may share similar specificities by binding peptides at different binding registers, thus providing a molecular mechanism for the dual HLA association observed in PV.
Conclusion
Collectively, the results of this study provide new insights into the pathology of PV. This is the first report illustrating a high level of cross-reactivity between both PV-implicated alleles, DRB1*0402 and DQB1*0503, as well as the existence of a potentially large number of T-cell epitopes throughout the entire Dsg3 extracellular domain (ECD) and transmembrane region. Our results reveal that DR4 and DR6 PV may initiate in the ECD and transmembrane region, respectively, with implications for immunotherapeutic strategies for the treatment of this autoimmune disease.
doi:10.1186/1471-2105-7-S5-S7
PMCID: PMC1764484  PMID: 17254312
25.  Large-scale analysis of antigenic diversity of T-cell epitopes in dengue virus 
BMC Bioinformatics  2006;7(Suppl 5):S4.
Background
Antigenic diversity in dengue virus strains has been studied, but large-scale and detailed systematic analyses have not been reported. In this study, we report a bioinformatics method for analyzing viral antigenic diversity in the context of T-cell mediated immune responses. We applied this method to study the relationship between short-peptide antigenic diversity and protein sequence diversity of dengue virus. We also studied the effects of sequence determinants on viral antigenic diversity. Short peptides, principally 9-mers, were studied because they represent the predominant length of binding cores of T-cell epitopes, which are important for the formulation of vaccines.
Results
Our analysis showed that the number of unique protein sequences required to represent complete antigenic diversity of short peptides in dengue virus is significantly smaller than that required to represent complete protein sequence diversity. Short-peptide antigenic diversity shows an asymptotic relationship to the number of unique protein sequences, indicating that for large sequence sets (~200) the addition of new protein sequences has only a marginal effect on antigenic diversity. A near-linear relationship was observed between the extent of antigenic diversity and the length of protein sequences, suggesting that, for the practical purpose of vaccine development, antigenic diversity of short peptides from dengue virus can be represented by short regions of sequences (~<100 aa) within viral antigens that are specific targets of immune responses (such as T-cell epitopes specific to particular human leukocyte antigen alleles).
Conclusion
This study provides evidence that there are limited numbers of antigenic combinations in protein sequence variants of a viral species and that short regions of the viral protein are sufficient to capture antigenic diversity of T-cell epitopes. The approach described herein has direct application to the analysis of other viruses, in particular those that show high diversity and/or rapid evolution, such as influenza A virus and human immunodeficiency virus (HIV).
doi:10.1186/1471-2105-7-S5-S4
PMCID: PMC1764481  PMID: 17254309
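The asymptotic relationship described in this abstract can be explored with a simple accumulation curve: count the distinct 9-mers seen as each protein sequence is added. The sketch below is a minimal illustration rather than the authors' method; it equates short-peptide antigenic diversity with the number of unique k-mers, which is an assumption, and the function names are hypothetical.

```python
def kmers(seq, k=9):
    """Set of all overlapping k-mers in a sequence (empty if too short)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def diversity_curve(sequences, k=9):
    """Cumulative count of unique k-mers after adding each sequence in order."""
    seen = set()
    curve = []
    for seq in sequences:
        seen |= kmers(seq, k)
        curve.append(len(seen))
    return curve

# Invented toy sequences, with k=3 to keep the example readable.
toy_sequences = ["ABCDE", "ABCDF", "ABCDE"]
curve = diversity_curve(toy_sequences, k=3)
```

A curve that flattens as sequences are added corresponds to the saturation reported above. The result depends on the order in which sequences are added, so in practice it would be reasonable to average over several random orderings.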
