BlockLogo is a web-server application for the visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs, based on block entropy calculated from multiple sequence alignments. The user input consists of a multiple sequence alignment, a selection of motif positions, the sequence type, and the output format. The output comprises the BlockLogo, the corresponding sequence logo, and a table of motif frequencies. We deployed BlockLogo as an online application and have demonstrated its utility through examples that show visualization of T-cell epitopes and B-cell epitopes (both continuous and discontinuous). An additional example shows the visualization and analysis of structural motifs that determine the specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms to enable on-the-fly prediction of MHC binding affinity to 15 common HLA class I and class II alleles, as well as visual analysis of discontinuous epitopes from multiple sequence alignments. It enables the visualization and analysis of structural and functional motifs that are usually described as regular expressions, and it provides a compact view of discontinuous motifs composed of distant positions within biological sequences. BlockLogo is available at: http://research4.dfci.harvard.edu/cvc/blocklogo/ and http://methilab.bu.edu/blocklogo/
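The block entropy calculation mentioned in this abstract can be illustrated with a minimal sketch. It assumes that block entropy is the Shannon entropy (in bits) of the distinct residue blocks found at the selected alignment positions; the function name `block_entropy` and the toy alignment are hypothetical and are not taken from the BlockLogo server.

```python
import math
from collections import Counter


def block_entropy(alignment, positions):
    """Shannon entropy (bits) of the blocks formed by the chosen columns.

    alignment: list of equal-length aligned sequences (strings).
    positions: 0-based column indices; they may be non-adjacent, which is
               how a discontinuous motif would be represented here.
    Returns (entropy, Counter of block frequencies).
    """
    # Assumption: each sequence contributes one block, built by joining
    # the residues found at the selected positions.
    blocks = ["".join(seq[p] for p in positions) for seq in alignment]
    counts = Counter(blocks)
    total = len(blocks)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return entropy, counts


# Hypothetical toy alignment with a discontinuous motif at columns 0 and 3.
toy_alignment = ["ACDE", "ACDF", "GCDE", "ACDE"]
entropy, frequencies = block_entropy(toy_alignment, [0, 3])
print(entropy, frequencies.most_common())
```

The returned counter corresponds to a simple table of motif frequencies; a fully conserved block gives an entropy of zero.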
T-cell epitope; B-cell epitope; protein-protein interaction; block entropy; sequence variability and conservation
The Asia-Pacific Bioinformatics Network (APBioNet) held the first International Conference on Bioinformatics (InCoB) in Bangkok in 2002 to promote North-South networking. Commencing as a forum for Asia-Pacific researchers to interact with and learn from scientists of developed countries, InCoB has become a major regional bioinformatics conference, with participants from the region as well as North America and Europe. Since 2006, InCoB has selected the best submissions for publication in BMC Bioinformatics. In response to the growth and maturation of data-driven approaches, InCoB added BMC Genomics in 2009 and, with the introduction of this conference supplement, BMC Systems Biology to its journal choices for submitting authors. Co-hosting InCoB2013 with the second International Conference for Translational Bioinformatics (ICTBI) is in line with InCoB's support for the current trend of taking bioinformatics to the bedside, along with a systems approach to solving biological problems.
Computational vaccinology, or vaccine informatics, is an interdisciplinary field that addresses scientific and clinical questions in vaccinology using computational and informatics approaches. Computational vaccinology overlaps with many other fields such as immunoinformatics, reverse vaccinology, postlicensure vaccine research, vaccinomics, literature mining, and systems vaccinology. The second ISV Pre-conference Computational Vaccinology Workshop (ICoVax 2012) was held on October 13, 2012 in Shanghai, China. A number of topics were presented in the workshop, including allergen predictions, prediction of linear T cell epitopes and functional conformational epitopes, prediction of protein-ligand binding regions, vaccine design using reverse vaccinology, and case studies in computational vaccinology. Although significant progress has been made to date, a number of challenges still exist in the field. This Editorial provides a list of major challenges for the future of computational vaccinology and identifies developing themes that will expand and evolve over the next few years.
Peroxisomes are subcellular organelles involved in lipid metabolic processes, including those of very-long-chain fatty acids and branched-chain fatty acids, among others. Peroxisome matrix proteins are synthesized in the cytoplasm. Targeting signals (PTS or peroxisomal targeting signal) at the C-terminus (PTS1) or N-terminus (PTS2) of peroxisomal matrix proteins mediate their import into the organelle. In the case of PTS2-containing proteins, the PTS2 signal is cleaved from the protein when transported into peroxisomes. The functional mechanism of PTS2 processing, however, is poorly understood. Previously we identified Tysnd1 (Trypsin domain containing 1) and biochemically characterized it as a peroxisomal cysteine endopeptidase that directly processes PTS2-containing prethiolase Acaa1 and PTS1-containing Acox1, Hsd17b4, and ScpX. The latter three enzymes are crucial components of the very-long-chain fatty acid β-oxidation pathway. To clarify the in vivo functions and physiological role of Tysnd1, we analyzed the phenotype of Tysnd1−/− mice. Male Tysnd1−/− mice are infertile, and their epididymal sperm lack the acrosomal cap. These phenotypic features are most likely the result of changes in the molecular species composition of choline and ethanolamine plasmalogens. Tysnd1−/− mice also developed liver dysfunction when the phytanic acid precursor phytol was orally administered. Phyh and Agps, both known PTS2-containing proteins, were identified as novel Tysnd1 substrates. Loss of Tysnd1 interferes with the peroxisomal localization of Acaa1, Phyh, and Agps, which might cause the mild Zellweger syndrome spectrum-resembling phenotypes. Our data established that the peroxisomal processing protease Tysnd1 is necessary to mediate the physiological functions of PTS2-containing substrates.
Peroxisomes are subcellular organelles that are present in almost all eukaryotic cells. The syllables “per-oxi” reflect the oxidative functions of these single-membrane-bound organelles in various metabolic processes, including those of very-long-chain fatty acids and branched-chain fatty acids. In an earlier study we identified a protease named Tysnd1 that is specifically located in the peroxisomes and processes the enzymes catalyzing the peroxisomal β-oxidation of very-long-chain fatty acids. In this study, we identified two novel Tysnd1 substrates, Agps and Phyh, which are involved in plasmalogen synthesis and phytanic acid metabolism, respectively. To further investigate the in vivo function of Tysnd1, we analyzed Tysnd1 knock-out mice. Mice that lack Tysnd1 showed reduced peroxisomal β-oxidation activity and an altered plasmalogen composition, as well as an abnormal phytanic acid metabolism. Male infertility is one of the major phenotypic manifestations of Tysnd1 deficiency. Our data support the idea that Tysnd1 affects the localization and activity of some of its substrates inside peroxisomes. Altogether, our Tysnd1-deficient mouse model expands the current peroxisome biology knowledge with regard to the molecular pathogenic mechanisms that may be relevant to some patients with Zellweger syndrome spectrum disorders.
The pandemic 2009-H1N1 influenza virus circulated in the human population and caused thousands of deaths worldwide. Studies on pandemic influenza vaccines have shown that T cell recognition of conserved epitopes and cross-reactive T cell responses are important when new strains emerge, especially in the absence of antibody cross-reactivity. In this work, using a structural model of HLA-B*4405 and DM1-TCR, we systematically generated high-confidence conserved 2009-H1N1 T cell epitope candidates and investigated their potential cross-reactivity against the H5N1 avian flu virus.
Molecular docking analysis of differential DM1-TCR recognition of the 2009-H1N1 epitope candidates yielded a mosaic epitope (KEKMNTEFW) and potential H5N1 HA cross-reactive epitopes that could be applied as a multivalent peptide in influenza A vaccine development. Structural models of TCR cross-recognition between 2009-H1N1 and 2004-H5N1 revealed steric and topological effects of TCR contact residue mutations on TCR binding affinity.
The results are novel with regard to HA epitopes and useful for developing possible vaccination strategies against the rapidly changing influenza viruses. Yet, the challenge of identifying epitope candidates that result in heterologous T cell immunity under natural influenza infection conditions can only be overcome if more structural data on the TCR repertoire become available.
The theme of the 2012 International Conference on Bioinformatics (InCoB) in Bangkok, Thailand was "From Biological Data to Knowledge to Technological Breakthroughs." Besides providing a forum for life scientists and bioinformatics researchers in the Asia-Pacific region to meet and interact, the conference also hosted thematic sessions on the Pan-Asian Pacific Genome Initiative and immunoinformatics. Over the seven years of conference papers published in BMC Bioinformatics and four years in BMC Genomics, we note that there is increasing interest in the applications of -omics technologies to the understanding of diseases, as a forerunner to personalized genomic medicine.
Ten years ago, when the Asia-Pacific Bioinformatics Network held the first International Conference on Bioinformatics (InCoB) in Bangkok, its theme was North-South Networking. At that time InCoB aimed to provide biologists and bioinformatics researchers in the Asia-Pacific region with a forum to meet, interact, and disseminate knowledge about the burgeoning field of bioinformatics. Since then, InCoB has evolved into a major regional bioinformatics conference that attracts talented and established scientists not only from the region but increasingly also from East Asia, North America and Europe. Since 2006 InCoB has yielded 114 articles in BMC Bioinformatics supplement issues that have been cited nearly 1,000 times to date. In part, these developments reflect the success of bioinformatics education and continuous efforts to integrate and utilize bioinformatics in biotechnology and biosciences in the Asia-Pacific region. A cross-section of research leading from biological data to knowledge and to technological applications, the InCoB2012 theme, is introduced in this editorial. Other highlights included sessions organized by the Pan-Asian Pacific Genome Initiative and a Machine Learning in Immunology competition. InCoB2013 is scheduled for September 18-21, 2013 at Suzhou, China.
The 2011 International Conference on Bioinformatics (InCoB), the annual scientific conference of the Asia-Pacific Bioinformatics Network (APBioNet), is hosted by Kuala Lumpur, Malaysia, and is co-organized with the first ISCB-Asia conference of the International Society for Computational Biology (ISCB). InCoB and the sequencing of the human genome are both celebrating their tenth anniversaries, and InCoB’s goalposts for the next decade, implementing standards in bioinformatics and globally distributed computational networks, will be discussed and adopted at this conference. Of the 49 manuscripts (selected from 104 submissions) accepted to BMC Genomics and BMC Bioinformatics conference supplements, 24 are featured in this issue, covering software tools, genome/proteome analysis, systems biology (networks, pathways, bioimaging) and drug discovery and design.
In 2009 the International Society for Computational Biology (ISCB) started to roll out regional bioinformatics conferences in Africa, Latin America and Asia. The open and competitive bid for the first meeting in Asia (ISCB-Asia) was awarded to Asia-Pacific Bioinformatics Network (APBioNet) which has been running the International Conference on Bioinformatics (InCoB) in the Asia-Pacific region since 2002. InCoB/ISCB-Asia 2011 is held from November 30 to December 2, 2011 in Kuala Lumpur, Malaysia. Of 104 manuscripts submitted to BMC Genomics and BMC Bioinformatics conference supplements, 49 (47.1%) were accepted. The strong showing of Asia among submissions (82.7%) and acceptances (81.6%) signals the success of this tenth InCoB anniversary meeting, and bodes well for the future of ISCB-Asia.
The 2010 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia’s oldest bioinformatics organisation, formed in 1998, was organized as the 9th International Conference on Bioinformatics (InCoB), Sept. 26-28, 2010 in Tokyo, Japan. Initially, APBioNet created InCoB as a forum to foster bioinformatics in the Asia Pacific region. Given the growing importance of interdisciplinary research, InCoB2010 included topics targeting scientists in the fields of genomic medicine, immunology and chemoinformatics, supporting translational research. The peer-reviewed manuscripts accepted for publication in this supplement represent key areas of research interest that have emerged in our region. We also highlight some of the current challenges bioinformatics is facing in the Asia Pacific region and conclude our report with the announcement of APBioNet’s 100 BioDatabases (BioDB100) initiative. BioDB100 will comply with the database criteria set out earlier in our proposal for Minimum Information about a Bioinformatics Investigation (MIABi), setting the standards for biocuration and bioinformatics research, on which we will report at the next InCoB, Nov. 27 – Dec. 2, 2011 at Kuala Lumpur, Malaysia.
The International Conference on Bioinformatics (InCoB), the annual conference of the Asia-Pacific Bioinformatics Network (APBioNet), is hosted in one of the countries of the Asia-Pacific region. The 2010 conference was awarded to Japan and attracted more than one hundred high-quality research paper submissions. Thorough peer review resulted in 47 (43.5%) accepted papers out of 108 submissions. Submissions from Japan, R.O. Korea, P.R. China, Australia, Singapore and U.S.A. totaled 43.8% and contributed 57.4% of accepted papers. Manuscripts originating from Taiwan and India added up to 42.8% of submissions and 28.3% of acceptances. The fifteen articles published in this BMC Bioinformatics supplement cover disease informatics, structural bioinformatics and drug design, biological databases and software tools, signaling pathways, gene regulatory and biochemical networks, evolution and sequence analysis.
Excessive accumulation of bone marrow adipocytes observed in senile osteoporosis or age-related osteopenia is caused by the unbalanced differentiation of MSCs into bone marrow adipocytes or osteoblasts. Several transcription factors are known to regulate the balance between adipocyte and osteoblast differentiation. However, the molecular mechanisms that regulate this balance in the bone marrow have yet to be elucidated. To identify candidate genes associated with senile osteoporosis, we performed genome-wide expression analyses of differentiating osteoblasts and adipocytes. Among transcription factors that were enriched in the early phase of differentiation, Id4 was identified as a key molecule affecting the differentiation of both cell types. Experiments using the bone marrow-derived stromal cell line ST2 and Id4-deficient mice showed that lack of Id4 drastically reduces osteoblast differentiation and drives differentiation toward adipocytes. On the other hand, knockdown of Id4 in adipogenic-induced ST2 cells increased the expression of Pparγ2, a master regulator of adipocyte differentiation. Similar results were observed in bone marrow cells of the femur and tibia of Id4-deficient mice. However, the effect of Id4 on Pparγ2 and adipocyte differentiation is unlikely to be direct. The mechanism by which Id4 promotes osteoblast differentiation is associated with the Id4-mediated release of Hes1 from Hes1-Hey2 complexes. Hes1 increases the stability and transcriptional activity of Runx2, a key molecule of osteoblast differentiation, which results in enhanced osteoblast-specific gene expression. The new role of Id4 in promoting osteoblast differentiation renders it a target for preventing the onset of senile osteoporosis.
Increased adiposity is observed in the bone marrow of senile osteoporosis patients. This is caused by unbalanced differentiation of mesenchymal stem cells (MSCs) into osteoblasts or adipocytes. Previous reports have indicated that several transcription factors play important roles in determining whether MSCs differentiate into osteoblasts or adipocytes. So far, little is known about the overall dynamics and regulation of the transcription factor expression changes that lead to the imbalance of osteoblast and adipocyte differentiation inside the bone marrow. We performed genome-wide gene expression analyses during the differentiation of MSCs into osteoblasts or adipocytes. We identified the basic helix-loop-helix transcription factor family member Id4 as a leading candidate controlling differentiation toward adipocytes or osteoblasts. Suppression of Id4 expression in MSCs repressed osteoblast differentiation and increased adipocyte differentiation. In contrast, overexpression of Id4 in MSCs promoted osteoblast differentiation and attenuated adipocyte differentiation. Moreover, Id4-mutant mice showed abnormal accumulation of lipid droplets in bone marrow and impaired bone formation activity. In summary, we have demonstrated a molecular function of Id4 in osteoblast differentiation. The findings revealed that Id4 is a molecular switch enhancing osteoblast differentiation at the expense of adipocyte differentiation.
The import of most intraperoxisomal proteins is mediated by peroxisome targeting signals at their C-termini (PTS1) or N-terminal regions (PTS2). Both signals have been integrated in subcellular location prediction programs. However, their present performance, particularly for PTS2 targeting, did not seem adequate for large-scale screening of sequences.
We modified a previously reported PTS1 screening method to identify PTS2-containing mouse candidates using a combination of computational and manual annotation. For rapid confirmation of five new PTS2- and two previously identified PTS1-containing candidates, we developed the new cell line CHO-perRed, which stably expresses the peroxisomal marker dsRed-PTS1. Using CHO-perRed we confirmed the peroxisomal localization of the PTS1-targeted candidate Zadh2. Preliminary characterization of Zadh2 expression suggested non-PPARα mediated activation. Notably, none of the PTS2 candidates localized to peroxisomes.
In a few cases the PTS may oscillate from "silent" to "functional" depending on its surface accessibility indicating the potential for context-dependent conditional subcellular sorting. Overall, PTS2-targeting predictions are unlikely to improve without generation and integration of new experimental data from location proteomics, protein structures and quantitative Pex7 PTS2 peptide binding assays.
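A computational first pass for PTS2 candidates of the kind described above can be sketched as a regular-expression scan of the N-terminal region. This is a minimal sketch only: it assumes the commonly cited PTS2 consensus (R/K)(L/V/I)X5(H/Q)(L/A), which is not necessarily the pattern used in the screen reported here, and the function name, window size and toy sequences are hypothetical.

```python
import re

# Assumption: commonly cited PTS2 consensus (R/K)(L/V/I)X5(H/Q)(L/A).
# The screening method in the abstract may have used a different pattern.
PTS2_PATTERN = re.compile(r"[RK][LVI].{5}[HQ][LA]")


def find_pts2_candidates(sequence, n_terminal_window=40):
    """Return (start, matched_text) pairs found in the N-terminal window.

    n_terminal_window is a hypothetical cut-off reflecting that PTS2
    signals lie near the N-terminus; it is not a value from the source.
    """
    region = sequence[:n_terminal_window].upper()
    return [(m.start(), m.group()) for m in PTS2_PATTERN.finditer(region)]


# Toy sequences for illustration only.
print(find_pts2_candidates("MHRLQVVLGHLAGRS"))
print(find_pts2_candidates("MAAAAAAAAAAAAAA"))
```

A pattern match only nominates a candidate. As the abstract stresses, a motif may be "silent" depending on its surface accessibility, so every hit would still need manual annotation and experimental confirmation.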
Mammalian antimicrobial peptides (AMPs) are effectors of the innate immune response. A multitude of signals coming from pathways of mammalian pathogen/pattern recognition receptors and other proteins affect the expression of AMP-coding genes (AMPcgs). For many AMPcgs the promoter elements and transcription factors that control their tissue cell-specific expression have yet to be fully identified and characterized.
Based upon the RIKEN full-length cDNA and public sequence data derived from human, mouse and rat, we identified 178 candidate AMP transcripts derived from 61 genes belonging to 29 AMP families. However, we were able to determine true orthologous relationships with 30 human and 15 rat sequences for only 31 mouse genes belonging to 22 AMP families. We screened the promoter regions of AMPcgs in the three species for motifs by an ab initio motif finding method and analyzed the derived promoter characteristics. Promoter models were developed for the alpha-defensin, penk and zap AMP families. The results suggest a core set of transcription factors (TFs) that regulate the transcription of AMPcg families in mouse, rat and human. The three most frequent core TF groups include liver-specific TFs, nervous system-specific TFs and nuclear hormone receptors (NHRs). Out of 440 motifs analyzed, we found that three represent potentially novel TF-binding motifs enriched in promoters of AMPcgs, while four other motifs appear to be species-specific.
Our large-scale computational analysis of promoters of 22 families of AMPcgs across three mammalian species suggests that their key transcriptional regulators are likely to be TFs of the liver-, nervous system-specific and NHR groups. The computationally inferred promoter elements and potential TF binding motifs provide a rich resource for targeted experimental validation of TF binding and signaling studies that aim at the regulation of mouse, rat or human AMPcgs.
Using the two largest collections of Mus musculus and Homo sapiens transcription start sites (TSSs) determined based on CAGE tags, ditags, full-length cDNAs, and other transcript data, we describe the compositional landscape surrounding TSSs with the aim of gaining better insight into the properties of mammalian promoters. We classified TSSs into four types based on compositional properties of regions immediately surrounding them. These properties highlighted distinctive features in the extended core promoters that helped us delineate boundaries of the transcription initiation domain space for both species. The TSS types were analyzed for associations with initiating dinucleotides, CpG islands, TATA boxes, and an extensive collection of statistically significant cis-elements in mouse and human. We found that different TSS types show preferences for different sets of initiating dinucleotides and cis-elements. Through Gene Ontology and eVOC categories and tissue expression libraries we linked TSS characteristics to expression. Moreover, we show a link of TSS characteristics to very specific genomic organization in an example of immune-response-related genes (GO:0006955). Our results shed light on the global properties of the two transcriptomes not revealed before and therefore provide the framework for better understanding of the transcriptional mechanisms in the two species, as well as a framework for development of new and more efficient promoter- and gene-finding tools.
Tens of thousands of mammalian genes are expressed in various cells at different times, controlled mainly at the promoter level through the interaction of transcription factors with cis-elements. The authors analyzed properties of a large collection of experimental mouse (Mus musculus) and human (Homo sapiens) transcription start sites (TSSs). They defined four types of TSSs based on the compositional properties of surrounding regions and showed that (a) the regions surrounding TSSs are much richer in properties than previously thought, (b) the four TSSs types are associated with distinct groups of cis-elements and initiating dinucleotides, (c) the regions upstream of TSSs are distinctly different from the downstream ones in terms of the associated cis-elements, and (d) mouse and human TSS properties relative to CpG islands (CGIs) and TATA box elements suggest species-specific adaptation. The authors linked TSS characteristics to gene expression through categories defined by the Gene Ontology and eVOC classifications and tissue expression libraries. They provided examples of the preference of immune response genes for TSS types and specific genomic organization. Their results shed light on the fine compositional properties of TSSs in mammals and could lead to better design of promoter- and gene-finding tools, better annotation of promoters by cis-elements, and better regulatory network reconstructions. These areas represent some of the focal topics of bioinformatics and genomics research that are of interest to a wide range of life scientists.
A major goal in the post-genomic era is to identify and characterise disease susceptibility genes and to apply this knowledge to disease prevention and treatment. Rodents and humans have remarkably similar genomes and share closely related biochemical, physiological and pathological pathways. In this work we utilised the latest information on the mouse transcriptome as revealed by the RIKEN FANTOM2 project to identify novel human disease-related candidate genes. We define a new term "patholog" to mean a homolog of a human disease-related gene encoding a product (transcript, anti-sense or protein) potentially relevant to disease. Rather than just focus on Mendelian inheritance, we applied the analysis to all potential pathologs regardless of their inheritance pattern.
Bioinformatic analysis and human curation of 60,770 RIKEN full-length mouse cDNA clones produced 2,578 sequences that showed similarity (70–85% identity) to known human-disease genes. Using a newly developed biological information extraction and annotation tool (FACTS) in parallel with human expert analysis of 17,051 MEDLINE scientific abstracts, we identified 182 novel potential pathologs. Of these, 36 were identified by computational tools only, 49 by human expert analysis only and 97 by both methods. These pathologs were related to neoplastic (53%), hereditary (24%), immunological (5%), cardio-vascular (4%), or other (14%) disorders.
Large scale genome projects continue to produce a vast amount of data with potential application to the study of human disease. For this potential to be realised we need intelligent strategies for data categorisation and the ability to link sequence data with relevant literature. This paper demonstrates the power of combining human expert annotation with FACTS, a newly developed bioinformatics tool, to identify novel pathologs from within large-scale mouse transcript datasets.
FANTOM database; human; disease gene; cancer; hereditary disease; transcripts; bioinformatics; genomics
The FREP database (http://facts.gsc.riken.go.jp/FREP/) contains 31 396 RepeatMasker-identified non-redundant variant repeat sequences derived from 16 527 mouse cDNAs with protein-coding potential. The repeats were computationally associated with potential effects on transcriptional variation, translation, protein function or involvement in disease to identify Functional REPeats (FREPs). FREPs are defined by the (i) occurrence of exon–exon boundaries in repeats, (ii) presence of polyadenylation sites in 3′UTR-located repeats, (iii) effect on translation, (iv) position in the protein-coding region or protein domains or (v) conditional association with disease MeSH terms. Currently the database contains 9261 (29.5%) inferred FREPs derived from 6861 (41.5%) mouse cDNAs. Integrated evidence of the functional assignments and dynamically generated sequence similarity search results support the exploration and annotation of functional, ancestral or taxon-specific repeats. Keyword and pre-selected feature searches (e.g. coding sequence–repeat or splice site–repeat relations) support intuitive database querying as well as the retrieval of repeat sequences. Integrated sequence search and alignment tools allow the analysis of known or identification of new functional repeat candidates. FREP is a unique resource for illuminating the role of transposons and repetitive sequences in shaping the coding part of the mouse transcriptome and for selecting the appropriate experimental model to study diseases with suspected repeat etiology contributions.
BACKGROUND: A variety of methods for prediction of peptide binding to major histocompatibility complex (MHC) molecules have been proposed. These methods are based on binding motifs, binding matrices, hidden Markov models (HMM), or artificial neural networks (ANN). There has been little prior work on the comparative analysis of these methods. MATERIALS AND METHODS: We compared the performance of six methods, including binding matrices and motifs, ANNs, and HMMs, applied to the prediction of peptide binding to two human MHC class I molecules. RESULTS: The selection of the optimal prediction method depends on the amount of available data (the number of peptides of known binding affinity to the MHC molecule of interest), the biases in the data set and the intended purpose of the prediction (screening of a single protein versus mass screening). When little or no peptide data are available, binding motifs are the most useful alternative to random guessing or use of a complete overlapping set of peptides for selection of candidate binders. As the number of known peptide binders increases, binding matrices and HMM become more useful predictors. ANN and HMM are the predictive methods of choice for MHC alleles with more than 100 known binding peptides. CONCLUSION: The ability of bioinformatic methods to reliably predict MHC binding peptides, and thereby potential T-cell epitopes, has major implications for clinical immunology, particularly in the area of vaccine design.
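Of the method families compared above, the binding matrix is the simplest to illustrate. The sketch below assumes an additive position-specific scoring matrix applied to every overlapping window of a protein. The three-position matrix, its weights, the threshold and the function names are hypothetical toy values chosen for brevity; they are not taken from the study or from any published MHC matrix.

```python
def score_peptide(peptide, matrix, default=0.0):
    """Sum the position-specific weights of each residue in the peptide.

    matrix: list of dicts, one per peptide position, mapping a residue to
            its weight; residues missing from a dict score `default`.
    """
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must equal matrix length")
    return sum(matrix[i].get(aa, default) for i, aa in enumerate(peptide))


def scan_protein(protein, matrix, threshold):
    """Score all overlapping windows; keep those at or above the threshold."""
    k = len(matrix)
    hits = []
    for start in range(len(protein) - k + 1):
        peptide = protein[start:start + k]
        score = score_peptide(peptide, matrix)
        if score >= threshold:
            hits.append((start, peptide, score))
    return hits


# Hypothetical toy matrix for 3-residue peptides (real class I matrices
# typically cover 8 to 10 positions and all 20 amino acids).
toy_matrix = [{"L": 1.0}, {"A": 0.5}, {"V": 2.0}]
print(scan_protein("GLAVK", toy_matrix, threshold=3.0))
```

A binding motif can be viewed as the limiting case in which only anchor positions carry weight, which is consistent with the abstract's point that motifs are usable when few or no binding peptides are known.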
FIMM database (http://sdmc.krdl.org.sg:8080/fimm) contains data relevant to functional molecular immunology, focusing on cellular immunology. It contains fully referenced data on protein antigens, major histocompatibility complex (MHC) molecules, MHC-associated peptides and relevant disease associations. FIMM has a set of search tools for extraction of information and results are presented as lists or as reports.