1.  ‘In silico expression analysis’, a novel PathoPlant web tool to identify abiotic and biotic stress conditions associated with specific cis-regulatory sequences 
Using bioinformatics, putative cis-regulatory sequences can be easily identified using pattern recognition programs on promoters of specific gene sets. The abundance of predicted cis-sequences is a major challenge to associate these sequences with a possible function in gene expression regulation. To identify a possible function of the predicted cis-sequences, a novel web tool designated ‘in silico expression analysis’ was developed that correlates submitted cis-sequences with gene expression data from Arabidopsis thaliana. The web tool identifies the A. thaliana genes harbouring the sequence in a defined promoter region and compares the expression of these genes with microarray data. The result is a hierarchy of abiotic and biotic stress conditions to which these genes are most likely responsive. When testing the performance of the web tool, known cis-regulatory sequences were submitted to the ‘in silico expression analysis’ resulting in the correct identification of the associated stress conditions. When using a recently identified novel elicitor-responsive sequence, a WT-box (CGACTTTT), the ‘in silico expression analysis’ predicts that genes harbouring this sequence in their promoter are most likely Botrytis cinerea induced. Consistent with this prediction, the strongest induction of a reporter gene harbouring this sequence in the promoter is observed with B. cinerea in transgenic A. thaliana.
Database URL:
PMCID: PMC3983564
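The workflow described in entry 1 (find the genes whose promoter carries a submitted cis-sequence, then rank stress conditions by how those genes respond) can be illustrated with a minimal sketch. Everything below is hypothetical: the function names, the toy promoter sequences and expression values, and in particular the scoring rule (mean log2 fold change per condition) are assumptions made for illustration, not the statistic PathoPlant actually uses.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def genes_with_motif(promoters: dict, motif: str) -> list:
    """Genes whose promoter contains the motif on either strand."""
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for gene, seq in promoters.items():
        s = seq.upper()
        if motif in s or rc in s:
            hits.append(gene)
    return sorted(hits)


def rank_conditions(expression: dict, genes: list) -> list:
    """Rank conditions by mean log2 fold change of the given genes.

    Assumption: a simple mean stands in for whatever scoring the
    real web tool applies to its microarray data.
    """
    scores = {}
    for condition, values in expression.items():
        vals = [values[g] for g in genes if g in values]
        if vals:
            scores[condition] = sum(vals) / len(vals)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (invented): promoter sequences and log2 fold changes.
promoters = {
    "geneA": "TTTCGACTTTTAGG",  # WT-box on the forward strand
    "geneB": "AAAAGTCGCC",      # WT-box on the reverse strand
    "geneC": "GGGGCCCC",        # no WT-box
}
expression = {
    "B. cinerea": {"geneA": 2.0, "geneB": 1.0, "geneC": 0.0},
    "cold": {"geneA": 0.5, "geneB": -0.5, "geneC": 3.0},
}

hits = genes_with_motif(promoters, "CGACTTTT")
ranking = rank_conditions(expression, hits)
```

With this toy input, `hits` contains geneA and geneB, and the B. cinerea condition ranks first because only the motif-bearing genes contribute to the score.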
2.  CancerEST: a web-based tool for automatic meta-analysis of public EST data 
The identification of cancer-restricted biomarkers is fundamental to the development of novel cancer therapies and diagnostic tools. The construction of comprehensive profiles to define tissue- and cancer-specific gene expression has been central to this. To this end, the exploitation of the current wealth of ‘omic’-scale databases can be facilitated by automated approaches, allowing researchers to directly address specific biological questions. Here we present CancerEST, a user-friendly and intuitive web-based tool for the automated identification of candidate cancer markers/targets, for examining tissue specificity as well as for integrated expression profiling. CancerEST operates by means of constructing and meta-analyzing expressed sequence tag (EST) profiles of user-supplied gene sets across an EST database supporting 36 tissue types. Using a validation data set from the literature, we show the functionality and utility of CancerEST.
Database URL:
PMCID: PMC3978373  PMID: 24715218
3.  The quail anatomy portal 
The Japanese quail is a widely used model organism for the study of embryonic development; however, anatomical resources are lacking. The Quail Anatomy Portal (QAP) provides 22 detailed three-dimensional (3D) models of quail embryos during development from embryonic day (E)1 to E15 generated using optical projection tomography. The 3D models provided can be virtually sectioned to investigate anatomy. Furthermore, using the 3D nature of the models, we have generated a tool to assist in the staging of quail samples. Volume renderings of each stage are provided and can be rotated to allow visualization from multiple angles, allowing easy comparison of features both between stages in the database and between images or samples in the laboratory. The use of JavaScript, PHP and HTML ensures the database is accessible to users across different operating systems, including mobile devices, facilitating its use in the laboratory. The QAP provides a unique resource for researchers using the quail model. The ability to virtually section anatomical models throughout development provides the opportunity for researchers to virtually dissect the quail and also provides a valuable tool for the education of students and researchers new to the field.
Database URL:
(For review username: demo, password: quail123)
PMCID: PMC3978374  PMID: 24715219
4.  tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles 
The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we present the ‘tagtog’ system, a web-based annotation framework that can be used to mark up biological entities (such as genes) and concepts (such as Gene Ontology terms) in full-text articles. tagtog leverages manual user annotation in combination with automatic machine-learned annotation to provide accurate identification of gene symbols and gene names. As part of the BioCreative IV Interactive Annotation Task, FlyBase has used tagtog to identify and extract mentions of Drosophila melanogaster gene symbols and names in full-text biomedical articles from the PLOS stable of journals. We show here the results of three experiments with different sized corpora and assess gene recognition performance and curation speed. We conclude that tagtog-named entity recognition improves with a larger corpus and that tagtog-assisted curation is quicker than manual curation.
Database URL:
PMCID: PMC3978375  PMID: 24715220
5.  HPVdb: a data mining system for knowledge discovery in human papillomavirus with applications in T cell immunology and vaccinology 
High-risk human papillomaviruses (HPVs) are the causes of many cancers, including cervical, anal, vulvar, vaginal, penile and oropharyngeal. To facilitate diagnosis, prognosis and characterization of these cancers, it is necessary to make full use of the immunological data on HPV available through publications, technical reports and databases. These data vary in granularity, quality and complexity. The extraction of knowledge from the vast amount of immunological data using data mining techniques remains a challenging task. To support integration of data and knowledge in virology and vaccinology, we developed a framework called KB-builder to streamline the development and deployment of web-accessible immunological knowledge systems. The framework consists of seven major functional modules, each facilitating a specific aspect of the knowledgebase construction process. Using KB-builder, we constructed the Human Papillomavirus T cell Antigen Database (HPVdb). It contains 2781 curated antigen entries of antigenic proteins derived from 18 genotypes of high-risk HPV and 18 genotypes of low-risk HPV. The HPVdb also catalogs 191 verified T cell epitopes and 45 verified human leukocyte antigen (HLA) ligands. Primary amino acid sequences of HPV antigens were collected and annotated from the UniProtKB. T cell epitopes and HLA ligands were collected from data mining of scientific literature and databases. The data were subject to extensive quality control (redundancy elimination, error detection and vocabulary consolidation). A set of computational tools for an in-depth analysis, such as sequence comparison using BLAST search, multiple alignments of antigens, classification of HPV types based on cancer risk, T cell epitope/HLA ligand visualization, T cell epitope/HLA ligand conservation analysis and sequence variability analysis, has been integrated within the HPVdb. 
Predicted Class I and Class II HLA binding peptides for 15 common HLA alleles are included in this database as putative targets. HPVdb is a knowledge-based system that integrates curated data and information with tailored analysis tools to facilitate data mining for HPV vaccinology and immunology. To our best knowledge, HPVdb is a unique data source providing a comprehensive list of HPV antigens and peptides.
Database URL:
PMCID: PMC3975992
6.  FixPred: a resource for correction of erroneous protein sequences 
Protein databases are heavily contaminated with erroneous (mispredicted, abnormal and incomplete) sequences and these erroneous data significantly distort the conclusions drawn from genome-scale protein sequence analyses. In our earlier work we described the MisPred resource that serves to identify erroneous sequences; here we present the FixPred computational pipeline that automatically corrects sequences identified by MisPred as erroneous. The current version of the associated FixPred database contains corrected UniProtKB/Swiss-Prot and NCBI/RefSeq sequences from Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Danio rerio, Fugu rubripes, Ciona intestinalis, Branchiostoma floridae, Drosophila melanogaster and Caenorhabditis elegans; future releases of the FixPred database will include corrected sequences of additional Metazoan species. The FixPred computational pipeline and database are easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in a variety of formats.
Database URL:
PMCID: PMC3975993
7.  3PFDB+: improved search protocol and update for the identification of representatives of protein sequence domain families 
Protein domain families are usually classified on the basis of similarity of amino acid sequences. Selection of a single representative sequence for each family provides targets for structure determination or modeling and also enables fast sequence searches to associate new members to a family. Such a selection could be challenging since some of these domain families exhibit huge variation depending on the number of members in the family, the average family sequence length or the extent of sequence divergence within a family. We had earlier created 3PFDB database as a repository of best representative sequences, selected from each PFAM domain family on the basis of high coverage. In this study, we have improved the database using more efficient strategies for the initial generation of sequence profiles and implement two independent methods, FASSM and HMMER, for identifying family members. HMMER employs a global sequence similarity search, while FASSM relies on motif identification and matching. This improved and updated database, 3PFDB+ generated in this study, provides representative sequences and profiles for PFAM families, with 13 519 family representatives having more than 90% family coverage. The representative sequence is also highlighted in a two-dimensional plot, which reflects the relative divergence between family members. Representatives belonging to small families with short sequences are mainly associated with low coverage. The set of sequences not recognized by the family representative profiles, highlight several potential false or weak family associations in PFAM. Partial domains and fragments dominate such cases, along with sequences that are highly diverged or different from other family members. Some of these outliers were also predicted to have different secondary structure contents, which reflect different putative structure or functional roles for these domain sequences.
Database URL:
PMCID: PMC3974335  PMID: 24700812
8.  SysPTM 2.0: an updated systematic resource for post-translational modification 
Post-translational modifications (PTMs) of proteins play essential roles in almost all cellular processes, and are closely related to physiological activity and disease development of living organisms. The development of tandem mass spectrometry (MS/MS) has resulted in a rapid increase of PTMs identified on proteins from different species. The collection and systematic ordering of PTM data should provide invaluable information for understanding cellular processes and signaling pathways regulated by PTMs. For this purpose, in 2009 we developed SysPTM, a systematic resource with comprehensive PTM data and a suite of web tools for annotating PTMs. Four years later, there has been a significant advance in the generation of PTM data and, consequently, more sophisticated analysis requirements have to be met. Here we present SysPTM 2.0, an updated version with almost doubled data content and enhanced web-based analysis tools (PTMBlast, PTMPathway, PTMPhylog and PTMCluster). Moreover, a new section, SysPTM-H, is constructed to graphically represent the combinatorial histone PTMs and dynamic regulation of histone modifying enzymes, and a new tool, PTMGO, is added for functional annotation and enrichment analysis. SysPTM 2.0 not only facilitates resourceful annotation of PTM sites but also allows systematic investigation of PTM functions by the user. Citation details: Li,J., Jia,J., Li,H. et al. SysPTM 2.0: an updated systematic resource for post-translational modification. Database (2014) Vol. 2014: article ID bau025; doi:10.1093/database/bau025.
Database URL:
PMCID: PMC3975108
9.  A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE) 
Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators has led to a rich repository of information on functional sites of genes and proteins. This information along with variation-related annotation can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are also included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform HIVE (High-performance Integrated Virtual Environment) for storing, analyzing, computing and curating NGS data and associated metadata has been developed. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identifications of novel or common nsSNVs that can be prioritized in validation studies.
Database URL: BioMuta:; CSR:; HIVE:
PMCID: PMC3965850  PMID: 24667251
10.  Comparison of sequence variants in transcriptomic control regions across 17 mouse genomes 
The laboratory mouse is the most widely used mammalian model organism in biomedical research, so a thorough annotation of functional variation in the mouse genome would be of significant value. In this study, we compared sequence variation in a comprehensive list of functional elements (e.g. promoters, enhancers and CTCF binding sites) across 17 inbred mouse strains. Sequences were derived for ∼300 000 functional elements experimentally identified by the mouse ENCODE project as regulating gene expression in 19 different tissue sources. We aligned sequences for each predicted cis-regulatory element to genomes of 17 mouse strains. This yielded a database comprising ∼5 million aligned sequences, allowing interrogation of sequence variation of functional elements for each of the 19 tissues/cell types in commonly used mouse strains. We also developed an online tool to visualize the genome around each predicted cis-regulatory element in each tissue context and which allows efficient comparison of variation between any two sets of strains. This will be particularly useful in the context of the Collaborative Cross (CC), which was conceived as a powerful new systems genetics resource to accelerate gene discovery. Comprising a large number of inbred strains derived from eight genetically diverse founders, the CC offers rapid mapping and identification of genes that mediate complex traits. We show that, among the 17 sequenced strains, the set of CC founder strains captures the most variability in the ENCODE elements, further emphasizing the value of this resource.
Database URL:
PMCID: PMC3958616  PMID: 24647628
11.  DBatVir: the database of bat-associated viruses 
Emerging infectious diseases remain a significant threat to public health. Most emerging infectious disease agents in humans are of zoonotic origin. Bats are important reservoir hosts of many highly lethal zoonotic viruses and have been implicated in numerous emerging infectious disease events in recent years. It is essential to enhance our knowledge and understanding of the genetic diversity of the bat-associated viruses to prevent future outbreaks. To facilitate further research, we constructed the database of bat-associated viruses (DBatVir). Known viral sequences detected in bat samples were manually collected and curated, along with the related metadata, such as the sampling time, location, bat species and specimen type. Additional information concerning the bats, including common names, diet type, geographic distribution and phylogeny were integrated into the database to bridge the gap between virologists and zoologists. The database currently covers >4100 bat-associated animal viruses of 23 viral families detected from 196 bat species in 69 countries worldwide. It provides an overview and snapshot of the current research regarding bat-associated viruses, which is essential now that the field is rapidly expanding. With a user-friendly interface and integrated online bioinformatics tools, DBatVir provides a convenient and powerful platform for virologists and zoologists to analyze the virome diversity of bats, as well as for epidemiologists and public health researchers to monitor and track current and future bat-related infectious diseases.
Database URL:
PMCID: PMC3958617  PMID: 24647629
12.  Linking tissues to phenotypes using gene expression profiles 
Despite great biological and computational efforts to determine the genetic causes underlying human heritable diseases, approximately half (3500) of these diseases are still without an identified genetic cause. Model organism studies allow the targeted modification of the genome and can help with the identification of genetic causes for human diseases. Targeted modifications have led to a vast amount of model organism data. However, these data are scattered across different databases, preventing an integrated view and missing out on contextual information. Only once we are able to combine all the existing resources will we be able to fully understand the causes underlying a disease and how species differ. Here, we present an integrated data resource combining tissue expression with phenotypes in mouse lines and bringing us one step closer to consequence chains from a molecular level to a resulting phenotype. Mutations in genes often manifest in phenotypes in the same tissue that the gene is expressed in. However, in other cases, a systems level approach is required to understand how perturbations to gene-networks connecting multiple tissues lead to a phenotype. Automated evaluation of the predicted tissue–phenotype associations reveals that 72–76% of the phenotypes are associated with disruption of genes expressed in the affected tissue. However, 55–64% of the individual phenotype-tissue associations show spatially separated gene expression and phenotype manifestation. For example, we see a correlation between ‘total body fat’ abnormalities and genes expressed in the ‘brain’, which fits recent discoveries linking genes expressed in the hypothalamus to obesity. Finally, we demonstrate that the use of our predicted tissue–phenotype associations can improve the detection of a known disease–gene association when combined with a disease gene candidate prediction tool. For example, JAK2, the known gene associated with Familial Erythrocytosis 1, rises from the seventh best candidate to the top hit when the associated tissues are taken into consideration.
Database URL:
PMCID: PMC3982582  PMID: 24634472
13.  Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data 
UniProtKB/Swiss-Prot provides expert curation with information extracted from literature and curator-evaluated computational analysis. As knowledgebases continue to play an increasingly important role in scientific research, a number of studies have evaluated their accuracy and revealed various errors. While some are curation errors, others are the result of incorrect information published in the scientific literature. By taking the example of sirtuin-5, a complex annotation case, we will describe the curation procedure of UniProtKB/Swiss-Prot and detail how we report conflicting information in the database. We will demonstrate the importance of collaboration between resources to ensure curation consistency and the value of contributions from the user community in helping maintain error-free resources.
Database URL:
PMCID: PMC3950660  PMID: 24622611
14.  GigaDB: promoting data dissemination and reproducibility 
Often papers are published where the underlying data supporting the research are not made available because of the limitations of making such large data sets publicly and permanently accessible. Even if the raw data are deposited in public archives, the essential analysis intermediaries, scripts or software are frequently not made available, meaning the science is not reproducible. The GigaScience journal is attempting to address this issue with the associated data storage and dissemination portal, the GigaScience database (GigaDB). Here we present the current version of GigaDB and reveal plans for the next generation of improvements. However, most importantly, we are soliciting responses from you, the users, to ensure that future developments are focused on the data storage and dissemination issues that still need resolving.
Database URL:
PMCID: PMC3950661  PMID: 24622612
15.  SoyFN: a knowledge database of soybean functional networks 
Many databases for soybean genomic analysis have been built and made publicly available, but few of them contain knowledge specifically targeting the omics-level gene–gene, gene–microRNA (miRNA) and miRNA–miRNA interactions. Here, we present SoyFN, a knowledge database of soybean functional gene networks and miRNA functional networks. SoyFN provides user-friendly interfaces to retrieve, visualize, analyze and download the functional networks of soybean genes and miRNAs. In addition, it incorporates much information about KEGG pathways, gene ontology annotations and 3′-UTR sequences as well as many useful tools including SoySearch, ID mapping, Genome Browser, eFP Browser and promoter motif scan. SoyFN is a schema-free database that can be accessed as a Web service from any modern programming language using a simple Hypertext Transfer Protocol call. The Web site is implemented in Java, JavaScript, PHP, HTML and Apache, with all major browsers supported. We anticipate that this database will be useful for members of research communities both in soybean experimental science and bioinformatics.
Database URL:
PMCID: PMC3949006  PMID: 24618044
16.  YTRP: a repository for yeast transcriptional regulatory pathways 
Regulatory targets of transcription factors (TFs) can be identified by TF perturbation experiments (TFPEs), which reveal the expression changes owing to the perturbation (deletion or overexpression) of TFs. However, the identified targets of a given TF consist of both direct and indirect regulatory targets. It has been shown that most of the TFPE-identified regulatory targets are indirect, indicating that TF-gene regulation is mainly through transcriptional regulatory pathways (TRPs) consisting of intermediate TFs. Without identification of these TRPs, it is not easy to understand how a TF regulates its indirect targets. Because no database currently deposits the potential TRPs for Saccharomyces cerevisiae, we constructed the YTRP (Yeast Transcriptional Regulatory Pathway) database. For each TF-gene regulatory pair under different experimental conditions, all possible TRPs in two underlying networks (constructed using experimentally verified TF-gene binding pairs and TF-gene regulatory pairs from the literature) for the specified experimental conditions were automatically enumerated by TRP mining procedures developed from graph theory. The enumerated TRPs of a TF-gene regulatory pair provide experimentally testable hypotheses for the molecular mechanisms behind a TF and its regulatory target. YTRP is available online. We believe that the TRPs deposited in this database will greatly improve the usefulness of TFPE data for yeast biologists to study the regulatory mechanisms between a TF and its knocked-out targets.
Database URL:
PMCID: PMC3948430  PMID: 24608172
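The TRP enumeration described in entry 16 is, at its core, a search for directed paths from a TF to a target gene through intermediate TFs. The sketch below shows one common way to do this, a depth-first search over simple paths with a cap on the number of intermediates. The network, the gene names, the function name and the `max_intermediates` parameter are all hypothetical; the abstract does not specify YTRP's actual mining procedure or how it combines its two underlying networks.

```python
def enumerate_paths(network: dict, source: str, target: str,
                    max_intermediates: int = 2) -> list:
    """Enumerate simple directed paths from source to target.

    network maps each regulator to the genes it regulates.
    max_intermediates caps the number of nodes between source
    and target (an assumed parameter, for illustration only).
    """
    paths = []

    def dfs(node, path):
        for nxt in network.get(node, ()):
            if nxt == target:
                paths.append(path + [nxt])
            elif nxt not in path and len(path) - 1 < max_intermediates:
                # nxt becomes a new intermediate; never revisit a node.
                dfs(nxt, path + [nxt])

    dfs(source, [source])
    # Shortest pathways first, then alphabetical for stable output.
    return sorted(paths, key=lambda p: (len(p), p))


# Toy regulatory network (invented names).
network = {
    "TF1": ["TF2", "TF3", "GeneX"],
    "TF2": ["GeneX", "TF3"],
    "TF3": ["GeneX"],
}

all_paths = enumerate_paths(network, "TF1", "GeneX", max_intermediates=2)
short_paths = enumerate_paths(network, "TF1", "GeneX", max_intermediates=1)
```

Here `all_paths` holds four candidate pathways, from the direct edge TF1 to GeneX up to the two-intermediate route through TF2 and TF3, while `short_paths` drops the longest one. Capping the path length keeps the enumeration tractable on dense networks, at the cost of missing longer cascades.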
17.  NCG 4.0: the network of cancer genes in the era of massive mutational screenings of cancer genomes 
NCG 4.0 is the latest update of the Network of Cancer Genes, a web-based repository of systems-level properties of cancer genes. In its current version, the database collects information on 537 known (i.e. experimentally supported) and 1463 candidate (i.e. inferred using statistical methods) cancer genes. Candidate cancer genes derive from the manual revision of 67 original publications describing the mutational screening of 3460 human exomes and genomes in 23 different cancer types. For all 2000 cancer genes, duplicability, evolutionary origin, expression, functional annotation, interaction network with other human proteins and with microRNAs are reported. In addition to providing a substantial update of cancer-related information, NCG 4.0 also introduces two new features. The first is the annotation of possible false-positive cancer drivers, defined as candidate cancer genes inferred from large-scale screenings whose association with cancer is likely to be spurious. The second is the description of the systems-level properties of 64 human microRNAs that are causally involved in cancer progression (oncomiRs). Owing to the manual revision of all information, NCG 4.0 constitutes a complete and reliable resource on human coding and non-coding genes whose deregulation drives cancer onset and/or progression. NCG 4.0 can also be downloaded as a free application for Android smart phones.
Database URL:
PMCID: PMC3948431  PMID: 24608173
18.  The EMPRES-i genetic module: a novel tool linking epidemiological outbreak information and genetic characteristics of influenza viruses 
Combining epidemiological information, genetic characterization and geomapping in the analysis of influenza can contribute to a better understanding and description of influenza epidemiology and ecology, including possible virus reassortment events. Furthermore, integration of information such as agroecological farming system characteristics can provide new knowledge on risk factors of influenza emergence and spread. Integrating viral characteristics into an animal disease information system is therefore expected to provide a unique tool to trace-and-track particular virus strains; generate clade distributions and spatiotemporal clusters; screen for distribution of viruses with specific molecular markers; identify potential risk factors; and analyze or map viral characteristics related to vaccines used for control and/or prevention. For this purpose, a genetic module was developed within EMPRES-i (FAO’s global animal disease information system) linking epidemiological information from influenza events with virus characteristics and enabling combined analysis. An algorithm was developed to act as the interface between EMPRES-i disease event data and publicly available influenza virus sequences in OpenfluDB. This algorithm automatically computes potential links between outbreak event and sequences, which are subsequently manually validated by experts. Subsequently, other virus characteristics such as antiviral resistance can then be associated to outbreak data. To visualize such characteristics on a geographic map, shape files with virus characteristics to overlay on other EMPRES-i map layers (e.g. animal densities) can be generated. The genetic module allows export of associated epidemiological and sequence data for further analysis. FAO has made this tool available for scientists and policy makers. Contributions are expected from users to improve and validate the number of linked influenza events and isolate information as well as the quality of information. 
Possibilities to interconnect with other influenza sequence databases or to expand the genetic module to other viral diseases (e.g. foot and mouth disease) are being explored.
Database OpenfluDB URL:
Database EMPRES-i URL:
PMCID: PMC3945526  PMID: 24608033
19.  BCL2DB: database of BCL-2 family members and BH3-only proteins 
BCL2DB is a database designed to integrate data on BCL-2 family members and BH3-only proteins. These proteins control the mitochondrial apoptotic pathway and probably many other cellular processes as well. This large protein group is formed by a family of pro-apoptotic and anti-apoptotic homologs that have phylogenetic relationships with BCL-2, and by a collection of evolutionarily and structurally unrelated proteins characterized by the presence of a region of local sequence similarity with BCL-2, termed the BH3 motif. BCL2DB is rebuilt monthly by an automated procedure relying on a set of homemade profile HMMs computed from seed reference sequences representative of the various BCL-2 homologs and BH3-only proteins. The BCL2DB entries integrate data from the Ensembl, Ensembl Genomes, European Nucleotide Archive and Protein Data Bank databases and are enriched with specific information like protein classification into orthology groups and distribution of BH motifs along the sequences. The Web interface allows for easy browsing of the site and fast access to data, as well as sequence analysis with generic and specific tools. BCL2DB provides a helpful and powerful tool to both ‘BCL-2-ologists’ and researchers working in the various fields of physiopathology.
Database URL:
PMCID: PMC3945527  PMID: 24608034
20.  BambooGDB: a bamboo genome database with functional annotation and an analysis platform 
Bamboo, one of the most important non-timber forest products and fastest-growing plants in the world, represents the only major lineage of grasses that is native to forests. The recent completion of the first high-quality draft genome sequence of moso bamboo (Phyllostachys edulis) provides new insights into bamboo genetics and evolution. To further extend our understanding of the bamboo genome and facilitate future studies on the basis of previous achievements, we have developed BambooGDB, a bamboo genome database with functional annotation and an analysis platform. The de novo sequencing data, together with the full-length complementary DNA and RNA-seq data of moso bamboo, compose the main contents of this database. Based on these sequence data, a comprehensive functional annotation of the bamboo genome was made. In addition, an analytical platform composed of comparative genomic analysis, protein–protein interaction networks, pathway analysis and visualization of genomic data was constructed. As a set of discovery tools to understand and identify biological mechanisms of bamboo, the platform can be used as a systematic framework for designing experiments for further validation. Moreover, diverse and powerful search tools and a convenient browser were incorporated to facilitate the navigation of these data. As far as we know, this is the first genome database for bamboo. By integrating high-throughput sequencing data, a full functional annotation and several analysis modules, BambooGDB aims to provide researchers worldwide with a central genomic resource and an extensible analysis platform for the bamboo genome. BambooGDB is freely available online.
Database URL:
PMCID: PMC3944406  PMID: 24602877
21.  The Zebrafish GenomeWiki: a crowdsourcing approach to connect the long tail for zebrafish gene annotation 
A large repertoire of gene-centric data has been generated in the field of zebrafish biology. Although the bulk of these data are available in the public domain, most of them are not readily accessible or are available only in nonstandard formats. One major challenge is to unify and integrate these widely scattered data sources. We tested the hypothesis that active community participation could be a viable option to address this challenge. We present here our approach to create standards for assimilation and sharing of information and a system of open standards for database intercommunication. We have attempted to address this challenge by creating a community-centric solution for zebrafish gene annotation. The Zebrafish GenomeWiki is a ‘wiki’-based resource, which aims to provide an altruistic shared environment for collective annotation of the zebrafish genes. The Zebrafish GenomeWiki has features that enable users to comment on, annotate, edit and rate this gene-centric information. The credits for contributions can be tracked through a transparent microattribution system. In contrast to other wikis, the Zebrafish GenomeWiki is a ‘structured wiki’ or rather a ‘semantic wiki’. The Zebrafish GenomeWiki implements a semantically linked data structure, which in the future would be amenable to semantic search.
Database URL:
PMCID: PMC3936183  PMID: 24578356
22.  TSLP signaling pathway map: a platform for analysis of TSLP-mediated signaling 
Thymic stromal lymphopoietin (TSLP) is a four-helix bundle cytokine that plays a critical role in the regulation of immune responses and in the differentiation of hematopoietic cells. TSLP signals through a heterodimeric receptor complex consisting of an interleukin-7 receptor α chain and a unique TSLP receptor (TSLPR) [also known as cytokine receptor-like factor 2 (CRLF2)]. Cellular targets of TSLP include dendritic cells, B cells, mast cells, regulatory T (Treg) cells and CD4+ and CD8+ T cells. The TSLP/TSLPR axis can activate multiple signal transduction pathways including the JAK/STAT pathway and the PI-3 kinase pathway. Aberrant TSLP/TSLPR signaling has been associated with a variety of human diseases including asthma, atopic dermatitis, nasal polyposis, inflammatory bowel disease, eosinophilic esophagitis and, most recently, acute lymphoblastic leukemia. A centralized resource of the TSLP signaling pathway cataloging signaling events is not yet available. In this study, we present a literature-annotated resource of reactions in the TSLP signaling pathway. This pathway map is publicly available through NetPath, an open access signal transduction pathway resource developed previously by our group. This map includes 236 molecules and 252 reactions that are involved in the TSLP/TSLPR signaling pathway. We expect that the TSLP signaling pathway map will provide a rich resource to study the biology of this important cytokine as well as to identify novel therapeutic targets for diseases associated with dysregulated TSLP/TSLPR signaling.
Database URL:
PMCID: PMC3935308  PMID: 24573880
23.  AnaLysis of Expression on human chromosome 21, ALE-HSA21: a pilot integrated web resource 
Transcriptome studies have shown the pervasive nature of transcription, demonstrating that almost all genes undergo alternative splicing. Accurately annotating all transcripts of a gene is crucial. It is needed to understand the impact of mutations on phenotypes, to shed light on genetic and epigenetic regulation of mRNAs and, more generally, to widen our knowledge about cell functionality and tissue diversity. RNA-sequencing (RNA-Seq), like other applications of next-generation sequencing, provides valuable data to improve annotation accuracy, while creating issues related to the variety, complexity and size of the data produced. In this scenario, the lack of user-friendly resources that are easily accessible to researchers with limited bioinformatics skills makes it difficult to retrieve complete information about one or a few genes without browsing a jungle of databases. Likewise, the increasing amount of data from ‘omics’ technologies requires the development of integrated databases that merge different data formats coming from distinct but complementary sources. In light of these considerations, and given the wide interest in studying Down syndrome, a genetic condition due to the trisomy of human chromosome 21 (HSA21), we developed an integrated relational database and a web interface, named ALE-HSA21 (AnaLysis of Expression on HSA21). This comprehensive and user-friendly web resource integrates, for all coding and noncoding transcripts of chromosome 21, existing gene annotations and transcripts identified de novo through RNA-Seq analysis with predictive computational analysis of regulatory sequences. Given the role of noncoding RNAs and untranslated regions of coding genes in key regulatory mechanisms, ALE-HSA21 is also an interesting web-based platform to investigate such processes.
The ‘transcript-centric’ and easily-accessible nature of ALE-HSA21 makes this resource a valuable tool to rapidly retrieve data at the isoform level, rather than at gene level, useful to investigate any disease, molecular pathway or cell process involving chromosome 21 genes.
Database URL:
PMCID: PMC3935309  PMID: 24573881
24.  COMPARTMENTS: unification and visualization of protein subcellular localization evidence 
Information on protein subcellular localization is important to understand the cellular functions of proteins. Currently, such information is manually curated from the literature, obtained from high-throughput microscopy-based screens and predicted from primary sequence. To get a comprehensive view of the localization of a protein, it is thus necessary to consult multiple databases and prediction tools. To address this, we present the COMPARTMENTS resource, which integrates all sources listed above as well as the results of automatic text mining. The resource is automatically kept up to date with source databases, and all localization evidence is mapped onto common protein identifiers and Gene Ontology terms. To facilitate comparison of different types and sources of evidence, we further assign confidence scores to the localization evidence based on its type and source. Finally, we visualize the unified localization evidence for a protein on a schematic cell to provide a simple overview.
Database URL:
PMCID: PMC3935310  PMID: 24573882
25.  MICdb3.0: a comprehensive resource of microsatellite repeats from prokaryotic genomes 
The MICdb is a comprehensive relational database of perfect microsatellites extracted from completely sequenced and annotated genomes of bacteria and archaea. The current version, MICdb3.0, is an updated and revised version of MICdb2.0. Compared with MICdb2.0, the current release is significantly improved in terms of much larger coverage of genomes, improved presentation of queried results, a user-friendly administration module to manage Simple Sequence Repeat (SSR) data (such as addition of new genomes and deletion of obsolete data) and the removal of certain features deemed to be redundant. The new web interface to the database, called Microsatellite Analysis Server (MICAS) version 3.0, has been improved by the addition of powerful high-quality visualization tools to view the query results in the form of pie charts and bar graphs. All the query results and graphs can be exported in different formats so that users can use them for further analysis. MICAS3.0 is also equipped with a unique genome comparison module with which users can do pair-wise comparison of genomes with regard to their microsatellite distribution. The advanced search module can be used to filter the repeats based on certain criteria, such as filtering repeats of a particular motif/repeat size, extracting repeats of coding/non-coding regions and sorting repeats. The MICdb database has, therefore, been made portable to be administered by a person with the necessary administrative privileges. The MICdb3.0 database and analysis server can be accessed for free from
Database URL:
PMCID: PMC3926409  PMID: 24536078
