1.  Generation of Silver Standard Concept Annotations from Biomedical Texts with Special Relevance to Phenotypes 
PLoS ONE  2015;10(1):e0116040.
Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems’ output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems’ annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources.
The textual content of the ShARe/CLEF (https://sites.google.com/site/shareclefehealth/data) and i2b2 (https://i2b2.org/NLP/DataSets/) corpora needs to be requested with the individual corpus providers.
doi:10.1371/journal.pone.0116040
PMCID: PMC4301805  PMID: 25607983
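Illustrative sketch (not taken from the article): the abstract above does not say how the silver standard was assembled from the four systems' output, so the example below assumes a simple voting rule in which an annotation is kept when at least `min_votes` systems propose it. The function names, the `(start, end, concept_id)` tuple layout and the threshold are all hypothetical. The second function shows how precision, recall and the F1-measure would then be computed against a gold standard set.

```python
from collections import Counter


def silver_standard(system_outputs, min_votes=2):
    """Keep annotations proposed by at least `min_votes` systems.

    The voting rule is an assumption made for illustration; the article's
    actual harmonisation strategy is not described in the abstract.
    """
    # Count each annotation once per system, even if a system repeats it.
    votes = Counter(a for output in system_outputs for a in set(output))
    return {a for a, n in votes.items() if n >= min_votes}


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 between two annotation sets."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical annotations as (start offset, end offset, concept id).
outputs = [
    {(0, 5, "C1"), (10, 15, "C2")},
    {(0, 5, "C1")},
    {(10, 15, "C2"), (20, 25, "C3")},
]
gold = {(0, 5, "C1"), (20, 25, "C3")}

silver = silver_standard(outputs, min_votes=2)
scores = precision_recall_f1(silver, gold)
```

Raising `min_votes` trades recall for precision, which is the kind of effect the abstract reports when lower-recall systems are combined with the others.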
3.  Deletions of chromosomal regulatory boundaries are associated with congenital disease 
Genome Biology  2014;15(9):423.
Background
Recent data from genome-wide chromosome conformation capture analysis indicate that the human genome is divided into conserved megabase-sized self-interacting regions called topological domains. These topological domains form the regulatory backbone of the genome and are separated by regulatory boundary elements or barriers. Copy-number variations can potentially alter the topological domain architecture by deleting or duplicating the barriers and thereby allowing enhancers from neighboring domains to ectopically activate genes causing misexpression and disease, a mutational mechanism that has recently been termed enhancer adoption.
Results
We use the Human Phenotype Ontology database to relate the phenotypes of 922 deletion cases recorded in the DECIPHER database to monogenic diseases associated with genes in or adjacent to the deletions. We identify combinations of tissue-specific enhancers and genes adjacent to the deletion and associated with phenotypes in the corresponding tissue, whereby the phenotype matched that observed in the deletion. We compare this computationally with a gene-dosage pathomechanism that attempts to explain the deletion phenotype based on haploinsufficiency of genes located within the deletions. Up to 11.8% of the deletions could be best explained by enhancer adoption or a combination of enhancer adoption and gene-dosage effects.
Conclusions
Our results suggest that enhancer adoption caused by deletions of regulatory boundaries may contribute to a substantial minority of copy-number variation phenotypes and should thus be taken into account in their medical interpretation.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0423-1) contains supplementary material, which is available to authorized users.
doi:10.1186/s13059-014-0423-1
PMCID: PMC4180961  PMID: 25315429
4.  Getting Ready for the Human Phenome Project: The 2012 Forum of the Human Variome Project 
Human mutation  2013;34(4):661-666.
A forum of the Human Variome Project (HVP) was held as a satellite to the 2012 Annual Meeting of the American Society of Human Genetics in San Francisco, California. The theme of this meeting was “Getting Ready for the Human Phenome Project.” Understanding the genetic contribution to both rare single-gene “Mendelian” disorders and more complex common diseases will require integration of research efforts among many fields and better defined phenotypes. The HVP is dedicated to bringing together researchers and research populations throughout the world to provide the resources to investigate the impact of genetic variation on disease. To this end, there needs to be a greater sharing of phenotype and genotype data. For this to occur, many databases that currently exist will need to become interoperable to allow for the combining of cohorts with similar phenotypes to increase statistical power for studies attempting to identify novel disease genes or causative genetic variants. Improved systems and tools that enhance the collection of phenotype data from clinicians are urgently needed. This meeting begins the HVP’s effort toward this important goal.
doi:10.1002/humu.22293
PMCID: PMC4130157  PMID: 23401191
meeting report; database; phenotype; database interoperability
5.  Walking the interactome for candidate prioritization in exome sequencing studies of Mendelian diseases 
Bioinformatics  2014;30(22):3215-3222.
Motivation: Whole-exome sequencing (WES) has opened up previously unheard of possibilities for identifying novel disease genes in Mendelian disorders, only about half of which have been elucidated to date. However, interpretation of WES data remains challenging.
Results: Here, we analyze protein–protein association (PPA) networks to identify candidate genes in the vicinity of genes previously implicated in a disease. The analysis, using a random-walk with restart (RWR) method, is adapted to the setting of WES by developing a composite variant-gene relevance score based on the rarity, location and predicted pathogenicity of variants and the RWR evaluation of genes harboring the variants. Benchmarking using known disease variants from 88 disease-gene families reveals that the correct gene is ranked among the top 10 candidates in ≥50% of cases, a figure which we confirmed using a prospective study of disease genes identified in 2012 and PPA data produced before that date. We implement our method in a freely available Web server, ExomeWalker, that displays a ranked list of candidates together with information on PPAs, frequency and predicted pathogenicity of the variants to allow quick and effective searches for candidates that are likely to reward closer investigation.
Availability and implementation: http://compbio.charite.de/ExomeWalker
Contact: peter.robinson@charite.de
doi:10.1093/bioinformatics/btu508
PMCID: PMC4221119  PMID: 25078397
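Illustrative sketch (not the ExomeWalker implementation): the random walk with restart named in the abstract above can be written as the iteration p(t+1) = (1 - r) W p(t) + r p(0), where W is the column-normalised adjacency matrix of the network, p(0) spreads probability evenly over the seed genes and r is the restart probability. The toy network, the restart value of 0.7 and the function name are assumptions chosen for the example; the composite variant-gene relevance score described in the abstract is not reproduced here.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: square list-of-lists with non-negative edge weights.
    seeds: set of node indices the walker restarts from.
    """
    n = len(adjacency)
    # Column-normalise so that column j holds the transition
    # probabilities out of node j.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path network 0 - 1 - 2 - 3 with node 0 as the only known disease gene.
network = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
rwr_scores = random_walk_with_restart(network, seeds={0})
```

Nodes closer to the seed receive higher scores, which is what lets such a walk rank candidate genes by their vicinity to genes already implicated in a disease.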
6.  Using association rule mining to determine promising secondary phenotyping hypotheses 
Bioinformatics  2014;30(12):i52-i59.
Motivation: Large-scale phenotyping projects such as the Sanger Mouse Genetics project are ongoing efforts to help identify the influences of genes and their modification on phenotypes. Gene–phenotype relations are crucial to the improvement of our understanding of human heritable diseases as well as the development of drugs. However, given that there are ∼20 000 genes in higher vertebrate genomes and the experimental verification of gene–phenotype relations requires a lot of resources, methods are needed that determine good candidates for testing.
Results: In this study, we applied an association rule mining approach to the identification of promising secondary phenotype candidates. The predictions rely on a large gene–phenotype annotation set that is used to find occurrence patterns of phenotypes. Applying an association rule mining approach, we could identify 1967 secondary phenotype hypotheses that cover 244 genes and 136 phenotypes. Using two automated and one manual evaluation strategies, we demonstrate that the secondary phenotype candidates possess biological relevance to the genes they are predicted for. From the results we conclude that the predicted secondary phenotypes constitute good candidates to be experimentally tested and confirmed.
Availability: The secondary phenotype candidates can be browsed through at http://www.sanger.ac.uk/resources/databases/phenodigm/gene/secondaryphenotype/list.
Contact: ao5@sanger.ac.uk or ds5@sanger.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btu260
PMCID: PMC4059059  PMID: 24932005
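Illustrative sketch (not the authors' pipeline): association rule mining over gene-phenotype annotations can be reduced, in its simplest pairwise form, to counting how often phenotype B is annotated to the genes that carry phenotype A. A rule A -> B is kept when its support (number of genes with both) and confidence (support divided by the number of genes with A) pass thresholds, and a gene annotated with A but not B then yields B as a secondary phenotype hypothesis. The gene and phenotype names, the thresholds and the restriction to single-phenotype rules are assumptions for the example.

```python
from itertools import permutations


def mine_pairwise_rules(gene_phenotypes, min_support=2, min_confidence=0.6):
    """Return rules (a, b, support, confidence) meaning 'a implies b'."""
    genes_by_phenotype = {}
    for gene, phenotypes in gene_phenotypes.items():
        for phenotype in phenotypes:
            genes_by_phenotype.setdefault(phenotype, set()).add(gene)

    rules = []
    for a, b in permutations(genes_by_phenotype, 2):
        support = len(genes_by_phenotype[a] & genes_by_phenotype[b])
        confidence = support / len(genes_by_phenotype[a])
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


def secondary_hypotheses(gene_phenotypes, rules):
    """Pair each gene with phenotypes a rule predicts but that are not yet annotated."""
    return {
        (gene, b)
        for gene, phenotypes in gene_phenotypes.items()
        for a, b, _, _ in rules
        if a in phenotypes and b not in phenotypes
    }


# Hypothetical annotation set: gene -> set of observed phenotypes.
annotations = {
    "gene1": {"phenotypeA", "phenotypeB"},
    "gene2": {"phenotypeA", "phenotypeB"},
    "gene3": {"phenotypeA"},
    "gene4": {"phenotypeB", "phenotypeC"},
}
rules = mine_pairwise_rules(annotations)
hypotheses = secondary_hypotheses(annotations, rules)
```

Here `gene3` carries phenotypeA without phenotypeB, so phenotypeB becomes a candidate for follow-up testing in that gene.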
7.  The influence of disease categories on gene candidate predictions from model organism phenotypes 
Journal of Biomedical Semantics  2014;5(Suppl 1):S4.
Background
The molecular etiology is still to be identified for about half of the currently described Mendelian diseases in humans, thereby hindering efforts to find treatments or preventive measures. Advances, such as new sequencing technologies, have led to increasing amounts of data becoming available with which to address the problem of identifying disease genes. Therefore, automated methods are needed that reliably predict disease gene candidates based on available data. We have recently developed Exomiser as a tool for identifying causative variants from exome analysis results by filtering and prioritising using a number of criteria including the phenotype similarity between the disease and mouse mutants involving the gene candidates. Initial investigations revealed a variation in performance for different medical categories of disease, due in part to a varying contribution of the phenotype scoring component.
Results
In this study, we further analyse the performance of our cross-species phenotype matching algorithm, and examine in more detail the reasons why disease gene filtering based on phenotype data works better for certain disease categories than others. We found that in addition to misleading phenotype alignments between species, some disease categories are still more amenable to automated predictions than others, and that this often ties in with community perceptions on how well the organism works as a model.
Conclusions
In conclusion, our automated disease gene candidate predictions are highly dependent on the organism used for the predictions and the disease category being studied. Future work on computational disease gene prediction using phenotype data would benefit from methods that take into account the disease category and the source of model organism data.
doi:10.1186/2041-1480-5-S1-S4
PMCID: PMC4108905  PMID: 25093073
8.  Linking tissues to phenotypes using gene expression profiles 
Despite great biological and computational efforts to determine the genetic causes underlying human heritable diseases, approximately half (3500) of these diseases are still without an identified genetic cause. Model organism studies allow the targeted modification of the genome and can help with the identification of genetic causes for human diseases. Targeted modifications have led to a vast amount of model organism data. However, these data are scattered across different databases, preventing an integrated view and missing out on contextual information. Only once we are able to combine all the existing resources will we be able to fully understand the causes underlying a disease and how species differ. Here, we present an integrated data resource combining tissue expression with phenotypes in mouse lines and bringing us one step closer to consequence chains from a molecular level to a resulting phenotype. Mutations in genes often manifest in phenotypes in the same tissue that the gene is expressed in. However, in other cases, a systems level approach is required to understand how perturbations to gene-networks connecting multiple tissues lead to a phenotype. Automated evaluation of the predicted tissue–phenotype associations reveals that 72–76% of the phenotypes are associated with disruption of genes expressed in the affected tissue. However, 55–64% of the individual phenotype-tissue associations show spatially separated gene expression and phenotype manifestation. For example, we see a correlation between ‘total body fat’ abnormalities and genes expressed in the ‘brain’, which fits recent discoveries linking genes expressed in the hypothalamus to obesity. Finally, we demonstrate that the use of our predicted tissue–phenotype associations can improve the detection of a known disease–gene association when combined with a disease gene candidate prediction tool.
For example, JAK2, the known gene associated with Familial Erythrocytosis 1, rises from the seventh best candidate to the top hit when the associated tissues are taken into consideration. Database URL: http://www.sanger.ac.uk/resources/databases/phenodigm/phenotype/list
doi:10.1093/database/bau017
PMCID: PMC3982582  PMID: 24634472
9.  Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research 
F1000Research  2014;2:30.
Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species.
We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains classes from the Human Phenotype Ontology, Mammalian Phenotype Ontology, and generated classes for zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases.
This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.
doi:10.12688/f1000research.2-30.v2
PMCID: PMC3799545  PMID: 24358873
10.  The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data 
Nucleic Acids Research  2013;42(Database issue):D966-D974.
The Human Phenotype Ontology (HPO) project, available at http://www.human-phenotype-ontology.org, provides a structured, comprehensive and well-defined set of 10,088 classes (terms) describing human phenotypic abnormalities and 13,326 subclass relations between the HPO classes. In addition we have developed logical definitions for 46% of all HPO classes using terms from ontologies for anatomy, cell types, function, embryology, pathology and other domains. This allows interoperability with several resources, especially those containing phenotype information on model organisms such as mouse and zebrafish. Here we describe the updated HPO database, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO. Various meta-attributes such as frequency, references and negations are associated with each annotation. Several large-scale projects worldwide utilize the HPO for describing phenotype information in their datasets. We have therefore generated equivalence mappings to other phenotype vocabularies such as LDDB, Orphanet, MedDRA, UMLS and phenoDB, allowing integration of existing datasets and interoperability with multiple biomedical resources. We have created various ways to access the HPO database content using flat files, a MySQL database, and Web-based tools. All data and documentation on the HPO project can be found online.
doi:10.1093/nar/gkt1026
PMCID: PMC3965098  PMID: 24217912
11.  The International Mouse Phenotyping Consortium Web Portal, a unified point of access for knockout mice and related phenotyping data 
Nucleic Acids Research  2013;42(Database issue):D802-D809.
The International Mouse Phenotyping Consortium (IMPC) web portal (http://www.mousephenotype.org) provides the biomedical community with a unified point of access to mutant mice and a rich collection of related emerging and existing mouse phenotype data. IMPC mouse clinics worldwide follow rigorous, highly structured and standardized protocols for the experimentation, collection and dissemination of data. Dedicated ‘data wranglers’ work with each phenotyping center to collate data and perform quality control of data. An automated statistical analysis pipeline has been developed to identify knockout strains with a significant change in the phenotype parameters. Annotation with biomedical ontologies allows biologists and clinicians to easily find mouse strains with phenotypic traits relevant to their research. Data integration with other resources will provide insights into mammalian gene function and human disease. As phenotype data become available for every gene in the mouse, the IMPC web portal will become an invaluable tool for researchers studying the genetic contributions of genes to human diseases.
doi:10.1093/nar/gkt977
PMCID: PMC3964955  PMID: 24194600
12.  Beyond knockouts: cre resources for conditional mutagenesis 
With the effort of the International Mouse Phenotyping Consortium (IMPC) to produce thousands of strains with conditional potential gathering steam, there is growing recognition that it must be supported by a rich toolbox of cre driver strains. The approaches to build cre strains have evolved in both sophistication and reliability, replacing first generation strains with tools that can target individual cell populations with incredible precision and specificity. The modest set of cre drivers generated by individual labs over the past 15+ years is now growing rapidly, thanks to a number of large-scale projects to produce new cre strains for the community. The power of this growing resource, however, depends upon the proper deep characterization of strain function, as even the best designed strain can display a variety of undesirable features that must be considered in experimental design. This must be coupled with the parallel development of informatics tools to provide functional data to the user, and facilitated access to the strains through public repositories. We will discuss the current progress on all of these fronts and the challenges that remain to ensure the scientific community can capitalize on the tremendous number of mouse resources at their disposal.
doi:10.1007/s00335-012-9430-2
PMCID: PMC3655717  PMID: 22926223
13.  PhenoDigm: analyzing curated annotations to associate animal models with human diseases 
The ultimate goal of studying model organisms is to translate what is learned into useful knowledge about normal human biology and disease to facilitate treatment and early screening for diseases. Recent advances in genomic technologies allow for rapid generation of models with a range of targeted genotypes as well as their characterization by high-throughput phenotyping. As an abundance of phenotype data become available, only systematic analysis will allow valid conclusions to be drawn from these data and transferred to human diseases. Owing to the volume of data, automated methods are preferable, allowing for a reliable analysis of the data and providing evidence about possible gene–disease associations.
Here, we propose Phenotype comparisons for DIsease Genes and Models (PhenoDigm), as an automated method to provide evidence about gene–disease associations by analysing phenotype information. PhenoDigm integrates data from a variety of model organisms and, at the same time, uses several intermediate scoring methods to identify only strongly data-supported gene candidates for human genetic diseases. We show results of an automated evaluation as well as selected manually assessed examples that support the validity of PhenoDigm. Furthermore, we provide guidance on how to browse the data with PhenoDigm’s web interface and illustrate its usefulness in supporting research.
Database URL: http://www.sanger.ac.uk/resources/databases/phenodigm
doi:10.1093/database/bat025
PMCID: PMC3649640  PMID: 23660285
14.  MouseFinder: candidate disease genes from mouse phenotype data 
Human Mutation  2012;33(5):858-866.
Mouse phenotype data represents a valuable resource for the identification of disease-associated genes, especially where the molecular basis is unknown and there is no clue to the candidate gene’s function, pathway involvement or expression pattern. However, until recently these data have not been systematically used due to difficulties in mapping between clinical features observed in humans and mouse phenotype annotations. Here, we describe a semantic approach to solve this problem and demonstrate highly significant recall of known disease-gene associations and orthology relationships. A web application (MouseFinder; www.mousemodels.org) has been developed to allow users to search the results of our whole-phenome comparison of human and mouse. We demonstrate its use in identifying ARTN as a strong candidate gene within the 1p34.1-p32 mapped locus for a hereditary form of ptosis.
doi:10.1002/humu.22051
PMCID: PMC3327758  PMID: 22331800
phenotype; candidate disease genes; model organism; mouse
15.  TRACER: a resource to study the regulatory architecture of the mouse genome 
BMC Genomics  2013;14:215.
Background
Mammalian genes are regulated through the action of multiple regulatory elements, often distributed across large regions. The mechanisms that control the integration of these diverse inputs into specific gene expression patterns are still poorly understood. New approaches enabling the dissection of these mechanisms in vivo are needed.
Results
Here, we describe TRACER (http://tracerdatabase.embl.de), a resource that centralizes information from a large on-going functional exploration of the mouse genome with different transposon-associated regulatory sensors. Hundreds of insertions have been mapped to specific genomic positions, and their corresponding regulatory potential has been documented by analysis of the expression of the reporter sensor gene in mouse embryos. The data can be easily accessed and provides information on the regulatory activities present in a large number of genomic regions, notably in gene-poor intervals that have been associated with human diseases.
Conclusions
TRACER data enables comparisons with the expression pattern of neighbouring genes, activity of surrounding regulatory elements or with other genomic features, revealing the underlying regulatory architecture of these loci. TRACER mouse lines can also be requested for in vivo transposition and chromosomal engineering, to analyse further regions of interest.
doi:10.1186/1471-2164-14-215
PMCID: PMC3618316  PMID: 23547943
Gene regulation and expression; Genome organisation; Regulatory landscapes; Chromosomal engineering; Mouse models of human structural variation
16.  Phenotypic overlap in the contribution of individual genes to CNV pathogenicity revealed by cross-species computational analysis of single-gene mutations in humans, mice and zebrafish 
Disease Models & Mechanisms  2012;6(2):358-372.
SUMMARY
Numerous disease syndromes are associated with regions of copy number variation (CNV) in the human genome and, in most cases, the pathogenicity of the CNV is thought to be related to altered dosage of the genes contained within the affected segment. However, establishing the contribution of individual genes to the overall pathogenicity of CNV syndromes is difficult and often relies on the identification of potential candidates through manual searches of the literature and online resources. We describe here the development of a computational framework to comprehensively search phenotypic information from model organisms and single-gene human hereditary disorders, and thus speed the interpretation of the complex phenotypes of CNV disorders. There are currently more than 5000 human genes about which nothing is known phenotypically but for which detailed phenotypic information for the mouse and/or zebrafish orthologs is available. Here, we present an ontology-based approach to identify similarities between human disease manifestations and the mutational phenotypes in characterized model organism genes; this approach can therefore be used even in cases where there is little or no information about the function of the human genes. We applied this algorithm to detect candidate genes for 27 recurrent CNV disorders and identified 802 gene-phenotype associations, approximately half of which involved genes that were previously reported to be associated with individual phenotypic features and half of which were novel candidates. A total of 431 associations were made solely on the basis of model organism phenotype data. Additionally, we observed a striking, statistically significant tendency for individual disease phenotypes to be associated with multiple genes located within a single CNV region, a phenomenon that we denote as pheno-clustering. 
Many of the clusters also display statistically significant similarities in protein function or vicinity within the protein-protein interaction network. Our results provide a basis for understanding previously un-interpretable genotype-phenotype correlations in pathogenic CNVs and for mobilizing the large amount of model organism phenotype data to provide insights into human genetic disorders.
doi:10.1242/dmm.010322
PMCID: PMC3597018  PMID: 23104991
17.  Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research 
F1000Research  2013;2:30.
Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species.
We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases.
This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.
doi:10.12688/f1000research.2-30.v1
PMCID: PMC3799545  PMID: 24358873
18.  The mammalian gene function resource: the international knockout mouse consortium 
Bradley, Allan | Anastassiadis, Konstantinos | Ayadi, Abdelkader | Battey, James F. | Bell, Cindy | Birling, Marie-Christine | Bottomley, Joanna | Brown, Steve D. | Bürger, Antje | Bult, Carol J. | Bushell, Wendy | Collins, Francis S. | Desaintes, Christian | Doe, Brendan | Economides, Aris | Eppig, Janan T. | Finnell, Richard H. | Fletcher, Colin | Fray, Martin | Frendewey, David | Friedel, Roland H. | Grosveld, Frank G. | Hansen, Jens | Hérault, Yann | Hicks, Geoffrey | Hörlein, Andreas | Houghton, Richard | Hrabé de Angelis, Martin | Huylebroeck, Danny | Iyer, Vivek | de Jong, Pieter J. | Kadin, James A. | Kaloff, Cornelia | Kennedy, Karen | Koutsourakis, Manousos | Kent Lloyd, K. C. | Marschall, Susan | Mason, Jeremy | McKerlie, Colin | McLeod, Michael P. | von Melchner, Harald | Moore, Mark | Mujica, Alejandro O. | Nagy, Andras | Nefedov, Mikhail | Nutter, Lauryl M. | Pavlovic, Guillaume | Peterson, Jane L. | Pollock, Jonathan | Ramirez-Solis, Ramiro | Rancourt, Derrick E. | Raspa, Marcello | Remacle, Jacques E. | Ringwald, Martin | Rosen, Barry | Rosenthal, Nadia | Rossant, Janet | Ruiz Noppinger, Patricia | Ryder, Ed | Schick, Joel Zupicich | Schnütgen, Frank | Schofield, Paul | Seisenberger, Claudia | Selloum, Mohammed | Simpson, Elizabeth M. | Skarnes, William C. | Smedley, Damian | Stanford, William L. | Francis Stewart, A. | Stone, Kevin | Swan, Kate | Tadepally, Hamsa | Teboul, Lydia | Tocchini-Valentini, Glauco P. | Valenzuela, David | West, Anthony P. | Yamamura, Ken-ichi | Yoshinaga, Yuko | Wurst, Wolfgang
Mammalian Genome  2012;23(9-10):580-586.
In 2007, the International Knockout Mouse Consortium (IKMC) made the ambitious promise to generate mutations in virtually every protein-coding gene of the mouse genome in a concerted worldwide action. Now, 5 years later, the IKMC members have developed high-throughput gene trapping and, in particular, gene-targeting pipelines and generated more than 17,400 mutant murine embryonic stem (ES) cell clones and more than 1,700 mutant mouse strains, most of them conditional. A common IKMC web portal (www.knockoutmouse.org) has been established, allowing easy access to this unparalleled biological resource. The IKMC materials considerably enhance functional gene annotation of the mammalian genome and will have a major impact on future biomedical research.
doi:10.1007/s00335-012-9422-2
PMCID: PMC3463800  PMID: 22968824
19.  CreZOO—the European virtual repository of Cre and other targeted conditional driver strains 
The CreZOO (http://www.crezoo.org/) is the European virtual repository of Cre and other targeted conditional driver strains. These mice serve as tools for researchers to selectively ‘switch off’ gene expression in mouse models to examine gene function and disease pathology. CreZOO aims to capture and disseminate extant and new information on these Cre driver strains, such as genetic background and availability information, and details pertaining to promoter, allele, inducibility and expression patterns, which are also presented. All transgenic strains carry detailed information according to MGI's official nomenclature, whereas their availability [e.g. live mice, cryopreserved embryos, sperm and embryonic stem (ES) cells] is clearly indicated with links to European and International databases and repositories (EMMA, MGI/IMSR, MMRRC, etc.) and laboratories where the particular mouse strain is available together with the respective IDs. Each promoter/gene includes IDs and direct links to MGI, Entrez Gene, Ensembl, OMIM and RGD databases depending on their species origin, whereas allele information is presented with MGI IDs and active hyperlinks to redirect the user to the respective page in a new tab. The tissue/cell (spatial) and developmental (temporal) specificity expression patterns are clearly presented, whereas handling and genotyping details (in the form of documents or hyperlinks) together with all relevant publications are listed with PMID(s) and direct PubMed links. CreZOO's design offers a user-friendly query interface and provides instant access to the list of conditional driver strains, promoters and inducibility details. Database access is free of charge and there are no registration requirements for data querying. CreZOO is being developed in the context of the CREATE consortium (http://www.creline.org/), a core of major European and international mouse database holders and research groups involved in conditional mutagenesis.
Database URL: http://www.crezoo.org/; alternative URL: http://www.e-mouse.org/
doi:10.1093/database/bas029
PMCID: PMC3381224  PMID: 22730454
20.  BioMart as an integration solution for the International Knockout Mouse Consortium 
In this article, we describe the use of the BioMart data management system to provide integrated access to International Knockout Mouse Consortium (IKMC) data and other related mouse resources. The IKMC is currently mutating all mouse protein-coding genes in embryonic stem (ES) cells using gene targeting and gene trapping approaches. The BioMart portal allows researchers to identify and obtain IKMC knockout vectors, ES cells and mice for genes of interest. Gene annotation, expression, phenotype and disease data is also integrated from external BioMarts, allowing selection of IKMC products by a wide variety of criteria. These products are invaluable for researchers involved in the elucidation of gene function and the role of individual genes in human disease. Here, we describe these datasets in more detail and illustrate the functionality of the portal using several examples.
Database URL: http://www.knockoutmouse.org/mart
doi:10.1093/database/bar028
PMCID: PMC3263594  PMID: 21930503
21.  BioMart Central Portal: an open database network for the biological community 
BioMart Central Portal is a first-of-its-kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, ontology information and more. Anybody can contribute an independently maintained resource to the Central Portal, allowing it to be exposed to and shared with the research community, and linking it with the other resources in the portal. Users can take advantage of the common interface to quickly utilize different sources without learning a new system for each. The system also simplifies cross-database searches that might otherwise require several complicated steps. Several integrated tools streamline common tasks, such as converting between ID formats and retrieving sequences. The combination of a wide variety of databases, an easy-to-use interface, robust programmatic access and the array of tools makes Central Portal a one-stop shop for biological data querying. Here, we describe the structure of Central Portal and show example queries to demonstrate its capabilities.
Database URL: http://central.biomart.org.
doi:10.1093/database/bar041
PMCID: PMC3263598  PMID: 21930507
22.  Towards BioDBcore: a community-defined information specification for biological databases 
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
doi:10.1093/database/baq027
PMCID: PMC3017395  PMID: 21205783
23.  Towards BioDBcore: a community-defined information specification for biological databases 
Nucleic Acids Research  2010;39(Database issue):D7-D10.
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
doi:10.1093/nar/gkq1173
PMCID: PMC3013734  PMID: 21097465
24.  The IKMC web portal: a central point of entry to data and resources from the International Knockout Mouse Consortium 
Nucleic Acids Research  2010;39(Database issue):D849-D855.
The International Knockout Mouse Consortium (IKMC) aims to mutate all protein-coding genes in the mouse using a combination of gene targeting and gene trapping in mouse embryonic stem (ES) cells and to make the generated resources readily available to the research community. The IKMC database and web portal (www.knockoutmouse.org) serves as the central public web site for IKMC data and facilitates the coordination and prioritization of work within the consortium. Researchers can access up-to-date information on IKMC knockout vectors, ES cells and mice for specific genes, and follow links to the respective repositories from which corresponding IKMC products can be ordered. Researchers can also use the web site to nominate genes for targeting, or to indicate that targeting of a gene should receive high priority. The IKMC database provides data to, and features extensive interconnections with, other community databases.
doi:10.1093/nar/gkq879
PMCID: PMC3013768  PMID: 20929875
25.  Finding and sharing: new approaches to registries of databases and services for the biomedical sciences 
The recent explosion of biological data and the concomitant proliferation of distributed databases make it challenging for biologists and bioinformaticians to discover the best data resources for their needs, and the most efficient way to access and use them. Despite a rapid acceleration in uptake of syntactic and semantic standards for interoperability, it is still difficult for users to find which databases support the standards and interfaces that they need. To solve these problems, several groups are developing registries of databases that capture key metadata describing the biological scope, utility, accessibility, ease of use and existence of web services allowing interoperability between resources. Here, we describe some of these initiatives, including a novel formalism, the Database Description Framework, for describing database operations and functionality and encouraging good database practice. We expect such approaches will result in improved discovery, uptake and utilization of data resources.
Database URL: http://www.casimir.org.uk/casimir_ddf
doi:10.1093/database/baq014
PMCID: PMC2911849  PMID: 20627863
