1.  The Resource Identification Initiative: A cultural shift in publishing 
A central tenet in support of research reproducibility is the ability to uniquely identify research resources, i.e., reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to identify the exact resources that are reported or to answer basic questions such as “How did other studies use resource X?” To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the methods sections of papers and thereby improve identifiability and scientific reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their manuscripts prior to publication for three resource types: antibodies, model organisms, and tools (i.e. software and databases). RRIDs are assigned by an authoritative database, for example a model organism database, for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central web portal (http://scicrunch.org/resources). RRIDs meet three key criteria: they are machine readable, free to generate and access, and are consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 papers have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40 with RRIDs appearing in 62 different journals to date. Here, we present an overview of the pilot project and its outcomes to date. We show that authors are able to identify resources and are supportive of the goals of the project. 
Identifiability of the resources post-pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on identifiability of research resources.
doi:10.1002/cne.23913
PMCID: PMC4684178  PMID: 26599696
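Because RRIDs are described as machine readable, a methods section can in principle be scanned for them automatically. The sketch below is a minimal illustration under stated assumptions: it assumes an RRID appears in text as the prefix `RRID:` followed by an accession such as `AB_123456` or `SCR_000001`. That syntax, the regular expression, the function name `extract_rrids`, and the sample sentence with its made-up accession numbers are not taken from the abstract.

```python
import re

# Assumed syntax (not stated in the abstract): "RRID:" followed by an accession
# built from letters, digits, underscores, colons, or hyphens.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+[_:][A-Za-z0-9_:\-]*[A-Za-z0-9])")


def extract_rrids(text):
    """Return the unique accessions found in text, in order of first appearance."""
    seen = []
    for match in RRID_PATTERN.finditer(text):
        accession = match.group(1)
        if accession not in seen:
            seen.append(accession)
    return seen


# Hypothetical methods-section text; the accession numbers are invented.
methods = (
    "Sections were stained with a primary antibody (RRID:AB_123456) and images "
    "were processed with analysis software (RRID:SCR_000001). "
    "Antibody RRID:AB_123456 was reused in the second experiment."
)
print(extract_rrids(methods))
```

A fixed prefix is what makes this kind of scan practical: the same pattern works regardless of publisher or journal.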
2.  The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species 
Nucleic Acids Research  2016;45(Database issue):D712-D722.
The correlation of phenotypic outcomes with genetic variation and environmental factors is a core pursuit in biology and biomedicine. Numerous challenges impede our progress: patient phenotypes may not match known diseases, candidate variants may be in genes that have not been characterized, model organisms may not recapitulate human or veterinary diseases, filling evolutionary gaps is difficult, and many resources must be queried to find potentially significant genotype–phenotype associations. Non-human organisms have proven instrumental in revealing biological mechanisms. Advanced informatics tools can identify phenotypically relevant disease models in research and diagnostic contexts. Large-scale integration of model organism and clinical research data can provide a breadth of knowledge not available from individual sources and can provide contextualization of data back to these sources. The Monarch Initiative (monarchinitiative.org) is a collaborative, open science effort that aims to semantically integrate genotype–phenotype data from many species and sources in order to support precision medicine, disease modeling, and mechanistic exploration. Our integrated knowledge graph, analytic tools, and web services enable diverse users to explore relationships between phenotypes and genotypes across species.
doi:10.1093/nar/gkw1128
PMCID: PMC5210586  PMID: 27899636
3.  The Human Phenotype Ontology in 2017 
Nucleic Acids Research  2016;45(Database issue):D865-D876.
Deep phenotyping has been defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described. The three components of the Human Phenotype Ontology (HPO; www.human-phenotype-ontology.org) project are the phenotype vocabulary, disease-phenotype annotations and the algorithms that operate on these. These components are being used for computational deep phenotyping and precision medicine as well as integration of clinical data into translational research. The HPO is being increasingly adopted as a standard for phenotypic abnormalities by diverse groups such as international rare disease organizations, registries, clinical labs, biomedical resources, and clinical software tools and will thereby contribute toward nascent efforts at global data exchange for identifying disease etiologies. This update article reviews the progress of the HPO project since the debut Nucleic Acids Research database article in 2014, including specific areas of expansion such as common (complex) disease, new algorithms for phenotype driven genomic discovery and diagnostics, integration of cross-species mapping efforts with the Mammalian Phenotype Ontology, an improved quality control pipeline, and the addition of patient-friendly terminology.
doi:10.1093/nar/gkw1039
PMCID: PMC5210535  PMID: 27899602
4.  Distributed Cognition and Process Management Enabling Individualized Translational Research: The NIH Undiagnosed Diseases Program Experience 
The National Institutes of Health Undiagnosed Diseases Program (NIH UDP) applies translational research systematically to diagnose patients with undiagnosed diseases. The challenge is to implement an information system enabling scalable translational research. The authors hypothesized that similar complex problems are resolvable through process management and the distributed cognition of communities. The team, therefore, built the NIH UDP integrated collaboration system (UDPICS) to form virtual collaborative multidisciplinary research networks or communities. UDPICS supports these communities through integrated process management, ontology-based phenotyping, biospecimen management, cloud-based genomic analysis, and an electronic laboratory notebook. UDPICS provided a mechanism for efficient, transparent, and scalable translational research and thereby addressed many of the complex and diverse research and logistical problems of the NIH UDP. Full definition of the strengths and deficiencies of UDPICS will require formal qualitative and quantitative usability and process improvement measurement.
doi:10.3389/fmed.2016.00039
PMCID: PMC5060938  PMID: 27785453
translational research; information system; ontology-based phenotyping; process management system; precision medicine
5.  The Matchmaker Exchange: A Platform for Rare Disease Gene Discovery 
Human Mutation  2015;36(10):915-921.
There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for “the needle in a haystack” to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease-specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can “match” these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. Three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow.
doi:10.1002/humu.22858
PMCID: PMC4610002  PMID: 26295439
Matchmaking; rare disease; genomic API; gene discovery; Matchmaker Exchange; GA4GH; IRDiRC
6.  The Resource Identification Initiative: A cultural shift in publishing 
A central tenet in support of research reproducibility is the ability to uniquely identify research resources, i.e., reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to identify the exact resources that are reported or to answer basic questions such as “How did other studies use resource X?” To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the Methods sections of articles and thereby improve identifiability and scientific reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their articles prior to publication for three resource types: antibodies, model organisms, and tools (i.e., software and databases). RRIDs are assigned by an authoritative database, for example, a model organism database for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central Web portal (http://scicrunch.org/resources). RRIDs meet three key criteria: they are machine‐readable, free to generate and access, and are consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 articles have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40, with RRIDs appearing in 62 different journals to date. Here we present an overview of the pilot project and its outcomes to date. We show that authors are able to identify resources and are supportive of the goals of the project. 
Identifiability of the resources post‐pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on identifiability of research resources. J. Comp. Neurol. 524:8–22, 2016. © 2015 The Authors The Journal of Comparative Neurology Published by Wiley Periodicals, Inc.
doi:10.1002/cne.23913
PMCID: PMC4684178  PMID: 26599696
research resources; Resource Identification Initiative; identifiability
7.  The health care and life sciences community profile for dataset descriptions 
PeerJ  2016;4:e2331.
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.
doi:10.7717/peerj.2331
PMCID: PMC4991880  PMID: 27602295
Data profiling; Dataset descriptions; Metadata; Provenance; FAIR data
8.  Summarizing and Visualizing Structural Changes during the Evolution of Biomedical Ontologies Using a Diff Abstraction Network 
Biomedical ontologies are a critical component in biomedical research and practice. As an ontology evolves, its structure and content change in response to additions, deletions and updates. When editing a biomedical ontology, small local updates may affect large portions of the ontology, leading to unintended and potentially erroneous changes. Such unwanted side effects often go unnoticed since biomedical ontologies are large and complex knowledge structures. Abstraction networks, which provide compact summaries of an ontology’s content and structure, have been used to uncover structural irregularities, inconsistencies and errors in ontologies. In this paper, we introduce Diff Abstraction Networks (“Diff AbNs”), compact networks that summarize and visualize global structural changes due to ontology editing operations that result in a new ontology release. A Diff AbN can be used to support curators in identifying unintended and unwanted ontology changes. The derivation of two Diff AbNs, the Diff Area Taxonomy and the Diff Partial-area Taxonomy, is explained and Diff Partial-area Taxonomies are derived and analyzed for the Ontology of Clinical Research, Sleep Domain Ontology, and Eagle-I Research Resource Ontology. The use of Diff Taxonomies for identifying unintended erroneous consequences of quality assurance and ontology merging is demonstrated.
doi:10.1016/j.jbi.2015.05.018
PMCID: PMC4532611  PMID: 26048076
ontology version change; summarizing ontology change; ontology quality assurance; summarizing ontology evolution; visualizing ontology evolution; abstraction networks; ontology diff
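To make the idea of structural change between releases concrete, the sketch below computes a raw class-level diff between two toy ontology releases. It is only an illustration of the kind of change such a summary would start from; it does not derive a Diff Area Taxonomy or a Diff Partial-area Taxonomy. The representation (a dictionary mapping each class to its set of parents), the function name `diff_releases`, and all class names are assumptions made for this example.

```python
def diff_releases(old, new):
    """Compare two releases given as {class_name: set_of_parent_names}."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # Classes present in both releases whose parent sets differ.
    reparented = sorted(
        name for name in set(old) & set(new) if old[name] != new[name]
    )
    return {"added": added, "removed": removed, "reparented": reparented}


# Hypothetical toy releases; the class names are invented for illustration.
old_release = {
    "Resource": set(),
    "Instrument": {"Resource"},
    "Microscope": {"Instrument"},
    "Reagent": {"Resource"},
}
new_release = {
    "Resource": set(),
    "Instrument": {"Resource"},
    "Microscope": {"Resource"},  # moved directly under the root
    "Reagent": {"Resource"},
    "Antibody": {"Reagent"},     # new class
}
print(diff_releases(old_release, new_release))
```

On a real ontology a list like this can run to thousands of entries, which is the motivation for a compact summary network.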
9.  Laying a Community-Based Foundation for Data-Driven Semantic Standards in Environmental Health Sciences 
Environmental Health Perspectives  2016;124(8):1136-1140.
Background:
Despite increasing availability of environmental health science (EHS) data, development and implementation of relevant semantic standards, such as ontologies or hierarchical vocabularies, have lagged. Consequently, integration and analysis of information needed to better model environmental influences on human health remains a significant challenge.
Objectives:
We aimed to identify a committed community and mechanisms needed to develop EHS semantic standards that will advance understanding about the impacts of environmental exposures on human disease.
Methods:
The National Institute of Environmental Health Sciences sponsored the “Workshop for the Development of a Framework for Environmental Health Science Language” hosted at North Carolina State University on 15–16 September 2014. Through the assembly of data generators, users, publishers, and funders, we aimed to develop a foundation for enabling the development of community-based and data-driven standards that will ultimately improve standardization, sharing, and interoperability of EHS information.
Discussion:
Creating and maintaining an EHS common language is a continuous and iterative process, requiring community building around research interests and needs, enabling integration and reuse of existing data, and providing a low barrier of access for researchers needing to use or extend such a resource.
Conclusions:
Recommendations included developing a community-supported web-based toolkit that would enable a) collaborative development of EHS research questions and use cases, b) construction of user-friendly tools for searching and extending existing semantic resources, c) education and guidance about standards and their implementation, and d) creation of a plan for governance and sustainability.
Citation:
Mattingly CJ, Boyles R, Lawler CP, Haugen AC, Dearry A, Haendel M. 2016. Laying a community-based foundation for data-driven semantic standards in environmental health sciences. Environ Health Perspect 124:1136–1140; http://dx.doi.org/10.1289/ehp.1510438
doi:10.1289/ehp.1510438
PMCID: PMC4977056  PMID: 26871594
10.  The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability 
Background
The Cell Ontology (CL) is an OBO Foundry candidate ontology covering the domain of canonical, natural biological cell types. Since its inception in 2005, the CL has undergone multiple rounds of revision and expansion, most notably in its representation of hematopoietic cells. For in vivo cells, the CL focuses on vertebrates but provides general classes that can be used for other metazoans, which can be subtyped in species-specific ontologies.
Construction and content
Recent work on the CL has focused on extending the representation of various cell types, and developing new modules in the CL itself, and in related ontologies in coordination with the CL. For example, the Kidney and Urinary Pathway Ontology was used as a template to populate the CL with additional cell types. In addition, subtypes of the class ‘cell in vitro’ have received improved definitions and labels to provide for modularity with the representation of cells in the Cell Line Ontology and Reagent Ontology. Recent changes in the ontology development methodology for CL include a switch from OBO to OWL for the primary encoding of the ontology, and an increasing reliance on logical definitions for improved reasoning.
Utility and discussion
The CL is now mandated as a metadata standard for large functional genomics and transcriptomics projects, and is used extensively for annotation, querying, and analyses of cell type specific data in sequencing consortia such as FANTOM5 and ENCODE, as well as for the NIAID ImmPort database and the Cell Image Library. The CL is also a vital component used in the modular construction of other biomedical ontologies—for example, the Gene Ontology and the cross-species anatomy ontology, Uberon, use CL to support the consistent representation of cell types across different levels of anatomical granularity, such as tissues and organs.
Conclusions
The ongoing improvements to the CL make it a valuable resource to both the OBO Foundry community and the wider scientific community, and we continue to experience increased interest in the CL both among developers and within the user community.
doi:10.1186/s13326-016-0088-7
PMCID: PMC4932724  PMID: 27377652
11.  Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency 
Genetics in Medicine  2015;18(6):608-617.
Purpose:
Medical diagnosis and molecular or biochemical confirmation typically rely on the knowledge of the clinician. Although this is very difficult in extremely rare diseases, we hypothesized that the recording of patient phenotypes in Human Phenotype Ontology (HPO) terms and computationally ranking putative disease-associated sequence variants improves diagnosis, particularly for patients with atypical clinical profiles.
Genet Med 18 6, 608–617.
Methods:
Using simulated exomes and the National Institutes of Health Undiagnosed Diseases Program (UDP) patient cohort and associated exome sequence, we tested our hypothesis using Exomiser. Exomiser ranks candidate variants based on patient phenotype similarity to (i) known disease–gene phenotypes, (ii) model organism phenotypes of candidate orthologs, and (iii) phenotypes of protein–protein association neighbors.
Results:
Benchmarking showed Exomiser ranked the causal variant as the top hit in 97% of known disease–gene associations and ranked the correct seeded variant in up to 87% when detectable disease–gene associations were unavailable. Using UDP data, Exomiser ranked the causative variant(s) within the top 10 variants for 11 previously diagnosed variants and achieved a diagnosis for 4 of 23 cases undiagnosed by clinical evaluation.
Conclusion:
Structured phenotyping of patients and computational analysis are effective adjuncts for diagnosing patients with genetic disorders.
doi:10.1038/gim.2015.137
PMCID: PMC4916229  PMID: 26562225
exome sequencing; model organisms; phenotype; semantic comparison; undiagnosed diseases
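The benchmarking figures in the Results above are top-k rates: the share of cases in which the causal variant was ranked at or above a given position. The sketch below shows how such a rate could be computed from a list of ranks. The function name `top_k_rate` and the ranks themselves are hypothetical, chosen only to illustrate the arithmetic; they are not data from the study.

```python
def top_k_rate(ranks, k):
    """Fraction of cases whose causal variant was ranked at position k or better."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical ranks (1 = best) of the seeded causal variant in eight simulated exomes.
ranks = [1, 1, 3, 1, 12, 1, 2, 1]
print(top_k_rate(ranks, 1))   # share ranked first
print(top_k_rate(ranks, 10))  # share ranked within the top 10
```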
12.  The Ontology for Biomedical Investigations 
PLoS ONE  2016;11(4):e0154556.
The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions. 
The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed in association with OBI. The current release of OBI is available at http://purl.obolibrary.org/obo/obi.owl.
doi:10.1371/journal.pone.0154556
PMCID: PMC4851331  PMID: 27128319
13.  Proceedings of a Sickle Cell Disease Ontology workshop — Towards the first comprehensive ontology for Sickle Cell Disease 
Sickle cell disease (SCD) is a debilitating single gene disorder caused by a single point mutation that results in physical deformation (i.e. sickling) of erythrocytes at reduced oxygen tensions. Up to 75% of SCD in newborns world-wide occurs in sub-Saharan Africa, where neonatal and childhood mortality from sickle cell related complications is high. While SCD research across the globe is tackling the disease on multiple fronts, advances have yet to significantly impact on the health and quality of life of SCD patients, due to lack of coordination of these disparate efforts. Ensuring data across studies is directly comparable through standardization is a necessary step towards realizing this goal. Such a standardization requires the development and implementation of a disease-specific ontology for SCD that is applicable globally. Ontology development is best achieved by bringing together experts in the domain to contribute their knowledge.
The SCD community and H3ABioNet members joined forces at a recent SCD Ontology workshop to develop an ontology covering aspects of SCD under the classes: phenotype, diagnostics, therapeutics, quality of life, disease modifiers and disease stage. The aim of the workshop was for participants to contribute their expertise to development of the structure and contents of the SCD ontology. Here we describe the proceedings of the Sickle Cell Disease Ontology Workshop held in Cape Town, South Africa, in February 2016 and its outcomes. The objective of the workshop was to bring together experts in SCD from around the world to contribute their expertise to the development of various aspects of the SCD ontology.
doi:10.1016/j.atg.2016.03.005
PMCID: PMC4911424  PMID: 27354937
14.  Muscle Logic: New Knowledge Resource for Anatomy Enables Comprehensive Searches of the Literature on the Feeding Muscles of Mammals 
PLoS ONE  2016;11(2):e0149102.
Background
In recent years large bibliographic databases have made much of the published literature of biology available for searches. However, the capabilities of the search engines integrated into these databases for text-based bibliographic searches are limited. To enable searches that deliver the results expected by comparative anatomists, an underlying logical structure known as an ontology is required.
Development and Testing of the Ontology
Here we present the Mammalian Feeding Muscle Ontology (MFMO), a multi-species ontology focused on anatomical structures that participate in feeding and other oral/pharyngeal behaviors. A unique feature of the MFMO is that a simple, computable definition of each muscle, which includes its attachments and innervation, is true across mammals. This construction mirrors the logical foundation of comparative anatomy and permits searches using language familiar to biologists. Further, it provides a template for muscles that will be useful in extending any anatomy ontology. The MFMO is developed to support the Feeding Experiments End-User Database Project (FEED, https://feedexp.org/), a publicly available, online repository for physiological data collected from in vivo studies of feeding (e.g., mastication, biting, swallowing) in mammals. Currently the MFMO is integrated into FEED and also into two literature-specific implementations of Textpresso, a text-mining system that facilitates powerful searches of a corpus of scientific publications. We evaluate the MFMO by asking questions that test the ability of the ontology to return appropriate answers (competency questions). We compare the results of queries of the MFMO to results from similar searches in PubMed and Google Scholar.
Results and Significance
Our tests demonstrate that the MFMO is competent to answer queries formed in the common language of comparative anatomy, but PubMed and Google Scholar are not. Overall, our results show that by incorporating anatomical ontologies into searches, an expanded and anatomically comprehensive set of results can be obtained. The broader scientific and publishing communities should consider taking up the challenge of semantically enabled search capabilities.
doi:10.1371/journal.pone.0149102
PMCID: PMC4752357  PMID: 26870952
15.  The Resource Identification Initiative: a cultural shift in publishing 
Brain and Behavior  2015;6(1):e00417.
Abstract
A central tenet in support of research reproducibility is the ability to uniquely identify research resources, that is, reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to identify the exact resources that are reported or to answer basic questions such as “How did other studies use resource X?” To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the methods sections of papers and thereby improve identifiability and scientific reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their manuscripts prior to publication for three resource types: antibodies, model organisms, and tools (i.e., software and databases). RRIDs are assigned by an authoritative database, for example, a model organism database for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central web portal ( http://scicrunch.org/resources). RRIDs meet three key criteria: they are machine readable, free to generate and access, and are consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 papers have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40 with RRIDs appearing in 62 different journals to date. Here, we present an overview of the pilot project and its outcomes to date. We show that authors are able to identify resources and are supportive of the goals of the project. 
Identifiability of the resources post‐pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on identifiability of research resources.
doi:10.1002/brb3.417
PMCID: PMC4834942  PMID: 27110440
16.  The Resource Identification Initiative: A cultural shift in publishing 
F1000Research  2015;4:134.
A central tenet in support of research reproducibility is the ability to uniquely identify research resources, i.e., reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to allow humans and algorithms to identify the exact resources that are reported or answer basic questions such as “What other studies used resource X?” To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the methods sections of papers and thereby improve identifiability and reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their manuscripts prior to publication for three resource types: antibodies, model organisms, and tools (including software and databases). RRIDs represent accession numbers assigned by an authoritative database, e.g., the model organism databases, for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central web portal ( www.scicrunch.org/resources). RRIDs meet three key criteria: they are machine readable, free to generate and access, and are consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 papers have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40. Here, we present an overview of the pilot project and its outcomes to date. We show that authors are generally accurate in performing the task of identifying resources and supportive of the goals of the project. 
We also show that identifiability of the resources pre- and post-pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on reproducibility relating to research resources.
doi:10.12688/f1000research.6555.2
PMCID: PMC4648211  PMID: 26594330
Resource identifiers; Multi-centre initiative; Publishing; Pre-pilot data; Post-pilot data
17.  Capturing phenotypes for precision medicine 
Deep phenotyping followed by integrated computational analysis of genotype and phenotype is becoming ever more important for many areas of genomic diagnostics and translational research. The overwhelming majority of clinical descriptions in the medical literature are available only as natural language text, meaning that searching, analysis, and integration of medically relevant information in databases such as PubMed is challenging. The new journal Cold Spring Harbor Molecular Case Studies will require authors to select Human Phenotype Ontology terms for research papers that will be displayed alongside the manuscript, thereby providing a foundation for ontology-based indexing and searching of articles that contain descriptions of phenotypic abnormalities—an important step toward improving the ability of researchers and clinicians to get biomedical information that is critical for clinical care or translational research.
doi:10.1101/mcs.a000372
PMCID: PMC4850887  PMID: 27148566
18.  Effective diagnosis of genetic disease by computational phenotype analysis of the disease-associated genome 
Science translational medicine  2014;6(252):252ra123.
Less than half of patients with suspected genetic disease receive a molecular diagnosis. We have therefore integrated next-generation sequencing (NGS), bioinformatics, and clinical data into an effective diagnostic workflow. We used variants in the 2741 established Mendelian disease genes [the disease-associated genome (DAG)] to develop a targeted enrichment DAG panel (7.1 Mb), which achieves a coverage of 20-fold or better for 98% of bases. Furthermore, we established a computational method [Phenotypic Interpretation of eXomes (PhenIX)] that evaluated and ranked variants based on pathogenicity and semantic similarity of patients’ phenotype described by Human Phenotype Ontology (HPO) terms to those of 3991 Mendelian diseases. In computer simulations, ranking genes based on the variant score put the true gene in first place less than 5% of the time; PhenIX placed the correct gene in first place more than 86% of the time. In a retrospective test of PhenIX on 52 patients with previously identified mutations and known diagnoses, the correct gene achieved a mean rank of 2.1. In a prospective study on 40 individuals without a diagnosis, PhenIX analysis enabled a diagnosis in 11 cases (28%, at a mean rank of 2.4). Thus, the NGS of the DAG followed by phenotype-driven bioinformatic analysis allows quick and effective differential diagnostics in medical genetics.
doi:10.1126/scitranslmed.3009262
PMCID: PMC4512639  PMID: 25186178
19.  Clinical interpretation of CNVs with cross-species phenotype data 
Journal of medical genetics  2014;51(11):766-772.
Background
Clinical evaluation of CNVs identified via techniques such as array comparative genome hybridisation (aCGH) involves the inspection of lists of known and unknown duplications and deletions with the goal of distinguishing pathogenic from benign CNVs. A key step in this process is the comparison of the individual's phenotypic abnormalities with those associated with Mendelian disorders of the genes affected by the CNV. However, because little is often known about these human genes, model organism phenotype data can serve as an additional source of information. Currently, almost 6000 genes in mouse and zebrafish are, when knocked out, associated with a phenotype in the model organism, but no disease is known to be caused by mutations in the human ortholog. Yet, searching model organism databases and comparing model organism phenotypes with patient phenotypes for identifying novel disease genes and medical evaluation of CNVs is hindered by the difficulty in integrating phenotype information across species and the lack of appropriate software tools.
Methods
Here, we present an integrated ranking scheme based on phenotypic matching, degree of overlap with known benign or pathogenic CNVs and the haploinsufficiency score for the prioritisation of CNVs responsible for a patient's clinical findings.
Results
We show that this scheme leads to significant improvements compared with rankings that do not exploit phenotypic information. We provide a software tool called PhenogramViz, which supports phenotype-driven interpretation of aCGH findings based on multiple data sources, including the integrated cross-species phenotype ontology Uberpheno, in order to visualise gene-to-phenotype relations.
Conclusions
Integrating and visualising cross-species phenotype information on the affected genes may help in routine diagnostics of CNVs.
doi:10.1136/jmedgenet-2014-102633
PMCID: PMC4501634  PMID: 25280750
20.  Achieving human and machine accessibility of cited data in scholarly publications 
Reproducibility and reusability of research results are important concerns in scientific communication and science policy. A foundational element of reproducibility and reusability is the open and persistently available presentation of research data. However, many common approaches for primary data publication in use today do not achieve sufficient long-term robustness, openness, accessibility or uniformity. Nor do they permit comprehensive exploitation by modern Web technologies. This has led to several authoritative studies recommending uniform direct citation of data archived in persistent repositories. Data are to be considered as first-class scholarly objects, and treated similarly in many ways to cited and archived scientific and scholarly literature. Here we briefly review the most current and widely agreed set of principle-based recommendations for scholarly data citation, the Joint Declaration of Data Citation Principles (JDDCP). We then present a framework for operationalizing the JDDCP, and a set of initial recommendations on identifier schemes, identifier resolution behavior, required metadata elements, and best practices for realizing programmatic machine actionability of cited data. The main target audience for the common implementation guidelines in this article consists of publishers, scholarly organizations, and persistent data repositories, including technical staff members in these organizations. But ordinary researchers can also benefit from these recommendations. The guidance provided here is intended to help achieve widespread, uniform human and machine accessibility of deposited data, in support of significantly improved verification, validation, reproducibility and re-use of scholarly/scientific data.
doi:10.7717/peerj-cs.1
PMCID: PMC4498574  PMID: 26167542
Human–Computer Interaction; Data Science; Digital Libraries; World Wide Web and Web Science; Data citation; Machine accessibility; Data archiving; Data accessibility
21.  Disease insights through cross-species phenotype comparisons 
Mammalian Genome  2015;26(9-10):548-555.
New sequencing technologies have ushered in a new era for diagnosis and discovery of new causative mutations for rare diseases. However, the sheer number of candidate variants that require interpretation in an exome or genomic analysis remains a challenge. A powerful approach is the comparison of the patient’s set of phenotypes (phenotypic profile) to known phenotypic profiles caused by mutations in orthologous genes associated with these variants. The most abundant source of relevant data for this task is available through the efforts of the Mouse Genome Informatics group and the International Mouse Phenotyping Consortium. In this review, we highlight the challenges in comparing human clinical phenotypes with mouse phenotypes and some of the solutions that have been developed by members of the Monarch Initiative. These tools allow the identification of mouse models for known disease-gene associations that may otherwise have been overlooked, as well as the prioritization of candidate genes for novel associations. The culmination of these efforts is the Exomiser software package that allows clinical researchers to analyse patient exomes in the context of variant frequency and predicted pathogenicity as well as the phenotypic similarity of the patient to any given candidate orthologous gene.
doi:10.1007/s00335-015-9577-8
PMCID: PMC4602072  PMID: 26092691
22.  The Resource Identification Initiative: A cultural shift in publishing 
F1000Research  2015;4:134.
A central tenet in support of research reproducibility is the ability to uniquely identify research resources, i.e., reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to allow humans and algorithms to identify the exact resources that are reported or answer basic questions such as “What other studies used resource X?” To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the methods sections of papers and thereby improve identifiability and reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their manuscripts prior to publication for three resource types: antibodies, model organisms, and tools (including software and databases). RRIDs represent accession numbers assigned by an authoritative database, e.g., the model organism databases, for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central web portal (www.scicrunch.org/resources). RRIDs meet three key criteria: they are machine readable, free to generate and access, and consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 papers have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40. Here, we present an overview of the pilot project and its outcomes to date. We show that authors are generally accurate in performing the task of identifying resources and are supportive of the goals of the project.
We also show that identifiability of the resources pre- and post-pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on reproducibility relating to research resources.
doi:10.12688/f1000research.6555.1
PMCID: PMC4648211  PMID: 26594330
Resource identifiers; Multi-centre initiative; Publishing; Pre-pilot data; Post-pilot data
23.  Finding Our Way through Phenotypes 
Deans, Andrew R. | Lewis, Suzanna E. | Huala, Eva | Anzaldo, Salvatore S. | Ashburner, Michael | Balhoff, James P. | Blackburn, David C. | Blake, Judith A. | Burleigh, J. Gordon | Chanet, Bruno | Cooper, Laurel D. | Courtot, Mélanie | Csösz, Sándor | Cui, Hong | Dahdul, Wasila | Das, Sandip | Dececchi, T. Alexander | Dettai, Agnes | Diogo, Rui | Druzinsky, Robert E. | Dumontier, Michel | Franz, Nico M. | Friedrich, Frank | Gkoutos, George V. | Haendel, Melissa | Harmon, Luke J. | Hayamizu, Terry F. | He, Yongqun | Hines, Heather M. | Ibrahim, Nizar | Jackson, Laura M. | Jaiswal, Pankaj | James-Zorn, Christina | Köhler, Sebastian | Lecointre, Guillaume | Lapp, Hilmar | Lawrence, Carolyn J. | Le Novère, Nicolas | Lundberg, John G. | Macklin, James | Mast, Austin R. | Midford, Peter E. | Mikó, István | Mungall, Christopher J. | Oellrich, Anika | Osumi-Sutherland, David | Parkinson, Helen | Ramírez, Martín J. | Richter, Stefan | Robinson, Peter N. | Ruttenberg, Alan | Schulz, Katja S. | Segerdell, Erik | Seltmann, Katja C. | Sharkey, Michael J. | Smith, Aaron D. | Smith, Barry | Specht, Chelsea D. | Squires, R. Burke | Thacker, Robert W. | Thessen, Anne | Fernandez-Triana, Jose | Vihinen, Mauno | Vize, Peter D. | Vogt, Lars | Wall, Christine E. | Walls, Ramona L. | Westerfeld, Monte | Wharton, Robert A. | Wirkner, Christian S. | Woolley, James B. | Yoder, Matthew J. | Zorn, Aaron M. | Mabee, Paula
PLoS Biology  2015;13(1):e1002033.
Imagine if we could compute across phenotype data as easily as genomic data; this article calls for efforts to realize this vision and discusses the potential benefits.
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.
doi:10.1371/journal.pbio.1002033
PMCID: PMC4285398  PMID: 25562316
24.  Meeting report: Identifying practical applications of ontologies for biodiversity informatics 
This report describes the outcomes of a recent workshop, building on a series of workshops from the last three years with the goal of integrating genomics and biodiversity research. The more specific goal here was to express terms in Darwin Core and Audubon Core, where class constructs have been historically underspecified, in a Biological Collections Ontology (BCO) framework. For the purposes of this workshop, the BCO provided the context for fully defining classes as well as object and data properties, including domain and range information, for both the Darwin Core and Audubon Core. In addition, the workshop participants reviewed technical specifications and approaches for annotating instance data with BCO terms. Finally, we laid out proposed activities for the next 3 to 18 months to continue this work.
doi:10.1186/s40793-015-0014-0
PMCID: PMC4511409
Ontology; Biodiversity; Population; Community; Darwin core; OWL; RDF; Microbial ecology; Sequencing
