Authoring bio-ontologies is a task that has traditionally been undertaken by skilled experts trained in understanding complex languages such as the Web Ontology Language (OWL), using tools designed for such experts. As requests for new terms are made, the need for expert ontologists represents a bottleneck in the development process. Furthermore, rigorously enforcing ontology design patterns in large, collaboratively developed ontologies is difficult with existing ontology authoring software.
We present Webulous, an application suite for supporting ontology creation by design patterns. Webulous provides infrastructure to specify templates for populating ontology design patterns; the populated templates are transformed into OWL assertions in a target ontology. Webulous also provides programmatic access to the template server, and a client application has been developed for Google Sheets that allows templates to be loaded, populated and resubmitted to the Webulous server for processing.
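To make the template idea concrete, here is a minimal sketch, assuming a hypothetical template format in which a design pattern is a Manchester-syntax class frame with named placeholders and each spreadsheet row supplies values for those placeholders. The names `PATTERN`, `REQUIRED` and `fill_template`, and the object property in the pattern, are illustrative only; this is not the Webulous API or template language.

```python
import string

# Hypothetical design pattern: a disease class located in an anatomical site.
# Placeholders ($label, $parent, $site) are filled from spreadsheet columns.
# The property 'has disease location' is invented for this illustration.
PATTERN = string.Template(
    "Class: $label\n"
    "    SubClassOf: $parent\n"
    "    SubClassOf: 'has disease location' some $site"
)

REQUIRED = ("label", "parent", "site")


def fill_template(rows):
    """Turn spreadsheet rows (dicts keyed by column name) into
    Manchester-syntax class frames, rejecting incomplete rows so that
    every new class follows the same pattern."""
    frames = []
    for i, row in enumerate(rows, start=1):
        missing = [col for col in REQUIRED if not row.get(col)]
        if missing:
            raise ValueError(f"row {i} is missing {', '.join(missing)}")
        # Quote names so multi-word labels stay valid Manchester syntax
        # (labels containing apostrophes are not handled in this sketch).
        quoted = {col: f"'{row[col]}'" for col in REQUIRED}
        frames.append(PATTERN.substitute(quoted))
    return frames
```

Rejecting incomplete rows up front is one simple way a template server can enforce a pattern uniformly, whoever fills in the spreadsheet.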
The development and delivery of ontologies to the community requires software support that goes beyond the ontology editor. Building ontologies by design patterns and providing simple mechanisms for the addition of new content helps reduce the overall cost and effort required to develop an ontology. The Webulous system provides support for this process and is used as part of the development of several ontologies at the European Bioinformatics Institute.
OWL; Ontology; Spreadsheet; Webulous; Google App
The Centre for Therapeutic Target Validation (CTTV - https://www.targetvalidation.org/) was established to generate therapeutic target evidence from genome-scale experiments and analyses. CTTV aims to support the validity of therapeutic targets by integrating existing and newly-generated data. Data integration has been achieved in some resources by mapping metadata such as diseases and phenotypes to the Experimental Factor Ontology (EFO). Additionally, the relationships between ontology descriptions of rare and common diseases and their phenotypes can offer insights into shared biological mechanisms and potential drug targets. However, ontologies are not well suited to representing the ‘sometimes associated’ type of relationship required. This work addresses two challenges: annotation of diverse big data, and representation of complex, ‘sometimes associated’ relationships between concepts.
Semantic mapping uses a combination of custom scripting, our annotation tool ‘Zooma’, and expert curation. Disease-phenotype associations were generated by literature mining of Europe PubMed Central abstracts and were manually verified by experts. Disease-phenotype associations were represented using the Ontology of Biomedical AssociatioN (OBAN), a generic association representation model. OBAN represents an association between a subject and an object (i.e., a disease and its associated phenotypes) together with the source of evidence for that association. Indirect disease-to-disease associations are exposed through shared phenotypes. This approach was applied to the use case of linking rare to common diseases at the CTTV.
EFO provides an average mapping coverage of over 80 % across all data sources. Manual verification of the text-mined disease-phenotype associations gave a precision of 42 %. This yields 1452 and 2810 disease-phenotype pairs for IBD and autoimmune disease, respectively, and contributes towards 11,338 rare disease associations (merged with existing published work [Am J Hum Genet 97:111-24, 2015]). An OBAN result file is downloadable at http://sourceforge.net/p/efo/code/HEAD/tree/trunk/src/efoassociations/. Twenty common diseases are linked to 85 rare diseases by shared phenotypes. A generalizable OBAN model for association representation is presented in this study.
Here we present solutions to large-scale annotation-ontology mapping in the CTTV knowledge base and a process for disease-phenotype mining, and we propose a generic association model, ‘OBAN’, as a means to integrate diseases using shared phenotypes.
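The association model and the derivation of indirect disease-to-disease links through shared phenotypes can be sketched as follows. This is a simplified Python rendering, assuming each association is reduced to a subject, an object and a tuple of provenance records. The class and field names are illustrative and do not reproduce the published OBAN RDF vocabulary.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Association:
    """Simplified association: a subject (e.g. a disease), an object
    (e.g. a phenotype) and the evidence supporting the link."""
    subject: str
    obj: str
    provenance: tuple = ()  # e.g. identifiers of supporting publications


def shared_phenotype_links(associations):
    """Derive indirect disease-disease links: two diseases are linked
    when both are associated with at least one common phenotype.
    Returns {(disease_a, disease_b): set of shared phenotypes}."""
    diseases_by_phenotype = defaultdict(set)
    for assoc in associations:
        diseases_by_phenotype[assoc.obj].add(assoc.subject)

    links = defaultdict(set)
    for phenotype, diseases in diseases_by_phenotype.items():
        # Sorting gives each unordered disease pair a single key.
        for a, b in combinations(sorted(diseases), 2):
            links[(a, b)].add(phenotype)
    return dict(links)
```

Keeping provenance on each association, rather than on the derived disease-disease link, means every indirect link can still be traced back to the evidence behind the phenotype associations it depends on.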
EFO is released monthly and available for download at http://www.ebi.ac.uk/efo/.
Electronic supplementary material
The online version of this article (doi:10.1186/s13326-016-0051-7) contains supplementary material, which is available to authorized users.
Rare disease; Phenotype disease associations; OBAN; CTTV; EFO
The current version of the Human Disease Ontology (DO) (http://www.disease-ontology.org) database expands the utility of the ontology for the examination and comparison of genetic variation, phenotype, protein, drug and epitope data through the lens of human disease. DO is a biomedical resource of standardized common and rare disease concepts with stable identifiers organized by disease etiology. The content of DO has had 192 revisions since 2012, including the addition of 760 terms. Thirty-two percent of all terms now include definitions. DO has expanded the number and diversity of research communities and community members by 50+ during the past two years. These community members actively submit term requests, coordinate biomedical resource disease representation and provide expert curation guidance. Since the DO 2012 NAR paper, there have been hundreds of term requests and a steady increase in the number of DO listserv members, twitter followers and DO website usage. DO is moving to a multi-editor model, using Protégé to curate DO in the Web Ontology Language (OWL). This will enable closer collaboration with the Human Phenotype Ontology, EBI's Ontology Working Group, Mouse Genome Informatics and the Monarch Initiative among others, and enhance DO's current asserted view and multiple inferred views through reasoning.
Cell lines have been widely used in biomedical research. The community-based Cell Line Ontology (CLO) is a member of the OBO Foundry library that covers the domain of cell lines. Since its publication two years ago, significant updates have been made, including new groups joining the CLO consortium, new cell line cells, upper level alignment with the Cell Ontology (CL) and the Ontology for Biomedical Investigation, and logical extensions.
Construction and content
Collaboration among the CLO, CL, and OBI has established consensus definitions of cell line-specific terms such as ‘cell line’, ‘cell line cell’, ‘cell line culturing’, and ‘mortal’ vs. ‘immortal cell line cell’. A cell line is a genetically stable cultured cell population that contains individual cell line cells. The hierarchical structure of the CLO is based on the hierarchy of in vivo cell types defined in CL and of tissue types (from which cell line cells are derived) defined in the UBERON cross-species anatomy ontology. The new hierarchical structure makes it easier to browse, query, and perform automated classification. We have recently added classes representing more than 2,000 cell line cells from the RIKEN BRC Cell Bank to CLO. Overall, the CLO now contains ~38,000 classes of specific cell line cells derived from over 200 in vivo cell types from various organisms.
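Organizing cell line cells by the in vivo cell type they derive from lets browsing and classification reuse the cell type hierarchy. A minimal sketch, assuming a hypothetical single-parent map of cell types and a mapping from each cell line cell to its source cell type; the labels are illustrative, not actual CLO or CL terms, and CL itself is a multi-parent graph, so this is a simplification.

```python
def ancestors(term, parents):
    """Return the ancestors of `term`, nearest first, in a single-parent
    hierarchy given as {child: parent} (assumed to be acyclic)."""
    lineage = []
    while term in parents:
        term = parents[term]
        lineage.append(term)
    return lineage


def group_cell_lines(derived_from, parents):
    """Index each cell line cell under its source cell type and every
    ancestor of that type, so a query at any level of the hierarchy
    finds it."""
    groups = {}
    for cell_line, cell_type in derived_from.items():
        for category in [cell_type] + ancestors(cell_type, parents):
            groups.setdefault(category, set()).add(cell_line)
    return groups
```

In the ontology itself, a reasoner performs this kind of classification from logical definitions rather than from a hand-built index.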
Utility and discussion
The CLO has been applied to different biomedical research studies. Example case studies include annotation and analysis of EBI ArrayExpress data, bioassays, and host-vaccine/pathogen interaction. CLO’s utility goes beyond a catalogue of cell line types. The alignment of the CLO with related ontologies combined with the use of ontological reasoners will support sophisticated inferencing to advance translational informatics development.
Cell line; Cell line cell; Immortal cell line cell; Mortal cell line cell; Cell line cell culturing; Anatomy
Biomedical ontologists to date have concentrated on ontological descriptions of biomedical entities such as gene products and their attributes, phenotypes and so on. Recently, effort has diversified to descriptions of the laboratory investigations by which these entities were produced. However, much biological insight is gained from the analysis of the data produced from these investigations, and there is a lack of adequate descriptions of the wide range of software that is central to bioinformatics. We need to describe how data are analyzed for discovery, audit trails, provenance and reproducibility.
The Software Ontology (SWO) is a description of software used to store, manage and analyze data. Input to the SWO has come from beyond the life sciences, but its main focus is the life sciences. We used agile techniques to gather input for the SWO and keep engagement with our users. The result is an ontology that meets the needs of a broad range of users by describing software, its information processing tasks, data inputs and outputs, data formats, versions and so on. Recently, the SWO has incorporated EDAM, a vocabulary for describing data and related concepts in bioinformatics. The SWO is currently being used to describe software used in multiple biomedical applications.
The SWO is another element of the biomedical ontology landscape that is necessary for the description of biomedical entities and how they were discovered. An ontology of software used to analyze data produced by investigations in the life sciences can be made in such a way that it covers the important features requested and prioritized by its users. The SWO thus fits into the landscape of biomedical ontologies and is produced using techniques designed to keep it in line with users’ needs.
The Software Ontology is available under an Apache 2.0 license at http://theswo.sourceforge.net/; the Software Ontology blog can be read at http://softwareontology.wordpress.com.
Mice with deletion of genes for small heat shock proteins αA- and αB-crystallin (αA/αB−/−) develop cataracts. We used proteomic analysis to identify lens proteins that change in abundance after deletion of these α-crystallin genes. Wild-type (WT) and αA/αB−/− knockout (DKO) mice were compared using two-dimensional difference gel electrophoresis and mass spectrometric analysis, and protein identifications were validated by Mascot proteomic software. The abundance of histones H2A, H4, and H2B fragment, and of a low molecular weight β1-catenin, increased 2- to 3-fold in postnatal day 2 DKO lenses compared with WT lenses. Additional major increases were observed in the abundance of βB2-crystallin and vimentin in 30-day-old lenses of DKO animals compared with WT animals. In DKO lenses, 9 protein spots containing βB2-crystallin were 10- to 40-fold more abundant, and 3 protein spots containing vimentin were at least 2-fold more abundant, than in WT lenses. Gel permeation chromatography identified a unique 328 kDa protein in DKO lenses, containing β-crystallin, demonstrating aggregation of β-crystallin in the absence of α-crystallins. Together, these changes provide biochemical evidence for possible functions of specific cell adhesion proteins, cytoskeletal proteins, and crystallins in lens opacities caused by the absence of the major chaperones, αA- and αB-crystallins.
Crystallin; knockout; proteomics; substrate; alpha-crystallin; chaperone
αA-crystallin and αB-crystallin are members of the small heat shock protein family and function as molecular chaperones and major lens structural proteins. Although numerous studies have examined their chaperone-like activities in vitro, little is known about the proteins they protect in vivo. To elucidate the relationships between chaperone function, substrate binding, and human cataract formation, we used proteomic and mass spectrometric methods to analyze the effect of mutations associated with hereditary human cataract formation on protein abundance in αA-R49C and αB-R120G knock-in mutant lenses. Compared with age-matched wild-type lenses, 2-day-old αA-R49C heterozygous lenses demonstrated the following: increased crosslinking (15-fold) and degradation (2.6-fold) of αA-crystallin; increased association between αA-crystallin and filensin, actin, or creatine kinase B; increased acidification of βB1-crystallin; increased levels of grifin; and an association between βA3/A1-crystallin and αA-crystallin. Homozygous αA-R49C mutant lenses exhibited increased associations between αA-crystallin and βB3-, βA4-, βA2-crystallins, and grifin, whereas levels of βB1-crystallin, gelsolin, and calpain 3 decreased. The amount of degraded glutamate dehydrogenase, α-enolase, and cytochrome c increased more than 50-fold in homozygous αA-R49C mutant lenses. In αB-R120G mouse lenses, our analyses identified decreased abundance of phosphoglycerate mutase, several β- and γ-crystallins, and degradation of αA- and αB-crystallin early in cataract development. Changes in the abundance of hemoglobin and histones with the loss of normal α-crystallin chaperone function suggest that these proteins also play important roles in the biochemical mechanisms of hereditary cataracts. Together, these studies offer a novel insight into the putative in vivo substrates of αA- and αB-crystallin.
Motivation: Resource Description Framework (RDF) is an emerging technology for describing, publishing and linking life science data. As a major provider of bioinformatics data and services, the European Bioinformatics Institute (EBI) is committed to making data readily accessible to the community in ways that meet existing demand. The EBI RDF platform has been developed to meet an increasing demand to coordinate RDF activities across the institute and provides a new entry point to querying and exploring integrated resources available at the EBI.
Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of ‘baseline’ expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful ‘contrasts’, i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets as well as individual genes, and a more powerful search interface designed to maximize the biological value provided to the user.
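A contrast compares two sets of biological replicates. As an illustration only, the sketch below computes a log2 fold change of mean expression for one gene in one contrast, with an assumed pseudocount of 1 to avoid division by zero. Expression Atlas itself uses standardized analysis pipelines with proper statistical testing, which this sketch does not reproduce.

```python
import math
from statistics import mean


def log2_fold_change(test_reps, reference_reps, pseudocount=1.0):
    """log2 ratio of mean expression in the test replicates versus the
    reference replicates for one gene in one contrast. Positive values
    mean higher expression in the test group."""
    return math.log2((mean(test_reps) + pseudocount) /
                     (mean(reference_reps) + pseudocount))
```

For example, replicate means of 7 and 3 give log2((7 + 1) / (3 + 1)) = 1, a two-fold increase.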
Retinal ganglion cells (RGCs) transmit visual information topographically from the eye to the brain, creating a map of visual space in retino-recipient nuclei (retinotopy). This process is affected by retinal activity and by activity-independent molecular cues. Phr1, which encodes a presumed E3 ubiquitin ligase (PHR1), is required presynaptically for proper placement of RGC axons in the lateral geniculate nucleus and the superior colliculus, suggesting that increased levels of PHR1 target proteins may be instructive for retinotopic mapping of retinofugal projections. To identify potential target proteins, we conducted a proteomic analysis of optic nerve to identify differentially abundant proteins in the presence or absence of Phr1 in RGCs. 1D gel electrophoresis identified a specific band in controls that was absent in mutants. Targeted proteomic analysis of this band demonstrated the presence of PHR1. Additionally, we conducted an unbiased proteomic analysis that identified 30 proteins as being significantly different between the two genotypes. One of these, heterogeneous nuclear ribonucleoprotein M (hnRNP-M), regulates antero-posterior patterning in invertebrates and can function as a cell surface adhesion receptor in vertebrates. Thus we have demonstrated that network analysis of quantitative proteomic data is a useful approach for hypothesis generation and for identifying biologically relevant targets in genetically altered biological models.
Phr1; Mycbp2; retinal ganglion cell; proteomics; hnRNP-M; retinotopy; ubiquitin ligase; label-free quantitative proteomics; LC-MS; network analysis
Pathogenesis of many bacterially-induced inflammatory diseases is driven by Toll-like receptor (TLR) mediated immune responses following recognition of bacterial factors by different TLRs. Periodontitis is a chronic inflammation of the tooth supporting apparatus often leading to tooth loss, and is caused by a Gram-negative bacterial consortium that includes Tannerella forsythia. This bacterium expresses a virulence factor, BspA, which drives periodontal inflammation by activating TLR2. The N-terminal portion of the BspA protein comprises a leucine-rich repeat (LRR) domain previously shown to be involved in the binding and activation of TLR2. The objective of the current study was to identify specific epitopes in the LRR domain of BspA that interact with TLR2. Our results demonstrate that a sequence motif GC(S/T)GLXSIT is involved in mediating the interaction of BspA with TLR2. Thus, our study has identified a peptide motif that mediates the binding of a bacterial protein to TLR2 and highlights the promiscuous nature of TLR2 with respect to ligand binding. This work could provide a structural basis for designing peptidomimetics to modulate the activity of TLR2 in order to block bacterially-induced inflammation.
leucine-rich repeat protein; BspA; TLR-2; Tannerella forsythia
Biomarkers are required for pre-symptomatic diagnosis, treatment, and monitoring of neurodegenerative diseases such as Alzheimer's disease. Cerebrospinal fluid (CSF) is a favored source because its proteome reflects the composition of the brain. Ideal biomarkers have low technical and inter-individual variability (subject variance) among control subjects to minimize overlaps between clinical groups. This study evaluates a process of multi-affinity fractionation (MAF) and quantitative label-free liquid chromatography tandem mass spectrometry (LC-MS/MS) for CSF biomarker discovery by (1) identifying reparable sources of technical variability, (2) assessing subject variance and residual technical variability for numerous CSF proteins, and (3) testing its ability to segregate samples on the basis of desired biomarker characteristics.
Fourteen aliquots of pooled CSF and two aliquots from six cognitively normal individuals were randomized, enriched for low-abundance proteins by MAF, digested endoproteolytically, randomized again, and analyzed by nano-LC-MS. Nano-LC-MS data were time and m/z aligned across samples for relative peptide quantification. Among 11,433 aligned charge groups, 1360 relatively abundant ones were annotated by MS2, yielding 823 unique peptides. Analyses, including Pearson correlations of annotated LC-MS ion chromatograms, performed for all pairwise sample comparisons, identified several sources of technical variability: i) incomplete MAF and keratins; ii) globally- or segmentally-decreased ion current in isolated LC-MS analyses; and iii) oxidized methionine-containing peptides. Exclusion of these sources yielded 609 peptides representing 81 proteins. Most of these proteins showed very low coefficients of variation (CV<5%) whether they were quantified from the mean of all or only the 2 most-abundant peptides. Unsupervised clustering, using only 24 proteins selected for high subject variance, yielded perfect segregation of pooled and individual samples.
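The variability measure described above can be illustrated with a short sketch: a protein's abundance in each sample is summarized as the mean of all its peptides or of its most abundant peptides, and the coefficient of variation (CV) is computed across replicate samples. The data layout (one list of aligned peptide intensities per sample) is an assumption for illustration; the study's own alignment and normalization steps are not reproduced.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples, top_n=None):
    """CV (%) of one protein's abundance across replicate samples.

    `samples` is a list of per-sample peptide intensity lists, with the
    same peptides in the same order in every sample. If `top_n` is given,
    only the `top_n` peptides with the highest mean intensity across all
    samples are used, so the same peptides are summarized in every sample.
    """
    peptide_idx = list(range(len(samples[0])))
    if top_n is not None:
        peptide_idx = sorted(
            peptide_idx,
            key=lambda i: mean(s[i] for s in samples),
            reverse=True,
        )[:top_n]
    # Protein abundance per sample = mean of the selected peptides.
    abundances = [mean(s[i] for i in peptide_idx) for s in samples]
    return 100.0 * stdev(abundances) / mean(abundances)
```

Ranking peptides by their mean across all samples, rather than separately within each sample, keeps the top-peptide summary comparable between samples.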
Quantitative label-free LC-MS/MS can measure scores of CSF proteins with low technical variability and can segregate samples according to desired criteria. Thus, this technique shows potential for biomarker discovery for neurological diseases.
Motivation: Advancing the search, publication and integration of bioinformatics tools and resources demands consistent machine-understandable descriptions. A comprehensive ontology allowing such descriptions is therefore required.
Results: EDAM is an ontology of bioinformatics operations (tool or workflow functions), types of data and identifiers, application domains and data formats. EDAM supports semantic annotation of diverse entities such as Web services, databases, programmatic libraries, standalone tools, interactive applications, data schemas, datasets and publications within bioinformatics. EDAM applies to organizing and finding suitable tools and data and to automating their integration into complex applications or workflows. It includes over 2200 defined concepts and has successfully been used for annotations and implementations.
Availability: The latest stable version of EDAM is available in OWL format from http://edamontology.org/EDAM.owl and in OBO format from http://edamontology.org/EDAM.obo. It can be viewed online at the NCBO BioPortal and the EBI Ontology Lookup Service. For documentation and license please refer to http://edamontology.org. This article describes version 1.2 available at http://edamontology.org/EDAM_1.2.owl.
The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is one of three international functional genomics public data repositories, alongside the Gene Expression Omnibus at NCBI and the DDBJ Omics Archive, supporting peer-reviewed publications. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays, from over 30 000 experiments. The proportion of sequencing-based submissions has grown significantly over the last 2 years, reaching 15% of all new data in 2012. All data are available from ArrayExpress in MAGE-TAB format, which allows robust linking to data analysis and visualization tools, including Bioconductor and GenomeSpace. Additionally, R objects, for microarray data, and binary alignment format files, for sequencing data, have been generated for a significant proportion of ArrayExpress data.
The molecular basis of the increased susceptibility of steatotic livers to warm ischemia/reperfusion (I/R) injury during transplantation remains undefined. Warm I/R injury was induced in obese Zucker rats as an animal model, with lean Zucker rats serving as controls. Two-dimensional difference gel electrophoresis was performed on liver protein extracts. Protein features with significant abundance ratios (p < 0.01) between the two cohorts were selected and analyzed with HPLC/MS. Proteins were identified using the UniProt database. Interactive protein networks were generated using Ingenuity Pathway Analysis and GRANITE software.
Changes in the relative abundance of 105 proteins were observed in warm I/R injury. Functional grouping revealed four categories of importance: molecular chaperones/endoplasmic reticulum (ER) stress, oxidative stress, metabolism, and cell structure. In steatotic liver following I/R, six chaperonins (hypoxia up-regulated 1, calcium binding protein 1, calreticulin, heat shock protein (HSP) 60, HSP-90, and protein disulfide isomerase 3) were significantly (p < 0.01) down-regulated, and only one chaperonin, HSP-1, was significantly up-regulated.
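Because MAGE-TAB sample and data relationship files (SDRF) are tab-delimited text, they can be read with standard tools. A minimal sketch, assuming an SDRF-like table with a `Source Name` column and columns following the MAGE-TAB `Characteristics[...]` header convention; the example rows in the usage are invented.

```python
import csv
import io


def read_sdrf(text):
    """Parse a tab-delimited SDRF table into a list of row dicts.
    Note: csv.DictReader keeps only the last of any duplicated headers
    (real SDRFs can repeat e.g. 'Protocol REF'); acceptable for this sketch."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def samples_by_characteristic(rows, name):
    """Group source (sample) names by the value of Characteristics[name]."""
    column = f"Characteristics[{name}]"
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row["Source Name"])
    return groups
```

Grouping samples by a characteristic in this way is the usual first step before handing the data to downstream analysis tools.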
Down-regulation of the chaperones identified in this analysis may contribute to the increased ER stress and, consequently, apoptosis and necrosis. This study provides an initial platform for future investigation of the role of chaperones and therapeutic targets for increasing the viability of steatotic liver allografts.
Ischemia reperfusion injury; Two dimensional gel electrophoresis; Mass spectrometry; Liver transplantation; Chaperonins; Endoplasmic reticulum (ER) stress
Using a proteomics approach, the authors examined whether class 1 UV-blocking contact lenses protect against UVB radiation–induced damage in a human lens epithelial cell line (HLE B-3) and postmortem human lenses.
To determine whether class 1 UV-blocking contact lenses protect against UVB radiation–induced damage in a human lens epithelial cell line (HLE B-3) and postmortem human lenses using a proteomics approach.
HLE B-3 cells were exposed to 6.4 mW/cm² UVB radiation at 302 nm for 2 minutes (768 mJ/cm²), either uncovered or covered by senofilcon A class 1 UV-blocking contact lenses or by lotrafilcon A non–UV-blocking contact lenses (lotrafilcon A has some UV-blocking ability, albeit minimal). Control cells were not exposed to UVB radiation. Four hours after treatment, cells were analyzed by two-dimensional difference gel electrophoresis and tandem mass spectrometry, and changes in protein abundance were quantified. F-actin and microtubule cytoskeletons were examined by fluorescence staining. In addition, human donor lenses were exposed to UVB radiation at 302 nm for 4 minutes (1536 mJ/cm²). Cortical and epithelial cell proteins were scraped from lens surfaces and subjected to the same protein analyses.
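The radiant exposures quoted for these treatments follow from irradiance multiplied by exposure time, since 1 mW·s = 1 mJ. Here H (radiant exposure), E (irradiance) and t (exposure time) are symbols introduced for this illustration; the donor-lens line assumes the same 6.4 mW/cm² irradiance, which the stated 1536 mJ/cm² implies.

```latex
H = E \times t, \qquad
6.4\,\mathrm{mW\,cm^{-2}} \times 120\,\mathrm{s} = 768\,\mathrm{mJ\,cm^{-2}}, \qquad
6.4\,\mathrm{mW\,cm^{-2}} \times 240\,\mathrm{s} = 1536\,\mathrm{mJ\,cm^{-2}}
```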
Senofilcon A lenses were beneficial for protecting HLE B-3 cells against UVB radiation–induced changes in caldesmon 1 isoform, lamin A/C transcript variant 1, DEAD (Asp-Glu-Ala-Asp) box polypeptide, β-actin, glyceraldehyde 3-phosphate dehydrogenase (G3PDH), annexin A2, triose phosphate isomerase, and ubiquitin B precursor. These contact lenses also prevented actin and microtubule cytoskeleton changes typically induced by UVB radiation. Conversely, non–UV-blocking contact lenses were not protective. UVB-irradiated human lenses showed marked reductions in αA-crystallin, αB-crystallin, aldehyde dehydrogenase 1, βS-crystallin, βB2-crystallin, and G3PDH, and UV-absorbing contact lenses significantly prevented these alterations.
Senofilcon A class 1 UV-blocking contact lenses largely prevented UVB-induced changes in protein abundance in lens epithelial cells and in human lenses.
Objectives: Non-melanoma skin cancer is the most common malignancy in the US, with an annual incidence in excess of 1.5 million cases. In the majority of cases, locoregional treatment is curative and systemic therapy is not indicated. Platinum-based chemotherapy regimens have been used most commonly in refractory cases. The use of cetuximab, a monoclonal antibody targeting epidermal growth factor receptor [EGFR], has been reported for skin cancer treatment. This study evaluated eight cases of locally advanced and refractory basal cell or squamous cell cancers treated with cetuximab.
Methods: This is a retrospective study on eight patients who had received cetuximab for treatment of cutaneous carcinoma since 2007 at Southern Illinois University School of Medicine (SIU-SOM) Medical Oncology clinic.
Results: Three of the four patients with basal cell carcinoma and two of the four patients with squamous cell carcinoma maintained remission on treatment. The main side effect was acneiform rash, which required termination of treatment for one patient and dose reduction in another.
Conclusion: The study indicates that cetuximab may have a beneficial role for patients with non-melanoma cutaneous carcinomas that are refractory to standard therapy.
cetuximab; non-melanoma; skin cancer
Disease-modifying therapies for Alzheimer’s disease (AD) would be most beneficial if applied during the ‘preclinical’ stage (pathology present with cognition intact) before significant neuronal loss occurs. Therefore, biomarkers that can detect AD pathology in its early stages and predict dementia onset and progression will be invaluable for patient care and efficient clinical trial design.
2D difference gel electrophoresis and liquid chromatography tandem mass spectrometry were used to measure AD-associated changes in cerebrospinal fluid (CSF). Concentrations of CSF YKL-40 were further evaluated by enzyme-linked immunosorbent assay in the discovery cohort (N=47), an independent sample set (N=292) with paired plasma samples (N=237), frontotemporal lobar degeneration (N=9), and progressive supranuclear palsy (PSP, N=6). Human AD brain was studied immunohistochemically to identify potential source(s) of YKL-40.
In the discovery and validation cohorts, mean CSF YKL-40 was higher in very mild and mild AD-type dementia (Clinical Dementia Rating [CDR] 0.5 and 1) vs. controls (CDR 0) and PSP. Importantly, CSF YKL-40/Aβ42 ratio predicted risk of developing cognitive impairment (CDR 0 to CDR>0 conversion) as well as the best CSF biomarkers identified to date, tau/Aβ42 and p-tau181/Aβ42. Mean plasma YKL-40 was higher in CDR 0.5 and 1 vs. CDR 0 groups, and correlated with CSF levels. YKL-40 immunoreactivity was observed within astrocytes near a subset of amyloid plaques, implicating YKL-40 in the neuroinflammatory response to Aβ deposition.
These data demonstrate that YKL-40, a putative indicator of neuroinflammation, is elevated in AD and that, together with Aβ42, it has potential prognostic utility as a biomarker for preclinical AD.
YKL-40; Alzheimer’s disease; biomarkers; cerebrospinal fluid; chitinase-3 like-1; inflammation
Gene Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive and the European Nucleotide Archive. A simple interface allows the user to query for differential gene expression either by gene names or attributes or by biological conditions, e.g. diseases, organism parts or cell types. Since our previous report we have made 20 monthly releases and, as of Release 11.08 (August 2011), the database supports 19 species and contains expression data measured for 19 014 biological conditions in 136 551 assays from 5598 independent studies.
To evaluate how well current anatomical ontologies fit the way real-world users apply anatomy terms in their data annotations.
Annotations from three diverse multi-species public-domain datasets provided a set of use cases for matching anatomical terms in two major anatomical ontologies (the Foundational Model of Anatomy and Uberon), using two lexical-matching applications (Zooma and Ontology Mapper).
Approximately 1500 terms were identified. Against Uberon, Zooma returned 286 matches, compared with 319 for Ontology Mapper; against the Foundational Model of Anatomy, Zooma returned 312 matches and Ontology Mapper returned 397.
Our results indicate that for our datasets the anatomical entities or concepts are embedded in user-generated complex terms, and while lexical mapping works, anatomy ontologies do not provide the majority of terms users supply when annotating data. Provision of searchable cross-products for compositional terms is a key requirement for using ontologies.
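The lexical matching described above can be approximated by normalized string comparison against ontology labels and synonyms. The sketch below is a generic illustration under that assumption. It does not reproduce the matching algorithms of Zooma or Ontology Mapper, and the term IDs and labels in the usage are invented. It also shows why a compound, user-generated term fails to match even when its component anatomy terms exist in the ontology.

```python
import re


def normalize(label):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", label.lower()).split())


def build_index(ontology_terms):
    """Map each normalized label or synonym to its term ID.
    `ontology_terms` is {term_id: [label, synonym, ...]}."""
    index = {}
    for term_id, names in ontology_terms.items():
        for name in names:
            index[normalize(name)] = term_id
    return index


def match_annotations(annotations, index):
    """Return {annotation: term_id} for exact normalized matches only."""
    return {a: index[normalize(a)] for a in annotations if normalize(a) in index}
```

With an index containing "heart" and "left ventricle", the annotations "Heart" and "Left-Ventricle" match, but "heart left ventricle wall" does not, because no single ontology term carries that compositional label.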
Text definitions for entities within bio-ontologies are a cornerstone of the effort to gain a consensus in understanding and usage of those ontologies. Writing these definitions is, however, a considerable effort and there is often a lag between specification of the main part of an ontology (logical descriptions and definitions of entities) and the development of the text-based definitions. The goal of natural language generation (NLG) from ontologies is to take the logical description of entities and generate fluent natural language. The application described here uses NLG to automatically provide text-based definitions from an ontology that has logical descriptions of its entities, so avoiding the bottleneck of authoring these definitions by hand.
To produce the descriptions, the program collects all the axioms relating to a given entity, groups them according to common structure, realises each group through an English sentence, and assembles the resulting sentences into a paragraph, to form as ‘coherent’ a text as possible without human intervention. Sentence generation is accomplished using a generic grammar based on logical patterns in OWL, together with a lexicon for realising atomic entities. We have tested our output for the Experimental Factor Ontology (EFO) using a simple survey strategy to explore the fluency of the generated text and how well it conveys the underlying axiomatisation. Two rounds of survey and improvement show that overall the generated English definitions are found to convey the intended meaning of the axiomatisation in a satisfactory manner. The surveys also suggested that one form of generated English will not be universally liked; that intrusion of too much ‘formal ontology’ was not liked; and that too much explicit exposure of OWL semantics was also not liked.
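The four stages above (collect the axioms, group them by common structure, realise each group as a sentence, assemble a paragraph) can be sketched with simple templates. This is a toy illustration under stated assumptions: axioms are reduced to (property, filler) pairs, and the lexicon and sentence templates are hypothetical. It is not the authors' grammar, which is based on logical patterns in OWL.

```python
from collections import defaultdict

# Hypothetical lexicon: how each property is verbalised in English.
LEXICON = {
    "subClassOf": "is",
    "has_part": "has as part",
    "located_in": "is located in",
}


def join_fillers(fillers):
    """English list: 'a', 'a and b', 'a, b and c'."""
    if len(fillers) == 1:
        return fillers[0]
    return ", ".join(fillers[:-1]) + " and " + fillers[-1]


def define(entity, axioms):
    """Generate a definition paragraph for `entity` from its
    (property, filler) axioms."""
    # Group the entity's axioms by property (their shared structure).
    groups = defaultdict(list)
    for prop, filler in axioms:
        groups[prop].append(filler)
    # Realise each group as one sentence, in first-seen property order.
    subject = entity[0].upper() + entity[1:]
    sentences = [
        f"{subject} {LEXICON[prop]} {join_fillers(fillers)}."
        for prop, fillers in groups.items()
    ]
    # Assemble the sentences into a paragraph.
    return " ".join(sentences)
```

Grouping before realisation is what lets two `subClassOf` axioms surface as one sentence ("is a disease and an inflammatory condition") instead of two repetitive ones.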
Our prototype tools can generate reasonable paragraphs of English text that can serve as definitions. Survey respondents found the definitions acceptable, and the developers of EFO were sufficiently satisfied with the output to incorporate the generated definitions into EFO. Although they are not a substitute for hand-written textual definitions, the generated definitions are a useful starting point.
An online version of the NLG text definition tool can be found at http://swat.open.ac.uk/tools/. The questionnaire and sample generated text definitions may be found at http://mcs.open.ac.uk/nlg/SWAT/bio-ontologies.html.
Biliary atresia (BA) is the most serious liver disease in infants. Diagnosis currently depends on surgical exploration of the biliary tree. Non-invasive tests that distinguish BA from other types of neonatal liver disease are not available.
To identify potential serum biomarkers that classify children with neonatal cholestasis, we performed two-dimensional difference gel electrophoresis, statistical analysis, and tandem mass spectrometry on serum samples from 19 infants with BA and 19 infants with non-BA neonatal cholestasis.
Eleven potential serum biomarkers were found that, in combination, could classify children with neonatal cholestasis.
Although no single biomarker or imaging test adequately distinguishes BA from other types of neonatal cholestasis, combinations of biomarkers, imaging tests and non-invasive clinical criteria should be further explored as potential tests for rapid and accurate diagnosis of BA.
Proteomics; biliary atresia; neonatal cholestasis; biomarker
Cervical cancer screening is ideally suited to the development of biomarkers because tissue is easy to acquire and the histological transitions are well established. Furthermore, cells and biological fluids obtained from cervical samples undergo specific molecular changes that can be profiled. However, the ideal methods for preparing cervical samples remain to be determined. To address this critical issue, a protocol for collecting protein and nucleic acids during patient screening was established. Samples were collected in RNAlater, and proteomic methods were then used to identify proteins differentially expressed in normal cervical epithelial cells versus cervical cancer cells. Two-dimensional difference gel electrophoresis (2-D DIGE) identified 390 spots expressed at either higher or lower levels (>3-fold) in cervical cancer samples. These proteomic results were compared with genes in a cDNA microarray analysis of microdissected neoplastic cervical specimens to identify overlapping patterns of expression. The pathways most frequently represented in the combined dataset were: cell cycle: G2/M DNA damage checkpoint regulation; aryl hydrocarbon receptor signaling; p53 signaling; cell cycle: G1/S checkpoint regulation; and the endoplasmic reticulum stress pathway. HNRPA2B1 was identified as a biomarker candidate with increased expression in cancer compared with normal cervix, and this was validated by Western blot.
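The ">3-fold" criterion and the proteomic/transcriptomic comparison can be illustrated with a short sketch. It assumes that a spot qualifies when its cancer/normal intensity ratio exceeds 3 or falls below 1/3. It also assumes that "overlap" means a gene changing in the same direction at both the protein and transcript levels. The spot names, gene names, and intensities are hypothetical and are not data from the study.

```python
from typing import Dict, Set, Tuple


def differential_spots(intensities: Dict[str, Tuple[float, float]],
                       threshold: float = 3.0) -> Dict[str, str]:
    """Map spot -> 'up' or 'down' for spots changed more than `threshold`-fold.

    `intensities` maps each spot to (normal, cancer) abundance values.
    """
    calls: Dict[str, str] = {}
    for spot, (normal, cancer) in intensities.items():
        ratio = cancer / normal
        if ratio > threshold:
            calls[spot] = "up"
        elif ratio < 1.0 / threshold:
            calls[spot] = "down"
    return calls


def overlap_with_microarray(calls: Dict[str, str],
                            spot_to_gene: Dict[str, str],
                            microarray_calls: Dict[str, str]) -> Set[str]:
    """Genes whose protein spot and transcript change in the same direction."""
    return {
        spot_to_gene[spot]
        for spot, direction in calls.items()
        if spot in spot_to_gene and microarray_calls.get(spot_to_gene[spot]) == direction
    }


if __name__ == "__main__":
    # Hypothetical (normal, cancer) intensities.
    spots = {"s1": (100.0, 450.0), "s2": (300.0, 60.0), "s3": (200.0, 250.0)}
    calls = differential_spots(spots)  # s1 is 4.5-fold up, s2 is 5-fold down, s3 is unchanged
    genes = {"s1": "GENE_A", "s2": "GENE_B"}
    array = {"GENE_A": "up", "GENE_B": "up"}
    print(calls, overlap_with_microarray(calls, genes, array))
```

Using the reciprocal threshold for decreases keeps the criterion symmetric, so a 3-fold drop is treated like a 3-fold rise.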
2-D DIGE; biomarkers; cervical cancer; cDNA microarray; RNAlater
Ideally, disease-modifying therapies for Alzheimer disease (AD) will be applied during the ‘preclinical’ stage (pathology present with cognition intact) before severe neuronal damage occurs, or upon recognizing very mild cognitive impairment. Developing and judiciously administering such therapies will require biomarker panels to identify early AD pathology, classify disease stage, monitor pathological progression, and predict cognitive decline. To discover such biomarkers, we measured AD-associated changes in the cerebrospinal fluid (CSF) proteome.
Methods and Findings
CSF samples from individuals with mild AD (Clinical Dementia Rating [CDR] 1) (n = 24) and cognitively normal controls (CDR 0) (n = 24) were subjected to two-dimensional difference-in-gel electrophoresis. Within 119 differentially abundant gel features, liquid chromatography–tandem mass spectrometry (LC-MS/MS) identified 47 proteins. For validation, eleven proteins were re-evaluated by enzyme-linked immunosorbent assays (ELISA). Six of these assays (NrCAM, YKL-40, chromogranin A, carnosinase I, transthyretin, cystatin C) distinguished the CDR 1 and CDR 0 groups and were subsequently applied (with tau, p-tau181 and Aβ42 ELISAs) to a larger independent cohort (n = 292) that included individuals with very mild dementia (CDR 0.5). Receiver-operating characteristic (ROC) curve analyses using stepwise logistic regression yielded optimal biomarker combinations to distinguish CDR 0 from CDR > 0 (tau, YKL-40, NrCAM) and CDR 1 from CDR < 1 (tau, chromogranin A, carnosinase I), with areas under the curve of 0.90 (95% confidence interval [CI] 0.85–0.94) and 0.88 (95% CI 0.81–0.94), respectively.
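The area under the ROC curve reported above can be computed directly from per-subject scores. For example, the scores might be the predicted probabilities from a logistic regression over a marker combination. The sketch below uses the rank-based (Mann–Whitney) definition: the AUC is the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counting one half. The logistic weights, scores, and labels are hypothetical. Confidence intervals such as those reported would require an additional method, for example bootstrapping or DeLong's method, which this sketch does not implement.

```python
import math
from typing import Sequence


def logistic_score(values: Sequence[float], weights: Sequence[float], intercept: float) -> float:
    """Predicted probability from a fitted logistic model (weights are hypothetical)."""
    z = intercept + sum(w * v for w, v in zip(weights, values))
    return 1.0 / (1.0 + math.exp(-z))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random case > score of a random control); ties count 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical scores: label 1 = case (e.g. CDR > 0), 0 = control (CDR 0).
    scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    print(roc_auc(scores, labels))  # 8 of 9 case/control pairs are correctly ordered
```

Because the logistic function is monotonic, ranking subjects by the linear predictor or by the predicted probability gives the same AUC. The regression step matters only for choosing which markers to combine and how to weight them.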
Four novel CSF biomarkers for AD (NrCAM, YKL-40, chromogranin A, carnosinase I) can improve the diagnostic accuracy of Aβ42 and tau. Together, these six markers describe six clinicopathological stages from cognitive normalcy to mild dementia, including stages defined by increased risk of cognitive decline. Such a panel might improve clinical trial efficiency by guiding subject enrollment and monitoring disease progression. Further studies will be required to validate this panel and evaluate its potential for distinguishing AD from other dementing conditions.
The ArrayExpress Archive (http://www.ebi.ac.uk/arrayexpress) is one of the three international public repositories of functional genomics data supporting publications. It includes data generated by sequencing or array-based technologies. Data are either submitted by users or imported directly from the NCBI Gene Expression Omnibus. The ArrayExpress Archive is closely integrated with the Gene Expression Atlas and the sequence databases at the European Bioinformatics Institute. Advanced queries provided via ontology-enabled interfaces include queries based on technology and on sample attributes such as disease, cell types and anatomy.
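One common way an ontology-enabled interface can answer a sample-attribute query is to expand the query term to all of its ontology descendants before matching annotations. With this approach, a search for "cancer" also returns experiments annotated with a more specific disease term. The sketch below assumes that behaviour and is not the ArrayExpress API. The is_a hierarchy, the experiment identifiers, and their annotations are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical is_a hierarchy: child term -> parent terms.
IS_A: Dict[str, List[str]] = {
    "carcinoma": ["cancer"],
    "lung carcinoma": ["carcinoma"],
    "leukemia": ["cancer"],
    "hepatocyte": ["cell"],
}


def descendants(term: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """Return `term` plus every term below it, via breadth-first search."""
    children: Dict[str, List[str]] = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def expanded_query(term: str, annotations: Dict[str, Set[str]],
                   is_a: Dict[str, List[str]]) -> List[str]:
    """Experiments annotated with `term` or any of its descendants."""
    wanted = descendants(term, is_a)
    return sorted(exp for exp, attrs in annotations.items() if attrs & wanted)


if __name__ == "__main__":
    # Hypothetical experiment annotations (sample attributes).
    annotations = {
        "EXP-1": {"lung carcinoma"},
        "EXP-2": {"leukemia"},
        "EXP-3": {"hepatocyte"},
    }
    print(expanded_query("cancer", annotations, IS_A))
```

Without the expansion step, a plain string search for "cancer" would match none of these experiments, because no sample is annotated with that exact term.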