Motivation: Data collection in spreadsheets is ubiquitous, but current solutions lack support for collaborative semantic annotation that would promote shared and interdisciplinary annotation practices, supporting geographically distributed players.
Results: OntoMaton is an open source solution that brings ontology lookup and tagging capabilities into a cloud-based collaborative editing environment, harnessing Google Spreadsheets and the NCBO Web services. It is a general purpose, format-agnostic tool that may serve as a component of the ISA software suite. OntoMaton can also be used to assist the ontology development process.
Availability: OntoMaton is freely available from Google widgets under the CPAL open source license; documentation and examples at: https://github.com/ISA-tools/OntoMaton.
Contact: isatools@googlegroups.com
doi:10.1093/bioinformatics/bts718
PMCID: PMC3570217
PMID: 23267176
Haug, Kenneth | Salek, Reza M. | Conesa, Pablo | Hastings, Janna | de Matos, Paula | Rijnbeek, Mark | Mahendraker, Tejasvi | Williams, Mark | Neumann, Steffen | Rocca-Serra, Philippe | Maguire, Eamonn | González-Beltrán, Alejandra | Sansone, Susanna-Assunta | Griffin, Julian L. | Steinbeck, Christoph
MetaboLights (http://www.ebi.ac.uk/metabolights) is the first general-purpose, open-access repository for metabolomics studies, their raw experimental data and associated metadata, maintained by one of the major open-access data providers in molecular biology. Metabolomic profiling is an important tool for research into biological functioning and into the systemic perturbations caused by diseases, diet and the environment. The effectiveness of such methods depends on the availability of public open data across a broad range of experimental methods and conditions. The MetaboLights repository, powered by the open source ISA framework, is cross-species and cross-technique. It will cover metabolite structures and their reference spectra as well as their biological roles, locations, concentrations and raw data from metabolic experiments. Studies automatically receive a stable unique accession number that can be used as a publication reference (e.g. MTBLS1). At present, the repository includes 15 submitted studies, encompassing 93 protocols for 714 assays, and spans 8 different species, including human, Caenorhabditis elegans, Mus musculus and Arabidopsis thaliana. Eight hundred twenty-seven of the metabolites identified in these studies have been mapped to ChEBI. These studies cover a variety of techniques, including NMR spectroscopy and mass spectrometry.
doi:10.1093/nar/gks1004
PMCID: PMC3531110
PMID: 23109552
Steinbeck, Christoph | Conesa, Pablo | Haug, Kenneth | Mahendraker, Tejasvi | Williams, Mark | Maguire, Eamonn | Rocca-Serra, Philippe | Sansone, Susanna-Assunta | Salek, Reza M. | Griffin, Julian L.
Exciting funding initiatives are emerging in Europe and the US for metabolomics data production, storage, dissemination and analysis. This is based on a rich ecosystem of resources around the world, which has been built during the past ten years, including but not limited to resources such as MassBank in Japan and the Human Metabolome Database in Canada. Now, the European Bioinformatics Institute has launched MetaboLights, a database for metabolomics experiments and the associated metadata (http://www.ebi.ac.uk/metabolights). It is the first comprehensive, cross-species, cross-platform metabolomics database maintained by one of the major open access data providers in molecular biology. In October, the European COSMOS consortium will start its work on Metabolomics data standardization, publication and dissemination workflows. The NIH in the US is establishing 6–8 metabolomics services cores as well as a national metabolomics repository. This communication reports on MetaboLights as a new resource for Metabolomics research, summarises the related developments and outlines how they may consolidate the knowledge management in this third large omics field next to proteomics and genomics.
doi:10.1007/s11306-012-0462-0
PMCID: PMC3465651
PMID: 23060735
Metabolomics; Databases; ISA-Tab; ISA commons
Sansone, Susanna-Assunta | Rocca-Serra, Philippe | Field, Dawn | Maguire, Eamonn | Taylor, Chris | Hofmann, Oliver | Fang, Hong | Neumann, Steffen | Tong, Weida | Amaral-Zettler, Linda | Begley, Kimberly | Booth, Tim | Bougueleret, Lydie | Burns, Gully | Chapman, Brad | Clark, Tim | Coleman, Lee-Ann | Copeland, Jay | Das, Sudeshna | de Daruvar, Antoine | de Matos, Paula | Dix, Ian | Edmunds, Scott | Evelo, Chris T | Forster, Mark J | Gaudet, Pascale | Gilbert, Jack | Goble, Carole | Griffin, Julian L | Jacob, Daniel | Kleinjans, Jos | Harland, Lee | Haug, Kenneth | Hermjakob, Henning | Ho Sui, Shannan J | Laederach, Alain | Liang, Shaoguang | Marshall, Stephen | McGrath, Annette | Merrill, Emily | Reilly, Dorothy | Roux, Magali | Shamu, Caroline E | Shang, Catherine A | Steinbeck, Christoph | Trefethen, Anne | Williams-Jones, Bryn | Wolstencroft, Katherine | Xenarios, Ioannis | Hide, Winston
To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open ‘data commoning’ culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared ‘Investigation-Study-Assay’ framework to support that vision.
doi:10.1038/ng.1054
PMCID: PMC3428019
PMID: 22281772
Liolios, Konstantinos | Schriml, Lynn | Hirschman, Lynette | Pagani, Ioanna | Nosrat, Bahador | Sterk, Peter | White, Owen | Rocca-Serra, Philippe | Sansone, Susanna-Assunta | Taylor, Chris | Kyrpides, Nikos C. | Field, Dawn
Variability in the extent of the descriptions of data (‘metadata’) held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. The scoring of records on the richness of their description provides a simple, objective proxy measure for quality that enables filtering that supports downstream analysis. Pivotally, such descriptions should spur on improvements. Here, we introduce such a measure - the ‘Metadata Coverage Index’ (MCI): the percentage of available fields actually filled in a record or description. MCI scores can be calculated across a database, for individual records or for their component parts (e.g., fields of interest). There are many potential uses for this simple metric, for example: to filter, rank or search for records; to assess the metadata availability of an ad hoc collection; to determine the frequency with which fields in a particular record type are filled, especially with respect to standards compliance; to assess the utility of specific tools and resources, and of data capture practice more generally; to prioritize records for further curation; to serve as performance metrics of funded projects; or to quantify the value added by curation. Here we demonstrate the utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the ‘Minimum Information about a Genome Sequence’ (MIGS) standard developed by the Genomic Standards Consortium. We discuss challenges and address the further application of MCI scores: to show improvements in annotation quality over time, to inform the work of standards bodies and repository providers on the usability and popularity of their products, and to assess and credit the work of curators. Such an index provides a step towards putting metadata capture practices and, in the future, standards compliance into a quantitative and objective framework.
doi:10.4056/sigs.2675953
PMCID: PMC3558968
PMID: 23409217
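The Metadata Coverage Index defined in the abstract above (the percentage of available fields actually filled in a record) can be illustrated with a minimal Python sketch. The field names, the example records and the rule that `None` or a blank string counts as "unfilled" are assumptions made for illustration only; they are not taken from GOLD or MIGS.

```python
from typing import Any, Mapping, Sequence


def is_filled(value: Any) -> bool:
    """Assumption of this sketch: None and blank strings count as unfilled."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    return True


def mci(record: Mapping[str, Any], fields: Sequence[str]) -> float:
    """Percentage of the available fields that are filled in one record."""
    if not fields:
        raise ValueError("at least one available field is required")
    filled = sum(1 for name in fields if is_filled(record.get(name)))
    return 100.0 * filled / len(fields)


def field_fill_rate(records: Sequence[Mapping[str, Any]], field: str) -> float:
    """Percentage of records in a collection in which one field is filled."""
    if not records:
        raise ValueError("at least one record is required")
    filled = sum(1 for rec in records if is_filled(rec.get(field)))
    return 100.0 * filled / len(records)


# Hypothetical field names and records, for illustration only.
FIELDS = ["organism", "habitat", "latitude", "longitude"]
records = [
    {"organism": "Escherichia coli", "habitat": "gut", "latitude": None, "longitude": ""},
    {"organism": "Bacillus subtilis", "habitat": "soil", "latitude": 51.5, "longitude": -0.1},
]

per_record = [mci(rec, FIELDS) for rec in records]
habitat_rate = field_fill_rate(records, "habitat")
latitude_rate = field_fill_rate(records, "latitude")
```

Scoring per record supports filtering and ranking, while the per-field rate corresponds to the abstract's use case of determining how often a given field is filled across a record type.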
Yilmaz, Pelin | Kottmann, Renzo | Field, Dawn | Knight, Rob | Cole, James R | Amaral-Zettler, Linda | Gilbert, Jack A | Karsch-Mizrachi, Ilene | Johnston, Anjanette | Cochrane, Guy | Vaughan, Robert | Hunter, Christopher | Park, Joonhong | Morrison, Norman | Rocca-Serra, Philippe | Sterk, Peter | Arumugam, Manimozhiyan | Bailey, Mark | Baumgartner, Laura | Birren, Bruce W | Blaser, Martin J | Bonazzi, Vivien | Booth, Tim | Bork, Peer | Bushman, Frederic D | Buttigieg, Pier Luigi | Chain, Patrick S G | Charlson, Emily | Costello, Elizabeth K | Huot-Creasy, Heather | Dawyndt, Peter | DeSantis, Todd | Fierer, Noah | Fuhrman, Jed A | Gallery, Rachel E | Gevers, Dirk | Gibbs, Richard A | Gil, Inigo San | Gonzalez, Antonio | Gordon, Jeffrey I | Guralnick, Robert | Hankeln, Wolfgang | Highlander, Sarah | Hugenholtz, Philip | Jansson, Janet | Kau, Andrew L | Kelley, Scott T | Kennedy, Jerry | Knights, Dan | Koren, Omry | Kuczynski, Justin | Kyrpides, Nikos | Larsen, Robert | Lauber, Christian L | Legg, Teresa | Ley, Ruth E | Lozupone, Catherine A | Ludwig, Wolfgang | Lyons, Donna | Maguire, Eamonn | Methé, Barbara A | Meyer, Folker | Muegge, Brian | Nakielny, Sara | Nelson, Karen E | Nemergut, Diana | Neufeld, Josh D | Newbold, Lindsay K | Oliver, Anna E | Pace, Norman R | Palanisamy, Giriprakash | Peplies, Jörg | Petrosino, Joseph | Proctor, Lita | Pruesse, Elmar | Quast, Christian | Raes, Jeroen | Ratnasingham, Sujeevan | Ravel, Jacques | Relman, David A | Assunta-Sansone, Susanna | Schloss, Patrick D | Schriml, Lynn | Sinha, Rohini | Smith, Michelle I | Sodergren, Erica | Spor, Aymé | Stombaugh, Jesse | Tiedje, James M | Ward, Doyle V | Weinstock, George M | Wendel, Doug | White, Owen | Whiteley, Andrew | Wilke, Andreas | Wortman, Jennifer R | Yatsunenko, Tanya | Glöckner, Frank Oliver
Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences—the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The ‘environmental packages’ apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.
doi:10.1038/nbt.1823
PMCID: PMC3367316
PMID: 21552244
Motivations: Spreadsheet-like tabular formats are ever more popular in the biomedical field as a means of experimental reporting. The problem of converting the graph of an experimental workflow into a table-based representation occurs in many such formats and is not easy to solve.
Results: We describe graph2tab, a library that implements methods to realise such a conversion in a size-optimised way. Our solution is generic and can be adapted to specific cases of data exporters or data converters that need to be implemented.
Availability and Implementation: The library source code and documentation are available at http://github.com/ISA-tools/graph2tab.
Contact: brandizi@ebi.ac.uk.
Supplementary Information: A supplementary document describes the theoretical and technical details about the library implementation.
doi:10.1093/bioinformatics/bts258
PMCID: PMC3371871
PMID: 22556367
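To make the graph-to-table problem described in the abstract above concrete, here is a naive Python sketch that writes one table row per source-to-sink path of a small workflow graph. This is deliberately not the graph2tab algorithm and makes no attempt at size optimisation; the function name `paths_to_rows` and the example workflow are hypothetical.

```python
from typing import Dict, List


def paths_to_rows(edges: Dict[str, List[str]]) -> List[List[str]]:
    """Enumerate every source-to-sink path of a DAG; each path becomes one row."""
    targets = {t for successors in edges.values() for t in successors}
    nodes = set(edges) | targets
    # Sources are nodes that no edge points to.
    sources = sorted(n for n in nodes if n not in targets)
    rows: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        path = path + [node]
        successors = edges.get(node, [])
        if not successors:  # sink reached: the path is a complete row
            rows.append(path)
            return
        for nxt in successors:
            walk(nxt, path)

    for source in sources:
        walk(source, [])
    return rows


# Hypothetical workflow: one source split into two samples, pooled in one assay.
workflow = {
    "source1": ["sample1", "sample2"],
    "sample1": ["assay1"],
    "sample2": ["assay1"],
    "assay1": ["data1"],
}

rows = paths_to_rows(workflow)
for row in rows:
    print("\t".join(row))
```

Even in this small case the shared nodes (`source1`, `assay1`, `data1`) are repeated in every row. On graphs with many branching and pooling steps the number of paths, and hence rows, can grow quickly, which is the kind of redundancy a size-optimised conversion aims to limit.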
Gaudet, Pascale | Bairoch, Amos | Field, Dawn | Sansone, Susanna-Assunta | Taylor, Chris | Attwood, Teresa K. | Bateman, Alex | Blake, Judith A. | Bult, Carol J. | Cherry, J. Michael | Chisholm, Rex L. | Cochrane, Guy | Cook, Charles E. | Eppig, Janan T. | Galperin, Michael Y. | Gentleman, Robert | Goble, Carole A. | Gojobori, Takashi | Hancock, John M. | Howe, Douglas G. | Imanishi, Tadashi | Kelso, Janet | Landsman, David | Lewis, Suzanna E. | Karsch Mizrachi, Ilene | Orchard, Sandra | Ouellette, B.F. Francis | Ranganathan, Shoba | Richardson, Lorna | Rocca-Serra, Philippe | Schofield, Paul N. | Smedley, Damian | Southan, Christopher | Tan, Tin W. | Tatusova, Tatiana | Whetzel, Patricia L. | White, Owen | Yamasaki, Chisato
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources, and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
doi:10.1093/database/baq027
PMCID: PMC3017395
PMID: 21205783
Kettner, Carsten | Field, Dawn | Sansone, Susanna-Assunta | Taylor, Chris | Aerts, Jan | Binns, Nigel | Blake, Andrew | Britten, Cedrik M. | de Marco, Ario | Fostel, Jennifer | Gaudet, Pascale | González-Beltrán, Alejandra | Hardy, Nigel | Hellemans, Jan | Hermjakob, Henning | Juty, Nick | Leebens-Mack, Jim | Maguire, Eamonn | Neumann, Steffen | Orchard, Sandra | Parkinson, Helen | Piel, William | Ranganathan, Shoba | Rocca-Serra, Philippe | Santarsiero, Annapaola | Shotton, David | Sterk, Peter | Untergasser, Andreas | Whetzel, Patricia L.
This report summarizes the proceedings of the second workshop of the ‘Minimum Information for Biological and Biomedical Investigations’ (MIBBI) consortium held on Dec 1-2, 2010 in Rüdesheim, Germany through the sponsorship of the Beilstein-Institute. MIBBI is an umbrella organization uniting communities developing Minimum Information (MI) checklists to standardize the description of data sets, the workflows by which they were generated and the scientific context for the work. This workshop brought together representatives of more than twenty communities to present the status of their MI checklists and plans for future development. Shared challenges and solutions were identified and the role of MIBBI in MI checklist development was discussed. The meeting featured some thirty presentations, wide-ranging discussions and breakout groups. The top outcomes of the two-day workshop as defined by the participants were: 1) the chance to share best practices and to identify areas of synergy; 2) defining a series of tasks for updating the MIBBI Portal; 3) reemphasizing the need to maintain independent MI checklists for various communities while leveraging common terms and workflow elements contained in multiple checklists; and 4) revision of the concept of the MIBBI Foundry to focus on the creation of a core set of MIBBI modules intended for reuse by individual MI checklist projects while maintaining the integrity of each MI project. Further information about MIBBI and its range of activities can be found at http://mibbi.org/.
doi:10.4056/sigs.147362
PMCID: PMC3035314
PMID: 21304730
Field, Dawn | Sansone, Susanna | DeLong, Edward F. | Sterk, Peter | Friedberg, Iddo | Gaudet, Pascale | Lewis, Susanna | Kottmann, Renzo | Hirschman, Lynette | Garrity, George | Cochrane, Guy | Wooley, John | Meyer, Folker | Hunter, Sarah | White, Owen | Bramlett, Brian | Gregurick, Susan | Lapp, Hilmar | Orchard, Sandra | Rocca-Serra, Philippe | Ruttenberg, Alan | Shah, Nigam | Taylor, Chris | Thessen, Anne
This report summarizes the proceedings of the one-day BioSharing meeting held at the Intelligent Systems for Molecular Biology (ISMB) 2010 conference in Boston, MA, USA. This inaugural BioSharing event was hosted by the Genomic Standards Consortium as part of its M3 & BioSharing special interest group (SIG) workshop. The BioSharing event included invited talks from a range of community leaders and a panel discussion at the end of the day. The panel session led to a formal agreement among community leaders to join together to promote cross-community knowledge exchange and collaborations. A key focus of the newly formed BioSharing community will be linking up resources to promote real-world data sharing (virtuous cycle of data) and supporting compliance with data policies through the creation of a one-stop portal of information. Further information about the newly established BioSharing effort can be found at http://biosharing.org.
doi:10.4056/sigs.1403501
PMCID: PMC3035313
PMID: 21304729
Gaudet, Pascale | Bairoch, Amos | Field, Dawn | Sansone, Susanna-Assunta | Taylor, Chris | Attwood, Teresa K. | Bateman, Alex | Blake, Judith A. | Bult, Carol J. | Cherry, J. Michael | Chisholm, Rex L. | Cochrane, Guy | Cook, Charles E. | Eppig, Janan T. | Galperin, Michael Y. | Gentleman, Robert | Goble, Carole A. | Gojobori, Takashi | Hancock, John M. | Howe, Douglas G. | Imanishi, Tadashi | Kelso, Janet | Landsman, David | Lewis, Suzanna E. | Mizrachi, Ilene Karsch | Orchard, Sandra | Ouellette, B. F. Francis | Ranganathan, Shoba | Richardson, Lorna | Rocca-Serra, Philippe | Schofield, Paul N. | Smedley, Damian | Southan, Christopher | Tan, Tin Wee | Tatusova, Tatiana | Whetzel, Patricia L. | White, Owen | Yamasaki, Chisato
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
doi:10.1093/nar/gkq1173
PMCID: PMC3013734
PMID: 21097465
Rocca-Serra, Philippe | Brandizi, Marco | Maguire, Eamonn | Sklyar, Nataliya | Taylor, Chris | Begley, Kimberly | Field, Dawn | Harris, Stephen | Hide, Winston | Hofmann, Oliver | Neumann, Steffen | Sterk, Peter | Tong, Weida | Sansone, Susanna-Assunta
Summary: The first open source software suite for experimentalists and curators that (i) assists in the annotation and local management of experimental metadata from high-throughput studies employing one or a combination of omics and other technologies; (ii) empowers users to take up community-defined checklists and ontologies; and (iii) facilitates submission to international public repositories.
Availability and Implementation: Software, documentation, case studies and implementations at http://www.isa-tools.org
Contact: isatools@googlegroups.com
doi:10.1093/bioinformatics/btq415
PMCID: PMC2935443
PMID: 20679334
Brinkman, Ryan R | Courtot, Mélanie | Derom, Dirk | Fostel, Jennifer M | He, Yongqun | Lord, Phillip | Malone, James | Parkinson, Helen | Peters, Bjoern | Rocca-Serra, Philippe | Ruttenberg, Alan | Sansone, Susanna-Assunta | Soldatova, Larisa N | Stoeckert, Christian J | Turner, Jessica A | Zheng, Jie
Background
Experimental descriptions are typically stored as free text without using standardized terminology, creating challenges in comparison, reproduction and analysis. These difficulties impose limitations on data exchange and information retrieval.
Results
The Ontology for Biomedical Investigations (OBI), developed as a global, cross-community effort, provides a resource that represents biomedical investigations in an explicit and integrative framework. Here we detail three real-world applications of OBI, provide detailed modeling information and explain how to use OBI.
Conclusion
We demonstrate how OBI can be applied to different biomedical investigations to both facilitate interpretation of the experimental process and increase the computational processing and integration within the Semantic Web. The logical definitions of the entities involved allow computers to unambiguously understand and integrate different biological experimental processes and their relevant components.
Availability
OBI is available at http://purl.obolibrary.org/obo/obi/2009-11-02/obi.owl
doi:10.1186/2041-1480-1-S1-S7
PMCID: PMC2903726
PMID: 20626927
van Ommen, Ben | Bouwman, Jildau | Dragsted, Lars O. | Drevon, Christian A. | Elliott, Ruan | de Groot, Philip | Kaput, Jim | Mathers, John C. | Müller, Michael | Pepping, Fre | Saito, Jahn | Scalbert, Augustin | Radonjic, Marijana | Rocca-Serra, Philippe | Travis, Anthony | Wopereis, Suzan | Evelo, Chris T.
The challenge of modern nutrition and health research is to identify food-based strategies promoting life-long optimal health and well-being. This research is complex because it exploits a multitude of bioactive compounds acting on an extensive network of interacting processes. Whereas nutrition research can profit enormously from the revolution in ‘omics’ technologies, it has discipline-specific requirements for analytical and bioinformatic procedures. In addition to measurements of the parameters of interest (measures of health), extensive description of the subjects of study and foods or diets consumed is central for describing the nutritional phenotype. We propose and pursue an infrastructural activity of constructing the “Nutritional Phenotype database” (dbNP). When fully developed, dbNP will be a research and collaboration tool and a publicly available data and knowledge repository. Creation and implementation of the dbNP will maximize benefits to the research community by enabling integration and interrogation of data from multiple studies, from different research groups, different countries and different -omics levels. The dbNP is designed to facilitate storage of biologically relevant, pre-processed -omics data, as well as study descriptive and study participant phenotype data. It is also important to enable the combination of this information at different levels (e.g. to facilitate linkage of data describing participant phenotype, genotype and food intake with information on study design and -omics measurements, and to combine all of this with existing knowledge). The biological information stored in the database (i.e. genetics, transcriptomics, proteomics, biomarkers, metabolomics, functional assays, food intake and food composition) is tailored to nutrition research and embedded in an environment of standard procedures and protocols, annotations, modular data-basing, networking and integrated bioinformatics.
The dbNP is an evolving enterprise, which is only sustainable if it is accepted and adopted by the wider nutrition and health research community as an open source, pre-competitive and publicly available resource where many partners both can contribute and profit from its developments. We introduce the Nutrigenomics Organisation (NuGO, http://www.nugo.org) as a membership association responsible for establishing and curating the dbNP. Within NuGO, all efforts related to dbNP (i.e. usage, coordination, integration, facilitation and maintenance) will be directed towards a sustainable and federated infrastructure.
doi:10.1007/s12263-010-0167-9
PMCID: PMC2935528
PMID: 21052526
Nutritional phenotype; Nutrigenomics; Database
Smith, Barry | Ashburner, Michael | Rosse, Cornelius | Bard, Jonathan | Bug, William | Ceusters, Werner | Goldberg, Louis J | Eilbeck, Karen | Ireland, Amelia | Mungall, Christopher J | Leontis, Neocles | Rocca-Serra, Philippe | Ruttenberg, Alan | Sansone, Susanna-Assunta | Scheuermann, Richard H | Shah, Nigam | Whetzel, Patricia L | Lewis, Suzanna
The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or ‘ontologies’. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration. The Open Biomedical Ontologies (OBO) consortium is pursuing a strategy to overcome this problem. Existing OBO ontologies, including the Gene Ontology, are undergoing coordinated reform, and new ontologies are being created on the basis of an evolving set of shared principles governing ontology development. The result is an expanding family of ontologies designed to be interoperable and logically well formed and to incorporate accurate representations of biological reality. We describe this OBO Foundry initiative and provide guidelines for those who might wish to become involved.
doi:10.1038/nbt1346
PMCID: PMC2814061
PMID: 17989687
Whetzel, Patricia L. | Brinkman, Ryan R. | Causton, Helen C. | Fan, Liju | Field, Dawn | Fostel, Jennifer | Fragoso, Gilberto | Gray, Tanya | Heiskanen, Mervi | Hernandez-Boussard, Tina | Morrison, Norman | Parkinson, Helen | Rocca-Serra, Philippe | Sansone, Susanna-Assunta | Schober, Daniel | Smith, Barry | Stevens, Robert | Stoeckert, Christian J. | Taylor, Chris | White, Joe | Wood, Andrew
The development of the Functional Genomics Investigation Ontology (FuGO) is a collaborative, international effort that will provide a resource for annotating functional genomics investigations, including the study design, protocols and instrumentation used, the data generated and the types of analysis performed on the data. FuGO will contain both terms that are universal to all functional genomics investigations and those that are domain specific. In this way, the ontology will serve as the “semantic glue” to provide a common understanding of data from across these disparate data sources. In addition, FuGO will reference out to existing mature ontologies to avoid the need to duplicate these resources, and will do so in such a way as to enable their ease of use in annotation. This project is in the early stages of development; the paper will describe efforts to initiate the project, the scope and organization of the project, the work accomplished to date, and the challenges encountered, as well as future plans.
doi:10.1089/omi.2006.10.199
PMCID: PMC2783628
PMID: 16901226
Field, Dawn | Sansone, Susanna-Assunta | Collis, Amanda | Booth, Tim | Dukes, Peter | Gregurick, Susan K. | Kennedy, Karen | Kolar, Patrik | Kolker, Eugene | Maxon, Mary | Millard, Siân | Mugabushaka, Alexis-Michel | Perrin, Nicola | Remacle, Jacques E. | Remington, Karin | Rocca-Serra, Philippe | Taylor, Chris F. | Thorley, Mark | Tiwari, Bela | Wilbanks, John
doi:10.1126/science.1180598
PMCID: PMC2770171
PMID: 19815759
New ‘omics’ technologies are changing nutritional sciences research. They make it possible to tackle increasingly complex questions but also increase the need for collaboration between research groups. An important challenge for successful collaboration is the management and structured exchange of information that accompanies data-intense technologies. NuGO, the European Nutrigenomics Organization, the major collaborating network in molecular nutritional sciences, is supporting the application of modern information technologies in this area. We have developed and implemented a concept for data management and computing infrastructure that supports collaboration between nutrigenomics researchers. The system fills the gap between “private” storing with occasional file sharing by email and the use of centralized databases. It provides flexible tools to share data, also during experiments, while preserving ownership. The NuGO Information Network is a decentralized, distributed system for data exchange based on standard web technology. Secure access to data, maintained by the individual researcher, is enabled by web services based on the BioMoby framework. A central directory provides information about available web services. The flexibility of the infrastructure allows a wide variety of services for data processing and integration by combining several web services, including public services. Therefore, this integrated information system is suited for other research collaborations.
doi:10.1007/s12263-009-0123-8
PMCID: PMC2690731
PMID: 19408032
Nutrigenomics; Data management; Data integration; Distributed information system; Web services
Background
A wide variety of ontologies relevant to the biological and medical domains are available through the OBO Foundry portal, and their number is growing rapidly. Integration of these ontologies, while requiring considerable effort, is extremely desirable. However, heterogeneities in format and style pose serious obstacles to such integration. In particular, inconsistencies in naming conventions can impair the readability and navigability of ontology class hierarchies, and hinder their alignment and integration. While other sources of diversity are tremendously complex and challenging, agreeing a set of common naming conventions is an achievable goal, particularly if those conventions are based on lessons drawn from pooled practical experience and surveys of community opinion.
Results
We summarize a review of existing naming conventions and highlight certain disadvantages with respect to general applicability in the biological domain. We also present the results of a survey carried out to establish which naming conventions are currently employed by OBO Foundry ontologies and to determine what their special requirements regarding the naming of entities might be. Lastly, we propose an initial set of typographic, syntactic and semantic conventions for labelling classes in OBO Foundry ontologies.
Conclusion
Adherence to common naming conventions is more than just a matter of aesthetics. Such conventions provide guidance to ontology creators, help developers avoid flaws and inaccuracies when editing, and especially when interlinking, ontologies. Common naming conventions will also assist consumers of ontologies to more readily understand what meanings were intended by the authors of ontologies used in annotating bodies of data.
doi:10.1186/1471-2105-10-125
PMCID: PMC2684543
PMID: 19397794
Background
As the size and complexity of scientific datasets and the corresponding information stores grow, standards for collecting, describing, formatting, submitting and exchanging information are playing an increasingly active role. Several initiatives occupy strategic positions in the international scenario, both within and across domains. However, the job of harmonising reporting standards is still very much a work in progress; both software interoperability and data integration remain challenging as things stand.
Results
The status quo with respect to standardization initiatives is summarized here, with particular emphasis on the motivation for, and the challenges of, ongoing synergistic activities amongst the academic community focused on the creation of truly interoperable standards.
Conclusions
Groups generating standards should engage with ongoing cross-domain activities to simplify the integration of heterogeneous data sets to the greatest possible extent.
PMCID: PMC3041584
PMID: 21347181
Widespread use of microarrays has generated large amounts of data, and interrogating public microarray repositories to identify similarities between microarray experiments is now one of the major challenges. Approaches using defined groups of genes, such as pathways and cellular networks (pathway analysis), have been proposed to improve the interpretation of microarray experiments. We propose a novel method to compare microarray experiments at the pathway level. The method consists of two steps: first, generate pathway signatures, a set of descriptors recapitulating the biologically meaningful pathways related to some clinical/biological variable of interest; second, use these signatures to interrogate microarray databases. We demonstrate that our approach provides more reliable results than gene-based approaches. While gene-based approaches tend to suffer from bias generated by the analytical procedures employed, our pathway-based method successfully groups together similar samples, independently of the experimental design. The results presented are potentially of great interest for improving the ability to query and compare experiments in public repositories of microarray data. In practice, this method can be used to retrieve data from public microarray databases and perform comparisons at the pathway level.
doi:10.1371/journal.pone.0004128
PMCID: PMC2610483
PMID: 19125200
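The two-step idea in the abstract above (build a pathway-level signature, then use it to compare experiments) can be illustrated with a toy sketch. This is not the published method: the scoring rule (mean expression of member genes), the signature definition (case minus control difference per pathway), the cosine similarity measure, and all gene, pathway and function names are assumptions chosen only to make the idea concrete.

```python
from math import sqrt
from statistics import mean

# Hypothetical gene sets; names and memberships are illustrative only.
PATHWAYS = {
    "pathway_A": ["g1", "g2"],
    "pathway_B": ["g3", "g4"],
}

def pathway_scores(sample, pathways):
    """Average the expression of each pathway's member genes in one sample."""
    return {name: mean(sample[g] for g in genes) for name, genes in pathways.items()}

def signature(cases, controls, pathways):
    """Per-pathway difference between mean case score and mean control score."""
    case_scores = [pathway_scores(s, pathways) for s in cases]
    ctrl_scores = [pathway_scores(s, pathways) for s in controls]
    return {
        name: mean(s[name] for s in case_scores) - mean(s[name] for s in ctrl_scores)
        for name in pathways
    }

def cosine_similarity(sig_a, sig_b):
    """Cosine similarity of two signatures over their shared pathways."""
    shared = sorted(set(sig_a) & set(sig_b))
    dot = sum(sig_a[p] * sig_b[p] for p in shared)
    norm_a = sqrt(sum(sig_a[p] ** 2 for p in shared))
    norm_b = sqrt(sum(sig_b[p] ** 2 for p in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Step 1: build a signature for each (made-up) experiment.
experiment_1 = signature(
    cases=[{"g1": 4.0, "g2": 6.0, "g3": 1.0, "g4": 1.0}],
    controls=[{"g1": 1.0, "g2": 1.0, "g3": 1.0, "g4": 3.0}],
    pathways=PATHWAYS,
)
experiment_2 = signature(
    cases=[{"g1": 8.0, "g2": 8.0, "g3": 2.0, "g4": 2.0}],
    controls=[{"g1": 2.0, "g2": 2.0, "g3": 2.0, "g4": 4.0}],
    pathways=PATHWAYS,
)

# Step 2: compare the experiments at the pathway level.
similarity = cosine_similarity(experiment_1, experiment_2)
```

Because the comparison operates on pathway-level summaries rather than individual genes, two experiments with different absolute expression values can still score as similar when the same pathways move in the same direction.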
Parkinson, Helen | Kapushesky, Misha | Kolesnikov, Nikolay | Rustici, Gabriella | Shojatalab, Mohammad | Abeygunawardena, Niran | Berube, Hugo | Dylag, Miroslaw | Emam, Ibrahim | Farne, Anna | Holloway, Ele | Lukk, Margus | Malone, James | Mani, Roby | Pilicheva, Ekaterina | Rayner, Tim F. | Rezwan, Faisal | Sharma, Anjan | Williams, Eleanor | Bradley, Xiangqun Zheng | Adamusiak, Tomasz | Brandizi, Marco | Burdett, Tony | Coulson, Richard | Krestyaninova, Maria | Kurnosov, Pavel | Maguire, Eamonn | Neogi, Sudeshna Guha | Rocca-Serra, Philippe | Sansone, Susanna-Assunta | Sklyar, Nataliya | Zhao, Mengyao | Sarkans, Ugis | Brazma, Alvis
ArrayExpress (http://www.ebi.ac.uk/arrayexpress) consists of three components: the ArrayExpress Repository, a public archive of functional genomics experiments and supporting data; the ArrayExpress Warehouse, a database of gene expression profiles and other bio-measurements; and the ArrayExpress Atlas, a new summary database and meta-analytical tool of ranked gene expression across multiple experiments and different biological conditions. The Repository contains data from over 6000 experiments comprising approximately 200 000 assays, and the database doubles in size every 15 months. The majority of the data are array based, but other data types are included, most recently ultra high-throughput sequencing transcriptomics and epigenetic data. The Warehouse and Atlas allow users to query for differentially expressed genes by gene names and properties, experimental conditions and sample properties, or a combination of both. In this update, we describe the ArrayExpress developments over the last two years.
doi:10.1093/nar/gkn889
PMCID: PMC2686529
PMID: 19015125
Rayner, Tim F | Rocca-Serra, Philippe | Spellman, Paul T | Causton, Helen C | Farne, Anna | Holloway, Ele | Irizarry, Rafael A | Liu, Junmin | Maier, Donald S | Miller, Michael | Petersen, Kjell | Quackenbush, John | Sherlock, Gavin | Stoeckert, Christian J | White, Joseph | Whetzel, Patricia L | Wymore, Farrell | Parkinson, Helen | Sarkans, Ugis | Ball, Catherine A | Brazma, Alvis
Background
Sharing of microarray data within the research community has been greatly facilitated by the development of the disclosure and communication standards MIAME and MAGE-ML by the MGED Society. However, the complexity of the MAGE-ML format has made its use impractical for laboratories lacking dedicated bioinformatics support.
Results
We propose a simple tab-delimited, spreadsheet-based format, MAGE-TAB, which will become a part of the MAGE microarray data standard and can be used for annotating and communicating microarray data in a MIAME compliant fashion.
Conclusion
MAGE-TAB will enable laboratories without bioinformatics experience or support to manage, exchange and submit well-annotated microarray data in a standard format using a spreadsheet. The MAGE-TAB format is self-contained, and does not require an understanding of MAGE-ML or XML.
doi:10.1186/1471-2105-7-489
PMCID: PMC1687205
PMID: 17087822
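Because the format described above is plain tab-delimited text, a spreadsheet saved in it can be read with standard tooling and no XML handling. The following is a minimal sketch under stated assumptions: the column headers, sample values and function names are hypothetical and are not taken from the MAGE-TAB specification.

```python
import csv
import io

# Hypothetical tab-delimited sample sheet; headers and values are illustrative only.
SHEET = (
    "Sample Name\tOrganism\tData File\n"
    "sample_1\tHomo sapiens\tsample_1.txt\n"
    "sample_2\tMus musculus\tsample_2.txt\n"
)

def read_tab_delimited(text):
    """Parse tab-delimited text into a list of dicts keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]

def missing_values(rows, required):
    """Return (row_index, column) pairs where a required column is empty or absent."""
    problems = []
    for i, row in enumerate(rows):
        for col in required:
            if not (row.get(col) or "").strip():
                problems.append((i, col))
    return problems

rows = read_tab_delimited(SHEET)
problems = missing_values(rows, ["Sample Name", "Organism", "Data File"])
```

A check like `missing_values` is one simple way a laboratory could catch incomplete annotation before submission; which columns are actually required would have to come from the format's own documentation.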
Background
Incorporation of ontologies into annotations has enabled 'semantic integration' of complex data, making explicit the knowledge within a certain field. One of the major bottlenecks in developing bio-ontologies is the lack of a unified methodology. Different methodologies have been proposed for different scenarios, but there is no agreed-upon standard methodology for building ontologies. The involvement of geographically distributed domain experts, the need for domain experts to lead the design process, the application of the ontologies and the life cycles of bio-ontologies are amongst the features not considered by previously proposed methodologies.
Results
Here, we present a methodology for developing ontologies within the biological domain. We describe our scenario, competency questions, results and milestones for each methodological stage. We introduce the use of concept maps during knowledge acquisition phases as a feasible transition between domain expert and knowledge engineer.
Conclusion
The contributions of this paper are the thorough description of the steps we suggest when building an ontology, example use of concept maps, consideration of applicability to the development of lower-level ontologies and application to decentralised environments. We have found that within our scenario concept maps played an important role in the development process.
doi:10.1186/1471-2105-7-267
PMCID: PMC1524992
PMID: 16725019