Search tips
Search criteria

Results 1-14 (14)

Clipboard (0)
Year of Publication
1.  BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences 
BioSharing ( is a manually curated, searchable portal of three linked registries. These resources cover standards (terminologies, formats and models, and reporting guidelines), databases, and data policies in the life sciences, broadly encompassing the biological, environmental and biomedical sciences. Launched in 2011 and built by the same core team as the successful MIBBI portal, BioSharing harnesses community curation to collate and cross-reference resources across the life sciences from around the world. BioSharing makes these resources findable and accessible (the core of the FAIR principle). Every record is designed to be interlinked, providing a detailed description not only on the resource itself, but also on its relations with other life science infrastructures. Serving a variety of stakeholders, BioSharing cultivates a growing community, to which it offers diverse benefits. It is a resource for funding bodies and journal publishers to navigate the metadata landscape of the biological sciences; an educational resource for librarians and information advisors; a publicising platform for standard and database developers/curators; and a research tool for bench and computer scientists to plan their work. BioSharing is working with an increasing number of journals and other registries, for example linking standards and databases to training material and tools. Driven by an international Advisory Board, the BioSharing user-base has grown by over 40% (by unique IP address), in the last year thanks to successful engagement with researchers, publishers, librarians, developers and other stakeholders via several routes, including a joint RDA/Force11 working group and a collaboration with the International Society for Biocuration. In this article, we describe BioSharing, with a particular focus on community-led curation.
Database URL:
PMCID: PMC4869797  PMID: 27189610
2.  linkedISA: semantic representation of ISA-Tab experimental metadata 
BMC Bioinformatics  2014;15(Suppl 14):S4.
Reporting and sharing experimental metadata- such as the experimental design, characteristics of the samples, and procedures applied, along with the analysis results, in a standardised manner ensures that datasets are comprehensible and, in principle, reproducible, comparable and reusable. Furthermore, sharing datasets in formats designed for consumption by humans and machines will also maximize their use. The Investigation/Study/Assay (ISA) open source metadata tracking framework facilitates standards-compliant collection, curation, visualization, storage and sharing of datasets, leveraging on other platforms to enable analysis and publication. The ISA software suite includes several components used in increasingly diverse set of life science and biomedical domains; it is underpinned by a general-purpose format, ISA-Tab, and conversions exist into formats required by public repositories. While ISA-Tab works well mainly as a human readable format, we have also implemented a linked data approach to semantically define the ISA-Tab syntax.
We present a semantic web representation of the ISA-Tab syntax that complements ISA-Tab's syntactic interoperability with semantic interoperability. We introduce the linkedISA conversion tool from ISA-Tab to the Resource Description Framework (RDF), supporting mappings from the ISA syntax to multiple community-defined, open ontologies and capitalising on user-provided ontology annotations in the experimental metadata. We describe insights of the implementation and how annotations can be expanded driven by the metadata. We applied the conversion tool as part of Bio-GraphIIn, a web-based application supporting integration of the semantically-rich experimental descriptions. Designed in a user-friendly manner, the Bio-GraphIIn interface hides most of the complexities to the users, exposing a familiar tabular view of the experimental description to allow seamless interaction with the RDF representation, and visualising descriptors to drive the query over the semantic representation of the experimental design. In addition, we defined queries over the linkedISA RDF representation and demonstrated its use over the linkedISA conversion of datasets from Nature' Scientific Data online publication.
Our linked data approach has allowed us to: 1) make the ISA-Tab semantics explicit and machine-processable, 2) exploit the existing ontology-based annotations in the ISA-Tab experimental descriptions, 3) augment the ISA-Tab syntax with new descriptive elements, 4) visualise and query elements related to the experimental design. Reasoning over ISA-Tab metadata and associated data will facilitate data integration and knowledge discovery.
PMCID: PMC4255742  PMID: 25472428
3.  The Risa R/Bioconductor package: integrative data analysis from experimental metadata and back again 
BMC Bioinformatics  2014;15(Suppl 1):S11.
The ISA-Tab format and software suite have been developed to break the silo effect induced by technology-specific formats for a variety of data types and to better support experimental metadata tracking. Experimentalists seldom use a single technique to monitor biological signals. Providing a multi-purpose, pragmatic and accessible format that abstracts away common constructs for describing Investigations, Studies and Assays, ISA is increasingly popular. To attract further interest towards the format and extend support to ensure reproducible research and reusable data, we present the Risa package, which delivers a central component to support the ISA format by enabling effortless integration with R, the popular, open source data crunching environment.
The Risa package bridges the gap between the metadata collection and curation in an ISA-compliant way and the data analysis using the widely used statistical computing environment R. The package offers functionality for: i) parsing ISA-Tab datasets into R objects, ii) augmenting annotation with extra metadata not explicitly stated in the ISA syntax; iii) interfacing with domain specific R packages iv) suggesting potentially useful R packages available in Bioconductor for subsequent processing of the experimental data described in the ISA format; and finally v) saving back to ISA-Tab files augmented with analysis specific metadata from R. We demonstrate these features by presenting use cases for mass spectrometry data and DNA microarray data.
The Risa package is open source (with LGPL license) and freely available through Bioconductor. By making Risa available, we aim to facilitate the task of processing experimental data, encouraging a uniform representation of experimental information and results while delivering tools for ensuring traceability and provenance tracking.
Software availability
The Risa package is available since Bioconductor 2.11 (version 1.0.0) and version 1.2.1 appeared in Bioconductor 2.12, both along with documentation and examples. The latest version of the code is at the development branch in Bioconductor and can also be accessed from GitHub, where the issue tracker allows users to report bugs or feature requests.
PMCID: PMC4015122  PMID: 24564732
4.  EBI metagenomics—a new resource for the analysis and archiving of metagenomic data 
Nucleic Acids Research  2013;42(Database issue):D600-D606.
Metagenomics is a relatively recently established but rapidly expanding field that uses high-throughput next-generation sequencing technologies to characterize the microbial communities inhabiting different ecosystems (including oceans, lakes, soil, tundra, plants and body sites). Metagenomics brings with it a number of challenges, including the management, analysis, storage and sharing of data. In response to these challenges, we have developed a new metagenomics resource ( that allows users to easily submit raw nucleotide reads for functional and taxonomic analysis by a state-of-the-art pipeline, and have them automatically stored (together with descriptive, standards-compliant metadata) in the European Nucleotide Archive.
PMCID: PMC3965009  PMID: 24165880
5.  The MetaboLights repository: curation challenges in metabolomics 
MetaboLights is the first general-purpose open-access curated repository for metabolomic studies, their raw experimental data and associated metadata, maintained by one of the major open-access data providers in molecular biology. Increases in the number of depositions, number of samples per study and the file size of data submitted to MetaboLights present a challenge for the objective of ensuring high-quality and standardized data in the context of diverse metabolomic workflows and data representations. Here, we describe the MetaboLights curation pipeline, its challenges and its practical application in quality control of complex data depositions.
Database URL:
PMCID: PMC3638156  PMID: 23630246
6.  OntoMaton: a Bioportal powered ontology widget for Google Spreadsheets 
Bioinformatics  2012;29(4):525-527.
Motivation: Data collection in spreadsheets is ubiquitous, but current solutions lack support for collaborative semantic annotation that would promote shared and interdisciplinary annotation practices, supporting geographically distributed players.
Results: OntoMaton is an open source solution that brings ontology lookup and tagging capabilities into a cloud-based collaborative editing environment, harnessing Google Spreadsheets and the NCBO Web services. It is a general purpose, format-agnostic tool that may serve as a component of the ISA software suite. OntoMaton can also be used to assist the ontology development process.
Availability: OntoMaton is freely available from Google widgets under the CPAL open source license; documentation and examples at:
PMCID: PMC3570217  PMID: 23267176
7.  MetaboLights—an open-access general-purpose repository for metabolomics studies and associated meta-data 
Nucleic Acids Research  2012;41(Database issue):D781-D786.
MetaboLights ( is the first general-purpose, open-access repository for metabolomics studies, their raw experimental data and associated metadata, maintained by one of the major open-access data providers in molecular biology. Metabolomic profiling is an important tool for research into biological functioning and into the systemic perturbations caused by diseases, diet and the environment. The effectiveness of such methods depends on the availability of public open data across a broad range of experimental methods and conditions. The MetaboLights repository, powered by the open source ISA framework, is cross-species and cross-technique. It will cover metabolite structures and their reference spectra as well as their biological roles, locations, concentrations and raw data from metabolic experiments. Studies automatically receive a stable unique accession number that can be used as a publication reference (e.g. MTBLS1). At present, the repository includes 15 submitted studies, encompassing 93 protocols for 714 assays, and span over 8 different species including human, Caenorhabditis elegans, Mus musculus and Arabidopsis thaliana. Eight hundred twenty-seven of the metabolites identified in these studies have been mapped to ChEBI. These studies cover a variety of techniques, including NMR spectroscopy and mass spectrometry.
PMCID: PMC3531110  PMID: 23109552
8.  MetaboLights: towards a new COSMOS of metabolomics data management 
Metabolomics  2012;8(5):757-760.
Exciting funding initiatives are emerging in Europe and the US for metabolomics data production, storage, dissemination and analysis. This is based on a rich ecosystem of resources around the world, which has been build during the past ten years, including but not limited to resources such as MassBank in Japan and the Human Metabolome Database in Canada. Now, the European Bioinformatics Institute has launched MetaboLights, a database for metabolomics experiments and the associated metadata ( It is the first comprehensive, cross-species, cross-platform metabolomics database maintained by one of the major open access data providers in molecular biology. In October, the European COSMOS consortium will start its work on Metabolomics data standardization, publication and dissemination workflows. The NIH in the US is establishing 6–8 metabolomics services cores as well as a national metabolomics repository. This communication reports about MetaboLights as a new resource for Metabolomics research, summarises the related developments and outlines how they may consolidate the knowledge management in this third large omics field next to proteomics and genomics.
PMCID: PMC3465651  PMID: 23060735
Metabolomics; Databases; ISA-Tab; ISA commons
9.  Toward interoperable bioscience data 
Nature genetics  2012;44(2):121-126.
To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open ‘data commoning’ culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared ‘Investigation-Study-Assay’ framework to support that vision.
PMCID: PMC3428019  PMID: 22281772
10.  The Stem Cell Discovery Engine: an integrated repository and analysis system for cancer stem cell comparisons 
Nucleic Acids Research  2011;40(Database issue):D984-D991.
Mounting evidence suggests that malignant tumors are initiated and maintained by a subpopulation of cancerous cells with biological properties similar to those of normal stem cells. However, descriptions of stem-like gene and pathway signatures in cancers are inconsistent across experimental systems. Driven by a need to improve our understanding of molecular processes that are common and unique across cancer stem cells (CSCs), we have developed the Stem Cell Discovery Engine (SCDE)—an online database of curated CSC experiments coupled to the Galaxy analytical framework. The SCDE allows users to consistently describe, share and compare CSC data at the gene and pathway level. Our initial focus has been on carefully curating tissue and cancer stem cell-related experiments from blood, intestine and brain to create a high quality resource containing 53 public studies and 1098 assays. The experimental information is captured and stored in the multi-omics Investigation/Study/Assay (ISA-Tab) format and can be queried in the data repository. A linked Galaxy framework provides a comprehensive, flexible environment populated with novel tools for gene list comparisons against molecular signatures in GeneSigDB and MSigDB, curated experiments in the SCDE and pathways in WikiPathways. The SCDE is available at
PMCID: PMC3245064  PMID: 22121217
11.  Meeting Report from the Second “Minimum Information for Biological and Biomedical Investigations” (MIBBI) workshop 
Standards in Genomic Sciences  2010;3(3):259-266.
This report summarizes the proceedings of the second workshop of the ‘Minimum Information for Biological and Biomedical Investigations’ (MIBBI) consortium held on Dec 1-2, 2010 in Rüdesheim, Germany through the sponsorship of the Beilstein-Institute. MIBBI is an umbrella organization uniting communities developing Minimum Information (MI) checklists to standardize the description of data sets, the workflows by which they were generated and the scientific context for the work. This workshop brought together representatives of more than twenty communities to present the status of their MI checklists and plans for future development. Shared challenges and solutions were identified and the role of MIBBI in MI checklist development was discussed. The meeting featured some thirty presentations, wide-ranging discussions and breakout groups. The top outcomes of the two-day workshop as defined by the participants were: 1) the chance to share best practices and to identify areas of synergy; 2) defining a series of tasks for updating the MIBBI Portal; 3) reemphasizing the need to maintain independent MI checklists for various communities while leveraging common terms and workflow elements contained in multiple checklists; and 4) revision of the concept of the MIBBI Foundry to focus on the creation of a core set of MIBBI modules intended for reuse by individual MI checklist projects while maintaining the integrity of each MI project. Further information about MIBBI and its range of activities can be found at
PMCID: PMC3035314  PMID: 21304730
12.  ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level 
Bioinformatics  2010;26(18):2354-2356.
Summary: The first open source software suite for experimentalists and curators that (i) assists in the annotation and local management of experimental metadata from high-throughput studies employing one or a combination of omics and other technologies; (ii) empowers users to uptake community-defined checklists and ontologies; and (iii) facilitates submission to international public repositories.
Availability and Implementation: Software, documentation, case studies and implementations at
PMCID: PMC2935443  PMID: 20679334
13.  Towards Interoperable Reporting Standards for Omics Data: Hopes and Hurdles 
As the size and complexity of scientific datasets and the corresponding information stores grow, standards for collecting, describing, formatting, submitting and exchanging information are playing an increasingly active role. Several initiatives occupy strategic positions in the international scenario, both within and across domains. However, the job of harmonising reporting standards is still very much a work in progress; both software interoperability and the data integration remain challenging as things stand.
The status quo with respect to standardization initiatives is summarized here, with particular emphasis on the motivation for, and the challenges of, ongoing synergistic activities amongst the academic community focused on the creation of truly interoperable standards.
Groups generating standards should engage with ongoing cross-domain activities to simplify the integration of heterogeneous data sets to the greatest possible extent.
PMCID: PMC3041584  PMID: 21347181
14.  ArrayExpress update—from an archive of functional genomics experiments to the atlas of gene expression 
Nucleic Acids Research  2008;37(Database issue):D868-D872.
ArrayExpress consists of three components: the ArrayExpress Repository—a public archive of functional genomics experiments and supporting data, the ArrayExpress Warehouse—a database of gene expression profiles and other bio-measurements and the ArrayExpress Atlas—a new summary database and meta-analytical tool of ranked gene expression across multiple experiments and different biological conditions. The Repository contains data from over 6000 experiments comprising approximately 200 000 assays, and the database doubles in size every 15 months. The majority of the data are array based, but other data types are included, most recently—ultra high-throughput sequencing transcriptomics and epigenetic data. The Warehouse and Atlas allow users to query for differentially expressed genes by gene names and properties, experimental conditions and sample properties, or a combination of both. In this update, we describe the ArrayExpress developments over the last two years.
PMCID: PMC2686529  PMID: 19015125

Results 1-14 (14)