The development of high-throughput genomic and post-genomic (hereafter, ‘omics’) technologies entails changes in the handling, processing and sharing of data (Schofield et al.). Omics datasets are often complex and rich in context. Studies may run material through several kinds of assay, using both omics and other technologies; for example, studying the effect of a compound on rat liver through transcriptome, proteome and metabolome profiling (using high-throughput sequencing and two kinds of mass spectrometry, respectively), alongside conventional analyses (e.g. histopathology). If such data are to underpin future investigations, they must be accompanied by enough contextual information (i.e. metadata: sample characteristics, technology and measurement types, instrument parameters and sample-to-data relationships) to make datasets comprehensible and reusable.
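The kinds of contextual information listed above can be pictured as a simple structured record. The following is a minimal, illustrative sketch only; the field names are invented for this example and do not represent the ISA format or any repository's data model.

```python
# Illustrative sketch: the contextual metadata described above, represented
# as a nested record. All field names here are assumptions for illustration.
sample_metadata = {
    "sample": {
        "organism": "Rattus norvegicus",
        "organ": "liver",
        "treatment": "compound exposure",  # sample characteristics
    },
    "assays": [
        {
            "measurement": "transcription profiling",   # measurement type
            "technology": "high-throughput sequencing",  # technology type
            "instrument_parameters": {"read_length": 100},
            "data_files": ["sample1_reads.fastq"],       # sample-to-data link
        },
        {
            "measurement": "metabolite profiling",
            "technology": "mass spectrometry",
            "instrument_parameters": {"ionization": "ESI"},
            "data_files": ["sample1_spectra.mzML"],
        },
    ],
}

# The sample-to-data relationships are recoverable by walking assays -> files.
data_files = [f for assay in sample_metadata["assays"]
              for f in assay["data_files"]]
```

A record like this makes explicit how one biological sample fans out into multiple assays and data files, which is exactly the context a multi-assay study must preserve.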
Many funders and journals require that researchers share data, and encourage the enrichment and standardization of experimental metadata (Field et al.). Consequently, more and richer studies are flowing into public databases. However, two bottlenecks significantly hamper this process and demand urgent solutions. First, international public repositories for omics data, such as GEO (Barrett et al.), ArrayExpress (Parkinson et al.), PRIDE (Vizcaíno et al.), and ENA, SRA and DRA (Shumway et al.), each have their own submission formats, data models and terminologies, created for specific types of assay. This complicates the submission process for researchers producing multi-assay studies, and greatly increases the risk that such datasets become irrevocably fragmented. Secondly, the shortage of curators to check and annotate submissions to public repositories (a situation unlikely to change soon) necessitates better annotation at source, by experimentalists or through community-based efforts (Howe et al.). Free software with automated content validation is required to facilitate the in-house collection, management and curation of a variety of studies, and to format those data for submission to public repositories. Such software should support community-defined reporting standards, such as the minimum information checklists listed by the MIBBI Portal (Taylor et al.), and ontologies (Côté et al.; Smith et al.; Noy et al.).
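The "automated content validation" mentioned above amounts, at its simplest, to checking a metadata record against a minimum-information checklist. The sketch below is a hedged illustration of that idea; the required fields are invented for the example and are not taken from any actual MIBBI checklist.

```python
# Hedged sketch: validating a metadata record against a minimum-information
# checklist. REQUIRED_FIELDS is an assumption for illustration, not a real
# community checklist.
REQUIRED_FIELDS = {"organism", "sample_type", "technology", "instrument"}


def validate(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record passes."""
    problems = [f"missing required field: {field}"
                for field in sorted(REQUIRED_FIELDS - record.keys())]
    problems += [f"empty value for field: {key}"
                 for key, value in record.items() if value in ("", None)]
    return problems
```

Run at the point of data entry, a check like this catches missing or empty fields while the experimentalist can still supply them, rather than leaving the gap for a repository curator to find later.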
The Investigation/Study/Assay (ISA) infrastructure described here is the first general-purpose format and freely available desktop software suite designed to regularize local management of experimental metadata by enabling curation at source, supporting community-defined reporting standards and preparing studies for submission to public repositories.