In addition to integrating processed data from multiple array platforms in the ArrayExpress Atlas, we have also performed a per-platform integration based on re-annotation, data quality assessment and re-normalization. A large data set of more than 5000 hybridizations and 370 different biological conditions on the Affymetrix U133A platform is now available. The meta-analyses indicate that, although these data originate from multiple laboratories, the biological signal is significantly stronger than the laboratory effects, and that new biological insights can be obtained from this approach (Lukk et al., manuscript in preparation). All raw and normalized data are available for this data set (accession number E-TABM-185). Similar data sets have been produced for the mouse Affymetrix platforms U74Av2, MOE430A and 430 2.0 (accession numbers E-MTAB-26, E-MTAB-27, E-MTAB-28).
All ArrayExpress data are now available for download in MAGE-TAB format. To aid bioinformaticians and other users interested in large-scale functional genomics analysis, a Bioconductor package called ArrayExpress (http://www.bioconductor.org/packages/2.3/bioc/html/ArrayExpress.html) has been developed in collaboration with the Huber Group (EBI). This package allows the direct import of MAGE-TAB files into Bioconductor as native ExpressionSet objects, which are compatible with existing Bioconductor data analysis and visualization modules.
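Because MAGE-TAB is a tab-delimited format, its sample annotation can also be read with general-purpose tools. The following minimal Python sketch is not part of the ArrayExpress Bioconductor package; it only illustrates, under stated assumptions, how the sample and data relationship (SDRF) part of a MAGE-TAB submission might be grouped by an experimental factor. The file content, the file names, the helper `group_files_by_factor` and the exact column spelling (`Factor Value[...]`, `Array Data File`) are assumptions made for illustration; real files may differ in spacing and in column selection.

```python
import csv
import io
from collections import defaultdict

# Toy SDRF content. Sample names, file names and values are invented
# for illustration and do not come from any ArrayExpress experiment.
SDRF_TEXT = (
    "Source Name\tCharacteristics[organism]\t"
    "Array Data File\tFactor Value[disease state]\n"
    "sample 1\tHomo sapiens\tsample1.CEL\tnormal\n"
    "sample 2\tHomo sapiens\tsample2.CEL\tcarcinoma\n"
    "sample 3\tHomo sapiens\tsample3.CEL\tcarcinoma\n"
)


def group_files_by_factor(sdrf_handle, factor):
    """Map each value of one experimental factor to its raw data files.

    Hypothetical helper: assumes the header spelling used in SDRF_TEXT.
    """
    column = f"Factor Value[{factor}]"
    groups = defaultdict(list)
    # SDRF rows are tab-separated, with one header row naming the columns.
    for row in csv.DictReader(sdrf_handle, delimiter="\t"):
        groups[row[column]].append(row["Array Data File"])
    return dict(groups)


groups = group_files_by_factor(io.StringIO(SDRF_TEXT), "disease state")
print(groups)
```

For a downloaded file, the same function could be given an open file handle instead of the in-memory string.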
The ArrayExpress Atlas, Repository and Warehouse have web service APIs, enabling programmatic queries. ArrayExpress can also be queried along with all EBI core databases via the EBI general query interface ‘EB-eye’. The ArrayExpress submission tools, MIAMExpress and Tab2MAGE, are undergoing continuous improvement to facilitate submissions of large experiments, to work with MAGE-TAB files, and to accept UHTS-based transcriptomics data.
To improve queries of the ArrayExpress Atlas, we have developed an application ontology called the Experimental Factor Ontology (EFO). EFO version 0.6 (16) currently contains 1078 terms in an ‘is-a’ and ‘part-of’ hierarchy including diseases, multi-species anatomy, compounds and cell-type terms. It maps to several non-orthogonal ontologies, such as those for human anatomy, the Disease Ontology (17), the Cell Type Ontology (18) and the NCI Thesaurus (10). Use of the EFO allows tuning of the ontology based on analysis of user queries and provision of annotation at an appropriate level of granularity for the database content. The EFO is deployed in the Atlas interface, where queries can be expanded via the hierarchies. For example, a query for the condition ‘cancer’ will also retrieve the conditions ‘sarcoma’, ‘carcinoma’ and other cancers. The EFO can be browsed through the EBI Ontology Lookup Service (OLS) (19) and downloaded in OBO and OWL formats (http://www.ebi.ac.uk/microarray-srv/efo/