IntAct strives to provide users with data featuring a high level of structured annotation (i.e. as far as available in the publication). The ways to achieve this goal are manifold:
IntAct makes extensive use of ontologies to represent experimental conditions as well as general concepts such as databases or interactor types. This enforces data integrity and provides a powerful means of searching the data. IntAct mainly uses the ontologies of the PSI-MI standard for molecular interactions.
Major categories of controlled vocabulary in IntAct
Mapping of biological objects
Interacting molecules are systematically mapped to stable identifiers from public databases such as UniProtKB (4) for proteins, ChEBI (5) for small molecules and the DDBJ/EMBL/GenBank (6) nucleotide databases for nucleic acids. This is a highly time-consuming part of the curation process, but it is crucial to ensure the precision and comparability of the data. Where the authors give sequence information when describing a feature such as an interacting residue or binding site, this is mapped back to the parent sequence (or, when possible, the appropriate isoform) in UniProtKB. Where sequence information is not given, e.g. when identification is made by antibody detection, it is assumed that the authors' annotation is correct. However, IntAct keeps each interaction associated with the descriptions of both the interaction detection method and the participant detection method, which allows users to make their own assessment of the accuracy of the data. When mapping high-throughput datasets, there is often a small proportion of participants that cannot be traced because the identifier used is unstable. Proteins are remapped to UniProtKB so that its versioning and archiving services can be used to maintain the mappings; author identifiers are retained and revisited in an attempt to bring coverage up to 100%.
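The remapping strategy described above can be illustrated with a minimal sketch. It is not IntAct's actual implementation: the `Participant` class, the `remap` and `coverage` functions, and the lookup table are hypothetical names introduced here, and the accessions used in the usage note are placeholders. The sketch only shows the idea that the author identifier is always retained, so that unmapped participants can be revisited when a better lookup becomes available.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Participant:
    """Hypothetical participant record (illustrative only)."""
    author_id: str                    # identifier as given by the authors; never discarded
    uniprot_ac: Optional[str] = None  # stable UniProtKB accession, once one is found


def remap(participants: List[Participant], lookup: Dict[str, str]) -> List[Participant]:
    """Attach a UniProtKB accession to every unmapped participant the lookup knows."""
    for p in participants:
        if p.uniprot_ac is None:
            p.uniprot_ac = lookup.get(p.author_id)
    return participants


def coverage(participants: List[Participant]) -> float:
    """Fraction of participants that carry a UniProtKB accession."""
    if not participants:
        return 1.0
    mapped = sum(1 for p in participants if p.uniprot_ac is not None)
    return mapped / len(participants)
```

For example, calling `remap` on two participants with a lookup that only knows the first leaves the second unmapped but still carrying its `author_id`; a later call with an improved lookup can then raise `coverage` from 0.5 to 1.0.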
Over the years, we have written and maintained a very detailed curation manual explaining how IntAct records are curated. This manual is publicly available from the IntAct home page.
All records are manually annotated by domain experts, using the curation manual as a reference guide. Every record is then cross-checked by a second curator.
By studying the records produced over time, a set of recurrent data consistency issues has been identified. Computational checking for these cases is performed on a nightly basis. Curators are sent reports and requested to amend the records concerned.
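A rule-based check of this kind might look like the following sketch. The two rules, the record layout (plain dictionaries with `id`, `curator`, `interaction_detection_method` and `participants` keys) and all function names are assumptions made for illustration; the text does not list the rules IntAct actually applies.

```python
from collections import defaultdict


def check_has_detection_method(record):
    """Hypothetical rule: every record names an interaction detection method."""
    if not record.get("interaction_detection_method"):
        return "missing interaction detection method"
    return None


def check_participants_identified(record):
    """Hypothetical rule: every participant carries a database identifier."""
    unidentified = [p for p in record.get("participants", []) if not p.get("identifier")]
    if unidentified:
        return f"{len(unidentified)} participant(s) without a database identifier"
    return None


CHECKS = [check_has_detection_method, check_participants_identified]


def run_checks(records, checks=CHECKS):
    """Return {curator: [(record_id, message), ...]} for every failed check."""
    reports = defaultdict(list)
    for record in records:
        for check in checks:
            message = check(record)
            if message is not None:
                reports[record["curator"]].append((record["id"], message))
    return dict(reports)
```

Grouping the messages by curator mirrors the reporting step: each curator receives only the issues found in their own records, and a newly identified recurrent issue can be covered by appending one more function to `CHECKS`.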
Authors of publications reporting molecular interaction data are encouraged to submit the interaction data to IntAct prior to publication. On finalization of the record, we will issue a public accession number that can be referred to in the manuscript. However, the data will only be released on publication of the manuscript or on explicit request of the data submitter. For details of submission methods and formats, please refer to the deposition page of the International Molecular Interaction Exchange (IMEx) consortium of molecular interaction databases at http://imex.sf.net
IntAct increasingly collaborates with partners on specific curation topics, either performing targeted curation for collaborators or providing a private instance of IntAct together with infrastructure and support for curation projects run by external partners. If you are interested in either of these, please contact intact-help/at/ebi.ac.uk. IntAct data is released on a weekly basis and is available on the web site as well as for download in PSI-MI 1.0 and 2.5 XML formats (classified by organism and publication).