The Biological General Repository for Interaction Datasets (BioGRID) database (http://www.thebiogrid.org) was developed to house and distribute collections of protein and genetic interactions from major model organism species. BioGRID currently contains over 198 000 interactions from six different species, as derived from both high-throughput studies and conventional focused studies. Through comprehensive curation efforts, BioGRID now includes a virtually complete set of interactions reported to date in the primary literature for both the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe. A number of new features have been added to the BioGRID including an improved user interface to display interactions based on different attributes, a mirror site and a dedicated interaction management system to coordinate curation across different locations. The BioGRID provides interaction data with monthly updates to Saccharomyces Genome Database, Flybase and Entrez Gene. Source code for the BioGRID and the linked Osprey network visualization system is now freely available without restriction.
Protein interactions underlie cell structure, biochemical activity and dynamic behavior; in turn, myriad genetic interactions reflect the vast functional interconnectivities of the protein network (1). High-throughput technologies now generate large datasets of protein and genetic interactions, which complement more conventional detailed investigations of cellular processes (2). The collation of various types of interaction data is essential for interrogation of system-level attributes (3), and to this end a number of important interaction databases have been developed (4–8). Previously, we described a database called ‘Biological General Repository for Interaction Datasets’ (BioGRID) (www.thebiogrid.org) to archive and distribute comprehensive collections of physical and genetic interactions (9).
The BioGRID has grown into a general resource for the research community with an average of 80 000 queries per month and millions of interactions downloaded per year. The 1 October 2007 version of BioGRID (v2.0.33) contains 198 791 (129 584 non-redundant) interaction records, comprising 137 834 (90 577 non-redundant) protein interactions and 60 957 (39 007 non-redundant) genetic interactions (Table 1). BioGRID provides full annotation support for 13 major model organism species (9), and currently houses interactions for Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Homo sapiens. Gene annotation tables for all supported species are routinely updated to prevent ambiguous search results. Sources of data in BioGRID include publications that report high-throughput interaction datasets and many focused individual studies curated from inspection of the primary literature (2). Each interaction record in BioGRID contains experimental evidence codes and is linked to the supporting publication. In addition to the BioGRID website, all interactions in BioGRID are available through the dynamically linked Osprey visualization system, which can be used to query network organization in a user-defined fashion (10). BioGRID currently holds observer status in the IMEx consortium of interaction databases (http://imex.sourceforge.net/).
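The distinction between total and non-redundant counts reported above can be illustrated with a minimal Python sketch. It assumes that a non-redundant interaction is a unique unordered gene pair within one interaction type, regardless of evidence code or publication; the exact rule used by BioGRID is not stated here, so this definition, the `Record` layout and all gene names, evidence labels and publication keys are hypothetical placeholders.

```python
from collections import namedtuple

# Hypothetical record layout; field names are assumptions, not the BioGRID schema.
Record = namedtuple("Record", ["gene_a", "gene_b", "kind", "evidence", "publication"])

def count_interactions(records):
    """Return (total, non_redundant) for a list of Record tuples.

    Assumption: two records are redundant when they involve the same
    unordered gene pair and the same interaction type.
    """
    unique = {(frozenset((r.gene_a, r.gene_b)), r.kind) for r in records}
    return len(records), len(unique)

# Toy records with placeholder gene names, evidence labels and publication keys.
records = [
    Record("YFG1", "YFG2", "protein", "evidence-1", "pub-1"),
    Record("YFG2", "YFG1", "protein", "evidence-2", "pub-2"),  # same pair, new evidence
    Record("YFG1", "YFG2", "genetic", "evidence-3", "pub-3"),  # same pair, other type
    Record("YFG3", "YFG1", "protein", "evidence-2", "pub-4"),
]

total, non_redundant = count_interactions(records)
print(total, non_redundant)
```

Under this assumed definition, the four toy records collapse to three non-redundant interactions, because the first two describe the same protein interaction supported by different evidence.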
Comprehensive manual curation of the entire S. cerevisiae literature for protein and genetic interactions yielded 35 224 (21 281 non-redundant) protein interactions and 19 172 (13 963 non-redundant) genetic interactions (2), all of which are accessible through BioGRID. Comparison of the literature-curated protein interaction dataset to recent high-throughput studies (11–13) reveals a considerable degree of non-overlap, suggesting that many interactions remain to be validated and discovered in this yeast (Figure 1). We have since continued to curate the current S. cerevisiae literature and have added 29 575 (17 017 non-redundant) protein and 27 994 (19 391 non-redundant) genetic interactions to BioGRID since the original curation effort. Updates to BioGRID are made on the first of every month; each release of BioGRID is date stamped and archived for comparative purposes. All S. cerevisiae interactions deposited in BioGRID are immediately imported into the Saccharomyces Genome Database (SGD) with associated citations and evidence codes (14). Additional interaction attributes, including post-translational modifications associated with protein interactions and specific phenotypes associated with each genetic interaction, are currently being annotated and will be released for the entire S. cerevisiae dataset in the near future.
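The comparison between literature-curated and high-throughput protein interaction datasets mentioned above amounts to a set overlap on gene pairs. The following Python sketch shows one way such a comparison could be computed; it is not the analysis behind Figure 1, the function names are hypothetical, and the gene pairs are toy placeholders rather than real data.

```python
def normalize(pairs):
    """Turn an iterable of (gene_a, gene_b) pairs into a set of sorted tuples,
    so that A-B and B-A count as the same interaction."""
    return {tuple(sorted(pair)) for pair in pairs}

def overlap_summary(curated, high_throughput):
    """Count pairs shared by both datasets and pairs unique to each one."""
    lc = normalize(curated)
    htp = normalize(high_throughput)
    return {
        "shared": len(lc & htp),
        "curated_only": len(lc - htp),
        "high_throughput_only": len(htp - lc),
    }

# Toy datasets with placeholder gene names.
curated = [("YFG1", "YFG2"), ("YFG2", "YFG3"), ("YFG4", "YFG1")]
high_throughput = [("YFG2", "YFG1"), ("YFG5", "YFG6")]

summary = overlap_summary(curated, high_throughput)
print(summary)
```

Sorting each pair before comparison is a design choice that ignores bait and hit orientation; a direction-aware comparison would keep the tuples as given.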
To complement the comprehensive S. cerevisiae dataset, we have recently completed exhaustive manual curation of the S. pombe literature. Interactions were classified based on the same experimental evidence codes for protein and genetic interactions used previously (2). This effort yielded 2631 (1209 non-redundant) protein interactions and 2275 (1769 non-redundant) genetic interactions, as derived from 1077 publications. This new dataset has recently been deposited in BioGRID and, as for S. cerevisiae, will be updated on a monthly basis and provided to the S. pombe genome database (GeneDB) currently hosted by the Sanger Institute at www.genedb.org/genedb/pombe/ (15). Comparison of orthologous interactions between these evolutionarily distant yeasts should prove informative for biological network structure and function. Imminent high-throughput studies in S. pombe should rapidly elaborate the cellular interaction network in this organism (16–18).
In addition to systematic yeast curation, we have also undertaken partial interaction curation for higher species, for example, D. melanogaster and H. sapiens (Table 1). These curation efforts are often focused on specific aspects of biology and are in part guided by Gene Ontology inference codes for protein and genetic interactions (19) and the Textpresso text-mining algorithm (20). Other species interactions are added to BioGRID on an ongoing basis and, when available, released in monthly BioGRID updates. Contributions of curated interaction datasets from any species for deposition in the BioGRID are welcomed (www.thebiogrid.org).
We have expanded accessibility to interaction data in BioGRID via a primary mirror site at the SGD colony in Princeton (http://grid.princeton.edu/). In addition, source code for BioGRID and Osprey has been made available without restriction at SourceForge. BioGRID data files are currently linked to SGD, Flybase and NCBI, to which we provide automatic monthly updates. Analogous arrangements are under way with the Arabidopsis Information Resource (TAIR) and S. pombe GeneDB (15,21). We will endeavor to fulfill all requests for custom datasets for export to other model organism databases; the download page at BioGRID contains examples of existing datasets created for export.
The tabular user interface of BioGRID has been improved through implementation of AJAX techniques. The interface now provides the option to narrow search results to quantitative datasets; this feature will soon be elaborated to enable user-defined search criteria according to data type, evidence codes and data source. The ability to expand hidden fields with a single mouse click to provide greater detail, such as for Gene Ontology classifications (23), has also been added. Search results now include bait and hit designations to indicate the directionality of interactions. Additional annotation features including phenotype, post-translational modification, domains and motifs are currently under construction.
We have implemented an interaction management system (IMS) to support multiple simultaneous curators for each species supported by BioGRID. The IMS is a multiuser web-based application written in PHP that interfaces directly with the BioGRID. An intuitive graphical interface allows curators to quickly record interactions from an automatically updated list of publications. All interactions added via the IMS are verified against current annotation tables to eliminate errors and ambiguity in gene nomenclature. The IMS instantly commits new interactions to the BioGRID update pipeline, unless specified otherwise; interactions are collated each month and released as updates to the primary BioGRID and mirror sites, as well as model organism collaboration sites. Interactions may also be removed or modified in each monthly build, for example, in response to community feedback. All retired datasets are archived on the BioGRID downloads page in case the need for back-comparison arises.
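The verification of curated interactions against annotation tables described above can be sketched as a name-resolution step that rejects unknown or ambiguous gene names. The Python sketch below is illustrative only: the IMS itself is written in PHP, and the table layout, function names, identifiers and aliases here are hypothetical assumptions rather than the actual BioGRID annotation schema.

```python
def build_alias_index(annotation_rows):
    """Map each upper-cased identifier or alias to the set of systematic IDs
    that carry it. Each row is (systematic_id, [aliases])."""
    index = {}
    for systematic_id, aliases in annotation_rows:
        for name in (systematic_id, *aliases):
            index.setdefault(name.upper(), set()).add(systematic_id)
    return index

def resolve(name, index):
    """Return the single systematic ID for a name, or raise ValueError when
    the name is unknown or maps to more than one gene."""
    matches = index.get(name.upper(), set())
    if len(matches) == 1:
        return next(iter(matches))
    if not matches:
        raise ValueError(f"unknown gene name: {name}")
    raise ValueError(f"ambiguous gene name {name}: {sorted(matches)}")

# Toy annotation table; "ABC1" is deliberately shared by two genes.
annotation = [
    ("ORF0001", ["YFG1", "ABC1"]),
    ("ORF0002", ["YFG2", "ABC1"]),
]
index = build_alias_index(annotation)

print(resolve("yfg1", index))
```

In this sketch, a curator entry such as `ABC1` would be rejected as ambiguous until a unique name or systematic identifier is supplied, which mirrors the stated goal of eliminating errors and ambiguity in gene nomenclature.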
We will continue to curate interactions from major model organism species, with a view to comprehensive back-curation, as we have done for S. cerevisiae and S. pombe. Further refinement of tools and display features in the BioGRID graphical user interface based on a flexible record tag structure will enable greater control over data views and downloadable datasets by the user. New plugins for data visualization are under development for Osprey, Cytoscape (24) and the Edinburgh Pathway Editor (25), which will allow more sophisticated interrogation of interaction networks. In order to facilitate dissemination of our open source software tools, we will strive for compatibility with the Generic Model Organism Database (GMOD) project (26). Finally, we will continue to develop our record structures in compliance with the Proteomics Standards Initiative Molecular Interactions (PSI-MI) standard (27,28).
We thank Gary Bader, Sue Rhee, Michael Cherry and David Botstein for helpful discussions; Eurie Hong and Benjamin Hitz for support at SGD; Rachel Drysdale and Don Gilbert for assistance in parsing interactions from Flybase; and Nevan Krogan, Jef Boeke, Tim Hughes and Charlie Boone for pre-publication release of large-scale datasets. M.T. was supported by a Canada Research Chair in Functional Genomics and Bioinformatics, a Howard Hughes Medical Institute International Scholar Award and a Royal Society Wolfson Research Merit Award. D.H.L. and J.B. were supported by Cancer Research UK and V.W. was supported by the Wellcome Trust. This work was supported by a Canadian Institutes of Health Research grant (GSP-36651 to M.T.) and a NIH National Center for Research Resources grant (1R01RR024031-01 to M.T. and K.D.). Funding to pay the Open Access publication charges for this article was provided by the NIH.
Conflict of interest statement. None declared.