MatrixDB (http://matrixdb.ibcp.fr) is a freely available database focused on interactions established by extracellular proteins and polysaccharides. Only a few databases report protein–polysaccharide interactions and, to the best of our knowledge, there is no other database of extracellular interactions. MatrixDB takes into account the multimeric nature of several extracellular protein families for the curation of interactions, and reports interactions with individual polypeptide chains or with multimers, considered as permanent complexes, when appropriate. MatrixDB is a member of the International Molecular Exchange consortium (IMEx) and has adopted the PSI-MI standards for the curation and the exchange of interaction data. MatrixDB stores experimental data from our laboratory, data from literature curation, data imported from IMEx databases, and data from the Human Protein Reference Database. MatrixDB is focused on mammalian interactions, but aims to integrate interaction datasets of model organisms when available. MatrixDB provides direct links to databases recapitulating mutations in genes encoding extracellular proteins, to UniGene and to the Human Protein Atlas, which shows the expression and localization of proteins in a large variety of normal human tissues and cells. MatrixDB allows researchers to perform customized queries and to build tissue- and disease-specific interaction networks that can be visualized and analyzed with Cytoscape or Medusa.
The extracellular matrix is composed of proteins and complex polysaccharides that are organized in a tissue-specific manner. Major components of the extracellular matrix are collagens [~30% of proteins in humans; (1)], elastic fibers, proteoglycans and glycosaminoglycans. Several extracellular protein families (e.g. collagens, laminins and thrombospondins) form stable multimers in their native state, the multimers being composed of either identical or different polypeptide chains. The extracellular matrix provides a structural scaffold contributing to the mechanical properties of tissues (2), and is a reservoir of bioactive fragments, called matricryptins, that are released upon limited proteolysis. These fragments exhibit biological and biomolecular recognition properties of their own and regulate a number of physiological and pathological processes including angiogenesis and tumor growth (3). The cohesion of the extracellular matrix is maintained by an intricate network of protein–protein and protein–glycosaminoglycan interactions. These interactions are involved in the formation of supramolecular assemblies such as collagen fibrils and elastic fibers, in tissue architecture, and in cell-matrix interactions that regulate cell growth and behavior. The perturbation of the extracellular interaction network by mutations in genes coding for extracellular proteins leads to several diseases ranging from mild to severe phenotypes [e.g. osteogenesis imperfecta; (4)].
Interactions involving extracellular proteins are poorly represented in existing databases, and protein–glycosaminoglycan interactions are almost absent from databases although they contribute to the structural organization of the extracellular matrix, to the sequestration of growth factors and chemokines within the extracellular matrix, and to signalling at the cell surface (5). Furthermore, interactions involving multimers, which are frequent in the extracellular matrix (collagens, laminins, thrombospondins are trimers), are often reported as interactions established by individual polypeptide chains. This is a concern especially when molecules are heteromultimers. The above reasons prompted us to build an interaction database focused on interactions occurring between extracellular biomolecules [http://matrixdb.ibcp.fr; (6)]. The database has been updated to include additional interaction data, comprehensive extracellular interaction datasets (e.g. the elastic fiber interactome, extracellular interactions of leucine-rich repeat receptors), and new functionalities. MatrixDB is focused on mammalian molecules, but interaction data of a model organism (zebrafish) has been integrated in the updated database. MatrixDB provides direct links to Online Mendelian Inheritance in Man (OMIM), to databases recapitulating data on mutations occurring in genes encoding extracellular proteins, to UniGene and to the Human Protein Atlas that shows expression and localization of proteins in a large variety of normal human tissues, cancer cells and cell lines. MatrixDB allows researchers to perform customized queries and to build tissue- and disease-specific interaction networks.
We have imported protein data from the UniProtKB/Swiss-Prot knowledgebase (7), and used UniProtKB accession numbers for proteins. We have created specific identifiers for multimers such as collagens, laminins, thrombospondins and integrins using the following format: MULT_x_species (e.g. MULT_3_human for human collagen I). These entries refer to the UniProtKB accession numbers of their constituent polypeptide chains. Complexes corresponding to stable multimers have been created by the IntAct database (European Bioinformatics Institute, UK) (e.g. EBI-2325312 for human collagen I), and MatrixDB identifiers are cross-referenced to these complexes. Protein isoforms are identified by a variant number (VARy), and the full MatrixDB identifier becomes MULT_x_VARy_species (e.g. MULT_4_VAR1_human). Matricryptins are identified as PFRAG_x_species and are cross-referenced to the feature identifier of UniProtKB. For example, the MatrixDB identifier of endostatin, a C-terminal fragment of collagen XVIII, is PFRAG_1_human and it is cross-referenced to the UniProtKB feature identifier PRO_0000005794. Glycosaminoglycans (GAG_x), lipids (LIP_x) and cations (CAT_x) are cross-referenced to ChEBI and KEGG compound databases (8, 9). Besides protein–protein and protein–glycosaminoglycan interactions, MatrixDB reports interactions involving cations (mostly calcium) and lipids because a number of extracellular molecules bind to cations and some of them to lipids. Detailed information on each molecule is displayed on the ‘Biomolecule Report Page’.
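The identifier scheme described above can be illustrated with a short parsing sketch. The regular expression below is inferred only from the examples given in the text (MULT_3_human, MULT_4_VAR1_human, PFRAG_1_human, GAG_x, LIP_x, CAT_x); it is not an official MatrixDB grammar, and the names `parse_matrixdb_id` and `MatrixDBId` are hypothetical. The sketch also assumes that the variant suffix may follow any prefix, although the text only shows it for multimers.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the examples in the text; an assumption, not an
# official specification of MatrixDB identifiers.
_ID_PATTERN = re.compile(
    r"^(?P<kind>MULT|PFRAG|GAG|LIP|CAT)_(?P<number>\d+)"
    r"(?:_VAR(?P<variant>\d+))?"
    r"(?:_(?P<species>[A-Za-z]+))?$"
)


class MatrixDBId(NamedTuple):
    kind: str                # MULT, PFRAG, GAG, LIP or CAT
    number: int              # the "x" in MULT_x_species
    variant: Optional[int]   # the "y" in VARy, if present
    species: Optional[str]   # e.g. "human"; absent for GAG_x, LIP_x, CAT_x


def parse_matrixdb_id(identifier: str) -> MatrixDBId:
    """Split a MatrixDB-style identifier into its components."""
    match = _ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a recognized MatrixDB-style identifier: {identifier!r}")
    variant = match.group("variant")
    return MatrixDBId(
        kind=match.group("kind"),
        number=int(match.group("number")),
        variant=int(variant) if variant is not None else None,
        species=match.group("species"),
    )
```

Under these assumptions, `parse_matrixdb_id("MULT_4_VAR1_human")` yields the kind `MULT`, number 4, variant 1 and species `human`, whereas a plain UniProtKB accession number is rejected with a `ValueError`, since individual polypeptide chains keep their UniProtKB accession numbers.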
MatrixDB is an active member of the International Molecular Exchange (IMEx) consortium (10) and is in charge of the curation of papers published in Matrix Biology, a journal focused on the extracellular matrix, since January 2009. MatrixDB has adopted the PSI-MI standards for annotating and exchanging interaction data. Interaction data stored in MatrixDB are (i) experimentally determined in the laboratory using surface plasmon resonance (SPR) binding assays, including protein and glycosaminoglycan arrays probed by SPR imaging (11), (ii) extracted from the literature by manual curation and (iii) imported from other interaction databases belonging to the IMEx consortium [IntAct (12), DIP (13), MINT (14), BioGRID (15)], as well as from the Human Protein Reference Database (16). Imported data are restricted to interactions involving at least one extracellular protein. The extracellular proteins are identified using UniProtKB/Swiss-Prot keywords and Gene Ontology (17), complemented with manual annotations when required. The text files containing known extracellular human proteins, membrane human proteins and secreted human proteins can be freely downloaded from the download page of MatrixDB. Our curation process has followed the MIMIx guidelines [Minimum Information about a Molecular Interaction experiment; (18)] and has been updated to adhere to the IMEx curation rules in 2010. Interaction data curated by MatrixDB are freely available for download in the PSI-MI XML and TAB 2.5 formats (19).
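The restriction of imported data to interactions involving at least one extracellular protein can be sketched as a filter over PSI-MI TAB 2.5 rows. This is a minimal illustration rather than the import pipeline used by MatrixDB: it assumes that the first two of the 15 tab-separated MITAB 2.5 columns hold the interactor identifiers in `database:accession` form separated by `|`, and that the set of extracellular accession numbers has been loaded beforehand (for instance from the downloadable text files mentioned above). The function names are hypothetical.

```python
import csv
from typing import Iterable, Iterator, List, Set


def uniprot_accessions(field: str) -> Set[str]:
    """Collect UniProtKB accessions from a MITAB identifier field.

    A field looks like 'uniprotkb:ACC1|intact:EBI-123'; entries from
    other databases are ignored.
    """
    accessions = set()
    for entry in field.split("|"):
        database, _, accession = entry.partition(":")
        if database == "uniprotkb" and accession:
            accessions.add(accession)
    return accessions


def filter_extracellular(
    lines: Iterable[str], extracellular: Set[str]
) -> Iterator[List[str]]:
    """Yield MITAB 2.5 rows in which at least one partner is extracellular."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines, header/comment lines and truncated rows.
        if not row or row[0].startswith("#") or len(row) < 15:
            continue
        partners = uniprot_accessions(row[0]) | uniprot_accessions(row[1])
        if partners & extracellular:
            yield row
```

The filter keeps a row as soon as either partner belongs to the extracellular set, which mirrors the "at least one extracellular protein" criterion. Interactors identified only by non-UniProtKB identifiers (e.g. ChEBI entries for glycosaminoglycans) would need an additional rule that is not covered here.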
Mammalian interaction data refer to human molecules in order to easily display the list of partners of a given molecule on the ‘Biomolecule Report’ page (cf. the schematic organization of MatrixDB, Figure 1). Clicking on an interaction gives access to the ‘Interaction Report’ page where the source of the data (name of the database) and the experiments supporting the interaction are listed along with links to the abstracts of the corresponding papers. The species experimentally used to demonstrate the interaction are indicated on the ‘Experiment Report’ page with a detailed report of the experiment according to MIMIx or IMEx standards (e.g. interaction detection method, partner detection method, biological and experimental roles of partners, binding sites, kinetics, and affinity when available). MatrixDB is focused on mammalian interactions, but a comprehensive extracellular interaction dataset (69 interactions) of zebrafish has been imported (20,21). We have also curated a recent dataset of the elastic fiber interactome (45 interactions) identified by affinity purification and mass spectrometry (22), and the interactions (~30) established by SPARC, an extracellular protein involved in a number of biological processes. The current release of MatrixDB contains 2174 extracellular matrix interactions including 1836 protein–protein and 119 protein–glycosaminoglycan interactions. We have curated 490 interactions and 847 experiments from 192 articles, the other interaction data being imported from several databases (Figure 1). Statistics are available on the ‘Statistics’ page of MatrixDB.
The ‘Biomolecule Report’ page contains a direct link to the Human Protein Atlas, which shows the expression and localization of proteins in a large variety of normal human tissues, cancer cells and cell lines, but whose data are not available for download (23). We have therefore imported UniGene expressed sequence tag profiles, which reflect approximate expression patterns in tissues [http://www.ncbi.nlm.nih.gov/unigene; (24)], in order to create tissue-specific interaction networks.
We have also added to the ‘Biomolecule Report’ page links to databases recapitulating data on mutations occurring in the gene encoding the extracellular protein, including the osteogenesis imperfecta consortium [http://oiprogram.nichd.nih.gov/consortium.html; (25)], a database of osteogenesis imperfecta and Ehlers-Danlos syndrome variants [http://www.le.ac.uk/ge/collagen; (26,27)], and COLdb, a database linking genetic data to molecular function in fibrillar collagens [http://collagen.stanford.edu/; (28)]. When appropriate, the ‘Biomolecule Report’ page also links to the OMIM database of human genes and genetic disorders [http://www.ncbi.nlm.nih.gov/omim; (29)]. These data are used to build disease-specific interaction networks.
Links to individual extracellular interaction datasets are available on the homepage of MatrixDB. They include the map of candidate cell and matrix interaction domains on the human type I collagen fibril (30), the endostatin interaction network established in our laboratory (11), the elastic fiber interaction network (22) and the cell surface interaction network of neural leucine-rich repeat receptors identified in zebrafish (20,21). Comprehensive extracellular interaction datasets will be curated on a regular basis.
Two types of searches are offered by default. ‘Biomolecule category’ displays all the human molecules in a category (protein, glycosaminoglycan, fragment, lipid, cation and inorganic compound). Searching by ‘Biomolecule name’ can be performed with the biomolecule or gene name, with its UniProtKB or ChEBI accession number, or with its MatrixDB identifier. Three other types of queries are available in the ‘Advanced Search’: free text search, search by PubMed identifier and dataset search. The dataset search displays all the interactions provided by a given database (IntAct, MINT, DIP, BioGRID and MatrixDB), or those reported in specific papers (11,20–22). Detailed data are displayed when a molecule is selected, and links are provided to access further information within MatrixDB or on external websites. For example, UniGene EST profiles or OMIM disease data associated with the gene coding for the protein(s) of interest are provided when available. A list of the protein partners is displayed with the number of experiments reporting each interaction. An interaction can be selected to examine these supporting experiments, and an experiment can be selected to access kinetics, affinity, binding site and the experimental species.
Several options are available for building customized networks. The user can create (i) the entire network of interactions involving at least one extracellular partner, combined or not with interactions established by membrane and secreted molecules, (ii) the interaction network of proteins annotated with user-selected UniProtKB keywords, (iii) the interaction network of one or several molecule(s), including or not the interactions of its (their) partners, (iv) tissue-specific interaction networks (Figure 2) and (v) disease-specific interaction networks. The building of tissue-specific interaction networks is based on expression data imported from UniGene. One or several tissues can be selected and a threshold (minimum number of transcripts per million present in the tissue) can be defined to keep only interactions established by proteins expressed above this threshold in the selected tissues. An option restricts the interactions to those where the partners are specifically expressed in one or several selected tissues. This function allows the identification of tissue-specific partners. It is also possible to build disease-specific interaction networks, based on OMIM identifiers.
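The tissue-specific filtering described above can be sketched as follows. This is an illustration under stated assumptions, not the code running on the MatrixDB server: expression values are assumed to be given in transcripts per million per tissue, an interaction is kept only when both partners reach the threshold in at least one selected tissue, and a partner is treated as "specifically expressed" when it does not reach the threshold in any non-selected tissue. The function name `tissue_network` and the data layout are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# protein identifier -> tissue name -> transcripts per million (TPM)
Expression = Dict[str, Dict[str, float]]
Interaction = Tuple[str, str]


def tissue_network(
    interactions: Iterable[Interaction],
    expression: Expression,
    tissues: Iterable[str],
    threshold: float,
    specific_only: bool = False,
) -> List[Interaction]:
    """Keep interactions whose two partners are expressed in the selected tissues.

    With specific_only=True, a partner must additionally stay below the
    threshold in every tissue that was not selected.
    """
    selected = set(tissues)

    def qualifies(protein: str) -> bool:
        profile = expression.get(protein, {})
        inside = any(profile.get(t, 0.0) >= threshold for t in selected)
        if not specific_only:
            return inside
        outside = any(
            tpm >= threshold for t, tpm in profile.items() if t not in selected
        )
        return inside and not outside

    return [(a, b) for a, b in interactions if qualifies(a) and qualifies(b)]
```

Molecules without an expression profile (for example glycosaminoglycans, for which no transcript data exist) are dropped by this sketch; a real implementation would need an explicit rule for such partners.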
MatrixDB is a database providing interaction data involving extracellular proteins and glycosaminoglycans and interactions established by these two major constituents of the extracellular matrix with cations and lipids. Building the extracellular interactome is a prerequisite to delineate the molecular mechanisms underlying the assembly of the extracellular matrix and to understand how genetic diseases interfere with this process. Future releases will also include interaction data imported from the databases that will join the IMEx consortium. MatrixDB will increase its coverage by curation of interactions involving (i) matrix metalloproteinases and their inhibitors, which play a major role in tissue remodelling (links to the MEROPS database (33) will be provided), (ii) the adhesive matrix molecule family (microbial surface components recognizing adhesive matrix molecules, MSCRAMMs) responsible for the interaction of pathogens with the extracellular matrix (34) and (iii) other proteins and sugars of pathogens.
This work was supported by a CPER grant from the Région Rhône-Alpes; by Institut des Systèmes Complexes (IXXI 2010); and by the EU FP7 ‘PSIMEx’ grant (contract number FP7-HEALTH-2007-223411). Funding for open access charge: EU FP7 ‘PSIMEx’ grant (contract number FP7-HEALTH-2007-223411).
Conflict of interest statement. None declared.
We would like to thank Christophe Blanchet (UMR 5086, Lyon, France) for helping us to install the MatrixDB server, Samuel Kerrien and Bruno Aranda (EBI, Hinxton, UK) for their help regarding data format exchange and Sandra Orchard (EBI, Hinxton, UK) for guiding us through the curation process.