The extracellular matrix (ECM) is a complex meshwork of cross-linked proteins providing both biophysical and biochemical cues that are important regulators of cell proliferation, survival, differentiation, and migration. We present here a proteomic strategy developed to characterize the in vivo ECM composition of normal tissues and tumors using enrichment of protein extracts for ECM components and subsequent analysis by mass spectrometry. In parallel, we have developed a bioinformatic approach to predict the in silico “matrisome” defined as the ensemble of ECM proteins and associated factors. We report the characterization of the extracellular matrices of murine lung and colon, each comprising more than 100 ECM proteins and each presenting a characteristic signature. Moreover, using human tumor xenografts in mice, we show that both tumor cells and stromal cells contribute to the production of the tumor matrix and that tumors of differing metastatic potential differ in both the tumor- and the stroma-derived ECM components. The strategy we describe and illustrate here can be broadly applied and, to facilitate application of these methods by others, we provide resources including laboratory protocols, inventories of ECM domains and proteins, and instructions for bioinformatically deriving the human and mouse matrisome.
The extracellular matrix (ECM) is a fundamental and important component of metazoan organisms providing architectural support and anchorage for the cells. The ECM consists of a complex meshwork of highly cross-linked proteins and exists as interstitial forms within organs and as specialized forms, such as basement membranes underlying epithelia, vascular endothelium, and surrounding certain other tissues and cell types (e.g. neurons, muscles). Cells adhere to the ECM via transmembrane receptors, among which integrins are the most prominent (1, 2). These cell-matrix interactions result in the stimulation of various signaling pathways controlling proliferation and survival, differentiation, migration, etc. The composition of the ECM and the repertoire of ECM receptors determine the responses of the cells. The biophysical properties of the ECM (deformability or stiffness) have also been shown to modulate these cellular functions (3, 4). In addition to core ECM components (fibronectins, collagens, laminins, proteoglycans, etc.), the ECM serves as a reservoir for growth factors and cytokines and ECM-remodeling enzymes that collaborate with ECM proteins to signal to the cells (5, 6). Hence, the ECM provides not only biophysical cues but also biochemical cues that regulate cell behavior. In addition to being important for normal development, alterations of the ECM have been associated with various pathologies such as fibrosis, skeletal diseases, and cancer (7–9) and it has been emphasized recently that the ECM proteome needs better characterization (10).
The role of the ECM in cancer is of particular interest. Long-standing as well as recent data implicate tumor ECM as a significant contributor to tumor progression. Indeed, the ECM is a major component of the tumor microenvironment (11, 12) and classical pathology has shown that excessive deposition of ECM is a common feature of tumors with poor prognosis. More recently, gene expression screens have revealed that many genes encoding ECM components and ECM receptors are dysregulated during tumor progression (13–16). Finally, modifications of the extracellular matrix architecture and biophysical properties have been shown to influence tumor progression (6, 17, 18). Despite these clear indications that tumor ECM and the interactions of cells with it are very likely to play important roles in tumor progression, we do not have a good picture of ECM composition, origins and functions in tumors. One reason for this lies in the biochemical properties of ECM proteins (large size, insolubility, cross-linking, etc.) that have rendered very challenging attempts to characterize systematically the composition of the ECM from tissues and tumors.
Thanks to the completion of the genomes of many species and to previous studies (19–21), it is now clear that vertebrate genomes contain hundreds of genes encoding ECM proteins. Specific features of ECM proteins have emerged from these studies, in particular their distinctive structures based on the repetition of conserved domains (22, 23). During the last few years, several attempts have been made at in silico predictions of the complement of ECM proteins (24–26). Furthermore, recent studies have begun to characterize experimentally the composition of the extracellular matrix of specific model systems such as retinal and vascular basement membranes (27–29), mammary gland (30, 31), and cartilage (32). However, there remains a pressing need for a better definition of the number and diversity of ECM proteins and even of what should be included in that definition. Limitations arise also from the lack of experimental reagents and approaches because of the biochemical intractability of ECM and the lack of an adequate library of antibodies or other probes to characterize ECM proteins in situ. Thus, deciphering the complexity of the extracellular matrix in vivo represents an important scientific challenge.
We describe here the development of proteomics-based methods coupled with a bioinformatic definition of the “matrisome” (ECM and ECM-associated proteins) to analyze the protein composition of the tissue extracellular matrix. We have successfully applied this strategy to characterize in detail the extracellular matrices both of normal murine tissues (lung and colon) and of melanoma tumors (nonmetastatic and metastatic), which each comprise well over 100 proteins. Moreover, we have applied this approach to understand the origins of tumor ECM proteins and have been able to show, using human into mouse xenograft models, that both tumor cells and stromal cells contribute in characteristic ways to the ECM of the tumor microenvironment. Furthermore, we show that both tumor and stromal cells contribute to significant changes in the extracellular matrices of tumors of differing metastatic potential. The strategy we describe and illustrate here can be broadly applied and we provide protocols and inventories of ECM domains and proteins to facilitate application of these methods by others.
Normal tissues were from 8- to 12-week-old FVB mice. Lungs were perfused by intracardiac injection with 3 ml of phosphate-buffered saline to remove blood. Colon segments were rinsed with phosphate-buffered saline to remove feces. A375 and MA2 human melanoma cell lines (16) were grown in HyClone high-glucose Dulbecco's modified Eagle's medium (Thermo Scientific) supplemented with 2 mM glutamine and 10% fetal bovine serum (Invitrogen, Carlsbad, CA). Eight-week-old NOD/SCID/IL2Rγ null (Jackson Laboratory, West Grove, PA) male mice were anesthetized using isoflurane (Abbott Laboratories, North Chicago, IL) and 5 × 10⁵ cells were injected subcutaneously into the left flank of the mouse. Animals were sacrificed 5 weeks post-injection and the tumors were dissected, flash frozen and kept at −80 °C.
Sequential extractions of frozen samples of tissues or tumors were performed using the CNMCS (Cytosol/Nucleus/Membrane/Cytoskeleton) Compartmental Protein Extraction kit (Cytomol, Union City, CA) according to manufacturer's instructions. In brief, frozen tissues (150–200 mg) or tumors (200–400 mg) were homogenized and extracted sequentially to remove (1) cytosolic proteins, (2) nuclear proteins, (3) membrane proteins, and (4) cytoskeletal proteins, leaving a final insoluble fraction enriched for ECM proteins. Fractions were separated on SDS-polyacrylamide gradient gels, transferred to nitrocellulose membranes and probed with antibodies to proteins characteristic of different subcellular compartments (see Fig. 1A and Extended Experimental Procedures).
ECM-enriched fractions were solubilized in urea, disulfide bonds reduced and alkylated, and proteins digested with PNGaseF, Lys-C, and trypsin. Solutions that began cloudy upon initial reconstitution were clear after overnight digestion. The resulting peptides were separated by off-gel electrophoresis (OGE) according to isoelectric point and by reversed-phase high-performance liquid chromatography followed by tandem mass spectrometry (MS/MS) on an LTQ Orbitrap mass spectrometer. Mass spectra were interpreted with SpectrumMill and annotated using the matrisome bioinformatics lists developed in this work. MS/MS spectra were searched against a UniProt database containing either mouse only or both mouse (53,448 entries) and human (78,369 entries) sequences; all sequences (including isoforms and excluding fragments) were downloaded from the UniProt web site on June 30, 2010. To each database a set of common laboratory contaminant proteins (73 entries) was appended. Peptides identified with a false discovery rate < 2.5% were assembled into identified proteins, and our in silico matrisome list was then used to categorize all of the identified proteins as being ECM derived or not. MS/MS spectra searches allowed for carbamidomethylation of cysteines and possible carbamylation of N termini as fixed/mix modifications. Allowed variable modifications were oxidized methionine, deamidation of asparagine, pyro-glutamic acid modification at N-terminal glutamine, and hydroxylation of proline with a precursor MH+ shift range of –18 to 97 Da. Hydroxyproline was only observed in the proteins known to have it (collagens and proteins containing collagen domains, emilins, etc.) and only within the expected GXPG sequence motifs. 
supplemental Tables S7 and S8 containing the detailed peptide spectral matches might have some examples not in the expected motif when there is either a proline near the motif for which the spectrum could have had insufficient fragmentation to confidently localize the mass change to a particular residue, or a nearby methionine in the peptide and the spectrum had insufficient fragmentation to localize the mass change to oxidized Met or hydroxyproline. When the motif nX[ST] occurs in a peptide in supplemental Tables S7 and S8, this is likely to indicate a site where N-linked glycosylation was removed by the PNGaseF treatment of the sample. Although a lowercase n indicates a gene-encoded asparagine residue detected in aspartic acid form, possible mechanisms of modification such as acid-catalyzed deamidation during sample processing versus enzymatic conversion during deglycosylation cannot be explicitly distinguished. Our automated database searching based interpretation of the MS/MS spectra did not attempt to detect any of the many known examples of crosslinking previously observed in collagen family proteins (33, 34) nor did our sample processing methods attempt to enrich for or deplete crosslinked peptides from the samples. Consequently, the spectra generated in this study may be a valuable resource to mine for sites and forms of collagen crosslinking. Additional detailed information can be found in the Extended Experimental Procedures. The raw LC-MS/MS data associated with this manuscript may be downloaded from ProteomeCommons.org Tranche using the following hash:
The human and mouse proteomes were each screened for proteins containing domains characteristic of ECM proteins, ECM-affiliated proteins, ECM modifiers and secreted factors. Those lists were subsequently screened to eliminate proteins that shared one or more of the defining domains but were not ECM or ECM-associated proteins based on other criteria. Detailed information can be found in the Extended Experimental Procedures. We have also deployed a webpage providing a collection of resources (data files, sequence files) and further annotations on the bioinformatic pipeline developed for this study (http://web.mit.edu/hyneslab/matrisome/).
Tumor samples were formalin-fixed and paraffin-embedded. Sections were dewaxed and rehydrated following standard procedures. Antigen retrieval was performed by incubating sections in boiling 10 mM sodium citrate buffer (pH 6.0) for 20 min. Sections were then blocked with PBS containing 4% ovalbumin. Incubation with rabbit anti-HAPLN1 antibody (Sigma) was performed overnight at 4 °C, and incubation with secondary antibody was performed for 2 h at room temperature. Secondary goat-anti-rabbit antibody conjugated with Alexa-568 was from Invitrogen. Sections were counterstained with DAPI (4′,6-diamidino-2-phenylindole) to visualize nuclei.
Analysis of the protein composition of the extracellular matrix presents challenges due to the diversity, large size, insolubility and cross-linking of these proteins. By contrast, most other cellular components are soluble even at relatively low concentrations of salt or detergents. Therefore, we took advantage of the insolubility of ECM proteins to enrich for them while depleting other cellular components. We used a subcellular fractionation protocol to extract sequentially components from the cytosol, the nucleus, the membrane and the cytoskeleton and enrich for ECM proteins (see Extended Experimental Procedures). Fig. 1A shows the sequential extraction of proteins from the different cellular compartments, using diagnostic marker proteins for each compartment. ECM proteins such as fibronectin (as well as laminins and collagens, not shown) were not extracted during these intermediate steps and were found to be enriched in the final insoluble fraction.
To analyze the composition of the ECM-enriched fractions obtained after depletion of other cellular components, we digested the proteins to peptides and employed a proteomics pipeline shown in Fig. 1B using liquid chromatography combined with tandem mass spectrometry (LC-MS/MS) to identify peptides and proteins (see Extended Experimental Procedures). Analysis by LC-MS/MS of ECM proteins enriched from murine lung and digested to peptides confirmed a significant enrichment for matrix proteins, with more than 75% of the total precursor ion intensity (the sum of MS1 precursor ion peak areas for all identified peptides) corresponding to proteins defined as ECM (Fig. 2A, left panel). To help measure the success of our enrichment strategy and focus downstream biological follow-up we sought to categorize the identified proteins as being ECM-derived or not. The categorization of each protein identified by mass spectrometry was initially performed using the Gene Ontology (GO) “Cellular Compartment” annotations. However, this annotation showed several clear limitations. For example, several cytosolic or cytoskeletal proteins involved in cell-matrix adhesion are mis-annotated as being part of the extracellular matrix (see supplemental Table S1); conversely, some known ECM proteins (thrombospondin 1, von Willebrand factor, agrin, etc.) are defined by vague terms such as “external side of the plasma membrane” or “cell surface” (supplemental Table S1). In addition more than 20 different GO categories correspond to the extracellular matrix (extracellular matrix, basal lamina, basement membrane, etc.) and yet UniProt identifiers for some known ECM proteins are not associated with any cellular compartments in the Gene Ontology database. Finally, and of importance to the study of human/mouse xenografts (see below), we noted conflicting annotations between human and mouse proteins. 
In order to interpret the mass spectrometric data we needed a better definition of which proteins should be considered as part of the ECM.
Therefore, we developed a bioinformatic approach to predict within any genome the ensemble of genes encoding what we define as the “matrisome,” namely all those components constituting the extracellular matrix (the “core matrisome”) and those components associated with it (“matrisome-associated” proteins). One hallmark of ECM proteins is their domain-based structure (23). Exploiting this characteristic, we established a list of 55 diagnostic InterPro domains commonly found in ECM proteins (type I, II and III fibronectin domains, type I thrombospondin repeats, laminin G domain, etc.; Fig. 3A and supplemental Table S2A). This domain list was used to screen the UniProt protein database. We know that some of the domains used to select positively for ECM proteins are also found in transmembrane receptors and proteins involved in cell adhesion (growth factor receptors, integrins, etc.) that do not belong to the ECM. These families of proteins also display a subset of specific domains (e.g. tyrosine kinase and phosphatase domains) and transmembrane domains incompatible with definition as “extracellular matrix” proteins. Therefore, a second step comprised a negative selection using 20 domains (supplemental Table S2B) and a transmembrane domain prediction (see Extended Experimental Procedures for details). This analysis was performed in parallel for both the mouse and human genomes and the respective murine and human matrisome lists were compared based on orthology. Manual curation of the matrisome lists also allowed us to add a very few known ECM proteins that do not contain any known domains; for example, dermatopontin and dentin sialophosphoprotein (supplemental Table S3). Finally, knowledge-based annotation of these gene lists allowed us to define subcategories within the core matrisome; namely, ECM glycoproteins, collagens, and proteoglycans. The bioinformatics workflow developed is presented in Fig. 3B and defined sets of core matrisome proteins from both species are listed in supplemental Table S3.
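The two-step domain screen described above (positive selection on diagnostic domains, then negative selection on excluding domains and predicted transmembrane segments, followed by manual curation) can be illustrated with a minimal Python sketch. This is not the pipeline used in this study: the `Protein` class, the function name, the domain labels, and the protein identifiers are all hypothetical placeholders chosen for illustration, and the real lists are those given in supplemental Table S2.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    accession: str                   # hypothetical identifier, not a real UniProt accession
    domains: frozenset               # domain labels annotated on the protein
    has_transmembrane: bool = False  # outcome of a transmembrane prediction


def select_core_matrisome(proteins, positive_domains, negative_domains,
                          curated_additions=()):
    """Return accessions that pass positive, then negative, domain selection."""
    selected = []
    for protein in proteins:
        # Step 1: keep proteins carrying at least one diagnostic ECM domain.
        if not (protein.domains & positive_domains):
            continue
        # Step 2: drop proteins with an excluding domain or a predicted
        # transmembrane segment.
        if (protein.domains & negative_domains) or protein.has_transmembrane:
            continue
        selected.append(protein.accession)
    # Manual curation: add known ECM proteins lacking any diagnostic domain.
    for accession in curated_additions:
        if accession not in selected:
            selected.append(accession)
    return selected


# Placeholder domain labels (assumptions; the study used InterPro entries).
POSITIVE = frozenset({"fibronectin_type_III", "laminin_G", "thrombospondin_type_1"})
NEGATIVE = frozenset({"tyrosine_kinase", "tyrosine_phosphatase"})

demo_proteins = [
    Protein("ecm_glycoprotein_demo", frozenset({"fibronectin_type_III"})),
    Protein("receptor_kinase_demo",
            frozenset({"fibronectin_type_III", "tyrosine_kinase"})),
    Protein("adhesion_receptor_demo", frozenset({"laminin_G"}),
            has_transmembrane=True),
    Protein("cytosolic_demo", frozenset({"SH3"})),
]

core = select_core_matrisome(demo_proteins, POSITIVE, NEGATIVE,
                             curated_additions=["domainless_ecm_demo"])
print(core)
```

In this toy input only the first protein survives both screens; the curated entry stands in for domain-less proteins such as dermatopontin that were added by hand.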
We also wished to characterize those proteins that are known to be associated with the ECM, but are not included in the core list of ECM proteins. To this end, we defined separate lists of domains commonly found in 1) ECM-affiliated proteins (proteins that share either some architectural similarities with ECM proteins or that are known to be associated with ECM proteins, or that repeatedly appeared in our proteomics analyses of ECM-enriched fractions; see Table I and supplemental Tables S2 and S4); 2) ECM regulators (ECM-remodeling enzymes, crosslinkers, proteases, regulators, etc.); 3) secreted factors, many of which are known to bind to ECM and others that may (supplemental Table S2A). As for the core matrisome list, we also defined lists of domains that excluded mis-assigned proteins from these categories (supplemental Table S2B). Using similar bioinformatic pipelines as for the core matrisome, we defined three categories of “matrisome-associated” proteins: ECM-affiliated proteins, ECM regulators, and secreted factors (supplemental Fig. S1).
This bioinformatics approach allowed us to characterize the in silico human and mouse matrisomes (supplemental Tables S3A and S3B, respectively). The human core matrisome is composed of 278 genes, corresponding to somewhat more than 1% of the proteome; the complete matrisome, with 1056 genes, accounts for ~4% of the human proteome (Table I and supplemental Table S3A). In addition to 43 known collagen genes, we conclude that the human genome encodes 200 ECM glycoproteins and 35 proteoglycans. As expected, the human and murine genomes encode very similar collections of matrisome proteins (Table I and supplemental Table S3B). In addition to providing a comprehensive in silico atlas of ECM and ECM-associated proteins, which includes all previously known ECM proteins, we report here the identification of potential novel ECM proteins that display architectures similar to ECM proteins but that have not previously been reported to be part of the ECM (highlighted in supplemental Table S3). These lists are intended to evolve as the in vivo characterization progresses and one might wish to add more proteins or domains to this bioinformatic pipeline in the future.
The categorization of all the proteins identified by mass spectrometry in the ECM-enriched fraction, using our bioinformatic definition, revealed the presence of 55 matrisome proteins out of 184 proteins detected in the simple LC-MS/MS analysis (i.e. without peptide separation by OGE, see below; see Fig. 2A and supplemental Table S1A). The other 70% of proteins detected represent insoluble intracellular components such as proteins associated with the actin cytoskeleton or with intermediate filaments plus a few transmembrane proteins (supplemental Table S1A). However, these “contaminating” proteins remain a minority fraction as they represent less than 25% of the precursor-ion MS intensity (Fig. 2A, left panel). The addition of an intermediate fractionation step by off-gel electrophoresis (OGE, separation of the peptide mixture into 11 fractions according to isoelectric point, see Extended Experimental Procedures), before submitting the peptides to LC-MS/MS, increased by a factor of 4 the total number of peptides detected (Fig. 2B, middle panel, and Fig. 2D), corresponding to sixfold more total proteins detected with two or more peptides per protein (Fig. 2B, right panel). However, this step led to only a threefold increase in the number of matrisome proteins, indicating that the analysis is approaching a plateau in the level of detection of matrisome proteins. Interestingly, this additional step increased by a factor of 15 the number of matrisome-associated proteins detected in the sample. Moreover, it was only after peptide separation by OGE that we detected the presence of growth factors in our samples (Fig. 2C and supplemental Table S4A; see Discussion).
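The contrast drawn above between the share of matrisome proteins by count (55 of 184, about 30%) and their share of precursor-ion intensity (more than 75%) can be made concrete with a short Python sketch. The function name, the input layout (one summed intensity per protein), and the toy intensity values are assumptions made for illustration; they are not data from this study.

```python
def matrisome_fractions(protein_intensities, matrisome):
    """Return (fraction by protein count, fraction by summed intensity).

    protein_intensities: dict mapping protein name -> summed MS1 precursor-ion
    intensity of its identified peptides (hypothetical input layout).
    matrisome: set of protein names categorized as matrisome.
    """
    total_intensity = sum(protein_intensities.values())
    if not protein_intensities or total_intensity <= 0:
        raise ValueError("need at least one protein with positive intensity")
    ecm = {name: value for name, value in protein_intensities.items()
           if name in matrisome}
    by_count = len(ecm) / len(protein_intensities)
    by_intensity = sum(ecm.values()) / total_intensity
    return by_count, by_intensity


# Toy values: two abundant matrisome proteins, three minor contaminants.
demo = {"Col1a1": 500.0, "Fn1": 250.0, "Actb": 100.0, "Vim": 100.0, "Krt8": 50.0}
by_count, by_intensity = matrisome_fractions(demo, {"Col1a1", "Fn1"})
print(by_count, by_intensity)

# Count-based share reported in the text: 55 matrisome proteins out of 184.
reported_count_share = 55 / 184
```

A few abundant matrisome proteins can dominate the signal even when non-matrisome proteins are more numerous, which is why the two measures diverge.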
To test the reproducibility of the pipeline, we processed a second sample and found a large overlap of the matrisome proteins detected in the two samples, which showed ~80% identity when considering proteins identified by two peptides (Fig. 4A and supplemental Table S4A). The overlap is particularly evident for the most abundant core matrisome proteins belonging to the collagen and proteoglycan categories (Fig. 4A, right panel). Therefore, to define the extracellular matrix composition of a tissue, we combined the data from two independent samples including only those proteins recovered in both samples represented by at least two peptides in at least one of the samples. By this definition, the extracellular matrix of murine lung comprises 143 total matrisome proteins: 92 core matrisome proteins, and 51 matrisome-associated proteins (supplemental Table S4B).
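The inclusion rule stated above (a protein must be recovered in both independent samples and be represented by at least two peptides in at least one of them) is sketched below in Python. The function name, the dictionary layout, and the peptide counts are illustrative assumptions, not values from the supplemental tables.

```python
def combine_replicates(sample_a, sample_b, min_peptides=2):
    """Return proteins found in both samples with >= min_peptides in at least one.

    sample_a, sample_b: dicts mapping protein name -> number of peptides
    identified in that sample (hypothetical input layout).
    """
    shared = set(sample_a) & set(sample_b)
    return sorted(
        protein for protein in shared
        if max(sample_a[protein], sample_b[protein]) >= min_peptides
    )


# Toy peptide counts for two replicate ECM-enriched samples.
replicate_1 = {"Fn1": 12, "Col1a1": 1, "Lama5": 1, "Eln": 3}
replicate_2 = {"Fn1": 9, "Col1a1": 4, "Lama5": 1, "Tnc": 5}

tissue_ecm = combine_replicates(replicate_1, replicate_2)
print(tissue_ecm)
```

Here `Lama5` is dropped because it has a single peptide in each sample, while `Eln` and `Tnc` are dropped because each appears in only one replicate.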
Using the same pipeline and criteria, we characterized the murine colon extracellular matrix (supplemental Table S4D). Replicate ECM-enriched colon samples overlapped by 85% (Fig. 4B, left panel and supplemental Table S4C). As for the ECM-enriched lung fractions, reproducibility is particularly evident for the most abundant core matrisome proteins belonging to the collagen and proteoglycan categories (Fig. 4B, right panel).
Comparison of the lung and colon extracellular matrices allowed definition for each of these tissues of a “matrisome signature”; i.e. the subset of matrisome proteins displaying tissue-specific expression (Table II). A core set of 84 matrisome proteins was consistently found in both lung and colon matrisomes (see Fig. 4C). The vast majority of collagens, proteoglycans and components of the basement membrane (laminin α-chains 2, 3, 4, 5, β-chains 2 and 3, and chain γ1, nidogen, perlecan) were found in both tissues. In addition, these categories of shared proteins are abundant, as indicated by the peptide precursor ion abundance and number of peptides detected by MS (Table II).
In addition to this common set of 84 proteins, 59 matrisome proteins were detected exclusively in the lung and 22 matrisome proteins were detected only in the colon. These differentially expressed proteins belong mainly to the ECM glycoprotein, ECM-affiliated, ECM regulators and secreted factor categories (Fig. 4C). Some of these tissue-specific proteins conform in obvious ways with the organ's physiology. For example, thrombospondin 1 (Thbs1) is expressed in the lung but not detected in the colon. This protein was found to be differentially expressed in total ECM preparations of lung versus colon by LC-MS/MS without peptide separation by OGE, indicating that it is an abundant component of the lung ECM. In addition, thrombospondin 1 has previously been shown to play a role in lung physiology (35). Conversely, mucin 2 (Muc2), a protein known to be secreted by the goblet cells of the colon is specifically found in the ECM of colon and not in the ECM of lung (36). Similarly, galectin 4 (Lgals4) has been shown to be expressed exclusively in the digestive tract and is found only in the ECM of colon (37). This same strategy can be applied to various other tissues to characterize biochemically the complete tissue matrices in much greater detail than has been possible heretofore.
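The tissue "matrisome signature" amounts to a set comparison: proteins shared by both tissues versus proteins detected in only one. A minimal Python sketch follows. The function name is hypothetical and the toy protein lists are illustrative: Thbs1, Muc2 and Lgals4 follow the examples in the text, and the shared entries are placeholders for the common set.

```python
def matrisome_signature(tissue_a, tissue_b):
    """Split two protein collections into shared and tissue-specific sets."""
    a, b = set(tissue_a), set(tissue_b)
    return {"shared": a & b, "a_only": a - b, "b_only": b - a}


# Toy lists; only Thbs1 (lung), Muc2 and Lgals4 (colon) mirror the text.
lung = {"Hspg2", "Nid1", "Thbs1"}
colon = {"Hspg2", "Nid1", "Muc2", "Lgals4"}

signature = matrisome_signature(lung, colon)
print(sorted(signature["a_only"]), sorted(signature["b_only"]))

# Consistency check on the reported counts: 84 shared + 59 lung-only
# should equal the 143 lung matrisome proteins reported above.
lung_total_from_counts = 84 + 59
```

The same function can be applied to any pair of tissues once each tissue's matrisome has been defined from replicate samples.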
Comparison of the proteins detected in vivo with the predicted in silico matrisome shows that ~33% of the 274 core matrisome proteins and 11% of the 1098 total murine matrisome proteins (core plus associated) characterized in silico were detected at the protein level in the lung (Fig. 4D). Considering both lung and colon together, we detected 8 of the 36 predicted proteoglycans encoded by the murine genome (22%). The number of collagens found to be expressed in either the lung or the colon reached 70% of the total predicted in silico (Fig. 4D). We suspect that the other proteoglycans and collagens not detected will be found in other tissue types (see Discussion). Similar conclusions can be drawn for the ECM glycoproteins of the core matrisome. The representation of the predicted ECM-associated proteins in our experimental samples is significantly lower; we detect fewer than 15% of the predicted matrisome-associated proteins (Fig. 4D). Low abundance and tissue-specificity of many of these proteins, no doubt, contribute to the lower coverage of these sectors of the in silico lists. However, it is also possible that greater solubility might account for the fact that they are not detected in our samples (see Discussion).
Having developed this strategy, we wished to apply it to begin to tackle several questions about the tumor microenvironment. As discussed in the Introduction, changes in the ECM of tumors are of considerable interest. One challenging question is to determine whether the composition of tumor extracellular matrix differs in tumors with different metastatic ability. Accordingly, we grew subcutaneous tumors by injection into NOD/SCID/IL2Rγ mice of A375 human melanoma cells (poorly metastatic) or their highly metastatic derivatives MA2 (16). The tumors were dissected 5 weeks later, and the tumor ECM was enriched using the protocols outlined above. As before, we define the tumor ECM as the ensemble of ECM proteins and ECM-associated proteins found in two independent samples (supplemental Table S5). The analysis of the composition of matrix from A375 or MA2 tumors revealed that the majority of the matrisome proteins detected were expressed by both tumor types and in comparable proportions (fibronectin, LTBP3 and 4, periostin, etc.) (Table III). In addition, this study revealed the differential expression of matrisome proteins. Basement membranes are specialized forms of extracellular matrix of defined composition that are found at the interfaces between epithelia and connective tissues and underlying the endothelial cells of the blood vessel wall. In this comparative study, most of the basement membrane components (type IV and XV collagens, laminins, HSPG2 or perlecan, nidogen 2) were found to be expressed in the same proportions by the two tumor types, although we could also detect differential expression: collagens XVIII and IV α3 were expressed by the poorly metastatic melanoma, whereas laminin β2 and nidogen 1 were only detected in the ECM of metastatic melanoma (Table III and supplemental Table S5).
Lysyl oxidases are enzymes capable of cross-linking collagens and elastin (38). Both tumor types expressed lysyl oxidase-like 1 (LOXL1) in the same proportions, whereas lysyl oxidase-like 2 (LOXL2) was found to be expressed by the A375 tumors but was not detected in the MA2 tumors. Conversely, LOXL3 and LOXL4 were not detected in A375 tumors but were present in the MA2 tumors. Emilins are ECM proteins that bind elastin and ECM fibrils (39). In our study, we observed that, whereas both tumor types expressed Emilin 1, only the highly metastatic MA2 tumors also expressed the two other related proteins, Emilin 2 and Emilin 3. Interestingly, elastin was found in both tumor types but showed a 10-fold increase in the ECM of MA2 as compared with that of A375 (Table III).
Most of the collagens were found in both tumor types in the same proportions (collagens I, III, V), whereas some were detected in only one of the two tumor types (collagens IV chain α3, IX chain α1, XVIII in A375 tumors and collagens VI chain α6, VIII chains α1 and α2, X, XIV, XXVII, XXVIII in MA2 tumors).
Although we do not yet fully understand the functional significance of these examples, we report here for the first time a detailed composition of melanoma extracellular matrices, which can lead to further analyses.
Another challenging question when studying the tumor microenvironment is to understand the origin of the tumor ECM; that is, whether the tumor ECM is produced and secreted by the tumor cells themselves, by the stromal cells or by both compartments. To address this question, we pursued the analysis of the melanoma xenografts described above, by identifying for each protein its origin: is the protein secreted by the human tumor cells or by the murine stroma? The murine sequence of a given protein is, in most cases, sufficiently different from its human ortholog to be distinguished by proteomic analyses (supplemental Fig. S3). The mass spectrometric analysis allowed us to distinguish the human proteins from their murine counterparts. The origin of a protein could not be determined in only a very few cases (protein indicated as “indistinguishable” in column D of supplemental Table S6 such as S100A10 in the A375 ECM or SRPX in the MA2 matrisome). In order to be able to identify without ambiguity the origin of each protein, we required that proteins be detected in two independent samples with at least two species-specific peptides in one of them. Using this strategy, we identified for each tumor type a set of matrisome proteins exclusively secreted by the (human) tumor cells, and another set exclusively secreted by the (murine) stromal cells (Fig. 5 and supplemental Table S6). Extracellular matrix proteins and proteases found in plasma, fibrinogen (Fg) subunits, plasminogen (Plg), thrombin (F2) and Factor XIIIa1, were all, in agreement with their function, found to be expressed exclusively from the mouse genome. In addition, we found proteins such as fibrillin-1, fibronectin, etc. that are secreted by both the tumor cells and the stromal cells. Among these latter proteins are several basement membrane components: collagen IV α1, α2, α3, and α5 chains, collagen XVIII, perlecan (HSPG2), and nidogen 1.
To analyze further the relative contributions of the two cellular compartments to the secretion of a given protein, we developed a semiquantitative measure. In brief, we calculated the ratio of the observed precursor-ion MS intensities for sequence-distinguishable homologous peptides from the mouse versus human proteins (supplemental Fig. S3 and supplemental Table S6). We consider a ratio above five to indicate that the contribution of the tumor cells to the secretion of the protein of interest is predominant (indicated in orange in Fig. 5 and supplemental Table S6C). Conversely, a ratio under 0.2 indicates a greater contribution from the stroma (Fig. 5 and supplemental Table S6C, proteins highlighted in green). Finally, some proteins are secreted in significant proportions by both compartments—ratio between 0.2 and 5 (Fig. 5 and supplemental Table S6C, proteins highlighted in yellow). We conclude from these data that both tumor cells and stromal cells contribute, although differentially, to the secretion of proteins making up the tumor ECM.
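The three-way classification by intensity ratio can be sketched in Python as follows. Because a ratio above five is read as a predominantly tumor-derived protein, this sketch assumes the ratio is computed as human (tumor) intensity divided by mouse (stroma) intensity; that orientation, the function name, the handling of zero intensities, and the example values are assumptions made for illustration rather than details taken from the Extended Experimental Procedures.

```python
def classify_origin(human_intensity, mouse_intensity, high=5.0, low=0.2):
    """Classify a protein as 'tumor', 'stroma' or 'both' from MS intensities.

    human_intensity / mouse_intensity: summed precursor-ion intensities of
    species-distinguishable homologous peptides (assumed input layout).
    Ratio orientation (human / mouse) is an assumption of this sketch.
    """
    if human_intensity < 0 or mouse_intensity < 0:
        raise ValueError("intensities must be non-negative")
    if human_intensity == 0 and mouse_intensity == 0:
        raise ValueError("no species-specific signal to classify")
    if mouse_intensity == 0:
        return "tumor"  # only human peptides observed
    ratio = human_intensity / mouse_intensity
    if ratio > high:
        return "tumor"   # predominantly secreted by the (human) tumor cells
    if ratio < low:
        return "stroma"  # predominantly secreted by the (murine) stroma
    return "both"        # significant contribution from both compartments


# Toy intensities, not measurements from this study.
examples = {
    "mostly_human": classify_origin(60.0, 10.0),
    "mostly_mouse": classify_origin(1.0, 10.0),
    "mixed": classify_origin(10.0, 10.0),
}
print(examples)
```

Ratios exactly equal to 5 or 0.2 fall in the "both" class here, since the text specifies only "above five" and "under 0.2" for the two predominant classes.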
Our results are in agreement with recent data from the literature. For example, periostin, a known ECM glycoprotein involved in tumor progression and metastasis formation, was expressed exclusively by the stroma in A375 tumors, whereas its expression was also induced in the tumor cells in MA2 tumors. This result is consistent with a gene expression study by Tilman et al., who showed that periostin is not expressed by normal skin but becomes expressed by melanoma cells and the stroma during tumor progression and metastasis formation (40). Conversely, the basement membrane components laminin chains α5 and γ1 were expressed only by the tumor cells in the A375 tumors, and were secreted by both compartments (laminin γ1 chain) or exclusively by the stroma (laminin α5 chain) in the MA2 tumors. Of interest, laminin 511, composed of the α5, β1, and γ1 chains, has recently been shown to promote melanoma cell migration and invasion (41).
Our study is the first to report the detailed composition and origin of the tumor extracellular matrix at the protein level and highlights many proteins of potential importance for metastatic disease. Therefore, we conclude that the matrisome components secreted by the tumor cells vary with the metastatic potential of the tumor cells. In addition, the matrisome components secreted by the stromal cells also change in response to the metastatic potential of the tumor cells, indicating significant cross-talk between the tumor cells and the stromal cells.
To illustrate the potential of this approach, we considered the example of hyaluronan and proteoglycan link protein-1 (HAPLN1), which was found to be expressed exclusively in the metastatic tumors and specifically secreted by the tumor cells (Table III, Fig. 5, and supplemental Table S5B). The function of HAPLN1 is to link proteoglycans to hyaluronic acid thus participating in the architecture of the extracellular matrix (42). Interestingly, this gene was previously found to be up-regulated in gene expression arrays comparing MA2 tumors to A375 tumors (43) and is often up-regulated in highly metastatic cell lines or human tumors (44, 45).
Additionally, the Human Protein Atlas database indicates an increased expression of HAPLN1 in malignant melanoma (http://www.proteinatlas.org/ENSG00000145681/cancer/malignant+melanoma) (46).
Our proteomic analysis identified the origin of HAPLN1 as being secreted by the tumor cells and not by the stromal cells, since human-specific but not murine-specific peptides were detected (Figs. 6A, 6B and supplemental Table S6B). We confirmed the expression of HAPLN1 in metastatic and not in nonmetastatic tumors by immunohistochemistry (Fig. 6D). The fragment used to generate the antibody is 89% identical with the corresponding murine sequence (Fig. 6C), which renders ambiguous the identification of the origin of HAPLN1 by immunohistochemistry alone. This example demonstrates that our proteomics strategy is a method of choice to identify without ambiguity the origin of a given protein of tumor ECM.
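The reason peptide-level data can resolve origin where a highly identical antibody epitope cannot may be illustrated with a simplified in silico tryptic digestion. The two sequences below are invented toy sequences, not HAPLN1, and the cleavage rule (after K or R, not before P, no missed cleavages) is a common simplification rather than the search parameters used in the study.

```python
import re
from typing import Set


def tryptic_peptides(sequence: str, min_length: int = 6) -> Set[str]:
    """Simplified trypsin digestion: cleave after K or R unless the next
    residue is P; keep peptides of at least `min_length` residues."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in pieces if len(p) >= min_length}


# Hypothetical orthologous fragments differing by one residue (S -> T).
human_seq = "MKTLLGAVSEKDGSPLVTAERQWPEILNK"
mouse_seq = "MKTLLGAVSEKDGTPLVTAERQWPEILNK"

human_peps = tryptic_peptides(human_seq)
mouse_peps = tryptic_peptides(mouse_seq)

human_specific = human_peps - mouse_peps   # evidence for tumor origin
mouse_specific = mouse_peps - human_peps   # evidence for stromal origin
shared = human_peps & mouse_peps           # uninformative for origin

print(sorted(human_specific), sorted(mouse_specific), sorted(shared))
```

A single substitution is enough to yield one species-specific peptide per ortholog, whereas the remaining peptides are shared and cannot be used to assign origin.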
We report here the development of proteomic methods to characterize in detail the biochemical composition of the extracellular matrix from normal and tumor tissues. In addition to the development of this experimental strategy it was essential to develop a systematic and objective bioinformatic definition of those proteins that should be considered as part of, or associated with, the ECM—we call this set of proteins the “matrisome” (Fig. 7). This detailed inventory of ECM and ECM-associated proteins serves as a basis for analyses of the tissue specificity of ECM composition and of changes that occur during physiological or pathological processes, exemplified here by analyses of cancer. These methods can now be applied to diverse biological questions concerning ECM biology and pathology and we present here detailed protocols to allow their exploitation by others.
Taking advantage of the characteristic domain-based organization of ECM proteins, we established a bioinformatic pipeline aimed at identifying all ECM proteins encoded within the genome, based on the presence and absence of diagnostic domains. In parallel with the definition of this “core” group of extracellular components (structural and fibrillar glycoproteins and proteoglycans), we also defined, using similar methodology, an inclusive list of matrisome-associated proteins, comprising 1) ECM-affiliated proteins (affiliated either structurally or physically with the core matrisome), 2) ECM-remodeling enzymes, and 3) secreted factors such as growth factors and cytokines, known or suspected to bind to ECM. In addition to providing an atlas of the matrisome, these lists were incorporated into the proteomic pipeline to annotate automatically the mass spectrometric output (see below and Fig. 7).
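The principle of selecting proteins by the presence and absence of diagnostic domains can be sketched as a simple set-based filter. The domain identifiers below are hypothetical placeholders and do not correspond to the inventories of ECM domains provided with this study; the function name is likewise an assumption, and the actual pipeline includes further steps not represented here.

```python
from typing import Iterable, Set

# Hypothetical placeholder identifiers, used for illustration only.
INCLUSION_DOMAINS: Set[str] = {"ECM_DOMAIN_A", "ECM_DOMAIN_B"}
EXCLUSION_DOMAINS: Set[str] = {"NON_ECM_DOMAIN_X"}


def is_core_matrisome_candidate(domains: Iterable[str],
                                inclusion: Set[str] = INCLUSION_DOMAINS,
                                exclusion: Set[str] = EXCLUSION_DOMAINS) -> bool:
    """A protein is retained if it carries at least one diagnostic
    (inclusion) domain and none of the exclusion domains."""
    found = set(domains)
    return bool(found & inclusion) and not (found & exclusion)


# Toy annotations: protein name -> domains found in its sequence.
proteins = {
    "protein_1": ["ECM_DOMAIN_A", "OTHER_DOMAIN"],
    "protein_2": ["ECM_DOMAIN_B", "NON_ECM_DOMAIN_X"],
    "protein_3": ["OTHER_DOMAIN"],
}
candidates = [name for name, doms in proteins.items()
              if is_core_matrisome_candidate(doms)]
print(candidates)
```

Keeping the two domain lists as explicit inputs reflects the intention that new domains or criteria can be added as knowledge of the ECM advances.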
The compilation of matrisome-diagnostic domains and our inventory of 1056 matrisome genes in the human genome (accounting for 4% of the proteome), including 278 core matrisome genes (1% of the proteome), represent the most comprehensive definition of extracellular matrix components to date and provide a framework for future analyses of the ECM of normal and diseased tissues. Indeed, the list of core matrisome proteins should serve as a more comprehensive alternative to GO annotation of ECM proteins, which is inadequate in several ways. This list of 278 core matrisome proteins includes all previously known ECM proteins as well as a number of previously unknown proteins (see supplemental Table S3). Of note, this systematic approach would not allow the identification of proteins that lack ECM-diagnostic domains and that are not known to be part of the extracellular matrix. However, the domain and protein lists are intended to evolve as the in vivo characterization of the extracellular matrix progresses and knowledge in the field advances; one could readily add new domains or criteria to the bioinformatic pipeline, thereby including additional proteins in the lists. It is worth noting that all proteins detected by mass spectrometry but not included in the lists of bioinformatically defined matrisome proteins are included in the supplemental Tables and are available for search by additional bioinformatic analyses.
To study the global composition of the extracellular matrix from tissues and tumors, we employed a protocol originally designed to extract proteins from several intracellular and membranous subcellular compartments. This protocol largely depletes intracellular proteins and in turn enriches for insoluble ECM and ECM-associated proteins. Hence, this strategy focuses on the insoluble components present in the ECM and proteins tightly associated with these components. The enrichment for ECM components was confirmed by the mass spectrometric analysis of the composition of protein extracts. It is important to note that this protocol will miss proteins loosely bound to the ECM and eluted by the sequential extractions. However, such proteins would be detectable in the extracted fractions using targeted proteomics (see below).
Our analyses of the lung and colon showed that the ECM of a given tissue comprises between 100 and 200 proteins, including a set of tissue-specific proteins that represent 10% to 30% of the total. Considering both the lung and colon together, we detected 67 of the 195 murine ECM glycoproteins predicted bioinformatically (representing a third of the predicted ECM glycoproteins), 30 of the 43 collagen subunits and eight of the 36 proteoglycans encoded by the murine genome. We suspect that those matrisome proteins not detected will be found in other tissue types or at different developmental stages; for example, some proteoglycans, such as aggrecan or fibromodulin, and some collagens (collagen II and collagen IX) are known to be cartilage-specific (33, 47) and many ECM proteins appear to be selectively or exclusively expressed in the nervous system (48). Thus, our coupled mass spectrometric and bioinformatic approaches have allowed a more extensive characterization of the ECM of tissues than has been possible previously.
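The coverage figures quoted above can be checked with a short calculation. Only the counts come from the text; the variable names and the combined total across the three categories are introduced here for illustration.

```python
# (detected in lung and colon combined, predicted for the murine genome)
counts = {
    "ECM glycoproteins": (67, 195),
    "collagen subunits": (30, 43),
    "proteoglycans": (8, 36),
}

# Percentage of each predicted category that was detected.
coverage = {name: round(100 * found / predicted, 1)
            for name, (found, predicted) in counts.items()}

total_found = sum(found for found, _ in counts.values())
total_predicted = sum(predicted for _, predicted in counts.values())
total_coverage = round(100 * total_found / total_predicted, 1)

for name, pct in coverage.items():
    print(f"{name}: {pct}%")
print(f"all three categories: {total_found}/{total_predicted} = {total_coverage}%")
```

The glycoprotein value of about 34% corresponds to the "third" mentioned in the text; collagens are the best covered category and proteoglycans the least.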
The proportions of the predicted ECM-associated protein categories found in our experimental samples are significantly lower than for the core matrisome, and most of these proteins were only detected after peptide separation by isoelectric focusing. ECM modification enzymes and secreted growth factors are expected to be present at lower molar ratios than structural proteins and therefore to be less well represented in the data sets. Greater solubility (looser binding to ECM) might also explain in part the fact that fewer are detected in our data sets to date. Finally, the list of ECM-affiliated proteins may well be somewhat inflated, since we included in our lists entire families of proteins if any members of the family were identified in any of our experimental samples (examples include annexins, galectins, mucins, etc.). The purpose was to avoid ignoring potential matrisome components, but this inclusive nature of the ECM-associated category inevitably increases the probability that individual proteins will not be found in particular samples. Some of these proteins may indeed not be ECM-associated.
We anticipate that further analyses will allow detection of specific subsets of proteins present at lower levels in the ECM-enriched fractions. Targeted proteomic methods (such as Accurate Inclusion Mass Screening or Multiple Reaction Monitoring, etc.) are designed to detect specific peptides in samples and are particularly useful to detect low-abundance proteins within a complex mixture (49–51). Using as resources the in silico and experimentally determined matrisomes reported here, one can now select for each matrisome protein, signature peptides particularly likely to be detected by MS (49, 52) and use this peptide list for enhanced detection of lower abundance proteins of interest. Such approaches based on our lists of candidate proteins will, no doubt, allow the identification of more low-abundance proteins such as growth factors. Similar methods could also be used to detect alterations in solubility of particular proteins by analysis of the extracted fractions.
Recent research in cancer biology has demonstrated that tumor cells not only need to accumulate genetic alterations allowing them to proliferate but need also to be surrounded by a locally permissive microenvironment, which markedly affects tumor progression (53, 54). In this context, the surrounding stroma, including the tumor extracellular matrix, is of great interest. Previous proteomic analyses of tumors have not focused on the ECM, although ECM proteins were detected among the tumor proteins analyzed (55, 56).
We demonstrate here that our methodology can be applied to characterize specifically and in detail the ECM of tumors. Moreover, mass spectrometry is a method of choice to distinguish human from murine protein sequences and thus can address the question of the tumor or stromal origins of tumor extracellular matrix in human-into-mouse xenograft systems. Our data show that the origin of the tumor extracellular matrix is dual and comes from both the tumor cells and the stromal cells, as might well have been expected. However, our analyses provide semi-quantitative estimates of the origins (tumor cell or stromal cell) for over a hundred ECM proteins, which has not been feasible before. Furthermore, we show that the matrisome components secreted by the tumor cells vary significantly with the metastatic potential of the tumor cells. Moreover, the matrisome components secreted by the stromal cells also change in response to the metastatic potential of the tumor cells, indicating significant cross-talk between the tumor cells and the stromal cells. This study is the first to report the detailed composition and origin of the tumor extracellular matrix at the protein level. It will be of interest to investigate the significance of these changes for tumor progression.
In conclusion, the strategy presented offers a new perspective on the analysis of the extracellular matrix. Moreover, the methods described here can reveal the presence and origins of novel or unsuspected proteins within the ECM and, thus, novel molecular mechanisms by which the ECM could influence cancer progression and metastasis formation. In addition to revealing novel molecular players whose functional contributions can then be investigated, this approach has the potential to identify biomarkers that can serve as prognostic and diagnostic tools. The methods described here are generalizable and portable, and similar strategies could readily be applied to many other physiological (e.g. development, angiogenesis) and pathological (e.g. fibrosis, wound healing, genetic disease) processes that involve important changes in ECM.
We thank Denise Crowley for histology, Tonje Steigedal for assistance in developing the list of secreted factors, Myriam Labelle, John Lamar, and Christopher Turner for helpful critiques of the manuscript and Charlie Whittaker and the members of the Hynes Lab for many helpful discussions.
* This work was supported by grants from the National Cancer Institute (U54 CA126515); the Koch Institute, the Broad Institute of MIT and Harvard and the Howard Hughes Medical Institute, of which ROH is an Investigator and AN a Postdoctoral Associate.
This article contains supplemental Extended Experimental Procedures, Figs. S1 to S3 and Tables S1 to S8.
1 The abbreviations used are: