Packed with biological information, extracellular vesicles (EVs) offer exciting promise for biomarker discovery and applications in therapeutics and non-invasive diagnostics. Currently, our understanding of EV contents is confined by the limited cells from which vesicles have been characterized utilizing the same enrichment method. Using sixty cell lines from the National Cancer Institute (NCI-60), here we provide the largest proteomic profile of EVs in a single study, identifying 6,071 proteins with 213 common to all isolates. Proteins included established EV markers, and vesicular trafficking proteins such as Rab GTPases and tetraspanins. Differentially-expressed proteins offer potential for cancer diagnosis and prognosis. Network analysis of vesicle quantity and proteomes identified EV components associated with vesicle secretion, including CD81, CD63, syntenin-1, VAMP3, Rab GTPases, and integrins. Integration of vesicle proteomes with whole-cell molecular profiles revealed similarities, suggesting EVs provide a reliable reflection of their progenitor cell content, and are therefore excellent indicators of disease.
Extracellular vesicles (EVs) represent a diverse population of communication pods released from cells. Ranging in size from 40 to 1000 nm, EVs include exosomes, microvesicles, and apoptotic bodies. Generally, exosomes are described as 40-150 nm endocytically-derived vesicles formed by intraluminal budding of multivesicular bodies (MVBs) which are released following MVB fusion with the plasma membrane. Microvesicles are generally larger than exosomes, and are shed by budding and fission events directly at the cell membrane. Varying in size, apoptotic bodies are formed by plasma membrane blebbing, and can contain packaged organelles following initiation of cell death. Further sub-populations of vesicles likely exist, reflecting the heterogeneity of cellular biology encapsulated by EVs.
Extracellular vesicles have been implicated in a number of different physiological processes, including immune system modulation, cell-to-cell signaling, and cell proliferation [1–5]. Accumulating evidence has implicated EVs as major players in the growth, invasion, and metastatic capacity of cancer cells [6, 7]. For example, exosomes and microvesicles have been demonstrated to transfer oncoproteins and nucleic acids from virally-infected cells to uninfected neighboring cells, and likely promote viral-associated tumor progression [8–11]. Systemic circulation of EVs can play a role in establishing a tumor microenvironment, providing the “soil” for cancer cell “seeding” to metastatic sites, and cancer patients have been shown to have increased levels of circulating EVs [13–15]. Recent research has begun to characterize specific transmembrane proteins responsible for targeted vesicle uptake by cells in common cancer type-specific metastatic sites. These circulating vesicles therefore reflect a diverse form of intercellular communication that can facilitate the progression of neoplastic growth and tumor metastasis.
Though EVs likely contribute to the progression of some types of cancer, our knowledge of vesicular communication is still incomplete. One reason is the heterogeneity of EV sub-populations. Though the complexity of EV populations has hindered our understanding of their roles in cell biology, the existence of a variety of EVs may be beneficial in the context of cancer diagnostics and prognostics. One challenge in diagnosing cancer is that tumors often represent a diversity of cell types and genetic mutations; a tissue biopsy is limited in its ability to reflect this diversity. However, circulating EVs are derived from an all-inclusive population of cells, and therefore have the potential to more accurately reflect the entirety of a heterogeneous tumor. While some limitations of vesicle-based biopsies exist, such as disparities in EV quantities released from different tumor cells and the difficulty of detecting small changes in the populations of circulating vesicles, EV-based detection offers alternatives to current diagnostic approaches. Current screening or monitoring tests of cancer progression, such as prostate-specific antigen (PSA; prostate), CA-125 (ovarian), alpha fetoprotein (liver), or CA19-9 (pancreatic), often lack the sensitivity or specificity to provide highly accurate clinical diagnoses [18–20]. As EVs provide membrane-bound protected cargo that can reflect cell-specific pathological processes, we and others propose that these vesicles bear great potential as circulating biomarkers that could improve the current strategies of cancer diagnosis [21–24].
A better understanding of the contents of these vesicles is crucial to the development of EV clinical applications. A current limitation in proteomic analyses of EVs from cancer cells is the small number of cell lines studied using comparable and reproducible methods. Here, we characterize 60 diverse human cancer cell lines derived from 9 distinct tissue types from the National Cancer Institute (NCI-60). The NCI-60 panel was originally compiled by the Developmental Therapeutics Program for high-throughput drug screening, and has led to a number of successful chemotherapeutic drugs used to treat cancer patients. The NCI-60 has also contributed vastly to a better understanding of cancer cell biology and the identification of many novel oncogenic DNA mutations [26, 27]. Since then, the panel has been made available for cancer research purposes, and a full whole-cell proteomic analysis of each individual cell line has been published. Proteomic and RNA analyses of EVs from subsets of the NCI-60 panel have recently been investigated, providing initial characterizations of cancer vesicle contents [29–32]. Research using cell lines from the NCI-60 panel has also contributed to evidence demonstrating the roles of EVs in the growth and survival of tumor cells, multidrug resistance [33–35], immune evasion, cancer cell migration, and impact on cells in the tumor microenvironment. Subsets of the panel have also been used to study general mechanisms of EV biogenesis and release from cells [39, 40]. Recently, we compared the vesicle secretion of NCI-60 cell lines using nanoparticle tracking analysis (NTA). Results highlighted differences in secretion rates and sizes of vesicles from cancer cells.
With this in mind, we conducted a comparative analysis of proteins from EVs secreted by the NCI-60 cells. To our knowledge, this is the largest single-study proteomic investigation of vesicles to date. In this study, 6,071 unique proteins were identified, including 213 common to all 60 cell types, which likely reflect the common machinery involved in EV biogenesis. Differentially expressed proteins were also identified. Many of these proteins are associated with tissue type, and could therefore serve as markers of EV origin or aid as future diagnostic biomarkers of cancers. To investigate proteins involved in mechanisms of EV biogenesis and secretion, the EV proteome was further analyzed to look for associations between protein accumulation (spectral counts) and vesicle secretion quantity. Finally, the proteomic analysis of NCI-60 EVs was compared to existing cellular proteome and transcriptome datasets. These analyses revealed that the EV proteome closely reflects the transcriptome and proteome of the cell of origin, supporting the hypothesis that EVs are a rich source of diagnostic and prognostic markers. Overall, this extensive proteomic dataset provides a foundation to further investigate general mechanisms of vesicle biogenesis, and demonstrates the incredible biomedical and clinical utilities of extracellular vesicles.
To characterize and compare extracellular vesicle proteomes, EVs were harvested from 60 cell lines (NCI-60). As pure EV sub-populations are empirically difficult to isolate, a method of enriching a broad spectrum of vesicles was used in this study to contribute to a greater understanding of global EV content. We have previously demonstrated that the ExtraPEG method enriches for EVs with a purity comparable to sucrose-purified samples following growth of cells for a period of time in serum-free medium, as performed in this study (Figure 1A). Vesicles harvested using this method were found to contain extracellular vesicle markers by western blot [41, 43]. Nanoparticle tracking and electron microscopy revealed sizes and morphology consistent with those previously reported for EVs [41, 43]. Here, vesicles were enriched using identical methods from individual cell lines across nine represented histological origins: breast, brain (CNS), colon, kidney, leukemia, lung, melanoma, ovary, and prostate. Collectively, 6,071 unique proteins were identified in EVs (Supplementary Table S1). To examine the overlap with known vesicular proteins, NCI-60 EV proteins were compared to those in the Vesiclepedia compendium of extracellular vesicle molecular data. Nearly 4,500 of these proteins had previously been identified in EVs (Figure 1B and Supplementary Table S2). Over 1,500 proteins not previously characterized as EV components were further discovered. Because so many unreported vesicular proteins were identified, we aimed to ensure that proteins found in this study were congruent with those previously found in EVs. To increase the stringency of our dataset and characterize common EV proteins, only those identified in at least two-thirds of all cell line isolates ([NCI-60]stringent) were compared to the Vesiclepedia compendium. These proteins showed over 97% overlap with proteins currently characterized as extracellular vesicle proteins.
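The stringency filter described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`stringent_set`, `overlap_fraction`), the toy detection table, and the toy reference set are all hypothetical, and real input would be the per-cell-line protein identifications and the Vesiclepedia protein list.

```python
from collections import Counter

def stringent_set(detections, num=2, den=3):
    """Return proteins detected in at least num/den of the cell lines.

    detections: dict mapping cell-line name -> set of protein identifiers.
    Integer arithmetic (count * den >= num * n) avoids float rounding.
    """
    n_lines = len(detections)
    counts = Counter(p for proteins in detections.values() for p in set(proteins))
    return {p for p, c in counts.items() if c * den >= num * n_lines}

def overlap_fraction(proteins, reference):
    """Fraction of `proteins` that also appear in `reference`."""
    if not proteins:
        return 0.0
    return len(proteins & reference) / len(proteins)

# Toy data only (hypothetical detections, not values from the study).
toy = {
    "line_A": {"CD81", "CD63", "RAB7A"},
    "line_B": {"CD81", "CD63"},
    "line_C": {"CD81", "TNXB"},
}
toy_reference = {"CD81", "CD9"}  # stand-in for a reference compendium

stringent = stringent_set(toy)                    # proteins in >= 2 of 3 lines
overlap = overlap_fraction(stringent, toy_reference)
```

With 60 cell lines, the default two-thirds threshold corresponds to detection in at least 40 isolates.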
The entire proteome of NCI-60-derived EVs was further characterized systematically using qualitative and quantitative analyses. An average of nearly 1,900 proteins were identified per cell line across all tissue types (Figure 1C). Calculation of the median logarithmic abundance of proteins revealed a normal distribution of spectral counts across the panel (Supplementary Figure 1D). The number of total EV proteins found within each tissue was similar across the panel (Figure 1D), with the exception of the prostate cancer group, likely explained by the underrepresentation of this tissue type (n=2). Strikingly, 213 proteins were identified that were common to every vesicle sample, representing the core NCI-60 EV proteome (Supplementary Table S3). These proteins likely reflect essential proteins packaged into EVs from many different cellular origins, and provide insight into general mechanisms of EV biogenesis, entry, protein trafficking, and secretion. Proteins found in at least one cell line from each exclusive tissue type were also compiled to represent tissue-specific markers (Supplementary Table S4). Notably, 165 proteins were exclusively found in leukemia-derived EVs, while fewer unique proteins were in other tissue types. As EVs from many cancer cells have been shown to be enriched in functional integrins [16, 45], these data may reflect detectable differences between cancer EVs originating from circulating hematopoietic cells versus those typically attached to basement membrane matrices through integrin linkers.
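As a rough illustration of the two set operations in this paragraph, the core proteome can be read as the intersection of all per-sample protein sets, and tissue-exclusive proteins as those seen in at least one line of a tissue and in no line of any other tissue. The sketch below assumes that reading; the function names, the cell-line labels, and the toy detections are hypothetical.

```python
def core_proteome(detections):
    """Proteins present in every sample (set intersection)."""
    sets = [set(p) for p in detections.values()]
    return set.intersection(*sets) if sets else set()

def tissue_exclusive(detections, tissue_of):
    """Proteins found in >= 1 line of a tissue and in no other tissue.

    tissue_of: dict mapping cell-line name -> tissue label.
    """
    by_tissue = {}
    for line, proteins in detections.items():
        by_tissue.setdefault(tissue_of[line], set()).update(proteins)
    exclusive = {}
    for tissue, proteins in by_tissue.items():
        others = set().union(*(p for t, p in by_tissue.items() if t != tissue))
        exclusive[tissue] = proteins - others
    return exclusive

# Toy data only (hypothetical detections, not values from the study).
toy = {
    "leuk_1": {"CD81", "ICAM3"},
    "leuk_2": {"CD81", "ICAM3", "PROT_X"},
    "mel_1": {"CD81", "PMEL", "AGRN"},
    "colon_1": {"CD81", "AGRN"},
}
toy_tissue = {"leuk_1": "leukemia", "leuk_2": "leukemia",
              "mel_1": "melanoma", "colon_1": "colon"}

core = core_proteome(toy)
exclusive = tissue_exclusive(toy, toy_tissue)
```

Note that a tissue represented by few lines (such as prostate, n=2) will tend to yield noisier exclusive sets under this definition.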
To characterize the cargo abundant in the majority of cancer EVs, proteins found in the [NCI-60]stringent dataset were further analyzed. Functional and pathway analyses were conducted using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) v6.7. Not surprisingly, proteins enriched in protein localization, transport, and vesicular functions were identified in our data set (Figure 2A). Many ribosomal proteins involved in translation processes were also enriched, similar to results seen in previous studies [32, 46]. Ribosomal components may facilitate cell-to-cell communication by directly translating mRNAs present in EVs following fusion with target cells. Pathway analysis revealed proteins to be enriched in RNA processing and proteolytic processes, as well as cytoskeletal and endocytic pathways (Figure 2B). Comparison to the Vesiclepedia database revealed the majority of proteins to have a subcellular localization in endolysosomal or cytoplasmic compartments (Figure 2C). Altogether, these analyses demonstrate an abundance of proteins with recognized functions in protein and vesicle trafficking from the endosomal pathway, and cytoskeletal involvement that likely plays a role in both exosome and microvesicle secretion. The presence of proteins involved in RNA processing suggests an active process of RNA sorting and packaging, and offers insight into how biologically active RNA messages are communicated between cells.
A common method of characterizing EVs relies on the presence of accepted protein markers enriched in vesicle populations. Recent research has described different subpopulations of EVs from cells by identifying vesicle-specific protein markers. We compared commonly used exosome and microvesicle markers described across EVs derived from the NCI-60 cells (Figure 2D). Historically, these markers have been considered universal; however, only tetraspanin CD81, Alix, and HSC70 were found across all samples. Tetraspanins CD63 and CD9, as well as TSG101, Syntenin-1, and Flotillin-1, were identified in at least two-thirds of the samples. MMP-2, a previously reported microvesicle marker, was found in only 16 EV samples across the panel. Many of these protein markers have historically been used to describe and quantify extracellular vesicles regardless of their cellular localization or method of harvest. However, data presented here show wide variation in the levels of traditional EV markers, with some being completely undetectable in certain EV preparations. We subsequently analyzed EV proteins identified in this study to characterize those common to all cells in the NCI-60 panel. Proteins within this dataset involved in vesicle-mediated transport and protein localization were identified, and were largely enriched in GTPase function, including Rab proteins 1A, 2A, 5C, 6A, 7A, 8A, 10, 11B, and 14 (Supplementary Table S5). These proteins likely represent more universal markers of EVs that future EV researchers should consider for characterization. Notably, as various cells may package these proteins to different degrees within the same number of EVs, the correlation of vesicle quantity across cell lines to any one of these markers is complex. This poses challenges for researchers, as quantitative protein assays such as ELISA or immunoblot analyses are often used to determine vesicle quantity following EV enrichment.
Next, we aimed to compare vesicle proteomes across individual cancer cells in the NCI-60 panel. As extracellular vesicles have been suggested to carry proteins that reflect their progenitor cell, it was hypothesized that EVs released from cells of the same tissue type would share similarities in protein content. Principal component analysis (PCA) demonstrated that EV proteins clustered based on tissue type (Figure 3A). Of note, one leukemia cell line (K562) did not cluster with the remaining cancer EVs (Supplementary Figure 1C). The K562 cell line is an erythroleukemia derived from the pleural effusion of a patient with chronic myelogenous leukemia (CML), and is positive for the Philadelphia translocation on chromosome 22, creating the chimeric BCR/ABL fusion gene. The BCR/ABL gene has been demonstrated to downregulate many cell adhesion molecules, which may, in part, explain the divergence from other cell-derived EVs. For instance, in this study, ICAM3, an adhesion molecule abundantly expressed in leukocytes, was found in high levels among all leukemia-derived EVs except the K562 EVs. In light of these findings, the K562 cell line was excluded from all subsequent analyses.
Unsupervised hierarchical clustering was used to further examine the congruity of EV proteomes from cells of the same tissue origin. Samples from colon, kidney, leukemia, lung, and melanoma cancers clustered closely within tissue type (Figure 3B). Breast, CNS, and ovarian cancer EVs demonstrated sub-clustering within each tissue type, suggesting similarities between cancer types that may not be universal across the tissue of origin. Part of these unique clustering patterns may reflect metastatic potential of particular cell lines. For example, highly metastatic breast cancer cells (BT-549 and MDA-MB-231) cluster together away from other non-metastatic breast cell lines.
Strikingly, a number of proteins were observed to be differentially expressed in vesicles secreted from adherent cell lines compared to cells in suspension culture. For example, Agrin (AGRN), a basement membrane glycoprotein that contains heparan and chondroitin sulfate residues, was absent only in the leukemia-derived EVs (Figure 3C). On the other hand, adhesion molecule ICAM3 was present predominantly in EVs secreted from leukemia cancer lines grown in suspension (Figure 3D). As integrin and heparan sulfate proteoglycan receptors have been demonstrated to play a significant role in the uptake of vesicles into cells [16, 49], these observations provide new targets for future studies of tissue-specific mechanisms of vesicular protein trafficking and targeting of EVs to recipient cells.
Recently, a urine-based exosome diagnostic assay has been demonstrated to predict high-grade prostate cancer among men with elevated PSA levels. Likewise, glypican-1, a heparan sulfate proteoglycan, was detected only in cancer-derived exosomes, and was further shown to be correlated with pancreatic cancer progression in patients, providing a non-invasive early diagnostic tool for pancreatic cancer. In the NCI-60 panel, glypican-1 was identified in only 35 of 60 cancer EV isolates. This suggests an even stronger likelihood that novel protein markers identified across all samples in this study (or all samples within a tissue type) will be useful as early diagnostic markers of cancers. Furthermore, the evidence showing differences in EV contents from circulating cells (leukocytes) compared to basement membrane-adhered cells is particularly valuable when considering liquid biopsy techniques to isolate vesicles, as proteins such as ICAM3 could serve as a tool to distinguish blood cell-borne EVs from those secreted into circulation by organ-derived or metastatic cells.
Extracellular vesicle-based liquid biopsies also carry the potential for early detection of cancer cells that normally have limited access to blood circulation. The proteomic analysis of NCI-60 EVs confirmed the presence of premelanosome protein (PMEL) in vesicles secreted from all melanoma cancer cell types (Figure 3E). This melanocyte-specific transmembrane glycoprotein has previously been shown to be sorted into endosomes for exosomal secretion. As melanocytes are ordinarily confined to the epidermal layers of the skin, access to deeper blood vessels is not usually achieved unless vertical growth of cancerous cells occurs. Therefore, a circulating (plasma) melanocyte-specific exosomal protein marker could serve as an early indication of various types of invasive melanoma growth.
Moreover, several vesicular proteins were identified in a very small population of cancer lines. For instance, Tenascin XB (TNXB), an extracellular matrix glycoprotein, was found in abundant levels only in PC-3 cells, a high-grade prostate adenocarcinoma line. Although Tenascin XB was not identified in whole cell proteomic profiling from PC-3 cells, relatively high transcript levels of this protein in PC-3 cells have been described previously. Another EV protein, periostin, was recently identified as a metastatic breast cancer vesicular marker. In our study, the presence of periostin was confirmed in both metastatic breast cancer lines (MDA-MB-231 and HS 578T), but was not found in other non-metastatic breast cancer-derived EVs. Additional proteins including raftilin (a lipid-raft regulating protein), fibulin-7 (an adhesion molecule), and plasminogen activator inhibitor 1 (a serine protease inhibitor that has previously been implicated in aggressive tumor growth [53, 54]) were exclusively found in metastatic breast cancer EVs. Likewise, latent-TGFβ-binding protein-1 was identified preferentially in lung cancer cells with high invasive capacity (A549, HOP-62, and HOP-92). In all, over 1,500 proteins were found to be differentially expressed across the 60 EV samples (Supplementary Table S6). Interestingly, comparison of whole cell protein expression reported by Gholami et al. to EV expression in Figure 3C-3E revealed these differentially expressed proteins to be largely conserved between cell and vesicle isolates. For instance, ICAM3 was chiefly absent in non-leukemic tumor cells, while Agrin was underrepresented in leukemia-derived cells (Supplementary Figure S2). Likewise, PMEL was found in substantial levels in melanoma cell lines compared to other cancer cells. These findings suggest that differentially expressed proteins found in cancer EVs may reflect cellular phenotypes.
Furthermore, the specificity of protein content in vesicles from individual cancer cell types promises great potential in further investigation of these novel markers for early cancer detection and prognostic monitoring.
Recently, we described relative extracellular vesicle secretion quantities across the NCI-60 panel. As many EV proteins in samples across the study were identified in different quantities, we hypothesized that some of these proteins are likely involved in vesicle biogenesis and therefore correlate with the total number of vesicles secreted by cells. To investigate proteins involved in common pathways of EV formation, levels of proteins in the [NCI-60]stringent dataset were compared to previously collected vesicle secretion quantities (particles per cell; Supplementary Table S7) by weighted gene coexpression network analysis (WGCNA).
In this analysis, hierarchical clustering of proteins (Figure 4A) demonstrated inter-related expression patterns and produced 15 clusters of highly related proteins (modules) that were detected by dynamic tree cut, an optimal method used to detect clusters of data within a dendrogram. Modules were then correlated to vesicle secretion patterns across the panel (Supplementary Table S8). The yellow module containing 88 proteins was most significantly correlated with particle secretion, and therefore served as our target protein cluster (Figure 4B).
Here, protein significance is defined as the correlation of the protein expression profile across the NCI-60 panel with particle secretion levels. Module membership further measures the correlation of protein expression patterns across the members of the yellow module. We found protein significance and module membership to be positively correlated in the yellow module (p = 0.006) (Figure 4C). These findings suggest that proteins clustered into the module show interconnected profiles of expression in vesicles that correlate positively with the number of vesicles secreted by cells.
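The two quantities defined above can be illustrated with plain Pearson correlations. The sketch below is a simplification under explicit assumptions: WGCNA conventionally computes module membership against the module eigengene (the first principal component of the module), whereas this example substitutes the mean of z-scored member profiles as a stand-in summary. All names and numbers are hypothetical toy values.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def zscore(v):
    """Standardize a profile to zero mean and unit variance."""
    n = len(v)
    m = sum(v) / n
    sd = sqrt(sum((a - m) ** 2 for a in v) / n)
    return [(a - m) / sd for a in v]

def module_summary(module):
    """Mean of z-scored member profiles (assumed stand-in for an eigengene)."""
    z = [zscore(v) for v in module.values()]
    return [sum(col) / len(col) for col in zip(*z)]

# Toy data: spectral counts for 3 proteins across 4 cell lines (hypothetical).
toy_module = {"P1": [2, 4, 6, 8], "P2": [1, 3, 2, 5], "P3": [5, 6, 8, 9]}
toy_secretion = [10, 20, 30, 40]  # particles per cell (hypothetical)

summary = module_summary(toy_module)
significance = {p: pearson(v, toy_secretion) for p, v in toy_module.items()}
membership = {p: pearson(v, summary) for p, v in toy_module.items()}
```

Plotting `significance` against `membership` for every protein in a module gives the kind of scatter summarized by the reported correlation between the two measures.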
Enrichment analysis of the yellow module demonstrated proteins were significantly enriched in cell adhesion and growth, GTPase activity, and cell surface receptor signaling (Figure 4D), and included CD63, CD81, VAMP3, syntenin-1, and SEC22B, among other vesicular proteins. Strikingly, 25 of the yellow module proteins were identified in EVs from every cancer cell in the panel (Table 1), supporting the hypothesis that commonly identified EV components likely play a role in EV biogenesis. In light of the variation in current EV markers seen (Figure 2D), these represent important proteins that could more accurately compare vesicle quantities across a diversity of cell lines and certainly warrant future investigation.
Given the clinical utility of using extracellular vesicles for cancer diagnostics, we investigated the relationships between EV protein composition and whole cell content. Previously, cellular protein and transcript expression profiles were compared using co-inertia analysis (CIA) to examine the concordance between these molecular datasets across the NCI-60 panel. Here, vesicle protein levels were similarly compared to cellular protein and RNA expression. In Figure 5A, each of the three datasets (vesicle proteome, cellular proteome, and cellular transcriptome) is plotted for individual cell lines. Markers represent the relative position of a cell line in respective proteome or transcriptome space, where the divergence of datasets is delineated by connecting vectors. Molecular factors driving tissue-dependent clustering were plotted as overlapping protein (cell and vesicle) and RNA transcript data in the same orientation (Figure 5B). In variable space, RNA or protein coordinates farther from the origin are more highly expressed in cell lines projected in the same direction. Whole cell RNA profiling showed the strongest ability to differentiate samples, demonstrated by farther distances of RNAs from the origin in variable space. However, vesicle protein was observed to show similar magnitudes of projection from the origin as whole cell protein, demonstrating the ability of EV proteins to reflect cell protein profiling. A histogram of eigenvalues (Figure 5C) demonstrated that the first and second co-inertia axes represented 49% of the total variance (sum of the eigenvalues) seen in the datasets plotted, accounting for 31% and 18% of the variance, respectively. The three datasets were also examined to consider how much variance of the eigenvalues was contributed by each dataset (Figure 5D).
No single dataset contributed to both co-inertia axes alone, indicating that the analysis examining the relationships between vesicle proteomes and cellular transcriptomic and proteomic profiles was dependent on all datasets.
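The share of variance carried by each co-inertia axis is its eigenvalue divided by the sum of all eigenvalues. The short sketch below shows that arithmetic; the eigenvalue list is hypothetical and was chosen only so that the first two axes reproduce the 31% and 18% shares quoted in the text.

```python
def variance_shares(eigenvalues):
    """Fraction of total variance (sum of eigenvalues) carried by each axis."""
    total = sum(eigenvalues)
    return [e / total for e in eigenvalues]

# Hypothetical eigenvalues; only the first two shares mirror the text.
toy_eigenvalues = [31, 18, 14, 11, 9, 7, 6, 4]

shares = variance_shares(toy_eigenvalues)
first_two = shares[0] + shares[1]  # combined share of axes 1 and 2
```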
In general, melanoma, leukemia, and colon cells demonstrated tissue-type clustering based on their unique proteomic and transcriptomic profiles, similar to clusters previously observed (Figure 3A-3B). The vesicle protein dataset showed considerable association with the whole cell molecular profiles, indicated by the short and randomly oriented arrows (Figure 5A). Overall, the similarity of vesicle protein to cellular RNA and protein resulted in RV coefficients of 0.63 and 0.57, respectively. Notably, several cell lines demonstrated greater variation between vesicle protein and whole cell profiles, including two lung lines (NCI-H322M and HOP-62), one colon cell line (HCC-2998), and an ovarian cell line (IGROV1). To more closely examine proteins found in discordance between whole cell and EV proteomes, proteins found in the core vesicle proteome were compared to reported cellular expression data. In total, 144 proteins identified in the core vesicle proteome in this study were similarly identified in cell lines, as reported by Gholami et al. Comparison of expression levels between whole cell and vesicular isolates revealed several outlying proteins enriched in vesicles or cells alone (Figure 5E–5H and Supplementary Table S9). While proteins such as Actin (ACTG1) or Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) were found abundantly in both cells and EVs, tubulin proteins (TUBB4B and TUBA1C) were enriched in lung cancer vesicles compared to cells, and actin-binding protein Moesin (MSN) was enriched in vesicles from IGROV1 and HCC-2998 cells. Strikingly, Galectin-3-binding protein (LGALS3BP) was enriched in multiple EV isolates compared to respective progenitor cells. These proteins probably represent targets of selective vesicular packaging, and future focus will likely advance our understanding of vesicular trafficking mechanisms.
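The RV coefficient quoted above is a multivariate generalization of the squared correlation between two data tables measured on the same samples. For column-centered matrices X and Y it can be written as tr(XX'YY') / sqrt(tr((XX')²) · tr((YY')²)), which ranges from 0 to 1. The sketch below implements that textbook formula in plain Python; the toy matrices are hypothetical, and the study itself may have used a dedicated co-inertia package instead.

```python
from math import sqrt

def center_columns(m):
    """Subtract each column mean (rows = samples, columns = variables)."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]

def gram(m):
    """Sample-by-sample cross-product matrix M M'."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]

def frob_inner(a, b):
    """Frobenius inner product; equals tr(AB) for symmetric A and B."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def rv_coefficient(x, y):
    """RV coefficient between two tables sharing the same rows (samples)."""
    wx, wy = gram(center_columns(x)), gram(center_columns(y))
    return frob_inner(wx, wy) / sqrt(frob_inner(wx, wx) * frob_inner(wy, wy))

# Toy tables: 4 samples measured on 2 and 3 variables (hypothetical values).
toy_x = [[1, 2], [2, 1], [3, 5], [4, 3]]
toy_y = [[2, 0, 1], [1, 1, 0], [5, 2, 3], [3, 4, 1]]

rv_xy = rv_coefficient(toy_x, toy_y)
rv_xx = rv_coefficient(toy_x, toy_x)
```

Because the coefficient is computed from sample-by-sample cross-products, the two tables may have different numbers of variables, which is what allows a vesicle proteome to be compared against a transcriptome.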
Altogether, the congruency of tissue dependent clusters across datasets and resemblance between the vesicle and cellular molecular profiles substantiates the ability of EV profiles to discriminate between tissue types and supports a significant clinical value of EVs as circulating biomarkers. As EVs can closely reflect the molecular profile of their cellular origin, they therefore represent circulating biological archetypes of healthy or diseased cells.
In the past decade, it has become clear that extracellular vesicles are an important component of cellular communication informing fields including cancer biology, immunology, and virology. The data here provide a means to better understand the protein cargo involved in this ubiquitous messaging system with implications for further use in clinical diagnostic practice. In the characterization of vesicular protein contents from the NCI-60 cells, we have expanded the list of known vesicle proteins by nearly twenty percent (based on the Vesiclepedia database of characterized vesicle proteins), significantly growing our foundation of knowledge of the nature of this communication system. Of these proteins identified, 213 were common to all cell lines. This core EV proteome likely represents conserved structural and signaling components of vesicles, and molecular factors involved in vesicle biogenesis and secretion. In support of this, 25 of the 213 proteins were found to be positively correlated with EV secretion levels. One of the largest obstacles currently faced by EV researchers is accurately defining these vesicle subtypes. The data presented in this study offer a substantial step forward in the ability to define EV populations, and to enrich vesicles for studies in all areas of EV research. As our ability to isolate distinct vesicle sub-populations continues to improve, this substantial database will likely provide an important backbone for understanding more unique vesicle contents.
In this study, critical differences in extracellular vesicle contents were also identified which may directly lead to the discovery of biomarkers of cancer, ultimately affecting diagnosis and prognosis of the increasingly prevalent disease. As emerging large-scale “omic” datasets progressively demonstrate significant potential in the future of individualized medicine, it is expected that extracellular vesicles will play an important role in modern health and disease-state surveillance. In this study, the close resemblance of vesicle and cellular molecular profiles suggests that extracellular vesicles reliably represent their progenitor cells. They are therefore excellent candidates for bearing biomarkers, and will increasingly find a place amongst clinical strategies for combating diseases like cancer.
Sixty cell lines from the National Cancer Institute (NCI-60) were acquired from the NCI Developmental Therapeutics Program. Cells were seeded based on growth rate (doubling time) to achieve 80-90% confluence after two to three days. Cells were cultured in RPMI-1640 medium (Lonza, 12-702Q) supplemented with 10% fetal bovine serum (FBS; Seradigm, 1400-500), 2 mM L-glutamine (Corning, 25-005-CI), 100 units penicillin/streptomycin (Corning, 30-002-CI), and 100 units:100 μg/mL:0.25 μg/mL antibiotic/antimycotic (Corning, 30-002-CI). Following the two- to three-day culture, complete medium was aspirated and cells were washed with warmed sterile phosphate buffered saline (PBS). To minimize contaminating serum proteins in downstream mass spectrometry analysis and aid in the identification of lower abundance EV proteins, cells were cultured in serum-free medium for a further 48 hours before EV enrichment. Previous results have demonstrated that EVs harvested using this method are as pure as sucrose-cushion purified vesicles.
Following the 48-hour serum-free culture, medium was harvested. Cell viability at the time of harvest was measured by counting cells stained with 0.2% trypan blue in PBS (Sigma, T8154) or AO/PI (Nexcelom Bioscience, CS2-0106) with an automated cell counter (Cellometer Vision, v126.96.36.199, Nexcelom Bioscience). We have previously demonstrated that cell viability does not significantly contribute to the number or size of particles secreted by the NCI-60 cells when viabilities are maintained greater than 85%. In this study, cell viabilities following the 48-hour serum-free culture approximated those observed previously in complete medium. EVs were enriched using the ExtraPEG method previously described. Briefly, cell-conditioned medium was centrifuged at 500 g for five minutes to pellet and discard cells, followed by 2,000 g for 30 minutes to remove cellular debris. Supernatant was pooled from several flasks to amass sufficient material (200-500 mL) for downstream proteomic analysis. A 1:1 volume of 2X PEG solution [16% (w/v) polyethylene glycol, 1 M NaCl] was added. Samples were inverted to mix, then incubated overnight. The next day, the medium/PEG mixture was centrifuged at 3,214 g for one hour. Crude vesicle pellets were resuspended in 1 mL of particle-free PBS and re-pelleted by ultracentrifugation at 100,000 g for 70 minutes. Final pellets were lysed in strong lysis buffer [5% SDS, 10 mM EDTA, 120 mM Tris-HCl pH 6.8, 2.5% β-mercaptoethanol, 8 M urea] with the addition of HALT protease inhibitor (Thermo, 78438). Vesicular protein was quantified using the fluorescence-based EZQ™ Kit (Thermo, R33200).
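Adding an equal volume of the 2X PEG solution halves each stock concentration, giving nominal working concentrations of 8% (w/v) PEG and 0.5 M NaCl. The sketch below works through that dilution; it assumes the conditioned medium contributes no PEG and ignores the salt already present in the medium, and the function name is hypothetical.

```python
def mix_one_to_one(stock, medium_volume_ml):
    """Final volumes and nominal concentrations after adding an equal
    volume of 2X stock to conditioned medium.

    Assumes the medium contributes none of the listed components.
    """
    stock_volume_ml = medium_volume_ml           # 1:1 volume ratio
    total_ml = medium_volume_ml + stock_volume_ml
    final = {name: conc * stock_volume_ml / total_ml
             for name, conc in stock.items()}
    return stock_volume_ml, total_ml, final

# 2X stock composition as stated in the text.
peg_2x = {"PEG_percent_w_v": 16.0, "NaCl_molar": 1.0}

stock_ml, total_ml, final = mix_one_to_one(peg_2x, medium_volume_ml=200.0)
```

For the 200-500 mL of pooled supernatant mentioned in the text, this corresponds to 200-500 mL of 2X solution and 400-1,000 mL of mixture to centrifuge.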
For protein purification and separation by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), 20 μg of vesicular protein from each sample was loaded into a 4-20% polyacrylamide gel (Lonza, 59511). Following electrophoresis, gels were fixed and Coomassie-stained as previously detailed. Samples were fractionated by cutting gel lanes into five sections. Sections were subsequently subdivided into 1 mm³ cubes before trypsin digestion as described.
Following protein digestion, samples were submitted to the Florida State University Translational Science Laboratory for liquid chromatography tandem mass spectrometry (LC-MS/MS) analysis. The digest was freeze-dried, resuspended in 30 μL of 0.1% formic acid (FA), and analyzed on an externally calibrated Thermo LTQ Orbitrap Velos nLC-ESI-LTQ-Orbitrap (high-resolution electrospray tandem mass spectrometer) with the following parameters. A 2 cm trap column of 100 μm internal diameter (i.d.) (SC001 Easy Column, Thermo Scientific) was followed by a 10 cm analytical column of 75 μm i.d. (SC200 Easy Column, Thermo Scientific). Both the trap column and the analytical column had C18-AQ packing. Separation was carried out using an Easy nanoLC II (Thermo Scientific) with a continuous, vented column configuration. A 5 μL sample was aspirated into a 20 μL sample loop and loaded onto the trap. The flow rate was set to 300 nL/min for separation on the analytical column. Mobile phase A was composed of 99.9% H2O (EMD Omni Solvent) and 0.1% formic acid, and mobile phase B was composed of 99.9% acetonitrile (ACN) and 0.1% formic acid. A 1-hour linear gradient from 0% to 45% B was performed. The LC eluent was directly nano-sprayed into the LTQ Orbitrap Velos mass spectrometer (Thermo Scientific). During the chromatographic separation, the LTQ Orbitrap Velos was operated in data-dependent mode under direct control of the Xcalibur software (Thermo Scientific), acquiring 10 data-dependent collision-induced dissociation (CID) MS/MS scans per full scan. All measurements were performed at room temperature, and three technical replicates were run for each sample.
Raw data collected from each of five fractions were pooled by sample and analyzed using MaxQuant (v188.8.131.52). The mass spectrometry data were searched with the integrated Andromeda peptide search engine against a recent (March 2016) UniProt Knowledgebase reviewed (Swiss-Prot) human protein database. The database was appended with the list of common contaminants in MaxQuant. Search parameters were the default settings for this version of the software unless stated otherwise below. Instrument type was set to Orbitrap with label-free quantitation, the first search peptide tolerance was set to 10 ppm, and the main search peptide tolerance was set to 4.5 ppm. Digestion mode was set to specific for trypsin, with a maximum of two missed cleavages. Carbamidomethyl (C) was the only fixed modification, and variable modifications included oxidation (M), N-terminal acetylation, and phosphorylation (STY). A maximum of five modifications was allowed per peptide. False discovery rate was set to 0.01. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository.
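To make the two peptide tolerances concrete, a parts-per-million tolerance can be converted into an absolute mass window. This is a minimal sketch; the function name and the precursor m/z of 1000.0 are hypothetical and serve only to show the scale of the window.

```python
def ppm_window(mz, tolerance_ppm):
    """Return the (lower, upper) m/z bounds for a given ppm tolerance."""
    delta = mz * tolerance_ppm / 1_000_000
    return mz - delta, mz + delta

# Hypothetical precursor m/z, used only for illustration.
precursor_mz = 1000.0

first_search = ppm_window(precursor_mz, 10.0)  # first search tolerance, 10 ppm
main_search = ppm_window(precursor_mz, 4.5)    # main search tolerance, 4.5 ppm

print(first_search)
print(main_search)
```

At this hypothetical m/z the main search window is about ±0.0045, less than half the width of the first search window.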
Primary analyses were conducted with the MaxQuant output data set (NCI-60). This dataset represents all proteins identified in the study. To examine proteins found in the majority of samples that reflect a more commonly shared cancer EV proteome, a more stringent protein set was used, and defined as [NCI-60]stringent. This data set includes only proteins found in at least two thirds of all EV samples.
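The two-thirds rule can be sketched as a simple detection filter. The example assumes that "found" means a non-zero count in a sample; the function name, protein labels, and toy counts are hypothetical.

```python
def stringent_set(counts):
    """Keep proteins detected (count > 0) in at least two thirds of samples.

    counts maps a protein identifier to its list of per-sample counts.
    Integer arithmetic avoids floating-point rounding at the threshold.
    """
    kept = []
    for protein, values in counts.items():
        n_detected = sum(1 for value in values if value > 0)
        if 3 * n_detected >= 2 * len(values):
            kept.append(protein)
    return kept

# Toy data with three samples; real input would have one column per EV sample.
toy_counts = {
    "A": [1, 0, 2],  # detected in 2 of 3 samples, kept
    "B": [0, 0, 5],  # detected in 1 of 3 samples, dropped
    "C": [1, 1, 1],  # detected in all samples, kept
}

print(stringent_set(toy_counts))
```

With 60 samples this threshold corresponds to detection in at least 40 of them.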
For differential expression and network analyses, raw spectral count data were exported from the MaxQuant software and imported into the R statistical framework. The spectral count-based method has been demonstrated to provide a reliable representation of protein abundance with a linear dynamic range over several orders of magnitude and sensitivity similar to that of ion peak intensity quantitation [59–61]. Spectral counts were necessary for our downstream pipeline, which used DESeq2 to identify differentially expressed proteins and to normalize data for WGCNA and co-inertia analyses. Raw peak intensity data are reported in Supplementary Table 1.
DESeq2 is a package originally designed for analyzing read counts from RNA-sequencing datasets and recently used to analyze other forms of biological count data, including LC-MS/MS spectral count data. Normalization of spectral counts with DESeq2 considerably reduced the difference in sample depth between samples (Supplementary Figure 1A and 1B). Principal component analysis (PCA) of the top 500 variant proteins revealed one sample (K562) with a substantially different protein expression profile (Supplementary Figure 1C), which was removed from further analysis. After normalization, a likelihood ratio test was performed across all samples in DESeq2 to identify proteins with significantly different patterns of expression between samples of different tissue origin. Proteins that returned an FDR-adjusted p-value (q value) of less than 0.05 were considered significant.
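The depth normalization can be illustrated with the median-of-ratios idea that underlies DESeq2 size factors. This is a simplified standard-library sketch, not the DESeq2 implementation; the function name and toy counts are hypothetical.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors for a count table.

    counts is a list of rows (proteins), each a list of per-sample counts.
    Rows containing a zero are skipped because their geometric mean is zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Toy table: the second sample was measured at exactly twice the depth.
toy = [[10, 20], [100, 200], [5, 10], [0, 7]]
factors = size_factors(toy)
normalized = [[c / f for c, f in zip(row, factors)] for row in toy]

print(factors)
```

Dividing each column by its size factor puts the two samples on a common scale, which is the effect described for Supplementary Figure 1A and 1B.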
Weighted gene co-expression network analysis (WGCNA) was originally implemented for the analysis of transcriptomic microarray data, but has recently been used to analyze RNA-seq and LC-MS/MS proteomics datasets [63, 64]. To identify proteins that are associated with the regulation of vesicle secretion, a network analysis was performed with the WGCNA package in the R statistical framework. Tutorials for WGCNA can be found at http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/. WGCNA is a multi-step analysis which involves: 1) generating a coexpression matrix for gene (or protein) expression, 2) transforming protein correlations into an adjacency matrix for network construction, 3) grouping together proteins (modules) that show high co-correlation, and 4) correlating an eigenprotein, the first principal component of each module, with a biological trait of interest. Importantly, WGCNA avoids the extensive FDR correction needed to control for false positives in protein-wise differential expression analyses, as there are many fewer modules in the entire network than proteins.
Normalized [NCI-60]stringent count data were transformed in DESeq2 with a variance stabilizing transformation using the “VST” function, and the resulting matrix was used as input for WGCNA. We used the one-step network construction and module detection function “blockwiseModules.” Modules of co-regulated proteins across the vesicle protein abundance profiles of the NCI-60 cell lines were identified by grouping proteins (nodes) into clusters of highly co-correlated members based on their degree of co-correlation (edges). For measuring the coexpression similarity between proteins, we used a signed Pearson correlation. For adjacency matrix construction, soft thresholding was used. The scale-free fitting index approached a local maximum at β = 12 (Supplementary Figure S3). This threshold approximately achieves scale-free topology, meaning that the node degree distribution approximately follows a power law. Therefore, the co-correlation matrix was raised to the power of 12 to create the adjacency matrix. Finally, the adjacency matrix was converted into a topological overlap matrix, and the corresponding dissimilarities (1 – the topological overlap matrix) were used for module detection via hierarchical clustering. The minimum number of proteins required per module was set to 30, and the mergeCutHeight was set to 0.25, such that sufficiently similar modules were merged, revealing 15 modules. Module eigenproteins, a synthetic measure of module belonging, were then related to the previously collected quantity of vesicles secreted per cell by calculating Pearson correlation coefficients. The module most significantly associated with particle secretion was retained for enrichment analysis.
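The adjacency and topological overlap steps can be sketched in plain Python. This example assumes the common "signed" network convention, a_ij = ((1 + r_ij) / 2)^β, and the standard topological overlap formula; the text does not spell out either formula, so both are assumptions, and the function names and toy profiles are hypothetical. It is not the WGCNA implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def signed_adjacency(profiles, beta=12):
    """Soft-thresholded signed adjacency: ((1 + r) / 2) ** beta."""
    n = len(profiles)
    adj = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                r = pearson(profiles[i], profiles[j])
                adj[i][j] = ((1 + r) / 2) ** beta
    return adj

def tom_dissimilarity(adj):
    """1 - topological overlap, the distance used for clustering."""
    n = len(adj)
    # Connectivity of each node, excluding the self-connection.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    diss = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u not in (i, j))
            tom = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
            diss[i][j] = 1 - tom
    return diss

# Toy abundance profiles (rows are proteins, columns are cell lines).
profiles = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1]]
adj = signed_adjacency(profiles, beta=12)
diss = tom_dissimilarity(adj)
```

In a signed network, strongly anti-correlated proteins receive an adjacency near zero rather than near one, so the first and third toy proteins end up maximally dissimilar while the first two cluster together.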
Co-inertia analysis (CIA; [66, 67]) was used to examine similarities between the vesicle proteome and the cellular proteome and transcriptome across the NCI-60 cell lines. Microarray gene expression data were downloaded from the NCBI Gene Expression Omnibus (accession number GSE32474; [69, 70]). Whole-cell proteome data were obtained from the NCI60 proteome resource (http://proteomics.wzw.tum.de/nci60). Spectral count data from the whole-cell dataset were processed analogously to the EV proteome dataset for comparability: raw spectral count data published by Moghaddas Gholami et al. were normalized with DESeq2 following removal of all proteins not found with at least one count in 40 or more samples, and further transformed with the VST function. NCI-60 cell lines not included in all three datasets were excluded from this analysis, leaving 56 samples in total. Differences between samples and molecular assays were analyzed and visualized with co-inertia analysis using the omicade4 package in the R statistical framework.
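The two preprocessing rules described here, keeping only cell lines present in every dataset and dropping sparsely detected proteins, can be sketched as follows. The function names, cell line labels, and toy counts are hypothetical, and the co-inertia analysis itself is not reproduced.

```python
def shared_samples(*sample_lists):
    """Return the sorted sample names present in every dataset."""
    common = set(sample_lists[0])
    for names in sample_lists[1:]:
        common &= set(names)
    return sorted(common)

def filter_by_detection(counts, min_samples):
    """Keep proteins with at least one count in min_samples or more samples."""
    return {protein: values for protein, values in counts.items()
            if sum(1 for c in values if c >= 1) >= min_samples}

# Hypothetical sample names for the three molecular datasets.
ev_proteome = ["line_A", "line_B", "line_C", "line_D"]
cell_proteome = ["line_A", "line_B", "line_D"]
transcriptome = ["line_A", "line_C", "line_D"]

common = shared_samples(ev_proteome, cell_proteome, transcriptome)

# Toy counts; the text uses a threshold of 40 samples, scaled down to 2 here.
toy_counts = {"P1": [3, 0, 1], "P2": [0, 0, 4]}
kept = filter_by_detection(toy_counts, min_samples=2)

print(common, sorted(kept))
```

Restricting all three tables to the same ordered set of cell lines is what allows them to be compared sample by sample.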
To identify proteins previously found in EVs, the Vesiclepedia database of vesicular proteins was downloaded (Version 3, 9 Jan 2015) from microvesicles.org/download. The Database for Annotation, Visualization, and Integrated Discovery (DAVID) v6.7 [72, 73] was used for functional (GOTERM_BP_FAT) and pathway (KEGG_PATHWAY) analyses of proteins found in the [NCI-60]stringent dataset (Figure 2A-2B). Enrichment of cellular compartment terms (Figure 2C) was analyzed using FunRich v3.
For further interpretation of the target WGCNA module, enrichment of biological processes was performed using DAVID: GOTERM_BP_FAT. Proteins entered into the WGCNA analysis were used as the background dataset, and proteins within the target module were used as the target list. All terms with a p-value (Benjamini or Benjamini-Hochberg adjusted) less than 0.05 were considered significant and ranked by the number of proteins identified in the group.
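The logic of testing a target list against a background and then adjusting for multiple terms can be sketched with a plain hypergeometric test followed by Benjamini-Hochberg adjustment. This is an illustration under stated assumptions: DAVID uses its own modified Fisher exact statistic, which is not reproduced here, and the function names and example numbers are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of k or more annotated proteins in a target list.

    N is the background size, K the annotated proteins in the background,
    n the target list size, and k the annotated proteins in the target list.
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical term: 8 of 30 module proteins annotated, 40 of 1000 in background.
p = hypergeom_pvalue(k=8, n=30, K=40, N=1000)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])

print(p, adjusted)
```

Using the proteins entered into the network analysis as the background, rather than the whole proteome, keeps the test from rewarding terms that are simply common among detectable vesicle proteins.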
The authors would like to thank the FSU Translational Science Laboratory for help with the mass spectrometry data acquisition and downstream analyses. This study was supported by grants from the Florida Department of Health (4BB05) and the National Institutes of Health (RO1CA204621 and R15CA188941) awarded to D.G.M.
CONFLICTS OF INTEREST
The authors declare no conflicts of interest.
Author contributions
D.G.M., M.A.R., and S.N.H. designed the study, performed experiments, and interpreted results. X.L. prepared samples for mass spectrometry analyses performed by R.K.S. Computational, bioinformatics, and statistical analyses were performed by S.N.H. and J.L.B. The manuscript was written by S.N.H. and D.G.M. with contributions from all authors. D.G.M. conceived and supervised the project.