With the advent of high-throughput proteomics, it is now possible to systematically catalogue the components within a subcellular compartment. In this study, we describe an approach to characterize the composition of the presynaptic nerve terminal using subcellular proteomics and systems biology. First, we carried out proteomic studies of proteins enriched in the presynapse. For this, we separated presynaptic (PRE) and postsynaptic (PSD) fractions from rodent hippocampus and striatum by an anionic extraction method, as described in Materials and Methods. To verify the extent of the purification, the various fractions were subjected to Western blotting using antibodies to known presynaptic proteins (Syntaxin I and SNAP25) and to known PSD proteins (GluR1 and PSD95). In addition, the fractions were probed with antibodies to clathrin heavy chain, an endocytic protein that was previously shown to be enriched in presynaptic fractions [
23], and CAMKII, a major component of the PSD [
39] that also associates with presynaptic vesicles [
40]. The PRE fraction is enriched in presynaptic proteins and excludes proteins enriched in the PSD fraction.
We then identified proteins in the PRE fractions by LC-MS/MS following either in-gel or in-solution digestion (
Supplementary Figure 1). Proteins were identified based on highly stringent statistical analysis (in both the quality of the MS/MS peptide fragment ion spectra and the significance of amino acid sequence matches) using the program Sonar, which has recently been reported to be one of the most specific MS/MS database search algorithms [
41]. In the hippocampal PRE fraction, we identified a total of 138 proteins (
Supplementary Table 1). The profiling of the hippocampal PSD proteins has been previously reported [
42]. In the striatal PRE fraction, we identified 121 proteins (
Supplementary Table 2). The relatively low number of proteins identified in each of our PRE fractions suggests that these lists are far from comprehensive. The presynaptic proteome likely includes both abundant proteins (e.g. those found at high levels across different types of synapses) and rare proteins (e.g. those that are synapse- or brain region-specific). Although subcellular fractionation is the method of choice to reduce the complexity of samples for MS analysis, MS data remain strongly biased against low-abundance proteins in a sample. To address this, we used a graph theory-inspired computational approach to evaluate and enrich the knowledge about proteins identified in presynaptic fractions by us and by others.
In a first step to further analyze the PRE lists produced by proteomics, we manually extracted PPI data from the biochemical and physiological literature to generate an
in silico network that represents only presynaptic interactions, as described in Materials and Methods (
Supplementary Figure 2,
Supplementary Table 3). This network, made of 229 direct (binary) interactions between 127 presynaptic proteins, was generated without considering the results from the proteomics experiments, and is provided as a web-based resource at
http://amp.pharm.mssm.edu/presynaptome. Since other studies have reported lists of presynaptic proteins identified by proteomic approaches, we also extracted the data from two recently published proteomic studies of presynaptic fractions [
19,
23]. Compilation of these lists with the two lists we developed experimentally, and the list we created from the literature-based network, resulted in a “merged list” containing 393 entries (306 proteins from proteomics, and 87 entries exclusively from the literature) (
Supplementary Table 4). A similar strategy focusing solely on proteomic data has been applied to characterize the postsynaptic proteome [
25].
In order to readily merge and analyze data from various sources, we extended all protein and interaction data experimentally verified in other mammalian model organisms to orthologous proteins in human (
Supplementary Tables 1, 2, and 4). It is a common assumption that PPIs can be inferred through homology transfer from one model organism to another, since functionally linked proteins are likely to evolve together, and therefore should have homologs in evolutionarily related organisms. Although this is not always the case, particularly when comparing prokaryotes and simple eukaryotes with higher eukaryotes [
43], PPIs have been shown to be well conserved between protein pairs with at least 80% sequence identity [
44]. A recent study examining the evolutionary conservation of proteins, interactions, and complexes showed that mouse and rat show the greatest overall conservation of human proteins, followed by fly, worm, and yeast [
45]. The same study found that nearly 70% of human interactions are conserved in mice. Based on these data, we assumed that PPIs identified in mouse and rat are largely conserved in human.
Comparison of the lists of proteins derived from the proteomic studies revealed 13-22% overlap (
Supplementary Tables 3, 5). Although this overlap is significant when compared with the overlap expected for randomly generated lists of genes, we would expect the overlap among these lists to be higher. The low degree of overlap could be due to regional variation within the brain, different strategies of sample preparation and protein separation, and/or the run-to-run differences that are routinely observed in MS analysis. In the merged list, 45 proteins (15%) were detected experimentally three or more times, 56 proteins (18%) were detected twice, and the rest (67%) were detected once (
Supplementary Table 6). We designated proteins that were identified two or more times as the “core list” (containing 101 proteins). The intent of the core list is to represent proteins that are likely to be associated with most mammalian presynaptic terminals. By filtering out the proteins that were only identified once experimentally, we limit the number of protein contaminants, as well as proteins that may be specific to a single brain region, species, or even methods of sample preparation and/or protein identification. For example, with the subcellular fractionation technique used in this study, samples may contain contaminant postsynaptic proteins that remain adherent to the presynaptic fraction; however, the identification of such contaminants is less likely with repeated experiments. Thus, while the original proteomic lists and the merged list include valuable data that were used for further computational analyses, the core list represents a highly stringent subgroup of mammalian presynaptic proteins. The contribution of each list to the core list and to the merged list is illustrated in . For example, our hippocampal PRE list contributed 79 proteins to the core list, indicating that these proteins have been validated as being present in the presynapse by one or more of the other lists.
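The merging and filtering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function names, the list labels, and the placeholder protein identifiers are all hypothetical, and the overlap measure (shared entries as a percentage of the union of two lists) is an assumption, since the exact definition is not given in this section.

```python
from collections import Counter


def detection_counts(protein_lists):
    """Count in how many experimental lists each protein appears."""
    counts = Counter()
    for proteins in protein_lists.values():
        counts.update(set(proteins))  # set(): a list contributes at most once
    return counts


def core_list(protein_lists, min_detections=2):
    """Proteins detected in at least `min_detections` independent lists."""
    counts = detection_counts(protein_lists)
    return {p for p, n in counts.items() if n >= min_detections}


def percent_overlap(a, b):
    """Shared proteins as a percentage of the union (assumed definition)."""
    a, b = set(a), set(b)
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0


# Placeholder identifiers only; these are not data from the study.
lists = {
    "hippocampus_PRE": ["P1", "P2", "P3", "P4"],
    "striatum_PRE": ["P1", "P2", "P5"],
    "published_A": ["P1", "P3", "P6"],
    "published_B": ["P2", "P7"],
}

counts = detection_counts(lists)
core = core_list(lists)
overlap = percent_overlap(lists["hippocampus_PRE"], lists["striatum_PRE"])
```

Raising `min_detections` trades coverage for stringency: a threshold of two corresponds to the core list described above, and a threshold of three to the most frequently detected subset.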
Using Gene Ontology, we mapped the “biological process” to proteins from the hippocampal and striatal PRE fractions as well as to proteins in the core list (
Supplementary Table 7). The core list is enriched for proteins involved in presynaptic functions, such as neurotransmission. Proteins belonging to transport- or secretion-related biological processes (intracellular transport (18.6%), vesicle-mediated transport (15.3%), protein transport (14.4%), secretion (9.3%), and secretion pathway (9.3%)) are more highly represented in the core dataset. On the other hand, proteins belonging to several metabolic- or catabolic-related processes (cofactor metabolism, macromolecular catabolism, negative regulation of metabolism, carbohydrate metabolism, organic acid metabolism, and electron transport) are under-represented in the core list. Thus, by integrating lists from different sources, we were able to enrich for proteins with established presynaptic functions.
To further analyze the merged list of PRE proteins, we sought to identify literature-based PPIs among the proteins in the merged list. For this we consolidated and filtered several literature-based mammalian PPI networks from BioGrid [
31], HPRD [
32], PPID [
33], and a neuronal signaling network we developed for a prior study [
28] (see Materials and Methods for details). We “connected” proteins from the core list by linking pairs of proteins through shared neighbors, using interactions from the consolidated and filtered literature-based mammalian PPI network (“background network”). Between the 101 proteins in the core list (“Top 101”), we found 13 direct interactions, 222 interactions using 1st-level shared neighbors (path length of one extra node and two links), and 1,772 interactions using 2nd-level shared neighbors (path length of two extra nodes and four links). The same analysis was performed using 0-, 1-, 2-, or 3-level shared neighbors to connect the 45 proteins identified 3 or more times (“Top 45”) in the merged list (
Supplementary Table 8). A total of 226 intermediate proteins were found to “connect” core list proteins. Among them, 16 consisted of proteins that had been detected once in proteomic studies (
Supplementary Table 9). Since these proteins have been shown to interact with proteins from the core list, they are likely to be bona fide components of the presynaptic nerve terminal proteome, and were therefore upgraded to the core list. This resulted in a “final” core presynaptic list of 117 proteins (Table 1).
Table 1. Core presynaptic list containing 117 proteins.
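The step of connecting core proteins through 1st-level shared neighbors, and then promoting intermediates that were detected once by proteomics, might look like the following sketch. It is an illustration under stated assumptions: the function names are hypothetical, node labels are placeholders rather than real proteins, the background network is treated as undirected, and intermediates are required to lie outside the seed (core) set.

```python
from itertools import combinations


def build_adjacency(interactions):
    """Undirected adjacency map from (a, b) interaction pairs."""
    adj = {}
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def connect_seeds(adj, seeds):
    """Find direct seed-seed links and one-intermediate connections.

    Returns (direct, via): `direct` is a set of seed pairs that interact;
    `via` maps each non-seed intermediate to the seed pairs it bridges.
    """
    seeds = set(seeds)
    direct, via = set(), {}
    for a, b in combinations(sorted(seeds), 2):
        neighbors_a = adj.get(a, set())
        neighbors_b = adj.get(b, set())
        if b in neighbors_a:
            direct.add((a, b))
        for mid in (neighbors_a & neighbors_b) - seeds:
            via.setdefault(mid, set()).add((a, b))
    return direct, via


def promote(via, detected_once):
    """Intermediates that were also detected once experimentally."""
    return set(via) & set(detected_once)


# Toy background network with placeholder labels (not real interactions).
background = [("A", "B"), ("A", "X"), ("B", "X"),
              ("X", "C"), ("C", "D"), ("Y", "D")]
adj = build_adjacency(background)
direct, via = connect_seeds(adj, {"A", "B", "C", "D"})
promoted = promote(via, detected_once={"X", "Y"})
```

In this toy network, `Y` touches only one seed and therefore bridges no pair, so it is not promoted even though it was detected once.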
This “final” list represents a core portion of presynaptic proteins but is not comprehensive, since low abundance proteins or proteins associated with a specific brain region are likely to be missing from this list. In the next step, in order to predict novel presynaptic proteins not detected experimentally, we used a binomial proportions test to identify proteins from the background network that preferentially interact with proteins identified experimentally to be presynaptic (
Supplementary Table 10). Similar strategies using graph theory have been applied to enrich large-scale datasets in yeast by predicting PPIs [
46-
48]. The binomial proportions test was used to find proteins from the background network that specifically interact with presynaptic proteins, while pruning out proteins that interact with a large number of other non-presynaptic proteins, and as such could be interacting with some presynaptic proteins but not specifically. We found 92 proteins from the background network that show a significant preference (z-score > 3) to interact with proteins from the merged list, suggesting that these proteins could also exist at presynaptic nerve terminals.
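One way to picture the binomial proportions test is as a comparison of two proportions: the fraction of a candidate protein's links that go to presynaptic (seed) proteins, and the same fraction over the whole background network. The sketch below uses the standard pooled two-proportion z-statistic. This is an assumption made for illustration; the exact formulation used in the study is given in Materials and Methods, not in this section, and the function and parameter names are hypothetical.

```python
import math


def binomial_proportions_z(k, n, K, N):
    """Pooled two-proportion z-score comparing k/n with K/N.

    k: links from the candidate protein to seed (presynaptic) proteins
    n: total links of the candidate protein
    K: links to seed proteins in the whole background network
    N: total links in the background network
    """
    if n <= 0 or N <= 0:
        raise ValueError("n and N must be positive")
    p_candidate = k / n
    p_background = K / N
    pooled = (k + K) / (n + N)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n + 1 / N))
    if std_err == 0:
        return 0.0  # both proportions are 0 or 1; no evidence either way
    return (p_candidate - p_background) / std_err


# Invented counts: a specific interactor versus a promiscuous hub.
z_specific = binomial_proportions_z(k=8, n=10, K=100, N=1000)
z_hub = binomial_proportions_z(k=10, n=100, K=100, N=1000)
```

The hub has more links to seed proteins in absolute terms, but its proportion matches the background, so its score stays near zero. The specific interactor scores well above the z > 3 cutoff, which is the pruning behaviour described above.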
The proteins with z-scores > 3 were compared to those with z-scores < -1, by categorizing them according to Gene Ontology's “biological process”, “cellular component”, and “molecular function” (
Supplementary Figure 3). We find that the list of proteins with z-scores > 3 contains a higher proportion of membrane proteins (61%) and transport-related proteins (31%), while the list of proteins with z-scores < -1 contains a higher proportion of nuclear proteins (54%), transcriptional regulators (48%) and metabolism-related proteins (82%). This is consistent with the notion that the statistical test identifies proteins that have a higher chance of being presynaptic, by virtue of their subcellular localization, function, and ability to interact with previously identified presynaptic proteins. Indeed, of the 92 proteins with z-scores > 3, 42 had previously been identified as presynaptic proteins, as indicated by a database search in PubMed, SynDB [
35], or GO [
36] (
Supplementary Table 11). This leaves 50 proteins that preferentially interact with the merged list, which have not been previously identified as presynaptic in any of these databases.
In order to verify the predictions that these proteins are indeed present in the presynapse, we selected five top-ranked proteins that have available antibodies (z-scores in brackets): PCTAIRE-1 (4.8), GEF-H1 (ARHGEF2, 4.2), RIN1 (3.9), NUMB (3.6), and IQGAP1 (3.5). These proteins were also selected because they are known to be involved in signal transduction processes [
49-
53] and would be of potential interest at the presynapse. We examined the selected proteins in fractions obtained during the purification process. Four of the five could be clearly detected in the PRE fraction by Western blotting. NUMB could not be detected in any of the fractions, possibly due to the poor quality of the antibody. For RIN1, the protein in the PRE fraction appeared to be of a lower molecular weight than that in the homogenate and synaptosomal fractions; this could indicate selective post-translational processing or the presence of an alternatively spliced variant. To further confirm the presynaptic localization of the predicted proteins, two of them, RIN1 and PCTAIRE-1, were examined in cultured primary cortical neurons by immunofluorescence. We find that these proteins co-localize with the presynaptic markers SV2 or synaptophysin, confirming that these predicted proteins are present at the presynapse. PCTAIRE-1 is a kinase that has been shown to phosphorylate NSF in PC12 cells [
49]; it is therefore conceivable that PCTAIRE-1 could play a role in regulating vesicle trafficking at presynaptic nerve terminals. RIN1 is a Ras effector that has previously been shown to modulate postsynaptic plasticity in aversive memory formation in the amygdala [
50]; a presynaptic form of RIN1 could play a similar role in regulating signaling pathways at the presynaptic nerve terminal. Overall, these data are consistent with the idea that computational methods can enrich proteomic data by adding low-abundance proteins such as signaling molecules.
Finally, a “shared neighbor” analysis was applied to identify potential presynaptic complexes. Previous studies in yeast have shown that two proteins sharing a significantly large number of common interaction partners have close functional associations and are likely to exist in a complex [
54,
55]. We hypothesized that non-homologous presynaptic proteins that share many interacting partners, but have not been shown to interact directly, may be present in a complex. We computed the percent of shared direct interacting partners (shared neighbors) between proteins from the merged list of 306 proteins identified by proteomics. We found that 21 pairs of non-homologous proteins have at least four shared neighbors, but have not been previously described to directly interact with each other (
Supplementary Table 12). Using this information, we generated a hypothetical protein complex containing 17 proteins. This complex included synapsin I and dynamin, which have been shown to co-precipitate with Src in PC12 cells [
56], and MAP2, which has been shown to co-localize with synapsin I in the olfactory bulb glomerulus [
57]. Using co-immunoprecipitation experiments, we biochemically validated some of the predictions made by this analysis. In mouse hippocampal synaptosomes, synapsin I co-immunoprecipitates with three other proteins from the predicted complex: dynamin, CAMKII, and MAP2, supporting the presence of the predicted interacting proteins in this complex. These results validate the presence of some parts of the computationally predicted complex at presynaptic nerve terminals, and show that identifying proteins with shared interactors can be used to identify novel interacting complexes.
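The shared-neighbor screen for candidate complex members can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function name, the `homolog_pairs` argument, and the placeholder labels are hypothetical, and the percentage is computed as shared neighbors over the union of both proteins' neighbors, which is an assumed definition.

```python
from itertools import combinations


def shared_neighbor_pairs(interactions, proteins, min_shared=4,
                          homolog_pairs=frozenset()):
    """Non-interacting protein pairs that share many direct partners.

    Returns (a, b, n_shared, percent_shared) tuples sorted by n_shared.
    Pairs that interact directly, or that are listed in `homolog_pairs`
    (a set of frozensets), are skipped.
    """
    adj = {}
    for a, b in interactions:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    results = []
    for a, b in combinations(sorted(set(proteins)), 2):
        neighbors_a = adj.get(a, set())
        neighbors_b = adj.get(b, set())
        if b in neighbors_a or frozenset((a, b)) in homolog_pairs:
            continue
        shared = (neighbors_a & neighbors_b) - {a, b}
        if len(shared) >= min_shared:
            union = (neighbors_a | neighbors_b) - {a, b}
            percent = 100.0 * len(shared) / len(union)
            results.append((a, b, len(shared), percent))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Toy network with placeholder labels (not real interactions).
edges = [("P1", n) for n in ("n1", "n2", "n3", "n4")]
edges += [("P2", n) for n in ("n1", "n2", "n3", "n4", "n5")]
edges += [("P3", "n1"), ("P3", "P1")]
pairs = shared_neighbor_pairs(edges, ["P1", "P2", "P3"])
```

Here only `P1` and `P2` qualify: they share four partners and do not interact directly, whereas `P1` and `P3` are excluded because they already interact. Candidate pairs found this way are hypotheses that still require biochemical validation such as the co-immunoprecipitation described above.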