To allow comparison and analysis, the data sets from the three research groups were merged into a central repository hosted by the University of California, Los Angeles. The three groups operated independently, with variations in donors, sample collection procedures, separation strategies, and data acquisition and analysis. As a result, each group identified a different number of proteins (summarized in the overview table below). In total, the three research teams submitted 173 811 peptide and 68 789 protein identifications. These produced 11 592 distinct peptide sequences that matched 2153 distinct protein accession numbers (ACs) (1682 parotid, 1623 SM/SL) (Supplemental Tables 1 and 2 in Supporting Information). Overall, the integration process yielded 1166 nonredundant clusters, each containing at least one peptide that distinguished it from all other clusters; 642 clusters contained a single AC, 524 contained more than one AC, and the largest cluster contained 47 ACs (Supplemental Table 3 in Supporting Information).
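The clustering rules actually used are given in Materials and Methods. Purely as an illustration, one common way to obtain clusters in which no peptide is shared between clusters is to treat ACs as nodes, link any two ACs matched by the same peptide, and take connected components with a union-find structure. The sketch below does this; the peptide-to-AC data are hypothetical, and this is an assumed stand-in rather than the consortium's procedure.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_accessions(peptide_to_acs):
    """Group ACs that share at least one peptide; return sorted clusters."""
    uf = UnionFind()
    for acs in peptide_to_acs.values():
        acs = sorted(acs)
        uf.find(acs[0])  # registers ACs matched by a single peptide too
        for other in acs[1:]:
            uf.union(acs[0], other)
    groups = defaultdict(list)
    for ac in list(uf.parent):
        groups[uf.find(ac)].append(ac)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy data: peptide sequence -> ACs it matches.
peptides = {
    "PEPTIDEA": {"AC1", "AC2"},  # shared by two isoforms
    "PEPTIDEB": {"AC2", "AC3"},
    "PEPTIDEC": {"AC4"},         # unique to one AC
}
clusters = cluster_accessions(peptides)
single = sum(1 for c in clusters if len(c) == 1)
print(clusters)  # [['AC1', 'AC2', 'AC3'], ['AC4']]
print(single, len(clusters) - single, max(map(len, clusters)))  # 1 1 3
```

The last line mirrors the summary statistics reported above (single-AC clusters, multi-AC clusters, and the size of the largest cluster).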
Overview of the Data Sets Submitted by the Research Groups of the Human Salivary Proteome Consortium
Following data integration, a representative protein was chosen from each cluster that contained isoforms, fragments, or homologues. In general, the representative was the cluster member confirmed by the most research groups that also conformed to the rules described in Materials and Methods. Of the 1166 representative proteins, 914 were identified in parotid saliva and 917 in SM/SL saliva. Fifty-seven percent of the protein identifications were shared between the two glandular secretions (Figure 1).
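The 57% figure follows from the counts in this paragraph by inclusion–exclusion, given that all 1166 representatives were found in at least one of the two secretions: 914 + 917 − 1166 = 665 shared proteins, and 665/1166 ≈ 0.57. A minimal check (the function name is illustrative):

```python
def venn_two(n_a: int, n_b: int, n_union: int) -> dict[str, int]:
    """Split two overlapping sets into Venn regions by inclusion-exclusion."""
    shared = n_a + n_b - n_union  # |A ∩ B| = |A| + |B| - |A ∪ B|
    return {"shared": shared, "a_only": n_a - shared, "b_only": n_b - shared}


regions = venn_two(914, 917, 1166)  # parotid, SM/SL, all representatives
print(regions)  # {'shared': 665, 'a_only': 249, 'b_only': 252}
print(f"{regions['shared'] / 1166:.0%}")  # 57%
```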
Figure 1 Human parotid and SM/SL salivas share many of the same protein components. The protein composition of each ductal fluid is presented as a Venn diagram. Fifty-seven percent of proteins were found in both parotid and submandibular/sublingual (SM/SL) salivas, …
To maximize interrogation of the salivary proteome, the three research groups used different experimental approaches, which resulted in different numbers of identified proteins: 197 parotid and 205 SM/SL proteins at the University of California, San Francisco; 855 parotid and 848 SM/SL at The Scripps Research Institute/University of Rochester; and 401 parotid and 309 SM/SL at the University of California, Los Angeles/University of Southern California. In total, 152 parotid and 139 SM/SL proteins were identified by all three research groups (Figure 2). These proteins constituted a minimal set of salivary proteins, the set least susceptible to differences in the number of experiments performed and in experimental approach. We termed these proteins the “core minimally overlapping proteome.” An additional 235 parotid and 167 SM/SL proteins were identified by exactly two groups. The 527 parotid and 611 SM/SL proteins identified by only one group probably reflect variation in saliva donors and/or methods.
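The counts in this paragraph can be cross-checked against one another. Proteins found by one, two, or three groups should add up to the 914 parotid and 917 SM/SL representatives, and because a protein found by k groups is counted once in each of those k per-site totals, weighting each class by k should reproduce the sum of the per-site totals. The short check below uses only numbers from the text; the variable names are illustrative.

```python
# Counts from the text: number of proteins identified by exactly k groups.
BY_K = {"parotid": {1: 527, 2: 235, 3: 152}, "SM/SL": {1: 611, 2: 167, 3: 139}}
# Per-site totals: UCSF, Scripps/Rochester, UCLA/USC.
PER_SITE = {"parotid": [197, 855, 401], "SM/SL": [205, 848, 309]}


def check_counts(fluid: str) -> tuple[int, int, int]:
    """Return (distinct proteins, k-weighted count, sum of per-site totals)."""
    counts = BY_K[fluid]
    distinct = sum(counts.values())
    # A protein found by k groups appears in k of the per-site totals.
    weighted = sum(k * n for k, n in counts.items())
    return distinct, weighted, sum(PER_SITE[fluid])


for fluid in BY_K:
    print(fluid, check_counts(fluid))
# parotid (914, 1453, 1453)
# SM/SL (917, 1362, 1362)
```

Because the weighted sums match the per-site totals exactly, the 235 parotid and 167 SM/SL proteins are those found by exactly two groups.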
Figure 2 Venn diagrams illustrating the number of parotid (left) and SM/SL (right) proteins that were identified at each research site. Areas of overlap depict the number of proteins that were contributed by more than one group, with the core proteome: proteins …
We hypothesized that the most abundant proteins would be among those identified by all three groups. To test this hypothesis, we classified components of the salivary proteome according to (A) the research group that identified the protein, (B) the salivary gland that produced it, and (C) its sequence coverage (high, intermediate, or low). The classified proteins were arranged in a matrix, which was displayed using Array Viewer software (Figure 3). The array view was created using the Genesis software tool.46
Figure 3 Array view of the minimally overlapping human salivary proteome. Data were grouped according to the research team that identified each protein, the salivary gland where it was produced, and percent sequence coverage. The extent of coverage positively …
We also created a histogram that displays the sequence coverage of individual proteins according to the number of research groups that identified them (Figure 4A). Sequence coverage correlated strongly and positively with membership in the core proteome: proteins identified by a single research group had low sequence coverage (16% on average), whereas proteins identified by all three groups had an average sequence coverage of 62%. There were notable exceptions, however; a few proteins at the low end of sequence coverage were detected by all three groups. This subset included the highly glycosylated SM/SL protein MUC5b, illustrating how post-translational modifications and/or unusual amino acid sequences can limit peptide generation and/or MS detection. We also compared each protein’s predicted cellular location with the number of research groups that identified it (Figure 4). Overall, the salivary proteome was enriched in extracellular proteins compared with the IPI human reference data set. Well-known components of salivary secretions, such as α-amylase, cystatins, histatins, proline-rich proteins, and mucins, were prominently represented in the core proteome. Interestingly, proteins identified by a single group were more likely to be associated with the nucleus, cytoplasm, or plasma membrane than were proteins reported by all three groups. There were exceptions, however; for example, the core proteome included one protein specific to SM/SL saliva, Golgi phosphoprotein 2.
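The comparison behind these averages amounts to grouping proteins by the number of groups that identified them and averaging sequence coverage within each group. A minimal sketch, assuming a per-protein table of (protein, number of groups, percent coverage); the records below are invented for illustration and are not consortium data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (protein, n_groups_identifying, percent_sequence_coverage)
records = [
    ("protA", 3, 70.0),
    ("protB", 3, 50.0),
    ("protC", 2, 35.0),
    ("protD", 1, 10.0),
    ("protE", 1, 20.0),
]


def mean_coverage_by_groups(rows):
    """Average sequence coverage for proteins found by 1, 2, or 3 groups."""
    bins = defaultdict(list)
    for _name, n_groups, coverage in rows:
        bins[n_groups].append(coverage)
    return {k: mean(v) for k, v in sorted(bins.items())}


print(mean_coverage_by_groups(records))  # {1: 15.0, 2: 35.0, 3: 60.0}
```

The same grouping, applied to counts in coverage bins instead of means, would give the histogram itself.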
Figure 4 Sequence coverage and cellular origin of salivary proteins identified by single or multiple groups. (A) Histogram of the sequence coverage for proteins that were identified in parotid and SM/SL salivas. The shaded areas depict the number of proteins that …
As expected, the salivary proteins reported in this study covered a wide range of molecular masses and pIs (Figure 5). With regard to molecular mass, 46.6% of the human salivary proteins identified were ≤40 kDa, 44.6% were between 40 and 120 kDa, and 8.7% were ≥120 kDa. The human urine proteome has a similar molecular mass distribution.47 As in the urine study, the pIs of salivary proteins were broadly distributed, ranging from 3.38 to 12.56. Most salivary proteins had a pI between 4 and 8, and many were acidic species (pI 4–5) that play an important role in buffering this biological fluid. The maintenance of a neutral pH in the oral cavity is critical to oral health and taste sensation.
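The section does not state how pI values were obtained. Theoretical pIs are commonly computed from sequence by finding the pH at which the net charge predicted by the Henderson–Hasselbalch equation is zero, and because net charge falls monotonically with pH, bisection finds that point reliably. The sketch below assumes a set of illustrative pKa values close to textbook values; published tools use different pKa sets, so their results will differ somewhat, and post-translational modifications are ignored.

```python
# Illustrative pKa values (an assumption); real pI calculators differ.
PKA_POSITIVE = {"Nterm": 9.0, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of an unmodified peptide at a given pH (Henderson-Hasselbalch)."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """Bisect for the pH at which the net charge is zero."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


for peptide in ("DDEEDDAA", "KKRRKAA"):
    print(peptide, round(isoelectric_point(peptide), 2))
```

An acidic peptide such as DDEEDDAA lands at a low pI and a Lys/Arg-rich one at a high pI, bracketing the kind of range reported above.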
Figure 5 Molecular mass and pI distribution of the entire human salivary proteome. (A) The molecular mass distribution was skewed to relatively low-molecular-mass components. (B) In contrast, the salivary proteins had a relatively broad range of pIs.
To obtain a functional overview, we mapped the salivary proteome to gene ontology (GO) slim categories and searched it against protein pathway databases. In total, 568 nonredundant parotid and 578 SM/SL proteins mapped to cellular component terms, 647 parotid and 658 SM/SL proteins had molecular function annotations, and 582 parotid and 594 SM/SL proteins were associated with biological processes (Supplemental Table 4 in Supporting Information). Given their similar composition, we expected that parotid and SM/SL proteins would be similarly distributed across the GO slim categories. With regard to location, a high proportion of the parotid and SM/SL proteins mapped to the extracellular region; others localized to the plasma membrane, cytoplasm, organelles, or cytoskeleton or were part of protein complexes (Figure 6A). With regard to molecular function, salivary proteins had binding, catalytic, structural, and enzymatic activities (Figure 6B). With regard to biological process, parotid and SM/SL proteins were most frequently assigned to metabolic and regulatory processes (Figure 6C).
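Mapping proteins to GO slim categories amounts to walking each annotated term up the ontology's directed acyclic graph and keeping any ancestors that belong to the slim set; a protein can therefore land in several categories. The sketch below illustrates the idea on a tiny, hypothetical ontology fragment: the term names and parent links are simplified placeholders rather than real GO identifiers, only is_a-style links are followed, and the protein annotations are invented.

```python
from collections import deque

# Hypothetical, simplified ontology fragment: term -> parent terms.
PARENTS = {
    "amylase activity": ["hydrolase activity"],
    "hydrolase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "calcium ion binding": ["ion binding"],
    "ion binding": ["binding"],
    "binding": ["molecular_function"],
    "molecular_function": [],
}
SLIM = {"catalytic activity", "binding"}


def slim_terms(term: str) -> set[str]:
    """Return the slim categories reachable from `term` by following parents."""
    found, seen, queue = set(), {term}, deque([term])
    while queue:
        current = queue.popleft()
        if current in SLIM:
            found.add(current)
        for parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return found


def slim_counts(annotations: dict[str, list[str]]) -> dict[str, int]:
    """Count proteins per slim category; each protein counts once per category."""
    counts: dict[str, int] = {}
    for terms in annotations.values():
        categories = set().union(*(slim_terms(t) for t in terms))
        for category in categories:
            counts[category] = counts.get(category, 0) + 1
    return counts


# Hypothetical protein annotations.
proteins = {
    "P_amylase": ["amylase activity", "calcium ion binding"],
    "P_other": ["calcium ion binding"],
}
print(sorted(slim_counts(proteins).items()))
# [('binding', 2), ('catalytic activity', 1)]
```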
Figure 6 Relative allocation of the proteins identified in parotid and SM/SL saliva according to their gene ontology annotations. Components of both fluids were similarly distributed with regard to cellular locations (A), molecular functions (B), and biological processes (C).
About a quarter of the salivary proteins reported in this study were classified as hypothetical and therefore lacked annotations (Figure 6, “unknown”). To gain insight into their possible functions, we assessed their sequence similarity to proteins of known function using PSI-BLAST, which iteratively builds a position-specific scoring matrix to detect distantly related sequences that can serve as templates for modeling. Several of these hypothetical proteins were similar in sequence to immunoglobulins and extracellular matrix components.
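PSI-BLAST's key idea is to turn the hits from one search round into a position-specific scoring matrix (PSSM) that scores the next round, which is what lets it reach remote homologues. The toy sketch below builds a log-odds PSSM from a few pre-aligned, gap-free sequences with simple pseudocounts and uses it to score a query. It assumes a uniform background distribution and omits PSI-BLAST's sequence weighting, substitution-matrix-based pseudocounts, gapped alignment, and E-value statistics, so it illustrates the concept rather than the tool; the aligned hits are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1 / len(AMINO_ACIDS)  # assumption: uniform background frequencies


def build_pssm(aligned: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds score (bits) for every residue at every column of a gap-free alignment."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be pre-aligned to equal length")
    pssm = []
    for col in range(length):
        column = [s[col] for s in aligned]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((column.count(aa) + pseudocount) / total / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm: list[dict[str, float]], query: str) -> float:
    """Sum the column scores for an ungapped query of the same length."""
    return sum(column[aa] for column, aa in zip(pssm, query))


# Hypothetical aligned hits from a first search round.
hits = ["CWKVG", "CWRVG", "CYKIG"]
pssm = build_pssm(hits)
print(round(score(pssm, "CWKVG"), 2), round(score(pssm, "AAAAA"), 2))
```

A query resembling the aligned family scores well above zero, whereas an unrelated query scores below zero.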
The salivary proteins were also searched against protein pathway databases, yielding 434 entries in BioCarta and 887 entries in KEGG. As expected, many of these proteins were involved in metabolic processes, including those of amino acids (123 entries), carbohydrates (157 entries), energy utilization (80 entries), glycans (78 entries), lipids (31 entries), secondary metabolites (17 entries), biodegradation of xenobiotics (21 entries), and cofactors/vitamins (16 entries) (Supplemental Table 5 in Supporting Information). This analysis also showed that some salivary proteins function in the complement and coagulation cascades, a possible sign of plasma leakage. The presence of other proteins involved in cell adhesion and communication, cell cycle progression, and regulation of the actin cytoskeleton could be attributable to cellular debris arising from degradative processes such as apoptosis, or from cell breakage during prolonged secretion. Interestingly, some of the salivary proteins mapped to pathways involved in neurodegenerative conditions (Alzheimer’s, Huntington’s, and Parkinson’s diseases), cancers (breast, colorectal, and pancreatic), and type I/II diabetes.
We were also interested in comparing the human salivary proteome with those of human plasma and tears. To assess the possible plasma origin of some salivary proteins, we compared the salivary and plasma proteomes. Of 657 plasma proteins, 192 were found in human saliva, including the most abundant species; this is possible evidence of vascular leakage or of a fluid contribution from the interstitial compartment (Figure 7B). Because the parotid gland is composed entirely of serous acini and the SM/SL glands also contain these elements, we investigated the degree to which the salivary proteome resembles those of other serous glands. For comparison, we chose tear fluid, which is produced by the lacrimal gland, a serous structure (Figure 7A).43 In total, 259 proteins identified as components of parotid or SM/SL saliva were present in lacrimal gland secretions, representing approximately 55% of the available tear proteome. Confirming earlier results, protein species common to all the glands included cystatins B, C, SA, S, and SN, zinc-alpha-glycoprotein, and prolactin-inducible protein.48 In addition, our work showed that all the glands contained dermcidin, MUC5b, MUC7, and cathepsins B and D. There were also notable differences. For example, as previously reported, secretion of histatins and of acidic, basic, and glycosylated proline-rich proteins, with the exception of PRP4 and PRP1, was specific to the salivary glands.49 We also found that cystatins A and D, calnexin, protein disulfide isomerase (PDI), and cathepsins H and S were specific to salivary gland secretions. Finally, protein components unique to tear fluid included lactadherin, aspartyl aminopeptidase, and lupus La protein.
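Overlaps of this kind, and the fraction of one proteome found in another, reduce to set operations on protein identifiers. The sketch below uses toy protein lists whose membership follows the examples named in this paragraph; the real comparisons used the consortium's full lists and the published tear and plasma proteomes, and the function name is illustrative.

```python
def compare_proteomes(saliva: set[str], other: set[str]) -> dict:
    """Summarize the overlap between the salivary proteome and another proteome."""
    shared = saliva & other
    return {
        "shared": sorted(shared),
        "saliva_only": sorted(saliva - other),
        "other_only": sorted(other - saliva),
        # Fraction of the other proteome that was also detected in saliva.
        "fraction_of_other": len(shared) / len(other),
    }


# Toy protein lists; membership follows the examples given in the text.
saliva = {"cystatin C", "prolactin-inducible protein", "histatin 1", "cystatin A"}
tears = {"cystatin C", "prolactin-inducible protein", "lactadherin"}

summary = compare_proteomes(saliva, tears)
print(summary["shared"])       # ['cystatin C', 'prolactin-inducible protein']
print(summary["saliva_only"])  # ['cystatin A', 'histatin 1']
print(summary["other_only"])   # ['lactadherin']

# The plasma comparison reported above: 192 of 657 plasma proteins found in saliva.
print(f"{192 / 657:.0%}")  # 29%
```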
Figure 7 Human parotid and SM/SL gland secretions share many protein components with human tears and plasma. (A) Areas of overlap depict the number of proteins that were identified in both tears and parotid (top) or SM/SL saliva (bottom). (B) Similarly, parotid …
In additional experiments, we used an immunoblot approach to confirm the presence in saliva of a subset of the proteins identified by MS. Figure 8 shows the electrophoretic profile, visualized by Coomassie brilliant blue staining, of the nine individual ductal saliva samples that were analyzed in most of the experiments. In general, we selected for validation salivary components outside the core proteome that had low sequence coverage and for which well-characterized antibodies existed. Overall, these experiments showed that proteins at the lower-confidence end of the salivary proteome were present in at least a subset of the ductal or whole saliva samples analyzed. For example, a mAb that specifically recognizes a complex glycopeptide epitope carried by CEA (CD66) reacted with a band corresponding to the estimated molecular mass of CEA (~180 kDa),50 primarily in several SM/SL samples (Figure 8). Kallikrein 1 was detected as several immunoreactive species, most prominent in parotid saliva, within the predicted molecular mass range of this proteinase (29–38 kDa) (Figure 8). A goat polyclonal antibody against apoE reacted with multiple bands. In parotid saliva, the 34-kDa species corresponded to the predicted molecular mass of the apoE monomer (Figure 8, left). The electrophoretic mobility of the lower-molecular-mass band was consistent with published data on the major product of cathepsin D processing of the full-length protein.51,52 The higher-molecular-mass immunoreactive species may be heteromeric complexes of apoE and apoAII (Figure 8, apo[AII-E] and apo[AII-E-AII]), which have been detected in cerebrospinal fluid.53 In SM/SL saliva, immunoreactive bands with the molecular mass of apoE/apoAII complexes were also observed, along with higher-molecular-mass species that corresponded to the relative electrophoretic mobility of apoE/β
With regard to TIMP-1, immunoreactive bands above its predicted molecular mass (29 kDa)55 were detected only in SM/SL saliva (Figure 8). The ability of TIMP-1 to form complexes with matrix metalloproteinase (MMP) 2 (72 kDa)56,57 is a likely explanation, whereas association with an MMP that had lost its hemopexin domain58 or with activated MMP-7 (19–21 kDa)59 could explain the ~50-kDa immunoreactive band present in some samples. We were also interested in whether salivary antigens could be detected in whole saliva, the fluid most likely to be used in diagnostic tests. Accordingly, nitrocellulose replicas of electrophoretically separated whole saliva samples were probed with anti-HLA-G.45 Most of the samples contained a 50-kDa immunoreactive band, which corresponds to the glycosylated form of this molecule (Figure 8). Figure 8 also shows that both parotid and SM/SL salivas contain a protein of the expected molecular mass (45 kDa)60 that reacts with an antibody against human prostatic acid phosphatase. Finally, some samples contained the largest subunit of calpain-1 in its latent (~80 kDa) and active (75–78 kDa) forms;61 the 58-kDa band may be a proteolytic fragment of this autolytic enzyme.62
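The section does not describe how apparent molecular masses were read from the blots. A common approach, shown here only as an assumption, is to fit log10(molecular mass) of the marker proteins against their relative mobility (Rf), which is roughly linear over a gel's resolving range, and then interpolate unknown bands. The marker masses and Rf values below are hypothetical.

```python
import math


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical marker lane: (relative mobility Rf, molecular mass in kDa).
markers = [(0.10, 200.0), (0.30, 97.0), (0.50, 45.0), (0.70, 21.0), (0.90, 10.0)]
slope, intercept = fit_line([rf for rf, _ in markers], [math.log10(kda) for _, kda in markers])


def apparent_mass_kda(rf: float) -> float:
    """Interpolate a band's apparent mass from the log-linear standard curve."""
    return 10 ** (slope * rf + intercept)


print(round(apparent_mass_kda(0.45), 1))  # a band migrating between the 97- and 45-kDa markers
```

Apparent masses read this way reflect mobility rather than true mass, which is one reason glycosylated or complexed proteins such as those described above can run above their predicted sizes.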
Figure 8 Detection of novel salivary proteins and those with low sequence coverage. An immunoblot approach was used to confirm the presence in saliva of a portion of the proteins that were identified by using MS approaches. Most of the analyses shown employed …
Finally, we noticed that a high proportion of the salivary identifications were annotated as protein precursors. In many cases, however, these sequences included a signal peptide that is cotranslationally cleaved in the endoplasmic reticulum, so the precursor annotation largely reflected our choice of the longest sequence as the representative protein of each cluster.40 Another contributing factor could be the inherent characteristics of the IPI database, which was created using the same strategy we used to pick a representative from a group of homologous proteins with highly related sequences. Additionally, IPI annotations are derived from mRNA sequences, cDNA libraries, and automatic gene prediction programs.41
To determine whether precursor forms of proteins were actually secreted, we extracted the signal peptide locations of the salivary proteins from the UniProt database and looked for identified peptides that contained the corresponding sequences. Of the 250 salivary proteins with an N-terminal signal peptide, we found evidence for only 7 such peptides. Finally, a manual check showed that all the amylase entries in the IPI database were annotated as precursors. Therefore, we concluded that the salivary proteins annotated as precursors are most likely secreted as the mature forms of the proteins.
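The signal-peptide check described here can be sketched as follows, assuming that each protein's full precursor sequence, its UniProt signal-peptide end position (1-based), and the list of identified peptides are already in hand. Locating peptides by exact substring match is a simplification, and the sequence and peptides below are hypothetical.

```python
def peptides_in_signal(sequence: str, signal_end: int, peptides: list[str]) -> list[str]:
    """Return identified peptides that start within the signal peptide.

    signal_end is the 1-based position of the last signal-peptide residue,
    following UniProt-style numbering.
    """
    hits = []
    for pep in peptides:
        start = sequence.find(pep)
        while start != -1:
            if start + 1 <= signal_end:  # convert 0-based index to 1-based position
                hits.append(pep)
                break
            start = sequence.find(pep, start + 1)
    return hits


# Hypothetical precursor whose residues 1-15 form the signal peptide.
precursor = "MKLLVLLAAFVASQAEQPKTSWGRLLK"
identified = ["EQPKTSWGR", "AFVASQAEQPK"]
print(peptides_in_signal(precursor, 15, identified))  # ['AFVASQAEQPK']
```

Only peptides that begin inside the signal sequence count as evidence that an unprocessed precursor was secreted; peptides from the mature chain, such as EQPKTSWGR here, do not.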