Mammalian sperm are differentiated germ cells that transfer genetic material from the male to the female. Owing to this essential role in the reproductive process, an understanding of the complex mechanisms that underlie sperm function has implications ranging from the development of novel contraceptives to the treatment of male infertility. While the importance of phosphorylation in sperm differentiation, maturation and fertilization has been well established, the ability to directly determine the sites of phosphorylation within sperm proteins and to quantitate the extent of phosphorylation at these sites is a recent development that has relied almost exclusively on advances in the field of proteomics. This review will summarize the work that has been carried out to date on sperm phosphoproteomics and discuss how the resulting qualitative and quantitative information has been used to provide insight into the manner in which protein phosphorylation events modulate sperm function. The authors also present the proteomics process as it is most often utilized for the elucidation of protein expression, with a particular emphasis on the way in which the process has been modified for the analysis of protein phosphorylation in sperm.
As a result of their high level of specialization, sperm have abandoned much of the cellular machinery that is found in other cell types. During spermiogenesis, postmeiotic spermatids discard several organelles and retain only the minimal protein complement necessary to deliver the male genetic information to the ovum. At the same time, the DNA condenses tightly around sperm-specific proteins called protamines, making it unavailable for transcription. Upon leaving the testes, sperm may be morphologically mature, but they are immotile and have not yet gained the ability to fertilize. Sperm acquire progressive motility during their transit through the epididymis in a process known as epididymal maturation. After ejaculation, mammalian sperm move actively, but they need to reside in the female reproductive tract before acquiring fertilization potential through a series of events collectively referred to as capacitation.
Although capacitation and epididymal maturation are poorly understood at the molecular level, the belief is that sperm are both transcriptionally and translationally silent after leaving the testes [3,4]. This lack of protein synthesis in mature sperm supports the current view that regulation of post-testicular development is controlled almost exclusively by the addition of exogenous proteins (e.g., during epididymal transit) [5,6] or by the post-translational modification (PTM) of their intrinsic protein complement [7,8]. An understanding of sperm function is therefore dependent upon the ability to unequivocally identify proteins, quantitate their relative abundances and determine the exact sites of protein PTM. While traditional genomic approaches are of limited utility in this regard, the field of proteomics is uniquely suited to address these challenges .
Although the word ‘proteome’, defined as the entire set of proteins encoded by a genome, was not coined until 1996, attempts to identify the protein complement of sperm began in the early 1950s. In fact, the identification of specific sites of protein phosphorylation in sperm, an area of study currently referred to as ‘sperm phosphoproteomics’, began several decades ago. In 1967, Ingles and Dixon isolated protamines from trout spermatids undergoing differentiation and, in a remarkably advanced series of experiments for the time, determined seven distinct sites of serine phosphorylation. Notably, these sites appeared to be completely dephosphorylated in mature spermatozoa, leading the authors to hypothesize that phosphorylation was being used as a means to prevent nucleic acids from binding to the newly synthesized, highly basic protamines until they were properly localized to the nucleus. Not long after this ground-breaking research, phosphorylation was implicated in sperm movement when it was discovered that cAMP signaling is essential for the regulation of motility and that cAMP-dependent protein kinases, such as protein kinase A (PKA), are highly active in mammalian spermatozoa. The role of phosphorylation in sperm motility was confirmed in 1980 when it was shown that partially purified sperm protein fractions incorporated ³²P after cAMP treatment. While these early studies were essential in establishing a link between protein phosphorylation and sperm function, the identification of novel phosphoproteins and their specific sites of modification remained elusive. Unfortunately, the specificity of the protein enrichment methods and the sensitivity of the protein sequencing used at the time prevented all but the most abundant proteins, such as the protamines, from being successfully analyzed.
The difficulties associated with the isolation and identification of novel phosphoproteins were partially addressed with the development of antiphosphotyrosine (anti-pY) antibodies in the early 1980s. Leyton and Saling first used monoclonal antibodies against phosphotyrosine while attempting to elucidate the mechanism by which the acrosome reaction is initiated in mouse sperm. They used radioactively labeled ZP3, a known acrosome reaction-stimulating ligand, and an anti-pY antibody to probe sperm western blots. Although a single 95 kDa band was visualized using both methods, identification of the purified protein was not accomplished until 1994 when Kalab et al. utilized partial in-gel tryptic digestion, HPLC-based peptide separation and Edman sequencing of three specific peptides to unambiguously identify the 95 kDa protein as a testis-specific form of hexokinase. Not long after, anti-pY antibodies were used to investigate the role of phosphorylation in the capacitation of mouse sperm. Using gel electrophoresis, western blotting and anti-pY labeling, it was observed that a global increase in protein tyrosine phosphorylation is temporally associated with progression of the capacitation process in mouse and that these changes correlate with an increase in cAMP production and concomitant activation of PKA [20–23]. Additional phosphotyrosine-containing proteins were seen when a similar approach was used to study the phosphorylation of human sperm surface proteins. Naaby-Hansen et al. determined the approximate molecular weight (MW) and isoelectric point (pI) for 22 tyrosine phosphorylated human proteins, but were unable to unequivocally identify any of these proteins as the sequencing techniques used at the time required relatively large amounts of highly purified material.
While Naaby-Hansen et al. did not sequence the proteins recognized by anti-pY antibodies, they suggested a technique for phosphoprotein microsequencing utilizing mass spectrometry (MS) . Their approach, wherein a protein spot on a gel, known to be phosphorylated through alignment with a corresponding western blot, is cored, in-gel digested and identified using tandem MS (MS/MS), was first applied to sperm in 1999. Mandal et al. sequenced 18 peptides from a 95 kDa human sperm protein (FSP95) that is tyrosine phosphorylated during capacitation and which shows reactivity to antibodies present in the sera of infertile men . Not only did this information lead to the identification of the antigen as a novel A-kinase anchor protein (AKAP), but it also allowed primers to be generated, cDNA for FSP95 to be isolated, and protein expression to be carried out in E. coli. In 2001, the same group used MS/MS-based peptide sequencing on an ion trap mass spectrometer to identify eight sperm surface proteins cored from two-dimensional (2D) gels . In the same year, Bohring et al. used an alternative MS-based protein identification approach, peptide mass fingerprinting (PMF), to identify six autoantigenic sperm surface proteins . While these studies were not specifically focused on phosphorylation, this same approach is still being used today for the identification of phosphoproteins involved in the capacitation process [28–30]. However, these studies have been largely limited to the identification of pY-containing proteins as the lower inherent immunogenicity of phosphoserine (pS) and phosphothreonine (pT) has prevented the production of anti-pS/pT antibodies that demonstrate specificities equivalent to those seen with anti-pY antibodies .
In an effort to address the poor specificity associated with anti-pS/pT antibodies and to more closely track changes in phosphorylation within a given signaling pathway, antibodies recognizing specific phosphorylated epitopes have been developed. Not only can these antibodies be used to monitor and quantitate the level of phosphorylation at particular sites, but they can also be coupled with immunofluorescence to detect phosphorylation-induced changes in cellular localization. As an example, antiphospho-ERK1/2 antibodies were used to determine that the MAPK pathway is activated during sperm capacitation [32,33]. However, antibodies of this type are often too specific to be used for the identification of novel kinase substrates. Zhang et al. addressed this issue by creating a broad range, phospho-specific substrate antiserum by raising antibodies against a degenerate mix of phosphopeptides . Using this same approach, Harrison took advantage of antibodies developed against a phosphopeptide library containing the consensus sequence RXXp(S/T) in an effort to identify the PKA substrates involved in capacitation . However, as this consensus sequence is known to be phosphorylated by other kinases, such as Protein Kinase B (AKT), a control experiment using a known PKA inhibitor was required in order to exclude non-PKA substrates. This work identified two outer dense fiber proteins as components of the cAMP/PKA-dependent pathway in boar sperm and similar strategies have since been used to identify both PKA dependent [36–40] and proline-directed  phosphorylation in sperm from other species. As useful as these studies have been, experiments relying on antibody binding can only provide indirect evidence of protein phosphorylation at known sites of modification. In order to truly understand the role which phosphorylation plays in the regulation of sperm function, direct localization of the specific sites of phosphorylation in unknown proteins is essential.
In 2003, an attempt to identify the specific sites of phosphorylation on all of the proteins present in capacitated human sperm was carried out by Ficarro et al. . Recognizing the limitations of antibody-based phosphoprotein visualization, the authors used immobilized metal ion affinity chromatography (IMAC) for unbiased phosphopeptide enrichment. Combining this methodology with MS/MS-based sequencing resulted in the identification of five tyrosine, 56 serine and two threonine phosphorylation sites. Notably, this research was the first to identify specific sites of tyrosine phosphorylation in AKAPs and provided direct evidence connecting this protein family to the capacitation process. An alternative method for phosphopeptide enrichment, based upon the affinity of phosphate for titanium dioxide (TiO2), was first applied to sperm by Baker et al. in 2010 . In this study, 288 distinct sites of phosphorylation were identified from 120 proteins in capacitated rat sperm. Among the reported phosphoproteins, many are known to be involved in sperm–egg binding, suggesting a linkage between capacitation-induced phosphorylation and fertilization capability. Importantly, the results of these global phosphoproteomic studies have the potential to be used for the future generation of novel, site-specific antibodies designed to monitor phosphorylation changes associated with sperm function.
Recent developments in sperm phosphoproteomics obviate the need for antibody-based quantitation and permit the relative extent of phosphorylation at a specific site to be determined from MS data acquired during MS/MS-based phosphopeptide sequencing. In 2009, Platt et al. utilized a differential isotopic labeling scheme based on the Fischer esterification  to label peptides derived from both capacitated and noncapacitated mouse sperm proteins . Of the 55 phosphorylation sites reported in this study, relative quantitation was achieved for 53 of these sites by comparing the differentially labeled MS peak areas of 42 unique phosphopeptides. Consistent with previously published reports, a general increase in phosphorylation was seen following sperm capacitation. Of note, these experiments also unambiguously identified the site of tyrosine phosphorylation within hexokinase which was originally visualized by Kalab et al. in 1994 using anti-pY antibodies . Relative quantitation of phosphorylation levels as a consequence of the capacitation process in rat sperm was subsequently carried out by Baker et al. in 2010 . In this study a label-free approach, relying on the ratio of MS peak areas measured in multiple analyses, was used to identify 15 proteins whose phosphorylation status changed as a result of capacitation. The following year, Baker et al. used the same approach on rat sperm extracted from the caput and caudal regions of the epididymis to quantitate the extent of modification at 77 specific phosphorylation sites during epididymal maturation . Importantly, these studies also represent the first use of electron transfer dissociation (ETD) for the sequencing of phosphopeptides derived from sperm proteins. 
Although only three phosphopeptides were successfully sequenced using ETD in these two studies, this method overcomes difficulties associated with traditional phosphopeptide fragmentation  and promises to play an integral role in future studies designed to answer particular biological questions in sperm phosphoproteomics.
The fact that the proteomics workflow is largely a linear process is apparent in Figure 1. Every sperm phosphoproteomics experiment begins with the selection of a particular organism for analysis. Although this may seem trivial, the source of spermatozoa determines protein concentration and directly impacts downstream data analysis. Once an organism is selected for study, the overall goal of the project (i.e., quantitation, phosphorylation site determination or phosphoprotein identification) is the primary consideration which drives sample preparation. The appropriate type of mass spectrometer can be chosen only after the sample is prepared properly, as each instrument has its own capabilities and limitations. Ultimately, the approaches taken to analyze the resulting data are perhaps the most critical as a compromise must be made between speed and accuracy. Only when all steps in the process have been considered and integrated into an overall experimental workflow can valid biological conclusions be drawn from the acquired data.
The quality of the information obtained in any phosphoproteomics experiment is dependent upon the concentration, complexity and contaminating proteins present in the sample selected for analysis. High concentrations of protein can be obtained from mammalian sperm because they are readily accessible and protein dense. They are also uniform in their starting state and a number of techniques are available which can induce them to undergo in vitro maturational changes as an entire population, thereby limiting their proteomic complexity. The major difficulty in the analysis of sperm is the presence of highly abundant species such as protamines, tubulins and outer dense fiber proteins that result in a wide range of phosphoprotein concentrations. This complicates the MS-based sequencing approaches used in current phosphoproteomic workflows because the 10³ dynamic range, achievable in a single MS analysis on even the most advanced mass spectrometer, prevents detection of any analyte whose concentration is 1000-times lower than that of the most abundant species.
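As a rough illustration of this limit, the following Python sketch flags which species in a sample would fall inside a 10³ window below the most abundant one. The protein names and abundance values are hypothetical, and the single cut-off is a simplifying assumption; real detection limits also depend on ionization efficiency and instrument settings that are not modelled here.

```python
def detectable(abundances, dynamic_range=1e3):
    """Return the analytes whose abundance lies within `dynamic_range`
    of the most abundant species in the same analysis."""
    top = max(abundances.values())
    floor = top / dynamic_range  # anything at or below this is assumed lost
    return {name for name, value in abundances.items() if value > floor}


# Hypothetical relative abundances in arbitrary units (not measured data).
sample = {
    "protamine": 1_000_000.0,
    "tubulin": 250_000.0,
    "moderate_phosphoprotein": 1_500.0,
    "rare_phosphoprotein": 400.0,
}

visible = detectable(sample)
print(sorted(visible))
```

Under these assumed values the rare phosphoprotein is masked by the protamine signal, which is why enrichment or fractionation before MS is needed.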
Some sample preparation protocols used in sperm phosphoproteomics involve heating the samples in the presence of solubilization reagents in an effort to extract the greatest number of proteins [28,42,44]. While the majority of these compounds, such as salts and detergents, demonstrate minimal interference with traditional gel-based protein separation methods, their presence can be detrimental to modern MS-based phosphoproteomic workflows. For example, heating sperm samples in the presence of urea may aid in membrane protein solubilization, but the cyanic acid found in urea solutions can carbamylate the primary amines found on lysine residues and protein N-termini. This chemically induced modification has a significant impact on all subsequent steps in the proteomics process: it can suppress ionization, it can inhibit digestion and the altered MW can confound automated database searching algorithms. It is important to note that modifications such as this do not necessarily preclude MS-based sequencing, but they do add variables that must be taken into account in order to avoid peptide sequence misassignment and protein misidentification.
Denaturing sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), which separates proteins on the basis of their length (which closely correlates with protein MW), was relied upon heavily in early phosphoproteomics experiments  because the Edman sequencing used at the time required concentrated protein samples relatively free of contamination . Even today, SDS-PAGE may still be used before MS if salts and detergents are used for certain experimental strategies [52,53] or if western blotting using a specific antibody is the basis for phosphoprotein ‘identification’ . The moderate resolution afforded by SDS-PAGE is also exploited in some proteomic workflows as a means to reduce sample complexity prior to MS . An orthogonal method of separation, in which zwitterionic molecules align themselves at their pIs in an immobilized pH gradient, is known as isoelectric focusing (IEF) . This technique was used successfully to separate peptides derived from mouse sperm proteins but, in this case, rapid processing of the IEF-embedded peptides was necessary to minimize diffusion and sample loss . When used for the resolution of intact proteins, the problem of diffusion is greatly reduced and IEF can be coupled with SDS-PAGE to separate a complex protein mixture into thousands of well-resolved spots . As mentioned earlier, this combination, referred to as 2D gel electrophoresis (2DE) has been adopted extensively for the identification of phosphorylated proteins in sperm and is still used today [24–28,30,58]. Despite this widespread acceptance, there are several caveats to keep in mind when incorporating 2DE into an overall phosphoproteomics workflow. To begin with, protein identification from gels requires visualization, and the sensitivity of the reagents used for staining is often incompatible with the sensitivity exhibited by current MS-based analytical approaches. 
For example, Coomassie Blue, the most commonly used protein stain, is at least an order of magnitude less sensitive than the lower limit of detection achieved using most modern mass spectrometers [25–27], meaning that it is not possible to visualize with Coomassie all of the proteins which could potentially be identified using MS. At the other extreme, fluorescent protein stains, such as Sypro® Ruby, exhibit greater sensitivity than most MS instrumentation, meaning that although a protein may be visualized, it may not be present in sufficient quantity for subsequent in-gel digestion and MS-based identification . In our experience, a staining approach relying on the reduction of ionic silver  demonstrates a limit of detection most compatible with downstream MS analysis . However, once visualization is achieved, identification of the exact site(s) of phosphorylation in a protein cored from a gel may still remain elusive due to the low stoichiometric level of phosphorylation and the relatively high concentration of enzymatic autolysis products present following in-gel digestion .
As the extensive handling associated with gel-based protein separation can lead to significant sample loss , solution-based digestion procedures are often a better choice. The enzyme most commonly used for this purpose is trypsin, largely because of its specificity and tendency to generate peptides which are particularly amenable to sequencing when using positive ionization MS . Trypsin cleaves the peptide bond C-terminal to the basic amino acids lysine and arginine, such that most peptides generated from trypsin digestion carry two charges: one at the C-terminal basic residue and another on the primary amine at the peptide N-terminus. Importantly, the spatial separation of these charges increases the amount of structural information obtained following peptide fragmentation  and the relative abundances of lysine and arginine mean that most tryptic peptides fall within a narrow mass-to-charge (m/z) range of 400–800. Of course, as useful as trypsin is in preliminary studies, it may not be ideal if the protein or proteins of interest do not contain lysine or arginine, or if cleavage at these residues generates peptides which are too long or too short for subsequent analysis. In those situations, chemical digestion  or an alternative enzyme, such as V8 protease, may be required .
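The cleavage rule and charge assignment described above can be sketched in Python as follows. This is a minimal illustration, not a validated digestion tool: the input sequence is hypothetical, missed cleavages and the reduced cleavage before proline are ignored, and the function names are our own. The residue masses are standard monoisotopic values.

```python
import re

# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (terminal H and OH)
PROTON = 1.00728   # mass of each charge-carrying proton


def tryptic_digest(sequence):
    """Split a protein sequence C-terminal to every K or R."""
    return re.findall(r"[^KR]*[KR]|[^KR]+$", sequence)


def peptide_mz(peptide, charge=2):
    """m/z of a peptide carrying `charge` protons."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


protein = "MASKTEYLRGGSPK"  # hypothetical sequence for illustration only
peptides = tryptic_digest(protein)
for pep in peptides:
    print(pep, round(peptide_mz(pep, charge=2), 4))
```

Each peptide from this sequence ends in lysine or arginine, so the doubly protonated form (one charge at the C-terminal basic residue, one at the N-terminal amine) is the natural default for the `charge` argument.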
Regardless of the procedure used for proteolysis, the presence of more abundant nonphosphorylated peptides in the digestion mixture requires the selective enrichment of phosphorylated species. Although attempts have been made to use anti-pY antibodies to enrich for phosphorylated sperm proteins prior to digestion, these approaches still suffer from the presence of significant numbers of nonphosphorylated peptides after digestion . As a result, several methods for the enrichment of all phosphate-containing species have been developed and applied at the peptide level [65–69]. One of the most widely used techniques for global phosphopeptide enrichment, IMAC, is based upon the affinity which phosphate exhibits towards immobilized metal ions, such as iron(III) [70–72]. A drawback to the application of this technology, which was recognized early on, is that the affinity material also binds carboxylate-rich peptides (e.g., those containing aspartic and/or glutamic acid residues), unnecessarily complicating the sample for analysis . To prevent the binding of nonphosphorylated peptides to the IMAC resin, a modification of this technique was developed wherein acidic residues are converted to their corresponding methyl esters using the Fischer esterification [28,74]. Our group has successfully used this reaction in sperm phosphoproteomic studies to minimize the binding of nonphosphorylated peptides to an alternative phosphate-binding material composed of TiO2 beads. However, other groups [42,45] have relied on the inclusion of a ‘displacer’, such as dihydroxybenzoic acid (DHB), into the phosphopeptide loading buffer to compete with carboxylic acids for binding sites on the TiO2 . While this competitive approach has proven to be successful, the bridging interaction that characterizes the binding of phosphate to TiO2 makes the elution of multiply phosphorylated species more difficult . 
This suggests that it may be best to view TiO2 not as an alternative to IMAC, but rather as a complementary method: TiO2 for the enrichment of singly phosphorylated peptides and IMAC for the enrichment of multiply phosphorylated species .
The lists of phosphoproteins and phosphopeptides obtained in studies attempting to link sperm protein PTM with function take on greater biological significance when associated with measurements permitting their relative quantitation. Difference in-gel electrophoresis (DIGE) has been used in conjunction with anti-pS antibodies for the relative quantitation of a protein believed to be phosphorylated to a greater extent in caput versus caudal spermatozoa . This approach, which relies upon protein labeling using fluorescent dyes with unique excitation wavelengths, permits the visualization and volume measurement of the same protein spot from two differentially labeled samples on a single 2DE gel, thereby eliminating the problems associated with gel to gel variability . However, this technique does not allow for the localization of specific phosphorylation sites and, as mentioned earlier, the sensitivity of fluorescent labeling is often incompatible with downstream protein identification. A more suitable approach for the relative quantitation of phosphorylation changes occurring on a particular residue is the differential chemical labeling of peptides following protein digestion [78–80]. To this end, our group has again taken advantage of the Fischer esterification, which is required for the enrichment of phosphorylated species using IMAC as explained earlier, and carried out the reaction on capacitated and noncapacitated sperm protein digests using methanol (d0) and deuterated methanol (d3) as reagents . In the simplest experiment, the two isotopically labeled peptide samples are combined and examined together in a single MS analysis for the relative quantitation of protein expression. 
Alternatively, as shown in Figure 2, the pooled sample can be enriched for phosphopeptides and a single MS experiment can be used to simultaneously identify the phosphoproteins, pinpoint the specific sites of phosphorylation and quantitate the relative extent of modification at these sites as a consequence of the biological changes occurring within sperm.
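The arithmetic behind this d0/d3 labeling strategy can be sketched in Python. The sketch assumes that every carboxyl group (the peptide C-terminus plus each aspartic and glutamic acid side chain) is fully esterified, and that the light (d0) label marks the noncapacitated sample and the heavy (d3) label the capacitated one; that assignment, the peak areas and the function names are hypothetical choices made for illustration.

```python
# Mass added per carboxyl group when -COOH becomes a methyl ester.
D0_SHIFT = 14.01565  # CH3 replaces H (net CH2), from CH3OH
D3_SHIFT = 17.03448  # CD3 replaces H, from CD3OH


def n_carboxyl(peptide):
    """C-terminal carboxyl plus one per Asp (D) and Glu (E) residue."""
    return 1 + peptide.count("D") + peptide.count("E")


def pair_spacing_mz(peptide, charge):
    """Expected m/z gap between the d0- and d3-labeled forms."""
    return n_carboxyl(peptide) * (D3_SHIFT - D0_SHIFT) / charge


def heavy_to_light_ratio(area_light, area_heavy):
    """Relative abundance of the heavy-labeled peptide versus the light one."""
    return area_heavy / area_light


spacing = pair_spacing_mz("TEYLR", charge=2)      # two carboxyl groups
ratio = heavy_to_light_ratio(2.0e5, 6.0e5)        # hypothetical peak areas
print(round(spacing, 4), ratio)
```

Because both forms of a peptide co-elute and ionize together in the pooled sample, the ratio of their peak areas reports the relative change at that site without a separate calibration, which is the design advantage of labeling after digestion.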
Although it is by no means the only factor to be considered in the proteomics process, MS represents the major enabling technology for the identification of sperm phosphoproteins and the localization of phosphorylation sites. Following sample preparation, consideration must be given to the residual protein complexity, concentration and potential contamination in order to select the most appropriate MS instrumentation for downstream analysis. As each type of mass spectrometer uses a specific method of ionization, mass analysis, fragmentation and detection, each has a unique set of capabilities making it particularly well suited for a given phosphoproteomic study.
For the rapid identification of sperm phosphoproteins resolved on 2DE gels, several studies have avoided amino acid sequencing of the peptides resulting from in-gel digestion, and have instead used the determination of several intact peptide m/z values as a means to identify the protein from which they are derived [27,58,81–83]. The success of this approach, known as PMF, requires that multiple peptide m/z values be matched to a single known protein sequence. For this reason, an instrument coupling time-of-flight (TOF) detection with matrix-assisted laser desorption/ionization (MALDI) is often used, as this combination exhibits a high resolving power stemming from the method of ion generation. MALDI uses a brief laser pulse applied to a dried matrix-sample droplet in a vacuum chamber to generate predominantly singly charged gas-phase ions in a discrete location and with a narrow distribution of kinetic energies . As a consequence, the initial spread of velocities for ions with the same m/z is minimized, ensuring that their arrival time at the TOF detector can be measured accurately . This permits the peptide m/z values to be determined with a high degree of precision, minimizing the number of potential amino acid sequences associated with each and significantly improving the likelihood of correctly identifying the corresponding protein of interest.
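The matching step at the heart of PMF can be sketched as a simple count of observed peptide masses that agree with an in silico digest within a mass tolerance. The Python below is a deliberately naive illustration: the protein names, mass lists and 50 ppm tolerance are hypothetical, and ranking by match count is an assumption standing in for the probability-based scoring that real search tools apply.

```python
def count_matches(observed, theoretical, tol_ppm=50.0):
    """Count observed masses lying within tol_ppm of any theoretical mass."""
    matched = 0
    for obs in observed:
        if any(abs(obs - theo) / theo * 1e6 <= tol_ppm for theo in theoretical):
            matched += 1
    return matched


def rank_proteins(observed, database, tol_ppm=50.0):
    """Rank database entries by number of matched peptide masses."""
    scores = {
        name: count_matches(observed, masses, tol_ppm)
        for name, masses in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical in silico tryptic peptide masses (Da) for two database entries.
database = {
    "protein_A": [842.5100, 1045.5642, 1479.7954, 2211.1040],
    "protein_B": [927.4934, 1045.5600, 1638.8600],
}
observed = [842.5105, 1045.5650, 2211.1000]  # hypothetical MALDI-TOF peaks

ranking = rank_proteins(observed, database)
print(ranking)
```

In this toy example one observed mass matches both entries, which shows why several independent peptide matches, and a tight tolerance made possible by precise m/z measurement, are needed before a single protein can be assigned.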
When irrefutable phosphoprotein identification is required however, a partial determination of the primary amino acid sequence of the protein must be obtained. MS/MS of peptides involves the gas-phase isolation of species with a given m/z, their fragmentation, and the detection of the resulting product ions. Carrying out this process on a MALDI-TOF/TOF instrument involves an initial round of TOF to isolate ions of a specific m/z in a trapping cell so that they may undergo a process known as collisionally induced dissociation (CID). The peptide ions are confined for a finite amount of time, during which they undergo multiple collisions with an inert gas held at relatively high pressure within the cell. The energy imparted to the ions through each impact gradually raises their internal vibrational energies in an ergodic process that eventually induces them to fragment . Following fragmentation, the product ions are accelerated out of the cell and into a second TOF region, providing a high mass accuracy determination of the fragment m/z values and permitting both peptide sequencing and phosphoprotein identification .
As useful as it is, a MALDI-dependent approach becomes impractical when the peptide mixture is of even moderate complexity (i.e., derived from more than two proteins) as MALDI ionizes all species in a sample simultaneously and may suffer from reproducibility issues associated with the nonhomogeneity of matrix crystallization and the localization of peptides to particular regions (‘hot spots’) within the matrix-sample droplet . As an alternative, electrospray ionization (ESI) involves the application of high potential (±1–5 kV) on a capillary emitter tip to produce multiply charged, gas-phase peptide ions from an aerosol of small, highly charged liquid droplets [88,89]. This method is particularly well suited for the generation of ions from a complex mixture because it directly transfers solvated species into the gas phase and, as it occurs at atmospheric pressure, it can easily be interfaced with upfront liquid chromatographic (LC) peptide separation [54,90,91].
Although ESI has been successfully used in conjunction with TOF detection [42,45,82], it has most often been coupled with ion trapping instrumentation. In such mass spectrometers, the kinetic energies of the species generated by ESI are reduced through low energy collisions with an inert bath gas present in the ion trap (i.e., ‘collisional cooling’) and the ions are then sequentially ejected to an electron multiplier in order to produce a mass spectrum. MS/MS is accomplished by first isolating ions of a given m/z value within the same trapping region, forcing these ions to undergo CID through more energetic collisions with the bath gas and then scanning out the fragments [29,40,56,90–92]. While this tandem-in-time approach to MS/MS allows successive rounds of dissociation to be carried out on the resulting fragments in order to obtain additional structural information (MSⁿ), mass spectrometers of this type have a low duty cycle due to the use of the ion trap for both MS and MS/MS data acquisition. In addition, the effective resolution of these instruments is limited by the rate at which ions can be scanned out of the trap [93,94]. In order to address these issues, ESI sources have been coupled with Fourier transform (FT)-based detectors that exhibit greatly improved duty cycle (i.e., the number of ions detected in a given time) and resolving power relative to their scanning, ion trapping counterparts [28,85]. The major drawback of these instruments is the extremely high vacuum (i.e., very low pressure) required for FT-based detection, which prevents the concomitant use of a CID bath gas in the trapping cell for controlled peptide fragmentation. 
Hybrid mass spectrometers work around this issue by incorporating both ion trapping and FT-based detectors into a single instrument: while the resolving power and associated mass accuracy of FT-based detection is being exploited for the determination of m/z, CID can be carried out simultaneously in the ion trap to generate a linked MS/MS spectrum for a given precursor .
Although advances in peptide ionization and MS instrumentation have resulted in the unequivocal identification of many sperm phosphoproteins, a molecular-level understanding of sperm function requires determination of the specific sites of phosphorylation within these proteins. The electronegativity and unique chemical characteristics of the phosphate moiety, which make it useful in biological signaling, lead to a number of issues that must be addressed in order to successfully identify specific sites of protein phosphorylation in sperm. For example, it has been reported that phosphorylated peptides ionize poorly when traditional MALDI matrices are used . To overcome this issue, different matrices  and acidic additives [97,98] have been utilized to increase ionization yields and improve phosphopeptide identification. The complications associated with MALDI-based sequencing of phosphopeptide mixtures have been circumvented by using reversed phase LC coupled with ESI MS/MS, but this approach is not without its own inherent difficulties, even when upstream phosphopeptide enrichment is used. To begin with, phosphorylated species exhibit decreased hydrophobicity when compared with their unmodified counterparts, so shallow reversed phase LC elution gradients and minimal column rinsing need to be used to ensure phosphopeptide retention and chromatographic resolution. In addition, the phosphodiester bond that links the phosphate moiety to serine and threonine residues is labile, making it the bond most readily broken when CID is used for phosphopeptide fragmentation [56,75–78]. This produces MS/MS spectra in which the majority of the detected ion current represents species that have undergone the neutral loss of phosphoric acid and in which there are few, if any, sequence-informative peaks [28,45].
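The neutral loss described above appears at a predictable position in the MS/MS spectrum, which the following Python sketch uses to flag spectra dominated by that loss. The peak list, the 0.5 m/z tolerance and the 50% ion-current cut-off are hypothetical values chosen for illustration; only the monoisotopic mass of phosphoric acid is a fixed quantity.

```python
H3PO4 = 97.97690  # monoisotopic mass of phosphoric acid in Da


def neutral_loss_mz(precursor_mz, charge, n_losses=1):
    """m/z of the precursor after losing n_losses molecules of H3PO4."""
    return precursor_mz - n_losses * H3PO4 / charge


def is_neutral_loss_dominated(peaks, precursor_mz, charge,
                              tol=0.5, fraction=0.5):
    """True if the neutral-loss peak carries at least `fraction` of the
    total ion current. `peaks` is a list of (m/z, intensity) pairs."""
    target = neutral_loss_mz(precursor_mz, charge)
    total = sum(intensity for _, intensity in peaks)
    loss = sum(intensity for mz, intensity in peaks if abs(mz - target) <= tol)
    return total > 0 and loss / total >= fraction


# Hypothetical CID spectrum of a doubly charged phosphopeptide at m/z 700.0.
cid_peaks = [(651.01, 900.0), (400.20, 50.0), (812.40, 50.0)]
flagged = is_neutral_loss_dominated(cid_peaks, precursor_mz=700.0, charge=2)
print(round(neutral_loss_mz(700.0, 2), 4), flagged)
```

A spectrum flagged in this way carries little sequence information, so it is a candidate for re-analysis by an alternative fragmentation method rather than for automated sequence assignment.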
The creation of a linear ion trap (LIT) capable of isolating significantly more precursor ions  partially addresses this problem by trapping more fragments and increasing, above the noise threshold, the signals resulting from peptide backbone cleavage [42,44,45]. As shown in Figure 3A, while there are a few detectable sequence-informative peaks in the MS/MS spectrum acquired on a LIT mass spectrometer, they are low level and, alone, they are insufficient to allow for phosphopeptide sequencing. In this instance, it may have been beneficial to conduct the analysis on a hybrid instrument in which the high mass accuracy afforded by FT-based detection could have been used in combination with the limited sequence information to successfully identify the phosphorylation sites [44,99]. Ultimately, however, the limitation in sperm phosphopeptide sequencing using this approach stems from the ergodic nature of the CID process itself. This means that the energy deposited gradually becomes randomized throughout the peptide and leads to fragmentation occurring primarily at the labile phosphodiester bond . While identification of the phosphorylated peptide can be improved by using alkaline phosphatase to remove the phosphate moiety prior to CID, this procedure alone cannot be used to unequivocally identify the site(s) of phosphorylation . As an alternative, a novel, nonergodic technique referred to as ETD has been developed wherein the transfer of a low energy (<10 eV) electron from a radical anionic gas leads to fragmentation of the phosphopeptide without inducing the neutral loss of phosphoric acid . As illustrated in Figure 3B, the greater extent of backbone cleavage shown in the ETD MS/MS spectrum permits unambiguous assignment of the amino acid sequence and localization of the sites of phosphorylation within the large, multiply phosphorylated peptide [42,45].
The ability to draw meaningful biological conclusions from any sperm phosphoproteomics experiment depends entirely upon an unbiased interpretation of the data. However, the complexity of the proteomics process means that it is possible to unintentionally neglect a critical consideration, which, in turn, severely compromises the quality of the peptide and protein assignments. In addition, the large number of spectra obtained in a given MS experiment is often used to justify complete reliance on software for sequence assignments, which can further compound the problem. For example, the database searching tool Mascot has been utilized extensively for the identification of sperm phosphoproteins in PMF experiments [58,83,103]. The software works by extracting the most abundant peptide m/z values from an experimental MS spectrum, comparing them with values generated from the in silico enzymatic digestion of proteins contained within a user-selected database and then assigning a protein or proteins to each spectrum on the basis of the matched peptide m/z values. Although the manner in which Mascot carries out these steps is proprietary, it uses the probability-based MOWSE algorithm to associate a significance threshold with each protein assignment such that a sequence with a score above this threshold has a low probability of being assigned by chance. Unfortunately, the significance of this score is easily altered when an incomplete set of PTMs is considered or when fewer proteins are present in the interrogated dataset, as is the case when a species-specific subdatabase is used to expedite the data analysis process.
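The general PMF idea described above (in silico digestion followed by m/z matching) can be sketched as follows. This is a minimal illustration and not Mascot's proprietary procedure or the MOWSE score: proteins are ranked by a plain count of matched peptide masses, trypsin is assumed as the enzyme, missed cleavages and PTMs are ignored, and all function names and the tolerance are hypothetical.

```python
import re

# Standard monoisotopic residue masses in Da (assumed values).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def tryptic_peptides(protein):
    """Cleave after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mh(peptide):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed_mz, protein, tol=0.2):
    """Number of observed m/z values explained by a tryptic peptide."""
    theoretical = [peptide_mh(p) for p in tryptic_peptides(protein)]
    return sum(
        any(abs(obs - theo) <= tol for theo in theoretical)
        for obs in observed_mz
    )


def rank_proteins(observed_mz, database, tol=0.2):
    """Return (protein name, match count) pairs, best match first."""
    scores = [(name, count_matches(observed_mz, seq, tol))
              for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Because the ranking depends only on which sequences are present in `database`, the sketch also makes the caveat above concrete: shrinking the database or omitting a modification changes which candidates can match at all, and therefore what any score derived from the matches means.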
If the goal of the project is not simply the identification of sperm phosphoproteins but rather the visualization of specific phosphorylation sites, determination of the primary amino acid sequence is required. The Mascot algorithm can be used for this purpose by matching the m/z values in an acquired CID MS/MS spectrum with the m/z values generated from in silico peptide backbone fragmentation . However, even though Mascot is the most widely used software for phosphopeptide sequencing [30,42,45], the proprietary nature of the software severely limits user understanding and control of the process . By contrast, the operation of the second most commonly used database searching algorithm in sperm phosphoproteomic studies [28,40,42,45,108], SEQUEST, is well documented [109,201]. SEQUEST consists of two rounds of scoring: the first round prioritizes potential peptide assignments by matching the m/z values for fragment peaks in the MS/MS spectrum with fragment m/z values determined through in silico peptide fragmentation. The second round involves the creation of idealized MS/MS spectra for the highest scoring peptide sequences from the first round and then their translation across the experimental spectrum using the Fast Fourier Transform (FFT). Unlike the probability-based scoring used in Mascot, this process results in the calculation of a cross-correlation (XCorr) score that is a measure of how well a given peptide assignment matches the distribution of peaks within the experimental spectrum.
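The second round of scoring can be illustrated with a simplified cross-correlation. The sketch below is not the SEQUEST implementation: it omits SEQUEST's spectrum preprocessing, computes the offsets by direct summation rather than with the FFT, and assumes unit-width bins and an offset window of 75 bins on either side, which follows common descriptions of the algorithm rather than anything stated in this review. All function names are hypothetical.

```python
def bin_spectrum(peaks, max_mz, width=1.0):
    """Convert (m/z, intensity) pairs into a fixed-length intensity vector."""
    n_bins = int(max_mz / width) + 1
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        index = int(mz / width)
        if 0 <= index < n_bins:
            vector[index] = max(vector[index], intensity)
    return vector


def dot_at_offset(x, y, tau):
    """Sum of x[i] * y[i + tau] over the region where both are defined."""
    return sum(x[i] * y[i + tau]
               for i in range(len(x)) if 0 <= i + tau < len(y))


def xcorr(experimental, theoretical, max_offset=75):
    """Zero-offset correlation minus the mean correlation at nonzero offsets."""
    zero = dot_at_offset(experimental, theoretical, 0)
    offsets = [t for t in range(-max_offset, max_offset + 1) if t != 0]
    background = sum(dot_at_offset(experimental, theoretical, t)
                     for t in offsets) / len(offsets)
    return zero - background
```

Subtracting the background term is what distinguishes this score from a simple count of matched peaks: a candidate whose idealized spectrum overlaps the experimental spectrum equally well at many offsets is penalized, because that overlap is likely to reflect peak density rather than a correct sequence.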
Regardless of the software tool, successful use of automated searching algorithms requires that the correct peptide be present in the database and that the sequence fit the constraints programmed by the user. As shown in Figure 4A, the best SEQUEST assignment for the given MS/MS spectrum is the singly phosphorylated peptide AQGMAQpSQGEALPN. Although this peptide assignment has a relatively high XCorr of 3.016, it actually represents a false-positive identification: a problem frequently encountered in phosphopeptide sequencing due to the large number of peaks corresponding to the neutral loss of phosphoric acid. As shown in Figure 4B, a greater number of fragment ions within the spectrum are actually explained by the doubly phosphorylated peptide LEMAApSKNpTDNN. The sequence was misassigned by SEQUEST because the correct sequence contains an unanticipated modification, deamidation of an asparagine residue to aspartic acid, and therefore does not exist in any protein database. It is important to note that even if this modification had been considered, the correct sequence would still not have been determined using SEQUEST as the aspartic acid, which was generated following deamidation, was subsequently methylated using the Fischer esterification.
The correct phosphopeptide sequence (Figure 4B) was only assigned to the MS/MS spectrum shown in Figure 4 through de novo peptide sequencing, an unbiased, but more technically challenging, alternative to automated database searching. In its simplest form, de novo sequencing involves the preliminary identification of sequence-informative peaks within the spectrum by finding complementary pairs of ions whose m/z values sum to the approximate mass of the intact peptide. Notably, this process can be carried out using the c- and z-ion pairs that are commonly encountered in ETD or with the b- and y-ion pairs that are found in CID spectra. Once identified, the spacing between all of the peaks is examined to find m/z differences that correspond to mass shifts characteristic of particular amino acids. In this manner, a string of MS/MS peaks can be found which explain the entire peptide sequence or which explain enough of the sequence (a sequence ‘tag’) to permit a subsequent database search, such as the Basic Local Alignment Search Tool (BLAST) at the National Center for Biotechnology Information (NCBI), to be used for peptide/protein identification.
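The two steps outlined above, finding complementary ion pairs and reading residue masses from peak spacings, can be sketched for singly charged b- and y-ions, for which the two m/z values of a pair sum to the neutral peptide mass plus two protons. The sketch below is an illustration only: it does not model c- and z-ions (whose pair sum differs by a constant offset), multiply charged fragments or neutral losses, the residue table is deliberately abbreviated, and the function names and tolerance are hypothetical.

```python
PROTON = 1.007276

# Abbreviated table of standard monoisotopic residue masses in Da (assumed).
# Leucine and isoleucine are isobaric and cannot be told apart by mass alone.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049,
    "pS": 166.99836,   # phosphoserine
    "pT": 181.01401,   # phosphothreonine
}


def complementary_pairs(peaks, neutral_mass, tol=0.02):
    """Pairs of singly charged peaks whose m/z values sum to M + 2 protons."""
    target = neutral_mass + 2 * PROTON
    ordered = sorted(peaks)
    pairs = []
    for i, low in enumerate(ordered):
        for high in ordered[i + 1:]:
            if abs(low + high - target) <= tol:
                pairs.append((low, high))
    return pairs


def residue_for_gap(delta, tol=0.02):
    """Residues whose mass matches the spacing between two peaks."""
    return [aa for aa, mass in RESIDUE.items() if abs(delta - mass) <= tol]


def read_ladder(ladder, tol=0.02):
    """Candidate residues for each step between consecutive peaks of one series."""
    ordered = sorted(ladder)
    return [residue_for_gap(b - a, tol) for a, b in zip(ordered, ordered[1:])]
```

An empty list returned for a gap is itself informative: it indicates either a missing fragment, so that the gap spans two residues, or a residue carrying a modification that is absent from the table, which is the situation described for the misassigned spectrum in Figure 4.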
Although a more detailed description of de novo sequencing is beyond the scope of this review, successful use of the process frequently involves the consideration of several disparate lines of evidence, especially when it is applied to phosphopeptides. For instance, our group has used de novo sequencing to assign phosphopeptide sequences to CID MS/MS spectra derived from isotopically labeled, IMAC-enriched, capacitated and noncapacitated mouse sperm digests . This is often extremely challenging due to the aforementioned neutral loss of phosphoric acid, but in this case the ability to compare a given MS/MS spectrum with its differentially labeled counterpart greatly improved our de novo sequencing capability. To begin with, calculation of the difference between the ‘heavy’ and ‘light’ precursor masses allowed the number of glutamic and/or aspartic acid residues in the phosphopeptide to be determined prior to sequencing. Likewise, inspection of the MS/MS spectrum of the deuterated version of the peptide for fragment peaks whose masses shifted as a consequence of the ‘heavy’ label simplified the discovery of complementary ion pairs, permitted the rapid determination of the ion series corresponding to the peptide C-terminus and made the localization of acidic residues within the sequence remarkably straightforward. As an example of the utility of this approach, 11 phosphopeptides whose amino acid sequences were not present in the NCBI non-redundant (nr) protein database were sequenced in this study. A subsequent search of these sequences against the mouse genome revealed an unknown gene in a region which had previously been considered noncoding. 
The identification of the protein encoded by this gene, which the authors have named testis-specific serine/proline-rich protein, represents an exception to the currently accepted paradigm: rather than relying on translated genetic information for protein sequencing, it is possible to identify a target gene directly through de novo peptide sequencing.
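The precursor mass comparison described above reduces to simple arithmetic. The sketch below assumes that the ‘light’ and ‘heavy’ labels are methyl esters formed with CH3OH and CD3OH, respectively, so that each esterified carboxyl group shifts the mass by three times the deuterium-hydrogen mass difference, and that the peptide C-terminus always carries one label. The choice of d3-methanol, the function name and the tolerance are assumptions made for illustration and are not stated in this section.

```python
# Mass difference between deuterium and hydrogen, in Da (standard values).
D_MINUS_H = 2.014102 - 1.007825

# Shift per esterified carboxyl group, ASSUMING CD3 versus CH3 methyl esters.
SHIFT_PER_ESTER = 3 * D_MINUS_H   # about 3.01883 Da


def acidic_residue_count(light_mz, heavy_mz, charge, tol=0.02):
    """Number of Asp/Glu residues implied by the heavy/light precursor spacing.

    One label is attributed to the peptide C-terminus; the remainder are
    attributed to aspartic and glutamic acid side chains.
    """
    delta = (heavy_mz - light_mz) * charge
    n_esters = round(delta / SHIFT_PER_ESTER)
    if n_esters < 1 or abs(delta - n_esters * SHIFT_PER_ESTER) > tol:
        raise ValueError("mass difference is not a whole number of labels")
    return n_esters - 1
```

For example, a doubly charged pair separated by about 4.53 m/z units corresponds to three labels and therefore to two acidic residues, which constrains the candidate sequences before any fragment peaks are examined.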
Metabolic labeling strategies (e.g., SILAC) commonly used for relative quantitation in other cell types are of limited utility in sperm because, as mentioned earlier, sperm are believed to be both transcriptionally and translationally silent after leaving the testes [3,4]. Nonetheless, alternative strategies have been used in sperm phosphoproteomic studies not only to quantitate phosphoprotein expression but also to quantify the extent of modification at a specific phosphorylation site [42,44,45]. One of these methods, label-free quantitation, uses the ratio of chromatographic peak areas obtained in separate LC-MS analyses to quantify differences between the samples being compared. Although this method avoids the additional sample handling associated with chemical labeling, quantitation using this approach is difficult due to the influence which the sample medium and chromatographic conditions have on ionization efficiency and, ultimately, on MS-based detection. These inherent variations require the analysis of a significant number of biological replicates. For example, Baker et al. carried out eight replicate MS analyses of both caput and caudal spermatozoa and used specialized software (DeCyder™ MS) to average out the variability among these 16 analyses in order to quantitate phosphorylation changes associated with epididymal maturation. In an alternative strategy, our group circumvented the issue of run-to-run variability by analyzing the heavy and light labeled versions of capacitated and noncapacitated sperm digests simultaneously in a single MS experiment. We also carried out the analysis with the labeling reversed in order to correct for any variation in chromatographic retention, ionization efficiency or MS detection caused by the presence of the isotopic label.
Within each individual analysis, the chromatographic peak area of a phosphopeptide derived from the capacitated sample was divided by the peak area of the identical, differentially labeled phosphopeptide derived from the noncapacitated sample. These two ratios were then used to calculate an odds ratio, which more accurately reflects the relative change in phosphorylation occurring on a particular phosphopeptide as a result of the capacitation process.
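The formula used for the odds ratio is not given here, so the sketch below should not be read as that calculation. It shows a generic alternative, the geometric mean of the forward and label-reversed ratios, to illustrate why running the experiment with the labels swapped can cancel a multiplicative bias introduced by the isotopic label. The function names are hypothetical.

```python
from math import sqrt


def peak_area_ratio(capacitated_area, noncapacitated_area):
    """Capacitated / noncapacitated peak area ratio within one LC-MS analysis."""
    if noncapacitated_area <= 0:
        raise ValueError("noncapacitated peak area must be positive")
    return capacitated_area / noncapacitated_area


def label_swap_ratio(forward_ratio, reverse_ratio):
    """Geometric mean of the ratios from the forward and label-swapped runs.

    If the heavy label inflates a peak area by a constant factor k, the
    forward ratio is R * k and the reversed ratio is R / k, so their
    geometric mean recovers R.
    """
    return sqrt(forward_ratio * reverse_ratio)
```

With a true twofold change and a hypothetical 10% label bias, the two runs would report ratios of 2.2 and about 1.82, and the geometric mean returns 2.0.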
The importance of phosphorylation in sperm has been well documented, but only recently have phosphoproteomic workflows been developed that can concomitantly identify phosphoproteins, localize specific sites of phosphorylation and quantitate the extent of modification at a particular site. In our experience, a workflow incorporating both IMAC and TiO2 phosphopeptide enrichment is necessary to address the low stoichiometric level of phosphorylation and to obtain a more comprehensive view of phosphorylation events in sperm. However, chemical modification should be used to minimize the binding of nonphosphorylated peptides to the enrichment resins. For this purpose, we use the Fischer esterification as it has a number of additional advantages: the reagents are relatively inexpensive, the reaction goes to completion under anhydrous conditions and every peptide in the mixture contains at least one label (at the peptide C-terminus). In addition, the Fischer esterification can be carried out using heavy and light methanol if the phosphoproteomic analysis involves a comparison of dissimilar samples. As mentioned earlier, this allows the enriched and differentially labeled peptides to be mixed and analyzed in a single LC-MS experiment. We recommend carrying out this experiment on a hybrid mass spectrometer, as the high resolving power can be used to more accurately measure chromatographic peak areas for purposes of quantitation, while the associated increase in mass accuracy can be used to improve phosphopeptide sequencing. Furthermore, some of these instruments are capable of utilizing ETD as an alternative phosphopeptide fragmentation method if the CID MS/MS spectra exhibit the dominant neutral loss of phosphoric acid. However, ETD is a relatively recent development and, despite its commercial availability, optimization of the process is an active area of research.
In addition, ETD involves a different fragmentation mechanism than CID and, as a result, current database searching algorithms have difficulty assigning amino acid sequences to ETD spectra. In our laboratory, we capitalize upon the differential isotopic labeling strategy to aid in the de novo sequencing of both CID and ETD phosphopeptide MS/MS spectra, but we recognize that this approach is technically challenging and impractical for extremely large datasets. As an alternative, we suggest that most laboratories use either SEQUEST or Andromeda , a well-documented probability-based scoring algorithm similar to Mascot, for automated phosphopeptide sequence assignment. We also recommend that validation of the sequence involve more than a ‘manual check’ to identify the presence of several fragment m/z values within the spectrum: this only serves to confirm the operation of the software and says nothing about the quality of the peptide assignment. In our opinion, no phosphopeptide sequence should be accepted if the associated MS/MS spectrum contains peak pairs that are unaccounted for, or if a significant portion of the ion current (~20%) remains unexplained. Under these circumstances, the probability that this assignment is a false-positive identification is simply too high and, with more and more laboratories relying on proteomic data, the potential for this information to be used as the basis for unwarranted research effort is simply too great.
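The ion current criterion suggested above can be expressed as a simple check. The sketch below treats the approximately 20% figure as a hard cutoff and uses a fixed m/z tolerance; both choices, and the function names, are assumptions made for illustration. It does not address the separate requirement that no peak pairs be left unaccounted for.

```python
def unexplained_fraction(peaks, assigned_mz, tol=0.5):
    """Fraction of the total ion current not matched to an assigned fragment.

    peaks is a list of (m/z, intensity) pairs; assigned_mz lists the fragment
    m/z values predicted for the proposed phosphopeptide sequence.
    """
    total = sum(intensity for _, intensity in peaks)
    if total <= 0:
        raise ValueError("spectrum has no ion current")
    explained = sum(
        intensity for mz, intensity in peaks
        if any(abs(mz - fragment) <= tol for fragment in assigned_mz)
    )
    return 1.0 - explained / total


def passes_ion_current_check(peaks, assigned_mz, max_unexplained=0.20, tol=0.5):
    """True if less than max_unexplained of the ion current is unexplained."""
    return unexplained_fraction(peaks, assigned_mz, tol) < max_unexplained
```

Note that peaks arising from the neutral loss of phosphoric acid should be included in `assigned_mz` when they are consistent with the proposed sequence; otherwise a correct assignment to a CID spectrum dominated by neutral loss would fail the check.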
Our current understanding of the role which phosphorylation plays in the regulation of sperm function has relied almost entirely on recent advances in the field of proteomics. As methodologies for the global enrichment of phosphorylated species and MS-based sequencing techniques are adopted by more and more laboratories involved in sperm research, it is likely that the number of identified protein phosphorylation sites will increase dramatically in the next 5 years. In addition, the ability to quantitate changes in the extent of modification at particular amino acids, coupled with advances in germ cell transplantation and transgenic rescue of knock-out models, will begin to reveal the role of these specific sites in sperm function. However, the utility of this information is absolutely dependent upon the quality of the experimental design and the resultant data. As we move forward, the greatest challenge in the field of sperm phosphoproteomics will be in ensuring that nonspecialists in proteomics fully understand the complexity of the overall proteomics process and that they adequately validate and correctly interpret their results.
This work was supported by Rensselaer Polytechnic Institute start-up funds (to MD Platt) and grants (HD38082 and HD44044) from the NIH (to PE Visconti).
Financial & competing interests disclosure
The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.