Mass spectrometry using matrix-assisted laser desorption/ionization (MALDI) is a widespread technique for various types of proteomic analysis. In the identification of proteins using peptide mass fingerprinting, samples are enzymatically digested into a number of peptides, whose masses are determined and matched against a sequence database. However, the presence inside the cell of several splicing variants, protein isoforms, or fusion proteins gives rise to a complex picture, demanding more complete analysis. Moreover, the study of species with as yet uncharacterized genomes and the investigation of post-translational modifications are not possible with classical mass fingerprinting, and require specific and accurate de novo sequencing.
In the last several years, much effort has been made to improve the performance of peptide sequencing with MALDI. Here we present applications using a fast and robust chemical modification of peptides for improved de novo sequencing. Post-source decay of derivatized peptides generates at the same time peaks with high intensity and simple spectra, leading to a very easy and clear sequence determination.
Mass spectrometry (MS) is a pivotal technique in proteomic studies. Traditional protein identification by MALDI relied on peptide mass fingerprinting (PMF), in which proteins are digested into a number of peptides, and matching the peptide masses against a sequence database allows identification of the protein.1 However, an increasing amount of data is shedding light on the complexity of protein systems. The central dogma of 1 gene = 1 protein has been completely overcome; it is now well established that proteins are subjected to a wide spectrum of chemical, physical, and structural rearrangements, from their synthesis until their degradation. mRNA editing, splicing variants, and proteolytic cleavages by various enzymes in different cellular compartments and under different physiological states all create a large number of proteins that do not fit with the theoretical models.2 In addition, particularly in some pathological conditions such as cancer, there are cases of fusion proteins, bearing domains from two different wild-type molecules.3–4 Furthermore, the function and localization of proteins playing a crucial role in metabolic activities are tightly regulated by several post-translational modifications (PTMs).5 Although the number of species with sequenced genomes is increasing, the vast majority of species are uncharacterized, making PMF predictions impossible. In order to identify a protein from an uncharacterized species, de novo sequencing will generate a unique peptide sequence with which an error-tolerant homology search can be performed. Also, primary sequence information is needed in order to initiate a cloning project. Furthermore, for the analysis of PTMs, de novo sequencing is crucial, since it determines both the presence and the position of the modification.
MALDI MS is considered a “soft” ionization technique,6 producing almost exclusively intact, monoprotonated ion species.7 This technique has been largely utilized for the study of high-molecular-weight molecules, such as proteins or peptide mixtures. However, different studies8 have shown that a significant degree of metastable ion decay occurs after ion acceleration and prior to detection. The ion fragments produced by this metastable ion decay of peptides and proteins typically include neutral molecular losses (such as water, ammonia, and portions of the amino acid side chains) as well as random cleavages along the peptide backbone.
In the past few years, many efforts have been undertaken to determine the primary structure of peptides using MALDI-MS, and several methodologies have been developed. An important approach utilizes the phenomenon of metastable ion decay to produce sequence information.9 In a linear time-of-flight (TOF) mass spectrometer, metastable fragments that form post-source (i.e., in the field-free region) are not separated, since precursor and metastable decay ions move with the same velocity and arrive simultaneously at the detector. Conversely, a reflector TOF mass spectrometer will separate precursor and metastable decay ions by their difference in kinetic energy. The majority of metastable ion decay observed in reflector TOF mass spectrometers occurs during ion acceleration out of the ion source and in the field-free region of the mass spectrometer, and is referred to as post-source decay (PSD). In the PSD technique, some of the MALDI-generated ions undergo metastable decay during flight by either unimolecular or bimolecular (collisional) pathways, producing smaller m/z ions and neutrals. The fragmentation pattern in PSD reflector TOF favors backbone cleavages producing predominantly a-, b-, and y-type fragment ions, with very few (if any) side chain–specific cleavages. However, this system often generates peaks of weak intensity, and different peptides show great variation in their ability to fragment. Therefore, the resulting spectra are quite complicated, due to the presence of diverse types of ions, making a determination of the final sequence very challenging. An important improvement for “cleaner” PSD fragmentation was introduced by Keough and colleagues.10–11 This includes a derivatization protocol directed towards the peptide N-terminal group, which acquires a strong negative charge through sulfonation in a fast, simple, and specific reaction.
The presence of a strong acidic group greatly enhances the fragmentation ability of tryptic peptides and produces almost exclusively b and y fragments. Moreover, the negative charge at the N-terminus counterbalances the positive charge of the captured proton in the b fragments, rendering them neutral and thus undetectable. In this way, the final spectrum is composed of a clear, well-characterized series of y ions, making the deconvolution of the amino acid sequence easy and efficient.12
The Ettan CAF-MALDI sequencing kit was purchased from GE Healthcare Amersham Biosciences AB, Uppsala, Sweden, and was used for peptide derivatization according to the manufacturer’s instructions, with a few modifications. Modified sequence-grade porcine trypsin was purchased from Promega (Madison, WI).
Peptides for sequence analysis were resolved from proteins by polyacrylamide gel electrophoresis, and Coomassie- or silver-stained bands were destained, as described.13 As the fragmentation procedure requires basic amino acids (lysine or arginine) at the C-terminus, proteins were tryptically digested overnight at 37°C. All peptide derivatizations were performed on solid-phase supports. A μZipTip C18 (Millipore, Bedford, MA) was wetted with a solution of 10 μL of 0.1% trifluoroacetic acid (TFA)/60% acetonitrile, then equilibrated with 0.1% TFA. The sample to be derivatized, dissolved in aqueous buffers without any organic solvent, was drawn up and down about 10 times, in order to favor the adsorption onto the reversed-phase material. The sample was then washed with 0.1% TFA and subjected to in situ derivatization.
Since the sulfonating reagent reacts with all primary amino groups, blocking of the ε-amino group of lysine residues is required. We achieved this in two different ways. Unless otherwise specified, we used a solution of O-methylisourea hydrogen sulfate (17.2 mg/mL in 0.25 M NaHCO3, pH 10), which was allowed to react in situ with the tryptic digest overnight at room temperature (RT). This Lys-blocking reaction (guanidination) adds 42 (42.012) Da per Lys residue; it is specific for the ε-amino groups and thus leaves the N-terminal amino group intact. Another possibility is the use of the 2-methoxy-4,5-dihydro-1H-imidazole compound (Lys Tag 4H, Agilent Technologies). This reagent adds 68.037 Da specifically to the ε-amino group of lysine residues. The reaction was performed according to the manufacturer’s protocol and the literature data.14
The freshly prepared CAF-labeling solution (1 mg/10 μL in 0.25 M NaHCO3, pH 9.4) was slowly drawn up and down, and allowed to react with the adsorbed sample for 3 min at RT. The reagent was washed out with 10 μL of 0.1% TFA. A preparation of 5% hydroxylamine in the same labeling solution was drawn up and down a few times in order to hydrolyze off unwanted CAF reagent attached to hydroxyl-containing amino acid residues, then washed as above. The sulfonated peptide(s) were eluted into a 0.2-mL Eppendorf tube by drawing 2–5 μL of 0.1% TFA/60% acetonitrile up and down the tip. The sulfonation adds 136 Da to primary amines in the peptide. A peptide with a C-terminal Arg will increase its mass by 136 Da, and a peptide with a C-terminal Lys will increase by 42 (or 68) + 136 = 178 (or 204) Da (additional internal Lys residues take up 42 or 68 Da each). If a Lys-containing peptide were sulfonated without the guanidination step, it would also be sulfonated on the ε-amino group.
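The mass bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration that uses only the nominal shifts quoted in the text (136, 42, and 68 Da); the function name and the dictionary keys are hypothetical labels chosen for this example, and the sketch assumes that every Lys residue is blocked and that only the N-terminus is sulfonated.

```python
# Nominal mass shifts taken from the text (Da).
SULFONATION_DA = 136          # CAF label on the N-terminal amino group
LYS_BLOCK_DA = {
    "guanidination": 42,      # O-methylisourea (Lys -> homoarginine)
    "lys_tag_4h": 68,         # imidazole-based Lys tag
}


def expected_shift(peptide: str, lys_block: str = "guanidination") -> int:
    """Nominal mass increase after Lys blocking and N-terminal sulfonation.

    Assumes complete blocking of every Lys and a single sulfonation.
    """
    n_lys = peptide.upper().count("K")
    return SULFONATION_DA + n_lys * LYS_BLOCK_DA[lys_block]


if __name__ == "__main__":
    print(expected_shift("GSPEFGFR"))                   # Arg-terminated peptide
    print(expected_shift("CSNLSTCVLGK"))                # Lys-terminated, guanidinated
    print(expected_shift("CSNLSTCVLGK", "lys_tag_4h"))  # Lys-terminated, imidazole tag
```

For an Arg-terminated peptide the function returns 136, and for a peptide with a single C-terminal Lys it returns 178 or 204, matching the values given above.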
All experiments were performed using a MALDI/TOF-TOF Ultraflex instrument (Bruker Daltonics, Bremen, Germany). The parameter settings were optimized for analyses of peptides up to a mass-to-charge ratio (m/z) of 3500. The N2 laser (337 nm wavelength) was used at a frequency of 50 Hz. Samples were prepared with the “dried-droplet” method: 0.3 μL of the sample was mixed on Parafilm with an equal volume of a saturated solution of α-cyano-4-hydroxycinnamic acid (HCCA, Bruker Daltonics) in 0.1% TFA/40% acetonitrile. For the study of very acidic proteins (e.g., sulfonated salmon calcitonin; see Figure 5, later in the article) we used 2,5-dihydroxybenzoic acid (DHB, Bruker Daltonics) as the matrix. A drop of 0.3 μL was deposited on the polished stainless steel MALDI target. Calibration for PMF samples (digests) was performed both externally, using a mixture of nine peptides ranging from m/z 757.40 to 3147.47, and internally, using autolytic tryptic fragments. All samples were analyzed in reflector mode before and after derivatization to obtain PMF spectra. By comparing these spectra and scanning for additions of 136 or 178 (204) Da, candidates for sequence analysis were identified. The instrument was then switched to PSD mode and the ion selector was set to the m/z values of the precursor ions, with a window of ±0.2–1.0% of the parent ion mass. More than 400 laser shots were collected for each spectrum. PSD spectra were interpreted manually, or using BioTools 2.0 from Bruker Daltonics.
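The comparison of spectra acquired before and after derivatization can be illustrated with a short Python sketch. It pairs each peak in the underivatized peak list with any peak in the derivatized list that lies one of the nominal shifts (136, 178, or 204 Da) higher. The function name and the 0.5-Da tolerance are assumptions made for this example and are not parameters reported here.

```python
def find_candidates(before, after, shifts=(136.0, 178.0, 204.0), tol=0.5):
    """Pair peaks that differ by one of the expected derivatization shifts.

    before, after: m/z values from the PMF spectra acquired before and
    after derivatization. Returns (m/z before, m/z after, shift) tuples.
    The tolerance (Da) is an illustrative choice, not a reported setting.
    """
    candidates = []
    for mz in before:
        for shift in shifts:
            for derivatized in after:
                if abs(derivatized - (mz + shift)) <= tol:
                    candidates.append((mz, derivatized, shift))
    return candidates


if __name__ == "__main__":
    # Peak values quoted elsewhere in the text, used here as example input.
    print(find_candidates([896.404, 2647.324], [1032.404, 2825.3]))
```

Each returned pair is a candidate precursor for PSD; a shift of 136 Da suggests an Arg-terminated peptide, whereas 178 or 204 Da suggests a Lys-terminated one.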
In most cases, a spectrum containing exclusively y ions was obtained. The sulfonation label (mass of 136) is lost from the labeled precursor ion, and then the series of y ions is observed. By simple manual calculation of the differences between the adjacent y-ion fragments, or by using suitable software, the amino acid sequence can be easily interpreted.
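The manual readout of a y-ion ladder can be sketched as follows. This is a minimal illustration, not the algorithm used by BioTools: it assigns to each difference between adjacent y ions the closest standard monoisotopic residue mass. The residue table holds standard values that are not taken from this article, the 0.3-Da tolerance is an assumption, Leu and Ile are reported together because they are isobaric, and Lys is omitted because it is always modified in this workflow.

```python
# Standard monoisotopic residue masses (Da); Lys is omitted because it is
# always blocked here, and Leu/Ile share one entry because they are isobaric.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "E": 129.04259,
    "M": 131.04049, "H": 137.05891, "F": 147.06841, "R": 156.10111,
    "Y": 163.06333, "W": 186.07931,
}


def read_y_series(y_ions, tol=0.3):
    """Translate a y-ion ladder into residues, listed from N- to C-terminus.

    Each difference between adjacent y ions is matched to the closest
    residue mass; differences outside the tolerance are reported as "?".
    The residue contained in the smallest ion is not assigned here.
    """
    peaks = sorted(y_ions)
    residues = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        name, mass = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - delta))
        residues.append(name if abs(mass - delta) <= tol else "?")
    # y ions grow from the C-terminus, so the ladder reads C -> N; reverse it.
    return residues[::-1]


if __name__ == "__main__":
    # Theoretical y ions of GSPEFGFR, built from the table above.
    mz, ladder = 18.01056 + 1.00728, []   # water + proton
    for residue in reversed("GSPEFGFR"):
        mz += RESIDUE_MASS[residue]
        ladder.append(mz)
    print(read_y_series(ladder))
```

For the theoretical ladder of GSPEFGFR the readout gives G, S, P, E, F, G, F; the C-terminal residue follows from the mass of the y1 ion itself.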
Although the PSD mechanism has supplied the theoretical background for peptide sequencing, the spectra produced are often very difficult to interpret. It should be remembered that the fragmentation observed is both residue and peptide specific, and a correct sequence determination of a given peptide is challenging. In order to improve this technique, derivatization processes have been developed over the last several years. Among them, sulfonation at the N-terminus using the CAF reagent (3-sulfopropionic acid NHS ester) turned out to be an efficient and reproducible system for de novo sequencing. This approach consists of a first blocking step, in which O-methylisourea hydrogen sulfate is used to convert lysine residues into homoarginine (giving rise to a 42-Da mass addition) in order to protect the lysine side chains from the subsequent sulfonation labeling. This reaction is specific for the lysine ε-amino group, and therefore it does not react with the N-terminal amino group. In the second step, the CAF reagent reacts with the amino-terminal group of the peptide, generating a sulfonated N-terminus (136-Da increased mass; Figure 1B). The presence of a negative charge at the N-terminus has two main effects: first, peptides have a higher tendency to fragment due to the presence of the additional, highly mobile electron; second, in the b/y couple of fragments generated, the b series become neutral, as a consequence of the negative charge of the sulfo group and the positive charge of the proton captured from the matrix (Figure 1C). As the y ions are the only ones detected, the final spectrum consists of a unique series of y ions and is therefore particularly easy to interpret. It is worth noting that the CAF-dependent fragmentation requires the presence of a strong basic amino acid (i.e., lysine or arginine) at the C-terminus, which makes the procedure highly suitable for trypsin-digested proteins.
As pointed out in the introduction, there are several cases in which protein identification using PMF requires confirmation; here a de novo sequence determination becomes highly valuable. CAF-PSD generates spectra that are very clear and easy to interpret, with the only drawback of leucine/isoleucine misidentification. As lysine residues are always modified in one way or another, they are not mistaken for glutamine. Figure 2 shows an example of a tryptic, unknown peptide. As part of the ABRF Protein Research Group (PRG05) program, five synthetic peptides were mixed and sent to participants who wanted to test their ability to perform de novo sequencing.16
The upper panel shows the spectrum of one of those peptides (m/z 2328.152), after a normal PSD fragmentation. As several peaks do not fit in a single ion series with a relevant S/N ratio, the resulting spectrum is not suitable for an easy sequence readout. The lower panel shows the fragmentation profile of the same peptide after chemical modification. After the loss of the N-terminal sulfonating group (136 Da), a well-defined series of y ions is observed, giving rise to an unambiguous sequence of 20 amino acids. To highlight the relative complexity of this analysis, it is worth mentioning that out of 40 laboratories sending back the results, only two were able to supply the correct sequence.16
The majority of biological research is focused on human and on a few other well-characterized species. However, the majority of species are characterized only partly or not at all. In order to identify and describe proteins from these species, it is often necessary to proceed with a de novo sequence analysis. Figure 3 shows an example of the study of proteins from the bivalve mollusc Mytilus galloprovincialis. Panel A is the PMF of one of these proteins, while panel B shows the PMF pattern after lysine modification and sulfonation (adding 178 or 136 Da to lysine- or arginine-terminated peptides, respectively). As we wanted to confirm the identity of this protein, and as a PMF analysis is not useful in this regard, we selected a few derivatized peptides from panel B and subjected them to CAF-PSD. Panel C shows an unambiguous sequence of 25 amino acids, derived from the peak at m/z 2825.3. Note that this is not one of the major peaks in the spectrum; its intensity is quite low compared to surrounding peaks. Moreover, we supported our amino acid composition (sequence) by comparing the theoretical mass of this peptide (2647.216) with the measured one (2647.324, panel 3A). With this sequence, we looked for protein homology via a BLAST search, and in this way we were able to identify the protein as a cAMP-dependent protein kinase. This approach thus allowed an unknown protein to be identified by sequence homology.
A field of growing importance in proteomic research is the identification of type, site, and role played by PTMs. To this end, the complete sequencing of a modified peptide is the most direct and reliable approach for the evaluation of both the specific PTM(s) present and the amino acid side chain(s) involved. In Figure 4, an interesting case is presented. The protein Smad7, involved in TGF-β signal transduction, was known to be acetylated at one or more lysine residues. We carried out PMF of the trypsin-digested protein (panel A) and then compared the obtained mass list with the theoretical one, taking into account possible acetylation on any lysine residue, as well as oxidation on methionine. The peptide mass of 896.404 Da fits exactly with a fragment acetylated on lysine and oxidized on methionine. The fact that this value did not correspond to any other theoretical masses belonging to Smad7, or to possible contaminants such as trypsin or keratin fragments, strongly supported the assignment of this peak to the acetylated and oxidized fragment of Smad7. A common, routine mass-spectrometric analysis had given this result, supported by the concomitant fitting of two different mass analyses. In order to confirm the fragment assignment, we sulfonated the peptide mixture (panel B) and performed CAF-PSD sequencing of the peptide of interest (896.404 + 136 = 1032.404; panel C). Surprisingly, the sequence obtained (GSPEFGFR) showed that the peptide incorporated the three N-terminal amino acids from Smad7 fused to the GST tag. The mass of this fusion peptide turned out to be the same as that of a possible post-translationally modified internal fragment, and only a sequence analysis revealed the correct characterization of the protein.
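The mass coincidence described above can be checked with a short calculation. The sketch below computes the theoretical monoisotopic [M+H]+ of the sequenced peptide GSPEFGFR from standard residue masses. These values, the function name, and the constants are not taken from this article and serve only as an illustration.

```python
# Standard monoisotopic residue masses (Da) for the residues in GSPEFGFR.
RESIDUE_MASS = {
    "G": 57.02146, "S": 87.03203, "P": 97.05276,
    "E": 129.04259, "F": 147.06841, "R": 156.10111,
}
WATER = 18.01056    # H2O, added once per peptide
PROTON = 1.00728    # charge carrier of the singly protonated ion


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


if __name__ == "__main__":
    print(f"{mh_plus('GSPEFGFR'):.3f}")
```

The calculated value lies within about 0.03 Da of the 896.404 peak discussed above, so at this mass accuracy the fusion peptide and the putative acetylated, oxidized fragment cannot be told apart by mass alone.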
Calcitonin (CT) is a peptide hormone secreted by the parafollicular cells of the thyroid gland in mammals and by the ultimobranchial gland in birds and fishes. It is involved in calcium-phosphorus homeostasis, and it has also been postulated as a neuromodulator and/or neurotransmitter. Fish calcitonins have been found to have more potent biological effects than mammalian ones; in particular, salmon calcitonin is used therapeutically, mainly for the treatment of osteoporosis, and a chemical modification has been introduced in order to improve its medical efficiency.15 The modification consists of the addition of a sulfonic group (SO3−, 80 Da) to cysteines 1 and 7. We carried out sequence analysis of this protein, in order to check for the presence of sulfonic groups. Figure 5A shows the spectrum of the native protein; the 3592.9 peak corresponds to the sulfonated mass of the intact protein. A tryptic digest of calcitonin was subjected to lysine blocking using the imidazole compound (addition of 68 Da); we chose this more basic lysine tag instead of O-methylisourea in order to counterbalance the strong acidity of the two sulfonic groups. We focused our attention on the peptide 1–11 (C*SNLSTC*VLGK, m/z 1284.4575), the one supposed to contain the modifications on cysteines. Surprisingly, in the PMF of the lysine-modified mixture, all the peptides appear shifted by 68 Da, as expected, except for the peptide 1–11 (Figure 5B, peak 1284.48 Da, indicated by the arrow). CAF chemistry was performed, and the peptide of interest (1284.48 + 136 = 1420.48) was selected for sequencing. Figure 5C shows the PSD spectrum of this peptide. The expected mass of modified cysteine was 183 Da (103 + 80 from the sulfonic group). The sequence obtained matched perfectly with the theoretical one; however, the mass of both cysteines was 149 Da, corresponding to a loss of 34 Da (183 – 149). The native protein possesses the two sulfo groups, as suggested by the mass of the native protein.
During the lysine blocking reaction, 68 Da of the imidazole group were added to the C-terminal lysine, as confirmed by the mass of the y1 ion (147 + 68 = 215 Da); at the same time, 34 Da were lost from both sulfo groups on the two cysteines, most probably caused by atomic rearrangements. The final result was an unchanged peptide mass, despite documented molecular modification. The atomic rearrangement with the loss of 34 Da is possibly generated by the very basic environment present during the lysine-blocking reaction, since the incubation of the peptide under the same conditions without the imidazole compound still generates the molecular loss of 2 × 34 Da. Although we are not able to explain the mechanism of such a rearrangement, the CAF modification supplied a clear sequence of the peptide, confirmed the presence of modifications on cysteines, and provided interesting additional data concerning the chemistry of these cysteine-linked sulfo groups.
MS analysis can generate a huge amount of data in a short time, and the help of several algorithms is essential for their interpretation. However, uncommon situations may occur, and the experience of the operator is always pivotal. In a search for mutations in the low-molecular-weight isoforms of transthyretin, possibly involved in amyloidosis, we carried out PMF of the sulfonated protein (Figure 6A) followed by PSD analysis of selected fragment ions. For the ion 1975.9 we found the sequence Y[L/I]GEVFEEETT[L/I]; surprisingly, this sequence matched neither transthyretin nor any other protein in the database. A more accurate analysis showed that the peptide sequence matched transthyretin, but in a reversed orientation (Figure 6C, D). This very strange result was explained by the combination of two events: (i) the presence of an Arg (Lys)-Arg motif, which, after tryptic digestion, generates a peptide with an arginine at the N-terminus; and (ii) the presence of chymotryptic activity and, as a consequence, the production of a peptide with a phenylalanine at the C-terminus. The sulfonation chemistry usually labels the N-terminal amino group, generating only y ions. Since this particular peptide has a phenylalanine at the C-terminus and an arginine at the N-terminus, the CAF-PSD generated a series of b ions that was erroneously interpreted as y ions, resulting in a backward sequence. Although this was an unusual peptide, this example highlights the complexity of biological situations, and the need for careful data interpretation.
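A simple safeguard against this kind of misreading is to search a de novo read against the candidate protein in both orientations, treating Leu and Ile as interchangeable. The sketch below is one hypothetical way to do this; the function name is an assumption, and the protein string is a toy example constructed for illustration, not the transthyretin sequence.

```python
import re


def find_read(read, protein):
    """Locate a de novo read in a protein, in either orientation.

    read: list of residue tokens; "L", "I", and "L/I" all match Leu or Ile.
    Returns ("forward" | "reversed", start index) for the first hit, else None.
    """
    def to_regex(tokens):
        return "".join(
            "[LI]" if tok in ("L", "I", "L/I") else re.escape(tok)
            for tok in tokens
        )

    for label, tokens in (("forward", read), ("reversed", read[::-1])):
        match = re.search(to_regex(tokens), protein)
        if match:
            return label, match.start()
    return None


if __name__ == "__main__":
    # Read reported in the text, tokenized one residue per element.
    read = ["Y", "L/I", "G", "E", "V", "F", "E", "E", "E", "T", "T", "L/I"]
    # Hypothetical toy protein that contains the read in reversed order.
    toy_protein = "AAALTTEEEFVEGIYAAA"
    print(find_read(read, toy_protein))
```

A "reversed" hit indicates that the ladder may have been read from the wrong ion series, as happened with the b ions in the case above.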
Along with the expanding requirement of protein de novo sequencing, the improved PSD after N-terminal sulfonation supplies a robust and efficient methodology to achieve primary structure determination. As shown in PSD spectra of derivatized peptides, fragmentation occurs exclusively at the peptide bonds, generating b and y fragments; and as the b series is neutral, only y ions reach the detector, generating an easily interpretable spectrum. In general, the 20 common amino acids show a comparable fragmentation tendency. However, proline and glycine show a peculiar behavior: their C-terminal peptide bond is easily broken, generating more intense peaks of fragments ending with one of these amino acids, whereas the corresponding N-terminal fragmentation generates a peak of low intensity (see Figure 3). Aspartic acid has a similar, but mirrored, behavior: its N-terminal peptide bond is strongly subjected to fragmentation, generating a reverse pattern with a small peak followed by an intense one (Figure 3). However, no amino acid is completely resistant to fragmentation; thus, for each it is usually possible to identify the corresponding peak.
A point to be highlighted is that, in order to obtain this efficient fragmentation pattern, a strong basic group is required at the C-terminus of the peptide. This fits perfectly with the enzymatic activity of trypsin, which ideally generates peptides terminating with one of the two basic amino acids.
In the last few years, large proteomic projects have been carried out, in which thousands of proteins were identified or characterized regarding their PTMs or their specific isoforms. This allowed the establishment of excellent proteomic profiles of several species and, within the same species, of different tissues, pathological conditions, or subcellular structures. The generation of high-throughput results, although extremely useful for an overall comprehension of metabolic behavior, carries a risk of weak controls or superficial analysis, as very well pointed out in a recent review by Steen and Mann.17 A representative example of the requirement for confirming experiments is presented in Figure 4. In this experiment, a unique mass corresponding to the acetylated form of the peptide was obtained in the PMF analysis (it is important to highlight that the acetylation was exactly the modification the researchers were looking for). Moreover, this mass did not match any other theoretical tryptic mass, thus strongly supporting the presence of the expected acetylated peptide. Only after the sequencing of that particular peptide were we able to clarify that the suspected acetylated peptide mass was in fact that of a peptide originating from the fusion of the GST tag with the Smad sequence. This unusual coincidence is a strong reminder that mass spectrometry data must be interpreted rigorously and responsibly. This example clearly demonstrates that in some cases a careful sequencing of peptides is needed, and the sulfonation chemistry is a strong tool to achieve this task.
Another interesting and unusual example came from the analysis of low-molecular-weight isoforms of transthyretin. After PSD analysis of a peptide with a nonfitting mass, we found an amino acid sequence that did not match any protein present in the database. A careful sequence analysis showed that the sequence belongs to transthyretin, but in a reversed orientation (Figure 6). The explanation was that, due to the presence of the Arg (Lys)-Arg motif, a tryptic peptide with an N-terminal arginine was formed, and the CAF reagent labeled the N-terminal amino group. At the same time, a chymotryptic cleavage occurred at the C-terminus of the same peptide. This combination resulted, upon CAF-PSD, in a series of b ions that were erroneously read as the expected y-ion series, hence in a backward orientation.
Moreover, even if numerous techniques can now produce a large amount of data very quickly, unusual situations necessitate the interpretation of data by skilled operators. Therefore, amino acid sequencing analysis and a careful interpretation of data obtained remain important approaches for future proteomic work.
We thank the following colleagues for supplying samples used in this report: Dr. Antonio Villamarín, Universidade de Santiago, Spain (M. galloprovincialis sample); Dr. Joakim Bergström at the Department of Genetics and Pathology, Uppsala University (transthyretin material); and Ahmad Amini, Medical Products Agency, Uppsala, Sweden (calcitonin protein). This work was partly supported by fellowships to PC from AIRC and EMBO (ASTF 283-2004).