Samples of digested proteins often contain multiple overlapping peptides covering the same region of a protein sequence, such as prefix peptides (e.g. PEPTI/PEPTIDES), suffix peptides (e.g. TIDES/PEPTIDES) or partially-overlapping peptides (e.g. PEPTIDES/TIDESHIGH). In addition, most experimental protocols unintentionally generate multiple chemical modifications (e.g., oxidation) and it has been repeatedly shown that existing MS2 datasets typically contain modified versions of many peptides.4,45–47
If the peptide sequences were known in advance, determining their overlap would be a straightforward application of standard sequence alignment algorithms.48
In contrast, spectral alignment is defined as the alignment of matching peaks between spectra from overlapping peptides.49,50
This concept is illustrated in Fig. 1a with the matching b-ions highlighted in blue. The surprising outcome of spectral alignment, as opposed to sequence alignment, is that even though one does not know the peptide sequences in advance, the sequence information encoded in the masses of the b-ions suffices to detect pairs of MS2 spectra from overlapping peptides. In fact, spectral alignment is reliable enough to separate the high-scoring true spectral pairs from the many millions of possible spectral pairs in high-throughput proteomics experiments.17,50
Moreover, since each spectrum may align to several other spectra, the set of detected spectral pairs defines a spectral network where each node corresponds to a different spectrum and nodes are connected by an edge if the corresponding spectra were found to be significantly aligned. This concept has been illustrated with spectral networks from human cataractous lens17 and a monoclonal antibody raised against the B- and T-cell lymphocyte attenuator molecule.51
Note that since most spectra usually come from non-contiguous protein regions, this approach yields not a single spectral network but rather multiple spectral networks, one for each set of spectra from overlapping peptides.
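The idea that b-ion masses alone reveal overlapping peptides can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the published spectral alignment algorithm: spectra are idealized lists of singly charged b-ion m/z values with no noise or intensities, and the helper names (`b_ions`, `count_matches`, `best_shift`) and the 0.01 Da tolerance are hypothetical choices made for this example. The sequences are used only to simulate the spectra; the alignment step itself sees nothing but masses.

```python
# Monoisotopic residue masses (Da) for the amino acids in the example peptides.
RESIDUE_MASS = {
    "P": 97.05276, "E": 129.04259, "T": 101.04768, "I": 113.08406,
    "D": 115.02694, "S": 87.03203,
}
PROTON = 1.00728  # mass of the charge-carrying proton


def b_ions(peptide):
    """Idealized singly charged b-ion m/z values b1..b(n-1) of a peptide."""
    masses, total = [], 0.0
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        masses.append(total + PROTON)
    return masses


def count_matches(spec_a, spec_b, shift, tol=0.01):
    """Number of peaks in spec_a that land on a peak of spec_b after shifting."""
    return sum(any(abs(a + shift - b) <= tol for b in spec_b) for a in spec_a)


def best_shift(spec_a, spec_b, tol=0.01):
    """Try every pairwise peak difference as a shift; keep the best-matching one."""
    candidates = [b - a for a in spec_a for b in spec_b]
    shift = max(candidates, key=lambda s: count_matches(spec_a, spec_b, s, tol))
    return shift, count_matches(spec_a, spec_b, shift, tol)


# Prefix pair: the b-ions coincide without any shift.
prefix_shift, prefix_matches = best_shift(b_ions("PEPTI"), b_ions("PEPTIDES"))

# Suffix pair: the b-ions of TIDES match those of PEPTIDES once shifted
# by the mass of the missing prefix PEP.
suffix_shift, suffix_matches = best_shift(b_ions("TIDES"), b_ions("PEPTIDES"))
```

In this idealized setting every b-ion of the shorter peptide finds a partner, and the recovered shift for the suffix pair equals the mass of the residues PEP. Real spectra also contain y-ions, noise peaks and missing fragments, which is why a significance assessment of the match is needed in practice.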
Fig. 1 Discovery and identification of post-translational modifications through spectral networks; (a) Spectral alignment between modified and unmodified variants of the peptide TETMA (b-ions shown in blue, y-ions in red, blue/red lines track consecutively matched ...)
In traditional DNA sequence alignment, it often happens that query sequences differ from the reference sequences by the insertion or deletion of one or more nucleotides.48
While the insertion/deletion of amino acids is also usually allowed when aligning protein sequences, an additional factor needs to be considered when aligning peptides from experimental samples: the occurrence of post-translational modifications. In fact, multiple groups have shown16,46,52 that unexpected modifications are much more widespread than commonly acknowledged. From a sequence alignment perspective, a modification could be modeled by following the modified residue with a special character for each type of modification. Thus, the alignment of a modified peptide PEPT*IDE with its unmodified counterpart PEPTIDE would result in a single difference caused by the insertion of the modification ‘*’. In tandem mass spectrometry, however, a modification of mass m conceptually corresponds to the insertion of an additional m Da in the b-ion series between the ions immediately preceding and following the site of post-translational modification (i.e. the mass of the residue becomes larger by m). Conversely, if the modification causes a loss of m Da from the modified residue then the corresponding effect is the subtraction of m Da between the ions for the modified residue. When applied to unmodified and modified versions of the same peptide, the role of spectral alignment algorithms15,17,53 is to (a) use the spectrum of the unmodified peptide to determine where to position the modification mass in the spectrum of the modified peptide and (b) assess whether the post-alignment match between the two spectra is significant enough to accept the spectra as a pair of modified/unmodified spectra from the same peptide. Fig. 1a illustrates the spectral alignment between MS2 spectra from the peptides TETMA and phosphorylated TET+80MA.
When first analyzing a sample possibly containing modified peptides, one does not know a priori which residues or peptides will be modified. Thus, spectral alignment considers every possible spectral pair and every possible location for the mass difference (i.e. modification mass) between the aligned spectra. By requiring a significant match between the aligned spectrum peaks17 but placing no restrictions on which modifications to consider, this approach can be used to discover novel or unexpected modifications. In fact, when applied to a set of spectra from cataractous lens proteins from a 93-year-old patient, spectral networks were able to rediscover the modifications identified by database search methods and additionally discovered several novel modification events.17,46
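To make the positioning of the modification mass concrete, the following Python sketch tries every possible location for the mass difference between an unmodified and a modified spectrum, using the TETMA / TET+80MA example. It is a simplified illustration rather than the cited algorithms: spectra are idealized b-ion lists, the significance assessment is reduced to a plain count of matched peaks, the function names and the tolerance are hypothetical, and the monoisotopic phosphorylation mass of 79.96633 Da is supplied directly (in practice the mass difference would be taken from the two precursor masses).

```python
# Monoisotopic residue masses (Da) for the residues of TETMA.
RESIDUE_MASS = {"T": 101.04768, "E": 129.04259, "M": 131.04049, "A": 71.03711}
PROTON = 1.00728
PHOSPHO = 79.96633  # monoisotopic mass added by phosphorylation (HPO3)


def b_ions(peptide, mods=None):
    """Idealized singly charged b-ions; mods maps 0-based residue index to a mass offset."""
    mods = mods or {}
    masses, total = [], 0.0
    for index, residue in enumerate(peptide[:-1]):
        total += RESIDUE_MASS[residue] + mods.get(index, 0.0)
        masses.append(total + PROTON)
    return masses


def localize_modification(unmodified, modified, delta, tol=0.01):
    """Try each split point k: the first k b-ions stay put, the rest shift by delta.

    Returns (k, score), where k is the 0-based index of the residue that best
    explains the mass difference and score is the number of matched peaks.
    """
    best_k, best_score = 0, -1
    for k in range(len(unmodified) + 1):
        predicted = unmodified[:k] + [m + delta for m in unmodified[k:]]
        score = sum(any(abs(p - q) <= tol for q in modified) for p in predicted)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


unmodified = b_ions("TETMA")
modified = b_ions("TETMA", mods={2: PHOSPHO})  # phosphorylation on the second T
site, score = localize_modification(unmodified, modified, PHOSPHO)
```

Here the best split leaves b1 and b2 unshifted and moves b3 and b4 by the modification mass, which places the modification on the third residue (index 2) rather than on the first threonine. A negative `delta` models a modification that causes a mass loss. With b-ions alone the last residue cannot be distinguished from no shift at all; real implementations also use y-ions.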
The identification of peptides containing multiple modifications via database search is made challenging by the combinatorial explosion in the number of possible modification variants for all the peptides in a database.46,52
Not only can this make the approach much slower, but the increased number of peptide candidates for any given spectrum also significantly increases the risk of incorrect identifications. However, samples containing peptides with two or more modifications often also contain variants of the same peptide with only one or no modification. In these cases, we have found that spectral alignment is able to group these related spectra from multiple modification variants of the same peptide into small spectral networks, thus increasing confidence in their identity as related peptides. Fig. 1 illustrates the spectral network for a particular peptide in a sample of cataractous lens proteins.
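Once significant spectral pairs have been detected, grouping related spectra into separate spectral networks amounts to finding the connected components of the graph whose edges are those pairs. The sketch below shows one way this could be done with a depth-first traversal; the spectrum identifiers and the function name `spectral_networks` are hypothetical, and the input is assumed to be a list of pairs that already passed the significance threshold.

```python
from collections import defaultdict


def spectral_networks(pairs):
    """Group spectra into networks: connected components of the spectral-pair graph."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    visited, networks = set(), []
    for start in neighbors:
        if start in visited:
            continue
        # Depth-first traversal collecting every spectrum reachable from start.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        visited |= component
        networks.append(sorted(component))
    return networks


# Hypothetical pairs: an unmodified peptide linked to singly and doubly modified
# variants, plus an unrelated pair from a different protein region.
pairs = [
    ("unmodified", "one_mod"),
    ("one_mod", "two_mods"),
    ("other_a", "other_b"),
]
networks = spectral_networks(pairs)
```

Note that the doubly modified spectrum ends up in the same network as the unmodified one even though the two were never paired directly; the singly modified intermediate connects them.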
By grouping together spectra from multiple variants of the same peptide, spectral networks additionally contribute to the reliable identification of highly modified peptides. While database searching is restricted to matching ion masses between theoretical and observed spectra, spectral networks further capitalize on the occurrence of common fragment ions at corresponding masses with similar peak intensities. In general, it becomes easier to identify a highly modified peptide if one additionally observes highly similar spectra from its intermediate modification states. Thus, spectral alignment not only allows one to discover unexpected modifications (instead of only identifying expected modifications) but additionally provides an alternative route for identification of highly modified peptides.