As an application of the above work, we analyze the mass distributions of peptides and proteins. First, we recognize that the objects forming the alphabet are the amino acid residues. Proposition 2 (Mass recurrence relation) defines the relationship for this special case as a recurrence whose offsets 57, 71, 87, …, are the masses of the amino acid residues, measured in Daltons and rounded to the nearest integer. This allows us to compare the mass distribution of all possible (theoretical) peptides with the mass distributions of peptides that appear within a given organism (see the figure comparing theoretical and empirical mass distributions).
Comparison of Theoretical to Empirical mass distributions
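A mass recurrence of this kind can be evaluated directly by dynamic programming. The following is a minimal sketch, not the authors' implementation. It assumes the 20 standard amino acids with monoisotopic residue masses rounded to the nearest integer, counts residues that share an integer mass (L/I at 113, Q/K at 128) as distinct letters, and takes the empty sequence as the single sequence of mass 0. The names `RESIDUE_MASSES` and `count_sequences` are hypothetical.

```python
# Assumed table: integer-rounded monoisotopic residue masses in Da.
# L/I (113) and Q/K (128) share a mass but are kept as distinct letters.
RESIDUE_MASSES = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def count_sequences(max_mass, masses=None):
    """Return a list whose entry m is the number of residue sequences
    with total integer mass m, for every m from 0 to max_mass."""
    if masses is None:
        masses = list(RESIDUE_MASSES.values())
    counts = [0] * (max_mass + 1)
    counts[0] = 1  # assumed base case: the empty sequence has mass 0
    for m in range(1, max_mass + 1):
        # A sequence of mass m ends in some residue of mass a <= m,
        # preceded by any sequence of mass m - a.
        counts[m] = sum(counts[m - a] for a in masses if a <= m)
    return counts


if __name__ == "__main__":
    counts = count_sequences(500)
    for m in (57, 113, 114, 128, 500):
        print(m, counts[m])
```

Because order matters, GA and AG are counted separately, so mass 128 receives contributions from Q, K, GA and AG.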
Next, we determine the characteristic polynomial of this recurrence.
We can then approximate the roots of the characteristic polynomial using a numerical solver, such as the “roots” function in MATLAB [12]. In this case, as in every biological application we have examined, the characteristic polynomial has distinct roots. Therefore, we can apply Theorem 1 (General Solution to Sequence Counting Problem), which yields a purely exponential term summed with a collection of terms that each consist of a periodic function multiplied by an exponential function, with the purely exponential term dominating the others. If we write the exponential terms in Equation 2 as powers of 2, then we can read off the doubling time of each term directly. Furthermore, if we write the frequencies θj in terms of wavelength, then the period of each term becomes explicit. With these conventions, the first three terms are
Roots and eigenvalues of characteristic polynomial for peptides
Comparison of first two terms to actual distribution
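To make the doubling-time and wavelength conventions concrete, the following sketch writes them out in assumed notation. The symbols $C(m)$, $a_i$, $r_j$, $c_j$, $\phi_j$, $\tau_j$ and $\lambda_j$ are illustrative and are not taken from the source. It considers a recurrence over residue masses $a_1, \dots, a_k$ with largest mass $a_{\max}$ and distinct characteristic roots $r_j = |r_j| e^{i\theta_j}$, where $r_0$ is the positive real root and the sum over $j$ runs over the remaining roots, with each complex-conjugate pair combined into one real term.

```latex
\begin{align}
C(m) &= \sum_{i=1}^{k} C(m - a_i), \\
x^{a_{\max}} &= \sum_{i=1}^{k} x^{a_{\max} - a_i}
  \qquad \text{(characteristic equation)}, \\
C(m) &= c_0\, r_0^{\,m} + \sum_{j} c_j\, |r_j|^{m} \cos(\theta_j m + \phi_j) \\
     &= c_0\, 2^{m/\tau_0}
        + \sum_{j} c_j\, 2^{m/\tau_j}
          \cos\!\left(\frac{2\pi m}{\lambda_j} + \phi_j\right), \\
\tau_j &= \frac{\ln 2}{\ln |r_j|}, \qquad
\lambda_j = \frac{2\pi}{\theta_j}.
\end{align}
```

Here $\tau_j$ is the doubling time of a term and $\lambda_j$ is its period, both in the same mass units as $m$. A root with $|r_j| < 1$ gives a negative $\tau_j$, which corresponds to a decaying term.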
In total, there is one purely exponential term (which doubles in size every 24.67 Da) and 93 periodic terms to consider, each contributing a different periodicity and a different degree of dominance to the distribution of masses. However, we found that the first few terms are sufficient to approximate the distribution of peptides, at least when the amino acid masses are rounded to the nearest Dalton. In particular, the dominant periodic term has a period of 14.28 Da. Solutions for higher mass accuracy are possible by changing units. More examples and details are available in [5].
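The doubling time of the purely exponential term can be checked without a general polynomial solver, because the dominant root is the unique positive real solution of sum(r^(-a)) = 1 taken over the residue masses a. The sketch below finds it by bisection. It assumes integer-rounded monoisotopic residue masses for the 20 standard amino acids, with 113 (L/I) and 128 (Q/K) each listed twice; the source may use a slightly different table. The function names are hypothetical.

```python
import math

# Assumed integer residue masses in Da; 113 (L/I) and 128 (Q/K) appear twice.
RESIDUE_MASSES = (57, 71, 87, 97, 99, 101, 103, 113, 113, 114,
                  115, 128, 128, 129, 131, 137, 147, 156, 163, 186)


def dominant_root(masses, tol=1e-12):
    """Bisection for the real root r > 1 of sum(r**-a for a in masses) = 1.

    This is the characteristic equation divided by its leading power.
    The left-hand side decreases monotonically in r for r > 0.
    """
    def excess(r):
        return sum(r ** -a for a in masses) - 1.0

    lo, hi = 1.0, 2.0  # excess(1) = len(masses) - 1 > 0 and excess(2) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def doubling_time(r):
    """Mass increment (Da) over which r**m doubles."""
    return math.log(2.0) / math.log(r)


if __name__ == "__main__":
    r0 = dominant_root(RESIDUE_MASSES)
    print("dominant root:", r0)
    print("doubling time (Da):", doubling_time(r0))
```

The printed doubling time can be compared with the 24.67 Da quoted above. Small differences would point to a different choice of mass table or rounding.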
We expect that these findings could improve estimates of these distributions and thereby improve a class of algorithms that score peptide identifications in mass spectrometry [13], namely those that require knowledge of the number of possible peptide sequences within a particular mass range.