PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
J Math Chem. Author manuscript; available in PMC 2012 September 26.
Published in final edited form as:
J Math Chem. 2012 June 1; 50(6): 1458–1483.
doi:  10.1007/s10910-012-9983-z
PMCID: PMC3458794
NIHMSID: NIHMS393696

Mass Distributions of Linear Chain Polymers

Abstract

Biochemistry has many examples of linear chain polymers, i.e., molecules formed from a sequence of units from a finite set of possibilities; examples include proteins, RNA, single-stranded DNA, and paired DNA. In the field of mass spectrometry, it is useful to consider the idea of weighted alphabets, with a word inheriting weight from its letters. We describe the distribution of the mass of these words in terms of a simple recurrence relation, the general solution to that relation, and a canonical form that explicitly describes both the exponential form of this distribution and its periodic features, thus explaining a wave pattern that has been observed in protein mass databases. Further, we show that a pure exponential term dominates the distribution and that there is exactly one such purely exponential term. Finally, we illustrate the use of this theorem by describing a formula for the integer mass distribution of peptides and we compare our theoretical results with mass distributions of human and yeast peptides.

Keywords: linear chain polymers, peptides, mass distribution, mass spectrometry, protein database

1. Introduction

In chemistry and biology there are many examples of linear chain polymers, i.e., molecules formed from a sequence of units from a finite set of possibilities. Examples include peptides/proteins, RNA, single-stranded DNA, and paired DNA. We want to characterize patterns of the masses of such objects, considering an alphabet of building blocks (e.g. amino acids, in the case of peptides and proteins) and an assigned mass function. Because the molecule is linear, the sequence of building blocks can be thought of as a word over the alphabet.

We are particularly interested in the number C(M) of words that have a particular weight M. Ideally we would like to solve this problem for arbitrary masses but, as we shall see, we will restrict most of our analysis to integer masses.

However, we describe a recurrence relation that is applicable to real-valued masses:

equation M1
Equation 1

where d is the number of objects in our alphabet and mj is the mass of the jth object. In addition, by rescaling the masses, we can generalize all of our results to the case of rational masses.

Next we solve this recurrence relation. The general solution is given by the following theorem:

Theorem 1. (General Solution to Sequence Counting Problem)

Consider a finite set of masses m1, …, md and denote by C(M) the number of sequences of masses (i.e. “words”) that have a total mass of M. Then C satisfies a mass recurrence relation (Equation 1). Furthermore, if the characteristic polynomial of this recurrence relation has distinct roots then there exist real constants k, c0, …, ck, r0, …, rk, θ1, …, θk, ϕ1, …, ϕk such that

equation M2
Equation 2

and r0rj > 0 for j = 1, …, k.

The characteristic polynomial of a recurrence relation will be discussed in the next section. The rest of the paper is devoted to the proof of this theorem and related results, including a generalization for the case where the characteristic polynomial does not have distinct roots.

Our interest in this problem originated from the study of mass distributions of peptide fragment ions[1] obtained using mass spectrometry[2,3]. In particular, we (and others[4]) investigated the mass distribution of peptides for both biological and theoretical peptides[5]. This led to the observation that peptide density reaches a local maximum approximately every 14 Daltons[4,5]. The results described in this paper explain the source of this pattern, predict that periodic patterns are present for all non-trivial linear chain polymers, and describe the method by which all periodic terms can be enumerated, given a fixed size mass unit (e.g. Daltons).

Finally, we note that computing the general form of the solution involving irrational masses is a very interesting open problem.

2. Definitions and Notations

We define an alphabet An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg as a finite ordered list of objects {a1, …, ad}.

Suppose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg is an alphabet. Then mass(·) is a mass function of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg if it is a function from An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg to the positive real numbers.

Suppose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg is an alphabet. Then mass(·) is an integer mass function of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg if it is a function from An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg to the positive integers.

Suppose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg is an alphabet and mass(·) is a mass function of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg. Then the pair An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg = ( An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg, mass(·)) is a collection of mass objects.

Suppose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg = ( An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg, mass(·)) is a collection of mass objects with alphabet An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg= {a1, …, ad}. Then m1, …, md are the object masses of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg if mj = mass(aj) (j =1, …, d).

Suppose m1, …, md are the object masses of the collection of mass objects An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg. Then Equation 1 is the mass recurrence relation for An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg.

Suppose also that m1m2 ≤ … ≤ md are the object masses of the collection of mass objects An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg. Then p is the characteristic polynomial of the recurrence relation for An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg if equation M3.

Note that this definition of characteristic polynomial is the standard one for linear difference equations of one variable.

Suppose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg = ( An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg, mass(·)) is a collection of mass objects. Then s is an ordered sequence (or simply sequence) over An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg (or, equivalently, over An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg) if s is an ordered set of letters, a1, a2, …, a[ell] where ai [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg, i =1, 2, …, [ell]. Intuitively, we can think of a sequence over An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg as a word formed using letters from the alphabet An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg.

Suppose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg = ( An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg, mass(·)) is a collection of mass objects and s = a1a2a[ell] is a sequence. The mass of the sequence s is equation M4

Suppose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg is a collection of mass objects. Then C is the sequence counting function for An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg if C(M) = #{sequence s over An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg | mass(s)= M}, the number of sequences s over An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg where the mass of s is M.

3. Mass Recurrence Relation

In this section we describe a recurrence relation for the sequence counting function.

Proposition 2 (Mass recurrence relation)

Suppose C is the sequence counting function for a collection of mass objects An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg whose masses are m1m2 ≤ … ≤ mk. Then C satisfies the following mass recurrence relation of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg with initial conditions:

equation M5
Equation 3

Proof

Assuming M > 0, we start with the definition for a sequence counting function and consider the last letter of every sequence:

equation M6

The last equality only holds for all M > 0 if we define C to be zero for the relevant negative masses that arise. The initial condition related to mass 0 follows from the idea that there is exactly one string of mass zero, namely the null string, [var phi].

4. Solving the recurrence relation

We next want to solve the recurrence relation. Note that while the recurrence relation described in the previous theorem is applicable for real-valued (or even vector-valued) masses, from now on we require integer masses.

Theorem 3 (General solution to a linear homogenous difference equation)

Suppose C satisfies the linear homogenous difference equation

equation M7
Equation 4

for all positive integers M. Let p(x) = anxn + an−1xn−1 +…+ a0x0 be the characteristic polynomial for the difference equation. Suppose further that p has n distinct roots, z1, …, zn. Then for any solution of Equation 4 there exists a unique list of complex numbers λ1, …, λn such that for all positive integers M,

equation M8
Equation 5

Proof

This is a key result in many texts on difference equations and a proof for it can be found in, for example, [6,7,8]. Here is a sketch of the proof: For any root z of p, the sequence zM satisfies Equation 4. Therefore all linear combinations equation M9 satisfy Equation 4. This form represents the general solution because equation M10 is a Vandermonde matrix and is, therefore, invertible, giving us a unique solution.

The next lemma applies Theorem 3 (General solution to a linear homogenous difference equation) to sequence counting functions.

Lemma 4 (Solving the recurrence relation – distinct roots)

Let C be the sequence counting function for a collection of mass objects with integer masses, An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg, and let p be the characteristic polynomial of the mass recurrence relation of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg. Suppose p is of degree n and has n distinct roots, z1, …, zn. Then there exists a unique set of complex numbers λ1, …, λn such that

equation M11

Proof

By Proposition 2 (Mass recurrence relation), C satisfies the mass recurrence relation with initial conditions

equation M12

(where n is the largest mass). Note that these initial conditions together with the mass recurrence relation uniquely determine the function C on all positive integers. The mass recurrence relation is a linear homogeneous difference equation with characteristic polynomial p. Because p has distinct roots, we may apply Theorem 3 (General solution to a linear homogenous difference equation) to obtain the desired conclusion.

Another form of this lemma which handles non-distinct roots can be found in [8].

5. Analysis of the solution of the recurrence relation

Now that we have a solution to the recurrence relation, we want to describe the form of the solution in a way that is easy to interpret. In particular, we describe an exponential growth function overlaid with a periodic pattern. A pattern of this sort has been described in [5,9].

Lemma 5 (Eigenvalues of conjugate pairs are conjugates)

Let C be the sequence counting function for a collection of mass objects An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg and let p be the characteristic polynomial of the mass recurrence relation of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg. Assume that p has distinct roots, z1, …, zd and that equation M13. Suppose that equation M14. Then equation M15.

Proof

Note that the conjugate of every root of p is also a root of p. Let j ′ denote the index such that equation M16 for each j. (This is well-defined because p has distinct roots.) Then, combined with the fact that C(M) is a real number, we have

equation M17

However, by Lemma 4 (Solving the recurrence relation – distinct roots), we know that the set of λ1, …, λd is unique. Therefore, equation M18.

To simplify the statement of the next theorem, we need the following definition: We say that that cj, θj, and ϕj satisfy standard conditions for j [set membership] J if, for all j [set membership] J we have:

equation M19

The terms cj, θj, and ϕj refer to the magnitude, period, and phase of periodic terms, respectively, defined in the theorem below and used in later statements.

Theorem 6 (Explicit description of sequence counting function – simple form)

Let C be the sequence counting function for a collection of mass objects An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg and p be the characteristic polynomial of the mass recurrence relation of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg. Suppose p has distinct roots. Then there exists real numbers cj, θj, ϕj, that satisfy the standard conditions for 1≤ jk and there exist positive c0, r0, r1, …, rk such that

equation M20

and

0 < rj for all j = 0,…, k.

Proof

By Proposition 2 (Mass recurrence relation), we know that C satisfies the recurrence relation. Thus, we have changed the problem to one of solving the recurrence relation, which is a linear difference equation. Let n be the order of p and z1, …, zn be the roots of p. By Lemma 4 (Solving the recurrence relation – distinct roots) there exist complex numbers λ1, …, λn, such that equation M21. Note that 0 is not a root since p(0) ≥1. Therefore, we may separate the list of roots into four sets: positive roots R+, negative roots R, complex roots in the upper half-plane Ru, and complex roots in the lower half-plane Rl. We can now rewrite the expression of C(M) as

equation M22

Without loss of generality, we assume that the indices of the roots proceed in order from R+, R, Ru, and Rl. We will handle each summand separately.

Positive root

Note that has exactly one element by simple application of Descartes’ rule of signs. Denote the single term in the corresponding sum as equation M23; set r0 = z0 and c0 = λ0. Thus,

equation M24

Negative roots

Define b as the largest index of an element in R. Let zj [set membership] R. We set the constants for the jth term as rj = |zj|, cj =|λj|, equation M25, and equation M26

If we use this definition for an individual term in the summand we can rewrite the summand as find

equation M27

Thus, the summand is

equation M28

Complex roots (upper and lower half-plane)

Define bu as the largest index j of an element in Ru. Let zj [set membership] Ru. Note that, because zj is a root of p and p is a polynomial over real numbers, the conjugate of zj, equation M29, is also a root of p. Call the index of the conjugate root j ′; i.e. equation M30. Note also that equation M31, by Lemma 5 (Eigenvalues of conjugate pairs are conjugates). Set cj = 2|λj| and set ϕj such that equation M32 and −π < ϕj < π. Also set rj = |zj| = |zj| and set θj so that zj = rjei2πθj and 0< 2πθj < π. In other words, equation M33. Now we combine the term j from the first complex sum with the term j ′ from the second sum to get

equation M34

Thus, the contribution from the complex roots (those that are not real) is

equation M35

All roots

Combining the three parts we get

equation M36

where k = bu.

To complete this proof, we need to verify that all of the constants are in the correct ranges; i.e. that they satisfy the standard conditions. The cj’s (j > 0) are positive by definition, as are the rj’s. In addition, equation M37 and −π < ϕjπ by definition, for j = 1, 2,…, bu. Also, combining the above statements with Lemma 13 (Constant c0 > 0) we know that c0 > 0.

6. Proof of Maximality

The only difference between Theorem 6 (Explicit description of sequence counting function – simple form), above, and Theorem 1. (General Solution to Sequence Counting Problem) is that the latter also tells us that r0r for all j ≥ 1. This is the topic of Theorem 14 (Maximality of positive root), found at the end of this section. The rest of this section contains technical lemmas for use in the proof. The intuition of the proof is that if the periodic terms dominate the real term then there must be a point at which the result is negative, which is a contradiction of the fact that the function is a counting function (i.e., must map non-negative integers to non-negative integers). First, we show that the sum of cosines with rational period over a special set of indices is zero.

Lemma 7 (Sum of rational period cosine is zero)

Let θ be a positive rational number given in reduced form by (a/b). Further, assume that b> 2. Let k and w be natural numbers and ϕ a real number. Then

equation M38

Proof

First, fix integers k and w and real number ϕ. Now note that a and b are relatively prime. This means that for every value j = 0,1,…, b − 1, there exists an m [set membership][w +1, w + b] such that am [equivalent] j mod b. Therefore,

equation M39

Note that the last term is the sum of the bth roots of unity. Let ω1, …, ωb be the bth roots of unity. We want to show that the sum of the bth roots of unity is always zero. Note that these roots are all of the solutions to xb − 1 = 0. Thus,

equation M40

Of particular interest, however, is the term of order b−1 : its coefficient is the sum of all of the roots. Since this coefficient is zero, then the sum of the roots is zero.

Finally, to prove our lemma, we need only add several terms that add to zero:

equation M41

For simplification in the rest of this section we need to add additional notation. We denote the set of indexes of roots of the characteristic polynomial that have magnitude r, {j | rj = r}, as An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg. Also, we denote the periodic contribution of the ith root towards C(M), ci cos(2πθi M+ϕi), as ti (M). We refer to ti (M) as a parameterized cosine function.

The following lemma states that the sum of ti (M) repeats a particular negative value infinitely often, provided the An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg contains only rational values.

Lemma 8 (Functions with rational period repeat negative values)

Suppose θ is rational for every j [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg and equation M42. Then either f is identically zero on the integers or there exists δ > 0 and integers w and [ell] such that f (w + k[ell]) = −δ for k=1, 2,….

Proof

Since θj is rational, it can be represented by equation M43 (assume that this is reduced form). Let [ell] = lcm {bj | j [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg}, the least common multiple of the denominators of the cosine frequencies. First note that, because θj[ell] is an integer and tj (M) = cj cos(2πθj M + ϕj),

equation M44

In other words, f is periodic with a period that divides [ell].

Now, by Lemma 7 (Sum of rational period cosine is zero) we know that, for all integers w and k

equation M45

In particular, because bj divides [ell], we know that for all j,

equation M46

Therefore,

equation M47

If f is identically zero on the integers then we are done. Otherwise, it follows that there exists an integer, w, for which f (w) = −δ< 0. However, f is periodic with a period that divides [ell]. Therefore we know that w + [ell], w+ 2[ell], … is an infinite sequence of integers such that f (w) = −δ< 0.

The following lemma extends the previous one by assuming that the members of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg are all related to each other by a complex constant.

Lemma 9 (Sum of rationally related cosines of irrational period is negative infinitely often)

Define equation M48, where tj is defined as previously. Suppose f is not identically zero on R and there exists an irrational number s such that for every j [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg, the ratio equation M49 is rational. Then there exists an irrational number α such that equation M50 has period 1. Further, there exists a real number ε > 0 and a nonempty open interval I [subset or is implied by] [0,1] such that equation M51.

Proof

For every j [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg, we define aj and bj as the reduced form fraction equation M52. Let equation M53 and choose equation M54. Define equation M55; we will show that g is periodic with period 1.

equation M56

Note that equation M57 is an integer by the definition of [ell]. Therefore, the term equation M58 is an integral multiple of 2π and can be ignored when inside the cosine function. Thus,

equation M59

In other words, equation M60 has period 1.

Next we prove the second conclusion, that there exists a real number ε > 0 and a nonempty open interval I [subset or is implied by] [0,1] such that equation M61. Recall that we proved the first conclusion by showing that each cosine term in the sum defining g had an integral number of cycles for x between 0 and 1. This implies

equation M62

However, because f is not identically zero, neither is g. This fact, combined with a zero integral, means that g must be negative for some w [set membership] (0,1). Define equation M63. Being a finite sum of continuous functions, g inherits continuity from the cosines. Therefore, there exists an open interval I containing w such that x [set membership] I [implies] g(x) < − ε, satisfying the second conclusion of the lemma.

The following theorem will allow us to create a form of the previous lemmas without restrictions on the members of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg (see the subsequent lemma).

Theorem 10 (Infinite number of special integers)

If the numbers 1, α1,α2, …, αk are linearly independent over the rational numbers then for any k open non-empty intervals Ii [subset or is implied by] [0,1], j =1, …, k, the set

equation M64

Proof

A more general version of this theorem can be found as Theorem 3.13 of [10], where it is attributed to Hardy and Littlewood [11].

Recall that that cj, θj, and ϕj satisfy standard conditions for j [set membership] J if, for all j [set membership] J we have:

equation M65

All of the previous lemmas in this section are used in this lemma. It states that the sum of the ti (M) is less than a particular negative number infinitely often.

Lemma 11 (Sum is negative infinitely often)

Assume that cj, θj, and ϕj satisfy the standard conditions for j [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg. Define the periodic contribution of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg towards C(M) as

equation M66

Then either f is identically zero on integers or there exists δ > 0 and an infinite sequence of positive integers M1, M2, … such that f (M[ell]) < − δ for all [ell] = 1, 2, ….

Proof

It is easier to prove this in three steps:

  1. Rational case: i [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg [implies] θi [set membership] Q
  2. Irrational case: i [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg [implies] θi [negated set membership] Q
  3. Mixed case: There exists i, j [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg such that θi [set membership] Q and θj [negated set membership] Q

Case 1

Rational case: i [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg [implies] θi [set membership] Q.

Define f as equation M67. By Lemma 8 (Functions with rational period repeat negative values) f is either zero on the integers or there exists δ> 0 and integers w and [ell] such that f (w + k[ell]) = −δ for k = 1,2,…. Note that δ/2 and the sequence of integers w +[ell], w+2[ell], … satisfy the conclusion of the theorem.

Case 2

Irrational case: i [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg [implies] θi [negated set membership] Q.

We can decompose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg into k equivalence classes, equation M68, of indexes representing frequencies that are linear combinations of each other over the rational numbers. Select a single member from each of the equivalence classes: equation M69 and define sj = θij. We can also decompose the function f to only sum the terms related to the jth equivalence class: equation M70. Without loss of generality, each fj is not identically zero. This allows us to apply Lemma 9 (Sum of rationally related cosines of irrational period is negative infinitely often) to ascertain the existence of positive numbers α1, α2, …, αk, ε1, ε2, …, εk, and open sub-intervals of [0,1] I1, I2, …, Ik such that equation M71 has period 1 and equation M72. Note also that 1, α1, α2, …, αk are linearly independent over rational numbers. This allows us to apply Theorem 10 (Infinite number of special integers) to state that the collection of integers Γ={n [set membership] Z+ | j mod 1[set membership] Ij; j =1,2,…,k} is infinite. This list of integers is the set we need in order to prove our claim.

To show this, we start with an n [set membership] Γ.

equation M73

By the definition of Γ, our selection of n means that, for each of j=1,2,…, (j mod1) [set membership] Ij. However, we defined Ij so that gj (x) < −εj for every x [set membership] Ij. In particular, this will be true for j. If we now define

equation M74

we have the following inequality:

equation M75

Recall that this holds for every n [set membership] Γ, an infinite set, thus proving this case.

Case 3

Mixed case: There exists i, j [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg such that θi [set membership] Z and θj [negated set membership] Z.

We decompose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg in the same way as in Case 2. However, exactly one of the decompositions includes indices of rational frequencies; call this member of the decomposition equation M76. We decompose f as well:

equation M77

From Case 1 we know that there exists δ0 > 0 and integers w0 and [ell]0 such that, for every integer n,

equation M78

Now we remove equation M79 from the set of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg (calling the result An external file that holds a picture, illustration, etc.
Object name is nihms393696ig4.jpg) and rescale f :

equation M80

Because f ′ only contains cosines of irrational period, it satisfies the conditions of Case 2; we conclude that there exists a positive number ε ′ and an infinite set, Γ′ ={n[set membership] Z+ | f′(n′) < −ε′}

Select an n[set membership] Γ′ and define

equation M81

Let n = w0 + n[ell]0. Then

equation M82

This completes our proof of Case 3 and, thus, the lemma.

Using the previous lemma we are now able to put exponential bounds on the size of negative contributions of Equation 2.

Lemma 12 (Sum of non-positive terms is negative infinitely often)

Suppose equation M83 where cj, θj, and ϕj satisfy the standard conditions (found in definitions). Define equation M84. Then there exists δ > 0 and integers M1 < M2 < … such that equation M85 for [ell] =1,2,….

Proof

(This lemma is identical to Lemma 11 (Sum is negative infinitely often), except for the exponential term in both the definition of f and the conclusion.)

Recall the notation An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg ={j | rj = r}. If all indices j are in An external file that holds a picture, illustration, etc.
Object name is nihms393696ig5.jpg then the theorem follows directly from Lemma 11 (Sum is negative infinitely often), after factoring out rmax ; we will assume, therefore, that there are additional indices.

Define equation M86 and equation M87 and note that the contribution of all the maximal terms towards f is

equation M88

(where ti is the parameterized cosine function, as defined previously). By Lemma 11 (Sum is negative infinitely often) we know that there exists a δ> 0 and an infinite sequence of integers, M1 < M2 < … such that

equation M89

Thus,

equation M90

Therefore, given a real number u > 0, all integers M*> u from the sequence above satisfy equation M91. We select a particular u:

equation M92

By solving for (rmax)u in the definition of u above we find that

equation M93

or

equation M94

We apply this inequality to get

equation M95

The following lemma says that the coefficient of the purely exponential term is positive and follows easily from the previous lemma.

Lemma 13 (Constant c0 > 0)

Let C be the sequence counting function for An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg. Assume that C is not identically zero. Suppose that equation M96 and that cj, θj, and ϕj satisfy the standard conditions. Then c0> 0.

Proof

By Lemma 12 (Sum of non-positive terms is negative infinitely often), we know that there exists an integer M ′ and a real number δ> 0 such that equation M97. This implies that

equation M98

In particular, if c0 ≤ 0 then C(M′) < 0. Since C is a counting function, this is a contradiction arising from the assumption that c0 ≤ 0. Thus, c0 > 0.

The following theorem describes the closed form of a sequence counting function. However, this theorem applies to the “simple form”; i.e. the case where the characteristic polynomial has distinct roots. If it does not have distinct roots then the solution involves the product of polynomials and exponentials in M inside the summation, instead of simply exponentials inside the summation.

Theorem 14 (Maximality of positive root)

Let C be the sequence counting function for An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg. Assume that C is not identically zero. Suppose that equation M99 and that cj, θj, and ϕj satisfy the standard conditions (defined earlier) with c0 > 0. Then r0rj for j =1, …, k.

Proof

Assume that this is not the case, that the positive root is not maximal. We will show that this leads to a contradiction, namely that we can find an M such that the counting function C(M) is negative.

Let equation M100. The statement that the positive root is not maximal is equivalent to stating that the index of the positive root, 0, is not included in An external file that holds a picture, illustration, etc.
Object name is nihms393696ig5.jpg (notation defined at the beginning of this subsection).

Let

equation M101

Then, by Lemma 12 (Sum of non-positive terms is negative infinitely often) we may conclude that there exists δ> 0 and M1 < M2 < … such that equation M102 for =1,2,….

Therefore, given a real number u> 0, there exists an M *> u from our list such that equation M103. We select a particular u:

equation M104

By solving for equation M105 in the definition of u and using the assumption that equation M106

we find that

equation M107

or

equation M108

We apply this inequality to get

equation M109

Since C is a counting function, it must be nonnegative; the assumption that the positive root was not maximal has led to a contradiction. Therefore, the positive root is maximal.

Finally this leads to the main theorem:

6. Proof of main theorem

Proof of Theorem 1. (General Solution to Sequence Counting Problem)

Lemma 4 (Solving the recurrence relation – distinct roots) tells us specifically that C satisfies a recurrence relation. Theorem 6 (Explicit description of sequence counting function – simple form) gives us most of the relationship but Theorem 14 (Maximality of positive root) shows that the single purely exponential term dominates the rest, completing the proof.

7. Extension

It is possible to remove the restrictions on the recurrence relation (unique roots of the characteristic polynomial). We extend Theorem 3 (General solution to a linear homogenous difference equation), Theorem 6 (Explicit description of sequence counting function – simple form), Lemma 12 (Sum of non-positive terms is negative infinitely often), and Theorem 14 (Maximality of positive root).

Theorem 15 (Extension of General solution to a linear homogenous difference equation)

Suppose C satisfies the linear homogenous difference equation (Equation 4). for all positive integers M. Let p(x) = anxn + an−1xn−1 +…+ a0x0 be the characteristic polynomial for the difference equation. Then for any solution of Equation 4 there exists a unique list of polynomials p1, …, pn such that for all positive integers M,

equation M110

Proof

This extension (and how to find the polynomials) can be found in [6].

Theorem 16 (Extension of Theorem 6)

Let C, a function of mass M, count how many sequences of a finite set of masses have a combined mass of M. Then C satisfies a mass recurrence relation. Then there exist real constants k, c0, …, ck, r0, …, rk, θ1, …, θk, ϕ1, …, ϕk, satisfying the standard conditions, and polynomials p1, …, pk, whose highest orders have a coefficient of one, such that that

equation M111

Proof

The proof is nearly identical to that of Theorem 6 (Explicit description of sequence counting function – simple form) except that we need to use Theorem 15 (Extension of General solution to a linear homogenous difference equation) in place of Theorem 3 (General solution to a linear homogenous difference equation) and use slightly modified versions of Lemma 4 (Solving the recurrence relation – distinct roots) and Lemma 5 (Eigenvalues of conjugate pairs are conjugates), substituting polynomials for constants.

Lemma 17 (Extension of Lemma 12)

Suppose equation M112 where cj, θj, and ϕj satisfy the standard conditions and pj (M) are polynomials whose highest order term have coefficients of 1. Define equation M113. Define nj =Degree(pj) and equation M114. Then there exists δ> 0 and integers M1 < M2 < … such that equation M115 for [ell] =1, 2, …

Proof

Define equation M116. Using ti, the parameterized cosine function, as defined previously, we can rewrite f as

equation M117

where

equation M118

The first three terms all have the same ri, namely rmax, while the last term contains all other ri terms.

The first term, f1, is the dominant term and can be rewritten as

equation M119

By Lemma 11 (Sum is negative infinitely often) we know that there exists a δ > 0 and an infinite sequence of integers, M1 < M2 < … such that

equation M120

Thus,

equation M121

Therefore, given a real number u > 0, all integers M* > u from the sequence above satisfy

equation M122

We now want to select a particular u that makes the non-dominant terms f2, f3, f4, sufficiently small compared to f1.

First, note that f2 differs from f1 only in the polynomial factors (pi (M)− Mnmax), which have degree at most nmax −1. Thus, there exists a u2 such that equation M123 for all M > u2 and for all i [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig5.jpg where ni = nmax. Therefore, for all M > u2,

equation M124

Similarly, because of the maximality of nmax, there exists a u3 such that

equation M125

Finally, because of the maximality of rmax, there exists a u4 such that for all M > u3,

equation M126

We select a u = max(u2, u3, u4).

We apply the properties of u and we get the following for every M* > u where M*[set membership] {M1, M2, …}:

equation M127

Theorem 18 (Complete Extension of General Solution to Sequence Counting Problem)

Under the conditions of Theorem 16 (Extension of Theorem 6), r0rj for j =1, …, k.

Proof

Assume that this is not the case, that the positive root is not maximal. We will show that this leads to a contradiction, namely that we can find an M such that the counting function C(M) is negative.

Let equation M128. The statement that the positive root is not maximal is equivalent to stating that the index of the positive root, 0, is not included in An external file that holds a picture, illustration, etc.
Object name is nihms393696ig5.jpg (notation defined at the beginning of this subsection).

Let

equation M129

Then, by Lemma 17 (Extension of Lemma 12) we may conclude that there exists δ > 0 and M1 < M2 < … such that equation M130 for [ell] =1,2,…. We now choose a particular member M*, from our list with the property that

equation M131

By solving for equation M132 in the definition of u and using the assumption that equation M133

we find that

equation M134

or

equation M135

We apply this inequality (and the fact that M* > 1) to get

equation M136

Since C is a counting function, it must be nonnegative; the assumption that the positive root was not maximal has led to a contradiction. Therefore, the positive root is maximal.

Example 19 (Extension of General Solution is Necessary)

Suppose we have an alphabet An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg={a1, a2, a3, a4, a5} and define the masses m1 = m2 = m3 = 4 and m4 = m5 = 6. Then the mass recurrence relation is C(M) = 3C(M − 4) + 2 C(M − 6) and the characteristic polynomial is p(x) = x6 − 3x2 − 2. The roots of p are equation M137 and ±i (twice). The explicit form of C is

equation M138

Note, in particular, the last term, which is linear in M; it is of the form c3q(M)1M cos(θM+ϕ) where q(M) = M, a polynomial of degree 1 in M.

8. Applications

As an application of the above work, we analyze the mass distributions of peptides/proteins. First we recognize that the objects forming the alphabet are the residues of the amino acids. Proposition 2 (Mass recurrence relation) defines the relationship for this special case as

equation M139

where the numbers 57, 71, 87, …, are the masses of amino acid residues measured in Daltons, to the nearest integer. This allows us to analyze the mass distribution of all possible (theoretical) peptides with mass distributions of peptides that appear within a given organism (see Figure 1).

Figure 1
Comparison of Theoretical to Empirical mass distributions

Next, we determine the characteristic polynomial,

equation M140

We can then approximate the roots (Figure 2) of the characteristic polynomial using a solver, such as the “roots” function in MATLAB [12]. In this case, as in all we have observed in biological applications, the characteristic polynomial has distinct roots. Therefore, we can apply Theorem 1. (General Solution to Sequence Counting Problem), yielding an exponential term summed with a collection of terms that contain a periodic function multiplied by an exponential function, the former dominating the latter (Figure 3). If we write the exponential terms in Equation 2 by powers of 2, then we can easily read the doubling time of each term. Furthermore, if we write the frequencies θj in terms of wavelength, then we can emphasize the period of each term. With these conventions, the first three terms are

Figure 2
Roots and eigen-values of characteristic polynomial for peptides
Figure 3
Comparison of first two terms to actual distribution
equation M141

In total, there is one purely exponential term (which doubles in size every 24.67 Da) and 93 periodic terms we can consider, each of which gives a different periodicity and dominance in the distribution of masses. However, we found that the first few terms are sufficient to approximate the distribution of peptides, at least when rounding the amino acid masses to the nearest Dalton. In particular, the dominant periodic term has a period of 14.28 Da. Solutions for higher mass accuracy are possible by changing units. More examples and details are available in [5].

We expect that these findings could improve estimates of distributions, thus improving a class of algorithms that score peptide identifications in mass spectrometry [13,14,15], those that require knowledge of the number of possible peptide sequences within a particular mass range.

Acknowledgments

SLH was supported by NLM training grant Computation and Informatics in Biology and Medicine 5T15LM007359. GC and SLH were supported by NSF DBI-0701846.

Bibliography

1. Hubler SL, Jue A, Keith J, McAlister GC, Craciun G, et al. Valence parity renders z(center dot)-type ions chemically distinct. Journal of the American Chemical Society. 2008;130:6388–6394. [PMC free article] [PubMed]
2. Scigelova M, Makarov A. Orbitrap Mass Analyzer – Overview and Applications in Proteomics. Proteomics. 2006;6:16–21. [PubMed]
3. Coon JJ. Collisions or Electrons? Protein Sequence Analysis in the 21st Century. Analytical Chemistry. 2009;81:3208–3215. [PMC free article] [PubMed]
4. Tipton JDMY, Hendrickson CL, Marshall AG. Utility of the Valence Parity Rule for ECD/AIECD FT-ICR MS: H-dot Atom Transfer, Isobars, and Isomers for Peptide Analysis; 2008 December 7–10; Madison, WI.
5. Hubler SL. PhD Thesis. Madison, WI: University of Wisconsin-Madison; 2010. Mathematical Analysis of Mass Spectra Data.
6. Balakrishnan VK. Introductory discrete mathematics. New York: Dover Publications; 1996. p. xiv.p. 236.
7. Cormen TH. Introduction to algorithms. Cambridge, MA: The MIT Press; 2009.
8. Kelley WG, Peterson AC. Difference equations : an introduction with applications. Boston: Academic Press; 1991. p. xi.p. 455.
9. Hubler SL, Craciun G. Periodic Patterns in Distributions of Peptide Masses. 2011. [PubMed]
10. Bergelson V. Minimal idempotents and ergodic Ramsey theory. In: Bezuglyi S, Kolyada S, editors. Topics in Dynamics and Ergodic Theory. Cambridge University Press; 2003. pp. 8–39.
11. Hardy G, Littlewood J. Some problems of diophantine approximation (Part I) Acta Mathematica. 1914;37:155–191.
12. MATLAB. 7.4. Nattick, MA: The Mathworks Inc; 2007.
13. Alves G, Ogurtsov AY, Wu WW, Wang G, Shen RF, et al. Calibrating e-values for MS2 database search methods. Biology Direct. 2007;2 [PMC free article] [PubMed]
14. Kim S, Bandeira N, Pevzner PA. Spectral Profiles, a Novel Representation of Tandem Mass Spectra and Their Applications for de Novo Peptide Sequencing and Identification. Mol Cell Proteomics. 2009;8:1391–1400. [PubMed]
15. Geer LY, Markey SP, Kowalak JA, Wagner L, Xu M, et al. Open mass spectrometry search algorithm. Journal of Proteome Research. 2004;3:958–964. [PubMed]