PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
J Math Chem. Author manuscript; available in PMC 2012 September 26.
Published in final edited form as:
J Math Chem. 2012 June 1; 50(6): 1458–1483.
doi:  10.1007/s10910-012-9983-z
PMCID: PMC3458794
NIHMSID: NIHMS393696

Mass Distributions of Linear Chain Polymers

Abstract

Biochemistry has many examples of linear chain polymers, i.e., molecules formed from a sequence of units from a finite set of possibilities; examples include proteins, RNA, single-stranded DNA, and paired DNA. In the field of mass spectrometry, it is useful to consider the idea of weighted alphabets, with a word inheriting weight from its letters. We describe the distribution of the mass of these words in terms of a simple recurrence relation, the general solution to that relation, and a canonical form that explicitly describes both the exponential form of this distribution and its periodic features, thus explaining a wave pattern that has been observed in protein mass databases. Further, we show that a pure exponential term dominates the distribution and that there is exactly one such purely exponential term. Finally, we illustrate the use of this theorem by describing a formula for the integer mass distribution of peptides and we compare our theoretical results with mass distributions of human and yeast peptides.

Keywords: linear chain polymers, peptides, mass distribution, mass spectrometry, protein database

1. Introduction

In chemistry and biology there are many examples of linear chain polymers, i.e., molecules formed from a sequence of units from a finite set of possibilities. Examples include peptides/proteins, RNA, single-stranded DNA, and paired DNA. We want to characterize patterns of the masses of such objects, considering an alphabet of building blocks (e.g. amino acids, in the case of peptides and proteins) and an assigned mass function. Because the molecule is linear, the sequence of building blocks can be thought of as a word over the alphabet.

We are particularly interested in the number C(M) of words that have a particular weight M. Ideally we would like to solve this problem for arbitrary masses but, as we shall see, we will restrict most of our analysis to integer masses.

However, we describe a recurrence relation that is applicable to real-valued masses:

C(M)=j=1dC(M-mj)
Equation 1

where d is the number of objects in our alphabet and mj is the mass of the jth object. In addition, by rescaling the masses, we can generalize all of our results to the case of rational masses.

Next we solve this recurrence relation. The general solution is given by the following theorem:

Theorem 1. (General Solution to Sequence Counting Problem)

Consider a finite set of masses m1, …, md and denote by C(M) the number of sequences of masses (i.e. “words”) that have a total mass of M. Then C satisfies a mass recurrence relation (Equation 1). Furthermore, if the characteristic polynomial of this recurrence relation has distinct roots then there exist real constants k, c0, …, ck, r0, …, rk, θ1, …, θk, ϕ1, …, ϕk such that

C(M)=c0r0M+j=1kcjrjMcos(2πθjM+ϕj)
Equation 2

and r0rj > 0 for j = 1, …, k.

The characteristic polynomial of a recurrence relation will be discussed in the next section. The rest of the paper is devoted to the proof of this theorem and related results, including a generalization for the case where the characteristic polynomial does not have distinct roots.

Our interest in this problem originated from the study of mass distributions of peptide fragment ions[1] obtained using mass spectrometry[2,3]. In particular, we (and others[4]) investigated the mass distribution of peptides for both biological and theoretical peptides[5]. This led to the observation that peptide density reaches a local maximum approximately every 14 Daltons[4,5]. The results described in this paper explain the source of this pattern, predict that periodic patterns are present for all non-trivial linear chain polymers, and describe the method by which all periodic terms can be enumerated, given a fixed size mass unit (e.g. Daltons).

Finally, we note that computing the general form of the solution involving irrational masses is a very interesting open problem.

2. Definitions and Notations

We define an alphabet An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg as a finite ordered list of objects {a1, …, ad}.

Suppose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg is an alphabet. Then mass(·) is a mass function of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg if it is a function from An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg to the positive real numbers.

Suppose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg is an alphabet. Then mass(·) is an integer mass function of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg if it is a function from An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg to the positive integers.

Suppose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg is an alphabet and mass(·) is a mass function of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg. Then the pair An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg = ( An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg, mass(·)) is a collection of mass objects.

Suppose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg = ( An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg, mass(·)) is a collection of mass objects with alphabet An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg= {a1, …, ad}. Then m1, …, md are the object masses of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg if mj = mass(aj) (j =1, …, d).

Suppose m1, …, md are the object masses of the collection of mass objects An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg. Then Equation 1 is the mass recurrence relation for An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg.

Suppose also that m1m2 ≤ … ≤ md are the object masses of the collection of mass objects An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg. Then p is the characteristic polynomial of the recurrence relation for An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg if p(z)=zmd-j=1dzmd-mj.

Note that this definition of characteristic polynomial is the standard one for linear difference equations of one variable.

Suppose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg = ( An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg, mass(·)) is a collection of mass objects. Then s is an ordered sequence (or simply sequence) over An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg (or, equivalently, over An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg) if s is an ordered set of letters, a1, a2, …, a[ell] where ai [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg, i =1, 2, …, [ell]. Intuitively, we can think of a sequence over An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg as a word formed using letters from the alphabet An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg.

Suppose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg = ( An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg, mass(·)) is a collection of mass objects and s = a1a2a[ell] is a sequence. The mass of the sequence s is mass(s)=i=1length(s)mass(ai)

Suppose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg is a collection of mass objects. Then C is the sequence counting function for An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg if C(M) = #{sequence s over An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg | mass(s)= M}, the number of sequences s over An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg where the mass of s is M.

3. Mass Recurrence Relation

In this section we describe a recurrence relation for the sequence counting function.

Proposition 2 (Mass recurrence relation)

Suppose C is the sequence counting function for a collection of mass objects An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg whose masses are m1m2 ≤ … ≤ mk. Then C satisfies the following mass recurrence relation of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg with initial conditions:

C(M)={0when-mk<M<01whenM=0j=1dC(M-mj)whenM>0
Equation 3

Proof

Assuming M > 0, we start with the definition for a sequence counting function and consider the last letter of every sequence:

C(M)=#{sequencesoverOmass(O,s)=M}=j=1d#{sequencesoverOmass(O,s)=Mandthelastletterofsisaja}=j=1d#{sequencesoverOmass(O,s)=M-mj}=j=1dC(M-mj)

The last equality only holds for all M > 0 if we define C to be zero for the relevant negative masses that arise. The initial condition related to mass 0 follows from the idea that there is exactly one string of mass zero, namely the null string, [var phi].

4. Solving the recurrence relation

We next want to solve the recurrence relation. Note that while the recurrence relation described in the previous theorem is applicable for real-valued (or even vector-valued) masses, from now on we require integer masses.

Theorem 3 (General solution to a linear homogenous difference equation)

Suppose C satisfies the linear homogenous difference equation

anC(M+n)+an-1C(M+n-1)++a0C(M+0)=0
Equation 4

for all positive integers M. Let p(x) = anxn + an−1xn−1 +…+ a0x0 be the characteristic polynomial for the difference equation. Suppose further that p has n distinct roots, z1, …, zn. Then for any solution of Equation 4 there exists a unique list of complex numbers λ1, …, λn such that for all positive integers M,

C(M)=j=1nλjzjM.
Equation 5

Proof

This is a key result in many texts on difference equations and a proof for it can be found in, for example, [6,7,8]. Here is a sketch of the proof: For any root z of p, the sequence zM satisfies Equation 4. Therefore all linear combinations j=1nλjzjM satisfy Equation 4. This form represents the general solution because (zjM)a+1Ma+n1jn is a Vandermonde matrix and is, therefore, invertible, giving us a unique solution.

The next lemma applies Theorem 3 (General solution to a linear homogenous difference equation) to sequence counting functions.

Lemma 4 (Solving the recurrence relation – distinct roots)

Let C be the sequence counting function for a collection of mass objects with integer masses, An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg, and let p be the characteristic polynomial of the mass recurrence relation of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg. Suppose p is of degree n and has n distinct roots, z1, …, zn. Then there exists a unique set of complex numbers λ1, …, λn such that

C(M)=j=1nλjzjM.

Proof

By Proposition 2 (Mass recurrence relation), C satisfies the mass recurrence relation with initial conditions

C(M)={0whenM(-n,0)1whenM=0

(where n is the largest mass). Note that these initial conditions together with the mass recurrence relation uniquely determine the function C on all positive integers. The mass recurrence relation is a linear homogeneous difference equation with characteristic polynomial p. Because p has distinct roots, we may apply Theorem 3 (General solution to a linear homogenous difference equation) to obtain the desired conclusion.

Another form of this lemma which handles non-distinct roots can be found in [8].

5. Analysis of the solution of the recurrence relation

Now that we have a solution to the recurrence relation, we want to describe the form of the solution in a way that is easy to interpret. In particular, we describe an exponential growth function overlaid with a periodic pattern. A pattern of this sort has been described in [5,9].

Lemma 5 (Eigenvalues of conjugate pairs are conjugates)

Let C be the sequence counting function for a collection of mass objects An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg and let p be the characteristic polynomial of the mass recurrence relation of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg. Assume that p has distinct roots, z1, …, zd and that C(M)=j=1dλjzjM. Suppose that zj=zj¯. Then λj=λj¯.

Proof

Note that the conjugate of every root of p is also a root of p. Let j ′ denote the index such that zj=zj¯ for each j. (This is well-defined because p has distinct roots.) Then, combined with the fact that C(M) is a real number, we have

j=1dλjzjM=C(M)=C(M)¯=j=1dλj¯zjM¯=j=1dλj¯zjM=j=1dλj¯zjM

However, by Lemma 4 (Solving the recurrence relation – distinct roots), we know that the set of λ1, …, λd is unique. Therefore, λj=λj¯.

To simplify the statement of the next theorem, we need the following definition: We say that that cj, θj, and ϕj satisfy standard conditions for j [set membership] J if, for all j [set membership] J we have:

cj>0,0<θj12,and-π<ϕjπ.

The terms cj, θj, and ϕj refer to the magnitude, period, and phase of periodic terms, respectively, defined in the theorem below and used in later statements.

Theorem 6 (Explicit description of sequence counting function – simple form)

Let C be the sequence counting function for a collection of mass objects An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg and p be the characteristic polynomial of the mass recurrence relation of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg. Suppose p has distinct roots. Then there exists real numbers cj, θj, ϕj, that satisfy the standard conditions for 1≤ jk and there exist positive c0, r0, r1, …, rk such that

C(M)=c0r0M+i=1kciriMcos(2πθiM+ϕi),

and

0 < rj for all j = 0,…, k.

Proof

By Proposition 2 (Mass recurrence relation), we know that C satisfies the recurrence relation. Thus, we have changed the problem to one of solving the recurrence relation, which is a linear difference equation. Let n be the order of p and z1, …, zn be the roots of p. By Lemma 4 (Solving the recurrence relation – distinct roots) there exist complex numbers λ1, …, λn, such that C(M)=j=1nλjzjM. Note that 0 is not a root since p(0) ≥1. Therefore, we may separate the list of roots into four sets: positive roots R+, negative roots R, complex roots in the upper half-plane Ru, and complex roots in the lower half-plane Rl. We can now rewrite the expression of C(M) as

C(M)=j:zjR+λjzjM+j:zjR-λjzjM+j:zjRuλjzjM+j:zjRlλjzjM.

Without loss of generality, we assume that the indices of the roots proceed in order from R+, R, Ru, and Rl. We will handle each summand separately.

Positive root

Note that has exactly one element by simple application of Descartes’ rule of signs. Denote the single term in the corresponding sum as λ0z0M; set r0 = z0 and c0 = λ0. Thus,

j:zjR+λjzjM=c0r0M

Negative roots

Define b as the largest index of an element in R. Let zj [set membership] R. We set the constants for the jth term as rj = |zj|, cj =|λj|, θj=12, and ϕj={0ifλj0πifλj<0

If we use this definition for an individual term in the summand we can rewrite the summand as find

λjzjM=λj(-rj)M=cjrjMcos(2πθjM+ϕj)

Thus, the summand is

jzjR-λjzjM=i=1b-λj(-rj)M=i=1b-cirjMcos(2πθiM+ϕi)

Complex roots (upper and lower half-plane)

Define bu as the largest index j of an element in Ru. Let zj [set membership] Ru. Note that, because zj is a root of p and p is a polynomial over real numbers, the conjugate of zj, zj¯, is also a root of p. Call the index of the conjugate root j ′; i.e. zj=zj¯. Note also that λj=λj¯, by Lemma 5 (Eigenvalues of conjugate pairs are conjugates). Set cj = 2|λj| and set ϕj such that λj=12cjeiϕj and −π < ϕj < π. Also set rj = |zj| = |zj| and set θj so that zj = rjei2πθj and 0< 2πθj < π. In other words, 0<θj<12. Now we combine the term j from the first complex sum with the term j ′ from the second sum to get

λjzjM+λjzjM=λjzjM+λjzjM¯=12cjeiϕjrjMei2πθjM+12cje-iϕjrjMe-i2πθjM=12cjrjM(ei(2πθjM+ϕj)+e-i(2πθjM+ϕj))=cjrjMcos(2πθjM+ϕj)

Thus, the contribution from the complex roots (those that are not real) is

jzjRuλjzjM+jzjRlλjzjM=j=b-+1bucirjMcos(2πθiM+ϕi).

All roots

Combining the three parts we get

C(M)=jzjR+λjzjM+jzjR-λjzjM+jzjRuλjzjM+jzjRlλjzjM=c0r0M+j=1b-cirjMcos(2πθiM+ϕi)+j=b-+1bucirjMcos(2πθiM+ϕi)=c0r0M+j=1kcirjMcos(2πθiM+ϕi),

where k = bu.

To complete this proof, we need to verify that all of the constants are in the correct ranges; i.e. that they satisfy the standard conditions. The cj’s (j > 0) are positive by definition, as are the rj’s. In addition, 0<θj12 and −π < ϕjπ by definition, for j = 1, 2,…, bu. Also, combining the above statements with Lemma 13 (Constant c0 > 0) we know that c0 > 0.

6. Proof of Maximality

The only difference between Theorem 6 (Explicit description of sequence counting function – simple form), above, and Theorem 1. (General Solution to Sequence Counting Problem) is that the latter also tells us that r0r for all j ≥ 1. This is the topic of Theorem 14 (Maximality of positive root), found at the end of this section. The rest of this section contains technical lemmas for use in the proof. The intuition of the proof is that if the periodic terms dominate the real term then there must be a point at which the result is negative, which is a contradiction of the fact that the function is a counting function (i.e., must map non-negative integers to non-negative integers). First, we show that the sum of cosines with rational period over a special set of indices is zero.

Lemma 7 (Sum of rational period cosine is zero)

Let θ be a positive rational number given in reduced form by (a/b). Further, assume that b> 2. Let k and w be natural numbers and ϕ a real number. Then

m=w+1w+kbcos(2πθm+ϕ)=0.

Proof

First, fix integers k and w and real number ϕ. Now note that a and b are relatively prime. This means that for every value j = 0,1,…, b − 1, there exists an m [set membership][w +1, w + b] such that am [equivalent] j mod b. Therefore,

m=w+1w+bcos(2πθm+ϕ)=m=w+1w+bRe(e2πiθm+ϕ)=Re(m=w+1w+be2πiabm+ϕ)=Re(ew+ϕm=1be2πimb)

Note that the last term is the sum of the bth roots of unity. Let ω1, …, ωb be the bth roots of unity. We want to show that the sum of the bth roots of unity is always zero. Note that these roots are all of the solutions to xb − 1 = 0. Thus,

j=1b(x-ωj)=xb-1

Of particular interest, however, is the term of order b−1 : its coefficient is the sum of all of the roots. Since this coefficient is zero, then the sum of the roots is zero.

Finally, to prove our lemma, we need only add several terms that add to zero:

m=w+1w+kbcos(2πθm+ϕ)=m=w+1w+bcos(2πθm+ϕ)+m=(w+b)+1w+2bcos(2πθm+ϕ)++m=(w+(k-1)b)+1w+kbcos(2πθm+ϕ)=0

For simplification in the rest of this section we need to add additional notation. We denote the set of indexes of roots of the characteristic polynomial that have magnitude r, {j | rj = r}, as An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg. Also, we denote the periodic contribution of the ith root towards C(M), ci cos(2πθi M+ϕi), as ti (M). We refer to ti (M) as a parameterized cosine function.

The following lemma states that the sum of ti (M) repeats a particular negative value infinitely often, provided the An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg contains only rational values.

Lemma 8 (Functions with rational period repeat negative values)

Suppose θ is rational for every j [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg and f(M)=jMrcjcos(2πθjM+ϕj). Then either f is identically zero on the integers or there exists δ > 0 and integers w and [ell] such that f (w + k[ell]) = −δ for k=1, 2,….

Proof

Since θj is rational, it can be represented by ajbj (assume that this is reduced form). Let [ell] = lcm {bj | j [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg}, the least common multiple of the denominators of the cosine frequencies. First note that, because θj[ell] is an integer and tj (M) = cj cos(2πθj M + ϕj),

f(M+)=jMrtj(M+)=jMrcjcos(2πθjM+2πθj+ϕj)=jMrcjcos(2πθjM+ϕj)=f(M)

In other words, f is periodic with a period that divides [ell].

Now, by Lemma 7 (Sum of rational period cosine is zero) we know that, for all integers w and k

M=w+1w+kbitj(M)=0.

In particular, because bj divides [ell], we know that for all j,

M=w+1w+tj(M)=0

Therefore,

M=w+1w+f(M)=M=w+1w+jMrtj(M)=jMrM=w+1w+tj(M)=0

If f is identically zero on the integers then we are done. Otherwise, it follows that there exists an integer, w, for which f (w) = −δ< 0. However, f is periodic with a period that divides [ell]. Therefore we know that w + [ell], w+ 2[ell], … is an infinite sequence of integers such that f (w) = −δ< 0.

The following lemma extends the previous one by assuming that the members of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg are all related to each other by a complex constant.

Lemma 9 (Sum of rationally related cosines of irrational period is negative infinitely often)

Define f(M)=jMrtj(M), where tj is defined as previously. Suppose f is not identically zero on R and there exists an irrational number s such that for every j [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg, the ratio θjs is rational. Then there exists an irrational number α such that f(xα) has period 1. Further, there exists a real number ε > 0 and a nonempty open interval I [subset or is implied by] [0,1] such that xIf(xα)<-ε.

Proof

For every j [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg, we define aj and bj as the reduced form fraction ajbj=θjs. Let =lcmjMr{bj} and choose α=s. Define g(x)=f(xα); we will show that g is periodic with period 1.

g(x+1)=f(xα+1α)=f(xα+s)=jMrcjcos(2πθj(xα+s)+ϕj)=jMrcjcos(2πθj(xα)+ϕj+2πajbj)

Note that bj is an integer by the definition of [ell]. Therefore, the term 2πajbj is an integral multiple of 2π and can be ignored when inside the cosine function. Thus,

g(x+1)=jMrcjcos(2πθj(xα)+ϕj)=g(x)

In other words, f(xα) has period 1.

Next we prove the second conclusion, that there exists a real number ε > 0 and a nonempty open interval I [subset or is implied by] [0,1] such that xIf(xα)<-ε. Recall that we proved the first conclusion by showing that each cosine term in the sum defining g had an integral number of cycles for x between 0 and 1. This implies

01g(x)dx=jMr01cjcos(2πθjM+ϕj)dx=0

However, because f is not identically zero, neither is g. This fact, combined with a zero integral, means that g must be negative for some w [set membership] (0,1). Define ε=-12g(w). Being a finite sum of continuous functions, g inherits continuity from the cosines. Therefore, there exists an open interval I containing w such that x [set membership] I [implies] g(x) < − ε, satisfying the second conclusion of the lemma.

The following theorem will allow us to create a form of the previous lemmas without restrictions on the members of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg (see the subsequent lemma).

Theorem 10 (Infinite number of special integers)

If the numbers 1, α1,α2, …, αk are linearly independent over the rational numbers then for any k open non-empty intervals Ii [subset or is implied by] [0,1], j =1, …, k, the set

Γ={nZ+nαjmod1Ij;j=1,2,,k}isinfinite.

Proof

A more general version of this theorem can be found as Theorem 3.13 of [10], where it is attributed to Hardy and Littlewood [11].

Recall that that cj, θj, and ϕj satisfy standard conditions for j [set membership] J if, for all j [set membership] J we have:

cj>0,0<θj12,and-π<ϕjπ.

All of the previous lemmas in this section are used in this lemma. It states that the sum of the ti (M) is less than a particular negative number infinitely often.

Lemma 11 (Sum is negative infinitely often)

Assume that cj, θj, and ϕj satisfy the standard conditions for j [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg. Define the periodic contribution of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg towards C(M) as

f(M)=iMrcicos(2πθiM+ϕi)=iMrti(M)

Then either f is identically zero on integers or there exists δ > 0 and an infinite sequence of positive integers M1, M2, … such that f (M[ell]) < − δ for all [ell] = 1, 2, ….

Proof

It is easier to prove this in three steps:

  1. Rational case: i [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg [implies] θi [set membership] Q
  2. Irrational case: i [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg [implies] θi [negated set membership] Q
  3. Mixed case: There exists i, j [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg such that θi [set membership] Q and θj [negated set membership] Q

Case 1

Rational case: i [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg [implies] θi [set membership] Q.

Define f as f(M)=iMrcicos(2πθiM+ϕi). By Lemma 8 (Functions with rational period repeat negative values) f is either zero on the integers or there exists δ> 0 and integers w and [ell] such that f (w + k[ell]) = −δ for k = 1,2,…. Note that δ/2 and the sequence of integers w +[ell], w+2[ell], … satisfy the conclusion of the theorem.

Case 2

Irrational case: i [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg [implies] θi [negated set membership] Q.

We can decompose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg into k equivalence classes, Mr1,Mr2,,Mrk, of indexes representing frequencies that are linear combinations of each other over the rational numbers. Select a single member from each of the equivalence classes: ijMrj and define sj = θij. We can also decompose the function f to only sum the terms related to the jth equivalence class: fj(M)=iMrjti(M). Without loss of generality, each fj is not identically zero. This allows us to apply Lemma 9 (Sum of rationally related cosines of irrational period is negative infinitely often) to ascertain the existence of positive numbers α1, α2, …, αk, ε1, ε2, …, εk, and open sub-intervals of [0,1] I1, I2, …, Ik such that gj(x)=fj(xαj) has period 1 and xIjfj(xαj)<-εj. Note also that 1, α1, α2, …, αk are linearly independent over rational numbers. This allows us to apply Theorem 10 (Infinite number of special integers) to state that the collection of integers Γ={n [set membership] Z+ | j mod 1[set membership] Ij; j =1,2,…,k} is infinite. This list of integers is the set we need in order to prove our claim.

To show this, we start with an n [set membership] Γ.

f(n)=j=1kfj(n)=j=1kgj(nαj)

By the definition of Γ, our selection of n means that, for each of j=1,2,…, (j mod1) [set membership] Ij. However, we defined Ij so that gj (x) < −εj for every x [set membership] Ij. In particular, this will be true for j. If we now define

δ=j=1kεj,

we have the following inequality:

f(n)=j=1kgj(αjn)<j=1k-εj<-δ

Recall that this holds for every n [set membership] Γ, an infinite set, thus proving this case.

Case 3

Mixed case: There exists i, j [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg such that θi [set membership] Z and θj [negated set membership] Z.

We decompose An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg in the same way as in Case 2. However, exactly one of the decompositions includes indices of rational frequencies; call this member of the decomposition Mr0. We decompose f as well:

f(M)=j=0kfj(M),fj(M)=iMrjti(M)

From Case 1 we know that there exists δ0 > 0 and integers w0 and [ell]0 such that, for every integer n,

f0(w0+n0)=-δ0.

Now we remove Mr0 from the set of An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg (calling the result An external file that holds a picture, illustration, etc.
Object name is nihms393696ig4.jpg) and rescale f :

f(x)=f(w0+x0)-f0(w0+x0).

Because f ′ only contains cosines of irrational period, it satisfies the conditions of Case 2; we conclude that there exists a positive number ε ′ and an infinite set, Γ′ ={n[set membership] Z+ | f′(n′) < −ε′}

Select an n[set membership] Γ′ and define

δ=ε0+ε

Let n = w0 + n[ell]0. Then

f(n)=f0(n)+f(n)-f0(n)=f0(n)+f(w0+n0)-f0(w0+n0)=f0(n)+f(n)<-ε0+-ε<-δ

This completes our proof of Case 3 and, thus, the lemma.

Using the previous lemma we are now able to put exponential bounds on the size of negative contributions of Equation 2.

Lemma 12 (Sum of non-positive terms is negative infinitely often)

Suppose f(M)=j=1kcjrjMcos(2πθjM+ϕj) where cj, θj, and ϕj satisfy the standard conditions (found in definitions). Define rmax=maxj{rj}. Then there exists δ > 0 and integers M1 < M2 < … such that f(M)<-δrmaxM for [ell] =1,2,….

Proof

(This lemma is identical to Lemma 11 (Sum is negative infinitely often), except for the exponential term in both the definition of f and the conclusion.)

Recall the notation An external file that holds a picture, illustration, etc.
Object name is nihms393696ig3.jpg ={j | rj = r}. If all indices j are in An external file that holds a picture, illustration, etc.
Object name is nihms393696ig5.jpg then the theorem follows directly from Lemma 11 (Sum is negative infinitely often), after factoring out rmax ; we will assume, therefore, that there are additional indices.

Define r-=maxjMrmax{rj} and c-=maxjMrmax{cj} and note that the contribution of all the maximal terms towards f is

iMrmaxriMti(M)=rmaxMiMrmaxti(M)

(where ti is the parameterized cosine function, as defined previously). By Lemma 11 (Sum is negative infinitely often) we know that there exists a δ> 0 and an infinite sequence of integers, M1 < M2 < … such that

jMrmaxtj(M)<-2δ=1,2,

Thus,

jMrmaxrjMtj(M)=rmaxMjMrmaxtj(M)=1,2,<-2rmaxMδ

Therefore, given a real number u > 0, all integers M*> u from the sequence above satisfy iMrmaxriMti(M)<-δrmaxM. We select a particular u:

u=log(c-δ)log(rmaxr-)+1

By solving for (rmax)u in the definition of u above we find that

(rmax)u=c-δr-u-1rmax>c-δr-u

or

1-δc-(rmaxr-)u<0

We apply this inequality to get

f(M)=iMrriMti(M)+iMrriMti(M)<c-r-M-2δrmaxM=c-r-M(1-δc-(rmaxr-)M)-δrmaxM=c-r-M(1-δc-(rmaxr-)M-u(rmaxr-)u)-δrmaxM<c-r-M(1-δc-(rmaxr-)u)-δrmaxM<-δrmaxM

The following lemma says that the coefficient of the purely exponential term is positive and follows easily from the previous lemma.

Lemma 13 (Constant c0 > 0)

Let C be the sequence counting function for An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg. Assume that C is not identically zero. Suppose that C(M)=c0r0M+j=1kcjrjMcos(2πθjM+ϕj) and that cj, θj, and ϕj satisfy the standard conditions. Then c0> 0.

Proof

By Lemma 12 (Sum of non-positive terms is negative infinitely often), we know that there exists an integer M ′ and a real number δ> 0 such that i=1kciriMcos(2πθiM+ϕi)<-δrmaxM. This implies that

C(M)=c0r0M+j=1kcjrjMcos(2πθjM+ϕj)<c0r0M

In particular, if c0 ≤ 0 then C(M′) < 0. Since C is a counting function, this is a contradiction arising from the assumption that c0 ≤ 0. Thus, c0 > 0.

The following theorem describes the closed form of a sequence counting function. However, this theorem applies to the “simple form”; i.e. the case where the characteristic polynomial has distinct roots. If it does not have distinct roots then the solution involves the product of polynomials and exponentials in M inside the summation, instead of simply exponentials inside the summation.

Theorem 14 (Maximality of positive root)

Let C be the sequence counting function for An external file that holds a picture, illustration, etc.
Object name is nihms393696ig2.jpg. Assume that C is not identically zero. Suppose that C(M)=c0r0M+j=1kcjrjMcos(2πθjM+ϕj) and that cj, θj, and ϕj satisfy the standard conditions (defined earlier) with c0 > 0. Then r0rj for j =1, …, k.

Proof

Assume that this is not the case, that the positive root is not maximal. We will show that this leads to a contradiction, namely that we can find an M such that the counting function C(M) is negative.

Let rmax=maxj{rj}. The statement that the positive root is not maximal is equivalent to stating that the index of the positive root, 0, is not included in An external file that holds a picture, illustration, etc.
Object name is nihms393696ig5.jpg (notation defined at the beginning of this subsection).

Let

f(M)=j=1kcjrjMcos(2πθjM+ϕj).

Then, by Lemma 12 (Sum of non-positive terms is negative infinitely often) we may conclude that there exists δ> 0 and M1 < M2 < … such that f(M)<-δrmaxM for =1,2,….

Therefore, given a real number u> 0, there exists an M *> u from our list such that f(M)<-δrmaxM. We select a particular u:

u=log(c0δ)log(rmaxr0)+1

By solving for rmaxu in the definition of u and using the assumption that rmaxur0>1

we find that

rmaxu=c0δr0u-1rmax>c0δr0u

or

1-δc0(rmaxr0)u<0

We apply this inequality to get

C(M)=c0r0M+f(M)<c0r0M-δrmaxM=c0r0M(1-δc0(rmaxr0)M)=c0r0M(1-δc0(rmaxr0)M-u(rmaxr0)u)<c0r0M(1-δc0(rmaxr0)u)<0

Since C is a counting function, it must be nonnegative; the assumption that the positive root was not maximal has led to a contradiction. Therefore, the positive root is maximal.

Finally this leads to the main theorem:

6. Proof of main theorem

Proof of Theorem 1. (General Solution to Sequence Counting Problem)

Lemma 4 (Solving the recurrence relation – distinct roots) tells us specifically that C satisfies a recurrence relation. Theorem 6 (Explicit description of sequence counting function – simple form) gives us most of the relationship but Theorem 14 (Maximality of positive root) shows that the single purely exponential term dominates the rest, completing the proof.

7. Extension

It is possible to remove the restrictions on the recurrence relation (unique roots of the characteristic polynomial). We extend Theorem 3 (General solution to a linear homogenous difference equation), Theorem 6 (Explicit description of sequence counting function – simple form), Lemma 12 (Sum of non-positive terms is negative infinitely often), and Theorem 14 (Maximality of positive root).

Theorem 15 (Extension of General solution to a linear homogenous difference equation)

Suppose C satisfies the linear homogenous difference equation (Equation 4). for all positive integers M. Let p(x) = anxn + an−1xn−1 +…+ a0x0 be the characteristic polynomial for the difference equation. Then for any solution of Equation 4 there exists a unique list of polynomials p1, …, pn such that for all positive integers M,

C(M)=j=1npj(M)zjM.

Proof

This extension (and how to find the polynomials) can be found in [6].

Theorem 16 (Extension of Theorem 6)

Let C, a function of mass M, count how many sequences of a finite set of masses have a combined mass of M. Then C satisfies a mass recurrence relation. Then there exist real constants k, c0, …, ck, r0, …, rk, θ1, …, θk, ϕ1, …, ϕk, satisfying the standard conditions, and polynomials p1, …, pk, whose highest orders have a coefficient of one, such that that

C(M)=c0r0M+j=1kpj(M)rjMcos(2πθjM+ϕj).

Proof

The proof is nearly identical to that of Theorem 6 (Explicit description of sequence counting function – simple form) except that we need to use Theorem 15 (Extension of General solution to a linear homogenous difference equation) in place of Theorem 3 (General solution to a linear homogenous difference equation) and use slightly modified versions of Lemma 4 (Solving the recurrence relation – distinct roots) and Lemma 5 (Eigenvalues of conjugate pairs are conjugates), substituting polynomials for constants.

Lemma 17 (Extension of Lemma 12)

Suppose f(M)=j=1kcjpj(M)rjMcos(2πθjM+ϕj) where cj, θj, and ϕj satisfy the standard conditions and pj (M) are polynomials whose highest order term have coefficients of 1. Define rmax=maxj{rj}. Define nj =Degree(pj) and nmax=maxjMrmax{nj}. Then there exists δ> 0 and integers M1 < M2 < … such that f(M)<-δMnmaxrmaxM for [ell] =1, 2, …

Proof

Define cmax=maxj{cj}. Using ti, the parameterized cosine function, as defined previously, we can rewrite f as

f(M)=i=1kpi(M)riMcicos(2πθjM+ϕj)=i=1kpi(M)riMti(M)={iMrmaxni=nmax}MnmaxriMti(M)+{iMrmaxni=nmax}(pi(M)-Mnmax)riMti(M)+{iMrmaxni<nmax}pi(M)riMti(M)+iMrmaxpi(M)riMti(M)=f1(M)+f2(M)+f3(M)+f4(M)

where

f1(M)={iMrmaxni=nmax}MnmaxriMti(M)f2(M)={iMrmaxni=nmax}(pi(M)-Mnmax)riMti(M)f3(M)={iMrmaxni<nmax}pi(M)riMti(M)f4(M)=iMrmaxpi(M)riMti(M)

The first three terms all have the same ri, namely rmax, while the last term contains all other ri terms.

The first term, f1, is the dominant term and can be rewritten as

f1(M)={iMrmaxni=nmax}MnmaxriMti(M)={iMrmaxni=nmax}MnmaxrmaxMti(M)=MnmaxrmaxM{iMrmaxni=nmax}ti(M)

By Lemma 11 (Sum is negative infinitely often) we know that there exists a δ > 0 and an infinite sequence of integers, M1 < M2 < … such that

jMrmaxtj(M)<-4δ=1,2,

Thus,

jMrmaxMnmaxrjMtj(M)=MnmaxrmaxMjMrmaxtj(M)=1,2,<-4MnmaxrmaxMδ

Therefore, given a real number u > 0, all integers M* > u from the sequence above satisfy

f1(M)=iMrmaxriMti(M)<-4MnmaxrmaxMδ.

We now want to select a particular u that makes the non-dominant terms f2, f3, f4, sufficiently small compared to f1.

First, note that f2 differs from f1 only in the polynomial factors (pi (M)− Mnmax), which have degree at most nmax −1. Thus, there exists a u2 such that (pi(M)-Mnmax)riM<δkcmaxMnmaxrmaxM for all M > u2 and for all i [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms393696ig5.jpg where ni = nmax. Therefore, for all M > u2,

f2(M)={iMrmaxni=nmax}(pi(M)-Mnmax)riMti(M)<{iMrmaxni=nmax}(pi(M)-Mnmax)riMcmax<{iMrmaxni=nmax}δkMnmaxrmaxM<δMnmaxrmaxM

Similarly, because of the maximality of nmax, there exists a u3 such that

f3(M)={iMrmaxni<nmax}pi(M)riMti(M)<δMnmaxrmaxM

Finally, because of the maximality of rmax, there exists a u4 such that for all M > u3,

f4(M)=iMrmaxpi(M)riMti(M)<δMnmaxrmaxM

We select a u = max(u2, u3, u4).

We apply the properties of u and we get the following for every M* > u where M*[set membership] {M1, M2, …}:

f(M)=f1(M)+f1(M)+f1(M)+f1(M)<-4δMnmaxrmaxM+δMnmaxrmaxM+δMnmaxrmaxM+δMnmaxrmaxM<-δMnmaxrmaxM

Theorem 18 (Complete Extension of General Solution to Sequence Counting Problem)

Under the conditions of Theorem 16 (Extension of Theorem 6), r0rj for j =1, …, k.

Proof

Assume that this is not the case, that the positive root is not maximal. We will show that this leads to a contradiction, namely that we can find an M such that the counting function C(M) is negative.

Let rmax=maxj{rj}. The statement that the positive root is not maximal is equivalent to stating that the index of the positive root, 0, is not included in An external file that holds a picture, illustration, etc.
Object name is nihms393696ig5.jpg (notation defined at the beginning of this subsection).

Let

f(M)=j=1kpj(M)rjMcjcos(2πθjM+ϕj).

Then, by Lemma 17 (Extension of Lemma 12) we may conclude that there exists δ > 0 and M1 < M2 < … such that f(M)<-δMnmaxrmaxM for [ell] =1,2,…. We now choose a particular member M*, from our list with the property that

M>max{1,log(c0δ)log(rmaxr0)+1}

By solving for rmaxM in the definition of u and using the assumption that rmaxr0>1

we find that

rmaxM>c0δr0M-1rmax>c0δr0M

or

1-δc0(rmaxr0)M<0

We apply this inequality (and the fact that M* > 1) to get

C(M)=c0r0M+f(M)<c0r0M-δrmaxMMnmax<c0r0M-δrmaxM=c0r0M(1-δc0(rmaxr0)M)<0

Since C is a counting function, it must be nonnegative; the assumption that the positive root was not maximal has led to a contradiction. Therefore, the positive root is maximal.

Example 19 (Extension of General Solution is Necessary)

Suppose we have an alphabet An external file that holds a picture, illustration, etc.
Object name is nihms393696ig1.jpg={a1, a2, a3, a4, a5} and define the masses m1 = m2 = m3 = 4 and m4 = m5 = 6. Then the mass recurrence relation is C(M) = 3C(M − 4) + 2 C(M − 6) and the characteristic polynomial is p(x) = x6 − 3x2 − 2. The roots of p are ±2 and ±i (twice). The explicit form of C is

C(M)=c02M+c1(-2)M+c2cos(π2M+ϕ2)+c3Mcos(π2M+ϕ3)=c02M+c1(2)Mcos(πM+0)+c2cos(π2M+ϕ2)+c2Mcos(π2M+ϕ3).

Note, in particular, the last term, which is linear in M; it is of the form c3q(M)1M cos(θM+ϕ) where q(M) = M, a polynomial of degree 1 in M.

8. Applications

As an application of the above work, we analyze the mass distributions of peptides/proteins. First we recognize that the objects forming the alphabet are the residues of the amino acids. Proposition 2 (Mass recurrence relation) defines the relationship for this special case as

C(M)=C(M-57)+C(M-71)+C(M-87)+C(M-97)+C(M-99)+C(M-101)+C(M-103)+2C(M-113)+C(M-115)+C(M-131)+C(M-137)+2C(M-128)+C(M-129)+C(M-147)+C(M-156)+C(M-163)+C(M-186),

where the numbers 57, 71, 87, …, are the masses of amino acid residues measured in Daltons, to the nearest integer. This allows us to analyze the mass distribution of all possible (theoretical) peptides with mass distributions of peptides that appear within a given organism (see Figure 1).

Figure 1
Comparison of Theoretical to Empirical mass distributions

Next, we determine the characteristic polynomial,

x186-x129-x115-x99-x89-x87-x85-x83-2x73-x72-x71-2x58-x57-x55-x49-x39-x30-x23-x0

We can then approximate the roots (Figure 2) of the characteristic polynomial using a solver, such as the “roots” function in MATLAB [12]. In this case, as in all we have observed in biological applications, the characteristic polynomial has distinct roots. Therefore, we can apply Theorem 1. (General Solution to Sequence Counting Problem), yielding an exponential term summed with a collection of terms that contain a periodic function multiplied by an exponential function, the former dominating the latter (Figure 3). If we write the exponential terms in Equation 2 by powers of 2, then we can easily read the doubling time of each term. Furthermore, if we write the frequencies θj in terms of wavelength, then we can emphasize the period of each term. With these conventions, the first three terms are

Figure 2
Roots and eigen-values of characteristic polynomial for peptides
Figure 3
Comparison of first two terms to actual distribution
C(M)=195.2382M24.67+147.042M27.53cos(1.032+(36014.28)M)+145.412M32.67cos(13.676+(3602.023)M)+

In total, there is one purely exponential term (which doubles in size every 24.67 Da) and 93 periodic terms we can consider, each of which gives a different periodicity and dominance in the distribution of masses. However, we found that the first few terms are sufficient to approximate the distribution of peptides, at least when rounding the amino acid masses to the nearest Dalton. In particular, the dominant periodic term has a period of 14.28 Da. Solutions for higher mass accuracy are possible by changing units. More examples and details are available in [5].

We expect that these findings could improve estimates of distributions, thus improving a class of algorithms that score peptide identifications in mass spectrometry [13,14,15], those that require knowledge of the number of possible peptide sequences within a particular mass range.

Acknowledgments

SLH was supported by NLM training grant Computation and Informatics in Biology and Medicine 5T15LM007359. GC and SLH were supported by NSF DBI-0701846.

Bibliography

1. Hubler SL, Jue A, Keith J, McAlister GC, Craciun G, et al. Valence parity renders z(center dot)-type ions chemically distinct. Journal of the American Chemical Society. 2008;130:6388–6394. [PMC free article] [PubMed]
2. Scigelova M, Makarov A. Orbitrap Mass Analyzer – Overview and Applications in Proteomics. Proteomics. 2006;6:16–21. [PubMed]
3. Coon JJ. Collisions or Electrons? Protein Sequence Analysis in the 21st Century. Analytical Chemistry. 2009;81:3208–3215. [PMC free article] [PubMed]
4. Tipton JDMY, Hendrickson CL, Marshall AG. Utility of the Valence Parity Rule for ECD/AIECD FT-ICR MS: H-dot Atom Transfer, Isobars, and Isomers for Peptide Analysis; 2008 December 7–10; Madison, WI.
5. Hubler SL. PhD Thesis. Madison, WI: University of Wisconsin-Madison; 2010. Mathematical Analysis of Mass Spectra Data.
6. Balakrishnan VK. Introductory discrete mathematics. New York: Dover Publications; 1996. p. xiv.p. 236.
7. Cormen TH. Introduction to algorithms. Cambridge, MA: The MIT Press; 2009.
8. Kelley WG, Peterson AC. Difference equations : an introduction with applications. Boston: Academic Press; 1991. p. xi.p. 455.
9. Hubler SL, Craciun G. Periodic Patterns in Distributions of Peptide Masses. 2011. [PubMed]
10. Bergelson V. Minimal idempotents and ergodic Ramsey theory. In: Bezuglyi S, Kolyada S, editors. Topics in Dynamics and Ergodic Theory. Cambridge University Press; 2003. pp. 8–39.
11. Hardy G, Littlewood J. Some problems of diophantine approximation (Part I) Acta Mathematica. 1914;37:155–191.
12. MATLAB. 7.4. Nattick, MA: The Mathworks Inc; 2007.
13. Alves G, Ogurtsov AY, Wu WW, Wang G, Shen RF, et al. Calibrating e-values for MS2 database search methods. Biology Direct. 2007;2 [PMC free article] [PubMed]
14. Kim S, Bandeira N, Pevzner PA. Spectral Profiles, a Novel Representation of Tandem Mass Spectra and Their Applications for de Novo Peptide Sequencing and Identification. Mol Cell Proteomics. 2009;8:1391–1400. [PMC free article] [PubMed]
15. Geer LY, Markey SP, Kowalak JA, Wagner L, Xu M, et al. Open mass spectrometry search algorithm. Journal of Proteome Research. 2004;3:958–964. [PubMed]