Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC3458794

Formats

Article sections

- Abstract
- 1. Introduction
- 2. Definitions and Notations
- 3. Mass Recurrence Relation
- 4. Solving the recurrence relation
- 5. Analysis of the solution of the recurrence relation
- 6. Proof of Maximality
- 6. Proof of main theorem
- 7. Extension
- 8. Applications
- Bibliography

Authors

Related links

J Math Chem. Author manuscript; available in PMC 2012 September 26.

Published in final edited form as:

J Math Chem. 2012 June 1; 50(6): 1458–1483.

doi: 10.1007/s10910-012-9983-zPMCID: PMC3458794

NIHMSID: NIHMS393696

See other articles in PMC that cite the published article.

Biochemistry has many examples of linear chain polymers, i.e., molecules formed from a sequence of units from a finite set of possibilities; examples include proteins, RNA, single-stranded DNA, and paired DNA. In the field of mass spectrometry, it is useful to consider the idea of weighted alphabets, with a word inheriting weight from its letters. We describe the distribution of the mass of these words in terms of a simple recurrence relation, the general solution to that relation, and a canonical form that explicitly describes both the exponential form of this distribution and its periodic features, thus explaining a wave pattern that has been observed in protein mass databases. Further, we show that a pure exponential term dominates the distribution and that there is exactly one such purely exponential term. Finally, we illustrate the use of this theorem by describing a formula for the integer mass distribution of peptides and we compare our theoretical results with mass distributions of human and yeast peptides.

In chemistry and biology there are many examples of linear chain polymers, i.e., molecules formed from a sequence of units from a finite set of possibilities. Examples include peptides/proteins, RNA, single-stranded DNA, and paired DNA. We want to characterize patterns of the masses of such objects, considering an alphabet of building blocks (e.g. amino acids, in the case of peptides and proteins) and an assigned mass function. Because the molecule is linear, the sequence of building blocks can be thought of as a word over the alphabet.

We are particularly interested in the number *C*(*M*) of words that have a particular weight *M*. Ideally we would like to solve this problem for arbitrary masses but, as we shall see, we will restrict most of our analysis to integer masses.

However, we describe a recurrence relation that is applicable to real-valued masses:

$$C(M)=\sum _{j=1}^{d}C(M-{m}_{j})$$

Equation 1

where *d* is the number of objects in our alphabet and *m _{j}* is the mass of the

Next we solve this recurrence relation. The general solution is given by the following theorem:

Consider a finite set of masses *m*_{1}, …, *m _{d}* and denote by

$$C(M)={c}_{0}{r}_{0}^{M}+\sum _{j=1}^{k}{c}_{j}{r}_{j}^{M}cos(2\pi {\theta}_{j}M+{\varphi}_{j})$$

Equation 2

and *r*_{0} ≥ *r _{j}* > 0 for

The characteristic polynomial of a recurrence relation will be discussed in the next section. The rest of the paper is devoted to the proof of this theorem and related results, including a generalization for the case where the characteristic polynomial does *not* have distinct roots.

Our interest in this problem originated from the study of mass distributions of peptide fragment ions[1] obtained using mass spectrometry[2,3]. In particular, we (and others[4]) investigated the mass distribution of peptides for both biological and theoretical peptides[5]. This led to the observation that peptide density reaches a local maximum approximately every 14 Daltons[4,5]. The results described in this paper explain the source of this pattern, predict that periodic patterns are present for all non-trivial linear chain polymers, and describe the method by which all periodic terms can be enumerated, given a fixed size mass unit (e.g. Daltons).

Finally, we note that computing the general form of the solution involving irrational masses is a very interesting open problem.

We define an ** alphabet**
as a finite ordered list of objects {

Suppose
is an alphabet. Then mass(·) is a ** mass function** of
if it is a function from
to the positive real numbers.

Suppose
is an alphabet. Then mass(·) is an ** integer mass function** of
if it is a function from
to the positive integers.

Suppose
is an alphabet and mass(·) is a mass function of
. Then the pair
= (
, mass(·)) is a ** collection of mass objects**.

Suppose
= (
, mass(·)) is a collection of mass objects with alphabet
= {*a*_{1}, …, *a _{d}*}. Then

Suppose *m*_{1}, …, *m _{d}* are the object masses of the collection of mass objects
. Then Equation 1 is the

Suppose also that *m*_{1} ≤ *m*_{2} ≤ … ≤ *m _{d}* are the object masses of the collection of mass objects
. Then

Note that this definition of characteristic polynomial is the standard one for linear difference equations of one variable.

Suppose
= (
, mass(·)) is a collection of mass objects. Then *s* is an ** ordered sequence** (or simply

Suppose
= (
, mass(·)) is a collection of mass objects and *s* = *a*_{1}*a*_{2}…*a _{}* is a sequence. The

Suppose
is a collection of mass objects. Then *C* is the ** sequence counting function** for
if

In this section we describe a recurrence relation for the sequence counting function.

Suppose *C* is the sequence counting function for a collection of mass objects
whose masses are *m*_{1} ≤ *m*_{2} ≤ … ≤ *m _{k}*. Then

$$C(M)=\{\begin{array}{ll}0\hfill & \text{when}-{m}_{k}<M<0\hfill \\ 1\hfill & \text{when}\phantom{\rule{0.16667em}{0ex}}M=0\hfill \\ \sum _{j=1}^{d}C(M-{m}_{j})\hfill & \text{when}\phantom{\rule{0.16667em}{0ex}}M>0\hfill \end{array}$$

Equation 3

Assuming *M* > 0, we start with the definition for a sequence counting function and consider the last letter of every sequence:

$$\begin{array}{l}C(M)=\#\{\text{sequence}\phantom{\rule{0.16667em}{0ex}}s\phantom{\rule{0.16667em}{0ex}}\text{over}\phantom{\rule{0.16667em}{0ex}}\mathcal{O}\mid \text{mass}(\mathcal{O},s)=M\}\\ =\sum _{j=1}^{d}\#\{\text{sequence}\phantom{\rule{0.16667em}{0ex}}s\phantom{\rule{0.16667em}{0ex}}\text{over}\phantom{\rule{0.16667em}{0ex}}\mathcal{O}\mid \text{mass}(\mathcal{O},s)=M\phantom{\rule{0.16667em}{0ex}}\text{and}\phantom{\rule{0.16667em}{0ex}}\text{the}\phantom{\rule{0.16667em}{0ex}}\text{last}\phantom{\rule{0.16667em}{0ex}}\text{letter}\phantom{\rule{0.16667em}{0ex}}\text{of}\phantom{\rule{0.16667em}{0ex}}s\phantom{\rule{0.16667em}{0ex}}\text{is}\phantom{\rule{0.16667em}{0ex}}{a}_{j}\in \mathcal{a}\}\\ =\sum _{j=1}^{d}\#\{\text{sequence}\phantom{\rule{0.16667em}{0ex}}s\phantom{\rule{0.16667em}{0ex}}\text{over}\phantom{\rule{0.16667em}{0ex}}\mathcal{O}\mid \text{mass}(\mathcal{O},s)=M-{m}_{j}\}\\ =\sum _{j=1}^{d}C(M-{m}_{j})\end{array}$$

The last equality only holds for all *M* > 0 if we define *C* to be zero for the relevant negative masses that arise. The initial condition related to mass 0 follows from the idea that there is exactly one string of mass zero, namely the null string, .

We next want to solve the recurrence relation. Note that while the recurrence relation described in the previous theorem is applicable for real-valued (or even vector-valued) masses, from now on we require *integer* masses.

Suppose *C* satisfies the linear homogenous difference equation

$${a}_{n}C(M+n)+{a}_{n-1}C(M+n-1)+\dots +{a}_{0}C(M+0)=0$$

Equation 4

for all positive integers *M*. Let *p*(*x*) = *a _{n}x^{n}* +

$$C(M)=\sum _{j=1}^{n}{\lambda}_{j}{z}_{j}^{M}.$$

Equation 5

This is a key result in many texts on difference equations and a proof for it can be found in, for example, [6,7,8]. Here is a sketch of the proof: For any root *z* of *p*, the sequence *z ^{M}* satisfies Equation 4. Therefore all linear combinations
$\sum _{j=1}^{n}}{\lambda}_{j}{z}_{j}^{M$ satisfy Equation 4. This form represents the general solution because
${\left({z}_{j}^{M}\right)}_{\begin{array}{c}a+1\le M\le a+n\\ 1\le j\le n\end{array}}$ is a Vandermonde matrix and is, therefore, invertible, giving us a unique solution.

The next lemma applies Theorem 3 (General solution to a linear homogenous difference equation) to sequence counting functions.

Let *C* be the sequence counting function for a collection of mass objects with integer masses,
, and let *p* be the characteristic polynomial of the mass recurrence relation of
. Suppose *p* is of degree *n* and has *n* distinct roots, *z*_{1}, …, *z _{n}*. Then there exists a unique set of complex numbers

$$C(M)=\sum _{j=1}^{n}{\lambda}_{j}{z}_{j}^{M}.$$

By Proposition 2 (Mass recurrence relation), C satisfies the mass recurrence relation with initial conditions

$$C(M)=\{\begin{array}{ll}0\hfill & \text{when}\phantom{\rule{0.16667em}{0ex}}M\in (-n,0)\hfill \\ 1\hfill & \text{when}\phantom{\rule{0.16667em}{0ex}}M=0\hfill \end{array}$$

(where n is the largest mass). Note that these initial conditions together with the mass recurrence relation uniquely determine the function *C* on all positive integers. The mass recurrence relation is a linear homogeneous difference equation with characteristic polynomial *p*. Because *p* has distinct roots, we may apply Theorem 3 (General solution to a linear homogenous difference equation) to obtain the desired conclusion.

Another form of this lemma which handles non-distinct roots can be found in [8].

Now that we have a solution to the recurrence relation, we want to describe the form of the solution in a way that is easy to interpret. In particular, we describe an exponential growth function overlaid with a periodic pattern. A pattern of this sort has been described in [5,9].

Let *C* be the sequence counting function for a collection of mass objects
and let *p* be the characteristic polynomial of the mass recurrence relation of
. Assume that *p* has distinct roots, *z*_{1}, …, *z _{d}* and that
$C(M)={\displaystyle \sum _{j=1}^{d}}{\lambda}_{j}{z}_{j}^{M}$. Suppose that
${z}_{j}=\overline{{z}_{{j}^{\prime}}}$. Then
${\lambda}_{j}=\overline{{\lambda}_{{j}^{\prime}}}$.

Note that the conjugate of every root of *p* is also a root of *p*. Let *j* ′ denote the index such that
${z}_{j}=\overline{{z}_{{j}^{\prime}}}$ for each *j*. (This is well-defined because *p* has distinct roots.) Then, combined with the fact that *C*(*M*) is a real number, we have

$$\sum _{j=1}^{d}{\lambda}_{j}{z}_{j}^{M}=C(M)=\overline{C(M)}=\sum _{j=1}^{d}\overline{{\lambda}_{j}}\overline{{z}_{j}^{M}}=\sum _{{j}^{\prime}=1}^{d}\overline{{\lambda}_{j}}{z}_{{j}^{\prime}}^{M}=\sum _{j=1}^{d}\overline{{\lambda}_{{j}^{\prime}}}{z}_{j}^{M}$$

However, by Lemma 4 (Solving the recurrence relation – distinct roots), we know that the set of *λ*_{1}, …, *λ _{d}* is unique. Therefore,
${\lambda}_{j}=\overline{{\lambda}_{{j}^{\prime}}}$.

To simplify the statement of the next theorem, we need the following definition: *We say that that c _{j}*,

$$\begin{array}{l}{c}_{j}>0,\\ 0<{\theta}_{j}\le \frac{1}{2},\phantom{\rule{0.16667em}{0ex}}\phantom{\rule{0.16667em}{0ex}}\phantom{\rule{0.16667em}{0ex}}\text{and}\\ -\pi <{\varphi}_{j}\le \pi .\end{array}$$

The terms *c _{j}*,

Let *C* be the sequence counting function for a collection of mass objects
and *p* be the characteristic polynomial of the mass recurrence relation of
. Suppose *p* has distinct roots. Then there exists real numbers *c _{j}*,

$$C(M)={c}_{0}{r}_{0}^{M}+\sum _{i=1}^{k}{c}_{i}{r}_{i}^{M}cos(2\pi {\theta}_{i}M+{\varphi}_{i}),$$

and

0 < *r _{j}* for all

By Proposition 2 (Mass recurrence relation), we know that *C* satisfies the recurrence relation. Thus, we have changed the problem to one of solving the recurrence relation, which is a linear difference equation. Let *n* be the order of *p* and *z*_{1}, …, *z _{n}* be the roots of

$$C(M)=\sum _{j:{z}_{j}\in {R}^{+}}{\lambda}_{j}{z}_{j}^{M}+\sum _{j:{z}_{j}\in {R}^{-}}{\lambda}_{j}{z}_{j}^{M}+\sum _{j:{z}_{j}\in {R}^{u}}{\lambda}_{j}{z}_{j}^{M}+\sum _{j:{z}_{j}\in {R}^{l}}{\lambda}_{j}{z}_{j}^{M}.$$

Without loss of generality, we assume that the indices of the roots proceed in order from *R*^{+}, *R*^{−}, *R ^{u}*, and

Note that has exactly one element by simple application of Descartes’ rule of signs. Denote the single term in the corresponding sum as
${\lambda}_{0}{z}_{0}^{M}$; set *r*_{0} = *z*_{0} and *c*_{0} = *λ _{0}*. Thus,

$$\sum _{j:{z}_{j}\in {R}^{+}}{\lambda}_{j}{z}_{j}^{M}={c}_{0}{r}_{0}^{M}$$

Define *b*_{−} as the largest index of an element in *R*^{−}. Let *z _{j}*

If we use this definition for an individual term in the summand we can rewrite the summand as find

$$\begin{array}{l}{\lambda}_{j}{z}_{j}^{M}={\lambda}_{j}{(-{r}_{j})}^{M}\\ ={c}_{j}{r}_{j}^{M}cos(2\pi {\theta}_{j}M+{\varphi}_{j})\end{array}$$

Thus, the summand is

$$\begin{array}{l}\sum _{j\mid {z}_{j}\in {R}^{-}}{\lambda}_{j}{z}_{j}^{M}=\sum _{i=1}^{{b}_{-}}{\lambda}_{j}{(-{r}_{j})}^{M}\\ =\sum _{i=1}^{{b}_{-}}{c}_{i}{r}_{j}^{M}cos(2\pi {\theta}_{i}M+{\varphi}_{i})\end{array}$$

Define *b _{u}* as the largest index

$$\begin{array}{l}{\lambda}_{j}{z}_{j}^{M}+{\lambda}_{{j}^{\prime}}{z}_{{j}^{\prime}}^{M}={\lambda}_{j}{z}_{j}^{M}+\overline{{\lambda}_{j}{z}_{j}^{M}}\\ =\frac{1}{2}{c}_{j}{e}^{i{\varphi}_{j}}\phantom{\rule{0.16667em}{0ex}}{r}_{j}^{M}{e}^{i2\pi {\theta}_{j}M}+\frac{1}{2}{c}_{j}{e}^{-i{\varphi}_{j}}\phantom{\rule{0.16667em}{0ex}}{r}_{j}^{M}{e}^{-i2\pi {\theta}_{j}M}\\ =\frac{1}{2}{c}_{j}{r}_{j}^{M}\left({e}^{i(2\pi {\theta}_{j}M+{\varphi}_{j})}+{e}^{-i(2\pi {\theta}_{j}M+{\varphi}_{j})}\right)\\ ={c}_{j}{r}_{j}^{M}cos(2\pi {\theta}_{j}M+{\varphi}_{j})\end{array}$$

Thus, the contribution from the complex roots (those that are not real) is

$$\sum _{j\mid {z}_{j}\in {R}^{u}}{\lambda}_{j}{z}_{j}^{M}+\sum _{j\mid {z}_{j}\in {R}^{l}}{\lambda}_{j}{z}_{j}^{M}=\sum _{j={b}_{-}+1}^{{b}_{u}}{c}_{i}{r}_{j}^{M}cos(2\pi {\theta}_{i}M+{\varphi}_{i}).$$

Combining the three parts we get

$$\begin{array}{l}C(M)=\sum _{j\mid {z}_{j}\in {R}^{+}}{\lambda}_{j}{z}_{j}^{M}+\sum _{j\mid {z}_{j}\in {R}^{-}}{\lambda}_{j}{z}_{j}^{M}+\sum _{j\mid {z}_{j}\in {R}^{u}}{\lambda}_{j}{z}_{j}^{M}+\sum _{j\mid {z}_{j}\in {R}^{l}}{\lambda}_{j}{z}_{j}^{M}\\ ={c}_{0}{r}_{0}^{M}+\sum _{j=1}^{{b}_{-}}{c}_{i}{r}_{j}^{M}cos(2\pi {\theta}_{i}M+{\varphi}_{i})+\sum _{j={b}_{-}+1}^{{b}_{u}}{c}_{i}{r}_{j}^{M}cos(2\pi {\theta}_{i}M+{\varphi}_{i})\\ ={c}_{0}{r}_{0}^{M}+\sum _{j=1}^{k}{c}_{i}{r}_{j}^{M}cos(2\pi {\theta}_{i}M+{\varphi}_{i}),\end{array}$$

where *k* = *b _{u}*.

To complete this proof, we need to verify that all of the constants are in the correct ranges; i.e. that they satisfy the standard conditions. The *c _{j}*’s (

The only difference between Theorem 6 (Explicit description of sequence counting function – simple form), above, and Theorem 1. (General Solution to Sequence Counting Problem) is that the latter also tells us that *r*_{0} ≥ *r* for all *j* ≥ 1. This is the topic of Theorem 14 (Maximality of positive root), found at the end of this section. The rest of this section contains technical lemmas for use in the proof. The intuition of the proof is that if the periodic terms dominate the real term then there must be a point at which the result is negative, which is a contradiction of the fact that the function is a counting function (i.e., must map non-negative integers to non-negative integers). First, we show that the sum of cosines with rational period over a special set of indices is zero.

Let *θ* be a positive rational number given in reduced form by (*a*/*b*). Further, assume that *b*> 2. Let *k* and *w* be natural numbers and *ϕ* a real number. Then

$$\sum _{m=w+1}^{w+kb}cos(2\pi \phantom{\rule{0.16667em}{0ex}}\theta \phantom{\rule{0.16667em}{0ex}}m+\varphi )=0.$$

First, fix integers *k* and *w* and real number *ϕ*. Now note that *a* and *b* are relatively prime. This means that for every value *j* = 0,1,…, *b* − 1, there exists an *m* [*w* +1, *w* + *b*] such that *am* *j* mod *b*. Therefore,

$$\begin{array}{l}\sum _{m=w+1}^{w+b}cos(2\pi \phantom{\rule{0.16667em}{0ex}}\theta \phantom{\rule{0.16667em}{0ex}}m+\varphi )=\sum _{m=w+1}^{w+b}Re({e}^{2\pi i\theta m+\varphi})\\ =Re\phantom{\rule{0.16667em}{0ex}}\left(\sum _{m=w+1}^{w+b}{e}^{2\pi i\frac{a}{b}m+\varphi}\right)\\ =Re\phantom{\rule{0.16667em}{0ex}}\left({e}^{w+\varphi}\sum _{m=1}^{b}{e}^{2\pi i\frac{m}{b}}\right)\end{array}$$

Note that the last term is the sum of the bth roots of unity. Let *ω*_{1}, …, *ω _{b}* be the bth roots of unity. We want to show that the sum of the bth roots of unity is always zero. Note that these roots are all of the solutions to

$$\prod _{j=1}^{b}(x-{\omega}_{j})={x}^{b}-1$$

Of particular interest, however, is the term of order *b*−1 : its coefficient is the sum of all of the roots. Since this coefficient is zero, then the sum of the roots is zero.

Finally, to prove our lemma, we need only add several terms that add to zero:

$$\begin{array}{l}\sum _{m=w+1}^{w+kb}cos(2\pi \theta m+\varphi )=\sum _{m=w+1}^{w+b}cos(2\pi \theta m+\varphi )+\sum _{m=(w+b)+1}^{w+2b}cos(2\pi \theta m+\varphi )+\dots +\sum _{m=(w+(k-1)b)+1}^{w+kb}cos(2\pi \theta m+\varphi )\\ =0\end{array}$$

For simplification in the rest of this section we need to add additional notation. We denote the set of indexes of roots of the characteristic polynomial that have magnitude *r*, {*j* | *r _{j}* =

The following lemma states that the sum of *t _{i}* (

Suppose *θ* is rational for every *j*
and
$f(M)={\displaystyle \sum _{j\in {\mathcal{M}}_{r}}}{c}_{j}cos(2\pi \phantom{\rule{0.16667em}{0ex}}{\theta}_{j}\phantom{\rule{0.16667em}{0ex}}M+{\varphi}_{j})$. Then either *f* is identically zero on the integers or there exists δ > 0 and integers *w* and such that *f* (*w* + *k*) = −*δ* for *k*=1, 2,….

Since *θ _{j}* is rational, it can be represented by
$\frac{{a}_{j}}{{b}_{j}}$ (assume that this is reduced form). Let =

$$\begin{array}{l}f(M+\ell )=\sum _{j\in {\mathcal{M}}_{r}}{t}_{j}(M+\ell )\\ =\sum _{j\in {\mathcal{M}}_{r}}{c}_{j}cos(2\pi {\theta}_{j}M+2\pi {\theta}_{j}\ell +{\varphi}_{j})\\ =\sum _{j\in {\mathcal{M}}_{r}}{c}_{j}cos(2\pi {\theta}_{j}M+{\varphi}_{j})\\ =f(M)\end{array}$$

In other words, *f* is periodic with a period that divides .

Now, by Lemma 7 (Sum of rational period cosine is zero) we know that, for all integers w and k

$$\sum _{M=w+1}^{w+{kb}_{i}}{t}_{j}(M)=0.$$

In particular, because *b _{j}* divides , we know that for all

$$\sum _{M=w+1}^{w+\ell}{t}_{j}(M)=0$$

Therefore,

$$\begin{array}{l}\sum _{M=w+1}^{w+\ell}f(M)=\sum _{M=w+1}^{w+\ell}\sum _{j\in {\mathcal{M}}_{r}}{t}_{j}(M)\\ =\sum _{j\in {\mathcal{M}}_{r}}\sum _{M=w+1}^{w+\ell}{t}_{j}(M)\\ =0\end{array}$$

If *f* is identically zero on the integers then we are done. Otherwise, it follows that there exists an integer, w, for which *f* (*w*) = −*δ*< 0. However, *f* is periodic with a period that divides . Therefore we know that *w* + , *w*+ 2, … is an infinite sequence of integers such that *f* (*w*) = −*δ*< 0.

The following lemma extends the previous one by assuming that the members of are all related to each other by a complex constant.

Define
$f(M)={\displaystyle \sum _{j\in {\mathcal{M}}_{r}}}{t}_{j}(M)$, where *t _{j}* is defined as previously. Suppose

For every *j*
, we define *a _{j}* and

$$\begin{array}{l}g(x+1)=f\phantom{\rule{0.16667em}{0ex}}\left(\frac{x}{\alpha}+\frac{1}{\alpha}\right)\\ =f\phantom{\rule{0.16667em}{0ex}}\left(\frac{x}{\alpha}+\frac{\ell}{s}\right)\\ =\sum _{j\in {\mathcal{M}}_{r}}{c}_{j}cos\phantom{\rule{0.16667em}{0ex}}\left(2\pi {\theta}_{j}\phantom{\rule{0.16667em}{0ex}}\left(\frac{x}{\alpha}+\frac{\ell}{s}\right)+{\varphi}_{j}\right)\\ =\sum _{j\in {\mathcal{M}}_{r}}{c}_{j}cos\phantom{\rule{0.16667em}{0ex}}\left(2\pi {\theta}_{j}\phantom{\rule{0.16667em}{0ex}}\left(\frac{x}{\alpha}\right)+{\varphi}_{j}+2\pi \frac{{a}_{j}\ell}{{b}_{j}}\right)\end{array}$$

Note that
$\frac{\ell}{{b}_{j}}$ is an integer by the definition of . Therefore, the term
$2\pi \frac{{a}_{j}\ell}{{b}_{j}}$ is an integral multiple of 2*π* and can be ignored when inside the cosine function. Thus,

$$\begin{array}{l}g(x+1)=\sum _{j\in {\mathcal{M}}_{r}}{c}_{j}cos\phantom{\rule{0.16667em}{0ex}}\left(2\pi {\theta}_{j}\phantom{\rule{0.16667em}{0ex}}\left(\frac{x}{\alpha}\right)+{\varphi}_{j}\right)\\ =g(x)\end{array}$$

In other words, $f\phantom{\rule{0.16667em}{0ex}}\left(\frac{x}{\alpha}\right)$ has period 1.

Next we prove the second conclusion, that there exists a real number ε > 0 and a nonempty open interval *I* [0,1] such that
$x\in I\Rightarrow f\phantom{\rule{0.16667em}{0ex}}\left(\frac{x}{\alpha}\right)<-\epsilon $. Recall that we proved the first conclusion by showing that each cosine term in the sum defining g had an integral number of cycles for x between 0 and 1. This implies

$$\begin{array}{l}{\int}_{0}^{1}g(x)dx=\sum _{j\in {\mathcal{M}}_{r}}{\int}_{0}^{1}{c}_{j}cos(2\pi {\theta}_{j}M+{\varphi}_{j})dx\\ =0\end{array}$$

However, because *f* is not identically zero, neither is *g*. This fact, combined with a zero integral, means that *g* must be negative for some *w* (0,1). Define
$\epsilon =-\frac{1}{2}g(w)$. Being a finite sum of continuous functions, *g* inherits continuity from the cosines. Therefore, there exists an open interval I containing w such that *x* *I* *g*(*x*) < − *ε*, satisfying the second conclusion of the lemma.

The following theorem will allow us to create a form of the previous lemmas without restrictions on the members of (see the subsequent lemma).

If the numbers 1, *α*_{1},*α*_{2}, …, *α _{k}* are linearly independent over the rational numbers then for any

$$\mathrm{\Gamma}=\left\{n\in {\mathbb{Z}}^{+}\mid n{\alpha}_{j}\phantom{\rule{0.16667em}{0ex}}mod1\in {I}_{j};j=1,2,\dots ,k\right\}\phantom{\rule{0.16667em}{0ex}}is\phantom{\rule{0.16667em}{0ex}}\mathit{infinite}.$$

A more general version of this theorem can be found as Theorem 3.13 of [10], where it is attributed to Hardy and Littlewood [11].

Recall that that *c _{j}*,

$$\begin{array}{l}{c}_{j}>0,\\ 0<{\theta}_{j}\le \frac{1}{2},\phantom{\rule{0.16667em}{0ex}}\phantom{\rule{0.16667em}{0ex}}\phantom{\rule{0.16667em}{0ex}}\text{and}\\ -\pi <{\varphi}_{j}\le \pi .\end{array}$$

All of the previous lemmas in this section are used in this lemma. It states that the sum of the *t _{i}* (

Assume that *c _{j}*,

$$\begin{array}{l}f(M)=\sum _{i\in {\mathcal{M}}_{r}}{c}_{i}cos(2\pi \phantom{\rule{0.16667em}{0ex}}{\theta}_{i}\phantom{\rule{0.16667em}{0ex}}M+{\varphi}_{i})\\ =\sum _{i\in {\mathcal{M}}_{r}}{t}_{i}(M)\end{array}$$

Then either *f* is identically zero on integers or there exists *δ* > 0 and an infinite sequence of positive integers *M*_{1}, *M*_{2}, … such that *f* (*M _{}*) < −

It is easier to prove this in three steps:

- Rational case:
*i**θ*_{i} - Irrational case:
*i**θ*_{i} - Mixed case: There exists
*i*,*j*such that*θ*and_{i}*θ*_{j}

Rational case: *i*
*θ _{i}* .

Define *f* as
$f(M)={\displaystyle \sum _{i\in {\mathcal{M}}_{r}}}{c}_{i}cos(2\pi \phantom{\rule{0.16667em}{0ex}}{\theta}_{i}\phantom{\rule{0.16667em}{0ex}}M+{\varphi}_{i})$. By Lemma 8 (Functions with rational period repeat negative values) *f* is either zero on the integers or there exists *δ*> 0 and integers *w* and such that *f* (*w* + *k*) = −*δ* for *k* = 1,2,…. Note that δ/2 and the sequence of integers *w* +, *w*+2, … satisfy the conclusion of the theorem.

Irrational case: *i*
*θ _{i}* .

We can decompose
into k equivalence classes,
${\mathcal{M}}_{r}^{1},{\mathcal{M}}_{r}^{2},\dots ,{\mathcal{M}}_{r}^{k}$, of indexes representing frequencies that are linear combinations of each other over the rational numbers. Select a single member from each of the equivalence classes:
${i}_{j}\in {\mathcal{M}}_{r}^{j}$ and define *s _{j}* =

To show this, we start with an *n* Γ.

$$\begin{array}{l}f(n)=\sum _{j=1}^{k}{f}_{j}\phantom{\rule{0.16667em}{0ex}}(n)\\ =\sum _{j=1}^{k}{g}_{j}\phantom{\rule{0.16667em}{0ex}}\left(n{\alpha}_{j}\right)\end{array}$$

By the definition of Γ, our selection of n means that, for each of *j*=1,2,…, (*nα _{j}* mod1)

$$\delta =\sum _{j=1}^{k}{\epsilon}_{j},$$

we have the following inequality:

$$\begin{array}{l}f\phantom{\rule{0.16667em}{0ex}}(n)=\sum _{j=1}^{k}{g}_{j}\phantom{\rule{0.16667em}{0ex}}\left({\alpha}_{j}n\right)\\ <\sum _{j=1}^{k}-{\epsilon}_{j}\\ <-\delta \end{array}$$

Recall that this holds for every *n* Γ, an infinite set, thus proving this case.

Mixed case: There exists *i*, *j*
such that *θ _{i}* and

We decompose in the same way as in Case 2. However, exactly one of the decompositions includes indices of rational frequencies; call this member of the decomposition ${\mathcal{M}}_{r}^{0}$. We decompose f as well:

$$\begin{array}{l}f(M)=\sum _{j=0}^{k}{f}_{j}(M),\\ {f}_{j}(M)=\sum _{i\in {\mathcal{M}}_{r}^{j}}{t}_{i}(M)\end{array}$$

From Case 1 we know that there exists *δ*_{0} > 0 and integers *w*_{0} and _{0} such that, for every integer *n*,

$${f}_{0}({w}_{0}+n{\ell}_{0})=-{\delta}_{0}.$$

Now we remove
${\mathcal{M}}_{r}^{0}$ from the set of
(calling the result
) and rescale *f* :

$${f}^{\prime}(x)=f({w}_{0}+x{\ell}_{0})-{f}_{0}({w}_{0}+x{\ell}_{0}).$$

Because *f* ′ only contains cosines of irrational period, it satisfies the conditions of Case 2; we conclude that there exists a positive number *ε* ′ and an infinite set, Γ′ ={*n* ′ ^{+} | *f*′(*n*′) < −*ε*′}

Select an *n*′ Γ′ and define

$$\delta ={\epsilon}_{0}+{\epsilon}^{\prime}$$

Let *n* = *w*_{0} + *n*′ _{0}. Then

$$\begin{array}{ccccc}f(n)& =& {f}_{0}(n)& +& f(n)-{f}_{0}(n)\\ \phantom{\rule{0.16667em}{0ex}}& =& {f}_{0}(n)& +& f({w}_{0}+{n}^{\prime}{\ell}_{0})-{f}_{0}({w}_{0}+{n}^{\prime}{\ell}_{0})\\ \phantom{\rule{0.16667em}{0ex}}& =& {f}_{0}(n)& +& {f}^{\prime}({n}^{\prime})\\ \phantom{\rule{0.16667em}{0ex}}& <& -{\epsilon}_{0}& +& -{\epsilon}^{\prime}\\ \phantom{\rule{0.16667em}{0ex}}& <& -\delta & \phantom{\rule{0.16667em}{0ex}}& \phantom{\rule{0.16667em}{0ex}}\end{array}$$

This completes our proof of Case 3 and, thus, the lemma.

Using the previous lemma we are now able to put exponential bounds on the size of negative contributions of Equation 2.

Suppose
$f(M)={\displaystyle \sum _{j=1}^{k}}{c}_{j}{r}_{j}^{M}cos(2\pi {\theta}_{j}M+{\varphi}_{j})$ where *c _{j}*,

(This lemma is identical to Lemma 11 (Sum is negative infinitely often), except for the exponential term in both the definition of *f* and the conclusion.)

Recall the notation
={*j* | *r _{j}* =

Define
${r}_{-}=\underset{j\notin {\mathcal{M}}_{{r}_{max}}}{max}\{{r}_{j}\}$ and
${c}_{-}=\underset{j\notin {\mathcal{M}}_{{r}_{max}}}{max}\{{c}_{j}\}$ and note that the contribution of all the maximal terms towards *f* is

$$\sum _{i\in {\mathcal{M}}_{{r}_{max}}}{r}_{i}^{M}{t}_{i}(M)={r}_{max}^{M}\sum _{i\in {\mathcal{M}}_{{r}_{max}}}{t}_{i}(M)$$

(where *t _{i}* is the parameterized cosine function, as defined previously). By Lemma 11 (Sum is negative infinitely often) we know that there exists a

$$\sum _{j\in {\mathcal{M}}_{{r}_{max}}}{t}_{j}({M}_{\ell})<-2\delta \phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\ell =1,2,\dots $$

Thus,

$$\begin{array}{l}\sum _{j\in {\mathcal{M}}_{{r}_{max}}}{r}_{j}^{{M}_{\ell}}{t}_{j}({M}_{\ell})={r}_{max}^{{M}_{\ell}}\sum _{j\in {\mathcal{M}}_{{r}_{max}}}{t}_{j}({M}_{\ell})\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\ell =1,2,\dots \\ <-2{r}_{max}^{{M}_{\ell}}\delta \end{array}$$

Therefore, given a real number *u* > 0, all integers *M**> *u* from the sequence above satisfy
$\sum _{i\in {M}_{{r}_{max}}}}{r}_{i}^{M}{t}_{i}(M)<-\delta {r}_{max}^{{M}^{\ast}$. We select a particular u:

$$u=\frac{log\phantom{\rule{0.16667em}{0ex}}\left(\frac{{c}_{-}}{\delta}\right)}{log\phantom{\rule{0.16667em}{0ex}}\left(\frac{{r}_{max}}{{r}_{-}}\right)}+1$$

By solving for (*r*_{max})* ^{u}* in the definition of u above we find that

$$\begin{array}{l}{({r}_{max})}^{u}=\frac{{c}_{-}}{\delta}{r}_{-}^{u-1}{r}_{max}\\ >\frac{{c}_{-}}{\delta}{r}_{-}^{u}\end{array}$$

or

$$1-\frac{\delta}{{c}_{-}}\phantom{\rule{0.16667em}{0ex}}{\left(\frac{{r}_{max}}{{r}_{-}}\right)}^{u}<0$$

We apply this inequality to get

$$\begin{array}{l}f({M}^{\ast})=\sum _{i\notin {\mathcal{M}}_{r}}{r}_{i}^{{M}^{\ast}}{t}_{i}({M}^{\ast})+\sum _{i\in {\mathcal{M}}_{r}}{r}_{i}^{{M}^{\ast}}{t}_{i}({M}^{\ast})\\ <{c}_{-}{r}_{-}^{{M}^{\ast}}-2\delta {r}_{max}^{{M}^{\ast}}\\ ={c}_{-}{r}_{-}^{{M}^{\ast}}\phantom{\rule{0.16667em}{0ex}}\left(1-\frac{\delta}{{c}_{-}}\phantom{\rule{0.16667em}{0ex}}{\left(\frac{{r}_{max}}{{r}_{-}}\right)}^{{M}^{\ast}}\right)-\delta {r}_{max}^{{M}^{\ast}}\\ ={c}_{-}{r}_{-}^{{M}^{\ast}}\phantom{\rule{0.16667em}{0ex}}\left(1-\frac{\delta}{{c}_{-}}\phantom{\rule{0.16667em}{0ex}}{\left(\frac{{r}_{max}}{{r}_{-}}\right)}^{{M}^{\ast}-u}\phantom{\rule{0.16667em}{0ex}}{\left(\frac{{r}_{max}}{{r}_{-}}\right)}^{u}\right)-\delta {r}_{max}^{{M}^{\ast}}\\ <{c}_{-}{r}_{-}^{{M}^{\ast}}\phantom{\rule{0.16667em}{0ex}}\left(1-\frac{\delta}{{c}_{-}}\phantom{\rule{0.16667em}{0ex}}{\left(\frac{{r}_{max}}{{r}_{-}}\right)}^{u}\right)-\delta {r}_{max}^{{M}^{\ast}}\\ <-\delta {r}_{max}^{{M}^{\ast}}\end{array}$$

The following lemma says that the coefficient of the purely exponential term is positive and follows easily from the previous lemma.

Let *C* be the sequence counting function for
. Assume that *C* is not identically zero. Suppose that
$C(M)={c}_{0}{r}_{0}^{M}+{\displaystyle \sum _{j=1}^{k}}{c}_{j}{r}_{j}^{M}cos(2\pi {\theta}_{j}M+{\varphi}_{j})$ and that *c _{j}*,

By Lemma 12 (Sum of non-positive terms is negative infinitely often), we know that there exists an integer *M* ′ and a real number *δ*> 0 such that
$\sum _{i=1}^{k}}{c}_{i}{r}_{i}^{{M}^{\prime}}cos(2\pi {\theta}_{i}{M}^{\prime}+{\varphi}_{i})<-\delta {r}_{max}^{{M}^{\prime}$. This implies that

$$\begin{array}{l}C({M}^{\prime})={c}_{0}{r}_{0}^{{M}^{\prime}}+\sum _{j=1}^{k}{c}_{j}{r}_{j}^{{M}^{\prime}}cos(2\pi {\theta}_{j}{M}^{\prime}+{\varphi}_{j})\\ <{c}_{0}{r}_{0}^{{M}^{\prime}}\end{array}$$

In particular, if *c*_{0} ≤ 0 then *C*(*M*′) < 0. Since *C* is a counting function, this is a contradiction arising from the assumption that *c*_{0} ≤ 0. Thus, *c*_{0} > 0.

The following theorem describes the closed form of a sequence counting function. However, this theorem applies to the “simple form”; i.e. the case where the characteristic polynomial has distinct roots. If it does not have distinct roots then the solution involves the product of polynomials and exponentials in *M* inside the summation, instead of simply exponentials inside the summation.

Let *C* be the sequence counting function for
. Assume that *C* is not identically zero. Suppose that
$C(M)={c}_{0}{r}_{0}^{M}+{\displaystyle \sum _{j=1}^{k}}{c}_{j}{r}_{j}^{M}cos(2\pi {\theta}_{j}M+{\varphi}_{j})$ and that *c _{j}*,

Assume that this is not the case, that the positive root is not maximal. We will show that this leads to a contradiction, namely that we can find an *M* such that the counting function *C*(*M*) is negative.

Let ${r}_{max}=\underset{j}{max}\{{r}_{j}\}$. The statement that the positive root is not maximal is equivalent to stating that the index of the positive root, 0, is not included in (notation defined at the beginning of this subsection).

Let

$$f(M)=\sum _{j=1}^{k}{c}_{j}{r}_{j}^{M}cos(2\pi {\theta}_{j}M+{\varphi}_{j}).$$

Then, by Lemma 12 (Sum of non-positive terms is negative infinitely often) we may conclude that there exists *δ*> 0 and *M*_{1} < *M*_{2} < … such that
$f({M}_{\ell})<-\delta {r}_{max}^{{M}_{\ell}}$ for =1,2,….

Therefore, given a real number *u*> 0, there exists an *M* *> *u* from our list such that
$f({M}^{\ast})<-\delta {r}_{max}^{{M}^{\ast}}$. We select a particular u:

$$u=\frac{log\phantom{\rule{0.16667em}{0ex}}\left(\frac{{c}_{0}}{\delta}\right)}{log\phantom{\rule{0.16667em}{0ex}}\left(\frac{{r}_{max}}{{r}_{0}}\right)}+1$$

By solving for
${r}_{max}^{u}$ in the definition of *u* and using the assumption that
$\frac{{r}_{max}^{u}}{{r}_{0}}>1$

we find that

$$\begin{array}{l}{r}_{max}^{u}=\frac{{c}_{0}}{\delta}{r}_{0}^{u-1}{r}_{max}\\ >\frac{{c}_{0}}{\delta}{r}_{0}^{u}\end{array}$$

or

$$1-\frac{\delta}{{c}_{0}}\phantom{\rule{0.16667em}{0ex}}{\left(\frac{{r}_{max}}{{r}_{0}}\right)}^{u}<0$$

We apply this inequality to get

$$\begin{array}{l}C({M}^{\ast})={c}_{0}{r}_{0}^{{M}^{\ast}}+f({M}^{\ast})\\ <{c}_{0}{r}_{0}^{{M}^{\ast}}-\delta {r}_{max}^{{M}^{\ast}}\\ ={c}_{0}{r}_{0}^{{M}^{\ast}}\phantom{\rule{0.16667em}{0ex}}\left(1-\frac{\delta}{{c}_{0}}\phantom{\rule{0.16667em}{0ex}}{\left(\frac{{r}_{max}}{{r}_{0}}\right)}^{{M}^{\ast}}\right)\\ ={c}_{0}{r}_{0}^{{M}^{\ast}}\phantom{\rule{0.16667em}{0ex}}\left(1-\frac{\delta}{{c}_{0}}\phantom{\rule{0.16667em}{0ex}}{\left(\frac{{r}_{max}}{{r}_{0}}\right)}^{{M}^{\ast}-u}\phantom{\rule{0.16667em}{0ex}}{\left(\frac{{r}_{max}}{{r}_{0}}\right)}^{u}\right)\\ <{c}_{0}{r}_{0}^{{M}^{\ast}}\phantom{\rule{0.16667em}{0ex}}\left(1-\frac{\delta}{{c}_{0}}\phantom{\rule{0.16667em}{0ex}}{\left(\frac{{r}_{max}}{{r}_{0}}\right)}^{u}\right)\\ <0\end{array}$$

Since *C* is a counting function, it must be nonnegative; the assumption that the positive root was not maximal has led to a contradiction. Therefore, the positive root is maximal.

Finally this leads to the main theorem:

Lemma 4 (Solving the recurrence relation – distinct roots) tells us specifically that *C* satisfies a recurrence relation. Theorem 6 (Explicit description of sequence counting function – simple form) gives us most of the relationship but Theorem 14 (Maximality of positive root) shows that the single purely exponential term dominates the rest, completing the proof.

It is possible to remove the restrictions on the recurrence relation (unique roots of the characteristic polynomial). We extend Theorem 3 (General solution to a linear homogenous difference equation), Theorem 6 (Explicit description of sequence counting function – simple form), Lemma 12 (Sum of non-positive terms is negative infinitely often), and Theorem 14 (Maximality of positive root).

Suppose *C* satisfies the linear homogenous difference equation (Equation 4). *for all positive integers M. Let p*(*x*) = *a _{n}x^{n}* +

$$C(M)=\sum _{j=1}^{n}{p}_{j}(M){z}_{j}^{M}.$$

This extension (and how to find the polynomials) can be found in [6].

Let *C*, a function of mass *M*, count how many sequences of a finite set of masses have a combined mass of *M*. Then *C* satisfies a mass recurrence relation. Then there exist real constants *k*, *c*_{0}, …, *c _{k}*,

$$C(M)={c}_{0}{r}_{0}^{M}+\sum _{j=1}^{k}{p}_{j}(M){r}_{j}^{M}cos(2\pi {\theta}_{j}M+{\varphi}_{j}).$$

The proof is nearly identical to that of Theorem 6 (Explicit description of sequence counting function – simple form) except that we need to use Theorem 15 (Extension of General solution to a linear homogenous difference equation) in place of Theorem 3 (General solution to a linear homogenous difference equation) and use slightly modified versions of Lemma 4 (Solving the recurrence relation – distinct roots) and Lemma 5 (Eigenvalues of conjugate pairs are conjugates), substituting polynomials for constants.

Suppose
$f(M)={\displaystyle \sum _{j=1}^{k}}{c}_{j}{p}_{j}(M){r}_{j}^{M}cos(2\pi {\theta}_{j}M+{\varphi}_{j})$ where *c _{j}*,

Define
${c}_{max}=\underset{j}{max}\{{c}_{j}\}$. Using *t _{i}*, the parameterized cosine function, as defined previously, we can rewrite

$$\begin{array}{l}f(M)=\sum _{i=1}^{k}{p}_{i}(M){r}_{i}^{M}{c}_{i}cos(2\pi {\theta}_{j}M+{\varphi}_{j})\\ =\sum _{i=1}^{k}{p}_{i}(M){r}_{i}^{M}{t}_{i}(M)\\ =\sum _{\left\{i\in {\mathcal{M}}_{{r}_{max}}\mid {n}_{i}={n}_{max}\right\}}{M}^{{n}_{max}}\phantom{\rule{0.16667em}{0ex}}{r}_{i}^{M}{t}_{i}(M)+\sum _{\left\{i\in {\mathcal{M}}_{{r}_{max}}\mid {n}_{i}={n}_{max}\right\}}\left({p}_{i}(M)-{M}^{{n}_{max}}\right){r}_{i}^{M}{t}_{i}(M)+\sum _{\left\{i\in {\mathcal{M}}_{{r}_{max}}\mid {n}_{i}<{n}_{max}\right\}}{p}_{i}(M){r}_{i}^{M}{t}_{i}(M)+\sum _{i\in \phantom{\rule{0.16667em}{0ex}}{\mathcal{M}}_{{r}_{max}}}{p}_{i}(M){r}_{i}^{M}{t}_{i}(M)\\ ={f}_{1}(M)+{f}_{2}(M)+{f}_{3}(M)+{f}_{4}(M)\end{array}$$

where

$$\begin{array}{l}{f}_{1}(M)=\sum _{\left\{i\in {\mathcal{M}}_{{r}_{max}}\mid {n}_{i}={n}_{max}\right\}}{M}^{{n}_{max}}\phantom{\rule{0.16667em}{0ex}}{r}_{i}^{M}{t}_{i}(M)\\ {f}_{2}(M)=\sum _{\left\{i\in {\mathcal{M}}_{{r}_{max}}\mid {n}_{i}={n}_{max}\right\}}\left({p}_{i}(M)-{M}^{{n}_{max}}\right){r}_{i}^{M}{t}_{i}(M)\\ {f}_{3}(M)=\sum _{\left\{i\in {\mathcal{M}}_{{r}_{max}}\mid {n}_{i}<{n}_{max}\right\}}{p}_{i}(M){r}_{i}^{M}{t}_{i}(M)\\ {f}_{4}(M)=\sum _{i\notin \phantom{\rule{0.16667em}{0ex}}{\mathcal{M}}_{{r}_{max}}}{p}_{i}(M){r}_{i}^{M}{t}_{i}(M)\end{array}$$

The first three terms all have the same *r _{i}*, namely

The first term, *f*_{1}, is the dominant term and can be rewritten as

$$\begin{array}{l}{f}_{1}(M)=\sum _{\left\{i\in {\mathcal{M}}_{{r}_{max}}\mid {n}_{i}={n}_{max}\right\}}{M}^{{n}_{max}}\phantom{\rule{0.16667em}{0ex}}{r}_{i}^{M}{t}_{i}(M)\\ =\sum _{\left\{i\in {\mathcal{M}}_{{r}_{max}}\mid {n}_{i}={n}_{max}\right\}}{M}^{{n}_{max}}\phantom{\rule{0.16667em}{0ex}}{r}_{max}^{M}{t}_{i}(M)\\ ={M}^{{n}_{max}}\phantom{\rule{0.16667em}{0ex}}{r}_{max}^{M}\sum _{\left\{i\in {\mathcal{M}}_{{r}_{max}}\mid {n}_{i}={n}_{max}\right\}}{t}_{i}(M)\end{array}$$

By Lemma 11 (Sum is negative infinitely often) we know that there exists a *δ* > 0 and an infinite sequence of integers, *M*_{1} < *M*_{2} < … such that

$$\sum _{j\in {\mathcal{M}}_{{r}_{max}}}{t}_{j}({M}_{\ell})<-4\delta \phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\ell =1,2,\dots $$

Thus,

$$\begin{array}{l}\sum _{j\in {\mathcal{M}}_{{r}_{max}}}{M}_{\ell}^{{n}_{max}}{r}_{j}^{{M}_{\ell}}{t}_{j}({M}_{\ell})={M}_{\ell}^{{n}_{max}}{r}_{max}^{{M}_{\ell}}\sum _{j\in {\mathcal{M}}_{{r}_{max}}}{t}_{j}({M}_{\ell})\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\ell =1,2,\dots \\ <-4{M}_{\ell}^{{n}_{max}}{r}_{max}^{{M}_{\ell}}\delta \end{array}$$

Therefore, given a real number *u* > 0, all integers *M*^{*} > *u* from the sequence above satisfy

$$\begin{array}{l}{f}_{1}({M}^{\ast})=\sum _{i\in {M}_{{r}_{max}}}{r}_{i}^{{M}^{\ast}}{t}_{i}({M}^{\ast})\\ <-4{M}^{\ast {n}_{max}}\phantom{\rule{0.16667em}{0ex}}{r}_{max}^{{M}^{\ast}}\delta .\end{array}$$

We now want to select a particular *u* that makes the non-dominant terms *f*_{2}, *f*_{3}, *f*_{4}, sufficiently small compared to *f*_{1}.

First, note that *f*_{2} differs from *f*_{1} only in the polynomial factors (*p _{i}* (

$$\begin{array}{l}{f}_{2}(M)=\sum _{\left\{i\in {\mathcal{M}}_{{r}_{max}}\mid {n}_{i}={n}_{max}\right\}}\left({p}_{i}(M)-{M}^{{n}_{max}}\right){r}_{i}^{M}{t}_{i}(M)\\ <\sum _{\left\{i\in {\mathcal{M}}_{{r}_{max}}\mid {n}_{i}={n}_{max}\right\}}\left({p}_{i}(M)-{M}^{{n}_{max}}\right){r}_{i}^{M}{c}_{max}\\ <\sum _{\left\{i\in {\mathcal{M}}_{{r}_{max}}\mid {n}_{i}={n}_{max}\right\}}\frac{\delta}{k}{M}^{{n}_{max}}\phantom{\rule{0.16667em}{0ex}}{r}_{max}^{M}\\ <\delta {M}^{{n}_{max}}\phantom{\rule{0.16667em}{0ex}}{r}_{max}^{M}\end{array}$$

Similarly, because of the maximality of *n*_{max}, there exists a *u*_{3} such that

$$\begin{array}{l}{f}_{3}(M)=\sum _{\left\{i\in {\mathcal{M}}_{{r}_{max}}\mid {n}_{i}<{n}_{max}\right\}}{p}_{i}(M){r}_{i}^{M}{t}_{i}(M)\\ <\delta {M}^{{n}_{max}}\phantom{\rule{0.16667em}{0ex}}{r}_{max}^{M}\end{array}$$

Finally, because of the maximality of *r*_{max}, there exists a *u*_{4} such that for all *M* > *u*_{3},

$$\begin{array}{l}{f}_{4}(M)=\sum _{i\notin \phantom{\rule{0.16667em}{0ex}}{\mathcal{M}}_{{r}_{max}}}{p}_{i}(M){r}_{i}^{M}{t}_{i}(M)\\ <\delta {M}^{{n}_{max}}\phantom{\rule{0.16667em}{0ex}}{r}_{max}^{M}\end{array}$$

We select a *u* = max(*u*_{2}, *u*_{3}, *u*_{4}).

We apply the properties of *u* and we get the following for every *M** > *u* where *M** {*M*_{1}, *M*_{2}, …}:

$$\begin{array}{l}f({M}^{\ast})={f}_{1}({M}^{\ast})+{f}_{1}({M}^{\ast})+{f}_{1}({M}^{\ast})+{f}_{1}({M}^{\ast})\\ <-4\delta {M}^{\ast {n}_{max}}\phantom{\rule{0.16667em}{0ex}}{r}_{max}^{{M}^{\ast}}+\delta {M}^{\ast {n}_{max}}\phantom{\rule{0.16667em}{0ex}}{r}_{max}^{{M}^{\ast}}+\delta {M}^{\ast {n}_{max}}\phantom{\rule{0.16667em}{0ex}}{r}_{max}^{{M}^{\ast}}+\delta {M}^{\ast {n}_{max}}\phantom{\rule{0.16667em}{0ex}}{r}_{max}^{{M}^{\ast}}\\ <-\delta {M}^{\ast {n}_{max}}\phantom{\rule{0.16667em}{0ex}}{r}_{max}^{{M}^{\ast}}\end{array}$$

Under the conditions of Theorem 16 (Extension of Theorem 6), *r*_{0} ≥ *r _{j}* for

Assume that this is not the case, that the positive root is not maximal. We will show that this leads to a contradiction, namely that we can find an *M* such that the counting function *C*(*M*) is negative.

Let ${r}_{max}=\underset{j}{max}\{{r}_{j}\}$. The statement that the positive root is not maximal is equivalent to stating that the index of the positive root, 0, is not included in (notation defined at the beginning of this subsection).

Let

$$f(M)=\sum _{j=1}^{k}{p}_{j}(M){r}_{j}^{M}{c}_{j}cos(2\pi {\theta}_{j}M+{\varphi}_{j}).$$

Then, by Lemma 17 (Extension of Lemma 12) we may conclude that there exists *δ* > 0 and *M*_{1} < *M*_{2} < … such that
$f({M}_{\ell})<-\delta {M}_{\ell}^{{n}_{max}}{r}_{max}^{{M}_{\ell}}$ for =1,2,…. We now choose a particular member *M*^{*}, from our list with the property that

$${M}^{\ast}>max\phantom{\rule{0.16667em}{0ex}}\left\{1,\frac{log\phantom{\rule{0.16667em}{0ex}}\left(\frac{{c}_{0}}{\delta}\right)}{log\phantom{\rule{0.16667em}{0ex}}\left(\frac{{r}_{max}}{{r}_{0}}\right)}+1\right\}$$

By solving for
${r}_{max}^{{M}^{\ast}}$ in the definition of *u* and using the assumption that
$\frac{{r}_{max}}{{r}_{0}}>1$

we find that

$$\begin{array}{l}{r}_{max}^{{M}^{\ast}}>\frac{{c}_{0}}{\delta}{r}_{0}^{{M}^{\ast}-1}{r}_{max}\\ >\frac{{c}_{0}}{\delta}{r}_{0}^{{M}^{\ast}}\end{array}$$

or

$$1-\frac{\delta}{{c}_{0}}\phantom{\rule{0.16667em}{0ex}}{\left(\frac{{r}_{max}}{{r}_{0}}\right)}^{{M}^{\ast}}<0$$

We apply this inequality (and the fact that *M*^{*} > 1) to get

$$\begin{array}{l}C({M}^{\ast})={c}_{0}{r}_{0}^{{M}^{\ast}}+f({M}^{\ast})\\ <{c}_{0}{r}_{0}^{{M}^{\ast}}-\delta {r}_{max}^{{M}^{\ast}}{M}^{\ast {n}_{max}}\\ <{c}_{0}{r}_{0}^{{M}^{\ast}}-\delta {r}_{max}^{{M}^{\ast}}\\ ={c}_{0}{r}_{0}^{{M}^{\ast}}\phantom{\rule{0.16667em}{0ex}}\left(1-\frac{\delta}{{c}_{0}}\phantom{\rule{0.16667em}{0ex}}{\left(\frac{{r}_{max}}{{r}_{0}}\right)}^{{M}^{\ast}}\right)\\ <0\end{array}$$

Since *C* is a counting function, it must be nonnegative; the assumption that the positive root was not maximal has led to a contradiction. Therefore, the positive root is maximal.

Suppose we have an alphabet
={a_{1}, a_{2}, a_{3}, a_{4}, a_{5}} and define the masses *m*_{1} = *m*_{2} = *m*_{3} = 4 *and m*_{4} = *m*_{5} = 6. Then the mass recurrence relation is *C*(*M*) = 3*C*(*M* − 4) + 2 *C*(*M* − 6) and the characteristic polynomial is *p*(*x*) = *x*^{6} − 3*x*^{2} − 2. The roots of p are
$\pm \sqrt{2}$ and ±*i* (twice). The explicit form of *C* is

$$\begin{array}{l}C(M)={c}_{0}{\sqrt{2}}^{M}+{c}_{1}\phantom{\rule{0.16667em}{0ex}}{\left(-\sqrt{2}\right)}^{M}+{c}_{2}cos\phantom{\rule{0.16667em}{0ex}}\left(\frac{\pi}{2}M+{\varphi}_{2}\right)+{c}_{3}Mcos\phantom{\rule{0.16667em}{0ex}}\left(\frac{\pi}{2}M+{\varphi}_{3}\right)\\ ={c}_{0}{\sqrt{2}}^{M}+{c}_{1}\phantom{\rule{0.16667em}{0ex}}{\left(\sqrt{2}\right)}^{M}cos(\pi M+0)+{c}_{2}cos\phantom{\rule{0.16667em}{0ex}}\left(\frac{\pi}{2}M+{\varphi}_{2}\right)+{c}_{2}Mcos\phantom{\rule{0.16667em}{0ex}}\left(\frac{\pi}{2}M+{\varphi}_{3}\right).\end{array}$$

Note, in particular, the last term, which is linear in *M*; it is of the form *c*_{3}*q*(*M*)1* ^{M}* cos(

As an application of the above work, we analyze the mass distributions of peptides/proteins. First we recognize that the objects forming the alphabet are the residues of the amino acids. Proposition 2 (Mass recurrence relation) defines the relationship for this special case as

$$\begin{array}{l}C(M)=C(M-57)+C(M-71)+C(M-87)+C(M-97)+C(M-99)\\ +C(M-101)+C(M-103)+2C(M-113)+C(M-115)+C(M-131)\\ +C(M-137)+2C(M-128)+C(M-129)+C(M-147)+C(M-156)\\ +C(M-163)+C(M-186),\end{array}$$

where the numbers 57, 71, 87, …, are the masses of amino acid residues measured in Daltons, to the nearest integer. This allows us to analyze the mass distribution of all possible (theoretical) peptides with mass distributions of peptides that appear within a given organism (see Figure 1).

Next, we determine the characteristic polynomial,

$${x}^{186}-{x}^{129}-{x}^{115}-{x}^{99}-{x}^{89}-{x}^{87}-{x}^{85}-{x}^{83}-2{x}^{73}-{x}^{72}-{x}^{71}-2{x}^{58}-{x}^{57}-{x}^{55}-{x}^{49}-{x}^{39}-{x}^{30}-{x}^{23}-{x}^{0}$$

We can then approximate the roots (Figure 2) of the characteristic polynomial using a solver, such as the “roots” function in MATLAB [12]. In this case, as in all we have observed in biological applications, the characteristic polynomial has distinct roots. Therefore, we can apply Theorem 1. (General Solution to Sequence Counting Problem), yielding an exponential term summed with a collection of terms that contain a periodic function multiplied by an exponential function, the former dominating the latter (Figure 3). If we write the exponential terms in Equation 2 by powers of 2, then we can easily read the doubling time of each term. Furthermore, if we write the frequencies *θ _{j}* in terms of wavelength, then we can emphasize the period of each term. With these conventions, the first three terms are

$$C(M)=\frac{1}{95.238}{2}^{{\scriptstyle \frac{M}{24.67}}}+\frac{1}{47.04}{2}^{{\scriptstyle \frac{M}{27.53}}}cos\phantom{\rule{0.16667em}{0ex}}\left(1.032+\phantom{\rule{0.16667em}{0ex}}\left(\frac{360}{14.28}\right)M\right)+\frac{1}{45.41}{2}^{{\scriptstyle \frac{M}{32.67}}}cos\phantom{\rule{0.16667em}{0ex}}\left(13.676+\phantom{\rule{0.16667em}{0ex}}\left(\frac{360}{2.023}\right)M\right)+\dots $$

In total, there is one purely exponential term (which doubles in size every 24.67 Da) and 93 periodic terms we can consider, each of which gives a different periodicity and dominance in the distribution of masses. However, we found that the first few terms are sufficient to approximate the distribution of peptides, at least when rounding the amino acid masses to the nearest Dalton. In particular, the dominant periodic term has a period of 14.28 Da. Solutions for higher mass accuracy are possible by changing units. More examples and details are available in [5].

We expect that these findings could improve estimates of distributions, thus improving a class of algorithms that score peptide identifications in mass spectrometry [13,14,15], those that require knowledge of the number of possible peptide sequences within a particular mass range.

SLH was supported by NLM training grant Computation and Informatics in Biology and Medicine 5T15LM007359. GC and SLH were supported by NSF DBI-0701846.

1. Hubler SL, Jue A, Keith J, McAlister GC, Craciun G, et al. Valence parity renders z(center dot)-type ions chemically distinct. Journal of the American Chemical Society. 2008;130:6388–6394. [PMC free article] [PubMed]

2. Scigelova M, Makarov A. Orbitrap Mass Analyzer – Overview and Applications in Proteomics. Proteomics. 2006;6:16–21. [PubMed]

3. Coon JJ. Collisions or Electrons? Protein Sequence Analysis in the 21st Century. Analytical Chemistry. 2009;81:3208–3215. [PMC free article] [PubMed]

4. Tipton JDMY, Hendrickson CL, Marshall AG. Utility of the Valence Parity Rule for ECD/AIECD FT-ICR MS: H-dot Atom Transfer, Isobars, and Isomers for Peptide Analysis; 2008 December 7–10; Madison, WI.

5. Hubler SL. PhD Thesis. Madison, WI: University of Wisconsin-Madison; 2010. Mathematical Analysis of Mass Spectra Data.

6. Balakrishnan VK. Introductory discrete mathematics. New York: Dover Publications; 1996. p. xiv.p. 236.

7. Cormen TH. Introduction to algorithms. Cambridge, MA: The MIT Press; 2009.

8. Kelley WG, Peterson AC. Difference equations : an introduction with applications. Boston: Academic Press; 1991. p. xi.p. 455.

9. Hubler SL, Craciun G. Periodic Patterns in Distributions of Peptide Masses. 2011. [PubMed]

10. Bergelson V. Minimal idempotents and ergodic Ramsey theory. In: Bezuglyi S, Kolyada S, editors. Topics in Dynamics and Ergodic Theory. Cambridge University Press; 2003. pp. 8–39.

11. Hardy G, Littlewood J. Some problems of diophantine approximation (Part I) Acta Mathematica. 1914;37:155–191.

12. MATLAB. 7.4. Nattick, MA: The Mathworks Inc; 2007.

13. Alves G, Ogurtsov AY, Wu WW, Wang G, Shen RF, et al. Calibrating e-values for MS2 database search methods. Biology Direct. 2007;2 [PMC free article] [PubMed]

14. Kim S, Bandeira N, Pevzner PA. Spectral Profiles, a Novel Representation of Tandem Mass Spectra and Their Applications for de Novo Peptide Sequencing and Identification. Mol Cell Proteomics. 2009;8:1391–1400. [PMC free article] [PubMed]

15. Geer LY, Markey SP, Kowalak JA, Wagner L, Xu M, et al. Open mass spectrometry search algorithm. Journal of Proteome Research. 2004;3:958–964. [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |