Home | About | Journals | Submit | Contact Us | Français |

**|**Bioinformatics**|**PMC2668612

Formats

Article sections

- Abstract
- 1 INTRODUCTION
- 2 METHODS
- 3 SOFTWARE
- 4 RESULTS
- 5 DISCUSSION
- 6 CONCLUSION
- Supplementary Material
- REFERENCES

Authors

Related links

Bioinformatics. 2008 August 15; 24(16): 1772–1778.

Published online 2008 June 16. doi: 10.1093/bioinformatics/btn308

PMCID: PMC2668612

NIHMSID: NIHMS78997

Associate Editor: Thomas Lengauer

Received 2008 April 23; Revised 2008 June 5; Accepted 2008 June 10.

Copyright © 2008 The Author(s)

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article has been cited by other articles in PMC.

**Motivation:** A backtrace through a dynamic programming algorithm's intermediate results in search of an optimal path, or to sample paths according to an implied probability distribution, or as the second stage of a forward–backward algorithm, is a task of fundamental importance in computational biology. When there is insufficient space to store all intermediate results in high-speed memory (e.g. cache) existing approaches store selected stages of the computation, and recompute missing values from these checkpoints on an as-needed basis.

**Results:** Here we present an optimal checkpointing strategy, and demonstrate its utility with pairwise local sequence alignment of sequences of length 10 000.

**Availability:** Sample C++-code for optimal backtrace is available in the Supplementary Materials.

**Contact:** leen/at/cs.rpi.edu

**Supplementary information:** Supplementary data is available at *Bioinformatics* online.

Dynamic programming algorithms are often used to find an optimal solution by backtracking through intermediate values of the computation. Typical examples are that of chain matrix multiplication, string algorithms such as longest common subsequence, the Viterbi (1967) algorithm for hidden Markov models, and sequence alignment algorithms such as those of Needleman and Wunsch (1970) and Smith and Waterman (1981). Dynamic programming algorithm backtraces are also used for random sampling, where the score for each possible backtrace path is deemed to be (proportional to) the probability of the path, and it is desired to choose a path according to that probability distribution. A typical example is the algorithm of Ding and Lawrence (1999) for the sampling of RNA secondary structure. A third use for dynamic programming backtraces is as the second step of a forward–backward algorithm, such as that of Baum–Welch (Baum *et al.* (1970) for finding the parameters of a hidden Markov model. Sometimes a trade-off with run time allows a problem to be solved without a backtrace through stored results, e.g. sequence alignment (Durbin *et al.*, 2006 Section 2.6; Myers and Miller, 1998; Waterman, 1995, page 211) and Baum–Welch (Miklós and Meyer, 2005), but this is not always the case.

When there is not enough space to store all intermediate results in high-speed memory, checkpointing strategies are employed, whereby selected stages of the computation are stored, and missing information is recomputed from these checkpoints on an as-needed basis. A stage of the computation, also known as a frontier, is a set of intermediate values that are sufficient for making subsequent computations. For instance, in a 2D dynamic programming algorithm that computes a small number of values for each (*i,j*) in a grid from the neighboring ‘earlier’ values associated with (*i*−1,*j*), (*i,j*−1) and (*i*−1,*j*−1), we could define a stage as a row of the computation grid. In this case, stage *k* would be the values associated with the cells {(*k,j*) : *j*=*j*_{min} … *j*_{max}}, and the stage *k* values would be sufficient for computing values for cell (*i,j*) for any *i*>*k*. Similarly one could use columns to define stages. In many cases it makes sense to have overlapping stages; in the above example stage *k* might be the *k*-th diagonal frontier, i.e. the computation values associated with the cells {(*i,j*) : *i*+*j*{*k*−1,*k*}}.

Herein we will describe an optimal checkpointing strategy that provably minimizes the number of stage re-computations necessary in performing a backtrace with limited high-speed memory. The algorithm is simple and efficient. Note that, because this limited-memory approach can be used to allow significant increases in locality of reference, it can provide more efficient computations even when the amount of high-speed memory might otherwise be considered sufficient.

We build upon a previous approach that is fairly memory-efficient, which is described in *Bioinformatics* (Grice *et al.*, 1997; Wheeler and Hughey, 2000). With memory enough to store *M* stages, their ‘2-level’ algorithm uses the memory to compute the first *M* stages, but then retains only the *M*-th stage as a checkpoint, discarding the previous ones. Using the remaining *M*−1 memory locations, the algorithm computes stages *M*+1, …, 2*M*−1, and then uses the (2*M*−1)th stage as the second checkpoint. It continues this process, using the (*M*+(*M*−1)+(*M*−2))th stage as its third checkpoint, and so forth, up to and including *M*+(*M*−1)+···+1=*M*(*M*+1)/2 as its *M*-th checkpoint. Thus, if *N*=*M*(*M*+1)/2 stages are needed in the backtrace, they can be achieved with memory locations; in the backtrace, each missing stage is computed using the space freed by discarding the checkpoints that are no longer needed.

Because the algorithm needs to compute each stage at most twice, once in the forward pass to create the checkpoints and once during the backtrace, the overall number of stage computations of the memory-reduced algorithm is at most double what it would have been.

Wheeler and Hughey (2000) also generalize their 2-level algorithm to an ‘*L*-level’ algorithm, where *L* is any positive integer. With *M* memory locations, the *L*-level algorithm can compute

(1)

stages, where this formula works for any integers *L* and *M* so long as the binomial coefficient is defined to be *n*!/*d*!(*n*−*d*)! when *d*≥0 and *n*−*d*≥0, and zero otherwise. The asymptotic limit is as *M*→∞ for fixed *L*. For the *M, L*≥1 algorithm, the *k*-th stage to be checkpointed is

(2)

That is, the first stage to be a checkpoint is the last stage that would be a checkpoint under the (*L*−1)-level algorithm. Generally, the *k*-th stage to be checkpointed is beyond the (*k*−1)th checkpoint by an amount that would be the last checkpoint for an (*L*−1)-level algorithm that uses the remaining *M*-(*k*−1) memory locations. Equation (2) solves to

(3)

The number of stage computations for the *L*-level algorithm to compute *N*_{WH}(*M,L*) stages using *M* memory locations is given by the recursion

(4)

because the *L*-level algorithm first computes all *N*_{WH}(*M,L*) stages, to get the *L*-level checkpoints; it then provides access to the stages in reverse order by working with the *L*-level checkpoints in reverse order; the *L*-level algorithm uses the (*L*−1)-level algorithm to generate the missing intervening stages. However, 1 is subtracted, because the last computation of each (*L*−1)-level algorithm invocation produces an *L*-level checkpoint that we already had available. Thus, this last computation for each (*L*−1)-level algorithm invocation is not performed. Equation (4) solves to

(5)

where this formula works for any integers *L* and *M* so long as the trinomial coefficient is defined to be (*a*+*b*+*c*)!/*a*!*b*!*c*!. when *a, b, c* ≥ 0, and zero otherwise. Thus we have a multiplier for the number of stage computations of approximately *L* for memory locations. Computed values of *N*_{WH}(*M,L*) and *T*_{WH}(*M,L*) for small *M* and *L* are given in Table 1 and plotted in Figure 1.

Low memory comparison of algorithms. This figure exhibits the effect on the run time at low memory levels. Clockwise from the top, the curves come in 12 pairs, one each for *M*=2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15 and 20 memory locations. Within each pair, **...**

The main drawback to the *L*-level algorithm is that it can perform badly for a value of *N* that falls between *N*_{WH}(*M,L*−1) and *N*_{WH}(*M,L*), for some *L*. Wheeler and Hughey, (2000) propose an ‘(*L,L*−1)-level’ algorithm that performs better for these intermediate values of *N*, but it is not optimal. Wheeler and Hughey (2000) also discuss (*L, L*−1, …)-level algorithms and an optimal checkpointing algorithm, but do not provide a quick computation for choosing optimal checkpoints, nor do they give formulae to compute the number of stage computations for general values of *M* and *N*.

The choice of the stages to checkpoint can be framed as an optimization problem. We write *T*(*M,N*) for the number of stage computations that are needed by the optimal checkpointing algorithm for a backtrace through *N* stages, using room for *M* stages. When *N*≤*M*, there is ample memory, and *T*(*M,N*)=*N*. At the other extreme, if *M*=1 there is room to compute only one stage, and *N*>1 stages cannot be computed even with an infinite amount of time available. [Following the lead of Wheeler and Hughey (2000) we bar in-place calculations.]

If *N*>*M*>1, and if we choose *C* as the first stage to checkpoint, then we begin by computing the first *C* stages, 1, …, *C*, by alternating the use of two memory locations. We store stage *C*, discarding the previous ones. Then, using the remaining *M*−1 memory locations, we recursively perform an optimal backtrace on the final *N*−*C* stages, *C*+1, …, *N*. Next, we present the retained stage *C* to the user. Finally, we discard stage *C* and recursively perform an optimal backtrace on the initial *C*−1 stages, using the full *M* memory locations. Thus, with an optimal choice for *C*, we have the recursion for *T*(*M,N*):

(6)

Note that Wheeler and Hughey (2000) give a different recursion for optimal checkpointing. Translated into our notation, their recursion is *T*(*M,N*)=min_{C}{*C*+*T*(*M*−1,*N*−*C*)+*T*(*M,C*)−1}. This error may have impeded their further progress.

A straightforward computation of this recursion would require (*M**N*^{2}) time (Wheeler and Hughey, 2000). However, in the following we will show a mathematical solution to the recursion that permits the calculation of all the needed checkpoints in (*N*) time.

As with the analysis of the *L*-level algorithm of Wheeler and Hughey (2000), we find it easier to initially restrict our attention to special values of *N*. The main contribution of this work is our subsequent generalization to arbitrary values of *N*.

With storage for *M*≥1 stages, and for any integer *L*≥1, we will define a special number of stages *N*_{opt}(*M,L*), and we will calculate *T*_{opt}(*M,L*), the number of stage computations required for a backtrace through *N*_{opt}(*M,L*) stages using *M* memory locations. Let

(7)

For *M,L*>1, with *N*=*N*_{opt}(*M,L*) stages, we set

(8)

the unique optimum checkpoint stage for the problem with *M* memory locations and *N*=*N*_{opt}(*M,L*) stages (see Appendix for proofs). Plugging *C*(*M,N*)=*N*_{opt}(*M,L*−1)+1 and *N*−C(*M,N*)=*N*_{opt}(*M*−1,*L*) into Equation (6), we obtain

(9)

This solves to

(10)

(11)

where Equation (11) is defined for *M*≥2 only. Computed values of *N*_{opt}(*M,L*) and *T*_{opt}(*M,L*) for small *M* and *L* are given in Table 1 and plotted in Figure 1.

As with the *L*-level algorithm of Wheeler and Hughey (2000), with the optimal checkpointing algorithm, we have a multiplier for the number of stage computations of approximately *L* for memory locations. However, we shall now see that, with the optimal checkpointing algorithm, we easily achieve this multiplier for the number of stage computations even when the number of stages *N* is arbitrary.

When *N* falls between two optimal values, *N*_{opt}(*M,L*) and *N*_{opt}(*M,L*+1), we can compute the number of stage computations *T*(*M,N*) by linear interpolation between *N*_{opt}(*M,L*) and *N*_{opt}(*M,L*+1) (see Appendix for proofs). Noting that

(12)

we derive

(13)

for *N*≥*M*>1.

Furthermore, for *N* between *N*_{opt}(*M,L*) and *N*_{opt}(*M,L*+1), it is optimal to choose the initial checkpoint *C*(*M,N*) so that *C*(*M,N*)−1 and *N*−*C*(*M,N*) fall between the values that they would have had to equal, if *N* had equaled *N*_{opt}(*M,L*) or *N*_{opt}(*M,L*+1). That is, we must choose *C*(*M,N*) so as to simultaneously satisfy

(14)

and

(15)

In practice, we choose the largest legal value,

(16)

The optimal checkpointing algorithm is presented in Figure 2. Note that we include not just *M*, and *N*, but also a level *L* and a special stage *N*_{opt}(*M,L*) in the parameter list for the recursive subroutine, because the availability of values for *L* and *N*_{opt}(*M,L*) greatly speeds the calculations of *N*_{opt}(*M*−1,*L*) and *N*_{opt}(*M,L*−1), which are needed in the calculations of optimal checkpoints:

(17)

(18)

It is imperative that the required calculations be impervious to integer overflows. We initially prevented overflow in integer calculations, such as *abc*/*de* for Equations (17) and (18), by canceling all common factors between each variable in the numerator and each variable in the denominator. This approach requires 3×2=6 invocations of Euclid's algorithm for finding a greatest common divisor. Such a procedure leaves the value of each denominator variable at 1 and, when *N*_{opt}(*M,L*)+1 can be represented as an integer, the numerator variable values are sufficiently small enough to permit all the needed computations—so long as extra care is taken when verifying that the initial value of *N* is less than *N*_{opt}(*M,L*+1).

To handle computations where the number of stages exceeds the largest unsigned integer, often 2^{32}−1≈4×10^{9}, we modified our C++software implementation to use a C++class that manipulates integers of arbitrary size.

Sample C++-code for optimal backtrace is available in the Supplementary Materials.

We applied the algorithm to pairwise local alignments (Smith and Waterman, 1981) of sequences of up to 3000 nucleotides of human DNA with sequences of up to 2864 nucleotides of rodent DNA. For the largest of the alignments, to keep within a 125 MB limit for total memory use, we restricted ourselves to *M*=486 stages of storage for the *N*=2864 stages. For these choices, *L*=1, *N*_{opt}(*M*=486, *L*=1)=486, and *T*(*M*=486, *N*=2864)=5242. Thus, the multiplier for the number of stage computations is *T*/*N*=5242/2864≈1.83 for memory use *M*/*N*=486/2864≈17%. The algorithm ran in 70 s, but would have run much more slowly if it had tried to use memory for all 2864 stages, because the resulting memory swapping would have been onerous. In contrast, the 2-level algorithm of (Wheeler and Hughey, 2000) computes checkpoints for stages 486, 971, 1455, 1938 and 2420, and requires two computations for all other stages with index under 2420. Thus its total number of stage computations is 2864+(2420−5)=5279, only slightly worse than 5242.

The same calculation performed on a pair of sequences, each 10 000 nucleotides long, takes 12 min to run in 125 MB of memory, a memory size sufficient to store only 138 stages instead of the full 10 000 stages. For a problem of this size, *L*=2, *N*_{opt}(*M*=138, *L*=2)=9728, and *T*(*M*=138, *N*=10 000)=20 134. Thus, the multiplier for the number of stage computations is *T*/*N*=20 134 / 10 000≈2.01 for memory use*M*/*N*=138/10 000≈1.4%. In contrast, the 3-level algorithm ofWheeler and Hughey (2000) is at a particular disadvantage in that it computes its only 3-level checkpoint at stage 9591, with subsequent 2-level checkpoints at 9728, 9864, and 9999. The algorithm requires 29 448 stage computations, significantly worse than 20 134. With 1 GB of memory, sufficient for storing 1104 stages for the pairwise alignment of sequences of length 10 000 nucleotides, the optimal checkpointing algorithm requires 18 896 stage computations, whereas the 2-level algorithm of Wheeler and Hughey (2000) requires 19 891 stage computations, almost 1000 more.

On similar datasets, using a probabilistic model that defines a probability distribution on the set of possible alignments, we used the optimal backtrace algorithm to compute a centroid (Ding and Lawrence, 1999), also known as a posterior decoding (Miyazawa, 1995). This task requires a dynamic programming calculation during the backtrace that is comparable to the calculation performed during the forward pass. With sufficient memory, the total computation would require 2*N* stage computations, thus the multiplier for the number of stage computations with limited memory is

(19)

this value is better than *L*, the multiplier of the number of stage computations for the cheaper backtrace tasks.

We also can draw independent samples from the probability distribution on the set of possible alignments. Here, the run time is as slow as the centroid calculation only when the number of samples to be drawn is of the order of the smaller of the two sequence lengths.

We have provided an algorithm for optimal backtrace through a dynamic programming algorithm when memory is limited. The algorithm improves upon previous work via the simplicity and speed of the calculation for the index of the optimal checkpoint, and via achievement of optimal performance for a problem of arbitrary size.

A few variations are worthy of consideration. Generally, for backtrace computations, whether or not they are achieved with the optimal checkpointing algorithm described here, the first stage is computed from initial or boundary conditions, and each subsequent stage is computed from the immediately preceding stage. Thus, at least conceptually, the first stage requires special treatment. If this distinction makes implementation of the *advance* callback routine difficult, it may be prudent to compute and permanently store the first stage in the first memory location, and to run the optimal checkpointing algorithm so as to provide an optimally computed backtrace through the remaining *N*−1 stages using *M*−1 memory locations. The number of stage computations for this approach is 1+*T*(*M*−1,*N*−1).

As already described for both optimal checkpointing and the *L*-level algorithm of Wheeler and Hughey (2000), in the limit as the number of memory locations *M* goes to infinity with a fixed multiplier *L* for the number of stage computations, we can backtrace through *N* ~ *M*^{L} / *L*! stages with *T* ~ *M*^{L} / (*L*−1)! stage computations. However, for the case when memory is severely limited, it is instructive to look at the asymptotics for a fixed value of *M*, with *L* tending to infinity. For this situation we have

(20)

(21)

(22)

(23)

Thus, in these low-memory situations, the optimal algorithm is asymptotically faster than the *L*-level algorithm of Wheeler and Hughey (2000), even when the latter is applied to its optimal problem sizes *N*_{WH}(*M,L*). The speed multiplier is , which is approximately 1+(0.693}/(*M*−1.347)) for moderate values of *M*. See Table 1 and Figure 1.

When high-speed memory is limited, dynamic programming algorithm backtraces make use of checkpoints for re-computing needed intermediate values. We have provided an easy-to-use algorithm for optimally selecting the checkpoints.

We thank the Computational Molecular Biology and Statistics Core Facility at the Wadsworth Center for the computing resources for the pairwise local sequence alignment calculations.We wish to acknowledge use of the Maple software package by Waterloo Maple, Inc., without which the calculations in this article would have been much more difficult.

*Funding*: This research was supported by the National Institutes of Health / National Human Genome Research Institute grant K25 HG003291.

*Conflict of Interest*: none declared.

So that this section is self-contained, we will restate the relevant assumptions and definitions.

We define the following functions and will prove their usefulness presently.

(24)

(25)

We take as given that the number of stage computations required satisfies:

(26)

where a choice of value for *C* that minimizes this last expression for a given *N* and *M* is called an optimal first checkpoint, *C*(*M,N*).

*Let N be the number of stages to be made available in a backtrace using storage for M stages. For a choice of N and M, define*

(27)

*We will show that*

(28)

*for N* ≥ *M* ≥ 1, *where the second term is deemed zero if N*=*N*_{opt}(*M,L*), *even when L*=+∞. *We will show that, when N*>*M*>1, *the value of an optimal first checkpoint C*(*M,N*) *satisfies*

(29)

*and*

(30)

*The proof will be by induction on N and M. We have two base cases: first, a base case for a low value of N and, second, a base case for a low value of M*.

Base case, *N*=*M* ≥ 1

We wish to show that Equation (A5) correctly matches Equation (A3) when *N*=*M* ≥ 1.

We put *L*=1 into Equation (A1) for *N*_{opt}(*M,L*) and Equation (A1) for *T*_{opt}(*M,L*):

(31)

(32)

where the trinomial terms with *L*−2 vanish by our convention. Thus, for this base case, Equation (A4) indicates that *L*=1. It follows that Equation (A5) gives *T*(*M,N*)=*M*, which is in agreement with Equation (A3) because *N*=*M*.

Base Case, *N*>*M*=1

We wish to show that Equation (A5) correctly matches Equation (A3) when *N*>*M*=1.

We put *M*=1 into Equation (A1) for *N*_{opt}(*M,L*) and Equation (A2) for *T*_{opt}(*M,L*):

(33)

(34)

Thus, for this base case, Equation (A4) indicates that *L*=+∞. Because *N* is strictly greater than *N*_{opt}(*M,L*) (for any *L*) it follows that Equation (A5) gives *T*(*M*=1,*N*>1)=+∞, in agreement with Equation (A3), as desired.

General Case, *N*>*M*>1

We assume that the theorem is proved true for *N*′ ≥ *M*′ ≥ 1, when *N*′≤*N* and *M*′≤*M* but (*N*′,*M*′) ≠ (*N, M*). We will show that checkpoint choices *C*′ and *C*″=*C*′+1 will give the same number of stage computations if both *C*′ and *C*″ satisfy the restrictions of Equations (A6) and (A7). We will show that if *C*″ is too high to satisfy these restrictions then use of *C*′ will lead to fewer stage computations; we will show that if *C*′ is too low to satisfy these restrictions then use of *C*″ will lead to fewer stage computations. Together these will demonstrate that the restrictions are optimal and that they can be satisfied simultaneously. We will then show that satisfaction of the restrictions implies Equation (A5).

Let *T*′(*M,N*) and *T*″(*M,N*) be the number of stage computations required given that *C*′ or *C*″, respectively, is chosen as the first checkpoint, and optimal checkpoints are used in all remaining subproblems:

(35)

and

(36)

Set *L*_{1} to be the unique integer such that

(37)

and observe that

(38)

Set *L*_{2} to be the unique integer such that

(39)

and observe that

(40)

Using our induction hypothesis [Equation (A5)], we thus compute that

(41)

(42)

(43)

We observe that when both *C*′ and *C*″ satisfy the restrictions given as Equations (A6) and (A7) then *L*_{1}=*L*_{2} and the stages *C*′ and *C*″ make equally good choices as a checkpoint. When *C*″ is too large for the restrictions then *L*_{1}>*L*_{2}, and when *C*′ is too small for the restrictions then *L*_{1}<*L*_{2}. Thus, we have verified that the restrictions define an optimal first checkpoint.

Finally, using the induction hypothesis, we compute *T*(*M,N*) to verify that it yields Equation (A5), using a value of *C* satisfying the restrictions of Equations (A6) and (A7).

(44)

(45)

(46)

(47)

as desired. For the last equality, we have used *N*_{opt}(*M,L*)=*N*_{opt}(*M,L*)+*N*_{opt}(*M,L*1)+1 and *T*_{opt}(*M,L*)=*T*_{opt}(*M*−1,*L*)+*T*_{opt}(*M,L*−1)+*N*_{opt}(*M,L*−1)+1, which are easily proved from Equations (A1) and (A2).

- Baum LE, et al. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist. 1970;41:164–171.
- Ding Y, Lawrence CE. ABayesian statistical algorithm for RNA secondary structure prediction. Comput. Chem. 1999;23:387–400. [PubMed]
- Durbin R, et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge, UK: Cambridge University Press; 2006.
- Grice JA, et al. Reduced space sequence alignment. Comput. Appl. Biosci. 1997;13:45–53. [PubMed]
- Miklós I, Meyer IM. A linear memory algorithm for Baum-Welch training. BMC Bioinformatics. 2005;6:231. [PMC free article] [PubMed]
- Miyazawa S. A reliable sequence alignment method based on probabilities of residue correspondences. Protein Eng. 1995;8:999–1009. [PubMed]
- Myers EW, Miller W. Optimal alignments in linear space. Comput. Appl. Biosci. 1998;4:11–17. [PubMed]
- Needleman SB. Ageneral method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 1970;48:443–453. [PubMed]
- Smith TF, Waterman MS. Comparison of biosequences. Adv. Appl. Math. 1981;2:482–489.
- Viterbi AJ. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE T. Inform. Theory. 1967;13:260–269.
- Waterman MS. Introduction to Computational Biology. Maps, Sequences and Genomes. London, UK: Chapman & Hall/CRC; 1995.
- Wheeler R, Hughey R. Optimizing reduced-space sequence analysis. Bioinformatics. 2000;16:1082–1090. [PubMed]

Articles from Bioinformatics are provided here courtesy of **Oxford University Press**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's Canada Institute for Scientific and Technical Information in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |