PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of bioinfoLink to Publisher's site
 
Bioinformatics. 2008 August 15; 24(16): 1772–1778.
Published online 2008 June 16. doi:  10.1093/bioinformatics/btn308
PMCID: PMC2668612
NIHMSID: NIHMS78997

Memory-efficient dynamic programming backtrace and pairwise local sequence alignment

Abstract

Motivation: A backtrace through a dynamic programming algorithm's intermediate results in search of an optimal path, or to sample paths according to an implied probability distribution, or as the second stage of a forward–backward algorithm, is a task of fundamental importance in computational biology. When there is insufficient space to store all intermediate results in high-speed memory (e.g. cache) existing approaches store selected stages of the computation, and recompute missing values from these checkpoints on an as-needed basis.

Results: Here we present an optimal checkpointing strategy, and demonstrate its utility with pairwise local sequence alignment of sequences of length 10 000.

Availability: Sample C++-code for optimal backtrace is available in the Supplementary Materials.

Contact: leen/at/cs.rpi.edu

Supplementary information: Supplementary data is available at Bioinformatics online.

1 INTRODUCTION

Dynamic programming algorithms are often used to find an optimal solution by backtracking through intermediate values of the computation. Typical examples are that of chain matrix multiplication, string algorithms such as longest common subsequence, the Viterbi (1967) algorithm for hidden Markov models, and sequence alignment algorithms such as those of Needleman and Wunsch (1970) and Smith and Waterman (1981). Dynamic programming algorithm backtraces are also used for random sampling, where the score for each possible backtrace path is deemed to be (proportional to) the probability of the path, and it is desired to choose a path according to that probability distribution. A typical example is the algorithm of Ding and Lawrence (1999) for the sampling of RNA secondary structure. A third use for dynamic programming backtraces is as the second step of a forward–backward algorithm, such as that of Baum–Welch (Baum et al. (1970) for finding the parameters of a hidden Markov model. Sometimes a trade-off with run time allows a problem to be solved without a backtrace through stored results, e.g. sequence alignment (Durbin et al., 2006 Section 2.6; Myers and Miller, 1998; Waterman, 1995, page 211) and Baum–Welch (Miklós and Meyer, 2005), but this is not always the case.

When there is not enough space to store all intermediate results in high-speed memory, checkpointing strategies are employed, whereby selected stages of the computation are stored, and missing information is recomputed from these checkpoints on an as-needed basis. A stage of the computation, also known as a frontier, is a set of intermediate values that are sufficient for making subsequent computations. For instance, in a 2D dynamic programming algorithm that computes a small number of values for each (i,j) in a grid from the neighboring ‘earlier’ values associated with (i−1,j), (i,j−1) and (i−1,j−1), we could define a stage as a row of the computation grid. In this case, stage k would be the values associated with the cells {(k,j) : j=jminjmax}, and the stage k values would be sufficient for computing values for cell (i,j) for any i>k. Similarly one could use columns to define stages. In many cases it makes sense to have overlapping stages; in the above example stage k might be the k-th diagonal frontier, i.e. the computation values associated with the cells {(i,j) : i+j[set membership]{k−1,k}}.

Herein we will describe an optimal checkpointing strategy that provably minimizes the number of stage re-computations necessary in performing a backtrace with limited high-speed memory. The algorithm is simple and efficient. Note that, because this limited-memory approach can be used to allow significant increases in locality of reference, it can provide more efficient computations even when the amount of high-speed memory might otherwise be considered sufficient.

We build upon a previous approach that is fairly memory-efficient, which is described in Bioinformatics (Grice et al., 1997; Wheeler and Hughey, 2000). With memory enough to store M stages, their ‘2-level’ algorithm uses the memory to compute the first M stages, but then retains only the M-th stage as a checkpoint, discarding the previous ones. Using the remaining M−1 memory locations, the algorithm computes stages M+1, …, 2M−1, and then uses the (2M−1)th stage as the second checkpoint. It continues this process, using the (M+(M−1)+(M−2))th stage as its third checkpoint, and so forth, up to and including M+(M−1)+···+1=M(M+1)/2 as its M-th checkpoint. Thus, if N=M(M+1)/2 stages are needed in the backtrace, they can be achieved with An external file that holds a picture, illustration, etc.
Object name is btn308i1.jpg memory locations; in the backtrace, each missing stage is computed using the space freed by discarding the checkpoints that are no longer needed.

Because the algorithm needs to compute each stage at most twice, once in the forward pass to create the checkpoints and once during the backtrace, the overall number of stage computations of the memory-reduced algorithm is at most double what it would have been.

Wheeler and Hughey (2000) also generalize their 2-level algorithm to an ‘L-level’ algorithm, where L is any positive integer. With M memory locations, the L-level algorithm can compute

equation image
(1)

stages, where this formula works for any integers L and M so long as the binomial coefficient An external file that holds a picture, illustration, etc.
Object name is btn308i2.jpg is defined to be n!/d!(nd)! when d≥0 and nd≥0, and zero otherwise. The asymptotic limit is as M→∞ for fixed L. For the M, L≥1 algorithm, the k-th stage to be checkpointed is

equation image
(2)

That is, the first stage to be a checkpoint is the last stage that would be a checkpoint under the (L−1)-level algorithm. Generally, the k-th stage to be checkpointed is beyond the (k−1)th checkpoint by an amount that would be the last checkpoint for an (L−1)-level algorithm that uses the remaining M-(k−1) memory locations. Equation (2) solves to

equation image
(3)

The number of stage computations for the L-level algorithm to compute NWH(M,L) stages using M memory locations is given by the recursion

equation image
(4)

because the L-level algorithm first computes all NWH(M,L) stages, to get the L-level checkpoints; it then provides access to the stages in reverse order by working with the L-level checkpoints in reverse order; the L-level algorithm uses the (L−1)-level algorithm to generate the missing intervening stages. However, 1 is subtracted, because the last computation of each (L−1)-level algorithm invocation produces an L-level checkpoint that we already had available. Thus, this last computation for each (L−1)-level algorithm invocation is not performed. Equation (4) solves to

equation image
(5)

where this formula works for any integers L and M so long as the trinomial coefficient An external file that holds a picture, illustration, etc.
Object name is btn308i3.jpg is defined to be (a+b+c)!/a!b!c!. when a, b, c ≥ 0, and zero otherwise. Thus we have a multiplier for the number of stage computations of approximately L for An external file that holds a picture, illustration, etc.
Object name is btn308i4.jpg memory locations. Computed values of NWH(M,L) and TWH(M,L) for small M and L are given in Table 1 and plotted in Figure 1.

Table 1.
The number of stages and run time for both the algorithm of Wheeler and Hughey (2000) and the optimal checkpointing algorithm
Fig. 1.
Low memory comparison of algorithms. This figure exhibits the effect on the run time at low memory levels. Clockwise from the top, the curves come in 12 pairs, one each for M=2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15 and 20 memory locations. Within each pair, ...

The main drawback to the L-level algorithm is that it can perform badly for a value of N that falls between NWH(M,L−1) and NWH(M,L), for some L. Wheeler and Hughey, (2000) propose an ‘(L,L−1)-level’ algorithm that performs better for these intermediate values of N, but it is not optimal. Wheeler and Hughey (2000) also discuss (L, L−1, …)-level algorithms and an optimal checkpointing algorithm, but do not provide a quick computation for choosing optimal checkpoints, nor do they give formulae to compute the number of stage computations for general values of M and N.

2 METHODS

The choice of the stages to checkpoint can be framed as an optimization problem. We write T(M,N) for the number of stage computations that are needed by the optimal checkpointing algorithm for a backtrace through N stages, using room for M stages. When NM, there is ample memory, and T(M,N)=N. At the other extreme, if M=1 there is room to compute only one stage, and N>1 stages cannot be computed even with an infinite amount of time available. [Following the lead of Wheeler and Hughey (2000) we bar in-place calculations.]

If N>M>1, and if we choose C as the first stage to checkpoint, then we begin by computing the first C stages, 1, …, C, by alternating the use of two memory locations. We store stage C, discarding the previous ones. Then, using the remaining M−1 memory locations, we recursively perform an optimal backtrace on the final NC stages, C+1, …, N. Next, we present the retained stage C to the user. Finally, we discard stage C and recursively perform an optimal backtrace on the initial C−1 stages, using the full M memory locations. Thus, with an optimal choice for C, we have the recursion for T(M,N):

equation image
(6)

Note that Wheeler and Hughey (2000) give a different recursion for optimal checkpointing. Translated into our notation, their recursion is T(M,N)=minC{C+T(M−1,NC)+T(M,C)−1}. This error may have impeded their further progress.

A straightforward computation of this recursion would require O(MN2) time (Wheeler and Hughey, 2000). However, in the following we will show a mathematical solution to the recursion that permits the calculation of all the needed checkpoints in O(N) time.

As with the analysis of the L-level algorithm of Wheeler and Hughey (2000), we find it easier to initially restrict our attention to special values of N. The main contribution of this work is our subsequent generalization to arbitrary values of N.

2.1 Special values for the number of stages

With storage for M≥1 stages, and for any integer L≥1, we will define a special number of stages Nopt(M,L), and we will calculate Topt(M,L), the number of stage computations required for a backtrace through Nopt(M,L) stages using M memory locations. Let

equation image
(7)

For M,L>1, with N=Nopt(M,L) stages, we set

equation image
(8)

the unique optimum checkpoint stage for the problem with M memory locations and N=Nopt(M,L) stages (see Appendix for proofs). Plugging C(M,N)=Nopt(M,L−1)+1 and N−C(M,N)=Nopt(M−1,L) into Equation (6), we obtain

equation image
(9)

This solves to

equation image
(10)

equation image
(11)

where Equation (11) is defined for M≥2 only. Computed values of Nopt(M,L) and Topt(M,L) for small M and L are given in Table 1 and plotted in Figure 1.

As with the L-level algorithm of Wheeler and Hughey (2000), with the optimal checkpointing algorithm, we have a multiplier for the number of stage computations of approximately L for An external file that holds a picture, illustration, etc.
Object name is btn308i5.jpg memory locations. However, we shall now see that, with the optimal checkpointing algorithm, we easily achieve this multiplier for the number of stage computations even when the number of stages N is arbitrary.

2.2 General values for the number of stages

When N falls between two optimal values, Nopt(M,L) and Nopt(M,L+1), we can compute the number of stage computations T(M,N) by linear interpolation between Nopt(M,L) and Nopt(M,L+1) (see Appendix for proofs). Noting that

equation image
(12)

we derive

equation image
(13)

for NM>1.

Furthermore, for N between Nopt(M,L) and Nopt(M,L+1), it is optimal to choose the initial checkpoint C(M,N) so that C(M,N)−1 and NC(M,N) fall between the values that they would have had to equal, if N had equaled Nopt(M,L) or Nopt(M,L+1). That is, we must choose C(M,N) so as to simultaneously satisfy

equation image
(14)

and

equation image
(15)

In practice, we choose the largest legal value,

equation image
(16)

The optimal checkpointing algorithm is presented in Figure 2. Note that we include not just M, and N, but also a level L and a special stage Nopt(M,L) in the parameter list for the recursive subroutine, because the availability of values for L and Nopt(M,L) greatly speeds the calculations of Nopt(M−1,L) and Nopt(M,L−1), which are needed in the calculations of optimal checkpoints:

equation image
(17)

equation image
(18)
Fig. 2.
The optimal checkpointing algorithm in pseudo-C++, for a backtrace through N stages using memory sufficient for M stages. Using Equation (7), find the level L=max{L : Nopt(M,L)≤N}. For the convention that the memory locations are labeled 0, …, ...

It is imperative that the required calculations be impervious to integer overflows. We initially prevented overflow in integer calculations, such as abc/de for Equations (17) and (18), by canceling all common factors between each variable in the numerator and each variable in the denominator. This approach requires 3 × 2=6 invocations of Euclid's algorithm for finding a greatest common divisor. Such a procedure leaves the value of each denominator variable at 1 and, when Nopt(M,L)+1 can be represented as an integer, the numerator variable values are sufficiently small enough to permit all the needed computations—so long as extra care is taken when verifying that the initial value of N is less than Nopt(M,L+1).

To handle computations where the number of stages exceeds the largest unsigned integer, often 232−1≈4 × 109, we modified our C++software implementation to use a C++class that manipulates integers of arbitrary size.

3 SOFTWARE

Sample C++-code for optimal backtrace is available in the Supplementary Materials.

4 RESULTS

We applied the algorithm to pairwise local alignments (Smith and Waterman, 1981) of sequences of up to 3000 nucleotides of human DNA with sequences of up to 2864 nucleotides of rodent DNA. For the largest of the alignments, to keep within a 125 MB limit for total memory use, we restricted ourselves to M=486 stages of storage for the N=2864 stages. For these choices, L=1, Nopt(M=486, L=1)=486, and T(M=486, N=2864)=5242. Thus, the multiplier for the number of stage computations is T/N=5242/2864≈1.83 for memory use M/N=486/2864≈17%. The algorithm ran in 70 s, but would have run much more slowly if it had tried to use memory for all 2864 stages, because the resulting memory swapping would have been onerous. In contrast, the 2-level algorithm of (Wheeler and Hughey, 2000) computes checkpoints for stages 486, 971, 1455, 1938 and 2420, and requires two computations for all other stages with index under 2420. Thus its total number of stage computations is 2864+(2420−5)=5279, only slightly worse than 5242.

The same calculation performed on a pair of sequences, each 10 000 nucleotides long, takes 12 min to run in 125 MB of memory, a memory size sufficient to store only 138 stages instead of the full 10 000 stages. For a problem of this size, L=2, Nopt(M=138, L=2)=9728, and T(M=138, N=10 000)=20 134. Thus, the multiplier for the number of stage computations is T/N=20 134 / 10 000≈2.01 for memory useM/N=138/10 000≈1.4%. In contrast, the 3-level algorithm ofWheeler and Hughey (2000) is at a particular disadvantage in that it computes its only 3-level checkpoint at stage 9591, with subsequent 2-level checkpoints at 9728, 9864, and 9999. The algorithm requires 29 448 stage computations, significantly worse than 20 134. With 1 GB of memory, sufficient for storing 1104 stages for the pairwise alignment of sequences of length 10 000 nucleotides, the optimal checkpointing algorithm requires 18 896 stage computations, whereas the 2-level algorithm of Wheeler and Hughey (2000) requires 19 891 stage computations, almost 1000 more.

On similar datasets, using a probabilistic model that defines a probability distribution on the set of possible alignments, we used the optimal backtrace algorithm to compute a centroid (Ding and Lawrence, 1999), also known as a posterior decoding (Miyazawa, 1995). This task requires a dynamic programming calculation during the backtrace that is comparable to the calculation performed during the forward pass. With sufficient memory, the total computation would require 2N stage computations, thus the multiplier for the number of stage computations with limited memory is

equation image
(19)

this value is better than L, the multiplier of the number of stage computations for the cheaper backtrace tasks.

We also can draw independent samples from the probability distribution on the set of possible alignments. Here, the run time is as slow as the centroid calculation only when the number of samples to be drawn is of the order of the smaller of the two sequence lengths.

5 DISCUSSION

We have provided an algorithm for optimal backtrace through a dynamic programming algorithm when memory is limited. The algorithm improves upon previous work via the simplicity and speed of the calculation for the index of the optimal checkpoint, and via achievement of optimal performance for a problem of arbitrary size.

A few variations are worthy of consideration. Generally, for backtrace computations, whether or not they are achieved with the optimal checkpointing algorithm described here, the first stage is computed from initial or boundary conditions, and each subsequent stage is computed from the immediately preceding stage. Thus, at least conceptually, the first stage requires special treatment. If this distinction makes implementation of the advance callback routine difficult, it may be prudent to compute and permanently store the first stage in the first memory location, and to run the optimal checkpointing algorithm so as to provide an optimally computed backtrace through the remaining N−1 stages using M−1 memory locations. The number of stage computations for this approach is 1+T(M−1,N−1).

As already described for both optimal checkpointing and the L-level algorithm of Wheeler and Hughey (2000), in the limit as the number of memory locations M goes to infinity with a fixed multiplier L for the number of stage computations, we can backtrace through N ~ ML / L! stages with T ~ ML / (L−1)! stage computations. However, for the case when memory is severely limited, it is instructive to look at the asymptotics for a fixed value of M, with L tending to infinity. For this situation we have

equation image
(20)

equation image
(21)

equation image
(22)

equation image
(23)

Thus, in these low-memory situations, the optimal algorithm is asymptotically faster than the L-level algorithm of Wheeler and Hughey (2000), even when the latter is applied to its optimal problem sizes NWH(M,L). The speed multiplier is An external file that holds a picture, illustration, etc.
Object name is btn308i6.jpg, which is approximately 1+(0.693}/(M−1.347)) for moderate values of M. See Table 1 and Figure 1.

6 CONCLUSION

When high-speed memory is limited, dynamic programming algorithm backtraces make use of checkpoints for re-computing needed intermediate values. We have provided an easy-to-use algorithm for optimally selecting the checkpoints.

Supplementary Material

[Supplementary Data]

ACKNOWLEDGEMENTS

We thank the Computational Molecular Biology and Statistics Core Facility at the Wadsworth Center for the computing resources for the pairwise local sequence alignment calculations.We wish to acknowledge use of the Maple software package by Waterloo Maple, Inc., without which the calculations in this article would have been much more difficult.

Funding: This research was supported by the National Institutes of Health / National Human Genome Research Institute grant K25 HG003291.

Conflict of Interest: none declared.

APPENDIX: PROOF OF RUN TIME

So that this section is self-contained, we will restate the relevant assumptions and definitions.

We define the following functions and will prove their usefulness presently.

equation image
(24)

equation image
(25)

We take as given that the number of stage computations required satisfies:

equation image
(26)

where a choice of value for C that minimizes this last expression for a given N and M is called an optimal first checkpoint, C(M,N).

THEOREM. —

Let N be the number of stages to be made available in a backtrace using storage for M stages. For a choice of N and M, define

equation image
(27)

We will show that

equation image
(28)

for NM ≥ 1, where the second term is deemed zero if N=Nopt(M,L), even when L=+∞. We will show that, when N>M>1, the value of an optimal first checkpoint C(M,N) satisfies

equation image
(29)

and

equation image
(30)

PROOF. —

The proof will be by induction on N and M. We have two base cases: first, a base case for a low value of N and, second, a base case for a low value of M.

Base case, N=M ≥ 1

We wish to show that Equation (A5) correctly matches Equation (A3) when N=M ≥ 1.

We put L=1 into Equation (A1) for Nopt(M,L) and Equation (A1) for Topt(M,L):

equation image
(31)

equation image
(32)

where the trinomial terms with L−2 vanish by our convention. Thus, for this base case, Equation (A4) indicates that L=1. It follows that Equation (A5) gives T(M,N)=M, which is in agreement with Equation (A3) because N=M.

Base Case, N>M=1

We wish to show that Equation (A5) correctly matches Equation (A3) when N>M=1.

We put M=1 into Equation (A1) for Nopt(M,L) and Equation (A2) for Topt(M,L):

equation image
(33)

equation image
(34)

Thus, for this base case, Equation (A4) indicates that L=+∞. Because N is strictly greater than Nopt(M,L) (for any L) it follows that Equation (A5) gives T(M=1,N>1)=+∞, in agreement with Equation (A3), as desired.

General Case, N>M>1

We assume that the theorem is proved true for N′ ≥ M′ ≥ 1, when N′≤N and M′≤M but (N′,M′) ≠ (N, M). We will show that checkpoint choices C′ and C″=C′+1 will give the same number of stage computations if both C′ and C″ satisfy the restrictions of Equations (A6) and (A7). We will show that if C″ is too high to satisfy these restrictions then use of C′ will lead to fewer stage computations; we will show that if C′ is too low to satisfy these restrictions then use of C″ will lead to fewer stage computations. Together these will demonstrate that the restrictions are optimal and that they can be satisfied simultaneously. We will then show that satisfaction of the restrictions implies Equation (A5).

Let T′(M,N) and T″(M,N) be the number of stage computations required given that C′ or C″, respectively, is chosen as the first checkpoint, and optimal checkpoints are used in all remaining subproblems:

equation image
(35)

and

equation image
(36)

Set L1 to be the unique integer such that

equation image
(37)

and observe that

equation image
(38)

Set L2 to be the unique integer such that

equation image
(39)

and observe that

equation image
(40)

Using our induction hypothesis [Equation (A5)], we thus compute that

equation image
(41)

equation image
(42)

equation image
(43)

We observe that when both C′ and C″ satisfy the restrictions given as Equations (A6) and (A7) then L1=L2 and the stages C′ and C″ make equally good choices as a checkpoint. When C″ is too large for the restrictions then L1>L2, and when C′ is too small for the restrictions then L1<L2. Thus, we have verified that the restrictions define an optimal first checkpoint.

Finally, using the induction hypothesis, we compute T(M,N) to verify that it yields Equation (A5), using a value of C satisfying the restrictions of Equations (A6) and (A7).

equation image
(44)

equation image
(45)

equation image
(46)

equation image
(47)

as desired. For the last equality, we have used Nopt(M,L)=Nopt(M,L)+Nopt(M,L1)+1 and Topt(M,L)=Topt(M−1,L)+Topt(M,L−1)+Nopt(M,L−1)+1, which are easily proved from Equations (A1) and (A2).

REFERENCES

  • Baum LE, et al. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist. 1970;41:164–171.
  • Ding Y, Lawrence CE. ABayesian statistical algorithm for RNA secondary structure prediction. Comput. Chem. 1999;23:387–400. [PubMed]
  • Durbin R, et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge, UK: Cambridge University Press; 2006.
  • Grice JA, et al. Reduced space sequence alignment. Comput. Appl. Biosci. 1997;13:45–53. [PubMed]
  • Miklós I, Meyer IM. A linear memory algorithm for Baum-Welch training. BMC Bioinformatics. 2005;6:231. [PMC free article] [PubMed]
  • Miyazawa S. A reliable sequence alignment method based on probabilities of residue correspondences. Protein Eng. 1995;8:999–1009. [PubMed]
  • Myers EW, Miller W. Optimal alignments in linear space. Comput. Appl. Biosci. 1998;4:11–17. [PubMed]
  • Needleman SB. Ageneral method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 1970;48:443–453. [PubMed]
  • Smith TF, Waterman MS. Comparison of biosequences. Adv. Appl. Math. 1981;2:482–489.
  • Viterbi AJ. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE T. Inform. Theory. 1967;13:260–269.
  • Waterman MS. Introduction to Computational Biology. Maps, Sequences and Genomes. London, UK: Chapman & Hall/CRC; 1995.
  • Wheeler R, Hughey R. Optimizing reduced-space sequence analysis. Bioinformatics. 2000;16:1082–1090. [PubMed]

Articles from Bioinformatics are provided here courtesy of Oxford University Press