Multiple structural alignments (MSTAs) provide position-specific information on the sequence variability allowed by protein folds. This information can be exploited to better understand the evolution of proteins and the physical chemistry of polypeptide folding. Most MSTA methods rely on a pre-computed library of pairwise alignments. In general, this library contains conflicting residue equivalences, not all of which can be realized in the final MSTA. Hence, to build a consistent MSTA, these methods have to select a conflict-free subset of equivalences.
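As an illustration of what a conflict in such a library looks like, the following minimal sketch merges pairwise equivalences into candidate alignment columns and reports any column that would contain two residues of the same structure. The residue encoding `(structure, position)`, the function names, and the toy library are assumptions made for this example. The check covers only one kind of conflict; equivalences that violate residue order across structures are not detected here.

```python
from collections import defaultdict


def find(parent, x):
    """Return the representative of x's set, compressing the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def conflicting_columns(equivalences):
    """Return merged columns that hold two residues of the same structure.

    Each equivalence is a pair of residues; a residue is a hypothetical
    (structure_name, position) tuple.
    """
    parent = {}
    for a, b in equivalences:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Group residues by the column (set representative) they ended up in.
    columns = defaultdict(set)
    for residue in parent:
        columns[find(parent, residue)].add(residue)

    bad = []
    for members in columns.values():
        names = [name for name, _ in members]
        if len(names) != len(set(names)):
            bad.append(sorted(members))
    return bad


# Toy library (assumed values): A10~B12 and B12~C7 imply A10~C7,
# which clashes with the third equivalence A11~C7.
library = [
    (("A", 10), ("B", 12)),
    (("B", 12), ("C", 7)),
    (("A", 11), ("C", 7)),
]
print(conflicting_columns(library))
```

Dropping any one of the three equivalences in the toy library removes the clash, which is the kind of choice a conflict-free selection has to make.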
Using a dataset of 327 families from SCOP 1.63, we compare the ability of two different methods to select an optimal conflict-free subset of equivalences. One is an implementation of Reinert et al.’s integer linear programming (ILP) formulation of the maximum weight trace problem (Reinert et al., 1997, Proc. 1st Ann. Int. Conf. Comput. Mol. Biol. (RECOMB-97), ACM Press, New York). This ILP formulation is a rigorous approach, but its complexity is difficult to predict. The other method is T-Coffee (Notredame et al., 2000), which uses a heuristic enhancement of the equivalence weights that allows it to retain the speed and simplicity of the progressive alignment approach while still incorporating information from all pairwise alignments at each step of building the MSTA. We find that although the ILP formulation consistently selects a closer-to-optimal set of conflict-free equivalences, the differences are small and the quality of the resulting MSTAs is essentially the same for both methods. Given its speed and predictable complexity, our results show that T-Coffee is an attractive alternative for producing high-quality MSTAs.
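The weight enhancement used by T-Coffee can be sketched roughly as follows: the weight of an equivalence between residues x and y is increased, for every residue z of a third structure that is paired with both, by the smaller of the two supporting weights. The code below is a simplified illustration under stated assumptions, not the T-Coffee implementation: the residue encoding, the function name `extend_weights`, and the toy weights are hypothetical, and only equivalences already present in the library are re-weighted.

```python
from collections import defaultdict


def extend_weights(library):
    """Re-weight each equivalence by its support through third structures.

    library maps a pair of residues (x, y) to a weight; a residue is a
    hypothetical (structure_name, position) tuple.
    """
    # Symmetric neighbour map: residue -> {partner residue: weight}.
    neighbours = defaultdict(dict)
    for (x, y), w in library.items():
        neighbours[x][y] = w
        neighbours[y][x] = w

    extended = {}
    for (x, y), w in library.items():
        total = w
        for z, w_xz in neighbours[x].items():
            if z[0] in (x[0], y[0]):
                continue  # z must belong to a third structure
            w_zy = neighbours[z].get(y)
            if w_zy is not None:
                # A triplet x~z~y supports x~y with its weaker link.
                total += min(w_xz, w_zy)
        extended[(x, y)] = total
    return extended


# Toy weights (assumed values): three mutually consistent equivalences
# and one that competes for residue C7.
library = {
    (("A", 10), ("B", 12)): 80,
    (("B", 12), ("C", 7)): 60,
    (("A", 10), ("C", 7)): 90,
    (("A", 11), ("C", 7)): 40,
}
for pair, weight in extend_weights(library).items():
    print(pair, weight)
```

In this toy case the three mutually consistent equivalences gain weight, while the competing equivalence A11~C7 receives no support and keeps its original weight, so a progressive alignment guided by the enhanced weights is less likely to choose it.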
The software for Resolver, our implementation of Reinert et al.’s ILP formulation, and the dataset used in this study are available at http://www.sbc.su.se/~erik/resolver