Multiple sequence alignment is an important task that remains a challenging problem. Acknowledging this difficulty, we examined an alternative approach to the underlying problem: augmenting alignment-generation procedures with a separate alignment-refinement algorithm capable of repairing the errors that remain. Iterative refinement algorithms generally attempt to improve overall alignment quality by using an objective function and heuristics to search for an optimal alignment. A challenge common to most iterative refinement methods is how to escape from sub-optimal alignments. Consequently, existing methods differ mainly in how effectively they define the objective function and how intelligently they design their heuristics.
In this study we conducted an extensive comparison of the performance of three alignment refinement algorithms. The accuracy and efficiency of the refinement programs RASCAL [14], the RF method [13] and REFINER [15] were compared using the 3D structure-based alignments from the BAliBASE benchmark database, as well as a diverse set of manually curated, high-quality alignments from the Conserved Domain Database. Compared under several objective scoring functions, alignments refined by REFINER performed better than those refined by the RF or RASCAL methods. We also assessed the biological relevance of the refined alignments, using the BAliBASE sum-of-pairs (SP) scoring scheme to evaluate their quality. Although none of the methods produced dramatic improvements, REFINER performed consistently well on alignments generated by six different alignment algorithms. Correlation analysis between the improvements in predicted accuracy (objective score) and in real accuracy (SP score) also suggested better overall performance by the REFINER algorithm.
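The SP comparison and the correlation analysis can be sketched as follows. This is a simplified illustration, not the official BAliBASE scoring program: it assumes gaps are written as "-", scores every column rather than only the core blocks that BAliBASE annotates, and the function names `residue_pairs`, `sp_score` and `pearson` are hypothetical. The Pearson coefficient is one plausible choice for correlating per-alignment changes in objective score with changes in SP score; the statistic actually used is not stated in this section.

```python
import math
from itertools import combinations
from statistics import mean


def residue_pairs(alignment):
    """Return the set of aligned residue pairs in a gapped alignment.

    Residues are identified as (sequence index, residue index), where the
    residue index counts non-gap characters only, so pairs can be compared
    between two alignments of the same ungapped sequences.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        present = []  # residues occupying this column, ordered by sequence
        for s, row in enumerate(alignment):
            if row[col] != "-":  # assumption: "-" is the only gap symbol
                present.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs reproduced in the test alignment."""
    ref_pairs = residue_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & residue_pairs(test)) / len(ref_pairs)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


# Toy example (not benchmark data): the test alignment misplaces one gap.
reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
print(sp_score(test, reference))  # 0.75
```

To relate predicted and real accuracy in this sketch, one would compute, for each alignment, the change in objective score and the change in SP score produced by refinement, and pass the two lists to `pearson`.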
We also tried to identify the range of initial alignment quality in which REFINER is most successful at improving the alignment. High-quality input alignments are difficult for refinement algorithms to improve without also making deleterious modifications. Because good input alignments also tend to have higher objective scores, we treat the input alignment's objective score as a general proxy for the difficulty that the alignment presents to a refinement algorithm. The impact of refinement by REFINER is most prominent in the lower ranges of initial alignment quality, where it provided significant improvements. For higher-quality input alignments (i.e., higher input objective scores), REFINER still found improvements, but their impact was smaller. This might indicate that those alignments had already been optimized and were therefore less amenable to change.
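This range analysis can be sketched by binning alignments on their input objective score and averaging the SP change within each bin. The record layout, the bin edges and the function name `improvement_by_bin` are hypothetical; the binning actually used in the study is not described in this section.

```python
from collections import defaultdict
from statistics import mean


def improvement_by_bin(records, edges):
    """Mean SP change after refinement, grouped by input objective score.

    records: iterable of (initial_objective, sp_before, sp_after) tuples.
    edges: sorted bin boundaries; bin k holds scores in [edges[k], edges[k+1]).
    Records outside all bins are ignored. Returns {(lo, hi): mean SP change}
    for non-empty bins, in ascending bin order.
    """
    groups = defaultdict(list)
    for objective, sp_before, sp_after in records:
        for lo, hi in zip(edges, edges[1:]):
            if lo <= objective < hi:
                groups[(lo, hi)].append(sp_after - sp_before)
                break
    return {bin_: mean(deltas) for bin_, deltas in sorted(groups.items())}


# Toy records (not study results): two low-scoring inputs, one high-scoring.
records = [(0.1, 0.25, 0.5), (0.2, 0.5, 0.75), (0.8, 0.75, 0.75)]
print(improvement_by_bin(records, [0.0, 0.5, 1.0]))
# {(0.0, 0.5): 0.25, (0.5, 1.0): 0.0}
```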
We have also described a way to validate the quality of a refined alignment by examining the performance of its sequence profile in homology searches. This validation test provides a useful quality control in the typical situation where no reference alignment is available. In addition, the test showed that the sensitivity of sequence profiles/HMMs increased when the REFINER method was used but fell slightly for the other two refinement methods studied.
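Profile sensitivity can be expressed as the fraction of known family members that a search with the profile retrieves. The sketch below assumes the search output has already been parsed into a mapping from sequence ID to E-value; the cutoff of 1e-3, the input format and the function name `sensitivity` are hypothetical, and the search tool itself is not modeled. Comparing the value for profiles built before and after refinement gives the kind of change described above.

```python
def sensitivity(hits, true_homologs, evalue_cutoff=1e-3):
    """Fraction of known family members retrieved at or below an E-value cutoff.

    hits: dict mapping sequence ID -> best E-value reported by the profile search.
    true_homologs: non-empty set of sequence IDs known to belong to the family.
    evalue_cutoff: hypothetical inclusion threshold (assumed, not from the study).
    """
    retrieved = {sid for sid, evalue in hits.items() if evalue <= evalue_cutoff}
    return len(retrieved & true_homologs) / len(true_homologs)


# Toy search output (not real data): "x" is a hit outside the known family.
hits = {"a": 1e-10, "b": 1e-2, "c": 1e-5, "x": 1e-8}
print(sensitivity(hits, {"a", "b", "c", "d"}))  # 0.5
```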
Because REFINER was designed as an alignment refinement tool, extensive benchmarking, validation and comparison of its performance are vital. We therefore ran comparison tests on large benchmark data sets and found that, on average, REFINER was moderately better at improving the quality of an input alignment. However, we have also shown that significant improvements are possible, particularly for initial alignments that score low under common objective functions. Obtaining such improvements manually, or by re-running another automated alignment-generation algorithm, is both uncertain and time-consuming. As a practical matter, therefore, refinement methods such as REFINER appear well worth the time spent applying them to alignments of interest.
Our previous study [15] established that realigning each sequence can correct misalignments between that sequence and the rest of the profile while preserving the family's overall block model. In contrast, the present manuscript compares three available methods for refining multiple sequence alignments on a standard benchmark dataset (BAliBASE 3.0). The refinement methods are also compared in terms of profile sensitivity for homolog retrieval and CPU time usage. Furthermore, we analyzed which strategies for using refinement programs are appropriate for input alignments of different quality (i.e., difficulty). We are not aware of a comparable analysis, and we believe it will help researchers in the sequence analysis field decide whether their alignment tasks can benefit from one or more refinement programs.