We compare six structural alignment methods by aligning all 4,290,985 pairs in a set of 2930 sequence-diverse CATH v.2.4 domains [2] (2930×2929/2 pairs, as (i,j) and (j,i) are treated once). These methods are implemented in the following programs: (1) SSAP [6], (2) STRUCTAL, (3) DALI using DaliLite (v.1.8) [8,9], (4) LSQMAN, (5) CE, and (6) SSM [14]. The set of structures† includes 769 fold classes. The programs output the alignment native score, the number of residues matched, and the cRMS deviation. We count the number of gaps in both structures by inspecting the alignment, which is available for all programs except the standalone version of SSM. Based on these data, we calculate the geometric match measures SI, MI, and SAS for all methods, and GSAS, which requires the gap count, for all methods except SSM. Using the geometric match measures, we create a seventh method, denoted Best-of-All, which returns the best alignment found by any of the other methods.
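The pair counts quoted in this section follow from simple arithmetic. The short Python sketch below reproduces them from numbers stated in the text. It is only a consistency check and is not part of any of the programs compared.

```python
# Consistency check of the pair counts quoted in the text.
n_domains = 2930       # sequence-diverse CATH domains in the test set
n_cat_cat = 104_309    # pairs with the same Class/Architecture/Topology

# Each unordered pair is aligned once, so there are n(n - 1)/2 pairs.
n_pairs = n_domains * (n_domains - 1) // 2

# Pairs whose two domains differ in CAT classification (the ROC negatives).
n_negatives = n_pairs - n_cat_cat

# Share of all pairs that are CAT-CAT pairs, in percent.
cat_percent = 100 * n_cat_cat / n_pairs

print(n_pairs)                # 4290985
print(n_negatives)            # 4186676
print(round(cat_percent, 2))  # 2.43
print(round(0.01 * n_pairs))  # 42910
```

The 2.43% share agrees with the roughly 2.5% quoted later for the CAT–CAT pairs. The value of 1% of all pairs, 42,910, is the number that appears in the ROC analysis.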
Comparison of methods using ROC curves
We first evaluate the methods (i.e., the six individual programs mentioned above and the Best-of-All method) by comparing their ROC curves [31].
CATH serves as the gold standard: a pair of structures is defined as “positive” (or similar) if both structures have the same CAT classification and “negative” (or not similar) otherwise. For each threshold, all pairs scoring better than the threshold are predicted positive and all others negative. Predicted positives that agree with the standard are true positives (TPs), and those that do not are false positives (FPs). We sort the alignments either by their geometric match measures or by the native scores of the programs. The method that best agrees with the gold standard will have the uppermost curve and, hence, the largest area under it. It can be argued that we are more concerned with comparing the agreement of the methods with CATH when the percentage of FPs is low. We address this by plotting the x-axis of the ROC curve on a log scale, that is, the fraction of TPs against log10 of the fraction of FPs.
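The ROC construction described above can be sketched in a few lines of Python. This is a minimal illustration that assumes lower scores are better, as for SAS. A native score for which higher is better would be negated first. The function names are hypothetical, and tied scores are broken by input order rather than handled specially.

```python
import math
from typing import List, Tuple

def roc_counts(scores: List[float], positive: List[bool]) -> List[Tuple[int, int]]:
    """Cumulative (FP, TP) counts as the threshold moves from best to worst score."""
    # sorted() is stable, so tied scores keep their input order.
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    fp = tp = 0
    counts = [(0, 0)]
    for k in order:
        if positive[k]:
            tp += 1
        else:
            fp += 1
        counts.append((fp, tp))
    return counts

def roc_auc(counts: List[Tuple[int, int]]) -> float:
    """Area under the ROC curve (trapezoidal rule) on fractions of FPs and TPs."""
    n_neg, n_pos = counts[-1]
    area = 0.0
    for (f0, t0), (f1, t1) in zip(counts, counts[1:]):
        area += (f1 - f0) * (t0 + t1) / 2
    return area / (n_neg * n_pos)

def tps_at_fp_fraction(counts: List[Tuple[int, int]], fraction: float) -> int:
    """Largest number of TPs reached while the FP fraction stays within `fraction`."""
    n_neg = counts[-1][0]
    return max(tp for fp, tp in counts if fp <= fraction * n_neg)

def log_roc_points(counts: List[Tuple[int, int]]) -> List[Tuple[float, float]]:
    """(log10 of FP fraction, TP fraction); points with zero FPs are skipped."""
    n_neg, n_pos = counts[-1]
    return [(math.log10(fp / n_neg), tp / n_pos) for fp, tp in counts if fp > 0]

# Toy example: four hypothetical pairs scored by SAS, two of them CAT-CAT pairs.
counts = roc_counts([0.5, 1.0, 1.5, 2.0], [True, False, True, False])
print(roc_auc(counts))                  # 0.75
print(tps_at_fp_fraction(counts, 0.5))  # 2
```

The log-scale plot cannot show the points with zero FPs. This is one reason the low-FP end of the curve is easier to compare when summarized as a TP count at a fixed FP fraction.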
Figure 1 shows the ROC curves of all the methods, with the alignments sorted either by their native score (broken lines) or by the SAS geometric match measure (continuous lines). The same ROC curves are also shown with log10(fraction of FPs) on the x-axis. The accompanying table lists values quantifying these ROC curves: the area under the curve and the number of TPs when the fraction of FPs is 1% (i.e., 42,910 FPs).
Figure 1 Receiver operating characteristic (ROC) curves for the structural alignment methods SSAP, STRUCTAL, DALI, LSQMAN, CE, and SSM. A true positive is assumed when the two aligned structures have the same Class/Architecture/Topology CATH classification.
Scores from ROC curves and cumulative distributions distinguish structural alignment methods
When the alignments are sorted by their SAS measure, the ROC curve analysis suggests that DALI, CE, STRUCTAL, and SSM are the strongest methods. When they are sorted by their native scores, DALI and STRUCTAL are the strongest. At low false positive rates, DALI, CE, SSAP, and STRUCTAL do well. It is somewhat surprising that CE, SSM, LSQMAN, and SSAP do much better when using the geometric measure than when using their own native score. DALI performs similarly with the two measures, but the geometric score does better at lower FP rates. STRUCTAL also performs similarly with the two measures, although the STRUCTAL native score, which was specifically designed to be better than the SAS measure [33], does increase the area under the ROC curve from 0.93 to 0.94. Clearly, a program can be successful in finding good alignments and less successful in evaluating them. Another surprising result is that the Best-of-All method does not perform better, in terms of its ROC curve, than the individual methods.
We also plot the average SAS measure of the TPs and of the FPs below increasing thresholds against the fraction of TPs. This directly compares the quality of the alignments found by different methods in equally sized sets of pairs. Here, successful methods find alignments with lower SAS values, represented by curves that are shifted to the left. The average SAS of the TPs is lower than that of the FPs. As expected from a list sorted by increasing SAS values, this difference is small in the parts of the curves corresponding to a low TP fraction. The Best-of-All method finds the best alignments, followed by STRUCTAL and SSM. Surprisingly, the ranking of the methods based on their average SAS values differs from that implied by their ROC curves. For example, DALI and CE have almost identical ROC curves, which rank them similarly, yet their SAS curves show that CE consistently finds better alignments. Another example is SSAP, which performs well by its ROC curve although its average SAS curve suggests otherwise. This means that the best alignments found by SSAP, although generally not as good as those found by other methods, often correspond to proteins in the same CATH fold class. Clearly, a method can seem as successful as, or more successful than, another method based on its ROC curve even when the alignments it actually finds are inferior.
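To make the averaged SAS curves concrete, the sketch below computes the TP fraction together with the running mean SAS of the TPs and of the FPs, for alignments sorted by increasing SAS. It is a hypothetical helper written for illustration, not code from the study.

```python
from typing import List, Tuple

def running_mean_sas(sas: List[float], positive: List[bool]) -> List[Tuple[float, float, float]]:
    """After sorting by increasing SAS, return for each threshold the TP fraction
    and the mean SAS of the TPs and of the FPs found so far (NaN if none yet)."""
    order = sorted(range(len(sas)), key=lambda k: sas[k])
    n_pos = sum(positive)
    tp = fp = 0
    tp_sum = fp_sum = 0.0
    rows = []
    for k in order:
        if positive[k]:
            tp += 1
            tp_sum += sas[k]
        else:
            fp += 1
            fp_sum += sas[k]
        mean_tp = tp_sum / tp if tp else float("nan")
        mean_fp = fp_sum / fp if fp else float("nan")
        rows.append((tp / n_pos, mean_tp, mean_fp))
    return rows

# Hypothetical SAS values (in Å) for four pairs; True marks a CAT-CAT pair.
for row in running_mean_sas([1.0, 2.0, 3.0, 5.0], [True, False, True, False]):
    print(row)
```

Two methods can be compared at the same TP fraction by reading off their mean TP SAS values. The method with the lower mean finds better alignments for an equally sized set of true pairs.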
We have also plotted the same figures with the alignments sorted by their SI, MI, and GSAS measures (data not shown, but available online†). In general, the relative performance of the methods is similar when judged by the average values of any of the four geometric match measures. For the curves of average SI, MI, and GSAS values, as for SAS, the ranking of the methods that they imply differs from the ranking suggested by the methods’ ROC curves.
Comparing the methods directly using geometric match measures
We calculate the four geometric match measures (SI, MI, SAS, and GSAS) for all alignments found by all the methods. In all four measures, better alignments correspond to lower values. Using these data, we calculate the cumulative distributions of the geometric measures over the set of 104,309 structure pairs that have the same CAT classification (the CAT–CAT set) and over the set of all structure pairs (the full set). In both cases, the total number of pairs in the set is normalized to 100%.
Figure 2 shows the cumulative distributions of the geometric match measures. Here, better performance corresponds to finding more alignments (greater values along the y-axis) with better geometric match measures (lower values along the x-axis). For GSAS, SAS, and SI, we focus on good alignments and use a cutoff value of 5 Å in all three cases. Even though all programs were given the same set of pairs, the maximum value on each curve varies, because the many alignments with a match measure greater than 5 Å are not shown. For each method, we also list, for both the CAT–CAT set and the full set, the number of pairs for which good alignments are found (as a percentage of the pairs in the set).
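Each curve is a cumulative distribution normalized to the size of the set, not to the number of alignments a method returned. The following is a minimal Python sketch that assumes "below the cutoff" means strictly less than it. The function name and the example values are hypothetical.

```python
import bisect
from typing import List

def cumulative_percent(values: List[float], n_in_set: int, cutoffs: List[float]) -> List[float]:
    """Percentage of the pairs in a set whose alignment has a measure strictly below each cutoff.

    values:   the measure (e.g. SAS in Å) of every alignment a method returned for the set.
    n_in_set: number of pairs in the set (the CAT-CAT set or all pairs); it is the
              100% denominator, so pairs with no alignment, or a poor one, never count.
    """
    ordered = sorted(values)
    # bisect_left counts the values strictly less than the cutoff.
    return [100.0 * bisect.bisect_left(ordered, c) / n_in_set for c in cutoffs]

# Hypothetical SAS values for a set of 8 pairs, of which the method aligned 4.
print(cumulative_percent([1.0, 2.5, 4.0, 6.0], 8, [2.5, 5.0]))  # [12.5, 37.5]
```

Using the set size as the denominator explains why the curves end at different heights. A method that returns poor alignments for many pairs simply never reaches 100% within the plotted range.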
Figure 2 Comparison of the quality of the alignments produced by the methods SSAP, STRUCTAL, DALI, LSQMAN, CE, and SSM, using four geometric match measures: GSAS, SAS, SI, and MI. For each geometric measure and for each method, we plot a cumulative distribution.
We see that the Best-of-All method finds the greatest number of good alignments, in both the CAT–CAT set and the full set, for all four geometric measures. The second-best performer is STRUCTAL, in both sets and for all geometric measures, and the third-best performer is SSM (except when using GSAS). Among the programs compared, SSAP performs the worst. When using GSAS, a version of SAS that penalizes alignment gaps, the relative performance is reversed in two cases: CE versus DALI and CE versus LSQMAN. This implies that CE finds less fragmented alignments than DALI and LSQMAN do. We also see that LSQMAN finds more highly similar alignments, and fewer moderately similar alignments, than all other methods except STRUCTAL.
With a similarity cutoff value of 5 Å, even the best methods find good alignments for no more than 80% of the CAT–CAT pairs. At the same threshold, there are good alignments for about 10% of all pairs, even though the CAT–CAT pairs account for only 2.5% of all pairs. This shows that there is a significant amount of structural similarity between structures in different fold classes. An alignment’s GSAS measure is generally larger than its SAS measure because we reduce the denominator (Nmat) by the number of gaps. For this reason, the percentage of pairs below the 5 Å threshold is smaller for GSAS than for SAS. Similarly, because SI equals SAS scaled by the length of the shorter structure divided by 100, the relative size of SI and SAS depends on how the typical length of the shorter structure compares with 100 residues. The percentage of pairs below the 5 Å threshold is greater for SI than for SAS, which implies that the typical length of the shorter structure is less than 100 residues (we estimate 70 residues). This means that SAS is generally larger than SI.
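The relations between the measures used in this paragraph can be checked numerically. The sketch below assumes SAS = 100·cRMS/Nmat, GSAS = 100·cRMS/(Nmat − Ngaps), and SI = cRMS·Lshorter/Nmat, which are the forms implied by the reasoning above. MI is omitted, and the example alignment is hypothetical.

```python
def sas(crms: float, n_mat: int) -> float:
    """Structural alignment score: cRMS (Å) scaled to a 100-residue match."""
    return 100.0 * crms / n_mat

def gsas(crms: float, n_mat: int, n_gaps: int) -> float:
    """Gapped SAS: as SAS, but the denominator is reduced by the number of gaps."""
    return 100.0 * crms / (n_mat - n_gaps)

def si(crms: float, n_mat: int, len_shorter: int) -> float:
    """Similarity index: cRMS scaled by the length of the shorter structure."""
    return crms * len_shorter / n_mat

# Hypothetical alignment: cRMS 2.0 Å over 50 matched residues with 5 gaps,
# where the shorter structure has 70 residues.
print(sas(2.0, 50))      # 4.0
print(gsas(2.0, 50, 5))  # about 4.44
print(si(2.0, 50, 70))   # 2.8
```

Under these forms, SI = SAS·Lshorter/100. For a shorter structure of about 70 residues, an SI cutoff of 5 Å therefore corresponds to an SAS of roughly 7 Å (5 × 100/70 ≈ 7.1).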
Analysis of the good alignments found by the Best-of-All method
We list the proportional contribution of each method to the Best-of-All method when considering good alignments. In all cases, STRUCTAL is the leading contributor, contributing more than 50%. SSM is the second-largest contributor, with more than 15%, followed by LSQMAN with over 7%. SSAP, DALI, and CE contribute less to the combined effort. If one of the top contributors were omitted, these methods could contribute more.
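The following is a minimal Python sketch of how Best-of-All picks an alignment and how the contributions can be tallied. It assumes lower-is-better measures and a 5 Å cutoff for a good alignment. The function names and the example values are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def best_of_all(alignments: Dict[str, float]) -> Tuple[str, float]:
    """Pick the method whose alignment has the lowest (best) geometric measure.

    alignments maps method name -> measure for one pair; methods that returned
    no usable value (e.g. SSM for GSAS) are simply left out. Ties go to the
    method listed first.
    """
    method = min(alignments, key=alignments.get)
    return method, alignments[method]

def contributions(pairs: Iterable[Dict[str, float]], cutoff: float = 5.0) -> Dict[str, float]:
    """Percentage of good Best-of-All alignments (measure < cutoff) won by each method."""
    wins = Counter()
    for alignments in pairs:
        if not alignments:
            continue
        method, value = best_of_all(alignments)
        if value < cutoff:
            wins[method] += 1
    total = sum(wins.values())
    return {m: 100.0 * n / total for m, n in wins.items()} if total else {}

# Hypothetical SAS values (in Å) for four pairs.
pairs = [
    {"STRUCTAL": 2.1, "SSM": 2.4, "CE": 3.0},
    {"STRUCTAL": 4.8, "LSQMAN": 3.9},
    {"DALI": 6.5, "SSAP": 7.2},  # no good alignment for this pair
    {"STRUCTAL": 1.5},
]
print(contributions(pairs))
```

Because only the winning method is credited for each pair, a method that often finds the second-best alignment contributes nothing. This is consistent with the remark that the smaller contributors could contribute more if a top contributor were omitted.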
Contributions to Best-of-All method
For each pair of structures, we find the best structural alignment, as determined by its GSAS, SAS, or SI value. This set of structure pairs is partitioned into four sets: (1) pairs that agree on all three CATH classifiers, Class/Architecture/Topology (the CAT set); (2) pairs that agree only on the first two classifiers (the CA set); (3) pairs that agree only on the first classifier (the C set); and (4) all others. In Figure 3, we plot, for several threshold values of the geometric similarity measures (less than 2.5 Å, 3 Å, …, 5 Å), the number of alignments found at that level of similarity (upper panel) and the percentage of each of the four sets found at that level (lower panel). The rightmost panels show the same analysis for the subset of long alignments (more than 50 residues matched). The threshold values we consider describe structural alignments that range from highly similar (less than 2.5 Å) to moderately similar (less than 5 Å).
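The partition and the per-threshold counts can be sketched as follows. This minimal Python illustration assumes that CATH codes are given as dot-separated strings. The CATH codes and values in the example are hypothetical, and only the two end thresholds stated above (2.5 Å and 5 Å) are used.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

def cath_category(code_a: str, code_b: str) -> str:
    """Deepest CATH level shared by two domains, from codes such as '3.40.50.300'.

    Returns 'CAT', 'CA', 'C', or 'other'."""
    a, b = code_a.split("."), code_b.split(".")
    if a[:3] == b[:3]:
        return "CAT"
    if a[:2] == b[:2]:
        return "CA"
    if a[0] == b[0]:
        return "C"
    return "other"

def composition(pairs: Iterable[Tuple[str, str, float, int]],
                thresholds: Sequence[float],
                min_length: int = 0) -> Dict[float, Counter]:
    """For each threshold, count best alignments with measure < threshold by category.

    pairs yields (code_a, code_b, measure, n_mat); with min_length=50 only
    long alignments (more than 50 residues matched) are kept."""
    table = {t: Counter() for t in thresholds}
    for code_a, code_b, value, n_mat in pairs:
        if n_mat <= min_length:
            continue
        category = cath_category(code_a, code_b)
        for t in thresholds:
            if value < t:
                table[t][category] += 1
    return table

# Hypothetical best alignments: (CATH code A, CATH code B, SAS in Å, residues matched).
example = [
    ("3.40.50", "3.40.50", 2.0, 120),
    ("1.10.8", "1.10.10", 3.0, 60),
    ("2.60.40", "3.30.70", 4.5, 40),
    ("1.10.8", "2.40.10", 6.0, 80),
]
print(composition(example, [2.5, 5.0]))
print(composition(example, [2.5, 5.0], min_length=50))
```

Each threshold's counter gives the upper-panel total (its sum) and the lower-panel percentages (each category divided by that sum).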
Figure 3 The composition of the set of structure pairs that have good alignments demonstrates the significant amount of structural similarity across CATH fold classes. These Best-of-All alignments are divided into four categories depending on the similarity of their CATH classifications.
Figure 3 shows that there are many cases of very similar structure pairs that are in different CAT classes; in some cases they are even in different C classes. For example, with a GSAS threshold of 5 Å, less than 40% of the pairs of proteins found are in the same CAT class, even when considering only the long alignments (more than 50 residues matched). Note that at this level of similarity, only 80% of all the pairs of proteins with the same CAT classification are detected by any of the programs. For both GSAS and SAS, the fraction of the alignments for which both structures are in the same CAT class is largest for the highly similar pairs (less than 2.5 Å) and smallest for the moderately similar pairs (less than 5 Å). This expected behavior is even more pronounced when we consider only long alignments. Figure 3 also shows that the number of good alignments is greatly reduced (by more than a factor of 2) when we restrict our attention to long alignments.
For each protein structure alignment method, we list the total number of pairs of proteins that only that method was able to identify as structurally similar. We include only those pairs of structures for which one method found a good alignment (SAS less than 4 Å and more than 35 residues matched) and all other programs found only bad alignments (SAS greater than 6 Å for any length of match). These alignments are, by definition, challenging for all programs but the successful one. We also break these cases down according to the C classification of the structures aligned. The complete list of these pairs, along with the best cRMS, the alignment length, and the SAS values for all the programs, is available online†. STRUCTAL contributes the largest number of alignments, followed by LSQMAN and SSM. For all three methods, a significant number of the contributed pairs correspond to similarities between a “Mainly α” protein and a “Mixed α/β” protein. In the case of STRUCTAL, there is also a significant number of pairs involving a “Mainly β” protein and a “Mixed α/β” protein.
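The selection rule for these difficult pairs can be written as a small filter. The following is a minimal Python sketch with the thresholds stated above as defaults. The function name is hypothetical. The text does not specify how a program that returned no alignment is treated, so leaving it out of the dictionary is our assumption.

```python
from typing import Dict, Optional, Tuple

def found_only_by(alignments: Dict[str, Tuple[float, int]],
                  good_sas: float = 4.0, min_good_len: int = 35,
                  bad_sas: float = 6.0) -> Optional[str]:
    """Return the one method with a good alignment for this pair, or None.

    alignments maps method -> (SAS in Å, residues matched) for one pair.
    Good: SAS < good_sas and more than min_good_len residues matched.
    The pair qualifies only if every other listed method has SAS > bad_sas.
    Assumption: methods that returned no alignment are left out of the dict.
    """
    good = [m for m, (s, n) in alignments.items() if s < good_sas and n > min_good_len]
    if len(good) != 1:
        return None
    winner = good[0]
    if all(s > bad_sas for m, (s, _) in alignments.items() if m != winner):
        return winner
    return None

print(found_only_by({"STRUCTAL": (3.2, 60), "SSM": (7.5, 40), "CE": (8.0, 90)}))  # STRUCTAL
```

The gap between the 4 Å and 6 Å bounds leaves a margin, so a pair is not credited to one method when another method comes close to a good alignment.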
Difficult structure pairs with good alignments found just by one method
We also compare the relative success of the methods in detecting alignments of pairs that we know are similar. Here, we consider six test sets of similar structure pairs and use the Best-of-All method to find the pairs in each test set. A set of similar pairs is parameterized by (1) an upper bound on the SAS value, (2) a lower bound on the alignment length Nmat, and (3) a lower bound on the percentage of overlap between the aligned residues and the shorter structure in the pair. The first column of the table lists the parameters of each test set when using Best-of-All, together with the set sizes, which vary significantly, from 20,000 to 350,000 pairs. We then use the methods to find sets of well-aligned pairs. We hold the methods to lower standards than those used to select the test sets by relaxing the bounds for good alignments: the upper SAS bound is increased, and the lower bounds on the alignment length and overlap (when applicable) are decreased. The table lists the percentage of good alignments in each test set that each method misses. In all cases, STRUCTAL, CE, and SSM miss the fewest alignments.
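The missed-alignment percentages can be computed as shown below. This is a minimal Python sketch. The bound values in the example are hypothetical and are not those of the six test sets. Treating each bound as inclusive is also our assumption.

```python
from typing import Dict, Tuple

Alignment = Tuple[float, int, int]  # (SAS in Å, residues matched, length of shorter structure)
Bounds = Tuple[float, int, float]   # (max SAS, min residues matched, min % overlap)

def meets(a: Alignment, bounds: Bounds) -> bool:
    """True if the alignment satisfies all three bounds (assumed inclusive)."""
    sas, n_mat, len_shorter = a
    max_sas, min_nmat, min_overlap = bounds
    overlap = 100.0 * n_mat / len_shorter  # % of the shorter structure that is aligned
    return sas <= max_sas and n_mat >= min_nmat and overlap >= min_overlap

def percent_missed(best: Dict[str, Alignment], method: Dict[str, Alignment],
                   strict: Bounds, relaxed: Bounds) -> float:
    """Percentage of the Best-of-All test set that a method fails to match
    under the relaxed bounds."""
    test_set = [p for p, a in best.items() if meets(a, strict)]
    if not test_set:
        return 0.0
    missed = sum(1 for p in test_set if p not in method or not meets(method[p], relaxed))
    return 100.0 * missed / len(test_set)

# Hypothetical Best-of-All and single-method alignments, keyed by pair ID.
best = {"p1": (2.0, 80, 100), "p2": (2.8, 60, 90), "p3": (4.5, 70, 100)}
one_method = {"p1": (3.4, 75, 100), "p2": (5.0, 40, 90)}
print(percent_missed(best, one_method, strict=(3.0, 50, 60.0), relaxed=(4.0, 40, 40.0)))  # 50.0
```

Relaxing the bounds for the individual methods means a method is counted as missing a pair only when its alignment is clearly worse than the Best-of-All one, not merely slightly worse.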
Percentage of good Best-of-All alignments missed by each method
Analysis of the performance of the methods on different CATH classes
Figure 4 compares the relative success of each structural alignment method on CAT pairs in the four classes of protein structure (level C of the CATH hierarchy). This is done for alignments with SAS values between 2.5 Å and 5 Å. We see that α–α and α–β pairs are over-represented when the match is good (SAS value less than 3.5 Å) and that, consequently, β–β pairs are under-represented. When the match is poorer (SAS value between 4 Å and 5 Å), each method behaves as expected: it finds the different classes of pairs with a frequency proportional to the fraction of those pairs in the entire CAT set (shown by the horizontal lines in each panel). Most methods behave similarly in that, at a particular SAS value, they detect similar percentages in each of the four classes. Such common behavior is unexpected, as the methods do not necessarily find the same pairs or even similar numbers of pairs. The LSQMAN method is unusual in that it finds more β–β pairs when the match is good (low SAS value) and consequently under-represents α + β pairs. Most of the pairs in CAT are Mixed α/β pairs (71%), followed by Mainly β pairs (23.5%), Mainly α pairs (6.2%), and Few Secondary Structures pairs (0.45%). The fact that most methods detect relatively few β–β pairs of high match quality may mean that matches between β proteins are poorer, possibly owing to the greater deformability of β strands.
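The horizontal lines mark each class's share of the whole CAT set, so over- or under-representation can be expressed as the ratio of observed share to baseline share. The following is a minimal Python sketch. The class labels, baseline shares, and detected pairs are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

def representation(detected: Iterable[str], baseline: Dict[str, float]) -> Dict[str, float]:
    """Ratio of each class's share among the detected CAT pairs to its share in the
    whole CAT set (baseline fractions); > 1 means over-represented, < 1 under-represented."""
    counts = Counter(detected)
    total = sum(counts.values())
    return {c: (counts[c] / total) / f for c, f in baseline.items()}

# Hypothetical class shares and detected pairs, for illustration only.
baseline = {"alpha": 0.25, "beta": 0.25, "alpha/beta": 0.5}
detected = ["alpha"] * 2 + ["beta"] + ["alpha/beta"] * 5
print(representation(detected, baseline))  # {'alpha': 1.0, 'beta': 0.5, 'alpha/beta': 1.25}
```

A ratio near 1 at every SAS value corresponds to the "behaves as expected" regime described for the poorer matches.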
Figure 4 Comparing the performance of the different structural alignment programs when aligning different classes of structures. We consider all 104,309 pairs of the same fold class (i.e., same CAT) and partition them into four classes according to their C classification.
We list the total central processing unit (CPU) time, in hours, used by each program to compute all the structural alignments. All programs were run under the Linux operating system (Red Hat 7.3) on a cluster of machines with dual 2.8 GHz Intel Xeon processors and 1 GB of memory each. SSM is the fastest method, followed by LSQMAN; SSAP and DALI are the slowest.
Total running time of each program on all pairs