SuperPose is composed of two parts: a front-end web interface (written in Perl and HTML) and a back-end for alignment, superposition, RMSD calculation and rendering (written in Perl and C). The front-end accepts two kinds of input, PDB text files (from a user's hard drive) and PDB accession numbers, in any combination. If users choose to use PDB accession numbers, they can also designate in the input text box which chain(s) they would like (e.g. 2TRX_A means the A chain of 2TRX). Once the PDB accession numbers are entered, the program automatically retrieves the necessary files from the PDB website. SuperPose also allows users who are not familiar with the chain structure or chain content of their chosen PDB file to select chains interactively. This is done simply by clicking on the name of the chain in the scroll boxes that SuperPose generates after it has read each PDB file. To support alternative displays and alternative superpositions, or to override SuperPose decisions, SuperPose offers three sets of options, listed below the SuperPose input form: (i) output options; (ii) alignment options; and (iii) advanced options. Most users will have no need to change the default values. Nevertheless, detailed descriptions of what these options mean and how to fill out the option boxes are provided on both the SuperPose home page and its Help pages.
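The accession-plus-chain notation described above (2TRX_A) can be illustrated with a small parser. This is only a sketch and not SuperPose's own Perl code: the function name `parse_entry` and the pattern are hypothetical, and the sketch assumes a classic four-character PDB accession that begins with a digit, optionally followed by an underscore and a single chain identifier. How SuperPose itself encodes several chains in one entry is not stated here, so that case is not handled.

```python
import re

# Hypothetical pattern (an assumption, not taken from SuperPose):
# a four-character accession starting with a digit, optionally followed
# by "_" and one chain identifier, e.g. "2TRX_A".
_ENTRY = re.compile(r"^([0-9][A-Za-z0-9]{3})(?:_([A-Za-z0-9]))?$")


def parse_entry(text):
    """Split an entry such as '2TRX_A' into (accession, chain or None)."""
    match = _ENTRY.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised PDB entry: {text!r}")
    accession, chain = match.groups()
    # Accessions are case-insensitive; chain identifiers are left untouched.
    return accession.upper(), chain
```

An entry without a chain suffix returns `None` for the chain, which a caller could treat as "use all chains" or as a cue to ask the user.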
SuperPose is designed to handle five kinds of macromolecular superposition requirements: (i) superposition of two or more molecules of identical sequence but slightly different structure; (ii) superposition of two molecules of identical sequence but profoundly different structure (e.g. open and closed forms of calmodulin); (iii) superposition of two or more molecules of modestly dissimilar sequence, length and structure; (iv) superposition of two or more molecules with profoundly different lengths but similar structure or sequence; and (v) superposition of two or more molecules that are profoundly different in sequence but similar in structure. The most common scenario, and the one supported by most superposition packages, is scenario (i). This type of superposition is frequently done in generating NMR (nuclear magnetic resonance) structure ensembles, in comparing ligand-bound and ligand-free molecules and in comparing two different crystal isoforms. In scenario (i) sequence and sequence length differences are irrelevant and the problem can be framed as a pure geometrical optimization problem. However, for the other four scenarios, sequence and length information are relevant—as is information about local structure similarity. Unfortunately, most available superposition packages do not account for this kind of information and so they frequently perform poorly or require considerable user knowledge or input to get them to perform well. To deal with all five superposition scenarios, SuperPose employs a combination of four techniques: (i) pairwise or multiple pairwise sequence alignment; (ii) secondary structure alignment (when sequence identity <25%); (iii) difference distance matrix calculation; and (iv) quaternion superposition.
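The switch between sequence alignment and secondary structure alignment at 25% identity can be sketched as follows. The sketch assumes that the two sequences have already been aligned and are given as equal-length strings with "-" marking gaps. The function names and the choice of denominator (aligned, non-gap columns) are assumptions made for illustration; the text does not state how SuperPose normalizes its identity value.

```python
IDENTITY_THRESHOLD = 25.0  # percent; the default threshold quoted in the text


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns that hold the same residue.

    Assumption: identity is normalised by the number of gap-free columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def choose_alignment(aligned_a, aligned_b):
    """Pick the alignment strategy implied by the 25% identity rule."""
    identity = percent_identity(aligned_a, aligned_b)
    if identity < IDENTITY_THRESHOLD:
        return "secondary-structure"
    return "sequence"
```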
Beginning with an input PDB file or set of files, SuperPose first extracts the sequences of all chains in the file(s). Each sequence pair is then aligned using a Needleman–Wunsch pairwise alignment algorithm (8) employing a BLOSUM62 scoring matrix. If the pairwise sequence identity falls below the default threshold (25%), SuperPose determines the secondary structure using VADAR (volume, area, dihedral angle reporter) (9) and performs a secondary structure alignment using a modified Needleman–Wunsch algorithm. After the sequence or secondary structure alignment is complete, SuperPose generates a difference distance (DD) matrix (10) between aligned alpha carbon atoms. A difference distance matrix is generated by first calculating the distances between all pairs of Cα atoms in one molecule to give an initial distance matrix. A second pairwise distance matrix is generated for the second molecule and, for equivalent/aligned Cα atoms, the two matrices are subtracted from one another, yielding the DD matrix. From the DD matrix it is possible to quantitatively assess the structural similarity/dissimilarity between two structures. In fact, the difference distance method is particularly good at detecting domain or hinge motions in proteins [see scenario (ii)]. SuperPose analyzes the DD matrices and identifies the largest contiguous domain between the two molecules that exhibits <2.0 Å difference. From the information derived from the sequence alignment and DD comparison, the program then decides which regions should be superimposed and which atoms should be counted in calculating the RMSD. This information is then fed into the quaternion superposition algorithm and the RMSD calculation subroutine. The quaternion superposition program is written in C and is based on both Kearsley's method (4) and the PDBSUP Fortran program developed by Rupp and Parkin (11). Quaternions were developed by W. Hamilton (the mathematician/physicist) in 1843 as a convenient way to parameterize rotations in a simple algebraic fashion. Because computers evaluate algebraic expressions more rapidly than trigonometric expressions, the quaternion approach is exceedingly fast.
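The difference distance calculation described above can be sketched in a few lines. This is an illustrative sketch, not SuperPose's code. It assumes the two inputs are equally long lists of (x, y, z) coordinates for already aligned Cα atoms, and it stores the absolute value of each difference, which is an assumption (the text only says the matrices are subtracted). The helper `longest_rigid_run` is one hypothetical reading of "largest contiguous domain that exhibits <2.0 Å difference": the longest stretch of consecutive residues whose mutual DD entries all stay below the cutoff.

```python
import math


def distance_matrix(coords):
    """All-against-all Euclidean distances for a list of (x, y, z) points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def difference_distance_matrix(ca_a, ca_b):
    """Absolute difference of the two intramolecular distance matrices."""
    if len(ca_a) != len(ca_b):
        raise ValueError("aligned coordinate lists must have equal length")
    da = distance_matrix(ca_a)
    db = distance_matrix(ca_b)
    n = len(ca_a)
    return [[abs(da[i][j] - db[i][j]) for j in range(n)] for i in range(n)]


def longest_rigid_run(dd, cutoff=2.0):
    """Longest half-open range [start, end) whose internal DD entries are all
    below the cutoff (hypothetical interpretation of the 2.0 Å criterion)."""
    n = len(dd)
    best = (0, 0)
    for start in range(n):
        end = start
        # Extend while the next residue agrees with every residue in the run.
        while end < n and all(dd[end][k] < cutoff for k in range(start, end)):
            end += 1
        if end - start > best[1] - best[0]:
            best = (start, end)
    return best
```

Because the DD matrix is built only from intramolecular distances, it does not depend on how the two molecules are oriented, which is why it can be computed before any superposition is attempted.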
SuperPose can calculate both pairwise and multiple structure superpositions [using standard hierarchical methods (5)] and can generate a variety of RMSD values for alpha carbons, backbone atoms, heavy atoms and all atoms (average and pairwise). When identical sequences are compared, SuperPose also generates ‘per residue’ RMSD tables and plots to allow users to identify, assess and view individual residue displacements.
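As an illustration of the RMSD values mentioned above, the following sketch computes an overall RMSD and a per-residue RMSD. It assumes the two structures have already been superposed and that their atoms are matched one-to-one in the same order. The function names and the data layout (each residue as a list of (x, y, z) tuples) are hypothetical and are not taken from SuperPose. Restricting the atom lists to alpha carbons, backbone atoms or heavy atoms would give the corresponding RMSD variants.

```python
import math


def rmsd(atoms_a, atoms_b):
    """Root mean square deviation between two matched lists of (x, y, z)."""
    if len(atoms_a) != len(atoms_b) or not atoms_a:
        raise ValueError("need two non-empty, equally long atom lists")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(atoms_a, atoms_b))
    return math.sqrt(squared / len(atoms_a))


def per_residue_rmsd(residues_a, residues_b):
    """RMSD computed separately for each pair of matched residues.

    Each residue is a list of atom coordinates; residue i of the first
    structure is compared with residue i of the second.
    """
    if len(residues_a) != len(residues_b):
        raise ValueError("structures must have the same number of residues")
    return [rmsd(a, b) for a, b in zip(residues_a, residues_b)]
```

Plotting the list returned by `per_residue_rmsd` against residue number gives the kind of per-residue displacement profile described in the text.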