We present an analysis of three test cases in which CPDL identified sets of positions that constituted a small fraction of the total amino acid sequence and included experimentally validated SDPs. The positions primarily responsible for defining class-specific functions between the desaturase and hydroxylase members of the Fad2-like family of enzymes [4] were identified. The potential SDPs that CPDL flagged for the nucleotidyl cyclases contain two residues previously identified by a structure-based approach and shown experimentally to be important determinants of specificity [3]. Our results with the MurD/E ligase family demonstrate that CPDL-identified potential SDPs are primarily located in regions of the proteins shown experimentally and/or predicted to be important for function and/or specificity. Taken together, the results from these three independent test cases suggest that CPDL-identified positions are likely to be contained within the enzyme active site, providing a link between amino acid sequence, structure, and enzyme function. These positions can thus serve as starting points for detailed structure-function studies. CPDL analysis of all three test cases, which comprise one integral membrane and two globular enzyme families, yielded a small number of amino acid positions that included those reported to contribute to specificity. This suggests that CPDL will be generally useful for the analysis of other enzyme families.
One property of CPDL that contributes to its utility is the graphic output, which consists of a pair of consensus sequences with potential SDPs marked with flags. The output is directly comparable to the input multiple sequence alignment, which is useful for visualizing whether the potential SDPs fall within regions of otherwise high homology that are likely to represent active sites. Furthermore, CPDL can display each of the properties (size, hydrophobicity, charge, polarity, and aromaticity) as well as the sequence for every residue in a protein alignment, making it possible to distinguish between potential SDPs based on property conservation (e.g., D/E changes are flagged differently from K/E changes). CPDL also incorporates a user-defined masking hierarchy that allows the output to be optimized for each comparison. We note that CPDL identifies potential SDPs without requiring a 3D structure, a feature that makes it suitable for the study of membrane proteins, for which few crystal structures are available.
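The distinction between a D/E change and a K/E change can be illustrated with a minimal sketch. The charge table below is a simplified assumption made for illustration and is not CPDL's actual property classification; the names `CHARGE`, `charge_of`, and `compare_substitution` are hypothetical.

```python
# Hypothetical, simplified charge assignments (an assumption for illustration;
# not taken from CPDL). Residues not listed are treated as neutral.
CHARGE = {"D": "negative", "E": "negative", "K": "positive", "R": "positive"}


def charge_of(residue):
    """Return the assumed charge class for a one-letter residue code."""
    return CHARGE.get(residue.upper(), "neutral")


def compare_substitution(res_a, res_b):
    """Describe a between-class difference by whether charge is conserved."""
    if res_a.upper() == res_b.upper():
        return "identical"
    if charge_of(res_a) == charge_of(res_b):
        return "sequence differs, charge conserved"
    return "sequence differs, charge differs"


de_change = compare_substitution("D", "E")  # both acidic under the assumed table
ke_change = compare_substitution("K", "E")  # basic versus acidic
```

Under these assumptions the two substitutions fall into different categories, which is the kind of distinction the property display is meant to expose. The same pattern would extend to size, hydrophobicity, polarity, and aromaticity by adding one lookup table per property.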
CPDL is unique in that it uses a distinct flag (open triangles) for positions where one class has a conserved residue but at least one member of the other class contains the same residue. Because CPDL is heavily dependent on the quality of the multiple sequence alignment, users are advised to evaluate the input data with great care. Accurate CPDL output also depends on correct functional classification. Thus, in cases where several open triangles are attributable to the same input sequence, it may be desirable either to exclude the sequence from analysis or to confirm its classification experimentally. CPDL also identifies positions where properties other than sequence (e.g., charge or hydrophobicity) are conserved within classes but differ between classes. These positions may also represent specificity-determining positions and so may warrant experimental testing.
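The check for sequences that account for several open triangles can be sketched as follows. This is an illustrative reimplementation based only on the description above, not CPDL's own code; the function names, the dictionary-based input format, and the toy sequences are all hypothetical, and the sketch assumes pre-aligned sequences of equal length with "-" marking gaps.

```python
from collections import Counter


def conserved_residue(column):
    """Return the residue shared by every sequence in a column, else None."""
    residues = set(column)
    if len(residues) == 1 and "-" not in residues:
        return next(iter(residues))
    return None


def open_triangle_sources(class_a, class_b):
    """Count, per sequence name, the positions at which that sequence carries
    the residue that is fully conserved in the other class.

    class_a and class_b map sequence names to aligned sequences.
    """
    counts = Counter()
    length = len(next(iter(class_a.values())))
    for i in range(length):
        # Test both directions: A conserved versus B, then B conserved versus A.
        for conserved, other in ((class_a, class_b), (class_b, class_a)):
            res = conserved_residue([seq[i] for seq in conserved.values()])
            if res is None:
                continue
            if conserved_residue([seq[i] for seq in other.values()]) == res:
                continue  # identical in both classes, so not a difference
            for name, seq in other.items():
                if seq[i] == res:
                    counts[name] += 1
    return counts


# Toy alignment (hypothetical data, not from the test cases in this paper).
desaturases = {"d1": "ACDT", "d2": "ACDT", "d3": "ACDT"}
hydroxylases = {"h1": "AGDS", "h2": "AGET", "h3": "ACDS"}
sources = open_triangle_sources(desaturases, hydroxylases)
```

A sequence with a markedly higher count than its classmates (here `h3`, under the toy data) would be a candidate for exclusion or for experimental confirmation of its classification.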
The CPDL program is also well suited for fine mapping of chimeric enzymes that have been constructed to coarsely map specificity-determining regions of an enzyme. Furthermore, since the CPDL input alignment is user-defined, portions of interest within proteins (e.g., domains) can be evaluated separately.