Secondary structure content (SSC) cannot be accurately calculated from circular dichroism (CD) spectra for the majority of proteins whose three-dimensional structures have been solved. ‘Reliable’ SSC that is significantly different from random SSC can be calculated from CD spectra only for all-α proteins and for all-β proteins with canonical β-strand geometry.
The two fields to which protein CD spectroscopy is applied with well-developed methodology are folding thermodynamics [1; 2] and secondary structure estimation. Most thermodynamic studies rely on relative changes in CD spectra and are therefore relatively independent of calibration with structure. In contrast, the calculation of protein secondary structure content (SSC) requires strict cross-calibration and validation of experimental and reference CD spectra against reference crystallographic or NMR structural data. Following the development and dissemination of reliable CD analysis software via the internet [4; 5; 6; 7], improvements in SSC calculations have resulted from increasing the number of proteins in the protein reference set, splitting the ordered fractions into regular and distorted portions, and extending CD spectral analysis to wavelengths below 185 nm using vacuum ultraviolet circular dichroism spectroscopy [8; 9]. Splitting the ordered fractions means dividing α-helices and β-strands into regular and distorted classes, which yields six secondary structure classifications: regular α-helix (αR), distorted α-helix (αD), regular β-strand (βR), distorted β-strand (βD), turn and disordered.
The performance of secondary structure calculations is typically characterized by the root-mean-square deviation (RMSD) between the crystal and CD estimates of the secondary-structure content,
$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i - Y_i\right)^2}$$
where X_i and Y_i are the crystallographic and CD estimates of a given type of secondary structure in reference sample i, and N is the number of reference samples. The overall RMSD is determined by considering all secondary structure fractions collectively [4; 6; 8]. Lower values of RMSD indicate less discrepancy between the calculated and crystallographic data. It is generally accepted that RMSD measures the predictive power of the method.
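The RMSD described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `rmsd` and `overall_rmsd` are hypothetical, the numbers are invented for the example and are not taken from Table 1, and the overall RMSD is assumed to pool every fraction of every protein into one list before taking the root mean square.

```python
import math


def rmsd(crystal, cd):
    """Root-mean-square deviation between paired crystallographic and CD fractions."""
    if len(crystal) != len(cd) or len(crystal) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((x - y) ** 2 for x, y in zip(crystal, cd))
    return math.sqrt(squared / len(crystal))


def overall_rmsd(crystal_by_protein, cd_by_protein):
    """Assumed pooling: flatten all fractions of all proteins, then take one RMSD."""
    xs = [f for protein in crystal_by_protein for f in protein]
    ys = [f for protein in cd_by_protein for f in protein]
    return rmsd(xs, ys)


# Hypothetical helix fractions for three proteins (illustrative values only).
crystal_helix = [0.60, 0.10, 0.35]
cd_helix = [0.55, 0.15, 0.35]
helix_rmsd = rmsd(crystal_helix, cd_helix)
print(f"helix RMSD = {helix_rmsd:.4f}")
```

With these invented values the two non-zero deviations are both 0.05, so the result is sqrt(0.005 / 3), about 0.041.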
Combining the splitting of the ordered fractions with the use of lower-wavelength CD data (down to 160 nm if obtainable) yields the best accuracy. The overall RMSD values obtained for 29 of the 31 studied proteins are lower than the overall RMSD values of 0.091–0.098 calculated on the basis of splitting alone.
Two questions arise from this and similar results. First, what is the lower limit for RMSD in such calculations? Second, is the accuracy that is reached sufficient to make a reliable and meaningful estimation of SSC for the proteins from their CD spectra? To answer these questions we have compared the overall RMSD for the 31 proteins calculated from their CD spectra  with the overall RMSD value obtained for simulated SSC assuming that the main secondary structure types (α-helices, β-strands, turns and disordered) are represented equally. The simulated SSC values were assumed equal to 0.125 each for regular and distorted helices and strands (αR, αD, βR, βD) and 0.25 each for turns and unordered structure.
Table 1 lists the RMSD values comparing crystallographic and CD SSC estimates (RMSDcd) and the RMSD values comparing the crystallographic estimates with the simulated SSC values (0.125 or 0.25, depending on the class; RMSDs). The proteins in Table 1 are grouped according to their tertiary structure class: all-α, all-β and αβ, the last combining the α+β and α/β classes [10; 11]. Peroxidase and xylanase are placed in the all-α and all-β groups, respectively, since their helix/strand ratios are 13/2 and 1/15.
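The comparison between RMSDcd and RMSDs can be illustrated with a short Python sketch. The simulated fractions (0.125 and 0.25) come from the text; everything else is an assumption made for illustration: the class labels, the function names, the choice to average each per-protein RMSD over the six classes, and the example fractions, which are invented and do not correspond to any protein in Table 1.

```python
import math

# Six classes after splitting helices and strands; the labels are hypothetical.
CLASSES = ("alpha_R", "alpha_D", "beta_R", "beta_D", "turn", "unordered")

# Simulated SSC from the text: 0.125 for each helix/strand class, 0.25 for the rest.
SIMULATED = {
    "alpha_R": 0.125, "alpha_D": 0.125,
    "beta_R": 0.125, "beta_D": 0.125,
    "turn": 0.25, "unordered": 0.25,
}


def rmsd_over_classes(reference, estimate):
    """RMSD between two SSC dictionaries, averaged over the six classes (assumed)."""
    squared = sum((reference[c] - estimate[c]) ** 2 for c in CLASSES)
    return math.sqrt(squared / len(CLASSES))


def compare_to_baseline(crystal, cd):
    """Return (RMSDcd, RMSDs, whether the CD estimate beats the simulated one)."""
    rmsd_cd = rmsd_over_classes(crystal, cd)
    rmsd_s = rmsd_over_classes(crystal, SIMULATED)
    return rmsd_cd, rmsd_s, rmsd_cd < rmsd_s


# Invented fractions for a mostly helical protein (illustrative only).
crystal = {"alpha_R": 0.50, "alpha_D": 0.20, "beta_R": 0.00,
           "beta_D": 0.00, "turn": 0.10, "unordered": 0.20}
cd = {"alpha_R": 0.45, "alpha_D": 0.20, "beta_R": 0.02,
      "beta_D": 0.03, "turn": 0.12, "unordered": 0.18}

rmsd_cd, rmsd_s, cd_is_better = compare_to_baseline(crystal, cd)
print(f"RMSDcd = {rmsd_cd:.3f}, RMSDs = {rmsd_s:.3f}, CD better: {cd_is_better}")
```

For a strongly helical composition like this invented one, the simulated SSC is far from the crystallographic fractions, so RMSDs is large. For a mixed composition the uniform fractions can land close to the crystal values by construction, and the CD estimate has much less room to outperform them.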
It is readily evident from the table that, except for human serum albumin, only for the proteins belonging to the all-α and all-β classes are the simulated RMSD values substantially higher than the experimental ones and higher than the overall RMSD value of 0.091 estimated for 29 proteins for DSSP assignments. Of the 22 proteins of the αβ class, 10 show simulated RMSDs lower than or comparable with the experimental ones, while the remainder have simulated RMSDs lower than the overall value for the DSSP assignment. The only exception is insulin, presumably because of its small size, its short structural elements and the uncertainty in its intermolecular interactions in solution. These properties are expected to influence the CD spectrum of insulin. Moreover, individual β-sheets have variable CD spectra because the geometry of β-structure varies among proteins. If β-strands lie within an unusual structural motif such as the Pentapeptide Repeat Protein fold, the SSC values calculated from CD and from crystallographic data correspond poorly. Thus the analyzed proteins included in the all-β class all apparently have canonical β-structure.
As follows from this analysis, SSC cannot be accurately calculated from CD spectra for the vast majority of proteins. Reliable calculation of SSC from CD spectra can be made only for all-α proteins and for all-β proteins whose β-strand geometry is apparently canonical. Why does this occur, and can the method be ‘repaired’? We propose that there are some intrinsic limits to the application of CD spectroscopy to protein secondary structure calculation, as summarized below: a) The quality of the Ramachandran plots of the reference crystallographic structures is poor for some reference CD datasets. In fact, for most of the structures in some protein reference sets, the fraction of residues in the most favored region of the Ramachandran plot falls below the 90% and even the 80% threshold; b) The quality of the proteins used for solution and crystallographic studies may not be consistent. Many of the reference CD spectra were obtained using commercially prepared proteins without further purification [4; 8]; c) The consistency of the reference CD database sets is sometimes suspect. The spectra of some proteins differ between databases; d) There has been little cross-validation of the instruments used to obtain reference and experimental CD spectra. Many reference CD spectra were obtained long ago, sometimes on laboratory-specific instruments whose specifications are not documented. Perhaps the creation of a central resource of published and cross-validated CD data files, such as the proposed Protein Circular Dichroism Data Bank, can help solve these problems.
Lastly, the different algorithms used to calculate protein SSC from crystallographic structures give average contents for particular structures with standard deviations comparable to the RMSD values shown in Table 1 [6; 9]. Unlike the problems above, this one cannot be easily fixed. Its solution requires the scientific community to agree to cease the indiscriminate use of the programs DSSP, Procheck, STRIDE, XtlSSTR and PROMOTIF in favor of a single one. We favor the DSSP algorithm, as it is the one most widely used for PDB files.
We conclude that a reliable and meaningful estimation of SSC from CD spectra cannot be made for proteins with mixed α and β elements in their structure, nor apparently for proteins with noncanonical β-strand geometry. At the same time, such estimations can be used as relative measures of the structural changes of proteins under different conditions, such as those observed during folding and unfolding.
The author thanks Dr. Michael Brenowitz for his constant support and for generously sharing many valuable suggestions. This study was supported by grant R01-GM079618 from the National Institute of General Medical Sciences of the National Institutes of Health.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.