Our work uses the FEATURE program for recognizing calcium sites, but can be used with any similar structure-based method. FEATURE outperforms published methods in detecting divalent cation binding sites [8
]. Glazer et al. compared FEATURE with another method for predicting ligand-binding sites (VALENCE). At a very low level of false positives (16 out of 22*10e3) FEATURE identified 21 out of 24 sites in holo structures and 16 out of 23 sites in apo structures; VALENCE identified 15 out of the 24 sites in the holo structures and zero out of 23 the sites in apo structures. The key contribution of this work is the combination of modeling and machine learning to recognize probable functions in disordered protein regions. Our method also produces a low-resolution structural model of the missing loops. Our goal is not to predict the structure of a binding site or loop with crystallographic accuracy, but to demonstrate that the modeled loop can adopt a conformation consistent with calcium binding. We first showed that FEATURE can recognize known ligand-binding loops that are rebuilt using modeling methods. We generated de novo
loop structures using two separate strategies. The RMSDs between the 100 rebuilt loops and the corresponding natural apo loop structure range from 0.7 to 6.5 Å; and those to the corresponding natural holo structure range from 0.8 to 5.6 Å. The rebuilt loops show good structural diversity. We demonstrated that a cutoff of 50 has very low false-positive rate, a critical feature for prediction. Since biologists may pursue a prediction with experiments, it is generally more important to have a low false positive rate than a low false negative rate. On the other hand, false positives can lead to wasted experimental resources. False negatives are not typically tested and so although regrettable, do not generally consume resources. The trade off depends on the particular goals of the investigator, and so the cutoffs can be changed to accommodate different priorities for sensitivity and specificity.
At the cutoff of 50, the RMSDs between the FEATURE-recognized loops and the corresponding natural apo loop structure range from 1.3 to 4.8 Å; and those to the corresponding natural holo structure range from 1.3 to 4.7 Å. Thus, the modeling protocol produces diverse loops, and then FEATURE identifies the subset whose configuration is compatible with calcium binding. Table shows two proteins (1DVIB277 and 1K94_998) where the experimental apo structure shows strong calcium binding signal with FEATURE, but none of the rebuilt loops show strong signal. Thus, even though the model building generally works, it does not always generate variations that are compatible with calcium binding.
We went on to show that we can combine FEATURE with modeling methods to find binding sites that are not completely resolved experimentally (they are unstructured or disordered in the crystal structure). We applied this procedure to the candidate calcium-binding proteins for which experimentally solved structures are incomplete, resulting in 256 novel predictions of calcium-binding sites. As discussed above, we selected a cutoff score that increases the likelihood that these calcium-binding predictions are correct. Some of our predictions may nonetheless be false positives, but our analysis suggests that they are very likely to be functionally interesting regions of these proteins. For example, they may bind other cations or positively charged ligands. Magnesium binding is very difficult to distinguish from calcium binding using "low resolution" methods such as FEATURE, where features are averaged in shells of thickness 1.25 Angstroms. These sites display a variety of structural characters and have a broad range of functions, providing novel insights into the studies of calcium-binding proteins.
To further test the usefulness of our protocol, we have applied it to CASP8 targets http://predictioncenter.org/casp8/
. There is only one calcium-binding protein in the 128 CASP8 targets, C-terminal domain of probable hemolysin from chromobacterium violaceum
). We scanned 24 loops in 3DED
, including six calcium-binding loops and 18 loops as negative controls. FEATURE successfully identifies the six calcium-binding loops. For each positive prediction, we found multiple high FEATURE scores in both crystallographic loops as well as our modeled loops. These high scores are distributed over a relatively large volume, most likely indicating more than one calcium ion in these loops. In fact, each pair of loops binds 2 calcium ions. None of the 18 loops (negative control) are predicted as false positive. These results demonstrate that our procedure can identify calcium-binding sites successfully even when the structures of binding-loops are rebuilt using modeling methods.
Our prediction strategy focused on proteins with electron density gaps that are in SCOP families that bind calcium. We decided to focus on these to limit the number of gaps that we modeled and evaluated for this study, and because it increased the chance that our predictions would be biologically plausible and interpretable. It would certainly be possible to predict calcium-binding loops for all unstructured loops. Interestingly, only 11 of the 78 highest hits were in proteins that already had resolved binding sites. These 11 predictions are thus for an additional calcium site. This is not at all unusual, as many proteins bind multiple calcium ions. The remaining top 67 predictions are all novel and would represent the first calcium-binding site in these structures.
The most well known calcium-binding site is the EF-hand sequence motif [30
]. We scanned 500 different SCOP fold families known to bind calcium. About 2% of structures (about 29000 structures) in these 500 families contain EF-hand motif. Our study focused on 2495 structures with regions of poor (or missing) electron density. Only 0.08% (total 19 structures) of the 2485 structures contain EF-hand motifs. Furthermore, only three loops that we modeled and scanned are within the EF-hand motifs. We successfully recognize the calcium-binding activity of these three loops: they are 1SL9
(homolog of 1U5R
). The paucity of EF-hand missing protein loops suggests that they tend to be stable, even in the absence of calcium. Indeed, we find that both the apo and holo states of EF-hand form ordered structures; and so experimental 3D structures of both forms are usually available. The remaining (non-EF-hand) sites generally are not associated with good sequence motifs, and seem to show higher structural flexibility
We have observed that a considerable number of predictions adopt anti-parallel beta structures, but in different topologies, including beta-barrel, beta-sandwich and Greek key. These proteins display a wide range of function roles, including photosynthesis (1DBZ
), aggregation of acetylcholine receptors (1PZ7
), membrane insertion as a responsible for anthrax (1ACC
). For these proteins, calcium ions are often predicted to bind the loops between beta-strands. These structures show structural homologies with the C2 domains, which form an eight-stranded anti-parallel beta-sandwich consisting of a pair of four-stranded beta-sheets. The C2 domain is often involved in calcium-dependent membrane targeting. Calcium ions often bind in an indentation formed by the first and final loops of the membrane binding face of the C2 domain. Newton et al proposed a model for calcium-dependent membrane binding by the C2 domain: calcium binding induces a conformational change in the C2 domain in order to expose functional groups responsible for membrane binding [31
]. In these predicted sites found in the anti-parallel beta structures, loops are not observed in crystal structures due to their high flexibility. We proposed that calcium binding to these loops induces conformational changes, which contribute to the functional roles of the loops.
We note that 61 proteins of our 78 novel predictions bind to other ligands in addition to metal ions. In those cases, it is possible that calcium plays a role in regulating the binding of other ligands, as has been observed in some proteins[32
]. For example, calcium induces a conformational change in the ligand-binding site of the LDL receptor and maintains the cysteine-rich regions in a more folded, native state. In the absence of calcium, the LDL receptor does not have the proper conformation for ligand binding. Consequently, only when the LDL receptor is exposed to calcium, it is competent to bind the ligand [33
]. In our predictions, we have predicted that calcium binding regulates the binding of 1PU or other consequent activity changes to the cell division protein kinase 2 (1GIH
) (see the Results). Our prediction has been confirmed by the observation of its homologous structure 1U5R
. A similar role of calcium binding is observed in one of the 78 novel predictions, mycobacterium tuberculosis RecA (PDB ID: 1G18
). We therefore propose that these 61 proteins' activity may be regulated by calcium binding via allosteric effects.