We used the NetPhos algorithm to predict putative phosphorylation sites in the DNA repair and cell cycle proteins, and examined whether 89 naturally occurring nsSNPs (64 from 28 DNA repair and 25 from 19 cell cycle genes) might alter the phosphorylation patterns of these proteins. The sensitivity of NetPhos prediction has been reported to be 69–96%, with a false-positive prediction rate of 0–26% for Y, 0–11% for S, and 0–14% for T [22]. The results obtained using the NetPhos software are shown in Table , and are summarized in Table . Our results show that 16.9% (15/89) of the nsSNPs studied are likely to abolish or create 17 putative phosphorylation sites in 44.0% (14/32) of the proteins. As summarized in Table , five nsSNPs (ERCC5-S311C, OGG1-S326C, XRCC3-T241M, CCND3-S259A, and CDKN1A-S31R) were predicted to abolish putative phosphorylation sites, whereas four nsSNPs (ERCC2-H201Y, ERCC4-P379S, LIG4-P231S, and XRCC1-P309S) were predicted to create putative phosphorylation sites in the proteins. These nsSNPs resulted in the addition or removal of an S, T or Y residue at the predicted phosphorylation site.
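The residue-level criterion in the preceding paragraph (a substitution that adds or removes an S, T or Y at the SNP position) can be illustrated with a minimal sketch. The function name `classify_substitution` and the returned labels are hypothetical and are not part of NetPhos. The check covers only the residue change itself; whether a site is actually predicted still depends on the NetPhos score for the surrounding sequence.

```python
import re

# Residues that can accept a phosphate group in the NetPhos predictions discussed here.
PHOSPHO_ACCEPTORS = {"S", "T", "Y"}


def classify_substitution(label):
    """Classify a label such as 'OGG1-S326C' by its effect at the SNP position.

    Hypothetical helper: it inspects only the reference and variant residues
    and does not reproduce any NetPhos scoring.
    """
    _gene, change = label.rsplit("-", 1)
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", change)
    if match is None:
        raise ValueError(f"unrecognized substitution: {label}")
    ref, _position, alt = match.groups()
    ref_is_acceptor = ref in PHOSPHO_ACCEPTORS
    alt_is_acceptor = alt in PHOSPHO_ACCEPTORS
    if ref_is_acceptor and not alt_is_acceptor:
        return "removes acceptor"
    if alt_is_acceptor and not ref_is_acceptor:
        return "adds acceptor"
    if ref_is_acceptor and alt_is_acceptor:
        return "swaps acceptor"
    return "no acceptor change"


# The nine nsSNPs named in the text above.
PREDICTED_TO_ABOLISH = ["ERCC5-S311C", "OGG1-S326C", "XRCC3-T241M",
                        "CCND3-S259A", "CDKN1A-S31R"]
PREDICTED_TO_CREATE = ["ERCC2-H201Y", "ERCC4-P379S", "LIG4-P231S", "XRCC1-P309S"]

for snp in PREDICTED_TO_ABOLISH + PREDICTED_TO_CREATE:
    print(snp, classify_substitution(snp))
```

A substitution such as BRCA1-P871L falls into the "no acceptor change" class, which is why its possible effect has to be assessed through the neighbouring recognition motif instead.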
Table 1. nsSNPs that abolish or create putative phosphorylated residues in DNA repair and cell cycle proteins. Only the NetPhos predictions that remove or create a site at either the SNP location or at kinase recognition motifs are shown.
Distribution of the nsSNPs predicted to alter the phosphorylation sites.
The kinase recognition/interaction motif involves 7–12 amino acids around the phosphorylated residue [40], and the physicochemical characteristics of these amino acids determine the specificity of the protein kinases [41]. Thus, amino acid substitutions within the kinase recognition motifs are likely to influence substrate recognition and the subsequent phosphorylation by kinases. Accordingly, we have identified six nsSNPs (Table , ) located within the phosphorylation motif of six proteins (within 4 amino acids on either side of the putative phosphorylated residue based on NetPhos outputs) that abolished eight putative phosphorylation sites (BRCA1-P871L at S868, BRCA1-S1040N at S1041, ERCC5-S311C at S310, IGHMBP2-T671A at S672, WRN-S1079L at S1083 and at S1084, CCNI-V207I at S208, and NFKB1-H712Q at T716). Interestingly, NetPhos predicts two overlapping phosphorylation motifs for the ERCC5-S311C nsSNP (S311 SLPSSSKMH and S310 ESLPSSSKM), both of which are completely abolished by the substitution of the serine residue (position 311) with a cysteine (Table ). Similarly, the WRN-S1079L nsSNP was also predicted to remove two putative overlapping phosphorylation motifs (S1083 SKTVSSGTK and S1084 KTVSS
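The overlapping ERCC5 motifs quoted above can be reproduced with a short sketch of the "4 amino acids on either side" window. The helper names `window` and `substitute` are hypothetical. The residue numbering of the fragment (306–315) is an assumption, inferred by taking each quoted 9-residue motif to be centred on its labelled site; it is not taken from a sequence database.

```python
def window(fragment, fragment_start, center, flank=4):
    """Return the residues from center - flank to center + flank (inclusive).

    `fragment_start` is the protein position of the first residue in `fragment`.
    """
    lo = center - flank - fragment_start
    hi = center + flank - fragment_start + 1
    if lo < 0 or hi > len(fragment):
        raise ValueError("window extends beyond the supplied fragment")
    return fragment[lo:hi]


def substitute(fragment, fragment_start, position, ref, alt):
    """Apply a single amino acid substitution, checking the reference residue."""
    idx = position - fragment_start
    if fragment[idx] != ref:
        raise ValueError(f"expected {ref} at position {position}, found {fragment[idx]}")
    return fragment[:idx] + alt + fragment[idx + 1:]


# ASSUMPTION: ERCC5 residues 306-315, reconstructed by overlapping the two
# 9-residue motifs quoted in the text (ESLPSSSKM at S310, SLPSSSKMH at S311).
ERCC5_FRAGMENT = "ESLPSSSKMH"
ERCC5_START = 306

wild_type_310 = window(ERCC5_FRAGMENT, ERCC5_START, 310)
wild_type_311 = window(ERCC5_FRAGMENT, ERCC5_START, 311)

# ERCC5-S311C changes the central residue of one window and a flanking
# residue of the other, so both motifs are altered by the same nsSNP.
variant = substitute(ERCC5_FRAGMENT, ERCC5_START, 311, "S", "C")
variant_310 = window(variant, ERCC5_START, 310)
variant_311 = window(variant, ERCC5_START, 311)

print(wild_type_310, "->", variant_310)
print(wild_type_311, "->", variant_311)
```

The sketch shows only how a single substitution falls inside two windows at once; the prediction that both sites are abolished comes from the NetPhos output, not from this code.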
The Swiss-Prot [43], HPRD [44], PhosphoBase [45], and Phospho.ELM [46] databases and the existing literature did not reveal any experimentally verified phosphorylation at the predicted sites. Analysis of the mouse orthologues showed that the corresponding amino acids at the BRCA1-S1041, CCNI-S208, ERCC5-S310, IGHMBP2-S672, WRN-S1083 and XRCC3-T241 residues were also predicted to be phosphorylated, suggesting that these motifs/sites might have been evolutionarily conserved between the two species. On the other hand, the remaining phosphorylation sites, which are not detected in the mouse proteins, may represent newly evolved phosphorylation motifs in humans. However, considering the false-positive rate of NetPhos, as well as the possibility that negative selection acting on the nsSNP sites can result in higher false-positive rates, we cannot completely rule out the possibility that all of the predictions in Table are false. Nevertheless, these predictions are still of great value and suggest possible phosphorylation sites that can be experimentally evaluated. In the future, when sufficient molecular data regarding the phosphorylation status of orthologous proteins are available, more systematic analyses can be performed to maximize the accuracy of phosphorylation predictions.
We have also performed an extensive literature review to investigate the role of the reported nsSNPs (minor allele frequencies ≥5%) in human cancer predisposition (Table ). Supporting our hypothesis, three SNPs (CDKN1A-S31R, OGG1-S326C, and XRCC3-T241M) have already been found to be associated with altered cancer risk. The XRCC3-T241M nsSNP was reported to be associated with increased breast cancer [47] and melanoma risk [49], and was also found to be protective against bladder cancer in heavy smokers [50]. XRCC3 is a key DNA repair protein involved in homologous recombination repair [29] and is involved in repairing the alterations caused by many DNA damaging agents. Recently, the XRCC3-M241 variant has been associated with an increased incidence of tetraploid cells, which are frequently observed in cancers, through an effect on the function of the XRCC3- and Rad52-associated RPA protein [51]. Similarly, the OGG1-S326C SNP was found to be associated with increased lung [52], orolaryngeal and esophageal cancer risk [53]. OGG1 is a DNA repair protein that protects against the mutations induced by 8-hydroxyguanine. Yamane et al. suggested that OGG1-C326, when compared to OGG1-S326, was associated with a lower repair capacity for 8-hydroxyguanine-induced mutations in human cells. In the case of CDKN1A-S31R, CDKN1A-S31 was suggested to be associated with increased endometrial cancer risk [56], whereas CDKN1A-R31 was associated with increased primary open-angle glaucoma [57] and esophageal cancer risk [58]. The CDKN1A-R31 form of the protein was not significantly different from the CDKN1A-S31 form in terms of its ability to suppress colony formation [59]. However, it is not clear whether this result implies that CDKN1A-R31 is functionally equivalent to the wild-type allele in the other diverse cellular mechanisms in which the CDKN1A protein is involved, such as apoptosis, cell migration, and senescence [60
Table 3. Common nsSNPs with a possible role in cancer predisposition. Only the information derived from the studies on the protein function as well as the studies with a suggestion of disease-association have been included.
In addition to the SNPs already implicated in cancer risk, we identified one relatively common nsSNP potentially altering the phosphorylation pattern of a major breast and ovarian cancer susceptibility gene, BRCA1. The BRCA1-P871L SNP was not found to be associated with either breast [62] or ovarian cancer risk [63]; however, further analyses are required to determine whether this nsSNP or the other nsSNPs in Table play a role in susceptibility to other cancer types.
How can we explain that commonly occurring nsSNPs (minor allele frequencies ≥5%) are likely to affect the phosphorylation, and thus the function, of the proteins? If the phosphorylation site is necessary for the function of the protein and the protein is necessary for the fitness of the organism (an indispensable/essential protein), then we would expect such nsSNPs (deleterious alleles) either to be removed from the population or to be kept at a low allele frequency by purifying selection. In this case, one could conclude that the common nsSNPs presented in this report may have been falsely predicted by the NetPhos program to remove or create putative phosphorylation sites. However, the frequencies of deleterious alleles in proteins that are essential for fitness can also be higher than expected when the nsSNPs are (a) created by hot-spot mutation mechanism(s) or (b) subject to balancing selection [64]. Alternatively, even though the nsSNPs (and the abolished/created phosphorylation sites) have an important impact on the protein function, the protein and/or the altered protein function may not affect fitness, which can also explain the lack of purifying selection against such nsSNPs and their relatively high minor allele frequencies. In addition, the biological consequences of altered protein function may only be exerted under certain environmental conditions.