Many independent studies have revealed a gain of new biochemical and biological functions as a result of TP53 mutations, suggesting that this gene also has properties of an oncogene [4]. This notion has been supported by bioinformatic analysis of the tumor-specific mutation spectra in the TP53 gene, which shows a highly significant excess of missense mutations over the neutral expectation, suggesting that p53 evolution in tumors is subject to positive selection [14]. In Table , we present an update of this comparison based on the latest somatic mutation data; the substantial excess of non-synonymous substitutions suggests that positive selection [13] acts on p53 in all tumor types for which sufficient information was available.
NSMC and NSCS test results for TP53 somatic mutation spectra (H0: mutational bias; H1: selectional bias)
We developed a simple statistical test (hereinafter the NSMC test, after Non-Synonymous Monte Carlo) that specifically addresses the dilemma of mutational vs. selectional origin of the hotspots. The test compares samples of synonymous and non-synonymous sites selected such that both the number of sites and the number of mutations in the samples are the same (Fig. ). Only positions at which mutations were found were analyzed. Sites at which both synonymous and non-synonymous substitutions were observed (e.g., third positions in two-codon series) were analyzed independently for the two types of substitutions. The NSMC test was designed to account for differences in nucleotide composition and in the frequencies of substitutions between synonymous and non-synonymous sites (non-synonymous sites were sampled to mimic the nucleotide composition of synonymous sites). With this normalization, comparing the numbers of hotspots (i.e., sites with at least two substitutions) in the samples of synonymous and non-synonymous sites (designated NSH and NNH, respectively) gives a measure of the skewness of the distribution of mutations (Fig. ).
Monte Carlo simulations, repeated 100,000 times, were used to assess the statistical significance of differences between the distributions of hotspots in synonymous and non-synonymous sites. Two alternative statistical hypotheses were tested: H0, mutational bias (no difference between the distributions of hotspots in synonymous and non-synonymous sites), and H1, selectional bias (the distributions of hotspots in synonymous and non-synonymous sites differ). The fraction of simulated sets in which NNH > NSH is taken as the probability P(H1) of rejecting H0. Large values of P(H1) (≥ 0.95) indicate that H0 is rejected and that there is a significant excess of hotspots in non-synonymous sites.
Figure 1. The procedure used for random sampling of mutations at non-synonymous sites in the NSMC test. Step 1 includes the selection of a sample of non-synonymous sites such that the number of sites and their base composition were the same as in the entire set
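The NSMC procedure can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the input format (one `(base, n_mutations)` pair per mutated site) and the function names `count_hotspots` and `nsmc_p_h1` are hypothetical, and the second step (subsampling individual mutations so that the non-synonymous sample carries as many mutations as the synonymous one) is an assumption, since the figure caption above describes only Step 1.

```python
import random
from collections import Counter


def count_hotspots(counts, threshold=2):
    """Number of sites carrying at least `threshold` mutations."""
    return sum(1 for c in counts if c >= threshold)


def nsmc_p_h1(syn_sites, nonsyn_sites, n_sim=100_000, threshold=2, seed=0):
    """Estimate P(H1), the fraction of simulated sets with NNH > NSH.

    syn_sites, nonsyn_sites: lists of (base, n_mutations), one entry per
    mutated site (hypothetical input format). Raises ValueError if the
    non-synonymous pool is too small to match the synonymous sample.
    """
    rng = random.Random(seed)
    nsh = count_hotspots((n for _, n in syn_sites), threshold)
    n_mut = sum(n for _, n in syn_sites)
    composition = Counter(base for base, _ in syn_sites)
    by_base = {}
    for idx, (base, _) in enumerate(nonsyn_sites):
        by_base.setdefault(base, []).append(idx)

    wins = 0
    for _ in range(n_sim):
        # Step 1: draw non-synonymous sites with the same number of sites and
        # the same base composition as the synonymous set.
        picked = [i for base, k in composition.items()
                  for i in rng.sample(by_base.get(base, []), k)]
        # Step 2 (assumed): subsample individual mutations from those sites so
        # that the total number of mutations matches the synonymous set.
        pool = [i for i in picked for _ in range(nonsyn_sites[i][1])]
        drawn = Counter(rng.sample(pool, n_mut))
        nnh = count_hotspots(drawn.values(), threshold)
        wins += nnh > nsh
    return wins / n_sim
```

Under this sketch, a value of `nsmc_p_h1(...)` of at least 0.95 would correspond to rejecting H0. The `threshold` parameter makes it straightforward to repeat the test with different hotspot definitions.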
Using the NSMC test, we detected a statistically significant excess of hotspots in non-synonymous sites in 50% of the tumor types for which extensive mutational data were available (Table ). When the data from all tumor types were pooled, the excess of hotspots in non-synonymous sites was highly significant: the null hypothesis that the distributions of mutations in synonymous and non-synonymous sites are identical was rejected with P < 0.001 (Table ). Because the test accounts for differences in nucleotide composition, mutational biases are not expected to differ between synonymous and non-synonymous positions. Thus, the greater skew of the mutation distribution in non-synonymous positions should be viewed as evidence of a primarily selectional origin of the hotspots.
Combined NSMC and NSCS test results for the TP53 somatic mutation spectra
This conclusion was further supported by analysis of the mutation spectra after removal of CpG dinucleotides, the most prominent mutational hotspots in the human genome [24]. In this analysis, many hotspots at CpG sites overlapping arginine, glycine, and valine codons were removed, but the selection hypothesis was nevertheless supported for several tissues and for the combined spectrum (Additional file 1). Furthermore, the results of the NSMC test performed before or after removal of the CpG sites did not depend on the threshold used for hotspot identification (Additional file 2).
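The CpG filtering and the threshold check can be illustrated with a small sketch. It assumes, as one plausible reading of the text, that "removal of CpG dinucleotides" means discarding every mutated position that lies in a CpG on the coding strand; because CpG is its own reverse complement, scanning one strand finds the same dinucleotides as scanning both. The functions, the toy sequence, and the mutation counts are hypothetical.

```python
def cpg_positions(seq):
    """0-based positions of the C and the G in every CpG dinucleotide."""
    seq = seq.upper()
    pos = set()
    for i in range(len(seq) - 1):
        if seq[i] == "C" and seq[i + 1] == "G":
            pos.update((i, i + 1))
    return pos


def drop_cpg(spectrum, seq):
    """spectrum: dict position -> mutation count; returns a copy without CpG sites."""
    cpg = cpg_positions(seq)
    return {p: n for p, n in spectrum.items() if p not in cpg}


def hotspots(spectrum, threshold):
    """Sorted positions with at least `threshold` mutations."""
    return sorted(p for p, n in spectrum.items() if n >= threshold)


# Hypothetical toy data: re-identify hotspots at several thresholds,
# before and after CpG removal.
spectrum = {1: 5, 3: 2, 6: 4, 7: 1}
seq = "ACGTTCGA"
for t in (2, 3):
    print(t, hotspots(spectrum, t), hotspots(drop_cpg(spectrum, seq), t))
# prints:
# 2 [1, 3, 6] [3]
# 3 [1, 6] []
```

In a full analysis, the filtered spectra for each threshold would then be passed to the NSMC test to check that its conclusion does not change.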
We also applied the NSMC test to compare the distributions of hotspots in nonsense and synonymous sites. An excess of hotspots in nonsense sites would be indicative of positive selection for loss of p53 function. A significant excess of hotspots in nonsense sites was detected only in colorectal cancers, as opposed to 8 of the 16 analyzed tumor types in which hotspots non-randomly associated with non-synonymous sites were identified (Table ). The difference between the excess of hotspots in non-synonymous sites and the excess of hotspots in nonsense sites was statistically significant (P = 0.015, Fisher's exact test). This observation is compatible with the notion that non-synonymous hotspots in p53 evolve under positive selection for gain of function.
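The Fisher's exact comparison can be sketched with a two-sided test on a 2 × 2 table. The layout is an assumption: rows are the two comparisons (nonsense vs. synonymous, non-synonymous vs. synonymous) and columns count tumor types with and without a significant excess, using the 1 of 16 and 8 of 16 tumor types stated above. Under this assumption the sketch gives a value that rounds to the reported P = 0.015. The function name is hypothetical; `scipy.stats.fisher_exact` would serve the same purpose if SciPy is available.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are kept.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Assumed table: 1 of 16 tumor types with a nonsense-site excess,
# 8 of 16 with a non-synonymous-site excess.
p = fisher_exact_two_sided(1, 15, 8, 8)
print(round(p, 3))  # prints 0.015
```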
We further tested the hypothesis of independence between the mutation class (hotspot vs. non-hotspot) and the site class (non-synonymous vs. synonymous). The data for all analyzed spectra were represented as 2 × 2 contingency tables, which were analyzed using the χ² test (hereinafter the NSCS test, after Non-Synonymous Chi-Square). Using the NSCS test, we observed a significant excess of hotspots in non-synonymous sites compared with the expectation under the independence hypothesis. Thus, two independent statistical tests show that, in the spectra of somatic mutations in the TP53 gene from most tumors, the hotspots are highly non-randomly associated with non-synonymous sites. In direct analogy to the classical Ka/Ks signature of positive selection [13], this preferential occurrence of hotspots in non-synonymous positions indicates that the hotspots result mostly from positive selection for new functions of the p53 protein.
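A minimal sketch of the NSCS computation follows, assuming a Pearson χ² test without continuity correction (the text does not say whether a correction was applied) and assuming that the four cells are counts; whether they count sites or mutations is not specified, so the function simply takes four numbers. The p-value uses the identity that the χ² survival function with one degree of freedom equals erfc(√(χ²/2)); the direction flag records whether non-synonymous sites carry more hotspots than expected under independence. The function name and arguments are hypothetical, and all four margins must be non-zero.

```python
from math import erfc, sqrt


def nscs_chi2(hot_ns, cold_ns, hot_s, cold_s):
    """Pearson chi-square (1 df, no continuity correction) for the 2x2 table
    rows = site class (non-synonymous, synonymous),
    columns = mutation class (hotspot, non-hotspot).

    Returns (chi2, p, excess_in_ns). All row and column totals must be > 0.
    """
    a, b, c, d = hot_ns, cold_ns, hot_s, cold_s
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(chi2 / 2))  # survival function of chi-square with 1 df
    # Direction: more non-synonymous hotspots than expected under independence.
    excess_in_ns = a > (a + b) * (a + c) / n
    return chi2, p, excess_in_ns
```

Because the χ² statistic is symmetric in the direction of the deviation, the `excess_in_ns` flag is what distinguishes an excess of hotspots in non-synonymous sites from a deficit.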
Both the NSMC and the NSCS tests produced the opposite result when applied to the available mutational spectra of three other tumor suppressor genes, BRCA1, BRCA2, and p16INK4a (Table ) [26]. For these genes, the hypothesis that hotspots are randomly distributed between synonymous and non-synonymous sites could not be rejected. This observation suggests that p53 might be unique among tumor suppressors in that its somatic evolution in many tumors involves intense positive selection for gain of function. However, it cannot be ruled out that the available mutation data for the other tumor suppressors are insufficient to detect a statistically significant association of hotspots with non-synonymous sites.
The NSMC and NSCS test results for BRCA1, BRCA2, and p16 genes
We also developed a third statistical test (hereinafter the NSB test, after Non-Synonymous Binomial) to identify non-synonymous substitution hotspots (analyzed, for this purpose, at the level of codons), i.e., codons with a statistically significant excess of non-synonymous substitutions over the random expectation. The expected numbers of non-synonymous and synonymous substitutions were calculated using a Monte Carlo simulation procedure, repeated 1,000 times for each codon; each step involved random shuffling of transitions and transversions among the three positions of the codon. The statistical significance of the observed excess over the random expectation was assessed using the binomial test with the Bonferroni correction for multiple tests.
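The per-codon NSB procedure might be sketched as follows. This is a hypothetical reconstruction under stated assumptions: each observed substitution keeps its type (transition `"ts"` or transversion `"tv"`) and is reassigned to a random codon position; a transversion picks one of its two possible target bases at random; a change to a stop codon is counted as non-synonymous; and the binomial test is one-sided (excess only). The standard genetic code (NCBI table 1) is encoded in TCAG order. All function names are illustrative.

```python
import random
from math import comb

# Standard genetic code (NCBI translation table 1), codons indexed in TCAG order.
BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def translate(codon):
    i, j, k = (BASES.index(b) for b in codon)
    return AA[16 * i + 4 * j + k]


def mutate(codon, pos, kind, rng):
    """Apply one transition ('ts') or transversion ('tv') at position pos."""
    base = codon[pos]
    if kind == "ts":
        new = TRANSITION[base]
    else:
        new = rng.choice([b for b in BASES if b != base and b != TRANSITION[base]])
    return codon[:pos] + new + codon[pos + 1:]


def expected_ns_fraction(codon, kinds, n_sim=1000, seed=0):
    """Monte Carlo estimate of the non-synonymous fraction when the observed
    substitution types are shuffled among the three codon positions."""
    rng = random.Random(seed)
    ref = translate(codon)
    ns = total = 0
    for _ in range(n_sim):
        for kind in kinds:
            mutant = mutate(codon, rng.randrange(3), kind, rng)
            ns += translate(mutant) != ref  # stop codons count as non-synonymous
            total += 1
    return ns / total


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def nsb_codon_pvalue(codon, kinds, n_nonsyn_obs, n_codons_tested):
    """Bonferroni-corrected one-sided binomial p-value for one codon."""
    p_ns = expected_ns_fraction(codon, kinds)
    raw = binom_upper_tail(n_nonsyn_obs, len(kinds), p_ns)
    return min(1.0, raw * n_codons_tested)
```

A codon would be called a non-synonymous hotspot when `nsb_codon_pvalue` falls below the chosen significance level, with `n_codons_tested` set to the number of codons examined in the gene.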
The NSB test revealed from 1 (p16) to 59 (TP53) hotspots non-randomly associated with non-synonymous sites in each of the tumor suppressors (Table ). Thus, positive selection might affect not only the somatic evolution of p53 but also that of other tumor suppressors, although apparently to a lesser extent. The failure of the NSMC and NSCS tests to detect the signature of positive selection in genes other than p53 could be due to the fact that these tests require a large number of synonymous substitutions, which is currently available only for p53. Alternatively, it cannot be ruled out that synonymous substitutions are underrepresented in the databases for BRCA1, BRCA2, and p16. Such an artifact would affect the NSB test, potentially producing false positives, but not the NSMC or NSCS tests. Expanded compendia of somatic mutations for these genes and thorough database curation are critical for a reliable assessment of the contribution of positive selection to their evolution in tumors. Even the largest available database of somatic mutations, that for p53, is not large enough for some statistical analyses. For example, we were unable to apply our tests to G>T substitutions in lung tumors [18] because only a few unique synonymous G>T substitutions associated with lung tumors were found in the p53 database, whereas non-synonymous G>T mutations are the most frequent type of substitution in lung tumors [14]. Furthermore, more data on somatic mutations are required to explore the effects of nucleotide contexts other than CpG, the only context examined here.
Hotspots non-randomly associated with non-synonymous sites in tumor suppressor genes according to the NSB test
It should be emphasized that, although we detected a highly significant association of hotspots with non-synonymous sites, in particular for p53, the results of the present analysis do not allow us to assign any individual mutation to the gain-of-function or loss-of-function category. Nevertheless, these results can guide the design of experimental studies of gain of function by tumor suppressors mutated at specific sites (hotspots) and/or in specific tumor types with particularly strong evidence of positive selection.