The complete lists of non-silent mtDNA mutations in oncocytic, non-oncocytic and other tumors (which we refer to as "general cancer") used in this work are reported in Additional files: Table S1-Table S3, respectively. The general cancer data are from papers in which no mention is made of either an oncocytic or a non-oncocytic phenotype, so these tumors cannot be classified into either of the first two categories. The data total 67 mutations (40 of which lead to frameshifts or premature stop codons) in oncocytic tumors, 14 mutations (including 3 frameshifts or premature stop codons) in non-oncocytic tumors and 107 mutations (including 16 frameshifts or premature stop codons) in other cancers. The proportion of disruptive variations (frameshifts and premature stop codons) is 60% in oncocytic tumors and only 21% in non-oncocytic tumors, a significant difference (p = 0.016) by Fisher's exact test. This indicates a significant accumulation of severe mutations in oncocytic tumors when compared with non-oncocytic tumors, as has been previously reported [6].
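The comparison of disruptive mutation proportions can be illustrated with a small sketch of a two-sided Fisher's exact test. The function name `fisher_exact_two_sided` and the layout of the 2x2 table (disruptive versus other non-silent mutations, oncocytic versus non-oncocytic tumors) are our assumptions for illustration; the software actually used for the published test is not stated. With the counts given above (40 of 67 and 3 of 14), the sketch should return a value close to the reported p = 0.016.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The p-value is the total probability of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def pmf(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo = max(0, col1 - row2)   # smallest feasible top-left cell
    hi = min(row1, col1)       # largest feasible top-left cell
    p_obs = pmf(a)
    # Small relative tolerance so that ties with p_obs are included.
    total = sum(p for p in (pmf(k) for k in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Assumed table layout: rows = tumor class, columns = (disruptive, other).
oncocytic = (40, 67 - 40)
non_oncocytic = (3, 14 - 3)
p_value = fisher_exact_two_sided(*oncocytic, *non_oncocytic)
print(f"two-sided Fisher p-value: {p_value:.4f}")
```

The same function applies to any of the 2x2 comparisons in this section, provided the counts for both groups are available.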
For each of the 13 mtDNA-encoded protein genes we compared the number of non-silent variations (nonsynonymous, indels and premature stop codons) found in the oncocytic tumors to the number in the general cancer tumors. With only 14 non-silent variations in the non-oncocytic tumors, there were not sufficient data to break that category down by gene. For two genes there were highly significant differences in the variation frequencies between the oncocytic and general cancer categories. The occurrence of non-silent mutations in the MT-ND1 gene was 4.3 times higher in the oncocytic tumors than in the general cancer tumors (p-value = 0.0006 by two-tailed Fisher's exact test). Conversely, the MT-CO3 gene had 0/67 non-silent variants in the oncocytic tumors but 12/107 in the general cancers (p-value = 0.004). Even when correcting for 13 tests, these two tests remain significant at a threshold p-value of 0.05/13 = 0.004. The other eleven genes showed no significant difference between the oncocytic tumors and general cancers. These statistically significant results from comparing oncocytic and general cancers strengthen the observation made previously without statistical testing [11] (where only oncocytic mutations and the ratio per gene, normalized for gene size, were analyzed) that mutations accumulate preferentially in the MT-ND1 gene. The authors of that work also reported that MT-CO1 genes seemed to be protected, a characteristic common to all other complex IV and V genes when information about the potential pathogenicity of the mutations was taken into consideration. This result is consistent with our independent observation of a significantly lower mutation frequency in the MT-CO3 gene in oncocytic tumors. Our results significantly extend these earlier observations by comparing mutations reported in oncocytic tumors to mutations reported in general cancers and by showing that these differences are highly statistically significant.
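Because the rounded MT-CO3 p-value and the rounded Bonferroni threshold both print as 0.004, it can help to recompute them without rounding. The sketch below is a minimal illustration under our own assumptions: the helper name `fisher_two_sided` and its hypergeometric parametrization are hypothetical, and only the counts 0/67 and 12/107 and the 13-test correction come from the text.

```python
from math import comb


def fisher_two_sided(k_obs, n_draw, n_success, n_total):
    """Two-sided Fisher's exact p-value via the hypergeometric distribution.

    k_obs     : variants in the gene of interest observed in group 1
    n_draw    : all non-silent variants in group 1
    n_success : variants in the gene of interest, both groups combined
    n_total   : all non-silent variants, both groups combined
    """
    n_fail = n_total - n_success
    denom = comb(n_total, n_draw)

    def pmf(k):
        return comb(n_success, k) * comb(n_fail, n_draw - k) / denom

    lo = max(0, n_draw - n_fail)
    hi = min(n_draw, n_success)
    p_obs = pmf(k_obs)
    # Sum all outcomes that are at most as probable as the observed one.
    total = sum(p for p in map(pmf, range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


ALPHA = 0.05
N_TESTS = 13  # one test per mtDNA-encoded protein gene
bonferroni_threshold = ALPHA / N_TESTS

# MT-CO3: 0 of 67 oncocytic variants versus 12 of 107 general cancer variants.
p_co3 = fisher_two_sided(k_obs=0, n_draw=67, n_success=0 + 12, n_total=67 + 107)

print(f"Bonferroni threshold: {bonferroni_threshold:.5f}")
print(f"MT-CO3 p-value:       {p_co3:.5f}")
print("significant after correction:", p_co3 < bonferroni_threshold)
```

The MT-ND1 comparison cannot be recomputed in the same way from this section alone, because only the fold difference and the p-value are given, not the underlying counts.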
The 13 proteins encoded by mtDNA are core subunits for four of the five protein complexes that make up the electron transfer chain (ETC). If we analyze the distribution of the non-silent variations by ETC complex, then there is enough data in the non-oncocytic tumors for significant results (Table ). With 12 tests, the adjusted target significance level is 0.05/12 = 0.004. Consistent with the analysis in the previous paragraph, non-silent variants in the Complex I genes were far more likely to be found in the oncocytic tumors than in the non-oncocytic tumors. Conversely, non-silent variations were less likely in the Complex IV genes in oncocytic tumors compared to non-oncocytic tumors. There also was a significant decrease in the non-silent variants in Complex V in the oncocytic tumors compared to the non-oncocytic tumors. Only Complex III, which is represented by just a single mtDNA encoded gene, did not have a significant difference between the oncocytic and non-oncocytic tumors. The comparison of the oncocytic tumor variants with the general cancer variants gives the same pattern of significant differences, with the interesting exception of Complex V, which has no significant difference in this comparison. Finally, in the comparison between the non-oncocytic tumor variants and the general cancer variants, there was a nominally significant difference only in the Complex V genes (though this was not significant after correction for multiple testing). The picture that results from these comparisons is that non-silent mtDNA mutations in Complex I are more likely to be found in the oncocytic tumors, while non-silent variations in Complex V are more likely in the non-oncocytic tumors.
Statistics for counts of non-silent mtDNA variants organized by electron transfer chain complex.
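The per-complex analysis amounts to pooling the per-gene counts according to the ETC complex to which each gene belongs. The following sketch shows one way to do this; the gene-to-complex assignment is the standard one for human mtDNA, while the function name `counts_by_complex` and the example counts are hypothetical and are not taken from the additional files.

```python
from collections import Counter

# Standard assignment of the 13 mtDNA-encoded protein genes to ETC complexes.
# Complex II has no mtDNA-encoded subunits.
GENE_TO_COMPLEX = {
    "MT-ND1": "I", "MT-ND2": "I", "MT-ND3": "I", "MT-ND4": "I",
    "MT-ND4L": "I", "MT-ND5": "I", "MT-ND6": "I",
    "MT-CYB": "III",
    "MT-CO1": "IV", "MT-CO2": "IV", "MT-CO3": "IV",
    "MT-ATP6": "V", "MT-ATP8": "V",
}


def counts_by_complex(gene_counts):
    """Pool per-gene variant counts into per-complex counts."""
    pooled = Counter()
    for gene, count in gene_counts.items():
        pooled[GENE_TO_COMPLEX[gene]] += count
    return dict(pooled)


# Hypothetical per-gene counts, for illustration only.
example_counts = {"MT-ND1": 3, "MT-ND5": 2, "MT-CYB": 1, "MT-CO3": 1}
print(counts_by_complex(example_counts))
```

Each pooled count can then be compared between tumor categories with a 2x2 test (variants in the complex versus variants elsewhere), which is presumably how the tests behind the table were set up.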
While the pathogenicity of variations causing premature stop codons or frameshifts is obvious, the pathogenicity of nonsynonymous variants may be highly variable, ranging from benign to highly pathogenic. Several methods of predicting the pathogenicity of nonsynonymous variations exist [55]. For the reported pathogenic variations resulting in a single amino acid change, we calculated predicted pathogenicity scores using the MutPred software [16]. The pathogenicity score in this method ranges from 0 to 1, with higher values indicating more severe pathogenicity. The nonsynonymous variations in the oncocytic tumors (Figure ) have significantly higher median pathogenicity scores than the variations in non-oncocytic tumors (p = 0.016, by Wilcoxon rank sum test). The variations in the oncocytic tumors also have a significantly higher median pathogenicity score (p = 3 × 10⁻⁴) than the variations reported in general cancers. The difference between the pathogenicity scores in the non-oncocytic tumors and the general cancers is not significant, and the distributions of scores in these two categories, as shown in the box plots (Figure ), are quite similar.
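For readers unfamiliar with the Wilcoxon rank sum test, the sketch below shows the calculation on two samples of scores. It uses the normal approximation with a tie correction and no continuity correction, which is our simplification; the exact variant of the test used for the published p-values is not specified. The function name `rank_sum_test` and the example scores are hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Wilcoxon rank sum (Mann-Whitney U) test, two-sided, normal approximation.

    Returns (U statistic for sample x, approximate two-sided p-value).
    """
    pooled = sorted((value, group) for group, sample in enumerate((x, y))
                    for value in sample)
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / sqrt(var_u)
    return u_x, erfc(abs(z) / sqrt(2))


# Hypothetical pathogenicity scores in [0, 1], for illustration only.
group_a = [0.91, 0.84, 0.78, 0.88, 0.69]
group_b = [0.42, 0.55, 0.61, 0.38, 0.72]
u, p = rank_sum_test(group_a, group_b)
print(f"U = {u}, approximate two-sided p = {p:.4f}")
```

For samples as small as the non-oncocytic set, an exact version of the test would be preferable to the normal approximation used here.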
Figure 1 Pathogenicity scores of mtDNA variants in oncocytic, non-oncocytic, and general cancers. Each point represents a somatic mtDNA mutation resulting in a single amino acid change. In each category, individual data points are given to the left, and the statistics ...
For a wider comparison we also considered three other categories of mtDNA variations: all reported pathogenic mtDNA protein variations (compiled from OMIM), all possible variations in the mtDNA-encoded proteins defined by single nucleotide variations from the reference sequence rCRS [53], and all observed mtDNA-encoded protein variations reported in large human phylogenetic trees (representing the general population variants). These values were all reported in [16], and detailed explanations of their definitions are given there. Briefly, we take the set of OMIM variations as a set of nonsynonymous mtDNA variations with some level of proof of pathogenicity. The set of all possible variations in the mtDNA-encoded proteins contains all 24206 amino acid changes that can be generated by a single nucleotide change from the rCRS. This is meant to represent the set of all possible random changes. The final group is the set of all observed nonsynonymous mtDNA variants collected from human phylogenetic trees (further details of the trees are given in [16]). This group represents the population-level variants in these proteins.
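The construction of the "all possible variations" set can be sketched as an enumeration over every single nucleotide substitution in a coding sequence, translated with the vertebrate mitochondrial genetic code (NCBI translation table 2). This is only an illustration: the function name `single_nucleotide_aa_changes` is hypothetical, and the choices made here (counting each distinct amino acid change once per codon, and excluding changes to a stop codon) are our assumptions, so the sketch is not guaranteed to reproduce the count of 24206 reported in [16].

```python
BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial); codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... with the third base varying fastest.
AA_TABLE_2 = ("FFLLSSSSYY**CCWW" "LLLLPPPPHHQQRRRR"
              "IIMMTTTTNNKKSS**" "VVVVAAAADDEEGGGG")
CODON_TO_AA = {
    b1 + b2 + b3: AA_TABLE_2[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def single_nucleotide_aa_changes(cds):
    """Return the set of (codon position, reference aa, variant aa) tuples
    reachable from `cds` by one nucleotide substitution.

    Synonymous changes and changes to a stop codon are excluded (assumption).
    Trailing bases that do not form a full codon are ignored.
    """
    changes = set()
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        ref_aa = CODON_TO_AA[codon]
        if ref_aa == "*":
            continue
        for offset in range(3):
            for base in BASES:
                if base == codon[offset]:
                    continue
                variant = codon[:offset] + base + codon[offset + 1:]
                var_aa = CODON_TO_AA[variant]
                if var_aa not in (ref_aa, "*"):
                    changes.add((start // 3 + 1, ref_aa, var_aa))
    return changes


# Toy sequence (not from the rCRS), for illustration only.
print(sorted(single_nucleotide_aa_changes("ATGCCA")))
```

Applying such an enumeration to each of the 13 protein-coding genes of the reference sequence and scoring every resulting change yields a reference distribution against which the tumor variants can be compared.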
Figure presents the distributions of the predicted pathogenicity scores for each of these categories, compared with the oncocytic and non-oncocytic tumor mtDNA variations. The median pathogenicity scores for the oncocytic tumors are significantly higher than the scores for all these categories of variations (p = 0.007 for oncocytic vs OMIM pathogenic variants; p = 1 × 10⁻⁵ for oncocytic vs all possible variants; p = 6 × 10⁻¹⁴ for oncocytic vs general population variants). The fact that the oncocytic tumor variants have significantly higher pathogenicity scores than the reported pathogenic mtDNA variations in OMIM emphasizes the point that the variants reported in these tumors should be considered highly pathogenic. If the somatic mtDNA variants are created randomly along the mitochondrial genome, then they should, at least approximately, be random samplings from the set of all possible variants. The fact that the oncocytic mtDNA variants have significantly higher pathogenicity scores than the set of all possible variants means that the oncocytic mtDNA variants are even worse than would be expected from random changes to the mtDNA.
Figure 2 Comparison to pathogenic mtDNA variants, all possible variants and population level variants. The distribution of pathogenicity scores of mtDNA variations in oncocytic tumors (red) and non-oncocytic and general cancer (both blue) are compared to reported ...
The non-oncocytic tumor variants were only nominally significantly different from the general population variants (p = 0.026). This p-value is not low enough to survive multiple testing correction for five tests. Conversely, the non-oncocytic tumor variations are not significantly different from the set of all possible variants (p = 0.6), though this lack of significance must be interpreted with care because of the small amount of non-oncocytic variant data. However, it is clear that both the non-oncocytic and oncocytic mtDNA variations have for the most part escaped the purifying selection that causes the mean pathogenicity score in the population variants to be so small (Figure ). The median pathogenicity score for the oncocytic variants is significantly higher than the median score for all possible variants, while the median score for the non-oncocytic variants is smaller than that for all possible variants (though that difference does not reach significance). A reasonable interpretation of this pattern is that the somatic variations arise as a random sampling from all possible variations (at least approximately), and that tumor cells containing high levels of mtDNA variants with very high pathogenicity scores tend to develop the oncocytic phenotype, while tumor cells with lower pathogenicity scores tend to maintain the non-oncocytic phenotype.
MutPred is only one of many available methods for predicting the pathogenicity of nonsynonymous variants. A recent test [55] of several of these methods determined that the overall best performing methods were MutPred and SNPs&GO [54]. To test whether our results generalize to other pathogenicity scoring systems, we repeated the analysis using the SNPs&GO software. SNPs&GO classifies variants into "Neutral" or "Disease" categories and reports a reliability index ranging from 0 to 10, with high values denoting more reliable predictions. In these datasets few variants had reliability indices of 7 or higher, so we chose to include only variants with SNPs&GO reliability indices ≥ 5, which keeps the reliability reasonably high while leaving enough data to analyze. The results reported below were significant for all choices of reliability cut-off from 0 (using all data) to 6, and there were not enough data with a reliability index above 6 to warrant testing. Our first test was to see whether the MutPred scores and SNPs&GO categories for the variants in this study were consistent. In Figure , we compare the MutPred pathogenicity scores for variants in the SNPs&GO "Disease" category to the MutPred scores for variants in the SNPs&GO "Neutral" category. The comparison is very highly significant (p-value = 4 × 10⁻⁸ by nonparametric Wilcoxon rank sum test), indicating that the pathogenicity assessments of these two methods agree well (i.e. variants classified by SNPs&GO as "Disease" also had significantly higher MutPred pathogenicity scores on average).
Comparison of the pathogenicity assessment of the SNPs&GO method and the MutPred method. All mtDNA variants in this study were assessed together. The bar plots represent the statistics of each data set as in Figure 1
Finally, we used the SNPs&GO pathogenicity analysis to compare the nonsynonymous mtDNA variations in the oncocytic, non-oncocytic and general cancer tumors. Of the 11 nonsynonymous variants in the non-oncocytic tumors, only two had SNPs&GO reliability indices ≥ 5, so there was not enough data to analyze that category using this method. In the oncocytic tumors, 7/8 nonsynonymous mtDNA variants were reliably classified by SNPs&GO as "Disease", while in the general cancer tumors only 12/46 were reliably classified as "Disease", a highly significant difference (p-value = 0.0018 by Fisher's exact test). Thus, the SNPs&GO analysis agrees with the MutPred analysis. Both tests conclude that mtDNA variations reported in oncocytic tumors have higher pathogenicity than the mtDNA variants reported in general cancers.
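The filtering and counting step behind these proportions can be sketched as follows. The record layout (tumor class, SNPs&GO label, reliability index), the function name `tally_reliable_calls` and the example records are hypothetical; we also assume that the denominators quoted above (8 and 46) are the numbers of variants passing the reliability cut-off, which the text implies but does not state explicitly.

```python
from collections import defaultdict


def tally_reliable_calls(records, min_reliability=5):
    """Count "Disease" and "Neutral" calls per tumor class, keeping only
    predictions whose reliability index is at least `min_reliability`.

    records: iterable of (tumor_class, label, reliability_index) tuples,
             where label is "Disease" or "Neutral".
    """
    counts = defaultdict(lambda: {"Disease": 0, "Neutral": 0})
    for tumor_class, label, reliability in records:
        if reliability >= min_reliability:
            counts[tumor_class][label] += 1
    return {tumor_class: dict(c) for tumor_class, c in counts.items()}


# Hypothetical predictions, for illustration only.
example_records = [
    ("oncocytic", "Disease", 8),
    ("oncocytic", "Disease", 5),
    ("oncocytic", "Neutral", 3),   # dropped: reliability below the cut-off
    ("general", "Neutral", 6),
    ("general", "Disease", 7),
    ("general", "Neutral", 4),     # dropped: reliability below the cut-off
]
print(tally_reliable_calls(example_records))
```

Repeating the tally for each cut-off from 0 to 6 and testing the resulting 2x2 tables corresponds to the robustness check described above.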