|Home | About | Journals | Submit | Contact Us | Français|
Genetic prediction of common diseases is based on testing multiple genetic variants with weak effect sizes. Standard logistic regression and Cox Proportional Hazard models that assess the combined effect of multiple variants on disease risk assume multiplicative joint effects of the variants, but this assumption may not be correct. The risk model chosen may affect the predictive accuracy of genomic profiling. We investigated the discriminative accuracy of genomic profiling by comparing additive and multiplicative risk models. We examined genomic profiles of 40 variants with genotype frequencies varying from 0.1 to 0.4 and relative risks varying from 1.1 to 1.5 in separate scenarios assuming a disease risk of 10%. The discriminative accuracy was evaluated by the area under the receiver operating characteristic curve. Predicted risks were more extreme at the lower and higher risks for the multiplicative risk model compared with the additive model. The discriminative accuracy was consistently higher for multiplicative risk models than for additive risk models. The differences in discriminative accuracy were negligible when the effect sizes were small (<1.2), but were substantial when risk genotypes were common or when they had stronger effects. Unraveling the exact mode of biological interaction is important when effect sizes of genetic variants are moderate at the least, to prevent the incorrect estimation of risks.
Common diseases such as type 2 diabetes, cardiovascular disease, asthma and osteoporosis are caused by a complex interplay of many genetic and non-genetic factors, each of which conveys only a minor increase in the risk of disease. Genetic associations typically have odds ratios, with effect sizes ranging from 1.1 to 1.5, with each single polymorphism explaining only a minor fraction of the variation in a phenotype. Because the predictive accuracy of testing for a single genetic variant is limited, genetic prediction of disease will be based on testing for multiple genetic variants simultaneously (genomic profiling).
Studies on the combined effect of multiple genetic variants generally assume that joint effects result from multiplying the risk of single variants rather than from adding them.1, 2, 3, 4 Further, empirical studies implicitly assume multiplicative risk models, as the joint effects of multiple genes are typically examined using logistic regression or Cox proportional hazard analyses.5, 6, 7 The assumption that susceptibility genes interact in a multiplicative manner may not be correct.8 On the basis of the ratio of disease risks in monozygotic and dizygotic twins of cancer cases, an additive model provided the best fit for most common cancers, including breast cancer.9 Conversely, a very high incidence was reported for monozygotic twins of women with breast cancer, which would be more consistent with a multiplicative model for joint genetic effects.10
Because estimated disease risks of genomic profiles may differ depending on whether a multiplicative or additive risk model is assumed, the clinical validity of the combined testing may change accordingly. The aim of this study was to investigate predicted risks and the discriminative accuracy of predictive testing using multiple genetic variants, by comparing multiplicative and additive risk models for the estimation of the joint genetic effects. The two risk models were compared using hypothetical scenarios with equal genotype frequencies and risks and using realistic scenarios of predictive testing for multiple genetic variants in prostate cancer.
Distribution of predicted risks and discriminative accuracy were obtained from formulae. The formulae are presented for the hypothetical scenario in which we considered genomic profiles based on K genetic variants with equal genotype frequencies (m) and equal relative risks (R). In the hypothetical scenario, for ease of interpretation, we consider genetic variants that have dominant or recessive effects yielding two possible results for each gene: a risk genotype and a non-risk or referent genotype. Under the assumption of equal frequencies and relative risks, genomic profiles can be expressed as risk genotype scores, values of which range from zero to K. The formulae to calculate disease risk when the risk score equals c for both multiplicative models and additive models are given by equation (A4) in Appendix A and by equation (B1) in Appendix B, respectively.
The discriminative accuracy, quantified as the area under the receiver operating characteristic curve (AUC), is determined by the distribution of disease risks in those who will develop the disease and those who will not. The AUC indicates the discriminative accuracy of a continuous test. The AUC ranges from 0.5 (total lack of discrimination) to 1.0 (perfect discrimination) and is independent of the prevalence of disease. The AUC can be mainly considered as the probability that the test correctly identifies the affected subject from a pair in whom one is affected and one is unaffected. An AUC of 0.95 means that 95% of the pairs are correctly classified, whereas a test with an AUC of 0.50 is non-discriminative – as accurate as tossing a fair coin.
The AUC can be estimated non-parametrically from the empirical ROC curve. In this study, we derive equations for sensitivity and specificity for both additive and multiplicative models and construct empirical ROC curves. An empirical ROC curve is constructed by connecting all combinations of sensitivity and specificity obtained at all possible cutoff levels of the genotype score. AUC is calculated using the Trapezoidal rule. ROC curves and its characteristics have been described in many papers.11, 12
The sensitivity and specificity of genotype scores are calculated from the sensitivity and specificity of each single genetic variant. The sensitivity of a single genetic variant is the percentage of carriers of the risk genotype among individuals who will develop the disease, and specificity is the percentage of non-carriers among those who will not develop the disease. The sensitivity, βS, and specificity, αS, of a single gene test are given by the following equation:13
where R is the relative risk of the risk genotype, p is the risk of the disease in the population and m is the risk genotype frequency.
For a multiplicative model, the sensitivity (βMc) at each cutoff value of the genotype score is a function of the sensitivities of the single genetic variants:
where βS is the sensitivity of each single genetic variant, K is the total number of variants that are considered in the genotype score, and c is the cutoff value of the genotype score. The derivation is presented in Appendix A When βMc is known, the specificity at each cutoff value (αMc) can be obtained from Table 1:
For the additive model, sensitivity βAc and specificity αAc are given by the following equation:
The derivations for these formulae are given in Appendix B.
We considered genotype scores that were based on 40 genes, and investigated risk distributions and AUC for different combinations of relative risks and genotype frequencies. We assumed that the disease risk in the population was 10%.
Figure 1 shows the disease risks for genotype scores of up to 20 for the two models when relative risks are 1.1, 1.2 and 1.5, and genotype frequency is 0.1 and 0.3. We considered genotype scores of up to 20 because anyone in the population having a genotype score of more than 20 is almost zero for these genotype frequencies. The disease risk for the additive model increases linearly with increasing risk genotypes. When genotype frequency is 10%, the disease risk for the additive model is higher than the disease risk for the multiplicative model for up to a genotype score of 5, and when genotype frequency is 30%, the disease risk for the additive model is higher than the disease risk for the multiplicative model for up to a genotype score of 13 for relative risk 1.2 and up to a genotype score of 14 for relative risk 1.5. When the genotype score is 4 in the genomic profile, with relative risk 1.2 and genotype frequency 10%, the increase in disease risk for the additive model is only 0.006; when relative risk is increased to 1.5, the increase is 0.028; when genotype frequency is 30% and relative risk 1.2, the increase is 0.033; and when genotype frequency is 30% and relative risk 1.5, the increase is 0.041. Theoretically, multiplicative models give higher disease risk compared with additive models when there are a large number of risk genotypes.
The vertical lines in Figure 1 denote the distribution of risk genotypes in the population. When the genotype frequency is 0.1, only 8 out of 10000 in the population can have more than 10 risk genotypes, whereas when the genotype frequency is 0.3, more than 75% of the population can have more than 10 risk genotypes. For more than 79% of the population, the risk under the additive model is higher than the risk under the multiplicative model. However, when relative risk is around 1.1, the difference in risk between the two models is almost negligible. For the population for which the multiplicative model gives higher risk than the additive model, the difference in risk is very large. This situation is to be expected, as both models assume identical prevalence of disease in the population.
We also considered 40 risk genotypes with genotype frequency 0.1 and relative risk 1.2, and another risk genotype, A, with the same genotype frequency and relative risk of 3.5 in a genomic profile. The risks for the two models for the populations with and without risk genotype A are shown in Figure 2. For the sub-population that has risk genotype A, almost 77% shows a higher risk under the multiplicative model compared with the additive model. However, the proportion of the sub-population that has the risk genotype is only 10% of the overall population.
Figure 3 shows the ROC curves for the different combinations of relative risks and genotype frequencies studied. The AUC for the multiplicative model (AUCM) and the AUC for the additive model (AUCA) increase with increasing values of relative risks. When relative risk is 1.1 and genotype frequency is 10%, the AUCM and AUCA are 0.56 and 0.53, respectively. When the relative risk is 1.5, AUCM increases to 0.74 and AUCA increases to only 0.60. This shows that higher relative risks result in larger differences between AUCM and AUCA. When genotype frequency is increased from 0.1 to 0.4, keeping relative risk at 1.5, the AUCM increases from 0.74 to 0.85, but AUCA declines from 0.6 to 0.55. As seen in Figure 2, there is not much difference between AUCM and AUCA for lower values of relative risks (around 1.1) and for lower values of genotype frequencies (less than or equal 10%). For example, when relative risk is 1.1 and genotype frequency is 0.1, the difference between AUCM and AUCA is only 0.02. However, when relative risk is 1.5 and genotype frequency is 0.4, the difference between AUCM and AUCA is 0.30. AUCM increases steeply with increasing relative risks and genotype frequency, whereas the increase in AUCA remains small.
To explore this phenomenon further, we plotted the AUC against genotype frequency for relative risks of 1.2 and 1.5 (Figure 4). The AUC for the multiplicative model increases with increasing genotype frequency up to 50% and then declines. The AUC for the additive model increases with increasing genotype frequency for rare genotypes of up to 5–7% range and then declines. Figures 3 and and44 show that, for rare genotypes (less than 10%) and low relative risks (les than 1.2), there is not much difference between the AUC for multiplicative models and that for additive models. When relative risks are around 1.2–1.5, the largest difference in AUC for the two models is achieved when genotype frequency is around 50%. The AUCs for the 41 genetic variants illustrated in Figure 2 are 0.635 and 0.590 for the multiplicative and additive models, respectively. When the genotype frequency is increased to 0.3 for all the 41 risk genotypes, the AUCs for the multiplicative and additive models are 0.690 and 0.588, respectively.
The population attributable fractions (PAF) of the 40 risk genotypes for the additive and multiplicative models are also different.14 For example, when R=1.2 and G=0.1, the PAF for the additive model is 0.444, whereas the PAF for the multiplicative model is 0.615. When the genotype frequency is increased to 0.2, the PAF values for additive and multiplicative models are 0.547 and 0.792, respectively.
Zheng et al15 studied the genetic predisposition to prostate cancer by examining the association between prostate cancer and five SNPs that map to the three 8q24 loci, to 17q12, and to 17q24.3. The genotype frequencies of the five SNPs were 0.3, 0.25, 0.07, 0.77, and 0.6, and the relative risks were 1.38, 1.28, 1.53, 1.37, and 1.22. The prevalence of diagnosed prostate cancer in the US adult population is about 1.6%, based on estimates from the National Health Interview Survey. As the true prevalence of prostate cancer is unknown, we assumed an upper bound of 3.2% for the prevalence of prostate cancer. We used the same methods developed in this study for identical risk genotypes to calculate AUC for these five SNPs after adjusting the probabilities calculated in the equations for sensitivity and specificity based on different genotype frequencies and relative risks. The AUCs for the multiplicative model and additive model were 0.569 and 0.541, respectively.
In this paper, we compared the clinical discriminative accuracy of a set of 40 polymorphisms in a genomic profile for multiplicative and additive models assuming identical risk and genotype frequency. We showed that multiplicative models yield more extreme risk estimates than additive models. The multiplicative models have higher AUC compared with additive models for the ranges of genotype frequency and relative risk considered in this study.
The difference between clinical discriminative accuracy for multiplicative models and additive models increased with increasing genotype frequency (up to 0.5) and increasing relative risk. There is no difference in clinical discriminative accuracy for the two models when genotype frequency is less than 0.1 and relative risk is around 1.1. The discriminative accuracy for the additive model increases slightly at first and then declines with increasing genotype frequency.
Several researchers have used logistic regression models to calculate the AUC for joint effects of multiple risk genotypes. When using logistic regression to calculate the AUC, a multiplicative (log additive) model for joint effects is assumed. In reality, however, we do not know the true underlying model for joint effects of genetic variants. If the true biology is additive or less than multiplicative, we overestimate the predictive accuracy by using multiplicative risk models, which particularly affects the extreme ends of the risk distribution. On the other hand, but this is hardly practised, if we apply additive models in instances in which the true biology is multiplicative or more than additive, we underestimate the predictive accuracy. If the relative risk is low (around 1.1) and the genotype frequency is less than 0.1, there would not be any difference in AUC for the two models. Therefore, for evaluating the predictive value of genomic profiling, which typically combines multiple weak susceptibility variants, the underlying risk model may not affect the discriminative accuracy so much, but it may affect the estimations of absolute risks.
Lund16 compared additive and multiplicative models for reproductive risk factors and postmenopausal breast cancer. The author explored a relative risk function ranging from multiplicative to additive by changing the exponent in a power transformation and calculated the goodness-of-fit statistics for different power models. On the basis of this analysis, the author concluded that mathematical models close to being additive fitted slightly better than the multiplicative models. Sample sizes required in the baseline group of a cohort study and a case–control study to detect departure from a multiplicative model in the direction of an additive model and from an additive model in the direction of a multiplicative model for the joint effect of two binary risk factors have been studied.17
The mutually adjusted locus relative risks from a multilocus main effects logistic regression will, in principle, be smaller than the individual locus marginal relative risks when the true multilocus interaction model is additive. This may offset the overestimation of risks from the multiplicative interaction assumption.
We considered binary risk genotypes (dominant or recessive models) because the number of binary risk genotypes in the population has a binomial (K, G) distribution, where G is the genotype frequency and K is the number of genetic variants in the genomic profile. For codominant models, the risk of alleles within each locus could be additive or multiplicative. If we assume identical risk alleles and Hardy–Weinberg equilibrium within each locus, it can be shown that the distribution of risk alleles in K loci in the population has a binomial (2K, A) distribution, where A is the allele frequency.18 Hence, our method can be easily extended to risk alleles when allele risks are either additive or multiplicative within and across biallelic loci.
In this paper, we clearly show the difference in AUC between additive and multiplicative models for genomic profiles. Our study is limited by the assumption of independence of the genetic variants, and our inability to model gene–gene interactions. More methodological work is needed in this area to detect the joint effect of multiple genetic variants for both additive and multiplicative models when gene–gene interactions are present.
This study was supported by the Centre for Medical Systems Biology (CMSB) in the framework of The Netherlands Genomics Initiative (NGI). ACJW Janssens was sponsored by the VIDI Grant of The Netherlands Organisation for Scientific Research (NWO).
Let R be the relative risk and m the genotype frequency for each of the K risk genotypes. Let X be the genotype score in the genomic profile of K risk genotypes. Disease frequency p can be expressed as
Consider the multiplicative risk model:
The sensitivity of the test for the multiplicative model is given by
Substituting the expression for I from (A3) gives
where is the sensitivity for screening for individual markers.
The disease risk when the genotype score is c in a genomic profile with K genetic variants using the multiplicative model is given by
The sensitivity for the test for the additive model is given by
and αAc is obtained from Table A1 given below:
The disease risk when the genotype score is c in a genomic profile with K genetic variants using the additive model is given by
The authors declare no conflict of interest.
The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention/the Agency for Toxic Substances and Disease Registry.