This study investigated the relationships between sensitivity, PPV, prevalence of the risk group and disease prevalence when genetic risk scores, as opposed to single risk variants, are used for risk stratification. A major finding from this analysis is that when the frequency of the high-risk group approximates the disease frequency, both sensitivity and PPV increase with higher AUC. At all other frequencies of the high-risk group, higher AUC will increase either sensitivity or PPV. Selecting the optimal cut-off threshold will consequently be a trade-off between higher sensitivity at the price of lower PPV, or vice versa.
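The relationship summarized above can be illustrated with a minimal sketch. It assumes a binormal model (risk scores normally distributed with equal variance in people who do and do not develop the disease), which is an illustrative assumption and not necessarily the simulation approach used in this study; the function name `sens_ppv` and its parameters are hypothetical. The sketch relies on the identity sensitivity × disease frequency = PPV × high-risk frequency, since both products equal the proportion of true positives in the population, so the two measures coincide when the two frequencies are equal.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def sens_ppv(auc, disease_freq, high_risk_freq):
    """Sensitivity and PPV of a dichotomized score under an assumed binormal model.

    Assumption: scores are N(0, 1) in non-cases and N(d, 1) in cases,
    where d is chosen so that the model has the requested AUC.
    """
    # For two unit-variance normals, AUC = Phi(d / sqrt(2)).
    d = sqrt(2) * _STD.inv_cdf(auc)

    def fraction_above(t):
        # Share of the whole population with a score above cut-off t.
        cases = disease_freq * (1 - _STD.cdf(t - d))
        non_cases = (1 - disease_freq) * (1 - _STD.cdf(t))
        return cases + non_cases

    # Bisection: fraction_above decreases as the cut-off t increases.
    lo, hi = -10.0, 10.0 + d
    for _ in range(200):
        mid = (lo + hi) / 2
        if fraction_above(mid) > high_risk_freq:
            lo = mid
        else:
            hi = mid
    cutoff = (lo + hi) / 2

    sensitivity = 1 - _STD.cdf(cutoff - d)
    # True positives = sensitivity * disease_freq = PPV * high_risk_freq.
    ppv = sensitivity * disease_freq / high_risk_freq
    return sensitivity, ppv


if __name__ == "__main__":
    # Hypothetical setting: disease frequency 10%, high-risk group also 10%.
    for auc in (0.6, 0.7, 0.8, 0.9):
        se, ppv = sens_ppv(auc, disease_freq=0.10, high_risk_freq=0.10)
        print(f"AUC={auc:.1f}  sensitivity={se:.3f}  PPV={ppv:.3f}")
```

Changing `high_risk_freq` while keeping `disease_freq` fixed shows how the two measures move apart as the high-risk group becomes smaller or larger than the disease frequency.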
While the relationship between the number of individuals carrying a certain genetic risk factor and the risk of disease in the population was shown to influence the screening performance for a single marker [15], we have shown that this is also true for a genetic test composed of multiple genetic risk factors. Furthermore, we extended the analyses to the context of overall model performance and examined the influence of the discriminatory ability of a genetic model on screening parameters for risk groups with a frequency lower than, equal to or higher than the disease risk.
Genetic tests are usually assessed in terms of their ability to distinguish risk groups with large differences in risk. Nevertheless, it has been shown that large relative risks are not sufficient to demonstrate the model's clinical validity and utility [20-22]. Measures like sensitivity, specificity, PPV and NPV are needed to determine the clinical utility of the test [22]. While sensitivity and specificity are not affected by the incidence of disease because they are characteristics of the test, PPV and NPV strongly depend on disease risk. However, even for rare diseases, risk groups with a high PPV may be selected. Kraft et al. [22] used the example of prostate cancer 5-year risk prediction to illustrate this. They showed that 60-year-old men with nine or more risk alleles and a positive family history of prostate cancer, who represent 1% of the population, have a 30% risk of developing prostate cancer over the next 5 years. The incidence of disease in the population of 60-year-old men is about 2%. Thus, the size of the high-risk group was smaller than the disease risk. We show that, in addition to a smaller high-risk group and high ORs for the risk factors, a high AUC is needed to obtain a high PPV. In a recent study, the AUC of a genetic score of 33 SNPs and family history of prostate cancer was estimated at 0.64 [23]. A higher AUC is needed to select a risk group with a higher PPV, especially if the high-risk group is targeted for invasive interventions.
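The sensitivity implied by the prostate cancer example can be worked out from the figures quoted above. The sketch below is an illustration only: the helper name `sensitivity_from_ppv` is hypothetical, and the inputs are the rounded values cited in the text (a high-risk group of 1%, a 5-year risk of 30% in that group, and a population incidence of about 2%), so the result is approximate.

```python
def sensitivity_from_ppv(ppv, high_risk_freq, disease_risk):
    """Sensitivity implied by a PPV, the size of the high-risk group and the disease risk.

    True positives as a share of the population equal ppv * high_risk_freq;
    dividing by the disease risk gives the share of all cases that are captured.
    """
    return ppv * high_risk_freq / disease_risk


# Rounded figures quoted in the text for 60-year-old men.
sensitivity = sensitivity_from_ppv(ppv=0.30, high_risk_freq=0.01, disease_risk=0.02)
print(f"Implied sensitivity: {sensitivity:.2f}")
```

With these inputs the high-risk group would contain roughly 15% of the men who develop prostate cancer within 5 years, which illustrates how a high PPV in a small group goes together with a low sensitivity.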
The observation that the sensitivity and PPV are equal when the frequency of the high-risk group equals the frequency of disease in the population holds across different settings. First, this relationship holds irrespective of whether the disease risk refers to the lifetime risk, a cumulative incidence over a certain time period or the disease prevalence. Evidently, if we consider, for example, lifetime risks instead of 10-year risks, the frequency of the high-risk group for which the sensitivity and PPV are equal will be larger, because lifetime risks are by definition higher than 10-year risks. For the same AUC values, these larger high-risk groups will then have higher sensitivity and PPV. However, prediction models that consider longer time periods generally have lower AUC, implying that combinations of higher sensitivity and PPV may not be observed. Put differently, lifetime risk models with lower AUC may yield the same sensitivity/PPV combination as 10-year risk models with higher AUC, but the value of using a model with low AUC may become questionable.
Second, the relationship also holds irrespective of how the risks are calculated. There are several ways in which genetic risks can be expressed. One is to use a simple genetic risk score based on the number of risk alleles carried. This approach, which we used in our analyses, assumes that each allele has the same effect on the risk of disease [24, 25]. Another option is to calculate a weighted risk score, in which the risk alleles are weighted by their effect on disease risk [14]. Besides constructing risk scores, one can also directly derive predicted risks from multivariate logistic regression analyses with genetic variants entered as continuous or categorical variables. The results presented in this study apply both to simple count scores and to more complex measures such as weighted risk scores and predicted risks, as illustrated by the simulation of AMD risk prediction, because we evaluated cut-off values that simply dichotomize the risk. Nevertheless, it should be pointed out that different approaches will likely yield different AUC values.
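The difference between a simple count score and a weighted score can be sketched as follows. The genotype, the odds ratios and the function names are hypothetical, and weighting each allele by the natural logarithm of its odds ratio is one common choice assumed here for illustration; it is not necessarily the weighting used in the cited studies.

```python
from math import log


def count_score(genotype):
    """Simple risk score: total number of risk alleles carried (0, 1 or 2 per variant)."""
    return sum(genotype)


def weighted_score(genotype, odds_ratios):
    """Weighted risk score: each risk allele weighted by log(OR), an assumed weighting."""
    return sum(n * log(odds) for n, odds in zip(genotype, odds_ratios))


def is_high_risk(score, cutoff):
    """Dichotomize any score into a high-risk group at a chosen cut-off."""
    return score >= cutoff


# Hypothetical individual genotyped at four variants, with hypothetical per-allele ORs.
genotype = [2, 1, 0, 1]
odds_ratios = [1.2, 1.5, 1.1, 2.0]

print("count score:", count_score(genotype))
print("weighted score:", round(weighted_score(genotype, odds_ratios), 3))
print("high risk at cut-off 4 alleles:", is_high_risk(count_score(genotype), 4))
```

When all odds ratios are equal, the weighted score is proportional to the count score, so both define the same risk groups; the two approaches diverge only when allele effects differ.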
Third, the relationship also holds for risk models in general, including non-genetic risk models such as the Framingham risk score for prediction of cardiovascular disease. Essentially, the relationship is valid for any continuous variable that is dichotomized to create risk groups, such as blood pressure, cholesterol or triglyceride level. This is also true for risk models that combine novel biomarkers with established risk factors, a topic that has recently attracted a lot of research [26, 27].
When risk models are used to target interventions to high-risk subgroups, these subgroups are defined by choosing cut-off values for the predicted risks. The cut-off corresponding to a frequency of the high-risk group equal to the disease frequency optimizes both the sensitivity and the PPV, but is not necessarily the best choice for a given application. Cut-off values are chosen on the basis of cost-benefit analyses, balancing the harms and benefits of false positive and false negative classifications of risk. The cut-off defining a risk group with a frequency equal to the disease frequency is optimal only when the harm and benefit have equal weights. Selection of the optimal cut-off based on a decision-analytic approach is a complex process that requires detailed input on measures like sensitivity, specificity, PPV, NPV and related costs. For example, a recent study reported the effect of family history and 14 SNPs on the cost-effectiveness of chemoprevention with finasteride for prostate cancer [28]. The results show that genetic testing may marginally improve the cost-effectiveness of chemoprevention in individuals with more risk alleles, especially in men with a positive family history. However, no optimal cut-off number of risk alleles was determined, and the cost-effectiveness varied significantly with small changes in the model parameters. Our analyses do show, however, that when AUC is low to moderate, selecting a subgroup with a substantially increased risk (that is, high PPV) will include only a small percentage of all people who will develop the disease (that is, low sensitivity). Predictive ability is the fundamental prerequisite of a test, but the level of predictive ability that is needed varies between applications.
Our observations have implications for health care applications of genetic testing, but also for the direct-to-consumer offer of personal genome tests via the internet. For health care applications that need high PPV, such as targeting invasive interventions to people at the highest risk, a low AUC means that only a small proportion of this group will be identified. For applications that need high sensitivity, such as screening programs, the interventions will be given to a very large part of the population, mostly to people who will not develop the disease. Finally, for personal genome testing, a low AUC means that most people who will develop the disease will not be identified as being at high risk.