Scenarios Evaluated

Janssens et al (2) evaluated the predictive potential of genetic profiling by simulating a wide variety of scenarios. We investigate the same scenarios. The illustrative dataset was simulated using two such scenarios. Our simulation program is publicly available so that investigators can simulate specific scenarios of interest for themselves. Use of the program is described in the appendix.

A scenario is specified by the number of subjects, the overall event rate in the population, ρ, the number of genes that confer risk, the allele frequencies for the genes, and the association of each allele with risk. We consider simple settings where each gene has 2 alleles, with genotypes and allele frequencies in Hardy-Weinberg equilibrium, and no linkage disequilibrium between genes. The true risk of an event for a subject is derived from a standard additive model on the logistic scale. That is, the logarithm of the odds of having an event is a sum of terms, one for each high risk allele carried, so that high risk homozygotes contribute two equal terms, and there are no statistical interactions between genes. The lower frequency allele for each gene is associated with higher risk. The magnitude of the association between a gene and risk is quantified by the odds ratio for the high risk allele: OR = (odds of an event for heterozygotes) / (odds of an event for homozygotes for the more common, lower risk allele). Details of how data are simulated are given in the appendix.
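The data-generating model described above can be sketched in a few lines. This is a minimal illustration and not the authors' program: the function name `simulate_scenario`, its argument names, and the bisection step used to match the overall event rate ρ are all assumptions made for this sketch (the appendix describes the actual procedure).

```python
import numpy as np

def simulate_scenario(n_subjects, event_rate, maf, odds_ratio, seed=0):
    """Simulate genotypes, true risks and events for one scenario.

    Hypothetical helper: `maf` and `odds_ratio` hold one value per gene.
    """
    rng = np.random.default_rng(seed)
    maf = np.asarray(maf, dtype=float)
    beta = np.log(np.asarray(odds_ratio, dtype=float))  # per-allele log odds ratio

    # Hardy-Weinberg equilibrium: the count of risk alleles is Binomial(2, maf),
    # drawn independently for each gene (no linkage disequilibrium).
    genotypes = rng.binomial(2, maf, size=(n_subjects, maf.size))

    # Additive model on the logit scale with no gene-gene interactions.
    score = genotypes @ beta

    # Assumption: pick the intercept by bisection so the mean risk equals event_rate.
    lo, hi = -50.0, 50.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        mean_risk = np.mean(1.0 / (1.0 + np.exp(-(mid + score))))
        if mean_risk < event_rate:
            lo = mid
        else:
            hi = mid
    risk = 1.0 / (1.0 + np.exp(-(0.5 * (lo + hi) + score)))

    # Each subject has an event with probability equal to the true risk.
    event = rng.random(n_subjects) < risk
    return genotypes, risk, event

# Example: 50 genes, minor allele frequency 10%, odds ratio 1.25, event rate 10%.
genotypes, risk, event = simulate_scenario(
    n_subjects=20_000, event_rate=0.10,
    maf=np.full(50, 0.10), odds_ratio=np.full(50, 1.25),
)
print(round(risk.mean(), 3), round((risk >= 0.20).mean(), 3))
```

The sample size here is kept small so the sketch runs quickly; much larger samples would be needed to approach the precision reported in the paper.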

Very large sample sizes were used in our simulation studies. Consequently, the results in the tables show the true values (precise to 2 decimal places) of the prediction performance measures for each risk model, not estimates. We evaluate predictive performance by focusing on the proportions of high risk subjects identified from the information in their genetic profiles and on expected benefit. We use a high risk threshold of 20% for illustration. Tables for other risk thresholds and other scenarios are provided in the appendix. In contrast to our approach, Janssens et al reported AUC and R-squared summary statistics. These are provided here as well for completeness. In addition to generating data, investigators can use our programs to calculate all of the summary indices shown in the tables after specifying a risk threshold that defines the high (or low) risk category.
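As a rough illustration of how some of these indices follow from a set of risks and outcomes, the sketch below computes the proportions classified as high risk (overall, among cases and among controls) and the AUC. The function name `summary_indices` and the six-subject toy data are hypothetical and are not taken from the authors' programs or results. R-squared and NRI are omitted.

```python
import numpy as np

def summary_indices(risk, event, threshold=0.20):
    """Proportions classified as high risk and the AUC (hypothetical helper)."""
    risk = np.asarray(risk, dtype=float)
    event = np.asarray(event, dtype=bool)
    high = risk >= threshold

    # AUC as the probability that a random case has a higher risk than a
    # random control, counting ties as one half. The pairwise comparison
    # is only practical for small samples.
    cases, controls = risk[event], risk[~event]
    diff = cases[:, None] - controls[None, :]
    auc = np.mean((diff > 0) + 0.5 * (diff == 0))

    return {
        "prop_high_risk": high.mean(),
        "prop_cases_high_risk": high[event].mean(),
        "prop_controls_high_risk": high[~event].mean(),
        "auc": auc,
    }

# Toy data, invented purely for illustration.
toy_risk = [0.05, 0.10, 0.25, 0.30, 0.15, 0.40]
toy_event = [0, 0, 0, 1, 1, 1]
out = summary_indices(toy_risk, toy_event, threshold=0.20)
print(out)
```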

| **Table 2** The predictive capacity of genetic profiling under different scenarios defined by the number of genes involved, the odds ratio associated with the high risk allele (OR) and the population frequency of the risk allele. The proportion of subjects in the ... |

| **Table 3** The predictive capacity of genetic profiling when there is a mixture of strongly predictive genes and weakly predictive genes. Odds ratios for the weakly predictive genes vary from 1.05 to 1.15 while odds ratios for the 20 strongly predictive genes vary ... |

Results for Equally Predictive Genes

In the first set of simulations (Table 2) all genes in a gene profile have the same minor allele frequency and are equally predictive. We investigated settings where the number of genes associated with risk ranged from 50 to 350, the frequency of the minor allele varied from 5% to 30% and the odds ratio associated with the heterozygous genotype ranged from 1.05 to 1.5. Subjects whose risks are 20% or more are considered at high risk. This contrasts with the overall event rate of 10%.

The proportion of high risk subjects identified is generally low in the scenarios we studied. The maximum value for the proportion of high risk subjects identified was approximately 17%. For example, when the gene profile consists of 350 predictive genes each with a minor allele frequency of 5% and odds ratio equal to 1.5, 17% of the population have calculated risk values exceeding 20%.

The high risk population proportion typically increases with larger numbers of predictive genes, with stronger associations of genes with risk and with higher minor allele frequencies. However, counterexamples abound. For example, with common OR=1.5, the proportion of the population at high risk is 17.5% when 150 genes are predictive but smaller, 13.1%, when 250 genes are predictive. The overall reduction in the proportion at high risk in this example arises because fewer controls are deemed at high risk by the more predictive 250 gene model and because the bulk of the population consists of controls.

The sensitivity of risk models is low, especially when genetic associations are weak. We see that less than half the cases are classified as high risk when odds ratios are less than or equal to 1.1, regardless of the number of genes in the profile. Even when the common odds ratio is 1.25, classifying more than 50% of cases as high risk requires at least 250 genes with allele frequencies of 10% or 150 genes with allele frequencies of 30% in the model. When only 50 genes are in the model, the proportion of cases classified as high risk exceeded 50% in only one scenario, namely for common genes with allele frequencies of 30% and large odds ratios equal to 1.5.

In Table 2 there are tendencies for improvements in proportions of cases and controls classified as high risk by the models with inclusion of larger numbers of predictive genes, with stronger associations of genes with risk and with higher minor allele frequencies. However, there are no absolute rules evident in this regard. On the other hand, the expected benefit due to use of the risk model always improved with these 3 factors.

Note that the expected benefit values displayed in Table 2 are weighted averages of the proportions of cases and controls classified as high risk. The weighting acknowledges that use of 0.20 as the high risk threshold implies that the cost for a control classified as high risk is equivalent to one quarter of the benefit for a case classified as high risk.

Let us consider how to interpret the expected benefit values shown in Table 2 with a concrete example. Suppose a policy maker is deciding if ascertaining information such as genotype is economically advantageous. Assume some hypothetical monetary costs: $20,000, say, for treating a subject diagnosed with disease, and $1,000 for interventions to prevent disease occurring in the first place. If prevention interventions reduce the risk of disease by 25%, then the expected benefit for a subject that would be a case in the absence of intervention is 0.25×$20,000−$1,000=$4,000, while the expected cost for a subject that would be a control in the absence of intervention is $1,000. The cost-benefit ratio is therefore $1,000/$4,000=1/4 in this setting, leading to use of the risk threshold 0.2. The expected benefit values in Table 2 are in units corresponding to the benefit of high risk designation for a case. That is, to convert the values in Table 2 to monetary values in this hypothetical setting, we multiply by $4,000. Thus, for example, the expected monetary benefit associated with the model in the last row in Table 2 is 0.05×$4,000=$200 per person. If testing costs more than $200, there is no gain in financial terms by using this risk model. However, nonmonetary aspects must be factored into policy making as well.
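The arithmetic in this hypothetical example can be checked with a short sketch. The function names are illustrative assumptions. The relation threshold = cost / (cost + benefit) is the standard decision-theoretic link between a cost-benefit ratio and a risk threshold, and it agrees with the 1/4 ratio and 0.2 threshold quoted above.

```python
def threshold_from_costs(benefit_case, cost_control):
    """Risk threshold implied by the net benefit of intervening on a case
    and the net cost of intervening on a control (hypothetical helper)."""
    return cost_control / (cost_control + benefit_case)

def monetary_benefit(expected_benefit, benefit_case):
    """Convert an expected benefit in 'per case' units to money per person."""
    return expected_benefit * benefit_case

# Hypothetical monetary values from the example in the text.
treatment_cost = 20_000   # cost of treating a subject diagnosed with disease
prevention_cost = 1_000   # cost of the preventive intervention
risk_reduction = 0.25     # proportion of disease risk removed by prevention

benefit_case = risk_reduction * treatment_cost - prevention_cost
cost_control = prevention_cost
threshold = threshold_from_costs(benefit_case, cost_control)
per_person = monetary_benefit(0.05, benefit_case)

print(benefit_case, cost_control / benefit_case, threshold, per_person)
```

With these inputs the sketch gives a benefit of $4,000 per case, a cost-benefit ratio of 1/4, a threshold of 0.2, and about $200 per person for an expected benefit of 0.05.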

Results for Heterogeneously Predictive Genes

In the second set of simulations, summarized in Table 3, the genetic profiles are such that the odds ratios and minor allele frequencies both vary. The odds ratios for the strongest 20 genes vary uniformly from a maximum value displayed in Table 3 to 1.15, while the odds ratio decreases uniformly from 1.15 to 1.05 over the remaining genes. The minor allele frequency starts at 0.05 and increases by 0.005 for each gene over the first 50 genes, then by 0.0005 for each of the remaining genes. A key feature in these scenarios is that the strong genes are uncommon while the genes weakly associated with risk are relatively more common. Again, our scenarios mimic those reported by Janssens et al (2).
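One way to read this specification is sketched below. The function name `heterogeneous_profile` and the maximum odds ratio of 2.0 are arbitrary choices for illustration. The treatment of the endpoints (whether the value 1.15 is shared by both gene sets, and whether the first gene has frequency exactly 0.05) is an assumption, since the text does not pin it down.

```python
import numpy as np

def heterogeneous_profile(n_genes, max_or, n_strong=20):
    """Odds ratios and minor allele frequencies per gene (hypothetical helper).

    Genes are ordered from strongest to weakest association.
    """
    # Strong set: evenly spaced from max_or down to 1.15.
    strong = np.linspace(max_or, 1.15, n_strong)
    # Remaining genes: evenly spaced from 1.15 down to 1.05 (assumed endpoints).
    weak = np.linspace(1.15, 1.05, n_genes - n_strong)
    odds_ratio = np.concatenate([strong, weak])

    # Frequency starts at 0.05, rises by 0.005 per gene over the first 50 genes,
    # then by 0.0005 per gene for the rest.
    idx = np.arange(n_genes)
    maf = np.where(
        idx < 50,
        0.05 + 0.005 * idx,
        0.05 + 0.005 * 49 + 0.0005 * (idx - 49),
    )
    return odds_ratio, maf

# Example with 100 genes and an arbitrary maximum odds ratio of 2.0.
odds_ratio, maf = heterogeneous_profile(n_genes=100, max_or=2.0)
print(odds_ratio[:3], odds_ratio[-3:], maf[[0, 49, 50, 99]])
```

Under this reading the strongest genes are also the rarest, which matches the key feature noted in the text.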

We see that the population proportions at high risk, overall, for cases and for controls, and the expected net benefit, are determined to a large extent by the relatively few genes in the strong set especially when their odds ratios are high.

Use of Risk Distributions versus AUC

Tables 2 and 3 display values of the AUC for each risk model. Janssens et al (2) use the criterion AUC ≥ 0.80 to indicate high discriminative accuracy. Others use similar criteria. However, a model may have an AUC as large as 0.80 and still not be useful in practice. For example, the model in row 12 has expected benefit = 0.024. Assuming the hypothetical values mentioned earlier for monetary costs and benefits as well as risk reductions afforded by prevention interventions, the expected monetary benefit of using this test is $96 per person. If the cost of testing is $96, there is no net benefit despite the fact that the AUC for the risk model is 0.801. On the other hand, Gail and Pfeiffer (2005) have shown that the modified Gail Model for breast cancer risk (model 2 in Costantino et al 1999) is useful for selecting women for prevention treatment with tamoxifen despite the fact that its AUC=0.66. As another example, consider that the expected monetary benefit for the model in Table 2 with 50 genes, each with odds ratio 1.25, is 0.002×$4,000 = $8 per person. This benefit derives from the model's capacity to classify as high risk 7.5% of cases and 2.3% of controls using the risk threshold of 0.20, which is deemed clinically relevant in our hypothetical example. If the corresponding genetic test costs less than $8 per person, then it will be cost effective to offer it to people. Yet the AUC for this model is only 0.64.

The crucial issue is that one cannot assess the value of a risk model according to the AUC, which ignores the population and clinical context in which the model is to be applied. For example, the AUC does not incorporate the case prevalence in the population. Another problem with the AUC is that it does not take into consideration risk thresholds that motivate intervention in the clinical context. Consider the setting in row 12 again. If the benefit of treating a case is constant but the cost of treating a control is high, so that only subjects at very high risk, say above 30%, should receive intervention, the benefit of using the model will be different than if the cost of treating a control is lower, so that subjects with risks above 10%, say, should receive intervention. With the risk threshold equal to 30%, only 32% of cases and 5% of controls satisfy the criterion for high risk and the expected benefit is 0.014. The corresponding numbers using the lower risk threshold of 10% are 73% of cases, 28% of controls and an expected benefit of 0.045. Clearly the implications of the risk model are different in these two scenarios. Yet the AUC makes no distinction. Indeed, it accumulates over all possible risk thresholds, considering all values between 0 and 1 as plausible.
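The dependence on the threshold can be made concrete with a small sketch. It assumes that expected benefit takes the usual net-benefit form, ρ × (proportion of cases at high risk) − (1 − ρ) × t/(1 − t) × (proportion of controls at high risk), where t is the risk threshold. It also assumes the 10% event rate quoted earlier applies to this scenario. The function name is illustrative.

```python
def expected_benefit(prevalence, prop_cases_high, prop_controls_high, threshold):
    """Expected benefit in units of the benefit for a case classified as high risk.

    Assumed form: the cost of a control classified as high risk is weighted by
    threshold / (1 - threshold) relative to the benefit for a case.
    """
    cost_ratio = threshold / (1.0 - threshold)
    return (prevalence * prop_cases_high
            - (1.0 - prevalence) * cost_ratio * prop_controls_high)

rho = 0.10  # overall event rate, assumed to apply to this scenario

# Proportions quoted in the text for the two thresholds.
eb_30 = expected_benefit(rho, 0.32, 0.05, threshold=0.30)
eb_10 = expected_benefit(rho, 0.73, 0.28, threshold=0.10)
print(round(eb_30, 3), round(eb_10, 3))
```

Under these assumptions the 10% threshold gives 0.045, matching the reported value, and the 30% threshold gives roughly 0.013, close to the reported 0.014. The small gap is plausibly due to the input proportions being rounded to whole percentages. The same AUC is thus compatible with quite different expected benefits.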

The R-squared summary statistic and the NRI, also shown in Tables 2 and 3, share many of the same drawbacks as the AUC. They do not incorporate the clinical context into their calculations. Interestingly, R-squared does vary with population prevalence and NRI does vary with the high risk threshold. But neither is incorporated in a way that makes the resulting measure clinically relevant for evaluating the risk prediction model.