Using simulations incorporating parameters from real-world studies, we have translated the consensus criterion for clinical significance from comparative clinical trials to predictive biomarker research using genetic or continuously distributed biomarkers. In practice, the clinical applicability of predictive biomarkers will depend not only on effect size, but also on biomarker availability, on the cost, burden and delays associated with testing, and on the specificity of prediction to one treatment over another [1]. However, the present results provide several approximate rules that allow the translation of clinical significance between different types of predictors.
At the level of the whole patient population, the predictive power of a genotype depends on genotype frequency and on the absolute difference in outcome per allele. A genotype with a minor allele frequency of 50% reaches the clinically significant prediction level if it is associated with a difference in outcome of 2.2 HRSD points per allele. A genotype with a minor allele frequency of 10% needs a difference of 3.6 HRSD points per allele to reach the clinical significance criterion equivalent to that established in clinical trials. A normally distributed continuous biomarker that predicts a difference in outcome of 1.5 HRSD points per SD has an explanatory power corresponding to the clinically significant difference of three HRSD points between equally sized groups. The graphs provided in this article (and an online calculator [101]) can serve to translate smaller or larger effect sizes for genetic or continuous predictors and to compare their predictive power with a placebo–drug difference in randomized controlled trials.
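The rules of thumb above can be approximated analytically. The following is a minimal sketch, not the simulation used in this article: it assumes Hardy-Weinberg equilibrium, a purely additive effect, and a benchmark expressed as explained outcome variance (a three-point difference between two equally sized groups). All function and constant names are illustrative.

```python
import math

# Benchmark from the text: a 3-point HRSD difference between two equally
# sized groups. A 50/50 binary predictor has variance 0.25, so it accounts
# for 0.25 * 3**2 = 2.25 squared HRSD points of outcome variance.
BENCHMARK_VARIANCE = 0.25 * 3.0 ** 2


def genotype_explained_variance(maf, effect_per_allele):
    """Outcome variance explained by an additive biallelic marker.

    Assumption: Hardy-Weinberg equilibrium, under which the minor allele
    count (0, 1 or 2) has variance 2 * maf * (1 - maf).
    """
    return 2.0 * maf * (1.0 - maf) * effect_per_allele ** 2


def continuous_explained_variance(effect_per_sd):
    """Outcome variance explained by a standardized (SD = 1) biomarker."""
    return effect_per_sd ** 2


def required_effect_per_allele(maf, target_variance=BENCHMARK_VARIANCE):
    """Per-allele effect at which a marker reaches the target variance."""
    return math.sqrt(target_variance / (2.0 * maf * (1.0 - maf)))


if __name__ == "__main__":
    print("benchmark variance:", BENCHMARK_VARIANCE)
    print("continuous, 1.5 points per SD:", continuous_explained_variance(1.5))
    for maf, effect in [(0.5, 2.2), (0.1, 3.6)]:
        explained = genotype_explained_variance(maf, effect)
        needed = required_effect_per_allele(maf)
        print(f"MAF {maf}: explained {explained:.2f}, "
              f"analytic threshold {needed:.2f} points per allele")
```

Under these assumptions the analytic thresholds come out at roughly 2.1 and 3.5 points per allele, close to but slightly below the simulated values of 2.2 and 3.6 reported above, so the sketch should be read as an approximation to the simulation rather than a replacement for it.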
If the continuous outcome measured on a depression rating scale is dichotomized into a categorical outcome of remission, defined by a final score below a clinical cutoff, the predictive power of biomarkers is approximately halved. This is in line with previous studies demonstrating that dichotomization of continuous variables leads to a substantial loss of information and of statistical power [19]. Dichotomous outcomes allow an approximate translation between the proportion of variance explained (estimated as pseudo r² in logistic models) and the clinically meaningful effect size measure of number needed to assess (NNA). The proportion of variance explained that matches the previously established clinical significance criterion (6.3%) corresponds approximately to an NNA of three. This means that for every three patients assessed for a biomarker, one significantly more accurate prediction of outcome can be made.
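The loss of predictive power through dichotomization can be illustrated with a standard analytic result. The sketch below is an assumption-laden illustration: it uses the squared point-biserial correlation under bivariate normality, not the pseudo r² from logistic models used in this article, and the remission rates are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def retained_fraction(remission_rate):
    """Fraction of r^2 retained after dichotomizing a normal outcome.

    Assumption: predictor and continuous outcome are bivariate normal with
    correlation rho. The correlation between the predictor and the indicator
    "outcome below cutoff" then has magnitude
    rho * pdf(z) / sqrt(p * (1 - p)), where p is the proportion below the
    cutoff and z the matching normal quantile. Squaring and dividing by
    rho**2 gives the ratio returned here, which does not depend on rho.
    """
    if not 0.0 < remission_rate < 1.0:
        raise ValueError("remission_rate must lie strictly between 0 and 1")
    z = STD_NORMAL.inv_cdf(remission_rate)
    density = STD_NORMAL.pdf(z)
    return density ** 2 / (remission_rate * (1.0 - remission_rate))


if __name__ == "__main__":
    continuous_r2 = 0.063  # clinical significance criterion from the text
    for rate in (0.2, 0.3, 0.4, 0.5):  # hypothetical remission rates
        kept = retained_fraction(rate)
        print(f"remission rate {rate:.0%}: retains {kept:.2f} of r^2, "
              f"dichotomized r^2 = {continuous_r2 * kept:.3f}")
```

Under these assumptions, between about half and two-thirds of the explained variance is retained, depending on where the cutoff falls, which is broadly consistent with the approximate halving described above.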
The literature on the pharmacogenetics of antidepressants suggests that a single genetic marker is unlikely to achieve clinically significant prediction [23]. Therefore, polygenic scores summarizing information from multiple markers may replace single genotypes as predictors of outcome [26]. The normally distributed continuous predictors in the current study are applicable to such polygenic scores. Similarly, the continuous biomarker results can be applied to any linear combination of multiple biomarkers and clinical variables in a predictive score [1].
The relationship between the distribution of a biomarker in the population and its clinical usefulness may change dramatically in the future. If genetic biomarkers become routinely available and do not require additional testing, the absolute effect size of the prediction in a given individual becomes more important than the population-based explanatory power that is primarily considered in this article. However, the proportion of variance explained will remain a useful measure as it is applicable to multivariate models that may combine a large number of genotypes or other biomarkers to achieve a clinically meaningful prediction [1].
The applicability of the present results is limited to studies similar in character to those that served as the basis for the simulation. However, since the two studies considered were real-world pragmatic trials with relatively broad inclusion criteria [12], the conclusions should generalize relatively well to the population of patients treated in routine primary and secondary care settings. Our conclusions about the benchmark for clinical significance are based on the assumption that a similar effect size is relevant for predictive biomarkers as for drug–placebo differences. Since the proposed clinical significance criterion is based on a difference that is noticeable to patients, people close to the patient and clinicians [2], this assumption appears reasonable. The simulations of genetic biomarkers in the present study were limited to an additive genetic model, which assumes that heterozygotes are intermediate between the two homozygous groups. We chose an additive model because it is the most commonly applied genetic model in practice and most recessive or dominant effects can be detected with an additive test. Extension of the present results to recessive and dominant models depends on minor allele frequency and would be difficult to estimate for genetic markers with very low minor allele frequency owing to the very low number of homozygotes. In the present study, we have not separately considered the role of additional clinical variables that may contribute to prediction of outcome, such as number of previous episodes, duration of present episode, age, subtypes of depression and symptom dimensions [1]. Similar estimates for biomarker effects that are conditional on any such variables will have to be calculated with respect to the distribution of each such additional variable. When interpreting the results, it is important to keep in mind that the outcome of antidepressant treatment is measured with a certain error. Therefore, 100% of variance could never be explained even by a perfect predictor, and the relatively low percentage of variance explained may represent a relatively larger proportion of what is explainable. For example, with a typical reliability of measurement of 0.80, 36% of variance in outcome is owing to measurement error and a clinically significant predictor would explain 10% rather than 6.3% of the variance in a theoretical outcome measured with perfect accuracy. However, since it is unlikely that depression severity will ever be measured without error, we report the results in the raw metric, which is of necessity attenuated by measurement error. A final limitation is that the conclusions regarding the proportion of variance explained in categorical outcomes depend on the estimation of pseudo r² from logistic models, which is inexact; these conclusions should therefore be treated only as an approximation.
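The measurement-error example above can be reproduced with a few lines of arithmetic. This sketch assumes that the reliability of 0.80 is treated as a correlation, so that its square gives the share of outcome variance that is free of error; that reading is inferred from the figures quoted above rather than stated explicitly, and the function names are illustrative.

```python
def error_share(reliability):
    """Share of outcome variance attributed to measurement error.

    Assumption: reliability is treated as a correlation, so the reliable
    share of variance is reliability ** 2.
    """
    return 1.0 - reliability ** 2


def corrected_r2(observed_r2, reliability):
    """Variance explained relative to an outcome measured without error."""
    return observed_r2 / reliability ** 2


if __name__ == "__main__":
    reliability = 0.80   # typical reliability quoted in the text
    observed_r2 = 0.063  # clinical significance criterion
    print(f"error share of variance: {error_share(reliability):.2f}")
    print(f"corrected r^2: {corrected_r2(observed_r2, reliability):.3f}")
```

With these inputs the error share is 0.36 and the corrected proportion of variance explained is about 0.098, which rounds to the 10% quoted above.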