In this paper we propose characterizing a marker’s capacity for predicting individual treatment effect using a potential-outcomes framework. Under this framework, with a monotone treatment effect assumption, we can identify the joint distribution of the pair of potential outcomes under the two treatment assignments, which is meaningful to individual patients. We propose two measures: the ROC measure, which quantifies a marker’s ability to classify a subject according to potential treatment effectiveness, and the $\mathrm{CDF}_Q$ measure, which quantifies a marker’s ability to inform treatment decisions. While the $\mathrm{CDF}_Q$ measure depends on the scale of the response rate, the ROC measure can be thought of as an intrinsic property of a biomarker, since it does not depend on disease prevalence within treatment group; it is thus potentially useful for comparing markers across different populations or studies. Under a randomized trial design, we develop a constrained maximum likelihood method for estimating these measures. The procedure is closely related to common risk model estimation and is easy to implement. R code is available for download at http://labs.fhcrc.org/huang/index.html
In practice, we recommend that the method with the monotonicity assumption be used when its plausibility is supported by the data and, ideally, also by scientific knowledge. If no major subgroup of interest has an overall treatment effect estimate in the ‘wrong direction’, then monotonicity is supported. When this assumption holds, the method we use to incorporate it is desirable for its simple interpretation and its efficiency gain. We have shown through simulation studies that when the violation of the monotonicity assumption is minor, we still obtain meaningful estimates that approximate the measures defined in a general setting that allows for a non-monotone treatment effect. We further propose a sensitivity analysis as a way to relax the monotonicity assumption.
Our proposed approach facilitates a more informed assessment of a marker’s value for treatment selection than does the classical approach of testing for an interaction between marker and treatment in an ordinary risk model. As we have demonstrated in the example of Web Supplementary Appendix A, a strong interaction coefficient is important for a marker to have value for treatment selection but is not useful for summarizing performance, because it depends on other coefficients in the risk model as well as on the functional form of the model. Therefore the interaction coefficient is not directly comparable between markers (models). In practice, a common approach is to plot disease risk versus marker value for each treatment group. This provides useful information to individual patients who have marker results in hand about their expected benefit of treatment given their marker measure. In this manuscript, we address a different question, one more relevant to biomarker researchers and policy makers: how to characterize and compare markers (risk models) with respect to their treatment-selection capacity. The ROC and $\mathrm{CDF}_Q$ measures proposed here are suitable for this task. For a univariate marker, a plot of $\mathrm{CDF}_Q^{-1}(q)$ is actually just the difference in risk between treatment groups as a function of marker percentile.
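The following minimal Python sketch illustrates the relationship between the risk difference, the marker percentile, and the distribution of the risk difference. It is not the estimation procedure of this paper: the logistic risk model, its coefficients, the uniform marker distribution, and all function names are hypothetical choices made only for illustration.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical coefficients (illustration only) for the risk model
# logit P(D=1 | T, Y) = B0 + B1*T + B2*Y + B3*T*Y, marker Y in [0, 1).
B0, B1, B2, B3 = -0.5, 0.0, 0.5, -2.0


def risk(t, y):
    """Disease risk under treatment t (0 or 1) at marker value y."""
    return expit(B0 + B1 * t + B2 * y + B3 * t * y)


def risk_difference(y):
    """Untreated risk minus treated risk at marker value y."""
    return risk(0, y) - risk(1, y)


def risk_difference_by_percentile(markers):
    """Pairs (marker percentile, risk difference), ordered by marker value."""
    ys = sorted(markers)
    n = len(ys)
    return [((i + 1) / n, risk_difference(y)) for i, y in enumerate(ys)]


def cdf_of_risk_difference(markers, q):
    """Empirical proportion of subjects whose risk difference is <= q."""
    deltas = [risk_difference(y) for y in markers]
    return sum(d <= q for d in deltas) / len(deltas)


random.seed(0)
markers = [random.random() for _ in range(2000)]  # simulated marker values
curve = risk_difference_by_percentile(markers)
```

With these particular coefficients the risk difference increases with the marker, so evaluating `cdf_of_risk_difference` at the risk difference of the subject at a given percentile returns approximately that percentile; the two curves are inverses of one another.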
We want to point out that good treatment selection performance as characterized by our proposed measures corresponds to large variability in risk difference as a function of marker value. The actual impact of measuring the marker on treatment decisions depends on the risk difference threshold. Our measures provide an overview of treatment-selection capacity allowing the risk difference threshold to vary. This is helpful in situations where there does not exist a well-established decision threshold and the choice relies on other factors such as the cost and side-effects of the active treatment. This is oftentimes true in practice.
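To illustrate how the impact on treatment decisions depends on the risk difference threshold, the sketch below evaluates a simple policy (treat when the risk difference exceeds the threshold) over several thresholds. The two risk curves, the evenly spaced marker values, and the function names are hypothetical and serve only as an illustration; they are not estimates from any data.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical risk curves on a marker scaled to [0, 1] (illustration only).
def risk_untreated(y):
    return expit(-0.5 + 0.5 * y)


def risk_treated(y):
    return expit(-0.5 - 1.5 * y)


def policy_summary(markers, threshold):
    """Treat a subject when the risk difference exceeds `threshold`.

    Returns (fraction treated, population disease rate under the policy).
    """
    n_treated = 0
    total_risk = 0.0
    for y in markers:
        delta = risk_untreated(y) - risk_treated(y)
        if delta > threshold:
            n_treated += 1
            total_risk += risk_treated(y)
        else:
            total_risk += risk_untreated(y)
    n = len(markers)
    return n_treated / n, total_risk / n


# Evenly spaced marker values stand in for the population distribution.
markers = [(i + 0.5) / 1000 for i in range(1000)]
thresholds = (0.0, 0.1, 0.2, 0.3)
summaries = {thr: policy_summary(markers, thr) for thr in thresholds}
```

In this toy setting, raising the threshold lowers the fraction of subjects assigned to treatment and raises the population disease rate, which is the trade-off to be weighed against the cost and side-effects of the active treatment.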
Song and Pepe (2004) proposed two measures to characterize a univariate marker’s treatment-selection performance: (i) the population response rate ($\theta_v$) given a treatment policy that assigns a subject to active treatment if the subject’s marker value exceeds the $v$th quantile in the population; and (ii) the individual difference in risk with and without treatment, $d(v)$, conditional on the $v$th quantile of the marker value. For the special case with $Q(\cdot)$ equal to the risk difference, our proposed $\mathrm{CDF}_Q$ measure is the inverse function of $d(v)$ and gauges the population impact of the marker in helping make informed treatment decisions.
We studied a univariate marker in our example, but the methodology is general and applies to models with multiple biomarkers by entering the joint distribution of markers into (1). Our measures can be used to compare two general risk models. For example, we can compare two biomarkers with respect to their treatment-selection capacity as demonstrated in this paper, and we can look at the gain by adding a new marker into a baseline model.
The methods we propose summarize the treatment-selection capacity of a marker in the whole population. In practice, a marker’s performance might vary with other baseline covariates such as age or gender, and its performance conditional on those covariates is often of interest. Our framework can be easily extended to allow the assessment of a marker’s treatment-selection capacity within subpopulations defined by other covariates. Specifically, to estimate the proposed performance measures conditional on specific covariate values, we need to estimate the risk model with the additional covariates included and to estimate the marker’s distribution conditional on the covariates. Again, constraints can be enforced on the risk model. This is currently under investigation.
In this manuscript we focus on a binary endpoint, but the ROC and $\mathrm{CDF}_Q$ measures proposed can be naturally extended to handle an event time endpoint. Specifically, for a given time $t$ of interest, these measures can be constructed from the risk difference between treated and untreated groups, where risk is defined as the probability of developing the disease before time $t$. Existing techniques for estimating risk using censored data can be employed.
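As one hedged illustration of such a technique, the sketch below uses a hand-written Kaplan-Meier estimator to obtain the risk of disease by time $t$ in each treatment group and takes their difference. Kaplan-Meier is our choice for illustration only; a model-based estimate that conditions on the marker could be substituted. The function names and the toy data are hypothetical.

```python
def km_risk(times, events, t):
    """Kaplan-Meier estimate of P(event time <= t).

    times  : observed follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n = len(data)
    at_risk = n
    surv = 1.0
    i = 0
    while i < n and data[i][0] <= t:
        u = data[i][0]
        n_events = 0   # events observed at time u
        n_leaving = 0  # subjects leaving the risk set at time u
        while i < n and data[i][0] == u:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / at_risk
        at_risk -= n_leaving
    return 1.0 - surv


def risk_difference_at(t, untreated, treated):
    """Untreated minus treated risk by time t; each arm is (times, events)."""
    return km_risk(untreated[0], untreated[1], t) - km_risk(treated[0], treated[1], t)


# Toy data for one marker-defined subgroup (hypothetical values).
untreated_arm = ([1, 2, 3, 4], [1, 0, 1, 1])
treated_arm = ([2, 3, 4, 5], [0, 1, 0, 0])
delta_t3 = risk_difference_at(3, untreated_arm, treated_arm)
```

Applying `risk_difference_at` within marker-defined subgroups gives a crude, nonparametric analogue of the risk difference as a function of the marker at the chosen time.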