If the c-statistic is used as a guide for variable selection into a propensity score model, it may lead to the inclusion of useless or even harmful variables in that model. In particular, covariates strongly related to treatment but unrelated to the outcome will increase the c-statistic and thus be preferentially included in the model; but the inclusion of such variables will lead to distributions of propensity scores with relatively little overlap between the treated and the untreated²³. Because the treatment-outcome effect is estimated in persons with the same propensity score, data that fall outside a common range of the propensity score distributions in treated and untreated are typically lost for the second stage of a propensity score analysis, either because these individuals cannot be matched or because they are specifically excluded from further analysis.
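The mechanism described above can be illustrated with a small simulation. The sketch below is a toy example under stated assumptions, not an analysis from the cited studies: the data-generating model, the coefficients, the caliper of 0.05, and the function names (`c_statistic`, `matched_pairs`, `simulate`) are all hypothetical choices made for illustration. It uses true rather than estimated propensity scores, with one confounder `x` and one binary instrument `z` that affects treatment only, and a simple greedy 1:1 caliper matching that stands in for whatever matching algorithm an analyst would actually use.

```python
import math
import random


def expit(v):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-v))


def c_statistic(scores, treated):
    """Probability that a random treated subject has a higher score than a
    random untreated subject (ties count one half), computed from average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie group
        i = j + 1
    n1 = sum(treated)
    n0 = len(treated) - n1
    rank_sum = sum(r for r, a in zip(ranks, treated) if a == 1)
    return (rank_sum - n1 * (n1 + 1) / 2.0) / (n1 * n0)


def matched_pairs(scores, treated, caliper=0.05):
    """Number of pairs from a greedy 1:1 caliper matching without replacement.
    The caliper value is an arbitrary illustrative choice."""
    t = sorted(s for s, a in zip(scores, treated) if a == 1)
    u = sorted(s for s, a in zip(scores, treated) if a == 0)
    i = j = pairs = 0
    while i < len(t) and j < len(u):
        if abs(t[i] - u[j]) <= caliper:
            pairs += 1
            i += 1
            j += 1
        elif t[i] < u[j]:
            i += 1  # this treated subject has no untreated partner within the caliper
        else:
            j += 1  # this untreated subject is too low to match any remaining treated
    return pairs


def simulate(n=4000, beta=0.5, gamma=3.0, seed=1):
    """Hypothetical data: x is a confounder, z is a binary instrument (+1/-1,
    probability 0.5 each, independent of x) that affects treatment only."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [rng.choice((-1, 1)) for _ in range(n)]
    # True propensity score given both x and z.
    ps_xz = [expit(beta * xi + gamma * zi) for xi, zi in zip(x, z)]
    # True propensity score given x alone (z averaged out).
    ps_x = [0.5 * (expit(beta * xi + gamma) + expit(beta * xi - gamma)) for xi in x]
    a = [1 if rng.random() < p else 0 for p in ps_xz]
    return ps_x, ps_xz, a


if __name__ == "__main__":
    ps_x, ps_xz, a = simulate()
    for label, ps in (("confounder only", ps_x), ("confounder + instrument", ps_xz)):
        print(label, round(c_statistic(ps, a), 3), matched_pairs(ps, a))
```

Under these assumptions, the score that includes the instrument should show a much higher c-statistic and far fewer matched pairs than the score based on the confounder alone, even though the instrument contributes nothing to confounding control.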
We exclude subjects in the non-overlapping tails of the propensity score distribution because the treatment effect cannot be estimated without variation of treatment given the propensity score. Formally, positivity is violated in these subjects.²⁴ Positivity requires that there are both treated and untreated subjects at every level of all covariates under consideration, and is one of the key assumptions for causal inference.²⁴⁻²⁵ Propensity score distribution overlap is often assessed using treatment-stratified histograms or kernel-smoothed density estimates. If propensity score density estimates do not overlap, it is likely that there is non-positivity for some combination of covariates. (However, positivity is not guaranteed even if propensity score distributions fully overlap; smoothed curves may obscure regions of non-positivity.) Identification of populations never- or always-treated may be one of the main advantages of propensity scores¹⁶; regions of non-overlap are often trimmed in propensity score analyses.¹⁶ Trimming of propensity score distributions reduces sample size and thus precision, and also changes, in complex ways, the population about which inference is being made. Consistent with this, Austin et al. found that as the c-statistic (area under the ROC curve) increased, the number of matched pairs for analysis decreased.²⁶ Creating unnecessary non-overlap by inclusion of unnecessary covariates (i.e., non-confounders) in propensity score models should be avoided on both counts.
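For reference, the positivity condition and the common range discussed above can be written out as follows. The notation is an assumption introduced here for illustration and does not appear in the original text: A denotes treatment (1 = treated, 0 = untreated), L the vector of covariates in the propensity score model, and e(l) the propensity score. The range-based rule shown is only one simple way of defining the region of overlap.

```latex
% Propensity score (assumed notation)
e(l) = \Pr(A = 1 \mid L = l)

% Positivity: both treated and untreated subjects are possible
% at every covariate pattern that occurs in the population
0 < \Pr(A = 1 \mid L = l) < 1
  \quad \text{for all } l \text{ with } \Pr(L = l) > 0

% One simple definition of the common range of the propensity score:
% subjects with e(L) outside this interval are trimmed
\Bigl[\, \max\bigl(\min_{i:\,A_i = 1} e(L_i),\ \min_{i:\,A_i = 0} e(L_i)\bigr),\;
        \min\bigl(\max_{i:\,A_i = 1} e(L_i),\ \max_{i:\,A_i = 0} e(L_i)\bigr) \Bigr]
```

Because the propensity score collapses L to a single number, an interval of this kind can contain covariate patterns for which positivity still fails, which is the caveat noted in the parenthetical remark above.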
Using a c-statistic to guide propensity score modeling is therefore ill-advised. Perhaps more to the point, a high c-statistic in the propensity score model is neither necessary nor sufficient for the control of confounding. Imagine a propensity score estimated in a randomized trial in which all risk factors for the outcome are perfectly balanced between treatment arms. The propensity score model built with these risk factors will have a c-statistic of 0.5: the risk factors do not help predict the treatment assignment. But given perfect balance, there will be no confounding bias; thus a high c-statistic is not necessary. Conversely, a high c-statistic can be achieved by the inclusion of a strong instrument, independent of all confounders; thus a high c-statistic is not sufficient.
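The randomized-trial half of this thought experiment can be checked numerically. The sketch below is a hypothetical simulation, not taken from the source: the sample size, effect sizes, seed, and function names (`c_statistic`, `simulate_trial`) are illustrative assumptions. Treatment is assigned by a fair coin, so balance holds only approximately rather than perfectly. The risk factor `x` itself is used as the score, since the c-statistic depends only on how subjects are ranked and any single-covariate logistic propensity model ranks subjects by that covariate.

```python
import random
import statistics


def c_statistic(scores, treated):
    """Share of treated/untreated pairs in which the treated subject has the
    higher score; ties count one half."""
    t = [s for s, a in zip(scores, treated) if a == 1]
    u = [s for s, a in zip(scores, treated) if a == 0]
    wins = sum((st > su) + 0.5 * (st == su) for st in t for su in u)
    return wins / (len(t) * len(u))


def simulate_trial(n=2000, effect=1.0, seed=2):
    """Hypothetical randomized trial: x is a strong risk factor for the
    outcome y, but treatment a is assigned independently of x."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    a = [rng.randint(0, 1) for _ in range(n)]
    y = [effect * ai + 2.0 * xi + rng.gauss(0.0, 1.0) for ai, xi in zip(a, x)]
    # c-statistic of a score that ranks subjects by the risk factor.
    c = c_statistic(x, a)
    # Crude (unadjusted) difference in mean outcome between arms.
    crude = (statistics.mean(yi for yi, ai in zip(y, a) if ai == 1)
             - statistics.mean(yi for yi, ai in zip(y, a) if ai == 0))
    return c, crude


if __name__ == "__main__":
    c, crude = simulate_trial()
    print("c-statistic:", round(c, 3))
    print("crude effect estimate:", round(crude, 3), "(simulated true effect: 1.0)")
```

Under these assumptions the c-statistic should sit near 0.5 while the crude estimate stays near the simulated effect, up to sampling error: poor discrimination between treated and untreated subjects coexists with the absence of confounding.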
Correspondingly, Weitzen et al. found in a simulation study that the c-statistic “had no relationship with residual confounding in…treatment effect estimates.”²⁷ Austin et al. reiterated that “there was no clear relationship between the [c-statistic] of a given propensity score model and the degree to which conditioning on the propensity score balanced prognostically important variables between treated and untreated subjects in the matched sample.”²⁶ In a third report, Austin argued that the c-statistic gives no indication as to whether confounders have been omitted from the propensity-score model, nor as to whether the propensity-score model has been correctly specified.²⁸