The stratification score was originally proposed to control confounding when testing hypotheses in a case-control study (3). Here we have extended the stratification score approach to accommodate estimation, which many epidemiologists prefer to hypothesis testing (15). By showing that the stratification score is a retrospective balancing score, we have developed a standardization-based approach to controlling confounding in case-control studies. This approach allows us to compare the exposure distributions that would be observed in case and control participants if both groups had the same distribution of confounding covariables. This comparison is attractive because differences in exposure frequency can be easily interpreted at the population level in a way that odds ratios from a logistic regression model cannot. Similar comparisons could also be made by stratifying on all confounders, provided the data were not too finely stratified. Correspondingly, matched studies are simplified by matching on stratification scores rather than on multiple potential confounders.
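The standardized comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' estimator: the hypothetical helper `make_strata` forms equal-size strata from already-estimated stratification scores, and the hypothetical `standardized_distributions` reweights each stratum's control exposure frequencies by that stratum's share of all cases, so that the controls are standardized to the cases' distribution of confounders.

```python
from collections import Counter, defaultdict


def make_strata(scores, n_strata):
    """Assign participants to n_strata groups of near-equal size by ranking
    their estimated stratification scores (ties keep input order)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores)
    return strata


def standardized_distributions(scores, is_case, exposure, n_strata=5):
    """Return (case_dist, std_control_dist): exposure-level frequencies among
    cases, and among controls reweighted to the cases' distribution over strata."""
    strata = make_strata(scores, n_strata)
    n_cases = sum(is_case)
    cases_in = Counter(s for s, d in zip(strata, is_case) if d)
    ctrl_counts = defaultdict(Counter)  # stratum -> control exposure counts
    ctrl_totals = Counter()             # stratum -> number of controls
    for s, d, e in zip(strata, is_case, exposure):
        if not d:
            ctrl_counts[s][e] += 1
            ctrl_totals[s] += 1

    case_counts = Counter(e for d, e in zip(is_case, exposure) if d)
    case_dist = {e: c / n_cases for e, c in case_counts.items()}

    std_ctrl = defaultdict(float)
    for s, n1s in cases_in.items():
        if ctrl_totals[s] == 0:
            raise ValueError(f"stratum {s} has cases but no controls")
        w = n1s / n_cases  # weight = share of all cases falling in stratum s
        for e, c in ctrl_counts[s].items():
            std_ctrl[e] += w * c / ctrl_totals[s]
    return case_dist, dict(std_ctrl)


# Toy data: 8 participants, two score strata, binary exposure.
scores = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
is_case = [1, 0, 0, 0, 1, 1, 1, 0]
exposure = [1, 1, 0, 0, 1, 1, 0, 1]
cases, controls = standardized_distributions(scores, is_case, exposure, n_strata=2)
print(cases[1], round(controls[1], 3))  # 0.75 0.833
```

In this toy data the crude control exposure frequency is 0.5, but after standardization it rises to about 0.83, because most cases fall in the high-score stratum, where the controls are exposed.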
In our previous article (3), we tested whether the common odds ratio over strata was equal to 1. Here we have shown how to estimate the difference in mean exposure after standardizing the distribution of exposures, and we have further described how to estimate the variance of this difference for discrete-valued exposures. As a result, we can construct confidence intervals or test hypotheses about these standardized differences. As indicated in the context of a single analysis, these tests can be comparable in power to standard logistic regression.
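For a discrete exposure coded numerically (for example, 0/1), the standardized difference in mean exposure and an approximate confidence interval might be computed as below. This is a hedged sketch, not the variance estimator derived in the paper: the hypothetical `standardized_mean_difference` takes stratum labels as given and fixed, weights each stratum by its share of cases, and uses plug-in within-stratum variances, ignoring uncertainty from estimating the stratification score.

```python
from statistics import NormalDist, fmean, pvariance


def standardized_mean_difference(strata, is_case, exposure, level=0.95):
    """Case-minus-control difference in mean exposure, with controls
    standardized to the cases' distribution over strata, and a Wald CI.

    Treats stratum labels as fixed and ignores uncertainty from
    estimating the stratification score.
    """
    groups = {}  # stratum -> (case exposures, control exposures)
    for s, d, e in zip(strata, is_case, exposure):
        groups.setdefault(s, ([], []))[0 if d else 1].append(e)
    n_cases = sum(len(cases) for cases, _ in groups.values())

    diff = var = 0.0
    for s, (cases, controls) in groups.items():
        if not cases:
            continue  # strata without cases receive zero weight
        if not controls:
            raise ValueError(f"stratum {s} has cases but no controls")
        w = len(cases) / n_cases
        diff += w * (fmean(cases) - fmean(controls))
        # plug-in variances; for a 0/1 exposure each equals p * (1 - p)
        var += w**2 * (pvariance(cases) / len(cases)
                       + pvariance(controls) / len(controls))

    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * var**0.5
    return diff, (diff - half_width, diff + half_width)


strata = [0, 0, 0, 0, 1, 1, 1, 1]
is_case = [1, 0, 0, 0, 1, 1, 1, 0]
exposure = [1, 1, 0, 0, 1, 1, 0, 1]
est, (lo, hi) = standardized_mean_difference(strata, is_case, exposure)
print(round(est, 3), round(lo, 3), round(hi, 3))  # estimate about -0.083, wide CI
```

With only eight participants the interval is very wide; the sketch is meant to show the structure of the calculation, not realistic precision.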
We have considered both stratified and individually weighted estimators of the exposure distribution. When deriving the stratified estimators, we assumed that the stratification score had a constant value within each stratum. Violations of this assumption may lead to residual confounding and would favor the individually weighted estimator. Increasing the number of strata, or even fine matching on the stratification score, may be needed to resolve substantial within-stratum variability in the stratification score. However, as Rubin (16) noted in the context of propensity score modeling, stratification is more robust to misspecification of the stratification score model. An additional advantage of stratification is that the extent of confounding can be seen directly. For example, our analysis illustrates the extent to which cases and controls are mismatched, which may be hard to ascertain when individually weighted estimators are used.
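An individually weighted estimator can be sketched by weighting each control by the odds S/(1 − S), where S is its estimated stratification score, assumed here to estimate the probability of being a case given the covariates in the case-control sample. By Bayes' rule these odds are proportional to the ratio of the case to control covariate densities, so the weighted controls mimic the cases' confounder distribution. The function name `weighted_control_distribution` and this particular weighting are illustrative assumptions, not necessarily the paper's exact estimator.

```python
from collections import defaultdict


def weighted_control_distribution(scores, is_case, exposure):
    """Exposure-level frequencies among controls, each control weighted by the
    odds S / (1 - S) of its estimated stratification score S.

    Assumes S estimates P(case | covariates) in the case-control sample, so the
    odds are proportional to the case-to-control covariate density ratio.
    """
    totals = defaultdict(float)
    weight_sum = 0.0
    for s, d, e in zip(scores, is_case, exposure):
        if d:
            continue  # only controls are reweighted
        if not 0.0 < s < 1.0:
            raise ValueError("scores must lie strictly between 0 and 1")
        w = s / (1.0 - s)
        totals[e] += w
        weight_sum += w
    return {e: t / weight_sum for e, t in totals.items()}


# Scores equal to each stratum's fraction of cases (1/4 and 3/4).
scores = [0.25] * 4 + [0.75] * 4
is_case = [1, 0, 0, 0, 1, 1, 1, 0]
exposure = [1, 1, 0, 0, 1, 1, 0, 1]
print(weighted_control_distribution(scores, is_case, exposure))  # level 1: 5/6, level 0: 1/6
```

When the scores are constant within strata and equal each stratum's fraction of cases, as here, the odds weights reproduce the stratified standardization exactly; when scores vary within strata the two diverge. Note that the weighted version yields no per-stratum table of cases versus controls to inspect.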
When choosing variables to include in the stratification score model, it is important to note that the goal is control of confounding rather than prediction of case status (17). Thus, variables that predict case status but not exposure should not be included in the stratification score model (18). Similarly, Brookhart et al. (19) found that variables that predict exposure but not outcome should not be included in a propensity score model. Brookhart et al. (19) further stated that variables that predict outcome but not necessarily exposure can be beneficial when modeling the propensity score. The stratification score analog of this finding would be that variables that predict exposure but not necessarily case status are salutary in a stratification score model; however, we have not evaluated this claim and hence make no recommendation at this time.
We assumed that all confounding variables were measured. In fact, we require only that unmeasured confounders U be balanced given the stratification score, that is, that the distribution of U given the stratification score be the same in case and control participants. This is reasonable if, as is often assumed in epidemiologic studies, measured covariates are strongly correlated with U. For example, we may adjust for demographic covariates that may not be causal but covary with unmeasured confounders that are.
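Because U is unmeasured, balance on U cannot be checked directly; one can, however, check whether measured covariates or proxies are balanced within strata of the stratification score. The hypothetical `within_stratum_imbalance` below assumes stratum labels have already been assigned and reports the case-minus-control mean of one covariate in each stratum. Differences near zero are consistent with, but do not establish, the balance the method requires.

```python
from statistics import fmean


def within_stratum_imbalance(strata, is_case, covariate):
    """Case-minus-control mean of one measured covariate in each stratum
    that contains both cases and controls."""
    groups = {}  # stratum -> (case values, control values)
    for s, d, x in zip(strata, is_case, covariate):
        groups.setdefault(s, ([], []))[0 if d else 1].append(x)
    return {
        s: fmean(cases) - fmean(controls)
        for s, (cases, controls) in sorted(groups.items())
        if cases and controls
    }


# Hypothetical proxy covariate (e.g., age in decades) in two strata.
strata = [0, 0, 0, 1, 1, 1]
is_case = [1, 0, 0, 1, 1, 0]
age = [4.0, 3.0, 5.0, 6.0, 7.0, 6.5]
print(within_stratum_imbalance(strata, is_case, age))  # {0: 0.0, 1: 0.0}
```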
We have considered a “general exposure” without specifying its nature. Thus, levels of exposure could, for example, correspond to combinations of genotypes and environmental covariables, allowing comparison of interaction terms in case and control populations having the same distribution of potential confounders. We are also developing a modeling approach to such interaction models (unpublished data). Finally, our presentation emphasized the situation where the exposure E is categorical; this was done for ease of presentation and is not a restriction of our approach.
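One way to build such a composite exposure is to treat every genotype-by-environment combination as its own categorical level. The helper `composite_exposure` below is a hypothetical illustration: it crosses all observed genotype values with all observed environment values (so unobserved pairs still receive a code) and returns per-participant codes together with the code table. The resulting codes can then be used wherever a categorical exposure is expected.

```python
from itertools import product


def composite_exposure(genotypes, environments):
    """Code each (genotype, environment) pair as one categorical exposure level.

    Levels enumerate every combination of observed genotype and observed
    environment values, so unobserved pairs still get a code.
    """
    pairs = product(sorted(set(genotypes)), sorted(set(environments)))
    levels = {pair: code for code, pair in enumerate(pairs)}
    codes = [levels[(g, e)] for g, e in zip(genotypes, environments)]
    return codes, levels


codes, levels = composite_exposure(["AA", "Aa", "AA"], ["low", "high", "high"])
print(codes)  # [1, 2, 0]
```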