Negative controls have been used to detect confounding (the influenza vaccine example7), recall bias (the MS example9), and selection bias (the nasal corticosteroid example10). Furthermore, it may be possible to specify how negative controls should be designed to aid in detecting biased causal inferences resulting from each of these mechanisms, and also perhaps to detect other forms of analytical errors. In this section, we focus on the conditions under which negative controls in epidemiology can detect confounding.1
The essential purpose of a negative control is to reproduce a condition that cannot involve the hypothesized causal mechanism, but is very likely to involve the same sources of bias that may have been present in the original association. If a contaminant (source of bias) were responsible for the effect of the cytokine on bacteria, it should have its effect even when the hypothesized mechanism of the effect (through neutrophils) is prevented through neutralization of the cytokine or through omission of neutrophils from the experiment. If an uncontrolled confounder (general good health or healthful practices) is responsible for the protection observed from influenza vaccine against mortality or pneumonia/influenza hospitalization, the same confounder might be associated with other outcomes that are not plausibly prevented by influenza vaccination.
This description suggests a general principle for the selection of negative controls to detect residual confounding. Ideally, a negative control outcome (N) should be an outcome such that the set of common causes of exposure A and outcome Y is as nearly identical as possible to the set of common causes of A and N (Fig. 2). To the extent that the set of unobserved common causes (U) of A and Y overlaps with the set of unobserved common causes of A and N, we call the negative control outcome N “U-comparable” to Y. If N and Y are U-comparable outcomes (i.e., with an identical set of common causes that are associated with A), and assuming that N is not caused by A, an association A-N when analyzed according to the same procedure used to analyze A-Y would indicate bias in the association A-Y. If N and Y are perfectly U-comparable and N is not caused by A, then a null finding of A-N implies that the A-Y association is not likely biased by the pathways examined through this negative control.
FIG 2 Causal diagram showing an ideal negative control outcome N for use in evaluating studies of the causal relationship between exposure A and outcome Y. N should ideally have the same incoming arrows as Y, except that A does not cause N.
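The logic of a U-comparable negative control outcome can be illustrated with a small simulation. The sketch below is a hypothetical example, not an analysis from any study discussed here: the function name, the probabilities, and the choice of a crude risk difference as the measure of association are all assumptions made for illustration. For simplicity, A is given no causal effect on either Y or N, so any crude A-Y association is entirely due to the unobserved confounder U.

```python
import random


def crude_risk_differences(n, confounded, seed=0):
    """Return crude risk differences (A-Y, A-N) in a simulated cohort.

    Assumed (hypothetical) data-generating model:
      U = unobserved good health, A = exposure (e.g., vaccination),
      Y = primary outcome, N = negative control outcome.
    A has no causal effect on Y or N; U lowers the risk of both.
    """
    rng = random.Random(seed)
    # tally[a] = [people, Y events, N events] among those with exposure a
    tally = {0: [0, 0, 0], 1: [0, 0, 0]}
    for _ in range(n):
        u = rng.random() < 0.5
        # When confounded, healthy people are more likely to be exposed.
        p_a = (0.7 if u else 0.3) if confounded else 0.5
        a = int(rng.random() < p_a)
        y = rng.random() < (0.05 if u else 0.20)   # depends on U only
        nc = rng.random() < (0.05 if u else 0.15)  # depends on U only
        tally[a][0] += 1
        tally[a][1] += y
        tally[a][2] += nc

    def rd(k):
        # Risk among the exposed minus risk among the unexposed.
        return tally[1][k] / tally[1][0] - tally[0][k] / tally[0][0]

    return rd(1), rd(2)


if __name__ == "__main__":
    for confounded in (True, False):
        rd_y, rd_n = crude_risk_differences(200_000, confounded)
        print(f"confounded={confounded}: RD(A-Y)={rd_y:+.3f}, RD(A-N)={rd_n:+.3f}")
```

Under these assumed parameters, both crude risk differences should be clearly negative when U influences exposure, and both should be close to zero when it does not. The non-null A-N association is what flags the bias in A-Y.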
Negative control outcomes in practice will be only approximately U-comparable, at best. Thus it is possible that the observed association between A and N is caused by some uncontrolled confounder U2, which is not a confounder of the A-Y association; hence, finding an unexpected association between A and N does not prove unequivocally that the A-Y association is biased. In the example of using death or hospitalization from injury as a negative control outcome for death or pneumonia/influenza hospitalization, one could argue that there may be some common causes of vaccination and injury that are not causes of all-cause death or pneumonia/influenza hospitalization. Such common causes (we cannot think of a plausible one) would create an association in the negative control analysis of vaccination and injury, even if the primary analyses of vaccination and death or pneumonia/influenza hospitalization were unconfounded, so that the negative control would signal bias even where none exists. On the other hand, if N is associated only with some, but not all, of the uncontrolled confounders of the association between A and Y, it is possible that A and N will appear unassociated despite the presence of uncontrolled confounding between A and Y. In the influenza vaccine example, one could argue that there are common causes of vaccination and death or pneumonia/influenza hospitalization that are not causes of injury-related outcomes. Such a common cause (say, an aversion to vaccination that makes an individual less likely to get the pneumococcal vaccine) would be undetectable by this particular negative control. Despite these limitations, negative controls have value in alerting the analyst to possible residual confounding.
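The two failure modes described above (a false alarm and a missed confounder) can be sketched in the same way. This is again a hypothetical simulation under assumed parameters: the function name, the `a_driver` argument, and all probabilities are invented for illustration. Y depends only on U, N depends only on a second factor U2, and exposure A is driven by one or the other.

```python
import random


def imperfect_control_rds(n, a_driver, seed=1):
    """Crude risk differences (A-Y, A-N) when N is not U-comparable to Y.

    Assumed (hypothetical) model: U causes Y only, U2 causes N only, and
    A has no causal effect on either outcome.
      a_driver="U"  -> A-Y is confounded, but N cannot detect it.
      a_driver="U2" -> A-Y is unconfounded, but A-N is still associated.
    """
    rng = random.Random(seed)
    # tally[a] = [people, Y events, N events] among those with exposure a
    tally = {0: [0, 0, 0], 1: [0, 0, 0]}
    for _ in range(n):
        u = rng.random() < 0.5
        u2 = rng.random() < 0.5
        driver = u if a_driver == "U" else u2
        a = int(rng.random() < (0.7 if driver else 0.3))
        y = rng.random() < (0.05 if u else 0.20)    # caused by U only
        nc = rng.random() < (0.05 if u2 else 0.15)  # caused by U2 only
        tally[a][0] += 1
        tally[a][1] += y
        tally[a][2] += nc

    def rd(k):
        # Risk among the exposed minus risk among the unexposed.
        return tally[1][k] / tally[1][0] - tally[0][k] / tally[0][0]

    return rd(1), rd(2)


if __name__ == "__main__":
    for driver in ("U", "U2"):
        rd_y, rd_n = imperfect_control_rds(200_000, driver)
        print(f"A driven by {driver}: RD(A-Y)={rd_y:+.3f}, RD(A-N)={rd_n:+.3f}")
```

With A driven by U, the A-Y association is biased while A-N stays near the null (missed confounding). With A driven by U2, A-N is non-null although A-Y is unbiased (a false alarm).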
In principle, the measured confounders L of the A-Y relationship need not be causes of N as well, since a properly specified model that accounted for the confounding by L of A-Y would not be misled if such confounding were absent for A-N. In practice, the ideal negative control outcome should nonetheless be one with incoming arrows as similar as possible to those of Y, including the incoming arrows from L. This is true, first, because it is difficult in practice to imagine an outcome N that lacks association with known confounders L, but has an association with uncontrolled (or even unknown) confounders similar to that of U-Y. In addition, because negative controls may be useful in detecting residual confounding by measured confounders L or analytic errors, it would be beneficial to have the L-N relationship be as similar as possible, quantitatively, to the L-Y relationship. In eAppendix 1, we describe the analytic basis for use of a U-comparable negative control outcome.
A negative control exposure B should be an exposure such that the common causes of A and Y are as nearly identical as possible to the common causes of B and Y (Fig. 3). To the extent that the set of unobserved common causes U of A and Y overlaps with the set of unobserved common causes of B and Y, we call the negative control exposure B “U-comparable” to A. If A and B are perfectly U-comparable and B does not cause Y, then an association B-Y when analyzed according to the same model used to analyze A-Y would indicate bias in the association A-Y. If A and B are perfectly U-comparable and B does not cause Y, then a null finding of B-Y means that the A-Y association is unbiased. We are not aware of an example of the use of a negative control exposure to detect confounding in this sense. In the influenza vaccination example, one might hypothesize that whatever residual confounders U (e.g., poor health status) made one less likely to get influenza vaccine (A) and more likely to die of influenza or pneumonia (Y), might also make one less likely to get other vaccines, such as booster tetanus vaccine (B). Because tetanus does not cause pneumonia, tetanus vaccine receipt might be an appropriate negative control exposure for such a study. In the previous section, we mentioned the use of “probe variables” as negative controls to detect recall bias that might lead MS patients to over-report a history of childhood infections. Recall bias, a form of reverse causation, has a different causal structure from confounding,1 and we do not outline here the causal requirements for negative controls to detect reverse causation.
FIG 3 Causal diagram showing an ideal negative control exposure B for use in evaluating studies of the causal relationship between exposure A and outcome Y. B should ideally have the same incoming arrows as A.
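A negative control exposure can be illustrated with a similar hypothetical simulation. In the sketch below, U stands for poor health status, A for influenza vaccination, B for a tetanus booster, and Y for the primary outcome. The function name, the probabilities, and the assumed halving of risk by A are illustrative assumptions only. B never affects Y, so any crude B-Y association must arise through U.

```python
import random


def exposure_control_rds(n, confounded, seed=2):
    """Crude risk differences (A-Y, B-Y) with a negative control exposure B.

    Assumed (hypothetical) model: U = poor health raises the risk of Y and,
    when `confounded` is True, lowers uptake of both A and B. A halves the
    risk of Y; B has no causal effect on Y.
    """
    rng = random.Random(seed)
    # by_x[level] = [people, Y events] among those with that exposure level
    by_a = {0: [0, 0], 1: [0, 0]}
    by_b = {0: [0, 0], 1: [0, 0]}
    for _ in range(n):
        u = rng.random() < 0.5
        p_uptake = (0.3 if u else 0.7) if confounded else 0.5
        a = int(rng.random() < p_uptake)
        b = int(rng.random() < p_uptake)  # same incoming arrow from U as A
        baseline = 0.20 if u else 0.05
        y = rng.random() < baseline * (0.5 if a else 1.0)  # B plays no role
        by_a[a][0] += 1
        by_a[a][1] += y
        by_b[b][0] += 1
        by_b[b][1] += y

    def rd(tally):
        # Risk among the exposed minus risk among the unexposed.
        return tally[1][1] / tally[1][0] - tally[0][1] / tally[0][0]

    return rd(by_a), rd(by_b)


if __name__ == "__main__":
    for confounded in (True, False):
        rd_a, rd_b = exposure_control_rds(200_000, confounded)
        print(f"confounded={confounded}: RD(A-Y)={rd_a:+.3f}, RD(B-Y)={rd_b:+.3f}")
```

Under these assumptions, the crude B-Y risk difference should be near zero without confounding and clearly negative with it, and the crude A-Y estimate should overstate the protective effect of A in the confounded scenario.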
In observational settings, the comparability between exposure A and negative control exposure B will be only approximate. As in the case of negative control outcomes, this approximate comparability means that B and Y may be associated even when A-Y is unbiased; this would occur if there is some other confounder U2 linking B and Y that does not confound A-Y. Similarly, if A and B are only approximately comparable, it is possible for B and Y to show no association yet for A-Y to be biased, if the confounder biasing A-Y does not connect B to Y. An analytic basis for the use of negative control exposures is given in eAppendix 2.
In a cohort study, in which multiple exposures and outcomes are measured on each person, it is relatively straightforward to analyze negative control exposures and outcomes, assuming that suitable variables have been measured. In a case-control study, the use of negative control exposures is similarly straightforward because negative control exposures can be added to the set of exposure variables collected for each subject. If a case-control study is nested within a cohort, irrelevant outcomes can be selected and analyzed. A stand-alone case-control study presents some logistical problems for implementing negative-control outcomes. This might require a second case-control study in which “cases” include some irrelevant but comparable outcome to the cases in the main study. This difficulty is reduced if multiple control groups are used, as is occasionally done for other reasons.11,12
A useful contrast can be drawn between variables that can serve as negative controls and those that can be used as instruments.13–15 An instrumental variable is any variable that is connected causally to A but free of any of the confounding connections to Y from which A suffers. In contrast, a negative control outcome is connected to A through all possible confounding routes but not causally. Similarly, a negative control exposure is connected to Y through all possible confounding routes but not causally. The accompanying causal diagram depicts an instrumental variable Z that satisfies the necessary conditions of an instrument,16,17 while the variable B is an ideal negative control exposure candidate.