Home | About | Journals | Submit | Contact Us | Français |

**|**Int J Environ Res Public Health**|**v.7(4); 2010 April**|**PMC2872335

Formats

Article sections

- Abstract
- 1. Introduction
- 2. Example: Acknowledging Unobserved Confounding in a Simple Setting
- 3. Example: Selection Bias in Case-Control Studies
- 4. Exposure Misclassification
- 5. Discussion
- References

Authors

Related links

Int J Environ Res Public Health. 2010 April; 7(4): 1520–1539.

Published online 2010 April 1. doi: 10.3390/ijerph7041520

PMCID: PMC2872335

Received 2010 February 12; Revised 2010 March 26; Accepted 2010 March 29.

Copyright © 2010 by the authors; licensee Molecular Diversity Preservation International, Basel, Switzerland.

This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

This article has been cited by other articles in PMC.

Typical statistical analysis of epidemiologic data captures uncertainty due to random sampling variation, but ignores more systematic sources of variation such as selection bias, measurement error, and unobserved confounding. Such sources are often only mentioned via qualitative caveats, perhaps under the heading of ‘study limitations.’ Recently, however, there has been considerable interest and advancement in probabilistic methodologies for more integrated statistical analysis. Such techniques hold the promise of replacing a confidence interval reflecting only random sampling variation with an interval reflecting all, or at least more, sources of uncertainty. We survey and appraise the recent literature in this area, giving some prominence to the use of Bayesian statistical methodology.

Much of the methodological literature on inferring exposure-disease relationships from observational data looks, either implicitly or explicitly, at the best-case situation: a random sample from the study population can be obtained, and all pertinent variables can be measured without error on sampled individuals. Commensurately, in real applications it is common to see best-case quantitative methods applied in settings where the best-case assumptions likely, or perhaps surely, fail. This is often ameliorated with some qualitative discussion of how the best-case assumptions might be violated, and some speculation on what impact such violations may have had on the quantitative results which are reported.

Unfortunately, the alliance between the quantitative results based on best-case assumptions and the qualitative comments casting doubt on these assumptions is typically shaky. If the only quantitative results given pertain to the best-case analysis, then these results tend to form the take-away message of the research. However well-intended, caveats and provisos about the realism of best-case assumptions are easily set aside by readers. Plus, even the most diligent readers will not be able to turn the qualitative remarks into clear inferential summaries. For instance, say an estimate and 95% confidence interval are reported for an effect of interest, using a method which assumes best-case conditions. Then qualitative caveats are added about possible discrepancies between the actual sampling scheme and random sampling, possible imperfections in measuring the available variables, and omissions of variables which are possibly germane to the relationship of interest. There may be some intuition about the direction (if any) in which the estimate ought to be shifted given these concerns, but there is seldom clarity on how large a shift is needed. Worse still, there is no recipe to indicate how much wider the confidence interval ought to be in light of the caveats, and it can be hard to generate even rough intuition on this question. Quantitative results based on overly strong assumptions are just not readily synthesized with qualitative remarks on the possible violations of these assumptions.

A situation commonly faced by consumers of research is the best-case confidence interval for the effect of interest excludes the null value, but qualitative concerns are expressed about the likely divergence of reality from the best-case. Given such concerns, there is a literature on various forms of sensitivity analysis, going back to Cornfield’s work on tobacco smoking and lung cancer [1]. The basic idea is to consider a range of different *quantitative* assumptions about how the best-case assumptions could fail, and see what inference arises for the exposure-disease relationship under each assumption. Recently there has been emphasis on probabilistic forms of sensitivity analysis [2–8]. Instead of constructing a list of different assumptions about how best-case conditions are violated, a *probability distribution* is used to describe a spectrum of plausible assumptions.

Before proceeding with further discussion of probabilistic sensitivity analysis, we review some of the major ways in which best-case assumptions will often be violated in epidemiological investigations of exposure-disease relationships. We do this in the framework of exposure and disease variables which are labeled *X* and *Y* respectively, along with *p* possible confounding variables, denoted *C* = (*C*_{1}, . . . ,*C _{p}*).

Implicit in a standard analysis is the assumption the data are representative of the study population. This is guaranteed for simple random sampling of (*Y*, *X*, *C*), but is more delicate for complex study designs. Cohort studies involve sampling on exposure, meaning that the prevalence of *X* is fixed by design. Individuals with rare exposures are more likely to be selected into the study. Provided that study participation does not depend on *Y* or *C*, then the investigator can study the distribution of the outcome within exposure groups at the end of follow-up.

In contrast, case-control studies involve sampling on the outcome, in the sense that the probability of being selected into the study will necessarily depend on *Y*. Particularly for rare diseases one can expect that members of the population with the disease will have a higher probability of participating in the study. The retrospective study design compares the exposure distribution amongst cases to that amongst controls, as a route to comparing the disease distribution amongst exposed to that amongst unexposed. Consequently, a crucial assumption is that the exposure distribution in the controls is representative of the general population.

Things go awry if the exposure affects the probability of selection. For example, in case-control studies of magnetic field exposure and childhood leukaemia, Mezei and Kheifets [9] speculate that higher socioeconomic status may reduce one’s probability of exposure but simultaneously increase the likelihood of participation as a control. In this case the summaries of the exposure within cases and controls are biased, making any comparisons less meaningful. In the epidemiology literature, this phenomenon is called *selection bias* and it has direct connections to the missing data literature in statistics. It is worth noting that randomized clinical trials can also be hampered by selection bias, if loss to follow-up is differential across study arms.

In concept selection bias is readily dealt with by modifying a standard analysis, *if it is known how inclusion probabilities depend on pertinent variables*. Particularly, methods for analyzing weighted data apply, with sampled individuals receiving less/more weight in the analysis if their inclusion probabilities are high/low. This directly adjusts for the selection bias, in a manner analogous to making ‘extra copies’ of those individuals who were sampled despite this being unlikely, while ‘destroying copies’ of those who were very likely to be sampled [10]. The problem, however, is that often selection bias is suspected, but without knowledge of how the inclusion probabilities depend on *X* and/or *Y* and/or *C*. At this point one must turn to sensitivity analysis.

Many variables of interest in epidemiologic studies are not easily measured. It may be expensive or even technically impossible to measure a particular variable without error. This could apply to an outcome variable, to an exposure variable, or to a confounding variable. There is a particular emphasis in the literature, however, on poor measurement of exposure. This arises in part because many human exposures of a toxicological or nutritional nature are inherently hard to measure well. It also arises in part because parameter estimates from statistical models explaining the conditional distribution of disease outcome given exposure are particularly susceptible to bias as a result of unacknowledged exposure measurement error.

In general, if enough is known about exposure measurement error then the effect of this error can be ‘undone’ statistically. Roughly speaking, this applies if the measured exposure *W* arises via a combination of the unobserved true exposure *X* and ‘random noise,’ where the distribution of this noise is known. Such knowledge does not suffice to recreate the actual *X* values, but it does allow valid estimation of the conditional association between *Y* and *X* given *C* from data on (*Y*, *W*, *C*), via suitable methods. Thus the bias can be undone. The same is not true of estimator variability, however. The estimator arising from suitable treatment of the (*Y*, *W*, *C*) data will be more variable than the standard estimator applied to (*Y*, *X*, *C*) data, were such data available.

Unfortunately, many epidemiological settings involve insufficient knowledge of the nature and distribution of exposure measurement error. Thus the problem devolves to one of sensitivity analysis. One can examine the inferences that arise about the desired exposure-disease relationship over a range of assumptions concerning the magnitude of exposure measurement error.

While not emphasized here, it should also be mentioned that poor measurement of confounding variables is also a challenging situation in practice. Intuitively, error in measuring confounders C should result in an analysis which is intermediate between confounder adjustment based on correctly measured C and a crude analysis without confounder adjustment. See Fewell *et al.* [11] for recent discussion of this problem.

A further challenge in epidemiology is unobserved confounding. Suppose that *U* = (*U*_{1}, . . . , *U _{q}*) denotes a collection of confounders that are unmeasured by happenstance or limitations in the data collection process. For example, the investigator may, out of necessity, use population databases or registries that contain only limited information on the determinants of the disease. Alternatively, one could imagine settings where there are

Suppose that we study the distribution of the outcome across levels of exposure, but without taking *U* into consideration. Then we will observe a mixing of the effect of the exposure (if any) with that of *U*. By definition, the components of *U* are unequally distributed between exposure groups. Thus any observed association between *X* and *Y* could be a spurious artifact of confounding.

The most intuitive approach to dealing with confounding–when the confounding variable is observed–is via stratified analysis. We estimate the exposure effect within levels of *U* and then pool the results if it is deemed appropriate. More generally, we can use model-based regression adjustment, standardization, or more advanced techniques involving propensity scores. Unfortunately, when *U* is missing none of these techniques can be used without strong assumptions about the magnitude of confounding by *U*.

In recent years the has been renewed interest in techniques for sensitivity analysis for unmeasured confounding; see Schneeweiss [12] or Greenland [13] for accessible reviews. A further refinement uses a probabilistic Bayesian approach that seeks more formal integration of the data with prior beliefs about the magnitude of bias. These methods allow the investigator to compute bias-corrected effect estimates that incorporate uncertainty from bias in addition to random error.

In what follows, Sections 2 through 4 illustrate sensitivity analysis applied to unobserved confounding, selection bias, and misclassification, respectively. Some concluding remarks are given in Section 5. The more technical arguments involved in Sections 2 and 4 appear in the Appendix.

To give some simple illustrations of sensitivity analysis, we focus on a setting which doesn’t involve any observed confounding variables, and the only observed variables are the binary exposure *X* and the binary disease status *Y*. Moreover, these two variables are taken to be measured without error, and further it is assumed that there is no selection bias. Whether they arise from a cohort study or a case-control study, the observed data are typically reported in the form of a simple 2 × 2 table.

As an example, consider the data in Table 1, taken from MacMahon *et. al.* [14], and also emphasized in Jewell [15]. They concern a case-control study from the late 1970’s that identified an apparent association between coffee drinking and pancreatic cancer. The cases were 367 cancer patients from hospitals in Boston and Rhode Island. The 643 controls were selected from patients who were hospitalized by the same attending physicians that treated the cases. The exposure status is dichotomized according to whether or not a subject reports drinking at least one cup of coffee per day. Henceforth these data are referred to as the coffee and pancreatic cancer (CPC) data.

The crude odds ratio for these data is 2.75, which, assuming no selection bias or measurement error, we can regard as an estimate of the exposure-disease odds ratio in the study population. Standard methods give a 95% confidence interval around this estimate as (1.66, 4.55).

Now we focus on what inferences we might draw in the face of concern about unobserved confounding. To wit, suppose that we consider the simple case where a binary variable *U* is a potential confounder, but is not observed in the data. In fact, one could easily argue that there may be several missing confounders. However Wang and Krieger [16] have shown that the binary confounder case represents the worst-case scenario, and furthermore as we shall argue, it considerably simplifies articulating assumptions about bias. If *U* were observed, we could simply apply standard methods to infer the conditional association between *Y* and *X* given *U*. Particularly, a logistic regression model could be used to explain *Y* in terms of *X* and *U*.

Without the ability to observe *U*, sensitivity analysis can be implemented by constructing various scenarios regarding how *U* relates to the other variables, and then producing inference on the target quantity under each scenario. In the present setting, a scenario can be built upon specification of the following four quantities:

- The prevalence of
*U*,*i.e.*, is*U*= 1 rare or common? (More technically, the specification is made in terms of prevalence amongst unexposed controls.) - The extent to which
*U*is associated with exposure*X*. - The extent to which
*U*is associated with the disease indicator*Y*. - How the conditional association between
*Y*and*X*given*U*, which is the target of inferential interest, depends on*U*.

For convenience, and in line with much of the literature on sensitivity analysis (see, for instance [3]), we deal with item 4 by assuming that the conditional odds ratio between *Y* and *X* given *U* = 0 is the same as the conditional odds ratio between *Y* and *X* given *U* = 1. This common odds ratio is then the single and unambiguous target of inference. Moreover, this target will differ from the unconditional odds ratio between *X* and *Y* if *U* is indeed a confounding variable. Note that assuming a common exposure-disease odds-ratio within strata defined by a confounder is the basis of pooled estimators such as the Mantel-Haenzel estimator, and corresponds to the assumption of no interaction between *X* and *U* if fitting a logistic regression model for *Y* in terms of *X* and *U*.

Items 1 through 3 are conveniently dealt with together by assuming a relationship of the form

$$\text{logit}\mathit{\text{Pr}}(U=1|X,Y)={\alpha}_{u}+{\alpha}_{xu}X+{\alpha}_{yu}Y.$$

(1)

Thus the parameter *α _{u}* controls the prevalence of

As explained in further detail in the Appendix, via a log-linear model for the cell counts in the 2×2×2 table stratified by (*X*, *Y*, *U*), model (1) can be combined with a logistic regression model for (*Y*|*X*, *U*) of the form

$$\text{logit}\mathit{\text{Pr}}(Y=1|X,U)={\beta}_{y}+{\beta}_{xy}X+{\alpha}_{yu}U.$$

(2)

This model encapsulates the parameter of interest as exp(*β _{xy}*), the conditional odds ratio between

Thus a basic sensitivity analysis is comprised of assuming values for the three components of *α* and then estimating *β _{xy}*. As outlined in the Appendix, this inference can be carried out via standard statistical software for fitting generalized linear models (and specifically log-linear models).

One point of interest under this set up is that if *U* is assumed to not be a confounder then inference about the conditional exposure-disease association given *U* will match exactly the inference about the unconditional exposure-disease association. Thus setting *α _{xu}* = 0

Another point of interest is that the labels *U* = 0 and *U* = 1 are arbitrary, and consequently the same inferences about *β _{xy}* would arise if all three components of

Returning to the CPC data given in Table 1, in Table 2 we give inferences for the target odds ratio exp(*β _{xy}*), under different scenarios for the unobserved confounder

Point estimate and 95% confidence interval for target parameter under different values of bias parameters. Recall that under the assumption of no unobserved confounding the estimated odds ratio is 2.75, with a 95% confidence interval of (1.66, 4.55). **...**

One way to summarize the findings from Table 2, or the findings from a more exhaustive TSA involving a wider variety of settings for bias parameters, is via ranges. For instance, across the eight scenarios considered in Table 2, the point estimate for the (*X. Y*|*U*) odds ratio ranges from 2.16 to 3.34. A challenge, however, in interpreting this range is that it ignores sampling variability. Alternately, one might form an overall interval from the constituent 95% confidence interval endpoints by taking the ‘lowest of the low’ and the ‘highest of the high,’ *i.e.*, (1.31, 5.52) in the present situation. Clearly this strategy is highly conservative, since, roughly put, each endpoint corresponds to a pairing of the most extreme scenario with the most extreme random sampling error.

An alternative strategy is a *probabilistic sensitivity analysis* based on sampling scenarios for how *U* relates to the other variables from a distribution over such scenarios. The underlying notion is that we wish to use this distribution over scenarios to induce a distribution of plausible values over the target parameter of interest. In the present context we could achieve this by repeating the following three steps many times.

- Sample a value for each of (
*α*_{u}_{,}*α*,_{xu}*α*) from a chosen distribution over scenarios._{yu} - Based on the selected scenario, sample a value of
*U*for each study subject. - Based on the resulting set of ‘completed’ data (actual values of
*X*and*Y*augmented with sampled values of*U*), sample a value of*β*which is consistent with these data._{xy}

The ensemble of generated *β _{xy}* values then is taken to represent uncertainty about this target arising from uncertainty about which scenario applies (Step 1), plus uncertainty about the actual values of the unobserved variable for the study subjects given the scenario (Step 2), plus the usual statistical uncertainty in estimating population parameters from a sample were all the variables observed (Step 3). Following [3], we refer to this procedure as

While Step 1 simply involves computer-implemented random sampling from the chosen distribution over scenarios, and Step 2 involves such sampling dictated by Equation (1), Step 3 requires more comment. We instantiate this step by generating *β* from a normal distribution centred at the maximum likelihood estimate of *β* based on the augmented data, with a variance matrix taken to be the estimated variance matrix of the MLE based on the curvature of the log-likelihood at the MLE. This is elaborated upon in the Appendix.

We illustrate this method by assigning the following distribution to the bias parameters, under which *α _{u}*,

The use of uniform distributions over bias parameters lends itself to comparison with the Table 2 approach of looking at all possible combinations of two different values for each bias parameter. It is not very defensible in real problems, however, to argue that all values inside an interval are equally plausible, while all values outside are impossible. Thus we consider replacing the uniform distributions on *α _{u}*,

As a more formal step beyond MCSA, we consider *Bayesian sensitivity analysis* (BSA). Formally this involves specifying a prior distribution over *all* parameters (not just the bias parameters), which, upon combination with the observed data, yields the *posterior distribution* over the parameters. This distribution over the target parameter is taken as the basis for inference. In many problems we can intuit that MCSA and BSA will yield similar results. In fact, as outlined in the Appendix, we can even use the MCSA results as a stepping stone to obtaining BSA results. More formally, importance sampling can be used to *weight* the MCSA sample of parameter values as a route to obtaining the BSA distribution. In the present situation this weighting moves the 95% central interval away from the null slightly, *i.e.*, the BSA interval on exp(*β _{xy}*) arising from the uniform prior distribution is (1.69, 4.79). Again replacing the uniform prior distributions with ‘matching’ normal distributions has a negligible impact on the BSA interval. Some more specialized applications of Bayesian techniques for unobserved confounding can be found in [17, 18].

In case-control studies, selection bias can result from the manner in which controls are sampled. Conceptually one can regard the case-control design either as comparing exposure patterns amongst those diseased to exposure patterns amongst the general population, or as comparing exposure patterns amongst those diseased to exposure patterns amongst the disease-free portion of the population. For most applications the disease is very rare in the study population, in which case the distinction between these two formulations becomes unimportant. What matters is that the scheme to select cases yields an exposure prevalence matching the target population.

In the CPC example, the cases were hospital patients with pancreatic cancer. The controls were sampled from patients who were hospitalized by the same attending physicians who had hospitalized the cases. MacMahon *et. al.* [14] argued that the coffee consumption in the controls was a good approximation of coffee consumption in the general population without pancreatic cancer. The problem is that by virtue of being hospitalized by a physician, the controls may have had illnesses that led them to reduce their coffee consumption prior to enrollment in the study. Thus their exposure distribution would be abnormally low compared to the general population. Unmeasured illness in the controls may have induced an inverse association between the exposure and the probability of being selected into the study as a control. Because the exposure and disease both affect the probability of selection, the significant odds ratio discovered in [14] could be a spurious artifact of bias.

In fact, the possibility of selection bias was acknowledged in [14]. However the authors argued that any impact of the controls’ health status on coffee consumption would be negligible compared to the differences observed between cases and controls. They concluded that *“It is inconceivable that this bias would account for the total difference between cases and controls, but it is possible that the risk may be either overestimated or underestimated on this account.”*

The study generated great controversy because coffee drinking is so common. In later years, researchers tried unsuccessfully to replicate the study findings, and the relationship between coffee drinking and pancreatic cancer has been largely refuted [19]. Interestingly, MacMahon *et. al.* [14] did not attempt to study the robustness of their results to quantitative assumptions about selection bias.

We now use this example to illustrate how to adjust for selection bias. In case-control studies, selection bias occurs when the exposure affects the probability of participating in the study. This dependence has the effect of censoring the exposure distribution so that the odds ratio is no longer unbiased. More generally, selection bias results from conditioning on a variable that is affected by both the exposure and disease. It can also emerge in prospective studies, in which case it is called informative censoring. See Hernán [10] for a review.

As before, let *X* and *Y* model exposure and disease status. We introduce an additional variable *S*, which is an indicator for whether an individual is selected into the study. We have *S* = 1 if the subject is selected and *S* = 0 otherwise. Thus the complete data are replicates of (*S*, *X*, *Y*), with probability model *p*(*S*|*X*, *Y*)*p*(*Y*, *X*), but we only observe data when *S* = 1.

A simple formula for adjusting for selection bias is given by Greenland [13] and Lash *et. al.* [8]. We have

$$O{R}_{\mathit{\text{XY}}}=O{R}_{\mathit{\text{XY}}|S=1}\times B,$$

(3)

where *OR _{XY}*

$$B=\frac{\mathit{\text{Pr}}(S=1|Y=1,X=1)}{\mathit{\text{Pr}}(S=1|Y=0,X=1)}/\frac{\mathit{\text{Pr}}(S=1|Y=1,X=0)}{\mathit{\text{Pr}}(S=1|Y=0,X=0)},$$

which is computed from the selection probabilities *P*(*S* = 1|*X*, *Y*) over levels of *X* and *Y*.

In the pancreatic cancer example, the crude odds ratio *OR _{XY|S}*

$$\text{logit}\mathit{\text{Pr}}(S=1|X,Y)={\theta}_{0}+{\theta}_{x}X+{\theta}_{y}Y+{\theta}_{\mathit{\text{xy}}}\mathit{\text{XY}}.$$

In this case, the odds ratios exp(*θ _{x}*) and exp(

To illustrate bias adjustment in action, Table 3 gives values of *OR _{XY}* in the cancer data as we toy with the bias parameters

Bias corrected odds ratios *OR*_{XY} as a function of (*θ*_{0}; *θ*_{x}; *θ*_{y}) in the case where *θ*_{xy} = 0.

As was the case with unmeasured confounding, the bias parameters *θ _{x}* and

Table 3 displays the sensitivity of the odds ratio for different assumptions about the manner in which the exposure and outcome affect the probability of selection. We let *θ _{x}* and

In Table 3, we see that as we vary the bias parameters the odds ratio shifts toward or away from the null value of 1. Standard errors and commensurate 95% confidence intervals are calculated from the sandwich variance estimate of weighted logistic regression of *Y* on *X*, as described below. Setting either *θ _{x}* or

The significant association between coffee drinking and cancer is very robust to different assumptions about selection bias. For example, the bias parameter combination (*θ*_{0}, *θ _{y}*,

Taking the analysis a step further, we can conduct MCSA where we assign probability distributions to the bias parameters (*θ*_{0}, *θ _{x}*,

Paralleling the discussion of MCSA in Section 2, we repeatedly implement the following three step procedure described by Greenland [6].

- Draw a value of (
*θ*_{0},*θ*,_{x}*θ*,_{y}*θ*) from the prior distribution._{xy} - Compute the inverse probabilities of selection, {expit(
*θ*+*θ*+_{x}X*θ*)}_{y}Y^{−1}, as weights for fitting a weighted logistic regression of*Y*on*X*. - Based on the fit, use the estimated
*X*coefficient and its standard error as the mean and standard deviation in drawing a single normally-distributed value for log*OR*._{XY}

By repeating the steps a large number of times, an ensemble of values for log *OR _{XY}* is generated. Using the prior distributions outlined above we obtain the mean and standard deviation of the ensemble as 0.75 and 0.28, with 95% central interval of (0.25, 1.30). Note that the standard deviation of 0.28 is only modestly larger than the standard error of the crude log odds ratio, which is 0.26. Correspondingly, the 95% central interval based on MCSA (again on the log odds ratio scale) is only 7.7% wider than the conventional interval ignoring selection bias. Upon exponentiating, the MCSA estimate of

Based on these results we arrive at several conclusions. Plausible priors on the bias parameters (*θ*_{0}, *θ _{x}*,

It is difficult to know how to interpret these results. Selection bias alone does not appear to be driving the association in Table 1. Since later research was unable to replicate the coffee-cancer association it suggests we look elsewhere for explanations. One possibility is that the association arose through multiple comparisons and poor type I error control. Another explanation is that there may have been several biases at play at the same time. For instance, coffee consumption may have been measured with error in cases and controls, and relevant confounding variables may not have been identified.

We conclude with a discussion of a full Bayesian treatment of selection bias. Interestingly, unlike other biases, selection bias adjustment does not have a natural Bayesian interpretation. The reason is that the bias adjustment formula in Equation (3) is an inverse probability weighting estimator [10]. The formula is derived by reweighing the exposure distribution using the probabilities of selection. This estimator does not arise through Bayes theorem because there is no likelihood function, which is essential in a Bayesian analysis. For 2 × 2 tables, a Bayesian approach is described by Greenland [6]. Other recent work in this area is given by Geneletti *et. al.* [21].

In the coffee and pancreatic cancer example, exposure status is obtained by asking study participants about coffee consumption via a questionnaire. Given that exposure is defined in this study as any non-zero level of daily consumption, one might anticipate that the answers provided by participants are quite accurate in this setting. The situation would be muddied, however, for participants whose consumption pattern may have changed over time. More generally, concern about accurate measurement or classification of exposure abounds in epidemiology. Typically this falls under the heading of exposure measurement error when the exposure variable is continuous, and exposure misclassification when the exposure variable is categorical (with absent/present being an important special case).

To give an illustration of how we might conduct an analysis which acknowledges exposure misclassification, say the relationship of interest is

$$\mathit{\text{logitPr}}(Y=1|X)={\beta}_{y}+{\beta}_{xy}X.$$

(4)

Thus we focus on a situation where adjustment for confounding is not attempted. Rather, a challenge arises in that the observed data are (*W*, *Y*) rather than (*X*, *Y*). That is, our available binary exposure measurement *W* is imperfect, and will differ from the true binary exposure *X* for some subjects. The rate or extent of corruption can be characterized by sensitivity and specificity, *i.e.*, *SN* = *Pr*(*W* = 1|*X* = 1), *SP* = *Pr*(*W* = 0|*X* = 0).

To focus on a simple case for illustration, we consider *nondifferential* misclassification, whereby the proclivity for exposure misclassification is independent of outcome. This can be expressed as conditional independence of *W* and *Y* given *X*, allowing us to extend (4) to the form

$$\mathit{\text{logitPr}}(Y=1|X,W)={\beta}_{y}+{\beta}_{xy}X,$$

(5)

whilst also postulating that

$$\mathit{\text{logitPr}}(W=1|X,Y)={\alpha}_{w}+{\alpha}_{xw}X.$$

(6)

Thus (*α _{w}*,

Note that the present framework is somewhat parallel to our treatment of an unobserved confounder in Section 2, with two regression relationships at play. That is, (5), like (2) before, governs the relationship of scientific interest, while (6), like (1) before, governs the link between observed and unobserved variables. Moreover, just as before, (5) and (6) can be subsumed by a Poisson model for the counts of subjects stratified by (*X*, *Y*, *W*).

Despite the appealing similarity, it is more challenging to provide some form of probabilistic sensitivity analysis in the present setting than in the Section 2 setting. The challenge arises via the fact that the distribution of the unobserved variable *X* given observed variables (*Y*, *W*) depends not just on bias parameters *α* = (*α _{w}*,

With case-control data and nondifferential misclassification, we can focus on the four parameter model studied in detail in Gustafson, Le and Saskin [25], rather than the more general Poisson model for cell counts. The four unknown parameters are exposure prevalences amongst controls and cases, *Pr*(*X* = 1|*Y* = *y*) for *y* = 0, 1, and sensitivity and specificity. A simple prior distribution treats these four parameters as independent, with uniform priors (between zero and one) for each exposure prevalence, and Beta priors (also between zero and one) describing plausible values for *SN* and *SP*. As an example, say that both *SN* and *SP* are ascribed Beta(13.5,1.5) prior distributions. This distribution has mean 0.9 and mode 0.893, and puts only 5% probability on values below 0.75. Applied to the CPC data via WinBUGS, the resulting posterior distribution on the target parameter is extremely wide, with 95% central interval (1.99, 75.1) on the odds-ratio exp(*β _{xy}*).

To explore this further, the 95% central interval above translates to (0.67, 4.44) on the log odds-ratio, *i.e.*, the interval for *β _{xy}*. Moreover, the posterior distribution is somewhat asymmetric, with a mean of 1.79. In comparison, the 95% confidence interval assuming perfect exposure classification is (0.51, 1.52), centered around the point estimate of 1.01 (again on the log odds-ratio scale). Thus, relative to pretending that the exposure classification is perfect, the point estimate is pushed away from the null considerably, and the interval estimate is widened very considerably. To large extent these two changes ‘cancel out’ in terms of the lower bound of the interval estimate only moving modestly away from the null value of zero.

In fact, even in the simpler setting of known (*SN*, *SP*) values one sees that adjustment for nondifferential misclassification pushes estimates away from the null, since the adjustment compensates for the necessarily weaker association between *W* and *Y* compared to that between *X* and *Y*. It should be noted, however, that often such adjustment does not strengthen evidence about the direction of the effect [26, 27]. Also, the widening of the interval arises via two sources. First, even if (*SN*, *SP*) were known, adjustment would acknowledge that (*W*, *Y*) data are less informative than (*X*, *Y*) data when the (*X*, *Y*) relationship is of interest. Second, the interval also acknowledges the uncertainty about (*SN*, *SP*) values, as encoded in the prior distribution.

A particular point of interest in this example is comparison of the prior and posterior distributions on the bias parameters, *SN* and *SP*. These are displayed in Figure 1, along with the posterior distributions on the target parameter of interest *β _{xy}*. We are accustomed to sensitivity analysis problems where the bias parameters are not informed by the observed data, and we see this to be the situation for

$$\begin{array}{l}\mathit{\text{Pr}}(W=1|Y=y)=\mathit{\text{Pr}}(X=1|Y=y)\mathit{\text{SN}}+\\ \hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\{1-\mathit{\text{Pr}}(X=1|Y=y)\}(1-\mathit{\text{SP}}),\end{array}$$

(7)

and consequently *Pr*(*W* = 1|*Y* = *y*) must lie between 1 − *SP* and *SN*. The present data provide very strong evidence that *Pr*(*W* = 1|*Y* = *y*) is large, for both *y* = 0 and *y* = 1. This then serves to rule out some lower values of *SN*, at least probabilistically. In fact, the present data are unusual for an epidemiologic setting. It is much more common to encounter low-prevalence exposures. In such settings lower values of *SP* may be discredited, by the same sort of logic described above.

Posterior distributions of *β*_{xy}, *SN*, *SP* after acknowledging nondifferential misclassification. The prior densities of *SN* and *SP* are indicated via smooth density curves.

As a second point, the fact that substantial prior-to-posterior updating of a bias parameter can be seen, at least for some datasets, speaks against the applicability of TSA or MCSA in such settings. With TSA or MCSA, their is no attempt to access any information *in the data* about relative plausibilities of values for the bias parameters.

Probabilistic sensitivity analysis is a new suite of methods to help epidemiologists deal with bias in observational studies. On the one hand, such methods permit shifting odds-ratios away from or toward the null value, based on beliefs about biasing mechanisms. Moreover, they also give more plausible quantifications of uncertainty, as they incorporate uncertainties about the magnitude and direction of bias. For example, in the CPC data example, the actual amount of selection bias is debatable. It could be large or small depending on assumptions about the probabilities of selection. Prior distributions on bias parameters incorporate this uncertainty into the analysis.

The methods are particularly useful in settings where confidence intervals around effect estimates are small, e.g., in meta-analysis or studies using large population databases. In such settings random sampling variation contributes only a small fraction of the total uncertainty that is at play. Thus probabilistic bias analysis gives a more honest assessment of the major sources of uncertainty.

In the CPC example, we illustrated that the association between coffee drinking and cancer is robust to different assumptions about selection bias. These findings are surprising in light of the fact that the study results were never successfully replicated. One possible interpretation is that there were multiple biases involved in the data collection process. Indeed the simultaneous modeling of *multiple* biases is a further refinement of sensitivity analysis techniques that we have not discussed, but is a major focus in [3, 6]. For example, in addition to selection bias it is possible that coffee consumption was measured with error in cases and controls, and that this may have further exaggerated the association between exposure and disease.

One outstanding question is determining why exactly the odds ratio in Table 1 is so robust to bias. Certainly the sample size and the magnitude of the crude association between *X* and *Y* should play a role. But the situation is more complex in studies with covariate data. In general, there has been little research characterizing the design considerations when one hopes to minimize sensitivity to bias. Some recent work in this area is given by Rosenbaum [28]. One possibility is to make formal use of the notion of ‘prior data’, described by Greenland [29], to help understanding the information content of prior distributions by representing these distributions in the form of ‘extra’ data points.

Work supported by grants from the Natural Sciences and Engineering Research Council of Canada and the Canadian Institutes of Health Research (Funding Reference Number 62863).

In the unobserved confounding setting, let *c _{xyu}* be the count of study subjects having

$$\text{log}E({c}_{\mathit{\text{xyu}}})={\beta}_{0}+{\beta}_{x}X+{\beta}_{y}Y+{\beta}_{\mathit{\text{xy}}}\mathit{\text{XY}}+{\alpha}_{u}U+{\alpha}_{\mathit{\text{xu}}}\mathit{\text{XU}}+{\alpha}_{\mathit{\text{yu}}}\mathit{\text{YU}}.$$

(8)

If (*X*, *Y*, *U*) were observed for all study subjects, so that *c _{xyu}* were observed for all cells, then (8) could be fit to the data using any standard software package for fitting log-linear models, resulting in estimates of both sets of parameters,

Turning to the real problem at hand with *U* unobserved, TSA corresponds to fitting (8) with *α* taken as known, but with only partially observed data. In general, fitting generalized linear models with some parameters fixed at ‘known’ values is simply done via standard software. In particular, the terms involving known parameters are specified as an *offset* in the *linear predictor*. The issue of partially observed data, however, requires more attention.

In the present setting, for each level of *x* and *y*, *c _{xy•}* =

Now we consider the MCSA scheme. We formally denote the missing and observed data as *c _{mis}* = (

$${p}_{\mathit{\text{MCSA}}}(\alpha ,\beta ,{c}_{\mathit{\text{mis}}}|{c}_{\mathit{\text{obs}}})=p(\alpha )\left\{\prod _{i,j}p({c}_{\mathit{\text{ij}}1}|{c}_{\mathit{\text{ij}}\u2022},\alpha )\right\}p(\beta |{c}_{\mathit{\text{com}}},\alpha ).$$

(9)

Here *p*(*α*) is the selected prior distribution, while *p*(*c _{ij}*

In contrast, under a fully Bayesian analysis the posterior density is

$$\begin{array}{l}{p}_{\mathit{\text{BSA}}}(\alpha ,\beta ,{c}_{\mathit{\text{mis}}}|{c}_{\mathit{\text{obs}}})\propto p(\alpha )p(\beta )\prod _{x,y,u}f({c}_{\mathit{\text{xyu}}}|\alpha ,\beta )\\ \hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\propto p(\alpha )p(\beta )\prod _{x,y}f({c}_{\mathit{\text{xy}}\u2022}-{c}_{\mathit{\text{xy}}1}|\alpha ,\beta )f({c}_{\mathit{\text{xy}}1}|\alpha ,\beta ),\end{array}$$

(10)

where *f*(*c _{xyu}*|

The weighted sample represents the BSA results just as the unweighted sample represents the MCSA results. For instance, the Bayesian 95% interval for the target parameter *β _{xy}* could be formed as the 2.5-th and 97.5-th percentiles of the weighted sample (

The R code we developed to implement the TSA, MCSA, and BSA analyses is posted online (www.stat.ubc.ca/~gustaf). Also, the TSA and MCSA analsyes are readily implemented via The Stata package developed by Orsini *et. al.* [30]. Lash *et. al.* [8] provide SAS code and Excel worksheets implementing various sensitivity analyses: (sites.google.com/site/biasanalysis).

For a case-control setting with binary exposure and nondifferential misclassification, the sampling model can be expressed via the numbers of controls and cases classified as exposed, given the total numbers of controls and cases, as being independent Binomial realizations. The ‘success’ probabilities are given by equation (7). WinBUGS code to specify this model along with the prior distributions described in Section 4 is as follows.

model

{

wtot0 ˜ dbin (p0, n0)

wtot1 ˜ dbin (p1, n1)

p0 <– r0SN + (1–r0)(1–SP)

p1 <– r1SN + (1–r1)(1-SP)

lor <– logit (r1) – logit (r0)

r0 ˜ dunif (0, 1)

r1 ˜ dunif (0, 1)

SN ˜ dbeta (a. sn, b. sn)

SP ˜ dbeta (a. sp, b. sp)

}

This code requires input of the dataset in the form of wtot0 out of n0 controls and wtot1 out of n1 cases having apparent exposure *W* = 1, and the particular specification of a prior specification via hyperparameters a.sn, b.sn, a.sp, b.sp.

1. Cornfield J, Haenszel W, Hammond EC, Lilienfeld AM, Shimkin MB, Wynder EL. Smoking and lung cancer: recent evidence and a discussion of some questions. J. Nat. Cancer Inst. 1959;22:173–203. [PubMed]

2. Greenland S. Sensitivity analysis, Monte-Carlo risk analysis, and Bayesian uncertainty assessment. Risk Anal. 2001;21:579–583. [PubMed]

3. Greenland S. The impact of prior distributions for uncontrolled confounding and response bias: A case study of the relation of wire codes and magnetic fields to childhood leukemia. J. Amer. Statist. Assn. 2003;98:47–55.

4. Lash TL, Fink AK. Semi-automated sensitivity analysis to assess systematic errors in observational data. Epidemiology. 2003;14:451–458. [PubMed]

5. Steenland K, Greenland S. Monte-Carlo sensitivity analysis and Bayesian analysis of smoking as an unmeasured confounder in a study of silica and lung cancer. Amer. J. Epidemiol. 2004;160:384–392. [PubMed]

6. Greenland S. Multiple-bias modelling for analysis of observational data. J. Royal Statist. Soc., A. 2005;168:267–306.

7. Fox MP, Lash TL, Greenland S. A method to automate probabilistic sensitivity analyses of misclassified binary variables. Int. J. Epidemiol. 2005;34:1370–1377. [PubMed]

8. Lash TL, Fox MP, Fink AK. Applying Quantitative Bias Analysis to Epidemiologic Data. Springer; Berlin, Germany: 2009.

9. Mezei G, Kheifets L. Selection bias and its implications for case-control studies: a case study of magnetic field exposure and childhood leukaemia. Int. J. Epidemiol. 2006;35:397–406. [PubMed]

10. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15:615–625. [PubMed]

11. Fewell Z, Davey Smith G, Sterne JAC. The impact of residual and unmeasured confounding in epidemiologic studies: a simulation study. Amer. J. Epidemiol. 2007;166:646–655. [PubMed]

12. Schneeweiss S. Sensitivity analysis and external adjustment for unmeasured confounders in epidemiologic database studies of therapeutics. Pharmacoepidemiol. Drug Saf. 2006;15:291–303. [PubMed]

13. Greenland S. Bayesian perspectives for epidemiological research: III. Bias analysis via missing data methods. Int. J. Epidemiol. 2009;38:1662–1673. [PubMed]

14. MacMahon B, Yen S, Trichopoulos D, Warren K, Nardi G. Coffee and cancer of the pancreas. N. Engl. J. Med. 1981;304:630–633. [PubMed]

15. Jewell NP. Statistics for Epidemiology. Chapman and Hall; CRC Press; Boca Raton, FL, USA: 2003.

16. Wang L, Krieger AM. Causal conclusions are most sensitive to unobserved binary covariates. Statist. Med. 2006;25:2257–2271. [PubMed]

17. McCandless LC, Gustafson P, Levy AR. Bayesian sensitivity analysis for unmeasured confounding in observational studies. Statist. Med. 2007;26:2331–2347. [PubMed]

18. McCandless LC, Gustafson P, Levy AR. A sensitivity analysis using information about measured confounders yielded improved assessments of uncertainty from unmeasured confounding. J. Clin. Epidemiol. 2008;61:247–255. [PubMed]

19. Tavani A, La Vecchia C. Coffee and cancer: A review of epidemiologic studies 1990–1999. Eur. J. Cancer Prev. 2000;9:241–256. [PubMed]

20. Rosenberg L, Slone D, Shapiro S, Kaufman DW, Stolley PD, Miettinen OS. Coffee drinking and myocardial infarction in young women. Amer. J. Epidemiol. 1980;111:675–681. [PubMed]

21. Geneletti S, Richardson S, Best N. Adjusting for selection bias in retrospective case-control studies. Biostatistics. 2009;10:17–31. [PubMed]

22. Flegal KM, Keyl PM, Nieto FJ. Differential misclassification arising from nondifferential errors in exposure measurement. Amer. J. Epidemiol. 1991;134:1233–1246. [PubMed]

23. Gustafson P, Le ND. Comparing the effects of continuous and discrete covariate measurement error, with emphasis on the dichotomization of mismeasured predictors. Biometrics. 2002;58:878–887. [PubMed]

24. Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS – A Bayesian modelling framework: concepts, structure, and extensibility. Statist. Comput. 2000;10:325–337.

25. Gustafson P, Le ND, Saskin R. Case-control analysis with partial knowledge of exposure misclassification probabilities. Biometrics. 2001;57:598–609. [PubMed]

26. Greenland S, Gustafson P. Accounting for independent nondifferential misclassification does not increase certainty than an observed association is in the correct direction. Amer. J. Epidemiol. 2006;164:63–68. [PubMed]

27. Gustafson P, Greenland S. Curious phenomena in Bayesian adjustment for exposure misclassification. Statist. Med. 2006;25:87–103. [PubMed]

28. Rosenbaum PR. Heterogeneity and causality: unit heterogeneity and design sensitivity in observational studies. Amer. Statist. 2005;59:147–153.

29. Greenland S. Prior data for non-normal priors. Statist. Med. 2007;26:3578–3590. [PubMed]

30. Orsini N, Bellocco R, Bottai M, Wolk A, Greenland S. A tool for deterministic and probabilistic sensitivity analysis of epidemiologic studies. Stata J. 2008;8:29–48.

Articles from International Journal of Environmental Research and Public Health are provided here courtesy of **Multidisciplinary Digital Publishing Institute (MDPI)**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |