Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC3777387

Formats

Article sections

Authors

Related links

Biom J. Author manuscript; available in PMC 2013 September 19.

Published in final edited form as:

PMCID: PMC3777387

NIHMSID: NIHMS484107

Evaluation of impact of potential uncontrolled confounding is an important component for causal inference based on observational studies. In this article, we introduce a general framework of sensitivity analysis that is based on inverse probability weighting. We propose a general methodology that allows both non-parametric and parametric analyses, which are driven by two parameters that govern the magnitude of the variation of the multiplicative errors of the propensity score and their correlations with the potential outcomes. We also introduce a specific parametric model that offers a mechanistic view on how the uncontrolled confounding may bias the inference through these parameters. Our method can be readily applied to both binary and continuous outcomes and depends on the covariates only through the propensity score that can be estimated by any parametric or non-parametric method. We illustrate our method with two medical data sets.

When the treatment assignment is not randomized, as in most observational studies, bias may be introduced in causal inference when pre-treatment covariates associated with treatment assignment and the outcome are not properly adjusted. This is particularly critical for inference based on large data sets (*i.e*. electronic medical record databases) where precision can be rather high, and thus the accuracy of the estimator is dominated by the bias. The potential outcome model has been established as a major theoretical framework that underlies most of the available methods for causal inference including matching covariates (Cochran, 1953; Rubin, 1973), outcome regression, methods based on propensity scores (Rosenbaum and Rubin, 1983b; Rubin and Thomas, 1996) and double-robust estimator (Bang and Robins, 2005). The key idea of the potential outcome model is that for each unit, there is a potential outcome under each treatment and assessment of the causal effect involves comparison of these potential outcomes (Holland, 1986). Since not all of the potential outcomes are observed for each unit in empirical data, causal inference can also be viewed as a missing-data problem (Rubin, 1978).

Common to many methods in observational studies is the assumption of “no uncontrolled confounding”, which states that conditional on the pre-treatment covariates included in the analysis, the treatment assignment and potential outcome are independent. Such an assumption serves as the basis for the large volume of research articles on different methods for bias correction and efficiency improvement. However, it is also of great practical interest to examine the extent to which the inference is sensitive to violation of this assumption. It is well known that it is not possible to test the assumption of no uncontrolled confounding using observed data. Therefore, sensitivity analysis is an important component for proper interpretation of the data and accurate statement of the conclusion when assessing causal effects using observational data. The general strategy in sensitivity analysis involves postulation of various assumptions on the nature of the associations of the uncontrolled confounders with treatment assignment and outcomes, followed by examination of the bias induced. Usually, these assumptions are in the format of plausible values of parameters not identifiable from the observed data that characterize those associations. These parameters will be referred to as non-estimable parameters in this article.

Compared with methods assuming no uncontrolled confounding, sensitivity analysis has been relatively sparse in the literature. Rosenbaum (1995) developed a logistic model in the randomization-based framework, where a single parameter measuring the strength of association between the missed covariate and treatment assignment conditional on covariates already included is linked to a minimum and a maximum *p*-values of the inference. The *p*-values under different values of this parameter allow one to assess the sensitivity to uncontrolled confounding. Rosenbaum and Rubin (1983a) proposed an approach for binary outcome with one categorical covariate, where the sensitivity of the estimate to different values of non-estimable parameters (due to unobserved covariates) in the full likelihood function was examined. The same authors also examined the bias due to incomplete matching (Rosenbaum and Rubin, 1985). Brumback et al. (2004) developed an approach where the mean difference between treatment groups within each covariate stratum is used to examine the impact of residual confounding, which uses a similar type of strategy reported previously (Cole et al., 2005; Ko et al., 2003; Robins, 1999). Lin et al. (1998) studied bias induced by uncontrolled confounding in the parametric regression setting. It was shown that under certain conditions there exists a simple algebraic relationship between the true effect and the “apparent” effect when there is uncontrolled confounding – see Hernan and Robins (1999) for more discussions. MacLehose et al. (2005) and Kuroki and Cai (2008) proposed linear programming methods to derive the upper and lower bounds for the causal effect on a binary outcome. Greenland (2005) developed a general Bayes framework to correct bias due to confounding, missing data, and classification error by including a bias model with relevant bias parameters. Other methods include Arah et al. (2008), Copas and Li (1997), Imbens (2003), Sturmer et al. (2007a,b).

In this article, we propose a general framework for sensitivity analysis of inference based on inverse probability weighting (IPW) that supports both parametric and non-parametric modeling of the probability of treatment assignment (propensity score). We also propose a specific type of parametric model along the same line that provides a more mechanistic view on how the uncontrolled confounding may bias the inference. The parametric model also provides guidance on how to postulate the non-estimable parameters when the association of uncontrolled confounders with treatment assignment is entirely unstructured. Our method generates upper and lower bounds for the causal effect under fixed values of two non-estimable parameters that characterize the error in the estimated propensity score due to uncontrolled confounding and its correlation with the potential outcome. The advantages of our method over previously published methods include: (i) it can be used for both continuous and binary outcomes; (ii) it involves the covariates only through the propensity score and therefore many covariates can be included; (iii) the sensitivity analysis is driven by only two parameters with sensible scales; (iv) the propensity score can be estimated from any parametric or non-parametric method under the general framework; and (v) the parametric model provides insight on the process that induces biases. In particular, property (ii) allows our method to deal with high-dimensional propensity score model, which is difficult to implement in previous method (*e.g*. Brumback et al., 2004). A limitation of our method is that the assumption is based on potential outcomes instead of observed outcomes (Vander Weele, 2008), which make it less intuitively appealing. Nevertheless, our method is set up in a very general framework that can be applied to most practical applications. In Section 2, we introduce some notations and describe our method. We illustrate our method with two application examples in Section 3. We conclude the article with a discussion in Section 4.

Our method is built on the potential outcome framework for two-level treatment (treated *versus* control) (Holland, 1986; Rubin, 1974, 1978, 2005). Let *Y*_{1} and *Y*_{0} be the potential outcomes when there is treatment or not, respectively. The principle idea of causal inference based on the potential outcomes is that the causal effect is captured by some kind of contrast between *Y*_{1} and *Y*_{0} (e.g. *Y*_{1}−*Y*_{0}). As in many practical problems, we only observe one of the two potential outcomes, direct estimation of the causal effect at individual level is not possible. Most of the causal inference problems focus on the estimation of the average treatment effect (ATE) for a target population, that is, Δ = *E*(*Y*_{1})−*E*(*Y*_{0}). In observational studies, because of the non-randomness of treatment assignment, difference in the sample means between the two treatment arms would be a biased estimator of ATE. One of the approaches to eliminate the potential bias is called IPW estimation (Hirano and Imbens, 2001; Robins et al., 1994). For the IPW method, each unit has a probability of being assigned to the treatment arm that depends on certain baseline characteristics. This probability is also called the propensity score (Rosenbaum and Rubin, 1983b). The basic idea of IPW is to calculate the difference in the weighted sample means between the two arms with the weight being the inverse of the propensity score (for treated units) or one minus the propensity score (for control units). To be specific, let Z be the treatment indicator such that Z = 1 means treated and 0 means control. Therefore, Y=ZY_{1}+(1−*Z)Y*_{0} is the observed outcome. We denote by **X** the observed covariate vector and *S = E(Z|*** X**) the propensity score. Under the assumption of no uncontrolled confounding (weak ignorability) (Rubin, 2005),

(1)

Equation (1) is the basis for the unbiased estimation of the marginal mean of the outcome under each treatment level and the causal effect using IPW. The intuitive explanation of Eq. (1) can be best understood through the concept of pseudo-population. For instance, each unit with Z = 1 has a probability of *S* to be assigned to the treatment. Therefore, in terms of the distribution of *Y*_{1}, a pseudo-population where such a unit is duplicated 1/S times represents the original population from which the unit is sampled. The mean of *Y*_{1} for the two populations then should be the same.

When there is uncontrolled confounding, *Y _{k}* (

We denote by ε_{1} = *S ^{*}*/

(2)

where the expectation is with respect to the distribution of (**X**,**X**^{*}) within a pseudo-propensity score stratum with value *S.* To see this, note that

Equation (2) says that *S ^{*}* will center at

Since *S* is not the true propensity score when there is uncontrolled confounding, the IPW based on *S* may be biased. As an example, consider the assessment of the magnitude of bias in estimating the mean of *Y*_{1}. First, we observe that

where the fourth equality is due to the independence of Z and *Y*_{1} conditional on (X,X^{*}). Then we have

(3)

Here, ρ_{1} is the correlation coefficient between *Y*_{1} and ε_{1}. Equation (3) represents the key idea of our method. It says that the bias of *YZ*/*S* in estimating *E*(*Y*_{1}) is governed by two non-estimable quantities in addition to the variance of *Y*_{1} (also non-estimable), the correlation between the multiplicative error ε_{1} in propensity score due to uncontrolled confounders and the potential outcome *Y*_{1}, and the variance of ε_{1}. The first quantity, in some sense, characterizes the strength of the association between the missed confounders and the potential outcome. It can be interpreted as a measure that captures the concordance between the outcome value and the amount of extra weight assigned to the outcome value due to uncontrolled confounders. The second quantity, as seen later in the parametric model, characterizes the strength of the association between the missed confounders and the treatment assignment. Stronger correlation and higher variation in ε_{1} will lead to higher bias. Intuitively, since higher ε_{1} means that *S* severely under-estimates *S ^{*}*, a positive correlation implies that higher

To derive the bounds for marginal means and the causal effect, we will assume that *Y*_{1},*Y*_{0} [0,1]. As in most practical problems continuous outcomes can be monotonically transformed to the unit interval based on conservative estimates of the range of the outcomes, this assumption has quite general applicability.

We first introduce some notations. Let , , , , , , and . Then

(4)

Since Y_{1} [0,1], it is easy to show that
. Because *μ*_{0}=(*E*(*Y*_{1})–πμ_{1})/(1−π), we can replace η_{0} in (4) by the expression of its bounds. After some algebra, we have

(5)

where ψ= π/(1−*π*)*, b =* π(μ_{1}−η_{1}), and
are all estimable quantities. Hence, in consideration of Eq. (3) and the inequalities above, we have

(6)

The region defined by inequalities (6) includes two intervals, one for positive ρ_{1} and one for negative ρ_{1} We will call the interval that corresponds to the intended sign of ρ_{1} the *feasible region* of *E(Y*_{1}*).* In Fig. 1, we illustrate the idea through a geometric demonstration. Essentially, Fig. 1 shows that the feasible region for *E*(*Y*_{1}) is determined by the positions and shapes of three second-order polynomials. The distances between θ_{1} and the two shaded regions on the *X*-axis represent possible magnitude of bias depending on the sign of ρ_{1} The width of the interval originates from uncertainty on the relationship between *E*(*Y*_{1}) and Var(*Y*_{1}). Note that inequalities in (6) involve non-estimable parameters (ρ_{1},ν_{1}). For any given (ρ_{1},ν_{1}), one can solve the equations in (6) to obtain the lower (*l*_{1}) and upper (*L*_{1}) limits of the interval. These limits set the boundaries for *E*(*Y*_{1}) after adjusting for the error in estimating the propensity score (ν_{1}) and its correlation with the potential outcome (ρ_{1}) By examining *l*_{1} and *L*_{1} under different values of (ρ_{1},ν_{1}), one can gain insight on the robustness of the estimate of *E*(*Y*_{1}).

The same type of bounds *l*_{0} and *L*_{0} can also be derived for *E*(*Y*_{0}) for a fixed pair of (ρ_{0},ν_{0}), where ν_{0} is the variance of ε_{0} = (1−*S ^{*}*)

As the correlation coefficient is a popular and sensible measure to most users, the variance of the error term might not be sensible enough to be specified easily. Since a random variable taking values in the unit interval with mean μ has the maximum variance of μ(1−μ)(Var(*Y*) = *E*(*Y*^{2})−[*E*(*Y*)]^{2} ≤ *E*(*Y*) − [*E*(*Y*)]^{2} = μ(1−μ)), we have

Therefore, instead of proposing a value of ν_{1}, we can postulate the percentage of the maximum possible variance λ_{1}= ν_{1}/*V*_{1}. Similarly, we can propose λ_{0}=*v*_{0}/*V*_{0}, where *V*_{0} = *E*(*S*/(1*S*)).

When the outcome is binary, the right-hand side inequality in (5) becomes

(7)

For any fixed (ρ_{1},ν_{1}), the region defined by (6), if exists, is a point. Therefore, *l*_{1}= *L*_{1}. Similarly, *l*_{0} = *L _{0}*.

It is obvious that *l*_{1}−*L*_{0} and *L*_{1}−*l*_{0} are a set of lower and upper bounds for the causal effect. In practice, a propensity score model is first fitted to the data and the estimated propensity score will play the role of pseudo-propensity score. The lower and upper limits are then derived using parameters from the empirical distributions. It often provides sufficient insight into the robustness of the result by setting |ρ_{1}| = |ρ_{0}| ^{=} ρ and λ_{1}= λ_{0}=λ (as long as λ_{1} and λ_{0} are compatible) so that one needs only to assess the sensitivity of the estimates by examining various values of (ρ,λ) (note that this condition is for simplicity purpose and one can certainly postulate the four parameters without such a constraint). Nevertheless, one will still need to decide on the signs of ρ_{1} and ρ_{0}, which can often be chosen to work against the finding of the original analysis to examine the robustness. We will demonstrate how this strategy works by real data examples in Section 3. Inference can also be made on the limits. For instance, if a positive treatment effect is found *(i.e.* treatment has higher outcome value), one would be interested in testing *H*_{0}*:l*_{1}−*L*_{0} ≤ 0 *versus H*_{1} : *l*_{1}−*L*_{0} > 0 under certain values of (ρ,λ). Rejection of the null will increase the confidence of the finding.

Certain values of λ_{1} and *λ*_{0} may not be compatible due to their connection through the distribution of *S* and Var(*S ^{*}|S*). The following result establishes the range of compatible

For a fixed λ_{1}*, E*[*I(S* ≤ *q*_{1})*S/*(1−*S*)]*/V*_{0} ≤ λ_{0}≤*E*[*I*(*S* ≥ *q*_{2})*S*/(1−*S*)]*/V*_{0}, where *q*_{1} and *q*_{2} satisfy *E*[*I*(*S* ≤ *q*_{1})(1−*S*)*/S*]= *E*[*I*(*S* ≥ *q*_{2})(1−*S*)*/S*]=*v*_{1}=λ_{1}*V _{1}*.

Note that both bounds of λ_{0} are estimable from the observed data given λ_{1}.

In summary, formulae (6) and (7) allow one to evaluate the feasible regions of the marginal means under different values of the two parameters that characterize the potential influence of uncontrolled confounding when using IPW. Evaluations of both potential outcomes then allow one to conduct sensitivity analysis on causal inference. From the practical point of view, ρ might be difficult to postulate, as it is a relative abstract quantity. However, we need to emphasize that a sensitivity analysis should account for both known confounders and potential confounders we are not even aware of. For the latter, it is difficult to speculate on the magnitude of non-estimable parameters no matter what method is used. One solution is to propose different values of the parameters that have well-recognized meaning of “high”, “medium”, and “low”, and assess the effect on inference. In this sense, correlation coefficient is a very sensible parameter, for which most users have a clear understanding of what is considered high or low in their subject area. In general, the intervals 0.1–0.3, 0.3–0.5, and 0.5–1.0 are considered low, medium, and high. On the other hand, λ is a more obtuse parameter with less sensible scale. In the following section, we introduce a parametric model that provides a more mechanistic view on how the uncontrolled confounding introduces bias in the inference and therefore provides some guidance of the scale of λ. It should be noted that the parametric model by itself also serves as a way of sensitivity analysis.

Consider the following probit model as the true propensity score model:

(8)

Here, Φ is the cumulative distribution function of a standard normal variable. Model (8) says that the contribution of **X** and **X ^{*}** to Φ

Odds ratio (OR) ranges associated with one interquartile range increase in *U* for *S*^{*} ranging from 0.1 to 0.9 (the first number corresponds to the odds ratio at *S*^{*}= 0.1 and the second number corresponds to the odds ratio at *S*^{*}= 0.9).

It is well known that integrating out *U* leads to

(9)

Equations (8) and (9) imply that when (8) is the true propensity score, the pseudo-propensity score is still a probit model. This means although we miss *U*, the probit model is the correct model for E[Z|**X**]. After some calculations (see Appendix A.2 for details), it can be shown that

(10)

where *F _{τ}*(

Since *v _{1}* =

(11)

Therefore, λ is the maximum proportion of Var_{τ}(S^{*}|S) relative to the maximum conditional variance of *S** across all *S*. λ is more conservative than τ in the sense that Var_{λ}(*S*^{*}|*S*) is always higher than Var_{τ}(*S*^{*}|*S*) for all *S*, suggesting a more severe impact of the missed confounders on the probability of treatment assignment. Therefore, users unwilling to make the probit assumption of the true propensity score model can select proper τ values and use *λ*(τ) for the sensitivity analysis, provided that setting the common value *λ*(*τ*) for λ_{1} and *λ*_{2} does not violate the compatibility rule specified in Result 1.

Model (8) also allows straightforward large sample inference of the bounds that accounts for sampling variation (see Appendix A.3 for details).

We have described a framework of sensitivity analysis for causal inference using IPW. We have shown that the bias due to uncontrolled confounding can be understood through two non-estimable quantities, the variance of the multiplicative error in assignment probability and its correlation with the potential outcome. We summarize below how to implement our method:

- Propose a fixed value of (ρ,τ) and the signs of ρ
_{1}and ρ_{0}, where ρ =|ρ_{1}|=|ρ_{0}|. The signs of ρ_{1}and ρ_{0}can be conservatively chosen to work against the treatment effect found under the assumption of no uncontrolled confounders (see the examples in Section 3). - If the true propensity score is assumed to follow the probit model: use Eq. (10) to calculate conditional variances Var
(_{τ}*S*|^^{*}*S*) at each fitted pseudo-propensity score ^*S*. Calculate and by averaging Var(_{τ}*S*|^{*}*^S*)/*^S*^{2}and Var(_{τ}*S*|^{*}*^S*)/(1*−̂S*)^{2}over sampled units. - Otherwise, use Eq. (11) to calculate
*λ*(τ) and Var_{λ}(S^{*}|*^S*). Calculate and by averaging Var_{λ}(*S*|^{*}*^S*)/*^S*^{2}and Var_{λ}(*S*|^{*}*^S*)/(1−*^S*)^{2}over sampled units. Here,*^S*is the fitted pseudo-propensity score by any parametric or non-parametric method. Make sure that the two λ parameters are allowed to take the same chosen value by Result 1.

- Use either the large sample theory (probit model) or bootstrap (unspecified true propensity model) to calculate confidence intervals of the relevant bounds.
- Repeat (i)–(iv) under alternative values of (ρ,τ) and possibly different signs of ρ
_{1}and ρ_{0}.

Clinics in resource-limited setting of Sub-Saharan Africa are typically under-resourced and under-staffed. Computerized reminder is one possible approach to improve the quality of care. For HIV-positive patients, CD4 measure is an important health index that needs to be monitored regularly for proper treatment arrangement. Were et al. (2011) conducted a study to evaluate the effect of computer-generated reminders on clinicians’ compliance with ordering guidelines of CD4 test in western Kenya. This study was conducted in HIV clinics affiliated with the Academic Model for Providing Access to Care (AMPATH), a collaborative initiative between Moi University School of Medicine (Kenya) and a consortium of universities in North America led by Indiana University. Since 2006, AMPATH clinics have used the AMPATH Medical Record System (AMRS) to store comprehensive electronic patient records for all enrolled patients. Patient records in the system contain demographic information, diagnoses and problem lists, clinical observations, medications, and diagnostic test results.

The study involved two clinics that were arbitrarily selected from three similar adult clinics at Eldoret, Kenya that provide the same type of care to HIV-positive adult patients. For each patient visit, a clinical summary was generated in the computer that includes selected fields from the patient’s record for quick reference to the patient’s most recent and pertinent information. In addition, patient-specific alert/reminder on CD4 ordering was included at the bottom of the summary. For one randomly selected clinic (intervention) out of the two, the clinical summary was provided to care providers in the format of a printed PDF, whereas care providers in the other clinic (control) did not receive the printed clinical summary. In spite of the lack of the summary, care providers in the control clinic still had access to all information included in the summary if they wanted to check them. The summary and reminder offered a convenient way for the care providers to examine the most relevant and important information. The overall goal of the study was to compare the ordering rate of CD4 test between the two clinics for those patient visits where CD4 should be ordered by standard criteria.

The clinical summary with reminders was provided to the intervention clinic during the whole month of February 2009. Data from both clinics in that month were analyzed. Specifically, 207 and 309 patient visits at the intervention and the control clinics that met the CD4 ordering criteria were included in the analysis. The crude CD4 ordering rates were 65.2% for the intervention clinic and 41.8% for the control clinic (Chi-square test *p*<0.0001).

Apparently, if provider–patient encounters were different between the two clinics in ways that affected the likelihood of CD4 ordering, the difference in the crude rate may be a biased estimate of the true intervention effect. At the provider level, those who have worked longer (and therefore more experienced) in the clinic might have different behaviors in ordering CD4 test as compared with those less experienced; female and male providers could also behave differently. At the patient level, characteristics that may have affected provider’s decision included the most recent CD4 test date and value, the length of period the patient had been in the AMPATH program, and the stages of AIDS by the definition of World Health Organization (WHO). We fit a probit model to the treatment assignment by including the above characteristics and use the inverse propensity score weighting to estimate the CD4 ordering rates had all provider–patient encounters been or not been provided with the summary and reminders. The rates are 64.5 and 46.0%, respectively, with a difference of 18.5% (Z-test *p=* 0.005, two-sided). Therefore, there is a strong evidence that the summary with reminder improves the CD4 ordering rate after confounding adjustment.

To examine the potential impact of uncontrolled confounding, we implemented the proposed sensitivity analysis. We consider the scenario that most likely puts the significant result in question. Specifically, when the correlation between the multiplicative error ε_{1} and *Y*_{1} (CD4 ordering under intervention) is positive and that between ε_{0} and *Y*_{0} (CD4 ordering without intervention) is negative, the estimators based on pseudo-propensity score over-estimate *E*(*Y*_{1}) and under-estimate *E*(*Y*_{0}). Therefore, we examined different combinations of ρ = ρ_{1}=−ρ_{0} and τ for their effects on the estimate of the intervention effect. The result is shown in Fig. 2. Figure 2(A) shows that if we assume the true propensity score also follows the probit model, then for the estimate of the intervention effect to go below 0, ρ has to be almost as high as 0.5 even when τ = 0.4 (or OR is beyond 2.4, Table 1). If we leave the true propensity score model unspecified, ρ has to be higher than 0.4 for λ(0.4) = 0.0881. On the other hand, when τ = 0.2 or the OR is 1.5–1.7, the estimate of the intervention effect is still positive even r ρ = 0.7 for both methods. In many practical applications, these correlation coefficients or ORs correspond to quite strong associations. To account for the sampling variation, Fig. 2(B) shows the lower limit of the 90% confidence interval (one-sided) for the intervention effect corresponding to t τ = 0.2. Here, the confidence interval accounts for the sampling variation when estimating treatment effect based on different values of ρ and τ. Clearly, the limit is above 0 when the correlation is less than 0.4. Hence, we feel the conclusion that the reminder improves CD4 ordering rate is fairly robust to uncontrolled confounding. Here, we chose the 90% instead of 95% confidence interval to avoid being over-conservative as we already (i) chose parameters that will mostly offset the treatment effect, (ii) examined the lower bound of the treatment effect (treatment effect is positive), and (iii) estimated the lower confidence limit of the lower bound. Note that the common values for λ_{1} and λ_{2} when assumption is made on the true propensity score model are 0.0063, 0.0245, 0.0526, and 0.0881. If we fix λ_{1} at these values and examine the compatible region of λ_{0} using Result 1, then the regions are (0.00003,0.64), (0.0001,0.88), (0.0003,0.91), and (0.0007,0.93). Therefore, the common values are legitimate.

Kereiakes et al. (2000) conducted an observational study at the Christ Hospital, Cincinnati, to evaluate the impact of adjunctive pharmacotherapy with abciximab platelet glycoprotein IIb/IIIa blockade administered during percutaneous coronary intervention (PCI) on costs and clinical outcomes in a high-volume interventional practice. It was shown that patients receiving abciximab (treated) had lower mortality rates than those who did not (control) and there was no significant difference in cost. We will focus on the outcome of cost to illustrate how our method can be used to analyze a continuous outcome.

We first did a logarithm-transformation of the cost variable, and then scaled the transformed variable to the unit interval. There are a total of 996 patients, among whom 698 received abciximab and 298 did not. A two-sample *T*-test shows that the intervention group had significantly higher cost than the control group (0.792 *versus* 0.777, *T*-test *p*<0.0001, two-sided). To account for potential confounders, we fit a probit propensity score model that includes coronary stent deployment (Yes/No), acute myocardial infarction within seven days prior to PCI (Yes/No), diagnosis of diabetes mellitus (Yes/No), and the left ventricular ejection fraction (continuous). These factors univariately correlate with at least one of the two measures: cost or abciximab usage. The inverse propensity score weighting leads to an estimate of the treatment effect of 0.024 (Z-test *p*<0.0001, two-sided), suggesting again an increase in cost in the intervention group after adjustment.

To investigate potential uncontrolled confounding, we conducted a similar sensitivity analysis. Since the outcome is continuous, the sensitivity analysis will generate estimates of the lower and upper bounds. For this example, it was the lower bound of the causal effect that is of our interest as a positive treatment effect was found. The result is shown in Fig. 3. Clearly, the positive intervention effect is much more sensitive to uncontrolled confounding compared with the previous example. For the estimate of the lower bound of the intervention effect to go below 0, ρ only needs to be a bit over 0.1 when τ = 0.4 or both Var_{τ}(*S ^{*}|S*)- and Var

In fact, we intentionally ignored a covariate in our probit model for the pseudo-propensity score: the number of vessels involved in the patient’s initial PCI. Univariate analysis shows that this variable is correlated with both treatment assignment and the cost. When included in the probit model after normalization,
(Wald test *P*<0.0001, two-sided). Therefore, this variable may be a potential confounder that induces the “treatment effect”. The sensitivity analysis shows that when τ = 0.3 a weak correlation ρ_{1}= − ρ_{0}>0.11 could be responsible for the “significant” increase in cost. In fact, when we use the propensity score obtained from the revised probit model with the number of vessels added, the intervention effect is not significant (−0.0019, Z-test *p*= 0.91, two-sided).

We propose a general framework and a parametric model for sensitivity analysis built on IPW, which can be used to examine the potential impact of uncontrolled confounding for causal inference using non-randomized data. It should be noted that the method can be readily applied to inference of marginal means when the data are subject to non-ignorable missingness (Little and Rubin, 1987). Our method offers a simple, intuitively appealing, and easy-to-implement procedure, which is only based on two parameters: the variance of the multiplicative error of the propensity score and its correlation with a potential outcome. Although we only illustrate the proposed methods through the estimation of ATE, the method can be readily applied to ATE on the treated (ATT). To see this, note that

Since Pr[Z= 1], *E*(*Y*_{0}|Z *=* 0), and *E*(*Y*_{1}|Z *=* 1) are estimable, the bounds of *E*(*Y*_{0}) can be directly translated to those of *E*(*Y*_{1}−*Y*_{0}|*Z*= 1).

We focus on uncontrolled confounding and its impact on causal inference in this article. Therefore, our working assumption is that the pseudo-propensity score model for included confounders is correctly specified. Our method does not address the bias induced by model-misspecification. This is why we considered the probit model instead of the logit model in Section 2.3. The reason is that if the true propensity score is a probit model, then the pseudo-propensity score is still a probit model, which eliminates model-misspecification in the pseudo-propensity score. This leaves uncontrolled confounders the only source of bias. It is well known that the logit model does not have this kind of property. Nevertheless, our general framework allows one to use any method to fit the pseudo-propensity score including the logit model. The probit model simply serves as a tool to guide the parameter specifications. In reality, bias can be induced by both model-misspecification and uncontrolled confounding. Therefore, an appropriate procedure should be employed to examine both factors for the evaluation of the robustness of the conclusion.

Sensitivity analysis is a critical component in observational studies to examine the robustness of the conclusion. For practical purposes, a key to construct such analysis is simple and sensible parameters, the scales of which are familiar to investigators. We choose to base our sensitivity analysis on (ρ,τ) as these two measures are of broad familiarity with clear and sensible scales. A sensitivity analysis should account for confounders we are not aware of. By its definition, the associations of these confounders with treatment assignment and potential outcomes are hard to postulate based on empirical experience/data. Therefore, the solution is to examine the impact of confounding under a range of different levels of these associations (*i.e*. low and high).

The motivation to restrict a continuous variable to the [0,1] interval is to bound its variance (from both up and below) by estimable parameters and the non-estimable target (*i.e. E*(*Y*_{1}), see Eq. 6). Then in Eq. (3), one can replace Var(*Y*_{1}) with its upper and lower bounds. This automatically eliminates Var(*Y _{1}*) from the equation so that the users do not need to worry about specifying Var(

Based on the shape of the second derivative of *f* in the interval defined by the bounds *of E*(*Y*_{1}), one can plug in the relevant bounds for *E*(*Y*_{1}) and Var(*Y*_{1}) to obtain bounds for *E*(*O*_{1}).

From the analytical standpoint, in many cases sensitivity analysis will generate some bounds. These bounds add another level of uncertainty in addition to sampling variation. Therefore, for a sensitivity analysis to be useful and informative, these bounds should be as sharp as possible to accurately characterize the robustness. Although we consider compatibility of λ_{1} and λ_{0} in our analysis, the compatibility between λ_{1} and ρ_{1} (or between λ_{0} and ρ_{0}) is more complicated and is not studied in this article. Therefore, our bounds are likely not to be sharp. Nevertheless, we believe our method is a balance of simplicity/sensibility and sharpness to be useful and accessible to medical researchers and scientists. A future direction is to investigate the simplification of the compatibility between λ_{1} and ρ_{1} to improve the sharpness of the bounds.

This work is supported in part by the National Institute of Aging (NIA) grant P30 AG10133, Agency for Healthcare Research and Quality (AHRQ) grants 290-04-0015 and R01 HS015409-02, a grant from the Regenstrief Foundation to the Regenstrief Center for Healthcare Improvement and Research, a grant from Abbott Fund, and a grant to the United States Agency for International Development-Academic Model Providing Access to Healthcare (USAID-AMPATH) Partnership from USAID as part of the President’s Emergency Plan for AIDS Relief (PEPFAR).

Therefore, λ_{0} = ν_{0}/*V*_{0}*≤ E*[*I*(*S*≥*q _{2}*)

Therefore;

Here, we show the derivation of the asymptotic distributions of and . Similar reasoning can be applied to derive the asymptotic distributions of and and the estimates of the bounds for the causal effect. We will consider a continuous outcome.

Let (X* _{i}, Z_{i}, Y_{i}*) (

where **Λ** is the variance-covariance matrix of **O*** _{i}*.

Since
and η_{1} = δ_{1}/π, the bounds determined by (6) are functions of ζ: *l*_{1} = *l*_{1}(*ζ*) and *L*_{1} = *L*_{1}(ζ). Then a direct application of the delta method leads to the asymptotic distributions of
and
.

Supporting Information for this article is available from the author or on the www under http://dx.doi.org/10.1002/bimj.201100042.

**Conflict of Interest**

The authors have declared no conflict of interest.

- Arah OA, Chiba Y, Greenland S. Bias formulas for external adjustment and sensitivity analysis of unmeasured confounders. Annals of Epidemiology. 2008;18:637–646. [PubMed]
- Bang H, Robins JM. Doubly robust estimation in missing data and causal inference models. Biometrics. 2005;61:962–973. [PubMed]
- Brumback BA, Hernan MA, Haneuse SJ, Robins JM. Sensitivity analyses for unmeasured confounding assuming a marginal structural model for repeated measures. Statistics in Medicine. 2004;23:749–767. [PubMed]
- Cochran WG. Matching in analytical studies. American Journal of Public Health and the Nations Health. 1953;43:684–691. [PubMed]
- Cole SR, Hernan MA, Margolick JB, Cohen MH, Robins JM. Marginal structural models for estimating the effect of highly active antiretroviral therapy initiation on CD4 cell count. American Journal of Epidemiology. 2005;162:471–478. [PubMed]
- Copas JB, Li HG. Inference for non-random samples. Journal of the Royal Statistical Society, Series B. 1997;59:55–95.
- Greenland S. Multiple-bias modelling for analysis of observational data (with discussion) Journal of the Royal Statistical Society, Series A. 2005;168:267–306.
- Hernan MA, Robins JM. Method for conducting sensitivity analysis. Biometrics. 1999;55:1316–1317. [PubMed]
- Hirano K, Imbens GW. Estimation of causal effects using propensity score weighting: an application to data on right heart catheterization. Health Services and Outcomes Research Methodology. 2001;2:259–278.
- Holland PW. Statistics and causal inference. Journal of the American Statistical Association. 1986;81:945–960.
- Imbens GW. Sensitivity to exogeneity assumptions in program evaluation. American Economic Review. 2003;93:126–132.
- Kereiakes DJ, Obenchain RL, Barber BL, Smith A, McDonald M, Broderick TM, Runyon JP, Shimshak TM, Schneider JF, Hattemer CR, Roth EM, Whang DD, Cocks D, Abbottsmith CW. Abciximab provides cost-effective survival advantage in high-volume inter-ventional practice. American Heart Journal. 2000;140:603–610. [PubMed]
- Ko H, Hogan JW, Mayer KH. Estimating causal treatment effects from longitudinal HIV natural history studies using marginal structural models. Biometrics. 2003;59:152–162. [PubMed]
- Kuroki M, Cai Z. Formulating tightest bounds on causal effects in studies with unmeasured confounders. Statistics in Medicine. 2008;27:6597–6611. [PubMed]
- Lin DY, Psaty BM, Kronmal RA. Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics. 1998;54:948–963. [PubMed]
- Little RJA, Rubin DB. Statistical Analysis with Missing Data. Wiley; New York: 1987.
- MacLehose RF, Kaufman S, Kaufman JS, Poole C. Bounding causal effects under uncontrolled confounding using counterfactuals. Epidemiology. 2005;16:548–555. [PubMed]
- Robins JM. Association, causation, and marginal structural models. Synthese. 1999;121:151–179.
- Robins JM, Rotnitzky A, Zhao L. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association. 1994;89:846–866.
- Rosenbaum PR. Observational Studies. Springer; New York: 1995.
- Rosenbaum PR, Rubin DB. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society, Series B. 1983a;45:212–218.
- Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983b;70:41–55.
- Rosenbaum PR, Rubin DB. The bias due to incomplete matching. Biometrics. 1985;41:103–116. [PubMed]
- Rubin DB. Matching to remove bias in observational studies. Biometrics. 1973;29:159–183.
- Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology. 1974;66:688–701.
- Rubin DB. Bayesian inference for causal effects: the role of randomization. Annals of Statistics. 1978;6:34–58.
- Rubin DB. Causal inference using potential outcomes: design, modeling, decisions. Journal of the American Statistical Association. 2005;100:322–331.
- Rubin DB, Thomas N. Matching using estimated propensity scores: relating theory to practice. Biometrics. 1996;52:249–264. [PubMed]
- Sturmer T, Schneeweiss S, Rothman KJ, Avorn J, Glynn RJ. Performance of propensity score calibration – a simulation study. American Journal of Epidemiology. 2007a;165:1110–1118. [PMC free article] [PubMed]
- Sturmer T, Schneeweiss S, Rothman KJ, Avorn J, Glynn RJ. Propensity score calibration and its alternatives. American Journal of Epidemiology. 2007b;165:1122–1123. [PMC free article] [PubMed]
- Vander Weele TJ. The sign of the bias of unmeasured confounding. Biometrics. 2008;64:702–706. [PubMed]
- Were MC, Shen C, Tierney WM, Mamlin JJ, Biondich PG, Li X, Kimaiyo S, Mamlin BW. Evaluation of computer-generated reminders to improve CD4 laboratory monitoring in sub-Saharan Africa: a prospective comparative study. Journal of the American Medical Informatics Association. 2011;18:150–155. [PMC free article] [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's Canada Institute for Scientific and Technical Information in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |