Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2891886

Formats

Article sections

- Summary
- 1. INTRODUCTION
- 2. Motivating Problem
- 3. Potential Outcomes and Statistical Causality
- 4. Large Sample Properties and Standard Errors
- 5. Simulation Studies
- 6. Example
- 7. Discussion
- Supplementary Material
- References

Authors

Related links

Biometrics. Author manuscript; available in PMC 2010 June 28.

Published in final edited form as:

Published online 2009 June 9. doi: 10.1111/j.1541-0420.2009.01282.x

PMCID: PMC2891886

NIHMSID: NIHMS117725

The publisher's final edited version of this article is available at Biometrics

See other articles in PMC that cite the published article.

For many diseases where there are several treatment options often there is no consensus on the best treatment to give individual patients. In such cases it may be necessary to define a strategy for treatment assignment; that is, an algorithm which dictates the treatment an individual should receive based on their measured characteristics. Such a strategy or algorithm is also referred to as a treatment regime. The optimal treatment regime is the strategy that would provide the most public health benefit by minimizing as many poor outcomes as possible. Using a measure that is a generalization of attributable risk and notions of potential outcomes, we derive an estimator for the proportion of events that could have been prevented had the optimal treatment regime been implemented. Traditional attributable risk studies look at the added risk that can be attributed to exposure of some contaminant, here we will instead study the benefit that can be attributed to using the optimal treatment strategy.

We will show how regression models can be used to estimate the optimal treatment strategy and the attributable benefit of that strategy. We also derive the large sample properties of this estimator. As a motivating example we will apply our methods to an observational study of 3856 patients treated at the Duke University Medical Center with prior coronary artery bypass graft surgery and further heart related problems requiring a catheterization. The patients may be treated with either medical therapy alone or a combination of medical therapy and percutaneous coronary intervention without general consensus on which is the best treatment for individual patients.

Since its inception by Levin (1953), attributable risk (*AR*) has become a widely used measure of association between risk factors and disease status. Jewell (2004) denotes the population attributable risk as the fraction of all disease cases that can be linked to exposure from some factor. *AR* has been referred to by many other names (e.g. ‘etiologic fraction’ or ‘attributable fraction’) and has a long history within the medical and epidemiology communities.

Observational data can be used to assess the impact of various treatment strategies on patient health outcomes, where a treatment strategy is defined as an algorithm which dictates the treatment an individual should receive based on their characteristics. Furthermore, we define the optimal treatment strategy to be the strategy that will result in the smallest proportion of poor outcomes (events) among the study population. In this manuscript, we propose a generalization of the attributable risk measure which is defined as the fraction of observed events that could have been prevented had the optimal treatment strategy been implemented. In contrast to traditional *AR* studies where the goal is to measure the added risk attributed to exposure of some contaminant, this estimator is intended to study the positive public health impact of a treatment strategy. As such we will refer to it as attributable benefit (*AB*).

For simplicity we consider the case where there are two treatment options which will be labeled by the treatment indicator *T* = 0 or 1. We denote the event of interest by the binary indicator *Y* where *Y* = 1 denotes poor outcome and *Y* = 0 denotes good outcome. Let *X* denote the vector of covariates measured on each individual that may be used to determine which treatment they receive, we denote a treatment strategy *g* as a function that maps values of *X* to 0 or 1. That is, *g*(*x*) denotes the treatment *T* (either 1 or 0) that will be assigned to an individual with covariates *X* = *x*.

The goals of this paper are to define and estimate the optimal treatment strategy *g _{opt}*(

Our motivating problem comes from the Duke Databank for Cardiovascular Diseases (DDCD), a large database housed at the Duke University Medical Center (DUMC). Started in 1969, the DDCD collects data on the in-hospital clinical course of all patients who undergo cardiac catheterization at Duke University Medical Center. A wide variety of baseline patient information is collected and the DDCD attempts to collect follow-up data on patients with at least one diseased vessel six months and one year after hospitalization and yearly thereafter. Data are collected in the form of a mailed questionnaire or surveyed by phone for those who do not respond to the mailing (Al-Khatib *et al.*, 2007). One of the largest and oldest such databases, the DDCD provides essential information to cardiologists about the effectiveness of various treatments on patient health.

Here we focus on the subset of patients with a previous coronary artery bypass graft (CABG) surgery and requiring a later catheterization due to continued symptoms. For technical reasons, the vast majority of patients are not considered to be candidates for a second CABG surgery and therefore their treatment options are limited to either medical therapies (MED) or some combination of medical therapies and percutaneous coronary intervention (PCI) which is also known as angioplasty. A question of clinical interest is whether PCI provides some benefit (here in the form of lower 1-year mortality rates) above optimal medical therapy for either the overall study population or some subset defined by clinical characteristics.

Using the ideas and notation developed by Rubin (1974) we define the potential outcomes *Y**(1) and *Y**(0) to denote the indicator of a poor outcome (event) for an arbitrary individual in our population had they received treatment 1 or 0 (respectively). In actuality, at most one of these potential outcomes can be observed for any given individual. In addition, we also collect baseline covariates *X* on these individuals. Consequently, the potential outcomes for any individual will be defined as ** W** = {

In contrast, the observable data will be denoted by ** O** = (

The first assumption, also known as the Stable Unit Treatment Value Assumption (SUTVA), states that the observed outcome for an individual is the same as the potential outcome for the corresponding treatment that the individual received. An underlying and often overlooked part of this assumption is that under SUTVA, we assume that there is no interference between units. According to Rubin (1986), the SUTVA assumption implies that the value of the potential outcome for a subject does not depend on what treatments other subjects receive. In terms of the current framework, the SUTVA assumption can be written as *Y* = *Y**(1)*T* + *Y**(0)(1 − *T*).

The second assumption is also referred to as the strong ignorability assumption or the no unmeasured confounders assumption. This assumption states that the treatment *T* assigned to an individual is independent of the potential outcomes {*Y**(1), *Y**(0)} conditional on the covariates *X*. In an observational study the treatment that a physician chooses to give a patient can be reasonably assumed to only be based on characteristics of the patient at the time of treatment and not on the patients potential outcome (which of course is not known at the time of treatment). Consequently, this second assumption will be tenable if the key factors influencing the decision making process for a physician were captured in the data *X*. If, however, there are additional factors beyond the data *X* that influence treatment decisions, then this assumption may not hold. We discuss this issue further in Section 7; however, for the remainder of this paper we will assume both SUTVA and strong ignorability hold.

Under these assumptions, for example, we can compute the marginal probability *P*{*Y**(0) = 1} from the distribution of the observed data as

$$\begin{array}{l}P\{{Y}^{\ast}(0)=1\}={E}_{X}[P\{{Y}^{\ast}(0)=1\mid X\}]\\ ={E}_{X}[P\{{Y}^{\ast}(0)=1\mid X,T=0\}]\\ ={E}_{X}\{P(Y=1\mid X,T=0)\},\end{array}$$

where we use the notation *E _{X}*(·) to emphasize that this expectation is taken with respect to the marginal distribution of

$$P\{{Y}^{\ast}(1)=1\}={E}_{X}\{P(Y=1\mid X,T=1)\}.$$

As mentioned previously, a treatment regime is an algorithm which assigns treatment to a patient according to factors that are measured on the individual. As such, we define a treatment regime to be a function *g* which maps the values of *X* to 0 or 1. Hence the treatment regime would assign treatment *g*(*x*) (either 0 or 1) to an individual whose covariates *X* = *x*. We also denote by the class of all such treatment regimes. We can now define the potential outcome for an arbitrary individual in our population under the hypothetical situation that treatment regime *g* was being implemented by *Y**{*g*(*X*)} = *Y**(1)I{*g*(*X*) = 1} + *Y**(0)I{*g*(*X*) = 0} where I(·) denotes the indicator function. The most simple strategies would be *g*_{0} or *g*_{1} where all patients were given treatment 0 or 1 (respectively) regardless of the values of their covariates *X*.

We now define the optimal strategy as the one which results in the smallest probability of a poor outcome. That is,

$${g}_{\mathit{opt}}(X)=arg\underset{g\in \mathcal{G}}{min}(E[{Y}^{\ast}\{g(X)\}]).$$

Under the SUTVA and no unmeasured confounders assumption, it is straightforward to show that

$$\begin{array}{l}E[{Y}^{\ast}\{g(X)\}]=P[{Y}^{\ast}\{g(X)\}=1]\\ ={E}_{X}[P(Y=1\mid T=1,X)\text{I}\{g(X)=1\}\\ +P(Y=1\mid T=0,X)\text{I}\{g(X)=0\}],\end{array}$$

(1)

and the optimal treatment regime is given by

$${g}_{\mathit{opt}}(X)=\text{I}\{P(Y=1\mid T=1,X)-P(Y=1\mid T=0,X)<0\}.$$

(2)

That is, the optimal treatment regime would assign treatment 1 to an individual with baseline covariates X if the conditional probability that *Y* = 1 given *T* = 1 and *X* is smaller than the conditional probability *Y* = 1 given *T* = 0 and *X*; otherwise give treatment 0.

Now that we have some notation and insight into the different treatment regimes, we can formally define the parameter of interest. Recall that the objective is to estimate the proportion of events that would have been prevented had the optimal strategy been used. We define this quantity as the attributable benefit of using the optimal treatment regime. We start by defining the attributable benefit for any treatment regime, say *g*(*X*). From the notation above we write *AB _{g}* as:

$$A{B}_{g}=\frac{P(Y=1)-P[{Y}^{\ast}\{g(X)\}=1]}{P(Y=1)}$$

(3)

$$=1-\frac{P[{Y}^{\ast}\{g(X)\}=1]}{P(Y=1)}.$$

(4)

Thus we define *AB _{opt}* as:

$$A{B}_{\mathit{opt}}=1-\frac{P\{{Y}^{\ast}({g}_{\mathit{opt}})=1\}}{P(Y=1)}.$$

(5)

Since *P*(*Y* = 1) denotes the overall probability of failure in the population using whatever is the current clinical practice, then this can be estimated by the simple sample proportion
${\overline{Y}}_{n}={\displaystyle \sum _{i=1}^{n}}{Y}_{i}/n$ without making any additional assumptions. Therefore, we will focus attention on estimating the treatment regime *g _{opt}* and the proportion of events for that regime

We note that *AB _{g}* is a more general type of adjusted attributable risk that is commonly used in epidemiological research. That is, if we were considering estimating the proportion of disease that can be attributed to exposure (

$$\frac{P(Y=1)-P\{{Y}^{\ast}(0)=1\}}{P(Y=1)},$$

which under the assumptions of SUTVA and no unmeasured confounders is equal to

$$\frac{P(Y=1)-{E}_{X}\{P(Y=1\mid T=0,X)\}}{P(Y=1)},$$

(6)

which is what is being estimated using the estimators for adjusted attributable risk given by Benichou (2001) as well as Graubard and Fears (2005).

Returning to the more general case, we now consider how to estimate *P*{*Y**(*g _{opt}*) = 1} from the sample of observed data (

$$P(Y=1\mid T,X)=\mu (T,X,\mathit{\beta}).$$

(7)

Because *Y* is a binary variable, a natural choice is to use logistic regression models which are used to study the impact of treatment, covariates, and their interaction. If we define *d*(*x*, ** β**) =

$$\prod _{i=1}^{n}{\{\mu ({T}_{i},{X}_{i},\mathit{\beta})\}}^{{Y}_{i}}{\{1-\mu ({T}_{i},{X}_{i},\mathit{\beta})\}}^{1-{Y}_{i}}.$$

For many practical situations there may exist a preferred or standard treatment, say treatment 0. As suggested to us by the Editor, for such scenarios, we may only want to give treatment 1 when this treatment is shown to be significantly better than treatment 0. Toward that end we also want to consider the more conservative estimated treatment strategy *ĝ _{opt}*(

Now if we define

$$\begin{array}{l}h(x,\mathit{\beta},c)=\mu (1,x,\mathit{\beta})A(x,\mathit{\beta},c)+\\ \mu (0,x,\mathit{\beta})\{1-A(x,\mathit{\beta},c)\},\end{array}$$

(8)

then in accordance to equations (1) and (2), *P*{*Y**(*g _{opt}*) = 1} =

$${\widehat{P}}_{n}\{{Y}^{\ast}({g}_{\mathit{opt}})=1,c\}={n}^{-1}\sum _{i=1}^{n}h({X}_{i},{\widehat{\mathit{\beta}}}_{n},c).$$

Finally, the estimator for attributable benefit for the optimal treatment strategy would be

$${\widehat{AB}}_{\mathit{opt}}(c)=1-\frac{{n}^{-1}{\displaystyle \sum _{i=1}^{n}}h({X}_{i},{\widehat{\mathit{\beta}}}_{n},c)}{{\overline{Y}}_{n}}.$$

The standard estimator for *AB _{opt}* without using the conservative strategy will also be denoted by
${\widehat{AB}}_{\mathit{opt}}$; that is,
${\widehat{AB}}_{\mathit{opt}}={\widehat{AB}}_{\mathit{opt}}(0)$.

Because the response *Y* is binary, we generally will posit a logistic regression model,

$$\text{logit}\{P({Y}_{i}=1\mid {X}_{i},{T}_{i})\}={\mathit{\beta}}^{T}f({X}_{i},{T}_{i}),$$

(9)

where *f*(*X _{i}*,

When considering the optimal treatment strategy we are entertaining the possibility of assigning different treatments depending on the value of the measured covariates *X*. Statistical models where *μ*(1, *x*, ** β**) <

The issue of qualitative interactions is controversial. It is not unusual in the analysis of a particular study to discover qualitative interactions which are then found to be spurious in subsequent studies. This has caused many warnings by statisticians to be cautious about the interpretation of the results of such models. Formal statistical tests for qualitative interactions have been developed; for example, see Gail and Simon (1985). In fact, Peto (1995) argues that quantitative interactions may be expected in clinical trials but qualitative interactions are not.

Although it may be argued that in well controlled targeted clinical trials qualitative interactions are not to be expected, the setting we are considering in this paper is one in which different treatments are actually being used in practice with no general consensus on which one is the superior treatment. In such settings, it may be the case that different treatments may be better or worse for different subgroups of patients. Moreover, thanks to the recommendation made by the Editor, if there are situations where one treatment is considered to be standard, then we are also employing a conservative treatment strategy where we only recommend the alternative treatment when there is strong statistical evidence that it is significantly better. Such a strategy should alleviate the difficulties that spurious qualitative interactions create to some extent.

There may also be the belief that quantitative interactions need not be considered in a model because the inclusion of such terms will not affect treatment choice. Indeed, leaving out quantitative interactions in a model will generally not affect the optimal treatment regime but, as we will demonstrate later in some of our simulation studies, not including important interaction terms, even if they are quantitative interactions, will lead to biased estimators of the attributable benefit.

Because the estimator for attributable benefit involves the ratio of two estimators, it will be convenient to work with the transformation log(1 − *AB _{opt}*). We use large sample theory to show that
$log\{1-{\widehat{AB}}_{\mathit{opt}}(c)\}=log[{\widehat{P}}_{n}\{{Y}^{\ast}({g}_{\mathit{opt}})=1,c\}]-log({\overline{Y}}_{n})$ is asymptotically normal and derive standard errors via the sandwich variance estimator. Most of this work surrounds finding the influence function for
$log\{1-{\widehat{AB}}_{\mathit{opt}}(c)\}$, the details are available in the appendix. The reader may see Tsiatis (2006) for a complete overview of the theory of influence functions.

In the case of a regression model (9), we estimate the influence function for
$log\{1-{\widehat{AB}}_{\mathit{opt}}(c)\}$ for the *i ^{th}* individual,

$${\widehat{\psi}}_{i}=\frac{{\mathrm{\Delta}}_{1i}+{\mathrm{\Delta}}_{2i}}{{\mathrm{\Delta}}_{3}}-{\mathrm{\Delta}}_{4i},$$

(10)

where

$$\begin{array}{l}{\mathrm{\Delta}}_{1i}=h({X}_{i},{\widehat{\mathit{\beta}}}_{\mathit{n}},c)-{n}^{-1}\sum _{i=1}^{n}h({X}_{i},{\widehat{\mathit{\beta}}}_{\mathit{n}},c),\\ {\mathrm{\Delta}}_{2i}=\frac{\widehat{\partial}E\{h(X,{\mathit{\beta}}_{\mathbf{0}},0)\}}{\partial {\beta}^{T}}{\left({n}^{-1}\sum _{i=1}^{n}\left[\frac{{f}_{i}{f}_{i}^{T}exp({{\widehat{\mathit{\beta}}}_{\mathit{n}}}^{T}{f}_{i})}{{\{1+exp({\widehat{\mathit{\beta}}}_{\mathit{n}}{f}_{i})\}}^{2}}\right]\right)}^{-1}\\ \times {f}_{i}\left\{{Y}_{i}-\frac{exp({{\widehat{\mathit{\beta}}}_{\mathit{n}}}^{T}{f}_{i})}{1+exp({{\widehat{\mathit{\beta}}}_{\mathit{n}}}^{T}{f}_{i})}\right\},\\ {\mathrm{\Delta}}_{3}={n}^{-1}\sum _{i=1}^{n}h({X}_{i},{\widehat{\mathit{\beta}}}_{\mathit{n}},c),\\ {\mathrm{\Delta}}_{4i}=\frac{{Y}_{i}-{n}^{-1}{\displaystyle \sum _{i=1}^{n}}{Y}_{i}}{{n}^{-1}{\displaystyle \sum _{i=1}^{n}}{Y}_{i}}.\end{array}$$

Note that the estimated gradient ∂̂*E*{*h*(*X*, *β*_{0}, 0)}/*β** ^{T}* is derived using numerical derivatives as described in the appendix (A.6).

Now that we have derived an estimated influence function for $log\{1-{\widehat{AB}}_{\mathit{opt}}(c)\}$, the estimated variance, using the sandwich variance estimator, is given by

$$\widehat{V}={n}^{-1}\sum _{i=1}^{n}\left(\frac{{\mathrm{\Delta}}_{1i}+{\mathrm{\Delta}}_{2i}}{{\mathrm{\Delta}}_{3}}-{\mathrm{\Delta}}_{4i}\right)\phantom{\rule{0.16667em}{0ex}}{\left(\frac{{\mathrm{\Delta}}_{1i}+{\mathrm{\Delta}}_{2i}}{{\mathrm{\Delta}}_{3}}-{\mathrm{\Delta}}_{4i}\right)}^{T}.$$

In the appendix, we argue that one of the conditions necessary to prove the asymptotic normality of our estimator is that *E*{*h*(*X*, ** β**, 0)} be differentiable in

For scenarios where there is a standard treatment and the strong null hypothesis is true, then the conservative strategy would with high probability conclude to give everyone the standard treatment. Consequently, the conservative strategy may not be as susceptible to the large sample difficulties under the null hypothesis. This will be investigated further in the simulation studies.

We consider two different methods for constructing (1− *α*) confidence intervals for *AB _{opt}*. The first involves exponentiating the (1 −

$$1-exp\left[log\left\{\sum _{i=1}^{n}h({X}_{i},{\widehat{\mathit{\beta}}}_{\mathit{n}},c)/\sum _{i=1}^{n}{Y}_{i}\right\}\pm {z}_{\alpha /2}\sqrt{\widehat{V}}\right],$$

where *z _{α/}*

$${\widehat{AB}}_{\mathit{opt}}(c)\pm {z}_{\alpha /2}\left\{\sum _{i=1}^{n}h({X}_{i},{\widehat{\mathit{\beta}}}_{\mathit{n}},c)/\sum _{i=1}^{n}{Y}_{i}\right\}\sqrt{\widehat{V}}.$$

It is not immediately clear if one interval is superior to the other, though it has been suggested that the back-transformed intervals are superior. We will explore this through simulation in the next Section.

We report results of several simulations, each based on 10000 Monte Carlo data sets. For simplicity we consider data of the form *O _{i}* = (

$$\text{logit}\{P({T}_{i}=1\mid {X}_{i})\}={\alpha}_{0}+{\alpha}_{1}{X}_{i},$$

(11)

and a Bernoulli disease indicator *Y _{i}* from the logistic regression model:

$$\text{logit}\{P({Y}_{i}=1\mid {T}_{i},{X}_{i})\}={\beta}_{0}+{\beta}_{1}{T}_{i}+{\beta}_{2}{X}_{i}+{\beta}_{3}{T}_{i}{X}_{i}.$$

(12)

Thus different choices of ** γ** = {

Once the data were generated, estimates of (*β*_{0}, *β*_{1}, *β*_{2}, *β*_{3}) were found for each Monte Carlo dataset using SAS logistic procedure. The generated data and logit model estimates were then input into SAS IML where
${\widehat{AB}}_{\mathit{opt}}$, and confidence intervals were calculated for each Monte Carlo dataset.

Table 1 illustrates seven different examples with different combinations of model parameters. The first six are cases where there is treatment difference and hence the asymptotic theory for estimating attributable benefit should hold. In the last example there is no treatment difference and as indicated in our remark 2 the asymptotic theory may break down. Although we did argue that we may not be interested in estimating attributable benefit if there is no evidence of a significant treatment effect, this scenario was included in order to investigate the extent of difficulty that such a scenario presents.

Simulation Results for ÂB_{opt}; BT represent back-transformed confidence intervals while DT represents delta-theorem based confidence intervals

We depict the true ** γ** and

Examining table 1 we see that for the first six scenarios the Monte Carlo average of the estimates for ${\widehat{AB}}_{\mathit{opt}}$ were indeed very close to the true values as was the Monte Carlo standard error and the average of the delta-theorem standard errors. Both sets of confidence intervals provided very good coverage with probabilities around 0.95. In most cases the back-transformed interval had better coverage than its delta-theorem counterpart.

From these examples we see the importance of *AB _{opt}* as a measure of a regime’s benefit when there are even small interactions between treatment and confounders. But in simulation scenario 7 we do see that when there is no treatment difference
${\widehat{AB}}_{\mathit{opt}}$ overestimates the true attributable benefit of zero (as would be expected) and underestimates the coverage probability.

We also considered the conservative estimate
${\widehat{AB}}_{\mathit{opt}}(c)$, with *c* = −1.96; i.e., the strategy where we give the nonstandard treatment only to those individuals where the nonstandard treatment is significantly better at the .025 level of significance. Specifically, we considered the conservative strategy *ĝ _{opt}*(

The results are given in table 2. In cases where *AB _{dom}* =

Simulation Results for ÂB_{cons}; BT represent back-transformed confidence intervals while DT represents delta-theorem based confidence intervals

In addition to the simulations listed in table 1, additional simulations were performed to look at various aspects of model and regime misspecification. Simulations with multiple covariates (see the Web Appendix B) had similar results to the single covariate case in showing the effectiveness of *AB _{opt}* as a measure of public health interest. Model overfitting was studied in the multiple covariate case and confidence intervals from overfit models maintained good coverage for the true

While model overfitting did not seem to be a problem, model underfitting could lead to biased estimators. Table 3 demonstrates four scenarios where the model was misspecified by leaving out an important interaction term. That is, the data were generated as specified in table 1, but the model that was fit was logit{*P*(*Y _{i}* = 1|

Misspecification Simulation Results; BT represent back-transformed confidence intervals while DT represents delta-theorem based confidence intervals

Consider first simulation 2, the proportion of deaths from the dominant treatment was only slightly larger than the optimal treatment regime resulting in similar *AB*’s (i.e., *AB _{dom}* = .577,

Recall our motivating problem from the Duke Databank for Cardiovascular Diseases (DDCD) and that we are focused on the subset of patients with a previous bypass surgery and requiring a later catheterization due to continued symptoms. For technical reasons, the vast majority of patients are limited to some form of medical therapy (MED) or a combination of medical therapy and percutaneous coronary intervention (PCI). The question of clinical interest is whether PCI provides some benefit in lowering 1-year mortality rates above optimal medical therapy for either the overall study population or some subset defined by clinical characteristics.

In studying this question we will explore the effects that covariates such as patient demographics (i.e., age and gender), health status (i.e., body mass index, pulse, patient and family history of coronary artery disease), and cardiovascular measurements (i.e., ejection fraction, number of diseased vessels) play as potential confounders. To see the benefit of using *AB _{opt}* we will compare estimators from models with zero, one, and several confounding variables. The goal of which is to explore the impact of an optimal treatment strategy while simultaneously examining the effects of confounding. The benefit of the optimal treatment regime will be estimated by the proportion of one-year deaths that could have been prevented if that optimal regime had been followed. The study population includes 3856 individuals with a follow-up catheterization between the years 1986–2001 and a history of CABG surgery who did not qualify for a second bypass surgery.

In the entire sample there were 267 deaths within 1 year of the follow-up catheterization, 195 first year deaths for those with medical therapy alone (7.83%) and 72 first-year deaths from patients with a combination of medical therapy and angioplasty (5.27%). We will define treatment in terms of a patient obtaining the poorer naive treatment. That is *T* = {0 if the patient received angioplasty and medical therapy, 1 if the patient received only medical therapy}. In trying to find the optimal treatment regime we used logistic regression models to predict the probability of death at or before 1 year. In this example, we will assume that the optimal treatment regime will come from the “best” model of one year death rates based on treatment and covariates.

For this discussion we will consider many different models, starting with the simplest model: first year deaths modeled as a function of treatment with no confounding variables. Table 4 has several different logistic regression outputs, starting with treatment as the sole predictor in model 1. With the complete data we have an estimate of the probability of first year death (*Y* = 1) = 0.0692. Under this model the treatment regime would be *g*_{0}, because without interactions the model favors a combination of angioplasty and medical therapy. From the data we calculate an estimate of
${\widehat{AB}}_{\mathit{opt}}=0.2393$ with confidence intervals {0.0787, 0.3720} (back-transformed method) and {0.0936, 0.3851} (delta method).

To begin to see the effect of confounding, model 2 introduces a statistically significant covariate (*p* < 0.05). This model again favors *g*_{0} as the regime of choice and we find
${\widehat{AB}}_{\mathit{opt}}=0.2042$ with confidence intervals {0.0377, 0.3418} (back-transformed method) and {0.0530, 0.3553} (delta method). We see an 3.5% drop in risk reduction with the addition of a single confounder, and note the agreement between the *AB _{opt}* confidence intervals and logistic model p-value for treatment effect. For model 3, backward selection regression techniques were used to include main effects into the model. Individual tests for interactions between treatment and main effects were performed to help arrive at the “best” model for this data. Ejection fraction values were missing for 561 patients, 420 of which had angioplasty and medical therapy as their treatment. It was determined that a missing ejection fraction value was informative to treatment decision and as such was itself a potential confounding variable. In order to keep all observations in the analysis an indicator variable was defined for presence of ejection fraction value, then an interaction term was used in the model to determine the effect of an ejection fraction measurement. Note that if a patient did not have an ejection fraction measurement then the indicators value is zero. This is an important aspect to the results because this model now has different confounders working in both directions; that is some confounders result in a higher probability of dying within the first year of treatment, while others lower that probability.

Model 3 has an estimate of
${\widehat{AB}}_{\mathit{opt}}=0.2157$ with confidence intervals {0.0356, 0.3621} (back-transformed method) and {0.0536, 0.3777} (delta method). Therefore the “best” model for this data has *AB _{opt}* estimates very close to the naive model, but there would have been no way to know apriori that the confounders would have worked in opposing directions. Model 3 again favors the optimal treatment regime where every patient is treated with a combination of angioplasty and medical therapy. To examine the effect of extraneous variables in the model we consider model 4 which has a non-significant interaction term added (interacting treatment with congestive heart failure status). Model 4 had an estimate of
${\widehat{AB}}_{\mathit{opt}}=0.1950$ with confidence intervals {.0066, 0.3476} (back-transformed method) and {0.0257, 0.3642} (delta method). So we see that adding an insignificant variable to the model lowered
${\widehat{AB}}_{\mathit{opt}}$ by roughly 2% while increasing the delta theorem standard error. Therefore, we conclude from this analysis that all patients should be treated with a combination of angioplasty and medical therapy. We have restricted our discussion to calculating estimates of

In this paper we have introduced the notion of attributable benefit associated with a treatment strategy. This measure can be of public health interest and represents the fraction of events that could have been prevented by using the optimal treatment regime. While similar in spirit to adjusted attributable risk, the measure is designed to study the positive public health impact of various treatment strategies. This estimator is most useful in cases where there are qualitative interactions between treatment and patient covariates. In studying the optimal strategy two alternate strategies were also discussed; first a strategy involving giving everyone the dominant treatment, the second involving a conservative strategy where one assigns a standard treatment unless a competing treatment is shown to be significantly better in lowering the risk of an adverse outcome.

In this paper we have focused on deterministic treatment regimes *g* which mapped the values of *x* to either 0 or 1. We could have also considered random treatment regimes where *g*(*x*) = *p* for 0 ≤ *p* ≤1 to denote the strategy where patients are randomized to treatment 1 with probability *p* and to treatment 0 with probability (1−*p*) whenever *X* = *x*. In fact, under the assumptions of this paper, the observed data can be viewed as resulting from a random treatment regime *g _{curr}*(

$$\begin{array}{l}{g}^{\ast}(x)=A(x,{\widehat{\beta}}_{n},c)+\{1-B(x,{\widehat{\beta}}_{n},c)\}+\\ C(x,{\widehat{\beta}}_{n},c)P(T=1\mid X=x),\end{array}$$

where *A*(*x, _{n}, c*) =

$$\begin{array}{l}{n}^{-1}\sum _{i=1}^{n}[A({X}_{i},{\widehat{\beta}}_{n},c)\mu (1,{X}_{i},{\widehat{\beta}}_{n})+\\ \{1-B({X}_{i},{\widehat{\beta}}_{n},c)\}\mu (0,{X}_{i},{\widehat{\beta}}_{n})+C({X}_{i},{\widehat{\beta}}_{n},c){Y}_{i}]\end{array}$$

without having to posit a model for *p*(*T* = 1|*X*). We did not formally evaluate this treatment strategy but we expect it will have properties similar to the conservative strategy discussed in the paper.

Two types of confidence intervals were obtained; the back-transformed interval and the delta method interval. Both had good coverage in the simulation studies we displayed, although the back-transformed generally has more accurate coverage probabilities especially in simulation studies we conducted with smaller sample sizes (not shown here).

With regards to the role of quantitative versus qualitative interactions of treatment and patient covariates and the concern of over-interpreting these interactions, we found that leaving out important interactions, whether qualitative or quantitative, can lead to biased estimators for the attributable benefit of the optimal treatment strategy *AB _{opt}* and that models that were overfit generally led to better estimators for

There is also the concern about sample size, especially because the sample size necessary to detect important interactions with sufficient statistical power may be prohibitively large. Even though we do not have much control over the sample size of observational databases that are used for this type of research, we believe estimating the optimal treatment strategy, especially using a conservative strategy, is worthwhile. The utility and the accuracy of such a treatment strategy can then be evaluated by the estimator for the attributable benefit and the corresponding confidence interval.

We believe that estimating the attributable benefit of a treatment regime from observational data can have positive public health impact, but only in those situations where it is appropriate to do so. As we mentioned earlier, the key assumption that allows us to derive estimators for causal parameters is that of no unmeasured confounders. This, unfortunately, is also the most difficult to verify in practice and hence some have argued against using medical databases to make decisions on treatment; see, for example some excellent discussions on this point by Green and Byar (1984) and Yusuf et al. (1986). But we believe that these databases can offer valuable insight if the assumption of no unmeasured confounders was tenable. In order for that this assumption be plausible we must have some confidence that all important information that may affect the treatment decision process be captured in the database. For that reason, it is important during the creation of such a database that discussions with investigators be carried out to identify all the factors that they believe would influence their treatment decisions and all possible effort be made to capture such information. The Duke Databank for Cardiovascular Diseases (DDCD) used in our analysis has been a rich source of data for cardiovascular research in evaluating treatments for many years and has been refined to include what are believed to be the most important factors used in treatment decision making. Even so, there may be utility in conducting sensitivity analyses that look at the effect that violations of the no unmeasured confounders assumption would have on treatment effect. Such methods have been proposed, for example, by Rosenbaum and Rubin (1983) and Brumback *et al.* (2004). A sensitivity analysis in the spirit of Brumback *et al.* (2004) is proposed in the Web Appendix A.

One potential problem with this estimator, and all model based adjusted attributable risk estimators, is in model misspecification. This entire methodology is predicated on having the correct statistical model. In future work we will consider using doubly robust inverse propensity score estimators for the *AB* that will allow greater flexibility in finding unbiased estimators. For those interested in the current estimator, a SAS macro has been developed to calculate estimates along with back-transform and delta-theorem confidence intervals. Coding and details of the macro can be found in Web Appendix C.

This work was supported by NIH grants R37 AI031789 and R01 CA051962. The authors are indebted to the editor for the suggestion to consider the conservative estimator for the optimal treatment regime which is now an integral part of this paper.

In this Section we will demonstrate that
${\widehat{AB}}_{\mathit{opt}}(c)$ is asymptotically normal and derive an estimator for its asymptotic variance by finding its influence function. Specifically, if we denote the observed data by the i.i.d. random vectors ** O_{1}**, …,

$${n}^{1/2}({\widehat{\mathit{\alpha}}}_{\mathit{n}}-{\mathit{\alpha}}_{0})={n}^{-1/2}\sum _{i=1}^{n}{\psi}_{{\widehat{\mathit{\alpha}}}_{\mathit{n}}}({\mathit{O}}_{\mathit{i}})+{o}_{p}(1),$$

(A.1)

where *ψ*_{n}(** O_{i}**) is a mean zero random vector with finite variance matrix and

$$\widehat{E}(\psi {\psi}^{T})={n}^{-1}\sum _{i=1}^{n}{\psi}_{{\widehat{\mathit{\alpha}}}_{\mathit{n}}}({\mathit{O}}_{\mathit{i}}){\psi}_{{\widehat{\mathit{\alpha}}}_{\mathit{n}}}{({\mathit{O}}_{\mathit{i}})}^{T}.$$

Here we will determine the influence function for estimators of several parameters and show how the estimated influence functions will be used to derive the sandwich estimator for the asymptotic variance of our attributable benefit estimator.

In order to find the estimated influence function for
${\widehat{AB}}_{\mathit{opt}}(c)$, it suffices to find the estimated influence function of log[* _{n}*{

If we denote by *β*_{0} the true value of ** β** from equation (7) then we note that

$${\widehat{P}}_{n}\{{Y}^{\ast}({g}_{\mathit{opt}})=1,0\}={n}^{-1}\sum _{i=1}^{n}h({X}_{i},{\mathit{\beta}}_{0},0),$$

(A.2)

where *h*(*X _{i}*,

$$E\{h({X}_{i},{\mathit{\beta}}_{0},0)\}=P\{{Y}^{\ast}({g}_{\mathit{opt}})=1\}.$$

(A.3)

By definition, *h*(*X _{i}*,

$$\begin{array}{l}{n}^{1/2}\left[{\widehat{P}}_{n}\{{Y}^{\ast}({g}_{\mathit{opt}})=1,c\}-P\{{Y}^{\ast}({g}_{\mathit{opt}})=1\}\right]=\\ {n}^{-1/2}\sum _{i=1}^{n}[h({X}_{i},{\mathit{\beta}}_{0},0)-E\{h({X}_{i},{\mathit{\beta}}_{0},0)\}]\end{array}$$

(A.4)

$$+{n}^{-1/2}\sum _{i=1}^{n}\{h({X}_{i},{\widehat{\mathit{\beta}}}_{\mathit{n}},c)-h({X}_{i},{\mathit{\beta}}_{\mathbf{0}},0)\}.$$

(A.5)

Note that (A.4) is a sum of mean zero i.i.d. random variables Using empirical process theory as described by van der Vaart (1998), if we can show that the functions *h*(*x*, *β*, 0) in a neighborhood of *β*_{0} form a Donsker class, then we can use Lemma 19.24 of van der Vaart to show that (A.5) can be written as

$$\frac{\partial E\{h(X,{\beta}_{0},0)\}}{\partial {\beta}^{T}}{n}^{1/2}({\widehat{\beta}}_{n}-{\beta}_{0})+{o}_{p}(1),$$

in which case, the asymptotic results will go through as long as *E*{*h*(*X, β*, 0)} is differentiable in *β* even if *h*(*X, β,* 0) is not. For the models we are considering the functions will generally satisfy a Lipschitz condition; namely, that |*h*(*x, β*_{1,} 0) − *h*(*x, β*_{2,} 0)|≤ *m*(*x*)||*β*_{1} − *β*_{2}|| for *β*_{1} and *β*_{2} in a neighborhood of *β*_{0}, where *m*(*X*) is a square integrable function, in which case, Example 19.7 of van der Vaart (1998) can be used to show that the functions form a Donsker class. There are some cases however when even *E*{*h*(*X*, ** β**, 0)} is not differentiable, in which case, the asymptotic results may not hold and we may want to avoid such situations (see remark 2).

Standard results for the maximum likelihood estimator can be used to show that

$${n}^{1/2}({\widehat{\mathit{\beta}}}_{\mathit{n}}-{\mathit{\beta}}_{\mathbf{0}})={n}^{-1/2}\sum _{i=1}^{n}{\psi}_{{\widehat{\mathit{\beta}}}_{\mathit{n}}}({\mathit{O}}_{\mathit{i}},{\beta}_{0})+{o}_{p}(1),$$

where

$$\begin{array}{l}{\psi}_{{\widehat{\mathit{\beta}}}_{\mathit{n}}}({\mathit{O}}_{\mathit{i}})={\left[E\left\{\frac{(\partial {\mu}_{i}(\mathit{\beta})/\partial \mathit{\beta})(\partial {\mu}_{i}(\mathit{\beta})/\partial {\mathit{\beta}}^{T})}{{\mu}_{i}(\mathit{\beta})\{1-{\mu}_{i}(\mathit{\beta})\}}\right\}\right]}^{-1}\\ \left[\frac{(\partial {\mu}_{i}(\mathit{\beta})/\partial \mathit{\beta})}{{\mu}_{i}(\mathit{\beta})\{1-{\mu}_{i}(\mathit{\beta})\}}\right]\{{Y}_{i}-{\mu}_{i}(\mathit{\beta})\},\end{array}$$

and *μ _{i}*(

$$\begin{array}{l}{\psi}_{{\widehat{\mathit{\beta}}}_{\mathit{n}}}({Y}_{i},{T}_{i},{X}_{i},\mathit{\beta})={\left(E\left[\frac{{f}_{i}{f}_{i}^{T}exp({\mathit{\beta}}^{T}{f}_{i})}{{\{1+exp({\mathit{\beta}}^{T}{f}_{i})\}}^{2}}\right]\right)}^{-1}\\ \times {f}_{i}\left\{{Y}_{i}-\frac{exp({\mathit{\beta}}^{T}{f}_{i})}{1+exp({\mathit{\beta}}^{T}{f}_{i})}\right\},\end{array}$$

where *f _{i}* =

$$\begin{array}{l}{\widehat{\psi}}_{{\widehat{\mathit{\beta}}}_{\mathit{n}},i}={\left({n}^{-1}\sum _{i=1}^{n}\left[\frac{{f}_{i}{f}_{i}^{T}exp({{\widehat{\mathit{\beta}}}_{\mathit{n}}}^{T}{f}_{i})}{{\{1+exp({{\widehat{\mathit{\beta}}}_{\mathit{n}}}^{T}{f}_{i})\}}^{2}}\right]\right)}^{-1}\\ \times {f}_{i}\left\{{Y}_{i}-\frac{exp({{\widehat{\mathit{\beta}}}_{\mathit{n}}}^{T}{f}_{i})}{1+exp({{\widehat{\mathit{\beta}}}_{\mathit{n}}}^{T}{f}_{i})}\right\}.\end{array}$$

Combining these results, we have

$$\begin{array}{l}{n}^{1/2}\left[{\widehat{P}}_{n}\{{Y}^{\ast}({g}_{\mathit{opt}})=1,c\}-P\{{Y}^{\ast}({g}_{\mathit{opt}})=1\}\right]\\ ={n}^{-1/2}\sum _{i=1}^{n}(h({X}_{i},{\mathit{\beta}}_{0},0)-E\{h({X}_{i},{\mathit{\beta}}_{0},0)\}\\ +[\partial E\{h(X,{\mathit{\beta}}_{0},0)\}/\partial {\mathit{\beta}}^{T}]{\psi}_{{\widehat{\mathit{\beta}}}_{\mathit{n}}}({\mathit{O}}_{\mathit{i}},{\mathit{\beta}}_{0}))+{o}_{p}(1).\end{array}$$

Let us denote the *i ^{th}* term in this summation; i.e., the influence function of

$$\begin{array}{l}{\widehat{q}}_{i}=\left\{h({X}_{i},{\widehat{\mathit{\beta}}}_{\mathit{n}},c)-{n}^{-1}\sum _{j=1}^{n}h({X}_{j},{\widehat{\mathit{\beta}}}_{\mathit{n}},c)\right\}\\ +[\widehat{\partial}E\{h(X,{\mathit{\beta}}_{0},0)\}/\partial {\mathit{\beta}}^{T}]{\widehat{\psi}}_{{\widehat{\mathit{\beta}}}_{\mathit{n},\mathit{i}}},\end{array}$$

where the gradient *E*{*h*(*X*, *β*_{0}, 0)}/*β** ^{T}* is estimated using numerical derivatives; namely, where the

$$\frac{{n}^{-1}{\sum}_{i=1}^{n}\left\{h({X}_{i},\widehat{{\mathit{\beta}}_{\mathit{n}}}+{\epsilon}_{j}{1}_{j},c)-h({X}_{i},\widehat{{\mathit{\beta}}_{\mathit{n}}}-{\epsilon}_{j}{1}_{j},c)\right\}}{2{\epsilon}_{j}},$$

(A.6)

*j* = 1, …, *k*, where *k* is the dimension of the parameter *β*, and 1* _{j}* is a

Now that we have derived the influence function for * _{n}*{

$$\begin{array}{l}{\psi}_{i}=\frac{h({X}_{i},{\mathit{\beta}}_{0},0)-E\{h({X}_{i},{\mathit{\beta}}_{0},0)\}}{P\{{Y}^{\ast}({g}_{\mathit{opt}})=1\}}+\\ \frac{[\partial E\{h(X,{\mathit{\beta}}_{0},0)\}/\partial {\mathit{\beta}}^{T}]{\psi}_{{\widehat{\mathit{\beta}}}_{\mathit{n}}}({\mathit{O}}_{\mathit{i}},{\mathit{\beta}}_{\mathbf{0}})}{P\{{Y}^{\ast}({g}_{\mathit{opt}})=1\}}-\frac{{Y}_{i}-E(Y)}{E(Y)},\end{array}$$

and the estimated influence function * _{i}* is given by (10).

Supplementary Materials: Web Appendices and Tables referenced in Sections 5 and 7 are available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org.

- Al-Khatib SM, Shaw LK, O’Connor C, Kong M, Califf RM. Incidence and Predictors of Sudden Cardiac Death in Patients with Diastolic Heart Failure. Journal of Cardiovascular Electrophysiology. 2007;18:1231–1235. [PubMed]
- Brumback BA, Hernan MA, Sebastien JP, Robins JM. Sensitivity Analyses for Unmeasured Confounding Assuming a Marginal Structural Model for Repeated Measures. Statistics in Medicine. 2004;23:749–767. [PubMed]
- Benichou J. A review of adjusted estimators of attributable risk. Statistical Methods in Medical Research. 2001;10:195–216. [PubMed]
- Gail M, Simon R. Testing for Qualitative Interactions Between Treatment Effects and Patient Subsets. Biometrics. 1985;41:361–372. [PubMed]
- Graubard BI, Fears TR. Standard Errors for Attributable Risk for Simple and Complex Sample Designs. Biometrics. 2005;61:847–855. [PubMed]
- Green SB, Byar DP. Using observation data from registries to compare treatments: the fallacy of omnimetrics. Statistics in Medicine. 1984;3:361–370. [PubMed]
- Jewell NP. Statistics for Epidemiology. London, U.K: Chapman & Hall; 2004.
- Levin ML. The occurrence of lung cancer in man. Acta Unio Internationalis Contra Cancrum. 1953;9:531–541. [PubMed]
- Neyman J. Sur les applications de la thar des probabilities aux experiences Agaricales: Essay de principle. English translation of excerpts by Dabrowska, D. and Speed, T. (1990) Statistical Science. 1923;5:465–480.
- Peto R. Clinical Trials. In: Price P, Sikoa K, editors. Treatment of Cancer. London, U.K.: Chapman & Hall; 1995. pp. 1039–1043.
- Rosenbaum PR, Rubin DB. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society, Ser B. 1983;45:212–218.
- Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology. 1974;66:688–701.
- Rubin DB. Statistics and Causal Inference: Comment: Which If’s Have Causal Answers. Journal of the American Statistical Association. 1986;81:961–962.
- Tsiatis AA. Semiparametric Theory and Missing Data. New York: Springer; 2006.
- Yusuf Y, Wittes J, Bailey K, Furberg C. Digitalis–a new controversy regarding an old drug. The pitfalls of inappropriate methods. Circulation. 1986;73:14–18. [PubMed]
- van der Vaart AW. Asymptotic Statistics. New York: Cambridge; 1998.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |