|Home | About | Journals | Submit | Contact Us | Français|
Acute lung injury (ALI) is a condition characterized by acute onset of severe hypoxemia and bilateral pulmonary infiltrates. ALI patients typically require mechanical ventilation in an intensive care unit. Low tidal volume ventilation (LTVV), a time-varying dynamic treatment regime, has been recommended as an effective ventilation strategy. This recommendation was based on the results of the ARMA study, a randomized clinical trial designed to compare low vs. high tidal volume strategies (The Acute Respiratory Distress Syndrome Network, 2000) . After publication of the trial, some critics focused on the high non-adherence rates in the LTVV arm suggesting that non-adherence occurred because treating physicians felt that deviating from the prescribed regime would improve patient outcomes. In this paper, we seek to address this controversy by estimating the survival distribution in the counterfactual setting where all patients assigned to LTVV followed the regime. Inference is based on a fully Bayesian implementation of Robins’ (1986) G-computation formula. In addition to re-analyzing data from the ARMA trial, we also apply our methodology to data from a subsequent trial (ALVEOLI), which implemented the LTVV regime in both of its study arms and also suffered from non-adherence.
Acute lung injury (ALI) is a condition characterized by acute onset of severe hypoxemia and bilateral pulmonary infiltrates. Acute respiratory distress syndrome (ARDS) is a sub-set of ALI with more severe hypoxemia. The 1994 American-European consensus criteria for ALI are (1) acute onset, (2) bilateral infiltrates on chest radiography, (3) pulmonary-artery wedge pressure less than or equal to 18mm Hg when measured or the absence of clinical evidence of left atrial hypertension, and (4) ratio of the partial pressure of arterial oxygen (PaO2) to the fraction of inspired oxygen (FiO2) less than or equal to 300. A patient with ALI is considered to have ARDS if the PaO2 : FiO2 ratio is less than or equal to 200 (Bernard et al., 1994).
ALI can be caused by direct injury to the lung as in pneumonia and aspiration of gastric contents, or indirect lung injury as in sepsis from a source other than the lung and pancreatitis. The majority of deaths among ALI patients are due to sepsis or multi-organ dysfunction rather than primary respiratory causes (Ware and Matthay, 2000). A recent large prospective study of the incidence and mortality of ALI in the United States (Rubenfeld et al., 2005) reported an incidence of 86 per 100,000 person-years with an in-hospital mortality rate of 38.5 percent.
ALI patients frequently require mechanical ventilation in an intensive care unit (ICU). There are many different ways to provide mechanical ventilation support for ALI patients. In this paper, we focus on the assist-control (volume-control) mode of ventilation, during the most acute phase of respiratory failure. This mode is often used immediately after initiation of mechanical ventilation. In this mode, the physician selects a minimum respiratory rate and tidal volume. If a patient's respiratory rate falls below the prescribed rate, the ventilator will “control” the breathing by raising pressure in the airway to deliver breath at the prescribed tidal volume and at the prescribed minimum respiratory rate. If a patient's respiratory rate exceeds the prescribed minimum set rate, the ventilator assists the patient's inspiratory efforts by raising pressure in the airway until the prescribed tidal volume is achieved. The best ventilation strategy for ALI patients has been controversial. The resting tidal volume of healthy people is about 6 to 7 ml per kilogram of predicted body weight (PBW) (Tobin et al., 1983). The predicted body weight is calculated from sex and height, which are good predictors of lung volume. Traditionally, a generous tidal volume of 10-15 ml/kg PBW was recommended for ALI patients because they were useful for achieving clinical goals of gas exchange. However, in ALI patients, the volume of aerated lung available for ventilation is substantially reduced from the disease process. Consequently, generous tidal volumes may cause injury to the aerated lung from over-distention, causing further lung injury, known as barotrauma or volutrauma (Webb and Tierney, 1974; Dreyfuss and Saumon, 1998). This issue has led to several clinical trials of lung-protective methods of mechanical ventilation with low tidal volumes for patients with ALI (Fan et al., 2005).
The National Heart, Blood and Lung Institute ARDS Clinical Network conducted a randomized trial comparing mechanical ventilation using traditional tidal volumes versus low tidal volume ventilation (The Acute Respiratory Distress Syndrome Network, 2000). This study was dubbed the ARMA trial. A total of 429 and 432 patients were randomized to traditional and low tidal volume ventilation, respectively. The assist-control ventilation mode was used until a patient was ready to be weaned from the ventilator. Figure 1 shows a flowchart of traditional and low tidal volume ventilation strategies. In the group treated with traditional tidal volumes, the initial tidal volume was set to 12 ml/kg PBW. This tidal volume was then adjusted downward if necessary to keep the airway pressure measured after a 0.5-second pause at the end of inspiration (plateau pressure) less than or equal to 50 cm H2O. If the plateau pressure subsequently decreased below 45 cm H2O, then tidal volumes were increased so that plateau pressure was greater than or equal to 45 cm H2O or tidal volume was 12 ml/kg PBW. In the group treated with low tidal volumes, the tidal volume was initially set to 6 ml/kg PBW and then decreased to a minimum of 4 ml/kg PBW if necessary to maintain the plateau pressure less than or equal to 30 cm H2O. If the plateau pressure subsequently fell below 25 cm H2O, then tidal volume was increased so that plateau pressure was greater than or equal to 25 cm H2O or tidal volume was 6 ml/kg PBW. Tidal volumes greater than 6 ml/kg PBW were allowed if patients experienced severe dyspnea, provided the plateau pressure remained below 30 cm H2O. In both arms, plateau pressures were allowed to go above the threshold levels (50 and 30 cm H2O for the traditional and low tidal volume arms, respectively) if the tidal volume was 4 ml/kg PBW or arterial pH was less than 7.15. In both study groups, most other aspects of the mechanical ventilation strategies, including positive end-expiratory pressure (PEEP), FiO2, respiratory rate set on the ventilator, and inspiratory:expiratory ratio (I:E) were controlled according to explicit protocol rules.
The study demonstrated lower mortality, more ventilator-free days, and more organ failure-free days in ALI patients who received mechanical ventilation with the LTVV versus traditional strategy. The ARDS Network concluded that the LTVV strategy was better than the traditional strategy and recommended that the LTVV strategy be used for clinical management of ALI.
The ARMA trial, its results, and the ARDS Network recommendation were considered controversial by some clinicians and researchers. Some argued that the recommendation of a tidal volume goal of 6 ml/kg PBW is not optimal because the traditional tidal volume strategy used in the trial did not accurately represent usual care approaches that clinicians were using when the trial was conducted (Eichacker et al., 2002; Tobin, 2000). In the trial, the most common tidal volumes prescribed by clinicians before their patients were enrolled were in the range of 9 - 11ml/kg PBW (Thompson et al., 2001). Another perspective that is consistent with this concern was expressed in the editorial that accompanied the ARMA trial publication (Tobin, 2000). The editorial argued that in 5 randomized trials of ALI mechanical ventilation strategies (including the ARMA trial), clinical outcomes were better in the study groups that received low tidal volumes only when the mean plateau pressures in the traditional study groups exceeded 32 cm H2O. This was interpreted by some to suggest that it was not necessary to reduce tidal volumes to 6 ml/kg PBW in patients whose plateau pressures were below 32 cm H2O while they were receiving tidal volumes that exceeded 6 ml/kg (Tobin, 2004). In another review of these 5 clinical trials, the authors suggested that clinical outcomes would be optimized in most ALI patients with intermediate tidal volumes between 6 and 12 ml/kg and plateau pressures of 28-32 cm H2O (Eichacker et al., 2002).
In the ARMA trial, adherence to the LTVV protocol rules was not perfect. Since only the selected tidal volume and resulting plateau pressure can be recorded, not the selection procedure, a set of specifications has been carefully developed to determine whether the LTVV protocol was followed, using the values of tidal volume, plateau pressure, arterial pH, set respiratory rate, etc (see Table 1). Ventilation on a given day is considered adherent to LTVV if the settings are adherent to both the plateau pressure specification and the tidal volume specification. Plateau pressure adherence follows if lines 1 or 2 (Table 1) are satisfied; tidal volume adherence follows if any of lines 3-7 are satisfied. In the ARMA trial, clinicians sometimes deviated from these rules. Figure 2 (a) displays the observed plateau pressure/tidal volume combinations in the LTVV arm of the ARMA trial. Dots and crosses denote adherence and non-adherence with the LTVV protocol, respectively. Points falling inside region (1) are adherent to LTVV. Points falling inside region (2) are not adherent. Points falling inside regions (3), (4) and (5) must satisfy additional restrictions of arterial pH and set respiratory rate to be adherent to LTVV. Overall, 28% of the observed ventilation days deviated from the LTVV protocol. It is important to note that the majority of the non-adherent points within the regions (3), (4) and (5) of Figure 2 were due to violations of the pH restriction, potentially due to clinicians’ concern regarding immediate adverse effect of low pH. According to the LTVV protocol, each non-adherent (cross) point could have been adherent (dot) by adjustment of ventilator settings. Consider a non-adherent (cross) point in region (4), which does not satisfy the LTVV regime because pH greater than or equal to 7.2. By lowering tidal volume (with a consequential decrease in plateau pressure), pH will decrease, yielding adherence in the same region or movement to an alternative region (due to change in tidal volume and plateau pressure).
Some have argued that physicians deviated from the LTVV protocol when they thought specific aspects of the protocol were detrimental to their patients. For example, when patients appeared to be very dyspneic (short of breath) while receiving 6 ml/kg PBW, some physicians raised tidal volume above 6 ml/kg PBW in an attempt to alleviate the dyspnea. By making these deviations, clinical outcomes of some patients could have improved (Deans et al., 2005, 2006). Thus, there is continued controversy regarding the ARDS Network recommendation to use LTVV. To address this controversy, we seek to draw inference about the distribution of survival that would have resulted had all patients in the LTVV arm of the ARDSNet study strictly adhered to the LTVV protocol. We will compare this distribution to the observed LTVV survival curve that reflects imperfect adherence with the LTVV protocol. We will also evaluate the effect of a simpler regime, which is easier for clinicians to implement.
After the ARMA trial, the LTVV protocol was recommended by the ARDS Network and was used in subsequent trials conducted by ARDS Network. For example, ALVEOLI was a subsequent randomized trial to test the hypothesis that higher end-expiratory airway pressure/lower FiO2 ventilation strategy would reduce mortality in patients with ALI when compared to a lower end-expiratory airway pressure/lower FiO2 ventilation strategy (The National Heart, Blood and Lung Institute ARDS Clinical Trials Network, 2004) . This ALVEOLI trial was stopped due to lack of efficacy after a total of 549 patients were enrolled. In the ALVEOLI trial, the LTVV regime was used in both study arms and non-adherence was observed. Overall, 22% of the observed ventilation days were non-adherent with the LTVV protocol. As with the re-analysis of the ARMA trial, in this paper, we will estimate the distributions of survival under strict adherence to the complex LTVV protocols and under a simpler regime. We will then compare these results to the observed survival curve. Consistency of results across the ARMA and ALVEOLI studies will provide greater support for our conclusions.
There are two main challenges in the statistical analysis.
LTVV is a time-dependent dynamic treatment regime. Dynamic treatment regimes are individually tailored treatments (Murphy et al., 2001; Moodie et al., 2007). In a dynamic treatment regime, the treatment is provided to patients only when it is needed and the level of treatment is adjusted based on time-dependent measurements of the patient's need for treatment. About 40% of ALI patients die in ICU generally while receiving mechanical ventilation. Recovering patients who remain on mechanical ventilation are evaluated daily by a weaning procedure to assess for discontinuation of ventilation. According to his/her performance in the weaning procedure, the patient may begin unassisted breathing, change to other ventilation modes, or remain on assist-control mode. Only if the patient is alive and on assist-control mode, is he/she eligible or ‘at risk’ for LTVV. In contrast, in a non-dynamic (static) treatment regime the treatment level and type are specified before treatment is started. That is, the treatment regime is not changed with a patient's response, for example, administering a fixed drug dose for a fixed duration to a patient with a specific disease.
In observational studies of time-dependent treatments (dynamic or non-dynamic), there often exist time-dependent covariates which are influenced by past treatments as well as predict future treatments and the outcome. Such covariates are called time-dependent confounders. They are sometimes called endogenous or internal variables in the survival analysis literature (Kalbfleisch and Prentice, 2002). For example, a patient's organ failure status may be worse for patients who have not received LTVV. Moreover, those with worse organ failure also may be less likely to receive future LTVV (due to patients being perceived as “too sick” for LTVV by their doctors) and are more likely to die in-hospital. In this situation, the challenge is to isolate the effect of LTVV from the confounding effect of organ failure and other time-dependent confounders. An analysis which estimates the survival curve among those who are observed to adhere to LTVV will likely over-estimate the counterfactual survival curve of interest. If LTVV is an effective regime, the counterfactual survival curve will likely be higher than the observed (i.e, unconditional) survival curve. We illustrate this phenomenon in Section 4.1.
There is a significant and growing body of literature focused ondrawing inference about dynamic treatment regimes from observational data with time-dependent confounders. In this paper, we will utilize the G-computation algorithm, as developed by (Robins, 1986, 1987b,a, 1988b,a) and described nicely in Lok et al. (2004), to estimate the counterfactual survival curve of interest. Alternative approaches that could be used include marginal structural (Murphy et al., 2001; Robins et al., 2000; Bembom and van der Laan, 2008; Robins et al., 2008; Moodie, 2009b; Hernan et al., 2006; Robins, 1999a,b) and structual nested (Robins, 1997, 1999b; Lok et al., 2004) modelling. Recently, the literature has been focused on identifying optimal dynamic regimes (Robins et al., 2008; Moodie, 2009a; Rich et al., 2010; Moodie and Richardson, 2010; Murphy, 2003; Moodie et al., 2007; Chakraborty et al., 2010; Zhao et al., 2009; Robins, 2004; Orellana et al., 2010). Although a very interesting direction of research, we will not seek to find the best ventilation strategy in this paper.
In the ARMA and ALVEOLI studies, reference measurements of ventilator parameters and other variables were taken on days 1,2,3,4,7,14,21,28. The ventilator parameters on the other days were not recorded. In addition to the reference measurements, the care providers recorded ventilator parameters, arterial pH and plateau pressure daily at randomly selected times to assess the accuracy of the ventilator settings relative to the protocol requirements. Because these random ventilator check data were scheduled to be collected daily during the entire study, we used them for our data analysis. However, missing data is a major problem for the random ventilator check data (see Table 2) and for many observational studies. We will address it by specifying a fully parametric model for the distribution of the data that are scheduled to be collected and assuming that the data that are not recorded are missing at random. In specifying our model, we must recognize that some of the ventilation parameters have biologically bounded support.
In Section 2, we introduce the data structure and notation including the definition of potential survival time under a dynamic treatment regime. In Section 3, we introduce assumptions necessary for identification of the distribution of the potential survival time. Section 4 introduces the G-computation method for estimating the potential survival curve of a dynamic treatment regime and our fully Bayesian implementation. In Section 5, we apply the methods in the previous sections to our data. Section 6 is devoted to a discussion.
The notation used in this article is listed in Table 3. With this notation, the scheduled observed data on day t (t = 1, . . ., 28) are
where St is the indicator of surviving day t; HSPt is the indicator of hospitalization on day t, defined only when St = 1; ICUt is the indicator of being in the ICU on day t, defined only when St = HSPt = 1; VNTt is the indicator of being on the assist-control mode of the ventilator on day t, defined only when St = HSPt = ICUt = 1; Rt = I(HSPt = ICUt = VNTt = 1) is the indicator of being at-risk on day t, defined only when St = 1; Gt = (PEt, LCt) includes PEEP and lung compliance on day t, defined only when St = Rt = 1; Dt = (VTt, SRt) includes tidal volume and set rate on day t, defined only when St = Rt = 1; Wt = (PHt, PPt, DSt, SOt) includes pH, plateau pressure, indicator of severe dyspnea, and oxygen saturation on day t, defined only when St = Rt = 1; and OFt is an indicator of organ failure on day t, defined when St = HSPt = 1. When St = Rt = 1, the variables Gt, Dt and Wt are recorded simultaneously at a random time during the day. From a causal inference perspective, we will consider the variables that make up Lt to be sequentially measured in time; for example, when St = Rt = 1, Gt is measured before Dt which is measured before Wt and OFt. This implies that when St = Rt = 1, Dt can affect Wt and OFt, but not Gt.
In our study, L0 = (AP, OF0), where AP denotes the baseline APACHE III score. Let t = (L0, ..., Lt). In general, we use a bar over a variable to denote that variable and all the past values of that variable. The scheduled observed data are 28. Our identification assumptions in the next section will be based on a subset of the observed data. Towards this end, it is useful to define H0 = AP, U0 = OF0, Ht = (HSPt, ICUt, VNTt, Gt) and Ut = (Dt, Wt, OFt), for t = 1, . . ., 28. With this notation, note that Lt = (St, Ht, Ut).
When an individual is on the assist-control mode of ventilation, the tidal volume and set rate are directly manipulable by the treating physician. If the treating physician had followed the LTVV protocol (defined in the flowchart in Figure 1), there would be a unique ventilator setting (i.e., Dt) for each patient. Since the dynamic process of setting the ventilator dials was not recorded, a physician was considered to have followed the protocol if the tidal volume and plateau pressure specifications in Table 1 are satisfied. When St = Rt = 1, the adherence indicator on day t, At, is a deterministic function of Dt and Wt. That is, , where is the set of combinations of tidal volume, set rate, pH, plateau pressure, and severe dyspnea, which satisfy the tidal volume and plateau pressure specifications in Table 1. We let At = 1 if St = 0 or St = 1, Rt = 0. We denote the counterfactual random variables (potential outcomes) that would have been observed had an individual been sequentially treated according to the LTVV protocol with the superscript * notation. The goal is to draw inference about the counterfactual survival curve, i.e., for m = 1, . . ., 28.
There are four assumptions that we impose to identify the counterfactual survival curve of interest.
First, we assume that the potential survival outcome under the LTVV protocol is uniquely determined. This is the stable unit treatment assumption (Rubin, 1986) or consistency assumption (Robins, 1986). Under this assumption, when Āt–1 = 1. This implies that, for each t (t = 1, . . ., 28),
The LTVV protocol, as depicted in Figure 1, is designed to yield unique patient-level ventilator settings. As administration of the protocol was not directly observed, we assume, based on clinical considerations, that the physician followed the protocol if the tidal volume and plateau pressure specifications in Table 1 are satisfied. Accordingly, the treatment for an individual is unique and therefore her potential survival outcome is unique.
Second, we assume that the LTVV protocol is evaluable. Specifically, we assume that when a group of patients have been treated with the LTVV protocol until day t, at least some of them will continue to be treated with the protocol on day t + 1. This has been called the positivity assumption (Hernan and Robins, 2006), or equivalently, the evaluable treatment regime assumption (Lok et al., 2004). Mathematically, we require that, for each t and each possible value of t and ūt–1,
For the LTVV arm in the ARMA study and both arms in the ALVEOLI study, the positivity assumption is reasonable since physicians were instructed to follow the LTVV protocol. In the traditional tidal volume arm of the ARMA study, for example, the positivity assumptions would not be plausible.
Third, we assume there is no unmeasured confounding. In particular, we assume that, for individual on assist-control mode ventilation, adherence to the LTVV protocol is sequentially randomized. That is, we assume, for each t (t = 1, . . ., 28),
This assumption will hold if t and Ūt–1 includes all risk factors, other than treatment history, used by the physician to determine the patient's ventilator settings on day t. In the ARMA and ALVEOLI studies, there are many possible reasons why physicians could have deviated from the LTVV protocol, including respiratory acidosis resulting from LTVV and poor lung compliance (a measure of stiffness of lungs) which might cause some clinicians to increase the tidal volume in order to reduce acidosis. Therefore, arterial pH (a measurement of acidosis) and lung compliance are considered two key confounders. Other confounders considered in the analysis were baseline APACHE III score (a measurement of illness severity), organ failure, PEEP, most recent ‘at-risk’ status (on assist-control mode or not) and most recent LTVV adherence status.
The final assumption, which was alluded to earlier, is that missing data mechanism is ignorable, the parameters that govern a model for missing data mechanism are variation independent of those that govern the model for distribution of the data that are scheduled to be collected, and that the data that are not recorded are missing at random in the sense of Rubin (1976). Under this assumption, specification of a fully parametric model for 28 will be sufficient for us to achieve our inferential objectives; the parameters of this model are identifiable from the distribution of the observed data.
Given the distribution of 28 and the first three assumptions above, the potential survival function at day m (m ≤ 28) is identified via the following G-computation formula (Robins, 1986). The identification formula is derived as follows. By conditioning on H0, U0 and S1 and H1, we can write
Under sequential randomization,
By further conditioning on U1, S2 and H2,
Using sequential randomization,
By continuing to condition sequentially in time and employing sequential randomization,
Finally, by employing stochastic consistency,
This latter multivariate integral has been called the “G-computation” integral or formula. To illustrate how the formula works, consider the following example.
Consider the treatment of patients with ALI on the first two days. Suppose that (1) all patients survive day 1, (2) all patients are on assist-control or ‘at-risk’ on day one, (3) 70 percent of the patients on day 1 are treated with LTVV (A1 = 1) and the others with traditional ventilation (A1 = 0), (4) the assignment of treatment on day one is completely random, (5) organ failure on day 1 is highly associated with treatment on day 1, with patients on LTVV at much lower risk (20%) for organ failure as compared to those who are not (80%), (6) the risk of death on day 2 is higher for those with organ failure on day 1 and is lower for those treated with LTVV on day 1, (7) all surviving patients on day 2 are on assist-control ventilation on day 2, (8) treatment with LTVV on day 2 is positively associated with treatment with LTVV on day 1 and negatively associated with organ failure on day 1, (9) organ failure on day 2 is positively associated with organ failure on day 1 and negatively associated with treatment with LTVV on day 1 and (10) the risk of death on day 3 is higher for those with organ failure on day 2 and lower for those treated with LTVV on day 2. The distribution of the observed data is displayed in Figure 3.
In this example, we are interested in comparing the probability of surviving past day 3 under the LTVV regime to (a) the overall probability of surviving past day 3 and (b) the probability of surviving past day 3 among patients who are observed to adhere to the LTVV regime. Using the G-computation formula and noting that S1 = 1, , , Ut = OFt (t = 1, 2), we obtain
In contrast, the overall three-day survival probability is 0.60 and the survival probability among those observed to adhere to the LTVV protocol is 0.844.
In this example, organ failure serves as a time-varying confounder. That is, organ failure on day 1 is affected by treatment on day 1 and predicts subsequent treatment. Organ failure on day 1 is also negatively related to survival on day 3. The three-day survival probability for those who adhere to the LTVV regime is higher than the counterfactual survival probability because those who adhere to therapy tend to have less organ failure than those who do not. The overall three-day survival probability is even lower because non-adherence to therapy is associated with greater organ failure and worse survival, regardless of organ failure status.
The G-computation formula teaches us that the is a functional of the distribution of 28. We can write the likelihood of 28 as follows:
In our analysis, we assume that a patient's status on day t only depends on his most recent history and the baseline APACHE III score (H0). Thus, the likelhood can be written as:
Further modeling restrictions result from structural features of our datasets as well as from the clinical judgement of our physician co-authors (DN and RB). Structurally, we know that all patients survive day one and whenever a patient is on ventilator on day t – 1, she either dies or remains in the hospital and in the ICU at time t. Also, once a patient leaves the hospital (ICU), she does not return the hospital (ICU). With these considerations, we write
where PDt = PPt – PEt, conditioning on Rt, HSPt, ICUt or VNTt implies that St = 1, conditioning on ICUt or VNTt implies HSPt = 1 and conditioning on VNTt implies ICUt = 1. It is important to note that we model PDt instead of PPt since we are conditioning on PEt and that VTt does not appear in the likelihood because it is a deterministic function of LCt and PDt (i.e., VTt = LCtPDt). The distributions on the right hand side of the above three displays are modeled with separate regression models. The binary outcomes, St, HSPt, ICUt, VNTt, DSt and OFt, are modeled using logistic regression. The continuous variables, H0, LCt, PEt, PHt, PDt, SOt, and SRt, are linearly transformed to the range [0, 1] (using the bounds displayed in Table 4) and modeled using Beta regression models (Ferrari and Cribari-Neto, 2004). In these models, the logit of the mean (μ) is modeled as a function of covariates and the variance is equal to μ(1 – μ)/(η + 1), where η is a scalar dispersion parameter. In all regression models, there are separate intercepts for days 1 to 13, and common intercepts for days 14 to 21 and days 22-28; there are no interactions between covariates; the regression coefficients for time-varying covariates are assumed to be time-invariant as the dispersion parameters (in the Beta regression models).
We adopt a Bayesian approach to inference. We assume independently normally distributed mean zero, variance 100 priors on all the regression parameters. For each model, we built up a shrinkage prior on the time-varying intercept parameters as follows: the intercept on day one was assumed to be normally distributed with mean zero with variance 5, the intercept on day t (t = 2, . . ., 13) was assumed to be normally distributed with mean equal to the intercept on day t – 1 and variance σ2, the common intercept for days 14 to 21 (days 22 to 28) was assume to be normally distributed with mean equal to the intercept on day 13 (day 21) and variance σ2, and σ2 was assumed to be uniform on the interval [0, 5]. Finally, the dispersion parameters η in the Beta regression models were assumed to be uniform on the interval [0, 100].
The posterior sampling for the model parameters was performed in WinBUGS. Twelve-thousand iterations of which six-thousand were used for burn-in were performed. Trace plots of multiple chains were examined to check for convergence. For each iteration, we evaluated the G-computation formula by Monte-Carlo integration. The Monte-Carlo scheme is complicated by the fact that the integration requires us to sample from dF(St|St–1 = 1, At–1, Ht–1, Ut–1, H0), dF(Ht|St = 1, At–1 = 1, Ht–1, Ut–1, H0, St = 1) dF(Ut|St = 1, At = 1, Ht, Ut–1, H0) instead of dF(St|St–1 = 1, Ht–1, Ut–1, H0), dF(Ht|St = 1, Ht–1, Ut–1, H0), dF(Ut|St = 1, Ht, Ut–1, H0), where we know that At is a deterministic function of Ut.
Our sampling scheme proceeds according to Figure 4. For the sake of completeness, we document the procedure in each of the letter-marked boxes. In Box (A), the model in (4.1) is used. The sampling in Box (B) proceeds as follows:
the sampling in Box (C) proceeds as follows:
the sampling in Box (D) proceeds as follows:
and sampling in Box (E) proceeds as follows:
This scheme is repeated 10,000 times and the resulting survival curves are averaged. As this is performed for each posterior draw from the model parameters, we obtain 6000 posterior draws of the survival curve of interest.
To evaluate, at least in part, the goodness of fit of our model, we can use a modification of the above procedure to obtain posterior draws for the distribution of the “observed” survival curve and compare it to the empirically observed survival curve. We note that we can express the “observed” survival curve as the following integral:
This suggests that we can obtain posterior draws by using the same algorithm above, except that, in Fugure 4, we omit the evaluation of whether At = 1 or not.
In this section, we apply the G-computation formula to the ARMA and ALVEOLI trials. Figure 5 and Table 5 present the empirical observed (gray lines), posterior mean observed (black dashed lines), and posterior mean potential (black lines) survival curves for the ARMA and ALVEOLI studies. The table includes 95% posterior credible bands for these three curves and the figure depicts the confidence band for the potential survival curve by the shaded region. For reference, the empirical survival curves for those who were observed to adhere to the LTVV algorithm are included in the figures (gray dashed lines); this represents 92 and 141 patients in the ARMA and ALVEOLI studies, respectively. By visual comparison of the empiricial observed and posterior mean observed survival curves, empirical and model-based, we see that our model provides a reasonable fit to the observed data. The posterior mean potential survival curve is below the posterior mean observed survival curve in the ARMA study, but the curves are barely indistinguishable for the ALVEOLI study. In both studies, the empirical survival curve for the adherers drops steeply during the first 7 days and then tapers off. This reflects the fact that those who are observed to adhere are those who either die or recover quickly, so that their need for ventilation is limited; those with longer ventilation periods are less likely to be reflected in these curves as they have a greater chance of violating the LTVV protocol or have missing data that precludes evaluation of their adherence status.
In displays not shown, we evaluated the posterior mean difference between the observed and potential survival curves along with a 95% posterior credible band. In both studies, the 95% credible bands for the difference include zero for all time, which does not statistically support the theory that imperfect adherence to the protocol led to improved outcomes.
Despite these non-significant findings, it is interesting to speculate why the potential survival curve under LTVV adherence is noticeably lower than the observed survival curve in the ARMA study, but virtually identical in the ALVEOLI study. We suspect that, with the positive results in ARMA, many doctors were more comfortable using a strategy that targeted lower tidal volumes and plateau pressures. With these data, doctors were more willing to allow their patients to stay on lower tidal volumes. With more experience they learned that patients may experience some “agitation” (e.g., dysynchrony with the mechanical ventilator) early in their clinical course on lower tidal volumes, but this “agitation” improves within a couple of days. In ALVEOLI, the use of heavy sedation or neuromuscular blockade (which would address this dysynchrony with the mechanical ventilator) was more limited, primarily occurring in the first couple of days, whereas in ARMA many patients remained heavily sedated or neuromuscular-blocked for more extended periods of time. It is this difference in medication use which may explain the discrepancy between the results across trials.
In this paper, we used the G-computation formula to estimate the potential survival distribution under the dynamic LTVV treatment regime for patients with acute lung injury. Our results demonstrate that strict adherence to the LTVV algorithm does not lead to increased mortality, as has been suggested in the clinical literature.
Our methodology relies on the untestable assumption of no unmeasured confounding. That is, we have assumed that we have measured all the time-dependent confounders that, when conditioned upon, make compliance with LTVV on a given day, a random process, akin to flipping a possibly biased coin (Assumption 3.1). While this assumption cannot be empirically validated from the observed data, the clinical experts on our team (Dale Needham and Roy Brower) have come to feel comfortable with this assumption. The key question here is why some physicians deviate from the LTVV protocol. The concern of such physicians is that a low tidal volume may cause worse gas exchange, especially among those hard patients with stiff lungs (lungs that are more difficult to ventilate). Gas exchange can be measured by arterial pH and the stiffness of lungs can be measured by lung compliance. With these two variables and a few others included, we feel that we have captured the key factors that govern adherence to the LTVV protocol. Nonetheless, future work should focus on exploring the sensitivity of our conclusions to deviations from this assumption.
The G-computation has been criticized because it can suffer from the “null paradox.” When one uses the G-computation formula to estimate the means of a collection of counterfactual outcomes indexed by levels of treatment, the paradox states that one can falsely reject the null hypothesis of no variation in means across of levels of treatment because of the need (due to the curse of dimensionality) to parametrically specify the component distributions in the G-computation integral. Since we are not using the G-computation formula in this fashion, we do not think this paradox is a major concern.
There are alternatives to our G-computation analysis. Murphy et al. (2001) proposed an inverse-weighting procedure in which individuals who adhere to the dynamic treatment regime are inverse-weighted by their probability of being adherent through their course of follow-up. In our study, these weights are the inverse of the product of as many as 28 conditional probabilities. In implementing this procedure, we found that some of the weights became prohibitly large and the variances of the resulting estimates were too large to be of practical use. Truncation of weights was considered as a way of trading o bias with variance, but the results were highly sensitive to the choice of the truncation threshold. Weight stabilization is not an option here as our interest is in the marginal survival curve.
Another approach that has been advocated is structural nested failure time models (Lok et al., 2004). In these models, one specifies, for each day t, the conditional (on time-varying factors up to day t) quantile-quantile mapping between the distribution of the counterfactual survival time had patients followed the dynamic regime through day t and not followed the regime after day t and the distribution of the counterfactual survival time has patients followed the regime through day t – 1 and not followed the regime after day t. This is an interesting avenue to explore, which may ultimately involve less overall modeling than the G-computation algorithm and does not suffer from the “null paradox”.
We hope that this unique illustration of the complexities of analyzing a dynamic treatment regime will serve to make these methods more accessible to statisticians and their clinical collaborators.
†This research was sponsored by NIH contracts NO1-HR-46046-64, NO1-HR-16146-54, R01-CA-85295.
Weiwei Wang, University of Texas Health Science Center at Houston, Houston, TX U.S.A.
Daniel Scharfstein, The Johns Hopkins Bloomberg School of Public Health, Baltimore, MD U.S.A.
Chenguang Wang, Food and Drug Administration, Silver Spring, MD U.S.A.
Michael Daniels, University of Florida, Gainesville, FL U.S.A.
Dale Needham, The Johns Hopkins School of Medicine, Baltimore, MD U.S.A.
Roy Brower, The Johns Hopkins School of Medicine, Baltimore, MD U.S.A.