Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2084218

Formats

Article sections

- Summary
- 1. Introduction
- 2. Longitudinal Implementation of the Propensity Adjustment
- 3. Simulation Study
- 4.0. Discussion
- References

Authors

Related links

Comput Stat Data Anal. Author manuscript; available in PMC 2008 August 15.

Published in final edited form as:

Comput Stat Data Anal. 2007 August 15; 51(12): 6114–6122.

doi: 10.1016/j.csda.2006.12.021PMCID: PMC2084218

NIHMSID: NIHMS29377

Corresponding Author: Andrew C. Leon, Weill Medical College of Cornell University, Department of Psychiatry, Box 140, 525 East 68th Street, New York, NY 10021, telephone: (212) 746-3872, fax: (212) 746-8754, Email: acleon/at/med.cornell.edu

See other articles in PMC that cite the published article.

The propensity adjustment provides a strategy to reduce the bias in treatment effectiveness analyses that compare non-equivalent groups such as seen in observational studies (Rosenbaum and Rubin, 1983). The objective of this simulation study is to examine the effect of omitting confounding variables from the propensity score on the quintile-stratified propensity adjustment in a longitudinal study. The primary focus was the impact of a misspecified propensity score on bias. Three features of the omitted confounding variables were examined: type of predictor variable (binary vs. continuous), constancy over time (time-varying vs. time-invariant), and magnitude of the association with treatment and outcome (null, small, and large odds ratios). The simulation results indicate that omission of continuous, time-varying confounders that are strongly associated with treatment and outcome (i.e., an odds ratio of 1.75) adversely impacts bias, coverage, and type I error. Omitted time-varying continuous variables had somewhat more effect on bias than omitted binary variables. Time-invariant confounding variables that are not included in the propensity score have a much less effect on results. This evaluation only examined continuous treatment effectiveness outcomes and the propensity scores used for stratification included just four variables. Relative to the use of the propensity adjustment in applied settings that typically comprise numerous potential confounding variables, the impact of one omitted continuous, time-varying confound in this simulation study could be overstated.

An observational design typically includes subjects who are more representative of patients with a particular illness. For instance, observational studies tend to have less restrictive inclusion and exclusion criteria than those used in randomized controlled clinical trials (RCT). As a result the observational design can provide greater generalizability than seen in RCTs. However, because an investigator observes, but does not manipulate treatment, the data analyst in an observational study is faced with non-equivalent groups. For instance, those who receive more intensive treatment are likely to have had more severe symptoms, a history of failed treatments, or more comprehensive health insurance coverage.

The propensity adjustment is a strategy to compare non-equivalent groups such as in analyses of treatment effectiveness in observational studies through stratification, matching, or covariate adjustment (Rosenbaum and Rubin, 1983). The propensity score, *e(x)*, which represents the conditional probability of group assignment given demographic and clinical characteristics, forms the basis for the propensity adjustment. The propensity adjustment has been shown to reduce bias in the treatment effectiveness estimate relative to unadjusted analyses.

Recent work has examined the application of the propensity for treatment intensity to longitudinal data (Leon and Hedeker, 2005). This approach can incorporate repeated assessments of non-randomized, time-varying, ordinal doses of treatment over the course of a chronic illness. The analyses involve two stages: a propensity model and a treatment effectiveness model. Simulation studies have documented that a vast majority of the bias seen in unadjusted analyses is accounted for with an adjustment for the propensity for treatment intensity when the effectiveness evaluation involves either a mixed-effects grouped-time survival analysis (Leon, Hedeker and Teres, in press) or a mixed-effects linear regression analysis (Leon and Hedeker, in press).

The extent of balance between non-randomized treatment groups on demographic and clinical characteristics that is achieved with the propensity adjustment can be readily evaluated. Yet, it is not feasible to examine balance across treatment groups on confounders that are unknown or unmeasured. Nevertheless, an implicit assumption of the propensity strategy is that the propensity score is adequately specified. Rosenbaum (2002) has described sensitivity analyses that examine the range of bias in propensity-adjusted parameter estimates that could result from a misspecified propensity score. Drake (1993) examined the effect of misspecification of the quintile-stratified propensity adjustment in a simulation study of cross-sectional data in which the outcome variable is either continuous or binary and showed that considerable bias can remain if a key confounder is omitted from the propensity score. Here, that work is extended in several ways.

The objective of the current study is to examine the effect of omitting confounders from the propensity score on the quintile-stratified propensity adjustment in a longitudinal study. Initially the two stages of the adjustment procedure are described: the longitudinal propensity and treatment effectiveness models. A simulation study then evaluates the impact of propensity score misspecification.

Elsewhere we have described a modification of the Rosenbaum and Rubin (1983) propensity model to examine the longitudinal study of *K* time-varying ordinal doses denoted by the variable *T _{ij}* (Leon and Hedeker, 2005). The Rosenbaum and Rubin (1983) notation is adapted here. The

Where *x _{ij}* is a vector of covariates hypothesized to be related to intensity of treatment (i.e., ordinal dose) as well as an intercept term. The subject-specific random effect,

The propensity score can be estimated using an ordinal mixed-effects logistic regression model (Hedeker and Gibbons, 1994):

(1)

where *γ _{k}* represents the threshold for dose

Neither the intercept, which is a constant, nor the threshold is needed for the propensity-based ranking that is used for stratification. It is not necessary to include the thresholds, *γ _{k}*, for dose

(2)

The propensity score, *e*(*x _{ij}*,

The *k* treatment groups are compared on a longitudinal continuous dependent variable, *y _{ij}*, using a mixed-effects linear regression model. The model is specified as:

(3)

where *α _{0}* is the intercept term,

As stated earlier, the effectiveness analyses are conducted separately for each quintile. The quintile-specific results can be pooled using the Mantel-Haenszel procedure (as described by Fleiss, 1981) in which each quintile-specific parameter estimate is weighted by the inverse of its squared standard error. In order to pool the results, however, it is assumed that there is not a treatment by propensity interaction. A procedure to test that assumption, based on the likelihood ratio test, is described elsewhere (Leon and Hedeker, 2005).

The impact of propensity score misspecification on the performance of the quintile-stratified, longitudinal propensity adjustment in a mixed-effects linear regression model was examined in a Monte Carlo simulation study. The primary focus was to examine the impact on bias of three features of the omitted confounding variables including:

- type of predictor variable (binary vs. continuous)
- constancy over time (time-varying vs. time-invariant)
- magnitude of the association with treatment and outcome (specified as null, small, and large odds ratios).

In addition, we examined type I error, statistical power and coverage probability.

This simulation study distinguishes between *true*, *e _{T}*(

which includes the intercept, *x*_{0}, and eight randomly generated predictor variables:

- two time-invariant, continuous variables (
*x*,_{1}*x*)_{2} - two time-invariant, binary variables (
*x*,_{3}*x*)_{4} - two time-varying, continuous variables (
*x*,_{5}*x*)_{6} - two time-varying, binary variables (
*x*,_{7}*x*)_{8}

Each of these variables was generated based on an underlying standard normal distribution. In addition, the correlation between all pairs of the propensity model predictor variables,
, was set at either 0.20 or 0.40. For each binary variable the underlying standard normal was dichotomized to yield an even split of zeros and ones. The odds ratios for the eight predictors of dose in the propensity model varied (1.0, 1.25, 1.75), representing null, small, and large associations with treatment intensity, respectively. The true propensity score, *e _{T}*(

The effect of each of three doses, relative to dose 0 (i.e., the control), on the continuous outcome was specified in mixed-effects linear regression model (2), with between dose standardized effects of 0, .22, and .56, representing null, small and moderate treatment effects (in standard deviation units). The association of each covariate (*x _{1}* –

In an effort to evaluate the effect of misspecification, the *estimated propensity score*, *e _{E}*(

In this way, four confounds (*x _{2}*,

Performance of the models was evaluated using the following criteria: type I error, statistical power, coverage, standardized bias, and root mean square error (RMSE). Type I error and statistical power represent the respective proportions of true and false null hypotheses that were rejected. Coverage is defined as the proportion of simulated data sets for which the 95% confidence interval for the respective parameter estimate included the specified value. Standardized bias, , was the primary criterion in this study because it expresses bias in units of uncertainty of the parameter estimates across simulated data sets. Demirtas (2004) suggests that if the absolute value of standardized bias exceeds .40, efficiency, coverage and error rates are adversely affected. RMSE combines accuracy and precision and is defined as . All evaluation criteria were based on the Mantel-Haenszel pooled results. MIXOR software (Hedeker and Gibbons, 1996a) and MIXREG software (Hedeker and Gibbons, 1996b) were used for analyses of the propensity and effectiveness models, respectively. The simulations were designed to emulate the two-staged approach described above such that the results of a propensity model are used to estimate a propensity score which, in turn, is incorporated in the treatment effectiveness analyses. An example of SAS Code to implement this simulation is available from the first author.

Initially, consider as benchmarks, the performance of the procedure with correctly specified propensity models (two *italicized* models each: odds ratios of 1.25 and 1.75 for *x _{1}*,

Standardized Bias and Root Mean Squared Error in a Simulation Study of Quintile-Stratification based on a Misspecified Propensity Score: Correlation among propensity covariates,

RMSE is less sensitive than standardized bias to the misspecified propensity score (Tables 1 and and3).3). Nevertheless, RMSE is somewhat elevated when time-varying confounding variables with odds ratios of 1.75 are omitted, reflecting the pattern described for standardized bias.

The effect of an omitted confounder on type I error mirrors that of bias (Tables 2 and and4).4). That is, as the odds ratio for the omitted time-varying continuous confound (*x _{6}*) increases to 1.75, type I error increases from a nominal level to exceed 10%. An omitted time-invariant confound has little impact on type I error.

Type I Error, Statistical Power, and Coverage Probability in a Simulation Study of Quintile-Stratification based on a Misspecified Propensity Score: Correlation among propensity covariates,

There is increased statistical power (> .90) for the small standardized treatment effects when the continuous, time-varying confounding variable (*x _{6}*) of large association (odds ratio: 1.75) is omitted from the propensity score used for stratification. This corresponds to decreased coverage (described below), suggesting increases in both power and precision. Power appears to have reached an asymptote of 1.0 for a moderate treatment effect (.56), given the sample size of 250 with eight repeated observations over time.

The simulation-based coverage is adversely affected when the continuous, time-varying confounding variable (*x _{6}*), with odds ratio of 1.75, is ignored in the stratification process, resulting in coverage probability of less than 90%. (Coverage that is less than 90% is bolded in the Tables 2 and and4.)4.) The coverage is acceptable for nearly all other simulation specifications.

The sensitivity of a longitudinal, quintile-stratified, propensity adjustment to incomplete specification of the propensity model has been evaluated in this simulation study. The results indicate that time-varying confounders can play a critical role in bias reduction in longitudinal studies, apparently more so than confounders that do not change over follow-up. Although omission of time-varying confounders with a small association with treatment and outcome (i.e., odds ratio of 1.25) had minimal impact on the model performance, those with a larger association clearly had an impact on bias, coverage, and type I error. Omitted continuous time-varying variables had somewhat more effect on bias than did omitted binary time-varying variables. In contrast, omitted variables that were more highly correlated with variables that were included in the propensity score had less impact on bias than omitted variables with lower correlations.

This evaluation examined continuous treatment effectiveness outcomes and included only four variables in the estimated propensity score, which served as the basis for stratification. We acknowledge that this is somewhat oversimplified relative to the number of variables included when the propensity adjustment is applied in practice; and thus, the impact of just one omitted confound could be overstated. For example, studies of cardiovascular disease have included considerably more variables in the propensity score: 18 variables (Grzybowski et al., 2003), 34 variables (Gum et al., 2001) and 102 variables (Normand 2001). Nevertheless, when one large, time-varying confounding variable was omitted from the propensity score in the simulation study, the parameter estimate resulted in substantial bias, reduced coverage probability, and inflated type I error rates. The impact of an omitted time-varying confounding variable would likely be mitigated by including a term for time in the mixed-effects outcome analyses, to the extent that this omitted variable is associated with time.

The simulation results underscore the importance of conducting comprehensive assessments over the course of follow-up in a longitudinal study. Most importantly, the selection of assessments must be guided by clinicians and other researchers with expertise in the substantive area of focus. Whether bias is introduced because a variable is inadvertently excluded from analyses or not collected during assessment is immaterial once the analyses have been completed. The building of a propensity model is not simply a data analytic exercise, but instead must be an active collaboration among researchers. The availability of the variables for the propensity adjustment, through a well-guided choice of assessments at the design stage of an observational study, plays a critical role in bias reduction. Of course, there is a risk of a tradeoff between in depth, time-consuming assessments and retention in a longitudinal study.

Rosenbaum and Rubin (1983) described three approaches to implementing the propensity adjustment: stratification, matching and covariate adjustment. The use of the latter approach is typically discouraged. The simulation study described here was limited to stratification. It is not clear how these results compare to the impact of propensity score misspecification on matching. Furthermore, this simulation-based evaluation of misspecification does not examine sensitivity to hidden bias in the manner proposed by Rosenbaum (2002) in which one estimates a range of change in the magnitude of treatment effectiveness estimates based on the strength of the association of hypothetical omitted confounders with treatment assignment.

Observational studies, by design, will seldom, if ever, provide the complete data needed to calculate the true propensity score; instead an estimated score will be used to implement the propensity adjustment. This simulation study has shown that neglecting to include in the estimated propensity score time-varying confounds that are strongly associated with treatment and outcome had much greater impact on the performance of a quintile-stratified propensity adjustment than omission of time-invariant confounds. In conclusion, careful propensity model building and evaluation of group balance are essential when the adjustment is applied.

**Publisher's Disclaimer: **This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

1. Agresti A. Categorical Data Analysis. 2nd. Hoboken, NJ: John Wiley and Sons; 2002.

2. Demirtas H. Simulation-driven inferences for mutiply imputed longitudinal datasets. Statistica Neerlandica. 2004;58:466–482.

3. Drake C. Effects of misspecification of the propensity score on estimators of the treatment effect. Biometrics. 1993;49:1231–1236.

4. Fleiss JL. Statistical Methods for Rates and Proportions. John Wiley and Sons; 1981.

5. Grzybowski M, Clements EA, Parsons L, Welch R, Tintinalli AT, Ross MA, Zalenski RJ. Mortality benefit of immediate revascularization of acute ST-segment elevation myocardial infarction in patients with contraindications to thrombolytic therapy: a propensity analysis. JAMA. 2003;290:1891–8. [PubMed]

6. Gum PA, Thamilarasan M, Watanabe J, Blackstone EH, Lauer MS. Aspirin use and all-cause mortality among patients being evaluated for known or suspected coronary artery disease: A propensity analysis. JAMA. 2001;286:1187–94. [PubMed]

7. Hedeker D, Gibbons RD. A random-effects ordinal regression model for multilevel analysis. Biometrics. 1994;50:933–944. [PubMed]

8. Hedeker D, Gibbons RD. MIXOR: a computer program for mixed-effects ordinal regression analysis. Computer Methods and Programs in Biomedicine. 1996;49:157–176. [PubMed]

9. Hedeker D, Gibbons RD. MIXREG: a computer program for mixed-effects regression analysis with autocorrelated errors. Computer Methods and Programs in Biomedicine. 1996;49:229–252. [PubMed]

10. Leon AC, Hedeker D. A Mixed-Effects Quintile-Stratified Propensity Adjustment for Effectiveness Analyses of Ordered Categorical Doses. Statistics in Medicine. 2005;24:647–658. [PubMed]

11. Leon AC, Hedeker D. A Comparison of Mixed-Effects Quantile Stratification Propensity Adjustment Strategies for Longitudinal Treatment Effectiveness Analyses of Continuous Outcomes. Statistics in Medicine in press. [PubMed]

12. Leon AC, Hedeker D, Teres JJ. Bias Reduction in Effectiveness Analyses of Longitudinal Ordinal Doses with a Mixed-Effects Propensity Adjustment. Statistics in Medicine in press. [PubMed]

13. McCullagh P. Regression models for ordinal data (with discussion) J Roy Statist Soc. 1980;B 42:109–142.

14. Normand SLT, Landrum MB, Guadagnoli E, Ayanian JZ, Ryan TJ, Cleary PD, McNeil BJ. Validating recommendations for coronary angiography following acute myocardial infarction in the elderly: a matched analysis using propensity scores. J Clin Epidemiol. 2001;54:387–98. [PubMed]

15. Rosenbaum PR. Observational Studies. 2nd. New York: Springer; 2002.

16. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's Canada Institute for Scientific and Technical Information in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |