Related Articles

1.  The performance of different propensity score methods for estimating marginal hazard ratios 
Statistics in Medicine  2012;32(16):2837-2849.
Propensity score methods are increasingly being used to reduce or minimize the effects of confounding when estimating the effects of treatments, exposures, or interventions when using observational or non-randomized data. Under the assumption of no unmeasured confounders, previous research has shown that propensity score methods allow for unbiased estimation of linear treatment effects (e.g., differences in means or proportions). However, in biomedical research, time-to-event outcomes occur frequently. There is a paucity of research into the performance of different propensity score methods for estimating the effect of treatment on time-to-event outcomes. Furthermore, propensity score methods allow for the estimation of marginal or population-average treatment effects. We conducted an extensive series of Monte Carlo simulations to examine the performance of propensity score matching (1:1 greedy nearest-neighbor matching within propensity score calipers), stratification on the propensity score, inverse probability of treatment weighting (IPTW) using the propensity score, and covariate adjustment using the propensity score to estimate marginal hazard ratios. We found that both propensity score matching and IPTW using the propensity score allow for the estimation of marginal hazard ratios with minimal bias. Of these two approaches, IPTW using the propensity score resulted in estimates with lower mean squared error when estimating the effect of treatment in the treated. Stratification on the propensity score and covariate adjustment using the propensity score result in biased estimation of both marginal and conditional hazard ratios. Applied researchers are encouraged to use propensity score matching and IPTW using the propensity score when estimating the relative effect of treatment on time-to-event outcomes. Copyright © 2012 John Wiley & Sons, Ltd.
doi:10.1002/sim.5705
PMCID: PMC3747460  PMID: 23239115
propensity score; survival analysis; inverse probability of treatment weighting (IPTW); Monte Carlo simulations; observational study; time-to-event outcomes
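The matching scheme named in the abstract above (1:1 greedy nearest-neighbor matching within propensity score calipers) can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function name, the order in which treated subjects are processed, and the scale on which the caliper is measured are all assumptions made for the example.

```python
import numpy as np

def greedy_caliper_match(ps, treated, caliper):
    """1:1 greedy nearest-neighbor matching without replacement.

    ps      : propensity scores (or their logits), one per subject
    treated : 0/1 treatment indicators
    caliper : maximum allowed distance between matched scores
    Returns a list of (treated_index, control_index) pairs.
    """
    ps = np.asarray(ps, dtype=float)
    treated = np.asarray(treated).astype(bool)
    treated_idx = np.flatnonzero(treated)
    available = list(np.flatnonzero(~treated))  # unmatched controls
    pairs = []
    for t in treated_idx:  # assumption: treated subjects taken in input order
        if not available:
            break
        dists = np.abs(ps[available] - ps[t])
        j = int(np.argmin(dists))
        # Accept the nearest control only if it lies within the caliper;
        # otherwise this treated subject stays unmatched.
        if dists[j] <= caliper:
            pairs.append((int(t), int(available.pop(j))))
    return pairs
```

Because the procedure is greedy, the set of matched pairs can depend on the processing order; treated subjects with no control inside the caliper are dropped from the matched sample.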
2.  ESTIMATING TREATMENT EFFECTS ON HEALTHCARE COSTS UNDER EXOGENEITY: IS THERE A ‘MAGIC BULLET’? 
Methods for estimating average treatment effects, under the assumption of no unmeasured confounders, include regression models; propensity score adjustments using stratification, weighting, or matching; and doubly robust estimators (a combination of both). Researchers continue to debate the best estimator for outcomes such as health care cost data, as they are usually characterized by an asymmetric distribution and heterogeneous treatment effects. Challenges in finding the right specifications for regression models are well documented in the literature. Propensity score estimators are proposed as alternatives to overcoming these challenges. Using simulations, we find that in moderate size samples (n = 5000), balancing on propensity scores that are estimated from saturated specifications can balance the covariate means across treatment arms but fails to balance higher-order moments and covariances amongst covariates. Therefore, unlike regression models, even if a formal model for outcomes is not required, propensity score estimators can be inefficient at best and biased at worst for health care cost data. Our simulation study, designed to take a ‘proof by contradiction’ approach, proves that no one estimator can be considered the best under all data generating processes for outcomes such as costs. The inverse-propensity weighted estimator is most likely to be unbiased under alternate data generating processes but is prone to bias under misspecification of the propensity score model and is inefficient compared to an unbiased regression estimator. Our results show that there are no ‘magic bullets’ when it comes to estimating treatment effects in health care costs. Care should be taken before naively applying any one estimator to estimate average treatment effects in these data. We illustrate the performance of alternative methods in a cost dataset on breast cancer treatment.
doi:10.1007/s10742-011-0072-8
PMCID: PMC3244728  PMID: 22199462
Propensity score; non-linear regression; average treatment effect; health care costs
3.  The performance of different propensity-score methods for estimating differences in proportions (risk differences or absolute risk reductions) in observational studies 
Statistics in Medicine  2010;29(20):2137-2148.
Propensity score methods are increasingly being used to estimate the effects of treatments on health outcomes using observational data. There are four methods for using the propensity score to estimate treatment effects: covariate adjustment using the propensity score, stratification on the propensity score, propensity-score matching, and inverse probability of treatment weighting (IPTW) using the propensity score. When outcomes are binary, the effect of treatment on the outcome can be described using odds ratios, relative risks, risk differences, or the number needed to treat. Several clinical commentators suggested that risk differences and numbers needed to treat are more meaningful for clinical decision making than are odds ratios or relative risks. However, there is a paucity of information about the relative performance of the different propensity-score methods for estimating risk differences. We conducted a series of Monte Carlo simulations to examine this issue. We examined bias, variance estimation, coverage of confidence intervals, mean-squared error (MSE), and type I error rates. A doubly robust version of IPTW had superior performance compared with the other propensity-score methods. It resulted in unbiased estimation of risk differences, treatment effects with the lowest standard errors, confidence intervals with the correct coverage rates, and correct type I error rates. Stratification, matching on the propensity score, and covariate adjustment using the propensity score resulted in minor to modest bias in estimating risk differences. Estimators based on IPTW had lower MSE compared with other propensity-score methods. Differences between IPTW and propensity-score matching may reflect that these two methods estimate the average treatment effect and the average treatment effect for the treated, respectively. Copyright © 2010 John Wiley & Sons, Ltd.
doi:10.1002/sim.3854
PMCID: PMC3068290  PMID: 20108233
propensity score; observational study; binary data; risk difference; number needed to treat; matching; IPTW; inverse probability of treatment weighting; propensity-score matching
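To make the IPTW estimate of a risk difference concrete, here is a minimal sketch. It assumes the propensity scores have already been estimated, and it implements the simple normalized weighted estimator, not the doubly robust version that the abstract reports as the best performer. The function and argument names are hypothetical.

```python
import numpy as np

def iptw_risk_difference(y, z, ps):
    """Weighted risk difference using inverse probability of treatment weights.

    y  : binary outcomes (0/1)
    z  : treatment indicators (0/1)
    ps : estimated probability of treatment for each subject
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    ps = np.asarray(ps, dtype=float)
    # Average-treatment-effect weights: 1/ps for treated, 1/(1 - ps) for untreated.
    w = z / ps + (1 - z) / (1 - ps)
    # Weighted outcome proportion in each arm, normalized by the arm's total weight.
    risk_treated = np.sum(w * z * y) / np.sum(w * z)
    risk_untreated = np.sum(w * (1 - z) * y) / np.sum(w * (1 - z))
    return risk_treated - risk_untreated
```

These weights target the average treatment effect in the whole population, which is one reason the abstract gives for IPTW and matching producing different estimates: matching on the propensity score targets the effect in the treated.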
4.  Variance reduction in randomised trials by inverse probability weighting using the propensity score 
Statistics in Medicine  2013;33(5):721-737.
In individually randomised controlled trials, adjustment for baseline characteristics is often undertaken to increase precision of the treatment effect estimate. This is usually performed using covariate adjustment in outcome regression models. An alternative method of adjustment is to use inverse probability-of-treatment weighting (IPTW), on the basis of estimated propensity scores. We calculate the large-sample marginal variance of IPTW estimators of the mean difference for continuous outcomes, and risk difference, risk ratio or odds ratio for binary outcomes. We show that IPTW adjustment always increases the precision of the treatment effect estimate. For continuous outcomes, we demonstrate that the IPTW estimator has the same large-sample marginal variance as the standard analysis of covariance estimator. However, ignoring the estimation of the propensity score in the calculation of the variance leads to the erroneous conclusion that the IPTW treatment effect estimator has the same variance as an unadjusted estimator; thus, it is important to use a variance estimator that correctly takes into account the estimation of the propensity score. The IPTW approach has particular advantages when estimating risk differences or risk ratios. In this case, non-convergence of covariate-adjusted outcome regression models frequently occurs. Such problems can be circumvented by using the IPTW adjustment approach. © 2013 The authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
doi:10.1002/sim.5991
PMCID: PMC4285308  PMID: 24114884
variance estimation; baseline adjustment
5.  Weight Trimming and Propensity Score Weighting 
PLoS ONE  2011;6(3):e18174.
Propensity score weighting is sensitive to model misspecification and outlying weights that can unduly influence results. The authors investigated whether trimming large weights downward can improve the performance of propensity score weighting and whether the benefits of trimming differ by propensity score estimation method. In a simulation study, the authors examined the performance of weight trimming following logistic regression, classification and regression trees (CART), boosted CART, and random forests to estimate propensity score weights. Results indicate that although misspecified logistic regression propensity score models yield increased bias and standard errors, weight trimming following logistic regression can improve the accuracy and precision of final parameter estimates. In contrast, weight trimming did not improve the performance of boosted CART and random forests. The performance of boosted CART and random forests without weight trimming was similar to the best performance obtainable by weight trimmed logistic regression estimated propensity scores. While trimming may be used to optimize propensity score weights estimated using logistic regression, the optimal level of trimming is difficult to determine. These results indicate that although trimming can improve inferences in some settings, in order to consistently improve the performance of propensity score weighting, analysts should focus on the procedures leading to the generation of weights (i.e., proper specification of the propensity score model) rather than relying on ad-hoc methods such as weight trimming.
doi:10.1371/journal.pone.0018174
PMCID: PMC3069059  PMID: 21483818
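Trimming large weights downward, as studied in the abstract above, can be sketched in a few lines. The percentile cutoff used here is an illustrative assumption; the abstract itself notes that the optimal level of trimming is difficult to determine.

```python
import numpy as np

def trim_weights(weights, upper_percentile=99.0):
    """Cap propensity score weights at a chosen upper percentile.

    Weights above the cutoff are set equal to the cutoff; smaller
    weights are left unchanged. The default of 99 is only a placeholder.
    """
    w = np.asarray(weights, dtype=float)
    cap = np.percentile(w, upper_percentile)
    return np.minimum(w, cap)
```

Capping reduces the influence of a few extreme weights at the cost of changing the weighted population, which is consistent with the abstract's advice to focus first on how the weights are generated.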
6.  Assessing Causality in the Association between Child Adiposity and Physical Activity Levels: A Mendelian Randomization Analysis 
PLoS Medicine  2014;11(3):e1001618.
Here, Timpson and colleagues performed a Mendelian Randomization analysis to determine whether childhood adiposity causally influences levels of physical activity. The results suggest that increased adiposity causes a reduction in physical activity in children; however, this study does not exclude lower physical activity also leading to increasing adiposity.
Please see later in the article for the Editors' Summary
Background
Cross-sectional studies have shown that objectively measured physical activity is associated with childhood adiposity, and a strong inverse dose–response association with body mass index (BMI) has been found. However, few studies have explored the extent to which this association reflects reverse causation. We aimed to determine whether childhood adiposity causally influences levels of physical activity using genetic variants reliably associated with adiposity to estimate causal effects.
Methods and Findings
The Avon Longitudinal Study of Parents and Children collected data on objectively assessed activity levels of 4,296 children at age 11 y with recorded BMI and genotypic data. We used 32 established genetic correlates of BMI combined in a weighted allelic score as an instrumental variable for adiposity to estimate the causal effect of adiposity on activity.
In observational analysis, a 3.3 kg/m² (one standard deviation) higher BMI was associated with 22.3 (95% CI, 17.0, 27.6) movement counts/min less total physical activity (p = 1.6×10⁻¹⁶), 2.6 (2.1, 3.1) min/d less moderate-to-vigorous-intensity activity (p = 3.7×10⁻²⁹), and 3.5 (1.5, 5.5) min/d more sedentary time (p = 5.0×10⁻⁴). In Mendelian randomization analyses, the same difference in BMI was associated with 32.4 (0.9, 63.9) movement counts/min less total physical activity (p = 0.04) (∼5.3% of the mean counts/minute), 2.8 (0.1, 5.5) min/d less moderate-to-vigorous-intensity activity (p = 0.04), and 13.2 (1.3, 25.2) min/d more sedentary time (p = 0.03). There was no strong evidence for a difference between the instrumental variable estimates and the observational estimates. Similar results were obtained using fat mass index. Low power and poor instrumentation of activity limited causal analysis of the influence of physical activity on BMI.
Conclusions
Our results suggest that increased adiposity causes a reduction in physical activity in children and support research into the targeting of BMI in efforts to increase childhood activity levels. Importantly, this does not exclude lower physical activity also leading to increased adiposity, i.e., bidirectional causation.
Editors' Summary
Background
The World Health Organization estimates that globally at least 42 million children under the age of five are obese. The World Health Organization recommends that all children undertake at least one hour of physical activity daily, on the basis that increased physical activity will reduce or prevent excessive weight gain in children and adolescents. In practice, while numerous studies have shown that body mass index (BMI) shows a strong inverse correlation with physical activity (i.e., active children are thinner than sedentary ones), exercise programs specifically targeted at obese children have had only very limited success in reducing weight. The reasons for this are not clear, although environmental factors such as watching television and lack of exercise facilities are traditionally blamed.
Why Was This Study Done?
One of the reasons why obese children do not lose weight through exercise might be that being fat in itself leads to a decrease in physical activity. This is termed reverse causation, i.e., obesity causes sedentary behavior, rather than the other way around. The potential influence of environmental factors (e.g., lack of opportunity to exercise) makes it difficult to prove this argument. Recent research has demonstrated that specific genotypes are related to obesity in children. Specific variations within the DNA of individual genes (single nucleotide polymorphisms, or SNPs) are more common in obese individuals and predispose to greater adiposity across the weight distribution. While adiposity itself can be influenced by many environmental factors that complicate the interpretation of observed associations, at the population level, genetic variation is not related to the same factors, and over the life course cannot be changed. Investigations that exploit these properties of genetic associations to inform the interpretation of observed associations are termed Mendelian randomization studies. This research technique is used to reduce the influence of confounding environmental factors on an observed clinical condition. The authors of this study use Mendelian randomization to determine whether a genetic tendency towards high BMI and fat mass is correlated with reduced levels of physical activity in a large cohort of children.
What Did the Researchers Do and Find?
The researchers looked at a cohort of children from a large long-term health research project (the Avon Longitudinal Study of Parents and Children). BMI and total body fat were recorded. Total daily activity was measured via a small movement-counting device. In addition, the participants underwent genotyping to detect the presence of several SNPs known to be linked to obesity. For each child a total BMI allelic score was determined based on the number of obesity-related genetic variants carried by that individual. The association between obesity and reduced physical activity was then studied in two ways. Direct correlation between actual BMI and physical activity was measured (observational data). Separately, the link between BMI allelic score and physical activity was also determined (Mendelian randomization or instrumental variable analysis). The observational data showed that boys were more active than girls and had lower BMI. Across both sexes, a higher-than-average BMI was associated with lower daily activity. In genetic analyses, allelic score had a positive correlation with BMI, with one particular SNP being most strongly linked to high BMI and total fat mass. A high allelic score for BMI was also correlated with lower levels of daily physical activity. The authors conclude that children who are obese and have an inherent predisposition to high BMI also have a propensity to reduced levels of physical activity, which may compound their weight gain.
What Do These Findings Mean?
This study provides evidence that being fat is in itself a risk factor for low activity levels, separately from external environmental influences. This may be an example of “reverse causation,” i.e., high BMI causes a reduction in physical activity. Alternatively, there may be a bidirectional causality, so that those with a genetic predisposition to high fat mass exercise less, leading to higher BMI, and so on, in a vicious circle. A significant limitation of the study is that validated allelic scores for physical activity are not available. Thus, it is not possible to determine whether individuals with a high allelic score for BMI also have a propensity to exercise less, or whether it is simply the circumstance of being overweight that discourages activity. This study does suggest that trying to persuade obese children to lose weight by exercising more is likely to be ineffective unless additional strategies to reduce BMI, such as strict diet control, are also implemented.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001618.
The US Centers for Disease Control and Prevention provides obesity-related statistics, details of prevention programs, and an overview on public health strategy in the United States
A more worldwide view is given by the World Health Organization
The UK National Health Service website gives information on physical activity guidelines for different age groups
The International Obesity Task Force is a network of organizations that seeks to alert the world to the growing health crisis threatened by soaring levels of obesity
MedlinePlus—which brings together authoritative information from the US National Library of Medicine, National Institutes of Health, and other government agencies and health-related organizations—has a page on obesity
Additional information on the Avon Longitudinal Study of Parents and Children is available
The British Medical Journal has an article that describes Mendelian randomization
doi:10.1371/journal.pmed.1001618
PMCID: PMC3958348  PMID: 24642734
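The instrumental variable logic described in this entry can be illustrated with a small sketch: a weighted allelic score is built from genotype counts, and the causal effect of the exposure (BMI) on the outcome (activity) is estimated as the ratio of two covariances. With a single instrument this ratio coincides with the two-stage least squares estimate. The function names and the per-allele weights are hypothetical, and the sketch omits the standard errors and covariate adjustment that a real analysis would need.

```python
import numpy as np

def weighted_allelic_score(allele_counts, per_allele_effects):
    """Sum of risk-allele counts (0, 1 or 2 per variant), each weighted by
    an externally estimated per-allele effect on the exposure."""
    counts = np.asarray(allele_counts, dtype=float)
    effects = np.asarray(per_allele_effects, dtype=float)
    return counts @ effects

def iv_ratio_estimate(instrument, exposure, outcome):
    """Ratio (Wald-type) instrumental variable estimate:
    cov(instrument, outcome) / cov(instrument, exposure)."""
    g = np.asarray(instrument, dtype=float)
    x = np.asarray(exposure, dtype=float)
    y = np.asarray(outcome, dtype=float)
    cov_gy = np.cov(g, y)[0, 1]
    cov_gx = np.cov(g, x)[0, 1]
    return cov_gy / cov_gx
```

The estimate is only interpretable as causal if the score affects the outcome solely through the exposure; a weak association between score and exposure makes the ratio unstable, which relates to the low power noted in the abstract.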
7.  Using Ensemble-Based Methods for Directly Estimating Causal Effects: An Investigation of Tree-Based G-Computation 
Multivariate Behavioral Research  2012;47(1):115-135.
Researchers are increasingly using observational or nonrandomized data to estimate causal treatment effects. Essential to the production of high-quality evidence is the ability to reduce or minimize the confounding that frequently occurs in observational studies. When using the potential outcome framework to define causal treatment effects, one requires the potential outcome under each possible treatment. However, only the outcome under the actual treatment received is observed, whereas the potential outcomes under the other treatments are considered missing data. Some authors have proposed that parametric regression models be used to estimate potential outcomes. In this study, we examined the use of ensemble-based methods (bagged regression trees, random forests, and boosted regression trees) to directly estimate average treatment effects by imputing potential outcomes. We used an extensive series of Monte Carlo simulations to estimate bias, variance, and mean squared error of treatment effects estimated using different ensemble methods. For comparative purposes, we compared the performance of these methods with inverse probability of treatment weighting using the propensity score when logistic regression or ensemble methods were used to estimate the propensity score. Using boosted regression trees of depth 3 or 4 to impute potential outcomes tended to result in estimates with bias equivalent to that of the best performing methods. Using an empirical case study, we compared inferences on the effect of in-hospital smoking cessation counseling on subsequent mortality in patients hospitalized with an acute myocardial infarction.
doi:10.1080/00273171.2012.640600
PMCID: PMC3293511  PMID: 22419832 CAMSID: cams2143
8.  A Tutorial and Case Study in Propensity Score Analysis: An Application to Estimating the Effect of In-Hospital Smoking Cessation Counseling on Mortality 
Multivariate Behavioral Research  2011;46(1):119-151.
Propensity score methods allow investigators to estimate causal treatment effects using observational or nonrandomized data. In this article we provide a practical illustration of the appropriate steps in conducting propensity score analyses. For illustrative purposes, we use a sample of current smokers who were discharged alive after being hospitalized with a diagnosis of acute myocardial infarction. The exposure of interest was receipt of smoking cessation counseling prior to hospital discharge and the outcome was mortality within 3 years of hospital discharge. We illustrate the following concepts: first, how to specify the propensity score model; second, how to match treated and untreated participants on the propensity score; third, how to compare the similarity of baseline characteristics between treated and untreated participants after stratifying on the propensity score, in a sample matched on the propensity score, or in a sample weighted by the inverse probability of treatment; fourth, how to estimate the effect of treatment on outcomes when using propensity score matching, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, or covariate adjustment using the propensity score. Finally, we compare the results of the propensity score analyses with those obtained using conventional regression adjustment.
doi:10.1080/00273171.2011.540480
PMCID: PMC3266945  PMID: 22287812 CAMSID: cams1834
9.  A Tutorial and Case Study in Propensity Score Analysis: An Application to Estimating the Effect of In-Hospital Smoking Cessation Counseling on Mortality 
Multivariate Behavioral Research  2011;46(1):119-151.
Propensity score methods allow investigators to estimate causal treatment effects using observational or nonrandomized data. In this article we provide a practical illustration of the appropriate steps in conducting propensity score analyses. For illustrative purposes, we use a sample of current smokers who were discharged alive after being hospitalized with a diagnosis of acute myocardial infarction. The exposure of interest was receipt of smoking cessation counseling prior to hospital discharge and the outcome was mortality within 3 years of hospital discharge. We illustrate the following concepts: first, how to specify the propensity score model; second, how to match treated and untreated participants on the propensity score; third, how to compare the similarity of baseline characteristics between treated and untreated participants after stratifying on the propensity score, in a sample matched on the propensity score, or in a sample weighted by the inverse probability of treatment; fourth, how to estimate the effect of treatment on outcomes when using propensity score matching, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, or covariate adjustment using the propensity score. Finally, we compare the results of the propensity score analyses with those obtained using conventional regression adjustment.
doi:10.1080/00273171.2011.540480
PMCID: PMC3266945  PMID: 22287812
10.  Confounding control in a non-experimental study of STAR*D data: Logistic regression balanced covariates better than boosted CART 
Annals of Epidemiology  2013;23(4):204-209.
Purpose
Propensity scores, a powerful bias-reduction tool, can balance treatment groups on measured covariates in non-experimental studies. We demonstrate the use of multiple propensity score estimation methods to optimize covariate balance.
Methods
We used secondary data from 1,292 adults with non-psychotic major depressive disorder in the Sequenced Treatment Alternatives to Relieve Depression trial (2001–2004). After initial citalopram treatment failed, patient preference influenced assignment to medication augmentation (n=565) or switch (n=727). To reduce selection bias, we used boosted classification and regression trees (BCART) and logistic regression iteratively to identify two potentially optimal propensity scores. We assessed and compared covariate balance.
Results
After iterative selection of interaction terms to minimize imbalance, logistic regression yielded better balance than BCART (average standardized absolute mean difference across 47 covariates: 0.03 vs. 0.08, matching; 0.02 vs. 0.05, weighting).
Conclusions
Comparing multiple propensity score estimates is a pragmatic way to optimize balance. Logistic regression remains valuable for this purpose. Simulation studies are needed to compare propensity score models under varying conditions. Such studies should consider more flexible estimation methods, such as logistic models with automated selection of interactions or hybrid models using main effects logistic regression instead of a constant log-odds as the initial model for BCART.
doi:10.1016/j.annepidem.2013.01.004
PMCID: PMC3773847  PMID: 23419508
propensity score; statistics as topic; models, statistical; epidemiologic methods; estimation techniques
11.  Application of a Propensity Score Approach for Risk Adjustment in Profiling Multiple Physician Groups on Asthma Care 
Health Services Research  2005;40(1):253-278.
Objectives
To develop a propensity score-based risk adjustment method to estimate the performance of 20 physician groups and to compare performance rankings using our method to a standard hierarchical regression-based risk adjustment method.
Data Sources/Study Setting
Mailed survey of patients from 20 California physician groups between July 1998 and February 1999.
Study Design
A cross-sectional analysis of physician group performance using patient satisfaction with asthma care. We compared the performance of the 20 physician groups using a novel propensity score-based risk adjustment method. More specifically, by using a multinomial logistic regression model we estimated for each patient the propensity scores, or probabilities, of having been treated by each of the 20 physician groups. To adjust for different distributions of characteristics across groups, patients cared for by a given group were first stratified into five strata based on their propensity of being in that group. Then, strata-specific performance was combined across the five strata. We compared our propensity score method to hierarchical model-based risk adjustment without using propensity scores. The impact of different risk-adjustment methods on performance was measured in terms of percentage changes in absolute and quintile ranking (AR, QR), and weighted κ of agreement on QR.
Results
The propensity score-based risk adjustment method balanced the distributions of all covariates among the 20 physician groups, providing evidence for validity. The propensity score-based method and the hierarchical model-based method without propensity scores provided substantially different rankings (75 percent of groups differed in AR, 50 percent differed in QR, weighted κ=0.69).
Conclusions
We developed and tested a propensity score method for profiling multiple physician groups. We found that our method could balance the distributions of covariates across groups and yielded substantially different profiles compared with conventional methods. Propensity score-based risk adjustment should be considered in studies examining quality comparisons.
doi:10.1111/j.1475-6773.2005.00352.x
PMCID: PMC1361136  PMID: 15663712
Physician group; profiling; propensity score; regression-to-the-mean; risk adjustment
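The stratify-then-combine step described in the Study Design above can be sketched as follows. The abstract does not say how the strata were cut or how the stratum-specific results were weighted, so this example assumes cut points at the quantiles of the propensity score in the full sample and weights equal to each stratum's share of all patients. All names are hypothetical.

```python
import numpy as np

def stratified_group_mean(outcome, in_group, propensity, n_strata=5):
    """Risk-adjusted mean outcome for one physician group.

    outcome    : outcome (e.g. satisfaction score) for every patient
    in_group   : 0/1 flag, 1 if the patient was treated by this group
    propensity : estimated probability of being treated by this group
    """
    y = np.asarray(outcome, dtype=float)
    g = np.asarray(in_group).astype(bool)
    p = np.asarray(propensity, dtype=float)
    # Assumed cut points: interior quantiles of the propensity in the full sample.
    cuts = np.quantile(p, np.linspace(0, 1, n_strata + 1)[1:-1])
    stratum = np.searchsorted(cuts, p, side="right")  # labels 0 .. n_strata - 1
    total, weight = 0.0, 0.0
    for s in range(n_strata):
        members = g & (stratum == s)
        if members.any():  # skip strata where the group has no patients
            share = np.mean(stratum == s)  # assumed weight: stratum's population share
            total += share * y[members].mean()
            weight += share
    return total / weight
```

Repeating this for each of the 20 groups, each with its own propensity column from the multinomial model, would give the adjusted values to be ranked.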
12.  A Tutorial on Propensity Score Estimation for Multiple Treatments Using Generalized Boosted Models 
Statistics in Medicine  2013;32(19):3388-3414.
The use of propensity scores to control for pretreatment imbalances on observed variables in non-randomized or observational studies examining the causal effects of treatments or interventions has become widespread over the past decade. For settings with two conditions of interest such as a treatment and a control, inverse probability of treatment weighted (IPTW) estimation with propensity scores estimated via boosted models has been shown in simulation studies to yield causal effect estimates with desirable properties. There are tools (e.g., the twang package in R) and guidance for implementing this method with two treatments. However, there is no such guidance for analyses of three or more treatments. The goals of this paper are two-fold: (1) to provide step-by-step guidance for researchers who want to implement propensity score weighting for multiple treatments and (2) to propose the use of generalized boosted models (GBM) for estimation of the necessary propensity score weights. We define the causal quantities that may be of interest to studies of multiple treatments and derive weighted estimators of those quantities. We present a detailed plan for using GBM to estimate propensity scores and using those scores to estimate weights and causal effects. Tools for assessing balance and overlap of pretreatment variables among treatment groups in the context of multiple treatments are also provided. A case study examining the effects of three treatment programs for adolescent substance abuse demonstrates the methods.
doi:10.1002/sim.5753
PMCID: PMC3710547  PMID: 23508673
Causal Effects; Causal Modeling; GBM; Inverse Probability of Treatment Weighting; TWANG
13.  Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples 
Statistics in Medicine  2009;28(25):3083-3107.
The propensity score is a subject's probability of treatment, conditional on observed baseline covariates. Conditional on the true propensity score, treated and untreated subjects have similar distributions of observed baseline covariates. Propensity-score matching is a popular method of using the propensity score in the medical literature. Using this approach, matched sets of treated and untreated subjects with similar values of the propensity score are formed. Inferences about treatment effect made using propensity-score matching are valid only if, in the matched sample, treated and untreated subjects have similar distributions of measured baseline covariates. In this paper we discuss the following methods for assessing whether the propensity score model has been correctly specified: comparing means and prevalences of baseline characteristics using standardized differences; ratios comparing the variance of continuous covariates between treated and untreated subjects; comparison of higher order moments and interactions; five-number summaries; and graphical methods such as quantile–quantile plots, side-by-side boxplots, and non-parametric density plots for comparing the distribution of baseline covariates between treatment groups. We describe methods to determine the sampling distribution of the standardized difference when the true standardized difference is equal to zero, thereby allowing one to determine the range of standardized differences that are plausible with the propensity score model having been correctly specified. We highlight the limitations of some previously used methods for assessing the adequacy of the specification of the propensity-score model. In particular, methods based on comparing the distribution of the estimated propensity score between treated and untreated subjects are uninformative. Copyright © 2009 John Wiley & Sons, Ltd.
doi:10.1002/sim.3697
PMCID: PMC3472075  PMID: 19757444
balance; goodness-of-fit; observational study; propensity score; matching; propensity-score matching; standardized difference; bias
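As a rough illustration of two of the balance diagnostics named in this abstract, the sketch below computes a standardized difference (continuous and binary covariates) and a variance ratio between treated and untreated subjects. The function names are hypothetical, and the pooled-standard-deviation form used here is one common convention, assumed rather than taken from the abstract.

```python
import math
from statistics import mean, variance


def standardized_difference(treated, control):
    """Difference in means divided by the pooled standard deviation.

    Assumes the common convention sqrt((s_t^2 + s_c^2) / 2) for the pooled SD.
    """
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def standardized_difference_binary(p_treated, p_control):
    """Standardized difference for a binary covariate, given its prevalence in each group."""
    pooled_sd = math.sqrt(
        (p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2
    )
    return (p_treated - p_control) / pooled_sd


def variance_ratio(treated, control):
    """Ratio of sample variances of a continuous covariate (treated over untreated)."""
    return variance(treated) / variance(control)


# Hypothetical ages in a small matched sample.
treated_age = [60, 62, 64, 66, 68]
control_age = [58, 60, 62, 64, 66]
d_age = standardized_difference(treated_age, control_age)
vr_age = variance_ratio(treated_age, control_age)
```

Values of the standardized difference near zero and variance ratios near one suggest similar covariate distributions in the two groups; any threshold for "acceptable" balance is a separate judgment not made here.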
14.  Propensity score based comparison of long term outcomes with 3D conformal radiotherapy (3DCRT) versus Intensity Modulated Radiation Therapy (IMRT) in the treatment of esophageal cancer 
Purpose
Although 3DCRT is the worldwide standard for the treatment of esophageal cancers, IMRT improves dose conformality and reduces radiation exposure to normal tissues. We hypothesized that the dosimetric advantages of IMRT should translate to substantive benefits in clinical outcomes compared to 3DCRT.
Methods and Materials
Analysis was performed on 676 nonrandomized patients (3DCRT=413, IMRT=263) with stage Ib-IVa (AJCC 2002) esophageal cancers treated with chemoradiation at a single institution from 1998–2008. An inverse probability of treatment weighting (IPW) and inclusion of propensity score (treatment probability) as a covariate were used to compare overall survival (OS) time, time to local failure, and time to distant metastasis, while accounting for effects of other clinically relevant covariates. Propensity scores were estimated using logistic regression.
Results
A fitted multivariate inverse probability weighted (IPW)-adjusted Cox model showed that OS time was significantly associated with several well-known prognostic factors, along with radiation modality (IMRT vs 3DCRT, HR=0.72, p<0.001). Compared to IMRT, 3DCRT patients had a significantly greater risk of dying (72.6% vs 52.9%, IPW log rank test: p<0.0001) and for local-regional recurrence (LRR) (p=0.0038). There was no difference in cancer-specific mortality (Gray’s test, p=0.86), or distant metastasis (p=0.99) between the two groups. An increased cumulative incidence of cardiac deaths was seen in the 3DCRT group (p=0.049), but most deaths were undocumented (5 year estimate: 11.7% in 3DCRT vs 5.4% in IMRT, Gray’s test, p=0.0029).
Conclusions
Overall survival, locoregional control, and non-cancer related deaths were significantly better for IMRT compared to 3DCRT. Although these results need confirmation, IMRT should be considered for the treatment of esophageal cancer.
doi:10.1016/j.ijrobp.2012.02.015
PMCID: PMC3923623  PMID: 22867894
IMRT; 3D-conformal radiation therapy; chemoradiation; esophageal cancer; propensity score
15.  Estimating Heterogeneous Treatment Effects with Observational Data* 
Sociological methodology  2012;42(1):314-347.
Individuals differ not only in their background characteristics, but also in how they respond to a particular treatment, intervention, or stimulation. In particular, treatment effects may vary systematically by the propensity for treatment. In this paper, we discuss a practical approach to studying heterogeneous treatment effects as a function of the treatment propensity, under the same assumption commonly underlying regression analysis: ignorability. We describe one parametric method and two non-parametric methods for estimating interactions between treatment and the propensity for treatment. For the first method, we begin by estimating propensity scores for the probability of treatment given a set of observed covariates for each unit and construct balanced propensity score strata; we then estimate propensity score stratum-specific average treatment effects and evaluate a trend across them. For the second method, we match control units to treated units based on the propensity score and transform the data into treatment-control comparisons at the most elementary level at which such comparisons can be constructed; we then estimate treatment effects as a function of the propensity score by fitting a non-parametric model as a smoothing device. For the third method, we first estimate non-parametric regressions of the outcome variable as a function of the propensity score separately for treated units and for control units and then take the difference between the two non-parametric regressions. We illustrate the application of these methods with an empirical example of the effects of college attendance on women's fertility.
PMCID: PMC3591476  PMID: 23482633
causal effects; treatment effects; heterogeneity; propensity scores; matching
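The first method in this abstract (stratum-specific effects followed by a trend across strata) can be sketched in simplified form as below. This is an illustration under stated assumptions, not the authors' procedure: strata are formed as equal-sized groups of units ordered by propensity score, the stratum effect is a plain difference in mean outcomes, and the trend is an unweighted least-squares slope on stratum rank. All names and data are hypothetical.

```python
from statistics import mean


def stratum_effects(records, n_strata=3):
    """Per-stratum difference in mean outcome (treated minus control).

    records: list of (propensity, treated_flag, outcome) tuples.
    Each stratum must contain at least one treated and one control unit,
    otherwise statistics.mean raises StatisticsError.
    """
    ordered = sorted(records, key=lambda r: r[0])
    size = len(ordered) / n_strata
    effects = []
    for s in range(n_strata):
        chunk = ordered[round(s * size): round((s + 1) * size)]
        treated = [y for _, t, y in chunk if t == 1]
        control = [y for _, t, y in chunk if t == 0]
        effects.append(mean(treated) - mean(control))
    return effects


def linear_trend(effects):
    """Unweighted least-squares slope of the stratum effects on stratum rank (1, 2, ...)."""
    ranks = list(range(1, len(effects) + 1))
    rank_mean = mean(ranks)
    effect_mean = mean(effects)
    sxy = sum((r - rank_mean) * (e - effect_mean) for r, e in zip(ranks, effects))
    sxx = sum((r - rank_mean) ** 2 for r in ranks)
    return sxy / sxx


# Hypothetical data: (propensity score, treated, outcome).
data = [
    (0.1, 1, 5), (0.2, 0, 3),
    (0.4, 1, 6), (0.5, 0, 3),
    (0.7, 1, 7), (0.8, 0, 3),
]
effects = stratum_effects(data, n_strata=3)
slope = linear_trend(effects)
```

A positive slope would indicate larger estimated effects in higher-propensity strata. A fuller analysis would also check covariate balance within strata and account for the precision of each stratum estimate.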
16.  Bias associated with using the estimated propensity score as a regression covariate 
Statistics in medicine  2013;33(1):74-87.
The use of propensity score methods to adjust for selection bias in observational studies has become increasingly popular in public health and medical research. A substantial portion of studies using propensity score adjustment treat the propensity score as a conventional regression predictor. Through a Monte Carlo simulation study, Austin and colleagues investigated the bias associated with treatment effect estimation when the propensity score is used as a covariate in nonlinear regression models, such as logistic regression and Cox proportional hazards models. We show that the bias exists even in a linear regression model when the estimated propensity score is used and derive the explicit form of the bias. We also conduct an extensive simulation study to compare the performance of such covariate adjustment with propensity score stratification, propensity score matching, inverse probability of treatment weighted method, and nonparametric functional estimation using splines. The simulation scenarios are designed to reflect real data analysis practice. Instead of specifying a known parametric propensity score model, we generate the data by considering various degrees of overlap of the covariate distributions between treated and control groups. Propensity score matching excels when the treated group is contained within a larger control pool, while the model-based adjustment may have an edge when treated and control groups do not have too much overlap. Overall, adjusting for the propensity score through stratification or matching followed by regression, or using splines, appears to be a good practical strategy.
doi:10.1002/sim.5884
PMCID: PMC4004383  PMID: 23787715
observational studies; matching; stratification; weighting
17.  Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies 
Pharmaceutical Statistics  2010;10(2):150-161.
In a study comparing the effects of two treatments, the propensity score is the probability of assignment to one treatment conditional on a subject's measured baseline covariates. Propensity-score matching is increasingly being used to estimate the effects of exposures using observational data. In the most common implementation of propensity-score matching, pairs of treated and untreated subjects are formed whose propensity scores differ by at most a pre-specified amount (the caliper width). There has been little research into the optimal caliper width. We conducted an extensive series of Monte Carlo simulations to determine the optimal caliper width for estimating differences in means (for continuous outcomes) and risk differences (for binary outcomes). When estimating differences in means or risk differences, we recommend that researchers match on the logit of the propensity score using calipers of width equal to 0.2 of the standard deviation of the logit of the propensity score. When at least some of the covariates were continuous, then either this value, or one close to it, minimized the mean square error of the resultant estimated treatment effect. It also eliminated at least 98% of the bias in the crude estimator, and it resulted in confidence intervals with approximately the correct coverage rates. Furthermore, the empirical type I error rate was approximately correct. When all of the covariates were binary, then the choice of caliper width had a much smaller impact on the performance of estimation of risk differences and differences in means. Copyright © 2010 John Wiley & Sons, Ltd.
doi:10.1002/pst.433
PMCID: PMC3120982  PMID: 20925139
propensity score; observational study; binary data; risk difference; propensity-score matching; Monte Carlo simulations; bias; matching
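The caliper recommendation in this abstract can be illustrated with a minimal matching sketch. It assumes greedy 1:1 nearest-neighbour matching without replacement, processes treated subjects in the order given, and takes the standard deviation of the logit over the whole sample; these choices and the function names are assumptions made for illustration, since the abstract specifies only the caliper width.

```python
import math
from statistics import stdev


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def greedy_caliper_match(ps_treated, ps_control, caliper_sd=0.2):
    """Greedy 1:1 matching on the logit of the propensity score.

    A treated subject is matched to the nearest unused control only if the
    two logits differ by at most caliper_sd standard deviations of the logit.
    Returns a list of (treated_index, control_index) pairs.
    """
    logit_t = [logit(p) for p in ps_treated]
    logit_c = [logit(p) for p in ps_control]
    caliper = caliper_sd * stdev(logit_t + logit_c)

    available = set(range(len(logit_c)))
    pairs = []
    for i, x in enumerate(logit_t):
        if not available:
            break
        # sorted() makes tie-breaking deterministic (lowest index wins).
        j = min(sorted(available), key=lambda k: abs(logit_c[k] - x))
        if abs(logit_c[j] - x) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Hypothetical estimated propensity scores.
pairs = greedy_caliper_match([0.5, 0.9], [0.1, 0.52, 0.3])
```

In this example the second treated subject has no control within the caliper and is left unmatched, which is the trade-off a caliper imposes: closer matches at the cost of discarding some treated subjects.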
18.  Is Economic Growth Associated with Reduction in Child Undernutrition in India? 
PLoS Medicine  2011;8(3):e1000424.
An analysis of cross-sectional data from repeated household surveys in India, combined with data on economic growth, fails to find strong evidence that recent economic growth in India is associated with a reduction in child undernutrition.
Background
Economic growth is widely perceived as a major policy instrument in reducing childhood undernutrition in India. We assessed the association between changes in state per capita income and the risk of undernutrition among children in India.
Methods and Findings
Data for this analysis came from three cross-sectional waves of the National Family Health Survey (NFHS) conducted in 1992–93, 1998–99, and 2005–06 in India. The sample sizes in the three waves were 33,816, 30,383, and 28,876 children, respectively. After excluding observations missing on the child anthropometric measures and the independent variables included in the study, the analytic sample size was 28,066, 26,121, and 23,139, respectively, with a pooled sample size of 77,326 children. The proportion of missing data was 12%–20%. The outcomes were underweight, stunting, and wasting, defined as more than two standard deviations below the World Health Organization–determined median scores by age and gender. We also examined severe underweight, severe stunting, and severe wasting. The main exposure of interest was per capita income at the state level at each survey period measured as per capita net state domestic product measured in 2008 prices. We estimated fixed and random effects logistic models that accounted for the clustering of the data. In models that did not account for survey-period effects, there appeared to be an inverse association between state economic growth and risk of undernutrition among children. However, in models accounting for data structure related to repeated cross-sectional design through survey period effects, state economic growth was not associated with the risk of underweight (OR 1.01, 95% CI 0.98, 1.04), stunting (OR 1.02, 95% CI 0.99, 1.05), and wasting (OR 0.99, 95% CI 0.96, 1.02). Adjustment for demographic and socioeconomic covariates did not alter these estimates. Similar patterns were observed for severe undernutrition outcomes.
Conclusions
We failed to find consistent evidence that economic growth leads to reduction in childhood undernutrition in India. Direct investments in appropriate health interventions may be necessary to reduce childhood undernutrition in India.
Editors' Summary
Background
Good nutrition during childhood is essential for health and survival. Undernourished children are more susceptible to infections and more likely to die from common ailments such as diarrhea than well-nourished children. Thus, globally, undernutrition contributes to more than a third of deaths among children under 5 years old. Experts use three physical measurements to determine whether a child is undernourished. An "underweight" child has a low weight for his or her age and gender when compared to the World Health Organization Child Growth Standards, which chart the growth of a reference population. A "stunted" child has a low height for his or her age; stunting is an indicator of chronic undernutrition. A "wasted" child has a low weight for his or her height; wasting is an indicator of acute undernutrition and often follows an earthquake, flood, or other emergency. The prevalence (how often a condition occurs within a population) of undernutrition is particularly high in India. Here, almost half of children under the age of 3 are underweight, about half are stunted, and a quarter are wasted.
Why Was This Study Done?
Although the prevalence of undernutrition in India is decreasing, progress is slow. Economic growth is widely regarded as the major way to reduce child undernutrition in India. Economic growth, the argument goes, will increase incomes, reduce poverty, and increase access to health services and nutrition. But some experts believe that better education for women and reduced household sizes might have a greater influence on child undernutrition than economic growth. And others believe that healthier, better fed populations lead to increased economic growth rather than the other way around. In this study, the researchers assess the association between economic growth and child undernutrition in India by analyzing the relationship between changes in per capita income in individual Indian states and the individual risk of undernutrition among children in India.
What Did the Researchers Do and Find?
For their analyses, the researchers used data on 77,326 Indian children that were collected in the 1992–93, 1998–99, and 2005–06 National Family Health Surveys; these surveys are part of the Demographic and Health Surveys, a project that collects health data in developing countries to aid health-system development. The researchers used eight "ecological" statistical models to investigate whether there was an association between underweight, stunting, or wasting and per capita income at the state level in each survey period; these ecological models assumed that the risk of undernutrition was the same for every child in a state. They also used 10 "multilevel" models to quantify the association between state-level growth and the individual-level risk of undernutrition. The multilevel models also took account of various combinations of additional factors likely to affect undernutrition (for example, mother's education and marital status). In five of the ecological models, there was no statistically significant association between state economic growth and average levels of child undernutrition at the state level (statistically significant associations are unlikely to have arisen by chance). Similarly, in eight of the multilevel models, there was no statistical evidence for an association between economic growth and undernutrition.
What Do These Findings Mean?
These findings provide little statistical support for the widely held assumption that there is an association between the risk of child undernutrition and economic growth in India. By contrast, a previous study that used data from 63 countries collected over 26 years did find evidence that national economic growth was inversely associated with the risk of child undernutrition. However, this study was an ecological study and did not, therefore, allow for the possibility that the risk of undernutrition might vary between children in one state and between states. Further, the target of inference in this study was "explaining" between-country differences, while the target of inference in this analysis was explaining within country differences over time. The researchers suggest several reasons why there might not be a clear association between economic growth and undernutrition in India. For example, they suggest, economic growth in India might have only benefitted privileged sections of society. Whether this or an alternative explanation accounts for the lack of an association, it seems likely that further reductions in the prevalence of child undernutrition in India (and possibly in other developing countries) will require direct investment in health and health-related programs; expecting economic growth to improve child undernutrition might not be a viable option after all.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000424.
The charity UNICEF, which protects the rights of children and young people around the world, provides detailed statistics on child undernutrition and on child nutrition and undernutrition in India
The WHO Child Growth Standards are available (in several languages)
More information on the Demographic and Health Surveys and on the Indian National Family Health Surveys is available
The United Nations Millennium Development Goals website provides information on ongoing world efforts to reduce hunger and child mortality
doi:10.1371/journal.pmed.1000424
PMCID: PMC3050933  PMID: 21408084
19.  Adding propensity scores to pure prediction models fails to improve predictive performance 
PeerJ  2013;1:e123.
Background. Propensity score usage seems to be growing in popularity leading researchers to question the possible role of propensity scores in prediction modeling, despite the lack of a theoretical rationale. It is suspected that such requests are due to the lack of differentiation regarding the goals of predictive modeling versus causal inference modeling. Therefore, the purpose of this study is to formally examine the effect of propensity scores on predictive performance. Our hypothesis is that a multivariable regression model that adjusts for all covariates will perform as well as or better than those models utilizing propensity scores with respect to model discrimination and calibration.
Methods. The most commonly encountered statistical scenarios for medical prediction (logistic and proportional hazards regression) were used to investigate this research question. Random cross-validation was performed 500 times to correct for optimism. The multivariable regression models adjusting for all covariates were compared with models that included adjustment for or weighting with the propensity scores. The methods were compared based on three predictive performance measures: (1) concordance indices; (2) Brier scores; and (3) calibration curves.
Results. Multivariable models adjusting for all covariates had the highest average concordance index, the lowest average Brier score, and the best calibration. Propensity score adjustment and inverse probability weighting models without adjustment for all covariates performed worse than full models and failed to improve predictive performance with full covariate adjustment.
Conclusion. Propensity score techniques did not improve prediction performance measures beyond multivariable adjustment. Propensity scores are not recommended if the analytical goal is pure prediction modeling.
doi:10.7717/peerj.123
PMCID: PMC3740143  PMID: 23940836
Prediction; Propensity score; Calibration curve; Concordance index; Multivariable regression
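Two of the performance measures named in this abstract, the Brier score and the concordance index, can be computed as sketched below for a binary outcome. This is a minimal illustration with hypothetical names and data; the study also used proportional hazards models, for which a censoring-aware concordance index would be needed and is not shown here.

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(observed)


def concordance_index(predicted, observed):
    """Share of (event, non-event) pairs in which the event has the higher prediction.

    Tied predictions count as one half. Assumes a binary outcome with at
    least one event and one non-event.
    """
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))


# Hypothetical predictions and outcomes.
probs = [0.9, 0.2, 0.6, 0.4]
outcomes = [1, 0, 1, 0]
bs = brier_score(probs, outcomes)
c = concordance_index(probs, outcomes)
```

Lower Brier scores and higher concordance indices indicate better predictive performance, which is the sense in which the abstract compares full multivariable models with propensity score models.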
20.  Determinants of Medicare All-Cause Costs Among Elderly Patients with Renal Cell Carcinoma 
Journal of Managed Care Pharmacy  2011;17(8):610-620.
BACKGROUND
Renal cell carcinoma (RCC) is the third most common genitourinary cancer and the most common primary renal neoplasm. Estimates of the economic burden of RCC in the United States range from approximately $400 million (in year 2000 dollars) to $4.4 billion (in year 2005 dollars). Actual costs associated with RCC, particularly for elderly Medicare patients who account for 46% of U.S. patients hospitalized for RCC, are poorly understood.
OBJECTIVE
To estimate all-cause health care costs associated with RCC using the combined Surveillance Epidemiology and End Results (SEER)-Medicare database.
METHODS
The sample was limited to non-HMO patients aged 65 years or older who were diagnosed with a first primary RCC (SEER site recode 59, kidney and renal pelvis) between 1995 and 2002. Our final sample included 4,938 patients with RCC and 9,876 non-HMO noncancer comparison group cases without chronic renal disease drawn from the SEER 5% Medicare sample and matched by a propensity score calculated from age, gender, race/ethnicity, and comorbidities. Costs were defined as payments made by Medicare for all-cause medical treatments including inpatient stays, emergency room visits, outpatient procedures, office visits, home health visits, durable medical equipment, and hospice care, but excluding out-patient prescription drugs. Using the method of Bang and Tsiatis (2000), we estimated cumulative costs at 1 and 5 years by estimating average costs for each patient in each month up to 60 months following diagnosis. Total costs were weighted sums of monthly costs, where weights were the inverse probability that the patient was not censored, and inverse probabilities were estimated by Kaplan-Meier estimates of time to censoring. Using the method of Lin (2000), we performed multivariate analyses of costs by fitting each of the 60 monthly costs to linear models that controlled for demographic characteristics and comorbidities. Marginal effects of covariates on 1- and 5-year costs were obtained by summing the coefficients for months 1 through 12 and months 1 through 60, respectively. Confidence intervals were obtained by bootstrapping.
RESULTS
Patients with RCC and matched comparison group cases had similar demographic characteristics, comorbidities, and chronic conditions. At the start of the fifth year post-diagnosis, there were 1,208 Medicare RCC cases of the original 4,938 (20.8%). Mean costs per patient per month (PPPM) in the first year were $3,673 for patients with RCC and $793 for comparison group patients. PPPM costs were higher for RCC patients with more advanced stage (i.e., regional or distant) disease. Average cumulative total costs for RCC patients were $33,605 per patient in the first year following diagnosis and $59,397 per patient in the first 5 years following diagnosis. Several patient-specific factors were associated with 1- and 5-year costs in multivariate analyses, including age, race/ethnicity, and comorbidities. Among RCC patients, treatment with surgery and radiation was associated with higher costs per patient than treatment with surgery alone at 1 year ($24,556, 95% CI = $16,673–$32,940) and 5 years ($30,540, 95% CI = $17,853–$43,648). RCC patients who received chemotherapy as part of their treatment regimen also had significantly higher costs per patient than those who received surgery alone at 1 year ($15,144, 95% CI = $9,979–$20,344) and 5 years ($13,440, 95% CI = $1,257–$27,572).
CONCLUSIONS
Newly diagnosed RCC is associated with a significant economic burden, which is largely determined by several patient characteristics, disease stage, and treatment choice.
PMCID: PMC3350946  PMID: 21942302
21.  Constructing Inverse Probability Weights for Marginal Structural Models 
American Journal of Epidemiology  2008;168(6):656-664.
The method of inverse probability weighting (henceforth, weighting) can be used to adjust for measured confounding and selection bias under the four assumptions of consistency, exchangeability, positivity, and no misspecification of the model used to estimate weights. In recent years, several published estimates of the effect of time-varying exposures have been based on weighted estimation of the parameters of marginal structural models because, unlike standard statistical methods, weighting can appropriately adjust for measured time-varying confounders affected by prior exposure. As an example, the authors describe the last three assumptions using the change in viral load due to initiation of antiretroviral therapy among 918 human immunodeficiency virus-infected US men and women followed for a median of 5.8 years between 1996 and 2005. The authors describe possible tradeoffs that an epidemiologist may encounter when attempting to make inferences. For instance, a tradeoff between bias and precision is illustrated as a function of the extent to which confounding is controlled. Weight truncation is presented as an informal and easily implemented method to deal with these tradeoffs. Inverse probability weighting provides a powerful methodological tool that may uncover causal effects of exposures that are otherwise obscured. However, as with all methods, diagnostics and sensitivity analyses are essential for proper use.
doi:10.1093/aje/kwn164
PMCID: PMC2732954  PMID: 18682488
bias (epidemiology); causality; confounding factors (epidemiology); probability weighting; regression model
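The weighting and truncation ideas in this abstract can be sketched for the simplest case of a single binary exposure measured once. The abstract concerns time-varying exposures, where weights are products over time; that extension is not shown. The stabilized form (marginal exposure probability in the numerator), the nearest-rank percentile rule, and the 1st/99th percentile defaults are assumptions made for illustration.

```python
import math


def stabilized_weights(exposure, propensity):
    """Stabilized inverse probability weights for a binary point exposure.

    exposure: list of 0/1 flags; propensity: estimated P(exposed | covariates).
    """
    p_exposed = sum(exposure) / len(exposure)
    return [
        p_exposed / e if a == 1 else (1 - p_exposed) / (1 - e)
        for a, e in zip(exposure, propensity)
    ]


def percentile(values, pct):
    """Nearest-rank percentile (one simple convention among several)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def truncate_weights(weights, lower_pct=1, upper_pct=99):
    """Reset weights outside the chosen percentiles to the percentile values."""
    lo = percentile(weights, lower_pct)
    hi = percentile(weights, upper_pct)
    return [min(max(w, lo), hi) for w in weights]


# Hypothetical exposure indicators and estimated propensity scores.
w = stabilized_weights([1, 1, 0, 0], [0.5, 0.25, 0.5, 0.1])
w_truncated = truncate_weights([1, 2, 3, 100], lower_pct=25, upper_pct=75)
```

Tighter truncation reduces the influence of extreme weights and so tends to improve precision, but it also leaves more confounding unadjusted, which is the bias and precision tradeoff the abstract describes.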
22.  Using imputed pre-treatment cholesterol in a propensity score model to reduce confounding by indication: results from the multi-ethnic study of atherosclerosis 
Background
Studying the effects of medications on endpoints in an observational setting is an important yet challenging problem due to confounding by indication. The purpose of this study is to describe methodology for estimating such effects while including prevalent medication users. These techniques are illustrated in models relating statin use to cardiovascular disease (CVD) in a large multi-ethnic cohort study.
Methods
The Multi-Ethnic Study of Atherosclerosis (MESA) includes 6814 participants aged 45-84 years free of CVD. Confounding by indication was mitigated using a two-step approach: First, the untreated values of cholesterol were treated as missing data and the values imputed as a function of the observed treated value, dose and type of medication, and participant characteristics. Second, we constructed a propensity score modeling the probability of medication initiation as a function of measured covariates and estimated pre-treatment cholesterol value. The effects of statins on CVD endpoints were assessed using weighted Cox proportional hazard models using inverse probability weights based on the propensity score.
Results
Based on a meta-analysis of randomized controlled trials (RCT) statins are associated with a reduced risk of CVD (relative risk ratio = 0.73, 95% CI: 0.70, 0.77). In an unweighted Cox model adjusting for traditional risk factors we observed little association of statins with CVD (hazard ratio (HR) = 0.97, 95% CI: 0.60, 1.59). Using weights based on a propensity model for statins that did not include the estimated pre-treatment cholesterol we observed a slight protective association (HR = 0.92, 95% CI: 0.54-1.57). Results were similar using a new-user design where prevalent users of statins are excluded (HR = 0.91, 95% CI: 0.45-1.80). Using weights based on a propensity model with estimated pre-treatment cholesterol the effects of statins (HR = 0.74, 95% CI: 0.38, 1.42) were consistent with the RCT literature.
Conclusions
The imputation of pre-treated cholesterol levels for participants on medication at baseline in conjunction with a propensity score yielded estimates that were consistent with the RCT literature. These techniques could be useful in any example where inclusion of participants exposed at baseline in the analysis is desirable, and reasonable estimates of pre-exposure biomarker values can be estimated.
doi:10.1186/1471-2288-13-81
PMCID: PMC3694006  PMID: 23800038
Multiple imputation; Confounding by indication; Propensity score; Inverse probability of treatment weights; Statins
23.  A comparison of perioperative outcomes of Video-Assisted Thoracic Surgical (VATS) Lobectomy with open thoracotomy and lobectomy: Results of an analysis using propensity score based weighting 
Background
Randomized trials comparing VATS lobectomy to open lobectomy are of small size. We analyzed a case-control series using propensity score-weighting to adjust for important covariates in order to compare the clinical outcomes of the two techniques.
Methods
We compared patients undergoing lobectomy for clinical stage I lung cancer (NSCLC) by either VATS or open (THOR) methods. Inverse probability of treatment weighted estimators, with weights derived from propensity scores, were used to adjust cohorts for determinants of perioperative morbidity and mortality including age, gender, preop FEV1, ASA class, and Charlson Comorbidity Index (CCI). Bootstrap methods provided standard errors. Endpoints were postoperative stay (LOS), chest tube duration, complications, and lymph node retrieval.
Results
We analyzed 136 consecutive lobectomy patients. Operative mortality was 1/62 (1.6%) for THOR and 1/74 (1.4%) for VATS, P = 1.00. 5/74 (6.7%) VATS were converted to open procedures. Adjusted median LOS was 7 days (THOR) versus 4 days (VATS), P < 0.0001, HR = 0.33. Adjusted median chest tube duration (days) was 5 (THOR) versus 3 (VATS), P < 0.0001, HR = 0.42. Complication rates were 39% (THOR) versus 34% (VATS), P = 0.61. Adjusted mean number of lymph nodes dissected per patient was 18.1 (THOR) versus 14.8 (VATS), p = 0.17.
Conclusions
After balancing covariates that affect morbidity, mortality and LOS in this case-control series using propensity-weighting, the results confirm that VATS lobectomy is associated with a statistically significant shorter LOS, similar mortality and complication rates and similar rates of lymph node removal in patients with clinical stage I NSCLC.
doi:10.1186/1750-1164-4-1
PMCID: PMC2848683  PMID: 20307297
24.  An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies 
Multivariate Behavioral Research  2011;46(3):399-424.
The propensity score is the probability of treatment assignment conditional on observed baseline characteristics. The propensity score allows one to design and analyze an observational (nonrandomized) study so that it mimics some of the particular characteristics of a randomized controlled trial. In particular, the propensity score is a balancing score: conditional on the propensity score, the distribution of observed baseline covariates will be similar between treated and untreated subjects. I describe 4 different propensity score methods: matching on the propensity score, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, and covariate adjustment using the propensity score. I describe balance diagnostics for examining whether the propensity score model has been adequately specified. Furthermore, I discuss differences between regression-based methods and propensity score-based methods for the analysis of observational data. I describe different causal average treatment effects and their relationship with propensity score analyses.
doi:10.1080/00273171.2011.568786
PMCID: PMC3144483  PMID: 21818162
25.  Overadjustment Bias and Unnecessary Adjustment in Epidemiologic Studies 
Epidemiology (Cambridge, Mass.)  2009;20(4):488-495.
Overadjustment is defined inconsistently. This term is meant to describe control (eg, by regression adjustment, stratification, or restriction) for a variable that either increases net bias or decreases precision without affecting bias. We define overadjustment bias as control for an intermediate variable (or a descending proxy for an intermediate variable) on a causal path from exposure to outcome. We define unnecessary adjustment as control for a variable that does not affect bias of the causal relation between exposure and outcome but may affect its precision. We use causal diagrams and an empirical example (the effect of maternal smoking on neonatal mortality) to illustrate and clarify the definition of overadjustment bias, and to distinguish overadjustment bias from unnecessary adjustment. Using simulations, we quantify the amount of bias associated with overadjustment. Moreover, we show that this bias is based on a different causal structure from confounding or selection biases. Overadjustment bias is not a finite sample bias, while inefficiencies due to control for unnecessary variables are a function of sample size.
doi:10.1097/EDE.0b013e3181a819a1
PMCID: PMC2744485  PMID: 19525685
