PMCC PMCC

Search tips
Search criteria

Advanced
Results 1-25 (496509)

Clipboard (0)
None

Related Articles

1.  Collaborative Double Robust Targeted Maximum Likelihood Estimation* 
Collaborative double robust targeted maximum likelihood estimators represent a fundamental further advance over standard targeted maximum likelihood estimators of a pathwise differentiable parameter of a data generating distribution in a semiparametric model, introduced in van der Laan, Rubin (2006). The targeted maximum likelihood approach involves fluctuating an initial estimate of a relevant factor (Q) of the density of the observed data, in order to make a bias/variance tradeoff targeted towards the parameter of interest. The fluctuation involves estimation of a nuisance parameter portion of the likelihood, g. TMLE has been shown to be consistent and asymptotically normally distributed (CAN) under regularity conditions, when either one of these two factors of the likelihood of the data is correctly specified, and it is semiparametric efficient if both are correctly specified.
In this article we provide a template for applying collaborative targeted maximum likelihood estimation (C-TMLE) to the estimation of pathwise differentiable parameters in semi-parametric models. The procedure creates a sequence of candidate targeted maximum likelihood estimators based on an initial estimate for Q coupled with a succession of increasingly non-parametric estimates for g. In a departure from current state of the art nuisance parameter estimation, C-TMLE estimates of g are constructed based on a loss function for the targeted maximum likelihood estimator of the relevant factor Q that uses the nuisance parameter to carry out the fluctuation, instead of a loss function for the nuisance parameter itself. Likelihood-based cross-validation is used to select the best estimator among all candidate TMLE estimators of Q0 in this sequence. A penalized-likelihood loss function for Q is suggested when the parameter of interest is borderline-identifiable.
We present theoretical results for “collaborative double robustness,” demonstrating that the collaborative targeted maximum likelihood estimator is CAN even when Q and g are both mis-specified, providing that g solves a specified score equation implied by the difference between the Q and the true Q0. This marks an improvement over the current definition of double robustness in the estimating equation literature.
We also establish an asymptotic linearity theorem for the C-DR-TMLE of the target parameter, showing that the C-DR-TMLE is more adaptive to the truth, and, as a consequence, can even be super efficient if the first stage density estimator does an excellent job itself with respect to the target parameter.
This research provides a template for targeted efficient and robust loss-based learning of a particular target feature of the probability distribution of the data within large (infinite dimensional) semi-parametric models, while still providing statistical inference in terms of confidence intervals and p-values. This research also breaks with a taboo (e.g., in the propensity score literature in the field of causal inference) on using the relevant part of likelihood to fine-tune the fitting of the nuisance parameter/censoring mechanism/treatment mechanism.
doi:10.2202/1557-4679.1181
PMCID: PMC2898626  PMID: 20628637
asymptotic linearity; coarsening at random; causal effect; censored data; crossvalidation; collaborative double robust; double robust; efficient influence curve; estimating function; estimator selection; influence curve; G-computation; locally efficient; loss-function; marginal structural model; maximum likelihood estimation; model selection; pathwise derivative; semiparametric model; sieve; super efficiency; super-learning; targeted maximum likelihood estimation; targeted nuisance parameter estimator selection; variable importance
2.  A Targeted Maximum Likelihood Estimator for Two-Stage Designs 
We consider two-stage sampling designs, including so-called nested case control studies, where one takes a random sample from a target population and completes measurements on each subject in the first stage. The second stage involves drawing a subsample from the original sample, collecting additional data on the subsample. This data structure can be viewed as a missing data structure on the full-data structure collected in the second-stage of the study. Methods for analyzing two-stage designs include parametric maximum likelihood estimation and estimating equation methodology. We propose an inverse probability of censoring weighted targeted maximum likelihood estimator (IPCW-TMLE) in two-stage sampling designs and present simulation studies featuring this estimator.
doi:10.2202/1557-4679.1217
PMCID: PMC3083136  PMID: 21556285
two-stage designs; targeted maximum likelihood estimators; nested case control studies; double robust estimation
3.  The Relative Performance of Targeted Maximum Likelihood Estimators 
There is an active debate in the literature on censored data about the relative performance of model based maximum likelihood estimators, IPCW-estimators, and a variety of double robust semiparametric efficient estimators. Kang and Schafer (2007) demonstrate the fragility of double robust and IPCW-estimators in a simulation study with positivity violations. They focus on a simple missing data problem with covariates where one desires to estimate the mean of an outcome that is subject to missingness. Responses by Robins, et al. (2007), Tsiatis and Davidian (2007), Tan (2007) and Ridgeway and McCaffrey (2007) further explore the challenges faced by double robust estimators and offer suggestions for improving their stability. In this article, we join the debate by presenting targeted maximum likelihood estimators (TMLEs). We demonstrate that TMLEs that guarantee that the parametric submodel employed by the TMLE procedure respects the global bounds on the continuous outcomes, are especially suitable for dealing with positivity violations because in addition to being double robust and semiparametric efficient, they are substitution estimators. We demonstrate the practical performance of TMLEs relative to other estimators in the simulations designed by Kang and Schafer (2007) and in modified simulations with even greater estimation challenges.
doi:10.2202/1557-4679.1308
PMCID: PMC3173607  PMID: 21931570
censored data; collaborative double robustness; collaborative targeted maximum likelihood estimation; double robust; estimator selection; inverse probability of censoring weighting; locally efficient estimation; maximum likelihood estimation; semiparametric model; targeted maximum likelihood estimation; targeted minimum loss based estimation; targeted nuisance parameter estimator selection
4.  Estimation of a non-parametric variable importance measure of a continuous exposure 
We define a new measure of variable importance of an exposure on a continuous outcome, accounting for potential confounders. The exposure features a reference level x0 with positive mass and a continuum of other levels. For the purpose of estimating it, we fully develop the semi-parametric estimation methodology called targeted minimum loss estimation methodology (TMLE) [23, 22]. We cover the whole spectrum of its theoretical study (convergence of the iterative procedure which is at the core of the TMLE methodology; consistency and asymptotic normality of the estimator), practical implementation, simulation study and application to a genomic example that originally motivated this article. In the latter, the exposure X and response Y are, respectively, the DNA copy number and expression level of a given gene in a cancer cell. Here, the reference level is x0 = 2, that is the expected DNA copy number in a normal cell. The confounder is a measure of the methylation of the gene. The fact that there is no clear biological indication that X and Y can be interpreted as an exposure and a response, respectively, is not problematic.
doi:10.1214/12-EJS703
PMCID: PMC3546832  PMID: 23336014
Variable importance measure; non-parametric estimation; targeted minimum loss estimation; robustness; asymptotics
5.  Targeted Maximum Likelihood Estimation of Effect Modification Parameters in Survival Analysis 
The Cox proportional hazards model or its discrete time analogue, the logistic failure time model, posit highly restrictive parametric models and attempt to estimate parameters which are specific to the model proposed. These methods are typically implemented when assessing effect modification in survival analyses despite their flaws. The targeted maximum likelihood estimation (TMLE) methodology is more robust than the methods typically implemented and allows practitioners to estimate parameters that directly answer the question of interest. TMLE will be used in this paper to estimate two newly proposed parameters of interest that quantify effect modification in the time to event setting. These methods are then applied to the Tshepo study to assess if either gender or baseline CD4 level modify the effect of two cART therapies of interest, efavirenz (EFV) and nevirapine (NVP), on the progression of HIV. The results show that women tend to have more favorable outcomes using EFV while males tend to have more favorable outcomes with NVP. Furthermore, EFV tends to be favorable compared to NVP for individuals at high CD4 levels.
doi:10.2202/1557-4679.1307
PMCID: PMC3083138  PMID: 21556287
causal effect; semi-parametric; censored longitudinal data; double robust; efficient influence curve; influence curve; G-computation; Targeted Maximum Likelihood Estimation; Cox-proportional hazards; survival analysis
6.  Dimension reduction with gene expression data using targeted variable importance measurement 
BMC Bioinformatics  2011;12:312.
Background
When a large number of candidate variables are present, a dimension reduction procedure is usually conducted to reduce the variable space before the subsequent analysis is carried out. The goal of dimension reduction is to find a list of candidate genes with a more operable length ideally including all the relevant genes. Leaving many uninformative genes in the analysis can lead to biased estimates and reduced power. Therefore, dimension reduction is often considered a necessary predecessor of the analysis because it can not only reduce the cost of handling numerous variables, but also has the potential to improve the performance of the downstream analysis algorithms.
Results
We propose a TMLE-VIM dimension reduction procedure based on the variable importance measurement (VIM) in the frame work of targeted maximum likelihood estimation (TMLE). TMLE is an extension of maximum likelihood estimation targeting the parameter of interest. TMLE-VIM is a two-stage procedure. The first stage resorts to a machine learning algorithm, and the second step improves the first stage estimation with respect to the parameter of interest.
Conclusions
We demonstrate with simulations and data analyses that our approach not only enjoys the prediction power of machine learning algorithms, but also accounts for the correlation structures among variables and therefore produces better variable rankings. When utilized in dimension reduction, TMLE-VIM can help to obtain the shortest possible list with the most truly associated variables.
doi:10.1186/1471-2105-12-312
PMCID: PMC3166941  PMID: 21849016
7.  Covariate adjustment in randomized trials with binary outcomes: Targeted maximum likelihood estimation 
Statistics in medicine  2009;28(1):39-64.
SUMMARY
Covariate adjustment using linear models for continuous outcomes in randomized trials has been shown to increase efficiency and power over the unadjusted method in estimating the marginal effect of treatment. However, for binary outcomes, investigators generally rely on the unadjusted estimate as the literature indicates that covariate-adjusted estimates based on the logistic regression models are less efficient. The crucial step that has been missing when adjusting for covariates is that one must integrate/average the adjusted estimate over those covariates in order to obtain the marginal effect. We apply the method of targeted maximum likelihood estimation (tMLE) to obtain estimators for the marginal effect using covariate adjustment for binary outcomes. We show that the covariate adjustment in randomized trials using the logistic regression models can be mapped, by averaging over the covariate(s), to obtain a fully robust and efficient estimator of the marginal effect, which equals a targeted maximum likelihood estimator. This tMLE is obtained by simply adding a clever covariate to a fixed initial regression. We present simulation studies that demonstrate that this tMLE increases efficiency and power over the unadjusted method, particularly for smaller sample sizes, even when the regression model is mis-specified.
doi:10.1002/sim.3445
PMCID: PMC2857590  PMID: 18985634
clinical trails; efficiency; covariate adjustment; variable selection
8.  Usual Physical Activity and Hip Fracture in Older Men: An Application of Semiparametric Methods to Observational Data 
American Journal of Epidemiology  2011;173(5):578-586.
Few studies have examined the relation between usual physical activity level and rate of hip fracture in older men or applied semiparametric methods from the causal inference literature that estimate associations without assuming a particular parametric model. Using the Physical Activity Scale for the Elderly, the authors measured usual physical activity level at baseline (2000–2002) in 5,682 US men ≥65 years of age who were enrolled in the Osteoporotic Fractures in Men Study. Physical activity levels were classified as low (bottom quartile of Physical Activity Scale for the Elderly score), moderate (middle quartiles), or high (top quartile). Hip fractures were confirmed by central review. Marginal associations between physical activity and hip fracture were estimated with 3 estimation methods: inverse probability-of-treatment weighting, G-computation, and doubly robust targeted maximum likelihood estimation. During 6.5 years of follow-up, 95 men (1.7%) experienced a hip fracture. The unadjusted risk of hip fracture was lower in men with a high physical activity level versus those with a low physical activity level (relative risk = 0.51, 95% confidence interval: 0.28, 0.92). In semiparametric analyses that controlled confounding, hip fracture risk was not lower with moderate (e.g., targeted maximum likelihood estimation relative risk = 0.92, 95% confidence interval: 0.62, 1.44) or high (e.g., targeted maximum likelihood estimation relative risk = 0.88, 95% confidence interval: 0.53, 2.03) physical activity relative to low. This study does not support a protective effect of usual physical activity on hip fracture in older men.
doi:10.1093/aje/kwq405
PMCID: PMC3105440  PMID: 21303805
aged; confounding factors (epidemiology); exercise; hip fractures; men; motor activity; prospective studies
9.  Population intervention models in causal inference 
Biometrika  2008;95(1):35-47.
Summary
We propose a new causal parameter, which is a natural extension of existing approaches to causal inference such as marginal structural models. Modelling approaches are proposed for the difference between a treatment-specific counterfactual population distribution and the actual population distribution of an outcome in the target population of interest. Relevant parameters describe the effect of a hypothetical intervention on such a population and therefore we refer to these models as population intervention models. We focus on intervention models estimating the effect of an intervention in terms of a difference and ratio of means, called risk difference and relative risk if the outcome is binary. We provide a class of inverse-probability-of-treatment-weighted and doubly-robust estimators of the causal parameters in these models. The finite-sample performance of these new estimators is explored in a simulation study.
PMCID: PMC2464276  PMID: 18629347
Attributable risk; Causal inference; Confounding; Counterfactual; Doubly-robust estimation; G-computation estimation; Inverse-probability-of-treatment-weighted estimation
10.  HANDLING MISSING DATA BY DELETING COMPLETELY OBSERVED RECORDS 
When data are missing, analyzing records that are completely observed may cause bias or inefficiency. Existing approaches in handling missing data include likelihood, imputation and inverse probability weighting. In this paper, we propose three estimators inspired by deleting some completely observed data in the regression setting. First, we generate artificial observation indicators that are independent of outcome given the observed data and draw inferences conditioning on the artificial observation indicators. Second, we propose a closely related weighting method. The proposed weighting method has more stable weights than those of the inverse probability weighting method (Zhao and Lipsitz, 1992). Third, we improve the efficiency of the proposed weighting estimator by subtracting the projection of the estimating function onto the nuisance tangent space. When data are missing completely at random, we show that the proposed estimators have asymptotic variances smaller than or equal to the variance of the estimator obtained from using completely observed records only. Asymptotic relative efficiency computation and simulation studies indicate that the proposed weighting estimators are more efficient than the inverse probability weighting estimators under wide range of practical situations especially when when the missingness proportion is large.
doi:10.1016/j.jspi.2008.10.024
PMCID: PMC2674251  PMID: 20160863
11.  Insights into Different Results from Different Causal Contrasts in the Presence of Effect-Measure Modification 
Purpose
Both propensity score (PS) matching and inverse probability of treatment weighting (IPTW) allow causal contrasts, albeit different ones. In the presence of effect-measure modification, different analytic approaches produce different summary estimates.
Methods
We present a spreadsheet example that assumes a dichotomous exposure, covariate, and outcome. The covariate can be a confounder or not and a modifier of the relative risk (RR) or not. Based on expected cell counts, we calculate RR estimates using five summary estimators: Mantel-Haenszel (MH), maximum likelihood (ML), the standardized mortality ratio (SMR), PS matching, and a common implementation of IPTW.
Results
Without effect-measure modification, all approaches produce identical results. In the presence of effect-measure modification and regardless of the presence of confounding, results from the SMR and PS are identical, but IPTW can produce strikingly different results (e.g. RR=0.83 vs. RR=1.50). In such settings, MH and ML do not estimate a population parameter and results for those measures fall between PS and IPTW.
Conclusions
Discrepancies between PS and IPTW reflect different weighting of stratum specific effect estimates. SMR and PS matching assign weight according to the distribution of the effect-measure modifier in the exposed subpopulation, whereas IPTW assigns weights according to the distribution of the entire study population. In pharmacoepidemiology, contraindications to treatment that also modify the effect might be prevalent in the population, but would be rare among the exposed. In such settings, estimating the effect of exposure in the exposed rather than the whole population is preferable.
doi:10.1002/pds.1231
PMCID: PMC1581494  PMID: 16528796
epidemiologic methods; confounding factors (epidemiology); bias (epidemiology); effect measure modification; interaction; propensity score; inverse probability of treatment weighting; standardized mortality ratio; Mantel-Haenszel; maximum likelihood
12.  A Partially Linear Regression Model for Data from an Outcome-Dependent Sampling Design 
Summary
The outcome dependent sampling scheme has been gaining attention in both the statistical literature and applied fields. Epidemiological and environmental researchers have been using it to select the observations for more powerful and cost-effective studies. Motivated by a study of the effect of in utero exposure to polychlorinated biphenyls on children’s IQ at age 7, in which the effect of an important confounding variable is nonlinear, we consider a semi-parametric regression model for data from an outcome-dependent sampling scheme where the relationship between the response and covariates is only partially parameterized. We propose a penalized spline maximum likelihood estimation (PSMLE) for inference on both the parametric and the nonparametric components and develop their asymptotic properties. Through simulation studies and an analysis of the IQ study, we compare the proposed estimator with several competing estimators. Practical considerations of implementing those estimators are discussed.
doi:10.1111/j.1467-9876.2010.00756.x
PMCID: PMC3181132  PMID: 21966030
Outcome dependent sampling; Estimated likelihood; Semiparametric method; Penalized spline
13.  Robust Model-Free Multiclass Probability Estimation 
Classical statistical approaches for multiclass probability estimation are typically based on regression techniques such as multiple logistic regression, or density estimation approaches such as linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). These methods often make certain assumptions on the form of probability functions or on the underlying distributions of subclasses. In this article, we develop a model-free procedure to estimate multiclass probabilities based on large-margin classifiers. In particular, the new estimation scheme is employed by solving a series of weighted large-margin classifiers and then systematically extracting the probability information from these multiple classification rules. A main advantage of the proposed probability estimation technique is that it does not impose any strong parametric assumption on the underlying distribution and can be applied for a wide range of large-margin classification methods. A general computational algorithm is developed for class probability estimation. Furthermore, we establish asymptotic consistency of the probability estimates. Both simulated and real data examples are presented to illustrate competitive performance of the new approach and compare it with several other existing methods.
doi:10.1198/jasa.2010.tm09107
PMCID: PMC2990887  PMID: 21113386
Fisher consistency; Hard classification; Multicategory classification; Probability estimation; Soft classification; SVM
14.  Propensity Score-based Sensitivity Analysis Method for Uncontrolled Confounding 
American Journal of Epidemiology  2011;174(3):345-353.
The authors developed a sensitivity analysis method to address the issue of uncontrolled confounding in observational studies. In this method, the authors use a 1-dimensional function of the propensity score, which they refer to as the sensitivity function (SF), to quantify the hidden bias due to unmeasured confounders. The propensity score is defined as the conditional probability of being treated given the measured covariates. Then the authors construct SF-corrected inverse-probability-weighted estimators to draw inference on the causal treatment effect. This approach allows analysts to conduct a comprehensive sensitivity analysis in a straightforward manner by varying sensitivity assumptions on both the functional form and the coefficients in the 1-dimensional SF. Furthermore, 1-dimensional continuous functions can be well approximated by low-order polynomial structures (e.g., linear, quadratic). Therefore, even if the imposed SF is practically certain to be incorrect, one can still hope to obtain valuable information on treatment effects by conducting a comprehensive sensitivity analysis using polynomial SFs with varying orders and coefficients. The authors demonstrate the new method by implementing it in an asthma study which evaluates the effect of clinician prescription patterns regarding inhaled corticosteroids for children with persistent asthma on selected clinical outcomes.
doi:10.1093/aje/kwr096
PMCID: PMC3202161  PMID: 21659349
confounding factors (epidemiology); inverse probability weighting; propensity score; sensitivity analysis; sensitivity function; uncontrolled confounding
15.  Repeated Measures Semiparametric Regression Using Targeted Maximum Likelihood Methodology with Application to Transcription Factor Activity Discovery 
In longitudinal and repeated measures data analysis, often the goal is to determine the effect of a treatment or aspect on a particular outcome (e.g., disease progression). We consider a semiparametric repeated measures regression model, where the parametric component models effect of the variable of interest and any modification by other covariates. The expectation of this parametric component over the other covariates is a measure of variable importance. Here, we present a targeted maximum likelihood estimator of the finite dimensional regression parameter, which is easily estimated using standard software for generalized estimating equations.
The targeted maximum likelihood method provides double robust and locally efficient estimates of the variable importance parameters and inference based on the influence curve. We demonstrate these properties through simulation under correct and incorrect model specification, and apply our method in practice to estimating the activity of transcription factor (TF) over cell cycle in yeast. We specifically target the importance of SWI4, SWI6, MBP1, MCM1, ACE2, FKH2, NDD1, and SWI5.
The semiparametric model allows us to determine the importance of a TF at specific time points by specifying time indicators as potential effect modifiers of the TF. Our results are promising, showing significant importance trends during the expected time periods. This methodology can also be used as a variable importance analysis tool to assess the effect of a large number of variables such as gene expressions or single nucleotide polymorphisms.
doi:10.2202/1544-6115.1553
PMCID: PMC3122882  PMID: 21291412
targeted maximum likelihood; semiparametric; repeated measures; longitudinal; transcription factors
16.  A Marginal Structural Modeling Approach to Assess the Cumulative Effect of Drug Treatment on the Later Drug Use Abstinence 
Journal of drug issues  2010;40(1):221-240.
In this article, we applied a marginal structural model (MSM) to estimate the effect on later drug use of drug treatments occurring over 10 years following first use of the primary drug. The study was based on the longitudinal data that were collected in three projects among 421 subjects and covered 15 years since first use of their primary drug. The cumulative treatment effect was estimated by the inverse-probability of treatment weighted estimators of MSM as well as the traditional regression analysis. Contrary to the traditional regression analysis, results of the MSM showed that the cumulative treatment occurring over the 10 years significantly increased the likelihood of drug use abstinence in the subsequent 5-year period. From both the statistical and empirical point of view, MSM is a better approach to assessing cumulative treatment effects, considering its advantage of controlling for self-selection bias over time.
PMCID: PMC3090640  PMID: 21566677
Cumulative treatment effect; causal inference; regression; marginal structural model
17.  Empirical-likelihood-based semiparametric inference for the treatment effect in the two-sample problem with censoring 
Biometrika  2005;92(2):271-282.
Summary
To compare two samples of censored data, we propose a unified semiparametric inference for the parameter of interest when the model for one sample is parametric and that for the other is nonparametric. The parameter of interest may represent, for example, a comparison of means, or survival probabilities. The confidence interval derived from the semiparametric inference, which is based on the empirical likelihood principle, improves its counterpart constructed from the common estimating equation. The empirical likelihood ratio is shown to be asymptotically chi-squared. Simulation experiments illustrate that the method based on the empirical likelihood substantially outperforms the method based on the estimating equation. A real dataset is analysed.
doi:10.1093/biomet/92.2.271
PMCID: PMC2825713  PMID: 20179776
Estimating equation; Confidence interval; Coverage; Kaplan-Meier estimation; Empirical likelihood ratio; Empirical likelihood function
18.  Causal analysis of case-control data 
In a series of papers, Robins and colleagues describe inverse probability of treatment weighted (IPTW) estimation in marginal structural models (MSMs), a method of causal analysis of longitudinal data based on counterfactual principles. This family of statistical techniques is similar in concept to weighting of survey data, except that the weights are estimated using study data rather than defined so as to reflect sampling design and post-stratification to an external population. Several decades ago Miettinen described an elementary method of causal analysis of case-control data based on indirect standardization. In this paper we extend the Miettinen approach using ideas closely related to IPTW estimation in MSMs. The technique is illustrated using data from a case-control study of oral contraceptives and myocardial infarction.
doi:10.1186/1742-5573-3-2
PMCID: PMC1431532  PMID: 16441879
19.  A Tutorial and Case Study in Propensity Score Analysis: An Application to Estimating the Effect of In-Hospital Smoking Cessation Counseling on Mortality 
Multivariate behavioral research  2011;46(1):119-151.
Propensity score methods allow investigators to estimate causal treatment effects using observational or nonrandomized data. In this article we provide a practical illustration of the appropriate steps in conducting propensity score analyses. For illustrative purposes, we use a sample of current smokers who were discharged alive after being hospitalized with a diagnosis of acute myocardial infarction. The exposure of interest was receipt of smoking cessation counseling prior to hospital discharge and the outcome was mortality with 3 years of hospital discharge. We illustrate the following concepts: first, how to specify the propensity score model; second, how to match treated and untreated participants on the propensity score; third, how to compare the similarity of baseline characteristics between treated and untreated participants after stratifying on the propensity score, in a sample matched on the propensity score, or in a sample weighted by the inverse probability of treatment; fourth, how to estimate the effect of treatment on outcomes when using propensity score matching, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, or covariate adjustment using the propensity score. Finally, we compare the results of the propensity score analyses with those obtained using conventional regression adjustment.
doi:10.1080/00273171.2011.540480
PMCID: PMC3266945  PMID: 22287812 CAMSID: cams1834
20.  A Tutorial and Case Study in Propensity Score Analysis: An Application to Estimating the Effect of In-Hospital Smoking Cessation Counseling on Mortality 
Multivariate Behavioral Research  2011;46(1):119-151.
Propensity score methods allow investigators to estimate causal treatment effects using observational or nonrandomized data. In this article we provide a practical illustration of the appropriate steps in conducting propensity score analyses. For illustrative purposes, we use a sample of current smokers who were discharged alive after being hospitalized with a diagnosis of acute myocardial infarction. The exposure of interest was receipt of smoking cessation counseling prior to hospital discharge and the outcome was mortality with 3 years of hospital discharge. We illustrate the following concepts: first, how to specify the propensity score model; second, how to match treated and untreated participants on the propensity score; third, how to compare the similarity of baseline characteristics between treated and untreated participants after stratifying on the propensity score, in a sample matched on the propensity score, or in a sample weighted by the inverse probability of treatment; fourth, how to estimate the effect of treatment on outcomes when using propensity score matching, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, or covariate adjustment using the propensity score. Finally, we compare the results of the propensity score analyses with those obtained using conventional regression adjustment.
doi:10.1080/00273171.2011.540480
PMCID: PMC3266945  PMID: 22287812
21.  On L∞ convergence of Neumann series approximation in missing data problems 
Statistics & probability letters  2010;80(9-10):864-873.
The inverse of the nonparametric information operator is key to finding doubly robust estimators and the semiparametric efficient estimator in missing data problems. It is known that no closed-form expression for the inverse of the nonparametric information operator exists when missing data form nonmonotone patterns. Neumann series is usually applied to approximate the inverse. However, Neumann series approximation is only known to converge in L2 norm, which is not sufficient for establishing statistical properties of the estimators yielded from the approximation. In this article, we show that L∞ convergence of the Neumann series approximations to the inverse of the non-parametric information operator and to the efficient scores in missing data problems can be obtained under very simple conditions. This paves the way to the study of the asymptotic properties of the doubly robust estimators and the locally semiparametric efficient estimator in those difficult situations.
doi:10.1016/j.spl.2010.01.021
PMCID: PMC2850222  PMID: 20383317
Auxiliary information; Induction; Rate of convergence; Weighted estimating equation
22.  Estimating causal effects from epidemiological data 
In ideal randomised experiments, association is causation: association measures can be interpreted as effect measures because randomisation ensures that the exposed and the unexposed are exchangeable. On the other hand, in observational studies, association is not generally causation: association measures cannot be interpreted as effect measures because the exposed and the unexposed are not generally exchangeable. However, observational research is often the only alternative for causal inference. This article reviews a condition that permits the estimation of causal effects from observational data, and two methods—standardisation and inverse probability weighting—to estimate population causal effects under that condition. For simplicity, the main description is restricted to dichotomous variables and assumes that no random error attributable to sampling variability exists. The appendix provides a generalisation of inverse probability weighting.
doi:10.1136/jech.2004.029496
PMCID: PMC2652882  PMID: 16790829
causal inference; confounding; inverse probability weighting; randomisation; standardisation
23.  Why Match? Investigating Matched Case-Control Study Designs with Causal Effect Estimation* 
Matched case-control study designs are commonly implemented in the field of public health. While matching is intended to eliminate confounding, the main potential benefit of matching in case-control studies is a gain in efficiency. Methods for analyzing matched case-control studies have focused on utilizing conditional logistic regression models that provide conditional and not causal estimates of the odds ratio. This article investigates the use of case-control weighted targeted maximum likelihood estimation to obtain marginal causal effects in matched case-control study designs. We compare the use of case-control weighted targeted maximum likelihood estimation in matched and unmatched designs in an effort to explore which design yields the most information about the marginal causal effect. The procedures require knowledge of certain prevalence probabilities and were previously described by van der Laan (2008). In many practical situations where a causal effect is the parameter of interest, researchers may be better served using an unmatched design.
doi:10.2202/1557-4679.1127
PMCID: PMC2827892  PMID: 20231866
24.  Statistical inference on the penetrances of rare genetic mutations based on a case–family design 
Biostatistics (Oxford, England)  2010;11(3):519-532.
We propose a formal statistical inference framework for the evaluation of the penetrance of a rare genetic mutation using family data generated under a kin–cohort type of design, where phenotype and genotype information from first-degree relatives (sibs and/or offspring) of case probands carrying the targeted mutation are collected. Our approach is built upon a likelihood model with some minor assumptions, and it can be used for age-dependent penetrance estimation that permits adjustment for covariates. Furthermore, the derived likelihood allows unobserved risk factors that are correlated within family members. The validity of the approach is confirmed by simulation studies. We apply the proposed approach to estimating the age-dependent cancer risk among carriers of the MSH2 or MLH1 mutation.
doi:10.1093/biostatistics/kxq009
PMCID: PMC2883298  PMID: 20179148
Case–family design; Penetrance; Proportional hazards model; Rare mutation; Unobserved risk factors
25.  Evaluating bias correction in weighted proportional hazards regression 
Lifetime Data Analysis  2008;15(1):120-146.
Often in observational studies of time to an event, the study population is a biased (i.e., unrepresentative) sample of the target population. In the presence of biased samples, it is common to weight subjects by the inverse of their respective selection probabilities. Pan and Schaubel (2008) recently proposed inference procedures for an inverse selection probability weighted (ISPW) Cox model, applicable when selection probabilities are not treated as fixed but estimated empirically. The proposed weighting procedure requires auxiliary data to estimate the weights and is computationally more intense than unweighted estimation. The ignorability of sample selection process in terms of parameter estimators and predictions is often of interest, from several perspectives: e.g., to determine if weighting makes a significant difference to the analysis at hand, which would in turn address whether the collection of auxiliary data was required in future studies; to evaluate previous studies which did not correct for selection bias. In this article, we propose methods to quantify the degree of bias corrected by the weighting procedure in the partial likelihood and Breslow-Aalen estimators. Asymptotic properties of the proposed test statistics are derived. The finite-sample significance level and power are evaluated through simulation. The proposed methods are then applied to data from a national organ failure registry to evaluate the bias in a post kidney transplant survival model.
doi:10.1007/s10985-008-9102-4
PMCID: PMC3367517  PMID: 18958616
Confidence bands; Inverse-selection-probability weights; Observational studies; Proportional hazards model; Selection bias; Wald test

Results 1-25 (496509)