1.  Diagnosing and responding to violations in the positivity assumption 
The assumption of positivity or experimental treatment assignment requires that observed treatment levels vary within confounder strata. This article discusses the positivity assumption in the context of assessing model and parameter-specific identifiability of causal effects. Positivity violations occur when certain subgroups in a sample rarely or never receive some treatments of interest. The resulting sparsity in the data may increase bias with or without an increase in variance and can threaten valid inference. The parametric bootstrap is presented as a tool to assess the severity of such threats and its utility as a diagnostic is explored using simulated and real data. Several approaches for improving the identifiability of parameters in the presence of positivity violations are reviewed. Potential responses to data sparsity include restriction of the covariate adjustment set, use of an alternative projection function to define the target parameter within a marginal structural working model, restriction of the sample, and modification of the target intervention. All of these approaches can be understood as trading off proximity to the initial target of inference for identifiability; we advocate approaching this tradeoff systematically.
PMCID: PMC4107929  PMID: 21030422
experimental treatment assignment; positivity; marginal structural model; inverse probability weight; double robust; causal inference; counterfactual; parametric bootstrap; realistic treatment rule; trimming; stabilised weights; truncation
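As a rough illustration of the idea in this abstract, the sketch below tabulates how often each treatment level is observed within each confounder stratum, flags strata where a treatment is rarely received, and truncates the resulting inverse probability weights. It is a simple empirical check, not the parametric bootstrap diagnostic the article describes. The function names, the toy data, the 0.05 threshold, and the weight cap of 10 are all assumptions made for the example.

```python
from collections import defaultdict


def positivity_table(records, treatments):
    """Count observed treatment levels within each confounder stratum.

    `records` is a list of (stratum, treatment) pairs (hypothetical layout).
    """
    counts = defaultdict(lambda: {a: 0 for a in treatments})
    for stratum, a in records:
        counts[stratum][a] += 1
    return dict(counts)


def flag_violations(table, min_prob=0.05):
    """List (stratum, treatment, empirical probability) with probability < min_prob."""
    flagged = []
    for stratum, by_trt in sorted(table.items()):
        n = sum(by_trt.values())
        for a, k in sorted(by_trt.items()):
            p = k / n
            if p < min_prob:
                flagged.append((stratum, a, p))
    return flagged


def truncated_weights(records, table, cap=10.0):
    """Unstabilized inverse probability weights, truncated at `cap`."""
    weights = []
    for stratum, a in records:
        n = sum(table[stratum].values())
        p = table[stratum][a] / n  # empirical P(A = a | stratum)
        weights.append(min(1.0 / p, cap))
    return weights


# Toy data (assumed): treatment is common among "young", rare among "old".
records = (
    [("young", 1)] * 50 + [("young", 0)] * 50
    + [("old", 1)] * 1 + [("old", 0)] * 99
)
table = positivity_table(records, treatments=(0, 1))
violations = flag_violations(table)
weights = truncated_weights(records, table, cap=10.0)
print(violations)
print(max(weights))
```

In the toy data the single treated subject in the "old" stratum would carry a weight of 100 without truncation. Capping it trades some bias for reduced variance, which is one instance of the identifiability tradeoff the abstract describes.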
2.  Individual Differences in Arsenic Metabolism and Lung Cancer in a Case-Control Study in Cordoba, Argentina 
Toxicology and applied pharmacology  2010;247(2):10.1016/j.taap.2010.06.006.
In humans, ingested inorganic arsenic is metabolized to monomethylarsenic (MMA) then to dimethylarsenic (DMA), although in most people this process is not complete. Previous studies have identified associations between the proportion of urinary MMA (%MMA) and increased risks of several arsenic-related diseases, although none of these reported on lung cancer. In this study, urinary arsenic metabolites were assessed in 45 lung cancer cases and 75 controls from arsenic-exposed areas in Cordoba, Argentina. Folate has also been linked to arsenic-disease susceptibility; thus, an exploratory assessment of associations between single nucleotide polymorphisms in folate metabolizing genes, arsenic methylation, and lung cancer was also conducted. In analyses limited to subjects with metabolite concentrations above detection limits, the mean %MMA was higher in cases than in controls (17.5% versus 14.3%, p = 0.01). The lung cancer odds ratio for subjects with %MMA in the upper tertile compared to those in the lowest tertile was 3.09 (95% CI, 1.08–8.81). Although the study size was too small for a definitive conclusion, there was an indication that lung cancer risks might be highest in those with a high %MMA who also carried cystathionine β-synthase (CBS) rs234709 and rs4920037 variant alleles. This study is the first to report an association between individual differences in arsenic metabolism and lung cancer, a leading cause of arsenic-related mortality. These results add to the increasing body of evidence that variation in arsenic metabolism plays an important role in arsenic-disease susceptibility.
PMCID: PMC3849353  PMID: 20600216
arsenic; lung cancer; drinking water; metabolism
3.  The Relative Performance of Targeted Maximum Likelihood Estimators 
There is an active debate in the literature on censored data about the relative performance of model-based maximum likelihood estimators, IPCW-estimators, and a variety of double robust semiparametric efficient estimators. Kang and Schafer (2007) demonstrate the fragility of double robust and IPCW-estimators in a simulation study with positivity violations. They focus on a simple missing data problem with covariates where one desires to estimate the mean of an outcome that is subject to missingness. Responses by Robins et al. (2007), Tsiatis and Davidian (2007), Tan (2007) and Ridgeway and McCaffrey (2007) further explore the challenges faced by double robust estimators and offer suggestions for improving their stability. In this article, we join the debate by presenting targeted maximum likelihood estimators (TMLEs). We demonstrate that TMLEs whose parametric submodel respects the global bounds on the continuous outcomes are especially suitable for dealing with positivity violations because, in addition to being double robust and semiparametric efficient, they are substitution estimators. We demonstrate the practical performance of TMLEs relative to other estimators in the simulations designed by Kang and Schafer (2007) and in modified simulations with even greater estimation challenges.
PMCID: PMC3173607  PMID: 21931570
censored data; collaborative double robustness; collaborative targeted maximum likelihood estimation; double robust; estimator selection; inverse probability of censoring weighting; locally efficient estimation; maximum likelihood estimation; semiparametric model; targeted maximum likelihood estimation; targeted minimum loss based estimation; targeted nuisance parameter estimator selection
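The sketch below illustrates the bounds problem this abstract refers to. It is not a TMLE and does not reproduce the Kang and Schafer simulations. It only contrasts two inverse probability weighted estimators of a mean under missingness: with one near-violation of positivity, the unnormalized estimator can fall outside the known range of the outcome, while a normalized weighted average cannot. The function names, the four-unit toy data set, and the assumed response probabilities are hypothetical.

```python
def ipw_mean(y, observed, p_obs):
    """Unnormalized (Horvitz-Thompson) IPW estimate of the outcome mean."""
    n = len(y)
    total = sum(yi / pi for yi, oi, pi in zip(y, observed, p_obs) if oi)
    return total / n


def normalized_ipw_mean(y, observed, p_obs):
    """Normalized (Hajek) IPW estimate: a weighted average of observed outcomes."""
    num = sum(yi / pi for yi, oi, pi in zip(y, observed, p_obs) if oi)
    den = sum(1.0 / pi for oi, pi in zip(observed, p_obs) if oi)
    return num / den


# Toy data (assumed): outcomes are known to lie in [0, 1]; None marks a missing outcome.
y = [1.0, 0.5, 0.0, None]
observed = [True, True, True, False]
p_obs = [0.1, 0.9, 0.9, 0.1]  # assumed probabilities of being observed

ht = ipw_mean(y, observed, p_obs)
hajek = normalized_ipw_mean(y, observed, p_obs)
print(ht, hajek)
```

Here the unit observed with probability 0.1 receives weight 10, which pushes the unnormalized estimate above 1 even though every outcome lies in [0, 1]. The normalized version stays in range because it is a convex combination of observed outcomes, a loose analogue of the bound-respecting property the abstract attributes to substitution estimators.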
4.  Association of Genetic Variation in Cystathionine-β-Synthase and Arsenic Metabolism 
Environmental research  2010;110(6):580-587.
Variation in individual susceptibility to arsenic-induced disease may be partially explained by genetic differences in arsenic metabolism. Mounting epidemiological evidence and in vitro studies suggest that methylated arsenic metabolites, particularly monomethylarsonic (MMA3), are more acutely toxic than inorganic arsenic; thus, MMA3 may be the primary toxic arsenic species. To test the role of genetic variation in arsenic metabolism, polymorphisms in genes involved in one-carbon metabolism [methylenetetrahydrofolate reductase (MTHFR), methionine synthase (MTR), cystathionine-β-synthase (CBS), thymidylate synthase (TYMS), dihydrofolate reductase (DHFR), serine hydroxymethyltransferase 1 (SHMT1)] and glutathione biosynthesis [glutathione S-transferase omega 1 (GSTO1)] were examined in an arsenic-exposed population to determine their influence on urinary arsenic metabolite patterns. In 142 subjects in Cordoba Province, Argentina, variant genotypes for CBS rs234709 and rs4920037 SNPs compared with wild-type homozygotes were associated with 24% and 26% increases, respectively, in the mean proportion of arsenic excreted as monomethylarsonic acid (%MMA). This difference is within the range of differences in %MMA seen between people with arsenic-related disease and those without such disease in other studies. Small inverse associations with CBS rs234709 and rs4920037 variants were also found for the mean levels of the proportion of arsenic excreted as dimethylarsinous acid (%DMA). No other genetic associations were found. These findings are the first to suggest that CBS polymorphisms may influence arsenic metabolism in humans and susceptibility to arsenic-related disease.
PMCID: PMC2913479  PMID: 20670920
arsenic; polymorphism; cystathionine-β-synthase; CBS; SNP
5.  Global Gene Expression Profiling of a Population Exposed to a Range of Benzene Levels 
Environmental Health Perspectives  2010;119(5):628-640.
Benzene, an established cause of acute myeloid leukemia (AML), may also cause one or more lymphoid malignancies in humans. Previously, we identified genes and pathways associated with exposure to high (> 10 ppm) levels of benzene through transcriptomic analyses of blood cells from a small number of occupationally exposed workers.
The goals of this study were to identify potential biomarkers of benzene exposure and/or early effects and to elucidate mechanisms relevant to risk of hematotoxicity, leukemia, and lymphoid malignancy in occupationally exposed individuals, many of whom were exposed to benzene levels < 1 ppm, the current U.S. occupational standard.
We analyzed global gene expression in the peripheral blood mononuclear cells of 125 workers exposed to benzene levels ranging from < 1 ppm to > 10 ppm. Study design and analysis with a mixed-effects model minimized potential confounding and experimental variability.
We observed highly significant widespread perturbation of gene expression at all exposure levels. The AML pathway was among the pathways most significantly associated with benzene exposure. Immune response pathways were associated with most exposure levels, potentially providing biological plausibility for an association between lymphoma and benzene exposure. We identified a 16-gene expression signature associated with all levels of benzene exposure.
Our findings suggest that chronic benzene exposure, even at levels below the current U.S. occupational standard, perturbs many genes, biological processes, and pathways. These findings expand our understanding of the mechanisms by which benzene may induce hematotoxicity, leukemia, and lymphoma and reveal relevant potential biomarkers associated with a range of exposures.
PMCID: PMC3094412  PMID: 21147609
benzene; biomarker; human; microarray; transcriptomics

