Results 1-7 (7)

1.  Finding Quantitative Trait Loci Genes with Collaborative Targeted Maximum Likelihood Learning 
Statistics & probability letters  2011;81(7):792-796.
Quantitative trait loci mapping is focused on identifying the positions and effects of genes underlying an observed trait. We present a collaborative targeted maximum likelihood estimator in a semi-parametric model using a newly proposed 2-part super learning algorithm to find quantitative trait loci genes in listeria data. Results are compared to the parametric composite interval mapping approach.
PMCID: PMC3090625  PMID: 21572586
collaborative targeted maximum likelihood estimation; quantitative trait loci; super learner; machine learning
2.  Implementation of G-Computation on a Simulated Data Set: Demonstration of a Causal Inference Technique 
American Journal of Epidemiology  2011;173(7):731-738.
The growing body of work in the epidemiology literature focused on G-computation includes theoretical explanations of the method but very few simulations or examples of application. The small number of G-computation analyses in the epidemiology literature relative to other causal inference approaches may be partially due to a lack of didactic explanations of the method targeted toward an epidemiology audience. The authors provide a step-by-step demonstration of G-computation that is intended to familiarize the reader with this procedure. The authors simulate a data set and then demonstrate both G-computation and traditional regression to draw connections and illustrate contrasts between their implementation and interpretation relative to the truth of the simulation protocol. A marginal structural model is used for effect estimation in the G-computation example. The authors conclude by answering a series of questions to emphasize the key characteristics of causal inference techniques and the G-computation procedure in particular.
PMCID: PMC3105284  PMID: 21415029
air pollution; asthma; causality; methods; regression analysis
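The G-computation procedure described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' simulation protocol: the data-generating model, the variable names (`w` for a confounder, `a` for the exposure, `y` for the outcome), and the linear outcome regression are all hypothetical. The sketch also averages predicted outcomes directly rather than fitting a marginal structural model, which is a simplification of what the abstract describes.

```python
import numpy as np

def simulate(n, rng):
    """Hypothetical data: confounder w affects both exposure a and outcome y."""
    w = rng.binomial(1, 0.5, size=n)
    a = rng.binomial(1, 0.2 + 0.6 * w)  # exposure is more likely when w = 1
    y = 1.0 + 2.0 * a + 3.0 * w + 1.0 * a * w + rng.normal(size=n)
    return w, a, y

def design(a, w):
    """Design matrix for the outcome regression y ~ 1 + a + w + a*w."""
    return np.column_stack([np.ones_like(a, dtype=float), a, w, a * w])

def g_computation(w, a, y):
    """Fit the outcome regression, predict every subject's outcome under
    a = 1 and under a = 0, and average the difference."""
    beta, *_ = np.linalg.lstsq(design(a, w), y, rcond=None)
    y_if_exposed = design(np.ones_like(a), w) @ beta
    y_if_unexposed = design(np.zeros_like(a), w) @ beta
    return float(np.mean(y_if_exposed - y_if_unexposed))

rng = np.random.default_rng(0)
w, a, y = simulate(20_000, rng)

ate = g_computation(w, a, y)
naive = float(y[a == 1].mean() - y[a == 0].mean())  # ignores confounding by w

print(f"G-computation estimate: {ate:.2f}")
print(f"Unadjusted difference:  {naive:.2f}")
```

In this hypothetical simulation the true marginal effect is 2 + 1 × P(w = 1) = 2.5, so the G-computation estimate can be compared against a known truth, while the unadjusted difference in means is inflated by confounding.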
3.  A Targeted Maximum Likelihood Estimator for Two-Stage Designs 
We consider two-stage sampling designs, including so-called nested case control studies, where one takes a random sample from a target population and completes measurements on each subject in the first stage. The second stage involves drawing a subsample from the original sample, collecting additional data on the subsample. This data structure can be viewed as a missing data structure on the full-data structure collected in the second-stage of the study. Methods for analyzing two-stage designs include parametric maximum likelihood estimation and estimating equation methodology. We propose an inverse probability of censoring weighted targeted maximum likelihood estimator (IPCW-TMLE) in two-stage sampling designs and present simulation studies featuring this estimator.
PMCID: PMC3083136  PMID: 21556285
two-stage designs; targeted maximum likelihood estimators; nested case control studies; double robust estimation
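The weighting idea behind two-stage designs can be sketched as follows. This is only the inverse-probability-weighting ingredient, not the IPCW-TMLE proposed in the paper, and it assumes the second-stage sampling probabilities are known. The variables `v` (measured on everyone in stage one), `x` (measured only on the subsample), and the probabilities `pi` are hypothetical.

```python
import numpy as np

def ipcw_mean(values, selected, pi):
    """Weighted mean of a second-stage measurement, weighting each selected
    subject by the inverse of its probability of entering the subsample."""
    weights = selected / pi  # zero for subjects outside the subsample
    observed = np.where(selected, values, 0.0)
    return float(np.sum(weights * observed) / np.sum(weights))

rng = np.random.default_rng(2)
n = 50_000
v = rng.binomial(1, 0.3, size=n)            # first-stage variable, seen for all
x = rng.normal(loc=1.0 + 2.0 * v, size=n)   # second-stage variable
pi = np.where(v == 1, 0.8, 0.1)             # assumed known sampling probabilities
selected = rng.random(n) < pi               # second-stage subsample indicator

weighted = ipcw_mean(x, selected, pi)
naive = float(x[selected].mean())           # ignores the biased subsampling

print(f"IPCW mean:  {weighted:.2f}")
print(f"Naive mean: {naive:.2f}")
```

In this simulation the population mean of `x` is 1 + 2 × 0.3 = 1.6. The unweighted subsample mean is biased upward because subjects with `v = 1` are oversampled in the second stage.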
4.  Effects of PON Polymorphisms and Haplotypes on Molecular Phenotype in Mexican-American Mothers and Children 
Paraoxonase 1 (PON1) prevents oxidation of low density lipoproteins and inactivates toxic oxon derivatives of organophosphate pesticides (OPs). Over 250 SNPs have been previously identified in the PON1 gene, yet studies of PON1 genetic variation focus primarily on a few promoter SNPs (-108, -162) and coding SNPs (192, 55). We sequenced the PON1 gene in 30 subjects from a Mexican-American birth cohort and identified 94 polymorphisms with minor allele frequencies > 5%, including several novel variants (6 SNPs, 1 insertion, 2 deletions). Variants of the PON1 gene and 3 SNPs from PON2 and PON3 were genotyped in 700 children and mothers from the same cohort. PON1 phenotype was established using two substrate-specific assays: arylesterase (AREase) and paraoxonase (POase). Twelve PON1 and two PON2 polymorphisms were significantly associated with AREase activity, and 37 polymorphisms with POase activity; however, only nine were not in strong linkage disequilibrium (LD) with either PON1₋₁₀₈ or PON1₁₉₂ (r² > 0.20), SNPs with known effects on PON1 quantity and substrate-specific activity. Single tagSNPs PON1₅₅ and PON1₁₉₂ accounted for similar ranges of AREase variation compared to haplotypes composed of multiple SNPs within their haplotype blocks. However, PON1₅₅ explained 11-16% of POase activity, while six SNPs in the same haplotype block explained 3-fold more variance (36-56%). Although LD structure in the PON cluster seems similar between Mexicans and Caucasians, allele frequencies for many polymorphisms differed strikingly. Functional effects of PON genetic variation related to susceptibility to OPs and oxidative stress also differed by age, and should be considered in protecting vulnerable subpopulations.
PMCID: PMC3003760  PMID: 20839225
functional genomics; oxidative stress; pesticides; indels; haplotype blocks; children
5.  Why Match? Investigating Matched Case-Control Study Designs with Causal Effect Estimation* 
Matched case-control study designs are commonly implemented in the field of public health. While matching is intended to eliminate confounding, the main potential benefit of matching in case-control studies is a gain in efficiency. Methods for analyzing matched case-control studies have focused on utilizing conditional logistic regression models that provide conditional and not causal estimates of the odds ratio. This article investigates the use of case-control weighted targeted maximum likelihood estimation to obtain marginal causal effects in matched case-control study designs. We compare the use of case-control weighted targeted maximum likelihood estimation in matched and unmatched designs in an effort to explore which design yields the most information about the marginal causal effect. The procedures require knowledge of certain prevalence probabilities and were previously described by van der Laan (2008). In many practical situations where a causal effect is the parameter of interest, researchers may be better served using an unmatched design.
PMCID: PMC2827892  PMID: 20231866
6.  Simple Optimal Weighting of Cases and Controls in Case-Control Studies 
Researchers of uncommon diseases are often interested in assessing potential risk factors. Given the low incidence of disease, these studies are frequently case-control in design. Such a design allows a sufficient number of cases to be obtained without extensive sampling and can increase efficiency; however, these case-control samples are then biased since the proportion of cases in the sample is not the same as the population of interest. Methods for analyzing case-control studies have focused on utilizing logistic regression models that provide conditional and not causal estimates of the odds ratio. This article will demonstrate the use of the prevalence probability and case-control weighted targeted maximum likelihood estimation (MLE), as described by van der Laan (2008), in order to obtain causal estimates of the parameters of interest (risk difference, relative risk, and odds ratio). It is meant to be used as a guide for researchers, with step-by-step directions to implement this methodology. We will also present simulation studies that show the improved efficiency of the case-control weighted targeted MLE compared to other techniques.
PMCID: PMC2835459  PMID: 20231910
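The role of the prevalence probability in case-control weighting can be sketched with a simple weighted estimator. This is not the case-control weighted targeted MLE itself. It assumes the commonly used weighting scheme in which cases receive weight q0 (the known population prevalence) and controls receive weight (1 − q0)/J, where J is the number of controls per case. The population, exposure `a`, outcome `y`, and all numeric values are hypothetical.

```python
import numpy as np

def case_control_weights(y, q0, J):
    """Cases get weight q0; controls get weight (1 - q0) / J."""
    return np.where(y == 1, q0, (1.0 - q0) / J)

def weighted_risk(y, a, weights, level):
    """Weighted estimate of P(Y = 1 | A = level)."""
    mask = a == level
    return float(np.sum(weights[mask] * y[mask]) / np.sum(weights[mask]))

rng = np.random.default_rng(1)

# Hypothetical source population with a rare outcome.
N = 500_000
a_pop = rng.binomial(1, 0.3, size=N)
y_pop = rng.binomial(1, 0.02 + 0.03 * a_pop)  # true risk difference is 0.03
q0 = float(y_pop.mean())                      # prevalence, assumed known

# Case-control sample: n_cases cases and J controls per case.
n_cases, J = 4000, 2
case_idx = rng.choice(np.flatnonzero(y_pop == 1), size=n_cases, replace=False)
ctrl_idx = rng.choice(np.flatnonzero(y_pop == 0), size=J * n_cases, replace=False)
idx = np.concatenate([case_idx, ctrl_idx])
a, y = a_pop[idx], y_pop[idx]

w = case_control_weights(y, q0, J)
rd_weighted = weighted_risk(y, a, w, 1) - weighted_risk(y, a, w, 0)
rd_naive = float(y[a == 1].mean() - y[a == 0].mean())  # biased by the sampling

print(f"Weighted risk difference:   {rd_weighted:.3f}")
print(f"Unweighted risk difference: {rd_naive:.3f}")
```

The unweighted risk difference is far too large because cases make up a third of the sample but only about 3% of the population; the weights map the sample back to the population proportions.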
7.  Modelling the network of cell cycle transcription factors in the yeast Saccharomyces cerevisiae 
BMC Bioinformatics  2006;7:381.
Reverse-engineering regulatory networks is one of the central challenges for computational biology. Many techniques have been developed to accomplish this by utilizing transcription factor binding data in conjunction with expression data. Of these approaches, several have focused on the reconstruction of the cell cycle regulatory network of Saccharomyces cerevisiae. The emphasis of these studies has been to model the relationships between transcription factors and their target genes. In contrast, here we focus on reverse-engineering the network of relationships among transcription factors that regulate the cell cycle in S. cerevisiae.
We have developed a technique to reverse-engineer networks of the time-dependent activities of transcription factors that regulate the cell cycle in S. cerevisiae. The model utilizes linear regression to first estimate the activities of transcription factors from expression time series and genome-wide transcription factor binding data. We then use least squares to construct a model of the time evolution of the activities. We validate our approach in two ways: by demonstrating that it accurately models expression data and by demonstrating that our reconstructed model is similar to previously-published models of transcriptional regulation of the cell cycle.
Our regression-based approach allows us to build a general model of transcriptional regulation of the yeast cell cycle that includes additional factors and couplings not reported in previously-published models. Our model could serve as a starting point for targeted experiments that test the predicted interactions. In the future, we plan to apply our technique to reverse-engineer other systems where both genome-wide time series expression data and transcription factor binding data are available.
PMCID: PMC1570153  PMID: 16914048
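The two regression steps described in this abstract can be sketched on synthetic data. The abstract does not give the model's exact form, so the sketch assumes, purely for illustration, that expression is approximately a binding matrix times an activity matrix, and that activities evolve by a first-order linear map from one time point to the next. The function names, matrix shapes, and the matrix `m_true` are hypothetical.

```python
import numpy as np

def estimate_activities(binding, expression):
    """Step 1: regress expression (genes x times) on binding (genes x TFs)
    to estimate transcription factor activities (TFs x times)."""
    activities, *_ = np.linalg.lstsq(binding, expression, rcond=None)
    return activities

def fit_dynamics(activities):
    """Step 2: least-squares fit of M in activities[:, t+1] ~ M @ activities[:, t]."""
    past, future = activities[:, :-1], activities[:, 1:]
    # Solve past.T @ M.T ~ future.T for M.T, then transpose back.
    m_transposed, *_ = np.linalg.lstsq(past.T, future.T, rcond=None)
    return m_transposed.T

rng = np.random.default_rng(3)
n_genes, n_tfs, n_times = 50, 3, 12

# Hypothetical ground-truth couplings among three transcription factors.
m_true = np.array([[0.9, -0.3, 0.0],
                   [0.3,  0.9, 0.1],
                   [0.0, -0.1, 0.8]])

acts_true = np.empty((n_tfs, n_times))
acts_true[:, 0] = rng.normal(size=n_tfs)
for t in range(1, n_times):
    acts_true[:, t] = m_true @ acts_true[:, t - 1]

binding = rng.normal(size=(n_genes, n_tfs))  # synthetic stand-in for binding data
noise = 1e-3 * rng.normal(size=(n_genes, n_times))
expression = binding @ acts_true + noise

acts_est = estimate_activities(binding, expression)
m_est = fit_dynamics(acts_est)

print(np.round(m_est, 2))
```

The off-diagonal entries of the fitted matrix play the role of couplings between transcription factors; with real data they would be candidate interactions to test, not established regulatory links.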
