Results 1-15 (15)

1.  Major depressive disorder subtypes to predict long-term course 
Depression and anxiety  2014;31(9):765-777.
Variation in course of major depressive disorder (MDD) is not strongly predicted by existing subtype distinctions. A new subtyping approach is considered here.
Two data mining techniques, ensemble recursive partitioning and Lasso generalized linear models (GLMs) followed by k-means cluster analysis, are used to search for subtypes based on index episode symptoms predicting subsequent MDD course in the World Mental Health (WMH) Surveys. The WMH surveys are community surveys in 16 countries. Lifetime DSM-IV MDD was reported by 8,261 respondents. Retrospectively reported outcomes included measures of persistence (number of years with an episode; number of years with an episode lasting most of the year) and severity (hospitalization for MDD; disability due to MDD).
Recursive partitioning found significant clusters defined by the conjunctions of early onset, suicidality, and anxiety (irritability, panic, nervousness-worry-anxiety) during the index episode. GLMs found additional associations involving a number of individual symptoms. Predicted values of the four outcomes were strongly correlated. Cluster analysis of these predicted values found three clusters having consistently high, intermediate, or low predicted scores across all outcomes. The high-risk cluster (30.0% of respondents) accounted for 52.9-69.7% of high persistence and severity and was most strongly predicted by index episode severe dysphoria, suicidality, anxiety, and early onset. A total symptom count, in comparison, was not a significant predictor.
Despite being based on retrospective reports, results suggest that useful MDD subtyping distinctions can be made using data mining methods. Further studies are needed to test and expand these results with prospective data.
PMCID: PMC5125445  PMID: 24425049
Epidemiology; Depression; Anxiety/Anxiety Disorders; Suicide/Self Harm; Panic Attacks
2.  The Impact of Exclusion Criteria on a Physician’s Adenoma Detection Rate 
Gastrointestinal endoscopy  2015;82(4):668-675.
The adenoma detection rate (ADR) is a validated and widely used measure of colonoscopy quality. There is uncertainty in the published literature on which colonoscopy examinations should be excluded when measuring a physician’s ADR.
To examine the impact of varying the colonoscopy exclusion criteria on physician ADR.
We applied different exclusion criteria used in 30 prior studies to a dataset of endoscopy and pathology reports. Under each exclusion criterion, we calculated physician ADR.
A private practice colonoscopy center affiliated with the University of Illinois College of Medicine.
Data on 20,040 colonoscopy examinations and associated pathology notes performed by 11 gastroenterologists from July 2009 to May 2013.
Main Outcome Measurements
ADR across all colonoscopy examinations, each physician’s ADR, and ADR ranking.
There were 28 different exclusion criteria used when measuring ADR. Each study used a different combination of these exclusion criteria. The fraction of all colonoscopy examinations in the dataset excluded under these combinations of exclusion criteria ranged from 0 to 93.1%. The mean ADR across all colonoscopy examinations was 35.9%. The change in mean ADR after applying the 28 exclusion criteria ranged from −4.6 to +3.1 percentage points. However, the exclusion criteria impacted each physician’s ADR relatively equally, and therefore physicians’ rankings via ADR were stable.
ADR assessment was limited to a single private endoscopy center.
There is wide variation in the exclusion criteria used when measuring ADR. Although these exclusion criteria can impact overall ADR, the relative rankings of physicians by ADR were stable. A consensus definition on which exclusion criteria are applied when measuring ADR is needed.
PMCID: PMC4575765  PMID: 26385275
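The calculation described in this abstract (recompute each physician's ADR under a given exclusion rule, then compare rankings) can be sketched as follows. This is a minimal illustration, not the study's code: the exam records, the field layout, and the "screening only" exclusion rule are hypothetical, and ADR is taken to be the share of retained exams in which at least one adenoma was found.

```python
from collections import defaultdict

# Hypothetical exam records: (physician, indication, adenoma_found).
EXAMS = [
    ("A", "screening", True), ("A", "screening", True), ("A", "screening", False),
    ("A", "diagnostic", True), ("A", "surveillance", False),
    ("B", "screening", True), ("B", "screening", False), ("B", "screening", False),
    ("B", "diagnostic", False), ("B", "surveillance", True),
]


def adr_by_physician(exams, exclude=lambda exam: False):
    """Return {physician: ADR} over the exams that are not excluded."""
    counts = defaultdict(lambda: [0, 0])  # physician -> [exams with adenoma, exams kept]
    for exam in exams:
        if exclude(exam):
            continue
        physician, _indication, adenoma_found = exam
        counts[physician][0] += int(adenoma_found)
        counts[physician][1] += 1
    return {p: found / kept for p, (found, kept) in counts.items() if kept}


def ranking(adr):
    """Physicians ordered from highest to lowest ADR."""
    return sorted(adr, key=adr.get, reverse=True)


# No exclusions versus a hypothetical rule that keeps screening exams only.
adr_all = adr_by_physician(EXAMS)
adr_screening = adr_by_physician(EXAMS, exclude=lambda e: e[1] != "screening")

print(adr_all, ranking(adr_all))
print(adr_screening, ranking(adr_screening))
```

In this toy data the exclusion rule changes each physician's ADR but not the ordering; that outcome is built into the example and is not evidence for the study's finding.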
3.  Public reporting of colonoscopy quality is associated with an increase in endoscopist adenoma detection rate 
Gastrointestinal endoscopy  2015;82(4):676-682.
Colonoscopy is the predominant method for colorectal cancer screening in the US. Prior studies have documented variation across physicians in colonoscopy quality as measured by the adenoma detection rate (ADR). ADR is the primary quality measure of colonoscopy exams and an indicator of the likelihood of subsequent patient colorectal cancer. There is interest in mechanisms to improve ADR. In Central Illinois, a local employer and a quality improvement organization partnered to publicly report physician colonoscopy quality.
To assess whether this initiative was associated with an improvement in ADR.
This study compares ADR before and after public reporting at a private practice endoscopy center of 11 gastroenterologists in Peoria, Illinois who participated in the initiative. To generate ADR, colonoscopy and pathology reports from exams performed over four years at the endoscopy center were analyzed using previously validated natural language processing software.
Central Illinois Endoscopy Center
The ADR for colonoscopy in the pre-public reporting era was 25.1%, and after public reporting was 36.4% (increase of 11.3 percentage points, p<0.001). Detection of advanced adenomas increased from 10.0% to 12.7% (p<0.001). Each physician’s ADR increased (range of 4.3% to 17.4%). Similar increases in ADR were observed when the analysis was restricted to screening colonoscopy.
There was no concurrent control group to assess whether the increased ADR was due to a secular trend.
A public reporting initiative on colonoscopy quality was associated with a relative 45% increase in ADR and a relative 27% increase in advanced adenoma detection. Public reporting may be a means to improve colonoscopy quality.
PMCID: PMC4575767  PMID: 26385276
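The abstract above reports changes in detection rates both as differences and as relative increases. As a quick arithmetic check, the sketch below recomputes both from the four rates given in the abstract; the function and variable names are illustrative only.

```python
def absolute_increase(before: float, after: float) -> float:
    """Difference between two rates, in percentage points."""
    return after - before


def relative_increase(before: float, after: float) -> float:
    """Change from `before` to `after`, as a percentage of `before`."""
    return (after - before) / before * 100.0


# Rates reported in the abstract (percent of exams).
adr_before, adr_after = 25.1, 36.4
advanced_before, advanced_after = 10.0, 12.7

adr_abs = absolute_increase(adr_before, adr_after)
adr_rel = relative_increase(adr_before, adr_after)
advanced_rel = relative_increase(advanced_before, advanced_after)

print(f"ADR: +{adr_abs:.1f} percentage points, {adr_rel:.1f}% relative increase")
print(f"Advanced adenomas: {advanced_rel:.1f}% relative increase")
```

Keeping the two measures in separate functions makes it harder to confuse a percentage-point difference with a relative percentage change.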
4.  Risk-Adjustment Simulation: Plans May Have Incentives To Distort Mental Health And Substance Use Coverage 
Health affairs (Project Hope)  2016;35(6):1022-1028.
Under the Affordable Care Act, the risk-adjustment program is designed to compensate health plans for enrolling people with poorer health status so that plans compete on cost and quality rather than the avoidance of high-cost individuals. This study examined health plan incentives to limit covered services for mental health and substance use disorders under the risk-adjustment system used in the health insurance Marketplaces. Through a simulation of the program on a population constructed to reflect Marketplace enrollees, we analyzed the cost consequences for plans enrolling people with mental health and substance use disorders. Our assessment points to systematic underpayment to plans for people with these diagnoses. We document how Marketplace risk adjustment does not remove incentives for plans to limit coverage for services associated with mental health and substance use disorders. Adding mental health and substance use diagnoses used in Medicare Part D risk adjustment is one potential policy step toward addressing this problem in the Marketplaces.
PMCID: PMC5027954  PMID: 27269018
5.  Predicting U.S. Army suicides after hospitalizations with psychiatric diagnoses in the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS) 
JAMA psychiatry  2015;72(1):49-57.
The U.S. Army experienced a sharp rise in suicides beginning in 2004. Administrative data show that among those at highest risk are soldiers in the 12 months after inpatient treatment of a psychiatric disorder.
To develop an actuarial risk algorithm predicting suicide in the 12 months after US Army soldier inpatient treatment of a psychiatric disorder to target expanded post-hospital care.
There were 53,769 hospitalizations of active duty soldiers in 2004–2009 with ICD-9-CM psychiatric admission diagnoses. Administrative data available prior to hospital discharge abstracted from a wide range of data systems (socio-demographic, Army career, criminal justice, medical/pharmacy) were used to predict suicides in the subsequent 12 months using machine learning methods (regression trees, penalized regressions) designed to evaluate cross-validated linear, nonlinear, and interactive predictive associations.
Suicides of soldiers hospitalized with psychiatric disorders in the 12 months after hospital discharge.
68 soldiers died by suicide within 12 months of hospital discharge (12.0% of all Army suicides), equivalent to 263.9 suicides/100,000 person-years compared to 18.5 suicides/100,000 person-years in the total Army. Strongest predictors included socio-demographics (male, late age of enlistment), criminal offenses (verbal violence, weapons possession), prior suicidality, aspects of prior psychiatric inpatient and outpatient treatment, and disorders diagnosed during the focal hospitalizations. 52.9% of post-hospital suicides occurred after the 5% of hospitalizations with highest predicted suicide risk (3,824.1 suicides/100,000 person-years). These highest-risk hospitalizations also accounted for significantly elevated proportions of several other adverse post-hospital outcomes (unintentional injury deaths, suicide attempts, re-hospitalizations).
The high concentration of risk of suicides and other adverse outcomes might justify targeting expanded post-hospital interventions to soldiers classified as having highest post-hospital suicide risk, although final determination requires careful consideration of intervention costs, comparative effectiveness, and possible adverse effects.
PMCID: PMC4286426  PMID: 25390793
Army; machine learning; elastic net regression; military; penalized regression; predictive modeling; risk assessment; suicide
6.  Changes in Health Care Spending and Quality 4 Years into Global Payment 
The New England journal of medicine  2014;371(18):1704-1714.
Spending and quality under global budgets remain unknown beyond 2 years. We evaluated spending and quality measures during the first 4 years of the Blue Cross Blue Shield of Massachusetts Alternative Quality Contract (AQC).
We compared spending and quality among enrollees whose physician organizations entered the AQC from 2009 through 2012 with those among persons in control states. We studied spending changes according to year, category of service, site of care, experience managing risk contracts, and price versus utilization. We evaluated process and outcome quality.
In the 2009 AQC cohort, medical spending on claims grew an average of $62.21 per enrollee per quarter less than it did in the control cohort over the 4-year period (P<0.001). This amount is equivalent to a 6.8% savings when calculated as a proportion of the average post-AQC spending level in the 2009 AQC cohort. Analogously, the 2010, 2011, and 2012 cohorts had average savings of 8.8% (P<0.001), 9.1% (P<0.001), and 5.8% (P = 0.04), respectively, by the end of 2012. Claims savings were concentrated in the outpatient-facility setting and in procedures, imaging, and tests, explained by both reduced prices and reduced utilization. Claims savings were exceeded by incentive payments to providers during the period from 2009 through 2011 but exceeded incentive payments in 2012, generating net savings. Improvements in quality among AQC cohorts generally exceeded those seen elsewhere in New England and nationally.
As compared with similar populations in other states, Massachusetts AQC enrollees had lower spending growth and generally greater quality improvements after 4 years. Although other factors in Massachusetts may have contributed, particularly in the later part of the study period, global budget contracts with quality incentives may encourage changes in practice patterns that help reduce spending and improve quality. (Funded by the Commonwealth Fund and others.)
PMCID: PMC4261926  PMID: 25354104
8.  A Double Robust Approach to Causal Effects in Case-Control Studies 
American Journal of Epidemiology  2014;179(6):663-669.
In a recent issue of the Journal, VanderWeele and Vansteelandt (Am J Epidemiol. 2011;174(10):1197–1203) discussed an inverse probability weighting method for case-control studies that could be used to estimate an additive interaction effect, referred to as the “relative excess risk due to interaction.” In this article, we reinforce the well-known disadvantages of inverse probability weighting and comment on the desirability of the described parameter. Further, we review an existing double robust estimator not considered by VanderWeele and Vansteelandt, the case-control-weighted targeted maximum likelihood estimator, which has improved properties in comparison with a previously described inverse-probability-weighted estimator. This targeted maximum likelihood estimator can be used to target various parameters of interest, and its implementation has been described previously for the risk difference, relative risk, and odds ratio.
PMCID: PMC3939846  PMID: 24488515
case-control studies; causality; epidemiologic methods; estimation techniques
9.  Finding Quantitative Trait Loci Genes with Collaborative Targeted Maximum Likelihood Learning 
Statistics & probability letters  2011;81(7):792-796.
Quantitative trait loci mapping is focused on identifying the positions and effect of genes underlying an observed trait. We present a collaborative targeted maximum likelihood estimator in a semi-parametric model using a newly proposed 2-part super learning algorithm to find quantitative trait loci genes in listeria data. Results are compared to the parametric composite interval mapping approach.
PMCID: PMC3090625  PMID: 21572586
collaborative targeted maximum likelihood estimation; quantitative trait loci; super learner; machine learning
10.  Implementation of G-Computation on a Simulated Data Set: Demonstration of a Causal Inference Technique 
American Journal of Epidemiology  2011;173(7):731-738.
The growing body of work in the epidemiology literature focused on G-computation includes theoretical explanations of the method but very few simulations or examples of application. The small number of G-computation analyses in the epidemiology literature relative to other causal inference approaches may be partially due to a lack of didactic explanations of the method targeted toward an epidemiology audience. The authors provide a step-by-step demonstration of G-computation that is intended to familiarize the reader with this procedure. The authors simulate a data set and then demonstrate both G-computation and traditional regression to draw connections and illustrate contrasts between their implementation and interpretation relative to the truth of the simulation protocol. A marginal structural model is used for effect estimation in the G-computation example. The authors conclude by answering a series of questions to emphasize the key characteristics of causal inference techniques and the G-computation procedure in particular.
PMCID: PMC3105284  PMID: 21415029
air pollution; asthma; causality; methods; regression analysis
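To make the G-computation procedure named in this abstract concrete, here is a minimal sketch on simulated data. It is not the authors' demonstration: the data-generating model, the variable names, and the use of a saturated (stratum-mean) outcome model in place of a parametric regression or marginal structural model are all simplifying assumptions. The simulation is built so that the true marginal risk difference is 0.2.

```python
import random


def simulate(n, seed=0):
    """Simulate (w, a, y): binary confounder W, exposure A, outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < 0.2 + 0.6 * w)            # W raises the chance of exposure
        y = int(rng.random() < 0.1 + 0.2 * a + 0.4 * w)  # true effect of A is +0.2
        rows.append((w, a, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def g_computation_risk_difference(rows):
    """Standardize stratum-specific outcome means over the observed W distribution."""
    # Step 1: estimate E[Y | A=a, W=w] in every stratum (saturated outcome model).
    q = {
        (a, w): mean(y for (wi, ai, y) in rows if ai == a and wi == w)
        for a in (0, 1)
        for w in (0, 1)
    }
    # Step 2: predict each subject's outcome with exposure set to 1, then to 0.
    # Step 3: average the predictions and take the difference.
    risk_exposed = mean(q[(1, w)] for (w, _a, _y) in rows)
    risk_unexposed = mean(q[(0, w)] for (w, _a, _y) in rows)
    return risk_exposed - risk_unexposed


def naive_risk_difference(rows):
    """Unadjusted comparison of exposed and unexposed subjects."""
    exposed = mean(y for (_w, a, y) in rows if a == 1)
    unexposed = mean(y for (_w, a, y) in rows if a == 0)
    return exposed - unexposed


rows = simulate(20000)
gcomp_rd = g_computation_risk_difference(rows)
naive_rd = naive_risk_difference(rows)
print(f"G-computation: {gcomp_rd:.3f}, unadjusted: {naive_rd:.3f}, truth: 0.200")
```

Because W affects both exposure and outcome, the unadjusted contrast is confounded, while the standardized estimate should land near the simulated truth.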
11.  A Targeted Maximum Likelihood Estimator for Two-Stage Designs 
We consider two-stage sampling designs, including so-called nested case control studies, where one takes a random sample from a target population and completes measurements on each subject in the first stage. The second stage involves drawing a subsample from the original sample, collecting additional data on the subsample. This data structure can be viewed as a missing data structure on the full-data structure collected in the second-stage of the study. Methods for analyzing two-stage designs include parametric maximum likelihood estimation and estimating equation methodology. We propose an inverse probability of censoring weighted targeted maximum likelihood estimator (IPCW-TMLE) in two-stage sampling designs and present simulation studies featuring this estimator.
PMCID: PMC3083136  PMID: 21556285
two-stage designs; targeted maximum likelihood estimators; nested case control studies; double robust estimation
12.  Effects of PON Polymorphisms and Haplotypes on Molecular Phenotype in Mexican-American Mothers and Children 
Paraoxonase 1 (PON1) prevents oxidation of low density lipoproteins and inactivates toxic oxon derivatives of organophosphate pesticides (OPs). Over 250 SNPs have been previously identified in the PON1 gene, yet studies of PON1 genetic variation focus primarily on a few promoter SNPs (-108, -162) and coding SNPs (192, 55). We sequenced the PON1 gene in 30 subjects from a Mexican-American birth cohort and identified 94 polymorphisms with minor allele frequencies > 5%, including several novel variants (6 SNPs, 1 insertion, 2 deletions). Variants of the PON1 gene and 3 SNPs from PON2 and PON3 were genotyped in 700 children and mothers from the same cohort. PON1 phenotype was established using two substrate-specific assays: arylesterase (AREase) and paraoxonase (POase). Twelve PON1 and two PON2 polymorphisms were significantly associated with AREase activity, and 37 polymorphisms with POase activity; however, only nine were not in strong linkage disequilibrium (LD) with either PON1-108 or PON1 192 (r² > 0.20), SNPs with known effects on PON1 quantity and substrate-specific activity. Single tagSNPs PON1 55 and PON1 192 accounted for similar ranges of AREase variation compared to haplotypes comprising multiple SNPs within their haplotype blocks. However, PON1 55 explained 11-16% of POase activity, while six SNPs in the same haplotype block explained 3-fold more variance (36-56%). Although LD structure in the PON cluster seems similar between Mexicans and Caucasians, allele frequencies for many polymorphisms differed strikingly. Functional effects of PON genetic variation related to susceptibility to OPs and oxidative stress also differed by age, and should be considered in protecting vulnerable subpopulations.
PMCID: PMC3003760  PMID: 20839225
functional genomics; oxidative stress; pesticides; indels; haplotype blocks; children
13.  Why Match? Investigating Matched Case-Control Study Designs with Causal Effect Estimation* 
Matched case-control study designs are commonly implemented in the field of public health. While matching is intended to eliminate confounding, the main potential benefit of matching in case-control studies is a gain in efficiency. Methods for analyzing matched case-control studies have focused on utilizing conditional logistic regression models that provide conditional and not causal estimates of the odds ratio. This article investigates the use of case-control weighted targeted maximum likelihood estimation to obtain marginal causal effects in matched case-control study designs. We compare the use of case-control weighted targeted maximum likelihood estimation in matched and unmatched designs in an effort to explore which design yields the most information about the marginal causal effect. The procedures require knowledge of certain prevalence probabilities and were previously described by van der Laan (2008). In many practical situations where a causal effect is the parameter of interest, researchers may be better served using an unmatched design.
PMCID: PMC2827892  PMID: 20231866
14.  Simple Optimal Weighting of Cases and Controls in Case-Control Studies 
Researchers of uncommon diseases are often interested in assessing potential risk factors. Given the low incidence of disease, these studies are frequently case-control in design. Such a design allows a sufficient number of cases to be obtained without extensive sampling and can increase efficiency; however, these case-control samples are then biased since the proportion of cases in the sample is not the same as in the population of interest. Methods for analyzing case-control studies have focused on utilizing logistic regression models that provide conditional and not causal estimates of the odds ratio. This article will demonstrate the use of the prevalence probability and case-control weighted targeted maximum likelihood estimation (MLE), as described by van der Laan (2008), in order to obtain causal estimates of the parameters of interest (risk difference, relative risk, and odds ratio). It is meant to be used as a guide for researchers, with step-by-step directions to implement this methodology. We will also present simulation studies that show the improved efficiency of the case-control weighted targeted MLE compared to other techniques.
PMCID: PMC2835459  PMID: 20231910
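The role of the prevalence probability in case-control weighting can be illustrated with a small sketch. It assumes the weighting usually described for this approach, in which each case receives weight q0 (the population prevalence) and each control receives weight (1 - q0) / J, with J controls per case; that scheme, the toy counts, and all names below are assumptions made for illustration, and the sketch shows only the reweighting step, not a targeted maximum likelihood estimator.

```python
def case_control_weights(is_case, q0, controls_per_case):
    """Weight q0 for each case and (1 - q0) / J for each control (assumed scheme)."""
    return [q0 if case else (1.0 - q0) / controls_per_case for case in is_case]


def weighted_mean(values, weights):
    """Weighted average, normalized by the total weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: 100 cases (60 exposed) and 200 controls (40 exposed), so J = 2.
is_case = [True] * 100 + [False] * 200
exposed = [1] * 60 + [0] * 40 + [1] * 40 + [0] * 160

q0 = 0.01  # assumed known prevalence of disease in the population
weights = case_control_weights(is_case, q0, controls_per_case=2)

sample_exposure = sum(exposed) / len(exposed)           # distorted by oversampling cases
population_exposure = weighted_mean(exposed, weights)   # reweighted toward the population

print(f"unweighted: {sample_exposure:.3f}, weighted: {population_exposure:.3f}")
```

The weighted figure equals q0 times the exposure share among cases plus (1 - q0) times the share among controls, which is why the prevalence probability must be known or estimated from outside the case-control sample.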
15.  Modelling the network of cell cycle transcription factors in the yeast Saccharomyces cerevisiae 
BMC Bioinformatics  2006;7:381.
Reverse-engineering regulatory networks is one of the central challenges for computational biology. Many techniques have been developed to accomplish this by utilizing transcription factor binding data in conjunction with expression data. Of these approaches, several have focused on the reconstruction of the cell cycle regulatory network of Saccharomyces cerevisiae. The emphasis of these studies has been to model the relationships between transcription factors and their target genes. In contrast, here we focus on reverse-engineering the network of relationships among transcription factors that regulate the cell cycle in S. cerevisiae.
We have developed a technique to reverse-engineer networks of the time-dependent activities of transcription factors that regulate the cell cycle in S. cerevisiae. The model utilizes linear regression to first estimate the activities of transcription factors from expression time series and genome-wide transcription factor binding data. We then use least squares to construct a model of the time evolution of the activities. We validate our approach in two ways: by demonstrating that it accurately models expression data and by demonstrating that our reconstructed model is similar to previously-published models of transcriptional regulation of the cell cycle.
Our regression-based approach allows us to build a general model of transcriptional regulation of the yeast cell cycle that includes additional factors and couplings not reported in previously-published models. Our model could serve as a starting point for targeted experiments that test the predicted interactions. In the future, we plan to apply our technique to reverse-engineer other systems where both genome-wide time series expression data and transcription factor binding data are available.
PMCID: PMC1570153  PMID: 16914048
