Estimates of treatment effectiveness in epidemiologic studies using large observational health care databases may be biased due to inaccurate or incomplete information on important confounders. Study methods that collect and incorporate more comprehensive confounder data on a validation cohort may reduce confounding bias.
Study Design and Setting
We applied two such methods, imputation and reweighting, to Group Health administrative data (full sample) supplemented by more detailed confounder data from the Adult Changes in Thought study (validation sample). We used influenza vaccination effectiveness (with an unexposed comparator group) as an example and evaluated each method’s ability to reduce bias using the control time period prior to influenza circulation.
Both methods reduced, but did not completely eliminate, the bias compared with traditional effectiveness estimates that do not utilize the validation sample confounders.
Although these results support the use of validation sampling methods to improve the accuracy of comparative effectiveness findings from healthcare database studies, they also illustrate that the success of such methods depends on many factors, including the ability to measure important confounders in a representative and large enough validation sample, the comparability of the full sample and validation sample, and the accuracy with which data can be imputed or reweighted using the additional validation sample information.
aged; bias (epidemiologic); comparative effectiveness research; confounding factors (epidemiology); influenza vaccines; propensity score
Summary: GWASTools is an R/Bioconductor package for quality control and analysis of genome-wide association studies (GWAS). GWASTools brings the interactive capability and extensive statistical libraries of R to GWAS. Data are stored in NetCDF format to accommodate extremely large datasets that cannot fit within R’s memory limits. The documentation includes instructions for converting data from multiple formats, including variants called from sequencing. GWASTools provides a convenient interface for linking genotypes and intensity data with sample and single nucleotide polymorphism annotation.
Availability and implementation: GWASTools is implemented in R and is available from Bioconductor (http://www.bioconductor.org). An extensive vignette detailing a recommended work flow is included.
An analysis of a case-control study of rhabdomyolysis was conducted to screen for previously unrecognized CYP2C8 inhibitors that may cause other clinically important drug-drug interactions. Cases of rhabdomyolysis using cerivastatin (n=72) were compared with controls using atorvastatin (n=287) from 1998 to 2001. The use of clopidogrel (OR 29.6; 95% CI, 6.1–143) was strongly associated with rhabdomyolysis. In a replication effort that used the FDA Adverse Event Reporting System (AERS), clopidogrel was used more commonly by rhabdomyolysis cases using cerivastatin (17%) than by rhabdomyolysis cases using atorvastatin (0%, OR infinity; 95% CI = 5.2-infinity). Several medications were tested in vitro for their potential to cause drug-drug interactions. Clopidogrel, rosiglitazone and montelukast were the most potent inhibitors of cerivastatin metabolism. Clopidogrel and its metabolites also inhibited cerivastatin metabolism in human hepatocytes. These epidemiological and in vitro findings suggest that clopidogrel may cause clinically important, dose-dependent drug-drug interactions with other medications metabolized by CYP2C8.
rhabdomyolysis; statins; clopidogrel; adverse drug reaction; drug-drug interaction prediction; 2-oxo-clopidogrel; acyl glucuronide
The Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) was initiated in 2004 to investigate the relation between individual-level estimates of long-term air pollution exposure and the progression of subclinical atherosclerosis and the incidence of cardiovascular disease (CVD). MESA Air builds on a multicenter, community-based US study of CVD, supplementing that study with additional participants, outcome measurements, and state-of-the-art air pollution exposure assessments of fine particulate matter, oxides of nitrogen, and black carbon. More than 7,000 participants aged 45–84 years are being followed for over 10 years for the identification and characterization of CVD events, including acute myocardial infarction and other coronary artery disease, stroke, peripheral artery disease, and congestive heart failure; cardiac procedures; and mortality. Subcohorts undergo baseline and follow-up measurements of coronary artery calcium using computed tomography and carotid artery intima-medial wall thickness using ultrasonography. This cohort provides vast exposure heterogeneity in ranges currently experienced and permitted in most developed nations, and the air monitoring and modeling methods employed will provide individual estimates of exposure that incorporate residence-specific infiltration characteristics and participant-specific time-activity patterns. The overarching study aim is to understand and reduce uncertainty in health effect estimation regarding long-term exposure to air pollution and CVD.
air pollution; atherosclerosis; cardiovascular diseases; environmental exposure; epidemiologic methods; particulate matter
Motivation: Statistical analyses of genome-wide association studies (GWAS) require fitting large numbers of very similar regression models, each with low statistical power. Taking advantage of repeated observations or correlated phenotypes can increase this statistical power, but fitting the more complicated models required can make computation impractical.
Results: In this article, we present simple methods that capitalize on the structure inherent in GWAS studies to dramatically speed up computation for a wide variety of problems, with a special focus on methods for correlated phenotypes.
Availability: The R package ‘boss’ is available on the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/web/packages/boss/
Supplementary data are available at Bioinformatics online.
Survey calibration (or generalized raking) estimators are a standard approach to the use of auxiliary information in survey sampling, improving on the simple Horvitz–Thompson estimator. In this paper we relate the survey calibration estimators to the semiparametric incomplete-data estimators of Robins and coworkers, and to adjustment for baseline variables in a randomized trial. The development based on calibration estimators explains the ‘estimated weights’ paradox and provides useful heuristics for constructing practical estimators. We present some examples of using calibration to gain precision without making additional modelling assumptions in a variety of regression models.
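The relation between the Horvitz–Thompson estimator and a calibrated estimator can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the toy sample, and the "known" population counts are all assumptions made for illustration. Raking here is the classical iterative proportional adjustment of design weights so that weighted category counts match known population totals.

```python
def horvitz_thompson_total(y, inclusion_probs):
    """Estimate a population total by weighting each sampled value by 1/pi_i."""
    return sum(yi / pi for yi, pi in zip(y, inclusion_probs))


def rake_weights(weights, margins, targets, max_iter=100, tol=1e-10):
    """Rescale weights until weighted category totals match known targets.

    margins: one list of category labels per auxiliary variable.
    targets: one dict (label -> known population total) per auxiliary variable.
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for labels, target in zip(margins, targets):
            # Current weighted total in each category of this margin.
            current = {}
            for wi, label in zip(w, labels):
                current[label] = current.get(label, 0.0) + wi
            # Multiply each weight by (known total / current total).
            for i, label in enumerate(labels):
                factor = target[label] / current[label]
                w[i] *= factor
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
        if largest_adjustment < tol:
            break
    return w


def weighted_total(y, weights):
    return sum(yi * wi for yi, wi in zip(y, weights))


# Toy sample (hypothetical values): outcome, inclusion probability, sex.
y = [10.0, 20.0, 30.0, 40.0]
inclusion_probs = [0.5, 0.5, 0.25, 0.25]
sex = ["f", "m", "f", "m"]
known_counts = {"f": 7.0, "m": 5.0}  # assumed known population counts

design_weights = [1.0 / p for p in inclusion_probs]
calibrated = rake_weights(design_weights, [sex], [known_counts])

ht_total = horvitz_thompson_total(y, inclusion_probs)
calibrated_total = weighted_total(y, calibrated)
```

With a single margin, raking reduces to post-stratification and converges after one adjustment; with several margins the loop cycles through them until the adjustment factors are all close to one.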
Narrow arterioles in the retina have been shown to predict hypertension as well as other vascular diseases, likely through an increase in the peripheral resistance of the microcirculatory flow. In this study, we performed a genome-wide association study in 18,722 unrelated individuals of European ancestry from the Cohorts for Heart and Aging Research in Genomic Epidemiology consortium and the Blue Mountain Eye Study, to identify genetic determinants associated with variations in retinal arteriolar caliber. Retinal vascular calibers were measured on digitized retinal photographs using a standardized protocol. One variant (rs2194025 on chromosome 5q14 near the myocyte enhancer factor 2C (MEF2C) gene) was associated with retinal arteriolar caliber in the meta-analysis of the discovery cohorts at genome-wide significance of P-value <5×10−8. This variant was replicated in an additional 3,939 individuals of European ancestry from the Australian Twins Study and Multi-Ethnic Study of Atherosclerosis (rs2194025, P-value = 2.11×10−12 in combined meta-analysis of discovery and replication cohorts). In independent studies of modest sample sizes, no significant association was found between this variant and clinical outcomes including coronary artery disease, stroke, myocardial infarction or hypertension. In conclusion, we found one novel locus that underlies genetic variation in the microvasculature and may be relevant to vascular disease. The relevance of these findings to clinical outcomes remains to be determined.
We investigated recent loss of or separation from a family member or friend and risk of sudden cardiac arrest.
Our case-crossover study included 490 apparently healthy married residents of King County, Washington, who suffered sudden cardiac arrest between 1988 and 2005. We compared exposure to spouse-reported family/friend events occurring ≤ 1 month before sudden cardiac arrest with events occurring in the previous 5 months. We evaluated potential effect modification by habitual vigorous physical activity.
Recent family/friend events were associated with a higher risk of sudden cardiac arrest (odds ratio (OR) = 1.6 [95% confidence interval (CI) = 1.1-2.4]). ORs for cases with and without habitual vigorous physical activity were 1.1 (0.6-2.2) and 2.0 (1.2-3.1), respectively (interaction P = 0.02).
These results suggest family/friend events may trigger sudden cardiac arrest and raise the hypothesis that habitual vigorous physical activity may lower susceptibility to these potential triggers.
Mild retinopathy (microaneurysms or dot-blot hemorrhages) is observed in persons without diabetes or hypertension and may reflect microvascular disease in other organs. We conducted a genome-wide association study (GWAS) of mild retinopathy in persons without diabetes.
A working group agreed on phenotype harmonization, covariate selection and analytic plans for within-cohort GWAS. An inverse-variance weighted fixed effects meta-analysis was performed with GWAS results from six cohorts of 19,411 Caucasians. The primary analysis included individuals without diabetes and secondary analyses were stratified by hypertension status. We also singled out the results from single nucleotide polymorphisms (SNPs) previously shown to be associated with diabetes and hypertension, the two most common causes of retinopathy.
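The inverse-variance weighted fixed-effects pooling mentioned above can be sketched in a few lines of Python. This is a generic illustration rather than the working group's analysis code; the function name and the per-cohort estimates in the usage example are assumptions. Each cohort contributes a per-SNP effect estimate and standard error, and cohorts are weighted by the inverse of their squared standard error.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Pool per-cohort estimates with inverse-variance weights.

    Returns the pooled estimate, its standard error, the z statistic,
    and a two-sided p-value from the standard normal distribution.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical estimates for one SNP in two cohorts.
pooled_beta, pooled_se, z, p_value = fixed_effects_meta([0.1, 0.3], [0.1, 0.1])
```

Because the two hypothetical cohorts have equal standard errors, the pooled estimate is their simple average and the pooled standard error shrinks by a factor of the square root of two.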
No SNPs reached genome-wide significance in the primary analysis or the secondary analysis of participants with hypertension. One SNP, rs12155400, in the histone deacetylase 9 gene (HDAC9) on chromosome 7, was associated with retinopathy in the analysis of participants without hypertension, −1.3±0.23 (beta ± standard error), p = 6.6×10−9. Evidence suggests this was a false positive finding: the minor allele frequency was low (∼2%), the quality of the imputation was moderate (r2 ∼0.7), and no other common variants in the HDAC9 gene were associated with the outcome. SNPs found to be associated with diabetes and hypertension in other GWAS were not associated with retinopathy in persons without diabetes or in subgroups with or without hypertension.
This GWAS of retinopathy in individuals without diabetes showed little evidence of genetic associations. Further studies are needed to identify genes associated with these signs in order to help unravel novel pathways and determinants of microvascular diseases.
Genetic factors explain a majority of risk variance for age-related macular degeneration (AMD). While genome-wide association studies (GWAS) for late AMD implicate genes in complement, inflammatory and lipid pathways, the genetic architecture of early AMD has been relatively understudied. We conducted a GWAS meta-analysis of early AMD, including 4,089 individuals with prevalent signs of early AMD (soft drusen and/or retinal pigment epithelial changes) and 20,453 individuals without these signs. For various published late AMD risk loci, we also compared effect sizes between early and late AMD using an additional 484 individuals with prevalent late AMD. GWAS meta-analysis confirmed previously reported association of variants at the complement factor H (CFH) (peak P = 1.5×10−31) and age-related maculopathy susceptibility 2 (ARMS2) (P = 4.3×10−24) loci, and suggested that Apolipoprotein E (ApoE) polymorphisms (rs2075650; P = 1.1×10−6) are associated with early AMD. Other possible loci that did not reach GWAS significance included variants in the zinc finger protein gene GLI3 (rs2049622; P = 8.9×10−6) and upstream of GLI2 (rs6721654; P = 6.5×10−6), encoding retinal Sonic hedgehog signalling regulators, and in the tyrosinase (TYR) gene (rs621313; P = 3.5×10−6), involved in melanin biosynthesis. For a range of published late AMD risk loci, estimated effect sizes were significantly lower for early than late AMD. This study confirms the involvement of multiple established AMD risk variants in early AMD, but suggests weaker genetic effects on the risk of early AMD relative to late AMD. Several biological processes were suggested to be potentially specific for early AMD, including pathways regulating RPE cell melanin content and signalling pathways potentially involved in retinal regeneration, generating hypotheses for further investigation.
While laboratory data suggest that antidepressants may promote mammary tumor growth, there has been little research investigating whether antidepressant use after breast cancer diagnosis is associated with the risk of breast cancer recurrence.
We conducted a retrospective cohort study within Group Health, an integrated healthcare delivery system in Washington state. Women diagnosed with a first primary invasive, stage I, IIA, or IIB, unilateral breast carcinoma during 1990–1994 (aged ≥65 years) and 1996–1999 (aged ≥18 years) were eligible for the study (N=1306). Recurrence within 5 years of diagnosis was ascertained by medical chart review. We used the pharmacy database to identify antidepressant dispensings from Group Health pharmacies. We used multiple Cox regression to estimate the hazard ratio for recurrence and breast cancer mortality, comparing users and non-users of antidepressant medications. Results for recurrence were examined separately in users and non-users of tamoxifen.
We did not observe an association between antidepressant use after breast cancer diagnosis and the risk of recurrence either in general (hazard ratio for any antidepressant use: 0.8; 95% confidence interval: 0.5 to 1.4) or for specific types of antidepressant medication. Risk of death from breast cancer did not differ between non-users and users of antidepressants.
The results of this study suggest that women who use antidepressants after breast cancer diagnosis do not have an increased risk of recurrence or mortality.
antidepressant medications; breast cancer; cancer epidemiology; pharmacoepidemiology; recurrence
Association studies in environmental statistics often involve exposure and outcome data that are misaligned in space. A common strategy is to employ a spatial model such as universal kriging to predict exposures at locations with outcome data and then estimate a regression parameter of interest using the predicted exposures. This results in measurement error because the predicted exposures do not correspond exactly to the true values. We characterize the measurement error by decomposing it into Berkson-like and classical-like components. One correction approach is the parametric bootstrap, which is effective but computationally intensive since it requires solving a nonlinear optimization problem for the exposure model parameters in each bootstrap sample. We propose a less computationally intensive alternative termed the “parameter bootstrap” that only requires solving one nonlinear optimization problem, and we also compare bootstrap methods to other recently proposed methods. We illustrate our methodology in simulations and with publicly available data from the Environmental Protection Agency.
Environmental epidemiology; Environmental statistics; Exposure modeling; Kriging; Measurement error
For residents of long term care, hospitalisations can cause distress and disruption, and often result in further medical complications. Multi-disciplinary team interventions have been shown to improve the health of Residential Aged Care (RAC) residents, decreasing the need for acute hospitalisation, yet there are few randomised controlled trials of these complex interventions. This paper describes a randomised controlled trial of a structured multi-disciplinary team and gerontology nurse specialist (GNS) intervention aiming to reduce residents’ avoidable hospitalisations.
This Aged Residential Care Healthcare Utilisation Study (ARCHUS) is a cluster-randomised controlled trial (n = 1700 residents) of a complex multi-disciplinary team intervention in long-term care facilities. Eligible facilities certified for residential care were selected from those identified by statistical modelling as at moderate or higher risk of potentially avoidable resident hospitalisations. The facilities were all located in the Auckland region, New Zealand and were stratified by District Health Board (DHB).
The intervention provided a structured GNS intervention including a baseline facility needs assessment, quality indicator benchmarking, a staff education programme and care coordination. Alongside this, three multi-disciplinary team (MDT) meetings were held involving a geriatrician, facility GP, pharmacist, GNS and senior nursing staff.
Hospitalisations are recorded from routinely-collected acute admissions during the 9-month intervention period followed by a 5-month follow-up period. ICD diagnosis codes are used in a pre-specified definition of potentially reducible admissions.
This randomised-controlled trial will evaluate a complex intervention to increase early identification and intervention to improve the health of residents of long term care. The results of this trial are expected in early 2013.
Australian New Zealand Clinical Trials Registry: ACTRN 12611000187943
To produce valid seroincidence estimates, the serologic testing algorithm for recent HIV seroconversion (STARHS) assumes independence between infection and testing, which may be absent in clinical data. STARHS estimates are generally greater than cohort-based estimates of incidence from observable person-time and diagnosis dates. The authors constructed a series of partial stochastic models to examine whether testing motivated by suspicion of infection could bias STARHS.
One thousand Monte Carlo simulations of 10,000 men who have sex with men (MSM) were generated using parameters for HIV incidence and testing frequency from data from a clinical testing population in Seattle. In one set of simulations, infection and testing dates were independent. In another set, some intertest intervals were abbreviated to reflect the distribution of intervals between suspected HIV exposure and testing in a group of Seattle MSM recently diagnosed with HIV. Both cohort-based and STARHS incidence estimates were calculated from the simulated datasets and compared with previously calculated, empirical cohort-based and STARHS seroincidence estimates from the clinical testing population.
Under simulated independence between infection and testing, cohort-based and STARHS incidence estimates resembled cohort estimates from the clinical dataset. Under simulated motivated testing, cohort-based estimates remained unchanged but STARHS estimates were inflated similar to empirical STARHS estimates. Varying motivation parameters appreciably affected STARHS incidence estimates, but not cohort-based estimates.
Cohort-based incidence estimates are robust against dependence between testing and acquisition of infection whereas STARHS incidence estimates are not.
STARHS; HIV incidence; clinical populations; recency of infection; bias
The serologic testing algorithm for recent HIV seroconversion (STARHS) calculates incidence using the proportion of testers who produce a level of HIV antibody high enough to be detected by ELISA but low enough to suggest recent infection. The validity of STARHS relies on independence between dates of HIV infection and dates of antibody testing. When subjects choose the time of their own test, testing may be motivated by risky behaviour or symptoms of infection and the criterion may not be met. This analysis was conducted to ascertain whether estimates of incidence derived using STARHS were consistent with estimates derived using a method more robust against motivated testing.
A cohort-based incidence estimator and two STARHS methods were applied to identical populations (n=3821) tested for HIV antibody at publicly funded sites in Seattle. Overall seroincidence estimates, demographically stratified estimates and incidence rate ratios were compared across methods. The proportion of low-antibody testers among HIV-infected individuals was compared with the proportion expected given their testing histories.
STARHS estimates generally exceeded cohort-based estimates. Incidence ratios derived using STARHS between demographic strata were not consistent across methods. The proportion of HIV-infected individuals with lower antibody levels exceeded that which would be expected under independence between infection and testing.
Incidence estimates and incidence rate ratios derived using methods that rely on the changing antibody level over the course of HIV infection may be vulnerable to bias when applied to populations who choose the time of their own testing.
The integrated discrimination improvement (IDI) index is a popular tool for evaluating the capacity of a marker to predict a binary outcome of interest. Recent reports have proposed that the IDI is more sensitive than other metrics for identifying useful predictive markers. In this article, the authors use simulated data sets and theoretical analysis to investigate the statistical properties of the IDI. The authors consider the common situation in which a risk model is fitted to a data set with and without the new, candidate predictor(s). Results demonstrate that the published method of estimating the standard error of an IDI estimate tends to underestimate the error. The z test proposed in the literature for IDI-based testing of a new biomarker is not valid, because the null distribution of the test statistic is not standard normal, even in large samples. If a test for the incremental value of a marker is desired, the authors recommend the test based on the model. For investigators who find the IDI to be a useful measure, bootstrap methods may offer a reasonable option for inference when evaluating new predictors, as long as the added predictive capacity is large.
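For readers unfamiliar with the quantity being discussed, the IDI point estimate can be computed as the change in discrimination slope (mean predicted risk among events minus mean predicted risk among non-events) between the model with and the model without the new marker. The Python sketch below illustrates only that point estimate; the function names and the toy risks are assumptions, and it deliberately does not implement the z test that the abstract reports to be invalid.

```python
from statistics import fmean


def discrimination_slope(outcomes, risks):
    """Mean predicted risk among events minus mean among non-events."""
    events = [r for y, r in zip(outcomes, risks) if y == 1]
    nonevents = [r for y, r in zip(outcomes, risks) if y == 0]
    return fmean(events) - fmean(nonevents)


def idi(outcomes, risks_old, risks_new):
    """Integrated discrimination improvement of the new model over the old."""
    return (discrimination_slope(outcomes, risks_new)
            - discrimination_slope(outcomes, risks_old))


# Hypothetical outcomes and predicted risks from two fitted risk models.
outcomes = [1, 1, 0, 0]
risks_old = [0.6, 0.4, 0.5, 0.3]  # model without the candidate marker
risks_new = [0.8, 0.6, 0.4, 0.2]  # model with the candidate marker

idi_estimate = idi(outcomes, risks_old, risks_new)
```

In this toy example the discrimination slope rises from 0.1 to 0.4, so the IDI estimate is about 0.3. Any inference on such an estimate would need a method other than the published standard error, for example a bootstrap that refits both models in each resample.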
biological markers; bootstrap confidence interval; prediction; risk assessment; sampling distribution; sampling error; selection bias; type I error
The general availability of reliable and affordable genotyping technology has enabled genetic association studies to move beyond small case-control studies to large prospective studies. For prospective studies, genetic information can be integrated into the analysis via haplotypes, with focus on their association with a censored survival outcome. We develop non-iterative, regression-based methods to estimate associations between common haplotypes and a censored survival outcome in large cohort studies. Our non-iterative methods—weighted estimation and weighted haplotype combination—are both based on the Cox regression model, but differ in how the imputed haplotypes are integrated into the model. Our approaches enable haplotype imputation to be performed once as a simple data-processing step, and thus avoid implementation based on sophisticated algorithms that iterate between haplotype imputation and risk estimation. We show that non-iterative weighted estimation and weighted haplotype combination provide valid tests for genetic associations and reliable estimates of moderate associations between common haplotypes and a censored survival outcome, and are straightforward to implement in standard statistical software. We apply the methods to an analysis of HSPB7-CLCNKA haplotypes and risk of adverse outcomes in a prospective cohort study of outpatients with chronic heart failure.
Cox regression; phase ambiguity; prospective study; unphased genotypes
Background and Purpose
Little is known about acute precipitants of ischemic stroke, although evidence suggests infections contribute to risk. We hypothesized that acute hospitalization for infection is associated with short-term risk of stroke.
The case-crossover design was used to compare hospitalization for infection during case periods (90, 30, or 14 days prior to incident ischemic stroke) and control periods (equivalent time periods exactly 1 or 2 years prior to stroke) in the Cardiovascular Health Study, a population-based cohort of 5888 elderly participants from 4 US sites. Odds ratios and 95% confidence intervals (OR, 95% CI) were calculated using conditional logistic regression. Confirmatory analyses assessed hazard ratios (HR) of stroke from Cox regression models with hospitalization for infection as a time-varying exposure.
During a median follow-up of 12.2 years, 669 incident ischemic strokes were observed in participants without baseline history of stroke. Hospitalization for infection was more likely during case than control time periods; for 90 days prior to stroke, OR=3.4 (95% CI 1.8–6.5). The point estimates of risks were higher when examining shorter intervals: for 30 days, OR= 7.3 (95% CI 1.9–40.9), and 14 days, OR=8.0 (95% CI 1.7–77.3). In survival analyses, risk of stroke was associated with hospitalization for infection in the preceding 90 days, adjusted HR=2.4 (95% CI 1.6–3.4).
Hospitalization for infection is associated with a short-term increased risk of stroke, with higher risks observed for shorter intervals preceding stroke.
Epidemiology; Cerebral Infarction; Infectious Diseases
Elevated serum urate levels can lead to gout and are associated with cardiovascular risk factors. We performed genome-wide association to search for genetic susceptibility loci for serum urate and gout, and investigated the causal nature of the associations of serum urate with gout and selected cardiovascular risk factors and coronary heart disease (CHD).
Methods and Results
Meta-analyses of genome-wide association studies (GWAS) were performed in 5 population-based cohorts of the CHARGE consortium for serum urate and gout in 28,283 white individuals. The effect of the most significant SNP at all genome-wide significant loci on serum urate was added to create a genetic urate score. Findings were replicated in the Women’s Genome Health Study (WGHS; n=22,054). SNPs at 8 genetic loci achieved genome-wide significance with serum urate levels (p-values 4×10−8 to 2×10−242; SLC22A11, GCKR, R3HDM2-INHBC region, RREB1, PDZK1, SLC2A9, ABCG2, SLC17A1). Only two loci [SLC2A9, ABCG2] showed genome-wide significant association with gout. The genetic urate score was strongly associated with serum urate and gout (odds ratio 12.4 per 100 umol/L; p-value=3×10−39), but not with blood pressure, glucose, eGFR, chronic kidney disease, or CHD. The lack of association between the genetic score and the latter phenotypes was also observed in WGHS.
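The construction of an additive genetic score of this kind can be sketched as a weighted sum of allele counts, with each locus weighted by its estimated per-allele effect on serum urate. The Python snippet below is a minimal illustration under that assumption; the function name, locus labels, and effect sizes are hypothetical placeholders and are not estimates from the study.

```python
def genetic_score(allele_counts, per_allele_effects):
    """Sum of (per-allele effect x number of effect alleles) across loci.

    allele_counts: dict mapping locus label -> count of effect alleles (0 to 2).
    per_allele_effects: dict mapping locus label -> effect on the trait.
    """
    return sum(per_allele_effects[locus] * allele_counts[locus]
               for locus in per_allele_effects)


# Hypothetical per-allele effects (trait units per allele) at two loci.
effects = {"locus_1": 0.5, "locus_2": 0.25}

# Hypothetical genotype of one participant.
participant = {"locus_1": 2, "locus_2": 1}

score = genetic_score(participant, effects)
```

The resulting score can then be used as a single predictor in regression models for the downstream phenotypes, which is the step that supports the causal interpretation discussed in the abstract.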
The genetic urate score analysis suggested a causal relationship between serum urate and gout but did not provide evidence for one between serum urate and cardiovascular risk factors and CHD.
urate; gout; cardiovascular disease risk factors; genome-wide association study; Mendelian randomization
White matter hyperintensities (WMH) detectable by magnetic resonance imaging (MRI) are part of the spectrum of vascular injury associated with aging of the brain and are thought to reflect ischemic damage to the small deep cerebral vessels. WMH are associated with an increased risk of cognitive and motor dysfunction, dementia, depression, and stroke. Despite a significant heritability, few genetic loci influencing WMH burden have been identified.
We performed a meta-analysis of genome-wide association studies (GWAS) for WMH burden in 9,361 stroke-free individuals of European descent from 7 community-based cohorts. Significant findings were tested for replication in 3,024 individuals from 2 additional cohorts.
We identified 6 novel risk-associated single nucleotide polymorphisms (SNPs) in one locus on chromosome 17q25 encompassing 6 known genes including WBP2, TRIM65, TRIM47, MRPL38, FBF1, and ACOX1. The most significant association was for rs3744028 (Pdiscovery = 4.0×10−9; Preplication = 1.3×10−7; Pcombined = 4.0×10−15). Other SNPs in this region also reaching genome-wide significance are rs9894383 (P=5.3×10−9), rs11869977 (P=5.7×10−9), rs936393 (P=6.8×10−9), rs3744017 (P=7.3×10−9), and rs1055129 (P=4.1×10−8). Variant alleles at these loci conferred a small increase in WMH burden (4–8% of the overall mean WMH burden in the sample).
This large GWAS of WMH burden in community-based cohorts of individuals of European descent identifies a novel locus on chromosome 17. Further characterization of this locus may provide novel insights into the pathogenesis of cerebral WMH.
The withdrawal of cerivastatin involved an uncommon but serious adverse reaction, rhabdomyolysis. The bimodal response (rhabdomyolysis in a small proportion of users) points to genetic factors as a potential cause. We conducted a case-control study to evaluate genetic markers for cerivastatin-associated rhabdomyolysis.
The study had two components: a candidate gene study to evaluate variants in CYP2C8, UGT1A1, UGT1A3, and SLCO1B1; and a genome-wide association (GWA) study to identify risk factors in other regions of the genome. 185 rhabdomyolysis cases were frequency matched to statin-using controls from the Cardiovascular Health Study (n=374) and the Heart and Vascular Health Study (n=358). Validation relied on functional studies.
Permutation test results suggested an association between cerivastatin-associated rhabdomyolysis and variants in SLCO1B1 (p = 0.002), but not variants in CYP2C8 (p = 0.073) or the UGTs (p = 0.523). An additional copy of the minor allele of SLCO1B1 rs4149056 (p.Val174Ala) was associated with the risk of rhabdomyolysis (OR: 1.89, 95% CI: 1.40 to 2.56). In transfected cells, this variant reduced cerivastatin transport by 40% compared with the reference transporter (p < 0.001). The GWA identified an intronic variant (rs2819742) in the ryanodine receptor 2 gene (RYR2) as significant (p=1.74E-07). An additional copy of the minor allele of the RYR2 variant was associated with a reduced risk of rhabdomyolysis (OR: 0.48; 95% CI: 0.36 to 0.63).
We identified modest genetic risk factors for an extreme response to cerivastatin. Disabling genetic variants in the candidate genes were not responsible for the bimodal response to cerivastatin.
Genetics; drugs; epidemiology; rhabdomyolysis
The two-phase design has recently received attention in the statistical literature as an extension to the traditional case-control study for settings where a predictor of interest is rare or subject to misclassification. Despite a thorough methodological treatment and the potential for substantial efficiency gains, the two-phase design has not been widely adopted. This may be due, in part, to a lack of general-purpose, readily available software. The osDesign package for R provides a suite of functions for analyzing data from a two-phase and/or case-control design, as well as evaluating operating characteristics, including bias, efficiency and power. The evaluation is simulation-based, permitting flexible application of the package to a broad range of scientific settings. Using lung cancer mortality data from Ohio, the package is illustrated with a detailed case study in which two statistical goals are considered: (i) the evaluation of small-sample operating characteristics for two-phase and case-control designs and (ii) the planning and design of a future two-phase study.
operating characteristics; power; simulation; study design
Permutation tests are widely used in genomic research as a straightforward way to obtain reliable statistical inference without making strong distributional assumptions. However, in this paper we show that in genetic association studies it is not typically possible to construct exact permutation tests of gene-gene or gene-environment interaction hypotheses. We describe an alternative to the permutation approach in testing for interaction, a parametric bootstrap approach. Using simulations, we compare the finite-sample properties of a few often-used permutation tests and the parametric bootstrap. We consider interactions of an exposure with single and multiple polymorphisms. Finally, we address when permutation tests of interaction will be approximately valid in large samples for specific test statistics.
Interaction testing; Parametric bootstrap; Permutation methods
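As a rough illustration of the parametric bootstrap idea in the abstract above, the sketch below tests a gene-environment interaction in a linear model with normal errors. The abstract does not specify the outcome model or the test statistic, so the linear model, the squared Wald statistic, and the function names `interaction_stat` and `parametric_bootstrap_pvalue` are assumptions made for illustration only. The key step is that replicate outcomes are simulated from the fitted null model (main effects only) rather than obtained by permuting the data.

```python
import numpy as np


def interaction_stat(y, g, e):
    """Squared Wald statistic for the g*e coefficient in an OLS fit."""
    X = np.column_stack([np.ones(len(y)), g, e, g * e])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[3] ** 2 / cov[3, 3]


def parametric_bootstrap_pvalue(y, g, e, n_boot=500, seed=0):
    """Bootstrap p-value for the interaction, simulating under the null fit."""
    rng = np.random.default_rng(seed)
    # Fit the null model: main effects of g and e, no interaction term.
    X0 = np.column_stack([np.ones(len(y)), g, e])
    beta0, *_ = np.linalg.lstsq(X0, y, rcond=None)
    fitted0 = X0 @ beta0
    resid0 = y - fitted0
    sigma0 = np.sqrt(resid0 @ resid0 / (len(y) - X0.shape[1]))

    observed = interaction_stat(y, g, e)
    exceed = 0
    for _ in range(n_boot):
        # Simulate a new outcome vector from the fitted null model.
        y_star = fitted0 + rng.normal(0.0, sigma0, size=len(y))
        if interaction_stat(y_star, g, e) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    g = rng.binomial(2, 0.3, size=300).astype(float)  # additive genotype coding
    e = rng.normal(size=300)                          # continuous exposure
    y = 0.5 * g + 0.5 * e + 0.3 * g * e + rng.normal(size=300)
    print(parametric_bootstrap_pvalue(y, g, e))
```

For a binary disease outcome the same scheme would apply with a logistic null model and Bernoulli draws in place of the normal draws.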
Fibrin fragment D-dimer is one of several peptides produced when cross-linked fibrin is degraded by plasmin, and is the most widely used clinical marker of activated blood coagulation. To identify genetic loci influencing D-dimer levels, we performed the first large-scale, genome-wide association search.
Methods and Results
A genome-wide investigation of the genomic correlates of plasma D-dimer levels was conducted among 21,052 European-ancestry adults. Plasma levels of D-dimer were measured independently in each of 13 cohorts. Each study analyzed the association between ~2.6 million genotyped and imputed variants across the 22 autosomal chromosomes and natural-log transformed D-dimer levels using linear regression in additive genetic models adjusted for age and sex. Among all variants, 74 exceeded the genome-wide significance threshold and marked 3 regions. At 1p22, rs12029080 (p-value 6.4×10⁻⁵²) was 46.0 kb upstream from F3, coagulation factor III (tissue factor). At 1q24, rs6687813 (p-value 2.4×10⁻¹⁴) was 79.7 kb downstream of F5, coagulation factor V. At 4q32, rs13109457 (p-value 2.9×10⁻¹⁸) was located between 2 fibrinogen genes: 10.4 kb downstream from FGG and 3.0 kb upstream from FGA. Variants were associated with a 0.099, 0.096, and 0.061 unit difference, respectively, in natural-log transformed D-dimer and together accounted for 1.8% of the total variance. When adjusted for non-synonymous substitutions in F5 and FGA loci known to be associated with D-dimer levels, there was no evidence of an additional association at either locus.
Three genes were associated with fibrin D-dimer levels, of which the F3 association was the strongest and has not been previously reported.
genome-wide variation; D-dimer; epidemiology; meta-analysis; thrombosis; hemostasis
Genome-wide scans of nucleotide variation in human subjects are providing an increasing number of replicated associations with complex disease traits. Most of the variants detected have small effects and, collectively, they account for a small fraction of the total genetic variance. Very large sample sizes are required to identify and validate findings. In this situation, even small sources of systematic or random error can cause spurious results or obscure real effects. The need for careful attention to data quality has been appreciated for some time in this field, and a number of strategies for quality control and quality assurance (QC/QA) have been developed. Here we extend these methods and describe a system of QC/QA for genotypic data in genome-wide association studies. This system includes some new approaches that (1) combine analysis of allelic probe intensities and called genotypes to distinguish gender misidentification from sex chromosome aberrations, (2) detect autosomal chromosome aberrations that may affect genotype calling accuracy, (3) infer DNA sample quality from relatedness and allelic intensities, (4) use duplicate concordance to infer SNP quality, (5) detect genotyping artifacts from dependence of Hardy-Weinberg equilibrium (HWE) test p-values on allelic frequency, and (6) demonstrate sensitivity of principal components analysis (PCA) to SNP selection. The methods are illustrated with examples from the ‘Gene Environment Association Studies’ (GENEVA) program. The results suggest several recommendations for QC/QA in the design and execution of genome-wide association studies.
GWAS; DNA sample quality; genotyping artifact; Hardy-Weinberg equilibrium; chromosome aberration
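The Hardy-Weinberg equilibrium test mentioned in item (5) of the abstract above can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. This is a generic textbook version, not necessarily the test used in the QC/QA system described; exact tests are often preferred when counts are small. The function name `hwe_chi_square` and its handling of monomorphic SNPs are assumptions made for illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Returns (chi-square statistic, p-value on 1 degree of freedom).
    Hypothetical helper for illustration only.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    # Frequency of allele A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        # Monomorphic SNP: no departure from HWE can be assessed.
        return 0.0, 1.0
    chi2 = sum((o - x) ** 2 / x for o, x in zip(observed, expected))
    # Upper tail of chi-square with 1 df: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(298, 489, 213))
```

Computing this p-value for every SNP and plotting it against the estimated allele frequency `p` is one way to look for the frequency-dependent artifacts the abstract describes.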