With the advent of modern genomic methods to adjust for population stratification, the use of external or publicly available controls has become an attractive option for reducing the cost of large-scale case-control genetic association studies. In this article, we study the estimation of joint effects of genetic and environmental exposures from a case-control study where data on genome-wide markers are available on the cases and a set of external controls while data on environmental exposures are available on the cases and a set of internal controls. We show that under such a design, one can exploit an assumption of gene-environment independence in the underlying population to estimate the gene-environment joint effects, after adjustment for population stratification. We develop a semiparametric profile likelihood method and related pseudolikelihood and working likelihood methods that are easy to implement in practice. We propose variance estimators for the methods based on asymptotic theory. Simulation is used to study the performance of the methods, and data from a multi-centre genome-wide association study of bladder cancer are further used to illustrate their application.
Case-control study; Gene-environment interaction; Genetic epidemiology; Genome-wide association study; Logistic regression; Population stratification; Profile likelihood; Retrospective study; Semiparametric method
Metabolic syndrome (MetS) refers to the clustering of cardio-metabolic risk factors including dyslipidemia, central adiposity, hypertension and hyperglycemia in individuals. Identification of pleiotropic genetic factors associated with MetS traits may shed light on key pathways or mediators underlying MetS.
Methods and Results
Using the Metabochip array in 15,148 African Americans (AA) from the PAGE Study, we identify susceptibility loci and investigate pleiotropy among genetic variants using a subset-based meta-analysis method, ASsociation-analysis-based-on-subSETs (ASSET). Unlike conventional models, which lack power when associations for MetS components are null or have opposite effects, ASSET uses one-sided tests to detect positive and negative associations for components separately and combines tests accounting for correlations among components. With ASSET, we identify 27 SNPs in 1 glucose and 4 lipid loci (TCF7L2, LPL, APOA5, CETP, APOC1/APOE/TOMM40) significantly associated with MetS components overall, all P < 2.5e-7, the Bonferroni adjusted P-value. Three loci replicate in a Hispanic population, n=5172. A novel AA-specific variant, rs12721054/APOC1, and rs10096633/LPL are associated with ≥3 MetS components. We find additional evidence of pleiotropy for APOE, TOMM40, TCF7L2 and CETP variants, many with opposing effects; e.g. the same rs7901695/TCF7L2 allele is associated with increased odds of high glucose and decreased odds of central adiposity.
We highlight a method to increase power in large-scale genomic association analyses, and report a novel variant associated with all MetS components in AA. We also identify pleiotropic associations that may be clinically useful in patient risk profiling and for informing translational research of potential gene targets and medications.
metabolic syndrome; population studies; high-density lipoprotein cholesterol; genetic variation; hyperglycemia; ASSET; PAGE Study; African Americans; cardio-metabolic traits; Metabochip
A genome-wide association study (GWAS) of bladder cancer identified a genetic marker rs8102137 within the 19q12 region as a novel susceptibility variant. This marker is located upstream of the CCNE1 gene, which encodes cyclin E, a cell cycle protein. We performed genetic fine mapping analysis of the CCNE1 region using data from two bladder cancer GWAS (5,942 cases and 10,857 controls). We found that the original GWAS marker rs8102137 represents a group of 47 linked SNPs (with r²≥0.7) associated with increased bladder cancer risk. From this group we selected a functional promoter variant rs7257330, which showed strong allele-specific binding of nuclear proteins in several cell lines. In both GWAS, rs7257330 was associated only with aggressive bladder cancer, with a combined per-allele odds ratio (OR) =1.18 (95%CI=1.09-1.27, p=4.67×10−5) vs. OR =1.01 (95%CI=0.93-1.10, p=0.79) for non-aggressive disease, with p=0.0015 for case-only analysis. Cyclin E protein expression analyzed in 265 bladder tumors was increased in aggressive tumors (p=0.013) and, independently, with each rs7257330-A risk allele (ptrend=0.024). Over-expression of recombinant cyclin E in cell lines caused significant acceleration of cell cycle. In conclusion, we defined the 19q12 signal as the first GWAS signal specific for aggressive bladder cancer. Molecular mechanisms of this genetic association may be related to cyclin E over-expression and alteration of cell cycle in carriers of CCNE1 risk variants. In combination with established bladder cancer risk factors and other somatic and germline genetic markers, the CCNE1 variants could be useful for inclusion into bladder cancer risk prediction models.
Aggressive bladder cancer; cyclin E; cell cycle; single nucleotide polymorphism; GWAS
Brain glioma is a relatively rare and fatal malignancy in adulthood with few known risk factors. Some observational studies have reported inverse associations between diabetes and subsequent glioma risk, but possible mechanisms are unclear.
We conducted a pooled analysis of original data from five nested case-control studies and two case-control studies from the U.S. and China that included 962 glioma cases and 2,195 controls. We examined self-reported diabetes history in relation to glioma risk, as well as effect modification by seven glioma risk-associated single-nucleotide polymorphisms (SNPs). We also examined the associations between 13 diabetes risk-associated SNPs, identified from genome-wide association studies, and glioma risk. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated using multivariable-adjusted logistic regression models.
We observed a 42% reduced risk of glioma for individuals with a history of diabetes (OR=0.58, 95% CI: 0.40–0.84). The association did not differ by sex, study design, or after restricting to glioblastoma, the most common histological sub-type. We did not observe any significant per-allele trends among the 13 diabetes-related SNPs examined in relation to glioma risk.
These results support an inverse association between diabetes history and glioma risk. The role of genetic susceptibility to diabetes cannot be excluded, and should be pursued in future studies together with other factors that might be responsible for the diabetes-glioma association.
These data suggest the need for studies that can evaluate, separately, the association between type 1 and type 2 diabetes and subsequent risk of adult glioma.
diabetes mellitus; brain cancer; glioma; cancer; epidemiology
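The reported "42% reduced risk" is the odds ratio of 0.58 read as a percent change, which approximates a risk reduction only because glioma is rare. As a minimal sketch (the helper names are illustrative, and the Wald approximation on the log scale is an assumption, not necessarily how the authors computed their interval), the same figures also imply a standard error and an approximate p-value:

```python
import math

def percent_change_in_odds(odds_ratio):
    """Percent change in odds implied by an odds ratio (negative = reduction)."""
    return (odds_ratio - 1.0) * 100.0

def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Back out SE(log OR), Wald z, and two-sided p from a reported 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p

# Figures taken from the abstract: OR = 0.58, 95% CI 0.40 to 0.84.
change = percent_change_in_odds(0.58)
se, z, p = wald_summary(0.58, 0.40, 0.84)
print(f"{change:.0f}% change in odds; SE(log OR)={se:.3f}, z={z:.2f}, p={p:.4f}")
```

Because the interval is rounded to two decimals, the recovered standard error and p-value are approximate.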
Cancer risk is determined by a complex interplay of genetic and environmental factors. Genome-wide association studies (GWAS) have identified hundreds of common (minor allele frequency [MAF]>0.05) and less common (0.01
Gene-environment interactions; complex phenotypes; genetic epidemiology
To identify common genetic variants that contribute to lung cancer susceptibility, we conducted a multistage genome-wide association study of lung cancer in Asian women who never smoked. We scanned 5,510 never-smoking female lung cancer cases and 4,544 controls drawn from 14 studies from mainland China, South Korea, Japan, Singapore, Taiwan, and Hong Kong. We genotyped the most promising variants (associated at P < 5 × 10−6) in an additional 1,099 cases and 2,913 controls. We identified three new susceptibility loci at 10q25.2 (rs7086803, P = 3.54 × 10−18), 6q22.2 (rs9387478, P = 4.14 × 10−10) and 6p21.32 (rs2395185, P = 9.51 × 10−9). We also confirmed associations reported for loci at 5p15.33 and 3q28 and a recently reported finding at 17q24.3. We observed no evidence of association for lung cancer at 15q25 in never-smoking women in Asia, providing strong evidence that this locus is not associated with lung cancer independent of smoking.
In the National Cancer Institute Cancer Genetic Markers of Susceptibility (CGEMS) genome-wide association study of breast cancer, a single nucleotide polymorphism (SNP) marker, rs999737, in the 14q24.1 interval, was associated with breast cancer risk. In order to fine map this region, we imputed a 3.93MB region flanking rs999737 for Stages 1 and 2 of the CGEMS study (5,692 cases, 5,576 controls) using the combined reference panels of the HapMap 3 and the 1000 Genomes Project. Single-marker association testing and variable-sized sliding-window haplotype analysis were performed, and for both analyses the initial tagging SNP rs999737 retained the strongest association with breast cancer risk. Investigation of contiguous regions did not reveal evidence for an additional independent signal. Therefore, we conclude that rs999737 is an optimal tag SNP for common variants in the 14q24.1 region and thus narrow the candidate variants that should be investigated in follow-up laboratory evaluation.
RAD51L1; breast cancer; genome-wide association study; fine-mapping; imputation
The Lasso shrinkage procedure achieved its popularity, in part, by its tendency to shrink estimated coefficients to zero, and its ability to serve as a variable selection procedure. Using data-adaptive weights, the adaptive Lasso modified the original procedure to increase the penalty terms for those variables estimated to be less important by ordinary least squares. Although this modified procedure attained the oracle properties, the resulting models tend to include a large number of “false positives” in practice. Here, we adapt the concept of local false discovery rates (lFDRs) so that it applies to the sequence, λn, of smoothing parameters for the adaptive Lasso. We define the lFDR for a given λn to be the probability that the variable added to the model by decreasing λn to λn−δ is not associated with the outcome, where δ is a small value. We derive the relationship between the lFDR and λn, show lFDR=1 for traditional smoothing parameters, and show how to select λn so as to achieve a desired lFDR. We compare the smoothing parameters chosen to achieve a specified lFDR and those chosen to achieve the oracle properties, as well as their resulting estimates for model coefficients, with both simulation and an example from a genetic study of prostate specific antigen.
Adaptive Lasso; Local false discovery rate; Smoothing parameter; Variable selection
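The adaptive weighting described above can be illustrated in the simplest setting. The sketch below assumes an orthonormal design (X'X = I) and the objective (1/2)||y − Xβ||² + λ Σ w_j |β_j| with w_j = 1/|b_j|^γ, where b_j are the ordinary least squares estimates. Under those assumptions each coefficient is a soft-thresholded OLS estimate, and variable j enters the model once λ falls below |b_j|^(1+γ). The coefficients are hypothetical, and this is not the lFDR procedure itself, only the shrinkage path over which the lFDR is defined.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def adaptive_lasso_orthonormal(ols_coefs, lam, gamma=1.0):
    """Adaptive Lasso estimates when X'X = I (an assumption of this sketch)."""
    estimates = []
    for b in ols_coefs:
        if b == 0.0:
            estimates.append(0.0)  # weight is infinite, so the variable never enters
            continue
        weight = 1.0 / abs(b) ** gamma  # larger penalty for small OLS estimates
        estimates.append(soft_threshold(b, lam * weight))
    return estimates

# Hypothetical OLS coefficients; decreasing lambda admits variables one by one.
ols = [2.0, -1.0, 0.3, 0.1]
for lam in (0.5, 0.2, 0.05):
    fit = adaptive_lasso_orthonormal(ols, lam)
    selected = [j for j, b in enumerate(fit) if b != 0.0]
    print(lam, selected)
```

In this toy path the third variable enters only when λ drops below 0.3² = 0.09, which is the kind of step the lFDR is meant to evaluate.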
The genetic regulation of the human epigenome is not fully appreciated. Here we describe the effects of genetic variants on the DNA methylome in human lung based on methylation-quantitative trait loci (meQTL) analyses. We report 34,304 cis- and 585 trans-meQTLs, a genetic-epigenetic interaction of surprising magnitude, including a regulatory hotspot. These findings are replicated in both breast and kidney tissues and show distinct patterns: cis-meQTLs mostly localize to CpG sites outside of genes, promoters, and CpG islands (CGIs), while trans-meQTLs are over-represented in promoter CGIs. meQTL SNPs are enriched in CTCF binding sites, DNaseI hypersensitivity regions and histone marks. Importantly, 4 of the 5 established lung cancer risk loci in European ancestry are cis-meQTLs and, in aggregate, cis-meQTLs are enriched for lung cancer risk in a genome-wide analysis of 11,587 subjects. Thus, inherited genetic variation may affect lung carcinogenesis by regulating the human methylome.
Common variants in two of the five genetic regions recently identified from genome-wide association studies (GWAS) of risk of glioma were reported to interact with a history of allergic symptoms. In a pooled analysis of five epidemiologic studies, we evaluated the association between the five GWAS implicated gene variants and allergies and autoimmune conditions (AIC) on glioma risk (851 adult glioma cases and 3,977 controls). We further evaluated the joint effects between allergies and AIC and these gene variants on glioma risk. Risk estimates were calculated as odds ratios (OR) and 95 % confidence intervals (95 % CI), adjusted for age, gender, and study. Joint effects were evaluated by conducting stratified analyses whereby the risk associations (OR and 95 % CI) with the allergy or autoimmune conditions for glioma were evaluated by the presence or absence of the ‘at-risk’ variant, and estimated p interaction by fitting models with the main effects of allergy or autoimmune conditions and genotype and an interaction (product) term between them. Four of the five SNPs previously reported by others were statistically significantly associated with increased risk of glioma in our study (rs2736100, rs4295627, rs4977756, and rs6010620); rs498872 was not associated with glioma in our study. Reporting any allergies or AIC was associated with reduced risks of glioma (allergy: adjusted OR = 0.71, 95 % CI 0.55–0.91; AIC: adjusted OR = 0.65, 95 % CI 0.47–0.90). We did not observe differential association between allergic or autoimmune conditions and glioma by genotype, and there were no statistically significant p interactions. Stratified analysis by glioma grade (low and high grade) did not suggest risk differences by disease grade. Our results do not provide evidence that allergies or AIC modulate the association between the four GWAS-identified SNPs examined and risk of glioma.
Single-nucleotide polymorphisms; Glioma; Allergies; Autoimmune conditions; Gene–environment interaction
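The stratified analysis and product-term test described above can be sketched for a binary genotype and a binary exposure. In a logistic model with both main effects and their product, and no other covariates, the exponentiated interaction coefficient equals the ratio of the stratum-specific odds ratios, and its Wald standard error is the square root of the summed reciprocal cell counts. The counts below are hypothetical, and the sketch omits the adjustment for age, gender, and study used in the analysis.

```python
import math

# Hypothetical (cases, controls) counts keyed by (at-risk genotype, allergy history).
counts = {
    (0, 0): (200, 800),
    (0, 1): (70, 400),
    (1, 0): (300, 900),
    (1, 1): (100, 450),
}

def odds_ratio(exposed, unexposed):
    """Odds ratio from (cases, controls) pairs for exposed and unexposed groups."""
    (a, b), (c, d) = exposed, unexposed
    return (a * d) / (b * c)

def stratified_or(genotype):
    """OR for allergy history within one genotype stratum."""
    return odds_ratio(counts[(genotype, 1)], counts[(genotype, 0)])

or_e_g0 = stratified_or(0)
or_e_g1 = stratified_or(1)

# Multiplicative interaction: ratio of stratum-specific ORs.
interaction_ratio = or_e_g1 / or_e_g0
se_log = math.sqrt(sum(1.0 / n for pair in counts.values() for n in pair))
z = math.log(interaction_ratio) / se_log
p_interaction = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value

print(f"OR (G=0)={or_e_g0:.2f}, OR (G=1)={or_e_g1:.2f}, "
      f"ratio={interaction_ratio:.2f}, p interaction={p_interaction:.2f}")
```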
Bladder cancer results from the combined effects of environmental and genetic factors, smoking being the strongest risk factor. Evaluating absolute risks resulting from the joint effects of smoking and genetic factors is critical to evaluate the public health relevance of genetic information. Analyses included up to 3,942 cases and 5,680 controls of European background in seven studies. We tested for multiplicative and additive interactions between smoking and 12 susceptibility loci, individually and combined as a polygenic risk score (PRS). Thirty-year absolute risks and risk differences by levels of the PRS were estimated for US males aged 50 years. Six out of 12 variants showed significant additive gene-environment interactions, most notably NAT2 (P=7×10−4) and UGT1A6 (P=8×10−4). The 30-year absolute risk of bladder cancer in US males was 6.2% for all current smokers. This risk ranged from 2.9% for current smokers in the lowest quartile of the PRS to 9.9% for current smokers in the upper quartile. Risk difference estimates indicated that 8,200 cases would be prevented if elimination of smoking occurred in 100,000 men in the upper PRS quartile, compared to 2,000 cases prevented by a similar effort in the lowest PRS quartile (P-additive =1×10−4). The impact of eliminating smoking on the number of bladder cancer cases prevented is larger for individuals at higher than lower genetic risk. Our findings could have implications for targeted prevention strategies. However, other smoking-related diseases, as well as practical and ethical considerations, need to be considered before any recommendations could be made.
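The cases-prevented figures are absolute risk differences scaled to a population of 100,000. A minimal sketch of that arithmetic, using only the numbers quoted above (the function names are illustrative):

```python
def implied_risk_difference(cases_prevented, population):
    """Absolute risk difference implied by cases prevented in a population."""
    return cases_prevented / population

def cases_prevented(risk_difference, population):
    """Expected cases prevented for a given absolute risk difference."""
    return risk_difference * population

POPULATION = 100_000

# Figures from the abstract: 8,200 (upper PRS quartile) vs 2,000 (lowest quartile).
rd_upper = implied_risk_difference(8_200, POPULATION)
rd_lower = implied_risk_difference(2_000, POPULATION)

# Additive-scale contrast: extra cases prevented per 100,000 men in the upper quartile.
extra_cases = 8_200 - 2_000

print(f"RD upper={rd_upper:.3f}, RD lower={rd_lower:.3f}, extra cases={extra_cases}")
```

The two risk differences (8.2 and 2.0 percentage points) differ by 6.2 percentage points, which is the additive-scale contrast tested by P-additive.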
A few epidemiologic studies have found that use of nonsteroidal anti-inflammatory drugs (NSAIDs) is associated with reduced risk of bladder cancer. However, the effects of specific NSAID use and individual variability in risk have not been well studied. We examined the association between NSAID use and bladder cancer risk, and its modification by 39 candidate genes related to NSAID metabolism. A population-based case–control study was conducted in northern New England, enrolling 1,171 newly diagnosed cases and 1,418 controls. Regular use of nonaspirin, nonselective NSAIDs was associated with reduced bladder cancer risk, with a statistically significant inverse trend in risk with duration of use (ORs of 1.0, 0.8, 0.6 and 0.6 for <5, 5–9, 10–19 and 20+ years, respectively; ptrend = 0.015). This association was driven mainly by ibuprofen; significant inverse trends in risk with increasing duration and dose of ibuprofen were observed (ptrend = 0.009 and 0.054, respectively). The reduced risk from ibuprofen use was limited to individuals carrying the T allele of a single nucleotide polymorphism (rs4646450) in the CYP3A locus, compared to those who did not use ibuprofen and did not carry the T allele, providing new evidence that this association might be modified by polymorphisms in genes that metabolize ibuprofen. Significant positive trends in risk with increasing duration and cumulative dose of selective cyclooxygenase (COX-2) inhibitors were observed. Our results are consistent with those from previous studies linking use of NSAIDs, particularly ibuprofen, with reduced risk. We observed a previously unrecognized risk associated with use of COX-2 inhibitors, which merits further evaluation.
bladder cancer; nonsteroidal anti-inflammatory drugs; gene–drug interaction; CYP3A
Neuronal nicotinic acetylcholine receptor (nAChR) genes (CHRNA5/CHRNA3/CHRNB4) have been reproducibly associated with nicotine dependence, smoking behaviors, and lung cancer risk. Of the few reports that have focused on early smoking behaviors, association results have been mixed. This meta-analysis examines early smoking phenotypes and SNPs in the gene cluster to determine: (1) whether the most robust association signal in this region (rs16969968) for other smoking behaviors is also associated with early behaviors, and/or (2) if additional statistically independent signals are important in early smoking. We focused on two phenotypes: age of tobacco initiation (AOI) and age of first regular tobacco use (AOS). This study included 56,034 subjects (41 groups) spanning nine countries and evaluated five SNPs including rs1948, rs16969968, rs578776, rs588765, and rs684513. Each dataset was analyzed using a centrally generated script. Meta-analyses were conducted from summary statistics. AOS yielded significant associations with SNPs rs578776 (beta = 0.02, P = 0.004), rs1948 (beta = 0.023, P = 0.018), and rs684513 (beta = 0.032, P = 0.017), indicating protective effects. There were no significant associations for the AOI phenotype. Importantly, rs16969968, the most replicated signal in this region for nicotine dependence, cigarettes per day, and cotinine levels, was not associated with AOI (P = 0.59) or AOS (P = 0.92). These results provide important insight into the complexity of smoking behavior phenotypes, and suggest that association signals in the CHRNA5/A3/B4 gene cluster affecting early smoking behaviors may be different from those affecting the mature nicotine dependence phenotype.
CHRNA5; CHRNA3; CHRNB4; meta-analysis; nicotine; smoke
Primary analysis of case–control studies focuses on the relationship between disease D and a set of covariates of interest (Y, X). A secondary application of the case–control study, which is often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated owing to the case–control sampling, where the regression of Y on X is different from what it is in the population. Previous work has assumed a parametric distribution for Y given X and derived semiparametric efficient estimation and inference without any distributional assumptions about X. We take up the issue of estimation of a regression function when Y given X follows a homoscedastic regression model, but otherwise the distribution of Y is unspecified. The semiparametric efficient approaches can be used to construct semiparametric efficient estimates, but they suffer from a lack of robustness to the assumed model for Y given X. We take an entirely different approach. We show how to estimate the regression parameters consistently even if the assumed model for Y given X is incorrect, and thus the estimates are model robust. For this we make the assumption that the disease rate is known or well estimated. The assumption can be dropped when the disease is rare, which is typically so for most case–control studies, and the estimation algorithm simplifies. Simulations and empirical examples are used to illustrate the approach.
Biased samples; Homoscedastic regression; Secondary data; Secondary phenotypes; Semiparametric inference; Two-stage samples
A recent genome-wide association study (GWAS) of subjects from Japan and South Korea reported a novel association between the TP63 locus on chromosome 3q28 and risk of lung adenocarcinoma (p = 7.3 × 10−12); however, this association did not achieve genome-wide significance (p < 10−7) among never-smoking males or females. To determine if this association with lung cancer risk is independent of tobacco use, we genotyped the TP63 SNPs reported by the previous GWAS (rs10937405 and rs4488809) in 3,467 never-smoking female lung cancer cases and 3,787 never-smoking female controls from 10 studies conducted in Taiwan, Mainland China, South Korea, and Singapore. Genetic variation in rs10937405 was associated with risk of lung adenocarcinoma [n = 2,529 cases; p = 7.1 × 10−8; allelic risk = 0.80, 95% confidence interval (CI) = 0.74–0.87]. There was also evidence of association with squamous cell carcinoma of the lung (n = 302 cases; p = 0.037; allelic risk = 0.82, 95% CI = 0.67–0.99). Our findings provide strong evidence that genetic variation in TP63 is associated with the risk of lung adenocarcinoma among Asian females in the absence of tobacco smoking.
We present a Bayesian approach to modeling dynamic smoking addiction behavior processes when cure is not directly observed due to censoring. Subject-specific probabilities model the stochastic transitions among three behavioral states: smoking, transient quitting, and permanent quitting (absorbent state). A multivariate normal distribution for random effects is used to account for the potential correlation among the subject-specific transition probabilities. Inference is conducted using a Bayesian framework via Markov chain Monte Carlo simulation. This framework provides various measures of subject-specific predictions, which are useful for policy-making, intervention development, and evaluation. Simulations are used to validate our Bayesian methodology and assess its frequentist properties. Our methods are motivated by, and applied to, the Alpha-Tocopherol, Beta-Carotene Lung Cancer Prevention study, a large (29,133 individuals) longitudinal cohort study of smokers from Finland.
Cure model; MCMC; Mixed-effects model; Prediction; Recurrent events; Smoking cessation
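The three-state structure described above can be sketched as a discrete-time Markov chain with one absorbing state. The transition probabilities below are hypothetical and fixed across subjects; the method itself uses subject-specific probabilities with correlated random effects estimated by MCMC, which this sketch does not attempt.

```python
STATES = ("smoking", "transient quitting", "permanent quitting")

# Hypothetical one-step transition probabilities; each row sums to 1.
# The last row makes permanent quitting an absorbing state.
TRANSITIONS = [
    [0.85, 0.12, 0.03],
    [0.35, 0.55, 0.10],
    [0.00, 0.00, 1.00],
]

def step(dist, matrix):
    """Advance a state distribution by one time step."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def distribution_after(start, matrix, n_steps):
    """State distribution after n_steps transitions from a starting distribution."""
    dist = list(start)
    for _ in range(n_steps):
        dist = step(dist, matrix)
    return dist

start = [1.0, 0.0, 0.0]  # everyone begins as a smoker
for n_steps in (1, 5, 10):
    dist = distribution_after(start, TRANSITIONS, n_steps)
    print(n_steps, {s: round(p, 3) for s, p in zip(STATES, dist)})
```

The probability mass in the absorbing state can only grow over time, which is the sense in which permanent quitting acts as a "cure" that is never directly observed under censoring.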
There has been a long-standing controversy in epidemiology with regard to an appropriate risk scale for testing interactions between genes (G) and environmental exposure (E). Although interaction tests based on the logistic model (which approximates the multiplicative risk for rare diseases) have been more widely applied because of their convenience in statistical modeling, interactions under additive risk models have been regarded as closer to true biologic interactions and more useful in intervention-related decision-making processes in public health. It has been well known that exploiting a natural assumption of G-E independence for the underlying population can dramatically increase statistical power for detecting multiplicative interactions in case-control studies. However, the implication of the independence assumption for tests for additive interaction has not been previously investigated. In this article, the authors develop a likelihood ratio test for detecting additive interactions for case-control studies that incorporates the G-E independence assumption. Numerical investigation of power suggests that incorporation of the independence assumption can enhance the efficiency of the test for additive interaction by 2- to 2.5-fold. The authors illustrate their method by applying it to data from a bladder cancer study.
additive risk model; case-control studies; gene-environment independence; gene-environment interaction; multiplicative risk model
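The difference between the two risk scales can be made concrete with the standard relative excess risk due to interaction (RERI), computed from odds ratios as an approximation for a rare disease. This is a sketch of the scale distinction only, with hypothetical odds ratios; it is not the likelihood ratio test developed in the article and does not use the G-E independence assumption.

```python
def reri(or_g, or_e, or_ge):
    """Relative excess risk due to interaction; 0 means no additive interaction."""
    return or_ge - or_g - or_e + 1.0

def multiplicative_ratio(or_g, or_e, or_ge):
    """Ratio of joint OR to product of single ORs; 1 means no multiplicative interaction."""
    return or_ge / (or_g * or_e)

# Hypothetical odds ratios relative to the unexposed, non-carrier group.
or_g, or_e, or_ge = 1.5, 2.0, 3.0

additive = reri(or_g, or_e, or_ge)
multiplicative = multiplicative_ratio(or_g, or_e, or_ge)
print(f"RERI={additive:.2f}, multiplicative ratio={multiplicative:.2f}")
```

With these values the joint effect is exactly multiplicative (ratio of 1) yet super-additive (RERI of 0.5), so a null result on one scale does not imply a null result on the other.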
Pulmonary inflammation may contribute to lung cancer etiology. We conducted a broad evaluation of the association of single nucleotide polymorphisms (SNPs) in innate immunity and inflammation pathways with lung cancer risk, and conducted comparisons with a lung cancer genome wide association study (GWAS).
We included 378 lung cancer cases and 450 controls from the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. An Illumina GoldenGate oligonucleotide pool assay was used to genotype 1,429 SNPs. Odds ratios (ORs) and 95% confidence intervals (CIs) were estimated for each SNP, and p-values for trend were calculated. For statistically significant SNPs (p-trend<0.05), we replicated our results with genotyped or imputed SNPs in the GWAS, and adjusted p-values for multiple testing.
In our PLCO analysis, we observed a significant association between 81 SNPs located in 44 genes and lung cancer (p-trend<0.05). Of these 81 SNPs, there was evidence for confirmation in the GWAS for 10 SNPs. However, after adjusting for multiple comparisons, the only SNP that remained significantly associated with lung cancer in the replication phase was rs4648127 (NFKB1; multiple testing adjusted p-trend=0.02). The CT/TT genotype of NFKB1 was associated with reduced odds of lung cancer in the PLCO study (OR=0.56; 95% CI 0.37–0.86) and the GWAS (OR=0.79; 95% CI 0.69–0.90).
We found a significant association between a variant in the NFKB1 gene and lung cancer risk. Our findings add to evidence implicating inflammation and immunity in lung cancer etiology.
lung cancer; genetics; inflammation; immunity; epidemiology
Interest in performing gene-environment interaction studies has increased significantly with the rise of advanced molecular genetics techniques. In practice, it has become possible to investigate the role of environmental factors in disease risk and hence to investigate their role as genetic effect modifiers. The understanding that genetics is important in the uptake and metabolism of toxic substances is an example of how genetic profiles can modify important environmental risk factors for disease. Several rationales exist to set up gene-environment interaction studies, and the technical challenges related to these studies – when the number of environmental or genetic risk factors is relatively small – have been described before.
In the post-genomic era, it is now possible to study thousands of genes and their interaction with the environment. This brings along a whole range of new challenges and opportunities. Despite a continuing effort in developing efficient methods and optimal bioinformatics infrastructures to deal with the available wealth of data, the challenge remains how to best present and analyze Genome-Wide Environmental Interaction (GWEI) studies involving multiple genetic and environmental factors. Since GWEIs are performed at the intersection of statistical genetics, bioinformatics and epidemiology, usually similar problems need to be dealt with as for Genome-Wide Association gene-gene Interaction (GWAI) studies. However, additional complexities need to be considered which are typical for large-scale epidemiological studies, but are also related to “joining” two heterogeneous types of data in explaining complex disease trait variation or for prediction purposes.
Genome-wide association studies; gene-environment interaction; post-GWAS analysis; association tests; exploratory methods
We report a new model to project the predictive performance of polygenic models based on the number and distribution of effect sizes for the underlying susceptibility alleles and the size of the training dataset. Using estimates of effect-size distribution and heritability derived from current studies, we project that while 45% of the variance of height has been attributed to common tagging single nucleotide polymorphisms (SNPs), a model trained on one million people may only explain 33.4% of variance of the trait. Current studies can identify 3.0%, 1.1%, and 7.0% of the populations who are at two-fold or higher than average risk for Type 2 diabetes, coronary artery disease and prostate cancer, respectively. Tripling of sample sizes could elevate the percentages to 18.8%, 6.1%, and 12.2%, respectively. The utility of future polygenic models will depend on achievable sample sizes, underlying genetic architecture and information on other risk factors, including family history.