Interest in gene-environment interaction studies has grown considerably with the spread of advanced molecular genetics techniques. It has become practical to investigate the role of environmental factors in disease risk and hence their role as modifiers of genetic effects. The understanding that genetics is important in the uptake and metabolism of toxic substances is an example of how genetic profiles can modify important environmental risk factors for disease. Several rationales exist for setting up gene-environment interaction studies, and the technical challenges related to these studies, when the number of environmental or genetic risk factors is relatively small, have been described before.
In the post-genomic era, it is now possible to study thousands of genes and their interaction with the environment. This brings a whole range of new challenges and opportunities. Despite a continuing effort to develop efficient methods and optimal bioinformatics infrastructures for dealing with the available wealth of data, the challenge remains how best to present and analyze Genome-Wide Environmental Interaction (GWEI) studies involving multiple genetic and environmental factors. Since GWEIs are performed at the intersection of statistical genetics, bioinformatics and epidemiology, they usually face problems similar to those of Genome-Wide Association gene-gene Interaction (GWAI) studies. However, additional complexities need to be considered; some are typical of large-scale epidemiological studies, and others arise from “joining” two heterogeneous types of data to explain complex disease trait variation or for prediction purposes.
Genome-wide association studies; gene-environment interaction; post-GWAS analysis; association tests; exploratory methods
We report a new model to project the predictive performance of polygenic models based on the number and distribution of effect sizes for the underlying susceptibility alleles and the size of the training dataset. Using estimates of effect-size distribution and heritability derived from current studies, we project that while 45% of the variance of height has been attributed to common tagging single nucleotide polymorphisms (SNPs), a model trained on one million people may only explain 33.4% of the variance of the trait. Current studies can identify 3.0%, 1.1%, and 7.0% of the population who are at two-fold or higher than average risk for Type 2 diabetes, coronary artery disease and prostate cancer, respectively. Tripling of sample sizes could elevate the percentages to 18.8%, 6.1%, and 12.2%, respectively. The utility of future polygenic models will depend on achievable sample sizes, underlying genetic architecture and information on other risk factors, including family history.
Recent studies have identified common genetic variants that are unequivocally associated with central adiposity, BMI, and/or fasting plasma glucose among individuals of European descent. Our objective was to evaluate these associations in a population of Asian-Indians. We examined 16 single-nucleotide polymorphisms (SNPs) from loci previously linked to waist circumference, BMI, or fasting glucose in 1,129 Asian-Indians from New Delhi and Trivandrum. Trained medical staff measured waist circumference, height, and weight. Fasting plasma glucose was measured from collected blood specimens. Genotype–phenotype associations were evaluated using linear regression, with adjustments for age, gender, religion, and study region. For gene–environment interaction tests, total physical activity (PA) during the past 7 days was assessed by the International Physical Activity Questionnaire (IPAQ). The T allele at the FTO rs3751812 locus was associated with increased waist circumference (per allele effect of +1.58 cm, Ptrend = 0.0015) after Bonferroni adjustment for multiple testing (Padj = 0.04). We also found a nominally statistically significant FTO–PA interaction (Pinteraction = 0.008). Among participants with <81 metabolic equivalent (MET)-h/wk of PA, the rs3751812 variant was associated with increased waist size (+2.68 cm; 95% confidence interval (CI) = 1.24, 4.12), but not among those with 212+ MET-h/wk (−1.79 cm; 95% CI = −4.17, 0.58). No other variant had statistically significant associations, although statistical power was modest. In conclusion, we confirmed that an FTO variant associated with central adiposity in European populations is associated with central adiposity among Asian-Indians and corroborated prior reports indicating that high PA attenuates FTO-related genetic susceptibility to adiposity.
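The two stratum-specific estimates quoted above can be compared with a simple heterogeneity check. The sketch below is illustrative only: it backs out standard errors from the reported 95% confidence intervals (assuming symmetric Wald intervals) and tests the difference between the lowest and highest activity strata, treating them as independent. It is not the interaction test used in the study, so its p-value is not expected to match the reported Pinteraction; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Back out a standard error from a symmetric Wald confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)


def heterogeneity_z(est_a, se_a, est_b, se_b):
    """z statistic and two-sided p-value for est_a - est_b (independent strata)."""
    z = (est_a - est_b) / sqrt(se_a ** 2 + se_b ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Per-allele waist-circumference effects (cm) quoted in the abstract.
low_pa = (2.68, se_from_ci(1.24, 4.12))      # <81 MET-h/wk
high_pa = (-1.79, se_from_ci(-4.17, 0.58))   # 212+ MET-h/wk

z, p = heterogeneity_z(*low_pa, *high_pa)
print(f"difference = {low_pa[0] - high_pa[0]:.2f} cm, z = {z:.2f}, p = {p:.4f}")
```

A check of this kind only contrasts two strata; a regression model with a genotype-by-activity product term uses all participants and adjusts for covariates.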
Background Some, but not all, observational studies have suggested that taller stature is associated with a significant increased risk of glioma. In a pooled analysis of observational studies, we investigated the strength and consistency of this association, overall and for major sub-types, and investigated effect modification by genetic susceptibility to the disease.
Methods We standardized and combined individual-level data on 1354 cases and 4734 control subjects from 13 prospective and 2 case–control studies. Pooled odds ratios (ORs) and 95% confidence intervals (CIs) for glioma and glioma sub-types were estimated using logistic regression models stratified by sex and adjusted for birth cohort and study. Pooled ORs were additionally estimated after stratifying the models according to seven recently identified glioma-related genetic variants.
Results Among men, we found a positive association between height and glioma risk (≥190 vs 170–174 cm, pooled OR = 1.70, 95% CI: 1.11–2.61; P-trend = 0.01), which was slightly stronger after restricting to cases with glioblastoma (pooled OR = 1.99, 95% CI: 1.17–3.38; P-trend = 0.02). Among women, these associations were less clear (≥175 vs 160–164 cm, pooled OR for glioma = 1.06, 95% CI: 0.70–1.62; P-trend = 0.22; pooled OR for glioblastoma = 1.36, 95% CI: 0.77–2.39; P-trend = 0.04). In general, we did not observe evidence of effect modification by glioma-related genotypes on the association between height and glioma risk.
Conclusion An association of taller adult stature with glioma, particularly for men and stronger for glioblastoma, should be investigated further to clarify the role of environmental and genetic determinants of height in the etiology of this disease.
Height; brain cancer; glioma; cancer; epidemiology
Epidemiological studies have yielded inconsistent associations between vitamin D status and prostate cancer risk, and few studies have evaluated whether the associations vary by disease aggressiveness. We investigated the association between vitamin D status, as determined by serum 25-hydroxy-vitamin D [25(OH)D] level, and risk of prostate cancer in a case–control study nested within the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial.
The study included 749 case patients with incident prostate cancer who were diagnosed 1 to 8 years after blood draw and 781 control subjects who were frequency-matched by age at cohort entry, time since initial screening, and calendar year of cohort entry. All study participants were selected from the trial screening arm (which includes annual standardized prostate cancer screening). Conditional logistic regression was used to estimate odds ratios (ORs) with 95% confidence intervals (CIs) by quintile of 25(OH)D. Statistical tests were two-sided.
No statistically significant trend in overall prostate cancer risk was observed with increasing serum season-standardized 25(OH)D level. However, serum 25(OH)D concentrations greater than the lowest quintile (Q1) were associated with increased risk of aggressive (Gleason sum ≥7 or clinical stage III or IV) disease (ORs for Q2 vs Q1 = 1.20, 95% CI = 0.80 to 1.81; for Q3 vs Q1 = 1.96, 95% CI = 1.34 to 2.87; for Q4 vs Q1 = 1.61, 95% CI = 1.09 to 2.38; and for Q5 vs Q1 = 1.37, 95% CI = 0.92 to 2.05; Ptrend = .05). The rates of aggressive prostate cancer for increasing quintiles of serum 25(OH)D were 406, 479, 780, 633, and 544 per 100,000 person-years. In exploratory analyses, these associations with aggressive disease were consistent across subgroups defined by age, family history of prostate cancer, diabetes, body mass index, vigorous physical activity, calcium intake, study center, season of blood collection, and assay batch.
The findings of this large prospective study do not support the hypothesis that vitamin D is associated with decreased risk of prostate cancer; indeed, higher circulating 25(OH)D concentrations may be associated with increased risk of aggressive disease.
25-hydroxy-vitamin D; prostate cancer
We show how to use reports of cancer in family members to discover additional genetic associations or confirm previous findings in genome-wide association (GWA) studies conducted in case-control, cohort, or cross-sectional studies. Our novel family-history-based approach allows economical association studies for multiple cancers, without genotyping of relatives (as required in family studies), follow-up of participants (as required in cohort studies), or oversampling of specific cancer cases (as required in case-control studies). We empirically evaluate the performance of the proposed family-history-based approach in studying associations with prostate and ovarian cancers, using data from GWA studies previously conducted within the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. The family-history-based method may be particularly useful for investigating genetic susceptibility to rare diseases, for which accruing cases may be very difficult, by using disease information from non-genotyped relatives of participants in multiple case-control and cohort studies designed primarily for other purposes.
To estimate the likely number and predictive strength of cancer-associated single nucleotide polymorphisms (SNPs) that are yet to be discovered for seven common cancers.
From the statistical power of published genome-wide association studies, we estimated the number of undetected susceptibility loci and the distribution of effect sizes for all cancers. Assuming a log-normal model for risks and multiplicative relative risks for SNPs, family history (FH), and known risk factors, we estimated the area under the receiver operating characteristic curve (AUC) and the proportion of patients with risks above risk thresholds for screening. From additional prevalence data, we estimated the positive predictive value and the ratio of non–patient cases to patient cases (false-positive ratio) for various risk thresholds.
Age-specific discriminatory accuracy (AUC) for models including FH and foreseeable SNPs ranged from 0.575 for ovarian cancer to 0.694 for prostate cancer. The proportions of patients in the highest decile of population risk ranged from 16.2% for ovarian cancer to 29.4% for prostate cancer. The corresponding false-positive ratios were 241 for colorectal cancer, 610 for ovarian cancer, and 138 or 280 for breast cancer in women age 50 to 54 or 40 to 44 years, respectively.
Foreseeable common SNP discoveries may not permit identification of small subsets of patients that contain most cancers. Usefulness of screening could be diminished by many false positives. Additional strong risk factors are needed to improve risk discrimination.
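To make the link between a log-normal risk model, the AUC, and the share of patients above a risk threshold more concrete, here is a minimal sketch. It assumes a rare disease and a single log-normal relative-risk distribution with standard deviation sigma on the log scale, so that log risk among patients is shifted upward by sigma squared relative to the population. Under that assumption AUC = Φ(sigma/√2) and the fraction of patients above the population's q-th risk quantile is 1 − Φ(z_q − sigma). The function names are hypothetical, and the simplification will not reproduce the published figures exactly, since the study's models also include family history and other risk factors.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def auc_from_sigma(sigma):
    """AUC when log relative risk is N(., sigma^2) and the disease is rare."""
    return STD_NORMAL.cdf(sigma / sqrt(2))


def sigma_from_auc(auc):
    """Invert auc_from_sigma: log-scale SD of risk implied by a given AUC."""
    return sqrt(2) * STD_NORMAL.inv_cdf(auc)


def case_fraction_above_quantile(sigma, quantile=0.90):
    """Fraction of patients whose risk exceeds the population risk quantile."""
    z_q = STD_NORMAL.inv_cdf(quantile)
    return 1 - STD_NORMAL.cdf(z_q - sigma)


for auc in (0.575, 0.694):  # range of AUCs quoted in the abstract
    sigma = sigma_from_auc(auc)
    share = case_fraction_above_quantile(sigma)
    print(f"AUC {auc:.3f}: sigma = {sigma:.3f}, patients in top decile = {share:.1%}")
```

With sigma equal to zero the model has no discrimination: the AUC is 0.5 and exactly 10% of patients fall in the top decile, which is a convenient sanity check.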
A recent genome-wide association study of bladder cancer identified the UGT1A gene cluster on chromosome 2q37.1 as a novel susceptibility locus. The UGT1A cluster encodes a family of UDP-glucuronosyltransferases (UGTs), which facilitate cellular detoxification and removal of aromatic amines. Bioactivated forms of aromatic amines found in tobacco smoke and industrial chemicals are the main risk factors for bladder cancer. The association within the UGT1A locus was detected by a single nucleotide polymorphism (SNP) rs11892031. Now, we performed detailed resequencing, imputation and genotyping in this region. We clarified the original genetic association detected by rs11892031 and identified an uncommon SNP rs17863783 that explained and strengthened the association in this region (allele frequency 0.014 in 4035 cases and 0.025 in 5284 controls, OR = 0.55, 95%CI = 0.44–0.69, P = 3.3 × 10−7). Rs17863783 is a synonymous coding variant Val209Val within the functional UGT1A6.1 splicing form, strongly expressed in the liver, kidney and bladder. We found the protective T allele of rs17863783 to be associated with increased mRNA expression of UGT1A6.1 in in-vitro exontrap assays and in human liver tissue samples. We suggest that rs17863783 may protect from bladder cancer by increasing the removal of carcinogens from bladder epithelium by the UGT1A6.1 protein. Our study shows an example of genetic and functional role of an uncommon protective genetic variant in a complex human disease, such as bladder cancer.
In follow-up of a recent genome-wide association study (GWAS) that identified a locus in chromosome 2p21 associated with risk for renal cell carcinoma (RCC), we conducted a fine mapping analysis of a 120 kb region that includes EPAS1. We genotyped 59 tagged common single-nucleotide polymorphisms (SNPs) in 2278 RCC cases and 3719 controls of European background and observed a novel signal for rs9679290 [P = 5.75 × 10−8, per-allele odds ratio (OR) = 1.27, 95% confidence interval (CI): 1.17–1.39]. Imputation of common SNPs surrounding rs9679290 using HapMap 3 and 1000 Genomes data yielded two additional signals, rs4953346 (P = 4.09 × 10−14) and rs12617313 (P = 7.48 × 10−12), both highly correlated with rs9679290 (r2 > 0.95), but interestingly not correlated with the two SNPs reported in the GWAS: rs11894252 and rs7579899 (r2 < 0.1 with rs9679290). Genotype analysis of rs12617313 confirmed an association with RCC risk (P = 1.72 × 10−9, per-allele OR = 1.28, 95% CI: 1.18–1.39). In conclusion, we report that chromosome 2p21 harbors a complex genetic architecture for common RCC risk variants.
The question of which statistical approach is the most effective for investigating gene-environment (G-E) interactions in the context of genome-wide association studies (GWAS) remains unresolved. By using 2 case-control GWAS (the Nurses’ Health Study, 1976–2006, and the Health Professionals Follow-up Study, 1986–2006) of type 2 diabetes, the authors compared 5 tests for interactions: standard logistic regression-based case-control; case-only; semiparametric maximum-likelihood estimation of an empirical-Bayes shrinkage estimator; and 2-stage tests. The authors also compared 2 joint tests of genetic main effects and G-E interaction. Elevated body mass index was the exposure of interest and was modeled as a binary trait to avoid an inflated type I error rate that the authors observed when the main effect of continuous body mass index was misspecified. Although both the case-only and the semiparametric maximum-likelihood estimation approaches assume that the tested markers are independent of exposure in the general population, the authors did not observe any evidence of inflated type I error for these tests in their studies with 2,199 cases and 3,044 controls. Both joint tests detected markers with known marginal effects. Loci with the most significant G-E interactions using the standard, empirical-Bayes, and 2-stage tests were strongly correlated with the exposure among controls. Study findings suggest that methods exploiting G-E independence can be efficient and valid options for investigating G-E interactions in GWAS.
case-control studies; case study; diabetes mellitus, type 2; epidemiologic methods; genome-wide association study; genotype-environment interaction
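The difference between the standard case-control test and the case-only test can be illustrated with binary genotype and exposure. In a saturated logistic model, the interaction log odds ratio equals the G-E log odds ratio among cases minus the G-E log odds ratio among controls; the case-only test drops the control term, which is valid only if G and E are independent in the population. The counts below are hypothetical and the helper names are assumptions made for this sketch.

```python
from math import log, sqrt
from statistics import NormalDist


def log_or(table):
    """Log odds ratio and Wald SE for ((a, b), (c, d)): rows G=1/G=0, columns E=1/E=0."""
    (a, b), (c, d) = table
    est = log(a * d / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return est, se


def wald_p(est, se):
    """Two-sided Wald p-value."""
    return 2 * (1 - NormalDist().cdf(abs(est / se)))


# Hypothetical counts; the control table has a G-E odds ratio of exactly 1.
cases = ((120, 180), (280, 620))
controls = ((100, 250), (300, 750))

co_est, co_se = log_or(cases)       # case-only interaction estimate
ge_est, ge_se = log_or(controls)    # G-E association among controls
cc_est = co_est - ge_est            # standard case-control interaction estimate
cc_se = sqrt(co_se ** 2 + ge_se ** 2)

print(f"case-only:    beta = {co_est:.3f}, SE = {co_se:.3f}, p = {wald_p(co_est, co_se):.4f}")
print(f"case-control: beta = {cc_est:.3f}, SE = {cc_se:.3f}, p = {wald_p(cc_est, cc_se):.4f}")
```

The case-only standard error is always the smaller of the two, which is the source of its power advantage; the price is bias equal to the G-E log odds ratio in the population whenever independence fails.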
Several methods for screening gene-environment interaction have recently been proposed that address the issue of using gene-environment independence in a data-adaptive way. In this report, the authors present a comparative simulation study of power and type I error properties of 3 classes of procedures: 1) the standard 1-step case-control method; 2) the case-only method that requires an assumption of gene-environment independence for the underlying population; and 3) a variety of hybrid methods, including empirical-Bayes, 2-step, and model averaging, that aim at gaining power by exploiting the assumption of gene-environment independence and yet can protect against false positives when the independence assumption is violated. These studies suggest that, although the case-only method generally has maximum power, it has the potential to create substantial false positives in large-scale studies even when a small fraction of markers are associated with the exposure under study in the underlying population. All the hybrid methods perform well in protecting against such false positives and yet can retain substantial power advantages over standard case-control tests. The authors conclude that, for future genome-wide scans for gene-environment interactions, major power gain is possible by using alternatives to standard case-control analysis. Whether a case-only type scan or one of the hybrid methods should be used depends on the strength and direction of gene-environment interaction and association, the level of tolerance for false positives, and the nature of replication strategies.
case-control studies; efficiency; familywise error rate; genome-wide association study; profile likelihood; robustness; shrinkage
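One way to picture how a hybrid method trades power against robustness is a shrinkage estimator that averages the case-control and case-only interaction estimates, with weights driven by how strongly the data contradict gene-environment independence. The sketch below is written in the spirit of the empirical-Bayes approach named above and is an assumption of this illustration; the exact estimators and variance formulas evaluated in the simulation study may differ. All input values are hypothetical.

```python
def eb_interaction(beta_cc, var_cc, beta_co):
    """Shrink the case-control estimate toward the case-only estimate.

    theta is the implied G-E log odds ratio among controls
    (beta_co - beta_cc). When theta is near zero the data support
    independence and the case-only estimate dominates; when theta is
    large relative to var_cc the case-control estimate dominates.
    """
    theta = beta_co - beta_cc
    weight_cc = theta ** 2 / (theta ** 2 + var_cc)
    return weight_cc * beta_cc + (1 - weight_cc) * beta_co


# Hypothetical estimates on the log odds ratio scale.
print(eb_interaction(beta_cc=0.30, var_cc=0.04, beta_co=0.32))  # close to case-only
print(eb_interaction(beta_cc=0.30, var_cc=0.04, beta_co=1.50))  # close to case-control
```

The weight is data-adaptive, so no prior decision about the independence assumption is needed for each marker.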
Over the past several years, genome-wide association studies (GWAS) have succeeded in identifying hundreds of genetic markers associated with common diseases. However, most of these markers confer relatively small increments of risk and explain only a small proportion of familial clustering. To identify obstacles to future progress in genetic epidemiology research and provide recommendations to NIH for overcoming these barriers, the National Cancer Institute sponsored a workshop entitled “Next Generation Analytic Tools for Large-Scale Genetic Epidemiology Studies of Complex Diseases” on September 15–16, 2010. The goal of the workshop was to facilitate discussions on (1) statistical strategies and methods to efficiently identify genetic and environmental factors contributing to the risk of complex disease; and (2) how to develop, apply, and evaluate these strategies for the design, analysis, and interpretation of large-scale complex disease association studies in order to guide NIH in setting the future agenda in this area of research. The workshop was organized as a series of short presentations covering scientific (gene-gene and gene-environment interaction, complex phenotypes, and rare variants and next generation sequencing) and methodological (simulation modeling and computational resources and data management) topic areas. Specific needs to advance the field were identified during each session and are summarized.
gene-gene interactions; gene-environment interactions; rare variants; next generation sequencing; complex phenotypes; simulations; computational resources
In an analysis of 31,717 cancer cases and 26,136 cancer-free controls drawn from 13 genome-wide association studies (GWAS), we observed large chromosomal abnormalities in a subset of clones from DNA obtained from blood or buccal samples. Mosaic chromosomal abnormalities, either aneuploidy or copy-neutral loss of heterozygosity, of size >2 Mb were observed in autosomes of 517 individuals (0.89%) with abnormal cell proportions between 7% and 95%. In cancer-free individuals, the frequency increased with age: 0.23% under 50 and 1.91% between 75 and 79 (p=4.8×10−8). Mosaic abnormalities were more frequent in individuals with solid tumors (0.97% versus 0.74% in cancer-free individuals, OR=1.25, p=0.016), with a stronger association for cases who had DNA collected prior to diagnosis or treatment (OR=1.45, p=0.0005). Detectable clonal mosaicism was more common in individuals for whom DNA was collected at least one year prior to diagnosis of leukemia than in cancer-free individuals (OR=35.4, p=3.8×10−11). These findings underscore the importance of the role and time-dependent nature of somatic events in the etiology of cancer and other late-onset diseases.
We conducted a genome-wide association study (GWAS) of breast cancer by genotyping 528,173 single nucleotide polymorphisms (SNPs) in 1,145 cases of invasive breast cancer among postmenopausal white women, and 1,142 controls. We identified a set of four SNPs in intron 2 of FGFR2, a tyrosine kinase receptor previously shown to be amplified and/or over-expressed in some breast cancers, as highly associated with breast cancer and we confirmed this association in 1,776 cases and 2,072 controls from three additional studies. In both association testing and ancestral recombination graph analysis, FGFR2 haplotypes were associated with risk of breast cancer. Across the four studies the association with all four SNPs was highly statistically significant (Ptrend for the most strongly associated SNP, rs1219648 = 1.1 × 10−10; population attributable risk = 16%). Four SNPs at other chromosomal loci most strongly associated with breast cancer in the initial GWAS were not associated with risk in the three replication studies. Our summary results from the GWAS are freely available online in a form that should speed the identification of additional loci conferring risk.
We introduce an innovative multilocus test for disease association. It is an extension of an existing score test that gains power over alternative methods by incorporating a parsimonious one-degree-of-freedom model for interaction. We use our method in applications designed to detect interactions that generate hypotheses about the functionality of prostate cancer (PRCA) susceptibility regions.
Our proposed score test is designed to gain additional power through the use of a retrospective likelihood that exploits an assumption of independence between unlinked loci in the underlying population. Its performance is validated through simulation. The method is used in conditional scans with data from stage II of the Cancer Genetic Markers of Susceptibility PRCA genome-wide association study.
Our proposed method increases power to detect susceptibility loci in diverse settings. It identified two high-ranking, biologically interesting interactions: (1) rs748120 of NR2C2 and subregions of 8q24 that contain independent susceptibility loci specific to PRCA and (2) rs4810671 of SULF2 and both JAZF1 and HNF1B that are associated with PRCA and type 2 diabetes.
Our score test is a promising multilocus tool for genetic epidemiology. The results of our applications suggest functionality for poorly understood PRCA susceptibility regions. They motivate replication study.
Gene-gene interaction; Score test; Prostate cancer
Genome-wide and candidate-gene association studies of bladder cancer have identified 10 susceptibility loci thus far. We conducted a meta-analysis of two previously published genome-wide scans (4501 cases and 6076 controls of European background) and followed up the most significant association signals [17 single nucleotide polymorphisms (SNPs) in 10 genomic regions] in 1382 cases and 2201 controls from four studies. A combined analysis adjusted for study center, age, sex, and smoking status identified a novel susceptibility locus that mapped to a region of 18q12.3, marked by rs7238033 (P = 8.7 × 10−9; allelic odds ratio 1.20 with 95% CI: 1.13–1.28) and two highly correlated SNPs, rs10775480/rs10853535 (r2 = 1.00; P = 8.9 × 10−9; allelic odds ratio 1.16 with 95% CI: 1.10–1.22). The signal localizes to the solute carrier family 14 member 1 gene, SLC14A1, a urea transporter that regulates cellular osmotic pressure. In the kidney, SLC14A1 regulates urine volume and concentration whereas in erythrocytes it determines the Kidd blood groups. Our findings suggest that genetic variation in SLC14A1 could provide new etiological insights into bladder carcinogenesis.
Genetic association studies, thus far, have focused on the analysis of individual main effects of SNP markers. Nonetheless, there is a clear need for modeling epistasis or gene-gene interactions to better understand the biologic basis of existing associations. Tree-based methods have been widely studied as tools for building prediction models based on complex variable interactions. An understanding of the power of such methods for the discovery of genetic associations in the presence of complex interactions is of great importance. Here, we systematically evaluate the power of three leading algorithms: random forests (RF), Monte Carlo logic regression (MCLR), and multifactor dimensionality reduction (MDR).
We use the algorithm-specific variable importance measures (VIMs) as statistics and employ permutation-based resampling to generate the null distribution and associated p values. The power of the three is assessed via simulation studies. Additionally, in a data analysis, we evaluate the associations between individual SNPs in pro-inflammatory and immunoregulatory genes and the risk of non-Hodgkin lymphoma.
The power of RF is highest in all simulation models, that of MCLR is similar to RF in half, and that of MDR is consistently the lowest.
Our study indicates that the power of RF VIMs is most reliable. However, in addition to tuning parameters, the power of RF is notably influenced by the type of variable (continuous vs. categorical) and the chosen VIM.
Genetic associations; Power; Random forests; SNP; Variable importance measure
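The permutation step described above, in which an importance statistic is recomputed on label-shuffled data to build its null distribution, can be sketched generically. The statistic used here (absolute difference in mean genotype between cases and controls) is only a stand-in for an algorithm-specific variable importance measure, and the toy data and function names are hypothetical.

```python
import random


def permutation_p_value(statistic, genotypes, labels, n_perm=999, seed=0):
    """Permutation p-value for a statistic where larger means more important."""
    rng = random.Random(seed)
    observed = statistic(genotypes, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        if statistic(genotypes, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)


def mean_difference(genotypes, labels):
    """Stand-in importance: |mean genotype in cases - mean genotype in controls|."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


# Toy data: minor-allele counts for 10 cases followed by 10 controls.
genotypes = [2, 2, 1, 2, 1, 2, 1, 1, 2, 2, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
labels = [1] * 10 + [0] * 10

p = permutation_p_value(mean_difference, genotypes, labels)
print(f"permutation p-value = {p:.3f}")
```

Any importance measure can be passed in as `statistic`, provided it is recomputed from scratch on each shuffled data set; with forests or logic regression that means refitting the model for every permutation.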
A prospective study of diet and cancer has not been conducted in India; consequently, little is known regarding follow-up rates or the completeness and accuracy of cancer case ascertainment.
We assessed follow-up in the India Health Study (IHS; 4,671 participants aged 35–69 residing in New Delhi, Mumbai, or Trivandrum). We evaluated the impact of medical care access and relocation, re-contacted the IHS participants to estimate follow-up rates, and conducted separate studies of cancer cases to evaluate registry coverage (604 cases in Trivandrum) and the accuracy of self- and proxy-reporting (1600 cases in New Delhi and Trivandrum).
Over 97% of people reported seeing a doctor and 85% had lived in their current residence for over six years. The 2-year follow-up rate was 91% for Trivandrum and 53% for New Delhi. No cancer cases were missed among public institutions participating in the surveillance program in Trivandrum during 2003–04, but there are likely to be unmatched cases (ranging from 5 to 13% of total cases) from private hospitals in the Trivandrum registry, as there are no mandatory reporting requirements. Vital status was obtained for 36% of cancer cases in New Delhi as compared to 78% in Trivandrum after a period of 4 years.
A prospective cohort study of cancer may be feasible in some centers in India with active follow-up to supplement registry data. Inclusion of cancers diagnosed at private institutions, unique identifiers for individuals, and computerized medical information would likely improve cancer registries.
Cancer; end-point; follow-up; registry; prospective cohort; India
Next Generation Sequencing represents a powerful tool for detecting genetic variation associated with human disease. Because of the high cost of this technology, it is critical that we develop efficient study designs that consider the trade-off between the number of subjects (n) and the coverage depth (μ). How we divide our resources between the two can greatly impact study success, particularly in pilot studies. We propose a strategy for selecting the optimal combination of n and μ for studies aimed at detecting rare variants and for studies aimed at detecting associations between rare or uncommon variants and disease. For detecting rare variants, we find the optimal coverage depth to be between 2 and 8 reads when using the likelihood ratio test. For association studies, we find the strategy of sequencing all available subjects to be preferable. In deriving these combinations, we provide a detailed analysis describing the distribution of depth across a genome and the depth needed to identify a minor allele in an individual. The optimal coverage depth depends on the aims of the study, and the chosen depth can have a large impact on study success.
next generation sequencing; sequencing depth; study design; rare variants
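The trade-off between the number of subjects n and the coverage depth μ can be explored with a deliberately simplified model. The sketch assumes that per-site depth is Poisson with mean μ, that a heterozygote's minor-allele reads are a Poisson thinning with mean μ/2, that a variant is called when at least `min_reads` minor-allele reads are seen, and that sequencing error is ignored. These are assumptions of this illustration, not the depth distribution or the likelihood ratio test analyzed in the abstract, and the function names are hypothetical.

```python
from math import exp, factorial


def p_detect_het(mean_depth, min_reads=2):
    """P(at least min_reads minor-allele reads) in a heterozygote."""
    lam = mean_depth / 2  # expected minor-allele reads
    p_fewer = sum(exp(-lam) * lam ** j / factorial(j) for j in range(min_reads))
    return 1 - p_fewer


def p_variant_found(total_depth_budget, mean_depth, maf, min_reads=2):
    """P(variant is seen in at least one subject) under a fixed budget n * depth."""
    n_subjects = int(total_depth_budget // mean_depth)
    p_carrier = 2 * maf * (1 - maf)  # heterozygotes only; rare homozygotes ignored
    per_subject = p_carrier * p_detect_het(mean_depth, min_reads)
    return 1 - (1 - per_subject) ** n_subjects


budget = 3000  # hypothetical budget in units of subject-by-depth
for depth in (2, 4, 8, 16, 30):
    found = p_variant_found(budget, depth, maf=0.005)
    print(f"depth {depth:>2}: n = {budget // depth:>4}, P(found) = {found:.3f}")
```

Under a fixed budget, raising the depth improves detection within each carrier but reduces the number of subjects and therefore the chance of sampling a carrier at all, which is the tension the abstract describes.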
We report a genome-wide association study in 10,286 cases and 9,135 controls of European ancestry, in the Cancer Genetic Markers of Susceptibility (CGEMS) initiative, identifying a new association with prostate cancer risk on chromosome 8q24 (rs620861, p = 1.3 × 10−10, heterozygote OR = 1.17, 95% CI 1.10–1.24; homozygote OR = 1.33, 95% CI 1.21–1.45). This defines a new prostate locus on 8q24, Region 4, previously associated with breast cancer.
Previous genome-wide association studies have identified two independent variants in HNF1B as susceptibility loci for prostate cancer risk. To fine-map common genetic variation in this region, we genotyped 79 single nucleotide polymorphisms (SNPs) in the 17q12 region harboring HNF1B in 10 272 prostate cancer cases and 9123 controls of European ancestry from 10 case–control studies as part of the Cancer Genetic Markers of Susceptibility (CGEMS) initiative. Ten SNPs were significantly related to prostate cancer risk at a genome-wide significance level of P < 5 × 10−8 with the most significant association with rs4430796 (P = 1.62 × 10−24). However, risk within this first locus was not entirely explained by rs4430796. Although modestly correlated (r2 = 0.64), rs7405696 was also associated with risk (P = 9.35 × 10−23) even after adjustment for rs4430796 (P = 0.007). As expected, rs11649743 was related to prostate cancer risk (P = 3.54 × 10−8); however, the association within this second locus was stronger for rs4794758 (P = 4.95 × 10−10), which explained all of the risk observed with rs11649743 when both SNPs were included in the same model (P = 0.32 for rs11649743; P = 0.002 for rs4794758). Sequential conditional analyses indicated that five SNPs (rs4430796, rs7405696, rs4794758, rs1016990 and rs3094509) together comprise the best model for risk in this region. This study demonstrates a complex relationship between variants in the HNF1B region and prostate cancer risk. Further studies are needed to investigate the biological basis of the association of variants in 17q12 with prostate cancer.
While lung cancer is largely caused by tobacco smoking, inherited genetic factors play a role in its etiology. Genome-wide association studies (GWAS) in Europeans have robustly demonstrated only three polymorphic variations influencing lung cancer risk. Tumor heterogeneity may have hampered the detection of association signal when all lung cancer subtypes were analyzed together. In a GWAS of 5,355 European smoking lung cancer cases and 4,344 smoking controls, we conducted a pathway-based analysis in lung cancer histologic subtypes with 19,082 SNPs mapping to 917 genes in the HuGE-defined “inflammation” pathway. We identified a susceptibility locus for squamous cell lung carcinoma (SQ) at 12p13.33 (RAD52, rs6489769), and replicated the association in three independent samples totaling 3,359 SQ cases and 9,100 controls (odds ratio=1.20, Pcombined=2.3×10−8).
The combination of pathway-based approaches and information on disease specific subtypes can improve the identification of cancer susceptibility loci in heterogeneous diseases.
Lung cancer; histology; squamous cell carcinoma; pathway analysis; RAD52
The resampling-based test, which often relies on permutation or bootstrap procedures, has been widely used for statistical hypothesis testing when the asymptotic distribution of the test statistic is unavailable or unreliable. It requires repeated calculations of the test statistic on a large number of simulated data sets to assess its significance level, and thus it can become very computationally intensive. Here, we propose an efficient p-value evaluation procedure by adapting the stochastic approximation Markov chain Monte Carlo algorithm. The new procedure can be used easily for estimating the p-value for any resampling-based test. We show through numeric simulations that the proposed procedure can be 100–500 000 times as efficient (in terms of computing time) as the standard resampling-based procedure when evaluating a test statistic with a small p-value (e.g. less than 10−6). With its computational burden reduced by this proposed procedure, the versatile resampling-based test would become computationally feasible for a much wider range of applications. We demonstrate the application of the new method by applying it to a large-scale genetic association study of prostate cancer.
Bootstrap procedures; Genetic association studies; p-value; Resampling-based tests; Stochastic approximation Markov chain Monte Carlo
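To make the computational burden described in the abstract above concrete, the following is a minimal sketch of the standard Monte Carlo permutation procedure that serves as the baseline. It is not the stochastic approximation Markov chain Monte Carlo procedure the authors propose. The function name, the difference-in-means statistic and the add-one correction are assumptions chosen for illustration only.

```python
import random


def permutation_p_value(cases, controls, n_resamples=10_000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means.

    Hypothetical helper for illustration; the statistic and the add-one
    correction are assumptions, not taken from the abstract.
    """
    rng = random.Random(seed)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)

    def statistic(group_a, group_b):
        # Absolute difference in group means.
        return abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))

    observed = statistic(cases, controls)
    exceed = 0
    for _ in range(n_resamples):
        # Each resample shuffles the group labels and recomputes the statistic.
        rng.shuffle(pooled)
        if statistic(pooled[:n_cases], pooled[n_cases:]) >= observed:
            exceed += 1
    # The add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_resamples + 1)
```

With this estimator the smallest attainable p-value is 1/(n_resamples + 1), so resolving a p-value below 10^−6 takes more than a million recalculations of the statistic per test. That cost is what the proposed procedure aims to reduce.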
The balance between Th1 and Th2 activity is critical in lymphoid cell development and differentiation. Immune dysfunction underlies lymphomagenesis, so an alteration in the regulation of key Th1/Th2 cytokines may lead to the development of non-Hodgkin lymphoma (NHL). To study the impact of polymorphisms in Th1/Th2 cytokine genes on NHL risk, we analyzed 145 tag single nucleotide polymorphisms (SNPs) in 17 Th1/Th2 cytokine and related genes in three population-based case-control studies (1,946 cases and 1,808 controls). Logistic regression was used to compute odds ratios (OR) for NHL and four major NHL subtypes in relation to tag SNP genotypes and haplotypes. A gene-based analysis adjusting for the number of tag SNPs genotyped in each gene showed significant associations with risk of NHL combined and one or more NHL subtypes for Th1 (IL12A and IL12RB1) and Th2 (IL4, IL10RB, and IL18) genes. The strongest association was for rs485497 in IL12A, a gene that plays a central role in bridging the cellular and humoral pathways of innate resistance and antigen-specific adaptive immune responses (allele risk OR = 1.17; P(trend) = 0.00099). This SNP was also associated specifically with risk of follicular lymphoma (allele risk OR = 1.26; P(trend) = 0.0012). These findings suggest that genetic variation in Th1/Th2 cytokine genes may contribute to lymphomagenesis.
Non-Hodgkin lymphoma; single nucleotide polymorphisms; immunogenetics; case-control study
Recent genome-wide association studies have identified independent susceptibility loci for prostate cancer (CaP) that could influence risk through interaction with other, possibly undetected, susceptibility loci. We explored evidence of interaction between pairs of 13 known susceptibility loci and single nucleotide polymorphisms (SNPs) across the genome to generate hypotheses about the functionality of CaP susceptibility regions. We used data from the Cancer Genetic Markers of Susceptibility study: Stage I included 523,841 SNPs in 1,175 cases and 1,100 controls; Stage II included 27,383 SNPs in an additional 3,941 cases and 3,964 controls. Power calculations assessed the magnitude of interactions our study is likely to detect. Logistic regression was used alongside alternative methods that exploit the assumption of gene-gene independence between unlinked loci to increase power. Our empirical evaluation demonstrated that an empirical Bayes (EB) technique is powerful and robust to possible violation of the independence assumption. Our EB analysis identified several noteworthy interacting SNP pairs, although none reached genome-wide significance. We highlight a Stage II interaction between the major CaP susceptibility locus in the subregion of 8q24 that contains POU5F1B and an intronic SNP in the transcription factor EPAS1, which has potentially important functional implications for 8q24. Another noteworthy result involves interaction of a known CaP susceptibility marker near the prostate protease genes KLK2 and KLK3 with an intronic SNP in PRXX2. Overall, the interactions we have identified merit follow-up study, particularly the EPAS1 interaction, which has implications not only in CaP but also in other epithelial cancers that are associated with the 8q24 locus.
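One well-known way to exploit gene-gene independence between unlinked loci is the case-only estimator: if the two genotypes are independent in the source population and the disease is rare, the odds ratio between the two carrier statuses among cases alone estimates the multiplicative interaction. The sketch below illustrates that idea under those assumptions. It is not the empirical Bayes procedure used in the study, and the function name and the binary carrier coding are hypothetical.

```python
import math


def case_only_interaction(n11, n10, n01, n00):
    """Case-only estimate of a multiplicative gene-gene interaction.

    n_ab is the number of CASES with carrier status a at locus A and b at
    locus B (1 = carrier, 0 = non-carrier). Hypothetical helper; valid only
    under gene-gene independence in the source population.
    """
    if min(n11, n10, n01, n00) <= 0:
        raise ValueError("all four cell counts must be positive")
    # Log odds ratio between the two carrier statuses among cases.
    log_or = math.log((n11 * n00) / (n10 * n01))
    # Standard error of the log odds ratio (Woolf's formula).
    se = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
    # Two-sided Wald p-value from the standard normal distribution.
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(log_or), se, p_value
```

Because controls contribute no sampling variance, this estimator is more precise than the case-control interaction test, but it is biased when the independence assumption fails. An empirical Bayes approach that shrinks between the two estimators trades off that gain in power against robustness.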