There is increasing interest in adding common genetic variants
identified through genome-wide association studies (GWAS) to breast cancer
risk prediction models. First results from such models showed modest
benefits in terms of risk discrimination. Heterogeneity of breast cancer as
defined by hormone-receptor status has not been considered in this context.
In this study we investigated the predictive capacity of 32 GWAS-detected
common variants for breast cancer risk, alone and in combination with
classical risk factors, and for tumors with different hormone receptor status.
Material and Methods
Within the Breast and Prostate Cancer Cohort Consortium (BPC3), we
analyzed 6009 invasive breast cancer cases and 7827 matched controls of
European ancestry, with data on classical breast cancer risk factors and 32
common gene variants identified through GWAS. Discriminatory ability with
respect to breast cancers of specific hormone-receptor status was assessed
with the age- and cohort-adjusted concordance statistic
(AUROCa). Absolute risk scores were
calculated with external reference data. Integrated discrimination
improvement (IDI) was used to measure improvements in risk prediction.
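For readers unfamiliar with IDI, a minimal sketch of its usual definition: the mean increase in predicted risk among cases when moving from an old model to a new one, minus the mean increase among controls. The function name and the toy predicted risks are hypothetical, and the sketch does not reproduce the age- and cohort-adjusted BPC3 analysis.

```python
from statistics import fmean

def integrated_discrimination_improvement(p_old, p_new, is_case):
    """IDI = (mean gain in predicted risk among cases)
           - (mean gain in predicted risk among controls)."""
    if not (len(p_old) == len(p_new) == len(is_case)):
        raise ValueError("inputs must have equal length")
    case_gain = fmean(n - o for o, n, c in zip(p_old, p_new, is_case) if c)
    control_gain = fmean(n - o for o, n, c in zip(p_old, p_new, is_case) if not c)
    return case_gain - control_gain

# Hypothetical risks from a "classical" model and a "classical + SNPs" model.
p_old = [0.30, 0.40, 0.20, 0.25]
p_new = [0.35, 0.50, 0.15, 0.25]
is_case = [True, True, False, False]
print(round(integrated_discrimination_improvement(p_old, p_new, is_case), 4))  # 0.1
```

A positive IDI means the new model separates the average predicted risk of cases and controls further than the old model does.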
Results
We found a small but steady increase in discriminatory ability with
increasing numbers of genetic variants included in the model (difference in
AUROCa going from 2.7 to 4%). Discriminatory ability
for all models varied strongly by hormone receptor status.
Discussion and Conclusion
Adding information on common polymorphisms provides small but
statistically significant improvements in the quality of breast cancer risk
prediction models. We consistently observed better performance for receptor
positive cases, but the gain in discriminatory quality is not sufficient for
breast cancer; risk prediction; genetic factors; hormone receptor status
Heritability, the fraction of phenotypic variation explained by genetic variation, has been estimated for many phenotypes in a range of populations, organisms, and time points. The recent development of efficient genotyping and sequencing technology has led researchers to attempt to identify the genetic variants responsible for the genetic component of phenotype directly via GWAS. The gap between the phenotypic variance explained by GWAS results and that estimated from classical heritability methods has been termed the “missing heritability problem”. In this work, we examine modern methods for estimating heritability that use genotype and sequence data directly. We discuss them in the context of classical heritability methods and the missing heritability problem, and describe their implications for understanding the genetic architecture of complex phenotypes.
heritability; genome-wide association study; prediction; polygenic models; linear mixed models
Interest in performing gene-environment interaction studies has increased considerably with the advance of molecular genetics techniques. In practice, it became possible to investigate the role of environmental factors in disease risk and hence their role as modifiers of genetic effects. The understanding that genetics is important in the uptake and metabolism of toxic substances is an example of how genetic profiles can modify important environmental risk factors for disease. Several rationales exist for setting up gene-environment interaction studies, and the technical challenges related to these studies, when the number of environmental or genetic risk factors is relatively small, have been described before.
In the post-genomic era, it is now possible to study thousands of genes and their interaction with the environment. This brings a whole range of new challenges and opportunities. Despite continuing efforts to develop efficient methods and optimal bioinformatics infrastructures to deal with the available wealth of data, the challenge remains how best to present and analyze Genome-Wide Environmental Interaction (GWEI) studies involving multiple genetic and environmental factors. Since GWEIs are performed at the intersection of statistical genetics, bioinformatics and epidemiology, they usually face problems similar to those of Genome-Wide Association gene-gene Interaction (GWAI) studies. However, additional complexities need to be considered: some are typical of large-scale epidemiological studies, and others arise from “joining” two heterogeneous types of data to explain complex disease trait variation or for prediction purposes.
Genome-wide association studies; gene-environment interaction; post-GWAS analysis; association tests; exploratory methods
Next generation sequencing has dramatically increased our ability to localize disease-causing variants by providing base-pair level information at costs increasingly feasible for the large sample sizes required to detect complex-trait associations. Yet, identification of causal variants within an established region of association remains a challenge. Counter-intuitively, certain factors that increase power to detect an associated region can decrease power to localize the causal variant. First, combining GWAS with imputation or low coverage sequencing to achieve the large sample sizes required for high power can have the unintended effect of producing differential genotyping error among SNPs. This tends to bias the relative evidence for association toward better genotyped SNPs. Second, re-use of GWAS data for fine-mapping exploits previous findings to ensure genome-wide significance in GWAS-associated regions. However, using GWAS findings to inform fine-mapping analysis can bias evidence away from the causal SNP toward the tag SNP and SNPs in high LD with the tag. Together these factors can reduce power to localize the causal SNP by more than half. Other strategies commonly employed to increase power to detect association, namely increasing sample size and using higher density genotyping arrays, can, in certain common scenarios, actually exacerbate these effects and further decrease power to localize causal variants. We develop a re-ranking procedure that accounts for these adverse effects and substantially improves the accuracy of causal SNP identification, often doubling the probability that the causal SNP is top-ranked. Application to the NCI BPC3 aggressive prostate cancer GWAS with imputation meta-analysis identified a new top SNP at 2 of 3 associated loci and several additional possible causal SNPs at these loci that may have otherwise been overlooked. This method is simple to implement using R scripts provided on the author's website.
As next-generation sequencing (NGS) costs continue to fall and genome-wide association study (GWAS) platform coverage improves, the human genetics community is positioned to identify potentially causal variants. However, current NGS or imputation-based studies of either the whole genome or regions previously identified by GWAS have not yet been very successful in identifying causal variants. A major hurdle is the development of methods to distinguish disease-causing variants from their highly-correlated proxies within an associated region. We show that various common factors, such as differential sequencing or imputation accuracy rates and linkage disequilibrium patterns, with or without GWAS-informed region selection, can substantially decrease the probability of identifying the correct causal SNP, often by more than half. We then describe a novel and easy-to-implement re-ranking procedure that can double the probability that the causal SNP is top-ranked in many settings. Application to the NCI Breast and Prostate Cancer (BPC3) Cohort Consortium aggressive prostate cancer data identified new top SNPs within two associated loci previously established via GWAS, as well as several additional possible causal SNPs that had been previously overlooked.
Motivation: The question of how best to use information from known associated variants when conducting disease association studies has yet to be answered. Some studies compute a marginal P-value for each single-nucleotide polymorphism (SNP) independently, ignoring previously discovered variants. Other studies include known variants as covariates in logistic regression, but a weakness of this standard conditioning strategy is that it does not account for disease prevalence and non-random ascertainment, which can induce a correlation structure between candidate variants and known associated variants even if the variants lie on different chromosomes. Here, we propose a new conditioning approach, which is based in part on the classical technique of liability threshold modeling. Roughly, this method estimates model parameters for each known variant while accounting for the published disease prevalence from the epidemiological literature.
Results: We show via simulation and application to empirical datasets that our approach outperforms both the no conditioning strategy and the standard conditioning strategy, with a properly controlled false-positive rate. Furthermore, in multiple data sets involving diseases of low prevalence, standard conditioning produces a severe drop in test statistics whereas our approach generally performs as well or better than no conditioning. Our approach may substantially improve disease gene discovery for diseases with many known risk variants.
Availability: LTSOFT software is available online http://www.hsph.harvard.edu/faculty/alkes-price/software/
Supplementary information: Supplementary data are available at Bioinformatics online.
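The classical liability threshold model mentioned in the Motivation can be sketched as follows. Liability is scaled to a standard normal, and disease occurs when liability exceeds a threshold T fixed by the prevalence K, so T = Φ⁻¹(1 − K). If known variants contribute an amount g to a person's liability and explain a fraction h2_known of its variance, the probability of disease is 1 − Φ((T − g)/√(1 − h2_known)). This is only the textbook model, not the LTSOFT implementation; the function names and example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # liability scaled to mean 0, variance 1

def liability_threshold(prevalence):
    """Threshold T such that P(liability > T) equals the disease prevalence."""
    return STD_NORMAL.inv_cdf(1.0 - prevalence)

def disease_probability(g, prevalence, h2_known):
    """P(case | known genetic liability g), assuming the known variants explain
    h2_known of liability variance and the remainder is Gaussian."""
    t = liability_threshold(prevalence)
    residual_sd = sqrt(1.0 - h2_known)
    return 1.0 - STD_NORMAL.cdf((t - g) / residual_sd)

print(round(liability_threshold(0.01), 2))  # 2.33
```

For a disease with 1% prevalence, the threshold lies about 2.33 standard deviations above the mean, so a modest shift g raises the case probability noticeably; this prevalence dependence is what standard logistic conditioning ignores.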
Important knowledge about the determinants of complex human phenotypes can be obtained from the estimation of heritability, the fraction of phenotypic variation in a population that is determined by genetic factors. Here, we make use of extensive phenotype data in Iceland, long-range phased genotypes, and a population-wide genealogical database to examine the heritability of 11 quantitative and 12 dichotomous phenotypes in a sample of 38,167 individuals. Most previous estimates of heritability are derived from family-based approaches such as twin studies, which may be biased upwards by epistatic interactions or shared environment. Our estimates of heritability, based on both closely and distantly related pairs of individuals, are significantly lower than those from previous studies. We examine phenotypic correlations across a range of relationships, from siblings to first cousins, and find that the excess phenotypic correlation in these related individuals is predominantly due to shared environment as opposed to dominance or epistasis. We also develop a new method to jointly estimate narrow-sense heritability and the heritability explained by genotyped SNPs. Unlike existing methods, this approach permits the use of information from both closely and distantly related pairs of individuals, thereby reducing the variance of estimates of heritability explained by genotyped SNPs while preventing upward bias. Our results show that common SNPs explain a larger proportion of the heritability than previously thought, with SNPs present on Illumina 300K genotyping arrays explaining more than half of the heritability for the 23 phenotypes examined in this study. Much of the remaining heritability is likely to be due to rare alleles that are not captured by standard genotyping arrays.
Phenotype is a function of a genome and its environment. Heritability is the fraction of variation in a phenotype determined by genetic factors in a population. Current methods to estimate heritability rely on the phenotypic correlations of closely related individuals and are potentially upwardly biased, due to the impact of epistasis and shared environment. We develop new methods to estimate heritability over both closely and distantly related individuals. By examining the phenotypic correlation among different types of related individuals such as siblings, half-siblings, and first cousins, we show that shared environment is the primary determinant of inflated estimates of heritability. For a large number of phenotypes, it is not known how much of the heritability is explained by SNPs included on current genotyping platforms. Existing methods to estimate this component of heritability are biased in the presence of related individuals. We develop a method that permits the inclusion of both closely and distantly related individuals when estimating heritability explained by genotyped SNPs and use it to make estimates for 23 medically relevant phenotypes. These estimates can be used to increase our understanding of the distribution and frequency of functionally relevant variants and thereby inform the design of future studies.
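Estimates of heritability explained by genotyped SNPs commonly rely on a genetic relationship matrix (GRM) built from standardized genotypes, A = ZZᵀ/M with z_ij = (x_ij − 2p_j)/√(2p_j(1 − p_j)). The sketch below computes that standard GRM in plain Python. It is one ingredient of such methods, not the joint estimator developed in this study, and the function name and toy genotypes are hypothetical.

```python
from math import sqrt

def genetic_relationship_matrix(genotypes):
    """Standardized GRM from a list of individuals, each a list of 0/1/2
    allele counts at the same SNPs. A[i][k] is the mean over SNPs of
    z_ij * z_kj, with z_ij = (x_ij - 2 p_j) / sqrt(2 p_j (1 - p_j))."""
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2.0 * n)          # sample allele frequency of SNP j
        if p <= 0.0 or p >= 1.0:          # monomorphic SNPs have zero variance
            continue
        scale = sqrt(2.0 * p * (1.0 - p))
        columns.append([(x - 2.0 * p) / scale for x in col])
    m = len(columns)
    if m == 0:
        raise ValueError("no polymorphic SNPs")
    return [[sum(c[i] * c[k] for c in columns) / m for k in range(n)]
            for i in range(n)]

# Hypothetical genotypes: 4 individuals x 4 SNPs (the last SNP is monomorphic).
G = [[0, 1, 2, 0], [1, 1, 0, 0], [2, 0, 1, 0], [1, 2, 1, 0]]
A = genetic_relationship_matrix(G)
```

Off-diagonal entries near zero indicate unrelated pairs; larger values indicate closer genetic relatedness.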
ABO blood type has been associated with risk and survival for several malignancies; however, data for an association with breast cancer are inconsistent. Our study population consisted of Nurses’ Health Study participants with self-reported serologic blood type and/or ABO genotype. Using Cox proportional hazards regression, we examined the association between serologic blood type and incident breast cancer among 67,697 women, including 3,107 cases. In addition, we examined the association with ABO genotype in a nested case-control study of 1,138 invasive breast cancer cases and 1,090 matched controls. Finally, we evaluated the association between serologic blood type and survival among 2,036 participants with breast cancer. No clear association was seen between serologic blood type or ABO genotype and risk of total breast cancer, invasive breast cancer, or breast cancer subtypes. Compared to women with blood type O, the age-adjusted incidence rate ratios for serologic blood type and total breast cancer were 1.06 (95% CI, 0.98–1.15) for type A, 1.06 (95% CI, 0.93–1.22) for AB, and 1.08 (95% CI, 0.96–1.20) for B. In genetic analyses, odds ratios for invasive breast cancer were 1.05 (95% CI, 0.87–1.27) for A/O, 1.21 (95% CI, 0.86–1.69) for A/A, 0.84 (95% CI, 0.56–1.26) for A/B, 0.84 (95% CI, 0.63–1.13) for B/O, and 1.17 (95% CI, 0.35–3.86) for B/B, compared to O/O. No significant association was noted between blood type and overall or breast cancer-specific mortality. Our results suggest no association between ABO blood group and breast cancer risk or survival.
ABO blood group; ABO genotype; blood type; breast cancer; survival
The association of vitamin D status with prostate cancer is controversial; no association has been observed for overall incidence, but there is a potential link with lethal disease.
We assessed prediagnostic 25-hydroxyvitamin D [25(OH)D] levels in plasma, variation in vitamin D–related genes, and risk of lethal prostate cancer using a prospective case–control study nested within the Health Professionals Follow-up Study. We included 1260 men who were diagnosed with prostate cancer after providing a blood sample in 1993–1995 and 1331 control subjects. Men with prostate cancer were followed through March 2011 for lethal outcomes (n = 114). We selected 97 single-nucleotide polymorphisms (SNPs) in genomic regions with high linkage disequilibrium (tagSNPs) to represent common genetic variation among seven vitamin D–related genes (CYP27A1, CYP2R1, CYP27B1, GC, CYP24A1, RXRA, and VDR). We used a logistic kernel machine test to assess whether multimarker SNP sets in seven vitamin D pathway–related genes were collectively associated with prostate cancer. Tests for statistical significance were two-sided.
Higher 25(OH)D levels were associated with a 57% reduction in the risk of lethal prostate cancer (highest vs lowest quartile: odds ratio = 0.43, 95% confidence interval = 0.24 to 0.76). This finding did not vary by time from blood collection to diagnosis. We found no statistically significant association of plasma 25(OH)D levels with overall prostate cancer. Pathway analyses found that the set of SNPs that included all seven genes (P = .008) as well as sets of SNPs that included VDR (P = .01) and CYP27A1 (P = .02) were associated with risk of lethal prostate cancer.
In this prospective study, plasma 25(OH)D levels and common variation among several vitamin D–related genes were associated with lethal prostate cancer risk, suggesting that vitamin D is relevant for lethal prostate cancer.
Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) associated with prostate cancer risk. There is limited information on the mechanistic basis of these associations, particularly about whether they interact with circulating concentrations of growth factors and sex hormones, which may be important in prostate cancer etiology. Using conditional logistic regression, the authors compared per-allele odds ratios for prostate cancer for 39 GWAS-identified SNPs across thirds (tertile groups) of circulating concentrations of insulin-like growth factor 1 (IGF-1), insulin-like growth factor binding protein 3 (IGFBP-3), testosterone, androstenedione, androstanediol glucuronide, estradiol, and sex hormone-binding globulin (SHBG) for 3,043 cases and 3,478 controls in the Breast and Prostate Cancer Cohort Consortium. After allowing for multiple testing, none of the SNPs examined were significantly associated with growth factor or hormone concentrations, and the SNP-prostate cancer associations did not differ by these concentrations, although 4 interactions were marginally significant (MSMB-rs10993994 with androstenedione (uncorrected P = 0.008); CTBP2-rs4962416 with IGFBP-3 (uncorrected P = 0.003); 11q13.2-rs12418451 with IGF-1 (uncorrected P = 0.006); and 11q13.2-rs10896449 with SHBG (uncorrected P = 0.005)). The authors found no strong evidence that associations between GWAS-identified SNPs and prostate cancer are modified by circulating concentrations of IGF-1, sex hormones, or their major binding proteins.
gene-environment interaction; gonadal steroid hormones; insulin-like growth factor binding protein 3; insulin-like growth factor I; molecular epidemiology; prostatic neoplasms
Body mass is inversely related to breast cancer risk among premenopausal women. Leptin, an essential cytokine regulating food intake, energy expenditure, glucose, and fat metabolism, may be part of the mechanistic pathway. We investigated 50 tagging and candidate SNPs in the leptin (LEP) and leptin receptor (LEPR) genes for associations with premenopausal breast cancer incidence using 405 cases and 810 controls nested within the Nurses’ Health Study II. We also examined associations between these SNPs and circulating leptin (among 910 women) and breast cancer grade (among 267 patients). Permutation tests were performed to adjust for multiple testing. We did not detect a significant association between SNPs in the LEP or LEPR gene and either breast cancer incidence or plasma leptin levels. Among cases, 14 SNPs of the LEPR gene were significantly associated with cancer grade, and rs1137101 (Q223R) survived multiple testing adjustment (adjusted P = 0.04). The G carriers of rs1137101 were more likely to have poorly differentiated than well-differentiated cancers. Our data suggest that common genetic variation in the LEP or LEPR gene has no strong association with premenopausal breast cancer risk. The LEPR gene might be associated with breast cancer grade.
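The abstract says permutation tests were used to adjust for multiple testing but does not say how. One common scheme is the single-step max-T permutation adjustment: shuffle the outcome labels, record the largest SNP statistic in each shuffle, and compare each observed statistic with that null distribution of maxima. The statistic used here (absolute difference in mean allele count between two groups), the function names, and the number of permutations are illustrative assumptions, not the study's procedure.

```python
import random

def snp_statistics(genotypes, labels):
    """|mean allele count in group 1 - mean in group 0| for every SNP.
    genotypes: list of individuals, each a list of 0/1/2 counts."""
    stats = []
    for j in range(len(genotypes[0])):
        g1 = [row[j] for row, y in zip(genotypes, labels) if y == 1]
        g0 = [row[j] for row, y in zip(genotypes, labels) if y == 0]
        stats.append(abs(sum(g1) / len(g1) - sum(g0) / len(g0)))
    return stats

def maxt_adjusted_pvalues(genotypes, labels, n_perm=1000, seed=0):
    """Single-step max-T permutation adjustment (controls family-wise error)."""
    rng = random.Random(seed)
    observed = snp_statistics(genotypes, labels)
    exceed = [0] * len(observed)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)                  # break any genotype-outcome link
        perm_max = max(snp_statistics(genotypes, shuffled))
        for j, t in enumerate(observed):
            if perm_max >= t:
                exceed[j] += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return [(count + 1) / (n_perm + 1) for count in exceed]
```

Because every SNP is compared with the maximum over all SNPs, the adjustment automatically accounts for linkage disequilibrium among the tested markers.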
Premenopausal; Breast cancer; Leptin; Leptin receptor
Genome-wide association studies (GWAS) have identified over a dozen loci associated with colorectal cancer (CRC) risk. Here we examined potential effect-modification between single nucleotide polymorphisms (SNPs) at 10 of these loci and probable or established environmental risk factors for CRC in 7,016 CRC cases and 9,723 controls from nine cohort and case-control studies. We used meta-analysis of an efficient empirical-Bayes estimator to detect potential multiplicative interactions between each of the SNPs [rs16892766 at 8q23.3 (EIF3H/UTP23); rs6983267 at 8q24 (MYC); rs10795668 at 10p14 (FLJ3802842); rs3802842 at 11q23 (LOC120376); rs4444235 at 14q22.2 (BMP4); rs4779584 at 15q13 (GREM1); rs9929218 at 16q22.1 (CDH1); rs4939827 at 18q21 (SMAD7); rs10411210 at 19q13.1 (RHPN2); and rs961253 at 20p12.3 (BMP2)] and selected major CRC risk factors (sex, body mass index, height, smoking status, aspirin/non-steroidal anti-inflammatory drug use, alcohol use, and dietary intake of calcium, folate, red meat, processed meat, vegetables, fruit, and fiber). The strongest statistical evidence for a gene-environment interaction across studies was for vegetable consumption and rs16892766, located on chromosome 8q23.3, near the EIF3H and UTP23 genes (nominal p-interaction = 1.3×10−4; adjusted p-value 0.02). The magnitude of the main effect of the SNP increased with increasing levels of vegetable consumption. No other interactions were statistically significant after adjusting for multiple comparisons. Overall, the association of most CRC susceptibility loci identified in initial GWAS appears to be invariant to the other risk factors considered; however, our results suggest potential modification of the rs16892766 effect by vegetable consumption.
Colorectal Cancer; Epidemiology; Gene-environment interactions; Genotype phenotype correlations; Polymorphisms in genes that modify dietary exposures
Genome-wide association studies (GWASs) have primarily focused on marginal effects for individual markers and have incorporated external functional information only after identifying robust statistical associations. We applied a new approach combining the genetics of gene expression and functional classification of genes to the GWAS of basal cell carcinoma (BCC) to identify potential biological pathways associated with BCC. We first identified 322,324 expression-associated single-nucleotide polymorphisms (eSNPs) from two existing GWASs of global gene expression in lymphoblastoid cell lines (n=995), and evaluated the association of these functionally annotated SNPs with BCC among 2,045 BCC cases and 6,013 controls in Caucasians. We then grouped them into 99 KEGG pathways for pathway analysis and identified two pathways associated with BCC with p-value < 0.05 and false discovery rate (FDR) < 0.5: the autoimmune thyroid disease pathway (mainly HLA class I and II antigens, p < 0.001, FDR = 0.24) and JAK-STAT signaling pathway (p = 0.02, FDR = 0.49). Seventy-nine (25.7%) out of 307 eSNPs in the JAK-STAT pathway were associated with BCC risk (p < 0.05) in an independent replication set of 278 BCC cases and 1,262 controls. In addition, the association of JAK-STAT signaling pathway was marginally validated by using 16,691 eSNPs identified from 110 normal skin samples (p = 0.08). Based on the evidence of biological functions of the JAK-STAT pathway on oncogenesis, it is plausible that this pathway is involved in BCC pathogenesis.
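The abstract reports pathway-level FDR values without naming the procedure. Purely as an illustration, the widely used Benjamini-Hochberg step-up procedure turns a list of P-values into FDR-adjusted values; the function name and the input P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order.

    adjusted p_(i) = min over k >= i of m * p_(k) / k, capped at 1,
    where p_(1) <= ... <= p_(m) are the sorted P-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):               # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

The running minimum keeps the adjusted values monotone in the original P-values, so a pathway with a smaller P-value never receives a larger FDR.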
Pathway analysis; Basal cell carcinoma; GWAS; JAK-STAT
Associations between single nucleotide polymorphisms (SNPs) at 5p15 and multiple cancer types have been reported. We have previously shown evidence for a strong association between prostate cancer (PrCa) risk and rs2242652 at 5p15, intronic in the telomerase reverse transcriptase (TERT) gene that encodes TERT. To comprehensively evaluate the association between genetic variation across this region and PrCa, we performed a fine-mapping analysis by genotyping 134 SNPs using a custom Illumina iSelect array or Sequenom MassArray iPlex, followed by imputation of 1094 SNPs in 22 301 PrCa cases and 22 320 controls in The PRACTICAL consortium. Multiple stepwise logistic regression analysis identified four signals in the promoter or intronic regions of TERT that independently associated with PrCa risk. Gene expression analysis of normal prostate tissue showed evidence that SNPs within one of these regions also associated with TERT expression, providing a potential mechanism for predisposition to disease.
Leukocyte telomere length (LTL) is a potential indicator of cellular aging; however, its relation to physical activity and sedentary behavior is unclear. The authors examined cross-sectional associations among activity, sedentary behavior, and LTL among 7,813 women aged 43–70 years in the Nurses’ Health Study. Participants self-reported activity by questionnaire in 1988 and 1992 and sedentary behavior in 1992. Telomere length in peripheral blood leukocytes, collected in 1989–1990, was measured by quantitative polymerase chain reaction. The least-squares mean telomere length (z-score) was calculated after adjustment for age and other potential confounders. For total activity, moderately or highly active women had a 0.07-standard deviation (SD) increase in LTL (2-sided Ptrend = 0.02) compared with those least active. Greater moderate- or vigorous-intensity activity was also associated with increased LTL (SD = 0.11 for 2–4 vs. <1 hour/week and 0.04 for ≥7 vs. <1 hour/week; 2-sided Ptrend = 0.02). Specifically, calisthenics or aerobics was associated with increased LTL (SD = 0.10 for ≥2.5 vs. 0 hours/week; 2-sided Ptrend = 0.04). Associations remained after adjustment for body mass index. Other specific activities and sitting were unassociated with LTL. Although associations were modest, these findings suggest that even moderate amounts of activity may be associated with longer telomeres, warranting further investigation in large prospective studies.
biological markers; cohort studies; epidemiology; exercise; physical activity; sedentary lifestyle; telomere
Only two genome-wide association studies (GWAS) have been conducted to date to identify potential markers for total mortality after diagnosis of breast cancer. Here we report the identification of two SNPs associated with total mortality from a two-stage GWAS conducted among 6,110 Shanghai-resident Chinese women with TNM stage I-IV breast cancer. The discovery stage included 1,950 patients and evaluated 613,031 common SNPs. The top 49 associations were evaluated in an independent replication stage of 4,160 Shanghai breast cancer patients. A consistent and highly significant association with total mortality was documented for SNPs rs3784099 and rs9934948. SNP rs3784099, located in the RAD51L1 gene, was associated with total mortality in both the discovery stage (P=1.44×10−8) and replication stage (P=0.06; P-combined=1.17×10−7). Adjusted hazard ratios (HR) for total mortality were 1.41 (95%CI=1.18–1.68) for the AG genotype and 2.64 (95%CI=1.74–4.03) for the AA genotype, when compared with the GG genotype. The variant C allele of rs9934948, located on chromosome 16, was associated with a similarly elevated risk of total mortality (P-combined: 5.75×10−6). We also observed this association among 1,145 breast cancer patients of European-ancestry from the Nurses’ Health Study (NHS; P=0.006); the association was highly significant in a combined analysis of NHS and Chinese data (P=1.39×10−7). Similar associations were observed for these two SNPs with breast cancer-specific mortality. This study provides strong evidence suggesting that the RAD51L1 gene and a chromosome 16 locus influence breast cancer prognosis.
breast cancer; survival; genome-wide association study; Asian population; RAD51L1 gene
One of the goals of personalized medicine is to generate individual risk profiles that could identify individuals in the population who are at high risk. The discovery of more than two dozen independent SNP markers in prostate cancer has raised the possibility of such risk stratification. In this study, we evaluated the discriminative and predictive ability of prostate cancer risk models incorporating 25 common prostate cancer genetic markers, family history of prostate cancer and age.
We fit a series of risk models and estimated their performance in 7,509 prostate cancer cases and 7,652 controls within the NCI Breast and Prostate Cancer Cohort Consortium (BPC3). We also calculated absolute risks based on SEER incidence data.
The best risk model (C-statistic=0.642) included individual genetic markers and family history of prostate cancer. We observed a decreasing trend in discriminative ability with advancing age (P=0.009), with highest accuracy in men younger than 60 years (C-statistic=0.679). The absolute ten-year risk for 50-year old men with a family history ranged from 1.6% (10th percentile of genetic risk) to 6.7% (90th percentile of genetic risk). For men without family history, the risk ranged from 0.8% (10th percentile) to 3.4% (90th percentile).
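The C-statistic quoted here is the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with tied scores counted as one half. A minimal pairwise sketch follows; the function name and scores are hypothetical, and the study's own estimates may have used adjusted versions of the statistic.

```python
def c_statistic(case_scores, control_scores):
    """Concordance (AUROC): share of case-control pairs ranked correctly,
    counting tied scores as half-concordant. O(n_cases * n_controls)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                concordant += 1.0
            elif s_case == s_ctrl:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))

print(c_statistic([0.9, 0.6, 0.4], [0.5, 0.4, 0.1]))  # 7.5 of 9 pairs, about 0.833
```

A value of 0.5 means the score ranks cases no better than chance and 1.0 means perfect separation, which puts the reported 0.642 and 0.679 in context.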
Our results indicate that incorporating genetic information and family history in prostate cancer risk models can be particularly useful for identifying younger men who might benefit from PSA screening.
Although adding genetic risk markers improves model performance, the clinical utility of these genetic risk models is limited.
Prostate cancer; polymorphism; risk prediction model
Although cross-sectional studies have linked higher body mass index (BMI) and type 2 diabetes (T2D) to shortened telomeres, whether these metabolic conditions play a causal role in telomere biology is unknown. We therefore examined whether genetic predisposition to higher BMI or T2D was associated with shortened leukocyte telomere length (LTL).
We conducted an analysis of 3,968 women of European ancestry aged 43–70 years from the Nurses' Health Study, who were selected as cases or controls in genome-wide association studies and studies of telomeres and disease. Pre-diagnostic relative telomere length in peripheral blood leukocytes, collected in 1989–1990, was measured by quantitative PCR. We combined information from multiple risk variants by calculating genetic risk scores based on 32 polymorphisms near 32 loci for BMI, and 36 polymorphisms near 35 loci for T2D.
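The abstract does not say whether the genetic risk scores were weighted. The sketch below shows both usual forms: an unweighted count of risk alleles, or a sum weighted by published per-allele effect sizes. It then standardizes the score so that associations can be expressed per standard deviation, as in the results. The function names, weights, and genotypes are hypothetical.

```python
from statistics import fmean, pstdev

def genetic_risk_score(risk_allele_counts, weights=None):
    """Sum of risk-allele counts (0/1/2), optionally weighted per SNP."""
    if weights is None:
        weights = [1.0] * len(risk_allele_counts)
    if len(weights) != len(risk_allele_counts):
        raise ValueError("one weight per SNP is required")
    return sum(w * c for w, c in zip(weights, risk_allele_counts))

def standardize(scores):
    """Convert scores to z-scores (per-SD units) using the sample mean and SD."""
    mu, sd = fmean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]

# Hypothetical risk-allele counts for three women at four SNPs.
people = [[0, 1, 2, 1], [2, 2, 1, 0], [1, 0, 0, 1]]
scores = [genetic_risk_score(g) for g in people]
print(scores)  # [4.0, 5.0, 2.0]
```

Regressing relative telomere length on the standardized score then yields a β per standard deviation increase in genetic predisposition.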
After adjustment for age and case-control status, there was no association between the BMI genetic risk score and LTL (β per standard deviation increase: −0.01; SE: 0.02; P = 0.52). Similarly, the T2D genetic score was not associated with LTL (β per standard deviation increase: −0.006; SE: 0.02; P = 0.69).
In this population of middle-aged and older women of European ancestry, those genetically predisposed to higher BMI or T2D did not possess shortened telomeres. Although we cannot exclude weak or modest effects, our findings do not support a causal relation of strong magnitude between these metabolic conditions and telomere dynamics.
Common genetic variants in the Toll-like receptor 4 (TLR4), which is involved in inflammation and immune response pathways, may be important for prostate cancer.
In a large nested case-control study of prostate cancer in the Physicians’ Health Study (1982–2004), 10 single nucleotide polymorphisms (SNPs) were selected and genotyped to capture common variation within the TLR4 gene as well as 5 kilobases up and downstream. Unconditional logistic regression was used to assess associations of these SNPs with total prostate cancer incidence, and with prostate cancers defined as advanced stage/lethal (T3/T4, M1/N1(T1-T4), lethal) or high Gleason grade (7 (4+3) or greater). Cox proportional hazards regression was used to assess progression to metastases and death among prostate cancer cases.
The study included 1267 controls and 1286 incident prostate cancer cases, including 248 advanced stage/lethal and 306 high grade cases. During a median follow-up of 10.6 years, 183 men died of prostate cancer or developed distant metastases. No statistically significant associations between TLR4 SNPs and total prostate cancer incidence were found, including for SNPs for which an association was reported in other published studies. Additionally, there were no significant associations between TLR4 SNPs and the incidence of advanced stage/lethal or high grade cancers; nor was there evidence among prostate cancer cases for associations of TLR4 SNPs with progression to prostate cancer-specific mortality or bony metastases.
Results from this prospective nested case-control study suggest that genetic variation across TLR4 alone is not strongly associated with prostate cancer risk or mortality.
TLR4; prostate cancer; inflammation; molecular epidemiology
Colorectal cancer is the second leading cause of cancer death in developed countries. Genome-wide association studies (GWAS) have successfully identified novel susceptibility loci for colorectal cancer. To follow up on these findings and to try to identify further susceptibility loci, we present results from GWAS of colorectal cancer (2,906 cases, 3,416 controls) whose main associations have not previously been published. Specifically, we calculated odds ratios (ORs) and 95% confidence intervals (CIs) using log-additive models for each study. To improve our power to detect novel colorectal cancer susceptibility loci, we performed a meta-analysis combining the results across studies. We selected the most statistically significant single nucleotide polymorphisms (SNPs) for replication in 10 independent studies (8,161 cases and 9,101 controls). We again used meta-analysis to summarize results for the replication studies alone and for a combined analysis of GWAS and replication studies. We measured 10 SNPs previously identified in colorectal cancer susceptibility loci and found eight to be associated with colorectal cancer (p-value range: 0.02 to 1.8 × 10−8). When we excluded studies that had previously published on these SNPs, five SNPs remained significant at p<0.05 in the combined analysis. No novel susceptibility loci were significant in the replication study after adjustment for multiple testing, and none reached genome-wide significance in the combined analysis of GWAS and replication. We observed marginally significant evidence for a second independent SNP in the BMP2 region at chromosomal location 20p12 (rs4813802; replication p-value 0.03; combined p-value 7.3 × 10−5).
In a region on 5p15.33, which includes the coding regions of the TERT and CLPTM1L genes and has been identified in GWAS as associated with susceptibility to at least seven other cancers, we observed a marginally significant association with rs2853668 (replication p-value 0.03; combined p-value 1.9 × 10−4). Our study suggests that the contribution of common genetic variants to colorectal cancer risk is complex.
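Per-study log-additive estimates like those described above are commonly pooled by inverse-variance weighting. The abstract does not state which meta-analytic model was used, so the sketch below assumes a fixed-effect model. The function name `meta_analyze` and the per-study numbers are hypothetical and are not results from this study.

```python
import math


def meta_analyze(log_ors, ses):
    """Fixed-effect inverse-variance meta-analysis of per-study log odds ratios.

    Returns the pooled log OR, its standard error and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]  # weight = 1 / variance
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled, pooled_se, p


# Hypothetical per-study estimates for one SNP (not data from the study)
log_ors = [math.log(1.10), math.log(1.05), math.log(1.12)]
ses = [0.05, 0.06, 0.08]
beta, se, p = meta_analyze(log_ors, ses)
lo, hi = math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)
print(f"OR = {math.exp(beta):.3f}, 95% CI {lo:.3f}-{hi:.3f}, p = {p:.2e}")
```

A random-effects model would add a between-study variance term to each weight; the fixed-effect version is shown only because it is the simplest case.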
The question of which statistical approach is most effective for investigating gene-environment (G-E) interactions in genome-wide association studies (GWAS) remains unresolved. Using 2 case-control GWAS of type 2 diabetes (the Nurses’ Health Study, 1976–2006, and the Health Professionals Follow-up Study, 1986–2006), the authors compared 5 tests for interactions: the standard logistic regression-based case-control test; the case-only test; semiparametric maximum-likelihood estimation; an empirical-Bayes shrinkage estimator; and 2-stage tests. The authors also compared 2 joint tests of genetic main effects and G-E interaction. Elevated body mass index was the exposure of interest and was modeled as a binary trait to avoid an inflated type I error rate that the authors observed when the main effect of continuous body mass index was misspecified. Although both the case-only and the semiparametric maximum-likelihood estimation approaches assume that the tested markers are independent of exposure in the general population, the authors did not observe any evidence of inflated type I error for these tests in their studies with 2,199 cases and 3,044 controls. Both joint tests detected markers with known marginal effects. Loci with the most significant G-E interactions using the standard, empirical-Bayes, and 2-stage tests were strongly correlated with the exposure among controls. The findings suggest that methods exploiting G-E independence can be efficient and valid options for investigating G-E interactions in GWAS.
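Under the G-E independence assumption noted above, and assuming the disease is rare, the odds ratio between genotype and exposure among cases alone estimates the multiplicative G-E interaction. A minimal sketch of that case-only test, assuming a binary exposure (as with elevated body mass index here) and a carrier versus non-carrier genotype coding chosen for simplicity; the function name and the counts are hypothetical.

```python
import math


def case_only_interaction(a, b, c, d):
    """Case-only G x E test from a 2x2 table of cases.

    a: carriers exposed,      b: carriers unexposed,
    c: non-carriers exposed,  d: non-carriers unexposed.
    Returns the case-only OR, its Wald z statistic and a two-sided p-value.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(log_or), z, p


# Hypothetical counts among cases (not data from the study)
or_gxe, z, p = case_only_interaction(a=420, b=310, c=780, d=689)
print(f"case-only interaction OR = {or_gxe:.2f}, z = {z:.2f}, p = {p:.3f}")
```

If genotype and exposure are correlated in the population, this OR mixes that correlation with the interaction, which is why the abstract checks type I error for the case-only and semiparametric tests.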
case-control studies; case study; diabetes mellitus, type 2; epidemiologic methods; genome-wide association study; genotype-environment interaction
Aims/hypothesis: Genome-wide association studies have identified over 50 new genetic loci for type 2 diabetes (T2D). Several studies have concluded that higher dietary heme iron intake increases the risk of T2D. We therefore assessed whether the relation between genetic loci and T2D is modified by dietary heme iron intake.
Methods: We used Affymetrix Genome-Wide Human 6.0 array data [681,770 single nucleotide polymorphisms (SNPs)] and dietary information collected in the Health Professionals Follow-up Study (n = 725 cases; n = 1,273 controls) and the Nurses’ Health Study (n = 1,081 cases; n = 1,692 controls). We assessed whether genome-wide SNPs or iron metabolism SNPs interacted with dietary heme iron intake in relation to T2D, testing for associations in each cohort separately and then meta-analyzing to pool the results. Finally, we created 1,000 synthetic pathways matched to an iron metabolism pathway on the number of genes and the number of SNPs in each gene. We compared the iron metabolism pathway SNPs with these synthetic SNP assemblies in their relation to T2D to assess whether the pathway as a whole interacts with dietary heme iron intake.
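The comparison against 1,000 matched synthetic pathways amounts to an empirical p-value. The abstract specifies neither the pathway-level statistic nor the exact p-value formula, so this sketch assumes a hypothetical statistic (the mean of −log10 p over a pathway's SNPs) and the common (r + 1)/(n + 1) convention. Synthetic pathways are simulated here with uniform null p-values purely for illustration.

```python
import math
import random


def pathway_statistic(snp_pvalues):
    """Hypothetical pathway statistic: mean -log10(p) over the pathway's SNPs."""
    return sum(-math.log10(p) for p in snp_pvalues) / len(snp_pvalues)


def empirical_p(observed, null_stats):
    """Share of synthetic-pathway statistics at least as large as the observed one.

    Uses the (r + 1) / (n + 1) convention so the p-value is never exactly zero.
    """
    r = sum(1 for s in null_stats if s >= observed)
    return (r + 1) / (len(null_stats) + 1)


# Hypothetical data: SNP p-values for the real pathway, plus 1,000 synthetic
# pathways of the same size with p-values drawn uniformly from (0, 1].
random.seed(42)
observed = pathway_statistic([0.01, 0.20, 0.45, 0.03, 0.70])
null_stats = [
    pathway_statistic([1.0 - random.random() for _ in range(5)])
    for _ in range(1000)
]
print(f"empirical p = {empirical_p(observed, null_stats):.3f}")
```

Matching the synthetic pathways on gene count and SNPs per gene, as the study did, keeps the null distribution comparable in size and structure; simulating uniform p-values, as here, ignores linkage disequilibrium between SNPs.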
Results: Using a genomic approach, we found no significant gene–environment interactions with dietary heme iron intake in relation to T2D at a Bonferroni-corrected genome-wide significance level of 7.33 × 10−8 (top SNP in pooled analysis: intergenic rs10980508; p = 1.03 × 10−6). Furthermore, no SNP in the iron metabolism pathway significantly interacted with dietary heme iron intake at a Bonferroni-corrected significance level of 2.10 × 10−4 (top SNP in pooled analysis: rs1805313; p = 1.14 × 10−3). Finally, neither the main genetic effects (pooled empirical p by SNP = 0.41) nor the gene–dietary heme iron interactions (pooled empirical p-value for the interactions = 0.72) were significant for the iron metabolism pathway as a whole.
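The genome-wide level quoted above follows from a Bonferroni correction of α = 0.05 over the 681,770 array SNPs listed in the Methods. A short check of that arithmetic, and of why the top pooled SNP falls short of it; the function and variable names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Genome-wide scan: 681,770 SNPs, family-wise alpha = 0.05
genome_wide = bonferroni_threshold(0.05, 681_770)
print(f"genome-wide threshold = {genome_wide:.2e}")

# Top pooled interaction SNP from the genome-wide scan (rs10980508)
top_p = 1.03e-6
print("significant" if top_p < genome_wide else "not significant")
```

The threshold evaluates to 7.33 × 10−8, and the top SNP's p-value is roughly 14 times larger, so it is not significant.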
Conclusions: We found no significant interactions between dietary heme iron intake and common SNPs in relation to T2D.
type 2 diabetes; gene environment interactions; dietary heme iron; pathway analysis
Early menopause (EM) affects up to 10% of the female population, reducing reproductive lifespan considerably. Currently, it constitutes the leading cause of infertility in the western world, affecting mainly those women who postpone their first pregnancy beyond the age of 30 years. The genetic aetiology of EM is largely unknown in the majority of cases. We have undertaken a meta-analysis of genome-wide association studies (GWASs) in 3493 EM cases and 13 598 controls from 10 independent studies. No novel genetic variants were discovered, but the 17 variants previously associated with normal age at natural menopause as a quantitative trait (QT) were also associated with EM and primary ovarian insufficiency (POI). Thus, the genetic aetiology of EM overlaps with that of normal variation in age at menopause and is at least partly explained by the additive effects of the same polygenic variants. The combined effect of the common variants captured by the single nucleotide polymorphism arrays was estimated to account for ∼30% of the variance in EM. The association between the combined 17 variants and the risk of EM was stronger than that of the best validated non-genetic risk factor, smoking.
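The additive model described above is often summarized as a weighted allele score: each woman's count of risk alleles at each variant is multiplied by a per-allele weight and summed. The abstract does not give the per-variant weights or the exact scoring used, so the weights and genotypes below are hypothetical placeholders for the 17 variants, and `weighted_allele_score` is an illustrative name.

```python
import math


def weighted_allele_score(dosages, weights):
    """Additive genetic risk score: sum of allele dosage x per-allele weight.

    dosages: 0-2 counts (or imputed dosages) of the risk allele per variant.
    weights: per-allele effect sizes, e.g. log odds ratios, in the same order.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical weights for 17 variants (placeholders, not published estimates)
weights = [math.log(1.10)] * 17
# Hypothetical genotypes for one woman: risk-allele counts at the 17 variants
dosages = [0, 1, 2, 1, 0, 1, 1, 2, 0, 1, 1, 0, 2, 1, 1, 0, 1]
score = weighted_allele_score(dosages, weights)
# With log-OR weights, exp(score) is her odds of EM relative to a woman
# carrying no risk alleles at these variants, under the additive model.
print(f"score = {score:.3f}, relative odds = {math.exp(score):.2f}")
```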
Over the past several years, genome-wide association studies (GWAS) have succeeded in identifying hundreds of genetic markers associated with common diseases. However, most of these markers confer relatively small increments of risk and explain only a small proportion of familial clustering. To identify obstacles to future progress in genetic epidemiology research and provide recommendations to NIH for overcoming these barriers, the National Cancer Institute sponsored a workshop entitled “Next Generation Analytic Tools for Large-Scale Genetic Epidemiology Studies of Complex Diseases” on September 15–16, 2010. The goal of the workshop was to facilitate discussions on (1) statistical strategies and methods to efficiently identify genetic and environmental factors contributing to the risk of complex disease; and (2) how to develop, apply, and evaluate these strategies for the design, analysis, and interpretation of large-scale complex disease association studies in order to guide NIH in setting the future agenda in this area of research. The workshop was organized as a series of short presentations covering scientific (gene-gene and gene-environment interaction, complex phenotypes, and rare variants and next generation sequencing) and methodological (simulation modeling and computational resources and data management) topic areas. Specific needs to advance the field were identified during each session and are summarized.
gene-gene interactions; gene-environment interactions; rare variants; next generation sequencing; complex phenotypes; simulations; computational resources
Including previously genotyped controls in a genome-wide association study can provide cost savings, but can also create design biases. When cases and controls are genotyped on different platforms, the imputation needed to provide genome-wide coverage introduces differential measurement error and may lead to false positives. We compared genotype frequencies of two healthy control groups from the Nurses’ Health Study genotyped on different platforms (Affymetrix 6.0 [n=1,672] and Illumina HumanHap550 [n=1,038]). Using standard imputation quality filters, we observed 9,841 SNPs out of 2,347,809 (0.4%) significant at the 5 × 10−8 level. We explored three methods for controlling this Type I error inflation. The first was to remove platform effects using principal components; the second was to restrict analysis to SNPs with the highest imputation quality; and the third was to genotype some controls alongside the cases to exclude SNPs that are statistical artifacts. The first method could not reduce the Type I error rate; the other two dramatically reduced it, although both required that a portion of SNPs be excluded from analysis. Ideally, the biases we describe would be eliminated at the design stage by genotyping sufficient numbers of cases and controls on each platform. Researchers using imputation to combine samples genotyped on different platforms with severely unbalanced case-control ratios should be aware of the potential for inflated Type I error rates and apply appropriate quality filters. Every SNP reaching genome-wide significance should be validated on another platform to verify that its significance is not an artifact of study design.
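The platform comparison described above amounts to testing each SNP for an allele-frequency difference between two control groups that should differ only by genotyping platform. A minimal sketch using a 1-df Pearson allelic chi-square test; the function name and allele counts are hypothetical, and treating summed imputed dosages as allele counts is a simplification that ignores imputation uncertainty.

```python
import math


def allelic_chi2(alt_a, n_a, alt_b, n_b):
    """Pearson chi-square (1 df) comparing alternate-allele counts of two groups.

    alt_a, alt_b: alternate-allele counts; n_a, n_b: numbers of individuals,
    so each group contributes 2 * n alleles.
    """
    table = [[alt_a, 2 * n_a - alt_a], [alt_b, 2 * n_b - alt_b]]
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p


# Group sizes from the study; allele counts are hypothetical
n_affy, n_illumina = 1672, 1038
chi2, p = allelic_chi2(alt_a=1150, n_a=n_affy, alt_b=640, n_b=n_illumina)
flag = "flagged" if p < 5e-8 else "not flagged"
print(f"chi2 = {chi2:.2f}, p = {p:.2e} ({flag} at 5e-8)")
```

Because both groups are healthy controls, any SNP flagged this way reflects platform or imputation artifacts rather than disease association.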
Genome-wide association study; Imputation; GWAS quality control
A recent Genome-Wide Association Study (GWAS) of prostate cancer in a Japanese population identified five novel regions not previously discovered in other ethnicities. In this study, we attempt to replicate these five loci in a series of nested prostate cancer case-control studies of European ancestry.
We genotyped five SNPs: rs13385191 (chromosome 2p24), rs12653946 (5p15), rs1983891 (6p21), rs339331 (6p22) and rs9600079 (13q22), in 7,956 prostate cancer cases and 8,148 controls from a series of nested case-control studies within the NCI Breast and Prostate Cancer Cohort Consortium (BPC3). We tested each SNP for association with prostate cancer risk and assessed if associations differed with respect to disease severity and age of onset.
Four SNPs (rs13385191, rs12653946, rs1983891 and rs339331) were significantly associated with prostate cancer risk (p-values ranging from 0.01 to 1.1 × 10−5). Allele frequencies and odds ratios were overall lower in our population of European descent compared to the discovery Asian population. SNP rs13385191 (C2orf43) was only associated with low-stage disease (p=0.009, case-only test). No other SNP showed association with disease severity or age of onset. We did not replicate the 13q22 SNP, rs9600079 (p=0.62).
Four SNPs associated with prostate cancer risk in an Asian population are also associated with prostate cancer risk in men of European descent.
This study illustrates the importance of evaluating prostate cancer risk markers across ethnic groups.