The HOXB13 gene has been implicated in prostate cancer (PrCa) susceptibility. We performed a high-resolution fine-mapping analysis to comprehensively evaluate the association between common genetic variation across the HOXB genetic locus at 17q21 and PrCa risk. This involved genotyping 700 SNPs using a custom Illumina iSelect array (iCOGS), followed by imputation of 3195 SNPs in 20,440 PrCa cases and 21,469 controls in the PRACTICAL Consortium. We identified a cluster of highly correlated common variants situated within or closely upstream of HOXB13 that were significantly associated with PrCa risk, described by rs117576373 (OR 1.30, P = 2.62×10−14). Additional genotyping, conditional regression and haplotype analyses indicated that the newly identified common variants tag a rare, partially correlated coding variant in the HOXB13 gene (G84E, rs138213197), which was recently identified as a moderate penetrance PrCa susceptibility allele. The potential for GWAS associations detected through common SNPs to be driven by rare causal variants with higher relative risks has long been proposed; however, to our knowledge this is the first experimental evidence for this phenomenon of synthetic association contributing to cancer susceptibility.
Genome-wide association studies (GWAS) have identified numerous low penetrance disease susceptibility variants, yet few causal alleles have been unambiguously identified. The underlying causal variants are expected to be predominantly common; however, synthetic associations with rare, higher penetrance variants have been hypothesised but not yet observed. Here, we report detection of a novel common, low penetrance prostate cancer association at the HOXB locus on chromosome 17q and show that this signal can be attributed to a previously identified rare, moderate penetrance coding variant (G84E) in HOXB13. This study therefore provides the first experimental evidence for the existence of synthetic associations in cancer and shows that where GWAS signals arise through this phenomenon, risk predictions derived using the tag SNP would substantially underestimate the relative risk conferred and overestimate the number of carriers of the causal variant. Synthetic associations at GWAS signals could therefore account for a proportion of the missing heritability of complex diseases.
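Why a tag SNP underestimates risk and overestimates carriers under synthetic association can be shown with a minimal numerical sketch. It assumes a simplified haplotype model: every copy of the rare causal allele sits on a haplotype carrying the common tag allele, risks act per haplotype, and the disease is rare enough that odds ratios approximate relative risks. The function name and all frequencies and risks are hypothetical illustrations, not estimates for rs117576373 or G84E.

```python
# Hypothetical illustration of a synthetic association: a rare causal allele
# that occurs only on haplotypes that also carry a more common tag allele.


def tag_relative_risk(p_tag: float, p_causal: float, rr_causal: float) -> float:
    """Approximate per-haplotype relative risk observed at the tag allele.

    Assumes all causal alleles lie on tag-carrying haplotypes
    (p_causal <= p_tag), risk 1 for every non-causal haplotype, and a rare
    disease so that odds ratios approximate relative risks.
    """
    if not 0 < p_causal <= p_tag < 1:
        raise ValueError("require 0 < p_causal <= p_tag < 1")
    frac_tag_with_causal = p_causal / p_tag
    # Tag haplotypes are a mixture of causal (risk rr_causal) and
    # non-causal (risk 1) haplotypes; non-tag haplotypes have risk 1.
    return frac_tag_with_causal * rr_causal + (1 - frac_tag_with_causal) * 1.0


# Hypothetical inputs: tag allele frequency 5%, causal allele frequency 0.5%,
# true relative risk 4 for the causal allele.
p_tag, p_causal, rr_causal = 0.05, 0.005, 4.0
print(f"apparent RR at tag: {tag_relative_risk(p_tag, p_causal, rr_causal):.2f}")  # 1.30
print(f"tag alleles per causal allele: {p_tag / p_causal:.0f}")
```

Under these assumptions a fourfold causal effect appears as a modest 1.3-fold effect at the tag, while the tag allele is about ten times more common than the causal allele, which is the direction of bias the summary above describes.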
Approximately 15–30% of all breast cancer tumors are estrogen receptor negative (ER−). Compared with ER-positive (ER+) disease, they have an earlier age at onset and a worse prognosis. Despite the vast number of risk variants identified for numerous cancer types, only seven loci have been unambiguously identified for ER− breast cancer. With the aim of identifying new susceptibility SNPs for this disease, we performed a pleiotropic genome-wide association study (GWAS). We selected 3079 SNPs associated with a human complex trait or disease at genome-wide significance (P<5×10−8) to perform a secondary analysis of an ER− GWAS from the National Cancer Institute's Breast and Prostate Cancer Cohort Consortium (BPC3), including 1998 cases and 2305 controls from prospective studies. We then tested the top ten associations (i.e., those with the lowest P-values) in three additional populations with a total sample size of 3509 ER+ cases, 2543 ER− cases and 7031 healthy controls. None of the 3079 selected variants in the BPC3 ER− GWAS was significant at the adjusted threshold. A total of 186 variants were associated with ER− breast cancer risk at a conventional threshold of P<0.05, with P-values ranging from 0.049 to 2.3×10−4. None of the variants reached statistical significance in the replication phase. In conclusion, this study did not identify any novel susceptibility loci for ER− breast cancer using a “pleiotropic approach”.
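The abstract does not say how the significance threshold was adjusted. The sketch below assumes a Bonferroni correction over the 3079 selected SNPs, purely to show the arithmetic. It also compares the 186 nominal hits with the number expected by chance if every SNP were null and independent, which SNPs in linkage disequilibrium are not.

```python
# Assumed correction: Bonferroni over all selected SNPs (not stated in the source).
N_TESTS = 3079
ALPHA = 0.05

bonferroni_threshold = ALPHA / N_TESTS
smallest_discovery_p = 2.3e-4  # smallest P-value reported in the BPC3 ER- GWAS

# Expected count of P < 0.05 results if all SNPs were null and independent;
# correlated SNPs make this only a rough yardstick.
expected_nominal_hits = ALPHA * N_TESTS

print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")
print(f"Smallest discovery P passes: {smallest_discovery_p < bonferroni_threshold}")
print(f"Expected nominal hits under the null: {expected_nominal_hits:.0f}")
```

Under this assumed correction the threshold is about 1.6 × 10−5, so even the smallest discovery P-value (2.3 × 10−4) falls short. Roughly 154 nominal hits would be expected by chance alone, compared with the 186 observed.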
Osteosarcoma is the most common primary bone malignancy of adolescents and young adults. In order to better understand the genetic etiology of osteosarcoma, we performed a multi-stage genome-wide association study (GWAS) consisting of 941 cases and 3,291 cancer-free adult controls of European ancestry. Two loci achieved genome-wide significance: rs1906953 at 6p21.3, in the glutamate receptor metabotropic 4 (GRM4) gene (P = 8.1 × 10−9), and rs7591996 and rs10208273 in a gene desert on 2p25.2 (P = 1.0 × 10−8 and 2.9 × 10−7). These two susceptibility loci warrant further exploration to uncover the biological mechanisms underlying susceptibility to osteosarcoma.
The adipocyte-secreted hormone adiponectin has insulin-sensitizing and anti-inflammatory properties. Although development of pancreatic cancer is associated with states of insulin resistance and chronic inflammation, the mechanistic basis of the associations is poorly understood.
To determine whether prediagnostic plasma levels of adiponectin are associated with risk of pancreatic cancer, we conducted a nested case–control study of 468 pancreatic cancer case subjects and 1080 matched control subjects from five prospective US cohorts: Health Professionals Follow-up Study, Nurses’ Health Study, Physicians’ Health Study, Women’s Health Initiative, and Women’s Health Study. Control subjects were matched to case subjects by prospective cohort, year of birth, smoking status, fasting status, and month of blood draw. All samples for plasma adiponectin were handled identically in a single batch. Odds ratios were calculated with conditional logistic regression, and linearity of the association between adiponectin and pancreatic cancer was modeled with restricted cubic spline regression. All statistical tests were two-sided.
Median plasma adiponectin was lower in case subjects than in control subjects (6.2 vs 6.8 µg/mL, P = .009). Plasma adiponectin was inversely associated with pancreatic cancer risk; the association was consistent across the five prospective cohorts (P for heterogeneity = .49) and independent of other markers of insulin resistance (eg, diabetes, body mass index, physical activity, plasma C-peptide). Compared with the lowest quintile of adiponectin, individuals in quintiles 2 to 5 had multivariable odds ratios (ORs) of 0.61 (95% confidence interval [CI] = 0.43 to 0.86), 0.58 (95% CI = 0.41 to 0.84), 0.59 (95% CI = 0.40 to 0.87), and 0.66 (95% CI = 0.44 to 0.97), respectively (P for trend = .04). Restricted cubic spline regression confirmed a nonlinear association (P for nonlinearity < .01). The association was not modified by sex, smoking, body mass index, physical activity, or C-peptide (all P for interaction > .10).
In this pooled analysis, low prediagnostic levels of circulating adiponectin were associated with an elevated risk of pancreatic cancer.
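Restricted cubic spline regression, used above to model the shape of the adiponectin association, replaces a single linear term with a small set of basis variables. The sketch below shows one common parameterization, the truncated-power form popularized by Frank Harrell, with each nonlinear term divided by (t_k − t_1)² for scaling. The knot locations are hypothetical; the abstract does not report the knots used.

```python
from __future__ import annotations

from collections.abc import Sequence


def rcs_basis(x: float, knots: Sequence[float]) -> list[float]:
    """Restricted cubic spline basis (truncated-power form) for one value x.

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k knots. The fitted curve is
    cubic between knots and constrained to be linear beyond the outer knots.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")
    scale = (t[-1] - t[0]) ** 2

    def pos_cube(u: float) -> float:
        # Truncated power function (u)_+^3
        return max(u, 0.0) ** 3

    basis = [x]
    for j in range(k - 2):
        term = (
            pos_cube(x - t[j])
            - pos_cube(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
            + pos_cube(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2])
        )
        basis.append(term / scale)
    return basis


# Hypothetical knots for plasma adiponectin in ug/mL (not the study's knots).
KNOTS = [3.0, 5.5, 7.5, 12.0]

for value in (2.0, 6.8, 15.0):
    print(value, [round(b, 4) for b in rcs_basis(value, KNOTS)])
```

In a regression model, these columns replace the single adiponectin term. A joint test that all nonlinear coefficients are zero is one common way to test for nonlinearity, although the abstract does not specify the exact test used.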
The primary circulating form of vitamin D is 25-hydroxy-vitamin D (25(OH)D), a modifiable trait linked with a growing number of chronic diseases. In addition to environmental determinants of 25(OH)D, including dietary sources and skin ultraviolet B (UVB) exposure, twin and family-based studies suggest that genetics contribute substantially to vitamin D variability, with heritability estimates ranging from 43% to 80%. Genome-wide association studies (GWAS) have identified SNPs located in four gene regions associated with 25(OH)D. These SNPs collectively explain only a fraction of the heritability in 25(OH)D estimated by twin and family-based studies. Using 25(OH)D concentrations and GWAS data on 5,575 subjects drawn from 5 cohorts, we hypothesized that genome-wide data, in the form of (1) a polygenic score composed of hundreds or thousands of SNPs that do not individually reach GWAS significance, or (2) a linear mixed model for genome-wide complex trait analysis, would explain variance in measured circulating 25(OH)D beyond that explained by known genome-wide significant 25(OH)D-associated SNPs. GWAS-identified SNPs explained 5.2% of the variation in circulating 25(OH)D in these samples, and there was little evidence that additional markers significantly improved predictive ability. On average, a polygenic score composed of GWAS-identified SNPs explained a larger proportion of variation in circulating 25(OH)D than scores composed of thousands of SNPs that were, on average, non-significant. Employing a linear mixed model for genome-wide complex trait analysis explained little additional variability (range 0–22%). The absence of a significant polygenic effect in this relatively large sample suggests an oligogenic architecture for 25(OH)D.
vitamin D; heritability; genome wide association; polygenic score
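A polygenic score of the kind tested above is a weighted sum of allele dosages. The weights are per-allele effect sizes from a GWAS, and SNPs are included when their GWAS P-value is below a chosen threshold. The sketch below uses entirely hypothetical summary statistics and a hypothetical six-person sample, and reports variance explained as the squared Pearson correlation between score and measured 25(OH)D. Real analyses estimate weights in one sample and evaluate the score in an independent one, and they usually also prune correlated SNPs. Both steps are omitted here.

```python
from __future__ import annotations

# Hypothetical GWAS summary statistics:
# SNP -> (per-allele effect on 25(OH)D, GWAS P-value)
GWAS = {
    "snp_a": (2.0, 1e-12),
    "snp_b": (0.6, 2e-3),
    "snp_c": (-0.4, 0.04),
    "snp_d": (0.1, 0.6),
}

# Hypothetical target sample: allele dosages (0-2) and measured 25(OH)D.
DOSAGES = {
    "snp_a": [0, 1, 2, 1, 0, 2],
    "snp_b": [1, 0, 2, 1, 1, 0],
    "snp_c": [2, 1, 0, 0, 1, 1],
    "snp_d": [0, 2, 1, 1, 0, 1],
}
MEASURED = [18.0, 24.5, 31.0, 22.0, 20.5, 27.0]


def polygenic_scores(p_threshold: float) -> list[float]:
    """Effect-weighted dosage sum over SNPs whose GWAS P is below p_threshold."""
    selected = [snp for snp, (_, p) in GWAS.items() if p < p_threshold]
    return [
        sum(GWAS[snp][0] * DOSAGES[snp][i] for snp in selected)
        for i in range(len(MEASURED))
    ]


def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation (variance explained by a linear fit)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


for threshold in (5e-8, 0.05, 1.0):
    scores = polygenic_scores(threshold)
    print(f"P < {threshold:g}: R^2 = {r_squared(scores, MEASURED):.3f}")
```

Relaxing the P-value threshold adds more SNPs to the score. The study's finding is that, for 25(OH)D, this did not reliably increase variance explained beyond the genome-wide significant SNPs.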
Diabetes is a suspected risk factor for pancreatic cancer, but questions remain about whether it is a risk factor or a result of the disease. This study prospectively examined the association between diabetes and the risk of pancreatic adenocarcinoma in pooled data from the NCI pancreatic cancer cohort consortium (PanScan).
The pooled data included 1,621 pancreatic adenocarcinoma cases and 1,719 matched controls from twelve cohorts using a nested case–control study design. Subjects who were diagnosed with diabetes near the time (<2 years) of pancreatic cancer diagnosis were excluded from all analyses. All analyses were adjusted for age, race, gender, study, alcohol use, smoking, BMI, and family history of pancreatic cancer.
Self-reported diabetes was associated with a forty percent increased risk of pancreatic cancer (OR = 1.40, 95% CI: 1.07, 1.84). The association differed by duration of diabetes; risk was highest for those with a duration of 2–8 years (OR = 1.79, 95% CI: 1.25, 2.55); there was no association for those with 9+ years of diabetes (OR = 1.02, 95% CI: 0.68, 1.52).
These findings provide support for a relationship between diabetes and pancreatic cancer risk. The absence of association in those with the longest duration of diabetes may reflect hypoinsulinemia and warrants further investigation.
Diabetes; Risk factor; Cohort consortium; Pancreatic cancer
Bony metastases cause substantial morbidity and mortality from prostate cancer (PCa). The calcium sensing receptor (CaSR) is expressed on prostate tumors and may participate in the development of bone metastases. We assessed whether 1) common genetic variation in CaSR was associated with PCa risk and 2) these associations varied by calcium intake or plasma 25-hydroxyvitamin D (25(OH)D) levels.
We included 1193 PCa cases and 1244 controls nested in the prospective Health Professionals Follow-up Study (1993-2004). We genotyped 18 CaSR SNPs to capture common variation. The main outcome was risk of lethal PCa (n=113); secondary outcomes were overall (n=1193) and high-grade PCa (n=225). We used the kernel machine approach to conduct a gene-level multi-marker analysis and unconditional logistic regression to compute per-allele odds ratios (OR) and 95% confidence intervals (CI) for individual SNPs.
The joint association of SNPs in CaSR was significant for lethal PCa (p=0.04); this association was stronger in those with low 25(OH)D (p=0.009). No individual SNPs were associated after accounting for multiple testing; 3 SNPs were nominally associated (p<0.05) with lethal PCa, with ORs (95% CI) of 0.65 (0.42–0.99) for rs6438705, 0.65 (0.47–0.89) for rs13083990, and 1.55 (1.09–2.20) for rs2270916. The 3 non-synonymous SNPs (rs1801725, rs1042636, rs1801726) were not significantly associated; however, the association for rs1801725 was stronger in men with low 25(OH)D (OR (95% CI): 0.54 (0.31–0.95)). There were no significant associations with overall or high-grade PCa.
Our findings indicate that CaSR may be involved in PCa progression.
Further studies investigating potential mechanisms for CaSR and PCa, including bone remodeling and metastases, are warranted.
molecular and genetic epidemiology; prostate cancer; vitamin D, calcium; CaSR
In previous studies, we observed a positive association between Trichomonas vaginalis serostatus and risk of prostate cancer, particularly aggressive cancer, which we hypothesized might be due to T. vaginalis-mediated intraprostatic inflammation and cell damage. To explore this hypothesis further, we investigated effect modification by Toll-like receptor 4 (TLR4) variation on this association. We hypothesized that TLR4 variation might serve as a marker of the anti-trichomonad immune response because T. vaginalis has been shown to elicit inflammation through this receptor.
We previously genotyped the non-synonymous TLR4 single nucleotide polymorphism (SNP), rs4986790, and determined T. vaginalis serostatus for 690 incident prostate cancer cases and 692 controls in a nested case-control study within the Health Professionals Follow-up Study.
A non-significant suggestion of effect modification by rs4986790 carrier status was observed for the association between T. vaginalis serostatus and prostate cancer risk (p-interaction=0.07). While no association was observed among men homozygous wild-type for this SNP (odds ratio (OR)=1.23, 95% confidence interval (CI): 0.86–1.77), a positive association was observed among variant carriers (OR=4.16, 95% CI: 1.32–13.1).
Although not statistically significant, TLR4 variation appeared to influence the association between T. vaginalis serostatus and prostate cancer risk consistent with the hypothesis that inflammation plays a role in this association. Larger studies will be necessary to explore this possible effect modification further.
Toll-like receptor 4; T. vaginalis; prostate cancer; SNP; aspirin
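The stratum-specific odds ratios reported above can be compared directly from their confidence intervals. The sketch below computes the ratio of the two ORs and a two-sided z-test on the difference of log ORs, recovering each standard error from the 95% CI width. This assumes symmetric Wald intervals on the log scale and independent strata. The reported p-interaction of 0.07 comes from the study's own model on individual-level data, so this summary-level approximation can differ from it.

```python
from __future__ import annotations

import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_or_with_se(odds_ratio: float, ci_low: float, ci_high: float) -> tuple[float, float]:
    """Recover log(OR) and its standard error from a symmetric 95% Wald CI."""
    return math.log(odds_ratio), (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)


def ratio_of_odds_ratios(stratum_1, stratum_2) -> tuple[float, float]:
    """Ratio of two stratum-specific ORs with a two-sided z-test p-value.

    Each stratum is (OR, CI low, CI high); the strata are assumed independent.
    """
    b1, se1 = log_or_with_se(*stratum_1)
    b2, se2 = log_or_with_se(*stratum_2)
    diff = b1 - b2
    z = diff / math.hypot(se1, se2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return math.exp(diff), p


carriers = (4.16, 1.32, 13.1)   # rs4986790 variant carriers
wild_type = (1.23, 0.86, 1.77)  # men homozygous wild-type
ror, p = ratio_of_odds_ratios(carriers, wild_type)
print(f"ratio of ORs = {ror:.2f}, approximate p = {p:.3f}")
```

The wide confidence interval among the relatively few variant carriers dominates the combined standard error. This is one reason the authors call for larger studies.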
Survival of patients with pancreatic adenocarcinoma is limited and few prognostic factors are known. We conducted a two-stage genome-wide association study (GWAS) to identify germline variants associated with survival in patients with pancreatic adenocarcinoma.
We analyzed overall survival in relation to single nucleotide polymorphisms (SNPs) among 1,005 patients from two large GWAS datasets, PanScan I and ChinaPC. Cox proportional hazards regression was used in an additive genetic model with adjustment for age, sex, clinical stage and the top four principal components of population stratification. The first stage included 642 cases of European ancestry (PanScan), from which the top SNPs (P≤10−5) were advanced to a joint analysis with 363 additional patients from China (ChinaPC).
In the first stage of cases of European descent, the top-ranked loci were at chromosomes 11p15.4, 18p11.21, and 1p36.13, tagged by rs12362504 (P=1.63×10−7), rs981621 (P=1.65×10−7), and rs16861827 (P=3.75×10−7), respectively. One hundred thirty-one SNPs with P ≤ 10−5 were advanced to a joint analysis with cases from the ChinaPC study. In the joint analysis, the top-ranked SNP was rs10500715 (minor allele frequency, 0.37; P=1.72×10−7) on chromosome 11p15.4, which is intronic to the SET binding factor 2 (SBF2) gene. The hazard ratio (95% CI) for death was 0.74 (0.66–0.84) in PanScan I, 0.79 (0.65–0.97) in ChinaPC, and 0.76 (0.68–0.84) in the joint analysis.
Germline genetic variation in the SBF2 locus was associated with overall survival in patients with pancreatic adenocarcinoma of European and Asian ancestry. This association should be investigated in additional large patient cohorts.
Pancreatic cancer; GWAS; single nucleotide polymorphism; SET binding factor 2
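Stage-specific hazard ratios like those reported for rs10500715 can be combined at the summary level with an inverse-variance, fixed-effect meta-analysis on the log scale. The sketch below assumes symmetric Wald confidence intervals, so each standard error can be recovered from the CI width. The study's joint estimate came from its own joint analysis rather than from this procedure, so the pooled value lands close to, but need not equal, the published 0.76 (0.68–0.84).

```python
from __future__ import annotations

import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def fixed_effect_meta(estimates: list[tuple[float, float, float]]) -> tuple[float, float, float]:
    """Inverse-variance fixed-effect pooling of ratio estimates.

    Each estimate is (ratio, 95% CI low, 95% CI high). The standard error of
    log(ratio) is recovered from the CI width, assuming a symmetric Wald CI
    on the log scale.
    """
    weighted_sum = total_weight = 0.0
    for ratio, low, high in estimates:
        se = (math.log(high) - math.log(low)) / (2 * Z95)
        weight = 1.0 / se**2
        weighted_sum += weight * math.log(ratio)
        total_weight += weight
    pooled = weighted_sum / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled),
        math.exp(pooled - Z95 * se_pooled),
        math.exp(pooled + Z95 * se_pooled),
    )


# Hazard ratios for rs10500715 reported above (additive genetic model).
stage_estimates = [
    (0.74, 0.66, 0.84),  # PanScan I
    (0.79, 0.65, 0.97),  # ChinaPC
]
hr, low, high = fixed_effect_meta(stage_estimates)
print(f"pooled HR {hr:.2f} (95% CI {low:.2f}-{high:.2f})")
```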
Observational studies have found an inverse association between type 2 diabetes (T2D) and prostate cancer (PCa), and genome-wide association studies have found common variants near 3 loci associated with both diseases. The authors examined whether a genetic background that favors T2D is associated with risk of advanced PCa. Data from the National Cancer Institute's Breast and Prostate Cancer Cohort Consortium, a genome-wide association study of 2,782 advanced PCa cases and 4,458 controls, were used to evaluate whether individual single nucleotide polymorphisms or aggregations of 36 T2D susceptibility loci are associated with PCa. Ten T2D markers near 9 loci (NOTCH2, ADCY5, JAZF1, CDKN2A/B, TCF7L2, KCNQ1, MTNR1B, FTO, and HNF1B) were nominally associated with PCa (P < 0.05); the association for single nucleotide polymorphism rs757210 at the HNF1B locus was significant when multiple comparisons were accounted for (adjusted P = 0.001). Genetic risk scores weighted by the T2D log odds ratio and multilocus kernel tests also indicated a significant relation between T2D variants and PCa risk. A mediation analysis of 9,065 PCa cases and 9,526 controls failed to produce evidence that diabetes mediates the association of the HNF1B locus with PCa risk. These data suggest a shared genetic component between T2D and PCa and add to the evidence for an interrelation between these diseases.
carcinoma; diabetes mellitus, type 2; genetic predisposition to disease; genetics; genome-wide association study; humans; polymorphism, single nucleotide; prostatic neoplasms
Rotating night shift work is associated with increased risk of breast cancer, likely via circadian disruption. We hypothesized that circadian pathway genes influence breast cancer risk, particularly in rotating night shift workers. We selected 178 common variants across 15 genes pertinent to the circadian system. Using a mixed candidate- and tag-single nucleotide polymorphism approach, we tested for associations between these variants and breast cancer risk in 1,825 women within the Nurses’ Health Study II cohort and investigated potential interactions between genotype and rotating shift-work in a subset of 1,318 women. Multiple-testing-adjusted p-values were obtained by permutation (n=10,000). None of the selected variants was significantly associated with breast cancer risk. However, when accounting for potential effect modification, rs2305160 (Ala394Thr) in the largest circadian gene, Neuronal PAS domain protein 2 (NPAS2), was most strongly associated with breast cancer risk (nominal test for interaction p-value=0.0005; 10,000-permutation-based main-effects p-value among women with <24 months of shift-work=0.003). The observed multiplicative association with breast cancer risk per minor allele (A) was 0.65 (95%CI=0.51–0.82) among women with <24 months of shift-work, and 1.19 (95%CI=0.93–1.54) among those with ≥24 months of shift-work. Women homozygous for the minor allele (AA) with ≥24 months of shift-work had a 2.83-times higher breast cancer risk compared to homozygous AA women with <24 months of shift-work (95%CI=1.47–5.56).
In summary, common variation in circadian genes plays at most a small role in breast cancer risk among women of European ancestry. The impact of NPAS2 Ala394Thr in the presence of rotating shift-work requires further investigation.
circadian genes; breast cancer; rotating shift work; night work
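The abstract does not describe its permutation scheme in detail. One common way to obtain permutation-based, multiple-testing-adjusted p-values is the single-step max-statistic (Westfall–Young maxT) procedure, which the sketch below assumes. It uses a deliberately simple test statistic, the absolute difference in mean minor-allele count between cases and controls. The study's actual statistics, including its genotype-by-shift-work interaction models, are not reproduced, and the example data are hypothetical.

```python
from __future__ import annotations

import random


def allele_mean_difference(genotypes: list[int], labels: list[int]) -> float:
    """|mean allele count in cases - mean allele count in controls|."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def maxt_adjusted_p(genotype_matrix: list[list[int]], labels: list[int],
                    n_perm: int = 10_000, seed: int = 1) -> list[float]:
    """Single-step maxT adjusted p-values.

    For each permutation of case/control labels, record the largest statistic
    across all SNPs; a SNP's adjusted p-value is the share of permutations
    whose maximum reaches its observed statistic.
    """
    rng = random.Random(seed)
    observed = [allele_mean_difference(g, labels) for g in genotype_matrix]
    exceed = [0] * len(observed)
    permuted = list(labels)
    for _ in range(n_perm):
        rng.shuffle(permuted)
        max_stat = max(allele_mean_difference(g, permuted) for g in genotype_matrix)
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[i] += 1
    # Add-one correction so an adjusted p-value is never exactly zero.
    return [(count + 1) / (n_perm + 1) for count in exceed]


# Hypothetical data: 10 cases, 10 controls, one strongly associated SNP, one null SNP.
labels = [1] * 10 + [0] * 10
snp_strong = [2] * 10 + [0] * 10
snp_null = [0, 1, 2, 1, 0, 1, 2, 1, 0, 1] * 2
adjusted = maxt_adjusted_p([snp_strong, snp_null], labels, n_perm=1000)
print(adjusted)
```

Comparing each SNP with the permutation distribution of the maximum across SNPs controls the family-wise error rate while respecting the correlation between SNPs, which is why permutation is often preferred to Bonferroni for sets of correlated tag SNPs.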
Prostate cancer is the most frequently diagnosed cancer in males in developed countries. To identify common prostate cancer susceptibility alleles, we genotyped 211,155 SNPs on a custom Illumina array (iCOGS) in blood DNA from 25,074 prostate cancer cases and 24,272 controls from the international PRACTICAL Consortium. Twenty-three new prostate cancer susceptibility loci were identified at genome-wide significance (P < 5 × 10−8). More than 70 prostate cancer susceptibility loci, explaining ~30% of the familial risk for this disease, have now been identified. On the basis of combined risks conferred by the new and previously known risk loci, the top 1% of the risk distribution has a 4.7-fold higher risk than the average of the population being profiled. These results will facilitate population risk stratification for clinical studies.
Genome-wide association studies have identified novel type 2 diabetes loci, each of which has a modest impact on risk.
To examine the joint effects of several type 2 diabetes risk variants and their combination with conventional risk factors on type 2 diabetes risk in 2 prospective cohorts.
Nested case–control study.
2809 patients with type 2 diabetes and 3501 healthy control participants of European ancestry from the Health Professionals Follow-up Study and Nurses’ Health Study.
A genetic risk score (GRS) was calculated on the basis of 10 polymorphisms in 9 loci.
After adjustment for age and body mass index (BMI), the odds ratio for type 2 diabetes with each point of GRS, corresponding to 1 risk allele, was 1.19 (95% CI, 1.14 to 1.24) and 1.16 (CI, 1.12 to 1.20) for men and women, respectively. Persons with a BMI of 30 kg/m² or greater and a GRS in the highest quintile had an odds ratio of 14.06 (CI, 8.90 to 22.18) compared with persons with a BMI less than 25 kg/m² and a GRS in the lowest quintile after adjustment for age and sex. Persons with a positive family history of diabetes and a GRS in the highest quintile had an odds ratio of 9.20 (CI, 5.50 to 15.40) compared with persons without a family history of diabetes and with a GRS in the lowest quintile. The addition of the GRS to a model of conventional risk factors improved discrimination by 1% (P < 0.001).
The study focused only on persons of European ancestry; whether GRS is associated with type 2 diabetes in other ethnic groups remains unknown.
Although its discriminatory value is currently limited, a GRS that combines information from multiple genetic variants might be useful for identifying subgroups with a particularly high risk for type 2 diabetes.
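The unweighted genetic risk score described above counts risk alleles across the 10 polymorphisms, giving a range of 0 to 20. If the per-point odds ratio applies multiplicatively (a log-additive assumption implied by reporting an OR "with each point of GRS"), two people whose scores differ by Δ alleles have an odds ratio of OR^Δ. The genotypes below are hypothetical, and the per-allele OR of 1.19 is the men's estimate reported above.

```python
from __future__ import annotations


def genetic_risk_score(risk_allele_counts: list[int]) -> int:
    """Unweighted GRS: total number of risk alleles carried (0-2 per SNP)."""
    if any(c not in (0, 1, 2) for c in risk_allele_counts):
        raise ValueError("each count must be 0, 1 or 2")
    return sum(risk_allele_counts)


def odds_ratio_for_score_difference(or_per_allele: float, delta: int) -> float:
    """OR comparing two people whose scores differ by `delta` risk alleles,
    assuming a log-additive (multiplicative) effect per allele."""
    return or_per_allele ** delta


# Hypothetical risk-allele counts at the 10 polymorphisms.
person_a = [1, 2, 1, 1, 0, 2, 1, 1, 2, 1]
person_b = [0, 1, 1, 0, 0, 1, 1, 0, 1, 1]
delta = genetic_risk_score(person_a) - genetic_risk_score(person_b)
print(delta, round(odds_ratio_for_score_difference(1.19, delta), 2))
```

A difference of only four risk alleles already corresponds to roughly a doubling of the odds (1.19⁴ ≈ 2.0), which helps explain the large contrasts between extreme GRS quintiles reported above.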
There is increasing interest in adding common genetic variants identified through genome-wide association studies (GWAS) to breast cancer risk prediction models. First results from such models showed modest benefits in terms of risk discrimination. Heterogeneity of breast cancer as defined by hormone-receptor status has not been considered in this context. In this study we investigated the predictive capacity of 32 GWAS-detected common variants for breast cancer risk, alone and in combination with classical risk factors, and for tumors with different hormone receptor status.
Material and Methods
Within the Breast and Prostate Cancer Cohort Consortium (BPC3), we analyzed 6009 invasive breast cancer cases and 7827 matched controls of European ancestry, with data on classical breast cancer risk factors and 32 common gene variants identified through GWAS. Discriminatory ability with respect to breast cancer of specific hormone receptor status was assessed with the age- and cohort-adjusted concordance statistic (AUROCa). Absolute risk scores were calculated with external reference data. Integrated discrimination improvement (IDI) was used to measure improvements in risk prediction.
We found a small but steady increase in discriminatory ability with increasing numbers of genetic variants included in the model (difference in AUROCa going from 2.7 to 4%). Discriminatory ability for all models varied strongly by hormone receptor status.
Discussion and Conclusion
Adding information on common polymorphisms provides small but statistically significant improvements in the quality of breast cancer risk prediction models. We consistently observed better performance for receptor positive cases, but the gain in discriminatory quality is not sufficient for
breast cancer; risk prediction; genetic factors; hormone receptor status
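The two discrimination measures used in this study can be sketched as follows. The concordance statistic (AUROC) is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The integrated discrimination improvement (IDI) is the gain in mean predicted risk among cases minus the gain among controls when moving from an old model to a new one. This sketch computes a crude AUROC only; the age- and cohort-adjusted AUROCa used in the study is not reproduced, and the predicted risks below are hypothetical.

```python
from __future__ import annotations

from itertools import product
from statistics import fmean


def concordance_auc(case_risks: list[float], control_risks: list[float]) -> float:
    """Crude AUROC: probability that a random case has a higher predicted
    risk than a random control, counting ties as one half."""
    score = 0.0
    for case, control in product(case_risks, control_risks):
        if case > control:
            score += 1.0
        elif case == control:
            score += 0.5
    return score / (len(case_risks) * len(control_risks))


def integrated_discrimination_improvement(old_cases: list[float], new_cases: list[float],
                                          old_controls: list[float],
                                          new_controls: list[float]) -> float:
    """IDI: gain in mean predicted risk among cases minus gain among controls."""
    return (fmean(new_cases) - fmean(old_cases)) - (fmean(new_controls) - fmean(old_controls))


# Hypothetical predicted absolute risks from a classical-factor model (old)
# and from the same model plus common SNPs (new), for the same individuals.
old_cases, new_cases = [0.04, 0.06, 0.05, 0.08], [0.05, 0.07, 0.05, 0.10]
old_ctrls, new_ctrls = [0.03, 0.05, 0.04, 0.06], [0.03, 0.04, 0.04, 0.06]

print("AUC old:", round(concordance_auc(old_cases, old_ctrls), 3))
print("AUC new:", round(concordance_auc(new_cases, new_ctrls), 3))
print("IDI:", round(integrated_discrimination_improvement(
    old_cases, new_cases, old_ctrls, new_ctrls), 4))
```

The pairwise loop is O(n_cases × n_controls). For thousands of cases and controls, a rank-based (Mann–Whitney) formula gives the same crude AUROC more efficiently.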
Heritability, the fraction of phenotypic variation explained by genetic variation, has been estimated for many phenotypes in a range of populations, organisms, and time points. The recent development of efficient genotyping and sequencing technology has led researchers to attempt to identify the genetic variants responsible for the genetic component of phenotype directly via GWAS. The gap between the phenotypic variance explained by GWAS results and that estimated by classical heritability methods has been termed the “missing heritability problem”. In this work, we examine modern methods for estimating heritability that use genotype and sequence data directly. We discuss them in the context of classical heritability methods and the missing heritability problem, and describe their implications for understanding the genetic architecture of complex phenotypes.
heritability; genome-wide association study; prediction; polygenic models; linear mixed models
Interest in gene-environment interaction studies has grown with the advent of advanced molecular genetics techniques. In practice, it has become possible to investigate the role of environmental factors in disease risk and hence their role as modifiers of genetic effects. The understanding that genetics is important in the uptake and metabolism of toxic substances is an example of how genetic profiles can modify important environmental risk factors for disease. Several rationales exist for setting up gene-environment interaction studies, and the technical challenges related to these studies, when the number of environmental or genetic risk factors is relatively small, have been described before.
In the post-genomic era, it is now possible to study thousands of genes and their interaction with the environment. This brings a whole range of new challenges and opportunities. Despite continuing efforts to develop efficient methods and optimal bioinformatics infrastructures to deal with the available wealth of data, the challenge remains how best to present and analyze Genome-Wide Environmental Interaction (GWEI) studies involving multiple genetic and environmental factors. Since GWEI studies are performed at the intersection of statistical genetics, bioinformatics and epidemiology, they usually raise problems similar to those of Genome-Wide Association gene-gene Interaction (GWAI) studies. However, additional complexities need to be considered; these are typical of large-scale epidemiological studies but also arise from “joining” two heterogeneous types of data to explain complex disease trait variation or for prediction purposes.
Genome-wide association studies; gene-environment interaction; post-GWAS analysis; association tests; exploratory methods
Next generation sequencing has dramatically increased our ability to localize disease-causing variants by providing base-pair level information at costs increasingly feasible for the large sample sizes required to detect complex-trait associations. Yet, identification of causal variants within an established region of association remains a challenge. Counter-intuitively, certain factors that increase power to detect an associated region can decrease power to localize the causal variant. First, combining GWAS with imputation or low coverage sequencing to achieve the large sample sizes required for high power can have the unintended effect of producing differential genotyping error among SNPs. This tends to bias the relative evidence for association toward better genotyped SNPs. Second, re-use of GWAS data for fine-mapping exploits previous findings to ensure genome-wide significance in GWAS-associated regions. However, using GWAS findings to inform fine-mapping analysis can bias evidence away from the causal SNP toward the tag SNP and SNPs in high LD with the tag. Together these factors can reduce power to localize the causal SNP by more than half. Other strategies commonly employed to increase power to detect association, namely increasing sample size and using higher density genotyping arrays, can, in certain common scenarios, actually exacerbate these effects and further decrease power to localize causal variants. We develop a re-ranking procedure that accounts for these adverse effects and substantially improves the accuracy of causal SNP identification, often doubling the probability that the causal SNP is top-ranked. Application to the NCI BPC3 aggressive prostate cancer GWAS with imputation meta-analysis identified a new top SNP at 2 of 3 associated loci and several additional possible causal SNPs at these loci that may have otherwise been overlooked. This method is simple to implement using R scripts provided on the author's website.
As next-generation sequencing (NGS) costs continue to fall and genome-wide association study (GWAS) platform coverage improves, the human genetics community is positioned to identify potentially causal variants. However, current NGS or imputation-based studies of either the whole genome or regions previously identified by GWAS have not yet been very successful in identifying causal variants. A major hurdle is the development of methods to distinguish disease-causing variants from their highly correlated proxies within an associated region. We show that various common factors, such as differential sequencing or imputation accuracy rates and linkage disequilibrium patterns, with or without GWAS-informed region selection, can substantially decrease the probability of identifying the correct causal SNP, often by more than half. We then describe a novel and easy-to-implement re-ranking procedure that can double the probability that the causal SNP is top-ranked in many settings. Application to the NCI Breast and Prostate Cancer (BPC3) Cohort Consortium aggressive prostate cancer data identified new top SNPs within two associated loci previously established via GWAS, as well as several additional possible causal SNPs that had been previously overlooked.
Motivation: The question of how best to use information from known associated variants when conducting disease association studies has yet to be answered. Some studies compute a marginal P-value for each single nucleotide polymorphism (SNP) independently, ignoring previously discovered variants. Other studies include known variants as covariates in logistic regression, but a weakness of this standard conditioning strategy is that it does not account for disease prevalence and non-random ascertainment, which can induce a correlation structure between candidate variants and known associated variants even if the variants lie on different chromosomes. Here, we propose a new conditioning approach, which is based in part on the classical technique of liability threshold modeling. Roughly, this method estimates model parameters for each known variant while accounting for the published disease prevalence from the epidemiological literature.
Results: We show via simulation and application to empirical datasets that our approach outperforms both the no conditioning strategy and the standard conditioning strategy, with a properly controlled false-positive rate. Furthermore, in multiple data sets involving diseases of low prevalence, standard conditioning produces a severe drop in test statistics whereas our approach generally performs as well or better than no conditioning. Our approach may substantially improve disease gene discovery for diseases with many known risk variants.
Availability: LTSOFT software is available online http://www.hsph.harvard.edu/faculty/alkes-price/software/
Supplementary information: Supplementary data are available at Bioinformatics online.
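The liability threshold model mentioned in the Motivation can be illustrated with a short sketch. Disease liability is standard normal, individuals above a threshold T are affected, and T is fixed by the population prevalence K, so that P(L > T) = K. Conditioning on a known variant's contribution g to liability, with residual variance 1 minus the variance that variant explains, gives the disease risk 1 − Φ((T − g)/σ_residual). This shows only the underlying idea of using published prevalence. It is not LTSOFT's estimation procedure, which the abstract does not describe, and the inputs are hypothetical.

```python
from __future__ import annotations

from statistics import NormalDist

STD_NORMAL = NormalDist()


def liability_threshold(prevalence: float) -> float:
    """Threshold T on a standard-normal liability scale with P(L > T) = prevalence."""
    return STD_NORMAL.inv_cdf(1.0 - prevalence)


def risk_given_genetic_liability(prevalence: float, genetic_value: float,
                                 variance_explained: float) -> float:
    """P(disease | known-variant liability contribution g).

    Total liability L = g + e has variance 1; the residual e is normal with
    variance 1 - variance_explained, so P(D | g) = 1 - Phi((T - g) / sd_e).
    """
    sd_residual = (1.0 - variance_explained) ** 0.5
    t = liability_threshold(prevalence)
    return 1.0 - STD_NORMAL.cdf((t - genetic_value) / sd_residual)


# Hypothetical: a person whose known variants raise liability by 0.5 SD,
# where those variants explain 5% of liability variance.
for k in (0.01, 0.10):
    print(k, round(liability_threshold(k), 3),
          round(risk_given_genetic_liability(k, 0.5, 0.05), 4))
```

The same liability shift translates into very different absolute risks depending on prevalence. This is the kind of information the standard covariate-conditioning strategy ignores.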
Important knowledge about the determinants of complex human phenotypes can be obtained from the estimation of heritability, the fraction of phenotypic variation in a population that is determined by genetic factors. Here, we make use of extensive phenotype data in Iceland, long-range phased genotypes, and a population-wide genealogical database to examine the heritability of 11 quantitative and 12 dichotomous phenotypes in a sample of 38,167 individuals. Most previous estimates of heritability are derived from family-based approaches such as twin studies, which may be biased upwards by epistatic interactions or shared environment. Our estimates of heritability, based on both closely and distantly related pairs of individuals, are significantly lower than those from previous studies. We examine phenotypic correlations across a range of relationships, from siblings to first cousins, and find that the excess phenotypic correlation in these related individuals is predominantly due to shared environment as opposed to dominance or epistasis. We also develop a new method to jointly estimate narrow-sense heritability and the heritability explained by genotyped SNPs. Unlike existing methods, this approach permits the use of information from both closely and distantly related pairs of individuals, thereby reducing the variance of estimates of heritability explained by genotyped SNPs while preventing upward bias. Our results show that common SNPs explain a larger proportion of the heritability than previously thought, with SNPs present on Illumina 300K genotyping arrays explaining more than half of the heritability for the 23 phenotypes examined in this study. Much of the remaining heritability is likely to be due to rare alleles that are not captured by standard genotyping arrays.
Phenotype is a function of a genome and its environment. Heritability is the fraction of variation in a phenotype determined by genetic factors in a population. Current methods to estimate heritability rely on the phenotypic correlations of closely related individuals and are potentially biased upward by epistasis and shared environment. We develop new methods to estimate heritability over both closely and distantly related individuals. By examining the phenotypic correlation among different types of related individuals such as siblings, half-siblings, and first cousins, we show that shared environment is the primary determinant of inflated estimates of heritability. For a large number of phenotypes, it is not known how much of the heritability is explained by SNPs included on current genotyping platforms. Existing methods to estimate this component of heritability are biased in the presence of related individuals. We develop a method that permits the inclusion of both closely and distantly related individuals when estimating heritability explained by genotyped SNPs and use it to make estimates for 23 medically relevant phenotypes. These estimates can be used to increase our understanding of the distribution and frequency of functionally relevant variants and thereby inform the design of future studies.
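A standard way to turn pairwise phenotypic similarity into a heritability estimate is Haseman–Elston regression. Products of standardized phenotypes for every pair are regressed on the pair's genetic relatedness, and the slope estimates narrow-sense heritability. The sketch below uses this technique for illustration only. It is not the joint estimator described above, and the simulated sibling data and helper names are hypothetical.

```python
import random


def haseman_elston_h2(phenotypes, relatedness):
    """Haseman-Elston slope: regress z_i * z_j on relatedness r_ij over all pairs."""
    n = len(phenotypes)
    mean = sum(phenotypes) / n
    sd = (sum((y - mean) ** 2 for y in phenotypes) / n) ** 0.5
    z = [(y - mean) / sd for y in phenotypes]  # standardize the trait
    # accumulate sums for an ordinary least-squares slope in one pass
    count = sx = sy = sxx = sxy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r = relatedness(i, j)
            prod = z[i] * z[j]
            count += 1
            sx += r
            sy += prod
            sxx += r * r
            sxy += r * prod
    return (sxy - sx * sy / count) / (sxx - sx * sx / count)


def simulate_sib_pairs(n_families, h2, seed=1):
    """Hypothetical data: sibling pairs (relatedness 0.5), additive genetics only."""
    rng = random.Random(seed)
    phenotypes = []
    for _ in range(n_families):
        shared = rng.gauss(0, 1)  # genetic component shared by both sibs
        for _ in range(2):
            own = rng.gauss(0, 1)  # sib-specific genetic component
            g = (h2 * 0.5) ** 0.5 * shared + (h2 * 0.5) ** 0.5 * own
            e = (1 - h2) ** 0.5 * rng.gauss(0, 1)  # unshared environment
            phenotypes.append(g + e)

    def relatedness(i, j):
        # individuals 2k and 2k+1 are sibs; all other pairs are unrelated
        return 0.5 if i // 2 == j // 2 else 0.0

    return phenotypes, relatedness


phenotypes, relatedness = simulate_sib_pairs(600, 0.6)
estimate = haseman_elston_h2(phenotypes, relatedness)
```

In this simulation the sibs share no environment, so the slope recovers h2 up to sampling error. Adding a shared environmental term to both sibs would raise their phenotypic covariance and inflate the slope. This is the upward bias from close relatives that the summary attributes mainly to shared environment.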
ABO blood type has been associated with risk and survival for several malignancies; however, data for an association with breast cancer are inconsistent. Our study population consisted of Nurses’ Health Study participants with self-reported serologic blood type and/or ABO genotype. Using Cox proportional hazards regression, we examined the association between serologic blood type and incident breast cancer among 67,697 women, including 3,107 cases. In addition, we examined the association with ABO genotype in a nested case-control study of 1,138 invasive breast cancer cases and 1,090 matched controls. Finally, we evaluated the association between serologic blood type and survival among 2,036 participants with breast cancer. No clear association was seen between serologic blood type or ABO genotype and risk of total breast cancer, invasive breast cancer, or breast cancer subtypes. Compared to women with blood type O, the age-adjusted incidence rate ratios for serologic blood type and total breast cancer were 1.06 (95% CI, 0.98–1.15) for type A, 1.06 (95% CI, 0.93–1.22) for AB, and 1.08 (95% CI, 0.96–1.20) for B. In genetic analyses, odds ratios for invasive breast cancer were 1.05 (95% CI, 0.87–1.27) for A/O, 1.21 (95% CI, 0.86–1.69) for A/A, 0.84 (95% CI, 0.56–1.26) for A/B, 0.84 (95% CI, 0.63–1.13) for B/O, and 1.17 (95% CI, 0.35–3.86) for B/B, compared to O/O. No significant association was noted between blood type and overall or breast cancer-specific mortality. Our results suggest no association between ABO blood group and breast cancer risk or survival.
ABO blood group; ABO genotype; blood type; breast cancer; survival
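The abstract's conclusions rest on whether each 95% CI excludes 1.0. The sketch below shows how an unadjusted odds ratio and its Wald CI come from a 2×2 table of exposure by case status. The counts are hypothetical, not the study's data. The published estimates came from Cox regression (incidence rate ratios) and from a matched case-control analysis, so this simple formula would not reproduce them exactly.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald CI from a 2x2 table.

    a = exposed cases, b = exposed controls,
    c = reference cases, d = reference controls (e.g. blood type O).
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def includes_null(lower, upper):
    """A 95% CI spanning 1.0 gives no evidence of association at alpha = 0.05."""
    return lower <= 1.0 <= upper


# Hypothetical counts, not taken from the study.
or_, lo, hi = odds_ratio_wald_ci(120, 110, 300, 310)
```

With these made-up counts the OR is about 1.13 and its CI spans 1.0. That is the same pattern the abstract reports for every blood type and genotype comparison.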
The association of vitamin D status with prostate cancer is controversial; no association has been observed for overall incidence, but there is a potential link with lethal disease.
We assessed prediagnostic 25-hydroxyvitamin D [25(OH)D] levels in plasma, variation in vitamin D–related genes, and risk of lethal prostate cancer using a prospective case–control study nested within the Health Professionals Follow-up Study. We included 1260 men who were diagnosed with prostate cancer after providing a blood sample in 1993–1995 and 1331 control subjects. Men with prostate cancer were followed through March 2011 for lethal outcomes (n = 114). We selected 97 single-nucleotide polymorphisms (SNPs) in genomic regions with high linkage disequilibrium (tagSNPs) to represent common genetic variation among seven vitamin D–related genes (CYP27A1, CYP2R1, CYP27B1, GC, CYP24A1, RXRA, and VDR). We used a logistic kernel machine test to assess whether multimarker SNP sets in seven vitamin D pathway–related genes were collectively associated with prostate cancer. Tests for statistical significance were two-sided.
Higher 25(OH)D levels were associated with a 57% reduction in the risk of lethal prostate cancer (highest vs lowest quartile: odds ratio = 0.43, 95% confidence interval = 0.24 to 0.76). This finding did not vary by time from blood collection to diagnosis. We found no statistically significant association of plasma 25(OH)D levels with overall prostate cancer. Pathway analyses found that the set of SNPs that included all seven genes (P = .008) as well as sets of SNPs that included VDR (P = .01) and CYP27A1 (P = .02) were associated with risk of lethal prostate cancer.
In this prospective study, plasma 25(OH)D levels and common variation among several vitamin D–related genes were associated with lethal prostate cancer risk, suggesting that vitamin D is relevant for lethal prostate cancer.
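The "57% reduction" is read directly off the odds ratio: 1 − 0.43 = 0.57. Strictly, this is 57% lower odds, which approximates lower risk only for a rare outcome. The sketch below also back-calculates an approximate standard error and two-sided P value from the published 95% CI, assuming a Wald interval on the log scale. The abstract does not report this P value, and rounding of the CI limits makes the result approximate.

```python
import math
from statistics import NormalDist


def wald_from_ci(odds_ratio, lower, upper, level=0.95):
    """Back-calculate SE(log OR), z and a two-sided P value from a published CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


# Highest vs lowest 25(OH)D quartile, lethal prostate cancer (values from the abstract).
se, z, p = wald_from_ci(0.43, 0.24, 0.76)
percent_lower_odds = (1 - 0.43) * 100
```

This gives SE(log OR) ≈ 0.29, z ≈ −2.87 and P ≈ 0.004, which is consistent with a CI that excludes 1.0.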
Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) associated with prostate cancer risk. There is limited information on the mechanistic basis of these associations, particularly about whether they interact with circulating concentrations of growth factors and sex hormones, which may be important in prostate cancer etiology. Using conditional logistic regression, the authors compared per-allele odds ratios for prostate cancer for 39 GWAS-identified SNPs across thirds (tertile groups) of circulating concentrations of insulin-like growth factor 1 (IGF-1), insulin-like growth factor binding protein 3 (IGFBP-3), testosterone, androstenedione, androstanediol glucuronide, estradiol, and sex hormone-binding globulin (SHBG) for 3,043 cases and 3,478 controls in the Breast and Prostate Cancer Cohort Consortium. After allowing for multiple testing, none of the SNPs examined were significantly associated with growth factor or hormone concentrations, and the SNP-prostate cancer associations did not differ by these concentrations, although 4 interactions were marginally significant (MSMB-rs10993994 with androstenedione (uncorrected P = 0.008); CTBP2-rs4962416 with IGFBP-3 (uncorrected P = 0.003); 11q13.2-rs12418451 with IGF-1 (uncorrected P = 0.006); and 11q13.2-rs10896449 with SHBG (uncorrected P = 0.005)). The authors found no strong evidence that associations between GWAS-identified SNPs and prostate cancer are modified by circulating concentrations of IGF-1, sex hormones, or their major binding proteins.
gene-environment interaction; gonadal steroid hormones; insulin-like growth factor binding protein 3; insulin-like growth factor I; molecular epidemiology; prostatic neoplasms
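The abstract does not say which multiple-testing correction was used. Assuming a Bonferroni correction over one interaction test per SNP-factor pair (39 SNPs × 7 circulating factors), the sketch below shows why the four "marginally significant" interactions do not survive.

```python
N_SNPS = 39
# IGF-1, IGFBP-3, testosterone, androstenedione, androstanediol glucuronide,
# estradiol, SHBG
N_FACTORS = 7
ALPHA = 0.05

# Uncorrected interaction P values reported in the abstract.
marginal_interactions = {
    ("MSMB-rs10993994", "androstenedione"): 0.008,
    ("CTBP2-rs4962416", "IGFBP-3"): 0.003,
    ("11q13.2-rs12418451", "IGF-1"): 0.006,
    ("11q13.2-rs10896449", "SHBG"): 0.005,
}

n_tests = N_SNPS * N_FACTORS  # assumes one interaction test per SNP-factor pair
bonferroni_threshold = ALPHA / n_tests  # about 1.8e-4
survivors = [pair for pair, p in marginal_interactions.items() if p < bonferroni_threshold]
```

Under this assumption, even the smallest uncorrected P value (0.003) corresponds to a Bonferroni-adjusted value of about 0.82.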
Body mass is inversely related to breast cancer risk among premenopausal women. Leptin, an essential cytokine regulating food intake, energy expenditure, glucose, and fat metabolism, may be part of the mechanistic pathway. We investigated 50 tagging and candidate SNPs in the leptin (LEP) and leptin receptor (LEPR) genes for associations with premenopausal breast cancer incidence using 405 cases and 810 controls nested within the Nurses’ Health Study II. We also examined associations between these SNPs and circulating leptin (among 910 women) and breast cancer grade (among 267 patients). Permutation tests were performed to adjust for multiple testing. We did not detect a significant association between SNPs in the LEP or LEPR gene and either breast cancer incidence or plasma leptin levels. Among cases, 14 SNPs in the LEPR gene were significantly associated with cancer grade, and rs1137101 (Q223R) survived multiple testing adjustment (adjusted P = 0.04). Carriers of the rs1137101 G allele were more likely to have poorly differentiated than well-differentiated cancers. Our data suggest that common genetic variation in the LEP or LEPR gene has no strong association with premenopausal breast cancer risk. The LEPR gene might be associated with breast cancer grade.
Premenopausal; Breast cancer; Leptin; Leptin receptor
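The abstract adjusts for multiple testing by permutation without giving details. One common scheme is the single-step max-T permutation procedure, sketched below under that assumption. Group labels are shuffled to break the genotype-phenotype link. Each SNP's adjusted P value is then the fraction of permutations in which the largest statistic across all SNPs reaches that SNP's observed statistic. The statistic here, the absolute difference in mean allele count between groups, is a hypothetical stand-in for whatever test the study used.

```python
import random


def group_mean_diffs(genotypes, labels):
    """|mean allele count in group 1 - group 0| for each SNP (one list per SNP)."""
    n1 = sum(labels)
    n0 = len(labels) - n1
    stats = []
    for snp in genotypes:
        s1 = sum(g for g, y in zip(snp, labels) if y == 1)
        s0 = sum(snp) - s1
        stats.append(abs(s1 / n1 - s0 / n0))
    return stats


def maxt_permutation_pvalues(genotypes, labels, n_perm=1000, seed=0):
    """Raw and family-wise (single-step max-T) adjusted permutation P values."""
    rng = random.Random(seed)
    observed = group_mean_diffs(genotypes, labels)
    m = len(observed)
    raw_hits = [0] * m
    max_hits = [0] * m
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break genotype-label link, keep group sizes
        perm_stats = group_mean_diffs(genotypes, shuffled)
        perm_max = max(perm_stats)  # max over SNPs controls the family-wise error rate
        for k in range(m):
            raw_hits[k] += int(perm_stats[k] >= observed[k])
            max_hits[k] += int(perm_max >= observed[k])
    raw = [(h + 1) / (n_perm + 1) for h in raw_hits]
    adjusted = [(h + 1) / (n_perm + 1) for h in max_hits]
    return raw, adjusted


# Hypothetical data: 60 poorly vs 60 well-differentiated tumours, 5 SNPs, SNP 0 associated.
data_rng = random.Random(42)
labels = [1] * 60 + [0] * 60
causal = [2 if y else 0 for y in labels]
noise = [[data_rng.randint(0, 2) for _ in labels] for _ in range(4)]
raw, adjusted = maxt_permutation_pvalues([causal] + noise, labels, n_perm=200, seed=1)
```

Because the maximum over SNPs is at least as large as any single SNP's statistic, each adjusted P value is never smaller than the corresponding raw one. This is how a SNP can be nominally associated, like most of the 14 LEPR SNPs, yet fail the family-wise adjustment.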