There is extensive evidence that increases in blood and tissue concentrations of steroid hormones and of insulin-like growth factor I (IGF-I) are associated with breast cancer risk. However, studies of common variation in genes involved in steroid hormone and IGF-I metabolism have yet to provide convincing evidence that such variants predict breast cancer risk. The Breast and Prostate Cancer Cohort Consortium (BPC3) is a collaboration of large US and European cohorts. We genotyped 1416 tagging single nucleotide polymorphisms (SNPs) in 37 steroid hormone metabolism genes and 24 IGF-I pathway genes in 6292 cases of breast cancer and 8135 controls, mostly Caucasian, postmenopausal women from the BPC3. We also imputed 3921 additional SNPs in the regions of interest. None of the SNPs tested was significantly associated with breast cancer risk, after correction for multiple comparisons. The results remained null when cases and controls were stratified by age at diagnosis/recruitment, advanced or nonadvanced disease, body mass index, with or without in situ cases; or restricted to Caucasians. Among 770 estrogen receptor-negative cases, an SNP located 3′ of growth hormone receptor (GHR) was marginally associated with increased risk after correction for multiple testing (Ptrend = 1.5 × 10−4). We found no significant overall associations between breast cancer and common germline variation in 61 genes involved in steroid hormone and IGF-I metabolism in this large, comprehensive study. Although previous studies have shown that variations in these genes can influence endogenous hormone levels, the magnitude of the effect of single SNPs does not appear to be sufficient to alter breast cancer risk.
Extensive evidence from in vitro, animal and epidemiologic studies indicates that increases in blood and tissue concentrations of endogenous steroid hormones and insulin-like growth factor I (IGF-I) favor the development of breast cancer.
Bittner in 1947 was the first to propose, based on mouse mammary tumor models, that endogenous steroid hormones were a cause of breast cancer (1). Epidemiologic support for the hypothesis derived from the observations that most of the established risk factors for breast cancer influence endogenous hormone production (2,3). It is now well established that elevated circulating and urinary levels of estrogens and androgens predict increased risk of breast cancer in postmenopausal women (4–9). Some epidemiologic studies also support a role for endogenous estrogens and androgens in premenopausal breast cancer (10), and estrogen produced between puberty and first full-term pregnancy has long been proposed as a critical determinant of lifetime cancer risk (2,11). More recently, it has been reported that prolactin levels may also influence breast cancer risk (12–14), although these findings are much less conclusive than those for estrogens and androgens. Estrogens stimulate cell division in the breast (3) and can be converted to genotoxic and mutagenic metabolites (15). Androgens have been hypothesized to modify breast cancer risk directly, by stimulating or inhibiting the proliferation of breast cancer cells, or indirectly, by conversion to estrogens (8). Postmenopausal steroid hormone therapy significantly increases the incidence of breast cancer (16,17), whereas anti-estrogen agents that competitively bind to the estrogen receptor (ER), such as tamoxifen, can reduce the primary occurrence and spread of breast tumors (15).
IGF-I metabolism is central to the regulation of anabolic (growth) processes as a function of available energy and elementary substrates such as amino acids. IGF-I stimulates signal transduction pathways that enhance anabolic processes, inhibits apoptosis and stimulates cell proliferation in a wide variety of cell types, including breast epithelium, and enhances tumor growth (18–22). Prospective epidemiological studies have shown higher breast cancer risk in women with elevated serum IGF-I concentrations, measured either as absolute concentrations or relative to the levels of IGFBP-3, the main IGF-I-binding protein in plasma, although not all studies have observed this association (23–28). Recently, a pooled analysis of individual data from 17 prospective studies demonstrated a modest, positive association between circulating IGF-I and breast cancer risk in both pre- and postmenopausal women (29).
Heritability studies based on twins and families suggest that a sizable fraction of between-subject variance in circulating levels of some steroid hormones and IGF-I is under the influence of genetic determinants and familial environmental and lifestyle factors, with the magnitude of the effect ranging from 39 to 65% (30–37).
Over the past decade, the search for candidate susceptibility genes associated with breast cancer has logically focused on polymorphisms in genes known to be relevant for the synthesis, bioavailability and metabolism of steroid hormones and IGF-I. Although several common polymorphisms have been consistently associated with circulating hormone levels, none have been convincingly associated with breast cancer risk directly (38–49). Few reported associations have been replicated because most candidate gene studies, especially those published in the 1990s, were underpowered to detect modest associations and did not comprehensively capture variation across the entire gene region.
To address these limitations of earlier studies, the Breast and Prostate Cancer Cohort Consortium (BPC3) was formed to bring together large, well-established cohorts assembled in the USA and Europe that had DNA for genotyping, extensive baseline questionnaire data from cohort participants, and effective ascertainment of cancer outcomes during follow-up (50). Here, we present the associations with breast cancer risk of 1416 tagging polymorphisms in 37 steroid hormone metabolism and 24 IGF-I pathway genes in 6292 cases of breast cancer and 8135 controls from BPC3.
After exclusions, the analyses include 6292 breast cancer cases and 8135 controls from six cohorts. Table 1 shows summary characteristics of the study population. Cases and controls were predominantly Caucasian (78%) and postmenopausal at the time of enrollment/DNA collection (79%). The mean age at diagnosis was 63 [ranging from 57.1 years for European Prospective Investigation into Cancer and Nutrition (EPIC) to 70.1 years for Cancer Prevention Study II (CPS-II)].
The percentage of SNPs selected for genotyping in cancer cases and controls that were successfully genotyped (based on the filtering methods described in the Materials and Methods section) ranged between 89.0 and 93.5%, calculated by cohort. Most failures were due to low call rates (<75% in a given cohort). Fewer than 1% of the SNPs were excluded from further analysis because of departure from Hardy–Weinberg equilibrium (P < 10−5). Allele frequencies were similar across the cohorts; only two SNPs showed Fst values higher than 2%. A total of 403 samples (2.72% of those where genotyping was attempted) were excluded due to call rate <75%. See Supplementary Material, Table S2 for more detail. A total of 1416 SNPs in the 61 candidate genes were successfully genotyped and used for statistical analyses.
We calculated Meff values for each candidate gene separately and for the whole study (by adding the individual gene Meff values). The study-wide Meff was 1065. We therefore used a study-wide significance P-threshold of 0.05/1065 = 4.7 × 10⁻⁵.
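The threshold arithmetic above can be checked with a minimal sketch. The function name `bonferroni_threshold` is hypothetical and introduced only for illustration; the inputs (0.05 and 1065) are the values quoted in the text.

```python
def bonferroni_threshold(alpha: float, n_effective_tests: float) -> float:
    """Per-test significance threshold for a family-wise error rate alpha."""
    if n_effective_tests <= 0:
        raise ValueError("number of effective tests must be positive")
    return alpha / n_effective_tests


# Values quoted in the text: alpha = 0.05, study-wide Meff = 1065.
study_wide = bonferroni_threshold(0.05, 1065)
print(f"{study_wide:.1e}")
```

Dividing by the effective number of tests rather than the raw SNP count (1416) gives a somewhat less stringent threshold, because SNPs in linkage disequilibrium are not independent.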
We observed no significant associations (Ptrend < 4.7 × 10⁻⁵ or P2df < 4.7 × 10⁻⁵) between any of the polymorphisms genotyped and overall breast cancer risk (Ptrend 0.003–0.996; P2df 0.001–0.999; Figs 1 and 2), nor did the distribution of observed P-values differ from that expected (Fig. 3). Tables 2 (steroid hormone-related genes) and 3 (IGF-I-related genes) show the minimum observed P-values of analyses performed on all subjects with invasive breast cancer. The results remained null when the sample was restricted to Caucasian women, or when analyses were stratified by: (a) age at diagnosis/recruitment (>55 years, ≤55 years); (b) advanced or nonadvanced disease (advanced disease being defined as regional or distant metastasis); (c) with or without inclusion of in situ cases (Supplementary Material, Tables S3 and S4).
When analyses were stratified by body mass index (BMI; <25, 25–30, ≥30 kg/m²), the test for heterogeneity for the trend test was of borderline statistical significance for one SNP, rs6744967 in LHCGR (P for heterogeneity = 0.0006). The stratified results suggested that the SNP was inversely associated with breast cancer among women with a BMI of ≥30 kg/m². In this stratum, we observed an odds ratio (OR) of 0.73 (99% CI: 0.57–0.93) for T/C versus T/T and an OR of 0.63 (99% CI: 0.47–0.84) for C/C versus T/T (Supplementary Material, Table S4).
Data on ER status were available for 4193 cases (66.6% of the total). Of these, 3423 (81.6%) were ER positive and 770 (18.4%) were ER negative. Among the ER-negative cases, the minor allele of SNP rs719756 within GHR was associated with increased risk, and the association approached statistical significance when taking into account multiple testing (Ptrend = 1.5 × 10⁻⁴). A test for heterogeneity for SNP rs719756 showed that the estimates of association in the ER-negative and -positive strata differed (P = 0.002). Among SNPs in steroid hormone-related genes, rs1543403 in ESR1 showed a differential association between ER-positive and -negative breast cancer cases [no association in the ER-positive stratum; in the ER-negative stratum, Ptrend = 5.7 × 10⁻⁴; OR = 1.20 (0.90–1.60) for one copy of the allele and OR = 1.54 (1.11–2.12) for two copies]. No statistically significant associations were observed with ER-positive cases after correction for multiple testing. Detailed results are reported in Supplementary Material, Table S5.
We successfully imputed 3921 SNPs located in the gene regions from which we selected tagging single nucleotide polymorphisms (tagSNPs) and for which the estimated correlation with genotyped SNPs was higher than 30%; these correspond to 39.5% of the SNPs present in the database of single nucleotide polymorphisms (dbSNP) in the same regions. However, none of these imputed SNPs was significantly associated with breast cancer risk when corrected for multiple testing (using a threshold of P = 4.7 × 10⁻⁵; Supplementary Material, Table S6).
Of the 40 049 pairwise SNP–SNP interaction models tested, 43 showed a P-value of <10⁻³. The minimum observed P-values were 2.8 × 10⁻⁵ for the interaction between SNPs rs7605927 and rs2164808 of POMC and 5.3 × 10⁻⁵ for the interaction between SNPs rs1521479 and rs8038056 of IGF1R.
To further explore the possibility of an association between breast cancer risk and SNPs in IGF1R, we reconstructed haplotypes in a region of the gene characterized by high linkage disequilibrium (LD) that encompasses rs1521479 and rs8038056. Statistical analysis of haplotypes did not show any significant association with breast cancer risk. Since none of the 12 genotyped SNPs in POMC approached statistical significance in the main effect models (Table 2 and Fig. 1), we attribute the POMC interaction finding to chance.
To date, our study provides the most comprehensive analysis of breast cancer risk in relation to common variation in genes associated with synthesis, bioavailability and metabolism of steroid hormones and IGF-I, both in terms of coverage of genes belonging to the two pathways and of SNPs belonging to each gene. In contrast to genome-wide association studies (GWAS), our approach included systematic polymorphism discovery through deep resequencing in and around candidate genes, imputation across all gene regions genotyped in the study, and examination of pairwise interactions for all genotyped SNPs. Our study provides evidence refuting the hypothesis that common genetic polymorphisms in the two pathways substantially affect breast cancer risk. The absence of an association with common variants is striking given the importance of circulating steroid hormone (4–9) and IGF-I levels (23–27,29), as well as postmenopausal steroid hormone use (16,17) in predicting breast cancer risk. Nevertheless, we believe that these null findings are convincing, given the size and comprehensive approach of our study, which included intensive SNP tagging and mathematical imputation of SNP variants identified through HapMap that were not directly genotyped.
Our study, including over 6000 breast cancer cases and 8000 controls, had >80% power to detect mono-allelic (‘main effect’) associations between variants in any of the genes studied [minor allele frequency (MAF) ≥ 0.05, OR ≥ 1.25; MAF ≥ 0.1, OR ≥ 1.2; or MAF ≥ 0.2, OR ≥ 1.15] and breast cancer risk at P < 0.0001. It is worth noting that the risks detectable in our study are of the same magnitude as those typically found in GWAS.
We did observe some significant P-values at conventional levels of statistical significance (e.g. P < 0.05, P < 0.01) that would have been classified as ‘positive’ in a single candidate gene study. We adopted a threshold of 4.7 × 10−5, to minimize false positives among the large number of SNPs tested. At this level of significance, no single SNP was associated with breast cancer risk; however, we cannot exclude that some of the SNP variants with small P-values are indeed associated with breast cancer risk at approximately the ORs we observed but that adjustment for the large number of SNPs tested obscured these associations. On balance, we prefer to be conservative and limit the chances of declaring false positives. We maintain that a major strength of our study is that with a coordinated approach and large sample size, the ORs and confidence intervals we observed substantially minimize the probability that we have missed common alleles with large effect sizes (e.g. OR ≥ 1.5) (51).
Our results do not support an association between breast cancer and a single SNP (rs3020314) in ESR1, which was recently reported from an analysis of >50 000 breast cancer cases among women of European descent in the Breast Cancer Association Consortium (BCAC) (52). In that study, carriers of the minor allele of rs3020314 had an OR of 1.05 (P = 0.004). The study by Dunning et al. included a subset of the NHS subjects used for the present work (955 cases and 1631 controls). In the BPC3, we did not genotype rs3020314. However, we genotyped two SNPs [rs3020377 (P for trend = 0.56) and rs3020394 (P for trend = 0.90)] that tag rs3020314, as well as 129 other SNPs in ESR1. Two of our SNPs in ESR1 (rs1361024 and rs1514347) have P-values for trend of 0.01. Six of our ESR1 SNPs have P-values for trend between 0.0004 and 0.001 among our advanced breast cancer cases, but the trends all suggest inverse associations for the minor allele. In our studies, ER status was known for a large fraction of the cases, which enabled us to analyze our data stratifying by ER status. Among women with ER-negative tumors, ESR1 rs1543403 was associated with a borderline significantly increased risk (Ptrend = 5.7 × 10−4). Since the BCAC includes most of the existing case–control studies of breast cancer, it may be difficult to find another independent set of data that can resolve these conflicting findings, especially given that the observed effect reported by the BCAC was small.
One SNP of GHR showed an association with ER-negative breast cancer that was close to significance when taking into account multiple testing (Ptrend = 1.5 × 10⁻⁴). Although this result is promising, the SNP is located 3.5 kb downstream from the end of the last exon of GHR and, therefore, its function is not obvious. Further studies involving larger numbers of ER-negative cases are needed to determine whether this result can be replicated.
Besides testing for main effects, we assessed all possible models for pairwise interactions between SNPs within each gene. This approach tests all possible cis-acting interactions between SNPs at the same gene locus, at least for all pairs of SNPs within regions of high LD (haplotype blocks) where there is little uncertainty about the allelic phase (53). One interaction test between two SNPs in POMC showed a P-value of 2.8 × 10−5, and an interaction test between two SNPs at the 3′ end of IGF1R showed a P-value of 5.3 × 10−5. However, after correction using conservative Bonferroni adjustment, these no longer met the threshold for statistical significance (P < 1.2 × 10−6, 0.05/40 049 tests). Additional haplotype reconstruction and statistical testing in these gene regions did not provide further evidence for associations with breast cancer risk. Examination of the SNPs from our discovery, sequencing and genotyping sets in these two gene regions did not show any obvious potentially functional SNPs (splice junction or nonsynonymous SNPs) in high LD. These signals may have been, therefore, observed by chance.
We chose not to test for higher order interactions, as the overall sample size of our study, although very large, was insufficient to test a very large number of complex interactions. For the same reason, we elected not to conduct more complex haplotype analyses for all genes, which would require testing models incorporating all main effects plus interactions (first- and higher order) of the full sets of SNPs in regions of high LD (53).
Previous analyses of our genetic data have found variants of some of our candidate genes to be associated with circulating hormone levels (45,46,49,54). Other studies have reported similar findings, and we have assessed these genetic variants either directly or indirectly by tagging or imputation in the current analysis (55–58). In these studies, however, the genetic variation explained only a minor part of the between-subject variance in hormone levels. On the basis of the existing knowledge about the quantitative relationship between various hormones and breast cancer risk (6,29), the modest changes in circulating levels observed previously would have too small an effect on breast cancer risk to be detectable in our study, despite its power. Nevertheless, these SNPs can be considered to have high prior probabilities of association with breast cancer risk. When we calculated the false-positive report probabilities (59) for these SNPs, we found no evidence that these associations with breast cancer risk were noteworthy, even if we assigned them a high prior (data not shown). It is also possible that common variations in the genes associated with steroid hormone and IGF metabolism may modestly affect breast cancer risk, possibly at an earlier stage of life, but that this relationship is obscured in postmenopausal women by the strong influence through the lifespan of environmental factors, such as adiposity, diet, physical activity, and menstrual and reproductive events, on steroid hormone and IGF levels. However, our results remained null in analyses restricted to women aged ≤55 years or with a BMI of <25 kg/m².
Our results are consistent with recent GWAS, which have also failed to identify any loci in steroid hormone or IGF-I metabolism genes that are strongly associated with breast cancer risk (60–65). Across a wide variety of cancers, loci identified by GWAS have usually been found in genes not previously hypothesized to be related to the disease, or in intergenic regions not known to contain specific genes (60–70). This suggests far greater complexity in the etiology of these diseases and in genetic regulation than previously anticipated, and that a narrow focus on what we already know about widely accepted etiologic pathways will have limited success in identifying the heritable components of these diseases.
In conclusion, although the experimental and epidemiologic evidence relating steroid hormones and the IGF axis to breast cancer risk is strong, this comprehensive large-scale study found no statistically significant associations between breast cancer risk and polymorphisms in 61 candidate genes in these two pathways in a population of mostly Caucasian, postmenopausal women.
The BPC3 has been described in detail elsewhere (50). Briefly, the consortium includes large, well-established cohorts assembled in the USA and Europe that have both DNA samples and extensive questionnaire information: the American Cancer Society CPS-II (71), the EPIC (72), the Harvard Nurses' Health Study (NHS) (73) and Women's Health Study (WHS) (74), the Prostate, Lung, Colorectal, Ovarian Cancer Screening Trial (PLCO) (75) and the Multiethnic Cohort (MEC) (76). With the exception of the MEC, most women in these cohorts are Caucasians of US and European descent. Cases were identified in each cohort by self-report with subsequent confirmation of the diagnosis from medical records or tumor registries, and/or linkage with population-based tumor registries (method of confirmation varied by cohort). Controls were matched to cases by ethnicity and age, and in some cohorts, additional criteria, such as country of residence in EPIC.
Informed consent was obtained from all subjects, and the project has been approved by the appropriate institutional review board for each cohort.
We selected all known genes involved in synthesis, bioavailability and metabolism of steroid hormones and IGF-I. We included only genes that are specific to steroid hormones and IGF-I, and we did not include genes involved in the downstream processing of signaling for either pathway. On the basis of the current knowledge of hormone physiology, we believe that our coverage of these two pathways is comprehensive.
A detailed description of SNP selection methodology is provided in Supplementary Material. Briefly, coding and evolutionarily conserved regions of selected genes were sequenced in a panel of 95 advanced breast cancer cases (19 from each of the five ethnic groups: African American, Latina, Japanese, Native Hawaiian and Caucasian) from the MEC and EPIC. A summary of the SNPs detected is reported in Supplementary Material, Table S1.
All SNPs identified by resequencing with MAF>5% in any of the five ethnic groups or >1% overall, as well as additional SNPs from HapMap (http://www.hapmap.org) were genotyped in multiethnic reference panels of up to 710 healthy subjects, using the Sequenom, TaqMan and Illumina platforms.
These genotyping data were then used to select tagging SNPs to genotype on cancer cases and controls from the cohorts. For genes in which TaqMan genotyping was planned (ACTHR/MC2R, AR, COMT, CYP11A, CYP17, CYP19A1, CYP1A1, CYP1A2, CYP1B1, ESR2, FSHB, GNRH1, GNRHR, HSD17B1, HSD17B2, PGR, POMC, IGF1, IGFBP1 and IGFBP3), a haplotype-tagging approach was used according to the method of Stram et al. (77), with criterion r2H ≥ 0.7. Tagging SNPs for the remaining 41 genes were selected later in the project and were genotyped using Illumina GoldenGate technology; all SNPs with MAF ≥0.05 in these genes were tagged with a minimum r2-value of ≥0.8. Haplotype tagging and pairwise tagging result in essentially the same sets of tagging SNPs over blocks of no or limited recombination, whereas the coverage with pairwise tagging is more complete in inter-block regions characterized by low LD. Although different tagging methods are not likely to select the same sets of tagging SNPs, they provide comparable power to detect associations (78).
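Pairwise tagging at a fixed r² threshold can be illustrated with a minimal sketch. It assumes a simple greedy selection rule, which the text does not specify; the function name, the SNP labels and the r² values are all hypothetical.

```python
def greedy_pairwise_tags(snps, r2, threshold=0.8):
    """Pick tag SNPs so every SNP has r^2 >= threshold with at least one tag.

    snps: list of SNP identifiers.
    r2:   dict mapping frozenset({a, b}) -> pairwise r^2
          (pairs missing from the dict are treated as r^2 = 0).
    """
    def captured_by(tag):
        # A SNP always captures itself, plus any SNP in strong LD with it.
        return {s for s in snps
                if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs;
        # max() keeps the first maximum, so ties follow list order.
        best = max(snps, key=lambda s: len(captured_by(s) & untagged))
        tags.append(best)
        untagged -= captured_by(best)
    return tags


# Hypothetical example: snp2 is in strong LD with snp1 and snp3; snp4 stands alone.
example_snps = ["snp1", "snp2", "snp3", "snp4"]
example_r2 = {
    frozenset(("snp1", "snp2")): 0.90,
    frozenset(("snp2", "snp3")): 0.85,
    frozenset(("snp1", "snp3")): 0.60,
}
example_tags = greedy_pairwise_tags(example_snps, example_r2)
print(example_tags)
```

In this toy example two tags suffice for four SNPs. In regions of low LD, every SNP tends to become its own tag, which is why pairwise tagging covers inter-block regions more completely.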
The gene for growth hormone (GH1) was considered for this study, but it was dropped because it is physically clustered with four other genes with which it shares very high sequence homology (79), making it virtually impossible to obtain reliable genotyping data from this region.
Genotyping in 6292 breast cancer cases and 8135 controls was performed in six laboratories (located at University of Southern California, National Cancer Institute, Harvard School of Public Health, Imperial College, International Agency for Research on Cancer and DKFZ). One hundred and seventy-two SNPs in 20 genes (listed in the subsection Gene and SNP selection) were genotyped by the 5′ nuclease (TaqMan) assay, using reagents by Applied Biosystems (Foster City, CA, USA). SNPs in the remaining 41 genes examined in the present analysis were genotyped by Illumina GoldenGate technology (San Diego, CA, USA). SNPs belonging to three genes, ACTHR, PGR and POMC, were genotyped on both platforms and did not show genotyping discrepancies. Samples of 30 HapMap CEU trios were genotyped in all laboratories to evaluate inter-lab reproducibility. For the 1536 SNPs included in the Illumina GoldenGate array, the concordance was 99.5% (before excluding failed SNPs or samples). Within each study, blinded duplicate samples (~5%) were also included and concordance of these samples ranged from 97.2 to 99.9% across studies.
Samples in which more than 25% of the SNPs failed were excluded from further analysis. Also excluded were poorly performing SNPs (those that failed on 25% or more samples in a given cohort), those that deviated significantly (P < 10⁻⁵) within a cohort from Hardy–Weinberg equilibrium (HWE) genotype frequencies among European-ancestry controls, and those with MAF < 1% (within a cohort). Any SNP that was missing in more than three studies or exhibited large differences in European-ancestry allele frequencies across cohorts (fixation index Fst > 0.02) was excluded from further analysis.
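The per-cohort SNP filters described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the HWE test here is a 1-df chi-square test (the text does not say which test was used), a single set of genotype counts stands in for both the call-rate and the HWE calculation (in the study, HWE was assessed among European-ancestry controls), and all function names are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """1-df chi-square test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def snp_passes_qc(n_aa, n_ab, n_bb, n_failed,
                  min_call_rate=0.75, hwe_p=1e-5, min_maf=0.01):
    """Apply the call-rate, MAF and HWE filters to one SNP in one cohort."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    # "Failed on 25% or more samples" means a call rate of 75% or less.
    if called / (called + n_failed) <= min_call_rate:
        return False
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    if min(freq_a, 1.0 - freq_a) < min_maf:
        return False
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_p
```

For example, `snp_passes_qc(25, 50, 25, 0)` passes all three filters, whereas a SNP with no heterozygotes among 100 samples (`snp_passes_qc(50, 0, 50, 0)`) fails the HWE filter.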
We used the software MACH (http://www.sph.umich.edu/csg/abecasis/MaCH/index.html) to impute SNPs that were polymorphic in any of the HapMap reference panels using observed genotypes from the BPC3 subjects (Illumina or TaqMan) and phased haplotypes from HapMap samples (release #22) (80). Genotypes for European-ancestry subjects were imputed using the CEPH European (CEU) reference panel of HapMap; those for Japanese Americans were imputed using the combined Han Chinese and Japanese panels (CHB + JPT); those for remaining subjects (African Americans, Latinos, Native Hawaiians) were imputed using a ‘cosmopolitan’ panel of all HapMap samples (CEU + CHB + JPT + YRI) (81). Imputation was performed in analyses stratified by both study and ethnicity. We excluded SNPs for which the average correlation between the imputed and true underlying genotypes (estimated by comparing the variance of the estimated genotype scores with what would be expected if genotype scores were observed without error) was <30%; this threshold was chosen based on analyses of empirical data that maximized the number of accurately imputed genotypes while minimizing the proportion of excluded SNPs (82).
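The imputation quality measure described above, which compares the variance of the estimated genotype scores with the variance expected for error-free genotypes, can be sketched as a simple ratio. The sketch assumes Hardy-Weinberg proportions for the expected variance, 2p(1 − p), and uses a hypothetical function name; it is not a reproduction of the MACH implementation.

```python
def imputation_quality(dosages):
    """Ratio of observed dosage variance to the variance expected under HWE.

    dosages: estimated allele counts per subject, each between 0 and 2.
    Values near 1 suggest well-imputed genotypes; values near 0 suggest
    that the dosages carry little information.
    """
    n = len(dosages)
    mean = sum(dosages) / n
    p = mean / 2.0                      # estimated allele frequency
    expected_var = 2.0 * p * (1.0 - p)  # variance of an error-free genotype
    if expected_var == 0.0:
        return 0.0                      # monomorphic: no information
    observed_var = sum((d - mean) ** 2 for d in dosages) / n
    return observed_var / expected_var


# Hard genotype calls in exact HWE proportions give a ratio of 1.
print(imputation_quality([0, 1, 1, 2]))
# Dosages shrunk toward the mean give a lower ratio (0.25 here),
# which would fall below the 30% exclusion threshold.
print(imputation_quality([0.5, 1.0, 1.0, 1.5]))
```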
We analyzed the association between breast cancer risk and genotypes for each SNP with MAF >1% using unconditional logistic regression adjusted for four factors: age at diagnosis or selection as a control (5 year age intervals), study, ethnicity and (for the EPIC cohort) recruitment center. Genotypes were coded either as counts of minor alleles (trend test) or as two indicator variables, one for heterozygotes and one for minor allele homozygotes (two degrees of freedom test). We performed these analyses in all subjects and separately for each ethnicity. For the two largest ethnic groups (Caucasians and African Americans), we further performed statistical analyses separately for each study within ethnicity. We also performed analyses stratified on five other variables: age at diagnosis (≤55, >55 years), advanced/nonadvanced disease (advanced disease was defined as having regional or distant metastasis), ER status, BMI (<25, 25–30, ≥30 kg/m²) and inclusion of in situ carcinoma cases.
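The two genotype codings described above can be illustrated with a minimal sketch. The function names are hypothetical, and the sketch covers only the construction of the genotype covariates, not the adjusted logistic regression itself.

```python
def code_trend(minor_allele_count):
    """Additive coding: one covariate equal to the number of minor alleles."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor allele count must be 0, 1 or 2")
    return [minor_allele_count]


def code_two_df(minor_allele_count):
    """Indicator coding: [heterozygote, minor-allele homozygote]."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor allele count must be 0, 1 or 2")
    return [int(minor_allele_count == 1), int(minor_allele_count == 2)]


# Genotypes for three subjects: major homozygote, heterozygote, minor homozygote.
genotypes = [0, 1, 2]
print([code_trend(g) for g in genotypes])   # one column, 1 df
print([code_two_df(g) for g in genotypes])  # two columns, 2 df
```

The trend coding assumes a log-additive effect per allele and spends one degree of freedom; the indicator coding makes no assumption about the mode of inheritance at the cost of a second degree of freedom.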
In order to take into account the large number of tests performed in this project, we calculated for each gene the number of effective independent variables, Meff, by the use of the SNP Spectral Decomposition approach (83). We obtained a gene-wide Meff value for each gene and also a study-wide Meff value, by adding up the gene Meff's.
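A minimal sketch of the Meff calculation follows. It assumes the commonly cited spectral decomposition formula Meff = 1 + (M − 1)(1 − Var(λ)/M), where λ are the eigenvalues of the M × M SNP correlation matrix; the text names the approach but does not state the formula, so this form is an assumption. The eigenvalue variance is obtained without an eigendecomposition, using the facts that the eigenvalues of a correlation matrix sum to M and that their squares sum to the sum of the squared matrix entries. Function names are hypothetical.

```python
def meff_from_correlation(corr):
    """Effective number of independent SNPs from a SNP correlation matrix.

    corr: symmetric M x M correlation matrix as a list of lists
          (ones on the diagonal).
    """
    m = len(corr)
    if m == 1:
        return 1.0
    # For a symmetric matrix, the sum of squared entries equals the sum of
    # squared eigenvalues; the eigenvalues themselves sum to M (mean = 1).
    sum_sq = sum(x * x for row in corr for x in row)
    var_lambda = (sum_sq - m) / (m - 1)  # sample variance of the eigenvalues
    return 1.0 + (m - 1) * (1.0 - var_lambda / m)


def study_wide_meff(gene_matrices):
    """Study-wide Meff as the sum of the gene-wide Meff values."""
    return sum(meff_from_correlation(c) for c in gene_matrices)


# Two independent SNPs count as two tests; two perfectly correlated SNPs as one.
print(meff_from_correlation([[1.0, 0.0], [0.0, 1.0]]))
print(meff_from_correlation([[1.0, 1.0], [1.0, 1.0]]))
```

Summing gene-wide values treats SNPs in different genes as independent, which is slightly conservative if any LD extends across gene boundaries.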
We analyzed all 40 049 possible SNP–SNP interaction models involving pairs of SNPs belonging to the same gene. We coded the SNPs by genotype category and we performed a likelihood ratio test of the model with all interaction terms against the reduced model that included only the main effects of the two SNPs.
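The likelihood ratio test described above can be sketched as follows. The sketch assumes that all three genotypes are observed for both SNPs, so that genotype-category coding gives 2 + 2 main-effect terms and 2 × 2 = 4 interaction terms (4 degrees of freedom); the log-likelihood values in the example are hypothetical, and fitting the two regression models is outside the scope of the sketch.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    Closed form: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def interaction_lrt(loglik_full, loglik_reduced, df=4):
    """Likelihood ratio test of the full (interaction) vs reduced model."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods for one SNP pair.
stat, p_value = interaction_lrt(-95.0, -100.0)
print(stat, p_value)
```

With 40 049 such tests, a Bonferroni-corrected threshold is 0.05/40 049, roughly 1.2 × 10⁻⁶, so a P-value of the size produced by this hypothetical example would not be considered significant.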
Additional haplotype analyses in IGF1R were performed with the software ‘tagSNPs’ (77,84). This program calculates, for each individual, the expected numbers of copies (‘dosages’) of each of the haplotypes compatible with the individual's SNP genotypes. Association between haplotype dosages and breast cancer risk was tested with conditional logistic regression. This and all other regression analyses were conducted with SAS, v 9.1.
Conflict of Interest statement. None declared.
This work was supported by the US National Institutes of Health, National Cancer Institute (cooperative agreements U01-CA98233 to D.J.H., U01-CA98710 to M.J.T., U01-CA98216 to E.R. and R.K., and U01-CA98758 to B.E.H., and Intramural Research Program of NIH/National Cancer Institute, Division of Cancer Epidemiology and Genetics).