Genome-wide association studies (GWAS) of breast cancer defined by hormone receptor status have revealed loci contributing to susceptibility of estrogen receptor (ER)-negative subtypes. To identify additional genetic variants for ER-negative breast cancer, we conducted the largest meta-analysis of ER-negative disease to date, comprising 4754 ER-negative cases and 31 663 controls from three GWAS: NCI Breast and Prostate Cancer Cohort Consortium (BPC3) (2188 ER-negative cases; 25 519 controls of European ancestry), Triple Negative Breast Cancer Consortium (TNBCC) (1562 triple negative cases; 3399 controls of European ancestry) and African American Breast Cancer Consortium (AABC) (1004 ER-negative cases; 2745 controls). We performed in silico replication of 86 SNPs at P ≤ 1 × 10−5 in an additional 11 209 breast cancer cases (946 with ER-negative disease) and 16 057 controls of Japanese, Latino and European ancestry. We identified two novel loci for breast cancer at 20q11 and 6q14. SNP rs2284378 at 20q11 was associated with ER-negative breast cancer (combined two-stage OR = 1.16; P = 1.1 × 10−8) but showed a weaker association with overall breast cancer (OR = 1.08, P = 1.3 × 10−6) based on 17 869 cases and 43 745 controls and no association with ER-positive disease (OR = 1.01, P = 0.67) based on 9965 cases and 22 902 controls. Similarly, rs17530068 at 6q14 was associated with breast cancer (OR = 1.12; P = 1.1 × 10−9), and with both ER-positive (OR = 1.09; P = 1.5 × 10−5) and ER-negative (OR = 1.16, P = 2.5 × 10−7) disease. We also confirmed three known loci associated with ER-negative (19p13) and both ER-negative and ER-positive breast cancer (6q25 and 12p11). Our results highlight the value of large-scale collaborative studies to identify novel breast cancer risk loci.
Breast cancer is a heterogeneous disease and has multiple histological and molecular subtypes, likely with distinct etiologies. Tumors that lack expression of the estrogen receptor (ER) tend to have more aggressive disease, higher histological grade and lower survival rates (1). ER-negative breast cancer is more common in women of African ancestry, accounting for as much as 40% of cases in African American women compared with 15–20% in women of European ancestry. The etiologic heterogeneity between breast cancer subtypes is supported by different associations with ER-positive versus ER-negative disease for many of the known breast cancer risk factors (such as reproductive factors and BMI) (2). Tumors in women with BRCA1 mutations are predominantly ER-negative, while tumors in BRCA2 mutation carriers are predominantly ER-positive (3). Furthermore, GWAS have identified multiple common genetic variants more strongly associated with ER-positive than ER-negative breast cancer (4). Through collaborative efforts, we recently identified risk loci on 5p15 and 19p13 that are associated specifically with ER-negative and triple negative (TN) [ER-negative, progesterone (PR)-negative and HER2-negative] breast cancer (5–7).
In order to identify genetic loci associated with the risk of ER-negative breast cancer, we conducted a meta-analysis of three GWAS of ER-negative breast cancer, comprising 4754 cases and 31 663 controls with further replication in an additional 11 209 cases (946 with ER-negative disease) and 16 057 controls.
The meta-analysis included genome-wide association studies (GWAS) of ER-negative breast cancer (4754 ER-negative cases and 31 663 controls) from the NCI Breast and Prostate Cancer Cohort Consortium (BPC3) (2188 ER-negative cases and 25 519 controls of European ancestry), the Triple Negative Breast Cancer Consortium (TNBCC) (1562 TN cases and 3399 controls of European ancestry) and the African American Breast Cancer Consortium (AABC) (1004 ER-negative cases and 2745 controls) (Fig. 1, Supplementary Material, Table S1). We observed little evidence of inflation in the test statistics (λ ≤ 1.04 for each study; λ = 1.04 for meta-analysis) (Supplementary Material, Fig. S1). A total of 86 SNPs were associated with ER-negative breast cancer at P ≤ 10−5 (Supplementary Material, Table S2). An in silico replication of the 86 SNPs was conducted using GWAS of European (BCAC combined), Latino (MEC-LAT, SFBCS/NC-BCFR) and Japanese (MEC-JPT) ancestry populations, totaling 11 209 breast cancer cases (946 with ER-negative disease) and 16 057 controls (Stage 2) (Supplementary Material, Table S1).
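As an illustration of the inflation factor λ quoted above, λ is conventionally computed as the median of the observed 1-df chi-square association statistics divided by the median of the null chi-square distribution (about 0.455). The following is a minimal Python sketch under that assumption; the function name `genomic_inflation` is hypothetical, the input is assumed to be two-sided P-values, and this is not necessarily how the participating studies computed λ.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a 1-df chi-square distribution: the squared 75th normal percentile (~0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return lambda = median(observed chi-square) / median(null chi-square).

    Assumes two-sided P-values strictly between 0 and 1 (hypothetical input format).
    """
    # A two-sided P-value p corresponds to a normal quantile at p / 2;
    # squaring it gives the 1-df chi-square statistic.
    chi2_stats = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

A value close to 1 indicates that the bulk of the test statistics follow the null distribution, which is how the values of λ ≤ 1.04 reported here are usually read.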
Combining results for ER-negative breast cancer from Stages 1 and 2, variants in three regions showed genome-wide significance [20q11-rs2284378, T allele: odds ratio, OR = 1.16, P = 1.1 × 10−8 (Table 1); 19p13-rs8100241, G allele: OR = 1.14, P = 3.5 × 10−8; 6q25-rs9383938, T allele: OR = 1.28, P = 2.37 × 10−10]. Variants at 6q25 have previously been associated with breast cancer risk (8), and variants at the 19p13 locus have been associated with ER-negative and TN breast cancer risk (5,7). The rs2284378 variant at 20q11 is located in a region containing RALY (RNA binding protein, autoantigenic), EIF2S2 (eukaryotic translation initiation factor 2, subunit 2 beta) and ~100 kb upstream of ASIP (agouti signaling protein), and is in high linkage disequilibrium (LD) (r2 = 0.96 and D′ = 1) with rs4911414, which has been associated with melanoma and basal cell carcinoma (9) (Supplementary Material, Fig. S2). The T allele at rs2284378 was associated with an increased ER-negative breast cancer risk (OR>1) in all racial/ethnic populations, except Japanese (OR = 0.99) (Table 1). However, this group had the smallest sample size. Furthermore, no significant evidence of heterogeneity was observed by race (P = 0.28) or study (P = 0.54) (Table 1, Supplementary Material, Table S3). When the study was extended to include all available breast cancer cases (ER-positive and ER-negative) and controls from the participating GWAS, rs2284378 showed a weaker association with overall breast cancer (OR = 1.08, P = 1.3 × 10−6 based on 17 868 cases and 43 744 controls; Table 1) and no evidence for association with ER-positive disease [OR = 1.01, P = 0.67 based on 9965 cases and 22 902 controls (Supplementary Material, Table S5)]. A case-only analysis of ER-negative versus ER-positive breast cancer indicated a highly significant difference in ORs by ER status (P = 1.3 × 10−4, Supplementary Material, Table S5). 
Furthermore, rs2284378 appeared more strongly associated with TN breast cancer (OR = 1.16; P = 6.4 × 10−3), than ER-negative, PR-negative and HER2-positive breast cancer (OR = 1.07, P = 0.41), although these differences were not statistically significant (case-only P = 0.44) (Supplementary Material, Table S5).
Next, we examined the associations between all candidate loci from Stage 1 (n = 86 SNPs) and overall breast cancer risk using all available breast cancer cases and controls from the studies in Stages 1 and 2 (Fig. 1). We identified genome-wide statistically significant associations with variants at 6q25 (rs9383938, T allele: OR = 1.20; P = 8.7 × 10−14), and a recently reported risk locus near the PTHLH gene at 12p11 (rs1975930, T allele: OR = 1.22; P = 1.4 × 10−13) (10). In addition, we observed genome-wide significant associations with multiple variants in a gene-desert located at 6q14. Allele C of rs17530068 at 6q14 was associated with increased risk for overall breast cancer risk (OR = 1.12; P = 1.1 × 10−9) (Table 2, Supplementary Material, Fig. S3 and Table S4) and both ER-positive (OR = 1.09; P = 1.5 × 10−5) (Supplementary Material, Table S6) and ER-negative (OR = 1.16, P = 2.5 × 10−7) (Table 2) breast cancer. We observed no evidence of risk heterogeneity for rs17530068 by ER status (case-only analysis P = 0.53) (Supplementary Material, Table S6); study (Phet = 0.16); or race/ethnicity (Phet = 0.30) (Table 2). Furthermore, rs17530068 appeared more strongly associated with ER-negative, PR-negative and HER2-positive breast cancer (OR = 1.26, P = 8.0 × 10−3) than TN breast cancer (OR = 1.12, P = 0.07), although these differences were not statistically significant (case-only P = 0.17) (Supplementary Material, Table S6).
We also evaluated associations for 25 known breast cancer risk markers in European-ancestry women from our study (Supplementary Material, Table S7 and Fig. S4). In our samples 8 of the 13 markers previously associated with both ER-negative and ER-positive disease or with ER-negative disease only (TERT and 19p13.1) were nominally significantly associated (P < 0.05) with ER-negative disease. In contrast, none of the 10 markers previously associated with ER-positive disease only was associated with ER-negative disease. A risk score formed by summing the risk alleles at all 25 previously identified loci was significantly associated with ER-negative disease in our study [OR = 1.06 (1.04–1.07); P = 2.9 × 10−14]. Risk scores for subsets of markers associated with ER-negative disease only (2 markers) or both ER-negative and ER-positive disease (11 markers) were also significantly associated with ER-negative disease [OR = 1.22 (1.14–1.31), P = 1.0 × 10−8 and OR = 1.08 (1.05–1.10), P = 9.5 × 10−12, respectively]. A risk score for the subset of loci previously associated with ER-positive disease only (10 markers) was not associated with the risk of ER-negative disease [OR = 1.02 (1.00–1.04), P = 0.08]. These score results provide some confirmation of earlier results and an estimate of the effects of previously identified breast cancer risk markers on the risk of ER-negative disease.
We present results from the largest meta-analysis to date to specifically focus on ER-negative disease. We identify two novel loci for breast cancer: 20q11 associated with ER-negative and TN, but not ER-positive breast cancer, and 6q14 associated with both ER-positive and ER-negative breast cancer. In addition, we confirm three known regions previously associated with ER-negative (19p13) or ER-negative and ER-positive breast cancer (6q25 and 12p11). Correction for genomic control results in similar but attenuated findings for 20q11-rs2284378 (PGC = 2.4 × 10−8) and 6q14-rs17530068 (PGC = 3.2 × 10−9).
The novel association at 20q11 with ER-negative breast cancer spans the ASIP, RALY and EIF2S2 genes. Agouti signaling protein (product of the ASIP gene) was first described to inhibit melanogenesis in human melanocytes in 1997 (11). ASIP is a melanocortin 1 receptor (MC1R) ligand that antagonises the function of the transmembrane receptor (12). The variants we identified at 20q11 for breast cancer are highly correlated with variants previously associated with pigmentation traits as well as the risk of both cutaneous melanoma and basal cell carcinoma (9), suggesting a possible biological link between these cancers. Further studies have confirmed the importance of the genetic variation spanning the ASIP locus, where a variant at 20q11 showed the strongest association with pigmentation and was implicated in a probable LD with variants within an ASIP regulatory region (13). EIF2S2 encodes eukaryotic translation initiation factor 2, subunit 2 beta, which is involved in early steps of protein synthesis by forming a ternary complex with GTP and initiator tRNA. The deletion of Eif2s2 has been associated with suppression of testicular germ cell tumor incidence and recessive lethality in mice (14). The agouti-yellow (AV) deletion is a genetic modifier known to suppress testicular germ cell tumor susceptibility in mice and humans. The AV mutation deletes both RALY and Eif2s2, and induces the ectopic expression of agouti, all of which are potential testicular germ cell tumor-modifying variations (14). Both RALY and EIF2S2 are expressed in many tissues, including mammary gland (15). The SNP rs2284378 was not consistently associated with expression of EIF2S2, RALY or ASIP in lymphocytes (11), adipocytes or skin cells(16), although there was marginal evidence for association between rs2284378 and EIF2S2 expression in one study (16) (Supplementary Material, Table S8). 
However, several SNPs in high LD with SNP rs2284378 (r2>0.8) within a 1 Mb region were significantly associated with expression of nearby genes EIF2S2 and RALY. Rs4911379 (r2 = 0.96) is statistically significantly associated with EIF2S2 expression in fibroblasts (P = 3.6 × 10−4) (17) and SNPs rs761238 and rs761236 (r2 = 0.85) are associated with RALY expression in lymphocytes (P = 8.3 × 10−4) (16). An additional 13 SNPs (r2 > 0.85) have been associated with expression of RALY, GGTL3, DYNLRB1 and AK054906 in liver cells, monocytes and lymphoblastoid cell lines (Supplementary Material, Table S9). In addition to expression, several enhancer as well as promoter regions defined by overlapping chromatin marks in human mammary epithelial cells (HMEC) were found at 20q11 (Supplementary Material, Fig. S5). SNPs in high LD with rs2284378 (r2 > 0.7), such as rs4911395, rs4911396 and rs1007090, are located in the promoter region of RALY. SNPs rs6142101, rs6087557 and rs4911408 (r2 > 0.7) are present in the promoter region of EIF2S2, and rs1054534, rs1555075, rs2268086, rs2268088, rs4911401, rs2284388, rs2284389 and rs932388 are located in predicted enhancer regions in introns of RALY. Thus, variants at 20q11 may influence expression of multiple genes in mammary epithelial cells, as has been seen in prostate cancer (18).
In contrast, rs17530068 at 6q14 is located in a gene desert with no evidence of an open/active regulatory region in HMEC (Supplementary Material, Fig. S6). The closest gene (~262 kb), family with sequence similarity 46, member A (FAM46A/C6orf37), encodes a protein of unknown function. Five SNPs in this region in low LD with SNP rs17530068 (r2 < 0.02) were associated with expression of IBTK in lymphoblastoid cell lines (Supplementary Material, Table S10). Additional studies of both of these novel regions will be necessary to identify the underlying biologically relevant variant/s.
The SNP rs17530068 at chromosome 6q14 was associated with overall breast cancer risk and showed no differential association depending on ER status. The association of SNP rs2284378 at 20q11, however, was stronger for ER-negative than ER-positive breast cancer. This finding underscores the importance of investigating genetic variants for specific subtypes of breast cancer, as this locus had not been previously identified in the many GWAS of breast cancer to date that did not focus on this specific breast cancer subtype. The etiology of ER-negative disease is largely unknown. Identifying new loci associated with ER-negative and TN breast cancer will continue to provide insight into the biological mechanisms underlying this more aggressive form of breast cancer, and could result in improvements in risk prediction and treatment.
Stage 1 included the studies of the NCI BPC3, TNBCC and AABC. The BPC3 study includes 2188 ER-negative cases and 25 519 controls, AABC includes 3153 cases (1004 ER-negative) and 2745 controls from 9 studies and TNBCC includes 1562 cases and 3399 controls from 15 studies (Supplementary Material, Table S1). Replication studies include 886 cases (84 ER-negative) and 830 controls from a GWAS of breast cancer in Japanese (MEC-JPT) women and 546 cases (112 ER-negative) and 558 controls from a GWAS of breast cancer in Latino (MEC-LAT) women in the Multiethnic Cohort (MEC), 992 cases (188 ER-negative) and 640 controls from the San Francisco Bay Area Breast Cancer Study (SFBCS) and the Northern California Breast Cancer Family Registry (NC-BCFR), and 8785 cases (562 ER-negative) and 14 029 controls from 8 combined GWAS of breast cancer from BCAC. All participants in these studies have provided written consent for the research and approval for the study was obtained from the ethical review board from all local institutions. A description of each participating study has been provided in Supplementary Material.
Genotyping in AABC was conducted using the Illumina Human1M-Duo BeadChip. Of the 5984 samples in the AABC Consortium (3153 cases and 2831 controls), we attempted genotyping of 5932, removing samples (n = 52) with DNA concentrations <20 ng/μl. Following genotyping, we removed samples based on the following exclusion criteria: (i) unknown replicates (≥98.9% genetically identical) that we were able to confirm, (n = 15); (ii) unknown replicates pair or triplicate removed, (n = 14); (iii) samples with call rates <95% after a second attempt (n = 100); (iv) samples with ≤5% African ancestry (n = 36) (discussed below); and (v) samples with <15% mean heterozygosity of SNPs in the X chromosome and/or similar mean allele intensities of SNPs on the X and Y chromosomes (n = 6). In the analysis, we removed SNPs with <95% call rates (n = 21 732) or minor allele frequencies (MAFs) <1% (n = 80 193). The concordance rate for blinded duplicates was 99.95%. We also eliminated SNPs with genotyping concordance rates <98% based on the replicates (n = 11 701). The final analysis data set included 1 043 036 SNPs genotyped on 3016 cases (988 ER-negative, 1520 ER-positive and the remaining 508 cases with unknown ER status) and 2745 controls, with an average SNP call rate of 99.7% and average sample call rate of 99.8%.
Genotyping for the TNBCC GWAS was conducted on 1718 cases from 10 studies (ABCTB, BBCC, DFCI, FCCC, GENICA, MARIE, MCBCS, MCCS, POSH, SBCS) using the Illumina 660-Quad SNP array. In addition, a subset of MARIE cases (n = 52) was genotyped using the Illumina CNV370 SNP array. HEBCS cases (n = 85) were genotyped using the Illumina 550 SNP array; population allele and genotype frequencies for healthy population controls (n = 222), genotyped on the Illumina 370 SNP array, were obtained from the NordicDB, a Nordic pool and portal for genome-wide control data (19) from the Finnish Genome Center. GWAS data for public controls (n = 3448) were generated using the following arrays: Illumina 660-Quad SNP array (QIMR), Illumina 550 SNP array (CGEMS), Illumina 550 SNP array (KORA) and Illumina 1.2M (WTCCC). These GWAS data were independently evaluated by an iterative QC process with the following exclusion criteria: MAF <0.01, call rate <95%, Hardy–Weinberg equilibrium (HWE) P-value <1 × 10−7 among controls and sample call rate <98%. In total, we excluded previously unknown replicates (n = 2), samples with call rates <98% (n = 83), samples that failed sex check (n = 10), cases identified as non-TN breast cancer (n = 20) and related samples (n = 27). We removed SNPs with <95% call rates or MAF <5%. Because a number of our samples were genotyped at different locations, we removed SNPs if there was a difference >0.10 between the study allele frequency and the median frequency across all studies. The Eigensoft software, which uses principal component analysis (PCA), was used to evaluate confounding due to population stratification. We removed 101 subjects who did not cluster with the CEU HapMap Phase 2 samples, and a further 179 controls that overlapped with CGEMS/NHS controls in BPC3 were also removed, resulting in 1562 cases and 3399 controls in the GWAS analyses.
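The SNP-level exclusion criteria listed above (MAF <0.01, call rate <95% and HWE P-value <1 × 10−7) can be sketched for a single SNP as follows. This is a minimal illustration, not the consortium's QC pipeline: the function name `snp_qc` and its count-based input are hypothetical, and the HWE test shown is the 1-df Pearson chi-square test, which is an assumption because the text does not state which HWE test was used (exact tests are also common).

```python
import math


def snp_qc(n_aa, n_ab, n_bb, n_missing,
           min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-7):
    """Return (call_rate, maf, hwe_p, passes) for one SNP from genotype counts.

    Default thresholds mirror the criteria quoted in the text; in practice
    the HWE test would be applied to control genotypes only.
    """
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_called == 0:
        return 0.0, 0.0, 1.0, False
    call_rate = n_called / n_total
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    if maf == 0.0:
        hwe_p = 1.0  # monomorphic SNP: HWE cannot be tested
    else:
        expected = (n_called * freq_a ** 2,
                    2 * n_called * freq_a * (1.0 - freq_a),
                    n_called * (1.0 - freq_a) ** 2)
        chi2 = sum((obs - exp) ** 2 / exp
                   for obs, exp in zip((n_aa, n_ab, n_bb), expected))
        # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
        hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
    passes = (call_rate >= min_call_rate
              and maf >= min_maf
              and hwe_p >= min_hwe_p)
    return call_rate, maf, hwe_p, passes
```

Passing different thresholds (for example `min_maf=0.05` and `min_hwe_p=1e-5`) would mimic the stricter filters described for other studies in this section.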
BPC3 GWAS genotyping was conducted at three genotyping centers (NCI Core Genotyping Facility, USA; University of Southern California, USA; and Imperial College London, UK). Subjects from CPSII, EPIC, MEC, PLCO and PBCS were genotyped using the Illumina Human 660k-Quad SNP array (Illumina, Inc.); NHSI/NHSII and part of the PLCO study were genotyped previously using the Illumina Human 550 SNP array (Illumina, Inc.) (20). SNPs were removed based on deviations from Hardy–Weinberg proportions in control subjects (P < 10−5), MAF <5% (autosomal SNPs) and completion rate <95%. Samples were excluded based on genotyping call rates <95% (n = 195), extreme heterozygosity (n = 35), sex discordance (n = 3) and unexpected duplicates and relatedness (n = 6). Subjects with evidence of significant non-European ancestry and population structure were also excluded. Non-European ancestry was assessed utilizing a subset of unlinked, population informative SNPs (21). Individuals determined to have <80% European ancestry were excluded from further analyses (n = 16). The average concordance rate of blinded duplicates was 99.95%. In order to resolve a more detailed population substructure, PCA was conducted using the struct.pca module of GLU (http://code.google.com/p/glu-genetics/). PCA was only performed in subjects with over 80% European ancestry. Furthermore, 958 controls from NHS (CGEMS) were removed from BPC3 analyses due to overlap between TNBCC and BPC3 studies. After all exclusions, 1998 cases and 2305 controls contributed to the Stage 1 analysis.
The WHS cohort subjects in BPC3 were previously genotyped using the Human-Hap300 Duo-plus BeadChip (22). Among the final 23 294 individuals of verified European ancestry, genotypes for a total of 2 608 509 SNPs were imputed from the experimental genotypes and LD relationships implicit in the HapMap r. 22 CEU samples. WHS contributed 190 cases and 23 214 control subjects to Stage 1. WHS was meta-analyzed with the remaining BPC3 studies contributing a total of 2188 cases and 25 519 control subjects to Stage 1 analysis.
SNPs rs2284378 and rs17530068 were genotyped in all Stage 1 studies.
The SFBCS (23) and the NC-BCFR (24) study samples were genotyped with the Affymetrix 6.0 array according to the manufacturer's instructions (https://www.affymetrix.com) in the laboratory of Esteban Gonzalez Burchard at UCSF. A total of 15 cases and 30 controls were excluded from the SFBCS and NC-BCFR sample set that had a genotyping call rate <95% or showed either known or cryptic relatedness. The final sample included in the analysis was 992 cases (188 ER-negative cases) and 640 controls. Imputation was conducted with the program BEAGLE, with all unrelated HapMap Phase II samples included as references (http://hapmap.ncbi.nlm.nih.gov).
Samples for the GWAS of breast cancer in Latino (MEC-LAT) and Japanese (MEC-JPT) women from the MEC were genotyped with the Illumina 660W array at USC. For MEC-LAT, we excluded 48 samples from the MEC that had a genotyping call rate of <95% and 34 that showed either known or cryptic relatedness. The final MEC-LAT sample included 546 cases (112 ER-negative) and 558 controls. With similar exclusions, the final MEC-JPT sample included 886 cases (84 ER-negative) and 830 controls.
The BCAC combined GWAS includes primary genotype data from eight breast cancer GWAS in populations of European ancestry (ABCFS, BBCS, GC-HBOC, MARIE, HEBCS, SASBAC, UK2, DFBBCS). All studies were genotyped with various versions of Illumina arrays, except GC-HBOC which was performed with the Affymetrix 5.0 (cases) and 6.0 (controls) arrays. Standard QC was performed on all scans. Specifically, all individuals with low call rate (<95%), extreme high or low heterozygosity (P<10−5), and all individuals evaluated to be of non-European ancestry (>15% non-European component, by multidimensional scaling using the three Hapmap2 populations as a reference) were excluded. SNPs with call rate <95%; call rate <99% and MAF<5%, all SNPs with MAF<1%, and SNPs with genotype frequencies departing from HWE at P<10−6 in controls or P < 10−12 in cases were also excluded. Data were imputed for ~2.6 m SNPs for all scans using Mach v1.0 with HapMap version 2 CEU as a reference. BBCS and UK2 used the same control data (WTCCC2). These studies were imputed separately. For the combined analysis, the control set was divided randomly between the two studies, in proportion to the size of case series, to provide disjoint strata. Estimated per-allele ORs and standard errors were generated from the imputed genotypes using Probabel (25).
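The division of the shared WTCCC2 control set between BBCS and UK2 can be sketched as below. This is only an illustration of a random split in proportion to case-series size; the function name `split_controls`, the fixed seed and the rounding rule are assumptions, not details given in the text.

```python
import random


def split_controls(control_ids, n_cases_a, n_cases_b, seed=0):
    """Randomly partition shared controls into two disjoint strata.

    Stratum sizes are proportional to the two case series; the seed only
    makes this illustrative split reproducible.
    """
    shuffled = list(control_ids)
    random.Random(seed).shuffle(shuffled)
    n_a = round(len(shuffled) * n_cases_a / (n_cases_a + n_cases_b))
    return shuffled[:n_a], shuffled[n_a:]
```

Because each control appears in exactly one stratum, the two study-specific estimates remain independent when they are later combined.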
SNPs rs2284378 and rs17530068 were genotyped in all Stage 2 studies except SFBCS and NC-BCFR where they were imputed. Both SNPs were genotyped by TaqMan in 483 samples from these studies and genotype concordance versus imputed genotypes was 93.3% for rs2284378 and 94.9% for rs17530068.
In BPC3, genotyping of SNP rs2284378 and rs17530068 was performed for all available breast cancer cases and controls by TaqMan in four laboratories (CPS-II and MEC at the University of Southern California; NHS and WHS at Harvard University; EPIC at the German Cancer Research Center in Heidelberg; and PLCO at the NCI/Core Genotyping Facility). All studies typed SNP rs17530068; however for SNP rs2284378, PLCO and CPS-II typed a proxy SNP rs6059651 (r2 = 1, D′ = 1). The concordance for the TaqMan genotyping data with that generated from Illumina for Stage 1 ER-negative cases and controls was 0.997 for rs17530068 and 0.986 for rs2284378 for CPS2, MEC, NHS, EPIC and PLCO. The genotype concordance versus imputed for WHS was 95% for rs2284378 and 97% for rs17530068.
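Genotype concordance figures such as those above are typically the proportion of samples whose calls agree between two platforms. A minimal sketch follows, assuming genotypes coded as minor-allele counts and imputed genotypes already converted to best-guess calls; the function name and the coding are hypothetical.

```python
def genotype_concordance(calls_a, calls_b):
    """Fraction of samples with identical genotype calls on two platforms.

    Calls are minor-allele counts (0, 1 or 2); None marks a missing call,
    and samples missing on either platform are not compared.
    """
    compared = [(a, b) for a, b in zip(calls_a, calls_b)
                if a is not None and b is not None]
    if not compared:
        raise ValueError("no samples with calls on both platforms")
    return sum(a == b for a, b in compared) / len(compared)
```

For example, `genotype_concordance([0, 1, 2, None, 1], [0, 1, 1, 2, 1])` compares four samples, three of which agree.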
In AABC, we tested for gene dosage effects in models adjusted for age, study and eigenvectors 1–10. OR and 95% confidence intervals (95% CI) were estimated using unconditional logistic regression. In TNBCC, unconditional logistic regression was used to assess each SNP association, also assuming a log-additive model and adjusting for country and the first two principal components. In BPC3, an unconditional logistic regression model was used to assess single SNP associations, adjusting for age categories and the top six eigenvectors.
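To make the log-additive (gene dosage) model concrete, the sketch below fits logit P(case) = a + b × g by Newton–Raphson, where g is the number of copies of the tested allele, and returns the per-allele OR, exp(b). It is deliberately unadjusted: the covariates used in the actual analyses (age, study or country, eigenvectors) are omitted, and the function name and count-based input are assumptions made for illustration.

```python
import math


def per_allele_or(case_counts, control_counts, n_iter=25):
    """Unadjusted log-additive logistic regression from genotype counts.

    case_counts / control_counts hold the numbers of subjects carrying
    0, 1 and 2 copies of the tested allele. Returns (OR, SE of log OR).
    """
    a, b = 0.0, 0.0  # intercept and per-allele log odds ratio
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for g, (n_case, n_ctrl) in enumerate(zip(case_counts, control_counts)):
            n = n_case + n_ctrl
            p = 1.0 / (1.0 + math.exp(-(a + b * g)))
            resid = n_case - n * p      # observed minus expected cases
            w = n * p * (1.0 - p)       # binomial variance weight
            g0 += resid
            g1 += resid * g
            h00 += w
            h01 += w * g
            h11 += w * g * g
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 information matrix against the score.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    se_b = math.sqrt(h00 / det)
    return math.exp(b), se_b
```

The log OR and its standard error returned here are the per-study quantities that an inverse-variance weighted meta-analysis takes as input.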
In both AABC and TNBCC, phased haplotype data from the founders of the CEU and YRI HapMap Phase 2 samples (build 21) were used to infer LD patterns in order to impute untyped markers. For BPC3, Hapmap Phase 2 (release 21) and Hapmap Phase 3 were used to impute untyped markers. For all studies, genome-wide imputation was carried out using the software MACH. Filtered from the analysis were SNPs with Rsq < 0.3 and MAF <1%.
We conducted a fixed effect meta-analysis of AABC, TNBCC and BPC3 using the inverse variance weighted method. The numbers of SNPs available for meta-analysis from AABC, TNBCC and BPC3 in Stage 1 were 3 055 415, 2 134 490 and 2 453 207, respectively. The union of these three data sets was meta-analyzed using the program METAL. We conducted in silico replication of 86 SNPs with P-values ≤ 10−5 in Stage 1 in the Stage 2 studies, and a meta-analysis of these SNPs from Stages 1 and 2 for both ER-negative and overall breast cancer. P-values from our top two loci were corrected for genomic inflation (PGC) using the lambda value from the overall meta-analysis. Testing for heterogeneity by study was evaluated using the Q-statistic. Case-only analyses were performed to test for differences in the association by tumor subtypes, study and race/ethnicity.
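The inverse variance weighted fixed-effect estimate, the Q-statistic and the genomic control correction described above follow standard formulas, sketched below in Python. This is an illustration only (the analyses themselves used METAL); the function names are hypothetical, and the genomic control step assumes the usual approach of dividing the 1-df chi-square statistic by λ.

```python
import math


def fixed_effect_meta(log_ors, std_errs):
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns (pooled log OR, pooled SE, two-sided P, Cochran's Q).
    Under homogeneity, Q follows a chi-square with k - 1 df for k studies.
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    w_sum = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / w_sum
    pooled_se = math.sqrt(1.0 / w_sum)
    # Two-sided normal P-value: P = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(pooled / pooled_se) / math.sqrt(2.0))
    q_stat = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    return pooled, pooled_se, p_value, q_stat


def genomic_control_p(z, lam):
    """Two-sided P-value after dividing the chi-square statistic z**2 by lambda."""
    return math.erfc(abs(z) / math.sqrt(2.0 * lam))
```

With λ slightly above 1, `genomic_control_p` returns a P-value somewhat larger than the uncorrected one, consistent with corrected findings being similar but attenuated.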
The association between risk scores of 25 previously identified breast cancer risk alleles and risk of breast cancer in our samples was calculated using meta-regression, assuming the per-allele odds ratio was constant across the markers analyzed. This is equivalent to combining the summary log odds ratio estimates at independent loci using inverse-variance weighted meta-analysis. The overlap between subjects contributing to this study and those contributing to previous studies varied from marker to marker [e.g. the TNBCC contributed to the initial report on rs8170 (5) and the BPC3 and TNBCC contributed to the initial report on the TERT locus (6)]. Thus, the results could be overestimates since some of the studies here contributed to the discovery of these 25 loci.
Expression quantitative trait loci (eQTL) were assessed for all SNPs in the chromosome 6 and 20 loci using the GTEX database (http://www.ncbi.nlm.nih.gov/gtex/GTEX2/gtex.cgi), University of Chicago eQTL Browser (http://eqtl.uchicago.edu) and Genevar (http://www.sanger.ac.uk/resources/software/genevar/) (26).
In an attempt to identify functionality at the two novel breast cancer risk loci, we used the open-source R/Bioconductor package FunciSNP version 0.99 (27), which systematically integrates the 1000 Genomes Project SNP data (April 2012 data release) with chromatin features of interest. For each of the two novel breast cancer markers, we analyzed all SNPs with an r2-value > 0.5 with each index SNP in the 1000 Genomes Project EUR populations in a 1 Mb window around each index variant. We assessed whether these SNPs were co-located with 12 different chromatin features generated by next-generation sequencing technologies, which capture open chromatin regions, promoters and enhancers genome-wide in HMEC as well as known DNaseI hypersensitive locations, FAIRE-seq peaks and CTCF-binding sites from >100 different cell types, which were collected in ENCODE data (28). We utilized the UCSC Genome Browser (http://genome.ucsc.edu/) to illustrate the correlated SNPs, which overlap chromatin features as well as chromatin feature tracks (Supplementary Material, Figs S5 and S6).
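The r2 and D′ measures used to select correlated SNPs can be computed from phased haplotype counts for a pair of biallelic SNPs, as in the sketch below. It uses the standard definitions of these LD statistics and is not the FunciSNP implementation; the function name and argument layout are hypothetical.

```python
def ld_stats(n_11, n_12, n_21, n_22):
    """Return (r2, D') from counts of the four two-SNP haplotypes.

    n_11: allele 1 at both SNPs; n_12: allele 1 at SNP A, allele 2 at SNP B;
    n_21: allele 2 at SNP A, allele 1 at SNP B; n_22: allele 2 at both.
    """
    n = n_11 + n_12 + n_21 + n_22
    p_a = (n_11 + n_12) / n  # frequency of allele 1 at SNP A
    p_b = (n_11 + n_21) / n  # frequency of allele 1 at SNP B
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("LD is undefined when either SNP is monomorphic")
    d = n_11 / n - p_a * p_b  # raw disequilibrium coefficient D
    r2 = d * d / denom
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    return r2, abs(d) / d_max
```

D′ reaches 1 whenever at least one of the four haplotypes is absent, whereas r2 reaches 1 only when the two SNPs also have identical allele frequencies, which is why a pair can show D′ = 1 with r2 below 1. A threshold such as r2 > 0.5 would then be applied to the first returned value.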
AABC was supported by a Department of Defense Breast Cancer Research Program Era of Hope Scholar Award to C.A.H. (W81XWH-08-1-0383) and the Norris Foundation. Each of the participating AABC studies was supported by the following grants: MEC (National Institutes of Health grants R01-CA63464 and R37-CA54281); CARE (National Institute for Child Health and Development grant NO1-HD-3-3175); WCHS (U.S. Army Medical Research and Materiel Command (USAMRMC) grant DAMD-17-01-0-0334, the National Institutes of Health grant R01-CA100598 and the Breast Cancer Research Foundation); SFBCS (National Institutes of Health grant R01-CA77305 and United States Army Medical Research Program grant DAMD17-96-6071); NC-BCFR (National Institutes of Health grant U01-CA69417); CBCS (National Institutes of Health Specialized Program of Research Excellence in Breast Cancer, grant number P50-CA58223, and Center for Environmental Health and Susceptibility, National Institute of Environmental Health Sciences, National Institutes of Health, grant number P30-ES10126); PLCO (Intramural Research Program, National Cancer Institute, National Institutes of Health); NBHS (National Institutes of Health grant R01-CA100374); and WFBC (National Institutes of Health grant R01-CA73629). The NC-BCFR is one of six sites participating in The Breast Cancer Family Registry (BCFR) which was supported by the National Cancer Institute, National Institutes of Health under RFA CA-06-503 and through cooperative agreements with members of the Breast Cancer Family Registry and Principal Investigators. The content of this manuscript does not necessarily reflect the views or policies of the National Cancer Institute or any of the collaborating centers in the BCFR, nor does mention of trade names, commercial products or organizations imply endorsement by the US Government or the BCFR.
The TNBCC studies were supported by the following grants: MCBCS (National Institutes of Health Grants CA122340 and a Specialized Program of Research Excellence (SPORE) in Breast Cancer (CA116201) and the Breast Cancer Research Foundation (BCRF)); MARIE (Deutsche Krebshilfe e.V., grant number 70-2892-BR I, the Hamburg Cancer Society, the German Cancer Research Center (DKFZ) and the Federal Ministry of Education and Research (BMBF) Germany grant 01KH0402); GENICA (Federal Ministry of Education and Research (BMBF) Germany grants 01KW9975/5, 01KW9976/8, 01KW9977/0, 01KW0114, and the Robert Bosch Foundation Stuttgart, Germany); MCCS (Australian NHMRC grants 209057, 251553 and 504711 and infrastructure provided by the Cancer Council Victoria); SBCS [Breast Cancer Campaign (grant 2004Nov49 to A.C.), and by Yorkshire Cancer Research core funding]; DFCI (DFCI Breast Cancer SPORE NIH P50 CA089393); POSH (Cancer Research UK); DEMOKRITOS [Hellenic Cooperative Oncology Group research grant (HR R_BG/04) and the Greek General Secretary for Research and Technology (GSRT) Program, Research Excellence II, funded at 75% by the European Union]; BBCC (Dr Mildred Scheel Stiftung of the Deutsche Krebshilfe e.V.); BBCS (Cancer Research UK and Breakthrough Breast Cancer and NHS funding to the NIHR Biomedical Research Centre and the National Cancer Research Network (NCRN)); LMBC (European Union Framework Programme 6 Project LSHC-CT-2003-503297 (the Cancerdegradome) and by the ‘Stichting tegen Kanker’ (232-2008)); OBCS (Finnish Cancer Foundation, the Sigrid Juselius Foundation, the Academy of Finland, the University of Finland, and Oulu University Hospital); HEBCS (Helsinki University Central Hospital Research Fund, Academy of Finland (132473), the Finnish Cancer Society, The Nordic Cancer Union and the Sigrid Juselius Foundation); FCCC (U01CA69631, 5U01CA113916, the University of Kansas Cancer Center and the Kansas Bioscience Authority Eminent Scholar Program); RPCI (RPCI DataBank and BioRepository (DBBR), a Cancer Center Support Grant Shared Resource (P30 CA016056-32)); SKKDKFZS (Deutsches Krebsforschungszentrum); BIGGS (National Institute for Health Research (NIHR) Comprehensive Biomedical Research Centre, Guy's & St. Thomas’ NHS Foundation Trust in partnership with King's College London and King's College Hospital NHS Foundation Trust); ABCTB (National Health and Medical Research Council of Australia, The Cancer Institute NSW and the National Breast Cancer Foundation); ABCS (Dutch Cancer Society grant number 2009-4363); KARBAC (The Stockholm Cancer Society).
BPC3 is supported by the US National Institutes of Health, National Cancer Institute under cooperative agreements U01-CA98233 (NHS, NHSII, WHS), U01-CA98710 (CPS2), U01-CA98216 (EPIC), U01-CA98758 (MEC) and Intramural Research Program of NIH/National Cancer Institute, Division of Cancer Epidemiology and Genetics (PLCO). The authors thank Drs Christine Berg and Philip Prorok, Division of Cancer Prevention, NCI, the screening center investigators and staff of the PLCO Cancer Screening Trial, Mr Thomas Riley and staff at Information Management Services, Inc., and Ms Barbara O'Brien and staff at Westat, Inc. for their contributions to the PLCO Cancer Screening Trial.
The WHS is supported by HL043851 and HL080467 from the National Heart, Lung, and Blood Institute and CA 047988 from the National Cancer Institute, the Donald W. Reynolds Foundation and the Fondation Leducq, with collaborative scientific support and funding for genotyping provided by Amgen.
The UK2 GWAS was funded by Wellcome Trust and Cancer Research UK. The WTCCC was funded by the Wellcome Trust. BCAC is funded by CR-UK (C1287/A10118, C1287/A12014) and by the European Community's Seventh Framework Programme under grant agreement no. 223175 (HEALTH-F2-2009-223175) (COGS). Meetings of the BCAC have been funded by the European Union COST programme (BM0606). The ABCFS study was supported by the National Health and Medical Research Council of Australia, the New South Wales Cancer Council, the Victorian Health Promotion Foundation (Australia) and the National Cancer Institute, National Institutes of Health under RFA-CA-06-503 and through cooperative agreements with members of the Breast Cancer Family Registry (BCFR) and the Principal Investigators. The University of Melbourne (U01 CA69638) contributed data to this study. The content of this manuscript does not necessarily reflect the views or the policies of the National Cancer Institute or any of the collaborating centers in the BCFR, nor does mention of trade names, commercial products or organizations imply endorsement by the US Government or the BCFR. We extend our thanks to the many women and their families who generously participated in the Australian Breast Cancer Family Study and consented to us accessing their pathology material. J.L.H. is a National Health and Medical Research Council Australia Fellow. M.C.S. is a National Health and Medical Research Council Senior Research Fellow. J.L.H. and M.C.S. are both group leaders of the Victoria Breast Cancer Research Consortium. The BBCS is funded by Cancer Research UK and Breakthrough Breast Cancer and acknowledges NHS funding to the NIHR Biomedical Research Centre, and the National Cancer Research Network (NCRN). The BBCS GWAS received funding from The Institut National de Cancer. The DFBBCS GWAS was funded by The Netherlands Organisation for Scientific Research (NWO) as part of a ZonMw/VIDI grant number 91756341. We thank Muriel Adank for selecting the samples and Margreet Ausems, Christi van Asperen, Senno Verhoef and Rogier van Oldenburg for providing samples from their Clinical Genetic centers. The GC-HBOC was supported by Deutsche Krebshilfe (107054), the Dietmar-Hopp Foundation, the Helmholtz society and the German Cancer Research Centre (DKFZ). The GC-HBOC GWAS was supported by the German Cancer Aid (grant no. 107352). The MARIE study was supported by the Deutsche Krebshilfe e.V. (70-2892-BR I), the Hamburg Cancer Society, the German Cancer Research Center and the genotype work in part by the Federal Ministry of Education and Research (BMBF) Germany (01KH0402). MARIE would like to thank Tracy Slanger and Elke Mutschelknauss for their valuable contributions, and S. Behrens, R. Birr, M. Celik, U. Eilber, B. Kaspereit, N. Knese and K. Smit for their excellent technical assistance. The SASBAC study was supported by funding from the Agency for Science, Technology and Research of Singapore (A*STAR), the US National Institutes of Health (NIH) and the Susan G. Komen Breast Cancer Foundation. CGEMS. The Nurses' Health Studies are supported by NIH grants CA 65725, CA87969, CA49449, CA67262, CA50385 and 5U01CA098233. The HEBCS study has been financially supported by the Helsinki University Central Hospital Research Fund, Academy of Finland (132473), the Finnish Cancer Society, The Nordic Cancer Union and the Sigrid Juselius Foundation. The population allele and genotype frequencies were obtained from the data source funded by the Nordic Center of Excellence in Disease Genetics based on samples regionally selected from Finland, Sweden and Denmark. We thank Drs Kirsimari Aaltonen, Päivi Heikkilä and Tuomas Heikkinen and RN Hanna Jäntti and Irja Erkkilä for their help with the HEBCS data and samples.
The biofeature analysis was supported by NIH grant CA109147.
The breast cancer GWAS in Japanese and Latinos in the MEC (MEC-LAT and MEC-JPT) were supported by NIH grants CA132839, CA54281 and CA63464. Genotyping of the Latino breast cancer cases and controls from SFBCS and NC-BCFR was supported by NIH grant CA120120.
We thank the women who volunteered to participate in each study. We also thank Madhavi Eranti, Andrea Holbrook, Paul Poznaik and David Wong from the University of Southern California for their technical support. We would also like to acknowledge co-investigators from the WCHS study: Dana H. Bovbjerg (University of Pittsburgh), Lina Jandorf (Mount Sinai School of Medicine) and Gregory Ciupak, Warren Davis, Gary Zirpoli, Song Yao and Michelle Roberts from Roswell Park Cancer Institute.
Conflict of Interest statement. None declared.