Genome-wide association studies (GWAS) have revealed 19 common genetic variants that are associated with breast cancer risk. Testing the index signals found through GWAS and fine-mapping each locus in diverse populations will be necessary to characterize how these risk regions contribute to inherited susceptibility. In this large study of breast cancer in African-American women (3016 cases and 2745 controls), we tested the 19 known risk variants identified by GWAS and replicated associations (P < 0.05) with only 4 of them. Through fine-mapping, we identified markers in four regions (2q35, 5q11, 10q26 and 19p13) that better capture, in African Americans, the association signal defined by the index variant. We also identified statistically significant associations with markers in four separate regions (8q24, 10q22, 11q13 and 16q12) that are independent of the index signals and may represent putative novel risk variants. In aggregate, the more informative markers found in this study strengthen the association of these risk regions with breast cancer in African Americans [per-allele odds ratio (OR) = 1.18, P = 2.8 × 10⁻²⁴, versus OR = 1.04, P = 6.1 × 10⁻⁵ for the index variants]. In this detailed analysis of the known breast cancer risk loci, we have validated and improved upon markers of risk that better characterize their association with breast cancer in women of African ancestry.
Genome-wide association studies (GWAS) of breast cancer have identified at least 19 chromosomal regions that harbor common alleles contributing to genetic susceptibility (1–10). These discoveries have improved our understanding of genetic risk for this common cancer, although it has been argued that many more markers will be needed both to explain disease heritability and to predict disease in the clinical setting (11–13). Except for the risk locus at 6q25, which was identified in a GWAS of Chinese women, the breast cancer risk loci have been revealed in studies of women of European ancestry. We have recently shown in a multiethnic study that a summary score composed of the index variants at many of these risk loci is statistically significantly associated with breast cancer risk in multiple populations [odds ratio (OR) per allele of >1.10], but not in African Americans (14). Similar studies in African-American women have also reported that many of the index signals do not replicate (15–17). The limited statistical power of these initial reports, as well as population differences in allele frequencies and in patterns of linkage disequilibrium (LD), may help explain why associations found in the GWAS populations may not generalize to African Americans. Association testing of the risk variants, together with fine-mapping, in a sufficiently large sample of African Americans will be needed to identify and localize the subset of markers that best captures the risk conferred by the functional allele(s) within the known risk regions.
In the present study, we tested common genetic variation at the breast cancer risk loci identified in women of European and Asian descent in a large sample of 3016 African-American breast cancer cases and 2745 controls, in order to identify markers of risk relevant to this population. Specifically, we tested the index variants and fine-mapped each locus, both to improve the current set of risk markers for African Americans and to identify new risk variants for breast cancer. We then used this information to model breast cancer risk in African-American women, in an attempt to characterize the spectrum of genetic risk conferred in this population by common variants at the known risk loci.
The ages of cases and controls ranged from 22 to 87 years and 23 to 86 years, respectively, with cases and controls having similar mean ages (55 and 58 years, respectively; Supplementary Material, Table S1).
We tested 19 validated breast cancer risk variants (referred to as ‘index variants’ throughout the paper) at 1p11, 2q35, 3p24, 5p12, 5q11, 6q25, 8q24, 9p21, 9q31, 10p15, 10q21, 10q22, 10q26, 11p15, 11q13, 14q24, 16q12, 17q23 and 19p13 in models adjusted for age, study, global ancestry (the first 10 eigenvectors) and local ancestry (Table 1; Supplementary Material, Table S2) (1–10). Seventeen SNPs were genotyped directly and 2 were imputed (r² > 0.98; see Materials and Methods). All 19 variants were common (minor allele frequency ≥0.05) in African Americans, and 11 were more common in Europeans than in African Americans (Table 1, Fig. 1). In previous GWAS, the index signals had modest ORs (1.05–1.29 per copy of the risk allele), and our sample size provided ≥70% statistical power to detect the reported effects for 12 of the 19 variants (at P < 0.05; Supplementary Material, Table S2).
We observed positive associations with 11 of the 19 variants (OR > 1); however, only 4 were statistically significant (P < 0.05 at 2q35, 9q31, 10q26 and 19p13; Table 1). Of the 15 variants that were not replicated at P < 0.05, statistical power was <70% for only 7 of the variants. Although power was more limited, we also evaluated associations by estrogen receptor (ER) status as some risk variants have been found to be more strongly associated with ER-positive (ER+) or ER-negative (ER−) breast cancer (2,18). We observed positive associations with 12 variants (2 at P < 0.05) for ER+ disease (n = 1520) and with 9 variants for ER− (3 at P < 0.05; n = 988) (Supplementary Material, Table S3). For only one variant did we observe statistically significant risk heterogeneity by ER status (rs13387042 at 2q35, P = 0.013) (Supplementary Material, Table S3).
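A heterogeneity test by ER status can be illustrated with a case-only comparison of risk-allele counts in ER+ and ER− cases. This section does not specify the test used, so the sketch below is a simplified stand-in under stated assumptions: Hardy–Weinberg equilibrium, no covariate adjustment and a Wald test on the allelic odds ratio. The function name `case_only_heterogeneity` and the toy genotype data are hypothetical.

```python
import math


def case_only_heterogeneity(er_pos_dosages, er_neg_dosages):
    """Hypothetical case-only test of ER heterogeneity for one SNP.

    Each argument lists risk-allele counts (0, 1 or 2) for ER+ or ER- cases.
    Returns (allelic OR for ER+ versus ER- cases, two-sided Wald P-value).
    Assumes every cell of the 2 x 2 allele table is non-zero.
    """
    a = sum(er_pos_dosages)                # risk alleles among ER+ cases
    b = 2 * len(er_pos_dosages) - a        # other alleles among ER+ cases
    c = sum(er_neg_dosages)                # risk alleles among ER- cases
    d = 2 * len(er_neg_dosages) - c        # other alleles among ER- cases
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail area
    return math.exp(log_or), p_value


# Toy data: the risk allele is more frequent among ER+ cases than ER- cases.
er_pos = [2] * 300 + [1] * 500 + [0] * 200
er_neg = [2] * 150 + [1] * 450 + [0] * 400
or_het, p_het = case_only_heterogeneity(er_pos, er_neg)
print(f"ER+ vs ER- allelic OR = {or_het:.2f}, P = {p_het:.1e}")
```

A regression-based version would instead model ER status among cases as a function of genotype while adjusting for age, study and ancestry.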
Local ancestry was included in all models because it was associated with breast cancer risk in several regions (Supplementary Material, Table S4). We observed nominally significant associations between local ancestry and overall, ER+ or ER− breast cancer risk at 5 loci (5p12, 6q25, 8q24, 10p15 and 10q26). The most statistically significant association was between European ancestry and ER+ breast cancer risk at 6q25 (OR per European chromosome = 1.19, P = 6.2 × 10⁻³). The inverse association between European ancestry and ER+ disease risk at 10q26 (OR per European chromosome = 0.85, P = 0.011) is consistent with previous reports of over-representation of African ancestry at this locus in many of these same cases (19,20).
Statistical power aside, the lack of a statistically significant association with an index variant (OR > 1 and P < 0.05) suggests that the variant revealed in the GWAS populations may not be adequately correlated with the biologically relevant allele in African Americans. To identify better genetic markers of risk in African Americans, we fine-mapped all risk regions using SNPs genotyped on the Illumina 1M array together with SNPs imputed from the Phase 2 HapMap reference populations (see Materials and Methods). If a marker associated with risk in African Americans represents the same signal as that reported in the initial GWAS, it should be correlated to some degree with the index signal in the GWAS population. Using HapMap data for the population in which each risk variant was identified [Utah residents with ancestry from northern and western Europe (CEU) or Han Chinese in Beijing, China (CHB)], we catalogued and tested all SNPs within 250 kb of the index signal that were correlated with it (r² ≥ 0.2). For these SNPs we applied a significance threshold of α_a = 3.2 × 10⁻³, estimated as 0.05 divided by the average number of tags needed to capture (r² ≥ 0.8) the common risk alleles correlated with the index allele in each region in the HapMap Yoruba population [in Ibadan, Nigeria (YRI); Supplementary Material, Table S5].
We also tested for novel independent associations, focusing on SNPs that were uncorrelated with the index signal in the initial GWAS populations. Here, we applied a Bonferroni correction to declare novel associations statistically significant in any region, with α_b estimated as 0.05 divided by the total number of tags needed to capture (r² ≥ 0.8) all common risk alleles in the 19 regions in the YRI population (α_b = 1.0 × 10⁻⁵). This threshold is analogous to the conventional genome-wide threshold of 5 × 10⁻⁸, which accounts for the number of tags needed to capture all common alleles in the genome (Supplementary Material, Table S5).
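The two significance thresholds can be made concrete with a short sketch. It assumes a simple greedy tagging rule (repeatedly choose the SNP that captures the most not-yet-captured SNPs at r² ≥ 0.8); the text does not state which tagging procedure was used, so `greedy_tags`, `bonferroni_alpha`, `threshold_for_snp` and the toy r² matrix are all hypothetical. Working backwards from the reported thresholds, α_a = 3.2 × 10⁻³ corresponds to an average of about 15.6 tags per region and α_b = 1.0 × 10⁻⁵ to about 5000 tags across the 19 regions.

```python
def greedy_tags(r2, threshold=0.8):
    """Greedy tag selection (a hypothetical stand-in for a tagging tool).

    r2 is a symmetric matrix of pairwise LD r^2 values with 1.0 on the diagonal.
    Repeatedly picks the SNP that captures (r^2 >= threshold) the largest number
    of not-yet-captured SNPs, until every SNP is captured.
    """
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        best = max(range(len(r2)),
                   key=lambda i: sum(r2[i][j] >= threshold for j in untagged))
        tags.append(best)
        untagged -= {j for j in untagged if r2[best][j] >= threshold}
    return tags


def bonferroni_alpha(n_tags, family_alpha=0.05):
    """Per-test threshold: the family-wise alpha divided by the number of tags."""
    return family_alpha / n_tags


def threshold_for_snp(distance_bp, r2_with_index, alpha_a, alpha_b):
    """Use alpha_a for SNPs within 250 kb of the index SNP with r^2 >= 0.2 in the
    GWAS population; otherwise treat the SNP as a candidate novel signal (alpha_b)."""
    if abs(distance_bp) <= 250_000 and r2_with_index >= 0.2:
        return alpha_a
    return alpha_b


# Toy YRI-like r^2 matrix for one region: SNPs 0-2 form one LD bin, SNPs 3-4 another.
r2_region = [
    [1.00, 0.90, 0.85, 0.10, 0.00],
    [0.90, 1.00, 0.95, 0.10, 0.00],
    [0.85, 0.95, 1.00, 0.05, 0.00],
    [0.10, 0.10, 0.05, 1.00, 0.82],
    [0.00, 0.00, 0.00, 0.82, 1.00],
]
tags = greedy_tags(r2_region)
print(tags, bonferroni_alpha(len(tags)))   # two tags, so alpha = 0.025

# Implied tag counts behind the thresholds in the text (about 15.6 and 5000).
print(0.05 / 3.2e-3, 0.05 / 1.0e-5)
```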
For each region, stepwise logistic regression was used, retaining SNPs in the final model if they met α_a or α_b, as appropriate (results for each model are provided in Supplementary Material, Tables S6 and S7). These procedures were applied to all cases and controls, as well as in hypothesis-generating analyses stratified by ER status.
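The stepwise procedure can be sketched as forward selection by likelihood-ratio test. This is an assumption: the text says only that stepwise logistic regression was used with entry thresholds α_a or α_b, so the selection rule, the pure-Python Newton–Raphson fit and all names below are hypothetical, and the single covariate stands in for the age, study, global-ancestry and local-ancestry adjustments described earlier. The simulated data contain one true risk SNP.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(rows, y, max_iter=25):
    """Newton-Raphson logistic regression; each row starts with 1.0 (intercept).
    Returns (coefficients, maximized log-likelihood)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, grad)   # Newton step: (X'WX)^-1 X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    loglik = 0.0
    for x, yi in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
        loglik += math.log(p) if yi else math.log(1.0 - p)
    return beta, loglik


def chi2_1df_sf(stat):
    """P(chi-square with 1 df > stat), computed as erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def forward_stepwise(covariates, snps, y, alpha):
    """Add the SNP with the smallest likelihood-ratio P-value while P < alpha.
    covariates: per-subject covariate lists; snps: name -> per-subject dosages (0/1/2)."""
    def design(names):
        return [[1.0] + covariates[i] + [snps[s][i] for s in names] for i in range(len(y))]

    selected = []
    _, ll_current = fit_logistic(design(selected), y)
    while True:
        best = None
        for name in snps:
            if name in selected:
                continue
            _, ll = fit_logistic(design(selected + [name]), y)
            p = chi2_1df_sf(2.0 * (ll - ll_current))
            if best is None or p < best[1]:
                best = (name, p, ll)
        if best is None or best[1] >= alpha:
            return selected
        selected.append(best[0])
        ll_current = best[2]


# Simulated toy data: only snp0 affects risk (per-allele log OR = 0.8).
random.seed(1)
n = 1500
age = [random.gauss(0.0, 1.0) for _ in range(n)]
snps = {f"snp{j}": [sum(random.random() < 0.3 for _ in range(2)) for _ in range(n)]
        for j in range(4)}
y = [int(random.random() < 1.0 / (1.0 + math.exp(-(-0.5 + 0.8 * snps["snp0"][i] + 0.1 * age[i]))))
     for i in range(n)]
selected_snps = forward_stepwise([[a] for a in age], snps, y, alpha=3.2e-3)
print(selected_snps)
```

In practice a statistics package would handle the model fits; the hand-written solver only keeps the sketch free of dependencies.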
At nine loci, we detected variants that were statistically significantly associated with breast cancer risk in African Americans. These include 9q31, where the sole marker of risk was the index signal (rs865686: OR = 1.08, P = 0.034; Table 1). In five of these nine regions, the index marker itself was not statistically significantly associated with disease risk. Through fine-mapping, we revealed markers in four regions (2q35, 5q11, 10q26 and 19p13) that were more significantly associated with risk than the index signal (P-values more than one order of magnitude smaller) and are likely to capture the same signal. We also identified markers in four regions (8q24, 10q22, 11q13 and 16q12) that are not correlated with the index signal in the GWAS populations and may represent putative novel risk variants, one of which (8q24) appears specific to ER+ disease (Table 1, Fig. 2 and Supplementary Material, Table S8). These regions are discussed below.
The index signal at 2q35 was statistically significantly associated with risk of overall breast cancer (rs13387042: OR = 1.12, P = 7.5 × 10⁻³; Table 1) and ER+ disease (OR = 1.22, P = 2.6 × 10⁻⁴; Supplementary Material, Table S3). However, we found stronger associations with two markers that are each modestly correlated with the index signal in CEU and YRI: rs13000023 with overall breast cancer (OR = 1.20, P = 5.8 × 10⁻⁴) and rs12998806 with ER+ disease (OR = 1.39, P = 3.3 × 10⁻⁶) (Table 1 and Supplementary Material, Table S8). As shown in Supplementary Material, Figure S1, the signal in this region appeared limited to ER+ breast cancer, which is consistent with the initial report of this risk locus (2) but not with subsequent large-scale replication efforts in European populations (21).
We found a positive but non-significant association with the index signal at 5q11, which is located 79 kb centromeric of the MAP3K1 gene (rs889312: OR = 1.07, P = 0.084; Table 1). Fine-mapping revealed statistically significant associations with two markers: rs16886165 for overall breast cancer (OR = 1.15, P = 6.5 × 10⁻⁴) and rs832529 for ER− disease (OR = 1.22, P = 1.3 × 10⁻³; Table 1 and Supplementary Material, Table S8). These SNPs are more strongly correlated with the index signal in Europeans (CEU, r² = 0.40 and 0.46) than in Africans (YRI, r² < 0.01 and r² = 0.09), which suggests that they may be better markers of the biologically functional variant in African Americans (Table 1, Fig. 2).
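Because this argument rests on r² differing between CEU and YRI, a minimal sketch of how r² between two SNPs is computed from phased haplotypes may help. The function name and the two sets of haplotype counts are hypothetical; they show the same pair of SNPs in strong LD in one population and almost none in another.

```python
def ld_r2(haplotypes):
    """Squared correlation r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) pairs, each allele coded 0/1.
    Returns float('nan') when either SNP is monomorphic, since r^2 is then undefined.
    """
    n = len(haplotypes)
    p1 = sum(h[0] for h in haplotypes) / n          # frequency of allele 1 at SNP 1
    p2 = sum(h[1] for h in haplotypes) / n          # frequency of allele 1 at SNP 2
    p12 = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p12 - p1 * p2                               # linkage disequilibrium coefficient D
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    return d * d / denom if denom > 0 else float("nan")


# Hypothetical haplotype counts for the same SNP pair in two populations.
pop_a = [(1, 1)] * 40 + [(0, 0)] * 50 + [(1, 0)] * 5 + [(0, 1)] * 5
pop_b = [(1, 1)] * 10 + [(0, 0)] * 40 + [(1, 0)] * 30 + [(0, 1)] * 20
print(f"{ld_r2(pop_a):.2f} {ld_r2(pop_b):.3f}")  # roughly 0.64 versus below 0.01
```

The monomorphic case matters in practice: a SNP with no variation in a reference panel (as for rs609275 in CEU) has no defined correlation with any other SNP there.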
At 10q26 (FGFR2), both the index signal rs2981582 (OR = 1.11, P = 8.6 × 10⁻³; Table 1) and rs2981578 (OR = 1.24, P = 1.7 × 10⁻⁴; Table 1), which was identified previously through fine-mapping in African Americans in work to which some of the present studies contributed (22), were statistically significantly associated with risk. Variant rs2981578 was the most strongly associated marker in the region for both overall breast cancer and ER+ disease, consistent with previous reports that variation in this region is more strongly associated with ER+ breast cancer (Supplementary Material, Table S8) (18). In fine-mapping the locus, we observed a suggestive association between a correlated marker and ER− disease (rs2912774: OR = 1.19, P = 2.1 × 10⁻³; Supplementary Material, Table S8); however, this marker was also associated with ER+ disease (OR = 1.10, P = 0.041; Supplementary Material, Table S9) and is likely to capture the same signal as rs2981578.
19p13 was the first risk locus reported to harbor a variant that may be specific for ER− disease (9). In African Americans, the index variant was statistically significantly associated with risk of overall breast cancer (rs2363956: OR = 1.14, P = 8.0 × 10⁻⁴), as well as ER+ (OR = 1.12, P = 0.016) and ER− disease (OR = 1.14, P = 0.018; Table 1 and Supplementary Material, Table S3). The most significant association in the region for overall breast cancer and ER+ disease was with rs3745185 (P = 3.7 × 10⁻⁵ and P = 8.2 × 10⁻⁴, respectively), which is likely to capture the same functional variant (r² = 0.57 in CEU and 0.19 in YRI; Table 1 and Supplementary Material, Table S8). The most significant marker for ER− breast cancer was correlated with both rs2363956 and rs3745185 (rs11668840: OR = 1.25, P = 5.1 × 10⁻⁵; Supplementary Material, Tables S8 and S10).
Given the importance of the 8q24 region in multiple cancers, we conducted association testing across the entire cancer risk region (126.0–130.0 Mb) (23–25). The index signal (rs13281615) was not statistically significantly associated with risk in African Americans (Table 1 and Supplementary Material, Table S3), nor did we identify significant associations with correlated SNPs. However, we did detect a significant association between rs16902056 and ER+ breast cancer [risk allele frequency (RAF) 0.95; P = 6.7 × 10⁻⁶; ER−: P = 0.66; Supplementary Material, Table S8]. This SNP is located 78 kb centromeric of the index variant and is not correlated with it (r² < 0.01 in CEU and r² = 0.027 in YRI). No statistically significant associations were observed with variants previously associated with cancers of the bladder and ovary or with leukemia (rs9642880: OR = 1.03, P = 0.58; rs10088218: OR = 1.02, P = 0.62; rs2456449: OR = 1.07, P = 0.14) (26–28). Of the known risk variants for prostate cancer (29–35), we found a single nominally significant (P < 0.05) association, with the same risk allele of rs1016343 as reported for prostate cancer (P = 0.015); this SNP is located >260 kb centromeric of the breast cancer risk region and is not correlated with rs13281615 or rs16902056.
We observed no association with the index signal at 10q22 (rs704010), which is located in intron 1 of the ZMIZ1 gene, or with any correlated markers. However, we did detect strong evidence of a second signal located 215 kb telomeric, in intron 12 of ZMIZ1 (rs12355688: OR = 1.24, P = 6.8 × 10⁻⁶). As shown in Table 1 and Figure 2, this putative novel risk variant is not correlated with the index variant in the CEU or YRI populations (r² < 0.01).
No positive association was noted with the index variant at 11q13. However, we did detect evidence of a second, independent signal (rs609275: OR = 1.20, P = 1.0 × 10⁻⁵), located 74 kb telomeric of the index variant and 53 kb centromeric of CCND1. This variant is monomorphic in the CEU population and is therefore uncorrelated with the index signal there; in the YRI population, its r² with the index signal is <0.01 (Table 1).
At 16q12, as in previous studies of African Americans, we were not able to replicate the association signal defined by the index variant rs3803662 (Table 1) (15,16). A recent study of African Americans reported a suggestive association with SNP rs3104746, which is located 15 kb telomeric of rs3803662 (16). This SNP has a minor allele frequency (MAF) of 0.04 in the HapMap CEU population and 0.19 in our African-American controls, and it is modestly correlated with rs3803662 in Africans (r² = 0.31 in YRI) but not in Europeans (r² = 0.038; Supplementary Material, Table S10). Fine-mapping around this putative signal revealed rs3112572, a perfect proxy (r² = 1) for rs3104746; rs3112572 was significantly associated with breast cancer risk in African Americans (OR = 1.18, P = 3.9 × 10⁻⁴), and the association was stronger for ER+ breast cancer (OR = 1.27, P = 3.1 × 10⁻⁵; Table 1 and Supplementary Material, Table S8).
For index SNPs found to be nominally associated with breast cancer risk, as well as for risk-associated markers identified through fine-mapping, we also estimated genotype-specific associations. Results from the genotype-specific models were consistent with log-additive (per-allele) effects (Supplementary Material, Tables S9 and S11). Risk variants at 2q35 and 8q24 also had significantly stronger associations with ER+ than with ER− breast cancer (Supplementary Material, Table S7), consistent with previous studies (2,18).
We observed no statistically significant associations with common variation at 10 risk loci (1p11, 3p24, 5p12, 6q25, 9p21, 10p15, 10q21, 11p15, 14q24 and 17q23; Supplementary Material, Fig. S2). We also could not replicate the association with SNP rs9397435 at 6q25, which was recently identified through fine-mapping in European, African and Asian population samples (17) (P = 0.26 for overall breast cancer, P = 0.71 for ER+ and P = 0.36 for ER− tumor subtypes). Nor could we replicate, in our African-American sample, the association with SNP rs4784227 at 16q12, which was identified by a recent multi-stage GWAS in women of Asian ancestry (36) (P = 0.51 overall; P = 0.35 and P = 0.65 for ER+ and ER− subtypes, respectively).
We next estimated the cumulative effect of all breast cancer risk variants, comparing a summary risk score composed of unweighted counts of all GWAS-reported risk variants with a risk score that included the variants we identified as being associated with risk in African Americans (Table 2). Using the 19 index signals from GWAS (see Materials and Methods), the risk per allele was 1.04 [95% confidence interval (CI) 1.02–1.06; P = 6.1 × 10⁻⁵], and individuals in the top quintile of the risk allele distribution were at 1.4-fold greater risk (P = 7.4 × 10⁻⁵) of breast cancer compared with those in the lowest quintile (Table 2). As expected, the risk score improved when we used the markers identified at the known risk loci as being more relevant to African Americans (eight markers for overall breast cancer: 2q35, 5q11, 9q31, 10q22, 10q26, 11q13, 16q12 and 19p13; OR = 1.18; 95% CI 1.14–1.22; P = 2.8 × 10⁻²⁴), with risk for those in the top quintile being 2.2 times that observed in the lowest quintile (P = 3.6 × 10⁻¹⁷). This score was significantly associated with risk of both ER+ (OR = 1.20, P = 1.7 × 10⁻¹⁹) and ER− (OR = 1.15, P = 2.8 × 10⁻⁹) disease, with no significant difference between subtypes (P for heterogeneity = 0.12) (Supplementary Material, Table S12).
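The unweighted score and quintile comparison can be sketched as follows. Two details are assumptions not stated in this section: quintile cut points are taken from the control distribution, and the odds ratios are crude (unadjusted), whereas the reported estimates come from adjusted models. Function names and toy data are hypothetical.

```python
import bisect
import statistics


def risk_score(risk_allele_counts):
    """Unweighted score: total number of risk alleles carried across the markers."""
    return sum(risk_allele_counts)


def quintile_odds_ratios(case_scores, control_scores):
    """Crude OR for each score quintile relative to the lowest quintile.

    Assumes cut points come from controls, each quintile contains at least one
    control and the lowest quintile contains at least one case.
    """
    cuts = statistics.quantiles(control_scores, n=5)   # four cut points

    def counts(scores):
        c = [0] * 5
        for s in scores:
            c[bisect.bisect_right(cuts, s)] += 1       # quintile index 0..4
        return c

    cases, controls = counts(case_scores), counts(control_scores)
    return [(cases[q] / controls[q]) / (cases[0] / controls[0]) for q in range(5)]


# One hypothetical woman's risk-allele counts at eight markers.
print(risk_score([1, 2, 0, 1, 1, 0, 2, 1]))   # 8 risk alleles

control_scores = list(range(10))
case_scores = [0, 2, 3, 5, 6, 7, 8, 9, 9, 9]
print(quintile_odds_ratios(case_scores, control_scores))   # [1.0, 2.0, 1.0, 2.0, 4.0]
```

With discrete scores, ties at a cut point fall into the higher quintile under `bisect_right`, so real quintiles may be unequal in size.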
Stratifying by first-degree family history of breast cancer further differentiated risk: women with a family history who were in the top quintile of the risk score distribution (4% of the population) had a 3.4-fold greater risk (P = 9.9 × 10⁻¹⁴) than those without a family history who were in the lowest quintile (Table 2).
In hypothesis-generating analyses, we also developed risk scores for ER+ and ER− breast tumor subtypes, using the most informative markers revealed through fine-mapping of each phenotype. These phenotype-specific scores were highly significant (ER+: OR = 1.30, P = 6.0 × 10⁻¹⁸; ER−: OR = 1.20, P = 2.3 × 10⁻¹⁰), and statistically significant heterogeneity was noted when each score was applied to the other subtype (P for heterogeneity = 1.7 × 10⁻⁵ and 5.0 × 10⁻³ for the ER+ and ER− scores, respectively) (Supplementary Material, Table S12).
In this large study of breast cancer in African-American women, we replicated associations with 4 of the 19 index variants (at P < 0.05). Through fine-mapping, we found that overall breast cancer risk was statistically significantly associated with markers in four regions that are likely to capture the GWAS-reported signal and to serve as better markers of the functional allele, and of risk, in African Americans. We also detected putative novel associations, independent of the index signals, in three regions for overall breast cancer (10q22, 11q13 and 16q12) and in one region for ER+ disease (8q24). In 10 of the risk regions, however, we were not able to replicate the GWAS index signals, nor did we detect associations of common SNPs with breast cancer risk at the significance levels we set for fine-mapping. The failure to replicate associations with the index signals despite adequate statistical power (≥70% power for 12 of 19 variants) suggests that these variants are unlikely to be functional, or that they capture the functional variants less efficiently in this population. Our ability to find associated markers in five regions where the index signals were not significantly associated with risk also demonstrates the value of testing common variation at GWAS-identified risk loci in additional populations (14,16,17,22,37,38).
In four regions, we observed risk markers that are correlated with, and in the same LD block as, the index markers in CEU (rs13000023 at 2q35, rs16886165 at 5q11, rs2981578 at 10q26 and rs3745185 at 19p13). Given their r² with the index markers (≥0.35), these risk markers are likely to capture the same signal as the index markers. However, we cannot rule out the possibility that some of them represent a second, independent signal in the same region.
In the four regions where we observed independent signals, the risk alleles (rs16902056 at 8q24, rs12355688 at 10q22, rs609275 at 11q13 and rs3112572 at 16q12) were uncorrelated with, and not in the same LD block as, the index variant in Europeans (CEU, r² < 0.04), at distances from the index signal ranging from 14 kb at 16q12 to 215 kb at 10q22 (Supplementary Material, Fig. S3). These variants are therefore likely to mark novel signals independent of the index signals. However, because LD patterns differ between European- and African-ancestry populations, the novel and index variants in each region may mark the same functional variant, and if that functional variant is less common, it may not be well captured by either common marker alone. At 10q22, both the index SNP and the novel variant are located within introns of the ZMIZ1 gene, which encodes zinc finger MIZ-type containing 1, a regulator of the activity of various transcription factors (39–41). At 11q13, rs609275 lies 74 kb telomeric of the index signal and closer to a number of candidate genes, including CCND1 (encoding cyclin D1, a protein crucial for cell-cycle control), ORAOV1 (encoding oral cancer overexpressed 1) and FGF19 (encoding fibroblast growth factor 19). The association at 16q12 confirms the findings of a previous, smaller study of African Americans (16) and is consistent with a previous fine-mapping study suggesting that African Americans may harbor a separate causal variant in this region (42). Whether this variant influences the same genes or pathways as the index variant rs3803662 is not known; however, the stronger associations of both variants with ER+ disease (2,18) suggest that they may affect the same biological process.
Notably, at 19p13, which was originally reported in association with ER− breast cancer (9), the index signal was statistically significantly associated with both ER+ and ER− subtypes in African Americans. In addition, we found a stronger marker in this region (rs3745185) for ER+ as well as overall breast cancer risk (Table 1 and Supplementary Material, Table S8). We also found stronger associations with ER+ than with ER− disease for variants in several regions, including 2q35, 8q24, 10q26 and 16q12, consistent with previous reports (2,18). Strong signals for ER− disease were observed at 5q11, 10q26 and 19p13. These signals could explain some of the excess risk of ER− disease in African Americans, since these risk alleles are more frequent in this population than in European-ancestry populations. However, their contribution to racial and ethnic differences in disease incidence can be determined only once the functional variants have been identified and tested across populations. Unfortunately, we were not able to assess associations with triple-negative (ER-, PR- and HER2-negative; PR, progesterone receptor; HER2, human epidermal growth factor receptor 2) breast cancer, because HER2 status was available for only a limited number of cases. However, in a large study of women of European ancestry that tested many of these same index variants, further stratification of ER/PR-negative breast cancer by HER2 status was not additionally informative (43).
The observation of secondary signals at many loci, and of associations with tumor subtypes that have not yet been reported in European-ancestry populations, could indicate a different genetic architecture of breast cancer across populations. For example, the index signal at TNRC9 (16q12) does not replicate in African Americans, but there appears to be a second risk variant that is unique to this population. At FGFR2 (10q26), which was originally reported to be associated with ER+ disease in women of European ancestry, we found a signal for ER− disease with a marker correlated with the index variant. Similarly, at 19p13, which was reported as an ER− locus, we observed an association with ER+ breast cancer. However, these findings and their implications require further validation.
We investigated local ancestry as a potential confounder in the analysis of each risk locus. At five loci, we observed nominally significant evidence of association between local ancestry and breast cancer risk, with the most statistically significant association observed at 6q25 between European ancestry and ER+ breast cancer risk. Although the association of local ancestry with breast cancer risk needs to be validated in additional large studies, our inability to identify a risk variant whose frequency differs between populations of European and African ancestry implies that the associations with local ancestry at these regions are false positives, that we have not tested an adequate surrogate of the functional alleles, or both.
The majority of the variants identified by GWAS for common cancers confer low risks (relative risks <1.30) and, in aggregate, are not yet informative for risk prediction (11–13). Until the functional alleles at each susceptibility locus are identified and their effects accurately estimated, modeling of genetic risk will rely on the markers that best capture risk in a given population. Many of the markers we identified at these risk loci appear to be more strongly associated with breast cancer risk in African-American women than the GWAS-identified variants. The risk score for overall breast cancer also performed similarly for ER+ and ER− tumors. However, our hypothesis-generating models suggest that identifying tumor subtype-specific variants may improve the fit of these models.
Although this is the largest study of African Americans to date to investigate genetic risk at known breast cancer susceptibility loci, statistical power was still limited. For example, we had only 35% power to detect an OR of 1.10 for a risk allele with a frequency of 0.10, which may account for our inability to replicate GWAS signals or identify risk-associated markers in 10 of the regions. Although we attempted to apply strict thresholds for declaring significance in fine-mapping, we did not account for testing multiple phenotypes (overall, ER+ and ER− breast cancer). As a result, the α-levels used as selection criteria may be too liberal. However, our risk modeling focused on the variants revealed for overall breast cancer, and we consider the associations observed for markers identified for ER+ or ER− disease, and used in the subtype-specific risk modeling, to be hypothesis-generating. Because all of the cases and controls used for fine-mapping and discovery were also included in the risk modeling, the risk model is likely to overestimate the strength of association owing to the winner's curse. Rather than partitioning the sample into test and validation sets, we used all subjects for association testing of known variants and for fine-mapping, to maximize statistical power to detect associations in each region. The model presented here must therefore be tested in independent, adequately powered studies of African Americans.
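The 35% figure can be approximated with a standard allelic-test power calculation. The sketch assumes a log-additive model, Hardy–Weinberg equilibrium, a control risk-allele frequency of 0.10, no covariate adjustment and a normal approximation to the log allelic odds ratio; the authors' exact power method is not described here, and the function name is hypothetical. Under these assumptions the result for 3016 cases and 2745 controls comes out close to the quoted 35%.

```python
import math
from statistics import NormalDist


def allelic_test_power(odds_ratio, raf_controls, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided 1-df per-allele test (simplified sketch)."""
    p0 = raf_controls
    # Risk-allele frequency in cases implied by a multiplicative allelic OR.
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    # Variance of the log allelic OR from the 2 x 2 table of allele counts.
    var = 1 / (2 * n_cases * p1 * (1 - p1)) + 1 / (2 * n_controls * p0 * (1 - p0))
    z = math.log(odds_ratio) / math.sqrt(var)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in either direction.
    return NormalDist().cdf(z - z_crit) + NormalDist().cdf(-z - z_crit)


power = allelic_test_power(1.10, 0.10, n_cases=3016, n_controls=2745)
print(f"{power:.2f}")
```

Adjusting for covariates and imputation uncertainty would typically lower this figure slightly, so it should be read as an approximation.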
In summary, through fine-mapping of the breast cancer susceptibility regions in a large sample of African-American women, we identified markers with stronger associations with breast cancer in this population. This model must be validated and augmented before risk modeling based on low-risk genetic variants can be implemented in the clinical setting.
The Institutional Review Board at the University of Southern California approved the study protocol.
This study included nine epidemiological studies of breast cancer among African-American women, comprising a total of 3153 cases and 2831 controls. Sample sizes and selected characteristics of these studies are summarized in Supplementary Material, Table S1, and each study is briefly described below.
The Multiethnic Cohort (MEC) is a prospective cohort study of 215 000 men and women in Hawaii and Los Angeles who were aged 45–75 years at baseline (1993–1996) (44). Through 31 December 2007, a nested breast cancer case–control study in the MEC included 556 African-American cases (544 invasive and 12 in situ) and 1003 African-American controls. An additional 178 African-American breast cancer cases (aged 50–84 years) diagnosed in Los Angeles County (but outside the MEC) between 1 June 2006 and 31 December 2007 were included in the study.
The Women's Contraceptive and Reproductive Experiences (CARE) Study is a large multi-center, population-based case–control study designed to examine the effects of oral contraceptive use on invasive breast cancer risk among African-American and white women aged 35–64 years in five US locations (45). Cases in Los Angeles County were diagnosed from 1 July 1994 through 30 April 1998, and controls were sampled by random-digit dialing (RDD) from the same population and time period; 380 African-American cases and 224 African-American controls were included in the study.
The Women's Circle of Health Study (WCHS) is an ongoing case–control study of breast cancer among European-American and African-American women in the New York City boroughs and in seven counties in New Jersey (46). Eligible cases were women aged 20–74 years with invasive breast cancer; controls were identified through RDD. The WCHS contributed 272 invasive African-American cases and 240 African-American controls.
The San Francisco Bay Area Breast Cancer Study (SFBCS) is a population-based case–control study of invasive breast cancer in Hispanic, African-American and non-Hispanic white women conducted between 1995 and 2003 in the San Francisco Bay Area (47). African-American cases, aged 35–79 years, were diagnosed between 1 April 1995 and 30 April 1999, and controls were identified through RDD. This study contributed 172 invasive African-American cases and 231 African-American controls.
The Northern California Breast Cancer Family Registry (NC-BCFR) is a population-based family study conducted in the Greater San Francisco Bay Area and is one of six sites of the Breast Cancer Family Registry (BCFR) (48). African-American breast cancer cases in the NC-BCFR were diagnosed after 1 January 1995 at ages 18–64 years; population controls were identified through RDD. Genotyping was conducted for 440 invasive African-American cases and 53 African-American controls.
The Carolina Breast Cancer Study (CBCS) is a population-based case–control study conducted between 1993 and 2001 in 24 counties of central and eastern North Carolina (49). Cases were identified through a rapid case ascertainment system in cooperation with the North Carolina Central Cancer Registry, and controls were selected from North Carolina Division of Motor Vehicles and US Health Care Financing Administration beneficiary lists. Participants were aged 20–74 years. DNA samples were provided for 656 African-American cases with invasive breast cancer and 608 African-American controls.
The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial, coordinated by the US National Cancer Institute (NCI) in 10 US centers, enrolled approximately 155 000 men and women aged 55–74 years during 1993–2001 in a randomized, two-arm trial to evaluate the efficacy of screening for these four cancers (50). A total of 64 African-American invasive breast cancer cases and 133 African-American controls from PLCO contributed to this study.
The NBHS is a population-based case–control study of incident breast cancer conducted in Tennessee (15). Initiated in 2001, it recruited patients with invasive breast cancer or ductal carcinoma in situ and controls identified through RDD, all aged 25–75 years. NBHS contributed 310 African-American cases (57 in situ) and 186 African-American controls.
African-American breast cancer cases and controls in WFBC were recruited at Wake Forest University Health Sciences from November 1998 through December 2008 (51). Controls were recruited from the patient population receiving routine mammography at the Breast Screening and Diagnostic Center. Participants were aged 30–86 years. WFBC contributed 125 cases (116 invasive and 9 in situ) and 153 controls to the analysis.
Genotyping in stage 1 was conducted using the Illumina Human1M-Duo BeadChip. Of the 5984 samples from these studies (3153 cases and 2831 controls), we attempted genotyping of 5932, after removing samples (n = 52) with DNA concentrations <20 ng/μl. Following genotyping, we removed samples based on the following exclusion criteria: (i) unknown replicates (≥98.9% genetically identical) that we were able to confirm (one member of each duplicate pair was removed, n = 15); (ii) unknown replicates that we were not able to confirm through discussions with study investigators (all members of each pair or triplicate were removed, n = 14); (iii) samples with call rates <95% after a second attempt (n = 100); (iv) samples with ≤5% African ancestry (n = 36) (discussed in what follows); and (v) samples with <15% mean heterozygosity of SNPs on the X chromosome and/or similar mean allele intensities of SNPs on the X and Y chromosomes (n = 6), which are likely to be from males.
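The sample-level exclusions can be expressed as a filter over per-sample QC summaries. The sketch below is hypothetical: the `Sample` fields and the `exclusion_reason` function are illustrative names, duplicate detection (criteria i and ii) is omitted, and the X/Y intensity comparison is reduced to a precomputed boolean flag. It also checks that the exclusion counts given above reconcile with the final sample of 3016 cases and 2745 controls.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds from the text; field and function names are hypothetical.
MIN_CALL_RATE = 0.95          # criterion (iii)
MAX_EXCLUDED_AFRICAN = 0.05   # criterion (iv): <=5% African ancestry is excluded
MIN_X_HETEROZYGOSITY = 0.15   # criterion (v): lower values suggest a male sample


@dataclass
class Sample:
    sample_id: str
    call_rate: float             # after the second genotyping attempt
    african_ancestry: float      # estimated proportion, 0-1
    x_heterozygosity: float      # mean heterozygosity of X-chromosome SNPs
    xy_intensity_similar: bool   # hypothetical flag: X and Y intensities look alike


def exclusion_reason(s: Sample) -> Optional[str]:
    """Return the first failed QC criterion, or None if the sample is kept."""
    if s.call_rate < MIN_CALL_RATE:
        return "call rate < 95%"
    if s.african_ancestry <= MAX_EXCLUDED_AFRICAN:
        return "<= 5% African ancestry"
    if s.x_heterozygosity < MIN_X_HETEROZYGOSITY or s.xy_intensity_similar:
        return "likely male"
    return None


# Reconcile the sample counts reported in the text.
TOTAL_SAMPLES = 5984
LOW_DNA_CONCENTRATION = 52
EXCLUSIONS = {
    "confirmed duplicates": 15,
    "unconfirmed duplicates": 14,
    "low call rate": 100,
    "low African ancestry": 36,
    "likely male": 6,
}
ATTEMPTED = TOTAL_SAMPLES - LOW_DNA_CONCENTRATION     # 5932
FINAL_N = ATTEMPTED - sum(EXCLUSIONS.values())        # 5761
assert FINAL_N == 3016 + 2745
```

The order in which criteria are checked only affects which reason is reported for a sample failing several of them; the set of retained samples is the same.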
In the analysis, we removed SNPs with <95% call rates (n = 21 732) or MAFs <1% (n = 80 193). To assess genotyping reproducibility, we included 138 replicate samples; the average concordance rate was 99.95% (>99.93% for all pairs). We also eliminated SNPs with genotyping concordance rates <98% based on the replicates (n = 11 701). The final analysis data set included 1 043 036 SNPs genotyped on 3016 cases (1520 ER+, 988 ER− and the remaining 508 cases with unknown ER status) and 2745 controls, with an average SNP call rate of 99.7% and average sample call rate of 99.8%.
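The three SNP filters (call rate, minor allele frequency and replicate concordance) reduce to simple per-SNP summaries. This is a minimal sketch assuming genotypes coded as 0/1/2 counts of one allele with `None` for no-calls; the function names are hypothetical, and only the thresholds (call rate ≥95%, MAF ≥1%, replicate concordance ≥98%) come from the text.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[int]  # 0, 1 or 2 copies of the coded allele; None = no-call


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of samples with a genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """MAF estimated from called genotypes only."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def replicate_concordance(pairs: Sequence[Tuple[Genotype, Genotype]]) -> float:
    """Fraction of replicate pairs, both called, that agree at this SNP."""
    both_called = [(a, b) for a, b in pairs if a is not None and b is not None]
    if not both_called:
        return 1.0  # assumption: no comparable pairs means no evidence of discordance
    return sum(a == b for a, b in both_called) / len(both_called)


def keep_snp(genotypes: Sequence[Genotype],
             replicate_pairs: Sequence[Tuple[Genotype, Genotype]]) -> bool:
    """Apply the three SNP filters described in the text."""
    return (
        call_rate(genotypes) >= 0.95
        and minor_allele_frequency(genotypes) >= 0.01
        and replicate_concordance(replicate_pairs) >= 0.98
    )
```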
We used principal components analysis (52) to estimate global ancestry among the 5761 individuals, using 2546 ancestry informative markers. Eigenvector 1 was highly correlated (ρ = 0.997, P < 1 × 10⁻¹⁶) with percentage of European ancestry, estimated in HAPMIX (53), and accounted for 10.1% of the variation between subjects; subsequent eigenvectors accounted for no more than 0.5%. At each locus and for each participant, we also estimated local ancestry [i.e. the number of European chromosomes (continuous between 0 and 2) carried by the participant], using the HAPMIX program (53). To summarize local ancestry at each region, for each individual we averaged across all local ancestry estimates that were within the start and end points of the region (Supplementary Material, Table S5). To address the potential for confounding by genetic ancestry, we adjusted for both global and local ancestry in all analyses.
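The regional summary of local ancestry is an average of the per-position estimates that fall inside a region. The sketch below assumes those estimates are already available as `(position_bp, estimate)` pairs for one individual; it does not parse HAPMIX output, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def regional_local_ancestry(
    estimates: Sequence[Tuple[int, float]], start: int, end: int
) -> float:
    """Average local ancestry (European chromosomes, 0-2) over one region.

    `estimates` holds (position_bp, estimate) pairs for a single individual;
    only positions within [start, end] contribute to the average.
    """
    in_region = [value for pos, value in estimates if start <= pos <= end]
    if not in_region:
        raise ValueError("no local ancestry estimates within the region")
    return sum(in_region) / len(in_region)
```

The resulting per-individual value is what enters the regression models as the local ancestry covariate for SNPs in that region.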
In order to generate a data set suitable for fine-mapping, we carried out genome-wide imputation using the software MACH (54). Phased haplotype data from the founders of the CEU and YRI HapMap Phase 2 samples were used to infer LD patterns and impute ungenotyped markers. The r² metric, defined as the observed variance of the imputed genotype dosages divided by the expected variance, measures imputation quality at each SNP; SNPs with r² < 0.3 were filtered from analysis. Of the 1 539 328 common SNPs (MAF ≥ 0.05) in the YRI population in HapMap Phase 2, we could impute 1 392 294 (90%) with r² ≥ 0.8. For all the imputed SNPs presented in Results and the tables reported herein, the average r² was 0.92 (estimated in MACH).
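MACH reports r² itself; the sketch below only recomputes the ratio from genotype dosages to show what it measures. It assumes dosages on a 0–2 scale and takes the expected variance as 2p(1 − p) under Hardy–Weinberg equilibrium, so MACH's internal calculation may differ in detail. Function names are hypothetical; the 0.3 cut-off is the one used above.

```python
from statistics import fmean, pvariance
from typing import Sequence

RSQ_THRESHOLD = 0.3  # SNPs below this imputation quality were filtered


def dosage_rsq(dosages: Sequence[float]) -> float:
    """Ratio of observed to expected dosage variance for one imputed SNP.

    With allele frequency p, the expected variance of a true genotype under
    Hardy-Weinberg equilibrium is 2p(1 - p). Uncertain imputations shrink
    dosages toward the mean, which lowers the observed variance and the ratio.
    """
    p = fmean(dosages) / 2
    expected = 2 * p * (1 - p)
    if expected == 0:
        return 0.0  # monomorphic in this sample: no information
    return pvariance(dosages) / expected


def passes_imputation_filter(dosages: Sequence[float]) -> bool:
    return dosage_rsq(dosages) >= RSQ_THRESHOLD
```

Dosages that are all close to the mean (for example 0.9, 1.0, 1.1, 1.0) give a ratio near zero and fail the filter, whereas well-resolved genotypes give a ratio near one.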
For each typed and imputed SNP, ORs and 95% CIs were estimated using unconditional logistic regression adjusting for age at diagnosis (or age at the reference date for controls), study, the first 10 eigenvectors and local ancestry. For each SNP, we tested for an allele dosage (per-allele) effect using a 1 d.f. Wald χ² trend test.
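The per-SNP test is a logistic regression of case status on allele dosage plus covariates, with the Wald statistic (β̂/SE)² referred to a χ² distribution with 1 d.f. The sketch below uses a hand-rolled Newton–Raphson fit so that it needs only the standard library; the function names are hypothetical, it performs no convergence or separation checks, and an actual analysis would use an established statistics package.

```python
import math
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def _score_and_information(x_rows, y, beta):
    """Gradient of the log-likelihood and Fisher information X'WX."""
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, yi in zip(x_rows, y):
        eta = sum(bj * xj for bj, xj in zip(beta, x))
        p = 1.0 / (1.0 + math.exp(-eta))
        w = p * (1.0 - p)
        for i in range(k):
            score[i] += (yi - p) * x[i]
            for j in range(k):
                info[i][j] += w * x[i] * x[j]
    return score, info


def wald_trend_test(dosage: Sequence[float], covariates: Sequence[Sequence[float]],
                    y: Sequence[int], iterations: int = 25) -> dict:
    """Per-allele OR, 95% CI and 1-d.f. Wald chi-square for a SNP dosage."""
    x_rows = [[1.0, d, *c] for d, c in zip(dosage, covariates)]  # intercept, dosage, covariates
    beta = [0.0] * len(x_rows[0])
    for _ in range(iterations):  # Newton-Raphson updates
        score, info = _score_and_information(x_rows, y, beta)
        step = _solve(info, score)
        beta = [bj + sj for bj, sj in zip(beta, step)]
    _, info = _score_and_information(x_rows, y, beta)
    # Var(beta_dosage) is element [1][1] of info^-1: solve info @ v = e_1.
    e1 = [1.0 if i == 1 else 0.0 for i in range(len(beta))]
    se = math.sqrt(_solve(info, e1)[1])
    b = beta[1]
    chi2 = (b / se) ** 2
    return {
        "or": math.exp(b),
        "ci95": (math.exp(b - 1.96 * se), math.exp(b + 1.96 * se)),
        "chi2": chi2,
        "p": math.erfc(math.sqrt(chi2 / 2)),  # upper tail of chi-square with 1 d.f.
    }
```

In the study's models, the covariate vector for each participant would hold age, study indicators, the first 10 eigenvectors and regional local ancestry.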
We fine-mapped each risk locus using the combined genotyped and imputed SNPs, searching for (i) an SNP more strongly associated with risk in African Americans than the index signal and (ii) a novel signal independent of the index signal. Because some risk loci have been found to be more strongly associated with particular breast cancer subtypes, we investigated three outcomes: (i) overall breast cancer, (ii) ER+ breast cancer and (iii) ER− breast cancer, with the two subtype analyses considered hypothesis-generating. These analyses included SNPs (genotyped and imputed) spanning 250 kb upstream and 250 kb downstream of each index signal. If the index signal lay within an LD block (based on the D′ statistic) larger than 250 kb, the region was extended to cover the entire LD block.
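One reading of the window rule is sketched below: start from ±250 kb around the index SNP and, when the index SNP sits in a D′-defined LD block longer than 250 kb, take the union of that interval and the block. The block boundaries are assumed to be supplied by the caller rather than computed here, and the function name is hypothetical.

```python
from typing import Optional, Tuple

FLANK_BP = 250_000  # 250 kb on each side of the index SNP


def fine_mapping_window(index_pos: int,
                        ld_block: Optional[Tuple[int, int]] = None) -> Tuple[int, int]:
    """Return the (start, end) positions scanned around an index SNP.

    `ld_block` is the (start, end) of the LD block containing the index SNP,
    if known. Blocks longer than 250 kb extend the window to cover the block.
    """
    start = max(0, index_pos - FLANK_BP)
    end = index_pos + FLANK_BP
    if ld_block is not None:
        block_start, block_end = ld_block
        contains_index = block_start <= index_pos <= block_end
        if contains_index and block_end - block_start > FLANK_BP:
            start, end = min(start, block_start), max(end, block_end)
    return start, end
```

For an index SNP at 1 000 000 bp inside a 500 kb block spanning 600 000–1 100 000 bp, the window becomes 600 000–1 250 000 bp; a shorter block leaves the ±250 kb window unchanged.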
Stepwise regression was performed by region to select the most informative risk variants as discussed in what follows, in models adjusted for age, study, global ancestry (the first 10 eigenvectors) and local ancestry. In the stepwise regression, we preserved the original sample size by using the mean genotype of typed subjects in place of ‘no-calls’ for SNPs with <100% genotyping completion rate.
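The two ingredients of the regional stepwise procedure, mean substitution for no-calls and iterative selection of SNPs, can be sketched as follows. This section does not spell out the stepping rule, so the sketch assumes forward-only selection; `p_value` is a hypothetical caller-supplied function that would, in practice, fit the covariate-adjusted logistic model containing the already selected SNPs.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def fill_no_calls(genotypes: Sequence[Optional[float]]) -> List[float]:
    """Replace no-calls (None) with the mean genotype of typed subjects."""
    called = [g for g in genotypes if g is not None]
    mean = sum(called) / len(called)
    return [mean if g is None else float(g) for g in genotypes]


def forward_stepwise(
    candidates: Sequence[str],
    p_value: Callable[[str, Tuple[str, ...]], float],
    alpha: float,
) -> List[str]:
    """Greedy forward selection of SNPs within one region (an assumed rule).

    `p_value(snp, selected)` returns the p-value for `snp` in a model that
    already contains the `selected` SNPs plus the fixed covariates. At each
    step the SNP with the smallest conditional p-value is added, until no
    remaining SNP reaches `alpha`.
    """
    selected: List[str] = []
    remaining = list(candidates)
    while remaining:
        pvals = {snp: p_value(snp, tuple(selected)) for snp in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

Filling no-calls with the mean keeps every participant in every model, so p-values at successive steps are computed on the same sample and remain comparable.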
Within each known risk locus, markers associated with risk in African Americans are expected to be correlated with the index signal reported in Europeans. We therefore identified and tested SNPs correlated (r² > 0.2) with the index signals in the GWAS populations (HapMap CEU, or CHB for 6q25). For each region, we determined the number of tag SNPs needed to capture, in the YRI population (HapMap Phase 2), all SNPs correlated with the index signal. The average number of such tags across regions was used as the Bonferroni correction factor: an α-level of 0.05 divided by this average was applied in the stepwise regression. For all remaining markers, which were not correlated with the index signal in Europeans, we applied a more stringent α-level. In each risk region, we determined the number of tag SNPs needed to capture all common alleles (MAF > 0.05) at r² > 0.8 in the YRI HapMap population. The total number of tags across the 19 regions approximates the number of independent tests performed and was used as the correction factor, so that an α of 0.05 divided by this total defined statistical significance for any putative novel, independent signal. For correlated SNPs selected as better markers, we also assessed phase to ensure that the new risk allele lies on the same haplotype as the GWAS-reported risk allele in the HapMap CEU population.
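The two Bonferroni thresholds differ only in their divisor: the average number of index-correlated tags per region for SNPs correlated with the index signal, and the total number of common-variant tags across all regions for putative novel signals. The tag counts themselves are not given in this section, so the function names and any inputs below are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def alpha_index_correlated(tags_per_region: Sequence[int], alpha: float = 0.05) -> float:
    """Threshold for SNPs correlated (r^2 > 0.2) with the index signal.

    `tags_per_region[i]` is the number of YRI tag SNPs needed to capture the
    index-correlated SNPs in region i; the divisor is their average.
    """
    return alpha / fmean(tags_per_region)


def alpha_novel_signal(common_tags_per_region: Sequence[int], alpha: float = 0.05) -> float:
    """Threshold for putative novel signals not correlated with the index SNP.

    `common_tags_per_region[i]` is the number of YRI tag SNPs capturing all
    common SNPs (MAF > 0.05) at r^2 > 0.8 in region i; the divisor is the
    total across all regions.
    """
    return alpha / sum(common_tags_per_region)
```

For example, if the 19 regions required 1000 common-variant tags in total (a hypothetical figure), the threshold for a novel signal would be 0.05/1000 = 5 × 10⁻⁵.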
We modeled the cumulative genetic risk of breast cancer using the 19 risk variants reported in previous GWAS. We compared the results with a model based on the SNPs found to be significantly associated with risk in African Americans, which included the SNPs selected by the stepwise procedures at all loci for overall breast cancer risk (presented in Table 1). Specifically, for each model we summed the number of risk alleles carried by each individual and estimated the OR per allele for this unweighted aggregate allele count, an approximate risk score appropriate for unlinked variants with independent effects of approximately equal magnitude per allele. We then applied this risk score to overall breast cancer as well as to the ER+ and ER− subtypes. We also constructed risk scores from the risk alleles identified for ER+ and ER− tumors separately and, as a hypothesis-generating analysis, applied both scores to overall breast cancer and to the ER+ and ER− subtypes.
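The unweighted score counts risk alleles across SNPs, flipping a SNP's count when the coded allele is not the risk allele; the score is then entered into the logistic model in place of a single-SNP dosage. This is a minimal sketch with hypothetical names; it assumes dosages count the coded allele on a 0–2 scale.

```python
from typing import Mapping


def unweighted_risk_score(dosages: Mapping[str, float],
                          coded_is_risk: Mapping[str, bool]) -> float:
    """Sum of risk-allele counts across (assumed unlinked) SNPs.

    `dosages[snp]` counts the coded allele (0-2). If the coded allele is not
    the risk allele, the risk-allele count is 2 - dosage. Every allele
    contributes equally, with no per-SNP weights.
    """
    return sum(d if coded_is_risk[snp] else 2 - d for snp, d in dosages.items())


def odds_ratio_for_difference(or_per_allele: float, extra_alleles: float) -> float:
    """Under the per-allele (log-additive) model, ORs multiply across alleles."""
    return or_per_allele ** extra_alleles
```

With the per-allele OR of 1.18 reported for the improved markers, and assuming the log-additive model holds, a woman carrying five more risk alleles than another would have odds about 1.18⁵ ≈ 2.29 times higher.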
Conflict of Interest statement. None declared.
This work was supported by a Department of Defense Breast Cancer Research Program Era of Hope Scholar Award to C.A.H. (W81XWH-08-1-0383), a National Institutes of Health grant to C.A.H. (R01-CA132839), the Norris Foundation, and a grant from the California Breast Cancer Research Program to D.O.S. (15UB-8402). Each of the participating studies was supported by the following grants: MEC: by National Institutes of Health (R01-CA63464 and R37-CA54281); CARE: by National Institute of Child Health and Human Development (NO1-HD-3-3175); WCHS: by US Army Medical Research and Materiel Command (USAMRMC) (DAMD-17-01-0-0334), the National Institutes of Health (R01-CA100598) and the Breast Cancer Research Foundation; SFBCS: by National Institutes of Health (R01-CA77305) and United States Army Medical Research Program (DAMD17-96-6071); NC-BCFR: by National Institutes of Health (U01-CA69417); CBCS: by National Institutes of Health Specialized Program of Research Excellence in Breast Cancer (P50-CA58223) and Center for Environmental Health and Susceptibility, National Institute of Environmental Health Sciences, National Institutes of Health (P30-ES10126); PLCO: by Intramural Research Program, National Cancer Institute, National Institutes of Health; NBHS: by National Institutes of Health (R01-CA100374); WFBC: by National Institutes of Health (R01-CA73629). The Breast Cancer Family Registry (BCFR) was supported by the National Cancer Institute, National Institutes of Health under RFA CA-95-011 and through cooperative agreements with members of the Breast Cancer Family Registry and Principal Investigators.
We thank the women who volunteered to participate in each study. We also thank Madhavi Eranti, Andrea Holbrook, Paul Poznaik, Loreall Pooler, Xin Sheng and David Wong from the University of Southern California for their technical support. We would also like to acknowledge co-investigators from the WCHS study: Dana H. Bovbjerg (University of Pittsburgh), Lina Jandorf (Mount Sinai School of Medicine) and Gregory Ciupak, Warren Davis, Gary Zirpoli, Song Yao and Michelle Roberts from Roswell Park Cancer Institute.