Genome-wide association studies (GWAS) in diverse populations are needed to reveal variants that are more common and/or limited to defined populations. We conducted a GWAS of breast cancer in women of African ancestry, with genotyping of > 1,000,000 SNPs in 3,153 African American cases and 2,831 controls, and replication testing of the top 66 associations in an additional 3,607 breast cancer cases and 11,330 controls of African ancestry. Two of the 66 SNPs replicated (p < 0.05) in stage 2, which reached statistical significance levels of 10−6 and 10−5 in the stage 1 and 2 combined analysis (rs4322600 at chromosome 14q31: OR = 1.18, p = 4.3×10−6; rs10510333 at chromosome 3p26: OR = 1.15, p = 1.5×10−5). These suggestive risk loci have not been identified in previous GWAS in other populations and will need to be examined in additional samples. Identification of novel risk variants for breast cancer in women of African ancestry will demand testing of a substantially larger set of markers from stage 1 in a larger replication sample.
Genome-wide association studies (GWAS) of breast cancer have been conducted almost exclusively in populations of European ancestry, and have firmly established associations with a number of common susceptibility loci that contribute modest effects (relative risks ≤ 1.3) (Ahmed et al. 2009; Antoniou et al. 2010; Easton et al. 2007; Fletcher et al. 2011; Ghoussaini et al. 2012; Haiman et al. 2011b; Hunter et al. 2007b; Kim et al. 2012; Long et al. 2012; Stacey et al. 2007; Stacey et al. 2008; Thomas et al. 2009; Turnbull et al. 2010; Zheng et al. 2009b). These discoveries provide support for the polygenic model of breast cancer susceptibility (Pharoah et al. 2002), as well as clues as to important biological pathways involved in the pathogenesis of breast cancer. For example, the most strongly associated risk locus for breast cancer revealed through GWAS has been the region containing the fibroblast growth factor receptor 2 (FGFR2) at chromosome 10q26 (Easton et al. 2007; Hunter et al. 2007a; Meyer et al. 2008). FGFR2 is a member of the FGFR family of receptor tyrosine kinases (RTKs), which regulate cell proliferation, differentiation and apoptosis (Tenhagen et al. 2012). The risk variant on chromosome 14q24 is located in intron 12 of RAD51B, which is a member of the RAD51 protein family. RAD51 proteins are essential for DNA repair by homologous recombination (Tarsounas et al. 2004), a DNA repair pathway with an established and important role in breast cancer development. A more recent study, which included African American subjects from the current study, revealed a risk marker at the telomerase reverse transcriptase (TERT) locus (Haiman et al. 2011b), a protein that controls telomere length and is also implicated in oncogenesis (Kim et al. 1994). Many of the risk variants identified by GWAS, however, are located in gene deserts, or near genes with roles in breast cancer etiology that are currently unknown.
The search for additional low-penetrance alleles for breast cancer in specific racial/ethnic populations has revealed additional variants that are important globally or more common and/or limited to defined populations. For example, a GWAS conducted among Chinese women identified a novel risk locus for breast cancer near the gene for the estrogen receptor (ER) on chromosome 6 which had not been revealed in previous, well-powered GWAS in populations of European ancestry (Zheng et al. 2009b). A GWAS of prostate cancer in men of African ancestry also identified a novel risk variant at 17q12 that is not observed in other populations (Haiman et al. 2011a). In search of risk variants for breast cancer that may be important to women of African ancestry, we analyzed > 1 million common SNPs in 3,153 African American breast cancer cases and 2,831 African American controls, and examined the most statistically significant associations in a second stage of 3,607 cases and 11,330 controls of African ancestry.
Stage 1 of the GWAS included African American participants from 9 epidemiological studies of breast cancer, comprising a total of 3,153 cases and 2,831 controls (cases/controls: The Multiethnic Cohort study (MEC), 734/1,003; The Los Angeles component of The Women’s Contraceptive and Reproductive Experiences (CARE) Study, 380/224; The Women’s Circle of Health Study (WCHS), 272/240; The San Francisco Bay Area Breast Cancer Study (SFBCS), 172/231; The Northern California Breast Cancer Family Registry (NC-BCFR), 440/53; The Carolina Breast Cancer Study (CBCS), 656/608; The Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) Cohort, 64/133; The Nashville Breast Health Study (NBHS), 310/186; and, The Wake Forest University Breast Cancer Study (WFBC), 125/153). Replication testing was conducted in an independent sample of 3,607 breast cancer cases and 11,330 controls from 9 additional studies of breast cancer in women of African ancestry (The Black Women’s Health Study (BWHS), 826/1,167; The Women’s Insights and Shared Experiences study (WISE), 174/458; NBHS/Southern Community Cohort (SCCS), 981/851; The Nigerian Breast Cancer Study (NBCS), 681/282; The Barbados National Cancer Study (BNCS), 93/244; The Racial Variability in Genotypic Determinants of Breast Cancer Risk Study (RVGBC), 151/272; The Baltimore Breast Cancer Study (BBCS), 117/111; The Chicago Cancer Prone Study (CCPS), 268/261; and, The Women’s Health Initiative (WHI), 316/7,484).
Sample size and selected characteristics for these studies are summarized in Supplemental Tables 1 and 2 and detailed information about the design and organization of each study is provided in Supporting Information.
Genotyping in stage 1 was conducted using the Illumina Human1M-Duo BeadChip. Of the 5,984 samples from these studies (3,153 cases and 2,831 controls), we attempted genotyping of 5,932, removing samples (n = 52) with DNA concentrations < 20 ng/µl. Following genotyping, we removed samples based on the following exclusion criteria: 1) unexpected replicates (≥ 98.9% genetically identical) that we were able to confirm through discussions with study investigators (only one of each replicate was removed, n = 15); 2) unknown replicates that we were not able to confirm (pair or triplicate removed, n = 14); 3) samples with call rates < 95% after a second genotyping attempt (n = 100); 4) samples with ≤ 5% African ancestry (n = 36) (discussed below); and 5) samples with < 15% mean heterozygosity of SNPs on the X chromosome and/or similar mean allele intensities of SNPs on the X and Y chromosomes (n = 6), as these are likely to be males.
We removed SNPs with < 95% call rate (n = 21,732) or minor allele frequencies (MAFs) < 1% (n = 80,193). To assess genotyping reproducibility we included 138 known replicate samples; the average concordance rate was 99.95% (> 99.93% for all pairs). We also eliminated SNPs with genotyping concordance rates < 98% based on the replicates (n = 11,701). The final analysis dataset included 1,043,036 SNPs genotyped on 3,016 cases and 2,745 controls, with an average SNP call rate of 99.7% and average sample call rate of 99.8%. Hardy-Weinberg equilibrium (HWE) was not used as a criterion for removing SNPs; none of the SNPs selected for replication deviated from HWE in controls in each study (based on a cut-off of p < 0.001).
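The sample accounting and SNP filters described above can be checked with a short sketch. The counts and thresholds are taken from the text; the variable and function names are illustrative only and do not correspond to any software used in the study.

```python
# Stage 1 sample counts reported in the text.
enrolled = 3153 + 2831      # cases + controls
low_dna = 52                # DNA concentration below 20 ng per microliter
attempted = enrolled - low_dna

# Post-genotyping sample exclusions, in the order listed in the text.
exclusions = {
    "confirmed_replicates": 15,
    "unconfirmed_replicates": 14,
    "call_rate_below_95pct": 100,
    "african_ancestry_5pct_or_less": 36,
    "likely_male": 6,
}
final_samples = attempted - sum(exclusions.values())


def snp_passes_qc(call_rate, maf, replicate_concordance):
    """Hypothetical helper applying the three SNP filters stated in the text."""
    return call_rate >= 0.95 and maf >= 0.01 and replicate_concordance >= 0.98


print(attempted, final_samples)
```

The resulting total agrees with the 3,016 cases and 2,745 controls in the final analysis dataset.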
We selected 66 SNPs with p-values < 2×10−4 in stage 1 for evaluation in the second stage. These SNPs were selected from 53 regions following linkage disequilibrium (LD) pruning of correlated SNPs. Two of these SNPs were located near a previously validated breast cancer risk locus [rs12355688 at 10q22, 241 kb downstream of rs704010, r2 = 0 in both CEU and YRI populations from 1000 Genomes Project (March 2010 release) (Turnbull et al. 2010); and rs3745185 at 19p13, 10kb downstream of rs2363956, r2 = 0.57 and 0.19 in the CEU and YRI populations from 1000 Genomes Project (March 2010 release), respectively (Antoniou et al. 2010)]. Genotyping in the replication studies was performed using the Sequenom platform (BWHS), OpenArray (WISE and NBHS/SCCS), the Affymetrix 6.0 SNP array (WHI) (Hutter et al. 2011) and Illumina GoldenGate (all other studies) (see Supporting Information). Blinded duplicate samples (5–10%) were included in the replication studies and concordance of these samples was ≥ 98% in all studies. The number of SNPs that were genotyped successfully in each stage 2 study ranged from 51 to 63. The average call rate for all SNPs in stage 2 was 98.8% (range for call rates of a SNP within study: 71.4–100%). Call rates by SNP and study are shown in Supplemental Table 3.
In stage 1, we utilized STRUCTURE (Pritchard et al. 2000) to infer percent African ancestry on an individual level. A total of 2,546 ancestry-informative SNPs from the Illumina array were selected based on low inter-marker correlation and ability to differentiate between samples of African and European descent. In evaluating the distribution of the fraction of African ancestry across the stage 1 populations, statistically significant differences (ANOVA p < 10−16) were noted (Supplemental Figure 1). We also applied principal components analysis (PCA) (Price et al. 2006) to estimate axes of variation among the 5,761 individuals using the same 2,546 ancestry-informative markers. The first eigenvector accounted for 10.1% of the variation between subjects, and subsequent eigenvectors accounted for no more than 0.5%. Using input genotypes from the HapMap populations, CEU (CEPH Utah), YRI (Yoruba), and JPT (Japanese), we determined that the first eigenvector clearly differentiates between Europeans (CEU) and West Africans (YRI) in the HapMap samples (Supplemental Figure 2).
We examined the observed versus the expected distribution of the Chi-squared test statistics using a 1-degree of freedom (df) trend test, comparing genotype counts for each SNP in cases versus controls. All tests of statistical significance were two-sided. To improve coverage, we augmented the set of SNPs tested for association through imputation using MACH (Li and Abecasis 2006). Phased haplotypes from the 120 CEU and 120 YRI founders in HapMap Phase 2 were used to infer genotypes of all Phase 2 SNPs that were not available on the Illumina 1M Duo or did not pass our quality control (QC) criteria. Odds ratios (OR) and 95% confidence intervals (CI) for each SNP were estimated using unconditional logistic regression, adjusting for age, the first eigenvector and study. The SFBCS and NC-BCFR studies were conducted in the same San Francisco Bay Area population and were combined in all analyses.
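As an illustration of a 1-df trend test on genotype counts, the following is a minimal sketch of the Cochran-Armitage test with additive weights (0, 1, 2). It is assumed here to be the form of trend test meant; the function name and the example counts are hypothetical and are not study data.

```python
import math


def trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test; inputs are genotype counts for 0, 1, 2 risk alleles.

    Returns the 1-df Chi-squared statistic and its two-sided p-value.
    Assumes the SNP is polymorphic (otherwise the denominator is zero).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n_all, n_cases = sum(totals), sum(cases)
    n_controls = n_all - n_cases
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * t for w, t in zip(weights, totals))
    sum_w2n = sum(w * w * t for w, t in zip(weights, totals))
    numerator = n_all * (n_all * sum_wr - n_cases * sum_wn) ** 2
    denominator = n_cases * n_controls * (n_all * sum_w2n - sum_wn ** 2)
    chi2 = numerator / denominator
    # Upper tail of a Chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts, for illustration only.
chi2, p_value = trend_test(cases=(400, 450, 150), controls=(450, 420, 130))
print(chi2, p_value)
```

The unadjusted trend test is shown only for the test-statistic distribution; the ORs reported in this study come from logistic regression with covariate adjustment, which this sketch does not reproduce.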
In the replication studies, ORs and 95% CIs for each SNP were estimated using unconditional logistic regression, adjusting for age, region within the WHI and estimated genetic ancestry. Ancestry information was available for all stage 2 studies except WISE (Supporting Information). Overall testing of single SNP associations was conducted via meta-analyses of results from the stage 1 and stage 2 studies.
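The text does not specify the meta-analytic model. One common choice, shown here purely as an assumption, is a fixed-effect inverse-variance combination of per-study log odds ratios. The function name and the input values are hypothetical.

```python
import math


def fixed_effect_meta(log_ors, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis of log odds ratios.

    Returns the pooled OR, pooled log OR, its standard error, and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled_beta = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return math.exp(pooled_beta), pooled_beta, pooled_se, p_value


# Hypothetical per-stage estimates (log OR, standard error), not study results.
pooled_or, pooled_beta, pooled_se, p_value = fixed_effect_meta(
    log_ors=[0.21, 0.10], standard_errors=[0.05, 0.04]
)
print(pooled_or, p_value)
```

Under this model the more precisely estimated stage dominates the pooled estimate, which is one reason a larger replication sample can pull the combined OR toward the stage 2 value.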
We also conducted combined GWAS and admixture-based statistical tests to assess the contribution of local ancestry on the SNP associations. For each subject in our analysis, we inferred local ancestry, which defines the proportion of European and African ancestry at each genotyped and imputed SNP. To infer local ancestry in our GWAS panel of 5,761 African American women, we applied the program HAPMIX (Price et al. 2009). HAPMIX builds a Hidden Markov Model (HMM) using phased haplotype data that are representative of the two source populations assumed to be ancestral to the admixed (study) data. In this case, we provided the same HapMap dataset that was used for imputation (i.e. 240 CEU + YRI founder haplotypes per chromosome) as input. HAPMIX reports posterior probabilities for each subject at each SNP of carrying 0, 1 and 2 copies of a European allele.
Combined GWAS and admixture-based statistical tests were conducted to make inferences about regions of the genome that explain not only case-control differences in disease risk based on SNP associations, but also risk differences based on local genetic ancestry. We utilized the MIXSCORE program (Pasaniuc et al. 2011), which takes as input results from a GWAS scan and an admixture scan (specifically HAPMIX output) and computes several statistics that incorporate allele frequency information from both sources of evidence. The SUM score is a 2-df Chi-squared test that simultaneously tests for association (i.e. a case-control difference in allele frequency) and admixture evidence (i.e. a deviation from the genome-wide proportion of European ancestry). The MIX score also tests for both evidence of admixture and association, but assumes the odds ratios for admixture and association are equal, which is potentially more powerful when this assumption is true since it is a 1-df test.
The stage 1 analysis included 3,016 cases and 2,745 controls among African American women from 9 epidemiological studies of breast cancer. The age of the cases and controls in stage 1 ranged from 22 to 87 years with the median ages being 55 and 58 years, respectively (Supplemental Table 1). The analysis of the most statistically significant associations from stage 1 was conducted in 3,533 cases and 11,046 controls from an additional 9 studies. The age of the cases and controls in stage 2 ranged from 18 to 92 years with the median ages being 50 and 53 years, respectively (Supplemental Table 2).
We observed no evidence of inflation of the test statistic (λ = 1.01) for the 1,043,036 genotyped and 2,067,098 imputed SNPs analyzed in stage 1, and no excess of very small p-values beyond what was expected (Figure 1). We observed no SNP to be associated with disease status at a genome-wide level of significance (p < 5×10−8) in stage 1 (Figure 2). The most statistically significant association was noted with SNP rs7610073 located in intron 2 of the gene GRM7 (metabotropic glutamate receptor 7) on chromosome 3p26 (risk allele frequency 0.64; OR per allele = 1.22; p = 7.4×10−7). A second signal was also noted ~486 kb upstream of GRM7 (rs10510333: risk allele frequency = 0.18; OR per allele = 1.24; p = 8.2×10−6). The associations with these 2 markers were independent and remained statistically significant when both were included in the same model (p-values of 8.3×10−7 and 9.3×10−6, respectively). Shown in Table 1 are the genotyped SNPs with p-values < 10−5 in stage 1, as well as SNPs that replicated in stage 2 (discussed below).
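The inflation factor λ is conventionally computed as the median of the observed 1-df Chi-squared statistics divided by the median of the null distribution (approximately 0.4549). The sketch below assumes that definition; the text does not state how λ was calculated, and the names and input values are illustrative.

```python
import statistics

# Median of the Chi-squared distribution with 1 df (approximate).
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Ratio of the observed median test statistic to the null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical test statistics, for illustration only.
lam = genomic_inflation([0.02, 0.15, 0.46, 1.30, 3.90])
print(lam)
```

A value close to 1, such as the 1.01 reported here, indicates that the bulk of the test statistics follow the null distribution.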
We selected 66 genotyped SNPs with association p-values less than 2×10−4 for replication testing in the stage 2 studies. None of these SNPs replicated with stage 2-wide significance of < 0.0008 (0.05/66), but 2 replicated with a p-value < 0.05 and an OR in the same direction as that observed in stage 1 (Table 1). Combining results from stages 1 and 2, no SNP achieved genome-wide significance. The smallest combined p-values were noted for the two SNPs that replicated in stage 2: rs4322600 located ~100 kb upstream of the gene GALC (galactosylceramidase) on chromosome 14q31 (risk allele frequency = 0.78, OR per allele = 1.18, p = 4.3×10−6) and rs10510333 located ~486 kb upstream of GRM7 on chromosome 3p26 (risk allele frequency = 0.18, OR per allele = 1.15, p = 1.5×10−5) (Table 1). We found no strong statistical evidence that the associations with these two loci differ by ER status (p-values for heterogeneity in case-only testing: rs10510333: p = 0.67; rs4322600: p = 0.85).
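The significance thresholds quoted above follow from simple arithmetic, sketched here with values taken from the text. The variable names are illustrative.

```python
n_snps = 66
alpha = 0.05

# Bonferroni-corrected threshold for the 66 SNPs tested in stage 2.
bonferroni_threshold = alpha / n_snps

# Conventional genome-wide significance threshold used in the text.
genome_wide_threshold = 5e-8

# Combined stage 1 and stage 2 p-values for the two SNPs that replicated.
combined_p = {"rs4322600": 4.3e-6, "rs10510333": 1.5e-5}
reaches_genome_wide = {
    snp: p < genome_wide_threshold for snp, p in combined_p.items()
}

print(f"{bonferroni_threshold:.6f}")
print(reaches_genome_wide)
```

The threshold rounds to the 0.0008 given in the text, and neither combined p-value falls below 5×10−8.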
Using the MIXSCORE program, we simultaneously tested the null hypothesis of no association and admixture at each locus defined by the 66 most significant variants identified in Stage 1. SNP rs7610073, which had the largest MIX score of 24.5 (p = 7.5×10−7), also had the smallest p-value in the first stage (Supplemental Table 4). The risk allele (the “A” allele for rs7610073) was not strongly differentiated (60% in HapMap YRI versus 81% in HapMap CEU) and the MIX score p-value was almost identical to the p-value from our association scan. Association p-values were generally stronger than the SUM or MIX score, so admixture did not make a substantive contribution in joint evidence of admixture and association for these 66 SNPs, as indicated in Supplemental Table 4. Altogether, these findings seem to indicate that the associations at the most significant loci in Stage 1 are not influenced by differences in local ancestry between cases and controls, meaning that any causal variants in these regions are not appreciably differentiated in frequency between cases and controls.
Genome-wide studies of common and rare genetic variation conducted in multiple populations will be required to reveal the complete spectrum of susceptibility alleles that contribute to risk of breast cancer globally. In a genome-wide scan of common genetic variation in > 3,000 African American cases and > 2,700 controls, followed by replication testing of the most significant associations (p < 2×10−4) in an independent set of > 3,500 cases and > 11,000 controls, we identified two suggestive associations with breast cancer risk that replicated in stage 2 at p < 0.05 [chromosome 14q31 (p = 4.3×10−6) and 3p26 (p = 1.5×10−5)]; however, these associations did not reach the standard level of genome-wide significance. These regions have not been highlighted in previous GWAS conducted in other racial/ethnic populations and each association requires further validation in additional studies.
Populations of African ancestry have greater genetic diversity and lower levels of LD among chromosomal loci (Campbell and Tishkoff 2008; Reed and Tishkoff 2006). Because of LD patterns and allele frequencies that differ from non-African populations, GWAS results from European or Asian populations are not always replicable in populations of African ancestry (Chen et al. 2010; Huo et al. 2012; Hutter et al. 2011; Ruiz-Narvaez et al. 2010; Zheng et al. 2009a). Fine-mapping of known breast cancer risk loci in populations of African ancestry has revealed risk-associated markers that are more relevant to African populations and contribute to modeling of genetic risk in this population (Chen et al. 2011; Ruiz-Narvaez et al. 2010; Udler et al. 2009). Large GWAS in populations of African ancestry, with proper control of population structure, will be required to discover additional disease susceptibility variants that better define the genetic profile of breast cancer in this population.
A strength of the present study is that it includes most existing case-control studies of breast cancer conducted in women of African ancestry. In this 2-stage design, we had 80% statistical power to identify a common risk variant (frequency of ≥ 10%) that conveys a risk per allele of 1.3 at genome-wide significance (p = 5×10−8). Thus, we were able to rule out variants with large effects if they were among the top 0.007% in stage 1 (and thus taken to stage 2) and were adequately tagged by the common SNPs on the 1M array. However, we are likely to have missed some milder associations. In previous GWAS of breast cancer in European ancestry populations, most risk variants eventually identified were not among the most statistically significant in stage 1 and were only revealed through testing of large numbers of SNPs in additional replication stages. Identifying novel risk loci for breast cancer in African ancestry populations will require continued collaborative efforts and investigators willing to test larger numbers of SNPs in their respective studies.
Our attempt to apply joint admixture and association mapping, using MIXSCORE, did not provide additional suggestive risk variants beyond those found using association methods alone. This suggests that the associations observed at the most significant regions in Stage 1 are not weakened by ancestry differences between cases and controls, and thus, the biologically functional alleles are unlikely to be highly differentiated in frequency between cases and controls. Because of the limited number of ER-negative cases in stage 1 (n = 988) and stage 2 (n = 423) the statistical power to look at subtypes with rate differences (e.g. ER-negative disease, more common in African American than European American women) was limited and not attempted for GWAS or admixture testing. However, in collaboration with GWAS of ER-negative breast cancer in European ancestry populations, which have substantially larger numbers of ER-negative cases, we have identified a novel locus for ER-negative breast cancer at 5p15 (TERT) (Haiman et al. 2011b). Genetic variation at this locus may contribute in part to the higher incidence of ER-negative disease subtypes in women of African ancestry (frequency of 0.56 in African Americans and frequency of 0.26 in Whites) (Haiman et al. 2011b). As for the analysis of overall breast cancer, larger studies of breast cancer in women of African ancestry will be needed to search for novel risk loci for ER-negative disease subtypes that are important for and may be limited to this population.
This study is the first genome-wide investigation of common genetic variation in relationship with breast cancer risk in women of African ancestry. The suggestive associations noted with risk variants at 14q31 and 3p26 require further validation in additional samples of African ancestry as well as in other populations. Identification of common risk variants for breast cancer in African ancestry populations will require testing a larger number of the most statistically significant SNPs from stage 1 in additional samples.
This work was supported by a Department of Defense Breast Cancer Research Program Era of Hope Scholar Award to CAH [W81XWH-08-1-0383] and the Norris Foundation. Each of the participating studies was supported by the following grants: MEC (National Institutes of Health grants R01-CA63464 and R37-CA54281); CARE (National Institute for Child Health and Development grant NO1-HD-3-3175, K05 CA136967); WCHS (U.S. Army Medical Research and Materiel Command (USAMRMC) grant DAMD-17-01-0-0334, the National Institutes of Health grant R01-CA100598, and the Breast Cancer Research Foundation); SFBCS (National Institutes of Health grant R01-CA77305 and United States Army Medical Research Program grant DAMD17-96-6071); NC-BCFR (National Institutes of Health grant U01-CA69417); CBCS (National Institutes of Health Specialized Program of Research Excellence in Breast Cancer, grant number P50-CA58223, and Center for Environmental Health and Susceptibility National Institute of Environmental Health Sciences, National Institutes of Health, grant number P30-ES10126); PLCO (Intramural Research Program, National Cancer Institute, National Institutes of Health); NBHS (National Institutes of Health grant R01-CA100374); SCCS (National Institutes of Health grant R01-CA092447); WFBC (National Institutes of Health grant R01-CA73629); BWHS (National Institutes of Health grants R01-CA58420 and R01-CA98663) and WISE (National Institutes of Health grant P01-CA77596). OI Olopade and D Huo were supported by National Institutes of Health Specialized Program of Research Excellence in Breast Cancer, grant number P50-CA125183 and National Cancer Institute R01-CA141712. BBCS is supported by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research.
The Breast Cancer Family Registry (BCFR) was supported by the National Cancer Institute, National Institutes of Health under RFA-CA-06-503 and through cooperative agreements with members of the Breast Cancer Family Registry and Principal Investigators. The content of this manuscript does not necessarily reflect the views or policies of the National Cancer Institute or any of the collaborating centers in the BCFR, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government or the BCFR. The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts N01WH22110, 24152, 32100-2, 32105-6, 32108-9, 32111-13, 32115, 32118-32119, 32122, 42107-26, 42129-32, and 44221. Funding for WHI SHARe genotyping was provided by NHLBI Contract N02-HL-64278.
We thank the women who volunteered to participate in each study. We also thank Madhavi Eranti, Andrea Holbrook, Paul Poznaik, David Wong and Lucy Xia from the University of Southern California for their technical support. We would also like to acknowledge co-investigators from the WCHS study: Dana H. Bovbjerg (University of Pittsburgh), Lina Jandorf (Mount Sinai School of Medicine) and Gregory Ciupak, Warren Davis, Gary Zirpoli, Song Yao and Michelle Roberts from Roswell Park Cancer Institute.
Ethical statement: The experiments described in this manuscript comply with the current laws of the United States.
Conflict of interest: The authors declare that they have no conflict of interest.