Breast cancer, a complex multifactorial disease, is one of the most common malignancies among women worldwide. Genetic factors play an important role in the pathogenesis of both sporadic and familial breast cancer [1–3]. However, only a small fraction of breast cancer cases can be explained by the breast cancer susceptibility genes identified thus far, such as BRCA1. Family-based linkage studies have been successful in mapping genes associated with Mendelian disorders [1–3, 7]. However, this approach has had limited success in identifying common genetic variants that confer small to moderate disease risk. Over the past 15 years, a large number of association studies have evaluated genetic variants in many candidate genes in relation to breast cancer risk [1–3, 7–9]. Although numerous genetic variants have been implicated, only a few have been replicated in subsequent studies [10, 11]. Four recent genome-wide association (GWA) studies have identified several novel risk alleles for breast cancer [12–15]. All of these studies, however, were conducted in women of European descent, who differ from women of other ethnic groups in certain aspects of genetic architecture. Therefore, additional GWA studies, particularly in populations of non-European descent, are needed to fully uncover the genetic basis of breast cancer susceptibility.
Since 1996, we have initiated multiple population-based epidemiologic studies of cancer in Shanghai, China, including the Shanghai Breast Cancer Study (SBCS) (see Supplementary Methods). Stages I to III of the current GWA study included genomic DNA samples from 6,531 incident breast cancer cases and 3,998 community controls who participated in these studies. The pilot phase of the GWA study was initiated in 2005, with 150 cases and 150 controls genotyped using the Affymetrix GeneChip Human Mapping 500K Array Set, which contains approximately 500,000 SNPs. An additional 1,374 cases and 1,402 controls were genotyped in 2008 using the Affymetrix Genome-Wide Human SNP Array 6.0, which contains 906,602 SNPs. Cases and controls were matched on age. The current analysis included 607,728 SNPs on the Affymetrix SNP Array 6.0 and 330,885 SNPs on the Affymetrix 500K Array Set that met the following criteria: 1) minor allele frequency (MAF) ≥ 5%; 2) call rate ≥ 95%; 3) genotyping concordance rate ≥ 95% in quality control (QC) samples; and, for SNPs on the Affymetrix 500K arrays, presence on the Affymetrix 6.0 array as well (Supplementary Table S1). Of the initial 3,076 samples included in the GWA scan, 49 were excluded because of a call rate < 95% (n = 4) or sample duplication or contamination (n = 45). A total of 1,505 cases and 1,522 controls remained for the GWA analyses. Multidimensional scaling analyses based on pairwise identity-by-state showed no evidence of apparent genetic admixture in this study population (Supplementary Figure S1).
Distribution of demographic characteristics and known breast cancer risk factors for cases and controls included in the study
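The SNP-level inclusion criteria described above can be expressed as a simple filter. The following is a minimal sketch, not the authors' pipeline: the `SnpQc` record, its field names, and the `passes_qc` function are hypothetical, and the example values are invented for illustration. Only the thresholds (MAF ≥ 5%, call rate ≥ 95%, QC concordance ≥ 95%, and presence on the 6.0 array for SNPs from the 500K arrays) come from the text.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Hypothetical per-SNP quality-control summary (field names are assumptions)."""
    snp_id: str
    maf: float            # minor allele frequency, between 0 and 0.5
    call_rate: float      # fraction of samples with a genotype call
    concordance: float    # genotype concordance rate in duplicated QC samples
    on_array_6: bool = True  # whether the SNP is also present on the 6.0 array


def passes_qc(snp: SnpQc, from_500k_array: bool = False) -> bool:
    """Return True if the SNP meets the inclusion criteria stated in the text."""
    if snp.maf < 0.05:          # criterion 1: MAF >= 5%
        return False
    if snp.call_rate < 0.95:    # criterion 2: call rate >= 95%
        return False
    if snp.concordance < 0.95:  # criterion 3: QC concordance >= 95%
        return False
    # Extra criterion for SNPs genotyped on the 500K arrays only
    if from_500k_array and not snp.on_array_6:
        return False
    return True


# Illustrative (invented) records
good = SnpQc("snp_a", maf=0.21, call_rate=0.99, concordance=0.998)
rare = SnpQc("snp_b", maf=0.02, call_rate=0.99, concordance=0.998)
only_500k = SnpQc("snp_c", maf=0.30, call_rate=0.97, concordance=0.96,
                  on_array_6=False)

kept = [s.snp_id for s in (good, rare) if passes_qc(s)]
```

Applying the same function with `from_500k_array=True` additionally drops SNPs, such as `only_500k` here, that are absent from the 6.0 array.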
Multiple genomic locations were revealed as potentially related to breast cancer risk, and the observed number of SNPs with a small P-value was larger than that expected by chance (Supplementary Figure S2). Similar results were obtained after excluding from the analyses the 292 subjects genotyped with the Affymetrix 500K Array Set (data not shown). The P-values presented are derived from trend tests using logistic regression (df = 1) after adjusting for age. Six of the 11 SNPs identified in published GWA studies [12–15] are included on the Affymetrix 6.0 array, and four of them showed an association with breast cancer risk consistent with that reported previously (Supplementary Table S2). Specifically, elevated risk of breast cancer was associated with the minor allele of rs1219648 (FGFR2, Ptrend = 0.0025), rs2981582 (FGFR2, Ptrend = 0.001), rs3803662 (TNRC9, Ptrend = 0.012), and rs8051542 (TNRC9, Ptrend = 0.098). No apparent association, however, was found for rs3817198 (LSP1, Ptrend = 0.75), and the association with rs2180341 (6q22.33, Ptrend = 0.068) was in the opposite direction to that reported initially in a study of the Ashkenazi Jewish population [15].
Genome-wide association results in the Shanghai Breast Cancer Study. Scatter plot of P-values (log scale) from the trend test for 607,728 genotyped SNPs comparing 1,505 cases and 1,522 controls.
For our fast-track replication, the 29 most promising SNPs were genotyped in an independent set of 1,554 cases and 1,576 controls recruited in the SBCS. These SNPs were selected from those that 1) had a MAF ≥ 10%; 2) had very clear genotyping clusters; 3) had not previously been confirmed as a genetic risk variant for breast cancer; and 4) had either P ≤ 1 × 10⁻⁴ for all samples along with a consistent association at P ≤ 0.05 in both the first batch (754 cases/741 controls) and the second batch (751 cases/781 controls), or P ≤ 5 × 10⁻⁴ for all subjects along with a consistent association at P ≤ 0.01 in both batches.
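The selection rule above combines three filters with a two-tier P-value criterion, which can be easier to follow as code. The sketch below is illustrative only: the function name, its parameters, and the `same_direction` flag (used here as one reading of "consistent association", meaning the same direction of effect in both batches) are assumptions, and the P-values in the usage lines are invented. The thresholds are those given in the text.

```python
def selected_for_fast_track(maf: float,
                            clear_clusters: bool,
                            previously_confirmed: bool,
                            p_all: float,
                            p_batch1: float,
                            p_batch2: float,
                            same_direction: bool) -> bool:
    """Apply the four fast-track selection criteria to one SNP."""
    # Criteria 1-3: MAF >= 10%, clear clusters, not a previously confirmed variant
    if maf < 0.10 or not clear_clusters or previously_confirmed:
        return False
    # "Consistent association" is interpreted as the same direction in both batches
    if not same_direction:
        return False
    # Criterion 4, first alternative: stronger overall signal, looser per-batch cutoff
    tier_a = p_all <= 1e-4 and p_batch1 <= 0.05 and p_batch2 <= 0.05
    # Criterion 4, second alternative: weaker overall signal, stricter per-batch cutoff
    tier_b = p_all <= 5e-4 and p_batch1 <= 0.01 and p_batch2 <= 0.01
    return tier_a or tier_b


# Invented examples
strong_overall = selected_for_fast_track(0.35, True, False, 5e-5, 0.03, 0.04, True)
strong_batches = selected_for_fast_track(0.35, True, False, 3e-4, 0.005, 0.008, True)
neither_tier = selected_for_fast_track(0.35, True, False, 3e-4, 0.03, 0.04, True)
```

The two tiers trade off against each other: a SNP with a less extreme pooled P-value can still be selected if the evidence within each batch is stronger.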
Of the 29 SNPs included in fast-track replication (Supplementary Table S3), four showed a significant association with breast cancer risk at P ≤ 0.05 in Stage II, and a fifth had a P-value of 0.077. A highly significant association with breast cancer risk was identified for rs2046210 (P = 3.9 × 10⁻⁵) and rs10872676 (P = 1.6 × 10⁻³). Both SNPs are located at 6q25.1, approximately 4.4 kb apart, and are in high linkage disequilibrium (LD) (r² = 0.69). Because rs10872676 showed a weaker association with breast cancer risk than rs2046210, and its association was not statistically significant after adjusting for rs2046210, rs2046210 was selected for further validation.
Summary results for five SNPs showing a promising association with breast cancer risk in Stage II
Four SNPs were further evaluated in Stage III, which included 3,472 cases recruited between 2002 and 2006 as part of the Shanghai Breast Cancer Survivor Study (SBCSS), along with 900 healthy women from the same source population who had been recruited as the control group for a population-based endometrial cancer study conducted in parallel with the SBCS. Again, rs2046210 was associated with breast cancer risk (P = 3.3 × 10⁻⁷), and the P-value reached 2.0 × 10⁻¹⁵ in the pooled analysis of samples from all three stages. This P-value is substantially lower than the genome-wide significance level based on a conservative Bonferroni adjustment for multiple comparisons at α = 0.05, providing unequivocal evidence for an association of this SNP with breast cancer risk. This SNP was associated with a population attributable risk (PAR) of 18.9% and an estimated 2.1% excess familial risk of breast cancer. The positive association of this SNP with breast cancer risk was found in both pre- and post-menopausal women, and the association was stronger for ER-negative than for ER-positive cancer (P = 0.02). None of the other three SNPs, however, was replicated in Stage III.
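The Bonferroni comparison mentioned above can be checked with a few lines of arithmetic. This is a sketch under one assumption that the text does not state explicitly: that the number of tests equals the 607,728 SNPs analysed on the Affymetrix 6.0 array. The constant and function names are illustrative.

```python
ALPHA = 0.05          # family-wise significance level given in the text
N_TESTS = 607_728     # assumption: one test per SNP analysed on the 6.0 array

# Bonferroni-adjusted per-SNP threshold: alpha divided by the number of tests
bonferroni_threshold = ALPHA / N_TESTS


def is_genome_wide_significant(p_value: float) -> bool:
    """True if p_value falls at or below the Bonferroni-adjusted threshold."""
    return p_value <= bonferroni_threshold


pooled_p = 2.0e-15    # pooled P-value for rs2046210 reported in the text
print(f"threshold = {bonferroni_threshold:.2e}")
print(f"pooled P below threshold: {is_genome_wide_significant(pooled_p)}")
```

Under this assumption the threshold is on the order of 10⁻⁷ to 10⁻⁸, so the pooled P-value of 2.0 × 10⁻¹⁵ lies several orders of magnitude below it.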
Association of rs2046210 with breast cancer risk among Chinese women in the pooled analysis of cases and controls included in all three stages of the Shanghai Studies.
Figure 2 shows the 6q25.1 locus where rs2046210 is located. A cluster of SNPs in strong LD with rs2046210 all showed a significant association with breast cancer risk, with P ≤ 0.001 in Stage I. Using data from Stage I, haplotype analyses of a haplotype block including rs2046210 and 7 other SNPs, as defined by the method of Gabriel [16], or of a larger block including 7 additional SNPs, failed to identify any particular SNP that may explain the observed association at this locus (Supplementary Table S4).
Figure 2. Regional plot of the chromosome 6q25.1 locus. Results (−log P) are shown for directly genotyped (diamonds) and imputed (circles) SNPs in the region 151.5–152.5 Mb, which extends 500 kb on either side of SNP rs2046210. SNP rs2046210 is shown in blue.
We also evaluated SNP rs2046210 in association with breast cancer risk among 1,590 cases and 1,466 controls of European ancestry recruited as part of the Nashville Breast Health Study (NBHS), a population-based case-control study conducted in Tennessee, USA. Consistent with the findings from the Shanghai studies, the variant allele of this SNP was associated with an elevated risk of breast cancer, and the association was stronger in post- than in pre-menopausal women.
Association of rs2046210 with breast cancer risk among women of European ancestry in the Nashville Breast Health Study.
Several genes are located in the 1 Mb region centered on SNP rs2046210, including PLEKHG1, MTHFD1L, AKAP12, ZBTB2, RMND1, C6orf211, C6orf97, ESR1, C6orf98, SYNE1, and NANOGP11. Of these, the ESR1 gene is perhaps of particular interest for breast carcinogenesis. ESR1 encodes estrogen receptor α (ERα), which regulates signal transduction of estrogen, a sex hormone that plays a central role in the etiology of breast cancer. Elevated estrogen levels have been associated with an increased risk of breast cancer in multiple prospective studies [17]. Since the biological effects of estrogen are mediated primarily through high-affinity binding to ERs, genetic variants in ER genes, including ESR1, have been the focus of multiple previous epidemiologic studies [18–21]. The identified SNP (rs2046210) associated with breast cancer risk is located 29 kb upstream of the first untranslated exon and 180 kb upstream of the transcription start site of exon 1 of the ESR1 gene. None of the SNPs at this locus has previously been reported to be associated with breast cancer, nor is any in LD with two of the most widely studied polymorphisms in ESR1: rs2234693 and rs9340799 (r² < 0.05 in both HapMap Asian and European-descent samples). SNP rs2234693 was genotyped in Stage I of the study and carried an OR (95% CI) of 0.95 (0.80–1.12) for the C/T genotype and 0.79 (0.63–1.00) for the T/T genotype in relation to breast cancer risk. Given the relatively close proximity of rs2046210 to the ESR1 gene and the biological function of ERα, it is possible that rs2046210 or SNPs in LD with it alter ESR1 gene expression and affect susceptibility to breast cancer. It is noteworthy that a recent GWA study found that the 6q25.1 locus is associated with bone mineral density [23], a phenotype that is affected by estrogen.
SNP rs2046210 is located 6 kb downstream of C6orf97, chromosome 6 open reading frame 97. The function of C6orf97 is unknown. The LD block that includes rs2046210 spans a region of about 41 kb (151,971,942 to 152,013,380), which contains part of C6orf97. A BLAST search with the C6orf97 coding peptide as the query sequence found a structural maintenance of chromosomes (SMC) domain in the C-terminal region of the C6orf97 protein. SMC proteins appear to play an important role in chromosome dynamics [24]. Further research into the function of C6orf97 and its potential association with breast cancer may be warranted.