SNPs in intron two of FGFR2 have been reported to be significantly associated with risk of BC in European (1), Asian (1), Ashkenazi Jewish (20) and Israeli populations (21). The present study extends this association to women of African American ethnicity. Of the SNPs genotyped in all African American subjects, SNPs rs10736303 and rs2981578 showed the most significant associations, and their estimated per-allele ORs were similar to those found in European and Asian subjects (1). A recent meta-analysis found that SNP rs2981582 was more strongly associated with ER-positive than ER-negative tumors and with PR-positive than PR-negative tumors (22). A similar, although not significant, pattern was observed in the African American studies presented here. These results strongly suggest that the same causal variant(s) are present in all three populations and confer similar risks of BC in all populations.
Women from Africa and Eastern Asia have lower age-adjusted world incidence rates of BC (<40 per 100 000 in most countries) than non-Hispanic White women in the USA (98.5 per 100 000 women). The age-adjusted world incidence rate for black women in the USA (81.7 per 100 000 women) is, however, closer to that of US non-Hispanic Whites than to that of Africans (23). The frequencies of the risk-associated alleles in FGFR2 are similar in European and Asian populations, but substantially higher in African Americans. On the basis of these frequencies, the BC risk in African Americans would be predicted to be 16% higher than in Europeans. The fact that the observed risk is lower is consistent with the extensive evidence that the international variation in the risk of BC is driven mainly by different lifestyle and environmental exposures rather than differing genetic factors (24).
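The kind of frequency-based prediction described above can be illustrated with a minimal sketch. It assumes a single risk allele that acts multiplicatively (log-additive model) with the same per-allele relative risk in both populations, and genotypes in Hardy-Weinberg proportions. The function names and any input values are hypothetical placeholders, not figures from this study, and this is not necessarily how the 16% estimate was derived.

```python
def mean_relative_risk(p, rr):
    """Population-average risk relative to non-carriers of the risk allele.

    p  : risk-allele frequency (assumed Hardy-Weinberg proportions)
    rr : per-allele relative risk (assumed multiplicative across alleles)

    Genotype frequencies are (1-p)^2, 2p(1-p), p^2 with relative risks
    1, rr, rr^2, so the average simplifies to (1 - p + p*rr)^2.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    if rr <= 0.0:
        raise ValueError("relative risk must be positive")
    return (1.0 - p + p * rr) ** 2


def predicted_risk_ratio(p_pop_a, p_pop_b, rr):
    """Predicted risk in population A relative to population B.

    Only the allele frequency differs between the two populations;
    all other risk factors are assumed identical (a strong assumption).
    """
    return mean_relative_risk(p_pop_a, rr) / mean_relative_risk(p_pop_b, rr)
```

A ratio above 1 from `predicted_risk_ratio` means that the allele-frequency difference alone would predict a higher risk in population A; comparing such a prediction with observed incidence is what motivates the argument about non-genetic factors.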
African Americans represent a population of mixed ethnicity, and it has not been possible to assess the ancestral composition of study subjects. However, the allele frequencies and risk estimates were similar in the four component studies, suggesting that the risk estimates should be broadly applicable to all African American women.
The confirmed disease association combined with the pattern of LD within the African American population indicates that African American case–control studies may help eliminate variants that cannot be eliminated in Europeans due to strong SNP correlation. We were indeed able to eliminate one candidate FGFR2 SNP, rs35054928, by combining African American case–control data with data from European and Asian studies. SNP rs2981575 could not be excluded in the present analysis; however, assuming it is not the causal variant, it should be possible to exclude this SNP with larger studies in African or African American women, as it is weakly correlated with the other candidate SNPs. The remaining five candidate-causal SNPs (rs7895676, rs10736303, rs2912781, rs2912778, rs2981578), however, are strongly correlated (all r² > 0.81) in all populations examined so far. It will therefore be difficult (and perhaps impossible) to separate their effects using epidemiological studies, even with much larger sample sizes. Additionally, the two variants (rs35393331 and rs33971856) that could not be directly evaluated in any study remain candidate-causal variants, leaving in total eight candidate variants.
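The pairwise SNP correlation referred to above is the standard LD measure, the squared correlation between alleles at two loci. The sketch below computes it from haplotype and allele frequencies; the function name and arguments are illustrative, and it assumes phased haplotype frequencies are available, which need not match how LD was estimated in this study.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Squared correlation between two biallelic SNPs.

    p_ab : frequency of the haplotype carrying allele A at SNP 1
           and allele B at SNP 2
    p_a  : frequency of allele A at SNP 1
    p_b  : frequency of allele B at SNP 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic (0 < freq < 1)")
    # D is the deviation of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    return (d * d) / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
```

Values near 1 mean the two SNPs carry almost the same information, so an association study cannot distinguish their effects; values near 0 mean their effects can in principle be separated given enough samples.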
Given the location of the associated SNPs, in intron two of the gene, the most plausible hypothesis is that the association with risk is mediated through differential expression of FGFR2. Recent analyses by Meyer et al. indicate that the rare allele of the marker-SNP rs2981582 in Europeans is associated with increased FGFR2 expression in breast tissue. They also report that candidate SNP rs2981578, the most strongly associated SNP in the present analysis, alters binding of transcription factors Oct-1/Runx2. Furthermore, Meyer et al. report that luciferase assays in transfected cell lines showed that the minor allele of rs2981578 increased transcription 2–5-fold over the common allele. A second SNP, rs7895676, was found to alter binding of the C/EBPβ transcription factor in vitro, on naked DNA. However, when assaying occupancy at this site in vivo using chromatin immunoprecipitation, only a very slight difference was seen between the two alleles.
On the assumption that susceptibility is mediated through binding of transcription factors, a causative variant would be expected to reside in a region of open chromatin. Of the eight remaining candidate variants, we have shown that only two (rs2912778 and rs2981578) lie in a region of highly accessible chromatin. Five SNPs lie in chromatin that is in a more closed conformation, whereas one SNP (rs35393331) is in a region that is not covered by our probe set due to the repetitive nature of the surrounding sequence, and therefore cannot be assessed. The region of open chromatin includes SNP rs2981578, the most strongly associated variant in the present study, consistent with its association with increased transcription noted above. Conversely, rs7895676, the other SNP suggested as a potentially important transcription factor binding site by Meyer et al., lies in a region of less open chromatin. Our current analyses, therefore, point to the region surrounding rs2981578 as the most critical. However, it is important to note that DNase hypersensitive sites are dynamic and can change in response to cell signaling or during development (27).
Evolutionary conservation of the sequence surrounding each SNP may also give an indication of the relative evolutionary importance of the eight remaining variants, all of which are non-coding. In a previous evaluation of conservation across six placental mammalian species and the opossum, SNPs rs10736303 and rs2981578 were the only two of the eight remaining variants conserved across all six placental mammals (1). Using the University of California Santa Cruz (UCSC) human genome browser (http://genome.ucsc.edu/), we also considered conservation scores from the alignment of 28 vertebrate species and the subset of 17 placental mammals (28). SNP rs2981578 (the most strongly associated variant in the present study) had a conservation score of >0.75 within 50 bp on either side using both species alignment measurements, whereas none of the other seven SNPs reached this score using either measurement. Thus, for SNP rs2981578, conclusions from the genetic, biochemical and conservation analyses all converge, increasing the probability that this is a causal variant for BC.
It should be noted that the present analysis has focused on common SNPs with minor allele frequency >5%. The possibility remains that these SNPs also tag other types of variants (such as multi-allelic repeats) or rarer variants, which were not specifically targeted during the resequencing effort.
For the four SNPs that were not genotyped in all studies, the imputation of genotypes also introduces a source of uncertainty. To evaluate the accuracy of genotype imputation, we calculated the estimated r² between the imputed and actual SNP genotypes (see Materials and Methods). In all three ethnic groups, all imputed SNPs had estimated r² values >0.90, with the exception of two SNPs in the Asian studies (rs2912781 with r² = 0.84 and rs2912778 with r² = 0.87). The imputation approach assumes that the reference panel of genotyped SNPs provides haplotype frequencies representative of those in the populations in which SNPs are being imputed. In our study, SNPs genotyped in the Seoul Breast Cancer Project were used as a reference panel for the imputation in the Asian studies, which could bias the results should the Korean haplotype frequencies differ greatly from those of the other Asian populations. In fact, we found that SNP allele frequencies were similar within this broad ethnic group, suggesting that the use of Korean haplotypes for the imputation was reasonable.
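One common way to quantify imputation accuracy, when true genotypes are available for at least a subset of subjects, is the squared Pearson correlation between imputed allele dosages and observed genotype counts. The sketch below illustrates that measure only; the function name is hypothetical, and the estimated r² used in this study is defined in Materials and Methods and may be computed differently.

```python
def imputation_r_squared(dosages, genotypes):
    """Squared Pearson correlation between imputed and observed genotypes.

    dosages   : imputed expected allele counts, each in [0, 2]
    genotypes : observed allele counts (0, 1 or 2) for the same subjects
    """
    n = len(dosages)
    if n != len(genotypes) or n < 2:
        raise ValueError("need two equal-length sequences with >= 2 subjects")

    mean_d = sum(dosages) / n
    mean_g = sum(genotypes) / n

    # Sums of squares and cross-products around the means.
    cov = sum((d - mean_d) * (g - mean_g) for d, g in zip(dosages, genotypes))
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    if var_d == 0 or var_g == 0:
        raise ValueError("r^2 is undefined when either input is constant")

    return (cov * cov) / (var_d * var_g)
```

Because the measure is a correlation, it is insensitive to a uniform shrinkage of dosages toward the mean; it reflects how well imputed values rank subjects by genotype rather than whether the dosages are calibrated.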
This fine-scale mapping work highlights the complexity involved in pinpointing causal variants within loci identified by GWAS, and the utility of combining genetic mapping with data on sequence context, biochemical and functional studies. Although the identification of causative variants, particularly in non-coding regions, poses a great challenge, such efforts are critical for furthering our understanding of the genetic architecture of complex genetic diseases.