Genome-wide association studies have identified FGFR2 as a breast cancer (BC) susceptibility gene in populations of European and Asian descent, but a causative variant has not yet been conclusively identified. We hypothesized that the weaker linkage disequilibrium across this associated region in populations of African ancestry might help refine the set of candidate-causal single nucleotide polymorphisms (SNPs) previously identified by our group. Eight candidate-causal SNPs were evaluated in 1253 African American invasive BC cases and 1245 controls. A significant association with BC risk was found with SNP rs2981578 (unadjusted per-allele odds ratio = 1.20, 95% confidence interval 1.03–1.41, Ptrend = 0.02), with the odds ratio estimate similar to that reported in European and Asian subjects. To extend the fine-mapping, genotype data from the African American studies were analyzed jointly with data from European (n = 7196 cases, 7275 controls) and Asian (n = 3901 cases, 3205 controls) studies. In the combined analysis, SNP rs2981578 was the most strongly associated. Five other SNPs were too strongly correlated to be excluded at a likelihood ratio of < 1/100 relative to rs2981578. Analysis of DNase I hypersensitive sites indicated that only two of these map to highly accessible chromatin, one of which, SNP rs2981578, has previously been implicated in up-regulating FGFR2 expression. Our results demonstrate that the association of SNPs in FGFR2 with BC risk extends to women of African American ethnicity, and illustrate the utility of combining association analysis in datasets of diverse ethnic groups with functional experiments to identify disease susceptibility variants.
Two recent genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) within the FGFR2 gene to be associated with breast cancer (BC) risk (1,2). This was the strongest susceptibility locus in both studies (and hence the strongest known common susceptibility locus for BC): the minor alleles of the associated SNPs confer a per-allele risk of ~1.3-fold. FGFR2 is located on chromosome 10q and encodes fibroblast growth factor receptor 2, a receptor tyrosine kinase. It is amplified and overexpressed in breast tumors (3–5), and several lines of experimental evidence in humans and mice support that it is a BC oncogene (6–11).
While the GWAS provide overwhelming evidence that FGFR2 is a BC susceptibility locus, the associated SNPs may not be causally related to risk, as they are simply marker-SNPs correlated [in strong linkage disequilibrium (LD)] with other variants within the region. An important problem, therefore, is the identification of the causal variant or variants. Identification of the causal variant(s) will improve the accuracy of risk prediction (because the causal variants will be more strongly correlated with risk than other associated SNPs) and will constitute an important step towards understanding the functional basis of the disease susceptibility.
In this manuscript, we describe two parallel approaches to identifying causal variants in this intronic gene region: (i) a genetic epidemiological approach using fine-scale mapping with BC case–control studies from populations of European, Asian and African descent and (ii) biochemical probing to determine whether candidate SNPs reside in open chromatin, indicating an increased probability of functionality. The first approach has previously been applied to this locus using case–control studies from populations of European and Asian ancestry (1). Here we extend that work to investigate candidate-causal variants in FGFR2 in studies of African American women, a population for which there was, to date, no information on the association between FGFR2 variants and BC risk. Assuming that this association existed in African Americans, we aimed to utilize this population's different pattern of LD, which is typically weaker than in European or Asian populations (12–15), to refine further the set of possible causal variants.
The initial marker-SNPs identified in both genome-wide studies fell in a 25 kb LD block almost entirely within intron 2 of FGFR2 (Fig. 1). The most strongly associated GWAS marker-SNP found by Easton et al. (1) was rs2981582. Haplotype analyses indicated that multiple haplotypes carrying the minor allele of rs2981582 were associated with risk, suggesting strongly that the causal variant was a common SNP strongly correlated with rs2981582. A catalog of all common variants in this region was generated by resequencing 45 subjects of European origin, allowing the detection of all variants with minor allele frequency ≥ 5% with > 97% probability. Of the 117 variants identified, 29 were highly correlated (r2 > 0.6) with rs2981582. In the previous analyses by Easton et al., 27 of the 29 SNPs were evaluated in multiple populations of European and Asian origin (two variants, rs35393331 and rs33971856, could not be assayed). In the combined analysis of European and Asian datasets, the most strongly associated SNP was rs7895676. Assuming a single causal variant and using the likelihood ratio test, 21 SNPs could be excluded as the causal variant at likelihood ratios <1/100 relative to rs7895676 (with 19 of these SNPs having ratios <1/1000) (1). Functional work on the eight remaining variants (including the two variants that could not be genotyped in the epidemiological studies) found that SNPs rs2981578 and rs7895676 affected binding of transcription factors Oct-1/Runx2 and C/EBPβ, respectively (16). Nevertheless, there remains uncertainty as to which variant(s) directly increases BC risk.
In this study, we have extended the association studies of SNPs in FGFR2 to women of African American ethnicity. We have utilized case–control data from African American, Asian and European populations to identify the possible causal variants. Finally, we have used DNase I hypersensitivity assays to identify SNPs in accessible chromatin, as a method for identifying likely functional SNPs.
The six SNPs identified as potentially causal for BC by Easton et al. (1) (rs7895676, rs10736303, rs2912781, rs2912778, rs2981578 and rs35054928) together with SNPs rs2981575 and rs2981582 were included in this study. SNP rs2981575 was not definitively excluded in the fine-scale mapping work in European and Asian populations performed by Easton et al., and it was one of the most strongly associated variants in Europeans (1). SNP rs2981582, the most significant GWAS marker-SNP, was excluded in the analyses by Easton et al. and is included here for comparison with other studies.
Of the eight SNPs considered in this analysis, five could be genotyped directly in the case–control studies using TaqMan® assays. From the four African American studies, a total of 1253 cases of invasive BC and 1245 controls were genotyped for the five SNPs (Table 1). The allele frequencies in African Americans differed markedly from those in Europeans and Asians; for four of the five SNPs, the minor (risk) allele in Europeans was the major allele in African Americans (Table 2). SNP allele frequencies did not differ by >15% across African American study centers (Supplementary Material, Table S1).
Significant associations with BC risk were observed for two of the five SNPs, rs10736303 and rs2981578 [unadjusted per-allele odds ratios (ORs) with 95% confidence intervals (CIs) of 1.17 (1.00–1.37), P = 0.046, and 1.20 (1.03–1.41), P = 0.021, respectively]. For all SNPs, the risk allele in African Americans was the same as in European and Asian populations. The two significant associations remained after adjusting for age at diagnosis and family history of BC (Table 2). The per-allele ORs for SNPs rs10736303 and rs2981578 did not vary significantly by age at diagnosis (data not shown).
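As a rough consistency check on the figures quoted above, a two-sided Wald P-value can be back-calculated from a reported OR and its 95% CI. The sketch below assumes the CI is symmetric on the log scale; the function name `wald_p_from_or_ci` is introduced here for illustration only. Because the published estimates are rounded, the recovered P-values only approximate the reported ones.

```python
import math

def wald_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided Wald P-value from an OR and its 95% CI.

    Assumes the interval was built as exp(log(OR) +/- z_crit * SE).
    """
    # Standard error of log(OR), recovered from the CI width on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail area of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Rounded values quoted in the text for the two associated SNPs.
for snp, (or_, lo, hi) in {
    "rs10736303": (1.17, 1.00, 1.37),
    "rs2981578": (1.20, 1.03, 1.41),
}.items():
    z, p = wald_p_from_or_ci(or_, lo, hi)
    print(f"{snp}: z = {z:.2f}, approximate P = {p:.3f}")
```

Both recovered P-values land close to 0.05 and 0.02, respectively, in line with the reported values once rounding of the OR and CI limits is taken into account.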
SNPs rs10736303 and rs2981578 were analyzed further to see if the associations varied by tumor characteristics. Estrogen receptor (ER) status was available for 82% of cases, with 639 ER-positive and 389 ER-negative tumors. Progesterone receptor (PR) status was available for 72% of cases, with 499 PR-positive and 397 PR-negative tumors. Both SNPs rs10736303 and rs2981578 demonstrated stronger associations with ER-positive tumors [unadjusted per-allele ORs = 1.24 (95% CI 1.02–1.51) and 1.26 (95% CI 1.03–1.52), respectively] than ER-negative tumors [ORs = 1.04 (95% CI 0.83–1.31) and 1.07 (95% CI 0.85–1.35) (Table 3)]. However, the association with ER-positive tumors was not significantly different from that with ER-negative tumors. Similarly, a stronger association was observed for both rs10736303 and rs2981578 with PR-positive tumors [ORs = 1.28 (95% CI 1.03–1.58) and 1.30 (95% CI 1.05–1.61), respectively] versus PR-negative tumors [ORs = 1.02 (95% CI 0.82–1.28) and 1.04 (95% CI 0.87–1.24)], but again these differences were not significant.
Four of the five SNPs were genotyped in all four African American studies (rs10736303, rs2981578, rs35054928 and rs2981575). Haplotype analysis based on these SNPs identified three haplotypes significantly associated with increased risk of BC, including one unique to African Americans (Table 4). All three haplotypes contained the risk alleles of the two associated SNPs rs10736303 and rs2981578. A fourth rare haplotype also carrying these two risk alleles was not significantly associated with risk, but the 95% CI included the ORs of the other risk haplotypes. The three risk haplotypes were associated with risk in all three populations, although only one achieved statistical significance in Europeans.
In the previous fine-scale mapping analysis (1), 27 of the 29 variants were evaluated in European and Asian case–control studies (Supplementary Material, Table S2a). The other two variants (rs35393331 and rs33971856) could not be genotyped by high-throughput methods and so were not considered further in that study nor this one; thus they remain on the list of candidate-causal variants. Following the previous analysis, 21 SNPs could be excluded as causal variants based on likelihood ratios <1/100 relative to the most significantly associated SNP, rs7895676; likelihood ratios were <1/1000 for 19 of the 21 SNPs (1). One SNP, rs2981575, was only marginally excluded by this analysis (likelihood ratio = 1/224) and was one of the most significant variants in Europeans. This SNP together with the six remaining candidate SNPs (rs7895676, rs10736303, rs2912781, rs2912778, rs2981578 and rs35054928) were evaluated further in this study as seven candidate-causal SNPs. Additionally, we included the GWAS marker-SNP rs2981582 for reference.
The SNPs amenable to TaqMan® analyses (rs10736303, rs2981578, rs35054928, rs2981575 and rs2981582) were genotyped in the African American studies (as described previously) together with additional European and Asian cases and controls (Table 5). The genotypes of an additional three SNPs which were not amenable to high-throughput genotyping methods (rs7895676, rs2912781 and rs2912778) were determined in 95 samples from the CARE-AA study by bi-directional sequencing. These data were used to impute genotypes for all African American women (see Materials and Methods).
Haplotype frequencies were calculated for the seven candidate-causal SNPs (Table 6). One haplotype not present in Europeans and Asians at frequency ≥1% was identified in African Americans. As expected, the correlations between the SNPs in African Americans were weaker than those observed in Asians or Europeans (Supplementary Material, Table S4).
Using these new data in combination with the existing European and Asian data, likelihoods were calculated for each variant, based on the assumption that there is one single disease-causing allele. The most strongly associated SNP was rs2981578, and likelihood ratios were calculated for the other SNPs relative to it (Table 7). As in the previous analysis in European and Asian women (1), rs2981582, the GWAS marker-SNP included in this study for reference purposes, could be definitively excluded as the causal variant at a likelihood ratio of <1/100 relative to rs2981578. Of the seven candidate-causal SNPs, rs35054928 had a likelihood ratio of <1/100 relative to rs2981578, indicating that it was unlikely to be the causal variant. SNP rs2981575, however, which had been tentatively excluded (likelihood ratio of 1/224) by the previous analysis (1), had a likelihood ratio of 1/30 in this extended analysis and could no longer be excluded at this level. The other five SNPs, including rs7895676 which was the most strongly associated SNP in the previous analysis, were too highly correlated with each other in the three populations and could not be excluded from being the causative variant (based on a likelihood ratio of >1/100 relative to rs2981578).
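The exclusion rule used here can be illustrated with a short sketch. It assumes that a maximized log-likelihood is available for each SNP under the single-causal-variant model, and it flags any SNP whose likelihood relative to the best-supported SNP falls below 1/100. The SNP labels and log-likelihood values below are hypothetical placeholders chosen to reproduce ratios of 1/30 and 1/224; they are not the study's fitted values.

```python
import math

def likelihood_ratios(log_likelihoods):
    """Likelihood of each SNP relative to the best-supported SNP."""
    best = max(log_likelihoods.values())
    return {snp: math.exp(ll - best) for snp, ll in log_likelihoods.items()}

def excluded_snps(log_likelihoods, threshold=1.0 / 100.0):
    """SNPs whose likelihood ratio versus the top SNP is below the threshold."""
    ratios = likelihood_ratios(log_likelihoods)
    return sorted(snp for snp, lr in ratios.items() if lr < threshold)

# Hypothetical log-likelihoods (illustrative only, not taken from Table 7).
example = {
    "snp_top": -1000.0,
    "snp_ratio_1_in_30": -1000.0 + math.log(1.0 / 30.0),
    "snp_ratio_1_in_224": -1000.0 + math.log(1.0 / 224.0),
}

for snp, lr in likelihood_ratios(example).items():
    print(f"{snp}: likelihood ratio = 1/{1.0 / lr:.0f}")
print("excluded at <1/100:", excluded_snps(example))
```

With these placeholder values, only the SNP at 1/224 is excluded, which mirrors how rs2981575 moved from excluded (1/224) to not excluded (1/30) between the two analyses.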
Functional assays may offer additional insight about SNPs that are too strongly correlated to be distinguished by epidemiological approaches. We surmised that any SNP functioning in gene regulation would lie in a region of the chromatin that is accessible to regulatory factors. By probing the chromatin with DNase I, we were able to identify regions of the chromatin that are in an open configuration without having to make assumptions about the nature of the proteins involved (17,18). To this end, we used two BC cell lines, PMC42 and MCF-7 (both expressing the ER), and the prostate cell line RWPE-1, and generated DNase I hypersensitivity profiles for the region of interest using two different concentrations of DNase I (Fig. 2). As expected, we found the FGFR2 promoter to be in an open configuration, serving as a positive control in our experiment. Of the eight remaining candidate-causal SNPs, only two mapped to a region of the FGFR2 intron that is highly accessible in breast cell lines, including the most highly associated SNP rs2981578. SNP rs35393331 lies in a 1 kb region of repetitive sequence that is not covered by probes on our microarray, and its status is therefore undetermined. The current analysis suggests that the remaining SNPs lie in predominantly closed chromatin. Interestingly, the two breast cell lines as well as the prostate cell line display very similar regions of open chromatin, with the most open stretch overlapping regions of sequence conservation, further strengthening the supposition that this region contains regulatory elements.
SNPs in intron two of FGFR2 have been reported to be significantly associated with risk of BC in European (1,2), Asian (1,19), Ashkenazi Jewish (20) and Israeli populations (21). This present study extends this association to women of African American ethnicity. Of the SNPs genotyped in all African American subjects, SNPs rs10736303 and rs2981578 showed the most significant associations, and their estimated per-allele ORs were similar to those found in European and Asian subjects (1). A recent meta-analysis found that SNP rs2981582 was more strongly associated with ER-positive than ER-negative tumors and PR-positive than PR-negative tumors (22). A similar, although not significant, pattern was observed in the African American studies presented here. These results suggest strongly that the same causal variant(s) are present in all three populations and confer similar risks of BC in all populations.
Women from Africa and Eastern Asia have lower age-adjusted world incidence rates of BC (<40 per 100 000 in most countries) than non-Hispanic White women in the USA (98.5 per 100 000 women). The age-adjusted world incidence rate for black women in the USA (81.7 per 100 000 women) is, however, closer to that of US non-Hispanic Whites than Africans (23). The frequencies of the risk-association alleles in FGFR2 are similar in European and Asian populations, but substantially higher in African Americans. On the basis of these frequencies, the BC risk in African Americans would be predicted to be 16% higher than in Europeans. The fact that risk is lower is consistent with the extensive evidence that the international variation in the risk of BC is driven mainly by different lifestyle and environmental exposures rather than differing genetic factors (24–26).
African Americans represent a population of mixed ethnicity, and it has not been possible to assess the ancestral composition of study subjects. However, the allele frequencies and risk estimates were similar in the four component studies, suggesting that the risk estimates should be broadly applicable to all African American women.
The confirmed disease association combined with the pattern of LD within the African American population indicates that African American case–control studies may help eliminate variants which cannot be eliminated in Europeans due to strong SNP correlation. We were indeed able to eliminate one candidate FGFR2 SNP, rs35054928, by combining African American case–control data with data from European and Asian studies. SNP rs2981575 could not be excluded in the present analysis; however, assuming it is not the causal variant, it should be possible to exclude this SNP with larger studies in African or African American women, as it is weakly correlated with the other candidate SNPs. The remaining five candidate-causal SNPs (rs7895676, rs10736303, rs2912781, rs2912778, rs2981578), however, are strongly correlated (all r2 > 0.81) in all populations examined so far. It will therefore be difficult (and perhaps impossible) to separate their effects using epidemiological studies, even with much larger sample sizes. Additionally, the two variants (rs35393331 and rs33971856) that could not be directly evaluated in any study remain candidate-causal variants, leaving in total eight candidate variants.
Given the location of the associated SNPs, in intron two of the gene, the most plausible hypothesis is that the association with risk is mediated through differential expression of FGFR2. Recent analyses by Meyer et al. indicate that the rare allele of the marker-SNP rs2981582 in Europeans is associated with increased FGFR2 expression in breast tissue. They also report that candidate SNP rs2981578, the most strongly associated SNP in the present analysis, alters binding of transcription factors Oct-1/Runx2. Furthermore, Meyer et al. (16) report that luciferase assays in transfected cell lines showed that the minor allele of rs2981578 increased transcription 2–5-fold over the common allele. A second SNP, rs7895676, was found to alter binding of the C/EBPβ transcription factor in vitro, on naked DNA. However, when assaying occupancy at this site in vivo using chromatin immunoprecipitation, only a very slight difference was seen between the two alleles.
On the assumption that susceptibility is mediated through binding of transcription factors, a causative variant would be expected to reside in a region of open chromatin. Of the eight remaining candidate variants, we have shown that only two (rs2912778 and rs2981578) lie in a region of highly accessible chromatin. Five SNPs lie in chromatin that is in a more closed conformation, whereas one SNP is in a region that is not covered by our probe set due to the repetitive nature of the surrounding sequence, and therefore cannot be assessed (rs35393331). The region of open chromatin includes SNP rs2981578, the most strongly associated variant in this present study, consistent with its association with increased transcription noted above. Conversely rs7895676, the other SNP suggested as a potentially important transcription factor binding site by Meyer et al., lies in a region of less open chromatin. Our current analyses, therefore, point to the region surrounding rs2981578 as the most critical. However, it is important to note that DNase hypersensitive sites are dynamic and can change in response to cell signaling or during development (27).
Evolutionary conservation of the sequence surrounding each SNP may also give an indication of the relative evolutionary importance of the eight remaining variants, all of which are non-coding. In a previous evaluation of the conservation across six placental mammalian species and the opossum, SNPs rs10736303 and rs2981578 were the only two of the eight remaining variants conserved across all six placental mammals (1). Using the University of California Santa Cruz (UCSC) human genome browser (http://genome.ucsc.edu/), we also considered conservation scores from the alignment of 28 vertebrate species and the subset of 17 placental mammals (28). SNP rs2981578 (the most strongly associated variant in this present study) had a conservation score of >0.75 within 50 bp on either side using both species alignment measurements, whereas none of the other seven SNPs reached this score using either measurement. Thus, for SNP rs2981578, conclusions from the genetic, biochemical and conservation analyses all converge, increasing the probability that this is a causal variant for BC.
It should be noted that the present analysis has focused on common SNPs with minor allele frequency >5%. The possibility remains that these SNPs also tag other types of variants (such as multi-allelic repeats) or rarer variants, which were not specifically targeted during the resequencing effort.
For the four SNPs that were not genotyped in all studies, the imputation of genotypes also introduces a source of uncertainty. To evaluate the accuracy of genotype imputation, we calculated the estimated r2 between the imputed and actual SNP (see Materials and Methods). In all three ethnic groups, all imputed SNPs had estimated r2 values >0.90 with the exception of two SNPs in the Asian studies (rs2912781 with r2 = 0.84 and rs2912778 with r2 = 0.87). The imputation approach assumes that the reference panel of genotyped SNPs provides haplotype frequencies representative of those in the populations in which SNPs are being imputed. In our study, SNPs genotyped in the Seoul Breast Cancer Project were used as a reference panel for the imputation in the Asian studies, which could bias the results should the Korean haplotype frequencies differ greatly from the other Asian populations. In fact, we found that SNP allele frequencies were similar within this broad ethnic group, suggesting that the use of Korean haplotypes for the imputation was reasonable.
This fine-scale mapping work highlights the complexity involved in pinpointing causal variants within loci identified by GWAS, and the utility of combining genetic mapping with data on sequence context, biochemical and functional studies. Although the identification of causative variants, particularly in non-coding regions, poses a great challenge, such efforts are critical for furthering our understanding of the genetic architecture of complex genetic diseases.
African American study subjects were participants in four population-based studies. The first study, abbreviated CARE-AA, includes African American participants from a case–control study conducted in five metropolitan areas of the USA (Atlanta, Detroit, Los Angeles, Philadelphia and Seattle) as part of the National Institute of Child Health and Human Development's Women's Contraceptive and Reproductive Experiences (CARE) Study (29,30). Briefly, cases aged 35–64 years and newly diagnosed with invasive BC between July 1994 and April 1998 were ascertained by the Surveillance, Epidemiology and End Results (SEER) population-based cancer registries at four sites and by field staff monitoring hospital records at the fifth site. Both younger cases and African American cases were intentionally oversampled. Controls were identified through centralized random-digit dialing and matched on study center location, race and age group. An in-person interview on BC risk factors was completed by 4575 cases (76.5% of those eligible) and 4682 controls (78.6% of those eligible). Funding allowances permitted the collection of blood from 33% of interviewed women. Hence, blood collection was requested from all cases and controls with a first-degree family history of BC, plus a random sample of those without a first-degree family history. Of those asked, 1644 (80.2%) cases and 1451 (74.3%) controls provided blood. African American individuals represented 29.7% of cases and 31.6% of controls. A total of 435 African American cases and 452 African American controls aged 35–64 years were available for the present analysis.
The second study included the African American participants in the Los Angeles (CARE-LA) component of the CARE Study (29). As is described below, only individuals whose blood samples were not collected in the CARE-AA study were included in these analyses to ensure no duplicate observations. Participation rates for the CARE-LA cases and controls were >70%. Blood specimens were collected from 82% of cases (n = 397) and 80% of controls (n = 236).
Only cases were available from the third study, the LIFE Los Angeles-based case–control study (31). This study comprised incident invasive BC cases aged 20–49 years, diagnosed between 2000 and 2003 and identified from the Los Angeles SEER registry. The participation rate for African American cases in LIFE was 60%. Further details about the CARE-LA and LIFE studies have been previously described (31–33). On the basis of shared geography and similarity in age of participants as well as allele frequencies of genotyped FGFR2 SNPs, individuals from these studies were combined into one stratum [as has been done previously (33)].
The fourth study, abbreviated MEC-AA, included African American cases and controls primarily from the Multiethnic Cohort Study (MEC) (34). Blood sample collection began in 1994, with participation rates ≥65% for cases and controls. Incident BC cases were identified by cohort linkage to SEER registries and controls were a randomly selected sample of MEC participants frequency-matched on 5-year age group. The present analysis includes 439 invasive BC cases aged 45–83 years diagnosed up to December 2004 and 654 controls free of cancer up to December 2004.
All cases and controls from Los Angeles were assessed to ensure no overlap. A common identification number was used to identify 237 pairs of duplicate samples in CARE-AA and CARE-LA; these were removed from CARE-LA in the analysis. CARE-AA maintained records identifying any participants in the MEC-AA study by comparing the information on name, birth date and address while the studies were in the field; four pairs in CARE-AA that overlapped with MEC-AA were removed from the MEC-AA study.
The two populations of European ethnicity and four of Asian ethnicity that were genotyped for the fine-scale mapping work have been described previously (1). Briefly, SEARCH is a population-based study of cancer in East Anglia involving a three-stage design with a total of 6771 cases of invasive BC and 6840 controls (35). Within the Multiethnic Cohort, described above, the White and Japanese populations were included in this study with 425/435 White cases/controls and 448/394 Japanese cases/controls (34). Finally, three Asian studies, the Seoul Breast Cancer Project (36), the Taiwan Breast Cancer Study (37,38) and the IARC Thai Breast cancer study (39) are each hospital-based case–control studies with 2442/1780, 921/940 and 474/390 cases/controls, respectively. All studies had the relevant institutional review board approval.
SNP genotyping was conducted at individual study centers with the exceptions of Seoul Breast Cancer Project study samples, which were genotyped at Strangeways Research Laboratory in Cambridge, UK and the CARE-AA samples, which were genotyped at Ostrander Laboratory, NHGRI, Bethesda, MD, USA. Genotyping of five SNPs (rs10736303, rs2981578, rs35054928, rs2981575, rs2981582) was carried out using a fluorescent 5' exonuclease assay (TaqMan®) and the ABI PRISM 7900 Sequence Detection System (Applied Biosystems, Foster City, CA, USA) implemented in 384-well format for all the African American studies. The single exception was rs2981582, which was not genotyped in CARE-AA. Each 384-well plate included duplicate samples which demonstrated average concordance >98%. Call rates for each assay were over 95%. Assays were not repeated for failed genotype calls. As in the African American studies, for all other studies the average replication rate for assays was >98%, and frequencies of the SNP genotypes did not deviate from those expected under Hardy–Weinberg equilibrium (P > 0.01).
Three SNPs that were not amenable to the TaqMan® assay (rs7895676, rs2912781 and rs2912778) were genotyped by direct sequencing of the SNP and surrounding region. One 96-well plate of CARE-AA samples was sequenced bidirectionally. PCRs were performed in a 10 µl volume containing 5 ng of genomic DNA, 1 mM dNTPs, 1.5 or 2.5 mM MgCl2, 1 µM of both the forward and the reverse primers, and 0.25 U Amplitaq DNA polymerase (Applied Biosystems). PCR products were treated using the ExoSAP-IT method (USB Corporation, Cleveland, OH, USA). DNA-sequencing reactions were carried out using BigDye Terminator v3.1 Cycle Sequencing Kits (Applied Biosystems) and sequence data were obtained on both forward and reverse PCR primers using the ABI 3730xl Genetic Analyzer (Applied Biosystems). Contigs were assembled using PhredPhrap/Consed (40). Variants were detected using Polyphred (41).
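A Hardy–Weinberg check of the kind mentioned above can be sketched as a 1 d.f. chi-square goodness-of-fit test on the three genotype counts. The text does not state which HWE test was used, so the Pearson chi-square form below is an assumption, and the function name `hwe_chi_square` and the example counts are illustrative only.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 d.f.) for Hardy-Weinberg equilibrium.

    Returns (chi2, p_value) for genotype counts AA, AB, BB.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        # Monomorphic SNP: no deviation can be tested.
        return 0.0, 1.0
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Illustrative counts only (not study data).
chi2, p_value = hwe_chi_square(260, 500, 240)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}, passes P > 0.01: {p_value > 0.01}")
```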
Each of the five SNPs genotyped by TaqMan® was assessed for association with disease status using the Cochran-Armitage 1 d.f. score test. Per-allele ORs and CIs were estimated by logistic regression stratified by study and CARE-AA site using Intercooled Stata v9. Standard tests of homogeneity of ORs for SNP genotypes across the study strata were computed. Significant SNPs (Ptrend < 0.05) were further analyzed adjusting for age at diagnosis (categorized as: <40, 40–44, 45–49, 50–54, 55–59, 60–64, 65–69, 70–74, >75 years) and family history of BC (categorized as at least one or no first-degree relatives with BC). Family history was adjusted for because the estimated ORs would be expected to be higher in families with a history of BC (1). Age was adjusted for because ORs might vary by age, and different study sets had different selection criteria. To assess potential differences in genetic effects associated with receptor status (ER and PR), separate logistic regression models were used to compare receptor positive cases to controls, receptor-negative cases to controls and receptor-positive cases to receptor-negative cases.
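The Cochran–Armitage trend test named above can be sketched for a single unstratified 2 × 3 table of genotype counts, scoring genotypes by the number of risk alleles (0, 1, 2). This is a minimal illustration under those assumptions; it does not reproduce the study's stratified logistic regression, and the counts used are made up.

```python
import math

def cochran_armitage_trend(case_counts, control_counts, scores=(0, 1, 2)):
    """Cochran-Armitage 1 d.f. trend test for a 2 x 3 genotype table.

    case_counts / control_counts: counts for 0, 1 and 2 copies of the allele.
    Returns (chi2, p_value).
    """
    totals = [r + s for r, s in zip(case_counts, control_counts)]
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    n = n_cases + n_controls
    sum_t_r = sum(t * r for t, r in zip(scores, case_counts))
    sum_t_n = sum(t * m for t, m in zip(scores, totals))
    sum_t2_n = sum(t * t * m for t, m in zip(scores, totals))
    # Trend statistic and its variance under the null of no association.
    numerator = n * (n * sum_t_r - n_cases * sum_t_n) ** 2
    denominator = n_cases * n_controls * (n * sum_t2_n - sum_t_n ** 2)
    chi2 = numerator / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail, 1 d.f.
    return chi2, p_value

# Illustrative counts only (not study data).
chi2, p_trend = cochran_armitage_trend((300, 600, 350), (350, 610, 290))
print(f"chi2 = {chi2:.2f}, Ptrend = {p_trend:.4f}")
```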
Sampling weights were developed for the CARE-AA data to account for the non-random selection of women from the study population, as described in greater detail elsewhere (30). When these sampling weights were applied to the FGFR2 SNP data, the OR estimates were very similar to those generated using an unweighted analysis (Supplementary Material, Table S1). For simplicity and consistency with the fine-scale mapping analyses, unweighted analyses are presented in the main text.
For the fine-scale mapping analyses, we first estimated haplotype frequencies using the 95 African American individuals for whom all candidate SNPs were typed, using the haplo.stats package in S-plus (42). The haplotype frequencies were used to impute genotype probabilities for each SNP for each individual. Missed calls for TaqMan® genotyped SNPs (rs10736303, rs2981578, rs35054928, rs2981575) were imputed, as were missing data for SNPs genotyped in a subset of the African American subjects (rs7895676, rs2912781, rs2912778, rs2981582). An expectation-maximization algorithm was then used to fit a logistic regression model, allowing uncertainty in the genotypes of the untyped SNPs and assuming that each SNP in turn was the causal variant. By this method, we were able to calculate the likelihood that each SNP was the causal variant (1). Likewise, in Europeans, missed calls were imputed for SNPs genotyped by all studies (rs10736303, rs2981578, rs2981575, rs2981582) as were missing data for SNPs genotyped in a subset of European subjects (rs7895676, rs2912781, rs2912778, rs35054928). In the Asian studies, missed calls were imputed for SNPs (rs10736303, rs11200014, rs2912780, rs2981578, rs4752571, rs2981575, rs1219642, rs2912774, rs2936870, rs2420946, rs2981582, rs3135718) and for SNPs genotyped in a subset of individuals (rs7895676, rs2912781, rs2912778, rs35054928) (Supplementary Material, Table S2b).
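To make the haplotype-frequency step more concrete, the following is a minimal expectation-maximization sketch for just two biallelic SNPs, where only double heterozygotes have ambiguous phase. It is not the haplo.stats implementation and does not handle the seven or eight SNPs analyzed here; the function name and the genotype counts are hypothetical.

```python
def em_two_snp_haplotypes(genotype_counts, max_iter=500, tol=1e-12):
    """EM estimate of two-SNP haplotype frequencies from unphased genotypes.

    genotype_counts maps (g1, g2) -> number of individuals, where g1 and g2
    are copies (0, 1, 2) of the minor allele at SNP 1 and SNP 2.
    Returns a dict of frequencies for haplotypes (0,0), (0,1), (1,0), (1,1).
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freqs = {h: 0.25 for h in haps}
    n_chrom = 2 * sum(genotype_counts.values())
    for _ in range(max_iter):
        counts = {h: 0.0 for h in haps}
        for (g1, g2), n in genotype_counts.items():
            if g1 == 1 and g2 == 1:
                # E-step: split double heterozygotes between the two phases.
                cis = freqs[(0, 0)] * freqs[(1, 1)]
                trans = freqs[(0, 1)] * freqs[(1, 0)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(0, 0)] += n * w
                counts[(1, 1)] += n * w
                counts[(0, 1)] += n * (1 - w)
                counts[(1, 0)] += n * (1 - w)
            else:
                # Phase is unambiguous: read off the two haplotypes directly.
                first = (g1 // 2, g2 // 2)
                second = ((g1 + 1) // 2, (g2 + 1) // 2)
                counts[first] += n
                counts[second] += n
        # M-step: new frequencies are the expected haplotype proportions.
        new_freqs = {h: counts[h] / n_chrom for h in haps}
        change = max(abs(new_freqs[h] - freqs[h]) for h in haps)
        freqs = new_freqs
        if change < tol:
            break
    return freqs

# Hypothetical genotype counts for 95 individuals (illustrative only).
example_counts = {(0, 0): 30, (1, 1): 35, (2, 2): 20, (1, 0): 5, (1, 2): 5}
print(em_two_snp_haplotypes(example_counts))
```

The estimated frequencies could then be used to assign genotype probabilities to individuals with missing calls, which is the role the haplotype estimates play in the analysis described above.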
To assess the accuracy of the imputation, we calculated the estimated r² between the imputed and actual genotypes at each SNP (43), which we estimated by:
r² = var(X) / var(Y)
where var(X) is the sample variance of the imputed genotypes and var(Y) is the variance of the true genotypes, estimated from the overall genotype frequencies.
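A minimal sketch of this accuracy measure is given below. It assumes the estimate is the ratio var(X)/var(Y), as the definitions of the two variances suggest, that imputed genotypes are expressed as expected allele counts between 0 and 2, and that var(Y) is computed from the frequencies of the three genotypes. The function names and the example values are hypothetical.

```python
import statistics


def genotype_variance(freqs):
    """Variance of the allele count (0, 1, 2) given genotype frequencies."""
    f0, f1, f2 = freqs
    mean = f1 + 2 * f2
    mean_sq = f1 + 4 * f2
    return mean_sq - mean ** 2


def imputation_r2(imputed_dosages, freqs):
    """Estimated r2: sample variance of the imputed genotypes divided by
    the variance of the true genotypes (assumed form of the estimator)."""
    return statistics.variance(imputed_dosages) / genotype_variance(freqs)


# Hypothetical imputed dosages and overall genotype frequencies
r2 = imputation_r2([0.5, 1.0, 1.5, 1.0], (0.25, 0.5, 0.25))
```

When imputation carries little information, the imputed dosages cluster near the population mean, so var(X) shrinks and the ratio falls towards zero.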
The BC cell lines MCF-7 and PMC42 [Cambridge Research Institute (CRI) culture collection] were maintained in RPMI with 10% fetal calf serum. RWPE-1 (ATCC) was cultured in Keratinocyte-Serum Free Medium (Gibco) supplemented with 5 ng/ml human recombinant EGF and 0.05 mg/ml bovine pituitary extract from Sigma Aldrich, UK. Cells were harvested while in the exponential growth phase. DNase I hypersensitivity experiments were carried out as described in Follows et al. (44), except that 1 × 10⁶ nuclei were harvested per experiment using a microcentrifuge at full speed for 20 s and RNase treatment was at 55°C for 1 h. Three picomoles of annealed primer were ligated to 3 µg of genomic DNA, and the subsequent amplification reaction with biotin-labeled primers was carried out as described except that dNTPs were at 200 µM. A library of enriched fragments was generated from a single PCR and purified on streptavidin-coated magnetic beads. Enrichment of open chromatin was ascertained by RT-PCR using probes in the MYC and HMBS loci, before the library was hybridized to an Agilent 4 × 44K custom array, where the FGFR2 gene was tiled using the Agilent High Density probe set (May 2007). The libraries were labeled with the Bioprime Array CGH (Invitrogen) labeling kits, using Cy3-dUTP (Enzo Life Sciences, Inc.) for the library and Cy5-dUTP (Enzo Life Sciences, Inc.) for the input genomic DNA in each experiment. A minimum of 1.5 µg of probe was hybridized to each array by the CRI Genomics Core facility according to the manufacturer's instructions. The array data were analyzed by the CRI Computational Biology group with ACME (45), using a 95% cut-off and a sliding window size of 500 bp. The data were visualized using the Affymetrix Integrated Genome Browser.
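The general idea of scanning tiled probes with a cut-off and a sliding window can be illustrated as below. This is a simplified sketch and not the ACME implementation: it only reports, for each probe, the fraction of probes within a 500 bp window whose signal exceeds a given threshold, and it does not reproduce ACME's test statistic. The function name, probe positions, signals and threshold are hypothetical.

```python
def window_fraction_above(positions, signals, threshold, window=500):
    """For each probe, the fraction of probes within `window` bp
    (centred on that probe) whose signal exceeds `threshold`."""
    half = window / 2
    fractions = []
    for centre in positions:
        in_window = [
            s for pos, s in zip(positions, signals)
            if abs(pos - centre) <= half
        ]
        above = sum(1 for s in in_window if s > threshold)
        fractions.append(above / len(in_window))
    return fractions


# Hypothetical probe coordinates (bp) and log-ratio signals
positions = [0, 100, 200, 1000, 1100]
signals = [5.0, 6.0, 1.0, 0.5, 0.2]
fractions = window_fraction_above(positions, signals, threshold=4.0)
```

Windows in which a high proportion of probes exceed the cut-off are candidates for regions of accessible chromatin; in practice the cut-off would be derived from the distribution of all probe signals on the array, for example its 95th percentile.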
Rebbeck et al. (46) recently tested SNP rs2981582 in a case-control study including 584 African American and 1225 European American women. They found no evidence of association with breast cancer risk in African American women; however, none of the candidate-causal variants was tested in their study. In European Americans, they reported significant associations in the subgroups of ER+, PR+ and HER− breast tumors.
The genotyping and analysis of this study, and the conduct of the SEARCH study, were funded by Cancer Research UK. DFE is a Principal Research Fellow of CR-UK and PDPP is a CR-UK Senior Clinical Research Fellow. This research was supported in part by the Intramural Research Program of the NIH, National Cancer Institute and National Human Genome Research Institute, US Department of Health and Human Services. M.U. is supported by the NIH-Oxford/Cambridge PhD program. The Multiethnic Cohort Study was supported by the US National Cancer Institute [CA 54281, CA 63464]. The CARE study was supported by the National Institute of Child Health and Human Development, with additional support from the National Cancer Institute, through contracts with Emory University [N01 HD 3-3168], Fred Hutchinson Cancer Research Center [N01 HD 2-3166], Karmanos Cancer Institute at Wayne State University [N01 HD 3-3174], University of Pennsylvania [N01 HD 3-3176], University of Southern California [N01 HD 3-3175], and through an intra-agency agreement with the Centers for Disease Control and Prevention [Y01 HD 7022]. General support through SEER contracts [N01-PC-67006 (Atlanta), N01-CN-65064 (Detroit), N01-CN-67010 (Los Angeles), and N01-CN-0532 (Seattle)] is also acknowledged. BAJP is Li Ka Shing Professor of Oncology and we acknowledge Hutchison Whampoa Limited.
The authors thank the women who participated in this research.
Conflict of Interest statement. None declared.