Mistakes in DNA repair can result in sustained damage and genetic instability. We comprehensively evaluated common variants in DNA repair pathway genes for their association with postmenopausal breast cancer risk with and without respect to estrogen receptor (ER) and progesterone receptor (PR) subtypes.
In this nested case-control study of 1,145 prospectively ascertained breast cancer cases and 1,142 matched controls within the Nurses’ Health Study Cancer Genetic Markers of Susceptibility project, we evaluated 1,314 common genetic variants in 68 candidate genes. These variants were chosen to represent five DNA repair pathways including base excision repair, nucleotide excision repair, double strand break repair (homologous recombination and non-homologous end-joining), direct reversal repair, and mismatch repair, along with candidate DNA polymerases, Fanconi Anemia complementation groups, and other genes relevant to DNA damage recognition and response. Main effects, pathway effects and pair-wise interactions were evaluated using Logistic Regression, and the Admixture Maximum Likelihood (AML) and Kernel Machine tests.
Eight loci in linkage disequilibrium within the XRCC4 gene were associated with susceptibility to PR− breast cancer in main effect analyses (p-values corrected for multiple testing at the within-gene level <0.04). These loci drove the association between the non-homologous end-joining pathway, which contains XRCC4, and PR− breast cancer (Admixture Maximum Likelihood p-value for the full pathway=0.002; p-value when the eight loci were removed=0.86). A Kernel Machine analysis testing the hypothesis of no linear or quadratic effects for any of the tested SNPs, or any SNP-SNP interactions among them, including those SNPs in XRCC4, yielded a p-value of 0.85.
These findings suggest that common variation alone in DNA repair genes plays at most a small role in determining postmenopausal breast cancer risk among women of European ancestry, and support the theory that redundancies in DNA repair mechanisms may be compensatory.
Breast cancer is the most common type of cancer and the second leading cause of cancer death among women in the United States. While family history of breast cancer is an established predisposing factor, epidemiological studies suggest that only 5–10% of breast cancer cases are familial and the remaining proportion is sporadic. The majority of genetic variants that influence susceptibility to sporadic breast cancer are unknown. Common variants may explain a greater proportion of breast cancer morbidity and mortality than rare highly penetrant mutations, such as those in BRCA1 and BRCA2, which account for only 15–20% of familial breast cancer cases.
DNA repair plays an essential role in the maintenance of DNA integrity. Failure of DNA repair mechanisms can lead to sustained damage, potentially resulting in the malfunction of cellular systems and checkpoints, and the ability of a cell to over-proliferate or evade apoptosis. Deficient DNA repair capacity has been suggested as a predisposing factor in familial and sporadic breast cancer [3–5]. Substantial correlations have been found between DNA repair gene variants and DNA repair capacity. Several studies have observed a low nucleotide excision repair capacity and direct reversion repair capacity of breast tissue [7–9] and suggest that the breast epithelium may uniquely lack redundant systems of double-strand break repair that are present in other tissues [10, 11]. If true, this suggests common genetic variation in DNA repair genes would have greater impact in breast tissues than other tissues with more extensive DNA repair redundancy.
Despite the relevance of DNA repair to carcinogenesis, the impact of common genetic variation on postmenopausal breast cancer susceptibility is not fully understood. In this case-control study of 2,287 postmenopausal women of European ancestry (1,145 cases and 1,142 controls matched on age and postmenopausal hormone use) nested within the Nurses’ Health Study (NHS), we comprehensively and systematically evaluated genetic variation in the coding and non-coding regions of 68 DNA repair genes in relation to invasive postmenopausal breast cancer risk. The association between breast cancer risk and each of these markers individually was assessed as part of the Cancer Genetic Markers of Susceptibility (CGEMS) Project, although none reached conventional genome-wide significance in the initial scan and only one reached genome-wide significance (rs999737 within RAD51L1) after extensive follow-up. Here we present the results for these markers adjusted for multiple testing at the gene, rather than the genome-wide, level. We also explore the possibility that markers in DNA repair pathways may be collectively associated with risk of breast cancer even though the association between breast cancer and any particular marker is too weak to detect. To this end, we conduct a test that aggregates evidence for association across multiple markers (the Admixture Maximum Likelihood test (AML)) and a test that explicitly allows for non-additive interactions among markers in the same pathway (the Kernel Machine test).
These pathways/genes included direct reversion repair (MGMT), base excision repair (BER) (ADPRT, APEX1, FEN1, LIG1, LIG3, NEIL1, NEIL2, OGG1, PCNA, UNG2, XRCC1), nucleotide excision repair (NER) (CKN1, ERCC1, ERCC2, ERCC3, ERCC4, ERCC5, ERCC6, RAD23a, RAD23b, RPA1, RPA2, RPA3, XPA, XPC), double-strand break (DSB) repair via (a) homologous recombination (HR) (BRCA1, BRCA2, MRE11A, NBS1, RAD50, RAD51, RAD51c, RAD51L1, RAD51L3, RAD52, RAD54L, XRCC2, XRCC3), or (b) non-homologous end-joining (NH) (DCLRE1C, G22P1, LIG4, PRKDC, XRCC4, XRCC5), mismatch repair (MMR) (MLH1, MLH3, MSH2, MSH3, MSH6, PMS1, PMS2), DNA polymerases (POL) (POLB, POLD1, POLE, POLI, POLK), Fanconi Anemia complementation groups (FAN) (FANCA, FANCC, FANCD2, FANCE, FANCF, FANCG), and DNA damage recognition and response (REG)(ATM, ATR, CHEK1, CHEK2, TP53).
Breast cancer tumors are heterogeneous across ER and PR status with respect to tumor characteristics, response to treatment, and risk profiles [14–17]. In this study we investigate overall breast cancer susceptibility as well as ER and PR subtype specific susceptibility. Preliminary reports suggest that categorization into ER and PR subtypes may be particularly useful when studying the etiology of breast cancer with respect to DNA repair [18–22].
This study population is nested within the Nurses’ Health Study (NHS) Cancer Genetic Markers of Susceptibility (CGEMS) breast cancer case-control study (described elsewhere). Participants provided a blood sample (between 1989 and 1990), were free of diagnosed breast cancer at blood collection, and were followed for incident disease until May 2004. Controls were matched to cases on age, blood collection variables (time, date, and year of blood collection), and recent (<3 months) use of postmenopausal hormones (PMH). The analysis was restricted to women of self-reported European ancestry (with no substantial genetic evidence of non-European ancestry) who were menopausal at blood draw. Informed consent was obtained from all participants. The study was approved by the Institutional Review Board of the Brigham and Women’s Hospital, Boston, MA, USA.
Here we investigate the 1,314 CGEMS tag SNPs which had a minor allele frequency (MAF)>0.01 among controls and were located within 30 kb upstream or downstream of 68 genes chosen to represent the above-mentioned five DNA repair pathways, candidate DNA polymerases, Fanconi Anemia complementation groups, and genes relevant to DNA damage recognition and response. These SNPs were selected from the 528,173 genome-wide-tagging SNPs genotyped successfully among 1,145 CGEMS cases and 1,142 controls. Detailed methods for the CGEMS genome-wide association study have been previously reported.
The main effect of each SNP on invasive breast cancer susceptibility was estimated using unconditional logistic regression, adjusted for matching factors (four indicators of age quartile, PMH use (Yes/No)) and three eigenvectors previously shown to be effective in reducing population stratification in this cohort. All SNPs were modeled additively using their minor allele count. Two-sided p-values were calculated for each SNP. Measures of linkage disequilibrium (r²) presented for genotyped SNPs within a gene were calculated among the controls.
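The additive coding and the r² measure described above can be illustrated with a minimal sketch. It assumes r² is taken as the squared Pearson correlation between minor allele counts (a common genotype-based approximation; the source does not state which estimator was used), and the function name `genotype_r2` is hypothetical.

```python
def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two SNPs coded additively
    as minor allele counts (0, 1, 2). Subjects with a missing genotype
    (None) at either SNP are dropped. This genotype-based estimator is
    an assumption; haplotype-based r2 can differ slightly."""
    pairs = [(a, b) for a, b in zip(snp_a, snp_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete genotype pairs")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return cov * cov / (var_a * var_b)


# Toy control genotypes (illustrative values only, not study data).
controls_snp1 = [0, 1, 2, 0, 1, None]
controls_snp2 = [0, 1, 2, 0, 1, 2]
r2_example = genotype_r2(controls_snp1, controls_snp2)
```

Restricting the input lists to controls mirrors the statement that LD was calculated among the controls.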
The Bonferroni correction, which divides the type I error rate (α) by the total number of tests performed, is overly conservative in the presence of linkage disequilibrium (LD); tests of SNPs in LD cannot be considered independent tests as their outcomes will be correlated. We apply the method proposed by Gao et al. to estimate the effective number of independent tests (Meff,i) for each gene i. To obtain a gene-specific corrected type I error rate we substitute this value into the standard Bonferroni correction in place of the total number of SNPs (α/Meff,i). P-values for main effects analyses are presented as both uncorrected and Meff,i corrected p-values (p_corrected = p_uncorrected × Meff,i). The sum of the Meff,i over the 68 genes studied was 862.
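As an illustration of the gene-level correction, the sketch below applies p_corrected = p_uncorrected × Meff,i to a hypothetical gene. The Meff value, SNP names and p-values are invented for the example, and capping the corrected value at 1 is an assumption not stated in the text; estimating Meff,i itself (the Gao et al. step) is not shown.

```python
def meff_corrected_p(p_uncorrected, meff):
    """Gene-level corrected p-value: p_uncorrected * Meff.
    The cap at 1.0 is an assumption added for readability."""
    if not 0.0 <= p_uncorrected <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if meff < 1:
        raise ValueError("effective number of tests must be at least 1")
    return min(1.0, p_uncorrected * meff)


def gene_level_alpha(alpha, meff):
    """Per-SNP significance threshold within a gene: alpha / Meff."""
    return alpha / meff


# Hypothetical gene with 12 effective tests (illustrative values only).
example_meff = 12
example_p = {"snp_a": 0.0010, "snp_b": 0.0300, "snp_c": 0.2000}
corrected = {snp: meff_corrected_p(p, example_meff)
             for snp, p in example_p.items()}
significant = sorted(snp for snp, p in corrected.items() if p < 0.05)
```

Comparing the corrected p-value with 0.05 is equivalent to comparing the uncorrected p-value with α/Meff,i, so either presentation leads to the same decision.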
For each of the five DNA repair pathways and the three additional categories of genes important to DNA repair, we applied Admixture Maximum Likelihood (AML). AML estimates the proportion of associated SNPs and their typical effect size to test the global null hypothesis of no association between any SNP and breast cancer susceptibility within the pathway. We used 1,000 permutations to estimate the AML p-values for trend. Due to computational limitations, AML analyses were not adjusted for covariates. However, the minimum estimated p-values obtained for each SNP when using this method were similar to those obtained in the above covariate-adjusted main effects analyses. The NH and HR double strand break repair pathways were analyzed separately.
Kernel machine analyses were conducted using a quadratic kernel to test the hypothesis of no linear or quadratic effects for any of the tested SNPs, or any SNP-SNP interactions among them. This analysis is equivalent to fitting a mixed model and testing whether the variance of the random effects τ²=0. The mixed model is:
where p_i is the probability that individual i has breast cancer; X is a vector of observed covariates (including an intercept) and β a vector of their fixed effects; G_ij is the genotype at SNP j for individual i, coded additively (0, 1, or 2 copies of the minor allele); and γ_j and γ_ij are independent random effects, each distributed as N(0, τ²). The kernel defines a similarity matrix among subjects based on their genotypes; the quadratic kernel used here was K(G_i, G_j) = (1 + G_i′G_j)².
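A form of the mixed model consistent with the definitions above is sketched here as a reconstruction under stated assumptions, not necessarily the authors' exact expression. The logit link, the interaction index pair (j, k) with j ≤ k (written γ_ij in the text), and the use of i′ for a second subject in the kernel are all assumptions made for readability.

```latex
\[
\operatorname{logit}(p_i) \;=\; X_i^{\top}\beta
  \;+\; \sum_{j} \gamma_j\, G_{ij}
  \;+\; \sum_{j \le k} \gamma_{jk}\, G_{ij} G_{ik},
\qquad \gamma_j,\ \gamma_{jk} \sim N(0, \tau^2)
\]
\[
K(G_i, G_{i'}) \;=\; \bigl(1 + G_i^{\top} G_{i'}\bigr)^2
  \;=\; 1 \;+\; 2\sum_{j} G_{ij} G_{i'j}
  \;+\; \sum_{j}\sum_{k} G_{ij} G_{ik}\, G_{i'j} G_{i'k}
\]
```

The expansion of the kernel shows where the linear terms (G_ij), the quadratic terms (j = k) and the pairwise interaction terms (j ≠ k) enter, which is why testing τ² = 0 covers all three kinds of effect at once.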
Due to computational intensity, only those SNPs with a raw p-value less than 0.05 in the covariate-adjusted single SNP analyses were fed into the kernel machine. Because restricting the analysis to univariately significant SNPs can downwardly bias p-values, we used permutation to assess statistical significance (1,000 permutations). The complete case approach was used to handle missing data; to verify the appropriateness of the complete case approach, the kernel analysis was repeated after filling in missing genotypes with imputed CGEMS data, and a similar result was obtained. The imputation methods applied are described elsewhere.
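The permutation step can be sketched as follows. This is a generic illustration, not the authors' code: the function names are hypothetical, `statistic` stands for whatever analysis is rerun on each shuffled set of case/control labels, and the empirical p-value is taken as the plain proportion of permutations at least as extreme as the observed value. For the permutation to account for the SNP pre-selection, the selection at p < 0.05 would need to be repeated inside `statistic`; the text does not spell out this detail.

```python
import random


def permuted_statistics(labels, statistic, n_perm=1000, seed=0):
    """Shuffle case/control labels n_perm times and recompute
    `statistic` (a user-supplied callable) on each shuffled copy."""
    rng = random.Random(seed)
    shuffled = list(labels)
    results = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        results.append(statistic(list(shuffled)))
    return results


def permutation_p_value(observed, permuted, smaller_is_more_extreme=True):
    """Proportion of permuted statistics at least as extreme as observed.
    When the statistic is itself a p-value, smaller means more extreme."""
    if not permuted:
        raise ValueError("need at least one permutation")
    if smaller_is_more_extreme:
        hits = sum(1 for s in permuted if s <= observed)
    else:
        hits = sum(1 for s in permuted if s >= observed)
    return hits / len(permuted)


# Toy illustration: 916 of 1,000 permuted p-values below the observed one.
toy_permuted = [0.1] * 916 + [0.9] * 84
toy_p = permutation_p_value(0.5, toy_permuted)
```

Some analysts add one to both the numerator and denominator to avoid a p-value of exactly zero; the plain proportion is used here because it is the simpler convention.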
Main effect and pathway analyses were repeated classifying invasive breast cancer cases according to their ER or PR status, and comparing them to non-cases. ER and PR status were available from pathology reports for a subset of the 1,145 breast cancer cases (ER+ cases: 807, ER− cases: 181, PR+ cases: 666, PR− cases: 297). AML analyses were performed using AMLcalc; all other analyses, including the calculation of Meff,i, were performed using R 2.8.0.
Table 1 lists, by outcome of interest (any breast cancer, ER+, ER−, PR+, PR−), all common variants with a Meff,i corrected p-value less than 0.05 in the main effects analysis using unconditional logistic regression adjusted for matching factors (age, PMH use) and population structure. Given the total number of effective tests performed per outcome in the main effects analysis (ΣMeff,i = 862) one would expect 43 Meff,i corrected p-values to exceed this 0.05 threshold per outcome when the null hypothesis of no association is true (862 × 0.05 = 43.1). Meff,i adjusted p-values presented represent a gene-level correction and do not take into account the number of genes or outcomes tested.
Two SNPs exceeded the corrected threshold with respect to breast cancer susceptibility. Rs6151838, an intronic variant within MSH3 (member of the MMR pathway), had a Meff,MSH3 adjusted p-value of 0.048. Rs17136898, an intronic variant within RPA3 (member of the NER pathway), had a Meff,RPA3 adjusted p-value of 0.046. This RPA3 variant was also associated with ER− breast cancer (Meff,RPA3 adjusted p-value = 0.007) and PR+ breast cancer (Meff,RPA3 adjusted p-value = 0.038).
Two intronic SNPs (rs274860 and rs2304136) within LIG1 (a member of the BER pathway) had Meff,LIG1 adjusted p-values <0.05 for association with both ER+ and PR+ breast cancer susceptibility. These two SNPs are in high LD (r²=0.84). Rs274860 had the smaller set of p-values (ER+: Meff,LIG1 adjusted p-value = 0.006; PR+: Meff,LIG1 adjusted p-value = 0.013).
In addition to the previously mentioned LIG1 and RPA3 associations, three SNPs exceeded their corrected gene-specific threshold for association with PR+ breast cancer. Rs12805507 and rs408199 both had a Meff,FANCF adjusted p-value of 0.009. They are intergenic SNPs in high LD (r²=0.89) located near the FAN gene FANCF. Rs5743030, an intronic SNP within PMS1 (a member of the MMR pathway), had a Meff,PMS1 adjusted p-value of 0.047.
Five SNPs exceeded their corrected gene-specific threshold for association with ER− breast cancer. RPA3 contained three associated SNPs including rs17136898 (previously mentioned) and two additional associated SNPs (rs13237260 and rs10952069; intergenic SNPs in high LD located downstream of RPA3 (r²=0.96)). Some LD exists between rs17136898 and rs13237260 (r²=0.69) and between rs17136898 and rs10952069 (r²=0.68). Rs13237260 had the lowest Meff,RPA3 adjusted p-value (0.007). The POL gene POLE contained a missense SNP (rs5745066) with a Meff,POLE adjusted p-value of 0.027. APEX1, of the BER pathway, contained the intronic SNP rs2275008, which had a Meff,APEX1 adjusted p-value of 0.033.
Eight of the nine SNPs with Meff adjusted p-values < 0.05 for their association with risk of PR− breast cancer reside within XRCC4, a key player in the NH pathway. Fig. 1 depicts r² values for SNPs within XRCC4 and their corresponding p-values for the PR− analysis. It suggests that the large number of associated SNPs within XRCC4 may be due to high LD (the smallest Meff,XRCC4 adjusted p-value observed was 0.012). The ninth SNP to exceed the corrected threshold is rs12572872, an intronic SNP within DCLRE1C, which also belongs to the NH pathway (Meff,DCLRE1C adjusted p-value = 0.020).
None of the pathways or groupings of DNA repair related genes, tested using AML, appeared associated with susceptibility to breast cancer independent of ER or PR status. Raw p-values for trend for all AML analyses are presented in Table 2, uncorrected for the number of pathways or outcomes tested. BER appeared mildly associated with ER+ and PR+ breast cancer (p-values of 0.032 and 0.049 respectively), as did NER with ER− breast cancer (p-value=0.048). However, none of these associations are strong enough to survive a multiple testing correction for the number of pathways investigated. The NH association with PR− breast cancer is strong enough to survive a multiple testing correction (p-value=0.002). To examine whether this NH pathway effect is driven by the multitude of highly linked SNPs on XRCC4, we repeated the analysis excluding the eight associated XRCC4 SNPs, and obtained a p-value of 0.86.
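The multiple-testing comparison above can be checked with a short sketch. It assumes a Bonferroni correction over nine gene groupings (direct reversal, BER, NER, HR, NH, MMR, POL, FAN and REG, counting HR and NH separately); that count is inferred from the groupings listed earlier and is an assumption about what was corrected for, and the p-values are the four reported in the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumption: nine pathway/gene groupings were tested per outcome.
N_GROUPINGS = 9

# AML p-values for trend quoted in the text (pathway / outcome).
reported = {
    "BER / ER+": 0.032,
    "BER / PR+": 0.049,
    "NER / ER-": 0.048,
    "NH / PR-": 0.002,
}

threshold = bonferroni_threshold(0.05, N_GROUPINGS)
survivors = sorted(name for name, p in reported.items() if p < threshold)
```

Under this assumption the per-test threshold is 0.05/9, roughly 0.0056, so only the NH association with PR− breast cancer falls below it.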
Of the 1,000 kernel permutations, 916 had a p-value more extreme than that observed in our data; we therefore cannot reject the null hypothesis that there are no main effects or pair-wise SNP-SNP interactions (p=0.92). To assess whether it was appropriate to apply a complete case approach, the same analysis was repeated using the imputed CGEMS data, and a similar result was obtained (permutation p-value=0.85).
Specific types of DNA damage trigger a repair response from specific DNA repair pathways. For example, the BER pathway responds to a wide variety of non-bulky exogenous and endogenous oxidative DNA damage and single strand breaks, while the NER pathway is a versatile repair system that removes a wide variety of bulky, helix-distorting lesions and adducts [29, 30]. Therefore, considering common genetic variants individually may mask biologically relevant pathway effects. We applied AML and Kernel machine analyses to examine the effect of common genetic variation at the pathway level.
In addition to five traditional DNA repair pathways, we considered Fanconi Anemia complementation group genes and DNA polymerases for their role in DNA repair. FAN genes interact with DNA-damage response proteins and other proteins related to cellular responses to carcinogenic stress and to caretaker and gatekeeper functions. The products of POL genes are specialized for operation in distinct DNA repair pathways, or for bypass of specific classes of adducts in DNA. To our knowledge this study is the most exhaustive investigation to date into pathway effects of DNA repair related genes with respect to breast cancer.
In this study we have presented a comprehensive screen of the effect of common variation on breast cancer, and specifically invasive postmenopausal breast cancer, within a broader range of DNA repair genes than has, to our knowledge, been previously studied. Additionally, this study builds on previous work [32, 33] by considering the potential impact of full pathways and pairwise interaction. We have also examined the relevance of each SNP and pathway with and without respect to ER and PR subtypes, acknowledging the heterogeneity of breast cancer and allowing detection of subtype-specific effects that may not be noticeable when pooling all breast cancer cases together.
We observed no more independent associations than would be expected by chance. Thus most, if not all, of our observed nominally significant associations are likely to be false positive results. However, the false positive and false negative results cannot be distinguished using our data alone. Replication is essential in studies of large numbers of common variants to identify the true nature of each SNP’s association.
SNP rs999737 within RAD51L1, a gene in the HR pathway, achieved genome-wide significance in the third stage of the CGEMS initiative (p-value = 1.74×10⁻⁷; heterozygote OR = 0.94 (95% CI: 0.88–0.99)). In this study we observed rs999737 to have an uncorrected p-value of 0.012. We have not found evidence of common variants with large effects; however, our study is underpowered to detect individual SNPs with modest or small effects.
Due to the number of tests performed in our analysis, one ought to anticipate a high number of false positive test results in the absence of appropriate statistical controls. To reduce false positive associations, we applied a correction for multiple testing that takes LD structure into account in determining the number of effective tests. Many of the SNPs that exceeded our corrected threshold at the gene level were in high LD with one another; each LD cluster therefore likely reflects a single underlying haplotype.
We have previously investigated the association between 1,050 common variants in 60 DNA repair genes and premenopausal breast cancer risk among 239 cases and 477 matched controls within the Nurses’ Health Study II. Within these premenopausal women, we found suggestive evidence that common variants in XPF and XRCC3 genes may influence risk, as well as common variation in the NH pathway.
Haiman et al. investigated common genetic variation in 60 genes from within the same set of pathways using the Multiethnic Cohort Study (MEC). This analysis utilized the genotypes for 1,367 tag SNPs from 2,093 breast cancer cases and 2,303 controls stemming from five ethnic sub-populations to identify candidate loci. The study sought replication of 15 tag SNPs in three populations of varying ethnic composition, including the NHS, to identify “pan-ethnic” alleles. The authors observed a lack of replication when comparing United States and United Kingdom study populations.
Both of the above studies investigated pathway effects by studying whether an increased count of variant alleles within a pathway was correlated with increased breast cancer susceptibility. Our study utilizes a more inclusive set of tag SNPs and a more comprehensive pathway approach to investigate this question in greater detail.
Our observations are consistent with the general conclusions of, and the lack of replication among, most previous studies of common variation within DNA repair genes. They suggest that, when environmental interaction is not taken into account, common variation in DNA repair genes plays at most a small role in determining postmenopausal invasive breast cancer risk among American women of European ancestry.
While mutations in key DNA repair genes (for example BRCA1, BRCA2, ATM) have been found to have a major impact on breast cancer risk, it is understandable that the tangible impact of common variation in these genes could be low due to redundant repair systems compensating for one another. Breast tissue is thought to lack redundancy relative to other tissues; this study is consistent with the hypothesis that residual redundancies may be sufficient or that variants that markedly degrade DNA repair capacity are under selective pressure.
It is possible that environmental factors, such as antioxidant or folate deficiency, that insult the DNA or impair DNA repair capacity, may interact with common variants in DNA repair genes in such a way that the meaningful impact of the variant is only detectable when stratifying by the presence or absence of the environmental factor. Such interaction would make replication across diverse populations, as was attempted in the multiethnic study by Haiman et al., unlikely if the populations geographically or temporally differ greatly with respect to environment. Further research is needed to investigate the effect of environmental interaction on the relationship between common variation in DNA repair genes and breast cancer susceptibility.
This project was funded by NCI grant CA118447. Genevieve Monsees was supported by PHS T32-ES016645-01. The authors would like to thank Constance Chen for preparing the data and for producing Fig. 1. We also thank the participants of the Nurses’ Health Study for their dedication and commitment.