We comprehensively evaluated the association of genetic variants in DNA repair genes with premenopausal breast cancer risk.
In this nested case-control study of 239 prospectively ascertained premenopausal breast cancer cases and 477 matched controls within the Nurses’ Health Study II, we evaluated 1,463 genetic variants in 60 candidate genes across 5 DNA repair pathways, along with DNA polymerases, Fanconi Anemia complementation groups, and other related genes.
Four variants were associated with breast cancer risk at a significance level of <0.01: two in the XPF gene and two in the XRCC3 gene. An increased risk was found in those harboring a greater number of missense putative risk alleles (defined a priori in an independent study) in the non-homologous end-joining repair pathway of double-strand breaks (odds ratio per risk allele, 1.37; 95% confidence interval, 1.03–1.82; P for trend, 0.03).
This study implicates variants of genes in the double-strand break repair pathway in the etiology of premenopausal breast cancer.
Breast cancer is the most common cancer and the second leading cause of cancer death among women in the United States. Epidemiological studies have shown that familial breast cancer constitutes only about 5–10% of total breast cancer, and only 15–20% of the observed familial clustering of breast cancer is attributable to strongly predisposing BRCA1 and BRCA2 mutations. Most of the genetic variants that contribute to the risk of developing sporadic breast cancer remain unknown.
Deficient DNA repair capacity has been suggested as a predisposing factor in familial and sporadic breast cancer [2–5]. Reduced DNA repair capacity among breast cancer cases has been observed in mutagen (X-rays, bleomycin, and BPDE [benzopyrene dihydrodiol epoxide]) sensitivity assays conducted in human peripheral blood lymphocytes [5–9] and in host cell reactivation assays with BPDE- or UV-induced damage [10, 11]. The wide range of carcinogens used in these assays suggests that defects in global DNA repair capacity, rather than a single substrate-specific DNA repair pathway, underlie cancer risk. The spectrum of p53 gene mutations in breast cancer suggests the involvement of multiple genotoxic compounds and DNA repair abnormalities in breast cell mutagenesis [12, 13]. The importance of DNA repair in breast cancer development is further supported by the involvement of BRCA1 and BRCA2 in many critical cellular processes, including multiple DNA repair pathways and apoptosis, through protein-protein interactions and transcriptional regulation. One mechanism that may lead to inter-individual variation in DNA repair capacity is germline variation in DNA repair genes [14–16]. Even though a variety of factors modulate the path from genotype to phenotype, there are substantial correlations between DNA repair gene variants and DNA repair capacity. A deficient DNA repair capacity may be attributable to multiple polymorphisms in multiple DNA repair pathways.
Breast cancer in premenopausal women is more aggressive, with a poorer prognosis than postmenopausal breast cancer. The etiology for premenopausal breast cancer may differ from that for postmenopausal women, and involve a relatively stronger component of inherited predisposition. In this study of 239 cases and 477 matched controls among premenopausal predominantly Caucasian women in a nested case-control study within the Nurses’ Health Study II, we comprehensively and systematically evaluated genetic variation in 60 DNA repair genes in relation to breast cancer risk. These pathways/genes included direct reversion repair (MGMT), base excision repair (BER) (APE1, LIG3, NEIL1, NEIL2, OGG1, PARP1, XRCC1, FEN1), nucleotide excision repair (NER) (XPA, ERCC3, XPC, ERCC2, ERCC4, ERCC5, ERCC1, LIG1, ERCC6, ERCC8, RPA1, RPA2, RPA3), double-strand break (DSB) repair via a) homologous recombination (HR) (RAD50, RAD51, RAD52, XRCC2, XRCC3, NBN, MRE11A, ATM, ATR) or b) non-homologous end-joining (NHEJ) (XRCC4, XRCC5, XRCC6, ARTEMIS, PRKDC, LIG4), mismatch repair (MMR) (MSH2, MSH3, MSH6, MLH1, MLH3, PMS1, PMS2), DNA polymerases (POLB, POLD1, POLE, POLI, POLK), Fanconi Anemia complementation groups (FANCA, FANCC, FANCD2, FANCE, FANCF, FANCG), and other related genes (CHEK1, CHEK2, TP53, PCNA, BLM).
The Nurses’ Health Study II was established in 1989 when 116,609 female registered nurses, ages 25 to 42 years, completed and returned a mailed questionnaire. The cohort has been followed biennially to update exposures and ascertain newly diagnosed diseases. Between 1996 and 1999, 29,611 cohort members who were cancer-free and between the ages of 32 and 54 years provided blood samples. Briefly, participants were sent a short questionnaire and a blood collection kit containing the supplies needed to have blood samples drawn by a local laboratory or a colleague. Premenopausal women who had not taken oral contraceptives, been pregnant, or breast-fed within 6 months (n = 18,521) provided blood samples drawn on the 3rd to 5th day of their menstrual cycle (follicular draw) and 7 to 9 days before the anticipated start of their next cycle (luteal draw). All other women (n = 11,090) provided a single 30-mL, untimed blood sample. These samples were collected in a similar manner, shipped via overnight courier with an ice pack to our laboratory, and separated into plasma, RBC, and WBC components. Samples have been stored in liquid nitrogen freezers since collection. Menopausal status determination for women providing untimed samples has been described previously. Follow-up of the blood cohort was 98% in 2003. The study was approved by the Committee on the Use of Human Subjects in Research at Harvard School of Public Health and Brigham and Women’s Hospital.
Breast cancer cases were identified on biennial questionnaires; the National Death Index was searched for nonresponders. Cases had no previously reported cancer diagnosis and were diagnosed with breast cancer after blood collection but before June 1, 2003. Each of the 239 premenopausal cases of breast cancer was matched to two premenopausal controls (one pair with only 1:1 matching) (total n = 477) on age (±2 years), month/year of blood draw (±2 months), and race/ethnicity (Caucasian, African American, Asian, Hispanic, Other) (>93% of cases and controls are Caucasian), and, for each blood collection, time of day (±2 hours) and fasting status (<2, 2–4, 5–7, 8–11, ≥12 h). For each matching variable, >90% of matches were exact.
The characterization of common genetic variation in candidate DNA repair and related genes was conducted by genotyping a high density of common SNPs across the promoter, untranslated regions (UTRs), and coding and non-coding regions of 60 DNA repair genes. Briefly, genotype data were collected from seven population samples, including 20 CEPH trios (60 individuals in total), which are a subset of the 30 trios used in the HapMap, and 70 White subjects from the Multiethnic Cohort (MEC) study. In total, about 3,000 SNPs have been genotyped across these 60 genes, including a high density of common SNPs (n > 2,700, minor allele frequency ≥ 5%) selected from the public dbSNP database and all known missense SNPs (>300, minor allele frequency ≥ 1%) identified through gene resequencing from the Environmental Genome Project (http://egp.gs.washington.edu/); the average spacing of common SNPs across each locus is 1.7 kb. Tag-SNPs were selected by the Tagger approach, which combines pairwise r² methods with the potential efficiency of multi-marker approaches. In the selection of tag-SNPs for Caucasians (r² >0.8), the SNPs genotyped in-house in the 20 CEPH trios and the HapMap phase I data of the same 60 Caucasians were combined to achieve a much higher density of SNP markers. The patterns of linkage disequilibrium (LD) in these individuals should provide an accurate estimate of the patterns in our study population. The detailed description of the tag-SNP selection for predicting untyped SNPs was presented elsewhere. In brief, 91% of HapMap phase II SNPs are predicted by this panel with 80% or greater multi-allelic r².
High-throughput genotyping was performed using the Illumina high-multiplex BeadArray genotyping system at the MIT Broad Institute, Center for Genotyping and Analysis. The assay employs allele-specific extension methods and universal PCR amplification reactions conducted at 1,536 loci. DNA samples were processed through the highly multiplexed GoldenGate protocol using bar-coded microwell plates and robust automation systems. Among the 1,536 SNPs, there are 1,463 SNPs in 60 DNA repair genes, as described above.
The initial set of SNPs was chosen to include tag-SNPs for other ethnicities. Excluding 98 non-Caucasian SNPs, 1263 (88%) SNPs had a genotyping success rate >95%, and 1322 (92%) SNPs had a genotyping success rate ≥80%. SNPs with a genotyping success rate <80% were excluded from further analysis. Eight pairs of blinded duplicate samples were included. Analysis of 10072 pair tests revealed a 99.95% overall concordance rate. Five SNPs that failed the concordance test were excluded. Among the remaining 1317 SNPs, 1256 were in the DNA repair genes and were carried forward. Of these 1256 SNPs, 1088 had a minor allele frequency >0.01 among the controls in our study and were retained. Among the controls, 38 loci had Hardy-Weinberg equilibrium χ² P values <0.01 and were excluded. Hence, the final analysis included 1050 SNPs in the DNA repair genes.
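The per-SNP filters described above can be illustrated with a short sketch. It assumes a standard 1-degree-of-freedom Pearson goodness-of-fit test for Hardy-Weinberg equilibrium, which the text does not specify beyond "χ²". The function names `hwe_chi_square` and `passes_qc` are hypothetical, and the thresholds are the ones stated in the paragraph.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi-square, P value) for a 1-df Hardy-Weinberg goodness-of-fit test.

    n_aa, n_ab, n_bb are observed genotype counts among controls.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def passes_qc(call_rate, maf_controls, hwe_p):
    """Apply the thresholds stated in the text: success rate >= 80%,
    minor allele frequency > 0.01 in controls, HWE P value >= 0.01."""
    return call_rate >= 0.80 and maf_controls > 0.01 and hwe_p >= 0.01
```

For example, genotype counts of 25/50/25 match Hardy-Weinberg proportions exactly and pass, whereas counts of 50/0/50 (no heterozygotes) give a χ² of 100 and would be excluded.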
Conditional logistic regression was employed to calculate odds ratios (ORs) and 95% confidence intervals (CIs). The test for main effects of SNPs was based on the additive model, treating genotype as an ordinal variable (wildtype coded as 0, heterozygote as 1, and homozygous variant as 2). All P values were two-sided.
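As an illustration of the additive coding and the matched analysis, the following is a minimal sketch of a single-covariate conditional logistic regression fitted by Newton-Raphson. It is not the software used in the study. The function names, the input layout (one case value plus a list of control values per matched set), and the Wald-type confidence interval are assumptions made for the example.

```python
import math


def additive_code(genotype, variant_allele):
    """Count copies of the variant allele in a genotype: 0, 1, or 2."""
    return sum(1 for allele in genotype if allele == variant_allele)


def conditional_logit_or(matched_sets, max_iter=50, tol=1e-10):
    """Fit a one-covariate conditional logistic model.

    matched_sets: list of (case_x, [control_x, ...]) tuples, one per matched set.
    Returns (odds_ratio, (ci_lower, ci_upper)) per unit increase in x.
    """
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        score = 0.0
        info = 0.0
        for case_x, control_xs in matched_sets:
            xs = [case_x] + list(control_xs)
            weights = [math.exp(beta * x) for x in xs]
            total = sum(weights)
            mean = sum(w * x for w, x in zip(weights, xs)) / total
            var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs)) / total
            score += case_x - mean  # derivative of the conditional log-likelihood
            info += var             # negative second derivative
        if info == 0.0:
            raise ValueError("no informative matched sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)  # information evaluated at the last iterate
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return math.exp(beta), ci
```

With 1:1 matching and a binary covariate, the estimate reduces to the ratio of discordant pairs, which gives a simple way to sanity-check the routine.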
The Bonferroni correction, the most commonly used method to adjust the type I error, α, treats every single-SNP test as independent. Because it ignores the correlation among SNPs, it is overly conservative for SNPs that are in LD. To address this limitation, we calculated the effective number of independent SNPs, Meff,i, for each candidate gene i, on the basis of the spectral decomposition (SpD) of matrices of pair-wise LD between SNPs [25, 26]. Meff provides a simple correction for multiple testing of non-independent SNPs in LD with each other. For each SNP in candidate gene i, the multiplicity-adjusted point-wise α (αp) was then calculated as α/Meff,i.
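The text does not give the formula for Meff. One widely used spectral-decomposition estimator is Meff = 1 + (M − 1)(1 − Var(λ)/M), where λ are the eigenvalues of the M × M pairwise LD correlation matrix. The sketch below assumes that estimator and may differ in detail from what the cited methods compute. It uses the identity that, for a symmetric matrix with unit diagonal, the eigenvalues sum to M and their squares sum to the sum of squared entries, so no eigen-solver is needed. The function names are hypothetical.

```python
def meff_from_ld(ld_matrix):
    """Effective number of independent SNPs from a symmetric LD correlation
    matrix with ones on the diagonal (list of lists)."""
    m = len(ld_matrix)
    if m == 1:
        return 1.0
    # Sum of squared eigenvalues equals the sum of squared matrix entries.
    sum_sq = sum(r * r for row in ld_matrix for r in row)
    # Eigenvalues have mean 1, so the sample variance is (sum_sq - m) / (m - 1).
    var_lambda = (sum_sq - m) / (m - 1)
    return 1.0 + (m - 1) * (1.0 - var_lambda / m)


def pointwise_alpha(alpha, meff):
    """Multiplicity-adjusted point-wise alpha for one gene: alpha / Meff."""
    return alpha / meff
```

Under this estimator, a gene whose SNPs are mutually uncorrelated keeps Meff = M (equivalent to Bonferroni), while a gene whose SNPs are in complete LD collapses to Meff = 1.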
Analysis of interactions between genetic variants and family history of breast cancer and subgroup analysis according to estrogen receptor (ER) and progesterone receptor (PR) status were restricted to those variants with P values <0.05 in the analysis of main effect. Unconditional logistic regression was used in these analyses. We modeled family history of breast cancer as a dichotomous variable (yes/no) and genotypes as carriers of variants vs non-carriers in the interaction analysis. We used a likelihood ratio test (LRT) to compare a model that included terms for all combinations of genotype and family history with the nested model that included indicator variables for the main effects only. In subgroup analysis, each subtype of cases was compared with the common controls.
In the final panel of 1,050 SNPs after exclusion criteria (refer to Results section), 65 SNPs were missense SNPs. Among them, 4 SNPs (NEIL2 rs8191664, CSB rs2228529, CSB rs2228526, and XPD rs1799793) were in high LD (defined as r² >0.90) with another missense SNP in the same gene and were excluded. Eight women had missing genotype data at >10 loci and were removed. Hence, the analysis of missense SNPs was restricted to 61 SNPs in 31 genes among 708 women. We used the Partition-Ligation Expectation-Maximization (PLEM) algorithm to impute the missing genotypes based on the estimated haplotype frequencies within each gene. When a candidate gene contained only a single SNP, missing genotypes were imputed with the most common genotype for that SNP (User Manual of the open source Java software Multifactor Dimensionality Reduction (MDR) 1.0.0 (http://sourceforge.net/projects/mdr/)) [28, 29].
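The single-SNP fallback described above (filling missing calls with the most common genotype) can be sketched as follows. This illustrates only that fallback, not the PLEM haplotype-based imputation. The function name and the use of `None` to mark a missing call are assumptions of the example.

```python
from collections import Counter


def impute_most_common(genotypes):
    """Replace missing genotypes (None) with the most common observed genotype.

    Ties are broken in favor of the genotype seen first, which is an
    arbitrary choice made for this sketch.
    """
    observed = [g for g in genotypes if g is not None]
    if not observed:
        raise ValueError("no observed genotypes to impute from")
    most_common = Counter(observed).most_common(1)[0][0]
    return [most_common if g is None else g for g in genotypes]
```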
To test the hypothesis that multiple missense SNPs in the same pathway have an additive effect on breast cancer risk, we estimated the combined effect of the risk alleles for these SNPs in each pathway. First, we evaluated the main effect associated with each minor allele in an independent dataset, a set of 45 cases and 90 controls among premenopausal Caucasian women in the Multiethnic Cohort study. If the minor allele was associated with an increased risk of breast cancer, we designated the minor allele as the risk allele. If the minor allele was found to be inversely associated with risk, we designated the common allele as the risk allele. We applied this a priori definition of the risk allele for each locus from this independent dataset to risk allele designation in our study population. We summed the number of risk alleles in each pathway for each individual and evaluated the risk associated with the increasing number of risk alleles.
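The risk-allele designation and per-pathway count can be sketched as below. The SNP identifiers are placeholders, not loci from the study. The source does not say how a locus with an external odds ratio of exactly 1 was handled, so treating the common allele as the risk allele in that case is an assumption of this example.

```python
def risk_allele_count(minor_allele_counts, external_odds_ratios):
    """Sum risk alleles across the missense SNPs of one pathway for one woman.

    minor_allele_counts: dict mapping SNP id -> copies of the minor allele (0, 1, 2).
    external_odds_ratios: dict mapping SNP id -> OR for the minor allele
        estimated in the independent dataset.
    """
    total = 0
    for snp, count in minor_allele_counts.items():
        if external_odds_ratios[snp] > 1.0:
            total += count        # minor allele is the risk allele
        else:
            total += 2 - count    # common allele is the risk allele (assumed for OR == 1)
    return total
```

The resulting count can then be entered as an ordinal covariate to estimate the odds ratio per additional risk allele, or grouped into categories for comparison against a reference group.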
Participants were 32 to 52 years old (mean, 44 years) at blood collection (Table 1). Differences between cases and controls for age at menarche, parity, and BMI at blood draw generally were small. A higher percentage of cases versus controls had a family history of breast cancer (19.3% versus 12.3%, respectively) and a history of benign breast disease (22.2% versus 16.1%, respectively).
Forty-four SNPs were associated with altered premenopausal breast cancer risk in our study (Table 2), with P value <0.05 in the additive model. These 44 SNPs were located in 18 DNA repair genes with 1–3 SNPs per gene except for the XPF and XRCC3 genes. There were 9 SNPs in XPF and 6 in XRCC3. Among the 44 SNPs, four SNPs showed a significance level of <0.01: two SNPs in the XPF gene (r² = 0.88) and two SNPs in the XRCC3 gene (r² = 0.99). The LD plots for these two genes are displayed in Figure 1.
The data on the main effect of 1050 SNPs are provided in Supplementary Table 1. We performed analysis on interactions between genetic variants and family history of breast cancer and subgroup analysis according to estrogen receptor/progesterone receptor (ER/PR) status. These analyses were restricted to those variants with P value <0.05 in the analysis of main effect. The data are provided in Supplementary Tables 2–3.
We calculated the Meff value by SNPSpD for each of the 60 candidate genes (Table 3). On average, each candidate gene has 17.5±14.18 (Mean±SD) SNPs (range: 5 [NEIL1] to 69 [MGMT] SNPs). Because of the linkage disequilibrium (LD) among SNPs within each gene, on average, the value of Meff of each candidate gene is 14.18±10.01 (range: 3.44 [NEIL1] to 63.12 [MGMT]). The percentage of reduction (i.e., how much the use of SNPSpD has “compressed” the total number of SNPs for a candidate gene i, defined as (1 − Meff,i/Mi) × 100%, where Mi is the number of SNPs analyzed in gene i) is 21.23±7.63% (range: 8.52% [MGMT, 69 SNPs, Meff = 63.12] to 45.97% [MLH3, 9 SNPs, Meff = 4.86]). We used the Meff value to correct for multiple comparisons within each gene. As shown in Table 3, for all genes, the smallest P value for an individual SNP was larger than the significance threshold adjusted by the Meff value.
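The percentage of reduction can be checked against the two extremes reported above. The sketch assumes the reduction is (1 − Meff/M) × 100, which reproduces the MGMT figure exactly and the MLH3 figure up to rounding of the reported Meff. The function name is hypothetical.

```python
def percent_reduction(n_snps, meff):
    """Percentage by which Meff compresses the SNP count of one gene."""
    return (1.0 - meff / n_snps) * 100.0


# Values reported in the text for the two extremes.
mgmt = percent_reduction(69, 63.12)  # reported as 8.52%
mlh3 = percent_reduction(9, 4.86)    # reported as 45.97%; the small gap reflects rounding of Meff
```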
We evaluated the effect of multiple missense SNPs on premenopausal breast cancer risk. We first evaluated the main effect associated with each minor allele in a set of 45 cases and 90 controls among premenopausal Caucasian women in the Multiethnic Cohort study. We used the direction of the associations observed in this independent dataset as the a priori definition of the risk allele for each locus to assign risk alleles in our study population. We summed the number of risk alleles in each pathway for each individual and evaluated the risk associated with the increasing number of risk alleles. The associations between the number of putative risk alleles carried in each pathway and breast cancer risk are presented in Table 4. A trend toward increased risk of breast cancer was found among women carrying a greater number of putative risk alleles in the DSB-NHEJ pathway. The OR associated with an additional risk allele in this pathway was 1.37 (95% CI, 1.03–1.82; P for trend, 0.03). Compared with women with 2–3 risk alleles, those with 4 risk alleles had an OR of 1.69 (95% CI, 1.08–2.64) and those with 5–6 risk alleles had an OR of 1.92 (95% CI, 1.02–3.60). No significant trend was observed for other pathways.
Despite evidence of the role of high-penetrance mutations in BRCA1/2 in breast cancer, the importance of common inherited variants in DNA repair pathways and their interactions with environmental factors in causing breast cancer are relatively unknown. There are some published data on select genetic polymorphisms in DNA repair genes and breast cancer risk. However, previous studies have not given extensive consideration to multiple genes and polymorphisms in the pathways. We evaluated in considerably more detail the common variants in DNA repair and related genes using both missense-SNP and tag-SNP approaches among premenopausal women.
Specific DNA repair pathways are responsible for the repair of different types of DNA damage. (1) The BER is responsible for a wide variety of non-bulky exogenous and endogenous oxidative DNA damage and single-strand breaks. (2) The NER is a versatile repair system that removes a wide variety of bulky, helix-distorting lesions and adducts induced by environmental chemicals or endogenous metabolites [31, 32]. (3) The HR and NHEJ are two distinct mechanisms in the repair of DSB in mammalian cells. DSBs can be induced by other exogenous agents and endogenous reactive oxygen species. DSBs can also be generated as products of blocked replication forks and programmed rearrangements [33, 34]. (4) The MMR is responsible for the repair of base mispairs and insertion/deletion mispairs. Mutations in genes involved in mismatch repair (MSH2, MLH1, PMS1, and PMS2) result in microsatellite instability and replication errors. (5) The O6-methylguanine DNA methyltransferase (MGMT) is the gene involved in the direct reversal DNA repair that removes alkyl or methyl adducts from the O6 position of guanine. (6) Other candidates include Fanconi Anemia complementation groups and DNA polymerases. Fanconi Anemia genes interact with DNA-damage-response proteins and other proteins related to cellular responses to carcinogenic stress and to caretaker and gatekeeper functions. Many different DNA polymerases found in human cells are specialized for operation in distinct DNA repair pathways, or for bypass of specific classes of adducts in DNA.
A complex disease such as breast cancer occurs through an intricate multifactorial interaction of genetic risk factors. In the analysis of main effect of 1,050 SNPs, two SNPs in the XRCC3 gene and two in the XPF gene were associated with altered breast cancer risk with P <0.01. There were 6 SNPs in the XRCC3 gene and 9 SNPs in the XPF gene with P <0.05. The XRCC3 gene is involved in DSB repair and the XPF gene is involved in the NER pathway. Further work is needed to replicate these findings and identify variants across both loci to determine the optimal candidates for epidemiological and functional studies.
A dose-response relation between the increasing number of risk alleles in DNA repair genes and decreased DNA repair capacity at the individual level has been shown. We thus analyzed combined missense SNPs in each pathway. We defined risk alleles for missense SNPs on the basis of an independent external dataset of premenopausal Caucasian breast cancer cases and controls and evaluated the combined effect of these risk alleles in each pathway in our study. We found a significant trend of increased risk with increasing numbers of risk alleles in the DSB-NHEJ pathway. No such trend was observed for other pathways, which suggests a differential contribution of each DNA repair pathway to breast cancer risk. The importance of DSB repair in breast cancer development is further supported by the involvement of BRCA1 and BRCA2 in the repair process of DSB. It has been shown that breast epithelium uniquely lacks redundant systems of DSB repair that are present in other tissues [38, 39], which suggests that defects in the repair of DSB may be particularly important for breast cancer development. The NHEJ is the predominant mechanism in the repair of DSB in mammalian cells and is an error-prone repair process. Our data suggest an additive or synergistic effect of multiple DNA repair variants in the NHEJ pathway on premenopausal breast cancer risk and highlight the importance of a pathway-based approach to analyzing multiple genes and polymorphisms for risk assessment. Further research is warranted to confirm these findings in premenopausal Caucasian women.
We thank Dr. Paul de Bakker at the Broad Institute of Harvard and MIT for selecting tagging SNPs. We thank Pati Soule for her laboratory assistance, Dr. Daniel B. Mirel at the Broad Institute Center for Genotyping and Analysis for his coordination, and Dr. Fredrick Schumacher for generating the LD plots. We also thank the participants in the Nurses’ Health Study for their dedication and commitment. The work is supported by NIH grants CA098233, CA118447, CA067262, and CA050385. The Broad Institute Center for Genotyping and Analysis is supported by grant U54 RR020278-01 from the National Center for Research Resources.