Mutagen sensitivity, a measurement of chromatid breaks induced by various mutagens in short-term cultures of peripheral blood lymphocytes, is an established risk factor for a number of cancers and is highly heritable. The purpose of this study was to identify genetic predictors of mutagen sensitivity. We therefore conducted a multi-stage genome-wide association study. The primary scan analyzed 539 437 autosomal SNPs in 673 healthy individuals, followed by validations in two independent sets of 575 and 259 healthy individuals, respectively. One SNP, rs8093763, on chromosome 18q21 showed significant association with bleomycin (BLM) sensitivity (combined P = 2.64 × 10⁻⁸). We observed significantly lower BLM-induced chromatid breaks for genotypes containing the wild-type allele compared with the homozygous variant genotype in the discovery set (0.71 versus 0.90, P = 3.77 × 10⁻⁵) and in replication phase 1 (0.61 versus 0.84, P = 7.00 × 10⁻⁵). The result of replication phase 2 was not statistically significant (0.65 versus 0.68, P = 0.44). This SNP is approximately 64 kb from PMAIP1/Noxa, which is a radiation-inducible gene and exhibits higher expression in BLM-sensitive lymphoblastoid cell lines than insensitive cell lines upon BLM treatment. In conclusion, we identified a biologically plausible genetic variant on 18q21 near the PMAIP1/Noxa gene that is associated with BLM sensitivity.
Mutagen sensitivity, a measurement of chromatid breaks induced by various mutagens in short-term cultures of peripheral blood lymphocytes, is used as an indicator of DNA repair capacity (1–3). Numerous retrospective and prospective epidemiologic studies have shown that higher mutagen sensitivity predisposes individuals to a variety of epithelial cancers (4–9). Furthermore, compelling evidence suggests that mutagen sensitivity is an inherited phenotype. Family studies have shown that the first-degree relatives of mutagen-sensitive individuals have a significantly higher rate of the mutagen-sensitive phenotype than relatives of individuals with normal sensitivity (10). Bleomycin (BLM), a radiomimetic chemotherapeutic drug, was the first applied challenge mutagen (1–3). BLM induces oxidative base damage and DNA-strand breaks that require two major DNA repair pathways, base excision repair and double-strand break repair. Three independent twin studies that compared similarities between monozygotic twins and dizygotic twins demonstrated that BLM sensitivity is a genetic trait with heritability estimates of 41, 58 and 75%, respectively (11,12), indicating high genetic heritability for this intermediate cancer phenotype. However, the genetic factors that determine BLM sensitivity remain unknown. Cloos et al. (13) performed a microarray analysis in lymphoblastoid cells from BLM sensitive and insensitive individuals to search for genes and pathways responsible for BLM sensitivity and observed differential expression of genes in multiple biological pathways, but no specific pathway could be singled out to explain the difference in BLM sensitivity. These results suggest that there may not be a single major gene or a major pathway determining BLM sensitivity. More likely, and similar to genetic susceptibility to common cancers, there are a number of low-penetrance genes that determine BLM sensitivity.
Genome-wide association studies (GWAS) have successfully identified genetic determinants for intermediate biomarkers, such as plasma levels of lipoprotein and vitamin B12 (14,15). However, to our knowledge, there have not been any reports of GWAS for intermediate cancer susceptibility biomarkers. In this study, we conducted a multi-stage GWAS to identify genetic variations associated with BLM sensitivity in healthy individuals of European ancestry.
Table 1 shows the selected characteristics of the study participants in each phase.
In the discovery phase, we tested the association of 539 437 SNPs that passed strict quality control measures in 673 healthy participants. These individuals were from the control population in a recently published GWAS of bladder cancer risk for whom the BLM sensitivity measurement was available (16). The quantile–quantile plot for the multiplicity adjusted P-values showed no evidence for a systematic inflation of P-values (genomic inflation factor λ = 0.969) (Supplementary Material, Fig. S1). We further determined the potential effect of population substructure by using two approaches: one including further adjustment for strata identified by PLINK and the other including further adjustment for residual population structure using the top four principal components by EIGENSTRAT. These two approaches showed similar strength of associations with no evidence for the inflation of the multiplicity adjusted P-values (λ = 0.969 for PLINK and λ = 0.968 for EIGENSTRAT, Supplementary Material, Fig. S1), indicating that the effect of population substructure was negligible.
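The genomic inflation factor λ reported above is conventionally the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.455). The following is a minimal sketch of that conventional definition, not necessarily the exact computation used in this study; the function name and the evenly spaced example P-values are illustrative assumptions.

```python
from statistics import NormalDist, median

def genomic_inflation(p_values):
    """Return lambda: median observed 1-df chi-square / null median."""
    norm = NormalDist()
    # A two-sided P-value p corresponds to chi2 = z**2, where z is the
    # standard normal quantile at p/2 (the sign is lost when squaring).
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Median of a chi-square distribution with 1 df (about 0.455).
    expected_median = norm.inv_cdf(0.25) ** 2
    return median(chi2) / expected_median

# Illustrative input only: 1001 evenly spaced P-values, i.e. an ideal null.
null_p = [(i - 0.5) / 1001 for i in range(1, 1002)]
lam = genomic_inflation(null_p)
```

For the ideal null input `lam` is 1; values well above 1 would suggest systematic inflation, whereas values such as the 0.969 reported here do not. P-values must lie strictly between 0 and 1 or equal 1.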
In the first replication phase, we selected SNPs with a multiplicity-adjusted P < 10⁻³ in the discovery phase for validation in an independent set of 575 individuals who had BLM sensitivity data, drawn from the control population of a recent GWAS of lung cancer risk (genotyped using Illumina's HumanHap300 chip) (17). A total of 554 SNPs from the discovery set satisfied this criterion (Fig. 1). Among these 554 SNPs, 287 were covered by the HumanHap300 chip, 245 were imputed from HapMap release 22, build 36, and 22 SNPs were not HapMap SNPs and hence were not included in the analysis. All the imputed SNPs showed high quality, with an average predicted r² of 0.93 between imputed genotypes and the true underlying genotypes.
We then performed a second validation using TaqMan assays in 259 healthy individuals from additional controls of the ongoing bladder and lung case control studies for the 13 SNPs that were significantly associated with BLM sensitivity at P < 0.05 in the first replication phase (Supplementary Material, Table S1). There was no significant heterogeneity between the discovery and first replication phase (P for heterogeneity > 0.1) for these 13 SNPs. Since SNP rs10093667 failed the TaqMan design, only 12 SNPs were genotyped in this second replication phase. Of the 12 SNPs, 4 had been imputed in the first replication phase. Therefore, we also performed direct genotyping using TaqMan assays for these four SNPs in the first replication phase (Supplementary Material, Table S2).
Overall, SNP rs8093763 showed the strongest evidence of association with BLM sensitivity in a recessive model (combined P = 2.64 × 10⁻⁸, Table 2). We observed significantly lower BLM-induced chromatid breaks (BICB) for genotypes containing the wild-type allele compared with the homozygous variant genotype in the discovery set (0.71 versus 0.90, P = 3.77 × 10⁻⁵) and in the first replication phase (0.61 versus 0.84, P = 7.00 × 10⁻⁵). The result of replication phase 2 was not statistically significant (0.65 versus 0.68, P = 0.44). There was no significant evidence for heterogeneity among the three phases (P for heterogeneity = 0.23). The second strongest association was observed for rs708547 on chromosome 4q12 using the recessive model (combined P = 8.79 × 10⁻⁷, P for heterogeneity = 0.93, Table 2). The genotypes with at least one wild-type allele exhibited consistently higher BICB than the homozygous variant genotype in the discovery set (0.74 versus 0.56, P = 2.94 × 10⁻⁴), in replication phase 1 (0.63 versus 0.47, P = 1.12 × 10⁻²) and in replication phase 2 (0.66 versus 0.51, P = 3.29 × 10⁻²). We also observed a consistent association for SNP rs4662834 on chromosome 2q21 using the dominant model (combined P = 4.89 × 10⁻⁶, P for heterogeneity = 0.72, Table 2). The common homozygous genotype had higher BICB than variant-allele-carrying genotypes in the discovery set (0.79 versus 0.70, P = 1.54 × 10⁻⁴), in replication phase 1 (0.67 versus 0.59, P = 2.37 × 10⁻²) and in replication phase 2 (0.69 versus 0.63, P = 1.68 × 10⁻¹).
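The combined P-values and heterogeneity tests above pool the three phases. The text does not specify which meta-analytic model was used, so the sketch below assumes a fixed-effect inverse-variance model with Cochran's Q purely for illustration; the function name and its input format (per-phase regression coefficient and standard error) are hypothetical, and no study data are reproduced.

```python
from math import sqrt
from statistics import NormalDist

def fixed_effect_meta(estimates):
    """Pool (beta, se) pairs by inverse-variance weighting.

    Returns the pooled beta, its standard error, the two-sided P-value
    and Cochran's Q (compared against chi-square with k - 1 df).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_w = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_w
    se = sqrt(1.0 / total_w)
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, (b, _) in zip(weights, estimates))
    return beta, se, p, q
```

Under this assumed model, phases with smaller standard errors dominate the pooled estimate, which is one way a non-significant smaller phase can coexist with a significant combined result.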
Rs8093763 is located on chromosome 18q21. Linkage disequilibrium (LD) analysis of the HapMap SNPs in its vicinity showed that it maps to a region of low LD structure, and no SNPs in the HapMap database are in strong LD (r² > 0.6) with rs8093763. We performed imputations for SNPs in the HapMap database within 1 Mb of rs8093763 and computed the combined P-value for the discovery set and replication phase 1 (Fig. 2). Rs8093763 remained the most significant SNP in the region.
There are a few genes in this region, the nearest being PMAIP1/Noxa, about 64 kb from rs8093763. To support the biological plausibility of our observation, we queried the Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) for data sets obtained from BLM-treated samples. We identified one data set (GDS1714) that compared gene expression differences of 14 lymphoblastoid cell lines (7 BLM-sensitive and 7 BLM-insensitive based on mutagen sensitivity assays) at baseline and after a 4 h exposure to BLM (13). The mean expression of PMAIP1/Noxa was higher in BLM-sensitive (mean ± SD: 0.99 ± 0.58) than in BLM-insensitive cell lines (0.64 ± 0.60) at baseline, although the difference was not statistically significant (P = 0.28), possibly due to the small sample size. In BLM-insensitive cells, the expression of PMAIP1/Noxa changed little after BLM treatment (before versus after: 0.64 ± 0.60 versus 0.70 ± 0.41, P = 0.80); in BLM-sensitive cells, however, the expression of PMAIP1/Noxa appeared to increase (before versus after: 0.99 ± 0.58 versus 1.37 ± 0.59, P = 0.24), although again the difference did not reach statistical significance, possibly due to the small sample size. When we compared the expression of PMAIP1/Noxa after BLM treatment, BLM-sensitive cells exhibited significantly higher expression (1.37 ± 0.59) than BLM-insensitive cells (0.70 ± 0.41) (P = 0.03). We also performed the same analyses for the other genes in this region (Fig. 2), but none of them showed a trend similar to that of PMAIP1/Noxa (data not shown).
In this GWAS, we identified a novel genetic determinant of BLM sensitivity that reached genome-wide significance in combined analysis. More importantly, rs8093763 is located on chromosome 18q21 and its closest gene is PMAIP1/Noxa, about 64 kb away. PMAIP1/Noxa was originally isolated as phorbol 12-myristate 13-acetate (PMA)-induced protein 1 (PMAIP1), potentially involved in leukemogenesis (18). Its physiologic function was not known until a decade later, when Oda et al. (19) discovered a mouse gene that was differentially expressed between γ-irradiated p53-wild-type and mutant mouse embryonic fibroblasts and named it Noxa (Latin for damage). PMAIP1 was then revealed as Noxa's human homologue. PMAIP1/Noxa is a Bcl2 homology domain 3 (BH3)-only protein that plays an important role in apoptosis. Extensive in vitro and in vivo studies have found that PMAIP1/Noxa is induced by multiple stimuli, including DNA damage (e.g. γ-irradiation and chemotherapeutic drugs), hypoxia and mitogenic stimulation (20). Since BLM is a radiomimetic drug that induces DNA-strand breaks, it is tempting to speculate that PMAIP1/Noxa is also inducible by BLM treatment. Preliminary pilot data from a GEO data set suggest that PMAIP1/Noxa is a BLM-inducible gene in BLM-sensitive cells, but the results did not reach statistical significance and need to be confirmed in larger studies. Another data set (GDS1492) compared the expression of PMAIP1/Noxa in vivo in lung tissues from mouse strains with different susceptibility to BLM-induced pulmonary fibrosis 3 and 6 weeks after BLM treatment (21). BLM appeared to increase the expression of PMAIP1/Noxa in the BLM-sensitive strain (C57BL/6J), but the results were difficult to interpret since it was an in vivo mouse study with only two mice in each strain (one male and one female). Future in vitro and in vivo functional studies with larger sample sizes are needed to confirm the inducibility of PMAIP1/Noxa by BLM.
Nevertheless, these results provide biological plausibility that PMAIP1/Noxa is a susceptibility locus for BLM sensitivity. It also remains to be studied whether rs8093763 affects PMAIP1/Noxa expression and whether there are other causal functional SNP(s) in PMAIP1/Noxa that are in LD with rs8093763. Interestingly, the MC4R gene, which plays an important role in energy balance and obesity, is located ~2.4 Mb from rs8093763. Two recent GWAS identified several genetic variants between the PMAIP1/Noxa and MC4R genes that were associated with waist circumference and BMI (22,23). Those SNPs were not in strong LD with rs8093763 and were not associated with BLM sensitivity (data not shown).
We also identified two other potential susceptibility loci for BLM sensitivity. Rs708547 is located in a region of high LD on chromosome 4q12. The two closest genes are C4orf14 (~8.2 kb away) and RE1-silencing transcription factor (REST) (~22.9 kb away). C4orf14 encodes a putative mammalian mitochondrial nitric oxide synthase that may regulate mitochondrial functions and apoptosis (24). REST is a zinc finger transcriptional repressor that regulates many genes and plays critical roles in embryonic development and a number of cellular responses in neurons and other cell types (25). Of particular interest, REST suppresses expression of Mad2, an essential component of the mitotic checkpoint/spindle-assembly checkpoint. REST aberrations resulted in shortened mitosis, premature sister-chromatid separation, chromosome bridges and other mitosis defects (26), providing biological plausibility for a role of this gene in influencing BICB. There were no apparent differences in REST expression between BLM-sensitive and BLM-insensitive lymphoblastoid cell lines, or before and after BLM treatment, in the GDS1714 data set (data not shown). Rs4662834 maps to a gene-poor region on chromosome 2q21. The nearest gene is heparan sulfate 6-O-sulfotransferase 1 (HS6ST1), approximately 402 kb from the SNP. Further validation with more samples is warranted to confirm the associations of these two SNPs with BLM sensitivity, identify causal SNPs and genes, and investigate the biological mechanisms underlying the validated associations.
Previous candidate gene studies have suggested correlations between a few potential functional polymorphisms in DNA repair and xenobiotic-metabolizing enzyme genes (e.g. GSTT1, GSTM1, XPD, XRCC1 and XRCC3) and BICB or micronuclei formation, but the sample sizes were small and the results were inconsistent (27–30). These potential functional polymorphisms were not covered in current SNP arrays; however, there are tagging SNPs from these genes. When we extracted tagging SNPs from 10 kb upstream and 10 kb downstream of these genes, none of the SNPs was among the top hits (P < 10⁻³) in our discovery phase. Many more samples are needed to validate or refute the associations of SNPs in these genes with BLM sensitivity.
In conclusion, in this multi-stage GWAS, we identified a biologically plausible genetic variant on 18q21 near the PMAIP1/Noxa gene that is associated with BLM sensitivity. We also found two additional potential genetic variants on chromosomes 4q12 and 2q21 that may be associated with BLM sensitivity and warrant further validation.
All the participants in this study were healthy controls of European ancestry from Kelsey Seybold Clinic, the largest multi-specialty physician group in the Houston metropolitan area. For the discovery phase, participants were enrolled as controls in a case control study of bladder cancer and involved in the primary scanning of a recently published GWAS of bladder cancer (16). Details of control ascertainment have been described previously (16). For the first validation phase, participants were enrolled as controls in a case control study of lung cancer and involved in the primary scanning of a recently published GWAS of lung cancer (17). The control recruitment of these two case control studies used the same infrastructure and procedure. For the second validation phase, participants were additional recent controls from these two ongoing case control studies who were not part of the published GWAS. After quality control, mutagen sensitivity data and complete demographic information were available for 673 individuals in the discovery set, 575 individuals in replication phase 1 and 259 individuals in replication phase 2. Informed consent was obtained from all study participants before the collection of epidemiological data and blood samples by trained MD Anderson staff interviewers.
Genotyping for the discovery set was performed using the Illumina HumanHap610 chip. We retained genotype data for 542 879 autosomal SNPs from 957 subjects that passed strict quality control measures for the SNPs and subjects (16). Genotyping for the replication phase 1 series was obtained from GWAS of lung cancer risk using Illumina's HumanHap300 BeadChip (17). We retained 303 669 autosomal SNPs after performing quality control measures for 1137 subjects using PLINK to exclude SNPs with call rate <95%, with minor allele frequency < 0.01, and with deviations from Hardy–Weinberg equilibrium (P < 0.0001). We removed 64 subjects that were either duplicated samples of the discovery set or genetically related samples with those in the discovery set identified by PLINK software (31). The resulting replication phase 1 contained 1073 subjects. We used TaqMan assays to perform genotyping for replication phase 2 using the 7900HT sequence detection system (Applied BioSystems, Foster City, CA, USA). We also performed direct genotyping of the four imputed SNPs in the first replication phase that showed significant association with BLM sensitivity.
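The three SNP-level filters described above (call rate, minor allele frequency and Hardy–Weinberg equilibrium) can be sketched as follows. This is an illustration under stated assumptions rather than PLINK's implementation: the function name and its genotype-count interface are hypothetical, and the Hardy–Weinberg check uses a Pearson chi-square approximation with 1 df, whereas PLINK's own test may differ (for example, an exact test).

```python
from math import erfc, sqrt

def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.95, min_maf=0.01, hwe_alpha=1e-4):
    """Apply call-rate, MAF and HWE filters to one SNP's genotype counts."""
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_called == 0 or n_called / n_total < min_call_rate:
        return False
    p = (2 * n_aa + n_ab) / (2 * n_called)  # frequency of allele A
    q = 1.0 - p
    if min(p, q) < min_maf:
        return False
    # Pearson chi-square (1 df) against Hardy-Weinberg expected counts.
    expected = (n_called * p * p, 2 * n_called * p * q, n_called * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    hwe_p = erfc(sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return hwe_p >= hwe_alpha
```

The defaults mirror the thresholds stated in the text; a SNP is kept only if it clears all three filters.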
Genotyping for the discovery phase and the first replication phase used two different genotyping platforms (HumanHap610 and HumanHap300, respectively). Nearly half of the top SNPs (P < 10⁻³) to be validated from the discovery phase were not covered by the HumanHap300 in the first validation phase. We imputed genotypes for SNPs that were genotyped in the HapMap project but were not on the HumanHap300 chip. The imputation was done using a hidden Markov model implemented in MACH (32). Phased haplotypes from HapMap CEU samples (HapMap release 22, build 36) were provided as the reference haplotypes to impute the unobserved genotypes based on the local haplotype information.
The standard procedure for the mutagen sensitivity assay using BLM as a challenge mutagen has been optimized and kept constant across different laboratories for over two decades (1–4,33). Briefly, 1 ml of each fresh whole blood sample was added to 9 ml of RPMI 1640 supplemented with 20% fetal bovine serum and 1% (volume for volume) phytohemagglutinin. After incubation at 37°C for 72 h, BLM was added to each culture for 5 h at a final concentration of 0.03 U/ml. During the last hour of BLM treatment, 0.04 µg/ml colcemid was added to the cells to arrest them in mitosis. The cells were then harvested and stained with Giemsa. The number of BICB was counted under a microscope in 50 metaphases per sample and was recorded as the mean number of breaks per cell. The assays are set up on a daily basis when the fresh blood sample is delivered to the lab and the readings have been highly consistent. A previous multicenter study (4) compared the mean number of BICB per cell between two major US institutions and one European institution under the above assay conditions and found that the readings were highly consistent across different institutions.
BICB was analyzed as a continuous variable. In the primary scan, we used linear regression, adjusting for age, gender, pack-years of smoking, slide reader and blood collection year, to compare the BICB by genotype of each autosomal SNP. The statistical significance of each SNP was determined by the lower P-value of the dominant model (comparing the BICB between common homozygous genotypes and heterozygous/rare homozygous genotypes) and the recessive model (comparing the BICB between common homozygous/heterozygous genotypes and rare homozygous genotypes). We chose the combination of dominant and recessive models since this yields optimal power to detect disease susceptibility loci under all of the three inheritance models, i.e. additive, dominant, or recessive, for quantitative traits (34). The genome-wide association was computed for the 539 437 autosomal SNPs with at least 20 subjects carrying one variant allele. If the number of homozygous variant genotypes was ≤ 20, we only computed results under the dominant model. For SNPs tested for both the dominant and recessive models, the multiplicity-adjusted P-value was computed by the Šidák procedure as 1 − (1 − Pmin)², where Pmin is the smaller of the two model P-values. For SNPs tested only for the dominant model, the multiplicity-adjusted P-value was the P-value for the dominant model. For replication phases 1 and 2, we performed linear regression while adjusting for age, gender, pack-years, slide reader and blood collection year using the most significant tests identified in the discovery set. To summarize results for the discovery set and the two replication sets, we performed meta-analysis for the most significant tests identified in the discovery set. All analyses were done with STATA software (version 10.1, STATA Corporation). Data manipulations were done using the R software package.
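The genotype codings for the two models and the Šidák adjustment for taking the smaller of two P-values can be sketched as below. The analyses in this study were run in STATA; this Python fragment only illustrates the arithmetic, and the function names are hypothetical.

```python
def dominant_code(n_variant_alleles):
    """1 if the genotype carries at least one variant allele, else 0."""
    return 1 if n_variant_alleles >= 1 else 0

def recessive_code(n_variant_alleles):
    """1 only for the homozygous variant genotype, else 0."""
    return 1 if n_variant_alleles == 2 else 0

def sidak_adjusted_p(p_dominant, p_recessive=None):
    """Multiplicity-adjusted P-value for one SNP.

    With both models tested, returns 1 - (1 - p_min)**2, written as
    p_min * (2 - p_min), which is algebraically identical but avoids
    rounding error for very small P-values. With only the dominant
    model tested, the dominant P-value is returned unchanged.
    """
    if p_recessive is None:
        return p_dominant
    p_min = min(p_dominant, p_recessive)
    return p_min * (2.0 - p_min)
```

For example, model P-values of 0.01 (dominant) and 0.2 (recessive) give an adjusted value of 1 − 0.99² = 0.0199, roughly twice the smaller P-value.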
To assess the effect of population substructure, we used linear regression to perform two separate analyses for the discovery data, replication phase 1 and pooled data: one including strata identified by PLINK as covariates; the other including four eigenvectors from EIGENSTRAT (35). PLINK clusters individuals into homogeneous groups using genetic similarity based on pairwise identity-by-state distance. EIGENSTRAT analysis uses a principal component method to detect subpopulations. We used the eigenvectors associated with the top four eigenvalues, determined from the scree plot of all the available eigenvalues in descending order obtained from the principal component analysis. The LD structure of the neighboring region containing loci associated with BICB was inferred by Haploview software (v4.1) (36). The screen shot of all the known genes in the region was obtained from the UCSC genome browser (http://genome.ucsc.edu) (37).
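Pairwise identity-by-state (IBS) similarity, on which the PLINK clustering is based, is the average proportion of alleles two individuals share across SNPs. The sketch below illustrates that quantity only; it does not reproduce PLINK's clustering algorithm, and the function name and the genotype encoding (variant-allele counts 0, 1 or 2, with None for a missing call) are assumptions made for the example.

```python
def ibs_similarity(geno_a, geno_b):
    """Proportion of alleles shared identical-by-state by two individuals.

    Each genotype list holds variant-allele counts (0, 1, 2) per SNP;
    None marks a missing call, and such SNPs are skipped.
    """
    shared = 0
    compared = 0
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this SNP
        compared += 2
    if compared == 0:
        raise ValueError("no SNPs with calls in both individuals")
    return shared / compared
```

The corresponding distance is one minus this similarity, so identical genotype vectors have distance 0 and opposite homozygotes at every SNP have distance 1.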
This work was supported by the National Institutes of Health (CA127615 and CA74880 to X.W., CA133996 to C.I.A. and CA055769 to M.R.S.).
We thank the genotyping personnel, study coordinators and interviewers for performing experiments and recruiting participants. We are especially thankful for all the study participants, who make population based research possible.
Conflict of Interest statement. The authors declare no conflict of interest.