Despite limited discovery stages (<1,125 cases), genome-wide association studies (GWAS) have successfully identified 13 loci associated with risk of chronic lymphocytic leukemia/small lymphocytic lymphoma (CLL). To identify additional CLL susceptibility loci, we conducted the largest meta-analysis, to date, including four GWAS totaling 3,100 CLL cases and 7,667 controls with genotype data. In the meta-analysis, we discovered ten independent SNPs in nine novel loci at 10q23.31 (ACTA2/FAS; P=1.22×10−14), 18q21.33 (BCL2; P=7.76×10−11), 11p15.5 (C11orf21; P=2.15×10−10), 4q25 (LEF1; P=4.24×10−10), 2q33.1 (CASP10/CASP8; P=2.50×10−9), 9p21.3 (CDKN2B-AS1; P=1.27×10−8), 18q21.32 (PMAIP1; P=2.51×10−8), 15q15.1 (BMF; P=2.71×10−10), and 2p22.2 (QPCT; P=1.68×10−8) as well as an independent signal at an established locus (2q13, ACOXL, P=2.08×10−18). We also found evidence for two additional promising loci that reached marginal genome-wide significance (P<2.0×10−7) at 8q22.3 (ODF1; P=5.40×10−8) and 5p15.33 (TERT; P=1.92×10−7). Although further studies are required, proximity of several of these loci to genes involved in apoptosis suggests a plausible underlying biological mechanism.
CLL is a B-cell malignancy with a strong familial component1 and an ~8.5-fold increased relative risk in first-degree relatives.2 Previous CLL GWAS have identified 13 loci that explain a portion of the familial risk,3–6 suggesting that additional loci of modest effects can be found using a larger discovery sample size.7
As part of a larger initiative in non-Hodgkin lymphoma (NHL) (called the NHL-GWAS), we genotyped 2,343 CLL cases and 2,854 controls of European descent from 22 studies using the Illumina OmniExpress Beadchip (see Online Methods and Supplementary Table 1). Of those 5,197 subjects, 94% passed rigorous quality control criteria (see Online Methods and Supplementary Table 2) and 549,934 SNPs successfully passed quality control criteria with a median call rate >98%. We also utilized genotype data previously generated on the Illumina Omni2.5 from an additional 3,536 controls and one case from three studies8 giving a total of 2,179 cases and 6,221 controls for the analysis of the NHL-GWAS (Supplementary Table 3).
In the NHL-GWAS (Stage 1) analysis, we observed an enrichment of SNPs with small P-values compared to the null distribution with a lambda of 1.026 in the Q-Q plot (Supplementary Figure 1). After exclusion of previously established loci, an excess of small P-values still remained suggesting additional novel loci were yet to be discovered. In our Stage 1 analyses, we observed SNPs from 10 unique loci (defined as separated by at least 500kb and linkage disequilibrium (LD) r2<0.05), which reached genome-wide significance (P<5×10−8), including eight established loci and two novel loci (Supplementary Figure 2).
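As an illustration of the inflation statistic quoted above, the following minimal sketch computes a genomic inflation factor (lambda) from a list of association P-values. It assumes 1-degree-of-freedom tests and the conventional median-based definition; the function names are hypothetical and this is not the software used in the study.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2_1df(p):
    """Convert a two-sided P-value (0 < p <= 1) to a 1-df chi-square statistic."""
    # Use the lower tail (p / 2) so that very small P-values keep precision.
    z = _STD_NORMAL.inv_cdf(p / 2.0)
    return z * z


def genomic_inflation(p_values):
    """Lambda: median observed chi-square divided by the null median."""
    stats = [p_to_chi2_1df(p) for p in p_values]
    return median(stats) / CHI2_1DF_MEDIAN
```

Under this definition, a value close to 1 (such as the 1.026 reported for Stage 1) indicates little systematic inflation of the test statistics.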
We then performed a meta-analysis of the NHL-GWAS with three other independent CLL GWAS5,9 that had a combined total of 921 CLL cases and 1,446 controls (Stage 2, Supplementary Tables 1 and 3). Because these other CLL GWAS studies were conducted on different commercial SNP microarrays, we imputed common SNPs from the 1000 Genomes Project10 using IMPUTE211 (Online Methods, Supplementary Table 4). In the meta-analysis of stages 1 and 2 data, associations for all 13 established loci showed a consistent direction of effect with previously reported studies, and 10 loci achieved P<5×10−8 (Supplementary Table 5). However, two previously established loci, 15q25.2 and 19q13.3, were only nominally significant in the meta-analysis (P=0.03, and P=0.008, respectively), and no significant association was observed in stage 1 for the 15q25.2 locus (P=0.10). A suggestive locus on 18q21.1 that had not met genome-wide significance in prior studies12 was also nominally significant (P=5.06×10−4) herein. From the meta-analysis of stages 1–2, we identified 10 promising SNPs in the eight novel loci and one promising SNP in an established locus that we carried forward for a de novo replication in stage 3: this included an additional 392 cases and 4561 controls and in silico replication in an independent CLL GWAS with 396 cases and 311 controls (see Online Methods and Supplementary Tables 1, 3, and 4).
Seven of the 10 SNPs in novel loci reached genome-wide significance in the meta-analysis of all three stages: 10q23.31 (ACTA2/FAS; P=1.22×10−14), 18q21.33 (BCL2; P=2.66×10−12), 11p15.5 (C11orf21; P=2.15×10−10), 4q25 (LEF1; P=4.24×10−10), 2q33.1 (CASP10/CASP8; P=2.50×10−9), 9p21.3 (CDKN2B-AS1; P=1.27×10−8), and 18q21.32 (PMAIP1; P=2.51×10−8) (Table 1, Figure 1). Further, within the 18q21.33 locus, a second SNP (rs4987852) in low LD (r2=0.01) with rs4987855 and located only 372 bp away, also reached genome-wide significance (Table 1, P =7.76×10−11); this SNP was determined to be independent in conditional analyses (Pconditional =3.87×10−7, Table 2).
To explore these regions in greater detail and identify additional loci that we may have missed using just the genotyped SNPs in Stage 1, we imputed Stage 1 of our NHL-GWAS using the 1000 Genomes Project10 data (February 2012 release) and performed a meta-analysis of the results from stage 1 and stage 2. The most significant SNPs at three of our novel loci, 10q23.31 (rs2147420), 18q21.33 (rs4987856), and 4q25 (rs2003869), were highly correlated (r2 ≥0.95) with our strongest genotyped SNPs, rs4406737, rs4987855, and rs898518, respectively (Supplementary Table 6). Only modest correlation (r2 range: 0.18–0.58) was observed between the most significant imputed SNPs at 11p15.5 (rs2521269), 2q33.1 (rs11688943), and 9p21.3 (rs1359742) and our strongest genotyped SNPs in each of the respective regions. The most significant of the imputed SNPs at 18q21.32 (rs35748167) appeared to be independent of our strongest genotyped SNP (rs4368253, r2=0.003, Pconditional < 7.89×10−7 for both SNPs), suggesting a possible second, independent signal (Table 2).
Meta-analysis of our imputed scan data revealed two novel loci, 15q15.1 (BMF; P=2.71×10−10) and 2p22.2 (QPCT; P=1.68×10−8) (Table 1, Figure 1). In addition, although our genotyped SNP at 5p15.33 (TERT, rs10069690, P=1.92×10−7) (Supplementary Table 7) did not reach genome-wide significance, we did observe an imputed SNP in this region that reached genome-wide significance (rs7705526; P=3.75×10−8). Another promising locus was observed at 8q22.3 (ODF1; P=5.40×10−8) (Supplementary Table 7). Additional studies are needed to confirm these findings, particularly the signal on 5p15.33, which is already known to harbor risk variants for multiple cancers.13–20
An examination of established loci revealed a new SNP in 2q13 (BCL2L11, rs13401811, P=6.09×10−17; Table 1, Figure 2) that was independent of the previously reported SNP. After conditioning on the established 2q13 SNP (rs17483466, r2=0.02), the new SNP rs13401811 remained strongly associated with CLL risk (Pconditional=1.60×10−12, Table 2). A putative second signal was observed at the established 2q37.3 locus (Supplementary Table 5, rs7578199, P =5.39×10−7) that was in low LD (r2=0.01) and independent of the previously reported rs757978 SNP (Pconditional=6.10×10−6, Table 2), although rs7578199 was not genome-wide significant. Another possible second signal was observed on 6p21.32 (Supplementary Table 5, HLA, rs9273363, P=2.24×10−10). Rs9273363 showed some evidence of conditional independence with the originally reported SNPs (r2≤0.25, Pconditional ≤3.50×10−9, Table 2); however, it may be part of a shared HLA haplotype; thus accurate HLA typing is needed to further clarify its level of independence. Finally, we observed a SNP at 15q21.3 (Supplementary Table 5, rs11636802, P=1.68×10−13) that had stronger statistical significance than that of the previously reported SNP, rs7169431 (P=1.72×10−05). Although only modestly correlated (r2=0.16), rs11636802 explained all of the risk associated with rs7169431 in a conditional analysis (Table 2) suggesting that this SNP may be a better marker for the locus.
Heritability analysis indicated that the ten independent SNPs in our novel loci together with the new independent SNP at 2q13 (Table 1) explain approximately 5% more of the familial risk in addition to ~12% for the established loci. When we explored the contribution of all common variants to the genetic heritability of CLL (using a method that estimates the variance explained by fitting all genotyped autosomal SNPs simultaneously21,22; Online Methods), we estimated that common SNPs have the potential to explain up to ~46% of the familial risk, suggesting that more common loci, likely of small effects, are yet to be discovered. However, the analysis also implies that common SNPs probably do not explain all of the familial risk and that other factors, such as uncommon SNPs with modest effects or rare highly penetrant variants, are likely to also play a role.
Five of the novel loci (10q23.31, 18q21.33, 2q33.1, 18q21.32, and 15q15.1) identified in this study as well as the new SNP at the established 2q13 locus are located in or near genes involved in apoptosis. Rs4406737 is located on 10q23.31 between the first and second exons of FAS, a member of the tumor necrosis factor receptor superfamily that has a crucial role in the initiation of the signaling cascade of the caspase family in apoptosis. Mutations in FAS leading to defective Fas-mediated apoptosis have been documented in inherited lymphoproliferative disorders associated with autoimmunity,23,24 and families with germline FAS mutations have a substantially increased risk of other lymphoma subtypes.25
The two newly identified SNPs at 18q21.33 (rs4987855 and rs4987852) map to the 3′-UTR of B-cell CLL/lymphoma 2 (BCL2), which encodes an essential outer mitochondrial membrane protein that blocks lymphocyte apoptosis. Constitutive expression of BCL2 through t(14;18) and other translocations is common in follicular lymphomas, but the translocation is also seen in CLL, albeit rarely.26 Both SNPs are located within a narrow region of BCL2 where the majority of t(14;18) translocation breakpoints occur.27 rs4987855 is in linkage disequilibrium with a SNP (rs4987856, r2=1.0) that is located within 200bp of a putative microRNA binding site for mir-19528 and was found to be nominally correlated with BCL2 expression (Supplementary Table 8, P=0.02)29. Forced overexpression of BCL2 in mice leads to an increased incidence of B-cell lymphomas.30
The novel SNPs at 18q21.32 and 15q15.1 as well as the new SNP at the established 2q13 locus are located near Bcl-2 family member genes. Rs4368253 is located approximately 51kb downstream from phorbol-12-myristate-13-acetate-induced protein 1 (PMAIP1), which encodes the proapoptotic BCL2 protein, NOXA. Regulation of apoptosis through NOXA is critical for B-cell expansion after antigen triggering.31 Down-regulation of NOXA contributes to the persistence of CLL B-cells in the lymph node environment.32 Rs8024033 is located approximately 5.4kb upstream of Bcl-2 modifying factor (BMF), which encodes an apoptotic activator that binds to BCL2 proteins. BMF has been implicated in the survival of chronic lymphocytic leukemia cells33, and loss of BMF in mice leads to B-cell hyperplasia and an accelerated development of radiation-induced thymic lymphomas34. The new SNP (rs13401811) at 2q13, a locus previously implicated in risk of CLL3,35,36 and more generally B-cell non-Hodgkin lymphomas,37 is located approximately 262kb upstream of BCL2-like 11 (BCL2L11). BCL2L11 encodes a pro-apoptotic member of the BCL2 family, BIM, which plays a key role in the regulation of apoptosis in T- and B-cell homeostasis. Loss of BIM accelerates Myc-induced leukemia in mice,38 and this SNP has been previously reported to be nominally associated with CLL in a small candidate gene study.39
The novel 2q33.1 SNP (rs3769825) resides in intron 2 of caspase-8 (CASP8) and is in LD with a missense SNP (rs13006529, r2=0.71) in the nearby caspase-10 (CASP10) (Supplementary Table 9), both of which play a central role in cell apoptosis. SNPs within this region have been associated with breast cancer,40 esophageal cancer,41 and melanoma42 susceptibility. SNPs in CASP8/CASP10, including one in moderate LD with ours (rs11674246, r2=0.66), were previously nominally associated with CLL risk in smaller case-control studies.43,44
The remaining four novel loci (11p15.5, 4q25, 9p21.3 and 2p22.2) map to other biologically interesting genes. The 4q25 SNP, rs898518, is located between the fourth and fifth exons of lymphoid enhancer-binding factor 1 (LEF1), which encodes a transcription factor involved in the Wnt signaling pathway, an essential component for the normal homeostasis of hematopoietic stem cells.45 Aberrant protein expression of LEF1 has been observed in CLL cells as well as monoclonal B-cell lymphocytosis, suggesting that LEF1 plays an early role in CLL leukemogenesis.46 Rs1679013 maps to an inter-genic region on 9p21.3, roughly 200kb upstream from CDKN2B-AS1, an antisense non-coding RNA implicated in the risk of acute lymphocytic leukemia.47 The 2p22.2 SNP (rs3770745) is located approximately 52kb upstream of protein kinase D3 (PRKD3), which interacts with the transcriptional repressor B-cell lymphoma 6 (BCL-6). Lastly, the 11p15.5 region contains many imprinted genes and has been implicated in Beckwith-Wiedemann syndrome,48 a disorder characterized by excessive growth and a high incidence of childhood tumors.49
In conclusion, our large GWAS of CLL identified ten SNPs in nine novel loci and one new independent SNP in a previously discovered locus. Together with the previously established loci, the cumulative set of SNPs corresponds to an area-under-the-curve (AUC) of 0.73. Although further studies are required to fine-map the regions, the proximity of several of these loci to genes involved in apoptosis suggests a possible underlying mechanism of biological relevance. Our results further support a substantial contribution of common gene variants in the pathogenesis of CLL.
As part of a larger initiative, we conducted a genome-wide association study (GWAS) of CLL using cases and controls of European descent from 22 studies of non-Hodgkin lymphoma (NHL) (Supplementary Table 1), including nine prospective cohort studies, eight population-based case-control studies, and five clinic or hospital-based case-control studies. All studies obtained informed consent from their participants and approval from their respective Institutional Review Boards for this study. As described in Supplementary Table 1, cases were ascertained from cancer registries, clinics or hospitals, or through self-report verified by medical and pathology reports. The phenotype information for all NHL cases was reviewed centrally at the International Lymphoma Epidemiology Consortium (InterLymph) Data Coordinating Center and harmonized according to the hierarchical classification proposed by the InterLymph Pathology Working Group based on the World Health Organization (WHO) classification (2008).50,51
All CLL cases with sufficient DNA (n=2,343) and a subset of available controls frequency-matched by age and sex to cases (n=2,854) including 4% quality control duplicates were genotyped on the Illumina OmniExpress at the NCI Cancer Genomic Research Laboratory (CGR). Genotypes were called using Illumina GenomeStudio software, and quality control duplicates showed >99% concordance. Extensive quality control metrics were applied to the data. Monomorphic SNPs and SNPs with a call rate <93% were excluded. Samples with a call rate ≤93%, mean heterozygosity <0.25 or >0.33 based on the autosomal SNPs, or gender discordance (>5% heterozygosity on X chromosome for males and <20% heterozygosity on the X chromosome for females) were excluded. Unexpected duplicates (>99.9% concordance) and first-degree relatives based on identity by descent (IBD) sharing with Pi-hat>0.40 were removed. Ancestry was assessed using the GLU struct.admix module based on the method proposed by Pritchard et al,52 and participants with <80% European ancestry were excluded (Supplementary Figure 3). After exclusions, 2,178 (93%) cases and 2,685 (94%) controls remained (Supplementary Table 2). Genotype data previously generated on the Illumina Omni2.5 from additional 3,536 controls and 1 case from three of the studies (ATBC, CPSII, and PLCO) were also included,8 resulting in a total of 2,179 cases and 6,221 controls for the stage 1 analysis. Of these additional controls, 703 (~235 from each study) were selected to be representative of their cohort and cancer-free8. The remaining 2,823 controls were cancer-free controls from an unpublished study of prostate cancer in PLCO. SNPs with call rate <99%, with Hardy-Weinberg equilibrium P-value<1×10−6 or minor allele frequency <1% were excluded from analysis, leaving 549,934 SNPs for analysis. 
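The sample-level and SNP-level exclusion thresholds described above can be summarized as simple predicates. The sketch below is illustrative only: the study applied these filters with its own tooling, and the `Sample` and `Snp` records and the function names are hypothetical. The sex-discordance, duplicate, and relatedness checks are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    call_rate: float          # fraction of SNPs successfully called (0 to 1)
    heterozygosity: float     # mean heterozygosity over autosomal SNPs
    european_ancestry: float  # estimated European ancestry fraction (0 to 1)


@dataclass
class Snp:
    call_rate: float  # fraction of samples successfully called (0 to 1)
    hwe_p: float      # Hardy-Weinberg equilibrium P-value
    maf: float        # minor allele frequency


def sample_passes(s):
    """Keep samples with call rate >93%, heterozygosity in [0.25, 0.33],
    and at least 80% estimated European ancestry."""
    return (
        s.call_rate > 0.93
        and 0.25 <= s.heterozygosity <= 0.33
        and s.european_ancestry >= 0.80
    )


def snp_passes(v):
    """Keep SNPs with call rate >=99%, HWE P >= 1e-6, and MAF >= 1%."""
    return v.call_rate >= 0.99 and v.hwe_p >= 1e-6 and v.maf >= 0.01
```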
To evaluate population substructure, a principal components analysis (PCA) was performed using the Genotyping Library and Utilities (GLU), version 1.0, struct.pca module, which is similar to EIGENSTRAT.53 Plots of the first ten principal components are shown in Supplementary Figure 4. Association testing was conducted assuming a log-additive genetic model, adjusting for age, sex, and significant principal components. All data analysis and management was conducted using GLU.
Three independent CLL GWAS provided genotype data for a meta-analysis (Supplementary Table 1). In all three studies, subjects with a genotyping call rate <95%, duplicates, related individuals, and SNPs with a call rate <95% were removed prior to imputation (Supplementary Table 4). Imputation was conducted separately for each study using IMPUTE211 and a hybrid of the 1000 Genomes Project version 2 (February 2012 release) and Division of Cancer Epidemiology and Genetics (DCEG) European reference panels.8,10 SNPs were imputed for a total of 921 cases and 1446 controls. Association testing was conducted for each study using SNPTEST version 2, adjusting for age, sex, and significant principal components for GEC and UCSF2. No principal components were significant for the Utah study.
In stage 3, 10 SNPs in the most promising loci and one SNP from an established locus were taken forward for de novo replication in an additional 392 cases and 4561 controls from the NCI replication study (NCI Rep) and from the Utah/Sheffield Chronic Lymphocytic Leukemia study (Utah-Sheffield) (Supplementary Table 1). Additionally, these 10 SNPs were also taken forward in an in silico replication in 396 CLL cases and 311 controls from the International Cancer Genome Consortium (ICGC) (Supplementary Table 1). Genotyping for the NCI Rep study was conducted using custom TaqMan genotyping assays (Applied Biosystems) at the NCI Core Genotyping Resource and genotyping for the Utah-Sheffield study was conducted at the Core Research Facilities at the University of Utah. Blind duplicates (~5%) yielded 100% concordance. The ICGC study provided results for eight SNPs (or proxies) that were genotyped on the Affymetrix 6.0 SNP microarray (Supplementary Table 4). Association results for the NCI Rep and Utah-Sheffield studies were adjusted for age and sex, and results from the ICGC were adjusted for age, sex, and significant principal components. A comparison of the genotyping calls from the OmniExpress microarray and confirmatory TaqMan assays (n=384) yielded 99.9% concordance.
Meta-analyses were performed using the fixed effects inverse variance method based on the beta estimates and standard errors from each study. For all SNPs in Tables 1 and 2, no substantial heterogeneity was observed among studies in stage 1 or among studies in stages 1–3 combined after Bonferroni correction (Pheterogeneity ≥ 0.02 for all SNPs).
Using 1000 Genomes data, we identified SNPs with r2>0.7 with our lead SNP that were reported to be non-synonymous or nonsense variants. We utilized HaploReg54, which is a tool for exploring non-coding functional annotation using ENCODE data, to evaluate the genome surrounding our SNPs (Supplementary Table 9). In addition, we evaluated cis associations between all novel and promising SNPs discovered in this study and the expression of nearby genes in lymphoblastoid cell lines from subjects of European descent from three publicly available datasets29,55,56 (Supplementary Table 8).
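For readers unfamiliar with the fixed effects inverse variance method, the following minimal sketch combines per-study log-odds ratios and standard errors for one SNP. The function name is hypothetical. The heterogeneity statistic shown is Cochran's Q, which is an assumption here because the text does not name the specific heterogeneity test that was used.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis for one SNP.

    betas: per-study log-odds ratios; ses: matching standard errors.
    Returns (pooled beta, pooled SE, two-sided P-value, Cochran's Q).
    """
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided normal P-value; erfc stays accurate for very large |z|.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q (assumed test): chi-square with k-1 df under homogeneity.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, p, q
```

Each study contributes in proportion to the inverse of its variance, so larger studies with smaller standard errors dominate the pooled estimate.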
To evaluate the familial risk explained by the novel loci identified in this study, we estimated the contribution of each SNP to the heritability using the equation7, $h^2_{\mathrm{SNP}} = \beta^2 \cdot 2f(1-f)$, where β is the log-odds ratio per copy of the risk allele and f is the allele frequency, and then summed the contributions of all novel SNPs. Using the equation derived by Pharoah et al57 to estimate the total heritability from the sibling relative risk (RR=8.5 from Goldin et al2), we then calculated the proportion of familial risk explained by dividing the summed contributions of the novel SNPs by the total heritability.
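The calculation described above can be sketched as follows. The per-SNP term follows the equation given in the text. The conversion from sibling relative risk to total heritability is an assumption: the text cites the equation without stating it, and the sketch uses the log-normal polygenic form, in which the total variance equals twice the natural log of the sibling relative risk. Function names and any input values are hypothetical.

```python
import math


def snp_heritability(beta, f):
    """Per-SNP contribution: beta^2 * 2f(1 - f).

    beta: log-odds ratio per copy of the risk allele; f: allele frequency.
    """
    return beta ** 2 * 2.0 * f * (1.0 - f)


def total_heritability(sibling_rr):
    """ASSUMPTION: log-normal polygenic model, variance = 2 * ln(sibling RR)."""
    return 2.0 * math.log(sibling_rr)


def proportion_familial_risk(snps, sibling_rr=8.5):
    """snps: iterable of (beta, f) pairs; returns the fraction explained."""
    explained = sum(snp_heritability(beta, f) for beta, f in snps)
    return explained / total_heritability(sibling_rr)
```

Because the per-SNP contributions are simply summed, the sketch treats the SNPs as independent, consistent with restricting the calculation to independent signals.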
To estimate the contribution of all common SNPs to familial risk, we used the method proposed by Yang et al21, which was extended to dichotomous traits22 and implemented in the Genome-wide Complex Trait Analysis (GCTA) software.58 The genetic similarity matrix was estimated from our discovery scan using all genotyped autosomal SNPs with a minor allele frequency >0.01. We used restricted maximum likelihood (REML), the default option for GCTA, to fit the appropriate variance components model that included the top 10 eigenvectors as covariates. The final estimate of heritability on the underlying liability scale assumed that the lifetime risk of CLL was 0.005. From this estimate, we calculated the proportion of familial risk explained based on a familial relative risk of 8.5. Details of fitting the variance components model and transforming from the observed to liability scale have been previously documented.22
To identify recombination hotspots in the region we used SequenceLDhot59, a program that uses the approximate marginal likelihood method60 and calculates likelihood ratio statistics at a set of possible hotspots. We tested five unique sets of 100 control samples. The PHASE v2.1 program was used to calculate background recombination rates61,62 and the LD heatmap was visualized in r2 using the snp.plotter program.63
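The observed-to-liability scale transformation mentioned above can be illustrated with a short sketch. The formula is the commonly used one for case-control data under a liability threshold model and is stated here as an assumption, since the text defers the details to its cited reference. The function name is hypothetical, and the sketch is not a substitute for the GCTA implementation.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, prevalence, case_fraction):
    """Rescale observed-scale heritability to the liability scale.

    prevalence (K): population lifetime risk of disease.
    case_fraction (P): proportion of cases in the analysed sample.
    Assumed formula: h2_l = h2_o * [K(1-K)]^2 / (z^2 * P(1-P)),
    where z is the standard normal density at the liability threshold.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(threshold)
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))
```

For example, with the lifetime risk of 0.005 assumed in the text and the stage 1 sample of 2,179 cases and 6,221 controls, `observed_to_liability(h2, 0.005, 2179 / 8400)` rescales an observed-scale estimate `h2` to the liability scale.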
We thank C. Allmer, E. Angelucci, A. Bigelow, I. Brock, K. Butterbach, A. Chabrier, D. Chan-Lam, J.M. Conners, D. Connley, M. Cornelis, K. Corsano, C. Dalley, D. Cox, H. Cramp, R. Cutting, H. Dykes, L. Ershler, A. Gabbas, R.P. Gallagher, R.D. Gascoyne, P. Hui, L. Irish, L. Jacobus, S. Kaul, J. Lunde, M. McAdams, R. Montalvan, M. Rais, T. Rattle, L. Rigacci, K. Snyder, G. Specchia, M. Stagner, P. Taylor, G. Thomas, C. Tornow, G. Wood, M. Yang, M. Zucca. The overall GWAS project was supported by the intramural program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, U.S. National Institutes of Health. A full list of acknowledgements is provided in the Supplementary Note.
AUTHORS’ CONTRIBUTIONS
S.I.B., C.F.S., N.J.C., A.N., W.C., S.S.W., L.R.T., A.R.B.W., P.H., M.P.P., B.M.B., B.K.A., P.C., Y.Z., G.S., A.Z.J., C.L., K.E.S., J.M., P.V., J.J.S., A.K., S.S., H.H., J.R.C., S.J.C., N.R. and S.L.S. organized and designed the study. C.F.S., N.J.C., B.J., L.B., J.Y., A.H., L.C., P.M.B., E.A.H., J.M.C., J.R.C., S.J.C. and S.L.S. conducted and supervised the genotyping of samples. S.I.B., C.F.S., V.J., N.J.C., Z.W., N.C., C.C.C., M.Y., K.B.J., L.L., J.S., J.P., J.R.C., L.C., S.J.C., N.R. and S.L.S. contributed to the design and execution of statistical analysis. S.I.B., C.F.S., V.J., N.J.C., A.N., Z.W., W.C., A.M., R.S.K., N.C., C.C.C., M.Y., C.L., H.H., J.R.C., S.J.C., N.R. and S.L.S. wrote the first draft of the manuscript. S.I.B., C.F.S., V.J., N.J.C., A.N., W.C., A.M., S.S.W., R.S.K., Q.L., L.R.T., A.R.B.W., P.H., M.P.P., B.M.B., B.K.A., P.C., Y.Z., G.S., A.Z.J., T.G.C., T.D.S., A.J.N., N.E.K., M.L., A.H.W., K.E.S., H.O.A., M.M., B.G., E.T.C., M.G., K.C., L.A.C.A., B.J., W.R.D., B.K.L., G.J.W., L.C., P.M.B., J.R., E.A.H., M.T.S., R.D.J., L.F.T., S.D.S., Y.B., N.B., P.B., P.B., L.F., M.M., J.M., A.S., K.G.R., S.J.A., C.M.V., L.R.G., S.S.S., M.C.L., L.G.S., J.F.L., J.M.C., J.B.W., V.A.M., N.E.C., A.N., M.S.L., A.J.D.R., L.M.M., R.K.S., E.R., P.V., R.K., D.T., G.M., E.W., M.D.C., R.C.H.V., R.C.T., G.G.G., D.A., J.V., S.W., J.C., T.Z., T.R.H., K.O., A.Z., R.J.K., J.J.S., K.A.B., F.L., E.G., P.K., A.K., J.T., C.M.V., M.G.E., G.M.F., L.M., L.L., J.S., S.C., J.F.F., K.E.N., A.C., J.S., J.W., A.C., C.L.O., S.B., I.S., D.M., E.C., H.H., J.R.C., N.R. and S.L.S. conducted the epidemiological studies and contributed samples to the GWAS and/or follow-up genotyping. All authors contributed to the writing of the manuscript.
The authors declare no competing financial interests.