Recent genomic profiling of childhood acute lymphoblastic leukemia (ALL) identified a novel high-risk subtype with a gene expression signature resembling Philadelphia chromosome-positive ALL and a poor prognosis (Ph-like ALL). However, the role of inherited genetic variation in Ph-like ALL pathogenesis remains unknown. In a genome-wide association study (GWAS) of 511 ALL cases and 6,661 non-ALL controls, we identified a single susceptibility locus for Ph-like ALL (GATA3, rs3824662, P=2.17×10−14, odds ratio [OR]=3.85, for Ph-like ALL vs. non-ALL; P=1.05×10−8, OR=3.25, for Ph-like ALL vs. non-Ph-like ALL) that was independently validated. The rs3824662 risk allele was associated with somatic lesions underlying Ph-like ALL (i.e., CRLF2 rearrangement, JAK mutation, and IKZF1 deletion) and directly influenced GATA3 transcription. Finally, GATA3 SNP genotype was also associated with early treatment response and the risk of ALL relapse. Our results provide insights into interactions between host and tumor genomes and their importance in ALL pathogenesis and prognosis.
Progressive intensification and risk-adapted chemotherapy have improved the 5-year survival rate of childhood ALL to over 85% in most developed countries1. However, prognosis remains poor for approximately 20% of patients with high-risk features (e.g., older age and higher leukocyte count at diagnosis, Philadelphia chromosome-positive [Ph+] ALL)2-5.
Recent genomic profiling studies have revealed the remarkable heterogeneity of childhood ALL with more granular classification of molecular subtypes. Up to 15% of childhood B-lineage ALL cases exhibit a gene expression signature similar to that of Ph+ ALL6-9. Defined by this common expression profile, the “Ph-like” ALL subtype has a range of structural genetic alterations in the tumor genome that activate lymphoid development, cytokine receptor, and kinase signaling pathways. Ph-like ALL commonly harbors somatic IKZF1 deletion or mutation6,9. Up to 50% of Ph-like ALL cases carry CRLF2 rearrangements, with concurrent JAK mutations in approximately half of CRLF2-related cases8,10. Ph-like ALL cases without CRLF2 alterations harbor a range of genomic lesions targeting cytokine receptors and tyrosine kinases7. Importantly, Ph-like ALL is associated with a high risk of relapse 6,8,11.
GWAS have identified germline single nucleotide polymorphisms (SNPs) in ARID5B, IKZF1, CEBPE, PIP4K2A, and CDKN2A/CDKN2B that strongly influence susceptibility to childhood ALL12-15. In fact, children carrying the ARID5B variants not only are more likely to develop ALL in general, but are at a particularly high risk of having hyperdiploid ALL14,16,17, implying interactions between inherited and acquired genetic variations during leukemogenesis. Similarly in myeloproliferative neoplasms, germline variation at the JAK2 locus was linked to somatic JAK2V617F mutation18-20. Together, these observations indicate that both germline and somatic genetic variations play critical roles in tumor pathogenesis.
To this end, we conducted a GWAS of Ph-like ALL to identify germline genetic variants related to susceptibility to this ALL subtype, and to evaluate their association with somatic lesions underlying Ph-like ALL and with the risk of relapse.
In the discovery GWAS, we compared genotype frequency at 718,890 SNPs between 75 children with Ph-like ALL from the Children’s Oncology Group (COG) AALL0232 cohort and 6,661 non-ALL controls (Supplementary Fig.1). After adjusting for genetic ancestry, two SNPs at 10p14 within the GATA3 gene reached genome-wide significance: rs3824662 (P=2.17×10−14, OR=3.85 [95%CI, 2.71 to 5.47]) and rs3781093 (P=4.94×10−12, OR=3.45 [2.42 to 4.93], Table 1 and Fig. 1). These two SNPs were in strong linkage disequilibrium (LD, r2=0.94, D’=1 in HapMap CEU, Supplementary Fig. 2), representing a single susceptibility locus. The A allele at rs3824662 and the C allele at rs3781093 were over-represented in Ph-like ALL, conferring increased disease risk across ethnicity (Table 1 and Supplementary Fig. 3). We next performed a second GWAS comparing children in the COG AALL0232 cohort who had the Ph-like expression profile (N=75) with those who did not have the Ph-like profile (“non-Ph-like”, N=436). After adjusting for genetic ancestry, the same GATA3 SNPs, rs3824662 and rs3781093, exhibited the strongest association across the genome (P=1.05×10−8, OR=3.25 [2.16 to 4.89], and P=2.62×10−7, OR=2.89 [1.92 to 4.34], respectively, Table 1 and Supplementary Figs. 3 and 4). Imputation of genotypes at 37,493 additional SNPs at this locus (chr10: 60,523 to 10,060,447) did not reveal any variants with a stronger association with Ph-like ALL than the original GWAS hits (Supplementary Fig. 5).
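For readers less familiar with the LD statistics quoted above (r2 and D’), the following minimal Python sketch shows how D, D’, and r2 can be computed from two-SNP haplotype frequencies. The haplotype and allele frequencies are hypothetical values chosen for illustration (they are not HapMap data), and the function name `ld_stats` is introduced here only for this example.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: marginal frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # D' rescales D by its maximum possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two alleles
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies (illustrative only, not study data)
d, d_prime, r2 = ld_stats(p_ab=0.15, p_a=0.15, p_b=0.16)
print(f"D={d:.3f}, D'={d_prime:.2f}, r2={r2:.2f}")
```

In this hypothetical case one of the four possible haplotypes is absent, so D’ equals 1 even though r2 is below 1, which is the same qualitative pattern as the reported r2=0.94 and D’=1.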
To validate the association of GATA3 SNPs with Ph-like ALL, we then genotyped rs3824662 and rs3781093 in 171 children with B-ALL enrolled in the COG P9906 study and in an independent cohort of 5,755 non-ALL controls. In this replication analysis, risk alleles at both GATA3 SNPs were consistently over-represented in Ph-like ALL (N=32) compared to non-ALL controls: rs3824662 (P=3.69×10−5, OR=3.14 [1.81 to 5.44]) and rs3781093 (P=0.0001, OR=2.95 [1.68 to 5.16]), or compared to non-Ph-like ALL (N=139): rs3824662 (P=0.01, OR=2.16 [1.18 to 3.97]) and rs3781093 (P=0.004, OR=2.55 [1.33 to 4.88], Table 1).
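As a consistency check on how the odds ratios, confidence intervals, and P values above relate to one another, the sketch below recovers the standard error and two-sided P value from a reported OR and its 95% CI. It assumes the intervals are Wald-type intervals on the log-odds scale, which the text does not state explicitly; the function name `wald_from_ci` is introduced here for illustration.

```python
import math


def wald_from_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Recover the log-OR standard error, z statistic, and two-sided P value
    from an odds ratio and its 95% CI, assuming a Wald interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# Values reported above for rs3781093, Ph-like ALL vs. non-ALL controls
se, z, p = wald_from_ci(2.95, 1.68, 5.16)
# A Wald interval is symmetric on the log scale, so the geometric mean
# of its bounds should reproduce the point estimate
midpoint = math.sqrt(1.68 * 5.16)
print(f"SE={se:.3f}, z={z:.2f}, P={p:.1e}, CI midpoint={midpoint:.2f}")
```

Under this assumption the recovered P value is of the same order as the reported P=0.0001, and the geometric mean of the interval bounds agrees with the reported OR to within rounding.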
To explore the functions of these germline GATA3 variants, we first examined the relationships between rs3824662 SNP genotype and GATA3 mRNA expression. In lymphoblastoid cell lines, rs3824662 A allele was associated with significantly increased GATA3 mRNA level (HapMap YRI, N=56, P=0.034, Fig. 2A; CEU and MEX, Supplementary Fig. 6). Consistently, the A allele was also linked to higher levels of DNase hypersensitivity at this locus (HapMap YRI, N=67, P=9.5×10−8, Fig. 2B), indicating its influence on local chromatin accessibility and transcriptional activity. Association of germline GATA3 SNP genotype and GATA3 expression was confirmed in ALL blasts in both COG AALL0232 and COG P9906 cohorts (N=511, P=9.2×10−8 and N=173, P=3.6×10−6, respectively, Supplementary Fig. 7). Interestingly, ectopic overexpression of GATA3 in ALL cell lines consistently led to global changes in gene expression pattern, with a highly significant enrichment of genes within the Ph-like ALL expression signature (UOCB1 cell line, P=0.0004; Nalm6 cell line, P=0.001, Supplementary Fig. 8).
Recurrent genomic lesions targeting lymphoid development, cytokine receptor, and tyrosine kinase signaling are a hallmark of Ph-like ALL. In both COG AALL0232 and COG P9906, the GATA3 SNP rs3824662 was associated with CRLF2 lesion, JAK mutation, and IKZF1 deletion, and these associations were also validated in a third cohort of 781 children enrolled on the COG P9905 protocol (Table 2). The A risk allele at rs3824662 was further enriched among patients with multiple “Ph-like ALL related” somatic lesions. In COG AALL0232, the frequency of the rs3824662 A allele was highest (73%) in ALL cases with CRLF2 lesion, JAK mutation, and IKZF1 deletion simultaneously, intermediate (40%) in patients with one or two of the lesions, and lowest (29%) among patients without any of the three lesions (P=6.09×10−5, Fig. 3). This correlation was also validated in the COG P9906 cohort (P=0.0005) and in the COG P9905 cohort (P=7.6×10−5, Fig. 3). Within Ph-like ALL, there was a trend toward over-representation of the rs3824662 A allele in cases with CRLF2 lesions (P=0.05, Supplementary Fig. 9). However, the association of rs3824662 with Ph-likeness remained significant within ALL cases that were negative for CRLF2 alterations (P=8.8×10−5, Supplementary Fig. 9), JAK mutation (P=2.1×10−5), or IKZF1 deletion (P=0.001), and in a multivariate model after adjusting for all three lesions (P=0.001).
Given the poor prognosis of Ph-like ALL, we next examined the relationships between GATA3 SNP genotypes and ALL relapse. In the COG P9906 cohort, the GATA3 allele linked to Ph-like ALL was also associated with a higher risk of relapse after adjusting for genetic ancestry (rs3824662, N=215, P=0.002, Fig. 4A). While rs3824662 was strongly related to early treatment response (i.e., minimal residual disease [MRD] at the end of induction therapy, N=193, P=9.8×10−5, Fig. 4B), it remained prognostic even among patients who were MRD negative (N=132, P=0.028). To further define the prognostic value of the GATA3 SNP, we tested the association of rs3824662 with relapse in the COG P9905 protocol. In this cohort, genotype at rs3824662 was significantly associated with relapse, with each copy of the A allele linked to a 1.43-fold increase (95% CI, 1.10 to 1.86) in the risk of disease recurrence (N=781, P=0.007, Fig. 4C). In addition, the A allele at rs3824662 was associated with a higher MRD level at the end of induction therapy (N=710, P=0.039, Fig. 4D), and there was a trend toward higher relapse risk among MRD-negative patients in the COG P9905 cohort (N=566, P=0.094).
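To make the per-allele estimate above concrete, the short sketch below scales the reported 1.43-fold increase (95% CI, 1.10 to 1.86) to zero, one, and two copies of the A allele. It assumes a multiplicative (log-additive) per-allele model, which is implied by the phrase "each copy" but is an assumption here; the function name `risk_for_copies` is introduced only for this example.

```python
def risk_for_copies(per_allele, copies):
    """Relative risk versus zero copies under a multiplicative per-allele model."""
    return per_allele ** copies


# Per-allele estimate and 95% CI bounds reported above for COG P9905
estimate, lower, upper = 1.43, 1.10, 1.86

for copies in (0, 1, 2):
    point = risk_for_copies(estimate, copies)
    lo = risk_for_copies(lower, copies)
    hi = risk_for_copies(upper, copies)
    print(f"{copies} A allele(s): {point:.2f} ({lo:.2f} to {hi:.2f})")
```

Under this assumption, carriers of two A alleles would have roughly twice the relapse risk of non-carriers, with the interval bounds scaling in the same way on the log scale.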
While association of rs3824662 with Ph-like ALL was consistent across ethnicity (Supplementary Fig. 3), the risk allele frequency varied significantly among different ethnic groups. Among worldwide populations, the rs3824662 allele related to Ph-like ALL and relapse was markedly more common in Guatemalans with high Native American (NA) genetic ancestry and US Hispanics than individuals of European descent (52%, 40%, and 14%, respectively, Supplementary Fig. 10), consistent with the racial disparities in ALL treatment outcomes21.
The majority of children with ALL can be cured with individualized combination chemotherapy, and treatment outcome continues to improve as new molecular prognostic markers are incorporated to achieve more precise risk classification2. Until recently, little was known about why a child develops a specific subtype of ALL in the first place and whether inherited genetic variations that predispose to a subtype also influence prognosis12,14,16. Therefore, the goal of this GWAS was to discover the genetic basis of the susceptibility to Ph-like ALL and to better understand the biology of this important high-risk subtype. The discovery of GATA3 variants associated with Ph-like ALL and related genomic lesions points to potentially novel mechanisms of ALL etiology and also to a previously unrecognized function of GATA3 in leukemogenesis. GATA3 belongs to a group of transcription factors characterized by two highly conserved zinc fingers that mediate binding to the (A/G)GATA(A/G) sequence and protein-protein interactions22. Stage-specific transcription of GATA3 has been extensively characterized during T cell development and differentiation23. GATA3 is critical for the generation of early T-lineage progenitor cells24, and somatic loss-of-function mutations in GATA3 are enriched in early T-cell precursor ALL25. Inherited genetic variation in GATA3 has also been linked to susceptibility to Hodgkin lymphoma26, although those variants are not related to rs3824662 or rs3781093 (r2<0.1 in HapMap CEU). Other members of the GATA family are critical for different stages of hematopoietic development, and germline or somatic mutations in these genes can lead to a variety of hematologic disorders27,28.
rs3824662 and rs3781093 are in strong LD in European, Hispanic, and Asian populations (r2=0.94, 0.90, and 0.97 in HapMap CEU, MEX, and CHB/JPT, respectively), and both achieved genome-wide significance in the discovery GWAS with similar association with Ph-like ALL (Supplementary Figs. 2, 3, and 11). However, rs3781093 became non-significant in multivariate analysis conditioning on rs3824662 (Supplementary Table 1). In African subjects, in whom these two SNPs are poorly linked (r2=0.006 in the HapMap YRI), the A allele at rs3824662 remained over-represented in Ph-like ALL whereas rs3781093 no longer showed any evidence of association with Ph-like ALL (Supplementary Fig. 11). Also, rs3781093 was associated with neither GATA3 expression nor local DNase hypersensitivity in HapMap YRI samples, whereas consistent evidence points to rs3824662 as a potential expression quantitative trait locus across ancestry in HapMap populations (Supplementary Figs. 6 and 12). In fact, rs3824662 was the top SNP influencing DNase hypersensitivity at this locus in the YRI population (Supplementary Fig. 12). Further examination of the ENCODE data suggested possible enhancer activities within the region encompassing rs3824662 in lymphoblastoid cell lines, based on histone methylation marks and PU.1 and P300 binding (Supplementary Fig. 13). Although functional studies are warranted to determine the exact causal variant(s) at this locus and the molecular mechanisms by which GATA3 variants influence Ph-like ALL leukemogenesis, these lines of evidence consistently point to rs3824662 as a potentially functional variant with a possibly direct contribution to the GWAS signal.
The GATA3 allele linked to Ph-like ALL was also associated with an increased risk of relapse in the COG P9906 cohort, which was validated in the COG P9905 cohort (Fig. 4). However, in the COG P9906 cohort, the GATA3 SNP was not prognostic after adjusting for Ph-likeness, arguing that the association with relapse might be largely driven by its relationship with Ph-like ALL. GATA3 SNP genotype was also related to CRLF2 rearrangement, JAK mutation, and IKZF1 deletion, but remained associated with Ph-like ALL after adjusting for these genomic lesions. To explore this further, we attempted to build a classification model for Ph-like ALL on the basis of GATA3 germline SNPs, somatic lesions in CRLF2, JAK, and IKZF1, and genetic ancestry in 682 patients in COG AALL0232 and COG P9906, using classification and regression tree methods (CART29). In this analysis (Supplementary Fig. 14), CRLF2, IKZF1, rs3824662, and NA genetic ancestry were independent predictors of Ph-like ALL, and rs3824662 was associated with Ph-likeness regardless of CRLF2 status. Interestingly, NA genetic ancestry remained significant after stratifying on the GATA3 SNP, indicative of additional ancestry-related germline variants that are associated with Ph-like ALL. There was also significant over-representation of the rs3824662 risk alleles in non-Ph-like ALL compared with non-ALL control (P=0.0008 and 0.00035 in the discovery GWAS and replication cohorts, respectively), suggesting effects of this variant on ALL susceptibility in general.
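To illustrate how a classification tree of the kind mentioned above chooses among candidate predictors, the sketch below scores two binary splits by the reduction in Gini impurity. The text does not state which splitting criterion was used, so Gini impurity is an assumption; the counts are hypothetical and are not taken from the study cohorts, and the function names are introduced here for illustration.

```python
def gini(pos, neg):
    """Gini impurity of a node holding `pos` Ph-like and `neg` non-Ph-like cases."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    return 2 * p * (1 - p)


def split_gain(left, right):
    """Impurity reduction from splitting a parent node into `left` and `right`,
    each given as a (Ph-like, non-Ph-like) count pair."""
    (lp, ln), (rp, rn) = left, right
    n = lp + ln + rp + rn
    parent = gini(lp + rp, ln + rn)
    children = ((lp + ln) / n) * gini(lp, ln) + ((rp + rn) / n) * gini(rp, rn)
    return parent - children


# Hypothetical counts (illustrative only): (Ph-like, non-Ph-like) per branch
candidates = {
    "CRLF2 lesion present vs. absent": ((30, 10), (20, 140)),
    "rs3824662 A carrier vs. non-carrier": ((35, 65), (15, 85)),
}
gains = {name: split_gain(*branches) for name, branches in candidates.items()}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: gain={gain:.3f}")
print("first split:", best)
```

A tree grown this way splits first on the predictor with the largest impurity reduction and then repeats the search within each branch, which is how a variable can remain informative after stratifying on another.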
In conclusion, our genome-wide germline SNP analysis identified genetic variations in the GATA3 gene that influence susceptibility to Ph-like ALL and the risk of relapse. These findings highlight the intricate interactions between host and tumor genomes and their importance in the pathogenesis and prognosis of cancer in general.
The ALL cases investigated comprised children with newly diagnosed B-precursor ALL who were treated on the Children’s Oncology Group (COG) trials AALL0232, P9905 (ref. 32), and P9906 (ref. 10) (Supplementary Table 2), and non-ALL controls included 12,416 subjects14,33-35. The number of subjects included in each analysis is described in Supplementary Figs. 15, 16, and 17, and in the text as appropriate. This study was approved by the Institutional Review Boards, with proper informed consent obtained.
Germline genomic DNA was extracted from peripheral blood or bone marrow samples obtained during clinical remission for children with ALL. Genotyping was done for COG AALL0232 and COG P9905 cohorts and for non-ALL controls using the Affymetrix Human SNP Array 6.0. Quality control was performed for samples and SNPs according to call rate and minor allele frequency (Supplementary Fig. 1). Theta (allele signal intensity) plots were constructed using Affymetrix Genotyping Console for rs3824662 and rs3781093 (Supplementary Fig. 18). GATA3 SNPs (rs3824662 and rs3781093) were genotyped in the COG P9906 cohort and in the Guatemalan samples by Sanger sequencing (Supplementary Table 3).
Ph-like ALL was identified in the COG AALL0232 cohort and in the COG P9906 cohort on the basis of unsupervised hierarchical clustering analysis of global gene expression profile, as described previously7,9,37.
The discovery GWAS of Ph-like ALL comprised 511 ALL cases enrolled on the COG AALL0232 protocol and 6,661 non-ALL controls from the dbGaP MESA dataset. We performed two association tests to identify germline SNPs related to Ph-like ALL: we compared the genotype frequency at each SNP 1) in Ph-like ALL (N=75) vs. non-ALL controls (N=6,661) and 2) in Ph-like ALL (N=75) vs. ALL cases without Ph-like profile (“non-Ph-like ALL”, N=436). Association was evaluated with logistic regression under an additive model with genetic ancestry as covariates. Population stratification was assessed by the construction of a quantile-quantile (Q-Q) plot (Supplementary Fig. 19). SNPs that reached P≤5×10−8 in the discovery GWAS were tested in an independent replication cohort: 171 ALL cases from the COG P9906 protocol and 5,755 non-ALL controls. Association with Ph-like ALL was evaluated by logistic regression with genetic ancestries as covariates by comparing 1) Ph-like ALL (N=32) vs. non-ALL controls (N=5,755) and 2) Ph-like ALL (N=32) vs. non-Ph-like ALL (N=139). Independently, the Ph-like phenotype was also identified by the recognition of outliers by sampling ends (ROSE) algorithm (Supplementary Fig. 20). GATA3 SNPs (rs3824662 and rs3781093) and expression were also evaluated in a separate cohort of patients with Ph+ ALL (Supplementary Note and Supplementary Fig. 21).
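The association test described above (logistic regression under an additive model with ancestry as a covariate) can be sketched for a single SNP as follows. This is a minimal pure-Python illustration fitted by Newton-Raphson, not the analysis code used in the study (which was run in R); the counts are hypothetical, and a single binary ancestry indicator stands in for the continuous ancestry covariates. The counts were constructed so that the odds of being a case double with each copy of the risk allele.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def score_and_information(cells, beta):
    """Gradient of the log-likelihood and Fisher information for grouped data."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, cases, controls in cells:
        n = cases + controls
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
        for i in range(k):
            grad[i] += x[i] * (cases - n * p)
            for j in range(k):
                info[i][j] += n * p * (1.0 - p) * x[i] * x[j]
    return grad, info


def fit_logistic(cells, n_iter=25):
    """Maximum-likelihood fit by Newton-Raphson, starting from all-zero coefficients."""
    beta = [0.0] * len(cells[0][0])
    for _ in range(n_iter):
        grad, info = score_and_information(cells, beta)
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


# Hypothetical grouped data: ([intercept, risk-allele count, ancestry indicator], cases, controls)
cells = [
    ([1, 0, 0], 10, 40), ([1, 0, 1], 10, 40),
    ([1, 1, 0], 10, 20), ([1, 1, 1], 10, 20),
    ([1, 2, 0], 5, 5), ([1, 2, 1], 5, 5),
]
beta = fit_logistic(cells)
_, info = score_and_information(cells, beta)
# Variance of the genotype coefficient is the matching diagonal entry of the inverse information
se_geno = math.sqrt(solve(info, [0.0, 1.0, 0.0])[1])
odds_ratio = math.exp(beta[1])
ci = (math.exp(beta[1] - 1.96 * se_geno), math.exp(beta[1] + 1.96 * se_geno))
print(f"per-allele OR={odds_ratio:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f})")
```

Coding the genotype as the number of risk alleles (0, 1, or 2) is what makes the model additive on the log-odds scale, so the exponentiated genotype coefficient is the per-allele odds ratio adjusted for the ancestry term.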
Functional characterization of GATA3 SNPs was performed by examining the association of SNP genotype with GATA3 expression, local DNase hypersensitivity, and global gene expression in ALL (Supplementary Note, Supplementary Table 4, Fig. 2, Supplementary Figs. 6,7, 12, 22, and 23), partly using previously published data sets30,31,38. Associations of GATA3 SNPs with CRLF2, JAK, and IKZF1 somatic lesions were evaluated in the COG AALL0232, COG P9906, and COG P9905 cohorts, and with relapse in the COG P9906 and COG P9905 cohorts (Supplementary Note). Germline SNPs within the JAK2 gene were tested for association with somatic JAK2 mutation in ALL (Supplementary Table 5). R 2.15.1 statistical software was used for all analyses unless indicated otherwise (Supplementary Note). Statistical tests were chosen as appropriate and according to the phenotype distribution (e.g., normally or binomially distributed for continuous or categorical variables, respectively).
We thank the patients and parents who participated in the COG protocols included in this study, the clinicians and research staff at COG institutions, and J. Pullen (University of Mississippi at Jackson) for assistance in classification of patients with ALL. Genome-wide genotyping of COG P9905 samples was performed by the Center for Molecular Medicine with generous financial support from the Jeffrey Pride Foundation and the National Childhood Cancer Foundation. V.P.A. is supported by the Spanish Ministry of Education Fellowship Grant and by the St. Jude Children’s Research Hospital Academic Programs Special Fellowship. J.J.Y. is supported by the American Society of Hematology Scholar Award, Alex’s Lemonade Stand Foundation for Childhood Cancer Young Investigator Grant, and by the Order of St. Francis Foundation. K.G.R. is supported by a National Health and Medical Research Council (Australia) Overseas Training Fellowship and a Haematology Society of Australia and New Zealand Novartis New Investigator Scholarship. C.G.M. is a Pew Scholar in the Biomedical Sciences and a St. Baldrick’s Scholar. We thank M. Shriver (Pennsylvania State University) for sharing SNP genotype data of the Native American references, Jonathan Pritchard and Jacob Degner (University of Chicago) for sharing DNase hypersensitivity data of HapMap Yoruba cell lines, and Raul C. Ribeiro (St. Jude Children’s Research Hospital) and Pedro De Alarcon (University of Illinois College of Medicine at Peoria) for coordinating collaborations in Guatemala. This work was supported by the National Institutes of Health (grant numbers CA156449, CA21765, CA36401, CA98543, CA114766, CA98413, CA140729 and GM92666), in part by the intramural Program of the National Cancer Institute, and by the American Lebanese Syrian Associated Charities (ALSAC).
The study sponsors were not directly involved in the design of the study, the collection, analysis, and interpretation of the data, the writing of the manuscript, or the decision to submit the manuscript.
dbGaP database: http://www.ncbi.nlm.nih.gov/gap
LocusZoom: http://csg.sph.umich.edu/locuszoom
Epigenome Browser: http://epigenomegateway.wustl.edu/browser
1000 Genomes: http://www.1000genomes.org
Database accession numbers
NCBI dbGaP: phs000638, phs000209, phs000021, phs000017
NCBI GEO: GSE11877, GSE7851, GSE5859
NCI caArray: EXP-578
Jointly supervised research: J.J.Y; Conceived and designed the experiments: V.P.A, S.P.H., C.L.W., C.G.M and J.J.Y.; Performed the experiments: V.P.A, K.G.R, R.C.H., J.G.F., S.E., I-M.C., G.N., E.G.B., D.G.T. and C.N.V.; Performed statistical analysis: V.P.A, J.J.Y., R.C.H., W.Y., C.C., D.P., Y.F., M.D., C.S. and G.N.; Analyzed the data: V.P.A, K.G.R., R.C.H., W.Y., H.X., S.E., J.Y.S.L., I-M.C., Y.F., M.J.B., C.S., G.N., E.G.B., D.T., F.A.K., C.N.V., M-L.L., M.D., D.B., C-H.P., W.E.E., M.V.R., S.P.H., C.L.W. and C.G.M.; Contributed to reagents/materials/analysis tools: R.C.H., J.G.F., J.Y.S.L.,Y.F., E.G.B., F.A.K., C.N.V., N.J.W., B.M.C., E.R., B.W., F.Y., W.L.C., E.L., W.P.B., M-L.L., M.D., S.P.H., C.L.W. and C.G.M.; Wrote the paper: V.P.A and J.J.Y.
Competing Interest and Financial Disclosures
The authors declare that they do not have any relevant competing interests, and full disclosures are provided in the