Polycythemia vera, essential thrombocythemia and primary myelofibrosis are myeloproliferative neoplasms (MPN) characterized by multilineage clonal hematopoiesis1–5. Given that the identical somatic activating mutation in the JAK2 tyrosine kinase gene (JAK2V617F) is observed in most individuals with polycythemia vera, essential thrombocythemia and primary myelofibrosis6–10, there likely are additional genetic events that contribute to the pathogenesis of these phenotypically distinct disorders. Moreover, family members of individuals with MPN are at higher risk for the development of MPN, consistent with the existence of MPN predisposition loci11. We hypothesized that germline variation contributes to MPN predisposition and phenotypic pleiotropy. Genome-wide analysis identified an allele in the JAK2 locus (rs10974944) that predisposes to the development of JAK2V617F-positive MPN, as well as three previously unknown MPN modifier loci. We found that JAK2V617F is preferentially acquired in cis with the predisposition allele. These data suggest that germline variation is an important contributor to MPN phenotype and predisposition.
The presence of the identical JAK2V617F allele in polycythemia vera, essential thrombocythemia and primary myelofibrosis suggests that there are additional genetic and epigenetic events that contribute to MPN pathogenesis. To date, most studies have focused on the identification of additional somatic events that are stochastically acquired by the MPN clone, whereas few studies have addressed the role of germline genetic variation in MPN pathogenesis. A recent analysis of 32 candidate SNPs in MPN tumor samples identified three SNPs in JAK2 that were enriched in polycythemia vera12; although these results suggest there are host genetic variants that influence MPN phenotype, the findings are likely influenced by the high rate of somatic isodisomy at the JAK2 locus in polycythemia vera. It has also been observed that there is familial clustering in MPN cases13–15, suggesting that there are inherited MPN predisposition loci. We therefore used genome-wide SNP array data to identify MPN modifier and predisposition loci.
We first analyzed Affymetrix StyI SNP array data derived from granulocyte DNA from 181 subjects with polycythemia vera or essential thrombocythemia to identify loci significantly enriched in these conditions. We identified four loci with P values < 10⁻⁵, including SNPs at three loci not previously implicated in either disease (Table 1). We confirmed that the minor allele at rs12500918 is more common in polycythemia vera than in essential thrombocythemia in a larger set of MPN samples (P = 0.01). Unbiased genome-wide analysis also suggested that germline variation at the JAK2 locus (rs10974944) varied according to MPN phenotype. However, given that acquired isodisomy leading to homozygosity for JAK2V617F is more common in polycythemia vera than in essential thrombocythemia, we were concerned that individuals with polycythemia vera might have a spuriously high prevalence of the minor allele (G) owing to somatic alterations at the JAK2 locus6–8,10,16. To exclude the effects of somatic variation at the JAK2 locus, we analyzed germline DNA samples from 284 subjects with polycythemia vera or essential thrombocythemia for JAK2 SNP rs10974944 and confirmed that the minor allele (G) was significantly more common in polycythemia vera than in essential thrombocythemia (P = 0.01).
We also noted that the frequency of the GG/CG genotypes at rs10974944 was much higher in MPN cases than in control populations, suggesting that rs10974944 is an MPN predisposition allele. Analysis of germline DNA from 324 subjects with polycythemia vera, essential thrombocythemia and primary myelofibrosis found that the GG/CG genotypes at rs10974944 were more common in MPN cases than in WTCCC controls (OR = 3.1, P = 4.1 × 10⁻²⁰; Table 2), consistent with the G allele functioning as a dominant MPN predisposition allele. These data indicate that germline JAK2 variation more strongly influences MPN predisposition than MPN phenotype. In contrast, allelic variation at rs12500918 is associated with MPN phenotype (P = 0.01) but not with MPN predisposition (P = 0.24) (Supplementary Table 1 online).
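The odds ratio for a dominant model can be illustrated with a short sketch. It treats carriers of the risk allele (GG or CG) as "exposed" and computes the odds ratio from a 2×2 table, with an approximate 95% confidence interval from the Woolf (logit) method. The function name and the counts are hypothetical placeholders chosen for illustration; they are not the values in Table 2, and the interval method is an assumption rather than the one used in the study.

```python
import math

def odds_ratio_dominant(case_carriers, case_noncarriers,
                        ctrl_carriers, ctrl_noncarriers, z=1.96):
    """Odds ratio and approximate CI (Woolf logit method) for a 2x2 table.

    Carriers are GG or CG individuals; non-carriers are CC individuals.
    """
    a, b, c, d = case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the logit method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical counts for illustration only (NOT the data in Table 2).
or_value, ci_low, ci_high = odds_ratio_dominant(60, 40, 100, 200)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

An interval that excludes 1 is what one would expect for an association of the strength reported above.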
We then carried out principal component analysis of SNP array data derived from MPN samples and WTCCC controls to assess whether population substructure or ancestry might explain differences in rs10974944 allele frequency. We selected case and control individuals who cluster on the first two principal components consistent with ancestry from Northern and Western Europe (Fig. 1). We observed a similar relationship between genotypic variation at JAK2 and MPN predisposition in these matched cases and controls, suggesting that differences in the frequency of rs10974944 genotypes are not due to differences in population substructure. Moreover, the distribution of rs10974944 genotypes did not vary in eight different ancestry populations (Supplementary Fig. 1a online), or in myelodysplasia (P = 0.23) or acute myeloid leukemia (P = 0.10) samples compared to control samples (Supplementary Fig. 1b).
Given the high frequency of somatic mutations at JAK2 in MPN6–10,17, we hypothesized that rs10974944 germline variation might specifically predispose to JAK2-mutated MPN. We assessed rs10974944 genotype in 321 MPN cases that had been genotyped for JAK2V617F and for JAK2 exon 12 mutations17 and found that germline allelic variation at rs10974944 was strongly associated with JAK2V617F-positive MPN (OR = 4.0, P = 7.7 × 10⁻²²) and much less strongly associated with JAK2V617F-negative MPN (OR = 1.6, P = 0.06) (Table 3). Allelic variation at rs10974944 was strongly associated with predisposition to polycythemia vera (OR = 4.3, P = 1.0 × 10⁻¹⁶) and essential thrombocythemia (OR = 2.1, P = 6.7 × 10⁻⁵) (Supplementary Table 2 online). The higher odds ratio associated with allelic variation at rs10974944 in polycythemia vera is in part due to the higher incidence of JAK2 mutations in this condition (95%) compared to essential thrombocythemia (65%), as shown by the higher association between rs10974944 genotype and JAK2V617F-positive essential thrombocythemia (OR = 2.8, P = 3.0 × 10⁻⁵).
Analysis of the haplotype structure of the JAK2 locus in CEPH founders (Fig. 2a) shows that rs10974944 and JAK2V617 are contained in a common haplotype block distinct from the promoter and 5′ exons of JAK2. This led us to hypothesize that the G allele at rs10974944 might predispose to somatic JAK2V617F mutations on the same strand. We investigated 42 subjects heterozygous for rs10974944 in their germline and a somatic homozygous JAK2V617F mutant clone (JAK2V617F allele burden > 50%) and found in 38 of 42 cases somatic conversion to a homozygous GG genotype at rs10974944 (Supplementary Fig. 2 online). Using allele-specific PCR on 45 subjects heterozygous for rs10974944 in their germline and heterozygous for JAK2V617F in their granulocyte DNA, we found that in 38 cases JAK2V617F was acquired in cis with the G allele at rs10974944 (P = 2.8 × 10⁻¹⁴) (Fig. 2b). These data suggest that the G allele at rs10974944 favors the in cis acquisition of JAK2V617F.
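One simple way to reason about the in cis result is an exact binomial test: in a germline heterozygote, a mutation arising at random would be expected to land on the G or the C haplotype with equal probability. The sketch below computes the upper-tail probability of observing at least 38 of 45 acquisitions on the G haplotype under that null. The choice of null and test is an assumption made for illustration; the text does not state which test produced the reported P value, so this number should not be expected to match it.

```python
from math import comb

def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p), computed exactly."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# 38 of 45 informative subjects acquired JAK2V617F in cis with the G allele.
p_one_sided = binomial_upper_tail(38, 45)
print(f"one-sided exact binomial P = {p_one_sided:.3g}")
```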
Given these findings, we then wished to see whether JAK2 would be clearly detected as an MPN susceptibility locus in the context of testing SNPs genome-wide for association with MPN risk. To do so, we combined all unambiguous SNPs genotyped in MPN samples and WTCCC controls matched by principal component analysis with our data on rs10974944 and asked whether allele frequencies differed significantly between the two groups at any of the SNPs (Fig. 3). Four SNPs with P values < 10⁻⁷ were significantly associated with MPN risk after correcting for residual population stratification and multiple testing. One of these four SNPs is rs10974944, further supporting the hypothesis that this represents a true MPN risk SNP and that our findings are not due to population stratification.
The observation that a JAK2 germline haplotype is markedly enriched in MPN cases compared to controls suggests that germline variation at the JAK2 locus is an important contributor to MPN predisposition. Although genome-wide association studies have identified predisposition loci for a spectrum of human diseases, the loci identified in these studies have a modest effect on disease risk for the individual despite a larger population attributable risk. For example, a recent genome-wide association study identified six chronic lymphocytic leukemia (CLL) predisposition loci, each of which had an odds ratio less than 1.6, but which accounted for between 12% and 39% of CLL attributable to a hereditary predisposition18. The GG/CG genotype at JAK2 rs10974944 contributes significantly to the excess familial risk of MPN (OR = 2.80) with a population attributable risk of 46.0%. The high population attributable risks for these predisposition alleles reflect gene–gene and gene–environment interactions, and the familial relative risks due to these variants are much smaller, for example, less than 3% for CLL19. Nonetheless, the range of genotypic relative risks reported here for JAK2-positive MPN is among the highest described to date using a genome-wide association approach20.
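The relationship between a per-individual effect size and a population attributable risk can be sketched with Levin's formula, PAR = p(RR − 1) / (1 + p(RR − 1)), where p is the frequency of the risk genotype in the population and RR is the relative risk (approximated here by the odds ratio, which is reasonable for a rare disease). Using Levin's formula is an assumption; the text does not say how its attributable risk was calculated, and the carrier frequency below is a hypothetical placeholder.

```python
def population_attributable_risk(exposure_freq, relative_risk):
    """Levin's formula: fraction of cases attributable to the exposure."""
    if not 0.0 <= exposure_freq <= 1.0:
        raise ValueError("exposure_freq must be between 0 and 1")
    if relative_risk <= 0:
        raise ValueError("relative_risk must be positive")
    excess = exposure_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)

# Hypothetical carrier frequency (an assumption), with the odds ratio of 2.80
# from the text standing in for the relative risk.
par = population_attributable_risk(exposure_freq=0.45, relative_risk=2.80)
print(f"PAR = {par:.1%}")
```

The formula makes explicit why a common risk genotype can carry a large attributable risk even when the effect on any one individual is moderate.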
We found that somatic JAK2 mutations were most commonly acquired in cis with the JAK2 predisposition haplotype, suggesting a direct interaction between haplotype-specific genetic variation in the JAK2 locus and secondary acquisition of somatic mutations on the same strand. We did not observe genotype-specific differences in JAK2 expression (Supplementary Fig. 3 online), nor did we observe genotype-specific nonsynonymous sequence alterations or alterations in the 3′ UTR. These data suggest that rs10974944 favors the acquisition of JAK2 mutations in cis by an unidentified mechanism. There may be haplotype-specific variation in a regulatory motif or genotype-specific splicing that increases the selective advantage of the JAK2V617F allele when it is acquired on the predisposition haplotype. Alternatively, it is possible that genotype-specific genomic variation in the JAK2 haplotype block increases the somatic mutation rate at this locus, as has been observed for germline variants in the APC gene present in the Ashkenazi Jewish population that increase the rate of somatic APC mutations21. Although alternate activating JAK2 mutations have been identified in acute leukemia samples and cell lines22,23, the JAK2V617F allele overwhelmingly predominates in polycythemia vera, essential thrombocythemia and primary myelofibrosis. It is thus possible that germline variation in the JAK2 locus may be specifically associated with an increase in the rate of the guanine-to-thymidine substitution at JAK2 codon 617.
It is likely that there are additional germline loci important in MPN predisposition and pathogenesis. Our data suggest that germline variation at the JAK2 locus has a minimal contribution to JAK2V617F-negative MPN. Moreover, analysis of affected and unaffected members from 25 MPN kindreds did not reveal an association between JAK2 rs10974944 genotype and MPN in these pedigrees (O.K., A.M., M.W., D.G.G. and R.L.L., unpublished data). Using genome-wide analysis of ancestry-matched cases and controls, we demonstrate that there are additional candidate loci that contribute to MPN predisposition. These data should be interpreted with the caveats that the MPN SNP data are generated from diseased tissue and that there is not complete coverage of the genome owing to strand-ambiguous SNPs. In addition, this study was done on a relatively small number of samples, and it is likely that larger genome-wide association studies will identify additional germline alleles relevant to MPN pathogenesis. Taken together, these data indicate that germline variation is an important contributor to MPN pathogenesis. Moreover, this approach can be used to identify cancer predisposition loci in cancer SNP array data that are being generated by The Cancer Genome Atlas Project24 and other large-scale cancer genomics studies.
Samples from subjects with MPN were obtained from the Harvard MPD Study case cohort10; all cases provided informed consent. Acute myeloid leukemia, myelodysplasia, and familial MPN samples were collected using protocols approved by the Dana-Farber Cancer Institute institutional review board; all subjects provided informed consent. DNA was extracted from granulocytes and buccal swabs as previously described10, and RNA was extracted from subject cells stored in Trizol. We chose 217 granulocyte DNA samples, including 113 samples from subjects with polycythemia vera and 68 samples from subjects with essential thrombocythemia, for SNP array analysis on the basis of clonality studies and JAK2V617F mutational burden25 in order to limit analysis to samples with > 80% MPN cells. DNA samples were genotyped using Affymetrix 250K StyI arrays. We scanned arrays with the GeneChip Scanner 3000, and used the Affymetrix Genotyping Tools Version 2.0 to ascertain genotypes.
Analysis of genome-wide SNP array data was done using PLINK26 unless otherwise described. To identify modifier loci, we tested all SNPs for frequency differences between 113 individuals with polycythemia vera and 68 individuals with essential thrombocythemia. Five different genetic models (genotypic, allelic, trend, dominant and recessive) were tested for each SNP, and significance at each SNP was assessed using adaptive permutation testing with a maximum of 10⁸ permutations. Odds ratios and confidence intervals for Table 1 were computed based on logistic regression (--logistic option in PLINK).
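To make the genetic models concrete, the sketch below shows how the same genotype counts collapse into different 2×2 tables under the allelic, dominant and recessive models, each then tested with a Pearson χ² statistic (1 degree of freedom, no continuity correction). It is a simplified illustration, not a reimplementation of PLINK: the genotypic and trend tests are omitted for brevity, the asymptotic P value stands in for adaptive permutation, and the genotype counts are hypothetical values chosen only to sum to 113 and 68.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom

def chi2_p_value_1df(stat):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def collapse(counts, model):
    """Collapse (major_hom, het, minor_hom) counts into two categories."""
    major_hom, het, minor_hom = counts
    if model == "allelic":      # count alleles: minor vs major
        return (2 * minor_hom + het, 2 * major_hom + het)
    if model == "dominant":     # carriers of the minor allele vs non-carriers
        return (minor_hom + het, major_hom)
    if model == "recessive":    # minor homozygotes vs everyone else
        return (minor_hom, major_hom + het)
    raise ValueError(f"unknown model: {model}")

def model_test(group1, group2, model):
    """Chi-square statistic and asymptotic P value for one model."""
    a, b = collapse(group1, model)
    c, d = collapse(group2, model)
    stat = chi2_2x2(a, b, c, d)
    return stat, chi2_p_value_1df(stat)

# Hypothetical genotype counts (major_hom, het, minor_hom); not study data.
pv_counts = (50, 48, 15)   # sums to 113
et_counts = (45, 20, 3)    # sums to 68
for model in ("allelic", "dominant", "recessive"):
    stat, p = model_test(pv_counts, et_counts, model)
    print(f"{model:9s} chi2 = {stat:.2f}, P = {p:.3g}")
```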
Granulocyte and buccal DNA samples were genotyped using TaqMan SNP genotyping assays for rs10974944 and rs12500918 (Applied Biosystems). DNA samples from CEU HapMap founders were used as controls. We measured expression of JAK2 and HPRT1 using TaqMan Gene Expression Assays (Applied Biosystems).
A 3-kb PCR product containing rs10974944 and exon 14 of JAK2, where 1849G > T (V617F) resides, was amplified from JAK2V617F-positive subjects heterozygous for rs10974944 in the germline using allele-specific forward primers and a common reverse primer followed by sequencing the 1849G > T site with the reverse primer (Supplementary Table 3 online). To validate the allele-specific PCR assay we cloned the 3-kb fragment from 11 cases using the TOPO TA Cloning Kit (Invitrogen), and sequenced sufficient individual colonies from each subject to ascertain which genotype at rs10974944 was in cis with the 1849G > T (V617F) allele in granulocyte DNA from each informative subject.
For principal component analysis we used genome-wide data from the 217 MPN cases and from 3,000 controls from the Wellcome Trust Case Control Consortium27, which were genotyped with the Affymetrix GeneChip 500K Mapping Array Set, of which the 250K StyI chip is a subset. Before analysis, we carried out quality control filtering of both samples and SNPs separately for cases and controls and then merged the datasets using the common set of SNPs present in the two cohorts. To do so, we first filtered out the ambiguous SNPs (A/T or G/C alleles) to ensure that strand was known unambiguously when merging the two datasets. We removed 35,218 ambiguous markers (out of 231,786) from the MPN genotype dataset and 77,934 ambiguous markers (out of 486,661) from the WTCCC control cohort. The quality control filters and quality assessment removed subjects with low genotype completion rates (< 90%). Further data cleaning of the autosomal SNPs typed in both datasets retained SNPs that have a minor allele frequency (MAF) > 5%, a rate of missing genotype < 1%, and are in Hardy-Weinberg equilibrium in the WTCCC controls (exact test P > 10⁻⁷). In total, 62,775 markers were identified for analysis and used in the merged case and control dataset.
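The filtering logic described above can be sketched as two small predicates. A SNP is strand-ambiguous when its two alleles are complementary bases (A/T or G/C), because flipping the strand yields the same allele pair and the strand therefore cannot be inferred from the alleles alone. The thresholds come from the text; the function names and the dictionary keys of the SNP record are hypothetical, chosen only for this illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_strand_ambiguous(allele1, allele2):
    """True for A/T and G/C SNPs, whose alleles are each other's complement."""
    return COMPLEMENT.get(allele1.upper()) == allele2.upper()

def passes_qc(snp, min_maf=0.05, max_missing=0.01, min_hwe_p=1e-7):
    """Apply the marker filters from the text to one SNP record.

    `snp` is a dict with hypothetical keys: 'alleles' (a pair of bases),
    'maf', 'missing_rate' and 'hwe_p' (HWE exact-test P value in controls).
    """
    a1, a2 = snp["alleles"]
    return (
        not is_strand_ambiguous(a1, a2)
        and snp["maf"] > min_maf
        and snp["missing_rate"] < max_missing
        and snp["hwe_p"] > min_hwe_p
    )

# rs10974944 has G and C alleles, so the strand filter alone removes it.
example = {"alleles": ("G", "C"), "maf": 0.25, "missing_rate": 0.0, "hwe_p": 0.5}
print(is_strand_ambiguous("G", "C"), passes_qc(example))
```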
To investigate potential population stratification biases that could be introduced by the shared controls, we carried out principal component analysis using EIGENSTRAT28. To reduce the linkage disequilibrium between markers, we first used PLINK to filter markers such that all remaining markers are in low LD (r² < 0.1, calculated in sliding windows 50 SNPs wide, shifted and recalculated every 5 SNPs). We applied the EIGENSTRAT program with default parameters and no outlier removal to infer axes of variation in the combined dataset. The cases and controls that clustered together on the eigenvector plot (with the first two axes of variation) were used for the association analysis.
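The windowed LD filter can be sketched as a greedy pruning pass. The description above appears to correspond to PLINK's --indep-pairwise option with arguments 50 5 0.1, but that mapping is an inference from the stated parameters, not something the text confirms. The code below is a simplified approximation and not PLINK's exact algorithm: it measures r² as the squared Pearson correlation of genotype dosages (0, 1 or 2 copies of an allele), keeps the first SNP of each correlated pair, and does not handle missing genotypes.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def ld_prune(genotypes, window=50, step=5, threshold=0.1):
    """Return indices of SNPs kept after greedy windowed pruning.

    `genotypes` is a list of per-SNP dosage lists in chromosomal order.
    """
    keep = [True] * len(genotypes)
    start = 0
    while start < len(genotypes):
        stop = min(start + window, len(genotypes))
        indices = [i for i in range(start, stop) if keep[i]]
        for pos, i in enumerate(indices):
            if not keep[i]:
                continue
            for j in indices[pos + 1:]:
                if keep[j] and r_squared(genotypes[i], genotypes[j]) > threshold:
                    keep[j] = False  # drop the later SNP of a correlated pair
        start += step
    return [i for i, kept in enumerate(keep) if kept]

# Toy data: SNP 1 duplicates SNP 0; SNP 2 is uncorrelated with SNP 0.
toy = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [2, 0, 1, 0, 2, 1],
]
print(ld_prune(toy))
```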
The main SNP of interest in JAK2, rs10974944, has G and C alleles and was therefore eliminated by our filtering for ambiguous SNPs. To see at what rank it would appear in a GWAS for MPN risk alleles, we included it in our genome-wide association analysis. Specifically, we included the germline genotypes generated using TaqMan for the cases with the genotypes provided by the WTCCC data for the controls. A test of allelic association was done using --assoc in PLINK.
The frequencies of the genotypes between cases and controls were compared using Pearson’s χ² test and, when required, Fisher’s exact test. The ANOVA test was used for comparison of JAK2V617F allele burden between different genotypes. SPSS version 16.0 for Windows (SPSS) was used for all statistical tests.
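For small tables, where the χ² approximation is unreliable, Fisher's exact test evaluates the hypergeometric distribution directly. The sketch below is a plain-Python illustration of the two-sided test for a 2×2 table. It sums the probabilities of all tables with the same margins that are no more likely than the observed one, which is one common definition of the two-sided P value and is assumed here; it is not taken from the SPSS implementation used in the study. The example counts are arbitrary.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, total)

# Arbitrary small table for illustration (not study data).
p_small = fisher_exact_two_sided(3, 1, 1, 3)
print(f"two-sided Fisher P = {p_small:.4f}")
```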
We would like to acknowledge the subjects who have contributed to our understanding of these disorders. We thank S. Thomas, I. Dolgalev and T. Landers for assistance with high-throughput resequencing, A. Viale for assistance with JAK2 expression analysis, and T. Kirchhoff for advice and suggestions. This study makes use of data generated by the Wellcome Trust Case-Control Consortium; a full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk, and funding was provided by the Wellcome Trust under award 076113. This work was supported by grants from the National Institutes of Health, the Starr Cancer Consortium, the Myeloproliferative Disorders Foundation, the Howard Hughes Medical Institute, the Doris Duke Charitable Foundation and the Kristen Amico Sesselman Leukemia Research Fund. O.K. is supported by a grant from the Academy of Finland. D.G.G. is an Investigator of the Howard Hughes Medical Institute and is a Doris Duke Charitable Foundation Distinguished Clinical Scientist. Work in the laboratory of R.J.K. is supported by Memorial Sloan Kettering Cancer Center through US National Institutes of Health grant P30 CA008748. R.L.L. is an Early Career Award recipient of the Howard Hughes Medical Institute and a Clinical Scientist Development Award recipient of the Doris Duke Charitable Foundation and is the Geoffrey Beene Junior Chair at Memorial Sloan Kettering Cancer Center.
AUTHOR CONTRIBUTIONS
The study was designed by O.K., S. Mukherjee, R.J.K. and R.L.L. with advice from K.O. SNP arrays were performed and analyzed by A.B., B.L.E. and R.L.L., and analysis of SNP array data for modifier and predisposition loci was performed by S. Mukherjee and R.J.K. Genotyping, sequence analysis and real-time PCR assays were performed by O.K., A.M.S., S. Marubayashi, A.H. and R.L.L. Principal component analysis was done by S. Mukherjee and R.J.K. Identification of subjects, sample collection and phenotypic assessment were done by M.W., A.M., G.G.-M., H.K., R.M.S., D.G.G. and R.L.L. The paper was written by O.K., S. Mukherjee, K.O., D.G.G., R.J.K. and R.L.L. All authors discussed the results and commented on the manuscript.