Preliminary screen of somatic copy number alterations.
DNA from 15 mouse APL tumors on the B6/T (F10) and pooled DNA derived from spleen of 6-week-old WT, parental strain mice (B6/T, n
= 6, and 129X1/SvJ, n
= 6) was hybridized to the NimbleGen Mouse CGH 3x720K WG-T arrays, as previously described (34
). To systematically identify gross chromosomal gains and losses, we applied the wuHMM algorithm to the average log2
ratios of 50 consecutive probes (35
). Because segmentation algorithms assume that the test and reference DNA samples have largely the same copy number across a chromosome, gross abnormalities can confound the identification of micro-alterations. To compensate for this, we mean centered each chromosome, except for chromosomes containing gross abnormalities. For each of these chromosomes, we mean centered the called and non-called regions separately. To identify micro-alterations, we executed wuHMM on the normalized individual probe-level data, identifying 9,897 putative CNVs. Next, we determined the extent of residual 129/SvJ CNVs in each tumor. First, we identified 140 germline CNVs in 129/SvJ versus B6/T (60 of the 140 CNVs were identified as amplifications with a gain score of >1.5, and the remaining 80 CNVs were identified as deletions with a loss score of >2.5; gain and loss scores were calculated using the wuHMM algorithm, in which CNVs are scored based on log2
ratio magnitude, number of probes, and local noise). For each CNV region, we assigned genotypes to tumor samples based on the log2
ratio of individual probes within the region, as previously described (36
). CGH array data for all samples have been deposited on the Gene Expression Omnibus (GEO;
; SuperSeries accession no. GSE26131).
Tiling-array validation of somatic copy number alterations in 15 tumors.
To prioritize candidate copy number alterations (CNAs) for validation on a custom tiling array, first we filtered calls based on visual inspection. This process resulted in 455 calls. 84 out of the 455 CNAs substantially overlapped 129X1/SvJ CNVs that we identified in this study. These were considered germline polymorphisms and were excluded from further analyses, leaving 371 candidates. It is possible that we missed real CNAs in the visualization process. Therefore, of the calls that did not pass our initial inspection, we also included 268 calls that passed stringent score thresholds: gains scoring of more than 1.25 and losses scoring of more than 1.75. Overlapping CNAs were merged into nonredundant regions, leading to a final list of 449 candidate regions for validation by custom-tiling array. Validation arrays were designed such that each region was covered by approximately 300 probes.
Mouse gene expression arrays.
Total cellular RNA was purified using TRIzol
reagent (Invitrogen), quantified using UV spectroscopy (Nanodrop Technologies), and qualitatively assessed using an Experion Bioanalyzer. Amplified cDNA was prepared from 20 ng total RNA using the whole transcript WT-Ovation RNA Amplification System and biotin-labeled using the Encore Biotin Module, both from NuGen Technologies, according to the manufacturer’s instructions. Labeled targets were then hybridized to Mouse Exon 1.0 ST arrays (Affymetrix), washed, stained, and scanned using standard protocols from the Siteman Cancer Center, Molecular and Genomic Analysis Core Facility (
). Affymetrix Expression Console software was used to process array images, export signal data, and evaluate image and data quality relative to standard Affymetrix quality control metrics. The expression data from 3 out of the 34 probe sets for Kdm6a
was plotted; these 3 “best” probe sets had the highest expression with the lowest variability in the tumors without the Kdm6a
deletion. The average expression of Kdm6a
in the 11 tumors without Kdm6a
deletions was significantly higher than the expression of the tumors with deletions involving Kdm6a
, if the raw expression data from all 34 probe sets was averaged for each sample (P
< 0.0001). Both the unsupervised and supervised heat map analysis of the 4 Kdm6a-deleted tumors versus the remaining 11 tumors was done using Spotfire software (TIBCO). To determine significantly dysregulated genes, we removed probes with raw signal intensity of less than 200 in all samples (bringing the total number of probes in the analysis down to 142,000) and used a fold-change cutoff of less than or equal to 0.5 or more than or equal to 2 and P
value cutoff of less than or equal to
0.05, using the ANOVA algorithm. Finally, we removed probe sets with average expression of less than 200 in both groups. The remaining probe sets were also analyzed using SAM (37
); 6 probe sets were significant at an FDR of 0, and all 6 probe sets were significant for Kdm6a
. We used the same approach for the analysis of the 2 Jak1
-mutated tumors in comparison with the 13 tumors without a Jak1
mutation. Exon array data for all samples have been deposited on GEO (
; SuperSeries accession no. GSE26131).
Human gene expression and SNP arrays.
The gene expression array and analysis of the human sample 750152 was done on the Affymetrix Human v133 Plus 2 array, as previously described (38
). The copy number analysis of the human sample 750152 was done by Affymetrix v6.0 SNP array, as previously described (39
), and analyzed by the CMDS algorithm (40
). One hundred fifty human AML samples were screened by CMDS of Affymetrix v6.0 SNP array data, with a focal deletion involving the KDM6A
gene in 1 sample, 750152. This study was approved by the Human Research Protection Office at Washington University School of Medicine, after patients and donors provided informed consent, in accordance with the Declaration of Helsinki.
Illumina library construction.
We followed the standard library construction procedure, as previously described (16
). Briefly, we started with 500 ng DNA from the mouse APL tumor 9500 and 1 μg DNA from a pool of equal amounts of DNA isolated from the spleens of 6 young 129X1/SvJ mice. After the DNA was fragmented, end repaired, and ligated with adaptors, we amplified the products and ran them on a polyacrylamide gel. We then excised gel slices between 350 and 400 bp and 550 and 600 bp for the tumor 9500 sample and 300 and 350 bp for the 129X1/SvJ pool and eluted the DNA from each gel slice as the source for independent libraries.
After diluting the libraries to a 10-nm concentration, we used the paired-end flow cell and cluster generation kits to produce flow cells with an average cluster density ranging between 115,000 to 420,000 clusters per tile. We used the standard sequencing kits (Illumina Sequencing Kit v3) and performed 36–100 cycles of nucleotide incorporation. After this first round of end sequencing, the flow cell was treated to remove the synthesized fragments, clusters were re-amplified, and the resulting fragments were linearized. We then annealed the second sequencing primer and initiated another round of sequencing by synthesis to complete the read pairs. The read length for the second sequencing round matched that of the first. After round 2, we used the Illumina sequencing pipeline, version 1.3 or version 1.6 (depending on when the sequencing was performed), to analyze the data and produced files containing high-quality (“passed filter”) reads with associated quality values.
Alignment, coverage analysis, and SNV/indel calling.
Illumina reads from both mouse strains were aligned to the mouse NCBI build 37 reference sequence using MAQ (41
). Haploid coverage for each genome was estimated using MAQ’s mapcheck command at 28.9X for 129/SvJ and 15.5X for tumor 9500. SNVs in tumor 9500 were detected by MAQ and further filtered using MAQ SNPfilter. Despite 10 generations of backcrossing to B6/T, we expected to find a small number of 129/SvJ variants clustered in inherited blocks. Thus, we predicted variants from our genomic reads of 129/SvJ with SAMtools and removed tumor 9500 variants sharing an allele with predictions from 129/SvJ from consideration (42
). We then further removed predicted sites occurring in blocks. We defined a block as a cluster of at least 4 variants of which each variant was less than 40,000 nucleotides away from its nearest neighbor. After filtering, 31 potential tier 1 heterozygous SNVs in noncontiguous blocks were then validated by resequencing on the 3730 DNA Analyzer platform, and 23 out of 31 sites did not validate. Of these, 1 had no coverage, and 3 had low-quality data due to simple repeat content at the site. Six were variants in our B6/T WT pool and thus represented variation from the sequenced reference in our backcross. One site was present in both the B6/T WT pool and the 129/SvJ WT control, but the 129/SvJ variant did not pass MAQ SNPfilter and hence was not identified as a 129/SvJ variant. One variant was present in the 129/SvJ sequence but did not pass MAQ SNPfilter and thus was not filtered out by our analysis. Eleven sites validated as matching the reference sequence. Of those, manual review found that 2 had low coverage, 3 looked like potential paralog mappings, 2 were near homopolymers, 2 were in regions with reads supporting the variant that were ambiguously mapped (in addition to some reads that were unambiguous), and 2 gave no indication of why they might have validated WT.
Determination of residual 129/SvJ genomic regions.
We found 4,951,238 SNPs present in the 129/SvJ genome compared with the B6/Jackson reference genome. There were 78,339 129/SvJ SNPs present in the tumor 9500 genome (10,479 were homozygous; 55,691 were heterozygous but overlapped homozygous 129/SvJ mutations; and 12,169 were heterozygous and exactly matched heterozygous mutations in 129/SvJ). The percentage of B6/Jackson genome in the tumor 9500 genome equals ([0.5 × number of heterozygous SNPs in tumor 9500 that were homozygous in the 129/SvJ genome] + [0.75 × number of heterozygous SNPs in tumor 9500 that exactly matched heterozygous SNPs in 129/SvJ genome] + [number of homozygous SNPs present in both tumor 9500 and 129/SvJ genome])/(total number of polymorphic alleles). Thus, the percentage of residual 129/SvJ genome in the tumor 9500 genome was calculated to be ([0.5 × 55,691 SNPs] + [0.75 × 12,169 SNPs] + 10,479 SNPs)/4,951,238 alleles or 0.96%. Conversely, the percentage of the B6/Jackson genome present in the tumor 9500 genome was 99.04%. We used the Circos program (
) to generate the heat map of residual 129/SvJ SNPs by chromosomal location (Figure and ref. 43
Determination of Jak1 V657F allele frequency.
We thawed a vial of passaged tumor 9500 and then plated the tumor cells using serial dilution in methylcellulose plates containing SCF, IL-3, and IL-6 (MethoCult M3534, Stem Cell Technologies) for 7 days before 48 distinct, individual colonies (>50 cells) were harvested by direct visualization under light microscopy. Individual colonies were transferred to a single well of a 96-well plate, washed with PBS, and spun down. Cell pellets were then directly used as the input for whole genome DNA amplification, as described in ref. 2
. Whole genome amplification DNA was then used as the input DNA for standard PCR amplification of a Jak1
amplicon, followed by Sanger sequencing. We included 2 mouse APL tumors with known Jak1
V657F mutations, 2 mouse APL tumors without Jak1
mutations, and 2 blank wells as controls for this experiment.
Sequence based copy number analysis.
We aligned the APL data with BWA and counted the number of mapped reads in consecutive nonoverlapping 1-kb intervals across the genome (44
). We estimated copy numbers by dividing the read counts by the chromosomal median. Copy number altered regions (CNARs) between these 2 strains were identified by first dividing the genome into 10-kbp nonoverlapping windows. We estimated the copy number value of each window as twice (diploid) the ratio of the number of reads confidently mapped (mapping quality >35) within the window over the median of all the windows in the genome. For each genome, we used a Hidden Markov Model (HMM) to segment the copy number values and produce copy number segments. Each state in the HMM corresponds to a discrete copy number, and the distribution of the copy number values given a state was modeled as a Gaussian probability density function. We used the expectation maximization algorithm to estimate the HMM parameters and determined the segmentation using the Viterbi algorithm. We derived CNARs by comparing the segmentation of the tumor genome versus that of the normal genome. A log likelihood ratio (LLR) score was computed for each CNAR to quantify the possibility of being a true CNAR from being the null (copy number neutral) hypothesis. All CNARs with LLRs of more than or equal to 100 were reported. We have named the above method for detecting CNARs, CNVHMM. The boundaries of the CNARs were further refined using supporting discordant read pairs identified by BreakDancer (18
). PCR validation was done with primers designed upstream and downstream of the predicted deletion breakpoints. For tumor 9500, the forward primer was GACAGACCTACCTCTCATG at position 17,752,753 of chromosome X, and the reverse primer was GACAGACCTACCTCTCATG at position 17,904,755 (using build 37.1 of the mouse genome). The PCR product was cloned using the Topo TA Cloning Kit (Invitrogen) and sequenced to confirm the breakpoints of the deletion.
Functional validation using the transduction-transplantation mouse model with human JAK1 cDNA MSCV retroviral constructs.
The human JAK1
cDNA was obtained from a commercial vendor (Origene) and subcloned into the retroviral expression vector MSCV-IRES-eGFP
). The V658F mutation was introduced into the JAK1
coding sequence by site-directed mutagenesis using the QuikChange Site-Directed Mutagenesis Kit (Stratagene) and confirmed by full-length DNA sequencing. Retroviral supernatants were generated and viral titers determined by flow cytometry analysis of the GFP+
cells in retrovirally transfected 3T3 cells, as described previously (45
). GFP alone versus JAK1 WT/IRES/GFP versus JAK1 V658F mutant/IRES/GFP MSCV retroviral constructs were transduced into WT B6/T or mCG-PR bone marrow from 6-week-old healthy mice. The transduced marrow was then transplanted into lethally irradiated WT B6/T male mice obtained from The Jackson Laboratory, according to the method previously reported (45
). Briefly, mononuclear bone marrow cells were cultured in media that contained RPMI, 20% FCS, SCF, FLT3L, IL-3, and TPO for 48 hours. 5-Fluorouracil treatment was not used for donor mice. The cells were then spinfected twice, on days –1 and 0. After a 4-hour rest period, transduced bone marrow cells (1 × 106
cells) were then injected intravenously via the tail vein into lethally irradiated (1,100 cGy) syngeneic B6/T mice. Secondary transplants were performed by retro-orbitally injecting 5 × 105
to 10 × 105
unsorted, banked mononuclear spleen cells isolated from moribund primary transplant recipients (n
= 3) into secondary syngeneic recipient healthy mice. Two mice in the mCG-PR cohort transduced with JAK1
V658F retrovirus in the first 2 experiments did not have sustained engraftment of GFP+
cells and were not included in the analysis.
Analysis of mice.
All mouse work was in done in accordance with institutional guidelines and approved by the Animal Studies Committee at Washington University in accordance with current NIH policy. Mice were monitored 3 times per week for disease by palpation and observation. Peripheral blood was obtained by retro-orbital phlebotomy after adequate methoxyflurane (Vedco) anesthesia using heparinized capillary tubes (Fisher Scientific). Mononuclear cell suspensions of spleen or bone marrow were made by passing tissue through 70-μm nylon mesh cell strainers (Falcon) wetted with ACK lysis buffer, spun down in appropriate volume of ACK buffer, and then resuspended in working media.
Cytospin tissue slides were stained with Wright-Giemsa (Sigma-Aldrich) and were imaged using a Nikon MICROPHOT-SA microscope equipped with an oil-immersion 50×/0.90 or 100×/1.30 objective lens (Nikon Corp.). Spleen differentials were done by a blinded hematopathologist who counted 200 cells per slide.
One million mononuclear cells were suspended in 100 μl flow cytometry buffer and incubated for 20 minutes on ice with CD16/32 FcR block and then appropriate antibodies, including PE-conjugated Gr-1, FITC-conjugated CD34, allophycocyanin-conjugated CD117, or isotype controls (all from eBioscience), before being analyzed on a FACScan Flow Cytometer (BD Biosciences). Staining for phosphorylated STAT1 (pY701), STAT3 (pY705), and STAT5 (pY694) was done as described previously (all antibodies from BD Biosciences) (46
Methylcellulose colony-forming assay.
We tested the efficacy of a commercially available pan-JAK inhibitor, P6 or JAK Inhibitor I (final concentration 0.5 μM, in 1,000× DMSO stock; Calbiochem), versus doxorubicin (final concentration of 100 ng/ml, in 1,000× water stock; Novation), ATRA (final concentration of 1 μM, in 1000× ethanol stock, Sigma-Aldrich), and a selective JAK3 inhibitor (WHI-P131 or JAK3 Inhibitor I, final concentration 10 μM, in 1,000× DMSO stock; Calbiochem) in a set of 10 banked mCG-PR tumors by a standard methylcellulose colony-forming assay. Banked mCG-PR tumors were thawed in liquid culture with K10 media with or without drug (1:1,000 dilution of DMSO used for mock controls) for 48 hours, and then plated in methylcellulose containing SCF, IL-3, and IL-6 (MethoCult M3534, Stem Cell Technologies) for 5–7 days before colonies (>50 cells) were counted. The cells were incubated throughout the experiment in a 3% oxygen incubator at 37° C.
Western blot analysis.
Protein lysate preparation and Western blots were performed following standard protocols as described previously (6
). The Jak1 (6G4) rabbit mAb used was from Cell Signaling Technology. To assess protein loading, blots were probed with anti–β-actin antibody (Sigma-Aldrich).
DNA isolation and Southern analysis.
Genomic DNA was isolated from banked splenocytes after being lysed with a 1% Triton X-100 lysis buffer with 2 mg/ml Proteinase K added (Sigma-Aldrich), followed by standard phenol-chloroform extraction. For assay of clonality, 10 μg genomic DNA was digested with EcoRI or HindIII (as a positive control) (both from Fisher Scientific). The hybridization probe was a 277-bp PCR fragment generated from primers targeting the cDNA of the EGFP gene.
Statistical comparisons were made using Student’s 2-sided t test unless otherwise noted. We performed the above calculations and the Kaplan-Meier survival analysis using the log-rank test with the GraphPad Prism 5 software. We considered P ≤ 0.05 to be significant, except for the analysis of the methylcellulose colony-forming assay (Supplemental Figure 8), for which we considered a cutoff of P ≤ 0.01 as significant.