The UCLA Institutional Review Board approved this study, which was carried out in compliance with the Helsinki Declaration, and all participants, or parents of participants, provided written informed consent before samples were collected.
We collected peripheral blood at diagnosis and remission bone marrow from four patients with congenital ALL (Table ).
DNA extraction and sequencing
Tumor genomic DNA was extracted from peripheral blood at diagnosis, and normal genomic DNA was extracted from remission bone marrow, using the QIAamp DNA Mini Kit (Qiagen, Valencia, California). Genomic DNA was enriched for coding exons using SureSelect Human All Exon for sample 1 and Human All Exon 50Mb kits for samples 2–4 (Agilent, Santa Clara, California). Sample 1 was sequenced on one full lane of the Illumina Genome Analyzer IIx as 76x76 base paired-end reads and on one full lane of the HiSeq2000 as 50x50 base paired-end reads (Illumina, San Diego, California); reads from the two runs were merged for downstream analysis. Leukemia samples 2–4 and the parents of sample 1 were sequenced on one full lane of the HiSeq2000 as 100x100 base paired-end reads, while the germline samples of patients 2–4 were sequenced on one full lane of the HiSeq2000 as 50x50 base paired-end reads.
Variant calling and filtration
Sequence reads were aligned to the human reference genome build 37 using Novoalign (novocraft.com). Post-processing of reads was performed using Samtools (samtools.sf.net) and Picard (picard.sf.net) for removal of PCR duplicates, merging, and indexing [13].
The Genome Analysis Toolkit (GATK) was used for recalibration of base quality, variant calling, filtration, and evaluation [14]. Quality scores generated by the sequencer were recalibrated by analyzing the covariation among reported quality score, position within the read, dinucleotide, and probability of a reference mismatch. Local realignment around small insertions and deletions (indels) was performed using GATK's indel realigner to minimize the number of mismatching bases across all reads. Statistically significant non-reference variants, both single nucleotide substitutions (SNS) and small indels, were identified using the GATK UnifiedGenotyper. The GATK VariantAnnotator annotated each variant with various statistics, including allele balance, depth of coverage, strand balance, and multiple quality metrics. These statistics were then used in an adaptive error model to identify likely false-positive SNSs, using GATK variant quality score recalibration (VQSR). Single nucleotide substitutions with a low VQSR score were filtered out, leaving a set of likely true variants. Hard filtering was applied to indels, and only passing indels were used for subsequent analyses.
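The recalibration idea described above can be illustrated with a minimal sketch. It is not GATK's implementation: it assumes each observed base has already been reduced to a tuple of reported quality, read position, dinucleotide context, and a mismatch flag, and it uses simple add-one smoothing, which is our assumption. The function name and input layout are hypothetical.

```python
import math
from collections import defaultdict


def empirical_qualities(observations):
    """Estimate an empirical Phred quality for each covariate bin.

    observations: iterable of (reported_q, read_position, dinucleotide,
    is_mismatch) tuples. This layout is an illustrative assumption.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [mismatches, total bases]
    for reported_q, position, dinucleotide, is_mismatch in observations:
        key = (reported_q, position, dinucleotide)
        counts[key][0] += int(is_mismatch)
        counts[key][1] += 1

    table = {}
    for key, (mismatches, total) in counts.items():
        # Add-one smoothing keeps the error rate away from 0 in sparse bins.
        error_rate = (mismatches + 1) / (total + 2)
        table[key] = -10 * math.log10(error_rate)
    return table
```

A bin whose empirical quality falls well below its reported quality indicates that the sequencer was overconfident for that combination of covariates. In practice, positions of known polymorphisms would be excluded before counting mismatches, so that true variants are not counted as errors.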
Variants were filtered out if they were in non-coding regions, resulted in synonymous amino acid changes, or were predicted to have a benign effect on protein function by PolyPhen (http://genetics.bwh.harvard.edu/pph) or SIFT (http://sift.jcvi.org). Variants were classified as rare if alternate allele frequencies were less than 1%.
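The functional and frequency criteria can be expressed as two small predicates. This is a sketch under stated assumptions: the `Variant` record, the consequence labels, and the prediction labels are hypothetical, and we read the benign criterion as "filtered if either tool predicts a benign change", which the text does not spell out. Variants without a prediction (for example, indels) are retained here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical consequence labels; annotation tools use their own vocabularies.
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "inframe_indel", "splice_site"}


@dataclass
class Variant:
    gene: str
    consequence: str                # e.g. "missense", "synonymous", "intronic"
    polyphen: Optional[str] = None  # e.g. "benign", "probably_damaging"
    sift: Optional[str] = None      # e.g. "tolerated", "deleterious"
    alt_allele_freq: float = 0.0    # population alternate allele frequency


def passes_functional_filter(v: Variant) -> bool:
    """Drop non-coding, synonymous, and predicted-benign variants."""
    if v.consequence not in PROTEIN_ALTERING:
        return False
    # Assumption: a benign call from either predictor removes the variant.
    if v.polyphen == "benign" or v.sift == "tolerated":
        return False
    return True


def is_rare(v: Variant, max_freq: float = 0.01) -> bool:
    """Rare means an alternate allele frequency strictly below 1%."""
    return v.alt_allele_freq < max_freq
```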
Nonsynonymous, protein-damaging, rare germline variants were intersected with known germline mutations that predispose to cancer syndromes, found in COSMIC [16]. Germline variants were also intersected with known DNA repair genes [17]. Germline variants in sample 1 were cross-checked against the parents’ sequence data to distinguish inherited from de novo mutations. All germline and somatic variants remaining at the last step of filtering were manually visualized using the Integrative Genomics Viewer [18].
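The parental cross-check for sample 1 amounts to a set-membership test. The sketch below assumes variants are represented as `(chromosome, position, ref, alt)` tuples and that each parent's calls are available as a set; the function name and output labels are hypothetical.

```python
def classify_germline_variant(variant, mother_variants, father_variants):
    """Label a proband germline variant by its presence in each parent.

    variant: (chromosome, position, ref, alt) tuple (illustrative layout).
    mother_variants, father_variants: sets of tuples in the same layout.
    """
    in_mother = variant in mother_variants
    in_father = variant in father_variants
    if in_mother and in_father:
        return "inherited_both"
    if in_mother:
        return "inherited_maternal"
    if in_father:
        return "inherited_paternal"
    # Absent from both parents' calls; still only a candidate, because a
    # parental allele can be missed at low coverage.
    return "candidate_de_novo"
```

A variant labeled `candidate_de_novo` would still need inspection of the parents' read-level data, since an absent call does not prove an absent allele.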
Mutations were classified as somatic if they were rare and found only in the tumor sample, with no evidence in the germline data. Fisher’s exact test was performed on the reference and non-reference read counts, and a p-value < 1 × 10⁻⁶ was used as the cut-off for significance. Somatic mutations found in sample 1 were cross-checked against the parents’ sequence data to ensure they were indeed somatic and not alleles missed in the germline. Three somatic variants were excluded because they were present as non-reference reads in one or both parents.
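The read-count comparison can be sketched with a two-sided Fisher's exact test on a 2x2 table of reference and non-reference reads in tumor versus germline. This is an illustration using only the Python standard library, not the software used in the study. The function names are hypothetical, and treating "no evidence in the germline data" as zero non-reference germline reads is our assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def is_somatic_candidate(tumor_ref, tumor_alt, germline_ref, germline_alt,
                         alpha=1e-6):
    """Tumor-only variant whose read counts differ significantly."""
    if germline_alt > 0:  # assumption: any germline alt read is "evidence"
        return False
    p = fisher_exact_two_sided(tumor_ref, tumor_alt, germline_ref, germline_alt)
    return p < alpha
```

For example, 40 non-reference reads out of 100 in the tumor against 0 out of 100 in the germline passes the 1 × 10⁻⁶ threshold, whereas 5 out of 100 against 0 out of 100 does not.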
Polymerase chain reaction and capillary sequencing
The mutation in Sample 1, the FLT3 mutation in Sample 3, and the DMBT mutation in Sample 4 were validated using PCR and capillary sequencing. All primers for mutations were designed using Primer3Plus (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi) and ordered from Integrated DNA Technologies (Coralville, IA). Capillary sequencing was performed on an Applied Biosystems 3730 Capillary DNA Analyzer (Life Technologies, Carlsbad, CA). Raw and analyzed sequence results were visualized in Sequence Scanner v1.0 (Life Technologies, Carlsbad, CA). There was insufficient DNA from Sample 2 to validate variants with PCR and capillary sequencing.