Most mutations in cancer genomes are thought to be acquired after the initiating event, which may cause genomic instability, driving clonal evolution. However, for acute myeloid leukemia (AML), normal karyotypes are common, and genomic instability is unusual. To better understand clonal evolution in AML, we sequenced the genomes of AML samples with a known initiating event (PML-RARA) vs. normal karyotype AML samples, and the exomes of hematopoietic stem/progenitor cells (HSPCs) from healthy people. Collectively, the data suggest that most of the mutations found in AML genomes are actually random events that occurred in HSPCs before they acquired the initiating mutation; the mutational history of that cell is “captured” as the clone expands. In many cases, only one or two additional, cooperating mutations are needed to generate the malignant founding clone. Cells from the founding clone can acquire additional cooperating mutations, yielding subclones that can contribute to disease progression and/or relapse.
The myelodysplastic syndromes are a group of hematologic disorders that often evolve into secondary acute myeloid leukemia (AML). The genetic changes that underlie progression from the myelodysplastic syndromes to secondary AML are not well understood.
We performed whole-genome sequencing of seven paired samples of skin and bone marrow in seven subjects with secondary AML to identify somatic mutations specific to secondary AML. We then genotyped a bone marrow sample obtained during the antecedent myelodysplastic-syndrome stage from each subject to determine the presence or absence of the specific somatic mutations. We identified recurrent mutations in coding genes and defined the clonal architecture of each pair of samples from the myelodysplastic-syndrome stage and the secondary-AML stage, using the allele burden of hundreds of mutations.
Approximately 85% of bone marrow cells were clonal in the myelodysplastic-syndrome and secondary-AML samples, regardless of the myeloblast count. The secondary-AML samples contained mutations in 11 recurrently mutated genes, including 4 genes that have not been previously implicated in the myelodysplastic syndromes or AML. In every case, progression to acute leukemia was defined by the persistence of an antecedent founding clone containing 182 to 660 somatic mutations and the outgrowth or emergence of at least one subclone, harboring dozens to hundreds of new mutations. All founding clones and subclones contained at least one mutation in a coding gene.
Nearly all the bone marrow cells in patients with myelodysplastic syndromes and secondary AML are clonally derived. Genetic evolution of secondary AML is a dynamic process shaped by multiple cycles of mutation acquisition and clonal selection. Recurrent gene mutations are found in both founding clones and daughter subclones. (Funded by the National Institutes of Health and others.)
Most patients with acute myeloid leukemia (AML) die from progressive disease after relapse, which is associated with clonal evolution at the cytogenetic level1,2. To determine the mutational spectrum associated with relapse, we sequenced the primary tumor and relapse genomes from 8 AML patients, and validated hundreds of somatic mutations using deep sequencing; this allowed us to precisely define clonality and clonal evolution patterns at relapse. Besides discovering novel, recurrently mutated genes (e.g. WAC, SMC3, DIS3, DDX41, and DAXX) in AML, we found two major clonal evolution patterns during AML relapse: 1) the founding clone in the primary tumor gained mutations and evolved into the relapse clone, or 2) a subclone of the founding clone survived initial therapy, gained additional mutations, and expanded at relapse. In all cases, chemotherapy failed to eradicate the founding clone. The comparison of relapse-specific vs. primary tumor mutations in all 8 cases revealed an increase in transversions, probably due to DNA damage caused by cytotoxic chemotherapy. These data demonstrate that AML relapse is associated with the addition of new mutations and clonal evolution, which is shaped in part by the chemotherapy that the patients receive to establish and maintain remissions.
Myelodysplastic syndromes (MDS) are hematopoietic stem cell disorders that often progress to chemotherapy-resistant secondary acute myeloid leukemia (sAML). We used whole genome sequencing to perform an unbiased comprehensive screen to discover all the somatic mutations in a sAML sample and genotyped these loci in the matched MDS sample. Here we show that a missense mutation affecting the serine at codon 34 (S34) in U2AF1 was recurrently mutated in 13/150 (8.7%) de novo MDS patients, with suggestive evidence of an associated increased risk of progression to sAML. U2AF1 is a U2 auxiliary factor protein that recognizes the AG splice acceptor dinucleotide at the 3′ end of introns and mutations are located in highly conserved zinc fingers in U2AF11,2. Mutant U2AF1 promotes enhanced splicing and exon skipping in reporter assays in vitro. This novel, recurrent mutation in U2AF1 implicates altered pre-mRNA splicing as a potential mechanism for MDS pathogenesis.
The full complement of DNA mutations that are responsible for the pathogenesis of acute myeloid leukemia (AML) is not yet known.
We used massively parallel DNA sequencing to obtain a very high level of coverage (approximately 98%) of a primary, cytogenetically normal, de novo genome for AML with minimal maturation (AML-M1) and a matched normal skin genome.
We identified 12 acquired (somatic) mutations within the coding sequences of genes and 52 somatic point mutations in conserved or regulatory portions of the genome. All mutations appeared to be heterozygous and present in nearly all cells in the tumor sample. Four of the 64 mutations occurred in at least 1 additional AML sample in 188 samples that were tested. Mutations in NRAS and NPM1 had been identified previously in patients with AML, but two other mutations had not been identified. One of these mutations, in the IDH1 gene, was present in 15 of 187 additional AML genomes tested and was strongly associated with normal cytogenetic status; it was present in 13 of 80 cytogenetically normal samples (16%). The other was a nongenic mutation in a genomic region with regulatory potential and conservation in higher mammals; we detected it in one additional AML tumor. The AML genome that we sequenced contains approximately 750 point mutations, of which only a small fraction are likely to be relevant to pathogenesis.
By comparing the sequences of tumor and skin genomes of a patient with AML-M1, we have identified recurring mutations that may be relevant for pathogenesis.
The genetic alterations responsible for an adverse outcome in most patients with acute myeloid leukemia (AML) are unknown.
Using massively parallel DNA sequencing, we identified a somatic mutation in DNMT3A, encoding a DNA methyltransferase, in the genome of cells from a patient with AML with a normal karyotype. We sequenced the exons of DNMT3A in 280 additional patients with de novo AML to define recurring mutations.
A total of 62 of 281 patients (22.1%) had mutations in DNMT3A that were predicted to affect translation. We identified 18 different missense mutations, the most common of which was predicted to affect amino acid R882 (in 37 patients). We also identified six frameshift, six nonsense, and three splice-site mutations and a 1.5-Mbp deletion encompassing DNMT3A. These mutations were highly enriched in the group of patients with an intermediate-risk cytogenetic profile (56 of 166 patients, or 33.7%) but were absent in all 79 patients with a favorable-risk cytogenetic profile (P<0.001 for both comparisons). The median overall survival among patients with DNMT3A mutations was significantly shorter than that among patients without such mutations (12.3 months vs. 41.1 months, P<0.001). DNMT3A mutations were associated with adverse outcomes among patients with an intermediate-risk cytogenetic profile or FLT3 mutations, regardless of age, and were independently associated with a poor outcome in Cox proportional-hazards analysis.
DNMT3A mutations are highly recurrent in patients with de novo AML with an intermediate-risk cytogenetic profile and are independently associated with a poor outcome. (Funded by the National Institutes of Health and others.)
The identification of patients with inherited cancer susceptibility syndromes facilitates early diagnosis, prevention, and treatment. However, in many cases of suspected cancer susceptibility, the family history is unclear and genetic testing of common cancer susceptibility genes is unrevealing.
To apply whole-genome sequencing to a patient with suspected cancer susceptibility (and lacking a clear family history of cancer and no BRCA1 and BRCA2 mutations) to identify rare or novel germline variants in cancer susceptibility genes.
Design, Setting, and Participant
Skin (normal) and bone marrow (leukemia) DNA were obtained from a patient with early-onset breast and ovarian cancer and therapy-related acute myeloid leukemia (t-AML), and analyzed with: 1) whole genome sequencing using paired end reads; 2) SNP genotyping; 3) RNA expression profiling; and 4) spectral karyotyping.
Main Outcome Measures
Structural variants, copy number alterations, single nucleotide variants and small insertions and deletions (indels) were detected and validated using the above platforms.
Whole genome sequencing revealed a novel, heterozygous 3 Kb deletion removing exons 7-9 of TP53 in the patient’s normal skin DNA, which was homozygous in the leukemia DNA as a result of uniparental disomy. In addition, a total of 28 validated somatic single nucleotide variations or indels in coding genes, 8 somatic structural variants, and 12 somatic copy number alterations were detected in the patient’s leukemia genome.
Whole genome sequencing can identify novel, cryptic variants in cancer susceptibility genes in addition to providing unbiased information on the spectrum of mutations in a cancer genome.
Massively parallel sequencing technologies continue to alter the study of human genetics. As the cost of sequencing declines, next-generation sequencing (NGS) instruments and datasets will become increasingly accessible to the wider research community. Investigators are understandably eager to harness the power of these new technologies. Sequencing human genomes on these platforms, however, presents numerous production and bioinformatics challenges. Production issues like sample contamination, library chimaeras and variable run quality have become increasingly problematic in the transition from technology development lab to production floor. Analysis of NGS data, too, remains challenging, particularly given the short-read lengths (35–250 bp) and sheer volume of data. The development of streamlined, highly automated pipelines for data analysis is critical for transition from technology adoption to accelerated research and publication. This review aims to describe the state of current NGS technologies, as well as the strategies that enable NGS users to characterize the full spectrum of DNA sequence variation in humans.
massively parallel sequencing; next generation sequencing; human genome; variant detection; short read alignment; whole genome sequencing
The identification of small sequence variants remains a challenging but critical step in the analysis of next-generation sequencing data. Our variant calling tool, VarScan 2, employs heuristic and statistic thresholds based on user-defined criteria to call variants using SAMtools mpileup data as input. Here, we provide guidelines for generating that input, and describe protocols for using VarScan 2 to (1) identify germline variants in individual samples; (2) call somatic mutations, copy number alterations, and LOH events in tumor-normal pairs; and (3) identify germline variants, de novo mutations, and Mendelian inheritance errors in family trios. Further, we describe a strategy for variant filtering that removes likely false positives associated with common sequencing- and alignment-related artifacts.
variant calling; mutation detection; trio calling; snvs; indels; varscan 2; next-generation sequencing
Genomics is a relatively new scientific discipline, having DNA sequencing as its core technology. As technology has improved the cost and scale of genome characterization over sequencing’s 40-year history, the scope of inquiry has commensurately broadened. Massively parallel sequencing has proven revolutionary, shifting the paradigm of genomics to address biological questions at a genome-wide scale. Sequencing now empowers clinical diagnostics and other aspects of medical care, including disease risk, therapeutic identification, and prenatal testing. This Review explores the current state of genomics in the massively parallel sequencing era.
The goal of our research is to identify genes and mutations causing auto-somal dominant retinitis pigmentosa (adRP). For this purpose we established a cohort of more than 250 independently ascertained families with adRP in the Houston Laboratory for Molecular Diagnosis of Inherited Eye Diseases. Affected members of each family were screened for disease-causing mutations in genes and gene regions that are commonly associated with adRP. By this approach, we detected mutations in 65 % of the families, leaving 85 families that are likely to harbor mutations outside of the “common” regions or in novel genes. Of these, 32 families were tested by several types of next-generation sequencing (NGS), including (a) targeted polymerase chain reaction (PCR) NGS, (b) whole exome NGS, and (c) targeted retinal-capture NGS. We detected mutations in 11 of these families (31 %) bringing the total detected in the adRP cohort to 70 %. Several large families have also been tested for linkage using Afymetrix single nucleotide polymorphism (SNP) arrays.
Retinitis pigmentosa; Next-generation sequencing; Linkage mapping; Mutation prevalence; Retinal gene capture; Whole-exome sequencing
We report the first large-scale exome-wide analysis of the combined germline-somatic landscape in ovarian cancer. Here we analyze germline and somatic alterations in 429 ovarian carcinoma cases and 557 controls. We identify 3,635 high confidence, rare truncation and 22,953 missense variants with predicted functional impact. We find germline truncation variants and large deletions across Fanconi pathway genes in 20% of cases. Enrichment of rare truncations is shown in BRCA1, BRCA2, and PALB2. Additionally, we observe germline truncation variants in genes not previously associated with ovarian cancer susceptibility (NF1, MAP3K4, CDKN2B, and MLL3). Evidence for loss of heterozygosity was found in 100% and 76% of cases with germline BRCA1 and BRCA2 truncations respectively. Germline-somatic interaction analysis combined with extensive bioinformatics annotation identifies 237 candidate functional germline truncation and missense variants, including 2 pathogenic BRCA1 and 1 TP53 deleterious variants. Finally, integrated analyses of germline and somatic variants identify significantly altered pathways, including the Fanconi, MAPK, and MLL pathways.
Macular degeneration is a common cause of blindness in the elderly. To identify rare coding variants associated with a large increase in risk of age-related macular degeneration (AMD), we sequenced 2,335 cases and 789 controls in 10 candidate loci (57 genes). To increase power, we augmented our control set with ancestry-matched exome sequenced controls. An analysis of coding variation in 2,268 AMD cases and 2,268 ancestry matched controls revealed two large-effect rare variants; previously described R1210C in the CFH gene (fcase = 0.51%, fcontrol = 0.02%, OR = 23.11), and newly identified K155Q in the C3 gene (fcase = 1.06%, fcontrol = 0.39%, OR = 2.68). The variants suggest decreased inhibition of C3 by Factor H, resulting in increased activation of the alternative complement pathway, as a key component of disease biology.
Data from eight breast cancer genome sequencing projects identified 25 patients with HER2 somatic mutations in cancers lacking HER2 gene amplification. To determine the phenotype of these mutations, we functionally characterized thirteen HER2 mutations using in vitro kinase assays, protein structure analysis, cell culture and xenograft experiments. Seven of these mutations are activating mutations, including G309A, D769H, D769Y, V777L, P780ins, V842I, and R896C. HER2 in-frame deletion 755-759, which is homologous to EGFR exon 19 in-frame deletions, had a neomorphic phenotype with increased phosphorylation of EGFR or HER3. L755S produced lapatinib resistance, but was not an activating mutation in our experimental systems. All of these mutations were sensitive to the irreversible kinase inhibitor, neratinib. These findings demonstrate that HER2 somatic mutation is an alternative mechanism to activate HER2 in breast cancer and they validate HER2 somatic mutations as drug targets for breast cancer treatment.
Genomics; Breast Cancer; Receptor Tyrosine Kinase; Oncogene
Genome-wide association studies (GWAS) have identified >500 common variants associated with quantitative metabolic traits, but in aggregate such variants explain at most 20–30% of the heritable component of population variation in these traits. To further investigate the impact of genotypic variation on metabolic traits, we conducted re-sequencing studies in >6,000 members of a Finnish population cohort (The Northern Finland Birth Cohort of 1966 [NFBC]) and a type 2 diabetes case-control sample (The Finland-United States Investigation of NIDDM Genetics [FUSION] study). By sequencing the coding sequence and 5′ and 3′ untranslated regions of 78 genes at 17 GWAS loci associated with one or more of six metabolic traits (serum levels of fasting HDL-C, LDL-C, total cholesterol, triglycerides, plasma glucose, and insulin), and conducting both single-variant and gene-level association tests, we obtained a more complete understanding of phenotype-genotype associations at eight of these loci. At all eight of these loci, the identification of new associations provides significant evidence for multiple genetic signals to one or more phenotypes, and at two loci, in the genes ABCA1 and CETP, we found significant gene-level evidence of association to non-synonymous variants with MAF<1%. Additionally, two potentially deleterious variants that demonstrated significant associations (rs138726309, a missense variant in G6PC2, and rs28933094, a missense variant in LIPC) were considerably more common in these Finnish samples than in European reference populations, supporting our prior hypothesis that deleterious variants could attain high frequencies in this isolated population, likely due to the effects of population bottlenecks. Our results highlight the value of large, well-phenotyped samples for rare-variant association analysis, and the challenge of evaluating the phenotypic impact of such variants.
Abnormal serum levels of various metabolites, including measures relevant to cholesterol, other fats, and sugars, are known to be risk factors for cardiovascular disease and type 2 diabetes. Identification of the genes that play a role in generating such abnormalities could advance the development of new treatment and prevention strategies for these disorders. Investigations of common genetic variants carried out in large sets of research subjects have successfully pinpointed such genes within many regions of the human genome. However, these studies often have not led to the identification of the specific genetic variations affecting metabolic traits. To attempt to detect such causal variations, we sequenced genes in 17 genomic regions implicated in metabolic traits in >6,000 people from Finland. By conducting statistical analyses relating specific variations (individually and grouped by gene) to the measures for these metabolic traits observed in the study subjects, we added to our understanding of how genotypes affect these traits. Our findings support a long-held hypothesis that the unique history of the Finnish population provides important advantages for analyzing the relationship between genetic variations and biomedically important traits.
Cancer immunoediting, the process whereby the immune system controls tumour outgrowth and shapes tumour immunogenicity, is comprised of three phases: elimination, equilibrium and escape1–5. Although many immune components that participate in this process are known, its underlying mechanisms remain poorly defined. A central tenet of cancer immunoediting is that T cell recognition of tumour antigens drives the immunologic destruction or sculpting of a developing cancer. However, our current understanding of tumour antigens comes largely from analyses of cancers that develop in immunocompetent hosts and thus may have already been edited. Little is known about the antigens expressed in nascent tumour cells, whether they are sufficient to induce protective anti-tumour immune responses or whether their expression is modulated by the immune system. Here, using massively parallel sequencing, we characterize expressed mutations in highly immunogenic methylcholanthrene-induced sarcomas derived from immunodeficient Rag2−/− mice which phenotypically resemble nascent primary tumour cells1,3,5. Employing class I prediction algorithms, we identify mutant spectrin-β2 as a potential rejection antigen of the d42m1 sarcoma and validate this prediction by conventional antigen expression cloning and detection. We also demonstrate that cancer immunoediting of d42m1 occurs via a T cell-dependent immunoselection process that promotes outgrowth of pre-existing tumour cell clones lacking highly antigenic mutant spectrin-β2 and other potential strong antigens. These results demonstrate that the strong immunogenicity of an unedited tumour can be ascribed to expression of highly antigenic mutant proteins and show that outgrowth of tumour cells that lack these strong antigens via a T cell-dependent immunoselection process represents one mechanism of cancer immunoediting.
The emergence of next-generation sequencing (NGS) technologies offers an incredible opportunity to comprehensively study DNA sequence variation in human genomes. Commercially available platforms from Roche (454), Illumina (Genome Analyzer and Hiseq 2000), and Applied Biosystems (SOLiD) have the capability to completely sequence individual genomes to high levels of coverage. NGS data is particularly advantageous for the study of structural variation (SV) because it offers the sensitivity to detect variants of various sizes and types, as well as the precision to characterize their breakpoints at base pair resolution. In this chapter, we present methods and software algorithms that have been developed to detect SVs and copy number changes using massively parallel sequencing data. We describe visualization and de novo assembly strategies for characterizing SV breakpoints and removing false positives.
Next-generation sequencing; Paired-end sequencing; 454; Illumina; Solexa; Abi solid; Insertions; Deletions; Duplications; Inversions; Translocations; Indels; Copy number variants
Motivation: The sequencing of tumors and their matched normals is frequently used to study the genetic composition of cancer. Despite this fact, there remains a dearth of available software tools designed to compare sequences in pairs of samples and identify sites that are likely to be unique to one sample.
Results: In this article, we describe the mathematical basis of our SomaticSniper software for comparing tumor and normal pairs. We estimate its sensitivity and precision, and present several common sources of error resulting in miscalls.
Availability and implementation: Binaries are freely available for download at http://gmt.genome.wustl.edu/somatic-sniper/current/, implemented in C and supported on Linux and Mac OS X.
Contact: email@example.com; firstname.lastname@example.org
Supplementary information: Supplementary data are available at Bioinformatics online.
To correlate the variable clinical features of estrogen receptor positive (ER+) breast cancer with somatic alterations, we studied pre-treatment tumour biopsies accrued from patients in a study of neoadjuvant aromatase inhibitor (AI) therapy by massively parallel sequencing and analysis. Eighteen significantly mutated genes were identified, including five genes (RUNX1, CBFB, MYH9, MLL3 and SF3B1) previously linked to hematopoietic disorders. Mutant MAP3K1 was associated with Luminal A status, low grade histology and low proliferation rates whereas mutant TP53 associated with the opposite pattern. Moreover, mutant GATA3 correlated with suppression of proliferation upon AI treatment. Pathway analysis demonstrated mutations in MAP2K4, a MAP3K1 substrate, produced similar perturbations as MAP3K1 loss. Distinct phenotypes in ER+ breast cancer are associated with specific patterns of somatic mutations that map into cellular pathways linked to tumor biology but most recurrent mutations are relatively infrequent. Prospective clinical trials based on these findings will require comprehensive genome sequencing.
Linkage testing using Affymetrix 6.0 SNP Arrays mapped the disease locus in TCD-G, an Irish family with autosomal dominant retinitis pigmentosa (adRP), to an 8.8 Mb region on 1p31. Of 50 known genes in the region, 11 candidates, including RPE65 and PDE4B, were sequenced using di-deoxy capillary electrophoresis. Simultaneously, a subset of family members was analyzed using Agilent SureSelect All Exome capture, followed by sequencing on an Illumina GAIIx platform. Candidate gene and exome sequencing resulted in the identification of an Asp477Gly mutation in exon 13 of the RPE65 gene tracking with the disease in TCD-G. All coding exons of genes not sequenced to sufficient depth by next generation sequencing were sequenced by di-deoxy sequencing. No other potential disease-causing variants were found to segregate with disease in TCD-G. The Asp477Gly mutation was not present in Irish controls, but was found in a second Irish family provisionally diagnosed with choroideremia, bringing the combined maximum two-point LOD score to 5.3. Mutations in RPE65 are a known cause of recessive Leber congenital amaurosis (LCA) and recessive RP, but no dominant mutations have been reported. Protein modeling suggests that the Asp477Gly mutation may destabilize protein folding, and mutant RPE65 protein migrates marginally faster on SDS-PAGE, compared with wild type. Gene therapy for LCA patients with RPE65 mutations has shown great promise, raising the possibility of related therapies for dominant-acting mutations in this gene.
retinitis pigmentosa; choroideremia; RPE65; exome capture; next-generation sequencing
The application of next-generation sequencing technology has produced a transformation in cancer genomics, generating large data sets that can be analyzed in different ways to answer a multitude of questions about the genomic alterations associated with the disease. Analytical approaches can discover focused mutations such as substitutions and small insertion/deletions, large structural alterations and copy number events. As our capacity to produce such data for multiple cancers of the same type is improving, so are the demands to analyze multiple tumor genomes simultaneously growing. For example, pathway-based analyses that provide the full mutational impact on cellular protein networks and correlation analyses aimed at revealing causal relationships between genomic alterations and clinical presentations are both enabled. As the repertoire of data grows to include mRNA-seq, non-coding RNA-seq and methylation for multiple genomes, our challenge will be to intelligently integrate data types and genomes to produce a coherent picture of the genetic basis of cancer.
The nematode Caenorhabditis briggsae is an emerging model organism that allows evolutionary comparisons with C. elegans and exploration of its own unique biological attributes. To produce a high-resolution C. briggsae recombination map, recombinant inbred lines were generated from reciprocal crosses between two strains and genotyped at over 1,000 loci. A second set of recombinant inbred lines involving a third strain was also genotyped at lower resolution. The resulting recombination maps exhibit discrete domains of high and low recombination, as in C. elegans, indicating these are a general feature of Caenorhabditis species. The proportion of a chromosome's physical size occupied by the central, low-recombination domain is highly correlated between species. However, the C. briggsae intra-species comparison reveals striking variation in the distribution of recombination between domains. Hybrid lines made with the more divergent pair of strains also exhibit pervasive marker transmission ratio distortion, evidence of selection acting on hybrid genotypes. The strongest effect, on chromosome III, is explained by a developmental delay phenotype exhibited by some hybrid F2 animals. In addition, on chromosomes IV and V, cross direction-specific biases towards one parental genotype suggest the existence of cytonuclear epistatic interactions. These interactions are discussed in relation to surprising mitochondrial genome polymorphism in C. briggsae, evidence that the two strains diverged in allopatry, the potential for local adaptation, and the evolution of Dobzhansky-Muller incompatibilities. The genetic and genomic resources resulting from this work will support future efforts to understand inter-strain divergence as well as facilitate studies of gene function, natural variation, and the evolution of recombination in Caenorhabditis nematodes.
The nematode Caenorhabditis briggsae is increasingly used for comparisons with its more famous relative, C. elegans. To improve genomic resources for C. briggsae, we created two sets of inbred lines derived from crosses between diverged C. briggsae strains. High-throughput genotyping of these has improved the resolution of the recombination map and genome assembly. It also allows detailed comparisons of recombination both within and between species. Unexpectedly, we found that alleles from one parental strain were much more likely to be fixed on three of the six chromosomes in one of the sets of lines. One of these biases is caused by a pronounced developmental delay in F2 progeny that is seen in both reciprocal crosses, whereas the other two manifest in only one of the two cross directions. This indicates that the parental strains have diverged in both nuclear and nuclear-cytoplasmic interactions, either because of local adaptation or restricted gene flow across much of the genome.
In this study, high-throughput, next-generation sequencing was used to identify disease-causing mutations in a large set of unrelated families with autosomal dominant retinitis pigmentosa (adRP), a highly heterogeneous inherited disease. This is one of the first reports of the application of the technology to a large set of adRP families. Next-generation sequencing of a large set of candidate genes identified mutations in 24% of the families tested, bringing the mutation identification rate in this adRP cohort to 65%.
To determine whether massively parallel next-generation DNA sequencing offers rapid and efficient detection of disease-causing mutations in patients with monogenic inherited diseases. Retinitis pigmentosa (RP) is a challenging application for this technology because it is a monogenic disease in individuals and families but is highly heterogeneous in patient populations. RP has multiple patterns of inheritance, with mutations in many genes for each inheritance pattern and numerous, distinct, disease-causing mutations at each locus; further, many RP genes have not been identified yet.
Next-generation sequencing was used to identify mutations in pairs of affected individuals from 21 families with autosomal dominant RP, selected from a cohort of families without mutations in “common” RP genes. One thousand amplicons targeting 249,267 unique bases of 46 candidate genes were sequenced with the 454GS FLX Titanium (Roche Diagnostics, Indianapolis, IN) and GAIIx (Illumina/Solexa, San Diego, CA) platforms.
An average sequence depth of 70× and 125× was obtained for the 454GS FLX and GAIIx platforms, respectively. More than 9000 sequence variants were identified and analyzed, to assess the likelihood of pathogenicity. One hundred twelve of these were selected as likely candidates and tested for segregation with traditional di-deoxy capillary electrophoresis sequencing of additional family members and control subjects. Five disease-causing mutations (24%) were identified in the 21 families.
This project demonstrates that next-generation sequencing is an effective approach for detecting novel, rare mutations causing heterogeneous monogenic disorders such as RP. With the addition of this technology, disease-causing mutations can now be identified in 65% of autosomal dominant RP cases.
Acute promyelocytic leukemia (APL) is a subtype of acute myeloid leukemia (AML). It is characterized by the t(15;17)(q22;q11.2) chromosomal translocation that creates the promyelocytic leukemia–retinoic acid receptor α (PML-RARA) fusion oncogene. Although this fusion oncogene is known to initiate APL in mice, other cooperating mutations, as yet ill defined, are important for disease pathogenesis. To identify these, we used a mouse model of APL, whereby PML-RARA expressed in myeloid cells leads to a myeloproliferative disease that ultimately evolves into APL. Sequencing of a mouse APL genome revealed 3 somatic, nonsynonymous mutations relevant to APL pathogenesis, of which 1 (Jak1 V657F) was found to be recurrent in other affected mice. This mutation was identical to the JAK1 V658F mutation previously found in human APL and acute lymphoblastic leukemia samples. Further analysis showed that JAK1 V658F cooperated in vivo with PML-RARA, causing a rapidly fatal leukemia in mice. We also discovered a somatic 150-kb deletion involving the lysine (K)-specific demethylase 6A (Kdm6a, also known as Utx) gene, in the mouse APL genome. Similar deletions were observed in 3 out of 14 additional mouse APL samples and 1 out of 150 human AML samples. In conclusion, whole genome sequencing of mouse cancer genomes can provide an unbiased and comprehensive approach for discovering functionally relevant mutations that are also present in human leukemias.