The acute myeloid leukemia (AML) genome has been the subject of intensive research over the past four decades. New technologies, enabling characterization of the AML genome at increased resolution, have revealed deeper layers of complexity that have provided insights into the biological basis of this disease, nominated targets for therapy, and identified biomarkers predictive of response to therapy or long-term prognosis. Still, our understanding of AML genomics is incomplete. Recent publications have demonstrated that whole genome sequencing (WGS) of primary AML samples is feasible and can detect novel, clinically relevant mutations. New insights are emerging from this work, including the clonal heterogeneity of this disease and clonal evolution that occurs over time. Some of the novel mutations are highly recurrent (>20% of patients), but there appears to be a continuum of mutation frequency down to rare (<5%) or even singleton mutations that may be relevant for the biology of this disease. Large cohorts of well-annotated samples are needed to establish mutation frequencies, implicate biological pathways, and demonstrate genotype:phenotype correlations. Although many technical and logistical challenges must be overcome, the capacity of WGS to detect all classes of inherited and acquired genetic abnormalities makes it an attractive candidate for development as a clinical diagnostic test.
acute myeloid leukemia; genomics; next generation sequencing
Massively parallel sequencing technologies continue to alter the study of human genetics. As the cost of sequencing declines, next-generation sequencing (NGS) instruments and datasets will become increasingly accessible to the wider research community. Investigators are understandably eager to harness the power of these new technologies. Sequencing human genomes on these platforms, however, presents numerous production and bioinformatics challenges. Production issues like sample contamination, library chimaeras and variable run quality have become increasingly problematic in the transition from technology development lab to production floor. Analysis of NGS data, too, remains challenging, particularly given the short-read lengths (35–250 bp) and sheer volume of data. The development of streamlined, highly automated pipelines for data analysis is critical for transition from technology adoption to accelerated research and publication. This review aims to describe the state of current NGS technologies, as well as the strategies that enable NGS users to characterize the full spectrum of DNA sequence variation in humans.
massively parallel sequencing; next generation sequencing; human genome; variant detection; short read alignment; whole genome sequencing
In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad-hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.
BACKGROUND: The nuclear factor-κB (NF-κB) family of transcriptional regulators are central mediators of the cellular inflammatory response. Although constitutive NF-κB signaling is present in most human tumours, mutations in pathway members are rare, complicating efforts to understand and block aberrant NF-κB activity in cancer. METHODS: To identify additional genetic alterations that drive ependymoma, we sequenced the whole genomes (WGS) of 41 tumours and matched normal blood, and the transcriptomes (RNAseq) of 77 tumours. The transforming significance of alterations was tested in mouse neural stem cells (NSCs) that we showed previously to be cells of origin of ependymoma. RESULTS: Here, we show that more than two thirds of supratentorial ependymomas contain oncogenic fusions between RELA, the principal effector of canonical NF-κB signalling, and an uncharacterized gene, C11orf95. In each case, C11orf95-RELA fusions resulted from chromothripsis involving chromosome 11q13.1. C11orf95-RELA fusion proteins translocated spontaneously to the nucleus to activate NF-κB target genes, and rapidly transformed neural stem cells, the cell of origin of ependymoma, to form these tumours in mice. CONCLUSIONS: Our data identify the first highly recurrent genetic alteration of RELA in human cancer, and the C11orf95-RELA fusion protein as a potential therapeutic target in supratentorial ependymoma. SECONDARY CATEGORY: Neuropathology & Tumor Biomarkers.
Improved understanding of the molecular basis underlying oral squamous cell carcinoma (OSCC) aggressive growth has significant clinical implications. Herein, cross-species genomic comparison of carcinogen-induced murine and human OSCCs with indolent or metastatic growth yielded results with surprising translational relevance.
Murine OSCC cell lines were subjected to next-generation sequencing (NGS) to define their mutational landscape, to define novel candidate cancer genes and to assess for parallels with known drivers in human OSCC. Expression arrays identified a mouse metastasis signature and we assessed its representation in 4 independent human datasets comprising 324 patients using weighted voting and Gene Set Enrichment Analysis (GSEA). Kaplan-Meier analysis and multivariate Cox proportional hazards modeling were used to stratify outcomes. A qRT-PCR assay based on the mouse signature coupled to a machine-learning algorithm was developed and used to stratify an independent set of 31 patients with respect to metastatic lymphadenopathy.
NGS revealed conservation of human driver pathway mutations in mouse OSCC, including in Trp53, MAPK, PI3K, NOTCH, JAK/STAT and FAT1–4. Moreover, comparative analysis between The Cancer Genome Atlas (TCGA) and mouse samples defined AKAP9, MED12L and MYH6 as novel putative cancer genes. Expression analysis identified a transcriptional signature predicting aggressiveness and clinical outcomes, which was validated in 4 independent human OSCC datasets. Finally, we harnessed the translational potential of this signature by creating a clinically feasible assay that stratified OSCC patients with 93.5% accuracy.
These data demonstrate surprising cross-species genomic conservation that has translational relevance for human oral squamous cell cancer.
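The Kaplan-Meier analysis mentioned above can be illustrated with a minimal sketch of the product-limit estimator. This is not the authors' code: the function name `kaplan_meier` and the follow-up times in the usage note are hypothetical, and a real analysis would normally rely on an established survival library and add confidence intervals and a log-rank test.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without changing survival.
        at_risk -= removed
    return curve


# Hypothetical follow-up (months) for one signature-defined group.
example_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

To stratify outcomes, one curve would be computed for each signature-defined group (for example, signature-high and signature-low patients) and the curves compared.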
Several genetic alterations characteristic of leukemia and lymphoma have been detected in the blood of individuals without apparent hematological malignancies. We analyzed blood-derived sequence data from 2,728 individuals within The Cancer Genome Atlas, and discovered 77 blood-specific mutations in cancer-associated genes, the majority being associated with advanced age. Remarkably, 83% of these mutations were from 19 leukemia/lymphoma-associated genes, and nine were recurrently mutated (DNMT3A, TET2, JAK2, ASXL1, TP53, GNAS, PPM1D, BCORL1 and SF3B1). We identified 14 additional mutations in a very small fraction of blood cells, possibly representing the earliest stages of clonal expansion in hematopoietic stem cells. Comparison of these findings to mutations in hematological malignancies identified several recurrently mutated genes that may be disease initiators. Our analyses show that the blood cells of more than 2% of individuals (5–6% of people older than 70 years) contain mutations that may represent premalignant, initiating events that cause clonal hematopoietic expansion.
The immune system plays key roles in determining the fate of developing cancers by not only functioning as a tumour promoter facilitating cellular transformation, promoting tumour growth and sculpting tumour cell immunogenicity [1–6], but also as an extrinsic tumour suppressor that either destroys developing tumours or restrains their expansion [1,2,7]. Yet clinically apparent cancers still arise in immunocompetent individuals in part as a consequence of cancer-induced immunosuppression. In many individuals, immunosuppression is mediated by Cytotoxic T-Lymphocyte Associated Antigen-4 (CTLA-4) and Programmed Death-1 (PD-1), two immunomodulatory receptors expressed on T cells [8,9]. Monoclonal antibody (mAb)-based therapies targeting CTLA-4 and/or PD-1 (checkpoint blockade) have yielded significant clinical benefits, including durable responses, to patients with different malignancies [10–13]. However, little is known about the identity of the tumour antigens that function as the targets of T cells activated by checkpoint blockade immunotherapy and whether these antigens can be used to generate vaccines that are highly tumour-specific. Herein, we use genomics and bioinformatics approaches to identify tumour-specific mutant proteins as a major class of T cell rejection antigens following αPD-1 and/or αCTLA-4 therapy of mice bearing progressively growing sarcomas and show that therapeutic synthetic long peptide (SLP) vaccines incorporating these mutant epitopes induce tumour rejection comparably to checkpoint blockade immunotherapy. Whereas mutant tumour antigen-specific T cells are present in progressively growing tumours, they are reactivated following treatment with αPD-1 and/or αCTLA-4 and display some overlapping but mostly treatment-specific transcriptional profiles rendering them capable of mediating tumour rejection.
These results reveal that tumour-specific mutant antigens (TSMA) are not only important targets of checkpoint blockade therapy but also can be used to develop personalized cancer-specific vaccines and to probe the mechanistic underpinnings of different checkpoint blockade treatments.
Ewing sarcoma is a primary bone tumor initiated by EWSR1–ETS gene fusions. To identify secondary genetic lesions that contribute to tumor progression, we performed whole-genome sequencing of 112 Ewing sarcoma samples and matched germline DNA. Overall, Ewing sarcoma tumors had relatively few single-nucleotide variants, indels, structural variants and copy-number alterations. Apart from whole chromosome arm copy-number changes, the most common somatic mutations were detected in STAG2 (17%), CDKN2A (12%), TP53 (7%), EZH2, BCOR, and ZMYM3 (2.7% each). Strikingly, STAG2 mutations and CDKN2A deletions were mutually exclusive, as confirmed in Ewing sarcoma cell lines. In an expanded cohort of 299 patients with clinical data, we discovered that STAG2 and TP53 mutations are often concurrent and are associated with poor outcome. Finally, we detected subclonal STAG2 mutations in diagnostic tumors and expansion of STAG2 immuno-negative cells in relapsed tumors as compared with matched diagnostic samples.
Ewing sarcoma; genomics; mutations; whole genome sequencing; prognostic
Despite the success of genome-wide association studies (GWAS) in detecting a large number of loci for complex phenotypes such as rheumatoid arthritis (RA) susceptibility, the lack of information on the causal genes makes it challenging to interpret GWAS results in the context of disease biology. Here, we genetically fine-map the RA risk locus at 19p13 to define causal variants, and explore the pleiotropic effects of these same variants in other complex traits. First, we combined Immunochip dense genotyping (n = 23,092 case/control samples), Exomechip genotyping (n = 18,409 case/control samples) and targeted exon-sequencing (n = 2,236 case/control samples) to demonstrate that three protein-coding variants in TYK2 (tyrosine kinase 2) independently protect against RA: P1104A (rs34536443, OR = 0.66, P = 2.3 × 10⁻²¹), A928V (rs35018800, OR = 0.53, P = 1.2 × 10⁻⁹), and I684S (rs12720356, OR = 0.86, P = 4.6 × 10⁻⁷). Second, we show that the same three TYK2 variants protect against systemic lupus erythematosus (SLE, P_omnibus = 6 × 10⁻¹⁸), and provide suggestive evidence that two of the TYK2 variants (P1104A and A928V) may also protect against inflammatory bowel disease (IBD; P_omnibus = 0.005). Finally, in a phenome-wide association study (PheWAS) assessing >500 phenotypes using electronic medical records (EMR) in >29,000 subjects, we found no convincing evidence for association of P1104A and A928V with complex phenotypes other than autoimmune diseases such as RA, SLE and IBD. Together, our results demonstrate the role of TYK2 in the pathogenesis of RA, SLE and IBD, and provide supporting evidence for TYK2 as a promising drug target for the treatment of autoimmune diseases.
The introduction of next-generation sequencing technologies has dramatically impacted the life sciences, perhaps most profoundly in the area of cancer genomics. Clinical applications of next-generation sequencing and associated methods are emerging from ongoing large-scale discovery projects that have catalogued hundreds of genes as having a role in cancer susceptibility, onset and progression. For example, discovery cancer genomics has confirmed that many of the same genes are altered by mutation, copy number gain or loss, or structural variation across multiple tumor types, resulting in a gain or loss of function that likely contributes to cancer development in these tissues. Beyond these frequently mutated genes, we now know there is a ‘long tail’ of less frequently mutated, but probably important, genes that play roles in cancer onset or progression. Here, I discuss some of the remaining barriers to clinical translation, and look forward to new applications of these technologies in cancer care.
The relationships between clonal architecture and functional heterogeneity in acute myeloid leukemia (AML) samples are not yet clear. We used targeted sequencing to track AML subclones identified by whole genome sequencing using a variety of experimental approaches. We found that virtually all AML subclones trafficked from the marrow to the peripheral blood, but some were enriched in specific cell populations. Subclones showed variable engraftment potential in immunodeficient mice. Xenografts were predominantly comprised of a single genetically-defined subclone, but there was no predictable relationship between the engrafting subclone and the evolutionary hierarchy of the leukemia. These data demonstrate the importance of integrating genetic and functional data in studies of primary cancer samples, both in xenograft models and in patients.
The contribution of genetic variants to sporadic amyotrophic lateral sclerosis (ALS) remains largely unknown. Either recessive or de novo variants could result in an apparently sporadic occurrence of ALS. In an attempt to find such variants, we sequenced the exomes of 44 trios, each consisting of an ALS patient and both unaffected parents. Rare and potentially damaging compound heterozygous variants were found in 27% of ALS patients, homozygous recessive variants in 14% and coding de novo variants in 27%. In 20% of patients more than one of the above variants was present. Genes with recessive variants were enriched in nucleotide binding capacity, ATPase activity, and the dynein heavy chain. Genes with de novo variants were enriched in transcription regulation and cell cycle processes. This trio study indicates that rare private recessive variants could be a mechanism underlying some cases of sporadic ALS, and that de novo mutations are also likely to play a part in the disease.
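The three variant classes named in this abstract (de novo, homozygous recessive, compound heterozygous) can be told apart from trio genotypes with simple inheritance rules. The sketch below is illustrative only and is not the study's pipeline: genotypes are assumed to be coded as alternate-allele counts (0, 1 or 2), the `TrioVariant` type and the gene and variant names are hypothetical, and real analyses also filter on rarity, predicted damage and genotype quality.

```python
from collections import defaultdict
from typing import NamedTuple


class TrioVariant(NamedTuple):
    gene: str
    variant_id: str
    child: int   # alternate-allele count in the patient (0, 1 or 2)
    mother: int
    father: int


def is_de_novo(v):
    """Present in the child but absent from both parents."""
    return v.child >= 1 and v.mother == 0 and v.father == 0


def is_homozygous_recessive(v):
    """Child homozygous; both unaffected parents are heterozygous carriers."""
    return v.child == 2 and v.mother == 1 and v.father == 1


def compound_het_genes(variants):
    """Genes where the child carries one maternally inherited and one
    paternally inherited heterozygous variant (different sites)."""
    maternal = defaultdict(list)
    paternal = defaultdict(list)
    for v in variants:
        if v.child != 1:
            continue
        if v.mother >= 1 and v.father == 0:
            maternal[v.gene].append(v.variant_id)
        elif v.father >= 1 and v.mother == 0:
            paternal[v.gene].append(v.variant_id)
    return {g: (maternal[g], paternal[g])
            for g in sorted(maternal) if g in paternal}


# Hypothetical trio calls, for illustration only.
calls = [
    TrioVariant("GENE_A", "a1", 1, 1, 0),
    TrioVariant("GENE_A", "a2", 1, 0, 1),
    TrioVariant("GENE_B", "b1", 1, 0, 0),
    TrioVariant("GENE_C", "c1", 2, 1, 1),
]
de_novo = [v.variant_id for v in calls if is_de_novo(v)]
recessive = [v.variant_id for v in calls if is_homozygous_recessive(v)]
compound = compound_het_genes(calls)
```

Requiring one variant from each parent is what establishes that the two hits in a gene sit on different chromosome copies. Two variants inherited from the same parent would not qualify.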
Despite remarkable advances in the genomic characterization of adult melanoma, the molecular pathogenesis of pediatric melanoma remains largely unknown. We analyzed 15 conventional melanomas (CMs), 3 melanomas arising in congenital nevi (CNMs), and 5 spitzoid melanomas (SMs), using various platforms, including whole genome or exome sequencing, the molecular inversion probe assay, and/or targeted sequencing. CMs demonstrated a high burden of somatic single-nucleotide variations (SNVs), with each case containing a TERT promoter (TERT-p) mutation, 13/15 containing an activating BRAF V600 mutation, and >80% of the identified SNVs consistent with UV damage. In contrast, the three CNMs contained an activating NRAS Q61 mutation and no TERT-p mutations. SMs were characterized by chromosomal rearrangements resulting in activated kinase signaling in 40%, and an absence of TERT-p mutations, except for the one SM that succumbed to hematogenous metastasis. We conclude that pediatric CM has a very similar UV-induced mutational spectrum to that found in the adult counterpart, emphasizing the need to promote sun protection practices in early life and to improve young patients' access to therapeutic agents being explored in adults. In contrast, the pathogenesis of CNM appears to be distinct. TERT-p mutations may identify the rare subset of spitzoid melanocytic lesions prone to disseminate.
Tubular carcinoma (TC) is a rare, luminal A subtype of breast carcinoma with excellent prognosis, for which adjuvant chemotherapy is usually contraindicated.
To examine the levels of estrogen receptor (ER) and progesterone receptor expression in cases of TC and well-differentiated invasive ductal carcinoma as compared to normal breast glands and to determine if any significant differences could be detected via molecular testing.
We examined ER and progesterone receptor via immunohistochemistry in tubular (N = 27), mixed ductal/tubular (N = 16), and well-differentiated ductal (N = 27) carcinomas with comparison to surrounding normal breast tissue. We additionally performed molecular subtyping of 10 TCs and 10 ductal carcinomas via the PAM50 assay.
Although ER expression was high for all groups, TC had statistically significantly lower ER staining percentage (ER%) (P = .003) and difference in ER expression between tumor and accompanying normal tissue (P = .02) than well-differentiated ductal carcinomas, with mixed ductal/tubular carcinomas falling between these 2 groups. Mean ER% was 79%, 87%, and 94%, and mean tumor-normal ER% differences were 13.6%, 25.9%, and 32.6% in tubular, mixed, and ductal carcinomas, respectively. Most tumors that had molecular subtyping were luminal A (9 of 10 tubular and 8 of 10 ductal), and no significant differences in specific gene expression between the 2 groups were identified.
Tubular carcinoma exhibited decreased intensity in ER expression, closer to that of normal breast parenchyma, likely as a consequence of a high degree of differentiation. Lower ER% expression by TC may represent a potential pitfall when performing commercially available breast carcinoma prognostic assays that rely heavily on ER-related gene expression.
A first analysis of the genome sequence of the common marmoset (Callithrix jacchus), assembled using traditional Sanger methods and Ensembl annotation, has permitted genomic comparison with apes and old world monkeys and the identification of specific molecular features that may contribute to the unique biology of this diminutive primate. The common marmoset has a rapid reproductive capacity partly due to the prevalence of dizygotic twins. Remarkably, these twins share placental circulation and exchange hematopoietic stem cells in utero, resulting in adults that are hematopoietic chimeras.
We observed positive selection or non-synonymous substitutions for genes encoding growth hormone / insulin-like growth factor (growth pathways), respiratory complex I (metabolic pathways), immunobiology, and proteases (reproductive and immunity pathways). In addition, both protein-coding and microRNA genes related to reproduction exhibit rapid sequence evolution. This New World monkey genome sequence enables significantly increased power for comparative analyses among available primate genomes and facilitates biomedical research application.
Rhabdomyosarcoma is a soft-tissue sarcoma with molecular and cellular features of developing skeletal muscle. Rhabdomyosarcoma has two major histological subtypes, embryonal and alveolar, each with distinct clinical, molecular, and genetic features. Genomic analyses show that embryonal tumors have more structural and copy number variations than alveolar tumors. Mutations in the RAS/NF1 pathway are significantly associated with intermediate- and high-risk embryonal rhabdomyosarcomas (ERMS). In contrast, alveolar rhabdomyosarcomas (ARMS) have fewer genetic lesions overall and no known recurrently mutated cancer consensus genes. To identify therapeutics for ERMS, we developed and characterized orthotopic xenografts of tumors that were sequenced in our study. High throughput screening of primary cultures derived from those xenografts identified oxidative stress as a pathway of therapeutic relevance for ERMS.
Pediatric high-grade glioma (HGG) is a devastating disease with a two-year survival of less than 20% [1]. We analyzed 127 pediatric HGGs, including diffuse intrinsic pontine gliomas (DIPGs) and non-brainstem HGGs (NBS-HGGs) by whole genome, whole exome, and/or transcriptome sequencing. We identified recurrent somatic mutations in ACVR1 exclusively in DIPG (32%), in addition to the previously reported frequent somatic mutations in histone H3, TP53 and ATRX in both DIPG and NBS-HGGs [2–5]. Structural variants generating fusion genes were found in 47% of DIPGs and NBS-HGGs, with recurrent fusions involving the neurotrophin receptor genes NTRK1, 2, or 3 in 40% of NBS-HGGs in infants. Mutations targeting receptor tyrosine kinase/RAS/PI3K signaling, histone modification or chromatin remodeling, and cell cycle regulation were found in 68%, 73% and 59%, respectively, of pediatric HGGs, including DIPGs and NBS-HGGs. This comprehensive analysis provides insights into the unique and shared pathways driving pediatric HGG within and outside the brainstem.
The nuclear factor-κB (NF-κB) family of transcriptional regulators are central mediators of the cellular inflammatory response. Although constitutive NF-κB signaling is present in most human tumours, mutations in pathway members are rare, complicating efforts to understand and block aberrant NF-κB activity in cancer. Here, we show that more than two thirds of supratentorial ependymomas contain oncogenic fusions between RELA, the principal effector of canonical NF-κB signalling, and an uncharacterized gene, C11orf95. In each case, C11orf95-RELA fusions resulted from chromothripsis involving chromosome 11q13.1. C11orf95-RELA fusion proteins translocated spontaneously to the nucleus to activate NF-κB target genes, and rapidly transformed neural stem cells—the cell of origin of ependymoma—to form these tumours in mice. Our data identify the first highly recurrent genetic alteration of RELA in human cancer, and the C11orf95-RELA fusion protein as a potential therapeutic target in supratentorial ependymoma.
The sensitivity of massively-parallel sequencing has confirmed that most cancers are oligoclonal, with subpopulations of neoplastic cells harboring distinct mutations. A fine resolution view of this clonal architecture provides insight into tumor heterogeneity, evolution, and treatment response, all of which may have clinical implications. Single tumor analysis already contributes to understanding these phenomena. However, cryptic subclones are frequently revealed by additional patient samples (e.g., collected at relapse or following treatment), indicating that accurately characterizing a tumor requires analyzing multiple samples from the same patient. To address this need, we present SciClone, a computational method that identifies the number and genetic composition of subclones by analyzing the variant allele frequencies of somatic mutations. We use it to detect subclones in acute myeloid leukemia and breast cancer samples that, though present at disease onset, are not evident from a single primary tumor sample. By doing so, we can track tumor evolution and identify the spatial origins of cells resisting therapy.
Sequencing the genomic DNA of cancers has revealed that tumors are not homogeneous. As a tumor grows, new mutations accumulate in individual cells, and as these cells replicate, the mutations are passed on to their offspring, which comprise only a portion of the tumor when it is sampled. We present a method for identifying the fraction of cells containing specific mutations, clustering them into subclonal populations, and tracking the changes in these subclones. This allows us to follow the clonal evolution of cancers as they respond to chemotherapy or develop therapy resistance, processes which may radically alter the subclonal composition of a tumor. It also gives us insight into the spatial organization of tumors, and we show that multiple biopsies from a single breast cancer may harbor different subclones that respond differently to treatment. Finally, we show that sequencing multiple samples from a patient's tumor is often critical, as it reveals cryptic subclones that cannot be discerned from only one sample. This is the first tool that can efficiently leverage multiple samples to identify these as distinct subpopulations of cells, thus contributing to understanding the biology of the tumor and influencing clinical decisions about therapy.
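The idea of grouping somatic mutations by variant allele frequency (VAF) can be illustrated with a small sketch. It does not reproduce SciClone's statistical model. It substitutes a plain one-dimensional k-means with a fixed number of clusters, chosen only because it is easy to follow. The read counts are hypothetical, and the conversion from VAF to cell fraction assumes heterozygous mutations in copy-neutral regions of a pure tumor sample.

```python
from statistics import mean


def vaf(ref_reads, var_reads):
    """Variant allele frequency from reference and variant read counts."""
    total = ref_reads + var_reads
    if total == 0:
        raise ValueError("site has no coverage")
    return var_reads / total


def cell_fraction(v):
    """Fraction of cells carrying the mutation.
    Assumption: heterozygous, copy-neutral, 100% tumor purity."""
    return min(1.0, 2.0 * v)


def kmeans_1d(values, k, iterations=50):
    """Cluster scalar values into k groups; returns (centres, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread initial centres across the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        updated = [mean(g) if g else centres[i] for i, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    labels = [min(range(k), key=lambda i: abs(v - centres[i])) for v in values]
    return centres, labels


# Hypothetical (reference, variant) read counts for six somatic mutations.
counts = [(55, 45), (52, 48), (56, 44), (80, 20), (78, 22), (82, 18)]
vafs = [vaf(r, a) for r, a in counts]
centres, labels = kmeans_1d(vafs, k=2)
```

With several samples from one patient, each mutation becomes a point with one VAF per sample, and clustering in that higher-dimensional space is what can separate subclones that overlap in any single sample.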
Here we sequence 633 genes, encoding the majority of known epigenetic regulatory proteins, in over 1000 pediatric tumors to define the landscape of somatic mutations in epigenetic regulators in pediatric cancer. Our results demonstrate a marked variation in the frequency of gene mutations across 21 different pediatric cancer subtypes, with the highest frequency of mutations detected in high-grade gliomas, T-lineage acute lymphoblastic leukemia, and medulloblastoma, and a paucity of mutations in low-grade glioma and retinoblastoma. The most frequently mutated genes are H3F3A, PHF6, ATRX, KDM6A, SMARCA4, ASXL2, CREBBP, EZH2, MLL2, USP7, ASXL1, NSD2, SETD2, SMC1A, and ZMYM3. Importantly, we identify novel loss-of-function mutations in the ubiquitin-specific-processing protease 7 (USP7) in pediatric leukemia, which result in a decrease in deubiquitination activity. Collectively, our results help to define the landscape of mutations in epigenetic regulatory genes in pediatric cancer and yield a valuable new database for investigating the role of epigenetic dysregulations in cancer.
We report the first large-scale exome-wide analysis of the combined germline-somatic landscape in ovarian cancer. Here we analyze germline and somatic alterations in 429 ovarian carcinoma cases and 557 controls. We identify 3,635 high confidence, rare truncation and 22,953 missense variants with predicted functional impact. We find germline truncation variants and large deletions across Fanconi pathway genes in 20% of cases. Enrichment of rare truncations is shown in BRCA1, BRCA2, and PALB2. Additionally, we observe germline truncation variants in genes not previously associated with ovarian cancer susceptibility (NF1, MAP3K4, CDKN2B, and MLL3). Evidence for loss of heterozygosity was found in 100% and 76% of cases with germline BRCA1 and BRCA2 truncations respectively. Germline-somatic interaction analysis combined with extensive bioinformatics annotation identifies 237 candidate functional germline truncation and missense variants, including 2 pathogenic BRCA1 and 1 TP53 deleterious variants. Finally, integrated analyses of germline and somatic variants identify significantly altered pathways, including the Fanconi, MAPK, and MLL pathways.
Osteosarcoma is a neoplasm of mesenchymal origin with features of osteogenic differentiation. Patients with recurrent or metastatic disease have a very poor prognosis. To define the landscape of somatic mutations in pediatric osteosarcoma, we performed whole-genome sequencing of DNA from 20 osteosarcoma tumor samples and matched normal tissue (obtained from 19 patients) in the discovery cohort as well as 14 samples from 13 patients in the validation cohort. Our results demonstrate that pediatric osteosarcoma is characterized by multiple somatic chromosomal lesions, including structural variations (SVs) and copy number alterations (CNAs). Moreover, single nucleotide variations (SNVs) exhibit a pattern of localized hypermutation called “kataegis” in 50% of the tumors. Despite these regions of kataegis across the osteosarcoma genomes, we detected relatively few recurrent SNVs, and only when SVs were included did we identify the major pathways that are mutated in osteosarcoma. We identified p53 pathway lesions in the tumors of all 19 patients in the discovery cohort, 9 of which were translocations in the first intron of the TP53 gene, leading to gene inactivation. This mechanism of p53 gene inactivation is unique to osteosarcoma among pediatric cancers. In an additional cohort of 32 patients, TP53 gene alterations were identified in 29 of those tumors. Beyond TP53, the RB1, ATRX and DLG2 genes showed recurrent somatic alterations (SNVs and/or SVs) in 29–53% of the tumors. These data highlight the power of whole-genome sequencing in identifying recurrent somatic alterations in cancer genomes that may be missed using other methods.
Next-generation sequencing has been used to infer the clonality of heterogeneous tumor samples. These analyses yield specific predictions—the population frequency of individual clones, their genetic composition, and their evolutionary relationships—which we set out to test by sequencing individual cells from three subjects diagnosed with secondary acute myeloid leukemia, each of whom had been previously characterized by whole genome sequencing of unfractionated tumor samples. Single-cell mutation profiling strongly supported the clonal architecture implied by the analysis of bulk material. In addition, it resolved the clonal assignment of single nucleotide variants that had been initially ambiguous and identified areas of previously unappreciated complexity. Accordingly, we find that many of the key assumptions underlying the analysis of tumor clonality by deep sequencing of unfractionated material are valid. Furthermore, we illustrate a single-cell sequencing strategy for interrogating the clonal relationships among known variants that is cost-effective, scalable, and adaptable to the analysis of both hematopoietic and solid tumors, or any heterogeneous population of cells.
Human cancers are genetically diverse populations of cells that evolve over the course of their natural history or in response to the selective pressure of therapy. In theory, it is possible to infer how this variation is structured into related populations of cells based on the frequency of individual mutations in bulk samples, but the accuracy of these models has not been evaluated across a large number of variants in individual cells. Here, we report a strategy for analyzing hundreds of variants within a single cell, and we apply this method to assess models of tumor clonality derived from bulk samples in three cases of leukemia. The data largely support the predicted population structure, though they suggest specific refinements. This type of approach not only illustrates the biological complexity of human cancer, but it also has the potential to inform patient management. That is, precise knowledge of which variants are present in which populations of cells may allow physicians to more effectively target combinations of mutations and predict how patients will respond to therapy.
Knowledge of individual ancestry is important for genetic association studies where population structure leads to false positive signals. Estimating individual ancestry with targeted sequence data, which constitutes the bulk of current sequence datasets, is challenging. Here, we propose a new method for accurate estimation of genetic ancestry. Our method skips genotype calling and directly analyzes sequence reads. We validate the method using simulated and empirical data and show that the method can accurately infer worldwide continental ancestry with whole genome shotgun coverage as low as 0.001X. For estimates of fine-scale ancestry within Europe, the method performs well with coverage of 0.1X. At an even finer scale, the method improves discrimination between exome-sequenced participants originating from different provinces within Finland. Finally, we show that our method can be used to improve case-control matching in genetic association studies and reduce the risk of spurious findings due to population structure.
The Drug-Gene Interaction database (DGIdb) mines existing resources that generate hypotheses about how mutated genes might be targeted therapeutically or prioritized for drug development. It provides an interface for searching lists of genes against a compendium of drug-gene interactions and potentially druggable genes. DGIdb can be accessed at dgidb.org.
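Searching a gene list against a compendium of drug-gene interactions amounts to an indexed lookup, sketched below. This is not the DGIdb interface or its data model: the function name `search_genes` is hypothetical, and the three-row table is a hand-written stand-in containing a few widely known inhibitor-target pairs, used only to make the example runnable.

```python
from collections import defaultdict

# Toy stand-in for an interaction compendium: (gene, drug, interaction type).
# Hand-written for illustration; not exported from DGIdb.
INTERACTIONS = [
    ("BRAF", "vemurafenib", "inhibitor"),
    ("EGFR", "erlotinib", "inhibitor"),
    ("JAK2", "ruxolitinib", "inhibitor"),
]


def search_genes(genes, interactions=INTERACTIONS):
    """Return (hits, unmatched) for a list of gene symbols.

    hits      -- dict mapping each matched gene to its (drug, type) pairs
    unmatched -- genes with no interaction in the table, in input order
    """
    index = defaultdict(list)
    for gene, drug, kind in interactions:
        index[gene.upper()].append((drug, kind))

    hits = {}
    unmatched = []
    for symbol in genes:
        key = symbol.strip().upper()  # tolerate case and stray whitespace
        if key in index:
            hits[key] = list(index[key])
        else:
            unmatched.append(key)
    return hits, unmatched


# Example query: a short list of mutated genes from a tumor.
hits, unmatched = search_genes(["braf", " JAK2", "TP53"])
```

Returning the unmatched genes separately keeps the "no known interaction" cases visible. This matters when a list is being triaged for possible drug development rather than for existing therapies.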