A first analysis of the genome sequence of the common marmoset (Callithrix jacchus), assembled using traditional Sanger methods and Ensembl annotation, has permitted genomic comparison with apes and Old World monkeys and the identification of specific molecular features that may contribute to the unique biology of this diminutive primate. The common marmoset has a rapid reproductive capacity partly due to the prevalence of dizygotic twins. Remarkably, these twins share placental circulation and exchange hematopoietic stem cells in utero, resulting in adults that are hematopoietic chimeras.
We observed positive selection or non-synonymous substitutions for genes encoding growth hormone / insulin-like growth factor (growth pathways), respiratory complex I (metabolic pathways), immunobiology, and proteases (reproductive and immunity pathways). In addition, both protein-coding and microRNA genes related to reproduction exhibit rapid sequence evolution. This New World monkey genome sequence enables significantly increased power for comparative analyses among available primate genomes and facilitates biomedical research application.
enterovirus D68; enterovirus; genomic; outbreak; respiratory; asthma; pediatric; viruses; St. Louis
The identification of small sequence variants remains a challenging but critical step in the analysis of next-generation sequencing data. Our variant calling tool, VarScan 2, employs heuristic and statistical thresholds based on user-defined criteria to call variants using SAMtools mpileup data as input. Here, we provide guidelines for generating that input, and describe protocols for using VarScan 2 to (1) identify germline variants in individual samples; (2) call somatic mutations, copy number alterations, and LOH events in tumor-normal pairs; and (3) identify germline variants, de novo mutations, and Mendelian inheritance errors in family trios. Further, we describe a strategy for variant filtering that removes likely false positives associated with common sequencing- and alignment-related artifacts.
variant calling; mutation detection; trio calling; snvs; indels; varscan 2; next-generation sequencing
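The idea of calling variants with user-defined heuristic thresholds can be illustrated with a minimal Python sketch. It is not VarScan 2 code and does not parse mpileup output: the `SiteCounts` record, the `passes_thresholds` function, and the three threshold values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Per-site read counts (hypothetical record, not a VarScan 2 structure)."""
    chrom: str
    pos: int
    ref_reads: int  # reads supporting the reference base
    var_reads: int  # reads supporting the candidate variant base


# Illustrative thresholds only; they are assumptions, not tool defaults.
MIN_COVERAGE = 8
MIN_VAR_READS = 2
MIN_VAR_FREQ = 0.20


def passes_thresholds(site, min_coverage=MIN_COVERAGE,
                      min_var_reads=MIN_VAR_READS, min_var_freq=MIN_VAR_FREQ):
    """Return True if the site meets all three heuristic thresholds."""
    depth = site.ref_reads + site.var_reads
    if depth < min_coverage:
        return False  # too little coverage to make a call
    if site.var_reads < min_var_reads:
        return False  # too few supporting reads
    return site.var_reads / depth >= min_var_freq


sites = [
    SiteCounts("chr1", 1000, ref_reads=30, var_reads=12),  # meets all thresholds
    SiteCounts("chr1", 2000, ref_reads=5, var_reads=2),    # depth below minimum
    SiteCounts("chr1", 3000, ref_reads=99, var_reads=1),   # one supporting read
]
calls = [s.pos for s in sites if passes_thresholds(s)]
print(calls)
```

Exposing the thresholds as function parameters mirrors the abstract's point that the criteria are user-defined; a real workflow would also apply the false-positive filters described above.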
Retinoblastoma is an aggressive childhood cancer of the developing retina that is initiated by the biallelic loss of the RB1 gene. To identify the mutations that cooperate with RB1 loss, we performed whole-genome sequencing of retinoblastomas. The overall mutational rate was very low; RB1 was the only known cancer gene mutated. We then evaluated RB1’s role in genome stability and considered nongenetic mechanisms of cancer pathway deregulation. Here we show that the retinoblastoma genome is stable, but multiple cancer pathways can be epigenetically deregulated. For example, the proto-oncogene SYK is upregulated in retinoblastoma and is required for tumor cell survival. Targeting SYK with a small-molecule inhibitor induced retinoblastoma tumor cell death in vitro and in vivo. Thus, RB1 inactivation may allow preneoplastic cells to acquire multiple hallmarks of cancer through epigenetic mechanisms, resulting directly or indirectly from RB1 loss. These data provide novel targets for chemotherapeutic interventions of retinoblastoma.
Rhabdomyosarcoma is a soft-tissue sarcoma with molecular and cellular features of developing skeletal muscle. Rhabdomyosarcoma has two major histological subtypes, embryonal and alveolar, each with distinct clinical, molecular, and genetic features. Genomic analyses show that embryonal tumors have more structural and copy number variations than alveolar tumors. Mutations in the RAS/NF1 pathway are significantly associated with intermediate- and high-risk embryonal rhabdomyosarcomas (ERMS). In contrast, alveolar rhabdomyosarcomas (ARMS) have fewer genetic lesions overall and no known recurrently mutated cancer consensus genes. To identify therapeutics for ERMS, we developed and characterized orthotopic xenografts of tumors that were sequenced in our study. High throughput screening of primary cultures derived from those xenografts identified oxidative stress as a pathway of therapeutic relevance for ERMS.
The complex chromosomal aberrations found in therapy related acute myeloid leukemia (t-AML) suggest that the DNA double strand break (DSB) response may be altered. In this study we examined the DNA DSB response of primary bone marrow cells from t-AML patients and performed next-generation sequencing of 37 canonical homologous recombination (HR) and non-homologous end-joining (NHEJ) DNA repair genes, and a subset of DNA damage response genes using tumor and paired normal DNA obtained from t-AML patients. Our results suggest that the majority of t-AML patients (11 of 15) have tumor cell-intrinsic, functional dysregulation of their DSB response. Distinct patterns of abnormal DNA damage response in myeloblasts correlated with acquired genetic alterations in TP53 and the presence of inferred chromothripsis. Furthermore, the presence of trisomy 8 in tumor cells was associated with persistently elevated levels of DSBs. Although tumor-acquired point mutations or small indels in canonical HR and NHEJ genes do not appear to be a dominant means by which t-AML leukemogenesis occurs, our functional studies suggest that an abnormal response to DNA damage is a common finding in t-AML.
therapy-related AML; DNA damage; DNA repair; Trisomy 8
To reveal the clonal architecture of melanoma and associated driver mutations, whole genome sequencing (WGS) and targeted extension sequencing were used to characterize 124 melanoma cases. Significantly mutated gene analysis using 13 WGS cases and 15 additional paired extension cases identified known melanoma genes such as BRAF, NRAS, and CDKN2A, as well as a novel gene EPHA3, previously implicated in other cancer types. Extension studies using tumors from another 96 patients discovered a large number of truncation mutations in tumor suppressors (TP53 and RB1), protein phosphatases (e.g., PTEN, PTPRB, PTPRD, and PTPRT), as well as chromatin remodeling genes (e.g., ASXL3, MLL2, and ARID2). Deep sequencing of mutations revealed subclones in the majority of metastatic tumors from 13 WGS cases. Validated mutations from 12 out of 13 WGS patients exhibited a predominant UV signature characterized by a high frequency of C->T transitions occurring at the 3′ base of dipyrimidine sequences, while one patient (MEL9) with a hypermutator phenotype lacked this signature. Strikingly, a subclonal mutation signature analysis revealed that the founding clone in MEL9 exhibited a UV signature but the secondary clone did not, suggesting different mutational mechanisms for two clonal populations from the same tumor. Further analysis of four metastases from different geographic locations in 2 melanoma cases revealed phylogenetic relationships and highlighted the genetic alterations responsible for differential drug resistance among metastatic tumors. Our study suggests that clonal evaluation is crucial for understanding tumor etiology and drug resistance in melanoma.
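The UV signature described above (C->T transitions at the 3′ base of a dipyrimidine) can be expressed as a simple classification rule. The Python sketch below is illustrative only and is not the study's analysis pipeline; the function names and the tuple layout (reference base, alternate base, 5′ neighbor, 3′ neighbor, all on the reference strand) are assumptions. A G->A change followed by a purine is counted as well, because it is the same event read from the opposite strand.

```python
PYRIMIDINES = {"C", "T"}
PURINES = {"A", "G"}


def is_uv_like(ref, alt, five_prime, three_prime):
    """True for a C>T change at the 3' base of a dipyrimidine, on either strand.

    All bases are given on the reference strand; five_prime and three_prime
    are the bases immediately flanking the mutated position.
    """
    ref, alt = ref.upper(), alt.upper()
    five_prime, three_prime = five_prime.upper(), three_prime.upper()
    if ref == "C" and alt == "T":
        # Mutated C must follow a pyrimidine (TC or CC context).
        return five_prime in PYRIMIDINES
    if ref == "G" and alt == "A":
        # Reverse-strand view: the G must be followed by a purine.
        return three_prime in PURINES
    return False


def uv_fraction(mutations):
    """Fraction of mutations that match the UV-like pattern."""
    if not mutations:
        return 0.0
    return sum(is_uv_like(*m) for m in mutations) / len(mutations)


# (ref, alt, 5' neighbor, 3' neighbor): made-up examples, not study data.
mutations = [
    ("C", "T", "T", "A"),  # TC>TT, UV-like
    ("C", "T", "G", "A"),  # C>T but preceded by a purine
    ("G", "A", "T", "A"),  # GA on the reference strand is TC on the other strand
    ("A", "G", "C", "C"),  # not a C>T transition
]
print(uv_fraction(mutations))
```

Computing this fraction separately for the mutations assigned to each clone is one way to compare signatures between a founding clone and a secondary clone, as the abstract describes for MEL9.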
Pediatric high-grade glioma (HGG) is a devastating disease with a two-year survival of less than 20% [1]. We analyzed 127 pediatric HGGs, including diffuse intrinsic pontine gliomas (DIPGs) and non-brainstem HGGs (NBS-HGGs) by whole genome, whole exome, and/or transcriptome sequencing. We identified recurrent somatic mutations in ACVR1 exclusively in DIPG (32%), in addition to the previously reported frequent somatic mutations in histone H3, TP53 and ATRX in both DIPG and NBS-HGGs [2-5]. Structural variants generating fusion genes were found in 47% of DIPGs and NBS-HGGs, with recurrent fusions involving the neurotrophin receptor genes NTRK1, 2, or 3 in 40% of NBS-HGGs in infants. Mutations targeting receptor tyrosine kinase/RAS/PI3K signaling, histone modification or chromatin remodeling, and cell cycle regulation were found in 68%, 73% and 59%, respectively, of pediatric HGGs, including DIPGs and NBS-HGGs. This comprehensive analysis provides insights into the unique and shared pathways driving pediatric HGG within and outside the brainstem.
The human X and Y chromosomes evolved from an ordinary pair of autosomes, but
millions of years ago genetic decay ravaged the Y chromosome, and only three percent of
its ancestral genes survived. We reconstructed the evolution of the Y chromosome across
eight mammals to identify biases in gene content and the selective pressures that
preserved the surviving ancestral genes. Our findings indicate that survival was
non-random, and in two cases, convergent across placental and marsupial mammals. We
conclude that the Y chromosome's gene content became specialized through selection
to maintain the ancestral dosage of homologous X-Y gene pairs that function as broadly
expressed regulators of transcription, translation and protein stability. We propose that
beyond its roles in testis determination and spermatogenesis, the Y chromosome is
essential for male viability, and plays unappreciated roles in Turner syndrome and in
phenotypic differences between the sexes in health and disease.
Adult house flies, Musca domestica L., are mechanical vectors of more than 100 devastating diseases that have severe consequences for human and animal health. House fly larvae play a vital role as decomposers of animal wastes, and thus live in intimate association with many animal pathogens.
We have sequenced and analyzed the genome of the house fly using DNA from female flies. The sequenced genome is 691 Mb. Compared with Drosophila melanogaster, the genome contains a rich resource of shared and novel protein coding genes, a significantly higher amount of repetitive elements, and substantial increases in copy number and diversity of both the recognition and effector components of the immune system, consistent with life in a pathogen-rich environment. There are 146 P450 genes, plus 11 pseudogenes, in M. domestica, representing a significant increase relative to D. melanogaster and suggesting the presence of enhanced detoxification in house flies. Relative to D. melanogaster, M. domestica has also evolved an expanded repertoire of chemoreceptors and odorant binding proteins, many associated with gustation.
This represents the first genome sequence of an insect that lives in intimate association with abundant animal pathogens. The house fly genome provides a rich resource for enabling work on innovative methods of insect control, for understanding the mechanisms of insecticide resistance, genetic adaptation to high pathogen loads, and for exploring the basic biology of this important pest. The genome of this species will also serve as a close out-group to Drosophila in comparative genomic studies.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0466-3) contains supplementary material, which is available to authorized users.
The identification of patients with inherited cancer susceptibility syndromes facilitates early diagnosis, prevention, and treatment. However, in many cases of suspected cancer susceptibility, the family history is unclear and genetic testing of common cancer susceptibility genes is unrevealing.
To apply whole-genome sequencing to a patient with suspected cancer susceptibility (lacking a clear family history of cancer and with no BRCA1 or BRCA2 mutations) to identify rare or novel germline variants in cancer susceptibility genes.
Design, Setting, and Participant
Skin (normal) and bone marrow (leukemia) DNA were obtained from a patient with early-onset breast and ovarian cancer and therapy-related acute myeloid leukemia (t-AML), and analyzed with: 1) whole genome sequencing using paired end reads; 2) SNP genotyping; 3) RNA expression profiling; and 4) spectral karyotyping.
Main Outcome Measures
Structural variants, copy number alterations, single nucleotide variants and small insertions and deletions (indels) were detected and validated using the above platforms.
Whole genome sequencing revealed a novel, heterozygous 3 Kb deletion removing exons 7-9 of TP53 in the patient’s normal skin DNA, which was homozygous in the leukemia DNA as a result of uniparental disomy. In addition, a total of 28 validated somatic single nucleotide variations or indels in coding genes, 8 somatic structural variants, and 12 somatic copy number alterations were detected in the patient’s leukemia genome.
Whole genome sequencing can identify novel, cryptic variants in cancer susceptibility genes in addition to providing unbiased information on the spectrum of mutations in a cancer genome.
Genomics is a relatively new scientific discipline, having DNA sequencing as its core technology. As technology has improved the cost and scale of genome characterization over sequencing’s 40-year history, the scope of inquiry has commensurately broadened. Massively parallel sequencing has proven revolutionary, shifting the paradigm of genomics to address biological questions at a genome-wide scale. Sequencing now empowers clinical diagnostics and other aspects of medical care, including disease risk, therapeutic identification, and prenatal testing. This Review explores the current state of genomics in the massively parallel sequencing era.
Massively parallel sequencing technologies continue to alter the study of human genetics. As the cost of sequencing declines, next-generation sequencing (NGS) instruments and datasets will become increasingly accessible to the wider research community. Investigators are understandably eager to harness the power of these new technologies. Sequencing human genomes on these platforms, however, presents numerous production and bioinformatics challenges. Production issues like sample contamination, library chimaeras and variable run quality have become increasingly problematic in the transition from technology development lab to production floor. Analysis of NGS data, too, remains challenging, particularly given the short-read lengths (35–250 bp) and sheer volume of data. The development of streamlined, highly automated pipelines for data analysis is critical for transition from technology adoption to accelerated research and publication. This review aims to describe the state of current NGS technologies, as well as the strategies that enable NGS users to characterize the full spectrum of DNA sequence variation in humans.
massively parallel sequencing; next generation sequencing; human genome; variant detection; short read alignment; whole genome sequencing
Whole genome sequencing (WGS) is becoming increasingly available for research purposes, but it has not yet been routinely used for clinical diagnosis.
To determine whether whole genome sequencing can identify cryptic, actionable mutations in a clinically relevant time frame.
Design, Setting, and Patient
We were referred a difficult diagnostic case of acute promyelocytic leukemia with no pathogenic X-RARA fusion identified by routine metaphase cytogenetics or interphase FISH. The patient was enrolled in an IRB approved protocol, with consent specifically tailored to the implications of whole genome sequencing. The protocol employs a ‘movable firewall,’ which maintains patient anonymity within the entire research team, but allows the research team to communicate medically relevant information to the treating physician.
Main Outcome Measure
Clinical relevance of whole genome sequencing and time to communicate validated results to the treating physician.
Massively parallel paired-end sequencing allowed us to identify a cytogenetically cryptic event: 77 kilobases from chromosome 15 was inserted en bloc into the second intron of the RARA gene on chromosome 17, resulting in a classic bcr3 PML-RARA fusion gene. RT-PCR subsequently validated the expression of the fusion transcript. Novel FISH probes identified two additional cases of t(15;17)-negative acute promyelocytic leukemia that had cytogenetically invisible insertions. Whole genome sequencing and validation were completed in seven weeks, and changed the treatment plan for the patient.
Whole genome sequencing can identify cytogenetically invisible oncogenes in a clinically relevant timeframe.
The nuclear factor-κB (NF-κB) family of transcriptional regulators are central mediators of the cellular inflammatory response. Although constitutive NF-κB signaling is present in most human tumours, mutations in pathway members are rare, complicating efforts to understand and block aberrant NF-κB activity in cancer. Here, we show that more than two thirds of supratentorial ependymomas contain oncogenic fusions between RELA, the principal effector of canonical NF-κB signalling, and an uncharacterized gene, C11orf95. In each case, C11orf95-RELA fusions resulted from chromothripsis involving chromosome 11q13.1. C11orf95-RELA fusion proteins translocated spontaneously to the nucleus to activate NF-κB target genes, and rapidly transformed neural stem cells—the cell of origin of ependymoma—to form these tumours in mice. Our data identify the first highly recurrent genetic alteration of RELA in human cancer, and the C11orf95-RELA fusion protein as a potential therapeutic target in supratentorial ependymoma.
Acute promyelocytic leukemia (APL) is a subtype of acute myeloid leukemia (AML). It is characterized by the t(15;17)(q22;q11.2) chromosomal translocation that creates the promyelocytic leukemia–retinoic acid receptor α (PML-RARA) fusion oncogene. Although this fusion oncogene is known to initiate APL in mice, other cooperating mutations, as yet ill defined, are important for disease pathogenesis. To identify these, we used a mouse model of APL, whereby PML-RARA expressed in myeloid cells leads to a myeloproliferative disease that ultimately evolves into APL. Sequencing of a mouse APL genome revealed 3 somatic, nonsynonymous mutations relevant to APL pathogenesis, of which 1 (Jak1 V657F) was found to be recurrent in other affected mice. This mutation was identical to the JAK1 V658F mutation previously found in human APL and acute lymphoblastic leukemia samples. Further analysis showed that JAK1 V658F cooperated in vivo with PML-RARA, causing a rapidly fatal leukemia in mice. We also discovered a somatic 150-kb deletion involving the lysine (K)-specific demethylase 6A (Kdm6a, also known as Utx) gene, in the mouse APL genome. Similar deletions were observed in 3 out of 14 additional mouse APL samples and 1 out of 150 human AML samples. In conclusion, whole genome sequencing of mouse cancer genomes can provide an unbiased and comprehensive approach for discovering functionally relevant mutations that are also present in human leukemias.
The sensitivity of massively-parallel sequencing has confirmed that most cancers are oligoclonal, with subpopulations of neoplastic cells harboring distinct mutations. A fine resolution view of this clonal architecture provides insight into tumor heterogeneity, evolution, and treatment response, all of which may have clinical implications. Single tumor analysis already contributes to understanding these phenomena. However, cryptic subclones are frequently revealed by additional patient samples (e.g., collected at relapse or following treatment), indicating that accurately characterizing a tumor requires analyzing multiple samples from the same patient. To address this need, we present SciClone, a computational method that identifies the number and genetic composition of subclones by analyzing the variant allele frequencies of somatic mutations. We use it to detect subclones in acute myeloid leukemia and breast cancer samples that, though present at disease onset, are not evident from a single primary tumor sample. By doing so, we can track tumor evolution and identify the spatial origins of cells resisting therapy.
Sequencing the genomic DNA of cancers has revealed that tumors are not homogeneous. As a tumor grows, new mutations accumulate in individual cells, and as these cells replicate, the mutations are passed on to their offspring, which comprise only a portion of the tumor when it is sampled. We present a method for identifying the fraction of cells containing specific mutations, clustering them into subclonal populations, and tracking the changes in these subclones. This allows us to follow the clonal evolution of cancers as they respond to chemotherapy or develop therapy resistance, processes which may radically alter the subclonal composition of a tumor. It also gives us insight into the spatial organization of tumors, and we show that multiple biopsies from a single breast cancer may harbor different subclones that respond differently to treatment. Finally, we show that sequencing multiple samples from a patient's tumor is often critical, as it reveals cryptic subclones that cannot be discerned from only one sample. This is the first tool that can efficiently leverage multiple samples to identify these as distinct subpopulations of cells, thus contributing to understanding the biology of the tumor and influencing clinical decisions about therapy.
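The core idea of grouping somatic mutations by variant allele frequency (VAF) can be sketched in a few lines of Python. This is not SciClone's algorithm: plain one-dimensional k-means with a fixed number of clusters is swapped in purely for illustration, and the VAF values are invented. The conversion from VAF to cell fraction assumes heterozygous mutations in copy-number-neutral diploid regions and a pure tumor sample, so the fraction of cells carrying a mutation is roughly twice its VAF.

```python
def kmeans_1d(values, k, iterations=50):
    """Cluster 1-D values with plain k-means; returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels


def cell_fraction(vaf):
    """Approximate fraction of cells carrying a heterozygous mutation.

    Assumes a copy-neutral diploid locus and 100% tumor purity.
    """
    return min(1.0, 2.0 * vaf)


# Invented VAFs for ten somatic mutations from one sample.
vafs = [0.48, 0.51, 0.47, 0.50, 0.22, 0.25, 0.24, 0.08, 0.10, 0.09]
centers, labels = kmeans_1d(vafs, k=3)
for center in centers:
    print(f"cluster VAF {center:.2f} -> ~{cell_fraction(center):.0%} of cells")
```

With several samples from the same patient, each mutation would carry one VAF per sample and the clustering would run in that higher-dimensional space, which is how subclones that overlap in a single sample can become separable.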
Here we sequence 633 genes, encoding the majority of known epigenetic regulatory proteins, in over 1000 pediatric tumors to define the landscape of somatic mutations in epigenetic regulators in pediatric cancer. Our results demonstrate a marked variation in the frequency of gene mutations across 21 different pediatric cancer subtypes, with the highest frequency of mutations detected in high-grade gliomas, T-lineage acute lymphoblastic leukemia, and medulloblastoma, and a paucity of mutations in low-grade glioma and retinoblastoma. The most frequently mutated genes are H3F3A, PHF6, ATRX, KDM6A, SMARCA4, ASXL2, CREBBP, EZH2, MLL2, USP7, ASXL1, NSD2, SETD2, SMC1A, and ZMYM3. Importantly, we identify novel loss-of-function mutations in the ubiquitin-specific-processing protease 7 (USP7) in pediatric leukemia, which result in a decrease in deubiquitination activity. Collectively, our results help to define the landscape of mutations in epigenetic regulatory genes in pediatric cancer and yield a valuable new database for investigating the role of epigenetic dysregulation in cancer.
We report the first large-scale exome-wide analysis of the combined germline-somatic landscape in ovarian cancer. Here we analyze germline and somatic alterations in 429 ovarian carcinoma cases and 557 controls. We identify 3,635 high confidence, rare truncation and 22,953 missense variants with predicted functional impact. We find germline truncation variants and large deletions across Fanconi pathway genes in 20% of cases. Enrichment of rare truncations is shown in BRCA1, BRCA2, and PALB2. Additionally, we observe germline truncation variants in genes not previously associated with ovarian cancer susceptibility (NF1, MAP3K4, CDKN2B, and MLL3). Evidence for loss of heterozygosity was found in 100% and 76% of cases with germline BRCA1 and BRCA2 truncations respectively. Germline-somatic interaction analysis combined with extensive bioinformatics annotation identifies 237 candidate functional germline truncation and missense variants, including 2 pathogenic BRCA1 and 1 TP53 deleterious variants. Finally, integrated analyses of germline and somatic variants identify significantly altered pathways, including the Fanconi, MAPK, and MLL pathways.
Massively parallel DNA sequencing technologies provide an unprecedented ability to screen entire genomes for genetic changes associated with tumor progression. Here we describe the genomic analyses of four DNA samples from an African-American patient with basal-like breast cancer: peripheral blood, the primary tumor, a brain metastasis, and a xenograft derived from the primary tumor. The metastasis contained two de novo mutations and a large deletion not present in the primary tumor, and was significantly enriched for 20 shared mutations. The xenograft retained all primary tumor mutations, and displayed a mutation enrichment pattern that paralleled the metastasis (16 of 20 genes). Two overlapping large deletions, encompassing CTNNA1, were present in all three tumor samples. The differential mutation frequencies and structural variation patterns in metastasis and xenograft compared to the primary tumor suggest that secondary tumors may arise from a minority of cells within the primary.
Osteosarcoma is a neoplasm of mesenchymal origin with features of osteogenic differentiation. Patients with recurrent or metastatic disease have a very poor prognosis. To define the landscape of somatic mutations in pediatric osteosarcoma, we performed whole-genome sequencing of DNA from 20 osteosarcoma tumor samples and matched normal tissue (obtained from 19 patients) in the discovery cohort as well as 14 samples from 13 patients in the validation cohort. Our results demonstrate that pediatric osteosarcoma is characterized by multiple somatic chromosomal lesions, including structural variations (SVs) and copy number alterations (CNAs). Moreover, single nucleotide variations (SNVs) exhibit a pattern of localized hypermutation called “kataegis” in 50% of the tumors. Despite these regions of kataegis across the osteosarcoma genomes, we detected relatively few recurrent SNVs, and only when SVs were included did we identify the major pathways that are mutated in osteosarcoma. We identified p53 pathway lesions in the tumors of all 19 patients in the discovery cohort, 9 of which were translocations in the first intron of the TP53 gene, leading to gene inactivation. This mechanism of p53 gene inactivation is unique to osteosarcoma among pediatric cancers. In an additional cohort of 32 patients, TP53 gene alterations were identified in 29 of those tumors. Beyond TP53, the RB1, ATRX and DLG2 genes showed recurrent somatic alterations (SNVs and/or SVs) in 29–53% of the tumors. These data highlight the power of whole-genome sequencing in identifying recurrent somatic alterations in cancer genomes that may be missed using other methods.
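Localized hypermutation of the kind called kataegis is often looked for by scanning the distances between neighboring mutations. The Python sketch below shows one simple way to do that; it is not the method used in the study. The function name, the gap threshold of 1,000 bp, the minimum of six mutations per run, and the positions are all assumptions chosen for illustration.

```python
def find_dense_runs(positions, max_gap=1000, min_mutations=6):
    """Return (start, end, count) for runs of closely spaced mutations.

    positions: mutation coordinates on a single chromosome.
    A run is extended while each neighboring pair is at most max_gap apart
    and is reported only if it contains at least min_mutations mutations.
    """
    runs = []
    ordered = sorted(positions)
    if not ordered:
        return runs
    run = [ordered[0]]
    for pos in ordered[1:]:
        if pos - run[-1] <= max_gap:
            run.append(pos)  # still inside the dense region
        else:
            if len(run) >= min_mutations:
                runs.append((run[0], run[-1], len(run)))
            run = [pos]  # start a new candidate run
    if len(run) >= min_mutations:
        runs.append((run[0], run[-1], len(run)))
    return runs


# Invented SNV positions on one chromosome.
positions = [5_000, 250_000, 250_300, 250_450, 250_900, 251_200,
             251_350, 251_900, 900_000, 1_500_000]
print(find_dense_runs(positions))
```

Sorting first keeps the scan linear after the sort, and treating each chromosome separately avoids joining mutations across chromosome boundaries.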
Next-generation sequencing has been used to infer the clonality of heterogeneous tumor samples. These analyses yield specific predictions—the population frequency of individual clones, their genetic composition, and their evolutionary relationships—which we set out to test by sequencing individual cells from three subjects diagnosed with secondary acute myeloid leukemia, each of whom had been previously characterized by whole genome sequencing of unfractionated tumor samples. Single-cell mutation profiling strongly supported the clonal architecture implied by the analysis of bulk material. In addition, it resolved the clonal assignment of single nucleotide variants that had been initially ambiguous and identified areas of previously unappreciated complexity. Accordingly, we find that many of the key assumptions underlying the analysis of tumor clonality by deep sequencing of unfractionated material are valid. Furthermore, we illustrate a single-cell sequencing strategy for interrogating the clonal relationships among known variants that is cost-effective, scalable, and adaptable to the analysis of both hematopoietic and solid tumors, or any heterogeneous population of cells.
Human cancers are genetically diverse populations of cells that evolve over the course of their natural history or in response to the selective pressure of therapy. In theory, it is possible to infer how this variation is structured into related populations of cells based on the frequency of individual mutations in bulk samples, but the accuracy of these models has not been evaluated across a large number of variants in individual cells. Here, we report a strategy for analyzing hundreds of variants within a single cell, and we apply this method to assess models of tumor clonality derived from bulk samples in three cases of leukemia. The data largely support the predicted population structure, though they suggest specific refinements. This type of approach not only illustrates the biological complexity of human cancer, but it also has the potential to inform patient management. That is, precise knowledge of which variants are present in which populations of cells may allow physicians to more effectively target combinations of mutations and predict how patients will respond to therapy.
The emergence of jawed vertebrates (gnathostomes) from jawless vertebrates was accompanied by major morphological and physiological innovations, such as hinged jaws, paired fins and immunoglobulin-based adaptive immunity. Gnathostomes subsequently diverged into two groups, the cartilaginous fishes and the bony vertebrates. Here we report the whole-genome analysis of a cartilaginous fish, the elephant shark (Callorhinchus milii). We find that the C. milii genome is the slowest evolving of all known vertebrates, including the ‘living fossil’ coelacanth, and features extensive synteny conservation with tetrapod genomes, making it a good model for comparative analyses of gnathostome genomes. Our functional studies suggest that the lack of genes encoding secreted calcium-binding phosphoproteins in cartilaginous fishes explains the absence of bone in their endoskeleton. Furthermore, the adaptive immune system of cartilaginous fishes is unusual: it lacks the canonical CD4 co-receptor and most transcription factors, cytokines and cytokine receptors related to the CD4 lineage, despite the presence of polymorphic major histocompatibility complex class II molecules. It thus presents a new model for understanding the origin of adaptive immunity.
Knowledge of individual ancestry is important for genetic association studies where population structure leads to false positive signals. Estimating individual ancestry with targeted sequence data, which constitutes the bulk of current sequence datasets, is challenging. Here, we propose a new method for accurate estimation of genetic ancestry. Our method skips genotype calling and directly analyzes sequence reads. We validate the method using simulated and empirical data and show that the method can accurately infer worldwide continental ancestry with whole genome shotgun coverage as low as 0.001X. For estimates of fine-scale ancestry within Europe, the method performs well with coverage of 0.1X. At an even finer-scale, the method improves discrimination between exome-sequenced participants originating from different provinces within Finland. Finally, we show that our method can be used to improve case-control matching in genetic association studies and reduce the risk of spurious findings due to population structure.
The genome sequence of Acetobacter aceti 1023, an acetic acid bacterium adapted to traditional vinegar fermentation, comprises 3.0 Mb (chromosome plus plasmids). A. aceti 1023 is closely related to the cocoa fermenter Acetobacter pasteurianus 386B but possesses many additional insertion sequence elements.