1.  An integrated map of structural variation in 2,504 human genomes 
Sudmant, Peter H. | Rausch, Tobias | Gardner, Eugene J. | Handsaker, Robert E. | Abyzov, Alexej | Huddleston, John | Zhang, Yan | Ye, Kai | Jun, Goo | Fritz, Markus Hsi-Yang | Konkel, Miriam K. | Malhotra, Ankit | Stütz, Adrian M. | Shi, Xinghua | Casale, Francesco Paolo | Chen, Jieming | Hormozdiari, Fereydoun | Dayama, Gargi | Chen, Ken | Malig, Maika | Chaisson, Mark J.P. | Walter, Klaudia | Meiers, Sascha | Kashin, Seva | Garrison, Erik | Auton, Adam | Lam, Hugo Y. K. | Mu, Xinmeng Jasmine | Alkan, Can | Antaki, Danny | Bae, Taejeong | Cerveira, Eliza | Chines, Peter | Chong, Zechen | Clarke, Laura | Dal, Elif | Ding, Li | Emery, Sarah | Fan, Xian | Gujral, Madhusudan | Kahveci, Fatma | Kidd, Jeffrey M. | Kong, Yu | Lameijer, Eric-Wubbo | McCarthy, Shane | Flicek, Paul | Gibbs, Richard A. | Marth, Gabor | Mason, Christopher E. | Menelaou, Androniki | Muzny, Donna M. | Nelson, Bradley J. | Noor, Amina | Parrish, Nicholas F. | Pendleton, Matthew | Quitadamo, Andrew | Raeder, Benjamin | Schadt, Eric E. | Romanovitch, Mallory | Schlattl, Andreas | Sebra, Robert | Shabalin, Andrey A. | Untergasser, Andreas | Walker, Jerilyn A. | Wang, Min | Yu, Fuli | Zhang, Chengsheng | Zhang, Jing | Zheng-Bradley, Xiangqun | Zhou, Wanding | Zichner, Thomas | Sebat, Jonathan | Batzer, Mark A. | McCarroll, Steven A. | Mills, Ryan E. | Gerstein, Mark B. | Bashir, Ali | Stegle, Oliver | Devine, Scott E. | Lee, Charles | Eichler, Evan E. | Korbel, Jan O.
Nature  2015;526(7571):75-81.
Structural variants (SVs) are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight SV classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype-blocks in 26 human populations. Analyzing this set, we identify numerous gene-intersecting SVs exhibiting population stratification and describe naturally occurring homozygous gene knockouts suggesting the dispensability of a variety of human genes. We demonstrate that SVs are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of SV complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex SVs with multiple breakpoints likely formed through individual mutational events. Our catalog will enhance future studies into SV demography, functional impact and disease association.
PMCID: PMC4617611  PMID: 26432246
2.  Assembly and diploid architecture of an individual human genome via single-molecule technologies 
Nature methods  2015;12(8):780-786.
We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality.
PMCID: PMC4646949  PMID: 26121404
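The scaffold N50 quoted in the abstract above is a contiguity statistic: the length of the scaffold at which the cumulative length, counted from the longest scaffold downward, first reaches half of the total assembly length. A minimal sketch of that calculation follows. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    # Walk from the longest scaffold downward, accumulating length.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that brings the running sum to half of
        # the total assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

For the toy lengths the total is 300, so the running sum reaches 150 at the second-longest scaffold.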
3.  A novel autosomal recessive TERT T1129P mutation in a dyskeratosis congenita family leads to cellular senescence and loss of CD34+ hematopoietic stem cells not reversible by mTOR-inhibition 
Aging (Albany NY)  2015;7(11):911-927.
The TERT gene encodes the reverse transcriptase activity of the telomerase complex, and mutations in TERT can lead to dysfunctional telomerase activity resulting in diseases such as dyskeratosis congenita (DKC). Here, we describe a novel TERT mutation at position T1129P leading to DKC with progressive bone marrow (BM) failure in homozygous members of a consanguineous family. BM hematopoietic stem cells (HSCs) of an affected family member were reduced 300-fold, associated with a significantly impaired colony forming capacity in vitro and impaired repopulation activity in mouse xenografts. Recent data in yeast suggested improved cellular checkpoint controls by mTOR inhibition preventing cells with short telomeres or DNA damage from dividing. To evaluate a potential therapeutic option for the patient, we treated her primary skin fibroblasts and BM HSCs with the mTOR inhibitor rapamycin. This led to prolonged survival and decreased levels of senescence in T1129P mutant fibroblasts. In contrast, the impaired HSC function could not be improved by mTOR inhibition, as colony forming capacity and multilineage engraftment potential in xenotransplanted mice remained severely impaired. Thus, rapamycin treatment did not rescue the compromised stem cell function of TERT T1129P mutant patient HSCs, which outlines limitations of a potential DKC therapy based on rapamycin.
PMCID: PMC4694062  PMID: 26546739
TERT; TERC; mTOR; rapamycin; sirolimus; senescence
4.  Genome Sequencing of SHH Medulloblastoma Predicts Genotype-Related Response to Smoothened Inhibition 
Kool, Marcel | Jones, David T.W. | Jäger, Natalie | Northcott, Paul A. | Pugh, Trevor J. | Hovestadt, Volker | Piro, Rosario M. | Esparza, L. Adriana | Markant, Shirley L. | Remke, Marc | Milde, Till | Bourdeaut, Franck | Ryzhova, Marina | Sturm, Dominik | Pfaff, Elke | Stark, Sebastian | Hutter, Sonja | Şeker-Cin, Huriye | Johann, Pascal | Bender, Sebastian | Schmidt, Christin | Rausch, Tobias | Shih, David | Reimand, Jüri | Sieber, Laura | Wittmann, Andrea | Linke, Linda | Witt, Hendrik | Weber, Ursula D. | Zapatka, Marc | König, Rainer | Beroukhim, Rameen | Bergthold, Guillaume | van Sluis, Peter | Volckmann, Richard | Koster, Jan | Versteeg, Rogier | Schmidt, Sabine | Wolf, Stephan | Lawerenz, Chris | Bartholomae, Cynthia C. | von Kalle, Christof | Unterberg, Andreas | Herold-Mende, Christel | Hofer, Silvia | Kulozik, Andreas E. | von Deimling, Andreas | Scheurlen, Wolfram | Felsberg, Jörg | Reifenberger, Guido | Hasselblatt, Martin | Crawford, John R. | Grant, Gerald A. | Jabado, Nada | Perry, Arie | Cowdrey, Cynthia | Croul, Sydney | Zadeh, Gelareh | Korbel, Jan O. | Doz, Francois | Delattre, Olivier | Bader, Gary D. | McCabe, Martin G. | Collins, V. Peter | Kieran, Mark W. | Cho, Yoon-Jae | Pomeroy, Scott L. | Witt, Olaf | Brors, Benedikt | Taylor, Michael D. | Schüller, Ulrich | Korshunov, Andrey | Eils, Roland | Wechsler-Reya, Robert J. | Lichter, Peter | Pfister, Stefan M.
Cancer cell  2014;25(3):393-405.
Smoothened (SMO) inhibitors recently entered clinical trials for sonic-hedgehog-driven medulloblastoma (SHH-MB). Clinical response is highly variable. To understand the mechanism(s) of primary resistance and identify pathways cooperating with aberrant SHH signaling, we sequenced and profiled a large cohort of SHH-MBs (n = 133). SHH pathway mutations involved PTCH1 (across all age groups), SUFU (infants, including germline), and SMO (adults). Children >3 years old harbored an excess of downstream MYCN and GLI2 amplifications and frequent TP53 mutations, often in the germline, all of which were rare in infants and adults. Functional assays in different SHH-MB xenograft models demonstrated that SHH-MBs harboring a PTCH1 mutation were responsive to SMO inhibition, whereas tumors harboring an SUFU mutation or MYCN amplification were primarily resistant.
PMCID: PMC4493053  PMID: 24651015
5.  Identification of cytokine-induced modulation of microRNA expression and secretion as measured by a novel microRNA specific qPCR assay 
Scientific Reports  2015;5:11590.
microRNAs are an abundant class of small non-coding RNAs that control gene expression post-transcriptionally. Importantly, microRNA activity participates in the regulation of cellular processes and is a potentially valuable source of biomarkers in the diagnosis and prognosis of human diseases. Here we introduce miQPCR, an innovative method to quantify microRNA expression by using Real-Time PCR. miQPCR exploits T4 RNA ligase activities to uniformly extend microRNAs’ 3′-ends by addition of a linker-adapter. The adapter is then used as an ‘anchor’ to prime cDNA synthesis and throughout qPCR to specifically amplify target amplicons. miQPCR is an open, adaptable and cost-effective procedure, which offers the following advantages: i) universal elongation and reverse transcription of all microRNAs; ii) Tm-adjustment of microRNA-specific primers; iii) high sensitivity and specificity in discriminating among closely related sequences; and iv) suitability for the analysis of cellular and cell-free circulating microRNAs. Analysis of cellular and cell-free circulating microRNAs secreted by rat primary hepatocytes stimulated with cytokines and growth factors identifies for the first time a widespread modulation of both microRNA expression and secretion. Altogether, our findings suggest that the pleiotropic activity of humoral factors on microRNAs may extensively affect liver function in response to injury and regeneration.
PMCID: PMC4480321  PMID: 26108880
6.  Identification of novel sequence variations in microRNAs in chronic lymphocytic leukemia 
Carcinogenesis  2013;35(5):992-1002.
We have analyzed the miRNA sequence variations in patients with CLL and the effect of these variations on their secondary structure and expression.
MicroRNA (miRNA) expression is deregulated in many tumors including chronic lymphocytic leukemia (CLL). Although the particular mechanism(s) responsible for their aberrant expression is not well characterized, the presence of mutations and single-nucleotide polymorphisms (SNPs) in miRNA genes, possibly affecting their secondary structure and expression, has been described. In CLL, however, the impact and frequency of such variations have yet to be elucidated. Using a custom resequencing microarray, we screened sequence variations in 109 cancer-related pre-miRNAs in 98 CLL patients. Additionally, the primary regions of miR-29b-2/29c and miR-16-1 were analyzed by Sanger sequencing in another cohort of 213 and 193 CLL patients, respectively. Altogether, we describe six novel miR-sequence variations and the presence of SNPs (n = 27), most of which changed the miR-secondary structure. Moreover, some of the identified SNPs have a significantly different frequency in CLL when compared with a control population. Additionally, we identified a novel variation in miR-16-1 that had not been described previously in CLL patients. We show that this variation affects the expression of mature miR-16-1. We also show that the expression of another miRNA with pathogenetic relevance for CLL, namely miR-29b-2, is influenced by the presence of a polymorphic insertion, which is more frequent in CLL than in a control population. Altogether, these data suggest that sequence variations may occur during CLL development and/or progression.
PMCID: PMC4004199  PMID: 24306027
7.  ICGC PedBrain: Dissecting the genomic complexity underlying medulloblastoma 
Jones, David TW | Jäger, Natalie | Kool, Marcel | Zichner, Thomas | Hutter, Barbara | Sultan, Marc | Cho, Yoon-Jae | Pugh, Trevor J | Hovestadt, Volker | Stütz, Adrian M | Rausch, Tobias | Warnatz, Hans-Jörg | Ryzhova, Marina | Bender, Sebastian | Sturm, Dominik | Pleier, Sabrina | Cin, Huriye | Pfaff, Elke | Sieber, Laura | Wittmann, Andrea | Remke, Marc | Witt, Hendrik | Hutter, Sonja | Tzaridis, Theophilos | Weischenfeldt, Joachim | Raeder, Benjamin | Avci, Meryem | Amstislavskiy, Vyacheslav | Zapatka, Marc | Weber, Ursula D | Wang, Qi | Lasitschka, Bärbel | Bartholomae, Cynthia C | Schmidt, Manfred | von Kalle, Christof | Ast, Volker | Lawerenz, Chris | Eils, Jürgen | Kabbe, Rolf | Benes, Vladimir | van Sluis, Peter | Koster, Jan | Volckmann, Richard | Shih, David | Betts, Matthew J | Russell, Robert B | Coco, Simona | Tonini, Gian Paolo | Schüller, Ulrich | Hans, Volkmar | Graf, Norbert | Kim, Yoo-Jin | Monoranu, Camelia | Roggendorf, Wolfgang | Unterberg, Andreas | Herold-Mende, Christel | Milde, Till | Kulozik, Andreas E | von Deimling, Andreas | Witt, Olaf | Maass, Eberhard | Rössler, Jochen | Ebinger, Martin | Schuhmann, Martin U | Frühwald, Michael C | Hasselblatt, Martin | Jabado, Nada | Rutkowski, Stefan | von Bueren, André O | Williamson, Dan | Clifford, Steven C | McCabe, Martin G | Collins, V. Peter | Wolf, Stephan | Wiemann, Stefan | Lehrach, Hans | Brors, Benedikt | Scheurlen, Wolfram | Felsberg, Jörg | Reifenberger, Guido | Northcott, Paul A | Taylor, Michael D | Meyerson, Matthew | Pomeroy, Scott L | Yaspo, Marie-Laure | Korbel, Jan O | Korshunov, Andrey | Eils, Roland | Pfister, Stefan M | Lichter, Peter
Nature  2012;488(7409):100-105.
Medulloblastoma is an aggressively-growing tumour, arising in the cerebellum or medulla/brain stem. It is the most common malignant brain tumour in children, and displays tremendous biological and clinical heterogeneity [1]. Despite recent treatment advances, approximately 40% of children experience tumour recurrence, and 30% will die from their disease. Those who survive often have a significantly reduced quality of life.
Four tumour subgroups with distinct clinical, biological and genetic profiles are currently discriminated [2,3]. WNT tumours, displaying activated wingless pathway signalling, carry a favourable prognosis under current treatment regimens [4]. SHH tumours show hedgehog pathway activation, and have an intermediate prognosis [2]. Group 3 & 4 tumours are molecularly less well-characterised, and also present the greatest clinical challenges [2,3,5]. The full repertoire of genetic events driving this distinction, however, remains unclear.
Here we describe an integrative deep-sequencing analysis of 125 tumour-normal pairs. Tetraploidy was identified as a frequent early event in Group 3 & 4 tumours, and a positive correlation between patient age and mutation rate was observed. Several recurrent mutations were identified, both in known medulloblastoma-related genes (CTNNB1, PTCH1, MLL2, SMARCA4) and in genes not previously linked to this tumour (DDX3X, CTDNEP1, KDM6A, TBR1), often in subgroup-specific patterns. RNA-sequencing confirmed these alterations, and revealed the expression of the first medulloblastoma fusion genes. Chromatin modifiers were frequently altered across all subgroups.
These findings enhance our understanding of the genomic complexity and heterogeneity underlying medulloblastoma, and provide several potential targets for new therapeutics, especially for Group 3 & 4 patients.
PMCID: PMC3662966  PMID: 22832583
8.  The Genomic and Transcriptomic Landscape of a HeLa Cell Line 
G3: Genes|Genomes|Genetics  2013;3(8):1213-1224.
HeLa is the most widely used model cell line for studying human cellular and molecular biology. To date, no genomic reference for this cell line has been released, and experiments have relied on the human reference genome. Effective design and interpretation of molecular genetic studies performed using HeLa cells require accurate genomic information. Here we present a detailed genomic and transcriptomic characterization of a HeLa cell line. We performed DNA and RNA sequencing of a HeLa Kyoto cell line and analyzed its mutational portfolio and gene expression profile. Segmentation of the genome according to copy number revealed a remarkably high level of aneuploidy and numerous large structural variants at unprecedented resolution. Some of the extensive genomic rearrangements are indicative of catastrophic chromosome shattering, known as chromothripsis. Our analysis of the HeLa gene expression profile revealed that several pathways, including cell cycle and DNA repair, exhibit significantly different expression patterns from those in normal human tissues. Our results provide the first detailed account of genomic variants in the HeLa genome, yielding insight into their impact on gene expression and cellular function as well as their origins. This study underscores the importance of accounting for the strikingly aberrant characteristics of HeLa cells when designing and interpreting experiments, and has implications for the use of HeLa as a model of human biology.
PMCID: PMC3737162  PMID: 23550136
genomics; transcriptomics; HeLa cell line; resource; variation
9.  Genome Sequencing of Pediatric Medulloblastoma Links Catastrophic DNA Rearrangements with TP53 Mutations 
Cell  2012;148(1-2):59-71.
Genomic rearrangements are thought to occur progressively during tumor development. Recent findings, however, suggest an alternative mechanism, involving massive chromosome rearrangements in a one-step catastrophic event termed chromothripsis. We report the whole-genome sequencing-based analysis of a Sonic-Hedgehog medulloblastoma (SHH-MB) brain tumor from a patient with a germline TP53 mutation (Li-Fraumeni syndrome), uncovering massive, complex chromosome rearrangements. Integrating TP53 status with microarray and deep sequencing-based DNA rearrangement data in additional patients reveals a striking association between TP53 mutation and chromothripsis in SHH-MBs. Analysis of additional tumor entities substantiates a link between TP53 mutation and chromothripsis, and indicates a context-specific role for p53 in catastrophic DNA rearrangements. Among these, we observed a strong association between somatic TP53 mutations and chromothripsis in acute myeloid leukemia. These findings connect p53 status and chromothripsis in specific tumor types, providing a genetic basis for understanding particularly aggressive subtypes of cancer.
PMCID: PMC3332216  PMID: 22265402
11.  DELLY: structural variant discovery by integrated paired-end and split-read analysis 
Bioinformatics  2012;28(18):i333-i339.
Motivation: The discovery of genomic structural variants (SVs) at high sensitivity and specificity is an essential requirement for characterizing naturally occurring variation and for understanding pathological somatic rearrangements in personal genome sequencing data. Of particular interest are integrated methods that accurately identify simple and complex rearrangements in heterogeneous sequencing datasets at single-nucleotide resolution, as an optimal basis for investigating the formation mechanisms and functional consequences of SVs.
Results: We have developed an SV discovery method, called DELLY, that integrates short insert paired-ends, long-range mate-pairs and split-read alignments to accurately delineate genomic rearrangements at single-nucleotide resolution. DELLY is suitable for detecting copy-number variable deletion and tandem duplication events as well as balanced rearrangements such as inversions or reciprocal translocations. DELLY thus makes it possible to ascertain the full spectrum of genomic rearrangements, including complex events. On simulated data, DELLY compares favorably to other SV prediction methods across a wide range of sequencing parameters. On real data, DELLY reliably uncovers SVs from the 1000 Genomes Project and cancer genomes, and validation experiments of randomly selected deletion loci show a high specificity.
Availability: DELLY is available at
PMCID: PMC3436805  PMID: 22962449
12.  Mapping copy number variation by population scale genome sequencing 
Nature  2011;470(7332):59-65.
Genomic structural variants (SVs) are abundant in humans, differing from other variation classes in extent, origin, and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (i.e., copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analyzing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
PMCID: PMC3077050  PMID: 21293372
13.  A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads 
Bioinformatics  2009;25(9):1118-1124.
Motivation: Novel high-throughput sequencing technologies pose new algorithmic challenges in handling massive amounts of short-read, high-coverage data. A robust and versatile consensus tool is of particular interest for such data since a sound multi-read alignment is a prerequisite for variation analyses, accurate genome assemblies and insert sequencing.
Results: A multi-read alignment algorithm for de novo or reference-guided genome assembly is presented. The program identifies segments shared by multiple reads and then aligns these segments using a consistency-enhanced alignment graph. On real de novo sequencing data obtained from the newly established NCBI Short Read Archive, the program performs similarly in quality to other comparable programs. On more challenging simulated datasets for insert sequencing and variation analyses, our program outperforms the other tools.
Availability: The consensus program can be downloaded from It can be used stand-alone or in conjunction with the Celera Assembler. Both application scenarios as well as the usage of the tool are described in the documentation.
PMCID: PMC2732307  PMID: 19269990
14.  A Parallel Genetic Algorithm to Discover Patterns in Genetic Markers that Indicate Predisposition to Multifactorial Disease 
Computers in biology and medicine  2008;38(7):826-836.
This paper describes a novel algorithm to analyze genetic linkage data using pattern recognition techniques and genetic algorithms (GA). The method allows a search for regions of the chromosome that may contain genetic variations that jointly predispose individuals to a particular disease. The method uses correlation analysis, filtering theory and GA to achieve this goal. Because current genome scans use from hundreds to hundreds of thousands of markers, two versions of the method have been implemented. The first is an exhaustive analysis version that can be used to visualize, explore, and analyze small genetic data sets for two-marker correlations; the second is a GA version, which uses a parallel implementation allowing searches of higher-order correlations in large data sets. Results on simulated data sets indicate that the method can be informative in the identification of major disease loci and gene-gene interactions in genome-wide linkage data and that further exploration of these techniques is justified. The results presented for both variants of the method show that it can help genetic epidemiologists to identify promising combinations of genetic factors that might predispose to complex disorders. In particular, the correlation analysis of IBD expression patterns might hint at possible gene-gene interactions and the filtering might be a fruitful approach to distinguish true correlation signals from noise.
PMCID: PMC2532987  PMID: 18547558
Gene-Gene Interactions; Multifactorial Diseases; Pattern Recognition; Data Mining; Correlation Analysis; Parallel Genetic Algorithm
15.  SeqAn An efficient, generic C++ library for sequence analysis 
BMC Bioinformatics  2008;9:11.
The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use.
To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use by giving two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile to use.
We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components which are fundamental for the field of sequence analysis. This leverages not only the implementation of new algorithms, but also enables a sound analysis and comparison of existing algorithms.
PMCID: PMC2246154  PMID: 18184432
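The SeqAn abstract above mentions comparing different exact string matching algorithms. The sketch below illustrates what such a comparison involves, using two textbook algorithms (a naive scan and Horspool's bad-character shift) in plain Python. It does not use or reproduce the SeqAn C++ API, and the function names and toy sequence are assumptions chosen for illustration.

```python
def naive_search(text, pattern):
    """Report every start position of pattern in text by checking each offset."""
    m = len(pattern)
    if m == 0:
        return []
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def horspool_search(text, pattern):
    """Report every start position of pattern in text using Horspool's shifts."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift table: distance from the rightmost occurrence of each character
    # (excluding the final pattern position) to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Skip ahead based on the text character under the pattern's last cell;
        # characters absent from the table allow a full-length shift.
        pos += shift.get(text[pos + m - 1], m)
    return hits


# Hypothetical toy sequence, for illustration only.
sequence = "GATTACAGATTACA"
print(naive_search(sequence, "TTAC"))
print(horspool_search(sequence, "TTAC"))
```

Both functions should report the same positions on any input; they differ only in how many offsets are inspected, which is the property a benchmark of this kind would measure.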

