PMCC

Results 1-7 (7)
1.  Genome Sequencing of Pediatric Medulloblastoma Links Catastrophic DNA Rearrangements with TP53 Mutations 
Cell  2012;148(1-2):59-71.
SUMMARY
Genomic rearrangements are thought to occur progressively during tumor development. Recent findings, however, suggest an alternative mechanism, involving massive chromosome rearrangements in a one-step catastrophic event termed chromothripsis. We report the whole-genome sequencing-based analysis of a Sonic-Hedgehog medulloblastoma (SHH-MB) brain tumor from a patient with a germline TP53 mutation (Li-Fraumeni syndrome), uncovering massive, complex chromosome rearrangements. Integrating TP53 status with microarray and deep sequencing-based DNA rearrangement data in additional patients reveals a striking association between TP53 mutation and chromothripsis in SHH-MBs. Analysis of additional tumor entities substantiates a link between TP53 mutation and chromothripsis, and indicates a context-specific role for p53 in catastrophic DNA rearrangements. Among these, we observed a strong association between somatic TP53 mutations and chromothripsis in acute myeloid leukemia. These findings connect p53 status and chromothripsis in specific tumor types, providing a genetic basis for understanding particularly aggressive subtypes of cancer.
doi:10.1016/j.cell.2011.12.013
PMCID: PMC3332216  PMID: 22265402
3.  DELLY: structural variant discovery by integrated paired-end and split-read analysis 
Bioinformatics  2012;28(18):i333-i339.
Motivation: The discovery of genomic structural variants (SVs) at high sensitivity and specificity is an essential requirement for characterizing naturally occurring variation and for understanding pathological somatic rearrangements in personal genome sequencing data. Of particular interest are integrated methods that accurately identify simple and complex rearrangements in heterogeneous sequencing datasets at single-nucleotide resolution, as an optimal basis for investigating the formation mechanisms and functional consequences of SVs.
Results: We have developed an SV discovery method, called DELLY, that integrates short insert paired-ends, long-range mate-pairs and split-read alignments to accurately delineate genomic rearrangements at single-nucleotide resolution. DELLY is suitable for detecting copy-number variable deletion and tandem duplication events as well as balanced rearrangements such as inversions or reciprocal translocations. DELLY thus makes it possible to ascertain the full spectrum of genomic rearrangements, including complex events. On simulated data, DELLY compares favorably to other SV prediction methods across a wide range of sequencing parameters. On real data, DELLY reliably uncovers SVs from the 1000 Genomes Project and cancer genomes, and validation experiments of randomly selected deletion loci show a high specificity.
Availability: DELLY is available at www.korbel.embl.de/software.html
Contact: tobias.rausch@embl.de
doi:10.1093/bioinformatics/bts378
PMCID: PMC3436805  PMID: 22962449
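The DELLY abstract names paired-end evidence as one of its signals for deletions, tandem duplications, inversions and translocations. As a rough illustration of that general paired-end idea (not DELLY's actual code, thresholds or split-read refinement), the sketch below classifies a single read pair by chromosome, mate orientation and mapped distance. It assumes an Illumina-style forward-reverse library; the `ReadPair` type, the `sd` spread parameter and the `k = 3` cutoff are hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    # Hypothetical, simplified record of where each mate mapped.
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(p: ReadPair, median_insert: int, sd: int, k: int = 3) -> str:
    """Classify one read pair by orientation and distance (assumed FR library)."""
    if p.chrom1 != p.chrom2:
        # Mates on different chromosomes suggest an inter-chromosomal event.
        return "translocation"
    # Order the mates by position so the leftmost mate comes first.
    (lpos, lstr), (rpos, rstr) = sorted([(p.pos1, p.strand1), (p.pos2, p.strand2)])
    if lstr == rstr:
        # Same-strand mates (FF or RR) point to an inversion breakpoint.
        return "inversion"
    if lstr == "-" and rstr == "+":
        # Reverse-forward orientation is the signature of a tandem duplication.
        return "tandem_duplication"
    # Forward-reverse: normal orientation, but an unusually large distance
    # between the mates suggests deleted sequence in the sample.
    if rpos - lpos > median_insert + k * sd:
        return "deletion"
    return "concordant"
```

For example, with `median_insert=400` and `sd=50`, a forward-reverse pair mapped 4,900 bp apart exceeds the assumed cutoff of 550 bp and is labelled a deletion candidate. In practice many such discordant pairs would be clustered before an SV is called.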
4.  Mapping copy number variation by population scale genome sequencing 
Nature  2011;470(7332):59-65.
Summary
Genomic structural variants (SVs) are abundant in humans, differing from other variation classes in extent, origin, and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (i.e., copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analyzing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
doi:10.1038/nature09708
PMCID: PMC3077050  PMID: 21293372
5.  A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads 
Bioinformatics  2009;25(9):1118-1124.
Motivation: Novel high-throughput sequencing technologies pose new algorithmic challenges in handling massive amounts of short-read, high-coverage data. A robust and versatile consensus tool is of particular interest for such data since a sound multi-read alignment is a prerequisite for variation analyses, accurate genome assemblies and insert sequencing.
Results: A multi-read alignment algorithm for de novo or reference-guided genome assembly is presented. The program identifies segments shared by multiple reads and then aligns these segments using a consistency-enhanced alignment graph. On real de novo sequencing data obtained from the newly established NCBI Short Read Archive, the program performs similarly in quality to other comparable programs. On more challenging simulated datasets for insert sequencing and variation analyses, our program outperforms the other tools.
Availability: The consensus program can be downloaded from http://www.seqan.de/projects/consensus.html. It can be used stand-alone or in conjunction with the Celera Assembler. Both application scenarios as well as the usage of the tool are described in the documentation.
Contact: rausch@inf.fu-berlin.de
doi:10.1093/bioinformatics/btp131
PMCID: PMC2732307  PMID: 19269990
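The consensus abstract mentions a consistency-enhanced alignment graph without detailing it. One widely used way to add consistency to such a graph is the T-Coffee style triplet extension: the weight of an edge between two positions is increased by every third sequence that links them indirectly, so alignments supported by many reads gain weight. The sketch below shows that transformation on a toy graph; it is an illustrative stand-in, and the node representation `(sequence id, position)` and the `triplet_extension` function are hypothetical, not the program's interface.

```python
from collections import defaultdict
from itertools import combinations

Node = tuple[str, int]  # (sequence id, position) -- hypothetical representation


def triplet_extension(edges: dict[tuple[Node, Node], float]) -> dict[tuple[Node, Node], float]:
    """Return consistency-extended edge weights (each undirected edge listed once)."""
    # Build a symmetric adjacency map from the pairwise alignment edges.
    adj: dict[Node, dict[Node, float]] = defaultdict(dict)
    for (a, b), w in edges.items():
        adj[a][b] = w
        adj[b][a] = w

    # Start from the direct weights, keyed with the two nodes in sorted order.
    extended: dict[tuple[Node, Node], float] = defaultdict(float)
    for (a, b), w in edges.items():
        extended[tuple(sorted((a, b)))] += w

    # For every intermediate node c, each pair of its neighbours (a, b) from
    # two other sequences gains min(w(a, c), w(c, b)) of indirect support.
    for c, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            if a[0] == b[0] or a[0] == c[0] or b[0] == c[0]:
                continue
            extended[(a, b)] += min(nbrs[a], nbrs[b])
    return dict(extended)
```

In the toy case where three reads align the same base pairwise, each of the three edges is confirmed by the third read and its weight doubles, while an isolated edge between two reads keeps its original weight.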
6.  A Parallel Genetic Algorithm to Discover Patterns in Genetic Markers that Indicate Predisposition to Multifactorial Disease 
Computers in Biology and Medicine  2008;38(7):826-836.
This paper describes a novel algorithm to analyze genetic linkage data using pattern recognition techniques and genetic algorithms (GA). The method allows a search for regions of the chromosome that may contain genetic variations that jointly predispose individuals to a particular disease. The method uses correlation analysis, filtering theory and GA to achieve this goal. Because current genome scans use from hundreds to hundreds of thousands of markers, two versions of the method have been implemented. The first is an exhaustive analysis version that can be used to visualize, explore, and analyze small genetic data sets for two-marker correlations; the second is a GA version, which uses a parallel implementation allowing searches of higher-order correlations in large data sets. Results on simulated data sets indicate that the method can be informative in the identification of major disease loci and gene-gene interactions in genome-wide linkage data and that further exploration of these techniques is justified. The results presented for both variants of the method show that it can help genetic epidemiologists to identify promising combinations of genetic factors that might predispose to complex disorders. In particular, the correlation analysis of IBD expression patterns might hint at possible gene-gene interactions, and the filtering might be a fruitful approach to distinguish true correlation signals from noise.
doi:10.1016/j.compbiomed.2008.04.011
PMCID: PMC2532987  PMID: 18547558
Gene-Gene Interactions; Multifactorial Diseases; Pattern Recognition; Data Mining; Correlation Analysis; Parallel Genetic Algorithm
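The exhaustive two-marker version described in this abstract can be pictured as scoring every pair of markers by how strongly their sharing patterns co-vary. The sketch below is an assumed simplification: Pearson correlation stands in for the paper's correlation analysis, the filtering step and the GA search are omitted, and the input layout (one row per family, one per-marker IBD sharing score per column) and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_marker_pairs(scores: list[list[float]], top: int = 5) -> list[tuple[float, int, int]]:
    """Exhaustively score all marker pairs; scores[f][m] is family f's value at marker m."""
    n_markers = len(scores[0])
    # Transpose rows (families) into columns (markers).
    columns = [[row[m] for row in scores] for m in range(n_markers)]
    results = [
        (pearson(columns[i], columns[j]), i, j)
        for i, j in combinations(range(n_markers), 2)
    ]
    # Strongest correlations first, regardless of sign.
    results.sort(key=lambda t: abs(t[0]), reverse=True)
    return results[:top]
```

The number of pairs grows quadratically with the number of markers, which is consistent with the abstract's motivation for a GA-based search once higher-order combinations in large data sets are of interest.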
7.  SeqAn An efficient, generic C++ library for sequence analysis 
BMC Bioinformatics  2008;9:11.
Background
The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use.
Results
To remedy this trend, we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use with two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is efficient and versatile.
Conclusion
We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components which are fundamental for the field of sequence analysis. This not only facilitates the implementation of new algorithms but also enables a sound analysis and comparison of existing algorithms.
doi:10.1186/1471-2105-9-11
PMCID: PMC2246154  PMID: 18184432
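The first SeqAn example compares exact string matching algorithms. As a small illustration of what such a comparison involves (written in Python, not SeqAn's C++ API), the sketch below contrasts a naive scan with Horspool's algorithm, a classic exact matcher chosen here as an assumption rather than taken from the paper. Both functions return the same match positions; Horspool can skip ahead using a bad-character shift table keyed on the text character aligned with the pattern's last position.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Check every alignment of the pattern against the text."""
    m, n = len(pattern), len(text)
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


def horspool_search(text: str, pattern: str) -> list[int]:
    """Horspool exact matching with a bad-character shift table."""
    m, n = len(pattern), len(text)
    if m == 0:
        return list(range(n + 1))
    # Shift for each character = distance from its rightmost occurrence in
    # pattern[:-1] to the last pattern position; later indices overwrite earlier ones.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits = []
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        # Characters absent from pattern[:-1] allow a full shift of m.
        i += shift.get(text[i + m - 1], m)
    return hits
```

For example, searching for `ACA` in `ACACAGTACA` reports positions 0, 2 and 7 with either function, but Horspool jumps three positions after seeing `T`, a character that does not occur in the pattern.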
