Results 1-6 (6)
 

1.  An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge 
Brownstein, Catherine A | Beggs, Alan H | Homer, Nils | Merriman, Barry | Yu, Timothy W | Flannery, Katherine C | DeChene, Elizabeth T | Towne, Meghan C | Savage, Sarah K | Price, Emily N | Holm, Ingrid A | Luquette, Lovelace J | Lyon, Elaine | Majzoub, Joseph | Neupert, Peter | McCallie Jr, David | Szolovits, Peter | Willard, Huntington F | Mendelsohn, Nancy J | Temme, Renee | Finkel, Richard S | Yum, Sabrina W | Medne, Livija | Sunyaev, Shamil R | Adzhubey, Ivan | Cassa, Christopher A | de Bakker, Paul IW | Duzkale, Hatice | Dworzyński, Piotr | Fairbrother, William | Francioli, Laurent | Funke, Birgit H | Giovanni, Monica A | Handsaker, Robert E | Lage, Kasper | Lebo, Matthew S | Lek, Monkol | Leshchiner, Ignaty | MacArthur, Daniel G | McLaughlin, Heather M | Murray, Michael F | Pers, Tune H | Polak, Paz P | Raychaudhuri, Soumya | Rehm, Heidi L | Soemedi, Rachel | Stitziel, Nathan O | Vestecka, Sara | Supper, Jochen | Gugenmus, Claudia | Klocke, Bernward | Hahn, Alexander | Schubach, Max | Menzel, Mortiz | Biskup, Saskia | Freisinger, Peter | Deng, Mario | Braun, Martin | Perner, Sven | Smith, Richard JH | Andorf, Janeen L | Huang, Jian | Ryckman, Kelli | Sheffield, Val C | Stone, Edwin M | Bair, Thomas | Black-Ziegelbein, E Ann | Braun, Terry A | Darbro, Benjamin | DeLuca, Adam P | Kolbe, Diana L | Scheetz, Todd E | Shearer, Aiden E | Sompallae, Rama | Wang, Kai | Bassuk, Alexander G | Edens, Erik | Mathews, Katherine | Moore, Steven A | Shchelochkov, Oleg A | Trapane, Pamela | Bossler, Aaron | Campbell, Colleen A | Heusel, Jonathan W | Kwitek, Anne | Maga, Tara | Panzer, Karin | Wassink, Thomas | Van Daele, Douglas | Azaiez, Hela | Booth, Kevin | Meyer, Nic | Segal, Michael M | Williams, Marc S | Tromp, Gerard | White, Peter | Corsmeier, Donald | Fitzgerald-Butt, Sara | Herman, Gail | Lamb-Thrush, Devon | McBride, Kim L | Newsom, David | Pierson, Christopher R | Rakowsky, Alexander T | Maver, Aleš | Lovrečić, Luca | Palandačić, Anja | Peterlin, Borut | 
Torkamani, Ali | Wedell, Anna | Huss, Mikael | Alexeyenko, Andrey | Lindvall, Jessica M | Magnusson, Måns | Nilsson, Daniel | Stranneheim, Henrik | Taylan, Fulya | Gilissen, Christian | Hoischen, Alexander | van Bon, Bregje | Yntema, Helger | Nelen, Marcel | Zhang, Weidong | Sager, Jason | Zhang, Lu | Blair, Kathryn | Kural, Deniz | Cariaso, Michael | Lennon, Greg G | Javed, Asif | Agrawal, Saloni | Ng, Pauline C | Sandhu, Komal S | Krishna, Shuba | Veeramachaneni, Vamsi | Isakov, Ofer | Halperin, Eran | Friedman, Eitan | Shomron, Noam | Glusman, Gustavo | Roach, Jared C | Caballero, Juan | Cox, Hannah C | Mauldin, Denise | Ament, Seth A | Rowen, Lee | Richards, Daniel R | Lucas, F Anthony San | Gonzalez-Garay, Manuel L | Caskey, C Thomas | Bai, Yu | Huang, Ying | Fang, Fang | Zhang, Yan | Wang, Zhengyuan | Barrera, Jorge | Garcia-Lobo, Juan M | González-Lamuño, Domingo | Llorca, Javier | Rodriguez, Maria C | Varela, Ignacio | Reese, Martin G | De La Vega, Francisco M | Kiruluta, Edward | Cargill, Michele | Hart, Reece K | Sorenson, Jon M | Lyon, Gholson J | Stevenson, David A | Bray, Bruce E | Moore, Barry M | Eilbeck, Karen | Yandell, Mark | Zhao, Hongyu | Hou, Lin | Chen, Xiaowei | Yan, Xiting | Chen, Mengjie | Li, Cong | Yang, Can | Gunel, Murat | Li, Peining | Kong, Yong | Alexander, Austin C | Albertyn, Zayed I | Boycott, Kym M | Bulman, Dennis E | Gordon, Paul MK | Innes, A Micheil | Knoppers, Bartha M | Majewski, Jacek | Marshall, Christian R | Parboosingh, Jillian S | Sawyer, Sarah L | Samuels, Mark E | Schwartzentruber, Jeremy | Kohane, Isaac S | Margulies, David M
Genome Biology  2014;15(3):R53.
Background
There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance.
Results
A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization.
Conclusions
The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups.
doi:10.1186/gb-2014-15-3-r53
PMCID: PMC4073084  PMID: 24667040
2.  Local alignment of generalized k-base encoded DNA sequence 
BMC Bioinformatics  2010;11:347.
Background
DNA sequence comparison is a well-studied problem, in which two DNA sequences are compared using a weighted edit distance. Recent DNA sequencing technologies, however, observe an encoded form of the sequence rather than each DNA base individually. The encoded DNA sequence may contain technical errors, and therefore encoded sequencing errors must be incorporated when comparing an encoded DNA sequence to a reference DNA sequence.
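To illustrate why encoded errors need separate treatment, the following minimal sketch contrasts a true single-base substitution with a single measurement error under two-base encoding. It assumes the common color-space convention in which each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3); this mapping, the sequences, and the function names are illustrative assumptions and are not taken from the abstract.

```python
# Assumed two-base ("color space") convention: each color is the XOR of the
# 2-bit codes of two adjacent bases.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(bases):
    """Return one color per pair of adjacent bases."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(bases, bases[1:])]

def color_differences(observed, expected):
    """Positions at which two equal-length color lists disagree."""
    return [i for i, (o, e) in enumerate(zip(observed, expected)) if o != e]

reference = "ACGTACGT"
variant = "ACGAACGT"  # hypothetical substitution at base index 3 (T -> A)

ref_colors = encode(reference)
var_colors = encode(variant)

# An isolated substitution changes the two adjacent colors that overlap it.
snp_signature = color_differences(var_colors, ref_colors)

# A hypothetical measurement error: one color misread, bases unchanged.
noisy_colors = list(ref_colors)
noisy_colors[3] ^= 1
error_signature = color_differences(noisy_colors, ref_colors)

print(snp_signature, error_signature)
```

In this sketch the substitution shows up as two adjacent color changes while the measurement error shows up as a single isolated one, which is the kind of distinction an encoding-aware comparison can exploit.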
Results
Although two-base encoding is currently used in practice, many other encoding schemes are possible, whereby two or more bases are encoded at a time. A generalized k-base encoding scheme is presented, whereby feasible higher order encodings are better able to differentiate errors in the encoded sequence from true DNA sequence variants. A generalized version of the previous two-base encoding DNA sequence comparison algorithm is used to compare a k-base encoded sequence to a DNA reference sequence. Finally, simulations are performed to evaluate the power, the false positive and false negative SNP discovery rates, and the performance time of k-base encoding compared to previous methods as well as to the standard DNA sequence comparison algorithm.
Conclusions
The novel generalized k-base encoding scheme and resulting local alignment algorithm permit the development of higher fidelity ligation-based next generation sequencing technology. This bioinformatic solution affords greater robustness to errors, as well as lower false SNP discovery rates, only at the cost of computational time.
doi:10.1186/1471-2105-11-347
PMCID: PMC2911458  PMID: 20576157
3.  U87MG Decoded: The Genomic Sequence of a Cytogenetically Aberrant Human Cancer Cell Line 
PLoS Genetics  2010;6(1):e1000832.
U87MG is a commonly studied grade IV glioma cell line that has been analyzed in at least 1,700 publications over four decades. In order to comprehensively characterize the genome of this cell line and to serve as a model of broad cancer genome sequencing, we have generated greater than 30× genomic sequence coverage using a novel 50-base mate paired strategy with a 1.4kb mean insert library. A total of 1,014,984,286 mate-end and 120,691,623 single-end two-base encoded reads were generated from five slides. All data were aligned using a custom designed tool called BFAST, allowing optimal color space read alignment and accurate identification of DNA variants. The aligned sequence reads and mate-pair information identified 35 interchromosomal translocation events, 1,315 structural variations (>100 bp), 191,743 small (<21 bp) insertions and deletions (indels), and 2,384,470 single nucleotide variations (SNVs). Among these observations, the known homozygous mutation in PTEN was robustly identified, and genes involved in cell adhesion were overrepresented in the mutated gene list. Data were compared to 219,187 heterozygous single nucleotide polymorphisms assayed by Illumina 1M Duo genotyping array to assess accuracy: 93.83% of all SNPs were reliably detected at filtering thresholds that yield greater than 99.99% sequence accuracy. Protein coding sequences were disrupted predominantly in this cancer cell line due to small indels, large deletions, and translocations. In total, 512 genes were homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and 35 by interchromosomal translocations to reveal a highly mutated cell line genome. Of the small homozygously mutated variants, 8 SNVs and 99 indels were novel events not present in dbSNP. These data demonstrate that routine generation of broad cancer genome sequence is possible outside of genome centers. 
The sequence analysis of U87MG provides an unparalleled level of mutational resolution compared to any cell line to date.
Author Summary
Glioblastoma has a particularly dismal prognosis, with a median survival time of less than fifteen months. Here, we describe the broad genome sequencing of U87MG, a commonly used and thus well-studied glioblastoma cell line. One of the major features of the U87MG genome is the large number of chromosomal abnormalities, which can be typical of cancer cell lines and primary cancers. The systematic, thorough, and accurate mutational analysis of the U87MG genome comprehensively identifies different classes of genetic mutations including single-nucleotide variations (SNVs), insertions/deletions (indels), and translocations. We found 2,384,470 SNVs, 191,743 small indels, and 1,314 large structural variations. Known gene models were used to predict the effect of these mutations on protein-coding sequence. Mutational analysis revealed 512 genes homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and up to 35 by interchromosomal translocations. The major mutational mechanisms in this brain cancer cell line are small indels and large structural variations. The genomic landscape of U87MG is revealed to be much more complex than previously thought based on lower resolution techniques. This mutational analysis serves as a resource for past and future studies on U87MG, informing them with a thorough description of its mutational state.
doi:10.1371/journal.pgen.1000832
PMCID: PMC2813426  PMID: 20126413
4.  Improving the efficiency of genomic loci capture using oligonucleotide arrays for high throughput resequencing 
BMC Genomics  2009;10:646.
Background
The emergence of next-generation sequencing technology presents tremendous opportunities to accelerate the discovery of rare variants or mutations that underlie human genetic disorders. Although the complete sequencing of the affected individuals' genomes would be the most powerful approach to finding such variants, the cost of such efforts makes it impractical for routine use in disease gene research. In cases where candidate genes or loci can be defined by linkage, association, or phenotypic studies, the practical sequencing target can be made much smaller than the whole genome, and it becomes critical to have capture methods that can be used to purify the desired portion of the genome for shotgun short-read sequencing without biasing allelic representation or coverage. One major approach is array-based capture, which relies on the ability to create a custom in-situ synthesized oligonucleotide microarray for use as a collection of hybridization capture probes. This approach is being used routinely by our group and others, and we are continuing to improve its performance.
Results
Here, we provide a complete protocol optimized for large aggregate sequence intervals and demonstrate its utility with the capture of all predicted amino acid coding sequence from 3,038 human genes using 241,700 60-mer oligonucleotides. Further, we demonstrate two techniques by which the efficiency of the capture can be increased: by introducing a step to block cross hybridization mediated by common adapter sequences used in sequencing library construction, and by repeating the hybridization capture step. These improvements can boost the targeting efficiency to the point where over 85% of the mapped sequence reads fall within 100 bases of the targeted regions.
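The targeting efficiency quoted above (the share of mapped reads falling within 100 bases of a targeted region) can be sketched as a simple ratio. The example below is a minimal illustration only: the input format, the use of a single mapped coordinate per read, and all names and values are assumptions, not details given in the abstract.

```python
def on_target_fraction(read_positions, targets, flank=100):
    """Fraction of mapped reads lying within `flank` bases of any target.

    read_positions: list of (chrom, pos) tuples for mapped reads (assumed format).
    targets: list of (chrom, start, end) intervals, with end inclusive.
    """
    if not read_positions:
        return 0.0
    hits = 0
    for chrom, pos in read_positions:
        # A read counts as on-target if it falls inside any flanked interval.
        if any(c == chrom and start - flank <= pos <= end + flank
               for c, start, end in targets):
            hits += 1
    return hits / len(read_positions)

# Hypothetical toy data: two targets and four mapped reads.
targets = [("chr1", 1000, 1200), ("chr2", 500, 560)]
reads = [("chr1", 950), ("chr1", 1250), ("chr1", 1400), ("chr2", 520)]

efficiency = on_target_fraction(reads, targets)
print(f"{efficiency:.0%} of mapped reads are within 100 bases of a target")
```

The linear scan over targets keeps the sketch short; for thousands of genes a sorted interval list or interval tree would be the more practical choice.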
Conclusions
The complete protocol introduced in this paper enables researchers to perform practical capture experiments, and includes two novel methods for increasing the targeting efficiency. Coupled with the new massively parallel sequencing technologies, this provides a powerful approach to identifying disease-causing genetic variants that can be localized within the genome by traditional methods.
doi:10.1186/1471-2164-10-646
PMCID: PMC2808330  PMID: 20043857
5.  BFAST: An Alignment Tool for Large Scale Genome Resequencing 
PLoS ONE  2009;4(11):e7767.
Background
The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25–100 base range, in the presence of errors and true biological variation.
Methodology
We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels.
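The two-stage idea described above (index lookup to find candidate locations, then gapped local alignment) can be sketched in a few lines. This is a generic textbook illustration, not BFAST's implementation: it assumes a single exact k-mer index, a linear gap penalty, and hypothetical scoring values and sequences.

```python
from collections import defaultdict

def build_index(ref, k):
    """Map every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index

def candidate_starts(read, index, k):
    """Reference start positions implied by every exact k-mer hit."""
    starts = set()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            starts.add(pos - offset)
    return sorted(starts)

def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (linear gap penalty, assumed scores)."""
    rows, cols = len(read) + 1, len(ref) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,  # match or mismatch
                              score[i - 1][j] + gap,      # gap in reference
                              score[i][j - 1] + gap)      # gap in read
            best = max(best, score[i][j])
    return best

ref = "TTGACCGTAGGCTAACGT"   # hypothetical reference
read = "CCGTTGGC"            # hypothetical read with one mismatch
k = 4

index = build_index(ref, k)
starts = candidate_starts(read, index, k)
scores = {s: smith_waterman(read, ref[max(0, s):max(0, s) + len(read)])
          for s in starts}
print(scores)
```

Here the mismatch breaks most of the read's k-mers, yet one surviving exact k-mer is enough to nominate the correct location, and the local alignment then tolerates the mismatch. Using several independent indexes, as the abstract describes, raises the chance that at least one seed survives errors and variants.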
Conclusions
We compare BFAST to a selection of large-scale alignment tools - BLAT, MAQ, SHRiMP, and SOAP - in terms of both speed and accuracy, using simulated and real-world datasets. We show BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.
doi:10.1371/journal.pone.0007767
PMCID: PMC2770639  PMID: 19907642
6.  Local alignment of two-base encoded DNA sequence 
BMC Bioinformatics  2009;10:175.
Background
DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity.
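The deterministic decoding mentioned above, and the reason measurement errors complicate it, can be shown with a minimal sketch. It assumes the common color-space convention (each color is the XOR of the 2-bit codes of adjacent bases, with A=0, C=1, G=2, T=3) and a known first base; the mapping, sequences, and function names are illustrative assumptions rather than details from the abstract.

```python
# Assumed two-base convention: color = XOR of the 2-bit codes of adjacent bases.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"

def encode(bases):
    """Return the first base plus one color per pair of adjacent bases."""
    colors = [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(bases, bases[1:])]
    return bases[0], colors

def decode(first_base, colors):
    """Rebuild bases: the previous base XORed with a color gives the next base."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)

first, colors = encode("ACGGTCA")
decoded = decode(first, colors)

# A single hypothetical measurement error in one color...
corrupted = list(colors)
corrupted[2] ^= 3
# ...changes every base decoded after that point.
misdecoded = decode(first, corrupted)

print(decoded, misdecoded)
```

Because naive decoding propagates one color error to all downstream bases, decoding first and aligning afterwards would be fragile, which motivates treating decoding and alignment together.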
Results
We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two-base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions.
Conclusion
The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, thereby facilitating genome re-sequencing efforts based on this form of sequence data.
doi:10.1186/1471-2105-10-175
PMCID: PMC2709925  PMID: 19508732
