1.  Solving genetic disease at the population scale 
BMC Genomics  2014;15(Suppl 2):O15.
PMCID: PMC4075451
2.  An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge 
Brownstein, Catherine A | Beggs, Alan H | Homer, Nils | Merriman, Barry | Yu, Timothy W | Flannery, Katherine C | DeChene, Elizabeth T | Towne, Meghan C | Savage, Sarah K | Price, Emily N | Holm, Ingrid A | Luquette, Lovelace J | Lyon, Elaine | Majzoub, Joseph | Neupert, Peter | McCallie Jr, David | Szolovits, Peter | Willard, Huntington F | Mendelsohn, Nancy J | Temme, Renee | Finkel, Richard S | Yum, Sabrina W | Medne, Livija | Sunyaev, Shamil R | Adzhubey, Ivan | Cassa, Christopher A | de Bakker, Paul IW | Duzkale, Hatice | Dworzyński, Piotr | Fairbrother, William | Francioli, Laurent | Funke, Birgit H | Giovanni, Monica A | Handsaker, Robert E | Lage, Kasper | Lebo, Matthew S | Lek, Monkol | Leshchiner, Ignaty | MacArthur, Daniel G | McLaughlin, Heather M | Murray, Michael F | Pers, Tune H | Polak, Paz P | Raychaudhuri, Soumya | Rehm, Heidi L | Soemedi, Rachel | Stitziel, Nathan O | Vestecka, Sara | Supper, Jochen | Gugenmus, Claudia | Klocke, Bernward | Hahn, Alexander | Schubach, Max | Menzel, Mortiz | Biskup, Saskia | Freisinger, Peter | Deng, Mario | Braun, Martin | Perner, Sven | Smith, Richard JH | Andorf, Janeen L | Huang, Jian | Ryckman, Kelli | Sheffield, Val C | Stone, Edwin M | Bair, Thomas | Black-Ziegelbein, E Ann | Braun, Terry A | Darbro, Benjamin | DeLuca, Adam P | Kolbe, Diana L | Scheetz, Todd E | Shearer, Aiden E | Sompallae, Rama | Wang, Kai | Bassuk, Alexander G | Edens, Erik | Mathews, Katherine | Moore, Steven A | Shchelochkov, Oleg A | Trapane, Pamela | Bossler, Aaron | Campbell, Colleen A | Heusel, Jonathan W | Kwitek, Anne | Maga, Tara | Panzer, Karin | Wassink, Thomas | Van Daele, Douglas | Azaiez, Hela | Booth, Kevin | Meyer, Nic | Segal, Michael M | Williams, Marc S | Tromp, Gerard | White, Peter | Corsmeier, Donald | Fitzgerald-Butt, Sara | Herman, Gail | Lamb-Thrush, Devon | McBride, Kim L | Newsom, David | Pierson, Christopher R | Rakowsky, Alexander T | Maver, Aleš | Lovrečić, Luca | Palandačić, Anja | Peterlin, Borut | 
Torkamani, Ali | Wedell, Anna | Huss, Mikael | Alexeyenko, Andrey | Lindvall, Jessica M | Magnusson, Måns | Nilsson, Daniel | Stranneheim, Henrik | Taylan, Fulya | Gilissen, Christian | Hoischen, Alexander | van Bon, Bregje | Yntema, Helger | Nelen, Marcel | Zhang, Weidong | Sager, Jason | Zhang, Lu | Blair, Kathryn | Kural, Deniz | Cariaso, Michael | Lennon, Greg G | Javed, Asif | Agrawal, Saloni | Ng, Pauline C | Sandhu, Komal S | Krishna, Shuba | Veeramachaneni, Vamsi | Isakov, Ofer | Halperin, Eran | Friedman, Eitan | Shomron, Noam | Glusman, Gustavo | Roach, Jared C | Caballero, Juan | Cox, Hannah C | Mauldin, Denise | Ament, Seth A | Rowen, Lee | Richards, Daniel R | Lucas, F Anthony San | Gonzalez-Garay, Manuel L | Caskey, C Thomas | Bai, Yu | Huang, Ying | Fang, Fang | Zhang, Yan | Wang, Zhengyuan | Barrera, Jorge | Garcia-Lobo, Juan M | González-Lamuño, Domingo | Llorca, Javier | Rodriguez, Maria C | Varela, Ignacio | Reese, Martin G | De La Vega, Francisco M | Kiruluta, Edward | Cargill, Michele | Hart, Reece K | Sorenson, Jon M | Lyon, Gholson J | Stevenson, David A | Bray, Bruce E | Moore, Barry M | Eilbeck, Karen | Yandell, Mark | Zhao, Hongyu | Hou, Lin | Chen, Xiaowei | Yan, Xiting | Chen, Mengjie | Li, Cong | Yang, Can | Gunel, Murat | Li, Peining | Kong, Yong | Alexander, Austin C | Albertyn, Zayed I | Boycott, Kym M | Bulman, Dennis E | Gordon, Paul MK | Innes, A Micheil | Knoppers, Bartha M | Majewski, Jacek | Marshall, Christian R | Parboosingh, Jillian S | Sawyer, Sarah L | Samuels, Mark E | Schwartzentruber, Jeremy | Kohane, Isaac S | Margulies, David M
Genome Biology  2014;15(3):R53.
There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance.
A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization.
The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups.
PMCID: PMC4073084  PMID: 24667040
3.  Familial Cortical Myoclonus with a Mutation in NOL3 
Annals of neurology  2012;72(2):175-183.
Myoclonus is characterized by sudden, brief involuntary movements and its presence is debilitating. We identified a family suffering from adult-onset, cortical myoclonus without associated seizures. We performed clinical, electrophysiological, and genetic studies to define this phenotype.
A large, four-generation family with history of myoclonus underwent careful questioning, examination, and electrophysiological testing. Thirty-five family members donated blood samples for genetic analysis, which included SNP mapping, microsatellite linkage, targeted massively parallel sequencing, and Sanger sequencing. In silico and in vitro experiments were performed to investigate functional significance of the mutation.
We identified 11 members of a Canadian Mennonite family suffering from adult-onset, slowly progressive, disabling, multifocal myoclonus. Somatosensory evoked potentials indicated a cortical origin of the myoclonus. There were no associated seizures. Some severely affected individuals developed signs of progressive cerebellar ataxia of variable severity late in the course of their illness. The phenotype was inherited in an autosomal dominant fashion. We demonstrated linkage to chromosome 16q21-22.1. We then sequenced all coding sequence in the critical region, identifying only a single co-segregating, novel, nonsynonymous mutation, which resides in the gene NOL3. Furthermore, this mutation was found to alter post-translational modification of NOL3 protein in vitro.
We propose that Familial Cortical Myoclonus (FCM) is a novel movement disorder that may be caused by mutation in NOL3. Further investigation of the role of NOL3 in neuronal physiology may shed light on neuronal membrane hyperexcitability and pathophysiology of myoclonus and related disorders.
PMCID: PMC3431191  PMID: 22926851
4.  Gain-of-function mutations in TRPV4 cause autosomal dominant brachyolmia 
Nature genetics  2008;40(8):999-1003.
The brachyolmias constitute a clinically and genetically heterogeneous group of skeletal dysplasias characterized by a short trunk, scoliosis and mild short stature1. Here, we identify a locus for an autosomal dominant form of brachyolmia on chromosome 12q24.1–12q24.2. Among the genes in the genetic interval, we selected TRPV4, which encodes a calcium permeable cation channel of the transient receptor potential (TRP) vanilloid family, as a candidate gene because of its cartilage-selective gene expression pattern. In two families with the phenotype, we identified point mutations in TRPV4 that encoded R616Q and V620I substitutions, respectively. Patch clamp studies of transfected HEK cells showed that both mutations resulted in a dramatic gain of function characterized by increased constitutive activity and elevated channel activation by either mechano-stimulation or agonist stimulation by arachidonic acid or the TRPV4-specific agonist 4α-phorbol 12,13-didecanoate (4αPDD). This study thus defines a previously unknown mechanism, activation of a calcium-permeable TRP ion channel, in skeletal dysplasia pathogenesis.
PMCID: PMC3525077  PMID: 18587396
5.  Introduction 
PMCID: PMC3447194  PMID: 23189382
bioinformatics; microarrays; sequencing; next-generation sequencing; 454; Ion Torrent; medical records; electronic health records
6.  CEP41 is mutated in Joubert syndrome and is required for tubulin glutamylation at the cilium 
Nature Genetics  2012;44(2):193-199.
Tubulin glutamylation is a post-translational modification (PTM) occurring predominantly on ciliary axonemal tubulin and has been suggested to be important for ciliary function 1,2. However, its relationship to disorders of the primary cilium, termed ‘ciliopathies’, has not been explored. Here, in Joubert syndrome (JBTS) 3, we identify the JBTS15 locus and the responsible gene as CEP41, encoding a centrosomal protein of 41 KDa 4. We show that CEP41 is localized to the basal body/primary cilium, and regulates the ciliary entry of TTLL6, an evolutionarily conserved polyglutamylase enzyme 5. Depletion of CEP41 causes ciliopathy-related phenotypes in zebrafish and mouse, and induces cilia axonemal glutamylation defects. Our data identify loss of CEP41 as a cause of JBTS ciliopathy and highlight involvement of tubulin PTM in pathogenesis of the ciliopathy spectrum.
PMCID: PMC3267856  PMID: 22246503
7.  Lethal Skeletal Dysplasia in Mice and Humans Lacking the Golgin GMAP-210 
The New England journal of medicine  2010;362(3):206-216.
Establishing the genetic basis of phenotypes such as skeletal dysplasia in model organisms can provide insights into biologic processes and their role in human disease.
We screened mutagenized mice and observed a neonatal lethal skeletal dysplasia with an autosomal recessive pattern of inheritance. Through genetic mapping and positional cloning, we identified the causative mutation.
Affected mice had a nonsense mutation in the thyroid hormone receptor interactor 11 gene (Trip11), which encodes the Golgi microtubule-associated protein 210 (GMAP-210); the affected mice lacked this protein. Golgi architecture was disturbed in multiple tissues, including cartilage. Skeletal development was severely impaired, with chondrocytes showing swelling and stress in the endoplasmic reticulum, abnormal cellular differentiation, and increased cell death. Golgi-mediated glycosylation events were altered in fibroblasts and chondrocytes lacking GMAP-210, and these chondrocytes had intracellular accumulation of perlecan, an extracellular matrix protein, but not of type II collagen or aggrecan, two other extracellular matrix proteins. The similarities between the skeletal and cellular phenotypes in these mice and those in patients with achondrogenesis type 1A, a neonatal lethal form of skeletal dysplasia in humans, suggested that achondrogenesis type 1A may be caused by GMAP-210 deficiency. Sequence analysis revealed loss-of-function mutations in the 10 unrelated patients with achondrogenesis type 1A whom we studied.
GMAP-210 is required for the efficient glycosylation and cellular transport of multiple proteins. The identification of a mutation affecting GMAP-210 in mice, and then in humans, as the cause of a lethal skeletal dysplasia underscores the value of screening for abnormal phenotypes in model organisms and identifying the causative mutations.
PMCID: PMC3108191  PMID: 20089971
8.  “High Density SNP Association Study of the 17q21 Chromosomal Region Linked to Autism Identifies CACNA1G as a Novel Candidate Gene” 
Molecular psychiatry  2009;15(10):996-1005.
Chromosome 17q11-q21 is a region of the genome likely to harbor susceptibility to autism (MIM[209850]) based on prior evidence of linkage to the disorder. This linkage is specific to multiplex pedigrees containing only male probands (MO) within the Autism Genetic Resource Exchange (AGRE). Previously, Stone et al.1 completed a high-density SNP association study of 13.7Mb within this interval, but common variant association was not sufficient to account for the linkage signal. Here we extend this SNP-based association study to complete the coverage of the 2 LOD support interval around the chromosome 17q linkage peak by testing the majority of common alleles in 284 MO trios.
Markers within an interval containing the gene CACNA1G were found to be associated with Autism Spectrum Disorder at a locally significant level (p = 1.9 × 10⁻⁵). While establishing CACNA1G as a novel candidate for autism, these alleles do not contribute sufficient genetic effect to explain the observed linkage, indicating there is substantial genetic heterogeneity despite the clear linkage signal. The region thus likely harbors a combination of multiple common and rare alleles contributing to the genetic risk. These data, along with previous studies of Chromosomes 5 and 7q3, suggest few if any major common risk alleles account for ASD risk under major linkage peaks in the AGRE sample. This provides important evidence for strategies to identify ASD genes, suggesting they should focus on identifying rare variants and common variants of small effect.
PMCID: PMC2889141  PMID: 19455149
Autism; Autism Spectrum Disorder; Association; Chromosome 17q; CACNA1G
9.  SeqWare Query Engine: storing and searching sequence data in the cloud 
BMC Bioinformatics  2010;11(Suppl 12):S2.
Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands.
In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (
The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets.
PMCID: PMC3040528  PMID: 21210981
10.  Local alignment of generalized k-base encoded DNA sequence 
BMC Bioinformatics  2010;11:347.
DNA sequence comparison is a well-studied problem, in which two DNA sequences are compared using a weighted edit distance. Recent DNA sequencing technologies however observe an encoded form of the sequence, rather than each DNA base individually. The encoded DNA sequence may contain technical errors, and therefore encoded sequencing errors must be incorporated when comparing an encoded DNA sequence to a reference DNA sequence.
Although two-base encoding is currently used in practice, many other encoding schemes are possible, whereby two or more bases are encoded at a time. A generalized k-base encoding scheme is presented, whereby feasible higher order encodings are better able to differentiate errors in the encoded sequence from true DNA sequence variants. A generalized version of the previous two-base encoding DNA sequence comparison algorithm is used to compare a k-base encoded sequence to a DNA reference sequence. Finally, simulations are performed to evaluate the power, the false positive and false negative SNP discovery rates, and the performance time of k-base encoding compared to previous methods as well as to the standard DNA sequence comparison algorithm.
The novel generalized k-base encoding scheme and resulting local alignment algorithm permits the development of higher fidelity ligation-based next generation sequencing technology. This bioinformatic solution affords greater robustness to errors, as well as lower false SNP discovery rates, only at the cost of computational time.
PMCID: PMC2911458  PMID: 20576157
11.  Effect of Early versus Deferred Antiretroviral Therapy for HIV on Survival 
The New England journal of medicine  2009;360(18):1815-1826.
The optimal time for the initiation of antiretroviral therapy for asymptomatic patients with human immunodeficiency virus (HIV) infection is uncertain.
We conducted two parallel analyses involving a total of 17,517 asymptomatic patients with HIV infection in the United States and Canada who received medical care during the period from 1996 through 2005. None of the patients had undergone previous antiretroviral therapy. In each group, we stratified the patients according to the CD4+ count (351 to 500 cells per cubic millimeter or >500 cells per cubic millimeter) at the initiation of antiretroviral therapy. In each group, we compared the relative risk of death for patients who initiated therapy when the CD4+ count was above each of the two thresholds of interest (early-therapy group) with that of patients who deferred therapy until the CD4+ count fell below these thresholds (deferred-therapy group).
In the first analysis, which involved 8362 patients, 2084 (25%) initiated therapy at a CD4+ count of 351 to 500 cells per cubic millimeter, and 6278 (75%) deferred therapy. After adjustment for calendar year, cohort of patients, and demographic and clinical characteristics, among patients in the deferred-therapy group there was an increase in the risk of death of 69%, as compared with that in the early-therapy group (relative risk in the deferred-therapy group, 1.69; 95% confidence interval [CI], 1.26 to 2.26; P<0.001). In the second analysis involving 9155 patients, 2220 (24%) initiated therapy at a CD4+ count of more than 500 cells per cubic millimeter and 6935 (76%) deferred therapy. Among patients in the deferred-therapy group, there was an increase in the risk of death of 94% (relative risk, 1.94; 95% CI, 1.37 to 2.79; P<0.001).
The early initiation of antiretroviral therapy before the CD4+ count fell below two prespecified thresholds significantly improved survival, as compared with deferred therapy.
PMCID: PMC2854555  PMID: 19339714
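The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the reported numbers, not a reproduction of the study's adjusted analysis; the helper names `percent` and `excess_risk_percent` are hypothetical and introduced here for illustration.

```python
def percent(part, whole):
    """Share of `part` in `whole`, rounded to the nearest whole percent."""
    return round(100 * part / whole)


def excess_risk_percent(relative_risk):
    """Percent increase in risk implied by a relative risk (RR - 1, as a percent)."""
    return round((relative_risk - 1) * 100)


# Group sizes as reported in the abstract.
first_early, first_deferred = 2084, 6278
second_early, second_deferred = 2220, 6935

first_total = first_early + first_deferred      # patients in the first analysis
second_total = second_early + second_deferred   # patients in the second analysis
overall_total = first_total + second_total      # all patients across both analyses

first_shares = (percent(first_early, first_total), percent(first_deferred, first_total))
second_shares = (percent(second_early, second_total), percent(second_deferred, second_total))
```

Under these definitions the group sizes sum to the stated totals, the shares match the reported percentages, and relative risks of 1.69 and 1.94 correspond to the quoted 69% and 94% increases.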
12.  U87MG Decoded: The Genomic Sequence of a Cytogenetically Aberrant Human Cancer Cell Line 
PLoS Genetics  2010;6(1):e1000832.
U87MG is a commonly studied grade IV glioma cell line that has been analyzed in at least 1,700 publications over four decades. In order to comprehensively characterize the genome of this cell line and to serve as a model of broad cancer genome sequencing, we have generated greater than 30× genomic sequence coverage using a novel 50-base mate paired strategy with a 1.4kb mean insert library. A total of 1,014,984,286 mate-end and 120,691,623 single-end two-base encoded reads were generated from five slides. All data were aligned using a custom designed tool called BFAST, allowing optimal color space read alignment and accurate identification of DNA variants. The aligned sequence reads and mate-pair information identified 35 interchromosomal translocation events, 1,315 structural variations (>100 bp), 191,743 small (<21 bp) insertions and deletions (indels), and 2,384,470 single nucleotide variations (SNVs). Among these observations, the known homozygous mutation in PTEN was robustly identified, and genes involved in cell adhesion were overrepresented in the mutated gene list. Data were compared to 219,187 heterozygous single nucleotide polymorphisms assayed by Illumina 1M Duo genotyping array to assess accuracy: 93.83% of all SNPs were reliably detected at filtering thresholds that yield greater than 99.99% sequence accuracy. Protein coding sequences were disrupted predominantly in this cancer cell line due to small indels, large deletions, and translocations. In total, 512 genes were homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and 35 by interchromosomal translocations to reveal a highly mutated cell line genome. Of the small homozygously mutated variants, 8 SNVs and 99 indels were novel events not present in dbSNP. These data demonstrate that routine generation of broad cancer genome sequence is possible outside of genome centers. 
The sequence analysis of U87MG provides an unparalleled level of mutational resolution compared to any cell line to date.
Author Summary
Glioblastoma has a particularly dismal prognosis with median survival time of less than fifteen months. Here, we describe the broad genome sequencing of U87MG, a commonly used and thus well-studied glioblastoma cell line. One of the major features of the U87MG genome is the large number of chromosomal abnormalities, which can be typical of cancer cell lines and primary cancers. The systematic, thorough, and accurate mutational analysis of the U87MG genome comprehensively identifies different classes of genetic mutations including single-nucleotide variations (SNVs), insertions/deletions (indels), and translocations. We found 2,384,470 SNVs, 191,743 small indels, and 1,314 large structural variations. Known gene models were used to predict the effect of these mutations on protein-coding sequence. Mutational analysis revealed 512 genes homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and up to 35 by interchromosomal translocations. The major mutational mechanisms in this brain cancer cell line are small indels and large structural variations. The genomic landscape of U87MG is revealed to be much more complex than previously thought based on lower resolution techniques. This mutational analysis serves as a resource for past and future studies on U87MG, informing them with a thorough description of its mutational state.
PMCID: PMC2813426  PMID: 20126413
13.  Improving the efficiency of genomic loci capture using oligonucleotide arrays for high throughput resequencing 
BMC Genomics  2009;10:646.
The emergence of next-generation sequencing technology presents tremendous opportunities to accelerate the discovery of rare variants or mutations that underlie human genetic disorders. Although the complete sequencing of the affected individuals' genomes would be the most powerful approach to finding such variants, the cost of such efforts makes it impractical for routine use in disease gene research. In cases where candidate genes or loci can be defined by linkage, association, or phenotypic studies, the practical sequencing target can be made much smaller than the whole genome, and it becomes critical to have capture methods that can be used to purify the desired portion of the genome for shotgun short-read sequencing without biasing allelic representation or coverage. One major approach is array-based capture which relies on the ability to create a custom in-situ synthesized oligonucleotide microarray for use as a collection of hybridization capture probes. This approach is being used by our group and others routinely and we are continuing to improve its performance.
Here, we provide a complete protocol optimized for large aggregate sequence intervals and demonstrate its utility with the capture of all predicted amino acid coding sequence from 3,038 human genes using 241,700 60-mer oligonucleotides. Further, we demonstrate two techniques by which the efficiency of the capture can be increased: by introducing a step to block cross hybridization mediated by common adapter sequences used in sequencing library construction, and by repeating the hybridization capture step. These improvements can boost the targeting efficiency to the point where over 85% of the mapped sequence reads fall within 100 bases of the targeted regions.
The complete protocol introduced in this paper enables researchers to perform practical capture experiments, and includes two novel methods for increasing the targeting efficiency. Coupled with the new massively parallel sequencing technologies, this provides a powerful approach to identifying disease-causing genetic variants that can be localized within the genome by traditional methods.
PMCID: PMC2808330  PMID: 20043857
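The targeting-efficiency figure in this abstract (the share of mapped reads falling within 100 bases of a targeted region) can be illustrated with a small calculation. This is a minimal sketch under simplifying assumptions that are not taken from the paper: each read is reduced to a single mapped position, all intervals lie on one chromosome, and the function name `on_target_fraction` and the example coordinates are hypothetical.

```python
def on_target_fraction(read_positions, targets, flank=100):
    """Fraction of mapped read positions within `flank` bases of any target.

    `targets` is a list of (start, end) intervals, inclusive, on one chromosome.
    Each read is represented by a single mapped position (a simplification).
    """
    if not read_positions:
        return 0.0
    hits = sum(
        any(start - flank <= pos <= end + flank for start, end in targets)
        for pos in read_positions
    )
    return hits / len(read_positions)


# Hypothetical targets and read positions, for illustration only.
example_targets = [(1000, 1200), (5000, 5100)]
example_reads = [950, 1100, 1301, 4900, 5200, 9000]
example_fraction = on_target_fraction(example_reads, example_targets)
```

In the example, four of the six positions lie inside a target or its 100-base flank, so the fraction is 4/6. A linear scan over targets is adequate for a sketch; an interval tree or sorted-interval search would be the usual choice for thousands of targets.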
14.  BFAST: An Alignment Tool for Large Scale Genome Resequencing 
PLoS ONE  2009;4(11):e7767.
The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25–100 base range, in the presence of errors and true biological variation.
We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels.
We compare BFAST to a selection of large-scale alignment tools - BLAT, MAQ, SHRiMP, and SOAP - in terms of both speed and accuracy, using simulated and real-world datasets. We show BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at
PMCID: PMC2770639  PMID: 19907642
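The two-stage idea described in this abstract (an index proposes candidate locations, then a gapped local alignment scores them) can be sketched generically. The code below is not BFAST's implementation: it assumes a single index of contiguous k-mers rather than multiple independent indexes, a linear gap penalty, and illustrative scoring values; all function names are hypothetical.

```python
from collections import defaultdict


def build_index(ref, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(ref) - k + 1):
        index[ref[pos:pos + k]].append(pos)
    return index


def candidate_locations(read, index, k):
    """Implied read start positions in the reference, from exact k-mer hits."""
    hits = set()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            hits.add(pos - offset)
    return sorted(hits)


def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Return (best_score, end) for the best local alignment.

    `end` is the exclusive end index in `ref` of the best-scoring alignment.
    Uses a linear gap penalty and keeps only two rows of the score matrix.
    """
    prev = [0] * (len(ref) + 1)
    best, best_end = 0, 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best:
                best, best_end = curr[j], j
        prev = curr
    return best, best_end


reference = "TTACGTTT"
index = build_index(reference, k=3)
starts = candidate_locations("ACGT", index, k=3)
score, end = smith_waterman("ACGT", reference)
```

Here the index narrows the search to one candidate start, and the local alignment confirms a full-length match ending at that location. In practice the alignment would be run only on a window around each candidate rather than on the whole reference.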
15.  Patient Factors Used by Pediatricians to Assign Asthma Treatment 
Pediatrics  2008;122(1):e195-e201.
Although asthma is often inappropriately treated in children, little is known about what information pediatricians use to adjust asthma therapy. The purpose of this work was to assess the importance of various dimensions of patient asthma status as the basis of pediatrician treatment decisions.
We conducted a cross-sectional, random-sample survey, between November 2005 and May 2006, of 500 members of the American Academy of Pediatrics using standardized case vignettes. Vignettes varied in regard to (1) acute health care use (hospitalized 6 months ago), (2) bother (parent bothered by the child’s asthma status), (3) control (frequency of symptoms and albuterol use), (4) direction (qualitative change in symptoms), and (5) wheezing during physical examination. Our primary outcome was the proportion of pediatricians who would adjust treatment in the presence or absence of these 5 factors.
Physicians used multiple dimensions of asthma status other than symptoms to determine treatment. Pediatricians were significantly more likely to increase treatment for a recently hospitalized patient (45% vs 18%), a bothered parent (67% vs 18%), poorly controlled symptoms (4–5 times per week; 100% vs 18%), or if there was wheezing on examination (45% vs 18%) compared with patients who only had well-controlled symptoms. Pediatricians were significantly less likely to decrease treatment for a child with well-controlled symptoms and recent hospitalization (28%), parents who reported being bothered (43%), or a child whose symptoms had worsened since the last doctor visit (10%) compared with children with well-controlled symptoms alone.
Pediatricians treat asthma on the basis of multiple dimensions of asthma status, including hospitalization, bother, symptom frequency, direction, and wheezing but use these factors differently to increase and decrease treatment. Tools that systematically assess multiple dimensions of asthma may be useful to help further improve pediatric asthma care.
PMCID: PMC2725186  PMID: 18595964
asthma; pediatrics; treatment; decision-making; survey
16.  Local alignment of two-base encoded DNA sequence 
BMC Bioinformatics  2009;10:175.
DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity.
We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions.
The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and facilitating genome re-sequencing efforts based on this form of sequence data.
PMCID: PMC2709925  PMID: 19508732
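To make the decoding problem in this abstract concrete, the sketch below encodes and decodes a sequence under one common two-base scheme, in which each color is the XOR of the two-bit codes of adjacent bases. That this is the exact encoding treated in the paper is an assumption, and the function names are hypothetical. The sketch does not perform alignment; it only shows why a measurement error and a true variant look different in the encoded data.

```python
# Two-bit codes for the bases; a color is the XOR of two adjacent base codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_two_base(seq):
    """Return one color (0-3) per pair of adjacent bases."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_two_base(first_base, colors):
    """Deterministically decode colors to bases, given the known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


original = "ACGGT"
colors = encode_two_base(original)           # encoded form of the sequence
roundtrip = decode_two_base("A", colors)     # decoding recovers the sequence

# A single measurement error in the second color corrupts every later base.
with_error = decode_two_base("A", [colors[0], 0] + colors[2:])

# A true single-base variant (G -> T at index 2) changes two adjacent colors.
variant_colors = encode_two_base("ACTGT")
```

Because one wrong color shifts all downstream bases when decoded naively, while a real substitution changes exactly two neighboring colors, an aligner that decodes and aligns at the same time can tell the two cases apart.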
17.  Shotgun bisulfite sequencing of the Arabidopsis genome reveals DNA methylation patterning 
Nature  2008;452(7184):215-219.
Cytosine DNA methylation is important in regulating gene expression and in silencing transposons and other repetitive sequences 1, 2. Recent genomic studies in Arabidopsis have revealed that many endogenous genes are methylated either within their promoters or within their transcribed regions, and that gene methylation is highly correlated with transcription levels 3-5. However, plants have different types of methylation controlled by different genetic pathways, and detailed information on the methylation status of each cytosine in any given genome is lacking. To this end, we generated a map at single base pair resolution of methylated cytosines for Arabidopsis, by combining bisulfite treatment of genomic DNA with ultra-high-throughput sequencing using the Illumina 1G Genome Analyzer and Solexa sequencing technology 6. This approach, termed BS-Seq, unlike previous microarray-based methods, allows one to sensitively measure cytosine methylation on a genome-wide scale within specific sequence contexts. We describe methylation on previously inaccessible components of the genome along with an analysis of the DNA methylation sequence composition and distribution. We also describe the effect of various DNA methylation mutants on genome-wide methylation patterns, and demonstrate that our newly developed library construction and computational methods can be applied to large genomes such as mouse.
PMCID: PMC2377394  PMID: 18278030
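As background to the abstract above: bisulfite treatment converts unmethylated cytosines so that they are read as thymine, while methylated cytosines are still read as cytosine. The sketch below shows one common way to summarize such data per cytosine, assuming the usual plant context classes (CG, CHG, CHH, where H is A, C, or T) and a simple C / (C + T) read fraction. The function names and the read-count inputs are hypothetical, and this is not the computational pipeline developed in the paper.

```python
def cytosine_context(seq, i):
    """Classify the forward-strand cytosine at seq[i] as CG, CHG, or CHH.

    Returns None when the two downstream bases are missing or ambiguous.
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position %d is not a cytosine" % i)
    downstream = seq[i + 1:i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2:
        return None
    if downstream[0] not in "ACT" or downstream[1] not in "ACGT":
        return None
    return "CHG" if downstream[1] == "G" else "CHH"


def methylation_level(c_reads, t_reads):
    """Fraction of reads supporting methylation at one cytosine.

    c_reads: reads showing C (protected from conversion, i.e. methylated).
    t_reads: reads showing T (converted, i.e. unmethylated).
    Returns None when the site has no coverage.
    """
    total = c_reads + t_reads
    return c_reads / total if total else None


if __name__ == "__main__":
    print(cytosine_context("ACAGT", 1), methylation_level(3, 1))
```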
18.  Are Physician Estimates of Asthma Severity Less Accurate in Black than in White Patients? 
Racial differences in asthma care are not fully explained by socioeconomic status, care access, and insurance status. Appropriate care requires accurate physician estimates of severity. It is unknown if accuracy of physician estimates differs between black and white patients, and how this relates to asthma care disparities.
We hypothesized that: 1) physician underestimation of asthma severity is more frequent among black patients; 2) among black patients, physician underestimation of severity is associated with poorer quality asthma care.
Design, Setting and Patients
We conducted a cross-sectional survey among adult patients with asthma cared for in 15 managed care organizations in the United States. We collected physicians’ estimates of their patients’ asthma severity. Physicians’ estimates of patients’ asthma as being less severe than patient-reported symptoms were classified as underestimates of severity.
Frequency of underestimation, asthma care, and communication.
Three thousand four hundred and ninety-four patients participated (13% were black). Black patients were significantly more likely than white patients to have their asthma severity underestimated (OR = 1.39, 95% CI 1.08–1.79). Among black patients, underestimation was associated with less use of daily inhaled corticosteroids (13% vs 20%, p < .05), less physician instruction on management of asthma flare-ups (33% vs 41%, p < .0001), and lower ratings of asthma care (p = .01) and physician communication (p = .04).
Biased estimates of asthma severity may contribute to racially disparate asthma care. Interventions to improve physicians’ assessments of asthma severity and patient–physician communication may minimize racial disparities in asthma care.
PMCID: PMC2583798  PMID: 17453263
asthma; racial disparities; patient–physician communication
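The abstract above reports its main comparison as an odds ratio with a 95% confidence interval. As a reading aid, here is a minimal sketch of how an unadjusted odds ratio and a Wald-type interval are computed from a 2x2 table. The counts in the usage example are made up for illustration and are not the study's data, and the published estimate may come from an adjusted model rather than from this simple calculation.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a, b: group 1 counts with and without the outcome.
    c, d: group 2 counts with and without the outcome.
    z: normal quantile (1.96 gives an approximate 95% interval).
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 20 of 100 underestimated in one group, 10 of 100 in the other.
    print(odds_ratio_ci(20, 80, 10, 90))
```

An interval that excludes 1, as the one reported in the abstract does, is what corresponds to a statistically significant difference at the chosen level.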
19.  Repetitive sequence environment distinguishes housekeeping genes 
Gene  2006;390(1-2):153-165.
Housekeeping genes are expressed across a wide variety of tissues. Since repetitive sequences have been reported to influence the expression of individual genes, we employed a novel approach to determine whether housekeeping genes can be distinguished from tissue-specific genes by their repetitive sequence context. We show that Alu elements are more highly concentrated around housekeeping genes while various longer (>400-bp) repetitive sequences ("repeats"), including Long Interspersed Nuclear Element 1 (LINE-1) elements, are excluded from these regions. We further show that isochore membership does not distinguish housekeeping genes from tissue-specific genes and that repetitive sequence environment distinguishes housekeeping genes from tissue-specific genes in every isochore. The distinct repetitive sequence environment, in combination with other previously published sequence properties of housekeeping genes, was used to develop a method of predicting housekeeping genes on the basis of DNA sequence alone. Using expression across tissue types as a measure of success, we demonstrate that repetitive sequence environment is by far the most important sequence feature identified to date for distinguishing housekeeping genes.
PMCID: PMC1857324  PMID: 17141428
random forest; Alu; SINE; LINE; repeat; tissue-specific genes; isochores
20.  Mortality in Patients Hospitalized for Asthma Exacerbations in the United States 
Rationale: Hospitalizations for asthma exacerbations are common in the United States, but there are no national estimates of outcomes in this population. It is also not known if race disparities in asthma deaths exist among hospitalized patients.
Objectives: To estimate outcomes of patients hospitalized for asthma in the United States and to determine if the risk of death in this population is higher among black patients compared with white patients.
Methods: We used the Nationwide Inpatient Sample for 2000. Admissions for asthma exacerbations among patients > 5 yr of age were included. Mortality was the primary outcome; secondary outcomes were length of stay and total hospital charges.
Measurements and Main Results: In-hospital asthma mortality was 0.5% (99% confidence interval [CI], 0.4–0.6), with mean hospital stay of 2.7 d (99% CI, 2.6–2.8 d) and $9,078 (99% CI, $8,300–9,855) in hospital charges. Deaths in this population accounted for about one-third of all asthma deaths reported in the United States. Black patients hospitalized for asthma exacerbations were less likely to die when compared with white patients (0.3 vs. 0.6%; p < 0.001). However, in multivariable analyses, there were no significant race differences in hospital deaths.
Conclusions: Mortality among patients hospitalized for asthma exacerbations accounts for one-third of all deaths from asthma. The higher overall risk of death from asthma in black patients compared with white patients in the United States is not explained by race differences in hospital deaths and therefore is attributable to factors preceding hospitalization.
PMCID: PMC2648055  PMID: 16778163
costs; epidemiology; length of stay; mortality; race
21.  Risk adjustment for hospital use using social security data: cross sectional small area analysis 
BMJ : British Medical Journal  2002;324(7334):390.
To identify demographic and socioeconomic determinants of need for acute hospital treatment at small area level. To establish whether there is a relation between poverty and use of inpatient services. To devise a risk adjustment formula for distributing public funds for hospital services using, as far as possible, variables that can be updated between censuses.
Cross sectional analysis. Spatial interactive modelling was used to quantify the proximity of the population to health service facilities. Two stage weighted least squares regression was used to model use against supply of hospital and community services and a wide range of potential needs drivers including health, socioeconomic census variables, uptake of income support and family credit, and religious denomination.
Northern Ireland.
Main outcome measure
Intensity of use of inpatient services.
After endogeneity of supply and use was taken into account, a statistical model was produced that predicted use based on five variables: income support, family credit, elderly people living alone, all ages standardised mortality ratio, and low birth weight. The main effect of the formula produced is to move resources from urban to rural areas.
This work has produced a population risk adjustment formula for acute hospital treatment in which four of the five variables can be updated annually rather than relying on census derived data. Inclusion of the social security data makes a substantial difference to the model and to the results produced by the formula.
What is already known on this topic
Use of hospital services at small area level is related to supply and census derived proxy measures of socioeconomic status as well as morbidity
Changes to census data can be determined only every 10 years
What this study adds
Social security data directly reflecting household income predicts use of inpatient services
Use of social security data allowed development of a risk adjustment model in which four of the five variables can be updated annually
The main effect of the resulting formula is to move resources from urban to rural areas
PMCID: PMC65531  PMID: 11850368
