Results 1-25 (29)
1.  Mitochondrial genome sequences reveal evolutionary relationships of the Phytophthora Ic clade species 
Current genetics  2015;61(4):567-577.
Phytophthora infestans is one of the most destructive plant pathogens of potato and tomato globally. The pathogen is closely related to four other Phytophthora species including P. phaseoli, P. ipomoeae, P. mirabilis, and P. andina that are important pathogens of other wild and domesticated hosts. P. andina is an interspecific hybrid between P. infestans and an unknown Phytophthora species. We have sequenced mitochondrial genomes of the sister species of P. infestans in order to resolve the evolutionary relationships within the clade. Phylogenetic analysis indicates that the P. phaseoli mitochondrial lineage is basal within the clade. P. mirabilis and P. ipomoeae are sister lineages and share a common ancestor with the Ic mitochondrial lineage of P. andina. These lineages in turn are sister to the P. infestans and P. andina Ia mitochondrial lineages. The P. andina Ic lineage diverged much earlier than the P. andina Ia mitochondrial lineage and P. infestans. The presence of two mitochondrial lineages in P. andina supports the hybrid nature of this species. The ancestral state of the P. andina Ic lineage in the tree and its occurrence only in the Andean regions of Ecuador, Colombia and Peru suggest that further sampling in the Americas is warranted to understand the distribution of this species hybrid in nature.
PMCID: PMC4659649  PMID: 25754775
Phytophthora Ic clade; mitochondria; Late blight; phylogenetics
2.  Comprehensive variation discovery in single human genomes 
Nature genetics  2014;46(12):1350-1355.
Complete knowledge of the genetic variation in individual human genomes is a crucial foundation for understanding the etiology of disease. Genetic variation is typically characterized by sequencing individual genomes and comparing reads to a reference. Existing methods do an excellent job of detecting variants in approximately 90% of the human genome; however, calling variants in the remaining 10% of the genome (largely low-complexity sequence and segmental duplications) is challenging. To improve variant calling, we developed a new algorithm, DISCOVAR, and examined its performance on improved, low-cost sequence data. Using a newly created reference set of variants from finished sequence of 103 randomly chosen Fosmids, we find that some standard variant call sets miss up to 25% of variants. We show that the combination of new methods and improved data increases sensitivity several-fold, with the greatest impact in challenging regions of the human genome.
PMCID: PMC4244235  PMID: 25326702
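The DISCOVAR entry above measures how many variants a call set misses relative to a reference set derived from finished Fosmid sequence. A minimal sketch of that kind of comparison follows. It assumes both sets are already normalized (for example, indels left-aligned) and restricted to the regions the reference set covers, and it uses exact matching on hypothetical (chromosome, position, ref, alt) tuples; it is not the authors' evaluation code.

```python
from typing import Iterable, NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class CallSetStats(NamedTuple):
    true_positives: int
    false_negatives: int
    false_positives: int
    sensitivity: float  # fraction of reference-set variants recovered by the calls


def compare_to_truth(calls: Iterable[Variant], truth: Iterable[Variant]) -> CallSetStats:
    """Exact-match comparison of a call set against a reference (truth) set.

    Assumes variants are already normalized; differently represented but
    equivalent indels would be counted as both a miss and a false call.
    """
    call_set: Set[Variant] = set(calls)
    truth_set: Set[Variant] = set(truth)
    tp = len(call_set & truth_set)
    fn = len(truth_set - call_set)
    fp = len(call_set - truth_set)
    sensitivity = tp / len(truth_set) if truth_set else 0.0
    return CallSetStats(tp, fn, fp, sensitivity)
```

With a four-variant reference set of which the calls recover three, sensitivity is 0.75, which corresponds to a call set that misses 25% of variants.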
3.  Exome Sequencing Can Improve Diagnosis and Alter Patient Management 
Science translational medicine  2012;4(138):138ra78.
The translation of “next-generation” sequencing directly to the clinic is still being assessed, but for genetic diseases it has the potential to reduce costs, improve accuracy, and point to unsuspected yet treatable conditions. To study its capability in the clinic, we performed whole-exome sequencing in 118 probands with a diagnosis of a pediatric-onset neurodevelopmental disease in which most known causes had been excluded. Twenty-two genes not previously identified as disease-causing were found in this study (19% of the cohort), further establishing exome sequencing as a useful tool for gene discovery. New genes identified included EXOC8 in Joubert syndrome and GFM2 in a patient with microcephaly, simplified gyral pattern, and insulin-dependent diabetes. Exome sequencing uncovered 10 probands (8% of the cohort) with mutations in genes known to cause a disease different from the initial diagnosis. Upon further medical evaluation, these mutations were found to account for each proband's disease, leading to a change in diagnosis and, in some cases, to changes in patient management. Our data provide proof of principle that genomic strategies are useful in clarifying diagnosis in a proportion of patients with neurodevelopmental disorders.
PMCID: PMC4442637  PMID: 22700954
4.  De novo somatic mutations in components of the PI3K-AKT3-mTOR pathway cause hemimegalencephaly 
Nature genetics  2012;44(8):941-945.
De novo somatic mutations in focal areas are well documented in diseases such as neoplasia but are rarely reported in malformation of the developing brain. Hemimegalencephaly (HME) is characterized by overgrowth of one of the two cerebral hemispheres. The molecular etiology of HME remains a mystery. The intractable epilepsy associated with HME can be relieved by hemispherectomy, a surgical treatment that also allows sampling of diseased tissue. Exome sequencing and mass spectrometry analysis in paired brain-blood samples from individuals with HME (n = 20 cases) identified de novo somatic mutations in 30% of affected individuals in the PIK3CA, AKT3 and MTOR genes. A recurrent PIK3CA c.1633G>A mutation was found in four separate cases. Identified mutations were present in 8–40% of sequenced alleles in various brain regions and were associated with increased neuronal S6 protein phosphorylation in the brains of affected individuals, indicating aberrant activation of mammalian target of rapamycin (mTOR) signaling. Thus, HME is probably a genetically mosaic disease caused by gain of function in phosphatidylinositol 3-kinase (PI3K)-AKT3-mTOR signaling.
PMCID: PMC4417942  PMID: 22729223
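The hemimegalencephaly entry reports mutations present in 8–40% of sequenced alleles. For a single site, that percentage is the variant allele fraction (VAF): the share of reads carrying the alternate allele. The sketch below computes it together with a Wilson score interval. The conversion to a fraction of mutant cells assumes a heterozygous mutation in a diploid region with no copy-number change, an assumption the abstract does not state; all function names are illustrative.

```python
import math
from typing import Tuple


def variant_allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Share of reads at a site that carry the alternate allele."""
    if total_reads <= 0 or not 0 <= alt_reads <= total_reads:
        raise ValueError("need 0 <= alt_reads <= total_reads and total_reads > 0")
    return alt_reads / total_reads


def wilson_interval(alt_reads: int, total_reads: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for the allele fraction (z = 1.96 gives roughly 95%)."""
    n = total_reads
    p = variant_allele_fraction(alt_reads, n)
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def mutant_cell_fraction(vaf: float) -> float:
    """Assumption: heterozygous mutation, diploid locus, no copy-number change,
    so each mutant cell contributes one mutant allele out of two."""
    return min(1.0, 2 * vaf)
```

Under these assumptions, a VAF of 8% would correspond to about 16% of cells carrying the mutation, and a VAF of 40% to about 80%.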
5.  Genomics of Loa loa, a Wolbachia-free filarial parasite of humans 
Nature genetics  2013;45(5):495-500.
Loa loa, the African eyeworm, is a major filarial pathogen of humans. Unlike most filariae, Loa loa does not contain the obligate intracellular Wolbachia endosymbiont. We describe the 91.4 Mb genome of Loa loa, and the genome of the related filarial parasite Wuchereria bancrofti, and predict 14,907 Loa loa genes based on microfilarial RNA sequencing. By comparing these genomes to that of another filarial parasite, Brugia malayi, and to several other nematode genomes, we demonstrate synteny among filariae but not with non-parasitic nematodes. The Loa loa genome encodes many immunologically relevant genes, as well as protein kinases targeted by drugs currently approved for humans. Despite lacking Wolbachia, Loa loa shows no new metabolic synthesis or transport capabilities compared to other filariae. These results suggest that the role played by Wolbachia in filarial biology is more subtle than previously thought and reveal marked differences between parasitic and non-parasitic nematodes.
PMCID: PMC4238225  PMID: 23525074
6.  Massively Parallel Sequencing of Human Urinary Exosome/Microvesicle RNA Reveals a Predominance of Non-Coding RNA 
PLoS ONE  2014;9(5):e96094.
Intact RNA from exosomes/microvesicles (collectively referred to as microvesicles) has sparked much interest as a potential source of biomarkers for the non-invasive analysis of disease. Here we use the Illumina Genome Analyzer to determine the comprehensive array of nucleic acid reads present in urinary microvesicles. Extraneous nucleic acids were digested using RNase and DNase treatment, and the microvesicle inner nucleic acid cargo was analyzed with and without DNase digestion to examine both DNA and RNA sequences contained in microvesicles. Results revealed that a substantial proportion (∼87%) of reads aligned to ribosomal RNA. Of the non-ribosomal RNA sequences, ∼60% aligned to non-coding RNA and repeat sequences including LINE, SINE, satellite repeats, and RNA repeats (tRNA, snRNA, scRNA and srpRNA). The remaining ∼40% of non-ribosomal RNA reads aligned to protein coding genes and splice sites encompassing approximately 13,500 of the known 21,892 protein coding genes of the human genome. Analysis of protein coding genes specific to the renal and genitourinary tract revealed that complete segments of the renal nephron and collecting duct as well as genes indicative of the bladder and prostate could be identified. This study reveals that the entire genitourinary system may be mapped using microvesicle transcript analysis and that the majority of non-ribosomal RNA sequences contained in microvesicles is potentially functional non-coding RNA, which plays an emerging role in cell regulation.
PMCID: PMC4015934  PMID: 24816817
7.  Semiconductor-based DNA sequencing of histone modification states 
Nature communications  2013;4:2672.
The recent development of a semiconductor-based, non-optical DNA sequencing technology promises scalable, low-cost and rapid sequence data production. The technology has previously been applied mainly to genomic sequencing and targeted re-sequencing. Here we demonstrate the utility of Ion Torrent semiconductor-based sequencing for sensitive, efficient and rapid chromatin immunoprecipitation followed by sequencing (ChIP-seq) through the application of sample preparation methods that are optimized for ChIP-seq on the Ion Torrent platform. We leverage this method for epigenetic profiling of tumour tissues.
PMCID: PMC3917140  PMID: 24157732
8.  Regulated aggregative multicellularity in a close unicellular relative of metazoa 
eLife  2013;2:e01287.
The evolution of metazoans from their unicellular ancestors was one of the most important events in the history of life. However, the cellular and genetic changes that ultimately led to the evolution of multicellularity are not known. In this study, we describe an aggregative multicellular stage in the protist Capsaspora owczarzaki, a close unicellular relative of metazoans. Remarkably, transition to the aggregative stage is associated with significant upregulation of orthologs of genes known to establish multicellularity and tissue architecture in metazoans. We further observe transitions in regulated alternative splicing during the C. owczarzaki life cycle, including the deployment of an exon network associated with signaling, a feature of splicing regulation so far only observed in metazoans. Our results reveal the existence of a highly regulated aggregative stage in C. owczarzaki and further suggest that features of aggregative behavior in an ancestral protist may have been co-opted to develop some multicellular properties currently seen in metazoans.
eLife digest
When living things made from many cells evolved from single-celled ancestors, it was a breakthrough in the history of life—and one that has occurred more than once. In fact, multicellular life has evolved independently at least 25 times, in groups as diverse as animals, fungi, plants, slime molds and seaweeds. There are broadly two ways to become multicellular. The most complex multicellular species, such as animals, will replicate a single cell, over and over, without separating the resultant cells. However, in species that are only occasionally multicellular, free-living cells tend instead to join together in one mass of many cells.
Evolution is constrained by its raw materials, so looking at the living relatives of a given species, or group, can lead to a better understanding of its evolution because its relatives contain clues about its ancestors. To gain insights into how animal multicellular life might have begun, Sebé-Pedrós et al. studied the life cycle of the amoeboid organism Capsaspora owczarzaki. Found within the bodies of freshwater snails, this single-celled amoeba is a close relative of multicellular animals and could resemble one of their earliest ancestors.
At certain stages of the life cycle Sebé-Pedrós et al. noticed that individual amoebae gathered together to form a multicellular mass—something that had not been seen before in such a close relative of the animals. Moreover, the genes that ‘switched on’ when the amoebae began to aggregate are also found in animals, where, together with other genes, they control development and the formation of tissues. Sebé-Pedrós et al. suggest that the first multicellular animals could have recycled the genes that control the aggregation of single-celled species: in other words, genes that once controlled the changes that happen at different times in a life cycle, now control the changes that develop between different tissues at the same time.
Sebé-Pedrós et al. also observed that alternative splicing—a process that allows different proteins to be made from a single gene—occurs via two different mechanisms during the life cycle of Capsaspora. Most of the time Capsaspora employs a form of alternative splicing that is often seen in plants and fungi, and only rarely in animals; for the rest of the time it uses a form of alternative splicing similar to that used by animal cells.
The evolution of complex alternative splicing mechanisms is a hallmark feature of multicellular animals. The exploitation of two major forms of alternative splicing in Capsaspora could thus reflect an important transition during evolution that resulted in an increased diversity of proteins and in more complex gene regulation. Such a transition may ultimately have paved the way for the increased specialization of cell types seen in animals.
This glimpse into the possible transitions in gene regulation that contributed to the birth of multicellular animals indicates that the single-celled ancestor of the animals was likely more complex than previously thought. Future analyses of the animals’ close relatives may further improve our understanding of how single-celled organisms became multicellular animals.
PMCID: PMC3870316  PMID: 24368732
opisthokonts; Capsaspora; alternative splicing; cell differentiation; evolutionary transitions; Other
9.  Distinctive Expansion of Potential Virulence Genes in the Genome of the Oomycete Fish Pathogen Saprolegnia parasitica 
PLoS Genetics  2013;9(6):e1003272.
Oomycetes in the class Saprolegniomycetidae of the Eukaryotic kingdom Stramenopila have evolved as severe pathogens of amphibians, crustaceans, fish and insects, resulting in major losses in aquaculture and damage to aquatic ecosystems. We have sequenced the 63 Mb genome of the freshwater fish pathogen Saprolegnia parasitica. Approximately 1/3 of the assembled genome exhibits loss of heterozygosity, indicating an efficient mechanism for revealing new variation. Comparison of S. parasitica with plant pathogenic oomycetes suggests that during evolution the host cellular environment has driven distinct patterns of gene expansion and loss in the genomes of plant and animal pathogens. S. parasitica possesses one of the largest repertoires of proteases (270) among eukaryotes, which, as determined from RNA-Seq data, are deployed in waves at different points during infection. In contrast, although S. parasitica is capable of living saprotrophically, parasitism has led to the loss of inorganic nitrogen and sulfur assimilation pathways, strikingly similar to losses in obligate plant pathogenic oomycetes and fungi. The large gene families that are hallmarks of plant pathogenic oomycetes such as Phytophthora appear to be lacking in S. parasitica, including those encoding RXLR effectors, Crinklers, and Necrosis Inducing-Like Proteins (NLP). S. parasitica also has a very large kinome of 543 kinases, 10% of which is induced upon infection. Moreover, S. parasitica encodes several genes typical of animals or animal pathogens and lacking from other oomycetes, including disintegrins and galactose-binding lectins, whose expression and evolutionary origins implicate horizontal gene transfer in the evolution of animal pathogenesis in S. parasitica.
Author Summary
Fish are an increasingly important source of animal protein globally, with aquaculture production rising dramatically over the past decade. Saprolegnia is a fungal-like oomycete and one of the most destructive fish pathogens, causing millions of dollars in losses to the aquaculture industry annually. Saprolegnia has also been linked to a worldwide decline in wild fish and amphibian populations. Here we describe the genome sequence of the first animal pathogenic oomycete and compare the genome content with the available plant pathogenic oomycetes. We found that Saprolegnia lacks the large effector families that are hallmarks of plant pathogenic oomycetes, showing evolutionary adaptation to the host. Moreover, Saprolegnia harbors pathogenesis-related genes that were derived by lateral gene transfer from the host and other animal pathogens. The retrotransposon LINE family also appears to be acquired from animal lineages. By transcriptome analysis we show a high rate of allelic variation, which reveals rapidly evolving genes and potentially adaptive evolutionary mechanisms coupled to selective pressures exerted by the animal host. The genome and transcriptome data, as well as subsequent biochemical analyses, gave us insight into the disease process of Saprolegnia at a molecular and cellular level and identified targets for sustainable control of Saprolegnia.
PMCID: PMC3681718  PMID: 23785293
10.  Evolutionarily Assembled cis-Regulatory Module at a Human Ciliopathy Locus 
Science (New York, N.Y.)  2012;335(6071):966-969.
Neighboring genes are often coordinately expressed within cis-regulatory modules, but evidence that nonparalogous genes share functions in mammals is lacking. Here, we report that mutation of either TMEM138 or TMEM216 causes a phenotypically indistinguishable human ciliopathy, Joubert syndrome. Despite a lack of sequence homology, the genes are aligned in a head-to-tail configuration and joined by chromosomal rearrangement at the amphibian-to-reptile evolutionary transition. Expression of the two genes is mediated by a conserved regulatory element in the noncoding intergenic region. Coordinated expression is important for their interdependent cellular role in vesicular transport to primary cilia. Hence, during vertebrate evolution of genes involved in ciliogenesis, nonparalogous genes were arranged into a functional gene cluster with shared regulatory elements.
PMCID: PMC3671610  PMID: 22282472
11.  Characterizing and measuring bias in sequence data 
Genome Biology  2013;14(5):R51.
DNA sequencing technologies deviate from the ideal uniform distribution of reads. These biases impair scientific and medical applications. Accordingly, we have developed computational methods for discovering, describing and measuring bias.
We applied these methods to the Illumina, Ion Torrent, Pacific Biosciences and Complete Genomics sequencing platforms, using data from human and from a set of microbes with diverse base compositions. As in previous work, library construction conditions significantly influence sequencing bias. Pacific Biosciences coverage levels are the least biased, followed by Illumina, although all technologies exhibit error-rate biases in high- and low-GC regions and at long homopolymer runs. The GC-rich regions prone to low coverage include a number of human promoters, so we catalog 1,000 that were exceptionally resistant to sequencing. Our results indicate that combining data from two technologies can reduce coverage bias if the biases in the component technologies are complementary and of similar magnitude. Analysis of Illumina data representing 120-fold coverage of a well-studied human sample reveals that 0.20% of the autosomal genome was covered at less than 10% of the genome-wide average. Excluding locations that were similar to known bias motifs or likely due to sample-reference variations left only 0.045% of the autosomal genome with unexplained poor coverage.
The assays presented in this paper provide a comprehensive view of sequencing bias, which can be used to drive laboratory improvements and to monitor production processes. Development guided by these assays should result in improved genome assemblies and better coverage of biologically important loci.
PMCID: PMC4053816  PMID: 23718773
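The sequencing-bias entry summarizes coverage as the fraction of the genome covered at less than 10% of the genome-wide average, and it relates coverage loss to GC content. A minimal sketch of both summaries follows, assuming per-position depths and fixed-size reference windows with their depths are already available. The windowing, the 10% default and the GC bin width are illustrative choices, not the assays described in the paper.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def fraction_poorly_covered(depths: Sequence[int], threshold: float = 0.10) -> float:
    """Fraction of positions whose depth is below `threshold` times the mean depth."""
    avg = mean(depths)
    cutoff = threshold * avg
    return sum(1 for d in depths if d < cutoff) / len(depths)


def coverage_by_gc(windows: Sequence[str], depths: Sequence[float], bin_width: int = 10) -> Dict[int, float]:
    """Mean depth of windows grouped by GC percentage, normalized to the overall mean.

    A value of 1.0 means windows in that GC bin are covered at the average depth;
    values well below 1.0 in high- or low-GC bins indicate GC bias.
    """
    if len(windows) != len(depths):
        raise ValueError("need one depth per window")
    overall = mean(depths)
    bins: Dict[int, List[float]] = defaultdict(list)
    for seq, depth in zip(windows, depths):
        gc_percent = 100 * sum(base in "GCgc" for base in seq) / len(seq)
        bins[int(gc_percent // bin_width) * bin_width].append(depth)
    return {gc_bin: mean(ds) / overall for gc_bin, ds in sorted(bins.items())}
```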
12.  Premetazoan genome evolution and the regulation of cell differentiation in the choanoflagellate Salpingoeca rosetta 
Genome Biology  2013;14(2):R15.
Metazoan multicellularity is rooted in mechanisms of cell adhesion, signaling, and differentiation that first evolved in the progenitors of metazoans. To reconstruct the genome composition of metazoan ancestors, we sequenced the genome and transcriptome of the choanoflagellate Salpingoeca rosetta, a close relative of metazoans that forms rosette-shaped colonies of cells.
A comparison of the 55 Mb S. rosetta genome with genomes from diverse opisthokonts suggests that the origin of metazoans was preceded by a period of dynamic gene gain and loss. The S. rosetta genome encodes homologs of cell adhesion, neuropeptide, and glycosphingolipid metabolism genes previously found only in metazoans and expands the repertoire of genes inferred to have been present in the progenitors of metazoans and choanoflagellates. Transcriptome analysis revealed that all four S. rosetta septins are upregulated in colonies relative to single cells, suggesting that these conserved cytokinesis proteins may regulate incomplete cytokinesis during colony development. Furthermore, genes shared exclusively by metazoans and choanoflagellates were disproportionately upregulated in colonies and the single cells from which they develop.
The S. rosetta genome sequence refines the catalog of metazoan-specific genes while also extending the evolutionary history of certain gene families that are central to metazoan biology. Transcriptome data suggest that conserved cytokinesis genes, including septins, may contribute to S. rosetta colony formation and indicate that the initiation of colony development may preferentially draw upon genes shared with metazoans, while later stages of colony maturation are likely regulated by genes unique to S. rosetta.
PMCID: PMC4054682  PMID: 23419129
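The S. rosetta entry reports that septins are upregulated in colonies relative to single cells. The simplest way to express such a comparison is a log2 fold change between expression values in the two conditions. The sketch below assumes normalized expression values per gene (the gene names in any usage are hypothetical) and applies a fixed fold-change cutoff; a real differential-expression call would also need replicates and a statistical test, which this omits.

```python
import math
from typing import Dict, List


def log2_fold_changes(colony: Dict[str, float], single: Dict[str, float],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """log2((colony + pc) / (single + pc)) for genes measured in both conditions.

    The pseudocount avoids division by zero for genes silent in one condition.
    """
    shared = colony.keys() & single.keys()
    return {
        gene: math.log2((colony[gene] + pseudocount) / (single[gene] + pseudocount))
        for gene in sorted(shared)
    }


def upregulated(lfc: Dict[str, float], min_lfc: float = 1.0) -> List[str]:
    """Genes whose log2 fold change is at least `min_lfc` (1.0 means at least 2-fold)."""
    return [gene for gene, value in lfc.items() if value >= min_lfc]
```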
Nature  2012;488(7409):106-110.
Medulloblastomas are the most common malignant brain tumors in children [1]. Identifying and understanding the genetic events that drive these tumors is critical for the development of more effective diagnostic, prognostic and therapeutic strategies. Recently, our group and others described distinct molecular subtypes of medulloblastoma based on transcriptional and copy number profiles [2–5]. Here, we utilized whole exome hybrid capture and deep sequencing to identify somatic mutations across the coding regions of 92 primary medulloblastoma/normal pairs. Overall, medulloblastomas exhibit low mutation rates consistent with other pediatric tumors, with a median of 0.35 non-silent mutations per megabase. We identified twelve genes mutated at statistically significant frequencies, including previously known mutated genes in medulloblastoma such as CTNNB1, PTCH1, MLL2, SMARCA4 and TP53. Recurrent somatic mutations were identified in an RNA helicase gene, DDX3X, often concurrent with CTNNB1 mutations, and in the nuclear co-repressor (N-CoR) complex genes GPS2, BCOR, and LDB1, novel findings in medulloblastoma. We show that mutant DDX3X potentiates transactivation of a TCF promoter and enhances cell viability in combination with mutant but not wild-type beta-catenin. Together, our study reveals the alteration of Wnt, Hedgehog, histone methyltransferase and now N-CoR pathways across medulloblastomas and within specific subtypes of this disease, and nominates the RNA helicase DDX3X as a component of pathogenic beta-catenin signaling in medulloblastoma.
PMCID: PMC3413789  PMID: 22820256
14.  A framework for human microbiome research 
Methé, Barbara A. | Nelson, Karen E. | Pop, Mihai | Creasy, Heather H. | Giglio, Michelle G. | Huttenhower, Curtis | Gevers, Dirk | Petrosino, Joseph F. | Abubucker, Sahar | Badger, Jonathan H. | Chinwalla, Asif T. | Earl, Ashlee M. | FitzGerald, Michael G. | Fulton, Robert S. | Hallsworth-Pepin, Kymberlie | Lobos, Elizabeth A. | Madupu, Ramana | Magrini, Vincent | Martin, John C. | Mitreva, Makedonka | Muzny, Donna M. | Sodergren, Erica J. | Versalovic, James | Wollam, Aye M. | Worley, Kim C. | Wortman, Jennifer R. | Young, Sarah K. | Zeng, Qiandong | Aagaard, Kjersti M. | Abolude, Olukemi O. | Allen-Vercoe, Emma | Alm, Eric J. | Alvarado, Lucia | Andersen, Gary L. | Anderson, Scott | Appelbaum, Elizabeth | Arachchi, Harindra M. | Armitage, Gary | Arze, Cesar A. | Ayvaz, Tulin | Baker, Carl C. | Begg, Lisa | Belachew, Tsegahiwot | Bhonagiri, Veena | Bihan, Monika | Blaser, Martin J. | Bloom, Toby | Vivien Bonazzi, J. | Brooks, Paul | Buck, Gregory A. | Buhay, Christian J. | Busam, Dana A. | Campbell, Joseph L. | Canon, Shane R. | Cantarel, Brandi L. | Chain, Patrick S. | Chen, I-Min A. | Chen, Lei | Chhibba, Shaila | Chu, Ken | Ciulla, Dawn M. | Clemente, Jose C. | Clifton, Sandra W. | Conlan, Sean | Crabtree, Jonathan | Cutting, Mary A. | Davidovics, Noam J. | Davis, Catherine C. | DeSantis, Todd Z. | Deal, Carolyn | Delehaunty, Kimberley D. | Dewhirst, Floyd E. | Deych, Elena | Ding, Yan | Dooling, David J. | Dugan, Shannon P. | Dunne, Wm. Michael | Durkin, A. Scott | Edgar, Robert C. | Erlich, Rachel L. | Farmer, Candace N. | Farrell, Ruth M. | Faust, Karoline | Feldgarden, Michael | Felix, Victor M. | Fisher, Sheila | Fodor, Anthony A. | Forney, Larry | Foster, Leslie | Di Francesco, Valentina | Friedman, Jonathan | Friedrich, Dennis C. | Fronick, Catrina C. | Fulton, Lucinda L. | Gao, Hongyu | Garcia, Nathalia | Giannoukos, Georgia | Giblin, Christina | Giovanni, Maria Y. | Goldberg, Jonathan M. 
| Goll, Johannes | Gonzalez, Antonio | Griggs, Allison | Gujja, Sharvari | Haas, Brian J. | Hamilton, Holli A. | Harris, Emily L. | Hepburn, Theresa A. | Herter, Brandi | Hoffmann, Diane E. | Holder, Michael E. | Howarth, Clinton | Huang, Katherine H. | Huse, Susan M. | Izard, Jacques | Jansson, Janet K. | Jiang, Huaiyang | Jordan, Catherine | Joshi, Vandita | Katancik, James A. | Keitel, Wendy A. | Kelley, Scott T. | Kells, Cristyn | Kinder-Haake, Susan | King, Nicholas B. | Knight, Rob | Knights, Dan | Kong, Heidi H. | Koren, Omry | Koren, Sergey | Kota, Karthik C. | Kovar, Christie L. | Kyrpides, Nikos C. | La Rosa, Patricio S. | Lee, Sandra L. | Lemon, Katherine P. | Lennon, Niall | Lewis, Cecil M. | Lewis, Lora | Ley, Ruth E. | Li, Kelvin | Liolios, Konstantinos | Liu, Bo | Liu, Yue | Lo, Chien-Chi | Lozupone, Catherine A. | Lunsford, R. Dwayne | Madden, Tessa | Mahurkar, Anup A. | Mannon, Peter J. | Mardis, Elaine R. | Markowitz, Victor M. | Mavrommatis, Konstantinos | McCorrison, Jamison M. | McDonald, Daniel | McEwen, Jean | McGuire, Amy L. | McInnes, Pamela | Mehta, Teena | Mihindukulasuriya, Kathie A. | Miller, Jason R. | Minx, Patrick J. | Newsham, Irene | Nusbaum, Chad | O’Laughlin, Michelle | Orvis, Joshua | Pagani, Ioanna | Palaniappan, Krishna | Patel, Shital M. | Pearson, Matthew | Peterson, Jane | Podar, Mircea | Pohl, Craig | Pollard, Katherine S. | Priest, Margaret E. | Proctor, Lita M. | Qin, Xiang | Raes, Jeroen | Ravel, Jacques | Reid, Jeffrey G. | Rho, Mina | Rhodes, Rosamond | Riehle, Kevin P. | Rivera, Maria C. | Rodriguez-Mueller, Beltran | Rogers, Yu-Hui | Ross, Matthew C. | Russ, Carsten | Sanka, Ravi K. | Pamela Sankar, J. | Sathirapongsasuti, Fah | Schloss, Jeffery A. | Schloss, Patrick D. | Schmidt, Thomas M. | Scholz, Matthew | Schriml, Lynn | Schubert, Alyxandria M. | Segata, Nicola | Segre, Julia A. | Shannon, William D. | Sharp, Richard R. | Sharpton, Thomas J. | Shenoy, Narmada | Sheth, Nihar U. | Simone, Gina A. 
| Singh, Indresh | Smillie, Chris S. | Sobel, Jack D. | Sommer, Daniel D. | Spicer, Paul | Sutton, Granger G. | Sykes, Sean M. | Tabbaa, Diana G. | Thiagarajan, Mathangi | Tomlinson, Chad M. | Torralba, Manolito | Treangen, Todd J. | Truty, Rebecca M. | Vishnivetskaya, Tatiana A. | Walker, Jason | Wang, Lu | Wang, Zhengyuan | Ward, Doyle V. | Warren, Wesley | Watson, Mark A. | Wellington, Christopher | Wetterstrand, Kris A. | White, James R. | Wilczek-Boney, Katarzyna | Wu, Yuan Qing | Wylie, Kristine M. | Wylie, Todd | Yandava, Chandri | Ye, Liang | Ye, Yuzhen | Yooseph, Shibu | Youmans, Bonnie P. | Zhang, Lan | Zhou, Yanjiao | Zhu, Yiming | Zoloth, Laurie | Zucker, Jeremy D. | Birren, Bruce W. | Gibbs, Richard A. | Highlander, Sarah K. | Weinstock, George M. | Wilson, Richard K. | White, Owen
Nature  2012;486(7402):215-221.
A variety of microbial communities and their genes (microbiome) exist throughout the human body, playing fundamental roles in human health and disease. The NIH-funded Human Microbiome Project (HMP) Consortium has established a population-scale framework which catalyzed significant development of metagenomic protocols, resulting in a broad range of quality-controlled resources and data, including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 to 18 body sites up to three times, which, to date, have generated 5,177 microbial taxonomic profiles from 16S rRNA genes and over 3.5 Tb of metagenomic sequence. In parallel, approximately 800 human-associated reference genomes have been sequenced. Collectively, these data represent the largest resource to date describing the abundance and variety of the human microbiome, while providing a platform for current and future studies.
PMCID: PMC3377744  PMID: 22699610
15.  Pacific biosciences sequencing technology for genotyping and variation discovery in human data 
BMC Genomics  2012;13:375.
Pacific Biosciences technology provides a fundamentally new data type with the potential to overcome some limitations of current next-generation sequencing platforms by offering significantly longer reads, single-molecule sequencing, low composition bias and an error profile that is orthogonal to other platforms. With these potential advantages in mind, we here evaluate the utility of the Pacific Biosciences RS platform for human medical amplicon resequencing projects.
We evaluated the Pacific Biosciences technology for SNP discovery in medical resequencing projects using the Genome Analysis Toolkit, observing high sensitivity and specificity for calling differences in amplicons containing known true or false SNPs. We assessed data quality: most errors were indels (~14%) with few apparent miscalls (~1%). In this work, we define a custom data processing pipeline for Pacific Biosciences data for human data analysis.
Critically, the error properties were largely free of the context-specific effects that affect other sequencing technologies. These data show excellent utility for follow-up validation and extension studies in human data and medical genetics projects, and the approach can be extended to other organisms with a reference genome.
PMCID: PMC3443046  PMID: 22863213
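The Pacific Biosciences entry breaks the error profile into indels and miscalls. Given a gapped pairwise alignment of a read against its reference, those categories can be tallied column by column, as in the sketch below. It assumes '-' marks a gap and uses informative aligned columns as the denominator; it is a generic illustration, not the custom pipeline or the Genome Analysis Toolkit processing described in the abstract.

```python
from collections import Counter
from typing import Dict


def classify_alignment_errors(ref_aln: str, read_aln: str) -> Counter:
    """Count matches, mismatches, insertions and deletions in a gapped alignment.

    Insertion: a read base aligned to a reference gap.
    Deletion: a reference base aligned to a read gap.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned strings must have equal length")
    counts = Counter({"match": 0, "mismatch": 0, "insertion": 0, "deletion": 0})
    for ref_base, read_base in zip(ref_aln.upper(), read_aln.upper()):
        if ref_base == "-" and read_base == "-":
            continue  # a column gapped in both rows carries no information
        if ref_base == "-":
            counts["insertion"] += 1
        elif read_base == "-":
            counts["deletion"] += 1
        elif ref_base == read_base:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


def error_rates(counts: Counter) -> Dict[str, float]:
    """Mismatch and indel rates per informative aligned column."""
    total = counts["match"] + counts["mismatch"] + counts["insertion"] + counts["deletion"]
    if total == 0:
        raise ValueError("alignment has no informative columns")
    return {
        "mismatch": counts["mismatch"] / total,
        "indel": (counts["insertion"] + counts["deletion"]) / total,
    }
```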
16.  Genome-wide identification and characterization of replication origins by deep sequencing 
Genome Biology  2012;13(4):R27.
DNA replication initiates at distinct origins in eukaryotic genomes, but the genomic features that define these sites are not well understood.
We have taken a combined experimental and bioinformatic approach to identify and characterize origins of replication in three distantly related fission yeasts: Schizosaccharomyces pombe, Schizosaccharomyces octosporus and Schizosaccharomyces japonicus. Using single-molecule deep sequencing to construct amplification-free high-resolution replication profiles, we located origins and identified sequence motifs that predict origin function. We then mapped nucleosome occupancy by deep sequencing of mononucleosomal DNA from the corresponding species, finding that origins tend to occupy nucleosome-depleted regions.
The sequences that specify origins are evolutionarily plastic, with low complexity nucleosome-excluding sequences functioning in S. pombe and S. octosporus, and binding sites for trans-acting nucleosome-excluding proteins functioning in S. japonicus. Furthermore, chromosome-scale variation in replication timing is conserved independently of origin location and via a mechanism distinct from known heterochromatic effects on origin function. These results are consistent with a model in which origins are simply the nucleosome-depleted regions of the genome with the highest affinity for the origin recognition complex. This approach provides a general strategy for understanding the mechanisms that define DNA replication origins in eukaryotes.
PMCID: PMC3446301  PMID: 22531001
17.  Comparative Functional Genomics of the Fission Yeasts 
Science (New York, N.Y.)  2011;332(6032):930-936.
The fission yeast clade, comprising Schizosaccharomyces pombe, S. octosporus, S. cryophilus and S. japonicus, occupies the basal branch of Ascomycete fungi and is an important model of eukaryote biology. A comparative annotation of these genomes identified a near extinction of transposons and the associated innovation of transposon-free centromeres. Expression analysis established that meiotic genes are subject to antisense transcription during vegetative growth, suggesting a mechanism for their tight regulation. In addition, trans-acting regulators control new genes within the context of expanded functional modules for meiosis and stress response. Differences in gene content and regulation also explain why, unlike the Saccharomycotina, fission yeasts cannot use ethanol as a primary carbon source. These analyses elucidate the genome structure and gene regulation of fission yeast and provide tools for investigation across the Schizosaccharomyces clade.
PMCID: PMC3131103  PMID: 21511999
18.  Key considerations for measuring allelic expression on a genomic scale using high-throughput sequencing 
Molecular ecology  2010;19(Suppl 1):212-227.
Differences in gene expression are thought to be an important source of phenotypic diversity, so dissecting the genetic components of natural variation in gene expression is important for understanding the evolutionary mechanisms that lead to adaptation. Gene expression is a complex trait that, in diploid organisms, results from transcription of both maternal and paternal alleles. Directly measuring allelic expression rather than total gene expression offers greater insight into regulatory variation. The recent emergence of high-throughput sequencing offers an unprecedented opportunity to study allelic transcription at a genomic scale for virtually any species. By sequencing transcript pools derived from heterozygous individuals, estimates of allelic expression can be directly obtained. The statistical power of this approach is influenced by the number of transcripts sequenced and the ability to unambiguously assign individual sequence fragments to specific alleles on the basis of transcribed nucleotide polymorphisms. Here, using mathematical modelling and computer simulations, we determine the minimum sequencing depth required to accurately measure relative allelic expression and detect allelic imbalance via high-throughput sequencing under a variety of conditions. We conclude that, within a species, a minimum of 500–1000 sequencing reads per gene are needed to test for allelic imbalance, and consequently, at least five to 10 million reads are required for studying a genome expressing 10 000 genes. Finally, using 454 sequencing, we illustrate an application of allelic expression by testing for cis-regulatory divergence between closely related Drosophila species.
PMCID: PMC3217793  PMID: 20331781
cis-regulation; Drosophila melanogaster; Drosophila simulans; gene expression; hybrids
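The depth figures in the entry above follow from simple multiplication: 500–1000 reads per gene × 10 000 genes = 5–10 million reads. The Python sketch below shows that arithmetic alongside one common way to test a single gene for allelic imbalance, an exact two-sided binomial test of reads carrying one allele against an expected 50:50 ratio. The choice of the binomial test and all function names are illustrative assumptions, not the authors' modelling procedure.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial P(X = k), computed in log space so the binomial coefficient
    does not overflow at high read depths (requires 0 < p < 1)."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def allelic_imbalance_p(ref_reads: int, alt_reads: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test of ref_reads out of ref_reads + alt_reads
    against the expected allele ratio p (0.5 = balanced expression)."""
    n = ref_reads + alt_reads
    pmf = [binom_pmf(i, n, p) for i in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum every outcome no more probable than the observed one; the small
    # relative tolerance keeps floating-point ties (e.g. k and n - k) in the sum.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(1.0, total)


def total_reads_needed(genes: int, reads_per_gene: int) -> int:
    """Library size needed if every expressed gene must reach the per-gene minimum."""
    return genes * reads_per_gene


if __name__ == "__main__":
    for per_gene in (500, 1000):
        print(per_gene, total_reads_needed(10_000, per_gene))  # 5000000, then 10000000
    print(allelic_imbalance_p(60, 40))    # same 60:40 ratio at 100 reads
    print(allelic_imbalance_p(600, 400))  # ... and at 1000 reads
```

The last two calls illustrate why depth matters: a 60:40 allele ratio is not significant at the 5% level with 100 reads (p ≈ 0.057) but is highly significant with 1000 reads.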
19.  Hybrid selection for sequencing pathogen genomes from clinical samples 
Genome Biology  2011;12(8):R73.
We have adapted a solution hybrid selection protocol to enrich pathogen DNA in clinical samples dominated by human genetic material. Using mock mixtures of human and Plasmodium falciparum malaria parasite DNA as well as clinical samples from infected patients, we demonstrate an average of approximately 40-fold enrichment of parasite DNA after hybrid selection. This approach will enable efficient genome sequencing of pathogens from clinical samples, as well as sequencing of endosymbiotic organisms such as Wolbachia that live inside diverse metazoan phyla.
PMCID: PMC3245613  PMID: 21835008
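A "~40-fold enrichment" is most naturally read as the ratio of the parasite share of reads after hybrid selection to the share before it. That definition is an assumption here, because the abstract does not spell out the calculation. The Python sketch below uses hypothetical read counts and exact rational arithmetic so the ratio is not distorted by floating-point rounding.

```python
from fractions import Fraction


def fold_enrichment(target_before: int, total_before: int,
                    target_after: int, total_after: int) -> float:
    """Ratio of the target (e.g. parasite) read fraction after selection
    to the target read fraction before selection (assumed definition)."""
    if 0 in (target_before, total_before, total_after):
        raise ValueError("need nonzero target reads before, and nonzero totals")
    before = Fraction(target_before, total_before)
    after = Fraction(target_after, total_after)
    return float(after / before)


if __name__ == "__main__":
    # Hypothetical sample: 1% parasite reads before selection, 40% after.
    print(fold_enrichment(100, 10_000, 4_000, 10_000))
```

For these hypothetical counts the result is 40.0, matching the order of enrichment reported in the entry.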
20.  Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries 
Genome Biology  2011;12(2):R18.
Despite the ever-increasing output of Illumina sequencing data, loci with extreme base compositions are often under-represented or absent. To evaluate sources of base-composition bias, we traced genomic sequences ranging from 6% to 90% GC through the process by quantitative PCR. We identified PCR during library preparation as a principal source of bias and optimized the conditions. Our improved protocol significantly reduces amplification bias and minimizes the previously severe effects of PCR instrument and temperature ramp rate.
PMCID: PMC3188800  PMID: 21338519
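The 6%–90% range in the entry above refers to the GC content of the traced loci. The Python sketch below computes that quantity for a whole sequence and for fixed windows tiled along it. The window and step sizes are arbitrary illustrative parameters, not values taken from the study, and ambiguous bases such as N are excluded from the denominator by assumption.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T).
    Other symbols such as N are ignored in both numerator and denominator."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("no A/C/G/T bases to score")
    return (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """GC fraction of each full-length window, keyed by 0-based window start."""
    return [(start, gc_fraction(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    print(gc_fraction("ATGCNN"))            # N is ignored
    print(windowed_gc("AAAAGGGG", 4, 4))    # AT-rich window, then GC-rich window
```

Here `gc_fraction("ATGCNN")` returns 0.5 and `windowed_gc("AAAAGGGG", 4, 4)` returns `[(0, 0.0), (4, 1.0)]`, the two extremes that the study found PCR handles worst.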
21.  A cellular memory of developmental history generates phenotypic diversity in C. elegans 
Current biology : CB  2010;20(2):149-155.
Early life experiences have a major impact on adult phenotypes [1–3]. However, the mechanisms by which animals retain a cellular memory of early experience are not well understood. Here we show that adult wild-type C. elegans that transiently passed through the stress-resistant dauer larval stage exhibit distinct gene expression profiles and life history traits, as compared to adult animals that bypassed this stage. Using chromatin immunoprecipitation experiments coupled with massively parallel sequencing, we find that genome-wide levels of specific histone tail modifications are markedly altered in post-dauer animals. Mutations in subsets of genes implicated in chromatin remodeling abolish, or alter, the observed changes in gene expression and life history traits in post-dauer animals. Modifications to the epigenome as a consequence of early experience may contribute in part to a memory of early experience, and generate phenotypic variation in an isogenic population.
PMCID: PMC2990539  PMID: 20079644
22.  Mutations in TMEM216 perturb ciliogenesis and cause Joubert, Meckel and related syndromes 
Nature genetics  2010;42(7):619-625.
Joubert syndrome (JBTS), related disorders (JSRD) and Meckel syndrome (MKS) are ciliopathies. We now report that MKS2 and JBTS2 loci are allelic and due to mutations in TMEM216, encoding an uncharacterized tetraspan transmembrane protein. JBTS2 patients displayed frequent nephronophthisis and polydactyly, and two cases conformed to the Oro-Facio-Digital type VI phenotype, whereas skeletal dysplasia was common in MKS fetuses. A single p.R73L mutation was identified in all patients of Ashkenazi Jewish descent (n=10). TMEM216 localized to the base of primary cilia, and loss of TMEM216 in patient fibroblasts or following siRNA knockdown caused defective ciliogenesis and centrosomal docking, with concomitant hyperactivation of RhoA and Dishevelled. TMEM216 complexed with Meckelin, encoded by a gene also mutated in JSRD and MKS. Abrogation of tmem216 expression in zebrafish led to gastrulation defects that overlap with other ciliary morphants. The data implicate a new family of proteins in the ciliopathies, and further support allelism between ciliopathy disorders.
PMCID: PMC2894012  PMID: 20512146
24.  Genome-wide maps of chromatin state in pluripotent and lineage-committed cells 
Nature  2007;448(7153):553-560.
We report the application of single molecule-based sequencing technology for high-throughput profiling of histone modifications in mammalian cells. By obtaining over 4 billion bases of sequence from chromatin immunoprecipitated DNA, we generated genome-wide chromatin state maps of mouse embryonic stem cells, neural progenitor cells and embryonic fibroblasts. We find that lysine 4 and lysine 27 tri-methylation effectively discriminate genes that are expressed, poised for expression, or stably repressed, and therefore reflect cell state and lineage potential. Lysine 36 tri-methylation marks primary coding and non-coding transcripts, facilitating gene annotation. Lysine 9 and lysine 20 tri-methylation are detected at satellite, telomeric and active long-terminal repeats, and can spread into proximal unique sequences. Lysine 4 and lysine 9 tri-methylation mark imprinting control regions. Finally, we show that chromatin state can be read in an allele-specific manner by using single nucleotide polymorphisms. This study provides a framework for the application of comprehensive chromatin profiling towards characterization of diverse mammalian cell populations.
PMCID: PMC2921165  PMID: 17603471
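Reading chromatin state in an allele-specific manner relies on assigning each immunoprecipitated read to a parental allele by the base it carries at a heterozygous SNP. The Python sketch below shows that assignment step only. It assumes ungapped alignments supplied as hypothetical `(0-based start, sequence)` pairs on the same chromosome and uppercase `ref`/`alt` bases, and it ignores base qualities, strand and indels, which a real pipeline working from alignment files would need to handle.

```python
def allele_counts(reads, snp_pos: int, ref: str, alt: str) -> dict[str, int]:
    """Count reads supporting each allele at one heterozygous SNP.

    reads:   iterable of (start, sequence) for ungapped alignments, 0-based start
    snp_pos: 0-based SNP coordinate on the same chromosome
    ref/alt: uppercase single-letter alleles (assumption of this sketch)
    """
    counts = {ref: 0, alt: 0, "other": 0}
    for start, seq in reads:
        offset = snp_pos - start
        if not 0 <= offset < len(seq):
            continue  # read does not cover the SNP
        base = seq[offset].upper()
        if base == ref:
            counts[ref] += 1
        elif base == alt:
            counts[alt] += 1
        else:
            counts["other"] += 1  # sequencing error or an unexpected third allele
    return counts


if __name__ == "__main__":
    # Hypothetical reads around a G/A SNP at position 2.
    reads = [(0, "ACGT"), (1, "CGTA"), (2, "AAAC"), (1, "CTTA"), (10, "AAAA")]
    print(allele_counts(reads, snp_pos=2, ref="G", alt="A"))
```

Per-allele counts like these, gathered separately for each histone-mark ChIP library, are what allow a modification to be attributed to one parental chromosome.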
25.  A scalable, fully automated process for construction of sequence-ready barcoded libraries for 454 
Genome Biology  2010;11(2):R15.
An automated method for constructing libraries for 454 sequencing significantly reduces the cost and time required.
We present an automated, high throughput library construction process for 454 technology. Sample handling errors and cross-contamination are minimized via end-to-end barcoding of plasticware, along with molecular DNA barcoding of constructs. Automation-friendly magnetic bead-based size selection and cleanup steps have been devised, eliminating major bottlenecks and significant sources of error. Using this methodology, one technician can create 96 sequence-ready 454 libraries in 2 days, a dramatic improvement over the standard method.
PMCID: PMC2872875  PMID: 20137071
