1.  Ebola Virus Epidemiology, Transmission, and Evolution during Seven Months in Sierra Leone 
Park, Daniel J. | Dudas, Gytis | Wohl, Shirlee | Goba, Augustine | Whitmer, Shannon L.M. | Andersen, Kristian G. | Sealfon, Rachel S. | Ladner, Jason T. | Kugelman, Jeffrey R. | Matranga, Christian B. | Winnicki, Sarah M. | Qu, James | Gire, Stephen K. | Gladden-Young, Adrianne | Jalloh, Simbirie | Nosamiefan, Dolo | Yozwiak, Nathan L. | Moses, Lina M. | Jiang, Pan-Pan | Lin, Aaron E. | Schaffner, Stephen F. | Bird, Brian | Towner, Jonathan | Mamoh, Mambu | Gbakie, Michael | Kanneh, Lansana | Kargbo, David | Massally, James L.B. | Kamara, Fatima K. | Konuwa, Edwin | Sellu, Josephine | Jalloh, Abdul A. | Mustapha, Ibrahim | Foday, Momoh | Yillah, Mohamed | Erickson, Bobbie R. | Sealy, Tara | Blau, Dianna | Paddock, Christopher | Brault, Aaron | Amman, Brian | Basile, Jane | Bearden, Scott | Belser, Jessica | Bergeron, Eric | Campbell, Shelley | Chakrabarti, Ayan | Dodd, Kimberly | Flint, Mike | Gibbons, Aridth | Goodman, Christin | Klena, John | McMullan, Laura | Morgan, Laura | Russell, Brandy | Salzer, Johanna | Sanchez, Angela | Wang, David | Jungreis, Irwin | Tomkins-Tinch, Christopher | Kislyuk, Andrey | Lin, Michael F. | Chapman, Sinead | MacInnis, Bronwyn | Matthews, Ashley | Bochicchio, James | Hensley, Lisa E. | Kuhn, Jens H. | Nusbaum, Chad | Schieffelin, John S. | Birren, Bruce W. | Forget, Marc | Nichol, Stuart T. | Palacios, Gustavo F. | Ndiaye, Daouda | Happi, Christian | Gevao, Sahr M. | Vandi, Mohamed A. | Kargbo, Brima | Holmes, Edward C. | Bedford, Trevor | Gnirke, Andreas | Ströher, Ute | Rambaut, Andrew | Garry, Robert F. | Sabeti, Pardis C.
Cell  2015;161(7):1516-1526.
The 2013–2015 Ebola virus disease (EVD) epidemic is caused by the Makona variant of Ebola virus (EBOV). Early in the epidemic, genome sequencing provided insights into virus evolution and transmission and offered important information for outbreak response. Here, we analyze sequences from 232 patients sampled over 7 months in Sierra Leone, along with 86 previously released genomes from earlier in the epidemic. We confirm sustained human-to-human transmission within Sierra Leone and find no evidence for import or export of EBOV across national borders after its initial introduction. Using high-depth replicate sequencing, we observe both host-to-host transmission and recurrent emergence of intrahost genetic variants. We trace the increasing impact of purifying selection in suppressing the accumulation of nonsynonymous mutations over time. Finally, we note changes in the mucin-like domain of EBOV glycoprotein that merit further investigation. These findings clarify the movement of EBOV within the region and describe viral evolution during prolonged human-to-human transmission.
Graphical Abstract
• In Sierra Leone, transmission has primarily been within-country, not between-country
• Infectious doses are large enough for intrahost variants to transmit between hosts
• A prolonged epidemic removes deleterious mutations from the viral population
• There is preliminary evidence for human RNA editing effects on the Ebola genome
Ebola virus genomes from 232 patients sampled over 7 months in Sierra Leone were sequenced. Transmission of intrahost genetic variants suggests a sufficiently high infectious dose during transmission. The human host may have caused direct alterations to the Ebola virus genome.
PMCID: PMC4503805  PMID: 26091036
2.  Comprehensive variation discovery in single human genomes 
Nature genetics  2014;46(12):1350-1355.
Complete knowledge of the genetic variation in individual human genomes is a crucial foundation for understanding the etiology of disease. Genetic variation is typically characterized by sequencing individual genomes and comparing reads to a reference. Existing methods do an excellent job of detecting variants in approximately 90% of the human genome; however, calling variants in the remaining 10% of the genome (largely low-complexity sequence and segmental duplications) is challenging. To improve variant calling, we developed a new algorithm, DISCOVAR, and examined its performance on improved, low-cost sequence data. Using a newly created reference set of variants from finished sequence of 103 randomly chosen Fosmids, we find that some standard variant call sets miss up to 25% of variants. We show that the combination of new methods and improved data increases sensitivity several-fold, with the greatest impact in challenging regions of the human genome.
PMCID: PMC4244235  PMID: 25326702
3.  Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak 
Science (New York, N.Y.)  2014;345(6202):1369-1372.
In its largest outbreak, Ebola virus disease is spreading through Guinea, Liberia, Sierra Leone, and Nigeria. We sequenced 99 Ebola virus genomes from 78 patients in Sierra Leone to ∼2000× coverage. We observed a rapid accumulation of interhost and intrahost genetic variation, allowing us to characterize patterns of viral transmission over the initial weeks of the epidemic. This West African variant likely diverged from central African lineages around 2004, crossed from Guinea to Sierra Leone in May 2014, and has exhibited sustained human-to-human transmission subsequently, with no evidence of additional zoonotic sources. Because many of the mutations alter protein sequences and other biologically meaningful targets, they should be monitored for impact on diagnostics, vaccines, and therapies critical to outbreak response.
PMCID: PMC4431643  PMID: 25214632
4.  Massively Parallel Sequencing of Human Urinary Exosome/Microvesicle RNA Reveals a Predominance of Non-Coding RNA 
PLoS ONE  2014;9(5):e96094.
Intact RNA from exosomes/microvesicles (collectively referred to as microvesicles) has sparked much interest as potential biomarkers for the non-invasive analysis of disease. Here we use the Illumina Genome Analyzer to determine the comprehensive array of nucleic acid reads present in urinary microvesicles. Extraneous nucleic acids were digested using RNase and DNase treatment and the microvesicle inner nucleic acid cargo was analyzed with and without DNase digestion to examine both DNA and RNA sequences contained in microvesicles. Results revealed that a substantial proportion (∼87%) of reads aligned to ribosomal RNA. Of the non-ribosomal RNA sequences, ∼60% aligned to non-coding RNA and repeat sequences including LINE, SINE, satellite repeats, and RNA repeats (tRNA, snRNA, scRNA and srpRNA). The remaining ∼40% of non-ribosomal RNA reads aligned to protein coding genes and splice sites encompassing approximately 13,500 of the known 21,892 protein coding genes of the human genome. Analysis of protein coding genes specific to the renal and genitourinary tract revealed that complete segments of the renal nephron and collecting duct as well as genes indicative of the bladder and prostate could be identified. This study reveals that the entire genitourinary system may be mapped using microvesicle transcript analysis and that the majority of non-ribosomal RNA sequences contained in microvesicles is potentially functional non-coding RNA, which play an emerging role in cell regulation.
PMCID: PMC4015934  PMID: 24816817
5.  Semiconductor-based DNA sequencing of histone modification states 
Nature communications  2013;4:2672.
The recent development of a semiconductor-based, non-optical DNA sequencing technology promises scalable, low-cost and rapid sequence data production. The technology has previously been applied mainly to genomic sequencing and targeted re-sequencing. Here we demonstrate the utility of Ion Torrent semiconductor-based sequencing for sensitive, efficient and rapid chromatin immunoprecipitation followed by sequencing (ChIP-seq) through the application of sample preparation methods that are optimized for ChIP-seq on the Ion Torrent platform. We leverage this method for epigenetic profiling of tumour tissues.
PMCID: PMC3917140  PMID: 24157732
6.  Mutations causing medullary cystic kidney disease type 1 (MCKD1) lie in a large VNTR in MUC1 missed by massively parallel sequencing 
Nature genetics  2013;45(3):299-303.
While genetic lesions responsible for some Mendelian disorders can be rapidly discovered through massively parallel sequencing (MPS) of whole genomes or exomes, not all diseases readily yield to such efforts. We describe the illustrative case of the simple Mendelian disorder medullary cystic kidney disease type 1 (MCKD1), mapped more than a decade ago to a 2-Mb region on chromosome 1. Ultimately, only by cloning, capillary sequencing, and de novo assembly, we found that each of six MCKD1 families harbors an equivalent, but apparently independently arising, mutation in sequence dramatically underrepresented in MPS data: the insertion of a single C in one copy (but a different copy in each family) of the repeat unit comprising the extremely long (~1.5-5 kb), GC-rich (>80%), coding VNTR in the mucin 1 gene. The results provide a cautionary tale about the challenges in identifying genes responsible for Mendelian, let alone more complex, disorders through MPS.
PMCID: PMC3901305  PMID: 23396133
7.  A high throughput Chromatin ImmunoPrecipitation approach reveals principles of dynamic gene regulation in mammals 
Molecular cell  2012;47(5):10.1016/j.molcel.2012.07.030.
Understanding the principles governing mammalian gene regulation has been hampered by the difficulty in measuring in-vivo binding dynamics of large numbers of transcription factors (TF) to DNA. Here, we develop a high-throughput Chromatin ImmunoPrecipitation (HT-ChIP) method to systematically map protein-DNA interactions. HT-ChIP was applied to define the dynamics of DNA binding by 25 TFs and 4 chromatin marks at 4 time-points following pathogen stimulus of dendritic cells. Analyzing over 180,000 TF-DNA interactions we find that TFs vary substantially in their temporal binding landscapes. This data suggests a model for transcription regulation whereby TF networks are hierarchically organized into cell differentiation factors, factors that bind targets prior to stimulus to prime them for induction, and factors that regulate specific gene programs. Overlaying HT-ChIP data on gene expression dynamics shows that many TF-DNA interactions are established prior to the stimuli, predominantly at immediate-early genes, and identified specific TF ensembles that coordinately regulate gene-induction.
PMCID: PMC3873101  PMID: 22940246
8.  Regulated aggregative multicellularity in a close unicellular relative of metazoa 
eLife  2013;2:e01287.
The evolution of metazoans from their unicellular ancestors was one of the most important events in the history of life. However, the cellular and genetic changes that ultimately led to the evolution of multicellularity are not known. In this study, we describe an aggregative multicellular stage in the protist Capsaspora owczarzaki, a close unicellular relative of metazoans. Remarkably, transition to the aggregative stage is associated with significant upregulation of orthologs of genes known to establish multicellularity and tissue architecture in metazoans. We further observe transitions in regulated alternative splicing during the C. owczarzaki life cycle, including the deployment of an exon network associated with signaling, a feature of splicing regulation so far only observed in metazoans. Our results reveal the existence of a highly regulated aggregative stage in C. owczarzaki and further suggest that features of aggregative behavior in an ancestral protist may have been co-opted to develop some multicellular properties currently seen in metazoans.
eLife digest
When living things made from many cells evolved from single-celled ancestors, it was a breakthrough in the history of life—and one that has occurred more than once. In fact, multicellular life has evolved independently at least 25 times, in groups as diverse as animals, fungi, plants, slime molds and seaweeds. There are broadly two ways to become multicellular. The most complex multicellular species, such as animals, will replicate a single cell, over and over, without separating the resultant cells. However, in species that are only occasionally multicellular, free-living cells tend instead to join together in one mass of many cells.
Evolution is constrained by its raw materials, so looking at the living relatives of a given species, or group, can lead to a better understanding of its evolution because its relatives contain clues about its ancestors. To gain insights into how animal multicellular life might have begun, Sebé-Pedrós et al. studied the life cycle of the amoeboid organism Capsaspora owczarzaki. Found within the bodies of freshwater snails, this single-celled amoeba is a close relative of multicellular animals and could resemble one of their earliest ancestors.
At certain stages of the life cycle Sebé-Pedrós et al. noticed that individual amoebae gathered together to form a multicellular mass—something that had not been seen before in such a close relative of the animals. Moreover, the genes that ‘switched on’ when the amoebae began to aggregate are also found in animals; where, together with other genes, they control development and the formation of tissues. Sebé-Pedrós et al. suggest that the first multicellular animals could have recycled the genes that control the aggregation of single-celled species: in other words, genes that once controlled the changes that happen at different times in a life cycle, now control the changes that develop between different tissues at the same time.
Sebé-Pedrós et al. also observed that alternative splicing—a process that allows different proteins to be made from a single gene—occurs via two different mechanisms during the life cycle of Capsaspora. Most of the time Capsaspora employs a form of alternative splicing that is often seen in plants and fungi, and only rarely in animals; for the rest of the time it uses a form of alternative splicing similar to that used by animal cells.
The evolution of complex alternative splicing mechanisms is a hallmark feature of multicellular animals. The exploitation of two major forms of alternative splicing in Capsaspora could thus reflect an important transition during evolution that resulted in an increased diversity of proteins and in more complex gene regulation. Such a transition may ultimately have paved the way for the increased specialization of cell types seen in animals.
This glimpse into the possible transitions in gene regulation that contributed to the birth of multicellular animals indicates that the single-celled ancestor of the animals was likely more complex than previously thought. Future analyses of the animals’ close relatives may further improve our understanding of how single-celled organisms became multicellular animals.
PMCID: PMC3870316  PMID: 24368732
opisthokonts; Capsaspora; alternative splicing; cell differentiation; evolutionary transitions; Other
9.  Distinctive Expansion of Potential Virulence Genes in the Genome of the Oomycete Fish Pathogen Saprolegnia parasitica 
PLoS Genetics  2013;9(6):e1003272.
Oomycetes in the class Saprolegniomycetidae of the Eukaryotic kingdom Stramenopila have evolved as severe pathogens of amphibians, crustaceans, fish and insects, resulting in major losses in aquaculture and damage to aquatic ecosystems. We have sequenced the 63 Mb genome of the fresh water fish pathogen, Saprolegnia parasitica. Approximately 1/3 of the assembled genome exhibits loss of heterozygosity, indicating an efficient mechanism for revealing new variation. Comparison of S. parasitica with plant pathogenic oomycetes suggests that during evolution the host cellular environment has driven distinct patterns of gene expansion and loss in the genomes of plant and animal pathogens. S. parasitica possesses one of the largest repertoires of proteases (270) among eukaryotes that are deployed in waves at different points during infection as determined from RNA-Seq data. In contrast, despite being capable of living saprotrophically, parasitism has led to loss of inorganic nitrogen and sulfur assimilation pathways, strikingly similar to losses in obligate plant pathogenic oomycetes and fungi. The large gene families that are hallmarks of plant pathogenic oomycetes such as Phytophthora appear to be lacking in S. parasitica, including those encoding RXLR effectors, Crinklers, and Necrosis Inducing-Like Proteins (NLP). S. parasitica also has a very large kinome of 543 kinases, 10% of which are induced upon infection. Moreover, S. parasitica encodes several genes typical of animals or animal-pathogens and lacking from other oomycetes, including disintegrins and galactose-binding lectins, whose expression and evolutionary origins implicate horizontal gene transfer in the evolution of animal pathogenesis in S. parasitica.
Author Summary
Fish are an increasingly important source of animal protein globally, with aquaculture production rising dramatically over the past decade. Saprolegnia is a fungal-like oomycete and one of the most destructive fish pathogens, causing millions of dollars in losses to the aquaculture industry annually. Saprolegnia has also been linked to a worldwide decline in wild fish and amphibian populations. Here we describe the genome sequence of the first animal pathogenic oomycete and compare the genome content with the available plant pathogenic oomycetes. We found that Saprolegnia lacks the large effector families that are hallmarks of plant pathogenic oomycetes, showing evolutionary adaptation to the host. Moreover, Saprolegnia harbors pathogenesis-related genes that were derived by lateral gene transfer from the host and other animal pathogens. The retrotransposon LINE family also appears to be acquired from animal lineages. By transcriptome analysis we show a high rate of allelic variation, which reveals rapidly evolving genes and potentially adaptive evolutionary mechanisms coupled to selective pressures exerted by the animal host. The genome and transcriptome data, as well as subsequent biochemical analyses, provided us with insight in the disease process of Saprolegnia at a molecular and cellular level, providing us with targets for sustainable control of Saprolegnia.
PMCID: PMC3681718  PMID: 23785293
10.  Characterizing and measuring bias in sequence data 
Genome Biology  2013;14(5):R51.
DNA sequencing technologies deviate from the ideal uniform distribution of reads. These biases impair scientific and medical applications. Accordingly, we have developed computational methods for discovering, describing and measuring bias.
We applied these methods to the Illumina, Ion Torrent, Pacific Biosciences and Complete Genomics sequencing platforms, using data from human and from a set of microbes with diverse base compositions. As in previous work, library construction conditions significantly influence sequencing bias. Pacific Biosciences coverage levels are the least biased, followed by Illumina, although all technologies exhibit error-rate biases in high- and low-GC regions and at long homopolymer runs. The GC-rich regions prone to low coverage include a number of human promoters, so we catalog 1,000 that were exceptionally resistant to sequencing. Our results indicate that combining data from two technologies can reduce coverage bias if the biases in the component technologies are complementary and of similar magnitude. Analysis of Illumina data representing 120-fold coverage of a well-studied human sample reveals that 0.20% of the autosomal genome was covered at less than 10% of the genome-wide average. Excluding locations that were similar to known bias motifs or likely due to sample-reference variations left only 0.045% of the autosomal genome with unexplained poor coverage.
The assays presented in this paper provide a comprehensive view of sequencing bias, which can be used to drive laboratory improvements and to monitor production processes. Development guided by these assays should result in improved genome assemblies and better coverage of biologically important loci.
PMCID: PMC4053816  PMID: 23718773
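The abstract of entry 10 reports the share of the genome covered at less than 10% of the genome-wide average. As a rough illustration of that kind of statistic only, the sketch below computes it from a list of per-position depths. The function name, the toy depth values, and the use of a plain arithmetic mean are assumptions made for this example; they are not taken from the paper and do not represent its actual method.

```python
from statistics import mean


def low_coverage_fraction(depths, rel_threshold=0.10):
    """Fraction of positions whose depth is below rel_threshold * mean depth.

    Hypothetical helper for illustration; not the paper's pipeline.
    """
    if not depths:
        raise ValueError("depths must be non-empty")
    cutoff = rel_threshold * mean(depths)
    low = sum(1 for d in depths if d < cutoff)
    return low / len(depths)


# Toy data (invented): 998 positions at 120x and 2 poorly covered positions.
toy_depths = [120] * 998 + [5, 0]
toy_fraction = low_coverage_fraction(toy_depths)
print(f"{toy_fraction:.2%} of positions fall below 10% of the mean depth")
```

On real data the depths would come from an alignment summary rather than a hand-built list, and the choice of average (mean versus median) would change the cutoff.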
11.  Premetazoan genome evolution and the regulation of cell differentiation in the choanoflagellate Salpingoeca rosetta 
Genome Biology  2013;14(2):R15.
Metazoan multicellularity is rooted in mechanisms of cell adhesion, signaling, and differentiation that first evolved in the progenitors of metazoans. To reconstruct the genome composition of metazoan ancestors, we sequenced the genome and transcriptome of the choanoflagellate Salpingoeca rosetta, a close relative of metazoans that forms rosette-shaped colonies of cells.
A comparison of the 55 Mb S. rosetta genome with genomes from diverse opisthokonts suggests that the origin of metazoans was preceded by a period of dynamic gene gain and loss. The S. rosetta genome encodes homologs of cell adhesion, neuropeptide, and glycosphingolipid metabolism genes previously found only in metazoans and expands the repertoire of genes inferred to have been present in the progenitors of metazoans and choanoflagellates. Transcriptome analysis revealed that all four S. rosetta septins are upregulated in colonies relative to single cells, suggesting that these conserved cytokinesis proteins may regulate incomplete cytokinesis during colony development. Furthermore, genes shared exclusively by metazoans and choanoflagellates were disproportionately upregulated in colonies and the single cells from which they develop.
The S. rosetta genome sequence refines the catalog of metazoan-specific genes while also extending the evolutionary history of certain gene families that are central to metazoan biology. Transcriptome data suggest that conserved cytokinesis genes, including septins, may contribute to S. rosetta colony formation and indicate that the initiation of colony development may preferentially draw upon genes shared with metazoans, while later stages of colony maturation are likely regulated by genes unique to S. rosetta.
PMCID: PMC4054682  PMID: 23419129
12.  Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data 
Nature biotechnology  2011;29(7):644-652.
Massively-parallel cDNA sequencing has opened the way to deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here, we present the Trinity methodology for de novo full-length transcriptome reconstruction, and evaluate it on samples from fission yeast, mouse, and whitefly – an insect whose genome has not yet been sequenced. Trinity fully reconstructs a large fraction of the transcripts present in the data, also reporting alternative splice isoforms and transcripts from recently duplicated genes. In all cases, Trinity performs better than other available de novo transcriptome assembly programs, and its sensitivity is comparable to methods relying on genome alignments. Our approach provides a unified and general solution for transcriptome reconstruction in any sample, especially in the complete absence of a reference genome.
PMCID: PMC3571712  PMID: 21572440
13.  How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes? 
BMC Genomics  2012;13:734.
High-throughput sequencing of cDNA libraries (RNA-Seq) has proven to be a highly effective approach for studying bacterial transcriptomes. A central challenge in designing RNA-Seq-based experiments is estimating a priori the number of reads per sample needed to detect and quantify thousands of individual transcripts with a large dynamic range of abundance.
We have conducted a systematic examination of how changes in the number of RNA-Seq reads per sample influence both profiling of a single bacterial transcriptome and the comparison of gene expression among samples. Our findings suggest that the number of reads typically produced in a single lane of the Illumina HiSeq sequencer far exceeds the number needed to saturate the annotated transcriptomes of diverse bacteria growing in monoculture. Moreover, as sequencing depth increases, so too does the detection of cDNAs that likely correspond to spurious transcripts or genomic DNA contamination. Finally, even when dozens of barcoded individual cDNA libraries are sequenced in a single lane, the vast majority of transcripts in each sample can be detected and numerous genes differentially expressed between samples can be identified.
Our analysis provides a guide for the many researchers seeking to determine the appropriate sequencing depth for RNA-Seq-based studies of diverse bacterial species.
PMCID: PMC3543199  PMID: 23270466
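Entry 13 asks how many reads are needed before the set of detected transcripts stops growing. One common way to explore that question is to subsample reads at several depths and count the genes detected at each depth. The sketch below shows that general idea on invented data; the function names, the gene labels, and the detection rule (at least one read) are assumptions for illustration and are not the authors' procedure.

```python
import random


def genes_detected(read_gene_ids, n_reads, min_reads=1, seed=0):
    """Count genes with at least min_reads reads in a random subsample.

    read_gene_ids holds one gene label per aligned read (hypothetical input).
    """
    rng = random.Random(seed)
    sample = rng.sample(read_gene_ids, n_reads)  # sampling without replacement
    counts = {}
    for gene in sample:
        counts[gene] = counts.get(gene, 0) + 1
    return sum(1 for c in counts.values() if c >= min_reads)


def saturation_curve(read_gene_ids, depths, min_reads=1, seed=0):
    """Return (depth, genes detected) pairs for each requested depth."""
    return [(n, genes_detected(read_gene_ids, n, min_reads, seed)) for n in depths]


# Toy data (invented): one highly, one moderately, one lowly expressed gene.
toy_reads = ["geneA"] * 900 + ["geneB"] * 90 + ["geneC"] * 10
toy_curve = saturation_curve(toy_reads, depths=[10, 100, 1000])
for depth, detected in toy_curve:
    print(depth, detected)
```

A curve that flattens well below the full read count suggests that additional depth adds little for detection, which is the kind of saturation the abstract describes.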
14.  A framework for human microbiome research 
Methé, Barbara A. | Nelson, Karen E. | Pop, Mihai | Creasy, Heather H. | Giglio, Michelle G. | Huttenhower, Curtis | Gevers, Dirk | Petrosino, Joseph F. | Abubucker, Sahar | Badger, Jonathan H. | Chinwalla, Asif T. | Earl, Ashlee M. | FitzGerald, Michael G. | Fulton, Robert S. | Hallsworth-Pepin, Kymberlie | Lobos, Elizabeth A. | Madupu, Ramana | Magrini, Vincent | Martin, John C. | Mitreva, Makedonka | Muzny, Donna M. | Sodergren, Erica J. | Versalovic, James | Wollam, Aye M. | Worley, Kim C. | Wortman, Jennifer R. | Young, Sarah K. | Zeng, Qiandong | Aagaard, Kjersti M. | Abolude, Olukemi O. | Allen-Vercoe, Emma | Alm, Eric J. | Alvarado, Lucia | Andersen, Gary L. | Anderson, Scott | Appelbaum, Elizabeth | Arachchi, Harindra M. | Armitage, Gary | Arze, Cesar A. | Ayvaz, Tulin | Baker, Carl C. | Begg, Lisa | Belachew, Tsegahiwot | Bhonagiri, Veena | Bihan, Monika | Blaser, Martin J. | Bloom, Toby | Vivien Bonazzi, J. | Brooks, Paul | Buck, Gregory A. | Buhay, Christian J. | Busam, Dana A. | Campbell, Joseph L. | Canon, Shane R. | Cantarel, Brandi L. | Chain, Patrick S. | Chen, I-Min A. | Chen, Lei | Chhibba, Shaila | Chu, Ken | Ciulla, Dawn M. | Clemente, Jose C. | Clifton, Sandra W. | Conlan, Sean | Crabtree, Jonathan | Cutting, Mary A. | Davidovics, Noam J. | Davis, Catherine C. | DeSantis, Todd Z. | Deal, Carolyn | Delehaunty, Kimberley D. | Dewhirst, Floyd E. | Deych, Elena | Ding, Yan | Dooling, David J. | Dugan, Shannon P. | Dunne, Wm. Michael | Durkin, A. Scott | Edgar, Robert C. | Erlich, Rachel L. | Farmer, Candace N. | Farrell, Ruth M. | Faust, Karoline | Feldgarden, Michael | Felix, Victor M. | Fisher, Sheila | Fodor, Anthony A. | Forney, Larry | Foster, Leslie | Di Francesco, Valentina | Friedman, Jonathan | Friedrich, Dennis C. | Fronick, Catrina C. | Fulton, Lucinda L. | Gao, Hongyu | Garcia, Nathalia | Giannoukos, Georgia | Giblin, Christina | Giovanni, Maria Y. | Goldberg, Jonathan M. 
| Goll, Johannes | Gonzalez, Antonio | Griggs, Allison | Gujja, Sharvari | Haas, Brian J. | Hamilton, Holli A. | Harris, Emily L. | Hepburn, Theresa A. | Herter, Brandi | Hoffmann, Diane E. | Holder, Michael E. | Howarth, Clinton | Huang, Katherine H. | Huse, Susan M. | Izard, Jacques | Jansson, Janet K. | Jiang, Huaiyang | Jordan, Catherine | Joshi, Vandita | Katancik, James A. | Keitel, Wendy A. | Kelley, Scott T. | Kells, Cristyn | Kinder-Haake, Susan | King, Nicholas B. | Knight, Rob | Knights, Dan | Kong, Heidi H. | Koren, Omry | Koren, Sergey | Kota, Karthik C. | Kovar, Christie L. | Kyrpides, Nikos C. | La Rosa, Patricio S. | Lee, Sandra L. | Lemon, Katherine P. | Lennon, Niall | Lewis, Cecil M. | Lewis, Lora | Ley, Ruth E. | Li, Kelvin | Liolios, Konstantinos | Liu, Bo | Liu, Yue | Lo, Chien-Chi | Lozupone, Catherine A. | Lunsford, R. Dwayne | Madden, Tessa | Mahurkar, Anup A. | Mannon, Peter J. | Mardis, Elaine R. | Markowitz, Victor M. | Mavrommatis, Konstantinos | McCorrison, Jamison M. | McDonald, Daniel | McEwen, Jean | McGuire, Amy L. | McInnes, Pamela | Mehta, Teena | Mihindukulasuriya, Kathie A. | Miller, Jason R. | Minx, Patrick J. | Newsham, Irene | Nusbaum, Chad | O’Laughlin, Michelle | Orvis, Joshua | Pagani, Ioanna | Palaniappan, Krishna | Patel, Shital M. | Pearson, Matthew | Peterson, Jane | Podar, Mircea | Pohl, Craig | Pollard, Katherine S. | Priest, Margaret E. | Proctor, Lita M. | Qin, Xiang | Raes, Jeroen | Ravel, Jacques | Reid, Jeffrey G. | Rho, Mina | Rhodes, Rosamond | Riehle, Kevin P. | Rivera, Maria C. | Rodriguez-Mueller, Beltran | Rogers, Yu-Hui | Ross, Matthew C. | Russ, Carsten | Sanka, Ravi K. | Pamela Sankar, J. | Sathirapongsasuti, Fah | Schloss, Jeffery A. | Schloss, Patrick D. | Schmidt, Thomas M. | Scholz, Matthew | Schriml, Lynn | Schubert, Alyxandria M. | Segata, Nicola | Segre, Julia A. | Shannon, William D. | Sharp, Richard R. | Sharpton, Thomas J. | Shenoy, Narmada | Sheth, Nihar U. | Simone, Gina A. 
| Singh, Indresh | Smillie, Chris S. | Sobel, Jack D. | Sommer, Daniel D. | Spicer, Paul | Sutton, Granger G. | Sykes, Sean M. | Tabbaa, Diana G. | Thiagarajan, Mathangi | Tomlinson, Chad M. | Torralba, Manolito | Treangen, Todd J. | Truty, Rebecca M. | Vishnivetskaya, Tatiana A. | Walker, Jason | Wang, Lu | Wang, Zhengyuan | Ward, Doyle V. | Warren, Wesley | Watson, Mark A. | Wellington, Christopher | Wetterstrand, Kris A. | White, James R. | Wilczek-Boney, Katarzyna | Wu, Yuan Qing | Wylie, Kristine M. | Wylie, Todd | Yandava, Chandri | Ye, Liang | Ye, Yuzhen | Yooseph, Shibu | Youmans, Bonnie P. | Zhang, Lan | Zhou, Yanjiao | Zhu, Yiming | Zoloth, Laurie | Zucker, Jeremy D. | Birren, Bruce W. | Gibbs, Richard A. | Highlander, Sarah K. | Weinstock, George M. | Wilson, Richard K. | White, Owen
Nature  2012;486(7402):215-221.
A variety of microbial communities and their genes (microbiome) exist throughout the human body, playing fundamental roles in human health and disease. The NIH-funded Human Microbiome Project (HMP) Consortium has established a population-scale framework which catalyzed significant development of metagenomic protocols resulting in a broad range of quality-controlled resources and data including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 to 18 body sites up to three times, which, to date, have generated 5,177 microbial taxonomic profiles from 16S rRNA genes and over 3.5 Tb of metagenomic sequence. In parallel, approximately 800 human-associated reference genomes have been sequenced. Collectively, these data represent the largest resource to date describing the abundance and variety of the human microbiome, while providing a platform for current and future studies.
PMCID: PMC3377744  PMID: 22699610
15.  Pacific biosciences sequencing technology for genotyping and variation discovery in human data 
BMC Genomics  2012;13:375.
Pacific Biosciences technology provides a fundamentally new data type that provides the potential to overcome some limitations of current next generation sequencing platforms by providing significantly longer reads, single molecule sequencing, low composition bias and an error profile that is orthogonal to other platforms. With these potential advantages in mind, we here evaluate the utility of the Pacific Biosciences RS platform for human medical amplicon resequencing projects.
We evaluated the Pacific Biosciences technology for SNP discovery in medical resequencing projects using the Genome Analysis Toolkit, observing high sensitivity and specificity for calling differences in amplicons containing known true or false SNPs. We assessed data quality: most errors were indels (~14%) with few apparent miscalls (~1%). In this work, we define a custom data processing pipeline for Pacific Biosciences data for human data analysis.
Critically, the error properties were largely free of the context-specific effects that affect other sequencing technologies. These data show excellent utility for follow-up validation and extension studies in human data and medical genetics projects, but can be extended to other organisms with a reference genome.
PMCID: PMC3443046  PMID: 22863213
16.  Genome-wide identification and characterization of replication origins by deep sequencing 
Genome Biology  2012;13(4):R27.
DNA replication initiates at distinct origins in eukaryotic genomes, but the genomic features that define these sites are not well understood.
We have taken a combined experimental and bioinformatic approach to identify and characterize origins of replication in three distantly related fission yeasts: Schizosaccharomyces pombe, Schizosaccharomyces octosporus and Schizosaccharomyces japonicus. Using single-molecule deep sequencing to construct amplification-free high-resolution replication profiles, we located origins and identified sequence motifs that predict origin function. We then mapped nucleosome occupancy by deep sequencing of mononucleosomal DNA from the corresponding species, finding that origins tend to occupy nucleosome-depleted regions.
The sequences that specify origins are evolutionarily plastic, with low complexity nucleosome-excluding sequences functioning in S. pombe and S. octosporus, and binding sites for trans-acting nucleosome-excluding proteins functioning in S. japonicus. Furthermore, chromosome-scale variation in replication timing is conserved independently of origin location and via a mechanism distinct from known heterochromatic effects on origin function. These results are consistent with a model in which origins are simply the nucleosome-depleted regions of the genome with the highest affinity for the origin recognition complex. This approach provides a general strategy for understanding the mechanisms that define DNA replication origins in eukaryotes.
PMCID: PMC3446301  PMID: 22531001
18.  Efficient and robust RNA-seq process for cultured bacteria and complex community transcriptomes 
Genome Biology  2012;13(3):r23.
We have developed a process for transcriptome analysis of bacterial communities that accommodates both intact and fragmented starting RNA and combines efficient rRNA removal with strand-specific RNA-seq. We applied this approach to an RNA mixture derived from three diverse cultured bacterial species and to RNA isolated from clinical stool samples. The resulting expression profiles were highly reproducible, enriched up to 40-fold for non-rRNA transcripts, and correlated well with profiles representing undepleted total RNA.
PMCID: PMC3439974  PMID: 22455878
19.  Comparative Functional Genomics of the Fission Yeasts 
Science (New York, N.Y.)  2011;332(6032):930-936.
The fission yeast clade, comprising Schizosaccharomyces pombe, S. octosporus, S. cryophilus and S. japonicus, occupies the basal branch of Ascomycete fungi and is an important model of eukaryote biology. A comparative annotation of these genomes identified a near extinction of transposons and the associated innovation of transposon-free centromeres. Expression analysis established that meiotic genes are subject to antisense transcription during vegetative growth, suggesting a mechanism for their tight regulation. In addition, trans-acting regulators control new genes within the context of expanded functional modules for meiosis and stress response. Differences in gene content and regulation also explain why, unlike the Saccharomycotina, fission yeasts cannot use ethanol as a primary carbon source. These analyses elucidate the genome structure and gene regulation of fission yeast and provide tools for investigation across the Schizosaccharomyces clade.
PMCID: PMC3131103  PMID: 21511999
20.  Key considerations for measuring allelic expression on a genomic scale using high-throughput sequencing 
Molecular ecology  2010;19(Suppl 1):212-227.
Differences in gene expression are thought to be an important source of phenotypic diversity, so dissecting the genetic components of natural variation in gene expression is important for understanding the evolutionary mechanisms that lead to adaptation. Gene expression is a complex trait that, in diploid organisms, results from transcription of both maternal and paternal alleles. Directly measuring allelic expression rather than total gene expression offers greater insight into regulatory variation. The recent emergence of high-throughput sequencing offers an unprecedented opportunity to study allelic transcription at a genomic scale for virtually any species. By sequencing transcript pools derived from heterozygous individuals, estimates of allelic expression can be directly obtained. The statistical power of this approach is influenced by the number of transcripts sequenced and the ability to unambiguously assign individual sequence fragments to specific alleles on the basis of transcribed nucleotide polymorphisms. Here, using mathematical modelling and computer simulations, we determine the minimum sequencing depth required to accurately measure relative allelic expression and detect allelic imbalance via high-throughput sequencing under a variety of conditions. We conclude that, within a species, a minimum of 500–1000 sequencing reads per gene is needed to test for allelic imbalance, and consequently, at least five to 10 million reads are required for studying a genome expressing 10 000 genes. Finally, using 454 sequencing, we illustrate an application of allelic expression by testing for cis-regulatory divergence between closely related Drosophila species.
PMCID: PMC3217793  PMID: 20331781
cis-regulation; Drosophila melanogaster; Drosophila simulans; gene expression; hybrids
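The read budget quoted in the abstract above follows from simple multiplication, sketched below in Python. This is only an illustrative check, not the authors' model: the function name `total_reads_needed` is hypothetical, and the calculation assumes every expressed gene receives the same number of reads, which real transcriptomes do not satisfy.

```python
def total_reads_needed(reads_per_gene: int, expressed_genes: int) -> int:
    """Total reads required if coverage were perfectly uniform across genes.

    Idealized assumption: each expressed gene gets exactly `reads_per_gene`
    reads. Skewed expression levels would push the real requirement higher.
    """
    return reads_per_gene * expressed_genes


# Figures taken from the abstract: 500-1000 reads per gene, 10 000 genes.
low = total_reads_needed(500, 10_000)
high = total_reads_needed(1000, 10_000)
print(f"{low:,} to {high:,} reads")
```

Under the uniform-coverage assumption, the two bounds correspond to the "five to 10 million reads" stated in the abstract.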
21.  Metabolic labeling of RNA uncovers principles of RNA production and degradation dynamics in mammalian cells 
Nature biotechnology  2011;29(5):436-442.
Regulation of RNA levels is determined through the interplay between RNA production, processing and degradation. However, since most global studies of RNA regulation do not distinguish the separate contributions of these processes, relatively little is known about how they are temporally integrated to determine changes in RNA levels. In particular, while some studies emphasize the role of changes in the rate of transcription, others suggest a prominent involvement of time-varying degradation rates. Here, we combine metabolic labeling of RNA at high temporal resolution with advanced RNA quantification assays and computational modeling to estimate RNA transcription and degradation rates during the model response of immune dendritic cells (DCs) to pathogens. We find that changes in transcription rates determine the majority of temporal changes in RNA levels, but that changes in degradation rate are important for shaping sharp ‘peaked’ responses. Furthermore, transcription rate changes precede corresponding changes in RNA level by a small lag (15-30 min), which is shorter for induced than for repressed genes. We used massively parallel sequencing of the newly-transcribed RNA population – including non-polyadenylated transcripts – to estimate constant RNA degradation and processing rates. We find that temporally constant degradation rates vary significantly between genes and contribute substantially to the observed differences in the dynamic response, and that specific groups of transcripts, mostly cytokines and transcription factors, are undergoing faster mRNA maturation. Our study provides a new quantitative approach to study key steps in the integrative process of RNA regulation.
PMCID: PMC3114636  PMID: 21516085
23.  Complete Genome Sequence of Algoriphagus sp. PR1, Bacterial Prey of a Colony-Forming Choanoflagellate 
Journal of Bacteriology  2010;193(6):1485-1486.
Bacteria are the primary food source of choanoflagellates, the closest known relatives of animals. Studying signaling interactions between the Gram-negative Bacteroidetes bacterium Algoriphagus sp. PR1 and its predator, the choanoflagellate Salpingoeca rosetta, provides a promising avenue for testing hypotheses regarding the involvement of bacteria in animal evolution. Here we announce the complete genome sequence of Algoriphagus sp. PR1 and initial findings from its annotation.
PMCID: PMC3067637  PMID: 21183675
24.  Hybrid selection for sequencing pathogen genomes from clinical samples 
Genome Biology  2011;12(8):R73.
We have adapted a solution hybrid selection protocol to enrich pathogen DNA in clinical samples dominated by human genetic material. Using mock mixtures of human and Plasmodium falciparum malaria parasite DNA as well as clinical samples from infected patients, we demonstrate an average of approximately 40-fold enrichment of parasite DNA after hybrid selection. This approach will enable efficient genome sequencing of pathogens from clinical samples, as well as sequencing of endosymbiotic organisms such as Wolbachia that live inside diverse metazoan phyla.
PMCID: PMC3245613  PMID: 21835008
25.  High-Resolution Description of Antibody Heavy-Chain Repertoires in Humans 
PLoS ONE  2011;6(8):e22365.
Antibodies' protective, pathological, and therapeutic properties result from their considerable diversity. This diversity is almost limitless in potential, but actual diversity is still poorly understood. Here we use deep sequencing to characterize the diversity of the heavy-chain CDR3 region, the most important contributor to antibody binding specificity, and the constituent V, D, and J segments that compose it. We find that, during the stepwise D-J and then V-DJ recombination events, the choices of D and J segments exert some bias on each other; however, we find the choice of the V segment is essentially independent of both. V, D, and J segments are utilized with different frequencies, resulting in a highly skewed representation of VDJ combinations in the repertoire. Nevertheless, the pattern of segment usage was almost identical between two different individuals. The pattern of V, D, and J segment usage and recombination was insufficient to explain the overlap that was observed between the two individuals' CDR3 repertoires. Finally, we find that while there is a near-infinite number of heavy-chain CDR3s in principle, there are about 3–9 million in the blood of an adult human being.
PMCID: PMC3150326  PMID: 21829618
