PMCC PMCC

Search tips
Search criteria

Advanced
Results 1-25 (51)
 

Clipboard (0)
None

Select a Filter Below

Journals
more »
Year of Publication
1.  Massively Parallel Sequencing of Human Urinary Exosome/Microvesicle RNA Reveals a Predominance of Non-Coding RNA 
PLoS ONE  2014;9(5):e96094.
Intact RNA from exosomes/microvesicles (collectively referred to as microvesicles) has sparked much interest as potential biomarkers for the non-invasive analysis of disease. Here we use the Illumina Genome Analyzer to determine the comprehensive array of nucleic acid reads present in urinary microvesicles. Extraneous nucleic acids were digested using RNase and DNase treatment and the microvesicle inner nucleic acid cargo was analyzed with and without DNase digestion to examine both DNA and RNA sequences contained in microvesicles. Results revealed that a substantial proportion (∼87%) of reads aligned to ribosomal RNA. Of the non-ribosomal RNA sequences, ∼60% aligned to non-coding RNA and repeat sequences including LINE, SINE, satellite repeats, and RNA repeats (tRNA, snRNA, scRNA and srpRNA). The remaining ∼40% of non-ribosomal RNA reads aligned to protein coding genes and splice sites encompassing approximately 13,500 of the known 21,892 protein coding genes of the human genome. Analysis of protein coding genes specific to the renal and genitourinary tract revealed that complete segments of the renal nephron and collecting duct as well as genes indicative of the bladder and prostate could be identified. This study reveals that the entire genitourinary system may be mapped using microvesicle transcript analysis and that the majority of non-ribosomal RNA sequences contained in microvesicles is potentially functional non-coding RNA, which play an emerging role in cell regulation.
doi:10.1371/journal.pone.0096094
PMCID: PMC4015934  PMID: 24816817
2.  Semiconductor-based DNA sequencing of histone modification states 
Nature communications  2013;4:2672.
The recent development of a semiconductor-based, non-optical DNA sequencing technology promises scalable, low-cost and rapid sequence data production. The technology has previously been applied mainly to genomic sequencing and targeted re-sequencing. Here we demonstrate the utility of Ion Torrent semiconductor-based sequencing for sensitive, efficient and rapid chromatin immunoprecipitation followed by sequencing (ChIP-seq) through the application of sample preparation methods that are optimized for ChIP-seq on the Ion Torrent platform. We leverage this method for epigenetic profiling of tumour tissues.
doi:10.1038/ncomms3672
PMCID: PMC3917140  PMID: 24157732
3.  Mutations causing medullary cystic kidney disease type 1 (MCKD1) lie in a large VNTR in MUC1 missed by massively parallel sequencing 
Nature genetics  2013;45(3):299-303.
While genetic lesions responsible for some Mendelian disorders can be rapidly discovered through massively parallel sequencing (MPS) of whole genomes or exomes, not all diseases readily yield to such efforts. We describe the illustrative case of the simple Mendelian disorder medullary cystic kidney disease type 1 (MCKD1), mapped more than a decade ago to a 2-Mb region on chromosome 1. Ultimately, only by cloning, capillary sequencing, and de novo assembly, we found that each of six MCKD1 families harbors an equivalent, but apparently independently arising, mutation in sequence dramatically underrepresented in MPS data: the insertion of a single C in one copy (but a different copy in each family) of the repeat unit comprising the extremely long (~1.5-5 kb), GC-rich (>80%), coding VNTR in the mucin 1 gene. The results provide a cautionary tale about the challenges in identifying genes responsible for Mendelian, let alone more complex, disorders through MPS.
doi:10.1038/ng.2543
PMCID: PMC3901305  PMID: 23396133
4.  A high throughput Chromatin ImmunoPrecipitation approach reveals principles of dynamic gene regulation in mammals 
Molecular cell  2012;47(5):10.1016/j.molcel.2012.07.030.
Understanding the principles governing mammalian gene regulation has been hampered by the difficulty in measuring in-vivo binding dynamics of large numbers of transcription factors (TF) to DNA. Here, we develop a high-throughput Chromatin ImmunoPrecipitation (HT-ChIP) method to systematically map protein-DNA interactions. HT-ChIP was applied to define the dynamics of DNA binding by 25 TFs and 4 chromatin marks at 4 time-points following pathogen stimulus of dendritic cells. Analyzing over 180,000 TF-DNA interactions we find that TFs vary substantially in their temporal binding landscapes. This data suggests a model for transcription regulation whereby TF networks are hierarchically organized into cell differentiation factors, factors that bind targets prior to stimulus to prime them for induction, and factors that regulate specific gene programs. Overlaying HT-ChIP data on gene expression dynamics shows that many TF-DNA interactions are established prior to the stimuli, predominantly at immediate-early genes, and identified specific TF ensembles that coordinately regulate gene-induction.
doi:10.1016/j.molcel.2012.07.030
PMCID: PMC3873101  PMID: 22940246
5.  Regulated aggregative multicellularity in a close unicellular relative of metazoa 
eLife  2013;2:e01287.
The evolution of metazoans from their unicellular ancestors was one of the most important events in the history of life. However, the cellular and genetic changes that ultimately led to the evolution of multicellularity are not known. In this study, we describe an aggregative multicellular stage in the protist Capsaspora owczarzaki, a close unicellular relative of metazoans. Remarkably, transition to the aggregative stage is associated with significant upregulation of orthologs of genes known to establish multicellularity and tissue architecture in metazoans. We further observe transitions in regulated alternative splicing during the C. owczarzaki life cycle, including the deployment of an exon network associated with signaling, a feature of splicing regulation so far only observed in metazoans. Our results reveal the existence of a highly regulated aggregative stage in C. owczarzaki and further suggest that features of aggregative behavior in an ancestral protist may had been co-opted to develop some multicellular properties currently seen in metazoans.
DOI: http://dx.doi.org/10.7554/eLife.01287.001
eLife digest
When living things made from many cells evolved from single-celled ancestors, it was a breakthrough in the history of life—and one that has occurred more than once. In fact, multicellular life has evolved independently at least 25 times, in groups as diverse as animals, fungi, plants, slime molds and seaweeds. There are broadly two ways to become multicellular. The most complex multicellular species, such as animals, will replicate a single cell, over and over, without separating the resultant cells. However, in species that are only occasionally multicellular, free-living cells tend instead to join together in one mass of many cells.
Evolution is constrained by its raw materials; so looking at the living relatives of a given species, or group, can lead to a better understanding of its evolution because its relatives contain clues about its ancestors. To gain insights into how animal multicellular life might have began; Sebé-Pedrós et al. studied the life cycle of the amoeboid organism Capsaspora owczarzaki. Found within the bodies of freshwater snails, this single-celled amoeba is a close relative of multicellular animals and could resemble one of their earliest ancestors.
At certain stages of the life cycle Sebé-Pedrós et al. noticed that individual amoebae gathered together to form a multicellular mass—something that had not been seen before in such a close relative of the animals. Moreover, the genes that ‘switched on’ when the amoebae began to aggregate are also found in animals; where, together with other genes, they control development and the formation of tissues. Sebé-Pedrós et al. suggest that the first multicellular animals could have recycled the genes that control the aggregation of single-celled species: in other words, genes that once controlled the changes that happen at different times in a life cycle, now control the changes that develop between different tissues at the same time.
Sebé-Pedrós et al. also observed that alternative splicing—a process that allows different proteins to be made from a single gene—occurs via two different mechanisms during the life cycle of Capsaspora. Most of the time Capsaspora employs a form of alternative splicing that is often seen in plants and fungi, and only rarely in animals; for the rest of the time it uses a form of alternative splicing similar to that used by animal cells.
The evolution of complex alternative splicing mechanisms is a hallmark feature of multicellular animals. The exploitation of two major forms of alternative splicing in Capsaspora could thus reflect an important transition during evolution that resulted in an increased diversity of proteins and in more complex gene regulation. Such a transition may ultimately have paved the way for the increased specialization of cell types seen in animals.
This glimpse into the possible transitions in gene regulation that contributed to the birth of multicellular animals indicates that the single-celled ancestor of the animals was likely more complex than previously thought. Future analyses of the animals’ close relatives may further improve our understanding of how single-celled organisms became multicellular animals.
DOI: http://dx.doi.org/10.7554/eLife.01287.002
doi:10.7554/eLife.01287
PMCID: PMC3870316  PMID: 24368732
opisthokonts; Capsaspora; alternative splicing; cell differentiation; evolutionary transitions; Other
6.  Distinctive Expansion of Potential Virulence Genes in the Genome of the Oomycete Fish Pathogen Saprolegnia parasitica 
PLoS Genetics  2013;9(6):e1003272.
Oomycetes in the class Saprolegniomycetidae of the Eukaryotic kingdom Stramenopila have evolved as severe pathogens of amphibians, crustaceans, fish and insects, resulting in major losses in aquaculture and damage to aquatic ecosystems. We have sequenced the 63 Mb genome of the fresh water fish pathogen, Saprolegnia parasitica. Approximately 1/3 of the assembled genome exhibits loss of heterozygosity, indicating an efficient mechanism for revealing new variation. Comparison of S. parasitica with plant pathogenic oomycetes suggests that during evolution the host cellular environment has driven distinct patterns of gene expansion and loss in the genomes of plant and animal pathogens. S. parasitica possesses one of the largest repertoires of proteases (270) among eukaryotes that are deployed in waves at different points during infection as determined from RNA-Seq data. In contrast, despite being capable of living saprotrophically, parasitism has led to loss of inorganic nitrogen and sulfur assimilation pathways, strikingly similar to losses in obligate plant pathogenic oomycetes and fungi. The large gene families that are hallmarks of plant pathogenic oomycetes such as Phytophthora appear to be lacking in S. parasitica, including those encoding RXLR effectors, Crinkler's, and Necrosis Inducing-Like Proteins (NLP). S. parasitica also has a very large kinome of 543 kinases, 10% of which is induced upon infection. Moreover, S. parasitica encodes several genes typical of animals or animal-pathogens and lacking from other oomycetes, including disintegrins and galactose-binding lectins, whose expression and evolutionary origins implicate horizontal gene transfer in the evolution of animal pathogenesis in S. parasitica.
Author Summary
Fish are an increasingly important source of animal protein globally, with aquaculture production rising dramatically over the past decade. Saprolegnia is a fungal-like oomycete and one of the most destructive fish pathogens, causing millions of dollars in losses to the aquaculture industry annually. Saprolegnia has also been linked to a worldwide decline in wild fish and amphibian populations. Here we describe the genome sequence of the first animal pathogenic oomycete and compare the genome content with the available plant pathogenic oomycetes. We found that Saprolegnia lacks the large effector families that are hallmarks of plant pathogenic oomycetes, showing evolutionary adaptation to the host. Moreover, Saprolegnia harbors pathogenesis-related genes that were derived by lateral gene transfer from the host and other animal pathogens. The retrotransposon LINE family also appears to be acquired from animal lineages. By transcriptome analysis we show a high rate of allelic variation, which reveals rapidly evolving genes and potentially adaptive evolutionary mechanisms coupled to selective pressures exerted by the animal host. The genome and transcriptome data, as well as subsequent biochemical analyses, provided us with insight in the disease process of Saprolegnia at a molecular and cellular level, providing us with targets for sustainable control of Saprolegnia.
doi:10.1371/journal.pgen.1003272
PMCID: PMC3681718  PMID: 23785293
7.  Characterizing and measuring bias in sequence data 
Genome Biology  2013;14(5):R51.
Background
DNA sequencing technologies deviate from the ideal uniform distribution of reads. These biases impair scientific and medical applications. Accordingly, we have developed computational methods for discovering, describing and measuring bias.
Results
We applied these methods to the Illumina, Ion Torrent, Pacific Biosciences and Complete Genomics sequencing platforms, using data from human and from a set of microbes with diverse base compositions. As in previous work, library construction conditions significantly influence sequencing bias. Pacific Biosciences coverage levels are the least biased, followed by Illumina, although all technologies exhibit error-rate biases in high- and low-GC regions and at long homopolymer runs. The GC-rich regions prone to low coverage include a number of human promoters, so we therefore catalog 1,000 that were exceptionally resistant to sequencing. Our results indicate that combining data from two technologies can reduce coverage bias if the biases in the component technologies are complementary and of similar magnitude. Analysis of Illumina data representing 120-fold coverage of a well-studied human sample reveals that 0.20% of the autosomal genome was covered at less than 10% of the genome-wide average. Excluding locations that were similar to known bias motifs or likely due to sample-reference variations left only 0.045% of the autosomal genome with unexplained poor coverage.
Conclusions
The assays presented in this paper provide a comprehensive view of sequencing bias, which can be used to drive laboratory improvements and to monitor production processes. Development guided by these assays should result in improved genome assemblies and better coverage of biologically important loci.
doi:10.1186/gb-2013-14-5-r51
PMCID: PMC4053816  PMID: 23718773
8.  Premetazoan genome evolution and the regulation of cell differentiation in the choanoflagellate Salpingoeca rosetta 
Genome Biology  2013;14(2):R15.
Background
Metazoan multicellularity is rooted in mechanisms of cell adhesion, signaling, and differentiation that first evolved in the progenitors of metazoans. To reconstruct the genome composition of metazoan ancestors, we sequenced the genome and transcriptome of the choanoflagellate Salpingoeca rosetta, a close relative of metazoans that forms rosette-shaped colonies of cells.
Results
A comparison of the 55 Mb S. rosetta genome with genomes from diverse opisthokonts suggests that the origin of metazoans was preceded by a period of dynamic gene gain and loss. The S. rosetta genome encodes homologs of cell adhesion, neuropeptide, and glycosphingolipid metabolism genes previously found only in metazoans and expands the repertoire of genes inferred to have been present in the progenitors of metazoans and choanoflagellates. Transcriptome analysis revealed that all four S. rosetta septins are upregulated in colonies relative to single cells, suggesting that these conserved cytokinesis proteins may regulate incomplete cytokinesis during colony development. Furthermore, genes shared exclusively by metazoans and choanoflagellates were disproportionately upregulated in colonies and the single cells from which they develop.
Conclusions
The S. rosetta genome sequence refines the catalog of metazoan-specific genes while also extending the evolutionary history of certain gene families that are central to metazoan biology. Transcriptome data suggest that conserved cytokinesis genes, including septins, may contribute to S. rosetta colony formation and indicate that the initiation of colony development may preferentially draw upon genes shared with metazoans, while later stages of colony maturation are likely regulated by genes unique to S. rosetta.
doi:10.1186/gb-2013-14-2-r15
PMCID: PMC4054682  PMID: 23419129
9.  Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data 
Nature biotechnology  2011;29(7):644-652.
Massively-parallel cDNA sequencing has opened the way to deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here, we present the Trinity methodology for de novo full-length transcriptome reconstruction, and evaluate it on samples from fission yeast, mouse, and whitefly – an insect whose genome has not yet been sequenced. Trinity fully reconstructs a large fraction of the transcripts present in the data, also reporting alternative splice isoforms and transcripts from recently duplicated genes. In all cases, Trinity performs better than other available de novo transcriptome assembly programs, and its sensitivity is comparable to methods relying on genome alignments. Our approach provides a unified and general solution for transcriptome reconstruction in any sample, especially in the complete absence of a reference genome.
doi:10.1038/nbt.1883
PMCID: PMC3571712  PMID: 21572440
10.  How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes? 
BMC Genomics  2012;13:734.
Background
High-throughput sequencing of cDNA libraries (RNA-Seq) has proven to be a highly effective approach for studying bacterial transcriptomes. A central challenge in designing RNA-Seq-based experiments is estimating a priori the number of reads per sample needed to detect and quantify thousands of individual transcripts with a large dynamic range of abundance.
Results
We have conducted a systematic examination of how changes in the number of RNA-Seq reads per sample influences both profiling of a single bacterial transcriptome and the comparison of gene expression among samples. Our findings suggest that the number of reads typically produced in a single lane of the Illumina HiSeq sequencer far exceeds the number needed to saturate the annotated transcriptomes of diverse bacteria growing in monoculture. Moreover, as sequencing depth increases, so too does the detection of cDNAs that likely correspond to spurious transcripts or genomic DNA contamination. Finally, even when dozens of barcoded individual cDNA libraries are sequenced in a single lane, the vast majority of transcripts in each sample can be detected and numerous genes differentially expressed between samples can be identified.
Conclusions
Our analysis provides a guide for the many researchers seeking to determine the appropriate sequencing depth for RNA-Seq-based studies of diverse bacterial species.
doi:10.1186/1471-2164-13-734
PMCID: PMC3543199  PMID: 23270466
11.  A framework for human microbiome research 
Methé, Barbara A. | Nelson, Karen E. | Pop, Mihai | Creasy, Heather H. | Giglio, Michelle G. | Huttenhower, Curtis | Gevers, Dirk | Petrosino, Joseph F. | Abubucker, Sahar | Badger, Jonathan H. | Chinwalla, Asif T. | Earl, Ashlee M. | FitzGerald, Michael G. | Fulton, Robert S. | Hallsworth-Pepin, Kymberlie | Lobos, Elizabeth A. | Madupu, Ramana | Magrini, Vincent | Martin, John C. | Mitreva, Makedonka | Muzny, Donna M. | Sodergren, Erica J. | Versalovic, James | Wollam, Aye M. | Worley, Kim C. | Wortman, Jennifer R. | Young, Sarah K. | Zeng, Qiandong | Aagaard, Kjersti M. | Abolude, Olukemi O. | Allen-Vercoe, Emma | Alm, Eric J. | Alvarado, Lucia | Andersen, Gary L. | Anderson, Scott | Appelbaum, Elizabeth | Arachchi, Harindra M. | Armitage, Gary | Arze, Cesar A. | Ayvaz, Tulin | Baker, Carl C. | Begg, Lisa | Belachew, Tsegahiwot | Bhonagiri, Veena | Bihan, Monika | Blaser, Martin J. | Bloom, Toby | Vivien Bonazzi, J. | Brooks, Paul | Buck, Gregory A. | Buhay, Christian J. | Busam, Dana A. | Campbell, Joseph L. | Canon, Shane R. | Cantarel, Brandi L. | Chain, Patrick S. | Chen, I-Min A. | Chen, Lei | Chhibba, Shaila | Chu, Ken | Ciulla, Dawn M. | Clemente, Jose C. | Clifton, Sandra W. | Conlan, Sean | Crabtree, Jonathan | Cutting, Mary A. | Davidovics, Noam J. | Davis, Catherine C. | DeSantis, Todd Z. | Deal, Carolyn | Delehaunty, Kimberley D. | Dewhirst, Floyd E. | Deych, Elena | Ding, Yan | Dooling, David J. | Dugan, Shannon P. | Dunne, Wm. Michael | Durkin, A. Scott | Edgar, Robert C. | Erlich, Rachel L. | Farmer, Candace N. | Farrell, Ruth M. | Faust, Karoline | Feldgarden, Michael | Felix, Victor M. | Fisher, Sheila | Fodor, Anthony A. | Forney, Larry | Foster, Leslie | Di Francesco, Valentina | Friedman, Jonathan | Friedrich, Dennis C. | Fronick, Catrina C. | Fulton, Lucinda L. | Gao, Hongyu | Garcia, Nathalia | Giannoukos, Georgia | Giblin, Christina | Giovanni, Maria Y. | Goldberg, Jonathan M. | Goll, Johannes | Gonzalez, Antonio | Griggs, Allison | Gujja, Sharvari | Haas, Brian J. | Hamilton, Holli A. | Harris, Emily L. | Hepburn, Theresa A. | Herter, Brandi | Hoffmann, Diane E. | Holder, Michael E. | Howarth, Clinton | Huang, Katherine H. | Huse, Susan M. | Izard, Jacques | Jansson, Janet K. | Jiang, Huaiyang | Jordan, Catherine | Joshi, Vandita | Katancik, James A. | Keitel, Wendy A. | Kelley, Scott T. | Kells, Cristyn | Kinder-Haake, Susan | King, Nicholas B. | Knight, Rob | Knights, Dan | Kong, Heidi H. | Koren, Omry | Koren, Sergey | Kota, Karthik C. | Kovar, Christie L. | Kyrpides, Nikos C. | La Rosa, Patricio S. | Lee, Sandra L. | Lemon, Katherine P. | Lennon, Niall | Lewis, Cecil M. | Lewis, Lora | Ley, Ruth E. | Li, Kelvin | Liolios, Konstantinos | Liu, Bo | Liu, Yue | Lo, Chien-Chi | Lozupone, Catherine A. | Lunsford, R. Dwayne | Madden, Tessa | Mahurkar, Anup A. | Mannon, Peter J. | Mardis, Elaine R. | Markowitz, Victor M. | Mavrommatis, Konstantinos | McCorrison, Jamison M. | McDonald, Daniel | McEwen, Jean | McGuire, Amy L. | McInnes, Pamela | Mehta, Teena | Mihindukulasuriya, Kathie A. | Miller, Jason R. | Minx, Patrick J. | Newsham, Irene | Nusbaum, Chad | O’Laughlin, Michelle | Orvis, Joshua | Pagani, Ioanna | Palaniappan, Krishna | Patel, Shital M. | Pearson, Matthew | Peterson, Jane | Podar, Mircea | Pohl, Craig | Pollard, Katherine S. | Priest, Margaret E. | Proctor, Lita M. | Qin, Xiang | Raes, Jeroen | Ravel, Jacques | Reid, Jeffrey G. | Rho, Mina | Rhodes, Rosamond | Riehle, Kevin P. | Rivera, Maria C. | Rodriguez-Mueller, Beltran | Rogers, Yu-Hui | Ross, Matthew C. | Russ, Carsten | Sanka, Ravi K. | Pamela Sankar, J. | Sathirapongsasuti, Fah | Schloss, Jeffery A. | Schloss, Patrick D. | Schmidt, Thomas M. | Scholz, Matthew | Schriml, Lynn | Schubert, Alyxandria M. | Segata, Nicola | Segre, Julia A. | Shannon, William D. | Sharp, Richard R. | Sharpton, Thomas J. | Shenoy, Narmada | Sheth, Nihar U. | Simone, Gina A. | Singh, Indresh | Smillie, Chris S. | Sobel, Jack D. | Sommer, Daniel D. | Spicer, Paul | Sutton, Granger G. | Sykes, Sean M. | Tabbaa, Diana G. | Thiagarajan, Mathangi | Tomlinson, Chad M. | Torralba, Manolito | Treangen, Todd J. | Truty, Rebecca M. | Vishnivetskaya, Tatiana A. | Walker, Jason | Wang, Lu | Wang, Zhengyuan | Ward, Doyle V. | Warren, Wesley | Watson, Mark A. | Wellington, Christopher | Wetterstrand, Kris A. | White, James R. | Wilczek-Boney, Katarzyna | Wu, Yuan Qing | Wylie, Kristine M. | Wylie, Todd | Yandava, Chandri | Ye, Liang | Ye, Yuzhen | Yooseph, Shibu | Youmans, Bonnie P. | Zhang, Lan | Zhou, Yanjiao | Zhu, Yiming | Zoloth, Laurie | Zucker, Jeremy D. | Birren, Bruce W. | Gibbs, Richard A. | Highlander, Sarah K. | Weinstock, George M. | Wilson, Richard K. | White, Owen
Nature  2012;486(7402):215-221.
A variety of microbial communities and their genes (microbiome) exist throughout the human body, playing fundamental roles in human health and disease. The NIH funded Human Microbiome Project (HMP) Consortium has established a population-scale framework which catalyzed significant development of metagenomic protocols resulting in a broad range of quality-controlled resources and data including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 to 18 body sites up to three times, which to date, have generated 5,177 microbial taxonomic profiles from 16S rRNA genes and over 3.5 Tb of metagenomic sequence. In parallel, approximately 800 human-associated reference genomes have been sequenced. Collectively, these data represent the largest resource to date describing the abundance and variety of the human microbiome, while providing a platform for current and future studies.
doi:10.1038/nature11209
PMCID: PMC3377744  PMID: 22699610
12.  Pacific biosciences sequencing technology for genotyping and variation discovery in human data 
BMC Genomics  2012;13:375.
Background
Pacific Biosciences technology provides a fundamentally new data type that provides the potential to overcome some limitations of current next generation sequencing platforms by providing significantly longer reads, single molecule sequencing, low composition bias and an error profile that is orthogonal to other platforms. With these potential advantages in mind, we here evaluate the utility of the Pacific Biosciences RS platform for human medical amplicon resequencing projects.
Results
We evaluated the Pacific Biosciences technology for SNP discovery in medical resequencing projects using the Genome Analysis Toolkit, observing high sensitivity and specificity for calling differences in amplicons containing known true or false SNPs. We assessed data quality: most errors were indels (~14%) with few apparent miscalls (~1%). In this work, we define a custom data processing pipeline for Pacific Biosciences data for human data analysis.
Conclusion
Critically, the error properties were largely free of the context-specific effects that affect other sequencing technologies. These data show excellent utility for follow-up validation and extension studies in human data and medical genetics projects, but can be extended to other organisms with a reference genome.
doi:10.1186/1471-2164-13-375
PMCID: PMC3443046  PMID: 22863213
13.  Genome-wide identification and characterization of replication origins by deep sequencing 
Genome Biology  2012;13(4):R27.
Background
DNA replication initiates at distinct origins in eukaryotic genomes, but the genomic features that define these sites are not well understood.
Results
We have taken a combined experimental and bioinformatic approach to identify and characterize origins of replication in three distantly related fission yeasts: Schizosaccharomyces pombe, Schizosaccharomyces octosporus and Schizosaccharomyces japonicus. Using single-molecule deep sequencing to construct amplification-free high-resolution replication profiles, we located origins and identified sequence motifs that predict origin function. We then mapped nucleosome occupancy by deep sequencing of mononucleosomal DNA from the corresponding species, finding that origins tend to occupy nucleosome-depleted regions.
Conclusions
The sequences that specify origins are evolutionarily plastic, with low complexity nucleosome-excluding sequences functioning in S. pombe and S. octosporus, and binding sites for trans-acting nucleosome-excluding proteins functioning in S. japonicus. Furthermore, chromosome-scale variation in replication timing is conserved independently of origin location and via a mechanism distinct from known heterochromatic effects on origin function. These results are consistent with a model in which origins are simply the nucleosome-depleted regions of the genome with the highest affinity for the origin recognition complex. This approach provides a general strategy for understanding the mechanisms that define DNA replication origins in eukaryotes.
doi:10.1186/gb-2012-13-4-r27
PMCID: PMC3446301  PMID: 22531001
15.  Efficient and robust RNA-seq process for cultured bacteria and complex community transcriptomes 
Genome Biology  2012;13(3):r23.
We have developed a process for transcriptome analysis of bacterial communities that accommodates both intact and fragmented starting RNA and combines efficient rRNA removal with strand-specific RNA-seq. We applied this approach to an RNA mixture derived from three diverse cultured bacterial species and to RNA isolated from clinical stool samples. The resulting expression profiles were highly reproducible, enriched up to 40-fold for non-rRNA transcripts, and correlated well with profiles representing undepleted total RNA.
doi:10.1186/gb-2012-13-3-r23
PMCID: PMC3439974  PMID: 22455878
16.  Comparative Functional Genomics of the Fission Yeasts 
Science (New York, N.Y.)  2011;332(6032):930-936.
The fission yeast clade, comprising Schizosaccharomyces pombe, S. octosporus, S. cryophilus and S. japonicus, occupies the basal branch of Ascomycete fungi and is an important model of eukaryote biology. A comparative annotation of these genomes identified a near extinction of transposons and the associated innovation of transposon-free centromeres. Expression analysis established that meiotic genes are subject to antisense transcription during vegetative growth, suggesting a mechanism for their tight regulation. In addition, trans-acting regulators control new genes within the context of expanded functional modules for meiosis and stress response. Differences in gene content and regulation also explain why, unlike the Saccharomycotina, fission yeasts cannot use ethanol as a primary carbon source. These analyses elucidate the genome structure and gene regulation of fission yeast and provide tools for investigation across the Schizosaccharomyces clade.
doi:10.1126/science.1203357
PMCID: PMC3131103  PMID: 21511999
17.  Key considerations for measuring allelic expression on a genomic scale using high-throughput sequencing 
Molecular ecology  2010;19(Suppl 1):212-227.
Differences in gene expression are thought to be an important source of phenotypic diversity, so dissecting the genetic components of natural variation in gene expression is important for understanding the evolutionary mechanisms that lead to adaptation. Gene expression is a complex trait that, in diploid organisms, results from transcription of both maternal and paternal alleles. Directly measuring allelic expression rather than total gene expression offers greater insight into regulatory variation. The recent emergence of high-throughput sequencing offers an unprecedented opportunity to study allelic transcription at a genomic scale for virtually any species. By sequencing transcript pools derived from heterozygous individuals, estimates of allelic expression can be directly obtained. The statistical power of this approach is influenced by the number of transcripts sequenced and the ability to unambiguously assign individual sequence fragments to specific alleles on the basis of transcribed nucleotide polymorphisms. Here, using mathematical modelling and computer simulations, we determine the minimum sequencing depth required to accurately measure relative allelic expression and detect allelic imbalance via high-throughput sequencing under a variety of conditions. We conclude that, within a species, a minimum of 500–1000 sequencing reads per gene are needed to test for allelic imbalance, and consequently, at least five to 10 millions reads are required for studying a genome expressing 10 000 genes. Finally, using 454 sequencing, we illustrate an application of allelic expression by testing for cis-regulatory divergence between closely related Drosophila species.
doi:10.1111/j.1365-294X.2010.04472.x
PMCID: PMC3217793  PMID: 20331781
cis-regulation; Drosophila melanogaster; Drosophila simulans; gene expression; hybrids
18.  Metabolic labeling of RNA uncovers principles of RNA production and degradation dynamics in mammalian cells 
Nature biotechnology  2011;29(5):436-442.
Regulation of RNA levels is determined through the interplay between RNA production, processing and degradation. However, since most global studies of RNA regulation do not distinguish the separate contributions of these processes, relatively little is known about how they are temporally integrated to determine changes in RNA levels. In particular, while some studies emphasize the role of changes in the rate of transcription, others suggest a prominent involvement of time-varying degradation rates. Here, we combine metabolic labeling of RNA at high temporal resolution with advanced RNA quantification assays and computational modeling to estimate RNA transcription and degradation rates during the model response of immune dendritic cells (DCs) to pathogens. We find that changes in transcription rates determine the majority of temporal changes in RNA levels, but that changes in degradation rate are important for shaping sharp ‘peaked’ responses. Furthermore, transcription rate changes precede corresponding changes in RNA level by a small lag (15-30 min), which is shorter for induced than for repressed genes. We used massively parallel sequencing of the newly-transcribed RNA population – including non-polyadenylated transcripts – to estimate constant RNA degradation and processing rates. We find that temporally constant degradation rates vary significantly between genes and contribute substantially to the observed differences in the dynamic response, and that specific groups of transcripts, mostly cytokines and transcription factors, are undergoing faster mRNA maturation. Our study provides a new quantitative approach to study key steps in the integrative process of RNA regulation.
doi:10.1038/nbt.1861
PMCID: PMC3114636  PMID: 21516085
20.  Complete Genome Sequence of Algoriphagus sp. PR1, Bacterial Prey of a Colony-Forming Choanoflagellate ▿  
Journal of Bacteriology  2010;193(6):1485-1486.
Bacteria are the primary food source of choanoflagellates, the closest known relatives of animals. Studying signaling interactions between the Gram-negative Bacteroidetes bacterium Algoriphagus sp. PR1 and its predator, the choanoflagellate Salpingoeca rosetta, provides a promising avenue for testing hypotheses regarding the involvement of bacteria in animal evolution. Here we announce the complete genome sequence of Algoriphagus sp. PR1 and initial findings from its annotation.
doi:10.1128/JB.01421-10
PMCID: PMC3067637  PMID: 21183675
21.  Hybrid selection for sequencing pathogen genomes from clinical samples 
Genome Biology  2011;12(8):R73.
We have adapted a solution hybrid selection protocol to enrich pathogen DNA in clinical samples dominated by human genetic material. Using mock mixtures of human and Plasmodium falciparum malaria parasite DNA as well as clinical samples from infected patients, we demonstrate an average of approximately 40-fold enrichment of parasite DNA after hybrid selection. This approach will enable efficient genome sequencing of pathogens from clinical samples, as well as sequencing of endosymbiotic organisms such as Wolbachia that live inside diverse metazoan phyla.
doi:10.1186/gb-2011-12-8-r73
PMCID: PMC3245613  PMID: 21835008
22.  High-Resolution Description of Antibody Heavy-Chain Repertoires in Humans 
PLoS ONE  2011;6(8):e22365.
Antibodies' protective, pathological, and therapeutic properties result from their considerable diversity. This diversity is almost limitless in potential, but actual diversity is still poorly understood. Here we use deep sequencing to characterize the diversity of the heavy-chain CDR3 region, the most important contributor to antibody binding specificity, and the constituent V, D, and J segments that comprise it. We find that, during the stepwise D-J and then V-DJ recombination events, the choice of D and J segments exert some bias on each other; however, we find the choice of the V segment is essentially independent of both. V, D, and J segments are utilized with different frequencies, resulting in a highly skewed representation of VDJ combinations in the repertoire. Nevertheless, the pattern of segment usage was almost identical between two different individuals. The pattern of V, D, and J segment usage and recombination was insufficient to explain overlap that was observed between the two individuals' CDR3 repertoires. Finally, we find that while there are a near-infinite number of heavy-chain CDR3s in principle, there are about 3–9 million in the blood of an adult human being.
doi:10.1371/journal.pone.0022365
PMCID: PMC3150326  PMID: 21829618
23.  Comprehensive comparative analysis of strand-specific RNA sequencing methods 
Nature methods  2010;7(9):709-715.
Strand-specific, massively-parallel cDNA sequencing (RNA-Seq) is a powerful tool for novel transcript discovery, genome annotation, and expression profiling. Despite multiple published methods for strand-specific RNA-Seq, no consensus exists as to how to choose between them. Here, we developed a comprehensive computational pipeline to compare library quality metrics from any RNA-Seq method. Using the well-annotated Saccharomyces cerevisiae transcriptome as a benchmark, we compared seven library construction protocols, including both published and our own novel methods. We found marked differences in strand-specificity, library complexity, evenness and continuity of coverage, agreement with known annotations, and accuracy for expression profiling. Weighing each method’s performance and ease, we identify the dUTP second strand marking and the Illumina RNA ligation methods as the leading protocols, with the former benefitting from the current availability of paired-end sequencing. Our analysis provides a comprehensive benchmark, and our computational pipeline is applicable for assessment of future protocols in other organisms.
doi:10.1038/nmeth.1491
PMCID: PMC3005310  PMID: 20711195
24.  Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries 
Genome Biology  2011;12(2):R18.
Despite the ever-increasing output of Illumina sequencing data, loci with extreme base compositions are often under-represented or absent. To evaluate sources of base-composition bias, we traced genomic sequences ranging from 6% to 90% GC through the process by quantitative PCR. We identified PCR during library preparation as a principal source of bias and optimized the conditions. Our improved protocol significantly reduces amplification bias and minimizes the previously severe effects of PCR instrument and temperature ramp rate.
doi:10.1186/gb-2011-12-2-r18
PMCID: PMC3188800  PMID: 21338519
25.  A cellular memory of developmental history generates phenotypic diversity in C. elegans 
Current biology : CB  2010;20(2):149-155.
SUMMARY
Early life experiences have a major impact on adult phenotypes [1–3]. However, the mechanisms by which animals retain a cellular memory of early experience are not well understood. Here we show that adult wild-type C. elegans that transiently passed through the stress-resistant dauer larval stage exhibit distinct gene expression profiles and life history traits, as compared to adult animals that bypassed this stage. Using chromatin immmunoprecipitation experiments coupled with massively parallel sequencing, we find that genome-wide levels of specific histone tail modifications are markedly altered in post-dauer animals. Mutations in subsets of genes implicated in chromatin remodeling abolish, or alter, the observed changes in gene expression and life history traits in post-dauer animals. Modifications to the epigenome as a consequence of early experience may contribute in part to a memory of early experience, and generate phenotypic variation in an isogenic population.
doi:10.1016/j.cub.2009.11.035
PMCID: PMC2990539  PMID: 20079644

Results 1-25 (51)