Contamination is a critical issue in high-throughput metagenomic studies, yet progress towards a comprehensive solution has been limited. We present SourceTracker, a Bayesian approach to estimating the proportion of a novel community that comes from a set of source environments. We apply SourceTracker to new microbial surveys from neonatal intensive care units (NICUs), offices, and molecular biology laboratories, and provide a database of known contaminants for future testing.
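The idea of estimating what proportion of a community comes from each source environment can be illustrated with a small sketch. Note the assumptions: this uses a plain expectation-maximization fit of a multinomial mixture, not the Bayesian sampling procedure the abstract refers to; it has no "unknown source" category; and the function name and toy profiles are hypothetical.

```python
def estimate_source_proportions(sink_counts, source_profiles, n_iter=200):
    """Estimate mixing proportions of known sources in a sink sample.

    sink_counts: observed counts per taxon in the sample of interest.
    source_profiles: one list per source giving taxon relative abundances.
    Illustrative EM fit only; not the SourceTracker algorithm itself.
    """
    n_sources = len(source_profiles)
    props = [1.0 / n_sources] * n_sources  # start from a uniform guess
    for _ in range(n_iter):
        assigned = [0.0] * n_sources
        for taxon, count in enumerate(sink_counts):
            if count == 0:
                continue
            # E-step: probability that a read of this taxon came from each source
            weights = [props[s] * source_profiles[s][taxon] for s in range(n_sources)]
            norm = sum(weights)
            if norm == 0:
                continue  # taxon absent from every source; ignored in this sketch
            for s in range(n_sources):
                assigned[s] += count * weights[s] / norm
        # M-step: proportions are the normalized expected read assignments
        total = sum(assigned)
        props = [a / total for a in assigned]
    return props


# Toy data: two sources with non-overlapping taxa, mixed 70:30 in the sink.
sources = [[0.8, 0.2, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
sink = [56, 14, 15, 15]
proportions = estimate_source_proportions(sink, sources)
```

In this toy case the sources share no taxa, so the estimate is determined by the read counts alone; with overlapping profiles the estimate carries uncertainty, which a Bayesian treatment would report and this point estimate does not.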
Predicting the molecular complexity of a genomic sequencing library has emerged as a critical but difficult problem in modern applications of genome sequencing. Available methods to determine either how deeply to sequence, or predict the benefits of additional sequencing, are almost completely lacking. We introduce an empirical Bayesian method to implicitly model any source of bias and accurately characterize the molecular complexity of a DNA sample or library in almost any sequencing application.
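To make "predicting the benefits of additional sequencing" concrete, the sketch below uses the classical Good-Toulmin estimator, which predicts how many new distinct molecules would be seen if sequencing depth were increased by a fraction t of the current depth. This is an assumption chosen for illustration, not the empirical Bayesian method of the abstract, and it is only reliable for t up to 1 (at most doubling the depth). The function name and the toy histogram are hypothetical.

```python
def good_toulmin_new_molecules(counts_histogram, t):
    """Expected number of new distinct molecules after t-fold extra sequencing.

    counts_histogram maps j -> number of distinct molecules observed exactly
    j times in the initial experiment. Uses the alternating series
    sum_j (-1)^(j+1) * t^j * n_j, valid for 0 <= t <= 1.
    """
    if not 0 <= t <= 1:
        raise ValueError("this simple estimator is only stable for 0 <= t <= 1")
    return sum(
        (-1) ** (j + 1) * (t ** j) * n_j
        for j, n_j in counts_histogram.items()
    )


# Toy histogram: 10 molecules seen once, 5 molecules seen twice.
histogram = {1: 10, 2: 5}
gain_if_doubled = good_toulmin_new_molecules(histogram, 1.0)
gain_if_half_more = good_toulmin_new_molecules(histogram, 0.5)
```

A library dominated by molecules seen once yields a large predicted gain, whereas one dominated by duplicates yields little, which is the intuition behind deciding how deeply to sequence.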
We describe a protein quantification method that exploits the subtle mass differences caused by neutron-binding energy variation in stable isotopes. These mass differences are synthetically encoded into amino acids and incorporated into yeast and mouse proteins with metabolic labeling; analysis with high mass resolution (>100,000) reveals the isotopologue-embedded peptide signals, permitting quantification. We conclude that neutron encoding will enable high levels of multiplexing (>10) with high dynamic range and accuracy.
We present a genome-wide method to map DNA double-strand breaks (DSBs) at nucleotide resolution by direct in situ breaks labeling, enrichment on streptavidin, and next-generation sequencing (BLESS). We comprehensively validated and tested BLESS using different human and mouse cells, DSB-inducing agents, and sequencing platforms. BLESS was able to detect telomere ends, Sce endonuclease-induced DSBs, and complex genome-wide DSB landscapes. As a proof of principle, we characterized the genomic landscape of sensitivity to replication stress in human cells, and identified over two thousand non-uniformly distributed aphidicolin-sensitive regions (ASRs) overrepresented in genes and enriched in satellite repeats. ASRs were also enriched in regions rearranged in human cancers, with many cancer-associated genes exhibiting high sensitivity to replication stress. Our method is suitable for genome-wide mapping of DSBs in various cells and experimental conditions with a specificity and resolution unachievable by current techniques.
Here we demonstrate that quantitation of stimuli-induced proteome dynamics in primary cells is feasible by combining the power of Bio-Orthogonal Non Canonical Amino acid Tagging (BONCAT) and Stable Isotope Labelling of Amino acids in Cell culture (SILAC). In conjunction with nanoLC-MS/MS, QuaNCAT allowed us to monitor the early expression changes of >600 proteins in primary resting T cells subjected to activation stimuli.
quantitative proteomics; SILAC; BONCAT; T cell activation
We introduce a non-intrusive method exploiting post-division single-cell variability to validate protein localization. The results show that Clp proteases, widely reported to form biologically relevant foci, are in fact uniformly distributed inside Escherichia coli cells, and that many commonly used fluorescent proteins (FPs) cause severe mislocalization when fused to homo-oligomers. Re-tagging five other reportedly foci-forming proteins with the most monomeric FP tested suggests the foci were caused by the FPs.
Despite the explosive growth in the biological applications of single molecule methods over the last decade, these techniques have thus far been practiced mostly by researchers who are biophysically oriented. This is partly because commercial instruments are often unavailable, and partly because of the perceived steep learning curve and need for expensive equipment. We wish to provide a practical guide to using Förster (or Fluorescence) Resonance Energy Transfer (FRET) at the single molecule level, focusing on the study of immobilized molecules that allow measurements of single molecule reaction trajectories from about 1 millisecond to many minutes. An instrument can be built at a reasonable cost using various off-the-shelf components and operated reliably using current well-established protocols and freely available software.
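A single molecule FRET trajectory is commonly summarized as an apparent FRET efficiency per time point, computed from the donor and acceptor intensities as E_app = I_A / (I_D + I_A). The sketch below shows only that calculation. It assumes background-subtracted intensities and omits donor leakage and detection-efficiency (gamma) corrections; the function name and the example intensities are hypothetical and are not taken from the guide described above.

```python
def apparent_fret_efficiency(donor, acceptor):
    """Return E_app = I_A / (I_D + I_A) for each time point of a trace.

    donor, acceptor: background-subtracted intensities of equal length.
    Time points with non-positive total intensity yield NaN.
    """
    if len(donor) != len(acceptor):
        raise ValueError("donor and acceptor traces must have equal length")
    efficiencies = []
    for i_d, i_a in zip(donor, acceptor):
        total = i_d + i_a
        efficiencies.append(i_a / total if total > 0 else float("nan"))
    return efficiencies


# Toy trace: the molecule switches from a low-FRET to a high-FRET state.
donor_trace = [800.0, 790.0, 210.0, 200.0]
acceptor_trace = [200.0, 210.0, 790.0, 800.0]
fret_trace = apparent_fret_efficiency(donor_trace, acceptor_trace)
```

Anticorrelated changes in the two channels, as in the toy trace, are the signature that distinguishes a genuine distance change from intensity fluctuations affecting both dyes.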
Recent evidence suggests the existence of progenitor cells in adult tissues that are capable of differentiating into vascular structures as well as into all hematopoietic cell lineages. Here we describe an efficient and reproducible method for generating large numbers of these bipotential progenitors—known as hemangioblasts—from human embryonic stem (hES) cells using an in vitro differentiation system. Blast cells expressed gene signatures characteristic of hemangioblasts, and could be expanded, cryopreserved and differentiated into multiple hematopoietic lineages as well as into endothelial cells. When we injected these cells into rats with diabetes or into mice with ischemia-reperfusion injury of the retina, they localized to the site of injury in the damaged vasculature and appeared to participate in repair. Injection of the cells also reduced the mortality rate after myocardial infarction and restored blood flow in hind limb ischemia in mouse models. Our data suggest that hES-derived blast cells (hES-BCs) could be important in vascular repair.
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based Critical Assessment of protein Function Annotation (CAFA) experiment. Fifty-four methods representing the state-of-the-art for protein function prediction were evaluated on a target set of 866 proteins from eleven organisms. Two findings stand out: (i) today’s best protein function prediction algorithms significantly outperformed widely-used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is significant need for improvement of currently available tools.
Artificial transcription activator-like (TAL) effector-based activators (TALE activators) have broad utility but previous studies suggest that these monomeric proteins often possess low activities. Here we demonstrate that TALE activators can robustly function individually or in synergistic combinations to increase expression of endogenous human genes over wide dynamic ranges. These findings will encourage applications of TALE activators for research and therapy and guide design of novel monomeric TAL effector-based fusion proteins.
Cholesterol is an essential structural component of cellular membranes and serves as a precursor for several classes of signaling molecules. Cholesterol exerts its effects and is, itself, regulated in large part by engaging in specific interactions with proteins. The full complement of sterol-binding proteins that exist in mammalian cells, however, remains unknown. Here, we describe a chemoproteomic strategy that uses clickable, photoreactive sterol probes in combination with quantitative mass spectrometry to globally map cholesterol-protein interactions directly in living cells. We identified over 250 cholesterol-binding proteins, including many established and previously unreported interactions with receptors, channels, and enzymes. Prominent among the newly identified interactions were enzymes that regulate sugars, glycerolipids, and cholesterol itself, as well as those involved in vesicular transport and protein glycosylation and degradation, pointing to key nodes in biochemical pathways that may couple sterol concentrations to the control of other metabolites and protein localization and modification.
Mammalian genes are regulated by the cooperative and synergistic actions of many transcription factors. In this study we recapitulate this complex regulation in human cells by targeting endogenous gene promoters, including regions of closed chromatin upstream of silenced genes, with combinations of engineered transcription activator–like effectors (TALEs). These combinations of TALE transcription factors induced substantial gene activation and allowed tuning of gene expression levels that will broadly enable synthetic biology, gene therapy and biotechnology.
We developed chromosome flow fluorescence in situ hybridization (chromosome flow FISH, or CFF), a flow cytometry method for analyzing repetitive DNA in chromosomes using FISH with directly labeled peptide nucleic acid (PNA) probes. We used CFF to measure the abundance of interstitial telomeric sequences in Chinese hamster chromosomes and major satellite sequences in mouse chromosomes. Using CFF we also identified parental homologs of human chromosome 18 with different amounts of repetitive DNA.
We have developed a cost-effective genome-scale PCR-based method for high-definition DNA FISH (HD-FISH). We visualized gene loci with diffraction-limited resolution, chromosomes as spot clusters, and single genes together with transcripts by combining HD-FISH with single-molecule RNA FISH. We provide a database of over 4.3 million primer pairs targeting the human and mouse genomes, readily usable for rapid and flexible generation of probes, making HD-FISH invaluable for many research and diagnostic applications.
We present a single-molecule instrument that combines a timeshared ultra-high resolution dual optical trap interlaced with a confocal fluorescence microscope. In a demonstration experiment, individual single-fluorophore labeled DNA oligonucleotides were observed to bind and unbind to complementary DNA suspended between two trapped beads. Simultaneous with the single-fluorophore detection, coincident angstrom-scale changes in tether extension could be clearly observed. Fluorescence readout allowed us to determine the duplex melting rate as a function of force. The new instrument will enable the simultaneous measurement of angstrom-scale mechanical motion of individual DNA-binding proteins (e.g., single base pair stepping of DNA translocases) along with the detection of fluorescently labeled protein properties (e.g., internal configuration).
Protein classification typically uses structural, sequence, or functional similarity. Here we introduce an orthogonal method that organizes proteins by ligand similarity, focusing on the class A G protein-coupled receptor (GPCR) protein family. Comparing a ligand-based dendrogram to a sequence-based one, we sought examples of GPCRs that were distantly linked by sequence but neighbors by ligand similarity. Experimental testing of compounds predicted to link three of these new pairs confirmed the predicted association, with potencies ranging from the low-nanomolar to low-micromolar. We then identified hundreds of non-GPCRs closely related to GPCRs by ligand similarity, including the CXCR2 chemokine receptor to Casein kinase I, the cannabinoid receptors to epoxide hydrolase 2, and the α2 adrenergic receptor to phospholipase D. These, too, were confirmed experimentally. Ligand similarities among these targets may reflect a chemical integration in the time domain of molecular signaling.
Alternative cleavage and polyadenylation (APA) leads to mRNA isoforms with different coding sequences (CDS) and/or 3′ untranslated regions (3′UTRs). Using 3′ Region Extraction And Deep Sequencing (3′READS), a method which addresses the internal priming and oligo(A) tail issues that commonly plague polyA site (pA) identification, we comprehensively mapped pAs in the mouse genome, thoroughly annotating 3′ ends of genes and revealing over five thousand pAs (~8% of total) flanked by A-rich sequences, which have hitherto been overlooked. About 79% of mRNA genes and 66% of long non-coding RNA (lncRNA) genes have APA; but these two gene types have distinct usage patterns for pAs in introns and upstream exons. Promoter-distal pAs become relatively more abundant during embryonic development and cell differentiation, a trend affecting pAs in both 3′-most exons and upstream regions. Upregulated isoforms generally have stronger pAs, suggesting global modulation of the 3′ end processing activity in development and differentiation.
cleavage and polyadenylation; splicing; mouse; mRNA; lncRNA; development; differentiation
Chromatin immunoprecipitation (ChIP) assays have contributed greatly to our understanding of the role of histone modifications in gene regulation. However, a major limitation is that they do not permit analysis with single-cell resolution, thus confounding analyses of heterogeneous cell populations. Herein we present a new method that permits visualization of histone modifications of single genomic loci with single-cell resolution in formaldehyde-fixed paraffin-embedded tissue sections, based on combined use of In Situ Hybridization (ISH) and Proximity Ligation Assays (PLA). Using this method we show that H3K4dime of the MYH11 locus is restricted to the smooth muscle cell (SMC) lineage in human and mouse tissue sections, and that the mark persists even in phenotypically modulated SMC within atherosclerotic lesions that show no detectable expression of SMC marker genes. This new methodology has promise for broad applications in the study of epigenetic mechanisms in complex multicellular tissues in development and disease.
Proximity Ligation Assay; Epigenetics; Cell lineage; Differentiation; Smooth muscle cells; atherosclerosis
Transposons and γ-retroviruses have been efficiently used as insertional mutagens in different tissues to identify molecular culprits of cancer. However, these systems are characterized by recurring integrations that accumulate in tumor cells, hampering the identification of early cancer-driving events amongst bystander and progression-related events. We developed an insertional mutagenesis platform based on lentiviral vectors (LVV) by which we could efficiently induce hepatocellular carcinoma (HCC) in 3 different mouse models. By virtue of LVV’s replication-deficient nature and broad genome-wide integration pattern, LVV-based insertional mutagenesis allowed identification of 4 new liver cancer genes from a limited number of integrations. We validated the oncogenic potential of all the identified genes in vivo, with different levels of penetrance. Our newly identified cancer genes are likely to play a role in human disease, since they are upregulated and/or amplified/deleted in human HCCs and can predict clinical outcome of patients.
We established a conditional site–specific recombination system based on dimerizable Cre–mediated recombination in the apicomplexan parasite Toxoplasma gondii. Using a novel single vector strategy that allows ligand-dependent, efficient removal of a gene of interest, we generated three knockouts of apicomplexan genes considered essential for host-cell invasion. Our findings uncover the existence of an alternative invasion pathway in apicomplexan parasites.
We show that RNA editing sites can be called with high confidence using RNA sequencing data from multiple samples across either individuals or species, without the need for matched genomic DNA sequence. We identified many previously unidentified editing sites in both humans and Drosophila; our results nearly double the known number of human protein recoding events. We also found that human genes harboring conserved editing sites within Alu repeats are enriched for neuronal functions.
Comprehensive perspectives of macromolecular conformations are required to connect structure to biology. Here we present a small angle X-ray scattering (SAXS) Structural Similarity Map (SSM) and Volatility of Ratio (VR) metric providing comprehensive, quantitative and objective (superposition-independent) perspectives on solution state conformations. We validate VR and SSM utility on human MutSβ, a key ABC ATPase and chemotherapeutic target, by revealing MutSβ DNA sculpting and identifying multiple conformational states for biological activity.
A sophisticated analysis approach based on the concept of fluorophore localization provides dynamic super-resolution data of GFP-labeled live cells using a common, arc lamp–based wide-field fluorescence microscope.