Advances in next-generation technologies have rapidly improved sequencing fidelity and significantly decreased sequencing error rates. However, with billions of nucleotides in a human genome, even low experimental error rates yield many errors in variant calls. Erroneous variants can mimic true somatic and rare variants, thus requiring costly confirmatory experiments to minimize the number of false positives. Here we discuss sources of experimental error in next-generation sequencing and how replicates can be used to abate them.
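The arithmetic behind the replicate argument can be sketched as follows. This is a minimal illustration, not the authors' error model: the genome size and per-site error rate are hypothetical round numbers, and errors are assumed to be independent across replicates, which systematic errors would violate.

```python
import math


def expected_false_calls(sites: float, error_rate: float, replicates: int) -> float:
    """Expected number of sites at which every replicate carries an error.

    Assumes errors are independent across replicates. It ignores the stricter
    requirement that replicates agree on the same wrong base, so under the
    independence assumption this is an upper bound.
    """
    if replicates < 1:
        raise ValueError("replicates must be >= 1")
    return sites * error_rate ** replicates


GENOME_SITES = 3e9  # assumed order of magnitude for a human genome
ERROR_RATE = 1e-6   # hypothetical per-site error rate, for illustration only

for k in (1, 2, 3):
    # Each added replicate multiplies the expectation by ERROR_RATE.
    print(k, expected_false_calls(GENOME_SITES, ERROR_RATE, k))
```

Under these assumptions each additional replicate reduces the expected number of coincident errors by a factor equal to the error rate. Correlated error sources would make the real reduction smaller.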
RNA-guided Cas9 nucleases derived from clustered regularly interspaced short palindromic repeats (CRISPR)-Cas systems have dramatically transformed our ability to edit the genomes of diverse organisms. We believe tools and techniques based on Cas9, a single unifying factor capable of colocalizing RNA, DNA and protein, will grant unprecedented control over cellular organization, regulation and behavior. Here we describe the Cas9 targeting methodology, detail current and prospective engineering advances and suggest potential applications ranging from basic science to the clinic.
Bacteria rely on two known DNA-level defenses against their bacteriophage predators: restriction-modification and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-CRISPR-associated (Cas) systems. Certain phages have evolved countermeasures that are known to block endonucleases. For example, phage T4 not only adds hydroxymethyl groups to all of its cytosines, but also glucosylates them, a strategy that defeats almost all restriction enzymes. We sought to determine whether these DNA modifications can similarly impede CRISPR-based defenses. In a bioinformatics search, we found naturally occurring CRISPR spacers that potentially target phages known to modify their DNA. Experimentally, we show that the Cas9 nuclease from the Type II CRISPR system of Streptococcus pyogenes can overcome a variety of DNA modifications in Escherichia coli. The levels of Cas9-mediated phage resistance to bacteriophage T4 and the mutant phage T4 gt, which contains hydroxymethylated but not glucosylated cytosines, were comparable to phages with unmodified cytosines, T7 and the T4-like phage RB49. Our results demonstrate that Cas9 is not impeded by N6-methyladenine, 5-methylcytosine, 5-hydroxymethylated cytosine, or glucosylated 5-hydroxymethylated cytosine.
Adenosine-to-inosine (A-to-I) RNA editing, in which genomically encoded adenosine is changed to inosine in RNA, is catalyzed by adenosine deaminase acting on RNA (ADAR). This fine-tuning mechanism is critical during normal development and diseases, particularly in relation to brain functions. A-to-I RNA editing has also been hypothesized to be a driving force in human brain evolution. A large number of RNA editing sites have recently been identified, mostly as a result of the development of deep sequencing and bioinformatic analyses. Deciphering the functional consequences of RNA editing events is challenging, but emerging genome engineering approaches may expedite new discoveries. To understand how RNA editing is dynamically regulated, it is imperative to construct a spatiotemporal atlas at the species, tissue and cell levels. Future studies will need to identify the cis and trans regulatory factors that drive the selectivity and frequency of RNA editing. We anticipate that recent technological advancements will aid researchers in acquiring a much deeper understanding of the functions and regulation of RNA editing.
CRISPR-Cas systems have been used with single-guide RNAs for accurate gene disruption and conversion in multiple biological systems. Here we report the use of the endonuclease Cas9 to target genomic sequences in the C. elegans germline, utilizing single-guide RNAs that are expressed from a U6 small nuclear RNA promoter. Our results demonstrate that targeted, heritable genetic alterations can be achieved in C. elegans, providing a convenient and effective approach for generating loss-of-function mutants.
The Cas9 protein from the Streptococcus pyogenes CRISPR-Cas immune system has been adapted for both RNA-guided genome editing and gene regulation in a variety of organisms, but can mediate only a single activity at a time within any given cell. Here we characterize a set of fully orthogonal Cas9 proteins and demonstrate their ability to mediate simultaneous and independently targeted gene regulation and editing in bacteria and in human cells. We find that Cas9 orthologs display consistent patterns in their recognition of target sequences and identify a highly targetable protein from Neisseria meningitidis. Our results provide a basal set of orthogonal RNA-guided proteins for controlling biological systems and establish a general methodology for characterizing additional proteins and adapting them to eukaryotic cells.
Despite rapid advances in genome engineering technologies, inserting genes into precise locations in the human genome remains an outstanding problem. It has been suggested that site-specific recombinases can be adapted towards use as transgene delivery vectors. The specificity of recombinases can be altered either with directed evolution or via fusions to modular DNA-binding domains. Unfortunately, both wildtype and altered variants often have detectable activities at off-target sites. Here we use bacterial selections to identify mutations in the dimerization surface of Cre recombinase (R32V, R32M, and 303GVSdup) that improve the accuracy of recombination. The mutants are functional in bacteria, in human cells, and in vitro (except for 303GVSdup, which we did not purify), and have improved selectivity against both model off-target sites and the entire E. coli genome. We propose that destabilizing binding cooperativity may be a general strategy for improving the accuracy of dimeric DNA-binding proteins.
Prokaryotic type II CRISPR-Cas systems can be adapted to enable targeted genome modifications across a range of eukaryotes1–7. Here we engineer this system to enable RNA-guided genome regulation in human cells by tethering transcriptional activation domains either directly to a nuclease-null Cas9 protein or to an aptamer-modified single guide RNA (sgRNA). Using this functionality we developed a novel transcriptional activation–based assay to determine the landscape of off-target binding of sgRNA:Cas9 complexes and compared it with the off-target activity of transcription activator–like (TAL) effector proteins8,9. Our results reveal that specificity profiles are sgRNA dependent, and that sgRNA:Cas9 complexes and 18-mer TAL effector proteins can potentially tolerate 1–3 and 1–2 target mismatches, respectively. By engineering a requirement for cooperativity through offset nicking for genome editing or through multiple synergistic sgRNAs for robust transcriptional activation, we suggest methods to mitigate off-target phenomena. Our results expand the versatility of the sgRNA:Cas9 tool and highlight the critical need to engineer
Since its initiation in 2005, the Harvard Personal Genome Project has enrolled thousands of volunteers interested in publicly sharing their genome, health and trait data. Because these data are highly identifiable, we use an ‘open consent’ framework that purposefully excludes promises about privacy and requires participants to demonstrate comprehension prior to enrollment.
Our model of non-anonymous, public genomes has led us to a highly participatory model of researcher-participant communication and interaction. The participants, who are highly committed volunteers, self-pursue and donate research-relevant datasets, and are actively engaged in conversations with both our staff and other Personal Genome Project participants. We have quantitatively assessed these communications and donations, and report our experiences with returning research-grade whole genome data to participants. We also observe some of the community growth and discussion that has occurred related to our project.
We find that public non-anonymous data is valuable and leads to a participatory research model, which we encourage others to consider. The implementation of this model is greatly facilitated by web-based tools and methods and participant education. Project results are long-term proactive participant involvement and the growth of a community that benefits both researchers and participants.
The dynamic nature of gene expression enables cellular programming, homeostasis, and environmental adaptation in living systems. Dissection of causal gene functions in cellular and organismal processes therefore necessitates approaches that enable spatially and temporally precise modulation of gene expression. Recently, a variety of microbial and plant-derived light-sensitive proteins have been engineered as optogenetic actuators, enabling high precision spatiotemporal control of many cellular functions1-11. However, versatile and robust technologies that enable optical modulation of transcription in the mammalian endogenous genome remain elusive. Here, we describe the development of Light-Inducible Transcriptional Effectors (LITEs), an optogenetic two-hybrid system integrating the customizable TALE DNA-binding domain12-14 with the light-sensitive cryptochrome 2 protein and its interacting partner CIB1 from Arabidopsis thaliana. LITEs do not require additional exogenous chemical co-factors, are easily customized to target many endogenous genomic loci, and can be activated within minutes with reversibility3,4,6,7,15. LITEs can be packaged into viral vectors and genetically targeted to probe specific cell populations. We have applied this system in primary mouse neurons, as well as in the brain of awake mice in vivo to mediate reversible modulation of mammalian endogenous gene expression as well as targeted epigenetic chromatin modifications. The LITE system establishes a novel mode of optogenetic control of endogenous cellular processes and enables direct testing of the causal roles of genetic and epigenetic regulation in normal biological processes and disease states.
Selection has been invaluable for genetic manipulation, although counter-selection has historically exhibited limited robustness and convenience. TolC, an outer membrane pore involved in transmembrane transport in E. coli, has been implemented as a selectable/counter-selectable marker, but counter-selection escape frequency using colicin E1 precludes using tolC for inefficient genetic manipulations and/or with large libraries. Here, we leveraged unbiased deep sequencing of 96 independent lineages exhibiting counter-selection escape to identify loss-of-function mutations, which offered mechanistic insight and guided strain engineering to reduce counter-selection escape frequency by ∼40-fold. We fundamentally improved the tolC counter-selection by adding a second agent, vancomycin, which reduces counter-selection escape by 425-fold compared with colicin E1 alone. Combining these improvements in a mismatch repair-proficient strain reduced counter-selection escape frequency by 1.3E6-fold in total, making tolC counter-selection as effective as most selectable markers, and adding a valuable tool to the genome editing toolbox. These improvements permitted us to perform stable and continuous rounds of selection/counter-selection using tolC, enabling, for the first time, replacement of 10 alleles without requiring genotypic screening. Finally, we combined these advances to create an optimized E. coli strain for genome engineering that is ∼10-fold more efficient at achieving allelic diversity than previous best practices.
We have determined the three-dimensional (3D) architecture of the Caulobacter crescentus genome by combining genome-wide chromatin interaction detection, live-cell imaging, and computational modeling. Using chromosome conformation capture carbon copy (5C) technology, we derive ~13 Kb resolution 3D models of the Caulobacter genome. These models illustrate that the genome is ellipsoidal with periodically arranged arms. The parS sites, a pair of short contiguous sequence elements involved in chromosome segregation, are positioned at one pole of this structure, where they nucleate a compact chromatin conformation. Both 5C and imaging experiments demonstrate that placing these sequence elements at new genomic positions yields large-scale rotations of the genome within the cell. Utilizing automated fluorescent imaging, we orient the genome within the cell and illustrate that within the resolution of our data the parS proximal region is the only portion of the genome stably attached to the cell envelope. Our approach provides an experimental paradigm for deriving insight into the cis-determinants of 3D genome architecture.
Autoimmune disease results from a loss of tolerance to self-antigens in genetically susceptible individuals. Completely understanding this process requires that targeted antigens be identified, and so a number of techniques have been developed to determine immune receptor specificities. We previously reported the construction of a phage-displayed synthetic human peptidome and a proof-of-principle analysis of antibodies from three patients with neurological autoimmunity. Here we present data from a large-scale screen of 298 independent antibody repertoires, including those from 73 healthy sera, using phage immunoprecipitation sequencing (PhIP-Seq). The resulting database of peptide-antibody interactions characterizes each individual’s unique autoantibody fingerprint, and includes specificities found to occur frequently in the general population as well as those associated with disease. Screening type 1 diabetes (T1D) patients revealed a prematurely polyautoreactive phenotype compared with their matched controls. A collection of cerebrospinal fluids and sera from 63 multiple sclerosis patients uncovered novel as well as previously reported antibody-peptide interactions. Finally, a screen of synovial fluids and sera from 64 rheumatoid arthritis patients revealed novel disease-associated antibody specificities that were independent of seropositivity status. This work demonstrates the utility of performing PhIP-Seq screens on large numbers of individuals and is another step toward defining the full complement of autoimmunoreactivities in health and disease.
autoantigen discovery; high throughput screening; PhIP-Seq; proteomics
Broadly neutralizing HIV antibodies (bnAbs) are typically highly somatically mutated, raising doubts as to whether they can be elicited by vaccination. We used 454 sequencing and designed a novel phylogenetic method to model lineage evolution of the bnAbs PGT121–134 and found a positive correlation between the level of somatic hypermutation (SHM) and the development of neutralization breadth and potency. Strikingly, putative intermediates were characterized that show approximately half the mutation level of PGT121–134 but were still capable of neutralizing roughly 40–80% of PGT121–134 sensitive viruses in a 74-virus panel at median titers between 15- and 3-fold higher than PGT121–134. Such antibodies with lower levels of SHM may be more amenable to elicitation through vaccination while still providing noteworthy coverage. Binding characterization indicated a preference of inferred intermediates for native Env binding over monomeric gp120, suggesting that the PGT121–134 lineage may have been selected for binding to native Env at some point during maturation. Analysis of glycan-dependent neutralization for inferred intermediates identified additional adjacent glycans that comprise the epitope and suggests changes in glycan dependency or recognition over the course of affinity maturation for this lineage. Finally, patterns of neutralization of inferred bnAb intermediates suggest hypotheses as to how SHM may lead to potent and broad HIV neutralization and provide important clues for immunogen design.
A majority of the over 30 million HIV-1 infected individuals worldwide live in poorly resourced areas where multiple boost strategies, which are likely needed to generate highly mutated antibodies, present formidable logistical challenges. Accordingly, developing new vaccination strategies that are capable of generating highly mutated antibodies should be an active area of research. Another approach, that is not mutually exclusive, is to identify new bnAbs that are both broad and potent in neutralization, but are much less mutated than the bnAbs that currently exist. Here, we have identified bnAbs that are approximately half the mutation frequency of known bnAbs, but maintain high potency and moderate breadth. These less mutated bnAbs offer an important advantage in that they would likely be easier to induce through vaccination than more mutated antibodies. By characterizing these putative intermediates, we can also better estimate how affinity maturation proceeded to result in an antibody with broad and potent neutralization activity and offer more focused strategies for designing immunogens capable of eliciting these less mutated bnAbs.
We develop here a novel approach to barcode large numbers of cells through cell-surface expression of programmable zinc-finger DNA-binding domains (sZFs). We show that sZFs enable double-stranded DNA to sequence-specifically label living cells, and we also develop a sequential tagging approach to image >3 cell types in situ using just 3 fluorophores. Finally, we demonstrate their broad versatility through their ability to serve as surrogate reporters and to facilitate selective cell capture and targeting.
Cytosine methylation, an epigenetic modification of DNA, is a target of growing interest for developing high throughput profiling technologies. Here we introduce two new, complementary techniques for cytosine methylation profiling utilizing next generation sequencing technology: bisulfite padlock probes (BSPPs) and methyl sensitive cut counting (MSCC). In the first method, we designed a set of ~10,000 BSPPs distributed over the ENCODE pilot project regions to take advantage of existing expression and chromatin immunoprecipitation data. We observed a pattern of low promoter methylation coupled with high gene body methylation in highly expressed genes. Using the second method, MSCC, we gathered genome-scale data for 1.4 million HpaII sites and confirmed that gene body methylation in highly expressed genes is a consistent phenomenon over the entire genome. Our observations highlight the usefulness of techniques which are not inherently or intentionally biased in favor of only profiling particular subsets like CpG islands or promoter regions.
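As a small illustration of the first computational step that a cut-counting analysis such as MSCC implies, the sketch below locates HpaII recognition sites (CCGG) in a DNA sequence. It is a hedged sketch only: the function name and the toy sequence are hypothetical, and the abstract does not describe the authors' actual pipeline.

```python
import re

HPAII_SITE = "CCGG"  # HpaII recognition sequence; cleavage is blocked by CpG methylation


def hpaii_sites(sequence: str) -> list[int]:
    """Return 0-based start positions of every CCGG site in the sequence.

    The input is upper-cased first so soft-masked (lowercase) bases still match.
    CCGG cannot overlap with itself, so a plain non-overlapping scan finds all sites.
    """
    return [m.start() for m in re.finditer(HPAII_SITE, sequence.upper())]


# Toy sequence, purely illustrative.
print(hpaii_sites("AACCGGTTccggA"))
```

In a cut-counting design, reads would then be tallied per site, with fewer reads at a site suggesting that methylation protected it from digestion. That tallying step is not shown here.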
The discovery of diverse codon reassignment events has demonstrated that the canonical genetic code is not universal. Studying coding reassignment at the molecular level is critical for understanding genetic code evolution, and provides clues to genetic code manipulation in synthetic biology. Here we report a novel reassignment event in the mitochondria of Ashbya (Eremothecium) gossypii, a filamentous-growing plant pathogen related to yeast (Saccharomycetaceae). Bioinformatics studies of conserved positions in mitochondrial DNA-encoded proteins suggest that CUU and CUA codons correspond to alanine in A. gossypii, instead of leucine in the standard code or threonine in yeast mitochondria. Reassignment of CUA to Ala was confirmed at the protein level by mass spectrometry. We further demonstrate that a predicted tRNA is transcribed and accurately processed in vivo, and is responsible for Ala reassignment. Enzymatic studies reveal that this tRNA is efficiently recognized by A. gossypii mitochondrial alanyl-tRNA synthetase (AgAlaRS). AlaRS typically recognizes the G3:U70 base pair of tRNAAla; a G3A change in the Ashbya tRNA abolishes its recognition by AgAlaRS. Conversely, an A3G mutation in a Saccharomyces cerevisiae tRNA confers recognition by AgAlaRS. Our work highlights the dynamic nature of natural genetic codes in mitochondria, and the relative simplicity with which tRNA identity may be switched.
Conceived with the aim of meeting the needs of the neurobiology and clinical communities, the Brain Research through Advancing Innovative Technologies (BRAIN) Initiative builds on the lessons learned from major projects in genetics, such as the Human Genome Project. It concentrates on the use of new imaging technologies in conjunction with genomics to inform therapeutic decisions.
BRAIN Initiative; imaging; Genome Project; genomics
Neuroscientists have made impressive advances in understanding the microscale function of single neurons and the macroscale activity of the human brain. One can probe molecular and biophysical aspects of individual neurons and also view the human brain in action with magnetic resonance imaging (MRI) or magnetoencephalography (MEG). However, the mechanisms of perception, cognition, and action remain mysterious because they emerge from the real-time interactions of large sets of neurons in densely interconnected, widespread neural circuits.
This report describes an integrated method for in vitro construction of Escherichia coli ribosomes under near-physiological conditions. This method enables coupling of ribosome synthesis and assembly in a single, integrated system.
An integrated synthesis, assembly, and translation technology (termed iSAT) was developed to construct ribosomes in vitro. iSAT mimics co-transcription of rRNA and ribosome assembly as it occurs in vivo. iSAT makes possible the in vitro construction of modified ribosomes. iSAT is expected to aid studies of ribosome assembly and open new avenues for making ribosomes with altered capabilities.
Purely in vitro ribosome synthesis could provide a critical step towards unraveling the systems biology of ribosome biogenesis, constructing minimal cells from defined components, and engineering ribosomes with new functions. Here, as an initial step towards this goal, we report a method for constructing Escherichia coli ribosomes in crude S150 E. coli extracts. While conventional methods for E. coli ribosome reconstitution are non-physiological, our approach attempts to mimic chemical conditions in the cytoplasm, thus permitting several biological processes to occur simultaneously. Specifically, our integrated synthesis, assembly, and translation (iSAT) technology enables one-step co-activation of rRNA transcription, assembly of transcribed rRNA with native ribosomal proteins into functional ribosomes, and synthesis of active protein by these ribosomes in the same compartment. We show that iSAT makes possible the in vitro construction of modified ribosomes by introducing a 23S rRNA mutation that mediates resistance against clindamycin. We anticipate that iSAT will aid studies of ribosome assembly and open new avenues for making ribosomes with altered properties.
cell-free synthetic biology; in vitro; ribosome; transcription; translation
The function of neural circuits is an emergent property that arises from the coordinated activity of large numbers of neurons. To capture this, we propose launching a large-scale, international public effort, the Brain Activity Map Project, aimed at reconstructing the full record of neural activity across complete neural circuits. This technological challenge could prove to be an invaluable step toward understanding fundamental and pathological brain processes.
“The behavior of large and complex aggregates of elementary particles, it turns out, is not to be understood in terms of a simple extrapolation of the properties of a few particles. Instead, at each level of complexity entirely new properties appear.” –More Is Different, P.W. Anderson
“New directions in science are launched by new tools much more often than by new concepts. The effect of a concept-driven revolution is to explain old things in new ways. The effect of a tool-driven revolution is to discover new things that have to be explained.” –Imagined Worlds, Freeman Dyson
Next-generation sequencing offers many advantages over other methods of microRNA (miRNA) expression profiling, such as sample throughput and the capability to discover novel miRNAs. As the sequencing depth of current sequencing platforms exceeds what is necessary to quantify miRNAs, multiplexing several samples in one sequencing run offers a significant cost advantage. Although previous studies have achieved this goal by adding barcodes to miRNA libraries at the ligation step, this was recently shown to introduce significant bias into the miRNA expression data. This bias can be avoided, however, by barcoding the miRNA libraries at the PCR step instead. Here, we describe a user-friendly PCR barcoding method of preparing multiplexed miRNA libraries for Illumina-based sequencing. The method also prevents the production of adapter dimers and can be completed in one day.
miRNA; Illumina; Sequencing; library; multiplex; bar code
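Once barcoded libraries are pooled and sequenced together, the reads have to be separated by sample again. The sketch below shows one simple way this could be done, assuming each read arrives as an (index read, insert sequence) pair. The barcode table, sample names and one-mismatch tolerance are illustrative assumptions, not details taken from the method described above.

```python
from collections import defaultdict

# Hypothetical 6-nt index sequences mapped to made-up sample names.
BARCODES = {"ATCACG": "sample_1", "CGATGT": "sample_2", "TTAGGC": "sample_3"}


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, barcodes=BARCODES, max_mismatches=1):
    """Return the sample whose barcode uniquely matches the index read, else None."""
    hits = [
        sample
        for barcode, sample in barcodes.items()
        if len(barcode) == len(index_read)
        and hamming(barcode, index_read) <= max_mismatches
    ]
    # Ambiguous or unmatched index reads are left unassigned.
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads):
    """Group insert sequences by sample; unassigned reads go under the key None."""
    bins = defaultdict(list)
    for index_read, insert in reads:
        bins[assign_sample(index_read)].append(insert)
    return dict(bins)


reads = [("ATCACG", "TGAGGTAGTAGGTTGTATAGTT"), ("ATCACC", "TGAGGTAGTAGGTTGTATAGTT")]
print(demultiplex(reads))
```

Tolerating a mismatch is only safe when every pair of barcodes differs at enough positions. The three illustrative barcodes here differ from each other at four or more positions.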
Neuroscience is at a crossroads. Great effort is being invested into deciphering specific neural interactions and circuits. At the same time, there exist few general theories or principles that explain brain function. We attribute this disparity, in part, to limitations in current methodologies. Traditional neurophysiological approaches record the activities of one neuron or a few neurons at a time. Neurochemical approaches focus on single neurotransmitters. Yet, there is an increasing realization that neural circuits operate at emergent levels, where the interactions between hundreds or thousands of neurons, utilizing multiple chemical transmitters, generate functional states. Brains function at the nanoscale, so tools to study brains must ultimately operate at this scale, as well. Nanoscience and nanotechnology are poised to provide a rich toolkit of novel methods to explore brain function by enabling simultaneous measurement and manipulation of activity of thousands or even millions of neurons. We and others refer to this goal as the Brain Activity Mapping Project. In this Nano Focus, we discuss how recent developments in nanoscale analysis tools and in the design and synthesis of nanomaterials have generated optical, electrical, and chemical methods that can readily be adapted for use in neuroscience. These approaches represent exciting areas of technical development and research. Moreover, unique opportunities exist for nanoscientists, nanotechnologists, and other physical scientists and engineers to contribute to tackling the challenging problems involved in understanding the fundamentals of brain function.