Selenocysteine (Sec) is naturally incorporated into proteins by recoding the stop codon UGA. Sec is not hardwired to UGA, as we found the Sec insertion machinery to be able to site-specifically incorporate Sec directed by 58 of the 64 codons. For 15 sense codons, complete conversion of the codon meaning from canonical amino acid to Sec was observed along with a 10-fold increase in selenoprotein yield compared to Sec insertion at the three stop codons. This high-fidelity sense-codon recoding mechanism was demonstrated for Escherichia coli formate dehydrogenase and recombinant human thioredoxin reductase and confirmed by independent biochemical and biophysical methods. Although Sec insertion at UGA is known to compete against protein termination, it is surprising that the Sec machinery has the ability to outcompete abundant aminoacyl-tRNAs in decoding sense codons. The findings have implications for the process of translation and the information storage capacity of the biological cell.
genetic code; sense codon recoding; RNA engineering; selenocysteine; synthetic biology
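As a rough illustration of what "converting the meaning of a codon" involves, the sketch below translates a coding sequence with the standard genetic code and then with a single codon reassigned to Sec (one-letter code U). The `translate` function and its `recoded` parameter are hypothetical names introduced here. The sketch uses the DNA alphabet (TGA for UGA) and deliberately ignores the Sec insertion machinery itself, so it shows only the change in the lookup table, not the authors' method.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, recoded=None):
    """Translate a coding sequence, optionally overriding codon meanings.

    `recoded` maps codons to one-letter amino acid codes, for example
    {"TGA": "U"} to read the TGA stop codon as selenocysteine.
    Translation stops at the first codon that still means stop ('*').
    """
    table = dict(STANDARD_CODE)
    table.update(recoded or {})
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = table[dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

Calling `translate("ATGTGATTT")` stops at the TGA codon, whereas passing `recoded={"TGA": "U"}` reads through it; passing a sense codon such as `{"TTT": "U"}` models the sense-codon case in the same way.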
Complete Genome Sequences of T4-Like Bacteriophages RB3, RB5, RB6, RB7, RB9, RB10, RB27, RB33, RB55, RB59, and RB68
T4-like bacteriophages have been explored for phage therapy and are model organisms for phage genomics and evolution. Here, we describe the sequencing of 11 T4-like phages. We found a high nucleotide similarity among the T4, RB55, and RB59; RB32 and RB33; and RB3, RB5, RB6, RB7, RB9, and RB10 phages.
In contrast with advances in massively parallel DNA sequencing1, high-throughput protein analyses2-4 are often limited by ensemble measurements, individual analyte purification and hence compromised quality and cost-effectiveness. Single-molecule (SM) protein detection achieved using optical methods5 is limited by the number of spectrally nonoverlapping chromophores. Here, we introduce a single molecular interaction-sequencing (SMI-Seq) technology for parallel protein interaction profiling leveraging SM advantages. DNA barcodes are attached to proteins collectively via ribosome display6 or individually via enzymatic conjugation. Barcoded proteins are assayed en masse in aqueous solution and subsequently immobilized in a polyacrylamide (PAA) thin film to construct a random SM array, where barcoding DNAs are amplified into in situ polymerase colonies (polonies)7 and analyzed by DNA sequencing. This method allows precise quantification of various proteins with a theoretical maximum array density of over one million polonies per square millimeter. Furthermore, protein interactions can be measured based on the statistics of colocalized polonies arising from barcoding DNAs of interacting proteins. Two demanding applications, G-protein coupled receptor (GPCR) and antibody binding profiling, were demonstrated. SMI-Seq enables “library vs. library” screening in a one-pot assay, simultaneously interrogating molecular binding affinity and specificity.
To fully understand human biology and link genotype to phenotype, the phase of DNA variants must be known. Here we present a comprehensive analysis of haplotype-resolved genomes to assess the nature and variation of haplotypes and their pairs, diplotypes, in European population samples. We use a set of 14 haplotype-resolved genomes generated by fosmid clone-based sequencing, complemented and expanded by up to 372 statistically resolved genomes from the 1000 Genomes Project. We find immense diversity of both haploid and diploid gene forms, up to 4.1 and 3.9 million corresponding to 249 and 235 per gene on average. Less than 15% of autosomal genes have a predominant form. We describe a ‘common diplotypic proteome’, a set of 4,269 genes encoding two different proteins in over 30% of genomes. We show moreover an abundance of cis configurations of mutations in the 386 genomes with an average cis/trans ratio of 60:40, and distinguishable classes of cis- versus trans-abundant genes. This work identifies key features characterizing the diplotypic nature of human genomes and provides a conceptual and analytical framework, rich resources and novel hypotheses on the functional importance of diploidy.
Knowing which genetic variants exist on either parental chromosome requires diploid human genomes to be phased. Here the authors generate haplotype-resolved genomes and identify a large diversity of haploid and diploid gene forms, a common diplotypic proteome, and an abundance of cis configurations of mutations, highlighting the functional importance of diploidy.
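The cis/trans distinction can be made concrete with a small sketch. It assumes a simplified definition that may differ from the one used in the study: a gene with at least two heterozygous mutations is "cis" if all of them lie on the same haplotype and "trans" if they are split across both. The function names and the input layout (two sets of mutated positions per gene) are hypothetical.

```python
from collections import Counter


def classify_gene(hap1_mutations, hap2_mutations):
    """Return 'cis', 'trans', or None for one gene.

    Inputs are sets of mutated positions on each haplotype. Positions
    present on both haplotypes (homozygous) carry no phase information
    and are ignored. Genes with fewer than two heterozygous mutations
    return None because cis/trans is undefined for them.
    """
    only1 = set(hap1_mutations) - set(hap2_mutations)
    only2 = set(hap2_mutations) - set(hap1_mutations)
    if len(only1) + len(only2) < 2:
        return None
    return "cis" if not only1 or not only2 else "trans"


def cis_fraction(genes):
    """Fraction of classifiable genes that are in cis configuration."""
    counts = Counter(
        label
        for hap1, hap2 in genes.values()
        if (label := classify_gene(hap1, hap2)) is not None
    )
    total = counts["cis"] + counts["trans"]
    return counts["cis"] / total if total else float("nan")
```

With this definition, a reported cis/trans ratio of 60:40 would correspond to `cis_fraction` returning 0.6 over the classifiable genes.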
Studying monogenic mitochondrial cardiomyopathies may yield insights into mitochondrial roles in cardiac development and disease. Here, we combine patient-derived and genetically engineered iPSCs with tissue engineering to elucidate the pathophysiology underlying the cardiomyopathy of Barth syndrome (BTHS), a mitochondrial disorder caused by mutation of the gene Tafazzin (TAZ). Using BTHS iPSC-derived cardiomyocytes (iPSC-CMs), we defined metabolic, structural, and functional abnormalities associated with TAZ mutation. BTHS iPSC-CMs assembled sparse and irregular sarcomeres, and engineered BTHS “heart on chip” tissues contracted weakly. Gene replacement and genome editing demonstrated that TAZ mutation is necessary and sufficient for these phenotypes. Sarcomere assembly and myocardial contraction abnormalities occurred in the context of normal whole cell ATP levels. Excess levels of reactive oxygen species mechanistically linked TAZ mutation to impaired cardiomyocyte function. Our study provides new insights into the pathogenesis of Barth syndrome, suggests new treatment strategies, and advances iPSC-based in vitro modeling of cardiomyopathy.
Gene drives may be capable of addressing ecological problems by altering entire populations of wild organisms, but their use has remained largely theoretical due to technical constraints. Here we consider the potential for RNA-guided gene drives based on the CRISPR nuclease Cas9 to serve as a general method for spreading altered traits through wild populations over many generations. We detail likely capabilities, discuss limitations, and provide novel precautionary strategies to control the spread of gene drives and reverse genomic changes. The ability to edit populations of sexual species would offer substantial benefits to humanity and the environment. For example, RNA-guided gene drives could potentially prevent the spread of disease, support agriculture by reversing pesticide and herbicide resistance in insects and weeds, and control damaging invasive species. However, the possibility of unwanted ecological effects and near-certainty of spread across political borders demand careful assessment of each potential application. We call for thoughtful, inclusive, and well-informed public discussions to explore the responsible use of this currently theoretical technology.
gene drive; ecological engineering; population engineering; cas9; CRISPR; emerging technology
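To see why a drive can spread over generations where an ordinary allele would not, a minimal deterministic sketch follows. It rests on strong assumptions that are not taken from the article: random mating, an effectively infinite population, no fitness cost, no resistance alleles, and a fixed `homing_efficiency` c with which heterozygotes convert the wild-type allele in their germline. Under those assumptions heterozygotes transmit the drive with probability (1 + c)/2, which gives the recursion q' = q + c·q·(1 − q) for the drive allele frequency q. The function name and parameters are hypothetical.

```python
def drive_frequency(q0, homing_efficiency, generations):
    """Drive allele frequency per generation under an idealized model.

    q0: initial drive allele frequency (0..1)
    homing_efficiency: fraction of heterozygote germline alleles
        converted to the drive (0 = Mendelian inheritance, 1 = perfect)
    Returns a list of length generations + 1, starting with q0.
    """
    freqs = [q0]
    q = q0
    for _ in range(generations):
        # Heterozygotes (frequency 2q(1-q)) pass on the drive with
        # probability (1 + c) / 2 instead of 1/2.
        q = q + homing_efficiency * q * (1.0 - q)
        freqs.append(q)
    return freqs
```

With `homing_efficiency=0` the frequency stays at its starting value, as expected for a neutral Mendelian allele; any positive value makes the frequency rise toward 1 in this idealized setting.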
Advances in cellular reprogramming and stem cell differentiation now enable ex
vivo studies of human neuronal differentiation. However, it remains challenging to
elucidate the underlying regulatory programs because differentiation protocols are laborious and
often result in low neuron yields. Here, we overexpressed two Neurogenin transcription factors in
human induced pluripotent stem cells and obtained neurons with bipolar morphology in 4 days, at
greater than 90% purity. The high purity enabled mRNA and microRNA expression profiling
during neurogenesis, thus revealing the genetic programs involved in the rapid transition from stem
cell to neuron. The resulting cells exhibited transcriptional, morphological and functional
signatures of differentiated neurons, with greatest transcriptional similarity to prenatal human
brain samples. Our analysis revealed a network of key transcription factors and microRNAs that
promoted loss of pluripotency and rapid neurogenesis via progenitor states. Perturbations of key
transcription factors affected homogeneity and phenotypic properties of the resulting neurons,
suggesting that a systems-level view of the molecular biology of differentiation may guide
subsequent manipulation of human stem cells to rapidly obtain diverse neuronal types.
gene regulatory networks; microRNAs; neurogenesis; stem cell differentiation; transcriptomics
Understanding the spatial organization of gene expression with single nucleotide
resolution requires localizing the sequences of expressed RNA transcripts within a cell
in situ. Here we describe fluorescent in situ RNA
sequencing (FISSEQ), in which stably cross-linked cDNA amplicons are sequenced within a
biological sample. Using 30-base reads from 8,742 genes in situ, we
examined RNA expression and localization in human primary fibroblasts using a simulated
wound healing assay. FISSEQ is compatible with tissue sections and whole mount embryos,
and reduces the limitations of optical resolution and noisy signals on single molecule
detection. Our platform enables massively parallel detection of genetic elements,
including gene transcripts and molecular barcodes, and can be used to investigate cellular
phenotype, gene regulation, and environment in situ.
In this study, we improve on current autoantigen discovery approaches by creating a synthetic representation of the complete human proteome, the T7 “peptidome” phage display library (T7-Pep), and use it to profile the autoantibody repertoires of individual patients. We provide methods for 1) designing and cloning large libraries of DNA microarray-derived oligonucleotides encoding peptides for display on bacteriophage, and 2) analysis of the peptide libraries using high throughput DNA sequencing. We applied phage immunoprecipitation sequencing (PhIP-Seq) to identify both known and novel autoantibodies contained in the spinal fluid of three patients with paraneoplastic neurological syndromes. We also show how our approach can be used more generally to identify peptide-protein interactions and point toward ways in which this technology will be further developed in the future. We envision that PhIP-Seq can become an important new tool in autoantibody analysis, as well as proteomic research in general.
Synthetic biology; proteomics; phage display; humoral autoimmunity; paraneoplastic neurological disorder; protein-protein interactions
Cell-free RNA and protein synthesis (CFPS) is increasingly used for protein production as yields increase and costs decrease. Advances in reconstituted CFPS systems such as the Protein synthesis Using Recombinant Elements (PURE) system offer new opportunities to tailor the reactions for specialized applications including in vitro protein evolution, protein microarrays, isotopic labeling, and incorporating unnatural amino acids. In this study, using firefly luciferase synthesis as a reporter system, we improved PURE system productivity up to 5-fold by adding or adjusting a variety of factors that affect transcription and translation, including elongation factors (EF-Ts, EF-Tu, EF-G, and EF4), ribosome recycling factor (RRF), release factors (RF1, RF2, RF3), chaperones (GroEL/ES), BSA and tRNAs. The work provides a more efficient defined in vitro transcription and translation system and a deeper understanding of the factors that limit the efficiency of the whole system.
Motivation: The naked mole rat (Heterocephalus glaber) is an exceptionally long-lived and cancer-resistant rodent native to East Africa. Although its genome was previously sequenced, here we report a new assembly sequenced by us with substantially higher N50 values for scaffolds and contigs.
Results: We analyzed the annotation of this new improved assembly and identified candidate genomic adaptations which may have contributed to the evolution of the naked mole rat’s extraordinary traits, including in regions of p53, and the hyaluronan receptors CD44 and HMMR (RHAMM). Furthermore, we developed a freely available web portal, the Naked Mole Rat Genome Resource (http://www.naked-mole-rat.org), featuring the data and results of our analysis, to assist researchers interested in the genome and genes of the naked mole rat, and also to facilitate further studies on this fascinating species.
Availability and implementation: The Naked Mole Rat Genome Resource is freely available online at http://www.naked-mole-rat.org. This resource is open source and the source code is available at https://github.com/maglab/naked-mole-rat-portal.
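For readers unfamiliar with the N50 statistic mentioned under Motivation, the following sketch computes it from a list of contig or scaffold lengths using the usual definition: the length L such that sequences of length L or longer contain at least half of the total assembled bases. The function name is hypothetical and the code is not taken from the portal's source.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half
        # of the assembly is the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one length")
```

A higher N50 for the same total assembly size indicates that more of the genome sits in long, contiguous pieces.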
CRISPR-Cas systems have been used with single-guide RNAs for accurate gene disruption and conversion in multiple biological systems. Here we report the use of the endonuclease Cas9 to target genomic sequences in the C. elegans germline, utilizing single-guide RNAs that are expressed from a U6 small nuclear RNA promoter. Our results demonstrate that targeted, heritable genetic alterations can be achieved in C. elegans, providing a convenient and effective approach for generating loss-of-function mutants.
Rapid advancement of next generation sequencing technologies has made it
possible to study expression profiles of microRNAs (miRNAs) comprehensively and
efficiently. We have previously shown that multiplexing miRNA libraries by
barcoding can significantly reduce sequencing cost per sample without
compromising library quality [Alon et al. 2011, Vigneault et al. 2012].
In this unit, we provide a step-by-step protocol to isolate miRNAs and construct
multiplexed miRNA libraries. We also describe a custom computational pipeline
designed to analyze the multiplexed miRNA library sequencing reads generated by
De novo synthesis of long double-stranded DNA constructs
has a myriad of applications in biology and biological engineering. However, its
widespread adoption has been hindered by high costs. Cost can be significantly
reduced by using oligonucleotides synthesized on high-density DNA chips.
However, most methods for using off-chip DNA for gene synthesis have failed to
scale due to the high error rates, low yields, and high chemical complexity of
the chip-synthesized oligonucleotides. We have recently demonstrated that some
commercial DNA chip manufacturers have improved error rates, and that the issues
of chemical complexity and low yields can be solved by using barcoded primers to
accurately and efficiently amplify subpools of oligonucleotides. This article
includes protocols for computationally designing the DNA chip, amplifying the
oligonucleotide subpools, and assembling 500-800 basepair (bp) constructs.
oligonucleotide; gene synthesis; nucleic acids; synthetic biology
Advances in next-generation technologies have rapidly improved sequencing fidelity and significantly decreased sequencing error rates. However, with billions of nucleotides in a human genome, even low experimental error rates yield many errors in variant calls. Erroneous variants can mimic true somatic and rare variants, thus requiring costly confirmatory experiments to minimize the number of false positives. Here we discuss sources of experimental error in next-generation sequencing and how replicates can be used to abate them.
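A back-of-the-envelope calculation shows why replicates help. The sketch below assumes, purely for illustration, that errors occur independently at each site and in each replicate, that an erroneous call is equally likely to be any of the three alternative bases, and that a false variant survives only if every replicate shows the same wrong base. The per-site error rate used in the usage note is a hypothetical figure, not one from the article.

```python
def expected_false_positives(genome_size, error_rate, replicates=1):
    """Expected number of sites with the same erroneous call in all replicates.

    The first replicate errs with probability `error_rate`; each further
    replicate must independently make the same wrong call, which under
    the uniform-substitution assumption has probability error_rate / 3.
    """
    if replicates < 1:
        raise ValueError("replicates must be at least 1")
    per_site = error_rate * (error_rate / 3.0) ** (replicates - 1)
    return genome_size * per_site
```

For a genome of 3 billion nucleotides and a hypothetical per-site error rate of one in a million, a single experiment would be expected to yield about 3,000 erroneous calls, whereas requiring agreement between two independent replicates brings the expectation down to about 0.001. Systematic errors that recur across replicates violate the independence assumption and are not captured by this estimate.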
RNA-guided Cas9 nucleases derived from clustered regularly interspaced short palindromic repeats (CRISPR)-Cas systems have dramatically transformed our ability to edit the genomes of diverse organisms. We believe tools and techniques based on Cas9, a single unifying factor capable of colocalizing RNA, DNA and protein, will grant unprecedented control over cellular organization, regulation and behavior. Here we describe the Cas9 targeting methodology, detail current and prospective engineering advances and suggest potential applications ranging from basic science to the clinic.
Bacteria rely on two known DNA-level defenses against their bacteriophage predators: restriction-modification and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-CRISPR-associated (Cas) systems. Certain phages have evolved countermeasures that are known to block endonucleases. For example, phage T4 not only adds hydroxymethyl groups to all of its cytosines, but also glucosylates them, a strategy that defeats almost all restriction enzymes. We sought to determine whether these DNA modifications can similarly impede CRISPR-based defenses. In a bioinformatics search, we found naturally occurring CRISPR spacers that potentially target phages known to modify their DNA. Experimentally, we show that the Cas9 nuclease from the Type II CRISPR system of Streptococcus pyogenes can overcome a variety of DNA modifications in Escherichia coli. The levels of Cas9-mediated phage resistance to bacteriophage T4 and the mutant phage T4 gt, which contains hydroxymethylated but not glucosylated cytosines, were comparable to those for phages with unmodified cytosines, T7 and the T4-like phage RB49. Our results demonstrate that Cas9 is not impeded by N6-methyladenine, 5-methylcytosine, 5-hydroxymethylated cytosine, or glucosylated 5-hydroxymethylated cytosine.
Major advances in genome editing have recently been made possible with the
development of the TALEN and CRISPR/Cas9 methods. The speed and ease of
implementing these technologies has led to an explosion of mutant and transgenic
organisms. A rate-limiting step in efficiently applying TALEN and CRISPR/Cas9
methods is the selection and design of targeting constructs. We have developed
an online tool, CHOPCHOP (https://chopchop.rc.fas.harvard.edu), to expedite the design
process. CHOPCHOP accepts a wide range of inputs (gene identifiers, genomic
regions or pasted sequences) and provides an array of advanced options for
target selection. It uses efficient sequence alignment algorithms to minimize
search times, and rigorously predicts off-target binding of single-guide RNAs
(sgRNAs) and TALENs. Each query produces an interactive visualization of the
gene with candidate target sites displayed at their genomic positions and
color-coded according to quality scores. In addition, for each possible target
site, restriction sites and primer candidates are visualized, facilitating a
streamlined pipeline of mutant generation and validation. The ease-of-use and
speed of CHOPCHOP make it a valuable tool for genome engineering.
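The first step of sgRNA target selection can be sketched in a few lines. The example below is not CHOPCHOP's algorithm; it is a minimal illustration that scans only the forward strand of a sequence for a 20-nucleotide protospacer followed by the NGG protospacer-adjacent motif recognized by Streptococcus pyogenes Cas9, and it performs no off-target scoring. The function name and return layout are hypothetical.

```python
import re


def find_cas9_sites(sequence, protospacer_len=20):
    """List candidate SpCas9 target sites on the forward strand.

    Returns (start, protospacer, pam) tuples, where `start` is the
    0-based position of the protospacer and `pam` matches NGG.
    A lookahead is used so that overlapping sites are all reported.
    """
    seq = sequence.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]
```

A fuller tool would also scan the reverse complement, filter candidates by predicted off-target binding, and rank them, which is the part of the workflow the abstract attributes to CHOPCHOP's alignment and scoring steps.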
Adenosine-to-inosine (A-to-I) RNA editing, in which genomically encoded adenosine is changed to inosine in RNA, is catalyzed by adenosine deaminase acting on RNA (ADAR). This fine-tuning mechanism is critical during normal development and diseases, particularly in relation to brain functions. A-to-I RNA editing has also been hypothesized to be a driving force in human brain evolution. A large number of RNA editing sites have recently been identified, mostly as a result of the development of deep sequencing and bioinformatic analyses. Deciphering the functional consequences of RNA editing events is challenging, but emerging genome engineering approaches may expedite new discoveries. To understand how RNA editing is dynamically regulated, it is imperative to construct a spatiotemporal atlas at the species, tissue and cell levels. Future studies will need to identify the cis and trans regulatory factors that drive the selectivity and frequency of RNA editing. We anticipate that recent technological advancements will aid researchers in acquiring a much deeper understanding of the functions and regulation of RNA editing.
The Cas9 protein from the Streptococcus pyogenes CRISPR-Cas immune system has been adapted for both RNA-guided genome editing and gene regulation in a variety of organisms, but can mediate only a single activity at a time within any given cell. Here we characterize a set of fully orthogonal Cas9 proteins and demonstrate their ability to mediate simultaneous and independently targeted gene regulation and editing in bacteria and in human cells. We find that Cas9 orthologs display consistent patterns in their recognition of target sequences and identify a highly targetable protein from Neisseria meningitidis. Our results provide a basal set of orthogonal RNA-guided proteins for controlling biological systems and establish a general methodology for characterizing additional proteins and adapting them to eukaryotic cells.
Despite rapid advances in genome engineering technologies, inserting genes into precise locations in the human genome remains an outstanding problem. It has been suggested that site-specific recombinases can be adapted towards use as transgene delivery vectors. The specificity of recombinases can be altered either with directed evolution or via fusions to modular DNA-binding domains. Unfortunately, both wild-type and altered variants often have detectable activities at off-target sites. Here we use bacterial selections to identify mutations in the dimerization surface of Cre recombinase (R32V, R32M, and 303GVSdup) that improve the accuracy of recombination. The mutants are functional in bacteria, in human cells, and in vitro (except for 303GVSdup, which we did not purify), and have improved selectivity against both model off-target sites and the entire E. coli genome. We propose that destabilizing binding cooperativity may be a general strategy for improving the accuracy of dimeric DNA-binding proteins.
Prokaryotic type II CRISPR-Cas systems can be adapted to enable targeted
genome modifications across a range of eukaryotes1–7. Here we engineer this system to enable RNA-guided genome
regulation in human cells by tethering transcriptional activation domains either
directly to a nuclease-null Cas9 protein or to an aptamer-modified single guide
RNA (sgRNA). Using this functionality we developed a novel transcriptional
activation–based assay to determine the landscape of off-target binding
of sgRNA:Cas9 complexes and compared it with the off-target activity of
transcription activator–like (TAL) effector proteins8, 9.
Our results reveal that specificity profiles are sgRNA dependent, and that
sgRNA:Cas9 complexes and 18-mer TAL effector proteins can potentially tolerate
1–3 and 1–2 target mismatches, respectively. By engineering a
requirement for cooperativity through offset nicking for genome editing or
through multiple synergistic sgRNAs for robust transcriptional activation, we
suggest methods to mitigate off-target phenomena. Our results expand the
versatility of the sgRNA:Cas9 tool and highlight the critical need to engineer
Since its initiation in 2005, the Harvard Personal Genome Project has enrolled thousands of volunteers interested in publicly sharing their genome, health and trait data. Because these data are highly identifiable, we use an ‘open consent’ framework that purposefully excludes promises about privacy and requires participants to demonstrate comprehension prior to enrollment.
Our model of non-anonymous, public genomes has led us to a highly participatory model of researcher-participant communication and interaction. The participants, who are highly committed volunteers, self-pursue and donate research-relevant datasets, and are actively engaged in conversations with both our staff and other Personal Genome Project participants. We have quantitatively assessed these communications and donations, and report our experiences with returning research-grade whole genome data to participants. We also observe some of the community growth and discussion that has occurred related to our project.
We find that public non-anonymous data is valuable and leads to a participatory research model, which we encourage others to consider. The implementation of this model is greatly facilitated by web-based tools and methods and participant education. The project has resulted in long-term, proactive participant involvement and the growth of a community that benefits both researchers and participants.
The dynamic nature of gene expression enables cellular programming, homeostasis, and environmental adaptation in living systems. Dissection of causal gene functions in cellular and organismal processes therefore necessitates approaches that enable spatially and temporally precise modulation of gene expression. Recently, a variety of microbial and plant-derived light-sensitive proteins have been engineered as optogenetic actuators, enabling high precision spatiotemporal control of many cellular functions1-11. However, versatile and robust technologies that enable optical modulation of transcription in the mammalian endogenous genome remain elusive. Here, we describe the development of Light-Inducible Transcriptional Effectors (LITEs), an optogenetic two-hybrid system integrating the customizable TALE DNA-binding domain12-14 with the light-sensitive cryptochrome 2 protein and its interacting partner CIB1 from Arabidopsis thaliana. LITEs do not require additional exogenous chemical co-factors, are easily customized to target many endogenous genomic loci, and can be activated within minutes with reversibility3,4,6,7,15. LITEs can be packaged into viral vectors and genetically targeted to probe specific cell populations. We have applied this system in primary mouse neurons, as well as in the brain of awake mice in vivo to mediate reversible modulation of mammalian endogenous gene expression as well as targeted epigenetic chromatin modifications. The LITE system establishes a novel mode of optogenetic control of endogenous cellular processes and enables direct testing of the causal roles of genetic and epigenetic regulation in normal biological processes and disease states.