Results 1-25 (103)

1.  Recoding the Genetic Code with Selenocysteine 
Selenocysteine (Sec) is naturally incorporated into proteins by recoding the stop codon UGA. Sec is not hardwired to UGA, as we found the Sec insertion machinery to be able to site-specifically incorporate Sec directed by 58 of the 64 codons. For 15 sense codons, complete conversion of the codon meaning from canonical amino acid to Sec was observed along with a 10-fold increase in selenoprotein yield compared to Sec insertion at the three stop codons. This high-fidelity sense-codon recoding mechanism was demonstrated for Escherichia coli formate dehydrogenase and recombinant human thioredoxin reductase and confirmed by independent biochemical and biophysical methods. Although Sec insertion at UGA is known to compete against protein termination, it is surprising that the Sec machinery has the ability to outcompete abundant aminoacyl-tRNAs in decoding sense codons. The findings have implications for the process of translation and the information storage capacity of the biological cell.
PMCID: PMC4004526  PMID: 24511637
genetic code; sense codon recoding; RNA engineering; selenocysteine; synthetic biology
2.  Complete Genome Sequences of T4-Like Bacteriophages RB3, RB5, RB6, RB7, RB9, RB10, RB27, RB33, RB55, RB59, and RB68 
Genome Announcements  2015;3(1):e01122-14.
T4-like bacteriophages have been explored for phage therapy and are model organisms for phage genomics and evolution. Here, we describe the sequencing of 11 T4-like phages. We found a high nucleotide similarity among the T4, RB55, and RB59; RB32 and RB33; and RB3, RB5, RB6, RB7, RB9, and RB10 phages.
PMCID: PMC4293622  PMID: 25555735
3.  Multiplex single-molecule interaction profiling of DNA barcoded proteins 
Nature  2014;515(7528):554-557.
In contrast with advances in massively parallel DNA sequencing [1], high-throughput protein analyses [2-4] are often limited by ensemble measurements, individual analyte purification and hence compromised quality and cost-effectiveness. Single-molecule (SM) protein detection achieved using optical methods [5] is limited by the number of spectrally nonoverlapping chromophores. Here, we introduce a single molecular interaction-sequencing (SMI-Seq) technology for parallel protein interaction profiling leveraging SM advantages. DNA barcodes are attached to proteins collectively via ribosome display [6] or individually via enzymatic conjugation. Barcoded proteins are assayed en masse in aqueous solution and subsequently immobilized in a polyacrylamide (PAA) thin film to construct a random SM array, where barcoding DNAs are amplified into in situ polymerase colonies (polonies) [7] and analyzed by DNA sequencing. This method allows precise quantification of various proteins with a theoretical maximum array density of over one million polonies per square millimeter. Furthermore, protein interactions can be measured based on the statistics of colocalized polonies arising from barcoding DNAs of interacting proteins. Two demanding applications, G-protein coupled receptor (GPCR) and antibody binding profiling, were demonstrated. SMI-Seq enables “library vs. library” screening in a one-pot assay, simultaneously interrogating molecular binding affinity and specificity.
PMCID: PMC4246050  PMID: 25252978
4.  Multiple haplotype-resolved genomes reveal population patterns of gene and protein diplotypes 
Nature Communications  2014;5:5569.
To fully understand human biology and link genotype to phenotype, the phase of DNA variants must be known. Here we present a comprehensive analysis of haplotype-resolved genomes to assess the nature and variation of haplotypes and their pairs, diplotypes, in European population samples. We use a set of 14 haplotype-resolved genomes generated by fosmid clone-based sequencing, complemented and expanded by up to 372 statistically resolved genomes from the 1000 Genomes Project. We find immense diversity of both haploid and diploid gene forms, up to 4.1 and 3.9 million corresponding to 249 and 235 per gene on average. Less than 15% of autosomal genes have a predominant form. We describe a ‘common diplotypic proteome’, a set of 4,269 genes encoding two different proteins in over 30% of genomes. We show moreover an abundance of cis configurations of mutations in the 386 genomes with an average cis/trans ratio of 60:40, and distinguishable classes of cis- versus trans-abundant genes. This work identifies key features characterizing the diplotypic nature of human genomes and provides a conceptual and analytical framework, rich resources and novel hypotheses on the functional importance of diploidy.
Knowing which genetic variants exist on either parental chromosome requires diploid human genomes to be phased. Here the authors generate haplotype-resolved genomes and identify a large diversity of haploid and diploid gene forms, a common diplotypic proteome, and an abundance of cis configurations of mutations, highlighting the functional importance of diploidy.
PMCID: PMC4263165  PMID: 25424553
5.  Modeling the mitochondrial cardiomyopathy of Barth syndrome with iPSC and heart-on-chip technologies 
Nature medicine  2014;20(6):616-623.
Studying monogenic mitochondrial cardiomyopathies may yield insights into mitochondrial roles in cardiac development and disease. Here, we combine patient-derived and genetically engineered iPSCs with tissue engineering to elucidate the pathophysiology underlying the cardiomyopathy of Barth syndrome (BTHS), a mitochondrial disorder caused by mutation of the gene Tafazzin (TAZ). Using BTHS iPSC-derived cardiomyocytes (iPSC-CMs), we defined metabolic, structural, and functional abnormalities associated with TAZ mutation. BTHS iPSC-CMs assembled sparse and irregular sarcomeres, and engineered BTHS “heart on chip” tissues contracted weakly. Gene replacement and genome editing demonstrated that TAZ mutation is necessary and sufficient for these phenotypes. Sarcomere assembly and myocardial contraction abnormalities occurred in the context of normal whole cell ATP levels. Excess levels of reactive oxygen species mechanistically linked TAZ mutation to impaired cardiomyocyte function. Our study provides new insights into the pathogenesis of Barth syndrome, suggests new treatment strategies, and advances iPSC-based in vitro modeling of cardiomyopathy.
PMCID: PMC4172922  PMID: 24813252
6.  Concerning RNA-guided gene drives for the alteration of wild populations 
eLife  2014;3:e03401.
Gene drives may be capable of addressing ecological problems by altering entire populations of wild organisms, but their use has remained largely theoretical due to technical constraints. Here we consider the potential for RNA-guided gene drives based on the CRISPR nuclease Cas9 to serve as a general method for spreading altered traits through wild populations over many generations. We detail likely capabilities, discuss limitations, and provide novel precautionary strategies to control the spread of gene drives and reverse genomic changes. The ability to edit populations of sexual species would offer substantial benefits to humanity and the environment. For example, RNA-guided gene drives could potentially prevent the spread of disease, support agriculture by reversing pesticide and herbicide resistance in insects and weeds, and control damaging invasive species. However, the possibility of unwanted ecological effects and near-certainty of spread across political borders demand careful assessment of each potential application. We call for thoughtful, inclusive, and well-informed public discussions to explore the responsible use of this currently theoretical technology.
PMCID: PMC4117217  PMID: 25035423
gene drive; ecological engineering; population engineering; cas9; CRISPR; emerging technology
7.  Rapid neurogenesis through transcriptional activation in human stem cells 
Molecular Systems Biology  2014;10(11):760.
Advances in cellular reprogramming and stem cell differentiation now enable ex vivo studies of human neuronal differentiation. However, it remains challenging to elucidate the underlying regulatory programs because differentiation protocols are laborious and often result in low neuron yields. Here, we overexpressed two Neurogenin transcription factors in human-induced pluripotent stem cells and obtained neurons with bipolar morphology in 4 days, at greater than 90% purity. The high purity enabled mRNA and microRNA expression profiling during neurogenesis, thus revealing the genetic programs involved in the rapid transition from stem cell to neuron. The resulting cells exhibited transcriptional, morphological and functional signatures of differentiated neurons, with greatest transcriptional similarity to prenatal human brain samples. Our analysis revealed a network of key transcription factors and microRNAs that promoted loss of pluripotency and rapid neurogenesis via progenitor states. Perturbations of key transcription factors affected homogeneity and phenotypic properties of the resulting neurons, suggesting that a systems-level view of the molecular biology of differentiation may guide subsequent manipulation of human stem cells to rapidly obtain diverse neuronal types.
PMCID: PMC4299601  PMID: 25403753
gene regulatory networks; microRNAs; neurogenesis; stem cell differentiation; transcriptomics
8.  Highly multiplexed subcellular RNA sequencing in situ 
Science (New York, N.Y.)  2014;343(6177):1360-1363.
Understanding the spatial organization of gene expression with single nucleotide resolution requires localizing the sequences of expressed RNA transcripts within a cell in situ. Here we describe fluorescent in situ RNA sequencing (FISSEQ), in which stably cross-linked cDNA amplicons are sequenced within a biological sample. Using 30-base reads from 8,742 genes in situ, we examined RNA expression and localization in human primary fibroblasts using a simulated wound healing assay. FISSEQ is compatible with tissue sections and whole mount embryos, and reduces the limitations of optical resolution and noisy signals on single molecule detection. Our platform enables massively parallel detection of genetic elements, including gene transcripts and molecular barcodes, and can be used to investigate cellular phenotype, gene regulation, and environment in situ.
PMCID: PMC4140943  PMID: 24578530
9.  Application of a synthetic human proteome to autoantigen discovery through PhIP-Seq 
Nature biotechnology  2011;29(6):535-541.
In this study, we improve on current autoantigen discovery approaches by creating a synthetic representation of the complete human proteome, the T7 “peptidome” phage display library (T7-Pep), and use it to profile the autoantibody repertoires of individual patients. We provide methods for 1) designing and cloning large libraries of DNA microarray-derived oligonucleotides encoding peptides for display on bacteriophage, and 2) analysis of the peptide libraries using high throughput DNA sequencing. We applied phage immunoprecipitation sequencing (PhIP-Seq) to identify both known and novel autoantibodies contained in the spinal fluid of three patients with paraneoplastic neurological syndromes. We also show how our approach can be used more generally to identify peptide-protein interactions and point toward ways in which this technology will be further developed in the future. We envision that PhIP-Seq can become an important new tool in autoantibody analysis, as well as proteomic research in general.
PMCID: PMC4169279  PMID: 21602805
Synthetic biology; proteomics; phage display; humoral autoimmunity; paraneoplastic neurological disorder; protein-protein interactions
10.  Improved Cell-Free RNA and Protein Synthesis System 
PLoS ONE  2014;9(9):e106232.
Cell-free RNA and protein synthesis (CFPS) is becoming increasingly used for protein production as yields increase and costs decrease. Advances in reconstituted CFPS systems such as the Protein synthesis Using Recombinant Elements (PURE) system offer new opportunities to tailor the reactions for specialized applications including in vitro protein evolution, protein microarrays, isotopic labeling, and incorporating unnatural amino acids. In this study, using firefly luciferase synthesis as a reporter system, we improved PURE system productivity up to 5-fold by adding or adjusting a variety of factors that affect transcription and translation, including elongation factors (EF-Ts, EF-Tu, EF-G, and EF4), ribosome recycling factor (RRF), release factors (RF1, RF2, RF3), chaperones (GroEL/ES), BSA and tRNAs. The work provides a more efficient defined in vitro transcription and translation system and a deeper understanding of the factors that limit the whole system efficiency.
PMCID: PMC4152126  PMID: 25180701
11.  The Naked Mole Rat Genome Resource: facilitating analyses of cancer and longevity-related adaptations 
Bioinformatics  2014;30(24):3558-3560.
Motivation: The naked mole rat (Heterocephalus glaber) is an exceptionally long-lived and cancer-resistant rodent native to East Africa. Although its genome was previously sequenced, here we report a new assembly sequenced by us with substantially higher N50 values for scaffolds and contigs.
Results: We analyzed the annotation of this new improved assembly and identified candidate genomic adaptations which may have contributed to the evolution of the naked mole rat’s extraordinary traits, including in regions of p53, and the hyaluronan receptors CD44 and HMMR (RHAMM). Furthermore, we developed a freely available web portal, the Naked Mole Rat Genome Resource (, featuring the data and results of our analysis, to assist researchers interested in the genome and genes of the naked mole rat, and also to facilitate further studies on this fascinating species.
Availability and implementation: The Naked Mole Rat Genome Resource is freely available online at This resource is open source and the source code is available at
PMCID: PMC4253829  PMID: 25172923
12.  Heritable genome editing in C. elegans via a CRISPR-Cas9 system 
Nature methods  2013;10(8):10.1038/nmeth.2532.
CRISPR-Cas systems have been used with single-guide RNAs for accurate gene disruption and conversion in multiple biological systems. Here we report the use of the endonuclease Cas9 to target genomic sequences in the C. elegans germline, utilizing single-guide RNAs that are expressed from a U6 small nuclear RNA promoter. Our results demonstrate that targeted, heritable genetic alterations can be achieved in C. elegans, providing a convenient and effective approach for generating loss-of-function mutants.
PMCID: PMC3822328  PMID: 23817069
13.  Quantification of microRNA Expression with Next-Generation Sequencing 
Rapid advancement of next-generation sequencing technologies has made it possible to study expression profiles of microRNAs (miRNAs) comprehensively and efficiently. We have previously shown that multiplexing miRNA libraries by barcoding can significantly reduce sequencing cost per sample without compromising library quality [Alon et al. 2011, Vigneault et al. 2012]. In this unit, we provide a step-by-step protocol to isolate miRNAs and construct multiplexed miRNA libraries. We also describe a custom computational pipeline designed to analyze the multiplexed miRNA library sequencing reads generated by Illumina-based technology.
PMCID: PMC4138881  PMID: 23821442
15.  Gene Assembly from Chip-Synthesized Oligonucleotides 
De novo synthesis of long double-stranded DNA constructs has a myriad of applications in biology and biological engineering. However, its widespread adoption has been hindered by high costs. Cost can be significantly reduced by using oligonucleotides synthesized on high-density DNA chips. However, most methods for using off-chip DNA for gene synthesis have failed to scale due to the high error rates, low yields, and high chemical complexity of the chip-synthesized oligonucleotides. We have recently demonstrated that some commercial DNA chip manufacturers have improved error rates, and that the issues of chemical complexity and low yields can be solved by using barcoded primers to accurately and efficiently amplify subpools of oligonucleotides. This article includes protocols for computationally designing the DNA chip, amplifying the oligonucleotide subpools, and assembling 500-800 basepair (bp) constructs.
PMCID: PMC4112592  PMID: 25077042
oligonucleotide; gene synthesis; nucleic acids; synthetic biology
16.  The Role of Replicates for Error Mitigation in Next-Generation Sequencing 
Nature reviews. Genetics  2013;15(1):56-62.
Advances in next-generation technologies have rapidly improved sequencing fidelity and significantly decreased sequencing error rates. However, with billions of nucleotides in a human genome, even low experimental error rates yield many errors in variant calls. Erroneous variants can mimic true somatic and rare variants, thus requiring costly confirmatory experiments to minimize the number of false positives. Here we discuss sources of experimental error in next-generation sequencing and how replicates can be used to abate them.
PMCID: PMC4103745  PMID: 24322726
17.  Cas9 as a versatile tool for engineering biology 
Nature methods  2013;10(10):957-963.
RNA-guided Cas9 nucleases derived from clustered regularly interspaced short palindromic repeats (CRISPR)-Cas systems have dramatically transformed our ability to edit the genomes of diverse organisms. We believe tools and techniques based on Cas9, a single unifying factor capable of colocalizing RNA, DNA and protein, will grant unprecedented control over cellular organization, regulation and behavior. Here we describe the Cas9 targeting methodology, detail current and prospective engineering advances and suggest potential applications ranging from basic science to the clinic.
PMCID: PMC4051438  PMID: 24076990
18.  CRISPR/Cas9-Mediated Phage Resistance Is Not Impeded by the DNA Modifications of Phage T4 
PLoS ONE  2014;9(6):e98811.
Bacteria rely on two known DNA-level defenses against their bacteriophage predators: restriction-modification and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-CRISPR-associated (Cas) systems. Certain phages have evolved countermeasures that are known to block endonucleases. For example, phage T4 not only adds hydroxymethyl groups to all of its cytosines, but also glucosylates them, a strategy that defeats almost all restriction enzymes. We sought to determine whether these DNA modifications can similarly impede CRISPR-based defenses. In a bioinformatics search, we found naturally occurring CRISPR spacers that potentially target phages known to modify their DNA. Experimentally, we show that the Cas9 nuclease from the Type II CRISPR system of Streptococcus pyogenes can overcome a variety of DNA modifications in Escherichia coli. The levels of Cas9-mediated phage resistance to bacteriophage T4 and the mutant phage T4 gt, which contains hydroxymethylated but not glucosylated cytosines, were comparable to phages with unmodified cytosines, T7 and the T4-like phage RB49. Our results demonstrate that Cas9 is not impeded by N6-methyladenine, 5-methylcytosine, 5-hydroxymethylated cytosine, or glucosylated 5-hydroxymethylated cytosine.
PMCID: PMC4041780  PMID: 24886988
19.  CHOPCHOP: a CRISPR/Cas9 and TALEN web tool for genome editing 
Nucleic Acids Research  2014;42(Web Server issue):W401-W407.
Major advances in genome editing have recently been made possible with the development of the TALEN and CRISPR/Cas9 methods. The speed and ease of implementing these technologies has led to an explosion of mutant and transgenic organisms. A rate-limiting step in efficiently applying TALEN and CRISPR/Cas9 methods is the selection and design of targeting constructs. We have developed an online tool, CHOPCHOP (, to expedite the design process. CHOPCHOP accepts a wide range of inputs (gene identifiers, genomic regions or pasted sequences) and provides an array of advanced options for target selection. It uses efficient sequence alignment algorithms to minimize search times, and rigorously predicts off-target binding of single-guide RNAs (sgRNAs) and TALENs. Each query produces an interactive visualization of the gene with candidate target sites displayed at their genomic positions and color-coded according to quality scores. In addition, for each possible target site, restriction sites and primer candidates are visualized, facilitating a streamlined pipeline of mutant generation and validation. The ease-of-use and speed of CHOPCHOP make it a valuable tool for genome engineering.
PMCID: PMC4086086  PMID: 24861617
20.  Deciphering the functions and regulation of brain-enriched A-to-I RNA editing 
Nature neuroscience  2013;16(11):1518-1522.
Adenosine-to-inosine (A-to-I) RNA editing, in which genomically encoded adenosine is changed to inosine in RNA, is catalyzed by adenosine deaminase acting on RNA (ADAR). This fine-tuning mechanism is critical during normal development and diseases, particularly in relation to brain functions. A-to-I RNA editing has also been hypothesized to be a driving force in human brain evolution. A large number of RNA editing sites have recently been identified, mostly as a result of the development of deep sequencing and bioinformatic analyses. Deciphering the functional consequences of RNA editing events is challenging, but emerging genome engineering approaches may expedite new discoveries. To understand how RNA editing is dynamically regulated, it is imperative to construct a spatiotemporal atlas at the species, tissue and cell levels. Future studies will need to identify the cis and trans regulatory factors that drive the selectivity and frequency of RNA editing. We anticipate that recent technological advancements will aid researchers in acquiring a much deeper understanding of the functions and regulation of RNA editing.
PMCID: PMC4015515  PMID: 24165678
21.  Orthogonal Cas9 Proteins for RNA-Guided Gene Regulation and Editing 
Nature methods  2013;10(11):10.1038/nmeth.2681.
The Cas9 protein from the Streptococcus pyogenes CRISPR-Cas immune system has been adapted for both RNA-guided genome editing and gene regulation in a variety of organisms, but can mediate only a single activity at a time within any given cell. Here we characterize a set of fully orthogonal Cas9 proteins and demonstrate their ability to mediate simultaneous and independently targeted gene regulation and editing in bacteria and in human cells. We find that Cas9 orthologs display consistent patterns in their recognition of target sequences and identify a highly targetable protein from Neisseria meningitidis. Our results provide a basal set of orthogonal RNA-guided proteins for controlling biological systems and establish a general methodology for characterizing additional proteins and adapting them to eukaryotic cells.
PMCID: PMC3844869  PMID: 24076762
22.  Mutants of Cre recombinase with improved accuracy 
Nature communications  2013;4:2509.
Despite rapid advances in genome engineering technologies, inserting genes into precise locations in the human genome remains an outstanding problem. It has been suggested that site-specific recombinases can be adapted towards use as transgene delivery vectors. The specificity of recombinases can be altered either with directed evolution or via fusions to modular DNA-binding domains. Unfortunately, both wildtype and altered variants often have detectable activities at off-target sites. Here we use bacterial selections to identify mutations in the dimerization surface of Cre recombinase (R32V, R32M, and 303GVSdup) that improve the accuracy of recombination. The mutants are functional in bacteria, in human cells, and in vitro (except for 303GVSdup, which we did not purify), and have improved selectivity against both model off-target sites and the entire E. coli genome. We propose that destabilizing binding cooperativity may be a general strategy for improving the accuracy of dimeric DNA-binding proteins.
PMCID: PMC3972015  PMID: 24056590
23.  CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering 
Nature biotechnology  2013;31(9):10.1038/nbt.2675.
Prokaryotic type II CRISPR-Cas systems can be adapted to enable targeted genome modifications across a range of eukaryotes [1–7]. Here we engineer this system to enable RNA-guided genome regulation in human cells by tethering transcriptional activation domains either directly to a nuclease-null Cas9 protein or to an aptamer-modified single guide RNA (sgRNA). Using this functionality we developed a novel transcriptional activation–based assay to determine the landscape of off-target binding of sgRNA:Cas9 complexes and compared it with the off-target activity of transcription activator–like (TAL) effector proteins [8, 9]. Our results reveal that specificity profiles are sgRNA dependent, and that sgRNA:Cas9 complexes and 18-mer TAL effector proteins can potentially tolerate 1–3 and 1–2 target mismatches, respectively. By engineering a requirement for cooperativity through offset nicking for genome editing or through multiple synergistic sgRNAs for robust transcriptional activation, we suggest methods to mitigate off-target phenomena. Our results expand the versatility of the sgRNA:Cas9 tool and highlight the critical need to engineer improved specificity.
PMCID: PMC3818127  PMID: 23907171
24.  Harvard Personal Genome Project: lessons from participatory public research 
Genome Medicine  2014;6(2):10.
Since its initiation in 2005, the Harvard Personal Genome Project has enrolled thousands of volunteers interested in publicly sharing their genome, health and trait data. Because these data are highly identifiable, we use an ‘open consent’ framework that purposefully excludes promises about privacy and requires participants to demonstrate comprehension prior to enrollment.
Our model of non-anonymous, public genomes has led us to a highly participatory model of researcher-participant communication and interaction. The participants, who are highly committed volunteers, self-pursue and donate research-relevant datasets, and are actively engaged in conversations with both our staff and other Personal Genome Project participants. We have quantitatively assessed these communications and donations, and report our experiences with returning research-grade whole genome data to participants. We also observe some of the community growth and discussion that has occurred related to our project.
We find that public non-anonymous data is valuable and leads to a participatory research model, which we encourage others to consider. The implementation of this model is greatly facilitated by web-based tools and methods and participant education. Project results are long-term proactive participant involvement and the growth of a community that benefits both researchers and participants.
PMCID: PMC3978420  PMID: 24713084
25.  Optical Control of Mammalian Endogenous Transcription and Epigenetic States 
Nature  2013;500(7463):10.1038/nature12466.
The dynamic nature of gene expression enables cellular programming, homeostasis, and environmental adaptation in living systems. Dissection of causal gene functions in cellular and organismal processes therefore necessitates approaches that enable spatially and temporally precise modulation of gene expression. Recently, a variety of microbial and plant-derived light-sensitive proteins have been engineered as optogenetic actuators, enabling high precision spatiotemporal control of many cellular functions [1-11]. However, versatile and robust technologies that enable optical modulation of transcription in the mammalian endogenous genome remain elusive. Here, we describe the development of Light-Inducible Transcriptional Effectors (LITEs), an optogenetic two-hybrid system integrating the customizable TALE DNA-binding domain [12-14] with the light-sensitive cryptochrome 2 protein and its interacting partner CIB1 from Arabidopsis thaliana. LITEs do not require additional exogenous chemical co-factors, are easily customized to target many endogenous genomic loci, and can be activated within minutes with reversibility [3,4,6,7,15]. LITEs can be packaged into viral vectors and genetically targeted to probe specific cell populations. We have applied this system in primary mouse neurons, as well as in the brain of awake mice in vivo to mediate reversible modulation of mammalian endogenous gene expression as well as targeted epigenetic chromatin modifications. The LITE system establishes a novel mode of optogenetic control of endogenous cellular processes and enables direct testing of the causal roles of genetic and epigenetic regulation in normal biological processes and disease states.
PMCID: PMC3856241  PMID: 23877069
