Autoimmune disease results from a loss of tolerance to self-antigens in genetically susceptible individuals. Completely understanding this process requires that targeted antigens be identified, and so a number of techniques have been developed to determine immune receptor specificities. We previously reported the construction of a phage-displayed synthetic human peptidome and a proof-of-principle analysis of antibodies from three patients with neurological autoimmunity. Here we present data from a large-scale screen of 298 independent antibody repertoires, including those from 73 healthy sera, using phage immunoprecipitation sequencing (PhIP-Seq). The resulting database of peptide-antibody interactions characterizes each individual’s unique autoantibody fingerprint, and includes specificities found to occur frequently in the general population as well as those associated with disease. Screening type 1 diabetes (T1D) patients revealed a prematurely polyautoreactive phenotype compared with their matched controls. Screening a collection of cerebrospinal fluids and sera from 63 multiple sclerosis patients uncovered novel as well as previously reported antibody-peptide interactions. Finally, a screen of synovial fluids and sera from 64 rheumatoid arthritis patients revealed novel disease-associated antibody specificities that were independent of seropositivity status. This work demonstrates the utility of performing PhIP-Seq screens on large numbers of individuals and is another step toward defining the full complement of autoimmunoreactivities in health and disease.
autoantigen discovery; high throughput screening; PhIP-Seq; proteomics
Here we develop a novel approach to barcode large numbers of cells through cell-surface expression of programmable zinc-finger DNA-binding domains (sZFs). We show that sZFs enable double-stranded DNA to label living cells sequence-specifically, and we also develop a sequential tagging approach to image >3 cell types in situ using just 3 fluorophores. Finally, we demonstrate their broad versatility through their ability to serve as surrogate reporters and to facilitate selective cell capture and targeting.
Conceived with the aim of meeting the needs of the neurobiology and clinical communities, the Brain Research through Advancing Innovative Technologies (BRAIN) Initiative builds on the lessons learned from major projects in genetics, such as the Human Genome Project. It concentrates on the use of new imaging technologies in conjunction with genomics to inform therapeutic decisions.
BRAIN Initiative; imaging; Genome Project; genomics
Neuroscientists have made impressive advances in understanding the microscale function of single neurons and the macroscale activity of the human brain. One can probe molecular and biophysical aspects of individual neurons and also view the human brain in action with magnetic resonance imaging (MRI) or magnetoencephalography (MEG). However, the mechanisms of perception, cognition, and action remain mysterious because they emerge from the real-time interactions of large sets of neurons in densely interconnected, widespread neural circuits.
The function of neural circuits is an emergent property that arises from the coordinated activity of large numbers of neurons. To capture this, we propose launching a large-scale, international public effort, the Brain Activity Map Project, aimed at reconstructing the full record of neural activity across complete neural circuits. This technological challenge could prove to be an invaluable step toward understanding fundamental and pathological brain processes.
“The behavior of large and complex aggregates of elementary particles, it turns out, is not to be understood in terms of a simple extrapolation of the properties of a few particles. Instead, at each level of complexity entirely new properties appear.” –More Is Different, P.W. Anderson
“New directions in science are launched by new tools much more often than by new concepts. The effect of a concept-driven revolution is to explain old things in new ways. The effect of a tool-driven revolution is to discover new things that have to be explained.” –Imagined Worlds, Freeman Dyson
Next-generation sequencing offers many advantages over other methods of microRNA (miRNA) expression profiling, such as sample throughput and the capability to discover novel miRNAs. As the sequencing depth of current sequencing platforms exceeds what is necessary to quantify miRNAs, multiplexing several samples in one sequencing run offers a significant cost advantage. Although previous studies have achieved this goal by adding barcodes to miRNA libraries at the ligation step, this was recently shown to introduce significant bias into the miRNA expression data. This bias can be avoided, however, by barcoding the miRNA libraries at the PCR step instead. Here, we describe a user-friendly PCR barcoding method of preparing multiplexed miRNA libraries for Illumina-based sequencing. The method also prevents the production of adapter dimers and can be completed in one day.
miRNA; Illumina; Sequencing; library; multiplex; bar code
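On the analysis side, multiplexed libraries have to be split back into per-sample read counts before miRNA expression can be compared. The following is a minimal sketch of that demultiplexing step, not the authors' pipeline. The barcode sequences, sample names, the one-mismatch tolerance, and the representation of each read as an (index read, insert) pair are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical 6-nt sample barcodes; these are illustrative, not from the paper.
BARCODES = {"ATCACG": "sample_A", "CGATGT": "sample_B", "TTAGGC": "sample_C"}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, barcodes=BARCODES, max_mismatches=1):
    """Return the sample name if exactly one barcode lies within
    max_mismatches of the index read; otherwise return None."""
    hits = [
        name
        for bc, name in barcodes.items()
        if len(bc) == len(index_read) and hamming(bc, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads):
    """Count insert sequences per sample from (index_read, insert) pairs.
    Reads whose index cannot be assigned unambiguously are discarded."""
    counts = defaultdict(Counter)
    for index_read, insert in reads:
        sample = assign_sample(index_read)
        if sample is not None:
            counts[sample][insert] += 1
    return counts


if __name__ == "__main__":
    let7a = "TGAGGTAGTAGGTTGTATAGTT"
    mir21 = "TAGCTTATCAGACTGATGTTGA"
    reads = [
        ("ATCACG", let7a),  # exact barcode match
        ("ATCACC", let7a),  # one sequencing error in the barcode
        ("CGATGT", mir21),
        ("GGGGGG", mir21),  # unassignable index read
    ]
    for sample, counter in sorted(demultiplex(reads).items()):
        print(sample, dict(counter))
```

Tolerating a single mismatch only makes sense when every pair of barcodes differs at three or more positions, which holds for the three illustrative barcodes above.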
Neuroscience is at a crossroads. Great effort is being invested into deciphering specific neural interactions and circuits. At the same time, there exist few general theories or principles that explain brain function. We attribute this disparity, in part, to limitations in current methodologies. Traditional neurophysiological approaches record the activities of one neuron or a few neurons at a time. Neurochemical approaches focus on single neurotransmitters. Yet, there is an increasing realization that neural circuits operate at emergent levels, where the interactions between hundreds or thousands of neurons, utilizing multiple chemical transmitters, generate functional states. Brains function at the nanoscale, so tools to study brains must ultimately operate at this scale, as well. Nanoscience and nanotechnology are poised to provide a rich toolkit of novel methods to explore brain function by enabling simultaneous measurement and manipulation of activity of thousands or even millions of neurons. We and others refer to this goal as the Brain Activity Mapping Project. In this Nano Focus, we discuss how recent developments in nanoscale analysis tools and in the design and synthesis of nanomaterials have generated optical, electrical, and chemical methods that can readily be adapted for use in neuroscience. These approaches represent exciting areas of technical development and research. Moreover, unique opportunities exist for nanoscientists, nanotechnologists, and other physical scientists and engineers to contribute to tackling the challenging problems involved in understanding the fundamentals of brain function.
The identification and differentiation of a large number of distinct molecular species with high temporal and spatial resolution is a major challenge in biomedical science. Fluorescence microscopy is a powerful tool, but its multiplexing ability is limited by the number of spectrally distinguishable fluorophores. Here we use DNA-origami technology to construct sub-micrometer nanorods that act as fluorescent barcodes. We demonstrate that spatial control over the positioning of fluorophores on the surface of a stiff DNA nanorod can produce 216 distinct barcodes that can be unambiguously decoded using epifluorescence or total internal reflection fluorescence (TIRF) microscopy. Barcodes with higher spatial information density were demonstrated via the construction of super-resolution barcodes with features spaced by ~40 nm. One species of the barcodes was used to tag yeast surface receptors, suggesting their potential applications as in situ imaging probes for diverse biomolecular and cellular entities in their native environments.
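The figure of 216 barcodes is consistent with simple combinatorics, although the abstract does not spell out the encoding. The sketch below assumes, purely for illustration, three labeled zones per nanorod and six distinguishable labels per zone (single dyes and two-dye mixtures); the label names and zone count are hypothetical. It also shows why the rod's orientation must be readable for all codes to count as distinct.

```python
from itertools import product

# Assumed label set: three single dyes and their pairwise mixtures (illustrative).
LABELS = ["B", "G", "R", "BG", "BR", "GR"]
ZONES = 3  # assumed number of independently labeled positions on the rod


def all_barcodes(labels=LABELS, zones=ZONES):
    """Enumerate every ordered assignment of one label to each zone."""
    return list(product(labels, repeat=zones))


def count_without_orientation(codes):
    """Number of codes that remain distinguishable if a rod cannot be told
    apart from its mirror image (a code and its reverse are merged)."""
    return len({min(code, code[::-1]) for code in codes})


if __name__ == "__main__":
    codes = all_barcodes()
    print(len(codes))                        # ordered codes: 6 ** 3
    print(count_without_orientation(codes))  # fewer if orientation is ambiguous
```

Under these assumptions the full set is only available when the two ends of the rod can be told apart, for example through asymmetric spacing of the labeled zones.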
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (Cas) systems in bacteria and archaea use RNA-guided nuclease activity to provide adaptive immunity against invading foreign nucleic acids. Here, we report the use of type II bacterial CRISPR-Cas system in Saccharomyces cerevisiae for genome engineering. The CRISPR-Cas components, Cas9 gene and a designer genome targeting CRISPR guide RNA (gRNA), show robust and specific RNA-guided endonuclease activity at targeted endogenous genomic loci in yeast. Using constitutive Cas9 expression and a transient gRNA cassette, we show that targeted double-strand breaks can increase homologous recombination rates of single- and double-stranded oligonucleotide donors by 5-fold and 130-fold, respectively. In addition, co-transformation of a gRNA plasmid and a donor DNA in cells constitutively expressing Cas9 resulted in near 100% donor DNA recombination frequency. Our approach provides foundations for a simple and powerful genome engineering tool for site-specific mutagenesis and allelic replacement in yeast.
Protein pathways are dynamic and highly coordinated spatially and temporally, capable of performing a diverse range of complex chemistries and enzymatic reactions with precision and at high efficiency. Biotechnology aims to harvest these natural systems to construct more advanced in vitro reactions, capable of new chemistries and operating at high yield. Here, we present an efficient Multiplex Automated Genome Engineering (MAGE) strategy to simultaneously modify and co-purify large protein complexes and pathways from the model organism Escherichia coli to reconstitute functional synthetic proteomes in vitro. By application of over 110 MAGE cycles, we successfully inserted hexa-histidine sequences into 38 essential genes in vivo that encode for the entire translation machinery. Streamlined co-purification and reconstitution of the translation protein complex enabled protein synthesis in vitro. Our approach can be applied to a growing area of applications in in vitro one-pot multi-enzyme catalysis (MEC) to manipulate or enhance in vitro pathways such as natural product or carbohydrate biosynthesis.
Genome engineering; MAGE; cell-free protein synthesis; multi-enzyme catalysis; protein purification
Cytosine methylation, an epigenetic modification of DNA, is a target of growing interest for developing high throughput profiling technologies. Here we introduce two new, complementary techniques for cytosine methylation profiling utilizing next generation sequencing technology: bisulfite padlock probes (BSPPs) and methyl sensitive cut counting (MSCC). In the first method, we designed a set of ~10,000 BSPPs distributed over the ENCODE pilot project regions to take advantage of existing expression and chromatin immunoprecipitation data. We observed a pattern of low promoter methylation coupled with high gene body methylation in highly expressed genes. Using the second method, MSCC, we gathered genome-scale data for 1.4 million HpaII sites and confirmed that gene body methylation in highly expressed genes is a consistent phenomenon over the entire genome. Our observations highlight the usefulness of techniques which are not inherently or intentionally biased in favor of only profiling particular subsets like CpG islands or promoter regions.
Advances in computational metabolic optimization are required to realize the full potential of new in vivo metabolic engineering technologies by bridging the gap between computational design and strain development. We present Redirector, a new Flux Balance Analysis-based framework for identifying engineering targets to optimize metabolite production in complex pathways. Previous optimization frameworks have modeled metabolic alterations as directly controlling fluxes by setting particular flux bounds. Redirector develops a more biologically relevant approach, modeling metabolic alterations as changes in the balance of metabolic objectives in the system. This framework iteratively selects enzyme targets, adds the associated reaction fluxes to the metabolic objective, thereby incentivizing flux towards the production of a metabolite of interest. These adjustments to the objective act in competition with cellular growth and represent up-regulation and down-regulation of enzyme mediated reactions. Using the iAF1260 E. coli metabolic network model for optimization of fatty acid production as a test case, Redirector generates designs with as many as 39 simultaneous and 111 unique engineering targets. These designs discover proven in vivo targets, novel supporting pathways and relevant interdependencies, many of which cannot be predicted by other methods. Redirector is available as open and free software, scalable to computational resources, and powerful enough to find all known enzyme targets for fatty acid production.
A deeper understanding of biological processes, along with methods in synthetic biology, is driving the frontier of metabolic engineering. In particular, a better representation of cell metabolism will enable the engineering of bacterial strains that can act as factories for valuable biochemical products, from medicines to biofuels. Models which predict the behavior of these complex biological systems enable better engineering design as well as a more comprehensive understanding of fundamental biological principles. Here we develop a new method, called Redirector, for modeling metabolic alterations, and their relationship to cell growth. This method optimizes genetic engineering changes to achieve metabolite production using a new representation of the metabolic impact of genetic manipulation, which is more biologically realistic than existing models. We discover proven and novel engineering targets to improve fatty acid production, correctly predicting how different combinations of genes build upon one another. This work demonstrates that Redirector is a powerful method for designing cell factories and improving our understanding of metabolic systems.
Large-scale RNAi-based screens are a major technology, but require adequate prioritization and validation of candidate genes from the primary screen. In this work, we performed a large-scale pooled shRNA screen in mouse embryonic stem cells (ESCs) to discover genes associated with oxidative stress resistance and found several candidates. We then developed a bioinformatics pipeline to prioritize these candidates incorporating effect sizes, functional enrichment analysis, interaction networks and gene expression information. To validate candidates, we mixed normal cells with cells expressing the shRNA coupled to a fluorescent protein, which allows control cells to be used as an internal standard, and thus we could detect shRNAs with subtle effects. Although we did not identify genes associated with oxidative stress resistance, as a proof-of-concept of our pipeline we demonstrate a detrimental role of Edd1 silencing in ESC growth. Our methods may be useful for candidate gene prioritization of large-scale RNAi-based screens.
Multiplex Automated Genome Engineering (MAGE) employs short oligonucleotides to scarlessly modify genomes. Insertions of >10 bases remain inefficient, but their efficiency can be improved substantially by selecting for highly modified chromosomes. Here, we describe Co-Selection MAGE (CoS-MAGE) to optimize biosynthesis of aromatic amino acid derivatives by combinatorially inserting multiple T7 promoters simultaneously into 12 genomic operons. Promoter libraries can be quickly generated to study gain-of-function epistatic interactions in gene networks.
Broadly neutralizing HIV antibodies (bnAbs) are typically highly somatically mutated, raising doubts as to whether they can be elicited by vaccination. We used 454 sequencing and designed a novel phylogenetic method to model lineage evolution of the bnAbs PGT121–134 and found a positive correlation between the level of somatic hypermutation (SHM) and the development of neutralization breadth and potency. Strikingly, putative intermediates were characterized that show approximately half the mutation level of PGT121–134 but were still capable of neutralizing roughly 40–80% of PGT121–134 sensitive viruses in a 74-virus panel at median titers between 15- and 3-fold higher than PGT121–134. Such antibodies with lower levels of SHM may be more amenable to elicitation through vaccination while still providing noteworthy coverage. Binding characterization indicated a preference of inferred intermediates for native Env binding over monomeric gp120, suggesting that the PGT121–134 lineage may have been selected for binding to native Env at some point during maturation. Analysis of glycan-dependent neutralization for inferred intermediates identified additional adjacent glycans that comprise the epitope and suggests changes in glycan dependency or recognition over the course of affinity maturation for this lineage. Finally, patterns of neutralization of inferred bnAb intermediates suggest hypotheses as to how SHM may lead to potent and broad HIV neutralization and provide important clues for immunogen design.
A majority of the more than 30 million HIV-1 infected individuals worldwide live in poorly resourced areas where multiple-boost strategies, which are likely needed to generate highly mutated antibodies, present formidable logistical challenges. Accordingly, developing new vaccination strategies that are capable of generating highly mutated antibodies should be an active area of research. Another approach, which is not mutually exclusive, is to identify new bnAbs that are both broad and potent in neutralization but are much less mutated than the bnAbs that currently exist. Here, we have identified bnAbs that have approximately half the mutation frequency of known bnAbs but maintain high potency and moderate breadth. These less mutated bnAbs offer an important advantage in that they would likely be easier to induce through vaccination than more mutated antibodies. By characterizing these putative intermediates, we can also better estimate how affinity maturation proceeded to result in an antibody with broad and potent neutralization activity and offer more focused strategies for designing immunogens capable of eliciting these less mutated bnAbs.
Pre-symptomatic prediction of disease and drug response based on genetic testing is a critical component of personalized medicine. Previous work has demonstrated that the predictive capacity of genetic testing is constrained by the heritability and prevalence of the tested trait, although these constraints have only been approximated under the assumption of a normally distributed genetic risk distribution.
Here, we mathematically derive the absolute limits that these factors impose on test accuracy in the absence of any distributional assumptions on risk. We present these limits in terms of the best-case receiver-operating characteristic (ROC) curve, consisting of the best-case test sensitivities and specificities, and the AUC (area under the curve) measure of accuracy. We apply our method to genetic prediction of type 2 diabetes and breast cancer, and we additionally show the best possible accuracy that can be obtained from integrated predictors, which can incorporate non-genetic features.
Knowledge of such limits is valuable in understanding the implications of genetic testing even before additional associations are identified.
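For readers less familiar with the accuracy measures named above, the following sketch computes an empirical ROC curve and AUC from risk scores. It does not reproduce the best-case limits derived in the paper; it only illustrates what sensitivity, specificity, and AUC measure. The function names and the example scores are invented for illustration.

```python
def auc(case_scores, control_scores):
    """Empirical AUC: the probability that a randomly chosen case scores
    higher than a randomly chosen control, counting ties as one half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def roc_points(case_scores, control_scores):
    """Return (1 - specificity, sensitivity) pairs, one per score threshold,
    starting from the strictest threshold at (0, 0)."""
    thresholds = sorted(set(case_scores) | set(control_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        sensitivity = sum(s >= t for s in case_scores) / len(case_scores)
        false_positive_rate = sum(s >= t for s in control_scores) / len(control_scores)
        points.append((false_positive_rate, sensitivity))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    cases = [0.9, 0.8, 0.4]     # hypothetical risk scores of affected individuals
    controls = [0.7, 0.3, 0.2]  # hypothetical risk scores of unaffected individuals
    print(auc(cases, controls))
    print(trapezoid_area(roc_points(cases, controls)))
```

The pairwise definition and the area under the empirical ROC curve give the same value, which is why AUC can be read as the chance that a test ranks an affected individual above an unaffected one.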
DNA built from modular repeats presents a challenge for gene synthesis. We present a solid surface-based sequential ligation approach, which we refer to as iterative capped assembly (ICA), that adds DNA repeat monomers individually to a growing chain while using hairpin ‘capping’ oligonucleotides to block incompletely extended chains, greatly increasing the frequency of full-length final products. Applying ICA to a model problem, construction of custom transcription activator-like effector nucleases (TALENs) for genome engineering, we demonstrate efficient synthesis of TALE DNA-binding domains up to 21 monomers long and their ligation into a nuclease-carrying backbone vector all within 3 h. We used ICA to synthesize 20 TALENs of varying DNA target site length and tested their ability to stimulate gene editing by a donor oligonucleotide in human cells. All the TALENs show activity, with the ones >15 monomers long tending to work best. Since ICA builds full-length constructs from individual monomers rather than large exhaustive libraries of pre-fabricated oligomers, it will be trivial to incorporate future modified TALE monomers with improved or expanded function or to synthesize other types of repeat-modular DNA where the diversity of possible monomers makes exhaustive oligomer libraries impractical.
Genome-scale engineering of living organisms requires precise and economical methods to efficiently modify many loci within chromosomes. One such example is the directed integration of chemically synthesized single-stranded deoxyribonucleic acid (oligonucleotides) into the chromosome of Escherichia coli during replication. Herein, we present a general co-selection strategy in multiplex genome engineering that yields highly modified cells. We demonstrate that disparate sites throughout the genome can be easily modified simultaneously by leveraging selectable markers within 500 kb of the target sites. We apply this technique to the modification of 80 sites in the E. coli genome.
The microbial conversion of solid cellulosic biomass to liquid biofuels may provide a renewable energy source for transportation fuels. Endophytes represent a promising group of organisms, as they are a mostly untapped reservoir of metabolic diversity. They are often able to degrade cellulose, and they can produce an extraordinary diversity of metabolites. The filamentous fungal endophyte Ascocoryne sarcoides was shown to produce potential-biofuel metabolites when grown on a cellulose-based medium; however, the genetic pathways needed for this production are unknown and the lack of genetic tools makes traditional reverse genetics difficult. We present the genomic characterization of A. sarcoides and use transcriptomic and metabolomic data to describe the genes involved in cellulose degradation and to provide hypotheses for the biofuel production pathways. In total, almost 80 biosynthetic clusters were identified, including several previously found only in plants. Additionally, many transcriptionally active regions outside of genes showed condition-specific expression, offering more evidence for the role of long non-coding RNA in gene regulation. This is one of the highest quality fungal genomes and, to our knowledge, the only thoroughly annotated and transcriptionally profiled fungal endophyte genome currently available. The analyses and datasets contribute to the study of cellulose degradation and biofuel production and provide the genomic foundation for the study of a model endophyte system.
A renewable source of energy is a pressing global need. The biological conversion of lignocellulose to biofuels by microorganisms presents a promising avenue, but few organisms have been studied thoroughly enough to develop the genetic tools necessary for rigorous experimentation. The filamentous-fungal endophyte A. sarcoides produces metabolites when grown on a cellulose-based medium that include eight-carbon volatile organic compounds, which are potential biofuel targets. Here we use broadly applicable methods including genomics, transcriptomics, and metabolomics to explore the biofuel production of A. sarcoides. These data were used to assemble the genome into 16 scaffolds, to thoroughly annotate the cellulose-degradation machinery, and to make predictions for the production pathway for the eight-carbon volatiles. Extremely high expression of the gene swollenin when grown on cellulose highlights the importance of accessory proteins in addition to the enzymes that catalyze the breakdown of the polymers. Correlation of the production of the eight-carbon biofuel-like metabolites with the expression of lipoxygenase pathway genes suggests the catabolism of linoleic acid as the mechanism of eight-carbon compound production. This is the first fungal genome to be sequenced in the family Helotiaceae, and A. sarcoides was isolated as an endophyte, making this work also potentially useful in fungal systematics and the study of plant–fungus relationships.
Development of cheap, high-throughput, and reliable gene synthesis methods will broadly stimulate progress in biology and biotechnology [1]. Currently, the reliance on column-synthesized oligonucleotides as a source of DNA limits further cost reductions in gene synthesis [2]. Oligonucleotides from DNA microchips can reduce costs by at least an order of magnitude [3,4,5], yet efforts to scale their use have been largely unsuccessful due to the high error rates and complexity of the oligonucleotide mixtures. Here we use high-fidelity DNA microchips, selective oligonucleotide pool amplification, optimized gene assembly protocols, and enzymatic error correction to develop a highly parallel gene synthesis platform. We tested our platform by assembling 47 genes, including 42 challenging therapeutic antibody sequences, encoding a total of ~35 kilo-basepairs of DNA. These assemblies were performed from a complex background containing 13,000 oligonucleotides encoding ~2.5 megabases of DNA, which is at least 50 times larger than previously published attempts.
Overlapping but oppositely oriented transcripts have the potential to form sense-antisense perfect double-stranded (ds) RNA duplexes. A bioinformatics approach has identified over 217 candidate overlapping transcriptional units, bringing the total number of predicted and validated examples of overlapping but oppositely oriented transcripts to over 300.
Overlapping but oppositely oriented transcripts have the potential to form sense-antisense perfect double-stranded (ds) RNA duplexes. Over recent years, the number and variety of examples of mammalian gene-regulatory phenomena in which endogenous dsRNA duplexes have been proposed or demonstrated to participate has greatly increased. These include genomic imprinting, RNA interference, translational regulation, alternative splicing, X-inactivation and RNA editing. We computationally mined public mouse and human expressed sequence tag (EST) databases to search for additional examples of bidirectionally transcribed genomic regions.
Our bioinformatics approach identified over 217 candidate overlapping transcriptional units, almost all of which are novel. From experimental validation of a subset of our predictions by orientation-specific RT-PCR, we estimate that our methodology has a specificity of 84% or greater. In many cases, regions of sense-antisense overlap within the 5'- or 3'-untranslated regions of a given transcript correlate with genomic patterns of mouse-human conservation.
Our results, in conjunction with the literature, bring the total number of predicted and validated examples of overlapping but oppositely oriented transcripts to over 300. Several of these cases support the hypothesis that a subset of the instances of substantial mouse-human conservation in the 5' and 3' UTRs of transcripts might be explained in part by functionality of an overlapping transcriptional unit.
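The core computational step, finding genomic regions transcribed in both directions, can be illustrated with a small interval-overlap search. This is a minimal sketch and not the authors' EST-mining pipeline: it assumes transcripts have already been mapped to genomic coordinates with a known strand, and the record fields, the `min_overlap` parameter, and the example coordinates are hypothetical.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def antisense_pairs(transcripts, min_overlap=1):
    """Return (name_a, name_b, overlap_length) for every pair of transcripts
    on the same chromosome, on opposite strands, sharing >= min_overlap bases."""
    pairs = []
    ordered = sorted(transcripts, key=lambda t: (t.chrom, t.start))
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Sorted input: once b lies on another chromosome or starts past
            # the end of a, no later transcript can overlap a.
            if b.chrom != a.chrom or b.start >= a.end:
                break
            overlap = min(a.end, b.end) - max(a.start, b.start)
            if a.strand != b.strand and overlap >= min_overlap:
                pairs.append((a.name, b.name, overlap))
    return pairs


if __name__ == "__main__":
    example = [
        Transcript("geneA", "chr1", 100, 500, "+"),
        Transcript("geneB", "chr1", 400, 900, "-"),
        Transcript("geneC", "chr1", 450, 600, "+"),
        Transcript("geneD", "chr2", 100, 500, "-"),
    ]
    for pair in antisense_pairs(example):
        print(pair)
```

In practice, strand assignment from ESTs is error-prone, which is presumably why the predictions were checked by orientation-specific RT-PCR.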
Reprogramming; Induced pluripotent stem cells; Peripheral blood mononuclear cells; Terminally differentiated T-cells
Several emerging technologies are aiming to meet renewable fuel standards, mitigate greenhouse gas emissions, and provide viable alternatives to fossil fuels. Direct conversion of solar energy into fungible liquid fuel is a particularly attractive option, though conversion of that energy on an industrial scale depends on the efficiency of its capture and conversion. Large-scale programs have been undertaken in the recent past that used solar energy to grow innately oil-producing algae for biomass processing to biodiesel fuel. These efforts were ultimately deemed to be uneconomical because the costs of culturing, harvesting, and processing of algal biomass were not balanced by the process efficiencies for solar photon capture and conversion. This analysis addresses solar capture and conversion efficiencies and introduces a unique systems approach, enabled by advances in strain engineering, photobioreactor design, and a process that contradicts prejudicial opinions about the viability of industrial photosynthesis. We calculate efficiencies for this direct, continuous solar process based on common boundary conditions, empirical measurements and validated assumptions wherein genetically engineered cyanobacteria convert industrially sourced, high-concentration CO2 into secreted, fungible hydrocarbon products in a continuous process. These innovations are projected to operate at areal productivities far exceeding those based on accumulation and refining of plant or algal biomass or on prior assumptions of photosynthetic productivity. This concept, currently enabled for production of ethanol and alkane diesel fuel molecules, and operating at pilot scale, establishes a new paradigm for high productivity manufacturing of nonfossil-derived fuels and chemicals.
Cyanobacteria; Metabolic engineering; Hydrocarbon; Alkane; Diesel; Renewable fuel; Algae; Biomass; Biodiesel