Man-made molecular “computers” that operate inside live cells will enable an unprecedented level of control over cellular physiology. A promising approach to building these computers uses RNA molecules and RNA-based regulation. RNA naturally lends itself to creating “digital” molecular networks that embody standardized (normal) forms of logic functions. The network’s inputs, which may or may not be inverted by single-input NOT logic gates, feed into multi-input AND gates whose outputs are in turn integrated in a multi-input OR gate. Below I review recent steps that have been taken toward implementing these networks with allosteric riboswitches and ribozymes in bacteria and yeast, and with RNAi in mammalian cells. I also propose how to co-opt additional, recently discovered RNA regulation mechanisms into future construction efforts.
Biological pathways and networks “compute” developmental decisions and environmental responses using molecules as information cues. Accordingly, our ability to rationally control and manipulate biological systems will take a quantum leap once we succeed in creating computational biomolecular networks that process molecular information in a desired fashion, and in embedding them in biological hosts. For example, in medicine, “diagnostic biocomputers” could lead to the next generation of therapy, whereby the amount, location and timing of drug release will be determined in real time based on a diagnostic computation using disease-related molecular cues in each cell and organ of the body.
Unlike a silicon computer, which can be reprogrammed with a few keystrokes, a biomolecular computer is really a set of design tools that, when provided with a description of a computational task, generates a blueprint of a molecular network that can implement this task. One important challenge is to make sure that these tools are flexible enough to support a sufficient variety of tasks. In theoretical computer science this problem was solved by the invention of universal models of computation, such as the Turing machine. While it seems unlikely that similarly universal approaches could be realized with biomolecular building blocks anytime soon, biocomputer architectures could, and should, aspire to at least some degree of generality in the computational sense.
A blueprint of a biomolecular computational network subsequently needs to be translated into an actual system operating in a living cell. In the current state of the art, a network’s conceptual design takes only a fraction of the time spent on its experimental construction, and the major challenge is the extensive fine-tuning of the network’s components and interactions at the implementation stage. Among several alternatives, RNA has recently emerged as an attractive material for biocomputer construction [2-5]: on the one hand it is likely to support a wide range of computational tasks, and on the other it behaves in a relatively predictable way in engineered systems. This review discusses the latest results and points toward future directions in the field of engineered RNA-based computational networks that work in live cells.
An example of a computational task is “if the concentration of protein A is above A0, and the concentration of protein B is above B0, and the concentration of metabolite C is less than C0, then activate a biological pathway X (e.g. apoptosis)”. While different approaches to biocomputation will lead to different network blueprints, they will all share a stereotypical structure: sensors that read out individual signals (i.e. the concentrations of proteins A and B and of metabolite C); a computational core that evaluates a function of these concentrations, such as “if (A > A0 and B > B0 and C < C0) then output signal Y”; and an actuation component: “if Y, activate pathway X”.
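The sensor/core/actuator structure can be sketched in a few lines of Python. This is a toy model, assuming that each sensor simply compares a steady-state concentration with a threshold; the threshold values and the function names `sensors`, `core`, `actuator` and `network` are hypothetical placeholders, not part of any real system.

```python
# Hypothetical thresholds for the example task (placeholders, not measured values).
A0, B0, C0 = 1.0, 0.5, 2.0


def sensors(conc):
    """Read out each molecular signal and convert it into a truth value."""
    return {
        "A": conc["A"] > A0,  # protein A above its threshold
        "B": conc["B"] > B0,  # protein B above its threshold
        "C": conc["C"] < C0,  # metabolite C below its threshold
    }


def core(signals):
    """Computational core: output signal Y is True only if all three conditions hold."""
    return signals["A"] and signals["B"] and signals["C"]


def actuator(y):
    """Actuation component: activate pathway X (e.g. apoptosis) when Y is True."""
    return "activate pathway X" if y else "no action"


def network(conc):
    """Chain sensors -> core -> actuator for one set of steady-state concentrations."""
    return actuator(core(sensors(conc)))


print(network({"A": 2.0, "B": 1.0, "C": 0.5}))  # activate pathway X
print(network({"A": 2.0, "B": 1.0, "C": 3.0}))  # no action
```

Keeping the three stages as separate functions mirrors the point in the text: the core only sees abstract truth values, so the same core could be reused with different sensors.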
The above example is a special case of what are known as Boolean, or logic, functions. These functions are of singular importance for computer science and engineering, as all modern digital computers operate by calculating and acting upon the truth values of various logic functions. Moreover, logic functions faithfully describe many important biological processes, for example during the development of a multicellular organism, and, as shown above, they can be used to guide rationally designed interventions. Formally, a Boolean function on N inputs (N = 3 in the above example) is a mapping that assigns either a True or a False value to each of the 2^N possible combinations of input values, where each input can also be either True or False (Fig. 1A, left panel). In a biological “computational” network one needs to define convincingly what it means for a molecular input or output to be True or False. The most common way to relay information within and between cellular modules is through the concentrations of molecules, or the related notions of their absolute numbers or activity. It is therefore natural to link a molecule’s concentration to its interpretation as True or False in the context of a Boolean computational network. Of course, intracellular concentrations of individual molecular species can take any non-negative value, from zero up to the millimolar range. Plausible physiological concentrations of certain biomolecules can be more restricted; for example, a certain transcription factor can be either completely turned off or fully induced. Were such a factor used as an input for a computational network, its Boolean interpretation would be straightforward, with the off state corresponding to False and the fully induced state corresponding to True.
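As a minimal illustration of the formal definition, a Boolean function can be stored as an explicit truth table with one row per input combination. The sketch assumes each molecular input has already been interpreted as True or False; the helper `truth_table` is hypothetical.

```python
from itertools import product


def truth_table(f, n):
    """Map each of the 2**n True/False input combinations to the output of f."""
    return {inputs: f(*inputs) for inputs in product((False, True), repeat=n)}


# The example task, with inputs already interpreted as truth values:
# a = "[A] > A0", b = "[B] > B0", c = "[C] < C0".
table = truth_table(lambda a, b, c: a and b and c, 3)

print(len(table))           # 8 rows, i.e. 2**3
print(sum(table.values()))  # 1: only (True, True, True) maps to True
print(2 ** (2 ** 3))        # 256 distinct Boolean functions exist on 3 inputs
```

Because each of the 2^N rows can independently be True or False, there are 2^(2^N) different Boolean functions on N inputs, which is why a general design method matters more than a catalog of hand-built gates.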
However, if the physiological concentration of a given molecule spans a range of values, one could ask whether there is a way to define upper and lower thresholds such that increasing the input concentration above the upper threshold, or decreasing it below the lower one, has no effect on the network’s output. (As a side note, this discussion only considers concentrations of inputs and outputs in the steady state, when none of them changes over time. Dynamic behavior that may arise when one or more of the inputs change suddenly cannot be directly predicted from the Boolean description and depends on the kinetic parameters of the interactions between the network’s components.) In this case, any concentration above the upper threshold would be interpreted as True, and any concentration below the lower threshold as False. Intermediate values necessarily eliminate the digital, Boolean features of the computation and turn it into a gradual, analog response. Those values might either be ignored, if they are physiologically unimportant, or dealt with explicitly, in which case one will need to consider the behavior of the computational network in this analog regime and understand the consequences of that behavior. Neither alternative has been explicitly addressed by the field, but the question will become increasingly important in the future.
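The two-threshold interpretation can be written as a small function that refuses to assign a truth value in the intermediate band. This is a sketch: the helper name `digitize` and the choice to flag the analog regime with `None` are illustrative, and real thresholds would have to be determined experimentally for each input.

```python
from typing import Optional


def digitize(conc: float, low: float, high: float) -> Optional[bool]:
    """Interpret a steady-state concentration as a truth value using two thresholds.

    Returns True above `high`, False below `low`, and None in the intermediate
    band, where the response is analog rather than Boolean.
    """
    if low >= high:
        raise ValueError("the lower threshold must be below the upper threshold")
    if conc > high:
        return True
    if conc < low:
        return False
    return None  # analog regime: neither True nor False


print(digitize(5.0, low=1.0, high=3.0))  # True
print(digitize(0.5, low=1.0, high=3.0))  # False
print(digitize(2.0, low=1.0, high=3.0))  # None
```

Returning `None` forces downstream code to decide explicitly whether intermediate inputs are ignored or handled, which corresponds to the two alternatives discussed above.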
To test a Boolean molecular network in the steady state, we assign all 2^N possible True/False combinations to the inputs and measure the network’s output. The output is the ultimate readout of the engineered system we aim to construct, so a convincing definition of the output’s truth value should be based on the absolute concentration of the output molecule. In other words, all input combinations that are supposed to produce a True output should generate sufficiently uniform high levels of the output molecule, and those with a False output should lead to distinctly lower, uniform output levels. The definition of “high” and “low” here can be somewhat arbitrary; ultimately, the success of the implementation will be determined by whether the downstream natural processes controlled by the computational network can distinguish between the two levels and generate distinct physiological responses.
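One simple way to quantify this criterion is to compare the lowest output among the rows that should be True with the highest output among the rows that should be False. The metric, the helper `separation` and the output levels in the usage lines are hypothetical illustrations, not measurements or an established standard in the field.

```python
def separation(outputs, expected):
    """Ratio of the lowest intended-True output to the highest intended-False output.

    outputs:  dict mapping each input combination to a measured steady-state output level
    expected: dict mapping each input combination to its intended truth value
    A ratio above 1 means the two groups do not overlap; larger is more digital.
    """
    high = [v for k, v in outputs.items() if expected[k]]
    low = [v for k, v in outputs.items() if not expected[k]]
    if not high or not low:
        raise ValueError("need at least one True row and one False row")
    worst_low = max(low)
    return float("inf") if worst_low == 0 else min(high) / worst_low


# Made-up output levels for a two-input AND gate, for illustration only.
F, T = False, True
outputs = {(F, F): 1.0, (F, T): 2.0, (T, F): 1.5, (T, T): 20.0}
expected = {k: k[0] and k[1] for k in outputs}
print(separation(outputs, expected))  # 10.0
```

This ratio captures distinctness but not uniformity; a fuller check would also bound the spread of output levels within the True group and within the False group.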
A good biocomputer toolkit should ideally be able to lay out a computational core for any predefined logic function, or at least for a wide range of useful functions. It should also tell us how to build sensors that feed various molecular inputs into these functions. While the core should be able to compute the truth value of a Boolean expression such as (A > A0 and B < B0 and C > C0 and D > D0) over four molecular signals, the sensors will substitute specific inputs for the abstract placeholders A, B, C and D, such as the protein NFkB for A, the metabolite cAMP for B, and so on. There are two popular ways to compute a logic function in a molecular context. The first recurrently employs certain two-input logic functions called universal gates, such as NOT AND, or NAND, which outputs False when both inputs are True and True otherwise. In practice, however, this leads to deep, narrow cascades of the physical elements implementing the NAND gates, with the outputs of each layer feeding into the next (Fig. 1A, middle panel). The second uses the disjunctive and conjunctive normal forms: certain combinations of multi-input AND and OR gates and single-input NOT gates. The Disjunctive Normal Form (DNF) consists of an OR function integrating multiple AND gates, and the Conjunctive Normal Form (CNF) consists of an AND function integrating multiple OR gates. Normal forms translate into wide circuits of at most three layers (Fig. 1A, right panel; Fig. 2A).
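The two strategies can be contrasted on a small example, exclusive OR (XOR). The sketch assumes that a literal is encoded as an (input index, negated) pair. `dnf_from_truth_table` builds the canonical DNF, one AND clause per True row; this is the textbook sum-of-minterms construction, swapped in here for illustration, and it is not minimized. `xor_via_nand` uses the standard four-NAND construction. All function names are hypothetical.

```python
from itertools import product


def dnf_from_truth_table(f, n):
    """Canonical DNF: one AND clause (minterm) for every input row where f is True.

    A literal is an (input_index, negated) pair.
    """
    clauses = []
    for row in product((False, True), repeat=n):
        if f(*row):
            clauses.append([(i, not value) for i, value in enumerate(row)])
    return clauses


def eval_dnf(clauses, inputs):
    """Three layers: optional NOT per literal, AND within each clause, OR across clauses."""
    return any(all(inputs[i] != negated for i, negated in clause) for clause in clauses)


def nand(a, b):
    return not (a and b)


def xor_via_nand(a, b):
    """XOR from four cascaded NAND gates (three layers deep)."""
    m = nand(a, b)
    return nand(nand(a, m), nand(b, m))


xor_dnf = dnf_from_truth_table(lambda a, b: a != b, 2)
print(len(xor_dnf))  # 2 AND clauses: (NOT a AND b) OR (a AND NOT b)
for a, b in product((False, True), repeat=2):
    assert eval_dnf(xor_dnf, (a, b)) == xor_via_nand(a, b) == (a != b)
```

For two inputs both versions happen to be three layers deep; for larger functions a NAND-only network generally needs more cascaded layers, whereas the normal form stays at NOT, AND and OR.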
In biology, the construction of logic gates usually involves connecting molecular components via activating and inhibitory regulatory links (Figs. 1B, 1C), with independent components serving as the inputs and the targets of regulation representing the results of intermediate logic processing or the global network output. (Mutual activation or inhibition, as well as long-range feedback, complicates the description and is not part of this review. In general, introducing such links could significantly alter the functions computed by a network in the steady state, or eliminate the steady state altogether and lead to oscillations.) If one decides to use universal logic gates for network construction, it is critical that the output of a gate be of the same molecular class as its input. Only then can the gates be cascaded an unlimited number of times, assuming that additional mechanisms prevent signal dissipation as information travels along the cascades. In normal-form circuits, on the other hand, there is no need for indefinite cascading of layers, and therefore one could design layer-specific ways to relay information so that the outputs of upstream gates can serve as inputs for the downstream gates. It so happens that in most RNA-based approaches the inputs to the molecular gates are radically different from their outputs. For example, the inputs are often small RNA molecules or motifs, while the output is mRNA. This poses a problem for cascading approaches but leaves the door open to circuits in normal forms. Moreover, RNA regulation mechanisms naturally integrate multiple activating or inhibitory inputs, as required by the AND and OR gates of normal-form circuits (Fig. 2). Therefore, in what follows I will examine how to create circuits that fully or partially implement normal forms of logic functions using RNA.
(Additional RNA-based regulation mechanisms, including those that may not be readily scaled or that can be used in synthetic biology projects not directly related to biological computation, are discussed in a number of excellent reviews [7,8].) I will classify the different approaches according to the type of host organism and consider bacteria, yeast and mammalian cells.
RNA computation in bacteria is in its infancy. Most of the relevant mechanisms have only recently been discovered, much less tinkered with from an engineering perspective. Although there has truly been an explosion of discovery in the field, only a few “winners” are emerging that have the potential to support complex molecular programming. Two of them, intrinsic terminators controlled by riboswitches and sRNAs, are analyzed below.
Intrinsic terminators are irreversible cis-inhibitory stem-loop RNA structures followed by a poly-U tract, found in the 5’- or 3’-UTR of an mRNA. Properly folded terminators act by causing dissociation of the RNA polymerase/mRNA complex from the transcribed gene. The folding of intrinsic terminators can be altered (induced or destroyed) by “riboswitches”, regulatory sequences that are themselves remodeled in the presence of molecular ligands. In our terminology, a gene with terminators constitutes a computational core, while the ligand-binding motifs constitute the sensors (in this arrangement the sensors are physically connected to the core, but this makes no difference from a conceptual standpoint). In a few cases in nature, adjacent terminators are controlled by different riboswitches that respond to two different ligands. It is conceivable that the same mRNA could accommodate more than two terminators, resulting in multi-input regulatory programs. In particular, such a network would implement part or all of a DNF circuit based on Layout 3 (Figs. 2A, 3A). I note that engineering de novo riboswitches that remodel an intrinsic terminator stem-loop is challenging, because such switches must combine ligand-binding activity with a particular secondary structure remodeling at a relatively remote location. Indeed, to the best of my knowledge, engineered riboswitches of this type have not been reported yet.
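A hypothetical model of several riboswitch-controlled terminators on one transcript shows why such an mRNA behaves as a multi-input AND of (possibly inverted) ligand signals. The assumptions are that each terminator is either fully folded or fully disrupted, that each riboswitch responds to exactly one ligand, and that the names `full_length_transcript`, "ON" and "OFF" are illustrative labels only.

```python
def full_length_transcript(switches, ligands):
    """Return True if RNA polymerase reads through every riboswitch-controlled terminator.

    switches: list of (ligand_name, mode) pairs, one per terminator, where mode "ON"
              means the bound ligand disrupts the terminator and "OFF" means it induces it
    ligands:  dict mapping each ligand name to True (present) or False (absent)
    """
    for ligand, mode in switches:
        if mode not in ("ON", "OFF"):
            raise ValueError(f"unknown switch mode: {mode}")
        bound = ligands[ligand]
        folded = (not bound) if mode == "ON" else bound
        if folded:
            return False  # a folded terminator stops transcription prematurely
    return True


switches = [("X", "ON"), ("Y", "OFF")]
print(full_length_transcript(switches, {"X": True, "Y": False}))  # True
print(full_length_transcript(switches, {"X": True, "Y": True}))   # False
```

With these two switches the transcript reads through only when X is present and Y is absent, i.e. X AND NOT Y: one AND clause of a DNF. Under the same assumptions, several such genes producing the same output would supply the OR layer.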
Another mode of RNA regulation in bacteria is based on small RNAs (sRNAs). These are relatively short non-coding transcripts that regulate other genes by sequence-dependent binding to certain mRNA regions. The binding can repress translation by interfering with the ribosome binding site (RBS), or activate it by altering the conformation of a different, cis-acting negative regulatory structure. While reminiscent of RNA interference (RNAi, see below) in higher eukaryotes, inhibitory sRNAs cannot be designed against an arbitrary region of the mRNA. Instead, the target is limited to the close vicinity of the RBS and perhaps the first five codons. Besides, most sRNAs act irreversibly in a stoichiometric fashion. If multiple sRNAs inhibit the same element on an mRNA and are themselves controlled by external signals (which is the case in nature), the network will conform to Layout 3 and, as above, implement part or all of a DNF circuit [14,15]. An implementation in which sRNAs target a cis-inhibitory element is less amenable to scaling, and it constitutes a multi-input OR gate of a CNF circuit (Fig. 3B). First steps in engineering networks using sRNA-inspired ideas have already been taken, for example in a switch reported by Isaacs et al. or by fusing an sRNA target to a reporter gene. However, scaling these approaches to perform complex computations has not been shown yet.
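The two sRNA modes can be captured as Boolean functions, assuming each sRNA is either present at saturating levels or absent. In this idealized sketch the RBS-blocking mode yields an AND of inverted sRNA signals, while the cis-relief mode yields an OR; the function and sRNA names are hypothetical.

```python
def translated_rbs_blocking(srnas, present):
    """Each sRNA masks the RBS: translation requires that none of them is present (AND of NOTs)."""
    return not any(present[s] for s in srnas)


def translated_cis_relief(srnas, present):
    """Each sRNA unfolds the same cis-inhibitory structure: any one of them exposes the RBS (OR)."""
    return any(present[s] for s in srnas)


state = {"s1": False, "s2": True}
print(translated_rbs_blocking(["s1", "s2"], state))  # False: s2 blocks the RBS
print(translated_cis_relief(["s1", "s2"], state))    # True: s2 relieves the inhibition
```

If the sRNAs are in turn expressed in response to external signals, each signal's literal can be positive or negative depending on whether that signal induces or represses its sRNA.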
Yeast is an important model organism and is widely used in biotechnology applications, from the alcoholic beverage industry to the biosynthesis of biofuels and fine chemicals. Reprogramming yeast via RNA-based computational networks could therefore have immediate consequences in these and other areas. However, unlike prokaryotes and metazoans (see below), yeast do not seem to have simple, clean-cut RNA-based regulatory mechanisms. For example, a pathway analogous to RNAi is used in fission yeast to remodel chromatin rather than to knock down individual genes in a modular fashion, while regulation by sRNAs and riboswitches is all but non-existent (or waits to be discovered). One case of riboswitch-based regulation has been reported by the Breaker lab; it involved the induction of an alternative splicing pathway in the presence of a ligand, creating a translationally inactive transcript. It is conceivable that a number of similar switches could be inserted in the same gene and result in complex multi-input logic programs.
Despite the relative dearth of endogenous RNA regulation, yeast has proven to be a promising platform for incorporating exogenous RNA regulatory elements. Switches based on a combination of an antisense RNA and an aptamer moiety have been shown to repress or induce a reporter gene in the presence of a ligand. If scaled, these switches could enable DNF-like circuits similar to those in Figures 3A and 3B. A different approach explored allosterically controlled self-cleaving hammerhead ribozymes fused into the 3’-UTR of a reporter gene [21, 22]. These cis-elements were designed to be either active or inactive in their baseline state, with ligand binding respectively inactivating or activating the cleavage and hence modulating reporter gene expression. Here the logic function was encoded by the placement of the ribozyme moieties in the gene’s 3’-UTR, the placement of the aptamer moieties relative to the ribozymes, and the baseline ribozyme activity. At their core, these networks are functionally similar to the multi-input riboswitches in prokaryotes, and their logic functions can be appropriately described by Disjunctive Normal Forms (Fig. 4A).
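In a simplified model of the ribozyme devices, the choice of baseline activities alone determines which two-input function is computed. The model assumes that each ligand toggles exactly one ribozyme away from its baseline activity and that any active ribozyme destroys the transcript. The helpers are hypothetical and ignore aptamer placement, partial cleavage and the details of the published designs.

```python
from itertools import product


def ribozyme_active(baseline_active, ligand_present):
    """The bound ligand switches the ribozyme away from its baseline activity."""
    return baseline_active != ligand_present


def expressed(baselines, ligands):
    """The reporter survives only if no ribozyme in its 3'-UTR is active (cleaving)."""
    return not any(ribozyme_active(b, l) for b, l in zip(baselines, ligands))


def reachable_functions():
    """Truth table over ligand rows (A, B) = FF, FT, TF, TT for every baseline choice."""
    return {
        baselines: [expressed(baselines, row) for row in product((False, True), repeat=2)]
        for baselines in product((True, False), repeat=2)
    }


for baselines, column in reachable_functions().items():
    print(baselines, column)
```

Under these assumptions, two baseline-active ribozymes give A AND B, two baseline-inactive ones give NOT A AND NOT B (NOR), and mixed baselines give A AND NOT B or NOT A AND B; each is a single AND clause of a DNF.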
Mammalian systems arguably offer the most raw material for constructing complex computational networks, in the form of numerous regulatory mechanisms. Regulation at the RNA level that involves RNA effectors includes alternative splicing, RNA interference (RNAi) and antisense RNA, to name a few. Exogenous regulatory elements such as riboswitches have also been shown to function in these cells. In particular, one report described a ligand-regulated riboswitch that controlled gene expression when incorporated into that gene’s intron. While in that work the ribozyme itself, its placement in the mRNA transcript and the ligand effector were designed with the help of extensive screening, it is conceivable that rational approaches could work in the mammalian context to generate multiple ON/OFF regulatory elements and hence enable complex DNF-like logic. More recently, an endogenous ribozyme-like motif has been discovered in the 3’-UTR of a mammalian gene. Interestingly, the motif is discontinuous and forms through long-range secondary-structure interactions. While the question of allosteric regulation of the motif remains open, it is plausible that this motif could be co-opted into the biocomputer engineer’s toolkit to provide a basis for scalable logic computations.
Full-length antisense RNA transcripts are being discovered at an increasing pace with the help of new high-throughput methods, and their role in gene regulation, specifically in the repression of gene expression, is becoming evident. While the relative importance of antisense RNA in the big picture of biological regulation is still being debated, effective antisense regulation is amenable to engineering, as demonstrated by a recent report on a synthetic mammalian oscillator. Multiple effective antisense RNAs could potentially be used as mammalian counterparts of bacterial sRNAs and support similar logic networks.
RNAi is perhaps the most widespread, or at least the best-studied, RNA-based regulation mechanism in mammalian cells. In its most basic form, RNAi is an inhibitory interaction between a short (19-27 nt) RNA strand and an mRNA that contains, in its coding region or 3’-UTR, a subsequence with full or partial complementarity to this strand. The short RNA strand can be delivered exogenously as small interfering RNA (siRNA), expressed from exogenously introduced DNA as short hairpin RNA (shRNA), or expressed from endogenous genes as microRNA (miRNA). (Synthetic constructs can also be built to mimic endogenous miRNA genes.) siRNA, shRNA and engineered miRNA can be made into very efficient repressors that act via mRNA degradation when they meet certain sequence requirements. Moreover, since the sequences that respond to small RNAs are short, large numbers of them can be engineered to control a single gene. This has led to the realization that RNAi can be used to assemble large-scale normal-form logic circuits. Using RNAi alone affords DNF circuitry, while adding an extra transcriptional layer enables CNF-like circuits (Fig. 4B). Experiments showed that up to five siRNA inputs can indeed be integrated in a DNF circuit, and up to two inputs in CNF circuits. The next challenge is transducing a variety of molecular signals into the small RNA format, in other words, building the sensory component of the networks. Encouraging first steps have been made by a number of research groups, who showed that small-molecule ligands can inhibit shRNA molecules via fused aptamers [28-30]. Expanding the repertoire of eligible inputs will greatly increase the applicability of RNAi networks.
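The distinction between RNAi-only DNF and CNF with a transcriptional layer can be sketched in Boolean terms. This idealized model assumes that any siRNA that is present fully silences every transcript carrying its target site. In it, every DNF literal is the absence of an siRNA; how the published circuits obtain other literal types is not modeled. Function and siRNA names are hypothetical.

```python
def dnf_output(clauses, present):
    """RNAi-only DNF sketch.

    Each clause is one copy of the output gene whose 3'-UTR carries target sites for the
    siRNAs listed in that clause. A copy is expressed only if none of its siRNAs is
    present (AND of negated inputs); the output is on if any copy is expressed (OR).
    """
    return any(not any(present[s] for s in clause) for clause in clauses)


def cnf_output(clauses, present):
    """CNF sketch with an extra transcriptional layer.

    Each clause is a transcriptional repressor whose mRNA carries target sites for the
    siRNAs in that clause, so any one of them knocks the repressor down (OR). The output
    gene is expressed only if every repressor is knocked down (AND).
    """
    return all(any(present[s] for s in clause) for clause in clauses)


clauses = [["s1", "s2"], ["s3"]]
state = {"s1": True, "s2": False, "s3": True}
print(dnf_output(clauses, state))  # False: both copies carry a target for a present siRNA
print(cnf_output(clauses, state))  # True: s1 knocks down repressor 1, s3 knocks down repressor 2
```

The extra repressor layer is what turns "any siRNA present" from a silencing event on the output into a permissive event, which is why CNF needs one more transcriptional step than DNF in this sketch.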
RNA-based regulation, with its effectors and targets, comprises a large variety of mechanisms in different biological hosts, but many of these mechanisms share common features that make them attractive building blocks for implementing complex molecular computations. The effectors are often small RNAs that act based on sequence complementarity, and they target small motifs in mRNA transcripts. Therefore, de novo RNA regulators could be engineered using rational design or in vitro selection, and subsequently integrated into large networks. A particular approach to computation, namely normal-form logic circuits, seems best suited to guide the construction of these networks. Indeed, both natural and synthetic examples of such circuits have been shown in bacteria, yeast and mammalian cells. Future challenges include putting these initial results on a solid engineering foundation, demonstrating complex logic computations with a large number of inputs, expanding the repertoire of input signals, and solving real-life problems. Maintaining the digital behavior of large circuits is another challenge. Overall, RNA networks offer an exciting direction toward programmable cells and cell populations, and promise to become a workhorse of large synthetic systems.
I thank I. Benenson and M. Leisner for critical reading of the manuscript. This work was supported by the Bauer Fellows Program and NIGMS grant GM068763 for National Centers for Systems Biology.