Engineering artificial gene networks from modular components is one of the major goals of synthetic biology. However, the construction of gene networks with predictable functions remains hampered by a lack of suitable components and the fact that assembled networks often require extensive, iterative retrofitting to work as intended. Here we present an approach that couples libraries of diversified components (synthesized with randomized non-essential sequence) with in silico modeling to guide predictable gene network construction without the need for post-hoc tweaking. We demonstrate our approach in S. cerevisiae by synthesizing regulatory promoter libraries and using them to construct feedforward loop networks with different predicted input-output characteristics. We then expand our method to produce a synthetic gene network acting as a predictable timer, modifiable by component choice. We utilize this network to control the timing of yeast sedimentation, illustrating how the plug-and-play nature of our design can be readily applied to biotechnology.
Synthetic biology promises to revolutionize biotechnology by applying engineering principles to biological systems1. In less than a decade this field has already yielded technological applications, providing new avenues for drug manufacture2, 3, biofabrication4 and therapeutics5, 6, while also offering promise in alternative energy7. A major focus of the field is the synthesis of gene networks with predictable behavior8-10, either to endow cells with novel functions11-15 or to serve as tractable models of analogous natural systems8, 16-19. Despite a booming community and notable successes, the basic task of assembling a predictable gene network from biomolecular parts remains a significant challenge and often takes many months before a desired network is realized20. To advance synthetic biology, it is essential to identify techniques that increase the predictability of gene network engineering and decrease the amount of hands-on molecular biology required to get a functional network up and running.
Current approaches to gene network construction typically use a small set of components plundered from different natural systems, which are then assembled and tested in vivo, often without guidance from a priori mathematical modeling13, 21. Networks rarely behave as intended the first time, usually because the chosen parts have the correct function but lack the specific quantitative properties required. Even in the few synthetic biology studies that do use computational assistance22-25, in silico results have mainly been used for data interpretation, not for guiding design and assembly. Instead, in most projects an initial failed network is rescued over months of iterative retrofitting20, often by fine-tuning imperfect parts by mutation, by identifying alternative parts, or by adding extra features to counterbalance the problems. Directed evolution has been shown to provide a shortcut through this phase21, but it is complicated by the additional work needed to couple networks to selective pressures.
This time-consuming post-hoc tweaking phase stems, in part, from having to work with a limited set of imperfect components. Although the lack of reliable parts is being addressed by community efforts26, it remains an acute problem: the number of available components is limited, and most are inadequately characterized (e.g., many promoters are described only as ‘weak’ or ‘strong’). Resolving this problem and fast-tracking synthetic biology requires a new approach that creates libraries of components ahead of any assembly; by starting with a finer-grained range of choices for each component, modeling can then be used to quickly pick out the part needed to generate the intended network function. This approach has the added attraction of allowing substantially different network outcomes to be chosen in advance, simply by selecting functionally equivalent components with slightly different properties. It exploits a feature common to many types of finely balanced networks, in which small changes to one component can dramatically affect the behavior of the entire system.
Using regulated promoters as our example, we describe here how a simple synthesis technique can be used to rapidly create and parallel-characterize component libraries for synthetic biology. Working in S. cerevisiae, we demonstrate how such libraries can be teamed with predictive modeling to rationally guide the construction of gene networks that have diverse outputs. We also illustrate a plug-and-play application for one of our network designs by using it to control the timing of yeast sedimentation.
To demonstrate our library-modeling approach, we focused on regulated promoters, as they typically control gene network logic and modulate responses to stimuli. Promoter libraries have been created using DNA-shuffling/combinatorial approaches27-29 and mutation-selection techniques30-33. We modified an efficient synthesis-with-degenerate-sequence method32 to yield libraries of regulatory promoters that have a range of inputs and outputs. In this technique, promoters are constructed with runs of unspecified (‘N’) sequence separating key motifs32; the fixed motif sequences ensure promoter function, and the random bases surrounding them modulate their efficiency, presumably by subtly altering local DNA conformation34.
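The synthesis-with-degenerate-sequence idea can be mimicked in silico: fixed motifs are kept verbatim and every ‘N’ position is drawn at random, so each library member differs only in the spacer sequence. The following is a minimal Python sketch under stated assumptions. The helper names (build_template, realize, make_library), the spacer lengths and the two motif strings are hypothetical placeholders chosen for illustration, not the sequences used in this study. In the wet-lab method the randomization happens during oligonucleotide synthesis, not in software.

```python
import random

BASES = "ACGT"

def build_template(parts):
    """Join ('fixed', sequence) and ('random', length) parts into one template.
    Random stretches are written as runs of 'N'."""
    out = []
    for kind, value in parts:
        out.append(value if kind == "fixed" else "N" * value)
    return "".join(out)

def realize(template, rng):
    """Replace every 'N' with a random base; fixed motif bases are kept."""
    return "".join(rng.choice(BASES) if c == "N" else c for c in template)

def make_library(template, size, seed=0):
    """Draw `size` promoter variants; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    return [realize(template, rng) for _ in range(size)]

# Hypothetical layout: spacer lengths and motif strings are placeholders only.
TEMPLATE = build_template([
    ("random", 12),
    ("fixed", "TCCCTATCAGTGATAGAGA"),  # operator-like motif (placeholder)
    ("random", 8),
    ("fixed", "TATATAAA"),             # TATA-like motif (placeholder)
    ("random", 15),
])

if __name__ == "__main__":
    for seq in make_library(TEMPLATE, 3, seed=42):
        print(seq)
```

Every variant has the same length and the same motif bases at the same positions, which is the in silico counterpart of the design goal that library members differ only in inter-motif sequence.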
Our first library was designed to yield yeast promoters repressed by TetR (Tn10.B Tetracycline repressor35) and inducible with the TetR-inhibitor anhydrotetracycline (ATc). Jensen and Hammer’s Klenow-based synthesis method using inexpensive oligonucleotides31, 34 was utilized to build promoters containing the TATA box and start site from the commonly-used GAL1 promoter36. To introduce controlled regulation, we placed two tandem TetR operators (Tn10 operator tetO2) into the promoter at positions previously shown to provide tight repression29.
A schematic of our library synthesis technique is shown in Figure 1 and detailed in the Methods section. To permit screening, libraries were cloned in a vector with the strong constitutive TEF1 promoter37 directing TetR expression and the GAL1 upstream activating sequence (UAS) placed upstream of the engineered promoter to direct the removal of nucleosomes from the promoter in the presence of galactose38. We arbitrarily chose to build a library of 20 promoters (T1-T20) covering a wide range of expression and inhibition levels. We used flow cytometry detection of yeast-enhanced green fluorescent protein (yEGFP39) to quantitatively measure the promoters for minimum repressed output (Smin) and maximum unrepressed output (Smax) (Table 1). The approach was designed to yield interchangeable promoters that are identical except for Smin, Smax, and inter-motif sequences. This was confirmed by comparison to TX, a control promoter retaining the GAL1 wild-type sequence between defined motifs, as well as by DNA sequencing (Supplementary Fig. 1) and dose-response curves (Supplementary Fig. 3).
To demonstrate how our approach can be applied in a gene network, we used the TetR-regulated promoter library with in silico modeling to investigate the incoherent type II negative feedforward loop network. This is a genetic motif found in S. cerevisiae and mammalian cells that consists of an output gene regulated by two repressor genes, one of which is also inhibited by the other40.
For our network, we used TetR and a eukaryotic-optimized version of the E. coli Lac repressor41 (LacI) as the two repressors (Fig. 2A). Each controlled yEGFP expression by regulating a hybrid GAL1-based promoter (POR-LT) containing both the K-12 E. coli O1 Lac operator (OLac) and a tetO2 site, thus acting as an OR-gate promoter. TetR expression was constitutive from the TEF1 promoter, while LacI expression was driven by TetR-regulated promoters selected from our library (PLibT). By varying the concentrations of two inputs - ATc and the LacI inhibitor isopropyl β-D-1-thiogalactopyranoside (IPTG) - the repressive effects of TetR and LacI could be tuned, modulating yEGFP expression output.
Before any network assembly, we used component properties from our experimental characterization steps to build a mathematical model to predict how network output would change when input levels (ATc/IPTG) and promoter properties were varied. The model served as a guide, predicting which components from the library could be selected to yield different network outcomes, and what dosage of induction would be most experimentally informative. In the model, the experimentally determined Smax and Smin values for the promoters (Table 1) were utilized; generic values were assumed for other parameters (see Supplementary Information for modeling details).
A simulation with PLibT = TX (control promoter: Smax 918.0, Smin 7.46) leads to an interesting non-monotonic expression landscape with an output peak at intermediate inputs (Fig. 2B). This occurs because TetR has simultaneously opposing effects on yEGFP output - inhibiting the production of yEGFP by binding to POR-LT, while also relieving LacI inhibition of yEGFP by binding to PLibT and repressing LacI production. This is consistent with previous in situ studies of naturally occurring negative feedforward loops42. Our synthetic library-modeling approach enables the investigation of this motif without the hindrance of inter-connected regulatory networks42-44.
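Why an intermediate ATc dose can maximize output is easiest to see in a steady-state model. Below is a minimal Python sketch of one possible formulation, not the model in the Supplementary Information: Hill-type repression with n = 2, inducers that inactivate their repressors, and an OR-gate promoter on which the two repressors act independently (multiplicatively). Only the TX values (Smax 918.0, Smin 7.46) come from the text; every entry in PARAMS and every function name is a hypothetical generic choice.

```python
# Hypothetical generic parameters (arbitrary units), not fitted values.
PARAMS = {
    "tetr_total": 100.0,  # constitutive TetR level
    "k_atc": 1.0,         # ATc dose for half-maximal TetR inactivation
    "k_iptg": 1.0,        # IPTG dose for half-maximal LacI inactivation
    "k_tetr": 10.0,       # active TetR for half-maximal repression
    "k_laci": 50.0,       # active LacI for half-maximal repression
    "y_min": 1.0,         # leaky output of the OR-gate promoter
    "y_max": 1000.0,      # unrepressed output of the OR-gate promoter
    "n": 2.0,             # Hill coefficient used throughout
}

def active_repressor(total, inducer, k_ind, n):
    """Repressor left active after its inducer (ATc or IPTG) binds."""
    return total / (1.0 + (inducer / k_ind) ** n)

def repression(level, k_rep, n):
    """Fold reduction in promoter activity caused by a repressor."""
    return 1.0 + (level / k_rep) ** n

def ffl_output(atc, iptg, s_min, s_max, p=PARAMS):
    """Steady-state yEGFP output for a PLibT promoter with (s_min, s_max)."""
    n = p["n"]
    tetr = active_repressor(p["tetr_total"], atc, p["k_atc"], n)
    # LacI is expressed from the TetR-regulated library promoter PLibT.
    laci_total = s_min + (s_max - s_min) / repression(tetr, p["k_tetr"], n)
    laci = active_repressor(laci_total, iptg, p["k_iptg"], n)
    # OR-gate promoter: both repressors act on it independently.
    block = repression(tetr, p["k_tetr"], n) * repression(laci, p["k_laci"], n)
    return p["y_min"] + (p["y_max"] - p["y_min"]) / block

if __name__ == "__main__":
    # Control promoter TX (Smax 918.0, Smin 7.46), no IPTG.
    for atc in [0.0, 0.5, 1.0, 1.2, 1.5, 2.0, 3.0, 5.0, 10.0]:
        print(f"ATc={atc:5.1f}  yEGFP={ffl_output(atc, 0.0, 7.46, 918.0):7.2f}")
```

With these assumed parameters and no IPTG, the output rises and then falls as ATc increases. At low ATc, TetR blocks POR-LT directly. At high ATc, TetR no longer represses PLibT, and the resulting LacI blocks POR-LT instead. Adding IPTG removes the second effect and makes the response monotonic. Substituting other (Smin, Smax) pairs from Table 1 may reshape the landscape in the way the text describes, but this sketch has not been calibrated against the data.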
By changing the Smin and Smax values of PLibT in the model, we can examine computationally how different promoters from our library affect this response landscape. The model predicts significant changes in output, and two examples most divergent from the TX simulation are shown in Figure 2B. Increasing the Smin value of the TetR-regulated promoter removes almost all expression in low concentrations of IPTG (PLibT = T8: Smin 30.88), whereas decreasing the Smax value shifts peak expression to occur only at higher concentrations of ATc (PLibT = T18: Smax 51.75). This demonstrates quantitatively that the same external induction (ATc or IPTG) can elicit very different responses from the motif simply due to small changes in promoter strength.
To test these in silico predictions, we assembled the three negative feedforward loop networks shown in Figure 2B using corresponding components from our libraries, and quantified their output responses to varied ATc/IPTG inputs using flow cytometry measurement of yEGFP. The experimental data (Fig. 2C) correlated very well with our computational predictions, particularly considering the many non-fitted generic parameters used in the model. These results demonstrate how small changes in promoter strength can have dramatic consequences on network responses. These findings also show that a model built from just component data and generic parameters can offer insights into a network response landscape, and such a model, when teamed with component libraries, can serve as a useful, rapid guide for producing networks with different predictable characteristics.
Having demonstrated our approach in a relatively simple network using one promoter library, we next utilized two promoter libraries in a more complex network with a richer set of dynamics. Using the mutual-repression motif of the genetic toggle switch14, we set out to produce predictable genetic ‘timers’. These timers exploit the finely-balanced nature of a mutual inhibitory network14, where changes in opposing repressor levels can disrupt bistability, and memory of induction can be lost as the system resets back to its original default state. These timers are effectively genetic toggle switches operating in a monostable regime, and their rate of resetting is directly related to relative expression levels of the two repressors - the further they are from the balanced values required for bistability14, the more rapidly memory of induction is lost.
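The timer principle can be illustrated with a generic dimensionless mutual-repression model of the toggle-switch type14. Each repressor is made from a promoter that the other repressor represses, and both decay at unit rate. The Python sketch below starts in the ‘induced’ state (v high, u low) and reports when the default repressor u overtakes v. The promoter strengths a_u and a_v, the Hill coefficient and the function names are assumptions for illustration only. With a_v = 10 and n = 2, the induced state exists only for a_u below roughly 33. Below that value memory is held, and above it the system resets, faster the further a_u lies past the threshold.

```python
def toggle_rhs(u, v, a_u, a_v, n=2.0):
    """Dimensionless mutual repression with unit decay rates (assumed form)."""
    du = a_u / (1.0 + v ** n) - u
    dv = a_v / (1.0 + u ** n) - v
    return du, dv

def reset_time(a_u, a_v, u0=0.0, v0=10.0, dt=0.01, t_max=300.0):
    """Time for u to overtake v, integrating with classic RK4.

    Returns None if no reset occurs before t_max (memory of induction held)."""
    u, v, t = u0, v0, 0.0
    while t < t_max:
        if u > v:
            return t
        k1 = toggle_rhs(u, v, a_u, a_v)
        k2 = toggle_rhs(u + 0.5 * dt * k1[0], v + 0.5 * dt * k1[1], a_u, a_v)
        k3 = toggle_rhs(u + 0.5 * dt * k2[0], v + 0.5 * dt * k2[1], a_u, a_v)
        k4 = toggle_rhs(u + dt * k3[0], v + dt * k3[1], a_u, a_v)
        u += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
    return None

if __name__ == "__main__":
    # a_v fixed; stronger a_u pushes the network further from bistability.
    for a_u in (30.0, 35.0, 40.0, 60.0):
        print(a_u, reset_time(a_u, 10.0))
```

Time here is in units of the repressor lifetime. Mapping it onto hours, or a_u and a_v onto measured Smax and Smin values, would require the kind of calibration against one assembled timer that the text describes.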
For yeast timers, we used LacI and TetR as the two mutually-repressive genes (Fig. 3A). LacI is expressed from a TetR-regulated promoter (PLibT) selected from our component library described above, and TetR is expressed from a LacI-regulated promoter (PLibL) taken from the second component library. This second library of promoters (L1-L20 plus control LX) was synthesized and characterized as before, but with the Lac operator (OLac) in place of the tandem Tet operators (promoter data are shown in Table 1 and sequences in Supplementary Fig. 2). To follow the expression state of the timers, yEGFP was placed under the control of the LX promoter, giving an expression read-out directly correlated to TetR expression (Fig. 3A).
An initial model built with component properties from both libraries gave us qualitative insights into how timers can be set via imbalanced mutual inhibition (see Supplementary Information). However, a model built solely from component data cannot in this case capture important quantitative features of the network. The initial model with generic parameters revealed that changing the ratio of expression from the two opposing promoters affects the reset time, but it was not able to predict accurately by how much the reset time would change. The temporal dynamics of this system cannot be quantitatively predicted without first seeing a system in action to dissect some of the lumped parameters that remain fixed for all possible timers (see Supplementary Information for details). To address this, we assembled and tested a single example timer using the two control promoters (TX-LX); we then used the experimental data (Fig. 3B) from this system to calibrate a quantitatively predictive model for the other 440 possible timers afforded by our libraries.
The quantitative model gave us predictions as to how reset time could be varied by promoter selection, specifically by adjusting the ratio of relative expression from opposing promoters (using Smax-Smin to determine relative expression for each). We used the model to quantitatively predict the reset behavior of two timers (T18-LX and T4-LX) with ratios greater than that for TX-LX, and two timers (T7-L18 and TX-L14) with smaller ratios. These timers were assembled and tested in yeast. The experimental data for all four networks fell within the upper and lower bounds of the model predictions, validating our approach and ability to make quantitative predictions (Fig. 3C,D,E,F).
The model with calibrated parameters provides us with a “confidence interval” of reset times for all ratios, shown as blue lines in Figure 3G. Closer inspection of the model reveals that reset times of different timer networks with low Smin values are approximately proportional to the reciprocal of the square root of the distance between the Smax-Smin ratio and the bifurcation ratio (the ideal Smax-Smin ratio for bistability). Mathematically, this is due to a temporal lag in resetting caused by the network passing through a ‘bottleneck’ as it leaves bistability45, 46. This relationship, shown in Figure 3G, allows timers with any reset time between 50 and 150 hours to be chosen simply based on the strengths of the two promoters selected from the respective libraries.
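The inverse-square-root scaling is the standard passage time through the ‘ghost’ of a saddle-node bifurcation. Here is a minimal derivation. It assumes the reset dynamics near the bifurcation reduce to the saddle-node normal form in a slow variable x, with an offset ε proportional to the distance between the promoter ratio r and the bifurcation ratio r_c. The constants k and c are generic and are not values fitted in this study.

```latex
\begin{aligned}
\dot{x} &= \varepsilon + k\,x^{2}, \qquad \varepsilon = c\,(r - r_c) > 0,\\
T_{\mathrm{reset}} &\approx \int_{-\infty}^{\infty} \frac{dx}{\varepsilon + k\,x^{2}}
  = \frac{\pi}{\sqrt{k\,\varepsilon}}
  = \frac{\pi}{\sqrt{k\,c}}\;\frac{1}{\sqrt{r - r_c}} .
\end{aligned}
```

Close to the bifurcation, the time spent in the bottleneck dominates the time spent elsewhere on the trajectory. The relation is therefore approximate and holds best for ratios just past r_c.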
To demonstrate how our approach can be readily applied in a biotechnology scenario, we used the plug-and-play nature of our timer networks to control the flocculation of yeast. Flocculation occurs when yeast cells express FLO1, which functions as a yeast-specific adhesin that causes cells to clump together and sediment from the medium47-49. This phenotype is crucial in industrial beer, wine and bioethanol fermentation, as it allows easy removal of yeast sediment after all sugars have been converted to ethanol48.
Since the reset times of T4-LX and T18-LX are very close to that of TX-LX, we chose TX-LX, T7-L18 and TX-L14 to test the application of genetic timers. Using these three networks, we controlled the timing of sedimentation by replacing yEGFP with the FLO1 gene (Fig. 4A). In our laboratory yeast strain, FLO1 is not expressed, but replacing its native promoter with a strong promoter re-activates flocculation, causing sedimentation to occur when a threshold of FLO1 expression is passed. The timing of sedimentation can therefore be tied to the resetting of each timer network by choosing an appropriate regulated promoter from the libraries. With the 441 possible timers, and a choice from two library sets of promoters for controlling FLO1 expression, we had the potential to produce >17,000 different flocculating networks. We selected the L7 promoter, which has relatively high Smax and very low Smin, to give a wide dynamic range. When LacI was abundant, the minimal expression from this promoter did not elicit sedimentation, allowing yeast to grow in suspension.
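The network counts follow from the library sizes. Each library holds 20 synthesized promoters plus one control (TX or LX), giving 21 × 21 = 441 timers. One reading of the ‘>17,000’ figure is that the FLO1 promoter could be any of the 42 members of either library. That interpretation is an assumption in the arithmetic below.

```python
# Library sizes from the text: 20 synthesized promoters plus one control each.
TETR_LIBRARY = 20 + 1  # T1-T20 plus TX
LACI_LIBRARY = 20 + 1  # L1-L20 plus LX

timers = TETR_LIBRARY * LACI_LIBRARY           # one PLibT paired with one PLibL
flo1_promoters = TETR_LIBRARY + LACI_LIBRARY   # assumed: FLO1 driven by any member
networks = timers * flo1_promoters

print(timers, flo1_promoters, networks)  # 441 42 18522
```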
We experimentally determined the threshold of FLO1 expression that causes sedimentation (see Supplementary Information). Rescaling the timer data in Figure 3B,E,F to match the Smax and Smin values for the L7 promoter allowed us to plot estimates as to when this threshold would be passed for our networks (Fig. 4B). We tested these predictions by assembling the networks in yeast, and then growing the yeast cultures until sedimentation, days after the initial induction (ATc) was removed (Fig. 4C). For the TX-LX, T7-L18 and TX-L14 networks, sedimentation was seen at 60, 60 and 168 hours, respectively. These findings closely matched the predicted times assuming a 12-hour lag, presumably due to a longer phenotype maturation for flocculation compared to yEGFP fluorescence. This experiment demonstrates that we can quickly apply predictable networks built with our approach to control an industrially-relevant phenotype. Such accurate control of flocculation timing provides a wide window of opportunity to harvest fermentation product from cells and could be applied to improve biomass recycling in the biofuels industry.
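The prediction step in this paragraph has three parts. The first is a linear rescaling of a yEGFP time course from the reporter promoter's (Smin, Smax) range onto that of L7. The second is locating when the rescaled curve first rises past the sedimentation threshold. The third is adding the 12-hour maturation lag. The Python sketch below assumes linear interpolation between 12-hour samples. The function names and all numbers in the example (readings, promoter ranges, threshold) are hypothetical; only the 12-hour lag comes from the text.

```python
def rescale(values, src_min, src_max, dst_min, dst_max):
    """Linearly map readings from one promoter's [Smin, Smax] range onto another's."""
    scale = (dst_max - dst_min) / (src_max - src_min)
    return [dst_min + (v - src_min) * scale for v in values]

def first_crossing(times, values, threshold):
    """Interpolated time at which values first reach threshold from below."""
    if values[0] >= threshold:
        return times[0]
    for i in range(1, len(values)):
        lo, hi = values[i - 1], values[i]
        if lo < threshold <= hi:
            frac = (threshold - lo) / (hi - lo)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None

MATURATION_LAG_H = 12.0  # lag between yEGFP read-out and sedimentation (from the text)

def predicted_sedimentation(times, trace, src, dst, threshold, lag=MATURATION_LAG_H):
    """Hours after ATc removal at which sedimentation is expected, or None."""
    crossing = first_crossing(times, rescale(trace, *src, *dst), threshold)
    return None if crossing is None else crossing + lag

if __name__ == "__main__":
    times = [0, 12, 24, 36, 48]      # hours after ATc removal
    trace = [5, 5, 50, 250, 500]     # hypothetical yEGFP readings
    # Hypothetical ranges: reporter (5, 500) mapped to L7 (0, 990), threshold 290.
    print(predicted_sedimentation(times, trace, (5, 500), (0, 990), 290))  # 42.0
```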
This work establishes a new approach that can be used to rapidly increase the number of network components as well as decrease the time and effort required to engineer gene networks with desired functions. Our approach is compatible with plug-and-play synthetic biology and provides a platform to fast-track gene network construction. Here we focused on generating, characterizing and utilizing component libraries of promoters, but our approach is also applicable to other biomolecular components, as diversity in non-essential sequence also affects functional efficiency in proteins and RNA.
Although screening of mutated parts is not a new technique, our approach represents an advance over previous methods by coupling qualitative and quantitative modeling with library diversity to guide the construction of synthetic gene networks with predictable functions. In robust networks like our feedforward loop, models built entirely from component property sets are sufficient to guide the choice of parts required to elicit specific network phenotypes, such as high sensitivity to inputs or low maximum output. Although previous studies have shown that it is possible, in some cases, to accurately predict network behaviors based solely on component properties8, 50, we found that this conclusion cannot be generalized to more complex, finely-balanced networks, such as the timers described here. Instead we found that one must first assemble and experimentally characterize a single exemplary network of interest, in order to create a generalizable model with quantitative predictive capabilities. The experimental work needed to construct and test this one network is quickly offset by yielding accurate quantitative predictions for all other possibilities, and the benefit of this is especially significant when one considers that each additional component library incorporated increases the number of potential networks exponentially.
Our approach effectively moves component ‘tweaking’ to the front end of gene network engineering. This arrangement is intuitively more rational than network retrofitting and is made feasible by coupling with mathematical modeling. As component libraries are produced in parallel at the same point in the process at which individual parts are typically characterized for modeling, they require little extra effort in return for significant reward. Projects undertaken with this approach will help accelerate synthetic biology by yielding many more components for the community, and as library-synthesized components are designed to vary only in intended properties, the need for extensive characterization of each component is eliminated or substantially reduced. Our work also provides an accessible method for introducing predictable, controlled variability to networks, a feature that is becoming increasingly desirable as synthetic biology enters its second decade18, 19. With advances in modern DNA synthesis technologies, the reach of our approach will expand as synthesis becomes faster and cheaper, and as longer regions of biomolecules can be varied systematically.
S. cerevisiae strain YPH500 (MATα, ura3-52, lys2-801, ade2-101, trp1Δ63, his3Δ200, leu2Δ1) (Stratagene, La Jolla, CA) was used in all experiments, and all genomic integrations were specifically targeted to the redundant ura3-52 locus. Culturing, genetic transformation and verification of transformation were done as previously described29, using the TRP1, HIS3 or LEU2 gene as a selectable marker.
The TetR-regulated promoter library characterization vector (pTVGI, Fig. 1B) was adapted from the previously described yeast integrative plasmid pRS4D129, removing the GAL1/GAL10 promoter region and replacing it with the S. cerevisiae TEF1 promoter directing TetR expression and the GAL1 UAS region plus a synthesized library promoter directing yEGFP expression. A 489 bp span of arbitrary sequence from the ura3 gene was included between these promoters to buffer any cross-talk between them and to allow the vector to site-specifically integrate into the ura3-52 locus. For the LacI-regulated promoter library characterization vector (pLVGI), TetR was replaced by a synthetic codon-optimized LacI41 that had been altered to remove both an internal PstI restriction site (without changing the codon sequence or efficiency) and the hyper-strong SV40 nuclear localization signal. The control TX promoter was amplified directly from pRS4D1, whereas the LX promoter and OR-LT promoters were generated, as previously described, by standard oligonucleotide PCR mutation methods from the TX promoter and T123 pRS4D1 promoter, respectively29. All plasmids were constructed and transformed into E. coli to harvest DNA for yeast transformations, as previously described29.
Promoters were created from partially overlapping pairs of 110mer PAGE-purified oligonucleotides (listed in Supplementary Information), which were custom synthesized by Sigma-Genosys (The Woodlands, TX). Second-strand DNA synthesis by Klenow polymerase was followed by agarose gel electrophoresis purification to obtain fragments ready for insertion into characterization vectors34. Upon ligation, DH5α E. coli (NEB, Beverly, MA) were transformed with the plasmid vectors and clones were selected by ampicillin resistance. Between 10^4 and 10^5 colonies were pooled directly from LB agar plates and harvested for plasmid using the QIAprep Spin Miniprep Kit (Qiagen, Valencia, CA). The promoter plasmid library was then used to transform yeast as previously described29, scaling up by a factor of 10 and plating on 250 mm × 250 mm plates to yield ~3500 individual colonies. 192 of these colonies were transferred to two 96-well plates and grown for 22 hours in 300 μl of media supplemented with 2% galactose and 250 ng/ml ATc. Cell fluorescence was measured at 450 nm using a SpectraFluor Plate Reader (Tecan, Durham, NC). Approximately one quarter of clones produced a detectable level of expression when TetR was inhibited by ATc, and roughly three quarters of these responded to the removal of ATc with a drop in expression, indicating controlled regulation. Expression was undetectable in glucose. Colonies selected to create our 20-member library were PCR-tested for single genomic integration and then characterized by flow cytometry in the presence and absence of 250 ng/ml ATc. For LacI-regulated libraries, ATc was replaced with 10 mM IPTG in the screening and characterization stages.
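The plate-reader screen amounts to a two-stage filter. The first stage keeps clones that express detectably when TetR is inhibited. The second keeps those whose expression drops once the inducer is removed. The Python sketch below is one way to express that logic. The Clone record, the triage function and both cutoffs (2× background for ‘detectable’, a drop to at most half for ‘regulated’) are hypothetical, because the text does not state the thresholds that were used.

```python
from dataclasses import dataclass

@dataclass
class Clone:
    well: str
    induced: float    # plate-reader fluorescence with ATc (TetR inhibited)
    repressed: float  # fluorescence after ATc removal

def triage(clones, background, min_signal=2.0, max_fraction=0.5):
    """Two-stage screen with hypothetical cutoffs.

    expressing: induced signal at least `min_signal` times background.
    regulated: expressing clones whose signal falls to at most
    `max_fraction` of the induced level when the inducer is removed."""
    expressing = [c for c in clones if c.induced >= min_signal * background]
    regulated = [c for c in expressing if c.repressed <= max_fraction * c.induced]
    return expressing, regulated

if __name__ == "__main__":
    clones = [Clone("A1", 1000, 150), Clone("A2", 120, 110),
              Clone("A3", 800, 700), Clone("A4", 400, 200)]
    expressing, regulated = triage(clones, background=100)
    print([c.well for c in expressing], [c.well for c in regulated])
```

For LacI-regulated libraries the same filter would apply with IPTG in place of ATc.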
Flow cytometry measurements were carried out as previously described8, running samples at a medium flow rate until 20,000 cells had been collected within a small forward and side scatter gate to reduce extrinsic noise. Data files were analyzed using MATLAB (The MathWorks, Natick, MA), linearizing log-binned fluorescence intensity values and then calculating the median and standard deviation of the gated population. For both promoter library data (Table 1) and control ATc and IPTG induction curves, 3 ml cultures were grown for 20 hours to an optical density at 600 nm (OD600) of 1.00 at 30°C with orbital shaking before measurement. For the negative feedforward loop and genetic timer data, 300 μl of cells were grown to OD600 of 1.00 at 30°C in 96-well format. For the negative feedforward loop data, cells were grown for 22 hours before measurement. For the genetic timer data, cells were grown for 12 hours, a sample was taken for measurement, and then a fraction of the remaining cells was diluted into fresh media for the next 12 hours of growth.
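The analysis step can be sketched as a Python/NumPy equivalent of the MATLAB processing described above. It applies a rectangular forward/side scatter gate, converts log-binned channel numbers to linear intensity, and reports the median and standard deviation. The channel-to-intensity mapping (1024 channels spanning 4 decades), the rectangular gate shape and the function names are assumptions, because the instrument settings are not given here. `ddof=1` is used so that the standard deviation matches MATLAB's default `std`.

```python
import numpy as np

def linearize(channels, n_channels=1024, decades=4.0):
    """Map log-amplified channel numbers to linear intensity.

    Assumes a log scale of `decades` decades over `n_channels` bins,
    so channel 0 -> 1 and channel n_channels -> 10**decades."""
    return 10.0 ** (np.asarray(channels, dtype=float) * decades / n_channels)

def gated_stats(fsc, ssc, fl_channels, fsc_window, ssc_window):
    """Count, median and standard deviation of linearized fluorescence for
    events inside a rectangular forward/side scatter gate."""
    fsc = np.asarray(fsc, dtype=float)
    ssc = np.asarray(ssc, dtype=float)
    gate = ((fsc >= fsc_window[0]) & (fsc <= fsc_window[1])
            & (ssc >= ssc_window[0]) & (ssc <= ssc_window[1]))
    fl = linearize(np.asarray(fl_channels)[gate])
    # ddof=1 gives the sample standard deviation, as MATLAB's std() does.
    return int(fl.size), float(np.median(fl)), float(np.std(fl, ddof=1))
```

Gating before linearizing keeps the arithmetic on the events of interest only. A tight scatter gate selects cells of similar size and granularity, which is how the text's gate reduces extrinsic noise.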
To obtain flocculating strains, the L7 library promoter was inserted into pFA6a-KanMX6-pGAL1 in place of the GAL1 promoter sequences, and PCR amplification from primer pairs FL1 and FL2 was used to integrate these in place of the wild-type FLO1 promoter as described previously47. To measure flocculation over time, 1.2 ml cultures were grown for 12 hours to OD600 = 1.50 at 30°C with orbital shaking. 1 ml was removed for measurement and replaced with 1 ml of fresh culture media to continue growth. For measurement, 1 ml of culture was vortexed for 5 seconds before sitting for 10 minutes in a clear 3 ml culture tube. Cultures were photographed with a light box behind, and the image inverted and auto-contrasted using Picasa imaging software (Google, Mountain View, CA).
We thank Peter R. Jensen for advice on promoter library synthesis methods, Kevin Verstrepen for guidance and materials relating to yeast flocculation, and Henry H. Lee for valuable ideas in genetic device construction. This work was supported by the National Institutes of Health through the NIH Director’s Pioneer Award Program, grant number DP1 OD003644, the NSF FIBR program, and the Howard Hughes Medical Institute.