Synthetic biology aims to study the control of gene expression by constructing gene regulation systems from the “bottom-up” in order to better understand natural biological systems and develop useful tools for biotechnology.1
Despite many significant accomplishments, this field has largely been limited to studying artificial promoter-transgene systems with one or two transactivators, typically in microorganisms.1-6
In contrast, the natural regulation of mammalian gene expression is extraordinarily complex and is typically achieved by the combinatorial control of each gene by many regulatory factors. This level of complexity has not yet been achieved in synthetic gene regulation systems and has not been possible for the regulation of endogenous genes. However, the recent emergence of technologies for engineering transcription activator-like effectors (TALEs) targeted to almost any DNA sequence7-14 provides a unique opportunity for recapitulating this natural complexity. In the current study, the combinatorial regulation of endogenous mammalian genes in their natural chromosomal context is achieved by engineering several TALE transcription factors (TALE-TFs) to bind nearby sites upstream of the transcriptional start site (TSS) for a target gene. The composition of these combinations of independent TALE-TFs can be manipulated to control gene activation. Synergistic regulation of gene expression by multiple transcriptional activators is known to occur via simultaneous binding and stabilization of components of the pre-initiation complex.15-17 Building on this model, we activated endogenous genes with combinations of engineered transcription factors and were able to tune gene expression levels by systematically varying these combinations.
Each TALE-TF has two distinct protein domains that carry out individual molecular functions: (i) the repeat variable diresidue region binds to DNA at user-specified sequences,7, 8 and (ii) the VP64 effector domain recruits basal transcriptional machinery9, 10 (Fig. 1a). This design permits rapid construction of synthetic transcription factors that function as autonomous units.9, 11, 12
Several TALE-TFs have recently been reported to regulate native mammalian gene expression.9, 10, 14, 18-21 However, the levels of gene activation in these studies were modest and several genes could not be induced (Supplementary Table 1). Therefore, there is a clear need for improved gene activation strategies that capitalize on the synthetic TALE-TF technology.
Figure 1. Synergistic activation of gene expression by combinations of TALE-TFs. (a) Structure and sequence of TALE-TFs in this study. (b) Genomic positions of TALE-TF target sites in the CEACAM5, KLK3, IL1RN, and ERBB2 genes (hg19 coordinates).
We designed several TALE-TFs targeted to the promoter regions of the IL1RN, KLK3 (also known as prostate-specific antigen (PSA)), CEACAM5 (also known as CEA), and ERBB2 genes, which are implicated in immunomodulation, inflammation, and cancer (Fig. 1b, Supplementary Fig. 1). TALE-TF expression plasmids were transfected into HEK293T cells and TALE-TF expression was confirmed by Western blot (Supplementary Fig. 2). TALE-TF activity was first measured in reporter assays in which luciferase is under the control of the respective gene promoter. Most individual TALE-TFs activated the co-transfected plasmid reporters, but only at modest levels similar to previous reports (Supplementary Table 1).9, 10, 14, 18-21 However, the delivery of combinations of TALE-TFs led to substantial synergistic effects on gene activation. Importantly, the synergistic activation of the plasmid-based reporters was recapitulated in upregulation of the native genes in their natural chromosomal context as determined by quantitative RT-PCR, including induction of mRNA levels greater than 10,000-fold. Detection of induced protein expression of IL-1ra (encoded by the IL1RN gene), KLK3, CEACAM5, and erbB-2 by ELISA and Western blot validated the functional outcome of the activation of these genes. In particular, we reproducibly detected expression of IL-1ra, KLK3 and CEACAM5 protein only in samples with combinations of TALE-TFs. We found low expression of erbB-2 in control samples and cells transfected with single TALE-TFs, but its expression was substantially enhanced in cells transfected with all TALE-TFs.
These results are consistent with a mechanism in which the VP64 acidic activation domains of multiple transcription factors simultaneously interact with and stabilize components of the pre-initiation complex.15-17 This mechanism was confirmed by demonstrating that the VP64 domain was essential to achieving the synergistic effect, indicating that the synergy is not the result of nucleosome displacement by TALEs (Supplementary Fig. 3). The synergistic gene activation was also conserved when using alternative acidic activation domains (Supplementary Fig. 4). The expression of other genes near IL1RN did not increase, indicating that this large synergistic activation was specific to the target gene (Supplementary Fig. 5).
The TALE-TFs used in this study were not specifically designed to target DNase-hypersensitive regions (Supplementary Fig. 6), in contrast to many other reports of synthetic transcription factors that only target open chromatin. In fact, IL1RN, KLK3, and CEACAM5 are not expressed in HEK293T cells. Interestingly, targeting chromatin inaccessible to DNase did not prevent gene activation by the engineered TALE-TFs. These results suggest that targeting open chromatin may not be a prerequisite for successful TALE-TF engineering and that activation of silenced genes is possible in the absence of chromatin-modifying drugs,18 particularly when using combinations of TALE-TFs. In contrast to these three genes, ERBB2 is moderately expressed in HEK293 cells and the TALE-TFs for ERBB2 regulation were targeted to open chromatin (Supplementary Fig. 6). Combinations of these TALE-TFs also led to synergistic ERBB2 activation, although the effect was smaller than for the other genes because of the higher basal expression of ERBB2.
To comprehensively characterize the effects of combinatorial regulation of mammalian genes by engineered TALE-TFs, all 63 possible combinations of six TALE-TFs were tested for each of three different genes by co-transfection with the corresponding luciferase reporter in HEK293T cells. Various combinations of TALE-TFs could be used to reproducibly achieve tunable levels of gene expression over a large dynamic range. Many TALE-TFs that did not activate the reporter when delivered alone contributed to synergistic activation of expression when combined with other TALE-TFs (Supplementary Table 2). In some cases, the addition of a TALE-TF decreased gene expression. However, for all three genes the average level of gene expression increased with the number of TALE-TFs, and the average contribution of each additional TALE-TF decreased as the number of TALE-TFs increased.
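The count of 63 is the number of non-empty subsets of six items, 2^6 - 1. As a minimal sketch, this enumeration can be written in Python as follows; the labels A to F are hypothetical placeholders, not the names of the TALE-TFs used in the study.

```python
from itertools import combinations

# Hypothetical labels standing in for six TALE-TFs that target one promoter.
TALES = ["A", "B", "C", "D", "E", "F"]


def nonempty_subsets(items):
    """Yield every non-empty subset of items, smallest subsets first."""
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield combo


subsets = list(nonempty_subsets(TALES))

# Count how many combinations contain 1, 2, ..., 6 TALE-TFs.
counts_by_size = {}
for combo in subsets:
    counts_by_size[len(combo)] = counts_by_size.get(len(combo), 0) + 1

print(len(subsets))     # 63
print(counts_by_size)   # {1: 6, 2: 15, 3: 20, 4: 15, 5: 6, 1 + 5: 1}
```

Grouping by subset size mirrors the ordering by number of TALE-TFs and shows that the size classes are unevenly populated (6, 15, 20, 15, 6 and 1 combinations), which matters when averaging expression per number of TALE-TFs.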
Figure 2. Combinatorial regulation of gene expression by TALE-TFs. (a-c) All possible 63 combinations of six TALE-TFs targeting the IL1RN, KLK3, and CEACAM5 genes were tested for activation of a luciferase reporter plasmid and ordered according to number of TALE-TFs.
In order to assign quantitative parameters to the relative contribution of each TALE-TF to the synergistic effect across the 63 data points in these experiments, a polynomial model was applied to the data set of each gene of the form
is the relative luciferase activity for the jth combination of the six TALE-TFs. The value of x_{i,j} is 0 if the ith TALE-TF is not included in the jth combination and 1 if it is included. The effect coefficient w_i is a fit parameter that represents the relative contribution of the ith TALE-TF to the regulation of its target promoter in the context of all combinations of the six TALE-TFs. Multiple regression was used to solve for values of w_i
for all TALEs for each of the three target genes. These coefficients generate an excellent fit of the experimental data and are highly significant in accurately describing the relative contribution of each TALE (Supplementary Table 3). Importantly, the polynomial model describes the data better than the corresponding additive and multiplicative models (Supplementary Fig. 7). This is because the additive model does not account for synergy of TALE-TF activity and the multiplicative model does not account for the diminishing contribution of each additional TALE-TF. The superior fit of the polynomial model relative to the additive model can be mathematically explained by the second-order terms that are the product of effect coefficients for different TALE-TFs. This suggests the presence of some form of cooperativity, but the model cannot reveal the underlying mechanism. As discussed above, the simultaneous binding and stabilization of components of the pre-initiation complex by VP64 is likely to play a role,15-17 as may other secondary effects of VP64-mediated gene activation on local epigenetics and chromatin structure.
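The exact model equation is not reproduced in this text, so the sketch below rests on an assumption: it takes the polynomial model to be the square of a weighted sum of the indicator variables. That form is chosen only because its expansion contains second-order cross terms (products of effect coefficients for different TALE-TFs) and because its square root is linear in the coefficients, so multiple regression applies. It may not be the form the authors used. The coefficient values are made up, and the data are synthetic and noise-free.

```python
import numpy as np
from itertools import product

# Indicator matrix x[j, i]: 1 if TALE-TF i is in combination j, else 0.
# Rows cover all 63 non-empty combinations of six TALE-TFs.
X = np.array([row for row in product([0, 1], repeat=6) if any(row)], dtype=float)

# Made-up effect coefficients, used only to generate synthetic data.
w_true = np.array([0.5, 1.0, 2.0, 0.2, 1.5, 0.8])

# ASSUMED polynomial form (not confirmed by the text):
#   y_j = (sum_i w_i * x_ij) ** 2
y = (X @ w_true) ** 2

# Polynomial fit: sqrt(y) is linear in w, so ordinary least squares
# (multiple regression) can solve for the coefficients.
w_poly, *_ = np.linalg.lstsq(X, np.sqrt(y), rcond=None)
rss_poly = float(np.sum(((X @ w_poly) ** 2 - y) ** 2))

# Additive comparison model: y_j = sum_i a_i * x_ij, fitted directly to y.
a_add, *_ = np.linalg.lstsq(X, y, rcond=None)
rss_add = float(np.sum((X @ a_add - y) ** 2))

print(np.round(w_poly, 3))
print(rss_poly < rss_add)
```

On this synthetic data the additive model leaves a residual because it has no cross terms, which illustrates the argument made above. With measured luciferase data the comparison would also depend on noise, and the square-root transform changes how errors are weighted.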
Previous studies have suggested that TALE-TF activity may correspond to proximity to the transcription start site (TSS)18 or TALE RVD composition.13 In this study, there was no clear correlation of effect coefficient with TALE array length, composition, or distance to TSS that was consistent for all genes (Supplementary Fig. 1). This suggests that these TALE design parameters cannot independently be used to predict highly effective TALE-TFs. It is likely that other biological and structural components of these gene promoters, including genome folding and competition with endogenous regulatory factors, play a dominant role in determining the activity of single TALE-TFs and TALE-TF combinations.
The cooperative activation of gene expression described here presents a unique opportunity to develop tunable transcription networks that operate at different levels as a function of the number and identity of TALE-TFs. This facilitates the control of gene expression levels without the need for small molecules used in conventional chemically regulated systems. Unlike prior work in synthetic biology that has focused on the regulation of transgenes by engineered promoters customized with multiple transcription factor binding sites1-6 or gene repression or silencing,22, 23 the use of TALE-TF combinations that target endogenous promoters begins to recapitulate the complexity of natural systems in a precise and controlled manner. This approach constitutes a powerful experimental system for elucidating fundamental mechanisms of natural gene regulation that are currently poorly understood. The capacity for combinatorial regulation also provides a novel framework for engineering biocomputation systems that control endogenous genes in mammalian cells, similar to recently developed genetic logic gates that control engineered transgenes.2-6
Precise control of gene expression with multiple tunable inputs may lead to greater robustness and predictability in bioengineered systems in the context of cell-machine interfaces and gene- and cell-based therapies. For example, this could include increasing the potency of therapeutic effects of engineered transcription factors.24
In summary, this approach to gene regulation extends the capacity of synthetic biology and biological programming in mammalian systems and provides a new facile technology for regulation of native mammalian genes with widespread potential applications.