Nat Genet. Author manuscript; available in PMC 2017 December 28.

PMCID: PMC5379849

NIHMSID: NIHMS839600

The publisher's final edited version of this article is available at Nat Genet

See the article "Hierarchy within the mammary STAT5-driven *Wap* super-enhancer" in *Nat Genet*, volume 48 on page 904.

See the article "Genetic dissection of the α-globin super-enhancer *in vivo*" in *Nat Genet*, volume 48 on page 895.

The recent back-to-back articles by Hay et al.^{1} and Shin et al.^{2} both addressed the important question of how
the constituent enhancers of a so-called “super-enhancer” combine to
activate the expression of a target gene. Super-enhancers are collections of closely
spaced genomic regions that exhibit hallmarks of enhancers, such as binding by the
Mediator complex and acetylation of histone H3 at lysine 27 (H3K27ac)^{3–5}. As these authors noted, there is continuing controversy
over whether super-enhancers genuinely represent a new paradigm in transcriptional
regulation or whether they may essentially just be clusters of conventional
enhancers that together produce a strong transcriptional response^{6}.

At the heart of this question is whether the activity of a super-enhancer is
simply given by the sum of its constituent enhancers—that is, whether it is
*additive*—or whether these components instead exhibit
some kind of synergy. Indeed, this question of additivity is of general interest,
whether or not super-enhancers are qualitatively distinct from other loci. Hay et
al.^{1} and Shin et
al.^{2} addressed this
question by carefully dissecting the highly expressed
*α*-globin and *Wap* loci, respectively, and
measuring the reductions in gene expression resulting from several individual and
combined knockouts of constituent enhancers. Both articles described highly variable
effects on gene expression from different individual knockout experiments, and both
reported that it was necessary to disable multiple enhancers to abolish, or nearly
abolish, expression. On the question of additivity, however, the two articles
reached strikingly different conclusions: Hay et al. reported that the constituent
enhancers at the *α*-globin locus acted
“independently and in an additive fashion,” whereas Shin et al.
reported that their observations of the *Wap* super-enhancer
supported a “temporal and functional hierarchy” of constituent
enhancers that is presumably non-additive.

It was notable that neither of these articles offered a precise definition
for “additivity” or “hierarchy”. Moreover, neither
article explicitly compared a null hypothesis of additivity against an alternative
hypothesis. In reviewing these two works, we became interested in the various ways
in which a super-enhancer’s activity could plausibly be modeled using a
linear function of the activity of its constituent enhancers, possibly together with
a simple nonlinear “link” function^{7}, and in whether the data would allow a null
hypothesis of such generalized linearity to be formally rejected. Here we show, by
reanalyzing these two data sets, that they are both consistent with a generalized
linear model that has a simple biophysical interpretation and does not require any
hierarchy or synergy among constituent enhancers. Thus, we argue that it still
remains to be demonstrated that a super-enhancer is greater than the sum of its
parts.

Perhaps the simplest linear model would assume each constituent enhancer
makes an additive contribution directly to the expression level of the target gene,
such as might be the case if the constituent enhancers separately contribute to
transcription. (This appears to be the model that Hay et al.^{1} had in mind.) Specifically, let us define the
“activity” of the super-enhancer by the affine (linear plus
constant) function,

$$A(\mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n, \tag{1}$$

where $\mathbf{x}=(x_1,\dots,x_n)'$ is a vector of binary variables indicating whether
each constituent enhancer $i$ is present ($x_i = 1$) or absent ($x_i = 0$). In the additive
model, the expression level $R(\mathbf{x})$ of the target gene is this activity plus a noise term $\epsilon$:

$$R(\mathbf{x}) = A(\mathbf{x}) + \epsilon \qquad (\textbf{additive model}) \tag{2}$$

In practice, we consider alternative noise models and find that a log-normal model fits the data best for all of the models that we consider (see Supplementary Note for details).
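As an illustration of what additivity implies for knockout data, the following minimal Python sketch evaluates the activity function of equation 1 for a hypothetical three-enhancer locus. The function name `activity` and the coefficient values are assumptions made for this example and are not fitted values from either data set; the noise term is ignored.

```python
from typing import Sequence


def activity(x: Sequence[int], beta0: float, betas: Sequence[float]) -> float:
    """Affine activity A(x) = beta0 + sum_i beta_i * x_i (equation 1)."""
    if len(x) != len(betas):
        raise ValueError("x and betas must have the same length")
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


# Hypothetical coefficients for a three-enhancer locus (not fitted values).
BETA0 = 0.25
BETAS = (0.5, 1.0, 2.0)

# x_i = 1 means enhancer i is present, x_i = 0 means it is knocked out.
wild_type = activity((1, 1, 1), BETA0, BETAS)
ko_1 = activity((0, 1, 1), BETA0, BETAS)
ko_2 = activity((1, 0, 1), BETA0, BETAS)
ko_1_2 = activity((0, 0, 1), BETA0, BETAS)

# Under the additive model, the expected loss from a double knockout
# equals the sum of the expected losses from the two single knockouts.
loss_double = wild_type - ko_1_2
loss_singles = (wild_type - ko_1) + (wild_type - ko_2)
print(wild_type, ko_1, ko_2, ko_1_2)
print(loss_double, loss_singles)
```

The equality of the two printed losses is the signature of additivity on the expression scale; the models introduced next relax it by passing the same activity through a nonlinear function.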

Another plausible scenario is that the constituent enhancers combine
multiplicatively, rather than additively, in determining
*R*(**x**). This multiplicative
relationship might be expected, say, if the constituent enhancers act to promote
transcription in a sequential manner, with each step having the opportunity to
amplify or dampen the outputs of previous steps. This relationship can be captured
simply by making *R*(**x**) an exponential,
rather than an additive, function of the activity
*A*(**x**). Because the scale of
*A*(**x**) is determined by free
parameters, the base associated with the exponent is unimportant. By convention, we
use base *e* and write,

$$R(\mathbf{x}) = e^{A(\mathbf{x})} + \epsilon \qquad (\textbf{linear-exponential model}) \tag{3}$$
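To make the multiplicative reading of equation 3 explicit, the exponent can be expanded using the definition of $A(\mathbf{x})$ in equation 1. This is a restatement rather than an additional assumption:

$$e^{A(\mathbf{x})} = e^{\beta_0}\prod_{i=1}^{n} e^{\beta_i x_i}$$

Because each $x_i$ is a binary indicator (taken here to be 1 when the enhancer is present and 0 when it is absent), the presence of enhancer $i$ scales the expected expression by a factor of $e^{\beta_i}$, and its absence leaves it unchanged.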

Equation 3 can be considered a generalized linear model with inverse link function
$e^{x}$ (in the language of GLMs, a log link function).

Notably, this model can be given an alternative biophysical interpretation.
Let us assume a physical system with two broadly defined “states,” a
low-energy state associated with active transcription and a higher-energy baseline
state. (The model is abstract: in reality, these “states” may each
correspond to large ensembles of particular configurations of molecules.)
Furthermore, let us interpret *A*(**x**) as
a measure of the reduction in energy of the transcription-associated state relative
to the baseline state. Statistical mechanics tells us that the occupancy of the
low-energy state should be given by a Boltzmann distribution and be proportional to
$e^{A(\mathbf{x})}/Z$, where
*Z* is the partition function. If we further assume that the
system is far from its optimum, then the occupancy of the low-energy state will be
approximately proportional to $e^{A(\mathbf{x})}$. Equation 3 can therefore be interpreted
as the model that results from assuming transcription is proportional to occupancy
of the low-energy state in this suboptimal regime.

This physical interpretation, with a two-state system and a linear energy
function, leads naturally to a third generalized linear model. In this case, we
abandon the “suboptimal” approximation and consider the full
Boltzmann distribution for the system^{8,9}. In the two-state
model, we can explicitly calculate the partition function *Z* and
write, $e^{A(\mathbf{x})}/Z = e^{A(\mathbf{x})}/\left(1+e^{A(\mathbf{x})}\right) = 1/\left(1+e^{-A(\mathbf{x})}\right)$, which is known as a logistic function of
*A*(**x**). Thus, we can fully describe
the fraction of time the low-energy state is occupied using a generalized linear
model with the logistic function as the inverse link function. Assuming again that
gene expression is proportional to the occupancy of the low-energy state, we write,

$$R(\mathbf{x}) = \frac{\gamma}{1+e^{-A(\mathbf{x})}} + \epsilon \qquad (\textbf{linear-logistic model}) \tag{4}$$

where *γ* defines the maximum
expected level of gene expression (for a similar model applied to enhancers, see
reference [^{10}]).
Equation 4 will behave similarly to equation 3 when
*A*(**x**) is far from its optimum, but
it will capture the phenomenon of diminishing returns in transcriptional output as
the energetics of productive transcriptional elongation approach an optimum and gene
expression is limited by other features of the system (saturation).
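The relationship between equations 3 and 4 can be checked numerically. The sketch below is illustrative only: the function names and the activity values are assumptions chosen to span both regimes, *γ* is set to 1 by default, and the noise term is ignored.

```python
import math


def linear_exponential(a: float) -> float:
    """Expected expression under equation 3, ignoring the noise term."""
    return math.exp(a)


def linear_logistic(a: float, gamma: float = 1.0) -> float:
    """Expected expression under equation 4, ignoring the noise term."""
    return gamma / (1.0 + math.exp(-a))


# Hypothetical activity values, chosen only to span both regimes.
for a in (-8.0, -4.0, 0.0, 4.0, 8.0):
    exp_val = linear_exponential(a)
    log_val = linear_logistic(a)
    print(f"A={a:+.1f}  exponential={exp_val:.5g}  logistic={log_val:.5g}")
```

For strongly negative activity the two functions nearly coincide, whereas for large positive activity the exponential keeps growing while the logistic levels off at *γ*, which is the saturation behavior described above.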

We fitted these three models (equations 2–4) to
the raw data from Hay et al.^{1} and
Shin et al.^{2} by maximum likelihood
using a numerical algorithm for optimization. The data consisted of all replicates
for each tested configuration (wild type and knockout) of the three constituent
enhancers of the *Wap* super-enhancer and the five constituent
enhancers of the *α*-globin super-enhancer (see Supplementary Note for
complete details). We compared the goodness-of-fit of the models using the Bayesian
Information Criterion (BIC), which penalizes more complex models for their
additional parameters. (Here, the linear-logistic model has one additional
parameter, *γ*.)
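The model comparison can be sketched in a few lines of Python. This is not the authors' code: the log-normal parameterization (log expression normally distributed around the log of the model prediction, with a shared standard deviation `sigma`), the function names, the parameter counts, and the log-likelihood values are all assumptions made for illustration. In a real analysis the log-likelihoods would come from numerically maximizing `lognormal_loglik` over the model parameters.

```python
import math
from typing import Sequence


def lognormal_loglik(observed: Sequence[float],
                     predicted: Sequence[float],
                     sigma: float) -> float:
    """Log-likelihood assuming log(R) ~ Normal(log(prediction), sigma^2)."""
    total = 0.0
    for r, m in zip(observed, predicted):
        z = (math.log(r) - math.log(m)) / sigma
        total += -math.log(r * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z
    return total


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L); lower is better."""
    return n_params * math.log(n_obs) - 2.0 * loglik


# Hypothetical maximized log-likelihoods for 20 observations (placeholders).
N_OBS = 20
bic_additive = bic(loglik=-30.0, n_params=5, n_obs=N_OBS)
bic_logistic = bic(loglik=-25.0, n_params=6, n_obs=N_OBS)  # one extra: gamma

# A positive difference means the logistic model is preferred despite
# its additional parameter.
print(bic_additive - bic_logistic)
```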

For the *α*-globin data set^{1}, for which the authors claimed additivity, we
found that the additive model did indeed fit the data fairly well (Figure 1A). Nevertheless, the linear-logistic model was
preferred over the additive model according to the BIC, despite its additional
parameter. For the *Wap* data set^{2}, the linear-logistic model was the best-fitting model by a
substantial margin. Thus, for both of these data sets, the linear-logistic model
explains the observed data better than any other generalized linear model (Figure 1B&C), and therefore is a better
null model than the additive model. Notably, the linear-logistic model explains both
data sets well despite several important differences between the two loci (e.g., the
*Wap* component enhancers are substantially more tightly
clustered and closer to the TSS than those for *α*-globin)
and between the knock-out strategies used (Shin et al. deleted STAT5-binding sites
whereas Hay et al. deleted larger DNase-I hypersensitive regions), which underscores
the flexibility and generality of this simple model.

**Figure 1.** (A) Model fit for the *α*-globin (blue) and
*Wap* (green) data sets, measured as the Bayesian Information
Criterion (BIC) for the additive model minus the BIC for the additive (0 by
definition), linear-exponential, and linear-logistic models. (B) **...**

But do the data of Shin et al.^{2} for the *Wap* super-enhancer truly support
something more complex than a generalized linear relationship, as the authors seem
to claim? We attempted to address this question quantitatively in our framework by
introducing interaction terms for the two pairs of constituent enhancers that were
simultaneously knocked out in that study (ΔE1a/ΔE2 &
ΔE2/ΔE3). We found that models allowing for interactions between
constituent enhancers do have slightly higher likelihoods than the simple
linear-logistic model, as they must, but, according to the BIC, these improvements
are not sufficient to justify the use of an additional parameter (Figure 1D; see Supplementary Note for details). Thus, we find not only that
the linear-logistic model fits both the *α*-globin and
*Wap* data sets reasonably well, but also that this model cannot
confidently be rejected in favor of one that allows for interactions between
constituent enhancers.
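One way to write such an interaction model is sketched below, assuming a standard pairwise product term for a pair of enhancers $j$ and $k$. The symbol $\beta_{jk}$ is introduced here for illustration, and the parameterization actually used is described in the Supplementary Note:

$$A_{\text{int}}(\mathbf{x}) = \beta_0 + \sum_{i=1}^{n}\beta_i x_i + \beta_{jk}\,x_j x_k$$

With $N$ observations, the single extra parameter changes the BIC by

$$\mathrm{BIC}_{\text{int}} - \mathrm{BIC}_{\text{base}} = \ln N - 2\left(\ln \hat{L}_{\text{int}} - \ln \hat{L}_{\text{base}}\right),$$

so the interaction model is preferred only if twice the gain in maximized log-likelihood exceeds $\ln N$.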

It is possible, of course, that interactions between component enhancers do
occur in reality, but the data collected so far are insufficiently abundant or
precise to reject a generalized linear null model. In addition, our models are
limited in that they address only the knockout data from these studies. In
particular, our models do not address Shin et al.’s observation that the E1
enhancer is occupied by key transcription factors first during pregnancy, suggesting
possible non-additivity in temporal establishment of the *Wap*
super-enhancer, if not in its subsequent regulatory behavior. Finally, it is worth
emphasizing that our abstract modeling approach provides no direct mechanistic
insights into transcriptional regulation at either of these loci. Nevertheless, we
have shown that the observed knockout data for both of these super-enhancers can be
explained fairly well by a very simple generalized linear model, and this
observation can at least constrain the family of possible mechanistic models. More
broadly, we argue that the transcription field would benefit from clearer
definitions of null models and more rigorous criteria for rejecting them before
concluding that complex behaviors occur.

We thank the authors of references [^{1}] and [^{2}] for providing their raw data. We thank Barak Cohen,
Charles Danko, and John Lis for comments on the manuscript, and Justin Kinney for
useful discussions about biophysical models. This research was supported in part by
US National Institutes of Health grants GM102192 and HG007070. The content is solely
the responsibility of the authors and does not necessarily represent the official
views of the US National Institutes of Health.

**Note:** The computer code developed for this analysis is available on
GitHub (https://github.com/CshlSiepelLab/super-enhancer-code).

**Author Contributions.** N.D., Y.F.H., and B.G. contributed equally to
this work. N.D., Y.F.H., and B.G. designed the experiments and analyzed the
data. N.D., Y.F.H., B.G., and A.S. wrote the manuscript.

**Competing Financial Interests.** The authors declare no competing
financial interests.

1. Hay D, et al. Genetic dissection of the *α*-globin super-enhancer *in vivo*. Nat Genet. 2016;48:895–903.

2. Shin HY, et al. Hierarchy within the mammary STAT5-driven *Wap* super-enhancer. Nat Genet. 2016;48:904–911.

3. Hnisz D, et al. Super-enhancers in the control of cell identity and disease. Cell. 2013;155:934–947.

4. Whyte WA, et al. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013;153:307–319.

5. Heinz S, Romanoski CE, Benner C, Glass CK. The selection and function of cell type-specific enhancers. Nat Rev Mol Cell Biol. 2015;16:144–154.

6. Pott S, Lieb JD. What are super-enhancers? Nat Genet. 2015;47:8–12.

7. Nelder JA, Wedderburn RWM. Generalized linear models. Journal of the Royal Statistical Society Series A (General) 1972;135:370–384.

8. Lassig M. From biophysics to evolutionary genetics: statistical aspects of gene regulation. BMC Bioinformatics. 2007;8(Suppl 6):S7.

9. Phillips R. Napoleon is in equilibrium. Annu Rev Condens Matter Phys. 2015;6:85–111.

10. Crocker J, Ilsley GR, Stern DL. Quantitatively predictable control of Drosophila transcriptional enhancers in vivo with engineered transcription factors. Nat Genet. 2016;48:292–298.
