To assess the relative contribution of the paused state to early human development, we used a recently established system for directing the differentiation of human embryonic stem cells to early mesoderm [20]. Briefly, Activin A and BMP4 were used to direct pluripotent H7 human embryonic stem cells to a mesoderm (primitive-streak-like) population within 48 hours of growth factor addition. This differentiation protocol provides two distinct stages of early embryonic differentiation (pluripotent and mesoderm) [20], separated by a short enough time to allow us to directly observe the transcriptional activation or silencing of developmentally regulated loci.
We used the histone H3-lysine-4 trimethyl modification (H3K4me3) as a chromatin marker of transcriptional initiation [3]. Chromatin immunoprecipitates containing H3K4me3 were used to probe promoter arrays to globally define 5′ initiation. Gene expression microarrays were used to measure 3′ mRNA transcript, a result of productive transcriptional elongation. Using these measurements, we classified genes into distinct transcriptional states: active (both initiating and elongating transcription); paused (transcriptionally initiating but not elongating); and silent (not transcriptionally initiating). To provide an end-point for the differentiation process, the mesoderm derivatives were further matured to beating cardiac myocytes, and full-length transcription was determined in this definitive stage.
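The three-state scheme can be written as a small decision rule. The following is a minimal sketch, not the analysis code used in this study: it assumes each locus has already been reduced to two yes/no calls (H3K4me3 enrichment at the promoter and detection of 3′ transcript), and the names `State` and `classify` are hypothetical. How a locus with detectable transcript but no H3K4me3 should be handled is not specified by the definitions above, so the sketch leaves it unclassified.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    ACTIVE = "active"   # initiating and elongating
    PAUSED = "paused"   # initiating but not elongating
    SILENT = "silent"   # not initiating


def classify(h3k4me3_enriched: bool, transcript_detected: bool) -> Optional[State]:
    """Assign a transcriptional state from two boolean calls.

    The boolean inputs are assumed to come from upstream thresholding of
    the promoter-array and expression-array signals (thresholds not shown).
    """
    if h3k4me3_enriched and transcript_detected:
        return State.ACTIVE
    if h3k4me3_enriched:
        return State.PAUSED
    if not transcript_detected:
        return State.SILENT
    # Transcript without H3K4me3: outside the three definitions (assumption).
    return None
```

For example, a locus carrying H3K4me3 but lacking 3′ transcript is returned as `State.PAUSED`.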
We first focused on three developmentally regulated loci with known functions in lineage commitment to confirm that they behaved as expected in our differentiation system: BRACHYURY T (a transcription factor required for mesoderm differentiation) [24], NKX-2.5 (a transcription factor critical for cardiomyocyte differentiation) [25] and NEUROD1 (a transcription factor required for neuronal cell differentiation) [26]. In the pluripotent state the BRACHYURY T locus contained extensive H3K4me3 but produced minimal full-length transcript, indicating the locus was paused in embryonic stem cells. After differentiation to mesoderm, BRACHYURY T retained H3K4me3, and increased levels of full-length transcript were detected, indicating a switch to active transcription. The cardiac transcription factor NKX-2.5 was associated with H3K4me3 but did not produce significant full-length transcript in either pluripotent or mesodermal populations, indicating it was transcriptionally paused in both stages. In contrast, NKX-2.5 showed high levels of 3′ transcript in definitive cardiomyocytes at two weeks. The NEUROD1 locus did not produce significant amounts of full-length transcript in any of the populations observed. The NEUROD1 locus was associated with H3K4me3 in embryonic stem cells and showed a marked loss of H3K4me3 in mesoderm. This indicates that NEUROD1 transitions from being paused in embryonic stem cells to silent in mesodermal cells. This transcriptional silencing of NEUROD1 is consistent with lineage commitment away from ectodermal derivatives, like neurons, during the induction of mesoderm. Suppression of ectoderm is also supported by the absence of 3′ transcript for the ectodermal transcription factor SOX1 during our differentiation protocol (data not shown).
Having verified that genes with well-established biological functions behave as expected in our system, we next computationally sorted all detectable protein-coding loci in the human genome into one of the three transcriptional states (active, paused or silent) based on their association with H3K4me3 and full-length transcript abundance. We then determined how many loci changed between these three states during differentiation from embryonic stem cells to mesoderm. Interestingly, the overall distribution of the 12,867 analyzed protein-coding loci among active (47% in pluripotency, 47.7% in mesoderm), paused (48.1% in pluripotency, 47.4% in mesoderm) and silent (4.8% in pluripotency, 4.9% in mesoderm) states did not change significantly during the transition from embryonic stem cells to mesoderm. There were many offsetting changes, however, with 1526 loci (11.9%) changing state. Of those loci that changed transcriptional state, most were either “priming” from silent to paused (30.4% of the changing loci) or “archiving” from paused to silent (31.4% of the changing loci).
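The bookkeeping behind these counts can be sketched as a tally over per-locus state pairs. This is an illustrative sketch under stated assumptions, not the pipeline used here: states are plain strings, the function name `tally_transitions` and the toy locus labels are hypothetical, and only the two transition names defined in the text (“priming” and “archiving”) are given labels.

```python
from collections import Counter

# Transition names taken from the text; all other changes are labeled "from->to".
NAMED = {
    ("silent", "paused"): "priming",
    ("paused", "silent"): "archiving",
}


def tally_transitions(states):
    """Count state changes between two stages.

    states: dict mapping a locus label to a (pluripotent, mesoderm) state pair.
    Loci whose state is unchanged are ignored.
    """
    counts = Counter()
    for before, after in states.values():
        if before != after:
            counts[NAMED.get((before, after), f"{before}->{after}")] += 1
    return counts


# Toy input with hypothetical locus labels (not data from this study).
toy = {
    "locus_a": ("silent", "paused"),
    "locus_b": ("paused", "silent"),
    "locus_c": ("paused", "active"),
    "locus_d": ("active", "active"),
    "locus_e": ("paused", "silent"),
}
toy_counts = tally_transitions(toy)
```

Dividing each count by the total number of changing loci gives percentages of the kind reported above.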
We next sought to determine whether genes changing expression transition directly between the active and silent states, or instead pass through a paused intermediate. Strikingly, we found that 98.0 to 98.9% of genes changing expression transition into or out of a paused intermediate. These data indicate that the paused transcriptional state is a crucial control waypoint for developmentally regulated loci, and that initiation and elongation are distinctly regulated in hESC differentiation.
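One way such a fraction could be computed is sketched below. It rests on an interpretive assumption: that “changing expression” means entering or leaving the active state, so that the quantity of interest is the share of newly active loci that came from paused (rather than silent) and the share of newly inactive loci that went to paused (rather than silent). The function name `paused_share` and the toy input are hypothetical.

```python
def paused_share(transitions):
    """Share of expression changes that pass through the paused state.

    transitions: iterable of (before, after) state strings, one pair per locus.
    Returns the fraction of loci gaining expression that were previously
    paused, and the fraction of loci losing expression that became paused.
    """
    pairs = list(transitions)
    gained_from = [b for b, a in pairs if a == "active" and b != "active"]
    lost_to = [a for b, a in pairs if b == "active" and a != "active"]

    def share(states):
        # None when no locus falls in the category, to avoid dividing by zero.
        return sum(s == "paused" for s in states) / len(states) if states else None

    return {
        "gained_from_paused": share(gained_from),
        "lost_to_paused": share(lost_to),
    }


# Toy input (not data from this study).
toy_pairs = (
    [("paused", "active")] * 3
    + [("silent", "active")]
    + [("active", "paused")] * 2
    + [("paused", "silent")]
)
toy_shares = paused_share(toy_pairs)
```

Loci that move between paused and silent do not change expression under this reading and are therefore excluded from both fractions.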
To better understand the physiological significance of these transcriptional changes from embryonic stem cells to mesoderm, the Gene Ontology database [27] was used to categorize protein-coding loci by annotated function. Compared to the set of all genes, loci involved in multicellular organismal development had a much higher fraction that were transcriptionally paused in embryonic stem cells, with a portion of these either proceeding to active transcription or being archived to a silent state during differentiation. Functional categories described as having housekeeping functions, such as translational elongation or components of the ATP-generating proton pump, had a much greater fraction of active loci in pluripotent cells, and few of these loci changed state during differentiation. Loci annotated for functions in later mesodermal derivatives, like regulation of heart contraction, had a high percentage that were transcriptionally paused during pluripotency and were transcriptionally activated in mesoderm. Conversely, loci annotated for functionality in ectodermal derivatives, such as neurotransmitter receptor activity or keratinization, tended to start as paused in pluripotent cells and were then archived to a silent state in mesoderm. This is consistent with ectodermal derivatives being strongly suppressed in our directed differentiation system. Significantly, ontologies with developmentally relevant functions were more likely to contain paused genes that became either active or silent during differentiation.
We hypothesized that, by focusing on genes that were transcriptionally paused in embryonic stem cells and changed state during commitment to mesoderm, we could predict the later cardiomyocyte fate of the population (Fig S2). Consistent with the future fate of the mesodermal cell population, genes annotated for ectodermal ontologies of neurotransmitter receptor activity, keratinization and brain development have high percentages that lose initiation and are thus archived away. Conversely, the loci of genes in the cardiovascular mesoderm ontologies of heart development, blood vessel development, heart looping and regulation of heart contraction have high percentages that proceed to full-length transcription. By observing how loci exit from paused transcription early in differentiation, we get a strong prediction of future cell fate commitment, one that would not be available through conventional array analysis alone.