The investigation of metabolic regulation at the transcriptional level presents different challenges than those encountered in the study of other important problems like development or cancer. Levels of key components like glucose, insulin and lipids can be modulated, but rarely change in an all-or-none fashion, necessitating quantitative techniques that can be applied to multiple tissues and systems. This review examines recent advances in methods for studying transcriptional regulation, with special emphasis on metabolic science. We compare these methods for investigators trying to decide on the best approach for their particular physiological paradigm or model system.
The transcriptional components that drive the development and function of metabolically active cell types are often themselves controlled at the level of gene expression. Identifying such transcription factors (“TFs”) that are regulated in a spatial and temporal manner represents an important approach. However, discovering these factors has remained challenging to investigators of energy metabolism since relatively small changes in TF gene expression can have significant biological effects. Below we discuss classic and more modern expression profiling methods, highlighting those quantitative methods we believe are most appropriate to facilitate the discovery of differentially expressed transcriptional regulatory proteins involved in metabolic programs.
Differentially expressed transcriptional components can be identified using high-throughput expression methods that elucidate cellular mRNA profiles. Historically, these have included subtractive hybridization techniques, such as those employed in the discovery of the myogenic transcription factors MyoD and Pax7 (Davis et al., 1987; Seale et al., 2000), and microarray technology, which has been the most commonly used approach of the last decade. Microarray analyses have been successful in uncovering many novel transcriptional regulators of metabolism, including factors involved in the development and function of the endocrine pancreas and adipose tissue (Chen et al., 2005; Gunton et al., 2005; Smith et al., 2010; Soyer et al., 2010). There are, however, significant limitations to the microarray approach. Perhaps the most important is the limited sensitivity to detect signals accurately when expression levels are low; since transcriptional components can be expressed at low levels and still exert important actions, this is a serious concern. High background levels, due to non-specific binding to hybridization probes, as well as the tendency for saturation of signals, create a relatively small dynamic range for quantitative analysis of gene expression (Okoniewski and Miller, 2006). Thus, identifying those critical regulators expressed only at low levels and/or those important factors whose expression changes only modestly can be challenging with this technology.
The introduction of high-throughput “next-generation” sequencing technologies over the past few years has begun to revolutionize gene expression analyses. “RNA-Seq” is a recently developed approach that utilizes deep-sequencing technology for complete transcriptome profiling. In general, this approach involves the conversion of RNA into cDNA fragments containing adaptors that allow for sequencing. RNA-Seq is proving to be a highly sensitive and quantitative method for expression analysis (Wang et al., 2009). Importantly, this method is unbiased in its ability to quantify all isoforms and transcripts for a given mRNA, both known and unknown (Ozsolak and Milos, 2011). In the near future, this method has the potential to replace all current genome-wide expression profiling techniques.
The sequencing and annotation of whole mammalian genomes have allowed for more focused analyses of gene regulation. Direct analysis of transcriptional components offers significant advantages over whole transcriptome profiling for identifying transcriptional components on the basis of differential expression (Table 1). In particular, direct profiling eliminates the need to utilize bioinformatic tools to filter through large microarray or deep-sequencing datasets to identify potential transcriptional components. Transcriptional cascades involving members of the nuclear hormone receptor family were elucidated through quantitative PCR analysis of nuclear receptor gene expression across multiple murine tissues (Bookout et al., 2006; Gofflot et al., 2007). Gray et al. compiled a catalog of murine transcriptional components that includes all known transcription factors and all proteins that contain a motif that has been associated with transcriptional components, whether their function was known or not (Gray et al., 2004). This catalog appears to be rather comprehensive, containing both known and suspected transcriptional regulators. In situ hybridization probes generated with primers designed to amplify this complete list of predicted transcriptional regulators have been used to derive a relatively complete atlas of transcription factor gene expression in the murine brain and developing pancreas; this has resulted in the identification of novel regulators of glial and pancreatic endocrine development (Fu et al., 2009; Zhou et al., 2007). The validated primers used to amplify these genes have also been utilized for genome-wide RT-PCR analysis of transcriptional components in other developing murine tissues, leading to the discovery of novel regulators of gastrointestinal and brown adipose tissue development (Choi et al., 2006; Seale et al., 2007).
We have recently modified and improved the methodology for direct expression analysis of transcriptional components by building a high-throughput platform for quantitative real-time PCR (Gupta et al., 2010). This was achieved by redesigning and synthesizing new primers to the list of transcriptional components described by Gray and colleagues, to render them suitable for real-time PCR analysis. Importantly, we included primers that amplify transcriptional coregulators; this provides a method to identify both DNA-binding transcription factors and non-DNA-binding proteins that assemble as complexes with TFs and modulate transcriptional activity. The conversion of the existing Gray et al. platform to a quantitative PCR screen is significant since, as described above, quantitative biological traits may be driven by small differences in transcription factor gene expression. It is especially relevant for metabolic pathways, since these are often under quantitative rather than qualitative control.
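To make concrete why a quantitative readout matters for small expression differences, the following minimal sketch converts real-time PCR threshold cycles (Ct) into a relative fold change using the widely used comparative Ct (2^-ΔΔCt) calculation. This is an illustration only: the text does not state which quantification scheme was used on the platform, the function name and Ct values are hypothetical, and the calculation assumes roughly 100% amplification efficiency and a stable reference gene.

```python
def fold_change(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the comparative Ct method.

    Assumes (not stated in the text) ~100% PCR efficiency, i.e. the product
    doubles every cycle, and a reference gene that is unchanged between conditions.
    """
    # Normalize the target to the reference gene within each condition
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized values between conditions
    delta_delta = delta_sample - delta_control
    # A lower Ct means more template, hence the negative exponent
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold one cycle earlier in the sample
two_fold = fold_change(24.0, 18.0, 25.0, 18.0)
# Hypothetical Ct values: a half-cycle shift corresponds to a ~1.4-fold change
modest = fold_change(24.5, 18.0, 25.0, 18.0)
```

Under these assumptions a half-cycle shift in Ct already corresponds to a roughly 1.4-fold change in transcript abundance, which is the scale of difference that the text argues can be biologically meaningful for transcription factors.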
This high-throughput real-time PCR platform for direct quantitative expression analysis of murine transcriptional components, termed Quanttrx, has been used to uncover novel transcriptional networks in energy metabolism. For example, it was used to explore the transcriptional basis of preadipocyte commitment, identifying the C2H2 zinc finger protein Zfp423 as a preadipocyte-enriched factor required for the regulation of PPARγ and adipocyte development (Gupta et al., 2010). Quanttrx also aided in the discovery of several novel transcriptional pathways controlling the physiological response of murine muscle to exercise. This led to the identification of a PGC-1α/HIF-2α regulatory cascade that mediates the PGC-1α-dependent fiber-type switch in skeletal muscle (Rasbach et al., 2010) as well as the identification of C/EBPβ as a repressor of cardiomyocyte hypertrophy and proliferation (Bostrom et al., 2010).
There are important limitations to all of the approaches described above (Table 1). First and foremost, the use of any form of expression profiling to identify transcriptional regulators relies on the assumption that such regulators are themselves controlled at the level of gene expression. Importantly, microarrays, in situ hybridization, and PCR are all hybridization-based techniques. Thus, non-specific binding or cross-reactivity can create both false-positive and false-negative results. Many genes encode multiple isoforms of gene products, many of which have yet to be identified. It is also likely that neither our current database of transcriptional components nor the transcriptome coverage of existing arrays is truly complete. RNA-seq can overcome these limitations; however, before other methods become obsolete, the costs of individual experiments and equipment required to execute this technology will likely have to be reduced. Importantly, the expertise to carry out the analysis of such large datasets will also need to become more widespread.
In the future, gene expression profiling techniques will continue to rapidly evolve. One interesting prospect for direct profiling of transcription factors is the conversion of Quanttrx to Nanostring. This technology allows for the direct quantitative measurement of mRNA levels without enzymatic reaction but at the sensitivity level of quantitative PCR (Geiss et al., 2008). Importantly, the Nanostring system can measure up to 800 transcripts at one time, significantly reducing the time needed for experiments.
We have described methods for identifying transcription factors and coregulators via the direct measurement of their mRNA levels. There are instances, however, in which the activity of a transcription factor changes without being reflected in the expression level of the factor itself. For example, many transcription factors exist in the cell in a `poised' state, ready to be activated by environmental changes. NFκB, SREBP2, and FoxO factors are examples of transcription factors with metabolic actions that shuttle between the nucleus and the cytoplasm depending upon nutritional and/or inflammatory signals (Goldstein et al., 2006; Van Der Heide et al., 2004). For such factors, determination of mRNA expression alone would not yield information about involvement in a particular pathway.
This problem can be circumvented by studying the DNA itself, with an eye toward finding regions that mediate key regulatory events. Once an important regulatory region has been identified, one can often infer transcriptional pathways by motif analysis. Alternatively, the sequence can serve as bait in a protein-binding assay. This is not a new idea, of course. For decades, “promoter bashing” techniques have been used to locate regulatory regions. Indeed, functional analysis of the Fabp4 (also called aP2) promoter led to the identification of the first fat-selective enhancer (Graves et al., 1992) and ultimately to the identification of PPARγ as a master regulator of fat cell development (Tontonoz et al., 1994).
There are two major problems with these approaches, as they have been traditionally applied. The first is that it is necessary to predefine regions to focus on, such as promoters. These regions may be enriched for certain types of cis-elements, but other regions of potential importance are ignored a priori. The second is that these methods are very time- and labor-intensive. Fortunately, recent advances now allow researchers to query the entire genome quickly as they search for important regulatory sequences.
New techniques for identifying regulatory sequences in an unbiased fashion include computational approaches, as well as experimental strategies such as DNase hypersensitivity and location analysis of modified histones or transcription factors using chromatin immunoprecipitation. These can be followed either by hybridization to an array (ChIP-chip) or more commonly, massively parallel sequencing (ChIP-Seq).
A large number of algorithms have been developed to identify potential transcription factor binding sites in the genome from sequence information. These methods either search for all examples of known motifs corresponding to specific cognate transcription factors, or they identify overrepresented motifs that may not yet be associated with a specific binding partner. The latter utilizes Hidden Markov Models, expectation-maximization, Gibbs sampling, or other approaches (Das and Dai, 2007; Elnitski et al., 2006). All of these strategies can be helpful, but in general they suffer from a lack of specificity. As one example, we recently identified approximately 40,000 binding sites for PPARγ in human adipocytes, compared to roughly 1.5 million `good' matches to the PPARγ motif in the human genome*; others have reported similar results (Lefterova et al., 2008; Mikkelsen et al., 2010; Nielsen et al., 2008). Although some of the excess binding sites are likely to be utilized by PPARγ in other cell types or physiological contexts (Lefterova et al., 2010), it is clear that the identity of a true binding site is defined by more than just the primary motif sequence. Furthermore, many motifs are shared by multiple transcription factors; again, nuclear hormone receptors provide a good example, as the PPARs, HNF-4s and RARs bind to very similar DR-1 motifs.
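The known-motif search described above, and the footnoted definition of a `good' match, can be sketched as a log-likelihood ratio scan: each window is scored by how much more likely it is under a position weight matrix than under random bases. The sketch below makes several assumptions that are not in the text: the 4-bp matrix is a made-up toy (it is not the 16-bp DR1 matrix), the background is uniform, only the forward strand is scanned, the input is uppercase A/C/G/T, and a raw score cutoff stands in for the p-value threshold.

```python
import math

BACKGROUND = 0.25  # assumed uniform base frequencies ("totally random selection of bases")

# Hypothetical 4-bp position weight matrix, for illustration only (not the DR1 matrix)
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def log_odds(window, pwm):
    """Log2 ratio of P(window | motif model) to P(window | random background)."""
    return sum(math.log2(column[base] / BACKGROUND)
               for column, base in zip(pwm, window))


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every forward-strand window at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = log_odds(window, pwm)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Toy sequence containing one strong match (AGGT) at position 2
hits = scan("CCAGGTCC", TOY_PWM, threshold=4.0)
```

Because the scan uses sequence alone, every window that clears the cutoff is reported regardless of chromatin context, which is one way to see why motif matches so greatly outnumber experimentally observed binding sites.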
A long-held tenet of genome biology is that functionally important regions will be conserved, as mutational drift would presumably be deleterious to fitness. This is clearly true for exons and some other genomic elements, but not as clear for enhancers and other regulatory elements. For example, only 15–30% of modified histone marks were shared between developing murine and human adipocytes (Mikkelsen et al., 2010). This is also true for specific transcription factors: ~80% of PPARγ binding sites in mature murine 3T3-L1 adipocytes are not found at the orthologous position in human adipocytes. This is true of many other factors as well: FoxA2, C/EBPα, and HNF4α all show a high degree of species-specificity in binding (Schmidt et al., 2010; Schmidt et al., 2011; Soccio et al., 2011). Interestingly, while particular binding sites are generally not well conserved, there is excellent conservation of the genes regulated by any given TF in different species. Thus, lipid handling genes virtually all have PPARγ binding sites in human and murine adipocytes, though PPARγ binds to different places in each species (Mikkelsen et al., 2010). This does not mean that conservation is not useful in determining sites of potential TF interaction; in combination with other parameters (e.g. the presence of modified histones or open chromatin) it can be helpful (see below).
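The distinction drawn above between conservation of individual binding sites and conservation of target genes can be illustrated with a small sketch. Everything here is hypothetical: the intervals, the gene names, and the assumption that the sites from one species have already been mapped onto the other species' coordinates (a step that in practice requires a whole-genome alignment tool, which is not shown).

```python
def overlaps(site_a, site_b):
    """True if two (chrom, start, end, gene) half-open intervals share at least one base."""
    return (site_a[0] == site_b[0]
            and site_a[1] < site_b[2]
            and site_b[1] < site_a[2])


def fraction_sites_shared(sites_a, sites_b):
    """Fraction of sites in A that overlap any site in B at the orthologous position."""
    if not sites_a:
        return 0.0
    shared = sum(1 for a in sites_a if any(overlaps(a, b) for b in sites_b))
    return shared / len(sites_a)


def fraction_genes_shared(sites_a, sites_b):
    """Fraction of genes with a site in A that also have a site in B."""
    genes_a = {site[3] for site in sites_a}
    genes_b = {site[3] for site in sites_b}
    if not genes_a:
        return 0.0
    return len(genes_a & genes_b) / len(genes_a)


# Hypothetical sites; mouse coordinates are assumed to be already mapped to the human genome
mouse_sites = [("chr1", 100, 200, "GeneA"), ("chr1", 500, 600, "GeneB"),
               ("chr2", 50, 150, "GeneC"), ("chr2", 900, 1000, "GeneD")]
human_sites = [("chr1", 150, 250, "GeneA"), ("chr1", 700, 800, "GeneB"),
               ("chr2", 300, 400, "GeneC"), ("chr2", 1200, 1300, "GeneD")]

site_level = fraction_sites_shared(mouse_sites, human_sites)   # 0.25
gene_level = fraction_genes_shared(mouse_sites, human_sites)   # 1.0
```

In this toy data only one of four sites is found at the orthologous position, yet every target gene has a site in both species, mirroring the pattern described for PPARγ.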
A variety of techniques have been employed to identify cis-regulatory elements. For the purposes of this review, we will focus on methods that allow genome-wide assessment of regulatory elements rather than approaches that focus on specific regions, such as reporter assays and electrophoretic mobility shift assays (EMSAs). These latter approaches are still very useful in validating and fine mapping individual elements of interest, but they are not particularly easy to perform on a genome-wide scale.
Chromatin has a complex and dynamic tertiary structure that can be more or less accessible to ancillary factors depending upon how tightly packed it is. In general, regions that are being actively transcribed or that participate in the regulation of transcription are `open', allowing for the rapid exchange of regulatory factors. The premise of the DNase hypersensitivity technique is that such regions are the first to be cleaved by small amounts of the nonspecific endonuclease DNaseI (Krebs and Peterson, 2000). By limiting the amount of DNaseI or the duration of exposure, cleavage will occur preferentially at spots where the chromatin has loosened, which include enhancers, promoters, silencers, and locus control regions. Until recently, detection of such cleavage sites was performed on a gene-by-gene basis using iterative Southern blotting, a cumbersome and difficult technique. An incremental advance involved the use of QPCR to query multiple sites at once (Eguchi et al., 2008), but true genome-wide assessment of DNase hypersensitivity has now been achieved using arrays (DNase-chip) or deep sequencing (DNase-seq) (Boyle et al., 2008). These analyses reveal that there are tens of thousands of hypersensitive sites throughout the genome that vary by cell type and physiological state, that they correlate well with other chromatin marks associated with regulatory events (e.g. modified histones; see below), and that they often confer functional activity on the appropriate reporter constructs when tested in vitro (Kharchenko et al., 2011; Stitzel et al., 2010).
There is an additional interesting and attractive feature of DNase hypersensitivity mapping that has been recently exploited. DNaseI preferentially cuts `open' or non-nucleosomal regions of the genome as described, and typically these regions are between 400 and 1000 bp in size. Within these regions, however, are the specific 6–20 bp binding sites utilized by a variety of transcription factors. The sequences of these binding sites can be inferred by their relative insensitivity to DNaseI digestion, an observation that has been used in the past in the method termed DNase footprinting (Boyle et al., 2011; Hesselberth et al., 2009). The limiting factor in finding these protected motifs within hypersensitive regions is sequencing depth; with increased numbers of reads now being obtained routinely, the opportunity to identify specific motifs makes hypersensitivity mapping an attractive option for those seeking to identify novel transcriptional pathways.
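The footprinting idea, that a bound factor protects a short stretch of DNA from cleavage inside an otherwise hypersensitive region, can be sketched as a search for short runs of depleted cut counts. This is a deliberately naive illustration and not the method of the cited studies, which use statistical models; the function name, the depletion cutoff, and the per-base counts are all hypothetical. Only the 6–20 bp size range comes from the text.

```python
def find_footprints(cuts, min_len=6, max_len=20, depletion=0.25):
    """Return (start, end) half-open intervals where per-base DNaseI cut counts
    stay at or below `depletion` times the regional mean for min_len..max_len bases.

    The cutoff rule is an illustrative assumption, not a published algorithm.
    """
    if not cuts:
        return []
    cutoff = depletion * (sum(cuts) / len(cuts))
    footprints = []
    run_start = None
    # The infinite sentinel never passes the cutoff, so it closes a run at the end
    for position, count in enumerate(list(cuts) + [float("inf")]):
        if count <= cutoff:
            if run_start is None:
                run_start = position
        elif run_start is not None:
            if min_len <= position - run_start <= max_len:
                footprints.append((run_start, position))
            run_start = None
    return footprints


# Hypothetical region: heavy cutting flanking an 8-bp protected stretch
region = [10] * 10 + [0] * 8 + [10] * 10
protected = find_footprints(region)  # [(10, 18)]
```

The dependence on sequencing depth noted above shows up directly here: with few reads, most positions have zero cuts by chance and protected stretches cannot be distinguished from undersampled ones.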
Another technique used to map regions that have been depleted of nucleosomes is Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE) (Giresi et al., 2007). Formaldehyde is used to cross-link nucleosomes and other chromatin components to DNA, after which regions with few cross-links can be liberated by shearing. Sequencing these DNA fragments yields information similar to that obtained from DNase hypersensitivity mapping, although without the ability to detect specific TF binding motifs. In a recent study using FAIRE in human islets, 80,000 sites were identified, including many that were islet-specific (Gaulton et al., 2010). Furthermore, certain noncoding single nucleotide polymorphisms (SNPs) associated with Type 2 diabetes were located within these regions, providing a link between regulatory cis-elements and disease risk.
Another way to identify potential regulatory elements utilizes chromatin immunoprecipitation of modified histones or other regulatory proteins. DNA is cross-linked to protein and then sheared, and a specific antibody is used to precipitate an epitope of interest. Following reversal of cross-linking, the DNA sequences immunoprecipitated with the selected protein are applied to a genomic array (ChIP-chip), or more commonly, subjected to high throughput sequencing (ChIP-Seq) (Schones and Zhao, 2008). At first glance, this might seem to contradict the notion just discussed, that nucleosome depletion is associated with regions of regulatory importance. While histones are typically displaced when important transcriptional events occur, those histones that remain in the area, as well as those on the borders of the depleted zone, are typically modified. Usually this involves methylation, phosphorylation, acetylation or some other post-translational event, but it can also involve switching to atypical histone variants, such as H3.3 or H2A.Z (Bernstein et al., 2007; Schones and Zhao, 2008).
It has recently become clear that the complement of modifications carried by histone proteins in a given region reflects its function, i.e. there exists a histone `code' through which one can infer genomic function if one knows the modifications (Ernst and Kellis, 2010; Myers et al., 2011). For our purposes, the most relevant marks relate to enhancer function. Monomethylation of histone 3 lysine 4 (H3K4me1) is a well-studied enhancer mark, as is histone 3 lysine 27 acetylation (H3K27Ac). Interestingly, H3K27Ac appears to mark `active' enhancers, while H3K4me1 marks sites that are both `active' and `poised' (Creyghton et al., 2010). Such poised sites are especially interesting because they may identify sites where transcriptional regulation may occur in settings other than the one under study, such as at different developmental stages or under different nutrient conditions.
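The active versus poised distinction described above amounts to a simple decision rule, sketched below. The rule is a rough reading of the two marks as summarized in the text, not a published classifier; the function name, region labels, and peak calls are hypothetical, and real analyses work with quantitative signal rather than presence or absence.

```python
def enhancer_state(has_h3k4me1, has_h3k27ac):
    """Classify a candidate region from two histone marks.

    H3K27Ac is taken as the mark of an active enhancer; H3K4me1 without
    H3K27Ac is taken as poised. This is a simplification for illustration.
    """
    if has_h3k27ac:
        return "active"
    if has_h3k4me1:
        return "poised"
    return "unmarked"


# Hypothetical peak calls per region: (H3K4me1 present, H3K27Ac present)
regions = {
    "region_1": (True, True),
    "region_2": (True, False),
    "region_3": (False, False),
}
states = {name: enhancer_state(*marks) for name, marks in regions.items()}
```

Comparing such calls between two conditions, for example fed and fasted, would be one way to flag poised sites that become active, although the text notes that few such metabolic comparisons have yet been published.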
A key issue is whether the histone marks that have been traditionally studied will yield useful information in metabolic studies. Most published data has centered on characterizing changes in chromatin state during malignancy or differentiation, processes characterized by profound changes in gene expression and phenotype. There have been relatively few studies looking at how histone marks change during less encompassing events, such as changes in hormonal status or nutritional state. However, transient hyperglycemia (16 hrs) has been shown to change H3K4 and H3K9 methylation at specific inflammatory loci in cultured endothelial cells, and similar changes were seen in vivo with only 6 hrs of hyperglycemia (Brasacchio et al., 2009; El-Osta et al., 2008). Similarly, fasting affected H3K27 methylation in the hypothalamus of chicks (Xu et al., 2011). Future studies will undoubtedly compare genome-wide histone modifications in a variety of situations of metabolic relevance. It is worth noting that histone modification requires a supply of metabolic intermediates such as acetyl-CoA and methyl donors. The fascinating connection between metabolic function and histone modification has been recently reviewed (Teperino et al., 2010).
Other `chippable' proteins that can yield important information about the regulatory state of the genome include RNA polymerase (a marker of transcription, obviously), CBP/p300 (which marks active enhancers) and CTCF (a transcription factor that marks insulator regions). Finally, of course, once one has identified any specific transcription factor or co-factor of interest, it is now standard practice to perform genome-wide ChIP-Seq to determine the complete cistrome of that factor under conditions of interest.
Although experimental and computational approaches were discussed separately, the most productive strategies usually involve combinations of wet lab and in silico techniques. As mentioned, the major limitation of pure computational motif finding is the lack of specificity. However, by first performing some sort of experimental analysis to determine which genomic regions to focus on, motif finding can be quite productive. As one example, DNase hypersensitivity analysis was employed to look at a limited number of genes, and specific loci were identified that showed differential chromatin accessibility during adipogenesis (Eguchi et al., 2008). Motif finding in these DNase hypersensitivity regions identified several motifs belonging to known adipogenic regulators in addition to motifs that suggested a role for factors not suspected to play a role in fat cell development. This led to the discovery that interferon regulatory factor 1 (IRF1), IRF3, and IRF4 were antiadipogenic. A more detailed, genome-wide look at modified histone patterns during murine and human adipogenesis allowed for a more robust analysis that led to the identification of thousands of potential differentiation-dependent enhancers (Mikkelsen et al., 2010). Motif finding within sequences associated with preadipocyte-specific enhancers led to the prediction that promyelocytic leukemia zinc finger protein (PLZF, encoded by Zbtb16) and serum response factor (SRF) might be involved in adipogenesis; subsequent experiments showed that both factors are strongly antiadipogenic. Others have used modified histone mapping or DNase-seq to identify cis-regulatory elements that mark transient, early events in 3T3-L1 differentiation, and then followed this up with motif finding, leading to the identification of motifs for RXR, STAT5A/B, C/EBPβ, and GR (Siersbaek et al., 2011; Steger et al., 2010).
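The combined strategy described above, using an experiment to choose regions and then asking which motifs are overrepresented in them, can be sketched as a comparison of motif density between experimentally selected sequences and background sequences. This is a minimal illustration under stated assumptions: the sequences are made up, the motif is matched as an exact string on one strand rather than with a weight matrix, and no statistical test is applied, whereas real analyses use dedicated motif-discovery software.

```python
def count_occurrences(sequence, motif):
    """Count occurrences of an exact motif string, allowing overlaps."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def motif_density(sequences, motif):
    """Motif occurrences per base across a set of sequences."""
    total_bases = sum(len(seq) for seq in sequences)
    if total_bases == 0:
        return 0.0
    return sum(count_occurrences(seq, motif) for seq in sequences) / total_bases


def fold_enrichment(foreground, background, motif):
    """Ratio of motif density in selected regions to that in background regions."""
    bg_density = motif_density(background, motif)
    fg_density = motif_density(foreground, motif)
    if bg_density == 0.0:
        return float("inf") if fg_density > 0.0 else 0.0
    return fg_density / bg_density


# Hypothetical sequences: 'accessible' regions versus size-matched background
accessible = ["TTAGGTCATT", "AGGTCAAGGTCA"]
background = ["T" * 10 + "AGGTCA" + "T" * 6]
enrichment = fold_enrichment(accessible, background, "AGGTCA")  # about 3-fold
```

As the text goes on to emphasize, an enriched motif found this way is only a hypothesis about which factor is involved and still has to be tested experimentally.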
Efforts continue in many labs to develop methods that integrate even larger datasets, encompassing DNase-seq, histone ChIP-seq, motif enrichment, and conservation to predict novel transcriptional pathways (Pique-Regi et al., 2011).
It is important to emphasize that all of these approaches are hypothesis-generating in nature. Identifying an overrepresented motif in a potential regulatory element does not guarantee that the cognate TF will play a role in the process under study. There are many reasons for this. For example, a single motif might be the target for a family of related TFs; finding the specific isoform of interest can be difficult. The motif might also predict a TF that is relevant at a different developmental stage, or in a different tissue altogether. Finally, many binding sites may not participate directly in gene expression events (Farnham, 2009). The meaning of these “excess” sites is currently unclear.
In many cases, discovery of a legitimate motif and its bona fide binding partner will not tell the whole story, as the activity of a TF is highly dependent upon post-translational modifications and interactions with other members of the enhanceosome, including co-activators and co-repressors that do not directly contact DNA. Although co-occurring motifs can give some clues about other TFs that might be involved in the process under study, the full extent of these modifying factors can be discerned only by careful experimentation.
Each approach to determining transcriptional pathways has advantages and disadvantages (see Table 1), and the best chance of success likely lies in the use of some combination of methods. At present, this can be a costly and time-consuming proposition, but recent advances may enable streamlining of this process. For example, the discovery of `eRNA' that marks active enhancers (Kim et al., 2010; Wang et al., 2011) may signal a shift toward using RNA-Seq to obtain TF expression data and information on key cis-regulatory elements in a single assay.
Drugs that directly target transcription factors are still relatively rare outside of the agents that act on the nuclear receptor family, but this large group of compounds gives us some indication of the therapeutic utility inherent in modulating transcription. Advances in chemical biology will hopefully demonstrate that there is no theoretical limit in drug discovery for transcription factors, which would suggest that a more complete understanding of the transcriptional pathways governing the differentiation and physiology of metabolic tissues will enable targeted therapy of diseases like obesity, Type 2 diabetes, dyslipidemia, and others. Cancer and aging are other conditions in which one can expect to see advances based in part on discoveries made in the metabolic arena.
In this review we have highlighted some of the new approaches that currently enable discovery in this area. This is a fast-moving area, of course, and it is to be expected that there will be additional refinements in technology, accompanied by advances in our understanding of how transcription factors and co-factors regulate lipid handling, insulin secretion, nutrient sensing, and other metabolic processes.
The authors thank M. Khandekar and S. Kleiner for helpful discussions and P. Cohen for critical reading of the manuscript. The authors are supported by NIH K01 DK090120-02 to RG, NIH R01 DK31405 to BMS, and NIH R01 DK078061 to EDR.
Conflicting interests statement. The authors declare that they have no competing financial interests.
*In this instance, we defined a `good' match by looking at each position across the genome and asking whether it is more likely that the next 16 bases were generated by sampling from the DR1 position weight matrix or from a totally random selection of bases. A p-value threshold of 10e-4 was used to call motif occurrences.