The SCL (TAL1) transcription factor is a critical regulator of haematopoiesis and its expression is tightly controlled by multiple cis-acting regulatory elements. To further characterise the DNA elements that control its expression, we used genomic tiling microarrays covering 256 kb of the human SCL locus to perform a concerted analysis of chromatin structure and binding of regulatory proteins in human haematopoietic cell lines. This approach allowed us to characterise further or redefine known human SCL regulatory elements and led to the identification of six novel elements with putative regulatory function both upstream and downstream of the SCL gene. They bind a number of haematopoietic transcription factors (GATA1, E2A, LMO2, SCL, LDB1), CTCF or components of the transcriptional machinery and are associated with relevant histone modifications, accessible chromatin and low nucleosomal density. Functional characterisation shows that these novel elements are able to enhance or repress SCL promoter activity, have endogenous promoter function or enhancer-blocking insulator function. Our analysis opens up several areas for further investigation and adds new layers of complexity to our understanding of the regulation of SCL expression.
Mammalian tissue- and/or time-specific transcription is primarily regulated in a combinatorial fashion through interactions between a specific set of transcriptional regulatory factors (TRFs) and their cognate cis-regulatory elements located in the regulatory regions. In exploring the DNA regions and TRFs involved in combinatorial transcriptional regulation, we noted that individual knockdown of a set of human liver-enriched TRFs such as HNF1A, HNF3A, HNF3B, HNF3G and HNF4A resulted in perturbation of the expression of several single TRF genes, such as the HNF1A, HNF3G and CEBPA genes. We thus searched for potential binding sites for these five TRFs in the highly conserved genomic regions around these three TRF genes and found several putative combinatorial regulatory regions. Chromatin immunoprecipitation analysis revealed that almost all of the putative regulatory DNA regions were bound by the TRFs as well as two coactivators (CBP and p300). The strong transcription-enhancing activity of the putative combinatorial regulatory region located downstream of the CEBPA gene was confirmed. EMSA demonstrated specific binding of these HNFs to the target DNA region. Finally, co-transfection reporter assays with various combinations of expression vectors for these HNF genes demonstrated the transcriptional activation of the CEBPA gene in a combinatorial manner by these TRFs.
COMPEL is a database on composite regulatory elements, the basic structures of combinatorial regulation. Composite regulatory elements contain two closely situated binding sites for distinct transcription factors and represent minimal functional units providing combinatorial transcriptional regulation. Both specific factor–DNA and factor–factor interactions contribute to the function of composite elements (CEs). Information about the structure of known CEs and specific gene regulation achieved through such CEs appears to be extremely useful for promoter prediction, for gene function prediction and for applied gene engineering as well. The structure of the relational model of COMPEL is determined by the concept of molecular structure and regulatory role of CEs. Based on the set of a particular CE, a program has been developed for searching potential CEs in gene regulatory regions. WWW search and browse routines were developed for COMPEL release 3.0. The COMPEL database equipped with the search and browse tools is available at http://compel.bionet.nsc.ru/ . The program for prediction of potential CEs of NFAT type is available at http://compel.bionet.nsc.ru/FunSite.html and http://transfac.gbf.de/dbsearch/funsitep/s_comp.html
Regulation of gene expression in eukaryotic genomes is established through a complex cooperative activity of proximal promoters and distant regulatory elements (REs) such as enhancers, repressors and silencers. We have developed a web server named DiRE, based on the Enhancer Identification (EI) method, for predicting distant regulatory elements in higher eukaryotic genomes, namely for determining their chromosomal location and functional characteristics. The server uses gene co-expression data, comparative genomics and profiles of transcription factor binding sites (TFBSs) to determine TFBS-association signatures that can be used for discriminating specific regulatory functions. DiRE's unique feature is its ability to detect REs outside of proximal promoter regions, as it takes advantage of the full gene locus to conduct the search. DiRE can predict common REs for any set of input genes for which the user has prior knowledge of co-expression, co-function or other biologically meaningful grouping. The server predicts function-specific REs consisting of clusters of specifically-associated TFBSs and it also scores the association of individual transcription factors (TFs) with the biological function shared by the group of input genes. Its integration with the Array2BIO server allows users to start their analysis with raw microarray expression data. The DiRE web server is freely available at http://dire.dcode.org.
Transcription Regulatory Regions Database (TRRD) has been developed for accumulation of experimental information on the structure–function features of regulatory regions of eukaryotic genes. Each entry in TRRD corresponds to a particular gene and contains a description of structure–function features of its regulatory regions (transcription factor binding sites, promoters, enhancers, silencers, etc.) and gene expression regulation patterns. The current release, TRRD 4.2.5, comprises the description of 760 genes, 3403 expression patterns, and >4600 regulatory elements including 3604 transcription factor binding sites, 600 promoters and 152 enhancers. This information was obtained through annotation of 2537 scientific publications. TRRD 4.2.5 is available through the WWW at http://wwwmgs.bionet.nsc.ru/mgs/dbases/trrd4/
Genomes are organized into high-level 3-dimensional structures, and DNA elements separated by long genomic distances could functionally interact. Many transcription factors bind to regulatory DNA elements distant from gene promoters. While distal binding sites have been shown to regulate transcription by long-range chromatin interactions at a few loci, chromatin interactions and their impact on transcription regulation have not been investigated in a genome-wide manner. Therefore, we developed Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) for de novo detection of global chromatin interactions, and comprehensively mapped the chromatin interaction network bound by oestrogen receptor α (ERα) in the human genome. We found that most high-confidence remote ERα binding sites are anchored at gene promoters through long-range chromatin interactions, suggesting that ERα functions by extensive chromatin looping to bring genes together for coordinated transcriptional regulation. We propose that chromatin interactions constitute a primary mechanism for regulating transcription in mammalian genomes.
Nuclear receptors are involved in a myriad of physiological processes, responding to ligands and binding to DNA at sequence-specific cis-regulatory elements. This binding occurs in the context of chromatin, a critical factor in regulating eukaryotic transcription. Recent high-throughput assays have examined nuclear receptor action genome-wide, advancing our understanding of receptor binding to regulatory elements. Here we discuss current knowledge of genome-wide response element occupancy by receptors, and the function of transcription factor networks in regulating nuclear receptor action. We highlight emerging roles for the epigenome, chromatin remodeling, histone modification, histone variants and long-range chromosomal interactions in nuclear receptor binding and receptor-dependent gene regulation. These mechanisms contribute importantly to the action of nuclear receptors in health and disease.
Increasing evidence shows that eukaryotic genomes are almost entirely transcribed, producing both protein-coding transcripts and an enormous number of non-protein-coding RNAs (ncRNAs). Therefore, revealing the regulatory mechanisms underlying these transcripts becomes imperative. However, for a complete understanding of transcriptional regulatory mechanisms, we need to identify the regions in which they operate. We call these transcriptional regulation regions, or TRRs: functional regions containing a cluster of regulatory elements that cooperatively recruit transcription factors, which bind and then regulate the expression of transcripts.
We constructed a hierarchical stochastic language (HSL) model for the identification of core TRRs in yeast based on regulatory cooperation among TRR elements. The HSL model trained on yeast achieved comparable accuracy in predicting TRRs in other species, e.g., fruit fly, human, and rice, thus demonstrating the conservation of TRRs across species. The HSL model was also used to identify the TRRs of genes, such as p53 or OsALYL1, as well as microRNAs. In addition, the ENCODE regions were examined by HSL, and TRRs were found to be pervasive across the genome.
Our findings indicate that 1) the HSL model can be used to accurately predict core TRRs of transcripts across species and 2) core TRRs identified by HSL are suitable candidates for further scrutiny of specific regulatory elements and mechanisms. Meanwhile, the regulatory activity associated with the abundant ncRNAs might account for the ubiquitous presence of TRRs across the genome. In addition, we also found that the TRRs of protein-coding genes and ncRNAs are similar in structure, with the latter being more conserved than the former.
Combinatorial interactions of sequence-specific trans-acting factors with localized genomic cis-element clusters are the principal mechanism for regulating tissue-specific and developmental gene expression. With the emergence of expanding numbers of genome-wide expression analyses, the identification of the cis-elements responsible for specific patterns of transcriptional regulation represents a critical area of investigation. Computational methods for the identification of functional cis-regulatory modules are difficult to devise, principally because of the short length and degenerate nature of individual cis-element binding sites and the inherent complexity that is generated by combinatorial interactions within cis-clusters. Filtering candidate cis-element clusters based on phylogenetic conservation is helpful for an individual ortholog gene pair, but combining data from cis-conservation and coordinate expression across multiple genes is a more difficult problem. To approach this, we have extended an ortholog gene-pair database with additional analytical architecture to allow for the analysis and identification of maximal numbers of compositionally similar and phylogenetically conserved cis-regulatory element clusters from a list of user-selected genes. The system has been successfully tested with a series of functionally related and microarray profile-based co-expressed ortholog pairs of promoters and genes using known regulatory regions as training sets and co-expressed genes in the olfactory and immunohematologic systems as test sets. CisMols Analyzer is accessible via a Web interface at .
Originating from COMPEL, the TRANSCompel® database emphasizes the key role of specific interactions between transcription factors binding to their target sites, which provide specific features of gene regulation in a particular cellular context. Composite regulatory elements contain two closely situated binding sites for distinct transcription factors and represent minimal functional units providing combinatorial transcriptional regulation. Both specific factor–DNA and factor–factor interactions contribute to the function of composite elements (CEs). Information about the structure of known CEs and specific gene regulation achieved through such CEs appears to be extremely useful for promoter prediction, for gene function prediction and for applied gene engineering as well. Each database entry corresponds to an individual CE within a particular gene and contains information about two binding sites, two corresponding transcription factors and experiments confirming cooperative action between transcription factors. The COMPEL database, equipped with the search and browse tools, is available at http://www.gene-regulation.com/pub/databases.html#transcompel. Moreover, we have developed the program CATCH™ for searching potential CEs in DNA sequences. It is freely available as CompelPatternSearch at http://compel.bionet.nsc.ru/FunSite/CompelPatternSearch.html.
Expression of eukaryotic genes during development requires complex spatial-temporal regulation. This complex regulation is often achieved through the coordinated interaction of transcription regulatory elements in the promoters of the target genes. The identification and mapping of regulatory elements at genome scale is crucial to understanding how gene expression is regulated. Chromatin immunoprecipitation is a standard method for assessing the occupancy of DNA-binding proteins in vivo in their native chromatin context using antibodies. However, the standard chromatin immunoprecipitation procedure is time-consuming, labor-intensive and not suited to analyzing many samples simultaneously.
Recently, we have developed a simple ChIP protocol that requires fewer steps and less hands-on time. This protocol is compatible with both 96-well plate and single tube formats, and enables higher sensitivity and more reliable performance, as compared to conventional approaches.
We have successfully used this protocol to map various clinically relevant chromatin marks and controls across several cell types to quantitatively measure chromatin states. This analysis included a variety of marks corresponding to repressed, poised and active promoters, strong and weak enhancers, putative insulators, transcribed regions, as well as large-scale repressed and inactive domains. This study demonstrates the utility of this approach for the characterization of model cellular systems in perturbation studies with chemical probes.
Transcription factor binding to DNA requires both an appropriate binding element and suitably open chromatin, which together help to define regulatory elements within the genome. Current methods of identifying regulatory elements, such as promoters or enhancers, typically rely on sequence conservation, existing gene annotations or specific marks, such as histone modifications and p300 binding methods, each of which has its own biases.
Herein we show that an approach based on clustering of transcription factor peaks from chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq) can be used to evaluate markers for regulatory elements. We used 67 data sets for 54 unique transcription factors distributed over two cell lines to create regulatory element clusters. By integrating the clusters from our approach with histone modifications and data for open chromatin, we identified general methylation of lysine 4 on histone H3 (H3K4me) as the most specific marker for transcription factor clusters. Clusters mapping to annotated genes showed distinct patterns in cluster composition related to gene expression and histone modifications. Clusters mapping to intergenic regions fall into two groups: those directly involved in transcription, including miRNAs and long noncoding RNAs, and those facilitating transcription by long-range interactions. The latter clusters were specifically enriched with H3K4me1, but less so with acetylation of lysine 27 on histone H3 or p300 binding.
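The idea of grouping transcription factor peaks into regulatory element clusters can be illustrated with a minimal sketch. The abstract does not state how clusters were built, so the rule used here (merging peaks on the same chromosome that overlap or lie within a hypothetical `max_gap` of each other) is an assumption, as are the function name `cluster_peaks`, the factor labels and the toy coordinates.

```python
def cluster_peaks(peaks, max_gap=0):
    """Merge overlapping or nearby peaks into clusters.

    peaks: iterable of (chrom, start, end, factor) tuples.
    max_gap: hypothetical tolerance in bp; 0 means peaks must overlap or touch.
    Returns a list of dicts with the merged interval and the set of factors.
    """
    clusters = []
    # Sort by chromosome, then start, so that a single pass can merge neighbours.
    for chrom, start, end, factor in sorted(peaks, key=lambda p: (p[0], p[1], p[2])):
        last = clusters[-1] if clusters else None
        if last and last["chrom"] == chrom and start <= last["end"] + max_gap:
            # Peak joins the current cluster: extend it and record the factor.
            last["end"] = max(last["end"], end)
            last["factors"].add(factor)
        else:
            clusters.append(
                {"chrom": chrom, "start": start, "end": end, "factors": {factor}}
            )
    return clusters


# Toy peaks with made-up coordinates and placeholder factor names.
TOY_PEAKS = [
    ("chr1", 100, 200, "TF_A"),
    ("chr1", 150, 260, "TF_B"),
    ("chr1", 900, 1000, "TF_A"),
    ("chr2", 120, 180, "TF_C"),
    ("chr1", 250, 300, "TF_C"),
]
peak_clusters = cluster_peaks(TOY_PEAKS)
for c in peak_clusters:
    print(c["chrom"], c["start"], c["end"], sorted(c["factors"]))
```

Each resulting cluster carries the set of factors bound within it, which is the kind of composition that could then be compared against histone marks or open-chromatin data.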
By integrating genomewide data of transcription factor binding and chromatin structure and using our data-driven approach, we pinpointed the chromatin marks that best explain transcription factor association with different regulatory elements. Our results also indicate that a modest selection of transcription factors may be sufficient to map most regulatory elements in the human genome.
transcription factor; ChIP-Seq; histone modification; chromatin
DNA-binding transcriptional regulators interpret the genome's regulatory code by binding to specific sequences to induce or repress gene expression [1]. Comparative genomics has recently been used to identify potential cis-regulatory sequences within the yeast genome on the basis of phylogenetic conservation [2–6], but this information alone does not reveal if or when transcriptional regulators occupy these binding sites. We have constructed an initial map of yeast's transcriptional regulatory code by identifying the sequence elements that are bound by regulators under various conditions and that are conserved among Saccharomyces species. The organization of regulatory elements in promoters and the environment-dependent use of these elements by regulators are discussed. We find that environment-specific use of regulatory elements predicts mechanistic models for the function of a large population of yeast's transcriptional regulators.
An important step in understanding the conditions that specify gene expression is the recognition of gene regulatory elements. Due to high diversity of different types of transcription factors and their DNA binding preferences, it is a challenging problem to establish an accurate model for recognition of functional regulatory elements in promoters of eukaryotic genes.
We present a method for precise prediction of a large group of transcription factor binding sites: steroid hormone response elements. We use a large training set of experimentally confirmed steroid hormone response elements and adapt a sequence-based statistical method, the position weight matrix, to identify binding sites in query sequences. To estimate the accuracy level, a table of corresponding sensitivity and specificity values is constructed from a number of independent tests. Furthermore, a feed-forward neural network is used for cross-verification of the predicted response elements on genomic sequences.
The proposed method achieves a high level of accuracy and can therefore be used for de novo prediction of hormone response elements. Experimental results support our analysis by showing significant improvement of the proposed method over previous HRE recognition methods.
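A position weight matrix scan of the general kind mentioned above can be sketched as follows. This is a generic log-odds PWM, not the authors' adapted method: the pseudocount, the uniform background frequency of 0.25, the score threshold and the four toy training sites are all assumptions chosen for illustration.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from aligned, equal-length sites."""
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2((counts[base] / total) / background) for base in "ACGT"}
        )
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy training sites (illustrative only, not an experimentally curated set).
TOY_SITES = ["AGAACA", "AGAACA", "GGTACA", "AGAACT"]
toy_pwm = build_pwm(TOY_SITES)
pwm_hits = scan("TTTAGAACATTT", toy_pwm, threshold=4.0)
print(pwm_hits)
```

Raising the threshold trades sensitivity for specificity, which is the relationship a sensitivity versus specificity table is meant to quantify.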
A strategy combining classical motif overrepresentation in co-regulated genes with comparative footprinting is applied to identify 80 transcription factor binding sites and 139 regulatory modules in Arabidopsis thaliana.
Transcriptional regulation plays an important role in the control of many biological processes. Transcription factor binding sites (TFBSs) are the functional elements that determine transcriptional activity and are organized into separable cis-regulatory modules, each defining the cooperation of several transcription factors required for a specific spatio-temporal expression pattern. Consequently, the discovery of novel TFBSs in promoter sequences is an important step to improve our understanding of gene regulation.
Here, we applied a detection strategy that combines features of classic motif overrepresentation approaches in co-regulated genes with general comparative footprinting principles for the identification of biologically relevant regulatory elements and modules in Arabidopsis thaliana, a model system for plant biology. In total, we identified 80 TFBSs and 139 regulatory modules, most of which are novel, and primarily consist of two or three regulatory elements that could be linked to different important biological processes, such as protein biosynthesis, cell cycle control, photosynthesis and embryonic development. Moreover, studying the physical properties of some specific regulatory modules revealed that Arabidopsis promoters have a compact nature, with cooperative TFBSs located in close proximity of each other.
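Motif overrepresentation in a set of co-regulated genes is commonly assessed with a hypergeometric tail probability. The sketch below shows that generic test; whether the study used this exact statistic is not stated in the abstract, so treat the choice of test, the function name `hypergeom_enrichment` and the toy counts as assumptions.

```python
from math import comb


def hypergeom_enrichment(genome_total, genome_with_motif, set_total, set_with_motif):
    """Probability of seeing at least `set_with_motif` motif-containing promoters
    when `set_total` promoters are drawn without replacement from a genome of
    `genome_total` promoters, of which `genome_with_motif` contain the motif."""
    upper = min(set_total, genome_with_motif)
    denominator = comb(genome_total, set_total)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(genome_with_motif, k)
        * comb(genome_total - genome_with_motif, set_total - k)
        for k in range(set_with_motif, upper + 1)
    )
    return tail / denominator


# Toy counts: 10 promoters in total, 4 carry the motif,
# and 2 of the 3 co-regulated promoters carry it.
toy_p_value = hypergeom_enrichment(10, 4, 3, 2)
print(round(toy_p_value, 4))
```

In a comparative footprinting setting, a motif would typically also need to be conserved in orthologous promoters before being counted, a filter this sketch leaves out.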
These results create a starting point to unravel regulatory networks in plants and to study the regulation of biological processes from a systems biology point of view.
Identification and annotation of all the functional elements in the genome, including genes and the regulatory sequences, is a fundamental challenge in genomics and computational biology. Since regulatory elements are frequently short and variable, their identification and discovery using computational algorithms is difficult. However, significant advances have been made in the computational methods for modeling and detection of DNA regulatory elements. The availability of complete genome sequence from multiple organisms, as well as mRNA profiling and high-throughput experimental methods for mapping protein-binding sites in DNA, have contributed to the development of methods that utilize these auxiliary data to inform the detection of transcriptional regulatory elements. Progress is also being made in the identification of cis-regulatory modules and higher order structures of the regulatory sequences, which is essential to the understanding of transcription regulation in the metazoan genomes. This article reviews the computational approaches for modeling and identification of genomic regulatory elements, with an emphasis on the recent developments, and current challenges.
Understanding how complex patterns of temporal and spatial expression are regulated is central to deciphering genetic programs that drive development. Gene expression is initiated through the action of transcription factors and their cofactors converging on enhancer elements leading to a defined activity. Specific constellations of combinatorial occupancy are therefore often conceptualized as rigid binding codes that give rise to a common output of spatio-temporal expression. Here, we assessed this assumption using the regulatory input of two essential transcription factors within the Drosophila myogenic network. Mutations in either Myocyte enhancing factor 2 (Mef2) or the zinc-finger transcription factor lame duck (lmd) lead to very similar defects in myoblast fusion, yet the underlying molecular mechanism for this shared phenotype is not understood. Using a combination of ChIP-on-chip analysis and expression profiling of loss-of-function mutants, we obtained a global view of the regulatory input of both factors during development. The majority of Lmd-bound enhancers are co-bound by Mef2, representing a subset of Mef2's transcriptional input during these stages of development. Systematic analyses of the regulatory contribution of both factors demonstrate diverse regulatory roles, despite their co-occupancy of shared enhancer elements. These results indicate that Lmd is a tissue-specific modulator of Mef2 activity, acting as both a transcriptional activator and repressor, which has important implications for myogenesis. More generally, this study demonstrates considerable flexibility in the regulatory output of two factors, leading to additive, cooperative, and repressive modes of co-regulation.
While genetic studies are essential to reveal the phenotypic relationships between genes, it is often very difficult to disentangle the molecular mechanism of two genes that phenocopy each other. In this study, we used global scale and single gene analysis to investigate the relationship between two transcription factors whose mutant embryos have a similar defect in myogenesis. In Drosophila, Mef2 mutant embryos display a block in myoblast fusion, which is very similar to what is observed in mutant embryos for lmd, a zinc-finger transcription factor. To understand the underlying nature of these defects we used ChIP-on-chip analysis to obtain a global view of their co-regulated enhancers, and we used expression profiling of mutant embryos to reveal their downstream transcriptional response. The results indicate that Lmd acts as a tissue specific modulator of Mef2 activity. Using in vivo and in vitro reporter assays, we show that co-binding to the same enhancer element can lead to diverse regulatory responses. The presence of Lmd has an additive, cooperative, or repressive effect on Mef2 activity, demonstrating that it acts as a molecular switch for gene expression during muscle differentiation. More broadly, our results highlight the difficulty in translating information on combinatorial binding data into a functional regulatory response.
The Lim domain only 2 (Lmo2) gene encodes a transcriptional cofactor critical for the development of hematopoietic stem cells. Several distal regulatory elements have been identified upstream of the Lmo2 gene in the human and mouse genomes that are capable of enhancing reporter gene expression in erythroid cells and may be responsible for the high level transcription of Lmo2 in the erythroid lineage. In this study we investigate how these elements regulate transcription of Lmo2 and whether or not they function cooperatively in the endogenous context. Chromosome conformation capture (3C) experiments show that chromatin-chromatin interactions exist between upstream regulatory elements and the Lmo2 promoter in erythroid cells but that these interactions are absent from kidney where Lmo2 is transcribed at twelve fold lower levels. Specifically, long range chromatin-chromatin interactions occur between the Lmo2 proximal promoter and two broad regions, 3–31 and 66–105 kb upstream of Lmo2, which we term the proximal and distal control regions for Lmo2 (pCR and dCR respectively). Each of these regions is bound by several transcription factors suggesting that multiple regulatory elements cooperate in regulating high level transcription of Lmo2 in erythroid cells. Binding of CTCF and cohesin which support chromatin loops at other loci were also found within the dCR and at the Lmo2 proximal promoter. Intergenic transcription occurs throughout the dCR in erythroid cells but not in kidney suggesting a role for these intergenic transcripts in regulating Lmo2, similar to the broad domain of intergenic transcription observed at the human β-globin locus control region. Our data supports a model in which the dCR functions through a chromatin looping mechanism to contact and enhance Lmo2 transcription specifically in erythroid cells. Furthermore, these chromatin loops are supported by the cohesin complex recruited to both CTCF and transcription factor bound regions.
Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some cis-regulatory elements have been well studied, the combinations of transcription factors that regulate normal expression levels for the vast majority of the 20,000 genes in the human genome are unknown. We hypothesized that it should be possible to discover transcription factor combinations that regulate gene expression in concert by identifying over-represented combinations of sequence motifs that occur together in the genome. In order to detect combinations of transcription factor binding motifs, we developed a data mining approach based on the use of association rules, which are typically used in market basket analysis. We scored each segment of the genome for the presence or absence of each of 83 transcription factor binding motifs, then used association rule mining algorithms to mine this dataset, thus identifying frequently occurring pairs of distinct motifs within a segment.
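The presence/absence scoring and pair mining described above can be sketched with a small pairwise association-rule calculation. This is an illustrative simplification rather than the study's pipeline: the motif labels, the toy segments, the `min_support` cutoff and the use of lift as the ranking measure are all assumptions.

```python
from collections import Counter
from itertools import combinations


def mine_pairs(segments, min_support=0.3):
    """Find frequently co-occurring motif pairs.

    segments: list of sets, each holding the motifs present in one genomic segment.
    Returns rule dicts with support, confidence in both directions, and lift.
    """
    n = len(segments)
    single = Counter()
    pair = Counter()
    for motifs in segments:
        single.update(motifs)
        # Sorting gives each unordered pair one canonical key.
        pair.update(combinations(sorted(motifs), 2))
    rules = []
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        rules.append(
            {
                "pair": (a, b),
                "support": support,
                "confidence_a_to_b": count / single[a],
                "confidence_b_to_a": count / single[b],
                "lift": support / ((single[a] / n) * (single[b] / n)),
            }
        )
    return sorted(rules, key=lambda r: r["lift"], reverse=True)


# Toy segments with placeholder motif names.
TOY_SEGMENTS = [
    {"motifA", "motifB"},
    {"motifA", "motifB", "motifC"},
    {"motifC"},
    {"motifA", "motifB"},
    {"motifC", "motifD"},
]
pair_rules = mine_pairs(TOY_SEGMENTS)
for rule in pair_rules:
    print(rule["pair"], rule["support"], rule["confidence_a_to_b"], round(rule["lift"], 3))
```

A lift above 1 means the two motifs co-occur more often than expected if they were distributed independently across segments.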
Support for most pairs of transcription factor binding motifs was highly correlated across different chromosomes although pair significance varied. Known true positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature.
Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription factor.
In early Drosophila embryos, the transcription factor Dorsal regulates patterns of gene expression and cell fate specification along the dorsal-ventral axis. How gene expression is produced within the broad lateral domain of the presumptive neurogenic ectoderm is not understood. To investigate transcriptional control during neurogenic ectoderm specification, we examined divergence and function of an embryonic cis-regulatory element controlling the gene short gastrulation (sog). While transcription factor binding sites are not completely conserved, we demonstrate that these sequences are bona fide regulatory elements, despite variable regulatory architecture. Mutational analysis of conserved putative transcription factor binding sites revealed that sites for Dorsal and Zelda, a ubiquitous maternal transcription factor, are required for proper sog expression. When Zelda and Dorsal sites are paired in a synthetic regulatory element, broad lateral expression results. However, synthetic regulatory elements that contain Dorsal and an additional activator, also drive expression throughout the neurogenic ectoderm. Our results suggest that interaction between Dorsal and Zelda drives expression within the presumptive neurogenic ectoderm, but they also demonstrate that regulatory architecture directing expression in this domain is flexible. We propose a model for neurogenic ectoderm specification in which gene regulation occurs at the intersection of temporal and spatial transcription factor inputs.
With the increasing number of eukaryotic genomes available, high-throughput automated tools for identification of regulatory DNA sequences are becoming increasingly feasible. Several computational approaches for the prediction of regulatory elements were recently developed. Here we combine the prediction of clusters of binding sites for transcription factors with context information taken from genome annotations. Target Explorer automates the entire process from the creation of a customized library of binding sites for known transcription factors through the prediction and annotation of putative target genes that are potentially regulated by these factors. It was specifically designed for the well-annotated Drosophila melanogaster genome, but most options can be used for sequences from other genomes as well. Target Explorer is available at http://trantor.bioc.columbia.edu/Target_Explorer/
Recent studies on transcriptional control of gene expression have pinpointed the importance of long-range interactions and three-dimensional organization of chromatin within the nucleus. Distal regulatory elements such as enhancers may activate transcription over long distances; hence, their action must be restricted within appropriate boundaries to prevent illegitimate activation of non-target genes. Insulators are DNA elements with enhancer-blocking and/or chromatin-bordering functions. In vertebrates, the versatile transcription regulator CCCTC-binding factor (CTCF) is the only identified trans-acting factor that confers enhancer-blocking insulator activity. CTCF-binding sites were found to be commonly distributed along the vertebrate genomes. We have constructed a CTCF-binding site database (CTCFBSDB) to characterize experimentally identified and computationally predicted CTCF-binding sites. Biological knowledge and data from multiple resources have been integrated into the database, including sequence data, genetic polymorphisms, function annotations, histone methylation profiles, gene expression profiles and comparative genomic information. A web-based user interface was implemented for data retrieval, analysis and visualization. In silico prediction of CTCF-binding motifs is provided to facilitate the identification of candidate insulators in the query sequences submitted by users. The database can be accessed at http://insulatordb.utmem.edu/
One of the key mechanisms of transcriptional control is the specific interaction between transcription factors (TFs) and cis-regulatory elements in gene promoters. The elucidation of these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which, for a given protein sequence, (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their cis-acting DNA motif. While existing tools were extended and adapted for performing the latter two prediction steps, the first two steps are based on a novel numeric sequence representation which allows for combining existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.
The mammalian genome is packed tightly in the nucleus of the cell. This packing is primarily facilitated by histone proteins and results in an ordered organization of the genome in chromosome territories that can be roughly divided into heterochromatic and euchromatic domains. On top of this organization, several distinct gene regulatory elements on the same chromosome or on other chromosomes are thought to communicate dynamically via chromatin looping. Advances in genome-wide technologies have revealed the existence of a plethora of these regulatory elements in various eukaryotic genomes. These regulatory elements are defined by particular in vitro assays as promoters, enhancers, insulators, and boundary elements. However, recent studies indicate that the in vivo distinction between these elements is often less strict. Regulatory elements are bound by a mixture of common and lineage-specific transcription factors which mediate the long-range interactions between these elements. Inappropriate modulation of the binding of these transcription factors can alter the interactions between regulatory elements, which in turn leads to aberrant gene expression with disease as an ultimate consequence. Here we discuss the bi-modal behavior of regulatory elements that act in cis (with a focus on enhancers), how their activity is modulated by transcription factor binding and the effect this has on gene regulation.
enhancer; transcription factor; chromatin looping; transcription; cis-regulation
The precise control of gene expression is essential for all biological processes. In addition to DNA-binding transcription factors, numerous transcription cofactors contribute another layer of regulation of gene transcription in eukaryotic cells. One such transcription cofactor is the highly conserved Mediator complex, which has multiple subunits and is involved in various biological processes through direct interaction with relevant transcription factors. Although the current understanding of the biological functions of Mediator remains incomplete, research in the past decade has revealed an important role of Mediator in regulating lipid metabolism. This function of Mediator is dependent on specific transcription factors, including peroxisome proliferator-activated receptor-gamma (PPARγ) and sterol regulatory element-binding proteins (SREBPs), which represent the master regulators of lipid metabolism. The medical significance of these findings is apparent, as aberrant lipid metabolism is intimately linked to major human diseases, such as type 2 diabetes and cardiovascular disease. Here, we briefly review the functions and molecular mechanisms of Mediator in the regulation of lipid metabolism.
mediator; transcription; lipid; cofactor; metabolism