Mammalian tissue- and/or time-specific transcription is primarily regulated in a combinatorial fashion through interactions between a specific set of transcriptional regulatory factors (TRFs) and their cognate cis-regulatory elements located in the regulatory regions. In exploring the DNA regions and TRFs involved in combinatorial transcriptional regulation, we noted that individual knockdown of a set of human liver-enriched TRFs such as HNF1A, HNF3A, HNF3B, HNF3G and HNF4A resulted in perturbation of the expression of several single TRF genes, such as the HNF1A, HNF3G and CEBPA genes. We thus searched for potential binding sites for these five TRFs in the highly conserved genomic regions around these three TRF genes and found several putative combinatorial regulatory regions. Chromatin immunoprecipitation analysis revealed that almost all of the putative regulatory DNA regions were bound by the TRFs as well as two coactivators (CBP and p300). The strong transcription-enhancing activity of the putative combinatorial regulatory region located downstream of the CEBPA gene was confirmed. EMSA demonstrated specific binding of these HNFs to the target DNA region. Finally, co-transfection reporter assays with various combinations of expression vectors for these HNF genes demonstrated the transcriptional activation of the CEBPA gene in a combinatorial manner by these TRFs.
Regulation of gene expression in eukaryotic genomes is established through a complex cooperative activity of proximal promoters and distant regulatory elements (REs) such as enhancers, repressors and silencers. We have developed a web server named DiRE, based on the Enhancer Identification (EI) method, for predicting distant regulatory elements in higher eukaryotic genomes, namely for determining their chromosomal location and functional characteristics. The server uses gene co-expression data, comparative genomics and profiles of transcription factor binding sites (TFBSs) to determine TFBS-association signatures that can be used for discriminating specific regulatory functions. DiRE's unique feature is its ability to detect REs outside of proximal promoter regions, as it takes advantage of the full gene locus to conduct the search. DiRE can predict common REs for any set of input genes for which the user has prior knowledge of co-expression, co-function or other biologically meaningful grouping. The server predicts function-specific REs consisting of clusters of specifically-associated TFBSs and it also scores the association of individual transcription factors (TFs) with the biological function shared by the group of input genes. Its integration with the Array2BIO server allows users to start their analysis with raw microarray expression data. The DiRE web server is freely available at http://dire.dcode.org.
The SCL (TAL1) transcription factor is a critical regulator of haematopoiesis and its expression is tightly controlled by multiple cis-acting regulatory elements. To elaborate further the DNA elements which control its regulation, we used genomic tiling microarrays covering 256 kb of the human SCL locus to perform a concerted analysis of chromatin structure and binding of regulatory proteins in human haematopoietic cell lines. This approach allowed us to characterise further or redefine known human SCL regulatory elements and led to the identification of six novel elements with putative regulatory function both up and downstream of the SCL gene. They bind a number of haematopoietic transcription factors (GATA1, E2A, LMO2, SCL, LDB1), CTCF or components of the transcriptional machinery and are associated with relevant histone modifications, accessible chromatin and low nucleosomal density. Functional characterisation shows that these novel elements are able to enhance or repress SCL promoter activity, have endogenous promoter function or enhancer-blocking insulator function. Our analysis opens up several areas for further investigation and adds new layers of complexity to our understanding of the regulation of SCL expression.
COMPEL is a database on composite regulatory elements, the basic structures of combinatorial regulation. Composite regulatory elements contain two closely situated binding sites for distinct transcription factors and represent minimal functional units providing combinatorial transcriptional regulation. Both specific factor–DNA and factor–factor interactions contribute to the function of composite elements (CEs). Information about the structure of known CEs and specific gene regulation achieved through such CEs appears to be extremely useful for promoter prediction, for gene function prediction and for applied gene engineering as well. The structure of the relational model of COMPEL is determined by the concept of molecular structure and regulatory role of CEs. Based on the set of a particular CE, a program has been developed for searching potential CEs in gene regulatory regions. WWW search and browse routines were developed for COMPEL release 3.0. The COMPEL database equipped with the search and browse tools is available at http://compel.bionet.nsc.ru/ . The program for prediction of potential CEs of NFAT type is available at http://compel.bionet.nsc.ru/FunSite.html and http://transfac.gbf.de/dbsearch/funsitep/s_comp.html
Genomes are organized into high-level 3-dimensional structures, and DNA elements separated by long genomic distances could functionally interact. Many transcription factors bind to regulatory DNA elements distant from gene promoters. While distal binding sites have been shown to regulate transcription by long-range chromatin interactions at a few loci, chromatin interactions and their impact on transcription regulation have not been investigated in a genome-wide manner. Therefore, we developed Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) for de novo detection of global chromatin interactions, and comprehensively mapped the chromatin interaction network bound by oestrogen receptor α (ERα) in the human genome. We found that most high-confidence remote ERα binding sites are anchored at gene promoters through long-range chromatin interactions, suggesting that ERα functions by extensive chromatin looping to bring genes together for coordinated transcriptional regulation. We propose that chromatin interactions constitute a primary mechanism for regulating transcription in mammalian genomes.
Transcription Regulatory Regions Database (TRRD) has been developed for accumulation of experimental information on the structure–function features of regulatory regions of eukaryotic genes. Each entry in TRRD corresponds to a particular gene and contains a description of structure–function features of its regulatory regions (transcription factor binding sites, promoters, enhancers, silencers, etc.) and gene expression regulation patterns. The current release, TRRD 4.2.5, comprises the description of 760 genes, 3403 expression patterns, and >4600 regulatory elements including 3604 transcription factor binding sites, 600 promoters and 152 enhancers. This information was obtained through annotation of 2537 scientific publications. TRRD 4.2.5 is available through the WWW at http://wwwmgs.bionet.nsc.ru/mgs/dbases/trrd4/
Expression of eukaryotic genes during development requires complex spatial-temporal regulation. This complex regulation is often achieved through the coordinated interaction of transcription regulatory elements in the promoters of the target genes. The identification and mapping of regulatory elements on a genome scale is crucial to understanding how gene expression is regulated. Chromatin immunoprecipitation is a standard method for assessing the occupancy of DNA binding proteins in vivo in their native chromatin context using antibodies. However, the standard chromatin immunoprecipitation procedure is time-consuming, labor-intensive and not suited for analyzing many samples simultaneously.
Recently, we have developed a simple ChIP protocol that requires fewer steps and less hands-on time. This protocol is compatible with both 96-well plate and single tube formats, and enables higher sensitivity and more reliable performance, as compared to conventional approaches.
We have successfully used this protocol to map various clinically relevant chromatin marks and controls across several cell types to quantitatively measure chromatin states. This analysis included a variety of marks corresponding to repressed, poised and active promoters, strong and weak enhancers, putative insulators, transcribed regions, as well as large-scale repressed and inactive domains. This study demonstrates the utility of this approach for the characterization of model cellular systems in perturbation studies with chemical probes.
Transcription factor binding to DNA requires both an appropriate binding element and suitably open chromatin, which together help to define regulatory elements within the genome. Current methods of identifying regulatory elements, such as promoters or enhancers, typically rely on sequence conservation, existing gene annotations or specific marks, such as histone modifications and p300 binding methods, each of which has its own biases.
Herein we show that an approach based on clustering of transcription factor peaks from high-throughput sequencing coupled with chromatin immunoprecipitation (ChIP-Seq) can be used to evaluate markers for regulatory elements. We used 67 data sets for 54 unique transcription factors distributed over two cell lines to create regulatory element clusters. By integrating the clusters from our approach with histone modifications and data for open chromatin, we identified general methylation of lysine 4 on histone H3 (H3K4me) as the most specific marker for transcription factor clusters. Clusters mapping to annotated genes showed distinct patterns in cluster composition related to gene expression and histone modifications. Clusters mapping to intergenic regions fall into two groups, either directly involved in transcription, including miRNAs and long noncoding RNAs, or facilitating transcription by long-range interactions. The latter clusters were specifically enriched with H3K4me1, but less with acetylation of lysine 27 on histone H3 or p300 binding.
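The abstract does not state how the transcription factor peaks were grouped into clusters. As a minimal sketch of one simple possibility, the Python below merges peaks that overlap (or lie within a chosen gap) on the same chromosome and records which factors contribute to each cluster. The tuple layout, the function name `cluster_peaks`, the `max_gap` parameter and the toy coordinates are all assumptions made for illustration, not the authors' method or data.

```python
from typing import Dict, List, Tuple

# Hypothetical peak layout: (chromosome, start, end, factor name).
Peak = Tuple[str, int, int, str]


def cluster_peaks(peaks: List[Peak], max_gap: int = 0) -> List[Dict]:
    """Merge peaks on the same chromosome that overlap or lie within max_gap bp."""
    clusters: List[Dict] = []
    # Sorting the tuples orders peaks by chromosome, then by start coordinate.
    for chrom, start, end, factor in sorted(peaks):
        if clusters:
            last = clusters[-1]
            if last["chrom"] == chrom and start <= last["end"] + max_gap:
                # Extend the current cluster and note the contributing factor.
                last["end"] = max(last["end"], end)
                last["factors"].add(factor)
                continue
        clusters.append(
            {"chrom": chrom, "start": start, "end": end, "factors": {factor}}
        )
    return clusters


# Toy input with made-up coordinates.
example_peaks = [
    ("chr1", 100, 200, "GATA1"),
    ("chr1", 150, 260, "TAL1"),
    ("chr1", 1000, 1100, "CTCF"),
    ("chr2", 120, 180, "GATA1"),
]
example_clusters = cluster_peaks(example_peaks)
```

Each resulting cluster could then be intersected with histone-mark or open-chromatin intervals in the same way, which is the kind of integration the paragraph above describes.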
By integrating genomewide data of transcription factor binding and chromatin structure and using our data-driven approach, we pinpointed the chromatin marks that best explain transcription factor association with different regulatory elements. Our results also indicate that a modest selection of transcription factors may be sufficient to map most regulatory elements in the human genome.
transcription factor; ChIP-Seq; histone modification; chromatin
Nuclear receptors are involved in a myriad of physiological processes, responding to ligands and binding to DNA at sequence-specific cis-regulatory elements. This binding occurs in the context of chromatin, a critical factor in regulating eukaryotic transcription. Recent high-throughput assays have examined nuclear receptor action genome-wide, advancing our understanding of receptor binding to regulatory elements. Here we discuss current knowledge of genome-wide response element occupancy by receptors, and the function of transcription factor networks in regulating nuclear receptor action. We highlight emerging roles for the epigenome, chromatin remodeling, histone modification, histone variants and long-range chromosomal interactions in nuclear receptor binding and receptor-dependent gene regulation. These mechanisms contribute importantly to the action of nuclear receptors in health and disease.
The regulatory DNA sequence elements that control the expression of the hepatitis B virus major surface antigen gene in the hepatoblastoma cell line HepG2 were analyzed by using transient transfection assays. In this system, the hepatitis B virus enhancer increases transcription from the surface antigen promoter approximately twofold. The promoter elements regulating the expression of this gene are within a 200-nucleotide sequence located immediately upstream of the transcription initiation sites. The promoter consists of an 85-nucleotide distal element which increases transcription from the surface antigen gene by two- to fourfold and a proximal element of approximately 115 nucleotides which is essential for transcriptional activity. The proximal and distal promoter elements were shown to bind factors present in HepG2 nuclear extracts, which is consistent with the regulatory role demonstrated for these sequences. The regulatory role of these promoter sequences in the hepatocellular carcinoma cell lines PLC/PRF/5 and Hep3B was also demonstrated, indicating similar transcriptional regulation of the surface antigen gene in each of these differentiated hepatoma cell lines.
Increasing evidence shows that whole genomes of eukaryotes are almost entirely transcribed into both protein coding genes and an enormous number of non-protein-coding RNAs (ncRNAs). Therefore, revealing the underlying regulatory mechanisms of transcripts becomes imperative. However, for a complete understanding of transcriptional regulatory mechanisms, we need to identify the regions in which they are found. We will call these transcriptional regulation regions, or TRRs, which can be considered functional regions containing a cluster of regulatory elements that cooperatively recruit transcriptional factors for binding and then regulating the expression of transcripts.
We constructed a hierarchical stochastic language (HSL) model for the identification of core TRRs in yeast based on regulatory cooperation among TRR elements. The HSL model trained on yeast achieved comparable accuracy in predicting TRRs in other species, e.g., fruit fly, human, and rice, thus demonstrating the conservation of TRRs across species. The HSL model was also used to identify the TRRs of genes, such as p53 or OsALYL1, as well as microRNAs. In addition, the ENCODE regions were examined by HSL, and TRRs were found to be pervasively located in the genome.
Our findings indicate that 1) the HSL model can be used to accurately predict core TRRs of transcripts across species and 2) core TRRs identified by HSL are proper candidates for the further scrutiny of specific regulatory elements and mechanisms. Meanwhile, the regulatory activity taking place in the abundant numbers of ncRNAs might account for the ubiquitous presence of TRRs across the genome. In addition, we found that the TRRs of protein coding genes and ncRNAs are similar in structure, with the latter being more conserved than the former.
The binding of transcription factors to specific regulatory sequence elements is a primary mechanism for controlling gene transcription. Eukaryotic genes are often regulated by several transcription factors whose binding sites are tightly clustered and form cis-regulatory modules. In this paper, we present a web server, CREME, for identifying and visualizing cis-regulatory modules in the promoter regions of a given set of potentially co-regulated genes. CREME relies on a database of putative transcription factor binding sites that have been annotated across the human genome using a library of position weight matrices and evolutionary conservation with the mouse and rat genomes. A search algorithm is applied to this data set to identify combinations of transcription factors whose binding sites tend to co-occur in close proximity in the promoter regions of the input gene set. The identified cis-regulatory modules are statistically scored and significant combinations are reported and graphically visualized. Our web server is available at http://creme.dcode.org.
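The search for transcription factors whose sites co-occur in close proximity can be illustrated with a small Python sketch. It counts, for every pair of factors, the number of input promoters in which sites for both factors fall within a fixed window. This is only a toy version of the idea: the data layout, the function name `cooccurring_pairs`, the `window` parameter and the example sites are assumptions, and the sketch does no statistical scoring, which the abstract says CREME does perform.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical layout: promoter name -> list of (position, factor name) sites.
PromoterSites = Dict[str, List[Tuple[int, str]]]


def cooccurring_pairs(promoter_sites: PromoterSites, window: int = 100) -> Counter:
    """Count promoters in which sites for two different factors lie within `window` bp."""
    counts: Counter = Counter()
    for sites in promoter_sites.values():
        pairs_here = set()
        for (pos_a, tf_a), (pos_b, tf_b) in combinations(sites, 2):
            if tf_a != tf_b and abs(pos_a - pos_b) <= window:
                # Sort the names so that (A, B) and (B, A) are the same pair.
                pairs_here.add(tuple(sorted((tf_a, tf_b))))
        # Each promoter contributes at most once per factor pair.
        counts.update(pairs_here)
    return counts


# Toy input with made-up positions.
example_sites = {
    "geneA": [(10, "HNF1"), (60, "HNF4"), (400, "CEBP")],
    "geneB": [(30, "HNF4"), (90, "HNF1")],
    "geneC": [(5, "HNF1"), (300, "HNF4")],
}
example_counts = cooccurring_pairs(example_sites, window=100)
```

In a real analysis the counts would be compared with those expected in a background set of promoters before any pair is reported as a candidate module.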
Combinatorial interactions of sequence-specific trans-acting factors with localized genomic cis-element clusters are the principal mechanism for regulating tissue-specific and developmental gene expression. With the emergence of expanding numbers of genome-wide expression analyses, the identification of the cis-elements responsible for specific patterns of transcriptional regulation represents a critical area of investigation. Computational methods for the identification of functional cis-regulatory modules are difficult to devise, principally because of the short length and degenerate nature of individual cis-element binding sites and the inherent complexity that is generated by combinatorial interactions within cis-clusters. Filtering candidate cis-element clusters based on phylogenetic conservation is helpful for an individual ortholog gene pair, but combining data from cis-conservation and coordinate expression across multiple genes is a more difficult problem. To approach this, we have extended an ortholog gene-pair database with additional analytical architecture to allow for the analysis and identification of maximal numbers of compositionally similar and phylogenetically conserved cis-regulatory element clusters from a list of user-selected genes. The system has been successfully tested with a series of functionally related and microarray profile-based co-expressed ortholog pairs of promoters and genes using known regulatory regions as training sets and co-expressed genes in the olfactory and immunohematologic systems as test sets. CisMols Analyzer is accessible via a Web interface at .
Genomes are organized into three-dimensional structures, adopting higher-order conformations inside the micron-sized nuclear spaces 7, 2, 12. Such architectures are not random and involve interactions between gene promoters and regulatory elements 13. The binding of transcription factors to specific regulatory sequences brings about a network of transcription regulation and coordination 1, 14.
Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) was developed to identify these higher-order chromatin structures 5,6. Cells are fixed and interacting loci are captured by covalent DNA-protein cross-links. To minimize non-specific noise and reduce complexity, as well as to increase the specificity of the chromatin interaction analysis, chromatin immunoprecipitation (ChIP) is used against specific protein factors to enrich chromatin fragments of interest before proximity ligation. Ligation involving half-linkers subsequently forms covalent links between pairs of DNA fragments tethered together within individual chromatin complexes. The flanking MmeI restriction enzyme sites in the half-linkers allow extraction of paired end tag-linker-tag constructs (PETs) upon MmeI digestion. As the half-linkers are biotinylated, these PET constructs are purified using streptavidin-magnetic beads. The purified PETs are ligated with next-generation sequencing adaptors and a catalog of interacting fragments is generated via next-generation sequencers such as the Illumina Genome Analyzer. Mapping and bioinformatics analysis is then performed to identify ChIP-enriched binding sites and ChIP-enriched chromatin interactions 8.
We have produced a video to demonstrate critical aspects of the ChIA-PET protocol, especially the preparation of ChIP as the quality of ChIP plays a major role in the outcome of a ChIA-PET library. As the protocols are very long, only the critical steps are shown in the video.
Genetics; Issue 62; ChIP; ChIA-PET; Chromatin Interactions; Genomics; Next-Generation Sequencing
With the increasing number of eukaryotic genomes available, high-throughput automated tools for identification of regulatory DNA sequences are becoming increasingly feasible. Several computational approaches for the prediction of regulatory elements were recently developed. Here we combine the prediction of clusters of binding sites for transcription factors with context information taken from genome annotations. Target Explorer automates the entire process from the creation of a customized library of binding sites for known transcription factors through the prediction and annotation of putative target genes that are potentially regulated by these factors. It was specifically designed for the well-annotated Drosophila melanogaster genome, but most options can be used for sequences from other genomes as well. Target Explorer is available at http://trantor.bioc.columbia.edu/Target_Explorer/
The mammalian genome is packed tightly in the nucleus of the cell. This packing is primarily facilitated by histone proteins and results in an ordered organization of the genome in chromosome territories that can be roughly divided into heterochromatic and euchromatic domains. On top of this organization, several distinct gene regulatory elements on the same chromosome or other chromosomes are thought to dynamically communicate via chromatin looping. Advances in genome-wide technologies have revealed the existence of a plethora of these regulatory elements in various eukaryotic genomes. These regulatory elements are defined by particular in vitro assays as promoters, enhancers, insulators, and boundary elements. However, recent studies indicate that the in vivo distinction between these elements is often less strict. Regulatory elements are bound by a mixture of common and lineage-specific transcription factors which mediate the long-range interactions between these elements. Inappropriate modulation of the binding of these transcription factors can alter the interactions between regulatory elements, which in turn leads to aberrant gene expression with disease as an ultimate consequence. Here we discuss the bi-modal behavior of regulatory elements that act in cis (with a focus on enhancers), how their activity is modulated by transcription factor binding and the effect this has on gene regulation.
enhancer; transcription factor; chromatin looping; transcription; cis-regulation
One of the key mechanisms of transcriptional control is the specific connection between transcription factors (TFs) and cis-regulatory elements in gene promoters. The elucidation of these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which for a given protein sequence (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their cis-acting DNA motif. While existing tools were extended and adapted for performing the latter two prediction steps, the first two steps are based on a novel numeric sequence representation which allows for combining existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.
Originating from COMPEL, the TRANSCompel® database emphasizes the key role of specific interactions between transcription factors binding to their target sites providing specific features of gene regulation in a particular cellular context. Composite regulatory elements contain two closely situated binding sites for distinct transcription factors and represent minimal functional units providing combinatorial transcriptional regulation. Both specific factor–DNA and factor–factor interactions contribute to the function of composite elements (CEs). Information about the structure of known CEs and specific gene regulation achieved through such CEs appears to be extremely useful for promoter prediction, for gene function prediction and for applied gene engineering as well. Each database entry corresponds to an individual CE within a particular gene and contains information about two binding sites, two corresponding transcription factors and experiments confirming cooperative action between transcription factors. The COMPEL database, equipped with the search and browse tools, is available at http://www.gene-regulation.com/pub/databases.html#transcompel. Moreover, we have developed the program CATCH™ for searching potential CEs in DNA sequences. It is freely available as CompelPatternSearch at http://compel.bionet.nsc.ru/FunSite/CompelPatternSearch.html.
Combinatorial regulation of transcription factors (TFs) is important in determining the complex gene expression patterns particularly in higher organisms. Deciphering regulatory rules between cooperative TFs is a critical step towards understanding the mechanisms of combinatorial regulation.
We present here a Bayesian network approach called GBNet to search for DNA motifs that may be cooperative in transcriptional regulation and the sequence constraints that these motifs may satisfy. We showed that GBNet outperformed the other available methods in the simulated and the yeast data. We also demonstrated the usefulness of GBNet on learning regulatory rules between YY1, a human TF, and its co-factors. Most of the rules learned by GBNet on YY1 and co-factors were supported by literature. In addition, a spacing constraint between YY1 and E2F was also supported by independent TF binding experiments.
We thus conclude that GBNet is a useful tool for deciphering the "grammar" of transcriptional regulation.
The expression of eukaryotic genes is regulated by cis-regulatory elements such as promoters and enhancers, which bind sequence-specific DNA-binding proteins. One of the great challenges in the gene regulation field is to characterise these elements. This involves the identification of transcription factor (TF) binding sites within regulatory elements that are occupied in a defined regulatory context. Digestion with DNase and the subsequent analysis of regions protected from cleavage (DNase footprinting) has for many years been used to identify specific binding sites occupied by TFs at individual cis-elements with high resolution. This methodology has recently been adapted for high-throughput sequencing (DNase-seq). In this study, we describe an imbalance in the DNA strand-specific alignment information of DNase-seq data surrounding protein–DNA interactions that allows accurate prediction of occupied TF binding sites. Our study introduces a novel algorithm, Wellington, which considers the imbalance in this strand-specific information to efficiently identify DNA footprints. This algorithm significantly enhances specificity by reducing the proportion of false positives and requires significantly fewer predictions than previously reported methods to recapitulate an equal amount of ChIP-seq data. We also provide an open-source software package, pyDNase, which implements the Wellington algorithm to interface with DNase-seq data and expedite analyses.
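The idea of a strand-specific imbalance around a bound site can be made concrete with a toy score. The Python sketch below is not the Wellington algorithm and makes no claim about how pyDNase is implemented. It assumes, purely for illustration, that forward-strand cuts accumulate in the shoulder upstream of a protected site and reverse-strand cuts in the shoulder downstream, and it reports the fraction of cuts that follow this pattern. The function name, the per-base count lists and the `shoulder` width are all hypothetical.

```python
from typing import List


def strand_imbalance(
    fwd_cuts: List[int],
    rev_cuts: List[int],
    start: int,
    end: int,
    shoulder: int = 35,
) -> float:
    """Toy imbalance score for a candidate footprint spanning [start, end).

    fwd_cuts and rev_cuts hold per-base cut counts for the two strands.
    The score is the share of counted cuts that are forward-strand cuts in
    the upstream shoulder or reverse-strand cuts in the downstream shoulder
    (the direction of the imbalance is an assumption of this sketch).
    """
    up_lo = max(0, start - shoulder)
    down_hi = min(len(rev_cuts), end + shoulder)
    fwd_upstream = sum(fwd_cuts[up_lo:start])
    rev_downstream = sum(rev_cuts[end:down_hi])
    fwd_inside = sum(fwd_cuts[start:end])
    rev_inside = sum(rev_cuts[start:end])
    flanking = fwd_upstream + rev_downstream
    total = flanking + fwd_inside + rev_inside
    return flanking / total if total else 0.0


# Made-up counts: a protected stretch at positions 3 to 5.
protected = strand_imbalance([5, 5, 5, 0, 0, 0, 1, 1, 1],
                             [1, 1, 1, 0, 0, 0, 5, 5, 5],
                             start=3, end=6, shoulder=3)
# Made-up counts: uniform cutting with no protection.
uniform = strand_imbalance([2] * 9, [2] * 9, start=3, end=6, shoulder=3)
```

A score near 1 indicates that nearly all counted cuts sit in the expected shoulders, while uniform cutting gives an intermediate value; a real footprinting method would replace this ratio with a proper statistical test.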
The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence.
Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models.
Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition.
Systematic annotation of gene regulatory elements is a major challenge in genome science. Direct mapping of chromatin modification marks and transcriptional factor binding sites genome-wide 1,2 has successfully identified specific subtypes of regulatory elements 3. In Drosophila several pioneering studies have provided genome-wide identification of Polycomb-Response Elements 4, chromatin states 5, transcription factor binding sites (TFBS) 6–9, PolII regulation 8, and insulator elements 10; however, comprehensive annotation of the regulatory genome remains a significant challenge. Here we describe results from the modENCODE cis-regulatory annotation project. We produced a map of the Drosophila melanogaster regulatory genome based on more than 300 chromatin immuno-precipitation (ChIP) datasets for eight chromatin features, five histone deacetylases (HDACs) and thirty-eight site-specific transcription factors (TFs) at different stages of development. Using these data we inferred more than 20,000 candidate regulatory elements and we validated a subset of predictions for promoters, enhancers, and insulators in vivo. We also identified nearly 2,000 genomic regions of dense TF binding associated with chromatin activity and accessibility. We discovered hundreds of new TF co-binding relationships and defined a TF network with over 800 potential regulatory relationships.
DNA-binding transcriptional regulators interpret the genome's regulatory code by binding to specific sequences to induce or repress gene expression 1. Comparative genomics has recently been used to identify potential cis-regulatory sequences within the yeast genome on the basis of phylogenetic conservation 2–6, but this information alone does not reveal if or when transcriptional regulators occupy these binding sites. We have constructed an initial map of yeast's transcriptional regulatory code by identifying the sequence elements that are bound by regulators under various conditions and that are conserved among Saccharomyces species. The organization of regulatory elements in promoters and the environment-dependent use of these elements by regulators are discussed. We find that environment-specific use of regulatory elements predicts mechanistic models for the function of a large population of yeast's transcriptional regulators.
An important step in understanding the conditions that specify gene expression is the recognition of gene regulatory elements. Due to high diversity of different types of transcription factors and their DNA binding preferences, it is a challenging problem to establish an accurate model for recognition of functional regulatory elements in promoters of eukaryotic genes.
We present a method for precise prediction of a large group of transcription factor binding sites – steroid hormone response elements. We use a large training set of experimentally confirmed steroid hormone response elements and adapt a sequence-based statistical method, the position weight matrix, to identify binding sites in query sequences. To estimate the accuracy level, a table of correspondence of sensitivity vs. specificity values is constructed from a number of independent tests. Furthermore, a feed-forward neural network is used for cross-verification of the predicted response elements on genomic sequences.
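Position weight matrix scanning can be sketched in a few lines of Python. The example below builds a log-odds matrix from a handful of aligned sites, using a pseudocount and a uniform background, and scores every window of a query sequence. It illustrates the general technique only and is not the authors' adapted method. The four training sites, the function names `build_pwm` and `scan`, the pseudocount and the background frequency of 0.25 are assumptions chosen for illustration, and the sketch expects sequences containing only A, C, G and T.

```python
import math
from typing import Dict, List, Tuple


def build_pwm(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Build a log-odds position weight matrix from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            # log2 of the smoothed base frequency over a uniform background.
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in "ACGT"
        })
    return pwm


def scan(sequence: str, pwm: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Return (offset, score) for every window of the sequence."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][sequence[i + j]] for j in range(width)))
        for i in range(len(sequence) - width + 1)
    ]


# Toy training set (made-up sites, not data from the study).
toy_sites = ["AGAACA", "AGAACA", "GGTACA", "AGAACA"]
toy_pwm = build_pwm(toy_sites)
hits = scan("TTAGAACATT", toy_pwm)
best_offset, best_score = max(hits, key=lambda hit: hit[1])
```

Choosing the score cutoff above which a window is reported is what trades sensitivity against specificity, which is the relationship the table of correspondence described above is meant to capture.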
The proposed method demonstrates a high level of accuracy and can therefore be used for de novo prediction of hormone response elements. Experimental results support our analysis by showing significant improvement of the proposed method over previous HRE recognition methods.
Hox transcription factors specify numerous cell fates along the anterior-posterior axis by regulating the expression of downstream target genes. While expression analysis has uncovered large numbers of de-regulated genes in cells with altered Hox activity, determining which are direct versus indirect targets has remained a significant challenge. Here, we characterize the DNA binding activity of Hox transcription factor complexes on eight experimentally verified cis-regulatory elements. Hox factors regulate the activity of each element by forming protein complexes with two cofactor proteins, Extradenticle (Exd) and Homothorax (Hth). Using comparative DNA binding assays, we found that a number of flexible arrangements of Hox, Exd, and Hth binding sites mediate cooperative transcription factor complexes. Moreover, analysis of a Distal-less regulatory element (DMXR) that is repressed by abdominal Hox factors revealed that suboptimal binding sites can be combined to form high affinity transcription complexes. Lastly, we determined that the anterior Hox factors are more dependent upon Exd and Hth for complex formation than posterior Hox factors. Based upon these findings, we suggest a general set of guidelines to serve as a basis for designing bioinformatics algorithms aimed at identifying Hox regulatory elements using the wealth of recently sequenced genomes.
Hox; Extradenticle; Homothorax; transcription factor; cis-regulation