1.  ETO family protein Mtg16 regulates the balance of dendritic cell subsets by repressing Id2 
The Journal of Experimental Medicine  2014;211(8):1623-1635.
Transcriptional cofactor of the ETO family Mtg16 promotes pDCs and restricts cDC differentiation in part by repressing Id2.
Dendritic cells (DCs) comprise two major subsets, the interferon (IFN)-producing plasmacytoid DCs (pDCs) and antigen-presenting classical DCs (cDCs). The development of pDCs is promoted by E protein transcription factor E2-2, whereas E protein antagonist Id2 is specifically absent from pDCs. Conversely, Id2 is prominently expressed in cDCs and promotes CD8+ cDC development. The mechanisms that control the balance between E and Id proteins during DC subset specification remain unknown. We found that the loss of Mtg16, a transcriptional cofactor of the ETO protein family, profoundly impaired pDC development and pDC-dependent IFN response. The residual Mtg16-deficient pDCs showed aberrant phenotype, including the expression of myeloid marker CD11b. Conversely, the development of cDC progenitors (pre-DCs) and of CD8+ cDCs was enhanced. Genome-wide expression and DNA-binding analysis identified Id2 as a direct target of Mtg16. Mtg16-deficient cDC progenitors and pDCs showed aberrant induction of Id2, and the deletion of Id2 facilitated the impaired development of Mtg16-deficient pDCs. Thus, Mtg16 promotes pDC differentiation and restricts cDC development in part by repressing Id2, revealing a cell-intrinsic mechanism that controls subset balance during DC development.
PMCID: PMC4113936  PMID: 24980046
2.  SELEX-seq, a method for characterizing the complete repertoire of binding site preferences for transcription factor complexes 
The closely related members of the Hox family of homeodomain transcription factors have similar DNA-binding preferences as monomers, yet carry out distinct functions in vivo. Transcription factors often bind DNA as multiprotein complexes, raising the possibility that complex formation might modify their DNA binding specificities. To test this hypothesis we developed a new experimental and computational platform, termed SELEX-seq, to characterize DNA binding specificities of Hox-based multiprotein complexes. We found that complex formation with the same cofactor reveals latent specificities that are not observed for monomeric Hox factors. The findings from this in vitro platform are consistent with in vivo data, and the ‘latent specificity’ concept serves as a precedent for how the specificities of similar transcription factors might be distinguished in vivo. Importantly, the SELEX-seq platform is flexible and can be used to determine the relative affinities to any DNA sequence for any transcription factor or multiprotein complex.
PMCID: PMC4265583  PMID: 25151169
Hox proteins; transcription factor specificity; Extradenticle; Pbx; SELEX; next-generation sequencing; computational analysis
3.  PQM-1 complements DAF-16 as a key transcriptional regulator of DAF-2-mediated development and longevity 
Cell  2013;154(3):676-690.
Reduced insulin/IGF-1-like signaling (IIS) extends C. elegans lifespan by upregulating stress response (Class I) and downregulating other (Class II) genes through a mechanism that depends on the conserved transcription factor DAF-16/FOXO. By integrating genomewide mRNA expression responsiveness to DAF-16 with genomewide in vivo binding data for a compendium of transcription factors, we discovered that PQM-1 is the elusive transcriptional activator that directly controls development (Class II) genes by binding to the DAF-16 associated element (DAE). DAF-16 directly regulates Class I genes only, through the DAF-16 binding element (DBE). Loss of PQM-1 suppresses daf-2 longevity and further slows development. Surprisingly, the nuclear localization of PQM-1 and DAF-16 is controlled by IIS in opposite ways, and was also found to be mutually antagonistic. We observe progressive loss of nuclear PQM-1 with age, explaining declining expression of PQM-1 targets. Together, our data suggest an elegant mechanism for balancing stress response and development.
PMCID: PMC3763726  PMID: 23911329
4.  Characterizing a collective and dynamic component of chromatin immunoprecipitation enrichment profiles in yeast 
BMC Genomics  2014;15(1):494.
Recent chromatin immunoprecipitation (ChIP) experiments in fly, mouse, and human have revealed the existence of high-occupancy target (HOT) regions or “hotspots” that show enrichment across many assayed DNA-binding proteins. Similar co-enrichment observed in yeast so far has been treated as artifactual, and has not been fully characterized.
Here we reanalyze ChIP data from both array-based and sequencing-based experiments to show that in the yeast S. cerevisiae, the collective enrichment phenomenon is strongly associated with proximity to noncoding RNA genes and with nucleosome depletion. DNA sequence motifs that confer binding affinity for the proteins are largely absent from these hotspots, suggesting that protein-protein interactions play a prominent role. The hotspots are condition-specific, suggesting that they reflect a chromatin state or protein state, and are not a static feature of underlying sequence. Additionally, only a subset of all assayed factors is associated with these loci, suggesting that the co-enrichment cannot be simply explained by a chromatin state that is universally more prone to immunoprecipitation.
Together our results suggest that the co-enrichment patterns observed in yeast represent transcription factor co-occupancy. More generally, they make clear that great caution must be used when interpreting ChIP enrichment profiles for individual factors in isolation, as they will include factor-specific as well as collective contributions.
PMCID: PMC4124144  PMID: 24947676
Transcription factors; Chromatin immunoprecipitation; Saccharomyces cerevisiae
5.  Harnessing Natural Sequence Variation to Dissect Posttranscriptional Regulatory Networks in Yeast 
G3: Genes|Genomes|Genetics  2014;4(8):1539-1553.
Understanding how genomic variation influences phenotypic variation through the molecular networks of the cell is one of the central challenges of biology. Transcriptional regulation has received much attention, but equally important is the posttranscriptional regulation of mRNA stability. Here we applied a systems genetics approach to dissect posttranscriptional regulatory networks in the budding yeast Saccharomyces cerevisiae. Quantitative sequence-to-affinity models were built from high-throughput in vivo RNA binding protein (RBP) binding data for 15 yeast RBPs. Integration of these models with genome-wide mRNA expression data allowed us to estimate protein-level RBP regulatory activity for individual segregants from a genetic cross between two yeast strains. Treating these activities as a quantitative trait, we mapped trans-acting loci (activity quantitative trait loci, or aQTLs) that act via posttranscriptional regulation of transcript stability. We predicted and experimentally confirmed that a coding polymorphism at the IRA2 locus modulates Puf4p activity. Our results also indicate that Puf3p activity is modulated by distinct loci, depending on whether it acts via the 5′ or the 3′ untranslated region of its target mRNAs. Together, our results validate a general strategy for dissecting the connectivity between posttranscriptional regulators and their upstream signaling pathways.
PMCID: PMC4132183  PMID: 24938291
RNA-binding proteins; cis-regulatory analysis; inference of protein-level regulatory activity; quantitative trait locus (QTL) mapping; Pumilio/FBF homology domain proteins (Puf3p Puf4p)
7.  Nuclear pore component Nup98 is a potential tumor suppressor and regulates post-transcriptional expression of select p53 target genes 
Molecular cell  2012;48(5):799-810.
The p53 tumor suppressor utilizes multiple mechanisms to selectively regulate its myriad target genes, which in turn mediate diverse cellular processes. Here, using conventional and single molecule mRNA analyses, we demonstrate that the nucleoporin Nup98 is required for full expression of p21, a key effector of the p53 pathway, but not several other p53 target genes. Nup98 regulates p21 mRNA levels by a post-transcriptional mechanism in which a complex containing Nup98 and the p21 mRNA 3′-UTR protects p21 mRNA from degradation by the exosome. An in silico approach revealed another p53 target (14-3-3σ) to be similarly regulated by Nup98. The expression of Nup98 is reduced in murine and human hepatocellular carcinomas (HCC) and correlates with p21 expression in HCC patients. Our study elucidates a previously unrecognized function of wild-type Nup98 in regulating select p53 target genes that is distinct from the well-characterized oncogenic properties of Nup98 fusion proteins.
PMCID: PMC3525737  PMID: 23102701
8.  Evaluation of methods for modeling transcription-factor sequence specificity 
Nature biotechnology  2013;31(2):126-134.
Genomic analyses often involve scanning for potential transcription-factor (TF) binding sites using models of the sequence specificity of DNA binding proteins. Many approaches have been developed to model and learn a protein’s binding specificity, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein binding microarray data for 66 mouse TFs belonging to various families. For 9 TFs, we also scored the resulting motif models on in vivo data, and found that the best in vitro–derived motifs performed similarly to motifs derived from in vivo data. Our results indicate that simple models based on mononucleotide position weight matrices learned by the best methods perform similarly to more complex models for most TFs examined, but fall short in specific cases (<10%). In addition, the best-performing motifs typically have relatively low information content, consistent with widespread degeneracy in eukaryotic TF sequence preferences.
PMCID: PMC3687085  PMID: 23354101
9.  Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins 
Cell  2011;147(6):1270-1282.
Members of transcription factor families typically have similar DNA binding specificities yet execute unique functions in vivo. Transcription factors often bind DNA as multiprotein complexes, raising the possibility that complex formation might modify their DNA binding specificities. To test this hypothesis, we developed an experimental and computational platform, SELEX-seq, that can be used to determine the relative affinities to any DNA sequence for any transcription factor complex. Applying this method to all eight Drosophila Hox proteins, we show that they obtain novel recognition properties when they bind DNA with the dimeric cofactor Extradenticle-Homothorax (Exd). Exd-Hox specificities group into three main classes that obey Hox gene collinearity rules, and DNA structure predictions suggest that anterior and posterior Hox proteins prefer DNA sequences with distinct minor groove topographies. Together, these data suggest that emergent DNA recognition properties revealed by interactions with cofactors contribute to transcription factor specificities in vivo.
PMCID: PMC3319069  PMID: 22153072
10.  Perturbation-based analysis and modeling of combinatorial regulation in the yeast sulfur assimilation pathway 
Molecular Biology of the Cell  2012;23(15):2993-3007.
Here we establish the utility of a recently described perturbative method to study complex regulatory circuits in vivo. By combining rapid modulation of single TFs under physiological conditions with genome-wide expression analysis, we elucidate several novel regulatory features within the pathways of sulfur assimilation and beyond.
In yeast, the pathways of sulfur assimilation are combinatorially controlled by five transcriptional regulators (three DNA-binding proteins [Met31p, Met32p, and Cbf1p], an activator [Met4p], and a cofactor [Met28p]) and a ubiquitin ligase subunit (Met30p). This regulatory system exerts combinatorial control not only over sulfur assimilation and methionine biosynthesis, but also on many other physiological functions in the cell. Recently we characterized a gene induction system that, upon the addition of an inducer, results in near-immediate transcription of a gene of interest under physiological conditions. We used this to perturb levels of single transcription factors during steady-state growth in chemostats, which facilitated distinction of direct from indirect effects of individual factors dynamically through quantification of the subsequent changes in genome-wide patterns of gene expression. We were able to show directly that Cbf1p acts sometimes as a repressor and sometimes as an activator. We also found circumstances in which Met31p/Met32p function as repressors, as well as those in which they function as activators. We elucidated and numerically modeled feedback relationships among the regulators, notably feedforward regulation of Met32p (but not Met31p) by Met4p that generates dynamic differences in abundance that can account for the differences in function of these two proteins despite their identical binding sites.
PMCID: PMC3408425  PMID: 22696683
11.  Combinatorial control of diverse metabolic and physiological functions by transcriptional regulators of the yeast sulfur assimilation pathway 
Molecular Biology of the Cell  2012;23(15):3008-3024.
The sulfur assimilation pathway is used to understand how combinatorial transcription coordinates cellular processes. Global gene expression was measured in yeast lacking different combinations of transcription factors in order to determine how these factors coordinate sulfur assimilation with diverse metabolic and physiological processes.
Methionine abundance affects diverse cellular functions, including cell division, redox homeostasis, survival under starvation, and oxidative stress response. Regulation of the methionine biosynthetic pathway involves three DNA-binding proteins: Met31p, Met32p, and Cbf1p. We hypothesized that there exists a “division of labor” among these proteins that facilitates coordination of methionine biosynthesis with diverse biological processes. To explore combinatorial control in this regulatory circuit, we deleted CBF1, MET31, and MET32 individually and in combination in a strain lacking methionine synthase. We followed genome-wide gene expression as these strains were starved for methionine. Using a combination of bioinformatic methods, we found that these regulators control genes involved in biological processes downstream of sulfur assimilation; many of these processes had not previously been documented as methionine dependent. We also found that the different factors have overlapping but distinct functions. In particular, Met31p and Met32p are important in regulating methionine metabolism, whereas Cbf1p functions as a “generalist” transcription factor that is not specific to methionine metabolism. In addition, Met31p and Met32p appear to regulate iron–sulfur cluster biogenesis through direct and indirect mechanisms and have distinguishable target specificities. Finally, CBF1 deletion sometimes has the opposite effect on gene expression from MET31 and MET32 deletion.
PMCID: PMC3408426  PMID: 22696679
12.  Systematic protein location mapping reveals five principal chromatin types in Drosophila cells 
Cell  2010;143(2):212-224.
Chromatin is important for the regulation of transcription and other functions, yet the diversity of chromatin composition and the distribution along chromosomes is still poorly characterized. By integrative analysis of genome-wide binding maps of 53 broadly selected chromatin components in Drosophila cells, we show that the genome is segmented into five principal chromatin types that are defined by unique, yet overlapping combinations of proteins, and form domains that can extend over >100 kb. We identify a repressive chromatin type that covers about half of the genome and lacks classic heterochromatin markers. Furthermore, transcriptionally active euchromatin consists of two types that differ in molecular organization and H3K36 methylation, and regulate distinct classes of genes. Finally, we provide evidence that the different chromatin types help to target DNA-binding factors to specific genomic regions. These results provide a global view of chromatin diversity and domain organization in a metazoan cell.
PMCID: PMC3119929  PMID: 20888037
13.  Identifying the genetic determinants of transcription factor activity 
Genome-wide messenger RNA expression levels are highly heritable. However, the molecular mechanisms underlying this heritability are poorly understood. The influence of trans-acting polymorphisms is often mediated by changes in the regulatory activity of one or more sequence-specific transcription factors (TFs). We use a method that exploits prior information about the DNA-binding specificity of each TF to estimate its genotype-specific regulatory activity. To this end, we perform linear regression of genotype-specific differential mRNA expression on TF-specific promoter-binding affinity. Treating inferred TF activity as a quantitative trait and mapping it across a panel of segregants from an experimental genetic cross allows us to identify trans-acting loci (‘aQTLs') whose allelic variation modulates the TF. A few of these aQTL regions contain the gene encoding the TF itself; several others contain a gene whose protein product is known to interact with the TF. Our method is strictly causal, as it only uses sequence-based features as predictors. Application to budding yeast demonstrates a dramatic increase in statistical power, compared with existing methods, to detect locus-TF associations and trans-acting loci. Our aQTL mapping strategy also succeeds in mouse.
Genetic sequence variation naturally perturbs mRNA expression levels in the cell. In recent years, analysis of parallel genotyping and expression profiling data for segregants from genetic crosses between parental strains has revealed that mRNA expression levels are highly heritable. Expression quantitative trait loci (eQTLs), whose allelic variation regulates the expression level of individual genes, have successfully been identified (Brem et al, 2002; Schadt et al, 2003). The molecular mechanisms underlying the heritability of mRNA expression are poorly understood. However, they are likely to involve mediation by transcription factors (TFs). We present a new transcription-factor-centric method that greatly increases our ability to understand what drives the genetic variation in mRNA expression (Figure 1). Our method identifies genomic loci (‘aQTLs') whose allelic variation modulates the protein-level activity of specific TFs. To map aQTLs, we integrate genotyping and expression profiling data with quantitative prior information about DNA-binding specificity of transcription factors in the form of position-specific affinity matrices (Bussemaker et al, 2007). We applied our method in two different organisms: budding yeast and mouse.
In our approach, the inferred TF activity is explicitly treated as a quantitative trait, and genetically mapped. The decrease of ‘phenotype space' from that of all genes (in the eQTL approach) to that of all TFs (in our aQTL approach) increases the statistical power to detect trans-acting loci in two distinct ways. First, as each inferred TF activity is derived from a large number of genes, it is far less noisy than mRNA levels of individual genes. Second, the number of trait/marker combinations that needs to be tested for statistical significance in parallel is roughly two orders of magnitude smaller than for eQTLs. We identified a total of 103 locus-TF associations, a more than six-fold improvement over the 17 locus-TF associations identified by several existing methods (Brem et al, 2002; Yvert et al, 2003; Lee et al, 2006; Smith and Kruglyak, 2008; Zhu et al, 2008). The total number of distinct genomic loci identified as an aQTL equals 31, which includes 11 of the 13 previously identified eQTL hotspots (Smith and Kruglyak, 2008).
To better understand the mechanisms underlying the identified genetic linkages, we examined the genes within each aQTL region. First, we found four ‘local' aQTLs, which encompass the gene encoding the TF itself. This includes the known polymorphism in the HAP1 gene (Brem et al, 2002), but also novel predictions of trans-acting polymorphisms in RFX1, STB5, and HAP4. Second, using high-throughput protein–protein interaction data, we identified putative causal genes for several aQTLs. For example, we predict that a polymorphism in the cyclin-dependent kinase CDC28 antagonistically modulates the functionally distinct cell cycle regulators Fkh1 and Fkh2. In this and other cases, our approach naturally accounts for post-translational modulation of TF activity at the protein level.
We validated our ability to predict locus-TF associations in yeast using gene expression profiles of allele replacement strains from a previous study (Smith and Kruglyak, 2008). Chromosome 15 contains an aQTL whose allelic status influences the activity of no fewer than 30 distinct TFs. This locus includes IRA2, which controls intracellular cAMP levels. We used the gene expression profile of IRA2 replacement strains to confirm that the polymorphism within IRA2 indeed modulates a subset of the TFs whose activity was predicted to link to this locus, and no other TFs.
Application of our approach to mouse data identified an aQTL modulating the activity of a specific TF in liver cells. We identified an aQTL on mouse chromosome 7 for Zscan4, a transcription factor containing four zinc finger domains and a SCAN domain. Even though we could not detect a candidate causal gene for Zscan4p because of lack of information about the mouse genome, our result demonstrates that our method also works in higher eukaryotes.
In summary, aQTL mapping has a greatly improved sensitivity to detect molecular mechanisms underlying the heritability of gene expression. The successful application of our approach to yeast and mouse data underscores the value of explicitly treating the inferred TF activity as a quantitative trait for increasing statistical power of detecting trans-acting loci. Furthermore, our method is computationally efficient, and easily applicable to any other organism whenever prior information about the DNA-binding specificity of TFs is available.
Analysis of parallel genotyping and expression profiling data has shown that mRNA expression levels are highly heritable. Currently, only a tiny fraction of this genetic variance can be mechanistically accounted for. The influence of trans-acting polymorphisms on gene expression traits is often mediated by transcription factors (TFs). We present a method that exploits prior knowledge about the in vitro DNA-binding specificity of a TF in order to map the loci (‘aQTLs') whose inheritance modulates its protein-level regulatory activity. Genome-wide regression of differential mRNA expression on predicted promoter affinity is used to estimate segregant-specific TF activity, which is subsequently mapped as a quantitative phenotype. In budding yeast, our method identifies six times as many locus-TF associations and more than twice as many trans-acting loci as all existing methods combined. Application to mouse data from an F2 intercross identified an aQTL on chromosome VII modulating the activity of Zscan4 in liver cells. Our method has greatly improved statistical power over existing methods, is mechanism based, strictly causal, computationally efficient, and generally applicable.
PMCID: PMC2964119  PMID: 20865005
gene expression; gene regulatory networks; genetic variation; quantitative trait loci; transcription factors
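The two steps named in this abstract (regressing differential mRNA expression on predicted promoter affinity, then treating the fitted activity as a quantitative trait) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single-TF ordinary least-squares fit, and the simple difference-of-means linkage score are all assumptions chosen for illustration, and the numbers are toy data.

```python
from statistics import mean

def tf_activity(expression, affinity):
    """Least-squares slope of per-gene differential expression on
    per-gene promoter affinity for one TF in one segregant.
    (Illustrative single-TF fit; hypothetical helper name.)"""
    mx, my = mean(affinity), mean(expression)
    sxx = sum((x - mx) ** 2 for x in affinity)
    sxy = sum((x - mx) * (y - my) for x, y in zip(affinity, expression))
    return sxy / sxx

def linkage_score(activities, genotypes):
    """Difference in mean inferred activity between segregants carrying
    allele 1 and allele 0 at one marker (an assumed, simplified statistic)."""
    allele0 = [a for a, g in zip(activities, genotypes) if g == 0]
    allele1 = [a for a, g in zip(activities, genotypes) if g == 1]
    return mean(allele1) - mean(allele0)

# Toy data: one affinity value per gene, one expression profile per segregant.
affinity = [0.0, 1.0, 2.0, 3.0]
profiles = [
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 2.0, 4.0, 6.0],
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 2.0, 4.0, 6.0],
]
genotypes = [0, 1, 0, 1]  # allele inherited at a single hypothetical marker

activities = [tf_activity(p, affinity) for p in profiles]
print(activities, linkage_score(activities, genotypes))
```

In practice the score would be computed at every marker and assessed for significance; the sketch only shows why one activity value per segregant replaces thousands of per-gene traits.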
14.  The RNA backbone plays a crucial role in mediating the intrinsic stability of the GpU dinucleotide platform and the GpUpA/GpA miniduplex 
Nucleic Acids Research  2010;38(14):4868-4876.
The side-by-side interactions of nucleobases contribute to the organization of RNA, forming the planar building blocks of helices and mediating chain folding. Dinucleotide platforms, formed by side-by-side pairing of adjacent bases, frequently anchor helices against loops. Surprisingly, GpU steps account for over half of the dinucleotide platforms observed in RNA-containing structures. Why GpU should stand out from other dinucleotides in this respect is not clear from the single well-characterized H-bond found between the guanine N2 and the uracil O4 groups. Here, we describe how an RNA-specific H-bond between O2′(G) and O2P(U) adds to the stability of the GpU platform. Moreover, we show how this pair of oxygen atoms forms an out-of-plane backbone ‘edge’ that is specifically recognized by a non-adjacent guanine in over 90% of the cases, leading to the formation of an asymmetric miniduplex consisting of ‘complementary’ GpUpA and GpA subunits. Together, these five nucleotides constitute the conserved core of the well-known loop-E motif. The backbone-mediated intrinsic stabilities of the GpU dinucleotide platform and the GpUpA/GpA miniduplex plausibly underlie observed evolutionary constraints on base identity. We propose that they may also provide a reason for the extreme conservation of GpU observed at most 5′-splice sites.
PMCID: PMC2919703  PMID: 20223772
15.  Paired Hormone Response Elements Predict Caveolin-1 as a Glucocorticoid Target Gene 
PLoS ONE  2010;5(1):e8839.
Glucocorticoids act in part via glucocorticoid receptor binding to hormone response elements (HREs), but their direct target genes in vivo are still largely unknown. We developed the criterion that genomic occurrence of paired HREs at an inter-HRE distance less than 200 bp predicts hormone responsiveness, based on synergy of multiple HREs, and HRE information from known target genes. This criterion predicts a substantial number of novel responsive genes, when applied to genomic regions 10 kb upstream of genes. Multiple-tissue in situ hybridization showed that mRNA expression of 6 out of 10 selected genes was induced in a tissue-specific manner in mice treated with a single dose of corticosterone, with the spleen being the most responsive organ. Caveolin-1 was strongly responsive in several organs, and the HRE pair in its upstream region showed increased occupancy by glucocorticoid receptor in response to corticosterone. Our approach allowed for discovery of novel tissue specific glucocorticoid target genes, which may exemplify responses underlying the permissive actions of glucocorticoids.
PMCID: PMC2809115  PMID: 20098621
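The pairing criterion in this abstract (two HREs less than 200 bp apart within an upstream region) can be sketched as a simple scan over predicted site positions. The function name is hypothetical, and measuring distance between site start coordinates is an assumption; the abstract does not say how inter-HRE distance was defined or how the sites themselves were predicted.

```python
def paired_hres(positions, max_distance=200):
    """Return all pairs of predicted HRE positions that lie less than
    max_distance bp apart (start-to-start distance is an assumption)."""
    sites = sorted(positions)
    pairs = []
    for i, left in enumerate(sites):
        for right in sites[i + 1:]:
            if right - left >= max_distance:
                break  # sites are sorted, so later ones are farther away
            pairs.append((left, right))
    return pairs

# Toy example: predicted HRE start positions within a 10 kb upstream region.
upstream_sites = [100, 250, 900, 1050, 5000]
print(paired_hres(upstream_sites))
```

A gene would be flagged as a candidate target when this list is non-empty for its upstream region.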
16.  Inferring Condition-Specific Modulation of Transcription Factor Activity in Yeast through Regulon-Based Analysis of Genomewide Expression 
PLoS ONE  2008;3(9):e3112.
A key goal of systems biology is to understand how genomewide mRNA expression levels are controlled by transcription factors (TFs) in a condition-specific fashion. TF activity is frequently modulated at the post-translational level through ligand binding, covalent modification, or changes in sub-cellular localization. In this paper, we demonstrate how prior information about regulatory network connectivity can be exploited to infer condition-specific TF activity as a hidden variable from the genomewide mRNA expression pattern in the yeast Saccharomyces cerevisiae.
Methodology/Principal Findings
We first validate experimentally that by scoring differential expression at the level of gene sets or “regulons” comprised of the putative targets of a TF, we can accurately predict modulation of TF activity at the post-translational level. Next, we create an interactive database of inferred activities for a large number of TFs across a large number of experimental conditions in S. cerevisiae. This allows us to perform TF-centric analysis of the yeast regulatory network.
We analyze the degree to which the mRNA expression level of each TF is predictive of its regulatory activity. We also organize TFs into “co-modulation networks” based on their inferred activity profile across conditions, and find that this reveals functional and mechanistic relationships. Finally, we present evidence that the PAC and rRPE motifs antagonize TBP-dependent regulation, and function as core promoter elements governed by the transcription regulator NC2. Regulon-based monitoring of TF activity modulation is a powerful tool for analyzing regulatory network function that should be applicable in other organisms. Tools and results are available online at
PMCID: PMC2518834  PMID: 18769540
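Scoring differential expression at the level of a regulon, as described in this abstract, amounts to comparing the expression changes of a TF's putative targets with those of all other genes. The sketch below uses a Welch-style t-statistic for that comparison; the choice of statistic, the function name, and the gene identifiers are assumptions for illustration and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, variance

def regulon_t_score(log_ratios, regulon):
    """Welch-style t-statistic comparing log expression ratios of genes
    inside a regulon with those of all remaining genes.
    log_ratios: dict mapping gene name -> log ratio for one condition.
    regulon: set of gene names assumed to be targets of one TF."""
    inside = [v for g, v in log_ratios.items() if g in regulon]
    outside = [v for g, v in log_ratios.items() if g not in regulon]
    standard_error = sqrt(
        variance(inside) / len(inside) + variance(outside) / len(outside)
    )
    return (mean(inside) - mean(outside)) / standard_error

# Toy data with hypothetical gene names.
log_ratios = {"geneA": 2.0, "geneB": 3.0, "geneC": 0.0, "geneD": 1.0}
score = regulon_t_score(log_ratios, {"geneA", "geneB"})
print(score)
```

A large positive or negative score across a condition would be read as evidence that the TF's activity was modulated, independent of the TF's own mRNA level.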
17.  Predicting functional transcription factor binding through alignment-free and affinity-based analysis of orthologous promoter sequences 
Bioinformatics  2008;24(13):i165-i171.
Motivation: The identification of transcription factor (TF) binding sites and the regulatory circuitry that they define is currently an area of intense research. Data from whole-genome chromatin immunoprecipitation (ChIP–chip), whole-genome expression microarrays, and sequencing of multiple closely related genomes have all proven useful. By and large, existing methods treat the interpretation of functional data as a classification problem (between bound and unbound DNA), and the analysis of comparative data as a problem of local alignment (to recover phylogenetic footprints of presumably functional elements). Both of these approaches suffer from the inability to model and detect low-affinity binding sites, which have recently been shown to be abundant and functional.
Results: We have developed a method that discovers functional regulatory targets of TFs by predicting the total affinity of each promoter for those factors and then comparing that affinity across orthologous promoters in closely related species. At each promoter, we consider the minimum affinity among orthologs to be the fraction of the affinity that is functional. Because we calculate the affinity of the entire promoter, our method is independent of local alignment. By comparing with functional annotation information and gene expression data in Saccharomyces cerevisiae, we have validated that this biophysically motivated use of evolutionary conservation gives rise to dramatic improvement in prediction of regulatory connectivity and factor–factor interactions compared to the use of a single genome. We propose novel biological functions for several yeast TFs, including the factors Snt2 and Stb4, for which no function has been reported. Our affinity-based approach towards comparative genomics may allow a more quantitative analysis of the principles governing the evolution of non-coding DNA.
Availability: The MatrixREDUCE software package is available from
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2718632  PMID: 18586710
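The conservation rule stated in this abstract (take the minimum total promoter affinity among orthologs as the functional part) can be sketched in a few lines. The affinity model here is an assumption: each window is scored as the product of per-position relative affinities and the promoter total is the sum over forward-strand windows. The function names and the toy two-position matrix are hypothetical, and this is not the MatrixREDUCE code.

```python
def promoter_affinity(sequence, psam):
    """Total affinity of a promoter: sum over all forward-strand windows
    of the product of per-position relative affinities.
    psam: list of dicts, one per motif position, mapping base -> weight."""
    width = len(psam)
    total = 0.0
    for start in range(len(sequence) - width + 1):
        window_affinity = 1.0
        for offset in range(width):
            window_affinity *= psam[offset].get(sequence[start + offset], 0.0)
        total += window_affinity
    return total

def conserved_affinity(ortholog_sequences, psam):
    """Minimum total affinity among orthologous promoters, taken here as
    the conserved (presumed functional) fraction. No alignment is needed."""
    return min(promoter_affinity(s, psam) for s in ortholog_sequences)

# Toy two-position matrix with weights relative to the best base (1.0).
toy_psam = [
    {"A": 1.0, "C": 0.5, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.5},
]
orthologs = ["ACAT", "GGAT"]
print(conserved_affinity(orthologs, toy_psam))
```

Because every window contributes, weak sites add to the total instead of being discarded by a bound/unbound threshold, which is the point the abstract makes about low-affinity binding.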
18.  Global Chromatin Domain Organization of the Drosophila Genome 
PLoS Genetics  2008;4(3):e1000045.
In eukaryotes, neighboring genes can be packaged together in specific chromatin structures that ensure their coordinated expression. Examples of such multi-gene chromatin domains are well-documented, but a global view of the chromatin organization of eukaryotic genomes is lacking. To systematically identify multi-gene chromatin domains, we constructed a compendium of genome-scale binding maps for a broad panel of chromatin-associated proteins in Drosophila melanogaster. Next, we computationally analyzed this compendium for evidence of multi-gene chromatin domains using a novel statistical segmentation algorithm. We find that at least 50% of all fly genes are organized into chromatin domains, which often consist of dozens of genes. The domains are characterized by various known and novel combinations of chromatin proteins. The genes in many of the domains are coregulated during development and tend to have similar biological functions. Furthermore, during evolution fewer chromosomal rearrangements occur inside chromatin domains than outside domains. Our results indicate that a substantial portion of the Drosophila genome is packaged into functionally coherent, multi-gene chromatin domains. This has broad mechanistic implications for gene regulation and genome evolution.
Author Summary
Genes are packaged into chromatin by a variety of specialized proteins. Many different types of chromatin exist, and each may regulate gene expression in different ways. It was previously observed that neighboring genes are sometimes packaged together into a single type of chromatin, which can facilitate their coordinated regulation. However, it has been unclear whether such multi-gene chromatin domains are exceptional, or may occur more frequently. Here, we report a systematic analysis of genome-wide binding patterns of a large set of chromatin components in the fruit fly Drosophila melanogaster. Strikingly, we find that at least 50% of all genes in this organism are packaged together with several of their neighboring genes into a single type of chromatin. Each chromatin domain can include dozens of genes and can be made up of different combinations of chromatin proteins. We show that genes in each domain often have similar functions and are coordinately expressed during development. Moreover, we find that many of these multi-gene domains have been kept intact during evolution, indicating that they are important functional units. In summary, multi-gene chromatin domains are much more common than previously thought, and they are likely to play important roles in the orchestration of gene expression.
PMCID: PMC2274884  PMID: 18369463
19.  TransfactomeDB: a resource for exploring the nucleotide sequence specificity and condition-specific regulatory activity of trans-acting factors 
Nucleic Acids Research  2007;36(Database issue):D125-D131.
Accurate and comprehensive information about the nucleotide sequence specificity of trans-acting factors (TFs) is essential for computational and experimental analyses of gene regulatory networks. We present the Yeast Transfactome Database, a repository of sequence specificity models and condition-specific regulatory activities for a large number of DNA- and RNA-binding proteins in Saccharomyces cerevisiae. The sequence specificities in TransfactomeDB, represented as position-specific affinity matrices (PSAMs), are directly estimated from genomewide measurements of TF-binding using our previously published MatrixREDUCE algorithm, which is based on a biophysical model. For each mRNA expression profile in the NCBI Gene Expression Omnibus, we used sequence-based regression analysis to estimate the post-translational regulatory activity of each TF for which a PSAM is available. The trans-factor activity profiles across multiple experiments available in TransfactomeDB allow the user to explore potential regulatory roles of hundreds of TFs in any of thousands of microarray experiments. Our resource is freely available at
PMCID: PMC2238954  PMID: 17947326
20.  Dissecting complex transcriptional responses using pathway-level scores based on prior information 
BMC Bioinformatics  2007;8(Suppl 6):S6.
The genomewide pattern of changes in mRNA expression measured using DNA microarrays is typically a complex superposition of the response of multiple regulatory pathways to changes in the environment of the cells. The use of prior information, either about the function of the protein encoded by each gene, or about the physical interactions between regulatory factors and the sequences controlling its expression, has emerged as a powerful approach for dissecting complex transcriptional responses.
We review two different approaches for combining the noisy expression levels of multiple individual genes into robust pathway-level differential expression scores. The first is based on a comparison between the distribution of expression levels of genes within a predefined gene set and those of all other genes in the genome. The second starts from an estimate of the strength of genomewide regulatory network connectivities based on sequence information or direct measurements of protein-DNA interactions, and uses regression analysis to estimate the activity of gene regulatory pathways. The statistical methods used are explained in detail.
By avoiding the thresholding of individual genes, pathway-level analysis of differential expression based on prior information can be considerably more sensitive to subtle changes in gene expression than gene-level analysis. The methods are technically straightforward and yield results that are easily interpretable, both biologically and statistically.
PMCID: PMC1995543  PMID: 17903287
21.  Detecting transcriptionally active regions using genomic tiling arrays 
Genome Biology  2006;7(7):R59.
A new method for designing and interpreting genomic tiling array data is described and applied to Anopheles and human arrays.
We have developed a method for interpreting genomic tiling array data, implemented as the program TranscriptionDetector. Probed loci expressed above background are identified by combining replicates in a way that makes minimal assumptions about the data. We performed medium-resolution Anopheles gambiae tiling array experiments and found extensive transcription of both coding and non-coding regions. Our method also showed improved detection of transcriptional units when applied to high-density tiling array data for ten human chromosomes.
PMCID: PMC1779562  PMID: 16859498
22.  Modeling gene expression control using Omes Law 
Molecular Systems Biology  2006;2:2006.0013.
PMCID: PMC1681487  PMID: 16738558
23.  T-profiler: scoring the activity of predefined groups of genes using gene expression data 
Nucleic Acids Research  2005;33(Web Server issue):W592-W595.
One of the key challenges in the analysis of gene expression data is how to relate the expression level of individual genes to the underlying transcriptional programs and cellular state. Here we describe T-profiler, a tool that uses the t-test to score changes in the average activity of predefined groups of genes. The gene groups are defined based on Gene Ontology categorization, ChIP-chip experiments, upstream matches to a consensus transcription factor binding motif or location on the same chromosome. If desired, an iterative procedure can be used to select a single, optimal representative from sets of overlapping gene groups. T-profiler makes it possible to interpret microarray data in a way that is both intuitive and statistically rigorous, without the need to combine experiments or choose parameters. Currently, gene expression data from Saccharomyces cerevisiae and Candida albicans are supported. Users can upload their microarray data for analysis on the web at .
PMCID: PMC1160244  PMID: 15980543
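The group-scoring step described in this abstract can be illustrated with a short sketch. It assumes a pooled-variance two-sample t-statistic comparing genes inside a predefined group with all other genes; the abstract does not state which t-test variant the tool uses, and the gene names, values, and function name below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def group_t_value(expression, group):
    """Two-sample t-statistic: genes in `group` versus all other genes.

    Assumes equal variances (pooled estimate). This is an illustrative
    choice, not necessarily the variant used by T-profiler.
    """
    inside = [v for g, v in expression.items() if g in group]
    outside = [v for g, v in expression.items() if g not in group]
    n_in, n_out = len(inside), len(outside)
    pooled = ((n_in - 1) * variance(inside) + (n_out - 1) * variance(outside)) / (
        n_in + n_out - 2
    )
    return (mean(inside) - mean(outside)) / sqrt(pooled * (1 / n_in + 1 / n_out))


# Hypothetical log-ratios for eight genes and one predefined gene group.
log_ratios = {
    "g1": 1.2, "g2": 0.9, "g3": 1.1,
    "g4": 0.1, "g5": -0.2, "g6": 0.0, "g7": 0.2, "g8": -0.1,
}
stress_group = {"g1", "g2", "g3"}
t_value = group_t_value(log_ratios, stress_group)
print(round(t_value, 2))
```

Because every gene contributes to one of the two samples, no per-gene threshold has to be chosen before scoring a group.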
24.  Defining transcriptional networks through integrative modeling of mRNA expression and transcription factor binding data 
BMC Bioinformatics  2004;5:31.
Functional genomics studies are yielding information about regulatory processes in the cell at an unprecedented scale. In the yeast S. cerevisiae, DNA microarrays have not only been used to measure the mRNA abundance for all genes under a variety of conditions but also to determine the occupancy of all promoter regions by a large number of transcription factors. The challenge is to extract useful information about the global regulatory network from these data.
We present MA-Networker, an algorithm that combines microarray data for mRNA expression and transcription factor occupancy to define the regulatory network of the cell. Multivariate regression analysis is used to infer the activity of each transcription factor, and the correlation across different conditions between this activity and the mRNA expression of a gene is interpreted as regulatory coupling strength. Applying our method to S. cerevisiae, we find that, on average, 58% of the genes whose promoter region is bound by a transcription factor are true regulatory targets. These results are validated by an analysis of enrichment for functional annotation, response to transcription factor deletion, and over-representation of cis-regulatory motifs. We are able to assign directionality to transcription factors that control divergently transcribed genes sharing the same promoter region. Finally, we identify an intrinsic limitation of transcription factor deletion experiments related to the combinatorial nature of transcriptional control, to which our approach provides an alternative.
Our reliable classification of ChIP positives into functional and non-functional TF targets based on their expression pattern across a wide range of conditions provides a starting point for identifying the unknown sequence features in non-coding DNA that directly or indirectly determine the context dependence of transcription factor action. Complete analysis results are available for browsing or download at .
PMCID: PMC407845  PMID: 15113405
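The two steps named in this abstract, inferring factor activities by multivariate regression and then correlating activity with a gene's expression across conditions, can be sketched as below. This is a toy version under stated assumptions, not the MA-Networker code: only two transcription factors are fitted (so the normal equations can be solved in closed form), the occupancy and activity values are invented, and the synthetic expression data are noise-free.

```python
from math import sqrt


def center(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def tf_activities(expr, occ_a, occ_b):
    """Least-squares fit expr ~ a*occ_a + b*occ_b + const over all genes."""
    y, xa, xb = center(expr), center(occ_a), center(occ_b)
    saa = sum(u * u for u in xa)
    sbb = sum(u * u for u in xb)
    sab = sum(u * v for u, v in zip(xa, xb))
    say = sum(u * v for u, v in zip(xa, y))
    sby = sum(u * v for u, v in zip(xb, y))
    det = saa * sbb - sab * sab
    return (say * sbb - sby * sab) / det, (sby * saa - say * sab) / det


def pearson(x, y):
    cx, cy = center(x), center(y)
    return sum(u * v for u, v in zip(cx, cy)) / sqrt(
        sum(u * u for u in cx) * sum(v * v for v in cy)
    )


# Hypothetical promoter occupancies of two factors (A, B) for five genes.
occ_a = [2.0, 1.5, 0.0, 0.1, 1.0]
occ_b = [0.0, 0.2, 1.8, 1.6, 1.0]
# Hypothetical "true" activities in four conditions, used only to build
# synthetic expression profiles for this example.
true_a = [1.0, -0.5, 0.8, 0.0]
true_b = [0.2, 0.6, -0.4, 1.0]
conditions = [
    [ta * a + tb * b for a, b in zip(occ_a, occ_b)]
    for ta, tb in zip(true_a, true_b)
]

# Step 1: one regression per condition gives each factor's activity.
activities = [tf_activities(expr, occ_a, occ_b) for expr in conditions]
act_a = [a for a, _ in activities]

# Step 2: coupling strength = correlation, across conditions, between
# factor A's activity and the expression of one gene (here gene 0).
gene0 = [expr[0] for expr in conditions]
coupling = pearson(act_a, gene0)
print(act_a, coupling)
```

In this toy data gene 0 is bound only by factor A, so its expression tracks A's inferred activity closely; with real, noisy data the coupling would be well below perfect and would need a significance threshold.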
25.  REDUCE: an online tool for inferring cis-regulatory elements and transcriptional module activities from microarray data 
Nucleic Acids Research  2003;31(13):3487-3490.
REDUCE is a motif-based regression method for microarray analysis. The only required inputs are (i) a single genome-wide set of absolute or relative mRNA abundances and (ii) the DNA sequence of the regulatory region associated with each gene that is probed. Currently supported organisms are yeast, worm and fly; it is an open question whether in its current incarnation our approach can be used for mouse or human. REDUCE uses unbiased statistics to identify oligonucleotide motifs whose occurrence in the regulatory region of a gene correlates with the level of mRNA expression. Regression analysis is used to infer the activity of the transcriptional module associated with each motif. REDUCE is available online at This web site provides functionality for the upload and management of microarray data. REDUCE analysis results can be viewed and downloaded, and optionally be shared with other users or made publicly accessible.
PMCID: PMC169192  PMID: 12824350
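The motif-based regression described in this abstract can be sketched in a few lines. This is an assumption-laden illustration rather than the REDUCE program itself: it scans all 5-mers, counts overlapping forward-strand occurrences only, fits each motif separately, and shows only the selection of a first motif. The sequences, expression values, and function names are hypothetical.

```python
from itertools import product
from math import sqrt


def count_motif(seq, motif):
    """Count (possibly overlapping) occurrences of motif in seq."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def fit_motif(counts, expr):
    """Return (Pearson r, slope) for the fit expr ~ slope * counts + const."""
    n = len(counts)
    mc, me = sum(counts) / n, sum(expr) / n
    scc = sum((c - mc) ** 2 for c in counts)
    see = sum((e - me) ** 2 for e in expr)
    sce = sum((c - mc) * (e - me) for c, e in zip(counts, expr))
    if scc == 0 or see == 0:
        return 0.0, 0.0  # motif count (or expression) does not vary
    return sce / sqrt(scc * see), sce / scc


# Hypothetical regulatory regions and log-ratios for five genes.
genes = {
    "g1": ("AACGCGTTACGCGA", 2.1),
    "g2": ("TTACGCGTTTTAAA", 1.0),
    "g3": ("TTTTAAAATTTTAA", 0.1),
    "g4": ("GGATTTAAATCCTT", -0.1),
    "g5": ("CCACGCGAATTTAA", 0.9),
}
expr = [e for _, e in genes.values()]

# Score every 5-mer; the slope of each fit plays the role of the inferred
# activity of the module associated with that motif.
results = {}
for letters in product("ACGT", repeat=5):
    motif = "".join(letters)
    counts = [count_motif(seq, motif) for seq, _ in genes.values()]
    results[motif] = fit_motif(counts, expr)

best = max(results, key=lambda m: abs(results[m][0]))
print(best, results[best])
```

Scanning every oligonucleotide of a given length, rather than a curated list, is what keeps the search unbiased; the cost is that many motifs are tested, so real use needs a correction for multiple testing.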
