2.  Nuclear pore component Nup98 is a potential tumor suppressor and regulates post-transcriptional expression of select p53 target genes 
Molecular cell  2012;48(5):799-810.
The p53 tumor suppressor utilizes multiple mechanisms to selectively regulate its myriad target genes, which in turn mediate diverse cellular processes. Here, using conventional and single molecule mRNA analyses, we demonstrate that the nucleoporin Nup98 is required for full expression of p21, a key effector of the p53 pathway, but not several other p53 target genes. Nup98 regulates p21 mRNA levels by a post-transcriptional mechanism in which a complex containing Nup98 and the p21 mRNA 3′-UTR protects p21 mRNA from degradation by the exosome. An in silico approach revealed another p53 target (14-3-3σ) to be similarly regulated by Nup98. The expression of Nup98 is reduced in murine and human hepatocellular carcinomas (HCC) and correlates with p21 expression in HCC patients. Our study elucidates a previously unrecognized function of wild-type Nup98 in regulating select p53 target genes that is distinct from the well-characterized oncogenic properties of Nup98 fusion proteins.
PMCID: PMC3525737  PMID: 23102701
3.  Evaluation of methods for modeling transcription-factor sequence specificity 
Nature biotechnology  2013;31(2):126-134.
Genomic analyses often involve scanning for potential transcription-factor (TF) binding sites using models of the sequence specificity of DNA binding proteins. Many approaches have been developed to model and learn a protein’s binding specificity, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein binding microarray data for 66 mouse TFs belonging to various families. For 9 TFs, we also scored the resulting motif models on in vivo data, and found that the best in vitro–derived motifs performed similarly to motifs derived from in vivo data. Our results indicate that simple models based on mononucleotide position weight matrices learned by the best methods perform similarly to more complex models for most TFs examined, but fall short in specific cases (<10%). In addition, the best-performing motifs typically have relatively low information content, consistent with widespread degeneracy in eukaryotic TF sequence preferences.
PMCID: PMC3687085  PMID: 23354101
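To illustrate what scanning with a mononucleotide position weight matrix involves, here is a minimal sketch. The matrix values, the uniform background, and the function names are all hypothetical; none of them comes from the study, and only the forward strand is scanned.

```python
import math

# Hypothetical 3-position PWM: per-position base probabilities (illustrative values only).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds_score(window, pwm=PWM, background=BACKGROUND):
    """Sum of per-position log2(p / background) for one window."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, window))


def scan(sequence, pwm=PWM):
    """Return (offset, score) for every motif-length window on the forward strand."""
    width = len(pwm)
    return [
        (i, log_odds_score(sequence[i:i + width], pwm))
        for i in range(len(sequence) - width + 1)
    ]


def best_site(sequence, pwm=PWM):
    """Offset and score of the highest-scoring window."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])
```

Because each position contributes independently to the score, this model cannot capture dependencies between positions. The more complex models mentioned in the abstract differ from it in this respect.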
4.  Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins 
Cell  2011;147(6):1270-1282.
Members of transcription factor families typically have similar DNA binding specificities yet execute unique functions in vivo. Transcription factors often bind DNA as multiprotein complexes, raising the possibility that complex formation might modify their DNA binding specificities. To test this hypothesis, we developed an experimental and computational platform, SELEX-seq, that can be used to determine the relative affinities to any DNA sequence for any transcription factor complex. Applying this method to all eight Drosophila Hox proteins, we show that they obtain novel recognition properties when they bind DNA with the dimeric cofactor Extradenticle-Homothorax (Exd). Exd-Hox specificities group into three main classes that obey Hox gene collinearity rules, and DNA structure predictions suggest that anterior and posterior Hox proteins prefer DNA sequences with distinct minor groove topographies. Together, these data suggest that emergent DNA recognition properties revealed by interactions with cofactors contribute to transcription factor specificities in vivo.
PMCID: PMC3319069  PMID: 22153072
5.  Perturbation-based analysis and modeling of combinatorial regulation in the yeast sulfur assimilation pathway 
Molecular Biology of the Cell  2012;23(15):2993-3007.
Here we establish the utility of a recently described perturbative method to study complex regulatory circuits in vivo. By combining rapid modulation of single TFs under physiological conditions with genome-wide expression analysis, we elucidate several novel regulatory features within the pathways of sulfur assimilation and beyond.
In yeast, the pathways of sulfur assimilation are combinatorially controlled by five transcriptional regulators (three DNA-binding proteins [Met31p, Met32p, and Cbf1p], an activator [Met4p], and a cofactor [Met28p]) and a ubiquitin ligase subunit (Met30p). This regulatory system exerts combinatorial control not only over sulfur assimilation and methionine biosynthesis, but also on many other physiological functions in the cell. Recently we characterized a gene induction system that, upon the addition of an inducer, results in near-immediate transcription of a gene of interest under physiological conditions. We used this to perturb levels of single transcription factors during steady-state growth in chemostats, which facilitated distinction of direct from indirect effects of individual factors dynamically through quantification of the subsequent changes in genome-wide patterns of gene expression. We were able to show directly that Cbf1p acts sometimes as a repressor and sometimes as an activator. We also found circumstances in which Met31p/Met32p function as repressors, as well as those in which they function as activators. We elucidated and numerically modeled feedback relationships among the regulators, notably feedforward regulation of Met32p (but not Met31p) by Met4p that generates dynamic differences in abundance that can account for the differences in function of these two proteins despite their identical binding sites.
PMCID: PMC3408425  PMID: 22696683
6.  Combinatorial control of diverse metabolic and physiological functions by transcriptional regulators of the yeast sulfur assimilation pathway 
Molecular Biology of the Cell  2012;23(15):3008-3024.
The sulfur assimilation pathway is used to understand how combinatorial transcription coordinates cellular processes. Global gene expression was measured in yeast lacking different combinations of transcription factors in order to determine how these factors coordinate sulfur assimilation with diverse metabolic and physiological processes.
Methionine abundance affects diverse cellular functions, including cell division, redox homeostasis, survival under starvation, and oxidative stress response. Regulation of the methionine biosynthetic pathway involves three DNA-binding proteins: Met31p, Met32p, and Cbf1p. We hypothesized that there exists a “division of labor” among these proteins that facilitates coordination of methionine biosynthesis with diverse biological processes. To explore combinatorial control in this regulatory circuit, we deleted CBF1, MET31, and MET32 individually and in combination in a strain lacking methionine synthase. We followed genome-wide gene expression as these strains were starved for methionine. Using a combination of bioinformatic methods, we found that these regulators control genes involved in biological processes downstream of sulfur assimilation; many of these processes had not previously been documented as methionine dependent. We also found that the different factors have overlapping but distinct functions. In particular, Met31p and Met32p are important in regulating methionine metabolism, whereas Cbf1p functions as a “generalist” transcription factor that is not specific to methionine metabolism. In addition, Met31p and Met32p appear to regulate iron–sulfur cluster biogenesis through direct and indirect mechanisms and have distinguishable target specificities. Finally, CBF1 deletion sometimes has the opposite effect on gene expression from MET31 and MET32 deletion.
PMCID: PMC3408426  PMID: 22696679
7.  Systematic protein location mapping reveals five principal chromatin types in Drosophila cells 
Cell  2010;143(2):212-224.
Chromatin is important for the regulation of transcription and other functions, yet the diversity of chromatin composition and the distribution along chromosomes is still poorly characterized. By integrative analysis of genome-wide binding maps of 53 broadly selected chromatin components in Drosophila cells, we show that the genome is segmented into five principal chromatin types that are defined by unique, yet overlapping combinations of proteins, and form domains that can extend over >100 kb. We identify a repressive chromatin type that covers about half of the genome and lacks classic heterochromatin markers. Furthermore, transcriptionally active euchromatin consists of two types that differ in molecular organization and H3K36 methylation, and regulate distinct classes of genes. Finally, we provide evidence that the different chromatin types help to target DNA-binding factors to specific genomic regions. These results provide a global view of chromatin diversity and domain organization in a metazoan cell.
PMCID: PMC3119929  PMID: 20888037
8.  De-Novo Discovery of Differentially Abundant Transcription Factor Binding Sites Including Their Positional Preference 
PLoS Computational Biology  2011;7(2):e1001070.
Transcription factors are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not been fully solved yet. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to existing tools for de-novo motif discovery for 18 benchmark data sets with planted binding sites, and for a metazoan compendium based on experimental data from micro-array, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin responsive element predominately positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point like the transcription start site in case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery. 
Hence, we make the tool freely available as a component of the open-source Java framework Jstacs and as a stand-alone application at
Author Summary
Binding of transcription factors to promoters of genes, and subsequent enhancement or repression of transcription, is one of the main steps of transcriptional gene regulation. Direct or indirect wet-lab experiments allow the identification of approximate regions potentially bound or regulated by a transcription factor. Subsequently, de-novo motif discovery tools can be used for detecting the precise positions of binding sites. Many traditional tools focus on motifs over-represented in the target regions, which often turn out to be similarly over-represented in the entire genome. In contrast, several recent tools focus on differentially abundant motifs in target regions compared to a control set. As binding sites are often located at some preferred distance to the transcription start site, it is favorable to include this information in de-novo motif discovery. Here, we present Dispom, a novel approach for learning differentially abundant motifs and their positional preferences simultaneously, which predicts binding sites with increased accuracy compared to many popular de-novo motif discovery tools. When applying Dispom to promoters of auxin-responsive genes of Arabidopsis thaliana, we find a binding motif slightly different from the canonical auxin-response element, which exhibits a strong positional preference and which is considerably more specific to auxin-responsive genes.
PMCID: PMC3037384  PMID: 21347314
9.  Identifying the genetic determinants of transcription factor activity 
Genome-wide messenger RNA expression levels are highly heritable. However, the molecular mechanisms underlying this heritability are poorly understood. The influence of trans-acting polymorphisms is often mediated by changes in the regulatory activity of one or more sequence-specific transcription factors (TFs). We use a method that exploits prior information about the DNA-binding specificity of each TF to estimate its genotype-specific regulatory activity. To this end, we perform linear regression of genotype-specific differential mRNA expression on TF-specific promoter-binding affinity. Treating inferred TF activity as a quantitative trait and mapping it across a panel of segregants from an experimental genetic cross allows us to identify trans-acting loci (‘aQTLs') whose allelic variation modulates the TF. A few of these aQTL regions contain the gene encoding the TF itself; several others contain a gene whose protein product is known to interact with the TF. Our method is strictly causal, as it only uses sequence-based features as predictors. Application to budding yeast demonstrates a dramatic increase in statistical power, compared with existing methods, to detect locus-TF associations and trans-acting loci. Our aQTL mapping strategy also succeeds in mouse.
Genetic sequence variation naturally perturbs mRNA expression levels in the cell. In recent years, analysis of parallel genotyping and expression profiling data for segregants from genetic crosses between parental strains has revealed that mRNA expression levels are highly heritable. Expression quantitative trait loci (eQTLs), whose allelic variation regulates the expression level of individual genes, have successfully been identified (Brem et al, 2002; Schadt et al, 2003). The molecular mechanisms underlying the heritability of mRNA expression are poorly understood. However, they are likely to involve mediation by transcription factors (TFs). We present a new transcription-factor-centric method that greatly increases our ability to understand what drives the genetic variation in mRNA expression (Figure 1). Our method identifies genomic loci (‘aQTLs') whose allelic variation modulates the protein-level activity of specific TFs. To map aQTLs, we integrate genotyping and expression profiling data with quantitative prior information about DNA-binding specificity of transcription factors in the form of position-specific affinity matrices (Bussemaker et al, 2007). We applied our method in two different organisms: budding yeast and mouse.
In our approach, the inferred TF activity is explicitly treated as a quantitative trait, and genetically mapped. The decrease of ‘phenotype space' from that of all genes (in the eQTL approach) to that of all TFs (in our aQTL approach) increases the statistical power to detect trans-acting loci in two distinct ways. First, as each inferred TF activity is derived from a large number of genes, it is far less noisy than mRNA levels of individual genes. Second, the number of trait/marker combinations that needs to be tested for statistical significance in parallel is roughly two orders of magnitude smaller than for eQTLs. We identified a total of 103 locus-TF associations, a more than six-fold improvement over the 17 locus-TF associations identified by several existing methods (Brem et al, 2002; Yvert et al, 2003; Lee et al, 2006; Smith and Kruglyak, 2008; Zhu et al, 2008). The total number of distinct genomic loci identified as an aQTL equals 31, which includes 11 of the 13 previously identified eQTL hotspots (Smith and Kruglyak, 2008).
To better understand the mechanisms underlying the identified genetic linkages, we examined the genes within each aQTL region. First, we found four ‘local' aQTLs, which encompass the gene encoding the TF itself. This includes the known polymorphism in the HAP1 gene (Brem et al, 2002), but also novel predictions of trans-acting polymorphisms in RFX1, STB5, and HAP4. Second, using high-throughput protein–protein interaction data, we identified putative causal genes for several aQTLs. For example, we predict that a polymorphism in the cyclin-dependent kinase CDC28 antagonistically modulates the functionally distinct cell cycle regulators Fkh1 and Fkh2. In this and other cases, our approach naturally accounts for post-translational modulation of TF activity at the protein level.
We validated our ability to predict locus-TF associations in yeast using gene expression profiles of allele replacement strains from a previous study (Smith and Kruglyak, 2008). Chromosome 15 contains an aQTL whose allelic status influences the activity of no fewer than 30 distinct TFs. This locus includes IRA2, which controls intracellular cAMP levels. We used the gene expression profile of IRA2 replacement strains to confirm that the polymorphism within IRA2 indeed modulates a subset of the TFs whose activity was predicted to link to this locus, and no other TFs.
Application of our approach to mouse data identified an aQTL modulating the activity of a specific TF in liver cells. We identified an aQTL on mouse chromosome 7 for Zscan4, a transcription factor containing four zinc finger domains and a SCAN domain. Even though we could not detect a candidate causal gene for Zscan4p because of lack of information about the mouse genome, our result demonstrates that our method also works in higher eukaryotes.
In summary, aQTL mapping has a greatly improved sensitivity to detect molecular mechanisms underlying the heritability of gene expression. The successful application of our approach to yeast and mouse data underscores the value of explicitly treating the inferred TF activity as a quantitative trait for increasing statistical power of detecting trans-acting loci. Furthermore, our method is computationally efficient, and easily applicable to any other organism whenever prior information about the DNA-binding specificity of TFs is available.
Analysis of parallel genotyping and expression profiling data has shown that mRNA expression levels are highly heritable. Currently, only a tiny fraction of this genetic variance can be mechanistically accounted for. The influence of trans-acting polymorphisms on gene expression traits is often mediated by transcription factors (TFs). We present a method that exploits prior knowledge about the in vitro DNA-binding specificity of a TF in order to map the loci (‘aQTLs') whose inheritance modulates its protein-level regulatory activity. Genome-wide regression of differential mRNA expression on predicted promoter affinity is used to estimate segregant-specific TF activity, which is subsequently mapped as a quantitative phenotype. In budding yeast, our method identifies six times as many locus-TF associations and more than twice as many trans-acting loci as all existing methods combined. Application to mouse data from an F2 intercross identified an aQTL on chromosome VII modulating the activity of Zscan4 in liver cells. Our method has greatly improved statistical power over existing methods, is mechanism based, strictly causal, computationally efficient, and generally applicable.
PMCID: PMC2964119  PMID: 20865005
gene expression; gene regulatory networks; genetic variation; quantitative trait loci; transcription factors
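The two steps described above (regressing expression on promoter affinity to infer a per-segregant TF activity, then treating that activity as a trait at a marker) can be sketched roughly as follows. This is a deliberately simplified illustration under stated assumptions: one TF, a simple least-squares slope in place of the authors' genome-wide regression, and a plain difference in group means in place of a formal linkage test. All function names are hypothetical.

```python
from statistics import mean


def slope(x, y):
    """Least-squares slope of y on x (simple regression with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def tf_activity(affinity, expression_by_segregant):
    """One inferred activity per segregant: slope of expression on promoter affinity."""
    return [slope(affinity, expr) for expr in expression_by_segregant]


def marker_effect(activity, genotype):
    """Difference in mean activity between the two parental alleles (coded 0/1)."""
    a0 = [a for a, g in zip(activity, genotype) if g == 0]
    a1 = [a for a, g in zip(activity, genotype) if g == 1]
    return mean(a1) - mean(a0)


# Toy data: 4 genes (predicted promoter affinity for one TF), 4 segregants.
affinity = [0.0, 1.0, 2.0, 3.0]
expression = [
    [0.0, 1.0, 2.0, 3.0],  # segregant 1
    [0.0, 2.0, 4.0, 6.0],  # segregant 2
    [1.0, 2.0, 3.0, 4.0],  # segregant 3
    [0.0, 2.0, 4.0, 6.0],  # segregant 4
]
genotype_at_marker = [0, 1, 0, 1]

activity = tf_activity(affinity, expression)
effect = marker_effect(activity, genotype_at_marker)
```

In this toy example the inferred activity differs between the two alleles. In a real analysis this is the kind of signal that would be tested for significance at each marker.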
10.  The RNA backbone plays a crucial role in mediating the intrinsic stability of the GpU dinucleotide platform and the GpUpA/GpA miniduplex 
Nucleic Acids Research  2010;38(14):4868-4876.
The side-by-side interactions of nucleobases contribute to the organization of RNA, forming the planar building blocks of helices and mediating chain folding. Dinucleotide platforms, formed by side-by-side pairing of adjacent bases, frequently anchor helices against loops. Surprisingly, GpU steps account for over half of the dinucleotide platforms observed in RNA-containing structures. Why GpU should stand out from other dinucleotides in this respect is not clear from the single well-characterized H-bond found between the guanine N2 and the uracil O4 groups. Here, we describe how an RNA-specific H-bond between O2′(G) and O2P(U) adds to the stability of the GpU platform. Moreover, we show how this pair of oxygen atoms forms an out-of-plane backbone ‘edge’ that is specifically recognized by a non-adjacent guanine in over 90% of the cases, leading to the formation of an asymmetric miniduplex consisting of ‘complementary’ GpUpA and GpA subunits. Together, these five nucleotides constitute the conserved core of the well-known loop-E motif. The backbone-mediated intrinsic stabilities of the GpU dinucleotide platform and the GpUpA/GpA miniduplex plausibly underlie observed evolutionary constraints on base identity. We propose that they may also provide a reason for the extreme conservation of GpU observed at most 5′-splice sites.
PMCID: PMC2919703  PMID: 20223772
11.  Paired Hormone Response Elements Predict Caveolin-1 as a Glucocorticoid Target Gene 
PLoS ONE  2010;5(1):e8839.
Glucocorticoids act in part via glucocorticoid receptor binding to hormone response elements (HREs), but their direct target genes in vivo are still largely unknown. We developed the criterion that genomic occurrence of paired HREs at an inter-HRE distance less than 200 bp predicts hormone responsiveness, based on synergy of multiple HREs, and HRE information from known target genes. This criterion predicts a substantial number of novel responsive genes, when applied to genomic regions 10 kb upstream of genes. Multiple-tissue in situ hybridization showed that mRNA expression of 6 out of 10 selected genes was induced in a tissue-specific manner in mice treated with a single dose of corticosterone, with the spleen being the most responsive organ. Caveolin-1 was strongly responsive in several organs, and the HRE pair in its upstream region showed increased occupancy by glucocorticoid receptor in response to corticosterone. Our approach allowed for discovery of novel tissue-specific glucocorticoid target genes, which may exemplify responses underlying the permissive actions of glucocorticoids.
PMCID: PMC2809115  PMID: 20098621
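The paired-HRE criterion lends itself to a short sketch. The version below assumes that HRE positions have already been predicted for each gene's 10 kb upstream region and that the inter-HRE distance is measured between site start coordinates. Both are assumptions made for illustration, and the gene names and coordinates are made up.

```python
def paired_hres(positions, max_distance=200):
    """Return adjacent HRE position pairs that lie closer than max_distance bp."""
    ordered = sorted(positions)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < max_distance]


def predicted_responsive(hres_by_gene, max_distance=200):
    """Genes whose upstream region contains at least one qualifying HRE pair."""
    return sorted(
        gene for gene, positions in hres_by_gene.items()
        if paired_hres(positions, max_distance)
    )


# Hypothetical HRE start positions relative to the transcription start site.
example = {
    "geneA": [-9500, -9400, -300],  # two HREs 100 bp apart
    "geneB": [-8000, -2000],        # two HREs, but 6 kb apart
    "geneC": [-150],                # a single HRE
}
candidates = predicted_responsive(example)
```

Checking only neighbouring sites in sorted order is sufficient here. If any two HREs lie within the distance limit, the neighbouring sites between them do as well.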
12.  Inferring Condition-Specific Modulation of Transcription Factor Activity in Yeast through Regulon-Based Analysis of Genomewide Expression 
PLoS ONE  2008;3(9):e3112.
A key goal of systems biology is to understand how genomewide mRNA expression levels are controlled by transcription factors (TFs) in a condition-specific fashion. TF activity is frequently modulated at the post-translational level through ligand binding, covalent modification, or changes in sub-cellular localization. In this paper, we demonstrate how prior information about regulatory network connectivity can be exploited to infer condition-specific TF activity as a hidden variable from the genomewide mRNA expression pattern in the yeast Saccharomyces cerevisiae.
Methodology/Principal Findings
We first validate experimentally that by scoring differential expression at the level of gene sets or “regulons” comprised of the putative targets of a TF, we can accurately predict modulation of TF activity at the post-translational level. Next, we create an interactive database of inferred activities for a large number of TFs across a large number of experimental conditions in S. cerevisiae. This allows us to perform TF-centric analysis of the yeast regulatory network.
We analyze the degree to which the mRNA expression level of each TF is predictive of its regulatory activity. We also organize TFs into “co-modulation networks” based on their inferred activity profile across conditions, and find that this reveals functional and mechanistic relationships. Finally, we present evidence that the PAC and rRPE motifs antagonize TBP-dependent regulation, and function as core promoter elements governed by the transcription regulator NC2. Regulon-based monitoring of TF activity modulation is a powerful tool for analyzing regulatory network function that should be applicable in other organisms. Tools and results are available online at
PMCID: PMC2518834  PMID: 18769540
13.  Predicting functional transcription factor binding through alignment-free and affinity-based analysis of orthologous promoter sequences 
Bioinformatics  2008;24(13):i165-i171.
Motivation: The identification of transcription factor (TF) binding sites and the regulatory circuitry that they define is currently an area of intense research. Data from whole-genome chromatin immunoprecipitation (ChIP–chip), whole-genome expression microarrays, and sequencing of multiple closely related genomes have all proven useful. By and large, existing methods treat the interpretation of functional data as a classification problem (between bound and unbound DNA), and the analysis of comparative data as a problem of local alignment (to recover phylogenetic footprints of presumably functional elements). Both of these approaches suffer from the inability to model and detect low-affinity binding sites, which have recently been shown to be abundant and functional.
Results: We have developed a method that discovers functional regulatory targets of TFs by predicting the total affinity of each promoter for those factors and then comparing that affinity across orthologous promoters in closely related species. At each promoter, we consider the minimum affinity among orthologs to be the fraction of the affinity that is functional. Because we calculate the affinity of the entire promoter, our method is independent of local alignment. By comparing with functional annotation information and gene expression data in Saccharomyces cerevisiae, we have validated that this biophysically motivated use of evolutionary conservation gives rise to dramatic improvement in prediction of regulatory connectivity and factor–factor interactions compared to the use of a single genome. We propose novel biological functions for several yeast TFs, including the factors Snt2 and Stb4, for which no function has been reported. Our affinity-based approach towards comparative genomics may allow a more quantitative analysis of the principles governing the evolution of non-coding DNA.
Availability: The MatrixREDUCE software package is available from
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2718632  PMID: 18586710
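The idea of taking the minimum total promoter affinity across orthologs can be sketched as below. The position-specific affinity matrix, the sequences, and the function names are hypothetical. Total affinity is approximated here as the sum over all forward-strand windows of the product of per-position relative affinities, which is an assumption about the scoring rather than a statement of the published implementation.

```python
from math import prod

# Hypothetical 2-position affinity matrix: relative affinity per base,
# with the preferred base at each position set to 1.0.
PSAM = [
    {"A": 1.0, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 1.0, "G": 0.1, "T": 0.1},
]


def total_affinity(promoter, psam=PSAM):
    """Sum of window affinities over the whole promoter (forward strand only)."""
    width = len(psam)
    return sum(
        prod(col[base] for col, base in zip(psam, promoter[i:i + width]))
        for i in range(len(promoter) - width + 1)
    )


def conserved_affinity(orthologous_promoters, psam=PSAM):
    """Minimum total affinity among orthologs, taken as the functional fraction."""
    return min(total_affinity(p, psam) for p in orthologous_promoters)
```

Each promoter is scored as a whole, so the orthologous sequences never have to be aligned to one another, and weak sites still contribute to the total.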
14.  Global Chromatin Domain Organization of the Drosophila Genome 
PLoS Genetics  2008;4(3):e1000045.
In eukaryotes, neighboring genes can be packaged together in specific chromatin structures that ensure their coordinated expression. Examples of such multi-gene chromatin domains are well-documented, but a global view of the chromatin organization of eukaryotic genomes is lacking. To systematically identify multi-gene chromatin domains, we constructed a compendium of genome-scale binding maps for a broad panel of chromatin-associated proteins in Drosophila melanogaster. Next, we computationally analyzed this compendium for evidence of multi-gene chromatin domains using a novel statistical segmentation algorithm. We find that at least 50% of all fly genes are organized into chromatin domains, which often consist of dozens of genes. The domains are characterized by various known and novel combinations of chromatin proteins. The genes in many of the domains are coregulated during development and tend to have similar biological functions. Furthermore, during evolution fewer chromosomal rearrangements occur inside chromatin domains than outside domains. Our results indicate that a substantial portion of the Drosophila genome is packaged into functionally coherent, multi-gene chromatin domains. This has broad mechanistic implications for gene regulation and genome evolution.
Author Summary
Genes are packaged into chromatin by a variety of specialized proteins. Many different types of chromatin exist, and each may regulate gene expression in different ways. It was previously observed that neighboring genes are sometimes packaged together into a single type of chromatin, which can facilitate their coordinated regulation. However, it has been unclear whether such multi-gene chromatin domains are exceptional, or may occur more frequently. Here, we report a systematic analysis of genome-wide binding patterns of a large set of chromatin components in the fruit fly Drosophila melanogaster. Strikingly, we find that at least 50% of all genes in this organism are packaged together with several of their neighboring genes into a single type of chromatin. Each chromatin domain can include dozens of genes and can be made up of different combinations of chromatin proteins. We show that genes in each domain often have similar functions and are coordinately expressed during development. Moreover, we find that many of these multi-gene domains have been kept intact during evolution, indicating that they are important functional units. In summary, multi-gene chromatin domains are much more common than previously thought, and they are likely to play important roles in the orchestration of gene expression.
PMCID: PMC2274884  PMID: 18369463
15.  Identification of Synaptic Targets of Drosophila Pumilio 
PLoS Computational Biology  2008;4(2):e1000026.
Drosophila Pumilio (Pum) protein is a translational regulator involved in embryonic patterning and germline development. Recent findings demonstrate that Pum also plays an important role in the nervous system, both at the neuromuscular junction (NMJ) and in long-term memory formation. In neurons, Pum appears to play a role in homeostatic control of excitability via down regulation of para, a voltage gated sodium channel, and may more generally modulate local protein synthesis in neurons via translational repression of eIF-4E. Aside from these, the biologically relevant targets of Pum in the nervous system remain largely unknown. We hypothesized that Pum might play a role in regulating the local translation underlying synapse-specific modifications during memory formation. To identify relevant translational targets, we used an informatics approach to predict Pum targets among mRNAs whose products have synaptic localization. We then used both in vitro binding and two in vivo assays to functionally confirm the fidelity of this informatics screening method. We find that Pum strongly and specifically binds to RNA sequences in the 3′UTR of four of the predicted target genes, demonstrating the validity of our method. We then demonstrate that one of these predicted target sequences, in the 3′UTR of discs large (dlg1), the Drosophila PSD95 ortholog, can functionally substitute for a canonical NRE (Nanos response element) in vivo in a heterologous functional assay. Finally, we show that the endogenous dlg1 mRNA can be regulated by Pumilio in a neuronal context, the adult mushroom bodies (MB), which is an anatomical site of memory storage.
Author Summary
The Drosophila Pumilio (Pum) protein was originally identified as a translational control factor for embryo patterning. Subsequent studies have identified Pum's role in multiple biological processes, including the maintenance of germline stem cells, the proliferation and migration of primordial germ cells, olfactory learning and memory, and synaptic plasticity. Pum is highly conserved across phyla, i.e., from worm to human; however, the mRNA targets of Pum within each tissue and organism are largely unknown. On the other hand, the prediction of RNA binding sites remains a hard question in the computational field. We were interested in finding Pum targets in the nervous system using fruit flies as a model organism. To accomplish this, we used the few Pum binding sequences that had previously been shown in vivo as “training sequences” to construct bioinformatic models of the Pum binding site. We then predicted a few Pum mRNA targets among the genes known to function in neuronal synapses. We then used a combination of “gold standards” to verify these predictions: a biochemical assay called gel shifts, and in vivo functional assays both in embryo and neurons. With these approaches, we successfully confirmed one of the targets as Dlg, which is the Drosophila ortholog of human PSD95. Therefore, we present a complete story from computational study to real biological functions.
PMCID: PMC2265480  PMID: 18463699
16.  TransfactomeDB: a resource for exploring the nucleotide sequence specificity and condition-specific regulatory activity of trans-acting factors 
Nucleic Acids Research  2007;36(Database issue):D125-D131.
Accurate and comprehensive information about the nucleotide sequence specificity of trans-acting factors (TFs) is essential for computational and experimental analyses of gene regulatory networks. We present the Yeast Transfactome Database, a repository of sequence specificity models and condition-specific regulatory activities for a large number of DNA- and RNA-binding proteins in Saccharomyces cerevisiae. The sequence specificities in TransfactomeDB, represented as position-specific affinity matrices (PSAMs), are directly estimated from genomewide measurements of TF-binding using our previously published MatrixREDUCE algorithm, which is based on a biophysical model. For each mRNA expression profile in the NCBI Gene Expression Omnibus, we used sequence-based regression analysis to estimate the post-translational regulatory activity of each TF for which a PSAM is available. The trans-factor activity profiles across multiple experiments available in TransfactomeDB allow the user to explore potential regulatory roles of hundreds of TFs in any of thousands of microarray experiments. Our resource is freely available at
PMCID: PMC2238954  PMID: 17947326
17.  Dissecting complex transcriptional responses using pathway-level scores based on prior information 
BMC Bioinformatics  2007;8(Suppl 6):S6.
The genomewide pattern of changes in mRNA expression measured using DNA microarrays is typically a complex superposition of the response of multiple regulatory pathways to changes in the environment of the cells. The use of prior information, either about the function of the protein encoded by each gene, or about the physical interactions between regulatory factors and the sequences controlling its expression, has emerged as a powerful approach for dissecting complex transcriptional responses.
We review two different approaches for combining the noisy expression levels of multiple individual genes into robust pathway-level differential expression scores. The first is based on a comparison between the distribution of expression levels of genes within a predefined gene set and those of all other genes in the genome. The second starts from an estimate of the strength of genomewide regulatory network connectivities based on sequence information or direct measurements of protein-DNA interactions, and uses regression analysis to estimate the activity of gene regulatory pathways. The statistical methods used are explained in detail.
By avoiding the thresholding of individual genes, pathway-level analysis of differential expression based on prior information can be considerably more sensitive to subtle changes in gene expression than gene-level analysis. The methods are technically straightforward and yield results that are easily interpretable, both biologically and statistically.
PMCID: PMC1995543  PMID: 17903287
18.  Detecting transcriptionally active regions using genomic tiling arrays 
Genome Biology  2006;7(7):R59.
A new method for designing and integrating genomic tiling array data is described and applied to Anopheles and human arrays.
We have developed a method for interpreting genomic tiling array data, implemented as the program TranscriptionDetector. Probed loci expressed above background are identified by combining replicates in a way that makes minimal assumptions about the data. We performed medium-resolution Anopheles gambiae tiling array experiments and found extensive transcription of both coding and non-coding regions. Our method also showed improved detection of transcriptional units when applied to high-density tiling array data for ten human chromosomes.
PMCID: PMC1779562  PMID: 16859498
19.  Modeling gene expression control using Omes Law 
Molecular Systems Biology  2006;2:2006.0013.
PMCID: PMC1681487  PMID: 16738558
20.  T-profiler: scoring the activity of predefined groups of genes using gene expression data 
Nucleic Acids Research  2005;33(Web Server issue):W592-W595.
One of the key challenges in the analysis of gene expression data is how to relate the expression level of individual genes to the underlying transcriptional programs and cellular state. Here we describe T-profiler, a tool that uses the t-test to score changes in the average activity of predefined groups of genes. The gene groups are defined based on Gene Ontology categorization, ChIP-chip experiments, upstream matches to a consensus transcription factor binding motif or location on the same chromosome. If desired, an iterative procedure can be used to select a single, optimal representative from sets of overlapping gene groups. T-profiler makes it possible to interpret microarray data in a way that is both intuitive and statistically rigorous, without the need to combine experiments or choose parameters. Currently, gene expression data from Saccharomyces cerevisiae and Candida albicans are supported. Users can upload their microarray data for analysis on the web at .
PMCID: PMC1160244  PMID: 15980543
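The group-level scoring described in the T-profiler entry above can be illustrated with a minimal sketch. It assumes a pooled-variance two-sample t-statistic comparing the genes inside a predefined group with all other genes; the abstract does not state which t-test variant the tool uses, and the function name `group_t_score` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def group_t_score(expression, group):
    """Pooled-variance t-statistic: genes in `group` versus all other genes.

    `expression` maps gene name -> (log) expression value.
    `group` is a set of gene names (e.g. one Gene Ontology category).
    Assumption of this sketch: equal variances in both sets.
    """
    inside = [v for g, v in expression.items() if g in group]
    outside = [v for g, v in expression.items() if g not in group]
    if len(inside) < 2 or len(outside) < 2:
        raise ValueError("need at least two genes inside and outside the group")
    n_in, n_out = len(inside), len(outside)
    # Pooled sample variance across the two sets.
    pooled = ((n_in - 1) * variance(inside) + (n_out - 1) * variance(outside)) / (
        n_in + n_out - 2
    )
    return (mean(inside) - mean(outside)) / sqrt(pooled * (1 / n_in + 1 / n_out))


# Toy example with made-up values: three induced genes, three unchanged genes.
toy_expression = {"g1": 2.0, "g2": 2.2, "g3": 1.8, "g4": 0.1, "g5": -0.1, "g6": 0.0}
toy_group = {"g1", "g2", "g3"}
score = group_t_score(toy_expression, toy_group)
```

A large positive score indicates that the group as a whole is expressed above the rest of the genome, without applying a threshold to any individual gene.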
21.  Defining transcriptional networks through integrative modeling of mRNA expression and transcription factor binding data 
BMC Bioinformatics  2004;5:31.
Functional genomics studies are yielding information about regulatory processes in the cell at an unprecedented scale. In the yeast S. cerevisiae, DNA microarrays have not only been used to measure the mRNA abundance for all genes under a variety of conditions but also to determine the occupancy of all promoter regions by a large number of transcription factors. The challenge is to extract useful information about the global regulatory network from these data.
We present MA-Networker, an algorithm that combines microarray data for mRNA expression and transcription factor occupancy to define the regulatory network of the cell. Multivariate regression analysis is used to infer the activity of each transcription factor, and the correlation across different conditions between this activity and the mRNA expression of a gene is interpreted as regulatory coupling strength. Applying our method to S. cerevisiae, we find that, on average, 58% of the genes whose promoter region is bound by a transcription factor are true regulatory targets. These results are validated by an analysis of enrichment for functional annotation, response for transcription factor deletion, and over-representation of cis-regulatory motifs. We are able to assign directionality to transcription factors that control divergently transcribed genes sharing the same promoter region. Finally, we identify an intrinsic limitation of transcription factor deletion experiments related to the combinatorial nature of transcriptional control, to which our approach provides an alternative.
Our reliable classification of ChIP positives into functional and non-functional TF targets based on their expression pattern across a wide range of conditions provides a starting point for identifying the unknown sequence features in non-coding DNA that directly or indirectly determine the context dependence of transcription factor action. Complete analysis results are available for browsing or download at .
PMCID: PMC407845  PMID: 15113405
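The MA-Networker entry above interprets the correlation, across conditions, between an inferred transcription factor activity and a gene's mRNA expression as a regulatory coupling strength. The sketch below shows only that last step, assuming a plain Pearson correlation and assuming the activity profile has already been estimated by regression; the function name `coupling_strength` and the numbers are hypothetical.

```python
from math import sqrt


def coupling_strength(tf_activity, gene_expression):
    """Pearson correlation between a TF activity profile and a gene's
    expression profile, both measured across the same list of conditions."""
    if len(tf_activity) != len(gene_expression) or len(tf_activity) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(tf_activity)
    mean_a = sum(tf_activity) / n
    mean_e = sum(gene_expression) / n
    sxy = sum((a - mean_a) * (e - mean_e) for a, e in zip(tf_activity, gene_expression))
    sxx = sum((a - mean_a) ** 2 for a in tf_activity)
    syy = sum((e - mean_e) ** 2 for e in gene_expression)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Made-up profiles over four conditions.
activity = [0.1, 0.5, -0.3, 0.8]
follows_tf = [0.2, 1.0, -0.6, 1.6]     # tracks the activity exactly (scaled)
opposes_tf = [-0.1, -0.5, 0.3, -0.8]   # mirrors the activity
```

Under this reading, a bound gene whose expression tracks the factor's activity would be classified as a functional target, while a bound gene with a correlation near zero would not.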
22.  REDUCE: an online tool for inferring cis-regulatory elements and transcriptional module activities from microarray data 
Nucleic Acids Research  2003;31(13):3487-3490.
REDUCE is a motif-based regression method for microarray analysis. The only required inputs are (i) a single genome-wide set of absolute or relative mRNA abundances and (ii) the DNA sequence of the regulatory region associated with each gene that is probed. Currently supported organisms are yeast, worm and fly; it is an open question whether in its current incarnation our approach can be used for mouse or human. REDUCE uses unbiased statistics to identify oligonucleotide motifs whose occurrence in the regulatory region of a gene correlates with the level of mRNA expression. Regression analysis is used to infer the activity of the transcriptional module associated with each motif. REDUCE is available online at This web site provides functionality for the upload and management of microarray data. REDUCE analysis results can be viewed and downloaded, and optionally be shared with other users or made publicly accessible.
PMCID: PMC169192  PMID: 12824350
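The motif-based regression idea in the REDUCE entry above can be sketched for a single motif. This is a simplified illustration, not the published implementation: it assumes ordinary least squares with one motif, counts occurrences on one strand only, and uses hypothetical names (`count_motif`, `motif_activity`) and made-up sequences.

```python
def count_motif(sequence, motif):
    """Count occurrences of `motif` in `sequence`, allowing overlaps."""
    width = len(motif)
    return sum(
        1 for i in range(len(sequence) - width + 1) if sequence[i:i + width] == motif
    )


def motif_activity(promoters, expression, motif):
    """Fit expression = intercept + slope * motif_count by least squares.

    `promoters` maps gene -> regulatory-region sequence.
    `expression` maps gene -> (log-ratio) mRNA abundance.
    The slope plays the role of the activity associated with the motif.
    """
    genes = sorted(set(promoters) & set(expression))
    if len(genes) < 2:
        raise ValueError("need at least two genes with sequence and expression")
    counts = [count_motif(promoters[g], motif) for g in genes]
    values = [expression[g] for g in genes]
    n = len(genes)
    mean_c = sum(counts) / n
    mean_v = sum(values) / n
    sxx = sum((c - mean_c) ** 2 for c in counts)
    if sxx == 0:
        raise ValueError("motif count is identical in every gene")
    sxy = sum((c - mean_c) * (v - mean_v) for c, v in zip(counts, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_c


# Made-up promoters containing 1, 2 and 0 copies of the motif ACGCGT.
toy_promoters = {"a": "TTACGCGTAA", "b": "ACGCGTACGCGT", "c": "TTTTTTTT"}
toy_expression = {"a": 0.5, "b": 0.9, "c": 0.1}
slope, intercept = motif_activity(toy_promoters, toy_expression, "ACGCGT")
```

In the toy data each extra copy of the motif raises expression by the same amount, so the fitted slope recovers that per-copy effect.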
23.  Revisiting the codon adaptation index from a whole-genome perspective: analyzing the relationship between gene expression and codon occurrence in yeast using a variety of models 
Nucleic Acids Research  2003;31(8):2242-2251.
Highly expressed genes in many bacteria and small eukaryotes often have a strong compositional bias, in terms of codon usage. Two widely used numerical indices, the codon adaptation index (CAI) and the codon usage, use this bias to predict the expression level of genes. When these indices were first introduced, they were based on fairly simple assumptions about which genes are most highly expressed: the CAI was originally based on the codon composition of a set of only 24 highly expressed genes, and the codon usage on assumptions about which functional classes of genes are highly expressed in fast-growing bacteria. Given the recent advent of genome-wide expression data, we should be able to improve on these assumptions. Here, we measure, in yeast, the degree to which consideration of the current genome-wide expression data sets improves the performance of both numerical indices. Indeed, we find that by changing the parameterization of each model its correlation with actual expression levels can be somewhat improved, although both indices are fairly insensitive to the exact way they are parameterized. This insensitivity indicates a consistent codon bias amongst highly expressed genes. We also attempt direct linear regression of codon composition against genome-wide expression levels (and protein abundance data). This has some similarity with the CAI formalism and yields an alternative model for the prediction of expression levels based on the coding sequences of genes. More information is available at
PMCID: PMC153734  PMID: 12682375
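For readers unfamiliar with the codon adaptation index discussed in the entry above, the following sketch shows the standard construction: each codon receives a relative adaptiveness equal to its count in a reference set of highly expressed genes divided by the count of the most frequent synonymous codon, and a gene's CAI is the geometric mean of these weights over its codons. The synonymous-codon table here covers only three amino acids for brevity, the 0.5 pseudocount for unseen codons is an assumption of this sketch, and the reference sequences are made up.

```python
from collections import Counter
from math import exp, log

# Deliberately partial table (Lys, Glu, Phe); a real analysis needs all families.
SYNONYMS = {"K": ["AAA", "AAG"], "E": ["GAA", "GAG"], "F": ["TTT", "TTC"]}


def codons(seq):
    """Split a coding sequence into codons, ignoring any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes):
    """Weight of each codon relative to the most used synonymous codon."""
    counts = Counter(c for gene in reference_genes for c in codons(gene))
    weights = {}
    for family in SYNONYMS.values():
        if sum(counts[c] for c in family) == 0:
            continue  # amino acid absent from the reference set
        # Assumption: unseen codons get a pseudocount of 0.5 to avoid log(0).
        adjusted = {c: max(counts[c], 0.5) for c in family}
        top = max(adjusted.values())
        for c in family:
            weights[c] = adjusted[c] / top
    return weights


def cai(seq, weights):
    """Geometric mean of codon weights; codons without a weight are skipped."""
    logs = [log(weights[c]) for c in codons(seq) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return exp(sum(logs) / len(logs))


# Made-up reference set: AAG x3, AAA x1, GAA x2, GAG x1.
toy_weights = relative_adaptiveness(["AAGAAGAAGAAA", "GAAGAAGAG"])
```

Re-estimating the weights from genome-wide expression data instead of a small hand-picked reference set corresponds to the reparameterization the abstract describes.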
24.  Hap4p overexpression in glucose-grown Saccharomyces cerevisiae induces cells to enter a novel metabolic state 
Genome Biology  2002;4(1):R3.
To understand why overexpression of HAP4 is able to override the signals that normally result in glucose repression of mitochondrial function, changes that occur in these cells were analyzed.
Metabolic and regulatory gene networks generally tend to be stable. However, we have recently shown that overexpression of the transcriptional activator Hap4p in yeast causes cells to move to a state characterized by increased respiratory activity. To understand why overexpression of HAP4 is able to override the signals that normally result in glucose repression of mitochondrial function, we analyzed in detail the changes that occur in these cells.
Whole-genome expression profiling and fingerprinting of the regulatory activity network show that HAP4 overexpression provokes changes that also occur during the diauxic shift. Overexpression of HAP4, however, primarily acts on mitochondrial function and biogenesis. In fact, a number of nuclear genes encoding mitochondrial proteins are induced to a greater extent than in cells that have passed through a normal diauxic shift: in addition to genes required for mitochondrial energy conservation they include genes encoding mitochondrial ribosomal proteins.
We show that overproduction of a single nuclear transcription factor enables cells to move to a novel state that displays features typical of, but clearly not identical to, other derepressed states.
PMCID: PMC151284  PMID: 12537548
25.  Dissection of Transient Oxidative Stress Response in Saccharomyces cerevisiae by Using DNA Microarrays 
Molecular Biology of the Cell  2002;13(8):2783-2794.
Yeast cells were grown in glucose-limited chemostat cultures and forced to switch to a new carbon source, the fatty acid oleate. Alterations in gene expression were monitored using DNA microarrays combined with bioinformatics tools, among which was included the recently developed algorithm REDUCE. Immediately after the switch to oleate, a transient and very specific stress response was observed, followed by the up-regulation of genes encoding peroxisomal enzymes required for fatty acid metabolism. The stress response included up-regulation of genes coding for enzymes to keep thioredoxin and glutathione reduced, as well as enzymes required for the detoxification of reactive oxygen species. Among the genes coding for various isoenzymes involved in these processes, only a specific subset was expressed. Not the general stress transcription factors Msn2 and Msn4, but rather the specific factor Yap1p seemed to be the main regulator of the stress response. We ascribe the initiation of the oxidative stress response to a combination of poor redox flux and fatty acid-induced uncoupling of the respiratory chain during the metabolic reprogramming phase.
PMCID: PMC117942  PMID: 12181346
