1.  Uncovering a Macrophage Transcriptional Program by Integrating Evidence from Motif Scanning and Expression Dynamics 
PLoS Computational Biology  2008;4(3):e1000021.
Macrophages are versatile immune cells that can detect a variety of pathogen-associated molecular patterns through their Toll-like receptors (TLRs). In response to microbial challenge, the TLR-stimulated macrophage undergoes an activation program controlled by a dynamically inducible transcriptional regulatory network. Mapping a complex mammalian transcriptional network poses significant challenges and requires the integration of multiple experimental data types. In this work, we inferred a transcriptional network underlying TLR-stimulated murine macrophage activation. Microarray-based expression profiling and transcription factor binding site motif scanning were used to infer a network of associations between transcription factor genes and clusters of co-expressed target genes. The time-lagged correlation was used to analyze temporal expression data in order to identify potential causal influences in the network. A novel statistical test was developed to assess the significance of the time-lagged correlation. Several associations in the resulting inferred network were validated using targeted ChIP-on-chip experiments. The network incorporates known regulators and gives insight into the transcriptional control of macrophage activation. Our analysis identified a novel regulator (TGIF1) that may have a role in macrophage activation.
Author Summary
Macrophages play a vital role in host defense against infection by recognizing pathogens through pattern recognition receptors, such as the Toll-like receptors (TLRs), and mounting an immune response. Stimulation of TLRs initiates a complex transcriptional program in which induced transcription factor genes dynamically regulate downstream genes. Microarray-based transcriptional profiling has proved useful for mapping such transcriptional programs in simpler model organisms; however, mammalian systems present difficulties such as post-translational regulation of transcription factors, combinatorial gene regulation, and a paucity of available gene-knockout expression data. Additional evidence sources, such as DNA sequence-based identification of transcription factor binding sites, are needed. In this work, we computationally inferred a transcriptional network for TLR-stimulated murine macrophages. Our approach combined sequence scanning with time-course expression data in a probabilistic framework. Expression data were analyzed using the time-lagged correlation. A novel, unbiased method was developed to assess the significance of the time-lagged correlation. The inferred network of associations between transcription factor genes and co-expressed gene clusters was validated with targeted ChIP-on-chip experiments, and yielded insights into the macrophage activation program, including a potential novel regulator. Our general approach could be used to analyze other complex mammalian systems for which time-course expression data are available.
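The time-lagged correlation mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy expression profiles, and the choice of a Pearson correlation maximized over integer lags are all assumptions made for illustration, and the novel significance test described above is not reproduced.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def time_lagged_correlation(regulator, target, max_lag):
    """Return (lag, r) for the lag in 0..max_lag with the largest |r|.

    At lag L, the regulator at time t is compared with the target at time t + L,
    so a positive best lag means the regulator's profile leads the target's.
    """
    best_lag, best_r = 0, pearson(regulator, target)
    for lag in range(1, max_lag + 1):
        r = pearson(regulator[:-lag], target[lag:])
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, best_r

# Hypothetical profiles: the target repeats the regulator's pulse two time points later.
regulator_profile = [0, 1, 4, 2, 0, 0, 0, 0]
target_profile = [0, 0, 0, 1, 4, 2, 0, 0]
best_lag, best_r = time_lagged_correlation(regulator_profile, target_profile, max_lag=3)
```

A best lag greater than zero is only suggestive of a causal influence; with few time points, high lagged correlations arise by chance, which is why the abstract pairs the measure with a dedicated significance test.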
PMCID: PMC2265556  PMID: 18369420
2.  The phosphoproteome of toll-like receptor-activated macrophages 
First global and quantitative analysis of phosphorylation cascades induced by toll-like receptor (TLR) stimulation in macrophages identifies nearly 7000 phosphorylation sites and shows extensive and dynamic up-regulation and down-regulation after lipopolysaccharide (LPS). In addition to the canonical TLR-associated pathways, mining of the phosphorylation data suggests an involvement of ATM/ATR kinases in signalling and shows that the cytoskeleton is a hotspot of TLR-induced phosphorylation. Intersecting transcription factor phosphorylation with bioinformatic promoter analysis of genes induced by LPS identified several candidate transcriptional regulators that were previously not implicated in TLR-induced transcriptional control.
Toll-like receptors (TLR) are a family of pattern recognition receptors that enable innate immune cells to sense infectious danger. Recognition of microbial structures, like lipopolysaccharide (LPS) of Gram-negative bacteria by TLR4, causes within hours substantial re-programming of macrophage gene expression, including up-regulation of chemokines driving inflammation, anti-microbial effector molecules and cytokines directing adaptive immune responses. TLR signalling is initiated by the adapter protein Myd88 and leads to the activation of kinase cascades that result in activation of the MAPK and NFkB pathways. Phosphorylation has an essential role in these early steps of TLR signalling, and in addition regulates critical transcription factors (TFs). Although TLR signalling has been extensively studied, a comprehensive analysis of phosphorylation events in TLR-activated macrophages is lacking. It is therefore unknown whether the canonical MAPK and NFkB pathways comprise the main phosphorylation events and which other molecular functions and processes are regulated by phosphorylation after stimulation with LPS.
Recent progress in mass spectrometry-based proteomics has opened the possibility to quantitatively investigate global changes in protein abundance and post-translational modifications. Stable isotope labelling with amino acids in cell culture (SILAC) allows highly accurate quantification, and has proved especially useful for direct comparison of phosphopeptide abundance in time-course or treatment analyses.
Here, we adapted SILAC to primary mouse macrophages, and performed a global, quantitative and kinetic analysis of the macrophage phosphoproteome after LPS stimulation. Bioinformatic analyses were used to identify kinases, pathways and biological processes enriched in the LPS-regulated phosphoproteome. To connect TF phosphorylation with transcription, we generated a parallel dataset of nascent RNA and used in silico promoter analysis to identify transcriptional regulators with binding site enrichment among the LPS-regulated gene set.
After establishing SILAC conditions for efficient labelling of primary bone marrow-derived macrophages, we reproducibly identified 1850 phosphoproteins with a total of 6956 phosphorylation sites in two independent experiments. Phosphoproteins were detected from all cellular compartments, with a clear enrichment for nuclear and cytoskeleton-associated proteins. LPS caused major regulation of a large fraction of phosphopeptides, with 24% of all sites up-regulated and 9% down-regulated after stimulation (Figure 3A and B). These changes were highly dynamic, as the majority of the regulated phosphopeptides were up-regulated or down-regulated transiently or in a delayed manner (Figure 3C). Overall, the extent of changes in the phosphoproteome was comparable to the transcriptional re-programming, underscoring the importance of phosphorylation cascades in TLR signalling. Our parallel transcriptome data also showed that widespread phosphorylation precedes massive transcriptional changes.
To obtain footprints of kinase activation in response to TLR ligation, we searched phosphopeptide sequences for known linear sequence motifs of 33 kinases and identified kinase motifs enriched among LPS-regulated phosphorylation sites (compared to non-regulated phosphorylation sites) (Table I). Motif ERK/MAPK was highly enriched, in accordance with the essential role of the MAPK module in TLR signalling. Other kinases with motif enrichment have also recently been linked to TLR signalling (e.g. PKD; AKT and its targets GSK3 and mTOR). However, the DNA damage-activated kinases ATM/ATR and the cell cycle-associated kinases AURORA and CHK1/2 have not been associated with the macrophage response to TLR activation yet. These findings shed new light on older data on the effect of TLR on macrophage proliferation in response to macrophage colony stimulating factor. Of interest, in follow-up experiments using pharmacological inhibitors of the kinases with motif enrichment, we observed that inhibition of ATM kinase activity caused increased LPS-induced expression of several cytokines and chemokines, suggesting that this pathway regulates inflammatory responses.
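The comparison of motif frequency between LPS-regulated and non-regulated sites can be sketched as a one-sided enrichment test. The abstract does not say which statistic was used, so the hypergeometric (Fisher-type) test below is an assumption, as are the function name and the small illustrative counts; none of the numbers come from the study.

```python
from math import comb

def motif_enrichment_p(k_hits, n_regulated, n_motif, n_total):
    """One-sided hypergeometric p-value P(X >= k_hits).

    n_total     : all phosphorylation sites considered
    n_motif     : sites matching a given kinase motif
    n_regulated : sites classed as LPS-regulated
    k_hits      : regulated sites that also match the motif
    """
    denominator = comb(n_total, n_regulated)
    upper = min(n_regulated, n_motif)
    numerator = sum(
        comb(n_motif, i) * comb(n_total - n_motif, n_regulated - i)
        for i in range(k_hits, upper + 1)
    )
    return numerator / denominator

# Hypothetical toy counts: 10 sites, 4 match the motif, 3 are regulated,
# and all 3 regulated sites match the motif.
p_value = motif_enrichment_p(k_hits=3, n_regulated=3, n_motif=4, n_total=10)
```

When many motifs are tested, as with the 33 kinases here, the resulting p-values would normally be corrected for multiple testing before any motif is called enriched.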
In further bioinformatic analyses, the Gene Ontology and signalling pathway annotations of phosphoproteins were used to identify signalling pathways and cellular processes targeted by TLR4-controlled phosphorylation (Table II). Among the expected hits, based on the known TLR pathways, were TLR signalling, MAPK and AKT as well as mTOR signalling. Of interest, the annotation terms ‘Rho GTPase cycle' and ‘cytoskeleton' were significantly enriched among LPS-regulated phosphoproteins, indicating a more prominent role for cytoskeletal proteins in the transduction of TLR signals or in the biological response to it.
We were especially interested in the phosphorylation of TFs and its regulation by LPS (Figure 6A). We hypothesised that functionally important TFs should have an increased frequency of binding sites in the promoters of LPS-regulated genes (Figure 6B). To identify transcriptionally regulated genes with high sensitivity, we isolated nascent RNA after metabolic labelling (Figure 6C–E). In silico promoter scanning using Genomatix software for binding sites for all 50 TF families with phosphorylated members was used to test for enrichment in transcriptionally induced genes (Figure 6F). At the early time point, binding site enrichment for the canonical TLR-associated TF NFkB was detected, and in addition we found that several other TF families with an established role in the transcription of individual LPS-target genes showed binding site enrichment (CEBP, MEF2, NFAT and HEAT). In addition, enrichment for OCT and HOXC binding sites at the early time point and SORY matrices later after stimulation indicated an involvement of the phosphorylated members of the respective TF families in the execution of TLR-induced transcriptional responses. An initial test of the function for a few of these candidate transcriptional regulators was performed using siRNA knockdown in primary macrophages. These experiments suggested that knockdown of the SORY binding phosphoprotein Capicua homolog (Cic) and to a lesser extent of the CREB family member Atf7 selectively attenuates LPS-induced expression of Il1a and Il1b.
In summary, this study provides a novel and global perspective on innate immune activation by TLR signalling (Figure 5). We quantitatively detected a large number of previously unknown site-specific phosphorylation events, which are now publicly available through the Phosida database. By combining different data mining approaches, we consistently identified canonical and newly implicated TLR-activated signalling modules. In particular, the PI3K/AKT and the related mTOR pathway were highlighted; furthermore, DNA damage–response associated ATM/ATR kinases and the cytoskeleton emerged as unexpected hotspots for phosphorylation. Finally, weaving together corresponding phosphoproteome and nascent transcriptome datasets through the loom of in silico promoter analysis, we identified TFs with a likely role in mediating TLR-induced gene expression programmes.
Recognition of microbial danger signals by toll-like receptors (TLR) causes re-programming of macrophages. To investigate kinase cascades triggered by the TLR4 ligand lipopolysaccharide (LPS) on systems level, we performed a global, quantitative and kinetic analysis of the phosphoproteome of primary macrophages using stable isotope labelling with amino acids in cell culture, phosphopeptide enrichment and high-resolution mass spectrometry. In parallel, nascent RNA was profiled to link transcription factor (TF) phosphorylation to TLR4-induced transcriptional activation. We reproducibly identified 1850 phosphoproteins with 6956 phosphorylation sites, two thirds of which were not reported earlier. LPS caused major dynamic changes in the phosphoproteome (24% up-regulation and 9% down-regulation). Functional bioinformatic analyses confirmed canonical players of the TLR pathway and highlighted other signalling modules (e.g. mTOR, ATM/ATR kinases) and the cytoskeleton as hotspots of LPS-regulated phosphorylation. Finally, weaving together phosphoproteome and nascent transcriptome data by in silico promoter analysis, we implicated several phosphorylated TFs in primary LPS-controlled gene expression.
PMCID: PMC2913394  PMID: 20531401
macrophage; nascent RNA; phosphoproteome; SILAC; toll-like receptors
3.  Epigenetic programming during monocyte to macrophage differentiation and trained innate immunity 
Science (New York, N.Y.)  2014;345(6204):1251086.
Structured Abstract
Monocytes circulate in the bloodstream for up to 3–5 days. Concomitantly, immunological imprinting of either tolerance (immunosuppression) or trained immunity (innate immune memory) determines the functional fate of monocytes and monocyte-derived macrophages, as observed after infection or vaccination.
Purified circulating monocytes from healthy volunteers were differentiated under the homeostatic M-CSF concentrations present in human serum. During the first 24 hours, trained immunity was induced by β-glucan (BG) priming, while post-sepsis immunoparalysis was mimicked by exposure to LPS, generating endotoxin-induced tolerance. Epigenomic profiling of the histone marks H3K4me1, H3K4me3 and H3K27ac, DNase I accessibility and RNA sequencing were performed at both the start of the experiment (ex vivo monocytes) and at the end of the six days of in vitro culture (macrophages).
Compared to monocytes (Mo), naïve macrophages (Mf) display a remodeled metabolic enzyme repertoire and attenuated innate inflammatory pathways; most likely necessary to generate functional tissue macrophages. Epigenetic profiling uncovered ~8000 dynamic regions associated with ~11000 DNase I hypersensitive sites. Changes in histone acetylation identified most dynamic events. Furthermore, these regions of differential histone marks displayed some degree of DNase I accessibility that was already present in monocytes. H3K4me1 mark increased in parallel with de novo H3K27ac deposition at distal regulatory regions; H3K4me1 mark remained even after the loss of H3K27ac, marking decommissioned regulatory elements. β-glucan priming specifically induced ~3000 distal regulatory elements, whereas LPS-tolerization uniquely induced H3K27ac at ~500 distal regulatory regions.
At the transcriptional level, we identified co-regulated gene modules during monocyte to macrophage differentiation, as well as discordant modules between trained and tolerized cells. These indicate that training likely involves an increased expression of modules expressed in naïve macrophages, including genes that code for metabolic enzymes. On the other hand, endotoxin tolerance involves gene modules that are more active in monocytes than in naïve macrophages. About 12% of known human transcription factors display variation in expression during macrophage differentiation, training and tolerance. We also observed transcription factor motifs in DNase I hypersensitive sites at condition-specific dynamic epigenomic regions, implying that specific transcription factors are required for trained and tolerized macrophage epigenetic and transcriptional programs. Finally, our analyses and functional validation indicate that the inhibition of cAMP generation blocked trained immunity in vitro and during an in vivo model of lethal C. albicans infection, abolishing the protective effects of trained immunity.
We documented the importance of epigenetic regulation of the immunological pathways underlying monocyte-to-macrophage differentiation and trained immunity. These dynamic epigenetic elements may inform on potential pharmacological targets that modulate innate immunity. Altogether, we uncovered the epigenetic and transcriptional programs of monocyte differentiation to macrophages that distinguish tolerant and trained macrophage phenotypes, providing a resource to further understand and manipulate immune-mediated responses.
PMCID: PMC4242194  PMID: 25258085
4.  Epigenome-Guided Analysis of the Transcriptome of Plaque Macrophages during Atherosclerosis Regression Reveals Activation of the Wnt Signaling Pathway 
PLoS Genetics  2014;10(12):e1004828.
We report the first systems biology investigation of regulators controlling arterial plaque macrophage transcriptional changes in response to lipid lowering in vivo in two distinct mouse models of atherosclerosis regression. Transcriptome measurements from plaque macrophages from the Reversa mouse were integrated with measurements from an aortic transplant-based mouse model of plaque regression. Functional relevance of the genes detected as differentially expressed in plaque macrophages in response to lipid lowering in vivo was assessed through analysis of gene functional annotations, overlap with in vitro foam cell studies, and overlap of associated eQTLs with human atherosclerosis/CAD risk SNPs. To identify transcription factors that control plaque macrophage responses to lipid lowering in vivo, we used an integrative strategy – leveraging macrophage epigenomic measurements – to detect enrichment of transcription factor binding sites upstream of genes that are differentially expressed in plaque macrophages during regression. The integrated analysis uncovered eight transcription factor binding site elements that were statistically overrepresented within the 5′ regulatory regions of genes that were upregulated in plaque macrophages in the Reversa model under maximal regression conditions and within the 5′ regulatory regions of genes that were upregulated in the aortic transplant model during regression. Of these, the TCF/LEF binding site was present in promoters of upregulated genes related to cell motility, suggesting that the canonical Wnt signaling pathway may be activated in plaque macrophages during regression. We validated this network-based prediction by demonstrating that β-catenin expression is higher in regressing (vs. control group) plaques in both regression models, and we further demonstrated that stimulation of canonical Wnt signaling increases macrophage migration in vitro. 
These results suggest involvement of canonical Wnt signaling in macrophage emigration from the plaque during lipid lowering-induced regression, and they illustrate the discovery potential of an epigenome-guided, systems approach to understanding atherosclerosis regression.
Author Summary
Atherosclerosis, a progressive accumulation of lipid-rich plaque within arteries, is an inflammatory disease in which the response of macrophages (a key cell type of the innate immune system) to plasma lipoproteins plays a central role. In humans, the goal of significantly reducing already-established plaque through drug treatments, including statins, remains elusive. In mice, atherosclerosis can be reversed by experimental manipulations that lower circulating lipid levels. A common feature of many regression models is that macrophages transition to a less inflammatory state and emigrate from the plaque. While the molecular regulators that control these responses are largely unknown, we hypothesized that by integrating global measurements of macrophage gene expression in regressing plaques with measurements of the macrophage chromatin landscape, we could identify key molecules that control macrophage responses to the lowering of circulating lipid levels. Our systems biology analysis of plaque macrophages yielded a network in which the Wnt signaling pathway emerged as a candidate upstream regulator. Wnt signaling is known to affect both inflammation and the ability of macrophages to migrate from one location to another, and our targeted validation studies provide evidence that Wnt signaling is increased in plaque macrophages during regression. Our findings both demonstrate the power of a systems approach to uncover candidate regulators of regression and identify a potential new therapeutic target.
PMCID: PMC4256277  PMID: 25474352
5.  A Machine Learning Approach for Identifying Novel Cell Type–Specific Transcriptional Regulators of Myogenesis 
PLoS Genetics  2012;8(3):e1002531.
Transcriptional enhancers integrate the contributions of multiple classes of transcription factors (TFs) to orchestrate the myriad spatio-temporal gene expression programs that occur during development. A molecular understanding of enhancers with similar activities requires the identification of both their unique and their shared sequence features. To address this problem, we combined phylogenetic profiling with a DNA–based enhancer sequence classifier that analyzes the TF binding sites (TFBSs) governing the transcription of a co-expressed gene set. We first assembled a small number of enhancers that are active in Drosophila melanogaster muscle founder cells (FCs) and other mesodermal cell types. Using phylogenetic profiling, we increased the number of enhancers by incorporating orthologous but divergent sequences from other Drosophila species. Functional assays revealed that the diverged enhancer orthologs were active in largely similar patterns as their D. melanogaster counterparts, although there was extensive evolutionary shuffling of known TFBSs. We then built and trained a classifier using this enhancer set and identified additional related enhancers based on the presence or absence of known and putative TFBSs. Predicted FC enhancers were over-represented in proximity to known FC genes; and many of the TFBSs learned by the classifier were found to be critical for enhancer activity, including POU homeodomain, Myb, Ets, Forkhead, and T-box motifs. Empirical testing also revealed that the T-box TF encoded by org-1 is a previously uncharacterized regulator of muscle cell identity. Finally, we found extensive diversity in the composition of TFBSs within known FC enhancers, suggesting that motif combinatorics plays an essential role in the cellular specificity exhibited by such enhancers. 
In summary, machine learning combined with evolutionary sequence analysis is useful for recognizing novel TFBSs and for facilitating the identification of cognate TFs that coordinate cell type–specific developmental gene expression patterns.
Author Summary
The development of multicellular organisms requires the formation of a diversity of cell types. Each cell has a unique genetic program that is orchestrated by regulatory sequences called enhancers, comprising multiple short DNA sequences that bind distinct transcription factors. Understanding developmental regulatory networks requires knowledge of the sequence features of functionally related enhancers. We developed an integrated evolutionary and computational approach for deciphering enhancer regulatory codes and applied this method to discover new components of the transcriptional network controlling muscle development in the fruit fly, Drosophila melanogaster. Our method involves assembling known muscle enhancers, expanding this set with evolutionarily conserved sequences, computationally classifying these enhancers based on their shared sequence features, and scanning the entire Drosophila genome to predict additional related enhancers. Using this approach, we created a map of 5,500 putative muscle enhancers, identified candidate transcription factors to which they bind, observed a strong correlation between mapped enhancers and muscle gene expression, and uncovered extensive heterogeneity among combinations of transcription factor binding sites in validated muscle enhancers, a feature that may contribute to the individual cellular specificities of these regulatory elements. Our strategy can readily be generalized to study transcriptional networks in other organisms and developmental contexts.
PMCID: PMC3297574  PMID: 22412381
6.  Deciphering a transcriptional regulatory code: modeling short-range repression in the Drosophila embryo 
A well-defined set of transcriptional regulatory modules was created and analyzed in the Drosophila embryo. Fractional occupancy-based models were developed to explain the interaction of short-range transcriptional repressors with endogenous activators by using quantitative data from these modules. Our fractional occupancy-based modeling uncovered specific quantitative features of short-range repressors: a complex nonlinear quenching relationship, similar quenching efficiencies for different activators, and modest levels of cooperativity. The extension of the study to endogenous enhancers highlighted several features of enhancer architecture design in Drosophila embryos.
Transcriptional regulatory information, represented by patterns of protein-binding sites on DNA, comprises an important portion of genetic coding. Despite the abundance of genomic sequences now available, identifying and characterizing this information remain a major challenge. Minor changes in protein-binding sites can have profound effects on gene expression, and such changes have been shown to underlie important aspects of disease and evolution. Thus, an important aim in contemporary systems biology is to develop a global understanding of the transcriptional regulatory code, allowing prediction of gene output based on DNA sequence information. Recent studies have focused on endogenous transcriptional regulatory sequences (Janssens et al, 2006; Zinzen et al, 2006; Segal et al, 2008); however, distinct enhancers differ in many features, including transcription factor activity, spacing, and cooperativity, making it difficult to learn the effects of individual features and generalize them to other cis-regulatory elements. We have pursued a bottom up approach to understand the mechanistic processing of regulatory elements by the transcriptional machinery, using a well-defined and characterized set of repressors and activators in Drosophila blastoderm embryos. The study focuses on the Giant, Krüppel, Knirps, and Snail proteins, which have been characterized as short-range repressors, able to act locally to interfere with activator function (quenching) (Gray et al, 1994; Arnosti et al, 1996a). Such repressors have central functions in development.
The aim of our study was to enable ab initio predictions of enhancer function, given defined quantities of regulatory proteins and the sequence of the enhancer (Figure 1). We have generated a large quantitative data set using fluorescent confocal laser scanning microscopy to determine the inputs (Giant, Krüppel, and Knirps protein levels) and outputs (lacZ mRNA levels) of the regulatory elements introduced into Drosophila by transgenesis. We analyzed the effect of altering specific features of a set of related gene modules, designed to uncover critical aspects of repression, including quenching distance, cooperativity, and overall factor potency.
We generated specific descriptions for each regulatory element using fractional occupancy-based modeling and identified quantitative values for parameters affecting transcriptional regulation in vivo; these parameters were used to build and test the model. Through this process, we uncovered previously unknown features that allow correct predictions of regulation by short-range repressors, including a non-monotonic distance function for quenching, which implicates possible phasing effects, a modest contribution for repressor–repressor cooperativity, and similarity in repression of disparate activators.
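The idea behind fractional occupancy-based modeling can be illustrated with a deliberately simplified sketch. It assumes single-site equilibrium binding and a multiplicative quenching term; the function names, parameters, and the fixed `quenching_efficiency` are hypothetical. The model in the study additionally accounts for distance-dependent quenching and cooperativity, which are not represented here.

```python
def fractional_occupancy(concentration, affinity):
    """Equilibrium occupancy of a single binding site: K*c / (1 + K*c)."""
    kc = affinity * concentration
    return kc / (1.0 + kc)

def quenched_output(activator_conc, repressor_conc,
                    activator_affinity, repressor_affinity,
                    quenching_efficiency):
    """Toy enhancer output: activator occupancy scaled down by repressor quenching.

    quenching_efficiency is a hypothetical value in [0, 1]; in the study the
    strength of quenching depends on repressor-activator distance.
    """
    activator_bound = fractional_occupancy(activator_conc, activator_affinity)
    repressor_bound = fractional_occupancy(repressor_conc, repressor_affinity)
    return activator_bound * (1.0 - quenching_efficiency * repressor_bound)

# Illustrative values only: output without and with repressor present.
unrepressed = quenched_output(2.0, 0.0, 0.5, 1.0, 0.8)
repressed = quenched_output(2.0, 1.0, 0.5, 1.0, 0.8)
```

Fitting such a model to measured inputs (repressor protein levels) and outputs (reporter mRNA levels) is what yields quantitative parameter values of the kind the abstract describes.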
By applying these parameters to a model of the endogenous rhomboid enhancer, we uncovered novel insights into the architecture of this enhancer (Figure 8). Our study provides essential quantitative elements of a transcriptional regulatory code that will allow extensive analysis of genomic information in Drosophila melanogaster and related organisms. Extension of these predictive models should facilitate the development of more sophisticated computational algorithms for the identification and functional characterization of novel regulatory elements. The development of such quantitative modeling tools will change our understanding of the genome from essentially a parts list to a dynamically regulated system, and will greatly facilitate studies in disease, population genetics, and evolutionary biology.
Systems biology seeks a genomic-level interpretation of transcriptional regulatory information represented by patterns of protein-binding sites. Obtaining this information without direct experimentation is challenging; minor alterations in binding sites can have profound effects on gene expression, and underlie important aspects of disease and evolution. Quantitative modeling offers an alternative path to develop a global understanding of the transcriptional regulatory code. Recent studies have focused on endogenous regulatory sequences; however, distinct enhancers differ in many features, making it difficult to generalize to other cis-regulatory elements. We applied a systematic approach to simpler elements and present here the first quantitative analysis of short-range transcriptional repressors, which have central functions in metazoan development. Our fractional occupancy-based modeling uncovered unexpected features of these proteins' activity that allow accurate predictions of regulation by the Giant, Knirps, Krüppel, and Snail repressors, including modeling of an endogenous enhancer. This study provides essential elements of a transcriptional regulatory code that will allow extensive analysis of genomic information in Drosophila melanogaster and related organisms.
PMCID: PMC2824527  PMID: 20087339
Drosophila; enhancer; modeling; repression; transcription
7.  A Predictive Model of the Oxygen and Heme Regulatory Network in Yeast 
PLoS Computational Biology  2008;4(11):e1000224.
Deciphering gene regulatory mechanisms through the analysis of high-throughput expression data is a challenging computational problem. Previous computational studies have used large expression datasets in order to resolve fine patterns of coexpression, producing clusters or modules of potentially coregulated genes. These methods typically examine promoter sequence information, such as DNA motifs or transcription factor occupancy data, in a separate step after clustering. We needed an alternative and more integrative approach to study the oxygen regulatory network in Saccharomyces cerevisiae using a small dataset of perturbation experiments. Mechanisms of oxygen sensing and regulation underlie many physiological and pathological processes, and only a handful of oxygen regulators have been identified in previous studies. We used a new machine learning algorithm called MEDUSA to uncover detailed information about the oxygen regulatory network using genome-wide expression changes in response to perturbations in the levels of oxygen, heme, Hap1, and Co2+. MEDUSA integrates mRNA expression, promoter sequence, and ChIP-chip occupancy data to learn a model that accurately predicts the differential expression of target genes in held-out data. We used a novel margin-based score to extract significant condition-specific regulators and assemble a global map of the oxygen sensing and regulatory network. This network includes both known oxygen and heme regulators, such as Hap1, Mga2, Hap4, and Upc2, as well as many new candidate regulators. MEDUSA also identified many DNA motifs that are consistent with previous experimentally identified transcription factor binding sites. Because MEDUSA's regulatory program associates regulators to target genes through their promoter sequences, we directly tested the predicted regulators for OLE1, a gene specifically induced under hypoxia, by experimental analysis of the activity of its promoter. 
In each case, deletion of the candidate regulator resulted in the predicted effect on promoter activity, confirming that several novel regulators identified by MEDUSA are indeed involved in oxygen regulation. MEDUSA can reveal important information from a small dataset and generate testable hypotheses for further experimental analysis. Supplemental data are included.
Author Summary
The cell uses complex regulatory networks to modulate the expression of genes in response to changes in cellular and environmental conditions. The transcript level of a gene is directly affected by the binding of transcriptional regulators to DNA motifs in its promoter sequence. Therefore, both expression levels of transcription factors and other regulatory proteins as well as sequence information in the promoters contribute to transcriptional gene regulation. In this study, we describe a new computational strategy for learning gene regulatory programs from gene expression data based on the MEDUSA algorithm. We learn a model that predicts differential expression of target genes from the expression levels of regulators, the presence of DNA motifs in promoter sequences, and binding data for transcription factors. Unlike many previous approaches, we do not assume that genes are regulated in clusters, and we learn DNA motifs de novo from promoter sequences as an integrated part of our algorithm. We use MEDUSA to produce a global map of the yeast oxygen and heme regulatory network. To demonstrate that MEDUSA can reveal detailed information about regulatory mechanisms, we perform biochemical experiments to confirm the predicted regulators for an important hypoxia gene.
PMCID: PMC2573020  PMID: 19008939
8.  Construction and Modelling of an Inducible Positive Feedback Loop Stably Integrated in a Mammalian Cell-Line 
PLoS Computational Biology  2011;7(6):e1002074.
Understanding the relationship between topology and dynamics of transcriptional regulatory networks in mammalian cells is essential to elucidate the biology of complex regulatory and signaling pathways. Here, we characterised, via a synthetic biology approach, a transcriptional positive feedback loop (PFL) by generating a clonal population of mammalian cells (CHO) carrying a stable integration of the construct. The PFL network consists of the Tetracycline-controlled transactivator (tTA), whose expression is regulated by a tTA responsive promoter (CMV-TET), thus giving rise to a positive feedback. The same CMV-TET promoter also drives the expression of a destabilised yellow fluorescent protein (d2EYFP), so that the dynamic behaviour can be followed by time-lapse microscopy. The PFL network was compared to an engineered version of the network lacking the positive feedback loop (NOPFL), in which the tTA mRNA is expressed from a constitutive promoter. Doxycycline was used to repress tTA activation (switch off), and the resulting changes in fluorescence intensity for both the PFL and NOPFL networks were followed for up to 43 h. We observed a striking difference in the dynamics of the PFL and NOPFL networks. Using non-linear dynamical models, able to recapitulate experimental observations, we demonstrated a link between network topology and network dynamics. Namely, transcriptional positive autoregulation can significantly slow down the “switch off” times, as compared to the non-autoregulated system. Doxycycline concentration can modulate the response times of the PFL, whereas the NOPFL always switches off with the same dynamics. Moreover, the PFL can exhibit bistability for a range of Doxycycline concentrations. Since the PFL motif is often found in naturally occurring transcriptional and signaling pathways, we believe our work can be instrumental to characterise their behaviour.
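The contrast between the PFL and NOPFL topologies can be illustrated with a toy model. The sketch below is not the model used in the paper: it assumes a single variable per network, a Hill-type promoter response, doxycycline inactivating a fraction of tTA, and arbitrary parameter values (all names in `DEFAULT_PARAMS` are hypothetical). It also assumes that the constitutive tTA level in the NOPFL is lower than the feedback-amplified level reached in the PFL.

```python
# Toy comparison of switch-off times with and without positive feedback.
# All parameters are illustrative assumptions, not fitted values.
DEFAULT_PARAMS = {
    "alpha": 10.0,     # maximal production rate of the CMV-TET promoter
    "deg": 1.0,        # first-order degradation rate
    "basal": 0.01,     # leaky fraction of promoter activity
    "K": 1.0,          # active tTA level giving half-maximal activation
    "n": 2,            # Hill coefficient
    "theta": 1.0,      # doxycycline level that inactivates half of the tTA
    "tta_const": 2.0,  # constitutive tTA level in the NOPFL network
}

def hill(y, K, n):
    """Hill activation function."""
    return y ** n / (K ** n + y ** n)

def rate(x, feedback, dox, p):
    """dx/dt for the promoter product x (tTA and reporter in the PFL,
    reporter only in the NOPFL, where tTA is held constant)."""
    tta = x if feedback else p["tta_const"]
    active = tta * p["theta"] / (p["theta"] + dox)
    promoter = p["basal"] + (1.0 - p["basal"]) * hill(active, p["K"], p["n"])
    return p["alpha"] * promoter - p["deg"] * x

def integrate(x0, feedback, dox, p, t_end, dt=0.001):
    """Forward-Euler integration; returns the list of x values."""
    xs = [x0]
    for _ in range(int(t_end / dt)):
        xs.append(xs[-1] + dt * rate(xs[-1], feedback, dox, p))
    return xs

def half_off_time(feedback, dox, p, dt=0.001):
    """Time for x to fall to half its no-doxycycline steady state."""
    x0 = integrate(p["alpha"] / p["deg"], feedback, 0.0, p, 50.0, dt)[-1]
    xs = integrate(x0, feedback, dox, p, 50.0, dt)
    for i, x in enumerate(xs):
        if x <= 0.5 * x0:
            return i * dt
    return float("inf")

if __name__ == "__main__":
    for name, fb in (("PFL", True), ("NOPFL", False)):
        print(name, "half-off time:", half_off_time(fb, 9.0, DEFAULT_PARAMS))
```

With these assumed values the feedback network takes longer to reach half of its initial level, because the residual active tTA keeps driving its own promoter while it decays. The toy model does not reproduce the other reported features, such as the dose-independence of the NOPFL response.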
Author Summary
Synthetic Biology aims at designing and building new biological functions in living organisms. At the same time, Synthetic Biology approaches can be used to uncover the design principles of natural biological systems through the rational construction of simplified regulatory networks. Mathematical models of the networks are then derived from physical considerations and can be used to explain the observed dynamical behaviours. We have characterised a regulatory motif often found in transcriptional and signalling pathways. We constructed a positive feedback loop motif in mammalian cells, consisting of a protein controlling its own expression. We have shown that this motif exhibits a dynamic behaviour which is very different from that obtained when the autoregulation is removed. This difference is intrinsic to the specific wiring diagram chosen by the cell to control its behaviour (feedback versus non-feedback configurations), and can be instrumental in understanding the complex network of regulation occurring in a cell.
PMCID: PMC3127819  PMID: 21765813
9.  Global coordination of transcriptional control and mRNA decay during cellular differentiation 
We have systematically identified the targets of the Schizosaccharomyces pombe RNA-binding protein Meu5p, which is transiently induced during cellular differentiation. Meu5p-bound transcripts (>80) are expressed at low levels and have shorter half-lives in meu5 mutants, suggesting that Meu5p binding stabilizes its RNA targets. Most Meu5p targets are induced during differentiation by the activity of the Mei4p transcription factor. However, although most Mei4p targets display a sharp peak of expression, Meu5p targets are expressed for a longer period. In the absence of Meu5p, all Mei4p targets are expressed with similar kinetics (similar to non-Meu5p targets). Therefore, Meu5p determines the temporal profile of its targets. As the meu5 gene is itself a target of the transcription factor Mei4p, the RNA-binding protein Meu5p and their shared targets form a feed-forward loop (FFL), a network motif that is common in transcriptional networks. Our data highlight the importance of considering both transcriptional and posttranscriptional controls to understand dynamic changes in RNA levels, and provide insight into the structure of the regulatory networks that integrate transcription and RNA decay.
RNA levels are determined by the balance between RNA production (transcription) and degradation (decay or turnover). Therefore, cells can alter transcript levels by modulating either or both processes. Regulation of transcriptional initiation is one of the most common ways to regulate RNA levels. This function is frequently performed by transcription factors (TFs), which recognize specific sequence motifs on the promoters of their target genes and activate or repress their transcription. At the posttranscriptional level, RNA-binding proteins (RBPs) can bind to specific sequences on their target RNAs and regulate their rates of turnover.
RNA decay can be studied at the genome-wide level using microarrays or next-generation sequencing. The contribution of RNA turnover to transcript levels can be assessed by directly measuring decay rates. This is usually achieved by using microarrays to follow the decrease of RNA levels after inactivation of RNA polymerase II, or by in vivo labelling of newly synthesized RNA with modified nucleosides. These approaches can be applied to mutants in genes encoding RBPs, allowing the dissection of their specific functions in RNA turnover. Moreover, direct RBP targets can be identified by purifying RBP–RNA complexes, which are then analysed using microarrays (RIp-chip, for RBP Immunoprecipitation followed by analysis with DNA chips).
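As an illustration of how a decay rate can be read from such a time course, the sketch below assumes simple first-order decay after transcription is shut off, so that the level follows L(t) = L(0)·exp(−kt) and the half-life is ln 2 / k. The function name and the example numbers are hypothetical; the paper's own fitting procedure is not described here.

```python
import math

def estimate_half_life(times, levels):
    """Estimate an RNA half-life from levels measured after transcription
    shut-off, assuming first-order decay.

    Fits ln(level) against time by ordinary least squares; the slope is -k
    and the half-life is ln(2) / k. Levels must be positive.
    """
    logs = [math.log(v) for v in levels]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    sxx = sum((t - mean_t) ** 2 for t in times)
    k = -sxy / sxx
    if k <= 0:
        raise ValueError("levels do not decrease over time")
    return math.log(2) / k

if __name__ == "__main__":
    # Made-up time course (minutes) generated with a 20-minute half-life.
    times = [0, 10, 20, 40]
    levels = [100 * 0.5 ** (t / 20) for t in times]
    print(estimate_half_life(times, levels))
```

Comparing the half-life estimated this way in wild-type and mutant cells is one way to express a stabilizing effect of an RBP on its targets.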
Many biological processes involve the establishment of complex programs of gene expression, in which the levels of hundreds of mRNAs are dynamically regulated. Although the genome-wide function of TFs in these processes has been studied extensively, much less is known about the contribution of RBPs, and especially about how the activity of TFs and RBPs is coordinated. Sexual differentiation of the fission yeast Schizosaccharomyces pombe culminates in meiosis and sporulation and is driven by an extensive gene expression program during which ∼40% of the genome (∼2000 genes) is regulated in complex temporal patterns. Transcriptional control is essential for the implementation of this program, and TFs responsible for the induction of most groups of upregulated genes have been identified. In particular, a transcription factor called Mei4p, which is itself transiently expressed during the meiotic divisions, induces the temporary expression of over 500 genes.
Here, we use genome-wide approaches to investigate the function of the Meu5p RBP, which is transiently induced by the Mei4p TF during the meiotic divisions. RIp-chip experiments identified >80 transcripts bound to Meu5p during meiosis, most of which were also targets of the Mei4p transcription factor. In meu5 mutants, Meu5p targets are expressed at low levels and have shorter half-lives, indicating that Meu5p stabilizes the transcripts it binds to. This stabilization has biological importance, as cells without meu5 are defective in spore formation.
Although the majority of Mei4p TF targets reach their peak in expression levels with similar kinetics, we noticed that the timing of their downregulation was heterogeneous. We could identify two discrete groups among Mei4p targets: a set of mRNAs with short (∼1 h) and sharp gene expression profiles (early decrease), and a group that displayed a broader expression pattern, with high levels of expression for 2–3 h (late decrease).
Most Meu5p RBP targets belonged to the late-decrease group, suggesting a simple model in which Meu5p might stabilize its targets, thus extending the duration of their expression. To test this idea, we followed gene expression in synchronized cultures of wild-type and meu5Δ meiotic cells. Although the expression of early decrease genes was not affected by the absence of meu5, late-decrease genes switched their profile to a pattern similar to that of early decrease genes. As transcription of meu5 is under the control of Mei4p, the TF Mei4p, the RBP Meu5p, and their common targets form a so-called feed-forward loop, in which a protein regulates a target both directly and indirectly through a second protein. This arrangement is common in transcriptional and protein phosphorylation networks.
Our results serve as a paradigm of how the coordination of the action of TFs and RBPs determines how RNA levels are dynamically regulated.
The function of transcription in dynamic gene expression programs has been extensively studied, but little is known about how it is integrated with RNA turnover at the genome-wide level. We investigated these questions using the meiotic gene expression program of Schizosaccharomyces pombe. We identified over 80 transcripts that co-purify with the meiotic-specific Meu5p RNA-binding protein. Their levels and half-lives were reduced in meu5 mutants, demonstrating that Meu5p stabilizes its targets. Most Meu5p-bound RNAs were also targets of the Mei4p transcription factor, which induces the transient expression of ∼500 meiotic genes. Although many Mei4p targets showed sharp expression peaks, Meu5p targets had broad expression profiles. In the absence of meu5, all Mei4p targets were expressed with similar kinetics, indicating that Meu5p alters the global features of the gene expression program. As Mei4p activates meu5 transcription, Mei4p, Meu5p and their common targets form a feed-forward loop, a motif common in transcriptional networks but not studied in the context of mRNA decay. Our data provide insight into the topology of regulatory networks integrating transcriptional and posttranscriptional controls.
PMCID: PMC2913401  PMID: 20531409
mRNA decay; RIp-chip; posttranscriptional control
10.  Ab initio identification of human microRNAs based on structure motifs 
BMC Bioinformatics  2007;8:478.
MicroRNAs (miRNAs) are short, non-coding RNA molecules that are directly involved in post-transcriptional regulation of gene expression. The mature miRNA sequence binds to more or less specific target sites on the mRNA. Both their small size and sequence specificity make the detection of completely new miRNAs a challenging task. This cannot be based on sequence information alone, but requires structure information about the miRNA precursor. Unlike comparative genomics approaches, ab initio approaches are able to discover species-specific miRNAs without known sequence homology.
MiRPred is a novel method for ab initio prediction of miRNAs by genome scanning that only relies on (predicted) secondary structure to distinguish miRNA precursors from other similar-sized segments of the human genome. We apply a machine learning technique, called linear genetic programming, to develop special classifier programs which include multiple regular expressions (motifs) matched against the secondary structure sequence. Special attention is paid to scanning issues. The classifiers are trained on fixed-length sequences as these occur when shifting a window in regular steps over a genome region. Various statistical and empirical evidence is collected to validate the correctness of and increase confidence in the predicted structures. Among other things, we propose a new criterion to select miRNA candidates with a higher stability of folding that is based on the number of matching windows around their genome location. An ensemble of 16 motif-based classifiers achieves 99.9 percent specificity with sensitivity remaining on an acceptable high level when requiring all classifiers to agree on a positive decision. A low false positive rate is considered more important than a low false negative rate, when searching larger genome regions for unknown miRNAs. 117 new miRNAs have been predicted close to known miRNAs on human chromosome 19. All candidate structures match the free energy distribution of miRNA precursors which is significantly shifted towards lower free energies. We employed a human EST library and found that around 75 percent of the candidate sequences are likely to be transcribed, with around 35 percent located in introns.
Our motif finding method is at least competitive to state-of-the-art feature-based methods for ab initio miRNA discovery. In doing so, it requires less previous knowledge about miRNA precursor structures while programs and motifs allow a more straightforward interpretation and extraction of the acquired knowledge.
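The idea of matching regular expressions against a predicted secondary structure, and accepting a candidate only when every classifier agrees, can be sketched as follows. This is a simplified illustration, not MiRPred itself: the real classifiers are evolved by linear genetic programming, whereas here each hypothetical classifier is just a list of hand-written patterns over a dot-bracket string that must all match.

```python
import re

# Hypothetical motif-based classifiers over dot-bracket notation.
# Each classifier is a list of patterns that must all be found.
CLASSIFIERS = [
    # a run of at least 5 consecutive paired bases on each arm of a stem
    [re.compile(r"\({5,}"), re.compile(r"\){5,}")],
    # a hairpin loop of 3 to 15 unpaired bases closed by a base pair
    [re.compile(r"\(\.{3,15}\)")],
    # pairing begins within the first 10 positions of the window
    [re.compile(r"^\.{0,10}\(")],
]

def classifier_votes(structure):
    """Return one boolean per classifier for a dot-bracket string."""
    return [all(p.search(structure) for p in patterns)
            for patterns in CLASSIFIERS]

def ensemble_predict(structure):
    """Positive only if every classifier votes positive (unanimity)."""
    return all(classifier_votes(structure))

if __name__ == "__main__":
    for s in ("((((((((....))))))))", "..((..))..", "...................."):
        print(s, ensemble_predict(s))
```

Requiring unanimity trades sensitivity for specificity, which matches the stated preference for a low false positive rate when scanning large genome regions.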
PMCID: PMC2238772  PMID: 18088431
11.  Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas 
BMC Plant Biology  2013;13:42.
The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize.
A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize.
An online tool customized for promoter motif discovery in plants has been generated called Promzea. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.
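Nucleotide-level sensitivity and specificity of the kind reported above can be computed by comparing predicted and known site positions base by base. The sketch below uses the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP); whether the Promzea benchmark uses exactly these definitions is an assumption, and the function name and intervals are hypothetical.

```python
def nucleotide_metrics(length, known_sites, predicted_sites):
    """Nucleotide-level sensitivity and specificity for one sequence.

    Sites are half-open (start, end) intervals in 0-based coordinates.
    Returns (sensitivity, specificity); a value is None when its
    denominator is zero.
    """
    known, predicted = set(), set()
    for start, end in known_sites:
        known.update(range(start, end))
    for start, end in predicted_sites:
        predicted.update(range(start, end))

    tp = len(known & predicted)       # site bases that were predicted
    fn = len(known - predicted)       # site bases that were missed
    fp = len(predicted - known)       # predicted bases outside any site
    tn = length - tp - fn - fp        # remaining background bases

    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity

if __name__ == "__main__":
    # One 10-bp known site, and a prediction shifted by 5 bp.
    print(nucleotide_metrics(100, [(10, 20)], [(15, 25)]))
```

In a benchmark, the counts would normally be summed over all promoter sequences before the ratios are taken.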
PMCID: PMC3658923  PMID: 23497159
Promoter; cis-acting; Motif; Maize; Anthocyanin; Phlobaphene; Bioprospector; MEME; Weeder; C1; P
12.  Delineation of Diverse Macrophage Activation Programs in Response to Intracellular Parasites and Cytokines 
The ability to reside and proliferate in macrophages is characteristic of several infectious agents that are of major importance to public health, including the intracellular parasites Trypanosoma cruzi (the etiological agent of Chagas disease) and Leishmania species (etiological agents of Kala-Azar and cutaneous leishmaniasis). Although recent studies have elucidated some of the ways macrophages respond to these pathogens, the relationships between activation programs elicited by these pathogens and the macrophage activation programs elicited by bacterial pathogens and cytokines have not been delineated.
Methodology/Principal Findings
To provide a global perspective on the relationships between macrophage activation programs and to understand how certain pathogens circumvent them, we used transcriptional profiling by genome-wide microarray analysis to compare the responses of mouse macrophages following exposure to the intracellular parasites T. cruzi and Leishmania mexicana, the bacterial product lipopolysaccharide (LPS), and the cytokines IFNG, TNF, IFNB, IL-4, IL-10, and IL-17. We found that LPS induced a classical activation state that resembled macrophage stimulation by the Th1 cytokines IFNG and TNF. However, infection by the protozoan pathogen L. mexicana produced so few transcriptional changes that the infected macrophages were almost indistinguishable from uninfected cells. T. cruzi activated macrophages produced a transcriptional signature characterized by the induction of interferon-stimulated genes by 24 h post-infection. Despite this delayed IFN response by T. cruzi, the transcriptional response of macrophages infected by the kinetoplastid pathogens more closely resembled the transcriptional response of macrophages stimulated by the cytokines IL-4, IL-10, and IL-17 than macrophages stimulated by Th1 cytokines.
This study provides global gene expression data for a diverse set of biologically significant pathogens and cytokines and identifies the relationships between macrophage activation states induced by these stimuli. By comparing macrophage activation programs to pathogens and cytokines under identical experimental conditions, we provide new insights into how macrophage responses to kinetoplastids correlate with the overall range of macrophage activation states.
Author Summary
Macrophages are a type of immune cell that engulf and digest microorganisms. Despite their role in protecting the host from infection, many pathogens have developed ways to hijack the macrophage and use the cell for their own survival and proliferation. This includes the parasites Trypanosoma cruzi and Leishmania mexicana. In order to gain further understanding of how these pathogens interact with the host macrophage, we compared macrophages that have been infected with these parasites to macrophages that have been stimulated in a number of different ways. Macrophages can be activated by a wide variety of stimuli, including common motifs found on pathogens (known as pathogen associated molecular patterns or PAMPs) and cytokines secreted by other immune cells. In this study, we have delineated the relationships between the macrophage activation programs elicited by a number of cytokines and PAMPs. Furthermore, we have placed the macrophage responses to T. cruzi and L. mexicana into the context of these activation programs, providing a better understanding of the interactions between these pathogens and macrophages.
PMCID: PMC2846935  PMID: 20361029
13.  Identification of Yeast Transcriptional Regulation Networks Using Multivariate Random Forests 
PLoS Computational Biology  2009;5(6):e1000414.
The recent availability of whole-genome scale data sets that investigate complementary and diverse aspects of transcriptional regulation has spawned an increased need for new and effective computational approaches to analyze and integrate these large scale assays. Here, we propose a novel algorithm, based on random forest methodology, to relate gene expression (as derived from expression microarrays) to sequence features residing in gene promoters (as derived from DNA motif data) and transcription factor binding to gene promoters (as derived from tiling microarrays). We extend the random forest approach to model a multivariate response as represented, for example, by time-course gene expression measures. An analysis of the multivariate random forest output reveals complex regulatory networks, which consist of cohesive, condition-dependent regulatory cliques. Each regulatory clique features homogeneous gene expression profiles and common motifs or synergistic motif groups. We apply our method to several yeast physiological processes: cell cycle, sporulation, and various stress conditions. Our technique displays excellent performance with regard to identifying known regulatory motifs, including high order interactions. In addition, we present evidence of the existence of an alternative MCB-binding pathway, which we confirm using data from two independent cell cycle studies and two other physiological processes. Finally, we have uncovered elaborate transcription regulation refinement mechanisms involving PAC and mRRPE motifs that govern essential rRNA processing. These include intriguing instances of differing motif dosages and differing combinatorial motif control that promote regulatory specificity in rRNA metabolism under differing physiological processes.
Author Summary
Transcriptional regulation, one of the most complex and intriguing processes in living cells, drives essential downstream cellular processes such as development, proliferation and differentiation. It gives rise to the versatility and flexibility that allows cells to determine their actions and states in response to internal needs or external stimuli by turning on, or shutting off, select sets of genes. This elaborate control of gene expression is realized by sophisticated transcriptional regulatory networks that include a diverse repertoire of transcription factors. Here, we study the relationship between gene expression and transcription factor binding in diverse yeast physiological processes. Our random forest-based method effectively models gene expression measurements simultaneously, bypassing the necessity of analyzing the multiple samples separately. Using our method, we have identified many high-order interactions between regulatory sequences that give rise to condition-specific gene expression.
PMCID: PMC2691601  PMID: 19543377
14.  Cellular reprogramming by the conjoint action of ERα, FOXA1, and GATA3 to a ligand-inducible growth state 
Estrogen receptor α (ERα), FOXA1, and GATA3 form a functional enhanceosome in MCF-7 breast carcinoma cells that is significantly associated with active transcriptional features such as enhanced p300 co-activator and RNA Pol II recruitment as well as chromatin opening. The enhanceosome exerts significant impact and optimal transcriptional control in the regulation of E2-responsive genes. The presence of FOXA1 and GATA3 is indispensable in restoring the ERα growth-response machinery in the ERα-negative cells and recapitulating the appropriate expression cassette.
Estrogen receptor α (ERα) is a ligand-inducible hormone nuclear receptor that has important physiological and pathological roles in reproduction, cancer, and cardiovascular biology. Regulation by ERα involves its binding to DNA recognition sequences known as estrogen-response elements (EREs) and its recruitment of a variety of co-activators, corepressors, and chromatin remodeling enzymes to initiate transcription. In our previous (Lin et al, 2007) and recent (Joseph et al, 2010) studies, we identified high-confidence ERα binding sites in MCF-7 human mammary carcinoma cells. Using known-motif scanning and de novo motif detection, we found that FOXA1 and GATA3 motifs were commonly enriched around ERα binding sites. Moreover, numerous microarray studies have documented the co-expression of ERα, FOXA1, and GATA3 in primary breast tumors (Badve et al, 2007; Wilson and Giguere, 2008). This evidence suggests that these three transcription factors (TFs) may cluster on DNA binding sites and contribute to the breast cancer phenotype. However, there is little understanding as to the nature of their coordinated interaction at the genome level or the biological consequences of their detailed interaction.
We mapped the genome-wide binding profiles of ERα, FOXA1, and GATA3 using the massively parallel chromatin immunoprecipitation-sequencing (ChIP-seq) approach. We observed that ERα, FOXA1, and GATA3 colocalized in a coordinated manner, with ∼30% of all ERα binding sites overlapping FOXA1 and GATA3 binding upon estrogen (E2) stimulation. Moreover, we found that the ERα+FOXA1+GATA3 conjoint sites were associated with the highest p300 co-activator recruitment, RNA Pol II occupancy, and chromatin opening. These results indicate that the three TFs form a functional enhanceosome and cooperatively modulate the transcriptional networks previously ascribed to ERα alone. Such enhanceosome binding sites appear to regulate the genes driving core ERα function.
To further validate that ERα+FOXA1+GATA3 co-binding represents an optimal configuration for E2-mediated transcriptional activation, we have performed luciferase reporter assays on GREB1 locus that actively engages ERα enhanceosome sites in gene regulation (Figure 5C). The presence of ERα induced the GREB1 luciferase activity to ∼246% (as compared with the control construct). The individual presence of FOXA1 and GATA3 or combination of both only produced subtle changes to the GREB1 luciferase activity. The combination of ERα+FOXA1 and ERα+GATA3 has increased the luciferase activity to ∼330%. Interestingly, the assemblage of ERα+FOXA1+GATA3 provided the optimal ER responsiveness to 370%. This suggests that ERα provides the fundamental gene regulatory module but that FOXA1 and GATA3 incrementally improve ERα-regulated transcriptional induction.
It is known that ERα is a ligand-activated TF that mediates the proliferative effects of E2 in breast cancer cells. Garcia et al (1992) showed inhibited growth in MDA-MB-231 cells with forced expression of ERα upon E2 treatment. The rationale for these different outcomes has remained elusive. We posited that these higher order regulatory mechanisms of ERα function such as the formation and composition of enhanceosomes may explain the establishment of transcriptional regulatory cassettes favoring either growth enhancement or growth repression.
To test this hypothesis, we stably transfected the MDA-MB-231 cells with individual ERα, FOXA1, GATA3, or in combinations (Figure 6A). We observed inhibited growth in cells with enforced expression of ERα or FOXA1. There was unaltered growth in cells with expression of GATA3. Co-expression of ERα+FOXA1 or ERα+GATA3 exhibited inhibition of cell proliferation as compared with control cells. However, the co-expression of ERα together with FOXA1 and GATA3 resulted in marked induction of cell proliferation under E2 stimulation. We have recapitulated this cellular reprogramming in another ERα-negative breast cancer cell line, BT-549 and observed similar E2-responsive growth induction in the ERα+FOXA1+GATA3-expressing BT-549 cells. This suggests that only with the full activation of conjoint binding sites by the three TFs will the proliferative phenotype associated with ligand induced ERα be manifest.
To assess the nature of this transcriptional reprogramming, we asked the question if the reprogrammed MDA-MB-231 cells display any similarity in the expression profile of the ERα-positive breast cancer cell line, MCF-7 (Figure 6C). We combined the E2-regulated genes from these differently transfected MDA-MB-231 cells, and compared their expressions in these MDA-MB-231-transfected cells and MCF-7 cells. Strikingly, we found that the expression profiles of ERα+FOXA1+GATA3-expressing MDA-MB-231 cells display a good correlation (R=0.42) with the E2-induced expression profile of MCF-7. We did not observe such correlation between the expression profiles of MDA-MB-231 transfected with ERα only (R=−0.21). Furthermore, we observed that there is marginal induced expression of luminal marker genes and reduced expression of basal genes in the ERα+FOXA1+GATA3-expressing MDA-MB-231 as compared with the vector control cells. This suggests that the enhanceosome component is competent to partially reprogramme the basal cells to resemble the luminal cells.
Taken together, we have uncovered the genomics impact as well as the functional importance of an enhanceosome comprising ERα, FOXA1, and GATA3 in the estrogen responsiveness of ERα-positive breast cancer cells. This enhanceosome exerts significant combinatorial control of the transcriptional network regulating growth and proliferation of ERα-positive breast cancer cells. Most importantly, we show that the transfection of the enhanceosome component was necessary to reprogramme the ERα-negative cells to restore the estrogen-responsive growth and to transcriptionally induce a basal to luminal transition.
Despite the role of the estrogen receptor α (ERα) pathway as a key growth driver for breast cells, the phenotypic consequence of exogenous introduction of ERα into ERα-negative cells paradoxically has been growth inhibition. We mapped the binding profiles of ERα and its interacting transcription factors (TFs), FOXA1 and GATA3 in MCF-7 breast carcinoma cells, and observed that these three TFs form a functional enhanceosome that regulates the genes driving core ERα function and cooperatively modulate the transcriptional networks previously ascribed to ERα alone. We demonstrate that these enhanceosome occupied sites are associated with optimal enhancer characteristics with highest p300 co-activator recruitment, RNA Pol II occupancy, and chromatin opening. Most importantly, we show that the transfection of all three TFs was necessary to reprogramme the ERα-negative MDA-MB-231 and BT-549 cells to restore the estrogen-responsive growth resembling estrogen-treated ERα-positive MCF-7 cells. Cumulatively, these results suggest that all the enhanceosome components comprising ERα, FOXA1, and GATA3 are necessary for the full repertoire of cancer-associated effects of the ERα.
PMCID: PMC3202798  PMID: 21878914
enhanceosome; estrogen receptor α; FOXA1; GATA3; synthetic phenotypes
15.  A Novel Bayesian DNA Motif Comparison Method for Clustering and Retrieval 
PLoS Computational Biology  2008;4(2):e1000010.
Characterizing the DNA-binding specificities of transcription factors is a key problem in computational biology that has been addressed by multiple algorithms. These usually take as input sequences that are putatively bound by the same factor and output one or more DNA motifs. A common practice is to apply several such algorithms simultaneously to improve coverage at the price of redundancy. In interpreting such results, two tasks are crucial: clustering of redundant motifs, and attributing the motifs to transcription factors by retrieval of similar motifs from previously characterized motif libraries. Both tasks inherently involve motif comparison. Here we present a novel method for comparing and merging motifs, based on Bayesian probabilistic principles. This method takes into account both the similarity in positional nucleotide distributions of the two motifs and their dissimilarity to the background distribution. We demonstrate the use of the new comparison method as a basis for motif clustering and retrieval procedures, and compare it to several commonly used alternatives. Our results show that the new method outperforms other available methods in accuracy and sensitivity. We incorporated the resulting motif clustering and retrieval procedures in a large-scale automated pipeline for analyzing DNA motifs. This pipeline integrates the results of various DNA motif discovery algorithms and automatically merges redundant motifs from multiple training sets into a coherent annotated library of motifs. Application of this pipeline to recent genome-wide transcription factor location data in S. cerevisiae successfully identified DNA motifs in a manner that is as good as semi-automated analysis reported in the literature. Moreover, we show how this analysis elucidates the mechanisms of condition-specific preferences of transcription factors.
Author Summary
Regulation of gene expression plays a central role in the activity of living cells and in their response to internal (e.g., cell division) or external (e.g., stress) stimuli. Key players in determining gene-specific regulation are transcription factors that bind sequence-specific sites on the DNA, modulating the expression of nearby genes. To understand the regulatory program of the cell, we need to identify these transcription factors, when they act, and on which genes. Transcription regulatory maps can be assembled by computational analysis of experimental data, by discovering the DNA recognition sequences (motifs) of transcription factors and their occurrences along the genome. Such an analysis usually results in a large number of overlapping motifs. To reconstruct regulatory maps, it is crucial to combine similar motifs and to relate them to transcription factors. To this end we developed an accurate fully-automated method, termed BLiC, based upon an improved similarity measure for comparing DNA motifs. By applying it to genome-wide data in yeast, we identified the DNA motifs of transcription factors and their putative target genes. Finally, we analyze motifs of transcription factors that alter their target genes under different conditions, and show how cells adjust their regulatory program in response to environmental changes.
PMCID: PMC2265534  PMID: 18463706
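The idea described in entry 15, scoring two motifs by how similar their positional nucleotide distributions are and by how far the merged motif departs from background, can be sketched roughly as follows. This is a minimal illustration and not the published BLiC score: the Dirichlet prior `ALPHA`, the uniform `BACKGROUND`, the restriction to equal-length motifs without offsets, and all function names are assumptions introduced here.

```python
import math

BACKGROUND = (0.25, 0.25, 0.25, 0.25)  # assumed uniform A, C, G, T background
ALPHA = (1.0, 1.0, 1.0, 1.0)           # assumed Dirichlet prior pseudocounts


def log_marginal(counts, alpha=ALPHA):
    """Log marginal likelihood of one count column under a Dirichlet prior."""
    total_alpha = sum(alpha)
    value = math.lgamma(total_alpha) - math.lgamma(total_alpha + sum(counts))
    for count, a in zip(counts, alpha):
        value += math.lgamma(a + count) - math.lgamma(a)
    return value


def log_background(counts, background=BACKGROUND):
    """Log likelihood of one count column under the fixed background."""
    return sum(count * math.log(p) for count, p in zip(counts, background))


def column_score(col1, col2):
    """Reward columns that share a source and that differ from background."""
    merged = [a + b for a, b in zip(col1, col2)]
    common = log_marginal(merged)
    # Common source versus two independent sources.
    vs_independent = common - log_marginal(col1) - log_marginal(col2)
    # Common source versus the background distribution.
    vs_background = common - log_background(merged)
    return vs_independent + vs_background


def motif_score(motif1, motif2):
    """Sum column scores for two aligned motifs given as [A, C, G, T] counts."""
    if len(motif1) != len(motif2):
        raise ValueError("this sketch only compares motifs of equal length")
    return sum(column_score(a, b) for a, b in zip(motif1, motif2))
```

Under these assumptions, two copies of an informative motif score higher than two unrelated motifs, and merging for clustering would simply add the count columns of the best-scoring pair.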
16.  The cis-regulatory map of Shewanella genomes 
Nucleic Acids Research  2008;36(16):5376-5390.
While hundreds of microbial genomes are sequenced, the challenge remains to define their cis-regulatory maps. Here, we present a comparative genomic analysis of the cis-regulatory map of Shewanella oneidensis, an important model organism for bioremediation because of its extraordinary abilities to use a wide variety of metals and organic molecules as electron acceptors in respiration. First, from the experimentally verified transcriptional regulatory networks of Escherichia coli, we inferred 24 DNA motifs that are conserved in S. oneidensis. We then applied a new comparative approach on five Shewanella genomes that allowed us to systematically identify 194 nonredundant palindromic DNA motifs and corresponding regulons in S. oneidensis. Sixty-four percent of the predicted motifs are conserved in at least three of the seven newly sequenced and distantly related Shewanella genomes. In total, we obtained 209 unique DNA motifs in S. oneidensis that cover 849 unique transcription units. Besides conservation in other genomes, 77 of these motifs are supported by at least one additional type of evidence, including matching to known transcription factor binding motifs and significant functional enrichment or expression coherence of the corresponding target genes. Using the same approach on a more focused gene set, 990 differentially expressed genes derived from published microarray data of S. oneidensis during exposure to metal ions, we identified 31 putative cis-regulatory motifs (16 with at least one type of additional supporting evidence) that are potentially involved in the process of metal reduction. The majority (18/31) of those motifs had been found in our whole-genome comparative approach, further demonstrating that such an approach is capable of uncovering a large fraction of the regulatory map of a genome even in the absence of experimental data. 
The integrated computational approach developed in this study provides a useful strategy to identify genome-wide cis-regulatory maps and a novel avenue to explore the regulatory pathways for particular biological processes in bacterial systems.
PMCID: PMC2532739  PMID: 18701645
17.  Coordinated Cell Type–Specific Epigenetic Remodeling in Prefrontal Cortex Begins before Birth and Continues into Early Adulthood 
PLoS Genetics  2013;9(4):e1003433.
Development of prefrontal and other higher-order association cortices is associated with widespread changes in the cortical transcriptome, particularly during the transitions from prenatal to postnatal development, and from early infancy to later stages of childhood and early adulthood. However, the timing and longitudinal trajectories of neuronal gene expression programs during these periods remain unclear in part because of confounding effects of concomitantly occurring shifts in neuron-to-glia ratios. Here, we used cell type–specific chromatin sorting techniques for genome-wide profiling of a histone mark associated with transcriptional regulation—H3 with trimethylated lysine 4 (H3K4me3)—in neuronal chromatin from 31 subjects from the late gestational period to 80 years of age. H3K4me3 landscapes of prefrontal neurons were developmentally regulated at 1,157 loci, including 768 loci that were proximal to transcription start sites. Multiple algorithms consistently revealed that the overwhelming majority and perhaps all of developmentally regulated H3K4me3 peaks were on a unidirectional trajectory defined by either rapid gain or loss of histone methylation during the late prenatal period and the first year after birth, followed by similar changes but with progressively slower kinetics during early and later childhood and only minimal changes later in life. Developmentally downregulated H3K4me3 peaks in prefrontal neurons were enriched for Paired box (Pax) and multiple Signal Transducer and Activator of Transcription (STAT) motifs, which are known to promote glial differentiation. In contrast, H3K4me3 peaks subject to a progressive increase in maturing prefrontal neurons were enriched for activating protein-1 (AP-1) recognition elements that are commonly associated with activity-dependent regulation of neuronal gene expression. 
We uncovered a developmental program governing the remodeling of neuronal histone methylation landscapes in the prefrontal cortex from the late prenatal period to early adolescence, which is linked to cis-regulatory sequences around transcription start sites.
Author Summary
Prolonged maturation of the human cerebral cortex, which extends into the third decade of life, is critical for proper development of executive functions such as higher-order problem-solving and complex cognition. Little is known about changes of post-mitotic neurons during this prolonged maturation period, including changes in epigenetic regulation, and more broadly, in genome organization and function. Such knowledge is critical for a deeper understanding of human development, cognitive abilities, and psychiatric diseases. Here, we identify 1,157 genomic loci in neuronal cells from the prefrontal cortex that show developmental changes in a chromatin mark, histone H3 trimethylated at lysine 4 (H3K4me3), which has been associated with regulation of gene expression. Interestingly, the overwhelming majority of these developmentally regulated H3K4me3 peaks were defined by rapid gain or loss of histone methylation during the late prenatal period and the first year after birth, followed by slower changes during early and later childhood and minimal changes thereafter. The genomic sequences showing these dynamic changes in H3K4me3 were enriched with distinct transcription factor motifs. Our findings suggest that there is highly regulated, pre-programmed remodeling of neuronal histone methylation landscapes in the human brain that begins before birth and continues into adolescence.
PMCID: PMC3623761  PMID: 23593028
18.  Ab initio identification of putative human transcription factor binding sites by comparative genomics 
BMC Bioinformatics  2005;6:110.
Understanding transcriptional regulation of gene expression is one of the greatest challenges of modern molecular biology. A central role in this mechanism is played by transcription factors, which typically bind to specific, short DNA sequence motifs usually located in the upstream region of the regulated genes. We discuss here a simple and powerful approach for the ab initio identification of these cis-regulatory motifs. The method we present integrates several elements: human-mouse comparison, statistical analysis of genomic sequences and the concept of coregulation. We apply it to a complete scan of the human genome.
By using the catalogue of conserved upstream sequences collected in the CORG database we construct sets of genes sharing the same overrepresented motif (short DNA sequence) in their upstream regions both in human and in mouse. We perform this construction for all possible motifs from 5 to 8 nucleotides in length and then filter the resulting sets looking for two types of evidence of coregulation: first, we analyze the Gene Ontology annotation of the genes in the set, searching for statistically significant common annotations; second, we analyze the expression profiles of the genes in the set as measured by microarray experiments, searching for evidence of coexpression. The sets which pass one or both filters are conjectured to contain a significant fraction of coregulated genes, and the upstream motifs characterizing the sets are thus good candidates to be the binding sites of the TF's involved in such regulation.
In this way we find various known motifs and also some new candidate binding sites.
We have discussed a new integrated algorithm for the "ab initio" identification of transcription factor binding sites in the human genome. The method is based on three ingredients: comparative genomics, overrepresentation, different types of coregulation. The method is applied to a full-scan of the human genome, giving satisfactory results.
PMCID: PMC1097714  PMID: 15865625
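The set construction described in entry 18, grouping genes that share the same short motif in their upstream regions in both human and mouse, might look like the sketch below. It is an illustration under stated assumptions only: the `min_count` threshold is a crude stand-in for the statistical overrepresentation test, and the function names and input layout (dictionaries mapping a gene name to its upstream sequence) are hypothetical rather than taken from the paper or the CORG database.

```python
from collections import Counter, defaultdict


def kmer_counts(sequence, k):
    """Count every (overlapping) k-mer in one upstream sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def genes_sharing_motifs(human, mouse, k_min=5, k_max=8, min_count=2):
    """Map each motif to the genes carrying it in both species.

    human, mouse: dicts of gene name -> upstream sequence (orthologs share
    a name). A motif is kept for a gene only if it occurs at least
    min_count times in the human AND in the mouse sequence; this threshold
    is an assumed placeholder for a proper overrepresentation statistic.
    """
    gene_sets = defaultdict(set)
    for gene in human.keys() & mouse.keys():
        for k in range(k_min, k_max + 1):
            human_counts = kmer_counts(human[gene], k)
            mouse_counts = kmer_counts(mouse[gene], k)
            for motif, count in human_counts.items():
                if count >= min_count and mouse_counts[motif] >= min_count:
                    gene_sets[motif].add(gene)
    return dict(gene_sets)
```

The resulting gene sets would then be filtered for shared Gene Ontology annotation or coexpression, as the abstract describes; that filtering step is not shown here.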
19.  Cell-type specific analysis of translating RNAs in developing flowers reveals new levels of control 
Combining translating ribosome affinity purification with RNA-seq for cell-specific profiling of translating RNAs in developing flowers. Cell type comparisons of cell type-specific hormone responses, promoter motifs, coexpressed cognate binding factor candidates, and splicing isoforms. Widespread post-transcriptional regulation at both the intron splicing and translational stages. A new class of noncoding RNAs associated with polysomes.
What constitutes a differentiated cell type? How much do cell types differ in their transcription of genes? The development and functions of tissues rely on constant interactions among distinct and nonequivalent cell types. Answering these questions will require quantitative information on transcriptomes, proteomes, protein–protein interactions, protein–nucleic acid interactions, and metabolomes at cellular resolution. The systems approaches emerging in biology promise to explain properties of biological systems based on genome-wide measurements of expression, interaction, regulation, and metabolism. To facilitate a systems approach, it is essential first to capture such components in a global manner, ideally at cellular resolution.
Recently, microarray analysis of transcriptomes has been extended to a cellular level of resolution by using laser microdissection or fluorescence-activated sorting (for review, see Nelson et al, 2008). These methods have been limited by stresses associated with cellular separation and isolation procedures, and biases associated with mandatory RNA amplification steps. A newly developed method, translating ribosome affinity purification (TRAP; Zanetti et al, 2005; Heiman et al, 2008; Mustroph et al, 2009), circumvents these problems by epitope-tagging a ribosomal protein in specific cellular domains to selectively purify polysomes. We combined TRAP with deep sequencing, which we term TRAP-seq, to provide cell-level spatiotemporal maps for Arabidopsis early floral development at single-base resolution.
Flower development in Arabidopsis has been studied extensively and is one of the best understood aspects of plant development (for review, see Krizek and Fletcher, 2005). Genetic analysis of homeotic mutants established the ABC model, in which three classes of regulatory genes, A, B and C, work in a combinatorial manner to confer organ identities of four whorls (Coen and Meyerowitz, 1991). Each class of regulatory gene is expressed in a specific and evolutionarily conserved domain, and the action of the class A, B and C genes is necessary for specification of organ identity (Figure 1A).
Using TRAP-seq, we purified cell-specific translating mRNA populations, which we and others call the translatome, from the A, B and C domains of early developing flowers, in which floral patterning and the specification of floral organs is established. To achieve temporal specificity, we used a floral induction system to facilitate collection of early stage flowers (Wellmer et al, 2006). The combination of TRAP-seq with domain-specific promoters and this floral induction system enabled fine spatiotemporal isolation of translating mRNA in specific cellular domains, and at specific developmental stages.
Multiple lines of evidence confirmed the specificity of this approach, including detecting the expression in expected domains but not in other domains for well-studied flower marker genes and known physiological functions (Figures 1B–D and 2A–C). Furthermore, we provide numerous examples from flower development in which a spatiotemporal map of rigorously comparable cell-specific translatomes makes possible new views of the properties of cell domains not evident in data obtained from whole organs or tissues, including patterns of transcription and cis-regulation, new physiological differences among cell domains and between flower stages, putative hormone-active centers, and splicing events specific for flower domains (Figure 2A–D). Such findings may provide new targets for reverse genetics studies and may aid in the formulation and validation of interaction and pathway networks.
Besides cellular heterogeneity, the transcriptome is regulated at several steps through the life of mRNA molecules, and these steps are not directly accessible through traditional transcriptome profiling of total mRNA abundance. By comparing the translatome and transcriptome, we integratively profiled two key posttranscriptional control points, intron splicing and translation state. From our translatome-wide profiling, we (i) confirmed that both posttranscriptional regulation control points were used by a large portion of the transcriptome; (ii) identified a number of cis-acting features within the coding or noncoding sequences that correlate with splicing or translation state; and (iii) revealed correlation between each regulation mechanism and gene function. Our transcriptome-wide surveys have highlighted target genes whose transcripts are probably under extensive posttranscriptional regulation during flower development.
Finally, we reported the finding of a large number of polysome-associated ncRNAs. About one-third of all annotated ncRNA in the Arabidopsis genome were observed co-purified with polysomes. Coding capacity analysis confirmed that most of them are real ncRNA without conserved ORFs. The group of polysome-associated ncRNA reported in this study is a potential new addition to the expanding riboregulator catalog; they could have roles in translational regulation during early flower development.
Determining both the expression levels of mRNA and the regulation of its translation is important in understanding specialized cell functions. In this study, we describe both the expression profiles of cells within spatiotemporal domains of the Arabidopsis thaliana flower and the post-transcriptional regulation of these mRNAs, at nucleotide resolution. We express a tagged ribosomal protein under the promoters of three master regulators of flower development. By precipitating tagged polysomes, we isolated cell type-specific mRNAs that are probably translating, and quantified those mRNAs through deep sequencing. Cell type comparisons identified known cell-specific transcripts and uncovered many new ones, from which we inferred cell type-specific hormone responses, promoter motifs and coexpressed cognate binding factor candidates, and splicing isoforms. By comparing translating mRNAs with steady-state overall transcripts, we found evidence for widespread post-transcriptional regulation at both the intron splicing and translational stages. Sequence analyses identified structural features associated with each step. Finally, we identified a new class of noncoding RNAs associated with polysomes. Findings from our profiling lead to new hypotheses in the understanding of flower development.
PMCID: PMC2990639  PMID: 20924354
Arabidopsis; flower; intron; transcriptome; translation
20.  Dissecting the retinoid-induced differentiation of F9 embryonal stem cells by integrative genomics 
We reveal how the RXRα−RARγ heterodimer upon activation by ATRA sets up a sequence of temporally controlled events that generate different subsets of primary and secondarily induced gene networks. We established RARγ and RXRα chromatin immunoprecipitation (ChIP) analyses coupled with massive parallel sequencing (ChIP-seq) together with the corresponding microarray transcriptomics at five time points during differentiation using pan-RAR and RAR isotype-selective ligands. Gene-regulatory decisions were inferred in silico from the dynamic changes of the transcriptomics patterns that correlated with the expression of RXRα−RARγ and other annotated transcription factors (TFs). Our analysis provides a temporal view of retinoic acid (RA) signalling during F9 cell differentiation, reveals RA receptor (RAR) heterodimer dynamics and promiscuity, and predicts decisions that diversify the RA signal into distinct gene-regulatory programs.
Nuclear receptors are ligand-inducible transcription factors, which upon induction by their cognate ligand induce complex temporally controlled physiological programs. Retinoic acid (RA) and its receptors are key regulators of multiple physiological processes, including embryogenesis, organogenesis, immune functions, reproduction and organ homeostasis. While insight into (some of) the physiological functions of the various RA receptor (RAR) and retinoid X receptor (RXR) subtypes has been obtained by exploiting mouse genetics (for a review, see Mark et al, 2006) we are far from an understanding of the molecular circuitries and gene networks that are at the basis of these physiological events.
RAs act by interacting with a complex receptor system that comprises heterodimers formed by one of the three RXR (RXRα, β and γ) and one of the three RAR (RARα, β and γ) isotypes. While insight into the role of heterodimerization on response element preference and contribution of RAR and RXR to transcription activation of model genes has been obtained (for review, see Gronemeyer et al, 2004), very little is known about the role and dynamics of target gene interaction of the various RXR–RAR heterodimers at a global scale in the context of a biological program.
More fundamentally, in order to develop a systems biology of nuclear receptors we need to establish approaches that reveal how the initial event, the information embedded in the chemical structure of a small molecular weight compound, is propagated through binding to cognate receptor(s), recruitment of co-regulatory factors, epigenetic modulators and additional complexes/machineries to establish temporally controlled gene programs. In this respect, a recent study has revealed the impact of epigenetic modulator crosstalk in the setting up of subprograms for oestrogen receptor signalling (Ceschin et al, 2011).
In the present study, we have used mouse F9 EC cells, a homogeneous cell system which is known to differentiate upon RA exposure and require RARγ for this response (Taneja et al, 1996), in order to integrate at a genome-wide scale (i) the dynamics of RXRα and RARγ binding by chromatin immunoprecipitation (ChIP) analyses coupled with massive parallel sequencing (ChIP-seq), (ii) the correlated temporal regulation of gene programs by global transcriptomics analyses, including (iii) the response to isotype-selective RAR ligands (Box 1). Our study revealed an unexpected highly dynamic association of the RXRα–RARγ with target chromatin and an unexpected dynamics of the heterodimer composition itself, which is indicative of partner swapping.
Inspired by early works on the dynamics of Drosophila puffing patterns during ecdysone-induced metamorphosis (Ashburner et al, 1974) our working hypothesis was that diversification of gene programming is achieved by the sequential activation of separable gene cohorts that constitute the various facets of differentiation, such as altered proliferation, cell physiology, signalling and finally terminal apoptogenic differentiation. To identify these temporally activated subroutines within the overall program, we inferred gene-regulatory decisions in silico from dynamically altered global gene expression patterns that occurred due to the action of RXRα−RARγ and other annotated TFs (Ernst et al, 2007). This dynamic regulatory map was used to reconstruct RXRα–RARγ signalling networks by integration of functional co-citation. Altogether we present a genome-wide view of the temporal gene-regulatory events and the corresponding gene programs elicited by the RXRα–RARγ during F9 cell differentiation. Our study deciphers some of the mechanisms by which the chemical information encoded in RA is diversified to regulate different cohorts of genes.
Retinoic acid (RA) triggers physiological processes by activating heterodimeric transcription factors (TFs) comprising retinoic acid receptor (RARα, β, γ) and retinoid X receptor (RXRα, β, γ). How a single signal induces highly complex temporally controlled networks that ultimately orchestrate physiological processes is unclear. Using an RA-inducible differentiation model, we defined the temporal changes in the genome-wide binding patterns of RARγ and RXRα and correlated them with transcription regulation. Unexpectedly, both receptors displayed a highly dynamic binding, with different RXRα heterodimers targeting identical loci. Comparison of RARγ and RXRα co-binding at RA-regulated genes identified putative RXRα–RARγ target genes that were validated with subtype-selective agonists. Gene-regulatory decisions during differentiation were inferred from TF-target gene information and temporal gene expression. This analysis revealed six distinct co-expression paths of which RXRα–RARγ is associated with transcription activation, while Sox2 and Egr1 were predicted to regulate repression. Finally, RXRα–RARγ regulatory networks were reconstructed through integration of functional co-citations. Our analysis provides a dynamic view of RA signalling during cell differentiation, reveals RAR heterodimer dynamics and promiscuity, and predicts decisions that diversify the RA signal into distinct gene-regulatory programs.
This study provides a dynamic view of retinoic acid signalling during cell differentiation, reveals RAR/RXR heterodimer dynamics and promiscuity, and predicts decisions that diversify the RA signal into distinct gene-regulatory programs.
PMCID: PMC3261707  PMID: 21988834
ChIP-seq; retinoic acid-induced differentiation; RXR–RAR heterodimers; temporal control of gene networks; transcriptomics
21.  Robust Target Gene Discovery through Transcriptome Perturbations and Genome-Wide Enhancer Predictions in Drosophila Uncovers a Regulatory Basis for Sensory Specification 
PLoS Biology  2010;8(7):e1000435.
cisTargetX is a novel computational method that accurately predicts Atonal-governed regulatory networks in the retina of the fruit fly.
A comprehensive systems-level understanding of developmental programs requires the mapping of the underlying gene regulatory networks. While significant progress has been made in mapping a few such networks, almost all gene regulatory networks underlying cell-fate specification remain unknown and their discovery is significantly hampered by the paucity of generalized, in vivo validated tools of target gene and functional enhancer discovery. We combined genetic transcriptome perturbations and comprehensive computational analyses to identify a large cohort of target genes of the proneural and tumor suppressor factor Atonal, which specifies the switch from undifferentiated pluripotent cells to R8 photoreceptor neurons during larval development. Extensive in vivo validations of the predicted targets for the proneural factor Atonal demonstrate a 50% success rate of bona fide targets. Furthermore we show that these enhancers are functionally conserved by cloning orthologous enhancers from Drosophila ananassae and D. virilis in D. melanogaster. Finally, to investigate cis-regulatory cross-talk between Ato and other retinal differentiation transcription factors (TFs), we performed motif analyses and independent target predictions for Eyeless, Senseless, Suppressor of Hairless, Rough, and Glass. Our analyses show that cisTargetX identifies the correct motif from a set of coexpressed genes and accurately predicts target genes of individual TFs. The validated set of novel Ato targets exhibit functional enrichment of signaling molecules and a subset is predicted to be coregulated by other TFs within the retinal gene regulatory network.
Author Summary
Tens of thousands of regulatory elements determine the spatiotemporal expression pattern of protein-coding genes in the metazoan genome. Each regulatory element, when bound by the appropriate transcription factors, can affect the temporal transcription of a nearby target gene in a particular cell type. Annotating the genome for regulatory elements, as well as determining the input transcription factors for each element, is a key challenge in genome biology. In this study, we introduce a computational method, cisTargetX, that predicts transcription factor binding motifs and their target genes through the integration of gene expression data and comparative genomics. We first validate this method in silico using public gene expression data and then apply cisTargetX to the developmental program governing photoreceptor neuron specification in the retina of Drosophila melanogaster. Specifically, we perturbed predicted key transcription factors during the initial steps of neurogenesis; measured gene expression by microarrays; identified motifs and predicted target genes; validated the predictions in vivo using transgenic animals; and studied several functional and evolutionary aspects of the validated regulatory elements for the proneural factor Atonal. Overall, we show that cisTargetX efficiently predicts genetic regulatory interactions and provides mechanistic insight into gene regulatory networks of postembryonic developmental systems.
PMCID: PMC2910651  PMID: 20668662
22.  BRNI: Modular analysis of transcriptional regulatory programs 
BMC Bioinformatics  2009;10:155.
Transcriptional responses often consist of regulatory modules – sets of genes with a shared expression pattern that are controlled by the same regulatory mechanisms. Previous methods allow dissecting regulatory modules from genomics data, such as expression profiles, protein-DNA binding, and promoter sequences. In cases where physical protein-DNA data are lacking, such methods are essential for the analysis of the underlying regulatory program.
Here, we present a novel approach for the analysis of modular regulatory programs. Our method – Biochemical Regulatory Network Inference (BRNI) – is based on an algorithm that learns a biochemically-motivated regulatory program from expression data. It describes the expression profiles of gene modules consisting of hundreds of genes using a small number of regulators and affinity parameters. We developed an ensemble learning algorithm that ensures the robustness of the learned model. We then use the topology of the learned regulatory program to guide the discovery of a library of cis-regulatory motifs, and determine the motif compositions associated with each module.
We test our method on the cell cycle regulatory program of the fission yeast. We discovered 16 coherent modules, covering diverse processes from cell division to metabolism and associated them with 18 learned regulatory elements, including both known cell-cycle regulatory elements (MCB, Ace2, PCB, ACCCT box) and novel ones, some of which are associated with G2 modules. We integrate the regulatory relations from the expression- and motif-based models into a single network, highlighting specific topologies that result in distinct dynamics of gene expression in the fission yeast cell cycle.
Our approach provides a biologically-driven, principled way for deconstructing a set of genes into meaningful transcriptional modules and identifying their associated cis-regulatory programs. Our analysis sheds light on the architecture and function of the regulatory network controlling the fission yeast cell cycle, and a similar approach can be applied to the regulatory underpinnings of other modular transcriptional responses.
PMCID: PMC2694189  PMID: 19457258
23.  Single-cell and coupled GRN models of cell patterning in the Arabidopsis thaliana root stem cell niche 
BMC Systems Biology  2010;4:134.
Recent experimental work has uncovered some of the genetic components required to maintain the Arabidopsis thaliana root stem cell niche (SCN) and its structure. Two main pathways are involved. One pathway depends on the genes SHORTROOT and SCARECROW and the other depends on the PLETHORA genes, which have been proposed to constitute the auxin readouts. Recent evidence suggests that a regulatory circuit, composed of WOX5 and CLE40, also contributes to the SCN maintenance. Yet, we still do not understand how the niche is dynamically maintained and patterned or if the uncovered molecular components are sufficient to recover the observed gene expression configurations that characterize the cell types within the root SCN. Mathematical and computational tools have proven useful in understanding the dynamics of cell differentiation. Hence, to further explore root SCN patterning, we integrated available experimental data into dynamic Gene Regulatory Network (GRN) models and addressed if these are sufficient to attain observed gene expression configurations in the root SCN in a robust and autonomous manner.
We found that an SCN GRN model based only on experimental data did not reproduce the configurations observed within the root SCN. We developed several alternative GRN models that recover these expected stable gene configurations. Such models incorporate a few additional components and interactions in addition to those that have been uncovered. The recovered configurations are stable to perturbations, and the models are able to recover the observed gene expression profiles of almost all the mutants described so far. However, the robustness of the postulated GRNs is not as high as that of other previously studied networks.
These models are the first published approximations for a dynamic mechanism of the A. thaliana root SCN cellular patterning. Our model is useful to formally show that the data now available are not sufficient to fully reproduce root SCN organization and genetic profiles. We then highlight some experimental holes that remain to be studied and postulate some novel gene interactions. Finally, we suggest the existence of a generic dynamical motif that can be involved in both plant and animal SCN maintenance.
PMCID: PMC2972269  PMID: 20920363
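To make the notion of "stable gene configurations" in entry 23 concrete, the sketch below enumerates the fixed points of a small dynamic GRN. It assumes a synchronous Boolean formalism, which the abstract does not specify, and the two-gene mutual-inhibition network `TOY_RULES` with genes `X` and `Y` is a hypothetical toy example, not the root SCN network.

```python
from itertools import product


def fixed_points(rules):
    """Return every state that a synchronous Boolean update leaves unchanged.

    rules: dict of gene name -> function taking the current state (a dict of
    gene name -> bool) and returning that gene's next value.
    """
    genes = sorted(rules)
    found = []
    for values in product([False, True], repeat=len(genes)):
        state = dict(zip(genes, values))
        next_state = {gene: bool(rules[gene](state)) for gene in genes}
        if next_state == state:
            found.append(state)
    return found


# Hypothetical toy network: X and Y repress each other.
TOY_RULES = {
    "X": lambda state: not state["Y"],
    "Y": lambda state: not state["X"],
}
```

In this toy case the two fixed points are the states in which exactly one of the two genes is on. A model of the kind the abstract describes would be judged by whether its stable states match the expression configurations observed for each cell type.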
24.  Dynamic modeling of cis-regulatory circuits and gene expression prediction via cross-gene identification 
BMC Bioinformatics  2005;6:258.
Gene expression programs depend on recognition of cis elements in promoter region of target genes by transcription factors (TFs), but how TFs regulate gene expression via recognition of cis elements is still not clear. To study this issue, we define the cis-regulatory circuit of a gene as a system that consists of its cis elements and the interactions among their recognizing TFs and develop a dynamic model to study the functional architecture and dynamics of the circuit. This is in contrast to traditional approaches where a cis-regulatory circuit is constructed by a mutagenesis or motif-deletion scheme. We estimate the regulatory functions of cis-regulatory circuits using microarray data.
A novel cross-gene identification scheme is proposed to infer how multiple TFs coordinate to regulate gene transcription in the yeast cell cycle and to uncover hidden regulatory functions of a cis-regulatory circuit. Some advantages of this approach over most current methods are that it is based on data obtained from intact cis-regulatory circuits and that a dynamic model can quantitatively characterize the regulatory function of each TF and the interactions among the TFs. Our method may also be applicable to other genes if their expression profiles have been examined for a sufficiently long time.
In this study, we have developed a dynamic model to reconstruct cis-regulatory circuits and a cross-gene identification scheme to estimate the regulatory functions of the TFs that regulate the genes under study. We have applied this method to cell cycle genes because the available expression profiles for these genes are long enough. Our method can not only quantify the regulatory strengths and synergy of the TFs but also predict the expression profile of any gene having a subset of the cis elements studied.
PMCID: PMC1283971  PMID: 16232312
25.  Connectivity in the Yeast Cell Cycle Transcription Network: Inferences from Neural Networks 
PLoS Computational Biology  2006;2(12):e169.
A current challenge is to develop computational approaches to infer gene network regulatory relationships based on multiple types of large-scale functional genomic data. We find that single-layer feed-forward artificial neural network (ANN) models can effectively discover gene network structure by integrating global in vivo protein:DNA interaction data (ChIP/Array) with genome-wide microarray RNA data. We test this on the yeast cell cycle transcription network, which is composed of several hundred genes with phase-specific RNA outputs. These ANNs were robust to noise in data and to a variety of perturbations. They reliably identified and ranked 10 of 12 known major cell cycle factors at the top of a set of 204, based on a sum-of-squared weights metric. Comparative analysis of motif occurrences among multiple yeast species independently confirmed relationships inferred from ANN weights analysis. ANN models can capitalize on properties of biological gene networks that other kinds of models do not. ANNs naturally take advantage of patterns of absence, as well as presence, of factor binding associated with specific expression output; they are easily subjected to in silico “mutation” to uncover biological redundancies; and they can use the full range of factor binding values. A prominent feature of cell cycle ANNs suggested an analogous property might exist in the biological network. This postulated that “network-local discrimination” occurs when regulatory connections (here between MBF and target genes) are explicitly disfavored in one network module (G2), relative to others and to the class of genes outside the mitotic network. If correct, this predicts that MBF motifs will be significantly depleted from the discriminated class and that the discrimination will persist through evolution. Analysis of distantly related Schizosaccharomyces pombe confirmed this, suggesting that network-local discrimination is real and complements well-known enrichment of MBF sites in G1 class genes.
A current challenge is to develop computational approaches to infer gene network regulatory relationships by integrating multiple types of large-scale functional genomic data. This paper shows that simple artificial neural networks (ANNs) employed in a new way do this very well. The ANN models are well-suited to capitalize on natural properties of gene networks in ways that many previous methods do not. Resulting gene network connections inferred between transcription factors and RNA output patterns are robust to noise in large-scale input datasets and to differences in RNA clustering class inputs. This was shown by using the yeast cell cycle gene network as a test case. The cycle has multiple classes of oscillatory RNAs, and Hart, Mjolsness, and Wold show that the ANNs identify key connections that associate genes from each cell cycle phase group with known and candidate regulators. Comparative analysis of network connectivity across multiple genomes showed strong conservation of basic factor-to-output relationships, although at the greatest evolutionary distances the specific target genes have mainly changed identity.
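The ranking idea described in this entry (a single-layer feed-forward network whose inputs are factor-binding values and whose outputs are expression classes, with factors ranked by a sum-of-squared weights metric) can be illustrated with a minimal sketch. This is not the authors' implementation: the softmax output, the gradient-descent training loop, the function names `train_single_layer` and `rank_factors`, and the synthetic binding data are all assumptions made for illustration only.

```python
import numpy as np

def train_single_layer(X, y, n_classes, lr=0.5, epochs=500):
    """Train a single-layer softmax network (no hidden layer) by batch
    gradient descent. X: genes x factors, y: integer class per gene.
    The softmax/cross-entropy choice is an assumption for this sketch."""
    n_samples, n_factors = X.shape
    W = np.zeros((n_factors, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot encoding of class labels
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = (P - Y) / n_samples  # gradient of mean cross-entropy
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def rank_factors(W):
    """Score each input factor by the sum of its squared weights across
    all output classes, and return factor indices from highest to lowest."""
    scores = (W ** 2).sum(axis=1)
    return np.argsort(scores)[::-1], scores

# Hypothetical synthetic data: 300 "genes", 8 "factors", 3 expression classes.
# By construction only factors 0 and 1 carry information about the class.
rng = np.random.default_rng(0)
n_genes, n_factors, n_classes = 300, 8, 3
X = rng.normal(size=(n_genes, n_factors))
y = np.where(X[:, 0] > 0.5, 0, np.where(X[:, 1] > 0.0, 1, 2))

W, b = train_single_layer(X, y, n_classes)
order, scores = rank_factors(W)
print("factors ranked by sum of squared weights:", order.tolist())
```

In this toy setting the two informative factors should receive the largest scores, which mirrors, in a very simplified way, how known cell cycle regulators could rise to the top of a ranked list.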
PMCID: PMC1761652  PMID: 17194216

