Macrophages are versatile immune cells that can detect a variety of pathogen-associated molecular patterns through their Toll-like receptors (TLRs). In response to microbial challenge, the TLR-stimulated macrophage undergoes an activation program controlled by a dynamically inducible transcriptional regulatory network. Mapping a complex mammalian transcriptional network poses significant challenges and requires the integration of multiple experimental data types. In this work, we inferred a transcriptional network underlying TLR-stimulated murine macrophage activation. Microarray-based expression profiling and transcription factor binding site motif scanning were used to infer a network of associations between transcription factor genes and clusters of co-expressed target genes. The time-lagged correlation was used to analyze temporal expression data in order to identify potential causal influences in the network. A novel statistical test was developed to assess the significance of the time-lagged correlation. Several associations in the resulting inferred network were validated using targeted ChIP-on-chip experiments. The network incorporates known regulators and gives insight into the transcriptional control of macrophage activation. Our analysis identified a novel regulator (TGIF1) that may have a role in macrophage activation.
Macrophages play a vital role in host defense against infection by recognizing pathogens through pattern recognition receptors, such as the Toll-like receptors (TLRs), and mounting an immune response. Stimulation of TLRs initiates a complex transcriptional program in which induced transcription factor genes dynamically regulate downstream genes. Microarray-based transcriptional profiling has proved useful for mapping such transcriptional programs in simpler model organisms; however, mammalian systems present difficulties such as post-translational regulation of transcription factors, combinatorial gene regulation, and a paucity of available gene-knockout expression data. Additional evidence sources, such as DNA sequence-based identification of transcription factor binding sites, are needed. In this work, we computationally inferred a transcriptional network for TLR-stimulated murine macrophages. Our approach combined sequence scanning with time-course expression data in a probabilistic framework. Expression data were analyzed using the time-lagged correlation. A novel, unbiased method was developed to assess the significance of the time-lagged correlation. The inferred network of associations between transcription factor genes and co-expressed gene clusters was validated with targeted ChIP-on-chip experiments, and yielded insights into the macrophage activation program, including a potential novel regulator. Our general approach could be used to analyze other complex mammalian systems for which time-course expression data are available.
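The following minimal Python sketch illustrates the time-lagged correlation. It makes simplifying assumptions: both profiles are sampled at the same evenly spaced time points, and the lag is counted in sampling steps. The function names are illustrative only. The authors' significance test for the time-lagged correlation is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a flat profile; treated here as no association
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(regulator, target, lag):
    """Correlate regulator(t) with target(t + lag), with lag counted in sampling steps."""
    n = len(regulator)
    if lag < 0 or n - lag < 2:
        raise ValueError("need lag >= 0 and at least two overlapping time points")
    return pearson(regulator[: n - lag], target[lag:])


def best_lag(regulator, target, max_lag):
    """Return the (lag, correlation) pair with the largest absolute correlation."""
    scores = [(lag, lagged_correlation(regulator, target, lag)) for lag in range(max_lag + 1)]
    return max(scores, key=lambda pair: abs(pair[1]))


# Toy profiles: the target repeats the TF profile one time point later.
tf_profile = [0, 1, 4, 9, 4, 1, 0, 0]
target_profile = [0, 0, 1, 4, 9, 4, 1, 0]
print(best_lag(tf_profile, target_profile, max_lag=3))
```

A positive lag suggests only that the TF gene's expression change precedes the cluster's change. On its own, it does not establish causality.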
Gene expression programs depend on the recognition of cis elements in the promoter regions of target genes by transcription factors (TFs), but how TFs regulate gene expression through these cis elements is still unclear. To study this issue, we define the cis-regulatory circuit of a gene as a system consisting of its cis elements and the interactions among the TFs that recognize them. We then develop a dynamic model to study the functional architecture and dynamics of the circuit. This contrasts with traditional approaches, in which a cis-regulatory circuit is constructed by a mutagenesis or motif-deletion scheme. We estimate the regulatory functions of cis-regulatory circuits from microarray data.
A novel cross-gene identification scheme is proposed to infer how multiple TFs coordinate to regulate gene transcription in the yeast cell cycle and to uncover hidden regulatory functions of a cis-regulatory circuit. This approach has two advantages over most current methods. First, it uses data obtained from intact cis-regulatory circuits. Second, its dynamic model quantitatively characterizes the regulatory function of each TF and the interactions among the TFs. The method may also apply to other genes whose expression profiles have been measured over a sufficiently long time course.
In this study, we have developed a dynamic model to reconstruct cis-regulatory circuits and a cross-gene identification scheme to estimate the regulatory functions of the TFs that control the genes under study. We applied this method to cell cycle genes because sufficiently long expression time courses are available for them. Our method can quantify the regulatory strengths and synergy of the TFs. It can also predict the expression profile of any gene carrying a subset of the cis elements studied.
A major goal of systems biology is the characterization of transcription factors and microRNAs (miRNAs) and the transcriptional programs they regulate. We present Allegro, a method for de novo discovery of cis-regulatory transcriptional programs through joint analysis of genome-wide expression data and promoter or 3′ UTR sequences. The algorithm uses a novel log-likelihood-based, non-parametric model to describe the expression pattern shared by a group of co-regulated genes. We show that Allegro is more accurate and sensitive than existing techniques and can simultaneously analyze multiple expression datasets with more than 100 conditions. We apply Allegro to datasets from several species and report on the transcriptional modules it uncovers. Our analysis reveals a novel motif over-represented in the promoters of genes highly expressed in murine oocytes, as well as several new motifs related to fly development. Finally, using stem-cell expression profiles, we identify three miRNA families with pivotal roles in human embryogenesis.
Biological systems are integrated networks that constantly respond to internal and external stimuli. Understanding how a system responds intrinsically when thrown out of balance provides an opportunity to develop therapeutic approaches that restore its natural balanced state. Increasing evidence suggests that members of the nuclear receptor superfamily integrate both inflammatory and metabolic signals to maintain homeostasis in immune cells such as macrophages and lymphocytes. PPAR and LXR are nuclear receptors activated by fatty acid and cholesterol derivatives, respectively. They control the expression of an array of genes involved in lipid metabolism and inflammation. Recent studies have uncovered distinct mechanisms for the transcriptional regulation of metabolic and inflammatory target genes by PPAR and LXR. These studies have also expanded the biology of these receptors to include roles in alternative macrophage activation and adaptive immunity.
Investigating the complex systems dynamics of the aging process requires integrating a broad range of cellular processes. These include damage and functional decline, which co-exist with adaptive and protective regulatory mechanisms. We evolve an integrated generic cell network to represent the connectivity of key cellular mechanisms, structured into positive and negative feedback loop motifs that are centrally important for aging. The conceptual network is cast into a fuzzy-logic, hybrid-intelligent framework based on interaction rules assembled from a priori knowledge. Starting from a classical homeostatic representation of cellular energy metabolism, we first demonstrate how positive-feedback loops accelerate damage and decline, consistent with a vicious cycle. This model is iteratively extended into an adaptive response model by incorporating protective negative-feedback loop circuits. Time-lapse simulations of the adaptive response model show how transcriptional and translational changes, mediated by the stress sensors NF-κB and mTOR, counteract accumulating damage and dysfunction. They do so by modulating mitochondrial respiration, metabolic fluxes, biosynthesis, and autophagy, all processes crucial for cellular survival. Using a sensitivity analysis, the model also allows lifespan optimization scenarios to be explored with respect to fitness criteria. Our work establishes a novel, extendable and scalable computational approach capable of connecting tractable molecular mechanisms with the cellular network dynamics underlying the emerging aging phenotype.
The global process of aging disturbs a broad range of cellular mechanisms in complex ways and is not well understood. One important goal of computational approaches in aging is to develop integrated models that form a unifying aging theory and predict the progression of aging phenotypes from molecular mechanisms. However, current experimental data reflect many isolated processes in a fragmented way. These data come from a large diversity of approaches, biological model systems, and species, which makes such integration challenging. To help close this gap, we iteratively develop a fuzzy-logic cell systems model. The model captures the interplay of damage, metabolism, and signaling through positive and negative feedback-loop motifs, using relationships drawn from the literature. Because cellular biodynamics can be regarded as a complex control system, this approach seems particularly suitable. Here, we demonstrate that rule-based fuzzy-logic models provide semi-quantitative predictions. These predictions enhance our understanding of complex, interlocked molecular mechanisms and their implications for the aging physiome.
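The following is a minimal Python sketch of rule-based fuzzy inference with competing positive and negative feedback. It is not the authors' framework or rule base. The fuzzy sets, the three rules, and the names `MEMBERSHIP`, `RULES`, `infer`, and `simulate` are all hypothetical. The sketch uses zero-order Sugeno inference (fuzzy AND as minimum, output as the firing-strength-weighted mean), which is one common simplification of fuzzy-logic modeling.

```python
# Hypothetical fuzzy sets on a normalized [0, 1] scale.
MEMBERSHIP = {
    "LOW": lambda x: 1.0 - x,
    "HIGH": lambda x: x,
}

# Hypothetical rule base: (damage set or None for "any", protection set, rate of change of damage).
RULES = [
    ("HIGH", "LOW", +1.0),  # positive feedback: existing damage accelerates further damage
    ("LOW", "LOW", +0.2),   # slow basal accumulation of damage
    (None, "HIGH", -0.5),   # negative feedback: a protective response removes damage
]


def infer(damage, protection):
    """Zero-order Sugeno inference: AND = min, output = firing-strength-weighted mean."""
    num = den = 0.0
    for damage_set, protection_set, rate in RULES:
        strength = MEMBERSHIP[protection_set](protection)
        if damage_set is not None:
            strength = min(strength, MEMBERSHIP[damage_set](damage))
        num += strength * rate
        den += strength
    return num / den if den > 0 else 0.0


def simulate(protection, damage=0.2, steps=50, dt=0.1):
    """Euler-step the damage level under a fixed protection level, clamped to [0, 1]."""
    for _ in range(steps):
        damage = min(1.0, max(0.0, damage + dt * infer(damage, protection)))
    return damage


print(simulate(protection=0.0), simulate(protection=1.0))
```

Without protection, the positive-feedback rule drives damage to its ceiling (a vicious cycle). Full protection drives it to zero.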
Deciphering the non-coding regulatory genome has proved a formidable challenge. Despite the wealth of available gene expression data, there currently exists no broadly applicable method for characterizing the regulatory elements that shape the rich underlying dynamics. We present a general framework for detecting such regulatory DNA and RNA motifs that relies on directly assessing the mutual information between sequence and gene expression measurements. Our approach makes minimal assumptions about the background sequence model and the mechanisms by which elements affect gene expression. This provides a versatile motif discovery framework, across all data types and genomes, with exceptional sensitivity and near-zero false-positive rates. Applications from yeast to human uncover putative and established transcription-factor binding and miRNA target sites, revealing rich diversity in their spatial configurations, pervasive co-occurrences of DNA and RNA motifs, context-dependent selection for motif avoidance, and the strong impact of post-transcriptional processes on eukaryotic transcriptomes.
cis-regulatory element discovery; transcription factor binding sites; miRNA regulation; computational genomics; transcriptional regulation; post-transcriptional regulation; information theory; mutual information; motif discovery
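The core quantity here is the mutual information between a sequence feature and expression. The following Python sketch computes it between the presence of an exact motif string and rank-binned expression values. It is a simplification: the method described above also handles degenerate motifs and assesses significance, and neither is shown here. The function names and the `n_bins` default are illustrative assumptions.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete variables given as paired samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def quantile_bins(values, n_bins):
    """Assign each value to one of n_bins (approximately) equally populated bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def motif_information(sequences, expression, motif, n_bins=5):
    """MI between presence of an exact motif string and binned expression."""
    presence = [motif in seq for seq in sequences]
    return mutual_information(presence, quantile_bins(expression, n_bins))


# Toy data: the motif occurs only in the two most highly expressed genes.
seqs = ["AATGACTCA", "TGACTCAGG", "AAAAAA", "CCCCCC"]
expr = [5.0, 6.0, 1.0, 2.0]
print(motif_information(seqs, expr, "TGACTCA", n_bins=2))
```

Because no background sequence model enters the calculation, the same score applies to promoter or 3′ UTR sequences and to any type of expression measurement.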
Transcriptional regulation is an important part of regulatory control in eukaryotes. Even if binding motifs for transcription factors are known, the task of finding binding sites by scanning sequences is plagued by false positives. One way to improve the detection of binding sites from motifs is by taking cooperativity of transcription factor binding into account. We propose a non-parametric probabilistic model, similar to a document topic model, for detecting transcriptional programs, groups of cooperative transcription factors and co-regulated genes. The analysis results in transcriptional programs which generalise both transcriptional modules and TF-target gene incidence matrices and provide a higher-level summary of these structures. The method is independent of prior specification of training sets of genes, for example, via gene expression data. The analysis is based on known binding motifs.
We applied our method to putative regulatory regions of 18,445 Mus musculus genes. We discovered just 68 transcriptional programs that effectively summarised the action of 149 transcription factors on these genes. Several of these programs were significantly enriched for known biological processes and signalling pathways. One transcriptional program has a significant overlap with a reference set of cell cycle specific transcription factors.
Our method is able to pick out higher order structure from noisy sequence analyses. The transcriptional programs it identifies potentially represent common mechanisms of regulatory control across the genome. It simultaneously predicts which genes are co-regulated and which sets of transcription factors cooperate to achieve this co-regulation. The programs we discovered enable biologists to choose new genes and transcription factors to study in specific transcriptional regulatory systems.
Current experimental evidence indicates that functionally related genes show coordinated expression in order to perform their cellular functions. In this way, the cell transcriptional machinery can respond optimally to internal or external stimuli. This provides a research opportunity to identify and study co-expressed gene modules whose transcription is controlled by shared gene regulatory networks.
We developed and integrated a set of computational methods of differential gene expression analysis, gene clustering, gene network inference, gene function prediction, and DNA motif identification to automatically identify differentially co-expressed gene modules, reconstruct their regulatory networks, and validate their correctness. We tested the methods using microarray data derived from soybean cells grown under various stress conditions. Our methods were able to identify 42 coherent gene modules within which average gene expression correlation coefficients are greater than 0.8 and reconstruct their putative regulatory networks. A total of 32 modules and their regulatory networks were further validated by the coherence of predicted gene functions and the consistency of putative transcription factor binding motifs. Approximately half of the 32 modules were partially supported by the literature, which demonstrates that the bioinformatic methods used can help elucidate the molecular responses of soybean cells upon various environmental stresses.
The bioinformatics methods and genome-wide data sources for gene expression, clustering, regulation, and function analysis were integrated seamlessly into one modular protocol. The protocol systematically analyzes and infers modules and networks from differentially expressed genes only, in soybean cells grown under stress conditions. Our approach appears to effectively reduce the complexity of the problem. It is also sufficiently robust and accurate to generate a fairly complete and detailed view of the putative soybean transcriptional logic that may underlie the responses to the various environmental challenges. The same automated method can also be applied to reconstruct differentially co-expressed gene modules and their regulatory networks from the gene expression data of any other transcriptome.
Gene co-expression module; Gene regulatory network; Transcription factor; Microarray; Soybean
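The coherence criterion above (average within-module correlation greater than 0.8) can be checked with the short Python sketch below. The module contents are hypothetical toy profiles, and the function names are illustrative. The sketch does not reproduce how the modules themselves were obtained by clustering.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a flat profile; treated here as no association
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_correlation(profiles):
    """Average Pearson correlation over all gene pairs in a module."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def coherent_modules(modules, threshold=0.8):
    """Names of modules (with >= 2 genes) whose mean within-module correlation exceeds threshold."""
    return [
        name
        for name, profiles in sorted(modules.items())
        if len(profiles) >= 2 and mean_pairwise_correlation(profiles) > threshold
    ]


# Toy expression profiles across four stress conditions (hypothetical values).
modules = {
    "module_1": [[1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, 5]],
    "module_2": [[1, 2, 3, 4], [4, 3, 2, 1]],
}
print(coherent_modules(modules))
```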
Signal transduction systems coordinate complex cellular information to regulate biological events such as cell proliferation and differentiation. Accumulating evidence on the widespread association of signaling molecules has revealed the essential contribution of phosphorylation-dependent interaction networks to cellular regulation. However, the dynamic behavior of these networks remains largely unanalyzed. Recent advances in mass spectrometry-based quantitative proteomics have enabled us to describe the comprehensive status of phosphorylated molecules in a time-resolved manner. Computational analyses of phosphoproteome dynamics are accelerating the development of novel methodologies for the mathematical analysis of cellular signaling. Phosphoproteomics-based numerical modeling can be used to evaluate regulatory network elements from a statistical point of view. Integration with transcriptome dynamics also uncovers regulatory hubs at the transcriptional level. These omics-based computational methodologies were first applied to representative signaling systems such as the epidermal growth factor receptor pathway. They have now opened the way to systems analysis of signaling networks involved in immune response and cancer.
signal transduction; phosphoproteomics; quantitative proteomics; computational modeling; systems biology
Reliable inference of transcription regulatory networks is still a challenging task in computational biology. Network component analysis (NCA) has become a powerful scheme for uncovering the networks behind complex biological processes, especially when gene expression data are integrated with binding motif information. However, the performance of NCA is impaired by the high rate of false connections in binding motif information and the high level of noise in gene expression data. Moreover, in real applications such as cancer research, the small number of gene expression samples further limits the performance of NCA when multiple candidate transcription factors (TFs) are analyzed simultaneously. In this paper, we propose a novel scheme, stability-based NCA, that overcomes these problems by addressing the inconsistency between gene expression data and motif binding information (i.e., prior network knowledge). The method introduces small perturbations into the prior network knowledge and uses the variation of the estimated TF activities to measure their stability. The scheme is less limited by sample size and is especially capable of identifying condition-specific TFs and their target genes. Experimental results on both simulated data and real breast cancer data demonstrate the efficiency and robustness of the proposed method.
transcription regulatory network; network component analysis; stability analysis; transcription factor activity; target genes identification
Motivation: Histone acetylation (HAc) is associated with open chromatin, and HAc has been shown to facilitate transcription factor (TF) binding in mammalian cells. In the innate immune system context, epigenetic studies strongly implicate HAc in the transcriptional response of activated macrophages. We hypothesized that using data from large-scale sequencing of a HAc chromatin immunoprecipitation assay (ChIP-Seq) would improve the performance of computational prediction of binding locations of TFs mediating the response to a signaling event, namely, macrophage activation.
Results: We tested this hypothesis using a multi-evidence approach for predicting binding sites. As a training/test dataset, we used ChIP-Seq-derived TF binding site locations for five TFs in activated murine macrophages. Our model combined TF binding site motif scanning with evidence from sequence-based sources and from HAc ChIP-Seq data, using a weighted sum of thresholded scores. We find that using HAc data significantly improves the performance of motif-based TF binding site prediction. Furthermore, we find that within regions of high HAc, local minima of the HAc ChIP-Seq signal are particularly strongly correlated with TF binding locations. Our model, using motif scanning and HAc local minima, improves the sensitivity for TF binding site prediction by ∼50% over a model based on motif scanning alone, at a false positive rate cutoff of 0.01.
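The following is a minimal Python sketch of the two ingredients described above. The first is a weighted sum of thresholded evidence scores. The second is the detection of local minima of the HAc signal inside high-HAc regions. The evidence names, thresholds, and weights are hypothetical and are not the trained values from the study. "Inside a high-HAc region" is approximated here as both neighbouring bins reaching a cutoff, which is an assumption of this sketch.

```python
def thresholded_score(evidence, thresholds, weights):
    """Weighted sum of indicators: each evidence source adds its weight if its score meets its threshold."""
    return sum(w for name, w in weights.items() if evidence[name] >= thresholds[name])


def local_minima_in_high_regions(signal, high_cutoff):
    """Indices of strict local minima of a binned signal whose two neighbours are both >= high_cutoff."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] < signal[i - 1]
        and signal[i] < signal[i + 1]
        and signal[i - 1] >= high_cutoff
        and signal[i + 1] >= high_cutoff
    ]


# Hypothetical binned HAc ChIP-Seq read counts: a dip at bin 3 sits between two high bins.
hac_signal = [1, 8, 9, 3, 9, 7, 1, 0, 2]
print(local_minima_in_high_regions(hac_signal, high_cutoff=5))

# Hypothetical evidence for one candidate site.
evidence = {"motif": 9.1, "conservation": 0.2, "hac_minimum": 1.0}
thresholds = {"motif": 8.0, "conservation": 0.5, "hac_minimum": 1.0}
weights = {"motif": 1.0, "conservation": 0.5, "hac_minimum": 1.5}
print(thresholded_score(evidence, thresholds, weights))
```

Thresholding each source before weighting keeps the combined score robust to differences in scale between motif scores and read counts.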
Availability: The data and software source code for model training and validation are freely available online at http://magnet.systemsbiology.net/hac.
Supplementary information: Supplementary data are available at Bioinformatics online.
Summary: CompleteMOTIFs (cMOTIFs) is an integrated web tool developed to facilitate systematic discovery of overrepresented transcription factor binding motifs from high-throughput chromatin immunoprecipitation experiments. Comprehensive annotations and Boolean logic operations on multiple peak locations enable users to focus on genomic regions of interest for de novo motif discovery using tools such as MEME, Weeder and ChIPMunk. The pipeline incorporates a scanning tool for known motifs from the TRANSFAC and JASPAR databases. It also performs an enrichment test using local or precalculated background models, which significantly improves the motif scanning results. Furthermore, using the cMOTIFs pipeline, we demonstrated that multiple transcription factors could cooperatively bind upstream of important stem cell differentiation regulators.
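One common way to test motif enrichment against a background set is a one-sided hypergeometric test, sketched below in Python. This is an illustrative choice only; the exact test used by cMOTIFs is not specified here. The counts in the example are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws), computed exactly."""
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def motif_enrichment(fg_hits, fg_total, bg_hits, bg_total):
    """One-sided p-value that a motif is over-represented in foreground (peak) sequences,
    treating foreground plus background sequences as the population."""
    population = fg_total + bg_total
    successes = fg_hits + bg_hits
    return hypergeom_sf(fg_hits, population, successes, fg_total)


# Hypothetical counts: 40 of 100 peaks vs 10 of 100 background regions contain the motif.
print(motif_enrichment(40, 100, 10, 100))
```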
Supplementary information: Supplementary data are available at Bioinformatics online.
Understanding transcriptional regulation of gene expression is one of the greatest challenges of modern molecular biology. A central role in this mechanism is played by transcription factors, which typically bind to specific, short DNA sequence motifs usually located in the upstream region of the regulated genes. We discuss here a simple and powerful approach for the ab initio identification of these cis-regulatory motifs. The method we present integrates several elements: human-mouse comparison, statistical analysis of genomic sequences and the concept of coregulation. We apply it to a complete scan of the human genome.
Using the catalogue of conserved upstream sequences collected in the CORG database, we construct sets of genes that share the same overrepresented motif (short DNA sequence) in their upstream regions in both human and mouse. We perform this construction for all possible motifs from 5 to 8 nucleotides in length. We then filter the resulting sets for two types of evidence of coregulation. First, we analyze the Gene Ontology annotation of the genes in the set, searching for statistically significant common annotations. Second, we analyze the expression profiles of the genes in the set, as measured by microarray experiments, searching for evidence of coexpression. The sets that pass one or both filters are conjectured to contain a significant fraction of coregulated genes. The upstream motifs characterizing these sets are thus good candidates for the binding sites of the TFs involved in such regulation.
In this way we find various known motifs and also some new candidate binding sites.
We have presented a new integrated algorithm for the ab initio identification of transcription factor binding sites in the human genome. The method is based on three ingredients: comparative genomics, motif overrepresentation, and several types of coregulation evidence. Applied to a full scan of the human genome, it gives satisfactory results.
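The set-construction step can be sketched in Python as follows, with one simplification: a gene joins a motif's set whenever the k-mer occurs in both its human and its mouse upstream sequence. The overrepresentation statistics and the subsequent Gene Ontology and coexpression filters are omitted. The toy sequences and function names are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_motif_sets(upstream, k_min=5, k_max=8):
    """Map each k-mer (k_min..k_max) to the genes whose human AND mouse upstream regions contain it."""
    sets = defaultdict(set)
    for gene, (human, mouse) in upstream.items():
        for k in range(k_min, k_max + 1):
            for word in kmers(human.upper(), k) & kmers(mouse.upper(), k):
                sets[word].add(gene)
    return sets


# Toy orthologous upstream sequences (human, mouse); real regions are far longer.
upstream = {
    "g1": ("AACGTGACGT", "TTCGTGACGA"),
    "g2": ("CGTGACGGGG", "ACGTGACCC"),
    "g3": ("AAAAAAA", "CCCCCCC"),
}
motif_sets = conserved_motif_sets(upstream)
print(sorted(motif_sets["GTGAC"]))
```

Each resulting gene set would then be passed to the coregulation filters described above.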
Recent evidence suggests that dynamic three-dimensional genomic interactions in the nucleus play critical roles in regulated gene expression. Here, we review a series of recent paradigm-shifting experiments that highlight the existence of specific gene networks within the self-organizing space of the nucleus. These gene networks, evidenced by long-range intra- and inter-chromosomal interactions, can be considered either the cause or the consequence of regulatory biological programs. Changes in nuclear architecture are a hallmark of laminopathies and likely potentiate genome rearrangements critical for tumor progression. Non-coding RNAs and DNA repeats may also make vital contributions. It is virtually certain that we will witness an ever-increasing rate of discoveries uncovering new roles of nuclear architecture in transcription, DNA damage and repair, aging and disease.
While recent scans for genetic variation associated with human disease have been immensely successful in uncovering large numbers of loci, far fewer studies have focused on the underlying pathways of disease pathogenesis. Many loci associated with disease and complex phenotypes map to non-coding, regulatory regions of the genome, indicating that modulation of gene transcription plays a key role. This study therefore generated genome-wide profiles of both genetic and transcriptional variation from the total blood extracts of over 500 randomly selected, unrelated individuals. Three levels of biological information are integrated, using measurements of blood lipids (key players in the progression of atherosclerosis), to investigate the interactions between circulating leukocytes and proximal lipid compounds. Pairwise correlations between gene expression and lipid concentration indicate a prominent role for basophil granulocytes and mast cells, cell types central to powerful allergic and inflammatory responses. Network analysis of gene co-expression showed that the top associations function as part of a single, previously unknown gene module, the Lipid Leukocyte (LL) module. This module replicated in T cells from an independent cohort while also displaying potential tissue specificity. Further, genetic variation driving LL module expression included the single nucleotide polymorphism (SNP) most strongly associated with serum immunoglobulin E (IgE) levels; IgE is a key antibody in allergy. Structural Equation Modeling (SEM) indicated that the LL module is at least partially reactive to blood lipid levels. Taken together, this study uncovers a gene network linking blood lipids and circulating cell types. It also offers insight into the hypothesis that the inflammatory response plays a prominent role in metabolism and potentially in the control of atherogenesis.
Circulating lipid concentrations are important predictors of coronary artery disease. The main pathology of coronary artery disease is atherosclerosis, a cycle of lipid adhesion to arterial walls and an inflammatory response that results in further adhesion. To investigate the link between lipids and immune cells in circulation, we generated both genomic and whole-blood gene expression profiles for a population-based collection of individuals from the capital region of Finland. Key mediators of inflammation and allergy were correlated with lipid levels. Furthermore, the expression of these genes was so highly coordinated that they appeared to function as part of a single pathway. This pathway was itself both highly correlated with and reactive to lipid levels. Our findings offer insight into how lipids activate circulating immune cells, potentially contributing to the pathogenesis of coronary artery disease.
Correlation of motif occurrences with gene expression intensity is an effective strategy for elucidating transcriptional cis-regulatory logic. Here we demonstrate that this approach can also identify cis-regulatory elements for alternative pre-mRNA splicing. Using data from a human exon microarray, we identified 56 cassette exons that exhibited higher transcript-normalized expression in muscle than in other normal adult tissues. Intron sequences flanking these exons were then analyzed to identify candidate regulatory motifs for muscle-specific alternative splicing. Correlation of motif parameters with gene-normalized exon expression levels was examined using linear regression and linear splines on RNA words and degenerate weight matrices, respectively. Our unbiased analysis uncovered multiple candidate regulatory motifs for muscle-specific splicing, many of which are phylogenetically conserved among vertebrate genomes. The most prominent downstream motifs were binding sites for Fox1- and CELF-related splicing factors, and a branchpoint-like element acuaac; pyrimidine-rich elements resembling PTB-binding sites were most significant in upstream introns. Intriguingly, our systematic study indicates a paucity of novel muscle-specific elements that are dominant in short proximal intronic regions. We propose that Fox and CELF proteins play major roles in enforcing the muscle-specific alternative splicing program, facilitating expression of unique isoforms of cytoskeletal proteins critical to muscle cell function.
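The word-based part of this analysis can be sketched in Python as an ordinary least-squares fit of gene-normalized exon expression on the count of one RNA word in the flanking intron. The sketch uses one word and one predictor. It does not include the linear splines on degenerate weight matrices. The intron sequences and expression values are hypothetical. `ugcaug` is used because it is the Fox-1 binding element.

```python
def word_count(seq, word):
    """Count (possibly overlapping) occurrences of word in seq."""
    w = len(word)
    return sum(1 for i in range(len(seq) - w + 1) if seq[i:i + w] == word)


def simple_regression(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return a, b, r_squared


# Hypothetical downstream intron sequences and gene-normalized exon expression levels.
introns = ["ugcaugaaaa", "ugcaugugcaug", "aaaaaaaaaa", "ccccugcaug"]
exon_levels = [2.0, 4.0, 0.0, 2.0]
counts = [word_count(s, "ugcaug") for s in introns]
print(simple_regression(counts, exon_levels))
```

In a genome-wide scan, this fit would be repeated for every word, and the words would be ranked by slope significance.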
Correct interactions between transcription factors (TFs) and their binding sites (TFBSs) are of central importance to gene regulation. Recently developed chromatin-immunoprecipitation DNA chip (ChIP-chip) techniques and the phylogenetic footprinting method provide ways to identify TFBSs with high precision. In this study, we constructed a user-friendly interactive platform for dynamic binding site mapping that uses ChIP-chip data and phylogenetic footprinting as two filters. MYBS (Mining Yeast Binding Sites) is a comprehensive web server that integrates an array of experimentally verified and predicted position weight matrices (PWMs) from eleven databases, including 481 binding motif consensus sequences and 71 PWMs that correspond to 183 TFs. MYBS users can search this platform for motif occurrences (possible binding sites) in the promoters of genes of interest via simple motif or gene queries, in conjunction with the two filters above. In addition, MYBS enables users to visualize in parallel the potential regulators for a given set of genes, a feature useful for finding potential regulatory associations between TFs. MYBS also allows users to identify the target gene sets of each TF pair, which could serve as a starting point for further exploration of TF combinatorial regulation. MYBS is available at http://cg1.iis.sinica.edu.tw/~mybs/.
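Searching promoters for motif occurrences with a PWM typically means converting motif counts to log-odds scores and sliding the matrix along both strands. The following Python sketch shows this procedure. The toy `GATA` count matrix, the pseudocount of 0.5, the uniform background, and the score threshold are assumptions for illustration. They are not MYBS's matrices or settings, and the ChIP-chip and conservation filters are not modeled.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=None):
    """Convert a position count matrix (list of {base: count}) into log2-odds scores."""
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(seq, pwm, threshold):
    """Return (forward-strand position, strand, score) for windows scoring >= threshold."""
    seq = seq.upper()
    w = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - w + 1):
            window = s[i:i + w]
            if any(base not in BASES for base in window):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(pwm[j][window[j]] for j in range(w))
            if score >= threshold:
                pos = i if strand == "+" else len(s) - w - i
                hits.append((pos, strand, score))
    return hits


# Toy count matrix for the word GATA, 10 sites per column.
gata = log_odds_pwm([{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
for hit in scan("CCGATACCTATCGG", gata, threshold=7.0):
    print(hit)
```

The minus-strand hit at position 8 is the forward-strand `TATC`, that is, `GATA` on the opposite strand.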
The transcriptional control circuitry in eukaryotic cells is complex and is orchestrated by combinatorially acting transcription factors. Forkhead transcription factors often function in concert with heterotypic transcription factors to specify distinct transcriptional programs. Here, we demonstrate that FOXK2 participates in combinatorial transcriptional control with the AP-1 transcription factor. FOXK2 binding regions are widespread throughout the genome and are often coassociated with AP-1 binding motifs. FOXK2 acts to promote AP-1-dependent gene expression changes in response to activation of the AP-1 pathway. In this context, FOXK2 is required for the efficient recruitment of AP-1 to chromatin. Thus, we have uncovered an important new molecular mechanism that controls AP-1-dependent gene expression.
Macrophages are dynamic cells that integrate signals from their microenvironment to develop specific functional responses. Microarray-based transcriptional profiling has established transcriptional reprogramming as an important mechanism for signal integration and cell function in macrophages. Nevertheless, current knowledge of the transcriptional regulation of human macrophages is far from complete. We performed RNA sequencing (RNA-seq) of human macrophages for two purposes: to discover novel marker genes, an area of great need particularly in human macrophage biology, and to generate a much more thorough transcriptome of human M1- and M2-like macrophages. Using this approach, we provide a high-resolution transcriptome profile of human macrophages under classical (M1-like) and alternative (M2-like) polarization conditions. We also demonstrate a dynamic range exceeding that of previous technologies, resulting in a more comprehensive understanding of the human macrophage transcriptome. We identify important gene clusters not previously appreciated with standard microarray techniques. In addition, we detected differential promoter usage, alternative transcription start sites, and different coding sequences for 57 gene loci in human macrophages. Moreover, this approach led to the identification of novel M1-associated (CD120b, TLR2, SLAMF7) and M2-associated (CD1a, CD1b, CD93, CD226) cell surface markers. Taken together, these data indicate that high-resolution transcriptome profiling of human macrophages by RNA-seq leads to a better understanding of macrophage function. This profiling will also form the basis for a better characterization of macrophages in human health and disease.
Reliable identification of cis-regulatory elements that influence transcription remains a challenging problem in molecular bioinformatics. This is especially true for enhancer elements, which are often located hundreds of kilobases from the gene promoter. High-resolution DNase hypersensitivity and connectivity profiling by the ENCODE consortium provides evidence of millions of interacting cis-acting elements in the human genome. This prior knowledge can be incorporated into genome-wide expression analyses in the form of gene sets whose members share regulatory sequence motifs within known DNase hypersensitivity peak regions. Strong enrichment of such gene sets among the most extremely differentially transcribed genes from controlled biological experiments may suggest novel hypotheses about signalling pathways. We demonstrate the utility of this approach by reanalysing a microarray-derived gene expression data set with the Gene Set Enrichment Analysis (GSEA) pipeline, uncovering new putative distal cis elements in the context of innate immunity. The DNase Hypersensitivity Connectivity informed Motif Enrichment in Gene Expression (DHC-MEGE) method described here has the advantage of identifying distal elements, such as enhancers, that are often overlooked by standard promoter motif analysis.
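DHC-MEGE hands its motif gene sets to the GSEA pipeline. For readers unfamiliar with that step, the following is a minimal sketch of the weighted running-sum enrichment score that GSEA computes for one gene set against a ranked gene list. It is not DHC-MEGE code, it omits GSEA's permutation-based significance testing and score normalization, and the function name `enrichment_score` is hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """GSEA-style running-sum enrichment score (sketch).

    ranked_genes: genes sorted by decreasing differential-expression statistic
    scores: the matching statistics, in the same order
    gene_set: set of genes (e.g. genes linked to one motif via DHS peaks)
    p: weight exponent; p=0 gives the unweighted Kolmogorov-Smirnov-like form
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking without covering it")
    # Hits are weighted by |statistic|^p, normalized so all hits sum to +1
    norm = sum(abs(s) ** p for s, hit in zip(scores, in_set) if hit)
    running, best = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        # Step up at a gene-set member, step down (uniformly) at a non-member
        running += (abs(s) ** p / norm) if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # maximum deviation from zero, keeping its sign
    return best
```

A set concentrated at the top of the ranking gives a score near +1, one concentrated at the bottom a score near -1.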
The DHC-MEGE shell script can be obtained from SourceForge at https://sourceforge.net/projects/dhcmege/, and the generated GMT file is attached as supplementary data.
DNase hypersensitivity; motif enrichment; gene expression; enhancer; gene set enrichment analysis
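The availability note mentions a generated GMT file. GMT is the tab-separated gene set format read by GSEA: one set per line, giving the set name, a description, and then the member genes. The sketch below shows how motif gene sets could be assembled from DHS connectivity and written in that format. It assumes hypothetical inputs (a list of motif hits inside DHS peaks and a peak-to-gene connectivity map) and hypothetical function names; it is not the DHC-MEGE shell script.

```python
from collections import defaultdict


def build_motif_gene_sets(motif_hits, peak_to_genes):
    """Group genes by the motifs found in the DHS peaks connected to them.

    motif_hits: iterable of (motif_id, peak_id) pairs (hypothetical input)
    peak_to_genes: dict peak_id -> iterable of gene symbols the peak contacts
    Returns dict motif_id -> sorted gene list; empty sets are dropped.
    """
    sets = defaultdict(set)
    for motif, peak in motif_hits:
        sets[motif].update(peak_to_genes.get(peak, ()))
    return {m: sorted(genes) for m, genes in sets.items() if genes}


def to_gmt(gene_sets, description="DHS-connected motif set"):
    """One line per set: name <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    return "\n".join(
        "\t".join([name, description, *genes])
        for name, genes in sorted(gene_sets.items())
    )
```

In practice one would also filter sets by size before running GSEA; that step is left out here.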
Recent experimental work has uncovered some of the genetic components required to maintain the Arabidopsis thaliana root stem cell niche (SCN) and its structure. Two main pathways are involved. One pathway depends on the genes SHORTROOT and SCARECROW, and the other depends on the PLETHORA genes, which have been proposed to constitute the auxin readouts. Recent evidence suggests that a regulatory circuit composed of WOX5 and CLE40 also contributes to SCN maintenance. Yet we still do not understand how the niche is dynamically maintained and patterned, or whether the uncovered molecular components are sufficient to recover the observed gene expression configurations that characterize the cell types within the root SCN. Mathematical and computational tools have proven useful in understanding the dynamics of cell differentiation. Hence, to further explore root SCN patterning, we integrated available experimental data into dynamic Gene Regulatory Network (GRN) models and asked whether these are sufficient to attain the observed gene expression configurations in the root SCN in a robust and autonomous manner.
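Dynamic GRN models of this kind are often written as Boolean networks, in which stable gene expression configurations correspond to attractors. The abstract does not state the formalism or the rules, so the following is a sketch assuming a synchronous Boolean network with toy rules invented for illustration only; they are not the SCN interactions reported or postulated in the study. The code enumerates every state and collects the attractors (fixed points and cycles) that trajectories reach.

```python
from itertools import product

# Toy rules, invented for illustration; NOT the published root SCN model.
# Each rule maps the current state (dict gene -> bool) to the gene's next value.
RULES = {
    "SHR":   lambda s: s["SHR"],                # held at its initial value
    "SCR":   lambda s: s["SHR"] and s["SCR"],   # needs SHR, maintains itself
    "AUX":   lambda s: s["AUX"],                # external signal, held fixed
    "PLT":   lambda s: s["AUX"],                # auxin readout
    "WOX5":  lambda s: s["SCR"] and s["PLT"] and not s["CLE40"],
    "CLE40": lambda s: not s["WOX5"],
}
GENES = list(RULES)


def step(state):
    """Synchronous update: every gene is computed from the same current state."""
    s = dict(zip(GENES, state))
    return tuple(bool(RULES[g](s)) for g in GENES)


def attractors():
    """Follow each of the 2^n trajectories until a state repeats and collect
    the distinct cycles reached (a fixed point is a cycle of length 1)."""
    found = set()
    for start in product([False, True], repeat=len(GENES)):
        seen, state = [], start
        while state not in seen:
            seen.append(state)
            state = step(state)
        cycle = seen[seen.index(state):]
        # Rotate to a canonical start so one cycle is not counted twice
        k = cycle.index(min(cycle))
        found.add(tuple(cycle[k:] + cycle[:k]))
    return found
```

Under these toy rules the enumeration yields seven fixed points and one two-state cycle; the cycle comes from the WOX5/CLE40 double-negative loop under synchronous updating.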
We found that an SCN GRN model based only on experimental data did not reproduce the configurations observed within the root SCN. We therefore developed several alternative GRN models that recover these expected stable gene configurations. These models incorporate a few components and interactions beyond those uncovered experimentally. The recovered configurations are stable to perturbations, and the models recover the observed gene expression profiles of almost all the mutants described so far. However, the robustness of the postulated GRNs is not as high as that of other previously studied networks.
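Mutant profiles and perturbation stability can be checked on a Boolean model by clamping a gene OFF (loss of function) and by flipping single genes of a fixed point to see whether the network returns. The sketch below does both on a smaller hypothetical 4-gene toy network. The rules are illustrative only, and the robustness measure (fraction of single-gene flips that return to the same fixed point) is one simple choice assumed here, not necessarily the measure used in the study.

```python
from itertools import product

# Hypothetical 4-gene toy network (illustration only, not the published model).
WT_RULES = {
    "SHR":  lambda s: s["SHR"],
    "SCR":  lambda s: s["SHR"] and s["SCR"],
    "PLT":  lambda s: s["PLT"],
    "WOX5": lambda s: s["SCR"] and s["PLT"],
}


def step(state, rules):
    """Synchronous update of all genes from the same current state."""
    s = dict(zip(rules, state))
    return tuple(bool(rules[g](s)) for g in rules)


def fixed_points(rules):
    return [st for st in product([False, True], repeat=len(rules))
            if step(st, rules) == st]


def knockout(rules, gene):
    """Loss-of-function mutant: the gene's rule becomes constant OFF."""
    mutant = dict(rules)
    mutant[gene] = lambda s: False
    return mutant


def settle(state, rules):
    """Iterate until a state repeats; return the attractor reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state, rules)
    return frozenset(seen[seen.index(state):])


def robustness(fp, rules):
    """Fraction of single-gene flips of fixed point fp that return to fp."""
    flips = [tuple((not v) if i == j else v for j, v in enumerate(fp))
             for i in range(len(fp))]
    return sum(settle(f, rules) == frozenset([fp]) for f in flips) / len(fp)
```

In this toy network the SHR knockout has no fixed point with WOX5 on, and the all-ON fixed point returns only after a WOX5 flip (robustness 0.25), because SHR and PLT are self-maintaining.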
These models are the first published approximations of a dynamic mechanism for A. thaliana root SCN cellular patterning. Our models formally show that the data now available are not sufficient to fully reproduce root SCN organization and genetic profiles. We then highlight some experimental gaps that remain to be studied and postulate some novel gene interactions. Finally, we suggest the existence of a generic dynamical motif that may be involved in both plant and animal SCN maintenance.
Transcription factors can either activate or repress target genes by binding to short nucleotide sequence motifs in the promoter regions of these genes. Here, we present POBO, a promoter bootstrapping program for gene expression data. POBO can be used to detect, compare and verify predetermined transcription factor binding site motifs in the promoters of one or two clusters of co-regulated genes. The program calculates the frequencies of the motif in the input promoter sets, and a bootstrap analysis detects significantly over- or underrepresented motifs. The bootstrapped results are reported as both images and text. The program was tested with published data from transgenic WRKY70 microarray experiments. Intriguingly, motifs recognized by the WRKY transcription factors of plant defense pathways are similarly enriched in both up- and downregulated clusters. POBO analysis suggests slightly modified hypothetical motifs that discriminate between up- and downregulated clusters. In conclusion, POBO allows easy, fast and accurate verification of putative regulatory motifs. The statistical tests implemented in POBO can help eliminate false positives from the results of pattern discovery programs and increase the reliability of true positives. POBO is freely available from http://ekhidna.biocenter.helsinki.fi:9801/pobo.
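The abstract does not give POBO's exact resampling scheme, so the following sketches one plausible bootstrap test: compare the mean motif count per promoter in a cluster with pseudo-clusters of the same size resampled with replacement from background promoters. The regex motif, counting on both strands, the +1-corrected empirical p-values, and all function names are assumptions made for illustration.

```python
import random
import re

_COMP = str.maketrans("ACGT", "TGCA")


def motif_count(seq, motif):
    """Overlapping regex matches of `motif` on both strands of one promoter."""
    pat = re.compile(f"(?=(?:{motif}))")  # lookahead allows overlapping hits
    rc = seq.translate(_COMP)[::-1]       # reverse complement
    return len(pat.findall(seq)) + len(pat.findall(rc))


def bootstrap_test(cluster, background, motif, n_boot=1000, seed=0):
    """Return (observed mean count, p_over, p_under) for one cluster."""
    rng = random.Random(seed)
    n = len(cluster)
    observed = sum(motif_count(s, motif) for s in cluster) / n
    bg_counts = [motif_count(s, motif) for s in background]  # count once
    # Mean count of each resampled pseudo-cluster of the same size
    boot = [sum(rng.choices(bg_counts, k=n)) / n for _ in range(n_boot)]
    p_over = (1 + sum(b >= observed for b in boot)) / (n_boot + 1)
    p_under = (1 + sum(b <= observed for b in boot)) / (n_boot + 1)
    return observed, p_over, p_under
```

For example, a W-box-like core such as `"TTGAC"` could be tested this way in up- and downregulated clusters separately.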
To date, only a limited number of transcriptional regulatory interactions have been uncovered. In a pilot study integrating sequence data with microarray data, a position weight matrix (PWM) performed poorly in inferring transcriptional interactions (TIs), which represent physical interactions between transcription factors (TFs) and the upstream sequences of target genes. Inferring a TI means inferring that the promoter sequence of a target matches the consensus sequence motifs of a potential TF, and also predicting their interaction type, such as activation (AT) or repression (RT). A robust PWM (rPWM) was therefore developed to search for consensus sequence motifs. In addition to the rPWM, one feature extracted from ChIP-chip data was incorporated to identify potential TIs under specific conditions. An interaction type classifier was assembled to predict activation or repression of potential TIs using microarray data. This approach, which combines an adaptive (learning) fuzzy inference system with an interaction type classifier to predict transcriptional regulatory networks, was named AdaFuzzy.
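The rPWM itself is not specified here, so the sketch below shows only a conventional log-odds PWM of the kind the abstract says performed poorly: built from aligned binding sites and used to find the best-scoring window in a promoter. The pseudocount of 0.5, the uniform background, forward-strand-only scanning, and the function names are assumptions.

```python
import math

BASES = "ACGT"


def pwm_from_sites(sites, pseudocount=0.5, background=None):
    """Log2-odds PWM from aligned, equal-length binding sites."""
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        # Column score: log2(observed frequency / background frequency)
        pwm.append({b: math.log2(counts[b] / total / background[b])
                    for b in BASES})
    return pwm


def best_hit(pwm, seq):
    """(score, position) of the highest-scoring window on the forward strand."""
    width = len(pwm)
    best = (-math.inf, -1)
    for p in range(len(seq) - width + 1):
        score = sum(col[seq[p + i]] for i, col in enumerate(pwm))
        best = max(best, (score, p))
    return best
```

A TI call would then typically require the best score to exceed some threshold, which is where a single fixed PWM cutoff can produce the poor performance the abstract reports.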
AdaFuzzy was applied to predict TIs using real genomics data from Saccharomyces cerevisiae. Following constrained probabilistic sparse matrix factorization (cPSMF), one of the latest advances in predicting TIs, and using 19 TFs, we compared AdaFuzzy with four well-known approaches using over-representation analysis and gene set enrichment analysis. AdaFuzzy outperformed all four algorithms. Furthermore, AdaFuzzy performed comparably to the 'ChIP-experimental method' in inferring the TIs identified by each of two large-scale ChIP-chip data sets. AdaFuzzy was also able to classify all predicted TIs into one or more of the four promoter architectures. The results coincided with known promoter architectures in yeast and provided insights into transcriptional regulatory mechanisms.
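Over-representation analysis is commonly a one-sided hypergeometric test: given a gene universe, are a TF's known targets found among its predicted targets more often than random draws would give? The exact ORA setup used for the comparison is not described, so this is a sketch assuming predicted targets of one TF are tested against a reference target set; the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, successes n, draws N)."""
    hi = min(n, N)
    # math.comb returns 0 when the lower argument exceeds the upper one
    return sum(comb(n, i) * comb(M - n, N - i)
               for i in range(k, hi + 1)) / comb(M, N)


def ora_pvalue(predicted_targets, known_targets, universe):
    """One-sided p-value that known targets are enriched among predictions."""
    universe = set(universe)
    pred = set(predicted_targets) & universe
    known = set(known_targets) & universe
    k = len(pred & known)
    return hypergeom_sf(k, len(universe), len(known), len(pred))
```

Smaller p-values indicate that a method's predicted targets overlap the reference set more than expected by chance; with several TFs the p-values would also need multiple-testing correction.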
AdaFuzzy successfully integrates multiple types of data (sequence, ChIP, and microarray) to predict transcriptional regulatory networks. The validated predictions suggest that AdaFuzzy can be applied to uncover TIs in yeast.
Transcription factor proteins control the temporal and spatial expression of genes by binding specific regulatory elements, or motifs, in DNA. Mapping a transcription factor to its motif is an important step towards defining the structure of transcriptional regulatory networks and understanding their dynamics. The information to map a transcription factor to its DNA binding specificity is in principle contained in the protein sequence. Nevertheless, methods that map directly from protein sequence to target DNA sequence have been lacking, and generation of regulatory maps has required experimental data. Here we describe a purely computational method for predicting transcription factor binding. The method calculates the free energy of binding between a transcription factor and possible target DNA sequences using thermodynamic integration. Approximations of additivity (each DNA basepair contributes independently to the binding energy) and linear response (the DNA-protein and DNA-solvent couplings are linear in an effective reaction coordinate representing the basepair character at a specific position) make the computations feasible and can be verified by more detailed simulations. Results obtained for MAT-α2, a yeast homeodomain transcription factor, are in good agreement with known results. This method promises to provide a general, computationally feasible route from a genome sequence to a gene regulatory network.
computational biology; bioinformatics; gene regulation; transcription factor; homeodomain
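Under the additivity approximation, the binding free energy of a site is a sum of independent per-position contributions, and relative affinities follow from Boltzmann factors. In the described method those per-position values would come from thermodynamic integration. The energy table below is invented for illustration, kT = 0.592 kcal/mol (about 298 K) is an assumed temperature, and the function names are hypothetical.

```python
import math
from itertools import product

KT = 0.592  # kcal/mol at about 298 K (assumed temperature)

# Hypothetical per-position energies (kcal/mol) relative to the best base;
# in the described method these would come from thermodynamic integration.
ENERGY = [
    {"A": 0.0, "C": 1.5, "G": 1.2, "T": 0.9},
    {"A": 1.1, "C": 0.0, "G": 1.4, "T": 1.0},
    {"A": 0.8, "C": 1.3, "G": 1.6, "T": 0.0},
]


def binding_energy(seq, energy=ENERGY):
    """Additivity: each base pair contributes independently to the total."""
    return sum(col[base] for col, base in zip(energy, seq))


def relative_affinity(seq, ref, energy=ENERGY, kT=KT):
    """K(seq) / K(ref) = exp(-(dG(seq) - dG(ref)) / kT)."""
    return math.exp(-(binding_energy(seq, energy)
                      - binding_energy(ref, energy)) / kT)


def rank_sites(energy=ENERGY):
    """All 4^L sites, from strongest (lowest energy) to weakest."""
    seqs = ("".join(p) for p in product("ACGT", repeat=len(energy)))
    return sorted(seqs, key=lambda s: binding_energy(s, energy))
```

Because the model is additive, the strongest site is simply the lowest-energy base at each position, and ranking all sites yields the predicted binding specificity.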
When transcription factor binding sites are known for a particular transcription factor, it is possible to construct a motif model that can be used to scan sequences for additional sites. However, few statistically significant sites are revealed when a transcription factor binding site motif model is used to scan a genome-scale database.
We have developed a scanning algorithm, PhyloScan, which combines evidence from matching sites found in orthologous data from several related species with evidence from multiple sites within an intergenic region, to better detect regulons. The orthologous sequence data may be multiply aligned, unaligned, or a combination of aligned and unaligned. In aligned data, PhyloScan statistically accounts for the phylogenetic dependence of the species contributing data to the alignment and, in unaligned data, the evidence for sites is combined assuming phylogenetic independence of the species. The statistical significance of the gene predictions is calculated directly, without employing training sets.
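For unaligned data, PhyloScan combines site evidence across species assuming phylogenetic independence. Its actual statistic is not reproduced here; as a swapped-in illustration of combining independent evidence, the sketch below uses Fisher's method, which sums -2 ln p over independent p-values and compares the total with a chi-square distribution on 2k degrees of freedom. The per-species p-values would be an assumed input, and the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Chi-square survival function, closed form valid for even df."""
    if df % 2:
        raise ValueError("this closed form requires even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi2(2k) under independence.

    pvalues must lie in (0, 1]; a p-value of 0 would make ln undefined.
    """
    x = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(x, 2 * len(pvalues))
```

Two species that each give p = 0.05 combine to about 0.017, showing how weak but consistent evidence across orthologs can strengthen a prediction. PhyloScan's treatment of aligned data, which models phylogenetic dependence, is not captured by this independence-based sketch.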
In a test of our methodology on synthetic data modeled on seven Enterobacteriales, four Vibrionales, and three Pasteurellales species, PhyloScan achieves better sensitivity and specificity than MONKEY, an advanced scanning approach that also searches a genome for transcription factor binding sites using phylogenetic information. Applying the algorithm to real sequence data from seven Enterobacteriales species identifies several novel putative binding sites for the Crp and PurR transcription factors. These sites enable targeted experimental validation and thus further delineation of the Crp and PurR regulons in E. coli.
Better sensitivity and specificity can be achieved through a combination of (1) using mixed alignable and non-alignable sequence data and (2) combining evidence from multiple sites within an intergenic region.