During the development of the central nervous system (CNS), combinations of transcription factors and signalling molecules orchestrate patterning, specification and differentiation of neural cell types. In vertebrates, three types of melanin-containing pigment cells exert a variety of functional roles, including visual perception. Here we analysed the mechanisms underlying pigment cell specification within the CNS of a simple chordate, the ascidian Ciona intestinalis. Ciona tadpole larvae exhibit a basic chordate body plan characterized by a small number of neural cells. We employed lineage-specific transcription profiling to characterize the expression of genes downstream of fibroblast growth factor (FGF) signalling, which govern pigment cell formation. We demonstrate that FGF signalling sequentially imposes a pigment cell identity at the expense of anterior neural fates. We identify FGF-dependent and pigment cell-specific factors, including the small GTPase Rab32/38, and demonstrate its requirement for the pigmentation of larval sensory organs.
The fibroblast growth factor (FGF) signalling pathway specifies the fate of pigmented cells in the ascidian Ciona intestinalis. Here, the authors obtain lineage-specific transcription profiles of pigment precursor cells and identify FGF downstream genes involved in central nervous system patterning, and the specification and differentiation of pigmented cells.
Cilia are microtubule-based organelles protruding from almost all mammalian cells which, when dysfunctional, result in genetic disorders called “ciliopathies”. High-throughput studies have revealed that cilia are composed of thousands of proteins. However, despite many efforts, much remains to be determined regarding the biological functions of this increasingly important complex organelle.
We have derived an online tool from a systematic network-based approach to dissect the cilia/centrosome complex interactome (CCCI). The tool integrates all currently available data into a model which provides an “interaction” perspective on ciliary function. We generated a network of interactions between human proteins organized into functionally relevant “communities”, which can be defined as groups of genes that are both highly inter-connected and strongly co-expressed. We then combined sequence and co-expression data in order to identify the transcription factors responsible for regulating genes within their respective communities. Our analyses have discovered communities significantly specialized for specific biological functions, such as mRNA processing, protein translation, folding and degradation, processes that had never been associated with ciliary proteins until now.
CCCI will allow us to clarify the roles of previously unknown ciliary functions, elucidate the molecular mechanisms underlying ciliary-associated phenotypes, and apply our knowledge of the functional roles of relatively uncharacterized molecular entities to disease phenotypes and new clinical applications.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-658) contains supplementary material, which is available to authorized users.
Cilia; Ciliopathies; Centrosome; Interactome
We describe an innovative experimental and computational approach to control the expression of a protein in a population of yeast cells. We designed a simple control algorithm to automatically regulate the administration of inducer molecules to the cells by comparing the actual protein expression level in the cell population with the desired expression level. We then built an automated platform based on a microfluidic device, a time-lapse microscopy apparatus, and a set of motorized syringes, all controlled by a computer. We tested the platform to force yeast cells to express a desired fixed, or time-varying, amount of a reporter protein over thousands of minutes. The computer automatically switched the type of sugar administered to the cells, its concentration and its duration, according to the control algorithm. Our approach can be used to control expression of any protein, fused to a fluorescent reporter, provided that an external molecule known to (indirectly) affect its promoter activity is available.
A crucial feature of biological systems is their ability to maintain homeostasis in spite of ever-changing conditions. In engineering, this ability can be embedded in devices ranging from the thermostat to the autopilot of a modern plane using control systems which operate via a negative feedback mechanism: the quantity to be controlled is measured and then subtracted from the desired reference value, and the resulting error is used to compute the control action to be implemented on the physical system (e.g. switching the heating on or off, changing the position of the rudder). Here, we developed and applied a method to regulate the expression level of a protein, in a growing population of cells over several generations, in a completely automatic fashion. We designed and implemented an integrated platform comprising a microfluidic device, a time-lapse microscopy apparatus, and a set of motorized syringes, all controlled by a computer. We tested the platform by forcing yeast cells to express a desired time-varying amount of a reporter protein. Our method can be applied to control a protein of interest in vivo, making it possible to probe the function of biological systems in unprecedented ways.
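The negative feedback scheme described above (measure, subtract from the reference, compute the control action from the error) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the first-order model of protein expression, the proportional-integral control law, the parameter values and the function name `simulate_feedback_control` are hypothetical and are not necessarily the algorithm or the model used on the platform.

```python
def simulate_feedback_control(reference, minutes=1000, dt=1.0,
                              kp=2.0, ki=0.05, gain=0.05, decay=0.05):
    """Toy closed loop: PI controller acting on a first-order expression model.

    All parameters are illustrative assumptions, not measured values.
    y is the (normalised) reporter level, u the inducer input in [0, 1].
    """
    y = 0.0          # measured protein level
    integral = 0.0   # accumulated error for the integral term
    history = []
    for _ in range(int(minutes / dt)):
        error = reference - y                 # reference minus measurement
        raw = kp * error + ki * integral      # unconstrained control action
        u = min(1.0, max(0.0, raw))           # inducer input is bounded
        if raw == u:                          # simple anti-windup: integrate
            integral += error * dt            # only when not saturated
        y += dt * (gain * u - decay * y)      # assumed first-order dynamics
        history.append((u, y))
    return history


if __name__ == "__main__":
    trace = simulate_feedback_control(reference=0.5)
    print("final input %.3f, final level %.3f" % trace[-1])
```

In this sketch the error drives the input until the simulated level settles at the reference; a time-varying reference could be handled by passing a function of time instead of a constant.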
Mendelian disorders are mostly caused by single mutations in the DNA sequence of a gene, leading to a phenotype with pathologic consequences. Whole Exome Sequencing of patients can be a cost-effective alternative to standard genetic screenings to find causative mutations of genetic diseases, especially when the number of cases is limited. Analyzing exome sequencing data requires specific expertise, high computational resources and a reference variant database to identify pathogenic variants.
We developed a database of variations collected from patients with Mendelian disorders, which is automatically populated thanks to an associated exome-sequencing pipeline. The pipeline is able to automatically identify, annotate and store insertions, deletions and mutations in the database. The resource is freely available online at http://exome.tigem.it. The exome-sequencing pipeline automates the analysis workflow (quality control and read trimming, mapping on reference genome, post-alignment processing, variation calling and annotation) using state-of-the-art software tools. The pipeline has been designed to run on a computing cluster in order to analyse several samples simultaneously. The detected variants are annotated by the pipeline not only with the standard variant annotations (e.g. allele frequency in the general population, the predicted effect on gene product activity, etc.) but, more importantly, with allele frequencies across samples progressively collected in the database itself, stratified by Mendelian disorder.
We aim to provide a resource for the genetic disease community to automatically analyse whole exome-sequencing samples with a standard and uniform analysis pipeline, thus collecting variant allele frequencies by disorder. This resource may become a valuable tool to help dissect the genotype underlying the disease phenotype through an improved selection of putative patient-specific causative or phenotype-associated variations.
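The idea of annotating each variant with an allele frequency stratified by disorder can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: the function name, the input layout and the variant labels are hypothetical, and the calculation assumes diploid autosomal genotypes encoded as alternate-allele counts (0, 1 or 2).

```python
from collections import defaultdict


def allele_frequencies_by_disorder(samples):
    """Return {disorder: {variant: alternate allele frequency}}.

    `samples` is an iterable of (disorder, genotypes) pairs, where
    `genotypes` maps a variant label to its alternate-allele count
    (0, 1 or 2). Diploid autosomal sites are assumed.
    """
    alt_counts = defaultdict(lambda: defaultdict(int))
    n_samples = defaultdict(int)
    for disorder, genotypes in samples:
        n_samples[disorder] += 1
        for variant, alt in genotypes.items():
            alt_counts[disorder][variant] += alt
    return {
        disorder: {
            variant: count / (2 * n_samples[disorder])  # two alleles per sample
            for variant, count in variants.items()
        }
        for disorder, variants in alt_counts.items()
    }


if __name__ == "__main__":
    cohort = [
        ("disorder_A", {"var1": 1}),
        ("disorder_A", {"var1": 2, "var2": 1}),
        ("disorder_B", {"var1": 0}),
    ]
    print(allele_frequencies_by_disorder(cohort))
```

A variant that is frequent within one disorder group but rare in the others would stand out in such a table, which is the kind of prioritisation the stratified annotation is meant to support.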
Inborn errors of metabolism (IEM) are genetic diseases caused by mutations in enzymes or transporters affecting specific metabolic reactions that cause a block in the physiological metabolic fluxes. Therapeutic treatment can be achieved either by decreasing the metabolic flux upstream of the block or by increasing the flux downstream of the block. The identification of upstream and downstream fluxes however is not trivial, since metabolic reactions are intertwined in a complex network. To overcome this problem, we propose an innovative computational workflow to model the alteration of metabolism caused by IEM and predict the metabolites and reactions that are affected by the mutation. Our workflow exploits a recent genome-scale metabolic network model of hepatocyte metabolism to identify metabolites accumulating in hepatocytes due to single gene mutations in IEM via an innovative “differential flux analysis.” We simulated 38 IEMs in the liver, and in about half of the cases, our workflow correctly identified the metabolites known to accumulate in the blood and urine of IEM patients.
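For orientation, the standard flux balance analysis problem on which such workflows build can be written as below. The symbols are assumptions introduced here: S is the stoichiometric matrix, v the vector of reaction fluxes, c the objective weights, and l_j, u_j the flux bounds. Modelling a mutation by forcing the flux of the affected reaction k to zero, and comparing fluxes through a simple difference, is only one plausible reading of a “differential flux analysis”; the exact procedure of the workflow is not specified in this summary.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{s.t.}\quad & S\,v = 0, \\
& l_j \le v_j \le u_j \quad \text{for every reaction } j, \\[4pt]
\text{IEM (assumed):}\quad & l_k = u_k = 0 \ \text{for the affected reaction } k, \\
\text{flux change (assumed):}\quad & \Delta v_j = v_j^{\mathrm{IEM}} - v_j^{\mathrm{WT}} .
\end{aligned}
```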
differential flux analysis; flux balance analysis; hepatocyte metabolism; inborn errors of metabolism; mathematical modeling
The lysosomal-autophagic pathway is activated by starvation and plays an important role in both cellular clearance and lipid catabolism. However, the transcriptional regulation of this pathway in response to metabolic cues is currently uncharacterized. Here we show that the transcription factor EB (TFEB), a master regulator of lysosomal biogenesis and autophagy, is induced by starvation through an autoregulatory feedback loop and exerts a global transcriptional control on lipid catabolism via PGC1α and PPARα. Thus, during starvation a transcriptional mechanism links the autophagic pathway to cellular energy metabolism. The conservation of this mechanism in Caenorhabditis elegans suggests a fundamental role for TFEB in the evolution of the adaptive response to food deprivation. Viral delivery of TFEB to the liver prevented weight gain and metabolic syndrome in both diet-induced and genetic mouse models of obesity, suggesting a novel therapeutic strategy for disorders of lipid metabolism.
miRNAs are small non-coding RNAs able to modulate target-gene expression. It has been postulated that miRNAs confer robustness to biological processes, but clear experimental evidence is still missing. Using a synthetic biology approach, we demonstrate that microRNAs provide phenotypic robustness to transcriptional regulatory networks by buffering fluctuations in protein levels. Here we construct a network motif in mammalian cells exhibiting a “toggle-switch” phenotype in which two alternative protein expression levels define its ON and OFF states. The motif consists of an inducible transcription factor that regulates its own transcription and that of a miRNA against the transcription factor itself. We confirm, using mathematical modeling and experimental approaches, that the microRNA confers robustness to the toggle-switch by enabling the cell to maintain and transmit its state. When the microRNA is absent, a dramatic increase in protein noise level occurs, causing the cell to randomly switch between the two states.
Motivation: Identification of differentially expressed genes has led to countless new discoveries. However, differentially expressed genes are only a proxy for finding dysregulated pathways. The problem is to identify how the network of regulatory and physical interactions rewires in different conditions or in disease.
Results: We developed a procedure named DINA (DIfferential Network Analysis), which is able to identify sets of genes whose co-regulation is condition-specific, starting from a collection of condition-specific gene expression profiles. DINA is also able to predict which transcription factors (TFs) may be responsible for the condition-specific co-regulation of a pathway. We derived 30 tissue-specific gene networks in human and identified several metabolic pathways as the most differentially regulated across the tissues. We correctly identified TFs such as Nuclear Receptors as their main regulators and demonstrated that a gene with unknown function (YEATS2) acts as a negative regulator of hepatocyte metabolism. Finally, we showed that DINA can be used to make hypotheses on dysregulated pathways during disease progression. By analyzing gene expression profiles across primary and transformed hepatocytes, DINA identified hepatocarcinoma-specific metabolic and transcriptional pathway dysregulation.
Availability: We implemented an online web tool, available at http://dina.tigem.it, enabling the user to apply DINA to identify tissue-specific pathways or gene signatures.
Supplementary data are available at Bioinformatics online.
Stem-cell functions require activation of stem-cell-intrinsic transcriptional programs and extracellular interaction with a niche microenvironment. How the transcriptional machinery controls residency of stem cells in the niche is unknown. Here we show that Id proteins coordinate stem-cell activities with anchorage of neural stem cells (NSCs) to the niche. Conditional inactivation of three Id genes in NSCs triggered detachment of embryonic and postnatal NSCs from the ventricular and vascular niche, respectively. The interrogation of the gene modules directly targeted by Id deletion in NSCs revealed that Id proteins repress bHLH-mediated activation of Rap1GAP, thus serving to maintain the GTPase activity of RAP1, a key mediator of cell adhesion. Preventing the elevation of the Rap1GAP level countered the consequences of Id loss on NSC–niche interaction and stem-cell identity. Thus, by preserving anchorage of NSCs to the extracellular environment, Id activity synchronizes NSC functions to residency in the specialized niche.
The connection between chromatin nuclear organization and gene activity is vividly illustrated by the observation that transcriptional coregulation of certain genes appears to be directly influenced by their spatial proximity. This fact poses the more general question of whether it is at all feasible that the numerous genes that are coregulated on a given chromosome, especially those at large genomic distances, might become proximate inside the nucleus. This problem is studied here using steered molecular dynamics simulations in order to enforce the colocalization of thousands of knowledge-based gene sequences on a model for the gene-rich human chromosome 19. Remarkably, it is found that most gene pairs can be brought simultaneously into contact. This is made possible by the low degree of intra-chromosome entanglement and the large number of cliques in the gene coregulatory network. A clique is a set of genes coregulated all together as a group. The constrained conformations for the model chromosome 19 are further shown to be organized in spatial macrodomains that are similar to those inferred from recent HiC measurements. The findings indicate that gene coregulation and colocalization are largely compatible and that this relationship can be exploited to draft the overall spatial organization of the chromosome in vivo. The more general validity and implications of these findings could be investigated by applying to other eukaryotic chromosomes the general and transferable computational strategy introduced here.
Recent high-throughput experiments have shown that chromosome regions (loci) which accommodate specific sets of coregulated genes can be in close spatial proximity despite their possibly large sequence separation. The findings pose the question of whether gene coregulation and gene colocalization are related in general. Here, we tackle this problem using a knowledge-based coarse-grained model of human chromosome 19. Specifically, we carry out steered molecular dynamics simulations to promote the colocalization of hundreds of gene pairs that are known to be significantly coregulated. We show that most of such pairs can be simultaneously colocalized. This result is, in turn, shown to depend on at least two distinctive chromosomal features: the remarkably low degree of intra-chain entanglement found in chromosomes inside the nucleus and the large number of cliques present in the gene coregulatory network. The results are therefore largely consistent with the coregulation-colocalization hypothesis. Furthermore, the model chromosome conformations obtained by applying the coregulation constraints are found to display spatial macrodomains that have significant similarities with those inferred from HiC measurements of human chromosome 19. This finding suggests that suitable extensions of the present approach might be used to propose viable ensembles of eukaryotic chromosome conformations in vivo.
Gene expression profiles can be used to infer previously unknown transcriptional regulatory interactions among thousands of genes, via systems biology ‘reverse engineering’ approaches. We ‘reverse engineered’ an embryonic stem (ES)-specific transcriptional network from 171 gene expression profiles, measured in ES cells, to identify master regulators of gene expression (‘hubs’). We discovered that E130012A19Rik (E13), highly expressed in mouse ES cells as compared with differentiated cells, was a central ‘hub’ of the network. We demonstrated that E13 is a protein-coding gene implicated in regulating the commitment towards the different neuronal subtypes and glia cells. The overexpression and knock-down of E13 in ES cell lines, undergoing differentiation into neurons and glia cells, caused a strong up-regulation of the glutamatergic neuron marker Vglut2 and a strong down-regulation of the GABAergic neuron marker GAD65 and of the radial glia marker Blbp. We confirmed E13 expression in the cerebral cortex of adult mice and during development. By immuno-based affinity purification, we characterized protein partners of E13, involved in the Polycomb complex. Our results suggest a role of E13 in regulating the division between glutamatergic projection neurons and GABAergic interneurons and glia cells, possibly by epigenetic-mediated transcriptional regulation.
We collected a massive and heterogeneous dataset of 20 255 gene expression profiles (GEPs) from a variety of human samples and experimental conditions, as well as 8895 GEPs from mouse samples. We developed a mutual information (MI) reverse-engineering approach to quantify the extent to which the mRNA levels of two genes are related to each other across the dataset. The resulting networks consist of 4 817 629 connections among 20 255 transcripts in human and 14 461 095 connections among 45 101 transcripts in mouse, with an inter-species conservation of 12%. The inferred connections were compared against known interactions to assess their biological significance. We experimentally validated a subset of not previously described protein–protein interactions. We discovered co-expressed modules within the networks, consisting of genes strongly connected to each other, which carry out specific biological functions, and tend to be in physical proximity at the chromatin level in the nucleus. We show that the network can be used to predict the biological function and subcellular localization of a protein, and to elucidate the function of a disease gene. We experimentally verified that the granulin precursor (GRN) gene, whose mutations cause frontotemporal lobar degeneration, is involved in lysosome function. We have developed an online tool to explore the human and mouse gene networks.
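As an illustration of how mutual information can quantify the relationship between the expression levels of two genes, the sketch below estimates MI from equal-width histograms. The binning scheme, the number of bins and the function names are assumptions made here; the estimator actually used to build the networks is not described in this summary.

```python
import math
from collections import Counter


def discretize(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=3):
    """Histogram estimate of the mutual information (in bits) of x and y."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(bx)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = 0.0
    for (i, j), count in pxy.items():
        # p(i,j) * log2( p(i,j) / (p(i) * p(j)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[i] * py[j]))
    return mi


if __name__ == "__main__":
    gene_a = [1, 2, 3, 4, 5, 6]
    print(mutual_information(gene_a, gene_a))          # identical profiles
    print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1], bins=2))  # unrelated
```

In a network-inference setting, a score of this kind would be computed for every gene pair across all profiles and thresholded to decide which connections to keep.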
Understanding the relationship between topology and dynamics of transcriptional regulatory networks in mammalian cells is essential to elucidate the biology of complex regulatory and signaling pathways. Here, we characterised, via a synthetic biology approach, a transcriptional positive feedback loop (PFL) by generating a clonal population of mammalian cells (CHO) carrying a stable integration of the construct. The PFL network consists of the Tetracycline-controlled transactivator (tTA), whose expression is regulated by a tTA responsive promoter (CMV-TET), thus giving rise to a positive feedback. The same CMV-TET promoter also drives the expression of a destabilised yellow fluorescent protein (d2EYFP), so that the dynamic behaviour can be followed by time-lapse microscopy. The PFL network was compared to an engineered version of the network lacking the positive feedback loop (NOPFL), by expressing the tTA mRNA from a constitutive promoter. Doxycycline was used to repress tTA activation (switch off), and the resulting changes in fluorescence intensity for both the PFL and NOPFL networks were followed for up to 43 h. We observed a striking difference in the dynamics of the PFL and NOPFL networks. Using non-linear dynamical models, able to recapitulate experimental observations, we demonstrated a link between network topology and network dynamics. Namely, transcriptional positive autoregulation can significantly slow down the “switch off” times, as compared with the non-autoregulated system. Doxycycline concentration can modulate the response times of the PFL, whereas the NOPFL always switches off with the same dynamics. Moreover, the PFL can exhibit bistability for a range of Doxycycline concentrations. Since the PFL motif is often found in naturally occurring transcriptional and signaling pathways, we believe our work can be instrumental to characterise their behaviour.
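The qualitative difference between the two topologies can be reproduced with a deliberately simple toy model. It is not the non-linear model of the study: the single-variable Hill-type equation, the parameter values, the representation of Doxycycline as a factor scaling activator activity, and the names `trajectory` and `half_time` are all assumptions chosen for illustration.

```python
def hill(u, k=1.0, n=2):
    """Hill activation function (assumed form)."""
    return u ** n / (k ** n + u ** n)


def trajectory(feedback, activity, x0, t_end=30.0, dt=0.001,
               v=10.0, d=1.0, c=10.0):
    """Euler integration of dx/dt = v * hill(activity * A) - d * x.

    With feedback, the activator A is x itself (PFL); without feedback,
    A is a constant level c (NOPFL). `activity` in [0, 1] stands in for
    the fraction of activator left active by Doxycycline (assumption).
    """
    x, xs = x0, [x0]
    for _ in range(int(t_end / dt)):
        activator = x if feedback else c
        x += dt * (v * hill(activity * activator) - d * x)
        xs.append(x)
    return xs


def half_time(feedback, dox_activity, dt=0.001):
    """Time to cover half the distance between the ON and OFF levels."""
    on = trajectory(feedback, 1.0, x0=10.0, dt=dt)[-1]   # level without Dox
    off = trajectory(feedback, dox_activity, x0=on, dt=dt)
    target = (on + off[-1]) / 2
    for i, x in enumerate(off):
        if x <= target:
            return i * dt
    return None


if __name__ == "__main__":
    for activity in (0.05, 0.1, 0.15):
        print(activity, half_time(True, activity), half_time(False, activity))
```

With these assumed parameters the feedback system takes longer to reach its half-way point than the system without feedback, and its half-time changes with the residual activity, whereas the half-time without feedback stays fixed by the degradation rate. For residual activities above 0.2 the toy PFL no longer switches off, a crude analogue of the bistable regime.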
Synthetic Biology aims at designing and building new biological functions in living organisms. At the same time, Synthetic Biology approaches can be used to uncover the design principles of natural biological systems through the rational construction of simplified regulatory networks. Mathematical models of the networks are then derived from physical considerations and can be used to explain the observed dynamical behaviours. We have characterised a regulatory motif often found in transcriptional and signalling pathways. We constructed a positive feedback loop motif in mammalian cells, consisting of a protein controlling its own expression. We have shown that this motif exhibits a dynamic behaviour which is very different from that obtained when the autoregulation is removed. This difference is intrinsic to the specific wiring diagram chosen by the cell to control its behaviour (feedback versus non-feedback configurations), and can be instrumental in understanding the complex network of regulation occurring in a cell.
RNA interference (RNAi) is a regulatory cellular process that controls post-transcriptional gene silencing. During RNAi, double-stranded RNA (dsRNA) induces sequence-specific degradation of homologous mRNA via the generation of smaller dsRNA oligomers 21-23 nt in length (siRNAs). siRNAs are then loaded onto the RNA-Induced Silencing multiprotein Complex (RISC), which uses the siRNA antisense strand to specifically recognize mRNA species which exhibit a complementary sequence. Once the siRNA-loaded RISC binds the target mRNA, the mRNA is cleaved and degraded, and the siRNA-loaded RISC can degrade additional mRNA molecules. Despite the widespread use of siRNAs for gene silencing, and the importance of dosage for silencing efficiency and for avoiding off-target effects, none of the numerous mathematical models proposed in the literature has been validated to quantitatively capture the effects of RNAi on target mRNA degradation for different concentrations of siRNAs. Here, we address this pressing open problem by performing in vitro experiments of RNAi in mammalian cells and by testing and comparing different mathematical models, fitting them to experimental data and to in-silico generated data. We performed in vitro experiments in human and hamster cell lines constitutively expressing EGFP protein or tTA protein, respectively, measuring both mRNA levels, by quantitative Real-Time PCR, and protein levels, by FACS analysis, for a large range of concentrations of siRNA oligomers.
We tested and validated four different mathematical models of RNA interference by quantitatively fitting models' parameters to best capture the in vitro experimental data. We show that a simple Hill kinetic model is the most efficient way to model RNA interference. Our experimental and modeling findings clearly show that the RNAi-mediated degradation of mRNA is subject to saturation effects.
Our model has a simple mathematical form, amenable to analytical investigation, and a small set of parameters with an intuitive physical meaning, which makes it a reliable mathematical tool. The findings presented here will be useful for better understanding RNAi biology and as a modelling tool in Systems and Synthetic Biology.
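A Hill-type description of siRNA-mediated degradation, and the saturation it implies, can be sketched as follows. The specific equation, the parameter names (`delta`, `theta`, `h`, `k`, `d`) and their values are assumptions introduced for illustration; they are not the fitted model or the fitted parameters of the study.

```python
def hill_degradation_rate(sirna, delta=1.0, theta=1.0, h=1.0):
    """Extra mRNA degradation rate induced by siRNA (assumed Hill form).

    delta: maximal RNAi-induced degradation rate
    theta: siRNA concentration giving half-maximal effect
    h:     Hill coefficient
    """
    return delta * sirna ** h / (theta ** h + sirna ** h)


def steady_state_mrna(sirna, k=1.0, d=0.1, **hill_params):
    """Steady state of dm/dt = k - (d + hill(sirna)) * m (assumed model)."""
    return k / (d + hill_degradation_rate(sirna, **hill_params))


if __name__ == "__main__":
    for dose in (0.0, 0.1, 1.0, 10.0, 100.0, 1000.0):
        print(dose, round(steady_state_mrna(dose), 4))
```

In this sketch the mRNA level falls as the siRNA dose rises but levels off at k / (d + delta), so increasing the dose beyond the saturating range yields little additional silencing.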
Dysferlin (DYSF) is a type II transmembrane protein implicated in surface membrane repair of muscle. Mutations in dysferlin lead to Limb Girdle Muscular Dystrophy 2B (LGMD2B), Miyoshi Myopathy (MM), and Distal Myopathy with Anterior Tibialis onset (DMAT). The DYSF protein complex is not well understood, and only a few protein-binding partners have been identified thus far. To increase the set of interacting protein partners for DYSF, we recovered a list of predicted interacting proteins through a systems biology approach. The predictions are part of a “reverse-engineered” genome-wide human gene regulatory network obtained from experimental data by computational analysis. The reverse-engineering algorithm behind the analysis relates genes to each other based on changes in their expression patterns. DYSF and AHNAK were used to query the system and extract lists of potential interacting proteins. Among the 32 predictions the two genes share, we validated the physical interaction of DYSF protein with moesin (MSN) and polymerase I and transcript release factor (PTRF) in mouse heart lysate, thus identifying two novel Dysferlin-interacting proteins. Our strategy could be useful to clarify Dysferlin function in intracellular vesicles and its implication in muscle membrane resealing.
Caveolae; Genetic Diseases; Microarray; Muscular Dystrophy; Protein-Protein Interactions
Dosage imbalance is responsible for several genetic diseases, among which Down syndrome is caused by the trisomy of human chromosome 21.
To elucidate the extent to which the dosage imbalance of specific human chromosome 21 genes perturbs distinct molecular pathways, we developed the first mouse embryonic stem (ES) cell bank of human chromosome 21 genes. The human chromosome 21-mouse ES cell bank includes, in triplicate clones, 32 human chromosome 21 genes, which can be overexpressed in an inducible manner. Each clone was transcriptionally profiled in inducing versus non-inducing conditions. Analysis of the transcriptional response yielded results that were consistent with the perturbed gene's known function. Comparison between mouse ES cells containing the whole human chromosome 21 (trisomic mouse ES cells) and mouse ES cells overexpressing single human chromosome 21 genes allowed us to evaluate the contribution of single genes to the trisomic mouse ES cell transcriptome. In addition, for the clones overexpressing the Runx1 gene, we compared the transcriptome changes with the corresponding protein changes by mass spectrometry analysis.
We determined that only a subset of genes produces a strong transcriptional response when overexpressed in mouse ES cells and that this effect can be predicted taking into account the basal gene expression level and the protein secondary structure. We showed that the human chromosome 21-mouse ES cell bank is an important resource, which may be instrumental towards a better understanding of Down syndrome and other human aneuploidy disorders.
The reverse engineering of gene regulatory networks using gene expression profile data has become crucial to gain novel biological knowledge. Large amounts of data that need to be analyzed are currently being produced due to advances in microarray technologies. Using current reverse engineering algorithms to analyze large data sets can be very computationally intensive. These emerging computational requirements can be met using parallel computing techniques. It has been shown that the Network Identification by multiple Regression (NIR) algorithm performs better than the other ready-to-use reverse engineering software. However, it cannot be used with large networks with thousands of nodes, as is the case in biological networks, due to its high time and space complexity. In this work we overcome this limitation by designing and developing a parallel version of the NIR algorithm. The new implementation of the algorithm reaches a very good accuracy even for large gene networks, improving our understanding of gene regulatory networks, which is crucial for a wide range of biomedical applications.
Systems and Synthetic Biology use computational models of biological pathways in order to study their behaviour in silico. Mathematical models make it possible to verify biological hypotheses and to predict new possible dynamical behaviours. Here we use the tools of non-linear analysis to understand how to change the dynamics of the genes composing a novel synthetic network recently constructed in the yeast Saccharomyces cerevisiae for In-vivo Reverse-engineering and Modelling Assessment (IRMA). Guided by previous theoretical results that make the dynamics of a biological network depend on its topological properties, through the use of simulation and continuation techniques, we found that the network can be easily turned into a robust and tunable synthetic oscillator or a bistable switch. Our results provide guidelines for properly re-engineering the network in vivo in order to tune its dynamics.
MicroRNAs are small highly conserved non-coding RNAs which play an important role in regulating gene expression by binding the 3'UTR of target mRNAs. The majority of microRNAs are localized within other transcriptional units (host genes) and are co-expressed with them, which strongly suggests that microRNAs and corresponding host genes use the same promoter and other expression control elements. The remaining fraction of microRNAs is intergenic and is endowed with an independent regulatory region. A number of databases have already been developed to collect information about microRNAs but none of them allow an easy exploration of microRNA genomic organization across evolution.
CoGemiR is a publicly available microRNA-centered database whose aim is to offer an overview of the genomic organization of microRNAs and of its extent of conservation during evolution in different metazoan species. The database collects information on genomic location, conservation and expression data of both known and newly predicted microRNAs and displays the data by privileging a comparative point of view. The database also includes a microRNA prediction pipeline to annotate microRNAs in recently sequenced genomes. This information is easily accessible via web through a user-friendly query page. The CoGemiR database is available at
Knowledge of the genomic organization of microRNAs can provide useful information to understand their biology. In order to obtain a comparative genomics overview of microRNA genomic organization, we developed CoGemiR. To achieve this goal, we both collected and integrated data from pre-existing databases and generated new data, such as the identification in several species of a number of previously unannotated microRNAs. For a more effective use of these data, we developed a user-friendly web interface that shows in a simple way how the genomic context of a microRNA is related across different species.
Correction to: Molecular Systems Biology 3:78. doi:10.1038/msb4100120; Published online 13 February 2007
Inferring, or ‘reverse-engineering’, gene networks can be defined as the process of identifying gene interactions from experimental data through computational analysis. Gene expression data from microarrays are typically used for this purpose. Here we compared different reverse-engineering algorithms for which ready-to-use software was available and that had been tested on experimental data sets. We show that reverse-engineering algorithms are indeed able to correctly infer regulatory interactions among genes, at least when one performs perturbation experiments complying with the algorithm requirements. These algorithms are superior to classic clustering algorithms for the purpose of finding regulatory interactions among genes and, although further improvements are needed, have reached a reasonable level of performance for practical use.
gene network; reverse-engineering; gene expression; transcriptional regulation; gene regulation
Control of gene expression is essential to the establishment and maintenance of all cell types, and its dysregulation is involved in the pathogenesis of several diseases. Accurate computational predictions of transcription factor regulation may thus help in understanding complex diseases, including mental disorders in which dysregulation of neural gene expression is thought to play a key role. However, the biological mechanisms underlying the regulation of gene expression are not completely understood, and predictions via bioinformatics tools typically have poor specificity.
We developed a bioinformatics workflow for the prediction of transcription factor binding sites from several independent datasets. We show the advantages of integrating information based on evolutionary conservation and gene expression, when tackling the problem of binding site prediction. Consistent results were obtained on a large simulated dataset consisting of 13050 in silico promoter sequences, on a set of 161 human gene promoters for which binding sites are known, and on a smaller set of promoters of Myc target genes.
Our computational framework for binding site prediction can integrate multiple sources of data, and its performance was tested on different datasets. Our results show that integrating information from multiple data sources, such as genomic sequence of genes' promoters, conservation over multiple species, and gene expression data, indeed improves the accuracy of computational predictions.