Inborn errors of metabolism (IEM) are genetic diseases caused by mutations in enzymes or transporters that affect specific metabolic reactions and block physiological metabolic fluxes. Therapeutic treatment can be achieved either by decreasing the metabolic flux upstream of the block or by increasing the flux downstream of it. Identifying upstream and downstream fluxes, however, is not trivial, since metabolic reactions are intertwined in a complex network. To overcome this problem, we propose an innovative computational workflow that models the alteration of metabolism caused by IEM and predicts the metabolites and reactions affected by the mutation. Our workflow exploits a recent genome-scale metabolic network model of hepatocyte metabolism and uses a novel “differential flux analysis” to identify metabolites that accumulate in hepatocytes as a result of single-gene mutations causing IEM. We simulated 38 IEMs in the liver, and in about half of the cases our workflow correctly identified the metabolites known to accumulate in the blood and urine of IEM patients.
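The comparison at the heart of a differential flux analysis can be sketched on a toy network. The sketch assumes a hypothetical three-metabolite network and two hand-picked steady-state flux vectors, one healthy and one with the mutated enzyme's reaction (`E1_A_to_B`, a made-up name) forced to zero. In the actual workflow each flux distribution would come from solving a flux balance analysis problem on the genome-scale hepatocyte model, a step omitted here. Metabolites whose export flux rises in the mutant are flagged as candidates for accumulation.

```python
# Hypothetical toy network: uptake -> A; A -> B via enzyme E1 (the mutated step);
# A -> C via an alternative branch; B and C are exported.
# Stoichiometric matrix S[metabolite][reaction]: -1 consumed, +1 produced.
S = {
    "A": {"uptake_A": 1, "E1_A_to_B": -1, "alt_A_to_C": -1},
    "B": {"E1_A_to_B": 1, "export_B": -1},
    "C": {"alt_A_to_C": 1, "export_C": -1},
}

def is_steady_state(v, tol=1e-9):
    """Check the mass-balance constraint S v = 0 for every metabolite."""
    return all(abs(sum(coef * v[rxn] for rxn, coef in row.items())) < tol
               for row in S.values())

def differential_exchange(v_healthy, v_mutant, exchange_of):
    """Mutant-minus-healthy export flux for each exported metabolite."""
    return {met: v_mutant[rxn] - v_healthy[rxn] for met, rxn in exchange_of.items()}

# Hand-picked flux distributions standing in for two FBA solutions.
v_healthy = {"uptake_A": 10, "E1_A_to_B": 8, "alt_A_to_C": 2, "export_B": 8, "export_C": 2}
v_mutant = {"uptake_A": 10, "E1_A_to_B": 0, "alt_A_to_C": 10, "export_B": 0, "export_C": 10}

assert is_steady_state(v_healthy) and is_steady_state(v_mutant)
delta = differential_exchange(v_healthy, v_mutant, {"B": "export_B", "C": "export_C"})
accumulating = sorted(met for met, d in delta.items() if d > 0)
print(delta)         # {'B': -8, 'C': 8}
print(accumulating)  # ['C']
```

Because the block removes the only route to B, flux is diverted into the branch upstream of the block, and C is the metabolite flagged as accumulating.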
differential flux analysis; flux balance analysis; hepatocyte metabolism; inborn errors of metabolism; mathematical modeling
The lysosomal-autophagic pathway is activated by starvation and plays an important role in both cellular clearance and lipid catabolism. However, the transcriptional regulation of this pathway in response to metabolic cues is currently uncharacterized. Here we show that the transcription factor EB (TFEB), a master regulator of lysosomal biogenesis and autophagy, is induced by starvation through an autoregulatory feedback loop and exerts global transcriptional control on lipid catabolism via PGC1α and PPARα. Thus, during starvation a transcriptional mechanism links the autophagic pathway to cellular energy metabolism. The conservation of this mechanism in Caenorhabditis elegans suggests a fundamental role for TFEB in the evolution of the adaptive response to food deprivation. Viral delivery of TFEB to the liver prevented weight gain and metabolic syndrome in both diet-induced and genetic mouse models of obesity, suggesting a novel therapeutic strategy for disorders of lipid metabolism.
miRNAs are small non-coding RNAs able to modulate target-gene expression. It has been postulated that miRNAs confer robustness to biological processes, but clear experimental evidence is still missing. Using a synthetic biology approach, we demonstrate that miRNAs provide phenotypic robustness to transcriptional regulatory networks by buffering fluctuations in protein levels. Here we construct a network motif in mammalian cells exhibiting a “toggle-switch” phenotype, in which two alternative protein expression levels define its ON and OFF states. The motif consists of an inducible transcription factor that regulates its own transcription and that of a miRNA targeting the transcription factor itself. Using mathematical modeling and experimental approaches, we confirm that the miRNA confers robustness to the toggle switch by enabling the cell to maintain and transmit its state. In the absence of the miRNA, protein noise increases dramatically, causing the cell to switch randomly between the two states.
Motivation: Identification of differentially expressed genes has led to countless new discoveries. However, differentially expressed genes are only a proxy for finding dysregulated pathways. The problem is to identify how the network of regulatory and physical interactions is rewired in different conditions or in disease.
Results: We developed a procedure named DINA (DIfferential Network Analysis) that, starting from a collection of condition-specific gene expression profiles, identifies sets of genes whose co-regulation is condition-specific. DINA can also predict which transcription factors (TFs) may be responsible for the condition-specific co-regulation of a pathway. We derived 30 tissue-specific gene networks in human and identified several metabolic pathways as the most differentially regulated across tissues. We correctly identified TFs such as nuclear receptors as their main regulators and demonstrated that a gene of unknown function (YEATS2) acts as a negative regulator of hepatocyte metabolism. Finally, we showed that DINA can be used to generate hypotheses about dysregulated pathways during disease progression. By analyzing gene expression profiles across primary and transformed hepatocytes, DINA identified hepatocarcinoma-specific dysregulation of metabolic and transcriptional pathways.
Availability: We implemented an online web tool, http://dina.tigem.it, that enables users to apply DINA to identify tissue-specific pathways or gene signatures.
Supplementary data are available at Bioinformatics online.
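As an illustration only, one simple way to score condition-specific co-regulation of a pathway is to compute, in each condition, the mean absolute Pearson correlation among the pathway's genes, and then measure how much that index varies across conditions. This is an assumed stand-in, not DINA's published statistic; the gene profiles and condition names are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def coregulation_index(profiles, genes):
    """Mean |r| over all pairs of pathway genes within one condition."""
    return mean(abs(pearson(profiles[g], profiles[h]))
                for g, h in combinations(genes, 2))

def condition_specificity(datasets, genes):
    """Per-condition index and its spread across conditions (larger = more specific)."""
    index = {cond: coregulation_index(profiles, genes)
             for cond, profiles in datasets.items()}
    return index, pstdev(list(index.values()))

# Hypothetical profiles of a three-gene pathway in two tissues (four samples each).
datasets = {
    "liver": {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]},
    "muscle": {"g1": [1, 2, 3, 4], "g2": [1, -1, -1, 1], "g3": [1, -1, 1, -1]},
}
index, spread = condition_specificity(datasets, ["g1", "g2", "g3"])
print(index, spread)
```

Here the pathway is tightly co-regulated in the "liver" profiles and loosely in the "muscle" ones, so the spread across conditions is large; a pathway co-regulated equally everywhere would score near zero.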
Stem-cell functions require activation of stem-cell-intrinsic transcriptional programs and extracellular interaction with a niche microenvironment. How the transcriptional machinery controls residency of stem cells in the niche is unknown. Here we show that Id proteins coordinate stem-cell activities with anchorage of neural stem cells (NSCs) to the niche. Conditional inactivation of three Id genes in NSCs triggered detachment of embryonic and postnatal NSCs from the ventricular and vascular niche, respectively. The interrogation of the gene modules directly targeted by Id deletion in NSCs revealed that Id proteins repress bHLH-mediated activation of Rap1GAP, thus serving to maintain the GTPase activity of RAP1, a key mediator of cell adhesion. Preventing the elevation of the Rap1GAP level countered the consequences of Id loss on NSC–niche interaction and stem-cell identity. Thus, by preserving anchorage of NSCs to the extracellular environment, Id activity synchronizes NSC functions to residency in the specialized niche.
The connection between chromatin nuclear organization and gene activity is vividly illustrated by the observation that transcriptional coregulation of certain genes appears to be directly influenced by their spatial proximity. This fact raises the more general question of whether the numerous genes that are coregulated on a given chromosome, especially those at large genomic distances, can all become proximate inside the nucleus. This problem is studied here using steered molecular dynamics simulations that enforce the colocalization of thousands of knowledge-based gene sequences on a model of the gene-rich human chromosome 19. Remarkably, most gene pairs can be brought simultaneously into contact. This is made possible by the low degree of intra-chromosome entanglement and the large number of cliques in the gene coregulatory network, a clique being a set of genes that are all coregulated together as a group. The constrained conformations of the model chromosome 19 are further shown to be organized in spatial macrodomains similar to those inferred from recent HiC measurements. The findings indicate that gene coregulation and colocalization are largely compatible and that this relationship can be exploited to draft the overall spatial organization of the chromosome in vivo. The general validity and implications of these findings could be investigated by applying the transferable computational strategy introduced here to other eukaryotic chromosomes.
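The clique notion used above can be made concrete with the classic Bron-Kerbosch algorithm with pivoting, which enumerates all maximal cliques of an undirected graph. This is a generic sketch on a hypothetical five-gene coregulatory network; the abstract does not state how cliques were enumerated in the study.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques of an undirected graph (Bron-Kerbosch with pivoting).

    adj maps each node to the set of its neighbours.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)          # r cannot be extended: it is maximal
            return
        # Pivot on the node with most neighbours in p to prune redundant branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def build_graph(edges):
    """Adjacency sets from a list of undirected edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

# Hypothetical significantly coregulated gene pairs.
edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")]
cliques = maximal_cliques(build_graph(edges))
print(sorted(sorted(c) for c in cliques))
# [['g1', 'g2', 'g3'], ['g3', 'g4'], ['g4', 'g5']]
```

Every pair inside a clique is coregulated, so a large clique supplies many mutually compatible colocalization constraints at once.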
Recent high-throughput experiments have shown that chromosome regions (loci) that accommodate specific sets of coregulated genes can be in close spatial proximity despite their possibly large sequence separation. These findings raise the question of whether gene coregulation and gene colocalization are related in general. Here, we tackle this problem using a knowledge-based coarse-grained model of human chromosome 19. Specifically, we carry out steered molecular dynamics simulations to promote the colocalization of hundreds of gene pairs that are known to be significantly coregulated. We show that most such pairs can be simultaneously colocalized. This result is, in turn, shown to depend on at least two distinctive chromosomal features: the remarkably low degree of intra-chain entanglement found in chromosomes inside the nucleus and the large number of cliques present in the gene coregulatory network. The results are therefore largely consistent with the coregulation-colocalization hypothesis. Furthermore, the model chromosome conformations obtained by applying the coregulation constraints display spatial macrodomains with significant similarities to those inferred from HiC measurements of human chromosome 19. This finding suggests that suitable extensions of the present approach might be used to propose viable ensembles of eukaryotic chromosome conformations in vivo.
Gene expression profiles can be used to infer previously unknown transcriptional regulatory interactions among thousands of genes via systems biology ‘reverse engineering’ approaches. We ‘reverse engineered’ an embryonic stem (ES)-specific transcriptional network from 171 gene expression profiles measured in ES cells to identify master regulators of gene expression (‘hubs’). We discovered that E130012A19Rik (E13), highly expressed in mouse ES cells compared with differentiated cells, was a central ‘hub’ of the network. We demonstrated that E13 is a protein-coding gene implicated in regulating commitment towards different neuronal subtypes and glial cells. Overexpression and knock-down of E13 in ES cell lines undergoing differentiation into neurons and glial cells caused a strong up-regulation of the glutamatergic neuron marker Vglut2 and a strong down-regulation of the GABAergic neuron marker GAD65 and of the radial glia marker Blbp. We confirmed E13 expression in the cerebral cortex of adult mice and during development. By immuno-based affinity purification, we characterized protein partners of E13 involved in the Polycomb complex. Our results suggest a role for E13 in regulating the balance between glutamatergic projection neurons on one side and GABAergic interneurons and glial cells on the other, possibly through epigenetically mediated transcriptional regulation.
We collected a massive and heterogeneous dataset of 20 255 gene expression profiles (GEPs) from a variety of human samples and experimental conditions, as well as 8895 GEPs from mouse samples. We developed a mutual information (MI) reverse-engineering approach to quantify the extent to which the mRNA levels of two genes are related to each other across the dataset. The resulting networks consist of 4 817 629 connections among 20 255 transcripts in human and 14 461 095 connections among 45 101 transcripts in mouse, with an inter-species conservation of 12%. The inferred connections were compared against known interactions to assess their biological significance. We experimentally validated a subset of previously undescribed protein–protein interactions. We discovered co-expressed modules within the networks, consisting of genes strongly connected to each other, which carry out specific biological functions and tend to be in physical proximity at the chromatin level in the nucleus. We show that the network can be used to predict the biological function and subcellular localization of a protein, and to elucidate the function of a disease gene. We experimentally verified that the granulin precursor (GRN) gene, whose mutations cause frontotemporal lobar degeneration, is involved in lysosome function. We have developed an online tool to explore the human and mouse gene networks.
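A minimal sketch of a mutual information score between two genes' mRNA levels, assuming equal-width binning into three bins and the plug-in (histogram) estimator. The binning scheme and the toy profiles are assumptions, since the abstract does not specify the estimator used.

```python
from collections import Counter
from math import log

def discretize(values, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0          # guard against constant profiles
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y, n_bins=3):
    """Plug-in MI estimate (in nats) between two expression profiles."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    pxy = Counter(zip(bx, by))                  # joint bin counts
    px, py = Counter(bx), Counter(by)           # marginal bin counts
    return sum((c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

# Hypothetical profiles across six samples.
x = [1, 2, 3, 4, 5, 6]
y = [6, 5, 4, 3, 2, 1]   # perfectly anti-related to x
z = [0, 1, 0, 1, 0, 1]   # unrelated to x after binning
print(mutual_information(x, y), mutual_information(x, z))
```

Unlike correlation, MI also captures non-linear and inverse relationships: the anti-related pair `x, y` reaches the maximum log(3) for three equally populated bins, while `x, z` scores zero.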
Understanding the relationship between topology and dynamics of transcriptional regulatory networks in mammalian cells is essential to elucidate the biology of complex regulatory and signaling pathways. Here, we characterised, via a synthetic biology approach, a transcriptional positive feedback loop (PFL) by generating a clonal population of mammalian cells (CHO) carrying a stable integration of the construct. The PFL network consists of the Tetracycline-controlled transactivator (tTA), whose expression is regulated by a tTA-responsive promoter (CMV-TET), thus giving rise to a positive feedback. The same CMV-TET promoter also drives the expression of a destabilised yellow fluorescent protein (d2EYFP), so that the dynamic behaviour can be followed by time-lapse microscopy. The PFL network was compared to an engineered version of the network lacking the positive feedback loop (NOPFL), in which the tTA mRNA is expressed from a constitutive promoter. Doxycycline was used to repress tTA activation (switch off), and the resulting changes in fluorescence intensity for both the PFL and NOPFL networks were followed for up to 43 h. We observed a striking difference in the dynamics of the PFL and NOPFL networks. Using non-linear dynamical models that recapitulate the experimental observations, we demonstrated a link between network topology and network dynamics. Namely, transcriptional positive autoregulation can significantly slow down the “switch off” times, as compared to the non-autoregulated system. Doxycycline concentration can modulate the response times of the PFL, whereas the NOPFL always switches off with the same dynamics. Moreover, the PFL can exhibit bistability for a range of Doxycycline concentrations. Since the PFL motif is often found in naturally occurring transcriptional and signaling pathways, we believe our work can be instrumental in characterising their behaviour.
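The slowing of the switch-off by positive autoregulation can be illustrated with a linearized caricature, not the non-linear models used in the study. After the input is removed, the PFL output is assumed to obey dx/dt = (f*k_f - gamma)*x, where gamma is the decay rate, k_f a hypothetical feedback gain, and f the fraction of tTA left active by doxycycline (also an assumption), while the NOPFL output simply decays at rate gamma. The half-time ln(2)/(gamma - f*k_f) then depends on doxycycline through f for the PFL only.

```python
from math import log

def half_time(gamma, feedback_gain, dt=1e-3, t_max=100.0):
    """Euler-integrate dx/dt = (feedback_gain - gamma) * x from x(0) = 1.

    Returns the first time x drops to 1/2, or None if it does not within t_max.
    """
    x, t = 1.0, 0.0
    while t < t_max:
        if x <= 0.5:
            return t
        x += dt * (feedback_gain - gamma) * x
        t += dt
    return None

gamma = 0.5   # hypothetical decay/dilution rate (1/h)
k_f = 0.3     # hypothetical linearized feedback gain (1/h); must stay below gamma

for f in (1.0, 0.5, 0.0):  # assumed fraction of tTA left active by doxycycline
    pfl = half_time(gamma, k_f * f)
    nopfl = half_time(gamma, 0.0)   # no feedback: independent of doxycycline
    analytic = log(2) / (gamma - k_f * f)
    print(f"f={f}: PFL {pfl:.2f} h, NOPFL {nopfl:.2f} h, analytic PFL {analytic:.2f} h")
```

The feedback term partly compensates for decay, lowering the effective decay rate and stretching the switch-off; the bistability reported for the real PFL needs the non-linear promoter response and is outside this linear sketch.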
Synthetic Biology aims at designing and building new biological functions in living organisms. At the same time, Synthetic Biology approaches can be used to uncover the design principles of natural biological systems through the rational construction of simplified regulatory networks. Mathematical models of the networks are then derived from physical considerations and can be used to explain the observed dynamical behaviours. We have characterised a regulatory motif often found in transcriptional and signalling pathways. We constructed a positive feedback loop motif in mammalian cells, consisting of a protein controlling its own expression. We have shown that this motif exhibits a dynamic behaviour which is very different from that obtained when the autoregulation is removed. This difference is intrinsic to the specific wiring diagram chosen by the cell to control its behaviour (feedback versus non-feedback configurations), and can be instrumental in understanding the complex network of regulation occurring in a cell.
Connectivity mapping is a recently developed technique for discovering the underlying connections between different biological states based on gene-expression similarities. The sscMap method has been shown to provide enhanced sensitivity in mapping meaningful connections leading to testable biological hypotheses and in identifying drug candidates with particular pharmacological and/or toxicological properties. Challenges remain, however, as to how to prioritise the large number of discovered connections in an unbiased manner such that the success rate of any follow-up investigation can be maximised. We introduce a new concept, gene-signature perturbation, which aims to test whether an identified connection is stable against systematic minor changes (perturbations) to the gene signature. We applied the perturbation method to three independent datasets obtained from the GEO database: acute myeloid leukemia (AML), cervical cancer, and breast cancer treated with letrozole. We demonstrate that the perturbation approach helps to identify meaningful biological connections that suggest the most relevant candidate drugs. In the case of AML, we found that the prevalent compounds were retinoic acids and PPAR activators. For cervical cancer, our results suggested that potential drugs are likely to involve the EGFR pathway, and with the breast cancer dataset, we identified candidates that are involved in prostaglandin inhibition. Thus the gene-signature perturbation approach added real value to the whole connectivity mapping process, allowing for increased specificity in the identification of possible therapeutic candidates.
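A gene-signature perturbation test can be sketched as follows, assuming an sscMap-style connection score equal to the sum, over signature genes, of the gene's sign (+1 up, -1 down) times its signed rank in a reference drug profile, and taking systematic leave-one-gene-out removal as the perturbation. Normalization and significance testing are omitted, and the ranks and signatures are hypothetical.

```python
def connection_score(signature, signed_ranks):
    """Sum of signature sign times reference signed rank (0 for unranked genes)."""
    return sum(sign * signed_ranks.get(gene, 0) for gene, sign in signature.items())

def perturbation_stability(signature, signed_ranks):
    """Fraction of leave-one-gene-out signatures whose score keeps the original sign."""
    base = connection_score(signature, signed_ranks)
    kept = 0
    for gene in signature:
        reduced = {g: s for g, s in signature.items() if g != gene}
        if connection_score(reduced, signed_ranks) * base > 0:
            kept += 1
    return kept / len(signature)

# Hypothetical signed ranks of genes in one reference drug profile.
ranks = {"A": 5, "B": 4, "C": -3, "D": -2, "E": 1}
robust = {"A": 1, "B": 1, "C": -1, "D": -1}    # agrees with the profile on every gene
fragile = {"A": 1, "B": -1, "E": -1, "D": 1}   # small net score from mixed agreement
for name, sig in (("robust", robust), ("fragile", fragile)):
    print(name, connection_score(sig, ranks), perturbation_stability(sig, ranks))
```

A connection whose sign survives every small change to the signature is a better candidate for follow-up than one that hinges on a single gene.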
RNA interference (RNAi) is a regulatory cellular process that controls post-transcriptional gene silencing. During RNAi, double-stranded RNA (dsRNA) induces sequence-specific degradation of homologous mRNA via the generation of smaller dsRNA oligomers 21 to 23 nt long (siRNAs). siRNAs are then loaded onto the RNA-Induced Silencing multiprotein Complex (RISC), which uses the siRNA antisense strand to specifically recognize mRNA species with a complementary sequence. Once the siRNA-loaded RISC binds the target mRNA, the mRNA is cleaved and degraded, and the siRNA-loaded RISC can degrade additional mRNA molecules. Despite the widespread use of siRNAs for gene silencing, and the importance of dosage for silencing efficiency and for avoiding off-target effects, none of the numerous mathematical models proposed in the literature has been validated to quantitatively capture the effects of RNAi on target mRNA degradation for different siRNA concentrations. Here, we address this pressing open problem by performing in vitro RNAi experiments in mammalian cells and by testing and comparing different mathematical models, fitting in silico-generated data to the experimental data. We performed in vitro experiments in human and hamster cell lines constitutively expressing EGFP or tTA, respectively, measuring both mRNA levels, by quantitative Real-Time PCR, and protein levels, by FACS analysis, over a large range of siRNA oligomer concentrations.
We tested and validated four different mathematical models of RNA interference by quantitatively fitting the models' parameters to best capture the in vitro experimental data. We show that a simple Hill kinetic model is the most efficient way to model RNA interference. Our experimental and modeling findings clearly show that RNAi-mediated degradation of mRNA is subject to saturation effects.
Our model has a simple mathematical form, amenable to analytical investigation, and a small set of parameters with an intuitive physical meaning, which make it a unique and reliable mathematical tool. The findings presented here will be useful both for better understanding RNAi biology and as a modelling tool in Systems and Synthetic Biology.
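A minimal sketch of a Hill-type RNAi model, assuming target mRNA is produced at a constant rate k, degraded at a basal rate d, and additionally degraded at a rate v_max*s^n/(K^n + s^n) that saturates with siRNA concentration s. The steady state relative to untreated cells is then d/(d + v_max*H(s)), which plateaus at d/(d + v_max) at high siRNA doses. The exact model form and parameter values are assumptions for illustration, not the fitted model from the study.

```python
def hill(s, K, n):
    """Saturating Hill term s^n / (K^n + s^n)."""
    return s ** n / (K ** n + s ** n)

def relative_mrna(s, d, v_max, K, n):
    """Steady-state target mRNA relative to untreated cells.

    Assumes dm/dt = k - d*m - v_max*hill(s)*m, so m_ss = k / (d + v_max*hill(s))
    and the ratio m_ss(s) / m_ss(0) no longer depends on k.
    """
    return d / (d + v_max * hill(s, K, n))

# Hypothetical parameters: basal decay d, maximal RNAi-driven decay v_max,
# half-saturation constant K (nM) and Hill coefficient n.
params = dict(d=0.1, v_max=0.9, K=10.0, n=1)
doses = [0, 1, 10, 100, 1000]
levels = [relative_mrna(s, **params) for s in doses]
print([round(v, 3) for v in levels])
```

The saturation effect appears as the floor d/(d + v_max): beyond a few multiples of K, extra siRNA barely lowers the target further, which is the regime where off-target effects become the main cost of higher doses.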
Dysferlin (DYSF) is a type II transmembrane protein implicated in surface membrane repair of muscle. Mutations in dysferlin lead to Limb Girdle Muscular Dystrophy 2B (LGMD2B), Miyoshi Myopathy (MM), and Distal Myopathy with Anterior Tibialis onset (DMAT). The DYSF protein complex is not well understood, and only a few protein-binding partners have been identified thus far. To expand the set of interacting protein partners for DYSF, we recovered a list of predicted interacting proteins through a systems biology approach. The predictions are part of a “reverse-engineered” genome-wide human gene regulatory network obtained from experimental data by computational analysis. The reverse-engineering algorithm behind the analysis relates genes to each other based on changes in their expression patterns. DYSF and AHNAK were used to query the system and extract lists of potential interacting proteins. Among the 32 predictions the two genes share, we validated the physical interaction of DYSF with moesin (MSN) and with polymerase I and transcript release factor (PTRF) in mouse heart lysate, thus identifying two novel Dysferlin-interacting proteins. Our strategy could be useful to clarify Dysferlin function in intracellular vesicles and its implication in muscle membrane resealing.
Caveolae; Genetic Diseases; Microarray; Muscular Dystrophy; Protein-Protein Interactions
Systems biology is an interdisciplinary field that aims at understanding complex interactions in cells. Here we demonstrate that linear control theory can provide valuable insight and practical tools for the characterization of complex biological networks. We provide the foundation for such analyses through several case studies, including cascade and parallel forms as well as feedback and feedforward loops. We reproduce experimental results and provide a rational analysis of the observed behavior. We demonstrate that methods such as the transfer function (frequency domain) and linear state-space (time domain) representations can be used to reliably predict the properties and transient behavior of complex network topologies, and we point to specific design strategies for synthetic networks.
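The transfer-function view can be sketched with first-order production-degradation steps, k/(s + gamma), combined by the usual block-diagram rules: series connections multiply, parallel connections add, and negative feedback gives G/(1 + GH). Evaluating at s = j*omega gives the frequency response. The gains and rates below are hypothetical.

```python
def first_order(k, gamma):
    """Transfer function k / (s + gamma) of a linear production-degradation step."""
    return lambda s: k / (s + gamma)

def cascade(*blocks):
    """Series connection: transfer functions multiply."""
    def g(s):
        out = 1
        for b in blocks:
            out *= b(s)
        return out
    return g

def parallel(*blocks):
    """Parallel connection with summed outputs: transfer functions add."""
    return lambda s: sum(b(s) for b in blocks)

def negative_feedback(forward, feedback):
    """Closed loop G / (1 + G H) for negative feedback."""
    return lambda s: forward(s) / (1 + forward(s) * feedback(s))

def gain_at(g, omega):
    """Magnitude of the frequency response at angular frequency omega."""
    return abs(g(1j * omega))

a = first_order(2.0, 1.0)   # DC gain 2
b = first_order(3.0, 0.5)   # DC gain 6
series = cascade(a, b)
summed = parallel(a, b)
loop = negative_feedback(a, lambda s: 0.5)  # constant feedback gain H = 0.5
print(gain_at(series, 0.0), gain_at(summed, 0.0), gain_at(loop, 0.0))  # 12.0 8.0 1.0
```

The same helpers show low-pass filtering: each first-order stage attenuates fast input fluctuations, so a cascade suppresses high frequencies more strongly than either stage alone.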
Although maps of intracellular interactions are increasingly well characterized, little is known about large-scale maps of host-pathogen protein interactions. The investigation of host-pathogen interactions can reveal features of pathogenesis and provide a foundation for the development of drugs and disease prevention strategies. A compilation of experimentally verified interactions between HIV-1 and human proteins, together with a set of HIV-dependency factors (HDFs), provided insights into the topology and intricate interplay between viral and host proteins on a large scale. We found that virus-targeted proteins and HDFs appear predominantly in rich clubs, groups of human proteins that are strongly intertwined with each other. These assemblies of proteins may serve as an infection gateway, allowing the virus to take control of the human host by reaching protein pathways and diversified cellular functions in a pronounced and focused way. Particular transcription factors and protein kinases facilitate indirect interactions between HDFs and viral proteins. Discerning the entanglement of directly targeted and indirectly interacting proteins may uncover molecular and functional sites that can provide novel perspectives on the progression of HIV infection and highlight new avenues to fight this virus.
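A common quantitative handle on rich clubs is the unnormalized rich-club coefficient phi(k) = 2*E_k/(N_k*(N_k - 1)), where N_k nodes have degree greater than k and E_k edges connect them. The sketch computes it for a hypothetical hub-and-spoke interaction graph without duplicate edges or self-loops; the study's own definition of rich clubs may differ, and in practice phi(k) is usually compared with randomized networks.

```python
from collections import defaultdict

def rich_club_coefficient(edges, k):
    """Unnormalized rich-club coefficient phi(k) of an undirected simple graph.

    phi(k) = 2 * E_k / (N_k * (N_k - 1)), where N_k nodes have degree > k
    and E_k is the number of edges among them. Returns None if N_k < 2.
    """
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    rich = {node for node, d in degree.items() if d > k}
    if len(rich) < 2:
        return None
    e_rich = sum(1 for a, b in edges if a in rich and b in rich)
    return 2 * e_rich / (len(rich) * (len(rich) - 1))

# Hypothetical graph: three fully interconnected hubs, each with two leaf partners.
edges = [("H1", "H2"), ("H1", "H3"), ("H2", "H3"),
         ("H1", "p1"), ("H1", "p2"), ("H2", "p3"),
         ("H2", "p4"), ("H3", "p5"), ("H3", "p6")]
print(rich_club_coefficient(edges, 0), rich_club_coefficient(edges, 1))  # 0.25 1.0
```

Restricting to high-degree nodes raises phi from 0.25 to 1.0 here, the signature of a tightly knit core of hubs.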
The yeast pheromone response pathway is a canonical three-step mitogen-activated protein kinase (MAPK) cascade that requires a scaffold protein for proper signal transduction. Recent experimental studies into the role the scaffold plays in modulating the character of the transduced signal show that the presence of the scaffold increases the biphasic nature of the signal response. This runs contrary to prior theoretical investigations into how scaffolds function. We describe a mathematical model of the yeast MAPK cascade specifically designed to capture the experimental conditions and results of these empirical studies. We demonstrate how the system can exhibit either graded or ultrasensitive (biphasic) response dynamics depending on the binding kinetics of enzymes to the scaffold. At the basis of our theory is an analytical result showing that weak interactions make the response biphasic, while tight interactions lead to a graded response. We then show, via an analysis of the kinetic binding rate constants, how experimental manipulations, modeled as changes to certain of these binding constants, lead to predictions of pathway output consistent with experimental observations. We demonstrate how the results of these experimental manipulations are consistent with our theoretical treatment of this scaffold-dependent MAPK cascade, and how future efforts in this style of systems biology can be used to interpret the results of other signal transduction observations.
Decoding the transcriptional programs governing transcriptomic diversity across multiple human tissues is a major challenge in bioinformatics. To address this problem, a number of computational methods have focused on cis-regulatory codes driving overexpression or underexpression in a single tissue as compared to others. On the other hand, we recently proposed a different approach to mine cis-regulatory codes: starting from gene sets sharing common cis-regulatory motifs, the method screens for expression modules based on expression coherence. However, both approaches seem insufficient to capture transcriptional programs that control gene expression in a subset of all samples. This limitation is especially serious when analyzing data from multiple tissues. To overcome it, we developed a new module discovery method termed BEEM (Biclustering-based Extraction of Expression Modules) to discover expression modules that are functional in a subset of tissues. We showed that, when applied to expression profiles of multiple human tissues, BEEM finds expression modules missed by two existing approaches based on coherent expression and on single tissue-specific differential expression. From the BEEM results, we obtained new insights into transcriptional programs controlling transcriptomic diversity across various types of tissues. This study introduces BEEM as a powerful tool for decoding regulatory programs from a compendium of gene expression profiles.
The reverse engineering of gene regulatory networks using gene expression profile data has become crucial for gaining novel biological knowledge. Advances in microarray technologies are producing large amounts of data that need to be analyzed. Using current reverse-engineering algorithms to analyze large data sets can be very computationally intensive. These emerging computational requirements can be met using parallel computing techniques. It has been shown that the Network Identification by multiple Regression (NIR) algorithm performs better than other ready-to-use reverse-engineering software. However, it cannot be used with large networks of thousands of nodes, as is the case for biological networks, because of its high time and space complexity. In this work we overcome this limitation by designing and developing a parallel version of the NIR algorithm. The new implementation achieves very good accuracy even for large gene networks, improving our understanding of gene regulatory networks, which is crucial for a wide range of biomedical applications.
Robustness is an essential feature of biological systems, and any mathematical model that describes such a system should reflect this feature. In particular, the persistence of oscillatory behavior is an important issue. A benchmark model for this phenomenon is the Laub-Loomis model, a nonlinear model for cAMP oscillations in Dictyostelium discoideum. This model captures the most important features of biomolecular networks oscillating at constant frequencies. Nevertheless, the robustness of its oscillatory behavior is not yet fully understood. Given a system that exhibits oscillating behavior for some set of parameters, the central question of robustness is how far the parameters may be changed without changing the qualitative behavior. Determining such a “robustness region” in parameter space is an intricate task and, if the number of parameters is high, it may also be time-consuming. Several methods proposed in the literature partially tackle this problem; for example, some only detect particular bifurcations, or only find a relatively small box-shaped estimate of an irregularly shaped robustness region. Here, we present an approach that is much more general and is especially designed to be efficient for systems with a large number of parameters. As an illustration, we first apply the method to a well-understood low-dimensional system, the Rosenzweig-MacArthur model. This is a predator-prey model featuring satiation of the predator. It has only two parameters, and its bifurcation diagram is available in the literature. We find good agreement with the existing knowledge about this model. When we apply the new method to the high-dimensional Laub-Loomis model, we obtain a much larger robustness region than reported earlier in the literature. This clearly demonstrates the power of our method. From the results, we conclude that the underlying biological system is much more robust than was realized until now.
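For a two-parameter model like Rosenzweig-MacArthur, a brute-force grid scan is a feasible baseline, and it shows why such scans do not scale to high-dimensional models like Laub-Loomis. The sketch assumes the nondimensional form x' = x(1 - x/K) - xy/(1 + x), y' = y(bx/(1 + x) - d) with the conversion factor b fixed, leaving K and d free. Linearizing at the coexistence equilibrium x* = d/(b - d) gives a trace proportional to (K - 1 - 2x*), so the equilibrium is stable when x* > (K - 1)/2 and loses stability through a Hopf bifurcation below that threshold. This is not the authors' method, and their parameterization may differ.

```python
def classify(K, d, b=1.0):
    """Qualitative regime of the nondimensional Rosenzweig-MacArthur model

    x' = x(1 - x/K) - x y / (1 + x),   y' = y (b x / (1 + x) - d).

    The interior equilibrium x* = d / (b - d) is stable iff x* > (K - 1) / 2;
    below that threshold the model oscillates on a limit cycle.
    """
    if d >= b:
        return "predator extinct"          # predator cannot grow at any prey level
    x_star = d / (b - d)
    if x_star >= K:
        return "predator extinct"          # equilibrium prey level exceeds capacity
    return "stable" if x_star > (K - 1) / 2 else "oscillatory"

def robustness_grid(K_values, d_values, b=1.0):
    """Brute-force scan: parameter pairs (K, d) for which the model oscillates."""
    return [(K, d) for K in K_values for d in d_values
            if classify(K, d, b) == "oscillatory"]

print(robustness_grid([2.0, 4.0], [0.1, 0.4]))
```

The grid needs m^p model evaluations for m values per parameter and p parameters, which is manageable for p = 2 but not for a model with many parameters, the setting the method above is designed for.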
A significant proportion of myocardial infarction (MI) patients undergo complex, coordinated perturbations at the molecular level that may eventually drive the occurrence of ventricular dysfunction and heart failure. Despite advances in the elucidation of key processes implicated in this condition, traditional methods relying on gene expression data and the identification of individual biomarkers in isolation pose major limitations not only for improving prediction power, but also for model interpretability. Mechanisms underlying clinical responses after MI remain elusive and there is no biomarker with the capacity to accurately predict ventricular dysfunction after MI. This calls for the exploration of system-level modeling of ventricular dysfunction in post-MI patients. Within this discovery framework key perturbations and predictive patterns are characterized by the integrated biological activity levels observed in pathways, rather than in individual genes.
Here we report an integrative approach to identifying pathways related to ventricular dysfunction post-MI with potential prognostic and therapeutic value. We found that a diversity of pathway-level perturbations can be profiled in samples of patients with ventricular dysfunction post-MI, most of which represent major reductions of gene expression. Highly perturbed pathways included those implicated in antigen-dependent B-cell activation and the synthesis of leucine. By analyzing patient-specific samples encoded with information derived from highly perturbed pathways, it is possible to visualize differential prognostic patterns and to perform computational classification of patients with areas under the receiver operating characteristic curve above 0.75. We also demonstrate how integrating the outcomes generated by different pathway-based analysis models may improve ventricular dysfunction prediction performance.
This research offers an alternative, comprehensive view of key relationships and perturbations that may trigger the emergence or prevention of ventricular dysfunction post-MI.
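The area under the ROC curve used to assess such classifiers can be computed from per-patient scores with the Mann-Whitney formulation: the probability that a randomly chosen patient with ventricular dysfunction scores higher than one without, with ties counting one half. The scores below are hypothetical pathway-derived values.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    labels are 1 (e.g. ventricular dysfunction) or 0; tied scores count one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores derived from perturbed-pathway activity.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels))
```

Eight of the nine case-control pairs are ranked correctly here, giving an AUC of 8/9; the pairwise loop is quadratic, so larger cohorts would use a rank-based formula instead.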
Optimal selection of multiple regulatory genes, known as targets, for deletion to enhance or suppress the activities of downstream genes or metabolites is an important problem in genetic engineering. Such problems become more feasible to address in silico due to the availability of more realistic dynamical system models of gene regulatory and metabolic networks. The goal of the computational problem is to search for a subset of genes to knock out so that the activity of a downstream gene or a metabolite is optimized.
Based on discrete dynamical system modeling of gene regulatory networks, an integer programming problem is formulated for the optimal in silico target gene deletion problem. In the first result, the integer programming problem is proved to be NP-hard and equivalent to a nonlinear programming problem. In the second result, a heuristic algorithm, called GKONP, is designed to approximate the optimal solution; it combines an approach for pruning insignificant terms in the objective function with a parallel differential evolution algorithm. In the third result, the effectiveness of the GKONP algorithm is demonstrated by applying it to a discrete dynamical system model of the yeast pheromone pathways. Its empirical accuracy and time efficiency are assessed in comparison with an optimal but exhaustive search strategy.
Although the in silico target gene deletion problem has enormous potential applications in genetic engineering, one must overcome the computational challenge posed by its NP-hardness. The presented solution, which has been demonstrated to approximate the optimal solution in a practical amount of time, is among the few that address this computational challenge. In the experiment on the yeast pheromone pathways, the identified best subset of genes for deletion showed an advantage over genes that were selected empirically. Once validated in vivo, the optimal target genes are expected to achieve higher genetic engineering effectiveness than a trial-and-error procedure.
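The exhaustive baseline against which GKONP is compared can be sketched on a toy discrete-time model. The four-gene linear update rule x(t+1) = Wx(t) + U, the weights, and the objective (target level after three steps) are hypothetical and far simpler than the yeast pheromone model; knocked-out genes are clamped to zero. The number of candidate sets grows combinatorially with the number of genes, which is what motivates a heuristic such as GKONP.

```python
from itertools import combinations

GENES = ["A", "R", "I", "T"]   # hypothetical: activator, repressor, intermediate, target
W = [[0.0, 0.0, 0.0, 0.0],     # A: driven only by its input
     [0.0, 0.0, 0.0, 0.0],     # R: driven only by its input
     [0.0, -0.5, 0.0, 0.0],    # I: repressed by R
     [0.5, -0.8, 0.6, 0.0]]    # T: activated by A and I, repressed by R
U = [1.0, 1.0, 1.0, 0.0]

def simulate(knockouts, steps=3):
    """Iterate x(t+1) = W x(t) + U from x(0) = 0, clamping knocked-out genes to zero."""
    ko = {GENES.index(g) for g in knockouts}
    x = [0.0] * len(GENES)
    for _ in range(steps):
        x = [0.0 if i in ko else sum(w * xj for w, xj in zip(W[i], x)) + U[i]
             for i in range(len(GENES))]
    return x

def exhaustive_best(target="T", max_size=2):
    """Score every knockout set up to max_size; return (best set, target level)."""
    t = GENES.index(target)
    candidates = [g for g in GENES if g != target]
    best = ((), simulate(())[t])
    for size in range(1, max_size + 1):
        for subset in combinations(candidates, size):
            level = simulate(subset)[t]
            if level > best[1]:
                best = (subset, level)
    return best

print(exhaustive_best())
```

In this toy the single deletion of the repressor R beats every double deletion, illustrating that larger knockout sets are not automatically better and that the objective must be searched rather than guessed.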
Systems and synthetic biology use computational models to study the behaviour of biological pathways in silico. Mathematical models make it possible to verify biological hypotheses and to predict new dynamical behaviours. Here we use the tools of nonlinear analysis to understand how to change the dynamics of the genes composing a novel synthetic network recently constructed in the yeast Saccharomyces cerevisiae for In-vivo Reverse-engineering and Modelling Assessment (IRMA). Guided by previous theoretical results linking the dynamics of a biological network to its topological properties, and using simulation and continuation techniques, we found that the network can easily be turned into a robust and tunable synthetic oscillator or a bistable switch. Our results provide guidelines for re-engineering the network in vivo to tune its dynamics.
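The IRMA model itself is not reproduced here. As an illustrative assumption, the sketch below uses a generic two-gene mutual-repression ("toggle switch") model, du/dt = alpha/(1+v^n) - u and dv/dt = alpha/(1+u^n) - v, to show the kind of behaviour simulation reveals: with a Hill coefficient n = 2, two initial conditions settle into two different stable states (bistability), while with n = 1 both converge to a single steady state. The parameter values and function names are hypothetical choices, integrated with forward Euler.

```python
from typing import Tuple


def toggle_rhs(u: float, v: float, alpha: float = 10.0, n: float = 2.0) -> Tuple[float, float]:
    """Mutual repression: du/dt = alpha/(1+v^n) - u, dv/dt = alpha/(1+u^n) - v."""
    return alpha / (1.0 + v ** n) - u, alpha / (1.0 + u ** n) - v


def integrate(u0: float, v0: float, dt: float = 0.01, t_end: float = 50.0,
              alpha: float = 10.0, n: float = 2.0) -> Tuple[float, float]:
    """Forward-Euler integration until (approximately) steady state."""
    u, v = u0, v0
    for _ in range(int(round(t_end / dt))):
        du, dv = toggle_rhs(u, v, alpha, n)
        u, v = u + dt * du, v + dt * dv
    return u, v


# n = 2: two initial conditions reach two different stable states (bistable switch).
print(integrate(5.0, 0.0))  # gene u ends high, gene v low
print(integrate(0.0, 5.0))  # gene v ends high, gene u low
# n = 1: both initial conditions converge to the same single steady state.
print(integrate(5.0, 0.0, n=1.0), integrate(0.0, 5.0, n=1.0))
```

Changing a single kinetic parameter (here the cooperativity n) switches the same topology between monostable and bistable regimes; continuation methods trace exactly such transitions systematically rather than by trial simulation.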
The advent of various high-throughput experimental techniques for measuring molecular interactions has enabled the systematic study of biological interactions on a global scale. Since biological processes are carried out by elaborate collaborations of numerous molecules that give rise to a complex network of molecular interactions, comparative analysis of these biological networks can bring important insights into the functional organization and regulatory mechanisms of biological systems.
In this paper, we present an effective framework for identifying common interaction patterns in the biological networks of different organisms based on hidden Markov models (HMMs). Given two or more networks, our method efficiently finds the top matching paths in the respective networks, where the matching paths may contain a flexible number of consecutive insertions and deletions.
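The full HMM framework (multiple networks, flexible insertions and deletions, top-k paths) is not reproduced here. As an assumption, the sketch below shows only the core idea of treating the nodes of one network as HMM hidden states and decoding a query path from another network with Viterbi-style dynamic programming, which is why the cost grows linearly with the length of the aligned path. The function name, the graph, and the similarity scores are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Graph = Dict[str, List[str]]  # undirected adjacency lists; every node is a key


def best_matching_path(query: Sequence[str], graph: Graph,
                       sim: Dict[Tuple[str, str], float]) -> Tuple[float, List[str]]:
    """Viterbi-style decoding of a query path q_1..q_L against a target network.

    Hidden states are target-network nodes, transitions are allowed only along
    edges, and the emission score of aligning q_i to node v is sim[(q_i, v)]
    (default 0.0). Runtime is O(L * (|V| + |E|)), linear in the path length L.
    Simplifications: no insertions/deletions, only the single best match, and
    node revisits are not forbidden (the result may be a walk, not a simple path).
    """
    nodes = list(graph)
    score = {v: sim.get((query[0], v), 0.0) for v in nodes}
    back: List[Dict[str, str]] = []  # back[i][v] = predecessor of v at step i + 1
    for q in query[1:]:
        new_score = {v: float("-inf") for v in nodes}
        ptr: Dict[str, str] = {}
        for u in nodes:
            for v in graph[u]:
                cand = score[u] + sim.get((q, v), 0.0)
                if cand > new_score[v]:
                    new_score[v], ptr[v] = cand, u
        score = new_score
        back.append(ptr)
    end = max(nodes, key=score.__getitem__)
    if score[end] == float("-inf"):
        raise ValueError("no walk of the query length exists in the graph")
    path = [end]
    for ptr in reversed(back):  # trace predecessors back to the first step
        path.append(ptr[path[-1]])
    path.reverse()
    return score[end], path


# Hypothetical example: proteins A1-B1-C1 form a path in organism 1.
G = {"a": ["b", "x"], "b": ["a", "c"], "c": ["b", "x"], "x": ["a", "c"]}
SIM = {("A1", "a"): 1.0, ("B1", "b"): 1.0, ("C1", "c"): 1.0, ("B1", "x"): 0.4}
print(best_matching_path(["A1", "B1", "C1"], G, SIM))  # (3.0, ['a', 'b', 'c'])
```

The alternative route a-x-c scores only 2.4 because x is a weaker match for B1, so the decoder prefers the conserved path a-b-c.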
Based on several protein-protein interaction (PPI) networks obtained from the Database of Interacting Proteins (DIP) and other public databases, we demonstrate that our method is able to detect biologically significant pathways that are conserved across different organisms. Our algorithm has a polynomial complexity that grows linearly with the size of the aligned paths. This enables the search for very long paths with more than 10 nodes within a few minutes on a desktop computer. The software program that implements this algorithm is available upon request from the authors.
MicroRNAs play a critical role in many essential cellular functions in mammals. However, little is known about the regulation of miRNA gene transcription. Microarray profiling and real-time PCR analysis revealed a marked down-regulation of miR-206 in mice lacking the nuclear receptor SHP (SHP−/− mice). To understand how SHP regulates miR-206 gene expression, we determined the putative transcriptional initiation site of miR-206 and its full-length primary transcript using a database mining approach and RACE. We identified transcription factor AP1 binding sites on the miR-206 promoter and further showed that AP1 (c-Jun and c-Fos) induced miR-206 promoter transactivation and expression, an effect that was repressed by YY1. ChIP analysis confirmed the physical association of AP1 (c-Jun) and YY1 with the endogenous miR-206 promoter. In addition, we identified a binding site for the nuclear receptor ERRγ (NR3B3) on the YY1 promoter and showed that the YY1 promoter was transactivated by ERRγ, which was inhibited by SHP (NR0B2). ChIP analysis confirmed ERRγ binding to the YY1 promoter. Forced expression of SHP and AP1 induced miR-206 expression, whereas overexpression of ERRγ and YY1 reduced it. The effects of AP1, ERRγ, and YY1 on miR-206 expression were reversed by siRNA knockdown of each gene. Thus, we propose a novel cascade “dual inhibitory” mechanism governing miR-206 gene transcription by SHP: SHP inhibition of ERRγ decreases YY1 expression, relieving YY1 repression of AP1 activity and ultimately activating miR-206. This is the first report to elucidate a cascade regulatory mechanism governing miRNA gene transcription.
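The "dual inhibitory" logic can be checked with a hypothetical Boolean reduction of the cascade described above: SHP inhibits ERRγ, ERRγ activates YY1, and YY1 represses AP1-driven miR-206 expression. This sketch is an assumption made for illustration, not a model from the study; it treats every interaction as all-or-none and ignores dose, timing, and other regulators. It shows how two inhibitions in series compose into net activation of miR-206 by SHP.

```python
from typing import Dict


def mir206_cascade(shp: bool, ap1: bool = True) -> Dict[str, bool]:
    """Hypothetical Boolean abstraction: SHP -| ERRgamma -> YY1 -| (AP1 -> miR-206)."""
    errg = not shp            # SHP inhibits ERRgamma
    yy1 = errg                # ERRgamma transactivates the YY1 promoter
    mir206 = ap1 and not yy1  # AP1 activates miR-206; YY1 represses this activation
    return {"ERRgamma": errg, "YY1": yy1, "miR-206": mir206}


for shp in (False, True):
    print("SHP on" if shp else "SHP off", mir206_cascade(shp))
```

With SHP off, ERRγ and YY1 are on and miR-206 is off; with SHP on, both inhibitory steps are lifted and miR-206 is on, provided AP1 is present.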
MicroRNAs are small, highly conserved non-coding RNAs that play an important role in regulating gene expression by binding the 3'UTR of target mRNAs. The majority of microRNAs are located within other transcriptional units (host genes) and are co-expressed with them, which strongly suggests that microRNAs and their host genes share the same promoter and other expression control elements. The remaining microRNAs are intergenic and have their own independent regulatory regions. A number of databases have already been developed to collect information about microRNAs, but none of them allows easy exploration of microRNA genomic organization across evolution.
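The host-gene versus intergenic distinction above comes down to an interval-containment test on genomic coordinates. The sketch below is a hypothetical illustration, not CoGemiR's pipeline: it assumes 1-based inclusive coordinates, and it assumes that only a same-strand gene fully containing the microRNA counts as a host gene (an antisense overlap is reported as intergenic under this assumption). Gene and microRNA names and coordinates are made up.

```python
from typing import Dict, List, NamedTuple, Optional


class Feature(NamedTuple):
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"


def host_gene(mirna: Feature, genes: List[Feature]) -> Optional[str]:
    """Return the first same-strand gene fully containing the miRNA, else None."""
    for g in genes:
        if (g.chrom == mirna.chrom and g.strand == mirna.strand
                and g.start <= mirna.start and mirna.end <= g.end):
            return g.name
    return None


def classify(mirnas: List[Feature], genes: List[Feature]) -> Dict[str, str]:
    """Label each miRNA as intragenic (with its host gene) or intergenic."""
    out = {}
    for m in mirnas:
        h = host_gene(m, genes)
        out[m.name] = f"intragenic (host: {h})" if h else "intergenic"
    return out


# Hypothetical coordinates for illustration only.
GENES = [Feature("GENE_A", "chr1", 1000, 5000, "+")]
MIRNAS = [
    Feature("mir-x", "chr1", 2000, 2080, "+"),  # inside GENE_A, same strand
    Feature("mir-y", "chr1", 8000, 8080, "+"),  # outside any gene
    Feature("mir-z", "chr1", 2500, 2580, "-"),  # inside GENE_A, opposite strand
]
print(classify(MIRNAS, GENES))
```

Comparing such labels for orthologous microRNAs in different species is one simple way to ask whether a microRNA's genomic context is conserved.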
CoGemiR is a publicly available microRNA-centered database that offers an overview of the genomic organization of microRNAs and the extent of its conservation during evolution in different metazoan species. The database collects information on the genomic location, conservation and expression of both known and newly predicted microRNAs and displays the data from a comparative point of view. It also includes a microRNA prediction pipeline to annotate microRNAs in recently sequenced genomes. This information is easily accessible via the web through a user-friendly query page. The CoGemiR database is available at
Knowledge of the genomic organization of microRNAs can provide useful information for understanding their biology. To provide a comparative genomics overview of microRNA genomic organization, we developed CoGemiR. To achieve this goal, we collected and integrated data from pre-existing databases and also generated new data, such as the identification of a number of previously unannotated microRNAs in several species. To make these data easier to use, we developed a user-friendly web interface that shows how the genomic context of a microRNA compares across species.
Correction to: Molecular Systems Biology 3:78. doi:10.1038/msb4100120; Published online 13 February 2007