Sequence-based variation in gene expression is a key driver of disease risk. Common variants regulating expression in cis have been mapped in many eQTL studies typically in single tissues from unrelated individuals. Here, we present a comprehensive analysis of gene expression across multiple tissues conducted in a large set of mono- and dizygotic twins that allows systematic dissection of genetic (cis and trans) and non-genetic effects on gene expression. Using identity-by-descent estimates, we show that at least 40% of the total heritable cis-effect on expression cannot be accounted for by common cis-variants, a finding which exposes the contribution of low frequency and rare regulatory variants with respect to both transcriptional regulation and complex trait susceptibility. We show that a substantial proportion of gene expression heritability is trans to the structural gene and identify several replicating trans-variants which act predominantly in a tissue-restricted manner and may regulate the transcription of many genes.
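As a rough illustration of how a twin design separates genetic from non-genetic effects, the sketch below applies the classical Falconer decomposition to correlations within monozygotic (MZ) and dizygotic (DZ) twin pairs. This is a swapped-in textbook technique, not the identity-by-descent analysis described above, and the function name and input correlations are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Return (A, C, E) variance fractions from twin-pair correlations.

    Classical Falconer estimates; assumes additive genetic effects and
    equal shared environments for MZ and DZ pairs.
    """
    a = 2.0 * (r_mz - r_dz)   # additive genetic share (narrow-sense heritability)
    c = 2.0 * r_dz - r_mz     # shared-environment share
    e = 1.0 - r_mz            # unique environment plus measurement error
    return a, c, e


# Hypothetical within-pair correlations of one gene's expression level.
a, c, e = falconer_ace(r_mz=0.6, r_dz=0.4)
print(f"A={a:.2f} C={c:.2f} E={e:.2f}")
```

The three fractions always sum to one, so a larger gap between the MZ and DZ correlations translates directly into a larger heritable share.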
The identification of virulence genes in plant pathogenic fungi is important for understanding the infection process and host range, and for developing control strategies. The analysis of already verified virulence genes in phytopathogenic fungi in the context of integrated functional networks can give clues about the underlying mechanisms and pathways directly or indirectly linked to fungal pathogenicity and can suggest new candidates for further experimental investigation, using a ‘guilt by association’ approach. Here we study 133 genes in the globally important Ascomycete fungus Fusarium graminearum that have been experimentally tested for their involvement in virulence. An integrated network that combines information from gene co-expression, predicted protein-protein interactions and sequence similarity was employed and, using 100 genes known to be required for virulence, we found a total of 215 new proteins potentially associated with virulence, of which 29 are annotated as hypothetical proteins. The majority of these potential virulence genes are located in chromosomal regions known to have a low recombination frequency. We have also explored the taxonomic diversity of these candidates and found 25 sequences that are likely to be fungal-specific. We discuss the biological relevance of a few of the potentially novel virulence-associated genes in detail.
Psoriasis is an immune-mediated disease characterised by chronically elevated pro-inflammatory cytokine levels, leading to aberrant keratinocyte proliferation and differentiation. Although certain clinical phenotypes, such as plaque psoriasis, are well defined, it is currently unclear whether there are molecular subtypes that might impact on prognosis or treatment outcomes.
We present a pipeline for patient stratification through a comprehensive analysis of gene expression in paired lesional and non-lesional psoriatic tissue samples, compared with controls, to establish differences in RNA expression patterns across all tissue types. Ensembles of decision tree predictors were employed to cluster psoriatic samples on the basis of gene expression patterns and reveal gene expression signatures that best discriminate molecular disease subtypes. This multi-stage procedure was applied to several published psoriasis studies and a comparison of gene expression patterns across datasets was performed.
Overall, classification of psoriasis gene expression patterns revealed distinct molecular sub-groups within the clinical phenotype of plaque psoriasis. Enrichment for TGFβ and ErbB signaling pathways, noted in one of the two psoriasis subgroups, suggested that this group may be more amenable to therapies targeting these pathways. Our study highlights the potential biological relevance of using ensemble decision tree predictors to determine molecular disease subtypes in what may initially appear to be a homogeneous clinical group. The R code used in this paper is available upon request.
Disease classification; Molecular grouping; Psoriasis; Decision tree prediction model
Hepatocellular carcinoma (HCC) is a leading cause of global cancer mortality. However, little is known about the precise molecular mechanisms involved in tumor formation and pathogenesis. The primary goal of this study was to elucidate genome-wide molecular networks involved in development of HCC with multiple etiologies by exploring high quality microarray data. We undertook a comparative network analysis across 264 human microarray profiles monitoring transcript changes in healthy liver, liver cirrhosis, and HCC with viral and alcoholic etiologies. Gene co-expression profiling was used to derive a consensus gene relevance network of HCC progression that consisted of 798 genes and 2,012 links. The HCC interactome was further confirmed to be phenotype-specific and non-random. Additionally, we confirmed that co-expressed genes are more likely to share biological function, but not sub-cellular localization. Analysis of individual HCC genes revealed that they are topologically central in a human protein-protein interaction network. We used quantitative RT-PCR in a cohort of normal liver tissue (n = 8), hepatitis C virus (HCV)-induced chronic liver disease (n = 9), and HCC (n = 7) to validate co-expressions of several well-connected genes, namely ASPM, CDKN3, NEK2, RACGAP1, and TOP2A. We show that HCC is a heterogeneous disorder, underpinned by complex cross talk between immune response, cell cycle, and mRNA translation pathways. Our work provides a systems-wide resource for deeper understanding of molecular mechanisms in HCC progression and may be used further to define novel targets for efficient treatment or diagnosis of this disease.
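The idea of deriving a network from co-expression can be sketched as follows: compute a correlation between every pair of gene expression profiles and keep the pairs above a cutoff as links. This is a minimal illustration that assumes plain Pearson correlation with a fixed threshold; it is not the consensus relevance-network procedure used in the study, and the gene names, values, and threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / sqrt(sxx * syy)


def coexpression_network(profiles, threshold=0.9):
    """Return (gene1, gene2, r) links whose |r| meets the threshold."""
    links = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            links.append((g1, g2, r))
    return links


# Hypothetical expression values across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
for g1, g2, r in coexpression_network(profiles):
    print(g1, g2, round(r, 3))
```

In practice the cutoff would be chosen against a null distribution rather than fixed in advance, which is one way to check that the resulting network is non-random.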
Cellular constituents such as proteins, DNA, and RNA form a complex web of interactions that regulate biochemical homeostasis and determine the dynamic cellular response to external stimuli. It follows that detailed understanding of these patterns is critical for the assessment of fundamental processes in cell biology and pathology. Representation and analysis of cellular constituents through network principles is a promising and popular analytical avenue towards a deeper understanding of molecular mechanisms in a system-wide context.
We present Functional Genomics Assistant (FUGA) - an extensible and portable MATLAB toolbox for the inference of biological relationships, graph topology analysis, random network simulation, network clustering, and functional enrichment statistics. In contrast to conventional differential expression analysis of individual genes, FUGA offers a framework for the study of system-wide properties of biological networks and highlights putative molecular targets using concepts of systems biology.
FUGA offers a simple and customizable framework for network analysis in a variety of systems biology applications. It is freely available for individual or academic use at http://code.google.com/p/fuga.
A transgenic mouse model for conditional induction of long-term hibernation via myocardium-specific expression of a VEGF-sequestering soluble receptor allowed the dissection of the hibernation process into an initiation and a maintenance phase. The hypoxic initiation phase was characterized by peak levels of K(ATP) channel and glucose transporter 1 (GLUT1) expression. Glibenclamide, an inhibitor of K(ATP) channels, blocked GLUT1 induction. In the maintenance phase, tissue hypoxia and GLUT1 expression were reduced. Thus, we employed a combined “-omics” approach to resolve this cardioprotective adaptation process. Unguided bioinformatics analysis on the transcriptomic, proteomic and metabolomic datasets confirmed that anaerobic glycolysis was affected and that the observed enzymatic changes in cardiac metabolism were directly linked to hypoxia-inducible factor (HIF)-1 activation. Although metabolite concentrations were kept relatively constant, the combination of the proteomic and transcriptomic dataset improved the statistical confidence of the pathway analysis by 2 orders of magnitude. Importantly, proteomics revealed a reduced phosphorylation state of myosin light chain 2 and cardiac troponin I within the contractile apparatus of hibernating hearts in the absence of changes in protein abundance. Our study demonstrates how combining different “-omics” datasets aids in the identification of key biological pathways: chronic hypoxia resulted in a pronounced adaptive response at the transcript and the protein level to keep metabolite levels steady. This preservation of metabolic homeostasis is likely to contribute to the long-term survival of the hibernating myocardium.
► The hibernation process was dissected into an initiation and a maintenance phase. ► Glibenclamide, an inhibitor of K(ATP) channels, blocked GLUT1 induction. ► The maintenance phase was characterized by attenuated tissue hypoxia. ► Phosphorylation of myosin light chain 2 and cardiac troponin I was reduced. ► Combining proteomics and transcriptomics improved the bioinformatic pathway analysis.
DIGE, difference in-gel electrophoresis; 2-DE, two-dimensional gel electrophoresis; 1H-NMR, proton nuclear magnetic resonance spectroscopy; LC-MS/MS, liquid chromatography tandem mass spectrometry; Hibernation; Hypoxia; Metabolomics; Myocardium; Proteomics
Cellular ATP levels are generated by glucose-stimulated mitochondrial metabolism and determine metabolic responses, such as glucose-stimulated insulin secretion (GSIS) from the β-cells of pancreatic islets. We describe an analysis of the evolutionary processes affecting the core enzymes involved in glucose-stimulated insulin secretion in mammals. The proteins involved in this system belong to ancient enzymatic pathways: glycolysis, the TCA cycle and oxidative phosphorylation.
We identify two sets of proteins, or protein coalitions, in this group of 77 enzymes with distinct evolutionary patterns. Members of the glycolysis, TCA cycle, metabolite transport, pyruvate and NADH shuttles have low rates of protein sequence evolution, as inferred from a human-mouse comparison, and relatively high rates of evolutionary gene duplication. Respiratory chain and glutathione pathway proteins evolve faster, exhibiting lower rates of gene duplication. A small number of proteins in the system evolve significantly faster than co-pathway members and may serve as rapidly evolving adapters, linking groups of co-evolving genes.
Our results provide insights into the evolution of the proteins involved in GSIS. We find evidence for two protein coalitions and identify a role for co-adaptation in protein evolution, which future research could examine within a functional context.
Inflammation is characterized by altered cytokine levels produced by cell populations in a highly interdependent manner. To elucidate the mechanism of an inflammatory reaction, we have developed a mathematical model for immune cell interactions via the specific, dose-dependent cytokine production rates of cell populations. The model describes the criteria required for normal and pathological immune system responses and suggests that alterations in the cytokine production rates can lead to various stable levels which manifest themselves in different disease phenotypes. The model predicts that pairs of interacting immune cell populations can maintain homeostatic and elevated extracellular cytokine concentration levels, enabling them to operate as an immune system switch. The concept described here is developed in the context of psoriasis, an immune-mediated disease, but it can also offer mechanistic insights into other inflammatory pathologies as it explains how interactions between immune cell populations can lead to disease phenotypes.
A functional immune system requires complex interactions among diverse cell types, mediated by a variety of cytokines. These interactions include phenomena such as positive and negative feedback loops that can be experimentally characterized by dose-dependent cytokine production measurements. However, any experimental approach is not only limited with regard to the number of cell-cell interactions that can be studied at a given time, but also does not have the capacity to assess or predict the overall immune response which is the result of complex interdependent immune cell interactions. Therefore, experimental data need to be viewed from a theoretical perspective allowing the quantitative modeling of immune cell interactions. Here, we propose a strategy for a quantitative description of multiple interactions between immune cell populations based on their cytokine production profiles. The model predicts that the modified feedback loop interactions can result in the appearance of alternative steady-states causing the switch-like immune system effect that is experimentally observed in pathologic phenotypes. Overall, the quantitative description of immune cell interactions via cytokine signaling reported here offers new insights into understanding and predicting normal and pathological immune system responses.
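The switch-like behaviour described above can be illustrated with a toy system of two cell populations, each producing a cytokine at a rate that depends on the other population's cytokine. This is a minimal sketch under stated assumptions, not the published model: the Hill-type dose-response curve, the symmetric parameters, and all numerical values are hypothetical and chosen only so that two stable levels exist.

```python
def hill(c, k=0.5, n=4):
    """Sigmoidal dose-dependent production rate (assumed functional form)."""
    return c**n / (k**n + c**n)


def simulate(c1, c2, basal=0.05, vmax=1.0, decay=1.0, dt=0.01, steps=5000):
    """Forward-Euler integration of two mutually stimulating populations.

    c1 is the cytokine made by population 1 in response to c2, and vice versa.
    Returns the cytokine levels after steps * dt time units.
    """
    for _ in range(steps):
        dc1 = basal + vmax * hill(c2) - decay * c1
        dc2 = basal + vmax * hill(c1) - decay * c2
        c1, c2 = c1 + dt * dc1, c2 + dt * dc2
    return c1, c2


# Same parameters, different starting concentrations.
low = simulate(0.1, 0.1)    # settles at the low, homeostatic level
high = simulate(0.6, 0.6)   # settles at the elevated level
print("low state:", low)
print("high state:", high)
```

With identical production rules, the final state depends only on where the system starts, which is the sense in which a pair of interacting populations can act as a switch between a homeostatic and an elevated cytokine level.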
The detection of modules or community structure is widely used to reveal the underlying properties of complex networks in biology, as well as physical and social sciences. Since the adoption of modularity as a measure of network topological properties, several methodologies for the discovery of community structure based on modularity maximisation have been developed. However, satisfactory partitions of large graphs with modest computational resources are particularly challenging due to the NP-hard nature of the related optimisation problem. Furthermore, it has been suggested that optimising the modularity metric can reach a resolution limit whereby the algorithm fails to detect smaller communities than a specific size in large networks.
We present a novel solution approach to identify community structure in large complex networks and address resolution limitations in module detection. The proposed algorithm employs modularity to express network community structure and is based on mixed integer optimisation models. The solution procedure is extended with an iterative step that diminishes the tendency to agglomerate smaller modules (resolution limitations).
A comprehensive comparative analysis of methodologies for module detection based on modularity maximisation shows that our approach outperforms previously reported methods. Furthermore, in contrast to previous reports, we propose a strategy to handle resolution limitations in modularity maximisation. Overall, we illustrate ways to improve existing methodologies for community structure identification so as to increase their efficiency and applicability.
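For readers unfamiliar with the objective being maximised, the sketch below evaluates the standard Newman-Girvan modularity Q of a given partition, Q = sum over communities of (L_c / m) - (D_c / 2m)^2, where m is the number of edges, L_c the number of edges inside community c, and D_c the total degree of its nodes. It only scores a partition; it is not the mixed integer optimisation approach described above, and the example graph is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, assumed to contain no self-loops or duplicates.
    community_of: dict mapping each node to its community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # D_c: total degree of nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {node: "a" for node in range(6)}
print(modularity(edges, two_modules))
print(modularity(edges, one_module))
```

Splitting the graph at the bridge scores higher than lumping all nodes together, which is the behaviour a modularity-maximising method exploits; the resolution limit arises because, in large networks, merging small but well-separated modules can still increase Q.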
Genome-wide expression patterns in physiological cardiac hypertrophy. Co-expression patterns in physiological cardiac hypertrophy
In this study, the first large-scale analysis of publicly available genome-wide expression data of several in vivo murine models of physiological left ventricular hypertrophy (LVH) was carried out using network analysis. On evaluating 3 million gene co-expression patterns across 141 relevant microarray experiments, it was found that physiological adaptation is an evolutionarily conserved process involving preservation of the function of cytochrome c oxidase, induction of autophagy compatible with cell survival, and coordinated regulation of angiogenesis.
This analysis not only identifies known biological pathways involved in physiological LVH, but also offers novel insights into the molecular basis of this phenotype by identifying key networks of co-expressed genes, as well as their topological and functional properties, using relevant high-quality microarray experiments and network inference.
The mechanisms of stress tolerance in sessile animals, such as molluscs, can offer fundamental insights into the adaptation of organisms for a wide range of environmental challenges. One of the best studied processes at the molecular level relevant to stress tolerance is the heat shock response in the genus Mytilus. We focus on the upstream region of Mytilus galloprovincialis Hsp90 genes and their structural and functional associations, using comparative genomics and network inference. Sequence comparison of this region provides novel evidence that the transcription of Hsp90 is regulated via a dense region of transcription factor binding sites, also containing a region with similarity to the Gamera family of LINE-like repetitive sequences and a genus-specific element of unknown function. Furthermore, we infer a set of gene networks from tissue-specific expression data, and specifically extract an Hsp class-associated network, with 174 genes and 2,226 associations, exhibiting a complex pattern of expression across multiple tissue types. Our results (i) suggest that the heat shock response in the genus Mytilus is regulated by an unexpectedly complex upstream region, and (ii) provide new directions for the use of the heat shock process as a biosensor system for environmental monitoring.
Adaptation of sessile animals, such as molluscs, to stress is achieved by a number of molecular mechanisms, few of which are clearly understood. Insights from this research can provide clues about stress tolerance both for sessile and mobile organisms. The Mediterranean mussel, of the genus Mytilus, is a model organism for the study of stress at the molecular level, with sufficient gene structure and function data available. We have thus investigated a key stress response gene, Hsp90, and in particular its upstream region, using a combination of sequence and expression analysis approaches. We demonstrate that this region, responsible for the regulation of heat shock-associated gene expression, exhibits an unparalleled structural and functional complexity compared to other model organisms, as well as subtle gene expression patterns across multiple tissues. These results form the basis upon which the heat shock response can be used as a molecular biosensor for environmental monitoring in the future.
Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact on genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database.
The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples.
Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples.
The BioCyc database collection is a set of 160 pathway/genome databases (PGDBs) for most eukaryotic and prokaryotic species whose genomes have been completely sequenced to date. Each PGDB in the BioCyc collection describes the genome and predicted metabolic network of a single organism, inferred from the MetaCyc database, which is a reference source on metabolic pathways from multiple organisms. In addition, each bacterial PGDB includes predicted operons for the corresponding species. The BioCyc collection provides a unique resource for computational systems biology, namely global and comparative analyses of genomes and metabolic networks, and a supplement to the BioCyc resource of curated PGDBs. The Omics viewer available through the BioCyc website allows scientists to visualize combinations of gene expression, proteomics and metabolomics data on the metabolic maps of these organisms. This paper discusses the computational methodology by which the BioCyc collection has been expanded, and presents an aggregate analysis of the collection that includes the range of number of pathways present in these organisms, and the most frequently observed pathways. We seek scientists to adopt and curate individual PGDBs within the BioCyc collection. Only by harnessing the expertise of many scientists can we hope to produce biological databases that accurately reflect the depth and breadth of knowledge that the biomedical research community is producing.
We present the computational prediction and synthesis of the metabolic pathways in Methanococcus jannaschii from its genomic sequence using the PathoLogic software. Metabolic reconstruction is based on a reference knowledge base of metabolic pathways and is performed with minimal manual intervention. We predict the existence of 609 metabolic reactions that are assembled in 113 metabolic pathways and an additional 17 super-pathways consisting of one or more component pathways. These assignments represent significantly improved enzyme and pathway predictions compared with previous metabolic reconstructions, and some key metabolic reactions, previously missing, have been identified. Our results, in the form of enzymatic assignments and metabolic pathway predictions, form a database (MJCyc) that is accessible over the World Wide Web for further dissemination among members of the scientific community.
metabolic databases; Methanocaldococcus; pathway synthesis
By the end of 2002, we witnessed the landmark submission of the 100th complete genome sequence in the databases. An overview of these genomes reveals certain interesting trends and provides valuable insights into possible future developments.
To assess how automatic function assignment will contribute to genome annotation in the next five years, we have performed an analysis of 31 available genome sequences. An emerging pattern is that function can be predicted for almost two-thirds of the 73,500 genes that were analyzed. Despite progress in computational biology, there will always be a great need for large-scale experimental determination of protein function.