Phenotypically similar diseases have been found to be caused by functionally related genes, suggesting a modular organization of the genetic landscape of human diseases that mirrors the modularity observed in biological interaction networks. Protein complexes, as molecular machines that integrate multiple gene products to perform biological functions, express the underlying modular organization of protein-protein interaction networks. As such, protein complexes can be useful for interrogating the networks of phenome and interactome to elucidate gene-phenotype associations of diseases.
We proposed a technique called RWPCN (Random Walker on Protein Complex Network) for predicting and prioritizing disease genes. The basis of RWPCN is a protein complex network constructed from existing human protein complexes and the protein interaction network. To prioritize candidate disease genes for the query disease phenotypes, we compute the associations between the protein complexes and the query phenotypes in their respective protein complex and phenotype networks. We tested RWPCN on predicting gene-phenotype associations using leave-one-out cross-validation; our method was observed to outperform existing approaches. We also applied RWPCN to predict novel disease genes for two representative diseases, namely, Breast Cancer and Diabetes.
Guilt-by-association prediction and prioritization of disease genes can be enhanced by fully exploiting the underlying modular organizations of both the disease phenome and the protein interactome. Our RWPCN uses a novel protein complex network as a basis for interrogating the human phenome-interactome network. As the protein complex network can capture the underlying modularity in the biological interaction networks better than simple protein interaction networks, RWPCN was found to be able to detect and prioritize disease genes better than traditional approaches that used only protein-phenotype associations.
Protein-protein interaction (PPI) networks enable us to better understand the functional organization of the proteome. We can learn a lot about a particular protein by querying its neighborhood in a PPI network to find proteins with similar function. A spectral approach that considers random walks between nodes of interest is particularly useful in evaluating closeness in PPI networks. Spectral measures of closeness are more robust to noise in the data and are more precise than simpler methods based on edge density and shortest path length.
We develop a novel affinity measure for pairs of proteins in PPI networks, which uses personalized PageRank, a random walk based method used in context-sensitive search on the Web. Our measure of closeness, which we call PageRank Affinity, is proportional to the number of times the smaller-degree protein is visited in a random walk that restarts at the larger-degree protein. Because PageRank considers paths of all lengths in a network, PageRank Affinity is a precise measure that is robust to noise in the data. PageRank Affinity is also provably related to cluster co-membership, making it a meaningful measure. In our experiments on protein networks we find that our measure is better at predicting co-complex membership and finding functionally related proteins than other commonly used measures of closeness. Moreover, our experiments indicate that PageRank Affinity is very resilient to noise in the network. In addition, based on our method we build a tool that quickly finds the nodes closest to a queried protein in any protein network, and easily scales to much larger biological networks.
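The random walk described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the restart probability of 0.15, the fixed iteration count and the toy network are all assumptions, and the score returned is the raw visit probability, to which the text says PageRank Affinity is proportional (any further normalization is not specified here).

```python
def personalized_pagerank(adj, source, restart=0.15, iters=200):
    """Visit probabilities of a random walk on `adj` that restarts at `source`."""
    prob = {node: 0.0 for node in adj}
    prob[source] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in adj}
        for u, neighbours in adj.items():
            walk_mass = (1.0 - restart) * prob[u]
            if neighbours:
                for v in neighbours:
                    nxt[v] += walk_mass / len(neighbours)
            else:
                nxt[source] += walk_mass  # isolated node: jump back to the source
        nxt[source] += restart  # restart mass; probabilities keep summing to 1
        prob = nxt
    return prob


def pagerank_affinity(adj, a, b, restart=0.15):
    """Visit probability of the smaller-degree protein when the walk
    restarts at the larger-degree protein (unnormalized sketch)."""
    hi, lo = (a, b) if len(adj[a]) >= len(adj[b]) else (b, a)
    return personalized_pagerank(adj, hi, restart)[lo]


# Hypothetical toy network: two triangles (A-B-C and D-E-F) joined by the edge C-D.
ppi = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
same_cluster = pagerank_affinity(ppi, "A", "B")
across_clusters = pagerank_affinity(ppi, "A", "E")
```

In this toy network the pair inside one triangle scores higher than the pair spanning both triangles, which is the behaviour one would expect from a measure related to cluster co-membership.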
We define a meaningful way to assess the closeness of two proteins in a PPI network, and show that our closeness measure is more biologically significant than other commonly used methods. We also develop a tool, accessible at http://xialab.bu.edu/resources/pnns, that allows the user to quickly find nodes closest to a queried vertex in any protein network available from BioGRID or specified by the user.
Most protein PageRank studies do not use signal flow direction information in protein interactions because this information was not readily available in large protein databases until recently. Therefore, four questions have yet to be answered: A) What is the general difference between signal emitting and receiving in a protein interactome? B) Which proteins are among the top ranked in directional ranking? C) Are high ranked proteins more evolutionarily conserved than low ranked ones? D) Do proteins with similar ranking tend to have similar subcellular locations? In this study, we address these questions using the forward, reverse, and non-directional PageRank approaches to rank an information-directional network of human proteins and study their evolutionary conservation. The forward ranking gives credit to information receivers, reverse ranking to information emitters, and non-directional ranking mainly to the number of interactions. The protein lists generated by the forward and non-directional rankings are highly correlated, but those generated by the reverse and non-directional rankings are not. The results suggest that signaling in the human protein interactome is characterized by a few key emitters and relatively evenly distributed receivers. Signaling pathway proteins are frequent among the top ranked proteins. Eight proteins are both top information emitters and top information receivers. Top ranked proteins, except a few species-related novel-function ones, are evolutionarily well conserved. Protein-subunit ranking position reflects subunit function. These results demonstrate the usefulness of different PageRank approaches in characterizing protein networks and provide insights into protein interaction in the cell.
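The three rankings can be illustrated on a toy directed network. The sketch below is an assumption-laden example rather than the study's pipeline: the damping factor of 0.85, the uniform teleportation, the handling of nodes without outgoing edges and the five-protein cascade are all hypothetical choices made for illustration.

```python
def pagerank(nodes, edges, damping=0.85, iters=200):
    """Plain PageRank by power iteration; edges are (source, target) pairs."""
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes with no outgoing edge is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            for v in out[u]:
                nxt[v] += damping * rank[u] / len(out[u])
        rank = nxt
    return rank


def rank_three_ways(nodes, edges):
    """Forward, reverse and non-directional PageRank of one directed network."""
    flipped = [(dst, src) for src, dst in edges]
    both = sorted(set(edges) | set(flipped))
    return {
        "forward": pagerank(nodes, edges),        # rewards signal receivers
        "reverse": pagerank(nodes, flipped),      # rewards signal emitters
        "non_directional": pagerank(nodes, both), # driven mainly by degree
    }


def top(rank):
    return max(rank, key=rank.get)


# Hypothetical cascade: ligand -> receptor -> two kinases -> transcription factor.
proteins = ["ligand", "receptor", "kinase1", "kinase2", "tf"]
signals = [("ligand", "receptor"), ("receptor", "kinase1"),
           ("receptor", "kinase2"), ("kinase1", "tf"), ("kinase2", "tf")]
rankings = rank_three_ways(proteins, signals)
```

On this cascade the forward ranking puts the final receiver (`tf`) first, the reverse ranking puts the initial emitter (`ligand`) first, and the non-directional ranking puts the best-connected protein (`receptor`) first.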
Genome-wide association studies (GWAS) are a valuable approach to understanding the genetic basis of complex traits. One of the challenges of GWAS is the translation of genetic association results into biological hypotheses suitable for further investigation in the laboratory. To address this challenge, we introduce Network Interface Miner for Multigenic Interactions (NIMMI), a network-based method that combines GWAS data with human protein-protein interaction data (PPI). NIMMI builds biological networks weighted by connectivity, which is estimated by use of a modification of the Google PageRank algorithm. These weights are then combined with genetic association p-values derived from GWAS, producing what we call ‘trait prioritized sub-networks.’ As a proof of principle, NIMMI was tested on three GWAS datasets previously analyzed for height, a classical polygenic trait. Despite differences in sample size and ancestry, NIMMI captured 95% of the known height associated genes within the top 20% of ranked sub-networks, far better than what could be achieved by a single-locus approach. The top 2% of NIMMI height-prioritized sub-networks were significantly enriched for genes involved in transcription, signal transduction, transport, and gene expression, as well as nucleic acid, phosphate, protein, and zinc metabolism. All of these sub-networks were ranked near the top across all three height GWAS datasets we tested. We also tested NIMMI on a categorical phenotype, Crohn’s disease. NIMMI prioritized sub-networks involved in B- and T-cell receptor, chemokine, interleukin, and other pathways consistent with the known autoimmune nature of Crohn’s disease. NIMMI is a simple, user-friendly, open-source software tool that efficiently combines genetic association data with biological networks, translating GWAS findings into biological hypotheses.
Co-expression based Cancer Modules (CMs) are sets of genes that act in concert to carry out specific functions in different cancer types, and are constructed by exploiting gene expression profiles related to specific clinical conditions or expression signatures associated to specific processes altered in cancer. Unfortunately, genes involved in cancer are not always detectable using only expression signatures or co-expressed sets of genes, and in principle other types of functional interactions should be exploited to obtain a comprehensive picture of the molecular mechanisms underlying the onset and progression of cancer.
We propose a novel semi-supervised method to rank genes with respect to CMs using networks constructed from different sources of functional information, not limited to gene expression data. It exploits on the one hand local learning strategies through score functions that extend the guilt-by-association approach, and on the other hand global learning strategies through graph kernels embedded in the score functions, able to take into account the overall topology of the network. The proposed kernelized score functions compare favorably with other state-of-the-art semi-supervised machine learning methods for gene ranking in biological networks and scale well with the number of genes, thus allowing fast processing of very large gene networks.
The modular nature of kernelized score functions provides an algorithmic scheme from which different gene ranking algorithms can be derived, and the results show that using integrated functional networks we can successfully predict CMs defined mainly through expression signatures obtained from gene expression data profiling. A preliminary analysis of top ranked "false positive" genes shows that our approach could in the future be applied to discover novel genes involved in the onset and progression of tumors related to specific CMs.
Systematic RNA interference perturbations within ovarian cancer cells reveal a hierarchically organized transcription factor network downstream of the oncogenic RAS pathway. Modules within the network are shown to control distinct aspects of cell growth and migration.
Cellular transformation by KRAS oncogenes results in the upregulation of a multitude of transcription factors and a general deregulation of the transcriptome. To exploit the network organization of selected transcriptional regulators responding to chronic RAS pathway activation, we used an integrated strategy combining experimental perturbation of transcription factor and signalling kinase expression with a reverse-engineering approach based on modular response analysis (MRA). The network shows strong modularity, high connectivity and hierarchical organization. The network hierarchy is reflected in distinct phenotypic consequences of perturbation within modules that separately control cellular proliferation and anchorage independence.
RAS mutations are highly relevant for progression and therapy response of human tumours, but the genetic network that ultimately executes the oncogenic effects is poorly understood. Here, we used a reverse-engineering approach in an ovarian cancer model to reconstruct KRAS oncogene-dependent cytoplasmic and transcriptional networks from perturbation experiments based on gene silencing and pathway inhibitor treatments. We measured mRNA and protein levels in manipulated cells by microarray, RT–PCR and western blot analysis, respectively. The reconstructed model revealed complex interactions among the transcriptional and cytoplasmic components, some of which were confirmed by double perturbation experiments. Interestingly, the transcription factors decomposed into two hierarchically arranged groups. To validate the model predictions, we analysed growth parameters and transcriptional deregulation in the KRAS-transformed epithelial cells. As predicted by the model, we found two functional groups among the selected transcription factors. The experiments thus confirmed the predicted hierarchical transcription factor regulation and showed that the hierarchy manifests itself in downstream gene expression patterns and phenotype.
cancer systems biology; modular response analysis; oncogenes; ovarian carcinoma model; signal transduction
We investigate the behaviour of the recently proposed Quantum PageRank algorithm in large complex networks. We find that the algorithm is able to univocally reveal the underlying topology of the network and to identify and order the most relevant nodes. Furthermore, it is capable of clearly highlighting the structure of secondary hubs and of resolving the degeneracy in importance of the low lying part of the list of rankings. The quantum algorithm displays an increased stability with respect to a variation of the damping parameter, present in the Google algorithm, and a more clearly pronounced power-law behaviour in the distribution of importance, as compared to the classical algorithm. We test the performance and confirm the listed features by applying it to real world examples from the WWW. Finally, we raise and partially address the question of whether the increased sensitivity of the quantum algorithm persists under coordinated attacks in scale-free and random networks.
Interpretation of simple microarray experiments is usually based on the fold-change of gene expression between a reference and a "treated" sample where the treatment can be of many types from drug exposure to genetic variation. Interpretation of the results usually combines lists of differentially expressed genes with previous knowledge about their biological function. Here we evaluate a method – based on the PageRank algorithm employed by the popular search engine Google – that tries to automate some of this procedure to generate prioritized gene lists by exploiting biological background information.
GeneRank is an intuitive modification of PageRank that maintains many of its mathematical properties. It combines gene expression information with a network structure derived from gene annotations (gene ontologies) or expression profile correlations. Using both simulated and real data we find that the algorithm offers an improved ranking of genes compared to pure expression change rankings.
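One way to read this modification is as PageRank in which the uniform teleportation vector is replaced by normalized absolute expression change. The sketch below assumes that reading; the update rule, the parameter `d = 0.5`, the treatment of unconnected genes and the toy data are illustrative assumptions and should be checked against the original method before any real use.

```python
def generank(adj, expr_change, d=0.5, iters=200):
    """Expression-seeded PageRank sketch.

    adj         : gene -> list of neighbouring genes (undirected network)
    expr_change : gene -> expression change (e.g. log fold-change)
    d           : weight given to the network relative to expression
    """
    total = sum(abs(expr_change[g]) for g in adj)
    ex = {g: abs(expr_change[g]) / total for g in adj}  # normalized seed vector
    rank = dict(ex)
    for _ in range(iters):
        nxt = {g: (1.0 - d) * ex[g] for g in adj}
        for u, neighbours in adj.items():
            if neighbours:
                share = d * rank[u] / len(neighbours)
                for v in neighbours:
                    nxt[v] += share
            else:
                # Assumption of this sketch: an unconnected gene keeps its own score.
                nxt[u] += d * rank[u]
        rank = nxt
    return rank


# Hypothetical data: g1 changes little itself but links two strongly changed genes;
# g4 changes moderately and has no network neighbours.
network = {"g1": ["g2", "g3"], "g2": ["g1"], "g3": ["g1"], "g4": []}
fold_change = {"g1": 0.2, "g2": -2.0, "g3": 2.0, "g4": 0.8}
scores = generank(network, fold_change)
expression_only = generank(network, fold_change, d=0.0)
```

With `d = 0` the ranking reduces to expression change alone; with `d = 0.5` the weakly changed but well-connected `g1` overtakes `g4`, illustrating how network context can reorder a pure fold-change ranking.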
Our modification of the PageRank algorithm provides an alternative method of evaluating microarray experimental results which combines prior knowledge about the underlying network. GeneRank offers an improvement compared to assessing the importance of a gene based on its experimentally observed fold-change alone and may be used as a basis for further analytical developments.
Biological network data, such as metabolic-, signaling- or physical interaction graphs of proteins are increasingly available in public repositories for important species. Tools for the quantitative analysis of these networks are being developed today. Protein network-based drug target identification methods usually return protein hubs with large degrees in the networks as potentially important targets. Some known, important protein targets, however, are not hubs at all, and perturbing protein hubs in these networks may have several unwanted physiological effects, due to their interaction with numerous partners. Here, we show a novel method applicable in networks with directed edges (such as metabolic networks) that compensates for the low degree (non-hub) vertices in the network, and identifies important nodes, regardless of their hub properties. Our method computes the PageRank for the nodes of the network, and divides the PageRank by the in-degree (i.e., the number of incoming edges) of the node. This quotient is the same in all nodes in an undirected graph (even for large- and low-degree nodes, that is, for hubs and non-hubs as well), but may differ significantly from node to node in directed graphs. We suggest assigning importance to non-hub nodes with a large PageRank/in-degree quotient. Consequently, our method gives high scores to nodes with large PageRank relative to their degrees; therefore, important non-hub nodes can easily be identified in large networks. We demonstrate that these relatively high PageRank scores have biological relevance: the method correctly finds numerous already validated drug targets in distinct organisms (Mycobacterium tuberculosis, Plasmodium falciparum and MRSA Staphylococcus aureus), and consequently, it may suggest new possible protein targets as well.
Additionally, our scoring method was not chosen arbitrarily: its value for all nodes of all undirected graphs is constant; therefore its high value captures importance in the directed edge structure of the graph.
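The quotient can be sketched as follows. This is an illustrative example, not the authors' code: the function names and both toy graphs are hypothetical. The constancy check on the undirected graph uses a damping factor of 1 (no teleportation), an assumption of this sketch under which stationary PageRank on a connected, non-bipartite undirected graph is proportional to degree; the directed example uses the conventional 0.85.

```python
def pagerank(out_links, damping=0.85, iters=500):
    """PageRank by power iteration on a dict: node -> list of successors."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            for v in out_links[u]:
                nxt[v] += damping * rank[u] / len(out_links[u])
        rank = nxt
    return rank


def pagerank_per_in_degree(out_links, damping=0.85):
    """PageRank divided by in-degree; nodes without incoming edges are skipped."""
    in_degree = {v: 0 for v in out_links}
    for successors in out_links.values():
        for v in successors:
            in_degree[v] += 1
    rank = pagerank(out_links, damping)
    return {v: rank[v] / in_degree[v] for v in out_links if in_degree[v] > 0}


# Undirected toy graph (each edge listed in both directions): triangle a-b-c plus leaf d.
undirected = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
flat = pagerank_per_in_degree(undirected, damping=1.0)

# Directed toy graph: four sources feed a hub, and the hub feeds a single node x.
directed = {"s1": ["hub"], "s2": ["hub"], "s3": ["hub"], "s4": ["hub"],
            "hub": ["x"], "x": []}
quotient = pagerank_per_in_degree(directed)
best = max(quotient, key=quotient.get)
```

In the undirected graph every node receives the same quotient, whereas in the directed graph the low-degree node `x` scores well above the hub, which is the kind of non-hub node the method is meant to surface.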
The fields of environment and health are both interdisciplinary and trans-disciplinary, and until recently had little engagement in social networking designed to cross disciplinary boundaries. The EU FP6 project HENVINET aimed to establish integrated social network and networking facilities for multiple stakeholders in environment and health. The underlying assumption is that increased social networking across disciplines and sectors will enhance the quality of both problem knowledge and problem solving, by facilitating interactions. Inter- and trans-disciplinary networks are considered useful for this purpose. This does not mean that such networks are easily organized, as openness to such cooperation and exchange is often difficult to ascertain.
Different methods may enhance network building. Using a mixed method approach, a diversity of actions were used in order to investigate the main research question: which kind of social networking activities and structures can best support the objective of enhanced inter- and trans-disciplinary cooperation and exchange in the fields of environment and health. HENVINET applied interviews, a role playing session, a personal response system, a stakeholder workshop and a social networking portal as part of the process of building an interdisciplinary and trans-disciplinary network.
The interviews provided support for the specification of requirements for an interdisciplinary and trans-disciplinary network. The role playing session, the personal response system and the stakeholder workshop were assessed as useful tools in forming such a network, by increasing the awareness among different disciplines of others’ positions. The social networking portal was particularly useful in delivering knowledge, but the role of the scientist in social networking is not yet clear.
The main challenge in the field of environment and health is not so much a lack of scientific problem knowledge, but rather the ability to effectively communicate, share and use available knowledge for policy making. Structured social network facilities can be used by policy makers to engage with the research community. It is beneficial for scientists to be able to integrate the perspective of policy makers in the research agenda, and to assist in co-production of policy-relevant information. A diversity of methods needs to be applied for network building, according to the fit-for-purpose principle. It is useful to know which combination of methods, and in which time frame, produces the best results.
Networking projects such as HENVINET are created not only for the benefit of the network itself, but also because applying the different methods is a learning tool for future network building. Finally, it is clear that the importance of specialized professionals in enabling effective communication between different groups should not be underestimated.
Due to the growing amount of biological knowledge that is incorporated into metabolic network models, their analysis has become more and more challenging. Here, we examine the capabilities of the recently introduced chemical organization theory (OT) to ease this task. Considering only network stoichiometry, the theory allows the prediction of all potentially persistent species sets and therewith rigorously relates the structure of a network to its potential dynamics. By this, the phenotypes implied by a metabolic network can be predicted without the need for explicit knowledge of the detailed reaction kinetics.
We propose an approach to deal with regulation – and especially inhibitory interactions – in chemical organization theory. One advantage of this approach is that the metabolic network and its regulation are represented in an integrated way as one reaction network. To demonstrate the feasibility of this approach we examine a model by Covert and Palsson (J Biol Chem, 277(31), 2002) of the central metabolism of E. coli that incorporates the regulation of all involved genes. Our method correctly predicts the known growth phenotypes on 16 different substrates. Without specific assumptions, organization theory correctly predicts the lethality of knockout experiments in 101 out of 116 cases. Taking into account the same model specific assumptions as in the regulatory flux balance analysis (rFBA) by Covert and Palsson, the same performance is achieved (106 correctly predicted cases). Two model specific assumptions had to be considered: first, we have to assume that secreted molecules do not influence the regulatory system, and second, that metabolites with increasing concentrations indicate a lethal state.
The introduced approach to model a metabolic network and its regulation in an integrated way as one reaction network makes organization analysis a universal technique to study the potential behavior of biological network models. Applying multiple methods like OT and rFBA is shown to be valuable to uncover critical assumptions and helps to improve model coherence.
A common approach to understanding the genetic basis of complex traits is through identification of associated quantitative trait loci (QTL). Fine mapping QTLs requires several generations of backcrosses and analysis of large populations, which is a time-consuming and costly effort. Furthermore, as entire genomes are being sequenced and an increasing amount of genetic and expression data are being generated, a challenge remains: linking phenotypic variation to the underlying genomic variation. To identify candidate genes and understand the molecular basis underlying the phenotypic variation of traits, bioinformatic approaches are needed to exploit information such as genetic map, expression and whole genome sequence data of organisms in biological databases.
The Sol Genomics Network (SGN, http://solgenomics.net) is a primary repository for phenotypic, genetic, genomic, expression and metabolic data for the Solanaceae family and other related Asterids species and houses a variety of bioinformatics tools. SGN has implemented a new approach to QTL data organization, storage, analysis, and cross-links with other relevant data in internal and external databases. The new QTL module, solQTL, http://solgenomics.net/qtl/, employs a user-friendly web interface for uploading raw phenotype and genotype data to the database, R/QTL mapping software for on-the-fly QTL analysis and algorithms for online visualization and cross-referencing of QTLs to relevant datasets and tools such as the SGN Comparative Map Viewer and Genome Browser. Here, we describe the development of the solQTL module and demonstrate its application.
solQTL allows Solanaceae researchers to upload raw genotype and phenotype data to SGN, perform QTL analysis and dynamically cross-link to relevant genetic, expression and genome annotations. Exploration and synthesis of the relevant data is expected to help facilitate identification of candidate genes underlying phenotypic variation and markers more closely linked to QTLs. solQTL is freely available on SGN and can be used in private or public mode.
Motivation: An important question that has emerged from the recent success of genome-wide association studies (GWAS) is how to detect genetic signals beyond single markers/genes in order to explore their combined effects on mediating complex diseases and traits. Integrative testing of GWAS association data with that from prior-knowledge databases and proteome studies has recently gained attention. These methodologies may hold promise for comprehensively examining the interactions between genes underlying the pathogenesis of complex diseases.
Methods: Here, we present a dense module searching (DMS) method to identify candidate subnetworks or genes for complex diseases by integrating the association signal from GWAS datasets into the human protein–protein interaction (PPI) network. The DMS method extensively searches for subnetworks enriched with low P-value genes in GWAS datasets. Compared with pathway-based approaches, this method introduces flexibility in defining a gene set and can effectively utilize local PPI information.
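The text does not spell out how enrichment is scored or how the search proceeds, so the following is only one plausible sketch of a dense module search. It assumes that each gene P-value is converted to a normal quantile, that a module of k genes is scored by the sum of its quantiles divided by the square root of k, and that a module grows greedily over direct neighbours while the score improves by at least 10%. The function names, the threshold and the toy data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def gene_z(p_value):
    """Normal quantile of a gene-level P-value (small P gives large positive z).
    Requires 0 < p_value < 1."""
    return -NormalDist().inv_cdf(p_value)


def module_score(genes, pvals):
    """Assumed module score: sum of gene z-scores divided by sqrt(module size)."""
    return sum(gene_z(pvals[g]) for g in genes) / sqrt(len(genes))


def grow_module(seed, adj, pvals, min_gain=0.1):
    """Greedily add the neighbouring gene that raises the score the most,
    stopping when the relative improvement falls below `min_gain`."""
    module = {seed}
    score = module_score(module, pvals)
    while True:
        frontier = {v for u in module for v in adj[u]} - module
        if not frontier:
            break
        best = max(sorted(frontier), key=lambda v: module_score(module | {v}, pvals))
        new_score = module_score(module | {best}, pvals)
        if new_score <= score + min_gain * abs(score):
            break
        module.add(best)
        score = new_score
    return module, score


# Hypothetical PPI neighbourhood and gene-level P-values.
ppi = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": ["A"]}
gene_p = {"A": 1e-4, "B": 1e-3, "C": 0.6, "D": 1e-5, "E": 0.5}
module, score = grow_module("A", ppi, gene_p)
```

In this example the module seeded at `A` absorbs `B` and then stops. The strongly associated gene `D` is not reached because it lies behind the weakly associated `C`; a search that looks only at direct neighbours, as this sketch does, will miss such genes.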
Results: We implemented the DMS method in an R package, which can also evaluate and graphically represent the results. We demonstrated DMS in two GWAS datasets for complex diseases, i.e. breast cancer and pancreatic cancer. For each disease, the DMS method successfully identified a set of significant modules and candidate genes, including some well-studied genes not detected in the single-marker analysis of GWA studies. Functional enrichment analysis and comparison with previously published methods showed that the genes we identified by DMS have higher association signal.
Availability: dmGWAS package and documents are available at http://bioinfo.mc.vanderbilt.edu/dmGWAS.html.
Supplementary Information: Supplementary data are available at Bioinformatics online.
Deciphering the genetic basis of human diseases is an important goal of biomedical research. On the basis of the assumption that phenotypically similar diseases are caused by functionally related genes, we propose a computational framework that integrates human protein–protein interactions, disease phenotype similarities, and known gene–phenotype associations to capture the complex relationships between phenotypes and genotypes. We develop a tool named CIPHER to predict and prioritize disease genes, and we show that the global concordance between the human protein network and the phenotype network reliably predicts disease genes. Our method is applicable to genetically uncharacterized phenotypes, effective in the genome-wide scan of disease genes, and also extendable to explore gene cooperativity in complex diseases. The predicted genetic landscape of over 1000 human phenotypes reveals the global modular organization of phenotype–genotype relationships. The genome-wide prioritization of candidate genes for over 5000 human phenotypes, including those with under-characterized disease loci or even those lacking known association, is publicly released to facilitate future discovery of disease genes.
disease gene; human disease; modularity; network; prioritization
The variation in antibody response to vaccination likely involves small contributions of numerous genetic variants, such as single-nucleotide polymorphisms (SNPs), which interact in gene networks and pathways. To accumulate the bits of genetic information relevant to the phenotype that are distributed throughout the interaction network, we develop a network eigenvector centrality algorithm (SNPrank) that is sensitive to the weak main effects, gene–gene interactions and small higher-order interactions through hub effects. Analogous to Google PageRank, we interpret the algorithm as the simulation of a random SNP surfer (RSS) that accumulates bits of information in the network through a dynamic probabilistic Markov chain. The transition matrix for the RSS is based on a data-driven genetic association interaction network (GAIN), the nodes of which are SNPs weighted by the main-effect strength and edges weighted by the gene–gene interaction strength. We apply SNPrank to a GAIN analysis of a candidate-gene association study on human immune response to smallpox vaccine. SNPrank implicates a SNP in the retinoid X receptor α (RXRA) gene through a network interaction effect on antibody response. This vitamin A- and D-signaling mediator has been previously implicated in human immune responses, although it would be neglected in a standard analysis because its significance is unremarkable outside the context of its network centrality. This work suggests SNPrank to be a powerful method for identifying network effects in genetic association data and reveals a potential vitamin regulation network association with antibody response.
genetic association study; gene–gene interaction; single-nucleotide polymorphism; information theory; eigenvector centrality; Markov chain
Motivation: Networks and pathways are important in describing the collective biological function of molecular players such as genes or proteins. In many areas of biology, for example in cancer studies, available data may harbour undiscovered subtypes which differ in terms of network phenotype. That is, samples may be heterogeneous with respect to underlying molecular networks. This motivates a need for unsupervised methods capable of discovering such subtypes and elucidating the corresponding network structures.
Results: We exploit recent results in sparse graphical model learning to put forward a ‘network clustering’ approach in which data are partitioned into subsets that show evidence of underlying, subset-level network structure. This allows us to simultaneously learn subset-specific networks and corresponding subset membership under challenging small-sample conditions. We illustrate this approach on synthetic and proteomic data.
Supplementary information: Supplementary data are available at Bioinformatics online.
Deciphering the biological networks underlying complex phenotypic traits, e.g., human disease, is undoubtedly crucial to understanding the underlying molecular mechanisms and to developing effective therapeutics. Due to the network complexity and the relatively small number of available experiments, data-driven modeling is a great challenge for deducing the functions of genes/proteins in the network and in phenotype formation. We propose a novel knowledge-driven systems biology method that utilizes qualitative knowledge to construct a Dynamic Bayesian network (DBN) to represent the biological network underlying a specific phenotype. Edges in this network depict physical interactions between genes and/or proteins. A qualitative knowledge model first translates typical molecular interactions into constraints when resolving the DBN structure and parameters. Therefore, the uncertainty of the network is restricted to a subset of models which are consistent with the qualitative knowledge. All models satisfying the constraints are considered as candidates for the underlying network. These consistent models are used to perform quantitative inference. By in silico inference, we can predict phenotypic traits upon genetic interventions and perturbations in the network. We applied our method to analyze the puzzling mechanism of the breast cancer cell proliferation network and we accurately predicted cancer cell growth rate upon manipulating (anti)cancerous marker genes/proteins.
Dynamic Bayesian network; genetic network; phenotype prediction; genetic intervention; systems biology; breast cancer; cell proliferation
Motivation: The availability of large-scale curated protein interaction datasets has given rise to the opportunity to investigate higher level organization and modularity within the protein–protein interaction (PPI) network using graph theoretic analysis. Despite the recent progress, systems level analysis of high-throughput PPIs remains a daunting task because of the amount of data they present. In this article, we propose a novel PPI network decomposition algorithm called FACETS in order to make sense of the deluge of interaction data using Gene Ontology (GO) annotations. FACETS finds not just a single functional decomposition of the PPI network, but a multi-faceted atlas of functional decompositions that portray alternative perspectives of the functional landscape of the underlying PPI network. Each facet in the atlas represents a distinct interpretation of how the network can be functionally decomposed and organized. Our algorithm maximizes interpretative value of the atlas by optimizing inter-facet orthogonality and intra-facet cluster modularity.
Results: We tested our algorithm on the global networks from IntAct, and compared it with gold standard datasets from MIPS and KEGG. We demonstrated the performance of FACETS. We also performed a case study that illustrates the utility of our approach.
Supplementary data are available at Bioinformatics online.
Availability: Our software is available freely for non-commercial purposes from: http://www.cais.ntu.edu.sg/∼assourav/Facets/
Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually-created hyperlinks. We consider the application of these techniques to biomedical text retrieval. In the current PubMed® search interface, a MEDLINE® citation is connected to a number of related citations, which are in turn connected to other citations. Thus, a MEDLINE record represents a node in a vast content-similarity network. This article explores the hypothesis that these networks can be exploited for text retrieval, in the same manner as hyperlink graphs on the Web.
We conducted a number of reranking experiments using the TREC 2005 genomics track test collection in which scores extracted from PageRank and HITS analysis were combined with scores returned by an off-the-shelf retrieval engine. Experiments demonstrate that incorporating PageRank scores yields significant improvements in terms of standard ranked-retrieval metrics.
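The reranking idea can be illustrated with a small sketch. It assumes a plain power-iteration PageRank over a related-citation graph and a linear interpolation of max-normalized scores. The text does not state how the scores were actually combined, so the interpolation, the `weight` parameter, and the document identifiers below are all hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; `links` maps each node to the nodes it points to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


def rerank(retrieval_scores, graph_scores, weight=0.8):
    """Order documents by an assumed linear mix of max-normalized scores.

    Assumes both score dictionaries contain at least one positive value.
    """
    max_r = max(retrieval_scores.values())
    max_g = max(graph_scores.values())
    combined = {
        doc: weight * retrieval_scores[doc] / max_r
        + (1.0 - weight) * graph_scores.get(doc, 0.0) / max_g
        for doc in retrieval_scores
    }
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical related-citation links and retrieval-engine scores.
LINKS = {"d1": ["d2", "d3"], "d2": ["d3"], "d3": ["d1"], "d4": ["d3"]}
SCORES = {"d1": 2.0, "d2": 1.9, "d3": 1.8, "d4": 1.0}

if __name__ == "__main__":
    ranks = pagerank(LINKS)
    print(rerank(SCORES, ranks, weight=0.5))
```

In this toy graph the heavily linked document `d3` moves ahead of `d2` once graph scores are mixed in, even though the retrieval engine alone scored `d2` higher.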
The link structure of content-similarity networks can be exploited to improve the effectiveness of information retrieval systems. These results generalize the applicability of graph analysis algorithms to text retrieval in the biomedical domain.
Protein-protein interactions play a key role in the biological processes within a cell. Recent high-throughput techniques have generated protein-protein interaction data on a genome scale. A wide range of computational approaches have been applied to interactome network analysis to uncover functional organization and pathways. However, these approaches are challenged by the complex connectivity of the networks. Protein interaction networks have been found to be characterized by intrinsic topological features: high modularity and hub-oriented structure. Elucidating the structural roles of modules and hubs is a critical step in complex interactome network analysis.
We propose a novel approach that converts the complex structure of an interactome network into a hierarchical ordering of proteins. The algorithm measures functional similarity between proteins based on the path strength model and reveals a hub-oriented tree structure hidden in the complex network. We score hub confidence and identify functional modules in the tree structure of proteins retrieved by our algorithm. Our experimental results on the yeast protein interactome network demonstrate that the selected hubs are essential proteins for performing functions. In the network topology, they play a role in bridging different functional modules. Furthermore, our approach has high accuracy in identifying hierarchically distributed functional modules.
Decomposing, converting, and synthesizing complex interaction networks are fundamental tasks for modeling their structural behaviors. In this study, we systematically analyzed complex interactome network structures to retrieve functional information. Unlike previous hierarchical clustering methods, this approach dynamically explores the hierarchical structure of proteins in a global view. Its efficiency and scalability make it well suited to the interactome networks of higher organisms.
Within organisms, groups of traits with different functions are frequently modular, such that variation among modules is independent and variation within modules is tightly integrated, or correlated. Here, we investigated patterns of trait integration and modularity in Brassica rapa in response to three simulated seasonal temperature/photoperiod conditions. The goals of this research were to use trait correlations to understand patterns of trait integration and modularity within and among floral, vegetative and phenological traits of B. rapa in each of three treatments, to examine the QTL architecture underlying patterns of trait integration and modularity, and to quantify how variation in temperature and photoperiod affects the correlation structure and QTL architecture of traits. All floral organs of B. rapa were strongly correlated, and contrary to expectations, floral and vegetative traits were also correlated. Extensive QTL co-localization suggests that covariation of these traits is likely due to pleiotropy, although physically linked loci that independently affect individual traits cannot be ruled out. Across treatments, the structure of genotypic and QTL correlations was generally conserved. Any observed variation in genetic architecture arose from genotype × environment interactions (GEIs) and attendant QTL × E in response to temperature but not photoperiod.
Brassica rapa; flowers; modularity; photoperiod; temperature; trait integration
A molecular predictor is a new tool for disease diagnosis that uses gene expression to classify the diagnostic category of a patient. The statistical challenge in constructing such a predictor is that thousands of genes are available to predict the disease categories, but only a small number of samples.
We proposed a gene network modular-based linear discriminant analysis approach that integrates the 'essential' correlation structure among genes into the predictor, so that the modules or cluster structures of genes related to the diagnostic classes of interest can have a potential biological interpretation. We compared the performance of the new method with that of other established classification methods using three real data sets.
Our results show that the new approach has the advantage of computational simplicity and efficiency, with relatively lower classification error rates than the compared methods in many cases. The modular-based linear discriminant analysis approach introduced in this study has the potential to increase the power of discriminant analysis in microarray studies where sample sizes are small and the number of genes is large.
Predicting the clinical outcome of cancer patients based on the expression of marker genes in their tumors has received increasing interest in the past decade. Accurate predictors of outcome and response to therapy could be used to personalize and thereby improve therapy. However, state of the art methods used so far often found marker genes with limited prediction accuracy, limited reproducibility, and unclear biological relevance. To address this problem, we developed a novel computational approach to identify genes prognostic for outcome that couples gene expression measurements from primary tumor samples with a network of known relationships between the genes. Our approach ranks genes according to their prognostic relevance using both expression and network information in a manner similar to Google's PageRank. We applied this method to gene expression profiles which we obtained from 30 patients with pancreatic cancer, and identified seven candidate marker genes prognostic for outcome. Compared to genes found with state of the art methods, such as Pearson correlation of gene expression with survival time, we improve the prediction accuracy by up to 7%. Accuracies were assessed using support vector machine classifiers and Monte Carlo cross-validation. We then validated the prognostic value of our seven candidate markers using immunohistochemistry on an independent set of 412 pancreatic cancer samples. Notably, signatures derived from our candidate markers were independently predictive of outcome and superior to established clinical prognostic factors such as grade, tumor size, and nodal status. As the amount of genomic data of individual tumors grows rapidly, our algorithm meets the need for powerful computational approaches that are key to exploit these data for personalized cancer therapies in clinical practice.
Why do some people with the same type of cancer die early and some live long? Apart from influences from the environment and personal lifestyle, we believe that differences in the individual tumor genome account for different survival times. Recently, powerful methods have become available to systematically read genomic information of patient samples. The major remaining challenge is how to spot, among the thousands of changes, those few that are relevant for tumor aggressiveness and thereby affecting patient survival. Here, we make use of the fact that genes and proteins in a cell never act alone, but form a network of interactions. Finding the relevant information in big networks of web documents and hyperlinks has been mastered by Google with their PageRank algorithm. Similar to PageRank, we have developed an algorithm that can identify genes that are better indicators for survival than genes found by traditional algorithms. Our method can aid the clinician in deciding if a patient should receive chemotherapy or not. Reliable prediction of survival and response to therapy based on molecular markers bears a great potential to improve and personalize patient therapies in the future.
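The text describes the ranking only as working "in a manner similar to Google's PageRank". One way such a scheme could look is a personalized PageRank in which each gene's own prognostic relevance seeds the walk and the interaction network redistributes rank. The sketch below is an illustration under that assumption and is not the published algorithm. The update rule, the damping value, the gene labels, and the relevance numbers are hypothetical.

```python
def network_rank(neighbors, relevance, damping=0.5, iterations=100):
    """Personalized PageRank on an undirected gene network (assumed scheme).

    `neighbors` maps each gene to its interaction partners (symmetric, and every
    gene is assumed to have at least one partner). `relevance` holds a
    non-negative per-gene score, e.g. |correlation of expression with survival|.
    """
    genes = sorted(neighbors)
    total = sum(relevance[g] for g in genes)
    prior = {g: relevance[g] / total for g in genes}
    rank = dict(prior)
    for _ in range(iterations):
        new_rank = {}
        for g in genes:
            # Each partner passes on its rank, split evenly over its own degree.
            inflow = sum(rank[h] / len(neighbors[h]) for h in neighbors[g])
            new_rank[g] = (1.0 - damping) * prior[g] + damping * inflow
        rank = new_rank
    return rank


# Hypothetical star-shaped network: G1 is a hub with weak expression signal.
NEIGHBORS = {"G1": ["G2", "G3", "G4"], "G2": ["G1"], "G3": ["G1"], "G4": ["G1"]}
RELEVANCE = {"G1": 0.1, "G2": 0.6, "G3": 0.5, "G4": 0.4}

if __name__ == "__main__":
    scores = network_rank(NEIGHBORS, RELEVANCE)
    for gene in sorted(scores, key=scores.get, reverse=True):
        print(gene, round(scores[gene], 4))
```

In this toy example the hub `G1` ends up ranked first although its own relevance is the lowest, because all three of its partners are individually informative. That is the kind of gene a purely expression-based ranking would miss.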
A key goal of biomedical research is to elucidate the complex network of gene interactions underlying complex traits such as common human diseases. Here we detail a multistep procedure for identifying potential key drivers of complex traits that integrates DNA-variation and gene-expression data with other complex trait data in segregating mouse populations. Ordering gene expression traits relative to one another and relative to other complex traits is achieved by systematically testing whether variations in DNA that lead to variations in relative transcript abundances statistically support an independent, causative or reactive function relative to the complex traits under consideration. We show that this approach can predict transcriptional responses to single gene–perturbation experiments using gene-expression data in the context of a segregating mouse population. We also demonstrate the utility of this approach by identifying and experimentally validating the involvement of three new genes in susceptibility to obesity.
Identifying the phenotypes associated with proteins is a challenge in modern genetics, since a multifactorial trait often results from the contributions of many proteins. Besides high-throughput phenotype assays, computational methods offer an alternative way to identify the phenotypes of proteins.
Here, we proposed a new method for predicting protein phenotypes in yeast based on the protein-protein interaction network. Instead of only the most likely phenotype, a series of possible phenotypes for the query protein was generated and ranked according to the tethering potential score. The first-order prediction accuracy of our method reached 65.4% in a jackknife test of 1,267 proteins in budding yeast, much higher than the success rate (15.4%) of a random guess. The likelihood that the first 3 predicted phenotypes included all the real phenotypes of a protein was 70.6%.
The candidate phenotypes predicted by our method provide useful clues for further validation. In addition, the method can be easily applied to the prediction of protein-associated phenotypes in other organisms.
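The tethering potential score is not defined in this abstract, so the sketch below substitutes a simple neighbor vote (how many interaction partners carry each phenotype) to illustrate the general workflow: rank candidate phenotypes for a query protein, then estimate first-order accuracy with a jackknife (leave-one-out) test. All names and the toy data are hypothetical.

```python
from collections import Counter


def rank_phenotypes(protein, interactions, annotations):
    """Rank phenotypes by the number of interaction partners annotated with them.

    The neighbor vote is an assumed placeholder for the tethering potential score.
    """
    votes = Counter()
    for partner in interactions.get(protein, ()):
        if partner == protein:
            continue  # never let a protein vote for itself
        votes.update(annotations.get(partner, ()))
    # Sort by vote count (descending), then alphabetically for determinism.
    return [p for p, _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))]


def jackknife_first_order_accuracy(interactions, annotations):
    """Fraction of proteins whose top-ranked phenotype is one of their true ones.

    Each protein is predicted from its partners only, so its own annotation is
    effectively held out, as in a leave-one-out test.
    """
    hits = 0
    tested = 0
    for protein, truth in annotations.items():
        ranked = rank_phenotypes(protein, interactions, annotations)
        tested += 1
        if ranked and ranked[0] in truth:
            hits += 1
    return hits / tested if tested else 0.0


# Hypothetical toy network and phenotype annotations.
INTERACTIONS = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P3"],
    "P3": ["P1", "P2", "P4"],
    "P4": ["P3"],
}
ANNOTATIONS = {
    "P1": {"slow growth"},
    "P2": {"slow growth"},
    "P3": {"slow growth", "heat sensitive"},
    "P4": {"cold sensitive"},
}

if __name__ == "__main__":
    print(rank_phenotypes("P4", INTERACTIONS, ANNOTATIONS))
    print(jackknife_first_order_accuracy(INTERACTIONS, ANNOTATIONS))
```

Returning the full ranked list rather than a single label mirrors the idea of reporting a series of candidate phenotypes, and it makes a top-3 style evaluation a small extension of the same loop.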