Temporal lobe epilepsy (TLE) and idiopathic generalized epilepsy (IGE) have each been associated with extensive brain atrophy, yet to date there are no reports of a head-to-head comparison of the two patient groups. Our aim was to assess tissue-specific and structural brain atrophy in TLE patients and to compare it with that in IGE patients and healthy controls (HC).
TLE patients were classified as lesional (L-TLE) or non-lesional (NL-TLE) based on the presence or absence of temporal structural abnormalities on MRI. High-resolution 3 T MRI with automated segmentation using the SIENAX and FIRST tools was performed in 26 patients with TLE (11 L-TLE and 15 NL-TLE), 15 patients with IGE and 26 HC. Normalized brain volume (NBV), normalized grey matter volume (NGMV), normalized white matter volume (NWMV) and the volumes of subcortical deep grey matter structures were quantified. Regression analyses were used to evaluate between-group differences in both volume and left/right asymmetry. Additionally, the laterality of the results was evaluated to quantify ipsilateral and contralateral effects separately in the TLE group.
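The abstract does not specify how asymmetry or laterality were coded. The sketch below is a minimal illustration that assumes the commonly used normalized asymmetry index (L - R)/((L + R)/2) and a simple relabeling of left/right volumes by the side of the seizure focus. `Subject`, `asymmetry_index` and `ipsi_contra` are hypothetical names and are not part of SIENAX or FIRST.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Subject:
    # Hypothetical container for one structure's bilateral volumes
    left_volume: float                 # e.g. left hippocampal volume (mm^3)
    right_volume: float                # matching right-hemisphere volume (mm^3)
    focus_side: Optional[str] = None   # "left" or "right" for TLE; None for IGE/HC


def asymmetry_index(left: float, right: float) -> float:
    """Assumed normalized asymmetry (L - R) / mean(L, R); positive when the left side is larger."""
    return (left - right) / ((left + right) / 2.0)


def ipsi_contra(subject: Subject) -> Tuple[float, float]:
    """Re-express left/right volumes as (ipsilateral, contralateral) to the seizure focus."""
    if subject.focus_side == "left":
        return subject.left_volume, subject.right_volume
    if subject.focus_side == "right":
        return subject.right_volume, subject.left_volume
    raise ValueError("laterality analysis needs a known focus side")
```

For a subject with right-sided TLE, `ipsi_contra(Subject(3600.0, 3000.0, "right"))` returns `(3000.0, 3600.0)`, so the smaller right structure becomes the ipsilateral value.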
All epilepsy groups had significantly lower NBV and NWMV compared to HC (p < 0.001). L-TLE had lower hippocampal volume than HC and IGE (p = 0.001), and all epilepsy groups had significantly lower amygdala volume than HC (p ≤ 0.004). In L-TLE, there was evidence of atrophy in both ipsilateral and contralateral structures.
Our study revealed that TLE and IGE patients showed similar overall tissue-specific brain atrophy, although differences in specific structures were observed. L-TLE also appeared to behave differently from NL-TLE, with atrophy not limited to the ipsilateral side.
Temporal lobe epilepsy; Idiopathic generalized epilepsy; MRI segmentation; Brain atrophy
Multiple sclerosis (MS) is associated with ectopic lymphoid follicle formation. Podoplanin+ (a lymphatic marker) T helper 17 (Th17) cells and B cell aggregates have been implicated in the formation of tertiary lymphoid organs (TLOs) in MS and experimental autoimmune encephalomyelitis (EAE). Because podoplanin, which is expressed by Th17 cells in MS brains, is also expressed by lymphatic endothelium, we investigated whether the pathophysiology of MS involves induction of lymphatic proteins in the inflamed neurovasculature.
We assessed the protein levels of lymphatic vessel endothelial hyaluronan receptor-1 (LYVE-1) and podoplanin, which are specific to the lymphatic system, and of prospero homeobox protein-1 (Prox-1), angiopoietin-2, vascular endothelial growth factor-D (VEGF-D) and vascular endothelial growth factor receptor-3 (VEGFR-3), which are expressed by both lymphatic endothelium and neurons. Levels of these proteins were measured in postmortem brains and sera from MS patients, in the myelin proteolipid protein (PLP)-induced EAE and Theiler’s murine encephalomyelitis virus (TMEV)-induced demyelinating disease (TMEV-IDD) mouse models, and in cell culture models of the inflamed neurovasculature.
Results and conclusions
Immunohistochemistry revealed intense LYVE-1 staining in neurons from a subset of MS patients. The lymphatic protein podoplanin was highly expressed in perivascular inflammatory lesions, indicating signaling cross-talk between the inflamed brain vasculature and lymphatic proteins in MS. The serum profiles of these proteins discriminated relapsing-remitting MS from secondary progressive MS and from normal controls. The in vivo findings were confirmed in in vitro cell culture models of neuroinflammation.
Prox-1; Angiopoietin-2; VEGFR-3; VEGF-D; LYVE-1; Podoplanin/D2-40
To investigate the associations of environmental MS risk factors with clinical and MRI measures of progression in patients with high-risk clinically isolated syndrome (CIS) after a first demyelinating event.
We analyzed 211 CIS patients (age: 28.9 ± 7.8 years) enrolled in the SET study, a multi-center study of high-risk CIS patients. Pre-treatment samples were analyzed for IgG antibodies against cytomegalovirus (anti-CMV) and against Epstein-Barr virus (EBV) nuclear antigen-1 (EBNA-1), viral capsid antigen (VCA) and early antigen-diffuse (EA-D), for 25-hydroxyvitamin D3 and cotinine levels, and for HLA DRB1*1501 status. The inclusion criteria required evaluation within 4 months of the initial demyelinating event, two or more brain MRI lesions and two or more oligoclonal bands in the cerebrospinal fluid. All patients were treated with interferon-beta. Clinical and MRI assessments were obtained at baseline and at 6, 12 and 24 months.
Anti-CMV IgG positivity was associated with a shorter time to first relapse and a greater number of relapses. Smoking was associated with an increased number and volume of contrast-enhancing lesions (CEL) over the 2-year period. The cumulative numbers of CEL and T2 lesions over the 2-year period were greater for individuals in the highest quartile of anti-EBV VCA IgG antibodies, as was the percent loss of brain volume.
Relapses in CIS patients were associated with anti-CMV positivity, whereas high anti-EBV VCA IgG levels were associated with progression on MRI measures, including accumulation of CEL and T2 lesions and the development of brain atrophy.
Chronic cerebrospinal venous insufficiency (CCSVI) is a vascular condition characterized by anomalies of the primary veins outside the skull that has been reported to be associated with multiple sclerosis (MS). In the blinded Combined Transcranial (TCD) and Extracranial Venous Doppler Evaluation (CTEVD) study, we found that the prevalence of CCSVI was significantly higher in patients with MS than in healthy controls (HC) (56.1% vs. 22.7%, p < 0.001).
The objective was to evaluate the clinical correlates of venous anomalies indicative of CCSVI in patients with MS.
The original study enrolled 499 subjects (163 HC, 289 MS, 21 CIS and 26 with other neurological disorders), who underwent a clinical examination and a combined Doppler and TCD scan of the head and neck. This analysis was restricted to adult subjects with MS (RR-MS: n = 181, SP-MS: n = 80 and PP-MS: n = 12). Disability was evaluated using the Kurtzke Expanded Disability Status Scale (EDSS) and the Multiple Sclerosis Severity Score (MSSS).
Disability was not associated with the presence (≥2 venous hemodynamic criteria) or the severity of CCSVI, as measured with the venous hemodynamic insufficiency severity score (VHISS). However, CCSVI severity was associated with a higher brainstem functional system EDSS sub-score (p = 0.002). In logistic regression analysis, progressive MS (SP-MS or PP-MS) vs. non-progressive status (including RR-MS) was associated with a CCSVI diagnosis (p = 0.004, OR = 2.34, CI = 1.3–4.2).
The presence and severity of CCSVI in multiple sclerosis correlate with disease status but have no or very limited association with clinical disability.
Multiple sclerosis; Disease progression; Disability; Echo-color Doppler; Venous anomalies; CCSVI
Information-theoretic metrics have been proposed for studying gene-gene and gene-environment interactions in genetic epidemiology. Although these metrics have proven very promising, they are typically interpreted in the context of communications and information transmission, which makes them less tangible for epidemiologists and statisticians.
In this paper, we clarify the interpretation of information-theoretic metrics. In particular, we develop the methods so that their relation to the global properties of probability models is made clear and contrast them with log-linear models for multinomial data. Hopefully, a better understanding of their properties and probabilistic implications will promote their acceptance and correct usage in genetic epidemiology. Our novel development also suggests new approaches to model search and computation.
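The sketch below makes the probabilistic reading concrete. It assumes the standard inclusion-exclusion definition of the k-way interaction information over plug-in (empirical) entropies, KWII(S) = -Σ_{T⊆S} (-1)^{|S|-|T|} H(T). For two variables this reduces to mutual information. For an XOR-type combination of three binary variables it returns 1 bit, even though every pair is independent. A log-linear model would need a three-way interaction term to represent that structure. The function names are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(rows, cols):
    """Plug-in Shannon entropy (bits) of the joint distribution of the columns `cols`."""
    if not cols:
        return 0.0
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    n = len(rows)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def kwii(rows, cols):
    """Assumed KWII definition: -sum over subsets T of S of (-1)^(|S|-|T|) * H(T)."""
    s = len(cols)
    total = 0.0
    for size in range(s + 1):
        for subset in combinations(cols, size):
            total += (-1) ** (s - size) * entropy(rows, subset)
    return -total
```

With `rows = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]`, `kwii(rows, (0, 1))` is zero (the two inputs are independent) and `kwii(rows, (0, 1, 2))` is 1 bit.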
gene-environment interaction; gene-gene interactions; K-way Interaction Index; information theory
The role of intra- and extra-cranial venous system impairment in the pathogenesis of various vascular, inflammatory and neurodegenerative neurological disorders, as well as in aging, has not been studied in detail. Nor have risk factors been determined for increased susceptibility to venous pathology in the intra-cranial and extra-cranial veins. The aim of this study was to investigate the association between the presence of a newly proposed vascular condition called chronic cerebrospinal venous insufficiency (CCSVI) and environmental factors in a large volunteer control group without known central nervous system pathology.
Methods and Findings
Data were collected prospectively from 252 subjects who were screened for medical history as part of the entry criteria and who participated in a case-control study of CCSVI prevalence in multiple sclerosis (MS) patients; these data were then analyzed post hoc. All participants underwent physical and Doppler sonography examinations and were assessed with a structured environmental questionnaire. Fulfillment of ≥2 positive venous hemodynamic (VH) criteria on Doppler sonography was considered indicative of a CCSVI diagnosis. Risk and protective factors associated with CCSVI were analyzed using logistic regression analysis. Seventy subjects (27.8%) were diagnosed with CCSVI, and 153 (60.7%) fulfilled one or more VH criteria. The presence of heart disease (p = .001), especially heart murmurs (p = .007), a history of infectious mononucleosis (p = .002) and irritable bowel syndrome (p = .005) were associated with more frequent CCSVI diagnosis. Current or previous smoking (p = .029) showed a trend toward an association with more frequent CCSVI diagnosis, while the use of dietary supplements (p = .018) showed a trend toward an association with less frequent CCSVI diagnosis.
Risk factors for CCSVI differ from established risk factors for peripheral venous diseases. Vascular, infectious and inflammatory factors were associated with higher CCSVI frequency.
The breakdown of the vascular endothelium of the blood-brain barrier is critical for the entry of immune cells into the MS brain. Vascular comorbidities are associated with an increased risk of progression. Dyslipidemia, elevated LDL and reduced HDL may increase progression by activating inflammatory processes at the vascular endothelium.
To assess the associations of serum lipid profile variables (triglycerides, high and low density lipoproteins (HDL, LDL) and total cholesterol) with disability and MRI measures in multiple sclerosis (MS).
This study included 492 MS patients (age: 47.1 ± 10.8 years; disease duration: 12.8 ± 10.1 years) with baseline and follow-up Expanded Disability Status Scale (EDSS) assessments after a mean period of 2.2 ± 1.0 years. The associations of baseline lipid profile variables with disability changes were assessed. Quantitative MRI findings at baseline were available for 210 patients.
EDSS worsening was associated with higher baseline LDL (p = 0.006) and total cholesterol (p = 0.001, 0.008) levels, with a trend for higher triglycerides (p = 0.025); HDL was not associated. A similar pattern was found for worsening of the Multiple Sclerosis Severity Score (MSSS). Higher HDL levels (p < 0.001) were associated with lower contrast-enhancing lesion volume. Higher total cholesterol showed a trend for association with lower brain parenchymal fraction (p = 0.033).
Serum lipid profile has modest effects on disease progression in MS. Worsening disability is associated with higher levels of LDL, total cholesterol and triglycerides. Higher HDL is associated with lower levels of acute inflammatory activity.
Multiple sclerosis; diet; lipid profile; MRI; environmental factors; gene-environment interactions; lesion volume; brain atrophy
Chronic cerebrospinal venous insufficiency (CCSVI), a vascular condition characterized by anomalies of the veins outside the skull, has been reported to be associated with multiple sclerosis (MS). The objective was to assess the associations between HLA DRB1*1501 status and the occurrence of CCSVI in MS patients.
This study included 423 of the 499 subjects enrolled in the Combined Transcranial and Extracranial Venous Doppler Evaluation (CTEVD) study. HLA DRB1*1501 status was obtained in 268 MS patients and 155 controls by genotyping rs3135005, a SNP associated with DRB1*1501 status. All subjects underwent a clinical examination and a Doppler scan of the head and neck. The frequency of CCSVI was higher in the MS group than in the control group (56.0% vs. 21.9%; OR = 4.52, p < 0.001) and higher in the progressive than in the non-progressive MS group (69.8% vs. 49.5%). The frequency of HLA DRB1*1501 positivity (HLA+) was higher in MS patients than in controls (51.9% vs. 31.6%; OR = 2.33, p < 0.001) and was similar in the non-progressive (51.6%) and progressive (52.3%) MS groups. The frequency of HLA+ CCSVI+ was 40.7% in progressive MS, 27.5% in non-progressive MS and 8.4% in controls. The presence of CCSVI was independent of HLA DRB1*1501 status in MS patients.
The lack of strong associations of CCSVI with HLA DRB1*1501 suggests that the role of CCSVI in MS, and the associations underlying it, should be interpreted with caution. Further longitudinal studies should determine whether interactions between these factors can contribute to disease progression in MS.
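As a quick arithmetic check, the unadjusted odds ratios can be recomputed from the rounded percentages alone. This is only a back-of-the-envelope sketch. The study's ORs were presumably computed from the underlying counts, so small discrepancies are expected.

```python
def odds_ratio_from_proportions(p_cases: float, p_controls: float) -> float:
    """Unadjusted odds ratio from the exposure proportion among cases and among controls."""
    odds_cases = p_cases / (1.0 - p_cases)
    odds_controls = p_controls / (1.0 - p_controls)
    return odds_cases / odds_controls


# Rounded percentages from the abstract (MS patients vs. controls)
or_ccsvi = odds_ratio_from_proportions(0.560, 0.219)  # CCSVI+
or_hla = odds_ratio_from_proportions(0.519, 0.316)    # HLA DRB1*1501+
print(f"CCSVI OR ~ {or_ccsvi:.2f}, HLA+ OR ~ {or_hla:.2f}")
```

This gives about 4.54 and 2.34, close to the reported 4.52 and 2.33.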
We developed an information-theoretic metric called the Interaction Index for prioritizing genetic variations and environmental variables for follow-up in detailed sequencing studies. The Interaction Index was found to be effective for prioritizing the genetic and environmental variables involved in gene-environment interactions (GEI) across a diverse range of simulated data sets. The metric was also evaluated on a 103-SNP Crohn’s disease data set and on a simulated data set containing 9187 SNPs and multiple covariates that was modeled on a rheumatoid arthritis data set. Our results demonstrate that the Interaction Index algorithm is effective and efficient for prioritizing interacting variables in a diverse range of epidemiologic data sets containing complex combinations of direct effects, multiple gene-gene interactions (GGI) and GEI.
gene-environment interactions; gene-gene interactions; K-way interaction information
Multifactorial diseases such as cancer and cardiovascular diseases are caused by the complex interplay between genes and environment. The detection of these interactions remains challenging due to computational limitations. Information theoretic approaches use computationally efficient directed search strategies and thus provide a feasible solution to this problem. However, the power of information theoretic methods for interaction analysis has not been systematically evaluated. In this work, we compare power and Type I error of an information-theoretic approach to existing interaction analysis methods.
The k-way interaction information (KWII) metric for identifying variable combinations involved in gene-gene interactions (GGI) was assessed using several simulated data sets under models of genetic heterogeneity driven by susceptibility-increasing loci with varying allele frequencies, penetrance values and heritability. The power and the proportion of false positives of the KWII were compared with those of multifactor dimensionality reduction (MDR), the restricted partitioning method (RPM) and logistic regression.
The power of the KWII was considerably greater than that of MDR in all six simulation models examined. For a given disease prevalence at high values of heritability, the power of both RPM and KWII was greater than 95%. For models with low heritability and/or genetic heterogeneity, the power of the KWII was consistently greater than that of RPM; the improvements in power for the KWII over RPM ranged from 4.7% to 14.2% at α = 0.001 in the three models at the lowest heritability values examined. The KWII performed similarly to logistic regression.
Information theoretic models are flexible and have excellent power to detect GGI under a variety of conditions that characterize complex diseases.
Several highly pathogenic avian influenza (AI) outbreaks have been reported over the past decade. South Korea recently faced AI outbreaks whose economic impact was estimated at 6.3 billion dollars, equivalent to nearly 50% of the profit generated by poultry-related industries in 2008. In addition, AI threatens to cause a human pandemic of potentially devastating proportions. Several studies have shown that stochastic simulation models can be used to plan efficient containment strategies for emerging influenza. Efficient control of AI outbreaks based on such simulation studies could be an important strategy for minimizing their adverse economic and public health impacts.
We constructed a spatio-temporal multi-agent model of chickens and ducks in poultry farms in South Korea. The spatial domain, composed of 76 unit squares (37.5 km × 37.5 km each), approximated the size and scale of South Korea. In this spatial domain, we introduced 3,039 poultry flocks (2,231 flocks of chickens and 808 flocks of ducks) whose spatial distribution was proportional to the number of birds in each province. The model parameterizes the properties and dynamic behaviors of birds in poultry farms, as well as quarantine plans, including infection probability, incubation period, interactions among birds and quarantine region.
We conducted a sensitivity analysis of the model parameters. Our study shows that a quarantine plan with well-chosen parameter values is critical for minimizing the loss of poultry flocks in an AI outbreak. Specifically, aggressive culling around infected poultry farms over an 18.75 km radius is unlikely to be effective, because it results in higher fractions of unnecessarily culled poultry flocks; a weak culling plan is also unlikely to be effective, because it results in higher fractions of infected poultry flocks.
Our results show that a prepared response with targeted quarantine protocols would have a high probability of containing the disease. A containment plan based on aggressive culling is not necessarily efficient, because it causes a higher fraction of unnecessarily culled poultry farms. Instead, culling must be balanced against the other important factors involved in AI spread. Better estimates for containing AI spread with this model could reduce the loss of poultry and minimize the economic impact on the poultry industry.
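The sketch below is a heavily simplified, hypothetical illustration of the kind of agent-based daily step such a model iterates. Each flock is an agent on a plane. Infection spreads to susceptible flocks within a contact distance with a fixed daily probability. Detected infected flocks are culled together with every flock inside a culling radius. Incubation periods, species differences and the 76-square grid of the published model are omitted. `simulate_outbreak`, `contact_km` and `cull_km` are illustrative names, not the authors' parameters.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Flock:
    x_km: float
    y_km: float
    state: str = "S"  # "S" susceptible, "I" infected, "C" culled


def simulate_outbreak(flocks, p_transmit, contact_km, cull_km, days, seed=0):
    """Toy outbreak with ring culling (hypothetical rules, not the published model).

    Returns (number of flocks ever infected, number of flocks culled).
    """
    rng = random.Random(seed)
    ever_infected = sum(f.state == "I" for f in flocks)
    for _ in range(days):
        sources = [f for f in flocks if f.state == "I"]
        if not sources:
            break
        # Transmission: a susceptible flock within contact range of a source may be infected.
        for f in flocks:
            if f.state == "S" and any(
                math.dist((f.x_km, f.y_km), (s.x_km, s.y_km)) <= contact_km
                and rng.random() < p_transmit
                for s in sources
            ):
                f.state = "I"
                ever_infected += 1
        # Control: today's sources are detected, and every flock within cull_km of one is culled.
        for s in sources:
            for f in flocks:
                if f.state != "C" and math.dist((f.x_km, f.y_km), (s.x_km, s.y_km)) <= cull_km:
                    f.state = "C"
            s.state = "C"
    culled = sum(f.state == "C" for f in flocks)
    return ever_infected, culled
```

Consider five flocks spaced 1 km apart, with the first one infected and certain transmission (`p_transmit=1.0`, `contact_km=1.5`). With a 0 km culling radius, infection reaches all five flocks before each is culled. With a 10 km radius, the outbreak stops after two infections, but all five flocks are culled. This is a toy version of the trade-off between uncontrolled spread and unnecessary culling.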
Gene × gene interactions play important roles in the etiology of complex multi-factorial diseases such as rheumatoid arthritis (RA). In this paper, we describe our use of a two-stage search strategy consisting of information-theoretic methods and logistic regression to detect gene × gene interactions associated with RA using the data in Problem 1 of Genetic Analysis Workshop 16. Our method detected several SNP effects (single-SNP and SNP × SNP) located in chromosomal regions that previous studies have linked to RA and related diseases.
The purpose of this research was to develop a novel information-theoretic method and an efficient algorithm for analyzing the gene-gene interactions (GGI) and gene-environment interactions (GEI) associated with quantitative traits (QT). The method is built on two information-theoretic metrics, the k-way interaction information (KWII) and the phenotype-associated information (PAI). The PAI is a novel information-theoretic metric obtained from the total correlation information (TCI) metric by removing the contributions of inter-variable dependencies (resulting from factors such as linkage disequilibrium and common sources of environmental pollutants).
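The abstract describes the PAI only as the TCI with inter-variable dependencies removed. A minimal sketch follows. It assumes TCI(X_1, ..., X_k) = Σ_i H(X_i) - H(X_1, ..., X_k) and PAI = TCI(X_1, ..., X_k, Y) - TCI(X_1, ..., X_k), for discrete predictors X and a discretized trait Y. The example shows why the subtraction helps. Two SNPs in perfect linkage disequilibrium have a TCI of 1 bit, yet their assumed PAI with an unrelated trait is 0.

```python
import math
from collections import Counter


def joint_entropy(rows, cols):
    """Plug-in Shannon entropy (bits) of the joint distribution of columns `cols`."""
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    n = len(rows)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def tci(rows, cols):
    """Total correlation: sum of marginal entropies minus the joint entropy."""
    return sum(joint_entropy(rows, (c,)) for c in cols) - joint_entropy(rows, tuple(cols))


def pai(rows, predictors, phenotype):
    """Assumed PAI: TCI of predictors plus phenotype minus TCI of the predictors alone."""
    predictors = tuple(predictors)
    return tci(rows, predictors + (phenotype,)) - tci(rows, predictors)
```

Under this assumed definition, the PAI equals the mutual information between the predictor set and the trait, so dependence among predictors alone does not inflate it.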
The KWII and the PAI were critically evaluated and incorporated within an algorithm called CHORUS for analyzing QT. The combinations with the highest values of KWII and PAI identified each known GEI associated with the QT in the simulated data sets. The CHORUS algorithm was tested using the simulated GAW15 data set and two real GGI data sets from QTL mapping studies of high-density lipoprotein levels/atherosclerotic lesion size and of ultraviolet light-induced immunosuppression. The KWII and PAI were found to have excellent sensitivity for identifying the key GEI simulated to affect the two quantitative trait variables in the GAW15 data set. In addition, both metrics showed strong concordance with the results of the two different QTL mapping data sets.
The KWII and PAI are promising metrics for analyzing the GEI of QT.
Data visualization techniques for the pharmaceutical sciences have not been extensively investigated. The purpose of this study was to evaluate the usefulness of VizStruct, a multidimensional visualization tool, for applications in pharmacokinetics, pharmacodynamics, and pharmacogenomics.
The VizStruct tool uses the first harmonic of the discrete Fourier transform to map multidimensional data to two dimensions for visualization. The mapping was used to visualize several published pharmacokinetic, pharmacodynamic and pharmacogenomic data sets. The VizStruct approach was evaluated using simulated population pharmacokinetic data sets, the data from Dalen and colleagues (Clin. Pharmacol. Ther. 63:444−452, 1998) on the kinetics of nortriptyline and its 10-hydroxy-nortriptyline metabolite in subjects with differing numbers of copies of the CYP2D6 gene, and the gene expression profiling data of Bohen and colleagues (Proc. Natl. Acad. Sci. USA 100:1926−1930, 2003) on follicular lymphoma patients responsive and nonresponsive to rituximab.
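The mapping can be sketched in a few lines. The sketch assumes the unscaled first harmonic F1 = Σ_k x_k e^{-2πik/n} of an n-point profile; VizStruct may apply additional normalization, which is not specified here. Each profile then becomes the point (Re F1, Im F1). One property follows directly: for n > 1, adding a constant to every element leaves F1 unchanged, because the n-th roots of unity sum to zero. Baseline offsets therefore do not move a profile's point.

```python
import cmath
import math


def first_harmonic(profile):
    """Map an n-point profile x to (Re F1, Im F1), with F1 = sum_k x_k * exp(-2*pi*i*k/n).

    No scaling is applied; any normalization used by VizStruct is an open assumption here.
    """
    n = len(profile)
    f1 = sum(x * cmath.exp(-2j * math.pi * k / n) for k, x in enumerate(profile))
    return f1.real, f1.imag
```

For example, a concentration-time profile sampled at four time points maps to a single 2-D point. `[1, 0, 0, 0]` lands at (1, 0), and `[0, 1, 0, 0]` lands at approximately (0, -1).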
The VizStruct mapping preserves the key characteristics of multidimensional data in two dimensions in a manner that facilitates visualization. The mapping is computationally efficient and can be used for cluster detection and class prediction in pharmaceutical data sets. The VizStruct visualization succinctly summarized the salient similarities and differences in the nortriptyline and 10-hydroxynortriptyline pharmacokinetic profiles in subjects with increasing number of CYP2D6 gene copies. In the simulated population pharmacokinetic data sets, it was capable of discriminating the subtle differences between pharmacokinetic profiles derived from 1- and 2-compartment models with the same area under the curve. The two-dimensional VizStruct mapping computed from a subset of 102 informative genes from the Bohen and colleagues data set effectively separated the rituximab responder, rituximab nonresponder, and control subject groups.
VizStruct is a computationally efficient and effective approach for visualizing complex, multidimensional data sets. It could have many useful applications in the pharmaceutical sciences.
microarray; pharmacodynamics; pharmacogenomic modeling; pharmacokinetics; visualization algorithms
DNA arrays provide a broad snapshot of the state of the cell by measuring the expression levels of thousands of genes simultaneously. Visualization techniques can enable the exploration and detection of patterns and relationships in a complex data set by presenting the data in a graphical format in which the key characteristics become more apparent. However, the dimensionality and size of array data sets present significant challenges to visualization. The purpose of this study is to present an interactive approach for visualizing variations in gene expression profiles and to assess its usefulness for classifying samples.
The first Fourier harmonic projection was used to map multi-dimensional gene expression data to two dimensions in an implementation called VizStruct. The visualization method was tested using the differentially expressed genes identified in eight separate gene expression data sets. The samples were classified using the oblique decision tree (OC1) algorithm to provide a procedure for visualization-driven classification. The classifiers were evaluated by the holdout and the cross-validation techniques. The proposed method was found to achieve high accuracy.
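The study evaluates classifiers trained on the mapped 2-D points using holdout and cross-validation. OC1 is not available in standard Python libraries, so the sketch below swaps in a simple nearest-centroid classifier. Its only purpose is to illustrate the k-fold cross-validation protocol on 2-D points; it is not the oblique decision tree used in the study, and all names are hypothetical.

```python
import math
import random
from collections import defaultdict


def fit_centroids(points, labels):
    """Mean 2-D position of each class (stand-in classifier; the study used OC1)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), label in zip(points, labels):
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(centroids, point):
    """Assign the class whose centroid is closest in the 2-D mapped space."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


def cross_validated_accuracy(points, labels, k=5, seed=0):
    """k-fold cross-validation accuracy of the nearest-centroid classifier."""
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        centroids = fit_centroids([points[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(centroids, points[i]) == labels[i] for i in fold)
    return correct / len(points)
```

A holdout evaluation corresponds to fitting on one fixed training split and scoring a single held-out split instead of rotating through all k folds.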
A detailed mathematical derivation of all mapping properties, as well as color figures, is available as supplementary material at http://www.cse.buffalo.edu/DBGROUP/bioinformatics/supplementary/vizstruct. All programs were written in Java and Matlab, and the software code is available from the first author upon request.
To evaluate a semi-parametric, model-based approach for obtaining transcription rates from mRNA and protein expression.
The input transcription rate profile was modeled as an exponential function of a cubic spline, and the dynamics of translation and of mRNA and protein degradation were modeled using the Hargrove–Schmidt model. The transcription rate profile and the translation, mRNA degradation and protein degradation rate constants were estimated by the maximum likelihood method.
Simulated data sets generated from the stochastic, transit compartment and dispersion signaling models were used to test the approach. The approach satisfactorily fit the mRNA and protein data and accurately recapitulated the parameter values and the normalized transcription rate profile. The approach was successfully used to model published data on tyrosine aminotransferase pharmacodynamics.
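The forward model behind this approach can be sketched as follows. The sketch assumes the Hargrove–Schmidt form dm/dt = k_s(t) - k_dm·m and dp/dt = k_tl·m - k_dp·p, with k_s(t) = exp(S(t)) for a natural cubic spline S through log-rate knots. The maximum likelihood estimation step is not shown, and the function names are hypothetical.

```python
import bisect
import math


def natural_cubic_spline(xs, ys):
    """Natural cubic spline through (xs, ys); xs strictly increasing, at least 2 knots."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots; natural ends keep m[0] = m[n] = 0
    if n > 1:
        # Tridiagonal system for m[1..n-1], solved with the Thomas algorithm.
        a = [h[i - 1] for i in range(1, n)]
        b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        c = [h[i] for i in range(1, n)]
        d = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1]) for i in range(1, n)]
        for i in range(1, n - 1):
            w = a[i] / b[i - 1]
            b[i] -= w * c[i - 1]
            d[i] -= w * d[i - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = d[-1] / b[-1]
        for i in range(n - 3, -1, -1):
            sol[i] = (d[i] - c[i] * sol[i + 1]) / b[i]
        m[1:n] = sol

    def spline(t):
        i = min(max(bisect.bisect_right(xs, t) - 1, 0), n - 1)
        u, v = xs[i + 1] - t, t - xs[i]
        return ((m[i] * u**3 + m[i + 1] * v**3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * u
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * v)

    return spline


def simulate_expression(knot_times, log_rates, k_dm, k_tl, k_dp, t_end, dt=0.01):
    """RK4 solution of dm/dt = exp(S(t)) - k_dm*m, dp/dt = k_tl*m - k_dp*p from m = p = 0."""
    s = natural_cubic_spline(knot_times, log_rates)

    def rhs(t, m, p):
        return math.exp(s(t)) - k_dm * m, k_tl * m - k_dp * p

    t, m, p = 0.0, 0.0, 0.0
    trajectory = [(t, m, p)]
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(t, m, p)
        k2 = rhs(t + dt / 2, m + dt / 2 * k1[0], p + dt / 2 * k1[1])
        k3 = rhs(t + dt / 2, m + dt / 2 * k2[0], p + dt / 2 * k2[1])
        k4 = rhs(t + dt, m + dt * k3[0], p + dt * k3[1])
        m += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
        trajectory.append((t, m, p))
    return trajectory
```

As a sanity check, use a constant transcription rate of 2 with k_dm = 0.5, k_tl = 1 and k_dp = 0.25. The trajectory then settles at the steady state m = 2/0.5 = 4 and p = 4/0.25 = 16.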
The semi-parametric approach is effective and could be useful for delineating the genomic effects of drugs.
Code suitable for use with the ADAPT software program is available from the corresponding author.
This study was conducted to evaluate the applicability of SPLINDID, a semiparametric, model-based approach for obtaining transcription rates from the pharmacodynamics of mRNA expression.
A nonparametric exponential cubic spline function was used to obtain the transcription rate profile, and the dynamics of mRNA expression were fitted using compartmental approaches. The transcription rate profile and the mRNA degradation parameter were estimated using the maximum likelihood method of the ADAPT II software.
Noisy mRNA data sets were simulated for four diverse, pharmaceutically relevant conditions, including receptor nonlinearity, a model in which variant mRNAs differing in their degradation rate constants were transcribed, and a minimal model of the cell cycle. SPLINDID was able to fit the data sets and accurately recapitulate the transcription rate profiles normalized to the mRNA degradation rate constants. The model was also challenged using experimental data containing time profiles of cell-cycle-regulated genes.
The SPLINDID approach is flexible in capturing the complex mRNA profiles encountered in many experimental data sets.
exponential splines; microarray; pharmacodynamics; pharmacokinetics; pharmacogenomic modeling
Biomedical research is now generating large amounts of data, ranging from clinical test results to microarray gene expression profiles. The scale and complexity of these datasets give rise to substantial challenges in data management and analysis. It is highly desirable to apply data warehousing and online analytical processing technologies to biomedical data integration and mining. The major difficulty probably lies in capturing and modelling diverse biological objects and their complex relationships. This paper describes multidimensional data modelling for biomedical data warehouse design. Because conventional models such as the star schema appear insufficient for modelling clinical and genomic data, we develop a new model called the BioStar schema. The new model can capture the rich semantics of biomedical data and provides greater extensibility for the fast evolution of biological research methodologies.
clinical and genomic data integration; multidimensional modeling; data warehouse design
DNA arrays permit rapid, large-scale screening for patterns of gene expression and simultaneously yield the expression levels of thousands of genes for each sample. The number of samples is usually limited, so such datasets are very sparse in the high-dimensional gene space. Furthermore, most of the genes measured may not be of interest, and uncertainty about which genes are relevant makes it difficult to construct an informative gene space. Unsupervised empirical sample pattern discovery and informative gene identification in such sparse, high-dimensional datasets are interesting but challenging problems.
A new model called empirical sample pattern detection (ESPD) is proposed to delineate pattern quality with informative genes. By integrating statistical metrics, data mining and machine learning techniques, this model dynamically measures and manipulates the relationship between samples and genes while conducting an iterative detection of informative space and the empirical pattern. The performance of the proposed method with various array datasets is illustrated.
The functional characterization of newly discovered proteins has been a challenge in the post-genomic era. Protein-protein interactions provide insights for functional analysis because the function of an unknown protein can be postulated on the basis of its interactions with known proteins. Protein-protein interaction data sets have been enriched by high-throughput experimental methods. However, functional analysis based on interaction data has limited accuracy because of experimentally generated false positives and interactions that lack a functional linkage.
Protein-protein interaction data can be integrated with the functional knowledge in the Gene Ontology (GO) database. We apply similarity measures to assess the functional similarity between interacting proteins. We present a probabilistic framework for predicting the functions of unknown proteins based on this functional similarity. We use leave-one-out cross-validation to compare performance. The experimental results demonstrate that our algorithm achieves better prediction accuracy than competing methods. In particular, it handles the high false positive rates of current interaction data well.
Experimentally determined protein-protein interactions alone are error-prone for uncovering functional associations among proteins. The performance of function prediction for uncharacterized proteins can be enhanced by integrating the multiple data sources available.
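One simple way to turn interaction reliabilities into function predictions is a noisy-OR over annotated partners. The sketch below assumes this rule and uses the Jaccard overlap of GO term sets as an example similarity measure. Neither is claimed to be the paper's probabilistic framework or similarity measure, and all names are hypothetical. The reliability of each pair might, for instance, be estimated from the GO similarity of annotated interacting pairs.

```python
def jaccard(terms_a, terms_b):
    """Set-overlap similarity of two GO term sets (a simple stand-in for the paper's measures)."""
    union = set(terms_a) | set(terms_b)
    return len(set(terms_a) & set(terms_b)) / len(union) if union else 0.0


def predict_functions(protein, interactions, functions, reliability, default=0.0):
    """Noisy-OR score for each function carried by the protein's interaction partners.

    interactions: iterable of (u, v) pairs
    functions:    dict mapping protein -> set of known function labels
    reliability:  dict mapping frozenset({u, v}) -> probability that the pair shares function
    """
    scores = {}
    for u, v in interactions:
        if protein not in (u, v):
            continue
        partner = v if u == protein else u
        r = reliability.get(frozenset((u, v)), default)
        for f in functions.get(partner, ()):
            # P(function f) = 1 - product over supporting partners of (1 - r)
            scores[f] = 1.0 - (1.0 - scores.get(f, 0.0)) * (1.0 - r)
    return scores
```

Suppose protein X interacts with A (f1, reliability 0.5), B (f1 and f2, reliability 0.5) and C (f2, reliability 0.2). X then scores 0.75 for f1 and 0.6 for f2.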
Quantitative characterization of the topological characteristics of protein-protein interaction (PPI) networks can enable the elucidation of biological functional modules. Here, we present a novel clustering methodology for PPI networks in which the biological and topological influence of each protein on other proteins is modeled using the probability that the series of interactions needed to link a pair of distant proteins in the network occurs within a time constant (the occurrence probability).
CASCADE selects representative nodes for each cluster and iteratively refines clusters based on a combination of the occurrence probability and the graph topology between every protein pair. The CASCADE approach is compared with nine competing approaches, and the clusters obtained by each technique are compared for enrichment of biological function. CASCADE generates larger clusters, and the clusters identified have p-values for biological function that are approximately 1000-fold better than those of the other methods on the yeast PPI network dataset. An important strength of CASCADE is that the percentage of proteins discarded to create clusters is much lower than for the other approaches, which have an average discard rate of 45% on the yeast protein-protein interaction network.
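The abstract does not state the form of the occurrence-probability distribution. The sketch below rests on an assumption made only for illustration: each interaction in a chain fires after an independent exponential waiting time with rate λ. Under that assumption, a chain of d interactions completes within time T with the Erlang probability P = 1 - Σ_{k=0}^{d-1} e^{-λT}(λT)^k/k!. The sketch applies this to shortest-path hop counts from a source protein. The names and the shortest-path simplification are hypothetical, not CASCADE's definition.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from `source` in an unweighted interaction network."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def erlang_cdf(d, rate, t):
    """P(sum of d iid Exp(rate) waiting times <= t); assumed model of a d-step chain."""
    if d == 0:
        return 1.0
    lt = rate * t
    return 1.0 - sum(math.exp(-lt) * lt**k / math.factorial(k) for k in range(d))


def occurrence_probabilities(adj, source, rate=1.0, t=1.0):
    """Occurrence probability from `source` to every reachable protein (hypothetical model)."""
    return {v: erlang_cdf(d, rate, t) for v, d in bfs_distances(adj, source).items()}
```

On the path a-b-c with λ = T = 1, the probabilities from a are 1 for a itself, 1 - e^{-1} ≈ 0.63 for b and 1 - 2e^{-1} ≈ 0.26 for c. Ranking candidate representatives by their summed occurrence probability would be one conceivable use, but that choice is likewise hypothetical.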
CASCADE is effective at detecting biologically relevant clusters of interactions.
The systematic analysis of protein-protein interactions can enable a better understanding of cellular organization, processes and functions. Functional modules can be identified from the protein interaction networks derived from experimental data sets. However, these analyses are challenging because of the presence of unreliable interactions and the complex connectivity of the network. The integration of protein-protein interactions with the data from other sources can be leveraged for improving the effectiveness of functional module detection algorithms.
We have developed novel metrics, called semantic similarity and semantic interactivity, which use Gene Ontology (GO) annotations to measure the reliability of protein-protein interactions. Protein interaction networks can be converted into a weighted graph representation by assigning each interaction its reliability value as a weight. We present a flow-based modularization algorithm to efficiently identify overlapping modules in the weighted interaction networks. The experimental results show that the semantic similarity and semantic interactivity of interacting pairs were positively correlated with functional co-occurrence. The effectiveness of the algorithm for identifying modules was evaluated using functional categories from the MIPS database. We demonstrated that our algorithm had higher accuracy than other competing approaches.
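The weighted-graph conversion and the idea of flow-based, overlapping modules can be illustrated with a toy propagation rule. In the sketch, each edge weight stands for an interaction's reliability, however it is computed. Flow from a seed protein is split among neighbors in proportion to edge weight and attenuated by that weight. The propagation rule, threshold and names are hypothetical and are not the published algorithm.

```python
from collections import defaultdict


def build_weighted_graph(edges):
    """Undirected weighted graph from (u, v, weight) triples; weight = interaction reliability."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return graph


def propagate_flow(graph, source, steps=3, threshold=0.05):
    """Push one unit of flow outward from `source` for a fixed number of hops.

    Each node splits its flow among all neighbors in proportion to edge weight and
    attenuates it by that weight; flow toward already-reached nodes is dropped
    (hypothetical rule). Returns nodes whose accumulated flow reaches `threshold`.
    """
    received = defaultdict(float)
    received[source] = 1.0
    frontier = {source: 1.0}
    reached = {source}
    for _ in range(steps):
        nxt = defaultdict(float)
        for u, flow in frontier.items():
            total = sum(graph[u].values())
            if total == 0:
                continue
            for v, w in graph[u].items():
                if v not in reached:
                    nxt[v] += flow * (w / total) * w
        for v, flow in nxt.items():
            received[v] += flow
        reached.update(nxt)
        frontier = nxt
    return {v for v, flow in received.items() if flow >= threshold}
```

With edges a-b (0.9), b-c (0.8), c-d (0.1) and a-e (0.1), seeding at a yields the module {a, b, c}, and seeding at d yields {b, c, d}. Proteins b and c belong to both, which is the kind of overlap a flow-based method can express.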
The integration of protein interaction networks with GO annotation data and the capability of detecting overlapping modules substantially improve the accuracy of module identification.
The sparse connectivity of protein-protein interaction data sets makes identification of functional modules challenging. The purpose of this study is to critically evaluate a novel clustering technique for clustering and detecting functional modules in protein-protein interaction networks, termed STM.
STM selects representative proteins for each cluster and iteratively refines clusters based on a combination of the signal transduced and the graph topology. STM is found to be effective at detecting clusters with a diverse range of interaction structures that are significant on measures of biological relevance. The STM approach is compared with six competing approaches, including the maximum clique, quasi-clique, minimum cut, betweenness cut and Markov Clustering (MCL) algorithms. The clusters obtained by each technique are compared for enrichment of biological function. STM generates larger clusters, and the clusters identified have p-values for biological function that are approximately 125-fold better than those of the other methods. An important strength of STM is that the percentage of proteins discarded to create clusters is much lower than for the other approaches.
STM outperforms competing approaches and is capable of effectively detecting both densely and sparsely connected, biologically relevant functional modules with fewer discards.