Adipose tissue inflammation increases with obesity, but adipocyte vs. immune cell contributions are unclear. In the present study, transcriptome analyses were performed on highly-purified subcutaneous adipocytes from lean and obese women, and differentially expressed genes/pathways were determined in both adipocyte and stromal vascular fraction (SVF) samples. Adipocyte but not SVF expression of NOD-like receptor pathway genes, including NLRP3 and PYCARD, which regulate caspase-1-mediated IL-1β secretion, correlated with adiposity phenotypes and adipocyte class II major histocompatibility complex (MHCII) gene expression, but only MHCII remained after adjusting for age and body mass index. IFNγ stimulated adipocyte MHCII, NLRP3 and caspase-1 expression, while adipocyte MHCII-mediated CD4+ T cell activation, an important factor in adipose inflammation, induced IFNγ-dependent adipocyte IL-1β secretion. These results uncover a dialogue regulated by interactions among T cell IFNγ and adipocyte MHCII and NLRP3 inflammasome activity that appears to initiate and escalate adipose tissue inflammation during obesity.
Obesity; adipocyte; transcriptome analysis; network modeling; NOD-like receptor pathway; inflammasome
An image-based profiling and analysis system was developed to predict clinically
effective synergistic drug combinations that could accelerate the identification of
effective multi-drug therapies for the treatment of triple-negative breast cancer (TNBC)
and other challenging malignancies. The identification of effective drug combinations for
the treatment of triple-negative breast cancer was achieved by integrating high-content
screening, computational analysis, and experimental biology. The approach was based on
altered cellular phenotypes induced by 55 FDA-approved drugs and biologically active
compounds, acquired using fluorescence microscopy and retained in multivariate compound
profiles. Dissimilarities between compound profiles guided the identification of 5
combinations, which were assessed for qualitative interaction on TNBC cell growth. The
combination of the microtubule-targeting drug vinblastine with KSP/Eg5 motor protein
inhibitors monastrol or ispinesib showed potent synergism in 3 independent TNBC cell
lines, which was not substantiated in normal fibroblasts. The synergistic interaction was
mediated by an increase in mitotic arrest with cells demonstrating typical
ispinesib-induced monopolar mitotic spindles, which translated into enhanced apoptosis
induction. The antitumor activity of the combination vinblastine / ispinesib was confirmed
in an orthotopic mouse model of TNBC. Compared to single drug treatment, combination
treatment significantly reduced tumor growth without causing increased toxicity.
Image-based profiling and analysis led to the rapid discovery of a drug combination
effective against TNBC in vitro and in vivo, and has the
potential to lead to the development of new therapeutic options in other hard-to-treat cancers.
High-content screening; compound profiling; synergy; triple-negative breast cancer; KSP/Eg5 inhibitors; microtubule-targeting agents
A new type of signaling network element, called cancer signaling bridges (CSB), has been shown to have the potential for systematic and fast-tracked drug repositioning. On the basis of CSBs, we developed a computational model to derive specific downstream signaling pathways that reveal previously unknown target–disease connections and new mechanisms for specific cancer subtypes. The model enables us to reposition drugs based on available patient gene expression data. We applied this model to repurpose known or shelved drugs for brain, lung, and bone metastases of breast cancer with the hypothesis that cancer subtypes have their own specific signaling mechanisms. To test the hypothesis, we identified specific CSBs for each metastasis that satisfy two criteria: (i) CSB proteins are activated by the maximal number of enriched signaling pathways specific to a given metastasis, and (ii) CSB proteins are involved in the most differentially expressed coding genes specific to each breast cancer metastasis. The identified signaling networks for the three types of breast cancer metastases contain 31, 15, and 18 proteins and are used to reposition 15, 9, and 2 drug candidates for the brain, lung, and bone metastases, respectively. We conducted both in vitro and in vivo preclinical experiments as well as analysis on patient tumor specimens to evaluate the targets and repositioned drugs. Of special note, we found that the Food and Drug Administration-approved drugs sunitinib and dasatinib inhibit brain metastases derived from breast cancer, addressing one particularly challenging aspect of this disease.
Triple negative breast cancer (TNBC) is known to contain a high percentage of CD44+/CD24−/low cancer stem cells (CSC), corresponding with a poor prognosis despite systemic chemotherapy. Chloroquine (CQ), an anti-malarial drug, is a lysosomotropic agent that inhibits autophagy. CQ was identified as a potential CSC inhibitor through in silico gene expression signature analysis of the CD44+/CD24−/low CSC population. Autophagy plays a critical role in adaptation to stress conditions in cancer cells, and is related to drug resistance and CSC maintenance. Thus, the objectives of this study were to examine the potential enhanced efficacy arising from the addition of CQ to standard chemotherapy (paclitaxel) in TNBC and to identify the mechanism by which CQ eliminates CSCs in TNBC. Herein, we report that CQ sensitizes TNBC cells to paclitaxel through inhibition of autophagy and reduces the CD44+/CD24−/low CSC population in both preclinical and clinical settings. We are also the first to report a mechanism by which CQ regulates the CSCs in TNBC: inhibition of the Janus-activated kinase 2 (Jak2) - Signal transducer and activator of transcription 3 (STAT3) signaling pathway by reducing the expression of Jak2 and DNA methyltransferase 1 (DNMT1).
Breast Cancer Stem Cells; Autophagy; Chloroquine; DNMT1; Jak2
Irinotecan (CPT-11)-induced diarrhea occurs frequently in cancer patients and limits the drug's use. Bacterial β-glucuronidase (GUS) enzymes in the intestine convert the non-toxic metabolite of CPT-11, SN-38G, to toxic SN-38, which ultimately damages intestinal epithelial cells and causes diarrhea. We previously reported amoxapine as a potent GUS inhibitor in vitro. To further understand the molecular mechanism of amoxapine and its potential for treatment of CPT-11-induced diarrhea, we studied the binding modes of amoxapine and its metabolites by docking and molecular dynamics simulation, and tested the in vivo efficacy in mice in combination with CPT-11.
The binding of amoxapine, its metabolites 7-hydroxyamoxapine and 8-hydroxyamoxapine, and a control drug, loxapine, to GUS was explored by computational protocols. The in vitro potencies of the metabolites were measured using E. coli GUS enzyme and cell-based assays. A low-dose daily oral regimen was designed for use alongside CPT-11 in tumor-bearing mice.
Computational modeling results indicated that amoxapine and its metabolites bound in the active site of GUS and satisfied critical pharmacophore features: aromatic features near bacterial loop residue F365’ and a hydrogen bond toward E413. Amoxapine and its metabolites were shown to be potent in vitro. Administration of low doses of amoxapine with CPT-11 in mice achieved significant suppression of diarrhea and reduced tumor growth.
Amoxapine has great clinical potential to be rapidly translated to human subjects for irinotecan induced diarrhea.
amoxapine; β-glucuronidase; irinotecan (CPT-11); molecular docking and dynamics simulation; drug reposition
Personalized genomics instability, e.g., somatic mutations, is believed to contribute to the heterogeneous drug responses in patient cohorts. However, it is difficult to discover personalized driver mutations that are predictive of drug sensitivity owing to diverse and complex mutations of individual patients. To circumvent this problem, a novel computational method is presented to discover potential drug sensitivity relevant cancer subtypes and identify driver mutation modules of individual subtypes by coupling differentially expressed genes (DEGs) based subtyping analysis with the driver mutation network analysis.
The proposed method was applied to breast cancer and lung cancer samples available from The Cancer Genome Atlas (TCGA). Cancer subtypes were uncovered with significantly different survival rates, and more interestingly, distinct driver mutation modules were also discovered among different subtypes, indicating the potential mechanism of heterogeneous drug sensitivity.
The research findings can be used to help guide the repurposing of known drugs and their combinations in order to target these dysfunctional modules and their downstream signaling effectively for achieving personalized or precision medicine treatment.
Dendritic spines, the bulbous protrusions that form the postsynaptic half of excitatory synapses, are one of the most prominent features of neurons and have been imaged and studied for over a century. In that time, changes in the number and morphology of dendritic spines have been correlated to the developmental process as well as the pathophysiology of a number of neurodegenerative diseases. Due to the sheer scale of synaptic connectivity in the brain, work to date has merely scratched the surface in the study of normal spine function and pathology. This review will highlight traditional approaches to the imaging of dendritic spines and newer approaches made possible by advances in microscopy, protein engineering, and image analysis. The review will also describe recent work that is leading researchers toward the possibility of a systematic and comprehensive study of spine anatomy throughout the brain.
Microscopy; Image Analysis; Dendritic spines; Two-photon; Informatics; Transgenic mice
Adipose-resident T-cells (ARTs) regulate metabolic and inflammatory responses in obesity, but ART activation signals are poorly understood. Here, we describe class II major histocompatibility complex (MHCII) as an important component of high-fat diet (HFD)-induced obesity. Microarray analysis of primary adipocytes revealed that multiple genes involved in MHCII antigen processing and presentation increased in obese women. In mice, adipocyte MHCII increased within two weeks HFD, paralleling increases in pro-inflammatory and decreases in anti-inflammatory ART markers, and preceding adipose tissue macrophage (ATM) accumulation and pro-inflammatory M1 polarization. Mouse 3T3-L1 and primary adipocytes activated T-cells in an antigen-specific, contact-dependent manner, indicating adipocyte MHCII is functional. HFD-fed MHCII−/− mice developed less adipose inflammation and insulin resistance than wild-type mice, despite developing similar adiposity. These investigations uncover a mechanism whereby a HFD-induced adipocyte/ART dialogue involving MHCII instigates adipose inflammation and, together with ATM MHCII, escalates its progression.
Obesity; adipose tissue; MHCII; T-cell activation
Recent reports indicate that a subgroup of tumor cells named cancer stem cells (CSCs) or tumor initiating cells (TICs) are responsible for tumor initiation, growth and drug resistance. This subgroup of tumor cells has self-renewal capacity and can differentiate into heterogeneous tumor cell populations through asymmetric proliferation. The CSC concept provides informative insights into tumor initiation, metastasis and treatment. However, the mechanisms by which CSCs regulate tumor behavior remain unclear owing to the complexity of the cancer system. To study the functions of CSCs in the complex tumor system, a few mathematical models have been proposed. However, the effects of microenvironment (mE) factors; the behaviors of CSCs, progenitor tumor cells (PCs), and differentiated tumor cells (TCs); and the impact of CSC fraction and signaling heterogeneity have not yet been adequately explored.
In this study, a novel 3D multi-scale mathematical model is proposed to investigate the behaviors of CSCs in tumor progression. The model integrates CSCs, PCs, and TCs together with a few essential mE factors. With this model, we simulated and investigated tumor development and drug response under different CSC content and heterogeneity.
The simulation results showed that the fraction of CSCs plays a critical role in driving tumor progression and drug resistance. They also showed that treatment with a chemotherapeutic drug alone was not successful, as it resulted in a significant increase in the CSC fraction. They further showed that the self-renewal heterogeneity of the initial CSC population is a cause of the heterogeneity of the derived tumors in terms of CSC fraction and response to drug treatment.
The proposed 3D multi-scale model provides a new tool for investigating the behaviors of CSC in CSC-initiated tumors, which enables scientists to investigate and generate testable hypotheses about CSCs in tumor development and drug response under different microenvironments and drug perturbations.
Subtypes are widely found in cancer. They are characterized by different clinical and molecular profiles, such as survival rates, gene signatures and copy number aberrations (CNAs). While cancer is generally believed to be caused by genetic aberrations, the number of such events in cancer tissue is tremendous and only a small subset of them may be tumorigenic. On the other hand, the gene expression signature of a subtype represents residuals of the subtype-specific cancer mechanisms. Using high-throughput data to link these factors, define subtype boundaries and identify subtype-specific drivers is a promising yet largely unexplored topic.
We report a systematic method to automate the identification of cancer subtypes and candidate drivers. Specifically, we propose an iterative algorithm that alternates between gene expression clustering and gene signature selection. We applied the method to datasets of the pediatric cerebellar tumor medulloblastoma (MB). The subtyping algorithm consistently converges on multiple datasets of medulloblastoma, and the converged signatures and copy number landscapes are also found to be highly reproducible across the datasets. Based on the identified subtypes, we developed a PCA-based approach for subtype-specific identification of cancer drivers. The top-ranked driver candidates are found to be enriched with known pathways in certain subtypes of MB. This might reveal new understandings for these subtypes.
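The alternation between clustering and signature selection described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the clustering algorithm or the selection criterion, so plain k-means and a ranking by the spread of cluster means are used as stand-ins, and the function names (`kmeans`, `select_signature`, `iterative_subtyping`) are hypothetical.

```python
def kmeans(samples, genes, k, n_iter=20):
    """Plain k-means restricted to the gene columns in `genes`; returns labels.

    Stand-in clustering step (assumption): the source does not specify one.
    """
    # Deterministic initialization: evenly spaced samples serve as centroids.
    step = len(samples) // k
    centroids = [[samples[c * step][g] for g in genes] for c in range(k)]
    labels = [0] * len(samples)
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (squared Euclidean distance).
        for i, sample in enumerate(samples):
            point = [sample[g] for g in genes]
            dists = [sum((p - q) ** 2 for p, q in zip(point, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = [sum(m[g] for m in members) / len(members) for g in genes]
    return labels


def select_signature(samples, labels, k, n_genes):
    """Keep the n_genes whose cluster means differ most (assumed criterion)."""
    n_total = len(samples[0])
    scores = []
    for g in range(n_total):
        means = []
        for c in range(k):
            vals = [s[g] for s, lab in zip(samples, labels) if lab == c]
            if vals:
                means.append(sum(vals) / len(vals))
        grand = sum(means) / len(means)
        scores.append(sum((m - grand) ** 2 for m in means))
    ranked = sorted(range(n_total), key=lambda g: scores[g], reverse=True)
    return sorted(ranked[:n_genes])


def iterative_subtyping(samples, k, n_genes, max_rounds=10):
    """Alternate clustering and signature selection until both stop changing."""
    genes = list(range(len(samples[0])))  # start from all genes
    labels = None
    for _ in range(max_rounds):
        new_labels = kmeans(samples, genes, k)
        new_genes = select_signature(samples, new_labels, k, n_genes)
        if new_labels == labels and new_genes == genes:
            break  # converged: same subtypes and same signature
        labels, genes = new_labels, new_genes
    return labels, genes
```

Called on a small expression matrix (rows are samples, columns are genes), `iterative_subtyping(samples, k=2, n_genes=1)` returns one subtype label per sample and the indices of the retained signature genes. Convergence here simply means that neither the labels nor the signature changed between rounds.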
This article is an extended abstract of our ICCABS '12 paper (Chen et al. 2012), with revised methods in iterative subtyping, the use of canonical correlation analysis for driver-identification, and an extra dataset (Northcott90 dataset) for cross-validations. Discussions of the algorithm performance and of the slightly different gene lists identified are also added.
Our study indicates that subtype-signature defines the subtype boundaries, characterizes the subtype-specific processes and can be used to prioritize signature-related drivers.
subtypes of cancer; medulloblastoma; gene signature; copy number aberrations; microarrays; driver genes
Automatic or semi-automatic segmentation and tracking of artery trees from computed tomography angiography (CTA) is an important step to improve the diagnosis and treatment of artery diseases, but it remains a significant challenge. In this paper, we present an artery extraction method to address the challenge. The proposed method consists of two steps: (1) a geometric moments based tracking to secure a rough centerline, and (2) a fully automatic generalized cylinder structure-based snake method to refine the centerlines and estimate the radii of the arteries. In this method, a new line direction based on first and second order geometric moments is adopted while both gradient and intensity information are used in the snake model to improve the accuracy. The approach has been evaluated on synthetic images as well as 8 clinical coronary CTA images with 32 coronary arteries. Our method achieves 94.7% overlap tracking ability within an average distance inside the vessel of 0.36 mm.
Predicting drug-protein interactions from heterogeneous biological data sources is a key step for in silico drug discovery. The difficulty of this prediction task lies in the rarity of known drug-protein interactions and myriad unknown interactions to be predicted. To meet this challenge, a manifold regularization semi-supervised learning method is presented to tackle this issue by using labeled and unlabeled information which often generates better results than using the labeled data alone. Furthermore, our semi-supervised learning method integrates known drug-protein interaction network information as well as chemical structure and genomic sequence data.
Using the proposed method, we predicted certain drug-protein interactions on the enzyme, ion channel, GPCRs, and nuclear receptor data sets. Some of them are confirmed by the latest publicly available drug targets databases such as KEGG.
We report encouraging results of using our method for drug-protein interaction network reconstruction which may shed light on the molecular interaction inference and new uses of marketed drugs.
A variety of algorithms have been proposed for brain tumor segmentation from multi-channel sequences; however, most of them require isotropic or pseudo-isotropic resolution of the MR images. Although co-registration and interpolation of low-resolution sequences, such as T2-weighted images, onto the space of the high-resolution image, such as a T1-weighted image, can be performed prior to the segmentation, the results are usually limited by partial volume effects due to interpolation of low-resolution images. To improve the quality of tumor segmentation in clinical applications where low-resolution sequences are commonly used together with high-resolution images, we propose an algorithm based on a Spatial accuracy-weighted Hidden Markov random field and Expectation maximization (SHE) approach for both automated tumor and enhanced-tumor segmentation. SHE incorporates the spatial interpolation accuracy of low-resolution images into the optimization procedure of the Hidden Markov Random Field (HMRF) to segment tumor using multi-channel MR images with different resolutions, e.g., high-resolution T1-weighted and low-resolution T2-weighted images. In experiments, we evaluated this algorithm using a set of simulated multi-channel brain MR images with known ground-truth tissue segmentation and also applied it to a dataset of MR images obtained during clinical trials of brain tumor chemotherapy. The results show that SHE yields more accurate tumor segmentation than conventional multi-channel segmentation algorithms.
DNA copy number aberration (CNA) is very important in the pathogenesis of tumors and other diseases. For example, CNAs may result in suppression of anti-oncogenes and activation of oncogenes, which can cause certain types of cancer. High-density single nucleotide polymorphism (SNP) array data is widely used for CNA detection. However, it is nontrivial to detect CNAs automatically because the signals obtained from high-density SNP arrays often have a low signal-to-noise ratio (SNR), which might be caused by whole genome amplification, mixtures of normal and tumor cells, experimental noise or other technical limitations. As SNR decreases, many false CNA regions are detected and true CNA regions are missed. Thus, more sophisticated statistical models are needed to make CNA detection from low-SNR signals more robust and reliable.
This paper presents a conditional random pattern (CRP) model for CNA detection in which many contextual cues are exploited to suppress the noise and improve CNA detection accuracy. Both simulated and real data are used to evaluate the proposed model, and the validation results show that the CRP model is more robust and reliable in the presence of noise for CNA detection using high-density SNP array data, compared to a number of widely used software packages.
The proposed conditional random pattern (CRP) model could effectively detect the CNA regions in the presence of noise.
The human brain cortex is a highly convoluted sheet. Mapping of the cortical surface into a canonical coordinate space is an important tool for the study of the structure and function of the brain. Here, we present a technique based on least-square conformal mapping with spring energy for the mapping of the cortical surface. This method aims to reduce the metric and area distortion while maintaining the conformal map and computational efficiency. We demonstrate through numerical results that this method effectively controls metric and area distortion, and is computationally efficient. This technique is particularly useful for fast visualization of the brain cortex.
conformal mapping; brain cortex; spring energy; brain mapping; angle distortion; area distortion
Cancer and Alzheimer's disease (AD) are two seemingly distinct diseases and rarely occur simultaneously in patients. To explore molecular determinants differentiating pathogenic routes towards AD or cancer, we investigate the role of amyloid β protein (Aβ) on multiple tumor cell lines that are stably expressing luciferase (human glioblastoma U87; human breast adenocarcinoma MDA-MB231; and mouse melanoma B16F).
Quantification of the photons emitted from the MDA-MB231 or B16F cells revealed a significant inhibition of cell proliferation by the conditioning media (CM) derived from amyloid precursor protein (APP) over-expressing cells. The inhibition of U87 cells was observed only after the media was conditioned for longer than 2 days with APP over-expressing cells.
Our results suggest that Aβ plays an inhibitory role in tumor cell proliferation; this effect could depend on the type of tumor cells and amount of Aβ.
The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. Historically, the interpretation of cellular phenotypes in different experimental conditions has been dependent upon the expert opinions of well-trained biologists. Such qualitative analysis is particularly effective in detecting subtle, but important, deviations in phenotypes. However, while the rapid and continuing development of automated microscope-based technologies now facilitates the acquisition of trillions of cells in thousands of diverse experimental conditions, such as in the context of RNA interference (RNAi) or small-molecule screens, the massive size of these datasets precludes human analysis. Thus, the development of automated methods that aim to identify novel and biologically relevant phenotypes online is one of the major challenges in high-throughput image-based screening. Ideally, phenotype discovery methods should be designed to utilize prior/existing information and tackle three challenging tasks, i.e., restoring pre-defined biologically meaningful phenotypes, differentiating novel phenotypes from known ones, and distinguishing novel phenotypes from each other. Arbitrarily extracted information causes biased analysis, while combining the complete existing datasets with each new image is intractable in high-throughput screens.
Here we present the design and implementation of a novel and robust online phenotype discovery method with broad applicability that can be used in diverse experimental contexts, especially high-throughput RNAi screens. This method features phenotype modelling and iterative cluster merging using improved gap statistics. A Gaussian Mixture Model (GMM) is employed to estimate the distribution of each existing phenotype, and is then used as the reference distribution in gap statistics. This method is broadly applicable to a number of different types of image-based datasets derived from a wide spectrum of experimental conditions and is suitable for adaptively processing new images that are continuously added to existing datasets. Validations were carried out on different datasets, including a published RNAi screen using Drosophila embryos [Additional files 1, 2], a dataset for cell cycle phase identification using HeLa cells [Additional files 1, 3, 4], and a synthetic dataset using polygons; our method tackled the three aforementioned tasks effectively with an accuracy range of 85%–90%. When our method is implemented in the context of a Drosophila genome-scale RNAi image-based screen of cultured cells aimed at identifying the contribution of individual genes towards the regulation of cell shape, it efficiently discovers meaningful new phenotypes and provides novel biological insight. We also propose a two-step procedure to modify the novelty detection method based on one-class SVM, so that it can be used for online phenotype discovery. Under different conditions, we compared the SVM-based method with our method using various datasets, and our method consistently outperformed the SVM-based method in at least two of the three tasks by 2% to 5%. These results demonstrate that our method can be used to better identify novel phenotypes in image-based datasets from a wide range of conditions and organisms.
We demonstrate that our method can detect various novel phenotypes effectively in complex datasets. Experimental results also confirm that our method performs consistently under different orders of image input, variations in starting conditions (including the number and composition of existing phenotypes), and datasets from different screens. Our findings indicate that the proposed method is suitable for online phenotype discovery in diverse high-throughput image-based genetic and chemical screens.
High content screening (HCS)-based image analysis is becoming an important and widely used research tool. Capitalizing on this technology, ample cellular information can be extracted from high content cellular images. In this study, an automated, reliable and quantitative cellular image analysis system developed in house has been employed to quantify the toxic responses of human H4 neuroglioma cells exposed to metal oxide nanoparticles. This system has proved to be an essential tool in our study.
The cellular images of H4 neuroglioma cells exposed to different concentrations of CuO nanoparticles were sampled using an IN Cell Analyzer 1000. A fully automated cellular image analysis system has been developed to perform the image analysis for cell viability. A multiple adaptive thresholding method was used to classify the pixels of the nuclei image into three classes: bright nuclei, dark nuclei, and background. During the development of our image analysis methodology, we achieved the following: (1) Gaussian filtering at an appropriate scale was applied to the cellular images to generate a local intensity maximum inside each nucleus; (2) a novel local intensity maxima detection method based on the gradient vector field was established; and (3) a statistical model-based splitting method was proposed to overcome the under-segmentation problem. Computational results indicate that 95.9% of nuclei can be detected and segmented correctly by the proposed image analysis system.
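To make the three-class pixel labelling concrete, the sketch below picks two thresholds by a brute-force multi-level Otsu criterion and then labels each pixel. This is a swapped-in, simpler technique, not the multiple adaptive thresholding method of the study: the thresholds here are global rather than locally adaptive, and the function names are hypothetical.

```python
def three_class_thresholds(pixels):
    """Return (t1, t2) maximizing between-class variance over three classes.

    Classes are: value <= t1, t1 < value <= t2, value > t2.
    Brute-force multi-level Otsu (assumption, not the source's method);
    adequate for small inputs only.
    """
    values = sorted(set(pixels))
    if len(values) < 3:
        raise ValueError("need at least three distinct intensities")
    n = len(pixels)
    global_mean = sum(pixels) / n
    best_score, best_pair = -1.0, None
    # Candidate thresholds are chosen so that all three classes are non-empty.
    for i in range(len(values) - 2):
        for j in range(i + 1, len(values) - 1):
            t1, t2 = values[i], values[j]
            groups = (
                [p for p in pixels if p <= t1],
                [p for p in pixels if t1 < p <= t2],
                [p for p in pixels if p > t2],
            )
            # Weighted squared distance of each class mean from the global mean.
            score = sum(
                len(g) / n * (sum(g) / len(g) - global_mean) ** 2 for g in groups
            )
            if score > best_score:
                best_score, best_pair = score, (t1, t2)
    return best_pair


def classify(pixels, t1, t2):
    """Label each pixel as background, dark nucleus, or bright nucleus."""
    labels = []
    for p in pixels:
        if p <= t1:
            labels.append("background")
        elif p <= t2:
            labels.append("dark nucleus")
        else:
            labels.append("bright nucleus")
    return labels
```

A locally adaptive variant would compute such thresholds per image region instead of once for the whole image; that refinement is omitted here.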
The proposed automated image analysis system can effectively segment the images of human H4 neuroglioma cells exposed to CuO nanoparticles. The computational results confirmed our biological finding that human H4 neuroglioma cells had a dose-dependent toxic response to the insult of CuO nanoparticles.
Reliable segmentation of cell nuclei from three dimensional (3D) microscopic images is an important task in many biological studies. We present a novel, fully automated method for the segmentation of cell nuclei from 3D microscopic images. It was designed specifically to segment nuclei in images where the nuclei are closely juxtaposed or touching each other. The segmentation approach has three stages: 1) a gradient diffusion procedure, 2) gradient flow tracking and grouping, and 3) local adaptive thresholding.
Both qualitative and quantitative results on synthesized and original 3D images are provided to demonstrate the performance and generality of the proposed method. Both the over-segmentation and under-segmentation percentages of the proposed method are around 5%. The volume overlap, compared to expert manual segmentation, is consistently over 90%.
The proposed algorithm is able to segment closely juxtaposed or touching cell nuclei obtained from 3D microscopy imaging with reasonable accuracy.
Automated identification of cell cycle phases of individual live cells in a large population captured via automated fluorescence microscopy technique is important for cancer drug discovery and cell cycle studies. Time-lapse fluorescence microscopy images provide an important method to study the cell cycle process under different conditions of perturbation. Existing methods are limited in dealing with such time-lapse data sets while manual analysis is not feasible. This paper presents statistical data analysis and statistical pattern recognition to perform this task.
The data is generated from HeLa H2B GFP cells imaged during a 2-day period with images acquired 15 minutes apart using automated time-lapse fluorescence microscopy. The patterns are described with four kinds of features, including twelve general features, Haralick texture features, Zernike moment features, and wavelet features. To generate a new set of features with more discriminative power, commonly used feature reduction techniques are applied, which include Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Maximum Margin Criterion (MMC), Stepwise Discriminant Analysis based Feature Selection (SDAFS), and Genetic Algorithm based Feature Selection (GAFS). Then, we propose a Context Based Mixture Model (CBMM) for dealing with the time-series cell sequence information and compare it to other traditional classifiers: Support Vector Machine (SVM), Neural Network (NN), and K-Nearest Neighbor (KNN). As is standard practice in machine learning, we systematically compare the performance of a number of common feature reduction techniques and classifiers to select an optimal combination of a feature reduction technique and a classifier. A cellular database containing 100 manually labelled subsequences is built for evaluating the performance of the classifiers. The generalization error is estimated using the cross validation technique. The experimental results show that CBMM outperforms all other classifiers in identifying prophase and has the best overall performance.
The application of feature reduction techniques can improve the prediction accuracy significantly. CBMM can effectively utilize the contextual information and has the best overall performance when combined with any of the previously mentioned feature reduction techniques.
Owing to the rapid expansion of protein structure databases in recent years, methods of structure comparison are becoming increasingly effective and important in revealing novel information on functional properties of proteins and their roles in the grand scheme of evolutionary biology. Currently, the structural similarity between two proteins is measured by the root-mean-square-deviation (RMSD) in their best-superimposed atomic coordinates. RMSD is the gold standard for measuring structural similarity when the structures are nearly identical; it, however, fails to detect the higher order topological similarities in proteins evolved into different shapes. We propose new algorithms for extracting geometrical invariants of proteins that can be effectively used to identify homologous protein structures or topologies in order to quantify both close and remote structural similarities.
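For reference, the RMSD measure mentioned above can be computed as in the following minimal sketch. It assumes the two structures have already been optimally superimposed and that atoms are paired by index. The superposition step itself is not shown, and the function name `rmsd` is chosen here for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumes the structures are already superimposed and atoms are paired
    by index; no alignment is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Because every atom pair contributes equally, two proteins with the same topology but different overall shapes can have a large RMSD, which is the limitation the proposed invariants are meant to address.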
We measure structural similarity between proteins by correlating the principal components of their secondary structure interaction matrix. In our approach, the Principal Component Correlation (PCC) analysis, a symmetric interaction matrix for a protein structure is constructed with relationship parameters between secondary elements that can take the form of distance, orientation, or other relevant structural invariants. When using a distance-based construction in the presence or absence of encoded N to C terminal sense, there are strong correlations between the principal components of interaction matrices of structurally or topologically similar proteins.
The PCC method is extensively tested for protein structures that belong to the same topological class but are significantly different by RMSD measure. The PCC analysis can also differentiate proteins having similar shapes but different topological arrangements. Additionally, we demonstrate that when using two independently defined interaction matrices, comparison of their maximum eigenvalues can be highly effective in clustering structurally or topologically similar proteins. We believe that the PCC analysis of interaction matrix is highly flexible in adopting various structural parameters for protein structure comparison.
We applied the alternating decision trees (ADTrees) method to the last 3 replicates from the Aipotu, Danacca, Karangar, and NYC populations in the Problem 2 simulated Genetic Analysis Workshop dataset. Using information from the 12 binary phenotypes and sex as input and Kofendrerd Personality Disorder disease status as the outcome of ADTrees-based classifiers, we obtained a new quantitative trait based on average prediction scores, which was then used for genome-wide quantitative trait linkage (QTL) analysis. ADTrees are machine learning methods that combine boosting and decision tree algorithms to generate smaller and easier-to-interpret classification rules. In this application, we compared four modeling strategies from the combinations of two boosting iterations (log or exponential loss functions) coupled with two choices of tree generation types (a full alternating decision tree or a classic boosting decision tree). These four different strategies were applied to the founders in each population to construct four classifiers, which were then applied to each study participant. To compute the average prediction score for each subject with a specific trait profile, this process was repeated with 10 runs of 10-fold cross validation, and standardized prediction scores obtained from the 10 runs were averaged and used in subsequent expectation-maximization Haseman-Elston QTL analyses (implemented in GENEHUNTER) with the approximately 900 SNPs in Hardy-Weinberg equilibrium provided for each population. Our QTL analyses on the basis of four models (a full alternating decision tree and a classic boosting decision tree paired with either log or exponential loss function) detected evidence for linkage (Z ≥ 1.96, p < 0.01) on chromosomes 1, 3, 5, and 9.
Moreover, using average iteration and abundance scores for the 12 phenotypes and sex as their relevancy measurements, we found all relevant phenotypes for all four populations except phenotype b for the Karangar population, with suggested subgroup structure consistent with latent traits used in the model. In conclusion, our findings suggest that the ADTrees method may offer a more accurate representation of the disease status that allows for better detection of linkage evidence.
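The score-averaging step used to build the quantitative trait (standardize the prediction scores within each cross-validation run, then average across runs) might look like the sketch below. The z-score with a population standard deviation is an assumption, since the abstract does not state how scores were standardized, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Z-score the prediction scores from one cross-validation run.

    Uses the population standard deviation (assumption).
    """
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]  # constant scores carry no information
    return [(s - mu) / sigma for s in scores]


def average_standardized_scores(runs):
    """Average per-subject standardized scores across runs.

    `runs` is a list of runs; each run holds one score per subject,
    with subjects in the same order in every run.
    """
    n_subjects = len(runs[0])
    if any(len(run) != n_subjects for run in runs):
        raise ValueError("every run must score the same subjects")
    standardized = [standardize(run) for run in runs]
    return [mean(run[i] for run in standardized) for i in range(n_subjects)]
```

Standardizing before averaging keeps a run with an unusually wide score range from dominating the resulting trait values.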