Results 1-4 (4)
Prediction of lung tumor types based on protein attributes by machine learning algorithms
KayvanJoo, Amir Hossein
Early diagnosis of lung cancers and distinction between the tumor types, Small Cell Lung Cancer (SCLC) and Non-Small Cell Lung Cancer (NSCLC), are very important for increasing patient survival rates. Herein, we propose a diagnostic system based on sequence-derived structural and physicochemical attributes of proteins involved in both types of tumors, built through feature extraction, feature selection and prediction models. A total of 1497 protein attributes were computed, and important features were selected by 12 attribute weighting models. Machine learning models, consisting of seven SVM models, three ANN models and two NB models, were then applied to the original database and to the new datasets created by the attribute weighting models; model accuracies were calculated through 10-fold cross-validation and wrapper validation (SVM algorithms only). In line with our previous findings, dipeptide composition, autocorrelation and distribution descriptors were the most important protein features selected by the bioinformatics tools. The performance of the algorithms in predicting lung tumor type increased when they were applied to datasets created by attribute weighting models rather than to the original dataset. Wrapper validation performed better than cross-validation (X-Validation); the best cancer type prediction resulted from the SVM and SVM Linear models (82%). The best ANN accuracy was obtained when the Neural Net model was applied to the SVM dataset (88%). This is the first report suggesting that the combination of protein features and attribute weighting models with machine learning algorithms can be used effectively to predict the type of lung cancer tumor (SCLC or NSCLC).
Electronic supplementary material
The online version of this article (doi:10.1186/2193-1801-2-238) contains supplementary material, which is available to authorized users.
Lung cancer; Prediction; Structural and physicochemical features; Attribute weighting; Support vector machine; Artificial neural network; Naïve Bayes
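The abstract above names dipeptide composition as one of the most informative sequence-derived features. As a rough illustration only, the sketch below computes the commonly used definition of that descriptor (the relative frequency of each of the 400 ordered pairs of adjacent standard residues). The function name, the handling of non-standard residues, and the example sequence are assumptions made here, not details taken from the paper or its tools.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def dipeptide_composition(sequence):
    """Map each of the 400 dipeptides to its relative frequency in `sequence`.

    Hypothetical helper: adjacent pairs that contain a non-standard residue
    are skipped rather than counted (an assumption of this sketch).
    """
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid pairs): return an all-zero vector.
        return {pair: 0.0 for pair in counts}
    return {pair: n / total for pair, n in counts.items()}

# Made-up peptide used only to exercise the function.
features = dipeptide_composition("MKKLLAK")
```

Each protein then contributes a fixed-length vector of 400 values, which is the kind of attribute a weighting or classification model can consume alongside other descriptors.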
Thermal Unfolding Pathway of PHD2 Catalytic Domain in Three Different PHD2 Species: Computational Approaches
Proctor, Elizabeth A.
Dokholyan, Nikolay V.
Moosavi-Movahedi, Ali A.
Uversky, Vladimir N.
Prolyl hydroxylase domain 2 containing protein (PHD2) is a key protein in the regulation of angiogenesis and metastasis. Under normoxic conditions, PHD2 triggers the degradation of hypoxia-inducible factor 1 (HIF-1α), which induces the expression of hypoxia response genes. The correct function of PHD2 would therefore inhibit angiogenesis and the consequent metastasis of tumor cells under normoxic conditions. PHD2 mutations have been reported in some common cancers. However, high levels of HIF-1α protein have been observed even in normoxic metastatic tumors with normal expression of wild-type PHD2. PHD2 malfunction due to protein misfolding may be the underlying reason for metastasis and invasion in such cases. In this study, we scrutinize the unfolding pathways of the possible species of the PHD2 catalytic domain and characterize their unfolded states by computational approaches. Our study introduces the possibility of an aggregation disaster for the prominent species of PHD2 during its partial unfolding. This may explain the inability of PHD2 to regulate HIF-1α levels in some normoxic tumor types.
Classification of Lung Cancer Tumors Based on Structural and Physicochemical Properties of Proteins by Bioinformatics Models
Rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important in the diagnosis of this disease. Furthermore, sequence-derived structural and physicochemical descriptors are very useful for machine learning prediction of protein structural and functional classes, for classifying proteins and for improving prediction performance. In this study, the classification of lung tumors based on 1497 attributes derived from structural and physicochemical properties of protein sequences (based on genes defined by microarray analysis) was investigated through a combination of attribute weighting, supervised and unsupervised clustering algorithms. Eighty percent of the weighting methods selected features such as autocorrelation, dipeptide composition and distribution of hydrophobicity as the most important protein attributes in the classification of the SCLC, NSCLC and COMMON classes of lung tumors. Most tree induction algorithms gave the same results; descriptors of hydrophobicity distribution were high in protein sequences COMMON to both groups, while the distribution of charge in these proteins was very low, indicating that COMMON proteins were very hydrophobic. Furthermore, polar dipeptide composition was higher in SCLC proteins than in NSCLC proteins. Some clustering models (alone or in combination with attribute weighting algorithms) came close to correctly classifying SCLC and NSCLC proteins. The Random Forest tree induction algorithm, evaluated with leave-one-out and 10-fold cross-validation, showed more than 86% accuracy in clustering and predicting the three lung cancer tumor classes. This is the first report of data mining tools being applied to effectively classify three classes of lung cancer tumors, and it highlights the importance of dipeptide composition, autocorrelation and distribution descriptors.
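The abstract above reports accuracy estimated with leave-one-out and 10-fold cross-validation. The sketch below shows, under simple assumptions, how the two schemes partition sample indices into training and test sets; leave-one-out is the special case where the number of folds equals the number of samples. The function names are hypothetical, samples are not shuffled, and nothing here reproduces the authors' software or settings.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds.

    Illustrative helper: a real experiment would normally shuffle (and often
    stratify) the samples before splitting.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0)
                  for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def leave_one_out_splits(n_samples):
    """Leave-one-out is k-fold with one sample per fold."""
    return k_fold_splits(n_samples, n_samples)

folds = list(k_fold_splits(25, 10))      # 25 made-up samples, 10 folds
loo = list(leave_one_out_splits(5))      # 5 made-up samples
```

Every sample appears in exactly one test fold, so averaging the per-fold accuracy uses each protein once for evaluation and never for training in the same fold.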
New scoring schema for finding motifs in DNA Sequences
Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology, with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search for (or predict) known binding sites in a new DNA sequence. To do this, all subsequences of the given DNA sequence are scored with a scoring function, and the prediction is made by selecting the best score. Most of the available tools for known binding site prediction are designed under the assumption that binding site base positions are independent of one another. Recently, Tomovic and Oakeley investigated the statistical basis for a claim of either dependence or independence, to determine whether such a claim is generally true, and they presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions that can be used in known binding site prediction under the assumption of either dependency or independency between binding site base positions.
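The scan-and-score procedure described above can be illustrated with the familiar position-independent case. The sketch below builds a log-odds position weight matrix from a handful of aligned sites, scores every window of a sequence, and reports the best one. The pseudocount, the uniform background, the function names, and the example sites and sequence are all assumptions of this sketch; it is not the scoring function of any specific tool mentioned in the abstract.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds matrix from equal-length aligned sites.

    Each position is modeled independently of the others.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        scores = {}
        for base in BASES:
            freq = (column.count(base) + pseudocount) / (len(sites) + 4 * pseudocount)
            scores[base] = math.log2(freq / background)
        pwm.append(scores)
    return pwm

def best_site(sequence, pwm):
    """Score every window of `sequence` (A/C/G/T only) and return
    (start, score) of the highest-scoring one, or None if too short."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best

# Made-up aligned sites and target sequence, for illustration only.
sites = ["TATAAT", "TATAAT", "TACAAT", "TATACT"]
pwm = build_pwm(sites)
start, score = best_site("GGCGTATAATGCC", pwm)
```

Because the total score is a plain sum over positions, this model cannot reward or penalize particular combinations of bases at different positions, which is the limitation that dependency-aware scoring functions address.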
We propose a new scoring function based on the dependency between all positions in a binding site. This scoring function uses joint information content and mutual information as measures of dependency between positions in a transcription factor binding site (TFBS). Our method for modeling dependencies is simply an extension of position independency methods. We evaluate our new scoring function on real data sets extracted from the JASPAR and TRANSFAC databases, and compare the results with those of two other well-known scoring functions.
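Mutual information, named above as a measure of dependency between positions, can be estimated directly from aligned binding sites. The sketch below computes the standard plug-in estimate (in bits) for a pair of columns. It shows only this one ingredient; how the paper combines it with joint information content into a score is not reproduced here, and the function name and example sites are made up for illustration.

```python
import math
from collections import Counter

def mutual_information(sites, i, j):
    """Estimate mutual information (bits) between columns i and j
    of equal-length aligned sites, using observed frequencies."""
    n = len(sites)
    pair_counts = Counter((s[i], s[j]) for s in sites)
    i_counts = Counter(s[i] for s in sites)
    j_counts = Counter(s[j] for s in sites)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = i_counts[a] / n
        p_b = j_counts[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

# Made-up sites: columns 0 and 1 always co-vary, column 3 varies freely.
sites = ["ACGT", "ACGA", "TGGT", "TGGA"]
mi_dependent = mutual_information(sites, 0, 1)
mi_independent = mutual_information(sites, 0, 3)
```

In this toy alignment the first two columns carry one full bit of shared information, while columns 0 and 3 carry none. With real binding site collections, small sample sizes inflate such estimates, so some correction or significance test is usually needed before treating a pair as dependent.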
The results demonstrate that the new approach improves known binding site discovery and show that joint information content and mutual information provide a better and more general criterion for investigating the relationships between positions in the TFBS. Our scoring function is formulated with simple mathematical calculations. From applying our method to several biological data sets, it can be inferred that this method performs better than methods that do not consider dependencies.
PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's Canada Institute for Scientific and Technical Information in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine (NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers.