Results 1-8 (8)
 

1.  Protein Complex Discovery by Interaction Filtering from Protein Interaction Networks Using Mutual Rank Coexpression and Sequence Similarity 
BioMed Research International  2015;2015:165186.
Evaluating biological networks is considered key to understanding complex biological systems, and graph clustering algorithms are widely used in protein-protein interaction (PPI) network analysis. The complexes produced by clustering algorithms include noise proteins. The error rate attributed to noise proteins in PPI network studies is about 40–90%, and only 30–40% of the interactions in PPI databases depend on a specific biological function. It is therefore essential to eliminate noise proteins and noise interactions from the complexes created by clustering methods. We introduce new methods for weighting interactions in protein clusters and for removing noise interactions and proteins based on their weights. The coexpression and the sequence similarity of each pair of proteins are used as the edge weights in the network. The results show that edge filtering based on coexpression behaves similarly to node filtering based on graph characteristics. For removing noise edges, edge filtering has a significant advantage over the graph-based method. Edge filtering based on sequence similarity is able to remove both noise proteins and noise interactions.
doi:10.1155/2015/165186
PMCID: PMC4322317
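The edge-filtering idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `filter_cluster`, the input format, the threshold and the toy weights are all hypothetical, and the weight may stand for either a coexpression score or a sequence-similarity score.

```python
def filter_cluster(edges, threshold):
    """Keep edges whose weight is at least `threshold`.

    edges: dict mapping frozenset({protein_a, protein_b}) -> weight
           (a hypothetical input format chosen for this sketch).
    Returns the surviving edges and the set of proteins that still
    take part in at least one surviving edge.
    """
    kept_edges = {pair: w for pair, w in edges.items() if w >= threshold}
    kept_proteins = set()
    for pair in kept_edges:
        kept_proteins.update(pair)
    return kept_edges, kept_proteins


# Toy cluster with made-up weights (illustrative only).
cluster = {
    frozenset({"A", "B"}): 0.9,
    frozenset({"B", "C"}): 0.8,
    frozenset({"C", "D"}): 0.1,
}
edges, proteins = filter_cluster(cluster, threshold=0.5)
print(sorted(proteins))
```

In this toy cluster the low-weight edge between C and D is dropped, and protein D disappears with it because it has no remaining interaction. Removing a noise edge can thus also remove a noise protein.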
2.  Protein-protein interaction networks (PPI) and complex diseases 
The physical interactions of proteins, which assemble them into large, densely connected networks, are a notable subject of investigation. Protein interaction networks are useful both as a basic scientific abstraction and for improving biological and biomedical applications. Because proteins play principal roles in biological function, their interactions determine the molecular and cellular mechanisms that control healthy and diseased states in organisms. Such networks therefore facilitate the understanding of the pathogenic (and physiologic) mechanisms that trigger the onset and progression of diseases, and this knowledge can be translated into effective diagnostic and therapeutic strategies. Furthermore, several studies have shown that the structure and dynamics of protein networks are disturbed in complex diseases such as cancer and autoimmune disorders. Based on this relationship, a novel paradigm is suggested in which protein interaction networks, rather than individual molecules considered without regard to the network, are the target of therapy for complex multi-genic diseases.
PMCID: PMC4017556  PMID: 25436094
PPI; Complex diseases; Networks
3.  Analysis of Candidate Genes Has Proposed the Role of Y Chromosome in Human Prostate Cancer 
Background
Prostate cancer, a serious genetic disease, is known as the most common cancer in men, but the molecular changes required for its progression are not fully understood. The availability of high-throughput gene expression data has led to the development of various computational methods for identifying the critical genes involved in the cancer.
Methods
In this paper, we show that constructing co-expression networks using Y-chromosome genes provides an alternative strategy for detecting new candidate genes that might be involved in prostate cancer. In our approach, we constructed independent co-expression networks from normal and cancerous stages using a reverse engineering approach. We then highlighted crucial Y chromosome genes involved in prostate cancer by analyzing the networks based on party and date hubs.
Results
Our results led to the detection of 19 critical genes related to prostate cancer, 12 of which have previously been shown to be involved in this cancer. Essential Y chromosome genes were also searched for based on the reconstruction of sub-networks, which led to the identification of 4 experimentally established and 4 new Y chromosome genes that might putatively be linked to prostate cancer.
Conclusion
Correct inference of master genes, which mediate the molecular changes during cancer progression, is one of the major challenges in cancer genomics. In this paper, we have shown the role of Y chromosome genes in finding prostate cancer susceptibility genes. Applying our approach to prostate cancer confirmed previous knowledge about this cancer and predicted additional new genes.
PMCID: PMC4307103  PMID: 25628841
Co-expression networks; expression data; prostate cancer; reverse engineering approach
4.  Dependency of codon usage on protein sequence patterns: a statistical study 
Background
Codon degeneracy and codon usage by organisms are an interesting and challenging problem. Researchers have demonstrated relations between codon usage and various functions or properties of genes and proteins, such as gene regulation, translation rate, translation efficiency, mRNA stability, splicing, and protein domains. Researchers usually represent the segments of proteins responsible for specific functions or structures in a family of proteins as sequence patterns or motifs. We asked whether organisms use the same codons in pattern segments as in the rest of the sequence.
Methods
We used the likelihood ratio test, Pearson’s chi-squared test, and mutual information to compare these two codon usages.
Results
We showed that codon usage in segments of genes that code for a given pattern or motif in a group of proteins differed from that in the rest of the gene. The codon usage in these segments was not random. Amino acids with a larger number of codons used more specific codon ratios in these segments. We also studied the number of amino acids in the pattern (pattern length). As patterns got longer, there was a slight decrease in the fraction of patterns whose codon usage in the pattern region differed significantly from the codon usage in the gene region. We defined a measure of the specificity of protein patterns and studied its relation to codon usage. The difference in codon usage between the pattern region and the gene region was smaller for patterns with higher specificity.
Conclusions
We propose the hypothesis that there are segments of genes that affect codon usage and thus influence protein translation speed, and that these are the regions that code for protein pattern regions.
doi:10.1186/1742-4682-11-2
PMCID: PMC3896713  PMID: 24410898
Codon usage; Sequence analysis; Protein pattern; Pearson’s chi-squared test; Likelihood ratio test
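The three statistics named in the Methods of this abstract can be illustrated on a small contingency table of codon counts. The sketch below is an assumption about how such a comparison might be set up, not the authors' procedure. Rows are the two regions (pattern segment, rest of gene), columns are synonymous codons of one amino acid, and the counts are made up.

```python
import math


def expected_counts(table):
    """Expected counts under independence of region (rows) and codon (columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def pearson_chi2(table):
    """Pearson's chi-squared statistic; assumes every row and column total is nonzero."""
    exp = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for row_o, row_e in zip(table, exp)
        for o, e in zip(row_o, row_e)
    )


def g_statistic(table):
    """Likelihood ratio (G) statistic: 2 * sum(O * ln(O / E)) over nonzero cells."""
    exp = expected_counts(table)
    return 2.0 * sum(
        o * math.log(o / e)
        for row_o, row_e in zip(table, exp)
        for o, e in zip(row_o, row_e)
        if o > 0
    )


def mutual_information(table):
    """Mutual information (in nats) between region and codon; equals G / (2 * N)."""
    total = sum(sum(row) for row in table)
    return g_statistic(table) / (2.0 * total)


# Made-up counts: [pattern segment, rest of gene] x [codon 1, codon 2].
counts = [[30, 10], [20, 40]]
chi2 = pearson_chi2(counts)
g = g_statistic(counts)
mi = mutual_information(counts)
print(chi2, g, mi)
```

Both test statistics are compared against a chi-squared distribution with (rows - 1) x (columns - 1) degrees of freedom. The p-value lookup is omitted here to keep the sketch free of external libraries.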
5.  Prediction of lung tumor types based on protein attributes by machine learning algorithms 
SpringerPlus  2013;2:238.
Early diagnosis of lung cancers and distinction between the tumor types (Small Cell Lung Cancer (SCLC) and Non-Small Cell Lung Cancer (NSCLC)) are very important for increasing the survival rate of patients. Herein, we propose a diagnostic system based on sequence-derived structural and physicochemical attributes of proteins involved in both types of tumors, using feature extraction, feature selection and prediction models. A total of 1497 protein attributes were computed, and important features were selected by 12 attribute weighting models. Machine learning models, consisting of seven SVM models, three ANN models and two NB models, were then applied to the original database and to the new datasets created by the attribute weighting models; model accuracies were calculated through 10-fold cross-validation and wrapper validation (the latter for SVM algorithms only). In line with our previous findings, dipeptide composition, autocorrelation and distribution descriptors were the most important protein features selected by bioinformatics tools. The performance of the algorithms in lung cancer tumor type prediction increased when they were applied to datasets created by attribute weighting models rather than to the original dataset. Wrapper-Validation performed better than X-Validation; the best cancer type prediction resulted from the SVM and SVM Linear models (82%). The best ANN accuracy was obtained when the Neural Net model was applied to the SVM dataset (88%). This is the first report suggesting that the combination of protein features and attribute weighting models with machine learning algorithms can be effectively used to predict the type of lung cancer tumors (SCLC and NSCLC).
Electronic supplementary material
The online version of this article (doi:10.1186/2193-1801-2-238) contains supplementary material, which is available to authorized users.
doi:10.1186/2193-1801-2-238
PMCID: PMC3710575  PMID: 23888262
Lung cancer; Prediction; Structural and physicochemical features; Attributes weighting; Support vector machine; Artificial neural network; Naïve bayes
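Dipeptide composition, one of the sequence-derived features highlighted in this abstract, can be computed directly from a protein sequence. The sketch below assumes the common definition (the fraction of each of the 400 ordered pairs of standard amino acids among all adjacent pairs); the abstract does not state which exact definition or tool the authors used, and the example sequence is made up.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def dipeptide_composition(sequence):
    """Return a dict of 400 dipeptide fractions for a protein sequence.

    Adjacent pairs containing a non-standard residue are skipped
    (an assumption made for this sketch).
    """
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return {pair: 0.0 for pair in counts}
    return {pair: n / total for pair, n in counts.items()}


# Made-up sequence (illustrative only).
features = dipeptide_composition("MKKLAKK")
print(features["KK"])
```

Each protein then becomes a fixed-length vector of 400 values regardless of its sequence length, which is the kind of input that attribute weighting and classification models require.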
6.  Thermal Unfolding Pathway of PHD2 Catalytic Domain in Three Different PHD2 Species: Computational Approaches 
PLoS ONE  2012;7(10):e47061.
Prolyl hydroxylase domain 2 containing protein (PHD2) is a key protein in the regulation of angiogenesis and metastasis. Under normoxic conditions, PHD2 triggers the degradation of hypoxia-inducible factor 1 (HIF-1α), which induces the expression of hypoxia response genes. The correct function of PHD2 would therefore inhibit angiogenesis and the consequent metastasis of tumor cells under normoxic conditions. PHD2 mutations have been reported in some common cancers. However, high levels of HIF-1α protein have been observed even in normoxic metastatic tumors with normal expression of wild-type PHD2. PHD2 malfunction due to protein misfolding may be the underlying cause of metastasis and invasion in such cases. In this study, we scrutinize the unfolding pathways of the possible species of the PHD2 catalytic domain and characterize their unfolding states by computational approaches. Our study raises the possibility of an aggregation disaster for the prominent species of PHD2 during its partial unfolding. This may explain the inability of PHD2 to regulate HIF-1α levels in some normoxic tumor types.
doi:10.1371/journal.pone.0047061
PMCID: PMC3471951  PMID: 23077544
7.  Classification of Lung Cancer Tumors Based on Structural and Physicochemical Properties of Proteins by Bioinformatics Models 
PLoS ONE  2012;7(7):e40017.
Rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important in the diagnosis of this disease. Furthermore, sequence-derived structural and physicochemical descriptors are very useful for machine learning prediction of protein structural and functional classes, for classifying proteins and for prediction performance. In this study, the classification of lung tumors based on 1497 attributes derived from structural and physicochemical properties of protein sequences (based on genes defined by microarray analysis) is investigated through a combination of attribute weighting, supervised and unsupervised clustering algorithms. Eighty percent of the weighting methods selected features such as autocorrelation, dipeptide composition and distribution of hydrophobicity as the most important protein attributes in the classification of the SCLC, NSCLC and COMMON classes of lung tumors. The same results were observed with most tree induction algorithms; descriptors of hydrophobicity distribution were high in protein sequences COMMON to both groups and the distribution of charge in these proteins was very low, showing that COMMON proteins were very hydrophobic. Furthermore, the composition of polar dipeptides in SCLC proteins was higher than in NSCLC proteins. Some clustering models (alone or in combination with attribute weighting algorithms) were able to nearly classify SCLC and NSCLC proteins. The Random Forest tree induction algorithm (evaluated with leave-one-out and 10-fold cross-validation) showed more than 86% accuracy in clustering and predicting three different lung cancer tumors. Here, for the first time, the application of data mining tools to effectively classify three classes of lung cancer tumors, with regard to the importance of dipeptide composition, autocorrelation and distribution descriptors, is reported.
doi:10.1371/journal.pone.0040017
PMCID: PMC3400626  PMID: 22829872
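The two validation schemes mentioned in this abstract, leave-one-out and 10-fold cross-validation, differ only in how the samples are partitioned. The sketch below shows the partitioning step alone, with a hypothetical helper name `k_fold_indices`; it does not reproduce the authors' software, and a real evaluation would normally shuffle the samples first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one sample. No shuffling is done
    (a simplification made for this sketch).
    """
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


# 10-fold split of 25 made-up samples.
folds = list(k_fold_indices(25, 10))

# Leave-one-out is the special case where k equals the number of samples.
leave_one_out = list(k_fold_indices(5, 5))
print(len(folds), len(leave_one_out))
```

Every sample is used for testing exactly once. Accuracy is then averaged over the folds, with the model retrained on each training subset.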
8.  New scoring schema for finding motifs in DNA Sequences 
BMC Bioinformatics  2009;10:93.
Background
Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology, with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search for (or predict) known binding sites in a new DNA sequence. To this end, all subsequences of the given DNA sequence are scored with a scoring function, and the prediction is made by selecting the best score. Most of the available tools for known binding site prediction are designed under the assumption of no dependency between binding site base positions. Recently, Tomovic and Oakeley investigated the statistical basis for a claim of either dependence or independence, to determine whether such a claim is generally true, and presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions that can be used in known binding site prediction under the assumption of dependency or independency between binding site base positions.
Results
We propose a new scoring function based on the dependency between all base positions in the binding site. This scoring function uses joint information content and mutual information as measures of dependency between positions in a transcription factor binding site. Our method for modeling dependencies is simply an extension of position independency methods. We evaluate our new scoring function on real data sets extracted from the JASPAR and TRANSFAC databases, and compare the results with two other well-known scoring functions.
Conclusion
The results demonstrate that the new approach improves known binding site discovery and show that joint information content and mutual information provide a better and more general criterion for investigating the relationships between positions in the TFBS. Our scoring function is formulated with simple mathematical calculations. From applying our method to several biological data sets, it can be inferred that this method performs better than methods that do not consider dependencies.
doi:10.1186/1471-2105-10-93
PMCID: PMC2679735  PMID: 19302709
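Mutual information between two positions of aligned binding sites, the dependency measure named in this abstract, can be estimated from observed base frequencies. The sketch below computes only that quantity; it is not the authors' scoring function, whose exact form the abstract does not give. The function name and the toy alignment are assumptions, and no pseudocounts are applied.

```python
import math
from collections import Counter


def column_mutual_information(sites, i, j):
    """Mutual information (in bits) between positions i and j of aligned sites.

    sites: list of equal-length strings (a toy alignment in this sketch).
    Frequencies are plain maximum-likelihood estimates without pseudocounts.
    """
    n = len(sites)
    pair_counts = Counter((s[i], s[j]) for s in sites)
    i_counts = Counter(s[i] for s in sites)
    j_counts = Counter(s[j] for s in sites)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), rewritten with raw counts
        mi += p_ab * math.log2(p_ab * n * n / (i_counts[a] * j_counts[b]))
    return mi


# Made-up aligned sites: positions 0 and 1 always co-vary, position 3 varies freely.
sites = ["ACGT", "ACGA", "TGGT", "TGGA"]
dependent = column_mutual_information(sites, 0, 1)
independent = column_mutual_information(sites, 0, 3)
print(dependent, independent)
```

A position-independent model treats every pair of columns as if this value were zero. A nonzero value between two columns is the kind of signal that a dependency-aware scoring function can exploit.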
