Retinoid X receptor α (RXRα) is abundantly expressed in the liver and is essential for the function of other nuclear receptors. Using chromatin immunoprecipitation sequencing and mRNA profiling data generated from wild-type and RXRα-null mouse livers, the current study identifies the bona fide hepatic RXRα targets and biological pathways. In addition, based on binding and motif analysis, the molecular mechanism by which RXRα regulates hepatic genes is elucidated in a high-throughput manner.
Close to 80% of hepatically expressed genes were bound by RXRα, while 16% were expressed in an RXRα-dependent manner. Motif analysis predicted a direct repeat with a spacer of one nucleotide as the most prevalent RXRα binding site. Many of the 500 strongest binding motifs overlapped with the binding motif of specific protein 1. Biological functional analysis of RXRα-dependent genes revealed that hepatic RXRα deficiency mainly resulted in up-regulation of steroid and cholesterol biosynthesis-related genes and down-regulation of translation- as well as anti-apoptosis-related genes. Furthermore, RXRα bound to many genes that encode nuclear receptors and their cofactors, suggesting a central role for RXRα in regulating nuclear receptor-mediated pathways.
This study establishes the relationship between RXRα DNA binding and hepatic gene expression. RXRα binds extensively to the mouse genome. However, DNA binding does not necessarily affect the basal mRNA level. In addition to metabolism, RXRα dictates the expression of genes that regulate RNA processing, translation, and protein folding, illustrating novel roles of hepatic RXRα in post-transcriptional regulation.
The ability to improve protein thermostability via protein engineering is of great scientific interest and also has significant practical value. In this report we present PROTS-RF, a robust model based on the Random Forest algorithm capable of predicting thermostability changes induced not only by single-point but also by double- or multiple-point mutations. The model is built using 41 features including evolutionary information, secondary structure, solvent accessibility and a set of fragment-based features. It achieves accuracies of 0.799, 0.782 and 0.787, and areas under receiver operating characteristic (ROC) curves of 0.873, 0.868 and 0.862, for the single-, double- and multiple-point mutation datasets, respectively. Contrary to previous suggestions, our results clearly demonstrate that a robust predictive model trained to predict thermostability changes induced by single-point mutations can also predict the effects of double- and multiple-point mutations. It also shows high levels of robustness in the tests using hypothetical reverse mutations. We demonstrate that testing datasets created based on physical principles can be highly useful for testing the robustness of predictive models.
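The two reported metrics, accuracy and area under the ROC curve, can be illustrated with a minimal sketch. This is not the PROTS-RF code; the function names and the 0.5 decision threshold are assumptions made for illustration, and the AUC is computed from its rank interpretation (the probability that a randomly chosen stabilizing mutation receives a higher score than a randomly chosen destabilizing one).

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the binary label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a sketch; a sort-based implementation would be preferable for large test sets.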
Protein acidostability is a common problem in biopharmaceutical and other industries. However, it remains a great challenge to engineer proteins for enhanced acidostability because our knowledge of protein acidostabilization is still very limited. In this paper, we present a comparative study of proteins from bacteria with acidic (AP) and neutral cytoplasms (NP) using an integrated statistical and machine learning approach. We construct a set of 393 non-redundant AP-NP ortholog pairs and calculate a total of 889 sequence-based features for these proteins. The pairwise alignments of these ortholog pairs are used to build a residue substitution propensity matrix between APs and NPs. We use the Gini importance provided by the Random Forest algorithm to rank the relative importance of these features. A scoring function using the 10 most significant features is developed and optimized using a hill-climbing algorithm. The accuracy of the scoring function is 86.01% in predicting AP-NP ortholog pairs and 76.65% in predicting non-ortholog AP-NP pairs, suggesting that there are significant differences between APs and NPs which can be used to predict the relative acidostability of proteins. The overall trends uncovered in the study can be used as general guidelines for designing acidostable proteins. To the best of our knowledge, this work represents the first systematic comparative study of acidostable proteins and their non-acidostable orthologs.
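One way to picture how pairwise alignments feed a substitution matrix is the sketch below. The abstract does not define the propensity measure, so the net count used here (NP-to-AP substitutions minus the reverse direction) is an assumption, as are the function names and the use of "-" as the gap character.

```python
from collections import Counter


def substitution_counts(aligned_pairs):
    """Count NP-residue -> AP-residue substitutions over aligned ortholog pairs.

    aligned_pairs: iterable of (np_seq, ap_seq) strings of equal length,
    with '-' marking alignment gaps. Identical positions and gaps are skipped.
    """
    counts = Counter()
    for np_seq, ap_seq in aligned_pairs:
        if len(np_seq) != len(ap_seq):
            raise ValueError("aligned sequences must have equal length")
        for a, b in zip(np_seq, ap_seq):
            if a == "-" or b == "-" or a == b:
                continue
            counts[(a, b)] += 1
    return counts


def net_bias(counts, a, b):
    """Assumed propensity measure: count(a -> b) minus count(b -> a)."""
    return counts[(a, b)] - counts[(b, a)]
```

A positive `net_bias(counts, "D", "N")` would indicate that D in the neutral-cytoplasm protein is replaced by N in the acidic-cytoplasm ortholog more often than the reverse.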
Designing proteins with enhanced thermo-stability has been a main focus of protein engineering because of its theoretical and practical significance. Despite extensive studies in the past years, a general strategy for stabilizing proteins still remains elusive. Thus effective and robust computational algorithms for designing thermo-stable proteins are in critical demand. Here we report PROTS, a sequential and structural four-residue fragment based protein thermo-stability potential. PROTS is derived from a non-redundant representative collection of thousands of thermophilic and mesophilic protein structures and a large set of point mutations with experimentally determined changes of melting temperatures. To the best of our knowledge, PROTS is the first protein stability predictor based on integrated analysis and mining of these two types of data. Besides conventional cross validation and blind testing, we introduce hypothetical reverse mutations as a means of testing the robustness of protein thermo-stability predictors. In all tests, PROTS demonstrates the ability to reliably predict mutation induced thermostability changes as well as classify thermophilic and mesophilic proteins. In addition, this white-box predictor allows easy interpretation of the factors that influence mutation induced protein stability changes at the residue level.
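The reverse-mutation test rests on a physical constraint: if mutating residue X to Y changes the melting temperature by some amount, the hypothetical reverse mutation Y to X should change it by the opposite amount. A minimal sketch of building such a test set and measuring how far a predictor departs from this antisymmetry follows. The `Mutation` record, the function names, and the `predict` callable are hypothetical and are not part of PROTS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    wild_type: str   # residue before the substitution
    position: int    # sequence position of the substitution
    mutant: str      # residue after the substitution
    delta_tm: float  # measured change in melting temperature


def reverse_mutation(m):
    """Hypothetical reverse mutation: swap the residues and negate the change."""
    return Mutation(m.mutant, m.position, m.wild_type, -m.delta_tm)


def antisymmetry_error(predict, mutations):
    """Mean |predict(forward) + predict(reverse)|; zero for a perfectly
    antisymmetric predictor."""
    errors = [abs(predict(m) + predict(reverse_mutation(m))) for m in mutations]
    return sum(errors) / len(errors)
```

In practice the reverse mutation is evaluated against the mutant protein as background, so a structure-aware predictor would also need the mutant structure or sequence; the sketch ignores that detail.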
protein stability; thermophilic; prediction; datamining; thermostability potential
Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving the classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task because these genes might serve as tumor biomarkers, which is of great benefit to not only tumor molecular diagnosis but also drug development.
This paper proposes a novel gene selection method with rich biomedical meaning based on the Heuristic Breadth-first Search Algorithm (HBSA) to find as many optimal gene subsets as possible. Due to the curse of dimensionality, this type of method could suffer from over-fitting and selection bias problems. To address these potential problems, an HBSA-based ensemble classifier is constructed by applying a majority voting strategy to the individual classifiers built from the selected gene subsets, and a novel HBSA-based gene ranking method is designed to find important tumor-related genes by measuring the significance of genes using their occurrence frequencies in the selected gene subsets. The experimental results on nine tumor datasets, including three pairs of cross-platform datasets, indicate that the proposed method can not only obtain better generalization performance but also find many important tumor-related genes.
It is found that the frequencies of the selected genes follow a power-law distribution, indicating that only a few top-ranked genes can be used as potential diagnosis biomarkers. Moreover, the top-ranked genes leading to very high prediction accuracy are closely related to specific tumor subtypes and even to hub genes. Compared with other related methods, the proposed method can achieve higher prediction accuracy with fewer genes. These findings are further supported by analyzing the top-ranked genes in the context of individual gene function, biological pathway, and protein-protein interaction network.
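The two ensemble steps described above, majority voting over individual classifiers and ranking genes by how often they occur in the selected subsets, can be sketched as follows. The HBSA search itself is not shown; the function names and the alphabetical tie-breaking rules are assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most individual classifiers.

    Ties are broken by choosing the alphabetically smallest label (an
    assumption made so that the result is deterministic).
    """
    counts = Counter(predictions)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)


def rank_genes(gene_subsets):
    """Rank genes by the number of selected subsets in which they occur."""
    freq = Counter(g for subset in gene_subsets for g in set(subset))
    return sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
```

For one sample, `majority_vote` would be called with the labels produced by the classifiers trained on each selected gene subset.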
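A rough way to check for power-law behaviour in gene occurrence frequencies is to regress log frequency on log rank. The sketch below assumes a rank-frequency form, f proportional to rank^(-alpha), which the abstract does not specify; least squares on a log-log scale is also only a crude estimator, and the function name is hypothetical.

```python
import math


def fit_power_law_exponent(frequencies):
    """Estimate alpha in f ~ rank**(-alpha) by least squares on log-log axes."""
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive frequencies")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope
```

A steep exponent means that occurrence counts fall off quickly with rank, which is consistent with only a few top-ranked genes dominating the selected subsets.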
Gene expression profiles; Gene selection; Tumor classification; Heuristic breadth-first search; Power-law distribution
Identification of phosphorylation sites by computational methods is becoming increasingly important because it reduces labor-intensive and costly experiments and can improve our understanding of the common properties and underlying mechanisms of protein phosphorylation.
A multitask learning framework for learning four kinase families simultaneously, instead of studying each kinase family of phosphorylation sites separately, is presented in the study. The framework includes two multitask classification methods: the Multi-Task Least Squares Support Vector Machines (MTLS-SVMs) and the Multi-Task Feature Selection (MT-Feat3).
Using the multitask learning framework, we successfully identify 18 common features shared by four kinase families of phosphorylation sites. The reliability of selected features is demonstrated by the consistent performance in two multi-task learning methods.
The selected features can be used to build efficient multitask classifiers with good performance, suggesting that they are important to protein phosphorylation across the four kinase families.
Based on the available X-ray structures of S-adenosylhomocysteine hydrolases (SAHHs), free energy simulations employing the MM-GBSA approach were applied to predict residues important to the differential cofactor binding properties of human and trypanosomal SAHHs (Hs-SAHH and Tc-SAHH), within 5 Å of the cofactor NAD+/NADH binding site. Among the 38 residues in this region, only four are different between the two enzymes. Surprisingly, the four non-identical residues make no major contribution to differential cofactor binding between Hs-SAHH and Tc-SAHH. On the other hand, four pairs of identical residues are shown by free energy simulations to differentiate cofactor binding between Hs-SAHH and Tc-SAHH. Experimental mutagenesis was performed to test these predictions for a lysine residue and a tyrosine residue of the C-terminal extension that penetrates a partner subunit to form part of the cofactor binding site. The K431A mutant of Tc-SAHH (TcK431A) loses its cofactor binding affinity but retains the wild type’s tetrameric structure, while the corresponding mutant of Hs-SAHH (HsK426A) loses both cofactor affinity and tetrameric structure (Ault-Riche et al., 1994 J Biol Chem, 269, 31472–8). The tyrosine mutants HsY430A and TcY435A alter the NAD+ association and dissociation kinetics, with HsY430A increasing the cofactor equilibrium dissociation constant from approximately 10 nM (Hs-SAHH) to about 800 nM while TcY435A increases the cofactor equilibrium dissociation constant from approximately 100 nM (Tc-SAHH) to about 1 mM. Both changes result from larger increases in off-rate combined with smaller decreases in on-rate. These investigations demonstrate that computational free energy decomposition may be used to guide experimental studies by suggesting sensitive sites for mutagenesis. 
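The statement that the change in dissociation constant combines an off-rate increase with an on-rate decrease follows from the standard relation between the equilibrium dissociation constant and the two rate constants. With primed symbols denoting the mutant (notation introduced here for illustration):

```latex
K_d = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}},
\qquad
\frac{K_d'}{K_d}
  = \frac{k_{\mathrm{off}}'}{k_{\mathrm{off}}}
    \times
    \frac{k_{\mathrm{on}}}{k_{\mathrm{on}}'} .
```

For HsY430A, for example, the reported change from approximately 10 nM to about 800 nM corresponds to a roughly 80-fold increase in K_d, which is the product of the fold increase in off-rate and the fold decrease in on-rate.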
Our finding that identical residues in two orthologous proteins may give significantly different binding free energy contributions strongly suggests that comparative studies of homologous proteins should investigate not only different residues, but also identical residues in these proteins.
S-adenosyl-L-homocysteine hydrolase; Homo sapiens; Trypanosoma cruzi; cofactor binding kinetics; free-energy simulations; computational alanine scan; C-terminal extensions
Identification of the characteristic structural patterns responsible for protein thermostability is theoretically important and practically useful but largely remains an open problem. These patterns may be revealed through comparative study of thermophilic and mesophilic proteins, which differ in thermostability. In this study we constructed several distance-dependent potentials from thermophilic and mesophilic proteins. These potentials were then used to evaluate the structural differences between thermophilic and mesophilic proteins. We found that subtracting or dividing the potentials derived from thermophilic and mesophilic proteins can dramatically increase the discriminatory ability. This approach revealed that the ability to distinguish the subtle structural features responsible for protein thermostability may be effectively enhanced through rationally designed comparative study.
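The idea of subtracting one potential from another can be sketched with a generic inverse-Boltzmann construction. This is an illustrative assumption, not the potentials used in the study: the uniform reference state, the bin width, the pseudocount, and all function names are hypothetical.

```python
import math


def distance_potential(distances, bin_width=1.0, n_bins=15, pseudocount=1.0):
    """Per-bin energy -ln(observed fraction / uniform reference fraction)."""
    counts = [pseudocount] * n_bins
    for d in distances:
        b = int(d // bin_width)
        if 0 <= b < n_bins:
            counts[b] += 1
    total = sum(counts)
    reference = 1.0 / n_bins  # assumed uniform reference state
    return [-math.log((c / total) / reference) for c in counts]


def difference_potential(thermo, meso):
    """Subtract the mesophilic potential from the thermophilic one, bin by bin."""
    return [t - m for t, m in zip(thermo, meso)]


def structure_score(potential, distances, bin_width=1.0):
    """Sum the potential over the observed residue-pair distances of a structure."""
    total = 0.0
    for d in distances:
        b = int(d // bin_width)
        if 0 <= b < len(potential):
            total += potential[b]
    return total
```

With the difference potential, features that are equally common in both protein classes cancel, so a structure's score reflects only the distance ranges that differ between the two sets; a lower score is then more thermophile-like.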
Thermostability; mesophilic proteins; thermophilic proteins; statistical potential
Brain aging is associated with a progressive decline in cognitive function though the molecular mechanisms remain unknown. Functional changes in brain neurons could be due to age-related alterations in levels of specific proteins critical for information processing. Specialized membrane microdomains known as ‘lipid rafts’ contain protein complexes involved in many signal transduction processes. This study was undertaken to determine if two-dimensional fluorescence difference gel electrophoresis (2D DIGE) analysis of proteins in synaptic membrane lipid rafts revealed age-dependent alterations in levels of raft proteins. Five pairs of young and aged rat synaptic membrane rafts were subjected to DIGE separation, followed by image analysis and identification of significantly altered proteins. Of 1046 matched spots on DIGE gels, 94 showed statistically significant differences in levels between old and young rafts, and 87 of these were decreased in aged rafts. The 41 most significantly altered (p < 0.03) proteins included several synaptic proteins involved in energy metabolism, redox homeostasis, and cytoskeletal structure. This may indicate a disruption in bioenergetic balance and redox homeostasis in synaptic rafts with brain aging. Differential levels of representative identified proteins were confirmed by immunoblot analysis. Our findings provide novel pathways in investigations of mechanisms that may contribute to altered neuronal function in aging brain.
2D DIGE; Brain aging; Energy metabolism; Lipid rafts; Synaptic dysfunction
The post-translational modification of proteins is a well-known endogenous mechanism for regulating protein function and activity. Cellular proteins are also susceptible to post-translational modification by xenobiotic agents that possess, or whose metabolites possess, significant electrophilic character. Such non-physiological modifications to endogenous proteins are sometimes benign, but in other cases they are strongly associated with, and are presumed to cause, lethal cytotoxic consequences via necrosis and/or apoptosis. The Reactive Metabolite Target Protein Database (TPDB) is a searchable, freely web-accessible (http://tpdb.medchem.ku.edu:8080/protein_database/) resource that attempts to provide a comprehensive, up-to-date listing of known reactive metabolite target proteins. In this report we characterize the TPDB by reviewing briefly how the information it contains came to be known. We also compare its information to that provided by other types of “-omics” studies relevant to toxicology, and we illustrate how bioinformatic analysis of target proteins may help to elucidate mechanisms of cytotoxic responses to reactive metabolites.
The ability to design thermostable proteins is theoretically important and practically useful. Robust and accurate algorithms, however, remain elusive. One critical problem is the lack of reliable methods to estimate the relative thermostability of possible mutants.
We report a novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting the relative thermostability of protein mutants. The scoring function was developed based on an elaborate analysis of a set of features calculated or predicted from 540 pairs of hyperthermophilic and mesophilic protein ortholog sequences. It was constructed by a linear combination of ten important features identified by a feature ranking procedure based on the random forest classification algorithm. The weights of these features in the scoring function were fitted by a hill-climbing algorithm. This scoring function has shown an excellent ability to discriminate hyperthermophilic from mesophilic sequences. The prediction accuracies reached 98.9% and 97.3% in discriminating orthologous pairs in training and the holdout testing datasets, respectively. Moreover, the scoring function can distinguish non-homologous sequences with an accuracy of 88.4%. Additional blind tests using two datasets of experimentally investigated mutations demonstrated that the scoring function can be used to predict the relative thermostability of proteins and their mutants at very high accuracies (92.9% and 94.4%). We also developed an amino acid substitution preference matrix between mesophilic and hyperthermophilic proteins, which may be useful in designing more thermostable proteins.
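A linear scoring function whose weights are fitted by hill climbing can be sketched as follows. The feature values, the step size, the all-zero starting point, and the coordinate-wise move set are assumptions for illustration; the abstract does not describe the actual search settings.

```python
def score(weights, features):
    """Linear combination of feature values."""
    return sum(w * x for w, x in zip(weights, features))


def pair_accuracy(weights, pairs):
    """Fraction of (hyperthermophilic, mesophilic) pairs ranked correctly,
    i.e. where the hyperthermophilic member receives the higher score."""
    correct = sum(score(weights, hp) > score(weights, mp) for hp, mp in pairs)
    return correct / len(pairs)


def hill_climb(pairs, n_features, step=0.1, max_rounds=100):
    """Greedy coordinate-wise hill climbing on pair accuracy."""
    weights = [0.0] * n_features
    best = pair_accuracy(weights, pairs)
    for _ in range(max_rounds):
        improved = False
        for i in range(n_features):
            for delta in (step, -step):
                candidate = list(weights)
                candidate[i] += delta
                acc = pair_accuracy(candidate, pairs)
                if acc > best:  # accept only strict improvements
                    weights, best, improved = candidate, acc, True
        if not improved:
            break
    return weights, best
```

Because accuracy is a step function of the weights, this search can stall on plateaus; random restarts or a smoother objective are common remedies.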
We have presented a novel scoring function which can distinguish not only HP/MP ortholog pairs, but also non-homologous pairs at high accuracies. Most importantly, it can be used to accurately predict the relative stability of proteins and their mutants, as demonstrated in two blind tests. In addition, the residue substitution preference matrix assembled in this study may reflect the thermal adaptation induced substitution biases. A web server implementing the scoring function and the dataset used in this study are freely available at http://www.abl.ku.edu/thermorank/.
Despite great advances in the efficiency of analytical and synthetic chemistry, time and available starting material still limit the number of unique compounds that can be practically synthesized and evaluated as prospective therapeutics. Chemical diversity analysis (the capacity to identify finite diverse subsets that reliably represent greater manifolds of drug-like chemicals) thus remains an important resource in drug discovery. Despite an unproven track record, chemical diversity has also been used to posit, from preliminary screen hits, new compounds with similar or better activity. Identifying diversity metrics that demonstrably encode bioactivity trends is thus of substantial potential value for intelligent assembly of targeted screens. This paper reports novel algorithms designed to simultaneously reflect chemical similarity or diversity trends and apparent bioactivity in compound collections. An extensive set of descriptors is evaluated within large NCI screening data sets according to bioactivity differentiation capacities, quantified as the ability to co-localize known active species into bioactive-rich K-means clusters. One method tested for descriptor selection orders features according to relative variance across a set of training compounds, and samples increasingly finer subset meshes for descriptors whose exclusion from the model induces drastic drops in relative bioactive colocalization. This yields metrics with reasonable bioactive enrichment (greater than 50% of all bioactive compounds collected into clusters or cells with significantly enriched active/inactive rates) for each of the four data sets examined herein. A second method replaces variance by an active/inactive divergence score, achieving comparable enrichment via a much more efficient search process.
Combinations of the above metrics are tested in 2D rectilinear diversity models, achieving similarly successful colocalization statistics, with metrics derived from the active/inactive divergence score typically outperforming those selected from the variance criterion and computed from the DiverseSolutions software.
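The colocalization statistic, the share of all active compounds that land in enriched clusters or cells, can be sketched as below. The clustering step is assumed to have been done already, and "significantly enriched" is approximated by a simple ratio threshold against the overall active rate; that threshold and the function name are assumptions, not the paper's criterion.

```python
from collections import defaultdict


def enriched_active_fraction(cluster_ids, is_active, min_ratio=2.0):
    """Fraction of actives found in clusters whose active rate is at least
    min_ratio times the overall active rate."""
    totals = defaultdict(int)
    actives = defaultdict(int)
    for cluster, active in zip(cluster_ids, is_active):
        totals[cluster] += 1
        actives[cluster] += int(active)
    n_active = sum(actives.values())
    if n_active == 0:
        return 0.0
    baseline = n_active / len(cluster_ids)
    captured = sum(
        actives[c] for c in totals if actives[c] / totals[c] >= min_ratio * baseline
    )
    return captured / n_active
```

A value above 0.5 would correspond to the "greater than 50% of all bioactive compounds" criterion quoted in the text, under the assumed enrichment threshold.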
Protein covalent binding by reactive metabolites of drugs, chemicals and natural products can lead to acute cytotoxicity. Recent rapid progress in reactive metabolite target protein identification has shown that adduction is surprisingly selective and inspired the hope that analysis of target proteins might reveal protein factors that differentiate target- vs. non-target proteins and illuminate mechanisms connecting covalent binding to cytotoxicity.
Sorting 171 known reactive metabolite target proteins revealed a number of GO categories and KEGG pathways to be significantly enriched in targets, but in most cases the classes were too large, and the "percent coverage" too small, to allow meaningful conclusions about mechanisms of toxicity. However, a similar analysis of the directly interacting partners of 28 common targets of multiple reactive metabolites revealed highly significant enrichments in terms likely to be highly relevant to cytotoxicity (e.g., MAP kinase pathways, apoptosis, response to unfolded protein). Machine learning was used to rank the contribution of 211 computed protein features to determining protein susceptibility to adduction. Protein lysine (but not cysteine) content and protein instability index (i.e., rate of turnover in vivo) were among the features most important to determining susceptibility.
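Enrichment of a GO category or KEGG pathway in a protein list is commonly assessed with a one-sided hypergeometric test. The abstract does not state which test was used, so the sketch below is an assumption, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when `sample` proteins are drawn without replacement from
    `population` proteins, of which `annotated` carry the term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

Here `population` would be the background proteome, `sample` the size of the target list, and `hits` the number of targets carrying the term. When many terms are tested, the resulting p-values need a multiple-testing correction.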
As yet there is no good explanation for why some low-abundance proteins become heavily adducted while some abundant proteins become only lightly adducted in vivo. Analyzing the directly interacting partners of target proteins appears to yield greater insight into mechanisms of toxicity than analyzing target proteins per se. The insights provided can readily be formulated as hypotheses to test in future experimental studies.
The Escherichia coli l-rhamnose-responsive transcription activators RhaS and RhaR both consist of two domains, a C-terminal DNA-binding domain and an N-terminal dimerization domain. Both function as dimers and only activate transcription in the presence of l-rhamnose. Here, we examined the ability of the DNA-binding domains of RhaS (RhaS-CTD) and RhaR (RhaR-CTD) to bind to DNA and activate transcription. RhaS-CTD and RhaR-CTD were both shown by DNase I footprinting to be capable of binding specifically to the appropriate DNA sites. In vivo as well as in vitro transcription assays showed that RhaS-CTD could activate transcription to high levels, whereas RhaR-CTD was capable of only very low levels of transcription activation. As expected, RhaS-CTD did not require the presence of l-rhamnose to activate transcription. The upstream half-site at rhaBAD and the downstream half-site at rhaT were found to be the strongest of the known RhaS half-sites, and a new putative RhaS half-site with comparable strength to known sites was identified. Given that cyclic AMP receptor protein (CRP), the second activator required for full rhaBAD expression, cannot activate rhaBAD expression in a ΔrhaS strain, it was of interest to test whether CRP could activate transcription in combination with RhaS-CTD. We found that RhaS-CTD allowed significant activation by CRP, both in vivo and in vitro, although full-length RhaS allowed somewhat greater CRP activation. We conclude that RhaS-CTD contains all of the determinants necessary for transcription activation by RhaS.
The interactions between polyanions (PAs) and polyanion-binding proteins (PABPs) have been found to play significant roles in many essential biological processes including intracellular organization, transport and protein folding. Furthermore, many neurodegenerative disease-related proteins are PABPs. Thus, a better understanding of PA/PABP interactions may not only enhance our understanding of biological systems but also provide new clues to these deadly diseases. The literature in this field is widely scattered, suggesting the need for a comprehensive and searchable database of PABPs. The DB-PABP is a comprehensive, manually curated and searchable database of experimentally characterized PABPs. It is freely available and can be accessed online at http://pabp.bcf.ku.edu/DB_PABP/. The DB-PABP was implemented as a MySQL relational database. An interactive web interface was created using Java Server Pages (JSP). The search page of the database is organized into a main search form and a section for utilities. The main search form enables custom searches via four menus: protein names, polyanion names, the source species of the proteins and the methods used to discover the interactions. Available utilities include a commonality matrix, a function for listing PABPs by the number of interacting polyanions and a string search for author surnames. The DB-PABP is maintained at the University of Kansas. We encourage users to provide feedback and submit new data and references.
The toxic effects of many simple organic compounds stem from their biotransformation to chemically reactive metabolites which bind covalently to cellular proteins. To understand the mechanisms of cytotoxic responses it may be important to know which proteins become adducted and whether some may be common targets of multiple toxins. The literature of this field is widely scattered but expanding rapidly, suggesting the need for a comprehensive, searchable database of reactive metabolite target proteins.
The Reactive Metabolite Target Protein Database (TPDB) is a comprehensive, curated, searchable, documented compilation of publicly available information on the protein targets of reactive metabolites of 18 well-studied chemicals and drugs of known toxicity. TPDB software enables i) string searches for author names and proteins names/synonyms, ii) more complex searches by selecting chemical compound, animal species, target tissue and protein names/synonyms from pull-down menus, and iii) commonality searches over multiple chemicals. Tabulated search results provide information, references and links to other databases.
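A commonality search over multiple chemicals amounts to grouping target records by protein and keeping the proteins reported for more than one chemical. The sketch below is not the TPDB implementation; the flat `(chemical, protein)` record layout and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def common_targets(records, min_chemicals=2):
    """Map each protein to the sorted list of chemicals that adduct it,
    keeping only proteins targeted by at least `min_chemicals` chemicals."""
    by_protein = defaultdict(set)
    for chemical, protein in records:
        by_protein[protein].add(chemical)
    return {
        protein: sorted(chemicals)
        for protein, chemicals in by_protein.items()
        if len(chemicals) >= min_chemicals
    }
```

Duplicate records for the same chemical and protein, for example from different publications, are counted once because the chemicals are collected in a set.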
The TPDB is a unique on-line compilation of information on the covalent modification of cellular proteins by reactive metabolites of chemicals and drugs. Its comprehensiveness and searchability should facilitate the elucidation of mechanisms of reactive metabolite toxicity. The database is freely available at
The development of high-throughput technologies such as yeast two-hybrid systems and mass spectrometry technologies has made it possible to generate large protein-protein interaction (PPI) datasets. Mining these datasets for underlying biological knowledge has, however, remained a challenge.
A total of 3108 sequence signatures were found, each of which was shared by a set of guest proteins interacting with one of 944 host proteins in Saccharomyces cerevisiae genome. Approximately 94% of these sequence signatures matched entries in InterPro member databases. We identified 84 distinct sequence signatures from the remaining 172 unknown signatures. The signature sharing information was then applied in predicting sub-cellular localization of yeast proteins and the novel signatures were used in identifying possible interacting sites.
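The host-guest framing can be pictured with a simplified sketch: for each host protein, collect its interacting guests and report the signatures that several guests have in common. The sketch assumes each protein already carries a set of signature labels, whereas the study discovers signatures from the sequences themselves; the function name and the `min_guests` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def shared_signatures(interactions, signatures, min_guests=2):
    """For each host, return the signatures present in at least `min_guests`
    of its interaction partners.

    interactions: iterable of (protein_a, protein_b) pairs (undirected).
    signatures: dict mapping a protein to a set of signature labels.
    """
    guests = defaultdict(set)
    for a, b in interactions:
        guests[a].add(b)
        guests[b].add(a)
    result = {}
    for host, partners in guests.items():
        counts = Counter(sig for g in partners for sig in signatures.get(g, ()))
        shared = {sig for sig, n in counts.items() if n >= min_guests}
        if shared:
            result[host] = shared
    return result
```

Every protein is treated as a potential host, so a single interaction contributes to the guest lists of both partners.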
We reported a method of PPI data mining that facilitated the discovery of novel sequence signatures using a large PPI dataset from S. cerevisiae genome as input. The fact that 94% of discovered signatures were known validated the ability of the approach to identify large numbers of signatures from PPI data. The significance of these discovered signatures was demonstrated by their application in predicting sub-cellular localizations and identifying potential interaction binding sites of yeast proteins.