Voltage-sensitive potassium ion channels are essential for life, but the molecular basis of their ion conduction is not well understood. In particular, the impact of ion concentration on ion conduction has not been fully studied. We performed several micro-second molecular dynamics simulations of the pore domain of the Kv1.2 potassium channel in KCl solution at four different ion concentrations, and scrutinized each of the conduction events, based on graphical representations of the simulation trajectories. As a result, we observed that the conduction mechanism switched with different ion concentrations: at high ion concentrations, potassium conduction occurred by Hodgkin and Keynes' knock-on mechanism, where the association of an incoming ion with the channel is tightly coupled with the dissociation of an outgoing ion, in a one-step manner. On the other hand, at low ion concentrations, ions mainly permeated by a two-step association/dissociation mechanism, in which the association and dissociation of ions were not coupled, and occurred in two distinct steps. We also found that this switch was triggered by the facilitated association of an ion from the intracellular side within the channel pore and by the delayed dissociation of the outermost ion, as the ion concentration increased.
Coexpressed gene databases are valuable resources for identifying new gene functions or
functional modules in metabolic pathways and signaling pathways. Although coexpressed gene
databases are a fundamental platform in the field of plant biology, their use in animal
studies is relatively limited. The COXPRESdb (http://coxpresdb.jp) provides coexpression
relationships for multiple animal species, as comparisons of coexpressed gene lists can
enhance the reliability of gene coexpression determinations. Here, we report the updates
of the database, mainly focusing on the following two points. First, we updated our
coexpression data by including recent microarray data for the previous seven species
(human, mouse, rat, chicken, fly, zebrafish and nematode) and adding four new species
(monkey, dog, budding yeast and fission yeast), along with a new human microarray
platform. A reliability scoring function was also implemented, based on coexpression
conservation to filter out coexpression with low reliability. Second, the network drawing
function was updated, to implement automatic cluster analyses with enrichment analyses in
Gene Ontology and in cis elements, along with interactive network analyses with Cytoscape
Web. With these updates, COXPRESdb will become a more powerful tool for analyses of
functional and regulatory networks of genes in a variety of animal species.
With the dramatic increase in microarray data, biclustering has become a promising tool for gene expression analysis. Biclustering has been proven to be superior over clustering in identifying multifunctional genes and searching for co-expressed genes under a few specific conditions; that is, a subgroup of all conditions. Biclustering based on a genetic algorithm (GA) has shown better performance than greedy algorithms, but the overlap state for biclusters must be treated more systematically.
We developed a new biclustering algorithm (binary-iterative genetic algorithm [BIGA]), based on an iterative GA, by introducing a novel, ternary-digit chromosome encoding function. BIGA searches for a set of biclusters by iterative binary divisions that allow the overlap state to be explicitly considered. In addition, the average Pearson’s correlation coefficient was employed to measure the relationship of genes within a bicluster, instead of the mean square residual, the popular classical index. Compared with six existing algorithms, BIGA found highly correlated biclusters, with large gene coverage and reasonable gene overlap. The gene ontology (GO) enrichment analysis showed that most of the biclusters are significant, with at least one GO term overrepresented.
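The bicluster quality index mentioned above, the average Pearson's correlation coefficient among the genes of a bicluster, can be illustrated with a minimal sketch. This is not the BIGA implementation: the function names, the toy matrix and the choice to average over all gene pairs are assumptions made for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_pcc(matrix, genes, conditions):
    """Mean pairwise PCC among the selected genes over the selected conditions."""
    profiles = [[matrix[g][c] for c in conditions] for g in genes]
    pairs = list(combinations(profiles, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


# Toy expression matrix (made-up values): rows are genes, columns are conditions.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 4.0, 6.0, 1.0],
    [3.0, 5.0, 7.0, 4.0],
    [5.0, 1.0, 4.0, 2.0],
]

# Genes 0-2 rise together over conditions 0-2, so this bicluster scores close to 1.
bicluster_score = average_pcc(expression, genes=[0, 1, 2], conditions=[0, 1, 2])

# The full matrix mixes unrelated profiles and scores lower.
full_score = average_pcc(expression, genes=[0, 1, 2, 3], conditions=[0, 1, 2, 3])
```

In this toy case the index rewards the subset of genes and conditions with coherent profiles, which is the behavior a biclustering search would try to maximize.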
BIGA is a powerful tool to analyze large amounts of gene expression data, and will facilitate the elucidation of the underlying functional mechanisms in living organisms.
biclustering; microarray data; genetic algorithm; Pearson’s correlation coefficient
Pancreatic cancer is among the most lethal malignancies worldwide. This study aimed to identify a novel prognostic biomarker, facilitating treatment selection, using mass spectrometry (MS)-based proteomic analysis with formalin-fixed paraffin-embedded (FFPE) tissue.
The two groups, with poor prognosis (n = 4) and with better prognosis (n = 4), were carefully chosen from 96 resected cases of pancreatic cancer treated between 1998 and 2007 at Tohoku University Hospital. Although the 2 groups had matched backgrounds (UICC-Stage IIB, Grade 2, R0, gemcitabine adjuvant), there was a significant difference in postoperative mean survival time (poor 21.0 months, better 58.1 months, P = 0.0067). Cancerous epithelial cells collected from FFPE tissue sections by laser micro-dissection (LMD) were processed for liquid chromatography-tandem mass spectrometry (LC-MS/MS). In total, 1099 unique proteins were identified and 6 proteins showed different expression levels between the 2 groups by semi-quantitative comparison. Among these 6 proteins, we focused on Nm23/Nucleoside Diphosphate Kinase A (NDPK-A) and immunohistochemically confirmed its expression in the cohort of 96 cases. Kaplan-Meier analysis showed that high Nm23/NDPK-A expression correlated with significantly worse overall survival (P = 0.0103). Moreover, in the multivariate Cox regression model, Nm23/NDPK-A over-expression remained an independent predictor of poor survival with a hazard ratio of 1.97 (95% CI 1.16-3.56, P = 0.0110).
We identified 6 candidate prognostic markers for postoperative pancreatic cancer using FFPE tissues and immunohistochemically demonstrated high Nm23/NDPK-A expression to be a useful prognostic marker for pancreatic cancer.
Proteomics; Prognostic biomarker; Formalin-fixed paraffin-embedded (FFPE); Laser micro-dissection (LMD); Liquid chromatography-tandem mass spectrometry (LC-MS/MS); Nm23/Nucleoside Diphosphate Kinase A (NDPK-A)
Gene co-expression, in the form of a correlation coefficient, has been valuable in the analysis, classification and prediction of protein-protein interactions. However, it is susceptible to bias from a few samples having a large effect on the correlation coefficient. Gene co-expression stability is a means of quantifying this bias, with high stability indicating robust, unbiased co-expression correlation coefficients. We assess the utility of gene co-expression stability as an additional measure to support the co-expression correlation in the analysis of protein-protein interaction networks.
We studied the patterns of co-expression correlation and stability in interacting proteins with respect to their interaction promiscuity, levels of intrinsic disorder, and essentiality or disease-relatedness. Co-expression stability, along with co-expression correlation, acts as a better classifier of hub proteins in interaction networks than co-expression correlation alone, enabling the identification of a class of hubs that are functionally distinct from the widely accepted transient (date) and obligate (party) hubs. Proteins with high levels of intrinsic disorder have low co-expression correlation and high stability with their interaction partners, suggesting their involvement in transient interactions, except for a small group that have high co-expression correlation and are typically subunits of stable complexes. Similar behavior was seen for disease-related and essential genes. Interacting proteins that are both disordered have higher co-expression stability than ordered protein pairs. Using co-expression correlation and stability, we found that transient interactions are more likely to occur between an ordered and a disordered protein, while obligate interactions primarily occur between proteins that are either both ordered or both disordered.
We observe that co-expression stability shows distinct patterns in structurally and functionally different groups of proteins and interactions. We conclude that it is a useful and important measure to be used in concert with gene co-expression correlation for further insights into the characteristics of proteins in the context of their interaction network.
ATTED-II (http://atted.jp) is a gene coexpression database for a wide variety of experimental designs, such as prioritizations of genes for functional identification and analyses of the regulatory relationships among genes. Here, we report updates of ATTED-II focusing on two new features: condition-specific coexpression and homologous coexpression with rice. To analyze a broad range of biological phenomena, it is important to collect data under many diverse experimental conditions, but the meaning of coexpression can become ambiguous under these conditions. One approach to overcome this difficulty is to calculate the coexpression for each set of conditions with a clear biological meaning. With this viewpoint, we prepared five sets of experimental conditions (tissue, abiotic stress, biotic stress, hormones and light conditions), and users can evaluate the coexpression by employing comparative gene lists and switchable gene networks. We also developed an interactive visualization system, using the Cytoscape web system, to improve the network representation. As the second update, rice coexpression is now available. The previous version of ATTED-II was specifically developed for Arabidopsis, and thus coexpression analyses for other useful plants have been difficult. To solve this problem, we extended ATTED-II by including comparison tables between Arabidopsis and rice. This representation will make it possible to analyze the conservation of coexpression among flowering plants. With the ability to investigate condition-specific coexpression and species conservation, ATTED-II can help researchers to clarify the functional and regulatory networks of genes in a broad array of plant species.
Arabidopsis; Comparative transcriptomics; Database; Gene coexpression; Gene network; Rice
Publicly available databases of coexpressed gene sets are a valuable resource for a wide variety of experimental studies, including gene targeting for functional identification, and for investigations of regulatory mechanisms or protein–protein interaction networks. Although coexpressed gene databases are becoming more and more popular in the field of plant biology, those with animal data are rather limited, possibly due to the lower reliability of the coexpression data. The original COXPRESdb (coexpressed gene database) (http://coxpresdb.jp) represented the coexpression relationship for human and mouse. Here, we report updates of this database that especially focus on the enhancement of the reliability of gene coexpression data in animals. For this purpose, we implemented a new comparable coexpression measure, Mutual Rank, included five other animal species, rat, chicken, zebrafish, fly and nematoda, to assess the conservation of coexpression, and added different layers of omics data into the integrated network of genes. Comparison of coexpression is a key concept to enhance the reliability of gene coexpression, and the integration of different information can reduce the noise inherent in the information. With the functions for gene network representation, COXPRESdb can help researchers to clarify the functional and regulatory networks of genes in a broad array of animal species.
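The abstract names the Mutual Rank measure without defining it. A minimal sketch follows, assuming the commonly used definition: the geometric mean of the rank of gene B in gene A's coexpression list and the rank of gene A in gene B's list. The gene labels, correlation values and function names are hypothetical.

```python
from math import sqrt


def rank_of(corr, source, target):
    """Rank of `target` among `source`'s partners, by descending correlation (1 = best)."""
    partners = sorted(
        (gene for gene in corr[source] if gene != source),
        key=lambda gene: corr[source][gene],
        reverse=True,
    )
    return partners.index(target) + 1


def mutual_rank(corr, gene_a, gene_b):
    """Geometric mean of the two directional ranks (assumed definition)."""
    return sqrt(rank_of(corr, gene_a, gene_b) * rank_of(corr, gene_b, gene_a))


# Hypothetical symmetric correlation table for four genes.
corr = {
    "A": {"B": 0.90, "C": 0.50, "D": 0.10},
    "B": {"A": 0.90, "C": 0.95, "D": 0.20},
    "C": {"A": 0.50, "B": 0.95, "D": 0.30},
    "D": {"A": 0.10, "B": 0.20, "C": 0.30},
}

# B is A's best partner, but A is only B's second-best partner.
mr_ab = mutual_rank(corr, "A", "B")

# B and C are each other's best partner.
mr_bc = mutual_rank(corr, "B", "C")
```

Because ranks rather than raw correlation values are compared, the score is symmetric and does not depend on the overall scale of correlations in a given data set, which is what makes it comparable across species.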
Most proteins from higher organisms are known to be multi-domain proteins and contain substantial numbers of intrinsically disordered (ID) regions. To analyse such protein sequences, those from human for instance, we developed a special protein-structure-prediction pipeline and accumulated the products in the Structure Atlas of Human Genome (SAHG) database at http://bird.cbrc.jp/sahg. With the pipeline, human proteins were examined by local alignment methods (BLAST, PSI-BLAST and Smith–Waterman profile–profile alignment), global–local alignment methods (FORTE) and prediction tools for ID regions (POODLE-S) and homology modeling (MODELLER). Conformational changes of protein models upon ligand-binding were predicted by simultaneous modeling using templates of apo and holo forms. When there were no suitable templates for holo forms and the apo models were accurate, we prepared holo models using prediction methods for ligand-binding (eF-seek) and conformational change (the elastic network model and the linear response theory). Models are displayed as animated images. As of July 2010, SAHG contains 42 581 protein-domain models in approximately 24 900 unique human protein sequences from the RefSeq database. Annotation of models with functional information and links to other databases such as EzCatDB, InterPro or HPRD are also provided to facilitate understanding the protein structure-function relationships.
Motivation: The identification of putative ligand-binding sites on proteins is important for the prediction of protein function. Knowledge-based approaches using structure databases have become interesting, because of the recent increase in structural information. Approaches using binding motif information are particularly effective. However, they can only be applied to well-known ligands that frequently appear in the structure databases.
Results: We have developed a new method for predicting the binding sites of chemically diverse ligands, by using information about the interactions between fragments. The selection of the fragment size is important. If the fragments are too small, then the patterns derived from the binding motifs cannot be used, since they are many-body interactions, while using larger fragments limits the application to well-known ligands. In our method, we used the main and side chains for proteins, and three successive atoms for ligands, as fragments. After superposition of the fragments, our method builds the conformations of ligands and predicts the binding sites. As a result, our method could accurately predict the binding sites of chemically diverse ligands, even though the Protein Data Bank currently contains a large number of nucleotides. Moreover, a further evaluation for the unbound forms of proteins revealed that our building up procedure was robust to conformational changes induced by ligand binding.
Availability: Our method, named ‘BUMBLE’, is available at http://bumble.hgc.jp/
Supplementary information: Supplementary Material is available at Bioinformatics online.
Hubs are proteins with a large number of interactions in a protein-protein interaction network. They are the principal agents in the interaction network and affect its function and stability. Their specific recognition of many different protein partners is of great interest from the structural viewpoint. Over the last few years, the structural properties of hubs have been extensively studied. We review the currently known features that are particular to hubs, possibly affecting their binding ability. Specifically, we look at the levels of intrinsic disorder, surface charge and domain distribution in hubs, as compared to non-hubs, along with differences in their functional domains.
protein-protein interactions; interaction networks; hubs; promiscuous binding
Protein–protein docking simulations can provide predicted structural models of complexes. In a docking simulation, several putative structural models are selected by scoring functions from an ensemble of many complex models. Scoring functions based on statistical analyses of heterodimers are usually designed to select, as the correct model, the complex model with the most abundant interaction mode found among the known complexes. However, because the formation schemes of heterodimers are extremely diverse, a single scoring function does not seem to be sufficient to describe the fitness of predicted models with interaction modes other than the most abundant one. Thus, it is necessary to classify the heterodimers in terms of their individual interaction modes, and then to construct a separate scoring function for each heterodimer type. In this study, we constructed a classification method for heterodimers based on the characteristics that discriminate between near-native and decoy models, which were found by comparing the interfaces in terms of the complementarities for the hydrophobicity, the electrostatic potential and the shape. Consequently, we found four heterodimer clusters, and then constructed multiple scoring functions, each of which was optimized for one cluster. Our multiple scoring functions were applied to predictions in unbound docking.
classification of heterodimers; prediction of complex structures; scoring functions; protein-protein docking; CAPRI
Information regarding gene coexpression is useful to predict gene function. Several databases have been constructed for gene coexpression in model organisms based on a large amount of publicly available gene expression data measured by GeneChip platforms. In these databases, Pearson's correlation coefficients (PCCs) of gene expression patterns are widely used as a measure of gene coexpression. Although the coexpression measure or GeneChip summarization method affects the performance of the gene coexpression database, previous studies for these calculation procedures were tested with only a small number of samples and a particular species. To evaluate the effectiveness of coexpression measures, assessments with large-scale microarray data are required. We first examined characteristics of PCC and found that the optimal PCC threshold to retrieve functionally related genes was affected by the method of gene expression database construction and the target gene function. In addition, we found that this problem could be overcome when we used correlation ranks instead of correlation values. This observation was evaluated by large-scale gene expression data for four species: Arabidopsis, human, mouse and rat.
gene coexpression; Pearson's correlation coefficient; GeneChip summarization; Arabidopsis
Background: Recent improvements in DNA microarray techniques have made a large variety of gene expression data available in public databases. These data can be used to evaluate the strength of gene coexpression by calculating the correlation of expression patterns among different genes across many experiments. However, gene expression levels differ significantly across tissues in higher organisms, as well as across cellular locations and cell states in eukaryotes. Thus, the usual correlation measure can only evaluate the difference of tissues or cellular localizations, and cannot adequately elucidate the functional relationship from the coexpression of genes.
Method: We propose a new measure of coexpression by expanding the generally used correlation into a multidimensional one. We used principal component analyses to identify the major factors of gene expression correlation, and then recalculated the correlation after subtracting the major components in order to remove biases caused by a few experiments. The repeated subtractions of the major components yielded a set of correlation values for each pair of genes. We observed the correlation changes when the first ten principal components were subtracted step-by-step in large-scale Arabidopsis expression data.
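The idea of recalculating a correlation after subtracting major components can be sketched as follows. This is an illustrative reading of the method, not the authors' implementation: the use of power iteration to find the leading component, the per-gene centering, the function names and the toy data are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient (assumes non-constant profiles)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def leading_axis(matrix, iterations=100):
    """Unit vector over experiments capturing the most variance (power iteration on X^T X)."""
    n_cols = len(matrix[0])
    v = [float(j + 1) for j in range(n_cols)]  # non-uniform start vector
    for _ in range(iterations):
        scores = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        w = [sum(row[j] * s for row, s in zip(matrix, scores)) for j in range(n_cols)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def subtract_leading_component(matrix):
    """Center each gene profile, then remove its projection onto the leading axis."""
    centered = [[x - sum(row) / len(row) for x in row] for row in matrix]
    v = leading_axis(centered)
    residual = []
    for row in centered:
        score = sum(a * b for a, b in zip(row, v))
        residual.append([a - score * b for a, b in zip(row, v)])
    return residual


def correlation_series(matrix, i, j, n_components):
    """Correlation of genes i and j after removing 0, 1, ..., n_components components."""
    series = [pearson(matrix[i], matrix[j])]
    current = matrix
    for _ in range(n_components):
        current = subtract_leading_component(current)
        series.append(pearson(current[i], current[j]))
    return series


# Toy data: a dominant factor (e.g. a tissue difference) shared by all genes,
# plus a smaller factor with the same sign in genes 0 and 2 and the opposite sign in gene 1.
expression = [
    [6.0, 4.0, 5.0, -4.0, -6.0, -5.0],
    [4.0, 6.0, 5.0, -6.0, -4.0, -5.0],
    [7.0, 3.0, 5.0, -3.0, -7.0, -5.0],
    [3.0, 7.0, 5.0, -7.0, -3.0, -5.0],
]

fragile = correlation_series(expression, 0, 1, n_components=1)  # high, then strongly negative
stable = correlation_series(expression, 0, 2, n_components=1)   # high before and after
```

In the toy data, genes 0 and 1 look coexpressed only because of the dominant shared factor, so their correlation collapses once that factor is removed, whereas genes 0 and 2 remain correlated.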
Results: We found two extreme patterns of correlation changes, corresponding to stable and fragile coexpression. Our new indexes provided a good means to determine the functional relationships of genes, as shown by a few examples, and gave higher performance in Gene Ontology term prediction when the multidimensional correlation was used with a support vector machine.
Availability: The results are available from the expression detail pages in ATTED-II (http://atted.jp).
Supplementary information: Supplementary data are available at Bioinformatics online.
A method was constructed to discriminate between biologically relevant interfaces and artificial crystal-packing contacts in crystal structures. The method evaluates protein-protein interfaces in terms of complementarities for hydrophobicity, electrostatic potential and shape on the protein surfaces, and chooses the most probable biological interfaces among all possible contacts in the crystal. The method uses a discriminator named “COMP”, which is a linear combination of the complementarities for the above three surface features and does not correlate with the contact area. The discrimination of homo-dimer interfaces from symmetry-related crystal-packing contacts based on the COMP value achieved a modest success rate. Subsequent detailed review of the discrimination results raised the success rate to about 88.8%. In addition, our discrimination method yielded some clues for understanding the interaction patterns in several examples in the PDB. Thus, the COMP discriminator can also be used as an indicator of the “biological-ness” of protein-protein interfaces.
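The structure of such a discriminator, a linear combination of three complementarity terms used to pick the most probable interface, can be sketched as below. The weights, the complementarity values, the contact names and the assumption that a higher score means a more probable biological interface are all hypothetical; the abstract does not give them.

```python
def comp_score(c_hydrophobicity, c_electrostatic, c_shape, weights=(1.0, 1.0, 1.0)):
    """Linear combination of three complementarity terms (placeholder weights)."""
    w_hyd, w_ele, w_shape = weights
    return w_hyd * c_hydrophobicity + w_ele * c_electrostatic + w_shape * c_shape


def most_probable_interface(contacts, weights=(1.0, 1.0, 1.0)):
    """Return the contact with the highest combined score (assumed: higher is better)."""
    return max(contacts, key=lambda name: comp_score(*contacts[name], weights=weights))


# Hypothetical crystal contacts: (hydrophobicity, electrostatic, shape) complementarities.
contacts = {
    "contact_1": (0.8, 0.6, 0.7),
    "contact_2": (0.3, 0.2, 0.4),
    "contact_3": (0.5, 0.1, 0.6),
}

best = most_probable_interface(contacts)
```

Note that contact area does not enter the score, in line with the statement that COMP does not correlate with the contact area.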
protein-protein interaction; complementarity analysis; homo-dimer interface; crystal-packing contact; biological interfaces
ATTED-II (http://atted.jp) is a database of gene coexpression in Arabidopsis that can be used to design a wide variety of experiments, including the prioritization of genes for functional identification or for studies of regulatory relationships. Here, we report updates of ATTED-II that focus especially on functionalities for constructing gene networks with regard to the following points: (i) introducing a new measure of gene coexpression to retrieve functionally related genes more accurately, (ii) implementing clickable maps for all gene networks for step-by-step navigation, (iii) applying Google Maps API to create a single map for a large network, (iv) including information about protein–protein interactions, (v) identifying conserved patterns of coexpression and (vi) showing and connecting KEGG pathway information to identify functional modules. With these enhanced functions for gene network representation, ATTED-II can help researchers to clarify the functional and regulatory networks of genes in Arabidopsis.
The vast accumulation of protein structural data has now facilitated the observation of many different complexes in the PDB for the same protein. Therefore, a single protein complex is not sufficient to identify their interaction sites, especially for proteins with multiple binding states or different partners, such as hub proteins. PiSite is a database that provides protein–protein interaction sites at the residue level with consideration of multiple complexes at the same time, by mapping the binding sites of all complexes containing the same protein in the PDB. PiSite provides easy web interfaces with an interactive viewer working with typical web browsers, and the different binding modes can be checked visually. All of the information can also be downloaded for further analyses. In addition, PiSite provides a list of proteins with multiple binding partners and multiple binding states, as well as up-to-date statistics of protein–protein interfaces. PiSite is available at http://pisite.hgc.jp
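Mapping the binding sites of all complexes that contain the same protein amounts to merging residue-level interface annotations across structures. The sketch below illustrates that idea only; the record layout, the complex identifiers and the partner names are hypothetical and do not reflect the PiSite data format.

```python
from collections import defaultdict


def merge_binding_sites(complexes):
    """Map each interface residue to the set of partners it contacts in any complex.

    `complexes` is an iterable of (complex_id, partner_name, residue_numbers).
    """
    partners_by_residue = defaultdict(set)
    for _complex_id, partner, residues in complexes:
        for residue in residues:
            partners_by_residue[residue].add(partner)
    return dict(partners_by_residue)


# Hypothetical observations of one protein in three different complexes.
observations = [
    ("complex_1", "partner_X", [12, 45, 46]),
    ("complex_2", "partner_Y", [45, 46, 80]),
    ("complex_3", "partner_X", [12, 13]),
]

merged = merge_binding_sites(observations)

# Residues contacted by more than one partner hint at shared or overlapping binding sites.
shared = sorted(r for r, partners in merged.items() if len(partners) > 1)
```

A single complex would report only part of this picture, which is why considering multiple complexes at once matters for proteins with several partners, such as hubs.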
Interspecies sequence comparison is a powerful tool to extract functional or evolutionary information from the genomes of organisms. A number of studies have compared protein sequences or promoter sequences between mammals, which provided many insights into genomics. However, the correlation between protein conservation and promoter conservation remains controversial.
We examined promoter conservation as well as protein conservation for 6,901 human and mouse orthologous genes, and observed a very weak correlation between them. We further investigated their relationship by decomposing it based on functional categories, and identified categories with significant tendencies. Remarkably, the 'ribosome' category showed significantly low promoter conservation, despite its high protein conservation, and the 'extracellular matrix' category showed significantly high promoter conservation, in spite of its low protein conservation.
Our results show the relation of gene function to protein conservation and promoter conservation, and revealed that there seem to be nonparallel components between protein and promoter sequence evolution.
A database of coexpressed gene sets can provide valuable information for a wide variety of experimental designs, such as targeting of genes for functional identification, gene regulation and/or protein–protein interactions. Coexpressed gene databases derived from publicly available GeneChip data are widely used in Arabidopsis research, but platforms that examine coexpression for higher mammals are rather limited. Therefore, we have constructed a new database, COXPRESdb (coexpressed gene database) (http://coxpresdb.hgc.jp), for coexpressed gene lists and networks in human and mouse. Coexpression data could be calculated for 19 777 and 21 036 genes in human and mouse, respectively, by using the GeneChip data in NCBI GEO. COXPRESdb enables analysis of the four types of coexpression networks: (i) highly coexpressed genes for every gene, (ii) genes with the same GO annotation, (iii) genes expressed in the same tissue and (iv) user-defined gene sets. When the networks became too big for the static picture on the web in GO networks or in tissue networks, we used Google Maps API to visualize them interactively. COXPRESdb also provides a view to compare the human and mouse coexpression patterns to estimate the conservation between the two species.
PrDOS is a server that predicts the disordered regions of a protein from its amino acid sequence (http://prdos.hgc.jp). The server accepts a single protein amino acid sequence, in either plain text or FASTA format. The prediction system is composed of two predictors: a predictor based on local amino acid sequence information and one based on template proteins. The server combines the results of the two predictors and returns a two-state prediction (order/disorder) and a disorder probability for each residue. The prediction results are sent by e-mail, and the server also provides a web-interface to check the results.
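The way two per-residue predictors can be combined into a disorder probability and a two-state call is sketched below. The abstract does not state how PrDOS combines its predictors, so the weighted average, the equal weights and the 0.5 threshold are assumptions, and the probabilities are made up.

```python
def combine_predictions(local_probs, template_probs, weight=0.5, threshold=0.5):
    """Combine two per-residue disorder probabilities into (probability, state) pairs.

    The weighted average and the threshold are illustrative assumptions.
    """
    if len(local_probs) != len(template_probs):
        raise ValueError("Both predictors must score every residue.")
    results = []
    for p_local, p_template in zip(local_probs, template_probs):
        probability = weight * p_local + (1.0 - weight) * p_template
        state = "disorder" if probability >= threshold else "order"
        results.append((probability, state))
    return results


# Made-up probabilities for a three-residue sequence.
local_scores = [0.9, 0.2, 0.6]      # predictor based on local sequence information
template_scores = [0.7, 0.1, 0.2]   # predictor based on template proteins

per_residue = combine_predictions(local_scores, template_scores)
states = [state for _probability, state in per_residue]
```

Returning both values per residue mirrors the server output described above: a two-state prediction for quick reading and a probability for users who prefer to choose their own cutoff.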
We have developed a method to predict ligand-binding sites in a new protein structure by searching for similar binding sites in the Protein Data Bank (PDB). The similarities are measured according to the shapes of the molecular surfaces and their electrostatic potentials. A new web server, eF-seek, provides an interface to our search method. It simply requires a coordinate file in the PDB format, and generates a prediction result as a virtual complex structure, with the putative ligands in a PDB format file as the output. In addition, the predicted interacting interface is displayed to facilitate the examination of the virtual complex structure on our own applet viewer with the web browser (URL: http://eF-site.hgc.jp/eF-seek).
A publicly available database of co-expressed gene sets would be a valuable tool for a wide variety of experimental designs, including targeting of genes for functional identification or for regulatory investigation. Here, we report the construction of an Arabidopsis thaliana trans-factor and cis-element prediction database (ATTED-II) that provides co-regulated gene relationships based on co-expressed genes deduced from microarray data and the predicted cis elements. ATTED-II includes the following features: (i) lists and networks of co-expressed genes calculated from 58 publicly available experimental series, which are composed of 1388 GeneChip data in A. thaliana; (ii) prediction of cis-regulatory elements in the 200 bp region upstream of the transcription start site to predict co-regulated genes amongst the co-expressed genes; and (iii) visual representation of expression patterns for individual genes. ATTED-II can thus help researchers to clarify the function and regulation of particular genes and gene networks.
PreBI is a server that predicts biological interfaces in protein crystal structures, according to the complementarity and the area of the interface. The server accepts a coordinate file in the PDB format, and all of the possible interfaces are generated automatically, according to the symmetry operations given in the coordinate file. For all of the interfaces generated, the complementarities of the electrostatic potential, hydrophobicity and shape of the interfaces are analyzed, and the most probable biological interface is identified according to the combination of the degree of complementarity derived from the database analyses and the area of the interface. The results can be checked through an interactive viewer, and the most probable complex can be downloaded as atomic coordinates in the PDB format. PreBI is available at .
The high similarity of tunicates and vertebrates during their development, coupled with the transparency of tunicate larvae, their well-studied cell lineages and the availability of simple and efficient transgenesis methods, makes this subphylum an ideal system for the investigation of vertebrate physiological and developmental processes. Recently, the sequencing of two different Ciona genomes has led to the identification of numerous genes. In order to better understand the regulation of these genes, a database was created containing information on the regulation of tunicate genes collected from the literature. It includes, for instance, information regarding the minimal promoter length, the transcription factors involved and their binding sites, as well as the localization of the gene expression. Additionally, binding sites for characterized transcription factors were predicted based on published in vitro recognition sites. Comparison of the promoters of homologous genes in different species is also provided to allow identification of conserved cis elements. At the time of writing, information about 184 promoters, containing 73 identified binding sites and >2000 newly predicted binding sites, is available. This database is accessible at .