Many segmentation techniques have been published, and some of them have been widely used in different application problems. Most of these segmentation techniques have been motivated by specific application purposes. Unsupervised methods, which do not assume that any prior scene knowledge can be learned to help the segmentation process, are obviously more challenging than supervised ones. In this paper, we present an unsupervised strategy for biomedical image segmentation using an algorithm based on recursively applying mean shift filtering, where entropy is used as a stopping criterion. This strategy is tested on many real images, and a comparison is carried out with manual segmentation. With the proposed strategy, errors of less than 20% for false positives and 0% for false negatives are obtained.
segmentation; mean shift; unsupervised segmentation; entropy
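The recursive scheme described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a flattened list of 8-bit gray levels, applies mean shift in the intensity (range) domain only, and stops when the Shannon entropy changes by less than a tolerance. The function names, the `bandwidth` value, and the tolerance `tol` are hypothetical choices made for illustration.

```python
import math

def entropy(pixels, levels=256):
    """Shannon entropy (in bits) of the gray-level histogram."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in hist if c)

def mean_shift_filter(pixels, bandwidth=16, levels=256, max_iter=50):
    """Range-only mean shift with a flat kernel (a simplification).

    Each gray level climbs to the histogram-weighted mean of the levels
    within +/- bandwidth until it stops moving; pixels are then remapped.
    """
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    lut = []
    for level in range(levels):
        mode = float(level)
        for _ in range(max_iter):
            lo = max(0, math.ceil(mode - bandwidth))
            hi = min(levels - 1, math.floor(mode + bandwidth))
            total = sum(hist[lo:hi + 1])
            if total == 0:
                break  # no pixels nearby; leave this level where it is
            new_mode = sum(g * hist[g] for g in range(lo, hi + 1)) / total
            converged = abs(new_mode - mode) < 1e-3
            mode = new_mode
            if converged:
                break
        lut.append(int(round(mode)))
    return [lut[v] for v in pixels]

def recursive_segmentation(pixels, bandwidth=16, tol=1e-3, max_rounds=20):
    """Reapply the filter until the entropy stops changing (assumed criterion)."""
    current = list(pixels)
    previous_entropy = entropy(current)
    for _ in range(max_rounds):
        current = mean_shift_filter(current, bandwidth)
        new_entropy = entropy(current)
        if abs(previous_entropy - new_entropy) < tol:
            break
        previous_entropy = new_entropy
    return current
```

Because each pass maps gray levels through a lookup table, the histogram entropy can only stay the same or decrease, which is what makes it usable as a stopping signal in this simplified setting. A full implementation would also include the spatial domain of the mean shift kernel.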
In recent years, due to the vital need for novel fungicidal agents, investigation of natural antifungal resources has increased. The special features exhibited by neural network classifiers make them suitable for handling complex problems like analyzing different properties of candidate compounds in computer-aided drug design. In this study, by using a Levenberg–Marquardt (LM) neural network (the fastest of the training algorithms), the relation between some important thermodynamic and physico-chemical properties of coumarin compounds and their biological activities (tested against Candida albicans) has been evaluated. A set of already reported antifungal bioactive coumarins and some well-known physical descriptors were selected, and the LM training algorithm was used to design the best neural model architecture for forecasting new bioactive compounds.
Levenberg–Marquardt algorithm; coumarin; neural network
This paper discusses cyberinformation studies of the amino acid composition of insulin, in particular the identification of scientific terminology that could describe this phenomenon, i.e., the study of genetic information, as well as the relationship between the genetic language of proteins and theoretical aspects of this system and cybernetics. The results of this research show that there is a matrix code for insulin. It also shows that the coding system within the amino acid language gives detailed information, not only on the amino acid “record”, but also on its structure, configuration, and various shapes. The issue of the existence of an insulin code and coding of the individual structural elements of this protein are discussed. Answers to the following questions are sought. Does the matrix mechanism for biosynthesis of this protein function within the law of the general theory of information systems, and what is the significance of this for understanding the genetic language of insulin? What is the essence of existence and functioning of this language? Is the genetic information characterized only by biochemical principles or is it also characterized by cyberinformation principles? The potential effects of physical and chemical, as well as cybernetic and information principles, on the biochemical basis of insulin are also investigated. This paper discusses new methods for developing genetic technologies, in particular more advanced digital technology based on programming, cybernetics, and informational laws and systems, and how this new technology could be useful in medicine, bioinformatics, genetics, biochemistry, and other natural sciences.
human insulin; insulin model; biocode; genetic code; amino acids
Chronic hepatitis C (CHC) patients often stop pursuing interferon-alfa and ribavirin (IFN-alfa/RBV) treatment because of the high cost and associated adverse effects. It is highly desirable, both clinically and economically, to establish tools to distinguish responders from nonresponders and to predict possible outcomes of the IFN-alfa/RBV treatments. Single nucleotide polymorphisms (SNPs) can be used to understand the relationship between genetic inheritance and IFN-alfa/RBV therapeutic response. The aim in this study was to establish a predictive model based on a pharmacogenomic approach. Our study population comprised Taiwanese patients with CHC who were recruited from multiple sites in Taiwan. The genotyping data was generated in the high-throughput genomics lab of Vita Genomics, Inc. With the wrapper-based feature selection approach, we employed multilayer feedforward neural network (MFNN) and logistic regression as a basis for comparisons. Our data revealed that the MFNN models were superior to the logistic regression model. The MFNN approach provides an efficient way to develop a tool for distinguishing responders from nonresponders prior to treatments. Our preliminary results demonstrated that the MFNN algorithm is effective for deriving models for pharmacogenomics studies and for providing the link from clinical factors such as SNPs to the responsiveness of IFN-alfa/RBV in clinical association studies in pharmacogenomics.
chronic hepatitis C; artificial neural networks; interferon; pharmacogenomics; ribavirin; single nucleotide polymorphisms
We explore, using the Crh protein dimer as a model, how information from solution NMR, solid-state NMR and X-ray crystallography can be combined using structural bioinformatics methods, in order to get insights into the transition from solution to crystal. Using solid-state NMR chemical shifts, we filtered intra-monomer NMR distance restraints in order to keep only the restraints valid in the solid state. These filtered restraints were added to solid-state NMR restraints recorded on the dimer state to sample the conformational landscape explored during the oligomerization process. The use of non-crystallographic symmetries then permitted the extraction of converged conformer subsets. Ensembles of NMR and crystallographic conformers calculated independently display similar variability in monomer orientation, which supports a funnel shape for the conformational space explored during the solution-crystal transition. Insights into alternative conformations possibly sampled during oligomerization were obtained by analyzing the relative orientation of the two monomers, according to the restraint precision. Molecular dynamics simulations of Crh confirmed the tendencies observed in NMR conformers, such as a paradoxical increase in the distance between the two β1a strands when the structure gets closer to the crystallographic structure, and the role of water bridges in this context.
structural bioinformatics; NMR structure calculation; ARIA; non-crystallographic symmetry; crystallographic ensemble refinement; molecular dynamics simulation
In this study we used a Random Forest-based approach for an assignment of small guanosine triphosphate proteins (GTPases) to specific subgroups. Small GTPases represent an important functional group of proteins that serve as molecular switches in a wide range of fundamental cellular processes, including intracellular transport, movement and signaling events. These proteins have further gained a special emphasis in cancer research, because within the last decades a huge variety of small GTPases from different subgroups could be related to the development of all types of tumors. Using a random forest approach, we were able to identify the most important amino acid positions for the classification process within the small GTPases superfamily and its subgroups. These positions are in line with the results of earlier studies and have been shown to be the essential elements for the different functionalities of the GTPase families. Furthermore, we provide an accurate and reliable software tool (GTPasePred) to identify potential novel GTPases and demonstrate its application to genome sequences.
cancer; machine learning; classification; Random Forests; proteins
To construct and optimize a neural network that is capable of predicting the occurrence of recurrent aphthous ulceration (RAU) based on a set of appropriate input data.
Participants and methods
Artificial neural network (ANN) software employing genetic algorithms to optimize the architecture of the neural networks was used. Input and output data of 86 participants (predisposing factors and status of the participants with regard to recurrent aphthous ulceration) were used to construct and train the neural networks. The optimized neural networks were then tested using untrained data of a further 10 participants.
The optimized neural network, which produced the most accurate predictions for the presence or absence of recurrent aphthous ulceration was found to employ: gender, hematological (with or without ferritin) and mycological data of the participants, frequency of tooth brushing, and consumption of vegetables and fruits.
Factors appearing to be related to recurrent aphthous ulceration and appropriate for use as input data to construct ANNs that predict recurrent aphthous ulceration were found to include the following: gender, hemoglobin, serum vitamin B12, serum ferritin, red cell folate, salivary candidal colony count, frequency of tooth brushing, and the number of fruits or vegetables consumed daily.
artificial neural networks; recurrent; aphthous ulceration; ulcer
Ca2+ ions have a range of affinities to different proteins, depending on the various functions of these proteins. This makes the determination of Ca2+-protein affinities an interesting subject for functional studies. We have investigated the performance of two methods – Fold-X and AutoDock Vina – in the prediction of Ca2+-protein affinities. Both methods, although based on different energy functions, showed virtually the same correlation with experimental affinities. Guided by insight from experiment, we further derived a simple linear model based on the solvent accessible surface of Ca2+ that had practically the same performance in terms of absolute errors as the more complex docking methods.
metal ions; binding; free energy; crystal structure; solvent accessible surface
Genotoxic stress is induced by a broad range of DNA-damaging agents and could lead to a variety of human diseases including cancer. DNA damage is also therapeutically induced for cancer treatment with the aim to eliminate tumor cells. However, the effectiveness of radio- and chemotherapy is strongly hampered by tumor cell resistance. A major reason for radio- and chemotherapeutic resistances is the simultaneous activation of cell survival pathways resulting in the activation of the transcription factor nuclear factor-kappa B (NF-κB). Here, we present a Boolean network model of the NF-κB signal transduction induced by genotoxic stress in epithelial cells. For the representation and analysis of the model, we used the formalism of logical interaction hypergraphs. Model reconstruction was based on a careful meta-analysis of published data. By calculating minimal intervention sets, we identified p53-induced protein with a death domain (PIDD), receptor-interacting protein 1 (RIP1), and protein inhibitor of activated STAT y (PIASy) as putative therapeutic targets to abrogate NF-κB activation resulting in apoptosis. Targeting these structures therapeutically may potentiate the effectiveness of radio- and chemotherapy. Thus, the presented model allows a better understanding of the signal transduction in tumor cells and provides candidates as new therapeutic target structures.
apoptosis; Boolean network; cancer therapy; DNA-damage response; NF-κB
In recent years, protein–protein interactions are becoming the object of increasing attention in many different fields, such as structural biology, molecular biology, systems biology, and drug discovery. From a structural biology perspective, it would be desirable to integrate current efforts into the structural proteomics programs. Given that experimental determination of many protein–protein complex structures is highly challenging, and in the context of current high-performance computational capabilities, different computer tools are being developed to help in this task. Among them, computational docking aims to predict the structure of a protein–protein complex starting from the atomic coordinates of its individual components, and in recent years, a growing number of docking approaches are being reported with increased predictive capabilities. The improvement of speed and accuracy of these docking methods, together with the modeling of the interaction networks that regulate the most critical processes in a living organism, will be essential for computational proteomics. The ultimate goal is the rational design of drugs capable of specifically inhibiting or modifying protein–protein interactions of therapeutic significance. While rational design of protein–protein interaction inhibitors is at its very early stage, the first results are promising.
protein-protein interactions; drug design; protein docking; structural prediction; virtual ligand screening; hot-spots
Protein–protein docking simulations can provide the predicted complex structural models. In a docking simulation, several putative structural models are selected by scoring functions from an ensemble of many complex models. Scoring functions based on statistical analyses of heterodimers are usually designed to select the complex model with the most abundant interaction mode found among the known complexes, as the correct model. However, because the formation schemes of heterodimers are extremely diverse, a single scoring function does not seem to be sufficient to describe the fitness of the predicted models other than the most abundant interaction mode. Thus, it is necessary to classify the heterodimers in terms of their individual interaction modes, and then to construct multiple scoring functions for each heterodimer type. In this study, we constructed the classification method of heterodimers based on the discriminative characters between near-native and decoy models, which were found in the comparison of the interfaces in terms of the complementarities for the hydrophobicity, the electrostatic potential and the shape. Consequently, we found four heterodimer clusters, and then constructed the multiple scoring functions, each of which was optimized for each cluster. Our multiple scoring functions were applied to the predictions in the unbound docking.
classification of heterodimers; prediction of complex structures; scoring functions; protein-protein docking; CAPRI
The large numbers of protein sequences generated by whole genome sequencing projects require rapid and accurate methods of annotation. The detection of homology through computational sequence analysis is a powerful tool in determining the complex evolutionary and functional relationships that exist between proteins. Homology search algorithms employ amino acid substitution matrices to detect similarity between protein sequences. The substitution matrices in common use today are constructed using sequences aligned without reference to protein structure. Here we present amino acid substitution matrices constructed from the alignment of a large number of protein domain structures from the structural classification of proteins (SCOP) database. We show that when incorporated into the homology search algorithms BLAST and PSI-BLAST, the structure-based substitution matrices enhance the efficacy of detecting remote homologs.
computational biology; protein homology; amino acid substitution matrix; protein structure
Multivariate partial least squares (PLS) regression allows the modeling of complex biological events, by considering different factors at the same time. It is unaffected by data collinearity, representing a valuable method for modeling high-dimensional biological data (as derived from genomics, proteomics and peptidomics). In the presence of multiple responses, it is of particular interest how to appropriately “dissect” the model, to reveal the importance of single attributes with regard to individual responses (for example, variable selection). In this paper, the performance of multivariate PLS regression coefficients in selecting relevant predictors for different responses in omics-type data was investigated by means of a receiver operating characteristic (ROC) analysis. For this purpose, simulated data, mimicking the covariance structures of microarray and liquid chromatography mass spectrometric data, were used to generate matrices of predictors and responses. The relevant predictors were set a priori. The influences of noise, the source of data with different covariance structure and the size of relevant predictors were investigated. Results demonstrate the applicability of PLS regression coefficients in selecting variables for each response of a multivariate PLS, in omics-type data. Comparisons with other feature selection methods, such as variable importance in the projection scores, principal component regression, and least absolute shrinkage and selection operator regression were also provided.
partial least square regression; regression coefficients; variable selection; biomarker discovery; omics-data
Identification of genes involved in the aging process is critical for understanding the mechanisms of age-dependent diseases such as cancer and diabetes. Measuring the lifespan of mutants, each missing one gene, is the traditional way to identify longevity genes. While such screening is impractical for the whole genome due to the time-consuming nature of lifespan assays, it can be achieved by in silico genetic manipulations with systems biology approaches. In this review, we will introduce pilot explorations applying two approaches of systems biology in aging studies. One approach is to predict the role of a specific gene in the aging process by comparing its expression profile and protein–protein interaction pattern with those of known longevity genes (top-down systems biology). The other approach is to construct mathematical models from previous kinetics data and predict how a specific protein contributes to aging and antiaging processes (bottom-up systems biology). These approaches allow researchers to simulate the effect of each gene’s product in aging by in silico genetic manipulations such as deletion or over-expression. Since simulation-based approaches are not as widely used as the other approaches, we will focus our review on this effort in more detail. A combination of hypotheses from data mining, in silico experimentation from simulations, and wet laboratory validation will make the systematic identification of all longevity genes possible.
systems biology; yeast; aging; in silico; genetic manipulation; modeling
Probabilistic DNA sequence models have been intensively applied to genome research. Within the evolutionary biology framework, this article investigates the feasibility of rigorously estimating the probability of a set of orthologous DNA sequences which evolve from a common progenitor. We propose Monte Carlo integration algorithms to sample the unknown ancestral and/or root sequences a posteriori conditional on a reference sequence and apply pairwise Needleman–Wunsch alignment between the sampled and nonreference species sequences to estimate the probability. We test our algorithms on both simulated and real sequences and compare calculated probabilities from Monte Carlo integration to those induced by single multiple alignment.
evolution; Jukes-Cantor model; Monte Carlo integration; Needleman-Wunsch alignment; orthologous
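The pairwise Needleman–Wunsch step mentioned in this abstract is a standard global alignment by dynamic programming. The sketch below computes only the optimal alignment score, using a linear gap penalty. The scoring values (`match`, `mismatch`, `gap`) are hypothetical defaults, not the parameters used by the authors, and the Monte Carlo sampling of ancestral sequences is not shown.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of sequences a and b.

    Keeps only the previous row of the dynamic programming table,
    so memory use is proportional to len(b).
    """
    # Row 0: aligning the empty prefix of a with prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a aligned against nothing
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]
```

In a Monte Carlo setting of the kind the abstract describes, a function like this would be called once per sampled sequence against each nonreference sequence, so the row-by-row formulation keeps the per-call cost low.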
Simple sequence repeats (SSRs) play important roles in gene regulation and genome evolution. Although there exist several online resources for SSR mining, most of them only extract general SSR patterns without providing functional information. Here, an online search tool, CG-SSR (Comparative Genomics SSR discovery), has been developed for discovering potential functional SSRs from vertebrate genomes through cross-species comparison. In addition to revealing SSR candidates in conserved regions among various species, it also combines accurate coordinate and functional genomics information. CG-SSR is the first comprehensive and efficient online tool for conserved SSR discovery.
microsatellites; genome; comparative genomics; functional SSR; gene ontology; conserved region
Bladder cancer is relatively common but early detection techniques such as cystoscopy and cytology are somewhat limited. We developed a broadly applicable, platform-independent and clinically relevant method based on simple ratios of gene expression to diagnose human cancers. In this study, we sought to determine whether this technique could be applied to the diagnosis of bladder cancer.
We developed a model for the diagnosis of bladder cancer using expression profiling data from 80 normal and tumor bladder tissues to identify statistically significant discriminating genes with reciprocal average expression levels in each tissue type. The expression levels of select genes were used to calculate individual gene pair expression ratios in order to assign diagnosis. The optimal model was examined in two additional published microarray data sets and using quantitative RT-PCR in a cohort of 13 frozen benign bladder urothelium samples and 13 bladder cancer samples from our institution.
A five-ratio test utilizing six genes proved to be 100% accurate (26 of 26 samples) for distinguishing benign from malignant bladder tissue samples (P < 10⁻⁶).
We have provided a proof of principle study for the use of gene expression ratios in the diagnosis of bladder cancer. This technique may ultimately prove to be a useful adjunct to cytopathology in screening urine specimens for bladder cancer.
bladder cancer; gene expression profiling; diagnosis
A system was developed to evaluate and predict the interaction between protein pairs by using the widely used shape complementarity search method as the algorithm for docking simulations between the proteins. We used this system, which we call the affinity evaluation and prediction (AEP) system, to evaluate the interaction between 20 protein pairs. The system first executes a “round robin” shape complementarity search of the target protein group, and evaluates the interaction between the complex structures obtained by the search. These complex structures are selected by using a statistical procedure that we developed called ‘grouping’. At a prevalence of 5.0%, our AEP system predicted protein–protein interactions with a 50.0% recall, 55.6% precision, 95.5% accuracy, and an F-measure of 0.526. By optimizing the grouping process, our AEP system successfully predicted 10 protein pairs (among 20 pairs) that were biologically relevant combinations. Our ultimate goal is to construct an affinity database that will provide cell biologists and drug designers with crucial information obtained using our AEP system.
protein-protein interaction; affinity analysis; protein-protein docking; FFT; massive parallel computing
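The performance figures quoted in the AEP abstract can be checked against each other with a short calculation. The confusion-matrix counts used below (10 true positives, 8 false positives, 10 false negatives, 372 true negatives over 400 evaluated combinations) are an assumption: they are one set of counts consistent with the reported prevalence, recall, precision, accuracy and F-measure, and are not stated in the source.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "prevalence": (tp + fn) / total,
        "recall": recall,
        "precision": precision,
        "accuracy": (tp + tn) / total,
        # F-measure is the harmonic mean of precision and recall.
        "f_measure": 2 * precision * recall / (precision + recall),
    }

# Assumed counts (not given in the abstract); chosen to reproduce its figures.
metrics = classification_metrics(tp=10, fp=8, fn=10, tn=372)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The high accuracy alongside moderate recall and precision reflects the low prevalence: with only 5% of combinations being true interactions, most of the accuracy comes from correctly rejected pairs.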
It is expected that different markers may show different patterns of association with different pathogenic variants within a given gene. It would be helpful to combine the evidence implicating association at the level of the whole gene rather than just for individual markers or haplotypes. Doing this is complicated by the fact that different markers do not represent independent sources of information.
We propose combining the p values from all single locus and/or multilocus analyses of different markers according to the formula of Fisher, X = ∑(−2 ln(p_i)), and then assessing the empirical significance of this statistic using permutation testing. We present an example application to 19 markers around the HTRA2 gene in a case-control study of Parkinson’s disease.
Applying our approach shows that, although some individual tests produce low p values, overall association at the level of the gene is not supported.
Approaches such as this should be more widely used in assimilating the overall evidence supporting involvement of a gene in a particular disease. Information can be combined from biallelic and multiallelic markers and from single markers along with multimarker analyses. Single genes can be tested or results from groups of genes involved in the same pathway could be combined in order to test biologically relevant hypotheses. The approach has been implemented in a computer program called COMBASSOC which is made available for downloading.
Fisher; significance; genetic marker
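The combination procedure in this abstract can be sketched as follows. This is not the COMBASSOC program: the single-marker test (a 1-df chi-square on binary carrier status) and all function names are hypothetical stand-ins for whatever per-marker analyses are actually used. The key point it illustrates is that permuting the case-control labels, rather than relying on the chi-square distribution of Fisher's statistic, preserves the correlation between markers.

```python
import math
import random

def fisher_statistic(p_values):
    """Fisher's combined statistic X = sum(-2 ln p_i)."""
    return sum(-2.0 * math.log(p) for p in p_values)

def marker_p_value(genotypes, labels):
    """Hypothetical single-marker test: 1-df chi-square on a 2x2 table
    of carrier status (0/1) against case-control label (0/1)."""
    table = [[0, 0], [0, 0]]
    for g, y in zip(genotypes, labels):
        table[y][g] += 1
    n = len(labels)
    row = [sum(table[0]), sum(table[1])]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df; the floor avoids log(0) later.
    return max(math.erfc(math.sqrt(chi2 / 2.0)), 1e-300)

def gene_level_p(markers, labels, n_perm=1000, seed=0):
    """Empirical significance of Fisher's statistic by label permutation."""
    rng = random.Random(seed)
    observed = fisher_statistic([marker_p_value(m, labels) for m in markers])
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # same permutation is applied to every marker
        stat = fisher_statistic([marker_p_value(m, shuffled) for m in markers])
        if stat >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

Here `markers` is a list of per-marker genotype lists and `labels` is the matching list of case-control indicators. Because every marker is re-tested under the same shuffled labels, markers in linkage disequilibrium remain correlated in the null distribution.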
A discrimination method between biologically relevant interfaces and artificial crystal-packing contacts in crystal structures was constructed. The method evaluates protein-protein interfaces in terms of complementarities for hydrophobicity, electrostatic potential and shape on the protein surfaces, and chooses the most probable biological interfaces among all possible contacts in the crystal. The method uses a discriminator named “COMP”, which is a linear combination of the complementarities for the above three surface features and does not correlate with the contact area. The discrimination of homo-dimer interfaces from symmetry-related crystal-packing contacts based on the COMP value achieved a modest success rate. Subsequent detailed review of the discrimination results raised the success rate to about 88.8%. In addition, our discrimination method yielded some clues for understanding the interaction patterns in several examples in the PDB. Thus, the COMP discriminator can also be used as an indicator of the “biological-ness” of protein-protein interfaces.
protein-protein interaction; complementarity analysis; homo-dimer interface; crystal-packing contact; biological interfaces
There is a need to identify the regulatory gene interaction of anticancer drugs on target cancer cells. Whole genome expression profiling offers promise in this regard, but can be complicated by the challenge of identifying the genes affected by hundreds to thousands of genes that induce changes in expression. A proteasome inhibitor, bortezomib, could be a potential therapeutic agent in treating adult T-cell leukemia (ATL) patients; however, the underlying mechanism by which bortezomib induces cell death in ATL cells via a gene regulatory network has not been fully elucidated. Here we show that a Bayesian statistical framework by VoyaGene® identified the secreted protein acidic and rich in cysteine (SPARC) gene, a tumor-invasiveness related gene, as a possible modulator of bortezomib-induced cell death in ATL cells. Functional analysis using RNAi experiments revealed that inhibition of the expression of SPARC by siRNA enhanced the apoptotic effect of bortezomib on ATL cells in accordance with an increase of cleaved caspase 3. Targeting SPARC may help to treat ATL patients in combination with bortezomib. This work shows that a network biology approach can be used advantageously to identify the genetic interaction related to anticancer effects.
network biology; adult T cell leukemia; bortezomib; SPARC
Mobile phone technology makes use of radio frequency (RF) electromagnetic fields transmitted through a dense network of base stations in Europe. Possible harmful effects of RF fields on humans and animals are discussed, but their effect on plants has received little attention. In search of physiological processes of plant cells sensitive to RF fields, cell suspension cultures of Arabidopsis thaliana were exposed for 24 h to an RF field protocol representing typical microwave exposure in an urban environment. mRNA of exposed cultures and controls was used to hybridize Affymetrix-ATH1 whole genome microarrays. Differential expression analysis revealed significant changes in transcription of 10 genes, but they did not exceed a fold change of 2.5. Apart from the fact that 3 of them are dark-inducible, their functions do not point to any known responses of plants to environmental stimuli. The changes in transcription of these genes were compared with published microarray datasets and revealed a weak similarity of the microwave to light treatment experiments. Considering the large changes described in published experiments, it is questionable if the small alterations caused by a 24 h continuous microwave exposure would have any impact on the growth and reproduction of whole plants.
suspension cultured plant cells; radio frequency electromagnetic fields; microarrays; Arabidopsis thaliana
The microtubule network, the major organelle of the eukaryotic cytoskeleton, is involved in cell division and differentiation as well as in many other cellular functions. In plants, microtubules seem to be involved in the ordered deposition of cellulose microfibrils by a so far unknown mechanism. Microtubule-associated proteins (MAP) typically contain various domains targeting or binding proteins with different functions to microtubules. Here we have investigated a proposed microtubule-targeting domain, TPX2, first identified in the Kinesin-like protein 2 in Xenopus. A TPX2-containing microtubule binding protein, PttMAP20, has been recently identified in poplar tissues undergoing xylogenesis. Furthermore, the herbicide 2,6-dichlorobenzonitrile (DCB), which is a known inhibitor of cellulose synthesis, was shown to bind specifically to PttMAP20. It is thus possible that PttMAP20 may have a role in coupling cellulose biosynthesis and the microtubular networks in poplar secondary cell walls. In order to get more insight into the occurrence, evolution and potential functions of TPX2-containing proteins we have carried out bioinformatic analysis for all genes so far found to encode TPX2 domains with special reference to poplar PttMAP20 and its putative orthologs in other plants.
TPX2 domain; MAP20; evolution; microtubule; cellulose; bioinformatics
Prion diseases are fatal neurodegenerative disorders that affect animals and humans. There is a need to gain understanding of prion disease pathogenesis and to develop diagnostic assays to detect prion diseases prior to the onset of clinical symptoms. The goal of this study was to identify genes that show altered expression early in the disease process in the spleen and brain of prion disease-infected mice. Using Affymetrix microarrays, we identified 67 genes that showed increased expression in the brains of prion disease-infected mice prior to the onset of clinical symptoms. These genes function in many cellular processes including immunity, the endosome/lysosome system, hormone activity, and the cytoskeleton. We confirmed a subset of these gene expression alterations using other methods and determined the time course in which these changes occur. We also identified 14 genes showing altered expression prior to the onset of clinical symptoms in spleens of prion disease-infected mice. Interestingly, four genes, Atp1b1, Gh, Anp32a, and Grn, were altered at the very early time of 46 days post-infection. These gene expression alterations provide insights into the molecular mechanisms underlying prion disease pathogenesis and may serve as surrogate markers for the early detection and diagnosis of prion disease.
prion disease; microarrays; gene expression
We examined procedures for combining two different in silico drug-screening results to achieve a high hit ratio. When the 3D structure of the target protein and some active compounds are known, both structure-based and ligand-based in silico screening methods can be applied. In the present study, the machine-learning score modification multiple target screening (MSM-MTS) method was adopted as a structure-based screening method, and the machine-learning docking score index (ML-DSI) method was adopted as a ligand-based screening method. To combine the compound sets predicted by these two screening methods, we examined the product of the sets (consensus set) and the sum of the sets. As a result, the consensus set achieved a higher hit ratio than the sum of the sets and than either individual predicted set. In addition, the current combination was shown to be robust to structural diversity, both across different crystal structures and across snapshot structures from molecular dynamics simulations.
in silico; screening; consensus score; protein-based screening; protein-ligand docking; conformation of active site
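The two combination rules compared in this abstract correspond to set intersection (the "product", or consensus set) and set union (the "sum"). The sketch below shows how the hit ratio of each can be computed. The compound identifiers and the set of known actives are invented toy data, and `hit_ratio` is a hypothetical helper, not part of MSM-MTS or ML-DSI.

```python
def hit_ratio(predicted, actives):
    """Fraction of predicted compounds that are known actives."""
    if not predicted:
        return 0.0
    return len(predicted & actives) / len(predicted)

# Toy data (hypothetical): compounds selected by each screening method.
structure_based = {"c1", "c2", "c3", "c4", "c5", "c6"}
ligand_based = {"c4", "c5", "c6", "c7", "c8", "c9"}
known_actives = {"c4", "c5", "c7"}

consensus_set = structure_based & ligand_based  # product of the sets
sum_set = structure_based | ligand_based        # sum of the sets

for name, selected in [
    ("structure-based", structure_based),
    ("ligand-based", ligand_based),
    ("consensus", consensus_set),
    ("sum", sum_set),
]:
    print(f"{name}: {len(selected)} compounds, hit ratio {hit_ratio(selected, known_actives):.2f}")
```

The intersection trades coverage for enrichment: it selects fewer compounds and can miss actives found by only one method (here `c7`), which is why the hit ratio, rather than the number of actives recovered, is the quantity being compared.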