Search tips
Search criteria

Results 1-14 (14)

Clipboard (0)
Year of Publication
1.  Polynomial Supertree Methods Revisited 
Advances in Bioinformatics  2011;2011:524182.
Supertree methods allow to reconstruct large phylogenetic trees by combining smaller trees with overlapping leaf sets into one, more comprehensive supertree. The most commonly used supertree method, matrix representation with parsimony (MRP), produces accurate supertrees but is rather slow due to the underlying hard optimization problem. In this paper, we present an extensive simulation study comparing the performance of MRP and the polynomial supertree methods MinCut Supertree, Modified MinCut Supertree, Build-with-distances, PhySIC, PhySIC_IST, and super distance matrix. We consider both quality and resolution of the reconstructed supertrees. Our findings illustrate the tradeoff between accuracy and running time in supertree construction, as well as the pros and cons of voting- and veto-based supertree approaches. Based on our results, we make some general suggestions for supertree methods yet to come.
PMCID: PMC3249592  PMID: 22229028
2.  GenSensor Suite: A Web-Based Tool for the Analysis of Gene and Protein Interactions, Pathways, and Regulation 
Advances in Bioinformatics  2011;2011:271563.
The GenSensor Suite consists of four web tools for elucidating relationships among genes and proteins. GenPath results show which biochemical, regulatory, or other gene set categories are over- or under-represented in an input list compared to a background list. All common gene sets are available for searching in GenPath, plus some specialized sets. Users can add custom background lists. GenInteract builds an interaction gene list from a single gene input and then analyzes this in GenPath. GenPubMed uses a PubMed query to identify a list of PubMed IDs, from which a gene list is extracted and queried in GenPath. GenViewer allows the user to query one gene set against another in GenPath. GenPath results are presented with relevant P- and q-values in an uncluttered, fully linked, and integrated table. Users can easily copy this table and paste it directly into a spreadsheet or document.
PMCID: PMC3238354  PMID: 22194743
3.  An Integrated Framework to Model Cellular Phenotype as a Component of Biochemical Networks 
Advances in Bioinformatics  2011;2011:608295.
Identification of regulatory molecules in signaling pathways is critical for understanding cellular behavior. Given the complexity of the transcriptional gene network, the relationship between molecular expression and phenotype is difficult to determine using reductionist experimental methods. Computational models provide the means to characterize regulatory mechanisms and predict phenotype in the context of gene networks. Integrating gene expression data with phenotypic data in transcriptional network models enables systematic identification of critical molecules in a biological network. We developed an approach based on fuzzy logic to model cell budding in Saccharomyces cerevisiae using time series expression microarray data of the cell cycle. Cell budding is a phenotype of viable cells undergoing division. Predicted interactions between gene expression and phenotype reflected known biological relationships. Dynamic simulation analysis reproduced the behavior of the yeast cell cycle and accurately identified genes and interactions which are essential for cell viability.
PMCID: PMC3235418  PMID: 22190923
4.  Generation and Analysis of Large-Scale Data-Driven Mycobacterium tuberculosis Functional Networks for Drug Target Identification 
Advances in Bioinformatics  2011;2011:801478.
Technological developments in large-scale biological experiments, coupled with bioinformatics tools, have opened the doors to computational approaches for the global analysis of whole genomes. This has provided the opportunity to look at genes within their context in the cell. The integration of vast amounts of data generated by these technologies provides a strategy for identifying potential drug targets within microbial pathogens, the causative agents of infectious diseases. As proteins are druggable targets, functional interaction networks between proteins are used to identify proteins essential to the survival, growth, and virulence of these microbial pathogens. Here we have integrated functional genomics data to generate functional interaction networks between Mycobacterium tuberculosis proteins and carried out computational analyses to dissect the functional interaction network produced for identifying drug targets using network topological properties. This study has provided the opportunity to expand the range of potential drug targets and to move towards optimal target-based strategies.
PMCID: PMC3235424  PMID: 22190924
5.  NovelSNPer: A Fast Tool for the Identification and Characterization of Novel SNPs and InDels 
Advances in Bioinformatics  2011;2011:657341.
Typically, next-generation resequencing projects produce large lists of variants. NovelSNPer is a software tool that permits fast and efficient processing of such output lists. In a first step, NovelSNPer determines if a variant represents a known variant or a previously unknown variant. In a second step, each variant is classified into one of 15 SNP classes or 19 InDel classes. Beside the classes used by Ensembl, we introduce POTENTIAL_START_GAINED and START_LOST as new functional classes and present a classification scheme for InDels. NovelSNPer is based upon the gene structure information stored in Ensembl. It processes two million SNPs in six hours. The tool can be used online or downloaded.
PMCID: PMC3206323  PMID: 22110502
6.  Prediction of Enzyme Mutant Activity Using Computational Mutagenesis and Incremental Transduction 
Advances in Bioinformatics  2011;2011:958129.
Wet laboratory mutagenesis to determine enzyme activity changes is expensive and time consuming. This paper expands on standard one-shot learning by proposing an incremental transductive method (T2bRF) for the prediction of enzyme mutant activity during mutagenesis using Delaunay tessellation and 4-body statistical potentials for representation. Incremental learning is in tune with both eScience and actual experimentation, as it accounts for cumulative annotation effects of enzyme mutant activity over time. The experimental results reported, using cross-validation, show that overall the incremental transductive method proposed, using random forest as base classifier, yields better results compared to one-shot learning methods. T2bRF is shown to yield 90% on T4 and LAC (and 86% on HIV-1). This is significantly better than state-of-the-art competing methods, whose performance yield is at 80% or less using the same datasets.
PMCID: PMC3189455  PMID: 22007208
7.  A Growth Curve Model with Fractional Polynomials for Analysing Incomplete Time-Course Data in Microarray Gene Expression Studies 
Advances in Bioinformatics  2011;2011:261514.
Identifying the various gene expression response patterns is a challenging issue in expression microarray time-course experiments. Due to heterogeneity in the regulatory reaction among thousands of genes tested, it is impossible to manually characterize a parametric form for each of the time-course pattern in a gene by gene manner. We introduce a growth curve model with fractional polynomials to automatically capture the various time-dependent expression patterns and meanwhile efficiently handle missing values due to incomplete observations. For each gene, our procedure compares the performances among fractional polynomial models with power terms from a set of fixed values that offer a wide range of curve shapes and suggests a best fitting model. After a limited simulation study, the model has been applied to our human in vivo irritated epidermis data with missing observations to investigate time-dependent transcriptional responses to a chemical irritant. Our method was able to identify the various nonlinear time-course expression trajectories. The integration of growth curves with fractional polynomials provides a flexible way to model different time-course patterns together with model selection and significant gene identification strategies that can be applied in microarray-based time-course gene expression experiments with missing observations.
PMCID: PMC3182337  PMID: 21966290
8.  Computational Design of a DNA- and Fc-Binding Fusion Protein 
Advances in Bioinformatics  2011;2011:457578.
Computational design of novel proteins with well-defined functions is an ongoing topic in computational biology. In this work, we generated and optimized a new synthetic fusion protein using an evolutionary approach. The optimization was guided by directed evolution based on hydrophobicity scores, molecular weight, and secondary structure predictions. Several methods were used to refine the models built from the resulting sequences. We have successfully combined two unrelated naturally occurring binding sites, the immunoglobin Fc-binding site of the Z domain and the DNA-binding motif of MyoD bHLH, into a novel stable protein.
PMCID: PMC3173724  PMID: 21941539
9.  Data-Driven Compensation for Flow Cytometry of Solid Tissues 
Advances in Bioinformatics  2011;2011:184731.
Propidium Iodide is a fluorochrome that is used to measure the DNA content of individual cells, taken from solid tissues, with a flow cytometer. Compensation for spectral cross-over of this fluorochrome still leads to compensation results that are depending on operator experience. We present a data-driven compensation (DDC) algorithm that is designed to automatically compensate combined DNA phenotype flow cytometry acquisitions. The generated compensation values of the DDC algorithm are validated by comparison with manually determined compensation values. The results show that (1) compensation of two-color flow cytometry leads to comparable results using either manual compensation or the DDC method; (2) DDC can calculate sample-specific compensation trace lines; (3) the effects of two different approaches to calculate compensation values can be visualized within one sample. We conclude that the DDC algorithm contributes to the standardization of compensation for spectral cross-over in flow cytometry of solid tissues.
PMCID: PMC3170795  PMID: 21912544
10.  Genome Evolution 
Advances in Bioinformatics  2011;2010:643701.
PMCID: PMC3166573  PMID: 21904546
11.  Predicting Flavonoid UGT Regioselectivity 
Advances in Bioinformatics  2011;2011:506583.
Machine learning was applied to a challenging and biologically significant protein classification problem: the prediction of avonoid UGT acceptor regioselectivity from primary sequence. Novel indices characterizing graphical models of residues were proposed and found to be widely distributed among existing amino acid indices and to cluster residues appropriately. UGT subsequences biochemically linked to regioselectivity were modeled as sets of index sequences. Several learning techniques incorporating these UGT models were compared with classifications based on standard sequence alignment scores. These techniques included an application of time series distance functions to protein classification. Time series distances defined on the index sequences were used in nearest neighbor and support vector machine classifiers. Additionally, Bayesian neural network classifiers were applied to the index sequences. The experiments identified improvements over the nearest neighbor and support vector machine classifications relying on standard alignment similarity scores, as well as strong correlations between specific subsequences and regioselectivities.
PMCID: PMC3130495  PMID: 21747849
12.  Cotranslational Protein Folding and Terminus Hydrophobicity 
Advances in Bioinformatics  2011;2011:176813.
Peptides fold on a time scale that is much smaller than the time required for synthesis, whence all proteins potentially fold cotranslationally to some degree (followed by additional folding events after release from the ribosome). In this paper, in three different ways, we find that cotranslational folding success is associated with higher hydrophobicity at the N-terminus than at the C-terminus. First, we fold simple HP models on a square lattice and observe that HP sequences that fold better cotranslationally than from a fully extended state exhibit a positive difference (N−C) in terminus hydrophobicity. Second, we examine real proteins using a previously established measure of potential cotranslationality known as ALR (Average Logarithmic Ratio of the extent of previous contacts) and again find a correlation with the difference in terminus hydrophobicity. Finally, we use the cotranslational protein structure prediction program SAINT and again find that such an approach to folding is more successful for proteins with higher N-terminus than C-terminus hydrophobicity. All results indicate that cotranslational folding is promoted in part by a hydrophobic start and a less hydrophobic finish to the sequence.
PMCID: PMC3112501  PMID: 21687643
13.  ModEnzA: Accurate Identification of Metabolic Enzymes Using Function Specific Profile HMMs with Optimised Discrimination Threshold and Modified Emission Probabilities 
Advances in Bioinformatics  2011;2011:743782.
Various enzyme identification protocols involving homology transfer by sequence-sequence or profile-sequence comparisons have been devised which utilise Swiss-Prot sequences associated with EC numbers as the training set. A profile HMM constructed for a particular EC number might select sequences which perform a different enzymatic function due to the presence of certain fold-specific residues which are conserved in enzymes sharing a common fold. We describe a protocol, ModEnzA (HMM-ModE Enzyme Annotation), which generates profile HMMs highly specific at a functional level as defined by the EC numbers by incorporating information from negative training sequences. We enrich the training dataset by mining sequences from the NCBI Non-Redundant database for increased sensitivity. We compare our method with other enzyme identification methods, both for assigning EC numbers to a genome as well as identifying protein sequences associated with an enzymatic activity. We report a sensitivity of 88% and specificity of 95% in identifying EC numbers and annotating enzymatic sequences from the E. coli genome which is higher than any other method. With the next-generation sequencing methods producing a huge amount of sequence data, the development and use of fully automated yet accurate protocols such as ModEnzA is warranted for rapid annotation of newly sequenced genomes and metagenomic sequences.
PMCID: PMC3085309  PMID: 21541071
14.  Systems Biology: The Next Frontier for Bioinformatics 
Advances in Bioinformatics  2011;2010:268925.
Biochemical systems biology augments more traditional disciplines, such as genomics, biochemistry and molecular biology, by championing (i) mathematical and computational modeling; (ii) the application of traditional engineering practices in the analysis of biochemical systems; and in the past decade increasingly (iii) the use of near-comprehensive data sets derived from ‘omics platform technologies, in particular “downstream” technologies relative to genome sequencing, including transcriptomics, proteomics and metabolomics. The future progress in understanding biological principles will increasingly depend on the development of temporal and spatial analytical techniques that will provide high-resolution data for systems analyses. To date, particularly successful were strategies involving (a) quantitative measurements of cellular components at the mRNA, protein and metabolite levels, as well as in vivo metabolic reaction rates, (b) development of mathematical models that integrate biochemical knowledge with the information generated by high-throughput experiments, and (c) applications to microbial organisms. The inevitable role bioinformatics plays in modern systems biology puts mathematical and computational sciences as an equal partner to analytical and experimental biology. Furthermore, mathematical and computational models are expected to become increasingly prevalent representations of our knowledge about specific biochemical systems.
PMCID: PMC3038413  PMID: 21331364

Results 1-14 (14)