Results 1-25 (1584)
 

1.  GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit 
Bioinformatics  2013;29(7):845-854.
Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on a massive scale in clusters, web servers, distributed computing or cloud resources.
Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations.
Availability: GROMACS is an open source and free software available from http://www.gromacs.org.
Contact: erik.lindahl@scilifelab.se
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt055
PMCID: PMC3605599  PMID: 23407358
2.  A dynamic Bayesian Markov model for phasing and characterizing haplotypes in next-generation sequencing 
Bioinformatics  2013;29(7):878-885.
Motivation: Next-generation sequencing (NGS) technologies have enabled whole-genome discovery and analysis of genetic variants in many species of interest. Individuals are often sequenced at low coverage for detecting novel variants, phasing haplotypes and inferring population structures. Although several tools have been developed for SNP and genotype calling in NGS data, haplotype phasing is often done separately on the called genotypes.
Results: We propose a dynamic Bayesian Markov model (DBM) for simultaneous genotype calling and haplotype phasing in low-coverage NGS data of unrelated individuals. Our method is fully probabilistic and produces consistent inference of genotypes, haplotypes and recombination probabilities. Using data from the 1000 Genomes Project, we demonstrate that DBM not only yields more accurate results than some popular methods, but also provides novel characterization of haplotype structures at the individual level for visualization, interpretation and comparison in downstream analysis. DBM is a powerful and flexible tool that can be applied to many sequencing studies. Its statistical framework can also be extended to accommodate broader scopes of data.
Availability and implementation: http://stat.psu.edu/~yuzhang/software/dbm.tar
Contact: yuzhang@stat.psu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt065
PMCID: PMC3656686  PMID: 23407359
4.  BION web server: predicting non-specifically bound surface ions 
Bioinformatics  2013;29(6):805-806.
Motivation: Ions are essential components of the cell and are frequently found bound to various macromolecules, in particular to proteins. The binding of an ion to a protein greatly affects the protein’s biophysical characteristics and needs to be taken into account in any modeling approach. However, the positions of bound ions cannot be easily revealed experimentally, especially if the ions are loosely bound to the macromolecular surface.
Results: Here, we report a web server, the BION web server, which addresses the demand for tools for predicting surface-bound ions for which specific interactions are not crucial and which are therefore difficult to predict. BION is an easy-to-use web server that requires only a coordinate file as input, and the user is provided with various, but easy to navigate, options. The coordinate file with predicted bound ions is displayed on the output and is available for download.
Availability: http://compbio.clemson.edu/bion_server/
Supplementary information: Supplementary data are available at Bioinformatics online.
Contact: ealexov@clemson.edu
doi:10.1093/bioinformatics/btt032
PMCID: PMC3597141  PMID: 23380591
5.  AbCD: arbitrary coverage design for sequencing-based genetic studies 
Bioinformatics  2013;29(6):799-801.
Summary: Recent advances in sequencing technologies have revolutionized genetic studies. Although high-coverage sequencing can uncover most variants present in the sequenced sample, low-coverage sequencing is appealing for its cost effectiveness. Here, we present AbCD (arbitrary coverage design) to aid the design of sequencing-based studies. AbCD is a user-friendly interface providing pre-estimated effective sample sizes, specific to each minor allele frequency category, for designs with arbitrary coverage (0.5–30×) and sample size (20–10 000), and for four major ethnic groups (Europeans, Africans, Asians and African Americans). In addition, we also present two software tools: ShotGun and DesignPlanner, which were used to generate the estimates behind AbCD. ShotGun is a flexible short-read simulator for arbitrary user-specified read length and average depth, allowing cycle-specific sequencing error rates and realistic read depth distributions. DesignPlanner is a full pipeline that uses ShotGun to generate sequence data and performs initial SNP discovery, uses our previously presented linkage disequilibrium-aware method to call genotypes, and, finally, provides minor allele frequency-specific effective sample sizes. ShotGun plus DesignPlanner can accommodate effective sample size estimate for any combination of high-depth and low-depth data (for example, whole-genome low-depth plus exonic high-depth) or combination of sequence and genotype data [for example, whole-exome sequencing plus genotyping from existing Genomewide Association Study (GWAS)].
Availability and implementation: AbCD, including its downloadable terminal interface and web-based interface, and the associated tools ShotGun and DesignPlanner, including documentation, examples and executables, are available at http://www.unc.edu/~yunmli/AbCD.html.
Contact: yunli@med.unc.edu
doi:10.1093/bioinformatics/btt041
PMCID: PMC3597143  PMID: 23357921
6.  Bayesian analysis of gene essentiality based on sequencing of transposon insertion libraries 
Bioinformatics  2013;29(6):695-703.
Motivation: Next-generation sequencing affords an efficient analysis of transposon insertion libraries, which can be used to identify essential genes in bacteria. To analyse this high-resolution data, we present a formal Bayesian framework for estimating the posterior probability of essentiality for each gene, using the extreme-value distribution to characterize the statistical significance of the longest region lacking insertions within a gene. We describe a sampling procedure based on the Metropolis–Hastings algorithm to calculate posterior probabilities of essentiality while simultaneously integrating over unknown internal parameters.
Results: Using a sequence dataset from a transposon library for Mycobacterium tuberculosis, we show that this Bayesian approach predicts essential genes that correspond well with genes shown to be essential in previous studies. Furthermore, we show that by using the extreme-value distribution to characterize genomic regions lacking transposon insertions, this method is capable of identifying essential domains within genes. This approach can be used for analysing transposon libraries in other organisms and augmenting essentiality predictions with statistical confidence scores.
Availability: A python script implementing the method described is available for download from http://saclab.tamu.edu/essentiality/.
Contact: michael.dejesus@tamu.edu or ioerger@cs.tamu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt043
PMCID: PMC3597147  PMID: 23361328
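The key statistic in the entry above, the longest region lacking insertions within a gene, can be illustrated with a short sketch. This is not the authors' script: it replaces their extreme-value distribution and Metropolis–Hastings sampling with a plain Monte Carlo null, and the function names, the 0/1 site encoding and the fixed `insertion_rate` are assumptions made for illustration only.

```python
import random


def longest_empty_run(sites):
    """Length of the longest run of sites without an insertion.

    sites: sequence of 0/1 flags (assumed encoding: 1 = insertion observed).
    """
    best = current = 0
    for s in sites:
        current = current + 1 if s == 0 else 0
        best = max(best, current)
    return best


def run_p_value(sites, insertion_rate, n_sim=2000, seed=0):
    """Monte Carlo estimate of P(longest empty run >= observed) under a null
    in which every site is hit independently with probability insertion_rate.

    This simulated null is a stand-in for the extreme-value distribution
    mentioned in the abstract; it is an assumption of this sketch.
    """
    rng = random.Random(seed)
    observed = longest_empty_run(sites)
    n = len(sites)
    hits = 0
    for _ in range(n_sim):
        simulated = [1 if rng.random() < insertion_rate else 0 for _ in range(n)]
        if longest_empty_run(simulated) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)
```

A small p-value indicates a gap longer than expected by chance for a non-essential gene, which is the intuition behind flagging a gene, or a domain within it, as a candidate essential region.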
8.  RamiGO: an R/Bioconductor package providing an AmiGO Visualize interface 
Bioinformatics  2013;29(5):666-668.
Summary: The R/Bioconductor package RamiGO is an R interface to AmiGO that enables visualization of Gene Ontology (GO) trees. Given a list of GO terms, RamiGO uses the AmiGO visualize API to import Graphviz-DOT format files into R, and export these either as images (SVG, PNG) or into Cytoscape for extended network analyses. RamiGO provides easy customization of annotation, highlighting of specific GO terms, colouring of terms by P-value or export of a simplified summary GO tree. We illustrate RamiGO functionalities in a genome-wide gene set analysis of prognostic genes in breast cancer.
Availability and implementation: RamiGO is provided in R/Bioconductor, is open source under the Artistic-2.0 License and is available with a user manual containing installation, operating instructions and tutorials. It requires R version 2.15.0 or higher. URL: http://bioconductor.org/packages/release/bioc/html/RamiGO.html
Contact: markus.schroeder@ucdconnect.ie
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/bts708
PMCID: PMC3582261  PMID: 23297033
9.  SIBER: systematic identification of bimodally expressed genes using RNAseq data 
Bioinformatics  2013;29(5):605-613.
Motivation: Identification of bimodally expressed genes is an important task, as genes with bimodal expression play important roles in cell differentiation, signalling and disease progression. Several useful algorithms have been developed to identify bimodal genes from microarray data. Currently, no method can deal with data from next-generation sequencing, which is emerging as a replacement technology for microarrays.
Results: We present SIBER (systematic identification of bimodally expressed genes using RNAseq data) for effectively identifying bimodally expressed genes from next-generation RNAseq data. We evaluate several candidate methods for modelling RNAseq count data and compare their performance in identifying bimodal genes through both simulation and real data analysis. We show that the lognormal mixture model performs best in terms of power and robustness under various scenarios. We also compare our method with alternative approaches, including profile analysis using clustering and kurtosis (PACK) and cancer outlier profile analysis (COPA). Our method is robust, powerful, invariant to shifting and scaling, has no blind spots and has a sample-size-free interpretation.
Availability: The R package SIBER is available at the website http://bioinformatics.mdanderson.org/main/OOMPA:Overview.
Contact: kcoombes@mdanderson.org
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/bts713
PMCID: PMC3582265  PMID: 23303507
10.  Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression 
Bioinformatics  2013;29(5):656-657.
Motivation: A common question arises at the beginning of every experiment where RNA-Seq is used to detect differential gene expression between two conditions: How many reads should we sequence?
Results: Scotty is an interactive web-based application that assists biologists to design an experiment with an appropriate sample size and read depth to satisfy the user-defined experimental objectives. This design can be based on data available from either pilot samples or publicly available datasets.
Availability: Scotty can be freely accessed on the web at http://euler.bc.edu/marthlab/scotty/scotty.php
Contact: gabor.marth@bc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt015
PMCID: PMC3582267  PMID: 23314327
11.  Sparsely correlated hidden Markov models with application to genome-wide location studies 
Bioinformatics  2013;29(5):533-541.
Motivation: Multiply correlated datasets have become increasingly common in genome-wide location analysis of regulatory proteins and epigenetic modifications. Their correlation can be directly incorporated into a statistical model to capture underlying biological interactions, but such modeling quickly becomes computationally intractable.
Results: We present sparsely correlated hidden Markov models (scHMM), a novel method for performing simultaneous hidden Markov model (HMM) inference for multiple genomic datasets. In scHMM, a single HMM is assumed for each series, but the transition probability in each series depends on not only its own hidden states but also the hidden states of other related series. For each series, scHMM uses penalized regression to select a subset of the other data series and estimate their effects on the odds of each transition in the given series. Following this, hidden states are inferred using a standard forward–backward algorithm, with the transition probabilities adjusted by the model at each position, which helps retain the order of computation close to fitting independent HMMs (iHMM). Hence, scHMM is a collection of inter-dependent non-homogeneous HMMs, capable of giving a close approximation to a fully multivariate HMM fit. A simulation study shows that scHMM achieves comparable sensitivity to the multivariate HMM fit at a much lower computational cost. The method was demonstrated in the joint analysis of 39 histone modifications, CTCF and RNA polymerase II in human CD4+ T cells. scHMM reported fewer high-confidence regions than iHMM in this dataset, but scHMM could recover previously characterized histone modifications in relevant genomic regions better than iHMM. In addition, the resulting combinatorial patterns from scHMM could be better mapped to the 51 states reported by the multivariate HMM method of Ernst and Kellis.
Availability: The scHMM package can be freely downloaded from http://sourceforge.net/p/schmm/ and is recommended for use in a Linux environment.
Contact: ghoshd@psu.edu or zhaohui.qin@emory.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt012
PMCID: PMC3582268  PMID: 23325620
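The scHMM entry above describes a standard forward–backward pass in which the transition probabilities are adjusted at each position. A minimal sketch of that inference step for a single series is given here, assuming the position-specific transition matrices have already been produced by some upstream model. The penalized-regression step is not shown, and the function name and argument layout are illustrative rather than taken from the scHMM package.

```python
def forward_backward(init, transitions, emissions):
    """Posterior state probabilities for a non-homogeneous HMM.

    init:        list of K initial state probabilities
    transitions: list of T-1 matrices (assumed precomputed);
                 transitions[t][i][j] = P(state j at t+1 | state i at t)
    emissions:   list of T lists; emissions[t][k] = P(observation at t | state k)
    """
    T, K = len(emissions), len(init)

    # Forward pass, rescaled at every position to avoid numerical underflow.
    alpha = []
    for t in range(T):
        if t == 0:
            raw = [init[k] * emissions[0][k] for k in range(K)]
        else:
            A = transitions[t - 1]
            raw = [
                emissions[t][j] * sum(alpha[-1][i] * A[i][j] for i in range(K))
                for j in range(K)
            ]
        total = sum(raw)
        alpha.append([r / total for r in raw])

    # Backward pass, also rescaled.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        A = transitions[t]
        raw = [
            sum(A[i][j] * emissions[t + 1][j] * beta[t + 1][j] for j in range(K))
            for i in range(K)
        ]
        total = sum(raw)
        beta[t] = [r / total for r in raw]

    # Combine and normalize to obtain per-position posteriors.
    posteriors = []
    for t in range(T):
        raw = [alpha[t][k] * beta[t][k] for k in range(K)]
        total = sum(raw)
        posteriors.append([r / total for r in raw])
    return posteriors
```

Because each position only needs its own K-by-K matrix, the cost stays at the order of an ordinary HMM pass, which matches the abstract's remark about staying close to fitting independent HMMs.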
12.  APoc: large-scale identification of similar protein pockets 
Bioinformatics  2013;29(5):597-604.
Motivation: Most proteins interact with small-molecule ligands such as metabolites or drug compounds. Over the past several decades, many of these interactions have been captured in high-resolution atomic structures. From a geometric point of view, most interaction sites for grasping these small-molecule ligands, as revealed in these structures, form concave shapes, or ‘pockets’, on the protein’s surface. An efficient method for comparing these pockets could greatly assist the classification of ligand-binding sites, prediction of protein molecular function and design of novel drug compounds.
Results: We introduce a computational method, APoc (Alignment of Pockets), for the large-scale, sequence order-independent, structural comparison of protein pockets. A scoring function, the Pocket Similarity Score (PS-score), is derived to measure the level of similarity between pockets. Statistical models are used to estimate the significance of the PS-score based on millions of comparisons of randomly related pockets. APoc is a general robust method that may be applied to pockets identified by various approaches, such as ligand-binding sites as observed in experimental complex structures, or predicted pockets identified by a pocket-detection method. Finally, we curate large benchmark datasets to evaluate the performance of APoc and present interesting examples to demonstrate the usefulness of the method. We also demonstrate that APoc has better performance than the geometric hashing-based method SiteEngine.
Availability and implementation: The APoc software package including the source code is freely available at http://cssb.biology.gatech.edu/APoc.
Contact: skolnick@gatech.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt024
PMCID: PMC3582269  PMID: 23335017
13.  Reconciling differential gene expression data with molecular interaction networks 
Bioinformatics  2013;29(5):622-629.
Motivation: Many techniques have been developed to compute the response network of a cell. A recent trend in this area is to compute response networks of small size, with the rationale that only part of a pathway is often changed by disease and that interpreting small subnetworks is easier than interpreting larger ones. However, these methods may not uncover the spectrum of pathways perturbed in a particular experiment or disease.
Results: To avoid these difficulties, we propose to use algorithms that reconcile case-control DNA microarray data with a molecular interaction network by modifying per-gene differential expression P-values such that two genes connected by an interaction show similar changes in their gene expression values. We provide a novel evaluation of four methods from this class of algorithms. We enumerate three desirable properties that this class of algorithms should address. These properties seek to maintain that the returned gene rankings are specific to the condition being studied. Moreover, to ease interpretation, highly ranked genes should participate in coherent network structures and should be functionally enriched with relevant biological pathways. We comprehensively evaluate the extent to which each algorithm addresses these properties on a compendium of gene expression data for 54 diverse human diseases. We show that the reconciled gene rankings can identify novel disease-related functions that are missed by analyzing expression data alone.
Availability: C++ software implementing our algorithms is available in the NetworkReconciliation package as part of the Biorithm software suite under the GNU General Public License: http://bioinformatics.cs.vt.edu/~murali/software/biorithm-docs.
Contact: murali@cs.vt.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt007
PMCID: PMC3582271  PMID: 23314326
14.  Detecting and understanding genetic and structural features in HIV-1 B subtype V3 underlying HIV-1 co-receptor usage 
Bioinformatics  2013;29(4):451-460.
Motivation: To define V3 genetic elements and structural features underlying different HIV-1 co-receptor usage in vivo.
Results: By probabilistically modeling mutations in the viruses isolated from HIV-1 B subtype patients, we present a unique statistical procedure that first identifies V3 determinants associated with the usage of different co-receptors cooperatively or independently, and then delineates the complicated interactions among mutations functioning cooperatively. We built a model based on dual usage of CXCR4 and CCR5 co-receptors. The molecular basis of our statistical predictions is further confirmed by phenotypic and molecular modeling analyses. Our results provide new insights into the molecular basis of different HIV-1 co-receptor usage. This is critical for optimizing the use of genotypic tropism testing in clinical practice and for deriving molecular implications for the design of vaccines and new entry inhibitors.
Contact: jing.zhang.jz349@yale.edu or cf.perno@uniroma2.it
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt002
PMCID: PMC3570207  PMID: 23297034
15.  Assessing identity, redundancy and confounds in Gene Ontology annotations over time 
Bioinformatics  2013;29(4):476-482.
Motivation: The Gene Ontology (GO) is heavily used in systems biology, but the potential for redundancy, confounds with other data sources and problems with stability over time have been little explored.
Results: We report that GO annotations are stable over short periods, with 3% of genes not being most semantically similar to themselves between monthly GO editions. However, we find that genes can alter their ‘functional identity’ over time, with 20% of genes not matching to themselves (by semantic similarity) after 2 years. We further find that annotation bias in GO, in which some genes are more characterized than others, has declined in yeast, but generally increased in humans. Finally, we discovered that many entries in protein interaction databases derive from the same published reports that are used for GO annotations, with 66% of assessed GO groups exhibiting this confound. We provide a case study to illustrate how this information can be used in analyses of gene sets and networks.
Availability: Data available at http://chibi.ubc.ca/assessGO.
Contact: paul@chibi.ubc.ca
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/bts727
PMCID: PMC3570208  PMID: 23297035
16.  NGSUtils: a software suite for analyzing and manipulating next-generation sequencing datasets 
Bioinformatics  2013;29(4):494-496.
Summary: NGSUtils is a suite of software tools for manipulating data common to next-generation sequencing experiments, such as FASTQ, BED and BAM format files. These tools provide a stable and modular platform for data management and analysis.
Availability and implementation: NGSUtils is available under a BSD license and works on Mac OS X and Linux systems. Python 2.6+ and virtualenv are required. More information and source code may be obtained from the website: http://ngsutils.org.
Contact: yunliu@iupui.edu
Supplemental information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/bts731
PMCID: PMC3570212  PMID: 23314324
17.  RS-WebPredictor: a server for predicting CYP-mediated sites of metabolism on drug-like molecules 
Bioinformatics  2012;29(4):497-498.
Summary: Regioselectivity-WebPredictor (RS-WebPredictor) is a server that predicts isozyme-specific cytochrome P450 (CYP)-mediated sites of metabolism (SOMs) on drug-like molecules. Predictions may be made for the promiscuous 2C9, 2D6 and 3A4 CYP isozymes, as well as CYPs 1A2, 2A6, 2B6, 2C8, 2C19 and 2E1. RS-WebPredictor is the first freely accessible server that predicts the regioselectivity of the last six isozymes. Server execution time is fast, taking on average 2 s to encode a submitted molecule and 1 s to apply a given model, allowing for high-throughput use in lead optimization projects.
Availability: RS-WebPredictor is accessible for free use at http://reccr.chem.rpi.edu/Software/RS-WebPredictor/
Contact: brenec@rpi.edu
doi:10.1093/bioinformatics/bts705
PMCID: PMC3570214  PMID: 23242264
18.  A comprehensive SNP and indel imputability database 
Bioinformatics  2013;29(4):528-531.
Motivation: Genotype imputation has become an indispensable step in genome-wide association studies (GWAS). Imputation accuracy, directly influencing downstream analysis, has been shown to improve with re-sequencing-based reference panels; however, this comes at the cost of a high computational burden due to the huge number of potentially imputable markers (tens of millions) discovered through sequencing a large number of individuals. Therefore, there is an increasing need for access to imputation quality information without actually conducting imputation. To facilitate this process, we have established a publicly available SNP and indel imputability database, aiming to provide direct access to imputation accuracy information for markers identified by the 1000 Genomes Project across four major populations and covering multiple GWAS genotyping platforms.
Results: SNP and indel imputability information can be retrieved through a user-friendly interface by providing the ID(s) of the desired variant(s) or by specifying the desired genomic region. The query results can be refined by selecting relevant GWAS genotyping platform(s). This is the first database providing variant imputability information specific to each continental group and to each genotyping platform. In Filipino individuals from the Cebu Longitudinal Health and Nutrition Survey, our database can achieve an area under the receiver-operating characteristic curve of 0.97, 0.91, 0.88 and 0.79 for markers with minor allele frequency >5%, 3–5%, 1–3% and 0.5–1%, respectively. Specifically, by filtering out 48.6% of markers (corresponding to a reduction of up to 48.6% in computational costs for actual imputation) based on the imputability information in our database, we can remove 77%, 58%, 51% and 42% of the poorly imputed markers at the cost of only 0.3%, 0.8%, 1.5% and 4.6% of the well-imputed markers with minor allele frequency >5%, 3–5%, 1–3% and 0.5–1%, respectively.
Availability: http://www.unc.edu/~yunmli/imputability.html
Supplementary information: Supplementary data are available at Bioinformatics online.
Contact: yunli@med.unc.edu
doi:10.1093/bioinformatics/bts724
PMCID: PMC3570215  PMID: 23292738
19.  GeStoDifferent: a Cytoscape plugin for the generation and the identification of gene regulatory networks describing a stochastic cell differentiation process 
Bioinformatics  2013;29(4):513-514.
Summary: The characterization of the complex phenomenon of cell differentiation is a key goal of both systems and computational biology. GeStoDifferent is a Cytoscape plugin aimed at the generation and the identification of gene regulatory networks (GRNs) describing an arbitrary stochastic cell differentiation process. The (dynamical) model adopted to describe general GRNs is that of noisy random Boolean networks (NRBNs), with a specific focus on their emergent dynamical behavior. GeStoDifferent explores the space of GRNs by filtering the NRBN instances inconsistent with a stochastic lineage differentiation tree representing the cell lineages that can be obtained by following the fate of a stem cell descendant. Matched networks can then be analyzed by Cytoscape network analysis algorithms or, for instance, used to define (multiscale) models of cellular dynamics.
Availability: Freely available at http://bimib.disco.unimib.it/index.php/Retronet#GESTODifferent or at the Cytoscape App Store http://apps.cytoscape.org/.
Contact: marco.antoniotti@unimib.it
doi:10.1093/bioinformatics/bts726
PMCID: PMC3888149  PMID: 23292740
20.  A novel link prediction algorithm for reconstructing protein–protein interaction networks by topological similarity 
Bioinformatics  2012;29(3):355-364.
Motivation: Recent advances in technology have dramatically increased the availability of protein–protein interaction (PPI) data and stimulated the development of many methods for improving the systems-level understanding of the cell. However, those efforts have been significantly hindered by the high level of noise, sparseness and highly skewed degree distribution of PPI networks. Here, we present a novel algorithm to reduce the noise present in PPI networks. The key idea of our algorithm is that two proteins sharing some higher-order topological similarities, measured by a novel random walk-based procedure, are likely interacting with each other and may belong to the same protein complex.
Results: Applying our algorithm to a yeast PPI network, we found that the edges in the reconstructed network have higher biological relevance than in the original network, assessed by multiple types of information, including gene ontology, gene expression, essentiality, conservation between species and known protein complexes. Comparison with existing methods shows that the network reconstructed by our method has the highest quality. Using two independent graph clustering algorithms, we found that the reconstructed network has resulted in significantly improved prediction accuracy of protein complexes. Furthermore, our method is applicable to PPI networks obtained with different experimental systems, such as affinity purification, yeast two-hybrid (Y2H) and protein-fragment complementation assay (PCA), and evidence shows that the predicted edges are likely bona fide physical interactions. Finally, an application to a human PPI network increased the coverage of the network by at least 100%.
Availability: www.cs.utsa.edu/~jruan/RWS/.
Contact: Jianhua.Ruan@utsa.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/bts688
PMCID: PMC3562060  PMID: 23235927
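The idea in the entry above, that two proteins with similar random-walk neighbourhoods are likely to interact, can be illustrated with a generic sketch. The abstract does not specify the authors' procedure, so the code uses an ordinary random walk with restart and compares the resulting visiting-probability profiles by cosine similarity. The restart probability, the iteration count and the choice of cosine similarity are all assumptions of this sketch.

```python
import math


def rwr_profile(adj, start, restart=0.5, n_iter=100):
    """Visiting probabilities of a random walk with restart from `start`.

    adj: dict mapping each node to the set of its neighbours (undirected graph).
    restart and n_iter are illustrative defaults, not values from the paper.
    """
    nodes = sorted(adj)
    p = {v: 1.0 if v == start else 0.0 for v in nodes}
    for _ in range(n_iter):
        nxt = {v: (restart if v == start else 0.0) for v in nodes}
        for v in nodes:
            if adj[v]:
                share = (1.0 - restart) * p[v] / len(adj[v])
                for w in adj[v]:
                    nxt[w] += share
            else:
                # Isolated node: send its mass back to the start node.
                nxt[start] += (1.0 - restart) * p[v]
        p = nxt
    return [p[v] for v in nodes]


def cosine(u, v):
    """Cosine similarity between two equal-length profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy network: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
same_cluster = cosine(rwr_profile(toy, "a"), rwr_profile(toy, "b"))
other_cluster = cosine(rwr_profile(toy, "a"), rwr_profile(toy, "e"))
```

In the toy network, nodes in the same triangle receive a higher similarity than nodes in different triangles, so a threshold on the score could be used to add or remove candidate edges.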
21.  Concerning the accuracy of Fido and parameter choice 
Bioinformatics  2012;29(3):412.
Contact: Oliver.Serang@Childrens.Harvard.edu
doi:10.1093/bioinformatics/bts687
PMCID: PMC3562061  PMID: 23193221
22.  Glycosylation Network Analysis Toolbox: a MATLAB-based environment for systems glycobiology 
Bioinformatics  2012;29(3):404-406.
Summary: Systems glycobiology studies the interaction of various pathways that regulate glycan biosynthesis and function. Software tools for the construction and analysis of such pathways are not yet available. We present GNAT, a platform-independent, user-extensible MATLAB-based toolbox that provides an integrated computational environment to construct, manipulate and simulate glycans and their networks. It enables integration of XML-based glycan structure data into SBML (Systems Biology Markup Language) files that describe glycosylation reaction networks. Curation and manipulation of networks is facilitated using class definitions and glycomics database query tools. High quality visualization of networks and their steady-state and dynamic simulation are also supported.
Availability: The software package, including source code, help documentation and demonstrations, is available at http://sourceforge.net/projects/gnatmatlab/files/.
Contact: neel@buffalo.edu or gangliu@buffalo.edu
doi:10.1093/bioinformatics/bts703
PMCID: PMC3562062  PMID: 23230149
23.  PAIR: paired allelic log-intensity-ratio-based normalization method for SNP-CGH arrays 
Bioinformatics  2012;29(3):299-307.
Motivation: Normalization is critical in DNA copy number analysis. We propose a new method to correctly identify two-copy probes from the genome to obtain representative references for normalization in single nucleotide polymorphism arrays. The method is based on a two-state Hidden Markov Model. Unlike most currently available methods in the literature, the proposed method does not need to assume that the percentage of two-copy state probes is dominant in the genome, as long as there do exist two-copy probes.
Results: The real data analysis and simulation study show that the proposed algorithm is successful in that (i) it performs as well as the current methods (e.g. CGHnormaliter and popLowess) for samples with dominant two-copy states and outperforms these methods for samples with less dominant two-copy states; (ii) it can identify the copy-neutral loss of heterozygosity; and (iii) it is efficient in terms of the computational time used.
Availability: R scripts are available at http://publichealth.lsuhsc.edu/PAIR.html.
Contact: zfang@lsuhsc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/bts683
PMCID: PMC3562063  PMID: 23196989
24.  nestly—a framework for running software with nested parameter choices and aggregating results 
Bioinformatics  2012;29(3):387-388.
Summary: The execution of a software application or pipeline using various combinations of parameters and inputs is a common task in bioinformatics. In the absence of a specialized tool to organize, streamline and formalize this process, scientists must frequently write complex scripts to perform these tasks. We present nestly, a Python package to facilitate running tools with nested combinations of parameters and inputs. nestly provides three components. First, a module to build nested directory structures corresponding to choices of parameters. Second, the nestrun script to run a given command using each set of parameter choices. Third, the nestagg script to aggregate results of the individual runs into a CSV file, as well as support for more complex aggregation. We also include a module for easily specifying nested dependencies for the SCons build tool, enabling incremental builds.
Availability: Source, documentation and tutorial examples are available at http://github.com/fhcrc/nestly. nestly can be installed from the Python Package Index via pip; it is open source (MIT license).
Contact: cmccoy@fhcrc.org or matsen@fhcrc.org
doi:10.1093/bioinformatics/bts696
PMCID: PMC3562064  PMID: 23220574
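The first component described in the nestly entry above, a nested directory structure with one leaf per combination of parameter choices, can be sketched with the Python standard library alone. This sketch deliberately does not use nestly's own API. The function name `build_nest` and the per-directory `control.json` file are assumptions made for illustration.

```python
import itertools
import json
import os


def build_nest(root, parameters):
    """Create one directory per parameter combination under `root`.

    parameters: ordered list of (name, values) pairs; nesting follows list order.
    Each leaf directory receives a control.json recording its choices
    (the file name is an assumption of this sketch).
    Returns the list of leaf directory paths.
    """
    names = [name for name, _ in parameters]
    created = []
    for combo in itertools.product(*(values for _, values in parameters)):
        path = os.path.join(root, *(str(v) for v in combo))
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, "control.json"), "w") as fh:
            json.dump(dict(zip(names, combo)), fh, indent=2)
        created.append(path)
    return created
```

For example, `build_nest("runs", [("model", ["a", "b"]), ("depth", [1, 2, 3])])` would create six leaf directories such as `runs/a/2`. A run script can then iterate over the leaves, and an aggregation step can read each `control.json` alongside the run's output.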
25.  Multistructural hot spot characterization with FTProd 
Bioinformatics  2012;29(3):393-394.
Summary: Computational solvent fragment mapping is typically performed on a single structure of a protein to identify and characterize binding sites. However, the simultaneous analysis of several mutant structures or frames of a molecular dynamics simulation may provide more realistic detail about the behavior of the sites. Here we present a plug-in for Visual Molecular Dynamics that streamlines the comparison of the binding configurations of several FTMAP-generated structures.
Availability: FTProd is a freely available and open-source plug-in that can be downloaded at http://amarolab.ucsd.edu/ftprod
Contact: ramaro@ucsd.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/bts689
PMCID: PMC3562065  PMID: 23202744
