2.  GPU linear and non-linear Poisson–Boltzmann solver module for DelPhi 
Bioinformatics  2013;30(4):569-570.
Summary: In this work, we present a CUDA-based GPU implementation of a Poisson–Boltzmann equation solver, in both the linear and non-linear versions, using double precision. A finite difference scheme is adopted and made suitable for the GPU architecture. The resulting code was interfaced with the electrostatics software for biomolecules DelPhi, which is widely used in the computational biology community. The algorithm has been implemented using CUDA and tested over a few representative cases of biological interest. Details of the implementation and performance test results are illustrated. A speedup of ∼10 times was achieved both in the linear and non-linear cases.
Availability and implementation: The module is open-source and available at http://www.electrostaticszone.eu/index.php/downloads.
Contact: walter.rocchia@iit.it
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt699
PMCID: PMC3928518  PMID: 24292939
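A note on the finite difference scheme mentioned in the summary above: the snippet below is a minimal, serial NumPy sketch of Jacobi relaxation for the ordinary Poisson equation on a small cubic grid. It is not the DelPhi/CUDA module; the grid size, dielectric constant, charge placement and iteration budget are arbitrary assumptions chosen only to show the stencil update.

```python
import numpy as np

# Jacobi relaxation for the Poisson equation (uniform dielectric, zero
# Dirichlet boundary): phi_i = (sum of 6 neighbours + h^2 * rho_i / eps) / 6.
n, h, eps = 32, 1.0, 1.0              # grid points per side, spacing, dielectric
rho = np.zeros((n, n, n))
rho[n // 2, n // 2, n // 2] = 1.0     # a single point charge in the centre
phi = np.zeros_like(rho)

for _ in range(500):                  # fixed sweep budget for simplicity
    new = phi.copy()
    new[1:-1, 1:-1, 1:-1] = (
        phi[2:, 1:-1, 1:-1] + phi[:-2, 1:-1, 1:-1] +
        phi[1:-1, 2:, 1:-1] + phi[1:-1, :-2, 1:-1] +
        phi[1:-1, 1:-1, 2:] + phi[1:-1, 1:-1, :-2] +
        h * h * rho[1:-1, 1:-1, 1:-1] / eps
    ) / 6.0
    converged = np.max(np.abs(new - phi)) < 1e-8
    phi = new
    if converged:
        break

print("potential at the charge:", phi[n // 2, n // 2, n // 2])
```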
3.  ChromoHub V2: cancer genomics 
Bioinformatics  2013;30(4):590-592.
Summary: Cancer genomics data produced by next-generation sequencing support the notion that epigenetic mechanisms play a central role in cancer. We have previously developed ChromoHub, an open-access online interface where users can map chemical, structural and biological data from public repositories on phylogenetic trees of protein families involved in chromatin-mediated signaling. Here, we describe a cancer genomics interface that was recently added to ChromoHub; the frequency of mutation, amplification and change in expression of chromatin factors across large cohorts of cancer patients is regularly extracted from The Cancer Genome Atlas and the International Cancer Genome Consortium and can now be mapped on phylogenetic trees of epigenetic protein families. Explorers of chromatin signaling can now easily navigate the cancer genomics landscape of writers, readers and erasers of histone marks, chromatin remodeling complexes, histones and their chaperones.
Availability and implementation: http://www.thesgc.org/chromohub/.
Contact: matthieu.schapira@utoronto.ca
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt710
PMCID: PMC3928521  PMID: 24319001
4.  HTS navigator: freely accessible cheminformatics software for analyzing high-throughput screening data 
Bioinformatics  2013;30(4):588-589.
Summary: We report on the development of the high-throughput screening (HTS) Navigator software to analyze and visualize the results of HTS of chemical libraries. The HTS Navigator processes output files from different plate readers' formats, computes the overall HTS matrix, automatically detects hits and has different types of baseline navigation and correction features. The software incorporates advanced cheminformatics capabilities such as chemical structure storage and visualization, fast similarity search and chemical neighborhood analysis for retrieved hits. The software is freely available for academic laboratories.
Availability and implementation: http://fourches.web.unc.edu/
Contact: fourches@email.unc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt718
PMCID: PMC3928525  PMID: 24376084
5.  BETASEQ: a powerful novel method to control type-I error inflation in partially sequenced data for rare variant association testing 
Bioinformatics  2013;30(4):480-487.
Summary: Despite its great capability to detect rare variant associations, next-generation sequencing is still prohibitively expensive when applied to large samples. In case-control studies, it is thus appealing to sequence only a subset of cases to discover variants and genotype the identified variants in controls and the remaining cases under the reasonable assumption that causal variants are usually enriched among cases. However, this approach leads to inflated type-I error if analyzed naively for rare variant association. Several methods have been proposed in recent literature to control type-I error at the cost of either excluding some sequenced cases or correcting the genotypes of discovered rare variants. All of these approaches thus suffer from some degree of information loss and are underpowered. We propose a novel method (BETASEQ), which corrects the inflation of type-I error by supplementing pseudo-variants while keeping the original sequence and genotype data intact. Extensive simulations and real data analysis demonstrate that, in most practical situations, BETASEQ leads to higher testing power than existing approaches with guaranteed (controlled or conservative) type-I error.
Availability and implementation: BETASEQ and associated R files, including documentation and examples, are available at http://www.unc.edu/~yunmli/betaseq
Contact: songyan@unc.edu or yunli@med.unc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt719
PMCID: PMC3928526  PMID: 24336643
6.  DIVE: a data intensive visualization engine 
Bioinformatics  2013;30(4):593-595.
Summary: Modern scientific investigation is generating increasingly larger datasets, yet analyzing these data with current tools is challenging. DIVE is a software framework intended to facilitate big data analysis and reduce the time to scientific insight. Here, we present features of the framework and demonstrate DIVE’s application to the Dynameomics project, looking specifically at two proteins.
Availability and implementation: Binaries and documentation are available at http://www.dynameomics.org/DIVE/DIVESetup.exe.
Contact: daggett@uw.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt721
PMCID: PMC3928528  PMID: 24336804
7.  Fast pairwise IBD association testing in genome-wide association studies 
Bioinformatics  2013;30(2):206-213.
Motivation: Recently, investigators have proposed state-of-the-art Identity-by-descent (IBD) mapping methods to detect IBD segments between purportedly unrelated individuals. The IBD information can then be used for association testing in genetic association studies. One approach for this IBD association testing strategy is to test for excessive IBD between pairs of cases (‘pairwise method’). However, this approach is inefficient because it requires a large number of permutations. Moreover, a limited number of permutations define a lower bound for P-values, which makes fine-mapping of associated regions difficult because, in practice, a much larger genomic region is implicated than the region that is actually associated.
Results: In this article, we introduce a new pairwise method, ‘Fast-Pairwise’. Fast-Pairwise uses importance sampling to improve efficiency and enable approximation of extremely small P-values; it takes only days to complete a genome-wide scan. In an application to the WTCCC type 1 diabetes data, Fast-Pairwise successfully fine-maps a human leukocyte antigen gene known to cause the disease.
Availability: Fast-Pairwise is publicly available at: http://genetics.cs.ucla.edu/graphibd.
Contact: eeskin@cs.ucla.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt609
PMCID: PMC3892684  PMID: 24158599
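The importance sampling referred to in the entry above is a general device for approximating tail probabilities too small to reach by naive resampling. The sketch below illustrates that general idea only, not the Fast-Pairwise algorithm itself; the binomial setting, threshold and tilted proposal are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)

# Estimate a far-tail probability p = P(S >= t), S ~ Binomial(n, p0), where
# naive Monte Carlo (or naive permutation) would need on the order of 1/p draws.
n, p0, t = 1000, 0.01, 30

# Tilted proposal: Binomial(n, q) with its mean moved up to the threshold,
# then each draw is reweighted by the likelihood ratio f(x) / g(x).
q = t / n
x = rng.binomial(n, q, size=100_000)
log_w = x * (np.log(p0) - np.log(q)) + (n - x) * (np.log1p(-p0) - np.log1p(-q))
estimate = np.mean(np.where(x >= t, np.exp(log_w), 0.0))

print("importance-sampling estimate:", estimate)
print("exact tail probability:      ", binom.sf(t - 1, n, p0))
```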
8.  Testing multiple biological mediators simultaneously 
Bioinformatics  2013;30(2):214-220.
Motivation: Modern biomedical and epidemiological studies often measure hundreds or thousands of biomarkers, such as gene expression or metabolite levels. Although there is an extensive statistical literature on adjusting for ‘multiple comparisons’ when testing whether these biomarkers are directly associated with a disease, testing whether they are biological mediators between a known risk factor and a disease requires a more complex null hypothesis, thus offering additional methodological challenges.
Results: We propose a permutation approach that tests multiple putative mediators and controls the family wise error rate. We demonstrate that, unlike when testing direct associations, replacing the Bonferroni correction with a permutation approach that focuses on the maximum of the test statistics can significantly improve the power to detect mediators even when all biomarkers are independent. Through simulations, we show the power of our method is 2–5× larger than the power achieved by Bonferroni correction. Finally, we apply our permutation test to a case-control study of dietary risk factors and colorectal adenoma to show that, of 149 test metabolites, docosahexaenoate is a possible mediator between fish consumption and decreased colorectal adenoma risk.
Availability and implementation: R-package included in online Supplementary Material.
Contact: joshua.sampson@nih.gov
Supplementary information: Supplementary materials are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt633
PMCID: PMC3892685  PMID: 24202540
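The "maximum of the test statistics" permutation idea described in the entry above is the classic single-step max-T construction for family-wise error control. Below is a minimal sketch of that generic construction in a simplified two-group comparison; it is not the authors' mediation-specific null hypothesis or test, and the sample sizes, effect size and permutation count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Single-step max-T permutation: compare each marker's observed statistic with
# the permutation distribution of the maximum statistic across all markers.
n, m = 100, 150                        # samples, biomarkers
group = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, m))
X[group == 1, 0] += 1.0                # marker 0 truly differs between groups

def abs_t(x, g):
    a, b = x[g == 0], x[g == 1]
    se = np.sqrt(a.var(0, ddof=1) / len(a) + b.var(0, ddof=1) / len(b))
    return np.abs(a.mean(0) - b.mean(0)) / se

obs = abs_t(X, group)
max_null = np.array([abs_t(X, rng.permutation(group)).max() for _ in range(2000)])
adj_p = np.array([(max_null >= s).mean() for s in obs])   # FWER-adjusted p-values
print("significant markers:", np.where(adj_p < 0.05)[0])
```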
9.  On the simultaneous association analysis of large genomic regions: a massive multi-locus association test 
Bioinformatics  2013;30(2):157-164.
Motivation: For samples of unrelated individuals, we propose a general analysis framework in which hundreds of thousands of genetic loci can be tested simultaneously for association with complex phenotypes. The approach is built on spatial-clustering methodology, assuming that genetic loci that are associated with the target phenotype cluster in certain genomic regions. In contrast to standard methodology for multilocus analysis, which has focused on the dimension reduction of the data, our multilocus association-clustering test profits from the availability of large numbers of genetic loci by detecting clusters of loci that are associated with the phenotype.
Results: The approach is computationally fast and powerful, enabling the simultaneous association testing of large genomic regions. Even the entire genome or certain chromosomes can be tested simultaneously. Using simulation studies, the properties of the approach are evaluated. In an application to a genome-wide association study for chronic obstructive pulmonary disease, we illustrate the practical relevance of the proposed method by simultaneously testing all genotyped loci of the genome-wide association study and by testing each chromosome individually. Our findings suggest that statistical methodology that incorporates spatial-clustering information will be especially useful in whole-genome sequencing studies in which millions or billions of base pairs are recorded and grouped by genomic regions or genes, and are tested jointly for association.
Availability and implementation: Implementation of the approach is available upon request.
Contact: daq412@mail.harvard.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt654
PMCID: PMC3892690  PMID: 24262215
10.  SeqDepot: streamlined database of biological sequences and precomputed features 
Bioinformatics  2013;30(2):295-297.
Summary: Assembling and/or producing integrated knowledge of sequence features continues to be an onerous and redundant task despite a large number of existing resources. We have developed SeqDepot—a novel database that focuses solely on two primary goals: (i) assimilating known primary sequences with predicted feature data and (ii) providing the simplest and most straightforward means to procure and readily use this information. Access to >28.5 million sequences and 300 million features is provided through a well-documented and flexible RESTful interface that supports fetching specific data subsets, bulk queries, visualization and searching by MD5 digests or external database identifiers. We have also developed an HTML5/JavaScript web application exemplifying how to interact with SeqDepot and Perl/Python scripts for use with local processing pipelines.
Availability: Freely available on the web at http://seqdepot.net/. REST access via http://seqdepot.net/api/v1. Database files and scripts may be downloaded from http://seqdepot.net/download.
Contact: ulrich.luke+sci@gmail.com
doi:10.1093/bioinformatics/btt658
PMCID: PMC3892692  PMID: 24234005
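The entry above documents a RESTful interface with a base URL of http://seqdepot.net/api/v1 and lookup by MD5 digests. The sketch below shows how such a lookup might be scripted in Python; the resource path and response fields are guesses for illustration only and should be checked against the actual SeqDepot API documentation before use.

```python
import hashlib
import requests  # third-party: pip install requests

BASE = "http://seqdepot.net/api/v1"    # base URL given in the article

def md5_hex(sequence: str) -> str:
    """Hex MD5 digest of a protein sequence, one of the supported lookup keys."""
    return hashlib.md5(sequence.strip().upper().encode()).hexdigest()

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ"

# NOTE: '/aseqs/<digest>' is a hypothetical resource path used only for
# illustration; consult the SeqDepot documentation for the real routes.
resp = requests.get(f"{BASE}/aseqs/{md5_hex(seq)}", timeout=30)
if resp.ok:
    record = resp.json()
    print("fields returned:", sorted(record))
else:
    print("lookup failed with HTTP status", resp.status_code)
```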
11.  Pathway Commons at Virtual Cell: use of pathway data for mathematical modeling 
Bioinformatics  2013;30(2):292-294.
Summary: Pathway Commons is a resource permitting simultaneous queries of multiple pathway databases. However, there is no standard mechanism for using these data (stored in BioPAX format) to annotate and build quantitative mathematical models. Therefore, we developed a new module within the Virtual Cell modeling and simulation software. It provides pathway data retrieval and visualization and enables automatic creation of executable network models directly from qualitative connections between pathway nodes.
Availability and implementation: Available at Virtual Cell (http://vcell.org/). The application runs on all major platforms and does not require registration for use on the user’s computer. Tutorials and videos are available on the user guide page.
Contact: vcell_support@uchc.edu
doi:10.1093/bioinformatics/btt660
PMCID: PMC3892693  PMID: 24273241
12.  Using Genome Query Language to uncover genetic variation 
Bioinformatics  2013;30(1):1-8.
Motivation: With high-throughput DNA sequencing costs dropping below $1000 per human genome, data storage, retrieval and analysis are the major bottlenecks in biological studies. To address the large-data challenges, we advocate a clean separation between the evidence collection and the inference in variant calling. We define and implement a Genome Query Language (GQL) that allows for the rapid collection of evidence needed for calling variants.
Results: We provide a number of cases to showcase the use of GQL for complex evidence collection, such as the evidence for large structural variations. Specifically, typical GQL queries can be written in 5–10 lines of high-level code and search large datasets (100 GB) in minutes. We also demonstrate its complementarity with other variant calling tools. Popular variant calling tools can achieve one order of magnitude speed-up by using GQL to retrieve evidence. Finally, we show how GQL can be used to query and compare multiple datasets. By separating evidence collection from inference in variant calling, GQL frees variant detection tools from data-intensive evidence collection and lets them focus on statistical inference.
Availability: GQL can be downloaded from http://cseweb.ucsd.edu/~ckozanit/gql.
Contact: ckozanit@ucsd.edu or vbafna@cs.ucsd.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt250
PMCID: PMC3866549  PMID: 23751181
13.  A C library for retrieving specific reactions from the BioModels database 
Bioinformatics  2013;30(1):129-130.
Summary: We describe libSBMLReactionFinder, a C library for retrieving specific biochemical reactions from the curated systems biology markup language models contained in the BioModels database. The library leverages semantic annotations in the database to associate reactions with human-readable descriptions, making the reactions retrievable through simple string searches. Our goal is to provide a useful tool for quantitative modelers who seek to accelerate modeling efforts through the reuse of previously published representations of specific chemical reactions.
Availability and implementation: The library is open-source and dual licensed under the Mozilla Public License Version 2.0 and GNU General Public License Version 2.0. Project source code, downloads and documentation are available at http://code.google.com/p/lib-sbml-reaction-finder.
Contact: mneal@uw.edu
doi:10.1093/bioinformatics/btt567
PMCID: PMC3866552  PMID: 24078714
14.  A user-oriented web crawler for selectively acquiring online content in e-health research 
Bioinformatics  2013;30(1):104-114.
Motivation: Life stories of diseased and healthy individuals are abundantly available on the Internet. Collecting and mining such online content can offer many valuable insights into patients’ physical and emotional states throughout the pre-diagnosis, diagnosis, treatment and post-treatment stages of the disease compared with those of healthy subjects. However, such content is widely dispersed across the web. Using traditional query-based search engines to manually collect relevant materials is rather labor intensive and often incomplete due to resource constraints in terms of human query composition and result parsing efforts. The alternative option, blindly crawling the whole web, has proven inefficient and unaffordable for e-health researchers.
Results: We propose a user-oriented web crawler that adaptively acquires user-desired content on the Internet to meet the specific online data source acquisition needs of e-health researchers. Experimental results on two cancer-related case studies show that the new crawler can substantially accelerate the acquisition of highly relevant online content compared with the existing state-of-the-art adaptive web crawling technology. For the breast cancer case study using the full training set, the new method achieves a cumulative precision between 74.7 and 79.4% after 5 h of execution till the end of the 20-h long crawling session as compared with the cumulative precision between 32.8 and 37.0% using the peer method for the same time period. For the lung cancer case study using the full training set, the new method achieves a cumulative precision between 56.7 and 61.2% after 5 h of execution till the end of the 20-h long crawling session as compared with the cumulative precision between 29.3 and 32.4% using the peer method. Using the reduced training set in the breast cancer case study, the cumulative precision of our method is between 44.6 and 54.9%, whereas the cumulative precision of the peer method is between 24.3 and 26.3%; for the lung cancer case study using the reduced training set, the cumulative precisions of our method and the peer method are, respectively, between 35.7 and 46.7% versus between 24.1 and 29.6%. These numbers clearly show a consistently superior accuracy of our method in discovering and acquiring user-desired online content for e-health research.
Availability and implementation: The implementation of our user-oriented web crawler is freely available to non-commercial users via the following Web site: http://bsec.ornl.gov/AdaptiveCrawler.shtml. The Web site provides a step-by-step guide on how to execute the web crawler implementation. In addition, the Web site provides the two study datasets including manually labeled ground truth, initial seeds and the crawling results reported in this article.
Contact: xus1@ornl.gov
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt571
PMCID: PMC3866553  PMID: 24078710
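The results of the entry above are reported as "cumulative precision" over a 20-hour crawl. The sketch below shows one way that metric might be computed from a time-stamped crawl log; the log format, timestamps and relevance labels are synthetic assumptions, and a real evaluation would use human relevance judgements.

```python
from datetime import datetime, timedelta

# Cumulative precision at time T = relevant pages fetched up to T divided by
# all pages fetched up to T.
start = datetime(2024, 1, 1, 0, 0)
crawl_log = [(start + timedelta(minutes=10 * i), i % 3 != 0) for i in range(120)]

def cumulative_precision(log, hours):
    cutoff = start + timedelta(hours=hours)
    judged = [relevant for fetched_at, relevant in log if fetched_at <= cutoff]
    return sum(judged) / len(judged) if judged else float("nan")

for h in (5, 10, 20):
    print(f"cumulative precision after {h:2d} h: {cumulative_precision(crawl_log, h):.3f}")
```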
15.  MSPrep—Summarization, normalization and diagnostics for processing of mass spectrometry–based metabolomic data 
Bioinformatics  2013;30(1):133-134.
Motivation: Although R packages exist for the pre-processing of metabolomic data, they currently do not incorporate additional analysis steps of summarization, filtering and normalization of aligned data. We developed the MSPrep R package to complement other packages by providing these additional steps, implementing a selection of popular normalization algorithms and generating diagnostics to help guide investigators in their analyses.
Availability: http://www.sourceforge.net/projects/msprep
Contact: grant.hughes@ucdenver.edu
Supplementary Information: Supplementary materials are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt589
PMCID: PMC3866554  PMID: 24174567
16.  PhosphoNetworks: a database for human phosphorylation networks 
Bioinformatics  2013;30(1):141-142.
Summary: Phosphorylation plays an important role in cellular signal transduction. Current phosphorylation-related databases often focus on the phosphorylation sites, which are mainly determined by mass spectrometry. Here, we present PhosphoNetworks, a phosphorylation database built on a high-resolution map of phosphorylation networks. This high-resolution map provides not only the kinase–substrate relationships (KSRs), but also the specific phosphorylation sites at which the kinases act on their substrates. The database contains the most comprehensive dataset for KSRs, including the relationships from a recent high-throughput project for identification of KSRs using protein microarrays, as well as known KSRs curated from the literature. In addition, the database also includes several analytical tools for dissecting phosphorylation networks. PhosphoNetworks is expected to play a prominent role in proteomics and phosphorylation-related disease research.
Availability and implementation: http://www.phosphonetworks.org
Contact: jiang.qian@jhmi.edu
doi:10.1093/bioinformatics/btt627
PMCID: PMC3866559  PMID: 24227675
17.  Functional module identification in protein interaction networks by interaction patterns 
Bioinformatics  2013;30(1):81-93.
Motivation: Identifying functional modules in protein–protein interaction (PPI) networks may shed light on cellular functional organization and, in turn, on underlying cellular mechanisms. Many existing module identification algorithms aim to detect densely connected groups of proteins as potential modules. However, based on this simple topological criterion of ‘higher than expected connectivity’, those algorithms may miss biologically meaningful modules of functional significance, in which proteins have similar interaction patterns to other proteins in networks but may not be densely connected to each other. A few blockmodel module identification algorithms have been proposed to address the problem, but the lack of a global optimum guarantee and the prohibitive computational complexity have been bottlenecks for their application to real-world large-scale PPI networks.
Results: In this article, we propose a novel optimization formulation LCP2 (low two-hop conductance sets) using the concept of Markov random walk on graphs, which enables simultaneous identification of both dense and sparse modules based on protein interaction patterns in given networks through searching for LCP2 by random walk. A spectral approximate algorithm SLCP2 is derived to identify non-overlapping functional modules. Based on a bottom-up greedy strategy, we further extend LCP2 to a new algorithm (greedy algorithm for LCP2) GLCP2 to identify overlapping functional modules. We compare SLCP2 and GLCP2 with a range of state-of-the-art algorithms on synthetic networks and real-world PPI networks. The performance evaluation based on several criteria with respect to protein complex prediction, high level Gene Ontology term prediction and especially sparse module detection, has demonstrated that our algorithms based on searching for LCP2 outperform all other compared algorithms.
Availability and implementation: All data and code are available at http://www.cse.usf.edu/~xqian/fmi/slcp2hop/.
Contact: yijie@mail.usf.edu or xqian@ece.tamu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt569
PMCID: PMC3924044  PMID: 24085567
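The "low two-hop conductance sets" in the entry above generalize ordinary graph conductance, the quantity a random walk uses to measure how leaky a candidate module is. The sketch below computes plain (one-hop) conductance for an arbitrary node set on a toy graph; it is not the SLCP2/GLCP2 algorithms, and the example graph and node set are arbitrary.

```python
import networkx as nx  # third-party: pip install networkx

# Conductance of a candidate module S: edges crossing the boundary of S divided
# by the smaller of the two volumes (sums of node degrees).  Low conductance
# means a random walk started inside S tends to stay inside S.
G = nx.karate_club_graph()
S = {0, 1, 2, 3, 7, 13}                      # arbitrary candidate node set

cut = sum(1 for u, v in G.edges() if (u in S) != (v in S))
vol_in = sum(d for _, d in G.degree(S))
vol_out = sum(d for node, d in G.degree() if node not in S)
print("conductance:", cut / min(vol_in, vol_out))
print("networkx   :", nx.conductance(G, S))  # same quantity via the library
```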
18.  BSeQC: quality control of bisulfite sequencing experiments 
Bioinformatics  2013;29(24):3227-3229.
Motivation: Bisulfite sequencing (BS-seq) has emerged as the gold standard to study genome-wide DNA methylation at single-nucleotide resolution. Quality control (QC) is a critical step in the analysis pipeline to ensure that BS-seq data are of high quality and suitable for subsequent analysis. Although several QC tools are available for next-generation sequencing data, most of them were not designed to handle QC issues specific to BS-seq protocols. Therefore, there is a strong need for a dedicated QC tool to evaluate and remove potential technical biases in BS-seq experiments.
Results: We developed a package named BSeQC to comprehensively evaluate the quality of BS-seq experiments and automatically trim nucleotides with potential technical biases that may result in inaccurate methylation estimation. BSeQC takes standard SAM/BAM files as input and generates bias-free SAM/BAM files for downstream analysis. Evaluation based on real BS-seq data indicates that the use of the bias-free SAM/BAM file substantially improves the quantification of methylation level.
Availability and implementation: BSeQC is freely available at: http://code.google.com/p/bseqc/.
Contact: wl1@bcm.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt548
PMCID: PMC3842756  PMID: 24064417
19.  MATE-CLEVER: Mendelian-inheritance-aware discovery and genotyping of midsize and long indels 
Bioinformatics  2013;29(24):3143-3150.
Motivation: Accurately predicting and genotyping indels longer than 30 bp has remained a central challenge in next-generation sequencing (NGS) studies. While indels of up to 30 bp are reliably processed by standard read aligners and the Genome Analysis Toolkit (GATK), longer indels have still resisted proper treatment. Also, discovering and genotyping longer indels has become particularly relevant owing to the increasing attention they receive in globally concerted projects.
Results: We present MATE-CLEVER (Mendelian-inheritance-AtTEntive CLique-Enumerating Variant findER) as an approach that accurately discovers and genotypes indels longer than 30 bp from contemporary NGS reads with a special focus on family data. For enhanced quality of indel calls in family trios or quartets, MATE-CLEVER integrates statistics that reflect the laws of Mendelian inheritance. MATE-CLEVER’s performance rates for indels longer than 30 bp are on a par with those of the GATK for indels shorter than 30 bp, achieving up to 90% precision overall, with >80% of calls correctly typed. In predicting de novo indels longer than 30 bp in family contexts, MATE-CLEVER even raises the standards of the GATK. MATE-CLEVER achieves precision and recall of ∼63% on indels of 30 bp and longer versus 55% in both categories for the GATK on indels of 10–29 bp. A special version of MATE-CLEVER has contributed to indel discovery, in particular for indels of 30–100 bp, the ‘NGS twilight zone of indels’, in the Genome of the Netherlands Project.
Availability and implementation: http://clever-sv.googlecode.com/
Contact: tm@cwi.nl or as@cwi.nl
Supplementary Information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt556
PMCID: PMC3842759  PMID: 24072733
20.  STAR: an integrated solution to management and visualization of sequencing data 
Bioinformatics  2013;29(24):3204-3210.
Motivation: Easy visualization of complex data features is a necessary step in studies of next-generation sequencing (NGS) data. We developed STAR, an integrated web application that enables online management, visualization and track-based analysis of NGS data.
Results: STAR is a multilayer web service system. On the client side, STAR leverages JavaScript, HTML5 Canvas and asynchronous communications to deliver a smoothly scrolling desktop-like graphical user interface with a suite of in-browser analysis tools that range from providing simple track configuration controls to sophisticated feature detection within datasets. On the server side, STAR supports private session state retention via an account management system and provides data management modules that enable collection, visualization and analysis of third-party sequencing data from the public domain, with thousands of tracks hosted to date. Overall, STAR represents a next-generation data exploration solution to match the requirements of NGS data, enabling both intuitive visualization and dynamic analysis of data.
Availability and implementation: STAR browser system is freely available on the web at http://wanglab.ucsd.edu/star/browser and https://github.com/angell1117/STAR-genome-browser.
Contact: wei-wang@ucsd.edu
doi:10.1093/bioinformatics/btt558
PMCID: PMC3842760  PMID: 24078702
21.  WebGLORE: a Web service for Grid LOgistic REgression 
Bioinformatics  2013;29(24):3238-3240.
WebGLORE is a free web service that enables privacy-preserving construction of a global logistic regression model from distributed datasets that are sensitive. It only transfers aggregated local statistics (from participants) through Hypertext Transfer Protocol Secure to a trusted server, where the global model is synthesized. WebGLORE seamlessly integrates AJAX, JAVA Applet/Servlet and PHP technologies to provide an easy-to-use web service for biomedical researchers to break down policy barriers during information exchange.
Availability and implementation: http://dbmi-engine.ucsd.edu/webglore3/. WebGLORE can be used under the terms of GNU general public license as published by the Free Software Foundation.
Contact: x1jiang@ucsd.edu
doi:10.1093/bioinformatics/btt559
PMCID: PMC3842761  PMID: 24072732
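The entry above builds a global logistic regression model from aggregated local statistics. The sketch below illustrates the general idea behind such grid or federated logistic regression: each site reports only the gradient and Fisher information of its local data, and a central Newton-Raphson update is assembled from the sums. This is a generic illustration under simplifying assumptions, not the WebGLORE protocol, its message format or its HTTPS security layer.

```python
import numpy as np

rng = np.random.default_rng(2)

def local_stats(X, y, beta):
    """Per-site gradient and Fisher information of the logistic log-likelihood."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (y - p)
    info = (X * (p * (1 - p))[:, None]).T @ X
    return grad, info

# Three 'sites' hold their own data; only the aggregates above leave each site.
beta_true = np.array([0.5, -1.0, 2.0])
sites = []
for _ in range(3):
    X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
    sites.append((X, y))

beta = np.zeros(3)
for _ in range(25):                          # Newton-Raphson on pooled aggregates
    grads, infos = zip(*(local_stats(X, y, beta) for X, y in sites))
    update = np.linalg.solve(sum(infos), sum(grads))
    beta = beta + update
    if np.max(np.abs(update)) < 1e-8:
        break

print("estimated coefficients:", beta)       # close to beta_true
```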
22.  Optimized atomic statistical potentials: assessment of protein interfaces and loops 
Bioinformatics  2013;29(24):3158-3166.
Motivation: Statistical potentials have been widely used for modeling whole proteins and their parts (e.g. sidechains and loops) as well as interactions between proteins, nucleic acids and small molecules. Here, we formulate the statistical potentials entirely within a statistical framework, avoiding questionable statistical mechanical assumptions and approximations, including a definition of the reference state.
Results: We derive a general Bayesian framework for inferring statistically optimized atomic potentials (SOAP) in which the reference state is replaced with data-driven ‘recovery’ functions. Moreover, we restrain the relative orientation between two covalent bonds instead of a simple distance between two atoms, in an effort to capture orientation-dependent interactions such as hydrogen bonds. To demonstrate this general approach, we computed statistical potentials for protein–protein docking (SOAP-PP) and loop modeling (SOAP-Loop). For docking, a near-native model is within the top 10 scoring models in 40% of the PatchDock benchmark cases, compared with 23 and 27% for the state-of-the-art ZDOCK and FireDock scoring functions, respectively. Similarly, for modeling 12-residue loops in the PLOP benchmark, the average main-chain root mean square deviation of the best scored conformations by SOAP-Loop is 1.5 Å, close to the average root mean square deviation of the best sampled conformations (1.2 Å) and significantly better than that selected by Rosetta (2.1 Å), DFIRE (2.3 Å), DOPE (2.5 Å) and PLOP scoring functions (3.0 Å). Our Bayesian framework may also result in more accurate statistical potentials for additional modeling applications, thus affording better leverage of the experimentally determined protein structures.
Availability and implementation: SOAP-PP and SOAP-Loop are available as part of MODELLER (http://salilab.org/modeller).
Contact: sali@salilab.org
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt560
PMCID: PMC3842762  PMID: 24078704
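For readers unfamiliar with the baseline the entry above improves on: a conventional knowledge-based potential scores a distance bin by the inverse-Boltzmann formula E(d) = -ln(f_obs(d)/f_ref(d)). The sketch below evaluates that classic form with hand-made counts for a single atom-type pair; SOAP itself replaces the reference state with data-driven recovery functions and uses orientation-dependent features, neither of which is shown here.

```python
import numpy as np

# Inverse-Boltzmann distance potential for one atom-type pair:
#   E(d) = -ln( f_obs(d) / f_ref(d) )
# Bins are distance intervals in angstroms; all counts are made up.
bins = np.arange(2.0, 8.0, 0.5)
obs_counts = np.array([1, 4, 30, 80, 120, 150, 160, 170, 175, 180, 185, 190.0])
ref_counts = np.array([2, 10, 40, 90, 140, 170, 180, 190, 195, 200, 205, 210.0])

f_obs = obs_counts / obs_counts.sum()
f_ref = ref_counts / ref_counts.sum()
energy = -np.log(f_obs / f_ref)            # pseudo-energy in units of kT

for d, e in zip(bins, energy):
    print(f"{d:4.1f} A  {e:+.3f} kT")
```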
23.  GPCR ontology: development and application of a G protein-coupled receptor pharmacology knowledge framework 
Bioinformatics  2013;29(24):3211-3219.
Motivation: Novel tools need to be developed to help scientists analyze large amounts of available screening data with the goal to identify entry points for the development of novel chemical probes and drugs. As the largest class of drug targets, G protein-coupled receptors (GPCRs) remain of particular interest and are pursued by numerous academic and industrial research projects.
Results: We report the first GPCR ontology to facilitate integration and aggregation of GPCR-targeting drugs and demonstrate its application to classify and analyze a large subset of the PubChem database. The GPCR ontology, based on previously reported BioAssay Ontology, depicts available pharmacological, biochemical and physiological profiles of GPCRs and their ligands. The novelty of the GPCR ontology lies in the use of diverse experimental datasets linked by a model to formally define these concepts. Using a reasoning system, GPCR ontology offers potential for knowledge-based classification of individuals (such as small molecules) as a function of the data.
Availability: The GPCR ontology is available at http://www.bioassayontology.org/bao_gpcr and the National Center for Biomedical Ontologies Web site.
Contact: sschurer@med.miami.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt565
PMCID: PMC3842764  PMID: 24078711
24.  Achievements and challenges in structural bioinformatics and computational biophysics 
Bioinformatics  2014;31(1):146-150.
Motivation: The field of structural bioinformatics and computational biophysics has undergone a revolution in the last 10 years. These developments are captured annually through the 3DSIG meeting, upon which this article reflects.
Results: An increase in the accessible data, computational resources and methodology has resulted in an increase in the size and resolution of studied systems and the complexity of the questions amenable to research. Concomitantly, the parameterization and efficiency of the methods have markedly improved along with their cross-validation with other computational and experimental results.
Conclusion: The field exhibits an ever-increasing integration with biochemistry, biophysics and other disciplines. In this article, we discuss recent achievements along with current challenges within the field.
Contact: Rafael.Najmanovich@USherbrooke.ca
doi:10.1093/bioinformatics/btu769
PMCID: PMC4271151  PMID: 25488929
25.  PAVIS: a tool for Peak Annotation and Visualization 
Bioinformatics  2013;29(23):3097-3099.
Summary: We introduce a web-based tool, Peak Annotation and Visualization (PAVIS), for annotating and visualizing ChIP-seq peak data. PAVIS is designed with non-bioinformaticians in mind and presents a straightforward user interface to facilitate biological interpretation of ChIP-seq peak or other genomic enrichment data. PAVIS, through association with annotation, provides relevant genomic context for each peak, such as peak location relative to genomic features including transcription start site, intron, exon or 5′/3′-untranslated region. PAVIS reports the relative enrichment P-values of peaks in these functionally distinct categories, and provides a summary plot of the relative proportion of peaks in each category. PAVIS, unlike many other resources, provides a peak-oriented annotation and visualization system, allowing dynamic visualization of tens to hundreds of loci from one or more ChIP-seq experiments simultaneously. PAVIS enables rapid and easy examination and cross-comparison of the genomic context and potential functions of the underlying genomic elements, thus supporting downstream hypothesis generation.
Availability and Implementation: PAVIS is publicly accessible at http://manticore.niehs.nih.gov/pavis.
Contact: li3@niehs.nih.gov
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt520
PMCID: PMC3834791  PMID: 24008416
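The "peak location relative to genomic features" annotation described in the entry above reduces, at its simplest, to classifying each peak by its signed distance to the nearest transcription start site. The toy sketch below shows that idea only; the coordinates, single-strand assumption and category cut-offs are hypothetical, and real annotation (as in PAVIS) uses full gene models.

```python
# Toy annotation of peak summits by distance to the nearest transcription start
# site (TSS) on one chromosome.
tss_positions = [12_000, 55_000, 130_000]        # hypothetical TSSs (+ strand)
peak_summits = [11_500, 30_000, 54_100, 90_000, 128_700]

def annotate(peak, tss_list, promoter_window=5_000, downstream_window=1_000):
    # signed distance to the nearest TSS (negative = peak upstream of the TSS)
    signed = min((abs(peak - t), peak - t) for t in tss_list)[1]
    if -promoter_window <= signed <= 0:
        return "promoter/upstream"
    if 0 < signed <= downstream_window:
        return "immediately downstream of TSS"
    return "distal/intergenic"

for p in peak_summits:
    print(p, annotate(p, tss_positions))
```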