Results 1-6 (6)

1.  GPU-BSM: A GPU-Based Tool to Map Bisulfite-Treated Reads 
PLoS ONE  2014;9(5):e97277.
Cytosine DNA methylation is an epigenetic mark implicated in several biological processes. Bisulfite treatment of DNA is acknowledged as the gold standard technique to study methylation. This technique introduces changes in the genomic DNA by converting cytosines to uracils while 5-methylcytosines remain nonreactive. During PCR amplification, 5-methylcytosines are amplified as cytosines, whereas uracils and thymines are amplified as thymines. To detect methylation levels, bisulfite-treated reads must be aligned against a reference genome. Mapping these reads represents a significant computational challenge, mainly due to the increased search space and the loss of information introduced by the treatment. To deal with this challenge we devised GPU-BSM, a tool based on modern Graphics Processing Units (GPUs), hardware accelerators that are increasingly used to speed up general-purpose scientific applications. GPU-BSM maps bisulfite-treated reads from whole genome bisulfite sequencing and reduced representation bisulfite sequencing, and estimates methylation levels. Due to the massive parallelization obtained by exploiting graphics cards, GPU-BSM aligns bisulfite-treated reads faster than other cutting-edge solutions, while outperforming most of them in terms of uniquely mapped reads.
PMCID: PMC4026317  PMID: 24842718
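The bisulfite chemistry described in the GPU-BSM abstract can be illustrated with a small in-silico sketch. This is not GPU-BSM code: the function names `bisulfite_convert` and `methylation_levels`, the 0-based coordinates, and the restriction to ungapped forward-strand reads are all assumptions made for illustration.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite treatment plus PCR on a forward-strand sequence.

    Unmethylated C is read as T (C -> U -> T); methylated C stays C.
    `methylated_positions` holds 0-based indices (an illustrative convention).
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def methylation_levels(reference, aligned_reads):
    """Estimate per-cytosine methylation as C / (C + T) over aligned reads.

    `aligned_reads` is a list of (start, read) pairs with 0-based start
    positions; alignments are assumed ungapped and on the forward strand.
    """
    counts = {}  # reference position -> [C count, T count]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos < len(reference) and reference[pos] == "C" and base in "CT":
                c_t = counts.setdefault(pos, [0, 0])
                c_t[0 if base == "C" else 1] += 1
    return {pos: c / (c + t) for pos, (c, t) in sorted(counts.items())}


reference = "ACGTCA"
reads = [
    (0, bisulfite_convert(reference, {1})),  # cytosine at index 1 methylated
    (0, bisulfite_convert(reference)),       # fully unmethylated copy
]
levels = methylation_levels(reference, reads)
```

Here `levels` maps each covered reference cytosine to the fraction of reads that still show a C at that position. Because unmethylated cytosines look like thymines after treatment, a real mapper has to tolerate C-to-T mismatches, which is the source of the enlarged search space mentioned in the abstract.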
2.  A tool for mapping Single Nucleotide Polymorphisms using Graphics Processing Units 
BMC Bioinformatics  2014;15(Suppl 1):S10.
Single Nucleotide Polymorphism (SNP) genotyping analysis is very susceptible to errors in SNP chromosomal positions. As is well known, SNP mapping data are provided along with SNP arrays without the information needed to assess their accuracy in advance. Moreover, these mapping data refer to a given build of a genome and need to be updated when a new build becomes available. As a consequence, researchers often remap SNPs to obtain more up-to-date chromosomal positions. In this work, we present G-SNPM, a GPU (Graphics Processing Unit) based tool to map SNPs on a genome.
G-SNPM is a tool that maps a short sequence representative of a SNP against a reference DNA sequence in order to find the physical position of the SNP in that sequence. In G-SNPM each SNP is mapped on its related chromosome by means of an automatic three-stage pipeline. In the first stage, G-SNPM uses the GPU-based short-read mapping tool SOAP3-dp to align, in parallel, the sequences representative of each SNP against the related reference chromosome. In the second stage, G-SNPM uses another short-read mapping tool, SHRiMP2, to remap the sequences left unaligned or ambiguously aligned by SOAP3-dp; SHRiMP2 exploits specialized vector computing hardware to speed up the Smith-Waterman dynamic programming algorithm. In the last stage, G-SNPM analyzes the alignments obtained by SOAP3-dp and SHRiMP2 to identify the absolute position of each SNP.
Results and conclusions
To assess G-SNPM, we used it to remap the SNPs of some commercial chips. Experimental results show that G-SNPM was able to remap almost all SNPs without ambiguity. Based on modern GPUs, G-SNPM provides fast mappings without worsening the accuracy of the results. G-SNPM can be used in specialized Genome Wide Association Studies (GWAS), as well as in annotation tasks that require updating the SNP mapping probes.
PMCID: PMC4015528  PMID: 24564714
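The last stage of the G-SNPM pipeline, turning an alignment into an absolute SNP position, can be sketched as follows. This is a simplified illustration, not G-SNPM's actual algorithm: the function `snp_position`, the `(start, strand)` alignment tuple, the 1-based leftmost start coordinate, and the assumption of ungapped alignments are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical alignment record: (1-based leftmost reference position, strand).
Alignment = Tuple[int, str]


def snp_position(alignments: List[Alignment],
                 snp_offset: int,
                 probe_length: int) -> Optional[int]:
    """Return the absolute SNP position, or None if it cannot be resolved.

    `snp_offset` is the 0-based index of the SNP inside the probe sequence.
    A probe with no alignment or with several candidate alignments is
    treated as unresolved, so that a later stage can remap it.
    """
    if len(alignments) != 1:
        return None
    start, strand = alignments[0]
    if strand == "+":
        return start + snp_offset
    # On the reverse strand the probe is reverse-complemented, so the SNP
    # offset is counted from the other end of the aligned interval.
    return start + (probe_length - 1 - snp_offset)


forward = snp_position([(1000, "+")], snp_offset=25, probe_length=51)
reverse = snp_position([(1000, "-")], snp_offset=10, probe_length=51)
ambiguous = snp_position([(1000, "+"), (52000, "+")], snp_offset=25, probe_length=51)
```

Returning `None` for unaligned or ambiguous probes mirrors the hand-off described in the abstract, where sequences that the first mapper cannot place uniquely are passed to a second mapper.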
3.  Literature Retrieval and Mining in Bioinformatics: State of the Art and Challenges 
Advances in Bioinformatics  2012;2012:573846.
The world has changed greatly in terms of communicating, acquiring, and storing information. Hundreds of millions of people are involved in information retrieval tasks on a daily basis, in particular while using a Web search engine or searching their e-mail, making information retrieval the dominant form of information access, overtaking traditional database-style searching. How to handle this huge amount of information has now become a challenging issue. In this paper, after recalling the main topics concerning information retrieval, we present a survey of the main works on literature retrieval and mining in bioinformatics. While claiming that information retrieval approaches are useful in bioinformatics tasks, we discuss some challenges aimed at showing the effectiveness of these approaches applied therein.
PMCID: PMC3388278  PMID: 22778730
4.  Computational Design of a DNA- and Fc-Binding Fusion Protein 
Advances in Bioinformatics  2011;2011:457578.
Computational design of novel proteins with well-defined functions is an ongoing topic in computational biology. In this work, we generated and optimized a new synthetic fusion protein using an evolutionary approach. The optimization was guided by directed evolution based on hydrophobicity scores, molecular weight, and secondary structure predictions. Several methods were used to refine the models built from the resulting sequences. We have successfully combined two unrelated naturally occurring binding sites, the immunoglobulin Fc-binding site of the Z domain and the DNA-binding motif of MyoD bHLH, into a novel stable protein.
PMCID: PMC3173724  PMID: 21941539
5.  ProDaMa: an open source Python library to generate protein structure datasets 
BMC Research Notes  2009;2:202.
The huge difference between the number of known sequences and known tertiary structures has justified the use of automated methods for protein analysis. Although a general methodology to solve these problems has not yet been devised, researchers are engaged in developing more accurate techniques and algorithms whose training plays a relevant role in determining their performance. From this perspective, particular importance is given to the training data used in experiments, and researchers are often engaged in the generation of specialized datasets that meet their requirements.
To facilitate the task of generating specialized datasets we devised and implemented ProDaMa, an open source Python library that provides classes for retrieving, organizing, updating, analyzing, and filtering protein data.
ProDaMa has been used to generate specialized datasets useful for secondary structure prediction and to develop a collaborative web application aimed at generating and sharing protein structure datasets. The library, the related database, and the documentation are freely available at the URL .
PMCID: PMC2761932  PMID: 19799773
6.  A Hybrid Genetic-Neural System for Predicting Protein Secondary Structure 
BMC Bioinformatics  2005;6(Suppl 4):S3.
Due to the strict relation between protein function and structure, the prediction of protein 3D-structure has become one of the most important tasks in bioinformatics and proteomics. In fact, notwithstanding the increase of experimental data on protein structures available in public databases, the gap between known sequences and known tertiary structures is constantly increasing. The need for automatic methods has brought the development of several prediction and modelling tools, but a general methodology able to solve the problem has not yet been devised, and most methodologies concentrate on the simplified task of predicting secondary structure.
In this paper we concentrate on the problem of predicting secondary structures by adopting a technology based on multiple experts. The system performs an overall processing based on two main steps: first, a "sequence-to-structure" prediction is made by a population of hybrid (genetic-neural) experts, and then a "structure-to-structure" prediction is performed by an artificial neural network. Experiments, performed on sequences taken from well-known protein databases, reached an accuracy of about 76%, which is comparable to that obtained by state-of-the-art predictors.
The adoption of a hybrid technique, which encompasses genetic and neural technologies, has proven to be a promising approach to the task of protein secondary structure prediction.
PMCID: PMC1866382  PMID: 16351752