PMCC PMCC

Search tips
Search criteria

Advanced
Results 1-6 (6)
 

Clipboard (0)
None

Select a Filter Below

Journals
Authors
more »
Year of Publication
1.  BRASERO: A Resource for Benchmarking RNA Secondary Structure Comparison Algorithms 
Advances in Bioinformatics  2012;2012:893048.
The pairwise comparison of RNA secondary structures is a fundamental problem, with direct application in mining databases for annotating putative noncoding RNA candidates in newly sequenced genomes. An increasing number of software tools are available for comparing RNA secondary structures, based on different models (such as ordered trees or forests, arc annotated sequences, and multilevel trees) and computational principles (edit distance, alignment). We describe here the website BRASERO that offers tools for evaluating such software tools on real and synthetic datasets.
doi:10.1155/2012/893048
PMCID: PMC3366197  PMID: 22675348
2.  TFM-Explorer: mining cis-regulatory regions in genomes 
Nucleic Acids Research  2010;38(Web Server issue):W286-W292.
DNA-binding transcription factors (TFs) play a central role in transcription regulation, and computational approaches that help in elucidating complex mechanisms governing this basic biological process are of great use. In this perspective, we present the TFM-Explorer web server that is a toolbox to identify putative TF binding sites within a set of upstream regulatory sequences of genes sharing some regulatory mechanisms. TFM-Explorer finds local regions showing overrepresentation of binding sites. Accepted organisms are human, mouse, rat, chicken and drosophila. The server employs a number of features to help users to analyze their data: visualization of selected binding sites on genomic sequences, and selection of cis-regulatory modules. TFM-Explorer is available at http://bioinfo.lifl.fr/TFM.
doi:10.1093/nar/gkq473
PMCID: PMC2896114  PMID: 20522509
3.  MAGNOLIA: multiple alignment of protein–coding and structural RNA sequences 
Nucleic Acids Research  2008;36(Web Server issue):W14-W18.
MAGNOLIA is a new software for multiple alignment of nucleic acid sequences, which are recognized to be hard to align. The idea is that the multiple alignment process should be improved by taking into account the putative function of the sequences. In this perspective, MAGNOLIA is especially designed for sequences that are intended to be either protein-coding or structural RNAs. It extracts information from the similarities and differences in the data, and searches for a specific evolutionary pattern between sequences before aligning them. The alignment step then incorporates this information to achieve higher accuracy. The website is available at http://bioinfo.lifl.fr/magnolia.
doi:10.1093/nar/gkn321
PMCID: PMC2447753  PMID: 18515348
4.  Efficient and accurate P-value computation for Position Weight Matrices 
Background
Position Weight Matrices (PWMs) are probabilistic representations of signals in sequences. They are widely used to model approximate patterns in DNA or in protein sequences. The usage of PWMs needs as a prerequisite to knowing the statistical significance of a word according to its score. This is done by defining the P-value of a score, which is the probability that the background model can achieve a score larger than or equal to the observed value. This gives rise to the following problem: Given a P-value, find the corresponding score threshold. Existing methods rely on dynamic programming or probability generating functions. For many examples of PWMs, they fail to give accurate results in a reasonable amount of time.
Results
The contribution of this paper is two fold. First, we study the theoretical complexity of the problem, and we prove that it is NP-hard. Then, we describe a novel algorithm that solves the P-value problem efficiently. The main idea is to use a series of discretized score distributions that improves the final result step by step until some convergence criterion is met. Moreover, the algorithm is capable of calculating the exact P-value without any error, even for matrices with non-integer coefficient values. The same approach is also used to devise an accurate algorithm for the reverse problem: finding the P-value for a given score. Both methods are implemented in a software called TFM-PVALUE, that is freely available.
Conclusion
We have tested TFM-PVALUE on a large set of PWMs representing transcription factor binding sites. Experimental results show that it achieves better performance in terms of computational time and precision than existing tools.
doi:10.1186/1748-7188-2-15
PMCID: PMC2238751  PMID: 18072973
5.  Predicting transcription factor binding sites using local over-representation and comparative genomics 
BMC Bioinformatics  2006;7:396.
Background
Identifying cis-regulatory elements is crucial to understanding gene expression, which highlights the importance of the computational detection of overrepresented transcription factor binding sites (TFBSs) in coexpressed or coregulated genes. However, this is a challenging problem, especially when considering higher eukaryotic organisms.
Results
We have developed a method, named TFM-Explorer, that searches for locally overrepresented TFBSs in a set of coregulated genes, which are modeled by profiles provided by a database of position weight matrices. The novelty of the method is that it takes advantage of spatial conservation in the sequence and supports multiple species. The efficiency of the underlying algorithm and its robustness to noise allow weak regulatory signals to be detected in large heterogeneous data sets.
Conclusion
TFM-Explorer provides an efficient way to predict TFBS overrepresentation in related sequences. Promising results were obtained in a variety of examples in human, mouse, and rat genomes. The software is publicly available at .
doi:10.1186/1471-2105-7-396
PMCID: PMC1570149  PMID: 16945132
6.  CARNAC: folding families of related RNAs 
Nucleic Acids Research  2004;32(Web Server issue):W142-W145.
We present a tool for the prediction of conserved secondary structure elements of a family of homologous non-coding RNAs. Our method does not require any prior multiple sequence alignment. Thus, it successfully applies to datasets with low primary structure similarity. The functionality is demonstrated using three example datasets: sequences of RNase P RNAs, ciliate telomerases and enterovirus messenger RNAs. CARNAC has a web server that can be accessed at the URL http://bioinfo.lifl.fr/carnac.
doi:10.1093/nar/gkh415
PMCID: PMC441553  PMID: 15215367

Results 1-6 (6)