Results 1-15 (15)

1.  Web-Beagle: a web server for the alignment of RNA secondary structures 
Nucleic Acids Research  2015;43(Web Server issue):W493-W497.
Web-Beagle is a web server for the pairwise global or local alignment of RNA secondary structures. The server exploits a new encoding for RNA secondary structure and a substitution matrix of RNA structural elements to perform RNA structural alignments. The web server allows the user to compute up to 10 000 alignments in a single run, taking as input sets of RNA sequences and structures or primary sequences alone. In the latter case, the server computes the secondary structure prediction for the RNAs on-the-fly using RNAfold (free energy minimization). The user can also compare a set of input RNAs to one of five pre-compiled RNA datasets including lncRNAs and 3′ UTRs. All types of comparison output the pairwise alignments along with structural similarity and statistical significance measures for each resulting alignment. A graphical color-coded representation of the alignments allows the user to easily identify structural similarities between RNAs. Web-Beagle can be used for finding structurally related regions in two or more RNAs, for the identification of homologous regions or for functional annotation. Benchmark tests show that Web-Beagle has lower computational complexity and running time, and better performance, than other available methods.
PMCID: PMC4489221  PMID: 25977293
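The pairwise global alignment mentioned in the abstract above can be illustrated with a minimal Needleman-Wunsch scoring sketch. This is not the Web-Beagle implementation: the function name `global_alignment_score`, the flat match/mismatch scores and the linear gap penalty are assumptions chosen for illustration, standing in for the structural encoding and substitution matrix the server actually uses.

```python
def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score for two strings.

    The flat match/mismatch scores are illustrative placeholders, not a
    substitution matrix of RNA structural elements.
    """
    # prev[j] holds the best score for aligning the prefix of `a` processed
    # so far with b[:j]; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]
```

Calling `global_alignment_score("((.))", "((..))")` aligns two toy dot-bracket strings; replacing the flat scores with a lookup in a substitution matrix would be the natural next step.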
2.  Exploiting holistic approaches to model specificity in protein phosphorylation 
Frontiers in Genetics  2014;5:315.
Phosphate plays a chemically unique role in shaping cellular signaling of all current living systems, especially eukaryotes. Protein phosphorylation has been studied at several levels, from the near-site context, both in sequence and structure, to the crowded cellular environment, and ultimately to the systems-level perspective. Despite the tremendous advances in mass spectrometry and efforts dedicated to the development of ad hoc highly sophisticated methods, phosphorylation site inference and associated kinase identification are still unresolved problems in kinome biology. The sequence and structure of the substrate near-site context are not sufficient alone to model the in vivo phosphorylation rules, and they should be integrated with orthogonal information in all possible applications. Here we provide an overview of the different contexts that contribute to protein phosphorylation, discussing their potential impact in phosphorylation site annotation and in predicting kinase-substrate specificity.
PMCID: PMC4179730  PMID: 25324856
kinase-substrate specificity; phosphorylation context; phosphorylation prediction; cellular signaling; kinase-peptide specificity; substrate recruitment; signaling networks
3.  Computational methods for analysis and inference of kinase/inhibitor relationships 
Frontiers in Genetics  2014;5:196.
The central role of kinases in virtually all signal transduction networks is the driving motivation for the development of compounds modulating their activity. ATP-mimetic inhibitors are essential tools for elucidating signaling pathways and are emerging as promising therapeutic agents. However, off-target ligand binding and complex and sometimes unexpected kinase/inhibitor relationships can occur for seemingly unrelated kinases, stressing that computational approaches are needed for learning the interaction determinants and for the inference of the effect of small compounds on a given kinase. Recently published high-throughput profiling studies assessed the effects of thousands of small compound inhibitors, covering a substantial portion of the kinome. This wealth of data paved the way for computational resources and methods that can offer a major contribution in understanding the reasons for inhibition, helping in the rational design of more specific molecules, in the in silico prediction of inhibition for those neglected kinases for which no systematic analysis has been carried out yet, in the selection of novel inhibitors with desired selectivity, and offering novel avenues of personalized therapies.
PMCID: PMC4075008  PMID: 25071826
kinase inhibitors; kinase activity modulation; kinase/inhibitor inference; drug design and development; chemogenomics
4.  A Proteome-wide Domain-centric Perspective on Protein Phosphorylation
Phosphorylation is a widespread post-translational modification that modulates the function of a large number of proteins. Here we show that a significant proportion of all the domains in the human proteome is significantly enriched or depleted in phosphorylation events. A substantial improvement in phosphosites prediction is achieved by leveraging this observation, which has not been tapped by existing methods. Phosphorylation sites are often not shared between multiple occurrences of the same domain in the proteome, even when the phosphoacceptor residue is conserved. This is partly because of different functional constraints acting on the same domain in different protein contexts. Moreover, by augmenting domain alignments with structural information, we were able to provide direct evidence that phosphosites in protein-protein interfaces need not be positionally conserved, likely because they can modulate interactions simply by sitting in the same general surface area.
PMCID: PMC4159644  PMID: 24830415
5.  A novel approach to represent and compare RNA secondary structures 
Nucleic Acids Research  2014;42(10):6146-6157.
Structural information is crucial in ribonucleic acid (RNA) analysis and functional annotation; nevertheless, how to include such structural data is still a debated problem. Dot-bracket notation is the most common and simple representation for RNA secondary structures, but its simplicity also leads to ambiguity that requires further processing steps to resolve. Here we present BEAR (Brand nEw Alphabet for RNA), a new context-aware structural encoding represented by a string of characters. Each character in BEAR encodes a specific secondary structure element (loop, stem, bulge and internal loop) with a specific length. Furthermore, exploiting this informative and yet simple encoding in multiple alignments of related RNAs, we captured how much structural variation is tolerated in RNA families and converted it into transition rates among secondary structure elements. This allowed us to compute a substitution matrix for secondary structure elements called MBR (Matrix of BEAR-encoded RNA secondary structures), whose ability to align RNA secondary structures we tested. We propose BEAR and the MBR as powerful resources for RNA secondary structure analysis, comparison and classification, motif finding and phylogeny.
PMCID: PMC4041456  PMID: 24753415
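The ambiguity of dot-bracket notation noted in the abstract above can be made concrete with a short sketch: the same `.` character may sit in a hairpin loop, a bulge or an unpaired end, and only the surrounding base pairs tell them apart. The code below is an illustration under stated assumptions, not the BEAR encoding: the function names and the three-letter labelling (`S`, `H`, `U`) are hypothetical, and element lengths are not encoded.

```python
def pair_table(dot_bracket):
    """Map each paired position to its partner (0-based indices)."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def label_elements(dot_bracket):
    """Label positions: S = paired, H = hairpin loop, U = other unpaired.

    The labels are a hypothetical toy alphabet, not BEAR characters.
    """
    pairs = pair_table(dot_bracket)
    last = len(dot_bracket) - 1
    labels = []
    for i, ch in enumerate(dot_bracket):
        if ch != ".":
            labels.append("S")
            continue
        # Find the extent of the unpaired run containing position i.
        left = i
        while left > 0 and dot_bracket[left - 1] == ".":
            left -= 1
        right = i
        while right < last and dot_bracket[right + 1] == ".":
            right += 1
        # A hairpin loop is an unpaired run closed directly by one base pair.
        closed = left > 0 and right < last and pairs.get(left - 1) == right + 1
        labels.append("H" if closed else "U")
    return "".join(labels)
```

For example, `label_elements("((.(...)))")` distinguishes the single bulged position from the three hairpin-loop positions, which the raw string marks identically.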
6.  Role of CTCF Protein in Regulating FMR1 Locus Transcription 
PLoS Genetics  2013;9(7):e1003601.
Fragile X syndrome (FXS), the leading cause of inherited intellectual disability, is caused by epigenetic silencing of the FMR1 gene, through expansion and methylation of a CGG triplet repeat (methylated full mutation). An antisense transcript (FMR1-AS1), starting from both promoter and intron 2 of the FMR1 gene, was demonstrated in transcriptionally active alleles, but not in silent FXS alleles. Moreover, a DNA methylation boundary, which is lost in FXS, was recently identified upstream of the FMR1 gene. Several nuclear proteins bind to this region, like the insulator protein CTCF. Here we demonstrate for the first time that rare unmethylated full mutation (UFM) alleles present the same boundary described in wild type (WT) alleles and that CTCF binds to this region, as well as to the FMR1 gene promoter, exon 1 and intron 2 binding sites. Contrariwise, DNA methylation prevents CTCF binding to FXS alleles. Drug-induced CpGs demethylation does not restore this binding. CTCF knock-down experiments clearly established that CTCF does not act as insulator at the active FMR1 locus, despite the presence of a CGG expansion. CTCF depletion induces heterochromatinic histone configuration of the FMR1 locus and results in reduction of FMR1 transcription, which however is not accompanied by spreading of DNA methylation towards the FMR1 promoter. CTCF depletion is also associated with FMR1-AS1 mRNA reduction. Antisense RNA, like sense transcript, is upregulated in UFM and absent in FXS cells and its splicing is correlated to that of the FMR1-mRNA. We conclude that CTCF has a complex role in regulating FMR1 expression, probably through the organization of chromatin loops between sense/antisense transcriptional regulatory regions, as suggested by bioinformatics analysis.
Author Summary
Fragile X syndrome is the most common cause of inherited intellectual disability, affecting about 1∶3000 males and 1∶4000 females. It is caused by a dynamic mutation of FMR1, a gene mapping on the X chromosome and containing a CGG repeat in its promoter region. Expansion of this unstable sequence beyond 200 repeats (full mutation) is followed by DNA methylation and histone changes, leading to the transcriptional inactivation of FMR1 and to the lack of the FMRP protein. Recently, an antisense transcript (FMR1-AS1) spanning the CGG repeats and a region of transition of DNA methylation (boundary) located upstream of the CGG repeats have been identified in transcriptionally active FMR1 alleles. Several nuclear proteins bound to the methylation boundary have been described, such as the zinc-finger protein CTCF, the first known insulator in mammals. This protein is an important transcriptional regulator of genes harboring trinucleotide repeats and it is mostly active in chromatin organization. For the first time, we have investigated the role of CTCF protein in the transcriptional regulation of the FMR1 gene. Our results define a complex role for CTCF acting through chromatin organization of the FMR1 locus.
PMCID: PMC3715420  PMID: 23874213
7.  Alternative splicing tends to avoid partial removals of protein-protein interaction sites 
BMC Genomics  2013;14:379.
Anecdotal evidence of the involvement of alternative splicing (AS) in the regulation of protein-protein interactions has been reported by several studies. AS events have been shown to significantly occur in regions where a protein interaction domain or a short linear motif is present. Several AS variants show partial or complete loss of interface residues, suggesting that AS can play a major role in the interaction regulation by selectively targeting the protein binding sites. In the present study we performed a statistical analysis of the alternative splicing of a non-redundant dataset of human protein-protein interfaces known at molecular level to determine the importance of this way of modulation of protein-protein interactions through AS.
Using a Cochran-Mantel-Haenszel chi-square test we demonstrated that the alternative splicing-mediated partial removal of both heterodimeric and homodimeric binding sites occurs at lower frequencies than expected, and this holds true even if we consider only those isoforms whose sequence differs least from that of the canonical protein and which therefore allow selective regulation of functional regions of the protein. On the other hand, large removals of the binding site are not significantly prevented, possibly because they are associated with drastic structural changes of the protein. The observed protection of the binding sites from AS is not preferentially directed towards putative hot spot interface residues, and extends to all protein functional classes.
Our findings indicate that protein-protein binding sites are generally protected from alternative splicing-mediated partial removals. However, some cases in which the binding site is selectively removed exist, and here we discuss one of them.
PMCID: PMC3700808  PMID: 23758645
Alternative splicing; Protein-protein interaction; Hot spots; Protein three-dimensional structure; Disordered regions
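The Cochran-Mantel-Haenszel chi-square test named in the abstract above pools evidence from several 2x2 contingency tables, one per stratum. A minimal standard-library sketch of the statistic follows; the function names are hypothetical, and how the study defined its rows, columns and strata is not stated in the abstract, so the table layout here is an assumption.

```python
import math


def cmh_statistic(tables, continuity=False):
    """Cochran-Mantel-Haenszel statistic for 2x2 tables ((a, b), (c, d))."""
    observed = expected = variance = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small contributes no information
        observed += a
        # Expectation and variance of the top-left cell under independence
        # (hypergeometric distribution with fixed margins).
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("no informative strata")
    diff = abs(observed - expected)
    if continuity:
        diff = max(diff - 0.5, 0.0)
    return diff * diff / variance


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))
```

The statistic is two-sided; whether an event occurs less often than expected is read from the sign of the pooled observed count minus the pooled expected count, which the sketch does not report separately.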
8.  webPDBinder: a server for the identification of ligand binding sites on protein structures 
Nucleic Acids Research  2013;41(Web Server issue):W308-W313.
webPDBinder is a web server for the identification of small ligand-binding sites in a protein structure. webPDBinder searches a protein structure against a library of known binding sites and a collection of control non-binding pockets. The number of similarities identified with the residues in the two sets is then used to derive, for each residue of the query protein, a propensity value reflecting the likelihood that the residue is part of a ligand binding site. The predicted binding residues can be further refined using conservation scores derived from the multiple alignment of the PFAM protein family. webPDBinder correctly identifies residues belonging to the binding site in 77% of the cases and is able to identify binding pockets starting from holo or apo structures with comparable performance. This is important for all the real world cases where the query protein has been crystallized without a ligand and where it is also difficult to obtain clear similarities with bound pockets from holo pocket libraries. The input is either a PDB code or a user-submitted structure. The output is a list of predicted binding pocket residues with propensity and conservation values both in text and graphical format.
PMCID: PMC3692056  PMID: 23737450
9.  Identification of Nucleotide-Binding Sites in Protein Structures: A Novel Approach Based on Nucleotide Modularity 
PLoS ONE  2012;7(11):e50240.
Nucleotides are involved in several cellular processes, ranging from the transmission of genetic information to energy transfer and storage. Both sequence and structure based methods have been developed to predict the location of nucleotide-binding sites in proteins. Here we propose a novel methodology that leverages the observation that nucleotide-binding sites have a modular structure. Nucleotides are composed of identifiable fragments, i.e. the phosphate, the nucleobase and the carbohydrate moieties. These fragments are bound by specific structural motifs that recur in proteins of different fold. Moreover these motifs behave as modules and are found in different combinations across fold space. Our method predicts binding sites for each nucleotide fragment by comparing a query protein with a database of templates extracted from proteins of known structure. Whenever a similarity is found the fragment bound by the template is transferred on the query protein, thus identifying a putative binding site. Predictions falling inside the surface of the protein are discarded, and the remaining ones are scored using clustering and conservation. The method is able to rank as first a correct prediction in 48%, 48% and 68% of the analyzed proteins for the nucleobase, carbohydrate and phosphate respectively, while considering the first five predictions the performance rises to 71%, 65% and 86% respectively. Furthermore we attempted to reconstruct the full structure of the binding site, starting from the predicted positions of the fragments. We calculated that in 59% of the analyzed proteins the method ranks as first a reconstructed binding site or a part of it. Finally we tested the reliability of our method in a real world case in which it has to predict nucleotide-binding sites in unbound proteins. We analyzed proteins whose structure has been solved with and without the nucleotide and observed only small variations in the method's performance.
PMCID: PMC3507729  PMID: 23209685
10.  Specific tagging of the egress-related osmiophilic bodies in the gametocytes of Plasmodium falciparum 
Malaria Journal  2012;11:88.
Gametocytes, the blood stages responsible for Plasmodium falciparum transmission, contain electron dense organelles, traditionally named osmiophilic bodies, that are believed to be involved in gamete egress from the host cell. In order to provide novel tools in the cellular and molecular studies of osmiophilic body biology, a P. falciparum transgenic line in which these organelles are specifically marked by a reporter protein was produced and characterized.
A P. falciparum transgenic line expressing an 80-residue N-terminal fragment of the osmiophilic body protein Pfg377 fused to the reporter protein DsRed, under the control of pfg377 upstream and downstream regulatory regions, was produced.
The transgenic fusion protein is expressed at the appropriate time and stage of sexual differentiation and is trafficked to osmiophilic bodies as the endogenous Pfg377 protein. These results indicate that a relatively small N-terminal portion of Pfg377 is sufficient to target the DsRed reporter to the gametocyte osmiophilic bodies.
This is the first identification of a P. falciparum amino acid sequence able to mediate trafficking to such organelles. Fluorescently tagging such poorly characterized organelles opens novel avenues in cellular and imaging studies on their biogenesis and on their role in gamete egress.
PMCID: PMC3342164  PMID: 22452991
Malaria; Plasmodium falciparum; pfg377; Female gametocyte; Osmiophilic body; Subcellular localization; Gamete egress; Trafficking
11.  Coding potential of the products of alternative splicing in human 
Genome Biology  2011;12(1):R9.
Analysis of the human genome has revealed that as much as an order of magnitude more of the genomic sequence is transcribed than accounted for by the predicted and characterized genes. A number of these transcripts are alternatively spliced forms of known protein coding genes; however, it is becoming clear that many of them do not necessarily correspond to a functional protein.
In this study we analyze alternative splicing isoforms of human gene products that are unambiguously identified by mass spectrometry and compare their properties with those of isoforms of the same genes for which no peptide was found in publicly available mass spectrometry datasets. We analyze them in detail for the presence of uninterrupted functional domains, active sites as well as the plausibility of their predicted structure. We report how well each of these strategies and their combination can correctly identify translated isoforms and derive a lower limit for their specificity, that is, their ability to correctly identify non-translated products.
The most effective strategy for correctly identifying translated products relies on the conservation of active sites, but it can only be applied to a small fraction of isoforms, while a reasonably high coverage, sensitivity and specificity can be achieved by analyzing the presence of non-truncated functional domains. Combining the latter with an assessment of the plausibility of the modeled structure of the isoform increases both coverage and specificity with a moderate cost in terms of sensitivity.
PMCID: PMC3091307  PMID: 21251333
12.  Functional annotation by identification of local surface similarities: a novel tool for structural genomics 
BMC Bioinformatics  2005;6:194.
Protein function is often dependent on subsets of solvent-exposed residues that may exist in a similar three-dimensional configuration in non homologous proteins thus having different order and/or spacing in the sequence. Hence, functional annotation by means of sequence or fold similarity is not adequate for such cases.
We describe a method for the function-related annotation of protein structures by means of the detection of local structural similarity with a library of annotated functional sites. An automatic procedure was used to annotate the function of local surface regions. Next, we employed a sequence-independent algorithm to compare exhaustively these functional patches with a larger collection of protein surface cavities. After tuning and validating the algorithm on a dataset of well annotated structures, we applied it to a list of protein structures that are classified as being of unknown function in the Protein Data Bank. By this strategy, we were able to provide functional clues to proteins that do not show any significant sequence or global structural similarity with proteins in the current databases.
This method is able to spot structural similarities associated with function-related similarities, independently of sequence or fold resemblance, and is therefore a valuable tool for the functional analysis of uncharacterized proteins. Results are available at
PMCID: PMC1190158  PMID: 16076399
13.  SURFACE: a database of protein surface regions for functional annotation 
Nucleic Acids Research  2004;32(Database issue):D240-D244.
The SURFACE (SUrface Residues and Functions Annotated, Compared and Evaluated) database is a repository of annotated and compared protein surface regions. SURFACE contains the results of a large-scale protein annotation and local structural comparison project. A non-redundant set of protein chains is used to build a database of protein surface patches, defined as putative surface functional sites. Each patch is annotated with sequence and structure-derived information about function or interaction abilities. A new procedure for structure comparison is used to perform an all-versus-all patches comparison. Selection of the results obtained with stringent parameters offers a similarity score that can be used to associate different patches and allows reliable annotation by similarity. Annotation through the comparison of regions of protein surface highlights similarities that cannot be recognized by other methods of sequence or structure comparison. A graphic representation of the surface patches, functional annotations and the structural superpositions is available through the web interface.
PMCID: PMC308788  PMID: 14681403
14.  ELM server: a new resource for investigating short functional sites in modular eukaryotic proteins 
Nucleic Acids Research  2003;31(13):3625-3630.
Multidomain proteins predominate in eukaryotic proteomes. Individual functions assigned to different sequence segments combine to create a complex function for the whole protein. While on-line resources are available for revealing globular domains in sequences, there has hitherto been no comprehensive collection of small functional sites/motifs comparable to the globular domain resources, yet these are as important for the function of multidomain proteins. Short linear peptide motifs are used for cell compartment targeting, protein–protein interaction, regulation by phosphorylation, acetylation, glycosylation and a host of other post-translational modifications. ELM, the Eukaryotic Linear Motif server, is a new bioinformatics resource for investigating candidate short non-globular functional motifs in eukaryotic proteins, aiming to fill the void in bioinformatics tools. Sequence comparisons with short motifs are difficult to evaluate because the usual significance assessments are inappropriate. Therefore the server is implemented with several logical filters to eliminate false positives. Current filters are for cell compartment, globular domain clash and taxonomic range. In favourable cases, the filters can reduce the number of retained matches by an order of magnitude or more.
PMCID: PMC168952  PMID: 12824381
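The filtering idea in the abstract above (pattern matches pruned by cell compartment, globular domain clash and taxonomic range) can be sketched as follows. This is an illustration only, not the ELM server's code or data: the `Motif` record, the `scan` function, the example pattern and the taxon and compartment labels are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Motif:
    name: str                # hypothetical label, not an ELM identifier
    pattern: str             # regular expression over one-letter residue codes
    taxa: frozenset          # taxonomic groups where the motif is considered valid
    compartments: frozenset  # cell compartments where the motif is considered valid


def scan(sequence, motifs, taxon, compartment, domains=()):
    """Return (name, start, end) for matches that pass all three filters.

    domains: (start, end) half-open intervals covered by globular domains.
    """
    hits = []
    for m in motifs:
        # Taxonomic-range and cell-compartment filters apply to the whole motif.
        if taxon not in m.taxa or compartment not in m.compartments:
            continue
        # re.finditer reports non-overlapping matches only.
        for match in re.finditer(m.pattern, sequence):
            start, end = match.span()
            # Globular-domain clash filter: drop matches overlapping a domain.
            if any(start < d_end and d_start < end for d_start, d_end in domains):
                continue
            hits.append((m.name, start, end))
    return hits
```

A short pattern such as `P..P` matches many sequences by chance, which is why context filters of this kind are applied before a match is retained.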
15.  Development of Computational Tools for the Inference of Protein Interaction Specificity Rules and Functional Annotation Using Structural Information 
Relatively few protein structures are known, compared to the enormous amount of sequence data produced in the sequencing of different genomes, and relatively few protein complexes are deposited in the PDB with respect to the great amount of interaction data coming from high-throughput experiments (two-hybrid or affinity purification of protein complexes and mass spectrometry). Nevertheless, we can rely on computational techniques for the extraction of high-quality and information-rich data from the known structures and for their spreading in the protein sequence space. We describe here the ongoing research projects in our group: we analyse the protein complexes stored in the PDB and, for each complex involving one domain belonging to a family of interaction domains for which some interaction data are available, we can calculate its probability of interaction with any protein sequence. We analyse the structures of proteins encoding a function specified in a PROSITE pattern, which exhibits relatively low selectivity and specificity, and build extended patterns. To this aim, we consider residues that are well-conserved in the structure, even if their conservation cannot easily be recognized in the sequence alignment of the proteins holding the function. We also analyse protein surface regions and, through the annotation of the solvent-exposed residues, we annotate protein surface patches via a structural comparison performed with stringent parameters and independently of the residue order in the sequence. Local surface comparison may also help in identifying new sequence patterns, which could not be highlighted with other sequence-based methods.
PMCID: PMC2447366  PMID: 18629081