1.  Alphaherpesvirinae and Gammaherpesvirinae glycoprotein L and CMV UL130 originate from chemokines 
Virology Journal  2013;10:1.
Herpesviridae is a large family of DNA viruses divided into three subfamilies: Alpha-, Beta- and Gammaherpesvirinae. The process of herpesvirus transmission is mediated by a range of proteins, one of which is glycoprotein L (gL). Based on our analysis of the solved structures of HSV2 and EBV gH/gL complexes, we propose that Alphaherpesvirinae and Gammaherpesvirinae glycoprotein L and Betaherpesvirinae UL130 originate from chemokines. Herpes simplex virus type 2 gL and its human cytomegalovirus homolog (UL130) adopt a novel C chemokine-like fold, while Epstein-Barr virus gL mimics a CC chemokine structure. Hence, it is possible that gL interfaces with specific chemokine receptors during the transmission of Herpesviridae. We conclude that further understanding of the function of viral chemokine-like proteins in Herpesviridae infection may lead to the development of novel prophylactic and therapeutic treatments.
PMCID: PMC3598415  PMID: 23279912
Herpesviridae; Glycoprotein L; GL; UL130; Chemokines; HCMV; HSV2; EBV
2.  Sequence, structure and functional diversity of PD-(D/E)XK phosphodiesterase superfamily 
Nucleic Acids Research  2012;40(15):7016-7045.
Proteins belonging to PD-(D/E)XK phosphodiesterases constitute a functionally diverse superfamily with representatives involved in replication, restriction, DNA repair and tRNA–intron splicing. Their malfunction in humans triggers severe diseases, such as Fanconi anemia and Xeroderma pigmentosum. To date there have been several attempts to identify and classify new PD-(D/E)XK phosphodiesterases using remote homology detection methods. Such efforts are complicated because the superfamily exhibits extreme sequence and structural divergence. Using advanced homology detection methods supported with superfamily-wide domain architecture and horizontal gene transfer analyses, we provide a comprehensive reclassification of proteins containing a PD-(D/E)XK domain. The PD-(D/E)XK phosphodiesterases span over 21 900 proteins, which can be classified into 121 groups of various families. Eleven of them, including DUF4420, DUF3883, DUF4263, COG5482, COG1395, Tsp45I, HaeII, Eco47II, ScaI, HpaII and Replic_Relax, are newly assigned to the PD-(D/E)XK superfamily. Some groups of PD-(D/E)XK proteins are present in all domains of life, whereas others occur within small numbers of organisms. We observed multiple horizontal gene transfers even between human pathogenic bacteria or from Prokaryota to Eukaryota. Uncommon domain arrangements greatly elaborate the PD-(D/E)XK world. These include domain architectures suggesting regulatory roles in Eukaryotes, like stress sensing and cell-cycle regulation. Our results may inspire further experimental studies aimed at identification of exact biological functions, specific substrates and molecular mechanisms of reactions performed by these highly diverse proteins.
PMCID: PMC3424549  PMID: 22638584
3.  Mapping the Substrate Binding Site of Phenylacetone Monooxygenase from Thermobifida fusca by Mutational Analysis
Applied and Environmental Microbiology  2011;77(16):5730-5738.
Baeyer-Villiger monooxygenases catalyze oxidations that are of interest for biocatalytic applications. Among these enzymes, phenylacetone monooxygenase (PAMO) from Thermobifida fusca is the only protein showing remarkable stability. While related enzymes often present a broad substrate scope, PAMO accepts only a limited number of substrates. Due to the absence of a substrate in the elucidated crystal structure of PAMO, the substrate binding site of this protein has not yet been defined. In this study, a structural model of cyclopentanone monooxygenase, which acts on a broad range of compounds, has been prepared and compared with the structure of PAMO. This revealed 15 amino acid positions in the active site of PAMO that may account for its relatively narrow substrate specificity. We designed and analyzed 30 single and multiple mutants in order to verify the role of these positions. Extensive substrate screening revealed several mutants that displayed increased activity and altered regio- or enantioselectivity in Baeyer-Villiger reactions and sulfoxidations. Further substrate profiling resulted in the identification of mutants with improved catalytic properties toward synthetically attractive compounds. Moreover, the thermostability of the mutants was not compromised in comparison to that of the wild-type enzyme. Our data demonstrate that the positions identified within the active site of PAMO, namely, V54, I67, Q152, and A435, contribute to the substrate specificity of this enzyme. These findings will aid in more dedicated and effective redesign of PAMO and related monooxygenases toward an expanded substrate scope.
PMCID: PMC3165276  PMID: 21724896
4.  Comprehensive Structural and Substrate Specificity Classification of the Saccharomyces cerevisiae Methyltransferome 
PLoS ONE  2011;6(8):e23168.
Methylation is one of the most common chemical modifications of biologically active molecules and it occurs in all life forms. Its functional role is very diverse and involves many essential cellular processes, such as signal transduction, transcriptional control, biosynthesis, and metabolism. Here, we provide further insight into enzymatic methylation in S. cerevisiae by conducting a comprehensive structural and functional survey of all the methyltransferases encoded in its genome. Using distant homology detection and fold recognition, we found that the S. cerevisiae methyltransferome comprises 86 MTases (53 well-known and 33 putative with unknown substrate specificity). Structural classification of their catalytic domains shows that these enzymes may adopt nine different folds, the most common being the Rossmann-like fold. We also analyzed the domain architecture of these proteins and identified several new domain contexts. Interestingly, we found that the majority of MTase genes are periodically expressed during the yeast metabolic cycle. This finding, together with calculated isoelectric point, fold assignment and cellular localization, was used to develop a novel approach for predicting substrate specificity. Using this approach, we predicted the general substrates for 24 of 33 putative MTases and confirmed these predictions experimentally in both cases tested. Finally, we show that, in S. cerevisiae, methylation is carried out by 34 RNA MTases, 32 protein MTases, eight small molecule MTases, three lipid MTases, and nine MTases with still unknown substrate specificity.
PMCID: PMC3153492  PMID: 21858014
5.  Distant homologs of anti-apoptotic factor HAX1 encode parvalbumin-like calcium binding proteins 
BMC Research Notes  2010;3:197.
Apoptosis is a highly ordered and orchestrated multiphase process controlled by numerous cellular and extra-cellular signals, which executes programmed cell death via release of cytochrome c, alterations in calcium signaling, caspase-dependent limited proteolysis and DNA fragmentation. Besides the general modifiers of apoptosis, several tissue-specific regulators of this process have been identified, including HAX1 (HS-1 associated protein X-1), an anti-apoptotic factor active in myeloid cells. Although HAX1 has been the subject of various experimental studies, the mechanisms of its action and its functional link to the regulation of apoptosis still remain highly speculative.
Here we provide data suggesting that HAX1 may act as a regulator or as a sensor of calcium. On the basis of iterative similarity searches, we identified a set of distant homologs of HAX1 in insects. The applied fold recognition protocol gives strong evidence that the distant insect homologs of HAX1 are novel parvalbumin-like calcium binding proteins. Although the complete three EF-hand fold is not preserved in vertebrates, our analysis suggests that a potential single EF-hand calcium binding site exists in HAX1. The molecular mechanism of its action remains to be identified, but the hypothesis raised here is consistent with previously reported data on HAX1 biology and provides a direct link to the regulation of apoptosis. Moreover, we also report that another family of myeloid-specific apoptosis regulators, the myeloid leukemia factors (MLF1, MLF2), shares the homologous C-terminal domain and taxonomic distribution with HAX1.
The structural and active-site analyses performed gave new insights into the mechanisms of the HAX1 and MLF families in the apoptosis process and suggested a possible role of HAX1 in calcium binding; however, the analyses require further experimental verification.
PMCID: PMC2914655  PMID: 20633251
6.  Species Used for Drug Testing Reveal Different Inhibition Susceptibility for 17beta-Hydroxysteroid Dehydrogenase Type 1 
PLoS ONE  2010;5(6):e10969.
Steroid-related cancers can be treated by inhibitors of steroid metabolism. In searching for new inhibitors of human 17beta-hydroxysteroid dehydrogenase type 1 (17β-HSD 1) for the treatment of breast cancer or endometriosis, novel substances based on 15-substituted estrone were validated. We checked the specificity for different 17β-HSD types and species. Compounds were tested for specificity in vitro not only towards recombinant human 17β-HSD types 1, 2, 4, 5 and 7 but also against 17β-HSD 1 of several other species including marmoset, pig, mouse, and rat. The latter are used in the processes of pharmacophore screening. We present the quantification of inhibitor preferences between human and animal models. Profound differences in the susceptibility to inhibition of steroid conversion among all 17β-HSDs analyzed were observed. Especially, the rodent 17β-HSDs 1 were significantly less sensitive to inhibition compared to the human ortholog, while the most similar inhibition pattern to the human 17β-HSD 1 was obtained with the marmoset enzyme. Molecular docking experiments predicted estrone as the most potent inhibitor. The best performing compound in enzymatic assays was also highly ranked by docking scoring for the human enzyme. However, species-specific prediction of inhibitor performance by molecular docking was not possible. We show that experiments with good candidate compounds would out-select them in the rodent model during preclinical optimization steps. Potentially active human-relevant drugs, therefore, would no longer be further developed. Activity and efficacy screens in heterologous species systems must be evaluated with caution.
PMCID: PMC2882332  PMID: 20544026
7.  ELM: the status of the 2010 eukaryotic linear motif resource 
Nucleic Acids Research  2009;38(Database issue):D167-D180.
Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a ‘Bar Code’ format, which also displays known instances from homologous proteins through a novel ‘Instance Mapper’ protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation.
PMCID: PMC2808914  PMID: 19920119
8.  Comprehensive classification of nucleotidyltransferase fold proteins: identification of novel families and their representatives in human 
Nucleic Acids Research  2009;37(22):7701-7714.
This article presents a comprehensive review of the large and highly diverse superfamily of nucleotidyltransferase fold proteins by providing a global picture of their evolutionary history, sequence-structure diversity and functional roles. Using a state-of-the-art homology detection method combined with transitive searches and fold recognition, we revised the realm of this superfamily in numerous databases of catalogued protein families and structures, and identified 10 new families of the nucleotidyltransferase fold. These families include hundreds of previously uncharacterized and various poorly annotated proteins such as Fukutin/LICD, NFAT, FAM46, Mab-21 and NRAP. Some of these proteins seem to play novel important roles, not observed before for this superfamily, such as regulation of gene expression or choline incorporation into the cell membrane. Importantly, within the newly detected families we identified 25 novel superfamily members in the human genome. Among these newly assigned members are proteins known to be involved in congenital muscular dystrophy, neurological diseases and retinitis pigmentosa, which sheds some new light on the molecular background of these genetic disorders. Twelve of the new human nucleotidyltransferase fold proteins belong to the Mab-21 family, known to be involved in organogenesis and development. The determination of specific biological functions of these newly detected proteins remains a challenging task.
PMCID: PMC2794190  PMID: 19833706
9.  Molecular determinants archetypical to the phylum Nematoda 
BMC Genomics  2009;10:114.
Nematoda diverged from other animals between 600 and 1,200 million years ago and has become one of the most diverse animal phyla on earth. Most nematodes are free-living animals, but many are parasites of plants and animals including humans, posing major ecological and economic challenges around the world.
We investigated phylum-specific molecular characteristics in Nematoda by exploring over 214,000 polypeptides from 32 nematode species including 27 parasites. Over 50,000 nematode protein families were identified based on primary sequence, including ~10% with members from at least three different species. Nearly 1,600 of the multi-species families did not share homology to Pfam domains, including a total of 758 restricted to Nematoda. The majority of the 462 families that were conserved among both free-living and parasitic species contained members from multiple nematode clades, yet ~90% of the 296 parasite-specific families originated from only a single clade. Features of these protein families were revealed through extrapolation of essential functions from observed RNAi phenotypes in C. elegans, bioinformatics-based functional annotations, identification of distant homology based on protein folds, and prediction of expression at accessible nematode surfaces. In addition, we identified a group of nematode-restricted sequence features in energy-generating electron transfer complexes as potential targets for new chemicals with minimal or no toxicity to the host.
This study identified and characterized the molecular determinants that help in defining the phylum Nematoda, and therefore improved our understanding of nematode protein evolution and provided novel insights for the development of next generation parasite control strategies.
PMCID: PMC2666764  PMID: 19296854
10.  3D-Fun: predicting enzyme function from structure 
Nucleic Acids Research  2008;36(Web Server issue):W303-W307.
The ‘omics’ revolution is producing a flood of data, all of which needs to be annotated to become useful. Sequences of proteins of unknown function can be annotated with a putative function by comparing them with proteins of known function. This form of annotation is typically performed with BLAST or similar software. Structural genomics is now also producing three-dimensional structures of proteins with unknown function. We present here software that can be used when sequence comparisons fail to determine the function of a protein with known structure but unknown function. The software, called 3D-Fun, is implemented as a server that runs at several European institutes and is freely available to everybody at all these sites. The 3D-Fun servers accept protein coordinates in the standard PDB format and compare them with all known protein structures by 3D structural superposition using the 3D-Hit software. If structural hits are found with proteins with known function, these are listed together with their function and some vital comparison statistics. This is conceptually very similar in 3D to what BLAST does in 1D. Additionally, the superposition results are displayed using interactive graphics facilities. Currently, the 3D-Fun system only predicts enzyme function but an expanded version with Gene Ontology predictions will be available soon.
PMCID: PMC2447717  PMID: 18515349
11.  Evaluation of 3D-Jury on CASP7 models 
BMC Bioinformatics  2007;8:304.
3D-Jury, the structure prediction consensus method publicly available in the Meta Server, was evaluated using models gathered in the 7th round of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7). 3D-Jury is an automated expert process that generates protein structure meta-predictions from sets of models obtained from partner servers.
The performance of 3D-Jury was analysed for three aspects. First, we examined the correlation between the 3D-Jury score and a model quality measure: the number of correctly predicted residues. The 3D-Jury score was shown to correlate significantly with the number of correctly predicted residues; the correlation is good enough to be used for prediction. 3D-Jury was also found to improve upon the competing servers' choice of the best structure model in most cases. The value of the 3D-Jury score as a generic reliability measure was also examined. We found that the 3D-Jury score separates bad models from good models better than the reliability score of the original server in 27 cases and falls short of it in only 5 cases out of a total of 38. We report the release of a new Meta Server feature: instant 3D-Jury scoring of uploaded user models.
The 3D-Jury score continues to be a good indicator of structural model quality. It also provides a generic reliability score, especially important for models that were not assigned such by the original server. Individual structure modellers can also benefit from the 3D-Jury scoring system by testing their models in the new instant scoring feature available in the Meta Server.
PMCID: PMC2040163  PMID: 17711571
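The idea of scoring a model by its agreement with models from other servers can be illustrated with a minimal Python sketch. This is a simplified consensus scheme written for illustration only and is not the published 3D-Jury formula; the function names and the pairwise similarity values (for example, counts of superimposable residues between two models) are hypothetical.

```python
def consensus_scores(similarity):
    """Score each model by its mean similarity to all other models.

    `similarity` is a square matrix (list of lists); similarity[i][j] is a
    hypothetical pairwise similarity between model i and model j.
    """
    n = len(similarity)
    scores = []
    for i in range(n):
        others = [similarity[i][j] for j in range(n) if j != i]
        # A model with no partners gets a score of 0.0.
        scores.append(sum(others) / len(others) if others else 0.0)
    return scores


def pick_consensus(similarity):
    """Return (index of the highest-scoring model, list of all scores)."""
    scores = consensus_scores(similarity)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Three toy models; model 0 agrees best with the other two.
    toy = [
        [0, 50, 40],
        [50, 0, 10],
        [40, 10, 0],
    ]
    print(pick_consensus(toy))
```

In this toy setting the model that most resembles the others is selected, which mirrors the intuition that recurring structural features across independent servers are more likely to be correct.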
12.  Realm of PD-(D/E)XK nuclease superfamily revisited: detection of novel families with modified transitive meta profile searches 
PD-(D/E)XK nucleases constitute a large and highly diverse superfamily of enzymes that display little sequence similarity despite retaining a common core fold and a few critical active site residues. This makes identification of new PD-(D/E)XK nuclease families a challenging task, as they usually escape detection with standard sequence-based methods. We developed a modified transitive meta profile search approach. To consider the structural diversity of the PD-(D/E)XK nuclease fold more thoroughly, we also analyzed below-threshold Meta-BASIC hits to select potentially correct predictions placed among unreliable or incorrect ones.
Application of modified transitive Meta-BASIC searches to updated PFAM families and PDB structures resulted in detection of five new PD-(D/E)XK nuclease families encompassing hundreds of so far uncharacterized and poorly annotated proteins. These include four families catalogued in the PFAM database as domains of unknown function (DUF506, DUF524, DUF1626 and DUF1703) and the YhgA-like family of putative transposases. Three of these families represent extremely distant homologs (DUF506, DUF524, and YhgA-like), while two are newly defined in the updated database (DUF1626 and DUF1703). In addition, we also confidently identified an extended AAA-ATPase domain in the N-terminal region of DUF1703 family proteins.
The results obtained suggest that detailed analysis of below-threshold Meta-BASIC hits may push the limits of distant homology detection further into the 'midnight zone' of homology. All identified families conserve the core evolutionary fold, secondary structure and hydrophobic patterns common to existing PD-(D/E)XK nucleases and maintain critical active site motifs that contribute to nucleic acid cleavage. Further experimental investigations should address the predicted activity and clarify potential substrates, providing further insight into the detailed biological role of these newly detected nucleases.
PMCID: PMC1913061  PMID: 17584917
13.  Human Herpesvirus 1 UL24 Gene Encodes a Potential PD-(D/E)XK Endonuclease 
Journal of Virology  2006;80(5):2575-2577.
Using Meta-BASIC, a highly sensitive method for detection of distant similarity between proteins, we have identified another potential PD-(D/E)XK endonuclease in human herpesvirus 1 (HHV-1) encoded by the UL24 gene. The universal presence of UL24 in completed herpesviral genomes of three major subfamilies, Alphaherpesvirinae, Betaherpesvirinae, and Gammaherpesvirinae, suggests a fundamental role for this predicted PD-(D/E)XK endonuclease activity in the viral life cycle.
PMCID: PMC1395385  PMID: 16474163
14.  PDB-UF: database of predicted enzymatic functions for unannotated protein structures from structural genomics 
BMC Bioinformatics  2006;7:53.
The number of protein structures from structural genomics centers in the Protein Data Bank (PDB) is increasing dramatically. Many of these structures are functionally unannotated because they have no sequence similarity to proteins of known function. However, it is possible to successfully infer function using only structural similarity.
Here we present the PDB-UF database, a web-accessible collection of predictions of enzymatic properties based on structure-function relationships. The assignments were conducted for three-dimensional protein structures of unknown function that come from structural genomics initiatives. We show that 4 hypothetical proteins (with PDB accession codes: 1VH0, 1NS5, 1O6D, and 1TO0), for which standard BLAST tools such as PSI-BLAST or RPS-BLAST failed to assign any function, are probably methyltransferase enzymes.
We suggest that the structure-based prediction of an EC number should use different similarity score cutoffs for different protein folds. Moreover, performing the annotation using two different algorithms can reduce the rate of false positive assignments. We believe that the presented web-based repository will help to decrease the number of protein structures that have functions marked as "unknown" in the PDB file.
PMCID: PMC1409798  PMID: 16460560
15.  FFAS03: a server for profile–profile sequence alignments 
Nucleic Acids Research  2005;33(Web Server issue):W284-W288.
The FFAS03 server provides a web interface to the third generation of the profile–profile alignment and fold-recognition algorithm of fold and function assignment system (FFAS) [L. Rychlewski, L. Jaroszewski, W. Li and A. Godzik (2000), Protein Sci., 9, 232–241]. Profile–profile algorithms use information present in sequences of homologous proteins to amplify the patterns defining the family. As a result, they enable detection of remote homologies beyond the reach of other methods. FFAS, initially developed in 2000, is consistently one of the best ranked fold prediction methods in the CAFASP and LiveBench competitions. It is also used by several fold-recognition consensus methods and meta-servers. The FFAS03 server accepts a user supplied protein sequence and automatically generates a profile, which is then compared with several sets of sequence profiles of proteins from PDB, COG, PFAM and SCOP. The profile databases used by the server are automatically updated with the latest structural and sequence information. The server provides access to the alignment analysis, multiple alignment, and comparative modeling tools. Access to the server is open for both academic and commercial researchers.
PMCID: PMC1160179  PMID: 15980471
16.  Identification of novel restriction endonuclease-like fold families among hypothetical proteins 
Nucleic Acids Research  2005;33(11):3598-3605.
Restriction endonucleases and other nucleic acid cleaving enzymes form a large and extremely diverse superfamily that displays little sequence similarity despite retaining a common core fold responsible for cleavage. The lack of significant sequence similarity between protein families makes homology inference a challenging task and hinders new family identification with traditional sequence-based approaches. Using the consensus fold recognition method Meta-BASIC that combines sequence profiles with predicted protein secondary structure, we identify nine new restriction endonuclease-like fold families among previously uncharacterized proteins and predict these proteins to cleave nucleic acid substrates. Application of transitive searches combined with gene neighborhood analysis allows us to confidently link these unknown families to a number of known restriction endonuclease-like structures and thus assign folds to the uncharacterized proteins. Finally, our method identifies a novel restriction endonuclease-like domain in the C-terminus of RecC that is not detected with structure-based searches of the existing PDB database.
PMCID: PMC1157100  PMID: 15972856
17.  Practical lessons from protein structure prediction 
Nucleic Acids Research  2005;33(6):1874-1891.
Despite recent efforts to develop automated protein structure determination protocols, structural genomics projects are slow in generating fold assignments for complete proteomes, and spatial structures remain unknown for many protein families. Alternative cheap and fast methods to assign folds using prediction algorithms continue to provide valuable structural information for many proteins. The development of high-quality prediction methods has been boosted in recent years by objective community-wide assessment experiments. This paper gives an overview of the currently available practical approaches to protein structure prediction capable of generating accurate fold assignments. Recent advances in assessment of the prediction quality are also discussed.
PMCID: PMC1074308  PMID: 15805122
18.  Integrated web service for improving alignment quality based on segments comparison 
BMC Bioinformatics  2004;5:98.
Defining the blocks that form the global protein structure on the basis of local structural regularity is a very fruitful idea, extensively used in the description and prediction of structure from sequence information alone. For many years secondary structure elements have been used as building blocks with great success. Specially prepared sets of possible structural motifs can be used to describe similarity between very distant, non-homologous proteins. The reason for utilizing structural information in the description of proteins is straightforward: structural comparison is able to detect approximately twice as many distant relationships as sequence comparison at the same error rate.
Here we provide a new fragment library for Local Structure Segment (LSS) prediction called FRAGlib, which is integrated with a previously described segment alignment algorithm, SEA. A joint FRAGlib/SEA server provides easy access to both algorithms, allowing a one-stop alignment service using a novel approach to protein sequence alignment based on network matching. FRAGlib used as a secondary structure predictor achieves only 73% accuracy in the Q3 measure, but when combined with the SEA alignment, it achieves a significant improvement in pairwise sequence alignment quality compared to the previous SEA implementation and other public alignment algorithms. The FRAGlib algorithm takes ~2 min to search the FRAGlib database for a typical query protein of 500 residues. The SEA service aligns two typical proteins in ~5 min. All supplementary materials (detailed results of all the benchmarks, the list of test proteins and the whole fragments library) are available for download on-line.
The joint FRAGlib/SEA server will be a valuable tool both for molecular biologists working on protein sequence analysis and for bioinformaticians developing computational methods of structure prediction and alignment of proteins.
PMCID: PMC497040  PMID: 15271224
Library of protein motifs; Profile-profile sequence similarity (BLAST; FFAS); Fragments library (FRAGlib); Predicted Local Structure Segments (PLSSs); Segment Alignment (SEA); Network matching problem
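The Q3 measure quoted in this abstract is the percentage of residues whose three-state secondary structure (helix, strand, coil) is predicted correctly. A minimal Python sketch follows; the function name and the H/E/C string encoding are assumptions made for illustration and are not taken from the FRAGlib/SEA server.

```python
def q3(predicted, observed):
    """Return the Q3 accuracy (in percent) of a three-state prediction.

    Both arguments are equal-length strings over the states
    H (helix), E (strand) and C (coil), one character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Count positions where the predicted state matches the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Two of ten residues are mispredicted in this toy example.
    print(q3("HHHHEEECCC", "HHHCEEECCH"))
```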
19.  Detecting distant homology with Meta-BASIC 
Nucleic Acids Research  2004;32(Web Server issue):W576-W581.
Meta-BASIC is a novel sensitive approach for recognition of distant similarity between proteins based on consensus alignments of meta profiles. Specifically, Meta-BASIC compares sequence profiles combined with predicted secondary structure by utilizing several scoring systems and alignment algorithms. In our benchmarking tests, Meta-BASIC outperforms many individual servers, including fold recognition servers, and it can compete with meta predictors that base their strength on the structural comparison of models. In addition, Meta-BASIC, which enables detection of very distant relationships even if the tertiary structure for the reference protein is not known, has a high-throughput capability. This new method is applied to 860 PfamA protein families with unknown function (DUF) and provides many novel structure–functional assignments available on-line. Detailed discussion is provided for two of the most interesting assignments. DUF271 and DUF431 are predicted to be a nucleotide-diphospho-sugar transferase and an α/β-knot SAM-dependent RNA methyltransferase, respectively.
PMCID: PMC441508  PMID: 15215454
20.  ELM server: a new resource for investigating short functional sites in modular eukaryotic proteins 
Nucleic Acids Research  2003;31(13):3625-3630.
Multidomain proteins predominate in eukaryotic proteomes. Individual functions assigned to different sequence segments combine to create a complex function for the whole protein. While on-line resources are available for revealing globular domains in sequences, there has hitherto been no comprehensive collection of small functional sites/motifs comparable to the globular domain resources, yet these are as important for the function of multidomain proteins. Short linear peptide motifs are used for cell compartment targeting, protein–protein interaction, regulation by phosphorylation, acetylation, glycosylation and a host of other post-translational modifications. ELM, the Eukaryotic Linear Motif server, is a new bioinformatics resource for investigating candidate short non-globular functional motifs in eukaryotic proteins, aiming to fill the void in bioinformatics tools. Sequence comparisons with short motifs are difficult to evaluate because the usual significance assessments are inappropriate. Therefore the server is implemented with several logical filters to eliminate false positives. Current filters are for cell compartment, globular domain clash and taxonomic range. In favourable cases, the filters can reduce the number of retained matches by an order of magnitude or more.
PMCID: PMC168952  PMID: 12824381
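The filtering strategy described in this abstract (cell compartment, globular domain clash and taxonomic range) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Motif` class, the `scan` function, the toy patterns and the annotation values are hypothetical and are not actual ELM entries or the server's implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Motif:
    name: str
    pattern: str             # regular expression describing the motif
    compartments: frozenset  # compartments in which the motif is functional
    taxa: frozenset          # taxonomic groups in which the motif occurs


def scan(sequence, motifs, compartment, taxon, globular_regions):
    """Return motif matches that survive all three logical filters.

    `globular_regions` is a list of (start, end) half-open intervals that
    mark globular domains in `sequence` (0-based coordinates).
    """
    hits = []
    for motif in motifs:
        # Filter 1 and 2: the query context must fit the motif annotation.
        if compartment not in motif.compartments or taxon not in motif.taxa:
            continue
        for match in re.finditer(motif.pattern, sequence):
            start, end = match.start(), match.end()
            # Filter 3: discard matches that clash with a globular domain.
            if any(start < g_end and end > g_start
                   for g_start, g_end in globular_regions):
                continue
            hits.append((motif.name, start, end, match.group()))
    return hits


if __name__ == "__main__":
    toy_motifs = [
        Motif("toy_PxxP", r"P..P", frozenset({"cytosol"}), frozenset({"Metazoa"})),
        Motif("toy_KDEL", r"KDEL$", frozenset({"ER"}), frozenset({"Metazoa"})),
    ]
    print(scan("MAPSRPLLKDEL", toy_motifs, "ER", "Metazoa", []))
```

Without the filters both toy patterns match the example sequence; with the compartment set to "ER" only the C-terminal match is retained, which is the kind of reduction in candidate matches the abstract refers to.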
21.  Detection of reliable and unexpected protein fold predictions using 3D-Jury 
Nucleic Acids Research  2003;31(13):3291-3292.
3D-Jury is a fully automated protein structure meta prediction system accessible via the Meta Server interface (http://BioInfo.PL/Meta). It is one of the meta predictors that made a dramatic, unprecedented impact on the last CASP-5 experiment. 3D-Jury is comparable with other meta servers but has the highest combined specificity and sensitivity. The presented method is also very simple and versatile and can be used to create meta predictions even from sets of models produced by humans. An additional, important and novel feature of the system is the high correlation between the reported confidence score and the accuracy of the model. The number of correctly predicted residues can be estimated directly from the prediction score. The high reliability of the method enables any biologist to submit a target of interest to the Meta Server and determine, with relatively high confidence, whether the target can be predicted by fold recognition methods while being unpredictable using standard approaches like PSI-Blast. This can point to interesting relationships that could have been missed in annotations of proteins or genomes and provide very valuable information for novel scientific discoveries.
PMCID: PMC168910  PMID: 12824309
22.  ORFeus: detection of distant homology using sequence profiles and predicted secondary structure 
Nucleic Acids Research  2003;31(13):3804-3807.
ORFeus is a fully automated, sensitive protein sequence similarity search server available to the academic community via the Structure Prediction Meta Server (http://BioInfo.PL/Meta/). The goal of the development of ORFeus was to increase the sensitivity of the detection of distantly related protein families. Predicted secondary structure information was added to the information about sequence conservation and variability, a technique known from hybrid threading approaches. The accuracy of the meta profiles created this way is compared with profiles containing only sequence information and with the standard approach of aligning a single sequence with a profile. Additionally, the alignment of meta profiles is more sensitive in detecting remote homology between protein families than aligning two sequence-only profiles or aligning a profile with a sequence. The specificity of the alignment score is improved in the lower specificity range compared with the robust sequence-only profiles.
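The notion of a meta profile, in which each alignment position carries both sequence and predicted secondary structure information, can be sketched as follows. This is only an illustration under stated assumptions: the column layout (20 amino acid frequencies followed by helix, strand and coil probabilities), the dot-product score and the `ss_weight` parameter are hypothetical and do not reproduce the actual ORFeus scoring function.

```python
N_AA = 20  # amino acid frequencies per column
N_SS = 3   # predicted helix, strand and coil probabilities per column


def meta_column(aa_freqs, ss_probs):
    """Join sequence and secondary structure vectors into one column."""
    if len(aa_freqs) != N_AA or len(ss_probs) != N_SS:
        raise ValueError("expected 20 frequencies and 3 probabilities")
    return list(aa_freqs) + list(ss_probs)


def column_score(col_a, col_b, ss_weight=1.0):
    """Score two meta profile columns (assumed dot-product scoring).

    The sequence part and the secondary structure part are scored
    separately; ss_weight sets the relative influence of the latter.
    """
    seq = sum(x * y for x, y in zip(col_a[:N_AA], col_b[:N_AA]))
    ss = sum(x * y for x, y in zip(col_a[N_AA:], col_b[N_AA:]))
    return seq + ss_weight * ss


# Toy columns with identical (uniform) sequence composition.
uniform = [0.05] * N_AA
helix_a = meta_column(uniform, [1.0, 0.0, 0.0])
helix_b = meta_column(uniform, [1.0, 0.0, 0.0])
strand = meta_column(uniform, [0.0, 1.0, 0.0])

same_ss = column_score(helix_a, helix_b)
diff_ss = column_score(helix_a, strand)
print(same_ss > diff_ss)
```

With the sequence composition held fixed, the two columns that agree in predicted secondary structure score higher than the pair that disagrees, which is the extra signal a sequence-only profile cannot provide.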
PMCID: PMC168911  PMID: 12824423
23.  RNA:(guanine-N2) methyltransferases RsmC/RsmD and their homologs revisited – bioinformatic analysis and prediction of the active site based on the uncharacterized Mj0882 protein structure 
BMC Bioinformatics  2002;3:10.
Escherichia coli guanine-N2 (m2G) methyltransferases (MTases) RsmC and RsmD modify nucleosides G1207 and G966 of 16S rRNA. They possess a common MTase domain in the C-terminus and a variable region in the N-terminus. Their C-terminal domain is related to the YbiN family of hypothetical MTases, but nothing is known about the structure or function of the N-terminal domain.
Using a combination of sequence database searches and fold recognition methods it has been demonstrated that the N-termini of RsmC and RsmD are related to each other and that they represent a "degenerated" version of the C-terminal MTase domain. Novel members of the YbiN family from Archaea and Eukaryota were also identified. It is inferred that YbiN and both domains of RsmC and RsmD are closely related to a family of putative MTases from Gram-positive bacteria and Archaea, typified by the Mj0882 protein from M. jannaschii (1dus in PDB). Based on the results of sequence analysis and structure prediction, the residues involved in cofactor binding, target recognition and catalysis were identified, and the mechanism of the guanine-N2 methyltransfer reaction was proposed.
Using the known Mj0882 structure, a comprehensive analysis of sequence-structure-function relationships in the family of genuine and putative m2G MTases was performed. The results provide novel insight into the mechanism of m2G methylation and will serve as a platform for experimental analysis of numerous uncharacterized N-MTases.
PMCID: PMC102759  PMID: 11929612
24.  Reassignment of specificities of two cap methyltransferase domains in the reovirus lambda2 protein 
Genome Biology  2001;2(9):research0038.1-research0038.6.
The reovirus λ2 protein catalyzes mRNA capping, that is, addition of a guanosine to the 5' end of each transcript in a 5'-to-5' orientation, as well as transfer of a methyl group from S-adenosyl-L-methionine (AdoMet) to the N7 atom of the added guanosyl moiety and subsequently to the ribose 2'-O atom of the first template-encoded nucleotide. The structure of the human reovirus core has been solved at 3.6 Å resolution, revealing a series of domains that include a putative guanylyltransferase domain and two putative methyltransferase (MTase) domains. It has been suggested that the order of domains in the λ2 protein corresponds to the order of reactions in the pathway and that the m7G (cap 0) and the 2'-O-ribose (cap 1) MTase activities may be exerted by the MTase 1 and the MTase 2 domains, respectively.
We show that the reovirus MTase 1 domain shares a putative active site with the structurally characterized 2'-O-ribose MTases, including vaccinia virus cap 1 MTase, whereas the MTase 2 domain is structurally similar to glycine N-MTase.
On the basis of our analysis of the structural details we propose that the previously suggested functional assignments of the MTase 1 and MTase 2 domains should be swapped.
PMCID: PMC56899  PMID: 11574057
25.  A study of quality measures for protein threading models 
BMC Bioinformatics  2001;2:5.
Prediction of protein structures is one of the fundamental challenges in biology today. To fully understand how well different prediction methods perform, it is necessary to use measures that evaluate their performance. Every two years, starting in 1994, the CASP (Critical Assessment of protein Structure Prediction) process has been organized to evaluate the ability of different predictors to blindly predict the structure of proteins. To capture different features of the models, several measures have been developed during the CASP processes. However, these measures have not been examined in detail before. In an attempt to develop fully automatic measures that can be used in CASP, as well as in other types of benchmarking experiments, we have compared twenty-one measures. These include the measures used in CASP3 and CASP2 as well as measures introduced later. We have studied their ability to distinguish between the better and worse models submitted to CASP3 and the correlation between them.
Using a small set of 1340 models for 23 different targets we show that most methods correlate with each other. Most pairs of measures show a correlation coefficient of about 0.5. The correlation is slightly higher for measures of similar types. We found that a significant problem when developing automatic measures is how to deal with proteins of different length. Comparison between different measures is also complicated, as many measures depend on the size of the target. We show that the manual assessment can be reproduced to about 70% using automatic measures. Alignment-independent measures detect slightly more of the models with the correct fold, while alignment-dependent measures agree better when selecting the best models for each target. Finally we show that using automatic measures would, to a large extent, reproduce the assessors' ranking of the predictors at CASP3.
We show that, given a sufficient number of targets, the manual and automatic measures would have given almost identical results at CASP3. If the intent is to reproduce the type of scoring done by the manual assessor in CASP3, the best approach might be to use a combination of alignment-independent and alignment-dependent measures, as used in several recent studies.
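A pairwise comparison of quality measures of the kind reported above can be sketched in a few lines. The sketch assumes that the Pearson coefficient is the correlation in question, which the abstract does not state; the function names, the measure names and the scores are made up for illustration and are not data from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(scores):
    """Correlate every pair of measures.

    scores maps a measure name to a list of scores, one per model,
    with the models in the same order for every measure.
    """
    return {(a, b): pearson(scores[a], scores[b])
            for a, b in combinations(sorted(scores), 2)}


# Made-up scores for four models under three hypothetical measures.
toy_scores = {
    "measure_1": [1.0, 2.0, 3.0, 4.0],
    "measure_2": [2.0, 4.0, 6.0, 8.0],
    "measure_3": [4.0, 3.0, 2.0, 1.0],
}
correlations = pairwise_correlations(toy_scores)
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

Measures for which lower values indicate better models (distance-based ones, for example) will correlate negatively with measures for which higher is better, so the sign has to be interpreted with the direction of each measure in mind.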
PMCID: PMC55330  PMID: 11545673

