1.  CCMpred—fast and precise prediction of protein residue–residue contacts from correlated mutations 
Bioinformatics  2014;30(21):3128-3130.
Motivation: Recent breakthroughs in protein residue–residue contact prediction have made reliable de novo prediction of protein structures possible. The key was to apply statistical methods that can distinguish direct couplings between pairs of columns in a multiple sequence alignment from merely correlated pairs, i.e. to separate direct from indirect effects. Two classes of such methods exist, either relying on regularized inversion of the covariance matrix or on pseudo-likelihood maximization (PLM). Although PLM-based methods offer clearly higher precision, available tools are not sufficiently optimized and are written in interpreted languages that introduce additional overheads. This impedes the runtime and large-scale contact prediction for larger protein families, multi-domain proteins and protein–protein interactions.
Results: Here we introduce CCMpred, our performance-optimized PLM implementation in C and CUDA C. Using graphics cards in the price range of current six-core processors, CCMpred can predict contacts for typical alignments 35–113 times faster and with the same precision as the most accurate published methods. For users without a CUDA-capable graphics card, CCMpred can also run in a CPU mode that is still 4–14 times faster. Thanks to our speed-ups, contacts for typical protein families can be predicted in 15–60 s on a consumer-grade GPU and 1–6 min on a six-core CPU.
Availability and implementation: CCMpred is free and open-source software under the GNU Affero General Public License v3 (or later) available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC4201158  PMID: 25064567
2.  RECQL5 Controls Transcript Elongation and Suppresses Genome Instability Associated with Transcription Stress 
Cell  2014;157(5):1037-1049.
RECQL5 is the sole member of the RECQ family of helicases associated with RNA polymerase II (RNAPII). We now show that RECQL5 is a general elongation factor that is important for preserving genome stability during transcription. Depletion or overexpression of RECQL5 results in corresponding shifts in the genome-wide RNAPII density profile. Elongation is particularly affected, with RECQL5 depletion causing a striking increase in the average rate, concurrent with increased stalling, pausing, arrest, and/or backtracking (transcription stress). RECQL5 therefore controls the movement of RNAPII across genes. Loss of RECQL5 also results in the loss or gain of genomic regions, with the breakpoints of lost regions located in genes and common fragile sites. The chromosomal breakpoints overlap with areas of elevated transcription stress, suggesting that RECQL5 suppresses such stress and its detrimental effects, and thereby prevents genome instability in the transcribed region of genes.
Graphical Abstract
• RECQL5 is a general RNAPII elongation factor
• RECQL5 reduces the elongation rate while decreasing pausing and arrest events
• Loss of RECQL5 results in genome instability in genes and at common fragile sites
• Incidents of genome instability colocalize with pause and arrest events
Rapid elongation by RNA polymerase II leads to increased transcriptional stress including a high incidence of pausing and arrests, which correlates with sites of genomic instability. RECQL5 modulates the rate of transcription, mitigating both the stress and instability effects.
PMCID: PMC4032574  PMID: 24836610
3.  In Vivo Ligands of MDA5 and RIG-I in Measles Virus-Infected Cells 
PLoS Pathogens  2014;10(4):e1004081.
RIG-I-like receptors (RLRs: RIG-I, MDA5 and LGP2) play a major role in the innate immune response against viral infections and detect patterns on viral RNA molecules that are typically absent from host RNA. Upon RNA binding, RLRs trigger a complex downstream signaling cascade resulting in the expression of type I interferons and proinflammatory cytokines. In the past decade extensive efforts were made to elucidate the nature of putative RLR ligands. In vitro and transfection studies identified 5′-triphosphate containing blunt-ended double-strand RNAs as potent RIG-I inducers and these findings were confirmed by next-generation sequencing of RIG-I associated RNAs from virus-infected cells. The nature of RNA ligands of MDA5 is less clear. Several studies suggest that double-stranded RNAs are the preferred agonists for the protein. However, the exact nature of physiological MDA5 ligands from virus-infected cells needs to be elucidated. In this work, we combine a crosslinking technique with next-generation sequencing in order to shed light on MDA5-associated RNAs from human cells infected with measles virus. Our findings suggest that RIG-I and MDA5 associate with AU-rich RNA species originating from the mRNA of the measles virus L gene. Corresponding sequences are poorer activators of ATP-hydrolysis by MDA5 in vitro, suggesting that they result in more stable MDA5 filaments. These data provide a possible model of how AU-rich sequences could activate type I interferon signaling.
Author Summary
RIG-I-like receptors (RLRs) are helicase-like molecules that detect cytosolic RNAs that are absent in the non-infected host. Upon binding to specific RNA patterns, RLRs elicit a signaling cascade that leads to host defense via the production of antiviral molecules. To understand how RLRs sense RNA, it is important to characterize the nature and origin of RLR-associated RNA from virus-infected cells. While it is well established that RIG-I binds 5′-triphosphate containing double-stranded RNA, the in vivo occurring ligand for MDA5 is poorly characterized. A major challenge in examining MDA5 agonists is the apparently transient interaction between the protein and its ligand. To improve the stability of interaction, we have used an approach to crosslink MDA5 to RNA in measles virus-infected cells. The virus-infected cells were treated with the photoactivatable nucleoside analog 4-thiouridine, which is incorporated in newly synthesized RNA. Upon 365 nm UV light exposure of living cells, a covalent linkage between the labeled RNA and the receptor protein is induced, resulting in a higher RNA recovery from RLR immunoprecipitates. Based on next generation sequencing, bioinformatics and in vitro approaches, we observed a correlation between the AU-composition of viral RNA and its ability to induce an MDA5-dependent immune response.
PMCID: PMC3990713  PMID: 24743923
4.  Recruitment of TREX to the Transcription Machinery by Its Direct Binding to the Phospho-CTD of RNA Polymerase II 
PLoS Genetics  2013;9(11):e1003914.
Messenger RNA (mRNA) synthesis and export are tightly linked, but the molecular mechanisms of this coupling are largely unknown. In Saccharomyces cerevisiae, the conserved TREX complex couples transcription to mRNA export and mediates mRNP formation. Here, we show that TREX is recruited to the transcription machinery by direct interaction of its subcomplex THO with the serine 2-serine 5 (S2/S5) diphosphorylated CTD of RNA polymerase II. S2 and/or tyrosine 1 (Y1) phosphorylation of the CTD is required for TREX occupancy in vivo, establishing a second interaction platform necessary for TREX recruitment in addition to RNA. Genome-wide analyses show that the occupancy of THO and the TREX components Sub2 and Yra1 increases from the 5′ to the 3′ end of the gene in accordance with the CTD S2 phosphorylation pattern. Importantly, in a mutant strain, in which TREX is recruited to genes but does not increase towards the 3′ end, the expression of long transcripts is specifically impaired. Thus, we show for the first time that a 5′-3′ increase of a protein complex is essential for correct expression of the genome. In summary, we provide insight into how the phospho-code of the CTD directs mRNP formation and export through TREX recruitment.
Author Summary
Gene expression is a fundamental cellular process that translates the information stored in the DNA into proteins, the workhorses of the cell. Eukaryotic cells contain a nucleus, where the genetic information is stored and transcribed by RNA polymerase II into messenger (m)RNAs. These copies of the blueprint of life need to be exported to the cytoplasm for protein production. Interestingly, mRNA synthesis is coupled to nuclear mRNA export. The protein complex TREX mediates this coupling of transcription to mRNA export. To assess the recruitment mechanism of TREX to genes we analyzed the presence of TREX over the whole genome in budding yeast. We found that there is more TREX at the end than at the beginning of genes. TREX binds to a subunit of RNA polymerase II, phosphorylation of which increases over the gene mediating the increase in TREX. Importantly, this increase in TREX over genes is important for normal levels of long transcripts. Thus, we show for the first time that a gradual increase of a protein complex is important for correct expression of the genome. We propose that TREX functions to keep the mRNA in the vicinity of the transcription machinery for correct processing and mRNP formation.
PMCID: PMC3828145  PMID: 24244187
5.  kClust: fast and sensitive clustering of large protein sequence databases 
BMC Bioinformatics  2013;14:248.
Fueled by rapid progress in high-throughput sequencing, the size of public sequence databases doubles every two years. Searching the ever larger and more redundant databases is getting increasingly inefficient. Clustering can help to organize sequences into homologous and functionally similar groups and can improve the speed, sensitivity, and readability of homology searches. However, because the clustering time is quadratic in the number of sequences, standard sequence search methods are becoming impracticable.
Here we present a method to cluster large protein sequence databases such as UniProt within days down to 20%–30% maximum pairwise sequence identity. kClust owes its speed and sensitivity to an alignment-free prefilter that calculates the cumulative score of all similar 6-mers between pairs of sequences, and to a dynamic programming algorithm that operates on pairs of similar 4-mers. To increase sensitivity further, kClust can run in profile-sequence comparison mode, with profiles computed from the clusters of a previous kClust iteration. kClust is two to three orders of magnitude faster than clustering based on NCBI BLAST, and on multidomain sequences of 20%–30% maximum pairwise sequence identity it achieves comparable sensitivity and a lower false discovery rate. It also compares favorably to CD-HIT and UCLUST in terms of false discovery rate, sensitivity, and speed.
kClust fills the need for a fast, sensitive, and accurate tool to cluster large protein sequence databases to below 30% sequence identity. kClust is freely available under GPL at
PMCID: PMC3843501  PMID: 23945046
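The alignment-free prefilter idea in the kClust abstract above can be illustrated with a small sketch. This is not kClust's implementation: it makes the simplifying assumption that only exact shared k-mers are counted, whereas the abstract describes a cumulative score over similar 6-mers. The function names (`kmers`, `build_index`, `prefilter`), the toy sequences, and the `min_shared` threshold are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(seqs, k):
    """Map each k-mer to the ids of the sequences that contain it."""
    index = defaultdict(set)
    for sid, seq in seqs.items():
        for km in kmers(seq, k):
            index[km].add(sid)
    return index


def prefilter(query, index, k, min_shared):
    """Count distinct k-mers shared with the query; keep ids reaching min_shared.

    Simplification (assumption): exact k-mer identity stands in for the
    similarity-scored k-mer matches described in the abstract.
    """
    counts = defaultdict(int)
    for km in kmers(query, k):
        for sid in index.get(km, ()):
            counts[sid] += 1
    return {sid: n for sid, n in counts.items() if n >= min_shared}


# Toy database: "a" and "b" share a 10-residue prefix with the query, "c" does not.
seqs = {"a": "MKVLAAGIVGLLLAQ", "b": "MKVLAAGIVGWWWWW", "c": "PPPPPPPPPPPPPPP"}
index = build_index(seqs, k=6)
hits = prefilter("MKVLAAGIVG", index, k=6, min_shared=3)
```

Because candidate pairs are found through an index lookup rather than by comparing every pair of sequences, only sequences that pass the prefilter would need to go on to a slower alignment step.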
6.  DBIRD integrates alternative mRNA splicing with RNA polymerase II transcript elongation 
Nature  2012;484(7394):386-389.
Alternative mRNA splicing is the main reason vast mammalian proteomic complexity can be achieved with a limited number of genes. Splicing is physically and functionally coupled to transcription, and is greatly affected by the rate of transcript elongation [1,2,3]. As the nascent pre-mRNA emerges from transcribing RNA polymerase II (RNAPII), it is assembled into a messenger ribonucleoprotein (mRNP) particle which is its functional form and determines the fate of the mature transcript [4]. However, factors that connect the transcribing polymerase with the mRNP particle and help integrate transcript elongation with mRNA splicing remain obscure. Here, we characterized the interactome of chromatin-associated mRNP particles. This led to the identification of Deleted in Breast Cancer 1 (DBC1) and a protein we named ZIRD as subunits of a novel protein complex, named DBIRD, which binds directly to RNAPII. DBIRD regulates alternative splicing of a large set of exons embedded in A/T-rich DNA, and is present at the affected exons. RNAi-mediated DBIRD depletion results in region-specific decreases in transcript elongation, particularly across areas encompassing affected exons. Together, these data indicate that DBIRD complex acts at the interface between mRNP particles and RNAPII, integrating transcript elongation with the regulation of alternative splicing.
PMCID: PMC3378035  PMID: 22446626
7.  The XXmotif web server for eXhaustive, weight matriX-based motif discovery in nucleotide sequences 
Nucleic Acids Research  2012;40(Web Server issue):W104-W109.
The discovery of regulatory motifs enriched in sets of DNA or RNA sequences is fundamental to the analysis of a great variety of functional genomics experiments. These motifs usually represent binding sites of proteins or non-coding RNAs, which are best described by position weight matrices (PWMs). We have recently developed XXmotif, a de novo motif discovery method that is able to directly optimize the statistical significance of PWMs. XXmotif can also score conservation and positional clustering of motifs. The XXmotif server provides (i) a list of significantly overrepresented motif PWMs with web logos and E-values; (ii) a graph with color-coded boxes indicating the positions of selected motifs in the input sequences; (iii) a histogram of the overall positional distribution for selected motifs and (iv) a page for each motif with all significant motif occurrences, their P-values for enrichment, conservation and localization, their sequence contexts and coordinates. Free access:
PMCID: PMC3394272  PMID: 22693218
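As background to the XXmotif abstract above, the sketch below shows how a position weight matrix (PWM) can be built from aligned sites and used to score sequence windows. It is a generic log-odds illustration, not XXmotif's method, which according to the abstract optimizes the statistical significance of PWMs directly. The function names, the pseudocount of 1, the uniform background of 0.25, and the example sites are assumptions chosen for illustration.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM (one dict per column) from equal-length sites."""
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return pwm


def score(pwm, window):
    """Sum the per-column log-odds of the bases in window."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_hit(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    return max((score(pwm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


# Hypothetical aligned binding sites and a sequence to scan.
sites = ["TATAAA", "TATAAA", "TATATA", "TATAAA"]
pwm = build_pwm(sites)
top_score, top_start = best_hit(pwm, "GGCGTATAAAGGC")
```

A window matching the consensus gets a positive score, while mismatches at conserved columns contribute negative terms, so the top-scoring window marks the most motif-like position.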
8.  The Mre11:Rad50 structure shows an ATP dependent molecular clamp in DNA double-strand break repair 
Cell  2011;145(1):54-66.
The MR (Mre11 nuclease and Rad50 ABC ATPase) complex is an evolutionarily conserved sensor for DNA double-strand breaks, highly genotoxic lesions linked to cancer development. MR can recognize and process DNA ends even if they are blocked and misfolded. To reveal its mechanism, we determined the crystal structure of the catalytic head of Thermotoga maritima MR and analyzed ATP-dependent conformational changes. MR adopts an open form with a central Mre11 nuclease dimer and two peripheral Rad50 molecules, a form suited for sensing obstructed breaks. The Mre11 C-terminal helix-loop-helix domain binds Rad50 and attaches flexibly to the nuclease domain, enabling large conformational changes. ATP binding to the two Rad50 subunits induces a rotation of the Mre11 helix-loop-helix and Rad50 coiled-coil domains, creating a clamp conformation with increased DNA binding activity. The results suggest that MR is an ATP-controlled transient molecular clamp at DNA double-strand breaks.
PMCID: PMC3071652  PMID: 21458667
Rad50; Mre11; DNA double-strand break repair; X-ray crystallography; protein complex; homologous recombination; ABC ATPases
9.  A Conserved GA Element in TATA-Less RNA Polymerase II Promoters 
PLoS ONE  2011;6(11):e27595.
Initiation of RNA polymerase (Pol) II transcription requires assembly of the pre-initiation complex (PIC) at the promoter. In the classical view, PIC assembly starts with binding of the TATA box-binding protein (TBP) to the TATA box. However, a TATA box occurs in only 15% of promoters in the yeast Saccharomyces cerevisiae, raising the question of how most yeast promoters nucleate PIC assembly. Here we show that one third of all yeast promoters contain a novel conserved DNA element, the GA element (GAE), that generally does not co-occur with the TATA box. The distance of the GAE to the transcription start site (TSS) resembles the distance of the TATA box to the TSS. The TATA-less TMT1 core promoter contains a GAE, recruits TBP, and supports formation of a TBP-TFIIB-DNA complex. Mutation of the promoter region surrounding the GAE abolishes transcription in vivo and in vitro. A 32-nucleotide promoter region containing the GAE can functionally substitute for the TATA box in a TATA-containing promoter. This identifies the GAE as a conserved promoter element in TATA-less promoters.
PMCID: PMC3217976  PMID: 22110682
10.  The MOF-containing NSL complex associates globally with housekeeping genes, but activates only a defined subset 
Nucleic Acids Research  2011;40(4):1509-1522.
The MOF (males absent on the first)-containing NSL (non-specific lethal) complex binds to a subset of active promoters in Drosophila melanogaster and is thought to contribute to proper gene expression. The determinants that target NSL to specific promoters and the circumstances in which the complex engages in regulating transcription are currently unknown. Here, we show that the NSL complex primarily targets active promoters and in particular housekeeping genes, at which it colocalizes with the chromatin remodeler NURF (nucleosome remodeling factor) and the histone methyltransferase Trithorax. However, only a subset of housekeeping genes associated with NSL are actually activated by it. Our analyses reveal that these NSL-activated promoters are depleted of certain insulator binding proteins and are enriched for the core promoter motif ‘Ohler 5’. Based on these results, it is possible to predict whether the NSL complex is likely to regulate a particular promoter. We conclude that the regulatory capacity of the NSL complex is highly context-dependent. Activation by the NSL complex requires a particular promoter architecture defined by combinations of chromatin regulators and core promoter motifs.
PMCID: PMC3287193  PMID: 22039099
11.  Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega 
Multiple sequence alignments are fundamental to many sequence analysis methods. The new program Clustal Omega can align virtually any number of protein sequences quickly and has powerful features for adding sequences to existing precomputed alignments.
Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.
PMCID: PMC3261699  PMID: 21988835
bioinformatics; hidden Markov models; multiple sequence alignment
12.  Different Binding Properties and Function of CXXC Zinc Finger Domains in Dnmt1 and Tet1 
PLoS ONE  2011;6(2):e16627.
Several mammalian proteins involved in chromatin and DNA modification contain CXXC zinc finger domains. We compared the structure and function of the CXXC domains in the DNA methyltransferase Dnmt1 and the methylcytosine dioxygenase Tet1. Sequence alignment showed that both CXXC domains have a very similar framework but differ in the central tip region. Based on the known structure of a similar MLL1 domain we developed homology models and designed expression constructs for the isolated CXXC domains of Dnmt1 and Tet1 accordingly. We show that the CXXC domain of Tet1 has no DNA binding activity and is dispensable for catalytic activity in vivo. In contrast, the CXXC domain of Dnmt1 selectively binds DNA substrates containing unmethylated CpG sites. Surprisingly, a Dnmt1 mutant construct lacking the CXXC domain formed covalent complexes with cytosine bases both in vitro and in vivo and rescued DNA methylation patterns in dnmt1−/− embryonic stem cells (ESCs) just as efficiently as wild type Dnmt1. Interestingly, neither wild type nor ΔCXXC Dnmt1 re-methylated imprinted CpG sites of the H19a promoter in dnmt1−/− ESCs, arguing against a role of the CXXC domain in restraining Dnmt1 methyltransferase activity on unmethylated CpG sites.
PMCID: PMC3032784  PMID: 21311766
13.  Of Bits and Bugs — On the Use of Bioinformatics and a Bacterial Crystal Structure to Solve a Eukaryotic Repeat-Protein Structure 
PLoS ONE  2010;5(10):e13402.
Pur-α is a nucleic acid-binding protein involved in cell cycle control, transcription, and neuronal function. Initially no prediction of the three-dimensional structure of Pur-α was possible. However, recently we solved the X-ray structure of Pur-α from the fruitfly Drosophila melanogaster and showed that it contains a so-called PUR domain. Here we explain how we exploited bioinformatics tools in combination with X-ray structure determination of a bacterial homolog to obtain diffracting crystals and the high-resolution structure of Drosophila Pur-α. First, we used sensitive methods for remote-homology detection to find three repetitive regions in Pur-α. We realized that our lack of understanding how these repeats interact to form a globular domain was a major problem for crystallization and structure determination. With our information on the repeat motifs we then identified a distant bacterial homolog that contains only one repeat. We determined the bacterial crystal structure and found that two of the repeats interact to form a globular domain. Based on this bacterial structure, we calculated a computational model of the eukaryotic protein. The model allowed us to design a crystallizable fragment and to determine the structure of Drosophila Pur-α. Key for success was the fact that single repeats of the bacterial protein self-assembled into a globular domain, instructing us on the number and boundaries of repeats to be included for crystallization trials with the eukaryotic protein. This study demonstrates that the simpler structural domain arrangement of a distant prokaryotic protein can guide the design of eukaryotic crystallization constructs. Since many eukaryotic proteins contain multiple repeats or repeating domains, this approach might be instructive for structural studies of a range of proteins.
PMCID: PMC2954813  PMID: 20976240
14.  HHomp—prediction and classification of outer membrane proteins 
Nucleic Acids Research  2009;37(Web Server issue):W446-W451.
Outer membrane proteins (OMPs) are the transmembrane proteins found in the outer membranes of Gram-negative bacteria, mitochondria and plastids. Most prediction methods have focused on analogous features, such as alternating hydrophobicity patterns. Here, we start from the observation that almost all β-barrel OMPs are related by common ancestry. We identify proteins as OMPs by detecting their homologous relationships to known OMPs using sequence similarity. Given an input sequence, HHomp builds a profile hidden Markov model (HMM) and compares it with an OMP database by pairwise HMM comparison, integrating OMP predictions by PROFtmb. A crucial ingredient is the OMP database, which contains profile HMMs for over 20 000 putative OMP sequences. These were collected with the exhaustive, transitive homology detection method HHsenser, starting from 23 representative OMPs in the PDB database. In a benchmark on TransportDB, HHomp detects 63.5% of the true positives before including the first false positive. This is 70% more than PROFtmb, four times more than BOMP and 10 times more than TMB-Hunt. In Escherichia coli, HHomp identifies 57 out of 59 known OMPs and correctly assigns them to their functional subgroups. HHomp can be accessed at
PMCID: PMC2703889  PMID: 19429691
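The benchmark figure quoted in the HHomp abstract above, the fraction of true positives detected before the first false positive, can be computed from a ranked hit list as in the following sketch. The function name and the toy ranking are hypothetical, and the sketch assumes hits are already sorted by decreasing score and labeled as true or false positives.

```python
def sensitivity_before_first_fp(ranked_labels, total_positives):
    """Fraction of all positives ranked above the first false positive.

    ranked_labels: booleans ordered by decreasing score (True = true positive).
    total_positives: number of true positives in the whole benchmark set.
    """
    found = 0
    for is_positive in ranked_labels:
        if not is_positive:
            break  # stop at the first false positive
        found += 1
    return found / total_positives


# Toy ranking: three true positives precede the first false positive,
# and the benchmark contains four positives in total.
example = sensitivity_before_first_fp(
    [True, True, True, False, True], total_positives=4
)
```

This measure rewards methods that rank true homologs ahead of any false hit, which is a stricter criterion than overall sensitivity at a fixed score cutoff.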
15.  PDBalert: automatic, recurrent remote homology tracking and protein structure prediction 
In recent years, methods for remote homology detection have grown steadily more sensitive and reliable. Automatic structure prediction servers relying on these methods can generate useful 3D models even below 20% sequence identity between the protein of interest and the known structure (template). When no homologs can be found in the protein structure database (PDB), the user would need to rerun the same search at regular intervals in order to make timely use of a template once it becomes available.
PDBalert is a web-based automatic system that sends an email alert as soon as a structure with homology to a protein in the user's watch list is released to the PDB database or appears among the sequences on hold. The mail contains links to the search results and to an automatically generated 3D homology model. The sequence search is performed with the same software as used by the very sensitive and reliable remote homology detection server HHpred, which is based on pairwise comparison of Hidden Markov models.
PDBalert will accelerate the information flow from the PDB database to all those who can profit from the newly released protein structures for predicting the 3D structure or function of their proteins of interest.
PMCID: PMC2605448  PMID: 19025670
16.  Phospholipid scramblases and Tubby-like proteins belong to a new superfamily of membrane tethered transcription factors 
Bioinformatics  2008;25(2):159-162.
Motivation: Phospholipid scramblases (PLSCRs) constitute a family of cytoplasmic membrane-associated proteins that were identified based upon their capacity to mediate a Ca2+-dependent bidirectional movement of phospholipids across membrane bilayers, thereby collapsing the normally asymmetric distribution of such lipids in cell membranes. The exact function and mechanism(s) of these proteins nevertheless remains obscure: data from several laboratories now suggest that in addition to their putative role in mediating transbilayer flip/flop of membrane lipids, the PLSCRs may also function to regulate diverse processes including signaling, apoptosis, cell proliferation and transcription. A major impediment to deducing the molecular details underlying the seemingly disparate biology of these proteins is the current absence of any representative molecular structures to provide guidance to the experimental investigation of their function.
Results: Here, we show that the enigmatic PLSCR family of proteins is directly related to another family of cellular proteins with a known structure. The Arabidopsis protein At5g01750 from the DUF567 family was solved by X-ray crystallography and provides the first structural model for this family. This model identifies that the presumed C-terminal transmembrane helix is buried within the core of the PLSCR structure, suggesting that palmitoylation may represent the principal membrane anchorage for these proteins. The fold of the PLSCR family is also shared by Tubby-like proteins. A search of the PDB with the HHpred server suggests a common evolutionary ancestry. Common functional features also suggest that tubby and PLSCR share a functional origin as membrane tethered transcription factors with capacity to modulate phosphoinositide-based signaling.
PMCID: PMC2639001  PMID: 19010806
17.  Expression, crystallization and preliminary X-ray crystallographic studies of the outer membrane protein OmpW from Escherichia coli  
The outer membrane protein OmpW from E. coli was overexpressed in inclusion bodies and refolded with the help of detergent. The protein has been crystallized and the crystals diffract to 3.5 Å resolution.
OmpW is an eight-stranded 21 kDa molecular-weight β-barrel protein from the outer membrane of Gram-negative bacteria. It is a major antigen in bacterial infections and has implications in antibiotic resistance and in the oxidative degradation of organic compounds. OmpW from Escherichia coli was cloned and the protein was expressed in inclusion bodies. A method for refolding and purification was developed which yields properly folded protein according to circular-dichroism measurements. The protein has been crystallized and crystals were obtained that diffracted to a resolution limit of 3.5 Å. The crystals belong to space group P422, with unit-cell parameters a = 122.5, c = 105.7 Å. A homology model of OmpW is presented based on known structures of eight-stranded β-barrels, intended for use in molecular-replacement trials.
PMCID: PMC2222561  PMID: 16582500
OmpW; membrane proteins; outer membrane; homology modelling
18.  On the origin of the histone fold 
Histones organize the genomic DNA of eukaryotes into chromatin. The four core histone subunits consist of two consecutive helix-strand-helix motifs and are interleaved into heterodimers with a unique fold. We have searched for the evolutionary origin of this fold using sequence and structure comparisons, based on the hypothesis that folded proteins evolved by combination of an ancestral set of peptides, the antecedent domain segments.
Our results suggest that an antecedent domain segment, corresponding to one helix-strand-helix motif, gave rise divergently to the N-terminal substrate recognition domain of Clp/Hsp100 proteins and to the helical part of the extended ATPase domain found in AAA+ proteins. The histone fold arose subsequently from the latter through a 3D domain-swapping event. To our knowledge, this is the first example of a genetically fixed 3D domain swap that led to the emergence of a protein family with novel properties, establishing domain swapping as a mechanism for protein evolution.
The helix-strand-helix motif common to these three folds provides support for our theory of an 'ancient peptide world' by demonstrating how an ancestral fragment can give rise to three different folds.
PMCID: PMC1847821  PMID: 17391511
19.  TPRpred: a tool for prediction of TPR-, PPR- and SEL1-like repeats from protein sequences 
BMC Bioinformatics  2007;8:2.
Solenoid repeat proteins of the Tetratrico Peptide Repeat (TPR) family are involved as scaffolds in a broad range of protein-protein interactions. Several resources are available for the prediction of TPRs; however, they often fail to detect divergent repeat units.
We have developed TPRpred, a profile-based method which uses a P-value-dependent score offset to include divergent repeat units and which exploits the tendency of repeats to occur in tandem. TPRpred detects not only TPR-like repeats, but also the related Pentatrico Peptide Repeats (PPRs) and SEL1-like repeats. The corresponding profiles were generated through iterative searches, by varying the threshold parameters for inclusion of repeat units into the profiles, and the best profiles were selected based on their performance on proteins of known structure. We benchmarked the performance of TPRpred in detecting TPR-containing proteins and in delineating the individual repeats therein, against currently available resources.
TPRpred performs significantly better in detecting divergent repeats in TPR-containing proteins, and finds more individual repeats than the existing methods. The web server is available at , and the C++ and Perl sources of TPRpred along with the profiles can be downloaded from .
PMCID: PMC1774580  PMID: 17199898
20.  HHrep: de novo protein repeat detection and the origin of TIM barrels 
Nucleic Acids Research  2006;34(Web Server issue):W137-W142.
HHrep is a web server for the de novo identification of repeats in protein sequences, which is based on the pairwise comparison of profile hidden Markov models (HMMs). Its main strength is its sensitivity, allowing it to detect highly divergent repeat units in protein sequences whose repeats could as yet only be detected from their structures. Examples include sequences with β-propeller fold, ferredoxin-like fold, double psi barrels or (βα)8 (TIM) barrels. We illustrate this with proteins from four superfamilies of TIM barrels by revealing a clear 4- and 8-fold symmetry, which we detect solely from their sequences. This symmetry might be the trace of an ancient origin through duplication of a βαβα or βα unit. HHrep can be accessed at .
PMCID: PMC1538828  PMID: 16844977
21.  The MPI Bioinformatics Toolkit for protein sequence analysis 
Nucleic Acids Research  2006;34(Web Server issue):W335-W339.
The MPI Bioinformatics Toolkit is an interactive web service which offers access to a great variety of public and in-house bioinformatics tools. They are grouped into different sections that support sequence searches, multiple alignment, secondary and tertiary structure prediction and classification. Several public tools are offered in customized versions that extend their functionality. For example, PSI-BLAST can be run against regularly updated standard databases, customized user databases or selectable sets of genomes. Another tool, Quick2D, integrates the results of various secondary structure, transmembrane and disorder prediction programs into one view. The Toolkit provides a friendly and intuitive user interface with an online help facility. As a key feature, various tools are interconnected so that the results of one tool can be forwarded to other tools. One could run PSI-BLAST, parse out a multiple alignment of selected hits and send the results to a cluster analysis tool. The Toolkit framework and the tools developed in-house will be packaged and freely available under the GNU Lesser General Public Licence (LGPL). The Toolkit can be accessed at .
PMCID: PMC1538786  PMID: 16845021
22.  HHsenser: exhaustive transitive profile search using HMM–HMM comparison 
Nucleic Acids Research  2006;34(Web Server issue):W374-W378.
HHsenser is the first server to offer exhaustive intermediate profile searches, which it combines with pairwise comparison of hidden Markov models. Starting from a single protein sequence or a multiple alignment, it can iteratively explore whole superfamilies, producing few or no false positives. The output is a multiple alignment of all detected homologs. HHsenser's sensitivity should make it a useful tool for evolutionary studies. It may also aid applications that rely on diverse multiple sequence alignments as input, such as homology-based structure and function prediction, or the determination of functional residues by conservation scoring and functional subtyping.
HHsenser can be accessed at . It has also been integrated into our structure and function prediction server HHpred () to improve predictions for near-singleton sequences.
PMCID: PMC1538784  PMID: 16845029
23.  The HHpred interactive server for protein homology detection and structure prediction 
Nucleic Acids Research  2005;33(Web Server issue):W244-W248.
HHpred is a fast server for remote protein homology detection and structure prediction and is the first to implement pairwise comparison of profile hidden Markov models (HMMs). It allows searching a wide choice of databases, such as the PDB, SCOP, Pfam, SMART, COGs and CDD. It accepts a single query sequence or a multiple alignment as input. Within only a few minutes it returns the search results in a user-friendly format similar to that of PSI-BLAST. Search options include local or global alignment and scoring secondary structure similarity. HHpred can produce pairwise query-template alignments, multiple alignments of the query with a set of templates selected from the search results, as well as 3D structural models that are calculated by the MODELLER software from these alignments. A detailed help facility is available. As a demonstration, we analyze the sequence of SpoVT, a transcriptional regulator from Bacillus subtilis. HHpred can be accessed at .
PMCID: PMC1160169  PMID: 15980461
24.  REPPER—repeats and their periodicities in fibrous proteins 
Nucleic Acids Research  2005;33(Web Server issue):W239-W243.
REPPER (REPeats and their PERiodicities) is an integrated server that detects and analyzes regions with short gapless repeats in protein sequences or alignments. It finds periodicities by Fourier Transform (FTwin) and internal similarity analysis (REPwin). FTwin assigns numerical values to amino acids that reflect certain properties, for instance hydrophobicity, and gives information on corresponding periodicities. REPwin uses self-alignments and displays repeats that reveal significant internal similarities. Both programs use a sliding window to ensure that different periodic regions within the same protein are detected independently. FTwin and REPwin are complemented by secondary structure prediction (PSIPRED) and coiled coil prediction (COILS), making the server a versatile analysis tool for sequences of fibrous proteins. REPPER is available at .
PMCID: PMC1160166  PMID: 15980460
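The FTwin idea described in the REPPER abstract (map residues to numerical property values, then look for periodicities with a Fourier transform inside a sliding window) can be illustrated with a minimal sketch. This is not REPPER's implementation. The Kyte-Doolittle hydropathy scale, the function names `periodicity_spectrum` and `sliding_window_periods`, the list of candidate periods, and the treatment of unknown residues as 0.0 are all assumptions made for illustration.

```python
import cmath

# Kyte-Doolittle hydropathy values. Assumption: REPPER may use other scales.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def periodicity_spectrum(seq, periods):
    """Return {period: power} for the hydropathy profile of seq.

    The power at each candidate period is the squared magnitude of the
    Fourier sum at frequency 1/period, normalised by sequence length.
    """
    # Unknown residue letters are mapped to 0.0 (an illustrative choice).
    values = [KD.get(aa, 0.0) for aa in seq]
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]  # remove the constant offset
    spectrum = {}
    for p in periods:
        total = sum(
            v * cmath.exp(-2j * cmath.pi * k / p)
            for k, v in enumerate(centered)
        )
        spectrum[p] = abs(total) ** 2 / len(values)
    return spectrum


def sliding_window_periods(seq, window, periods, step=1):
    """Return (start, strongest_period, power) for each window position."""
    results = []
    for start in range(0, len(seq) - window + 1, step):
        spec = periodicity_spectrum(seq[start:start + window], periods)
        best = max(spec, key=spec.get)
        results.append((start, best, spec[best]))
    return results


if __name__ == "__main__":
    # Toy sequence with one hydrophobic residue every seven positions.
    toy = "LEEEEEE" * 6
    for start, period, power in sliding_window_periods(toy, 28, [2, 3, 4, 5, 7, 11]):
        print(start, period, round(power, 2))
```

Scanning windows independently, rather than transforming the whole sequence at once, is what lets regions with different periodicities in the same protein be reported separately, as the abstract notes.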