Monitoring resistance phenotypes for Plasmodium falciparum, using in vitro growth assays, and relating findings to parasite genotype has proved particularly challenging for the study of resistance to artemisinins.
Plasmodium falciparum isolates cultured from 28 returning travellers diagnosed with malaria were assessed for sensitivity to artemisinin, artemether, dihydroartemisinin and artesunate and findings related to mutations in pfatp6 and pfmdr1.
Resistance to artemether in vitro was significantly associated with a pfatp6 haplotype encoding two amino acid substitutions (pfatp6 A623E and S769N): mean IC50 (95% CI) values were 8.2 (5.7 – 10.7) nM for A623/S769 versus 13.5 (9.8 – 17.3) nM for 623E/769N, a mean increase of 65% (p = 0.012). Increased copy number of pfmdr1 was not itself associated with increased IC50 values for artemether, but when interactions between the pfatp6 haplotype and increased copy number of pfmdr1 were examined together, a highly significant association was noted with IC50 values for artemether (mean IC50 (95% CI) values of 8.7 (5.9 – 11.6) versus 16.3 (10.7 – 21.8) nM with a mean increase of 87%; p = 0.0068). Previously described SNPs in pfmdr1 are also associated with differences in sensitivity to some artemisinins.
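The reported mean increases can be reproduced from the mean IC50 values as simple ratios. The sketch below is only an illustration: it assumes the percentages are ratios of group means, which the text does not state explicitly, and the function name `percent_increase` is hypothetical.

```python
def percent_increase(baseline_nm, comparison_nm):
    """Relative increase of comparison over baseline, in percent."""
    return (comparison_nm / baseline_nm - 1.0) * 100.0


# Mean artemether IC50 values (nM) quoted in the text above.
haplotype_only = percent_increase(8.2, 13.5)
haplotype_and_copy_number = percent_increase(8.7, 16.3)

# Rounded to whole percentages, these give 65 and 87.
print(round(haplotype_only), round(haplotype_and_copy_number))
```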
These findings were further explored in molecular modelling experiments that suggest mutations in pfatp6 are unlikely to affect differential binding of artemisinins at their proposed site, whereas there may be differences in such binding associated with mutations in pfmdr1. Implications for a hypothesis that artemisinin resistance may be exacerbated by interactions between PfATP6 and PfMDR1 and for epidemiological studies to monitor emerging resistance are discussed.
Artemisinin resistance; pfmdr1; pfatp6; Gene copy number; Malaria; Travellers; Plasmodium
Funding bodies are increasingly recognizing the need to provide graduates and researchers with access to short intensive courses in a variety of disciplines, in order both to improve the general skills base and to provide solid foundations on which researchers may build their careers. In response to the development of ‘high-throughput biology’, the need for training in the field of bioinformatics, in particular, is seeing a resurgence: it has been defined as a key priority by many institutions and research programmes and is now an important component of many grant proposals. Nevertheless, when it comes to planning and preparing to meet such training needs, tension arises between the reward structures that predominate in the scientific community, which compel individuals to publish or perish, and the time that must be devoted to the design, delivery and maintenance of high-quality training materials. Conversely, there is much relevant teaching material and training expertise available worldwide that, were it properly organized, could be exploited by anyone who needs to provide training or needs to set up a new course. To do this, however, the materials would have to be centralized in a database and clearly tagged in relation to target audiences, learning objectives, etc. Ideally, they would also be peer reviewed, and easily and efficiently accessible for downloading. Here, we present the Bioinformatics Training Network (BTN), a new enterprise that has been initiated to address these needs, and compare it with similar initiatives and collections.
Bioinformatics; training; end users; bioinformatics courses; learning bioinformatics
Linear motifs are short, evolutionarily plastic components of regulatory proteins and provide low-affinity interaction interfaces. These compact modules play central roles in mediating every aspect of the regulatory functionality of the cell. They are particularly prominent in mediating cell signaling, controlling protein turnover and directing protein localization. Given their importance, our understanding of motifs is surprisingly limited, largely as a result of the difficulty of discovery, both experimentally and computationally. The Eukaryotic Linear Motif (ELM) resource at http://elm.eu.org provides the biological community with a comprehensive database of known experimentally validated motifs, and an exploratory tool to discover putative linear motifs in user-submitted protein sequences. The current update of the ELM database comprises 1800 annotated motif instances representing 170 distinct functional classes, including approximately 500 novel instances and 24 novel classes. Several older motif class entries have also been revisited, improving annotation and adding novel instances. Furthermore, addition of full-text search capabilities, an enhanced interface and simplified batch download has improved the overall accessibility of the ELM data. The motif discovery portion of the ELM resource now incorporates conservation and structural attributes to help users discriminate biologically relevant motifs from stochastically occurring non-functional instances.
The function of proteins is often mediated by short linear segments of their amino acid sequence, called Short Linear Motifs or SLiMs, the identification of which can provide important information about a protein's function. However, the short length of the motifs and their variable degree of conservation make their identification hard, since it is difficult to correctly estimate the statistical significance of their occurrence. Consequently, only a small fraction of them have been discovered so far. We describe here an approach for the discovery of SLiMs based on their occurrence in evolutionarily unrelated proteins belonging to the same biological, signalling or metabolic pathway and give specific examples of its effectiveness in both rediscovering known motifs and in discovering novel ones. An automatic implementation of the procedure, available for download, allows significant motifs to be identified, automatically annotated with functional, evolutionary and structural information and organized in a database that can be inspected and queried. An instance of the database populated with pre-computed data on seven organisms is accessible through a publicly available server and we believe it constitutes by itself a useful resource for the life sciences (http://www.biocomputing.it/modipath).
Genes involved in post-mating processes of multiple mating organisms are known to evolve rapidly due to coevolution driven by sexual conflict among male-female interacting proteins. In the malaria mosquito Anopheles gambiae - a monandrous species in which sexual conflict is expected to be absent or minimal - recent data strongly suggest that proteolytic enzymes specifically expressed in the female lower reproductive tissues are involved in the processing of male products transferred to females during mating. In order to better understand the role of selective forces underlying the evolution of proteins involved in post-mating responses, we analysed a cluster of genes encoding three serine proteases that are down-regulated after mating, two of which are specifically expressed in the atrium and one in the spermatheca of A. gambiae females.
The analysis of polymorphisms and divergence of these female-expressed proteases in closely related species of the A. gambiae complex revealed a high level of replacement polymorphisms consistent with relaxed evolutionary constraints of duplicated genes, allowing novel replacements to be fixed rapidly and to perform new or more specific functions. Adaptive evolution was detected in several codons of the 3 genes and hints of episodic selection were also found. In addition, the structural modelling of these proteases highlighted some important differences in their substrate specificity, and provided evidence that a number of sites evolving under selective pressures lie relatively close to the catalytic triad and/or on the edge of the specificity pocket, known to be involved in substrate recognition or binding. The observed patterns suggest that these proteases may interact with factors transferred by males during mating (e.g. substrates, inhibitors or pathogens) and that they may have evolved differently in independent A. gambiae lineages.
Our results - also examined in light of constraints in the application of selection-inference methods to the closely related species of the A. gambiae complex - reveal an unexpectedly intricate evolutionary scenario. Further experimental analyses are needed to investigate the biological functions of these genes in order to better interpret their molecular evolution and to assess whether they represent possible targets for limiting the fertility of Anopheles mosquitoes in malaria vector control strategies.
molecular evolution; reproduction; adaptive evolution; gene duplication; Anopheles gambiae complex
Chloroquine resistance in malaria strains is known to be associated with a parasite protein named PfCRT, the mutated form of which is able to reduce chloroquine accumulation in the digestive vacuole of the pathogen. Whether the protein mediates extrusion of the drug by acting as a channel or as a carrier, and what the protonation state of its chloroquine substrate is, are the subject of scientific debate. We present here an analytical approach that explores which combinations of hypotheses on the mechanism of transport and the protonation state of chloroquine are consistent with available equilibrium experimental data. We show that the available experimental data are not, by themselves, sufficient to conclude whether the protein acts as a channel or as a transporter, which explains the origin of their different interpretation by different authors. Interestingly, though, each of the two models is only consistent with a subset of hypotheses on the protonation state of the transported molecule. The combination of these results with a sequence and structure analysis of PfCRT, which strongly suggests that the molecule is a carrier, indicates that the transported species is the mono-protonated form of chloroquine, the di-protonated form, or both. We believe that our results, besides shedding light on the mechanism of chloroquine resistance in P. falciparum, have implications for the development of novel therapies against resistant malaria strains and demonstrate the usefulness of an approach combining systems biology strategies with structural bioinformatics and experimental data.
The Phospho.ELM resource (http://phospho.elm.eu.org) is a relational database designed to store in vivo and in vitro phosphorylation data extracted from the scientific literature and phosphoproteomic analyses. The resource has been actively developed for more than 7 years and currently comprises 42 574 serine, threonine and tyrosine non-redundant phosphorylation sites. Several new features have been implemented, such as structural disorder/order and accessibility information and a conservation score. Additionally, the conservation of the phosphosites can now be visualized directly on the multiple sequence alignment used for the score calculation. Finally, special emphasis has been put on linking to external resources such as interaction networks and other databases.
Phospho3D is a database of three-dimensional (3D) structures of phosphorylation sites (P-sites) derived from the Phospho.ELM database, which also collects information on the residues surrounding the P-site in space (3D zones). The database also provides the results of a large-scale structural comparison of the 3D zones versus a representative dataset of structures, thus associating to each P-site a number of structurally similar sites. The new version of Phospho3D presents an 11-fold increase in the number of 3D sites and incorporates several additional features, including new structural descriptors, the possibility of selecting non-redundant sets of 3D structures and the availability for download of non-redundant sets of structurally annotated P-sites. Moreover, it features P3Dscan, a new functionality that allows the user to submit a protein structure and scan it against the 3D zones collected in the Phospho3D database. Phospho3D version 2.0 is available at: http://www.phospho3d.org/.
Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a ‘Bar Code’ format, which also displays known instances from homologous proteins through a novel ‘Instance Mapper’ protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation.
Many proteins are highly modular, being assembled from globular domains and segments of natively disordered polypeptides. Linear motifs, short sequence modules functioning independently of protein tertiary structure, are most abundant in natively disordered polypeptides but are also found in accessible parts of globular domains, such as exposed loops. The prediction of novel occurrences of known linear motifs attempts the difficult task of distinguishing functional matches from stochastically occurring non-functional matches. Although functionality can only be confirmed experimentally, confidence in a putative motif is increased if a motif exhibits attributes associated with functional instances such as occurrence in the correct taxonomic range, cellular compartment, conservation in homologues and accessibility to interacting partners. Several tools now use these attributes to classify putative motifs based on confidence of functionality.
Current methods assessing motif accessibility do not consider much of the information available, either predicting accessibility from primary sequence or regarding any motif occurring in a globular region as low confidence. We present a method considering accessibility and secondary structural context derived from experimentally solved protein structures to rectify this situation. Putatively functional motif occurrences are mapped onto a representative domain, given that a high quality reference SCOP domain structure is available for the protein itself or a close relative. Candidate motifs can then be scored for solvent-accessibility and secondary structure context. The scores are calibrated on a benchmark set of experimentally verified motif instances compared with a set of random matches. A combined score yields 3-fold enrichment for functional motifs assigned to high confidence classifications and 2.5-fold enrichment for random motifs assigned to low confidence classifications. The structure filter is implemented as a pipeline with both a graphical interface via the ELM resource and through a Web Service protocol.
New occurrences of known linear motifs require experimental validation as the bioinformatics tools currently have limited reliability. The ELM structure filter will aid users assessing candidate motifs presenting in globular structural regions. Most importantly, it will help users to decide whether to expend their valuable time and resources on experimental testing of interesting motif candidates.
The occurrence of very similar structural motifs brought about by different parts of non homologous proteins is often indicative of a common function. Indeed, relatively small local structures can mediate binding to a common partner, be it a protein, a nucleic acid, a cofactor or a substrate. While it is relatively easy to identify short amino acid or nucleotide sequence motifs in a given set of proteins or genes, and many methods do exist for this purpose, much more challenging is the identification of common local substructures, especially if they are formed by non consecutive residues in the sequence.
Here we describe a publicly available tool, able to identify common structural motifs shared by different non homologous proteins in an unsupervised mode. The motifs can be as short as three residues and need not be contiguous or even present in the same order in the sequence. Users can submit a set of protein structures, whether or not they are deemed to share a common function (e.g. they bind similar ligands, or share a common epitope). The server finds and lists structural motifs composed of three or more spatially well conserved residues shared by at least three of the submitted structures. The method uses a local structural comparison algorithm to identify subsets of similar amino acids between each pair of input protein chains and a clustering procedure to group similarities shared among different structure pairs.
FunClust is fast, completely sequence independent, and does not need an a priori knowledge of the motif to be found. The output consists of a list of aligned structural matches displayed in both tabular and graphical form. We show here examples of its usefulness by searching for the largest common structural motifs in test sets of non homologous proteins and showing that the identified motifs correspond to a known common functional feature.
Phospho.ELM is a manually curated database of eukaryotic phosphorylation sites. The resource includes data collected from published literature as well as high-throughput data sets.
The current release of Phospho.ELM (version 7.0, July 2007) contains 4078 phospho-protein sequences covering 12 025 phospho-serine, 2362 phospho-threonine and 2083 phospho-tyrosine sites. The entries provide information about the phosphorylated proteins and the exact position of known phosphorylated instances, the kinases responsible for the modification (where known) and links to bibliographic references. The database entries have hyperlinks to easily access further information from UniProt, PubMed, SMART, ELM, MSD as well as links to the protein interaction databases MINT and STRING.
A new BLAST search tool, complementary to retrieval by keyword and UniProt accession number, allows users to submit a protein query (by sequence or UniProt accession) to search against the curated data set of phosphorylated peptides.
Phospho.ELM is available on line at: http://phospho.elm.eu.org
3dLOGO is a web server for the identification and analysis of conserved protein 3D substructures. Given a set of residues in a PDB (Protein Data Bank) chain, the server detects the matching substructure(s) in a set of user-provided protein structures, generates a multiple structure alignment centered on the input substructures and highlights other residues whose structural conservation becomes evident after the defined superposition. Conserved residues are proposed to the user for highlighting functional areas, deriving refined structural motifs or building sequence patterns. Residue structural conservation can be visualized through an expressly designed Java application, 3dProLogo, which is a 3D implementation of a sequence logo. The 3dLOGO server, with related documentation, is available at http://3dlogo.uniroma2.it/
SH3-Hunter (http://cbm.bio.uniroma2.it/SH3-Hunter/) is a web server for the recognition of putative SH3 domain interaction sites on protein sequences. Given an input query consisting of one or more protein sequences, the server identifies peptides containing poly-proline binding motifs and associates them to a list of SH3 domains, in order to compose peptide–domain pairs. The server can accept a list of peptides and allows users to upload an input file in a proper format. An accurate selection of SH3 domains is available and users can also submit their own SH3 domain sequence.
SH3-Hunter evaluates which peptide–domain pair represents a possible interaction pair and produces as output a list of significant interaction sites for each query protein. Each proposed interaction site is associated to a propensity score and sensitivity and precision levels for the prediction. The server prediction capability is based on a neural network model integrating high-throughput pep-spot data with structural information extracted from known SH3-peptide complexes.
We performed an exhaustive search for local structural similarities in an ensemble of non-redundant protein functional sites. With the purpose of finding new examples of convergent evolution, we selected only those matching sites composed of structural regions whose residue order is inverted in the relative protein sequences.
A novel case of local analogy was detected between members of the ABC transporter and of the HprK/P families in their ATP binding site. This case cannot be derived by events of circular permutation since the residues of one of the region pairs are located in reverse order in the sequence of the two protein families. One of the analogous binding sites, the one identified in HprK/P, is known to also bind pyrophosphate, which is used as preferred energy source in its kinase and phosphorylase activity.
The discovery of this striking molecular similarity, also associated to a functional similarity, may help in suggesting new experiments aimed at a deeper understanding of members of the ABC transporter family known to be involved in many serious human diseases.
False occurrences of functional motifs in protein sequences can be considered as random events due solely to the sequence composition of a proteome. Here we use a numerical approach to investigate the random appearance of functional motifs with the aim of addressing biological questions such as: How are organisms protected from undesirable occurrences of motifs otherwise selected for their functionality? Has the random appearance of functional motifs in protein sequences been affected during evolution?
Here we analyse the occurrence of functional motifs in random sequences and compare it to that observed in biological proteomes; the behaviour of random motifs is also studied. Most motifs exhibit a number of false positives significantly similar to the number of times they appear in randomized proteomes (=expected number of false positives). Interestingly, about 3% of the analysed motifs show a different kind of behaviour and appear in biological proteomes less than they do in random sequences. In some of these cases, a mechanism of evolutionary negative selection is apparent; this helps to prevent unwanted functionalities which could interfere with cellular mechanisms.
Our thorough statistical and biological analysis showed that several mechanisms and evolutionary constraints affect the appearance of functional motifs in protein sequences.
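The comparison between observed and randomly expected motif occurrences can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes motifs are expressed as regular expressions and that randomization means shuffling each sequence so that its composition is preserved. The function names and the toy sequences are invented for the example.

```python
import random
import re


def count_matches(pattern, sequences):
    """Count all (possibly overlapping) regex matches across the sequences."""
    lookahead = re.compile("(?=(?:" + pattern + "))")
    return sum(len(lookahead.findall(seq)) for seq in sequences)


def shuffle_sequence(sequence, rng):
    """Return a random permutation of a sequence (composition preserved)."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def expected_random_matches(pattern, sequences, n_shuffles=100, seed=0):
    """Mean number of matches over n_shuffles randomized copies of the set."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_shuffles):
        randomized = [shuffle_sequence(seq, rng) for seq in sequences]
        total += count_matches(pattern, randomized)
    return total / n_shuffles


# Toy "proteome" (invented sequences, for illustration only).
toy_proteome = ["MPPLPPKAAPSSP", "MKTAYIAKQR"]
observed = count_matches("P..P", toy_proteome)
expected = expected_random_matches("P..P", toy_proteome)
print(observed, expected)
```

A motif whose observed count falls well below the shuffled expectation would be a candidate for the kind of negative selection described above; a real analysis would also need a significance test, which this sketch omits.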
Phosphorylation is the most common protein post-translational modification. Phosphorylated residues (serine, threonine and tyrosine) play critical roles in the regulation of many cellular processes. Since the amount of data produced by screening assays is growing continuously, the development of computational tools for collecting and analysing experimental data has become a pivotal task for unravelling the complex network of interactions regulating eukaryotic cell life. Here we present Phospho3D, a database of 3D structures of phosphorylation sites, which stores information retrieved from the phospho.ELM database and is enriched with structural information and annotations at the residue level. The database also collects the results of a large-scale structural comparison procedure providing clues for the identification of new putative phosphorylation sites.
The identification of local similarities between two protein structures can provide clues of a common function. Many different methods exist for searching for similar subsets of residues in proteins of known structure. However, the lack of functional and structural information on single residues, together with the low level of integration of this information in comparison methods, is a limitation that prevents these methods from being fully exploited in high-throughput analyses.
Here we describe Query3d, a program that is both a structural DBMS (Database Management System) and a local comparison method. The method conserves a copy of all the residues of the Protein Data Bank annotated with a variety of functional and structural information. New annotations can be easily added from a variety of methods and known databases. The algorithm makes it possible to create complex queries based on the residues' function and then to compare only subsets of the selected residues. Functional information is also essential to speed up the comparison and the analysis of the results.
With Query3d, users can easily obtain statistics on how many and which residues share certain properties in all proteins of known structure. At the same time, the method also finds their structural neighbours in the whole PDB. Programs and data can be accessed through the PdbFun web interface.
The SH3 domain family is one of the most representative and widely studied cases of so-called Peptide Recognition Modules (PRM). The polyproline II motif PxxP that generally characterizes its ligands does not reflect the complex interaction spectrum of the over 1500 different SH3 domains, and the requirement of a more refined knowledge of their specificity implies the setting up of appropriate experimental and theoretical strategies. Due to the limitations of the current technology for peptide synthesis, several experimental high-throughput approaches have been devised to elucidate protein-protein interaction mechanisms. Such approaches can rely on and take advantage of computational techniques, such as regular expressions or position specific scoring matrices (PSSMs) to pre-process entire proteomes in the search for putative SH3 targets.
In this regard, a reliable inference methodology to be used for reducing the sequence space of putative binding peptides represents a valuable support for molecular and cellular biologists.
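The two pre-processing techniques mentioned above, regular expressions and PSSMs, can be sketched in a few lines. This is a hedged illustration only: the log-odds construction with a uniform background and a pseudocount is one common textbook choice, not necessarily the one used in the studies cited, and the training peptides are invented.

```python
import math
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def find_pxxp_cores(sequence):
    """Return start indices of every (possibly overlapping) PxxP core."""
    return [m.start() for m in re.finditer(r"(?=P..P)", sequence)]


def build_pssm(peptides, pseudocount=1.0):
    """Log-odds matrix from equal-length peptides (uniform background)."""
    background = 1.0 / len(AMINO_ACIDS)
    denom = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(peptides[0])):
        column = [pep[pos] for pep in peptides]
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / denom) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, peptide):
    """Sum of per-position log-odds scores for a peptide."""
    return sum(column[aa] for column, aa in zip(pssm, peptide))


# Invented four-residue "binders", for illustration only.
pssm = build_pssm(["PAAP", "PLAP"])
for start in find_pxxp_cores("MKAAPLAPGG"):
    core = "MKAAPLAPGG"[start:start + 4]
    print(start, core, round(score(pssm, core), 2))
```

A regular expression only says whether a core is present, whereas the PSSM ranks candidates by how closely they resemble known binders; neither captures dependencies between positions.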
Using as benchmark the peptide sequences obtained from in vitro binding experiments, we set up a neural network model that performs better than PSSM in the detection of SH3 domain interactors. In particular our model is more precise in its predictions, even if its performance can vary among different SH3 domains and is strongly dependent on the number of binding peptides in the benchmark.
We show that a neural network can be more effective than standard methods in SH3 domain specificity detection. Neural classifiers identify general SH3 domain binders and domain-specific interactors from a PxxP peptide population, provided that there is a sufficient proportion of true positives in the training sets. This capability can also improve peptide selection for library definition in array experiments. Further advances can be achieved by including properly encoded domain sequences and structural information as input for a global neural network.
pdbFun is a web server for structural and functional analysis of proteins at the residue level. pdbFun gives fast access to the whole Protein Data Bank (PDB) organized as a database of annotated residues. The available data (features) range from solvent exposure to ligand binding ability, location in a protein cavity, secondary structure, residue type, sequence functional pattern, protein domain and catalytic activity. Users can select any residue subset (even including any number of PDB structures) by combining the available features. Selections can be used as probe and target in multiple structure comparison searches. For example a search could involve, as a query, all solvent-exposed, hydrophilic residues that are not in alpha-helices and are involved in nucleotide binding. Possible examples of targets are represented by another selection, a single structure or a dataset composed of many structures. The output is a list of aligned structural matches offered in tabular and also graphical format.
Post-translational phosphorylation is one of the most common protein modifications. Phosphoserine, threonine and tyrosine residues play critical roles in the regulation of many cellular processes. The fast growing number of research reports on protein phosphorylation points to a general need for an accurate database dedicated to phosphorylation to provide easily retrievable information on phosphoproteins.
Phospho.ELM is a new resource containing experimentally verified phosphorylation sites manually curated from the literature and is developed as part of the ELM (Eukaryotic Linear Motif) resource. Phospho.ELM constitutes the largest searchable collection of phosphorylation sites available to the research community. The Phospho.ELM entries store information about substrate proteins with the exact positions of residues known to be phosphorylated by cellular kinases. Additional annotation includes literature references, subcellular compartment, tissue distribution, and information about the signaling pathways involved as well as links to the molecular interaction database MINT. Phospho.ELM version 2.0 contains 1703 phosphorylation site instances for 556 phosphorylated proteins.
Phospho.ELM will be a valuable tool both for molecular biologists working on protein phosphorylation sites and for bioinformaticians developing computational predictions on the specificity of phosphorylation reactions.
post-translational modification; protein kinase; bioinformatics
A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that – at least in some cases – the weak specificity and/or sensitivity of a pattern is due to the fact that one, or maybe more, functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure.
Here we present a new procedure aimed at improving the sensitivity and/or specificity of poorly-performing patterns. The procedure can be summarised as follows: (1) residues structurally conserved in different proteins that are true positives for a pattern are identified by means of a computational technique and by visual inspection; (2) the sequence positions of the structurally conserved residues falling outside the pattern are used to build extended sequence patterns; (3) the extended patterns are optimised on the SWISS-PROT database for their sensitivity and specificity. The method was applied to eight PROSITE patterns. Whenever structurally conserved residues are found in the surface region close to the pattern (seven out of eight cases), the addition of information inferred from structural analysis is shown to improve pattern selectivity and in some cases selectivity and sensitivity as well. In some of the cases considered the procedure allowed the identification of functionally interesting residues, whose biological role is also discussed.
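The evaluation step of such a procedure, scoring a pattern against known true and false hits, can be sketched as below. This is a simplified illustration under stated assumptions: the converter handles only the basic PROSITE pattern syntax (`x`, `[..]`, `{..}`, repeat counts and `<`/`>` anchors used on their own), the example sequences are invented, and the metrics shown (sensitivity and precision) are one conventional choice that may differ from the definitions used in the paper.

```python
import re

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        match = ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError("unsupported pattern element: " + element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                         # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"     # excluded residues
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def evaluate(regex, positives, negatives):
    """Return (sensitivity, precision) of a regex on labelled sequences."""
    tp = sum(bool(re.search(regex, seq)) for seq in positives)
    fp = sum(bool(re.search(regex, seq)) for seq in negatives)
    sensitivity = tp / len(positives) if positives else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}.")
# Invented sequences: two "true" family members and one non-member.
print(regex, evaluate(regex, ["ACVAAAAK", "GGGG"], ["AAVAAAAE"]))
```

An extended pattern would be evaluated the same way, so the effect of adding structurally conserved positions can be read directly from the change in the two numbers.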
Our method can be applied to any type of functional motif or pattern (not only PROSITE ones) which is not able to select all and only the true positive hits and for which at least two true positive structures are available. The computational technique for the identification of structurally conserved residues is already available on request and will be soon accessible on our web server. The procedure is intended for the use of pattern database curators and of scientists interested in a specific protein family for which no specific or selective patterns are yet available.
Multidomain proteins predominate in eukaryotic proteomes. Individual functions assigned to different sequence segments combine to create a complex function for the whole protein. While on-line resources are available for revealing globular domains in sequences, there has hitherto been no comprehensive collection of small functional sites/motifs comparable to the globular domain resources, yet these are as important for the function of multidomain proteins. Short linear peptide motifs are used for cell compartment targeting, protein–protein interaction, regulation by phosphorylation, acetylation, glycosylation and a host of other post-translational modifications. ELM, the Eukaryotic Linear Motif server at http://elm.eu.org/, is a new bioinformatics resource for investigating candidate short non-globular functional motifs in eukaryotic proteins, aiming to fill the void in bioinformatics tools. Sequence comparisons with short motifs are difficult to evaluate because the usual significance assessments are inappropriate. Therefore the server is implemented with several logical filters to eliminate false positives. Current filters are for cell compartment, globular domain clash and taxonomic range. In favourable cases, the filters can reduce the number of retained matches by an order of magnitude or more.
Relatively few protein structures are known, compared to the enormous amount of sequence data produced in the sequencing of different genomes, and relatively few protein complexes are deposited in the PDB with respect to the great amount of interaction data coming from high-throughput experiments (two-hybrid or affinity purification of protein complexes and mass spectrometry). Nevertheless, we can rely on computational techniques for the extraction of high-quality and information-rich data from the known structures and for their spreading in the protein sequence space. We describe here the ongoing research projects in our group: we analyse the protein complexes stored in the PDB and, for each complex involving one domain belonging to a family of interaction domains for which some interaction data are available, we can calculate its probability of interaction with any protein sequence. We analyse the structures of proteins encoding a function specified in a PROSITE pattern, which exhibits relatively low selectivity and specificity, and build extended patterns. To this aim, we consider residues that are well-conserved in the structure, even if their conservation cannot easily be recognized in the sequence alignment of the proteins holding the function. We also analyse protein surface regions and, through the annotation of the solvent-exposed residues, we annotate protein surface patches via a structural comparison performed with stringent parameters and independently of the residue order in the sequence. Local surface comparison may also help in identifying new sequence patterns, which could not be highlighted with other sequence-based