Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a ‘Bar Code’ format, which also displays known instances from homologous proteins through a novel ‘Instance Mapper’ protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation.
Linear motifs (LMs) are short elements embedded within larger protein sequence segments that operate as sites of regulation (1–5). They can be found in telomeric proteins (6), in proteins of the extracellular matrix (7)—and seemingly every macromolecular complex in between. Many are post-translationally modified, but not all. The essence of their function is embodied in the linear amino acid sequence and is not dependent on the tertiary structural context. Nevertheless, as a consequence of low affinity binary binding interactions, they usually act in a concerted and cooperative manner, enabling regulatory decisions to be made on the basis of multiple inputs (8–12). These properties may be important for the inherent robustness of cellular systems (13), as cell regulation is increasingly revealed to be cooperative, networked and redundant in nature (14–20).
Over the time that we have worked to develop the Eukaryotic Linear Motif resource ELM, our conviction has grown that there will be well over a million LM instances in a higher eukaryotic proteome. (Phosphoproteomics is on the way to revealing 100 000 phosphorylation sites, for example.) If these estimates reflect reality, one might expect that experimentalists should be stumbling across new motifs with every experiment. But they are not. The paradox is that it remains difficult to establish the existence of LM instances whether by experiment or computationally. The bioinformatics problem is simple to state: LMs are too short (and the information content too poor) to be statistically significant in protein sequence searches. Experimentalists are similarly afflicted: while trying to identify LMs, they are likely to spend a lot of resources, time and effort performing experiments on the false motif candidates, which usually vastly outnumber the genuine ones in any set of proteins of interest (1).
Nevertheless, useful advances are now being made in the bioinformatics tools that address the remarkable modularity of eukaryotic regulatory proteins. Thus, two dedicated LM databases now exist: ELM (21) and the Minimotif Miner (22). (Users should utilize both resources as there are many differences in approach and the datasets only partially overlap.) Specialized databases for phosphorylation sites include PhosphoSite, Phospho.ELM and Phosida (23–25). Resources such as HPRD (26) and UniProtKB/Swiss-Prot (27) annotate a broader range of Post-Translational Modifications (PTMs). Furthermore, numerous predictive tools for identifying natively disordered protein segments—the main harbour for LMs (28–30)—have become available (31,32), complementing the more established globular domain resources Pfam, SMART, PROSITE and InterPro (33–36). The ELM datasets have been used by bioinformaticians to develop and benchmark novel prediction strategies such as hunting for motifs in interaction data and to provide likelihood estimates for motif candidates based on structural and sequence conservation contexts (37–41). While LM discovery remains challenging, if progress continues apace, it should become possible to address the intricate subfunctionalization of proteins like p53, CBP/p300, APC and Tau with ever-greater effectiveness.
Here, we provide an overview of the current status of the ELM resource and the research contexts in which it is being used. The utility of ELM is threefold: for researchers, it is first a knowledge base and second a predictive tool; third, it can be used for more general educational purposes, as it covers a topic that is often poorly served in textbooks. ELM provides written text summaries and links to the experimental literature that are a useful starting point for people who, for any reason, wish to gain an understanding of the role of LMs in cell regulation. We also take the opportunity here to provide a summary of progress made by the pioneering community of bioinformatics teams that are applying ELM to develop new tools for LM discovery. Finally, we provide some guidance about good practice and warnings about pitfalls for researchers seeking to apply ELM in experimental motif discovery.
To use ELM effectively, a user will need to grasp why such a resource is needed. The earliest definition of LM known to us was written in 1990 by Tim Hunt to introduce the new Protein Sequence Motifs column in Trends in Biochemical Sciences (42).
The sequences of many proteins contain short, conserved motifs that are involved in recognition and targeting activities, often separate from other functional properties of the molecule in which they occur. These motifs are linear, in the sense that three-dimensional organization is not required to bring distant segments of the molecule together to make the recognizable unit. The conservation of these motifs varies: some are highly conserved while others, for example, allow substitutions that retain only a certain pattern of charge across the motif.
This definition was written at a time when it was becoming apparent that many cellular proteins would have complex multidomain architectures and the first LMs such as KDEL, NLS, the Destruction Box of cyclin B and the fascinating KFERQ starvation-dependent lysosomal targeting motif were being reported (43–46). The definition has stood the test of time and can still serve very well today.
Sequence motifs contributing to the tertiary structure and primary function of globular domains are excluded by the definition of LM. An LM is effectively an irreducible unit of structure and function. Although LMs may be found in exposed parts of globular folds, they must be able to function independently to fit the definition: conversely, the globular domain would still have the same function if the LM was inactivated, although of course that domain function might well be dysregulated in the absence of the motif. The need to separate motif/domain functions applies to methods that seek to define new motifs. Historically, it has been difficult to develop computational methods that can distinguish short conserved segments of protein domains from LMs. Failure to make the distinction is likely to lead to false LM assignment (1), as has often happened for the nuclear export sequence (NES) as discussed by Hantschel et al. and Kadlec et al. (47,48).
Over the last few years, it has become increasingly clear that most LMs do not reside inside globular domains but instead are present in segments of natively disordered polypeptide. Often many LMs are clustered within one segment of native disorder. LMs quite frequently overlap, providing the potential for switch-like mutually exclusive functionality. For example, overlapping peptides from p53 are present in solved structures of several different protein complexes (20). Therefore, an overview of the types and locations of protein architecture modules existing in regulatory proteins provides an essential adjunct to LM investigation.
At the core of the ELM resource is a PostgreSQL relational database with 69 tables storing data about LMs. Not all of this complexity is fully utilized: it anticipates current and future filtering strategies as well as information retrieval by users. The key information content is summarized in Figure 1. Users should make sure they grasp the importance of the three fundamental nodes in the hierarchy: the top level ‘Functional Site’ links to ‘ELM Motif’ which includes ‘ELM Instances’. The top level of ‘Functional Site’ is essentially a biological designation with general information: for example, ‘Nuclear export signal’. The ‘ELM Motif’ is given a more specific description, links to information pertaining to the given LM, including key literature and Gene Ontology (GO) assignments, and includes the Regular Expression pattern representing the motif: see, for example, the NES entry at http://elm.eu.org/elmPages/TRG_NES_CRM1_1.html. Of note, ELM is effectively motif-centric: if a regular expression cannot be defined, there is no entry in ELM. An ‘ELM Instance’ embodies the specific information for a motif match in a protein sequence: for example, click on the links for the NES instance in MAPKAPK2. The instances provide the essential information that supports the ELM hierarchy. Instance-containing sequences are mapped to their respective UniProt entries. A well-annotated instance may also have links to the experimental literature, the types of experiments undertaken and to informative structure entries in the PDB (49). Importantly, an instance may have a reliability value assigned by the curator: many false positive motifs have been claimed in the literature. (Note: some of the older ELM entries do not yet have well-annotated instances.)
All data input is by manual curation. Annotating each ELM entry typically involves extensive literature searches, BLAST runs, multiple alignment of relevant protein families, perusal of Swiss-Prot and other online databases and, where practical, discussion with experimentalist experts from the field. In order to promote interoperability with other bioinformatics resources, we use two public annotation standards. GO identifiers are used for cell compartment, molecular function and biological process (50) while the NCBI taxonomy database identifiers (51) are used for taxonomic nodes at the apex of phylogenetic groupings in which an LM occurs. A third standard, POSIX regular expressions (http://standards.ieee.org/regauth/posix/), is used to represent the motif patterns. These ‘RegExps’ are conveniently usable in the Python and Perl scripting languages. They are analogous to PROSITE motifs (35), but with a different syntax. For example, the C-terminal motif LIG_CAP-Gly_1 that binds to CAP-Gly domains for microtubule plus-end regulation (52) is represented by the RegExp
where $ is the protein C-terminus, preceded by a conserved aromatic residue and a flexibly spaced run of negatively charged residues. See the help page http://elm.eu.org/help.html#regular_expressions for guidance on the ELM expressions.
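As a minimal sketch of how such a RegExp can be applied in Python, the example below searches a sequence for a C-terminal pattern with bounded gaps. The pattern and the test sequence are illustrative assumptions written to resemble the description above (acidic residues with flexible spacing, then an aromatic residue at the C-terminus); they are not copied from the LIG_CAP-Gly_1 entry, and the function name is hypothetical.

```python
import re

# Hypothetical pattern for illustration only (NOT the curated ELM expression):
# three acidic residues separated by short, bounded gaps, then an aromatic
# residue that must be the last residue of the protein ($).
ILLUSTRATIVE_PATTERN = re.compile(r"[ED].{0,2}[ED].{0,2}[ED].{0,1}[YF]$")


def c_terminal_match(sequence):
    """Return (start, end, matched_text) with 1-based inclusive positions,
    or None if the pattern does not match at the C-terminus."""
    match = ILLUSTRATIVE_PATTERN.search(sequence)
    if match is None:
        return None
    return (match.start() + 1, match.end(), match.group())


# Made-up sequence ending in an acidic run followed by tyrosine.
print(c_terminal_match("MSTGEEEGEEY"))
# The same run placed internally is rejected because of the $ anchor.
print(c_terminal_match("MSTGEEEGEEYAAA"))
```

The bounded quantifiers such as `.{0,2}` are what allow a RegExp to express the short indels found in many motifs, and the result is a simple presence/absence call for each sequence.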
Table 1 provides some representative examples of different motif categories. Based on the type of function of the LM, we have defined four classes of ELM motif (Cleavage, Ligand, Modification and Target), which are summarized in the table. Some of these motifs have complicated regular expressions, others are very simple, e.g. with just two conserved positions. It has become clear that the most common conservation pattern is for three (semi-) conserved positions in the motif. A substantial minority of motifs have one or more positions that tolerate gaps (indels). The length range of indels can usually be accurately determined from sequence alignments: the most common indel is to allow a one-residue insertion.
Table 2 provides a summary of the data that have so far been entered into the ELM DB. The most noteworthy numbers are the 146 ELM motifs, the >1300 instances and the >1100 citations of LM literature. Our goal is to create representative, not comprehensive, LM entries. For abundant motifs like the sumoylation site, with thousands of instances per proteome, we will not try to annotate more than a small fraction of experimental instances, since the appropriate location for these data is the protein annotation resources such as Swiss-Prot and HPRD.
The three most commonly used methods for bioinformatic representation of sequence conservation patterns are: Profile/HMMs (54); Artificial neural networks (ANNs) (55); and RegExps (http://en.wikipedia.org/wiki/Regular_expression). Of these, RegExps are considered the worst approach for capturing protein sequence information. They are ad hoc, typically created by annotators without applying a consistent formalism. The motif characters are represented with integer values, so RegExps cannot use position-weighting to capture weaker preferences. They are over-determined and can only capture exactly what is specified (whereas the more probabilistic HMMs and ANNs can rank near misses too). They do not support searching for an exact number of a given amino acid character within a specified range [which would better approximate the charged runs in e.g. CAP-Gly and NLS motifs (56)]. Despite these shortcomings, using RegExps to establish ELM has proved to be the correct decision. Many LMs have short indels in the pattern. HMM software does not (yet) provide for variable gaps with exactly bounded ranges while ANNs do not account for gaps at all: a motif such as the NES with multiple short indels is hard to represent with these algorithms. The scoring of presence/absence matches for LM RegExps simplifies statistical analyses of motif searches. These two advantages have been critical to the first wave of development of motif-hunting software.
Thus we consider that it was appropriate to initiate LM database resources with RegExps. Of course, HMMs and ANNs are used in a number of useful predictive tools, e.g. Scansite (57) and NetPhorest (58) and there is little doubt that HMMs, neural networks and other methods will grow in importance for LM analyses in future, once the contexts can be better controlled.
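One of the RegExp limitations noted above, the inability to require a given number of a residue type within a window, is easy to work around in a script. The sketch below is an illustration under our own assumptions, not part of ELM: the function name, the window size and the thresholds are hypothetical choices.

```python
def windows_with_residue_count(sequence, residues, window, min_count, max_count=None):
    """Return 1-based start positions of every window of length `window`
    whose count of residues drawn from `residues` lies in
    [min_count, max_count]. max_count defaults to the window length."""
    if max_count is None:
        max_count = window
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        count = sum(1 for aa in segment if aa in residues)
        if min_count <= count <= max_count:
            hits.append(start + 1)
    return hits


# Made-up sequence: find 5-residue windows holding at least four basic residues.
print(windows_with_residue_count("AAKKRKAAA", "KR", window=5, min_count=4))
```

Setting `min_count` equal to `max_count` gives the "exact number" behaviour; a check of this kind could be applied to RegExp matches as a post-filter for charged runs.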
The ELM resource is freely accessible to users. The data in ELM can be accessed via the Web either interactively or programmatically. Motif entries are available to be browsed from the browse links page at http://elm.eu.org/. Details from the browse page for the LIG_CAP-Gly_1 entry are shown in Figure 2. A user can also submit a protein sequence of interest through the main submission page and will receive an output page with the matched candidates. The key data retrieved by the ELM resource for the sequence is displayed in a ‘bar code’ style graphical output as shown for the motif-rich endocytic protein Epsin-1 (Figure 3). Mouse-over provides annotation and there are many links to summaries in tabular and text form. Help is available online to explain the meanings of the elements and colour code in the output.
Programmatic access takes advantage of SOAP/XML Web Services (WS) interfaces for six ELM resource modules listed in Table 3. [See the EMBRACE registry for a large collection of Bioinformatics WS (59)]. Programmers can use the ELM DB WS interfaces to collect data—for example, a query might be to retrieve all regular expressions stored in ELM or another query might be for all ELM instances, or a defined subset thereof. Other WS interfaces allow LM matching to a query sequence and structural and conservation filtering.
Upon request, we can provide a SQL dump if for any reason, the WS interface is not suitable. At some future point, we would like to provide a standardized ELM DB dump, probably using the BioMart format (60).
Searches of sequence databases with short motifs do not yield significant results (due to the large number of non-functional sequences matching the motif consensus) and therefore, it is necessary to evaluate the context of the match. Essentially, any aspect of a protein that can be informative might provide contextual filtering. Filters might be simple or complicated and ELM provides examples of both. Originally, three simple filters (21) were implemented in ELM: (i) Cell compartment filter: an LM is only meaningful in appropriate cell compartments; (ii) Taxonomy filter: an LM is only meaningful in an organism that is known to possess its interaction partners; and (iii) SMART globular domain filter: LMs are interaction sites and must be accessible, hence they are much more common in natively disordered sequence. ELM does not provide benchmarked scores for the simple filters. Two more complicated filters have been implemented and benchmarked to provide reliability assessments, for structural context and evolutionary conservation.
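The statement that short motifs match large numbers of non-functional sequences can be made concrete with a back-of-envelope calculation. The sketch below assumes uniform, independent amino acid frequencies (a deliberate simplification) and a round, illustrative total of ten million residues for the searched sequences; the function names are hypothetical.

```python
from fractions import Fraction


def match_probability(allowed_per_position, alphabet_size=20):
    """Probability that a random position matches a pattern, given the number
    of allowed residues at each pattern position (20 means a wildcard).
    Assumes uniform, independent residue frequencies."""
    probability = Fraction(1)
    for allowed in allowed_per_position:
        probability *= Fraction(allowed, alphabet_size)
    return probability


def expected_random_matches(allowed_per_position, total_residues):
    """Expected number of chance matches, ignoring sequence-end effects."""
    return match_probability(allowed_per_position) * total_residues


# A motif with three fully conserved positions, searched against an
# illustrative total of 10 million residues.
print(expected_random_matches([1, 1, 1], 10_000_000))
```

Under these assumptions a motif with three fixed positions has a per-position match probability of 1/8000, so a search of this size would be expected to return over a thousand matches by chance alone. This is why contextual filtering is needed.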
The ELM structure filter (SF) assesses the accessibility and secondary structure components of LM candidates whenever a reference globular domain structure is available (41). The benchmarked scale shows that most LMs are in exposed and accessible loops. Although a few genuine LMs are quite inaccessible in the available structural conformation, the benchmarking indicates that it is usually not worth experimental testing of the inaccessible motifs unless there is an indication of, for example, allosteric rearrangement that might enable the site to become exposed. When it applies, the SF is much more informative than the simple globular domain filter. The SF is implemented in the ELM resource output (Figure 3), and can be accessed independently as a web service (Table 3).
The ELM conservation score (CS) filter assesses the conservation of motif candidates in related proteins (61). LMs tend to be more evolutionarily dynamic than globular domains—it is uncommon to find an LM instance that is conserved between yeast and mammals (e.g. see the GLEBS and FFAT motif entries for counterexamples). The CS filter is a pipeline to collect and align homologous sequences and test ELM motifs for conservation, using a benchmarked scoring scheme. The CS filter has already proven its value in motif discovery efforts (62,63) but, due to the resource reengineering required, is not yet implemented in the ELM output. For the time being, therefore, it is offered as a stand-alone server (http://elm.eu.org/conscorer) and web service (Table 3). Figure 4 shows variation in conservation of some of the motif matches from the Epsin-1 example used above (Figure 3).
It is not uncommon that all the experimentation demonstrating the existence of a particular LM instance has been undertaken in a single model organism, e.g. yeast, or cell lines from one of mouse, chicken or human. For a given LM class, the set of known instances may have been identified in a range of different species. Therefore, researchers are routinely faced with the issue of mapping experimental results from diverse organisms onto the protein sequence of their model organism. The instance mapper module addresses this issue for the ELM server.
A rarely used BLAST variant, PHI-BLAST, is at the core of the ELM instance mapper (64). PHI-BLAST requires a regular expression in addition to the query sequence: the pattern must have at least one match in the query. We found PHI-BLAST to be ideally suited for mapping known LM matches from homologous sequences, so that the instance mapping issue was reduced to developing a protocol to utilize it effectively.
The flow scheme of the instance mapper is summarized in Figure 5. Sequences harbouring known instances are stored in a small BLAST formatted database. For each pattern matching the query, this database is searched by PHI-BLAST. The instance mapper then parses the output and assigns a divergence-based score to any matches that are retrieved. These are then displayed in the ELM server graphical output (Figure 3).
PHI-BLAST calculates an E-value, based on the BLAST bit score, which is useful for determining the statistical significance of a given alignment. However, this statistic does not reflect how similar the query sequence is to the LM instance sequence, which is particularly relevant for our purpose. To address this issue, we have devised an ELM instance score Sei that is calculated from the PHI-BLAST alignment:
where i is the number of identical positions in the alignment, g is the number of gaps, la is the length of the alignment (minus gaps), lq is the length of the query sequence and ls is the length of the subject sequence. The assumptions behind the score are that false matches are more likely at higher divergence and in longer sequences. At higher divergence, the sequences may be nonorthologous (or only partially so) or, in orthologous sequences, nonorthologous matches may also be superposed, especially for common, simple motifs. Therefore, while the instance mapper can retrieve genuine instances in sequences with as little as 30% identity, a low score serves as a warning to evaluate the match. Note that this score is designed for evaluation of pairwise matches: if we had a multiple alignment and were confident that the alignment was correct for a motif, then the conservation could be scored as ‘more’ significant at higher divergence (61).
The instance mapper is a key addition to the resource as it unites the information content of the experimental instances stored in the ELM database with the motif exploration capabilities afforded by the ELM regular expressions.
In common with other bioinformatics resources, only a few of the ELM users choose to communicate with us. Users should know that certain types of communication are very useful to us. Obviously, if a server problem persists for a few hours, we should be informed immediately. Suggestions about the ELM resource interface would also be welcome—though we can probably only respond slowly to good ideas.
Of most use to ELM and the user community would be information to improve the data stored in ELM. Sometimes this might be a simple update such as an important instance that has been omitted, a new structure or a useful reference. More substantial help with creating or improving entries would be particularly valuable. In several cases, experts have contributed or reviewed entries for ELM. Entries with expert involvement include: LIG_CAP-Gly_1, LIG_EH_1, LIG_SxIP_EBH_1, LIG_ULM_U2AF65_1, LIG_RRM_PRI_1, TRG_AP2beta_CARGO_1 (65–70).
The obvious reason why researchers may be chary of getting involved with improving ELM is the time and effort that it costs. The upside is that, since scientific information now disseminates to a great extent through the web, ELM can provide another route to showcase your work and, presumably, the prouder you are of your achievements, the more visible you would like them to be. We thank those researchers who have already helped us improve ELM and hope that their research will receive some reciprocal benefit.
As ELM has become more widely known to researchers, experimental investigations of candidate matches to known motifs have begun to appear in the literature. For example, an HCMV transmembrane protein has been shown to have LMs for cooption of cellular retention systems, aiding viral immune evasion (71). A candidate 14-3-3-binding phosphosite has been validated in the cytosolic C-terminus of integrin-α4 (72). Several regulatory motifs have been investigated in Drosophila cryptochrome, a regulator of circadian rhythm (73). Collectively such studies afford optimism that our work to establish the ELM resource will increasingly be justified by experimental application.
We take the view that by applying ELM ourselves, we can better evaluate and optimize our methodologies. We have sometimes been able to employ a protocol involving GO term enrichment to reveal sets of proteins with LM matches that are significantly enriched in specific contexts. Thus, we have reported a bioinformatics survey (63) of KEN box anaphase destruction motifs enriched in mitotic proteins: KEN box motifs in CHFR and C13orf3 are thought to aid in defining their roles in mitosis, though experimental validation is still needed (74,75). In a second example, while annotating the SUMO motif, we were able to define a larger motif, KEPE, superposed on a subset of sumoylation sites (62). It is, however, too soon for the role of KEPE to have been investigated.
The ELM instance dataset has been deployed by several bioinformatics groups in ways that have provided insight into LM context and/or to develop and benchmark novel strategies for LM discovery. Thus, the anecdotal observation that LMs are more abundant in natively disordered protein sequence (21) has been verified by more systematic analyses using benchmarked native disorder predictors (28,29). More recently, this research line has been extended with the ANCHOR server providing benchmarked prediction of short stretches of sequence that have strong interacting potential (76). The local context of LMs has been further investigated, revealing that the adjacent peptide sequence often has a role in modulating LM function (77,78). Stemming from an awareness that viruses utilize numerous LMs to hijack cellular systems, Dinkel and Sticht (37) developed and benchmarked a pipeline to apply conservation and domain masking to motif candidates. Observing that multiple sequence alignment software has been overtrained on globular sequences and therefore performs quite poorly with short conserved motifs, the BAliBASE alignment benchmark suite was extended with an LM benchmark in the hope that this will lead to improved alignment algorithms (79).
While the ELM resource per se is not suited to de novo discovery of hitherto unknown motifs, the instances have been used by others to develop and benchmark tools for just this purpose. Yeast 2-hybrid data includes candidate LM-mediated interactions and both DILIMOT and SLiMFinder use interaction sets to search for enriched motifs in the binders of a protein (38,39,80). These methods depend on overrepresentation of a motif and therefore are probably not suited to motifs that have few biological instances. However, another promising approach uses amino acid preferences to sample 3D structural surfaces for sites with high peptide binding values (40): such methods have the potential to reveal LMs that have only a single functional instance in a proteome. These strategies illustrate how other data (interactions, structures) can be integrated into bioinformatics LM discovery pipelines, complementing experimental approaches for motif definition such as peptide libraries and arrays (81–83).
When we began the ELM project, LM bioinformatics was essentially nonexistent (21). The progress in the last few years has been impressive and exciting. There is growing awareness that the study of protein interactions is not just about globular–globular interfaces (5,84). Protein interaction data and domain surfaces can now be explored for possible LM interactors. There is much more to be done before researchers can pull up strong LM candidates as easily as running BLAST searches, but this goal—so important if we are to understand cell regulation—no longer seems to be impossibly fanciful.
Candidate LMs require experimental validation. The key to using ELM is to select good candidates for experimental validation and not waste time on the poor ones. Since LMs are always interaction sites, they must be in the same cell compartment as their ligand. There is little point in experimentally testing a candidate cyclin-binding motif in a collagen sequence. Likewise, a motif that is deeply buried in a solved structure makes a poor choice for experimentation (41). Therefore, it is first necessary to establish if a motif match is conserved, exposed and in the right cell compartment, according to the ELM filters. Motifs that pass these tests can then be further examined using a range of bioinformatics tools. Figure 6 shows a flowchart for how a typical motif evaluation might proceed. After the initial ELM tests, native disorder predictors and domain databases can give an indication of structural context. If the motif is within a known 3D structure, the context should be visualized, e.g. with PyMol (http://pymol.sourceforge.net/). Swiss-Prot features, the HPRD entry and phosphorylation databases may provide additional structure–function context. A user should always prepare a multiple sequence alignment and examine the motif conservation. Note that multiple alignment software sometimes struggles with motif alignments, with MAFFT (85) perhaps being the best current choice (79). If motifs are present but misaligned, an alignment editor such as JalView (86) may be helpful. Is the motif conserved in a specific lineage, e.g. vertebrates? If the motif is conserved, is the adjacent sequence less so? If things are looking good, it is important to ask whether the proposed LM function makes any sense for the protein; if this is unfamiliar, it is advisable to spend some time reading the literature: the ELM links to PubMed are a useful starting point, but unlikely to be exhaustive.
If LM candidates have survived the routine tests, there are other bioinformatics tools that might provide further insight. Protein interaction resources such as STRING (87), MINT (88) and IntAct (89) can reveal if a ligand protein is known to be close in the network. Interaction data can also be supplied to DILIMOT and/or SLiMFinder to evaluate whether there is statistical support for motif enrichment (38,39). Enrichment of motifs with UniProt GO terms and other keywords can sometimes provide statistical support for sets of motifs (62,63,90). SIRW is an online tool (http://sirw.embl.de/index.html) that allows keyword exploration for RegExps (91). If enrichment is found, SIRW can provide a probability estimate using Fisher’s Exact Test. Of course, motif enrichment can be an artefact of sequence length or amino acid bias so judgement of the results is required. If the enriched set is not more conserved than the background, then it is unlikely to be biologically meaningful.
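To illustrate the kind of enrichment test mentioned above, the following sketch computes a one-sided Fisher's Exact Test for a 2 × 2 table of proteins, classified by whether they match a motif and whether they carry a given keyword or GO term. It is a generic textbook formulation written for this example; it is not the SIRW implementation, and the function name and the counts are made up.

```python
from math import comb


def fisher_exact_enrichment(motif_and_term, motif_only, term_only, neither):
    """One-sided Fisher's Exact Test p-value for enrichment.

    Returns the probability of observing at least `motif_and_term` proteins
    with both the motif and the annotation term, given fixed table margins
    (hypergeometric upper tail)."""
    with_motif = motif_and_term + motif_only
    without_motif = term_only + neither
    with_term = motif_and_term + term_only
    total = with_motif + without_motif

    upper = min(with_motif, with_term)
    tail = 0
    for x in range(motif_and_term, upper + 1):
        # Ways to place x annotated proteins among the motif-containing ones
        # and the remaining annotated proteins among the others.
        tail += comb(with_motif, x) * comb(without_motif, with_term - x)
    return tail / comb(total, with_term)


# Made-up counts: 3 of 4 motif-containing proteins carry the term,
# against 1 of 4 proteins without the motif.
print(fisher_exact_enrichment(3, 1, 1, 3))
```

As cautioned above, a small p-value from such a test does not rule out artefacts of sequence length or amino acid bias, so it should be read alongside conservation evidence.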
After doing all this, ask once again: Is the motif buried? We think it likely that inaccessible motifs are the most common reason for erroneous LM reports in the literature.
Even when an LM candidate is in the right cell compartment, and survives many other tests, it does not have to be functional as it still may never contact the ligand protein (20). There is increasing evidence that cell signalling decisions are made in large dynamic protein complexes. If a motif-containing protein is never in the same complex as a ligand protein, the motif will be false. For this reason, cell localization assays are useful, although they can be misleading if overexpression is used. Coimmunoprecipitation and pull down experiments are also widely used as part of motif validation. We thought it might be of interest to list the most commonly annotated methods applied in motif validation and these are presented in Table 4. Since no one experiment is definitive, many of these methods will have been applied to a well-validated motif instance.
In common with LM bioinformatics in general, ELM has advanced to a state of practical usefulness, yet there is much more to do. LM RegExp matches cannot yet be taken as indicators of true functional sites and the candidates must be experimentally verified. The ELM dataset is incomplete with respect to motifs reported in the literature and there is work to be done to extend the coverage of the database: currently, users should not use ELM as a sole source of LM information. We have identified a need to improve the data captured regarding interactions of the ELM instances, which currently are of limited use for systems modelling in silico. ELM filtering can be improved in the short to medium term by embedding the CS filter and by using Swiss-Prot topology domains for automated cell compartment filtering of transmembrane proteins. In the ELM output, we would like to present the user with phosphorylation sites and other readily available information about the structure/function modules of query proteins. It is our hope that most of these goals will have been achieved when we next report on ELM.
The ELM Web Service interfaces were developed in the framework of the EU FP5 EMBRACE grant (LHSG-CT-2004-512092). The FIRB 2004 ITALBIONET grant (to A.V.); the NGFN DiGToP grant (to M.S.); the FP6 ProteomeBinders grant (to N.H.). SF development was aided by DAAD and Vigoni covered travel expenses between Heidelberg and Rome. Funding for open access charge: EMBL.
Conflict of interest statement. None declared.
The authors thank the former contributors to the ELM resource, the Bioinformatics developers who have applied the ELM instances to develop discovery methods and the ELM resource users whose web access statistics spurred us on.