Members of the ubiquitin multiprotein family function as covalent modifiers of other proteins. These post-translational modifications (PTMs) can then cause the target protein to be relocated to another subcellular location (Dye and Schulman, 2007). In the case of SUMO (small ubiquitin-like modifier), attachment can affect processes including gene transcription and cell-cycle progression, although the mechanisms by which relocation within the nucleus achieves this are far from clear (Geiss-Friedlander and Melchior, 2007). While sumoylation seems largely restricted to the nucleus, a few non-nuclear proteins have been proposed to be sumoylated (Watts, 2004). SUMO substrates are often difficult to validate because of the low stoichiometry of the modification; however, proteomic approaches have led to the identification of many putative substrates (reviewed in Rosas-Acosta et al.).
Like other PTMs, sumoylation occurs at accessible linear motifs (LMs), usually in natively disordered regions of the polypeptide (reviewed in Diella et al.). Sumoylation takes place on a lysine within a motif that can be described by the pattern ψKxE, where ψ is a hydrophobic residue (Girdwood et al.; Rodriguez et al.), or by the regular expression [VILMAFP]K.E used in the ELM linear motif resource (Puntervoll et al.). Sumoylated proteins that lack the classical consensus motif have also been reported (Zhou et al.).
The ψK.E pattern matches nearly half of the proteins in Swiss-Prot (Yang et al.), indicating that most of the matches are false positives. Consequently, there have been several attempts to extend this motif to gain specificity, and these have identified several extended SUMO consensus motifs. The phosphorylation-dependent sumoylation motif (PDSM), ψK.E..SP, has been described in a subset of substrates, mainly transcriptional regulators. In these substrates, phosphorylation of the SP motif regulates the interaction between the substrate and the SUMO-conjugating machinery, promoting its sumoylation (Hietakangas et al.). In a second analysis, a cluster of acidic residues downstream of the core of many SUMO sites was shown to be important for substrate binding and subsequent sumoylation (Yang et al.). The importance of negative charges was also identified in substrates such as Elk-1 and LRH-1; this extended SUMO consensus motif was named NDSM (negatively charged amino acid-dependent sumoylation motif).
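The patterns above can be applied directly as regular expressions. The Python sketch below scans a sequence for the ELM core [VILMAFP]K.E and for a PDSM-style extension. It also counts acidic residues just downstream of each core site. Several choices here are assumptions made for illustration: ψ in the PDSM is rendered with the ELM hydrophobic set, and the acidic window is six residues long. Because the NDSM is not stated here as an exact pattern, `acidic_downstream` is a hypothetical score rather than a published definition.

```python
import re

# Core SUMO consensus as written in ELM: hydrophobic residue, K, any residue, E.
CORE = "[VILMAFP]K.E"
# PDSM-style extension: core, two residues, then the SP phospho-site.
# Rendering psi with the ELM hydrophobic set is an assumption.
PDSM = CORE + "..SP"

def find_sites(seq, pattern):
    """Return 1-based positions of the acceptor lysine for every match.

    The zero-width lookahead makes re.finditer report overlapping matches
    (possible for the longer PDSM pattern) instead of skipping them.
    """
    return [m.start() + 2 for m in re.finditer(f"(?=({pattern}))", seq)]

def acidic_downstream(seq, lys_pos, window=6):
    """Count D/E residues in a window right after the core E (hypothetical NDSM-like score).

    lys_pos is 1-based, so the core E is at 0-based index lys_pos + 1
    and the window starts at 0-based index lys_pos + 2.
    """
    start = lys_pos + 2
    return sum(aa in "DE" for aa in seq[start:start + window])

seq = "MSVKTEPDSPEEDLKAE"  # made-up sequence, not a real protein
pdsm_sites = find_sites(seq, PDSM)
for k in find_sites(seq, CORE):
    print(k, k in pdsm_sites, acidic_downstream(seq, k))
# 4 True 3
# 15 False 0
```

Patterns this short match very often, so a hit alone carries little evidence. This is the false-positive problem that the extended motifs try to address.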
The C/EBP transcription factors regulate cellular proliferation and differentiation in a range of cell types. They have been described as both tumour promoters and tumour suppressors, indicating that their regulation is complex (Nerlov, 2008). In C/EBPα, a regulatory domain motif (RDM) has been shown to inhibit the activity of an activation domain in a position-independent but dose-dependent manner. The RDM was characterized by the consensus [VIL]K.EP, and sumoylation of the lysine at position 2 was shown to decrease its inhibitory function in vitro (Kim et al.; Nerlov, 2008).
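A rough calculation shows how much more specific the RDM consensus is than the sumoylation core. The sketch below estimates how often each pattern would match by chance. It rests on two assumptions: residues are independent, and the background is uniform at 1/20 per amino acid. Real compositions are not uniform. Several residues in the hydrophobic set, such as Leu, Ala and Val, are among the most abundant, so the true core-match rate is higher than this estimate. The 400-residue protein length is also hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Simplifying assumption: uniform background, not a real database composition.
UNIFORM = {aa: 1 / 20 for aa in AMINO_ACIDS}

# Each motif is a list of allowed-residue strings; "." means any residue.
CORE = ["VILMAFP", "K", ".", "E"]   # ELM sumoylation core [VILMAFP]K.E
RDM = ["VIL", "K", ".", "E", "P"]   # C/EBP RDM consensus [VIL]K.EP

def site_probability(motif, freqs):
    """Probability that the motif matches at one fixed position, assuming independent residues."""
    p = 1.0
    for allowed in motif:
        if allowed != ".":
            p *= sum(freqs[aa] for aa in allowed)
    return p

def prob_at_least_one(motif, length, freqs):
    """Poisson approximation to P(at least one match) in a random protein of the given length."""
    expected = (length - len(motif) + 1) * site_probability(motif, freqs)
    return 1 - math.exp(-expected)

for name, motif in (("core", CORE), ("RDM", RDM)):
    print(name, round(prob_at_least_one(motif, 400, UNIFORM), 3))
```

Under these assumptions a core site is 140/3 (about 47) times more likely to occur by chance than an RDM site. Even so, about 29% of random 400-residue proteins would contain at least one core match.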
A major hindrance to the bioinformatic investigation of LM occurrences is that simple database searches do not yield significant results, because false instances of a motif vastly outnumber true ones. However, with improved sequence database annotation, LMs can sometimes be shown to be significantly enriched in entries annotated with certain keywords. Thus, Copley used transcription-related keywords to detect and justify new examples of the EH1 transcriptional repressor motif (Copley, 2005). A similar approach, combined with disorder prediction and conservation scoring, has shown that KEN-box destruction motifs are significantly enriched in the set of UniProt/Swiss-Prot entries annotated with cell-cycle keywords and Gene Ontology (GO) terms (Michael et al.).
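Keyword enrichment of this kind is commonly assessed with a one-sided hypergeometric test. Suppose a database has N entries, K of which carry a keyword, and n entries contain a motif match, k of which carry the keyword. The test asks how likely a count of k or more would be by chance. The sketch below is a minimal exact implementation that uses only the standard library. It is one common choice and not necessarily the statistic used in the studies cited above. The toy counts in the example are invented.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: N entries, K annotated, n with a motif match."""
    upper = min(K, n)
    # Sum the probabilities of drawing i annotated entries, for every i >= k.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def fold_enrichment(k, N, K, n):
    """Fraction of annotated entries among motif hits, relative to the background fraction."""
    return (k / n) / (K / N)

# Invented toy counts: 20 entries, 5 with the keyword, 4 motif hits, 3 of them annotated.
p = hypergeom_sf(3, 20, 5, 4)
print(round(p, 4), fold_enrichment(3, 20, 5, 4))
# 0.032 3.0
```

For database-scale counts, exact big-integer sums become slow. A log-space implementation, or SciPy's `scipy.stats.hypergeom.sf(k - 1, N, K, n)`, is more practical.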
In this article, we report a computational investigation based on the RDM of C/EBPs. We refined the motif using the aligned C/EBP RDM sequence segments and then applied a protocol combining keyword enrichment, native disorder prediction and conservation scoring in a survey of protein sequence databases. Highly significant results for motif matches were obtained with sets of entries annotated with keywords such as nucleus, transcription and chromatin. The conservation pattern of the C/EBP motif was found to be the archetype of a linear motif, which we term KEPE, that is present in many nuclear proteins of the metazoa.
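The filtering steps of such a protocol can be illustrated with a toy example. The sketch below keeps a motif hit only if two conditions hold: its residues are predicted to be disordered on average, and the motif is preserved in most aligned homologues. Every specific choice here is hypothetical and not taken from this study:

- the disorder scores (any predictor reporting a 0 to 1 value per residue);
- the 0.5 and 0.8 thresholds;
- the use of the ELM core as the motif;
- a gap-free alignment whose columns equal residue positions.

```python
import re
from statistics import mean

# Example motif: the ELM sumoylation core, standing in for whichever refined motif is tested.
MOTIF = re.compile("[VILMAFP]K.E")

def mean_disorder(scores, start, end):
    """Average per-residue disorder score over positions start..end-1 (0-based, end exclusive)."""
    return mean(scores[start:end])

def conservation(aligned, start, end, motif=MOTIF):
    """Fraction of aligned sequences whose window start..end-1 fully matches the motif.

    Assumes a gap-free alignment, so alignment columns equal residue positions.
    """
    hits = sum(1 for row in aligned if motif.fullmatch(row[start:end]))
    return hits / len(aligned)

def keep_hit(scores, aligned, start, end, min_disorder=0.5, min_conservation=0.8):
    """Keep a hit only if it passes both (hypothetical) disorder and conservation thresholds."""
    return (mean_disorder(scores, start, end) >= min_disorder
            and conservation(aligned, start, end) >= min_conservation)

# Toy data: a query (first row), two invented homologues and invented disorder scores.
aligned = ["MAVKTEPS", "MSIKAEPS", "MAVRTEPS"]
scores = [0.1, 0.2, 0.7, 0.8, 0.9, 0.6, 0.5, 0.3]
print(keep_hit(scores, aligned, 2, 6))                        # False: conservation is 2/3
print(keep_hit(scores, aligned, 2, 6, min_conservation=0.6))  # True
```

Applying the filters in sequence, disorder first and conservation second, is one simple design. Scoring schemes that weight the two evidence types together are an alternative.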