Since the first release of MnM (15), the number of minimotif sequences has grown approximately 11-fold to 5089 sequences (). These minimotifs were curated from the primary literature, except for several hundred imported from PDZbase (14). To identify new motifs, we searched PubMed with several sets of keywords; typical keywords included ‘motif’, ‘peptide’ and ‘site’. An expert read each paper and then entered the minimotif into the database. Most of the growth is due to new motif entries; however, part of the increase arises because some earlier annotations described motifs that bound more than one protein. We now consider each motif entry to describe a single binding protein.
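The one-entry-per-binding-protein rule can be illustrated with a minimal sketch. It assumes a legacy annotation is simply a motif sequence plus a list of binding-protein names; the `MotifEntry` type, the `split_by_binding_protein` function and the placeholder names are hypothetical and are not part of MnM.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MotifEntry:
    """Hypothetical entry: one motif sequence paired with one binding protein."""
    sequence: str
    binding_protein: str


def split_by_binding_protein(sequence: str, binding_proteins: List[str]) -> List[MotifEntry]:
    """Expand a legacy annotation listing several binding proteins into one
    entry per protein, skipping repeated names while preserving their order."""
    seen = set()
    entries = []
    for protein in binding_proteins:
        if protein not in seen:
            seen.add(protein)
            entries.append(MotifEntry(sequence, protein))
    return entries


# Usage with placeholder names: one annotation becomes two entries.
entries = split_by_binding_protein("RxxS", ["PARTNER_A", "PARTNER_B", "PARTNER_A"])
```

Deduplicating partner names is a design choice of this sketch, so that a repeated partner in a legacy annotation does not create duplicate entries.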
Growth of minimotif entries in MnM
Complete entries in the first release of MnM had a motif sequence, annotation, identifier, cellular compartment and reference source. In MnM 2, a complete entry replaces the motif annotation with a motif sequence and its source protein (with accession number), an activity, and a target, which can be a protein, nucleic acid, lipid or other small molecule. For protein targets, we have also added support for information about the target region, such as a protein domain. This change enables us to integrate motifs, activities and motif targets with other biological databases. Inclusion in the MnM database still requires that the motif sequence and its activity have been published.
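The MnM 2 entry structure described above can be sketched as a small record type with a completeness check. This is an illustration only: the field names, the `TargetType` categories, the `is_complete` method and the rule that a target region is allowed only for protein targets are all assumptions, not the actual MnM schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TargetType(Enum):
    """Target categories named in the text (labels are hypothetical)."""
    PROTEIN = "protein"
    NUCLEIC_ACID = "nucleic acid"
    LIPID = "lipid"
    OTHER_SMALL_MOLECULE = "other small molecule"


@dataclass
class MnMEntry:
    """Hypothetical MnM 2-style entry; not the real database schema."""
    motif_sequence: str
    source_protein: str
    source_accession: str
    activity: str
    target: str
    target_type: TargetType
    reference: str                        # assumed: the publication backing the entry
    target_region: Optional[str] = None   # e.g. a protein domain

    def is_complete(self) -> bool:
        # Every required text field must be present and non-blank.
        required = [self.motif_sequence, self.source_protein, self.source_accession,
                    self.activity, self.target, self.reference]
        if not all(value.strip() for value in required):
            return False
        # Assumed rule: a target region is only meaningful for protein targets.
        if self.target_region is not None and self.target_type is not TargetType.PROTEIN:
            return False
        return True
```

Keeping the target as a separate typed field, rather than free-text annotation, is what would let such a record be joined against external protein, nucleic acid or small-molecule databases.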
We also now designate whether a minimotif sequence is a consensus or an instance of a protein or peptide. The number of consensus sequences has grown approximately 3-fold, from 312 in the original MnM release to 858. We have also begun annotating verified instances of minimotifs, which have grown approximately 100-fold to 4229 peptide sequences. Most of the new minimotifs are binding motifs, which have grown approximately 29-fold, whereas post-translational modifications have grown approximately 5-fold. We have also split several earlier annotations that covered either multiple minimotif sequence sources or multiple targets into separate entries. This artificially inflates the number of entries, but it properly segregates the information, reduces ambiguity and allows the database to be mined in new ways. The number of references has grown approximately 5-fold, which likely reflects the growth of the database over the past 2 years more accurately.
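The growth figures above are fold changes, that is, the new count divided by the original count. A minimal sketch using the consensus-sequence counts from the text shows how 312 to 858 corresponds to the reported "approximately 3-fold"; the `fold_change` helper is hypothetical.

```python
def fold_change(old: int, new: int) -> float:
    """Return new / old, the fold growth of a count (hypothetical helper)."""
    if old <= 0:
        raise ValueError("baseline count must be positive")
    return new / old


# Consensus sequences: 312 in the first release, 858 now.
consensus_fold = fold_change(312, 858)  # 2.75, i.e. approximately 3-fold
```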