Overview of LIM domain identification and classification
In the course of this study, we adopted the classification scheme previously put forth for homeodomain proteins. In this scheme, a class contains one or more families that, in turn, contain one or more proteins. A protein family is usually defined as containing all proteins that descended from a single ancestral protein in the last common ancestor of bilaterians, while classes reflect deep evolutionary relationships between multi-domain proteins with distinct domain architectures. We divided the previously defined groups of LIM domains into 14 classes (ABLIM, CRP, ENIGMA, EPLIN, LASP, LIMK, LHX, LMO, LMO7, MICAL, PXN, PINCH, TES, ZYX). The term “superclass” is used to refer to the entire repertoire of LIM proteins.
We used the LIM hidden Markov model (HMM) from PFAM as a query against nine predicted proteomes: Capsaspora owczarzaki (Filasterea), Salpingoeca rosetta (Choanoflagellatea), Monosiga brevicollis (Choanoflagellatea), Amphimedon queenslandica (Porifera), Mnemiopsis leidyi (Ctenophora), Nematostella vectensis (Cnidaria), Trichoplax adhaerens (Placozoa), Drosophila melanogaster (Arthropoda), and Homo sapiens (Vertebrata); see for the relationships between these species. We retrieved a total of 623 LIM domains from 265 proteins and constructed a multiple sequence alignment by aligning each individual sequence to the LIM HMM (Fig. S1). We then used this alignment and multiple starting trees to generate phylogenetic trees under both Bayesian inference and maximum likelihood frameworks. We evaluated the likelihood of each of these trees and selected the tree with the highest likelihood for further analysis (see also Fig. S2). We repeated this process on an alignment consisting of only human LIM sequences (Fig. S4). For both datasets, we generated 100 bootstrap replicates, which showed poor support for most clades.
Origin of LIM classes and families
Given this poor statistical support, we used a consensus approach to identify consistently recovered clades. We generated a strict consensus tree from a pruned version of the multi-species tree and the tree from the human-only dataset. We designated each of the 38 clades radiating from the midpoint of this strict consensus tree as a human LIM homology group. Of the 171 human LIM sequences, only 12 were placed in homology groups with three or fewer taxa. By superimposing these homology groups onto the multi-species tree, we placed 392 of the 473 non-human LIM sequences into these homology groups using a nearest-neighbor approach (see Methods). The 59 proteins that could not be classified shared a most recent common ancestor with human taxa from multiple homology groups and did not belong to a lineage diverging just outside a clade belonging to a single homology group (see the “Unclassified” section of Table S1).
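The strict-consensus step can be illustrated with a small sketch. This is not the pipeline used in this study: it assumes both trees have already been pruned to the same set of human LIM sequences and rooted at the midpoint, and it represents each rooted tree as nested Python tuples. The helper names (`clades`, `strict_consensus`, `homology_groups`) and the toy identifiers are hypothetical. A strict consensus keeps only the clades recovered in every input tree; the maximal clades hanging from its root play the role of the homology groups described above.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical representation: a rooted tree as nested tuples of leaf names.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set of every internal node (including the root)."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def strict_consensus(*trees: Tree) -> Set[FrozenSet[str]]:
    """Clades present in every input tree (all trees must share one leaf set)."""
    shared = clades(trees[0])
    for tree in trees[1:]:
        shared &= clades(tree)
    return shared


def homology_groups(consensus: Set[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Maximal clades below the root; leaves attached directly to the root are skipped."""
    root = max(consensus, key=len)
    proper = [c for c in consensus if c != root]
    return [c for c in proper if not any(c < other for other in proper)]


# Toy example with hypothetical identifiers: the trees disagree inside one group.
tree_a = ((("h1", "h2"), "h3"), ("h4", "h5"))
tree_b = (("h1", ("h2", "h3")), ("h4", "h5"))
groups = homology_groups(strict_consensus(tree_a, tree_b))
print(sorted(sorted(g) for g in groups))  # [['h1', 'h2', 'h3'], ['h4', 'h5']]
```

The conflicting inner clades ({h1, h2} versus {h2, h3}) are dropped, while the two groups recovered by both trees survive, which mirrors how a strict consensus absorbs poorly supported internal branches.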
We retrieved the full amino acid sequences of all 265 hypothetical proteins and scanned them for non-LIM PFAM domains using HMMER. We also scanned these sequences for motifs using the motif discovery program MEME. We used the following criteria to define the domain architecture of a particular LIM protein: (1) the number of LIM domains, (2) the presence of any non-LIM PFAM domains, (3) the presence of any sequence motifs, and (4) the arrangement of these features. We used these domain architectures, along with the assignment of each LIM domain to one of the homology groups described above, as parallel lines of evidence to systematically place each protein into one of the 14 LIM classes (Table S1).
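To make the four architecture criteria concrete, the sketch below records one protein's architecture from a list of annotated features. It is a hypothetical illustration rather than the authors' code: the `Feature` record, its fields, and the example coordinates are assumptions, and parsing of real HMMER or MEME output is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Feature:
    """One annotated feature; fields are illustrative, not the authors' schema."""
    name: str   # e.g. "LIM", "VHP", or a motif label
    kind: str   # "domain" (from an HMM scan) or "motif" (from motif discovery)
    start: int  # 1-based start position in the protein


def architecture(features: List[Feature]) -> Tuple[str, ...]:
    """Feature names ordered from the amino- to the carboxyl-terminus."""
    return tuple(f.name for f in sorted(features, key=lambda f: f.start))


def summarize(features: List[Feature]) -> Dict[str, object]:
    """Collect criteria (1)-(4) for one protein."""
    arch = architecture(features)
    other = sorted({f.name for f in features if f.kind == "domain" and f.name != "LIM"})
    motifs = sorted({f.name for f in features if f.kind == "motif"})
    return {
        "lim_count": arch.count("LIM"),  # criterion (1): number of LIM domains
        "other_domains": other,          # criterion (2): non-LIM domains
        "motifs": motifs,                # criterion (3): sequence motifs
        "order": arch,                   # criterion (4): arrangement of features
    }


# Hypothetical ABLIM-like protein: four LIM domains followed by a VHP domain.
hits = [Feature("VHP", "domain", 700)] + [
    Feature("LIM", "domain", s) for s in (10, 70, 130, 190)
]
print(summarize(hits))
```

Keeping the ordered tuple alongside the counts lets two proteins with the same domain content but a different arrangement be told apart, which is what criterion (4) is for.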
ABLIM genes code for focal adhesion and adherens junction scaffolding proteins that mediate interactions between actin filaments and cytoplasmic targets; they also activate cytoskeletal signaling cascades that lead to transcription. These proteins consist of a carboxyl-terminal villin headpiece (VHP) domain and four amino-terminal LIM domains. The domain architecture of ABLIM proteins makes them important components of cell-cell adhesion in epithelial tissues: the VHP domain confers F-actin-binding properties, while the LIM domains localize these proteins to adherens junctions. Defects in the Drosophila ABLIM protein unc-115 lead to axon navigation errors.
In addition to the three human ABLIMs, we found a single ABLIM with the canonical architecture of four LIM domains and a VHP domain in Drosophila, and Amphimedon (Table S1).
has two ABLIM proteins: one with a VHP domain and one without. Similarly, Trichoplax has two ABLIM proteins, both lacking the VHP domain; one of these also lacks the most carboxyl-terminal LIM domain. Capsaspora, and Salpingoeca do not have ABLIM proteins, suggesting that ABLIM is a metazoan novelty.
CRP is an ancient class of LIM proteins. It is the only LIM class that includes proteins from plants and the amoeba Dictyostelium discoideum. As in plants, animal CRP proteins have been reported to modulate cytoskeletal dynamics. CRP proteins stabilize α-actinin and are involved in scaffolding at focal adhesions. They can also shuttle to the nucleus, where they serve as transcriptional regulators. A CRP gene in Nematostella is expressed in the developing mesenteries, the coelenteron lining, and the tentacles, all of which are muscle-associated tissues.
CRP proteins typically contain two LIM domains separated by a linker of approximately 50 residues, although some class members contain only a single LIM domain. A conserved glycine-rich motif of 15–20 amino acids lies at the carboxyl-terminal end of each LIM domain. In human CRP1, this motif is required for localization to the cytoskeleton and for actin bundling. This region may also overlap with a CRP nuclear localization signal.
If we root our multi-species tree with CRP, which is reasonable given that CRP is present in plants, the LIM domains of this class form a nearly monophyletic clade (see also Fig. S2). All but four of the proteins within this clade have a glycine-rich motif. Two of these four (Nv_68197 and Aq_223000) appear to be partial isoforms of CRP proteins that are already represented in our dataset (Nv_78916 and Aq_229999). We consider these proteins to be misannotated and have removed them from our table of classified LIM proteins (Table S1). An alternative gene model for the single-LIM protein Nv_7949 encodes two CRP LIM domains and a glycine-rich motif; we have therefore assigned this protein to the CRP class. We have classified Co_04145T0 (from Capsaspora) as “unclassified” rather than as a bona fide CRP, since we were unable to find any corroborating evidence allying this protein with the CRP class.
We identified six CRP proteins in humans, eight in Nematostella, one in Mnemiopsis, two in Amphimedon, and two in Capsaspora. Two Drosophila CRP-related proteins each contain five tandemly duplicated LIM domains and glycine-rich motifs. We were unable to unambiguously recognize CRP proteins in Trichoplax, or Monosiga.
The ENIGMA class consists of three families with differing numbers of LIM domains: Alp family proteins have one, Enigma family proteins have three, and Tungus family proteins have four. The proteins of this class include a PDZ domain that binds α-actinin and modulates actin dynamics. ENIGMA proteins are able to enter the nucleus to modulate gene expression and signal transduction.
In addition to the LIM and PDZ domains, two motifs have been described in a subset of the ENIGMA class of proteins. The Zasp (ZM) motif helps localize the Pdlim7 protein to α-actinin. Using the HMM from the SMART database, we identified this motif (Table S2) in the Drosophila Tungus protein, the human Alp proteins Pdlim1 and Pdlim3, and the human Enigma protein Ldb3 (Table S1). This suggests that the ZM motif was established prior to the divergence of the Alp, Enigma, and Tungus families.
A second motif of unknown function, the Alp motif (AM), was previously thought to be present only in the Alp family of proteins (e.g., human Pdlim1–4). However, we find that most of this motif is conserved in all members of the human Enigma family (Pdlim5, Ldb3, and Pdlim7). In addition, we recovered the Alp motif in Nematostella and Mnemiopsis Enigma proteins (Nv_231944, Ml_108023b), as well as in a Tungus protein encoded by the cephalochordate Branchiostoma floridae (Bf_123730). This suggests that the Alp motif was also established prior to the divergence of these three families.
In Drosophila, the single ENIGMA class protein, Tungus, has a PDZ domain and four LIM domains. The first Tungus LIM forms a clade with the LIM domain of the Alp family, while the other three LIM domains are each related to one of the three Enigma LIM domains (see also Figs. S2 and S3). Besides Drosophila, Tungus is present in the nematode Caenorhabditis elegans (Ce_alp-1) and the invertebrate chordate Branchiostoma floridae (Bf_123730), but it is absent from all other species in our study (Fig. S6).
We found a single Enigma protein in each of Nematostella, Trichoplax, Mnemiopsis, and Amphimedon. We did not find an Enigma protein in Drosophila or C. elegans, but, in addition to the three human Enigma proteins, we detected one Enigma in the lophotrochozoan Capitella teleta (JGI Capca1|63591). We were unable to recover an Alp protein from any of the non-bilaterian species, Drosophila, or C. elegans, but we did find Alp proteins in Capitella (JGI Capca1|190169), Branchiostoma (Bf_124330), and humans.
A previous study, based on the distribution of domains and the relationships of a limited set of bilaterian LIM proteins, suggested that a Tungus-like ancestor gave rise to the Alp and Enigma families. However, this hypothesis seems unlikely given the presence of the Enigma family in Capitella, as well as in non-bilaterian genomes; none of these data were available at the time of the previous study. The presence of the Alp motif throughout the ENIGMA class further contradicts this hypothesis. Given these new data, the most parsimonious explanation is that an Enigma-like ancestor originated in the stem of the Metazoa and gave rise to the Alp and Tungus families in the stem of the Bilateria.
EPLIN class proteins promote the bundling and stabilization of actin stress fibers and act as scaffolds that link cell adhesion machinery (specifically, cadherin-catenin complexes) to the cytoskeleton. The product of the mammalian EPLIN gene Lima1 localizes to the cleavage furrow during early embryogenesis (potentially as a recruiter protein) and is required for cytokinesis. Xirp2 is expressed in skeletal muscle and intercalated discs, and it is required for normal heart development in mice.
We identified a highly conserved 22-amino acid motif, which we have named the Eplin motif, adjacent to the carboxyl-terminus of the EPLIN LIM domain (Table S2). In addition to the human Lima1 and Xirp2 proteins, we identified this motif-domain combination in a third human protein, Limd2. We also found a single EPLIN class protein with this architecture in each of Drosophila, Trichoplax, Nematostella, as well as three in Monosiga, which dates the origin of this class to before the last common ancestor of Capsaspora and Metazoa.
The Amphimedon EPLIN also contains a troponin-like interaction domain, potentially for binding to either actin or tropomyosin. The Salpingoeca EPLIN contains a SLyX domain of unknown function. One of the Monosiga proteins has a carboxyl-terminal cyclic nucleotide-binding domain and an EF-hand domain. We were unable to identify an obvious EPLIN in Mnemiopsis.
The three vertebrate LASP proteins (Lasp1, Nrap, and Nebl) are closely related to the non-LIM protein Neb. Like Neb, LASP proteins are able to stabilize both F-actin filaments and focal adhesion plaques via nebulin repeats. Nrap is a striated muscle protein involved in myofibril assembly and sarcomere organization. The Nebl gene encodes multiple isoforms, including two with the characteristic LASP domain architecture and one with a non-LIM architecture. The latter, also known as Nebulette, contains over 20 nebulin repeats and no LIM domains. The two LIM domain-containing isoforms (also known as Lasp2) are most highly expressed in the brain, where they act as actin cross-linking structural proteins. Lasp1 is the only known nebulin-repeat protein found in the nucleus as well as the cytoplasm.
Human Lasp1 contains a single LIM domain followed by two nebulin repeats and an SH3 domain. Nebl has a similar architecture but with an additional nebulin repeat, while Nrap contains numerous nebulin repeats and lacks an SH3 domain. We identified a single LASP protein with a LIM domain, two nebulin repeats, and an SH3 domain in each of Drosophila, Mnemiopsis, and Amphimedon. Three tandemly duplicated genes encoding proteins with the same architecture were also found in Nematostella. No LASP class proteins were found in Trichoplax. A single related protein with only one nebulin repeat was identified in each of the two choanoflagellates and Capsaspora; however, the Monosiga homolog contains two additional carboxyl-terminal SH3 domains, while the Salpingoeca homolog contains three. This phylogenetic distribution suggests that the LASP class originated prior to the last common ancestor of Capsaspora and Metazoa.
Domain spacing is highly conserved in all animal LASP proteins except Nrap. The first nebulin repeat always occurs exactly 67 amino acids from the amino-terminus, while the second occurs at or near amino acid position 102. Likewise, the LIM domain always occurs five or six positions from the amino-terminus. Furthermore, the distance between the LIM domain and the first nebulin repeat in animals (62 amino acids) is identical to the corresponding interval between the LIM domain and the single nebulin repeat in the Capsaspora and Salpingoeca LASPs. The spacing in human Nebl is also consistent with this trend. All five of the LASP class proteins from the non-human metazoans in this study contain two rather than three nebulin repeats, suggesting that the domain architecture of Lasp1, rather than that of Nebl, is the ancestral configuration.
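The spacing comparison amounts to simple arithmetic on domain start positions. The sketch below is illustrative only: it assumes distances are measured from start to start (one convention under which a LIM domain at position 5 and a nebulin repeat at position 67 are 62 residues apart; the convention behind the figures above is not stated here), and the function names, the `tolerance` parameter, and the example coordinates are hypothetical.

```python
from typing import List, Tuple

# (feature name, 1-based start position); coordinates below are hypothetical.
Hit = Tuple[str, int]


def start_offsets(hits: List[Hit]) -> List[Tuple[str, str, int]]:
    """Start-to-start distances between consecutive features, N- to C-terminal."""
    ordered = sorted(hits, key=lambda h: h[1])
    return [(a[0], b[0], b[1] - a[1]) for a, b in zip(ordered, ordered[1:])]


def spacing_conserved(hits: List[Hit], expected: List[int], tolerance: int = 1) -> bool:
    """True if every consecutive offset lies within `tolerance` of its expected value."""
    offsets = [d for _, _, d in start_offsets(hits)]
    return len(offsets) == len(expected) and all(
        abs(o - e) <= tolerance for o, e in zip(offsets, expected)
    )


# A Lasp1-like layout built from the positions quoted above (LIM at 5,
# nebulin repeats at 67 and 102); the SH3 position is arbitrary.
lasp_like = [("LIM", 5), ("nebulin", 67), ("nebulin", 102), ("SH3", 200)]
print(start_offsets(lasp_like))
print(spacing_conserved(lasp_like[:3], expected=[62, 35]))
```

A small tolerance accommodates the one-residue variation in LIM position noted above; a protein such as Nrap, with many more repeats, fails the check simply because its offset list is longer than the expected one.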
Outside of the LASP class, we were unable to find other nebulin repeat-containing proteins in any of the non-human species in this study. This is consistent with previous studies that report finding nebulin repeat-containing proteins only in vertebrates and the cephalochordate Branchiostoma floridae. This phylogenetic distribution supports the hypothesis that an ancestral LASP gene gave rise to all genes encoding nebulin repeats in metazoan evolution. The rigid spatial requirements on the domains of the LASP proteins might explain why there have been so few redeployments of nebulin repeats in the evolution of animals.
LIM homeodomain proteins (LHX) are transcription factors that usually consist of two amino-terminal LIM domains and one carboxyl-terminal homeodomain. This class of LIM proteins plays an important role in tissue specification, particularly in the nervous system, where LHX proteins act in combination to determine neuronal fates. This cooperative interaction has been termed the “LIM code.”
In vertebrates, LHX proteins are involved in patterning the head and limbs and in the organogenesis of the forebrain, spinal cord, pituitary, heart, kidneys, eyes, and pancreas. In Drosophila, LHX proteins are involved in axon guidance, patterning, and muscle formation. LHX gene expression has been observed in presumptive neural territories during Nematostella development and in the photoreceptor ring of Amphimedon.
Previous studies have suggested that LHX proteins are metazoan innovations. Consistent with these studies, we recovered LHX proteins from all of the metazoans in our study, whereas none were found in the three non-metazoan proteomes. This phylogenetic distribution suggests that LHX proteins originated in the stem of the Metazoa. In total, we recovered three Amphimedon, four Mnemiopsis, four Trichoplax, six Nematostella, six Drosophila, and 12 human LHX proteins (Table S1). Trichoplax has two additional LHX proteins that are absent from JGI's proteome version 1.0 but were described by Srivastava and coauthors, for a total of six LHX proteins.
Unlike LHX transcription factors, nuclear “LIM only” (LMO) proteins lack a DNA-binding homeodomain. However, each of the two LIM domains of the LMO proteins forms a clade with the corresponding LIM domain of the LHX proteins, suggesting that these two classes are sister groups (see also Fig. S2).
LMO proteins regulate gene expression by binding transcription factors and other nuclear proteins. For example, in many cell types, LMO proteins are co-expressed with LHX proteins and are thought to antagonize selected LHX combinations. In this way, LMO proteins negatively regulate the “LIM code.”
In addition to the four human and two Drosophila LMO proteins, we identified three LMO proteins in Nematostella and one in Trichoplax. No LMO proteins were recovered from Capsaspora, or Amphimedon. Given the phylogenetic distribution of these lineages and the correspondence between the two LIM domains of LMO and LHX proteins in our tree (see also Figs. S2 and S3), the most parsimonious explanation is that an ancestral LHX-like gene lost its homeobox somewhere in the stem of the ParaHoxozoa, thereby giving rise to the LMO class.
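The parsimony arguments used throughout this section place a class's origin on the branch leading to the last common ancestor of every species that carries it, with absences inside that clade explained as losses. A minimal sketch of that reasoning follows. The species tree is our simplified assumption for illustration (the node labels Filozoa, Choanozoa, and ParaHoxozoa are ours, and Porifera, Ctenophora, and ParaHoxozoa are left as an unresolved trichotomy to avoid committing to a branching order); `lineage` and `origin_node` are hypothetical helper names.

```python
from typing import Dict, Iterable, List

# Assumed species tree for the nine taxa in this study, as child -> parent.
# Porifera, Ctenophora, and ParaHoxozoa form an unresolved trichotomy, as do
# Placozoa, Cnidaria, and Bilateria; this is a deliberate simplification.
PARENT: Dict[str, str] = {
    "Capsaspora": "Filozoa",
    "Choanozoa": "Filozoa",
    "Choanoflagellatea": "Choanozoa",
    "Monosiga": "Choanoflagellatea",
    "Salpingoeca": "Choanoflagellatea",
    "Metazoa": "Choanozoa",
    "Amphimedon": "Metazoa",
    "Mnemiopsis": "Metazoa",
    "ParaHoxozoa": "Metazoa",
    "Trichoplax": "ParaHoxozoa",
    "Nematostella": "ParaHoxozoa",
    "Bilateria": "ParaHoxozoa",
    "Drosophila": "Bilateria",
    "Homo": "Bilateria",
}


def lineage(taxon: str) -> List[str]:
    """Path from a taxon up to the root, inclusive."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def origin_node(present_in: Iterable[str]) -> str:
    """Last common ancestor of all taxa carrying the class (single-origin view)."""
    taxa = list(present_in)
    shared = set(lineage(taxa[0]))
    for t in taxa[1:]:
        shared &= set(lineage(t))
    # Walking up from any one taxon, the first shared node is the LCA.
    return next(node for node in lineage(taxa[0]) if node in shared)


print(origin_node(["Homo", "Drosophila", "Nematostella", "Trichoplax"]))  # LMO
```

For the LMO distribution, the function returns ParaHoxozoa, matching the inference above. Because absences are never penalized, this single-origin view is a simplification; a full parsimony analysis would also weigh scenarios with independent gains.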
LIMK proteins are serine/threonine kinases that inhibit actin disassembly by phosphorylating cofilin proteins. Through this interaction, LIMK proteins regulate cell spreading, motility, growth, and cytokinesis. Moreover, LIMK proteins localize to focal adhesions, where they catalyze signaling cascades, or can be shuttled to the nucleus, where they regulate transcription. Homodimerization of LIMK proteins may inhibit kinase activity or, in complex with a mediator, enhance it.
LIMK proteins contain two amino-terminal LIM domains, a PDZ domain, and a kinase domain. In addition to the human LIMK1 and LIMK2 proteins, we identified single LIMKs in Drosophila, and Amphimedon. No LIM domains from Trichoplax, or Monosiga are present in the two clades that comprise the LIMK LIM domains (see also Fig. S2). Furthermore, we were unable to identify any proteins with both a kinase domain and a LIM domain from these four species; LIMK therefore appears to be absent from them.
Capsaspora has three proteins with both kinase and LIM domains. We excluded two of these (Co_06515T0 and Co_08582T0) from the LIMK class: both have atypical domain architectures that lack PDZ domains, and each contains more than two LIM domains, none of which share phylogenetic affinity with the bona fide LIMK LIM domains. The third (Co_05847T0) has a typical LIMK domain architecture but also contains an additional TFIIA domain (Pfam PF03153). Although the first LIM domain of this protein is highly divergent, the second is phylogenetically related to the second LIM domain of the metazoan LIMK proteins (see also Fig. S2). We have therefore classified this protein as a true LIMK and date the origin of this class prior to the last common ancestor of animals and Capsaspora.
The canonical LMO7 proteins consist of a CH domain, a PDZ domain, and a single LIM domain. The mammalian Lmo7 protein is involved in actin polymerization and F-actin stabilization. It localizes to focal adhesions but, in response to mechanical stress, can shuttle to the nucleus, where it is a potent transcriptional regulator.
We found related single-LIM proteins in both Drosophila and Nematostella. The Drosophila protein (Dm_CG31534), which lacks both PDZ and CH domains, had previously been designated as an LMO7. In Nematostella, we recovered a single protein (Nv_216756) with a LIM domain and a degraded CH domain, but no PDZ domain. Interestingly, we identified LMO7 proteins, each with a single PDZ and CH domain, in Amphimedon and Mnemiopsis, but did not find any LMO7 proteins in the non-metazoan species. The presence of these proteins in the two earliest-diverging animal lineages suggests that LMO7 originated in the stem of the Metazoa.
According to our phylogenetic analysis, the human Limch1 and Znf185 proteins are closely related to human Lmo7 (Fig. S4). Limch1 contains a single LIM domain and a CH domain but lacks the PDZ domain. Znf185 lacks both the PDZ and CH domains but, unlike other LMO7 class proteins, has an amino-terminal actin-targeting domain (ATD), which is required for Znf185 to localize to actin-regulated structures. In our multi-species tree (see also Fig. S2), Limch1 and Znf185 form a clade with human Lmo7 and Drosophila Lmo7 within the larger LMO7 clade, suggesting that these proteins are likely the products of bilaterian-specific gene duplications.
MICAL is a class of single-LIM-domain proteins consisting of the Mical and Mical-like families. Proteins of the Mical family are involved in destabilizing actin for neuronal growth and axon guidance during embryogenesis. They are expressed throughout adulthood in the lung, brain, heart, and thymus, and particularly in neuronal and muscular tissues. Mical-like proteins are involved in vesicular trafficking and the recycling of tight junction components.
In addition to a single LIM domain, MICAL class proteins have an actin-binding calponin homology (CH) domain and a highly conserved carboxyl-terminal region, represented by the PFAM model DUF3585 (Pfam PF12130). The Mical family is distinguished from the Mical-like family by an additional amino-terminal catalytic FAD-binding/oxidoreductase domain, which is required for Mical to bind F-actin. We found that the Pfam FAD-binding HMM (Pfam PF01494.12) was not sensitive enough to identify all FAD-binding domains of the Mical family. Furthermore, we found that the entire region from the amino-terminus to the CH domain, which includes the FAD-binding domain in Mical family proteins, is highly conserved across Metazoa. We therefore constructed two HMMs to represent the regions surrounding the PFAM-predicted FAD-binding domain in Mical family proteins (Fig. S9).
We were unable to identify any MICAL class proteins in the non-animal genomes in this study. In contrast, both Mical and Mical-like proteins were found in every animal we investigated except Trichoplax, which encodes a single Mical protein. This phylogenetic distribution suggests that the MICAL class, as well as the Mical and Mical-like families, were established in the metazoan stem. In an attempt to better resolve the relationships between the ENIGMA, LIMK, LMO7, and MICAL classes, we performed a phylogenetic analysis of the PDZ and CH domains of these proteins; the results were inconclusive and are therefore not shown.
Like ABLIM, PXN (Paxillin) is a class of focal adhesion scaffolding and integrin-mediated signaling proteins. PXN proteins contain four carboxyl-terminal LIM domains, which localize these proteins to focal adhesions. They also contain one or more amino-terminal LD motifs, which are short leucine-aspartate-rich regions with the consensus sequence LDxLLxxL. These LD motifs are required for interaction with many other proteins.
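Because the LD motif is given here only as a consensus, a first-pass scan can be written as a regular expression. This is a deliberately strict sketch: real LD motifs often deviate from LDxLLxxL, and it is not necessarily how the LD motifs in this study were annotated. The function name `find_ld_motifs` and the test sequence are hypothetical.

```python
import re
from typing import List, Tuple

# LDxLLxxL consensus, where x is any residue. The lookahead lets
# overlapping matches be reported separately.
LD_MOTIF = re.compile(r"(?=(LD.LL..L))")


def find_ld_motifs(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched residues) for each LDxLLxxL match."""
    return [(m.start() + 1, m.group(1)) for m in LD_MOTIF.finditer(sequence.upper())]


print(find_ld_motifs("AALDELLAQLKK"))  # [(3, 'LDELLAQL')]
```

Relaxing the pattern (for example, allowing other hydrophobic residues at the L positions) would recover more divergent motifs at the cost of more false positives.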
When phosphorylated, PXNs can recruit protein complexes to focal adhesions and regulate Rho GTPase signaling to effect cell adhesion, spreading, motility, and survival. In human cells, the Tgfb1i1 and Pxn proteins have been shown to shuttle between the cytoplasm and nucleus, where they serve as nuclear receptor co-activators.
PXNs are found in both fungi and amoebae and are therefore an ancient class of LIM protein. We found a single PXN in each genome we surveyed except the human genome, which encodes three (Table S1). We identified LD motifs in the PXNs of all animals and Capsaspora, but not in either of the choanoflagellates. In addition to a true PXN protein, Capsaspora has a PXN-like protein with four divergent PXN LIM domains and a Rap-GAP domain, but no identifiable LD motifs (Co_06505T0 in Table S1).
PINCH (sometimes called LIMS) proteins are adapters responsible for focal adhesion assembly and for linking integrins to multiple signaling pathways. PINCH proteins form complexes with integrins at muscle attachment sites and have also been shown to shuttle to the nucleus in Schwann cells and neurons.
PINCH proteins contain five tandem LIM domains. We also identified a highly conserved, leucine-rich PINCH motif of twelve amino acids, which occurs immediately carboxyl-terminal to the five LIM domains (Table S2). We found a single PINCH protein in Drosophila, and Amphimedon. The Mnemiopsis genome encodes two PINCH proteins, and the human genome encodes three (Table S1). No PINCH proteins were observed in either of the choanoflagellates, but a PINCH protein exists in Capsaspora, which places the origin of the PINCH class prior to the last common ancestor of metazoans and Capsaspora.
The TES class consists of the Tes, Etes, and Fhl families. The PET domain is a highly conserved putative protein-protein interaction domain that is specific to metazoans and choanoflagellates. This domain is characteristic of the Tes and Etes families. The Fhl family originated recently in evolution and is characterized by the loss of the PET domain.
We identified two novel motifs in TES class proteins, which we call TMA1 and TMA2 (Table S2). These motifs always occur amino-terminal to the PET domain (Table S1). Seven of the TES class proteins have both motifs, which in all cases are separated by 17 or 18 amino acids. This suggests that they are part of a larger motif of ~60 amino acids. Eighteen of the 28 proteins that make up the Tes and Etes families have at least one of these motifs (Table S1). In the human Lmcd1 protein, the region corresponding to the TMA2 motif is reported to bind the GATA6 transcription factor, suggesting that this motif may be involved in transcriptional regulation. We did not detect this motif in any of the Fhl proteins. The presence of this motif in Tes family proteins of Monosiga suggests that it was one of the founding components of the class.
Proteins of the Tes family are characterized by an amino-terminal PET domain and two or three carboxyl-terminal LIM domains. The PET domain can bind the protein's own LIM domains and thereby alter its set of binding partners; this, in turn, regulates its cellular localization. Human Tes localizes to focal adhesions and is involved in cell spreading. It has also been detected in the nucleus and may shuttle between the nucleus and cytoplasm, similar to other LIM proteins.
Drosophila Prickle and human Prickle1 and Prickle2 are classically described as core components of the non-canonical Wnt planar cell polarity (PCP) pathway. In this pathway, these proteins antagonize Dsh on the proximal side of the cell, promoting the formation of a distal Fz-Dsh complex and thereby establishing cell polarity.
We identified Tes family proteins in all species surveyed except Capsaspora. This phylogenetic distribution suggests that Tes proteins originated just prior to the last common ancestor of choanoflagellates and animals.
We have designated TES class proteins that contain a PET domain and six LIM domains as the Etes (for “Extended testin”) family. We recovered one Etes family protein from each of Drosophila and Amphimedon and two from Nematostella. There is limited literature describing the Etes proteins from these three species. However, the C. elegans ortholog, lim-8, is a component of the focal adhesion complex at muscle wall sarcomeres and is expressed in neurons, depressor muscles, and other tissues. The presence of an Etes protein in Amphimedon, but not in any of the non-metazoans, suggests that this family originated in the stem lineage of Metazoa.
Fhl (for “Four and a half LIM”) proteins contain four LIM domains and a LIM-like amino-terminal zinc-finger domain (the “half LIM”). These five domains are homologous, one to one, with the five carboxyl-terminal LIM domains of Nematostella and Drosophila Etes family proteins. Humans lack an Etes family protein and are the only species in our study with Fhl proteins. The most parsimonious explanation for these data is that an ancestral Etes-like protein lost its PET domain somewhere in the lineage leading to humans after it split from the lineage leading to Drosophila.
Members of the human Fhl family are highly expressed in striated muscle, osteoblasts, and testes, where they have documented interactions with more than 50 other proteins. They are involved in integrin-mediated, Notch, TGF-β, and Rho signaling; transcriptional co-activation and co-repression; cell differentiation; cytoskeletal remodeling; and the mechanical stress response. Their involvement in skeletal and cardiac myopathies and in metastatic cancers is well characterized.
ZYX (Zyxin) class proteins act as adapters that facilitate the assembly of protein complexes at focal adhesions and take part in trafficking to and from the nucleus. ZYX proteins are characterized by three closely spaced carboxyl-terminal LIM domains that are required for localization to focal adhesions and adherens junctions. The amino-terminal region of ZYX proteins is highly variable, leading to a diverse set of binding partners within the class. ZYXs are implicated in cell fate determination, cell motility, oncogenesis, and cell growth. Recent work has shown that ZYXs also play a role in microRNA silencing and telomere protection.
We recovered seven ZYX proteins from humans, three from Drosophila, two from Nematostella, and one each from Amphimedon. We were not able to identify any ZYX proteins in the Trichoplax or non-animal genomes. The phylogenetic distribution of the ZYX class suggests that it arose in the stem of the Metazoa.
We identified a leucine-rich amino-terminal motif in Drosophila Jub, five of the seven human ZYXs, and one of the Nematostella ZYXs. In the human LPP protein, this motif overlaps with a functional leucine-rich nuclear export signal. We used the NetNES algorithm to predict putative nuclear export signals in the non-bilaterian ZYXs and found one that overlaps with this same motif in the Nematostella ZYX protein. We also found putative nuclear export signals in the Mnemiopsis ZYXs despite the absence of the motif in these proteins, suggesting that nuclear shuttling is an ancestral trait of this class.
Fifty-nine proteins did not meet the criteria required for inclusion in one of the LIM classes. Depending on the complexity of the domain architecture in a class, we applied an appropriate subset of the following requirements: (1) conservation of the number of LIM domains, (2) phylogenetic affinity of the LIM domains with the LIM domains of human proteins within the class, (3) presence of the non-LIM domains and/or motifs that are characteristic of the class, and (4) correct order of LIM and non-LIM domains and/or motifs.
Most of these 59 proteins have domain architectures not seen in any of the described classes. Many of these proteins could not be categorized because they represent lineage-specific innovations that no longer fit the criteria for membership in an existing class. Others may be the result of erroneous gene predictions in the genomic region of a classifiable LIM gene. However, we were able to identify a group of possibly related proteins from Drosophila, Trichoplax, and Amphimedon (Dm_Rassf, Aq_215865, Ta_55975) with a conserved architecture consisting of an amino-terminal LIM domain and a carboxyl-terminal Ras-association domain (Pfam PF00788). Further phylogenetic analysis is needed to assess whether this group represents a novel class of metazoan LIM proteins.
It is worth noting that 37 of the 59 unclassified LIM proteins are from the three non-metazoan species. This is not surprising, since the non-metazoan species have had a longer period of independent evolution and have experienced very different selective pressures than metazoans, especially in terms of their cell surface environments.
We also note that this study did not characterize two of the 73 described human LIM genes, SCEL and LIMS3L. These genes have been included in the “Unclassified” section of Table S1.