Overview of homeobox genes in Mnemiopsis
We extracted 76 homeoboxes from the genome of
Mnemiopsis leidyi. The corresponding homeodomains were aligned to the human and
Drosophila dataset used in Holland
et al. 2007 [
23] and supplemented with eight amphioxus homeodomains known to be missing from humans. The sequence alignment is available as supplemental material (Additional File
1). We generated nine trees from this alignment using multiple methods (neighbor-joining, maximum likelihood (ML) and Bayesian inference), multiple starting trees and multiple implementations (for example, in the case of ML, we used RAxML [
37] and PhyML [
38]). We computed a likelihood value for each tree and then chose the one with the highest likelihood (Figure ). We subsequently used this tree and secondary domain information, along with the classification scheme in the Homeo Database (HomeoDB) [
24], to divide the 76
Mnemiopsis homeodomains into the following classes: ANTP (22 homeodomains); PRD (7); TALE (3); POU (4); LIM (4); and SINE (18). Eighteen homeodomains remained unclassified (Table ).
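The tree-selection step described above can be sketched in a few lines. This is only an illustration: the run names and log-likelihood values below are hypothetical placeholders, not values from this study, and the comparison is meaningful only if all trees are scored on the same alignment under the same substitution model.

```python
# Hypothetical log-likelihoods (lnL) for candidate trees; placeholder values only.
candidate_trees = {
    "raxml_run1": -100.0,
    "phyml_run1": -98.5,
    "raxml_run2": -101.2,
}


def best_tree(log_likelihoods):
    """Return the (name, lnL) pair with the highest log-likelihood."""
    if not log_likelihoods:
        raise ValueError("no candidate trees supplied")
    return max(log_likelihoods.items(), key=lambda item: item[1])


name, lnl = best_tree(candidate_trees)
print(f"selected {name} with lnL = {lnl}")
```

Because log-likelihoods are negative, the "highest" value is the one closest to zero.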
Most of these class-level assignments are confirmed by the presence of secondary domains, sequence signatures, and/or class-specific introns (Table ). To all of these classes (with the exception of the 18 homeodomains that remained unclassified), we added corresponding homeodomain sequence data from the demosponge
Amphimedon queenslandica [
30], the placozoan
Trichoplax adhaerens [
27], the cnidarian
Nematostella vectensis [
26] and the choanoflagellate
Monosiga brevicollis [
39]; we then performed class-specific phylogenetic analyses. We named
Mnemiopsis homeodomains that showed a strong affiliation for a particular family accordingly; otherwise, the name of the class is used in conjunction with a preliminary number that was originally assigned to the homeodomain.
ANTP class NKL subclass
Eighteen of the 22 ANTP homeodomains group within the NKL subclass. There is only weak support for assigning any of the Mnemiopsis NKL homeodomains to particular families but, in some cases, there is consistency between our initial superfamily tree (Figure ) and our ANTP-specific tree that included the additional Amphimedon, Nematostella, and Trichoplax sequences (Figure ).
The following groupings are consistent in both trees and have support values over 50 in our best Bayesian tree: (1) MlANTP65 with the Tlx family and (2) MlANTP22 with the human Ventx gene. MlANTP25, MlANTP35, MlANTP67 and MlANTP78 group with the Dlx family consistently in both trees but, in the full tree, the Dlx clade also includes MlANTP66. Similarly, MlANTP19, MlANTP68, MlANTP71 and MlANTP72 form clades positioned sister to the Barx, Bsx, Dbx and Hlx families. However, the relationships between Mnemiopsis homeodomains are inconsistent between these trees. The other NKL homeodomains identified are MlANTP21, MlANTP23, MlANTP25, MlANTP35, MlANTP37, MlANTP47, MlANTP48, MlANTP51, MlANTP63, MlANTP66, MlANTP67 and MlANTP78.
Consistent with our analysis, a previous study classified MlANTP65 as a Tlx-like homeodomain [
36]. The same study also associated MlANTP66 with the Dlx family, MlANTP67 with the Barh family and MlANTP68 with the Bsx family, associations that were not consistently reproduced in our trees.
Evidence in the form of diagnostic residues can provide additional support to classifications [
40]. The following homeodomains all contain the diagnostic residues associated with the NKL subclass ([AKST][DENPS][LAST][Q][V] at positions 41-45): MlANTP19, MlANTP37, MlANTP48, MlANTP68, MlANTP71 and MlANTP72. The only other
Mnemiopsis homeodomain with the NKL signature is the SINE class homeodomain MlSIX36. (The position of MlSIX36 on the tree in Figure and its upstream SIX domain led to its SINE class designation.) In addition to the NKL signature, the MlANTP37 homeodomain also contains the HOXL signature ([KT][IV]WFQNRR[AMV]K[DEHKLMQWY][KR][KR] at positions 46-58) and the MlANTP48 homeodomain contains the HOXL2 signature (LE[AGKNR]E at positions 16-19) (Table ).
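The diagnostic-residue signatures quoted above are position-anchored patterns, so they can be checked with ordinary regular expressions. The sketch below is a minimal illustration, assuming an ungapped homeodomain string numbered from position 1; the function name and the toy sequence are hypothetical and are not taken from the Mnemiopsis data.

```python
import re

# Signatures as quoted in the text: (pattern, first position, last position),
# with 1-based, inclusive homeodomain positions.
SIGNATURES = {
    "NKL": (r"[AKST][DENPS][LAST]QV", 41, 45),
    "HOXL": (r"[KT][IV]WFQNRR[AMV]K[DEHKLMQWY][KR][KR]", 46, 58),
    "HOXL2": (r"LE[AGKNR]E", 16, 19),
}


def has_signature(homeodomain, name):
    """Return True if the named signature matches at its expected positions."""
    pattern, start, end = SIGNATURES[name]
    window = homeodomain[start - 1:end]  # convert to 0-based slicing
    # fullmatch fails on truncated windows, so incomplete sequences return False.
    return re.fullmatch(pattern, window) is not None


# Toy 60-residue sequence (not a real homeodomain) carrying NKL and HOXL2 motifs.
residues = list("G" * 60)
residues[40:45] = "TPLQV"  # positions 41-45
residues[15:19] = "LEKE"   # positions 16-19
toy = "".join(residues)

for signature in SIGNATURES:
    print(signature, has_signature(toy, signature))
```

Sequences with insertions (such as TALE class homeodomains) would need to be renumbered against a reference alignment before a check like this is applied.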
ANTP class HOXL-related
Four paralogous
Mnemiopsis ANTP homeodomains (MlANTP03a, MlANTP03b, MlANTP03c and MlANTP03d) group with the engrailed family in our superfamily tree (Figure ) and with the Evx family in the ANTP tree (Figure ). Despite the engrailed family being assigned to the NKL subclass in HomeoDB [
24], engrailed has been historically allied with the extended Hox subclass based on synteny [
41,
42] and phylogeny [
43]. Evx is also considered a member of the extended Hox subclass. While it is difficult to pin down the exact relationship of the MlANTP03 homeodomains, it does appear that they are the most likely descendants of the homeodomain that gave rise to the HOXL genes in the lineage leading to Placozoa, Cnidaria and Bilateria. Consistent with this classification, the MlANTP03a and MlANTP03b genes both contain the HOXL2 diagnostic residue signature. There are no clear ParaHox or Hox genes in
Mnemiopsis.
PRD class
We identified seven PRD class homeodomains in the
Mnemiopsis genome. The PRD class is divided into three subclasses based on the amino acid residue at position 50: Q50, K50 and S50 [
44]. As in most homeodomain studies, these subclasses are not monophyletic in our trees (Figure ). However, given the extremely low support values at the subfamily level, this may not reflect their true relationships. All three subclasses are clearly present in the genomes of bilaterians,
Nematostella and
Trichoplax. Eight of the nine PRD class homeoboxes in
Amphimedon possess the Q50 residue. The remaining PRD homeodomain is the
Amphimedon PaxB homeodomain, which has a degenerate homeodomain [
45] and, as such, was not included in our phylogenetic analysis.
Of the seven Mnemiopsis PRD class homeodomains, six have a Q at position 50. The exception (MlPRD43) is missing sequence information at that position. (Note: just prior to the submission of this manuscript, a new assembly revealed the likely 3' end of this homeodomain, which includes a Q at position 50.) We did not find any Mnemiopsis genes with an S at position 50. The only Mnemiopsis genes with a K at position 50 are the 18 SINE class genes, which, like the K50 PRD class genes, characteristically have a K residue at this position.
Consistent with the absence of lysine or serine residues at position 50 in
Mnemiopsis and
Amphimedon PRD homeodomains, we see no grouping of
Mnemiopsis or
Amphimedon homeodomains with S50 or K50 clades, with the following exceptions: (1) MlPRD16 groups with the
Nematostella S50 homeodomain NvPRD074, albeit with virtually no support (ML bootstrap = 2, Bayesian posterior probability distribution = 2), within a larger clade of Q50 homeodomains; and (2) the
Amphimedon homeodomain AqQ50a groups with the highly divergent HsDUXBI (ML bootstrap = 27, Bayesian posterior probability distribution = 95), also within a larger clade of Q50 homeodomains (Figure ). The overwhelming evidence suggests that
Mnemiopsis and
Amphimedon are devoid of S50 and K50 PRD class homeodomains. Conversely,
Nematostella and
Trichoplax both have clear K50 and S50 homeodomains. The phylogenetic distribution of Q50, S50 and K50 PRD homeodomains in our study is consistent with the hypothesis that Q50 homeodomains were the founders of the PRD class [
44].
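The Q50/K50/S50 subdivision depends on a single residue, which makes it easy to express as a lookup. The following is a minimal sketch, assuming an ungapped homeodomain string numbered from position 1; the function name and the toy sequences are hypothetical. Truncated sequences, such as the one described for MlPRD43, are reported as "unknown" rather than guessed.

```python
SUBCLASS_BY_RESIDUE = {"Q": "Q50", "K": "K50", "S": "S50"}


def prd_subclass(homeodomain):
    """Classify a PRD class homeodomain by the residue at position 50."""
    # Position 50 is index 49; a short sequence or a gap/ambiguity symbol
    # means the residue is not available.
    if len(homeodomain) < 50 or homeodomain[49] in "-X?":
        return "unknown"
    return SUBCLASS_BY_RESIDUE.get(homeodomain[49], "other")


# Toy sequences (not real homeodomains).
print(prd_subclass("A" * 49 + "Q" + "A" * 10))  # Q50
print(prd_subclass("A" * 49 + "K" + "A" * 10))  # K50
print(prd_subclass("A" * 45))                   # unknown (truncated)
```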
Five of the seven
Mnemiopsis PRD class homeodomains contain the diagnostic residues (L[EINQRV][^DGHMPTVWY][^CDGKMNPQR][FL][^CFILPTWY][AEFHKQRV][ADEGKNSTW][CHKMPQR][FHY]P at positions 16-26) associated with paired homeodomains in bilaterians: MlPRD10a, MlPRD10b, MlPRD16, MlPRD44, MlPRD61 (Table ). No other
Mnemiopsis homeodomains display this pattern. MlPRD10b, MlPRD16 and MlPRD44 had been identified as Paired class genes in a previous study and were named Prd3, Prd1 and Prd2 respectively [
36]. MlPRD44 also contains the HOXL2 diagnostic residues (Table ).
MlPRD16 and MlPRD61 have clear octapeptide sequences upstream of the homeodomain (SSISSLLS and HSIDDILG, respectively), a hallmark characteristic of a subset of the PRD class homeodomains. MlPRD43 and MlPRD50 have less-conserved but possible octapeptides as well (QRILGILS and YNIEGLLG, respectively). There are no paired domains associated with any
Mnemiopsis homeodomains, but there are two independent paired domain sequences that appear to be direct orthologs of the two identified in the ctenophore
Coeloplana willeyi [
33].
Like most PRD class homeodomains [
46], all but one of the
Mnemiopsis PRD homeodomains have an intron that occurs in the vicinity of the 46th and 47th codons. The one exception, MlPRD10a, has a single intron that interrupts the 37th codon. This might be the result of a retrotransposition event involving a transcript from its paralog (MlPRD10b) followed by an intron gain event. There are additional introns in the N-termini of the homeodomains of MlPRD10b, MlPRD50, and MlPRD61.
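Intron positions in this section are described relative to codons: an intron either falls between two codons or interrupts one. As a rough illustration of that bookkeeping, the sketch below converts the number of homeobox coding nucleotides upstream of an intron into such a description. The input offsets are hypothetical (the text reports only codon positions), and the function name is an assumption made for this example.

```python
def describe_intron(coding_nt_before):
    """Describe an intron given the count of coding nucleotides upstream of it."""
    if coding_nt_before < 1:
        raise ValueError("expected at least one coding nucleotide before the intron")
    # Each codon is 3 nt; the remainder is the intron phase (0, 1 or 2).
    complete_codons, phase = divmod(coding_nt_before, 3)
    if phase == 0:
        return f"between codons {complete_codons} and {complete_codons + 1}"
    return f"interrupts codon {complete_codons + 1} (phase {phase})"


print(describe_intron(138))  # between codons 46 and 47
print(describe_intron(109))  # interrupts codon 37 (phase 1)
```

An intron after 138 coding nucleotides would correspond to the common PRD class position, whereas 109 or 110 upstream nucleotides would place it within the 37th codon.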
POU class
MlPOU1, MlPOU26a, MlPOU26b, and MlPOU26c make up the four
Mnemiopsis POU class homeodomains. Relatively strong support values place MlPOU1 in the POU1 family (ML bootstrap = 65; Bayesian posterior probability distribution = 98; Figure ). In addition, it has a POU-specific domain upstream of the homeodomain, a defining feature of the POU class [
47]. There is weak support uniting MlPOU26a, MlPOU26b and MlPOU26c with the human HDX (highly divergent homeobox) homeodomain of POU class genes (ML bootstrap = 19; Bayesian posterior probability distribution = 45). Only one of the three MlPOU26 homeodomains (MlPOU26a) contains an upstream POU-specific domain.
TALE class
MlPbx, MlMeis and MlPknox, like other TALE class homeodomains, have a three amino acid insertion in the loop between the first and second alpha-helices (Table ). MlPbx, MlMeis and MlPknox consistently group with the Pbx, Meis and Pknox families, respectively, in both trees with moderate support (Figures and ). In all three cases, the phylogenetic assignment of these homeodomains is reinforced by the identification of several conserved motifs outside of the homeodomain, as well as by conserved intron positions (Table ).
Like other Pbx genes (and unlike other TALE genes), MlPbx has a glycine residue at position 50 of the homeodomain. In addition, a Basic Local Alignment Search Tool (BLAST) search of the contig containing MlPbx shows significant similarity to the PBC domain [
48] located ~1.5 KB upstream of the homeodomain (percent identity (ID) = 25/67, expectation (E)-value = 2 × 10⁻⁷). Like the cnidarian and human PBX genes, MlPbx has an intron that interrupts the second codon and one that falls between the 47th and 48th codons of the 63-codon TALE homeobox.
Meis homeodomain proteins have several conserved motifs in addition to the homeodomain [
49]. A GENSCAN prediction containing the MlMeis homeodomain shows similarity to the upstream MEIS A domain (ID = 19/69, E-value = 0.005), as well as weaker similarity to the MEIS D domain downstream of the homeodomain (ID = 16/48, E-value = 0.014). Similar to bilaterians and cnidarians, MlMeis has two introns. One falls between the 25th and 26th codons, while another interrupts the 51st codon.
The GENSCAN-predicted peptide that contains the MlPknox homeodomain also includes the abbreviated MEIS A domain that is characteristic of the Pknox family, as well as the MEIS B motif (ID = 33/139, E-value = 8 × 10⁻⁵). MlPknox, like the human PKNOX1 and PKNOX2 genes, has an intron that separates the 25th and 26th codons and one that interrupts the 51st codon of the homeobox.
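The identity values for the TALE class motifs are reported as fractions (identical residues over aligned length). For readers who prefer percentages, the conversion can be sketched as follows; the dictionary labels are shorthand chosen for this example, while the fractions are those quoted in the text.

```python
# (identical residues, aligned length) as reported for each similarity hit.
hits = {
    "MlPbx PBC domain": (25, 67),
    "MlMeis MEIS A domain": (19, 69),
    "MlMeis MEIS D domain": (16, 48),
    "MlPknox MEIS region": (33, 139),
}


def percent_identity(identical, aligned_length):
    """Convert an identity fraction into a percentage."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return 100.0 * identical / aligned_length


for label, (identical, aligned_length) in hits.items():
    print(f"{label}: {percent_identity(identical, aligned_length):.1f}%")
```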
We were unable to identify an Irx homeodomain in Mnemiopsis, despite there being Irx family members from Amphimedon, Trichoplax and Nematostella. Also absent was the Tgif homeodomain, found only in cnidarians and bilaterians.
LIM class
MlIsl, MlLhx1.5, MlLhx3.4 and MlLmx make up the four LIM class homeodomains of Mnemiopsis (Figure ). We assigned these four Mnemiopsis homeodomains to the Isl, Lhx1/5, Lhx3/4 and Lmx families, respectively, based on the consistency between tree runs (Figures and ) and moderate support in the full homeodomain tree (Figure ). BLAST searches of the genomic scaffolds containing Mnemiopsis LIM homeodomains reveal LIM-type zinc finger domains immediately upstream of these four homeodomains. Additional BLAST searches also reveal traces of LIM domains independent of homeodomains in the Mnemiopsis genome (data not shown), suggesting the existence of LIM domain transcription regulator genes.
SINE class
Eighteen SINE class homeodomains representing seven distinct SINE lineages were recovered from the
Mnemiopsis genome (Figure , Table ). Of these, all but one have the characteristic lysine at position 50 (as described in [
21]). The exception is MlSIX41, for which we are missing the sequence information from the C-terminus of the homeodomain (including position 50). Additionally, 17 of the 18 SINE class homeodomains have a SIX domain upstream of the homeodomain. The exception, MlSIX32a, is situated on the N-terminal end of a small scaffold in our current assembly, so its absence may be due to the resolution of our assembly.
The SINE class is monophyletic in our superfamily tree except for a clade of five Mnemiopsis homeodomains (MlSIX59a, MlSIX59b, MlSIX59c, MlSIX59d and MlSIX59e), which group with Zhx/Homez (Figure ). This exception is perhaps not completely unexpected given that, like the Zhx/Homez genes, MlSIX59 homeodomains are quite divergent; they are five of only six homeodomains in our entire Mnemiopsis set that do not include a tryptophan at position 48, which is characteristic of the typical homeodomain. The other homeodomain, MlSIX45, is also a member of the SINE class.
The
Mnemiopsis SINE class homeodomains do not clearly separate into the three families recognized in bilaterians. Only two of the 18 maintain the four family-defining diagnostic residues (positions 3-6) in the homeodomain [
50]. MlSIX41 and MlSIX27 have the SIX1/2 family 'ETSY' pattern in positions 3-6 of the homeodomain. However, neither MlSIX41 nor MlSIX27 groups convincingly with the Six1/2 group. The
Mnemiopsis SIX class is the result of extensive ctenophore-specific diversification. A more in-depth phylogenetic analysis that includes SIX domains may provide additional insight into these relationships.
Unclassified Mnemiopsis homeodomains
Two clades consisting of 18 Mnemiopsis homeodomains appear as separate offshoots in our superfamily tree (Figure , Table ). None of these 18 homeodomains have introns, or any of the known class signatures, that would hint that they might belong to an existing class. MlHD60 and MlHD79 have insertions that interrupt the homeodomain but these insertions are unlike the known insertions seen in the TALE, HNF and PROS classes. The MlHD60 insertion consists of two amino acids that occur in the third alpha-helix. The other insertion occurs in the loop region between the first and second alpha-helices but, unlike the TALE insertions, it consists only of a single amino acid. The average branch length of the homeodomains in these clades is 5% shorter than for the other Mnemiopsis homeodomains, confirming that these unclassified Mnemiopsis homeodomains do not simply comprise a clade of unusually long branches.
Missing classes
There are no Mnemiopsis homeodomains that grouped with HNF, CUT, PROS, or CERS classes in our analyses. Consistent with this result, no Mnemiopsis homeodomains exhibit insertions between the second and third helices, like those seen in the bilaterian HNF and PROS class homeoboxes. Besides the five apparent SINE class homeodomains, no other Mnemiopsis homeodomains group with zinc finger (ZF) homeodomains.
Homeobox linkage
There are four pairs of linked homeoboxes in our current Mnemiopsis genome assembly (Figure ). The tightest linkage is between two ANTP class homeoboxes (MlANTP19 and MlANTP47), which are 4.7 KB apart. A different ANTP class homeobox (MlANTP68) is situated 5.0 KB downstream from the SINE class homeobox MlSIX36. The HOXL-related ANTP class homeobox MlANTP03a is separated by 26.0 KB from the PRD class homeobox MlPRD16. The ANTP class homeobox MlANTP21 and the SINE class homeobox MlSIX59 are on the same contig, 148.9 KB apart. None of the linked homeoboxes are obvious paralogs, suggesting that these pairs are not the result of recent duplication events.
Evolutionary dynamics of the Mnemiopsis homeodomain superfamily
In order to better understand the nature of the homeodomain superfamily in
Mnemiopsis, we compared average branch lengths and the number of species-specific homeodomain clades in the
Mnemiopsis,
Amphimedon,
Trichoplax,
Nematostella,
Drosophila,
Caenorhabditis elegans, and human genomes (Table ). We performed ML analyses with homeodomain sequences from this set of species. Degenerate homeodomains (for example,
Amphimedon PaxB) and homeodomains from pseudogenes were not included. The resulting tree and alignments are included as supplemental material (Additional file
2).
Table 2. Paralog count and estimated branch lengths of seven species.
For each species, we recorded the number of species-specific clades that included more than one homeodomain, as well as the total number of homeodomains in those species-specific clades (Table ). These numbers give us an approximation of the number of lineage-specific homeodomain duplications that have been preserved in a specific lineage since it split from its closest relative in the analysis [
51]. A species that has recently undergone extensive genome reduction would be expected to harbour fewer species-specific clades than one that has experienced a recent genomic expansion. Our data show that very few paralogous homeodomains exist in the
Amphimedon (7) and
Trichoplax (0) genomes, whereas the human genome has a remarkably high level of paralogs (197).
Mnemiopsis (45) and
Nematostella (66) are both very close to the mean (Table ).
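The counting procedure described above (species-specific clades with more than one homeodomain, and the total number of homeodomains within them) can be sketched as a single tree traversal. This is an illustrative sketch rather than the procedure actually used: it assumes a rooted tree written as nested tuples, and leaf names that carry a species prefix before an underscore. Both conventions, and the toy tree itself, are hypothetical.

```python
def species_of(leaf):
    """Species prefix of a leaf label such as 'Ml_a' (naming is an assumption)."""
    return leaf.split("_", 1)[0]


def species_specific_clades(tree):
    """Return {species: [sizes]} for maximal single-species clades with >1 leaf."""
    clades = {}

    def visit(node):
        # Returns (species, leaf_count); species is None for a mixed clade.
        if isinstance(node, str):
            return species_of(node), 1
        results = [visit(child) for child in node]
        total = sum(count for _, count in results)
        species = {sp for sp, _ in results}
        if len(species) == 1 and None not in species:
            return species.pop(), total  # still pure; may grow further up
        # Mixed node: any pure child with more than one leaf is a maximal clade.
        for sp, count in results:
            if sp is not None and count > 1:
                clades.setdefault(sp, []).append(count)
        return None, total

    sp, count = visit(tree)
    if sp is not None and count > 1:
        clades.setdefault(sp, []).append(count)
    return clades


# Toy tree with hypothetical leaves from three species.
toy_tree = (
    (("Ml_a", "Ml_b"), "Hs_x"),
    (("Hs_y", "Hs_z"), ("Nv_p", ("Ml_c", ("Ml_d", "Ml_e")))),
)
clades = species_specific_clades(toy_tree)
# Per species: (number of clades, total homeodomains in those clades).
summary = {sp: (len(sizes), sum(sizes)) for sp, sizes in clades.items()}
print(summary)
```

In the toy tree, the hypothetical species Ml contributes two species-specific clades containing five homeodomains in total, and Hs contributes one clade of two.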
Branch lengths provide a means of measuring the level of divergence for a particular homeodomain. Longer branches correspond to higher levels of divergence. We rooted the same neighbor-joining tree described above at its midpoint and determined the average branch lengths for each species' set of homeodomains (Table ). In our tree, the
Mnemiopsis branches tend to be longer than for all the other species except for
C. elegans, which is known to have very long branches [
52]. The
Mnemiopsis average branch length is slightly closer to the mean than it is to the
C. elegans average, suggesting that the
Mnemiopsis homeodomains are moderately divergent. The trees used in this analysis are included as supplemental material (Additional Files
2 and
3) and branch lengths can be visualized directly using a tree-viewing program such as FigTree [
53].
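One way to summarize branch lengths per species is to average root-to-tip distances on a rooted tree. The sketch below illustrates that idea only; it assumes the tree has already been rooted (midpoint rooting itself is not implemented here), that "branch length" is measured as root-to-tip distance, and that leaf names carry a species prefix before an underscore. The tree encoding, names and lengths are all hypothetical.

```python
from collections import defaultdict


def root_to_tip(node, distance=0.0, out=None):
    """Collect root-to-tip distances for every leaf of a rooted tree.

    A node is (content, branch_length): content is a leaf name (str)
    or a list of child nodes.
    """
    if out is None:
        out = {}
    content, length = node
    distance += length
    if isinstance(content, str):
        out[content] = distance
    else:
        for child in content:
            root_to_tip(child, distance, out)
    return out


def mean_by_species(distances):
    """Average root-to-tip distance per species prefix."""
    groups = defaultdict(list)
    for leaf, distance in distances.items():
        groups[leaf.split("_", 1)[0]].append(distance)
    return {sp: sum(values) / len(values) for sp, values in groups.items()}


# Toy rooted tree with hypothetical branch lengths.
toy_tree = ([([("Ml_a", 0.5), ("Ml_b", 0.7)], 0.2), ("Hs_x", 0.3)], 0.0)
means = mean_by_species(root_to_tip(toy_tree))
for species, mean in sorted(means.items()):
    print(f"{species}: {mean:.2f}")
```

Comparing such per-species means against the overall mean gives a simple numerical counterpart to inspecting the tree in a viewer.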