Conceived and designed the experiments: TD JB PP. Performed the experiments: TD. Analyzed the data: TD JB PP. Contributed reagents/materials/analysis tools: JB PP. Wrote the paper: TD JB. Performed in silico analyses and prepared the figures: TD.
The Nme gene family is involved in multiple physiological and pathological processes such as cellular differentiation, development, metastatic dissemination, and cilia functions. Despite the known importance of Nme genes and their use as clinical markers of tumor aggressiveness, the associated cellular mechanisms remain poorly understood. Over the last 20 years, several non-vertebrate model species have been used to investigate Nme functions. However, the evolutionary history of the family remains poorly understood outside the vertebrate lineage. The aim of the study was thus to elucidate the evolutionary history of the Nme gene family in Metazoans.
Using a total of 21 eukaryote species including 14 metazoans, the evolutionary history of Nme genes was reconstructed in the metazoan lineage. We demonstrated that the complexity of the Nme gene family, initially thought to be restricted to chordates, was also shared by the metazoan ancestor. We also provide evidence suggesting that the complexity of the family is mainly a eukaryotic innovation, with the exception of Nme8 that is likely to be a choanoflagellate/metazoan innovation. Highly conserved gene structure, genomic linkage, and protein domains were identified among metazoans, some features being also conserved in eukaryotes. When considering the entire Nme family, the starlet sea anemone is the studied metazoan species exhibiting the most conserved gene and protein sequence features with humans. In addition, we were able to show that most of the proteins known to interact with human NME proteins were also found in starlet sea anemone.
Together, our observations further support the association of Nme genes with key cellular functions that have been conserved throughout metazoan evolution. Future investigations of evolutionarily conserved Nme gene functions using the starlet sea anemone could shed new light on a wide variety of key developmental and cellular processes.
The Nme family, initially called NDPK or Nm23, was named after the identification of a novel gene associated with low metastatic potential. In humans, NME genes are involved in a wide variety of physiological or pathological cellular processes including development, metastatic potential, ciliary functions, and cell differentiation and proliferation, at various tissue and subcellular localizations (see  for recent review). Despite their critical role in key developmental and pathological processes, the molecular functions of Nme genes remain poorly documented. In vertebrates, Nme genes can be separated into 2 groups, group I and group II, based on their evolutionary history and protein domains. Nme genes of the group I (Nme1-4) originate from a single gene of the chordate ancestor, while Nme genes of the group II (Nme5-8) are present throughout chordate evolution. Nme-related genes have been sporadically reported in Archaea, Eubacteria, and in several eukaryotic lineages including fungi, plants, and bilaterians. However, the evolutionary history of the family that has led to a repertoire of 5 Nme genes in the chordate ancestor remains poorly understood. Indeed, the complexity of the Nme gene repertoire outside the chordate lineage was previously uncharacterized, and the existing literature suggested that the complexity of the Nme gene family was much more limited in non-chordate species. In Dictyostelium discoideum, two Nme-related proteins, named NdkC-2 and NdkM and expressed in the cytosol and in the mitochondria, respectively, were used for biochemical and structural studies. In Drosophila melanogaster, only one Nme-related gene, named awd, had been reported and intensively studied for its role in aberrant development. In Caenorhabditis elegans, one Nme-related gene had been shown to be associated with severe developmental defects.
As recently stressed, the use of model species to decipher Nme gene functions is extremely beneficial and needs to be further supported. However, a better understanding of the evolutionary links between the Nme genes that are found in non-vertebrate model species and their mammalian counterparts is required to allow this comparative biology approach. In order to gain insight into putatively conserved key functions of Nme genes that would have been retained throughout evolution, the aim of the present study was thus to characterize gene family complexity and protein features among metazoans. We were able to show that the complexity of the Nme family predates the metazoan radiation. We also provided evidence supporting the association of Nme genes with key cellular functions that have been conserved throughout metazoan evolution.
Using available sequenced genomes of 21 eukaryote species ranging from amoebozoans to humans, we were able to reconstruct, in opisthokonts (the clade grouping metazoans, choanoflagellates, and fungi), the evolutionary history of the Nme genes that had previously been identified in the chordate ancestor. We were able to show that the metazoan and chordate ancestors share a similar Nme gene repertoire. A similar repertoire was also found in Monosiga brevicollis, a choanoflagellate, but not in fungi (Figures 1–5).
All non-vertebrate metazoan species displayed 2 Nme genes of the group I with the exception of Trichoplax adhaerens, Drosophila melanogaster, Aedes aegypti, Tribolium castaneum, and Lottia gigantea, in which a single gene could be identified (Figure 1A–B). In all studied non-metazoan eukaryotes, two Nme genes of the group I could also be identified with the exception of Saccharomyces cerevisiae and Tetrahymena thermophila. In the starlet sea anemone Nematostella vectensis, Caenorhabditis elegans, and Branchiostoma floridae, the two genes are located in tandem and thus originate from a cis-duplication of an ancestral gene (Table 1). In addition, a surprisingly well conserved gene synteny of group I Nme sequences was found between sea anemone and humans (Figure 1C). The location of the Sox8/9 ancestor gene in the vicinity of Nme genes in the sea anemone and the location of SOX8 and SOX9 on human chromosomes 17 and 16, respectively, are consistent with the first round of whole genome duplication (1R) that gave rise to Nme2 and Nme3/4 in vertebrates. In Ciona intestinalis, we have obtained evidence suggesting a duplication of a large portion of genomic DNA resulting in duplicated genes on two different chromosomes (Table 1, Figure S1). Two duplicated Nme genes of the group I originating from a common group I ancestral gene were found on chromosomes 2q and 8q (Figure S1). In all studied non-metazoan eukaryotes, Capitella teleta, and Strongylocentrotus purpuratus, two Nme genes of the group I can be found on different scaffolds. Because of the limited size of these scaffolds and the level of assembly of the corresponding draft genomes, it is not currently possible to speculate on the nature of the duplication event that has led to 2 genes. In Dictyostelium discoideum, 2 Nme genes of the group I could be identified on different chromosomes.
In the phylogenetic tree, the group I Nme sequences of Chlamydomonas reinhardtii, Nematostella vectensis, Capitella teleta, Strongylocentrotus purpuratus, Branchiostoma floridae, and Ciona intestinalis are more closely related within a species than between species, which strongly suggests that the corresponding duplication events are independent and lineage-specific. It is noteworthy that for the above species, duplicated sequences remained closely related, whereas in other species (Naegleria gruberi, Ustilago maydis, M. brevicollis, and C. elegans) one sequence is highly divergent in comparison to the other one. Altogether, this strongly suggests independent and lineage-specific gene duplications. It was previously shown that a single Nme gene of the group I was present in the vertebrate ancestor. This ancestor gene subsequently duplicated differently in the different vertebrate lineages and resulted in 2 to 5 genes, depending on the species. Together, our data indicate that a single Nme gene of the group I was present in the opisthokont ancestor. Our observations also suggest that a single Nme gene of the group I was present in the eukaryote ancestor. Understanding the selective factors that have led to multiple independent duplication events in the Nme family and the role of these proteins in non-vertebrate species would provide major insights into the evolution and functions of the family.
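The within-species versus between-species comparison used above can be illustrated with a crude distance-based check. This is only a sketch: the study relied on tree topology (NJ, MP, and ML), not on nearest-neighbour distances, and the function name, sequence labels, and distance values below are hypothetical.

```python
def species_with_sister_paralogs(distances, species_of):
    """Return species whose sequences all have a same-species nearest neighbour.

    distances:  dict mapping frozenset({seq_a, seq_b}) -> pairwise distance
    species_of: dict mapping sequence name -> species name
    A species is reported only if every one of its sequences is closest to
    another sequence of the same species (a rough proxy for a lineage-specific
    duplication; it is not a substitute for a phylogenetic tree).
    """
    all_same = {}
    for seq, species in species_of.items():
        others = [other for other in species_of if other != seq]
        nearest = min(others, key=lambda other: distances[frozenset((seq, other))])
        is_same = species_of[nearest] == species
        all_same[species] = all_same.get(species, True) and is_same
    return sorted(species for species, ok in all_same.items() if ok)


# Hypothetical example: two paralogs in each of two species.
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
distances = {
    frozenset(("a1", "a2")): 0.10,  # paralogs of species A are very close
    frozenset(("b1", "b2")): 0.90,  # one paralog of species B is highly divergent
    frozenset(("a1", "b1")): 0.30,
    frozenset(("a1", "b2")): 0.80,
    frozenset(("a2", "b1")): 0.40,
    frozenset(("a2", "b2")): 0.85,
}
print(species_with_sister_paralogs(distances, species_of))
```

In this toy input, species A behaves like the species whose duplicates remained closely related, while species B mimics the case in which one copy has diverged so much that the simple distance criterion no longer groups the paralogs.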
All studied metazoan species displayed a full set of group II Nme genes with the exception of the 4 ecdysozoan species and T. adhaerens (Figures 2–5). The 2 other group II Nme genes, Nme9 and Nme10, that have been shown to be eutherian and vertebrate innovations, respectively, will not be discussed here. No Nme7 homolog could be identified in the C. elegans genome, thus indicating a possible gene loss after the nematode radiation. In M. brevicollis, the Nme7 protein is structurally highly divergent, as shown by the topology of the phylogenetic tree (Figure 4A), and displays a unique and incomplete domain (Figure 4B). In insects, Nme7 proteins also displayed a specific domain structure (Figure 4B) resulting in a divergent position of the corresponding group in the phylogenetic tree (Figure 4A). As for all phylogenetic analyses reported here, the tree topology remained unchanged whether we used the full-length protein sequence or only domains for the phylogenetic reconstruction. Similarly, very divergent Nme8-related sequences were identified in insects and T. adhaerens (Figure 5). In these 4 species, the Nme8-related proteins have lost the 3 NDPK_TX domains that are found in all other metazoan species, including the starlet sea anemone (Figure 5). In M. brevicollis, the Nme8 protein does not display typical Nme8 NDPK_TX domains but 2 different domains of the NDPk superfamily. It is also noteworthy that the exon/intron structure corresponding to the Thioredoxin TRX_NDPK domain is very well conserved among all studied choanoflagellate and metazoan species. In contrast, the exon/intron structure corresponding to the NDPK domains is highly divergent in insects, placozoans and choanoflagellates (Figure S2). Together, our data demonstrate that Nme5, Nme6, Nme7, and Nme8 genes were already present in the genome of the common ancestor of choanoflagellates and metazoans.
In non-choanoflagellate/metazoan species, all Nme proteins of the group II, with the exception of Nme8, were found in low branching eukaryotic lineages such as heteroloboseans, green plants, amoebozoans, alveolates, and fungi (Figures 2–4, Table 2). In contrast, we failed to identify Nme genes of the group II outside the eukaryotic lineage. Together, our results strongly suggest that Nme5, Nme6, and Nme7 emerged around the eukaryote radiation. Interestingly, none of the studied non-choanoflagellate/metazoan species displays all 3 proteins, suggesting lineage-specific loss of Nme5, Nme6, and Nme7 genes. The Nme8 protein typically displays 1 Thioredoxin (TRX) domain followed by 2 or 3 complete NDPk domains. In the non-choanoflagellate/metazoan species investigated, very few sequences displaying one thioredoxin domain could be identified, and these were never associated with an NDPk domain. We thus hypothesize that Thioredoxin domains already existed in the opisthokont ancestor and that Nme8 emerged in the choanoflagellate/metazoan ancestor by domain shuffling of 2 or 3 NDPk domains. This would, however, require further analysis.
Here, we have shown that the Nme gene repertoire of the metazoan ancestor was similar to that of the vertebrate ancestor (Figure 6). In agreement with prior studies reporting major gene losses and genomic rearrangements experienced by ecdysozoans, we show that Nme genes have highly diverged in this phylogenetic group, resulting in the loss of either functional NDPK domains or entire genes. Furthermore, we demonstrate that the complexity of the family predates the metazoan radiation. We also provide evidence suggesting that the complexity of the family is mainly a eukaryotic innovation, with the exception of Nme8 that is likely to be a choanoflagellate/metazoan innovation. This unexpectedly ancient complexity of the eukaryotic Nme gene family is in striking contrast with the existing hypothesis associating the emergence of Nme family complexity with the differentiation of the bilaterian lineage. It should be stressed that this burst of complexity in the Nme family is much more ancient than what has been reported for many genes in which the expansion of the family is thought to have occurred around the metazoan radiation. This would be in favor of the participation of Nme genes in ancestral functions and would also be consistent with their known involvement in key biological processes such as cell proliferation and development.
The topology of the phylogenetic tree (Figure 1A) obtained using group I Nme proteins does not match the topology of the tree of life. Indeed, N. vectensis sequences appear, along with B. floridae and C. teleta sequences, as a sister group of the vertebrate and insect sequences, in contrast to T. adhaerens, S. purpuratus, and C. intestinalis sequences that appear much more divergent. The topology of the tree is, however, consistent with the genomic structure of group I Nme genes reported in Figure 1B. A highly conserved exon/intron structure is observed between T. adhaerens, N. vectensis, L. gigantea, and vertebrates, while ecdysozoans, C. teleta, C. intestinalis, and S. purpuratus exhibit a totally different genomic organization that reflects the high gene divergence observed in these species. Interestingly, the D. discoideum mitochondrial NdkM shows gene and protein features that are similar (Figure 1B) to those of its cytosolic paralog NdkC-2, but displays a longer N-terminal sequence containing a mitochondrial targeting signal. In vertebrates, the Nme4 protein also displays a mitochondrial targeting signal in its N-terminal sequence and was shown to be a gnathostome innovation. In contrast, no mitochondrial targeting signal is found in any other Nme protein. This feature is thus likely to be a functional convergence between amoebozoan NdkM and vertebrate Nme4.
As indicated above, orthologs of human Nme5, Nme6, Nme7, and Nme8 are found in metazoans (Figure S3). The topology of the phylogenetic tree (Figure 2A) obtained using Nme5 proteins showed that ecdysozoan sequences are highly divergent. Ecdysozoan Nme5 proteins appear to be even more divergent than those of the early diverging eukaryotes N. gruberi and M. pusilla. Similarly to group I, gene exon/intron structure and protein domains of N. vectensis, B. floridae, S. purpuratus, C. teleta, and L. gigantea Nme5 proteins (Figures 2B and 2C) were highly similar to their human counterpart, while ecdysozoan sequences exhibited different protein length and/or domains. In agreement with these observations was the remarkably conserved exon/intron structure observed between human and sea anemone Nme5 genomic sequences (Figure 2C). Similar observations on tree topology, protein domains, and genomic structure were made for Nme6 (Figure 3). For Nme7, no ortholog could be identified in C. elegans, while very divergent Nme7 sequences were identified in non-metazoan eukaryote species, M. brevicollis, and insects, as shown by the topology of the phylogenetic tree (Figure 4A). In contrast to all other studied species, no NDPk7A domain was found (Figure 4B) in M. brevicollis and insect sequences, whereas such domains could be identified in several non-opisthokont eukaryote species, thus suggesting a high divergence of ecdysozoan genes within the metazoan lineage. In addition, an incomplete NDPk7B domain was found in A. aegypti (Figure 4B). It should be stressed that, in contrast to insects, the Nme7 sequences of T. adhaerens, N. vectensis, and B. floridae were remarkably similar to their human counterpart in terms of genomic exon/intron structure, protein size, and functional domains (Figures 4B and S4). No synteny analysis could be performed between starlet sea anemone and humans for Nme5, Nme6, and Nme7 genes due to the limited number of genes present on the corresponding sea anemone scaffolds (Table 2).
Similarly to Nme7, no Nme8 ortholog was identified in C. elegans. In insects and T. adhaerens, the Nme8-related sequences that could be identified were extremely divergent (Figure 5A, Figure S2) and lacked the 3 NDPK_TX domains found in all eumetazoan Nme8 proteins, including the starlet sea anemone (Figure 5B). A conserved synteny was also identified between human and sea anemone (Figure 5C). Together our data suggest that, in contrast to Nme5 and Nme6 that have been identified in all investigated metazoan species, Nme7 and Nme8 have been lost in C. elegans. In insects, very divergent Nme7 and Nme8 genes remain. For Nme8, the high divergence associated with the loss of the 3 NDPK_TX domains in insects and T. adhaerens suggests a fast evolution of the gene possibly associated with a loss of ancestral metazoan Nme8 function.
We have shown that, in addition to the complexity of the Nme family, several gene structures and protein domains are also highly conserved throughout metazoan evolution, some features being also conserved throughout the eukaryotic lineage. When considering the entire Nme family, the starlet sea anemone is the metazoan species exhibiting the most conserved gene and protein sequence features with humans.
Several non-vertebrate model species, mainly C. elegans and D. melanogaster, have been used to investigate Nme functions. This fruitful approach has shed light on evolutionarily conserved mechanisms involved in Alzheimer's disease, ciliary function, or epithelial integrity. Nevertheless, the need for studies of Nme functions in non-vertebrate model species was recently stressed by the scientific community. Here, we show that the starlet sea anemone exhibits a full set of metazoan Nme genes and shares remarkably conserved gene and protein sequence features with humans that were lost in flies and C. elegans. The starlet sea anemone is an emerging model offering several biological features such as separate sexes, inducible spawning, flagellated sperm, and external fertilization. This model species thus offers significant opportunities to investigate Nme gene functions that remain poorly understood.
Nme proteins of the group I are involved in a wide variety of cellular and physiological processes including tumor metastatic potential. Using a reciprocal BLASTP strategy, the starlet sea anemone genome was searched for homologs of human proteins known to interact with group I NME proteins. Among the 48 human proteins known to interact with NME1, NME2, NME3, or NME4, 44 had a homolog in sea anemone (Table S1). For instance, homologs of proteins involved in cancer and cell cycle control, such as MIF and Rac1, were clearly identified in sea anemone (Table S1). In addition, NME1 and NME2 have been demonstrated to regulate the expression of specific genes such as c-MYC and p53. Interestingly, c-MYC and p53 homologs could also be identified in the starlet sea anemone genome, thus suggesting that at least some downstream targets of Nme proteins are also present in this species (Table S1). The importance of p53 in sea anemone development was also recently stressed and found to be similar to its known function in vertebrate development.
As previously documented, most Nme proteins of the group II, with the exception of Nme6, have been associated with ciliary functions. They have been shown to play critical roles in spermatogenesis, sperm motility, development, and human conditions associated with primary ciliary dyskinesia. The existence of orthologs in non-metazoan species was, however, unsuspected, with the exception of the report of Nme7 in C. reinhardtii and T. thermophila. In the present study, we have demonstrated that Nme5, Nme6, Nme7, and Nme8 genes were present in the metazoan ancestor, while Nme5, Nme6, and Nme7 were most likely present in the eukaryote ancestor. To date, only 7 proteins are known to interact with NME proteins of the group II. We were, however, able to identify homologs for 6 of these interacting partners in starlet sea anemone (Table S1).
As indicated above, functional evidence exists demonstrating the importance of Nme genes for key biological processes, including development, cell proliferation, ciliary function, and cancer. We have shown here that the complexity of the Nme family predates the metazoan radiation and that all Nme proteins display functional domains that have been conserved throughout evolution. Using the starlet sea anemone, we were able to show that most proteins known to interact with human NME were also found in the eumetazoan ancestor. Together, these observations suggest a participation of Nme genes in key cellular functions that have been conserved throughout evolution. In this context, the starlet sea anemone, which exhibits a full set of highly conserved metazoan group II Nme genes and appropriate biological features (such as separate sexes, flagellated sperm, and asymmetrical expression patterns during development), offers major opportunities to investigate Nme functions.
In summary, we demonstrated that the complexity of the Nme gene family initially thought to be restricted to chordates was also shared by the Metazoan ancestor. We also provide evidence suggesting that the complexity of the family is mainly a eukaryotic innovation, with the exception of Nme8 that is likely to be a choanoflagellate/metazoan innovation. Remarkably conserved gene structures, genomic linkage, and protein domains were identified among metazoans, some features being also conserved in eukaryotes. When considering the entire Nme family, the starlet sea anemone is the studied metazoan species exhibiting the most conserved gene and protein sequence features with humans. In addition, we were able to show that most of the proteins known to interact with human NME proteins were also found in the starlet sea anemone. Together, our observations further support the association of Nme genes with key cellular functions that have been conserved throughout metazoan evolution.
All Nme sequences were identified using the following genome assemblies: human (Homo sapiens, Assembly GRCh37), xenopus (Xenopus tropicalis, Assembly V.4.1), zebrafish (Danio rerio, Assembly ZV8), tunicate (Ciona intestinalis, Assembly V.2.0), Florida lancelet (Branchiostoma floridae, Assembly V.2.0), purple sea urchin (Strongylocentrotus purpuratus, Assembly NCBI V.2.1), fruit fly (Drosophila melanogaster, Assembly BDGP5), yellow fever mosquito (Aedes aegypti, Assembly AaegL1), red flour beetle (Tribolium castaneum, Assembly Tcas 3.0), nematode (Caenorhabditis elegans, Assembly WS214), polychaete worm (Capitella teleta, Assembly V1.0), owl limpet (Lottia gigantea, Assembly V.1.0), starlet sea anemone (Nematostella vectensis, Assembly V.1.0), placozoan (Trichoplax adhaerens, Assembly Grell-BS-1999 V.1.0), marine choanoflagellate (Monosiga brevicollis, Assembly V1.0), fungi (Saccharomyces cerevisiae, Assembly EF 2; and Ustilago maydis, Assembly 1), amoebozoa (Dictyostelium discoideum, Assembly V.2.1), alveolate (Tetrahymena thermophila, Assembly 1.1), green plants (Micromonas pusilla, Assembly V.2.0; and Chlamydomonas reinhardtii, Assembly V.4.0) and heterolobosean (Naegleria gruberi, Assembly V.1.0). A large number of sequences were obtained from the NCBI NR database using human or zebrafish protein sequences as a query. When more than one sequence was obtained, the RefSeq one was preferentially selected. When no RefSeq sequence was available, the longest sequence was used. When no sequences were available in the NR database, BLASTP was used on the Ensembl and DoE Joint Genome Institute databases. The chromosomal localization of Nme genes was established using the Ensembl genome browser or JGI gene information or, when not available, using the UCSC Genome Bioinformatics BLAT and the NCBI Sequence Viewer. Exon/intron structure was obtained from the Ensembl, NCBI, or JGI databases.
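The sequence selection rule described above (RefSeq preferred, otherwise the longest sequence) can be written as a short sketch. The `Hit` record, its field names, and the example accessions are hypothetical, and picking the longest entry when several RefSeq sequences are returned is an assumption that the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    accession: str   # hypothetical identifier of a database hit
    sequence: str    # protein sequence of the hit
    is_refseq: bool  # True when the record comes from RefSeq


def select_representative(hits):
    """Pick one sequence per query: a RefSeq hit if any, else the longest hit.

    Returns None when the list is empty, i.e. when another database
    would have to be searched.
    """
    if not hits:
        return None
    refseq_hits = [hit for hit in hits if hit.is_refseq]
    pool = refseq_hits if refseq_hits else hits
    # Assumption: among equally preferred candidates, keep the longest one.
    return max(pool, key=lambda hit: len(hit.sequence))


hits = [
    Hit("hypo_001", "MKV", False),
    Hit("hypo_002", "MK", True),
    Hit("hypo_003", "MKVLA", False),
]
print(select_representative(hits).accession)
```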
The protein domain structure of Nme proteins was obtained from the GenBank Conserved Domain Database.
Phylogenetic reconstructions were performed using the automated genomic annotation platform FIGENIX. For each phylogenetic tree reconstruction, all selected protein sequences were added to a single multiple sequence alignment. Sequence alignment was performed automatically by the FIGENIX pipeline using MUSCLE v3.6. The pipeline used is based on three different methods of phylogenetic tree reconstruction, i.e. neighbor-joining (NJ), maximum parsimony (MP), and maximum likelihood (ML). For ML, the substitution model was estimated from the data, while BLOSUM was used for NJ. Bootstrapping was carried out to assess node support with 1000 pseudoreplicates. Support values were mapped onto a midpoint-rooted 50% majority rule consensus tree for each optimality criterion. Bootstrap values are reported for the nodes that are present in all three phylogenetic reconstruction methods. Asterisks denote the absence of a node for a given phylogenetic method.
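The reporting convention for node support (one bootstrap value per method, with an asterisk when a node is absent from a given method) can be sketched as follows. This is not FIGENIX code: the function name, the representation of a node as a set of taxon names, and the example values are assumptions made for illustration.

```python
def support_labels(nj, mp, ml):
    """Build 'NJ/MP/ML' support labels for every node seen in any method.

    Each argument maps a node, represented as a frozenset of taxon names,
    to its bootstrap support in percent. A node missing from one method
    gets '*' in that position.
    """
    methods = (nj, mp, ml)
    nodes = set(nj) | set(mp) | set(ml)
    labels = {}
    for node in nodes:
        parts = [str(method[node]) if node in method else "*" for method in methods]
        labels[node] = "/".join(parts)
    return labels


# Hypothetical support values for two nodes.
ab = frozenset({"A", "B"})
abc = frozenset({"A", "B", "C"})
labels = support_labels({ab: 98, abc: 71}, {ab: 95}, {ab: 100, abc: 64})
print(labels[ab])   # node recovered by all three methods
print(labels[abc])  # node absent from the MP tree
```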
The synteny relationships of starlet sea anemone and human Nme genes were analyzed by reciprocal BLASTP on the NCBI NR database using the genes surrounding Nme genes in Nematostella vectensis. Homologous genes were considered in the analysis only when reciprocal BLASTP returned the pair as reciprocal best hits. For the Ciona intestinalis paralogy analysis, synteny relationships were obtained using the Synteny Database and putative paralogs were validated by reciprocal BLASTP on the NCBI NR database.
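The reciprocal best hit criterion can be sketched in a few lines, assuming the best BLASTP hit in each direction has already been extracted into a dictionary. The function name and gene identifiers are hypothetical.

```python
def reciprocal_best_hits(forward_best, reverse_best):
    """Keep only gene pairs that are each other's best hit.

    forward_best: gene in species A -> its best hit in species B
    reverse_best: gene in species B -> its best hit in species A
    """
    return {
        gene_a: gene_b
        for gene_a, gene_b in forward_best.items()
        if reverse_best.get(gene_b) == gene_a
    }


# Hypothetical identifiers: only the first pair is reciprocal.
forward = {"nv_gene1": "hs_geneX", "nv_gene2": "hs_geneY"}
reverse = {"hs_geneX": "nv_gene1", "hs_geneY": "nv_gene3"}
print(reciprocal_best_hits(forward, reverse))
```

A pair is discarded as soon as the reverse search points to a different gene, which is what makes the criterion conservative for synteny comparisons.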
Validated human NME partners were obtained through NCBI Entrez Gene Interactions (http://www.ncbi.nlm.nih.gov/gene) information. Homologous genes in Nematostella vectensis were identified by reciprocal BLASTP on the NCBI NR database. Only BLASTP hits with an E-value lower than 10⁻¹⁰ were considered significant.
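The significance filter can be illustrated with a minimal sketch, assuming the best E-value per human partner has already been parsed from the BLASTP output. The function name, partner labels, and E-values below are hypothetical and do not reproduce Table S1.

```python
E_VALUE_CUTOFF = 1e-10  # hits must be strictly lower than this threshold


def partners_with_homolog(best_evalues, cutoff=E_VALUE_CUTOFF):
    """Return the sorted partner names whose best hit passes the cutoff.

    best_evalues maps a partner name to the E-value of its best hit,
    or to None when no hit was returned.
    """
    return sorted(
        name
        for name, evalue in best_evalues.items()
        if evalue is not None and evalue < cutoff
    )


# Hypothetical values: P2 sits exactly on the threshold and is rejected.
best = {"P1": 1e-50, "P2": 1e-10, "P3": None, "P4": 3e-12}
kept = partners_with_homolog(best)
print(f"{len(kept)} of {len(best)} partners have a significant hit: {kept}")
```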
Ciona intestinalis genomic region paralogy relationships between chromosomes 2q and 8q. For the Ciona intestinalis paralogy analysis, synteny relationships were obtained using the Synteny Database and putative paralogs were validated by reciprocal BLASTP on the NCBI NR database. (TIF)
Exon/intron structure of Nme8 genes. Exon/intron structure was obtained through Ensembl, NCBI, or JGI databases. When exon boundaries correspond to similar amino acid positions, the exons are displayed in color. Otherwise, exons are displayed in black. Non-coding exons are shown in grey. Numbers indicate exon size in nucleotides. (TIF)
Phylogenetic reconstruction of the Nme protein family in eumetazoans. The phylogenetic tree was constructed from a single multiple alignment. Bootstrap values for neighbor-joining, maximum parsimony, and maximum likelihood methods, respectively, are indicated for each node. * indicates that the node does not exist in the corresponding tree. The consensus tree was calculated using the FIGENIX automated phylogenomic annotation pipeline. Only Homo sapiens, Danio rerio, Ciona intestinalis and Nematostella vectensis sequences were used in this phylogenetic tree reconstruction because the highly divergent ecdysozoan sequences greatly modified the tree topology. (TIF)
Exon/intron structure of Nme7 genes. Exon/intron structure was obtained through Ensembl, NCBI, or JGI databases. When exon boundaries correspond to similar amino acid positions, the exons are displayed in color. Otherwise, exons are displayed in black. Non-coding exons are shown in grey. Numbers indicate exon size in nucleotides. (TIF)
BLASTP hits of human NME partners against starlet sea anemone sequences. Partners of human NME proteins were listed from NCBI Entrez Gene interaction information. BLASTP hits were considered significant for E-values lower than 10⁻¹⁰. Accession numbers and BLASTP E-values are also given for c-Myc and p53, two downstream targets of human NME1 and NME2 proteins. (XLS)
The authors thank Alexis Fostier for helpful discussions.
Competing Interests: The authors have declared that no competing interests exist.
Funding: This work was supported by an INRA - IFREMER PhD fellowship to TD. The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 222719 - LIFECYCLE. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.