One of the most exciting puzzles in developmental research is posed by the highly conserved set of Hox-protein transcription factors and how they set up specific body patterns along the anterior-posterior axis of bilateral animals. Mis-expression of Hox-genes can lead to drastic phenotypes, such as the famous four-winged fly or a fly sprouting legs from its head where antennae should be. In humans, Hox-gene mis-expression can result in the formation of extra vertebrae or digits, or in genital malformations. Another striking peculiarity of Hox-proteins is that the corresponding genes are clustered on the chromosome and expressed along the anterior-posterior axis of the organism in a manner consistent with the relative positions of the genes on the chromosome. Similar Hox-gene clusters have been found in all bilateral organisms examined to date. Research on Hox-proteins is preferentially conducted in model organisms such as Caenorhabditis elegans (nematode), Drosophila melanogaster (fruit fly), Mus musculus (mouse) or Danio rerio (zebrafish), since these organisms are easy to manipulate genetically and tools are available that circumvent the lethality of many Hox-gene mis-expressions.
The extent to which information about the molecular function of a Hox-protein gained from one model organism is transferable to other organisms can be assessed by comparing the presumed functionally equivalent proteins from different species. If, for example, over-expression of a Drosophila Hox-protein and of the presumed functionally equivalent protein from mouse produce a similar phenotype in Drosophila, we can have higher confidence that this phenotype is due to a conserved feature in the proteins we compare. Insights gained from experiments analyzing a Hox-protein feature responsible for such a phenotype will therefore most likely be transferable to other species, including humans. Identification of presumed functionally equivalent proteins is usually performed by inferring a sequence-based evolutionary history for the proteins, the underlying assumption being that the amino acid sequence of a protein reflects its ancestry and function. Although Hox-proteins are critical to the correct development of bilateral organisms, the identification of functionally equivalent Hox-proteins in the different model organisms is not always straightforward.
All Hox-proteins contain a highly conserved 60 amino acid sequence motif, the homeodomain. The high degree of sequence conservation led to this domain being used as the main feature in determining how the various Hox-protein encoding genes are related to one another. The homeodomains of Hox-proteins were generally found to exhibit greater sequence similarity to the homeodomains of proteins encoded by genes in comparable positions in the Hox-clusters of other species than to those encoded by adjacent genes in the Hox-cluster of the same species. It was therefore proposed that a ‘prototypic’ or ‘ancestral’ Hox-cluster had evolved from a single Hox-gene via tandem duplication and subsequent divergence, and that the common ancestor of all bilateral organisms must have contained a partially differentiated, ‘prototypic’ Hox-cluster containing approximately six genes. This ‘prototypic’ cluster is thought to have further diverged, and in some cases multiplied by whole genome duplications, to give rise to the different types and numbers of Hox-clusters present in our model organisms of interest. These include a single, fairly dispersed cluster in the nematode C. elegans (6 Hox-genes), a single interrupted Hox-cluster in the fruit fly D. melanogaster (8 Hox-genes), a single Hox-cluster in the prechordate amphioxus (Branchiostoma; 14 Hox-genes), four clusters each in the mouse M. musculus and in humans, Homo sapiens sapiens (39 Hox-genes each), and seven clusters in the zebrafish D. rerio (48 Hox-genes).
Some Hox-proteins with clearly distinct functions and distinct sets of downstream genes proved difficult to classify due to their nearly identical homeodomains. This is best exemplified by the classification of the Drosophila ANTP, UBX and ABD-A proteins in relation to the vertebrate Hox6, Hox7, and Hox8 protein groups. Due to their high sequence similarity, these proteins are believed to have arisen from the same gene in the ‘ancestral’ cluster. There are two distinct ways in which these proteins have previously been classified. A) Phylogeny-based classification schemes infer an evolutionary history for the Hox-proteins based on their similarities across the homeodomains. The exact evolutionary relationships of many Hox-proteins can be reliably determined; however, some groups of proteins with (inferred) common ancestry cannot be fully resolved. Proteins within such unresolved groups are often classified as one further unresolvable group of homologs/orthologs/co-orthologs. A summary of phylogeny-based classification schemes is depicted in the accompanying figure. A disadvantage of these classifications is that it remains unclear which of the proteins within unresolved groups are to be regarded as most functionally similar across species. B) Synteny-based schemes, a second prominent classification method, attempt to resolve this issue by further subdividing unresolved groups using the relative positioning of the genes within the Hox-cluster. Examples thereof are summarized in the accompanying figure. This latter classification scheme relies on the assumption that the position of the genes in the Hox-cluster reflects ancestry or function in the organism. Clear examples where this is not the case are provided by a number of arthropod species and the sea urchin, in which inversions seem to have changed the relative order of the Hox-genes in the Hox-cluster. Another example where synteny-based classification may not be appropriate is the ‘problematic’ set of central Hox-proteins, i.e. Drosophila ANTP-UBX-ABD-A in relation to vertebrate Hox6-Hox7-Hox8. Consistent with the phylogeny-based classification scheme, we can hypothesize that Drosophila and vertebrates independently triplicated an ‘ancestral’ ANTP-UBX-ABD-A/Hox6-Hox7-Hox8 protein. This would lead to a Hox-cluster with the exact same gene order and sequence similarities we observe, but a synteny-based assignment would wrongly predict co-orthologous proteins to be orthologous. While this may seem trivial, it is not. Co-orthologous proteins are more likely to have diverged considerably in their function than truly orthologous proteins because, due to their independent duplication, they are also expected to be subject to independent selection pressures. Such a mis-classification of co-orthologous proteins as orthologous could lead researchers to compare, across different model organisms, the downstream effects of proteins that have different functions.
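The distinction between orthologous and co-orthologous proteins drawn above can be made concrete with a minimal sketch. The two gene trees below are hypothetical toy topologies, not inferred phylogenies: one encodes an independent triplication in each lineage after the fly/vertebrate split, the other a triplication that predates the split. The tree encoding, the function names `path_to_leaf` and `classify`, and the labels such as `fly:ANTP` are illustrative assumptions.

```python
# Toy gene trees (hypothetical topologies, for illustration only).
# Internal node: (event, left, right), event is "spec" (speciation) or "dup" (duplication).
# Leaf: a "species:protein" string.

def path_to_leaf(node, leaf):
    """Return [(event, branch_index), ...] from node down to leaf, or None if absent."""
    if isinstance(node, str):
        return [] if node == leaf else None
    event, *children = node
    for index, child in enumerate(children):
        sub = path_to_leaf(child, leaf)
        if sub is not None:
            return [(event, index)] + sub
    return None


def classify(tree, a, b):
    """Classify two distinct leaves as 'orthologs', 'co-orthologs' or 'paralogs'."""
    path_a, path_b = path_to_leaf(tree, a), path_to_leaf(tree, b)
    if path_a is None or path_b is None or a == b:
        raise ValueError("need two distinct leaves of the tree")
    # Walk down the shared part of both paths; index k is the last common ancestor.
    k = 0
    while path_a[k] == path_b[k]:
        k += 1
    if path_a[k][0] == "dup":
        return "paralogs"
    # Ancestor is a speciation: any later duplication makes the pair co-orthologous.
    later = [event for event, _ in path_a[k + 1:] + path_b[k + 1:]]
    return "co-orthologs" if "dup" in later else "orthologs"


# Scenario 1: each lineage triplicated the ancestral gene independently.
tree_independent = (
    "spec",
    ("dup", ("dup", "fly:ANTP", "fly:UBX"), "fly:ABD-A"),
    ("dup", ("dup", "mouse:Hox6", "mouse:Hox7"), "mouse:Hox8"),
)

# Scenario 2: the triplication happened before the lineages split.
tree_ancestral = (
    "dup",
    ("dup", ("spec", "fly:ANTP", "mouse:Hox6"), ("spec", "fly:UBX", "mouse:Hox7")),
    ("spec", "fly:ABD-A", "mouse:Hox8"),
)

print(classify(tree_independent, "fly:ANTP", "mouse:Hox6"))
print(classify(tree_ancestral, "fly:ANTP", "mouse:Hox6"))
print(classify(tree_ancestral, "fly:ANTP", "mouse:Hox7"))
```

Both toy scenarios yield the same gene order along the cluster, which is why gene position alone cannot tell them apart. Only the second one justifies a one-to-one assignment such as ANTP to Hox6.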
Classification schemes for Drosophila melanogaster and Mus musculus Hox-proteins.
Fortunately, it is possible to assess the accuracy of Hox-protein classification schemes by examining whether the Hox-proteins expected to be functionally similar based on the classification actually lead to similar mis-expression phenotypes in vivo. The results of functional comparison studies are summarized in the accompanying table. Both the phylogeny- and synteny-based classification schemes are in agreement with the experimental evidence for: Drosophila Labial (LAB) vs. chicken (Gallus gallus) HOXB1 (rescue experiment), Drosophila Deformed (DFD) vs. human HOXD4 and Drosophila Sex combs reduced (SCR) vs. murine HOXA5 (both ectopic expression phenotype comparisons), as well as various paralogs within mouse. However, studies of this type have been unable to confirm functional equivalence across species for any of the central or posterior Hox-proteins.
Experiments supporting and conflicting with assignments of presumed functionally equivalent Hox-proteins.
One specific example in which the experimental evidence does not support the synteny-based classification scheme, which predicts ANTP to be equivalent to Hox6, UBX to Hox7 and ABD-A to Hox8, is provided by a comparative analysis of ectopic expression phenotypes in Drosophila for the Drosophila Antennapedia (ANTP) and murine HOXB6 proteins. For this example, it is important to know that most Hox-proteins are capable of inducing antenna-to-generic-leg phenotypes in Drosophila. HOXB6 is able to induce partial transformation of antennae into generic legs; however, it is not able to induce the specific leg type (T2) induced by ectopic expression of Drosophila ANTP. As such, HOXB6 does not appear to be a better functional equivalent to ANTP than other Hox-proteins. A further example where previous classifications are questionable is provided by ectopic expression of Drosophila ABD-B and murine HOXB9 in Drosophila. While some phenotypes were common to ABD-B and HOXB9, e.g. the ability to induce ectopic abdominal-type denticles in addition to thoracic ones, most of the HOXB9 phenotypes were clearly distinct from those induced by ABD-B. In embryos, for example, HOXB9 expression was unable to induce ectopic posterior spiracles or create ABD-B-like morphologies of denticles and sensory organs. HOXB9 also exhibited additional functions usually attributed to other Hox-proteins, but not ABD-B, such as partial transformation of the posterior head into the dorsal thorax.
Knowing which proteins provide the best functional equivalents across different species is pivotal to predicting and understanding Hox-protein function, for example when differentiating between the ‘co-selective binding’ (specific DNA binding) and ‘widespread binding’ (transcriptional activity regulation once bound to the DNA) models defined by Biggin and McGinnis. The two most prevalent classification schemes for Hox-proteins coincide and agree with experimental evidence in their classification of the anterior Hox-proteins. However, the classification of the central and posterior Hox-proteins is less clear. For the above experiments, ANTP vs. HOXB6 and ABD-B vs. HOXB9, the schemes either provide insufficient resolution (phylogeny) or predict proteins with differing functions to be functionally equivalent (synteny). In either case, the classification schemes do not provide any estimate of the extent to which the function of the predicted equivalent proteins will be comparable. Furthermore, the relationship of the posterior amphioxus Hox-proteins (Hox9 to Hox15) to the corresponding vertebrate proteins (paralogy groups Hox9 to Hox13) is unclear and needs to be resolved.
In an attempt to improve upon previous classifications, we examined all Hox-protein sequences available in the GenBank non-redundant protein database (NCBI-nr). Our aim was to improve three aspects of the existing classification schemes: I) correct potential mis-classifications of Hox-proteins, II) refine the classification for the insufficiently resolved groups of Hox-proteins and III) provide estimates as to how comparable the most similar Hox-proteins from different organisms are likely to be. We present a classification of the family of Hox-proteins based on pairwise sequence similarity, with special emphasis on the major model organisms. To help resolve the relationship of the ‘problematic’ central group of Hox-proteins, we define an extended Hox-homeodomain encompassing their YPWM motif, linker region and homeodomain. The classification scheme we provide is in complete accordance with the published experimental evidence and provides a more detailed classification of the ANTP, UBX, ABD-A and Hox6, Hox7, Hox8 as well as the ABD-B, vertebrate Hox9–13 (vHox) and amphioxus Hox9–15 (AmphiHox) groups of proteins than previous classification schemes. The results indicate the utility of including the YPWM motif and linker region for the classification of Hox-proteins and strongly suggest that these elements have a role in determining Hox-protein function. The detailed classification of these groups provides novel and experimentally testable predictions for functionally comparable pairs of Hox-proteins across the major model organisms.
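The idea of an extended Hox-homeodomain combined with pairwise similarity scoring can be illustrated with a small sketch. It is not the pipeline used in this study, whose details are not given in this section. It assumes that the start of the homeodomain is already known for each sequence and that the regions being compared have been aligned beforehand to equal length, with `-` marking gaps. All function names, parameters and sequences are hypothetical placeholders, and the toy strings are not real protein sequences.

```python
def extended_homeodomain(seq, hd_start, hd_len=60, motif="YPWM"):
    """Slice from the last motif upstream of the homeodomain to the homeodomain end.

    hd_start is the 0-based index of the first homeodomain residue (assumed known).
    """
    motif_pos = seq.rfind(motif, 0, hd_start)
    if motif_pos == -1:
        raise ValueError("motif not found upstream of the homeodomain")
    return seq[motif_pos:hd_start + hd_len]


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def best_match(query, candidates):
    """Return (name, identity) of the candidate most similar to the query."""
    scored = {name: percent_identity(query, seq) for name, seq in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Toy input: 3 flanking residues, the motif, a 3-residue linker, a 60-residue
# placeholder homeodomain and 3 trailing residues (not a real protein).
toy_protein = "AAA" + "YPWM" + "GGG" + "K" * 60 + "TTT"
region = extended_homeodomain(toy_protein, hd_start=10)
print(len(region))

# Toy pre-aligned fragments standing in for extended regions of different proteins.
toy_query = "YPWMKR-AA"
toy_candidates = {"protein_1": "YPWMRRGAA", "protein_2": "YPWMKRGAA"}
print(best_match(toy_query, toy_candidates))
```

Reporting the identity value alongside the best match, rather than the match alone, mirrors aim III above: it gives a rough indication of how comparable the two proteins are likely to be.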