Our analysis shows that the number of distinct domain combinations per genome varies greatly between different groups of species and increases systematically with their complexity. This increase matches the intuitive meaning of “complexity” as related to differentiation between cell types in an organism, which typically results from the interactions between multidomain regulatory processes.
The main result presented in this paper, namely the fact that at least 25% of all known and 75% of all recurring domain combinations have evolved independently, is less intuitive. On one hand, it is an obvious effect of the plasticity of eukaryotic genomes, with genome rearrangements constantly reshuffling existing domain combinations. On the other hand, it is interesting that this apparently random process leads to repeated reemergence of the same domain arrangements. The genomes analyzed in this work contain a total of 8,023 distinct domains, which would allow the formation of about 64 × 10⁶ distinct directed domain combinations. And yet in the genomes analyzed here, we observed a total of only 34,778 domain combinations, which corresponds to only about 0.05% of the theoretical maximum. Therefore, we can speculate that the process of domain recombination is not entirely random and that organisms evolved some mechanisms that constrain it in such a way that the chances of harmful, nonsensical arrangements are decreased. Here, we can only speculate about possible mechanisms to implement such constraints, but, for example, this could be achieved via the specific distribution of transposable elements and/or chromosomal locations of preferred recombination “hot spots.”
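The arithmetic behind the "theoretical maximum" can be checked with a few lines. This is only a sketch of the back-of-the-envelope estimate: it assumes that a directed combination is an ordered pair of domains and that a domain may pair with itself, which is what the figure of about 64 million implies. The variable names are illustrative.

```python
# Counts quoted in the text.
n_domains = 8_023   # distinct domains across all analyzed genomes
observed = 34_778   # distinct domain combinations actually observed

# Assumption: a directed combination is an ordered pair (A followed by B),
# with self-pairs (A followed by A) allowed.
possible = n_domains ** 2
fraction = observed / possible

print(f"possible directed combinations: {possible:,}")
print(f"observed share of the maximum:  {fraction:.2%}")
```

Excluding self-pairs (8,023 × 8,022) changes the total by less than 0.02%, so the conclusion does not depend on that assumption.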
The extent to which domain combinations emerged independently is even more striking when viewed from the perspective of individual species. Over 70% of the domain combinations present in the human genome, and about 70% for all vertebrates, have evolved independently in other species at least once. This apparent discrepancy between the global and per-species averages is caused by a large number (over 22,000) of unique, species-specific domain combinations, which, while rare in individual species (about 130 on average, with a median of 57), add up to a large percentage over all species. One can argue that we are seeing two types of domain combinations: “universal, reemerging domain combinations” and “clade-specific, non-reemerging domain combinations.”
One might speculate that domains that tend to appear in independently evolved domain combinations could be functionally different from those that make up combinations that only appeared once. This seems not to be the case, though: preliminary studies using a variety of methods and tools (such as Gene Ontology term enrichment analysis) indicate that there is no significant correlation between domain function and the tendency of domains to appear in independently evolved domain combinations. Similarly, a strong correlation between domain “promiscuity” and presence in reemerging domain combinations could not be observed. On the other hand, the modeling of structures of several specific cases of independently emerged domain combinations indicates that surface features of individual domains could be dramatically different, suggesting dissimilar functions. This interesting issue definitely requires more in-depth analysis.
Observations presented in this paper have important consequences for interpreting similarities and differences between genomes of distantly related organisms. Usually, discovery of a protein with a known domain architecture in a newly studied species is taken as an argument for evolutionary conservation of function of these proteins. This is of particular importance when attempting to transfer protein function from distantly related model organisms, such as from the ecdysozoans Drosophila melanogaster and C. elegans, to vertebrates, such as humans. The high rate of independent domain combination evolution between protostomes and deuterostomes (the second-largest rate; see ) is yet another reason for interpreting results from such model organisms with caution.
Besides estimating the rate of independent domain evolution, we also assessed the number of clade-specific domains and domain combinations. All branches of life (at all levels) have unique domain combinations (combinations not shared with other branches). Due to unequal sampling, it is difficult to compare these numbers. Nevertheless, some issues are worth mentioning. While, as expected, animals have the largest number of unique domain combinations (~12,800, based on 48 genomes, compared to ~4,800 in fungi based on 61 genomes and ~3,700 in green plants based on 33 genomes), within animals there appears to be little-to-no correlation between the number of unique domain combinations and morphological complexity. For example, mammals have ~400 unique domain combinations from 10 genomes, whereas Arthropoda have roughly three times that number (~1,500 from 12 genomes). Clearly, the number of unique domain combinations does not explain the complexity of mammals. In this context, we introduced the concept of clade core domain combinations, combinations exclusively found in each genome of a given clade. It can be argued that such clade core domain combinations provide fundamental and distinguishing functionality for the organisms of a clade. For example, animal core domain combinations are all involved in extracellular matrix/cell–cell adhesion functions and in transcription regulation functions and are thus strongly correlated with the development of multicellular organisms.
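The notion of clade core domain combinations can be illustrated with simple set operations. The sketch below is not the pipeline used in this work. It assumes one reading of the definition (a combination is "core" if it occurs in every genome of the clade and in no genome outside it), and the function name, genome labels, and domain names are all hypothetical toy values.

```python
def clade_core(clade_genomes, other_genomes):
    """Return combinations found in every clade genome and in no other genome.

    Both arguments map a genome name to a set of domain combinations,
    each combination being an ordered tuple of domain names.
    """
    # Present in each genome of the clade.
    in_all = set.intersection(*clade_genomes.values())
    # Present anywhere outside the clade.
    elsewhere = set().union(*other_genomes.values())
    return in_all - elsewhere


# Hypothetical toy data; "A".."H" stand in for real domain names.
animals = {
    "genome_1": {("A", "B"), ("C", "D"), ("E", "F")},
    "genome_2": {("A", "B"), ("C", "D")},
}
non_animals = {
    "genome_3": {("C", "D"), ("G", "H")},
}

core = clade_core(animals, non_animals)
print(core)
```

In this toy example only ("A", "B") qualifies: ("C", "D") also occurs outside the clade, and ("E", "F") is missing from one clade genome.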
In summary, our results stress a recurring theme: evolution is an exceedingly dynamic, and seemingly random, process. New domain combinations are being created and recreated throughout evolution. Each group of organisms (and probably even each organism) has its own solution, based on a partially shared set of building blocks (domains), to shared biochemical and regulatory needs.
As more and more genomes are sequenced, we expect the estimated percentage of independently evolved domain combinations to grow even further. In fact, we expect that, with sufficient data available, the following paradigm of evolution at the domain level will emerge. Major clades (such as animals) have a relatively small set of distinguishing core domain combinations that are essential and defining for members of that clade (such as developmental programs and cell–cell adhesion for animals). Outside of these hierarchical sets of core domain combinations (such as those of eukaryotes, animals, and vertebrates), domains are continually and randomly reshuffled, and the vast majority of combinations keep reemerging and disappearing, both across species and over time.