One way in which a modern-day genome can diverge from the ancestral state is through secondary gene or genome duplications. First, duplication can increase the number of genes in some functional classes relative to others. Second, duplicate genes can diverge in function, leading to greater functional diversity within the genome [15].
Notably, all jawed vertebrate genomes share at least two rounds of whole-genome duplication [9], and up to three in the ancestry of teleost fishes. Such events are followed by biased gene loss. Thus, a long-term consequence of whole-genome duplication is that the genome is enriched in certain functional categories, such as transcription factors, or in genes expressed in late development, relative to the ancestor [9].
Under this metric, one could expect any other deuterostome genome to be a better representative of the ancestor than any vertebrate genome. However, small-scale duplications can also be an abundant source of divergence, and ‘lineage-specific’ explosive duplications of different gene families in different species appear widespread [20]. Indeed, examples of such lineage-specific duplications have been found in all deuterostome genomes. For example, Oikopleura, which has the smallest chordate genome, has 266 homeobox genes, resulting from 87 amplification events [13]. Interestingly, innate immunity genes have expanded independently in the amphioxus and sea urchin genomes [8]. Although many more such examples could be listed [21], a systematic view is required to quantify divergence from the ancestor more accurately.
An important technical problem is that many of these genomes are assembled to a lower quality than those of vertebrate model organisms such as human or mouse. Moreover, the assembly is often based on a mixture of haplotypes from populations with very high levels of polymorphism [discussed in 9, 12, 13]. As a result, it can be difficult to diagnose lineage-specific duplications automatically with an acceptable false-positive rate.
To gain some insight into the extent of duplication, we have measured the number of homologs descending from one chordate ancestral gene in each genome, using the following procedure: (i) gene trees from Ensembl [22] that contain at least one vertebrate gene and at least one gene outside vertebrates were used to reconstruct the ancestral chordate complement of genes (15,040 genes); (ii) an all-against-all BlastP comparison was performed between the metazoan genomes absent from the Ensembl data set (sea urchin, Oikopleura, sea anemone, amphioxus) and representatives of the latter (human, chicken, zebrafish, Drosophila, nematode, Ciona intestinalis and Ciona savignyi); (iii) all best reciprocal hits were used to insert the new genes into the Ensembl trees according to the species phylogeny. This procedure is very conservative, as fast-evolving duplicates will not be identified, but several paralogs per genome can still be identified if they are best reciprocal hits to different genes of the gene family. The advantage of this procedure is that it removes most false positives while using a consistent definition that allows comparison between genomes. It is biased against discovering new lineage-specific duplicates, especially for gene families that are single copy in all Ensembl genomes. We consider this risk of false negatives in amphioxus and other non-model organisms to be preferable to a high level of false positives. Thus, the results should not be taken as indicative of the absolute level of duplication, but rather of the relative amount of duplication in different genomes.
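Step (iii) relies on best reciprocal hits. As an illustration only, the selection can be sketched as follows, assuming the BlastP results have already been parsed into (query, subject, bit score) tuples. The function names, the gene identifiers and the scores are hypothetical and are not taken from the original analysis; the sketch also ignores the subsequent insertion into gene trees.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bitscore) tuples
    (assumed input format). Ties keep the first subject seen.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a and b are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy data (invented): three genes from a new genome against two reference genes.
new_vs_ref = [
    ("amphi1", "humA", 250.0),
    ("amphi1", "humB", 120.0),
    ("amphi2", "humB", 200.0),
    ("amphi3", "humA", 90.0),   # divergent duplicate, not the best hit of humA
]
ref_vs_new = [
    ("humA", "amphi1", 245.0),
    ("humA", "amphi3", 88.0),
    ("humB", "amphi2", 198.0),
    ("humB", "amphi1", 118.0),
]

rbh = reciprocal_best_hits(new_vs_ref, ref_vs_new)
print(rbh)
```

In this toy case the divergent copy `amphi3` is discarded, which mirrors the conservative behaviour described above: fast-evolving duplicates are missed, whereas `amphi1` and `amphi2` are both retained because they are best reciprocal hits to different members of the family.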
With this procedure, we find the highest number of duplications in zebrafish, followed by the two other vertebrates investigated (). This shows that whole-genome duplications were the main factor in generating paralogs in chordate genomes (at least those that are sufficiently conserved in sequence to be detected by our approach). Oikopleura, which has the smallest chordate genome, also has fewer duplications, consistent with its general properties of reductive history, whereas amphioxus and Ciona show intermediate levels of duplicate gene retention.
Number of descendants of ancestral chordate genes
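For readers who wish to reproduce this kind of tally, the following minimal sketch counts descendants per ancestral gene in each genome. It assumes that each gene has already been assigned to an ancestral chordate gene; the input format, the function names and the toy values are hypothetical and do not reproduce the results reported here.

```python
from collections import Counter


def descendants_per_ancestor(assignments):
    """Count, for each species, the genes descending from each ancestral gene.

    `assignments` is an iterable of (species, ancestral_gene, gene_id)
    tuples (assumed input format).
    """
    counts = {}
    for species, ancestor, _gene_id in assignments:
        counts.setdefault(species, Counter())[ancestor] += 1
    return counts


def copy_number_distribution(counts, ancestors):
    """For each species, count how many ancestral genes have 0, 1, 2, ... descendants."""
    return {
        species: Counter(per_ancestor.get(a, 0) for a in ancestors)
        for species, per_ancestor in counts.items()
    }


# Toy data (invented): three ancestral genes, two genomes.
ancestors = ["anc1", "anc2", "anc3"]
assignments = [
    ("zebrafish", "anc1", "z1"),
    ("zebrafish", "anc1", "z2"),
    ("zebrafish", "anc2", "z3"),
    ("amphioxus", "anc1", "a1"),
    ("amphioxus", "anc2", "a2"),
    ("amphioxus", "anc3", "a3"),
]

counts = descendants_per_ancestor(assignments)
distribution = copy_number_distribution(counts, ancestors)
print(distribution)
```

Comparing such distributions between genomes gives a relative, not absolute, measure of duplication, in keeping with the caveats stated for the procedure.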
As might be expected, whole-genome duplications have thus had a large impact on vertebrate genomes, suggesting that chordate genomes that did not undergo these duplications are better proxies for the chordate ancestor.