One of the main motivations to study amphioxus is its potential for understanding the last common ancestor of chordates, which notably gave rise to the vertebrates. An important feature in this respect is the slow evolutionary rate that seems to have characterized the cephalochordate lineage, making amphioxus an interesting proxy for the chordate ancestor, as well as a key lineage to include in comparative studies. Whereas slow evolution was first noticed at the phenotypic level, it has also been described at the genomic level. Here, we examine whether the amphioxus genome is indeed a good proxy for the genome of the chordate ancestor, with a focus on protein-coding genes. We investigate genome features, such as synteny, gene duplication and gene loss, and contrast the amphioxus genome with those of other deuterostomes that are used in comparative studies, such as Ciona, Oikopleura and urchin.
Amphioxus (cephalochordates), and especially the model amphioxus Branchiostoma floridae, are often used as proxies for the ancestor of chordates, notably in molecular studies [1–7], and more recently in genomics [8, 9].
Although many comparative studies aim to reconstruct ancestral genomic features, amphioxus stands out as an organism that is studied especially often as an ancestor proxy. Among article abstracts present in PubMed, 27% of those that include the word ‘amphioxus’ also include some variation of the root ‘ancest*’ (e.g. ancestor, ancestral). This figure is only 11% for ‘Ciona’, the most studied invertebrate chordate, and 3% for ‘Hydra’. The effect is even stronger in amphioxus-related articles highlighted in Faculty of 1000, of which 69% contain the root ‘ancest*’, compared with only 37% for Ciona.
This raises the following question: how good a proxy for the chordate ancestor is amphioxus? Here, we investigate this question from a genomic perspective: how good a proxy for the ancestral chordate genome is the available amphioxus genome? We compare the relevance of the amphioxus genome with those of other invertebrate deuterostomes, whose sequenced genomes are also potentially useful to reconstruct the chordate ancestor: Ciona [10, 11], sea urchin and Oikopleura (Table 1 and Figure 1), plus sea anemone.
One of the ways in which a modern-day genome can diverge from the ancestral state is through secondary gene or genome duplications. First, duplication can lead to an increase in the number of genes in some functional classes, relative to others. Second, duplicate genes can diverge in function, leading to greater functional diversity within the genome.
Notably, all jawed vertebrate genomes share at least two rounds of whole-genome duplication [9, 16], and up to three in the ancestry of teleost fishes. Such events are followed by biased gene loss. Thus, a long-term consequence of whole-genome duplication is that the genome is enriched in certain functional categories, such as transcription factors, or in genes expressed in late development, relative to the ancestor [9, 17–19].
By this criterion, one would expect any other deuterostome genome to be a better representative of the ancestor than any vertebrate genome. However, small-scale duplications can also be an abundant source of divergence, and ‘lineage-specific’ explosive duplications of different gene families in different species appear widespread. Indeed, examples of such lineage-specific duplications have been found in all deuterostome genomes. For example, Oikopleura, which has the smallest chordate genome, has 266 homeobox genes, resulting from 87 amplification events. Interestingly, innate immunity genes have expanded independently in the amphioxus and sea urchin genomes. Although many such examples can be listed, a systematic view is required in order to quantify divergence from the ancestor more accurately.
An important technical problem is that many of these genomes are assembled to a lower quality than those of the vertebrate model organisms, such as human or mouse. Moreover, the assembly is often based on a mixture of haplotypes, from populations with very high levels of polymorphism [discussed in 9, 12, 13]. As a result, it can be difficult to diagnose lineage-specific duplications automatically with an acceptable false positive rate.
To gain some insight into the extent of duplication, we have measured the number of homologs descending from one chordate ancestral gene in each genome, using the following procedure: (i) gene trees from Ensembl that contain at least one vertebrate gene and at least one gene outside vertebrates were used to reconstruct the ancestral chordate complement of genes (15040 genes); (ii) an all-against-all BlastP comparison was performed between the metazoans absent from the Ensembl data set (sea urchin, Oikopleura, sea anemone, amphioxus) and representatives of those present in it (human, chicken, zebrafish, Drosophila, nematode, Ciona intestinalis and Ciona savignyi); (iii) all best reciprocal hits were used to insert the new genes in Ensembl trees according to the species phylogeny. This procedure is very conservative, as fast-evolving duplicates will not be identified, but several paralogs per genome can still be identified if they are best reciprocal hits to different genes of the gene family. The advantage of this procedure is that it removes most false positives, while using a consistent definition that allows a comparison between genomes. Its drawback is that it is biased against discovering new lineage-specific duplicates, especially for gene families that are single copy in all Ensembl genomes. We consider this risk of false negatives in amphioxus and other nonmodel organisms to be preferable to a high level of false positives. Thus, the results should not be taken as indicative of the absolute level of duplication, but rather of the relative amount of duplication in different genomes.
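The best-reciprocal-hit criterion of step (iii) can be illustrated with a minimal Python sketch. It assumes that the BlastP results have already been reduced to (query, subject, bit score) tuples. The function names and the toy gene identifiers are hypothetical, and the sketch does not reproduce the insertion of genes into Ensembl trees.

```python
def best_hits(hits):
    """Return {query: best subject} from (query, subject, bitscore) tuples.

    Ties are resolved in favor of the first hit encountered.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy data (hypothetical identifiers): amph2 is a diverged duplicate of amph1.
amphioxus_vs_human = [
    ("amph1", "hs1", 200.0),
    ("amph1", "hs2", 150.0),
    ("amph2", "hs1", 180.0),
    ("amph3", "hs3", 90.0),
]
human_vs_amphioxus = [
    ("hs1", "amph1", 200.0),
    ("hs1", "amph2", 180.0),
    ("hs2", "amph1", 150.0),
    ("hs3", "amph3", 90.0),
]

pairs = reciprocal_best_hits(amphioxus_vs_human, human_vs_amphioxus)
print(pairs)
```

In this toy case the duplicate amph2 is not paired with hs1, because hs1 has a better hit to amph1. This is the kind of false negative that makes the criterion conservative with respect to lineage-specific duplicates.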
With this procedure, we find the highest number of duplications in zebrafish, followed by the two other vertebrates investigated (Table 2). This shows that whole-genome duplications were the main factor in generating paralogs in chordate genomes (at least those that are sufficiently conserved in sequence to be detected by our approach). Oikopleura, which has the smallest chordate genome, also has fewer duplications, consistent with its general properties of reductive history, whereas amphioxus and Ciona show intermediate levels of duplicate gene retention.
As might be expected, whole-genome duplications have thus had a large impact on vertebrate genomes, suggesting that chordate genomes that did not undergo these duplications are better proxies for the chordate ancestor.
In Table 1, we present the number of best reciprocal BlastP hits (BRBHs) between the sea anemone, an outgroup to bilaterians, and different bilaterian animal genomes. This provides a rough estimate of conserved orthologs between the genomes. If genes were retained in single copy in two species, and did not diverge too much in sequence, then they will be reported. They will also be reported if there were duplications, but one gene copy diverged less than the others, and presumably remained closer to the ancestral function and structure. These are obviously approximations and, notably, a recent study failed to support a correlation between sequence and function conservation. We still believe that this provides a useful estimate of the amount of conservation of ancestral genes in each genome. If a genome lost more genes, or if its genes diverged more, then we expect fewer BRBHs between that genome and the outgroup.
The results are striking: amphioxus has the most BRBHs with sea anemone of any species considered. This is despite the fact that the quality of the genome sequences of many of the model species is better, with deeper sequencing, better assembly and better annotation. Supporting the utility of our approximate measure, Oikopleura has the lowest number of BRBHs, consistent with the known pattern of gene loss and gene remodeling in that lineage. The second highest number of BRBHs, very similar to that of amphioxus, is for the zebrafish. Thus, on this measure, these two genomes are the closest to the ancestral genome. But for zebrafish, this should be combined with three rounds of genome duplication, which implies another form of divergence that amphioxus did not experience.
We next considered only the subset of 15040 genes that were inferred in the ancestor of chordates (Table 2). Of these, amphioxus lost 5491, similar to the 4950 lost in human and far fewer than the 7753 lost in Ciona or the 8198 lost in Oikopleura. Moreover, there is a subset of 4629 genes that were repeatedly lost in different lineages (i.e. human, zebrafish, Ciona, amphioxus) (Figure 2). There are only 925 genes that were lost only in amphioxus. This compares with 701 lost only in vertebrates, but 2354 lost only in Ciona. Thus, amphioxus has retained many more ancestral genes than Ciona, and a similar number to vertebrates. Moreover, these results might be biased by the better quality of the human and zebrafish genomes, i.e. there are probably more false negatives in the amphioxus genome.
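To make the comparison of gene losses easier to read, the counts quoted in this paragraph can be converted into fractions of the 15040 inferred ancestral genes that are still detected in each genome. The short Python sketch below uses only the numbers given in the text; the variable and function names are illustrative.

```python
ANCESTRAL_GENES = 15040  # genes inferred in the chordate ancestor

# Ancestral genes not detected in each genome, as quoted in the text.
losses = {
    "human": 4950,
    "amphioxus": 5491,
    "Ciona": 7753,
    "Oikopleura": 8198,
}


def retained_fraction(lost, total=ANCESTRAL_GENES):
    """Fraction of ancestral genes still detected, given the number lost."""
    return (total - lost) / total


for species, lost in losses.items():
    retained = ANCESTRAL_GENES - lost
    print(f"{species}: {retained} retained ({retained_fraction(lost):.1%})")
```

On these figures, human and amphioxus each retain roughly two-thirds of the ancestral complement, whereas Ciona and Oikopleura retain slightly less than half. These fractions count any retained gene and are therefore distinct from the single-copy retention discussed with Table 2.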
If we combine the results of gene loss and gene duplication, it appears likely that the amphioxus complement of protein-coding genes is close, but not identical, to the ancestral chordate complement. Indeed, it is the only species investigated for which more than half of the ancestral chordate genes are still present in single copy (Table 2) (within the limitations of our reciprocal best hits and of an imperfectly assembled genome).
Although the conservation of protein-coding genes is an important aspect of evolution, there are many other ways in which a genome can diverge from its ancestral state. An interesting global measure of genome evolution is the conservation of synteny, i.e. of gene order and gene neighborhood. Cases in which exact gene order is functionally important, such as the Hox clusters of vertebrates or of insects, appear to be rather exceptional in animals. On the other hand, a more relaxed definition of synteny based on shared gene neighborhood appears to play a functional role in vertebrate genomes [25, 26], and is applicable to the comparison of genomes as distant as human and hydra.
Comparative studies of animal genomes have shown a large variability between lineages in the level of synteny conservation. Despite the limitations of the amphioxus genome assembly, and despite a longer divergence time between amphioxus and vertebrates compared with Ciona and vertebrates, the conservation of gene neighborhood with vertebrates is greater for amphioxus than for C. intestinalis. In total, 74% of amphioxus scaffolds have a significant concentration of orthologs from the same human chromosome, as opposed to 9% of Ciona scaffolds. Even less conserved than Ciona, Oikopleura is the only known chordate genome to show no significant conservation of gene neighborhood with other chordates, at a neighborhood distance of 30 genes. Even Nematostella (sea anemone) and Caenorhabditis elegans have higher conservation with the chordate gene order than Oikopleura.
A comparative estimation of deuterostomes plus hydra showed that the lowest rearrangement rates since the ancestral bilaterian were in the lineages leading to urchin and amphioxus. The vertebrate genomes appeared highly impacted by the whole-genome duplications, which were followed by intense rearrangements. Yet, the amphioxus genome does not appear very strongly conserved in this analysis, and ‘therefore it cannot be assumed to be uniquely representative of the ancestral chordate genome’.
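The text does not state which statistical test underlies the ‘significant concentration’ of orthologs on a scaffold. As one plausible illustration only, and not the method of the cited study, the Python sketch below uses a one-sided hypergeometric test: given N orthologs genome-wide, of which K map to a given human chromosome, it computes the probability that a scaffold carrying n orthologs contains at least k from that chromosome by chance. All names and example numbers are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for a hypergeometric draw.

    N: total orthologs, K: orthologs on the chromosome of interest,
    n: orthologs on the scaffold, k: scaffold orthologs on that chromosome.
    """
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Hypothetical scaffold: 10 orthologs, 6 of which map to a chromosome that
# carries 50 of the 1000 orthologs in the comparison.
p_value = hypergeom_sf(6, 1000, 50, 10)
print(p_value)
```

In a real analysis, such a test would be repeated over every scaffold and chromosome pair, so a correction for multiple testing would be needed before calling any scaffold significant.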
To confirm the extent of synteny conservation between different model genomes and the chordate ancestor, we have used estimated ancestral chordate linkage groups. We plotted the position of these ancestral genes on the amphioxus genome scaffolds, and on scaffolds or chromosomes from other species, ensuring that a similar number of genes were used in each species in order to make comparisons possible. The resulting dot-plots clearly confirm the lack of conservation in Oikopleura, and a similar lack of conservation in Drosophila (data not shown). Some level of synteny conservation is found in C. intestinalis and C. elegans, and a still higher level in amphioxus (Figure 3). The pattern in sea urchin is not clear, because of a lack of mapped orthologs (data not shown). Despite the rearrangements that followed whole-genome duplication, the strongest conservation of synteny is found for vertebrates, notably the chicken (Figure 3). There might be a bias in that the estimation of the ancestral linkage groups used more information from the well-assembled human genome than from less well-assembled genomes. Of note, the patterns observed show clearly the 4-to-1 homology of chicken to the ancestral chordate, due to two whole-genome duplications. Thus, it seems that either chicken or amphioxus provides the best proxy for the ancestral gene arrangement, depending on the importance of working with a nonduplicated genome (i.e. amphioxus), or the importance of having very well conserved synteny (i.e. chicken).
Interestingly, comparative synteny and sequence alignments have been used to identify conserved noncoding elements between vertebrates and amphioxus. Such elements were first identified among vertebrates, but not between vertebrates and other species, although only the C. intestinalis genome was then available. The amphioxus draft genome allowed the detection of a few conserved noncoding elements, which were shown to be functional, i.e. they drive expression in development. Using conserved synteny with vertebrates, Hufton et al. identified 1299 conserved noncoding elements in amphioxus. All vertebrate genomes had many more such elements. Of those that were tested, about half had enhancer activity in vivo. It seems probable that the 1299 elements in amphioxus are representative of the ancestral state, providing an exciting window into gene regulation in ancestral chordate development.
Although the use of the amphioxus as a proxy for the chordate ancestor is frequent in the literature, tests for its appropriateness are much rarer. The identification of many functional conserved noncoding elements in amphioxus, thanks to conserved synteny, is thus particularly interesting, since these elements are not found in other basal chordates, whereas they are highly duplicated in vertebrates. This is consistent with the accumulated evidence from small-scale studies that gene regulation in the amphioxus is probably much closer to the ancestral state (e.g. in terms of transcription factors) than in either the tunicates (because of gene loss and rearrangement) or the vertebrates (because of genome duplication).
The different metrics that we have used paint a picture that is consistent with the conclusions of Hufton et al.: amphioxus is not ‘ancestral’, but has derived from the ancestral chordate in many ways. Yet, it is the least derived of the available species with sequenced genomes, specifically in terms of gene content.
We can reformulate the question as: how useful is amphioxus for reconstructing the ancestral state? If the amphioxus genome is not used as a proxy of the ancestor, but as a data point to reconstruct that ancestor and understand chordate evolution, then it is clear that it is the most useful genome for understanding chordate origins and evolution.
Research in the Robinson–Rechavi lab is supported by the Swiss National Science Foundation, the Swiss Institute of Bioinformatics, the HP2C program of the Swiss National Supercomputing Center, and Etat de Vaud.
Alexandra Louis is a bioinformatics Research Engineer at CNRS, and member of the DYOGEN group at the Institute of Biology of the Ecole Normale Supérieure (IBENS) in Paris. She is working on the reconstruction of ancestral genomes in vertebrates and plants and actively maintains the Genomicus synteny viewer.
Hugues R. Crollius is Research Director at CNRS and Group Leader at the Institute of Biology of the Ecole Normale Supérieure (IBENS) in Paris. He is interested in evolution as a general framework to understand the emergence of genomic and functional properties of organisms through the reconstruction of ancestral genomes.
Marc Robinson-Rechavi is Associate Professor at the Department of Ecology and Evolution of the University of Lausanne, and Group Leader at the Swiss Institute of Bioinformatics. His main interest is in the evolution of animal genomes in the context of organismal function and development.