The extant jawed vertebrates comprise three major lineages: the cartilaginous fishes, the lobe-finned fishes, and the ray-finned fishes, with the cartilaginous fishes constituting an outgroup to the other two. Cartilaginous fishes thus constitute a critical reference for understanding the evolution of jawed vertebrates. The survey sequencing of the elephant shark, the first cartilaginous fish genome to be characterized to this depth, has provided useful information regarding the length, gene complement, and organization of the genome, and has highlighted specific examples of vertebrate genes and gene families that have been lost differentially in the mammalian and teleost fish lineages. The 1.4× coverage elephant shark sequence generated in this study contains partial or complete sequences for about 15,000 unique genes. These sequences can serve as probes for isolating genomic clones and for obtaining complete sequences of gene loci of interest on a priority basis. At 0.91 Gb, the elephant shark genome is similar in length to that of the chicken (1.05 Gb), about half that of the zebrafish (~1.7 Gb), and about one-third that of the human (2.9 Gb). It is about twice the length of the fugu and Tetraodon genomes (~0.4 Gb), which are the smallest among vertebrates. The elephant shark genome is the smallest among known cartilaginous fish genomes, and is thus an ideal candidate for economical whole-genome sequencing and for comparative analysis.
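The length comparisons in the preceding paragraph can be checked with simple arithmetic. The following is a minimal sketch that uses only the genome lengths quoted in the text; the dictionary and function names are illustrative and do not correspond to any analysis performed in this study.

```python
# Genome lengths in Gb, as quoted in the paragraph above.
# The fugu and Tetraodon genomes share a single approximate value in the text.
GENOME_GB = {
    "elephant shark": 0.91,
    "chicken": 1.05,
    "zebrafish": 1.7,
    "human": 2.9,
    "fugu/Tetraodon": 0.4,
}


def relative_length(species: str, reference: str = "elephant shark") -> float:
    """Return the reference genome length divided by the length for `species`."""
    return GENOME_GB[reference] / GENOME_GB[species]


if __name__ == "__main__":
    for name in GENOME_GB:
        if name != "elephant shark":
            print(f"elephant shark / {name}: {relative_length(name):.2f}")
```

Under these values the elephant shark genome comes out at slightly more than half the zebrafish length, slightly less than one-third of the human length, and a little over twice the pufferfish length, in line with the rounded statements in the text.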
A major drawback in comparisons between human and teleost fish genomes is the presence of many duplicate gene loci in teleost fishes due to the additional fish-specific whole-genome duplication event in the ray-finned fish lineage. Analysis of Hox genes in the elephant shark assembly indicates that the elephant shark genome has not undergone a lineage-specific whole-genome duplication. Interestingly, the human and elephant shark genomes exhibit a higher level of conserved synteny than the human and zebrafish genomes, even though humans are more closely related to zebrafish than they are to the elephant shark. The disruption of syntenic blocks in the teleosts may be partly related to differential loss of duplicate copies of genes following the fish-specific genome duplication event. The elephant shark also exhibits a higher level of sequence similarity with humans. More mammalian UCEs, which include both coding and noncoding sequences, were identified in the elephant shark genome than in the zebrafish and fugu genomes. In a related study, we have shown that twice as many noncoding elements are conserved between the human and elephant shark genomes as between the human and zebrafish or fugu genomes [40]. The higher level of sequence similarity between the elephant shark and humans could be due to a decelerated evolutionary rate of elephant shark DNA compared with human and teleost DNA, or to an accelerated evolutionary rate of teleost sequences compared with the elephant shark and human genomes. Analysis of mitochondrial DNA sequences from 12 lineages of sharks belonging to the elasmobranch lineage has shown that the nucleotide substitution rate in sharks is 7- to 8-fold slower than in mammals [55]. The evolutionary rate of the mitochondrial proteins ND2 and Cytb was also found to be slower (about one-fourth) in these sharks than in mammals [56]. These studies suggest that the evolutionary rate of DNA in cartilaginous fishes is slower than that in mammals. Comparisons of evolutionary rates of protein-coding genes in Tetraodon, fugu, zebrafish, and other teleosts have shown that the fish coding sequences have been evolving at a faster rate than their mammalian orthologs, and that the duplicated pairs of fish genes are evolving at asymmetric rates [6,15,16,57,58]. Duplicated fish genes also tend to accumulate complementary degenerate mutations in the coding and noncoding sequences, resulting in partitioning of regulatory elements and exons between the two copies [59–62]. Such partitioning could reduce the level of sequence conservation between each of the duplicate copies and its ortholog in humans. Thus, the higher level of sequence similarity between the elephant shark and humans, compared with that between teleost fish and humans, could be the result of both a decelerated evolutionary rate of elephant shark DNA and an accelerated evolutionary rate of teleost fish sequences. The higher degree of conserved synteny and conserved sequence between the human and elephant shark genomes than between the human and teleost fish genomes, together with the absence of evidence for a lineage-specific whole-genome duplication event in the elephant shark lineage, underscores the importance of the elephant shark genome as a model jawed vertebrate genome for comparative analysis of human and other jawed vertebrate genomes.
Cartilaginous fishes are the oldest phylogenetic group of jawed vertebrates that possess an adaptive immune system. Analysis of the elephant shark genome sequences has identified genes for all components of the adaptive immune system (e.g., T-cell receptors, immunoglobulins, and RAG and MHC genes) known in tetrapods and teleosts, as well as a unique family of doubly rearranging antigen receptor (NAR-TcR) genes previously reported only in elasmobranch cartilaginous fishes [43]. The presence of this unique family of genes in the elephant shark, a holocephalian, indicates that NAR-TcR existed in a common ancestor of all cartilaginous fishes. Thus, cartilaginous fishes appear to have evolved a distinct type of adaptive immune system after they diverged from their common ancestor with bony fishes. The physiological significance of such a unique adaptive immune system remains to be understood.
The number of Hox gene clusters in vertebrates illuminates the history of genome duplications during vertebrate evolution. It has been proposed that the evolution of phenotypic complexity in vertebrates was accomplished through two rounds of whole-genome duplication (the “2R” hypothesis) during the evolution of vertebrates from invertebrates [63]. Although the presence of four mammalian paralogs for many single genes in invertebrates [64] and of four Hox clusters in mammals compared with a single Hox cluster in amphioxus are consistent with this hypothesis, the exact timings of the two rounds of genome duplication are unclear. The identification of four putative clusters of Hox genes in the elephant shark in the present study indicates that the two rounds of genome duplication occurred before the divergence of the cartilaginous fish and bony fish lineages. Because analyses of Hox genes in jawless vertebrates such as the lamprey show that at least one round of genome duplication (“1R”) occurred before the divergence of the jawless and jawed vertebrate lineages, it can be inferred that the second round of duplication (“2R”) occurred after the divergence of the jawless and jawed vertebrate lineages but before the split of the cartilaginous fish and bony fish lineages. The presence of almost twice as many Hox clusters in teleost fishes as in mammals and the elephant shark supports an additional whole-genome duplication event in the ray-finned fish lineage. This more recent fish-specific genome duplication event, referred to as “3R,” has been hypothesized to be responsible for the rapid speciation and diversity of teleosts [61]. Thus, genome duplication has continued to play an important role in the evolution of vertebrates even after the emergence of bony vertebrates.
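The reasoning from Hox cluster counts to rounds of genome duplication can be illustrated with a short calculation. The sketch below assumes an idealized model in which a single ancestral Hox cluster is doubled by each whole-genome duplication and clusters may afterwards be lost but not gained, so that the minimum number of rounds is the base-2 logarithm of the cluster count, rounded up. The function name is hypothetical, and the teleost count of seven is an illustrative value standing in for the "almost twice" four stated in the text, which gives no exact number.

```python
# Hox cluster counts. Amphioxus, human, and elephant shark values come from
# the text; the teleost value is an ASSUMED illustrative count for
# "almost twice" the four clusters found in mammals.
HOX_CLUSTERS = {
    "amphioxus": 1,
    "elephant shark": 4,
    "human": 4,
    "teleost (assumed)": 7,
}


def min_wgd_rounds(hox_clusters: int) -> int:
    """Fewest whole-genome duplications that could produce this many clusters.

    Idealized model: one ancestral cluster, each duplication doubles the
    count, and later events can only remove clusters.
    """
    if hox_clusters < 1:
        raise ValueError("expected at least one Hox cluster")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, computed with
    # exact integer arithmetic rather than floating-point logarithms.
    return (hox_clusters - 1).bit_length()


if __name__ == "__main__":
    for lineage, count in HOX_CLUSTERS.items():
        print(f"{lineage}: {count} cluster(s), >= {min_wgd_rounds(count)} round(s)")
```

Under this model, four clusters in both the elephant shark and mammals imply at least two rounds in their common ancestor, and any count above four implies at least one further round, matching the 2R and 3R inferences drawn above. The model gives only a lower bound and does not by itself distinguish whole-genome duplication from independent cluster duplication.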
In this project, we have taken a survey sequencing approach to characterize the elephant shark genome. Previously, a survey sequencing approach was used to estimate several global parameters of the dog genome, such as its length, repeat content, and neutral mutation rate [23]. At 1.5× coverage, that survey included partial sequence data for dog orthologs of ~75% of annotated human genes, and revealed that >4% of intergenic sequence is conserved between the dog, human, and mouse. More complete sequencing of the dog genome has confirmed the accuracy of these estimates [24]. The survey sequencing approach has now been recognized as an effective and economical way of rapidly characterizing the large genomes of closely related vertebrates for which few or no genomic sequences or genetic/physical maps exist. Here, we have shown that a survey sequencing approach can also be productively used for characterizing even the most distantly related vertebrate genomes. In contrast to the paired-end sequencing of short-insert plasmid libraries used in the conventional whole-genome shotgun strategy, survey sequencing of the elephant shark genome was based on paired-end sequencing of fosmid clones. This approach allows accurate assembly of dispersed repeats that are larger than 2–3 kb and provides long-range linkage information that can be used to determine conserved synteny between species. Fosmid clones are also valuable templates for filling gaps in the assembly and for obtaining complete sequences of gene loci of interest. We propose survey sequencing to a depth of 1.5–2× based on paired-end sequencing of large-insert libraries as an effective and economical approach for characterizing distantly related vertebrate genomes.
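To give a rough sense of what low-coverage survey sequencing can sample, the sketch below applies the standard idealized Poisson (Lander-Waterman) model, under which the expected fraction of bases covered by at least one read at coverage c is 1 - exp(-c). This model is an assumption introduced here for illustration and was not part of the analysis in this study; it assumes uniformly random read placement and ignores repeats, cloning bias, and assembly effects. The function name is hypothetical.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read.

    Idealized Poisson model: reads land uniformly at random, so the chance
    that a given base is missed at coverage c is exp(-c).
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # Coverage depths mentioned in the text: 1.4x (elephant shark survey),
    # 1.5x (dog survey), and the proposed 1.5-2x range.
    for c in (1.4, 1.5, 2.0):
        print(f"{c:.1f}x coverage: ~{expected_fraction_covered(c):.0%} of bases sampled")
```

Under this simplified model, coverage of 1.4× to 1.5× samples roughly three-quarters of the bases in a genome, and 2× samples somewhat more. The fraction of genes with at least partial sequence is a different quantity from the fraction of bases covered, so these numbers are only a loose guide to gene-level yield.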