Atlantic cod (Gadus morhua) is a large, cold-adapted teleost that sustains long-standing commercial fisheries and incipient aquaculture1,2. Here we present the genome sequence of Atlantic cod, showing evidence for complex thermal adaptations in its haemoglobin gene cluster and an unusual immune architecture compared to other sequenced vertebrates. The genome assembly was obtained exclusively by 454 sequencing of shotgun and paired-end libraries, and automated annotation identified 22,154 genes. The major histocompatibility complex (MHC) II is a conserved feature of the adaptive immune system of jawed vertebrates3,4, but we show that Atlantic cod has lost the genes for MHCII, CD4 and Ii that are essential for the function of this pathway. Nevertheless, Atlantic cod is not exceptionally susceptible to disease under natural conditions5. We find a highly expanded number of MHCI genes and a unique composition of its Toll-like receptor (TLR) families. This suggests how the Atlantic cod immune system has evolved compensatory mechanisms within both adaptive and innate immunity in the absence of MHCII. These observations affect fundamental assumptions about the evolution of the adaptive immune system and its components in vertebrates.
We sequenced the genome of a heterozygous male Atlantic cod (NEAC_001, Supplementary Notes 1 and 2) applying a whole-genome shotgun (WGS) approach to 40x coverage (estimated genome size 830 Mb, Supplementary Note 4, Supplementary Figure 2) using 454 technology (Supplementary Note 3). Two programs (Newbler6 and Celera7, Supplementary Notes 5 and 6) produced assemblies with short contigs, yet scaffolds of comparable size to those of Sanger-sequenced teleost genomes (Supplementary Note 10, Supplementary Figure 8). While fragmentation due to short tandem repeats is difficult to address (Supplementary Note 7), we resolved numerous gaps due to heterozygosity (Supplementary Note 8). The assemblies differ in scaffold and contig length (Table 1), though their scaffolds align to a large extent (Supplementary Note 9, Supplementary Figure 7). We obtained ~1 million SNPs by mapping 454 and Illumina reads from the sequenced individual to the Newbler assembly (Supplementary Note 11). Both assemblies cover over 98% of the reads from an extensive transcriptome dataset, indicating that the proteome is well represented (Supplementary Note 13). The assemblies are consistent with four independently assembled BAC insert clones (Supplementary Note 14, Supplementary Figure 9) and with the expected insert size of paired BAC-end reads (Supplementary Note 15, Supplementary Figure 10).
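The relation between coverage, genome size and sequencing yield is simple arithmetic. As a minimal sketch (the function and variable names are illustrative and do not come from the study), 40x coverage of an estimated 830 Mb genome corresponds to roughly 33 Gb of sequence:

```python
def bases_required(genome_size_bp: float, coverage: float) -> float:
    """Total sequenced bases needed to reach a given average coverage."""
    return genome_size_bp * coverage


GENOME_SIZE_BP = 830e6  # estimated genome size reported in the text
COVERAGE = 40           # reported whole-genome shotgun coverage

total_bp = bases_required(GENOME_SIZE_BP, COVERAGE)
print(f"{total_bp / 1e9:.1f} Gb")  # 33.2 Gb
```

This is an average over the whole genome; local coverage varies with repeat content and library type.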
A standard protein-evidence based annotation approach was complemented with a whole genome alignment with stickleback (Gasterosteus aculeatus), after repeat-masking 25.4% of the Newbler assembly (Supplementary Note 16, Supplementary Table 6). In this way, 17,920 out of 20,787 protein-coding stickleback genes were mapped onto reorganized scaffolds (Supplementary Note 17). Additional protein-coding genes, pseudogenes and non-coding RNAs were annotated using the standard Ensembl pipeline. These approaches resulted in a final gene set of 22,154 genes (Supplementary Table 7). Comparative analysis of Gene Ontology (GO) classes indicates that the major functional pathways are represented in the annotated gene set (Supplementary Note 18, Supplementary Figure 11). We anchored 332 Mb of the Newbler assembly to 23 linkage groups of an existing Atlantic cod linkage map using 924 SNPs8 (Supplementary Note 19, Supplementary Table 8). These linkage groups have distinct orthology to chromosomes of other teleosts based on the number of co-occurring genes, showing that the WGS assembly reflects the expected chromosomal ancestry (Figure 1, Supplementary Note 20, Supplementary Table 9).
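To put the annotation and anchoring figures in proportion, the following sketch computes two ratios from numbers given in the text. The anchored fraction is expressed relative to the estimated genome size (830 Mb) rather than the Newbler assembly length, which is an assumption made here because Table 1 is not reproduced.

```python
# Figures taken from the text above.
stickleback_genes_total = 20_787
stickleback_genes_mapped = 17_920
anchored_mb = 332
estimated_genome_mb = 830  # assumption: estimated genome size used as denominator

mapped_fraction = stickleback_genes_mapped / stickleback_genes_total
anchored_fraction = anchored_mb / estimated_genome_mb

print(f"stickleback genes mapped: {mapped_fraction:.1%}")   # 86.2%
print(f"sequence anchored to map: {anchored_fraction:.1%}")  # 40.0%
```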
Well-studied haemoglobin polymorphisms are indicative of functional molecular adaptation to thermal variation in Atlantic cod9,10,11,12. Its genome contains nine α- and β-globin genes that are organized in two unlinked clusters, β5-α1-β1-α4 and β3-β4-α2-α3-β2 (refs 13, 14). We discovered a 73 bp indel polymorphism in the intergenic promoter region of the α1-β1 globin pair (Figure 2a, Supplementary Note 21). This promoter polymorphism occurs in highly significant linkage disequilibrium with two known polymorphic sites, the Val55β1Met and Ala62β1Lys substitutions11, in eight Atlantic cod populations (Supplementary Note 22, Supplementary Figure 12). In fact, in the three most northern Atlantic and both Baltic populations, cod β1 globin predominantly occurs as a single homozygous genotype consisting of the long promoter and the Val55-Ala62 allele (Supplementary Table 10). By placing the two promoter variants in front of a luciferase reporter gene and transfecting the constructs into salmon kidney cells (Supplementary Note 23), we found that temperature and promoter type have a significant interaction effect (GLM, F(2,36) = 7.85, P = 0.007, Figure 2b) and that the long promoter has a two-fold higher transcriptional activity compared to the short promoter at 15 °C and 20 °C. Increased globin synthesis of the Val-Ala allele would compensate for its lower oxygen affinity10,11 at high temperatures. Thus, the promoter polymorphism provides a molecular compensatory mechanism that helps maintain the total oxygen carrying capacity15. The tight linkage between the two types of polymorphism provides a compelling example of the coevolution of structural and regulatory adaptation and highlights the relationship between temperature and functional molecular variation in the haemoglobin system16.
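Linkage disequilibrium between two biallelic sites, such as the promoter indel and a coding substitution, is commonly summarized by D and r². The sketch below shows the textbook calculation from phased haplotype counts. It is not the procedure of Supplementary Note 22, and the counts are hypothetical values chosen only to illustrate the formula.

```python
def linkage_disequilibrium(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D, r_squared) for two biallelic loci from haplotype counts.

    A/a could stand for long/short promoter, B/b for Val55/Met55.
    Assumes phased haplotypes and that both loci are polymorphic
    (otherwise the denominator of r^2 is zero).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n                 # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n         # frequency of allele A
    p_B = (n_AB + n_aB) / n         # frequency of allele B
    d = p_AB - p_A * p_B            # deviation from independence
    r_squared = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, r_squared


# Hypothetical haplotype counts, not data from the study.
d, r2 = linkage_disequilibrium(45, 5, 5, 45)
print(f"D = {d:.2f}, r^2 = {r2:.2f}")  # D = 0.20, r^2 = 0.64
```

With unphased genotype data, haplotype frequencies would first have to be estimated (for example by an EM algorithm) before applying this formula.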
The Atlantic cod immune system has unusual properties that set it apart from other teleosts, i.e. high levels of IgM17, a minimal antibody response after pathogen exposure5,17,18 and abundant phagocytic neutrophils in its peripheral blood19,20. Despite speculation, the exact causes for these differences remain unknown5. We found that most genes involved in the vertebrate immune response are present in Atlantic cod (Supplementary Note 24, Supplementary Figure 13, Supplementary Table 11). Nevertheless, we did not find the major histocompatibility complex (MHC) II isoforms, their assembly and trafficking chaperone Invariant chain (Ii)21 and the MHCII interacting protein CD4, essential for helper T cell activation. By comparing a comprehensive set of vertebrate MHCII, CD4 and Ii sequences to the genome assemblies and all unassembled 454 and Illumina sequencing reads (a dataset of ~49.5 Gb), we detected a truncated pseudogene for CD4 (Supplementary Note 25), which is located in a region of conserved synteny (Supplementary Note 27, Supplementary Figure 18). No traces of MHCII and Ii were found in syntenic regions (Supplementary Note 27, Supplementary Figures 16, 17, 19 and 20) and qPCR targeting a conserved domain in MHCII did not amplify the target sequence (Supplementary Note 26, Supplementary Figure 15). The absence of MHCII and Ii and the pseudogenic nature of CD4 show that Atlantic cod has lost function of the classical pathway for adaptive immunity against bacterial and parasitic infections. Nevertheless, Atlantic cod deals adequately with its prevailing pathogen load in its natural ecological settings5. Earlier transcriptional (cDNA) studies in Atlantic cod have indicated an expansion of the number of MHCI loci22,23. By targeting the conserved MHCI α3 domain in genomic DNA using qPCR, we more accurately quantified the number of loci belonging to the teleost U-lineage24 (Supplementary Note 28).
Remarkably, Atlantic cod has ~100 classical MHCI loci, which is a highly expanded number compared to other teleosts (Figure 3a). A phylogenetic analysis of teleost MHCI sequences supports two clades in cod (Figure 3b, Supplementary Note 29). Within each clade, the mutation patterns display statistically significant signs of positive selection that are indicative of subfunctionalization. These findings suggest that loss of MHCII functionality has coincided with a more versatile usage of the cytosolic pathway of MHCI. Two different MHCI antigen presentation pathways – the classical and the alternative cross-presentation pathway – can initiate immune responses in mammals25. The cross-presentation pathway represents a structural and cellular modification of the MHCI machinery that allows activation of CD8+ T cells upon bacterial infection. The Atlantic cod cytokine gene profile (Supplementary Table 11) supports the possibility of generating different CD8+ T cell subsets that provide either direct protection or regulate other immune cells and thus compensate for the loss of CD4+ T cells.
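Estimating locus number by qPCR on genomic DNA generally rests on comparing the amplification threshold cycle (Ct) of the target with that of a reference gene of known copy number. The sketch below shows this generic relative-quantification calculation. It is not the protocol of Supplementary Note 28; the function name, the Ct values and the assumption of equal, perfect amplification efficiency for both amplicons are illustrative only.

```python
def relative_copy_number(ct_target: float,
                         ct_reference: float,
                         efficiency: float = 2.0,
                         reference_copies: float = 1.0) -> float:
    """Estimate target copies per genome relative to a reference gene.

    efficiency is the fold increase per cycle (2.0 = perfect doubling)
    and is assumed to be identical for both amplicons.
    """
    return reference_copies * efficiency ** (ct_reference - ct_target)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier
# than a single-copy reference, implying about 2**3 = 8 copies.
copies = relative_copy_number(ct_target=22.0, ct_reference=25.0)
print(copies)  # 8.0
```

Because the estimate is exponential in the Ct difference, small errors in Ct or unequal primer efficiencies translate into large errors in the inferred copy number, which is why such figures are usually reported as approximate.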
In addition to the MHCI expansion, we find an unusual composition of the highly conserved Toll-like receptor (TLR) families that play a fundamental role in the innate immune response and the initial detection of pathogens. Teleost TLR genes occur in well-supported phylogenetic clusters, most of which share functional properties with mammalian orthologs, though some are fish-specific26. The Atlantic cod TLR genes form monophyletic groups within the known teleost functional groups (Figure 4, Supplementary Note 30, Supplementary Figure 22). Several bacterial surface antigen-recognizing TLRs (TLR1, -2 and -5) are absent, however, leaving only the teleost-specific TLR14/18 as members of the TLR1 family in Atlantic cod. Moreover, multiple families of nucleic acid-recognizing TLRs (TLR7, -8, -9 and -22) have markedly expanded, resulting in the highest number of TLRs found in a teleost so far. This TLR repertoire suggests that the Atlantic cod immune system relies relatively more on nucleic acid-detecting TLRs in order to recognize bacterial pathogens. Notably, the gene expansion of TLR9 coincides with an expansion of interleukin 8 genes (IL-8, Supplementary Table 11). IL-8 is an important chemokine in the innate immune response and is directly induced by TLR9 in human neutrophils27. The corresponding expansions of IL-8 and TLR9 indicate that this signalling cascade is particularly important in Atlantic cod.
The loss of MHCII function and lack of CD4+ T cell response represent a fundamental change in how the adaptive immune system is initiated and regulated in Atlantic cod. The marked expansion of MHCI genes and unusual TLR composition signify a shift of its immune system in handling microbial pathogens. An expanded MHCI repertoire in the presence of a non-polymorphic MHCII is found in a distant vertebrate, the axolotl (Ambystoma mexicanum)28,29. These observations suggest that anomalous immune systems (possibly analogous to that of Atlantic cod) have evolved independently. Additionally, we did not recover evidence for expressed MHCII, CD4 and Ii in the transcriptomes of three other gadoids, indicating that the unusual immune system is a derived characteristic of the gadoid lineage (Supplementary Tables 18 and 19).
We have provided the first annotated genome of a species that supports extensive fisheries and is on the verge of becoming an important aquaculture species. This work provides a major foundation for addressing key issues related to the management of natural Atlantic cod populations, such as the concept of fisheries-induced evolution, which dictates that selective harvesting changes the evolutionary trajectory in major life history traits of natural populations30. Moreover, our novel immune findings allow for more targeted vaccine development, aiding both disease management and the domestication of Atlantic cod. Importantly, these findings change fundamental assumptions regarding the evolution of the vertebrate immune system.
Detailed methods on the sequencing and assembly of data from genomic and transcriptomic origin, annotation, synteny analyses, transfection experiments, bioinformatic analyses and phylogenetic analyses presented in this manuscript are described in the Supplementary Information.
This work was supported by a grant from the Research Council of Norway (FUGE program) to KSJ. The authors wish to thank the following people and organisations: the 454 Life Science Sequencing Center (Branford, USA); the 454 and Illumina nodes of the Norwegian Sequencing Centre (Univ. of Oslo); Michael Egholm (formerly 454 Life Science); the Norwegian Metacenter for Computational Science (Notur) and the Norwegian Storage Infrastructure (Norstore); the Research Computing Services (RCS) group, especially Bjørn-Helge Mevik, at the Center for Information Technology (USIT, Univ. Oslo); Brian Walenz from Celera (San Francisco, USA); the Canadian Cod Genomics and Broodstock Development Consortium; Pål Olsvik, Kai Lie and Elisabeth Holen at the Norwegian National Institute of Nutrition and Seafood Research (NIFES); Junita Gaup and Hege Bakke (CEES, Univ. of Oslo); Matthew Kent (CIGENE, Norwegian University of Life Sciences); Sharen Bowman (Genome Atlantic, Canada); the FUGE bioinformatics platforms’ group, especially Svenn Grindhaug, at CBU (Univ. Bergen, Norway); Inger Sandlie and Ole B. Landsverk (Centre for Immune Regulation, Univ. of Oslo); Roche Norway.
This manuscript is dedicated to the memory of Prof. Lars Pilström and Prof. Rene J.M. Stet. Their research inspired our work to further understand the Atlantic cod immune system.
Author Contributions DNA and RNA isolation, library construction and sequencing: A.T.K., M.S., M.H.S., T.B.R., M.M., M.E., B.S., A.J.N. and J.T. Sanger BAC (end-) sequencing: H.K. and R.R. Assembly: A.J.N., B.S., A.S. and A.L. Linkage map analyses: K.G.T. and B.S. SNP analyses: K.G.T., P.R.B., S.L. and A.J.N. Annotation: J.H.V., B.A. and S.S. Repeat analyses: B.S. Synteny analyses: J.P. and B.S. Haemoglobin analyses: Ø.A., O.F.W., B.S. and T.G. Bioinformatics: A.J.N., B.S., A.S., T.B.R., J.P., C.P., C.N., R.B.E., R.W., J.K., K.L., A.L., I.J., M.M., K.M., P.R.B., K.G.T. and M.H.S. Immune analyses: U.G., M.M., M.H.S., M.E., B.S., B.O.K., T.M., K.L., S.D.J. and T.B.R. Interpretation of immune results: U.G., T.F.G., S.J., B.S. and K.S.J. 454 contributions: L.D. Revisions: Ø.A., T.M., S.D.J., F.N., I.J., S.J., N.C.S. and S.W.O. Project initiation: S.W.O., I.J., F.N., S.L., N.C.S. and K.S.J. Project coordination: S.J. Consortium leader: K.S.J.
The unassembled sequencing reads and Newbler assembly have been deposited at ENA-EMBL under the accession numbers CAEA01000001 through CAEA01554869. The annotation is available through Ensembl at http://www.ensembl.org/index.html. These and more resources are additionally available through http://codgenome.no.
The authors declare no competing financial interests.