Carotenoids are commonly deposited in the gonads of marine bivalves but rarely in their adductor muscles. An orange-adductor variant was identified in our breeding program for the bay scallop Argopecten irradians. In the present study, bay scallop genome survey sequencing was conducted, followed by genotyping by sequencing (GBS)-based case-control association analysis in a selfing family that exhibited segregation in adductor color. K-mer analysis (K=17) revealed that the bay scallop genome is about 990 Mb in length. De novo assembly produced 217,310 scaffold sequences, which provided 72.1% coverage of the whole genome and covered 72,187 transcripts, thereby yielding the most informative sequence resource for bay scallop to date. The average carotenoid content of the orange-adductor progenies was significantly higher than that of the white-adductor progenies. Thus, 20 individuals of each subgroup were sampled for case-control analysis. As many as 15,224 heterozygous loci were identified in the parent, among which 9280 were genotyped in at least 10 individuals of each of the two sub-groups. Association analysis indicated that 126 SNPs were associated with carotenoid accumulation in the adductor muscle and that 88 of these were significantly enriched on 28 scaffolds (FDR controlled P < 0.05). The SNPs and genes located on these scaffolds can serve as valuable candidates for further research into the mechanisms by which marine bivalves accumulate carotenoids in their adductor muscles.
Carotenoids are widespread yellow and orange organic pigments that are synthesized by plants and other photosynthetic organisms, including some bacteria and fungi. The only animals known to produce carotenoids are pea aphids (Acyrthosiphon pisum) and spider mites (Tetranychus urticae), which have acquired this ability via gene transfer from fungi 1, 2. About 750 naturally occurring carotenoids had been reported as of 2004, and more than 20 new carotenoid structures have been reported annually since then 3. In addition to their function in pigmentation, carotenoids possess an array of other beneficial properties, including anti-oxidative, anti-tumor, and immune enhancement activities 4.
Even though animals lack the ability to synthesize carotenoids de novo, they can acquire carotenoids from their diet and metabolize them into other forms 5. Many marine animals have been reported to accumulate carotenoids, as reviewed by Maoka 6. In marine bivalves, carotenoids are mainly deposited in gonad tissues, change along with gonad development and maturation 7-9, and play important roles in reproduction 10. In recent years, however, carotenoid accumulation has been observed in the adductor muscles of the Yesso scallop Patinopecten yessoensis 11 and the noble scallop Chlamys nobilis 12. A carotenoid extracted from the pigmented muscle tissues of Yesso scallop was characterized as pectenolone 11, which is a common carotenoid among marine animals. Several studies have been conducted to dissect the genetic basis of carotenoid accumulation in the adductor muscles of the Yesso scallop 13 and the noble scallop 14. However, the underlying mechanisms by which the adductor muscles of marine bivalves accumulate carotenoids remain unclear.
During a field study, we observed a very low rate (about 0.1% or less) of color variation in the adductor muscles of the bay scallop Argopecten irradians cultivated in China. The color variation is likely related to carotenoid accumulation, as reported in the Yesso scallop 11. Later, during a breeding program, we also observed segregation in the color of the adductor muscle in a self-fertilization family, and the carotenoid content of orange adductor tissue was significantly higher than that of non-pigmented adductor muscles. In the present study, we report the draft genome and candidate SNPs associated with carotenoid accumulation in the adductor muscles of bay scallops, thereby providing valuable information for elucidating the underlying mechanisms of carotenoid accumulation in the adductor muscles of marine bivalves.
The bay scallop specimen used in the present study was obtained from a culture station in Jiaonan, Qingdao and acclimated in an aquarium at the Institute of Oceanology, Chinese Academy of Sciences (IOCAS). All the experiments were conducted according to local and national regulations. No specific permissions were required to collect the bay scallop specimen or to perform the experiments described. All of the field studies were conducted at the culture station of the IOCAS in Jiaonan, Qingdao and did not involve any endangered or protected species.
The scallop specimen used for sequencing originated from a self-fertilization family that was developed from the Zhongkehong variety 15. Genomic DNA was extracted from the adductor muscle tissue using the DNeasy Blood & Tissue Kit (Qiagen). Two paired-end DNA libraries, with insert sizes of 180 and 500 bp, were constructed according to the protocol and sequenced using the Illumina HiSeq2000 platform.
The raw data were filtered to remove low-quality reads that had resulted from sequencing error or adapter contamination, and the resulting clean reads were assembled using SOAPdenovo2 16. Using our group's recently reported bay scallop transcriptomic resources 17, which are the most comprehensive to date, the genomic scaffolds were annotated by alignment to 82,267 unigene sequences using GMAP 18.
Without considering color depth, 46 orange-adductor and 146 white-adductor progenies were obtained (Figure 1), an observation that conformed to the 3:1 Mendelian segregation ratio according to a chi-square test. The total carotenoid levels were determined according to Li 11 and Zheng 12. Briefly, the adductor muscle of each sample was separately freeze-dried and ground to a fine powder. The carotenoids were then extracted using acetone, and the extract was scanned using a Nanodrop 2000 to determine the maximum absorption wavelength and carotenoid content. The carotenoid contents of the two subgroups (orange- and white-adductor) were compared using a t-test. An additional 50 bay scallop specimens purchased from a local market were also subjected to carotenoid content analysis. For case-control association analysis, genomic DNA was extracted from the adductor muscles of 20 specimens from each subgroup, using the DNeasy Blood & Tissue Kit.
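The 3:1 segregation check described above can be reproduced with a short goodness-of-fit calculation. The following is a minimal sketch, not the authors' script: the function name is hypothetical, and it assumes the orange-adductor class is the recessive (one-quarter) class. For one degree of freedom, the chi-square upper-tail probability equals erfc(sqrt(chi2 / 2)), so only the Python standard library is needed.

```python
import math

def chi_square_gof_3_to_1(n_recessive, n_dominant):
    """Chi-square goodness-of-fit test against a 3:1 (dominant:recessive) ratio.

    Hypothetical helper for illustration; returns (chi2, p_value) with 1 d.f.
    """
    total = n_recessive + n_dominant
    expected_recessive = total * 0.25
    expected_dominant = total * 0.75
    chi2 = ((n_recessive - expected_recessive) ** 2 / expected_recessive
            + (n_dominant - expected_dominant) ** 2 / expected_dominant)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Counts reported in the text: 46 orange-adductor and 146 white-adductor progenies.
chi2, p_value = chi_square_gof_3_to_1(46, 146)
print(f"chi2 = {chi2:.4f}, p = {p_value:.3f}")
```

With 192 progenies the expected counts are 48 and 144, so the statistic is small and the 3:1 hypothesis is not rejected.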
To determine the optimum restriction enzyme combination for preparing the double-digest genotyping by sequencing (ddGBS) library, a digestion simulation was performed according to Wang et al. 19, using the draft genome as a reference (see Supplementary file “Digestion simulation”).
Library preparation was performed as described by Poland et al. 20 with some modifications. First, the concentrations of the genomic DNA samples were determined using a Qubit fluorometer (Life), and the samples were diluted to 20 ng/μl. Twenty-μl double-digestion reactions were performed by incubating ~200 ng DNA and 10 U each of EcoRI and MspI (NEB) in 1× buffer for 2 h at 37 °C, followed by 20 min at 65 °C. The resulting fragments were ligated to barcode adapters by incubation in 50-μl reactions with ~0.1 pmol EcoRI adaptor, 10 pmol MspI adaptor, and 200 U T4 DNA ligase (NEB) in 1× buffer for 2 h at 22 °C, followed by 20 min at 65 °C. The ligation products of all samples were pooled, purified and concentrated using a QIAquick PCR Purification Kit (Qiagen), and subjected to size selection (400-600 bp) using gel electrophoresis and a QIAquick Gel Extraction Kit (Qiagen). The recovered DNA was enriched by PCR (six cycles) in 50-μl reactions that included High-Fidelity 2× PCR Master Mix (NEB), according to the manufacturer's instructions. Finally, the library was purified using magnetic beads (Axygen), quantified using a Qubit fluorometer and real-time quantitative PCR, and then sequenced using the Illumina HiSeq2000 platform.
De novo analysis was conducted using stacks (version 1.27), as described by Catchen et al. 21, 22. Briefly, raw sequence reads were de-multiplexed and cleaned, and the resulting clean reads were assembled de novo for single-nucleotide polymorphism (SNP) identification and genotype determination using the built-in wrapper program “denovo_map.pl”. The parameters were set to ensure that the minimum depth of coverage required to create a stack was three and that the maximum number of stacks at a single de novo locus was two, whereas default settings were used for all other parameters, as described in the manual (http://catchenlab.life.illinois.edu/stacks/comp/denovo_map.php).
To identify molecular markers associated with carotenoid accumulation, the SNPs were further screened to ensure that they ① were heterozygous in the parent, ② had been genotyped in at least 10 individuals from each of the two subgroups, ③ had a major allele frequency equal to or greater than 0.9 in the orange subgroup, and ④ showed a difference in allele frequency of greater than 0.5 between the two subgroups.
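The four screening criteria above can be expressed as a single predicate. The following is a minimal sketch under stated assumptions: the `SnpRecord` structure, its field names, and the two-letter genotype strings are hypothetical and do not reflect the Stacks output format or the authors' actual pipeline. The "major allele" is taken to be the most frequent allele in the orange subgroup.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SnpRecord:
    """Hypothetical per-SNP record; genotypes are strings such as 'AA' or 'AG'."""
    parent_heterozygous: bool
    orange_genotypes: List[str]  # one entry per genotyped orange-adductor individual
    white_genotypes: List[str]   # one entry per genotyped white-adductor individual

def allele_frequency(genotypes, allele):
    """Frequency of `allele` among all allele copies in a list of diploid genotypes."""
    alleles = "".join(genotypes)
    return alleles.count(allele) / len(alleles)

def passes_filters(snp, min_genotyped=10, min_major_freq=0.9, min_freq_diff=0.5):
    # Criterion 1: heterozygous in the parent.
    if not snp.parent_heterozygous:
        return False
    # Criterion 2: genotyped in at least `min_genotyped` individuals per subgroup.
    if (len(snp.orange_genotypes) < min_genotyped
            or len(snp.white_genotypes) < min_genotyped):
        return False
    # Major allele of the orange subgroup.
    orange_alleles = "".join(snp.orange_genotypes)
    major = max(set(orange_alleles), key=orange_alleles.count)
    f_orange = allele_frequency(snp.orange_genotypes, major)
    f_white = allele_frequency(snp.white_genotypes, major)
    # Criterion 3: major allele frequency >= 0.9 in the orange subgroup.
    # Criterion 4: allele frequency difference > 0.5 between subgroups.
    return f_orange >= min_major_freq and abs(f_orange - f_white) > min_freq_diff

# Illustrative (made-up) genotypes, not data from the study.
candidate = SnpRecord(True, ["GG"] * 10, ["AA"] * 3 + ["AG"] * 5 + ["GG"] * 2)
print(passes_filters(candidate))
```

Thresholds are exposed as keyword arguments so that the stringency of each criterion can be varied independently.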
For enrichment analysis, both the scaffolds on which the carotenoid-associated SNPs were located and the total number of SNPs located on these scaffolds were determined by aligning the catalog tags with the draft genome using GMAP (version 2015-09-29). The parameters were set to screen the matches using the following conditions: (1) identity of ≥95%, (2) coverage of ≥98%, and (3) no insertions or deletions. The hypergeometric test was conducted in R (version 3.2.3), and the P value was adjusted using the fdr method. To identify genes associated with differences in carotenoid accumulation, the associated scaffolds were annotated using annotated unigene sequences from a previous study 17, and the associated unigenes were subjected to GO enrichment analysis using topGO in R (version 3.2.3).
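The enrichment test described above can be sketched in a few lines. This is an illustration under assumptions, not the authors' R code: the text does not spell out how the hypergeometric test was parameterized, so the sketch assumes that, for each scaffold, N is the total number of mapped SNPs, K the number of associated SNPs among them, n the number of SNPs on the scaffold, and k the number of associated SNPs on the scaffold. The adjustment follows the Benjamini-Hochberg procedure, which is what R's `p.adjust` applies for `method = "fdr"`. Function names and the example numbers are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up example: (associated SNPs on scaffold, all SNPs on scaffold) per scaffold,
# drawn from a hypothetical pool of 1000 mapped SNPs, 20 of which are associated.
scaffold_counts = [(4, 6), (1, 12), (3, 5)]
raw_p = [hypergeom_upper_tail(k, 1000, 20, n) for k, n in scaffold_counts]
adj_p = benjamini_hochberg(raw_p)
enriched = [p < 0.05 for p in adj_p]
print(enriched)
```

An upper-tail test is used because the question is whether a scaffold carries more associated SNPs than expected by chance.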
Genome survey sequencing yielded over 94 Gb of sequencing data (Table 1), which has been archived in the Sequence Read Archive (SRA) database (see Data Accessibility). The subsequent K-mer (K = 17) analysis indicated that the bay scallop genome (~990 Mb) possessed a relatively low level of heterozygosity (0.9%; Figure 2), and de novo assembly produced 217,310 scaffold sequences, with an N50 of 6.8 kb. The sum of the scaffold lengths was 714 Mb, which corresponded to a genome coverage of 72.1% (Table 2).
A total of 87.7% (72,187 out of 82,267) of the unigene transcripts were mapped to 45,964 scaffolds (105,990 hits in total), and each of the mapped unigene transcripts was located in 1.5 scaffolds, on average. Of the mapped unigenes, 27,399 were functionally annotated using homology-based comparison against public databases.
Wavelength scanning indicated that the maximum absorption wavelength of the orange-adductor extract was 455 nm. Thus, the carotenoid contents of all samples were determined at 455 nm. The carotenoid content of the orange-adductor subgroup was significantly higher than that of the white-adductor subgroup (Figure 3, P < 0.01). No significant difference was detected between the carotenoid levels of the adductor muscles from the local market specimens and those of the white-adductor self-fertilization progenies (Figure 3).
A total of 94 million 100-bp paired-end clean reads were obtained. These were trimmed to 88 bp in length and de-multiplexed according to the barcode, and three progenies that yielded <10,000 reads were excluded from further analysis. Finally, over five million clean reads were obtained for the parent, and an average of over two million were obtained for each of the progenies. All the clean reads used for in silico analysis have been archived in the Sequence Read Archive (SRA) database (see Data accessibility).
SNP identification and genotyping were conducted using the stacks software, according to the manual, with the parent and two subgroups regarded as three populations. In total, 23,637 SNPs were identified, among which 15,224 were genotyped and heterozygous in the parent. According to the classic 3:1 Mendelian segregation ratio, carotenoid accumulation was assumed to be a recessive trait controlled by a single locus. Thus, the theoretical genotypic segregation ratio of the causative variation should be QQ:Qq:qq = 1:2:1, with the frequency of the q allele being 100% and 33% in the orange- and white-adductor subgroups, respectively. Of the 15,224 SNPs, 126 were associated with carotenoid accumulation in the adductor muscle (Table S1). The associated SNP loci were successfully genotyped for 16.6 ± 2.7 and 16.1 ± 2.4 specimens for the white- and orange-adductor subgroups, respectively, and the mean difference in allele frequency was 0.77 ± 0.08 (Figure 4).
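The theoretical q allele frequencies quoted above follow directly from the 1:2:1 genotype ratio. The following minimal sketch works this out with exact fractions; the function name and the dictionary representation are illustrative choices, not part of the original analysis.

```python
from fractions import Fraction

def q_allele_frequency(genotype_weights):
    """Frequency of allele q given relative genotype weights, e.g. {'QQ': 1, 'Qq': 2}."""
    q_copies = sum(genotype.count("q") * weight
                   for genotype, weight in genotype_weights.items())
    total_copies = 2 * sum(genotype_weights.values())
    return Fraction(q_copies, total_copies)

# Selfing of a Qq parent gives QQ:Qq:qq = 1:2:1.
# Orange-adductor progenies are assumed to be qq; white-adductor ones are QQ or Qq.
orange_freq = q_allele_frequency({"qq": 1})
white_freq = q_allele_frequency({"QQ": 1, "Qq": 2})
expected_difference = orange_freq - white_freq

print(orange_freq, white_freq, expected_difference)
```

Under this single-locus recessive model, a fully linked marker would therefore show an allele frequency difference of two-thirds between the two subgroups.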
In total, 116 SNPs were located in 96 catalog tags. For enrichment analysis, 89 of the 96 catalog tags were successfully mapped to 55 scaffolds of the draft genome, which amounted to a total length of 2.82 Mb, or 2.85‰ genome coverage. The number of SNPs located in each scaffold was also determined. The hypergeometric test indicated that the associated SNPs were significantly enriched on 28 scaffolds (FDR adjusted P < 0.05; Table 3), for which the total length was 1.56 Mb, accounting for 1.58‰ of the whole genome.
Of the 72,187 unigene transcripts that mapped to the draft genome of bay scallop, 346 unigenes were located on the 55 scaffolds. Of these unigenes, 122 were annotated in our previous study (Table S2) 17. For example, the products of the unigenes annotated as cytochrome P450 and putative oxidoreductase GLYR1 can oxidize a variety of structurally unrelated compounds, multidrug resistance-associated protein 5 is involved in organic anion transmembrane transporter activity, and F-box/WD repeat-containing protein 7 is related to the accumulation and maintenance of organic solvent-soluble compounds in cells or tissues. No GO terms were significantly enriched.
Previous studies have published bay scallop expressed sequence tags 23, 24 and transcriptome sequences 17, which provided valuable resources for gene cloning. However, a reference genome would be more informative and useful for both gene function exploration and the genetic dissection of important traits. Accordingly, genome survey sequencing was conducted during the present study. Compared to the Pacific oyster Crassostrea gigas 25 and the Zhikong scallop Chlamys farreri 26, the heterozygosity of the bay scallop genome is relatively low (0.9%), likely because the bay scallop is an introduced species in China and because the specimen used for survey sequencing was inbred. Owing to this relatively low heterozygosity, de novo assembly produced much longer scaffolds in bay scallop (N50 = 6.8 kb) than in Zhikong scallop (N50 = 1.5 kb) 26. Therefore, reducing genome heterozygosity through inbreeding is an effective strategy for acquiring high-quality draft genomes of non-model organisms. In addition, K-mer analysis indicated that the bay scallop genome is ~990 Mb in length, which is smaller than that reported previously 27 and recorded in the Animal Genome Size Database 28. It is possible that the difference in predicted genome size is the result of the different methods used by the studies, which is why it is generally recommended to use at least two methods to predict genome size before performing whole-genome sequencing. The draft genome generated by the present study covered >70% of the whole genome and harbored >87% of the known gene transcripts, thereby constituting the most informative genome resource currently available for bay scallop. In addition, more sequencing data and more comprehensive analyses are in progress. Therefore, more information about the bay scallop genome will be released in the near future.
Although marine animals cannot synthesize carotenoids, carotenoids can be obtained from dietary algae, modified, and subsequently accumulated 6. In most cases, the accumulated carotenoids are deposited in gonad tissues and play important roles in reproduction 9, 29-31. In recent years, however, carotenoid accumulation has also been reported to occur in the adductor muscles of bivalves, including the Yesso scallop 11 and noble scallop 12. Since adductor muscles are the main edible part of scallops, carotenoid enrichment of scallop adductor muscles could prove beneficial to human health. For this reason, an orange-adductor variety of Yesso scallop, named “Haida golden scallop”, has been cultivated 32. A similar orange-adductor phenotype also exists in the bay scallop. However, the maximum absorption wavelength of the adductor extract was different from that of the Yesso scallop, which suggests that the two species modify dietary carotenoids differently. Indeed, according to a study in the noble scallop, the expression patterns of candidate genes involved in carotenoid deposition in other organisms show no significant difference between orange- and white-adductor specimens 14, suggesting that the mechanism underlying carotenoid accumulation in the adductor muscles of marine bivalves probably differs from known mechanisms. Thus, it is of great value to dissect the genetic basis of carotenoid accumulation in the adductor muscles of marine bivalves.
Several studies have been conducted to identify genes or proteins that are differentially expressed between orange- and white-adductor marine bivalves. For instance, three scavenger receptor genes were suggested to facilitate the cellular uptake of carotenoids in the adductor muscles of orange-adductor Yesso scallop 13. This was further supported by comparative transcriptome analysis in the noble scallop 14, in which the gene scavenger receptor class B like-3 was only expressed in the blood of orange scallops (i.e., not in brown ones). However, neither the expression nor the sequence polymorphism of this gene was correlated with carotenoid content in the adductor muscles of the noble scallop 14. Meanwhile, proteomic analysis revealed that four other genes were differentially expressed between golden and normal Yesso scallops, at both the mRNA and protein levels 33. Since all four of the genes could be regulated by a common transcription factor, peroxisome proliferator-activated receptor, it was proposed that this transcription factor might play an important role in carotenoid accumulation 33. In addition, stearoyl-CoA desaturase, a key regulator of lipid metabolism, was also suggested to facilitate carotenoid accumulation in the adductor of golden scallop, owing to its significantly different expression in the adductor muscles of golden and normal scallops 34. These studies represent an important step towards elucidating the genetic basis of carotenoid accumulation in the adductor muscles of marine bivalves. However, differential gene expression could be either a cause or a consequence of variation in carotenoid accumulation. Therefore, mining associated molecular markers via association analysis might be a more powerful approach for identifying causative variation.
In the present study, 126 SNPs were identified as closely associated with carotenoid accumulation. The associated SNPs were significantly enriched in 28 scaffolds, which indicates that the SNPs were not randomly distributed in the genome. Genes located on the 28 scaffolds were also analyzed, but no significant enrichment was detected, which is consistent with the hypothesis that carotenoid accumulation in bay scallop adductor muscles is controlled by a single gene. Among the 122 unigenes located on the scaffolds associated with carotenoid accumulation, cytochrome P450 and the putative oxidoreductase GLYR1 can oxidize a variety of structurally unrelated compounds and, hence, may be involved in carotenoid metabolism. Multidrug resistance-associated protein 5 is involved in organic anion transmembrane transporter activity, and F-box/WD repeat-containing protein 7 is related to the accumulation and maintenance of organic solvent-soluble compounds in cells or tissues; thus, they might be involved in carotenoid transport and deposition, respectively, in the adductor muscle of bay scallop. The genes mentioned above are valuable candidates for further research.
Because of the limited recombination in single families, the associated SNPs could be arranged in a large linkage disequilibrium block. Given that the identified SNPs are limited (15,224 SNPs versus 217,310 scaffolds), the associated scaffolds probably only account for a small section of the linkage disequilibrium block. This could explain the paradox that none of the previously reported genes are located on associated scaffolds. Thus, the candidate SNPs identified in the present study are valuable resources for further association studies and linkage analysis using samples with more extensive recombination.
This research was supported by the National High Technology Research and Development Program (863 program, 2012AA10A410, LL) and the Taishan Scholars Climbing Program of Shandong.
Sequencing raw data for genome survey: NCBI SRA accession number: SRR5514773, SRR5514774.
Sequencing raw data for association analysis: NCBI SRA accession number: SRR5574679, SRR5574680.