PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of ijgInternational Journal of Genomics
 
Int J Genomics. 2017; 2017: 7636513.
Published online 2017 February 6. doi:  10.1155/2017/7636513
PMCID: PMC5317147

The Draft Genome and Transcriptome of the Atlantic Horseshoe Crab, Limulus polyphemus

Abstract

The horseshoe crab, Limulus polyphemus, exhibits robust circadian and circatidal rhythms, but little is known about the molecular mechanisms underlying those rhythms. In this study, horseshoe crabs were collected during the day and night as well as high and low tides, and their muscle and central nervous system tissues were processed for genome and transcriptome sequencing, respectively. The genome assembly resulted in 7.4 × 105 contigs with N50 of 4,736, while the transcriptome assembly resulted in 9.3 × 104 contigs and N50 of 3,497. Analysis of functional completeness by the identification of putative universal orthologs suggests that the transcriptome has three times more total expected orthologs than the genome. Interestingly, RNA-Seq analysis indicated no statistically significant changes in expression level for any circadian core or accessory gene, but there was significant cycling of several noncircadian transcripts. Overall, these assemblies provide a resource to investigate the Limulus clock systems and provide a large dataset for further exploration into the taxonomy and biology of the Atlantic horseshoe crab.

1. Introduction

The Atlantic horseshoe crab, Limulus polyphemus, is an important species for a variety of reasons. First, it is considered a keystone species within many estuaries and bays along the Atlantic coast of the United States. Moreover, in these habitats their foraging behavior releases trapped nutrients into their local environment [1, 2]. Second, their eggs are a source of food for endemic and endangered long distance avian migrants. Third, they are used as bait for the eel and conch fisheries [3]. Fourth, their blood is the source of the most sensitive assay for gram-negative bacteria contamination of medical products [46]. Finally, Limulus have long been used as model systems in neurobiological research. For example, lateral inhibition, one of the underlying principles of visual physiology, was first demonstrated in Limulus by Haldan Keffer Hartline, and as a result he won the Nobel Prize in Physiology or Medicine in 1967 [7]. More recently, this species has been used as a model for the investigation biological rhythms, for instance, circadian rhythms of visual sensitivity [810].

Circadian rhythms are produced by a combination of endogenous clocks and cues from the environment [11, 12]. The identity of several core circadian genes that are responsible for producing these rhythms has been revealed in a number of invertebrates and vertebrates [13] and includes period, clock, cryptochrome, cycle, timeless, and other accessory genes. Although these genes are labeled as circadian, due to their critical function within the circadian clock mechanism, they may also play a role in other types of biological rhythms including those that regulate seasonal activity [14]. It has also been proposed, but not demonstrated, that they might be involved in shorter (~12.5 hr) circatidal rhythms [9, 15, 16]. One of the overall goals of this study was to test this hypothesis in horseshoe crabs, which express both a circadian rhythm of lateral eye sensitivity [17] and a circatidal rhythm of locomotion [10, 18].

In this study we developed draft genomic and transcriptomic assemblies for Limulus polyphemus and then compared the genes expressed during high and low tides and during the day versus the night. Particular attention was paid to putative core and accessory circadian genes. We identified these and then compared their expression, using RPKM values, across the different conditions (tides and light : dark (L : D)). Because no clear differences in the expression of putative circadian genes were apparent, we further examined some of the transcripts that did exhibit significant day/night or high/low tide differences as a first step towards the identification of potential proteins involved in the temporal control of the behavior and physiology in this species.

2. Methods

2.1. Animals and Environmental Conditions

For genomic sequencing, an individual horseshoe crab was wild-caught from Great Bay Estuary in Durham, NH (43°05′30′′N and 70°51′55′′W). Leg skeletal muscle tissue was removed and placed in liquid nitrogen for immediate DNA extraction (described in the following).

For transcriptome sequencing, four animals were captured from Great Bay Estuary in Durham, NH, and sacrificed at four different times: day high tide (DHT, 0800), night high tide (NHT, 2030), evening low tide (ELT 1800), and during the day at low tide (DLT, 1530). DLT was collected while still being active (during high tide), placed into a natural water flow-through tank located next to the bay with open exposure, and sacrificed, while being inactive (buried), at the time of low tide (1530). DHT and NHT were used to compare the expression of genes during the day versus the night, and DHT and DLT were used to compare expression during high and low tides. Tissues from ELT were sequenced and used to increase the overall depth of the combined transcriptome dataset. Animals were dissected and their entire central nervous system tissue (protocerebrum, subesophageal ganglia, ventral nerve cord, and ganglia) was snap frozen on dry ice.

2.2. DNA Extraction

300 mg of frozen muscle tissue was pulverized using a sterile, autoclaved mortar and pestle. 19 mL of Qiagen G2 lysis buffer (Qiagen #1014636) spiked with 38 μL of Qiagen RNase A stock solution (100 μg/μL, Qiagen #19101) was combined with the ground tissue in a sterile 50 mL conical tube. DNA was extracted per the Qiagen Genomic-Tip 500/G protocol (Qiagen genomic DNA Handbook, 08/2001). Extracted gDNA was stored at −80°C until library preparation.

2.3. RNA Extraction for Transcriptome Sequencing

Frozen whole CNS tissue was shipped on dry ice to University of Vermont Cancer Center DNA Analysis Facility (Burlington, VT) for extraction. Tissue was stored at −80°C until RNA extraction. Frozen tissue was homogenized in 1 mL of Ambion® TRIzol® Reagent (LifeTech, #15596-026) using a sterile, autoclaved mortar, and pestle. RNA was extracted per the Ambion® TRIzol® protocol (LifeTech, MAN0001271) and purified using Qiagen RNeasy Mini Kit (Qiagen #74104) per manufacturers protocol (Qiagen RNeasy Mini Handbook, 06/2012).

2.4. Genome Illumina Paired-End Library Construction and Sequencing

Extracted gDNA was sent to Vanderbilt Technologies and Advanced Genomics Facility, Nashville, TN. The library was prepared using TruSeq DNA Sample Preparation Guide v2, catalog #FC-930-1021 (Illumina, San Diego, CA, USA). One g of gDNA was sheared using a Covaris S2. Sheared ends were then repaired and adenylated and the products were ligated with Illumina adaptors. Ligated fragments were then size selected using Pippen Prep (Sage Science, Beverly, MA, USA) and cleaned using Zymo Clean and Concentrator Kit (Zymo Research, Irvine, CA, USA). KAPA Hot Start (KAPA Biosystems, Wilmington, MA, USA) was then used the amplify libraries over a total of 14 PCR cycles. Clustering was performed using a cBot (Illumina) and paired-end sequencing was performed on a HiSeq 2000 (Illumina) over two lanes.

2.5. Transcriptome Illumina Paired-End Library Construction and Sequencing

Extracted, purified RNA was quantified and analyzed using a Qubit 2.0 (Life Technology, Carlsbad CA) and an Agilent Bioanalyzer 2100 (Agilent Technology, Santa Clara, CA). RNA libraries were prepared using Illumina TruSeq RNA Sample Prep LT version 2 (RS-122-2001/2002). 1 μg of total RNA was PolyA enriched using AMPure XP Magnetic Beads (Beckman Coulter #A63880). Complementary DNA libraries were created from the enriched RNA using Invitrogen Superscript II® per the manufacturers protocol. cDNA was fragmented, end repaired, and adenylated followed by a ligation of Illumina adaptor indices. Illumina Reagents (Part #15012995) were used for PCR amplification. TruSeq libraries were quantified using Nanodrop, Qubit, and qPCR (KAPA Biosciences kit #4824). Fragment size was determined using Agilent Bioanalyzer 2100 and clustering was performed on the Illumina cBOT. Sequencing was carried out at 12 pM on an Illumina HiSeq1000/1500.

2.6. Genome De Novo Assembly

Illumina CASAVA pipeline was used to demultiplex and filter Limulus genomic reads (reads with Q < 10 were removed). Reads were assembled in CLC Genomics Workbench (v 5.1.2.) using CLC Bio's Proprietary CLC Assembly Cell 4.0 (CLC4) set to default parameters at the Hubbard Center for Genome Studies at the University of New Hampshire (Durham, NH).

2.7. Transcriptome De Novo Assembly

Four unique conditions (DHT, NHT, ELT, and DLT) were assembled separately in CLC Genomics Workbench (v 5.1.2.) using CLC Bio's Proprietary CLC Assembly Cell 4.0 (CLC4) set to default parameters. Additionally, all four conditions were combined and again assembled using the same algorithm and parameters for use as a reference library.

2.8. Benchmarking Universal Single-Copy Orthologs (BUSCO) Analysis

Genome and transcriptome completeness's were assessed using BUSCO v1.1 using the eukaryotic linage for both and default parameters for the genome and transcriptome analyses, respectively.

2.9. mtDNA Analysis

The previously published Limulus mitochondrial genome (NC_003057.1) was blasted against the genomic assembly and used to identify Limulus genomic contig 669. Contig 669 was then analyzed for coding regions and fully annotated using NC_003057.1 as a reference and visualized (Figure 1) using Organellar Genome DRAW [19]. Limulus mtDNA was then blasted against the Limulus genome assembly to look for nuclear mitochondrial (NUMT) sequences. To validate potential NUMT sequences, genomic contigs that contained homologous regions of mtDNA were extracted and compared for similarity.

Figure 1
Gene map of the Limulus polyphemus mitochondrial genome. Arrows indicate strand direction with the inner circle representing genes on the light strand while the outer circle represents genes on the heavy strand. ND1–6 represents nicotinamide adenine ...

2.10. Transcriptome Analysis

Individual read sets from the four samples were mapped to the overall transcriptome assembly, with each contig given a unique identifying number. Reads per kilobase per million mapped reads (RPKMs) were used to determine relative fold change between day/night and high/low tides. Blast2GO [20] was performed on each of the four RNA-Seq mappings to look for expression variation in molecular groupings. Transcripts between conditions were normalized by identifying the condition with the highest number of sequences across all defined gene ontologies (GO) and dividing that condition's GO sequence numbers by the corresponding GOs of another condition marked for comparison. The average difference between GOs of the two individuals (to be compared) was then used as a normalization value for the actual sequence number of the individual with less sequences per GO.

2.11. Accession Numbers

Limulus genome and transcriptome sequence read data are located in the NCBI Sequence Read Archive under accession numbers [SRR4181534] and [SSR421559-SSR4215562], respectively. The Limulus mtDNA genome sequence is located on NCBI under the accession number JX983598.

3. Results

3.1. Limulus Genome Sequencing and Assembly

Genomic sequences yielded a total of 620,743,244 one hundred base pair (paired-end) reads (Table 1). The Limulus genome size is estimated to be 2.74 Gb based on biochemical analysis [21], and, based on that estimation, these sequences should result in an approximate 23x coverage of the genome (Table 1). De novo assembly of the genome resulted in N50 of 4.7 kb with the largest contig being 68.9 kb. The total length of all contigs ≥ 500 bp was 1.4 Gb. To evaluate the completeness of the genome assembly with respect to protein coding genes, BUSCO v1.1 genome assessment was used to look for evidence of 429 putative universal orthologs and yielded the following results: complete genes found (C), 12.9%; duplicated genes found (D), 1.6%; fragmented genes found (F), 10.7%; genes missing (M), 74.8%; number of genes used (N), 429. Overall only 25.2% of the genes that were looked for were found within the genome [22].

Table 1
Limulus genome and transcriptome assembly statistics.

3.2. Limulus mtDNA

The overall length of the Limulus mitochondrial genome is 15,012 bp, which is longer than the other previously published Limulus mitochondrial genome at 14,985 bp [23]. The majority of extra mtDNA appears to be in the noncoding origin of the mtDNA genome. However, there are a number of polymorphisms, insertions, and deletions between the mitochondrial genome sequences, many of which occur in the coding regions (Table 2).

Table 2
Comparison of Limulus mtDNA, JX983598.1 relative to NC_003057.1 using sites of polymorphic variance. Gene name is represented under locus. Size represents the length of the gene. Number of variable sites indicates the total number of nucleotides that ...

3.3. Limulus NUMT Sequences

Two NUMT sequences were identified as genomic contigs 269235 and 5819, which were 9,955 bp and 12,620 bp in length, respectively. These two nuclear contigs each had significant stretches of sequence that were clearly homologous to the extant mitochondrial genome. Specifically, nuclear contig 269235 contained a contiguous region of 2,008 bp that includes sequences homologous to the mitochondrial genes for nad2, methionyl-tRNA, lysidine synthase, glutamate-tRNA ligase, valine-tRNA ligase, and 12s ribosomal RNA and contig 5819 includes 1,065 bp homologous to nad1. To confirm that the NUMT sequences were not simply errors in assembly, we compared the NUMTs and extant mitochondrial sequences. Neither of the NUMT sequences in genomic contigs 269235 or 5819 should have high levels of sequence identity expected if they were assembled from extant mitochondrial genome reads. Specifically NUMT sequences in genomic contig 269235 had an 84.4% average percent identity to mtDNA sequences and those in genomic contig 5819 had a 79.4% identity to the homologous mtDNA sequences (Figure 2).

Figure 2
Diagram of two nuclear mitochondrial (NUMT) sequences identified in the Limulus genome. Limulus genomic contigs 269235 and 5819 show the length of each contig, the location of the NUMT sequences (yellow = NAD variants, blue = tRNAs, and red = rRNA), and ...

3.4. Limulus Transcriptome Sequencing and Assembly

The combined dataset had a total of 484,378,406 reads at a size of 100 bp and was de novo assembled in CLC to generate a reference assembly for analysis. The resulting transcriptomic assembly contained 92,348 contigs with a maximum contig size of 23.85 kb. Transcripts mapped back to the genome with 88.3% of the total transcripts, equating to 109.76 Mb, suggesting that the transcriptome assembly represents most of the exons present in our genome assembly (Figure 3). The completeness of the transcriptome was evaluated using BUSCO v1.1 transcriptome assessment to look for evidence of 429 putative universal orthologs, which yielded the following results: complete genes found (C), 56.6%; duplicated genes found (D), 10.7%; fragmented genes found (F), 13.3%; genes missing (M), 19.3%. Overall the transcriptome was found to have 80.7% of the total number of genes looked for in the BUSCO analysis, more than three times that of the genome [22]. Sequencing and assembly statistics obtained for the genomic and transcriptomic libraries are summarized in Table 1.

Figure 3
Transcriptome assembly mapped to the genome assembly. (a) Nucleotide coverage: y-axis: total number of nucleotides that could be mapped for each transcript length bin (1000 nucleotides). (b) Percent mapped bases: y-axis: total percentage of mapped bases ...

3.5. Effects of Photoperiod and Tidal Phase on Limulus Transcript Expression

The primary purpose of the RNA-Seq data was to generate a central nervous system transcriptome, while a secondary purpose was to generate a set of pilot data for potential differences with regard to photoperiod and tidal phase. Of the four conditions used for transcriptomic analysis (DHT, NHT, ELT, and DLT), three were used for the following comparisons: DHT/NHT and NHT/DLT. Total read counts for each condition were 1.5 × 108 for DHT, 1.3 × 108 for NHT, 5.8 × 107 for ELT, and 1.4 × 108 for DLT. To compare gene expression across the different treatments we mapped the reads from each dataset to the combined reference assembly using CLC Genomics Workbench's (v5.1.2) RNA-Seq analysis toolkit. Volcano plot analysis (CLC v5.1.2) was used to analyze variation between night and day samples as well as high and low tide samples (Figures (Figures44 and and5).5). As expected, the majority of sequences for both conditions fell below the thresholds of statistical significance (below horizontal red line) and a minimum fold change of 2 (in between vertical red lines). However, when the number of contigs that varied significantly was compared between the two treatments (Figure 6), there was a larger number of contigs with a minimum fold change of ≥2 that varied significantly in the high/low tide comparison than in the day/night comparison.

Figure 4
Volcano plot of day versus night showing the distribution of transcripts with respect to fold change (x-axis) and statistical significance (y-axis). Points above the horizontal red line indicate transcripts whose abundance is statistically significant ...
Figure 5
Volcano plot of high versus low tide showing the distribution of transcripts with respect to fold change and statistical significance.
Figure 6
Number of statistically significant (P ≤ 0.05) transcripts (contigs) in both the day/night (red bars) and high tide/low tide (blue bars) volcano plots grouped by Log2 fold change. Results indicate a greater presence of statistically significant ...

3.6. Expression of Gene Types with respect to Photoperiod and Tidal Phase

The majority of transcripts that were different in the photoperiod and tidal phase comparisons were novel [24]. Therefore, only the top four identifiable genes with the lowest P values and highest fold changes are shown for both comparisons in Table 3. Of the genes that could be identified when BLASTed against NCBI, twice as many were affected by tidal versus photoperiodic conditions (e-values <1.0E − 4; Table 3). Additionally, genes with e-values <1.0E − 22 were primarily developmental and regulatory in nature (Table 3).

Table 3
Top four contigs of day/night and high/low tide conditions from RNA-Seq volcano plot analysis. Contigs chosen were furthest away from the origin while still having an associated gene name in GenBank (Figures (Figures44 and and5).5). Contig ...

All transcript contigs from photoperiodic and tidal conditions were also annotated using the BLAST2GO pipeline, which uses gene ontology (GO) to define genes based on common functions and relationships. The GO terms “multiorganism process” (Figure 7), “molecular transducer activity,” “receptor activity,” “structural molecule activity” (Figure 8), and “translation regulator activity” (Figure 9) all varied greater than 40% when looking at the comparison between high and low tides. Only 1 gene ontology term, “structural molecule activity,” appeared to change more than 40% when looking at night versus day (Figure 8).

Figure 7
Biological process BLAST2GO values for RNA-Seq conditions: day/high tide and night and low tide. Data labels are shown in conjunction with their respective slice of the pie graph. Enlarged slices represent GO values with a percent change greater than ...
Figure 8
Molecular function BLAST2GO values for RNA-Seq conditions: day/high tide and night and low tide.
Figure 9
Cellular component BLAST2GO values for RNA-Seq conditions: day/high tide and night and low tide.

3.7. Targeted Analysis of Candidates Genes Involved in Circadian Pathways in Drosophila

Candidate genes homologous to Drosophila were selected as they are considered the core circadian genes from the model insect organism, Drosophila melanogaster. Cycle 1 (KX014724) RPKM values had the largest absolute fold change with respect to tidal phase with a value of 4.3-fold (Table 4). Interestingly, cycle 1 was also among the genes with the least amount of variance when looking at differences in photoperiod (1.1-fold). Cycle 2 (KX014725), however, has approximately the same amount of change when looking at photoperiod or tidal phase (3.1-fold and 3.3-fold, respectively). Cryptochrome 1, cryptochrome 2 (KX014723), timeless (KX014719), clock (KX014718), and period variants A (KX014720), B (KX014721), and C (KX014722) all had absolute fold change values <2.0 for both conditions of photoperiod and tidal phase.

Table 4
Reads per kilobase per million mapped reads (RPKM) for common circadian and control genes. The absolute fold change is shown for two conditions, day/night and high/low tide. Gene name, either homologous gene title identified through BLAST or designated ...

Relative RPKM values were also calculated for a number of circadian accessory genes [2528] and potential control genes [2932], as shown in Table 4. The accessory genes timeout, double-time, vrille (KX014726), and casein kinase's (KX014742), IIα (KX014727), IIβ (KX014741), and (KX014743) show an absolute fold change of ≤1.1-fold in all but two cases: vrille, with respect to photoperiod (1.6-fold), and timeout, with respect to tidal phase (1.9-fold). Additionally, 12 potential control genes were investigated: C-reactive protein (CRP), tubulin (TUB), ubiquitin (UBQ), succinate dehydrogenase complex, subunit A (SDHA), synaptotagmin (SYN), and seven variations of actin (ACT). All of the investigated control genes with the exception of the actin variants had an absolute fold change ≤1.2-fold for either condition. Notably, six of seven actin variations, a gene traditionally used for normalization in QPCR analysis, exhibit an absolute fold change ≥1.3-fold over tidal conditions and four of seven display a variation ≥1.1-fold over photoperiodic conditions. The neuropeptide, pigment-dispersing hormone (PDH), was also investigated as it has been described as a potential downstream regulator of circadian rhythms in arthropods [33, 34]. A PDH receptor-like homolog was identified in the Limulus transcriptome dataset and found to have an absolute fold change of 3.0-fold for conditions of tidal phase, whereas the absolute fold change for conditions of photoperiod was only 1.5-fold (Table 4).

4. Discussion

Based on the fossil record, the Atlantic horseshoe crab, Limulus polyphemus, is one of the most ancient living organisms [35] and therefore the genome and transcriptome presented here will likely both provide clues to evolutionary divergence and lead to the identification of novel genes [36]. The draft sequences for the Limulus genome, transcriptome, and the addition of a second mitochondrial sequence will also provide a dataset for mining biomarkers and other molecular tools with which diversity and population studies can be conducted [37]. Thus, these data may also help to provide genetic markers that can be used to identify local subpopulations and lead to improved management of this important species. Finally, the subphylum Chelicerata is underrepresented when it comes to annotated genomes. Currently there is only one other representative from the Chelicerata subphylum which has had its genome sequenced (Ixodes scapularis [38]) and so adding a draft genome and transcriptome at 23x coverage and 38.5 Gb, respectively, will greatly increase the genetic reference for all Chelicerata.

Limulus has at least two distinct endogenous clocks [8, 9]. There is a circadian clock, which controls day/night sensitivity to light [39] and a clock system that controls circatidal rhythms of locomotor activity [10]. While several core and accessory genes are known to be part of the molecular structure of the circadian clock in model organisms such as Drosophila [2628], the molecular workings of the clock controlling circatidal rhythms are completely unknown [40, 41]. While investigations into the molecular mechanics of this clock in the intertidal mangrove cricket, Apteronemobius asahinai, have shown that the circadian genes period and clock are likely not involved in circatidal rhythms [42, 43], our working hypothesis is that circatidal rhythms evolved from existing circadian mechanisms. Indeed, the findings that circatidal rhythms appear to be controlled by two clocks (“circalunidian clocks” [41, 44]), each of which has an endogenous period (~24.8 h) very close to circadian clocks (~24 h), support this idea. Homologs for all of the core circadian genes have been identified within Limulus (Table 4 [45]), two of which (cycle and timeless variants) seem to vary by time of day and tide (Table 4), supporting the idea that some of core circadian genes are somehow involved in modulating the circadian rhythm as well as the circatidal rhythm.

Overall however, RNA-Seq analysis of the majority of core circadian transcript expression across conditions of photoperiod and tidal phase indicated only slight variation in expression (<2-fold for nearly all) for those conditions. Additionally, QPCR data of the core circadian mRNA within the protocerebrum has shown no statistically significant variation of expression for clock, cycle 1, or timeless (Simpson, unpublished results). When combined, these data suggest a relatively constitutive expression profile when looking at daily and tidal differences of most core circadian clock gene transcripts within the central nervous system as a whole. However, both variants of the core clock gene cycle seem to vary moderately (>3-fold) with respect to tidal phase with only one variant (cycle 2) varying moderately (>3-fold) with respect to photoperiod. Additionally, investigations into downstream circadian regulators like the homolog of the neuropeptide PDH identified a transcript that is closely related to a PDH receptor but shares a higher percent identity with the vertebrate receptor for corticotropin-releasing hormone (CRH). Changes in RPKM values for the transcript exhibited a much greater change (3-fold) from conditions of tidal phase rather than from conditions of photoperiod (Table 4). Further BLAST investigations of PDH, CRH, and precursors involved with their synthesis yielded no reads [45]. With the majority of investigated circadian transcript expression exhibiting strong tendencies towards tidal phase rather than photoperiod, these findings may provide evidence of genes involved with the Limulus circatidal timing system.

When the top 20 most significant transcript contigs were pulled out of the experimental groupings (high tide versus low tide and night versus day) and blasted against the NCBI database, only 20% came back with an identified hit (Table 3). Of the genes that were expressed differentially during conditions of photoperiod, only two had e-values < 1.0E − 3. The exact function of these genes is not known; however, “DMX-like protein 2” seems to be involved in synaptic processes and “protein trapped in endotherm 1” may play a critical role in germ cell migration. The genes that exhibited the largest changes over conditions of tidal phase all had e-values < 1.0E − 31. “Thyroid peroxidase-like” and “3 beta-hydroxysteroid dehydrogenase” seem to be involved in hormone production; “ALX homeobox protein 1” is involved with neural development in mice and “ovostatin-like” is similar to a proteinase inhibitor found in chicken eggs (Table 3). The lack of definitive gene identification and the small percentage of transcripts that returned with BLAST evidence of homology suggests that the majority of transcripts found to respond to photoperiod and tidal phase in the Limulus transcriptome are novel and these findings support similar studies that found genes involved in ecological adaptation tend to be novel [24, 46].

Overall trends in transcript expression indicate a higher degree of difference in transcription occurring during high versus low tide periods than night versus day time periods. For example, there are nearly twice as many transcripts that are differentially expressed during high/low tide versus night/day (Figure 6). When comparing gene ontology term values (GO; Figures Figures7779) there are five terms that vary with a minimum percent change ≥5%. When comparing these terms across conditions of night versus day we see a minimum percent change of 5% and maximum percent change of 52%. However, when looking at those same terms over high and low tidal conditions there is a minimum percent change of 36% and a maximum percent change of 198% (Figures (Figures7779). Taken together the results suggest a transcription profile that is dominated by tides more than by photoperiod. Deeper sequencing and statistical power will perhaps lead to the teasing apart of subtle and potentially significant changes in transcript expression.

The overall analysis of genome completeness using BUSCO was much less than that of the transcriptome (25.2% and 80.7% orthologs identified, resp.) suggesting recalcitrance by the genome and/or genome assembly to the BUSCO analysis [22]. If the genome assembly was causing the low completeness score we would expect a mapping of the transcriptome assembly to the genome assembly to be poor. However, the transcriptome maps to the genome with 88.3% coverage (Figure 3). Additionally, when the genome assembly was compared to previously reported Limulus genome sizes, it was found to be about half of the expected size, indicating a large, highly repetitive genome [21, 47]. Specifically, our assembly appeared to include highly frequent repetitive elements that represent a large factor of the genome. We hypothesize that it is those repeats in the genes (introns) that reduce the success of the findings universal orthologs. By contrast, the transcriptome cuts away these highly repetitive elements. Additionally, while not contributing to the low BUSCO score, other complexities with the genome structure, like the integration of mtDNA elements, may have facilitated increased collapse within the genome assembly.

Nuclear incorporation of mtDNA (NUMT) sequences can cause problems when conducting mtDNA-based studies because they can be confused with the extant version of the mitochondrial genome. To identify potential NUMTs, the Limulus genome assembly was searched for sequences highly similar to the extant version of the mitochondrial genome using BLASTN, which yielded two contigs. In both contigs these putative NUMT sequences were flanked by nonmitochondrial sequences including hypothetical Zinc Finger MYM-type 1 coding sequences, which are not present in any known animal mtDNA genome. The presence of these Zinc Finger sequences suggested either that the mitochondrial-like sequences were NUMTs or that there was an error which is the genome assembly causing the mtDNA sequences to be combined with the gDNA sequences. Results indicate that these sequences seem to be true NUMTs as the percent identity between the suspected NUMT sequences and the mtDNA sequences was less than 85% in either case (Figure 2). For each of these two putative NUMTs the nuclear and extant mitochondrial sequences are collinear, suggesting that these could have been derived from single translocation events and although there are two distinct mtDNA loci incorporated into two separate genomic contigs their similar levels of divergence from the extant versions may suggest a single common origin for both putative NUMTs.

The draft genome and transcriptome will begin to provide a molecular guide for a host of environmental and experimental studies. This will allow for a deeper, more thorough understanding of the behavioral and environmental impacts and challenges these animals face. These datasets are an important addition to an evergrowing aggregation of genomic and transcriptomic databases.

5. Conclusions

Endogenous biological rhythms are the result of complex and highly dynamic interactions between many different molecular components. When more than one rhythmic system is present in an organism, such as Limulus, it can be much more difficult to elucidate the most important molecules and their interactions. Here we show that Limulus have many genes that are considered central to other model organism's circadian clocks, but their transcripts do not seem to cycle with respect to time of day. However, there does seem to be some degree of transcript regulation with respect to tidal phase. The draft Limulus genome and transcriptome reported should facilitate future molecular and genetic investigations concerning biological clocks and other processes in chelicerates and other related organisms.

Acknowledgments

This work was supported by NSF IOS 090342 to Christopher C. Chabot and Winsor H. Watson III and the New Hampshire IDeA Network of Biological Research Excellence with grants from the National Center for Research Resources (5P20RR030360-03) and the National Institute of General Medical Sciences (8P20GM103506-03), PSU College of Graduate Studies, and the PSU Department of Biological Sciences. The authors would also like to thank Megan Cooper for her work on the Limulus mtDNA and Susan Swope for helpful technical comments, as well as the UNH Hubbard Center for Genome Studies and Kelley Thomas for their technical assistance and comments on the manuscript.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

References

1. Jackson N. L., Nordstrom K. F., Smith D. R. Influence of waves and horseshoe crab spawning on beach morphology and sediment grain-size characteristics on a sandy estuarine beach. Sedimentology. 2005;52(5):1097–1108. doi: 10.1111/j.1365-3091.2005.00725.x. [Cross Ref]
2. Karpanty S. M., Fraser J. D., Berkson J., Niles L. J., Dey A., Smith E. P. Horseshoe crab eggs determine red knot distribution in Delaware Bay. Journal of Wildlife Management. 2006;70(6):1704–1710. doi: 10.2193/0022-541X(2006)70[1704:HCEDRK]2.0.CO;2. [Cross Ref]
3. Berkson J., Shuster C. N., Jr. The horseshoe crab: the battle for a true multiple-use resource. Fisheries. 1999;24(11):6–10.
4. Bianchini M. L., Sorensen P. W., Winn H. E. The use of horseshoe crabs as eel bait. Journal of the World Mariculture Society. 1981;12(2):127–129. doi: 10.1111/j.1749-7345.1981.tb00286.x. [Cross Ref]
5. Novitsky T. J. Limulus amebocyte lysate (LAL) detection of endotoxin in human blood. Journal of Endotoxin Research. 1994;1(4):253–263. doi: 10.1177/096805199400100407. [Cross Ref]
6. Ferrari K. M., Targett N. M. Chemical attractants in horseshoe crab, Limulus polyphemus, eggs: the potential for an artificial bait. Journal of Chemical Ecology. 2003;29(2):477–496. doi: 10.1023/a:1022698431776. [PubMed] [Cross Ref]
7. Hartline H. K., Wagner H. G., Ratliff F. Inhibition in the eye of Limulus. The Journal of General Physiology. 1956;39(5):651–673. doi: 10.1085/jgp.39.5.651. [PMC free article] [PubMed] [Cross Ref]
8. Barlow R. B. Circadian rhythms in the Limulus visual system. The Journal of Neuroscience. 1983;3(4):856–870. [PubMed]
9. Watson W. H., III, Bedford L., Chabot C. C. Rhythms of locomotion expressed by Limulus polyphemus, the American horseshoe crab: II. relationship to circadian rhythms of visual sensitivity. The Biological Bulletin. 2008;215(1):46–56. doi: 10.2307/25470682. [PubMed] [Cross Ref]
10. Chabot C. C., Kent J., Watson W. H., III Circatidal and circadian rhythms of locomotion in Limulus polyphemus. The Biological Bulletin. 2004;207(1):72–75. doi: 10.2307/1543630. [PubMed] [Cross Ref]
11. Dunlap J. C. Molecular bases for circadian clocks. Cell. 1999;96(2):271–290. doi: 10.1016/S0092-8674(00)80566-8. [PubMed] [Cross Ref]
12. Reppert S. M., Weaver D. R. Molecular analysis of mammalian circadian rhythms. Annual Review of Physiology. 2001;63(1):647–676. doi: 10.1146/annurev.physiol.63.1.647. [PubMed] [Cross Ref]
13. Panda S., Hogenesch J. B., Kay S. A. Circadian rhythms from flies to human. Nature. 2002;417(6886):329–335. doi: 10.1038/417329a. [PubMed] [Cross Ref]
14. Nuesslein-Hildesheim B., O'Brien J. A., Ebling F. J. P., Maywood E. S., Hastings M. H. The circadian cycle of mPER clock gene products in the suprachiasmatic nucleus of the Siberian hamster encodes both daily and seasonal time. European Journal of Neuroscience. 2000;12(8):2856–2864. doi: 10.1046/j.1460-9568.2000.00173.x. [PubMed] [Cross Ref]
15. Forward R. B., Jr., Tankersley R. A., Pochelon P. N. Circatidal activity rhythms in ovigerous blue crabs, Callinectes sapidus: implications for ebb-tide transport during the spawning migration. Marine Biology. 2003;142(1):67–76. doi: 10.1007/s00227-002-0923-5. [Cross Ref]
16. Stillman J. H., Barnwell F. H. Relationship of daily and circatidal activity rhythms of the fiddler crab, Uca princeps, to the harmonic structure of semidiurnal and mixed tides. Marine Biology. 2004;144(3):473–482. doi: 10.1007/s00227-003-1213-6. [Cross Ref]
17. Barlow R. Circadian and efferent modulation of visual sensitivity. Progress in Brain Research. 2001;131:487–503. doi: 10.1016/S0079-6123(01)31039-7. [PubMed] [Cross Ref]
18. Chabot C. C., Watson W. H., III Circatidal rhythms of locomotion in the American horseshoe crab limulus polyphemus: underlying mechanisms and cues that influence them. Current Zoology. 2010;56(5):499–517.
19. Lohse M., Drechsel O., Kahlau S., Bock R. OrganellarGenomeDRAW—a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucleic Acids Research. 2013;41:W575–W581. doi: 10.1093/nar/gkt289. [PMC free article] [PubMed] [Cross Ref]
20. Conesa A., Götz S., García-Gómez J. M., Terol J., Talón M., Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21(18):3674–3676. doi: 10.1093/bioinformatics/bti610. [PubMed] [Cross Ref]
21. Goldberg R. B., Crain W. R., Ruderman J. V., et al. DNA sequence organization in the genomes of five marine invertebrates. Chromosoma. 1975;51(3):225–251. doi: 10.1007/BF00284817. [PubMed] [Cross Ref]
22. Simão F. A., Waterhouse R. M., Ioannidis P., Kriventseva E. V., Zdobnov E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–3212. doi: 10.1093/bioinformatics/btv351. [PubMed] [Cross Ref]
23. Lavrov D. V., Boore J. L., Brown W. M. The complete mitochondrial DNA sequence of the horseshoe crab Limulus polyphemus. Molecular Biology and Evolution. 2000;17(5):813–824. doi: 10.1093/oxfordjournals.molbev.a026360. [PubMed] [Cross Ref]
24. Colbourne J. K., Pfrender M. E., Gilbert D., et al. The ecoresponsive genome of Daphnia pulex. Science. 2011;331(6017):555–561. doi: 10.1126/science.1197761. [PMC free article] [PubMed] [Cross Ref]
25. Price J. L., Blau J., Rothenfluh A., Abodeely M., Kloss B., Young M. W. double-time is a novel Drosophila clock gene that regulates period protein accumulation. Cell. 1998;94(1):83–95. doi: 10.1016/S0092-8674(00)81224-6. [PubMed] [Cross Ref]
26. Blau J., Young M. W. Cycling vrille expression is required for a functional Drosophila clock. Cell. 1999;99(6):661–671. doi: 10.1016/S0092-8674(00)81554-8. [PubMed] [Cross Ref]
27. Benna C., Scannapieco P., Piccin A., et al. A second timeless gene in Drosophila shares greater sequence similarity with mammalian tim. Current Biology. 2000;10(14):R512–R513. doi: 10.1016/S0960-9822(00)00594-7. [PubMed] [Cross Ref]
28. Lin J.-M., Kilman V. L., Keegan K., et al. A role for casein kinase 2α in the Drosophila circadian clock. Nature. 2002;420(6917):816–820. doi: 10.1038/nature01235. [PubMed] [Cross Ref]
29. Marten N. W., Burke E. J., Hayden J. M., Straus D. S. Effect of amino acid limitation on the expression of 19 genes in rat hepatoma cells. The FASEB Journal. 1994;8(8):538–544. [PubMed]
30. Lupberger J., Kreuzer K.-A., Baskaynak G., Peters U. R., Le Coutre P., Schmidt C. A. Quantitative analysis of beta-actin, beta-2-microglobulin and porphobilinogen deaminase mRNA and their comparison as control transcripts for RT-PCR. Molecular and Cellular Probes. 2002;16(1):25–30. doi: 10.1006/mcpr.2001.0392. [PubMed] [Cross Ref]
31. Goossens K., Van Poucke M., Van Soom A., Vandesompele J., Van Zeveren A., Peelman L. J. Selection of reference genes for quantitative real-time PCR in bovine preimplantation embryos. BMC Developmental Biology. 2005;5(1, article 27) doi: 10.1186/1471-213x-5-27. [PMC free article] [PubMed] [Cross Ref]
32. Etschmann B., Wilcken B., Stoevesand K., Von Der Schulenburg A., Sterner-Kock A. Selection of reference genes for quantitative real-time PCR analysis in canine mammary tumors using the GeNorm algorithm. Veterinary Pathology. 2006;43(6):934–942. doi: 10.1354/vp.43-6-934. [PubMed] [Cross Ref]
33. Pyza E., Meinertzhagen I. A. Neurites of period-expressing PDH cells in the fly's optic lobe exhibit circadian oscillations in morphology. European Journal of Neuroscience. 1997;9(8):1784–1788. doi: 10.1111/j.1460-9568.1997.tb01537.x. [PubMed] [Cross Ref]
34. Renn S. C. P., Park J. H., Rosbash M., Hall J. C., Taghert P. H. A pdf neuropeptide gene mutation and ablation of PDF neurons each cause severe abnormalities of behavioral circadian rhythms in Drosophila. Cell. 1999;99(7):791–802. doi: 10.1016/S0092-8674(00)81676-1. [PubMed] [Cross Ref]
35. Dunlop J. A. Geological history and phylogeny of Chelicerata. Arthropod Structure & Development. 2010;39(2-3):124–142. doi: 10.1016/j.asd.2010.01.003. [PubMed] [Cross Ref]
36. Warren W. C., Hillier L. W., Marshall Graves J. A., et al. Genome analysis of the platypus reveals unique signatures of evolution. Nature. 2008;453(7192):175–183. doi: 10.1038/nature06936. [PMC free article] [PubMed] [Cross Ref]
37. Hauser L., Adcock G. J., Smith P. J., Ramírez J. H. B., Carvalho G. R. Loss of microsatellite diversity and low effective population size in an overexploited population of New Zealand snapper (Pagrus auratus) Proceedings of the National Academy of Sciences of the United States of America. 2002;99(18):11742–11747. doi: 10.1073/pnas.172242899. [PubMed] [Cross Ref]
38. Hill C. A., Wikel S. K. The Ixodes scapularis Genome Project: an opportunity for advancing tick research. Trends in Parasitology. 2005;21(4):151–153. doi: 10.1016/j.pt.2005.02.004. [PubMed] [Cross Ref]
39. Barlow R. B., Jr., Bolanowski S. J., Jr., Brachman M. L. Efferent optic nerve fibers mediate circadian rhythms in the Limulus eye. Science. 1977;197(4298):86–89. doi: 10.1126/science.867057. [PubMed] [Cross Ref]
40. De La Iglesia H. O., Hsu Y.-W. A. Biological clocks and rhythms in intertidal crustaceans. Frontiers in Bioscience—Elite. 2010;2(4):1394–1404. [PubMed]
41. Chabot C. C., Ramberg-Pihl N. C., Watson W. H., III Circalunidian clocks control tidal rhythms of locomotion in the American horseshoe crab, Limulus polyphemus. Marine and Freshwater Behaviour and Physiology. 2016;49(2):75–91. doi: 10.1080/10236244.2015.1127679. [PMC free article] [PubMed] [Cross Ref]
42. Takekata H., Matsuura Y., Goto S. G., Satoh A., Numata H. RNAi of the circadian clock gene period disrupts the circadian rhythm but not the circatidal rhythm in the mangrove cricket. Biology Letters. 2012;8(4):488–491. doi: 10.1098/rsbl.2012.0079. [PMC free article] [PubMed] [Cross Ref]
43. Takekata H., Numata H., Shiga S., Goto S. G. Silencing the circadian clock gene Clock using RNAi reveals dissociation of the circatidal clock from the circadian clock in the mangrove cricket. Journal of Insect Physiology. 2014;68:16–22. doi: 10.1016/j.jinsphys.2014.06.012. [PubMed] [Cross Ref]
44. Palmer J. D. Comparative studies of tidal rhythms. VIII. A translocation experiment involving Circalunidian rhythms. Marine Behaviour and Physiology. 1989;14(4):231–243. doi: 10.1080/10236248909378710. [Cross Ref]
45. Chesmore K. N., Watson W. H., Chabot C. C. Identification of putative circadian clock genes in the American horseshoe crab, Limulus polyphemus. Comparative Biochemistry and Physiology Part D: Genomics and Proteomics. 2016;19:45–61. doi: 10.1016/j.cbd.2016.06.001. [PubMed] [Cross Ref]
46. Connor K. M., Gracey A. Y. Circadian cycles are the dominant transcriptional rhythm in the intertidal mussel Mytilus californianus. Proceedings of the National Academy of Sciences of the United States of America. 2011;108(38):16110–16115. doi: 10.1073/pnas.1111076108. [PubMed] [Cross Ref]
47. Nossa C. W., Havlak P., Yue J.-X., et al. Joint assembly and genetic mapping of the Atlantic horseshoe crab genome reveals ancient whole genome duplication. GigaScience. 2014;3(1, article no. 9) doi: 10.1186/2047-217X-3-9. [PMC free article] [PubMed] [Cross Ref]

Articles from International Journal of Genomics are provided here courtesy of Hindawi