|Home | About | Journals | Submit | Contact Us | Français|
The horseshoe crab, Limulus polyphemus, exhibits robust circadian and circatidal rhythms, but little is known about the molecular mechanisms underlying those rhythms. In this study, horseshoe crabs were collected during the day and night as well as high and low tides, and their muscle and central nervous system tissues were processed for genome and transcriptome sequencing, respectively. The genome assembly resulted in 7.4 × 105 contigs with N50 of 4,736, while the transcriptome assembly resulted in 9.3 × 104 contigs and N50 of 3,497. Analysis of functional completeness by the identification of putative universal orthologs suggests that the transcriptome has three times more total expected orthologs than the genome. Interestingly, RNA-Seq analysis indicated no statistically significant changes in expression level for any circadian core or accessory gene, but there was significant cycling of several noncircadian transcripts. Overall, these assemblies provide a resource to investigate the Limulus clock systems and provide a large dataset for further exploration into the taxonomy and biology of the Atlantic horseshoe crab.
The Atlantic horseshoe crab, Limulus polyphemus, is an important species for a variety of reasons. First, it is considered a keystone species within many estuaries and bays along the Atlantic coast of the United States. Moreover, in these habitats their foraging behavior releases trapped nutrients into their local environment [1, 2]. Second, their eggs are a source of food for endemic and endangered long distance avian migrants. Third, they are used as bait for the eel and conch fisheries . Fourth, their blood is the source of the most sensitive assay for gram-negative bacteria contamination of medical products [4–6]. Finally, Limulus have long been used as model systems in neurobiological research. For example, lateral inhibition, one of the underlying principles of visual physiology, was first demonstrated in Limulus by Haldan Keffer Hartline, and as a result he won the Nobel Prize in Physiology or Medicine in 1967 . More recently, this species has been used as a model for the investigation biological rhythms, for instance, circadian rhythms of visual sensitivity [8–10].
Circadian rhythms are produced by a combination of endogenous clocks and cues from the environment [11, 12]. The identity of several core circadian genes that are responsible for producing these rhythms has been revealed in a number of invertebrates and vertebrates  and includes period, clock, cryptochrome, cycle, timeless, and other accessory genes. Although these genes are labeled as circadian, due to their critical function within the circadian clock mechanism, they may also play a role in other types of biological rhythms including those that regulate seasonal activity . It has also been proposed, but not demonstrated, that they might be involved in shorter (~12.5hr) circatidal rhythms [9, 15, 16]. One of the overall goals of this study was to test this hypothesis in horseshoe crabs, which express both a circadian rhythm of lateral eye sensitivity  and a circatidal rhythm of locomotion [10, 18].
In this study we developed draft genomic and transcriptomic assemblies for Limulus polyphemus and then compared the genes expressed during high and low tides and during the day versus the night. Particular attention was paid to putative core and accessory circadian genes. We identified these and then compared their expression, using RPKM values, across the different conditions (tides and light:dark (L:D)). Because no clear differences in the expression of putative circadian genes were apparent, we further examined some of the transcripts that did exhibit significant day/night or high/low tide differences as a first step towards the identification of potential proteins involved in the temporal control of the behavior and physiology in this species.
For genomic sequencing, an individual horseshoe crab was wild-caught from Great Bay Estuary in Durham, NH (43°05′30′′N and 70°51′55′′W). Leg skeletal muscle tissue was removed and placed in liquid nitrogen for immediate DNA extraction (described in the following).
For transcriptome sequencing, four animals were captured from Great Bay Estuary in Durham, NH, and sacrificed at four different times: day high tide (DHT, 0800), night high tide (NHT, 2030), evening low tide (ELT 1800), and during the day at low tide (DLT, 1530). DLT was collected while still being active (during high tide), placed into a natural water flow-through tank located next to the bay with open exposure, and sacrificed, while being inactive (buried), at the time of low tide (1530). DHT and NHT were used to compare the expression of genes during the day versus the night, and DHT and DLT were used to compare expression during high and low tides. Tissues from ELT were sequenced and used to increase the overall depth of the combined transcriptome dataset. Animals were dissected and their entire central nervous system tissue (protocerebrum, subesophageal ganglia, ventral nerve cord, and ganglia) was snap frozen on dry ice.
300mg of frozen muscle tissue was pulverized using a sterile, autoclaved mortar and pestle. 19mL of Qiagen G2 lysis buffer (Qiagen #1014636) spiked with 38μL of Qiagen RNase A stock solution (100μg/μL, Qiagen #19101) was combined with the ground tissue in a sterile 50mL conical tube. DNA was extracted per the Qiagen Genomic-Tip 500/G protocol (Qiagen genomic DNA Handbook, 08/2001). Extracted gDNA was stored at −80°C until library preparation.
Frozen whole CNS tissue was shipped on dry ice to University of Vermont Cancer Center DNA Analysis Facility (Burlington, VT) for extraction. Tissue was stored at −80°C until RNA extraction. Frozen tissue was homogenized in 1mL of Ambion® TRIzol® Reagent (LifeTech, #15596-026) using a sterile, autoclaved mortar, and pestle. RNA was extracted per the Ambion® TRIzol® protocol (LifeTech, MAN0001271) and purified using Qiagen RNeasy Mini Kit (Qiagen #74104) per manufacturers protocol (Qiagen RNeasy Mini Handbook, 06/2012).
Extracted gDNA was sent to Vanderbilt Technologies and Advanced Genomics Facility, Nashville, TN. The library was prepared using TruSeq DNA Sample Preparation Guide v2, catalog #FC-930-1021 (Illumina, San Diego, CA, USA). One g of gDNA was sheared using a Covaris S2. Sheared ends were then repaired and adenylated and the products were ligated with Illumina adaptors. Ligated fragments were then size selected using Pippen Prep (Sage Science, Beverly, MA, USA) and cleaned using Zymo Clean and Concentrator Kit (Zymo Research, Irvine, CA, USA). KAPA Hot Start (KAPA Biosystems, Wilmington, MA, USA) was then used the amplify libraries over a total of 14 PCR cycles. Clustering was performed using a cBot (Illumina) and paired-end sequencing was performed on a HiSeq 2000 (Illumina) over two lanes.
Extracted, purified RNA was quantified and analyzed using a Qubit 2.0 (Life Technology, Carlsbad CA) and an Agilent Bioanalyzer 2100 (Agilent Technology, Santa Clara, CA). RNA libraries were prepared using Illumina TruSeq RNA Sample Prep LT version 2 (RS-122-2001/2002). 1μg of total RNA was PolyA enriched using AMPure XP Magnetic Beads (Beckman Coulter #A63880). Complementary DNA libraries were created from the enriched RNA using Invitrogen Superscript II® per the manufacturers protocol. cDNA was fragmented, end repaired, and adenylated followed by a ligation of Illumina adaptor indices. Illumina Reagents (Part #15012995) were used for PCR amplification. TruSeq libraries were quantified using Nanodrop, Qubit, and qPCR (KAPA Biosciences kit #4824). Fragment size was determined using Agilent Bioanalyzer 2100 and clustering was performed on the Illumina cBOT. Sequencing was carried out at 12pM on an Illumina HiSeq1000/1500.
Illumina CASAVA pipeline was used to demultiplex and filter Limulus genomic reads (reads with Q < 10 were removed). Reads were assembled in CLC Genomics Workbench (v 5.1.2.) using CLC Bio's Proprietary CLC Assembly Cell 4.0 (CLC4) set to default parameters at the Hubbard Center for Genome Studies at the University of New Hampshire (Durham, NH).
Four unique conditions (DHT, NHT, ELT, and DLT) were assembled separately in CLC Genomics Workbench (v 5.1.2.) using CLC Bio's Proprietary CLC Assembly Cell 4.0 (CLC4) set to default parameters. Additionally, all four conditions were combined and again assembled using the same algorithm and parameters for use as a reference library.
Genome and transcriptome completeness's were assessed using BUSCO v1.1 using the eukaryotic linage for both and default parameters for the genome and transcriptome analyses, respectively.
The previously published Limulus mitochondrial genome (NC_003057.1) was blasted against the genomic assembly and used to identify Limulus genomic contig 669. Contig 669 was then analyzed for coding regions and fully annotated using NC_003057.1 as a reference and visualized (Figure 1) using Organellar Genome DRAW . Limulus mtDNA was then blasted against the Limulus genome assembly to look for nuclear mitochondrial (NUMT) sequences. To validate potential NUMT sequences, genomic contigs that contained homologous regions of mtDNA were extracted and compared for similarity.
Individual read sets from the four samples were mapped to the overall transcriptome assembly, with each contig given a unique identifying number. Reads per kilobase per million mapped reads (RPKMs) were used to determine relative fold change between day/night and high/low tides. Blast2GO  was performed on each of the four RNA-Seq mappings to look for expression variation in molecular groupings. Transcripts between conditions were normalized by identifying the condition with the highest number of sequences across all defined gene ontologies (GO) and dividing that condition's GO sequence numbers by the corresponding GOs of another condition marked for comparison. The average difference between GOs of the two individuals (to be compared) was then used as a normalization value for the actual sequence number of the individual with less sequences per GO.
Limulus genome and transcriptome sequence read data are located in the NCBI Sequence Read Archive under accession numbers [SRR4181534] and [SSR421559-SSR4215562], respectively. The Limulus mtDNA genome sequence is located on NCBI under the accession number JX983598.
Genomic sequences yielded a total of 620,743,244 one hundred base pair (paired-end) reads (Table 1). The Limulus genome size is estimated to be 2.74Gb based on biochemical analysis , and, based on that estimation, these sequences should result in an approximate 23x coverage of the genome (Table 1). De novo assembly of the genome resulted in N50 of 4.7kb with the largest contig being 68.9kb. The total length of all contigs ≥ 500bp was 1.4Gb. To evaluate the completeness of the genome assembly with respect to protein coding genes, BUSCO v1.1 genome assessment was used to look for evidence of 429 putative universal orthologs and yielded the following results: complete genes found (C), 12.9%; duplicated genes found (D), 1.6%; fragmented genes found (F), 10.7%; genes missing (M), 74.8%; number of genes used (N), 429. Overall only 25.2% of the genes that were looked for were found within the genome .
The overall length of the Limulus mitochondrial genome is 15,012bp, which is longer than the other previously published Limulus mitochondrial genome at 14,985bp . The majority of extra mtDNA appears to be in the noncoding origin of the mtDNA genome. However, there are a number of polymorphisms, insertions, and deletions between the mitochondrial genome sequences, many of which occur in the coding regions (Table 2).
Two NUMT sequences were identified as genomic contigs 269235 and 5819, which were 9,955bp and 12,620bp in length, respectively. These two nuclear contigs each had significant stretches of sequence that were clearly homologous to the extant mitochondrial genome. Specifically, nuclear contig 269235 contained a contiguous region of 2,008bp that includes sequences homologous to the mitochondrial genes for nad2, methionyl-tRNA, lysidine synthase, glutamate-tRNA ligase, valine-tRNA ligase, and 12s ribosomal RNA and contig 5819 includes 1,065bp homologous to nad1. To confirm that the NUMT sequences were not simply errors in assembly, we compared the NUMTs and extant mitochondrial sequences. Neither of the NUMT sequences in genomic contigs 269235 or 5819 should have high levels of sequence identity expected if they were assembled from extant mitochondrial genome reads. Specifically NUMT sequences in genomic contig 269235 had an 84.4% average percent identity to mtDNA sequences and those in genomic contig 5819 had a 79.4% identity to the homologous mtDNA sequences (Figure 2).
The combined dataset had a total of 484,378,406 reads at a size of 100bp and was de novo assembled in CLC to generate a reference assembly for analysis. The resulting transcriptomic assembly contained 92,348 contigs with a maximum contig size of 23.85kb. Transcripts mapped back to the genome with 88.3% of the total transcripts, equating to 109.76Mb, suggesting that the transcriptome assembly represents most of the exons present in our genome assembly (Figure 3). The completeness of the transcriptome was evaluated using BUSCO v1.1 transcriptome assessment to look for evidence of 429 putative universal orthologs, which yielded the following results: complete genes found (C), 56.6%; duplicated genes found (D), 10.7%; fragmented genes found (F), 13.3%; genes missing (M), 19.3%. Overall the transcriptome was found to have 80.7% of the total number of genes looked for in the BUSCO analysis, more than three times that of the genome . Sequencing and assembly statistics obtained for the genomic and transcriptomic libraries are summarized in Table 1.
The primary purpose of the RNA-Seq data was to generate a central nervous system transcriptome, while a secondary purpose was to generate a set of pilot data for potential differences with regard to photoperiod and tidal phase. Of the four conditions used for transcriptomic analysis (DHT, NHT, ELT, and DLT), three were used for the following comparisons: DHT/NHT and NHT/DLT. Total read counts for each condition were 1.5 × 108 for DHT, 1.3 × 108 for NHT, 5.8 × 107 for ELT, and 1.4 × 108 for DLT. To compare gene expression across the different treatments we mapped the reads from each dataset to the combined reference assembly using CLC Genomics Workbench's (v5.1.2) RNA-Seq analysis toolkit. Volcano plot analysis (CLC v5.1.2) was used to analyze variation between night and day samples as well as high and low tide samples (Figures (Figures44 and and5).5). As expected, the majority of sequences for both conditions fell below the thresholds of statistical significance (below horizontal red line) and a minimum fold change of 2 (in between vertical red lines). However, when the number of contigs that varied significantly was compared between the two treatments (Figure 6), there was a larger number of contigs with a minimum fold change of ≥2 that varied significantly in the high/low tide comparison than in the day/night comparison.
The majority of transcripts that were different in the photoperiod and tidal phase comparisons were novel . Therefore, only the top four identifiable genes with the lowest P values and highest fold changes are shown for both comparisons in Table 3. Of the genes that could be identified when BLASTed against NCBI, twice as many were affected by tidal versus photoperiodic conditions (e-values <1.0E − 4; Table 3). Additionally, genes with e-values <1.0E − 22 were primarily developmental and regulatory in nature (Table 3).
All transcript contigs from photoperiodic and tidal conditions were also annotated using the BLAST2GO pipeline, which uses gene ontology (GO) to define genes based on common functions and relationships. The GO terms “multiorganism process” (Figure 7), “molecular transducer activity,” “receptor activity,” “structural molecule activity” (Figure 8), and “translation regulator activity” (Figure 9) all varied greater than 40% when looking at the comparison between high and low tides. Only 1 gene ontology term, “structural molecule activity,” appeared to change more than 40% when looking at night versus day (Figure 8).
Candidate genes homologous to Drosophila were selected as they are considered the core circadian genes from the model insect organism, Drosophila melanogaster. Cycle 1 (KX014724) RPKM values had the largest absolute fold change with respect to tidal phase with a value of 4.3-fold (Table 4). Interestingly, cycle 1 was also among the genes with the least amount of variance when looking at differences in photoperiod (1.1-fold). Cycle 2 (KX014725), however, has approximately the same amount of change when looking at photoperiod or tidal phase (3.1-fold and 3.3-fold, respectively). Cryptochrome 1, cryptochrome 2 (KX014723), timeless (KX014719), clock (KX014718), and period variants A (KX014720), B (KX014721), and C (KX014722) all had absolute fold change values <2.0 for both conditions of photoperiod and tidal phase.
Relative RPKM values were also calculated for a number of circadian accessory genes [25–28] and potential control genes [29–32], as shown in Table 4. The accessory genes timeout, double-time, vrille (KX014726), and casein kinase's Iα (KX014742), IIα (KX014727), IIβ (KX014741), and Iε (KX014743) show an absolute fold change of ≤1.1-fold in all but two cases: vrille, with respect to photoperiod (1.6-fold), and timeout, with respect to tidal phase (1.9-fold). Additionally, 12 potential control genes were investigated: C-reactive protein (CRP), tubulin (TUB), ubiquitin (UBQ), succinate dehydrogenase complex, subunit A (SDHA), synaptotagmin (SYN), and seven variations of actin (ACT). All of the investigated control genes with the exception of the actin variants had an absolute fold change ≤1.2-fold for either condition. Notably, six of seven actin variations, a gene traditionally used for normalization in QPCR analysis, exhibit an absolute fold change ≥1.3-fold over tidal conditions and four of seven display a variation ≥1.1-fold over photoperiodic conditions. The neuropeptide, pigment-dispersing hormone (PDH), was also investigated as it has been described as a potential downstream regulator of circadian rhythms in arthropods [33, 34]. A PDH receptor-like homolog was identified in the Limulus transcriptome dataset and found to have an absolute fold change of 3.0-fold for conditions of tidal phase, whereas the absolute fold change for conditions of photoperiod was only 1.5-fold (Table 4).
Based on the fossil record, the Atlantic horseshoe crab, Limulus polyphemus, is one of the most ancient living organisms  and therefore the genome and transcriptome presented here will likely both provide clues to evolutionary divergence and lead to the identification of novel genes . The draft sequences for the Limulus genome, transcriptome, and the addition of a second mitochondrial sequence will also provide a dataset for mining biomarkers and other molecular tools with which diversity and population studies can be conducted . Thus, these data may also help to provide genetic markers that can be used to identify local subpopulations and lead to improved management of this important species. Finally, the subphylum Chelicerata is underrepresented when it comes to annotated genomes. Currently there is only one other representative from the Chelicerata subphylum which has had its genome sequenced (Ixodes scapularis ) and so adding a draft genome and transcriptome at 23x coverage and 38.5Gb, respectively, will greatly increase the genetic reference for all Chelicerata.
Limulus has at least two distinct endogenous clocks [8, 9]. There is a circadian clock, which controls day/night sensitivity to light  and a clock system that controls circatidal rhythms of locomotor activity . While several core and accessory genes are known to be part of the molecular structure of the circadian clock in model organisms such as Drosophila [26–28], the molecular workings of the clock controlling circatidal rhythms are completely unknown [40, 41]. While investigations into the molecular mechanics of this clock in the intertidal mangrove cricket, Apteronemobius asahinai, have shown that the circadian genes period and clock are likely not involved in circatidal rhythms [42, 43], our working hypothesis is that circatidal rhythms evolved from existing circadian mechanisms. Indeed, the findings that circatidal rhythms appear to be controlled by two clocks (“circalunidian clocks” [41, 44]), each of which has an endogenous period (~24.8h) very close to circadian clocks (~24h), support this idea. Homologs for all of the core circadian genes have been identified within Limulus (Table 4 ), two of which (cycle and timeless variants) seem to vary by time of day and tide (Table 4), supporting the idea that some of core circadian genes are somehow involved in modulating the circadian rhythm as well as the circatidal rhythm.
Overall however, RNA-Seq analysis of the majority of core circadian transcript expression across conditions of photoperiod and tidal phase indicated only slight variation in expression (<2-fold for nearly all) for those conditions. Additionally, QPCR data of the core circadian mRNA within the protocerebrum has shown no statistically significant variation of expression for clock, cycle 1, or timeless (Simpson, unpublished results). When combined, these data suggest a relatively constitutive expression profile when looking at daily and tidal differences of most core circadian clock gene transcripts within the central nervous system as a whole. However, both variants of the core clock gene cycle seem to vary moderately (>3-fold) with respect to tidal phase with only one variant (cycle 2) varying moderately (>3-fold) with respect to photoperiod. Additionally, investigations into downstream circadian regulators like the homolog of the neuropeptide PDH identified a transcript that is closely related to a PDH receptor but shares a higher percent identity with the vertebrate receptor for corticotropin-releasing hormone (CRH). Changes in RPKM values for the transcript exhibited a much greater change (3-fold) from conditions of tidal phase rather than from conditions of photoperiod (Table 4). Further BLAST investigations of PDH, CRH, and precursors involved with their synthesis yielded no reads . With the majority of investigated circadian transcript expression exhibiting strong tendencies towards tidal phase rather than photoperiod, these findings may provide evidence of genes involved with the Limulus circatidal timing system.
When the top 20 most significant transcript contigs were pulled out of the experimental groupings (high tide versus low tide and night versus day) and blasted against the NCBI database, only 20% came back with an identified hit (Table 3). Of the genes that were expressed differentially during conditions of photoperiod, only two had e-values < 1.0E − 3. The exact function of these genes is not known; however, “DMX-like protein 2” seems to be involved in synaptic processes and “protein trapped in endotherm 1” may play a critical role in germ cell migration. The genes that exhibited the largest changes over conditions of tidal phase all had e-values < 1.0E − 31. “Thyroid peroxidase-like” and “3 beta-hydroxysteroid dehydrogenase” seem to be involved in hormone production; “ALX homeobox protein 1” is involved with neural development in mice and “ovostatin-like” is similar to a proteinase inhibitor found in chicken eggs (Table 3). The lack of definitive gene identification and the small percentage of transcripts that returned with BLAST evidence of homology suggests that the majority of transcripts found to respond to photoperiod and tidal phase in the Limulus transcriptome are novel and these findings support similar studies that found genes involved in ecological adaptation tend to be novel [24, 46].
Overall trends in transcript expression indicate a higher degree of difference in transcription occurring during high versus low tide periods than night versus day time periods. For example, there are nearly twice as many transcripts that are differentially expressed during high/low tide versus night/day (Figure 6). When comparing gene ontology term values (GO; Figures Figures777–9) there are five terms that vary with a minimum percent change ≥5%. When comparing these terms across conditions of night versus day we see a minimum percent change of 5% and maximum percent change of 52%. However, when looking at those same terms over high and low tidal conditions there is a minimum percent change of 36% and a maximum percent change of 198% (Figures (Figures777–9). Taken together the results suggest a transcription profile that is dominated by tides more than by photoperiod. Deeper sequencing and statistical power will perhaps lead to the teasing apart of subtle and potentially significant changes in transcript expression.
The overall analysis of genome completeness using BUSCO was much less than that of the transcriptome (25.2% and 80.7% orthologs identified, resp.) suggesting recalcitrance by the genome and/or genome assembly to the BUSCO analysis . If the genome assembly was causing the low completeness score we would expect a mapping of the transcriptome assembly to the genome assembly to be poor. However, the transcriptome maps to the genome with 88.3% coverage (Figure 3). Additionally, when the genome assembly was compared to previously reported Limulus genome sizes, it was found to be about half of the expected size, indicating a large, highly repetitive genome [21, 47]. Specifically, our assembly appeared to include highly frequent repetitive elements that represent a large factor of the genome. We hypothesize that it is those repeats in the genes (introns) that reduce the success of the findings universal orthologs. By contrast, the transcriptome cuts away these highly repetitive elements. Additionally, while not contributing to the low BUSCO score, other complexities with the genome structure, like the integration of mtDNA elements, may have facilitated increased collapse within the genome assembly.
Nuclear incorporation of mtDNA (NUMT) sequences can cause problems when conducting mtDNA-based studies because they can be confused with the extant version of the mitochondrial genome. To identify potential NUMTs, the Limulus genome assembly was searched for sequences highly similar to the extant version of the mitochondrial genome using BLASTN, which yielded two contigs. In both contigs these putative NUMT sequences were flanked by nonmitochondrial sequences including hypothetical Zinc Finger MYM-type 1 coding sequences, which are not present in any known animal mtDNA genome. The presence of these Zinc Finger sequences suggested either that the mitochondrial-like sequences were NUMTs or that there was an error which is the genome assembly causing the mtDNA sequences to be combined with the gDNA sequences. Results indicate that these sequences seem to be true NUMTs as the percent identity between the suspected NUMT sequences and the mtDNA sequences was less than 85% in either case (Figure 2). For each of these two putative NUMTs the nuclear and extant mitochondrial sequences are collinear, suggesting that these could have been derived from single translocation events and although there are two distinct mtDNA loci incorporated into two separate genomic contigs their similar levels of divergence from the extant versions may suggest a single common origin for both putative NUMTs.
The draft genome and transcriptome will begin to provide a molecular guide for a host of environmental and experimental studies. This will allow for a deeper, more thorough understanding of the behavioral and environmental impacts and challenges these animals face. These datasets are an important addition to an evergrowing aggregation of genomic and transcriptomic databases.
Endogenous biological rhythms are the result of complex and highly dynamic interactions between many different molecular components. When more than one rhythmic system is present in an organism, such as Limulus, it can be much more difficult to elucidate the most important molecules and their interactions. Here we show that Limulus have many genes that are considered central to other model organism's circadian clocks, but their transcripts do not seem to cycle with respect to time of day. However, there does seem to be some degree of transcript regulation with respect to tidal phase. The draft Limulus genome and transcriptome reported should facilitate future molecular and genetic investigations concerning biological clocks and other processes in chelicerates and other related organisms.
This work was supported by NSF IOS 090342 to Christopher C. Chabot and Winsor H. Watson III and the New Hampshire IDeA Network of Biological Research Excellence with grants from the National Center for Research Resources (5P20RR030360-03) and the National Institute of General Medical Sciences (8P20GM103506-03), PSU College of Graduate Studies, and the PSU Department of Biological Sciences. The authors would also like to thank Megan Cooper for her work on the Limulus mtDNA and Susan Swope for helpful technical comments, as well as the UNH Hubbard Center for Genome Studies and Kelley Thomas for their technical assistance and comments on the manuscript.
The authors declare that there are no competing interests regarding the publication of this paper.