|Home | About | Journals | Submit | Contact Us | Français|
Weevils (Curculionoidea) comprise one of the most diverse groups of organisms on earth. There is hardly a vascular plant or plant part without its own species of weevil feeding on it and weevil species diversity is greater than the number of fishes, birds, reptiles, amphibians and mammals combined. Here, we employ ultraconserved elements (UCEs) designed for beetles and a novel partitioning strategy of loci to help resolve phylogenetic relationships within the radiation of Australasian smurf-weevils (Eupholini). Despite being emblematic of the New Guinea fauna, no previous phylogenetic studies have been conducted on the Eupholini. In addition to a comprehensive collection of fresh specimens, we supplement our taxon sampling with museum specimens, and this study is the first target enrichment phylogenomic dataset incorporating beetle specimens from museum collections. We use both concatenated and species tree analyses to examine the relationships and taxonomy of this group. For species tree analyses we present a novel partitioning strategy to better model the molecular evolutionary process in UCEs. We found that the current taxonomy is problematic, largely grouping species on the basis of similar color patterns. Finally, our results show that most loci required multiple partitions for nucleotide rate substitution, suggesting that single partitions may not be the optimal partitioning strategy to accommodate rate heterogeneity for UCE loci.
The Curculionoidea represents a diverse group of phytophagous insects with ~62,000 described species and an estimated 200,000 undescribed species . They are not only highly diverse, but also of major economic importance because they cause destruction to many crop and ornamental plant species worldwide. Curculionoidea are a relatively old group dating to the Upper Jurassic 161–151 Ma [1–3]. Despite both their diversity and economic importance, there are still few resources for conducting phylogenomic research on this group. Here, we test a set of target enrichment baits designed to enrich ultraconserved elements (UCEs) across Coleoptera , and we use the resulting sequence data to examine the phylogeny of Australasian smurf-weevils in the tribe Eupholini. We had two main goals: 1) to examine the utility of these newly designed universal Coleoptera UCE baits for phylogenetic studies of weevils, using freshly collected individuals as well as older museum specimens (up to 65 years); and 2) to test the monophyly of the three main genera within the Eupholini, examining the validity of previous ecomorphology-driven classification efforts.
Ultraconserved genomic elements (UCEs, sensu Faircloth et al. 2012 ), provide a powerful approach to sequence many independent regions of the genome. UCEs have proven useful in resolving phylogenies at multiple phylogenetic scales both shallow and deep [6–9]. UCEs, like many other reduced representation techniques that rely on oligonucleotide “bait” capture procedures, also have the potential to perform well when collecting data from museum samples [10,11]. Given the challenges with obtaining samples for many rare taxa and/or from logistically challenging regions of the world, such approaches can be highly beneficial [12,10]. The challenge of obtaining weevil samples is even more complex because many species can be found only on specific host plants at very particular times of the year, and many weevil hosts are unknown. In addition to the challenges of sampling, many weevil groups have undergone rapid radiations [2,13] necessitating the development of large genome-wide data sets to take on such challenges.
The Eupholini represent a tribe of weevils with approximately 300 known species. New Guinea is the Eupholini’s center of diversity both in terms of species and genera. Currently, there are three main genera recognized within the Eupholini, although the monophyly of each is somewhat questionable: Eupholus Boisduval (63 species), Gymnopholus Heller (76 species) and Rhinoscapha Montrouzier (137 species) [14,15]. Eupholus species are brilliantly colored and can be divided in two morphologically distinct groups, i.e. the E. schoenherri-group and the E. loriae-group . Members of the E. schoenherri-group appear to feed exclusively on toxic wild yams (Dioscorea) (Fig 1). Most Eupholus species (~58) occur in habitats below 1,000m elevation with few (~5) above 1,500m. Members of Gymnopholus are of dark or cryptic coloration and usually occur above 1,500m. Gymnopholus species in the subgenus Symbiopholus (33 species) occur above 2,000m and exhibit epizoic symbiosis, i.e. lichens, algae and moss growing on their integument that enhance camouflage (Fig 1) [17,18]. Gymnopholus species are polyphagous, browsing on a variety of plants. Species in the third genus, Rhinoscapha, are sometimes brilliantly colored and other times more cryptic. Although their hosts are less well known, Rhinoscapha species can be found at all elevational zones, with certain clades showing some elevational zonation.
The 1,172 UCE loci that we tested with Eupholini were identified in Faircloth 2017. The 13,674 baits targeting these loci were designed from six different Coleoptera genomes and one genome from the sister Order Strepsiptera. Of these taxa, Dendroctonus ponderosae (Curculionidae: Scolytinae) is most closely related to the Eupholini (Curculionidae: Entiminae): they shared a most recent common ancestor ~105 Ma . We had the bait set synthesized by MYcroarray (MYcroarray, Ann Arbor MI, USA) as described, with no modifications.
We extracted DNA non-destructively from 48 samples, using DNeasy Tissue kits (Qiagen; Macherey-Nagel). Nine of these 48 samples were pinned museum specimens that ranged in age from 1952 to 2006. Tissue was extracted from the pronotum and/or leg of each specimen for specimens preserved in ethanol. For pinned specimens, tissue was taken out of the pronotum, mesothorax and abdomen using sterilized forceps (flame treated then soaked in 10% bleach between use) and placed in a 1.5–2.0 ml centrifuge tube. In the case of large pinned specimens, the tissue from the pronotum, mesothorax and abdomen were placed in separate centrifuge tubes, the resulting eluates from the specimen parts were later combined. We added 4μl of RNase-A to remove RNA from the samples during the extraction process according to manufacturer's protocol (http://www.bea.ki.se/documents/EN-DNeasy%20handbook.pdf). In the case of extractions obtained without the use of RNase-A during the extraction process, 1μl of 10 μg/mL RNase-A was added to each genomic DNA (gDNA) sample, the mixture was then incubated at 37°C for 30 minutes, followed by purification via ethanol precipitation. Extracted DNA quantity ranged from 253–1600 ng (mean 556 ng) measured by Qubit fluorometer (Life Technologies, Inc.). Subsequent library preparation and sequence capture steps were performed by RAPiD Genomics (Gainesville, FL, USA). Briefly, to confirm that ethanol preserved samples’ DNA was not degraded we visualized their DNA on a polyacrylamide gel for a few exemplar specimens, while all museum samples’ DNA was visualized on a polyacrylamide gel to assess the amount of degradation. Following visual examination, samples that were not highly degraded (>500 bp) were sonicated to a target size of 450 bp. Capture was performed with two pools of 24 samples each. Libraries were constructed by repairing the ends of the sheared fragments followed by the ligation of an adenine residue to the 3’-end of the blunt-end fragments. Next, custom indexed adapters suited for Illumina Sequencing platform were ligated to the libraries (see S1 Supporting Information Links). Finally, ligated fragments were PCR-amplified using standard cycling protocols (e.g. Mamanova et al. 2010 ), with Pre-capture PCR having 8–12 cycles, and Post-capture PCR with 13 cycles. Enrichment procedures followed the MYcroarray MYBaits kit Version-3 protocol , except that RAPiD Genomics used chicken C0t-1 DNA as blocking reagent. In Hymenoptera, chicken C0t-1 was shown to work better than “homebrew”, taxon specific C0t-1 DNA . The 48 indexed libraries were pooled in equimolar ratios to a total of 250 ng/μl. The 48 MYBaits reactions were used at the full concentration. We sequenced two pools of these samples each using a half lane of paired-end, 100 base pair reads on an illumina HiSeq 3000. We performed these two sequencing runs primarily to see if this substantially improved the number of loci captured. Voucher specimens are stored in museum collections (SNSB-Zoological State Collection, München (ZSM), State Museum of Natural History Karlsruhe (SMNK), California Academy of Sciences (CAS)), see S3 Table for details.
We used ILLUMIPROCESSOR , which is a parallel wrapper of TRIMMOMATIC [23–25], to clean and trim reads. We performed this step for each of the two sequencing runs, we then concatenated the two different sequencing runs together. After trimming, we generated summary stats on the trimmed reads using the PHYLUCE v1.6 package , script “phyluce_assembly_get_fastq_lengths”. All programs hereafter beginning with “phyluce” are PYTHON programs part of the PHYLUCE package. We assembled the cleaned/trimmed reads using “phyluce_assembly_assemblo_trinity” with TRINITY v2013-02-25 . Next we generated summary statistics (counts/lengths) of the assembled contigs using “phyluce_assembly_get_fasta_lengths”. To identify UCE loci from our contigs we needed to match these to the probes, we used “phyluce_assembly_match_contigs_to_probes”, this was set to a minimum coverage of 80% and minimum identity of 80%. We used “phyluce_assembly_get_match_counts” to create an initial database of loci counts per taxon. To obtain average read depth of our contigs by taxon we used “phyluce_assembly_get_trinity_coverage”, the results of which were then used to calculate average read depth among our UCE loci per taxon with “phyluce_assembly_get_trinity_coverage_for_uce_loci” using the sqlite database created in the “phyluce_assembly_match_contigs_to_probes” step. Next, we used “phyluce_assembly_get_fastas_from_match_counts” to get a count of the number of UCE loci captured for each taxa. To examine the effect of preservation type on number of loci captured, we performed a Welch Two Sample t-test between museum-pinned and ethanol preserved samples. We then divided each UCE loci into a separate fasta file using “phyluce_assembly_explode_get_fastas_file”, as we later wanted to construct individual gene tree phylogenies, and then aligned the sequences and trimmed the edges using “phyluce_align_seqcap_align” which implements MAFFT v7.130b . We did not use internal alignment trimming because our taxa are relatively closely related. However we eliminated columns that were entirely composed of “-”,”n” and or “?” with a custom R script using "deleteEmptyCells" function in the "ips" library . Lastly we used “phyluce_align_get_only_loci_with_min_taxa” to select the minimum percentage of missing taxa in our UCE loci alignments.
We performed two different types of phylogenomic analyses: (1) a concatenated analyses using RAxML v8.0.19 , and (2) several species tree analyses using ASTRAL v5.1.0 with TRee ALgorithm III (https://github.com/chaoszhang/ASTRAL/tree/multiind) [31–33] and SVDQuartets [34,35] which is implemented in PAUP* v4.0a152 .
We set each UCE locus as its own character set and then used PartitionFinder v2.1.1  implementing the “greedy” search algorithm  to select for the best partitioning strategy for the data under the General Time Reversible + gamma (GTRGAMMA) site rate substitution model using the AICc metric. We then conducted 20 maximum-likelihood (ML) searches in RAxML. We also performed non-parametric bootstrap replicates under GTRGAMMA using the autoMRE option to optimize the number of bootstrap replicates for this large dataset. We reconciled the bootstrap replicates with the best fitting ML tree.
Here, we performed several different analyses. First we reconstructed the ML gene tree estimated for each of the UCE loci in RAxML, below we list the specific process and settings in RAxML for our partitioned gene trees. Here we used an initial character set of the UCE core central (160bp) and divided up the remaining length of the locus into fifths. This character set was chosen based on a visual inspection of the number of phylogenetically informative sites over the proportion of loci length (Fig 2). This resulted in 5 character sets (6 including the UCE core), that were grouped together based on their proportional length from the conserved UCE core. We then used PartitionFinder v2.1.1  implementing the “greedy” search algorithm  to select the best partitioning strategy for each locus, under the GTRGAMMA site rate substitution model according to the AICc metric. Next, we conducted 20 maximum-likelihood (ML), best tree searches for each partitioned locus in RAxML, as well as 200 non-parametric bootstrap replicates. We reconciled the bootstrap replicates with the best fitting ML tree of each locus. We also wanted to examine the effect of removing poorly supported/oversaturated gene trees on the topology and or support of the species tree. Using modified R code from Borowiec et al. 2015 , we first calculated the average of the bootstrap support values for each gene tree and then eliminated gene trees that were in the lowest 10% quantile of average bootstrap support. We then identified potentially over-saturated loci, based on departure from a linear regression between uncorrected p-distances and inferred distances of the tips for each of the UCE loci . We then eliminated the outlier trees that were potentially over-saturated. Next we estimated a species tree using ASTRAL and assessed support with 200 bootstrap replicates for the complete set of genes, the set without the lowest 10% average bootstrap support, and the remaining trees without potentially oversaturated loci. Lastly, for comparative purposes, we also reconstructed a species tree in ASTRAL based on single character sets / partition and performed the same RAxML analyses as described above for the complete dataset.
Based on the results of the species tree topology and support values described above, we proceeded with gene tree reconstruction in MrBayes v3.2.5 . We also used the same partitioning strategy described above, estimated the most appropriate site rate substitution model for MrBayes using PartitionFinder v2.1.1. We conducted 2 independent runs of 1 cold chain and 3 heated chains (default settings) for 1x107 Markov chain Monte Carlo (MCMC) generations sampling every 1000 generations in MrBayes. We removed the first 25% as the burnin. We combined the two independent runs and created a maximum clade credibility topology (MCCT) for each locus with “sumtrees.py” part of the DendroPy package  as well as with custom scripts, see S1 Supporting Information Links. These gene trees were then input to ASTRAL to create a species tree. In order to assess support of the species tree we used each of the MCMC generations as a bootstrap replicate in ASTRAL (combined 15000 samples of the run after the burnin for each of the gene trees), thus creating support values akin to posterior probabilities of nodes. Lastly, for comparative purposes, we also reconstructed a species tree in ASTRAL based on single character sets/partitions per-locus and performed the same model selection and gene tree reconstruction as described above in MrBayes. We will refer to the analyses based on single character sets / partitions per-locus as ASTRAL-(RAxML or MrBayes)-single-partition analyses and those based on multiple partitions as ASTRAL-(RAxML or MrBayes)-multi-partition analyses.
Finally, we used another species tree method to look for congruence between methods. We created a species tree using SVDQuartets [34,35] where we evaluated all of the quartets with the ‘evalQuartets = all’ command using the Quartet FM (QFM) algorithm . We assessed support with 200 bootstrap replicates.
We also wanted to more objectively examine the differences that resulted from these methods rather than just a visual comparison. Therefore, we calculated three tree distance metrics that rely on the tree topology between two trees in the R package Phangorn ; the Robinson-Foulds distance (RF-dist) , the path distance metric between pairs of taxa (Path-dist)  and the approximate subtree prune and regraft distance (SPR-dist) [47,48]. Visualizing support values and color coding on trees was implemented in the R package phytools .
We investigated if specimen age or preservation type (museum pinned or ethanol preservation) had a greater effect on the number of UCEs captured using generalized linear models (GLM). Next, we compared our dataset to those of two other studies which utilized UCEs and museum samples, specifically Xylocopa (carpenter bees)  and Aphelocoma (scrub-jays) . These studies were chosen because of their data availability and their systematic sampling by specimen age. We wanted to see if the number of UCE loci captured, given specimen age, was similar between studies. We first transformed the fraction of UCE loci captured within each taxon (number of UCE loci captured per-specimen divided by the maximum number of loci captured, within a clade e.g. Eupholini weevils or scrub-jays), and then logit transformed the data and performed a likelihood-ratio test, to see if a model with independent slopes vs one with a single slope fit the data better.
We sequenced on average ~1.2x107 reads per sample with an average length of ~100 bp, see S1 Table. The UCE loci alignment results from the first round of sequencing and UCE loci alignment results from concatenation of reads (1st and 2nd rounds together), showed only a modest increase in the number of loci captured, for example at the 70% complete level the difference was only 36 loci (359 1st vs 395 2nd round of sequencing), see S2 Table. The following results refers to only those of the final concatenated dataset. The average per-sample read depth of all contigs was 11.8X, while the average per-sample read depth of UCE loci was 28.9X. Across all samples, we captured a total of 940 out of 1,172 UCE loci. After removing duplicate loci, we recovered an average of 451 UCE loci from each sample, with a range of 104–610 loci; SE ±13.0 loci. We recovered an average of 470 loci with a range of 251–610; SE ±11.0 loci from ethanol preserved specimens, and we recovered fewer loci from the 9 museum samples (mean of 370 loci with a range of 104–545; SE ±42.6 loci; ethanol vs museum pinned, Welch Two Sample t-test p-value of 0.04812) (Fig 3). Of the 806 UCE alignments containing >2 taxa, we selected 368 alignments to create a ≥70% complete matrix (n = 33 taxa) with a total of 171,290 bp for all further phylogenetic analyses. The mean number of phylogenetically informative sites (PIC) per locus was 195.2 (range 76–387 PIC; SE ±3.1) (Fig 4), the loci had a mean length of 465.4 sites (range of 231–802 sites; SE ±5.6), and locus length was correlated with the number of PIC (R-squared 0.5139, p-value 2.2e-16) (Fig 4).
PartitionFinder suggested that the optimal scheme for the concatenated alignment contained 126 partitions, and ML analysis inferred a tree where most of the nodes were highly supported (≥95% bootstrap support (bs)) (Fig 5). We recovered the Leptopini as a paraphyletic grade containing the Eupholini. The genus Celebia was sister to all of the other Eupholini in all analyses. The three main genera of the Eupholini (Rhinoscapha, Eupholus and Gymnopholus) were polyphyletic. Rhinoscapha appeared in three separate places, the R. tricolor-clade sister to R. usta-clade which is sister to the Gymnopholus clade containing the two subgenera Gymnopholus s. str. and Symbiopholus. The Eupholus loriae-group was sister to the remaining taxa. These include the third subgenus of Gymnopholus, Niphetoscapha which was sister to the R. tuberculata-clade. The clade of Niphetoscapha+R. tuberculata-clade was sister to the R. doriae-clade. The clade of Niphetoscapha+R. tuberculata-clade+R. doriae-clade is sister to the remaining members of Eupholus.
The PartitionFinder results, using the initial 6 character sets under the GTRGAMMA site rate substitution model (for use in RAxML), had the majority of loci with between 1–2 character partitioning schemes: 53 loci with a single partition, 204 loci with 2 partitions, 102 loci with 3 partitions and 9 loci with 4 partitions. There was no difference in topology between the ASTRAL species tree for the full analyses, and the ASTRAL species tree where we removed loci producing gene trees having the lowest 10% quantile of average bootstrap support removed, and the ASTRAL species tree where we removed both these low support species trees and loci we identified as being potentially over-saturated (see S1 and S2 Figs). There was also essentially no difference in support between these tree subsets. Although some nodes varied in their support across analyses, poorly supported nodes did not show increased support between the reduced subsets of trees and the complete set (S1 and S2 Figs). The relationships between the Leptopini and Eupholini remained the same between species tree topologies and concatenated gene tree. The species trees recovered the same species groups as in the concatenated analyses, but the backbone of the tree was substantially different. For the particular differences between species groups see Figs Figs55–9. The PartitionFinder results, using the initial 6 character sets in Bayesian phylogenetic analyses (MrBayes MCCT trees) selected the majority of loci with more than 2 partitions according to the AICc metric: 16 loci with 1 partition, 109 loci with 2 partitions, 148 loci with 3 partitions, 81 loci with 4 partitions and 14 loci with 5 partitions, 0 loci had 6 partitions. We also compared locus length to the number of partitions in the MrBayes and RaxML analyses, and found that in both analyses locus length was positively correlated with the number of partitions ([MrBayes: p-value = 5.251e-08, R-squared = 0.078], [RaxML: p-value = 2.2e-16, R-squared = 0.2237]), (see S1 File, partitioning data). We assessed convergence between the two independent MrBayes runs using the average standard deviation of split frequencies using a custom unix script (see S1 Supporting Information Links). All but two analyses were at or below 0.01 in average standard deviation of split frequencies, the exceptions being 0.02 and 0.03, we subsequently produced the MrBayes MCCT trees.
The most substantial topological difference is between the RAxML-concatenated gene tree and the species tree analyses (see Fig 5, Table 1). Although the partitioning results differed in the number of partitions, the ASTRAL species trees derived from MrBayes-multi-partition gene trees were only slightly different from the RAxML-multi-partition derived gene trees (Fig 6). The Eupholus loriae-group and the Rhinoscapha tricolor-clade switched as being the sister clade to the remaining Eupholini. This node was not well supported in both analyses. These clades showed similar differences between both ASTRAL-RAxML-multi-partition and ASTRAL-RAxML-single-partition (Fig 7). The same difference also appears between the Eupholus loriae-group and the Rhinoscapha tricolor-clade in the ASTRAL MrBayes-single-partition and ASTRAL-MrBayes-multi-partition (Fig 8). The two MrBayes (single and multi-partitioned) derived ASTRAL species trees had the additional difference between the placement of the E. geoffroyi clade. The ASTRAL-RAxML-single-partition and MrBayes-multi-partition species trees only differed between the placement of the E. geoffroyi clade and the placement of Letopius clavis (Fig 9).
The results from the SVDQuartets species tree were much the same as those from the other two species tree analyses. In particular, the backbone topology was most similar to the results of the ASTRAL species tree derived from the MrBayes-multi-partition analyses. One difference is that E. compositus is sister to E. azureus in the SVDQuartets species tree (Fig 10). Also, E. cf. schoenherrii is sister to the E. geoffroyi-clade in the SVDQuartets and ASTRAL-MrBayes-multi-partition trees, whereas the ASTRAL-RAxML-multi-partition tree have the E. geoffroyi-clade sister to the other E. schoenherrii species (Figs (Figs66 and and7).7). Only the placement of E. cf. schoenherii was strongly supported in both the SVDQuartets and ASTRAL-MrBayes-multi-partition trees, but no other conflicting nodes were strongly supported.
The topological tree-to-tree metrics derived from the RF-dist, Path-dist and SPR-dist all give relatively the same result (Table 1). These values show that the ASTRAL-RAxML-single-partition and ASTRAL-MrBayes-single-partition trees have the most similar topology. They are then most similar to the ASTRAL-RAxML species tree based on multiple partitions. The ASTRAL-RAxML-single-partition species tree is the most similar of the species tree methods to the RAxML-concatenated gene tree. The Path-dist metric shows that the ASTRAL-MrBayes species tree based on multiple partitions and SVDQuartets species trees are slightly more similar to each other than is the ASTRAL-RAxML-multi-partition species tree to the SVDQuartets species trees, while the other metrics give equal difference between the ASTRAL species trees and the SVDQuartets species trees. Most importantly the species tree methods are more similar to each other, except for the SVDQuartets species tree as mentioned above, than they are to the concatenated RAxML topology.
Within the Eupholini dataset, we used a GLM to test which variable (specimen age and or preservation type) has the most influence on the UCE number, we found that specimen age was the only significant contributor to the number of UCEs captured (p-value of 0.00886 for the coefficient of specimen age vs p-value of 0.75418 for collection type). Next, we compared our dataset to those of two other studies which utilized UCEs and museum samples, specifically Xylocopa (carpenter bees)  and Aphelocoma (scrub-jays) . From the likelihood-ratio test results we find that the preferred model is one with a single slope (Chi-square test p-value 0.8598), indicating that UCE capture rate through time is approximately the same between datasets (Fig 11).
UCEs are now widely used for different vertebrate taxa [5,7,50–56]. In invertebrates, UCEs have been used in Hymenoptera (primarily ants [8,21,11,57–60]), but also Coleoptera  and arachnids . Transcriptome-based analyses constitute another major source of phylogenomic data in arthropods [63–69]. For hyperdiverse beetles, there is great potential for both UCE- and exon-based markers to help resolve relationships within this diverse group. Here, we tested the utility of UCEs designed across Coleoptera on a radiation of weevils from Australasia. We captured significantly more loci than Baca et al. (2017)  which focused on Adephaga beetles (305 at 50% vs [368 at 70%, 537 at 50% this study]). This is probably due to the phylogenetic distance between the species from which the probes were designed and the target taxa. The Coleoptera UCE markers capture a similar number of loci when compared to the 1st round of Hymenoptera UCEs. This suggests that with more forthcoming weevil genomes (personal comm. D. McKenna), capturing thousands of UCE loci could be possible. Why UCEs are not more universal within Coleoptera as compared to vertebrate lineages is still an open question. One possible explanation is that beetles are extremely diverse both in terms of life history and genomics. Recent studies of beetle genomes show a large diversity in their size and complexity, potentially even incorporating fungal genomic elements . Additionally, Coleoptera originated ~300 Ma [71,72], compounding the problem, for comparison the Mammalia–Root ~223 Ma , and the Palaeognathae (ratite) and Neognathae (birds) divergence of ~100 Ma .
Here, we also find that we can enrich UCE loci from Coleoptera museum specimens, which can significantly aid sampling efforts for future studies, similar to other studies [11,75,76]. Our results are consistent with the findings of McCormack et al. 2016  and Blaimer et al. 2016 , who showed a gradual decline in UCE capture rate with specimen age (Fig 11). Generally speaking, the capture rate we observed correlates similarly to the aforementioned studies, according to a likelihood ratio test between models’ slopes, which is re-assuring because Coleoptera, due to their thicker exoskeletons compared to other insects (e.g. Drosophila), may undergo a different decomposition process following collection. We hypothesize that in large insects and or those with thick cuticle, their initial drying time (period spent air drying for preservation after dispatching the insect) can be longer than other insects, and length of initial drying time is key to preventing decay and preserving the specimens for later use [10,11,77,78]. We suspect that in the case of tropical Coleoptera with thick exoskeletons, more decay tends to occur because of their thickened cuticle and the humid conditions under which they are collected, however this needs to be formally quantified. Additionally, there is substantial variance in the procedures through which beetle specimens are preserved. For example, when collecting weevils on the Solomon Islands in 1959, weevil researcher Dr. Charles O’Brien, would actively dry his weevils at the end of the day on top of a kerosene powered lantern before packing them into a tin with cotton (personal communication). With such information one can start to make better sense as to why some historic samples fail and others work relatively well. For example potentially overheating samples (as described above) could shear DNA as well as specimens that were not dried quickly could have DNA degradation due to enzymatic decomposition. Having such information will be invaluable to identify particular collections that may harbor particularly valuable specimens for genomic work. More systematic studies need to be undertaken to fully explore this area of museum phylogenomics in beetles and preservation methods, but our results at least show the feasibility of using tropical beetle specimens to bolster sampling.
All of our phylogenetic reconstructions show the same result concerning the classification of the Eupholini: the main genera Eupholus, Gymnopholus and Rhinoscapha are polyphyletic. This should not be too surprising because they were described and delineated in the late 19th and early 20th centuries when there were fewer known species and inadequate examination of internal structures (such as genitalia) during classification. For instance, in Eupholus, species were apparently assigned to that genus mostly based on coloration, i.e. all bright blue or green species were placed in Eupholus .
Lastly, for the majority of loci we found that by using multiple character sets which resulted in multiple partition schemes was a better fit for the data, than a single partitioning scheme according to AICc model selection. This indicates that using each UCE locus as a single partition as commonly performed may not be the best fit for the data. With these data, differences appear between species tree topologies based on both the partitioning strategy and initial method used to reconstruct the gene trees. This indicates that using only single partitions may not be adequate for many UCE loci. Additionally, the method used to reconstruct the gene trees also has an effect on the resulting species tree topology. What general effect gene tree partitioning has on species tree reconstruction and resulting branch lengths remains an open question.
We find great potential for UCE loci to tackle some of the more challenging issues in beetle systematics such as in the Eupholini radiation. We suggest that the use of UCEs and similar phylogenomic markers may also provide insights into the deeper relationships of diverse, higher-order taxonomic groups such as the Curculionoidea. Museum collections should also be recognized as a more prominent genomic resource, especially for tropical arthropods. Here, we demonstrate with a logistically challenging group how collection efforts of the past can aid current and future studies by increasing sampling, an undertaking that would otherwise require many years of effort to recollect the same samples. What steps museums should take to help preserve this genomic resource should be studied in a systematic fashion across institutions, as many existing museum specimens may be irreplaceable.
Matrix % complete indicate the percent completeness of the UCE loci alignments. 1st round of sequencing indicates the number of UCE loci alignments from the first round of sequencing of the 48 samples on an illumina HiSeq 3000 half lane. 2nd round of sequencing indicates the number UCE loci alignments from the concatenation of reads from the second round of sequencing (same 48 samples on an illumina HiSeq 3000 half lane) and the first round of sequencing.
Museum abbreviations as follows: SNSB-Zoological State Collection, München (ZSM), State Museum of Natural History Karlsruhe (SMNK), California Academy of Sciences (CAS).
The authors would like to thank Joe Russack and Brian Simison for help with using the linux cluster and software installs at the CAS. We would also like to thank Misha Leong and Jun Ying Lim for their help with statistical analyses. We would like to thank Marek Borowiec and the other 3 anonymous reviewers for their comments that helped to improve this manuscript. The New Guinea Binatang Research Center has provided extremely valuable assistance with collections in Papua New Guinea. Moreover, Bega Inaho (Goroka) and Agus Faknik (Abepura) helped collect specimens. The field work in New Guinea would not have been possible without the generous hospitality and help of many local people, and we thank all of them very warmly. M.H.V.D. and R.L. (salary and lab work/sequencing) were funded by NSF award DBI #1402102, A.R. by DFG BA2152/10-1 & 3 / RI1718/3-1. We are indebted to the PNG Department of Environment and Conservation for issuing permits (#014282) in PNG; to the Queensland Government, Department of Environment and Heritage Protection for granting collecting permits in many Australian National Parks, and the Australian Government, Department of Environment for an export permit.
M.H.V.D. and R.L. (salary and lab work/sequencing) were funded by NSF award DBI #1402102 and A.R. by DFG BA2152/10-1 & 3 / RI1718/3-1. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Links to all UCE loci use in phylogenetic analyses: 10.6084/m9.figshare.5172478. R/UNIX CODE at MHVD’s github page: https://github.com/matthewhvandam/weevil-UCE/tree/master. Raw sequence data: NCBI BioProject ID: PRJNA394929.