Simulations on the three large libraries of Diptera, Lepidoptera and Hymenoptera yielded remarkably consistent patterns of variation of TP, FP, TN, FN, accuracy and precision (Data S4, S5, S6). In all cases, more restrictive BCM distance thresholds produced a gradual increase of TN, a gradual decrease in FP and more abrupt variations in the proportions of TP (decreasing) and FN (increasing). With these libraries, the use of more restrictive thresholds also resulted in a marked drop of accuracy and in a gradual improvement of precision. BM identification performances in the large insect libraries (simulating 100% taxon coverage) were generally better than those of the tephritid library, yet the variations in precision when using more restrictive BCM thresholds were less pronounced. When passing from no threshold (BM) to the most restrictive threshold value (THRK2P = 0.00), precision increased by only 0.7% in Diptera (from 0.946 to 0.953), by 1.4% in Lepidoptera (from 0.944 to 0.958) and by 1% in Hymenoptera (from 0.955 to 0.966). Accordingly, the use of more restrictive BCM thresholds reduced the relative ID error from 0.054 to 0.047 in Diptera, from 0.056 to 0.042 in Lepidoptera and from 0.045 to 0.034 in Hymenoptera. Hence in the Hymenoptera library, the BM criterion could already produce a relative ID error <0.05. THRK2P_0.05 values in libraries simulating different levels of taxon coverage ranged from 0.019 (+/−0.001) to 0.059 (+/−0.002) in Diptera, from 0.000 (+/−0.001) to 0.025 (+/−0.008) in Lepidoptera and from 0.020 (+/−0.001) to 0.256 (+/−0.005) in Hymenoptera. Regardless of the variability observed among THRK2P_0.05 values, relationships between relative ID error estimates and the slope of the linear fitting y = a+bx were consistent across insect orders, with higher slope values in libraries with lower taxon coverage. In Diptera the slope resulting from the 25% taxon coverage simulation was 34.6 times larger than the value obtained from the 100% taxon coverage simulation; similarly, in Lepidoptera and Hymenoptera it was 28.9 and 30.4 times larger, respectively.
Relationships between levels of taxon coverage and relative ID error.
a, b, c: Relationships between ID errors and taxon coverage of libraries.
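The precision and relative ID error values reported for the three large libraries are consistent with a simple complement relationship. The sketch below assumes that relative ID error = FP / (TP + FP) = 1 − precision; this definition is inferred from the reported numbers rather than stated in this section, and the function name is illustrative only.

```python
# Precision values reported in the text for each order:
# (no threshold / BM, most restrictive threshold THRK2P = 0.00).
precision = {
    "Diptera": (0.946, 0.953),
    "Lepidoptera": (0.944, 0.958),
    "Hymenoptera": (0.955, 0.966),
}


def relative_id_error(p):
    # Assumption: relative ID error is the complement of precision,
    # i.e. FP / (TP + FP) = 1 - TP / (TP + FP).
    return round(1.0 - p, 3)


relative_error = {
    order: (relative_id_error(bm), relative_id_error(strict))
    for order, (bm, strict) in precision.items()
}

for order, (bm, strict) in relative_error.items():
    print(f"{order}: relative ID error {bm:.3f} (BM) -> {strict:.3f} (THRK2P = 0.00)")
```

Under this assumption the computed values reproduce the relative ID errors quoted in the text for all three orders.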
The regional library of tephritid DNA barcodes (Data S1) comprised 153 frugivorous species of the following genera: Bactrocera (9 species, 84 DNA barcodes), Bistrispinaria (1 species, 1 DNA barcode), Capparimyia (4 species, 9 DNA barcodes), Carpophthoromyia (5 species, 7 DNA barcodes), Ceratitis (53 species, 276 DNA barcodes), Clinotaenia (2 species, 2 DNA barcodes), Dacus (60 species, 187 DNA barcodes), Neoceratitis (1 species, 1 DNA barcode), Perilampsis (4 species, 7 DNA barcodes) and Trirhithrum (14 species, 28 DNA barcodes). The largest part of the vouchers (95.1%) in this library was collected in 30 countries of the African continent (89.5%) or in adjacent islands and archipelagos (Canary Islands, Comoros, La Réunion, Madagascar, Mauritius, Seychelles; 5.6%). The remaining specimens (4.8%) were represented by invasive frugivorous pests collected in Greece, Italy, Spain, Israel, United Arab Emirates, Yemen, India, Indonesia, Pakistan, Philippines and Brazil, and nine specimens (1.7%) were of unknown origin (see Data S1). Thirty-three of the species represented in the library are of agricultural importance in Africa. The remaining 120 taxa are currently not considered of economic relevance (Data S1). The library also comprised 85% of all taxa regularly encountered in para-pheromone traps during surveys in different parts of the African continent (http://data.gbif.org/species/13143057). In addition to the library, the 188 interceptions represented 49 tephritid species of 7 genera: Bactrocera (4 species, 53 queries), Capparimyia (1 species, 1 query), Carpophthoromyia (5 species, 11 queries), Ceratitis (13 species, 36 queries), Dacus (19 species, 77 queries), Perilampsis (2 species, 2 queries) and Trirhithrum (5 species, 8 queries). Five economically important species accounted for 53.2% of the specimens from interceptions. Overall, 68.6% of interceptions belonged to 17 economically important species (Data S1).
The distribution of pairwise K2P distances in the tephritid library showed that 95% of all the intraspecific distances were in the interval 0.00–7.98%, while 95% of the mean interspecific, congeneric distances were in the interval 6.23–13.55% (Data S2). There was no well-defined barcoding gap, as 6.31% of all pairwise comparisons were shared between the 95% percentile intervals of the intraspecific and congeneric interspecific K2P distance distributions (i.e. fell in the interval 6.23%<K2P<7.98%). BCM simulations in the tephritid library were strongly affected by the K2P distance threshold implemented. The proportion of TP was always markedly higher than the proportion of FP. The proportions of TP and FP decreased as THRK2P approached 0.00. Yet, while the proportion of FP decreased gradually, the proportion of TP showed a more abrupt decrease for THRK2P ranging from 0.015 to 0.00. The proportions of FN and TN increased as the distance thresholds approached 0.00. Moving THRK2P toward 0.00 produced a rapid increase of the proportion of discarded queries (up to 0.651). Accuracy slowly increased up to 0.934 (at THRK2P = 0.03), then rapidly decreased, reaching a minimum for THRK2P = 0.00. Conversely, precision was positively affected by the use of more restrictive distance thresholds and gradually increased to a maximum of 0.957 for THRK2P = 0.00. When passing from no threshold (BM) to the most restrictive threshold value (THRK2P = 0.00), precision increased by 11.7%. Overall and relative ID errors showed opposite trends compared to accuracy and precision. Overall ID errors rapidly increased at THRK2P ranging from 0.003 to 0.00, while the relative ID errors gradually decreased for THRK2P approaching 0.00. Linear regression showed that in the tephritid library the K2P distance value corresponding to a relative ID error of 0.05 was THRK2P
a, b, c: Tephritid Best Close Match (BCM) identification.
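The way TP, FP, TN and FN respond to the distance threshold can be illustrated with a minimal sketch of Best Match (BM) and Best Close Match (BCM) scoring. It rests on several assumptions not spelled out in this section: a query is accepted when the distance to its best match is at or below the threshold, accepted queries count as TP or FP depending on whether the best match is conspecific, and discarded queries count as FN (conspecific best match) or TN (otherwise). The function names and the five queries are hypothetical.

```python
def classify_bcm(query_species, best_match_species, best_match_distance, threshold=None):
    """Score one query; threshold=None corresponds to BM (no threshold)."""
    conspecific = query_species == best_match_species
    # Assumption: a match is accepted when its distance is <= threshold.
    accepted = threshold is None or best_match_distance <= threshold
    if accepted:
        return "TP" if conspecific else "FP"
    # Discarded queries: wrongly discarded (FN) or rightly discarded (TN).
    return "FN" if conspecific else "TN"


def proportions(queries, threshold=None):
    """Return the proportion of TP, FP, TN and FN over all queries."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for species, match_species, distance in queries:
        counts[classify_bcm(species, match_species, distance, threshold)] += 1
    return {label: n / len(queries) for label, n in counts.items()}


# Hypothetical queries: (true species, species of best match, K2P distance).
queries = [
    ("sp_A", "sp_A", 0.000),
    ("sp_A", "sp_A", 0.012),
    ("sp_B", "sp_B", 0.035),
    ("sp_C", "sp_B", 0.020),
    ("sp_D", "sp_A", 0.080),
]

for thr in (None, 0.03, 0.01):
    print(thr, proportions(queries, thr))
```

With these toy data, tightening the threshold lowers TP and FP and raises TN and FN, which is the qualitative pattern described for the tephritid library.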
Relative ID errors at 30 arbitrary distance thresholds.
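The threshold corresponding to a target relative ID error can be read off a linear fit y = a+bx, with the distance threshold as x and the relative ID error as y. The following is a minimal sketch of that step using ordinary least squares; the five data points are hypothetical (the study used 30 thresholds), the helper names are illustrative, and the fitting procedure actually used by the authors may differ.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def threshold_for_error(a, b, target=0.05):
    """Solve target = a + b*x for x, the distance threshold."""
    return (target - a) / b


# Hypothetical (threshold, relative ID error) pairs, for illustration only.
thresholds = [0.00, 0.01, 0.02, 0.03, 0.04]
rel_errors = [0.040, 0.045, 0.050, 0.055, 0.060]

a, b = fit_line(thresholds, rel_errors)
thr_005 = threshold_for_error(a, b, target=0.05)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}, threshold at 0.05 = {thr_005:.3f}")
```

A steeper slope b means the relative ID error changes more per unit of threshold, so comparing slopes across libraries expresses how sensitive the error is to the threshold choice.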
The BM criterion allowed a correct identification of 87.2% of the 188 intercepted specimens (Data S3). When THRK2P_0.05 was used for the BCM identification of these specimens, the proportion of discarded queries was 0.191 and the proportions of TP, FP, TN and FN were 0.787, 0.021, 0.106 and 0.085, respectively. Among interceptions, THRK2P_0.05 produced an overall ID error = 0.106 (range 0.096–0.144 considering the 95% confidence intervals of the threshold estimate) and a relative ID error = 0.026 (range 0.026–0.028 considering the 95% confidence intervals). This resulted in 89.4% of all queries and 97.4% of the non-discarded queries being correctly identified (Data S3).
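The summary figures for the interceptions can be recomputed from the four reported proportions. The sketch below assumes the following definitions, which are inferred from the reported values rather than quoted from the methods: discarded = TN + FN, accuracy = TP + TN, precision = TP / (TP + FP), overall ID error = FP + FN and relative ID error = FP / (TP + FP).

```python
# Proportions reported for the 188 interceptions at THRK2P_0.05.
tp, fp, tn, fn = 0.787, 0.021, 0.106, 0.085

# Assumed definitions (inferred from the reported values).
discarded = tn + fn                    # queries with no match within the threshold
accuracy = tp + tn                     # correctly identified or correctly discarded
precision = tp / (tp + fp)             # correct among the non-discarded queries
overall_id_error = fp + fn             # errors over all queries
relative_id_error = fp / (tp + fp)     # errors among the non-discarded queries

print(f"discarded         = {discarded:.3f}")
print(f"accuracy          = {accuracy:.3f}")
print(f"precision         = {precision:.3f}")
print(f"overall ID error  = {overall_id_error:.3f}")
print(f"relative ID error = {relative_id_error:.3f}")
```

Because the inputs are rounded to three decimals (they sum to 0.999), the recomputed accuracy can differ from the reported 89.4% in the last digit.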