Sequence analysis of the COI amplicon (electronic supplementary material, figure S2) established that members of a species usually showed low sequence variation, averaging 0.43 per cent (s.e. = 0.017%), while congeneric species possessed 18-fold higher mean divergences (7.70%, s.e. = 0.033%). The present study provides the first comprehensive analysis of barcode divergences in populations of single species separated by large geographical distances. Comparison of intraspecific divergences for populations collected 500–2800 km apart revealed no significant increase in genetic distances with geographical separation (). This lack of substantial regional variation in barcode sequences indicates that an effective identification system can be constructed for the Lepidoptera fauna of eastern North America without extensive geographical surveys of each species. We anticipate that similarly muted levels of intraspecific variation will be shared by most taxa in other insect orders such as Coleoptera, Diptera and Hymenoptera from this region. We expect more differentiation in groups with low vagility and in other areas, such as western North America, where higher topographic roughness provides more opportunities for population isolation and differentiation. It will also be intriguing to probe the patterns of regional divergence in areas such as Australia where Pleistocene glaciations had a much less dramatic impact on species distributions.
We detected only nine cases of barcode sharing in the 1327 species included in our study, all involving situations in which a pair of species shared the same barcode. These cases always involved congeneric species with close morphological similarity. Because 99.3 per cent of the 1327 species had barcode sequences distinct from those of other taxa, a COI reference library can generate identifications very effectively.
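As a rough illustration of how barcode sharing of this kind could be detected in a reference library, the following minimal sketch groups records by sequence and reports any sequence found in more than one species. It assumes that 'sharing' means exact sequence identity; the function name, the record format and the toy species names are hypothetical and are not taken from the analysis pipeline used in this study.

```python
from collections import defaultdict

def shared_barcodes(records):
    """Return {sequence: [species, ...]} for sequences found in more than one species.

    `records` is an iterable of (species_name, sequence) pairs.
    Assumption: two species "share" a barcode only if the sequences are identical.
    """
    species_by_sequence = defaultdict(set)
    for species, sequence in records:
        # Normalize case so that 'acgt' and 'ACGT' count as the same barcode.
        species_by_sequence[sequence.upper()].add(species)
    return {
        sequence: sorted(species)
        for sequence, species in species_by_sequence.items()
        if len(species) > 1
    }

# Hypothetical toy library: two congeners share one barcode.
toy_records = [
    ("Genus alpha", "ACGT"),
    ("Genus beta", "acgt"),
    ("Genus gamma", "ACGA"),
    ("Genus alpha", "ACGG"),
]
print(shared_barcodes(toy_records))
```

Each key in the returned dictionary corresponds to one case of barcode sharing, so its length gives the number of cases.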
Although most species possessed low intra-specific divergence, 67 taxa included two or three barcode groups with more than 2 per cent sequence divergence. Many of these cases probably reflect overlooked species pairs or triads. As evidence, we note that individuals of Plusia putnami separated into two barcode groups with 3.8 per cent COI divergence (a). Subsequent investigation revealed differences in genitalia, host plant use and habitats, leading to the description of a new species (Handfield & Handfield 2006). Other cases of deep barcode divergence involved species where there is independent evidence for unrecognized taxa. For example, two barcode lineages with 2.8 per cent sequence divergence were detected in the fall webworm, Hyphantria cunea (b), which has long been thought to include two species with differing larval morphologies (Itô & Warren 1973). Young species pairs will be overlooked by a 2 per cent screening threshold, but they can still show barcode differentiation. For example, the fall armyworm, Spodoptera frugiperda, includes two barcode lineages with 1.3 per cent divergence (c). This species consists of two ‘host races’ that not only have different primary hosts (rice versus corn), but show allozyme and mitochondrial DNA divergence as well as reproductive isolation (Levy et al. 2002), justifying their recognition as distinct species. As this last example reveals, barcodes can highlight young species pairs, but studies of biological covariates are critical to confirm their status.
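The 2 per cent screening step discussed above can be sketched in a few lines. This is a minimal illustration only: it assumes pre-aligned sequences of equal length and uses the uncorrected p-distance (proportion of differing sites) for simplicity, which may differ from the distance measure used in this study. The function names and toy sequences are hypothetical.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences carry an unambiguous base.
    sites = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)

def max_intraspecific_divergence(sequences):
    """Largest pairwise p-distance among the sequences of one species."""
    if len(sequences) < 2:
        return 0.0
    return max(p_distance(a, b) for a, b in combinations(sequences, 2))

def flag_deep_splits(library, threshold=0.02):
    """Return {species: max divergence} for species exceeding the threshold.

    `library` maps a species name to a list of aligned barcode sequences.
    """
    flagged = {}
    for species, sequences in library.items():
        divergence = max_intraspecific_divergence(sequences)
        if divergence > threshold:
            flagged[species] = divergence
    return flagged

# Hypothetical toy library of 100-bp sequences: sp_a differs at 3 sites, sp_b at 1.
toy_library = {
    "sp_a": ["A" * 100, "A" * 97 + "CCC"],
    "sp_b": ["A" * 100, "A" * 99 + "G"],
}
print(flag_deep_splits(toy_library))
```

A screen of this kind flags only candidates. It would miss a 1.3 per cent split such as that in S. frugiperda, and a flagged species still needs clustering into barcode groups and corroborating biological evidence before any taxonomic conclusion is drawn.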
Our work affirms the validity of most Lepidoptera species recognized through prior taxonomy and suggests that relatively few species have been overlooked, as just 5.1 per cent of the 1327 taxa included deeply divergent barcode lineages. However, there are two provisos. First, young species pairs, such as those comprising S. frugiperda, will often be morphologically cryptic, and will also show low barcode divergence. Such taxa can be revealed, but only through a search for covariation between barcode splits and ecological or morphological traits. Secondly, the constrained species discovery in this study probably reflects both the intensity of prior taxonomic work on Lepidoptera and their flamboyant phenotypes. Interestingly, the incidence of overlooked species encountered in the present study shows close congruence to the value reported for a well-studied fauna of tropical Lepidoptera (Hajibabaei et al. 2006). In contrast, barcode analyses on insect groups with cryptic morphologies have encountered much higher rates of species discovery (Smith et al. 2006, 2008).
In summary, this study has assembled DNA barcodes for 0.1 per cent of the animal species described over the past 250 years. Our results confirm the effectiveness of a DNA barcode reference library in the identification of a continental fauna of Lepidoptera, reinforcing conclusions from studies that examined fewer species and that probed diversity on smaller geographical scales. Our work has also provided further examples of deep barcode divergences, illuminating probable overlooked species, and setting the stage for their detailed taxonomic investigation. There is no reason to expect that Lepidoptera are a particularly compliant target for barcode-based identification systems. Instead, it is likely that the key findings of this investigation apply to most other taxonomic groups occupying continental or oceanic habitats. In such situations, the barcode analysis of very few individuals of each species will provide the basis for a highly effective identification system. More effort will be required to gain a good understanding of sequence diversity in taxa from insular or freshwater habitats where local population differentiation is more pronounced, but such taxa form a minor component of global biodiversity.
We conclude that DNA barcoding can deliver on its promise both to enable the automated identification of known species and to aid the detection of overlooked taxa. Further, as this study indicates, a comprehensive barcode library for animal life can be assembled rapidly, promising massive improvement in our knowledge of biodiversity.