This study has strongly validated the efficacy of cox1 barcodes for identifying fish species. We sequenced (usually multiple) specimens of three species of chimaerids, 61 species of sharks and rays and 143 species of teleosts for the barcode region of cox1. With no exceptions, all 207 sequenced species could be discriminated. Nearly 98% of all species amplified with one of two primer sets. Only five of 211 species failed to amplify with these protocols, and one of these amplified with a newly designed primer set. The four failures came from varied fish groups and included congeners of species that amplified without problem; they may reflect either DNA degradation or primer mismatches. Since our two commonly used primer sets are extremely similar, we are exploring the possibility of a single, quasi-universal fish primer set that incorporates inosine at the variable positions or has built-in degeneracy.
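As a rough illustration of how two similar primers might be collapsed into one degenerate or inosine-containing primer, the following Python sketch builds a consensus from the standard IUPAC ambiguity codes. The function name and the two short sequences are hypothetical placeholders, not the primers used in this study, and the sketch assumes the primers are already aligned and of equal length.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases each code represents.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_consensus(primers, use_inosine=False):
    """Collapse aligned, equal-length primers into one consensus primer.

    Variable positions receive an IUPAC ambiguity code, or 'I' (inosine)
    when use_inosine is True. Hypothetical helper for illustration only.
    """
    length = len(primers[0])
    if any(len(p) != length for p in primers):
        raise ValueError("primers must be aligned to the same length")
    consensus = []
    for column in zip(*primers):
        bases = frozenset(base.upper() for base in column)
        if len(bases) == 1:
            consensus.append(next(iter(bases)))   # invariant position
        elif use_inosine:
            consensus.append("I")                 # inosine at variable position
        else:
            consensus.append(IUPAC[bases])        # built-in degeneracy
    return "".join(consensus)


# Made-up example sequences (NOT the study's primers).
primer_a = "TCAACCAACC"
primer_b = "TCGACTAATC"
print(degenerate_consensus([primer_a, primer_b]))                    # TCRACYAAYC
print(degenerate_consensus([primer_a, primer_b], use_inosine=True))  # TCIACIAAIC
```

Whether such a consensus primer would amplify reliably is an empirical question that the sketch does not address.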
The GC content of the 655 bp mitochondrial cox1 region was on average higher in the 143 species of Osteichthyes than in the 61 species of Chondrichthyes: 47.1% versus 42.2%. Saccone et al. (1999) reviewed data from the complete mitochondrial genomes of nine Osteichthyes and three Chondrichthyes species, deriving GC contents of 43.2% and 38.4%, respectively. These values correspond reasonably well to ours, especially with respect to the higher GC content of the teleosts. We observed substantially more nucleotide changes at the 3rd codon position than the 1st, and more at the 1st than the 2nd. For example, the standard errors of the GC percentages of the 3rd, 1st and 2nd bases of the Osteichthyes were 0.50, 0.10 and 0.02, respectively (see also ). This reflects the fact that most synonymous mutations occur at the 3rd position, with a few at the 1st position and none at the 2nd. The higher GC content of the Osteichthyes compared with Chondrichthyes was largely due to 3rd base variation, with mean values of 41.1% and 29.9%, respectively, although 1st base differentiation was also observed. Within the Chondrichthyes, GC content of rays was higher than that of sharks (44.7% versus 41.0%), again largely due to GC3 variation. The causes of the GC variation among teleosts, sharks and rays are not known.
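The per-position GC values discussed above can be computed along the following lines. This is a minimal Python sketch, not the procedure used in the study: the function name and the frame argument are assumptions, and the caller is assumed to know the reading frame of the barcode fragment.

```python
def gc_by_codon_position(seq, frame=0):
    """Return GC percentages overall and at codon positions 1, 2 and 3.

    frame is the 0-based offset of the first complete codon (an assumption
    the caller must supply). Ambiguous bases such as N are ignored.
    """
    seq = seq.upper()[frame:]
    seq = seq[: len(seq) - len(seq) % 3]  # drop a trailing incomplete codon

    def gc_percent(bases):
        bases = [b for b in bases if b in "ACGT"]
        if not bases:
            return float("nan")
        return 100.0 * sum(b in "GC" for b in bases) / len(bases)

    return {
        "all": gc_percent(seq),
        "gc1": gc_percent(seq[0::3]),  # 1st codon position
        "gc2": gc_percent(seq[1::3]),  # 2nd codon position
        "gc3": gc_percent(seq[2::3]),  # 3rd codon position
    }


# Toy sequence (made up), codons ATG ATC ATG ATA.
print(gc_by_codon_position("ATGATCATGATA"))
# {'all': 25.0, 'gc1': 0.0, 'gc2': 0.0, 'gc3': 75.0}
```

Averaging these values over species within each group would give group means comparable in kind to the GC3 figures quoted above.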
No NUMTs (transfers of mtDNA cox1 sequences into the nuclear genome) were observed. A review of the occurrence of NUMTs in plants and animals did not find any evidence of their existence in Actinopterygii (Bensasson et al. 2001), but a comparison of Fugu rubripes mtDNA and nuclear DNA sequences did detect seven or eight NUMTs (Richly & Leister 2004). This confirms the need for vigilance in examining fish amplicons for potential pseudogene status.
Confusion in taxonomic assignments as a result of inter-specific hybridization (Verspoor & Hammar 1991) does not seem to be a major issue: only two of 754 sequences (one in the genus Pristiophorus and one in Plectropomus) appeared in the ‘wrong’ congeneric species. This may represent either introgressive hybridization, or incorrect identification of the original specimen.
Although barcode analysis seeks only to delineate species boundaries, there is clearly some phylogenetic signal in cox1 sequence data. For example, four major clusters were apparent in the NJ phenogram: chimaerids, rays, sharks and dogfish, and teleosts. Congeneric species always clustered together and in most cases so did confamilial species. However, methodologies for phylogeny reconstruction from molecular data remain somewhat controversial, with a wide variety of disparate approaches possible (see, for example, Nei & Kumar 2000). We cannot hope to recover the true phylogeny of fishes from a 655 bp fragment of mitochondrial DNA through K2P distance and neighbour joining; rather, more gene regions should be used (including nuclear genes) and additional analytic methods deployed, including maximum parsimony and maximum likelihood.
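For readers unfamiliar with the K2P (Kimura two-parameter) distance referred to above, the following Python sketch computes it for a pair of aligned sequences using the standard formula d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of sites showing transitions and transversions. The function name is an assumption, and sites with gaps or ambiguous bases are simply skipped, which may differ from the treatment in any particular software package.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


# Toy example (made-up sequences): one transition in ten sites.
print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))  # 0.1116
```

A matrix of such pairwise distances is the input to neighbour joining; the tree-building step itself is not shown here.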
Cox1 barcoding for species identification is far more powerful than, for example, protein fingerprinting. Reliable discrimination of Thunnus species using conventional protein electrophoresis is hard if not impossible (e.g. Yearsley et al. 1999), but we found that the same samples were readily identified by cox1 sequencing.
Barcoding discriminated all of the fish species we examined, and would clearly be capable of unambiguously identifying individually isolated fish eggs, larvae, fillets and fins from these species. However, some taxa showed deeper divergence than others. For example, the average within-genus divergence of the flatheads (Platycephalus, Neoplatycephalus, Cymbacephalus) was 15.55%, considerably larger than the 4.17% of the genus Squalus, which is itself considerably larger than the within-Thunnus divergence of 1.11%. These differences among genera probably reflect the average age of species divergence, although within genera some species will be older than others. Nevertheless, it seems likely that, for example, the Platycephalus radiation preceded the Thunnus radiation. The large tunas have long been suspected of having diverged relatively recently (see, e.g. Elliott & Ward 1995 for allozyme evidence of limited nuclear DNA differentiation).
The cox1 phenogram () only clearly differentiates one clade (T. alalunga and T. orientalis, with 99% bootstrap support). The mitochondrial DNA similarity of T. alalunga and T. orientalis had been earlier described by Chow & Kishino (1995), from cytochrome b and ATPase sequencing. The three species suggested by Collette (1978) to comprise a separate subgenus Neothunnus (T. atlanticus, T. tonggol and T. albacares) form a loosely defined clade in the cox1 phenogram (56% bootstrap support). Our cox1 phenogram is almost identical to one based on sequencing 400 bp of the mtDNA control region of the same eight species (Alvarado Bremer et al. 1997). One minor difference is that the control region tree gives a higher level of bootstrap support, 86%, to the proposed subgenus Neothunnus.
The three Neothunnus species differ from the other five species (proposed subgenus Thunnus) in having central heat exchangers rather than lateral heat exchangers, and in being confined to more tropical waters (Collette 1978).
Many of the flathead species that were barcoded () were earlier examined allozymically by Keenan (1991). His proposed cladogram of the Platycephalinae component is very similar to the Platycephalinae component of the cox1 phenogram: the four subclades identified from cox1 were also present in the allozyme tree. The mitochondrial and nuclear (allozyme) trees thus compare well. The genera Neoplatycephalus appear to be monophyletic, Platycephalus
There do not appear to have been any prior surveys of genetic differentiation among species of the genus Squalus, although S. acanthias has been used as a model elasmobranch in some DNA sequencing studies (e.g. Stock & Powers 1995; Hong et al. 1996; Salaneck et al. 2003). Squalus was picked as a genus of particular interest as it included several provisional species (Last & Stevens 1994). The cox1 data clearly supported the biological reality of the provisional species Squalus spp. B, C, D, E and F, as each of these species had bootstrap values close to 100% for their constituent individuals (). Indeed, there was greater genetic divergence among these species than among the well-recognized Thunnus species.
The various unresolved questions about specimen identification briefly presented here (for the genera Squalus, Centrophorus, Hydrolagus, Pristiophorus) indicate the need to retain whole voucher specimens wherever possible, or at least make an e-voucher from a photograph. While we retained a single voucher specimen for the majority of species discussed here, most other samples are only represented as small tissue samples. Retaining all specimens as vouchers will require significant infrastructure facilities as many fish are large: this may not be practical but it might be feasible to retain whole specimens of most species at least until barcoding of those specimens is complete.
In our survey, conspecific samples often (but not always, see the P. longispinis example) came from adjacent areas. Thus we might have somewhat underestimated the extent of within-species diversity. However, any such effect is likely to be minor. Allozyme surveys of marine fish indicate that typically only about 5% of genetic variance comes from inter-population differentiation. This percentage is appreciably higher for freshwater fish, around 20% on average (Ward et al. 1994). For freshwater fish, sampling should include individuals from different watersheds whenever possible.
Our results reveal that cox1 barcoding will permit the unambiguous identification of the vast majority of fish species. We now intend to extend our survey to all Australian and all North American fish species. In the longer term, it is hoped that broader collaborations will enable the assembly of a global database of fish cox1 sequences. This will mean collecting sequences from at least 25 000 species. Note that this will inevitably mean that for many species, multiple specimens from widely divergent locations will be sequenced, minimizing the concern expressed above about underestimating genetic diversity. With increasing application of DNA barcoding, many previously unrecognized fish species will be revealed through the discovery of deep divergence of cox1 sequences within currently recognized species. There might also be instances of supposedly distinct species that have identical cox1 sequences, suggesting the possibility of species fusion. Resolution of cases of this nature will require careful morphological analysis from expert taxonomists before any final recommendations can be made. Barcoding and morphological analysis should go hand-in-hand.
Once a global cox1 barcode database has been established for fishes, anyone with direct or indirect access to a DNA sequencer will be able to identify, to a high degree of certainty, any fish egg, larva or carcass fragment. This will be an invaluable tool for fisheries managers, fisheries ecologists and fish retailers, and for those wishing to develop fish identification microarrays. The scientific and practical benefits of fish barcoding are manifold.
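How a query sequence might be matched against such a database can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of any existing identification system: the function names, the toy reference sequences, the use of an uncorrected p-distance and the 2% cut-off are all hypothetical choices made for the example.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    ignoring gaps and ambiguous bases."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def identify(query, reference, max_distance=0.02):
    """Return (species, distance) for the closest reference barcode.

    reference maps a species name to a list of aligned barcodes.
    max_distance is an assumed cut-off; above it the query is reported
    as unidentified (species is None).
    """
    best_species, best_distance = None, float("inf")
    for species, barcodes in reference.items():
        for barcode in barcodes:
            d = p_distance(query, barcode)
            if d < best_distance:
                best_species, best_distance = species, d
    if best_distance > max_distance:
        return None, best_distance
    return best_species, best_distance


# Toy reference library with made-up sequences and placeholder names.
reference = {
    "species_a": ["ACGTACGTAC"],
    "species_b": ["ACGTTTGTAA"],
}
print(identify("ACGTACGTAC", reference))  # ('species_a', 0.0)
print(identify("ACGTACGTAA", reference))  # (None, 0.1)
```

In practice the cut-off would need to reflect the within-species and between-species divergences observed for the group in question, which, as noted above, vary considerably among genera.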