When ancient DNA is studied by high-throughput sequencing rather than PCR, the laboratory procedures developed over the past 20 years for ancient DNA extraction and contamination prevention remain of utmost importance. These procedures must be followed until libraries have been constructed using adapters carrying unique tags. Only after such tagged adapters have been added is it safe for libraries to leave clean-room facilities for further manipulation and sequencing. Significantly, all potential sources of contamination, from the bone itself through DNA extraction and adapter ligation, can then be considered together in a single ‘snapshot’ of the contamination of a library. Thus, all later assays of contamination in the same library contribute to the same contamination estimate and thereby add to its precision.
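Because every assay of a tagged library samples the same contamination ‘snapshot’, the counts from separate assays can be pooled into one estimate with a narrower confidence interval. The sketch below assumes that each assay reports how many reads at diagnostic positions support the contaminant allele out of all informative reads. The Wilson score interval is our own choice of binomial interval, not one stated in the text, and the assay counts are hypothetical.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z = 1.96 gives ~95%)."""
    if n == 0:
        raise ValueError("no informative reads")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def pooled_contamination(assays):
    """Pool (contaminant_reads, informative_reads) counts from assays of ONE library.

    Pooling is justified only because all assays sample the same library, i.e. the
    same contamination snapshot; counts from different libraries must not be pooled.
    """
    k = sum(c for c, _ in assays)
    n = sum(t for _, t in assays)
    return k / n, wilson_interval(k, n)


# Hypothetical counts from three separate assays of the same tagged library.
assays = [(2, 150), (1, 90), (3, 260)]
for c, t in assays:
    lo, hi = wilson_interval(c, t)
    print(f"single assay {c}/{t}: 95% CI {lo:.4f}-{hi:.4f}")
est, (lo, hi) = pooled_contamination(assays)
print(f"pooled {est:.4f}: 95% CI {lo:.4f}-{hi:.4f}")
```

With these numbers the pooled interval is narrower than that of any single assay, which is the gain in precision described above.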
Methods that allow specific sequences of interest to be retrieved from such tagged libraries (Hodges et al, 2007; Briggs et al, 2009; Gnirke et al, 2009) make it possible to analyse many targeted sequences from them quickly. Criteria of authenticity that are currently applied successfully in PCR-based studies of ancient DNA, such as reproducing results from an independent extraction of the same bone, will then remain as useful as they have been hitherto. In contrast, these criteria are not easily applicable to high-throughput shotgun sequencing of entire ancient genomes. This is a particular problem for the Neandertal genome, but it also applies to other ancient genomes, such as that of the mammoth (Miller et al, 2008), because all mammals, including humans, share conserved DNA sequence elements, so that reads from human contaminants may be mistaken for endogenous sequences.
For sequencing ancient genomes we suggest a two-phase approach, much as was done for the Neandertal mitochondrial genome, where initial work identified differences from current human mtDNAs and these differences were later used to estimate contamination directly. In the first phase of genome sequencing, several direct contamination estimates, none of which is comprehensive on its own, will be applied in concert. For the Neandertal genome, these include determining mtDNA contamination, detecting male contamination in bones from females, and using capture methods to identify positions diagnostic of contamination in one particular individual, which can then be used in other libraries from the same individual. Eventually, once a Neandertal genome sequence has been determined to high coverage, capture approaches can be applied to other Neandertals to identify enough positions that are fixed among Neandertals and differ from current humans. Such positions can then be used to estimate contamination in Neandertal libraries even before those libraries are subjected to other analyses. Even then, however, some technical concerns need to be addressed. For example, because sequences retrieved from ancient bones tend to be rich in the nucleotides G and C (Green et al, 2008), it needs to be determined whether such preservation biases affect endogenous and contaminating DNA equally, and thus whether a ‘correction factor’ might be required when extrapolating contamination estimates derived from high-coverage diagnostic positions to the entire genome.
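Two of the direct estimates discussed above reduce to simple read counting, sketched below under stated assumptions. At a diagnostic position, reads carrying the current-human allele are counted as contaminants; reads with neither allele are set aside as likely errors. For a bone from a female, any read mapping to Y-specific sequence is attributed to male contaminants, and dividing the observed Y fraction by the fraction expected in a purely male library gives the male contaminant share. The multiplicative `correction_factor` is a hypothetical placeholder for the correction the text says may be needed; the expected male Y fraction is an assumed input; all example data are invented.

```python
from collections import Counter


def diagnostic_contamination(observations, correction_factor=1.0):
    """Contamination estimate from reads covering diagnostic positions.

    observations: (read_base, archaic_allele, modern_allele) per read, at positions
    assumed fixed for archaic_allele in Neandertals and carrying modern_allele in
    current humans. correction_factor is HYPOTHETICAL (1.0 = no preservation bias).
    Note: damage such as C->T could turn an archaic allele into the modern one;
    this sketch does not model that.
    """
    counts = Counter()
    for base, archaic, modern in observations:
        if base == modern:
            counts["modern"] += 1
        elif base == archaic:
            counts["archaic"] += 1
        else:
            counts["other"] += 1  # neither allele: likely error, ignored
    informative = counts["modern"] + counts["archaic"]
    if informative == 0:
        raise ValueError("no reads carry either diagnostic allele")
    raw = counts["modern"] / informative
    return min(1.0, raw * correction_factor), counts


def male_contamination_fraction(y_reads, total_reads, expected_y_fraction_male):
    """Share of male contaminant DNA in a library from a female's bone.

    A female contributes no Y reads, so observed Y fraction = share * expected
    male Y fraction. Detects only male contaminants.
    """
    if total_reads == 0 or expected_y_fraction_male <= 0:
        raise ValueError("need reads and a positive expected Y fraction")
    return min(1.0, (y_reads / total_reads) / expected_y_fraction_male)


# Hypothetical reads at diagnostic positions (archaic allele A, modern allele C).
obs = [("C", "A", "C")] * 3 + [("A", "A", "C")] * 97 + [("G", "A", "C")] * 2
est, counts = diagnostic_contamination(obs)
print(f"diagnostic-position estimate: {est:.3f} {dict(counts)}")
print(f"male contamination: {male_contamination_fraction(12, 1_000_000, 0.0012):.3f}")
```

Each function alone is incomplete (the first needs diagnostic positions, the second sees only male contaminants), which is why the text proposes applying several such estimates in concert.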
In contrast to the direct estimates that we describe and advocate above, indirect measures based on the extent of fragmentation or modification of the DNA are at best supportive. In particular, comparisons of features between longer and shorter DNA fragments suffer from the fact that shorter fragments are more difficult to identify and to align correctly to the genome sequences of extant species.
One interesting question is whether it will be possible to estimate contamination in analyses of early hominins other than Neandertals, such as other archaic human forms or early modern humans. Conceivably, this may be possible by ‘bootstrapping’ from the mtDNA to the nuclear DNA, much as is done for the Neandertal genome. If extracts from a specimen can be identified for which deep high-throughput sequencing of the mtDNA shows that a single mtDNA genome is present, with little or no indication of any additional mtDNA, this shows that the DNA preparation derives from a single individual. This individual is either the ancient individual from which the sample stems or a single recent human who contaminated the specimen or extract. In this situation, fragmentation and nucleotide misincorporations may play a helpful role. Although individual ancient DNA fragments cannot be reliably distinguished from modern contaminants on the basis of these features, the knowledge that all sequences in a dataset derive from a single individual allows the overall fragmentation and misincorporation patterns to be analysed. If it can be shown that these patterns fall within the range typical of ancient, minimally contaminated specimens, and outside the range seen in contaminating DNA from specimens found and curated under conditions similar to those of the specimen being studied, then the DNA sequences are likely to be ancient. The mitochondrial and nuclear DNA sequences thus determined can then serve as an inroad to targeted studies of other, less well-preserved specimens of the same hominin group. We are therefore hopeful that it may become possible not only to sequence the Neandertal genome to high coverage, but also to study the genomes of other ancient human forms, provided that uncontaminated specimens that allow very deep sequencing can be found.
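The two steps of this bootstrapping argument can be sketched as follows, under simplifying assumptions: first check that no well-covered mtDNA position shows a substantial minor allele (consistent with a single mtDNA genome), then summarize fragment lengths and 5′ C→T misincorporation over the whole dataset rather than per read. The thresholds `max_minor` and `min_depth` are hypothetical, the alignments are assumed to be gap-free (read, reference) string pairs oriented 5′ to 3′, and the comparison against reference ranges for ancient and contaminating DNA is left out because those ranges would have to come from real datasets.

```python
import math
from statistics import median


def ct_by_position(alignments, max_pos=5):
    """C->T mismatch frequency at each of the first max_pos positions from the 5' end.

    alignments: (read, ref) pairs of equal length, gap-free, oriented 5'->3'.
    Positions with no reference C are reported as NaN.
    """
    ct = [0] * max_pos
    c_total = [0] * max_pos
    for read, ref in alignments:
        for i in range(min(max_pos, len(ref))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    ct[i] += 1
    return [k / n if n else math.nan for k, n in zip(ct, c_total)]


def single_source_check(allele_counts, max_minor=0.01, min_depth=20):
    """True if no well-covered mtDNA position has a minor allele above max_minor.

    allele_counts: one {base: count} dict per mtDNA position. Sequencing errors
    and damage also create minor alleles, so max_minor is a HYPOTHETICAL cutoff.
    """
    for counts in allele_counts:
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to judge this position
        minor = depth - max(counts.values())
        if minor / depth > max_minor:
            return False
    return True


def damage_summary(alignments):
    """Dataset-wide fragment length and 5' C->T profile (not a per-read test)."""
    lengths = [len(ref) for _, ref in alignments]
    return {"median_length": median(lengths), "ct_5prime": ct_by_position(alignments)}


# Hypothetical inputs.
mt_counts = [{"A": 100}, {"C": 95, "T": 5}, {"G": 3}]
print(single_source_check(mt_counts))  # False: 5% minor allele at the second position
alignments = [("TAGGA", "CAGGA"), ("CATTG", "CATTG"), ("TTCAG", "CTCAG")]
print(damage_summary(alignments))
```

The summary is computed only after the single-source check, reflecting the argument that aggregate patterns become informative once the dataset is known to derive from one individual.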