Traces of genetic material preserved within ancient specimens can provide a unique and important real-time record of the past (e.g. 1–4). However, this record is compromised because ancient DNA (aDNA) is invariably damaged and degraded to some extent, initially by endogenous nucleases and microorganisms after death, and subsequently by hydrolysis and oxidation reactions that can fragment the DNA backbone and chemically modify bases (5). Spontaneous base-loss events, creating non-coding abasic sites (6), and certain base modifications (8) can block the amplification of aDNA templates by halting DNA polymerase-mediated primer extension. In contrast, other base modifications can create damage-derived miscoding lesions (DDMLs) which do permit polymerase extension but which have altered base-pairing properties, leading to altered sequences in newly amplified DNA (7).
Almost all aDNA studies to date have been PCR-based, as this method can generate sequence data from the trace amounts of DNA preserved in ancient specimens. However, PCR can generate incorrect sequence data from aDNA for a number of reasons. In addition to an intrinsic background rate of polymerase misincorporation errors, the altered base-pairing properties of endogenous DDMLs can cause considerable amounts of sequence variation in PCR-amplified aDNA (10). ‘Jumping-PCR’, where partially extended primers switch between different damaged and degraded aDNA template strands during the early cycles of PCR amplification, has been shown to create non-authentic, recombinant, sequences (12–14). ‘Jumping-PCR’ artefacts may also be compounded by the tendency of many DNA polymerases to add a single nucleotide to the 3′-end of primer extensions in a non-template directed fashion (10). PCR can also generate additional, non-endogenous, sequence artefacts such as so-called ‘Type 1 damage’ (20–22). PCR amplification of low copy number templates is known to create products with highly skewed representations (13). This means that sequence artefacts can easily come to dominate the products of PCR-amplified aDNA (10). Sequence accuracy is therefore a major issue in aDNA research.
These factors have been recognized to varying degrees. The overlapping ‘best-of-three’ PCR amplification and cloning strategy currently used when key ancient samples are amplified by standard or multiplex PCR (10) explicitly accepts the inherent shortcomings of PCR-generated aDNA sequences and significantly increases the chances of correctly inferring the original endogenous DNA sequence. However, there are two essential prerequisites for a quantitative investigation into the molecular nature of aDNA damage and its effects on sequence accuracy. First, authentic endogenous DDMLs must be disentangled from other non-endogenous, PCR-generated, forms of sequence variation. Second, due to the complementary double-stranded nature of DNA, the template strand-of-origin of particular DDML base modification events must be unambiguously specified. Exponential amplification from both strands of a DNA template is intrinsic to PCR and this prevents the strand-of-origin of particular base changes from being determined (20). Together with the demonstrated potential for the generation of additional non-authentic sequence variation, these limitations of PCR-based methods have prevented full resolution of the molecular nature of DDMLs in aDNA.
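The strand-of-origin problem can be illustrated with a minimal sketch. It is not part of the methods described here; the six-base sequence, the symbol `X` for a hypothetical modified G that is read as A, and the function names are all assumptions made for illustration. Two different lesions on opposite strands give the same double-stranded PCR product.

```python
# Illustrative only: sequences, symbols and function names are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# How a polymerase is assumed to read each lesion:
# U (deaminated C) is read as T; X (a hypothetical modified G) is read as A.
READ_AS = {"U": "T", "X": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an undamaged DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def as_read(template):
    """Replace each lesion with the base a polymerase would read it as."""
    return "".join(READ_AS.get(base, base) for base in template)


def pcr_product(template, strand):
    """Sequence of the PCR product, reported in top-strand orientation.

    Because PCR copies both strands exponentially, the product records only
    the final base pair, not which template strand carried the lesion.
    """
    read = as_read(template)
    return read if strand == "top" else reverse_complement(read)


top = "ATCGGA"
bottom = reverse_complement(top)  # "TCCGAT"

# Scenario 1: C > U deamination on the bottom strand (the C paired with top G).
damaged_bottom = "TCUGAT"
# Scenario 2: hypothetical G modification on the top strand at the same base pair.
damaged_top = "ATCXGA"

product_1 = pcr_product(damaged_bottom, "bottom")
product_2 = pcr_product(damaged_top, "top")
print(product_1, product_2, product_1 == product_2)
```

In both scenarios the product shows an apparent G > A change relative to the top strand, so the product sequence alone cannot say which strand was damaged.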
Although there has always been strong theoretical and biochemical evidence that C > U-type DDMLs are a major cause of Type 2 ‘damage’ (C > T/G > A) transitions in PCR-amplified aDNA sequences (e.g. 5), there has also been considerable debate about the existence, or otherwise, of a genuine biochemical basis for Type 1 ‘damage’ (T > C/A > G) transitions. However, it has recently become clear that so-called Type 1 ‘damage’, observed at significant levels by some traditional PCR-based studies (e.g. 20), disappears once alternative techniques are employed (21; this study), and this is now recognized as a non-endogenous, PCR-generated, phenomenon (22).
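For readers unfamiliar with the Type 1/Type 2 terminology, the following sketch shows how observed base differences map onto the two transition classes defined above. The function names and the example sequences are hypothetical and serve only to illustrate the classification.

```python
from collections import Counter

# Transition classes as defined in the text (reference base, observed base).
TYPE_1 = {("T", "C"), ("A", "G")}  # T > C / A > G
TYPE_2 = {("C", "T"), ("G", "A")}  # C > T / G > A


def classify_substitution(ref, obs):
    """Classify one aligned base pair as match, Type 1, Type 2 or transversion."""
    if ref == obs:
        return "match"
    if (ref, obs) in TYPE_1:
        return "Type 1"
    if (ref, obs) in TYPE_2:
        return "Type 2"
    # All four transitions are covered above, so anything else is a transversion.
    return "transversion"


def tally_substitutions(reference, read):
    """Count substitution classes between two equal-length, ungapped sequences."""
    if len(reference) != len(read):
        raise ValueError("sequences must be aligned to the same length")
    return Counter(classify_substitution(r, o) for r, o in zip(reference, read))


# Hypothetical example: two Type 2 changes (C > T and G > A), no Type 1.
counts = tally_substitutions("ACGTACGT", "ATGTACAT")
print(counts)
```

Note that a tally of this kind describes only what is seen in the sequenced product; it does not indicate whether a change reflects an endogenous lesion or a PCR-generated artefact.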
The potential role(s) of aDNA templates that are shorter than the target region in PCR amplifications is an issue that requires closer examination. Following the first cycle, only those initial primer extensions long enough to cover the entire target region could be utilized directly by both PCR primers. As we demonstrate, however, as the PCR target length increases so does the proportion of shorter, abortive, primer extensions. These have the potential to contribute to the creation of recombinant and other non-authentic sequence artefacts in subsequent cycles. These findings raise questions about the widespread use of quantitative PCR (qPCR) methods to estimate the numbers of aDNA templates contributing to the products of PCR amplification reactions. qPCR results give no information about the number of templates below the target size that end up contributing to amplification products via ‘jumping-PCR’ and other PCR-generated mechanisms. PCR amplifications from ancient extracts with template copy numbers estimated by qPCR to be in the tens-of-thousands have been shown to produce significant levels of non-endogenous ‘Type 1 damage’ artefacts (e.g. 27). Therefore, the widespread assumption that, given a sufficiently high estimated starting number, endogenous DDML sequence diversity in aDNA templates will necessarily be reflected by the sequence variation within PCR-generated products simply cannot be sustained.
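The relationship between target length and abortive extensions can be sketched with a toy calculation. This is not the analysis reported here: the list of extension lengths is invented, and the model simply assumes that an extension is abortive whenever the template ends before the full target has been copied.

```python
# Hypothetical maximum extension-product lengths (in bases) obtainable from
# eight fragmented templates; these values are invented for illustration.
extension_lengths = [40, 55, 70, 90, 120, 150, 200, 260]


def abortive_fraction(lengths, target_length):
    """Fraction of primer extensions too short to span the whole target.

    Assumes an extension is abortive if the product it can reach is shorter
    than the target, so it cannot be used directly by the second primer.
    """
    if not lengths:
        raise ValueError("at least one extension length is required")
    abortive = sum(1 for length in lengths if length < target_length)
    return abortive / len(lengths)


# The abortive fraction rises as the target is lengthened.
fractions = {target: abortive_fraction(extension_lengths, target)
             for target in (60, 100, 180)}
print(fractions)
```

Under these assumptions a qPCR assay with a 180-base target would register only the two longest templates, while the remaining six could still produce short extensions available for ‘jumping-PCR’.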
As traditional PCR-based approaches have proven incapable of fully resolving the molecular nature of DDMLs, we have developed a novel single primer extension (SPEX)-based approach (Figure 1) to generate detailed information about post mortem DDML base modifications in aDNA. In direct contrast to PCR, SPEX is an amplification methodology that specifically targets only one of the aDNA template strands at a locus-of-interest and imposes no predefined target length. This allows the production of first-generation copies of aDNA template molecules, with quantifiable (up to 40-fold or more) coverage from a single reaction. SPEX is shown to be capable of producing sequence data of unprecedented accuracy from aDNA, without the generation of additional, non-endogenous, sequence artefacts over and above a background rate of misincorporation errors common to polymerase-based methodologies.
Figure 1. Single primer extension (SPEX) amplification. (A) Denaturation and hybridization of a single biotinylated primer to one target strand at the locus-of-interest. (B) Primer extension by a thermostable DNA polymerase until halted at the physical end of an ...
Recently, massively parallel metagenomic sequencing approaches have also been used to investigate aDNA damage. By inferring the sequences of individual single-stranded DNA (ssDNA) templates generated from aDNA via the 454-methodology (29), independent studies concluded that in addition to C > U-type DDMLs, a substantial proportion of Type 2 transitions were due to modification(s) of G residues (by an unknown biochemical process) that caused them to be read as A by polymerases (21). Here, we use SPEX to overcome limitations inherent to both traditional PCR- and current 454-based approaches. The ability of SPEX to rigorously distinguish between authentic aDNA and first-generation copied sequences, whilst simultaneously quantitatively generating highly accurate sequence data from designated loci, has enabled the molecular nature of DDMLs to be fully revealed for the first time.