For many analytical methods the efficiency of DNA amplification varies across the genome and between samples. The most affected genome regions tend to correlate with high C+G content; however, this relationship is complex and does not explain why the direction and magnitude of effects vary considerably between samples.
Here, we provide evidence that sequence elements that are particularly high in C+G content can remain annealed even when aggressive melting conditions are applied. In turn, this behavior creates broader ‘Thermodynamically Ultra-Fastened’ (TUF) regions characterized by incomplete denaturation of the two DNA strands, so reducing amplification efficiency throughout these domains.
This model provides a mechanistic explanation for why some genome regions are particularly difficult to amplify and assay in many procedures, and importantly it also explains inter-sample variability of this behavior. That is, DNA samples of varying quality will carry more or fewer nicks and breaks, and hence their intact TUF regions will have different lengths and so be differentially affected by this amplification suppression mechanism – with ‘higher’ quality DNAs being the most vulnerable. A major practical consequence of this is that inter-region and inter-sample variability can be largely overcome by employing routine fragmentation methods (e.g. sonication or restriction enzyme digestion) prior to sample amplification.
The fact that amplification methods vary in efficiency across the genome has often been noted, for example in whole genome amplification (WGA), next generation sequencing, genome wide SNP genotyping, and PCR [1-5]. Difficult to assay regions are somewhat correlated with high C+G content [1,6-10], but this relationship is complex, DNA sample dependent, and incompletely understood. Regions of high C+G content tend to resist the essential DNA denaturation step at the initiation of nearly all DNA amplification protocols, though it is assumed that this effect will not be so extreme as to completely prevent DNA strand separation. However, this assumption may be incorrect. In DNA melting studies in the early 1970s, select human genome DNA fragments were seen to remain double stranded under extreme denaturing conditions [11,12]. The nature of these challenging sequences has not yet been determined, and today most investigators are probably unaware of the early reports.
Here, we investigate a number of genomic regions that across several samples produce low intensity hybridization in Illumina Infinium genotyping. We find that a major factor influencing such regions is the presence of intervals of high C+G content that do not denature efficiently under routinely used conditions. These intervals cause connected DNA sequences to rapidly re-anneal and prevent access to primers or probes. The effects of this in PCR could be completely ameliorated by enzymatic separation of the high C+G interval and the assay target. We postulate that inter-sample variability is due to the amount and random distribution of nicking within a DNA sample, which acts to separate these difficult to denature sequences from other DNA, and that highly intact DNAs will suffer the most. We provide optimized PCR protocols and suggest that DNA be pretreated by either sonication or restriction enzyme digestion prior to amplification steps in these methods.
To explore the interplay between DNA melting and difficult to assay genome regions, we examined large scale Illumina Infinium SNP array datasets (from genome wide association analyses) and identified genomic regions within which SNPs consistently gave weak intensity signals in the poorest performing samples (example given in Figure 1). We herein refer to these as ‘weak Illumina signal’ regions. Single copy DNA probes were constructed for ‘weak Illumina signal’ and control ‘normal Illumina signal’ regions on the long arm of Chr 2 (Table 1), to be hybridized on to Southern blots. These blots employed freshly prepared high quality genomic DNAs, and each sample was divided into four aliquots so that we could differentially process them by temperature or alkaline denaturation before or after restriction enzyme digestion. One would expect the denatured DNA to migrate differently to dsDNA and not give bands of expected restriction fragment sizes upon hybridization with the single copy probes, but any sequences that had fully resisted the denaturation treatments would give such bands. Example results are shown in Figure 2. Three of three ‘normal Illumina signal’ region probes produced the expected ‘no band’ outcome, whereas two of the three ‘weak Illumina signal’ region probes generated bands from the denatured samples, indicating that these latter regions are generally difficult to denature.
We examined normal and ‘weak Illumina signal’ regions using the Paralogue Ratio Test (PRT) [13,14]. Standard PRT, which is a powerful technique to genotype copy number variation, employs a single pair of PCR primers to co-amplify a ‘test’ locus (whose copy number is being assessed) and a ‘reference’ locus (a stable single copy sequence) in a single PCR reaction. The two amplicons are distinguished by size, and their relative product amounts used to determine the test locus copy number. We adapted this concept to co-amplify single copy sequences from normal and ‘weak Illumina signal’ regions. This allowed the comparison of their relative amplification efficiencies in the same PCR reaction with identical conditions and DNA template concentration. Importantly, the ‘test’ and ‘reference’ amplicons employed for six assay designs created for these experiments had similar and not unusually high C+G content (average values of 56.8 and 51.0% C+G respectively). In all six assays, the ‘reference’ amplicon (i.e., the product amplified from the assay’s normal Illumina signal region) produced a strong band, whereas its partnered ‘test’ amplicon produced a weaker band (typically 10-50% of the strength of the reference), indicating a reduced PCR efficiency for ‘weak Illumina signal’ regions.
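The comparison of band strengths described above reduces to a simple ratio. The following is a minimal sketch, assuming that quantified peak heights are already available for each lane; the function name, assay labels and peak values are hypothetical and are not taken from the experiments reported here.

```python
def relative_efficiency(test_peak: float, reference_peak: float) -> float:
    """Return the test/reference peak-height ratio for one PRT lane.

    A value near 1.0 suggests both amplicons amplified equally well;
    a value well below 1.0 suggests the 'test' locus amplified poorly.
    """
    if reference_peak <= 0:
        raise ValueError("reference peak height must be positive")
    return test_peak / reference_peak


# Hypothetical peak heights (arbitrary units): (test, reference) per assay.
lanes = {
    "assay_A": (120.0, 800.0),
    "assay_B": (400.0, 800.0),
    "assay_C": (800.0, 800.0),
}

ratios = {name: relative_efficiency(t, r) for name, (t, r) in lanes.items()}

for name, ratio in ratios.items():
    print(f"{name}: test is {ratio:.0%} of reference")
```

Because both amplicons come from the same reaction, the ratio is insensitive to lane-to-lane differences in template amount, which is the property the adapted PRT design relies on.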
The above data are consistent with the hypothesis that ‘weak Illumina signal’ regions are refractory to amplification and analysis because they are difficult to denature. To promote DNA denaturation in the PRT assays, and thereby increase the amplification efficiency of the ‘weak Illumina signal’ loci, we tried the following standard denaturing enhancers: adding Dimethyl sulphoxide (DMSO) at up to 50% [15,16]; adding Single Stranded Binding Protein; and increasing the PCR denaturing temperature to 98°C. These strategies all helped to improve amplification efficiency, but none of them enabled a full strength intensity gel band to be produced for any of the ‘weak Illumina signal’ loci. Adding Betaine [16,18] was more effective, but only at very high concentrations (i.e., at 1.5-2.0 M), with the downside of causing overall amplification efficiencies to drop considerably. Most effective was denaturing the input DNAs, and snap cooling on ice, prior to inclusion in the PCRs. However, to significantly improve the amplification efficiencies (Figure 3), it was necessary to heat the samples to 130°C in water for 1 minute (longer or hotter treatment reduced PCR efficiency, presumably due to excessive DNA hydrolysis).
Cumulatively, these findings show that ‘weak Illumina signal’ regions are particularly resistant to DNA denaturation under standard conditions. This is true even when tested PCR amplicons themselves are not particularly C+G rich or unusual in any apparent way (in fact, for two PRTs the test and reference were almost identical). The implication of this is that locally something other than C+G content of the target sequence is hindering DNA strand separation. Direct visualization of genome features represented as tracks on the UCSC genome browser suggests this may have something to do with the very highest peaks of C+G rich sequence coincident with particularly dense clustering of CpG islands (Figure 4). A possible mechanism could then entail localized regions of extreme C+G content remaining duplexed during standard DNA denaturation procedures, and in so doing they would prevent their flanking sequences - that are melted - from diffusing away from each other. As such, these neighboring strands will be able to quickly re-anneal, following zero-order kinetics, as soon as non-denaturing conditions are re-established. We refer to domains affected by this proposed phenomenon as ‘Thermodynamically Ultra-Fastened’ (TUF) regions.
To test the TUF hypothesis, we started by looking for localized, highly C+G rich DNA elements in the immediate vicinity of the ‘weak Illumina signal’ region amplicons for the six PRT assays. Such elements were clearly present in five cases. We then targeted one particular assay (‘2n13’: for which the ‘test’ and ‘reference’ efficiencies were most different) and digested the template DNA with various restriction enzymes before running the PRT. DNA amplification was seen to be problematic only when the ‘test’ amplicon was located in the same DNA fragment as the high C+G element (Figure 5). In fact, the amplification efficiency was fully restored when the ‘test’ amplicon was separated from the high C+G element, a finding consistent with the TUF hypothesis.
To explore the TUF phenomenon genome wide, we utilized data from 1252 Illumina genotyping runs and, on a sample by sample basis, regressed the log probe intensity ratio (LRR) on eight C+G and eight CpG terms for genomic window sizes of 50 bp to 1 Mbp. The residual variance prior to and after adjustment for C+G and CpG is shown in Figure 6. The samples that showed the largest correlations with the C+G and CpG terms, measured by the proportion of LRR variance explained, involved C+G content window sizes of 0.1 - 10 kb (Z scores greater than 30 or less than −30); correlations were also observed, with lower significance, for CpG content and for other window sizes. We then experimentally tested the amplification behavior of DNAs for samples for which the correlation was extreme (24 positive and 19 negative), plus 11 other DNAs where no significant correlation was apparent, using two PRT assays (2n13 and 8n6). A strong statistical association was seen between PRT performance and the per sample extreme behavior on the Illumina platform when considering the smaller size windows (0.1 kb for C+G, p=0.0001; 0.5 to 5 kb for CpG, p between 0.01 and 0.00085), as shown in Table 2. This fits perfectly with the notion that many particularly C+G rich elements (including CpG islands) across the genome influence the efficiency of analysis of surrounding contiguous sequences by severely hindering DNA denaturation.
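The idea of scoring each sample by the proportion of LRR variance explained can be illustrated with a deliberately simplified sketch. It assumes a single C+G term and ordinary least squares, whereas the analysis reported here used multiple C+G and CpG terms; the function name and the toy values are hypothetical.

```python
def variance_explained(gc, lrr):
    """Fit lrr = intercept + slope * gc by least squares.

    Returns (slope, r_squared). The sign of the slope gives the direction
    of the correlation; r_squared is the proportion of LRR variance
    explained by this single C+G term.
    """
    n = len(gc)
    if n != len(lrr) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(gc) / n
    mean_y = sum(lrr) / n
    sxx = sum((x - mean_x) ** 2 for x in gc)
    syy = sum((y - mean_y) ** 2 for y in lrr)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc, lrr))
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(gc, lrr))
    return slope, 1.0 - ss_res / syy


# Toy data for one hypothetical sample: C+G proportion around each probe
# and the corresponding LRR value.
gc_content = [0.30, 0.40, 0.50, 0.60, 0.70]
lrr_values = [0.10, 0.05, 0.00, -0.05, -0.10]

slope, r2 = variance_explained(gc_content, lrr_values)
print(f"slope={slope:.2f}, variance explained={r2:.2f}")
```

In this toy sample the slope is negative, corresponding to the 'negative' class of extreme samples; a sample with no C+G dependence would give a proportion of variance explained near zero.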
These observations imply that it should be possible to bioinformatically predict and partially correct for the effects of TUF areas of the genome and for other phenomena that have been observed to induce similar C+G correlated effects. Diskin et al. demonstrate that C+G-correlated intensity fluctuations (waves) are present in both Illumina and Affymetrix whole-genome SNP microarrays and that C+G content in 1 Mb windows is highly correlated with intensity (both positively and negatively), with the amplitude determined by the degree to which DNA quantity/concentration deviated from the vendor’s recommended level. Efficiency of PCR amplification of short DNA fragments (<200 bp) has also been shown to be affected by local C+G-content, and some suggestions have been made on how to predict and compensate for such effects.
The discovery and descriptive elucidation of TUF allows us to draw several important practical conclusions. Critically, the experimental impact of the phenomenon on any particular DNA sample will depend upon how nicked or fragmented that sample is, because the density of strand discontinuities will affect the probability of any particular DNA sequence being separated from C+G rich elements. Counter-intuitively, this implies that newly prepared, highly intact DNAs will be most vulnerable to TUF induced problems, whereas older and/or more degraded samples will be less affected. In support of this, we artificially ‘rejuvenated’ nicked, old DNAs by ligase treatment (PreCR by NEB), and found that this made them far more susceptible to TUF as measured by our PRT assays (Figure 7). Conversely, by artificially introducing nicks and breaks into DNA one can overcome the effect of TUF (as seen above for restriction enzyme digestion, Figure 5), ensuring highly uniform assay behaviour across genome regions and samples. This benefit of DNA fragmentation was also demonstrated for WGA (Multiple displacement amplification [23,24] - which is often applied before genotyping or sequencing), and for the overall process of Illumina Infinium genotyping (Figure 8). In both cases, sonication of the sample prior to each protocol greatly improved the quality and uniformity of the results.
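The dependence on fragmentation argued above can be made concrete with a rough sketch. It assumes, purely for illustration, that effective strand discontinuities fall at random along the DNA (a Poisson model that was not fitted or tested in this work), so that the chance of a target being separated from a C+G rich element depends only on the break density and the distance between them. The function name, densities and distance are hypothetical.

```python
import math


def separation_probability(breaks_per_kb: float, distance_kb: float) -> float:
    """Probability of at least one break between a target and a C+G rich element.

    Assumes breaks are Poisson-distributed along the molecule (an
    illustrative assumption, not a measured property of any sample).
    """
    if breaks_per_kb < 0 or distance_kb < 0:
        raise ValueError("density and distance must be non-negative")
    return 1.0 - math.exp(-breaks_per_kb * distance_kb)


distance_kb = 2.0  # hypothetical spacing between amplicon and C+G element

# Hypothetical break densities for an intact and a fragmented sample.
p_intact = separation_probability(0.01, distance_kb)
p_fragmented = separation_probability(1.0, distance_kb)

print(f"intact DNA:     {p_intact:.3f}")
print(f"fragmented DNA: {p_fragmented:.3f}")
```

Under this simplified model, a highly intact sample leaves most targets still linked to a nearby C+G rich element, whereas a sample fragmented to roughly one break per kb separates most of them, in line with the qualitative behaviour described for fresh versus degraded or sonicated DNA.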
In summary, our description of TUF represents the important recognition of a phenomenon relevant to many regions of the genome, thus impacting in a sample dependent manner the conduct of genome-wide studies of distinct types of genetic variation in relation to human diseases/traits. For example, it may well be practically relevant in Copy Number Variation (CNV) research and the use of next generation sequencing, where assay behavior can be unpredictable [25-28]. Further work will be required to fully understand the biochemical basis of the TUF regions in order to optimally develop protocols and approaches for large scale genomic analyses. Knowledge of the TUF phenomenon and ways to overcome its deleterious consequences should provide investigators with a more nuanced approach towards handling issues related to C+G content and its effect upon assay robustness and efficiency.
DNA donors for Southern Blotting and PRT analysis of TUF regions were of north European origin, and had given informed consent with ethical approval from the Leicestershire, Northamptonshire and Rutland Research Ethics Committee (LNRREC Ref. No. 6659 UHL). DNA was prepared from fresh blood as follows. 20 ml whole blood was centrifuged at 1300 g at 4°C for 15 minutes. The buffy coat was extracted and incubated at 37°C in 15 ml lysis buffer (10 mM Tris-Cl (pH 8.0), 0.1 M EDTA (pH 8.0), 0.5% w/v SDS) for 1 hour. Proteinase K (final concentration 100 μg/ml) was added and mixed gently, followed by incubation at 50°C overnight. After allowing the sample to cool to room temperature, an equal volume of phenol equilibrated with 0.1 M Tris HCl was added and mixed slowly on a Stuart Rotator SB3 for 10 min. The phases were separated by centrifugation at 5600 g for 15 min. The aqueous phase was transferred to a fresh tube and the phenol extraction repeated twice. To the final aqueous phase 1/10th volume 5 M Ammonium Acetate and 2 volumes of 100% Ethanol were added. Samples were mixed very slowly and carefully by inversion. The precipitated DNA was spooled using a glass hook, dried briefly and dissolved in water to a final concentration of 200 ng/μl. DNA quality and quantity were assessed by gel electrophoresis and on the NanoDrop ND-8000 spectrophotometer.
PRTs were designed according to information from Armour et al. All PRT oligonucleotide primers are described in Table 1. 10 μl PRT PCRs contained 1 x PCR buffer (75 mM Tris HCl (pH8.8), 20 mM (NH4)2SO4, 0.01% v/v Tween) (Abgene, Epsom, Surrey, UK), 1.5 mM MgCl2 (Abgene), 0.15 μM of each primer (Biomers), 0.2 mM dNTPs (Promega), 0.3 U Taq polymerase (Kapa Biosystems, Boston, MA, USA) and 10 to 25 ng DNA. PCRs were initially heated to 94°C for 30 seconds, and then cycled 25 to 35 times as follows: 94°C for 30 seconds; annealing temperature for 30 seconds; 72°C for 1 minute. A final extension was carried out at 72°C for 5 minutes. Where required, restriction enzyme digests were performed to allow visualisation of similar sized PRT products. When using additives (DMSO up to 50%, betaine up to 2 M), the annealing temperature was re-optimised for each assay. Recommended PCR conditions for TUF regions are 1.5 M betaine, 5U/μl Taq polymerase, 0.01U/μl pfu enzyme and use of 98°C denaturing temperature in all cycles. Higher concentrations of betaine may be appropriate for individual PCRs.
Gels were documented using a GBOX HR, Gel documentation system (Syngene, Cambridge, Cambridgeshire, UK) using the EDR function and the maximum resolution settings (5.52 M pixels). Peaks were identified and peak heights quantified using the Gene Tools programme version 4.00 (A) (Syngene). For peak height analysis, the rolling disc method (diameter=30 pixels) was used to determine peak base line.
High temperature denaturing was performed in a 96 well format heat block set to the desired temperature. Sierra Antifreeze/coolant (Peak performance products, Northbrook, IL, USA) was used to maintain a liquid contact between the tubes, thermometer and heat block. The DNA was denatured in either water or in buffered conditions (1 x PCR buffer, as above) in tubes with the lids sealed tightly with Nescofilm to prevent evaporation at temperatures greater than 100°C. Samples were heated for 1 minute and snap cooled on ice for 5 minutes. Samples were stored at −20°C and thawed on ice prior to use.
Aliquots of genomic DNA (200 ng/μl) were sonicated for 30 second intervals (with a 30 second gap), using a Bioruptor (Diagenode, Liège, Belgium) until the desired size range (0.3 to 3.0 kbp) was reached (visualised by agarose gel electrophoresis).
Using conditions recommended by Illumina, 200 ng samples of genomic DNA (with or without pre-processing as necessary for each experiment) were hybridised to human370CNV Infinium HD BeadChips (Illumina INC, San Diego, CA, USA).
Whole genome amplification was performed using the REPLI-g Mini Kit (Qiagen) to amplify a range of masses of human genomic DNA to generate >8 μg of DNA. Samples were prepared using the isothermal amplification reaction in PCR tubes incubated at 30°C for 16 hours and 65°C for 3 minutes in a thermal cycler. Amplified products were quantified using a NanoDrop spectrophotometer and visualised on a 0.8% LE agarose gel with Ethidium Bromide.
Six μg of genomic DNA was digested using selected enzymes supplied by New England Biolabs (NEB) (Hitchin, Hertfordshire, UK) under the conditions recommended by the supplier with the addition of 4 mM Spermidine pH 7.4. Double digests were performed in the most suitable buffer, and the quantity of the least active enzyme per reaction was doubled if required.
Heat denaturation was performed in a water-bath at 100°C for 40 seconds to 4 minutes, as stated. Samples were snap cooled on ice for 5 minutes prior to gel electrophoresis.
Alkaline denaturation was performed by addition of 0.4 M NaOH to 0.32 M (~ 240 μl added to 54 μl of sample), and incubation at room temperature for 10 minutes. 1 M Tris HCl (pH 8) was added to 0.02 M prior to neutralisation (pH 8 to 8.5) with 0.4 M HCl. Samples were ethanol precipitated and dissolved in distilled water.
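The NaOH step above is a simple dilution, which can be checked with a short sketch. It assumes that volumes are additive and that the sample itself contributes no NaOH; the function name is hypothetical.

```python
def final_concentration(stock_molar: float, stock_ul: float, sample_ul: float) -> float:
    """Concentration of a reagent after adding stock_ul of stock to sample_ul of sample.

    Assumes additive volumes and no reagent present in the sample beforehand.
    """
    total_ul = stock_ul + sample_ul
    if total_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock_molar * stock_ul / total_ul


# Values from the alkaline denaturation step: ~240 ul of 0.4 M NaOH
# added to 54 ul of sample.
naoh_final = final_concentration(0.4, 240.0, 54.0)
print(f"final NaOH concentration: {naoh_final:.3f} M")
```

With the approximate volumes quoted, the result is close to the stated target of 0.32 M.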
Digested DNA was run at 3 V/cm in 0.7% agarose gels (LE agarose, Seakem) in 1 X TAE (4.84 g Tris base, 11.4 ml glacial acetic acid, 3.7 g EDTA pH 8.0 per litre). The resulting gels were soaked twice in denaturing solution (1.5 M NaCl, 0.5 M NaOH) for 30 minutes, and twice in neutralising solution (0.5 M Tris pH 7.2, 1 M NaCl) for 30 min. The denatured DNA was transferred onto uncharged nylon membranes (MAGNA, Nylon, Transfer Membrane, 0.45 Micron; GE Water & Process Technologies, Trevose, PA, USA) using 10X SSC as the transfer buffer and fixed to the membranes by baking at 80°C in a Sanyo MOV drying oven (Sanyo E&E Europe BV, Biomedical Division, Loughborough, Leicestershire, UK), for 1 hour.
PCR amplified probes (Table 1) were purified using a Qiagen MinElute PCR purification kit (Qiagen). 75 ng of probe was labelled for 15 minutes with α-32P-dCTP (Perkin Elmer, Waltham, MA, USA) using the Rediprime II random prime labelling system (Amersham Biosciences, Little Chalfont, Buckinghamshire, UK), purified using ILLUSTA NICK Columns Sephadex DNA grade (GE Healthcare, Little Chalfont, Buckinghamshire, UK), and eluted in 400 μl column wash (1 x TE, 0.1% w/v SDS). 75 μg of human Cot I DNA (Invitrogen, Paisley, Renfrewshire, UK) was added prior to denaturation at 100°C for 6 minutes and snap cooling on ice for 5 minutes.
Hybridisation was performed in 20 ml Church buffer (0.5 M sodium phosphate, pH 7.2, 7% SDS, 1 mM EDTA, 1% BSA) with 2 mg heat denatured (100°C for 5 min, ice for 5 min) salmon sperm DNA. Pre-hybridisation was performed at 65°C in a rolling bottle for 2 hours prior to hybridisation for 10 hours. Hybridised blots were washed for 10 min at 65°C in 0.1 x SSC, 0.1% SDS. Counts were recorded using a phosphoimager screen (Amersham Biosciences) for between 12 and 60 hours. Further washing was performed at 68°C or 72°C, depending on the number of background counts.
The log probe intensity ratio (LRR) value for each SNP or CNV assay provides data on probe intensity relative to that of the estimated genotype-specific cluster location. LRR values estimated by the Genome Studio software were corrected for bias due to the properties of the assay chemistry and fluorescent dyes used in the probes. We implemented a method similar to that described by Staaf et al. to re-estimate LRR after applying quantile-normalization, with an enhanced multiple linear regression model, incorporating within-chip signal re-scaling terms and a polynomial correction for GC and CpG waves. The correction model is an extension to the method described in Diskin et al., with terms for multiple window sizes for proportion of GC and CpG content around the genomic location of each set of probes. GC and CpG terms in the regression model are the proportion of GC and CpG content for window sizes (in bp) of 50, 100, 500, 1 k, 10 k, 50 k, 100 k, 250 k, and 1 M centered around the genomic location of each assay, based on locations annotated in the Illumina manifest files and sequence context based on the NCBI build 36 reference genome sequence. This model is estimated per sample, as the phenomenon is modulated by TUF, the concentration of the DNA input, and possibly other factors. The final LRR was re-computed using the resulting quantile-normalized and GC/CpG corrected values as shown in Peiffer et al. The reduction in variance of the LRR values is shown in Figure 6.
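The window-based GC and CpG terms described above can be sketched as follows. This is an illustrative reconstruction, not the code used in the analysis: the function names are hypothetical, the CpG proportion is assumed to mean the number of CG dinucleotides divided by the number of dinucleotide positions in the window, and windows are simply truncated at the ends of the supplied sequence.

```python
# Window sizes (bp) as listed in the text.
WINDOW_SIZES_BP = [50, 100, 500, 1_000, 10_000, 50_000, 100_000, 250_000, 1_000_000]


def window_content(sequence: str, center: int, window: int):
    """Return (GC proportion, CpG proportion) for a window centered at `center`.

    `center` is a 0-based index into `sequence`. The window is clipped to
    the sequence boundaries. CpG proportion is an assumed definition:
    CG dinucleotide count / number of dinucleotide positions.
    """
    start = max(0, center - window // 2)
    end = min(len(sequence), center - window // 2 + window)
    segment = sequence[start:end].upper()
    if not segment:
        raise ValueError("window does not overlap the sequence")
    gc = sum(1 for base in segment if base in "GC") / len(segment)
    cpg = segment.count("CG") / max(len(segment) - 1, 1)
    return gc, cpg


def regression_terms(sequence: str, center: int) -> dict:
    """Map each window size to its (GC, CpG) proportions around one assay."""
    return {w: window_content(sequence, center, w) for w in WINDOW_SIZES_BP}


# Toy example on a short hypothetical sequence.
toy_sequence = "AACGCGTT"
gc_fraction, cpg_fraction = window_content(toy_sequence, center=4, window=8)
print(f"GC={gc_fraction:.3f}, CpG={cpg_fraction:.3f}")
```

On a real chromosome sequence, `regression_terms` would yield one pair of predictors per window size for each assay location, which could then be supplied to a per-sample regression of LRR.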
TUF: Thermodynamically ultra-fastened; DNA: Deoxyribonucleic acid; WGA: Whole genome amplification; SNP: Single nucleotide polymorphism; PCR: Polymerase chain reaction; PRT: Paralogue ratio test; DMSO: Dimethyl sulphoxide; CNV: Copy number variation.
The authors declare that there are no competing interests.
CDV participated in study design and coordination, critical analysis of results, performed bioinformatic and statistical comparisons between datasets and drafted the manuscript. PJF planned and performed Southern Blots, PRTs, WGA and Illumina Infinium genotyping experiments and aided in analysis of results. KJ performed statistical analysis of Illumina Infinium raw intensity data and drafted the manuscript. OL carried out alignment of Illumina data with genome features. SJ and ML collected DNA samples and performed data analysis of Illumina genotyping. DA analysed genotyping datasets. RRV performed Southern Blots and interpreted data. IG performed Illumina Infinium genotyping and participated in analysis of results. SJC participated in experiment design, critical analysis of results and drafting of manuscript. AJB conceived the study, participated in its design and coordination, critical analysis of results and drafting of manuscript. All authors read and approved the final manuscript.
This research was supported by Action Medical Research (grants SP4139 and SP4483) and by the European Union’s Seventh Framework Programme (FP7/2007-2013) project READNA (grant agreement HEALTH-F4-2008-201418). We wish to recognize earlier collaborations particularly with Nancy J Cox from the Division of Biological Sciences, University of Chicago and Paul H Dear and Bernard Konfortov of the MRC laboratory of Molecular Biology (University of Cambridge) that contributed to pointing our experiments towards identifying the TUF phenomenon. The authors wish to thank Ed Schwalbe (University of Newcastle) and Nathalie Zahra (University of Leicester) for their technical support.