The CNV-SNP array provides robust, accurate data for both laboratory- and field-derived samples. Through optimizations described here, the CNV-SNP array overcomes many hurdles associated with molecular work on P. falciparum field samples. Lower starting amounts of DNA are possible when using 65% AT random nonamers that compensate for the extreme AT bias of the genome. This optimization is especially useful for field sample DNA, which is typically scarce and difficult to obtain. It also eliminates the need for in vitro culture adaptation of field samples, which is typically used to generate enough DNA for applications like next-generation sequencing and is known to alter CNV and skew results of CNV analyses [29]. Using our modified protocol, the CNV-SNP array requires no more than 250 ng of starting parasite DNA with no compromise in data quality. Moreover, the ample yields of labeled DNA from 250 ng of starting parasite DNA indicate that the lower limit has not yet been defined, raising the possibility that finger prick blood samples on filter paper are accessible to this technology. In addition, the CNV-SNP array is robust to samples with high host DNA contamination (>90%) with no drop in data quality, making microarray-based genotyping complementary to higher resolution next-generation sequencing that is sensitive to human DNA contamination in field samples, often requiring sample preprocessing for target DNA enrichment. Notably, high human DNA contamination and low amounts of parasite DNA present serious challenges to genotyping the large number of samples necessary for genome-wide association studies.
Probe design optimizations contribute to the performance of this microarray for the P. falciparum genome. Sense and antisense resequencing probe quartets were used for SNP genotyping on the CNV-SNP array. A SNP call required that the sense and antisense probe quartets make complementary calls; furthermore, the robustness of each base call was evaluated using the ratio of the background signal to the signal of the probe with the greatest intensity. Signal intensities within a SNP probe quartet were more similar to each other than to those of probes in other quartets, including the opposite-strand quartet of the same locus (Figure ). This indicates the importance of measuring the background signal for each individual SNP quartet, as provided by the resequencing probesets, rather than background noise from the entire array or locus.
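The calling rule described above can be illustrated with a minimal sketch. The details here are assumptions made for illustration and are not taken from the array's actual pipeline: each quartet is represented as a mapping from the interrogated base to its signal intensity, the quartet background is taken as the mean of the three non-maximal probes, and the cutoff `max_background_ratio` is a hypothetical value.

```python
from typing import Dict, Optional, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_quartet(
    intensities: Dict[str, float], max_background_ratio: float = 0.5
) -> Tuple[Optional[str], float]:
    """Call one quartet; return (base or None, background/top-signal ratio).

    Assumption: background is the mean of the three non-maximal probes
    within the same quartet, not an array-wide or locus-wide estimate.
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    top_base, top_signal = ranked[0]
    background = sum(signal for _, signal in ranked[1:]) / 3
    ratio = background / top_signal if top_signal > 0 else float("inf")
    # Reject the call when background is too close to the top signal.
    base = top_base if ratio <= max_background_ratio else None
    return base, ratio


def call_snp(
    sense: Dict[str, float],
    antisense: Dict[str, float],
    max_background_ratio: float = 0.5,  # hypothetical threshold
) -> Optional[str]:
    """Return the sense-strand base only if both strands agree."""
    sense_base, _ = call_quartet(sense, max_background_ratio)
    anti_base, _ = call_quartet(antisense, max_background_ratio)
    if sense_base is None or anti_base is None:
        return None
    # The antisense call must be the complement of the sense call.
    if COMPLEMENT[anti_base] != sense_base:
        return None
    return sense_base


sense = {"A": 5000.0, "C": 400.0, "G": 350.0, "T": 450.0}
antisense = {"T": 4800.0, "A": 500.0, "C": 300.0, "G": 400.0}
print(call_snp(sense, antisense))
```

In this sketch a locus is left uncalled when either quartet is noisy or when the two strands disagree, which mirrors the requirement for complementary calls on both strands.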
Resequencing probes were optimized for SNP genotyping in P. falciparum by comparing the performance of probes at static lengths with probes balanced by melting temperature on a prototype 5K SNP array. Probes balanced by melting temperature outperformed static-length probes, with optimal SNP detection at a probe melting temperature of 66°C and performance that was reasonably consistent in exons, introns, and intergenic regions (Figure S1 in Additional file 1). Our results on optimal probe length and melting temperature differ from findings in another study [31]. This is likely due to the use of different methods for calculating probe melting temperature and our optimization to the AT-rich P. falciparum genome. However, our broader conclusion that variable length or isothermal probes provide optimal SNP detection is supported across various organisms [31], and indicates that longer, isothermal probes increase signal strength while also being short enough to remain sensitive to single base mismatches [32].
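The idea of balancing probes by melting temperature rather than by length can be sketched as follows. This is an illustration under stated assumptions, not the design procedure used for the array: the melting temperature is estimated with a basic GC-content formula chosen only as a stand-in (as noted above, the choice of calculation method affects the result), the probe is grown symmetrically around the SNP position, and the length bounds `min_len` and `max_len` are hypothetical.

```python
from typing import Optional


def basic_tm(seq: str) -> float:
    """Rough Tm estimate (deg C) from GC count and length.

    Assumption: a simple GC-content formula for oligos longer than
    about 13 nt; it ignores salt and nearest-neighbour effects.
    """
    n = len(seq)
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / n


def isothermal_probe(
    sequence: str,
    snp_index: int,
    target_tm: float = 66.0,
    min_len: int = 15,  # hypothetical bound
    max_len: int = 85,  # hypothetical bound
) -> Optional[str]:
    """Grow a probe centred on the SNP until its estimated Tm reaches target_tm.

    Returns None if no probe within the length bounds (or within the
    available flanking sequence) reaches the target.
    """
    for flank in range(min_len // 2, max_len // 2 + 1):
        start, end = snp_index - flank, snp_index + flank + 1
        if start < 0 or end > len(sequence):
            break
        probe = sequence[start:end]
        if basic_tm(probe) >= target_tm:
            return probe
    return None


gc_rich = "GC" * 50
moderate = "ATGCA" * 40
gc_probe = isothermal_probe(gc_rich, 50)
at_probe = isothermal_probe(moderate, 100)
print(len(gc_probe), len(at_probe))
```

Under this scheme AT-rich loci receive longer probes than GC-rich loci, and loci with too little GC content within the allowed length yield no probe at all, which is consistent with some regions remaining difficult to interrogate.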
Resequencing probesets designed for a 66°C melting temperature were generated for 45,524 SNP loci for inclusion on the CNV-SNP array. While longer, isothermal probes improve SNP genotyping, certain loci are more easily genotyped than others, and some remain inaccessible to microarrays and short-read next-generation sequencing technologies. For instance, SNPs in exons have greater genotyping success than SNPs in introns or intergenic regions, likely due to regions of high AT richness or interspersed sequence repetitiveness that hinder probe design and binding specificity in intronic and intergenic regions. Current SNP genotyping microarrays, such as those developed by the NIH and the Broad Institute [21], are focused on high quality SNP loci that are easily genotyped across microarray platforms (Figure ). However, the use of isothermal probes designed at an optimal melting temperature allows us to interrogate more difficult loci and maximize the overall number of SNPs that can be robustly genotyped on the CNV-SNP array (on average, 36,948 useable SNP genotypes with 95% accuracy from a single hybridization).
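As a rough indication of scale, and assuming that all 45,524 loci with designed probesets are represented on the array, the average yield per hybridization corresponds to approximately 81% of the interrogated loci:

```latex
\[
\frac{36{,}948}{45{,}524} \approx 0.812
\]
```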
An interesting debate surrounds the continued value of microarrays with the emergence of next-generation sequencing. As the cost of next-generation sequencing continues to decrease and protocols continue to improve, that platform is expected to deliver the ultimate resolution and throughput, prompting the prediction that microarrays will soon be rendered obsolete. However, we suggest that the CNV-SNP array will continue to be useful as an 'everylab' tool alongside next-generation sequencing. Whole genome sequencing underpins the SNP discovery needed for chip design; in general, whole genome architecture and ultra-resolution mapping require fully sequenced and assembled genomes. The customizable microarray platform continues to improve in density (4.2 million element custom designs are anticipated in 2011) and offers unique configurations up to 12-plex of 135K probes, leading to a scenario in which a global set of SNPs identified by sequencing can be precisely represented on microarrays for regionally focused or hypothesis-driven designs. To date, microarrays remain cheaper, produce data more quickly, require less computational innovation, and are especially useful for processing large numbers of samples, while providing sufficient resolution and quality for genome-wide association studies and population genomic analysis. Furthermore, although progress is being made in scoring CNV using next-generation sequencing, that technology still lags behind the performance of microarray CGH.