The success of linkage mapping in Mendelian traits led to great optimism that the same approach could be harnessed to identify genes in disorders or traits without clear Mendelian inheritance patterns, so-called complex or quantitative traits. In contrast to phenotypes that are controlled by single genes, complex traits have no clear-cut mode of transmission. It is likely that quantitative traits are influenced by many genes of small effect size (called quantitative trait loci, QTLs), with factors such as gene-gene interactions (epistasis), pleiotropy, and gene-environment interactions complicating matters further [1].
Although many linkage screens for complex disorders have been reported, linkage studies are underpowered to detect genes of small effect size [3]. In contrast, case-control association analysis, even using stringent significance levels, promises to provide the power required to detect QTLs that are too weak to be detected by linkage analysis alone [2]. However, this power comes at a cost, as a systematic genome-wide association study requires very large numbers of DNA markers, perhaps as many as 500,000 [4]. Furthermore, the ability to detect QTLs of small effect size requires large samples. For example, 80% power (p < .01, two-tailed) to detect an effect of 0.5% in an unselected sample requires samples of at least 1000 individuals [6].
Consequently, considerable effort has gone into developing high-throughput genotyping methodologies that allow dense marker sets to be genotyped in large samples quickly, accurately, with minimal optimisation, and at very low unit cost [7]. Until such technologies become widely available, one way to address the cost, time and labour involved in using large sample sizes is to perform analyses not on individual DNA samples, but on pools made up of DNA from multiple individuals for cases and for controls, a technique that can dramatically reduce the genotyping burden [8]. There is a growing literature addressing methodological issues such as DNA pool construction, genotyping assays, and statistical analysis [9], and the strengths and weaknesses of DNA pooling have recently been reviewed [13]. DNA pooling has been used successfully to identify replicated associations with complex traits [14]. However, genotyping a dense marker set with current SNP genotyping methodologies remains labour-intensive and expensive, even for a small number of DNA pools. One solution to the problem of genotyping many DNA markers is the SNP genotyping microarray, which uses a one-primer assay to genotype thousands of SNPs, offering the first real hope of a systematic survey of DNA variation in the human genome. However, microarrays can be used only once and are expensive in studies consisting of the large samples needed to detect QTLs of small effect size. One solution to the QTL conundrum is to combine DNA pooling with SNP microarrays, an approach that we call SNP-MaP (SNP microarrays and pooling), which can dramatically reduce the cost of screening large numbers of SNPs on large samples.
We hypothesised that quantitative estimates of allele frequencies, especially the relative allele frequencies comparing groups such as cases and controls, can be derived from pooled DNA using SNP genotyping microarrays, similar to the way that expression microarrays estimate quantitative frequencies for mRNA transcripts [17].
Affymetrix software (GDAS) uses the hybridisation fluorescence signals from the SNP microarrays to generate 'Relative Allele Signals' (RAS), a ratio of the measurement of allele A to the summed measurement of alleles A and B. Thus, RAS values vary between 0.0 and 1.0. Two independent RAS values are derived for each SNP from the sense strand (RAS1) and the anti-sense strand (RAS2). As explained in the Methods, Affymetrix software plots RAS1 scores against RAS2 scores and uses empirically derived clustering information to trichotomise these RAS scores as genotypes for DNA of an individual. RAS values near 0.0 are identified as a BB homozygote, 0.5 as an AB heterozygote, and 1.0 as an AA homozygote. We propose that RAS values can be used as quantitative indexes of allele frequencies in DNA pools.
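The RAS definition above can be illustrated with a minimal Python sketch. It is not the Affymetrix (GDAS) implementation: the function names and the signal intensities are hypothetical, and averaging RAS1 and RAS2 into a single pooled estimate is an assumption made here for illustration only.

```python
from statistics import mean


def relative_allele_signal(signal_a: float, signal_b: float) -> float:
    """Return RAS = A / (A + B); the result lies between 0.0 and 1.0."""
    if signal_a < 0 or signal_b < 0:
        raise ValueError("allele signals must be non-negative")
    total = signal_a + signal_b
    if total == 0:
        raise ValueError("summed allele signal must be positive")
    return signal_a / total


def pooled_frequency_estimate(ras1: float, ras2: float) -> float:
    """Combine sense (RAS1) and antisense (RAS2) values.

    Assumption: a plain average, chosen here for illustration only.
    """
    return mean([ras1, ras2])


# Hypothetical signal intensities for one SNP assayed on one DNA pool.
ras1 = relative_allele_signal(signal_a=300.0, signal_b=100.0)  # sense strand
ras2 = relative_allele_signal(signal_a=260.0, signal_b=140.0)  # antisense strand
estimate = pooled_frequency_estimate(ras1, ras2)
```

For an individual's DNA, values near 0.0, 0.5 and 1.0 would correspond to the BB, AB and AA genotypes; for a pool, intermediate values are read as an index of the frequency of allele A.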
The purpose of our current report is to follow up our previous study that addressed the feasibility of SNP-MaP [18]. We explore the reliability, validity, and sensitivity of SNP-MaP in greater detail using the Affymetrix GeneChip® Mapping 10K Array Xba 131, which genotypes more than 10,000 SNPs.
We constructed a control DNA pool consisting of 100 individuals independently three times (control pools A, B, and C), and assayed each on triplicate microarrays. We used these replicate control pools to assess the reliability of estimating allele frequencies from pooled DNA. To assess validity and sensitivity, we compared allele frequency estimates from microarray assays of pooled DNA with those from individual genotyping. In addition, to assess sensitivity experimentally, we constructed two 'case' DNA pools that differed by 15% and 20% in allele frequencies from the controls by spiking an aliquot of a control pool with DNA from an individual who had also been individually genotyped on the microarray. Each case pool was assayed on duplicate microarrays.
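The arithmetic behind the spiking design can be sketched as a simple mixture. This is an illustration under stated assumptions, not the protocol used in the study: it assumes each source contributes alleles in proportion to its share of DNA in the mixture, that the individual's allele frequency is 0.0, 0.5 or 1.0 according to genotype, and the function names and numbers are hypothetical.

```python
def mixed_allele_frequency(pool_freq: float, individual_freq: float,
                           fraction: float) -> float:
    """Expected allele frequency after spiking a pool with one individual.

    `fraction` is the share of the final mixture contributed by the
    individual's DNA (assumed proportional to the amount of DNA added).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return (1.0 - fraction) * pool_freq + fraction * individual_freq


def fraction_for_shift(pool_freq: float, individual_freq: float,
                       shift: float) -> float:
    """Share of individual DNA needed to move the pool frequency by `shift`."""
    gap = individual_freq - pool_freq
    if gap == 0:
        raise ValueError("individual and pool frequencies are identical")
    fraction = shift / gap
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("requested shift is not reachable by spiking")
    return fraction


# Hypothetical SNP: control pool frequency 0.5, spiked individual is AA (1.0).
needed = fraction_for_shift(pool_freq=0.5, individual_freq=1.0, shift=0.15)
spiked = mixed_allele_frequency(pool_freq=0.5, individual_freq=1.0,
                                fraction=needed)
```

Under these assumptions the achievable shift depends on the SNP, because it scales with the gap between the individual's genotype and the control pool frequency.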