On balance, while a number of stimulating results are emerging from the HapMap and other large-scale genotyping studies, many challenges remain for WGA studies (Hirschhorn & Daly 2005; Wang et al. 2005). At present, we are not in a position to delineate precise genomic regions that are worthy of greater (or lesser) attention in disease-gene mapping, nor are we able to safely depict the boundaries of association or linkage intervals. The output of most immediate and practical relevance for association studies is a vast set of genetically validated markers (figure 3). This is an impressive achievement, considering that only a few years ago candidate gene studies required the use of restriction fragment length polymorphisms (RFLPs) or extensive resequencing to find even common-allele markers. The single nucleotide polymorphism (SNP) information is valuable to nearly all association studies, whether family- or population-based, and whether they involve candidate genes or the entire genome.
Figure 3. Increase in availability of genetic markers. The number of markers available in dbSNP is shown as a function of the dbSNP release date (all data from www.ncbi.nlm.nih.gov/SNP). The full distribution (dark grey; greater than 10 million in 2005) reflects ...
Unfortunately, while the validated markers will make WGA studies cheaper, faster and more accessible, they do not guarantee greater success. Small sample sizes have long plagued allelic association studies, and genotyping more markers does not compensate for a lack of power to detect real effects in the first place. When the expected effect size of each locus is small, as in common complex traits, large numbers of individuals are required whatever the density of markers genotyped (Risch & Merikangas 1996; Zondervan & Cardon 2004). This sampling problem is exacerbated in the context of WGA, as the study of many markers creates multiple-testing problems with the correlated variants. WGA studies generally require larger, not smaller, sample sizes than previous investigations of specific candidate genes or regions.
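The interplay between effect size, multiple testing and sample size can be illustrated with a rough calculation. The sketch below is not taken from the studies cited; it assumes a quantitative trait, a single marker explaining a fraction `r2` of the trait variance, and the standard Fisher z approximation for testing a correlation. The function name `required_n`, the 1% variance explained, the 80% power and the 500 000 tests are illustrative assumptions only.

```python
import math
from statistics import NormalDist


def required_n(r2, alpha, power=0.8):
    """Approximate sample size needed to detect a marker that explains a
    fraction r2 of trait variance (two-sided test, Fisher z approximation).
    All inputs are illustrative assumptions, not values from the article."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    r = math.sqrt(r2)                   # marker-trait correlation
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


# One candidate marker tested at the conventional 5% level.
candidate_n = required_n(0.01, alpha=0.05)

# The same marker tested in a scan of 500 000 markers with a
# Bonferroni-corrected significance level (hypothetical scan size).
genome_n = required_n(0.01, alpha=0.05 / 500_000)

print(candidate_n, genome_n)
```

Under these assumptions the genome-wide requirement is several times the single-marker requirement, which is the qualitative point made above: adding markers raises the evidence threshold, so more individuals are needed, not fewer.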
Nevertheless, arguments for or against WGA studies, or even descriptions of the likely effects of genetic variability and sample sizes noted above, are mainly theoretical as, to date, there have been only a few association studies involving extensive marker coverage (Ozaki et al. 2002; Roses et al. 2005), and none that have spanned the entire genome (i.e. coding plus non-coding regions). Without real data, it is difficult to design and test new methods of analysis, compare strategies and study designs and consider the diversity of genetic architectures for different phenotypes.
To gain an initial view of what a WGA study might look like, we merged two publicly available datasets: one involving gene expression (Morley et al. 2004) and another involving interim data from the HapMap project (Gibbs et al. 2003). We considered the 100 most heritable gene-expression values as phenotypes ($Y$), which we regressed on each diallelic HapMap marker one at a time: $E(Y) = \beta_0 + \beta_1 G_j$, where $G_j$ was coded to detect additive genetic effects (0, 1, 2 for genotypes aa, Aa, AA, respectively) for the $j$th marker. Our analyses are not meant to represent a true genome-wide study nor a thorough assessment of the factors influencing gene expression, as the association sample size (42 unrelated individuals) is too small for such consideration and the HapMap data are not yet complete for a full assessment. Rather, we are interested in simply gaining an initial view of any trends that might emerge in a study of WGA.
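The marker-by-marker regression described above can be sketched in a few lines. This is a minimal illustration using ordinary least squares on synthetic data, not the authors' analysis code: the function `additive_regression`, the allele frequency of 0.3, the 200 markers and the effect size of 0.8 are all assumptions made for the example. Only the additive 0/1/2 coding and the sample of 42 individuals come from the text.

```python
import math
import random


def additive_regression(y, g):
    """Least-squares fit of phenotype y on additive genotype codes g (0, 1, 2).

    Returns (intercept, slope, t_statistic), or None when the marker is
    monomorphic in the sample, in which case the slope is undefined.
    """
    n = len(y)
    mean_g = sum(g) / n
    mean_y = sum(y) / n
    sxx = sum((gi - mean_g) ** 2 for gi in g)
    if sxx == 0:
        return None
    sxy = sum((gi - mean_g) * (yi - mean_y) for gi, yi in zip(g, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    rss = sum((yi - intercept - slope * gi) ** 2 for gi, yi in zip(g, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    t_stat = slope / se if se > 0 else math.copysign(math.inf, slope)
    return intercept, slope, t_stat


# Synthetic data only: 42 individuals, 200 markers (hypothetical numbers).
rng = random.Random(1)
n_individuals, n_markers = 42, 200
genotypes = [
    [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n_individuals)]
    for _ in range(n_markers)
]
# Marker 0 is given a simulated additive effect; all others are null.
phenotype = [0.8 * g + rng.gauss(0, 1) for g in genotypes[0]]

# Test one marker at a time, skipping monomorphic markers.
results = {}
for j, g in enumerate(genotypes):
    fit = additive_regression(phenotype, g)
    if fit is not None:
        results[j] = fit

best = max(results, key=lambda j: abs(results[j][2]))
print(best, results[best])
```

A real analysis would convert each t-statistic to a p-value (with n - 2 degrees of freedom) and would need to handle missing genotypes; both are omitted here to keep the sketch self-contained.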
The results of the pseudo-WGA studies revealed a number of encouraging findings. For most of the expression phenotypes that showed evidence for family-based linkage, evidence for strong association (judged against a stringent p-value threshold set by conservative Bonferroni correction) was observed in at least one location in the genome. However, this was not always the case, even with a very high density of markers. In some cases, despite strong evidence for linkage and association testing of more than 500 000 non-redundant SNPs, no evidence for allelic association was apparent. In these cases, further genotyping, resequencing or a non-association based strategy (e.g. for rare alleles) would be required to identify the variants contributing to the linkage profiles, as the WGA approach would fail with the available data.
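To make the Bonferroni criterion concrete, the sketch below divides a family-wise significance level by the number of tests and keeps only the markers whose p-values fall below the result. The family-wise level of 0.05, the marker names and the p-values are hypothetical; the exact threshold used in the analysis is not reproduced here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls the family-wise error
    rate at alpha across n_tests tests (Bonferroni correction)."""
    return alpha / n_tests


def significant_markers(p_values, n_tests, alpha=0.05):
    """Return the markers whose p-value passes the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return {marker: p for marker, p in p_values.items() if p < threshold}


# Hypothetical p-values for three markers out of a scan of 500 000 tests.
p_values = {"marker_a": 3e-9, "marker_b": 2e-5, "marker_c": 0.4}

threshold = bonferroni_threshold(0.05, 500_000)
hits = significant_markers(p_values, n_tests=500_000)
print(threshold, sorted(hits))
```

With 500 000 tests and an assumed family-wise level of 0.05, the per-test threshold is on the order of 10^-7, so a marker with a p-value of 2e-5 would not count as associated even though it would look convincing in a single candidate-gene test. An empty result corresponds to the situation described above, in which linkage is strong but no marker reaches the corrected threshold.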