Most current genetic association studies, including genome-wide association studies (GWAS), focus on single-nucleotide polymorphisms (SNPs) with relatively high minor allele frequencies (MAFs) (say, MAF >5%) [1] in the search for genetic loci underlying susceptibility to complex diseases. The strategy of focusing on common SNPs in genetic association studies is very effective under the common-disease-common-variant (CDCV) scenario, that is, when common diseases are caused by common variants with relatively small to moderate effects. GWAS based on a quarter of a million to one million common SNPs have been very successful in identifying disease-susceptibility regions through indirect linkage disequilibrium (LD) mapping [5]. Under the CDCV paradigm, the set of common SNPs (tagSNPs) provided by existing high-throughput genotyping platforms covers the genome well enough to capture the relationship between the ‘common’ causal variants at unmeasured loci and the disease through their high LD with the functional loci.
Although the CDCV hypothesis has been the dominant dogma guiding the conduct of association studies for the past decade, growing evidence from recent empirical and simulation studies [6] suggests that the causal variants for common diseases have a wide spectrum of MAFs, ranging from rare to common. For example, Gorlov et al. [14] found that functional SNPs tended to have low MAFs. A recent study by Need et al. [16] suggested that common genetic variants do not appear to have a major impact on predisposition to schizophrenia and that rare copy number variants (CNVs) may be more important in susceptibility to schizophrenia than common polymorphisms. Thus, in addition to the CDCV scenario, the common-disease-rare-variant (CDRV) hypothesis, which asserts that there are multiple rare variants underlying the susceptibility to a common disease, is a very plausible scenario for many complex diseases. Furthermore, some researchers believe that both CDCV and CDRV hypotheses could be true even within the same susceptibility gene for a complex disease [5
Under the CDRV scenario, population-based association studies that rely on common tagSNPs would be underpowered, because those common SNPs tend to have a low correlation with the unmeasured disease-causing (rare) variants and thus are not very informative when used in indirect LD mapping [5]. Given that the majority of SNPs in the human genome are rare [14] (MAF <5%) and that the CDRV scenario appears to be the norm rather than a rarity for complex diseases, it would be beneficial to study rare SNPs in large-scale population-based association studies to improve the chance of disease-gene detection.
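The low correlation between a common tagSNP and a rare variant can be made concrete with the standard upper bound on the LD measure r² for two biallelic loci, which depends only on the two allele frequencies. The following is a minimal sketch rather than part of the analysis described here; the function name `max_r_squared` and the example frequencies are illustrative assumptions.

```python
def max_r_squared(maf_a: float, maf_b: float) -> float:
    """Upper bound on r^2 between two biallelic loci given their MAFs.

    With both frequencies in (0, 0.5], the largest attainable |D| is
    lo * (1 - hi), where lo <= hi are the two MAFs, which gives
    r^2_max = lo * (1 - hi) / (hi * (1 - lo)).
    """
    lo, hi = sorted((maf_a, maf_b))
    if not (0.0 < lo and hi <= 0.5):
        raise ValueError("minor allele frequencies must lie in (0, 0.5]")
    return (lo * (1.0 - hi)) / (hi * (1.0 - lo))


# Illustrative (assumed) frequencies: a rare causal variant with MAF 1%
# and a common tagSNP with MAF 20%.
print(round(max_r_squared(0.01, 0.20), 4))
```

Under these assumed frequencies the bound is roughly 0.04, so even in the most favourable haplotype configuration the common marker captures only a small fraction of the signal at the rare locus.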
There are a vast number of analytic approaches for studying the association between a disease and a genetic variant or set of variants. Most of them are designed for the analysis of common variants and rely on asymptotic distributions to evaluate statistical significance, so their accuracy for rare variants is open to question. Li and Leal [5] have recently proposed a method targeting the analysis of multiple rare variants within a candidate region. Their approach, called the collapsing method, tries to enrich the association signals and to reduce the degrees of freedom by collapsing genotypes at multiple rare SNPs into a univariate test.
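The idea of collapsing can be sketched as follows, under one simple reading: each individual is reduced to a single indicator of whether he or she carries a minor allele at any of the rare SNPs in the region, and the indicators are then tabulated by disease status. This is an illustrative assumption and not necessarily the exact procedure of the cited method; the function names and toy genotypes are hypothetical.

```python
def collapse(genotypes):
    """Reduce per-SNP minor-allele counts (0, 1 or 2) to one carrier flag.

    `genotypes` holds one list per individual; the flag is 1 if the
    individual carries a minor allele at any rare SNP, else 0.
    """
    return [1 if any(g > 0 for g in person) else 0 for person in genotypes]


def carrier_table(case_genotypes, control_genotypes):
    """Build the 2x2 table [[case carriers, case non-carriers],
    [control carriers, control non-carriers]] from collapsed genotypes."""
    case_flags = collapse(case_genotypes)
    control_flags = collapse(control_genotypes)
    a = sum(case_flags)
    c = sum(control_flags)
    return [[a, len(case_flags) - a], [c, len(control_flags) - c]]


# Toy data (hypothetical): 4 cases and 4 controls typed at 3 rare SNPs.
cases = [[0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 2]]
controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(carrier_table(cases, controls))
```

The resulting table has a single degree of freedom regardless of how many rare SNPs are collapsed, which is the reduction in dimensionality that the text describes.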
In anticipation of agnostic screening for rare SNPs or CNVs in future studies, we focus on the single-marker analysis of rare variants. Fisher's exact test is the standard approach when the sample size is limited, although it is well known to be conservative [17] and thus loses some power. The aim of this work is to develop more powerful single-marker tests for the analysis of rare variants (SNPs or CNVs) with MAFs below 5%.
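The conservativeness of Fisher's exact test can be illustrated by computing its actual type I error rate under the null hypothesis. The sketch below is not the test developed in this work: it assumes a carrier versus non-carrier coding, independent binomial sampling in cases and controls, and a two-sided p-value that sums the probabilities of all tables no more likely than the observed one. The sample sizes and carrier frequency are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at most as likely as the observed one (with a small
    # relative tolerance for floating-point ties).
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def actual_size(n_cases, n_controls, carrier_freq, alpha=0.05):
    """Exact rejection probability under the null (equal carrier frequency)."""
    size = 0.0
    for a in range(n_cases + 1):
        pa = binom_pmf(a, n_cases, carrier_freq)
        for c in range(n_controls + 1):
            if fisher_exact_two_sided(a, n_cases - a, c, n_controls - c) <= alpha:
                size += pa * binom_pmf(c, n_controls, carrier_freq)
    return size


# Hypothetical design: 50 cases, 50 controls, carrier frequency 5%.
print(actual_size(50, 50, 0.05))
```

Because the test guarantees the nominal level conditionally on every margin and the table counts are discrete, the attained level falls below the nominal 5%, which is the source of the power loss noted above.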