There are three main statistical approaches to gene discovery: linkage, association, and admixture mapping. Linkage analysis tests whether a variant co-segregates with disease in families; association analysis tests whether a genetic variant occurs more often in individuals with disease than in those without; and admixture mapping tests whether there are particular regions of the genome at which inheriting DNA from ancestors from a certain region of the world predisposes one to particular diseases. Linkage analysis can be performed only in family-based studies, whereas association testing and admixture mapping can be performed in either population- or family-based studies. These approaches may appear to ask the same question, but statistically they are independent tests, and the strategy chosen affects the hypotheses that can be tested.
Linkage analysis is based on the assumption that the genetic marker and the disease variant are in close proximity and are transmitted intact across generations [30]. Thus, markers in close proximity to the disease-causing gene segregate with disease in families. However, the resolution of linkage is poor, with candidate regions encompassing hundreds of genes, so linkage analysis identifies only regions, not genes or variants. Further, because linkage provides statistical evidence, replication is the gold standard for minimizing the risk of false positives.
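As an illustration of how co-segregation is commonly quantified, the sketch below computes a LOD score, the standard log10 likelihood ratio comparing linkage at a recombination fraction theta with free recombination (theta = 0.5). The text above does not specify a statistic, so this is an assumed, simplified setting: phase-known meioses that can each be scored as recombinant or non-recombinant. The function names and the counts in the usage lines are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for phase-known meioses at recombination fraction theta.

    Compares the likelihood theta^r * (1 - theta)^(n - r) with the
    likelihood under no linkage, 0.5^n, on a log10 scale.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_lik_linked = (recombinants * math.log10(theta)
                      + non_recombinants * math.log10(1.0 - theta))
    log_lik_unlinked = meioses * math.log10(0.5)
    return log_lik_linked - log_lik_unlinked


def max_lod(recombinants: int, meioses: int, steps: int = 500):
    """Grid search over theta in (0, 0.5] for the maximum LOD score."""
    grid = [0.5 * i / steps for i in range(1, steps + 1)]
    best_theta = max(grid, key=lambda t: lod_score(recombinants, meioses, t))
    return best_theta, lod_score(recombinants, meioses, best_theta)


# Hypothetical pedigree data: 1 recombinant observed in 20 informative meioses.
theta_hat, lod = max_lod(recombinants=1, meioses=20)
print(f"theta_hat = {theta_hat:.3f}, LOD = {lod:.2f}")
```

A LOD score of 3 or more is the traditional threshold for declaring linkage. Even a high score localizes only a region around the marker, which is why follow-up fine mapping and replication remain necessary.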
An alternative approach is an association study, which can use population- or family-based designs. It is important to recognize that association does not equal causation: association studies simply measure statistical dependence between two or more variables. Significant associations can arise from one of several misleading factors, including linkage disequilibrium (LD), population stratification, or random chance. Once significance is achieved, replication is required to ensure validity [31].
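The statistical dependence measured by an association study can be sketched with a simple allelic test. The example below is a minimal illustration, not a method prescribed by the text: it assumes a single biallelic SNP, counts alleles in cases and controls, and applies a Pearson chi-square test with 1 degree of freedom. The function name and the allele counts are hypothetical.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are alternate and reference alleles.
    Returns the chi-square statistic and its p-value.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 500 cases and 500 controls, two alleles each.
chi2, p_value = allelic_chi_square(case_alt=300, case_ref=700,
                                   ctrl_alt=200, ctrl_ref=800)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

A small p-value here indicates only that allele frequency differs between the two groups. It does not by itself distinguish a causal variant from one in LD with it, or from a difference driven by population stratification.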
Admixture occurs when two or more genetically diverse populations merge to form a new population [32]. Localizing disease genes using an admixed population is called admixture mapping. In human admixture studies, researchers combine information about known population history with information from individuals’ measured genotypes at known ancestry informative markers (AIMs). Studies consistently show that allergic disorders such as asthma are more common in people of West African ancestry than in people of European ancestry [33]. The African-American population is an admixed population in which about 20% of the genetic material traces to European ancestry [34]. The association between increased asthma risk and African ancestry, together with the admixed nature of the African-American population [34], suggests that admixture mapping [35] might be an important asthma gene-finding strategy for studying genetically heterogeneous populations.
With current technology, it is not cost-prohibitive to perform genome-wide linkage and association studies. An advantage of the genome-wide approach is that it requires no a priori evidence and thus can identify regions and variants in genes not previously implicated in allergic disorders, providing insights into the biologic underpinnings of these disorders. Researchers using genome-wide approaches must adjust the significance level to guard against chance findings; as the number of statistical tests increases, so does the likelihood of obtaining a p-value below 0.05 by chance alone. For current genome-wide association study (GWAS) SNP chips (density of 1M SNPs), significance thresholds on the order of 10⁻⁸ are used to control for multiple comparisons. Given this level of significance, the number of samples required to obtain adequate power in a GWAS is in the thousands for a gene with modest effect. Limiting the analysis to gene regions with promising a priori evidence of involvement in asthma makes the correction for multiple testing much less severe. A candidate gene study examining 1000 SNPs will require only 60.5% of the sample size required by a GWAS examining 1 million SNPs to obtain the same statistical power of 80%. This reduced sample requirement may permit better phenotyping and reduced heterogeneity, which will also improve power. Thus, there are benefits to both GWAS and candidate gene approaches.
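The 60.5% figure can be approximately reproduced with a short calculation. The sketch below rests on assumptions that the text does not state: a Bonferroni correction of a family-wise significance level of 0.05, a two-sided test, and the usual normal approximation in which the required sample size is proportional to (z_alpha/2 + z_power)². The function name and its parameters are hypothetical.

```python
from statistics import NormalDist


def relative_sample_size(n_tests_small, n_tests_large,
                         power=0.80, family_alpha=0.05):
    """Ratio of sample sizes needed for equal power under two test counts.

    Assumes Bonferroni-corrected two-sided tests and a sample size
    proportional to (z_{alpha/2} + z_{power})^2 for a fixed effect size.
    """
    std_normal = NormalDist()
    z_power = std_normal.inv_cdf(power)

    def size_factor(n_tests):
        alpha = family_alpha / n_tests                # Bonferroni threshold
        z_alpha = -std_normal.inv_cdf(alpha / 2.0)    # two-sided critical value
        return (z_alpha + z_power) ** 2

    return size_factor(n_tests_small) / size_factor(n_tests_large)


# Candidate gene study (1000 SNPs) versus GWAS (1 million SNPs) at 80% power.
ratio = relative_sample_size(1_000, 1_000_000)
print(f"relative sample size: {ratio:.3f}")
```

Under these assumptions the ratio comes out at roughly 0.6, in line with the figure quoted above. The effect size cancels in the ratio, so the result depends only on the number of tests and the target power.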
Because asthma is a prevalent disorder, the classic population-based sampling strategy is case-control. In this approach, the researcher collects individuals with disease (cases) and unrelated individuals without disease (controls). This method is very efficient; compared to a random sampling design, only 35% of the total sample would be required for equivalent power (assuming an asthma frequency of 10%). While this approach appears simple, the challenge is ensuring that the controls come from the same ancestrally homogeneous population as the cases. When cases and controls are not drawn from the same ancestral population, population stratification can result in spurious associations [36]. For example, suppose most people of African ancestry in a sample had brown eyes and also happened to have asthma, while most people of European ancestry were blue-eyed and asthma-free. A naïve analysis might conclude that the brown-eyes SNP is responsible for asthma, even if eye color and disease are completely unrelated. That is, the methods are likely to flag the wrong SNPs through “guilt by association”. This problem becomes more pronounced in studies surveying the entire genome because of the huge number of ancestry-related SNPs being tested. To address this genetic-mixing problem, researchers can test whether cases and controls differ over a large number of variants not expected to be associated with disease. If differences exist, adjustments can be made to minimize this effect [37].
Currently, three fundamentally different methods are used to correct for confounding in allergy genetic association studies [37–39]: (1) genomic control, (2) structured association, and (3) principal component analysis. Genomic control uses a set of non-candidate, unlinked loci to estimate an inflation factor, λ, that reflects the population structure present, and then corrects the standard chi-square test statistic for this inflation. The structured association method uses Bayesian techniques to assign individuals to “clusters” or subpopulation classes, using information from a set of non-candidate, unlinked loci, and then tests for an association within each cluster or subpopulation class. To control for confounding by variation in background ancestry during structured association testing (SAT), a panel of ancestry informative markers (AIMs) can be used [35]. AIMs can therefore also be termed structure informative markers (SIMs). These markers exhibit differences in frequency between population groups. Importantly, care should be taken in selecting which AIMs to use, as some sets may be population specific [40]. Principal component analysis (PCA) is a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. It can be used to identify and adjust for population substructure [37]. Family-based association tests offer protection against stratification, a decided advantage of family-based designs [41].
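The genomic control correction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the procedure of any particular software package: λ is estimated as the median chi-square statistic (1 degree of freedom) across the unlinked, non-candidate loci divided by the theoretical median of that distribution (about 0.455), and λ is floored at 1 so that statistics are never inflated. Both choices are common conventions that the text does not specify, and the function names and input values are hypothetical.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(null_chi2_stats):
    """Estimate lambda from chi-square statistics at unlinked null loci."""
    lam = median(null_chi2_stats) / CHI2_1DF_MEDIAN
    return max(lam, 1.0)  # assumed convention: never correct upward


def genomic_control(candidate_chi2, null_chi2_stats):
    """Divide a candidate locus's chi-square statistic by lambda."""
    lam = inflation_factor(null_chi2_stats)
    return candidate_chi2 / lam, lam


# Hypothetical statistics from five null loci and one candidate SNP.
null_stats = [0.2, 0.5, 0.91, 1.3, 2.4]
corrected, lam = genomic_control(candidate_chi2=20.0, null_chi2_stats=null_stats)
print(f"lambda = {lam:.2f}, corrected chi2 = {corrected:.2f}")
```

In practice λ would be estimated from many more null loci than the five used here. The corrected statistic is then compared with the chi-square distribution in the usual way.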