When two or more populations have been separated by geographic or cultural boundaries for many generations, drift, spontaneous mutations, differential selection pressures and other factors may lead to allele frequency differences among populations. If these ‘parental’ populations subsequently come together and begin inter-mating, disequilibrium among linked markers may span a greater genetic distance than it typically does among populations under panmixia [see glossary]. This extended disequilibrium can make association studies highly effective and more economical than disequilibrium mapping in panmictic populations, since fewer marker loci are needed to detect regions of the genome that harbor phenotype-influencing loci. However, under some circumstances, this process of intermating (as well as other processes) can produce disequilibrium between pairs of unlinked loci and thus create the possibility of confounding or spurious associations due to population stratification. Accordingly, researchers are advised to employ statistical tests for linkage disequilibrium mapping that remain valid in the presence of such confounding. Many recent papers have addressed this need. We provide a comprehensive review of advances made in recent years in correcting for population stratification and then evaluate and synthesize these methods based on statistical principles such as (1) randomization, (2) conditioning on sufficient statistics, and (3) whether the method tests the genotype-phenotype covariance (conditional upon familial information) and/or tests departures of the marginal distribution from the expected genotypic frequencies.
Theoretical developments, computer simulations, and empirical evidence from population studies continue to indicate that population stratification due to genetic admixture, as well as other departures from random mating, can confound genetic association studies and produce false positive results [1,2,3,4]. Population admixture, however, can also ‘mask’ true genotype-phenotype associations and produce false negative results. In either case, departures from random mating can result in biased estimates and faulty conclusions. This form of population heterogeneity is often regarded as an impediment to genetic association studies given its potential to confound statistical analyses and induce spurious genotype-phenotype associations.
Experimentally controlling mating type in plant and animal studies is the most extreme way to control for this confounding effect and is accomplished with the use of recombinant inbred strains. However, this is not necessarily feasible for all plant and animal studies and is impossible in human genetic research. Concerns about the effects of population stratification led to the recommendation of using familial data and to the development of the seminal paper on the transmission-disequilibrium test (TDT), based on a related idea proposed by Rubinstein et al. and later by Falk and Rubinstein. The TDT is a family-based association test designed for testing linkage disequilibrium by comparing the proportion of alleles transmitted versus the proportion not transmitted from informative parental matings (i.e., matings with at least one heterozygous parent) to affected offspring. By focusing on affected offspring (i.e., case-only), the TDT assesses whether the distribution of alleles among affected children conditional on parental genotypes differs from what is expected under the null hypothesis of no linkage and/or no association.
Although effective at eliminating false positives due to stratification and genetic admixture, TDT type designs may result in substantially lower power relative to other types of association studies since they utilize only those individuals who are informative for allelic transmission and exclude all others. Population-based association studies (e.g. the case-control study design) usually have greater power than family-based and case-only designs as long as correction for population stratification is properly modeled. Recently, significant advances have been made in statistical methodology to control for the potential confounding effects of population admixture via use of measures of ‘individual admixture’ and related techniques [8,9,10,11,12,13,14,15,16]. We refer to such methods as structured association testing (SAT). Of equal interest are exciting new developments in the use of individual admixture estimates for what we call regional admixture mapping (RAM) [16,17,18,19,20,21]. In principle, these methods allow researchers to localize genomic regions containing trait-influencing genes in samples of unrelated individuals.
With novel procedures being proposed at such a rapid pace, it is difficult for investigators to keep abreast of the latest methods and their utility. Thus, here we review many of the statistical procedures which aim to create valid test statistics for linkage and disequilibrium mapping studies that control for confounding due to population stratification.
In the late 1980s and early 1990s, several approaches were proposed to identify disease genes that combined the advantages of linkage and population association approaches [6, 7,22,23,24]. These methods typically compared alleles transmitted from parents to affected offspring against alleles that were not transmitted, treating the non-transmitted parental alleles as ‘pseudo controls’. For example, Rubinstein et al. and later Falk and Rubinstein proposed a method for calculating the odds ratio of transmitted vs. non-transmitted alleles to offspring from parents. They termed this the ‘Haplotype Relative Risk’ (HRR) because they were investigating HLA haplotypes, and an odds ratio approximates the relative risk when disease prevalence is low. The HRR is an unmatched case-control design comparing frequencies of transmitted vs. non-transmitted parental alleles. A similar method was proposed by Terwilliger and Ott. Ott studied the properties of the HRR and theoretically derived the expected frequencies of transmitted and non-transmitted alleles assuming a recessive disease. Although the test proposed by Falk and Rubinstein was not a valid test for linkage, Spielman et al. proposed a valid test for linkage in the presence of association based on the idea of Falk and Rubinstein. The transmission disequilibrium test, or TDT, is a McNemar test for a matched case-control design that compares transmitted alleles from heterozygous parents to an affected offspring with the expected non-transmitted alleles, assuming there is no transmission distortion. Here, transmitted and non-transmitted alleles from heterozygous parents serve as cases and controls, respectively, creating a matched case-control design. Tiwari et al. noted that the informative families used in TDT designs can be viewed as a mixture of experimental backcrosses (one heterozygous parent) and F2 intercrosses (two heterozygous parents), in analogy to experimental crosses.
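The McNemar form of the TDT reduces to a few lines of code. The sketch below, using hypothetical counts, computes the chi-square statistic from the numbers of heterozygous parents transmitting each allele; under the null of no linkage or no association, each heterozygous parent transmits either allele with probability 1/2.

```python
def tdt_statistic(b: int, c: int) -> float:
    """McNemar chi-square for the TDT.

    b: number of heterozygous parents transmitting allele A1 to an
       affected child; c: number transmitting the other allele (A2).
    Under H0 (no linkage or no association) E[b] = E[c], and the
    statistic is referred to a chi-square with 1 df.
    """
    if b + c == 0:
        raise ValueError("no informative transmissions")
    return (b - c) ** 2 / (b + c)

# Hypothetical example: 45 of 70 informative transmissions carry A1.
chi2 = tdt_statistic(45, 25)
```

Here (45 − 25)²/70 ≈ 5.71, exceeding the 3.84 critical value of a 1-df chi-square at α = 0.05.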
The original TDT design requires the collection of family trios that include two parents and an affected offspring and is limited to di-allelic marker loci and dichotomous traits. Although the TDT method is a valid test for linkage, it only has power in the presence of population association and is robust against population admixture. There are more than two hundred publications describing extensions and variations of the original TDT. Figure 1 shows the distribution of 223 published extensions and variations of the TDT from 1993 to 2007. In supplemental table 1 (www.karger.com/doi/10.1159/000119107), we summarize some (but not all) of the extensions or variations of the TDT-type procedures.
The extensions to the TDT fall mainly into four categories: (1) relaxing the requirement of only two alleles at the marker locus; (2) relaxing the requirement that the trait be dichotomous; (3) relaxing the requirement of a parent/offspring trio design, and (4) extension to genotype information from the X chromosome (X-linked TDT). Other extensions to the TDT include multiple loci, Bayesian TDT, multiple phenotypes, parent-of-origin/imprinting effects, inbreeding, TDT for haplotypes, censored data, simultaneous and separate modeling of the linkage and association parameters, and other variations to increase power; we choose to focus this review mostly on the four main categories listed above, with some discussion of the other extensions.
Several extensions to the TDT have been proposed to allow for multiple alleles at the marker locus. Bickeboller and Clerget-Darpoux extended the TDT for multi-allelic markers by comparing the genotypes formed by the two transmitted alleles (genotype of index) with the genotypes formed by the two non-transmitted alleles (internal control genotype), similar to Terwilliger and Ott, thus using the information on both parents simultaneously. This test of transmission patterns of genotypes (Tg) was based on the homogeneity test for a contingency table of genotype frequencies. Bickeboller and Clerget-Darpoux also proposed an allelic test (Tc) based on testing the complete symmetry of the contingency table of allele frequencies. In addition, Rice et al. proposed an extension of the TDT that allows analysis with multi-allelic markers, and at about the same time Sham and Curtis introduced an extended TDT (ETDT) based on a logistic regression procedure. The advantage of the ETDT is that it can be easily programmed in any standard statistical software. Other adaptations have followed: Morris et al. used a likelihood ratio test. Spielman and Ewens proposed an alternative test of marginal homogeneity (Tmhet) that is similar to Bickeboller and Clerget-Darpoux, allowing for multi-allelic markers. Kaplan et al. used a Monte Carlo approach, called the MC-Tm statistic, and showed that MC-Tm is more powerful than Tmhet and the ETDT. Cleves et al. proposed an exact test implemented using an exact algorithm and Markov chain Monte Carlo (MCMC) simulation. Finally, Schaid proposed testing each allele separately and then using the maximal TDT as the test statistic to infer linkage. He also proposed a class of model-based approaches using a conditional likelihood that analyzes all alleles simultaneously under specific genetic models. The maximal TDT statistic, however, does not follow a chi-square distribution.
Betensky and Rabinowitz provided a refinement of Bonferroni's correction for multiple testing, based on maximal spanning trees, to calculate accurate upper bounds for the type 1 error and p values of the maximal TDT.
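Because the maximal TDT does not follow a chi-square distribution, its significance must be assessed via upper bounds such as those described above, or via resampling. The sketch below is illustrative only (not any published implementation): it computes a per-allele TDT for a multi-allelic marker and a permutation p value for the maximal statistic, exploiting the fact that under the null the transmitted/non-transmitted labels within each heterozygous parent are exchangeable.

```python
import random

def per_allele_tdt(transmissions):
    """transmissions: list of (transmitted, non_transmitted) allele pairs,
    one per informative heterozygous parent.  Returns a dict mapping each
    allele to its di-allelic-collapse TDT chi-square (this allele vs. all
    others)."""
    alleles = {a for t, nt in transmissions for a in (t, nt)}
    stats = {}
    for a in alleles:
        b = sum(1 for t, nt in transmissions if t == a and nt != a)
        c = sum(1 for t, nt in transmissions if nt == a and t != a)
        if b + c:
            stats[a] = (b - c) ** 2 / (b + c)
    return stats

def max_tdt_pvalue(transmissions, n_perm=2000, seed=1):
    """Permutation p value for the maximal per-allele TDT: under H0 each
    parent's transmitted/non-transmitted labels are exchangeable, so we
    flip them at random and recompute the maximum."""
    rng = random.Random(seed)
    observed = max(per_allele_tdt(transmissions).values())
    hits = 0
    for _ in range(n_perm):
        perm = [(t, nt) if rng.random() < 0.5 else (nt, t)
                for t, nt in transmissions]
        if max(per_allele_tdt(perm).values()) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The permutation avoids the distributional problem entirely, at the computational cost noted later for genome-wide settings.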
Extensions of the original TDT from dichotomous to quantitative traits are mainly based on a regression framework in which covariates can be easily modeled. Allison proposed five TDTs for quantitative traits, termed TDTQ1 through TDTQ5. The first four were based on extreme-threshold sampling, while TDTQ5 uses the full distribution of the quantitative trait. TDTQ5 is the most flexible in the sense that it can be easily extended to multiple alleles, multiple loci, gene-environment interaction, etc., and it is also the most powerful of the five. TDTQ5 requires family trios consisting of at least one heterozygous parent and one child. In TDTQ5, the quantitative trait is regressed on offspring genotypes while controlling for parental mating types defined by their genotypes. The test statistic is an F ratio that compares the fit of models with and without the genetic effect in a regression framework that includes the offspring's genotype and parental mating type. Xiong et al. developed a similar approach that allows for more than one child per family.
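As a rough sketch of the regression idea behind TDTQ5 (an illustration of the nested-model comparison, not Allison's exact formulation; the toy data are hypothetical), the F ratio compares a model containing the offspring genotype plus mating-type indicators against a model with the mating-type indicators alone:

```python
import numpy as np

def f_test_nested(y, X_full, X_reduced):
    """F ratio comparing nested OLS models: does adding the offspring
    genotype column(s) to a model of mating-type indicators improve fit?"""
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)
    rss_full, rss_red = rss(X_full), rss(X_reduced)
    df1 = X_full.shape[1] - X_reduced.shape[1]   # extra genetic parameters
    df2 = len(y) - X_full.shape[1]               # residual df of full model
    return ((rss_red - rss_full) / df1) / (rss_full / df2)

# Hypothetical toy data: 6 offspring from two parental mating types.
mating = np.array([0., 0., 0., 1., 1., 1.])   # mating-type indicator
geno = np.array([0., 1., 2., 0., 1., 2.])     # offspring A1-allele count
trait = np.array([1.0, 1.5, 2.2, 2.0, 2.4, 3.1])
ones = np.ones(6)
X_red = np.column_stack([ones, mating])
X_full = np.column_stack([ones, mating, geno])
F = f_test_nested(trait, X_full, X_red)       # refer to an F(1, 3) distribution
```

Conditioning on the mating-type columns is what protects the test against stratification, since offspring genotypes are randomized by Mendelian segregation within mating type.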
A non-parametric TDT for quantitative traits was introduced independently by Rabinowitz. The advantage of this test lies in its flexibility in modeling multiple alleles at the marker locus, inclusion of other siblings, and incorporation of covariates. Sun et al. extended Rabinowitz's approach to include families with only one parent available. All these tests assume that model residuals are independent, and therefore they are applicable, as tests for linkage, only to nuclear family data.
George et al. proposed a regression-based TDT for linkage between a marker locus and a quantitative trait locus, treating the trait as the dependent variable and transmission status, along with other predictors and confounders, as independent variables. This method does not require independence of observations, thus allowing analysis of extended pedigree data as well as modeling of any number of covariates. Zhu and Elston proposed conditional likelihood-ratio test statistics that allow multi-generational data as well as a test either for linkage in the presence of allelic association or for allelic association in the presence of linkage. Abecasis et al. [43, 44] proposed a general test of association for quantitative traits in nuclear families (QTDT) based on Fulker et al.'s variance components approach. Monks and Kaplan introduced three extensions to the TDT for quantitative traits: (1) the TQP statistic uses genotype information for parents and their children; (2) the TQS uses genotypes of at least two siblings having different genotypes in the absence of parental genotypes, and (3) the TQPS, which combines TQP and TQS. Note that the TQP statistic is similar to the statistic proposed by Rabinowitz. Waldman et al. proposed a logistic regression framework, instead of ordinary linear regression, for continuous and categorical data. This framework can be easily extended to include multiple phenotypes by simply including phenotypes as predictors in the regression model, and it can easily accommodate multiple offspring per nuclear family. No phenotype distributional assumptions are required with this approach. Lastly, it does not require stand-alone software: any standard statistical package such as SAS or SPSS can be used for the analysis.
Liu et al. offered a unified framework for TDT analysis of discrete and continuous traits based on a conditional score test that maximizes power to detect small effects for any distribution in the exponential family, regardless of skewness or kurtosis. Kistner and Weinberg proposed a quantitative trait extension of their log-linear approach for qualitative traits. Like the log-linear approach for qualitative traits, their quantitative trait extension allows for population admixture by conditioning on parental genotypes.
Parental genotype data are often difficult or impossible to obtain when studying diseases with adult or late-life onset. Several approaches have been developed to alleviate the problems that arise from missing and incomplete parental genotypic data.
When unaffected siblings are available, their genotype information can be used in tests for allelic transmission. Curtis proposed an extension to the TDT utilizing only sibling pairs discordant for both phenotype and genotype. The S-TDT, a similar approach developed by Spielman and Ewens, requires (1) at least one affected and one unaffected sibling, and (2) that not all members of the sibship have the same genotype at the marker locus. With these requirements met, the S-TDT can be used to analyze linkage disequilibrium between a marker allele and a putative disease allele without reconstructing parental genotypes and without relying on allele frequency estimates. Statistically, the S-TDT tests for significant marker allele frequency differences between affected offspring and their unaffected siblings. Generally, the S-TDT is less powerful than the TDT when parental genotypes are available, because data on the preferential transmission of parental alleles are more informative. In fact, the S-TDT can be used jointly with the TDT to construct a combined test (C-TDT) using nuclear families, trios, and discordant siblings. Schaid and Rowland showed that the S-TDT is equivalent to the conditional likelihood with log-additive effects of the marker alleles.
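The S-TDT's logic can be sketched as follows (a simplified illustration, not the published implementation): within each sibship, genotypes are exchangeable among siblings under the null, so the A1-allele count observed in affected siblings is compared with its expectation and variance under sampling without replacement from the sibship's genotypes.

```python
from math import sqrt

def s_tdt(families):
    """families: list of (affected_counts, unaffected_counts), each a list
    of A1-allele counts (0, 1, or 2) per sibling.  Returns a z statistic:
    under H0, genotypes are exchangeable among sibs within each family."""
    Y = E = V = 0.0
    for aff, unaff in families:
        g = list(aff) + list(unaff)
        n, a = len(g), len(aff)
        # Skip uninformative sibships: no discordance in phenotype
        # or no variation in genotype.
        if n < 2 or a == 0 or a == n or len(set(g)) == 1:
            continue
        mu = sum(g) / n
        var = sum((x - mu) ** 2 for x in g) / n
        Y += sum(aff)                       # observed A1 count in affecteds
        E += a * mu                         # expected under exchangeability
        V += a * (n - a) / (n - 1) * var    # variance, sampling w/o replacement
    return (Y - E) / sqrt(V)
```

The resulting z is referred to a standard normal; conditioning within sibships is what removes the stratification confounder.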
The sibling TDT method of Curtis requires randomly selecting one affected sibling and then selecting one unaffected sibling whose marker genotype differs from that of the affected sibling. To include all available siblings from the same family, Horvath and Laird proposed a sibling disequilibrium test (SDT) based on a standard nonparametric sign test. The SDT is effective in cases where parental information is not available. The data design requirement is the same as for the S-TDT, the only difference being that the SDT is a non-parametric test. In 1998, Boehnke and Langefeld introduced seven association tests for multi-allelic markers which they represent using a 2 × k contingency table (k is the number of alleles at the marker locus), where rows represent disease status and columns represent marker alleles. In some cases these discordant-alleles tests (DATs: AC1, AC2, and ACws) are identical to each other and equivalent to the S-TDT, but the AC2 statistic has the best power overall. Boehnke and Langefeld proposed obtaining p values for these DATs by a permutation procedure that randomly permutes the affection status of the siblings. Risch and Teng [56, 57] noted that one can derive additional information from the sample by analyzing the relative frequency of different sibship genotype configurations. This information can then be used to estimate the mating type frequencies for a di-allelic marker. Weinberg proposed a likelihood approach for families with incomplete parental data. Schaid and Rowland proposed a score test statistic using parents as controls, siblings as controls, or unrelated individuals as controls; their method generalizes the S-TDT and the DAT. In 2000, Siegmund et al. introduced a test of association in the presence of linkage using multivariate regression for correlated outcome data to analyze sibship data.
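A minimal sketch of the SDT's sign-test idea (simplified, with hypothetical family data): each family contributes one sign, the difference between the average A1-allele counts of its affected and unaffected siblings, and the counts of positive and negative signs are referred to a Binomial(n, 1/2) null.

```python
def sdt_signs(families):
    """Sign-test core of the sibling disequilibrium test (sketch).

    families: list of (affected_counts, unaffected_counts) of A1-allele
    counts per sibling.  Returns (n_plus, n_minus): the numbers of
    families in which affected sibs carry more (respectively fewer)
    copies of A1 on average.  Under H0 each informative family is a
    fair coin, so compare to Binomial(n_plus + n_minus, 1/2).
    """
    n_plus = n_minus = 0
    for aff, unaff in families:
        if not aff or not unaff:
            continue  # need both phenotype classes in the sibship
        d = sum(aff) / len(aff) - sum(unaff) / len(unaff)
        if d > 0:
            n_plus += 1
        elif d < 0:
            n_minus += 1   # ties contribute nothing, as in a sign test
    return n_plus, n_minus
```

Because only the sign of each within-family difference is used, the test makes no distributional assumptions, which is the sense in which the SDT is non-parametric.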
Bias can arise in the TDT statistic when information is available from only one heterozygous parent, leading to higher false positive rates. Sun et al. introduced the 1-TDT to detect linkage between a candidate marker locus and a disease locus using genotypes of affected individuals and only one available parent per affected individual. The 1-TDT is a valid test of the null hypothesis of no linkage or no association. In 2000, Wang and Sun derived the sample size needed to detect linkage disequilibrium with the S-TDT and 1-TDT, finding that the sample size required for the 1-TDT is roughly the same as for the S-TDT with one affected and one unaffected sibling, and about twice that needed for the original TDT. Clayton, Weinberg, and Cervino and Hill also provided extensions to the TDT when one parent is missing. Allen et al. developed parental-controlled association tests for a di-allelic marker and disease that are valid when parental genotype data are informatively missing (i.e., when the missing genotype of a parent influences the probability of the parent's genotype data being observed). Allen et al. also proposed a multi-allelic extension of their missingness model, which incorporated a bootstrap calibration of missing-at-random (MAR) procedures to account for informative missingness.
For some families it might be possible to reconstruct the genotypes of missing parents. However, Curtis, Spielman and Ewens, and Knapp pointed out that reconstructing genotypes to achieve more power for the TDT procedure can introduce bias. Knapp proposed a statistical procedure to overcome the potential bias induced by parental genotype reconstruction, incorporated this bias-corrected reconstruction approach into the C-TDT, and called the resulting procedure the reconstruction combined TDT (RC-TDT). Comparisons showed that the RC-TDT is more powerful than the S-TDT.
Because no inference on linkage disequilibrium can be obtained from homozygous parents or other non-informative transmissions, these types of nuclear families are not included in the classical TDT analysis. This problem is often encountered when using binary markers, such as single-nucleotide polymorphisms (SNPs), which are highly abundant throughout the genome and cost-effective. The maximum frequency of heterozygotes at a binary marker locus in Hardy-Weinberg equilibrium is 0.5; in this scenario, at least half of the parents would be non-informative in a traditional TDT. Analyzing marker haplotypes is a relatively straightforward solution. However, the haplotype phase is often uncertain, and restricting analyses to pedigrees where the phase is known may lead to bias. As a result, Clayton proposed a new approach to TDT methods using tests based upon score vectors which are averaged over all possible parental haplotypes and transmissions consistent with the observed data (TRANSMIT 2.5.4 documentation: www-gene.cimr.cam.ac.uk/clayton/software). At its implementation, this approach possessed three distinct advantages over earlier TDT methods: (1) it could use any available parental data; (2) it could use multiple affected offspring in the analysis, and (3) it was the only approach that could adequately deal with phase uncertainty in multilocus haplotypes. The TRANSMIT program also implements Allen et al.'s bootstrap calibration of missing-at-random procedures to account for informative missingness.
The premise behind sibling-based tests of quantitative traits in a regression framework is that any test of association between a genetic marker and a phenotype is also a valid test of linkage if one conditions on parental genotypes, since full siblings are nested within parental genotypes. Allison et al. proposed two sibling-based tests of linkage and association for quantitative traits. One is a mixed model, in which the genotype is modeled as a fixed effect and the sibship as a random effect. This test is extremely flexible and can be implemented in standard statistical software. It allows for multiple alleles at the marker locus, sibships of any size, multiple loci, gene-gene interaction, gene-environment interaction and any number of additional covariates. The second procedure of Allison et al. is a permutation test. Schaid and Rowland proposed another TDT for quantitative traits that allows for missing parental data. Van den Oord, Whittemore and Tu, Rabinowitz and Laird, and Horvath et al. have all offered methods incorporating missing data.
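The permutation variant can be sketched as follows (an illustration of the idea, not Allison et al.'s exact statistic, and the data layout is hypothetical): permuting genotypes only within each sibship conditions on the parental mating type, so population stratification cannot inflate the test.

```python
import random

def sibship_permutation_test(sibships, n_perm=2000, seed=7):
    """Within-sibship permutation test of genotype-trait association.

    sibships: list of sibships, each a list of (genotype_score, trait)
    pairs.  The statistic sums within-sibship genotype-trait
    cross-products; genotypes are shuffled only WITHIN each sibship,
    which conditions on the (unobserved) parental mating type.
    """
    def stat(data):
        s = 0.0
        for sib in data:
            gs = [g for g, _ in sib]
            ys = [y for _, y in sib]
            gbar, ybar = sum(gs) / len(gs), sum(ys) / len(ys)
            s += sum((g - gbar) * (y - ybar) for g, y in sib)
        return s

    rng = random.Random(seed)
    obs = abs(stat(sibships))
    hits = 0
    for _ in range(n_perm):
        perm = []
        for sib in sibships:
            gs = [g for g, _ in sib]
            rng.shuffle(gs)  # exchangeable within the sibship under H0
            perm.append(list(zip(gs, (y for _, y in sib))))
        if abs(stat(perm)) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Any between-sibship (and hence between-subpopulation) trait or allele-frequency differences are held fixed by the permutation scheme, which is the essential robustness property.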
Traditional TDT-type tests in trios or discordant sibships assume that observations are independent, an assumption violated when trios or discordant sibships from the same extended family are used. Thus, when larger pedigrees are investigated with these methods, only one unit from the pedigree is analyzed and the rest of the pedigree's information is discarded. As a result, the pedigree disequilibrium test (PDT) was developed. Using this method, the average disequilibrium for each general pedigree is treated as an individual observation. Martin et al. proposed two alternatives to the PDT which correct for bias when multiple generations contribute to the disequilibrium, the genetic effect due to the locus is strong, and marker-allele frequencies are uneven. The PDT-avg averages all phenotypically informative units regardless of heterozygosity in trios or informative discordant sibships, whereas the PDT-sum removes the within-family LD average from the original PDT. The PDT-avg gives equal weight to all families, whereas larger families are more heavily weighted in the PDT-sum. The geno-PDT was later developed to test genotype-specific association in general pedigrees [76, 77].
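The PDT's central device, treating one summary disequilibrium value per pedigree as the independent unit, reduces to a simple ratio. The sketch below is illustrative rather than the exact published statistic, and it leaves abstract how each pedigree's D value is computed (e.g., as the average of transmitted-minus-non-transmitted allele counts over that pedigree's informative trios and discordant sib pairs).

```python
def pdt(pedigree_D):
    """Pedigree disequilibrium test, sketched.

    pedigree_D: one summary disequilibrium value D_i per pedigree.
    Under H0 each D_i has mean zero, and because pedigrees (not trios)
    are the independent units, T = (sum D)^2 / sum D^2 can be referred
    to a chi-square with 1 df.
    """
    s = sum(pedigree_D)
    ss = sum(d * d for d in pedigree_D)
    if ss == 0:
        raise ValueError("no informative pedigrees")
    return s * s / ss
```

Note how pedigrees with larger |D_i| dominate the denominator, which is the weighting issue the PDT-avg and PDT-sum variants above address in different ways.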
Testing non-random transmission of an allele from parent to affected offspring follows similar statistical methodology regardless of locus, pedigree, or trait characteristics. Laird et al. took advantage of this commonality in developing the FBAT statistic, which generalizes some of the more specific TDT-type tests, including the original TDT, the S-TDT, and the RC-TDT. At the time of implementation, the FBAT statistic could be manipulated through a set of user-defined codings to analyze data from di-allelic or multi-allelic loci and dichotomous, quantitative, or censored traits. At present, the FBAT software can accommodate a wide variety of pedigree structures, genetic models, and trait characteristics, as well as perform haplotype analysis and test multiple markers simultaneously.
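The common core that FBAT generalizes can be sketched for the simplest case, a di-allelic marker with additive coding and affected-offspring trios; this is an illustration of the score-statistic form (offspring marker score minus its Mendelian expectation given parental genotypes), not the FBAT software itself.

```python
from itertools import product
from math import sqrt

def offspring_dist(p1, p2):
    """Mendelian distribution of the offspring A1-allele count, given
    parental genotypes coded as A1 counts (0, 1, or 2)."""
    gametes = {0: [0], 1: [0, 1], 2: [1]}
    counts = {}
    for a, b in product(gametes[p1], gametes[p2]):
        counts[a + b] = counts.get(a + b, 0) + 1
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def fbat_z(trios):
    """FBAT-style score (sketch): trios is a list of (p1, p2, child)
    A1-allele counts, all children affected.  U sums the child's
    observed-minus-expected marker score given the parents; conditioning
    on parental genotypes is what protects against stratification."""
    U = V = 0.0
    for p1, p2, x in trios:
        dist = offspring_dist(p1, p2)
        e = sum(g * p for g, p in dist.items())
        v = sum((g - e) ** 2 * p for g, p in dist.items())
        U += x - e
        V += v
    return U / sqrt(V)  # approximately N(0, 1) under H0
```

Changing the offspring coding (dominant, recessive, or a trait-weighted score) changes which of the specific TDT-type tests this score reproduces, which is exactly the flexibility the user-defined codings provide.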
Schaid and Sommer [22, 79] developed a likelihood procedure for trios by modeling the probability of an affected offspring's genotype, conditional on parental genotypes, as a function of the offspring's genotype relative risks. In 2000, Whittemore and Tu developed a class of likelihood-based score tests for arbitrary family structure and incomplete data, extending the work of Schaid and colleagues [22, 35, 79, 80]. The score statistic comprises two components, a non-founder statistic (NFS) and a founder statistic (FS). The non-founder statistic evaluates transmission disequilibrium from parents to offspring and is based on the conditional distribution of the offspring genotypes given the observed or inferred genotypes of their parents; it is a direct extension of the TDT. The founder statistic compares marker genotypes in the family founders with those expected under the null hypothesis. In a companion paper, they examined these two statistics using nuclear family data. Shih and Whittemore further extended previous work by Whittemore and others [63, 71, 79, 81] to accommodate affected and unaffected offspring and missing parental genotypes, and to include other phenotypes such as censored survival data and quantitative traits. These algorithms are implemented in the Family Genotype Analysis Program (FGAP). Whittemore and Halpern compared FGAP with FBAT and another alternative association test proposed by Rabinowitz. They observed that FBAT procedures tended to have less power than the other two tests, particularly when applied to families in which all offspring were affected. The Rabinowitz test and the tests implemented in FGAP performed equally well with respect to overall statistical power.
Methodology is still being developed to improve the power and robustness of TDT approaches across the various forms of ascertainment, genotype, and phenotype characteristics. The informative-transmission disequilibrium test (i-TDT) improves on the design of extended pedigree analysis first addressed by Martin et al. The i-TDT is a valid joint test of linkage and association that is more powerful than the alternative approach in FBAT because it also incorporates transmission information from heterozygous parents to unaffected offspring. A recent study expanded the robustness of QTDT methods that rely upon a normality assumption by developing methodology to adequately analyze linkage disequilibrium when traits are not normally distributed. These developments show that TDT methods can still be extended further.
Horvath et al. modified the S-TDT for X-linked diseases (XS-TDT). In addition, they extended the RC-TDT to the X-linked reconstruction-combined TDT (XRC-TDT). These tests make no assumption about the mode of disease inheritance or the ascertainment of the sample, and they protect against spurious association due to population stratification, similar to the S-TDT and RC-TDT. The XRC-TDT employs parental-genotype reconstruction by combining data from families in which parental genotypes are available with data from families in which genotypes of unaffected siblings are available but parental marker information is incomplete, and corrects for the biases resulting from the reconstruction. It does not depend on population allele frequencies, and it outperforms the X-linked S-TDT with respect to power. Also, a freely available SAS implementation of these tests allows for the calculation of exact p values. Ho and Bailey-Wilson independently extended the TDT, S-TDT, and C-TDT for X-linked loci, terming them the X-linked TDT, X-linked S-TDT, and X-linked C-TDT.
Recently, several approaches to association/linkage mapping have been proposed that utilize data from multiple loci simultaneously. These methods are based on the assumption that cases are expected to share not only the disease allele but also the haplotype of flanking markers surrounding it. This led van der Meulen and te Meerman to propose a haplotype sharing statistic comparing the extent of similarity between transmitted and untransmitted haplotypes. Wilson extended the TDT to include information on two linked multi-allelic markers instead of a single marker, following the likelihood ratio test of Sham and Curtis's ETDT. She also described how the contribution from each locus could be evaluated, both separately and jointly. Collins and Morton proposed a likelihood-based procedure for haplotype sharing in the case-control setting. Clayton and Jones offered a haplotype TDT for both qualitative and quantitative traits. However, Wilson and Clayton and Jones assume that the haplotypes of the parents are known, so their methods are not applicable to phase-unknown data. McPeek and Strahs introduced the decay-of-haplotype-sharing procedure by modeling the decay of sharing of the ancestral haplotype among descendants, where the number of haplotypes with common ancestral DNA decreases with increasing genetic distance from the variant. Zhao et al. proposed variations of the TDT using multiple tightly linked markers based on phase-known or phase-unknown haplotypes of parental data. In 2000, MacLean proposed the trimmed-haplotype test for linkage disequilibrium, applied to both parent-offspring trios and multiplex pedigrees. There are multiple other extensions that also allow for multiple linked markers [98,99,100,101,102,103,104,105,106,107,108,109,110]. Furthermore, multiple marker loci can be accommodated when considering quantitative traits in a regression framework.
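The basic ingredient of these methods, the length of haplotype sharing around a candidate position, can be sketched as follows (a simplified illustration with phase assumed known, and marker distances measured simply in marker counts rather than genetic map units):

```python
def sharing_length(h1, h2, pos):
    """Number of consecutive markers around index `pos` at which two
    haplotypes (lists of alleles) carry identical alleles; 0 if they
    differ at `pos` itself."""
    if h1[pos] != h2[pos]:
        return 0
    left = pos
    while left > 0 and h1[left - 1] == h2[left - 1]:
        left -= 1
    right = pos
    while right < len(h1) - 1 and h1[right + 1] == h2[right + 1]:
        right += 1
    return right - left + 1

def mean_pairwise_sharing(haplotypes, pos):
    """Average sharing length over all haplotype pairs.  Comparing this
    quantity between transmitted and non-transmitted haplotypes is the
    core of the haplotype sharing statistic (sketch)."""
    pairs = [(i, j) for i in range(len(haplotypes))
             for j in range(i + 1, len(haplotypes))]
    return sum(sharing_length(haplotypes[i], haplotypes[j], pos)
               for i, j in pairs) / len(pairs)
```

Near a disease locus, transmitted haplotypes descend from a common ancestor more often than non-transmitted ones, so their mean sharing length is expected to be longer, decaying with distance from the variant as McPeek and Strahs model explicitly.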
Using this methodology, epistatic effects can be investigated and haplotypes of linked loci can be treated as multi-allelic markers.
However, two major issues must be considered more carefully with these haplotype sharing procedures. First, haplotype sharing procedures generally assume that the haplotype phase is known or inferred accurately; yet misspecification of the distribution of parental haplotypes can lead to substantial bias in parameter estimates even when complete genotype information is available. To resolve this problem, Allen et al. proposed a geometric approach to estimation in the presence of nuisance parameters and derived locally efficient tests and estimators of haplotype effects that are robust to misspecification of the haplotype frequency distribution. Allen and Satten generalized a previous result of Allen et al., allowing for missing genotype data and haplotype × environment interactions. Second, Allen and Satten pointed out that variance estimation for haplotype sharing statistics is either very complex [100, 110] or requires the use of permutation testing [91, 101, 102,104,105,106, 108]. Permutation testing can be computationally prohibitive in genome-wide association studies, and permutation variances may be invalid if the model used for the reconstruction of haplotypes is invalid (i.e., Hardy-Weinberg equilibrium is not met in the data). Therefore, Allen and Satten proposed a simple framework for a class of haplotype sharing statistics for association testing in case-parent trio data, providing a simple variance estimator for these statistics.
Genomic imprinting, also known as the ‘parent-of-origin effect’, is an epigenetic phenomenon. A natural way to identify parent-of-origin effects is to stratify transmission/non-transmission allele counts by parental origin and test for their symmetry. To date, more than 1700 mutations with parent-of-origin effects are catalogued in the University of Otago database (http://igc.otago.ac.nz/home.html). Wilcox et al. proposed a simple method to analyze case-parent trios in an effort to detect maternal genetic risk and estimate relative risks associated with both the mother's and the offspring's genotype. Weinberg et al. further extended the TDT to detect parent-of-origin effects based upon a log-linear likelihood approach. In 1999, Weinberg provided a new test of imprinting which resolved deficiencies in her previous test. Recently, Zhou et al. and Hu et al. [118, 119] proposed several methods to detect imprinting in a TDT-type framework. Van den Oord used mixture models to perform a test of parent-of-origin effects for quantitative traits. Furthermore, imprinting can be tested for quantitative traits in a regression framework by coding an additional dummy variable indicating which parent is heterozygous in a heterozygote-homozygote mating; a significant interaction between this variable and the offspring's genotype would indicate imprinting.
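The stratify-and-test-symmetry idea can be sketched as a two-proportion comparison (a deliberately simplified illustration, not Weinberg's log-linear likelihood): split the TDT transmission counts by parental origin and test whether the maternal and paternal transmission proportions differ.

```python
from math import sqrt

def parent_of_origin_z(b_mat, c_mat, b_pat, c_pat):
    """Two-proportion z test for a parent-of-origin effect (sketch).

    b_mat/c_mat: heterozygous mothers transmitting / not transmitting A1;
    b_pat/c_pat: same counts for heterozygous fathers.  Under no
    imprinting the maternal and paternal transmission proportions are
    equal, so z is approximately N(0, 1) under H0.
    """
    n_mat, n_pat = b_mat + c_mat, b_pat + c_pat
    p_mat, p_pat = b_mat / n_mat, b_pat / n_pat
    p = (b_mat + b_pat) / (n_mat + n_pat)   # pooled proportion under H0
    se = sqrt(p * (1 - p) * (1 / n_mat + 1 / n_pat))
    return (p_mat - p_pat) / se
```

Note that a locus can show overall transmission distortion (a significant TDT) with no asymmetry between parental origins, and vice versa; the two tests address different null hypotheses.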
The genetic architecture of complex traits may involve multiple genetic or environmental factors and interactions between them. Multiple alleles at a marker locus associated with disease susceptibility may differ in their sensitivity to certain environmental exposures. Most methods developed for G×E interaction using trios implicitly assume that an individual's environmental exposure status is independent of his or her genotype at the candidate locus, conditional on the parents' genotypes. In 1999, Schaid proposed likelihood-based methods to assess interaction. A similar test was proposed by Umbach and Weinberg based upon likelihood ratio tests. Alternatively, Eaves and Sullivan used a logistic regression approach, extending the original tests of main effects proposed by Sham and Curtis. This method provides separate tests of the main effects and the interaction effects. Lunetta et al. proposed family-based tests for association and linkage by constructing a score statistic based upon the likelihood of the phenotypic distribution given individual genotype. Their method is available in the FBAT software.
Several authors have pointed out that effective use of multivariate phenotypes can potentially enhance the power of linkage analysis [124,125,126,127,128,129,130,131,132,133,134]. Analyzing each phenotype separately requires correction for multiple testing. Alternatively, multiple phenotypes can be treated jointly as predictor variables in a TDT-based logistic regression framework, with transmission status as the dependent variable, an analysis analogous to multivariate analysis of variance (MANOVA).
In a number of human populations, inbreeding is common and even encouraged. Bennett and Curnow and Génin et al. first investigated the consequences for the TDT of using related parents. The TDT remains a valid test of linkage in the presence of inbreeding but is not a valid test of association. However, when inbreeding is taken into account and there is no recombination between the disease susceptibility locus and the marker locus (θ = 0), power to detect linkage is gained under certain genetic models: (1) a recessive mode of inheritance with a disease allele frequency <0.5, (2) multiplicative or additive models, and (3) a dominant mode of inheritance. Moreover, power increases with the inbreeding coefficient but is considerably reduced as linkage disequilibrium between the marker and the disease susceptibility locus decreases.
Information may be lost when known prior trait information (e.g. mode of inheritance, penetrance) is not incorporated into TDT-type analysis methods. In these circumstances, Bayesian methods are an excellent alternative to more common frequentist approaches. For example, when the mode of inheritance is known, incorporating this information results in an increase in power. Additionally, joint and marginal posterior distributions of the recombination fraction and the disequilibrium coefficient may be obtained. However, the Bayes factor, a measure of the relative support the data give to two competing models, is not designed for frequentist error control.
The joint test approach was applied to TDT designs in order to capitalize on the information available in both covariance-based and marginal-based tests of linkage. The results showed that a multinomial joint test provides the highest overall power irrespective of allele frequency or mode of inheritance .
In 2004, Nagelkerke et al. proposed a likelihood-based association analysis of data comprising case-parent trios, unrelated controls, and possibly some unrelated cases. Nagelkerke et al. provided ad hoc procedures to determine whether trio and unrelated data can be safely combined. Epstein et al. modified the Nagelkerke et al. approach and provided formal statistical procedures to determine when it is appropriate to combine trios, unrelated controls, and unrelated cases in a single association analysis.
Successfully identifying genes by linkage and association analyses using family-based designs can be difficult because the sample size required to achieve adequate power is often not attainable . Hence, many researchers have turned to population-based association studies as a powerful tool for identifying these variants that underlie complex disease risk .
Genetic association studies aim to correlate differences in disease frequency with differences in allele frequencies at a particular genetic locus, where a specific allelic variant is either a direct disease-causing variant or is in linkage disequilibrium (LD) with the disease-causing variant. The most commonly used design in population-based association studies is the case-control design. As with any study design, the case-control design assumes that differences in allele frequencies between cases and controls relate directly to the trait of interest; in other words, that there are no confounding effects.
Allele frequencies, however, are known to vary widely within and between populations, and these differences are widespread throughout the genome [143, 144]. When cases and controls have different allele frequencies attributable to variation in genetic ancestry within or between race/ethnicity groups, population stratification (PS) is said to be present, and ancestry becomes a confounding variable leading to spurious associations in the analysis. Redden and Allison  have shown that, contrary to popular conceptions, admixture-like patterns and spurious associations can occur in the presence of non-random mating patterns that would traditionally not be considered admixture, and they have evaluated the extent to which genomic control  and structured association testing [12, 13] can manage this potential confounding. PS is not only present in recently admixed populations like African Americans and Latinos [147,148,149], but also in European-American populations [150,151,152,153] and historically isolated populations including Tibeto-Burmans and Icelanders [154, 155].
As previously discussed, a consequence of PS in association studies is the potential for bias in the estimate of allelic associations due to deviations from the Hardy-Weinberg equilibrium and the induction of linkage disequilibrium [156, 157]. In order for bias due to PS to exist, both the frequency of the marker variant of interest and the background disease prevalence must vary significantly by race/ethnicity [158, 159]. If either of these conditions is not fulfilled, bias due to PS cannot occur. Bias due to PS can induce both false positive [1, 3, 4, 8] and false negative associations . Controlling for self-reported race has generally been thought to suffice , but recent data shows that matching on ancestry is more robust; however, in many populations, whether recently admixed or not, individuals are not aware of their precise ancestry [4, 162].
No true consensus has been reached on how to test and/or adjust for population stratification [158, 159, 163], although many methods have been developed [11,12,13, 146, 164, 165]. Here, we provide short descriptions of association methods designed to identify genetic variants predisposing to the trait while simultaneously controlling for stratification in population-based data. These methods can be grouped into three categories: Genomic Control (GC), Structured Association Testing (SAT), and Regional Admixture Mapping (RAM).
One of the early methods developed to control for confounding induced by PS or admixture was genomic control. It is based on the observation that the false positive rate (type I error) increases in the presence of PS. The GC technique uses a set of non-candidate, random markers (sometimes called null markers) to estimate an inflation factor, λ; λ equals 1 when no population stratification is present. Estimators of λ have been proposed for additive genetic models [146, 166] and for dominant/recessive models. The inflation is assumed to be caused by population stratification, and the GC method corrects the standard χ2 association test by this factor, where the new χ2/λ test statistic still has a χ2 distribution. GC therefore applies a uniform adjustment to all association tests, assuming the same inflation factor throughout. A main assumption of this method is that if the study population comes from a larger population made up of a mixture of subpopulations with different disease prevalences and disease allele frequencies, then the χ2 association test statistic follows a non-central χ2 distribution. If the non-centrality parameter is truly small, then adjusting by the estimated inflation factor λ approximates the distribution well; however, if the non-centrality parameter is truly large, then adjusting by the estimated inflation factor λ will not be sufficient to prevent false positive associations and loss of statistical power. If AIMs are used instead of random markers, more false positive associations will result simply because the AIMs show large population differences in allele frequencies, and there will be a tendency toward over-correction. However, GC is computationally simple to implement and interpret. In addition, Bacanu et al. have shown that the GC approach is more powerful than the TDT in the absence of substantial population stratification.
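A minimal sketch of the GC correction on simulated null statistics, assuming an additive model with 1-df χ2 tests; the inflation level of 1.5 is an arbitrary illustration, and 0.456 is approximately the median of a 1-df χ2 distribution:

```python
import numpy as np
from scipy import stats

def gc_correct(chisq):
    """Genomic control: estimate lambda as the median of the 1-df chi-square
    statistics at null markers divided by the theoretical 1-df median
    (~0.456), then divide every test statistic by lambda."""
    lam = max(np.median(chisq) / stats.chi2.ppf(0.5, df=1), 1.0)
    return chisq / lam, lam

rng = np.random.default_rng(0)
# Null 1-df statistics uniformly inflated by stratification (true lambda = 1.5)
inflated = 1.5 * rng.chisquare(df=1, size=10_000)
corrected, lam_hat = gc_correct(inflated)
```

After correction, the median of the adjusted statistics is pulled back to the theoretical null median, which is exactly the uniform adjustment described above.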
Some structured association methods utilize Bayesian techniques to assign individuals to ‘clusters’ or subpopulation classes using information from a set of non-candidate, unlinked loci and then test for association within each ‘cluster’ or subpopulation class [2, 8, 12, 13, 19, 169]. The clusters are determined using individual ancestry estimates or principal component analysis (PCA). To estimate the ancestry of each individual, genetic markers with different allele frequencies in the founding populations (called ancestry informative markers, or AIMs) are used to estimate the proportion of an individual's genome derived from each founding population. These proportions are then used to cluster individuals into subpopulations and to control for population structure during association testing. Alternatively, one can use PCA to estimate a genetic background score for each individual based on the AIMs and control for stratification by accounting for the variation in the data associated with differences in allele frequencies [10, 15, 18, 170]. Satten et al. proposed a latent class logistic regression procedure that simultaneously estimates PS and tests for association between a marker allele and a binary phenotype, assuming the marker loci are unrelated to disease and in linkage equilibrium with a putative disease gene within each subpopulation. The advantage of this model is that it offers a unified treatment of both association and PS, using straightforward likelihood estimation that accounts for substructure differences between cases and controls, which are ignored by Pritchard et al. [12, 13]. However, it has a significant drawback: no software is available for the analysis, which is complex given the number of nuisance parameters involved.
RAM methodology builds upon the following premises: (1) the disease mutation occurred in one population and propagated into another through inter-mating (i.e. the prevalence of the disease in the donor population is higher than in the recipient population); (2) the process of recent admixture creates disequilibrium among linked loci that tends to extend over longer genetic distances in the admixed population than in the non-admixed population, and (3) the degree of region-specific individual admixture will covary with disease-predisposing loci and will remain in disequilibrium with such loci even after appropriately adjusting for the degree of overall individual ancestry. Rife was the first to point out that hybrid populations can provide useful information regarding linkage. RAM is also known as admixture mapping [17,19,20,21, 169, 172], mapping by admixture linkage disequilibrium (MALD), and marker location-specific ancestry mapping. RAM is a form of association testing in which genome-wide ancestry estimates and region-specific ancestry estimates are used to identify specific regions of the genome potentially harboring loci predisposing to the disease or trait.
McKeigue may have been the first to introduce RAM based on sound statistical principles to control for spurious associations induced by variations in ancestry . McKeigue noted that, if one conditioned on the admixture of the individuals’ parents, linkage could be detected by testing for the association of a phenotype with the ancestry of alleles at a marker locus in an admixed sample. Using a combination of Bayesian and frequentist approaches, he employed a Hidden Markov Model (HMM) to generate the posterior distribution of individual admixture in the population and utilized likelihood-based score statistics for linkage testing [174,175,176,177]. In the RAM method of Patterson et al. , a HMM is used to scan the genome to identify regions associated with a particular trait or phenotype by simultaneously estimating individual admixture proportion and individual region-specific admixture proportions, and testing for linkage of specific genomic regions to specific phenotypes. Patterson et al.  used a likelihood ratio test for the case-only design but offered a simple t test for use in a case-control design; these algorithms are available in the software AncestryMap.
Zhu et al. developed a method that has several advantages and is intended as an extension of McKeigue's approach [174, 176]. In practice, the assumptions made by McKeigue [174, 176] about admixture patterns are unlikely to hold in natural populations, resulting in an inflated type I error rate when testing for linkage by McKeigue's method. Zhu et al. generalized the approach to allow for two different admixture models: (1) hybrid isolation admixture, and (2) a continuous gene flow model. Their method is very similar to Patterson et al.'s approach for case-only testing but uses a two-stage rather than a simultaneous estimation and testing procedure.
Montana and Pritchard also introduced a test of whether cases’ region-specific admixture values differ significantly from their genome-wide admixture values, as implemented in the MALDsoft software. They recommend that the following be collected for proper analysis: (a) a sample of affected individuals from the admixed population; (b) a sample of unaffected or random control individuals, also from the admixed population, and (c) ‘learning samples’ consisting of random individuals from each of the ancestral populations (or a close approximation thereof) that can be used to estimate the ancestral allele frequencies. However, they note that ‘it is preferable but not required to have both controls and learning samples.’ Montana and Pritchard also use a HMM to estimate the admixture proportions in a first stage and then conduct simple t tests in a second stage. In a highly similar paper, Zhang et al. developed a simultaneous HMM framework for estimating region-specific and genome-wide individual admixture values and then incorporated these values in a logistic-regression-like framework to model case-control data. They present simulation results suggesting that their method works well for both hybrid-isolation and continuous gene-flow models. Redden et al. developed a method based on a generalized linear model that can accommodate both SAT and RAM tests and can be used in standard statistical software, such as SAS. Recently, Clarke and Whittemore proposed an admixture mapping test for a case-only study design; the test compares a case's ancestry as inferred from his or her marker genotypes with the ancestry inferred from family information.
Several software packages are available to analyze recently admixed populations. These methods generally fall into one of two categories: Bayesian or maximum likelihood (ML). There are advantages and disadvantages to both the Bayesian methods (AdmixMap, AncestryMap, and Structure [2, 8, 19, 169]) and the ML methods (FRAPPE, IBGA, and PSMIX [172, 179, 180]). Table 1 lists these available programs and compares their important features. PSMIX was chosen as a representative of the very similar ML methods. Structure is by far the most popular of these programs and has been used by many investigators for a wide variety of populations and complex phenotypes.
A concern with all of these methods (except some of the ML methods) is the limited amount of testing that has been conducted. Several authors reported correlations of their estimates with Structure estimates, but not with true admixture as determined by simulation. Tang et al. provide the only direct evaluation of individual admixture estimates. They show via simulation that, with informative markers and well represented parental populations, both Structure and FRAPPE estimates perform well. With less informative markers, or only a few members of the parental populations, the FRAPPE estimates remain unbiased while Structure estimates can be highly biased.
Exploring how the various methods are connected can help to identify why these methods work, what their underlying assumptions are, and which are simply special cases of, and redundant with, others, and can potentially point to gaps where new methods may be needed or ways existing methods might be improved. Here we describe two frameworks with which one can conceptualize the extant methods. We then categorize the methods via these frameworks, as summarized in table 2.
In tests of association in the presence of linkage (TALs) (aka, tests of linkage in the presence of association; joint tests of linkage and association), we wish to identify situations in which (A) genotypes at a marker locus (either directly or indirectly through intermediary phenotypes) cause variations in a phenotype; or (B) the marker locus is in linkage disequilibrium with another locus at which genotypes cause variations in the phenotype; and to distinguish those situations from (C) situations in which genotypic variation at the marker locus is correlated with (but not linked to) some other inherited factor that causes variation in the phenotype. Although less commonly discussed, we also wish to (D) identify marker loci that are both linked to loci that (either directly or indirectly through intermediary phenotypes) cause variations in a phenotype and, when certain other variables are conditioned on, also associated with loci that (either directly or indirectly through intermediary phenotypes) cause variations in a phenotype; even in situations where (E) genotypic variation at the marker locus is not associated with variations in the phenotype in the absence of conditioning on those other variables (i.e., when the association is masked or suppressed).
Everything in the preceding paragraph is just another way of saying that we need to control for potential confounding so that we may infer a causal influence on the phenotype of the marker locus itself or of something in linkage disequilibrium with it. The ultimate source of the potential confounding that we wish to control for in TDT-type tests is non-linkage disequilibrium (NLD), i.e., correlation or disequilibrium among unlinked loci. NLD can result from many sources including selection, assortative mating, and the admixture process.
There is a rich literature on detecting causal effects in scientific research. To the extent that causation can ever be determined, most methodologists concur that we can have no stronger basis than a randomized experiment . This is because the act of randomization assures that, in the hypothetical population to which we wish to make inferences (not the specific sample in hand), there can be no association between the independent variable to which we assign subjects and any variable that existed prior to randomization. Therefore, randomization is the only method that controls for both known and unknown sources of confounding. Thus, in an ideal world, we would randomly assign individuals to genotypes at marker loci and then do our tests without any concern of confounding. Of course, in reality, this is not possible. So what is the next best thing?
We can find the root of the next best thing in Gregor Mendel's second law of genetics, the law of independent assortment. Mendel wrote ‘All constant combinations which in peas are possible by the combination of the said 7 differentiating characters were actually obtained by repeated crossing. Their number is given by 2^7 = 128. Thereby is simultaneously given the practical proof that the constant characters which appear in the several varieties of a group of plants may be obtained in all the associations which are possible according to the laws of combination, by means of repeated artificial fertilization.’2 A more formal statement of the law of independent assortment is ‘When gametes are formed, the alleles for one trait segregate independently of the alleles of a gene for another trait.’ In other words, Mendel believed that genes for different traits segregate independently. We now know that this is only true for genes at unlinked loci. Nevertheless, Mendel's second law implies that every act of meiosis is an act of randomization in which parents randomly assign alleles to the gametes they form from their available alleles. This further implies that, conditional upon the parents’ genotypes, all individuals have an equal probability of inheriting (i.e., being assigned to) any particular genotype. Thus, conditional upon the parents’ genotypes, individuals are essentially randomized to genotypes. The only caveat (which often works in our favor in genetic research) is that the genotypes to which individuals are randomly assigned at one locus will be correlated with the genotypes to which they are assigned at other loci, but only when the loci in question are physically linked. Hence, conditioning on the parents’ genotypes offers us a natural randomized experiment that eliminates the possibility of confounding by NLD.
It does not eliminate potential confounding by LD, but this ‘confounding’ by LD is actually just what we are counting on to help us identify genes in many association studies (especially genome-wide association studies).
How can we condition on parents’ genotypes? There are several ways in which this can be achieved. The first and most straightforward would be to begin with two individuals of opposite sex that are homozygous at every locus. At many loci, however, the two individuals will differ from each other. If such individuals produce a large number of offspring, these offspring will all be genetically identical and heterozygous at every locus at which the two parents differed. These offspring can then be intermated to produce another generation. In the second generation descending from the original set of parents (conventionally denoted the F2 generation), every individual has an equal probability of being assigned to each genotype compared with every other individual. Thus, individuals are essentially randomized to genotypes and we have the equivalent of a true experiment with randomization. This is essentially a description of an F2 cross among inbred lines, classically used to map genes for complex traits in animals such as mice and flies. It is noteworthy that the individuals comprising the F2 population are admixed, and yet there is no concern about confounding due to admixture because all individuals have the same ancestry. As pointed out by Redden et al., this indicates that it is variation in ancestry, not admixture per se, that can cause confounding by NLD. Thus, the F2 cross among inbred lines can be seen as the geneticist's experiment in which meiosis is used to enact the process of randomization. It can also be seen as a precursor to the TDT.
Of course, we cannot set up inbred lines and do controlled breeding in humans. How then can we achieve similar objectives? We can do so by recognizing that, in order for individuals to be assigned essentially at random (i.e. with equal probability across individuals) to genotypes at the marker locus, it is only necessary that their parents have the same genotypes at the marker locus, not that their parents be genetically identical to every other set of parents at all loci. Hence, we could select only individuals whose parents all had a common pair of genotypes at the marker locus. For example, at a di-allelic locus with alleles A and a, we could select only individuals in which one parent was AA and the other was Aa. In the offspring, we could then assess whether individuals who ‘randomly’ receive an ‘a’ allele from one of their parents tend to be phenotypically different from individuals who receive no ‘a’ alleles from their parents. Such a design would, at the locus in question, essentially recapitulate a backcross in an experimental population such as mice, in which heterozygotes at the F1 generation are backcrossed to one of the parental strains. A design in which we selected only individuals whose parents had the genotypes Aa and Aa would essentially recapitulate an F2 cross at that locus.
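The backcross analogue can be sketched in a few lines of simulation; the effect size and sample size are arbitrary, and the point is only that, within a single AA × Aa mating type, transmission of ‘a’ is randomized by meiosis:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
# Every family has the same mating type, AA x Aa, so each offspring is
# 'randomized' by meiosis to receive either A or a from the Aa parent
received_a = rng.integers(0, 2, n)
# Hypothetical additive effect of the a allele on a quantitative trait
y = 0.5 * received_a + rng.normal(0.0, 1.0, n)

# Because transmission is random within the mating type, a simple
# two-group comparison is an unconfounded test of the allele's effect
effect_hat = y[received_a == 1].mean() - y[received_a == 0].mean()
```

Any ancestry variable is constant in expectation across the two transmission groups here, so it can add noise but not bias, which is the randomization argument of the preceding paragraphs.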
In practice, of course, the approach described in the preceding paragraph would be infeasible. Instead of selecting only individuals whose parents have particular genotypes, we can statistically control for (i.e., condition on) the two parental genotypes (which we often denote mating types). This yields equivalent control because, conditional upon the parental genotypes, the assignment of offspring genotypes is essentially random. Thus, our second way of achieving the benefits of randomization, namely allowing strong causal inferences and eliminating confounding by NLD, is to statistically control for parental genotypes by directly observing them and including them in the statistical models. This is the basis of several TDTs [e.g. Allison, ]. Using a similar argument, Tiwari et al. and Beasley et al. apply the rules of randomization by conditioning on parental genotypes.
Of course, one may not be able, or may not wish, to observe the genotypes of the parents themselves. One can then recognize that full siblings, by definition, share the same parents. Therefore, if one controls for sibship in studies of multiple siblings, one has effectively controlled for the parents’ genotypes, because all siblings have parents with the same genotypes. This offers yet another way to effectively condition upon parental genotypes and enjoy the inferential strength that randomization offers.
As this discussion indicates, there are multiple variables one could control for that may yield valid inference in this context, allowing the randomization by meiosis to eliminate confounding by NLD. Rabinowitz and colleagues [39, 72, 84, 186, 187] have extended this idea by conditioning on sufficient statistics. They seek to identify statistics that are ‘sufficient’ in the sense that, if conditioned upon, they would eliminate confounding by NLD. At root, this is still the same concept, merely expressed in a different form, and it is the basis for several other TDT-type approaches [37, 69, 78]. Horvath et al. express succinctly the importance of conditioning on sufficient statistics: ‘The general principle is to evaluate the distribution of test statistics using the conditional distribution of offspring genotypes under the null hypothesis, where the conditioning is on the sufficient statistics for any nuisance parameters in the model. The potential nuisance parameters for nuclear families include the distribution of the phenotypes, the parental allele frequencies, and the model for ascertainment. By conditioning the offspring genotype distribution on the phenotypes, one eliminates sensitivity of the tests to misspecification of the phenotype distribution and to ascertainment conditions that depend on the phenotypes. Conditioning on the parental genotypes eliminates sensitivity to population admixture.’ When parents’ genotypes are unknown, conditioning on sufficient statistics for them serves the same purpose. The procedures in Allison's TDTs, George et al., Allison et al., and FBAT [73, 78] all correct for association by conditioning on the parental genotypes or the transmission status of the individual.
Finally, a new class of tests known as structured association tests attempts to use the rest of the genome to derive, via various machinations, a variable that, if conditioned upon, would control for or eliminate NLD as a confounder. In the original formulations of such approaches, the variable one sought to control for was an index of genetic admixture under the assumption of a particular population dynamic including population admixture [2, 8, 11, 12, 13, 19, 169, 172]. More recently, these approaches have been extended to allow for other background genetic factors [10, 15]. It is important to note that, unlike family-based TDT-type approaches, which strictly eliminate confounding by NLD, structured association testing does so only to the extent that one has effectively captured the important background covariates for inclusion in the model and modeled them successfully. Expressed in this way, one can see structured association testing as essentially trying to achieve the same goals that propensity score analysis attempts to achieve in more general epidemiologic studies [189, 190]. Indeed, our group is currently working on formalizing a propensity score analysis approach to structured association testing. Note that the genomic control method achieves valid inference by correcting for the variance inflation factor rather than by conditioning on sufficient statistics.
In conclusion, we have reviewed family- and population-based designs that have been proposed in the literature for eliminating or controlling for confounding due to population stratification in order to draw valid inference in genetic linkage and association studies. We also described how and why these methods of linkage and association testing follow general statistical principles such as (1) randomization, (2) conditioning on sufficient statistics, and (3) testing the genotype-phenotype covariance (conditional upon familial information) and/or testing departures of the marginal distribution from the expected genotypic frequencies. Because of the vast number of options available, the reader is cautioned to take care when applying these methods, both in meeting the required assumptions and in ensuring that the method tests the hypothesis the reader intends to test.
Partial list of extensions and variations of TDT-type procedures.
This study is supported in part by R21LM008791, T32AR007450, R01DK52431, U01HL072510-02, P20RR016430, P30DK56336, R01RR017009, R01DK56366, R01ES09912, P01AR049084, U54CA100949, R01GM077490, R01AR052658, and 3R01AR052658-03S1, 2R01HL055673-11A1, R01GM074913-01A1.
1By a valid test of linkage in the presence of association, we mean a test that (A) yields p values less than or equal to α no more than 100 * α% of the time when the marker is either unlinked to or not associated with a locus causing variation in the phenotype, and (B) yields p values less than or equal to α more than 100 * α% of the time when the marker is both linked to and associated with a locus causing variation in the phenotype.