We read with some interest the paper by Zhao, Rebbeck and Mitra “A propensity score approach to correction for bias due to population stratification using genetic and non-genetic factors.” We feel there are a number of issues raised by this paper that are not adequately addressed, which motivates this letter. Further, we take this opportunity to make some general comments that contrast stratification-based and direct (model-based) control of confounding by population stratification.
Zhao, Rebbeck and Mitra (ZRM) give as their stated goal the development of a propensity score-based method for correcting for confounding by population stratification in population-based association studies. We note that ZRM demonstrate their genomic propensity score (GPS) approach using simulation studies of case-control data. However, the propensity score can only be properly used to adjust for confounding in a cohort or case-cohort study [Joffe and Rosenbaum, 1999]. The artificial ratio of case to control participants in a case-control study yields a biased estimate of the propensity score; stratification on a biased estimate of the propensity score can lead to residual bias [Månsson et al., 2007]. With no theoretical basis for believing that adjustment using the propensity score can control bias, the GPS approach applied to case-control data under a general association model is suspect even given the favorable simulation results in ZRM. The suggestion of Månsson et al. [2007] that the propensity score can be estimated using data from control participants only, together with the rare disease approximation, should be viewed with caution, as this scoring function corresponds to one of Miettinen’s confounder scores [Miettinen, 1976]. Estimated odds ratios after stratification using Miettinen’s confounder scores are known to have biased variance estimates due to collinearity [Pike, 1977].
The propensity score is typically used to compare the differences in a mean response; in the genetic context, the proportion of cases among participants with the risk genotype(s) is compared to the proportion of cases among participants with the non-risk genotype(s). A comparison of this type seems questionable for a case-control study as the data are sampled conditional on case status. Instead, ZRM claim estimation of the odds ratios with bias close to zero under the alternative hypothesis. This claim should be viewed with caution, as it is known that even for prospective studies where the propensity score is appropriate, estimation of odds ratios may be biased [Austin et al., 2007]. Although the direction of bias found by Austin et al. agrees with that reported by ZRM in their discussion (who note that odds ratios estimated by GPS are consistently underestimated), Austin et al. also rely on Monte Carlo studies and so the direction of bias may in fact be indeterminate. For case-control studies, where there is no theoretical basis for use of GPS, finding a low bias in a simulation study should not be taken as demonstration of the applicability of the method in general. In fact, a logistic model for case–control status that includes genotype and the GPS as a covariate (as proposed by ZRM) is not compatible with either the logistic model that includes genotype and covariates, typically assumed when using direct adjustment for confounders, or with the marginal logistic model for genotype alone, typically assumed in the absence of confounding, in the sense that the parameters estimated by ZRM do not correspond to parameters in either of these models.
In our previous work [Epstein et al., 2007], we developed the stratification score to account for confounding when testing hypotheses in genetic association studies. The Epstein, Allen and Satten (EAS) stratification score approach first uses genomic (or non-genomic) covariates to model case-control status, and then creates a stratification score based on fitting this model. Thus, for each individual, we compute the EAS stratification score, which is the estimated probability that an individual is a case given their genomic (and other) covariates. We then group the data into strata based on the values of the EAS stratification score. Finally, the association between case-control status and the genotype G at the test locus is evaluated using the stratified data. Comparing the EAS stratification score and the genomic propensity score (GPS), we see that the roles of case-control status and genotype are reversed; for the GPS, the first step is to construct a model for a binary coding of G as a function of genomic and non-genomic covariates (ignoring case-control status). This model is then used to adjust for population stratification.
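The three steps just described (model case status, form strata from the fitted probabilities, test within strata) can be sketched as follows. This is only an illustration under stated assumptions, not the EAS implementation: the function names are hypothetical, the choice of five quantile-based strata is an assumption, and the stratified trend test (a Mantel-extension score test) is one convenient stand-in for whatever stratified test an analyst prefers.

```python
import math
import numpy as np

def _design(X):
    """Prepend an intercept column to the covariate matrix."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])

def stratification_score(X, y, n_iter=25):
    """Estimated P(case | covariates) from a logistic model fit by Newton-Raphson."""
    X1 = _design(X)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X1 @ beta, -30, 30)))
        w = p * (1.0 - p)
        # Tiny diagonal term only guards against a singular Hessian.
        hess = X1.T @ (X1 * w[:, None]) + 1e-8 * np.eye(X1.shape[1])
        beta += np.linalg.solve(hess, X1.T @ (y - p))
    return 1.0 / (1.0 + np.exp(-np.clip(X1 @ beta, -30, 30)))

def assign_strata(score, n_strata=5):
    """Group individuals into strata at quantiles of the score (assumed: 5 strata)."""
    cuts = np.quantile(score, np.linspace(0, 1, n_strata + 1)[1:-1])
    return np.searchsorted(cuts, score, side="right")

def stratified_trend_test(y, g, strata):
    """Stratified score test of case status y against genotype g (coded 0/1/2).

    Returns (chi-square statistic on 1 df, p-value).
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    strata = np.asarray(strata)
    U = V = 0.0
    for s in np.unique(strata):
        m = strata == s
        n, n1 = m.sum(), y[m].sum()
        if n < 2:
            continue  # a stratum with one individual carries no information
        U += np.sum(g[m] * (y[m] - n1 / n))
        # Variance of the score under permutation of labels within the stratum.
        V += n1 * (n - n1) / (n * (n - 1)) * np.sum((g[m] - g[m].mean()) ** 2)
    if V == 0:
        return 0.0, 1.0
    chi2 = float(U * U / V)
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Because the score depends only on case status and the covariates, `assign_strata` is called once and the resulting strata can be reused at every test locus.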
We believe that, among stratification methods, the EAS stratification score has several advantages over GPS. First, GPS calculates the probability of a risk genotype (or set of risk genotypes) while the EAS stratification score calculates the probability of being a case. Because case status is a binary variable, the EAS stratification score is superior to the GPS in that it does not require an arbitrary dichotomization of genotype into “risk” and “non-risk” genotypes. Second, if multiple loci are tested, a different GPS must be calculated at each locus; this is a heavy burden when analyzing genome-wide data. When using the EAS stratification score, the same strata can be used to test association at every locus. This distinction is crucial when constructing permutation tests for multi-locus analyses.
The EAS stratification scheme leads to a simple approach to permutation testing for multi-locus or genome-wide analyses while GPS does not. When conducting a permutation test, it is crucial that the permutation (replicate) data sets be similar to the original data set in every way except for the association between genotype and outcome. In particular, each replicate data set must reproduce the same population stratification as the original data. When using the EAS stratification score, this is easily accomplished by permuting disease and genotype vectors within score-based strata. This scheme preserves population stratification while also preserving linkage disequilibrium in the replicate data sets. With GPS, it is difficult to see how an analogous calculation can be carried out. As each locus requires a different stratification scheme, permutation of either disease or genotype labels within strata would not preserve linkage disequilibrium. Further, permutation of disease labels before analysis does not preserve population stratification.
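A minimal sketch of the within-stratum permutation scheme described above, assuming the strata have already been formed from the stratification score. Case-control labels are shuffled only among individuals in the same stratum, and each individual's multi-locus genotype vector is left intact, so the case-control mix per stratum and the linkage disequilibrium among loci are both unchanged. The function names and the `statistic` callable (for example, a maximum over single-locus test statistics) are hypothetical placeholders.

```python
import random

def permute_within_strata(case_status, strata, rng):
    """Return a copy of case_status with labels shuffled within each stratum."""
    permuted = list(case_status)
    by_stratum = {}
    for i, s in enumerate(strata):
        by_stratum.setdefault(s, []).append(i)
    for idx in by_stratum.values():
        labels = [case_status[i] for i in idx]
        rng.shuffle(labels)
        for i, lab in zip(idx, labels):
            permuted[i] = lab
    return permuted

def permutation_p_value(case_status, genotypes, strata, statistic,
                        n_perm=1000, seed=0):
    """Permutation p-value for a user-supplied multi-locus statistic.

    genotypes holds one genotype vector per individual; it is never shuffled,
    so correlations between loci are the same in every replicate.
    """
    rng = random.Random(seed)
    observed = statistic(case_status, genotypes, strata)
    hits = 0
    for _ in range(n_perm):
        replicate = permute_within_strata(case_status, strata, rng)
        if statistic(replicate, genotypes, strata) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

The same strata serve every locus, which is what makes a single permutation of the labels valid for the whole set of markers.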
Both GPS and the EAS stratification score approach require a model for how genetic covariates affect the score. For genome-wide association studies, we have recently had success calculating the stratification score using principal components (PCs) calculated using a thinned set of marker genotypes [Fellay et al., 2007] and have found that it controls stratification as measured by the variance inflation factor after stratification [Allen and Satten, 2009a,b; Sarasua et al., 2009]. Further, because the EAS stratification score is the same at each locus, we have successfully applied the EAS stratification score to haplotype-based analyses without modification [Allen and Satten, 2009a,b].
It may seem at first glance that using PCs to construct the stratification score is equivalent to including PCs as covariates in a model such as Eigenstrat. However, this is not the case, as the variance estimators are different. Extensive experience with propensity scores for prospective data [e.g., Lunceford and Davidian, 2004], as well as simulations performed by EAS, attest to the validity of the stratification variance estimators. In contrast, McPeek and Abney have shown a variety of situations in which Eigenstrat does not preserve size or has diminished power.
These ideas are demonstrated in the following example of data from a population with three strata. The first stratum provided 80% of cases but no controls; the second stratum provided 80% of controls but no cases; the third stratum provided the remaining participants. The minor allele frequency (MAF) at a locus we wish to test for association with case-control status was 0.05 in the first two strata and 0.1 in the third. Although there is extreme stratification in this simulation, there is no confounding by stratification, as the MAF is the same in the first two strata. We generated 5,000 simulated data sets, each having data from 300 cases and 300 controls. Data on 500 ancestry-informative markers having MAFs uniformly distributed between 0.05 and 0.5, with 400 having F_st = 0.20 and 100 having F_st = 0.01, were generated using the model of Balding and Nichols. We used the first 10 PCs based on these 500 markers in a logistic model to calculate the EAS stratification score. We also give results for linear and logistic models for case-control status where both genotype and the first 10 PCs were used as covariates.
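The marker-generation step of this simulation can be sketched as below, under the standard form of the Balding and Nichols model: given an ancestral allele frequency p and a value of F_st, each stratum's allele frequency is drawn from a Beta distribution with parameters p(1 − F_st)/F_st and (1 − p)(1 − F_st)/F_st. The function names are hypothetical, and the assumption of Hardy-Weinberg proportions within each stratum is ours; the letter does not state how genotypes were drawn from the stratum frequencies.

```python
import random

def balding_nichols_freqs(p_anc, fst, n_pops, rng):
    """Draw one allele frequency per stratum around ancestral frequency p_anc."""
    a = p_anc * (1 - fst) / fst
    b = (1 - p_anc) * (1 - fst) / fst
    return [rng.betavariate(a, b) for _ in range(n_pops)]

def simulate_marker_freqs(n_markers_by_fst, n_pops, rng, maf_range=(0.05, 0.5)):
    """Stratum-specific frequencies for every marker.

    n_markers_by_fst is a list of (fst, number_of_markers) pairs.
    """
    freqs = []
    for fst, n_markers in n_markers_by_fst:
        for _ in range(n_markers):
            p_anc = rng.uniform(*maf_range)
            freqs.append(balding_nichols_freqs(p_anc, fst, n_pops, rng))
    return freqs

def draw_genotype(freq, rng):
    """Minor-allele count (0, 1 or 2), assuming Hardy-Weinberg proportions."""
    return int(rng.random() < freq) + int(rng.random() < freq)

# Settings taken from the example in the text: 400 markers with F_st = 0.20,
# 100 markers with F_st = 0.01, and three strata.
rng = random.Random(2009)
marker_freqs = simulate_marker_freqs([(0.20, 400), (0.01, 100)], 3, rng)
```

Markers with F_st = 0.20 show much larger frequency differences between strata than those with F_st = 0.01, which is what makes the leading PCs informative about stratum membership.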
At the nominal significance level of 0.05 (0.01), we found the size of the naïve (unstratified) analysis was 0.052 (0.011), which was as anticipated given the lack of confounding due to stratification in the sample. However, the size of the test that includes PCs as covariates in the logistic model was 0.093 (0.028); the size of the test using the analogous linear model was 0.12 (0.039). In contrast, testing after stratification using the EAS stratification score had a size of 0.056 (0.009). These results highlight the differences between stratification-based tests and direct adjustment, and also suggest that the stratification score approach to controlling confounding by population substructure deserves wider use when analyzing genetic association data.
In summary, post-stratification for control of confounding due to population stratification when testing for association is an attractive strategy. Two different scores have been proposed: the EAS stratification score and the GPS of ZRM. Both methods can easily make use of nongenetic covariates. Of the two scores, the EAS stratification score has several advantages over GPS when applied to case-control data: a natural binary outcome to model; the same stratification at each locus; easy permutation testing for multi-locus or genome-wide analyses. We are unaware of any advantages of the GPS when compared to the EAS stratification score. Finally, score-based methods may have advantages over direct adjustment for population stratification due to a more robust variance calculation.
This work was funded in part by NIH grant HG003618 (to M. P. E.) and R01 MH084680 (to A. S. A.). The findings and opinions expressed in this letter are those of the authors and do not necessarily reflect the official position of the CDC.