From an epidemiologic perspective, the use of 'genetic' clusters, as suggested by Wilson et al., instead of self-reported ethnicity will not alleviate but will rather create or exacerbate the problems associated with genetic inferences based on racial differences. The true complication is that racial and ethnic groups differ from each other on a variety of social, cultural, behavioral and environmental variables as well as in gene frequencies, leading to confounding between genetic and environmental risk factors in an ethnically heterogeneous study. For example, with respect to treatment response, "An individual's response to a drug depends on a host of factors, including overall health, lifestyle, support system, education and socioeconomic status - all of which are difficult to control for and likely to be affected, at least in the United States, by a person's 'race'" [3].
Specifically, let us consider the practical implications of the "race-neutral approach" [3] advocated by Wilson et al. As an example, we revisit a recent study of the efficacy of inhibitors of angiotensin-converting enzyme (ACE) in 1,200 white versus 800 black patients with congestive heart failure [31] that generated a great deal of controversy [1]. In that study, the authors showed that black patients on the ACE inhibitor Enalapril had no reduction in hospitalization compared with those on placebo, whereas white patients showed a strong, statistically significant difference between the treatment and placebo arms. Let us suppose that instead of using racial labels, the authors had performed genotype cluster analysis on their combined sample. They would have obtained two clusters: cluster A, containing approximately 1,200 subjects, and cluster B, containing approximately 800 subjects. They would then have demonstrated that treated subjects in cluster A show a dramatic response to Enalapril compared with placebo subjects, while cluster B subjects show no such response. The direct inference from this analysis would be that the difference in responsiveness between individuals in cluster A and cluster B is genetic - that is, due to a frequency difference in one or more alleles between the two groups. But the problem should be obvious: cluster A is composed of the Caucasian subjects and cluster B of the African Americans. Although a genetic difference in treatment responsiveness between these two groups is inferred, the conclusion is completely confounded with the myriad other ways in which these two groups might differ from each other; hence the culprit may not be genetic at all.
A racial difference in the frequency of some phenotype of interest (disease, or drug response) or quantitative trait is but a first clue in the search for etiologic factors. As we have illustrated, without such racial/ethnic labels, these underlying factors cannot be adequately investigated. Although some investigators might quickly jump to a genetic explanation for an ethnic difference, this is rarely the case with epidemiologists, who have a broad view of the complex nature of most human traits [33]. Indeed, epidemiologists employ several different approaches to disentangling genetic from environmental causes of ethnic differences, including migrant studies and stratified analyses.
The rationale underlying migrant studies is to compare the frequency of a trait (such as disease) between members of the same ethnic group (who are assumed to be genetically homogeneous) who are residing in different environments. For example, breast cancer rates in Asian (Chinese and Japanese) women are vastly lower than the rates among US Caucasians. However, the breast cancer rates of Chinese and Japanese women living in Hawaii and the San Francisco Bay Area are comparable to those of US Caucasians [34]. These results suggest an environmental source of the racial difference. Asians are also known to have much lower rates of multiple sclerosis than European Caucasians [35]. But within a single country, namely Canada, this racial difference persists [36], increasing support for (but not proving) a genetic explanation.
The best approach to resolving confounding is through matched, adjusted or stratified analyses, but this depends on having measured the confounding variables (or their surrogates). Such analyses can be performed in a racially heterogeneous sample, but they are potentially more powerful when performed within a single group. The reason is that the correlation between confounding variables (such as genes and environment) may be stronger in a heterogeneous study population than in a more homogeneous one. The ability to disentangle the effects of confounding variables is greater when their sample correlation is low.
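The last point can be made concrete with a standard regression result, offered here only as an illustrative sketch (the text itself does not commit to any particular model). Assume, hypothetically, a linear model for a quantitative trait $y$ with a genetic covariate $g$, an environmental covariate $e$, and residual variance $\sigma^2$, and let $r$ be the sample correlation between $g$ and $e$:

```latex
y_i = \beta_0 + \beta_G\, g_i + \beta_E\, e_i + \varepsilon_i ,
\qquad
\operatorname{Var}\bigl(\hat{\beta}_G\bigr)
  = \frac{\sigma^2}{\bigl(1 - r^2\bigr)\,\sum_i \bigl(g_i - \bar{g}\bigr)^2} .
```

Under these assumptions the sampling variance of the estimated genetic effect is inflated by the factor $1/(1 - r^2)$: it is smallest when $r = 0$ and grows without bound as $|r|$ approaches 1, at which point the two effects cannot be separated at all.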
A simple example is provided in Figure 2. Here we assume two populations (for example, races), groups A and B. As shown in the figure, where both environmental and genetic factors differ between the populations, it is impossible to determine which is the functional cause of the racial difference if the genetic and environmental effects are completely correlated in frequency within the two groups. More importantly, if the relative frequency of the environmental factor in the two groups was not measured, analysis stratified on the genetic differences yields the correct interpretation that the genetic difference does not contribute to the racial difference, provided that the genetic and environmental factors are not correlated. But if they are fully correlated, analysis stratified on the genetic factor alone would lead to the incorrect conclusion that it is the cause of the racial difference.
Figure 2. An example of confounding and a stratified analysis of environmental and genetic factors. Here we assume two populations (for example, races), groups A and B. G1 and G2 represent dichotomous genotype classes at a candidate gene locus (here one of the ...)
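The confounding scenario described above can be sketched numerically. The following is a minimal illustration, not the calculation behind the figure: the risks and frequencies are hypothetical, the labels E1 and E2 for the environmental classes are introduced here for convenience, and disease risk is assumed to depend on the environmental factor only. The code compares groups A and B within each genotype class, first when genotype and environment are independent within each group, and then when they are fully correlated.

```python
# All numbers below are hypothetical and chosen only for illustration.
RISK_BY_ENV = {"E1": 0.30, "E2": 0.10}  # assumed: disease risk depends on E only
FREQ_CLASS1 = {"A": 0.8, "B": 0.2}      # assumed frequency of G1 (and of E1) per group


def joint_distribution(group, fully_correlated):
    """Return P(genotype, environment) within one group."""
    p = FREQ_CLASS1[group]
    if fully_correlated:
        # G1 always occurs together with E1, and G2 with E2.
        return {("G1", "E1"): p, ("G1", "E2"): 0.0,
                ("G2", "E1"): 0.0, ("G2", "E2"): 1.0 - p}
    # Genotype and environment are independent within the group.
    return {("G1", "E1"): p * p, ("G1", "E2"): p * (1 - p),
            ("G2", "E1"): (1 - p) * p, ("G2", "E2"): (1 - p) * (1 - p)}


def risk_within_genotype(group, genotype, fully_correlated):
    """Disease risk among members of `group` carrying `genotype`."""
    joint = joint_distribution(group, fully_correlated)
    cells = {env: joint[(genotype, env)] for env in RISK_BY_ENV}
    total = sum(cells.values())
    return sum(prob * RISK_BY_ENV[env] for env, prob in cells.items()) / total


def group_difference_by_genotype(fully_correlated):
    """Risk difference (A minus B) within each genotype stratum."""
    return {
        g: risk_within_genotype("A", g, fully_correlated)
           - risk_within_genotype("B", g, fully_correlated)
        for g in ("G1", "G2")
    }


if __name__ == "__main__":
    for correlated in (False, True):
        diffs = group_difference_by_genotype(correlated)
        label = "fully correlated" if correlated else "independent"
        print(label, {g: round(d, 3) for g, d in diffs.items()})
```

With these assumed numbers, the difference between groups persists within both genotype strata when the factors are independent, correctly indicating that genotype does not explain it. When the factors are fully correlated, the difference vanishes within each stratum, and the genotype would wrongly be taken as the cause even though risk was defined to depend on environment alone.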
Epidemiologists often perform analyses of racial differences stratified on numerous environmental variables, such as socio-economic status, access to health care, education, and so on. The persistence of racial differences after accounting for these covariates raises the index of suspicion that genetic differences may be involved. For example, Karter et al. recently demonstrated persistence of racial differences in diabetes complications in a health maintenance organization after controlling for numerous potential confounders, including measures of socioeconomic status, education, and health-care access and utilization. Such evidence is indirect, however, as other unmeasured factors may still be responsible [38]. Ultimate proof depends on identifying a specific gene effect within each population, with an allele frequency difference between populations. One such example involves the lower risk of type 1 diabetes in US Hispanic versus Caucasian children. The HLA allele DR3 is predisposing to type 1 diabetes in both populations but has a lower frequency in Hispanics than in Caucasians [39].
Another approach often taken by genetic epidemiologists is to consider the prevalence of disease or drug response (D) in individuals who are admixed between groups A and B - for example, in individuals who are 100%A, 75%A-25%B, 50%A-50%B, 25%A-75%B and 100%B (corresponding to 4, 3, 2, 1 and 0 grandparents from group A, respectively). A continuous cline in the frequency of D with the proportion of the genome that derives from group A is taken as suggestive evidence that genetic factors explain the prevalence difference between groups A and B. An example of this type of analysis is the decreasing trend in type 2 diabetes in Pima Indians with degree of Caucasian admixture [23]. Analyses stratified on environmental factors can again strengthen the argument. But the same caveat applies here as described above. If an unmeasured environmental variable (such as socio-economic status) covaries with the proportion of group A ancestry, the racial difference could be due to this unmeasured variable. At best, one could argue that the racial difference is not explained by any of the measured covariates.
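The caveat about admixture clines can be illustrated with a small calculation. This is a sketch under stated assumptions, not an analysis of any real data: the allele frequencies, exposure frequencies and risks are hypothetical, the genetic model assumes prevalence rises linearly with the expected number of risk alleles, and the environmental model assumes exposure frequency tracks the proportion of group A ancestry. Exact fractions are used so that the two clines can be compared without rounding error.

```python
from fractions import Fraction

# Proportion of group-A ancestry for 4, 3, 2, 1 and 0 group-A grandparents.
ANCESTRY_A = [Fraction(k, 4) for k in (4, 3, 2, 1, 0)]


def mix(q, value_a, value_b):
    """Ancestry-weighted average of a group-A and a group-B frequency."""
    return q * value_a + (1 - q) * value_b


def genetic_cline():
    """Prevalence of D by ancestry under a purely genetic (additive) model."""
    baseline = Fraction(1, 20)                      # hypothetical baseline risk
    risk_per_allele = Fraction(1, 4)                # hypothetical per-allele increase
    freq_a, freq_b = Fraction(1, 10), Fraction(1, 2)  # hypothetical allele frequencies
    return [baseline + risk_per_allele * 2 * mix(q, freq_a, freq_b)
            for q in ANCESTRY_A]


def environmental_cline():
    """Prevalence of D by ancestry under a purely environmental model."""
    risk_if_exposed = Fraction(1, 2)                # hypothetical risk among the exposed
    exposure_a, exposure_b = Fraction(1, 5), Fraction(3, 5)  # hypothetical exposure
    return [risk_if_exposed * mix(q, exposure_a, exposure_b)
            for q in ANCESTRY_A]


if __name__ == "__main__":
    print("genetic      ", [float(p) for p in genetic_cline()])
    print("environmental", [float(p) for p in environmental_cline()])
```

Under these assumptions the two models produce the same cline across the five ancestry classes, so the cline alone cannot distinguish a genetic cause from an unmeasured environmental variable that covaries with ancestry.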