In recent years, imputation has become a key tool in the success of genome-wide association studies. Genotype imputation has proven to increase the power of genetic association studies, by boosting the number of SNPs to be tested for association and facilitating the detection of rare variants in addition to common variants [
14,
19,
34,
35]. Furthermore, imputation aids in fine-mapping studies of the disease-associated region thus increasing the chance of identifying additional candidate SNPs [
36]. Finally, genotype imputation enables meta-analysis that combines results across studies based on different genotyping platforms [
37,
38]. This approach has been effective in identifying novel associations in different traits [
39-
44].
However, an important concern with respect to imputation lies in the selection of an appropriate reference panel. Most of the GWAS to date have been conducted in populations well represented by the available reference panels (e.g. European or East Asian populations), and used only one relevant reference population during the imputation process [
6,
43,
45-
47]. However, for populations that are phylogenetically distant from the samples present in the reference panels, the selection of a suitable reference panel for imputation becomes less clear. In this situation, differences in the pattern of LD between the study and reference populations may affect imputation accuracy. Different approaches have been suggested for this particular scenario. For example, Huang et al. [
16] explored imputation accuracy in the samples of the HGDP-CEPH panel, which is a worldwide collection of individuals from different locations, using the HapMap II reference panels. The authors found that for most of the studied samples, mixtures from at least two HapMap reference samples maximized imputation accuracy [
16]. Another study showed that using tag SNPs from all the HapMap reference populations combined captured common variation in African American, Latino and Hawaiian samples more effectively than tag SNPs obtained from the individual HapMap reference samples [
48]. This ‘cosmopolitan’ approach to imputation, combining reference haplotypes from all the reference populations available, is the strategy currently recommended by the most widely used imputation packages, such as IMPUTE [
19,
20] and MACH [
21].
African American and Hispanic/Latino populations have unique challenges for imputation. These populations are the result of recent admixture between continental groups (primarily European, Native American and West African populations) and admixture proportions show substantial geographic variation [
49-
51]. Several studies have evaluated imputation performance in recently admixed populations. In a recent GWAS of coronary heart disease and its risk factors in a large African American sample [
52], a high imputation concordance (95.6%) was obtained when SNPs were imputed using a combined reference panel of haplotypes from the HapMap phase II CEU and YRI panels. In another study in African Americans [
53], the highest imputation yield and coverage were attained using the two HapMap reference panels (CEU and YRI) separately and then merging the results. Another approach for imputation in African American populations has been recently suggested by Paşaniuc et al. (2011) [
54]. This strategy, termed ‘local ancestry aware imputation’, uses local ancestry to guide the choice of reference haplotypes for imputation and shows marginal improvement in imputation accuracy in the admixed sample. However, this approach will be more difficult to implement in Hispanic/Latino populations, due to the lack of reference data for the relevant Native American parental populations, which is key to obtain accurate estimates of local ancestry. In the study by Huang et al. (2009), using combinations of two (European and East Asian) or three HapMap reference samples (East Asian, European and West African) produced the highest imputation accuracies (>95%) for two Native American samples (Pima and Maya) and a sample from Colombia [
16]. A recent study [
55] showed that, when performing imputation in a Hispanic sample from San Francisco with the program IMPUTE v2 and the HapMap II reference panel, using local haplotype weights based on a coalescent method provided lower error rates (7.8%) than using no weighting (8.9%), or a global weighting method based on empirical estimates of ancestry (9.0%) [
56]. It is important to note that most of the aforementioned studies have used the HapMap II panel as the reference dataset for imputation. However, the recent progress of the 1000 Genomes project (http://www.1000genomes.org/) has provided the scientific community with much more complete reference panels, both in terms of the number of markers and the number of populations. Importantly, the reference databases are updated on a regular basis. For this reason, it is currently recommended to perform the imputation in two stages: pre-phasing the study genotypes to estimate haplotypes, and then imputing untyped genotypes in a separate run. This substantially reduces imputation time with respect to single-step approaches at the expense of a small loss in accuracy. An important advantage of this approach is that, as new reference data become available, it is only necessary to repeat the imputation step.
In this study, we evaluated the imputation performance of the widely used program IMPUTE in an admixed sample from Mexico City using different imputation strategies (single-step vs. two-step imputation) and reference panels (HapMap and 1000 Genomes). We have previously described that this sample primarily has Native American (62%) and European contributions (33%), with a low proportion of African ancestry (5%) [
30]. Importantly, there are no Native American reference samples in the HapMap or 1000 Genomes datasets, so it is of relevance to test the relative imputation performance of these reference panels in the Mexican sample. In an analysis of imputed markers on chromosome 12, we observed that for this sample there are only minor differences in imputation accuracy between the single-step and two-step approaches (Table ). The concordance rate of the single-step approach is only slightly higher than that of the two-step approach (99.1% vs. 98.4% when using the HapMap phase II combined reference sample, respectively). In contrast, the imputation efficacy (i.e. proportion of non-missing genotypes) was higher for the two-step than the single-step imputation approach (90.1% vs. 85.5%, respectively). Therefore, our study confirms the two-step approach as the preferable imputation strategy, because it provides flexibility and faster imputation times, while providing an overly similar imputation performance to the single-step approach.
As expected, we observed that adding the HapMap Phase III Mexican American sample from LA to the HapMap Phase II combined reference sample there were marginal increases in both accuracy (99.4% vs. 99.1%) and efficacy (85.9% vs. 85.5%) (Table ). We also anticipated to find that reducing the threshold of the imputation confidence scores (the INFO score measures) when calling the imputed genotypes would result on lower imputation accuracy and higher proportions of non-missing genotypes. The reductions observed in imputation accuracy were relatively minor, from 98.4% with an INFO score threshold of 0.9 to 95.5% with an INFO score threshold of 0.5 (Figure ). This relatively small reduction in overall imputation accuracy is primarily due to the fact that most genotypes (and markers) have very high INFO scores. Therefore, adding the relatively small percentage of genotypes with lower INFO scores (and lower concordance rates) does not produce a major shift in overall imputation accuracy. Of all masked markers on chromosome 12, 61.5% had INFO scores higher than 0.9, 15.4% had INFO scores between 0.8 and 0.9, 6.5% had INFO scores between 0.7 and 0.8, 6.2% had info scores between 0.6 and 0.7, 3.8% had INFO scores between 0.5 and 0.6, and 6.7% INFO scores lower than 0.5 (see also discussion below about the relationship of imputation efficacy and accuracy and allele frequency).
We also examined the potential improvement in imputation performance obtained with the recently available 1000 Genomes panel (June 2011 release), with respect to the HapMap panel, using the two-step imputation protocol. The 1000 Genomes panel is a much more comprehensive and powerful resource for imputation, comprising more than 37 million autosomal SNPs present in 1,094 individuals from different populations around the world. Here, we show that for the Mexican sample the major improvement associated with the use of the 1000 Genomes reference panel is the substantial increase in imputation efficacy, in addition to the larger number of imputed markers (Table ). Genotype concordances were similar for both reference datasets (around 98.4%). However, imputations with the 1000 Genomes panel resulted in 94.7% of non-missing genotypes (employing an INFO score threshold of 0.9), in comparison with 90.1% for the HapMap phase II combined panel (using the same threshold). When the INFO scores are plotted for different allele frequency bins, either as an average (Figure ) or as histograms of the individual scores (Figures A and 3B), it is evident that the confidence of the genotype calls is higher with the 1000 Genomes panel for all allele frequency categories. There is a high correlation between the INFO scores obtained with the 1000 Genomes and HapMap phase II reference panels (Figure ), but the former are systematically higher than the latter (Figure ).
The results described above are based on an analysis of markers on chromosome 12. An analysis of markers on chromosome 22 gives consistent results: The concordance rates using the HapMap phase II and 1000 Genomes reference panels are very similar (97.6% vs. 97.3%, respectively), but the proportion of non-missing genotypes is lower with the HapMap reference panel than with the 1000 Genomes panel (83.2%, and 89.9%, respectively). Interestingly, in the HLA region on chromosome 6, which spans approximately 5 megabases (29–34

Mb) and has shown signatures of natural selection in previous studies (31–33), both the imputation accuracy (concordance) and the imputation efficacy (proportion of non-missing genotypes) were higher than those observed for chromosomes 12 and 22. When analyzing locus ancestry with a panel of Ancestry Informative Markers in the sample from Mexico City (data not shown), we observed that in a broad region of chromosome 6, including the HLA loci, there was an excess of European ancestry with respect to the rest of the genome, in both type 2 diabetes patients and controls. This may be a potential explanation for the increased imputation accuracy and efficacy identified in the HLA region (i.e. both reference panels, HapMap and 1000 Genomes, have a good representation of European populations, but Native American populations are not well represented in these panels).
The imputation performance of the 1000 Genomes reference panel for rare variants is substantially better than that of the HapMap phase II panel. However, the average imputation confidence (INFO score) is considerably lower for rare variants than for common variants (Figures and ), irrespective of the reference panel. The rare alleles (<1%) present in the Mexican sample are not properly captured by any of the reference panels, in spite of the inclusion in the 1000 Genomes panel of dense data from another sample of Mexican ancestry from LA. This is also evident in a more detailed comparison of imputation accuracy and efficacy for heterozygotes in the following allele frequency categories: <1%, 1–5% and 45–50%. For common variants (45–50%), the imputation accuracy and efficacy were very high (>97% concordance and >90% non-missing genotypes). However, for rare variants (<1%), the proportion of missing genotypes was quite high (> 21%), and importantly, even for the genotypes with high INFO scores (>0.9), there was a large proportion of discordant calls (>39%). It is important to note that our analyses were based on markers from a commercial microarray (in order to minimize genotyping errors, the program PLINK was used to merge the genotype calls obtained with two genotyping algorithms: BRLMM-P and Birdseed), and it is not clear to which extent these findings can be extrapolated to other scenarios (e.g. sequencing data). However, our results highlight the need to be cautious with the interpretation of the results for rare variants in GWAS in Hispanic samples.