The frequencies of pigmentary phenotypes collected in the five component studies are presented in the table below. The samples were broadly similar.
The distributions of human pigmentary phenotypes in the five component studies
We compared the distribution of observed p-values for the 528,173 SNPs in the GWAS with the distribution expected under the global null hypothesis that none of the tested SNPs is associated with natural hair color. The crude analyses, which were restricted to women of self-reported European ancestry but did not further adjust for potential population stratification, show evidence of systematic bias: the genomic control inflation factor (the ratio of the median observed test statistic to the theoretical median) is λGC = 1.24. This bias is most likely due to confounding by latent population stratification. Hair color varies along a light-dark gradient from northern to southern Europe, so it will be associated with any SNP marker whose minor allele frequency also varies along a north-south gradient, even if that marker is not in linkage disequilibrium (LD) with a causal hair-color locus. Adjusting for the top four principal components of genetic variation eliminated most of the apparent confounding due to population stratification (λGC = 1.02 for the adjusted analyses); further control for up to 50 principal components did not alter λGC. All of the association results from the initial GWAS reported below are from analyses that adjusted for the top four principal components of genetic variation.
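The inflation factor described above can be illustrated with a minimal sketch. It assumes two-sided p-values from 1-degree-of-freedom tests; the function names and the synthetic p-values are hypothetical and are not taken from the study.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()


def chi2_from_p(p):
    """Convert a two-sided p-value in (0, 1] to a 1-df chi-square statistic."""
    # Work in the lower tail so that very small p-values do not round to 1.0.
    z = _STD_NORMAL.inv_cdf(p / 2.0)
    return z * z


def genomic_inflation(p_values):
    """Median observed chi-square statistic divided by its median under the null."""
    observed_median = median(chi2_from_p(p) for p in p_values)
    null_median = chi2_from_p(0.5)  # median of a 1-df chi-square, about 0.455
    return observed_median / null_median


# Synthetic illustration (not study data): evenly spaced p-values mimic the null,
# and shrinking them mimics systematic inflation.
n = 999
null_like = [i / (n + 1) for i in range(1, n + 1)]
inflated = [p ** 1.5 for p in null_like]
print(genomic_inflation(null_like))
print(genomic_inflation(inflated))
```

A value near 1 indicates little systematic inflation, while values well above 1, such as the 1.24 reported for the crude analyses, suggest confounding or other systematic bias.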
Quantile-quantile plot of the -log10 p-values from an analysis of the initial GWAS that did not adjust for principal components of genetic variation (black dots) and an analysis that did adjust for the four largest principal components (red dots).
The GWAS identified several genomic locations as potentially associated with hair color. Of the 528,173 SNPs tested, the 38 with the most extreme p-values for association with hair color are listed in the table below.
-log10 p-values from the primary test of association with hair color in the initial GWAS, by position along chromosome.
Thirty-eight SNPs with the smallest p-values of the 528,173 tested for association with hair color in the initial GWAS of 2,287 women of European ancestry
We selected 31 of these 38 SNPs for further study in an independent sample. The remaining seven SNPs were in strong LD (r² > 0.8) with one of these 31 SNPs. The independent sample consisted of 870 controls of European ancestry from a nested case-control study of skin cancer within the Nurses' Health Study (NHS). Thirty of the 31 attempted SNPs were genotyped successfully.
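For readers unfamiliar with the r² measure used in the pruning step above, the following sketch computes it from haplotype and allele frequencies. It is an illustration only: the text does not say how r² was estimated, and the function name and example frequencies are hypothetical.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Hypothetical frequencies: two SNPs whose minor alleles almost always travel together.
r2 = ld_r_squared(p_ab=0.28, p_a=0.30, p_b=0.30)
print(r2 > 0.8)
```

Under a threshold of r² > 0.8, a pair like this hypothetical one would be treated as redundant, so only one of the two SNPs would need to be carried forward.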
Twenty-two of these 30 SNPs showed very strong evidence for association with natural hair color (p < 9.5×10⁻⁸ = 0.05/528,173) in a pooled analysis of the initial GWAS and the validation sample. Of the remaining eight SNPs, three showed very strong evidence for association with hair color either after excluding women with red hair or when comparing women with red hair to those without. The associations of these 30 SNPs with tanning ability and skin color are presented in Table S1 and Table S2.
Thirty-one SNPs among the controls in the skin cancer study within the NHS, and pooled with the GWAS data
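The significance threshold quoted above is a Bonferroni correction of a 0.05 family-wise error rate over the 528,173 SNPs tested. The arithmetic can be checked with a short sketch; the helper names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


def is_genome_wide_significant(p_value, alpha=0.05, n_tests=528_173):
    """True if p_value falls below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(0.05, 528_173)
print(f"{threshold:.1e}")
```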
The SNP rs12203592 in intron 4 of the IRF4
gene was strongly associated with hair color in the initial GWAS and validation study (black to red, pooled p value for trend
; black to blonde, pooled p value for trend
). The percentage of residual variation in hair color from black to blonde explained by this SNP after controlling for the top four principal components of genetic variation was 7.0%. This SNP is within 69.7 kb of two SNPs (rs4959270 and rs1540771) that were identified by a recent GWAS of natural hair color in women of European ancestry resident in Iceland 
. However, neither of these variants, which lie between EXOC2
) and IRF4,
was as strongly associated with natural hair color in our initial GWAS as the IRF4
SNP rs12203592. In our GWAS, the p-values for association between hair color (black to blonde) and rs4959270 and rs1540771 were 2.9×10⁻⁴ and 0.007, respectively, and those for tanning ability were 0.002 and 0.001, respectively. In fact, the p-value for association between rs12203592 and natural hair color was more than 13 orders of magnitude smaller than the p-value for any other SNP on chromosome 6. This should not be taken as evidence that the loci that influence hair color in Iceland are different from those for the rest of Europe; rather, the previous GWAS may have failed to identify rs12203592 because this SNP is not on the Illumina HumanHap300 array used in that study, while it is on the Illumina HumanHap550 array used here.
Association analysis of SNPs across IRF4 region.
We genotyped rs12203592 in an additional 6,155 individuals of predominantly European ancestry from the United States, including 3,750 women from the NHS and 2,405 men from the Health Professionals Follow-up Study (HPFS), and in an additional 1,440 individuals of European ancestry from Australia (Queensland Institute of Medical Research (QIMR)). The association with hair color replicated strongly: the independent p-values for association with hair color (black to blonde) in these three follow-up studies were 3.2×10⁻⁴⁰, 2.8×10⁻³⁵, and 4.6×10⁻²³, respectively, and the pooled p-value across all five studies was 1.5×10⁻¹³⁷. There was no statistical evidence for heterogeneity in the magnitude or direction of the hair color-minor allele correlation across the studies. The rs12203592 SNP was also strongly associated with skin color (p = 6.2×10⁻¹⁴), eye color (p = 6.1×10⁻¹³), and tanning ability (p = 3.9×10⁻⁸⁹) in the subsets of these individuals for which this information was available. The variant allele was associated with lighter skin color, less tanning ability, and blue/light eye color.
Association of IRF4 SNP rs12203592 and SLC24A4 SNP rs12896399 with pigmentary phenotypes in five studies
Distributions of IRF4 rs12203592 and SLC24A4 rs12896399 with pigmentary phenotypes in the pooled five studies
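The text above reports a pooled p-value and no evidence of heterogeneity across studies, but this section does not spell out the method. As an illustration only, the sketch below shows one common approach: a fixed-effect inverse-variance meta-analysis with Cochran's Q and I². It may differ from the authors' pooled analysis, and the per-study effect estimates are hypothetical.

```python
import math
from statistics import NormalDist


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance pooled estimate with Cochran's Q and I^2 heterogeneity measures."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of study estimates from the pooled one.
    q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of total variation attributed to between-study heterogeneity.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    z = pooled_beta / pooled_se
    # Two-sided normal p-value; this underflows to 0.0 for extremely large |z|.
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return {"beta": pooled_beta, "se": pooled_se, "q": q, "df": df,
            "i_squared": i_squared, "p": p_value}


# Hypothetical per-study estimates (not the study's data).
result = fixed_effect_meta(betas=[0.21, 0.19, 0.20], standard_errors=[0.05, 0.04, 0.06])
print(result)
```

Under the null hypothesis of no heterogeneity, Q is compared with a chi-square distribution with `df` degrees of freedom; a Q close to or below `df` is consistent with homogeneous effects across studies.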
On the same chromosome, 145.8 kb centromeric to IRF4 rs12203592, the SNP rs6918152 in the EXOC2 gene was associated with hair color (black to blonde) in the initial GWAS, the NHS skin cancer controls, and the Australian samples. These two SNPs are in very weak LD (r² = 0.04). Genotypes for the SNP rs1540771 previously reported by Sulem et al.
were available in the initial scan and the Australian samples. In a mutually adjusted multivariable regression of rs12203592, rs6918152, rs1540771, the strength of the association between the first SNP with hair color was attenuated but remained significant (p
rs12203592 in the Australian samples). The previously reported SNP rs1540771 was not significant (p > 0.05) after adjustment for the other SNPs, and the association between rs6918152 and hair color was much weaker in the GWAS and no longer significant in the Australian samples after adjustment. Whereas the IRF4 SNP rs12203592 was also associated with skin color, eye color, and tanning ability, the EXOC2 SNP rs6918152 was not associated with these phenotypes. These results suggest that the IRF4 SNP rs12203592 is the most likely to be in strong LD with the causal variant in this region.
Association between SNPs in EXOC2 and IRF4 and hair color (black to blonde)
The rs12896399 SNP 15.5 kb upstream of the SLC24A4
gene was highly associated with light hair color, and relatively weakly associated with less tanning ability in the pooled analysis of four studies (p
for hair color, and p
0.01 for tanning ability). The percentage of residual variation in hair color from black to blonde explained by this SNP after controlling for the top four principal components of genetic variation was 2.6%. This variant was also associated with blue/light eye color (p
in the HPFS set). The SLC24A4
gene belongs to a family of potassium-dependent sodium/calcium exchangers. At least two other members of this family are associated with skin pigmentation. The SLC24A5 gene was recently shown to be involved in skin pigmentation in both zebrafish and humans. Another member of this family, MATP (SLC45A2), is a pigmentation gene transcriptionally regulated by MITF. We identified the SNP rs28777 in the MATP
gene from the GWAS, and the association with hair color was replicated in the controls of the skin cancer study (pooled P value
). This SNP was also associated with skin color (pooled P value
) and tanning ability (pooled P value
). Three SNPs in the MATP
gene have been associated with human pigmentation: rs16891982 (Phe374Leu), rs26722 (Glu272Lys), and rs13289 C/G (-1721 in the promoter region). We genotyped these three SNPs in the controls of the skin cancer study. None of the three previously reported SNPs was in LD with rs28777 (r² ≤ 0.01), which is an intronic SNP. A multivariable analysis mutually adjusting for rs28777, rs16891982, rs26722, and rs13289 showed that only rs16891982 remained significant in the model (P = 0.036 for hair color (black to blonde), P = 0.016 for tanning ability, and P = 0.0009 for skin color); the other SNPs became non-significant (p > 0.05). These data suggest that rs16891982 is most likely to be the causal variant, or in strong LD with the causal variant, in the MATP gene.
Eleven SNPs spanning 1 Mb on chromosome 15 were strongly associated with hair color in the initial GWAS. These SNPs were located in the OCA2
5′ regulatory region and the HERC2
gene region and included the 3 SNPs reported previously with eye color: rs7495174, p
; rs6497268, p
; and rs11855019, p
. In an analysis mutually adjusting for all 11 SNPs simultaneously, only the HERC2
SNP rs12913832 (not on the HumanHap 300 version used in Sulem et al. 
) remained significantly associated with hair color (p
) and tanning ability (p
). The associations between all other SNPs and hair color became non-significant (p>0.05). This suggested that the SNP rs12913832 was most likely to be in strong linkage disequilibrium with the causal variant in this region. The percentage of residual variation in hair color from black to blonde explained by this SNP after controlling for the top four principal components of genetic variation was 10.7%.
We observed 12 SNPs on chromosome 16 associated with hair color in the GWAS, spanning >756 kb. The MC1R gene, well established to be associated with red hair color, is located within this region. We had previously genotyped seven common MC1R variants among the NHS skin cancer controls. An analysis mutually adjusting for all 19 SNPs in the controls of the skin cancer study indicated that the signals we detected in this region were mainly due to the three MC1R red hair color alleles (Arg151Cys, Arg160Trp, and Asp294His) (Table S3). The pairwise LD among these 19 SNPs was very low (the pattern of LD across these 19 SNPs is shown in Figure S1).