We used resequencing and genotyping in African Americans with sickle cell anemia (SCA) to characterize associations with fetal hemoglobin (HbF) levels at the BCL11A, HBS1L-MYB and β-globin loci. Fine-mapping of HbF association signals at these loci confirmed seven SNPs with independent effects and increased the explained heritable variation in HbF levels from 38.6% to 49.5%. We also identified rare missense variants that causally implicate MYB in HbF production.
HbF is a strong and heritable modifier of disease severity for individuals with sickle cell disease (SCD, including sickle cell anemia (HbSS) but also HbSC and HbS-β-thalassemia) and β-thalassemia; individuals with high HbF levels have less severe complications and a longer life expectancy. Three loci (at BCL11A, HBS1L-MYB and β-globin) carry DNA polymorphisms that modulate HbF levels [1–4]. To fine-map the HbF association signals, we resequenced 175.2 kb from these loci in 190 individuals, including the HapMap European CEU and Nigerian YRI founders and 70 African Americans with SCA (Supplementary Methods). We discovered 1,489 DNA sequence variants, including 910 previously unreported variants (Supplementary Fig. 1 and Supplementary Tables 1 and 2). Using this information and data from HapMap, we selected and genotyped 95 SNPs in 1,032 African Americans with SCA (Supplementary Methods). We genotyped 17 and 35 SNPs at the BCL11A and HBS1L-MYB loci, respectively, to characterize previously reported HbF association signals [4]. We also genotyped 43 SNPs at the β-globin locus to capture the majority of the common genetic variation on the main sickle cell haplotypes. Association results are presented in Supplementary Table 3.
BCL11A is a direct repressor of HbF production [5] and a major regulator of developmental globin gene switching [6]. Consistent with previous reports [3,4], rs4671393 in BCL11A intron 2 was the genetic marker most strongly associated with HbF levels (P = 3.7 × 10⁻³⁷) (Table 1). Stepwise conditional analyses found two other SNPs (rs7599488 and rs10189857) that were independently associated with HbF levels (Table 1). These two SNPs, located in BCL11A intron 2, are in weak linkage disequilibrium (LD) with rs4671393 (r² = 0.17 and r² = 0.15 for rs7599488 and rs10189857, respectively) but are in strong LD with each other (r² = 0.96). When we used principal component analysis to control for admixture, we observed only minor differences in the results (Supplementary Table 4).
To further understand the contribution of rs10189857, rs7599488 and rs4671393 to the BCL11A HbF association signal, we performed a haplotype analysis. These three SNPs form four haplotypes that represent 99.7% of all haplotypes at this locus. These haplotypes were more strongly associated with HbF levels (P = 4.0 × 10⁻⁴⁵) than rs4671393 (P = 3.7 × 10⁻³⁷) and explained 18.1% of the phenotypic variation in HbF levels (Supplementary Table 5). Thus, these haplotypes explain more phenotypic variance than the cumulative sum of the three BCL11A SNPs taken individually (14.7%) (Table 1 and Supplementary Methods). Although there are caveats in calculating variance explained by adding up single SNP main effects (for instance, it ignores possible interactions between markers), this approach reflects current practices in estimating variance for loci identified through large meta-analyses of genome-wide association study (GWAS) results. At the BCL11A locus, it is likely that the difference in phenotypic variance explained is due to the presence of HbF-increasing and HbF-decreasing alleles on the same haplotype background, where associated SNPs in LD masked each other's phenotypic effect (Supplementary Table 5). This antagonistic effect could represent an important source of the ‘hidden’ heritability highlighted by GWAS [7]. Imputation of ungenotyped markers did not reveal other SNPs with stronger association to HbF levels than rs10189857-rs7599488-rs4671393 (Supplementary Table 6).
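The masking effect described above can be illustrated with a toy calculation. The sketch below assumes a haploid model with two SNPs in strong LD whose alleles act in opposite directions on the trait. The haplotype frequencies, effect sizes and residual variance are invented for illustration and are not estimates from this study; the names `HAP_FREQ` and `variance_explained` are hypothetical.

```python
# Toy haploid model: two SNPs in strong LD with opposite effects on the trait.
# All frequencies and effect sizes are invented for illustration only.
HAP_FREQ = {(1, 1): 0.4, (0, 0): 0.4, (1, 0): 0.1, (0, 1): 0.1}
EFFECT_A, EFFECT_B = 1.0, -1.0  # allele at SNP A raises the trait, allele at SNP B lowers it
NOISE_VAR = 0.8                 # residual (non-genetic) variance


def hap_mean(hap):
    """Expected trait value for a haplotype (a, b)."""
    a, b = hap
    return EFFECT_A * a + EFFECT_B * b


def variance_explained():
    """Return (haplotype R^2, [single-SNP R^2 for A, single-SNP R^2 for B])."""
    mean_y = sum(f * hap_mean(h) for h, f in HAP_FREQ.items())
    genetic_var = sum(f * (hap_mean(h) - mean_y) ** 2 for h, f in HAP_FREQ.items())
    total_var = genetic_var + NOISE_VAR

    single = []
    for idx in (0, 1):
        p = sum(f for h, f in HAP_FREQ.items() if h[idx] == 1)
        cov = sum(f * (h[idx] - p) * (hap_mean(h) - mean_y)
                  for h, f in HAP_FREQ.items())
        # Squared correlation between the allele indicator and the trait
        single.append(cov ** 2 / (p * (1 - p) * total_var))
    return genetic_var / total_var, single


hap_r2, single_r2 = variance_explained()
print(f"haplotypes: {hap_r2:.2f}, sum of single SNPs: {sum(single_r2):.2f}")
```

In this toy setting the haplotypes explain 20% of the trait variance, whereas the two SNPs taken individually sum to 8%, because the common haplotypes carry both the increasing and the decreasing allele and the two effects cancel.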
The HBS1L-MYB intergenic interval carries DNA polymorphisms that influence HbF levels in healthy Europeans and in individuals of African ancestry with SCD [1,3,4]. We performed single-marker regression analysis and identified rs9402686, which was more strongly associated with HbF levels than the previous index HbF SNP at this locus (P = 1.9 × 10⁻¹³ for rs9402686 compared to P = 3.5 × 10⁻¹⁰ for rs9399137) [4] (Table 1 and Supplementary Table 3). Stepwise conditional analysis uncovered two additional SNPs, ss244317976 and rs28384513, which were independently associated with HbF levels (Table 1). LD between rs9402686, ss244317976 and rs28384513 is weak (r² < 0.03). As for BCL11A, haplotypes defined by these three SNPs explained more variation in HbF levels than the cumulative sum of the phenotypic variance explained by the SNPs individually (7.3% compared to 6.8%), although the difference was not statistically significant (Supplementary Table 7).
In contrast to the BCL11A locus [5], we do not know the identity of the gene(s) that influence HbF levels in the HBS1L-MYB region. MYB is a transcriptional regulator of erythropoiesis, whereas HBS1L expression levels correlate with genotypes at HbF-associated SNPs [1]. In principle, one can establish causality by identifying rare and penetrant mutations in nearby candidate genes [8]. Resequencing 70 individuals with SCA identified one, six and four rare missense variants (minor allele frequency <1%) in BCL11A, HBS1L and MYB, respectively, that were absent from the 120 HapMap CEU and YRI samples. We genotyped these 11 rare variants in 1,032 individuals with SCA to assess their burden at the gene level by comparing normalized HbF levels in carriers and non-carriers (Supplementary Methods). To minimize ascertainment bias, we removed resequenced SCA cases from this analysis. This excluded singletons and left five and three variants to analyze in HBS1L and MYB, respectively. Results for HBS1L were not significant (corrected P = 1). However, we observed a significant difference for MYB (corrected P = 0.005), with the 25 carriers having on average 1.4% more HbF than the 937 non-carriers (Table 2). These data suggest that MYB is causally involved in controlling HbF production.
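As an illustration of the carrier versus non-carrier comparison, the sketch below computes the difference in mean normalized trait value and a two-sided permutation P value. It is an assumption about how such a gene-level burden test could be set up, not the procedure from the Supplementary Methods; the function name `burden_test`, the toy values and the number of permutations are hypothetical, and no multiple-testing correction is applied.

```python
import random


def burden_test(values, is_carrier, n_perm=2000, seed=0):
    """Mean difference (carriers minus non-carriers) and a permutation P value."""

    def mean_diff(flags):
        carriers = [v for v, f in zip(values, flags) if f]
        others = [v for v, f in zip(values, flags) if not f]
        return sum(carriers) / len(carriers) - sum(others) / len(others)

    observed = mean_diff(is_carrier)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    flags = list(is_carrier)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(flags)  # reassign carrier status at random
        if abs(mean_diff(flags)) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimated P value strictly above zero
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: 25 non-carriers and 5 carriers with higher normalized trait values
toy_values = [-1.0 + 0.08 * i for i in range(25)] + [1.5, 1.6, 1.7, 1.8, 1.9]
toy_carrier = [False] * 25 + [True] * 5
diff, p_value = burden_test(toy_values, toy_carrier)
print(f"mean difference = {diff:.2f}, permutation P = {p_value:.4f}")
```

Collapsing all rare variants in a gene into a single carrier flag is one common way to gain power when each variant is individually too rare to test.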
Recently, it has been suggested that some of the genetic associations identified by GWAS are due to collections of rare variants captured by common variants [9]. We tested whether the HbF association signals with common SNPs in the HBS1L-MYB intergenic region are due to the rare variants identified in MYB. LD between the three common SNPs and the three rare missense variants is high as measured by D′ but low as measured by r² (r² < 0.01, D′ > 0.4; Supplementary Table 8). When we considered the three MYB missense variants as covariates, the association results between HbF levels and the three common HBS1L-MYB SNPs were not affected (Supplementary Table 9), indicating that ‘synthetic associations’ with rare markers in MYB cannot explain the HbF association signal in the HBS1L-MYB intergenic region. These results provide a clear example where both common and rare DNA sequence variants at the same locus are independently associated with the same phenotype.
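The contrast between D′ and r² for a rare and a common variant follows from their standard definitions. The sketch below computes both from haplotype and allele frequencies; the function name `ld_stats` and the example frequencies are hypothetical and are not taken from Supplementary Table 8.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # D' rescales |D| by its maximum possible value given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical case: a rare allele (1%) found only on haplotypes that also
# carry a common allele (30%)
d, d_prime, r2 = ld_stats(p_ab=0.01, p_a=0.01, p_b=0.30)
print(f"D' = {d_prime:.2f}, r2 = {r2:.3f}")  # D' = 1.00, r2 = 0.024
```

When allele frequencies differ this much, D′ can reach its maximum while r² stays close to zero, so the common SNP is a poor proxy for the rare variant.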
The sickle cell mutation in the β-globin locus is associated with five ‘classic’ haplotypes (Benin, Bantu, Cameroon, Senegal and Arab-Indian) that are characterized by different degrees of clinical severity and HbF levels [10]. An XmnI polymorphism (rs7482144) in the proximal promoter of HBG2 marks the Senegal and Arab-Indian haplotypes and is associated with HbF levels in African Americans with SCD [4,11]. It remains unclear whether rs7482144-XmnI is a causal variant at the β-globin locus. We replicated the association between rs7482144-XmnI and HbF levels (P = 3.7 × 10⁻⁷) (Supplementary Table 3). However, rs10128556, located downstream of HBG1, was more strongly associated with HbF levels than rs7482144-XmnI by two orders of magnitude (P = 1.3 × 10⁻⁹) (Table 2). When we conditioned on rs10128556, the HbF association result for rs7482144-XmnI was not significant (P = 0.78), whereas rs10128556 conditioned on rs7482144-XmnI gave P = 0.047 (Supplementary Fig. 2). This indicates that rs7482144-XmnI is not a causal variant for HbF levels in African Americans with SCA. Similarly, the recently described association between rs5006884 in the olfactory receptor gene cluster upstream of the β-globin genes and HbF levels [12] was not significant after conditioning on rs10128556 (P = 0.055), whereas rs10128556 conditioned on rs5006884 gave P = 1.2 × 10⁻⁶ (Supplementary Fig. 2). Finally, when we conducted a haplotype analysis with the 43 SNPs genotyped at the β-globin locus and used rs10128556 as a covariate, the result was not significant (P = 0.40), indicating that rs10128556 (or a marker in LD with it) is the principal HbF-influencing variant at the β-globin locus in African Americans with SCA (Supplementary Table 10).
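The logic of conditioning one SNP on another can be sketched with a small example. The code below assumes additive genotype coding (0, 1, 2) and measures the conditional association as a squared partial correlation: both the trait and the tested SNP are first regressed on the conditioning SNP, and the residuals are then correlated. The genotypes, trait values and function names are invented for illustration; the models actually used, including covariates and HbF normalization, are described in the Supplementary Methods.

```python
def residuals(y, x):
    """Residuals of the least-squares fit y ~ intercept + x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


def r_squared(u, v):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv ** 2 / (suu * svv)


def conditional_r2(trait, test_snp, cond_snp):
    """Squared partial correlation of trait and test_snp given cond_snp."""
    return r_squared(residuals(trait, cond_snp), residuals(test_snp, cond_snp))


# Toy data: `lead` drives the trait; `proxy` is only correlated with `lead`
lead = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
proxy = [0, 0, 0, 1, 1, 1, 1, 0, 2, 2, 1, 2]
noise = [0.5, -0.5, 0.0, 0.0] * 3
trait = [10 + 2 * g + e for g, e in zip(lead, noise)]

print(f"proxy alone:      {r_squared(trait, proxy):.2f}")
print(f"proxy given lead: {conditional_r2(trait, proxy, lead):.2f}")
print(f"lead given proxy: {conditional_r2(trait, lead, proxy):.2f}")
```

In this toy example the proxy SNP is associated with the trait on its own, loses all signal once the lead SNP is accounted for, and the lead SNP remains associated when conditioned on the proxy. This is the asymmetry used above to prioritize rs10128556.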
Studies of the genetic regulation of HbF have provided new biological insights: BCL11A maintains γ-globin silencing and is required for developmental switching within the β-globin cluster [5,6]. HbF-associated variants have also shown potential predictive value: these variants are associated with transfusion-independent β-thalassemia [3,13,14] and reduced pain crisis rate in SCD [4]. In this study, we showed that fine mapping of known associated loci through resequencing and dense genotyping can reveal additional independent association signals that could account for a significant fraction of the ‘hidden’ heritability [7]. For HbF levels, we increased the HbF phenotypic variation explained by the same three loci from 23.5% to 30.1%. Assuming a heritability of 60.9%, this translates to an increase from 38.6% to 49.5% of the heritable variation [15]. Thus, characterization of loci identified by GWAS will likely identify previously untested variants and explain part of the ‘hidden’ heritability for complex traits.
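The conversion from phenotypic to heritable variation explained is a simple ratio, sketched below with the figures quoted in the text. The function name is hypothetical.

```python
HERITABILITY_PCT = 60.9  # heritability of HbF levels assumed in the text


def share_of_heritable_variation(phenotypic_pct, heritability_pct=HERITABILITY_PCT):
    """Percentage of heritable variation accounted for by the associated loci."""
    return 100.0 * phenotypic_pct / heritability_pct


before = share_of_heritable_variation(23.5)  # about 38.6
after = share_of_heritable_variation(30.1)   # about 49.4 from the rounded inputs
print(f"{before:.1f}% -> {after:.1f}%")
```

The rounded inputs give about 49.4% for the second figure, within rounding of the reported 49.5%, which was presumably computed from unrounded estimates.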
ACKNOWLEDGMENTS
We thank all the individuals who participated in this study, and T. Nguyen and M. Beaudoin for DNA genotyping support. We thank S. Raychaudhuri for critical reading of the manuscript, G. Boucher for statistical advice and the CARe Sickle Cell Disease working group for providing the Cooperative Study of Sickle Cell Disease (CSSCD) principal components. This work was funded by the Fondation de l'Institut de Cardiologie de Montréal (to G.L.) and was supported by an Innovations in Clinical Research Award grant from the Doris Duke Charitable Foundation (to G.L. and J.N.H.). Resequencing services were provided by the University of Washington, Department of Genome Sciences, under US Federal Government contract number N01-HV-48194 from the National Heart, Lung, and Blood Institute.
AUTHOR CONTRIBUTIONS
V.G.S., S.H.O., J.N.H. and G.L. conceived and designed the experiment. G.G., C.D.P. and G.L. performed the experiments. G.G., C.D.P. and G.L. analyzed the data. G.G., C.D.P., V.G.S., S.H.O., J.N.H. and G.L. contributed reagents, materials and/or analysis tools. G.G. and G.L. wrote the paper with contributions from all authors.
Supplementary information is available on the Nature Genetics website.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.