Increased levels of fetal hemoglobin (HbF, α2γ2) are of no consequence in healthy adults but confer major clinical benefits in patients with sickle cell anemia (SCA) and β thalassemia, diseases that represent major public health problems. Inter-individual HbF variation is largely genetically controlled. At one extreme are mutations involving the β globin gene (HBB) complex, historically referred to as pancellular hereditary persistence of fetal hemoglobin (HPFH). These Mendelian forms of HPFH are rare and do not explain the common form, heterocellular HPFH, which represents the upper tail of normal HbF variation and is clearly inherited as a quantitative genetic trait. Genetic studies have identified three major quantitative trait loci (QTLs) (Xmn1-HBG2, the HBS1L-MYB intergenic region on chromosome 6q23, and BCL11A on chromosome 2p16) that account for 20–50% of the common variation in HbF levels in patients with SCA and β thalassemia, and in healthy adults. Two of the major QTLs include oncogenes, emphasizing the contribution of cell proliferation and differentiation to the HbF phenotype. This review traces the story of HbF quantitative genetics, which uncannily mirrors the changing focus of genetic methodology, from candidate genes through positional cloning to genome-wide association, approaches that have expedited the dissection of the genetic architecture underlying HbF variability. These genetic results have already provided remarkable insights into the molecular mechanisms that underlie the hemoglobin ‘switch’.
In humans, a shift from γ- to β-globin gene expression around birth underlies the switch from fetal to adult hemoglobin (Hb) production, such that by 6 months of age the major Hb is HbA (α2β2) (1). Residual amounts of HbF, however, continue to be synthesized throughout adult life, with the majority of adults having <1% HbF. Inappropriately high γ globin expression in adult life is associated with deletions in the HBB cluster or with point mutations upstream of the HBG genes (2). These hereditary persistence of fetal hemoglobin (HPFH) mutations are characterized by significant elevations of HbF, ranging from 10–40% in heterozygotes, with the HbF homogeneously distributed among the erythrocytes, prompting the descriptive term pancellular HPFH. Apart from the isolated increase in HbF levels, carriers of HPFH mutations are otherwise normal. Although the size of the HbF increase reflects the molecular diversity of HPFH mutations, a range of HbF levels that does not show clear Mendelian inheritance has been noted within each molecular class.
Variable increases in HbF levels have also been noted in individuals with sickle cell anemia (SCA) and β thalassemia (3–5), which are caused by mutations affecting the HBB gene and are inherited as Mendelian recessives. Individuals with SCA (Hb-SS) have HbF levels ranging from 1–30% (6). An indication that additional variation at the β globin cluster is responsible for some of this variability came from the discovery of the ‘sickle β haplotypes’, and from the finding that the βS gene on certain βS haplotypes is associated with higher HbF levels and milder disease (7). The HbF response is also variable in β thalassemia (8). Although some of this variability can be explained by the specific β thalassemia mutation itself and the β chromosomal background (9), a substantial proportion of the HbF increase is clearly unlinked to the HBB cluster (10–13). In some cases, the increase in HbF is sufficient to compensate for the complete lack of HbA, resulting in mild disease (β0 thalassemia intermedia) without transfusion dependence (14).
It is now clear that the variable HbF increases in disease occur on an underlying background of variable HbF persistence that is inherited as a quantitative genetic trait. We describe the discovery of the loci influencing common HbF variation, their contribution to an emerging understanding of Hb switching, and the genetic architecture underlying HbF control in healthy adults and in patients with β thalassemia and SCA. Finally, we speculate on the experimental approaches that will be required to account for as much of the genetic variance of HbF as possible.
The residual amounts of HbF in adults are distributed unevenly among the red blood cells; those cells that contain measurable amounts are termed F cells (FC) (15). The first evidence that HbF persistence varies in the general population came in the early 1960s from screening of Swiss army recruits. Thirty-one of 3000 recruits had HbF above 0.7% (16,17), outliers for a normal population, and were labeled as having ‘Swiss-type HPFH’. Although the trait was shown to be heritable, Swiss HPFH was otherwise quite different from classical HPFH: inheritance did not follow Mendelian patterns, the HbF elevation was modest, and the HbF was unevenly distributed among the erythrocytes, hence the alternative term heterocellular HPFH. Other population surveys confirmed that FC and HbF levels vary considerably (by more than 20-fold) and that the distribution is continuous and positively skewed (18–21). Heterocellular HPFH, with HbF levels between 0.8 and 5%, represents ~10% of the population and lies within the upper tail of this continuous distribution [Fig. 1; (22)].
HbF and FC are similarly distributed and closely correlated traits in normal individuals (r > 0.9) (19,21). Twin studies show that genetic factors account for 89% of the variability in F cell levels (23); of the remaining 11% of FC variance, age and sex account for 2% and unknown environmental factors for the rest. HbF is usually measured by high-performance liquid chromatography (HPLC), which is highly accurate and reliable. However, the assay is imprecise at values <0.4% of total hemoglobin, a range that includes a large proportion of the non-anemic population. Thus, in the normal population, FC, which can be quantified with high sensitivity by immunofluorescence using an anti-γ globin antibody, provide a much better measure of the trait (24).
In 1985, a polymorphism (C/T at position −158 of HBG2, later termed Xmn1-HBG2 or rs7482144) (Fig. 2A) was identified from re-sequencing of the HBG genes (25), and shown to promote the expression of HBG2, and to contribute to HbF variability. Subsequent independent studies confirmed the association between the Xmn1-HBG2 T allele and increased HbF and FC, as well as milder disease among individuals with SCA and β thalassemia from different population groups (9,26). The T allele was also shown to be associated with the Swiss-type HPFH (27).
Statistical analyses have found no evidence for dominance at the locus, suggesting an additive effect of the Xmn1-HBG2 T allele. In a non-anemic North European population, the Xmn1-HBG2 genotype was estimated to account for 13–32% of the total F-cell phenotypic variation (28). The QTL tagged by Xmn1-HBG2 does not show Mendelian segregation, and the distributions of the trait within each genotype class overlap considerably (20). Presence of the ‘T’ allele does not always dictate a high HbF phenotype, and high HbF has been associated with β haplotypes that do not include this allele (27,29,30). The trait is genetically heterogeneous: to produce a full high-HbF phenotype, the Xmn1-HBG2 T allele must be present on a genetic background that carries additional factors. The Xmn1-HBG2 site derives its importance from its large impact on the trait variance and its high frequency (~30%) in most population groups, including Europeans, Africans and Asian Indians.
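The additive, no-dominance behaviour and the overlap between genotype classes described above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of the real data: only the ~30% allele frequency comes from the text, while the per-allele effect, the noise level and the function names are hypothetical. Under additivity the expected fraction of variance explained is 2pq·a²/(2pq·a² + σ²), about 0.30 for these made-up values.

```python
import random
import statistics

def simulate_additive_qtl(n, p, effect, noise_sd, seed=1):
    """Toy trait controlled by one biallelic locus with purely additive effects.

    Genotypes are drawn in Hardy-Weinberg proportions with frequency p for the
    trait-raising allele; the trait is effect * dosage plus Gaussian noise, so
    there is no dominance term.
    """
    rng = random.Random(seed)
    dosages, traits = [], []
    for _ in range(n):
        dosage = int(rng.random() < p) + int(rng.random() < p)  # 0, 1 or 2 copies
        dosages.append(dosage)
        traits.append(effect * dosage + rng.gauss(0.0, noise_sd))
    return dosages, traits

def r_squared(x, y):
    """Share of trait variance explained by a linear fit on allele dosage."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Hypothetical parameters; p = 0.3 mirrors the ~30% allele frequency in the text.
dosages, traits = simulate_additive_qtl(n=20000, p=0.3, effect=1.0, noise_sd=1.0)
class_means = {
    g: statistics.fmean(t for d, t in zip(dosages, traits) if d == g) for g in (0, 1, 2)
}
# Overlap: fraction of non-carriers whose trait exceeds the heterozygote mean.
overlap = statistics.fmean(t > class_means[1] for d, t in zip(dosages, traits) if d == 0)
r2 = r_squared(dosages, traits)
print(class_means, round(overlap, 3), round(r2, 3))
```

A heterozygote mean close to the midpoint of the two homozygote means is what "no dominance" looks like; a dominance effect would pull it toward one of the homozygotes.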
Early studies suggested that high HbF determinants segregated independently of the HBB cluster in some families with β thalassemia and SCA; these families were often ascertained through probands with unexpectedly mild disease (11,31,32). In one such extended family of Asian-Indian origin, complex segregation analysis showed strong evidence for a major HbF gene inherited independently of the HBB cluster (11,33). Using a regressive model that included the major QTL genotype and the effects of age, β thalassemia and the Xmn1-HBG2 site, a genome-wide linkage analysis of the Asian-Indian kindred identified a major locus on chromosome 6q23–q24 (34). Follow-up haplotype mapping narrowed the candidate interval to 1.5 Mb (35), but re-sequencing of the five known genes in the region (ALDH8A1, HBS1L, MYB, AHI1 and PDE7B) revealed no mutation that could be implicated on the basis of functional significance and segregation within the family (36).
High-resolution association mapping was then carried out in a sample of North European ancestry (37). A set of common SNPs spanning a nearly contiguous segment of ~79 kb within HBS1L and the intergenic region between HBS1L and the MYB oncogene (5′ to HBS1L) showed very strong association with F cell levels (p ~ 10⁻⁷⁵ at the most significantly associated SNP) (37). The SNPs were distributed in three linkage disequilibrium blocks, referred to as HBS1L-MYB intergenic polymorphism (HMIP) blocks 1, 2 and 3. Common alleles within the three haplotype blocks completely account for the FC variance attributed to the 6q23 QTL in the European sample, with block 2 (24 kb) (Fig. 2B) showing the strongest effect and accounting for the majority of the variance. In Northern Europeans, the 6q locus accounts for ~19% of the population trait variance.
By early 2006, developments in genetic tools and genotyping platforms, expedited by the International Human HapMap Project, had dramatically extended the scope of genetic association studies, and the genome-wide association study (GWAS) had replaced the genome-wide linkage study as the most popular agnostic approach to whole-genome analysis (38–40). Two GWAS of HbF/FC have been reported. The first used a selected genotyping design, targeting 179 individuals with contrasting extreme FC values (above the 95th or below the 5th percentile), chosen from a phenotyped cohort of 5184 individuals (41). Not only did this study identify the γ globin gene region (with the strongest signal at the Xmn1-HBG2 site) and the chromosome 6 locus, it also found a new F-cell locus in intron 2 of the oncogene BCL11A on chromosome 2p16 (Fig. 2C). All SNPs implicated at this locus are common polymorphisms (minor allele frequency >10%). In Northern Europeans, the BCL11A locus accounts for 15.1%, the 6q locus for 19.4% and the γ globin region for 10.2% of the F-cell variability (41).
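Because the three loci lie on different chromosomes and no statistical interaction between them has been found, a first approximation of their joint contribution is the sum of the individual shares. The short sketch below does that arithmetic on the Northern European figures quoted above; treating the shares as strictly additive is an assumption, not a result reported in (41).

```python
# Shares of F-cell variance in Northern Europeans, as quoted in the text (41).
variance_share = {
    "gamma globin region (Xmn1-HBG2)": 0.102,
    "HBS1L-MYB intergenic region (6q23)": 0.194,
    "BCL11A (2p16)": 0.151,
}

# Summing assumes the loci contribute independently and additively.
combined = sum(variance_share.values())
unexplained = 1.0 - combined
print(f"combined: {combined:.1%}, unexplained: {unexplained:.1%}")
```

Under this assumption the three loci together account for roughly 45% of F-cell variability, near the upper end of the 20–50% range, leaving a little over half unexplained.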
Statistical genetic theory predicts that association studies using the selected genotyping approach will be more powerful and cost-effective than a design based on unselected individuals (42). A criticism of this design has been that it is most powerful for identifying genotypes underlying rare, extreme trait values, which may differ from the genotypes responsible for normal variation in the trait. This criticism was put to rest by the second GWAS, of 4000 individuals from Sardinia who were not selected on HbF level, which replicated significant association at the same three loci (43). The fact that both approaches, using FC or HbF as the quantitative trait, identified the same set of three major loci is an additional argument that HbF and F cell levels are closely related traits.
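The logic of the selected (extreme-phenotype) design can be sketched with a simulation: individuals in the two tails of the trait distribution are enriched for opposite alleles at an additive QTL, so genotyping only the tails captures much of the signal. The cohort size (5184) and the 5th/95th percentile cut-offs follow the text; the allele frequency, effect size and function names are hypothetical, and the sketch keeps every individual in the tails, whereas the study genotyped 179 individuals chosen from them.

```python
import random

def tail_indices(traits, low_pct=5.0, high_pct=95.0):
    """Indices of individuals below the low and above the high percentile."""
    ranked = sorted(traits)
    n = len(ranked)
    low_cut = ranked[int(n * low_pct / 100)]
    high_cut = ranked[int(n * high_pct / 100)]
    low = [i for i, t in enumerate(traits) if t < low_cut]
    high = [i for i, t in enumerate(traits) if t > high_cut]
    return low, high

def allele_frequency(dosages, indices):
    """Frequency of the trait-raising allele among the selected individuals."""
    return sum(dosages[i] for i in indices) / (2 * len(indices))

rng = random.Random(7)
p, effect = 0.3, 0.5  # hypothetical allele frequency and per-allele effect (SD units)
dosages = [int(rng.random() < p) + int(rng.random() < p) for _ in range(5184)]
traits = [effect * g + rng.gauss(0.0, 1.0) for g in dosages]

low, high = tail_indices(traits)
everyone = range(len(dosages))
print(len(low), len(high))
print(round(allele_frequency(dosages, low), 3),
      round(allele_frequency(dosages, everyone), 3),
      round(allele_frequency(dosages, high), 3))
```

The allele-frequency gap between the two tails is far wider than the gap between either tail and the whole cohort, which is where the per-sample efficiency of the design comes from.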
The influence of the Xmn1-HBG2 variant (rs7482144) could be explained by a direct effect on HBG2 expression, supported by the relative increase of Gγ globin in carriers; however, unlike for the non-deletional variants associated with the Mendelian forms of HPFH, in vitro functional studies of this variant have been inconclusive.
Of the three major QTLs, SNPs within the 14 kb intron 2 of BCL11A correlate most strongly with HbF expression (41,43–45). The BCL11A genotype associated with high HbF is associated with reduced BCL11A expression (46). BCL11A is highly expressed in erythroid progenitors; the shorter isoforms appear to be restricted to primitive erythroblasts and the full-length isoforms to adult-stage erythroblasts (46). More recent studies in bcl11a−/− mice carrying a human β globin locus YAC transgene showed that, in the absence of BCL11A, developmental silencing of the human γ globin genes is markedly impaired in the definitive erythroid lineage. Relaxation of HBG silencing in bcl11a+/− mice suggests not only that BCL11A is a developmental-stage-specific repressor but also that its effect is quantitative (47). The exact molecular mechanisms by which BCL11A represses HBG expression in humans are still unclear, but studies in K562 cells suggest that BCL11A binds to a core motif (GGCCGG at positions −56 to −51) in the HBG promoter to form a repressor complex (48).
The variants most strongly associated at the 6q QTL reside in a 24 kb non-protein-coding region between HBS1L and the MYB oncogene (HMIP block 2) (37). Recent studies show that HMIP block 2 contains a distal regulatory locus, evidenced by several prominent erythroid-specific GATA-1 signals that coincide with DNaseI hypersensitive sites, and by the presence of intergenic transcripts in erythroid precursor cells (49). It has been suggested that the HMIP block 2 regulatory elements control MYB expression at a distance, which in turn influences erythroid differentiation and, indirectly, HbF levels. MYB is a quantitative trait gene (50): erythroid precursor cells from individuals with higher HbF and F cell levels have lower MYB expression, which is also associated with lower erythrocyte counts but higher erythrocyte volume, and with higher platelet counts. Further, genotype variability at HMIP block 2 has a pleiotropic impact on several types of peripheral blood cells in healthy individuals of European ancestry, affecting erythrocyte count and volume, hemoglobin concentration, and platelet and monocyte counts (51).
Thus, the QTLs could influence HbF expression through two plausible mechanisms: (1) a direct effect on HBG expression (activation or repression of HBG transcription), increasing or decreasing the amount of HbF per cell; and (2) alteration of the kinetics of erythroid maturation and differentiation, mimicking stress erythropoiesis, in which accelerated erythropoiesis releases more erythroid progenitors that synthesize predominantly HbF (i.e. FC), leading to an increase in circulating HbF [Fig. 3; (52,53)].
The influence of the HBB locus and the Xmn1-HBG2 site on HbF levels in SCA and β thalassemia has been validated by many studies in several populations.
In the Asian-Indian family in which the chromosome 6q23.3 locus was first identified, the QTL affects individuals both with and without β thalassemia (33). Among β thalassemia heterozygotes, those homozygous for the 6q23.3 high-F-cell QTL allele had HbF values ranging from 10 to 24%, compared with 0.3–3.6% in individuals homozygous for the alternative allele. In family members without the β thalassemia mutation, high-F-cell 6q23.3 QTL homozygotes had 1.1–3.0% HbF compared with 0.1–1.0% in homozygotes for the alternative allele. Systematic studies of other groups have since shown that the 6q QTL is also important in healthy individuals, in African American (44) and British sickle cell patients of predominantly African admixed ancestry (54), and in sickle cell patients from Brazil (44). The chromosome 6q23.3 QTL contributes 3–7% of the trait variance in these populations (44). The comparatively small contribution of the QTL to trait variance in sickle cell patients, compared with Northern Europeans, could be attributed to lower allele frequencies in African populations (55). Strong association with the locus was also shown in Chinese β thalassemia heterozygotes, despite the confounding influence of variable ineffective erythropoiesis arising from the wide spectrum of thalassemia mutations (56).
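The explanation offered above, that lower allele frequencies shrink a locus's share of trait variance, follows from standard quantitative genetics: under Hardy-Weinberg equilibrium an additive biallelic locus with allele frequency p and per-allele effect a contributes 2p(1 − p)a² to the trait variance. The sketch below holds the effect size and residual variance fixed at hypothetical values and varies only p; it illustrates the principle and is not an estimate for any real population.

```python
def additive_variance_share(p, a, residual_var):
    """Share of trait variance due to one additive biallelic locus.

    Assumes Hardy-Weinberg equilibrium, so the locus contributes
    V_A = 2 * p * (1 - p) * a**2, plus independent residual variance.
    """
    v_a = 2.0 * p * (1.0 - p) * a * a
    return v_a / (v_a + residual_var)

# Hypothetical per-allele effect and residual variance; only p changes.
shares = {
    p: additive_variance_share(p, a=1.0, residual_var=2.0)
    for p in (0.05, 0.10, 0.20, 0.30)
}
for p, share in shares.items():
    print(f"p = {p:.2f}: {share:.1%} of trait variance")
```

With these made-up values the same per-allele effect explains about 17% of the variance at p = 0.3 but under 5% at p = 0.05; the share rises with p up to p = 0.5, where 2p(1 − p) peaks.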
The BCL11A QTL has shown the strongest effect on HbF/F cell levels to date. In patients with SCA, it accounts for 7–12% of the trait variance (44,45). The locus has also been shown to influence HbF or F cell levels in individuals of Chinese and Thai descent with β thalassemia or HbE (45). In the Sardinian population, the C allele of the BCL11A SNP rs11886868 is strongly associated with high HbF levels and is significantly more frequent in patients with milder β thalassemia (thalassemia intermedia) than in transfusion-dependent thalassemia major patients (43). Both groups had identical β and α globin genotypes, and their Hb comprised only HbF and trace amounts of HbA2 with no HbA, suggesting that variation at the BCL11A locus can alleviate disease severity by raising HbF levels.
In SCA, the three known HbF/F cell loci together contribute >20% of the HbF variance, and the HbF-raising alleles are associated with a corresponding reduction in the frequency of acute pain episodes caused by sickling (44).
High HbF expression clearly has ameliorating effects on sickle cell disease and β thalassemia; however, it is unknown whether these effects translate into a selective advantage for carriers of high-HbF alleles in regions with a high incidence of the diseases. Genetic studies have shown that individuals with hemoglobinopathies and concurrent high HbF expression can maintain normal fitness levels that would otherwise be severely limited by the debilitating consequences of their disease. The hemoglobinopathies are not present in the ancestral European population, yet the alleles at the known major genes remain at high frequency and variation in HbF expression persists. Strong evidence for natural selection has not been reported around the three major HbF/F cell QTLs (57), and similar allele frequencies have been observed for the QTLs in different ethnic groups, suggesting that a selective sweep favoring alleles for HbF persistence has not occurred in the relatively recent past. It is possible that the pleiotropic effects of the major HbF genes (for example, the effect of HMIP on other hematological variables) (51) ensure the persistence of these alleles in the absence of strong selection from the hemoglobinopathies. After the transition to adult hemoglobin expression, there is no known biological role for HbF. Genetic studies of HbF/FC in diverse populations will allow more powerful analysis of the evolutionary history of the alleles at the known QTLs and a better understanding of their biological importance.
Identification of exactly the same set of three major loci in two independent GWASs of Europeans suggests that we have identified the principal QTLs with frequent alleles affecting HbF production in the general European population. The genetic variance that remains unaccounted for may be due to many common and/or rare alleles with small effects on the trait, i.e. a polygenic component (58). What distinguishes the genetics of HbF expression from other equally studied quantitative traits (such as height, blood pressure and BMI) is the existence of common major genes; with those genes now identified, the remaining genetics of HbF expression is likely to be as complex and difficult to define as that of other quantitative traits (39,59–61). Identifying the remaining common alleles will require large population samples for additional screening studies, even larger samples for replication, and possibly denser maps (62). Identifying rare alleles will also require next-generation DNA sequencing technologies and statistical methods that enable genome-wide sequence analysis in large samples (63). Identification of the chromosome 6q23.3 QTL in the Asian-Indian kindred (34) illustrates the power of large genome-wide linkage studies to uncover rare alleles with moderately high penetrance that association studies would miss because of their low frequencies. Functional characterization of the known QTLs should lead to the discovery of biological pathways underlying HbF expression and to new candidates for genetic analysis. With next-generation DNA sequencing methods coming into widespread use, deep sequencing of the QTL regions in targeted samples could simplify the identification of the causal variants (quantitative trait nucleotides, QTNs).
No statistical interactions between the known QTLs have been found; however, genetic interaction may still exist between them through a biological mechanism that is not detectable by statistical interaction models. Statistical evidence for interaction between the Xmn1-HBG2 locus and a locus on chromosome 8q has been reported, but validation and replication of the finding have been elusive (64). If genetic interactions between currently unknown loci with weak marginal effects significantly affect HbF expression, then traditional linkage and association studies are unlikely to be productive without greater knowledge of the underlying biology. It also remains possible that major QTLs for the HbF trait exist in non-European populations that have not yet been subjected to genome-wide analysis. Given that a primary motivation for studying the trait is its clinical impact on β thalassemia and sickle cell disease, genome-wide association and linkage studies should be undertaken in populations with a high prevalence of these disorders.
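A statistical interaction between two loci is commonly tested by adding a product term to an additive regression, trait = b0 + b1·g1 + b2·g2 + b12·g1·g2 + error, and asking whether b12 differs from zero. The sketch below fits that model by ordinary least squares on simulated genotypes, once without and once with a hypothetical interaction; the allele frequency, effect sizes and helper names are all assumptions. A model of this form detects only departures from additivity on the scale on which the trait is measured, which is one possible reason a biological interaction could go unnoticed.

```python
import random

def ols(rows, y):
    """Ordinary least squares by solving the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    m = [row + [v] for row, v in zip(xtx, xty)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda row_idx: abs(m[row_idx][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(k):
            if r != c:
                factor = m[r][c] / m[c][c]
                m[r] = [a - factor * b for a, b in zip(m[r], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]

def simulate_two_loci(n, interaction, seed, p=0.3, effect=0.5):
    """Hypothetical trait: additive effects at two unlinked loci plus an optional product term."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        g1 = int(rng.random() < p) + int(rng.random() < p)
        g2 = int(rng.random() < p) + int(rng.random() < p)
        rows.append([1.0, g1, g2, g1 * g2])  # intercept, two dosages, product term
        y.append(effect * (g1 + g2) + interaction * g1 * g2 + rng.gauss(0.0, 1.0))
    return rows, y

# Index 3 of the coefficient vector is the interaction estimate b12.
b12_additive = ols(*simulate_two_loci(20000, interaction=0.0, seed=3))[3]
b12_epistatic = ols(*simulate_two_loci(20000, interaction=0.4, seed=3))[3]
print(round(b12_additive, 3), round(b12_epistatic, 3))
```

In practice b12 would be judged against its standard error; the sketch only shows that the estimate sits near zero when the simulated loci act additively and near the simulated value when they do not.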
The story of HbF genetics serves as a paradigm for understanding quantitative traits in humans: their genetics, their biology and their involvement in disease. Inappropriately high HbF (α2γ2) expression caused by rare mutations affecting the HBB cluster indicates that the γ globin gene can be reactivated, but it does not explain the inter-individual variation in HbF, a major factor contributing to the severity of sickle cell disease and β thalassemia.
Common variation in HbF has all the characteristics of a quantitative genetic trait, associated with multiple QTLs, some linked and others unlinked to the HBB cluster. The first known QTL, Xmn1-HBG2, had long been implicated through family and population studies and was subsequently validated by genetic studies. The second QTL, the HBS1L-MYB intergenic region (HMIP) on chromosome 6q, was discovered through genetic linkage studies in an extensive kindred and subsequently validated by genetic association studies. These two QTLs have now been joined by BCL11A, discovered through GWAS. GWAS not only ‘rediscovered’ the known QTLs but also highlighted BCL11A, previously known as an oncogene involved in leukemogenesis, whose relevance to the HbF trait had been unsuspected. Biological studies of these recently identified QTLs provide data supporting long-held views on the mechanisms of hemoglobin switching: changes in the trans-acting factor environment and perturbation of the kinetics of erythropoiesis. These mechanisms, however, may not be mutually exclusive. The challenge now is to delineate the causal variants (QTNs) and the physiological pathways involved, to facilitate identification of novel targets for therapeutic reactivation of HbF. True understanding of how the QTLs and QTNs function requires integration of these genetic mapping results with expression patterns in the erythroid cell and knowledge of the causal network of interacting genes. Unlike most other complex traits and diseases, HbF variation has a relatively large proportion of its genetic contribution explained by only three QTLs. An immediate impact of these findings may therefore be the use of SNPs in the three QTLs to improve prediction of an individual's ability to produce HbF in response to disease, which will have implications for genetic counseling and prenatal diagnosis of these hemoglobin disorders.
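Prediction from the three QTLs could most simply take the form of an additive genetic score: a weighted count of HbF-raising alleles across the loci, consistent with the absence of detected statistical interactions between them. The sketch below is hypothetical throughout: the locus labels stand in for whichever tag SNPs are used (for example rs7482144 at Xmn1-HBG2 or rs11886868 at BCL11A, both mentioned above), and the weights are placeholders that would have to be estimated in the population of interest.

```python
# HYPOTHETICAL per-allele weights; real weights would come from effect
# estimates in the target population.
HYPOTHETICAL_WEIGHTS = {"Xmn1-HBG2": 0.3, "HMIP block 2": 0.4, "BCL11A": 0.5}

def hbf_genetic_score(allele_counts, weights=HYPOTHETICAL_WEIGHTS):
    """Additive score: sum over loci of weight * number of HbF-raising alleles."""
    missing = set(weights) - set(allele_counts)
    if missing:
        raise KeyError(f"no genotype for: {sorted(missing)}")
    score = 0.0
    for locus, weight in weights.items():
        count = allele_counts[locus]
        if count not in (0, 1, 2):
            raise ValueError(f"{locus}: allele count must be 0, 1 or 2, got {count}")
        score += weight * count
    return score

example = {"Xmn1-HBG2": 1, "HMIP block 2": 2, "BCL11A": 0}
print(hbf_genetic_score(example))
```

Failing loudly on a missing or impossible genotype is a deliberate choice: silently scoring a missing locus as zero would bias the prediction downward.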
S.L.T. received funding from NIH-NHLBI (R01-HL69259-03), and the Medical Research Council, UK (MRC UK G0000111) for this work.
We thank Claire Steward for help in preparation of the manuscript.
Conflict of Interest statement. None declared.