In the case of globin gene disorders, many variants were conventionally reported in genetics journals, and these identified and/or elucidated many mechanisms underlying key aspects of gene regulation in cis (e.g. promoters, enhancers, silencers, mRNA processing signals, translational signals) and in trans (e.g. transcription factors, chromatin remodeling factors, protein chaperones). Furthermore, these variants helped to establish the mechanisms underlying human genetic disease. Implementation of the microattribution approach has significantly added to the repository of variants, and this expanded database will continue to provide an important resource for generating and testing new hypotheses in the globin field (below, we provide some recent examples illustrating the value of comprehensive datasets in this system). The value of the comprehensive globin variant database (pre- and post-microattribution) underscores the importance of developing similar databases for other genes and disease systems, for which microattribution will become the main route to publication.
The first example of the value of the microattribution approach is the finding that the distribution of promoter mutations differs among the globin genes. Although a great deal has been learnt about mammalian promoters from previous analysis of the globin genes, additional variants continue to extend our knowledge of how these promoters are normally activated and how they are altered in human genetic disease. Globin gene promoter mutations contributing to β-like thalassemias and HPFH comprise approximately 10% of the total variants and result in a range of phenotypes, from the asymptomatic non-deletional HPFH conditions to the mild forms of β- and δ-thalassemia. The HBB promoter region harbours several genetic variants associated with β+ (expressing lower than normal levels of β-globin) and β0 (expressing no β-globin) thalassemia; these cluster in cis-regulatory elements known to bind transcription factors. Many of these have been published, but an increasing number of unpublished variants have been contributed to HbVar by investigators around the world. The unpublished variants provide a more complete view of the contribution of genetic variants to phenotype. In this particular case, they reveal the phenotypic consequences of variants at additional positions within well-known transcription factor binding sites (the “CACC” box and the “TATA” box), and show that additional substitutions in other binding sites contribute to phenotype (e.g. positions c.-80, c.-81 and c.-138). The HBB:c.-121C>T transition is adjacent to the CCAAT box. This motif was recognized 30 years ago as a component of some promoters, but the newly reported mutation is the first indication that genetic variation close to the motif affects HBB gene expression in humans.
In contrast to the HBB and HBD promoters, no variants are found in the first 100 bp of the HBG1 and HBG2 promoters; instead, variants occur in the upstream region from approximately -100 to -200 bp. The HBG1 and HBG2 promoters share several cis-regulatory elements with the HBB and HBD promoters, such as a “TATA” box and a proximal “CCAAT” box, but no variants have been found in these elements. However, the “CCAAT” box is duplicated in the HBG1 and HBG2 promoters, and the upstream CCAAT box (and nucleotides very close to it) does carry variants associated with HPFH. A newly discovered, unpublished variant, c.-250C>T, calls attention to a tight cluster of mutations, all associated with HPFH. An HPFH-associated variant has now been reported at each nucleotide from c.-251 to c.-248 (-198 to -195 relative to the transcription start site), and a variant at c.-255 (-202) is associated with a similar phenotype. Given these phenotypes, this cluster of variants within the motif CCCTTCCC delineates a response element important for silencing of the HBG1 and presumably HBG2 genes in adult erythroid cells (the same c.-250C>T mutation has been found in the promoter of the HBG2 gene; data not shown).
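The paragraph above uses two coordinate systems side by side: HGVS c. numbering (counted from the A of the ATG) and the traditional numbering relative to the transcription start site (TSS). A minimal Python sketch of the conversion follows. The 5'UTR length of 53 nt used for HBG1 is not stated in the text; it is inferred from the paired coordinates given above (c.-251 ↔ -198, c.-255 ↔ -202) and assumes the TSS (+1) lies at c.-53. The helper names are hypothetical, and other genes need their own 5'UTR length.

```python
def c_to_tss(c_pos: int, utr5_len: int) -> int:
    """Convert an upstream HGVS c. position (c.-N) to TSS-relative numbering.

    Assumes the TSS (+1) is the first nucleotide of a 5'UTR of length
    utr5_len, i.e. it sits at c.-utr5_len. Neither system has a position 0.
    """
    if c_pos >= 0:
        raise ValueError("only upstream (negative) c. positions are handled here")
    shifted = c_pos + utr5_len
    # Upstream of the TSS the offset is exact; at or downstream of the TSS,
    # skip over the missing position 0.
    return shifted if shifted < 0 else shifted + 1


def tss_to_c(tss_pos: int, utr5_len: int) -> int:
    """Inverse of c_to_tss for positions in the 5' flank or 5'UTR."""
    if tss_pos == 0 or tss_pos > utr5_len:
        raise ValueError("position is not in the 5' flank or 5'UTR")
    return tss_pos - utr5_len if tss_pos < 0 else tss_pos - utr5_len - 1


if __name__ == "__main__":
    HBG1_UTR5 = 53  # inferred from c.-251 <-> -198 in the text (assumption)
    print([c_to_tss(p, HBG1_UTR5) for p in range(-251, -247)])
    print(c_to_tss(-255, HBG1_UTR5))
```

With the inferred offset, the cluster c.-251 to c.-248 maps to [-198, -197, -196, -195] and c.-255 maps to -202, matching the values quoted in the text.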
Functional role of HBG1 and HBG2 promoter variants
To test the hypothesis, derived from the documented variants, that this motif delineates a response element important for silencing of the HBG1 and HBG2 genes, we recently produced human β-globin locus (β-YAC) transgenic mice carrying the c.-248C>G (-195 C>G) Brazilian HPFH mutation in the HBG1 gene, which alters the 3' C of the CCCTTCCC sequence. Adult mice display an HPFH phenotype with an increased number of HbF-containing cells, and real-time quantitative RT-PCR analyses demonstrated that one line shows an 8- to 34-fold increase in HBG1 gene expression relative to wild-type β-YAC mice. By comparison, -117 Greek HPFH β-YAC transgenic mice display a 56-fold increase in γ-globin gene expression relative to wild-type β-YAC mice. Future experiments will examine the mechanism of repression at this region. Recent studies have shown that the transcription factor BCL11A represses HBG1 expression in adult erythroid cells, acting together with the protein SOX6 13. Although BCL11A showed no binding in the HBG1 and HBG2 proximal promoters, SOX6 showed strong binding, which overlapped with GATA1 binding in these regions. In this way the database has posed a new testable hypothesis: the CCCTTCCC element, which is adjacent to a GATA binding site, may bind a currently unknown protein that acts in concert with BCL11A to repress production of γ-globins.
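The fold changes quoted above (8- to 34-fold, 56-fold) come from real-time quantitative RT-PCR, but the text does not state how they were calculated. Assuming the widely used comparative Ct (2^-ΔΔCt) method, with normalization to a hypothetical reference gene and ideal doubling per cycle, the calculation reduces to the Python sketch below. The function name and all Ct values are invented for illustration and are not data from these mice.

```python
def fold_change_ddct(
    ct_target_test: float,
    ct_ref_test: float,
    ct_target_ctrl: float,
    ct_ref_ctrl: float,
    amplification_factor: float = 2.0,
) -> float:
    """Relative expression by the comparative Ct (2^-ddCt) method.

    amplification_factor=2.0 assumes perfect doubling per cycle for both the
    target and the reference amplicon; real assays may deviate from this.
    """
    # Normalize the target gene to a reference gene within each sample group.
    d_ct_test = ct_target_test - ct_ref_test
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    # Compare the test group (e.g. mutant transgenic) with the control (e.g. wild type).
    dd_ct = d_ct_test - d_ct_ctrl
    return amplification_factor ** (-dd_ct)


if __name__ == "__main__":
    # Invented Ct values: relative to the reference gene, the target crosses
    # threshold 5 cycles earlier in the test sample than in the control.
    print(fold_change_ddct(20.0, 15.0, 25.0, 15.0))  # 32.0
```

Under this model an 8-fold increase corresponds to ΔΔCt = -3, and each further halving of template abundance shifts Ct by about one cycle, which is why fold-change ranges like those above can look wide for modest differences in Ct.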
Overall, comparative analysis of the globin gene promoter mutations revealed a distinct distribution pattern for each gene. In the HBD gene, promoter mutations are widely spread within the proximal promoter region and do not form mutational clusters around cis-regulatory elements. Interestingly, the mutations c.-81A>G and c.-80T>C have been found in the TATA boxes of both the HBB and HBD genes, suggesting that they could be the result of genetic recombination events 14.
A second example of the value of the microattribution approach was the discovery of α-thalassemia resulting from inherited or acquired mutations in the ATRX gene. The comprehensive database originally helped to identify and define some of the key trans-acting factors in the globin gene system, and the expanded database continues to refine our understanding of such factors. Unlike the common forms of α-thalassemia, which result from cis-acting genetic defects, two rare forms of α-thalassemia are caused by trans-acting mutations in the X-linked ATRX gene. Inherited mutations cause ATR-X syndrome, which is characterized by a severe form of syndromal mental retardation with characteristic dysmorphic facies, genital abnormalities, and a mild but variable form of hemoglobin H disease 3. In addition, acquired mutations in the ATRX gene are seen in patients who develop ATMDS, a condition in which α-thalassemia (AT) is associated with myelodysplastic syndrome (MDS) 4. In both conditions, the levels of α-globin mRNA are reduced, suggesting that the ATRX gene is involved in the normal regulation of α-globin gene expression. To date, 107 unique inherited and/or acquired disease-causing missense mutations have been found, located predominantly in two highly conserved domains of the ATRX protein (Supplementary Fig. 4). These variants cluster within a globular domain containing a plant homeodomain (PHD), which binds the N-terminal tails of histone H3, and within the seven helicase subdomains that identify ATRX as a member of the SNF2 family of chromatin-associated proteins. Structure/function studies based on natural mutations in the comprehensive database (Figure 4) have elucidated precisely how ATRX is recruited to some of its targets via an interaction with the N-terminal tails of histone H3.
Notably, the degree of α-thalassemia seen in ATMDS patients (acquired ATRX gene mutations) is much greater than in patients with ATR-X syndrome (inherited ATRX gene mutations), even when, as a comparison of mutations in the comprehensive database shows, the same ATRX mutation occurs in both conditions 15. Again, analysis of the comprehensive variant database poses a new testable hypothesis: these findings suggest that another component of the ATRX pathway may frequently be mutated in patients with the common forms of MDS.
A third example of the value of microattribution is the discovery of variants in KLF1 that lead to elevated HbF levels. KLF1 encodes a key erythroid transcriptional regulator with many target genes that have essential functions in erythroid cells, including the globins, membrane proteins and heme synthesis enzymes 17. The first report of KLF1 mutations in humans linked them to the rare blood group In(Lu) phenotype 18, in which expression of the Lutheran blood group antigens is diminished. The reported individuals carried eight different loss-of-function mutations and one mutation abolishing a GATA1 binding site in the KLF1 promoter. In all cases, the mutant KLF1 allele occurred together with a normal KLF1 allele. A subsequent study of a large Maltese pedigree demonstrated that haploinsufficiency for KLF1 causes HPFH 7. A mutation in KLF1, resulting in p.Lys288X, was present in all, and only, the members of this family with HPFH. This mutation removes the entire zinc finger domain and therefore abrogates DNA binding by the mutant KLF1 protein (Supplementary Table 2). The occurrence of HPFH in the individuals with In(Lu) had not previously been investigated. An analysis of archived blood samples from a number of these individuals showed that their HbF levels were raised compared with those in control samples. Also, 30 out of 31 Sardinian individuals bearing four different KLF1 mutations showed raised HbF levels compared with control samples. In addition, two individuals with dyserythropoietic anemia carried a KLF1 p.Glu325Lys alteration and had HbF levels of 40% (Supplementary Table 2). Mutations at this position alter the DNA binding specificity of KLF1. We note that the mouse neonatal anemia mutant (Nan) has an alteration in the orthologous amino acid of Klf1, p.Glu339Asp 21, 22. Adult heterozygous Nan animals show increased expression of embryonic globins, a condition akin to HPFH. Collectively, these data support the link between KLF1 and HPFH and highlight the importance of the second DNA-binding zinc finger for normal KLF1 function. This raises the possibility that KLF1 mutations that alter DNA binding specificity may have a greater impact on HbF levels. This hypothesis can now be tested experimentally, in vitro by DNA binding assays and in vivo in animal models.
Figure 3 Correlation of the different KLF1 gene variants deposited into HbVar (shown as blue and red squares, depicting unpublished and published information, respectively) and their corresponding HbF levels (median value in cases of three or more individuals).
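The Figure 3 caption summarizes HbF per KLF1 variant as a median when three or more individuals carry that variant. The Python sketch below illustrates that aggregation rule. How variants with fewer than three carriers were plotted is not stated in the caption, so returning their individual values is an assumption; the function name, variant labels and HbF values are placeholders, not HbVar data.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple, Union

Summary = Dict[str, Union[float, List[float]]]


def summarize_hbf(records: Iterable[Tuple[str, float]], min_n: int = 3) -> Summary:
    """Summarize HbF (%) per variant.

    Variants carried by at least `min_n` individuals are reduced to their median;
    for rarer variants the individual values are kept (an assumption, since the
    caption does not say how such cases were shown).
    """
    by_variant: Dict[str, List[float]] = defaultdict(list)
    for variant, hbf in records:
        by_variant[variant].append(hbf)
    summary: Summary = {}
    for variant, values in by_variant.items():
        summary[variant] = median(values) if len(values) >= min_n else sorted(values)
    return summary


if __name__ == "__main__":
    # Placeholder labels and values, not HbVar data.
    demo = [("variant_A", 2.0), ("variant_A", 4.0), ("variant_A", 3.0), ("variant_B", 10.0)]
    print(summarize_hbf(demo))
```

The median rather than the mean is the natural choice here because HbF levels among carriers of the same variant can be skewed by a few individuals with very high values.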
A final example of the value of microattribution is the discovery of hemoglobin variants. A large proportion of genetic variation in the human globin genes gives rise to hemoglobin variants. Most hemoglobin variants are rare, result from single amino acid substitutions in a globin chain, and have little or no effect on hemoglobin function.
The documented hemoglobin variants reside solely within exons and include: (a) Structural variants with a pleiotropic effect [e.g. HbS (HBB:c.20A>T), HbE (HBB:c.79G>A) and HbC (HBB:c.19G>A)]; (b) Variants leading to unstable hemoglobins (138 different variants), in which the mutations affect the heme pocket of the globin chain; (c) Variants leading to methemoglobinemia, in which the ferrous iron (Fe²⁺) of the heme group is oxidized to the ferric state (Fe³⁺); most of these involve replacement of one of the heme-anchoring histidine residues by tyrosine; (d) Variants with altered oxygen affinity (92 different variants), most of which increase oxygen affinity.
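The cDNA coordinates listed in (a) can be related to the familiar residue names (HbS and HbC at β6, HbE at β26) with simple arithmetic: HGVS counts the A of the ATG as c.1, so the initiator methionine is codon 1, whereas traditional hemoglobin numbering omits that methionine because it is removed from the mature chain. This is why HbS appears as p.Glu7Val in HGVS protein nomenclature but as β6 Glu→Val traditionally. The helper names in this Python sketch are hypothetical.

```python
from typing import Tuple


def codon_of(c_pos: int) -> Tuple[int, int]:
    """Return (codon number, position within codon) for a coding c. position.

    HGVS numbering counts the A of the ATG start codon as c.1, so the
    initiator methionine is codon 1.
    """
    if c_pos < 1:
        raise ValueError("position must lie in the coding sequence (c.1 or later)")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def traditional_residue(codon: int) -> int:
    """Traditional hemoglobin numbering omits the initiator methionine, which
    is removed from the mature globin chain, so it is one less than the codon."""
    if codon < 2:
        raise ValueError("codon 1 encodes the initiator Met, absent from the mature chain")
    return codon - 1


if __name__ == "__main__":
    for name, pos in (("HbS", 20), ("HbE", 79), ("HbC", 19)):
        codon, offset = codon_of(pos)
        print(f"{name}: c.{pos} -> codon {codon} (base {offset}), "
              f"residue β{traditional_residue(codon)}")
```

The same arithmetic places c.227, the HbF-Sardinia/HbF-Lesvos substitution, at the second base of codon 76, i.e. γ75 in traditional numbering.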
Although all of these correlations between structure and function have depended on the comprehensive database, new insights and questions continue to arise as new mutants are added to the repository, an effort that prompted the implementation of the microattribution process for hemoglobinopathies. Notably, 14 hemoglobin variants arise from the same mutation occurring on different α-globin gene paralogues 16; these genes evolved from a recent gene duplication and, as such, are subject to frequent gene conversion events. HbF-Sardinia and HbF-Lesvos provide another such example, involving the same mutation (c.227T>C) on the paralogous HBG1 and HBG2 genes, respectively 17.