Insights into the evolution of hemoglobins and their genes are an abundant source of ideas regarding hemoglobin function and regulation of globin gene expression. This article presents the multiple genes and gene families encoding human globins, summarizes major events in the evolution of the hemoglobin gene clusters, and discusses how these studies provide insights into regulation of globin genes. Although the genes in and around the α-like globin gene complex are relatively stable, the β-like globin gene clusters are more dynamic, showing evidence of transposition to a new locus and frequent lineage-specific expansions and deletions. The cis-regulatory modules controlling levels and timing of gene expression are a mix of conserved and lineage-specific DNA, perhaps reflecting evolutionary constraint on core regulatory functions shared broadly in mammals and adaptive fine-tuning in different orders of mammals.
A wide range of animals, vertebrate and invertebrate, use hemoglobins to transport oxygen, carrying it from lungs, gills, or other respiratory organs to peripheral tissues that need the oxygen for efficient metabolism. Hence it is natural to compare the structure and function of hemoglobin proteins between species both to explore adaptation and to discover aspects of biochemistry and physiology that are conserved. Comparative studies also have been conducted on the genes and gene clusters that encode the hemoglobins, revealing a rich history of gene duplications and losses as well as translocations. One motivation for comparative studies has been to use the insights from the evolutionary analyses to better understand mechanisms of gene regulation. Many human hemoglobinopathies result from inadequate expression of globin genes, and attempts to modulate globin gene expression are a fundamental approach to seek novel avenues to therapy. This article summarizes broad aspects of the evolution of hemoglobins and related globins, starting with the five basic types of globins found in humans, and then progressively focuses more specifically on hemoglobin gene clusters in vertebrates and eutherian mammals. Finally, the impact of these and related studies on hemoglobin gene regulation is discussed.
Hemoglobins were originally discovered as abundant proteins in red blood cells of mammals and other jawed vertebrates (gnathostomes) that bind and release oxygen reversibly. The major hemoglobin in adult humans, hemoglobin A, is a heterotetramer composed of two α-globin and two β-globin polypeptides, each with an associated heme group. These are encoded by the duplicated HBA1 and HBA2 genes and by the HBB gene, respectively (Fig. 1). Hemoglobins are produced only in erythroid cells, where they are the major protein. The multi-subunit hemoglobin binds oxygen cooperatively in the lungs, transports the oxygen through the blood, and releases it in peripheral tissues to support oxidative metabolism. The heme molecule contains an iron atom (Fe) in the reduced state (ferrous or +2 oxidation state), and five of the six coordination sites of the Fe are occupied, four by the porphyrin ring and one by the “proximal” histidine (His) of the surrounding globin polypeptide. The sixth coordination site is bound reversibly by oxygen. Oxygen is loaded onto hemoglobin, transported, and unloaded with no chemical (covalent) change to either the oxygen or the heme groups, and the iron in the heme group stays in the reduced state. This contrasts with the familiar role of many heme proteins, such as cytochromes, which catalyze redox reactions involving changes in the oxidation state of the heme iron.
Myoglobin (encoded by the MB gene) is a related, monomeric heme-bound globin protein found predominantly in skeletal and heart muscle. The Fe in the heme is pentacoordinate. It has long been described as an oxygen storage protein, and it facilitates diffusion of oxygen to the mitochondria (Wittenberg and Wittenberg 1987).
Additional heme-containing globins were discovered by mining the wealth of information in the sequence of the human genome and transcripts produced from it. Cytoglobin, encoded by the CYGB gene (Fig. 1), is found in many tissues (Burmester et al. 2002; Trent and Hargrove 2002), in sharp contrast to the stringently tissue-specific expression pattern of hemoglobin and myoglobin genes. The most distantly related globin found in the human genome is neuroglobin, encoded by NGB (Burmester et al. 2000). Its mRNA is abundant in brain tissue but also is present in many other tissues. It is related to invertebrate nerve globins, indicating that an ancestral gene was present before the divergence of vertebrates and invertebrates more than 800 million years ago (Fig. 1). In contrast to the pentacoordinate heme complex in hemoglobins and MB, heme forms a hexacoordinate complex with both NGB and CYGB, having two His residues, termed proximal and distal, coordinated with the Fe. Ligands such as oxygen and nitric oxide compete with the distal His for binding, but despite this, NGB and CYGB still have high affinity for the ligands. Both these hexacoordinate heme globins have been implicated in nitric oxide metabolism, with CYGB showing nitric oxide dioxygenase activity, converting nitric oxide to nitrate (Oleksiewicz et al. 2011), and NGB showing nitrite reductase activity to form nitric oxide (Tiso et al. 2011). The latter activity also has been shown for myoglobin (Hendgen-Cotta et al. 2008) and deoxy-hemoglobin (Gladwin and Kim-Shapiro 2008). Physiologically, the nitrite reductase activity could provide a means to produce nitric oxide under hypoxic conditions, signaling from which could regulate mitochondrial respiration and protect tissues (nerves by NGB, heart muscle by MB) from damage under ischemic conditions (Dietz 2011). A role for CYGB in oxygen-requiring reactions, such as hydroxylation, has not been ruled out (Fago et al. 2004).
These proposed enzymatic roles in nitric oxide and other metabolism may harken back to functions performed by ancestral hemoglobins in primordial life (Hardison 1998, 1999; Tiso et al. 2011).
The five types of globin genes listed in Figure 1 are located on five different chromosomes: HBA1 and HBA2 at chromosomal position 16p13.3, HBB at 11p15.4, MB at 22q12.3, CYGB at 17q25.1, and NGB at 14q24.3. MB, CYGB, and NGB are present as single-copy genes, whereas HBB and HBAs are in clusters with multiple related genes. All of the genes consist of at least three exons separated by two introns. Although the introns differ dramatically in size, they are in homologous locations. The CYGB and NGB genes each have an additional exon. The conservation of intron position in vertebrate globin genes has been proposed to facilitate the shuffling of exons during protein evolution (Gilbert 1978). However, intron positions differ considerably in globin genes outside vertebrates, suggesting that the conservation of intron position could simply reflect an ancestral state that has not changed over vertebrate evolution (Hardison 1998).
Species in an early diverging branch of vertebrates, the cyclostomes (represented by hagfish and lampreys), also use a heme-containing globin for oxygen transport, but surprisingly, it is more closely related to CYGB than to the gnathostome hemoglobins (Fig. 1) (Hoffmann et al. 2010). This suggests that the oxygen transport function of heme-containing globins arose by independent, convergent evolution in the two major branches of vertebrates. For cyclostomes, it appears that oxygen transport is derived by cooption of the widely expressed CYGB gene.
In all jawed vertebrates, erythrocytes produced at distinct developmental stages contain different forms of hemoglobin. All species examined make embryonic-specific hemoglobins in primitive erythroid cells derived from the yolk sac, some species make a fetal-specific form in the liver, and all species produce an “adult” hemoglobin in erythroid cells produced in the bone marrow (Maniatis et al. 1980; Karlsson and Nienhuis 1985). Like the major adult hemoglobin A, each of these is a heterotetramer of two α-like globins and two β-like globins, each bound by heme. The genes encoding the α-like globins are paralogous, meaning that they are homologous genes generated by gene duplication. ζ-globin is made in embryonic red cells, and α-globin is produced in fetal and adult red cells (Fig. 2) (Higgs et al. 2005). Likewise, the paralogous β-like globin genes are also expressed at progressive stages of gestation. In humans, ε-globin is made in embryonic erythrocytes, γ-globins are produced in fetal liver erythroid cells, and the δ- and β-globins are made in erythroid cells from adult bone marrow (Grosveld et al. 1993). The hemoglobins produced at distinct developmental stages have different affinities for oxygen and are subject to complex regulation by cofactors, favoring an overall movement of oxygen from the maternal bloodstream to that of the fetus or embryo.
The multiple, developmentally regulated genes in the gnathostome α-globin gene clusters are derived from a common ancestral gene cluster (Flint et al. 2001). However, the β-like globin genes in mammals are more similar to each other than they are to the multiple β-like globin genes in birds (Hardison and Miller 1993; Reitman et al. 1993). This implies that the β-like globin gene clusters were generated by independent gene duplications in the bird and mammal lineages. Because differential regulation during development is a consistent property of these independently derived gene clusters, either an ancestral developmental regulatory mechanism was enforced on the newly duplicated genes or the mechanism evolved by convergence. Regulatory mechanisms are complex, and as is discussed in the last section, the current mechanisms are combinations of conserved and acquired features.
Expression of α-like and β-like globin genes must be strictly coordinated. A balanced production of α-globin and β-globin in erythroid cells is required for the efficient formation of hemoglobin, and an imbalance leads to the pathological phenotypes of inherited anemias called thalassemias (Weatherall and Clegg 2001). The separation of α-like and β-like globin gene clusters in amniotes requires coordination of expression between different chromosomes.
Fish species show an interesting contrast: the gene cluster that is orthologous to the mammalian α-globin gene cluster (orthologs being homologs generated by a speciation event) contains both α-like and β-like globin genes (Fig. 2). Some of the genes in the larger globin gene cluster in fish are expressed in larvae, and others are expressed in adults (Chan et al. 1997). Thus within this fish globin gene cluster, genes are regulated coordinately (balancing α-globin and β-globin synthesis) and differentially during development (larval vs. adult).
As just discussed, differential expression of paralogous globin genes within a cluster and coordinated regulation between gene clusters on different chromosomes are consistent properties in amniotes (birds and mammals). The unfolding story regarding how this arose during vertebrate evolution is dynamic and complex. Analysis of the maps of globin genes and surrounding genes in contemporary vertebrate species suggests a model featuring movement to new locations or differential retention of globin genes but still leading to multiple hemoglobin gene clusters in most if not all vertebrates examined (Fig. 2).
Genes diagnostic for a particular genomic region can be found flanking hemoglobin gene clusters (Bulger et al. 1999; Flint et al. 2001; Gillemans et al. 2003). The diagram in Figure 2 focuses on single-copy, flanking diagnostic genes for clarity (Hardison 2008). One globin gene cluster is found in all gnathostomes examined; it is flanked on one side by the genes MPG and NPRL3 (Flint et al. 2001), and the locus can be called “MN,” the acronym for these two genes (Fig. 2). The major DNA region regulating expression of the globin genes (MRE) is located in an intron of NPRL3 (Higgs et al. 1990). Frequently, the gene RHBDF1 is adjacent to the MPG gene. In contrast to placental mammals and chickens, which have only α-like globin genes at the MN locus, the orthologous loci in the monotreme platypus and in marsupials have a set of α-like globin genes plus a globin gene related to β-globin, the ω-globin gene (Wheeler et al. 2004; Patel et al. 2008). In addition, the platypus MN locus contains a homolog to the globin Y gene (GBY), a globin discovered in amphibians. Direct molecular cloning from the genome of the frog Xenopus laevis (Jeffreys et al. 1980) and examination of the genome assembly of Xenopus tropicalis (Fuchs et al. 2006; Hellsten et al. 2010) reveal a different β-globin gene linked to several α-globin genes at the MN locus. Given the presence of globin genes at this locus in all gnathostomes examined, one can infer with considerable confidence that the MN locus contained globin genes in the last common ancestor (LCA) of vertebrates (Fig. 2).
A second locus contains α- and β-globin genes in the pufferfish Fugu rubripes (Gillemans et al. 2003), and examination of the genome assemblies of zebrafish and Medaka shows a similar arrangement (Fig. 2). The globin genes in this locus are flanked by the genes LCMT1 and AQP8, and the locus can be called “LA.” The gene ARHGAP17 is also part of this locus in many species. These three nonglobin genes are in the same arrangement and order in the tetrapods (human, platypus, chicken, and frog), but the LA locus is devoid of globin genes in these species. This suggests two different models for this locus in the LCA of jawed vertebrates. One model posits that the LCA had globin genes at the LA locus (Gillemans et al. 2003), and these globin genes were retained in fish but lost in tetrapods. The converse posits that the globin genes were not present at the LA locus of the LCA, but moved into it during the lineage to fish.
A third locus contains only β-like globin genes in amniotes. The β-like globin genes in amniotes are flanked by olfactory receptor (OR) genes (Bulger et al. 1999; Patel et al. 2008). In placental mammals, hundreds of OR genes are in this locus, along with additional multigene families such as TRIM genes. Thus one has to look several megabases away from the β-like globin genes to find single-copy genes that are distinctive for this locus, which are DCHS1 on one side and STIM1 on the other. Hence this locus can be called DS (Fig. 2); the RRM1 gene is adjacent to STIM1 in many species. The presence of β-like globin genes in the DS locus in amniotes but absence in both fish and amphibians is most easily explained by transposition of the β-like globin genes into the DS locus in the stem amniote (Fig. 2) (Patel et al. 2008). A proposal that they were present at the DS in the LCA of jawed vertebrates also requires independent deletions in the fish and amphibian lineages; thus, parsimony favors the transposition model. One possible source for the β-like globin genes could be the MN locus (Patel et al. 2008), but it also could be from the LA locus (Hardison 2008).
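The parsimony argument for the DS locus can be made concrete with a small event count. The sketch below is illustrative only: the presence/absence states follow the text (β-like globin genes at DS in amniotes, none in fish or frog), but the tree shape is a simplified assumption, gains and losses are given equal weight (also an assumption), and the names `DS_STATE`, `TREE`, and `min_changes` are hypothetical.

```python
import math

# 1 = beta-like globin genes present at the DS locus, 0 = absent (per the text).
DS_STATE = {"fish": 0, "frog": 0, "chicken": 1, "platypus": 1, "human": 1}

# Simplified rooted tree as nested pairs; leaves are species names (assumed shape).
TREE = ("fish", ("frog", ("chicken", ("platypus", "human"))))


def min_changes(node, states):
    """Return {state: fewest gains/losses in this subtree if node has that state}."""
    if isinstance(node, str):
        # A leaf can only carry its observed state.
        return {s: (0 if s == states[node] else math.inf) for s in (0, 1)}
    child_costs = [min_changes(child, states) for child in node]
    # For each state at this node, each child picks its cheapest state,
    # paying one event whenever the state changes along the branch.
    return {
        s: sum(min(c[t] + (s != t) for t in (0, 1)) for c in child_costs)
        for s in (0, 1)
    }


costs = min_changes(TREE, DS_STATE)
print("events if globin genes were absent from DS in the LCA: ", costs[0])
print("events if globin genes were present at DS in the LCA:", costs[1])
```

Under these assumptions, absence in the LCA needs a single gain (in the stem amniote), whereas presence in the LCA needs two independent losses, which is the sense in which parsimony favors transposition.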
No globin genes have been mapped to the LA or DS loci in the current assembly of X. tropicalis, but one contig covers a cluster of β-like globin genes linked to RHBDF1 (Fig. 2). Further work is needed to ascertain whether this cluster is linked to the MN locus (Fuchs et al. 2006) or if they are on different chromosomes.
In summary, the history of the gene clusters encoding hemoglobins is dynamic and complex. The MN locus now contains only α-globin genes in eutherians; it retained these and non-globin flanking genes since the gnathostome LCA, while losing β-globin genes in many vertebrate lineages. β-like globin genes were acquired at the DS locus in the stem amniote, and subsequently they duplicated and acquired differential developmental expression independently in the avian and mammalian lineages. The LA locus has undergone dramatic losses or gains of globin genes.
The consistent location of α-like globin genes in the MN locus in gnathostomes indicates a more stable history than that of the β-like globin genes. This greater stability is also seen in the composition and expression patterns of the α-like globin genes. Extensive phylogenetic comparisons indicate that this gene cluster in the LCA of tetrapods contained orthologs to ζ-globin, μ-globin (also called αD), and α-globin (or αA) genes (Hoffmann and Storz 2007; Hoffmann et al. 2008), and this arrangement is still seen in chickens (Fig. 3). Before the divergence of the three major subclasses of mammals (monotremes, marsupials, and placentals), both the ζ-globin and α-globin genes duplicated. Most contemporary mammals retain at least two copies of these genes (in some cases, they are pseudogenes). The θ-globin gene appears to have been generated by a duplication of an α-globin gene after the divergence of monotremes from the other mammals (Hoffmann et al. 2008). The μ-globin and θ-globin genes each are present in only a single copy, and although they are transcribed, no evidence has been found for polypeptide products of either in mammals (Clegg 1987; Hsu et al. 1988; Leung et al. 1989; Goh et al. 2005; Cooper et al. 2006). The ortholog of the μ-globin gene is expressed in adult erythroid cells in birds, producing αD-globin.
The expression timing of the genes encoding α-like globins is remarkably consistent. In all species examined, including birds, the active orthologs to the ζ-globin gene are expressed in embryonic erythroid cells, and the orthologs of the α-globin gene are expressed in fetal and adult erythroid cells (Fig. 3) (Higgs et al. 1989; Whitelaw et al. 1990).
The α-like globin gene cluster does show some dynamic features. Genes are lost and gained in specific lineages (Hoffmann et al. 2008), and some of the genes have undergone multiple conversion events during mammalian evolution (Hess et al. 1984; Song et al. 2011, 2012). However, the genomic context, that is, the MN locus, has been a constant across gnathostome evolution, and the expression patterns of the ζ- and α-globin genes are strikingly consistent in amniotes.
Within the three major subclasses of mammals, the β-like globin genes at the DS locus have been duplicated and lost in specific lineages (Fig. 4). Both monotremes and marsupials have two β-like globin genes. In marsupials the ε-globin ortholog is expressed in embryonic erythrocytes, whereas the β-globin ortholog is expressed in fetal and adult erythroid cells (Koop and Goodman 1988). The two β-globin genes in platypus are more similar to each other than to other β-like globin genes, consistent with either a gene conversion event (Patel et al. 2008) or a gene duplication independent of the one that established therian ε-globin and β-globin genes (Opazo et al. 2008b). Both the β-globin genes in platypus are expressed in adults (Patel et al. 2008), but no information is available currently on whether the leftmost β-globin gene in platypus (in the orientation in Fig. 4) is also expressed in embryonic erythroid cells, as is expected based on its position.
A proposed cluster of five β-like globin genes, in the orientation 5′-ε-γ-η-δ-β-3′, in the stem eutherian is consistent with the gene arrangements in contemporary species (Goodman et al. 1984; Hardies et al. 1984; Hardison 1984). The relative similarities among orthologous genes indicate that this gene cluster was formed by a series of duplications, first to make the ancestor to β- and δ-globin genes and the ancestor to ε-, γ-, and η-globin genes, followed by duplications to generate the proposed five-gene cluster (Hardison and Miller 1993). The initial duplication established two major lineages of β-like globin genes that differ in their positions in the gene clusters and in their timing of expression (Fig. 4). Genes in the β- and δ-globin lineage are located to the right in the gene clusters and, if active, are expressed in fetal and/or adult erythroid cells. Genes in the ε-, γ-, and η-globin gene lineage are toward the left in the gene clusters and are expressed in embryonic erythroid cells, except for the γ-globin genes in anthropoid primates, which were coopted for fetal-specific expression.
The full set of five β-like globin genes is not used in any extant mammal examined. At least one pseudogene is found in this gene cluster for almost every eutherian species (Fig. 4), and any exceptions to this could reflect a lack of detailed characterization of the genes. Pseudogenes are DNA segments with sequences homologous to those of actively expressed globin genes, but they harbor mutations, such as frameshifts or chain terminators, that preclude expression to form a globin protein.
Deletion can completely inactivate a gene, and gene loss has also occurred widely in the β-like globin gene clusters of eutherians. Some gene losses tend to be consistent across the members of each eutherian order (Fig. 4). No ortholog for the η-globin gene is found in the species sampled from the order Glires (rodents and lagomorphs), but it is present in the sister clades Primates and Laurasiatherians (represented by dog, horse, bat, goat, and cow). This strongly suggests that the η-globin gene was lost in the LCA for Glires (Opazo et al. 2008a). The η-globin gene is also absent from sampled members of the superorders Xenarthra and Afrotheria, which can be explained either by gene loss in the LCA or by the duplication that formed η-globin having occurred after these superorders diverged from the other eutherians. No active γ-globin gene has been identified in Laurasiatherians, with the gene either being absent, partially deleted, or harboring inactivating mutations. Note that the loss of the γ-globin gene was not in the stem Laurasiatherian, but rather different losses and inactivations have occurred in the lineages to each species.
All species examined within the therians (marsupial and placental mammals) have an ortholog of the ε-globin gene. This gene has the most consistent features across species of any of the paralogous β-like globin genes. It is always present at the left end of the gene cluster, it is almost always a single gene, and in all species examined, it is expressed only in embryonic erythroid cells (Fig. 4).
The γ-globin genes of both the prosimian primate galago and species in order Glires (rabbit, mouse, and rat) are expressed in embryonic erythroid cells (Rohrbaugh and Hardison 1983; Whitelaw et al. 1990; Satoh et al. 1999), whereas the γ-globin genes of anthropoid primates (monkeys, apes, and humans) are expressed only in fetal erythroid cells (Fig. 4) (Johnson et al. 1996, 2000). One common interpretation is that the embryonic expression pattern was ancestral, and the recruitment to fetal expression was an adaptation in the anthropoids, coinciding with a duplication of the γ-globin gene (Johnson et al. 1996). γ-globin genes are also present in Afrotherians, but the developmental timing of their expression has not been reported.
The η-globin gene homolog in goats is expressed embryonically (Shapiro et al. 1983). Currently, this is the only example of an active η-globin gene, but studies of expression in other Laurasiatherians would reveal whether it is active in other species, and if the timing of expression is embryonic. Fetal and adult hemoglobins were found to be identical for horse (Stockell et al. 1961) and dog (LeCrone 1970), and based on the absence of evidence for a fetal-specific hemoglobin, the η-globin homologs are predicted to be expressed embryonically in Figure 4. The η-globin gene is a pseudogene in all primates.
The δ-globin gene is present in almost all eutherian species examined, but it is frequently a pseudogene (Fig. 4). In every case examined in sufficient detail, the δ-globin gene has been involved in a gene conversion, with sequence from the paralogous β-globin gene copied into the δ-globin gene locus (Spritz et al. 1980; Martin et al. 1983; Hardies et al. 1984; Hardison and Margot 1984; Song et al. 2012). The boundaries of the conversions are different in each species, indicating that these are independent gene conversions. The structural and mechanistic bases for this propensity for conversion are not understood. In galago, the replacement with β-globin gene sequences extends into the promoter region, leading to high-level expression from this gene (Tagle et al. 1991). That, in turn, led to efforts to engineer a form of the δ-globin gene that would express at sufficiently high levels to provide potential therapy (Tang et al. 1997).
In most eutherian species, the β-globin gene is expressed in fetal and adult erythroid cells (Fig. 4). Concomitantly with the recruitment of γ-globin genes to fetal expression in anthropoid primates, the onset of expression of the β-globin gene was delayed to shortly before birth in catarrhine primates (Old World monkeys, apes, and humans) (Johnson et al. 2000). The onset of β-globin gene expression is earlier in fetal life in the New World monkeys (Johnson et al. 1996), perhaps representing a transitional state intermediate between the fetal onset seen in most eutherians and the late onset, shortly before birth, observed in humans.
This overview of the evolution of β-globin genes illustrates the diversity of events that have been inferred, including duplications, deletions, inactivations, and reactivations. It shows that the ε-globin genes have been stable over eutherian evolution, whereas the γ-, η-, and δ-globin genes have been gained and lost frequently, sometimes in entire orders of mammals. Furthermore, the timing of expression can change dramatically between clades, notably the delay in γ-globin (fetal) and β-globin (adult) gene expression in anthropoid primates. Strategies being pursued to reactivate γ-globin gene expression in adult erythroid cells, either pharmacologically or by gene therapy, in a sense are attempts to modulate expression patterns in humans that recapitulate expression changes that have occurred during eutherian evolution.
The previous discussion shows that expression of globin genes is tightly regulated. Hemoglobin gene expression is restricted to erythroid cells. The genes are expressed at extremely high levels late in erythroid differentiation, with balanced production of α-globin and β-globin. Paralogous globin genes are expressed at progressive developmental stages. This exquisite regulation is exerted, at least in part, by the binding of specific transcription factors to DNA sequences that serve as cis-regulatory modules (CRMs), such as promoters and enhancers (Maniatis et al. 1987).
Detailed studies over the past three decades have led to the discovery of numerous CRMs in both the α-globin gene (HBA) and β-globin gene (HBB) clusters (Fig. 5). Some are located proximal to and within the genes, such as promoters and internal enhancers (Mellon et al. 1981; Wright et al. 1984; Myers et al. 1986; Antoniou et al. 1988; Wall et al. 1988), and others are located distal to the genes (Grosveld et al. 1987; Talbot et al. 1989; Higgs et al. 1990). For instance, the major regulatory element (MRE) of the HBA gene complex is located distal to the adult HBA genes (~60 kb upstream in human), residing in an intron of the large NPRL3 gene (Fig. 5A). Several additional CRMs are present around the MRE (Anguita et al. 2004) both in human and mouse (Fig. 5A,B). A cluster of CRMs called the locus control region (LCR) is found 50–70 kb upstream of the HBB gene (Grosveld et al. 1987; Talbot et al. 1989; Moon and Ley 1990) in human and mouse (Fig. 5C,D). These distal regulatory regions are enhancers (Tuan et al. 1989; Ney et al. 1990; Pondel et al. 1992) required for high-level expression of the globin genes (Grosveld et al. 1987; Talbot et al. 1989; Higgs et al. 1990; Bender et al. 2000a; Anguita et al. 2002). They are in regions of open chromatin marked by DNase hypersensitive sites (Forrester et al. 1986, 1990; Vyas et al. 1992; Gourdon et al. 1995), and they can protect against some repressive position effects (Grosveld et al. 1987; Caterina et al. 1991; Milot et al. 1996). They are bound by key transcription factors active in erythroid cells, such as GATA1 and TAL1 (Johnson et al. 2002; Anguita et al. 2004; Grass et al. 2006). The protein CTCF is bound at specific sites in the gene clusters, some of which serve as insulators that localize the effects of distal enhancers on target genes (Bulger et al. 2003). 
The detailed information gleaned from decades of work on these gene clusters is recapitulated with high sensitivity and specificity in recent genome-wide analyses (Cheng et al. 2009; Fujiwara et al. 2009; Yu et al. 2009; Kassouf et al. 2010; Soler et al. 2010; Wilson et al. 2010; Wu et al. 2011), as illustrated in Figure 5.
Given the large number of diverse regulatory modules in the globin gene clusters and the multiple modes of regulation, it is not surprising that the mechanisms of gene regulation are complex and not fully understood. A few themes have emerged. Many of the distal CRMs interact in some manner, as shown by phenotypes of mutations (Bungert et al. 1995; Jackson et al. 1996; Molete et al. 2001) and by direct physical mapping using chromosome conformation capture (Dostie et al. 2006; Bau et al. 2011). One proposal is that they form a discrete structure called an active chromatin hub (de Laat and Grosveld 2003; Palstra et al. 2003; Zhou et al. 2006), whereas other experiments indicate that the distal CRMs act more independently (Bender et al. 2000b, 2001). Regardless of the structure(s) formed by the distal CRMs, activation of target globin genes involves a direct interaction between the distal LCR (or equivalent) and the promoter of the activated genes, presumably through a looping mechanism (Carter et al. 2002; Tolhuis et al. 2002; Vakoc et al. 2005; Ragoczy et al. 2006; Vernimmen et al. 2009). Activation of globin genes and other erythroid genes occurs with relocation of the chromatin domain to transcriptionally active regions of the nucleus (Schubeler et al. 2000; Osborne et al. 2004; Schoenfelder et al. 2010). Stabilization of the transcription complex through LCR–promoter interactions, perhaps at transcription factories, is one mechanism for gene activation (Wijgerde et al. 1995).
The general similarities in arrangement of CRMs, transcription factor occupancy, and mechanisms of regulation in globin gene clusters suggest that cross-species evolutionary comparisons of genomic DNA could be effective for finding regulatory regions (Hardison et al. 1993, 1997a; Gumucio et al. 1996; Hardison 2000). This approach has been successful in many studies (e.g., Gumucio et al. 1992; Elnitski et al. 1997; Hardison et al. 1997b; Flint et al. 2001; Hughes et al. 2005), but it has some limitations. The approach is most sensitive for finding CRMs under evolutionary constraint, and although this is true for many CRMs (Pennacchio et al. 2006), a large fraction is species specific (King et al. 2007; Bourque et al. 2008; Hardison and Taylor 2012). Only a small fraction of CRMs are preserved throughout vertebrate evolution (King et al. 2007), and comparisons of noncoding genomic DNA between human and fish have not been a rich resource for discovery of CRMs in globin gene clusters (Flint et al. 2001). Furthermore, critical aspects of the expression patterns or mechanisms can differ, even between mouse and human. For example, the phenotypes differ between mouse and human for deletions of the homologous distal MREs of the HBA gene complexes (Anguita et al. 2002). This showed that some aspects of regulation differ between the two species, and further studies were pursued by engineering a homologous replacement of the mouse Hba gene complex with that of human (Wallace et al. 2007). The change in developmental timing of expression of γ-globin orthologs between mouse and human (Fig. 4) is driven by interspecies differences in the expression of the transcription factor BCL11A, which is a repressor of γ-globin gene expression (Sankaran et al. 2009). This latter study illustrates the larger point that hypotheses can be derived from evolutionary signatures of either constraint or adaptation. 
Comparative analyses continue to be a rich source of insights, but they must be performed in a context that embraces both conservation and lineage-specific innovation (Hardison and Taylor 2012).
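As a rough illustration of how cross-species comparison flags candidate regulatory regions, the sketch below scores sliding windows of a pairwise alignment by percent identity. It is a toy under stated assumptions: the two sequences are made up (not real globin DNA), the window size and 90% threshold are arbitrary, and `window_identity` is a hypothetical helper rather than a method from any of the cited studies, which used more sophisticated multi-species alignments and scoring.

```python
def window_identity(aligned_a, aligned_b, window=10):
    """Percent identity in each sliding window of a gapped pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as a match only if both residues agree and neither is a gap.
    matches = [a == b and a != "-" for a, b in zip(aligned_a, aligned_b)]
    return [
        100.0 * sum(matches[i:i + window]) / window
        for i in range(len(matches) - window + 1)
    ]


# Made-up aligned sequences for illustration only; "-" marks an alignment gap.
species_a = "ACGTGATAAGGCTTACGTAC"
species_b = "ACTTGATAAGGCATAC-TCC"

scores = window_identity(species_a, species_b, window=10)
# Window start positions whose identity meets an arbitrary 90% cutoff.
conserved = [i for i, s in enumerate(scores) if s >= 90.0]
print(scores)
print(conserved)
```

A scan like this is sensitive only to sequence under constraint between the species compared; lineage-specific CRMs of the kind discussed above would score low and be missed, which is the limitation the text describes.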
One useful model for interpreting the differences in phylogenetic depth of preservation of CRMs is that those involved in core regulatory functions may be conserved across a wide group of species and show evidence of evolutionary constraint. The CRMs found in one or a limited range of species could be involved in fine-tuning those core regulatory processes, perhaps helping that species adapt to a unique niche. For example, a strong distal enhancer is required for high-level expression of globin genes. Both the HBA and HBB gene complexes have such enhancers (Fig. 5), and interspecies DNA alignments show that both are preserved across mammals and have been under strong constraint (Elnitski et al. 1997; Flint et al. 2001). These strong enhancers could be a key component of the distal regulatory structure, perhaps an active chromatin hub. In contrast, some of the other distal CRMs are present only in mouse, such as the one in the first intron of Nprl3 and the most distal CRM in the Hbb LCR (Fig. 5). These could contribute to functional differences between the species, perhaps by modifying the distal regulatory structure.
The diversity of hemoglobins, their critical functions, their exquisite regulation, and the pathological consequences of some mutations make this a fascinating family of proteins and genes. Exploration of these genes in many different species continues to illuminate some and challenge other evolutionary models. Production of different forms of hemoglobin at progressive developmental stages is widespread in vertebrates and beyond, and studies of hemoglobin switching are pursued in several non-human species as models of the process in humans. The evolutionary comparisons summarized here illustrate the power of this approach, but they also remind us that such studies are best done while embracing both interspecies conservation of some elements and lineage-specific changes for others. Indeed, this can lead to important insights, such as the impact of differences in expression pattern of a key transcription factor driving a change in developmental timing of expression in humans.
This work is supported by National Institutes of Health grants from NIDDK (R01 DK65806) and NHGRI (RC2 HG005573).
Editors: David Weatherall, Alan N. Schechter, and David G. Nathan
Additional Perspectives on Hemoglobin and Its Diseases available at www.perspectivesinmedicine.org