The functional diversification of multigene families may be strongly influenced by mechanisms of concerted evolution such as interparalog gene conversion. The β-globin gene family of house mice (genus Mus) represents an especially promising system for evaluating the effects of gene conversion on the functional divergence of duplicated genes. Whereas the majority of mammalian species possess tandemly duplicated copies of the adult β-globin gene that are identical in sequence, natural populations of house mice are often polymorphic for distinct two-locus haplotypes that differ in levels of functional divergence between duplicated β-globin genes, HBB-T1 and HBB-T2. Here, we use a phylogenetic approach to unravel the complex evolutionary history of the HBB-T1 and HBB-T2 paralogs in a taxonomically diverse set of species in the genus Mus. The main objectives of this study were 1) to reconstruct the evolutionary history of the different HBB haplotypes of house mice, 2) to assess the role of recombinational exchange between HBB-T1 and HBB-T2 in promoting concerted evolution, 3) to assess the role of recombinational exchange between HBB-T1 and HBB-T2 in creating chimeric genes, and 4) to assess the structural basis of hemoglobin isoform differentiation in species that possess distinct HBB paralogs. Results of our phylogenetic survey revealed that the HBB-T1 and HBB-T2 genes in different species of Mus exhibit the full range of evolutionary outcomes with respect to levels of interparalog divergence. At one end of the spectrum, the two identical HBB paralogs on the Hbbs haplotype (shared by Mus domesticus, Mus musculus, and Mus spretus) represent a classic example of concerted evolution. At the other end of the spectrum, the two distinct HBB paralogs on the Hbbd, Hbbp, Hbbw1, and Hbbw2 haplotypes (shared by multiple species in the subgenus Mus) show no trace of gene conversion and are distinguished by a number of functionally important amino acid substitutions. 
Because the possession of distinct HBB paralogs expands the repertoire of functionally distinct hemoglobin isoforms that can be synthesized during fetal development and postnatal life, variation in the level of functional divergence between HBB-T1 and HBB-T2 may underlie important physiological variation within and among species.
An important goal of evolutionary genomics is to identify mechanisms responsible for the initial retention and subsequent functional divergence of duplicated genes. In some cases, identical gene duplicates may be retained in the genome because selection favors the production of increased quantities of the encoded RNA or protein (Sugino and Innan 2006). In other cases, duplicated genes may acquire novel functions or partition ancestral functions of the single-copy progenitor gene (Ohno 1970; Lynch et al. 2001; Zhang 2003; Lynch and Katju 2004). Each of these evolutionary outcomes may be strongly influenced by mechanisms of concerted evolution such as interparalog gene conversion. In some cases, concerted evolution may facilitate the spread of an adaptive mutation to multiple members of a multigene family (Mano and Innan 2008). By contrast, in cases where selection favors some type of division of labor between the products of functionally distinct paralogs, the homogenizing effects of gene conversion may counteract adaptive sequence divergence (Innan 2003; Teshima and Innan 2004, 2008). Finally, gene conversion between distinct paralogs can also create chimeric genes with novel functions or expression patterns.
The β-globin gene family of house mice (genus Mus) holds much promise as a model system for understanding the evolutionary dynamics of duplicated genes. First, the β-globin gene cluster of Mus musculus is an intensively studied system from the standpoints of molecular genetics (Leder et al. 1980) and functional genomics (Hardies et al. 1984; Hill et al. 1984; Shehee et al. 1989; Moon and Ley 1990; Hardison and Miller 1993; Hoffmann et al. 2008a). Secondly, natural populations of house mice are often polymorphic for distinct two-locus β-globin haplotypes that differ in levels of amino acid divergence between the two paralogs. The two tandemly duplicated β-globin genes of house mice, HBB-T1 and HBB-T2, encode the β-chain subunit of adult hemoglobin (Hb) and are separated by ~12–15 kb on Chromosome 7 (Hoffmann et al. 2008a; Sato et al. 2008).
Five main classes of HBB haplotypes have been characterized in house mice: Hbbd, Hbbp, Hbbs, Hbbw1, and Hbbw2. Figure 1A provides a graphical summary of amino acid variation within and between the HBB genes on each of these different haplotype backgrounds. As illustrated in figure 1A, the Hbbd haplotype harbors two distinct HBB paralogs that are distinguished from one another by nine amino acid substitutions. The more highly expressed HBB-T1 gene encodes the β-chains of the major Hb isoform (isoHb), dmaj, whereas HBB-T2 encodes the β-chains of the minor isoHb, dmin (Hutton et al. 1962; Gilman 1974; Whitney 1977). The Hbbw1 haplotype also harbors two distinct HBB paralogs, and the two sequences are distinguished from one another by 12 amino acid substitutions. Similar to the case with the Hbbd haplotype, the β-chain subunits of the major and minor isoHbs are encoded by HBB-T1 and HBB-T2, respectively. The Hbbd and Hbbw1 haplotypes are distinguished by three amino acid substitutions at HBB-T1 and two amino acid substitutions at HBB-T2. As illustrated in figure 1B, intergenic recombination in Hbbd/Hbbw1 heterozygotes has produced one recombinant chromosome (Hbbp) that carries an HBB-T1 allele derived from Hbbd and an HBB-T2 allele derived from Hbbw1, and another recombinant chromosome (Hbbw2) that carries an HBB-T1 allele derived from Hbbw1 and an HBB-T2 allele derived from Hbbd (Ueda et al. 1999; Sato et al. 2006, 2008). In contrast to the Hbbd, Hbbp, Hbbw1, and Hbbw2 haplotypes, the Hbbs haplotype harbors two HBB paralogs that are identical in sequence due to a history of HBB-T1 → HBB-T2 gene conversion (Erhart et al. 1985; Storz, Baze, et al. 2007). Consequently, mice that carry two copies of Hbbs synthesize a single β-chain isoHb during postnatal life.
Electrophoretic surveys of β-globin polymorphism in natural populations of Mus domesticus and M. musculus have revealed that the Hbbd and Hbbs haplotypes are nearly always present at intermediate frequencies in mice sampled from disparate geographic localities across Europe and the Americas, and it has been suggested that the polymorphism is maintained by overdominance of fitness or some other form of balancing selection (Selander and Yang 1969; Selander et al. 1969; Berry and Murphy 1970; Wheeler and Selander 1972; Myers 1974; Berry and Peters 1975, 1977; Berry 1978; Berry et al. 1978; Gilman 1979; Petras and Topping 1983). Consistent with this hypothesis, levels of nucleotide variation and linkage disequilibrium in wild mice indicate that the Hbbd and Hbbs haplotypes have been maintained as a long-term balanced polymorphism (Storz, Baze, et al. 2007). Sequence data from additional species of Mus are needed to elucidate the evolutionary origins and antiquity of these different β-globin haplotypes.
Although there does not appear to be any segregating variation in HBB copy number in house mice, there is extensive variation in levels of amino acid divergence between the two HBB paralogs on each of the different haplotype backgrounds (fig. 1A). As a result of gene conversion between HBB-T1 and HBB-T2, the Hbbs haplotype has essentially reverted to an unduplicated state. This pattern of concerted evolution is typical of the globin gene families in mammals (Hardison and Gelinas 1986; Hardison and Miller 1993; Hoffmann et al. 2008a, 2008b; Opazo et al. 2008a, 2008b; Storz et al. 2008, 2009; Opazo et al. 2009). The majority of mammals possess two or more tandemly duplicated HBB genes, and the paralogous copies are typically identical in sequence (Opazo et al. 2008a, 2008b). Thus, the Hbbs haplotype, with its two identical HBB paralogs, is typical of the situation observed in most mammals, whereas the Hbbd and Hbbp haplotypes, with their two highly divergent HBB paralogs, are quite unusual. Sequence data from the HBB paralogs of additional Mus species are needed to determine which pattern is the norm in this particular group.
In cases where tandem gene duplicates have escaped from concerted evolution, as in the case of the two distinct HBB paralogs on all haplotypes other than Hbbs, recombinational exchanges between the two paralogs can produce novel chimeric sequences (Zangenberg et al. 1995; Storz, Sabatino, et al. 2007; von Salome et al. 2007; Hoffmann et al. 2008b; Storz and Kelly 2008; Opazo et al. 2009). For example, Gilman (1972, 1974) reported that Mus caroli possesses a single, chimeric HBB gene that is characterized by T2-like sequence at the 5′ end and T1-like sequence at the 3′ end. It was hypothesized that the unusual HBB of M. caroli is a chimeric fusion gene that was produced by unequal crossing-over between distinct HBB-T1 and HBB-T2 parent genes. In the genus Mus, it thus appears that recombinational exchanges between tandemly duplicated HBB genes have produced a variety of different evolutionary outcomes, in some cases, promoting concerted evolution, as in the case of the Hbbs haplotype, and in other cases, creating novel, chimeric genes, as in the case of M. caroli.
Here, we use a phylogenetic approach to unravel the complex evolutionary history of the HBB-T1 and HBB-T2 paralogs in a taxonomically diverse set of species in the genus Mus. This set of species includes house mice of the Eurasian musculus group that carry the Hbbd, Hbbp, Hbbw1, Hbbw2, and Hbbs haplotypes as well as representatives of three other subgenera of Mus (Coelomys, Nannomys, and Pyromys). The main objectives of this study were 1) to reconstruct the evolutionary history of the different HBB haplotypes of house mice, 2) to assess the role of recombinational exchange between HBB-T1 and HBB-T2 in promoting concerted evolution, 3) to assess the role of recombinational exchange between HBB-T1 and HBB-T2 in creating chimeric genes, and 4) to identify the structural basis of isoHb differentiation in species that possess distinct HBB paralogs.
Our phylogenetic survey of nucleotide variation in the HBB-T1 and HBB-T2 genes included 12 species in the genus Mus. This set of species included nine members of the subgenus Mus (M. caroli, M. castaneus, M. cookii, M. cervicolor, M. domesticus, M. macedonicus, M. musculus, M. spicilegus, and M. spretus) and single representatives of three other subgenera: Coelomys (M. pahari), Nannomys (M. minutoides), and Pyromys (M. saxicola; fig. 2). We cloned and sequenced the HBB-T1 and HBB-T2 genes in each of the 12 species listed above, and we retrieved additional sequences from public databases. HBB sequences from the Hbbd and Hbbs haplotypes of M. domesticus were taken from the study of Storz, Baze, et al. (2007; GenBank accession numbers EF605358, EF605359, EF605487, and EF605488). We also retrieved publicly available sequences for the following haplotypes of M. musculus: Hbbs from the C57BL/6J strain (NC_000073), Hbbd from the BALB/cByJ strain (NT_095534), Hbbp from the MSM/Ms strain (AB020015, AB020016, and AB189411–AB189418), Hbbw1 from the BALB/c-Hbbw1 congenic strain (AB020013, AB020014, and AB189420–AB189427), and Hbbw2 from the BALB/c-Hbbw2 congenic strain (AB364474 and AB364475). We used the HBB-T1 and HBB-T4 genes of Rattus as outgroup sequences (NC_005100), as these genes are 1:1 orthologs of the HBB-T1 and HBB-T2 genes in Mus, respectively (Hoffmann et al. 2008a). Tissue samples from all Mus species other than M. domesticus were kindly provided by P. Tucker (University of Michigan).
We designed paralog-specific primer sets for HBB-T1 and HBB-T2 by using a multispecies sequence alignment of orthologous genes from Rattus, Mus, and deer mouse (Peromyscus maniculatus). Each of the two locus-specific primer combinations (HBB-T1F 5′-CAATTCAGTAGTTGATTGAGC and HBB-T1R 5′-CAAGCTATGTTATTGGTGCAA) and (HBB-T2F 5′-GTG GCT TAC TGC TTG CTG TCC and HBB-T2R 5′-CTC TTT GGT ATT TTA TT CTT G) amplified a ~1.8-kb DNA fragment that spanned the complete coding region of each HBB paralog in addition to 338 bp of 5′-flanking sequence and 290 bp of 3′-flanking sequence. Amplification of the two paralogs was conducted using the Roche High Fidelity PCR System (Roche Diagnostics, Indianapolis, IN). We used the following thermal cycling protocol: an initial denaturation at 94 °C (120 s); 35 cycles of 94 °C (30 s), 44 °C–53 °C (30 s), and 72 °C (105 s); and a final extension at 72 °C (7 min). PCR products were cloned into pCR4-TOPO vector following the manufacturer's protocols (Invitrogen, Carlsbad, CA). For each species, we sequenced a total of 8–10 colonies per gene using the vector primers T3 and T7 (54 °C annealing). In several species, we recovered distinct alleles at one or both HBB paralogs. In such cases, the cloning of diploid PCR products allowed us to determine the exact haplotype phase for all heterozygous sites. Sequences were run on an ABI 3730 capillary sequencer using Big Dye chemistry (Applied Biosystems, Foster City, CA). Sequences were deposited in GenBank under the accession numbers GQ250367–GQ250397.
Sequences were assembled into contigs using Sequencher (Gene Codes, Ann Arbor, MI) and were aligned using ClustalX (Thompson et al. 1997) with manual adjustment. Intron 2 of HBB-T2 was manually aligned because this gene region is characterized by an extremely high density of insertions and deletions (Erhart et al. 1985; Sato et al. 2006, 2008; Storz, Baze, et al. 2007).
Because gene conversion is primarily restricted to the coding regions of mammalian globin genes, reliable inferences about orthologous relationships require an examination of flanking sequence or intronic sequence (Hardison and Gelinas 1986; Hardison and Miller 1993; Hoffmann et al. 2008a, 2008b; Opazo et al. 2008a, 2008b; Storz et al. 2008; Opazo et al. 2009). We therefore conducted phylogenetic reconstructions that were based on four different partitions of the alignment: 5′-flanking sequence (338 bp), coding sequence (441 bp), intron 2 sequence (773 bp), and 3′-flanking sequence (290 bp). We inferred phylogenetic relationships among HBB-T1 and HBB-T2 sequences in a maximum likelihood framework using Treefinder, version April 2008 (Jobb et al. 2004), and assessed support for the nodes with 1,000 bootstrap pseudoreplicates. The Bayesian Information Criterion in Treefinder was used to select the best fitting model of nucleotide substitution for each data partition. Phylogenetic reconstructions of the flanking and coding regions were conducted using the HKY model of nucleotide substitution (Hasegawa et al. 1985) in which rate variation conformed to a discrete gamma distribution (HKY + γ). Phylogenetic reconstructions of intron 2 sequences were conducted using the TN93 model (Tamura and Nei 1993) with a gamma distribution (TN93 + γ). Both global and simple tree searches were conducted. Global searches for each data partition were conducted using seven different starting trees.
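The model-selection step can be illustrated with a minimal sketch of the Bayesian Information Criterion, computed from a model's maximized log-likelihood, its number of free parameters, and the number of sites in the partition. This is not Treefinder's implementation; taking the number of alignment sites as the sample size is an assumption, and the log-likelihoods and parameter counts below are hypothetical values invented only to show how two candidate models would be compared.

```python
import math

def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood

def best_model(candidates: dict, n_sites: int) -> str:
    """Return the name of the candidate model with the lowest BIC.

    candidates maps model name -> (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_sites))

# Hypothetical inputs for a 773-bp partition (the intron 2 length given in
# the text); log-likelihoods and parameter counts are invented placeholders.
example = {
    "HKY+G": (-2105.4, 5),
    "TN93+G": (-2096.0, 6),
}
print(best_model(example, n_sites=773))  # prints "TN93+G"
```

Because the penalty term grows with the number of parameters, a richer model is preferred only when its gain in log-likelihood outweighs the added complexity.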
To reconstruct amino acid sequences of HBB-T1 and HBB-T2 in the common ancestor of Mus, we used the maximum likelihood approach of Yang et al. (1995) and Koshi and Goldstein (1996). Specifically, we reconstructed ancestral sequences using the 3 × 4 codon model in PAML 4 (Yang 2007). Ancestral reconstructions were conducted separately for each paralog and sequences that harbored ectopic conversion tracts were excluded from the analysis. Marginal posterior probabilities were calculated for each reconstructed residue position. We also applied a codon substitution model to the same alignment of unconverted HBB-T1 and HBB-T2 sequences to estimate relative rates of synonymous and nonsynonymous substitution. This allowed us to evaluate possible differences in selective constraint between the two paralogs.
We used an alignment of adult β-globin sequences from 51 species of mammals to characterize site-specific variation in structural constraint across the β-globin polypeptide. Conservation scores were calculated at each amino acid residue using the method of Valdar (2002) with a modified PET91 distance matrix. For visualization purposes, we used the PyMOL program (DeLano, http://www.pymol.org) to project color-coded conservation scores onto the 3D structure of the Hb molecule.
To characterize physicochemical differences between the β-chain products of HBB-T1 and HBB-T2 for each species, we used an in silico approach (Gasteiger et al. 2003) to compute the isoelectric point (pI), the inhibition constant, Ki (a measure of the free energy of oxygen binding), and the grand average of hydropathicity (a measure of hydrophobicity; Kyte and Doolittle 1982). For each pair of β-chain isoHbs, we calculated a normalized 3D distance based on calculated values of pI, Ki, and hydrophobicity. To characterize the structural basis of isoHb differentiation, we used SWISS-MODEL (Arnold et al. 2006) to map observed amino acid substitutions onto a 3D homology-based model of Mus Hb. The D chain of 1JEB (Kidd et al. 2001) was used as a template for all models.
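The two calculations described above can be sketched as follows. The grand average of hydropathicity is the mean Kyte-Doolittle value per residue. The distance function scales each property by its range across the data set before taking a Euclidean distance. That scaling is an assumption, because the normalization actually used is not stated here, and the function names are hypothetical.

```python
import math

# Kyte-Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    residues = protein.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)

def normalized_distance(a, b, ranges) -> float:
    """Euclidean distance between two property vectors after range scaling.

    a, b   : (pI, Ki, GRAVY) tuples for the two isoforms being compared
    ranges : one (min, max) pair per property, taken over the whole data set
    Range scaling is an assumed normalization, not one stated in the text.
    """
    total = 0.0
    for x, y, (lo, hi) in zip(a, b, ranges):
        span = hi - lo
        diff = (x - y) / span if span else 0.0
        total += diff * diff
    return math.sqrt(total)

# Toy inputs (invented): a two-residue peptide and two property vectors
# that differ only in pI.
print(gravy("AV"))
print(normalized_distance((7.0, 1.0, 0.0), (8.0, 1.0, 0.0),
                          [(6.0, 8.0), (0.0, 2.0), (-1.0, 1.0)]))
```

Scaling each axis before combining them keeps a property with a large numerical range from dominating the distance.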
We successfully cloned two adult β-globin genes from each of the 12 species of Mus, including M. caroli, which was previously thought to have only one HBB gene copy (Gilman 1972, 1974). Remarkably, the Hbbs haplotype (previously characterized in the C57BL/6J inbred strain) was shared among M. domesticus, M. musculus, and M. spretus; the Hbbd haplotype (previously characterized in the BALB/cByJ inbred strain) was shared among M. castaneus, M. domesticus, M. macedonicus, M. musculus, and M. spicilegus; and the Hbbp haplotype (previously characterized in the AU/SsJ inbred strain) was shared between M. castaneus and M. musculus.
Phylogenetic reconstructions of 5′- and 3′-flanking sequences grouped HBB-T1 and HBB-T2 into two reciprocally monophyletic groups (fig. 3). The sole exception was the 5′-flanking sequence of the HBB-T2 gene in M. pahari. This sequence was not nested within the clade of HBB-T2 sequences from the other species, although it was more closely allied with the HBB-T2 clade than with the HBB-T1 clade. Closer inspection revealed perfect sequence identity between the HBB-T1 and HBB-T2 genes of M. pahari in the 104 bp immediately upstream of the start codon. In M. pahari, it appears that the 5′-flanking sequence of HBB-T2 has been partially converted by HBB-T1. Aside from the HBB-T2 gene of M. pahari, we found no evidence of gene conversion in the flanking regions of HBB-T1 or HBB-T2 in any of the other species of Mus.
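A stretch of perfect identity between paralogs, such as the 104-bp tract noted above, can be located with a simple scan of two aligned sequences. The sketch below is illustrative only: the inference in this study rests on the phylogenetic reconstructions, an identity run by itself is merely suggestive of conversion, and the toy sequences are invented.

```python
def longest_identical_run(seq_a: str, seq_b: str):
    """Return (start, length) of the longest gap-free run of identical columns.

    seq_a and seq_b are two rows of the same alignment (equal length);
    a column with a gap character '-' in either row breaks the run.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a == b and a != "-":
            if run_len == 0:
                run_start = i  # a new run begins at this column
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # mismatch or gap ends the current run
    return best_start, best_len

# Toy aligned sequences (invented, not taken from the study).
t1 = "ACGTTGCAAGGCTTACG"
t2 = "ATGTCGCAAGGCTTTCG"
print(longest_identical_run(t1, t2))  # prints (5, 9)
```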
In general, phylogenetic reconstructions based on intron 2 also recovered the same two clades of orthologous HBB-T1 and HBB-T2 sequences (fig. 3). There were five cases of paraphyly: HBB-T1 sequences of M. saxicola and M. cookii were both nested within the HBB-T2 clade (indicating a T2 → T1 conversion of intron 2), and HBB-T2 sequences from the Hbbs haplotype of M. domesticus, M. musculus, and M. spretus were nested within the HBB-T1 clade (indicating a T1 → T2 conversion of intron 2).
Despite these few cases of paraphyly, phylogenies based on flanking regions and intron 2 clearly group the HBB-T1 and HBB-T2 sequences into two distinct clades. Although phylogenetic signal was relatively weak due to the restricted number of informative sites within each partition of the multiple alignment, the tree topologies were largely consistent with species phylogenies inferred from independent data (Lundrigan et al. 2002; Tucker et al. 2005; Tucker 2007). In contrast to the generally well-defined HBB-T1 and HBB-T2 clades in the phylogenies of flanking and intronic sequences, the phylogeny of coding sequences was characterized by extensive paraphyly, as HBB-T1 and HBB-T2 sequences were intermingled throughout the tree. In the phylogeny of coding sequences, M. cervicolor, M. cookii, M. minutoides, M. pahari, and M. saxicola each exhibited the hallmarks of concerted evolution, as paralogs from the same species grouped together to the exclusion of their presumed orthologs in other species (fig. 3). The hypothesized fusion gene of M. caroli did not show clear affinities with the HBB-T1 or HBB-T2 genes of other species, but the HBB-T2 of M. caroli formed a clade with HBB-T1/dmaj and HBB-T2/dmin sequences from Eurasian members of the subgenus Mus.
Gilman (1972, 1974) reported that the β-chain subunit of M. caroli Hb is a hybrid polypeptide characterized by an N-terminal portion that is nearly identical to dmin (the HBB-T2 allele on the Hbbd haplotype) and a C-terminal portion that is nearly identical to dmaj (the HBB-T1 allele on the Hbbd haplotype). Gilman hypothesized that this chimeric dmin/dmaj fusion gene was created by unequal crossing-over between misaligned copies of HBB-T1 (dmaj) and HBB-T2 (dmin). The product of this dmin/dmaj fusion gene would be structurally similar to the β-chains of “Hb Lepore,” a human Hb mutant that incorporates the products of a chimeric δ/β-globin fusion gene (Forget 2001). Based on a comparison of amino acid sequences between the M. caroli β-chain and the dmaj and dmin β-chains, Gilman (1972, 1974) hypothesized that the crossover break point was located in the interval of exon 2 that encodes amino acid residues 58–73. To test this unequal crossing-over hypothesis, we conducted a phylogenetic analysis of the HBB-T1 and HBB-T2 genes of M. caroli and the corresponding genes on the Hbbd haplotype of M. castaneus and M. musculus. We reconstructed separate phylogenies for four partitions of the multiple sequence alignment: fragment 1 (5′-flanking sequence), fragment 2 (exon 1 + intron 1 + exon 2), fragment 3 (intron 2), and fragment 4 (exon 3 + 3′-flanking sequence). According to Gilman's unequal crossing-over hypothesis, sequence from the 5′ end of M. caroli HBB-T1 (fragments 1 and 2) should group with HBB-T2/dmin sequences of M. castaneus and M. musculus, whereas sequence from the 3′ end of M. caroli HBB-T1 (fragments 3 and 4) should group with HBB-T1/dmaj sequences of the other species.
Phylogenetic reconstructions showed that the 5′-flanking sequence of M. caroli HBB-T1 (fragment 1) grouped with HBB-T1/dmaj sequences of M. castaneus and M. musculus, and likewise, the HBB-T2 sequence of M. caroli grouped with HBB-T2/dmin sequences of the other species (fig. 4). By contrast, in the case of fragment 2, the M. caroli HBB-T1 and HBB-T2 sequences grouped together to the exclusion of dmaj and dmin sequences in the other species. Further downstream, phylogenies of fragments 3 and 4 reverted to the same pattern of reciprocal monophyly between HBB-T1 and HBB-T2 sequences that was observed for fragment 1. Thus, contrary to Gilman’s (1972, 1974) hypothesis, the chimeric sequence of the M. caroli HBB-T1 gene is not attributable to unequal crossing-over. Rather, it is attributable to a HBB-T2 → HBB-T1 gene conversion event that was restricted to exon 1, intron 1, and exon 2.
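The partition-by-partition logic of this test can be illustrated with uncorrected pairwise distances in place of full phylogenies. This is a deliberate simplification and not the method used here. The sequences, reference names, and fragment coordinates in the sketch are invented. A gene that matches one paralog in its flanks and the other paralog in an internal segment is the signature expected of a localized conversion tract rather than a single crossover break point.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites among gap-free aligned columns."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def closest_reference(query: str, references: dict, partitions: dict) -> dict:
    """For each partition, name the reference sequence closest to the query.

    references : name -> aligned sequence (same length as query)
    partitions : fragment name -> (start, end) half-open column range
    Ties are resolved in favor of the first reference listed.
    """
    result = {}
    for frag, (start, end) in partitions.items():
        result[frag] = min(
            references,
            key=lambda name: p_distance(query[start:end],
                                        references[name][start:end]),
        )
    return result

# Toy data (invented): the query matches the T1-like reference in both
# flanks and the T2-like reference in the middle segment.
query = "AAAACCCCAAAA"
refs = {"T1_like": "AAAAGGGGAAAA", "T2_like": "TTTTCCCCTTTT"}
parts = {"fragment_1": (0, 4), "fragment_2": (4, 8), "fragment_3": (8, 12)}
print(closest_reference(query, refs, parts))
# prints {'fragment_1': 'T1_like', 'fragment_2': 'T2_like', 'fragment_3': 'T1_like'}
```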
Gene conversion between HBB-T1 and HBB-T2 was pervasive in all species other than those that carried the Hbbd, Hbbp, Hbbw1, and Hbbw2 haplotypes (M. castaneus, M. macedonicus, M. musculus, and M. spicilegus; table 1). Our analysis of gene conversion between the two HBB paralogs revealed four noteworthy patterns. First, conversion tracts were almost exclusively restricted to coding regions. Second, conversion tracts spanned the entire coding region in some cases (HBB-T1 of M. cookii and M. saxicola, and HBB-T2 on the Hbbs haplotype of M. domesticus, M. musculus, and M. spretus), and in the remaining cases, the conversion tracts generally spanned just the 5′ portion of the gene (exon 1, intron 1, and exon 2; table 1, fig. 5). Third, gene conversion was bidirectional as roughly equal numbers of identified conversion events occurred in the 5′ → 3′ direction (T1 → T2) and in the 3′ → 5′ direction (T2 → T1; table 1 and fig. 5). And fourth, despite the pervasiveness of interparalog gene conversion, the HBB genes of most species have at least partially escaped from concerted evolution. Gene conversion has completely homogenized amino acid sequence variation between the HBB paralogs of M. cookii and those species carrying the Hbbs haplotype, but all other species carry HBB paralogs that are distinguished by 1–12 amino acid substitutions (table 1 and fig. 5).
We inferred that the common ancestor of Mus possessed two distinct HBB-T1 and HBB-T2 paralogs that were distinguished by seven amino acid substitutions at residues 20, 58, 76, 80, 121, 125, and 135 (fig. 6). The three substitutions at residues 58, 76, and 80 also distinguish the HBB paralogs on the Hbbd, Hbbp, Hbbw1, and Hbbw2 haplotypes. The remaining differences between HBB-T1 and HBB-T2 on each of these haplotypes are attributable to substitutions that accumulated in the HBB-T2 paralog. In the case of the four haplotypes with distinct paralogs that are found in the Eurasian members of the subgenus Mus, the HBB-T2 sequences have accumulated a preponderance of amino acid changes, whereas the HBB-T1 sequences are more highly conserved (fig. 6). The difference in rates of amino acid substitution between the two paralogs on the Hbbd, Hbbp, Hbbw1, and Hbbw2 haplotypes is mirrored by differences in the ratio of nonsynonymous to synonymous substitution rates for the full set of Mus species (dN/dS = 0.31 for HBB-T1 and 0.51 for HBB-T2). There is no way of knowing whether the HBB paralogs on the Hbbs haplotype have experienced a similar disparity in rates of amino acid substitution, because any changes that accumulated in the HBB-T2 sequence have since been overwritten by gene conversion from HBB-T1.
The Hbbd, Hbbp, Hbbw1, and Hbbw2 haplotypes are characterized by the highest level of physicochemical differentiation between the products of HBB-T1 and HBB-T2 (table 1). In species that possess distinct HBB-T1 and HBB-T2 genes, most of the amino acid differences between products of the two paralogs involve exterior, solvent-exposed residues. Most of the amino acid substitutions that distinguish the two paralogs on the Hbbd and Hbbp haplotypes (dmaj vs dmin, and pmaj vs pmin) are located in positions that appear to be subject to relatively low levels of functional constraint, with the exception of sites 20, 58, and 109, each of which had conservation scores ≥0.75 (fig. 7). In the case of Hbbd and Hbbp, the especially high level of physicochemical differentiation between the two coexpressed isoHbs is largely attributable to the β109(Ala → Met) substitution in the internal, water-filled cavity of the Hb tetramer (fig. 8A). Whereas the β-chain Hbs of almost all mammals studied to date contain Val at position 109, the β-chain product of HBB-T1/dmaj contains Met. In human Hb, the rare β109Met mutant (Hb San Diego) is characterized by unusually high O2-binding affinity and impaired cooperativity and is associated with pathological erythrocytosis (Anderson 1974; Nute et al. 1974). Residue position 109 is located immediately adjacent to an α1β1 intersubunit contact, and substitution of Met at this highly conserved site disrupts an H-bond between β35Tyr (the N-terminal residue of the β-chain C helix) and α122His on the α-chain H helix (Anderson 1974; fig. 8). This loss of intradimer contact between α- and β-chain subunits destabilizes the low-affinity deoxyHb structure, thereby shifting the allosteric equilibrium in favor of the high-affinity oxyHb (Anderson 1974). Thus, the red blood cells of mice that carry the Hbbd and Hbbp haplotypes contain a mixture of distinct β-chain isoHbs that may differ in allosteric equilibria between the deoxyHb and oxyHb conformations.
Among species in the genus Mus, the HBB-T1 and HBB-T2 genes exhibit the full range of evolutionary outcomes with respect to levels of interparalog divergence. At one end of the spectrum, the two identical HBB paralogs on the Hbbs haplotype (shared by M. domesticus, M. musculus, and M. spretus) represent a textbook example of concerted evolution. At the other end of the spectrum, the two distinct HBB paralogs on the Hbbd, Hbbp, Hbbw1, and Hbbw2 haplotypes (shared by multiple species in the subgenus Mus) show no trace of gene conversion and are distinguished by a number of amino acid substitutions that alter important biochemical properties of the Hb protein. Moreover, the ancestral sequence reconstructions indicate that the species of Mus included in our analysis descend from a common ancestor that possessed two HBB paralogs that were distinguished by seven amino acid substitutions. Thus, with the exception of individuals of M. cookii that are homozygous for the same HBB haplotype and individuals of M. musculus, M. domesticus, and M. spretus that are homozygous for the Hbbs haplotype, mice in the genus Mus are unusual among mammals in that they are capable of synthesizing two distinct β-chain isoHbs during fetal development and postnatal life.
Phylogenetic analysis of the coding sequences showed that the HBB paralogs from M. caroli, M. cervicolor, M. minutoides, M. musculus, M. pahari, M. saxicola, and M. spretus were more similar to each other than to their orthologs in other species. In principle, this pattern could be attributable to the effects of gene conversion or it may reflect recent ancestry between the products of de novo gene duplication events that occurred independently in multiple lineages. In the α-globin gene family of primates, sequence similarity between paralogous genes in the same species was sometimes attributable to gene conversion, but in a surprising number of cases, it was attributable to recent ancestry between the products of lineage-specific gene duplications (Hoffmann et al. 2008b). In the case of the HBB-T1 and HBB-T2 paralogs of Mus, the phylogenies of flanking sequence and intronic sequence provided no evidence of lineage-specific gene duplications. It is clear that each of the 12 mouse species included in our analysis inherited the same pair of HBB-T1 and HBB-T2 genes from a common ancestor, but the antiquity of the two paralogs has been obscured by recurrent gene conversion that has occurred independently in each descendant lineage. It appears that the original duplication event that gave rise to the β-globin genes of Mus predated the diversification of muroid rodents as 1:1 orthologs of the HBB-T1 and HBB-T2 genes have been identified in Rattus and two species of Peromyscus (Hoffmann et al. 2008a).
Estimates of gene conversion tract lengths in the human β-globin gene cluster range from 113 to 2266 bp (Papadakis and Patrinos 1999). The conversion events that we detected in the present study all fall well within this range. The largest conversion tracts that we detected, such as the HBB-T1 → HBB-T2 conversion event on the Hbbs haplotype, did not extend much beyond the initiation and termination codons and were therefore less than 1.4 kb in length. In contrast to other eukaryotic gene families in which interparalog gene conversion has been documented (Chen et al. 2007), we observed no consistent bias in the directionality of conversion events. In the human β-globin gene cluster, the directionality of gene conversion is associated with the relative expression levels of the two genes involved in the exchange as the gene that is expressed at a higher level is more likely to convert the gene that is expressed at a lower level (Papadakis and Patrinos 1999). In M. musculus, the expression level of HBB-T1 is roughly 4-fold higher than that of HBB-T2 (Hutton et al. 1962; Gilman 1974; Whitney 1977). If this discrepancy in relative expression levels between the two HBB paralogs is consistent among other species of Mus, then it would appear that the association between expression level and directionality of gene conversion does not hold in mice.
Because the possession of distinct HBB paralogs expands the repertoire of functionally distinct isoHbs that can be synthesized during fetal development and postnatal life, variation in functional divergence between HBB-T1 and HBB-T2 may underlie important physiological variation within and among species. For example, coexpression of multiple isoHbs may permit higher intraerythrocytic Hb concentrations by increasing solubility and inhibiting protein aggregation (Weber 1990; Storz and Moriyama 2008). It is interesting that the alternative 2-locus haplotypes that represent opposite ends of the spectrum with respect to interparalog divergence—Hbbs and Hbbd—are maintained at intermediate frequencies in natural populations of M. domesticus and M. musculus. Two of the most commonly used inbred strains of laboratory mice, C57BL and BALB/c, are homozygous for the Hbbs and Hbbd haplotypes, respectively. Like C57BL, humans effectively express a single major isoHb during postnatal life as the minor HbA2 isoHb (which incorporates β-type chains that are encoded by the δ-globin gene) typically accounts for <2% of Hb in circulating red blood cells. Thus, strains of mice like BALB/c that coexpress multiple isoHbs may not be ideal models for research on pathologies of the cardiopulmonary system.
The β-globin haplotypes Hbbs, Hbbd, and Hbbp are shared among multiple species in the subgenus Mus. Mus castaneus is known to segregate the Hbbd and Hbbp haplotypes (Gilman 1976; Bonhomme et al. 1984; Miyashita et al. 1985), whereas M. domesticus and M. musculus both segregate the Hbbd and Hbbs haplotypes (Selander and Yang 1969; Selander et al. 1969; Selander 1970; Storz, Baze, et al. 2007). It will be necessary to collect polymorphism data for M. macedonicus and M. spicilegus to determine whether the Hbbd haplotype is fixed in these two species or whether they are also polymorphic for two or more haplotypes. In principle, the sharing of identical 2-locus β-globin haplotypes among species could be attributable to introgressive hybridization or the retention of ancestral polymorphism. At face value, introgressive hybridization seems like a plausible explanation for the sharing of identical HBB haplotypes among some of the species that were included in our study, as admixture has been documented between natural populations of M. castaneus and M. domesticus, between M. domesticus and M. musculus, and between M. domesticus and M. spretus (Moriwaki et al. 1979; Ferris et al. 1983; Yonekawa et al. 1988; Bonhomme et al. 1989; Boursot et al. 1989; Orth et al. 2002; Payseur et al. 2004; Geraldes et al. 2008; Teeter et al. 2008). Even in the absence of introgressive hybridization, the sharing of identical haplotypes among M. castaneus, M. domesticus, and M. musculus can also be plausibly explained by the retention of ancestral polymorphism. These three species are thought to have diverged from one another ~500,000 years ago, and gene genealogies at many unlinked autosomal and X-linked loci exhibit paraphyletic and polyphyletic patterns of relationship (Salcedo et al. 2007; Geraldes et al. 2008). In other words, it is not uncommon for alleles at a given gene in M. castaneus to be more closely related to alleles in M. domesticus than to other alleles in M. castaneus (and vice versa). In the case of more distantly related sets of species in the subgenus Mus that share the Hbbs haplotype (M. musculus and M. spretus) and the set of five species that all share the Hbbd haplotype (M. castaneus, M. domesticus, M. macedonicus, M. musculus, and M. spicilegus), each of the alternative haplotypes would have to be maintained for several million years in order to explain transspecific polymorphism without invoking introgression (Lundrigan et al. 2002; Salcedo et al. 2007). The retention of alternative alleles for especially long time spans becomes more plausible if polymorphism is actively maintained by some form of balancing selection, as has been suggested in the case of the Hbbs and Hbbd haplotypes (Storz, Baze, et al. 2007). Surveys of β-globin polymorphism in additional species in the subgenus Mus would be useful to assess whether balancing selection needs to be invoked to explain the observed patterns of transspecific polymorphism.
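The contrast between retaining neutral polymorphism for ~500,000 years and for several million years can be illustrated with a rough calculation. The sketch below uses the standard neutral coalescent result that two lineages sampled from a diploid population of constant effective size N remain distinct for t generations with probability exp(-t/(2N)). The generation time (one generation per year) and effective population size (250,000) are hypothetical values chosen only for illustration; they are not estimates from this study, and the function name is likewise an assumption.

```python
import math


def prob_lineages_remain_distinct(years, generations_per_year, effective_size):
    """Probability that two neutral lineages have not coalesced after `years`.

    Uses the standard diploid coalescent approximation exp(-t / (2N)),
    where t is the elapsed time in generations and N is the (constant)
    effective population size.
    """
    t_generations = years * generations_per_year
    return math.exp(-t_generations / (2.0 * effective_size))


# Hypothetical parameter values, for illustration only (not from the study).
GENERATIONS_PER_YEAR = 1
EFFECTIVE_SIZE = 250_000

# ~500,000 years: divergence time cited for M. castaneus, M. domesticus,
# and M. musculus.
p_recent = prob_lineages_remain_distinct(
    500_000, GENERATIONS_PER_YEAR, EFFECTIVE_SIZE
)  # exp(-1), about 0.37

# 3,000,000 years: an arbitrary stand-in for "several million years".
p_ancient = prob_lineages_remain_distinct(
    3_000_000, GENERATIONS_PER_YEAR, EFFECTIVE_SIZE
)  # exp(-6), about 0.0025

print(f"P(distinct after 0.5 Myr) = {p_recent:.4f}")
print(f"P(distinct after 3.0 Myr) = {p_ancient:.4f}")
```

Because the probability declines exponentially with time, neutral retention becomes far less likely over the longer interval under these assumed values, which is consistent with the argument that balancing selection may need to be invoked for the more distantly related species. The absolute numbers depend strongly on the assumed parameters and should not be read as estimates.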
We thank P. Tucker for providing tissue samples for 11 of the species included in this study, and we thank two anonymous reviewers for helpful comments on the manuscript. This work was supported by a National Science Foundation Fellowship in Bioinformatics to A.M.R. (0630779), grants to J.F.S. and H.M. from the National Institutes of Health/NHLBI (R01 HL087216), the Nebraska Research Council, and a grant to J.F.S. from the National Science Foundation (DEB-0614342).