Until around 1990, most multigene families were thought to be subject to concerted evolution, in which all member genes of a family evolve as a unit in concert. However, phylogenetic analysis of MHC and other immune system genes showed a quite different evolutionary pattern, and a new model called birth-and-death evolution was proposed. In this model, new genes are created by gene duplication and some duplicate genes stay in the genome for a long time, whereas others are inactivated or deleted from the genome. Later investigations have shown that most non-rRNA genes including highly conserved histone or ubiquitin genes are subject to this type of evolution. However, the controversy over the two models is still continuing because the distinction between the two models becomes difficult when sequence differences are small. Unlike concerted evolution, the model of birth-and-death evolution can give some insights into the origins of new genetic systems or new phenotypic characters.
A multigene family is a group of genes that have descended from a common ancestral gene and therefore have similar functions and similar DNA sequences. A group of related multigene families is sometimes called a supergene family. A well-known example of a supergene family is the globin superfamily that is composed of the α-like, β-like, and some other gene families (134a). A gene family may also be subdivided into subfamilies whenever convenient.
The evolution of multigene families has been the subject of controversy for many years. The paradigm of evolution of multigene families before 1970 was that of hemoglobin α, β, γ, and δ chains and myoglobin (54). The genes encoding these polypeptides or proteins are phylogenetically related and have diverged gradually as the duplicate genes acquired new gene functions. This mode of evolution may be called “divergent evolution” (Figure 1a). Around 1970, however, a number of researchers showed that ribosomal RNAs (rRNAs) in Xenopus are encoded by a large number of tandemly repeated genes and that the nucleotide sequences of the intergenic regions of the genes are more similar within a species than between two related species (11).
These observations were difficult to explain by the model of divergent evolution, and a new model called “concerted evolution” was proposed (Figure 1b). In this model all the members of a gene family are assumed to evolve in a concerted manner rather than independently, and a mutation occurring in a repeat spreads to all member genes by repeated occurrence of unequal crossover or gene conversion. This model is capable of explaining previously puzzling observations about the evolution of rRNA genes.
This apparent success led many authors to believe that most multigene families evolve following the model of concerted evolution, and a number of authors investigated the evolutionary modes of various multigene families (48, 110, 178). Later, however, the applicability of concerted evolution to some gene families was questioned as both DNA and amino acid sequence data became available, and another model called birth-and-death evolution (98) was proposed. In this model new genes are created by gene duplication, and some duplicated genes are maintained in the genome for a long time, whereas others are deleted or become nonfunctional through deleterious mutations (Figure 1c). This model applies to most multigene families concerned with immune systems such as immunoglobulins and major histocompatibility complex (MHC) (53, 116) and disease-resistance genes (173). In recent years, even a highly conserved gene family such as the ubiquitin family was shown to be subject to birth-and-death evolution. Yet the controversy over the evolution of multigene families continues, partly because there are so many different types of gene families and partly because the general mechanism of gene conversion is still unclear.
In this paper we review recent studies of the evolution of multigene families with some historical background.
One of the best-known examples of concerted evolution comes from the study of rRNA genes. The rRNA gene family of the African toads Xenopus laevis and X. mulleri consists of about 450 repeats or members. Each member consists of the 18S, 5.8S, and 28S RNA genes, external transcribed spacers (ETS1 and ETS2), internal transcribed spacers (ITS1 and ITS2), and an intergenic spacer (IGS) (Figure 2) (15). Using DNA or RNA hybridization techniques, Brown et al. (11) showed that the nucleotide sequences of IGS are very similar among member genes of the same species but differ by about 10% between X. laevis and X. mulleri. This observation could not be explained by the then popular model of divergent evolution. According to this model, the differences in nucleotide sequence between different repeats of the same species are expected to be as large as those between repeats of different species. The explanation becomes more difficult to accept if we note that the nucleotide sequences of the 18S and 28S coding regions are virtually identical between X. laevis and X. mulleri. Actually, the 18S and 28S coding regions are very similar even among distantly related organisms such as animals and plants.
Concerted evolution: a form of multigene family evolution in which all the member genes are assumed to evolve as a unit in concert and a mutation occurring in a repeat spreads to all member genes by repeated occurrence of unequal crossover or gene conversion
Gene conversion: a form of nonreciprocal recombination in which a DNA segment of a recipient gene is copied from a donor gene
This puzzling observation can be explained by the model of horizontal or concerted evolution originally proposed by Brown et al. (11). According to this model, unequal crossover occurs randomly among members of a gene family, and repeated occurrence of unequal crossover has an effect of homogenizing the member genes, as mentioned above. In this case, the number of member genes may increase or decrease by chance, but a certain range of the number of genes is maintained because of the functional requirement. In the absence of mutation, this process will eventually lead to the homogeneity of all member genes of a family. In reality, of course, mutation always occurs, so that a gene family is expected to have some variant genes. It should now be clear that when a species diverges into two species and the gene cluster in each descendant species evolves independently, the cluster within each species tends to have similar gene copies because of unequal crossover, whereas the genes belonging to the cluster of the two species gradually diverge by mutation (Figure 1b). This is exactly what we observe with the IGS regions of rRNA genes in Xenopus. Later, Smith (144, 145) conducted computer simulations to show that concerted evolution indeed can explain the observation about the evolutionary change of the IGS region. As mentioned above, the 18S and 28S coding regions are virtually identical between X. laevis and X. mulleri as well as between different copies of the same species. This identity has apparently been maintained by strong purifying selection that operates for the coding regions. Thus, we can explain the entire set of observations about the rRNA gene family of Xenopus in terms of unequal crossover, mutation, and purifying selection.
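The homogenizing effect described above can be illustrated with a small simulation. This is only a minimal sketch under strong assumptions that are not in the text: each repeat is reduced to an integer label, the array size is held fixed, and each step (one random repeat overwritten by a copy of another) stands in for one duplication paired with one deletion. The function name `homogenize` and its parameters are hypothetical.

```python
import random


def homogenize(family, steps, mutation_rate=0.0, seed=0):
    """Repeatedly overwrite one random repeat with a copy of another.

    Each step is a crude proxy for unequal crossover: one repeat is
    duplicated and another is lost, so the array size stays constant
    (an assumption made here for simplicity).
    """
    rng = random.Random(seed)
    family = list(family)
    next_label = max(family) + 1  # label reserved for the next new variant
    for _ in range(steps):
        donor = rng.randrange(len(family))
        recipient = rng.randrange(len(family))
        family[recipient] = family[donor]
        if rng.random() < mutation_rate:
            # A mutation turns one random repeat into a brand-new variant.
            family[rng.randrange(len(family))] = next_label
            next_label += 1
    return family


start = list(range(20))  # 20 repeats, all different at the outset
no_mutation = homogenize(start, steps=20000)
with_mutation = homogenize(start, steps=20000, mutation_rate=0.01)
print(len(set(no_mutation)), len(set(with_mutation)))
```

Without mutation the family ends up with a single variant, whereas with mutation a few variant repeats are usually present at any given time. Two families run independently from the same starting array would tend to fix different variants, which corresponds to the within-species similarity and between-species divergence seen in the IGS regions.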
In addition to these factors, gene conversion (55, 143) was proposed as a mechanism of homogenizing the member genes of multigene families. The role of gene conversion is similar to that of unequal crossover. The only difference is that the latter may increase or decrease the number of genes, whereas the former does not. The idea of gene conversion later became popular among theoretical population geneticists, partly because it is easy to develop mathematical models of concerted evolution (8, 90-92, 111-113, 161). In reality, the molecular mechanism of gene conversion in multigene families is not well understood, particularly when sequence identity is patchy, though gene conversion in yeast can be explained by a DNA breakage followed by invasive DNA replication (120). Furthermore, in most mathematical formulations the effect of purifying selection operating in the coding regions of 18S and 28S genes has been neglected. Therefore, caution should be exercised in the application of the mathematical formulas to real data. The gene conversion theory has also become popular among researchers of MHC polymorphism, as is discussed below.
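The difference in copy number between the two mechanisms can be made concrete with a toy example. The sketch below is illustrative only: repeats are treated as indivisible list elements, conversion replaces a whole repeat rather than a segment of it, and the function names and breakpoints are hypothetical.

```python
def unequal_crossover(array_a, array_b, break_a, break_b):
    """Recombine two misaligned arrays at different repeat boundaries."""
    product_1 = array_a[:break_a] + array_b[break_b:]
    product_2 = array_b[:break_b] + array_a[break_a:]
    return product_1, product_2


def gene_conversion(array, donor, recipient):
    """Copy the donor repeat over the recipient repeat (nonreciprocal)."""
    converted = list(array)
    converted[recipient] = converted[donor]
    return converted


array_a = ["a1", "a2", "a3", "a4", "a5"]
array_b = ["b1", "b2", "b3", "b4", "b5"]

# Misaligned exchange: one product gains repeats, the other loses them.
longer, shorter = unequal_crossover(array_a, array_b, break_a=4, break_b=2)
print(len(longer), len(shorter))

# Conversion changes sequence content but not the number of repeats.
converted = gene_conversion(array_a, donor=0, recipient=3)
print(len(converted), converted)
```

Both operations make the members of an array more alike, but only the crossover products differ in length from the parental arrays.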
Note that the relative contributions of unequal crossover (or gene conversion) and purifying selection to the homogenization of the rRNA genes have rarely been discussed. For this reason, the homogeneity of the rRNA-coding regions (18S and 28S) was often attributed to unequal crossover rather than to purifying selection. Actually, even the IGS regions appear to be subject to purifying selection in Xenopus because this region contains elements of promoters and enhancers (15, 132). Furthermore, the ITS regions have a level of variability as low as that of the rRNA-coding regions in the fungus Fusarium graminearum (K. O'Donnell, personal communication). This has probably occurred because they are closely linked to the highly conserved rRNA-coding regions or because they have some important functions. It is therefore necessary to keep in mind that concerted evolution applies primarily to the IGS region, and even in this region a substantial proportion of mutations may be eliminated by purifying selection.
In recent years, the so-called complete genome sequence has been published for many different organisms. Unfortunately, the rRNA gene regions are usually excluded from the sequence, mainly because of the difficulty of sequencing a large number of similar genes. In humans, chimpanzees, gorillas, and orangutans, however, some sequence data are available for a small portion of the rRNA gene regions (35). The hominoid genome has about 500 rRNA gene repeats, and the molecular structure of each repeat unit is similar to that of Xenopus (15). However, hominoid rRNA genes are clustered in the telomeric regions of five different chromosomes. The pattern of sequence similarity among the IGS, 18S, and 28S regions of each chromosome is very similar to that of Xenopus genes. The 18S and 28S gene regions are virtually identical within and between hominoid species, and the IGS regions from the same chromosomes are also similar and show only about 0.5% nucleotide differences.
The rRNA gene clusters closely located at the chromosomal ends are also very similar in each hominoid species, but the IGS regions distal to the telomere are somewhat differentiated (35). This observation suggests that genes are exchanged between different chromosomes through unequal crossover or gene conversion that occurs primarily in the chromosomal region proximal to the telomere. In fact, the distal IGS regions often showed a substantial amount of sequence difference (3 ~ 7% per site).
The 5S RNA genes form separate gene clusters and are located on a different genomic region. This gene family includes 9000 ~ 24,000 member genes in Xenopus (10) and about 500 members in humans (35). These 5S rRNA genes are also known to undergo concerted evolution (10). Furthermore, the gene families of small nuclear RNAs (snRNA) involved in intron splicing and other important cellular processes apparently undergo concerted evolution. An extensive study of concerted evolution of U2 snRNA genes in primates has been conducted by Weiner and his group (70, 122, 123), and these authors have shown that the coding regions of U2 snRNA gene members are very similar to one another but the intergenic regions are heterogeneous within each species. These results again demonstrate the importance of purifying selection in the coding regions.
In yeast, the basic unit of rRNA gene repeats includes an additional RNA gene (5S gene) in the middle of the IGS region, but IGS is again subject to concerted evolution (125, 146, 151, 170). In bacterial genomes there are only a few rRNA gene repeats, and they are generally dispersed in the genome (69). For example, Escherichia coli has 7 copies of the rRNA operon, which is composed of 16S, 23S, and 5S rRNA genes (Figure 2). The spacer (ITS1) between the 16S and 23S genes usually contains one or two tRNA genes, and these tRNA genes are not necessarily the same among different copies of the operon.
The rRNA-coding regions are again very similar among different operons. The sequence difference of the 16S gene among repeats is 0.0055 per site in E. coli and 0 in Haemophilus influenzae. However, the ITS1 is quite heterogeneous. This region often contains unique nucleotide sequences shared by only a few operons. These sequences are patchy and could represent traces of gene conversion events that occurred in the past. On the basis of these observations, Liao (69) concluded that the rRNA genes in bacteria are generally homogenized by gene conversion. However, these observations can also be explained by strong purifying selection and occasional unequal crossover. If unequal crossover occurs in the ITS1 regions as well as outside the rRNA operons, a unique nucleotide sequence in an ITS can be transferred to other ITSs. In this case, we would expect that the sequence length of ITS varies from operon to operon. In fact, this is the case in E. coli, and the length of the ITS1 between the 16S and 23S genes varies from 500 bp to 800 bp.
The above literature survey indicates that most RNA genes in both prokaryotes and eukaryotes are subject to concerted evolution. However, there are exceptions to this rule. The most conspicuous is the rRNA genes in species of the malaria parasite plasmodia. In these species the number of rRNA genes is low (a few to a dozen copies), and they are dispersed in different chromosomes (68a, 86a). These genes are grouped into a few different classes in terms of function and structure. These different classes of rRNA genes are used in different stages of the life cycle of plasmodia, which infect both insects and vertebrates (68a). The gene sequences are similar within each class of rRNA genes but are different between different classes. The inter-class differences are often substantial and amount to about 10 percent. Conducting a phylogenetic analysis of the sequences from several closely related species, Rooney (135) concluded that the gene family in plasmodia actually evolves following the birth-and-death model. The existence of different classes of rRNA genes in the same genome has been reported in several other organisms such as flatworms (14a) and oak trees (89a).
The 5S RNA gene family is also heterogeneous in some species. The best-known example is that of filamentous fungi. The number of copies of 5S RNA genes in these species is 50 ~ 100, and they are dispersed in the genome rather than organized as a tandem array. They can also be grouped into a few different classes by means of sequence similarity. Studying the evolutionary pattern of the sequences from four species of this group of fungi, Rooney & Ward (137) concluded that this 5S RNA gene family is subject to birth-and-death evolution. They found that 18% to 83% of genes are pseudogenes. There are several other species in which heterogeneous 5S genes exist. Examples are Xenopus laevis (162a) and wheat (58a).
Not long after the evolution of rRNA gene families was explained by the model of concerted evolution, many researchers began to assume that this model applies to various other multigene families (17, 110, 115, 144, 145). The general view then was that a gene family that produces a large amount of gene products is subject to concerted evolution to homogenize the genes. One such family was the histone gene family of sea urchins (20, 21, 44, 47, 57). This is a large multigene family with several hundred members that are divided into four classes on the basis of developmental and tissue-specific expression patterns (Figure 3): (a) “early histone genes” that are active during late oogenesis through the blastula stage of embryogenesis, (b) “cleavage stage histone genes” that encode the first histones expressed after fertilization, (c) “late histone genes” that are active from the late blastula stage onwards, and (d) “sperm histone genes” that are expressed only during spermatogenesis (19, 78, 83, 101, 102).
The early histone genes are present in about 300–500 repeat units in most sea urchin species. In the sea urchin Lytechinus pictus, they are arranged in 3 tandem arrays that consist of virtually identical repeating units of the 5 histone genes (H1, H2A, H2B, H3, and H4), each of which is separated by noncoding IGS regions (21, 47) (Figure 3). In this species, only the early genes are arranged in tandem array, whereas the other three classes of genes appear to be dispersed throughout the genome and present in significantly fewer copy numbers (16, 78). Using DNA heteroduplex and restriction mapping analyses, it was demonstrated that the IGS regions of the early gene tandem arrays in L. pictus show a considerable amount of variation, whereas the protein-encoding regions are highly conserved (20, 21, 47). This was taken as evidence for concerted evolution in the early histone genes of this species. Not long afterwards, researchers studying the sequence divergence of late histone genes of L. pictus claimed that these genes also undergo concerted evolution (130), as did another study on the late genes of the sea urchin Strongylocentrotus purpuratus (83). However, it was later shown that histone genes generally evolve following a birth-and-death process with strong purifying selection (127, 136), as discussed below.
Like the sea urchin genes, the 5 histone genes in Drosophila melanogaster are arranged in a repeating unit (71). This unit is repeated about 110 times in a tandem array found on chromosome 2L (121). Two types of repeat units that differ with respect to length (5.0 kb and 4.8 kb) have apparently arisen owing to differences in the noncoding region (71). In addition to the tandem array, so-called variant histone genes are located in other parts of the genome (119). On the basis of restriction fragment patterns, Coen et al. (18) argued that the histone genes of D. melanogaster and its close relatives underwent concerted evolution. According to these authors, the absence of different banding patterns within a species was evidence for concerted evolution, because a restriction site must have spread to all other repeat units after it had arisen subsequent to the divergence of Drosophila species. However, this line of inference based on negative data should not be used as support for concerted evolution.
Matsuo & Yamazaki (82) later obtained nucleotide sequences of several different histone H3 genes and their flanking regions from Drosophila. They obtained data from 10 clones of a single chromosome (a single individual), 10 clones from different chromosomes (a population sample) of D. melanogaster, and a single clone from a single chromosome of a sibling species, D. simulans. They found that variability within an individual was 1.7 times less than that of a population sample and 14 times less than the interspecific variation. Using complex mathematical models, they argued that these observations can be explained by concerted evolution. However, their sequence data indicate that the number of polymorphic sites is quite small and is not substantially different between the protein coding and flanking regions.
Pseudogenes: nonfunctional genes generated by nonsense mutation, frameshift mutation, or partial nucleotide deletion
By the end of the 1980s, most researchers in the field had concluded that virtually all multigene families evolve in a concerted fashion. Therefore, the studies with sea urchin and Drosophila histone genes mentioned above were received as confirmation of the general view. No one considered the possibility that purifying selection is an alternative explanation for the similarity of histone sequences. This happened because the above studies, which were often performed using restriction fragment analysis, did not have sufficient power to distinguish between intralocus and interlocus variation of histone genes. Recently, this view has changed, as discussed below.
There are many other genes for which concerted evolution has been reported. A well-known example is the family of heat shock protein genes (hsp70 genes) in Drosophila. D. melanogaster has a relatively large family of hsp genes, of which two, hsp70Aa and hsp70Ab, are a pair of inverted tandem repeats, and the nucleotide sequences of the two genes are virtually identical. D. simulans, a sibling species of D. melanogaster, also has a similar tandem pair of nearly identical hsp70 genes. Finding these pairs of genes, Leigh Brown & Ish-Horowicz (68) and Bettencourt & Feder (7) proposed that the within-species identity of the two genes is caused by gene conversion. This is certainly a plausible explanation, but if we consider the evolution of all members of this gene family, the evolutionary pattern does not necessarily conform to concerted evolution (see below).
There are many reports about possible gene conversion in small multigene families. One example is that between the two loci of Rh blood group genes in hominoids (14b, 60). Conducting a detailed statistical analysis, Kitano & Saitou (60) reported that several gene conversions occurred in each of humans, chimpanzees, and gorillas. However, it is very difficult to distinguish between gene conversion and independent nucleotide substitution in their case (65a). Their results are also dependent on the phylogenetic tree of the genes used. Therefore, their conclusion is questionable. Furthermore, the sequence identity of the two genes was rather low compared with the average identity of genes from humans and chimpanzees. A more detailed study of this gene family is necessary.
Many authors have claimed that the genetic variability of MHC loci is caused by gene conversion, and this was thought to be a source of genetic variability within loci rather than a homogenizing factor (36, 73, 86, 112, 164). Similar hypotheses were presented to explain the high degree of antibody diversity (110). These views remained popular until a more realistic model of maintenance of polymorphism was proposed.
The birth-and-death model of evolution of multigene families was first proposed to explain the unusual pattern of evolution of MHC genes in mammals (51, 65, 96, 98). The function of MHC genes is to bind self or foreign peptides and present them to T lymphocytes, thereby triggering an immune response (63). MHC genes can be divided into class I and class II genes on the basis of molecular structure and function of the polypeptide encoded. Class I genes can be further divided into classical and nonclassical genes. The classical class I (Ia) genes are highly polymorphic, and the number of alleles per locus sometimes exceeds 100. This high degree of polymorphism is important for protecting the host from attack by various types of parasites (viruses, bacteria, fungi, and others), which are always changing with time. By contrast, the nonclassical class I (Ib) genes are less polymorphic and their functions may be quite different from those of Ia genes.
In the 1970s and 1980s when most investigators believed that multigene families were generally subject to concerted evolution, the MHC gene family was no exception. Therefore, some authors attempted to explain the polymorphism of MHC genes by means of unequal crossover or gene conversion (74, 111, 112, 164). In particular, Ohta (112) and Weiss et al. (164) proposed that the high degree of polymorphism at Ia loci could be explained by gene conversion. This view is based on the idea that if some parts of a sequence at a monomorphic locus are converted by another nucleotide sequence from another locus, polymorphism is generated at the first locus. Enhancement of polymorphism would occur even if both loci are polymorphic to some extent as long as the nucleotide sequences between the two loci are sufficiently different and gene conversion occurs in both ways between the two loci. The problem with this idea is that it does not explain why gene conversion starts to occur between two previously differentiated loci suddenly at an evolutionary time. The coexistence of Ia and Ib genes in the same DNA region is also difficult to explain. If gene conversion occurs continuously between the two loci, the extent of polymorphism should be essentially the same for the two loci. Furthermore, if phylogenetic analysis is conducted for the alleles from different loci, there would be no monophyletic clades formed for each locus. In reality, this is not the case. Klein & Figueroa (62) and Kriener et al. (65a) critically examined data that seemingly supported the idea of concerted evolution and concluded that the evidence is weak. They argued that some of the data showing the identical gene segments between paralogous pairs of genes can be explained by co-ancestry of the segments or even clustered mutations (65a, 169).
The idea of gene conversion was weakened considerably when Hughes & Nei (50, 52) showed that MHC polymorphism is primarily caused by overdominant selection that operates at the peptide-binding region of MHC molecules. This finding made it unnecessary to invoke gene conversion as an explanation for MHC polymorphism. It also provided a theoretical basis for the concept of trans-specific (long-term) polymorphism previously discovered (5, 28, 62, 66, 85). In a phylogenetic analysis of MHC class I and class II genes, Hughes & Nei (52, 53, 97) showed that the evolutionary pattern of these genes was very different from what would be expected under concerted evolution. The tree for class I genes from a number of vertebrate species is presented in Figure 4a. It indicates that different orders or families of mammals often have different genes or genetic loci. For example, the classical loci A, B, and C are shared only by hominoid species (e.g., human, gorilla, and orangutan), but the New World monkeys (e.g., tamarin) and nonprimate mammals do not have the genes. Similarly, cats and mice have different Ia loci. In other words, different families or orders of mammals do not have truly orthologous genes. This evolutionary pattern indicates that some genes were generated by gene duplication and some duplicate genes were lost after the divergence of mammalian orders (51, 97). Actually, the genomic regions of human and mouse MHC genes contain a large number of pseudogenes (58), exactly as would be expected under the birth-and-death model (Figure 1c). Also, this conclusion makes sense biologically because the genetic variability at MHC loci is generated to defend the host from many new types of parasites. Gene conversion and unequal crossover are not essential for this purpose, though they may occur.
Figure 4a also shows that a few nonclassical loci as indicated by (b) diverged from Ia loci a long time ago and now have different functions, as discussed below. This type of acquisition of new functions by duplicate genes is also an important feature of birth-and-death evolution that is not shared by concerted evolution. Phylogenetic analysis of class II region genes showed essentially the same evolutionary pattern as that of class I genes (53). However, the rate of gene birth and gene death is much lower in the class II gene family than in the class I gene family (97, 126, 152).
Despite these findings, a substantial number of authors maintain that gene conversion or unequal crossover is an important factor for generating polymorphism (12, 114, 124). Some investigators invoked gene conversion to explain the identity of a single or a few nucleotides between different alleles at the same locus or different paralogous sequences (24, 162, 167). Actually, there is evidence that intralocus gene conversion or recombination occurs occasionally (6, 46, 79, 84, 138, 162, 172). Exon exchange between different loci has also been documented (40, 51). Furthermore, there is some molecular evidence that interlocus gene conversion occurs at MHC loci (2, 33, 41, 45, 46, 56, 168). Hogstrand & Bohme (45) reported that the frequency of occurrence of gene conversion between the Ab and Eb loci in mice is about 2 × 10⁻⁶ per sperm per generation. If this estimate is right, it is much higher than the rate of nucleotide substitution (of the order of 10⁻⁹ per generation). Therefore, the nucleotide sequences of the two loci, Ab and Eb, would become nearly identical in the long run. In reality, however, a phylogenetic analysis of allelic sequences from loci Ab and Eb has shown that alleles from each locus form a monophyletic clade and the alleles from different loci do not intermingle (40). This is also generally true with the genes from different organisms (40), indicating that the effect of gene conversion on MHC polymorphism is quite minor. These results suggest that either the current estimate of gene conversion frequency is too high or many gene conversions observed in the mouse experiment are not fixed in the natural population because of selective disadvantage compared with wild-type alleles (40). Martinsohn et al. (80) conducted a critical examination of papers claiming gene conversions and concluded that gene conversions or interlocus recombinations do occur but those that enhance polymorphism are not proven. This conclusion agrees with that of Gu & Nei (40).
Therefore, MHC polymorphism appears to be primarily generated by nucleotide substitution and positive selection.
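The rate comparison made above for the Ab and Eb loci can be checked with a few lines of arithmetic. This is a rough sketch only: the substitution rate is given in the text merely as an order of magnitude and is taken here as exactly one in 10⁹ per generation, and both figures are treated as directly comparable per-generation rates, as the text does. The variable names are hypothetical.

```python
from fractions import Fraction

# Reported frequency of Ab/Eb gene conversion, per sperm per generation.
conversion_rate = Fraction(2, 10**6)
# Nucleotide substitution rate; only the order of magnitude is given,
# so exactly 1e-9 per generation is an assumption.
substitution_rate = Fraction(1, 10**9)

fold_difference = conversion_rate / substitution_rate
generations_per_conversion = 1 / conversion_rate

print(int(fold_difference), int(generations_per_conversion))
```

Under these assumptions, conversion would be about three orders of magnitude more frequent than substitution, which is why the two loci would be expected to converge in sequence if the reported frequency applied in natural populations.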
Immunoglobulins are composed of heavy (H) chains and light chains, and the latter chains can be further divided into λ and κ chains. Each of the three groups of chains consists of constant and variable regions (63), and the polypeptides of variable regions in each category are encoded by a genomic cluster, called a variable region gene family. There are about 50 ~ 100 genes in each of the H, λ, and κ variable region gene families in human. About 50% of these variable region genes are pseudogenes (81, 118). All of these multigene families are subject to birth-and-death evolution (116-118, 141, 142, 148). The molecular structure and the genomic organization of T-cell receptors are similar to those of immunoglobulins, and the variable region gene families for different classes of T-cell receptor genes are also known to be subject to birth-and-death evolutions (147, 149, 150). Most of these gene families include many pseudogenes. Therefore, the death rate of these genes is quite high. In both immunoglobulin and T-cell receptor families, individual functional genes apparently maintain their own identity without much effect from interlocus recombination or gene conversion, because the branch lengths of individual genes are usually very long and sometimes correspond to tens of millions of years. In the case of immunoglobulin κ variable region genes, the human genome contains about two times as many genes as does the chimpanzee because of a DNA block duplication that occurred in the human lineage about 2 mya. In this case, the orthologous genes between the original DNA segment and the new duplicated segment can easily be identified (141). This supports the idea that intergenic recombination and gene conversion have little effect on sequence evolution. Note also that the extent of polymorphism in these genes is much smaller than that of MHC genes.
In addition to the above multigene families, the gene families concerned with innate immunity have also been shown to undergo birth-and-death evolution (42, 43, 107, 108). The evolutionary pattern of these genes is more complex than those of the adaptive immune system. For example, the natural killer (NK) cell receptors of humans [KIR: killer cell immunoglobulin-like receptor (KIR genes)] are composed of immunoglobulin-like domains, but those of rodent receptors (Ly49) genes are of lectin-type, and the molecular structures of these two groups of genes are very different (63). It is unclear how these different types of NK cell receptors originated in two different orders of mammals. Both KIR and Ly49 gene families are known to be subject to birth-and-death evolution (42, 43). KIR genes are also subject to domain shuffling as well as to mutational changes (9, 42, 129, 156, 166). Furthermore, the number of member genes of these gene families has expanded very rapidly by gene duplication during the past 20–30 million years (42, 43, 59). Yet, about a half of these genes are apparently nonfunctional. For example, the rat genome contains 33 Ly49 genes, but 16 of them are pseudogenes (42).
Many other immune system gene families undergo birth-and-death evolution (Table 1). In fact, almost all immune system genes except solitary or small-sized gene families appear to be subject to this model of evolution, and the rate of gene turnover is generally quite high, as expected from their function.
The MHC and immunoglobulin variable gene families are quite large, but the largest gene family in mammals is the olfactory receptor (OR) gene family. Olfactory receptors, which bind odor molecules, are G-protein–coupled receptors that contain 7 α-helical transmembrane regions. OR genes are about 310 codons long and have no introns in the coding regions. These genes are expressed in sensory neurons of olfactory epithelia in nasal cavities. The human and mouse genomes contain about 800 and 1400 genes, respectively; over 60% of human OR genes are pseudogenes whereas only about 25% of mouse OR genes are pseudogenes (Table 2) (32, 34, 103, 106, 168, 175, 176, 179). These genes are scattered over many different locations of almost all chromosomes, and they are generally arranged as a tandem array in each genomic location. It is relatively easy to identify the orthologous gene pairs between humans and mice by conducting phylogenetic analysis. This suggests that gene conversion or unequal crossover has not occurred frequently and that the number of OR genes apparently has increased by tandem gene duplication and chromosomal rearrangement (Figure 5). There are not many traces of transposition of genes mediated by transposons.
As is shown in Table 2, the human and mouse genomes have about 390 and 1040 functional genes, respectively. To examine whether this is due to the loss of OR genes in the human lineage or the gain of genes in the mouse lineage, Niimura & Nei (105) estimated the number of functional OR genes that existed in the most recent common ancestor (MRCA) of humans and mice. The number obtained was approximately 754 genes. This indicates that the human lineage lost many functional OR genes, whereas the mouse lineage gained a substantial number of genes (Figure 6a). Some authors have suggested that mice require a larger number of OR genes because they are nocturnal and heavily dependent on olfaction, whereas humans do not need so many genes because they often use the visual sense for finding mates and food.
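The comparison above reduces to simple arithmetic on the three counts quoted in the text. The following Python sketch is only an illustration and is not part of the analysis of Niimura & Nei (105). It computes the net change in the number of functional OR genes along each lineage from the estimated MRCA count. The function name `net_change` is hypothetical, and the counts are the approximate values given above. A net change does not separate gross gains from gross losses along a lineage.

```python
# Approximate numbers of functional OR genes quoted in the text.
MRCA_FUNCTIONAL = 754
PRESENT_FUNCTIONAL = {"human": 390, "mouse": 1040}


def net_change(ancestral, present):
    """Return (absolute, relative) net change from the ancestral count."""
    delta = present - ancestral
    return delta, delta / ancestral


for species, count in PRESENT_FUNCTIONAL.items():
    delta, fraction = net_change(MRCA_FUNCTIONAL, count)
    print(f"{species}: net change {delta:+d} genes ({fraction:+.1%})")
```

With these figures the human lineage shows a net loss of 364 functional genes (about 48% of the ancestral number), and the mouse lineage a net gain of 286 (about 38%).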
In reality, the ability to smell is controlled not only by the number of OR genes but also by the brain function for odor recognition, and the human brain possibly has a higher power of distinguishing between subtle differences of odor molecules than the mouse brain (140). At present, however, the mechanism of odor recognition in the brain is virtually unknown, and therefore we do not consider this factor in this paper.
As indicated in Table 2, mammalian OR genes can be divided into class I and class II genes. Previously, class I genes were thought to be for aquatic odorants, and class II genes for airborne odorants (29). In mammals and chickens, most genes belong to class II, whereas class I genes make up only 2.5%–13%. It was later found that the zebrafish has one class II OR gene, although the search for OR genes in fish is still incomplete. Conducting phylogenetic analysis of OR genes from fish, X. tropicalis, chickens, and humans, Niimura & Nei (106) estimated that the MRCA of these species had at least 9 genes and that one of them generated mammalian class I genes and another generated class II genes (Figure 6b). Amphibians also have additional class I and class II genes, as well as five other groups of genes. Fish appear to have eight different groups of genes, but none of them appears to be very common. These results indicate that the currently dominant class I and class II genes in mammals are relatively recent products of multiple gene duplication events.
Pheromones are water-soluble chemicals that are emitted and sensed by individuals of the same species to elicit reproductive behaviors or physiological character changes. In terrestrial vertebrates they are perceived by the vomeronasal organ (VNO), which is located at the base of the nasal cavity and is separated from the main olfactory epithelium. One supergene family that controls the VNO receptors is called the V1R (vomeronasal receptor 1) gene family, a member of which consists of about 330 codons without introns (25). The pheromone receptors are G-protein–coupled receptors as in the case of olfactory receptors, but there is little sequence similarity between the two proteins.
The mouse genome has about 350 V1R genes, but the number of functional genes is 187. The rat genome has 102 functional genes and about 50 pseudogenes (39). Similarly, the opossum and cow genomes have a substantial number of functional genes. By contrast, the human genome has only 4 functional genes and nearly 200 pseudogenes. The number of functional genes in dogs is also quite small. The difference between the mouse and human genomes apparently arose through massive pseudogenization of V1R genes in the human lineage. In fact, some primate species including humans do not have functional VNOs and therefore are thought to have no perception of vomeronasal pheromones. The small number of functional V1R genes in humans seems to be a relic of V1R gene pseudogenization. This pseudogenization in humans apparently occurred because humans use visual and auditory senses for sexual and physiological behavior. This example indicates that the number of copies of a multigene family can vary enormously even among different orders of mammals.
In insects a more advanced pheromone recognition system has developed. Mate finding in most moth species is achieved by female-emitted sex pheromones that are dispersed over a wide area. The key enzymes responsible for producing the pheromones are acetyl CoA desaturases, which are encoded by a medium-sized gene family (133, 134). This gene family is also known to be subject to birth-and-death evolution.
As noted above, when genetic variation of multigene families was studied by restriction enzyme analysis, many gene families that are required to produce a large quantity of gene products were assumed to be subject to concerted evolution. One example is the histone gene family. In this gene family, even authors who studied nucleotide sequences maintained that histone gene families are subject to concerted evolution (82). This view arose from their preconception about gene conversion, as well as the fact that the number of sequences studied was small.
By the 1990s, however, a substantial number of sequences of histone genes had been accumulated from various species of animals, plants, fungi, and protists. Rooney et al. (136) and Piontkivska et al. (127) conducted an extensive statistical analysis of these data to examine whether the histone gene families are subject to concerted evolution or birth-and-death evolution. They reasoned that if concerted evolution is the main factor, both the number of synonymous differences per synonymous site (pS) and the number of nonsynonymous differences per nonsynonymous site (pN) must be virtually 0 for any pair of genes because gene conversion affects both synonymous and nonsynonymous sites in the same way. By contrast, if protein similarity is caused by purifying selection but every member gene evolves independently, pS is expected to be greater than pN because in this case synonymous substitutions accumulate continuously whereas nonsynonymous substitutions do not.
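The reasoning above can be illustrated with a minimal Python sketch of how pS and pN might be computed for one pair of aligned coding sequences. It counts synonymous and nonsynonymous sites and differences codon by codon, in the spirit of the unmodified Nei-Gojobori counting scheme, and averages over mutational pathways that avoid stop codons. It is not necessarily the procedure used by Rooney et al. (136) or Piontkivska et al. (127). The sketch assumes the standard genetic code and uppercase, gap-free, in-frame sequences without stop codons. All function names are hypothetical, and the example sequences are invented.

```python
from itertools import permutations

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Number of potentially synonymous sites in one codon (between 0 and 3)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                s += 1.0 / 3.0
    return s


def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences between two sense codons,
    averaged over all mutational pathways that avoid stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    totals, n_paths = [0.0, 0.0], 0
    for order in permutations(positions):
        current, counts, valid = c1, [0, 0], True
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[step] == "*":
                valid = False
                break
            is_syn = CODON_TABLE[step] == CODON_TABLE[current]
            counts[0 if is_syn else 1] += 1
            current = step
        if valid:
            totals[0] += counts[0]
            totals[1] += counts[1]
            n_paths += 1
    return totals[0] / n_paths, totals[1] / n_paths


def p_distances(seq1, seq2):
    """Return (pS, pN) for two aligned, gap-free coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1), 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2), 3)]
    if any(CODON_TABLE[c] == "*" for c in codons1 + codons2):
        raise ValueError("stop codons are not supported in this sketch")
    # Site counts are averaged over the two sequences.
    S = sum(synonymous_sites(c) for c in codons1 + codons2) / 2.0
    N = len(seq1) - S
    Sd = Nd = 0.0
    for c1, c2 in zip(codons1, codons2):
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    return (Sd / S if S else 0.0), (Nd / N if N else 0.0)


# Invented pair: both encode Ala-Gly-Lys but differ at two third positions.
pS, pN = p_distances("GCTGGAAAA", "GCCGGTAAA")
print(round(pS, 3), round(pN, 3))
```

For this invented pair pS is 6/7 (about 0.86) and pN is 0, the pattern expected when the protein is conserved by purifying selection while the genes diverge independently. Two identical sequences would give pS = pN = 0, which is the pattern expected under recent homogenization or recent duplication.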
When this approach was applied to histone H3 and H4 genes from diverse groups of organisms, pS was clearly higher than pN in almost all cases (Table 3) (59a, 127, 136). Similar results were also obtained from an extensive study of the histone H1 gene family (26). These results therefore clearly show that the histone gene families are subject to strong purifying selection but all member genes evolve according to a birth-and-death process. Of course, occasionally there were cases in which pS and pN were both 0 or close to 0. In such cases, gene conversion may have occurred, but the pattern can also be explained by recent gene duplication. Since the number of cases in which pS > pN was overwhelmingly large, Rooney et al. and Piontkivska et al. concluded that histone genes are generally subject to birth-and-death evolution with strong purifying selection.
Ubiquitin is a small protein consisting of 76 amino acids that plays a major role in both cellular processes and protein degradation in eukaryotes. It is one of the most conserved proteins, and 72 of the 76 amino acids appear to be invariant in animals, plants, and fungi (38). Ubiquitins are encoded by a small- to medium-size gene family, which comprises monomeric and polymeric genes. Monomeric genes consist of 228 nucleotides (76 codons) with an additional sequence that encodes a ribosomal protein (Figure 3b). By contrast, polymeric genes known as polyubiquitins (poly-u) are composed of tandem repeats of a 76-codon gene with no spacer sequence between them. The number of ubiquitin units in a poly-u locus, the number of poly-u loci, and the number of monomeric genes per genome vary considerably among eukaryotic species. Yet all ubiquitin genes encode the same amino acid sequence in each species.
These properties of the ubiquitin genes clearly indicate the importance of purifying selection. However, this multigene family was thought to be subject to concerted evolution until recently (88, 100, 139, 153, 159). This view rested partly on the preconception by many authors that homogenization of member genes of a family is caused by concerted evolution and partly on the fact that they generally worked with the genes from a few closely related species.
This view changed after 2000 when Nei et al. (99) conducted an extensive statistical analysis of all available data. The results of this study clearly indicated that it is purifying selection rather than concerted evolution that homogenizes protein sequences. In most species, the number of nonsynonymous nucleotide differences among the member genes was 0, whereas the synonymous differences were virtually saturated (see Table 4). Of course, some pairs of member genes showed small nucleotide differences, apparently because of recent gene duplication.
As noted above, the pair of inverted repeat genes hsp70Aa and hsp70Ab for heat shock proteins in Drosophila has virtually identical nucleotide sequences and therefore it probably represents one of the best cases of gene conversion. However, this does not mean that all hsp70 genes for this highly conserved protein gene family are subject to concerted evolution. Figure 4b shows a phylogenetic tree for hsp70 genes from D. melanogaster, two species of nematodes (Caenorhabditis elegans and C. briggsae), and yeast. The sequence identity of hsp70Aa and hsp70Ab is clear from this tree, but other Drosophila hsp70 genes do not necessarily show a pattern of concerted evolution, though heat-inducible genes (marked *) are generally closely related. For example, hsp70-2, hsp70-3, and hsp70-4 are quite different from hsp70Aa or hsp70Ab.
In C. elegans, the protein sequences of hsp70-7 and hsp70-8, a pair of inverted repeats, are identical, but other hsp70 genes are not necessarily closely related. Similarly, in C. briggsae, genes hsp70-7 and hsp70-14 are closely related to each other but other genes are not necessarily so. Actually, there are many C. briggsae genes that pair with C. elegans genes. These pairs of genes diverged before the separation of the two species, and therefore they have evolved independently for a long time (50–100 million years). Furthermore, yeast genes show no indication of concerted evolution. We can therefore conclude that the heat shock gene families have been subject to both concerted and birth-and-death evolution.
Essentially the same situation was observed with another highly conserved family of amylase genes in Drosophila (177). Some species (e.g., D. orena, D. lini, and D. kikkawai) of Drosophila have a pair of identical genes, but most other species do not. A phylogenetic analysis of 49 active genes by Zhang et al. (177) showed that some gene lineages were lost from the genome while some other genes were duplicated. These observations suggest that this gene family is subject to a mixture of concerted and birth-and-death evolution.
Historically, a well-known gene family that is apparently subject to a mixed process of concerted and birth-and-death evolution is the α-like globin gene family. The human genome contains 4 functional genes (α1, α2, θ1, and ζ) and 3 pseudogenes (ψα1, ψα2, and ψζ). The α1 and α2 genes are virtually identical within hominoids (178), but other genes are considerably differentiated. In practice, the evolution of the globin families is much more complicated than previously thought (A.P. Rooney, unpublished).
Although single-locus genes can be the major determinants of some phenotypic characters such as color vision and the oxygen-transporting function of hemoglobins, most genetic systems or phenotypic characters are controlled by the interaction of multigene families. Here genetic systems mean any functional units of biological organization such as olfaction (odor recognition) and the adaptive immune system in vertebrates, flower development in plants, meiosis, and mitosis. Evolution of these genetic systems is obviously very complicated, but it appears that the only way to understand it is to know the evolution of each component multigene family and its interaction with other gene families, using the simplest possible organisms. Note that if multigene families evolve following the model of birth-and-death processes, groups of new functional genes often evolve, whereas concerted evolution does not allow the functional differentiation of genes because all member genes are supposed to evolve as a unit in concert.
Previously, we noted that some MHC class Ib genes have functions different from those of Ia genes. For example, gene HFE in humans (Figure 4a) evolved independently of other class I genes and has acquired a new function. It now has the ability to form complexes with the receptor for iron-binding transferrins and thus regulate the uptake of dietary iron by cells of the intestine (27). A mutation of this gene is known to cause the genetic disease hemochromatosis. This is one example of a new function that was generated by a birth-and-death process. Figure 4b shows that heat shock protein genes are expressed in three different locations of a cell. These are also products of gene duplication and acquisition of new functions. Similarly, we have seen that olfactory genes diverged extensively from fish to mammals. In the following section, a few examples are considered in some detail.
One of the best-studied cases of acquisition of new gene functions is the evolution of the adaptive immune system in jawed vertebrates. In the adaptive immune system, lifelong immunity is maintained for certain groups of parasites (viruses, bacteria, fungi, and others) once the host is attacked by them (e.g., the immunity to smallpox viruses). However, the jawless vertebrates and other nonvertebrate animals do not have this system, though most animals have the so-called innate immune system, which defends the host from parasites but does not retain memory of past attacks (63).
How did the adaptive immune system evolve? This is still a mystery and is currently under active investigation (14, 64, 157). However, it is well known that this system works through the interaction of many different multigene families such as the MHC, immunoglobulin, and T-cell receptor gene families. Most of these multigene families are evolutionarily related and are apparently products of long-term birth-and-death evolution. Therefore, it seems that continuous operation of birth-and-death evolution has generated a new genetic system. Klein & Nikolaidis (64) argued that the adaptive immune system evolved by assembling elements that have evolved primarily to serve other functions and incorporated existing molecular cascades, resulting in the appearance of new organs and new types of cells.
Homeobox genes are member genes of an important supergene family that controls animal and plant development. They encode transcription factors and can be divided into two groups: Typical and Atypical. Typical homeobox genes contain a homeobox of 60 codons, whereas Atypical group genes have a homeobox with either a few more or a few fewer codons (13). The Typical group includes several dozen gene families. A well-known example is the HOX gene family that plays a key role in animal body patterning. This family has also undergone birth-and-death evolution (1). Another example is the PAX6 gene family that controls eye development (31). The Atypical group includes about seven gene families, five of which are called the TALE group. The TALE group genes are characterized by three extra codons between the helix 1 and helix 2 regions. These gene families are all concerned with some aspects of development in eukaryotes and were derived from a common ancestor that existed before the separation of animals, plants, and fungi. In other words, these diverse gene families were generated by successive gene duplication and differentiation over a long period of time. Interestingly, occasional loss of paralogous genes also appears to contribute to the differentiation of phenotypic characters (J. Nam & M. Nei, unpublished).
Another example of the origin of a new genetic system by birth-and-death evolution is the evolution of the olfactory system in vertebrates. We have seen that fish and mammals have quite different OR gene families, which have apparently adapted to receive the different odorants that are available. Furthermore, class II OR genes are a relatively new invention in vertebrate evolution, and this gene family contains an enormously large number of genes. Yet, many of the genes are pseudogenes. It is now known that the class II OR genes can be divided into many subfamilies, each specialized to receive a given group of odors such as good smell or bad smell (77, 140). Because this system falls within the realm of neuroscience (175), it is outside the scope of this review.
Flowers of angiosperms (flowering plants) are composed of sepals, petals, stamens, pistils, etc., and differ from poorly developed flower-like organs in gymnosperms (seed plants). The development of flower organs is controlled by transcription factor genes called MADS-box genes. There are several classes of MADS-box genes that are essential for flower development (76, 155, 163). In a phylogenetic analysis of MADS-box genes, Nam et al. (93) predicted that MADS-box genes controlling flower development (floral MADS-box genes) originated about 650 mya. Tanabe et al. (154) identified floral MADS-box like genes in three species of green algae, which are believed to have originated about 700 mya. If we note that the oldest fossil records of angiosperms and gymnosperms are about 150 and 300 million years old, respectively, it appears that the ancestral genes of floral MADS-box genes existed long before the flowering system evolved. Tanabe et al. speculate that this group of genes originally controlled the development of haploid and diploid stages of green algae. MADS-box genes are ancient genes and are known to exist in plants, animals, and fungi. In animals they control muscle development. In the process of evolution of gymnosperms and angiosperms, however, different MADS-box genes appear to have evolved to form flowers (93).
In this paper we have discussed the controversy over the models of concerted and birth-and-death evolution. The model of concerted evolution was originally thought to apply to gene families that are responsible for producing a large quantity of the same gene products, as in the case of rRNA genes. However, the production of a large quantity of the same gene products can also be achieved by strong purifying selection without concerted evolution. In fact, the histone and ubiquitin genes use this strategy, and the underlying DNA evolution occurs as a birth-and-death process. It is therefore important to distinguish between purifying selection and concerted evolution in producing homogeneous gene products. In the past, a number of authors have assumed that gene products are homogenized only by concerted evolution.
There are some exceptions to the above statement. Relatively small gene families with strong purifying selection such as the heat shock protein and amylase gene families undergo a mixed process of concerted and birth-and-death evolution. In these families, a pair of inverted gene sequences appears to be particularly susceptible to gene conversion or gene conversion-like events that homogenize the pair of genes. It is possible that gene conversion occurs more easily for inverted gene pairs. However, other member genes are apparently subject to birth-and-death evolution.
The gene families that produce a variety of gene products are usually subject to birth-and-death evolution. This is quite reasonable, because this model of evolution promotes genetic variation. There are reports indicating that the extent of genetic variability is enhanced by gene conversion or unequal crossover. However, these processes primarily homogenize member genes, and the interpretation of data supporting this idea should be reexamined. This does not mean that gene conversion or unequal crossover never occurs in these genes. Actually, there is some evidence for the occurrence of gene conversion or domain shuffling in MHC genes. However, the contribution of these events to the diversification of multigene families in long-term evolution seems to be minor. The controversy over concerted and birth-and-death evolution has occurred partly because of misunderstandings and misconceptions. Since each author usually works with one gene family from a limited number of species, the results cannot be blindly extended to other genes or other organisms. It is important to examine each gene family carefully and derive objective conclusions.
We have indicated that the model of birth-and-death evolution would give a reasonable explanation of generation of new gene families but the model of concerted evolution cannot. However, how different subgroups of a gene family acquire new functions is not yet well understood. Since the generation of new gene clusters occurs largely by chance, the initial stage of evolution of multigene families could be fortuitous. However, the newly generated duplicate genes or gene families may evolve to interact with other existing gene families and promote the adaptation of organisms to new environments. In the future it will be important to examine the relative contribution of positive selection and random genetic drift in the evolution of multigene families.
We have also indicated that many genetic systems or important phenotypic characters are controlled by the interaction of a number of multigene families and therefore we may be able to understand the evolution of new genetic systems or new phenotypic characters by studying the evolution of component multigene families. We may then be able to study the interaction between different gene families. This will be one of the most important problems in evolutionary biology in the future.
In the 1960s little was known about the multigene families. Studying the rates of gene duplication and formation of pseudogenes, Nei (95) stated “there may be a great deal of duplicate genes and also nonsense DNA in today's vertebrates.” He also stated that “higher organisms including man have ample scope to evolve into various directions.” The first prediction is now confirmed, but the validity of the second prediction remains to be seen.
We would like to thank Dan Graur, Li Hao, Jan Klein, Eddie Holmes, Jongmin Nam, Yoshihito Niimura, Naruya Saitou, Yoshiyuki Suzuki, and Jianzhi Zhang for their comments on an earlier version of the manuscript. This work was supported by NIH grant GM020293-32 to M.N.
*The U.S. Government has the right to retain a nonexclusive, royalty-free license in and to any copyright covering this paper.