The familiar textbook view of balancing selection stresses the most dramatic cases, in which alleles are maintained for very long evolutionary times. Balancing selection is often portrayed as “diversifying,” meaning that new alleles have an advantage. Examples are plant self-incompatibility (S) alleles, where the frequency-dependent selective advantage of rare pollen and pistil types is well understood to maintain many alleles [6], and fungal incompatibility alleles [8], whose selective maintenance remains unclear despite their evident similarity to plant self-incompatibility.
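The rare-allele advantage that maintains S-alleles can be illustrated with a deliberately simplified deterministic model of gametophytic self-incompatibility. This is a sketch under stated assumptions, not a model taken from the cited work. Every plant is assumed to be heterozygous at the S-locus, so a fraction 2p_i of pistils carry allele S_i. Pollen carrying S_i can therefore fertilize only the remaining 1 - 2p_i of pistils, and maternal alleles are assumed to be transmitted at their current frequencies. The function names are hypothetical.

```python
def gsi_generation(freqs):
    """Advance S-allele frequencies by one generation in a toy gametophytic SI model.

    Assumptions (hypothetical, for illustration): all plants are S-heterozygotes,
    so a fraction 2 * p_i of pistils carry S_i and reject pollen carrying S_i.
    """
    # Pollen success of allele i: its frequency times the fraction of compatible pistils.
    pollen_success = [p * (1.0 - 2.0 * p) for p in freqs]
    total = sum(pollen_success)
    paternal = [w / total for w in pollen_success]
    # Offspring receive one maternal allele (unchanged frequencies) and one paternal allele.
    return [0.5 * p + 0.5 * q for p, q in zip(freqs, paternal)]


def iterate_gsi(freqs, generations):
    """Iterate the toy model; requires at least three alleles, each below frequency 0.5."""
    if len(freqs) < 3 or any(not 0.0 < p < 0.5 for p in freqs):
        raise ValueError("need >= 3 S-alleles with 0 < p < 0.5")
    for _ in range(generations):
        freqs = gsi_generation(freqs)
    return freqs


# Rare alleles spread: after many generations all four alleles approach 1/4.
start = [0.4, 0.3, 0.2, 0.1]
print([round(p, 3) for p in iterate_gsi(start, 200)])
```

In this toy model the equal-frequency state is the only resting point, because any allele below 1/n gains a pollen advantage. With only two alleles, every pistil would reject one of the two pollen types, so the sketch refuses n < 3.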
When the same alleles persist for long times, balancing selection may be detectable from its effects at nearby neutral sites. The population genetics of balancing selection shows that, as well as maintaining diversity at the selected sites themselves (generally different amino acids), it increases diversity at closely linked neutral sites [10]. Genome regions close to a site under balancing selection, which rarely recombine with the selected site(s), will have common ancestors further back in time than other regions (longer coalescence times), because “migration” of variants between allelic classes depends on recombination. This high diversity does not require diversifying selection. Systems with just two states can also be maintained in the long term. An example is sex-determining genes, where selection on the sex ratio gives the rarer sex an advantage without any diversifying selection (though sometimes a sex-determination system is replaced by a new one [13]). Sex chromosomes clearly illustrate the evolution of high diversity. The X and Y chromosomes were once homologues. Since they acquired sex-determining functions and lost recombination, genes on these chromosomes now show, in several taxa, higher X–Y sequence divergence than is found between related species [14].
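The effect of linkage on coalescence times can be made concrete with a two-class structured coalescent, in which the two allelic classes behave like demes and recombination acts as “migration” between them. This is a minimal sketch under assumptions that are not stated in the text. The population is diploid with size N, and the classes are held at constant frequencies p and q = 1 - p. Time is measured in units of 2N generations, and rho = 2Nr, where r is the recombination fraction between the neutral and selected sites. The function name is hypothetical.

```python
def relative_diversity(p, rho):
    """Expected pairwise diversity at a neutral site linked to a balanced biallelic
    site, relative to an unlinked neutral site (two-class structured coalescent sketch).

    p   -- frequency of allelic class 1 (class 2 has q = 1 - p), assumed constant
    rho -- 2 * N * r, with N the diploid population size and r the recombination
           fraction between the neutral and selected sites; time is in 2N generations
    """
    if not (0.0 < p < 1.0 and rho > 0.0):
        raise ValueError("need 0 < p < 1 and rho > 0")
    q = 1.0 - p
    # Both lineages in class 1: coalesce at rate 1/p, or one "migrates" at total rate 2*rho*q.
    a = 1.0 / (1.0 / p + 2.0 * rho * q)   # mean waiting time in state (1,1)
    b = 2.0 * rho * q * a                 # probability that the next event is a migration
    # Both lineages in class 2: the same with p and q swapped.
    c = 1.0 / (1.0 / q + 2.0 * rho * p)
    d = 2.0 * rho * p * c
    # One lineage per class: no coalescence is possible. After a mean wait of 1/rho,
    # the class-1 lineage moves (prob q, giving (2,2)) or the class-2 lineage moves (prob p).
    t12 = (1.0 / rho + q * c + p * a) / (1.0 - q * d - p * b)
    t11 = a + b * t12
    t22 = c + d * t12
    # Average over a random pair of sampled genes; an unlinked neutral site gives 1.
    return p * p * t11 + q * q * t22 + 2.0 * p * q * t12


for rho in (0.1, 1.0, 10.0, 100.0):
    print(rho, round(relative_diversity(0.5, rho), 3))
```

For p = 0.5, this model simplifies to 1 + 1/(2 rho). Diversity is therefore strongly elevated only where rho is small, that is, very close to the selected site, and the elevation grows without limit as recombination approaches zero.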
If different functional types of alleles at a locus persist long enough, each allele class can acquire its own unique set of neutral mutations, each associated with the class in which it arose, until eventually recombination causes “migration” into a different allele class (reviewed in [17]). The regions around functionally different alleles can thus differ at multiple non-selected sites. Polymorphism will then be higher than in unlinked genome regions, over a distance that depends on the local recombination frequency, and variants in the region will show linkage disequilibrium (LD) because of their associations with functionally different alleles [11].
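The association between neutral variants and allele classes is usually quantified with standard two-locus LD statistics. The sketch below computes D, D' and r² from phased haplotype counts at a selected site (A/a) and a linked neutral site (B/b). The function name and the example counts are hypothetical. In the example, variant B arose in class A and has never recombined onto class a, so it is in complete association with that class (D' = 1).

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return D, D' and r^2 between a selected site (A/a) and a neutral site (B/b)
    from phased haplotype counts, using the standard two-locus definitions."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = n_AB / n - p_A * p_B
    # D' divides D by its largest possible magnitude given the allele frequencies
    # (D' keeps the sign of D).
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical sample of 100 haplotypes: B occurs only on class-A haplotypes.
print(linkage_disequilibrium(30, 20, 0, 50))
```

Here D' reaches 1 while r² stays below 1. That is expected when a neutral variant is confined to one allele class but is not carried by every member of it.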
High diversity can thus provide evidence for balancing selection. In plant species with cytoplasmic male sterility (CMS), natural populations differ greatly in the frequency of females. The frequency with which male fertility is restored also differs when females from one population are pollinated by males from elsewhere. Together, these observations indicate highly variable frequencies of the genetic factors involved. This might reflect regular turnover of the sterility and restorer factors in an arms race [19], or perhaps frequency oscillations [3]. However, high diversity has been found in the sequences of a mitochondrial gene within populations of Silene acaulis, a plant with CMS [21]. This excludes turnover of cytoplasmic genotypes, or prolonged periods of low frequency for any of these genotypes. In this species at least, the male-sterility polymorphisms must therefore have been maintained for long times.
The CMS case is extreme because, like sex chromosomes, mitochondrial genomes rarely recombine (heteroplasmy is rare). Even with recombination, however, considerable sequence diversity can exist several kilobases from a selected site in systems with many different alleles. Long-term maintenance of honeybee sex-determining alleles may be one such case, with high amino acid and synonymous site diversity [22]. Nucleotide diversity is also extremely high throughout the sequences of the multi-allelic pistil recognition genes of plants with gametophytic self-incompatibility (e.g., [23]). It is similarly high in the pistil and pollen S-loci of species with sporophytic incompatibility [26]. Recombination rates between the pollen and pistil S-loci are not known, but they may be low, because selection against self-fertile recombinants is likely to be strong.
Figure: Sequence diversity expected at neutral sites at different distances from a site under balancing selection.
If host–pathogen co-evolution leads to long-term maintenance of variation, this should therefore be detectable from these “footprints” at nearby silent sites and marker loci. This holds even if we cannot classify the functional types of alleles and determine their number (though fewer alleles are expected than at incompatibility loci). Some loci known to be involved in defence processes do indeed have high sequence polymorphism. One such locus in Arabidopsis thaliana is estimated to have nucleotide diversity above 4% for synonymous sites, and even for non-synonymous ones [28]. This is far above the average for this species (<1% for synonymous sites [29]). These genes are difficult to study because they are often members of gene families. In studying polymorphism, it is essential to be sure that the sequences come from a single locus, and to exclude “migration” from paralogous genes, which might occur by gene conversion or other exchange processes.
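Diversity figures like those quoted are conventionally expressed as nucleotide diversity (π), the average proportion of differing sites between pairs of sequences; the cited studies' exact estimators may differ. The sketch below computes plain π from aligned sequences and skips gap or ambiguous positions pair by pair. It applies no multiple-hit correction and does not separate synonymous from non-synonymous sites. It also cannot check the caveat above: it simply assumes that every sequence comes from the same locus. The function name and the toy sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs, missing="-N"):
    """Average proportion of differing sites over all pairs of aligned sequences (pi).

    Sites where either sequence of a pair has a gap or an ambiguous base are
    skipped for that pair. Assumes all sequences are alleles of a single locus.
    """
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")
    per_pair = []
    for s1, s2 in combinations(seqs, 2):
        compared = diffs = 0
        for x, y in zip(s1.upper(), s2.upper()):
            if x in missing or y in missing:
                continue
            compared += 1
            diffs += x != y
        if compared:
            per_pair.append(diffs / compared)
    if not per_pair:
        raise ValueError("no comparable sites")
    return sum(per_pair) / len(per_pair)


alleles = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC", "ACGAACGTAA"]
print(nucleotide_diversity(alleles))
```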
If exchanges between alleles are frequent, or allele numbers are not large, even long-term balancing selection causes high differentiation between alleles only very close to the selected sites [12], while exchanges erode differences at synonymous and intron sites elsewhere in the gene. It may thus be difficult to distinguish between long-term balancing selection with recombination and short-term maintenance of alleles (the likely situation for allozyme loci, discussed later). Recombination also means that scans searching for high-diversity genes (e.g., [31]) may fail to detect selected loci. Loci will be missed where selection has not acted for long enough, or where exchanges are too frequent, for diversity to build up between alleles.
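How narrow the footprint becomes when exchanges are frequent can be sketched with a symmetric two-class structured coalescent. Two allelic classes each have frequency 0.5, and rho = 2Nr, with time in units of 2N generations. Under this assumption, two genes from the same class have mean coalescence time 1, and genes from different classes have mean coalescence time 1 + 1/rho. An F_ST-like measure of differentiation between the classes is therefore 1/(1 + rho). The recombination rate per base pair and the population size below are hypothetical values chosen only for illustration. r is approximated as rate × distance, which is valid only for small r.

```python
def class_differentiation(rho):
    """F_ST-like differentiation between two equally common allelic classes.

    Sketch assuming the symmetric two-class structured coalescent: mean coalescence
    time within a class = 1 and between classes = 1 + 1/rho (units of 2N generations).
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    t_within = 1.0
    t_between = 1.0 + 1.0 / rho
    return (t_between - t_within) / t_between   # equals 1 / (1 + rho)


def footprint(distances_bp, n_e, r_per_bp):
    """Differentiation at each distance; r approximated as r_per_bp * distance."""
    return [(x, class_differentiation(2.0 * n_e * r_per_bp * x)) for x in distances_bp]


# Hypothetical parameters: N = 1e5 diploids, 1e-8 crossovers per bp per generation.
for x, f in footprint([10, 100, 1000, 10000], n_e=1e5, r_per_bp=1e-8):
    print(f"{x:>6} bp  F = {f:.3f}")
```

With these illustrative numbers, differentiation between the classes is high within about a hundred base pairs of the selected site but largely eroded a few kilobases away. Larger population sizes or recombination rates would narrow the footprint further.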
Recombination clearly occurs at the major histocompatibility complex (MHC) loci [32]. Their diversity per nucleotide site is only a few percent [33]. Even so, this is exceptionally high for human sequences (though much lower than the diversity in plant or fungal incompatibility gene sequences). The much-cited high allelic diversity of these genes largely results from recombination between differentiated haplotypes, and this differentiation clearly indicates long-term balancing selection. Arguments against MHC alleles being maintained by overdominance are based on the difficulty of maintaining large numbers of alleles [35]. However, although the number of functionally different alleles is currently unknown, it must be lower than the number of haplotypes.
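Whether overdominance can hold many alleles at once can be explored numerically by iterating standard deterministic viability selection at one diploid locus with random mating. This is a sketch, and the fitness scheme, the loss threshold and the function names are hypothetical assumptions. With symmetric fitnesses (all homozygotes equal, all heterozygotes equal and fitter), every allele persists at frequency 1/n. The final lines draw random fitness matrices in which every heterozygote is fitter than every homozygote and count how many alleles remain; no particular outcome is claimed here.

```python
import random


def viability_selection(freqs, w, generations=5000):
    """Iterate deterministic viability selection at one diploid locus with random mating.

    w[i][j] is the fitness of genotype A_i A_j (symmetric). Each generation,
    p_i' = p_i * w_i / w_bar, where w_i = sum_j w[i][j] * p_j is the marginal fitness.
    """
    n = len(freqs)
    p = list(freqs)
    for _ in range(generations):
        marginal = [sum(w[i][j] * p[j] for j in range(n)) for i in range(n)]
        w_bar = sum(p[i] * marginal[i] for i in range(n))
        p = [p[i] * marginal[i] / w_bar for i in range(n)]
    return p


def alleles_maintained(freqs, threshold=1e-3):
    """Count alleles whose frequency stays above an (arbitrary) threshold."""
    return sum(1 for x in freqs if x > threshold)


def random_overdominant_matrix(n, rng):
    """Hypothetical scheme: every heterozygote is fitter than every homozygote."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w[i][i] = rng.uniform(0.5, 0.8)
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = rng.uniform(0.8, 1.0)
    return w


rng = random.Random(1)
n = 6
kept = [
    alleles_maintained(viability_selection([1.0 / n] * n, random_overdominant_matrix(n, rng), 3000))
    for _ in range(20)
]
print("alleles kept out of", n, "in each trial:", kept)
```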