Genetic factors underlying both complex and Mendelian diseases should, under most circumstances, be affected by selection, because most diseases affect organismal survival or reproduction. One exception might be diseases with late onset, under the assumption that older individuals do not contribute to the fitness of their offspring. Examples of negative selection acting on disease-causing mutations include mutations in GBA causing Gaucher disease77, mutations in the nucleotide-binding oligomerization-domain-containing 2 gene (NOD2; also known as CARD15) causing Crohn disease78, mutations in CMH1 and CMH4 causing familial hypertrophic cardiomyopathy79, and a host of other genetic disorders that are caused by de novo mutations or segregating recessive mutations. If negative selection alone is operating, the disease mutations are expected to segregate at low frequencies and to be predominantly recessive. However, there are rare examples of partially dominant disease mutations segregating, such as in some forms of familial hypertrophic cardiomyopathy79, in cases in which the fitness effects of the mutations are relatively small.
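The expectation that disease mutations under negative selection alone stay rare, and are mostly recessive, follows from the textbook mutation-selection balance approximations. The sketch below is illustrative only: the fitness scheme (AA = 1, Aa = 1 - hs, aa = 1 - s), the function name `equilibrium_frequency` and the parameter values are assumptions introduced here, not quantities taken from the studies cited above.

```python
import math


def equilibrium_frequency(mu: float, s: float, h: float = 0.0) -> float:
    """Approximate equilibrium frequency of a deleterious allele.

    Assumed (hypothetical) fitness scheme: AA = 1, Aa = 1 - h*s, aa = 1 - s,
    with recurrent mutation A -> a at rate mu per generation.
    """
    if mu <= 0 or not 0 < s <= 1 or not 0 <= h <= 1:
        raise ValueError("need mu > 0, 0 < s <= 1 and 0 <= h <= 1")
    if h == 0:
        # Fully recessive: selection only acts on homozygotes, so q ~ sqrt(mu / s).
        return math.sqrt(mu / s)
    # Partially dominant: heterozygotes are selected against, so q ~ mu / (h * s).
    # This approximation assumes h*s is much larger than sqrt(mu * s).
    return mu / (h * s)


# Illustrative values only, not estimates for any real disease gene.
recessive = equilibrium_frequency(mu=1e-6, s=0.5)
partially_dominant = equilibrium_frequency(mu=1e-6, s=0.5, h=0.1)
```

Under these assumed values the recessive allele settles at a frequency roughly two orders of magnitude higher than the partially dominant one. Lowering the heterozygous effect h*s raises the dominant equilibrium, which is in line with the observation that partially dominant disease mutations segregate mainly when their fitness effects are small.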
In some cases, disease-causing mutations can segregate at relatively high frequencies (for example, diabetes32,80–82), which is not easily explained if only negative selection has been acting on the disease mutations. Possible explanations for this include balancing selection, for example, in the case of mutations in the G6PD locus or in the β-globin gene, which cause G6PD enzyme deficiency and sickle-cell anaemia, respectively, in the homozygous state, but can confer partial protection against malaria in the heterozygous state83,84. Another example is provided by mutations in the CFTR locus, which cause cystic fibrosis in the homozygous state but protect against asthma in the heterozygous state85. Another possible explanation for the segregation of disease alleles at moderate or high frequencies is that genetic drift has acted on mutations that have only moderate fitness effects, possibly exacerbated by bottlenecks in the population size, as suggested for Gaucher disease in Ashkenazi Jews77. Yet another explanation is that there might have been a recent change in the direction of selection. For example, according to the popular thrifty-genotype hypothesis86, selection originally worked to maximize metabolic efficiency, especially in population groups that often encountered a scarcity of food. With (evolutionarily) recent dietary changes, the direction of selection might have been reversed, causing many common alleles that are now related to metabolic diseases and/or diabetes to be selected against. There are also other reasons why disease genes might be associated with positive selection; for example, an increased frequency of moderately deleterious mutations due to genetic hitch-hiking25 during a selective sweep.
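The balancing-selection explanation can be made concrete with the standard single-locus model of heterozygote advantage. The following is a minimal sketch under assumed fitnesses (normal homozygote 1 - s, heterozygote 1, disease homozygote 1 - t); the function names and the values of s and t are hypothetical and are not estimates for G6PD, the β-globin gene or CFTR.

```python
def balanced_equilibrium(s: float, t: float) -> float:
    """Equilibrium frequency of the disease allele under heterozygote advantage.

    Assumed fitnesses: normal homozygote 1 - s, heterozygote 1,
    disease homozygote 1 - t, with s > 0 and t > 0.
    """
    if s <= 0 or t <= 0:
        raise ValueError("heterozygote advantage requires s > 0 and t > 0")
    return s / (s + t)


def next_frequency(q: float, s: float, t: float) -> float:
    """Disease-allele frequency after one generation of selection."""
    p = 1.0 - q
    mean_fitness = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    return (q * q * (1 - t) + p * q) / mean_fitness


# Hypothetical selection coefficients, chosen only for illustration.
s, t = 0.1, 0.8
q = 0.01  # start with a rare disease allele
for _ in range(2000):
    q = next_frequency(q, s, t)
```

Even though homozygotes for the disease allele are strongly selected against in this example (t = 0.8), the allele rises from a frequency of 0.01 and is held at s / (s + t), far above what mutation-selection balance alone would predict.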
There is considerable interest in further elucidating the relationship between heritable diseases and selection. Bustamante et al.3 compared human genetic variation across more than 11,000 protein-coding genes that were re-sequenced in 39 individuals (19 African Americans and 20 European Americans). Comparing polymorphism and divergence between humans and chimpanzees at synonymous versus non-synonymous sites, they quantified the amount of positive or negative selection acting on each gene, including both current selection and that which has occurred during the shared evolutionary history of humans and chimpanzees. The study showed that mutations in evolutionarily constrained genes are disproportionately associated with heritable disorders. Specifically, although less than 12% of known genes have been associated with a Mendelian disorder, genes that show plentiful amino-acid variation in human populations (at least four amino-acid-replacement SNPs), but no divergence between humans and chimpanzees, have a 50% chance of causing at least one Mendelian disease. This is the expected pattern in genes if negative selection is acting on new mutations.
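The logic of contrasting polymorphism with divergence at synonymous and non-synonymous sites can be illustrated with the classical McDonald-Kreitman neutrality index. This is a deliberately simplified stand-in, not the statistical method used by Bustamante et al.; the function name `neutrality_index` and the counts in the example are hypothetical.

```python
import math


def neutrality_index(pn: int, ps: int, dn: int, ds: int) -> float:
    """Neutrality index NI = (Pn / Ps) / (Dn / Ds) for a single gene.

    pn, ps: non-synonymous and synonymous polymorphisms within the species.
    dn, ds: non-synonymous and synonymous fixed differences between species.
    NI > 1 indicates an excess of amino-acid polymorphism relative to
    divergence; NI < 1 indicates an excess of amino-acid divergence.
    """
    if min(pn, ps, dn, ds) < 0:
        raise ValueError("counts must be non-negative")
    numerator = pn * ds
    denominator = dn * ps
    if denominator == 0:
        # No amino-acid divergence (or no synonymous polymorphism): the ratio
        # is unbounded when there is signal, and undefined otherwise.
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator


# Hypothetical gene: four replacement SNPs but no replacement divergence.
constrained = neutrality_index(pn=4, ps=4, dn=0, ds=10)
# Hypothetical gene whose counts match the neutral expectation (NI = 1).
neutral_like = neutrality_index(pn=2, ps=4, dn=10, ds=20)
```

A gene of the first kind, with amino-acid variants segregating but none fixed between species, is the pattern the text associates with negative selection acting on new mutations: slightly deleterious variants can persist for a while as polymorphisms but rarely reach fixation.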
These results have been extended in a recent analysis by R. M. Bleckham et al. (unpublished observations) aimed at identifying differences in selective constraint among Mendelian disease genes, genes that contribute to complex disease and genes that are not associated with a disease. This study used the same data as the Bustamante et al.3 study, in addition to divergence data from human–macaque comparisons, to quantify selective constraints. The authors correlated their findings with a hand-curated version of the Online Mendelian Inheritance in Man database (OMIM), and concluded that Mendelian disease genes tend to be more constrained than those that contribute to non-Mendelian disease, with stronger purifying selection acting on genes with dominant rather than recessive disease mutations. As previously demonstrated by Thomas et al.87, they also found that genes that are implicated in complex diseases tend to be under less purifying selection than either Mendelian disease genes or non-disease genes, with some showing evidence of recent positive selection as reflected by high values of Tajima’s D. This might be taken as support for the thrifty-genotype hypothesis, but could also be consistent with balancing selection acting on these genes. A recent survey by Zlotogora88 discusses 14 common autosomal recessive diseases that show genetic heterogeneity, even in isolated populations. The author argues that the observed pattern of multiple mutations segregating at high frequency can be adequately explained only if selection is, or has been, acting in favour of the mutations.
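Tajima's D, referred to above, compares two estimates of genetic diversity: the mean number of pairwise differences and the number of segregating sites scaled by sample size. The sketch below computes it from a small set of aligned haplotypes using the standard published formula; the function name, the input format (equal-length strings) and the toy data are assumptions made for illustration.

```python
from collections import Counter
from math import nan, sqrt


def tajimas_d(haplotypes: list[str]) -> float:
    """Tajima's D for aligned, equal-length haplotype strings (n >= 4)."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 haplotypes")
    if len({len(hap) for hap in haplotypes}) != 1:
        raise ValueError("haplotypes must have equal length")

    seg_sites = 0
    pairwise_diffs = 0
    for column in zip(*haplotypes):
        counts = list(Counter(column).values())
        if len(counts) > 1:
            seg_sites += 1
            # Pairs of haplotypes that differ at this site.
            pairwise_diffs += (n * n - sum(c * c for c in counts)) // 2
    if seg_sites == 0:
        return nan  # D is undefined without variation

    pi = pairwise_diffs / (n * (n - 1) / 2)  # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy data: variants at intermediate frequency versus singletons only.
intermediate = tajimas_d(["AA", "AA", "AA", "TT", "TT", "TT"])
singletons = tajimas_d(["TA", "AT", "AA", "AA", "AA", "AA"])
```

In the toy data, variants shared by half of the sample give a positive D (an excess of intermediate-frequency variants), whereas variants that each occur on a single haplotype give a negative D (an excess of rare variants). Under the standard neutral model with constant population size, D is expected to be close to zero.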
Whatever the role of positive and negative selection in explaining the presence of disease mutations, it is clear that there is a well-established relationship between disease status and selection that can be exploited when searching for the genetic causes of heritable diseases. This can be done at two levels: by selecting candidate loci using bioinformatic methods for detecting selection, or by prioritizing candidate SNPs or haplotypes by ranking them according to the magnitude of their inferred fitness effects. The latter methodology is already well-established in the use of computational methods for determining levels of conservation, such as SIFT89