An estimated 65% of human adults (and most adult mammals) downregulate the production of intestinal lactase after weaning. Lactase is necessary for the digestion of lactose, the main carbohydrate in milk [1], and without it, milk consumption can lead to bloating, flatulence, cramps and nausea [2]. Continued production of lactase throughout adult life (lactase persistence, LP) is a genetically determined trait and is found at moderate to high frequencies in Europeans and some African, Middle Eastern and Southern Asian populations (see Additional File 1 and the figure below).
Figure. Interpolated map of Old World LP phenotype frequencies. Dots represent collection locations. Colours and colour key show the frequencies of the LP phenotype estimated by surface interpolation.
The most frequently used non-invasive methods for identifying the presence of intestinal lactase are based on detecting digestion products of lactose produced either by the subject (blood glucose, BG) or by gut bacteria (breath hydrogen, BH). For both methods a lactose load is administered to the subject following an overnight fast. In individuals producing lactase this leads to a detectable increase in blood glucose. In individuals who are not producing lactase, the undigested lactose passes into the colon, where it is fermented by various gut bacteria, producing fatty acids and various gases, particularly hydrogen. Hydrogen passes through the blood into the lungs and so can be detected in the breath using a portable hydrogen analyser. Both the BG and the BH tests have asymmetric type I and type II error rates, so any study seeking an association between a particular polymorphism and LP should take these error rates into account. It should also be noted that, while in most cases the presence or absence of intestinal lactase in an adult is likely to be genetically determined, the loss of lactase can also be caused by gut trauma such as gastroenteritis [3]. Other non-invasive methods for detecting the presence or absence of lactase include assaying for urine galactose and detecting metabolites of Carbon-14-labelled lactose. These methods are rarely used today. The most reliable method is intestinal biopsy, which provides a direct determination of intestinal lactase activity. However, this procedure is very rarely used for diagnosing healthy individuals because of its invasive nature [7].
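The effect of asymmetric test error rates on a reported LP frequency can be illustrated with a minimal sketch. It assumes a simple misclassification model in which a fixed fraction of non-persistent individuals test as persistent (`false_pos`) and a fixed fraction of persistent individuals test as non-persistent (`false_neg`). The function names and the error rates used in the usage note are hypothetical and are not the published values for the BG or BH tests.

```python
def observed_lp_frequency(true_freq, false_pos, false_neg):
    """Frequency of LP a phenotype test would report, given the true frequency.

    false_pos: assumed probability that a non-persistent subject tests as LP.
    false_neg: assumed probability that an LP subject tests as non-persistent.
    """
    return true_freq * (1.0 - false_neg) + (1.0 - true_freq) * false_pos


def corrected_lp_frequency(observed_freq, false_pos, false_neg):
    """Invert the misclassification model to estimate the true LP frequency."""
    denom = 1.0 - false_pos - false_neg
    if denom <= 0.0:
        raise ValueError("error rates too large for the model to be inverted")
    estimate = (observed_freq - false_pos) / denom
    # Sampling noise can push the estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, estimate))
```

For example, with illustrative rates `false_pos=0.05` and `false_neg=0.10`, a true frequency of 0.30 would be reported as roughly 0.305, and `corrected_lp_frequency` recovers approximately 0.30 from that value. Because the two rates differ, the size and direction of the bias depend on the true frequency.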
With the recent discovery of nucleotide changes associated with LP comes the prospect of direct genetic tests for the trait [8]. However, it has become clear that there are multiple, independently derived LP-associated alleles with different geographical distributions [1]. LP is particularly common in Europe and in certain African and Middle Eastern groups. As a consequence, these are the regions where most genetic studies have been focused and where all currently known LP alleles have been identified [7]. The first allelic variant shown to be strongly associated with increased lactase activity is a C>T change 13,910 bases upstream of the LCT gene, in the 13th intron of the MCM6 gene [13]. Functional studies have indicated that this change may affect lactase gene promoter activity and increase the production of lactase-phlorizin hydrolase mRNA in the intestinal mucosa [14] but, as with all LP-associated variants, the possibility remains that linkage to as yet unknown causative nucleotide changes explains the observed associations. Haplotype length conservation [18], linked microsatellite variation [19] and analysis of ancient DNA from early European farmers [20] later confirmed that this allele has a recent evolutionary origin and has been subject to strong positive natural selection. Furthermore, a simulation model of the origins and evolution of lactase persistence and dairying in Europe has inferred that natural selection started to act on an initially small number of lactase-persistent dairyers around 7,500 BP in a region between Central Europe and the northern Balkans, possibly in association with the Linearbandkeramik culture [21]. Another simulation study has inferred that the selective advantage of lactase persistence was probably not constant across Europe, and that demography was a significant element in the evolution and spread of European lactase persistence [22].
However, the presence of this allele could not explain the frequency of LP in most African populations [8]. Further studies identified three additional variants that are strongly associated with LP in some African and Middle Eastern populations and/or have evidence of function. All are located upstream of the LCT gene in the 13th intron of the MCM6 gene: -13,907*G, -13,915*G and -14,010*C [11]. Where data were sufficient, some of these alleles also showed genetic signatures of a recent origin and strong positive natural selection [12].
Although at least four strong candidate causative alleles have been identified, only a small number of populations have been studied, and those are confined to Europe, Africa and the Middle East. It is therefore unlikely that all LP-associated or LP-causing alleles are currently known. As a consequence, genetic tests based on current knowledge would underestimate the frequency of LP in most world populations. As part of the first study to seek a genetic explanation for the distribution of LP in Africa [8], a statistical procedure (GenoPheno) was developed to test if the frequency of an LP-associated allele could explain reported LP frequency in ethnically matched populations. Crucially, this statistical procedure was designed to account for sampling errors and the asymmetric type I and type II error rates associated with different phenotype tests (BH and BG).
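The kind of question such a test addresses can be sketched with a small Monte Carlo simulation. This is not the GenoPheno algorithm itself, whose details are not given here. The sketch assumes that LP is a dominant trait, that genotypes are in Hardy-Weinberg proportions, that uncertainty in the allele frequency can be drawn from a Beta distribution around the sampled counts, and that phenotype tests misclassify at fixed rates. All function names, parameters and numerical inputs are hypothetical.

```python
import random


def expected_lp_from_allele(p):
    """LP frequency implied by allele frequency p, assuming a dominant
    trait and Hardy-Weinberg proportions (carriers = 1 - (1 - p)^2)."""
    return 1.0 - (1.0 - p) ** 2


def excess_lp_p_value(allele_count, chromosomes, lp_count, tested,
                      false_pos, false_neg, n_sims=2000, seed=0):
    """One-sided Monte Carlo p-value: how often would a phenotype survey of
    `tested` people yield at least `lp_count` LP results if the sampled
    allele were the only cause of LP?"""
    rng = random.Random(seed)
    at_least_as_many = 0
    for _ in range(n_sims):
        # Genotype sampling error: draw a plausible allele frequency.
        p = rng.betavariate(allele_count + 1, chromosomes - allele_count + 1)
        lp = expected_lp_from_allele(p)
        # Phenotype test error: frequency the test would be expected to report.
        q = lp * (1.0 - false_neg) + (1.0 - lp) * false_pos
        # Phenotype sampling error: simulate one survey of `tested` people.
        simulated = sum(rng.random() < q for _ in range(tested))
        if simulated >= lp_count:
            at_least_as_many += 1
    return at_least_as_many / n_sims
```

A small p-value would suggest that the reported LP frequency is higher than the sampled allele can account for, which is the situation expected where additional, unidentified LP alleles are present.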
In this study we have sought to extend this approach to the whole of the Old World. However, while there is a rich literature on the frequencies of LP in different geographic regions [1] and a growing body of publications reporting the frequencies of candidate LP-causing alleles, in most cases the genetic and phenotypic data are not from the same people and often not even from closely neighbouring groups. Thus, characterization of the extent to which LP frequency can be explained by current knowledge of LP-associated genotype frequencies is limited to populations where both data types are available. To overcome this problem we performed surface interpolation of various data categories (genetic, phenotypic, sample numbers, phenotype tests used and their associated error rates) and applied the statistical procedures described on a fine grid covering the Old World landmass. This has allowed us to identify regions where reported LP-associated allele frequencies are insufficient to explain the presence of LP. These regions should be good candidates for future genotype/phenotype studies.
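As an illustration of how point data can be spread over a grid, the sketch below uses inverse-distance weighting with great-circle distances. The interpolation method actually used is not specified in this section, so inverse-distance weighting is an assumption chosen for simplicity. The function names, the `power` parameter and any sample coordinates are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Guard against rounding pushing `a` marginally above 1.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


def idw_estimate(lat, lon, samples, power=2.0):
    """Inverse-distance-weighted value at (lat, lon).

    samples: iterable of (lat, lon, value) tuples, e.g. allele or LP
    frequencies at collection locations.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for s_lat, s_lon, value in samples:
        d = haversine_km(lat, lon, s_lat, s_lon)
        if d < 1e-9:
            return value  # grid node coincides with a sample location
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


def interpolate_grid(samples, lats, lons, power=2.0):
    """Interpolated surface as a list of rows, one row per latitude."""
    return [[idw_estimate(lat, lon, samples, power) for lon in lons]
            for lat in lats]
```

Applying the same routine separately to allele frequencies, phenotype frequencies, sample sizes and error rates would give one surface per data category, so that a genotype-phenotype comparison could then be evaluated at each grid node.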