To investigate broad-scale geographic differentiation between continental regions, we assessed the fraction of SNPs in each CD risk locus that fall in the top 1% of the genome-wide FST distribution (assessed from over 2 million randomly chosen SNPs) for that between-continent comparison. Further, the fraction of high-FST SNPs at each locus was assigned a significance (P) value based on a genome-wide random sample of 11,000 loci.
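The two steps described above (a genome-wide top-1% FST threshold, then an empirical P value for each locus against randomly sampled loci) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the function names, the `+1` correction in the empirical P value, and the use of independently sampled SNPs as "random loci" are all assumptions made for the example. Real null loci would be contiguous, linked regions.

```python
import random

def top_fraction_threshold(fst_values, top_fraction=0.01):
    """FST value at or above which a SNP falls in the top `top_fraction`."""
    ordered = sorted(fst_values)
    n_top = max(1, round(len(ordered) * top_fraction))
    return ordered[-n_top]

def high_fst_fraction(locus_fst, threshold):
    """Fraction of SNPs in one locus whose FST reaches the threshold."""
    return sum(v >= threshold for v in locus_fst) / len(locus_fst)

def empirical_p(observed_fraction, null_fractions):
    """Share of random loci with a fraction at least as large as observed.

    The +1 terms are an assumption of this sketch; they keep the estimate
    from being exactly zero when no random locus matches the observation.
    """
    hits = sum(f >= observed_fraction for f in null_fractions)
    return (hits + 1) / (len(null_fractions) + 1)

# Synthetic stand-in data; none of these numbers come from the study.
rng = random.Random(1)
genome_fst = [rng.betavariate(1, 12) for _ in range(20000)]
threshold = top_fraction_threshold(genome_fst)

# Hypothetical null: 1,000 "loci" of 50 SNPs each, drawn without linkage.
null_fractions = [
    high_fst_fraction(rng.sample(genome_fst, 50), threshold)
    for _ in range(1000)
]

candidate_locus = rng.sample(genome_fst, 50)
p_value = empirical_p(high_fst_fraction(candidate_locus, threshold), null_fractions)
print(f"threshold={threshold:.3f}, empirical P={p_value:.3f}")
```

In this sketch a locus counts as significantly differentiated when its empirical P value falls below 0.05, mirroring the cutoff used in the text.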
The regions around CD risk loci are much more likely to show significant intercontinental differentiation than are SNP loci randomly chosen across the genome. Averaged over all CD loci, the proportion of high-FST outliers exceeded 1% in all three between-continent comparisons (Table S1). In our bootstrapping procedure, thirteen of the fifty-four (~24%) regions flanking independent CD risk loci are significantly differentiated (have more high-FST SNPs than expected, P<0.05) for at least one between-continent comparison. This may be a slight overestimate because flanking regions overlap at a few loci.
The differentiation of the CD risk network stands out most prominently from other loci when comparing European and Asian samples. In this case, the simplest comparison is the overall fraction of high-FST SNPs across all regions. For Europe and Asia, 1.5 percent of SNPs across the regions flanking CD risk loci exceed the 1 percent genome-wide threshold (Table S1). In other words, fifty percent more high-FST SNPs are found in these regions than expected from the genome-wide distribution. These SNPs are linked into regions, so the appropriate significance test on the network is based on the proportion of regions with an excess of high-FST SNPs. High-FST SNP loci are especially clustered within nine of the chromosomal regions around CD risk-associated loci. Nine regions are significant in a genome-wide Europe-Asia pairwise comparison at P<0.05, two of which are significant at P<0.01. This high proportion of high-FST blocks is strongly significant (P<0.01) when considering the total number of comparisons.
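One way to see why nine significant regions out of fifty-four is a notable excess is a binomial tail calculation. The sketch below assumes that each region independently has a 5% chance of reaching P<0.05 under the null. That independence is an assumption of this example only (the text notes that some flanking regions overlap), and the section does not state which test the authors actually used at the network level.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Counts taken from the text: 9 of 54 regions significant at P < 0.05.
n_regions = 54
n_significant = 9
expected = n_regions * 0.05  # about 2.7 regions expected by chance

p_regions = binomial_upper_tail(n_significant, n_regions, 0.05)
print(f"expected by chance: {expected:.1f}, P(X >= {n_significant}) = {p_regions:.4f}")
```

Under these simplifying assumptions the tail probability falls below 0.01, which is consistent with the significance level reported in the text.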
Genome-wide high-FST ratio decay and non-HLA risk loci values.
The Europe-Africa comparison also shows a highly elevated fraction of high-FST SNPs. However, these are clustered mainly in four of the chromosomal regions. The number of regions showing elevated FST between Europe and Africa is therefore not significant. The four high-FST regions include two loci that show significant evidence of selection by long-range linkage. To the extent that these loci do show evidence of selection, our comparison across the network does not show that other loci have been significantly selected in the history of European and African population differentiation.
We consider this pattern of between-continent results a rejection of the null hypothesis (H0) that the CD background risk network has experienced the same pattern of evolution as the genome as a whole. The results point in an interesting direction. Loci identified by earlier long-range haplotype tests as recently selected (such as the SH2B3 locus on chr 12) account for a high fraction of the high-FST SNPs in all comparisons. Furthermore, evidence of elevated FST across the broader CD network that has not been shown to be subject to positive selection is concentrated between Europe and East Asia. Importantly, the loci with elevated FST between Europe and East Asia are, for the most part, not the same loci that are highly differentiated between Europe and Africa. Therefore, we may be picking up a signal of selection on standing variation, not strong positive selection, in addition to previously demonstrated evidence of recent positive selection on two of these loci. Evolution of the CD risk network within Europe might account for some of these observations.
We turned to within-continent comparisons to address whether evolution of the CD risk-associated loci occurred uniformly over time. Previous work suggested that a proportion of autoimmune genetic risk (including CD) may reflect positive selection on the immune system within the last 10,000 years. If the selection that we identified with between-continent comparisons were very recent, we might expect additional loci to show high-FST fractions. If there were more recent (Holocene) selection in Europe or East Asia, we might expect additional loci to show up as significant outliers when comparing populations within each continental region. What we observe is the opposite. Within Africa, approximately five independent regions show a significant excess of high-FST SNPs. Within Asia, only four regions have a significant excess of high-FST SNPs. In Europe there is a strong deficit of high-FST SNPs. No within-continent comparisons show a significant (P<0.05) excess of regions with a significant (P<0.05) ratio of high-FST SNPs.
This result may appear to contradict previous evidence of strong recent selection on at least four of these loci. The contradiction might be a unique property of the evolution of these loci, for which our comparisons, directed to the broader issue of selection across the entire network, may be less informative. We chose to investigate one of these loci further. The Iceman genome provides an alternative test of recent selection on the CD risk network. The strongest prior evidence of selection in this network is the risk variant rs3184504 in SH2B3 reported by Zhernakova and colleagues. This locus is represented in the data from the 5,300-year-old Ötzi genome. Using the EHH statistic, Zhernakova and colleagues estimated an age for this locus of only 1,200–1,700 years. This age estimate makes the clear prediction that the Iceman genome should not have the risk allele. However, Ötzi is a heterozygote carrying this allele. For this reason, we propose that the evidence of selection on this locus may actually pertain to a time period well before 5,300 years ago.
We do not know the extent to which GWAS SNPs contribute to CD risk, but the Iceman genotypes can provide a rough test of positive selection across the network even without this information. The Ötzi genotype was drawn from the European population of 5,000 years ago. If the ancient population was different from today’s European population in the frequencies of SNP alleles associated with CD risk, then Ötzi will carry some genotypes that may be unlikely given their frequencies today in Europe. Ötzi is a heterozygote at nine out of forty-nine total GWAS risk sites (mean heterozygosity ≈ 0.184). The average heterozygosity in the Ötzi genome across these loci is identical to the minimum heterozygosity among Europeans in the 1000 Genomes Project sample today and closest to the mean heterozygosity observed in Africa. Low coverage in the Ötzi genome could bias the ascertainment of heterozygous sites by chance. We tested whether a coverage bias could explain the reduction in heterozygosity across the GWAS risk sites (avg. coverage = 5.6) by randomly generating sets of genotypes based on allele frequencies in modern Europeans and then resampling reads from each genotype at random, based on the read number at each risk site in Ötzi. The resulting distribution of over one million randomly generated average heterozygosities demonstrates a marked reduction in ascertainment of heterozygous sites. Nonetheless, the average number of heterozygous sites that we measure for Ötzi remains at the bottom of the heterozygosity distribution for Europe. Because Ötzi is an outlier compared to present-day Europeans for genotypes across these CD risk loci, we cannot reject the hypothesis that strong positive selection has affected the frequencies of any large proportion of these loci during the past 5,000 years.
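The coverage simulation described above can be sketched roughly as follows. Several details are assumptions of this example rather than statements about the authors' procedure: genotypes are drawn under Hardy-Weinberg proportions, each read samples one of the two alleles with equal probability, a site is called heterozygous only when both alleles appear among its reads, and sequencing error is ignored. The allele frequencies and read depths are synthetic placeholders.

```python
import random

def simulate_called_het(freqs, depths, rng):
    """Mean called heterozygosity for one simulated low-coverage genome."""
    het_calls = 0
    for p, depth in zip(freqs, depths):
        # Draw a diploid genotype from the modern allele frequency p.
        allele_1 = rng.random() < p
        allele_2 = rng.random() < p
        if allele_1 == allele_2:
            continue  # homozygous; no sequencing error is modelled
        # Each read samples one of the two alleles at random.
        reads = [rng.random() < 0.5 for _ in range(depth)]
        if any(reads) and not all(reads):
            het_calls += 1  # both alleles were observed
    return het_calls / len(freqs)

def lower_tail_fraction(observed, simulated):
    """Share of simulated values at or below the observed value."""
    return sum(s <= observed for s in simulated) / len(simulated)

# Synthetic placeholders for 49 risk sites; not data from the study.
rng = random.Random(0)
freqs = [rng.uniform(0.05, 0.95) for _ in range(49)]
depths = [rng.randint(1, 10) for _ in range(49)]

simulated = [simulate_called_het(freqs, depths, rng) for _ in range(2000)]
observed = 9 / 49  # heterozygous sites reported for the Iceman in the text
print(f"share of simulations at or below observed: "
      f"{lower_tail_fraction(observed, simulated):.3f}")
```

A site covered by a single read can never be called heterozygous in this model, which is the kind of downward bias in ascertainment that the simulation is meant to quantify.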
Mean observed heterozygosity across CD GWAS SNPs from 1000 Genomes and Iceman.
Estimated effect of low coverage in the Iceman genome on mean heterozygosity at GWAS loci.
As a separate means of assessing a signal of recent positive selection across the total network of CD risk loci, we utilized existing long-range haplotype data in the form of normalized integrated haplotype scores (iHS). We performed a randomization test that measures the intensity of maximum iHS signals in the CD network compared to randomly generated sets of loci, genome-wide. The genome-wide distribution of randomly selected sets of loci includes whatever fraction of loci in the genome actually was selected recently; this test is therefore conservative. This test was performed on European, East Asian, and Bantu iHS datasets downloaded from the UCSC genome table browser (see Methods). Interestingly, there is no evidence of unusual mean iHS scores among the European (p = 0.114) or Bantu (p = 0.236) samples. In contrast, the East Asian sample (p = 0.031) falls in the upper tail of the genome-wide distribution. These results suggest that the non-HLA autosomal network of genes underlying CD risk may have been under recent positive selection in East Asian populations.
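A randomization test of this kind can be sketched as below. The sketch assumes one summary score per locus (for example, the maximum normalized iHS in the region) and compares the network mean with the means of equally sized random sets drawn from a genome-wide pool. The function name, the set size, and all scores are hypothetical placeholders, and the authors' exact sampling scheme is not given in this section.

```python
import random

def randomization_p(network_scores, genome_scores, n_sets=10000, rng=None):
    """One-sided P value: share of random sets whose mean score is at least
    as large as the mean score of the network of interest."""
    rng = rng or random.Random()
    k = len(network_scores)
    observed = sum(network_scores) / k
    hits = 0
    for _ in range(n_sets):
        sample = rng.sample(genome_scores, k)  # random set of k loci
        if sum(sample) / k >= observed:
            hits += 1
    return hits / n_sets

# Synthetic per-locus scores; not iHS values from the study.
rng = random.Random(7)
genome_scores = [abs(rng.gauss(0, 1)) for _ in range(5000)]
network_scores = rng.sample(genome_scores, 40)  # hypothetical "network"

p = randomization_p(network_scores, genome_scores, n_sets=2000, rng=rng)
print(f"randomization P = {p:.3f}")
```

Because the random sets are drawn from the whole genome, any truly selected loci in the pool raise the null means, which is why the text describes the test as conservative.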
Within-continent mean normalized iHS values in CD network compared to genome-wide sample.