Our analysis of population structure through both PCA and Bayesian approaches reproduces previously reported patterns of genetic structure across Europe: PC1 clearly differentiates northern from southern Europe, whereas PC2 differentiates eastern from western Europe.28, 29, 30
Less well described has been the subtle structuring of variation across Britain and Ireland. Using PCA, we were able to distinguish the Irish population from both the Scottish and English populations, consistent with previous observations based on a similarly sized data set.29
Further, we observed subtle differentiation between Scotland and England, as previously illustrated in WTCCC data.12, 31
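PCA of SNP genotypes, of the kind used above to separate the Irish, Scottish and English samples, can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the software or settings used in this study: genotypes are assumed to be coded as 0/1/2 allele counts with no missing data, each SNP is centred and scaled by sqrt(p(1 - p)), and the function name genotype_pca and the toy populations are hypothetical.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the top principal components of a genotype matrix.

    genotypes: array of shape (n_individuals, n_snps) holding 0/1/2 allele counts
    (assumed: no missing data). Returns (n_individuals, n_components) PC scores.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                    # sample allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)                # monomorphic SNPs carry no information
    g, p = g[:, keep], p[keep]
    # Centre each SNP on its mean (2p) and scale by sqrt(p(1 - p)) so that
    # rare and common SNPs contribute on a comparable scale.
    x = (g - 2.0 * p) / np.sqrt(p * (1.0 - p))
    # Thin SVD: x = U S V^T; the PC scores of the individuals are U * S.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy example (hypothetical data): two populations whose allele frequencies
# differ strongly (0.2 vs 0.8), so PC1 should separate them cleanly.
rng = np.random.default_rng(0)
pop_a = rng.binomial(2, 0.2, size=(20, 300))
pop_b = rng.binomial(2, 0.8, size=(20, 300))
scores = genotype_pca(np.vstack([pop_a, pop_b]))
print(scores[:, 0])
```

The sign of each principal component is arbitrary, so only the relative positions of individuals along PC1 and PC2 are meaningful.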
A previous report of European population structure that focused primarily on mainland Europe reported interpopulation FST values of 0.003.28
It is interesting to note that across Britain and Ireland, FST values were an order of magnitude lower (average pairwise value among the English, Irish and Scottish populations = 0.0005). Although our Scottish population is intermediate between the Irish and English populations by PCA, it appears genetically more similar to the English than to the Irish population by FST. This result is in keeping with the geographical proximity of Scotland and England, and with Scotland consequently sharing more historical and prehistoric influences with England than with Ireland. Within Scotland, the degree of sharing with Ireland or with England is probably structured geographically. It would be interesting, for example, to quantify genetic sharing between western Scottish regions (eg, Argyll or Galloway) and Ireland or, conversely, between the Border region and England. Y-chromosome analysis provides evidence for genetic structure within Scotland, revealing both a shared ancestry between eastern Scottish and eastern English samples, such as ours, and a similarity between Scotland and Ireland to the exclusion of England (JF Wilson, unpublished data).
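The FST values discussed above summarise how much allele frequencies differ between populations relative to overall diversity. The sketch below uses Hudson's estimator, combined across SNPs as a ratio of averages; this is one standard estimator chosen for illustration, and the study may have used a different one (for example, Weir and Cockerham's). Inputs are assumed to be sample frequencies of the same allele in both populations, and the function name hudson_fst is hypothetical.

```python
import numpy as np


def hudson_fst(p1, p2, n1, n2):
    """Hudson's FST estimator for two populations, combined as a ratio of averages.

    p1, p2: sample frequencies of the same allele at each SNP in populations 1 and 2.
    n1, n2: number of sampled chromosomes (twice the number of diploid individuals).
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Per-SNP numerator: squared frequency difference, corrected for the
    # sampling variance within each population.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Per-SNP denominator: probability that two alleles, one drawn from each
    # population, differ.
    den = p1 * (1 - p2) + p2 * (1 - p1)
    # Summing before dividing ("ratio of averages") is more stable than
    # averaging per-SNP ratios, which are noisy at low-diversity SNPs.
    return num.sum() / den.sum()


# Fixed differences give FST = 1; identical frequencies give a value near 0
# (slightly negative because of the sample-size correction).
print(hudson_fst([0.0, 1.0], [1.0, 0.0], 100, 100))
print(hudson_fst([0.5, 0.3], [0.5, 0.3], 200, 200))
```

As a rough guide to scale, at a single SNP with frequency near 0.5 and ignoring the sample-size terms, this FST is approximately 2(p1 - p2)², so values of 0.003 and 0.0005 would correspond to frequency differences of about 0.04 and 0.016.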
The degree of differentiation observed in this study is conservative, given that samples were selected on the basis of residence only, not ancestry. In view of the greatly increased population mobility of the last century, one would expect greater differentiation if sampling were restricted to individuals with all four grandparents from particular regions of interest.
The Utah population is known to have majority English ancestry. Our results are consistent with this: the HapMap CEU and southern English populations are virtually indistinguishable using both FST and PCA.
In agreement with the population structure results, our comparisons of LD illustrated close parallels between the populations tested in this study. At the level of D′ and r², the populations seemed indistinguishable, consistent with our own and other authors' previous results.10, 32
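D′ and r² are both computed from two-locus haplotype frequencies but measure different things: D′ scales the disequilibrium coefficient D by its maximum attainable value given the allele frequencies, whereas r² is the squared correlation between alleles at the two SNPs. A minimal sketch, assuming the four haplotype frequencies are already known (in unphased data they would normally be estimated first, for example by an EM algorithm); the function name pairwise_ld is hypothetical.

```python
def pairwise_ld(f_AB, f_Ab, f_aB, f_ab):
    """Return (D', r^2) for two biallelic SNPs from their four haplotype frequencies.

    Alleles A/a at the first SNP and B/b at the second; frequencies must sum to 1.
    """
    if abs(f_AB + f_Ab + f_aB + f_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab                 # allele frequency of A at SNP 1
    p_B = f_AB + f_aB                 # allele frequency of B at SNP 2
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = f_AB - p_A * p_B              # coefficient of linkage disequilibrium
    # D' divides D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max
    # r^2: squared correlation between the allelic states at the two SNPs.
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d_prime, r2


# Only three of the four possible haplotypes observed: D' reaches 1
# even though r^2 stays below 1.
print(pairwise_ld(0.4, 0.1, 0.0, 0.5))
```

The function returns a signed D′; |D′| is the quantity usually reported. The example shows why the two measures can disagree: D′ = 1 whenever one haplotype is absent, while r² = 1 additionally requires matching allele frequencies.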
However, using the LDU parameter, we observed increased levels of LD and fewer ‘LD holes’ in the Irish and Swedish populations (see , ). It should be stressed, though, that this increase in LD is marginal. Population isolates are well known to show significantly increased levels of LD: classic isolates typically generate LDU maps of chromosome 22 with lengths in the region of 400–700 units.19, 21
Although the corresponding LDU map lengths for Ireland (858 units) and Sweden (818 units) are shorter than those for Scotland, southern England and Utah CEPH (894–903 units), they are still significantly longer than that of a typical isolate. In this sense, the notion of the general Irish population as anything approaching a population isolate can be dismissed. However, as elsewhere in Europe, populations showing the characteristics of a genetic isolate probably exist within rural Irish communities or on islands off the mainland. Known European examples of genetic isolates located close to cosmopolitan populations include the town of Rucphen in the Netherlands and the Orkney Islands off Scotland. Focusing research on rural Irish communities would shed further light on this question.
Analysis of ROH is a powerful way to gauge the extent of both ancient kinship and recent parental relatedness within a population. This is because ROH arise from shared parental ancestry in an individual's pedigree: an individual whose parents share a common ancestor can inherit the same chromosomal segment from both, and because recombination shortens such shared segments in every generation, long ROH point to recent common ancestry and short ROH to more distant ancestry. The offspring of cousins have very long ROH, commonly over 10 Mb, whereas at the other end of the spectrum almost all Europeans carry ROH of ~2 Mb in length, reflecting shared ancestry from hundreds to thousands of years ago. By focusing on ROH of different lengths, it is therefore possible to infer aspects of demographic history at different time depths in the past.22
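The simplest way to detect ROH is to scan along a chromosome for stretches of consecutive homozygous SNPs and keep those that exceed length and SNP-count thresholds. The sketch below does exactly that for one individual and one chromosome; the function name call_roh and its default thresholds are hypothetical. Dedicated software (for instance PLINK's --homozyg option) additionally tolerates occasional heterozygous or missing calls and checks SNP density, which is omitted here.

```python
def call_roh(positions, genotypes, min_length_bp=1_000_000, min_snps=50):
    """Call runs of homozygosity (ROH) on one chromosome of one individual.

    positions: sorted SNP positions in base pairs.
    genotypes: allele counts (0, 1 or 2) at those SNPs; 1 means heterozygous.
    Returns (start_bp, end_bp) tuples for runs that contain at least
    min_snps consecutive homozygous SNPs and span at least min_length_bp.
    Any heterozygous call ends a run (no error tolerance).
    """
    if len(positions) != len(genotypes):
        raise ValueError("positions and genotypes must have the same length")
    runs = []
    start = None  # index of the first SNP in the current homozygous stretch

    def close_run(end):
        # Keep the stretch start..end only if it passes both thresholds.
        n_snps = end - start + 1
        span = positions[end] - positions[start]
        if n_snps >= min_snps and span >= min_length_bp:
            runs.append((positions[start], positions[end]))

    for i, g in enumerate(genotypes):
        if g == 1:                 # heterozygous call breaks the run
            if start is not None:
                close_run(i - 1)
                start = None
        elif start is None:        # homozygous call (0 or 2) opens a new run
            start = i
    if start is not None:          # run extending to the last SNP
        close_run(len(genotypes) - 1)
    return runs


# Toy chromosome (hypothetical): 100 SNPs every 100 kb, all homozygous
# except one heterozygous call in the middle.
pos = [i * 100_000 for i in range(100)]
gt = [0] * 100
gt[50] = 1
print(call_roh(pos, gt, min_snps=10))
```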
We used FROH measures (the proportion of the genome contained in ROH above a given length) to compare and contrast patterning across populations. These measures are genomic equivalents of the pedigree inbreeding coefficient, but do not suffer from the problems of pedigree reconstruction. By varying the lengths of ROH that are counted, they can be tuned to assess parental kinship at different points in the past. We used two measures: FROH1, which includes all ROH over 1 Mb and hence captures both recent and background parental relatedness, and FROH5, which sums only ROH over 5 Mb in length, a length more typical of a parental relationship within the last four to six generations.22
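Given the ROH called in an individual, FROH is the summed length of ROH above a length threshold divided by the length of genome in which ROH could be detected. The sketch below computes it for the 1 Mb and 5 Mb thresholds described above, together with a hypothetical helper that mirrors the sensitivity check of excluding individuals who carry any ROH over 5 Mb. The function names, and passing the genome length in as a parameter, are assumptions for illustration; "over" is read as strictly greater than the threshold.

```python
def f_roh(roh_lengths_bp, genome_length_bp, threshold_bp):
    """Fraction of the genome covered by ROH longer than threshold_bp.

    roh_lengths_bp: lengths (bp) of the ROH called in one individual.
    genome_length_bp: length of genome in which ROH could be detected
    (for example, the autosomal length spanned by the genotyped SNPs).
    """
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    total = sum(length for length in roh_lengths_bp if length > threshold_bp)
    return total / genome_length_bp


def mean_froh1_without_long_roh(individuals, genome_length_bp):
    """Mean FROH1 across individuals, excluding anyone with an ROH over 5 Mb.

    individuals: one list of ROH lengths (bp) per individual.
    Checks whether elevated FROH1 persists once individuals with signs of
    very recent parental relatedness are removed.
    """
    kept = [rohs for rohs in individuals
            if all(length <= 5_000_000 for length in rohs)]
    if not kept:
        raise ValueError("no individuals left after exclusion")
    values = [f_roh(rohs, genome_length_bp, 1_000_000) for rohs in kept]
    return sum(values) / len(values)


# Toy example with a hypothetical 10 Mb "genome" so the fractions are easy to read.
genome = 10_000_000
ind_a = [1_500_000, 6_000_000]   # carries one long ROH
ind_b = [2_000_000, 3_000_000]
print(f_roh(ind_a, genome, 1_000_000), f_roh(ind_a, genome, 5_000_000))
print(mean_froh1_without_long_roh([ind_a, ind_b], genome))
```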
Our results indicate slightly elevated levels of both the overall number of ROH and the proportion of the genome in ROH in the Irish and Swedish populations, compared with southern England, Scotland and HapMap CEU (see ). This pattern was exaggerated when we restricted analysis to ROH longer than 5 Mb (ie, FROH5, see ), indicating increased parental relatedness within the last six generations in the Irish and Swedish populations compared with the other populations tested in this study. When we removed individuals with ROH over 5 Mb from the FROH1 analysis (Supplementary Figure S5), Ireland remained the population with the most homozygous runs and the longest summed length of homozygosity. This provides further evidence that the elevated proportion of shorter ROH in Ireland, and hence its larger number of ancient pedigree loops, is real and not driven by a small number of offspring of cousins.
Famine and mass emigration may have driven the increased levels of autozygosity in the Irish population. However, we consider it likely that the increased levels we observed are at least partly attributable to the genetic remnants of ancient Gaelic patrilineal dynasties,33, 34, 35 in combination with the traditionally agricultural nature of Irish society. Ireland was not affected by the Industrial Revolution to the same extent as Britain. The Industrial Revolution has been associated with mass migration from rural to urban communities and an expansion of effective population size. The absence of such a pattern in Ireland would have resulted in prolonged adherence to primogeniture-style inheritance of land, with often only one adult sibling from each family able to marry and reproduce, through succession to ownership of the farm. Such patterns would have restricted growth in effective population size. However, a potential confounder in this study is the sampling scheme: at least in areas with little mobility, recruiting primarily from rural rather than urban areas could increase levels of autozygosity, as could sampling people who were born earlier.36
Like Ireland, Sweden has traditionally been an agricultural society. Although the Industrial Revolution drove urbanization in Sweden towards the end of the 19th century, the more recent population bottleneck (as indicated by the LDU map) and the resulting rapid expansion during the 20th century might contribute to the reduced diversity we observe in our data. Further work taking these variables into account will be necessary to fully understand the causes of the patterns we see.
The results of our HD analysis are consistent with expectations from our understanding of European population history. They recapitulate the famous cline in allele frequencies first noted by Ammerman and Cavalli-Sforza,37 who associated it with the spread of Neolithic farmers across the continent. Data from both Y chromosomes38 reveal the same patterns. Simulations have since shown that a cline in both allele frequencies and genetic diversity from southeast to northwest Europe would also be expected from the original dispersal of hunter–gatherers into Europe.40
It is likely that both processes contributed, along with postglacial expansions from southern refugia such as Iberia, where diversity is clearly higher in our data set. Similar geographical patterns are seen using SNP heterozygosity.28
Our analysis is the first to reveal that, at the autosomal level, this diversity gradient extends across Britain and Ireland, reaching a low point in Scotland and Ireland, at the edge of the Atlantic.
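Assuming HD here denotes haplotype diversity (the abbreviation is defined outside this section), a standard way to measure it in a genomic window is Nei's unbiased estimator, n/(n - 1) × (1 - Σ p_i²), where p_i are the haplotype frequencies among n sampled chromosomes; it equals the probability that two distinct sampled chromosomes carry different haplotypes. The sketch below computes it for a single window; the window definition and estimator used in the study may differ, and the function name haplotype_diversity is hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for one genomic window.

    haplotypes: sequence of hashable haplotypes (e.g. allele strings such as
    "ACGT"), one per sampled chromosome.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies: probability that two draws
    # (with replacement) carry the same haplotype.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    # The n / (n - 1) factor converts this to draws without replacement,
    # removing the downward bias in small samples.
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical windows: lower values mean fewer, more common haplotypes.
print(haplotype_diversity(["AC", "AC", "TG", "TG"]))
print(haplotype_diversity(["AC", "AG", "TC", "TG"]))
```

Averaging this quantity over many windows along the genome gives a population-level summary that can be compared across samples.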
In summary, our results illustrate a subtle genetic structure across Britain and Ireland against the background of a comparatively homogeneous European gene pool. We observed slightly elevated levels of LD and genome-wide homozygosity in Ireland and Sweden compared with neighbouring British and European populations, although these levels do not approach those of traditional population isolates. Similarly, we observed a decrease in HD in Britain and Ireland, more so in Scotland and Ireland than in England. All these characteristics can be advantageous for genetic mapping. Reduced structure mitigates the problem of cryptic stratification, although appropriate corrective steps should always be taken. Elevated LD and fewer LD holes will enhance the efficiency and power of genome-wide association study platforms. Reduced HD should also be reflected in the architecture of disease alleles. Increased ROH will improve power to identify recessive effects. Aside from promoting Ireland and Scotland as resources for genetic mapping, the elevated levels of kinship observed in Ireland, and potentially in Scotland, would in theory make these populations particularly amenable to long-range phasing and haplotype imputation methods for rare variants.41