It is interesting to consider the time depth of this differentiation. By removing first-, second- and third-degree relatives, we controlled for structure arising from mating behaviour in the past ~120 years, and thus focused on the patterning arising earlier than this. The signal of structure is stronger when we include close relatives, but also persists when we use more stringent thresholds for relatedness (not shown), indicating that the patterns arise from ancient shared ancestry within the villages compared to their neighbouring subpopulations. Inbreeding and more-distant shared parental ancestry will also contribute to among-population differences. We used the genomic measure FROH9
to remove all individuals with total shared parental ancestry equivalent to one second-cousin pedigree loop (FROH
>0.015), whereas including inbred subjects led to more obvious structuring (not shown). Thus, the observed structure in each of the populations arises partly from recent relatedness and shared parental ancestry and partly from deeper patterns of kinship within the subpopulations, overlaid with mating among them and now with immigrants.
To explore how many markers are required to recover these fine scale patterns of structure, we ranked SNPs by FST
among villages and repeated the PCA for the most differentiated subsets of 30
000, 3000 and 300 SNPs in each population. In all three populations, 10
000 or more high FST
SNPs recovered an essentially identical picture to that using the full data set, and even 3000 SNPs preserved considerable separation between the villages (not shown). Using only the most discriminating 300 SNPs, little structure could be observed between the two Croatian villages; however, in Scotland and Italy one of the three settlements included in each location remained completely differentiated from the other two (not shown). We note that these results are only indicative of the minimum number of SNPs required to separate these populations, as by necessity SNPs have been selected intrinsically on the basis of FST
within the same data set, rather than extrinsically from other data.
The slightly lower differentiation of the Croatian villages is not surprising given the fact that they are physically the closest of those considered here, being 8
km apart, with only low hills separating them. In contrast, the settlements in the Scottish Isles and Italy are separated by 15–30
km of sea in the former case, and of 3000
m mountains in the latter, although there are deep connecting valleys.
Such fine-scale differentiation is consistent with the highly non-random nature of human mate choice over the millennia. The average distance between the birthplaces of spouses in rural parts of Finland, the Po valley in northern Italy and the isles of Scotland in the nineteenth century was ~1.5–3
Such close endogamy was probably the norm in rural Europe due to lack of transport or economic opportunities. The breakdown of these isolates has since dramatically altered the population structure.11
The exquisite structure preserved in the genomes of people with all grandparents from the same settlement demonstrates that very detailed genetic and geographical ancestry information can be obtained by genome-wide SNP analyses. This provides novel opportunities, under certain circumstances, to predict the micro-geographical origin of an individual. Genetic association studies that include rural populations must also model this genetic structure, but it is not a barrier to gene discovery.12
When whole-genome sequences become widely available, the ability to use many more variants, including rarer ones, to identify short shared genomic segments will perhaps allow routine identification of regional ancestries, given a suitably large and carefully collected reference sample.