Human isolates have been postulated as a good resource for the identification of QTL because of their reduced genetic diversity and more homogeneous environment. Isolates may also show increased linkage disequilibrium (LD) owing to their small effective population size, as well as either the loss or an increase in frequency of alleles that are rare in the general population from which they originate. Here we investigate differences in allele and genotype frequencies, LD and homozygous tracts between an isolate, comprising several villages on the island of Vis in Croatia, and an outbred population of European origin, the HapMap CEPH founders. Using the HumanHap300 v1 Genotyping BeadChip, we show that our population does not differ greatly from the reference CEU outbred population, despite having a slightly higher proportion of monomorphic loci, slightly higher long-range LD and a greater proportion of individuals with long homozygous tracts. We conclude that genotyping arrays should perform equally well in our isolate as in outbred European populations for disease mapping studies, and that SNP–trait associations discovered in our well-characterized Croatian isolate should be valid in the general European population from which it descends. Genet. Epidemiol. 34: 140–145, 2010. © 2009 Wiley-Liss, Inc.
Human isolates have been postulated as a good resource for the identification of QTL because of their reduced genetic diversity and more homogeneous environment. Isolates may also show increased linkage disequilibrium (LD) owing to their small effective population size, as well as either the loss or an increase in frequency of alleles that are rare in the general population from which they originate [Wright et al., 1999].
Here we investigate differences in allele and genotype frequencies, LD and homozygous tracts between an isolate, comprising several villages on the island of Vis in Croatia, and an outbred population of European origin, the HapMap CEPH founders (Utah residents with ancestry from northern and western Europe, CEU).
Croatia has 15 Adriatic Sea islands with populations greater than 1,000. The villages on these islands have unique population histories and have remained isolated from other villages and from the outside world for centuries, so their populations represent well-characterized genetic isolates [Bennett et al., 1983; Rudan et al., 1999; Rudan et al., 1992]. Komiza and Vis, on the island of Vis, have excellent church and census records that show very limited immigration from other populations. This is supported by the very high endogamy estimated for these villages (calculated as the percentage of grandparents born in the same village as the participant): 91% for Komiza and 85% for Vis. Several rare autochthonous Mendelian diseases occur on these Adriatic islands [Bakija-Konsuo et al., 2002; Saftic et al., 2006], where at least four highly unusual rare genetic variants segregate [Barac et al., 2003; Borot et al., 1991; Tolk et al., 2001; Turcinov et al., 2000]. Each of these findings is generally consistent with the hypothesis that all chromosomes carrying a particular variant descend from a single founder.
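The endogamy measure defined above is a simple proportion. A minimal sketch, assuming a hypothetical input layout of one (participant village, list of grandparent birth villages) tuple per participant, and pooling grandparents across participants (the text does not say whether the percentage was pooled or averaged per person):

```python
def endogamy_percent(participants):
    """Percentage of grandparents born in the participant's own village.

    participants: list of (participant_village, grandparent_villages) tuples.
    This input layout is a hypothetical illustration. Grandparents are pooled
    across all participants (an assumption; per-person averaging would differ).
    """
    same = 0
    total = 0
    for village, grandparent_villages in participants:
        for gp_village in grandparent_villages:
            total += 1
            if gp_village == village:
                same += 1
    return 100.0 * same / total
```

For example, one Komiza participant with three of four grandparents from Komiza plus one Vis participant with all four grandparents from Vis gives 7 of 8 grandparents, or 87.5%.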
Sixty founders from the CEPH European sample have been genotyped by the HapMap project.
For comparison, we selected 60 healthy individuals from our study population on the island of Vis, Croatia (CROATIA; for a description see Vitart et al.), who were unrelated according to their pedigrees as obtained from church and parish records.
The Croatian samples were genotyped using Illumina's Sentrix HumanHap300 Genotyping BeadChip (v1), which comprises 317,503 SNPs. Genotypes at these same SNPs for the 60 CEPH founders were obtained from Illumina Inc.
We excluded markers with a call rate below 90% in either population (9,075 in CROATIA, 71 in CEU, 9,088 in total) and markers on the sex chromosomes (9,173), leaving 299,242 SNPs. This set of SNPs was used for all results presented unless otherwise stated. We estimated the proportion of loci segregating in each population, the allele and genotype frequencies, and the proportion of loci out of Hardy-Weinberg equilibrium (HWE). Data were analyzed using PLINK 1.04 [Purcell et al., 2007], R [R Development Core Team, 2008] and custom-made software.
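The marker filter above keeps only SNPs that pass the call-rate threshold in both populations, so the two samples are compared on an identical marker set. A minimal sketch, assuming genotypes are stored per SNP and per population with missing calls as `None`, and that the sex chromosomes are labeled "X" and "Y"; the function names and data layout are hypothetical, not the authors' custom software:

```python
def call_rate(genotypes):
    """Fraction of non-missing genotype calls (missing calls are None)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def qc_filter(snps, min_call_rate=0.90, sex_chroms=("X", "Y")):
    """Return IDs of autosomal SNPs with call rate >= min_call_rate in every population.

    snps: dict mapping SNP ID -> (chromosome, {population: [genotypes]}).
    A SNP failing the threshold in any one population is dropped for both,
    so both populations are analyzed on the same marker set.
    """
    kept = []
    for snp_id, (chrom, by_population) in snps.items():
        if chrom in sex_chroms:
            continue
        if all(call_rate(g) >= min_call_rate for g in by_population.values()):
            kept.append(snp_id)
    return kept


# Marker counts reported in the text: 317,503 - 9,088 - 9,173 = 299,242.
assert 317_503 - 9_088 - 9_173 == 299_242
```

Note that the per-population exclusions (9,075 and 71) sum to more than the 9,088 total, which implies that some markers failed in both populations.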
Pairwise Fst statistics were calculated for all SNP markers segregating in both populations. A mean Fst was calculated over all such markers and over the subset with minor allele frequency (MAF) >0.05 in both populations. We studied the sampling properties of the Fst estimates by bootstrapping, as suggested by Weir et al. For a 5 Mb window (2.5 Mb on each side of each available SNP), we used 1,000 bootstrap samples to obtain the distribution of Fst estimates for that window, from which we obtained the mean Fst (Fstb) and 95% confidence intervals.
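The text does not state which Fst estimator was used. Purely as an illustration, the sketch below swaps in Hudson's two-population estimator, combines loci as a ratio of sums, and bootstraps loci within a ±2.5 Mb window around a focal SNP. The function names, the fixed random seed and the percentile form of the 95% interval are assumptions, not the authors' implementation:

```python
import random


def hudson_fst_parts(p1, n1, p2, n2):
    """Numerator and denominator of Hudson's Fst for one biallelic SNP.

    p1, p2: allele frequencies in the two samples.
    n1, n2: haploid sample sizes (2 x individuals, e.g. 120 for 60 people).
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def combined_fst(parts):
    """Combine loci as a ratio of averages: sum of numerators / sum of denominators."""
    return sum(n for n, _ in parts) / sum(d for _, d in parts)


def window_bootstrap(positions, parts, centre, half_width=2_500_000,
                     n_boot=1000, seed=1):
    """Bootstrap mean Fst and a 95% percentile interval for the SNPs within
    +/- half_width bp of `centre`, resampling loci with replacement."""
    idx = [i for i, pos in enumerate(positions) if abs(pos - centre) <= half_width]
    rng = random.Random(seed)
    reps = sorted(
        combined_fst([parts[rng.choice(idx)] for _ in idx])
        for _ in range(n_boot)
    )
    fst_b = sum(reps) / n_boot
    return fst_b, reps[int(0.025 * n_boot)], reps[int(0.975 * n_boot) - 1]
```

Restricting to SNPs segregating in both populations, as the text does, keeps every per-locus denominator positive, so the ratio of sums is always defined.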
LD was estimated as r² and D' for all pairs of segregating SNPs less than 10 Mb apart, and for the subset of these SNPs that were in HWE (P-value ≥0.01) and had MAFs ≥0.05 in both populations.
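For reference, the two LD measures have standard definitions in terms of two-locus haplotype frequencies. The sketch below assumes the haplotype frequency p_AB is already available (from phased data or an EM estimate). The text does not say how PLINK derived r² from unphased genotypes, so this illustrates the definitions only, not the exact computation used:

```python
def ld_stats(p_ab, p_a, p_b):
    """D, |D'| and r^2 for two biallelic loci.

    p_ab: frequency of the A-B haplotype; p_a, p_b: allele frequencies of A and B.
    Both loci must be segregating (0 < p_a, p_b < 1).
    """
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2
```

For two loci at frequency 0.5 whose alleles always co-occur (p_AB = 0.5), this gives D = 0.25, |D'| = 1 and r² = 1, which is perfect LD.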
Homozygous tracts over 200 kb in length were recorded by counting the number of consecutive homozygous SNPs, including monomorphic SNPs. We allowed one heterozygous genotype (i.e. one potential genotyping error) and one missing genotype per 200 kb segment around each analyzed SNP, and recorded the positions of the first and last SNP of each tract. Tracts with an average density of less than one SNP per 50 kb, or with fewer than 10 SNPs in total, were excluded to avoid regions of low SNP coverage such as the centromeric regions. This analysis was implemented in PLINK 1.04 using a sliding window of 200 kb.
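The tract-level exclusion rules above can be written as a single predicate. This minimal sketch assumes candidate tracts have already been called. It does not reproduce PLINK's sliding-window detection, and the function and parameter names are hypothetical:

```python
def keep_tract(start_bp, end_bp, n_snps, min_len_bp=200_000, min_snps=10,
               bp_per_snp=50_000):
    """Return True if a candidate homozygous tract passes the exclusion rules."""
    length = end_bp - start_bp
    if length <= min_len_bp:          # tracts must be over 200 kb
        return False
    if n_snps < min_snps:             # at least 10 SNPs in total
        return False
    # Average density of at least one SNP per 50 kb, i.e. n_snps * 50 kb >= length.
    return n_snps * bp_per_snp >= length
```

Checking density with integer arithmetic (`n_snps * bp_per_snp >= length`) avoids floating-point division at the boundary.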
The average distance between adjacent markers was 9,285 bp (range 1–22,072,916 bp) (Figure S1). Table I summarizes the allele frequency results for each population. In total, 170 SNPs were monomorphic in both populations. The average MAF was similar, at 0.25 for CROATIA and 0.26 for CEU; however, the allele frequency distributions differed. CROATIA exhibited an excess of loci in the 0–0.05 MAF range compared with CEU, whereas the opposite was true for loci with MAFs of 0.05–0.15. For MAFs of 0.15–0.5 the two populations were very similar (Figure S2). Mean heterozygosity was also similar in the two populations (34.17% (range 0–78.33%) for CROATIA and 34.98% (0–71.67%) for CEU). CROATIA had an excess of loci with low heterozygosity (0–10%) compared with CEU, whereas CEU had an excess of loci in the 10–20% range; the rest of the distribution was similar (Figure S3).
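The per-locus summaries compared above (MAF, observed heterozygosity and counts per MAF class) can be sketched as follows. This assumes genotypes coded as 0/1/2 allele counts with missing calls as `None`. It also assumes half-open class boundaries, (0, 0.05], (0.05, 0.15] and (0.15, 0.5], with monomorphic loci counted separately; the text does not specify how boundary values were assigned:

```python
def called(genotypes):
    """Drop missing calls (None)."""
    return [g for g in genotypes if g is not None]


def maf(genotypes):
    """Minor allele frequency from genotypes coded 0/1/2 (allele counts)."""
    calls = called(genotypes)
    p = sum(calls) / (2 * len(calls))
    return min(p, 1 - p)


def observed_heterozygosity(genotypes):
    """Fraction of called genotypes that are heterozygous (coded 1)."""
    calls = called(genotypes)
    return sum(g == 1 for g in calls) / len(calls)


def maf_class_counts(mafs, edges=(0.0, 0.05, 0.15, 0.5)):
    """Count loci per MAF class; monomorphic loci (MAF == 0) are counted separately."""
    bins = list(zip(edges, edges[1:]))
    counts = {"monomorphic": 0}
    counts.update({f"({lo}, {hi}]": 0 for lo, hi in bins})
    for m in mafs:
        if m == 0:
            counts["monomorphic"] += 1
            continue
        for lo, hi in bins:
            if lo < m <= hi:
                counts[f"({lo}, {hi}]"] += 1
                break
    return counts
```

Dividing each class count by the number of loci gives the proportions that are compared between CROATIA and CEU.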
We used the exact test for HWE described by Wigginton et al. When only loci with MAFs >0.05 in both populations were used, 3.41% of loci had P-values <0.05 in CROATIA and 2.77% in CEU (for all loci, these figures were 3.25% and 2.64%, respectively). Both figures are below the 5% expected by chance at this threshold, so neither population had more loci out of HWE than expected by chance.
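The exact test cited above enumerates all heterozygote counts compatible with the observed allele counts, using a recurrence between neighbouring counts. Below is a Python transcription of that published recurrence. It is a sketch for illustration, not the PLINK code that was actually run:

```python
def hwe_exact_p(n_het, n_hom1, n_hom2):
    """Two-sided exact HWE P-value for one biallelic SNP.

    n_het: heterozygote count; n_hom1, n_hom2: counts of the two homozygote classes.
    """
    n = n_het + n_hom1 + n_hom2                 # genotyped individuals
    rare = 2 * min(n_hom1, n_hom2) + n_het      # copies of the rarer allele
    probs = [0.0] * (rare + 1)                  # unnormalized P(het count = i)

    # Start near the expected heterozygote count, with the same parity as `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0

    # Walk downward: each step removes two heterozygotes, adds one of each homozygote.
    het, hom_r = mid, (rare - mid) // 2
    hom_c = n - het - hom_r
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1

    # Walk upward from the same starting point.
    het, hom_r = mid, (rare - mid) // 2
    hom_c = n - het - hom_r
    while het <= rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2.0) * (het + 1.0))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1

    # P-value: total probability of configurations no more likely than the observed one.
    p_obs = probs[n_het]
    p = sum(pr for pr in probs if pr <= p_obs) / sum(probs)
    return min(1.0, p)
```

Because the P-value is computed at a nominal threshold of 0.05, roughly 5% of loci in true equilibrium are expected to fall below it. That is the benchmark against which the observed 3.41% and 2.77% are compared.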
Average Fst for these populations was 0.014, both when all loci polymorphic in both populations were tested and when only loci with MAF >0.05 in both populations were used, indicating very little differentiation [S. Wright, 1978]. Overall the populations are quite similar, but several loci show high Fst values (Figure S4). A group of markers with Fst >0.25 is located on chromosome 2; it spans around 1.8 Mb and includes SNPs in the lactase (LCT) gene. Results were similar when using all polymorphic loci or only loci with MAF >0.05 in both populations (Figure S5). Weir et al. suggest that values of Fstb greater than the chromosome average plus three Fstb standard deviations reflect “truly exceptional differences” between populations. Again, this method highlights the region around the LCT locus, as well as additional regions on chromosomes 3, 6 and 8, although less clearly.
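The "truly exceptional differences" rule quoted above is a simple threshold. The minimal sketch below reads "three Fstb standard deviations" as the standard deviation of Fstb across the windows on one chromosome. That reading is an assumption, since the phrase could also refer to a bootstrap standard deviation. The function name is hypothetical:

```python
from statistics import mean, stdev


def exceptional_windows(fst_b):
    """Indices of windows whose Fstb exceeds the chromosome average plus 3 SD.

    fst_b: Fstb values for the windows on one chromosome, in positional order.
    The SD is taken across windows (one reading of the rule; it could instead
    mean the bootstrap SD of each window).
    """
    threshold = mean(fst_b) + 3 * stdev(fst_b)
    flagged = [i for i, f in enumerate(fst_b) if f > threshold]
    return flagged, threshold
```

With 20 windows at Fstb = 0.01 and one at 0.3, the threshold is about 0.21 and only the last window is flagged.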
Table II summarizes the LD results for SNPs less than 10 Mb apart that were in HWE (P-value ≥0.01) and had MAFs ≥0.05 in both populations. The proportions of marker pairs in either perfect or “useful” LD are slightly higher for CROATIA than for CEU. Figure 1(a) shows the decay of LD with distance (average r² for a given inter-marker distance, with distances grouped in 250 bp bins, up to 1 Mb) for chromosome 18 (gene poor). For this and the other autosomes (data not shown), CROATIA exhibits slightly higher r² than CEU. The difference is more evident at distances greater than 200 kb, where both populations seem to reach an “equilibrium long-range LD.” Figure 1(b) shows the moving average of r² along chromosome 18. Again, CROATIA consistently shows higher r² than CEU, both for this chromosome and for the remaining autosomes (data not shown). The same figures for chromosome 19 (gene rich) are included in the supplementary materials (Figure S5).
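The LD-decay curve described above amounts to averaging r² within fixed distance bins. A minimal sketch, assuming pairwise (distance, r²) values have already been computed. The function name and input layout are hypothetical:

```python
from collections import defaultdict


def ld_decay(pairs, bin_bp=250, max_dist=1_000_000):
    """Average r^2 per distance bin.

    pairs: iterable of (distance_bp, r2) for marker pairs.
    Returns {bin_start_bp: mean r2}, ordered by distance; pairs beyond
    max_dist are ignored.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for dist, r2 in pairs:
        if dist > max_dist:
            continue
        b = (dist // bin_bp) * bin_bp   # left edge of the 250 bp bin
        sums[b] += r2
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```

Plotting the returned means against bin start, separately for each population, gives curves of the kind compared in Figure 1(a).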
For homozygous runs longer than 200 kb, the average total homozygous tract length per individual was 145,191.8 kb for CROATIA and 122,790.6 kb for CEU. We grouped homozygous tracts into seven bins of increasing size: 200–500 kb, 500 kb–1 Mb, 1–2 Mb, 2–5 Mb, 5–10 Mb, 10–20 Mb and over 20 Mb. Compared with CROATIA individuals, CEU individuals showed an excess of shorter tracts, but the opposite was true for longer tracts (2 Mb and over). Figures S6 and S7 give more details on the distributions of tract length and number. In total, 13 individuals had individual tracts longer than 20 Mb: 12 from CROATIA and only one from CEU. We compared the locations of tracts longer than 2 Mb, which are found more often in CROATIA than in CEU, and observed that, in general, tracts found in CEU are also found in CROATIA (Figure S8).
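The per-individual summary above (total tract length plus counts in seven size classes) can be sketched as follows. The text does not say how tracts falling exactly on a class boundary were assigned. Here each class includes its lower edge and excludes its upper edge, which is an assumption. The names are hypothetical:

```python
import bisect

# Lower edges (kb) of the seven size classes used in the text; each class
# includes its lower edge and excludes its upper edge (an assumption).
EDGES_KB = [200, 500, 1_000, 2_000, 5_000, 10_000, 20_000]
LABELS = ["200-500 kb", "500 kb-1 Mb", "1-2 Mb", "2-5 Mb",
          "5-10 Mb", "10-20 Mb", "over 20 Mb"]


def tract_class(length_kb):
    """Return the size-class label for one tract, or None if below 200 kb."""
    i = bisect.bisect_right(EDGES_KB, length_kb) - 1
    return LABELS[i] if i >= 0 else None


def summarize_individual(tract_lengths_kb):
    """Total tract length (kb) and per-class tract counts for one individual."""
    counts = {label: 0 for label in LABELS}
    total = 0
    for length in tract_lengths_kb:
        label = tract_class(length)
        if label is not None:
            counts[label] += 1
            total += length
    return total, counts
```

Individuals with a nonzero "over 20 Mb" count correspond to the 13 individuals with very long tracts discussed in the text.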
The aim of this study was to characterize an isolated population from the Dalmatian island of Vis in Croatia in terms of allele frequencies, homozygosity and LD, and to compare it with the CEU HapMap sample. Our analyses revealed that, although both populations show very similar average MAF, the Croatian isolate has a larger proportion of monomorphic loci than CEU and a larger proportion of markers in the [0–0.05] MAF range. This trend is reversed for loci with MAFs in [0.05–0.15], and for MAFs in [0.15–0.5] both populations are very similar. The HumanHap300 v1 Genotyping BeadChip was designed based on CEU samples, and its SNPs were selected to be relatively common (allele frequencies >0.05) in that population and to exhibit low pairwise LD with each other. This could partly explain why CEU shows a deficit of SNPs with low allele frequencies compared with CROATIA; a similar trend might have been observed had any other population been compared with CEU using the same panel of SNPs. It could also explain the relative enrichment of the CEU sample for SNPs with MAFs in [0.05–0.15]. Nonetheless, our findings are also consistent with a higher level of inbreeding (and a low effective population size) in the Dalmatian isolate. It could be hypothesized that a small number of population founders, together with isolation (and drift), are responsible for the higher frequency of monomorphic loci in CROATIA compared with CEU. It could further be hypothesized that the same phenomena may have caused some SNPs that are rare in the general European population to be lost or, conversely, to increase in frequency in the isolate. The total proportion of markers with MAF <0.10 in both populations studied is slightly above the range reported by Service et al. for 11 population isolates. Their study used a set of around 2,500 SNPs located on chromosome 22, which might account for the difference observed.
Average heterozygosity was also very similar between CROATIA and CEU, and similar to that reported by Service et al. for their populations. Again, the distributions of heterozygosity differ between CROATIA and CEU: the former shows an excess of SNPs with low heterozygosity relative to the latter, consistent with the differences in the allele frequency distributions. When testing for HWE, we did not observe more loci out of equilibrium than expected by chance, which probably reflects the good quality of the genotyping. We then compared individual SNP allele frequencies. Overall, both populations were very similar, with the exception of a few groups of SNPs that had Fst values >0.15. Among these loci is the LCT gene, which is known to have different allele frequencies across Europe [Bersaglieri et al., 2004; Burton et al., 2007]. We did not find reports describing differences at the remaining loci, which consist mostly of groups of SNPs within the same regions, rather than individual SNPs, and span from a few tens of kb to more than 1 Mb (on chromosome 6). Differences at some of these loci could be the result of (and are consistent with) the population having been founded by few individuals and/or genetic drift.
We estimated LD between pairs of loci less than 10 Mb apart in both CROATIA and CEU using the same sets of markers: either all loci with a call rate >90% in both populations, or only those remaining after excluding loci with MAF <0.05 or an HWE P-value <0.01. Summary results for these two marker sets did not differ significantly within populations. Table II shows the proportions of marker pairs in perfect or “useful” LD, which are slightly higher for CROATIA. We have also shown that CROATIA consistently exhibits slightly higher r² than CEU and that, for distances greater than 200 kb, both populations reach what we call an “equilibrium long-range LD,” which is also slightly higher for CROATIA. Higher long-range LD could indicate that the population has undergone a relatively recent bottleneck [Tenesa et al., 2007] and would therefore exhibit reduced variation (i.e. more monomorphic loci and lower heterozygosity). The genotyping array we used was designed to avoid genotyping markers in very high LD with each other. To explore how this SNP ascertainment influenced our LD results, we downloaded the r² estimates for chromosome 22 for CEU from www.hapmap.org (Phase II data, pairwise r² estimates for SNPs up to 200 kb apart) and summarized these data as for Table II (Table SI). For all ranges of distances, our set of markers consistently showed a lower proportion of pairs with high r² than the whole HapMap data set, consistent with the SNPs having been chosen to avoid high LD among them.
We scored homozygous tract length for each individual and showed that both the average total tract length and the average number of tracts longer than 2 Mb are greater in CROATIA than in CEU. This trend is reversed for shorter tracts, probably because longer tracts are broken down, given the difference between the two populations in the number of monomorphic SNPs and SNPs with low allele frequencies. Gibson et al. showed that “long (over 1 Mb) homozygous tracts are relatively common even in the unrelated individuals from the outbred populations represented in the HapMap samples” and that such tracts are usually located in regions of low recombination rate. They also argue that very long tracts of homozygosity, particularly if not associated with regions of low recombination, are likely to be a signature of recent inbreeding. Thirteen individuals in our study display very long (over 20 Mb) tracts of homozygous SNPs. Only one of these individuals (NA12874) is from the reference CEU outbred population. Gibson et al. had already reported this individual as having a particularly long tract and a higher total tract length than the other CEU samples, and suggested that his parents are likely to share a recent common ancestor. All 12 Vis individuals with very long tracts also have a higher than average total tract length. All except one (whose parental origin is unknown) have both parents born in the same village on the island of Vis (Komiza, Vis, Oključna, Podhumlje or Podšpilje), so these individuals are probably the offspring of parents related to some degree.
In all, using the HumanHap300 v1 Genotyping BeadChip, we have shown that our population does not differ greatly from the reference CEU outbred population, but has a slightly higher proportion of monomorphic loci, slightly higher long-range LD and a greater proportion of individuals with long homozygous tracts. These features are consistent with genetic drift and high levels of endogamy, and with the demographic history of the isolate described by Vitart et al. We extrapolate that the trends we observe for genotyped loci will hold for untyped loci, and therefore conclude that genotyping arrays should perform equally well in our isolate as in outbred European populations for disease mapping studies. Furthermore, as also highlighted by Thompson et al. and Van Hout et al., susceptibility alleles should be the same in isolates as in outbred European populations. Any findings made in these well-characterized populations, which are more homogeneous in terms of environment, should therefore be valid in the general European population from which they descend.
HapMap data: http://www.hapmap.org/
P. N. was partly supported by the Genes to Cognition Program from the Wellcome Trust. A. F. W., C. H., C. S. H., N. D. H., P. N. and V. V. acknowledge support from the MRC. Croatian genotyping was performed by the Wellcome Trust Clinical Research Facility Genetics Core and funded by MRC Human Genetics Unit. S. A. K. acknowledges support from RCUK. L. Z. was partly supported by The National Foundation for Science, Higher Education and Technological Development of the Republic of Croatia.