A high-density genomic scan was used to compare several aspects of the structure of genetic variation in a genetic isolate (AJ) and a representative population of northern European ancestry (CEU). Prior studies of haploid regions of the genome, including mitochondrial DNA and the nonrecombining portion of the Y chromosome, support the hypothesis that AJ are genetically distinct, with little recent admixture with host European populations [40]. While the relative endogamy of AJ has been postulated based on historical documentation of population bottlenecks and the isolation of AJ in Europe, initial analysis of LD structure in AJ compared to Europeans showed modest increases in LD [36]. In those studies, comparisons between AJ and Europeans were made based on LD in two 1 Mb regions per chromosome, typed with 16 SNPs per region [37], or were restricted to a single chromosome [36].
In contrast to these studies, which utilized about one SNP per 62.5 kb across the genome [37] or 2,486 SNPs at a density of one marker per 13.8 kb on a single chromosome [36], the current study performed a high-density analysis of genetic variation in AJ utilizing 435,632 SNPs at a median density of one SNP per 2.5 kb (mean one SNP per 5.8 kb) across the entire genome. While our study also found regions where AJ show greater LD than CEU (e.g., the region analyzed by Service et al. [36]), LD structure was highly variable, with CEU showing greater measures of LD on other chromosomes.
Our analysis reveals small but significant differences in measures of genetic diversity between AJ and CEU. The mean value of FST, a useful measure of overall genetic divergence among subpopulations, was only 0.009, but because of the large number of SNPs typed, this small value is nevertheless highly unlikely to occur by chance (P < 0.001). FST values as high as 0.05 are generally interpreted as indicating negligible genetic differentiation [44], underscoring the power of dense SNP genotyping for detecting evidence of historical isolation despite small genetic differences. The areas of greatest difference as assessed by FST were on chromosome 2 and chromosome 6, presumably involving the HLA region on chromosome 6 and the LCT locus on chromosome 2. The differences in the HLA region are also consistent with the SNP-by-SNP comparisons. Distinct patterns of LD in the HLA region have been observed in AJ [46] and selective sweeps at LCT have been shown in European populations [47]. Associations have been made between specific HLA alleles and several disorders [48]. Very recently, a consortium group has derived an FST value comparing Northern Europeans to Ashkenazi Jews, also on the Affymetrix 500 K platform [51]. They report a value of 0.009, identical to that observed here. Together, that study and the current study begin to create an Ashkenazi-specific "HapMap" and also begin to define subsets of markers sufficient to distinguish these populations.
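To make the scale of the FST values discussed above concrete, the following is a minimal Python sketch of a per-SNP FST for two populations, averaged across SNPs. It assumes Wright's heterozygosity-based definition, FST = (HT - HS) / HT, with equal weighting of the two populations; the estimator actually used in this study is not specified in this section, and the function names and example frequencies are hypothetical.

```python
def fst_biallelic(p1, p2):
    """Wright's FST for one biallelic SNP, given the frequency of the same
    allele in population 1 (p1) and population 2 (p2).

    Assumption: both populations are weighted equally and sample-size
    corrections are ignored.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity of the pooled (total) population.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        # SNP is monomorphic in both populations: no differentiation.
        return 0.0
    # Mean expected heterozygosity within the two subpopulations.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t


def mean_fst(freq_pairs):
    """Average the per-SNP FST over an iterable of (p1, p2) pairs."""
    values = [fst_biallelic(p1, p2) for p1, p2 in freq_pairs]
    if not values:
        raise ValueError("at least one SNP is required")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical allele frequencies for three SNPs in two populations.
    example = [(0.40, 0.60), (0.50, 0.50), (0.25, 0.30)]
    print(mean_fst(example))
```

Under this definition, a frequency difference of 0.4 versus 0.6 at a single SNP gives an FST of 0.04, so a genome-wide mean near 0.009 corresponds to allele frequencies that typically differ by only a few percentage points.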
Because of the founder effect, inflated sampling variance of haplotype frequencies in turn results in inflated variance in LD. For SNPs that are very close to one another, if the parent population had high LD, the AJ could show lower LD due to this sampling. Most SNPs at somewhat greater distances would have had nearly zero LD, and it is the founder effect that produced the greater LD observed in AJ at intermediate SNP distances. Because haplotype blocks are generally inferred from sets of closely spaced SNPs in high mutual LD, this sampling variance could be expected to erode the size of these blocks. When actual measures of genome-wide LD structure in the two populations were compared, haplotype blocks inferred from pair-wise LD statistics as well as by EM haplotype phase inference did indeed tend to be smaller in AJ. Analysis of global LD decay showed essentially no difference between AJ and CEU, although there was a tendency for faster decay among nearby SNPs and slower decay among intermediate-distance SNPs in the AJ. These data are more consistent with the AJ as an older, larger population than CEU, and would suggest that the LD structure of AJ may not provide a global advantage for whole-genome association mapping. In contrast, however, the proportion of SNP pairs showing no evidence of recombination (D' = 1) at different distance intervals was greater in CEU than in AJ only at short distances, with the AJ generally showing more LD at longer distances. Similarly, analysis of local LD, for example LD decay on chromosome 1, revealed AJ showing slower decay (more LD) at longer distances. In addition, a likelihood ratio approach showed that runs of homozygous SNPs were approximately 25% longer in AJ than CEU, which is more consistent with expectation in a genetic isolate.
These aggregate data suggest that CEU, like AJ, underwent a population bottleneck, but given that the overall diversity of AJ is not lower than that of CEU, the greater homozygous tract lengths in AJ imply a shorter average time back to common ancestry for the AJ sample.
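The pairwise LD measures compared above (D' and r2) can be illustrated with a short Python sketch using their standard definitions from haplotype and allele frequencies. This is only an illustration under the assumption that phased haplotype frequencies are available; it is not the pipeline used in this study, and the function name and example values are hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r2) for a pair of biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and
          allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("both SNPs must be polymorphic")
    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected if the two SNPs were independent.
    d = p_ab - p_a * p_b
    # Largest |D| attainable given the allele frequencies.
    if d >= 0.0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / denom
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical pair: allele A (frequency 0.2) is found only on
    # haplotypes that carry allele B (frequency 0.5).
    d, d_prime, r2 = ld_stats(p_ab=0.2, p_a=0.2, p_b=0.5)
    print(d, d_prime, r2)
```

In the hypothetical example, one of the four possible haplotypes is absent, so D' reaches 1 (no evidence of recombination) even though r2 is only about 0.25. This is why the proportion of pairs with D' = 1 and the average r2 can give different pictures of LD in the same population.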
The data presented here demonstrate that founder effect advantages for AJ as applied to LD mapping will be regionally variable. A recent analysis of LD structure utilizing 2,486 SNPs on chromosome 22 [36] revealed generally greater average r² in AJ individuals, leading to the conclusion that association analyses in groups like AJ would require at least 30% fewer markers than studies in outbred populations [36]. The analysis reported herein, based on more than twice the SNP density, revealed a pattern of LD on chromosome 22 that is virtually identical to that observed by Service et al. [36]. However, analysis of other loci (e.g., chromosome 1) as well as a global genome analysis revealed significant variability in local LD structure. Thus, in undertaking LD mapping for gene discovery in AJ, a founder effect advantage can be expected in only some regions of the genome.
To explore the basis of the differences in LD structure noted here, it is important to consider possible sources of ascertainment bias resulting from the selection of AJ subjects or those in the comparison CEU group. In this study we utilized an American AJ cohort of women without a history of breast cancer, while a prior study utilized an Israeli AJ cohort [37]. Shifman et al. [37] found very high similarity of allele frequencies (r² = 0.96) comparing AJ and Caucasian individuals. In contrast, an r² of 0.83 was observed in this study. This latter value did not change using major allele frequencies, or if SNPs were filtered based on HWE violations by Fisher's exact test or Spearman's Rho test. These findings suggest possible biases due to population stratification, or alternatively, that the genomic measures employed more accurately reflect true allele frequency differences than in the prior study. Notably, the American AJ samples used here and the Israeli AJ samples in the ascertainment of Shifman et al. [37] appear to have similar proportions of SNP pairs showing no evidence of recombination (D' = 1) for pairs less than 5 kb apart (81% in our U.S. samples versus 76% in the prior series). This suggests a general comparability in the AJ sample sets with regard to LD structure.
However, for the comparison group of "European" samples, there were striking differences; the proportion of SNP pairs showing no evidence of recombination (D' = 1.0) for pairs less than 5 kb apart was 86% in our series versus 63% in the prior series. All or part of the comparison ascertainment in the prior series' [36] samples was from the NIGMS Human Genetic Cell Repository at Coriell, whereas we utilized the CEPH reference families derived from Utah residents of European ancestry. It is therefore possible that differences between our findings and those of prior studies result from these differing ascertainments of individuals of European ancestry. Because of polygyny and founder effects associated with the Utah Mormon genealogies [52], this population may not serve as a representative European comparison group. However, gene frequency data, including red cell antigen and HLA loci, were similar between CEU and northern European cohorts in an early study [52]. Similarly, a more recent study of LD structure showed nearly identical patterns, although only regions comprising 14.3 Mb of the genome were compared [54
The data presented here are consistent with a hypothesis that the high level of similarity of patterns of LD between AJ and CEU results from the same historical events that shaped the extended LD in these two populations. The finding of regions of slower LD decay (greater LD) in AJ for distant SNPs is not readily explained, and suggests the possibility of ancestral admixture. It is clear that regions of LD around pathogenic mutations in AJ can be quite large, extending up to 10 Mb, consistent with their more recent origin. While the historical record is subject to interpretation, demographic considerations and analysis of coalescence times of founder mutations suggest at least three periods of founding and expansion of AJ: one greater than 100 generations (20 centuries) ago, marking the founding and expansion of the Jewish population in the Middle East; one approximately 20 generations (five centuries) ago [8], marking a constriction resulting from persecution and the Plague, and subsequent expansion of AJ in central Europe; and, finally, a more recent event approximately 12 generations (three centuries) ago, marking the constriction of AJ in Europe as a result of renewed persecution, and subsequent re-expansion.
Not all founder mutations in AJ resulted from the most recent bottlenecks; the I1307K allele of APC, for example, seen in both Sephardic and Ashkenazi Jews, dates to the initial bottleneck from 100 generations ago [9]. Given these demographic and historical observations, it is perhaps not surprising that the LD map of the Ashkenazim reveals features both of its ancient origins (a greater number of smaller haplotype blocks) and of greater endogamy (increased size of homozygous regions identical by descent) compared to Europeans. It is also likely that the local genomic differences observed between AJ and CEU (regional differences in FST and local differences in LD decay) reflect the impact of both selection and genetic drift. Analysis of specific regions of local difference by tests of evolutionary neutrality will be needed to explore selective effects, which have recently been documented in European, Chinese, and African populations [56]. The predominant impact of founder effects in AJ, however, is evidenced by the documentation of pathogenic mutations for more than 20 heritable diseases, including heterozygous syndromes (e.g., hereditary breast and ovarian cancer), where there is less precedent for selective advantage than for carriers of recessive traits [8
Whether the genetic characteristics of AJ, revealed here to be complex and showing local differences in LD structure, will prove to be helpful in genomic association studies remains to be determined. Based on computer modeling, it has been demonstrated that if haplotypes were introduced by a small number of founders, LD will be greater in isolated compared to outbred populations [57]. In the case of extreme genetic isolates, extended LD was clearly evident around even common alleles when compared with neighboring populations [58]. While clearly advantageous for mapping rare alleles with population frequencies less than the reciprocal of the effective number of founding chromosomes, initial experience indicates that gene mapping advantages may be limited in discovering alleles associated with complex disease [4]. In that setting, rare mutations may be present on an extended haplotype, as a result of one or several original founding chromosomes carrying the particular mutation. More common alleles may also enter small founder populations multiple times, resulting in lengths of shared haplotypes around these alleles that are indistinguishable from the larger ancestral population [57
Despite these potential limitations of LD mapping in AJ, SNP-based LD mapping successfully "rediscovered" BLM in proof-of-principle exercises using AJ cohorts [11]. While genomic association studies in large outbred populations are seeking to map loci for common cancer susceptibility genes, it remains to be seen if this same approach using AJ will benefit from local increases in LD around candidate loci. Based on this preliminary LD map of AJ, the advantages of genome-wide association studies in AJ compared to CEU are likely to be modest and highly dependent on regional LD structure.