The genetics of Jewish populations, particularly that of Ashkenazi Jews, has been studied extensively to answer questions of human evolutionary, historical, and medical significance [1]. Human evolutionary or anthropological studies have typically focused on mitochondrial DNA (mtDNA) or Y-chromosomal data, because the absence of recombination in these regions of the genome allows researchers to infer past human behaviors and evolutionary events such as migrations, founder events, population bottlenecks or expansions, relative male and female contributions to an admixed population, marriage practices, and mode of transmission of languages [12]. However, medical research necessitates the use of autosomal data. The depth of data collection and the necessary characterization of subpopulations to control for population stratification during case-control association studies provide a unique resource to augment mtDNA and Y-chromosomal studies and to facilitate the investigation of selection events. For population groups in which group identification is based on cultural practices rather than geographic origin (such as religion for the Jews or Spanish language for Hispanics), the hazard in neglecting such structure may be particularly great in medical genetics studies [16].
Y-chromosomal and mtDNA studies of Jewish populations and their local host populations have, at times, provided conflicting results, but they can be summarized as supporting the following: (1) almost all Jewish populations are derived from Middle Eastern ancestral populations [3]; (2) bottleneck events have had an effect on the gene pools of Jewish populations [2, 4-6, 21]; (3) local female contribution was significant in the establishment of Yemenite, Ethiopian, and Indian Jewish populations [6]; and (4) local male contribution has been less significant for the establishment of most Jewish populations [23], but may have contributed more to Ashkenazi than to non-Ashkenazi populations [3].
Several large-scale studies using autosomal markers demonstrated substructure among European populations, specifically non-Jewish Northern European, non-Jewish Southern European, and Ashkenazi Jews [24]. Additionally, based on haplotype analysis, recent mtDNA surveys of Ashkenazi and non-Ashkenazi Jewish populations and non-Jewish host populations demonstrated substructure among Jewish populations [6]. Although Jewish populations other than Yemenite, Ethiopian, and Indian have not been entirely endogamous, local admixture from host populations, the amount of which varies among populations, has generally occurred at low levels. These historical events may contribute to population structure and stratification that should be taken into consideration in the analysis of data from association studies.
Using thousands of SNPs and principal components analysis (PCA), Seldin et al. [25], Price et al. [24], and Tian et al. [26] found "Northern" and "Southern" components in non-Jewish European populations, which followed a gradient from Northwest Europe to Southeast Europe or North to South, depending on the SNPs used. However, they also reported that both Ashkenazi and Sephardic Jewish samples showed, on average, more than 85% ancestry from the "Southern" component, regardless of grandparental country of birth. They concluded that this reflects a Middle Eastern origin of both Southeast Europeans and Ashkenazi Jews, both of which subsequently admixed to varying extents with populations already occupying Europe. A recent study that applied PCA to a large set of autosomal SNPs [10] demonstrated not only that Ashkenazi Jews can be clustered separately from non-Jewish Europeans but also that the number of Ashkenazi Jewish grandparents determined where a sample fell on the PCA plot relative to non-Jewish Europeans. Recently, using a large number of STRs and several clustering methods, Kopelman et al. [28] showed that four Jewish populations (Tunisian, Moroccan, Turkish, and Ashkenazi) clustered together and were intermediate between other European and Middle Eastern populations. In all cases, the authors attributed these clustering patterns to the partial and shared Middle Eastern ancestry of Jews.
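As a rough illustration of how PCA separates samples by ancestry, the following minimal sketch applies PCA to simulated genotypes. It is not the pipeline used in the studies cited above: the two populations, their allele frequencies, the sample sizes, and all variable names are hypothetical, and the SNPs are simply centered rather than normalized as dedicated population-genetics software might do.

```python
import numpy as np

rng = np.random.default_rng(0)

n_per_pop = 50   # individuals per simulated population (assumption)
n_snps = 500     # unlinked biallelic SNPs (assumption)

# Hypothetical allele frequencies: population B differs from A by 0.25 at each SNP.
freq_a = rng.uniform(0.3, 0.7, size=n_snps)
freq_b = freq_a + rng.choice([-0.25, 0.25], size=n_snps)

# Genotypes coded as 0, 1, or 2 copies of one allele per SNP.
geno_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
geno_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))
genotypes = np.vstack([geno_a, geno_b]).astype(float)

# Center each SNP, then obtain the principal components by SVD.
centered = genotypes - genotypes.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

pc1 = u[:, 0] * s[0]                   # coordinate of each individual on PC1
var_explained = s**2 / np.sum(s**2)    # fraction of variance per component

# Individuals from the two populations fall on opposite sides of zero on PC1.
side_a = np.sign(pc1[:n_per_pop])
side_b = np.sign(pc1[n_per_pop:])
```

In this simulation the allele frequency differences are large, so the first component alone separates the two groups. With real data, differences between closely related populations are much smaller, and more markers or additional components are typically needed.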
Middle Eastern ancestry may be a common factor among Jewish populations; however, the majority of Jewish populations have been located outside of the Middle East for up to 2000 years. As is the case with other highly mobile human populations, there has been historically documented gene flow between Jewish populations and local host populations. In addition, because these are populations defined, in part, by religion, gene flow into Jewish populations is a product of conversion as well as marriage. Thus, there should be genetic admixture in Jewish subpopulations that reflects, in part, their migratory histories and may contribute to current genetic differences among Jewish populations. Detecting and quantifying recent admixture depends on the time since divergence of the putative parental populations as well as on the number and information content of the markers. Because clustering algorithms also depend on the relative differences between populations, the context of a sample in a given analysis (i.e., the extent of its difference from samples of other populations included in the analysis) can affect clustering patterns. This aspect of population substructure detection may be overlooked in case-control association studies and may affect results if not taken into consideration. Based on this, we hypothesized that the presence or absence of putative parental populations in a STRUCTURE analysis would affect the ability to detect substructure in Jewish populations and differences between Jewish populations.
To address this question thoroughly prior to conducting association studies of health behaviors among Israeli Jews, we examined population structure in Jewish populations of European, African, Middle Eastern, Central and South Asian origin. We genotyped 526 subjects, recruited in Israel, with 32 genome-wide unlinked microsatellite markers (STRs). To identify potential population structure in the Jewish population being studied, we also genotyped 254 individuals from self-identified Chinese, Thai, Ethiopian Jewish, African American, and European American samples using the same markers. The Jewish populations sampled here are not composed of various percentages of discrete ancestral populations. Our premise is that Jewish populations originated in the Middle East but, subsequent to and in the course of long-range migrations, accumulated input from local host populations, each with its own migratory history. We include in our analysis genotypic data from present-day populations whose ancestry serves as a proxy for those populations that might have contributed once Jewish populations migrated out of the Middle East. Our results are of interest both for inferring unknown aspects of Jewish history and correlating with known ones, and for their theoretical implications for detecting substructure in seemingly homogeneous populations. They are also of important applied interest for studying health-related phenotypes in our sample of Israeli Jews. To our knowledge, this is the first study to incorporate proxy parental groups into analysis of structure of a Jewish sample, as well as the first to investigate variation among and ancestry of world-wide Jewish populations with autosomal markers.
Each of the Israeli subjects provided self-reported country of birth, country of birth of parents and grandparents, world region of family origin (not necessarily the same as country of birth of grandparents), whether they considered themselves to be Ashkenazi (as defined by respondents), Sephardic (similarly self-defined), mixed, other or none, and whether they, their parents, and grandparents had been born Jewish (also self-defined). A common practice in the medical and non-medical literatures is to subsume Jews of Spanish, Balkan, Middle Eastern, African, and Asian descent under the term "Sephardic", but since this term implies Spanish origin, it is imprecise and unclear. Further, due to continuous changes in the acceptability and applicability of the term "Sephardic" among Israelis [29], medical and genetic studies involving Israeli participants increasingly refer to subjects as either "Ashkenazi" (AJ) or "non-Ashkenazi" (NAJ) [31]. Below, we also follow that nomenclature. This expands on work we first presented in 2008 [35].