Genotyping with binary markers yields a total of 37 haplogroups in our Jewish database (Fig. ). Interestingly, men self-reporting as Cohanim carry Y chromosomes that belong to 21 different haplogroups. However, most of these haplogroups are extremely rare, and a single lineage within the J1 sub-clade of haplogroup J (J-P58*) predominates in both Ashkenazi (51.6%) and non-Ashkenazi (38.7%) Cohanim in the current sample set (Fig. 2b). Only four of the remaining haplogroups are found at frequencies greater than 5% in all Cohanim sampled: three within the J2 sub-clade of haplogroup J (J-M410*, 14.4%; J-M12, 7.4%; and J-M318, 6.1%), and one within haplogroup R (R-M269, 5.6%). In contrast, the distribution of haplogroups within Israelites is more uniform, with no single haplogroup reaching a frequency greater than 14% in our Israelite sample (Fig. 2a).
Fig. 2 The distribution of haplogroup frequencies for all haplogroups present in Ashkenazi and non-Ashkenazi Israelites (top) and Cohanim (bottom) at a frequency >5%. The following haplogroups are not shown: C-M216, E-M96, E-P2, E-M81, F-P14, I-M170, ...
When we genotype the 6 Y-STRs that defined the original CMH (DYS19, DYS388, DYS390, DYS391, DYS392, DYS393) (Thomas et al. 1998) in our sample of 99 Cohanim with J-P58* chromosomes, we find that 87 carry a haplotype that is identical to the original modal haplotype and 10 carry haplotypes that are one step removed from the original CMH (i.e., only 2 individuals were 2 or more steps removed). A total of 43 of the 99 chromosomes still match completely when we increase the number of Y-STRs to 12 (DYS19, DYS385a, DYS385b, DYS388, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS426, and DYS439) (Table S4). We call this 12-locus modal haplotype the extended CMH. Figure 3 shows a median-joining network of the 29 12-locus STR haplotypes associated with Ashkenazi and non-Ashkenazi Cohanim J-P58* chromosomes. One-step mutations at two hypermutable Y-STRs (DYS385 and DYS439) result in two relatively frequent haplotypes (i.e., present at frequencies of 10.1 and 11.1%) that are closely related to the extended CMH (Table S4). The extended CMH and the two closely related haplotypes, which are shared between Ashkenazi and non-Ashkenazi Jews, account for a total of 64.6% of the chromosomes within the Cohanim J-P58* lineage (and 29.8% of Cohanim variation).
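The haplotype matching and the frequency arithmetic in the preceding paragraph can be sketched in a few lines of Python. This is an illustration only, not the procedure used in the study. The six-locus repeat counts assigned to `CMH` are the widely reported values and are an assumption here, since the text does not list them; the toy `sample`, the function names, and the counts of 10 and 11 chromosomes (inferred from the reported 10.1% and 11.1% of 99) are likewise assumptions.

```python
from collections import Counter

# Loci of the original six-locus CMH, in the order given in the text.
LOCI = ("DYS19", "DYS388", "DYS390", "DYS391", "DYS392", "DYS393")
# Assumption: widely reported repeat counts for the original CMH (not stated in the text).
CMH = (14, 16, 23, 10, 11, 12)


def step_distance(hap, ref=CMH):
    """Total number of single-repeat steps separating two STR haplotypes."""
    if len(hap) != len(ref):
        raise ValueError("haplotypes must be typed at the same loci")
    return sum(abs(a - b) for a, b in zip(hap, ref))


def classify(haplotypes):
    """Count exact matches, one-step neighbours, and more distant haplotypes."""
    counts = Counter()
    for hap in haplotypes:
        d = step_distance(hap)
        label = "match" if d == 0 else "one_step" if d == 1 else "two_plus"
        counts[label] += 1
    return counts


# Hypothetical toy sample: two exact matches, one one-step neighbour, one two-step haplotype.
sample = [CMH, CMH, (14, 16, 23, 11, 11, 12), (15, 16, 24, 10, 11, 12)]
print(classify(sample))

# Frequency arithmetic from the text: extended CMH (43) plus its two close relatives
# (10 and 11 chromosomes, inferred from 10.1% and 11.1% of 99).
n_jp58, n_cohanim = 99, 215
extended_cluster = 43 + 10 + 11
share_of_jp58 = extended_cluster / n_jp58        # about 64.6%
share_of_cohanim = extended_cluster / n_cohanim  # about 29.8%
print(f"{share_of_jp58:.1%} of J-P58*, {share_of_cohanim:.1%} of all Cohanim")
```

Summing absolute repeat differences treats every mutation as a single-repeat step, which is the simplest reading of "one step removed"; other distance definitions would classify multi-step changes at a single locus differently.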
Fig. 3 Network of J-P58* haplotypes observed within Ashkenazi (black) and non-Ashkenazi Cohanim (white). The following STRs comprise the network: DYS19, DYS385a, DYS385b, DYS388, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS426, and DYS439. Circle areas ...
The availability of a greater number of binary markers enables examination of the distribution of the original and extended CMH across the branches of a highly resolved Y chromosome haplogroup tree. In our dataset, the original CMH is observed in a total of 215 chromosomes, all of which belong to haplogroup J. Notably, most of these chromosomes are partitioned between the J1 and J2 subclades of haplogroup J, specifically on the J-P58* and J-M67 lineages. A small number of original CMH chromosomes (n = 9) are found within other subclades of haplogroup J. In contrast, the extended CMH and its two closely related haplotypes shown in Fig. 3 are almost entirely limited to P58 chromosomes within the J1 clade: the extended CMH appears outside of J-P58* only once (i.e., within J-M319), while one of its two closely related haplotypes appears within haplogroup J-M67.
A survey of our database confirms that chromosomes carrying the original CMH are not specific to either Cohanim or Jewish populations. The original CMH is present at moderate frequencies (5–8%) in the other Jewish castes (i.e., Levites and Israelites), among non-Jewish Yemenites (13%) and Jordanians (~7%), and as singletons in a number of other non-Jewish populations (Druze, Egyptians, Palestinians, Syrians, Turks, Iranians, Italians, Romanians, and Uzbeks). In contrast, the extended CMH and its two related haplotypes are observed only among Cohanim (29.8%) and Israelites (1.5%) (i.e., they are completely absent from the Levites and non-Jews surveyed here). We also performed a search of the current literature (Arredi et al. 2004; Cadenas et al. 2008; Cinnioglu et al. 2004; Robino et al. 2008; Zalloua et al. 2008) and found a similar pattern: the original CMH is present in several Near Eastern populations, while the extended CMH is extremely rare outside of Jewish populations.
To better estimate the age of the Cohanim J-P58* lineage, we genotyped an additional 10 Y-STRs (DYS437, DYS438, DYS447, DYS448, DYS449, DYS454, DYS455, DYS458, DYS459a and DYS459b) (i.e., a total of 22) in our sample of 215 Cohanim. Interestingly, 4 of the additional 10 Y-STRs (DYS437, DYS438, DYS459a and DYS455) do not further subdivide the group of 43 samples comprising the modal haplotype in Fig. 3. Using the method of Zhivotovsky et al. (2004) (and excluding the duplicated DYS385ab and DYS459ab loci, as well as DYS449, which contains a complex repeat structure) we estimate the age of Y-STR diversity associated with Cohanim J-P58* chromosomes as 3,190 ± 1,090 years (Table 1). To control for differences in mutation rate among loci, we also calculate divergence times using the nine loci that Zhivotovsky et al. (2004) employed to estimate the effective mutation rate of Y-STRs. Similar age estimates are returned for our set of 17 Y-STRs and Zhivotovsky et al.’s (2004) set of 9 Y-STRs in our sample of all 99 Cohanim, as well as in our sample of 63 Ashkenazi Cohanim (Table 1). We note that estimates of the age of the J-P58* lineage are lower when using the five Y-STRs that were employed in the original CMH study of Thomas et al. (1998) (Table 1). This effect is exaggerated when assuming the pedigree mutation rate of 0.0021/generation, which was implemented in the Thomas et al. (1998) calculation (i.e., we obtain a J-P58* lineage divergence time estimate of 0.4 ± 0.2 kyears). A Bayesian-based coalescence analysis using BATWING (Wilson et al. 2003) on a constructed Ashkenazi population composed of 95% Israelites and 5% Cohanim yields an average median TMRCA for the Cohanim J-P58* lineage of 4,415 years (95% CI 1,130–21,530 years) (Table S5).
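To make the dating logic concrete, the sketch below computes an age from the average squared difference (ASD) in repeat number between sampled haplotypes and an assumed founder haplotype, averaged over loci and divided by a per-locus mutation rate. This is our reading of the general approach of Zhivotovsky et al. (2004), not a reproduction of the analysis reported here. The effective rate of 0.00069 per locus per 25-year generation is the commonly cited value and is an assumption, as are the 25-year generation time applied to the pedigree rate, the toy data, and all function and variable names.

```python
from math import sqrt
from statistics import stdev

# Assumption: commonly cited effective Y-STR rate, per locus per 25-year generation.
EVOLUTIONARY_RATE = 0.00069
# Pedigree rate quoted in the text, per locus per generation.
PEDIGREE_RATE = 0.0021
# Assumption: generation time in years used to convert generations to years.
YEARS_PER_GENERATION = 25


def asd_age(haplotypes, founder, rate=EVOLUTIONARY_RATE, gen_years=YEARS_PER_GENERATION):
    """Return (age, standard error) in years from ASD to the founder haplotype."""
    n_loci = len(founder)
    # Mean squared repeat difference from the founder, computed locus by locus.
    per_locus = [
        sum((h[i] - founder[i]) ** 2 for h in haplotypes) / len(haplotypes)
        for i in range(n_loci)
    ]
    asd = sum(per_locus) / n_loci
    # Standard error taken across loci.
    se = stdev(per_locus) / sqrt(n_loci)
    years_per_unit = gen_years / rate
    return asd * years_per_unit, se * years_per_unit


# Hypothetical toy data: 10 chromosomes typed at 4 loci, 6 identical to the founder.
founder = (14, 16, 23, 10)
toy = [founder] * 6 + [(15, 16, 23, 10), (14, 16, 24, 10), (14, 16, 22, 10), (14, 17, 23, 10)]

age_evo, se_evo = asd_age(toy, founder)
age_ped, se_ped = asd_age(toy, founder, rate=PEDIGREE_RATE)
print(f"evolutionary rate: {age_evo:.0f} ± {se_evo:.0f} years")
print(f"pedigree rate:     {age_ped:.0f} ± {se_ped:.0f} years")
```

Because the age scales inversely with the assumed rate, switching from the effective rate to the roughly threefold higher pedigree rate shrinks the estimate by the same factor, which is the direction of the effect described above.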
Table 1 Divergence time (based on method of Zhivotovsky et al. (2004)) (ky) (mean ± SE) of Cohanim lineages based on Y-STR loci
In sum, the high frequency of a closely related set of J-P58* chromosomes among Ashkenazi and non-Ashkenazi Cohanim that share a common modal haplotype, and that are estimated to have diverged from a common ancestor >2,000 years ago, is consistent with the hypothesis that the J-P58* lineage traces the Cohanim dynasty to a time before the Jewish diaspora. While the frequency of the J-P58* lineage is higher among Ashkenazi Jews (Fig. 2a), Y-STR variation associated with this haplogroup is older in the non-Ashkenazi community (e.g., we obtained divergence time estimates of 4.6 ± 1.8 and 3.5 ± 2.1 kyears for the 17- and 9-locus datasets, respectively). In this regard, it is also worth noting that the J-P58* network topology suggests population expansion, especially within the Ashkenazim. This may be attributable to the strong founder effect previously suggested for the Ashkenazi population (Behar et al. 2004).
Our results also document a set of non-J-P58* lineages that are carried collectively by more than 50% of Cohanim. Some of these lineages are members of the J2 subclade, while others are more distantly related (i.e., within haplogroups R and E) (Figs. , ). To further explore the origin of these Cohanim lineages, we estimate the ages of Y-STR diversity associated with the J-M410*, J-M318, J-M12, and R-M269 lineages (Table 1), all of which are found at frequencies >5% in the Cohanim sample groups examined here (Fig. 2b). J-M410*, which is carried by both Ashkenazi (18.9%) and non-Ashkenazi (8.6%) Cohanim, is also found in ~5% of non-Cohanim Jews, as well as in many non-Jewish populations from the Near East (data not shown). Moreover, there is a modal Cohanim haplotype that is shared between Ashkenazi and non-Ashkenazi (North African and Sephardi) communities (Figure S1), and absent from non-Jewish populations. Divergence time estimates of Cohanim J-M410* chromosomes based on 17 and 9 Y-STRs are 5.9 ± 2.0 and 4.9 ± 1.9 kyears, respectively. However, median-joining networks constructed from our Cohanim and non-Jewish data indicate that two Cohanim individuals carry divergent haplotypes that do not appear to descend from a common (modal) cluster of Cohanim J-M410* chromosomes (Figure S2). When we exclude these two divergent haplotypes from the analysis, we obtain divergence time estimates of 4.2 ± 1.3 and 3.8 ± 1.4 kyears for 17 and 9 Y-STRs, respectively. Our BATWING analysis returns a coalescence time estimate of 3.2 kyears (95% CI, 0.7–16.7 kyears) (Table S5), similar to the divergence time estimates for Ashkenazi Cohanim in Table 1. These results support the hypothesis that J-M410* represents a second major founding lineage of the Cohanim, coalescing to a point within the early history of the ancient Hebrews of the Near East.
Similar results are obtained for the less frequent J-M12 lineage, which is carried by 16 Cohanim in our survey (14 of whom are of Ashkenazi descent). As in the case of J-M410*, a median-joining network suggests that 2 of 16 individuals (1 Ashkenazi and 1 non-Ashkenazi Cohen) carry divergent haplotypes that may have entered the Cohanim population recently (Figure S3). Divergence time estimates made after removing these individuals are comparable to those for the Cohanim J-P58* and J-M410* lineages (3.4 ± 1.2 and 4.0 ± 1.8 kyears for the 17- and 9-locus datasets, respectively) (Table 1). In contrast, our network and divergence time analyses suggest that R-M269 chromosomes entered the Cohanim population via several “migration” events, and do not represent a single Cohanim founding lineage. For example, divergence time estimates are much older for Cohanim R-M269 chromosomes (>10 kyears) than for the three Cohanim lineages in haplogroup J discussed above, and median-joining networks of Cohanim R-M269 chromosomes lack a modal haplotype and show many unrelated singleton haplotypes that are interspersed among Cohanim and non-Jewish samples (Figure S4).
We note that divergence times for the J-P58*, J-M410*, and J-M12 lineages are not statistically significantly different from one another as a result of the large standard deviations in Table 1. Moreover, we cautiously interpret dating of lineages that are not defined by the derived state at a terminal SNP (i.e., are internal nodes on the Y chromosome tree) and those for which we have not typed all known downstream SNPs (J-M410*/J-P58* and J-M12, respectively) because subsets of chromosomes within these lineages may be marked by undiscovered SNPs. Our estimate of the age of J-M318 may be more reliable because this SNP represents a terminal mutation within the M410 sub-clade of the J2 branch (Fig. ). The derived allele at M318, originally discovered in a single Libyan Jew (Shen et al. 2004), is present in 16 individuals in this survey, 13 of whom are Cohanim from Tunisia/Libya or the island of Jerba (the remaining 3 samples come from Tunisian or Libyan Jews for whom information on Cohen, Levite, or Israelite status was not available). The much younger estimated divergence time for the J-M318 haplogroup (1.3 ± 0.5 and 1.9 ± 0.8 kyears for the 17- and 9-locus datasets, respectively) (Table 1) suggests that the M318 mutation either (a) arose within the Cohanim population of North Africa, (b) expanded within this community following migration of a founding J-M318 Cohen from another geographic location, or (c) became incorporated into the Cohanim patriline via conversion, adoption or non-paternity. The first of these possibilities (a) is supported by the fact that the M318 mutation occurred on the M410 background (Fig. ), and median-joining network analysis links the cluster of Cohanim J-M318 chromosomes to that of the Cohanim J-M410* chromosomes (i.e., rather than to other J-M410* chromosomes from North Africa and the Near East) (Figure S5). The high frequency (~60%) of the otherwise rare J-M318 haplogroup in our sample from the island of Jerba may be the result of an ancient founder effect in this Jewish isolate, which is thought to be descended from one of the earliest Diaspora communities that left the Middle East before the destruction of the Second Temple in 70 A.D. (Tessler and Hawkins 1980).
Here, we discuss alternative explanations for the presence of several founding lineages within the Cohanim. One possibility is that multiple males were designated as Cohanim early in the establishment of the priesthood. We performed exploratory simulations to assess the likelihood of survival of multiple paternal lineages in the history of the Ashkenazi Cohanim. The probability of survival of more than a single haplogroup depends mainly on the population size and, to a lesser extent, on the number of haplogroups that are assumed to have founded the initial Cohanim group. For example, if we begin with an initial population of 50 Cohanim carrying a total of 10 haplogroups, we find that there is a very low probability of survival of more than a single haplogroup after 120 generations. We obtain a similar result if we begin the simulation with 50 males each carrying a unique haplogroup (Figure S6). However, if we begin with 100 males carrying 10 haplogroups, the mean number of haplogroups surviving for 120 generations is >1 (Figure S7). Thus, there would be a reasonably high probability that more than a single Cohen haplogroup could have survived in the Ashkenazi population since the initial founding of the priesthood ~3,000 years ago (Thomas et al. 1998) if we were willing to accept an initial founding population size of >50 priests. However, our simulation results also suggest that it is highly unlikely that as many haplogroups as we actually observe (e.g. Fig. ) would persist under this simple model. Another model that deserves consideration is a metapopulation (Wakeley 2004) in which semi-isolated communities maintain multiple Cohen lineages, each with a certain probability of extinction and replacement. In this model, multiple Cohanim lineages would then persist in the entire population, and new lineages would be expected to accrue among Cohanim over time. The presence of several founding lineages among the Cohanim of this survey, some shared between and others specific to the Ashkenazi and non-Ashkenazi communities, together with highly variable frequencies of these lineages among sub-populations within Ashkenazi and non-Ashkenazi communities (data not shown), may lend support to a metapopulation model. Mutation alone does not provide an explanation for the multiplicity of Cohanim haplogroups, because the ages of most of these haplogroups predate the foundation of the Jewish people (Cruciani et al. 2006; Karafet et al. 2008; Semino et al. 2004). Indeed, our divergence time estimates for the J-P58*, J-M410*, and J-M12 lineages based on variation at the set of 9 STRs in our Israelite population sample are 19.0 ± 5.6, 22.6 ± 2.9, and 15.1 ± 3.1 kyears, respectively (data not shown).
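The kind of lineage-survival simulation described above can be illustrated with a minimal neutral drift model. The sketch assumes a constant-size male population with non-overlapping generations, founders split evenly among haplogroups, and no admixture or replacement. These assumptions are ours; the settings of the exploratory simulations reported here are not specified in this section, so the numbers it produces are not expected to match Figures S6 and S7. Function names, the replicate count, and the seed are hypothetical.

```python
import random


def surviving_haplogroups(n_males, n_haplogroups, generations, rng):
    """Simulate neutral drift and return the number of haplogroups still present."""
    # Assign founders to haplogroups as evenly as possible.
    population = [i % n_haplogroups for i in range(n_males)]
    for _ in range(generations):
        # Each son inherits the haplogroup of a father drawn uniformly at random.
        population = rng.choices(population, k=n_males)
    return len(set(population))


def mean_survivors(n_males, n_haplogroups, generations=120, replicates=300, seed=1):
    """Average number of surviving haplogroups over independent replicates."""
    rng = random.Random(seed)
    runs = [
        surviving_haplogroups(n_males, n_haplogroups, generations, rng)
        for _ in range(replicates)
    ]
    return sum(runs) / replicates


if __name__ == "__main__":
    for n_males in (50, 100):
        avg = mean_survivors(n_males, n_haplogroups=10)
        print(f"{n_males} founders, 10 haplogroups: mean survivors after 120 generations = {avg:.2f}")
```

Under this simple model, doubling the number of males slows the loss of lineages, in line with the qualitative dependence on population size noted above. A metapopulation variant would run several such populations in parallel with occasional extinction and recolonization.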
In conclusion, we demonstrate that 46.1% (95% CI = 39–53%) of Cohanim carry Y chromosomes belonging to a single paternal lineage (J-P58*) that likely originated in the Near East well before the dispersal of Jewish groups in the Diaspora. Support for a Near Eastern origin of this lineage comes from its high frequency in our sample of Bedouins, Yemenis (67%), and Jordanians (55%) and its precipitous drop in frequency as one moves away from Saudi Arabia and the Near East (Fig. ). Moreover, there is a striking contrast between the relatively high frequency of J-P58* in Jewish populations (~20%) and Cohanim (~46%) and its vanishingly low frequency in our sample of non-Jewish populations that hosted Jewish diaspora communities outside of the Near East. An extended Cohen Modal Haplotype accounts for 64.6% of chromosomes with the J-P58* background, and 29.8% (95% CI = 23–36%) of Cohanim Y chromosomes surveyed here. These results also confirm that lineages characterized by the 6 Y-STRs used to define the original CMH are associated with two divergent sub-clades within haplogroup J and, thus, cannot be assumed to represent a single recently expanding paternal lineage. By combining information from a sufficient number of SNPs and STRs in a large sample of Jewish and non-Jewish populations we are able to resolve the phylogenetic position of the CMH, and pinpoint its geographic distribution. Our estimates of the coalescence time also lend support to the hypothesis that the extended CMH represents a unique founding lineage of the ancient Hebrews that has been paternally inherited along with the Jewish priesthood.
However, the sharing of several less frequent haplogroups (and modal haplotypes within these haplogroups) between Ashkenazi and non-Ashkenazi communities, as well as evidence for the persistence of population-specific Cohanim haplogroups, supports the formulation that males from other remote lineages also contributed to the Jewish priesthood, both before and after the separation of Jewish populations in the Diaspora. Genotyping a larger sample of Cohanim Y chromosomes from other divergent haplogroups may further elucidate the complex paternal history of Jewish priests, and aid in the identification of lost tribes claiming ancient Hebrew ancestry.
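For readers who wish to check the confidence intervals quoted in the conclusion, a normal-approximation (Wald) interval for a binomial proportion is sketched below. The method used to obtain the reported intervals is not stated in this section, so the choice of a Wald interval is an assumption, as are the function name and the use of 99 J-P58* carriers among 215 Cohanim as the underlying counts.

```python
from math import sqrt


def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a binomial proportion."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    # Clamp to the valid range for a proportion.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Assumed counts: 99 J-P58* chromosomes among 215 Cohanim.
lo, hi = wald_ci(99, 215)
print(f"J-P58* among Cohanim: {99 / 215:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

With these counts the interval rounds to 39–53%, matching the reported range; applying the same function to the extended CMH cluster (64 of 215) gives an interval within about a percentage point of the one reported, so a slightly different interval method may have been used there.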
Geographic distribution of J-P58* chromosomes for all populations listed in Tables S1 and S2. The frequency of J-P58* chromosomes for each population is indicated in black