|Home | About | Journals | Submit | Contact Us | Français|
Identification of global livestock diversity hotspots and their importance in diversity maintenance is essential for making global conservation efforts. We screened 52 sheep breeds from the Eurasian subcontinent with 20 microsatellite markers. By estimating and weighting differently within- and between-breed genetic variation our aims were to identify genetic diversity hotspots and prioritize the importance of each breed for conservation, respectively. In addition we estimated how important within-species diversity hotspots are in livestock conservation.
Bayesian clustering analysis revealed three genetic clusters, termed Nordic, Composite and Fat-tailed. Southern breeds from close to the region of sheep domestication were more variable, but less genetically differentiated compared with more northern populations. Decreasing weight for within-breed diversity component led to very high representation of genetic clusters or regions containing more diverged breeds, but did not increase phenotypic diversity among the high ranked breeds. Sampling populations throughout 14 regional groups was suggested for maximized total genetic diversity.
During initial steps of establishing a livestock conservation program populations from the diversity hot-spot area are the most important ones, but for the full design our results suggested that approximately equal population presentation across environments should be considered. Even in this case, higher per population emphasis in areas of high diversity is appropriate. The analysis was based on neutral data, but we have no reason to think the general trend is limited to this type of data. However, a comprehensive valuation of populations should balance production systems, phenotypic traits and available genetic information, and include consideration of probability of success.
The domestic sheep (Ovis aries) has been an economically and culturally important farm animal species since its domestication in the Near East approximately 9,000 years B.P. . A northern Eurasian sheep stock formed some 6,000 years ago as sheep were brought to the British Isles, northern Europe and Russia after the expansion to the European continent via Danubian and Mediterranean routes , and a possible route through Russia . Sheep dispersed across Europe in temporally separate migratory episodes: the most original and a more primitive type of domestic sheep was later replaced by a more developed wool type of sheep. Ancestry from the first immigrant wave seems to have survived only in north-western and northern peripheries of Europe .
A similar replacement process is occurring in modern days. Global standardization of production environments and breed competition have led to the disappearance of many native breeds. Food and Agriculture Organization of the United Nations (FAO) has estimated that 36% of the sheep breeds of known census size are either extinct or endangered . Furthermore, the use of a few high-quality males for intense mating has resulted in the reduction of effective population size (Ne) over time and reduced genetic diversity within breeds . These processes will lead to the decrease of effective population size of the entire species. This could restrict breeding options and genetic gain of breeding programs to the extent that unpredictable future requirements might not be met [6-8]. Breed conservation aims to maintain these options, but limited resources, e.g. financial limitation, might not allow conservation of all the breeds.
One can argue that the breeds originating from or close to the domestication centers, such as the Near Eastern region, should be particularly prioritized in conservation programs. Microsatellite studies in cattle (Bos taurus) [9-11], goat (Capra hircus)  and sheep (Ovis aries)  suggested that the breeds located close to the putative domestication centers are the most variable. These breeds might possess allelic variations retained from the wild ancestors that never reached areas further from the center of origin. Although one cannot easily differentiate these primary diversity hotspots from the secondary hotspots created by a more recent crossbreeding, continent-wide mapping of the regions of exceptional livestock diversity (genetic diversity hotspots) has been suggested as a means of targeting conservation efforts for livestock species [10,14]. DNA marker data can be used to calculate molecular coancestries within and between breeds and determine contributions of each breed to a pool of animals that would maximize genetic diversity of the pool, i.e. minimize average molecular coancestry . These calculations can provide critical information when the prioritization of breeds needs to be done for conservation of diversity of domestic animal species. Using this conservation approach, it would be possible to maximize Ne of the subdivided species and thus minimize the depleting effect of genetic drift on genetic variation.
There have been a few quite comprehensive gene diversity studies in sheep [13,16-18]. However, none of these focused on breed prioritization to describe general trends in the conservation of genetic diversity in sheep. Though genome-wide Single Nucleotide Polymorphisms (SNP) data are becoming the standard for livestock genetics , they can have a problem of ascertainment bias originating from SNP discovery protocols . Though the problem can be alleviated through using haplotypic measures  or through bias corrections , the established baseline trend using low bias markers such as microsatellites remains an important benchmark. We used a representative set of sheep types across the Northern Eurasia to explore the diversity patterns and inferred conservation priorities based on microsatellites. For breed ranking we applied the method based on the minimization of molecular coancestry in a subdivided population by Caballero and Toro . We tested the effect by weighting differently the two components of maximum genetic diversity, within-breed and between-breed variation, when doing priority settings of breeds. Based on the common statement that populations from diversity hotspot regions are more important [9,14,10], we expected large number of breeds from a hotspot region to be highly prioritized.
In total, 342 alleles were detected at the 20 microsatellite loci analyzed (Additional file 1: Table S1). A summary of the genetic diversity parameters computed for 16 regional groups is presented in Table Table11 and the breed-wise values based, on an average, on 32 sheep per breed are given in Additional file 2: Table S2. The total genetic diversity (HT ) varied from 0.651 to 0.807 in the Danish and the Ukrainian regional groups, respectively. The area having regional groups with HT values above 0.8 (Ukraine, southeast Europe, Kazakhstan and east of the Caspian Sea, Buryatia and the southern Caucasus), was termed a diversity hotspot. Among breeds, the unbiased expected heterozygosity (HS) ranged from 0.613 (the Norwegian Cheviot) to 0.806 (the Russian Karakul), with an average value of 0.759. Allelic richness varied in the similar pattern as other within-population diversity measures (e.g. HT and HS ) across the breeds (Table (Table1).1). The overall estimate of f  was 0.011. The breed-wise f estimates were significantly (P < 0.05) greater than zero only for the Norwegian Rygja Sheep and the Swedish Rya Sheep suggesting that most breeds are quite uniform (Additional file 2: Table S2).
A model-based clustering was applied to resolve the population genetic structure. At K = 3, one cluster was constituted by the breeds descending mainly from the northernmost edge of the studied distribution (termed the 'Nordic cluster'), while the fat-tailed breeds, originating mainly from the Caucasus and Caspian basin areas, geographically close to the Near Eastern domestication center, formed the second cluster (termed the 'Fat-tailed cluster'). A third cluster mainly contained the composite sheep breeds from central Eurasia (termed the 'Composite cluster') (Figure (Figure1).1). The mean similarity coefficient (SC) across 10 runs was 0.984 at K = 3. At K = 4, a split within the Nordic cluster was observed, but the drop of SC to 0.534 indicated variable assignments for breeds across runs and lack of additional strong high-level substructure among the populations. Therefore, separating the entire dataset into three clusters was chosen as the final global configuration.
To dissect the genetic structure within the three clusters, STRUCTURE analysis was further applied to each of them separately. In the Nordic cluster, the most consistent grouping of 11 north European sheep breeds was achieved at K = 7 (SC = 0.641), with the mean SC ranging from 0.250 to 0.314 at K other than seven. Breeds originating from the same country (e.g. Finnsheep and Finnish Grey Landrace) or from the neighboring regions (e.g. the Icelandic Sheep and the Faeroe Island Sheep) tended to cluster together (Figure (Figure11).
The Fat-tailed sheep cluster was composed mainly of the coarse-wool native breeds from the Caucasus and steppes of the Caspian basin and Kazakhstan. Surprisingly, the northern short-tailed coarse-wool Romanov Sheep was also assigned into this cluster. The breed's estimated fraction of the Fat-tailed cluster was 0.59. However, the most consistent subclustering of the Fat-tailed cluster was obtained at K = 3 (SC = 0.865), with the Romanov sheep forming a distinct subcluster (Figure (Figure1).1). The Andi and the Karakul type sheep breeds anchored the remaining two subclusters. Eight out of 14 breeds showed partial and varying memberships of the two subclusters, indicating their admixed origin (Figure (Figure11).
The Composite cluster hosted the remaining 26 synthetic semi- and fine-wool sheep breeds that were split into three genetic subclusters (SC = 0.932). The three subclusters identified followed a pattern of geographical separation: long-wool Marsh and Texel type breeds from the north grouped into subcluster I (light blue); fine-wool breeds from the Caucasus, Kazakhstan and Buryatia formed subcluster II (light green) and the southern European Zackel type breeds grouped into subcluster III (pink, Figure Figure1).1). Other nine European, Caucasian and Asian sheep breeds had partial membership of multiple clusters, which represents more diverse ancestries in the process of breed development (Figure (Figure11).
The PCoA results were quite in accordance with the STRUCTURE results. The breeds from the above mentioned Nordic cluster were separated from the other breeds on Axis I, which explained 48% of the distance matrix (Figure (Figure2).2). On Axis II, breeds from the Fat-tailed cluster were separated from the Composite cluster breeds, which explained an additional 7%. A notable exception was the Romanov Sheep whose yellow circle in Figure Figure22 (at -0.085,-0.004) suggested the breed's clustering with the Nordic rather than the Fat-tailed breeds (Figure (Figure2).2). This matched our prior expectations on the basis of phenotypic characters better than the STUCTURE result.
The proportions of Nordic, Fat-tailed and Composite genetic ancestries within each of the regional groups studied are presented in Figure Figure3.3. The highest proportion of Fat-tailed ancestry was recorded at the southern periphery of the studied distribution, which gradually decreased northwards and was the smallest in the northern regional groups. The proportion of Nordic type ancestry mirrored this pattern and was the largest in the northern regional groups and decreased southwards. The 16 regional groups had similar proportions of Composite ancestry, with the exception of Stavropol and Caspian depression regional groups, where the proportion of Composite ancestry was highest, and the northernmost and southernmost regional groups, where the Composite ancestry proportion was least.
To reduce the effect of possible recent breed-specific factors on the overall geographical distribution of genetic diversity, a synthetic map of genetic diversity was based on the total gene diversity (HT) for triplets of neighboring breeds (Figure (Figure4).4). The highest diversity was found in the southern region of the studied area: Buryatia (south Siberia), Caspian Sea and Black Sea basins. It decreased gradually in Central and northern Europe and the lowest HT values were recorded for southern Scandinavia (Figure (Figure4).4). The trend can be observed also based on within breed estimates (Additional file 3: Figure S1). A significant but weak positive correlation (r = 0.382, P < 0.05) was calculated between the expected heterozygosity and the level of admixture based on global STRUCTURE results (K = 3) for the 52 sheep breeds studied, suggesting that admixture does not explain the presence of diversity hotspots, though it can contribute to it in some areas.
We tested the extent of population differentiation using AMOVA in the whole dataset, as well as grouping breeds according to geographical regions (the Caucasus, Asia, or the eastern fringe of Europe), and according to the 15 regional groups (excluding the Danish group represented by a single breed) (Table (Table2).2). As expected, most genetic variation (> 86%) was retained within the breeds, whereas only 0.41% to 0.95% (P < 0.001) of the variation could be explained by geographical partitioning (Table (Table2).2). The between-breed variation within each of the three genetic subclusters was significant (P < 0.001), ranging from 2.48% (Fat-tailed subcluster) to 13.71% (Nordic subcluster) (Table (Table2).2). In our data, using genetic clustering in AMOVA gives higher between groups variance than using geographical categorizations.
Of the 52 sheep breeds 24 had contributions to the core-set when the 4 weightings (λ = 0, 0.2, 0.5, 1) of within-breed diversity were considered. The 24 breeds represented all the 16 regional groups, except those of the Altai and Buryatia regions. The distribution of breeds was relatively even, with 1 to 2 sheep breeds per region, the exception being the Scandinavian regional group, which contributed 5 sheep breeds to this accumulated core-set (Additional file 4: Table S3). Looking at the four core-sets separately, the number of contributing breeds increased from 8 to 17 when weight of within-breed variability increased from 0 to 1 (Table (Table3).3). Results of analysis based on genetic clustering are comparable to those based on geographic regions (Additional file 5: Table S4).
Every tested scenario with reduced weight for within-breed variation (λ < 1) gave a significantly higher number of breeds with non-zero contribution from the areas outside the hotspot regions (all the two-tailed P values < 0.02 using Fisher's exact test for independence; Table Table3,3, Additional file 4: Table S3). Looking at the contribution to the core-set, these non-hotspot region populations comprised of > 90% of the set. However, optimizing for global diversity (λ = 1), there is no significant difference (the two tailed P = 0.77) in the proportion of breeds included between hotspot and non-hotspot regions. Very distinctively, now the core-set consists of 65% of the breeds from the hotspot regions because the included hotspot region breeds make a significantly larger mean contribution (each ~11%) than the included breeds from the non-hotspot regions (each ~3%) (Welch two sample t-test P = 0.009). Thus the diversity hotspot areas were important for conserving total genetic diversity in terms of the effort per conserved population rather than proportion or number of breeds to be conserved.
Conservation programs might be initiated with limited information. In cases where resources allow keeping only a small number of breeds and when there is no aim to differentiate between their contributions to the core-set (assuming equal contributions), the maximum amount of genetic diversity would be maintained by giving priority to breeds from the diversity hotspot regions. In the scenario of 5 breeds, four of them are from the hotspot areas (Additional file 6: Table S5). The proportion of hotspot breeds was reduced from 80% to 45% when assuming resources to keep 20 breeds. This latter set is similar to 17 breeds identified as contributors to the core-set when λ = 1 (Additional file 4: Table S3), but includes also three fat-tailed populations from the Caucasus (Bozakh, Tushin and Lezian). These results agree with the idea of having the initial conservation focus on hotspot regions.
We present here a comprehensive genetic analysis of sheep populations originating from a broad geographical area of the Eurasian subcontinent. Our results detected the presence of a sheep genetic diversity hotspot located close to the Near East, the assumed sheep domestication center, and highlight the importance of such an area in conservation planning. The results correspond well with the geographical pattern of genetic diversity distribution reported for cattle (B. taurus)  and goat (Capra hircus)  as well as a previous study of European sheep  which focused on more southern breeds. The congruence across studies suggests the pattern to be genuine, though larger number of markers could be desirable. Based on observed allele number, we can expect the reliability to be approximately similar as in a study of 300-400 unbiased bi-allelic SNPs . However, since studies of humans do not suggest great discrepancies across nuclear marker types as long as ascertainment bias can be avoided [19,20,25], we expect the presented general diversity patterns to be robust. Since in our analysis conservation optimization was based on the same data used to define the diversity hotspot, our general recommendation for considering hotspot regions ought to be sound.
Livestock genetic diversity hotspots have been suggested to be very important for conservation because the domestic animal stocks associated with them might possess allelic variation from wild ancestors, which, due to a sequence of founding events, was lost during the dispersion of animals towards the northern parts of the continent . However, to the best of our knowledge this question has not been directly addressed previously. Our results provide additional evidence for the importance of these regions, while indicating an important refinement for the conservation goal. Our results do not suggest that a larger proportion of populations from these areas needs to be conserved, but rather suggest more emphasis be placed on each conserved diversity hotspot population. This distinction, however, is highly relevant for domestic species, where management units are in most cases clearly definable as breeds. Further the results support for directing the first conservation resources to work on hotspot regions.
Of the three identified Northern Eurasian genetic clusters, the Nordic cluster was represented by native and old commercial sheep breeds adapted to live under cold and wet northern European climatic conditions. This group includes breeds such as Gute, Icelandic Sheep and Finnsheep, which descended from the sheep stock in the first dispersion event to Europe . Strict breed boundaries over a long period and geographical isolation, particularly for insular breeds (the Icelandic Sheep and the Faeroe Island Sheep), are characteristic of the group and have resulted in a unique and genetically highly heterogeneous pool of Nordic sheep populations (Table (Table22).
The large Composite cluster with partial ancestry from improved western breeds contains genetically variable fine- and semi-fine-wool sheep breeds of admixed origin with moderate differentiation between the breeds. The presence of substructure within the cluster reflects the differences in the breeding trends within the former Soviet Union that took place in the middle of the last century. The sheep in the western part of Russia and Volga regions have Marsh-Texel type composite ancestry resulting from crossing local populations with British type long-wool sheep (Figure (Figure1).1). The second subcluster within the Composite group includes the breeds prevalent in the Caucasus, the Stavropol region and the Caspian basin, another geographical center of purposeful crossbreeding, with a significant genetic component of the Merino type sheep. The third subcluster within the Composite cluster is anchored by two Zackel type mountain sheep populations, Pramenka and Kuchugur, and reflects a common ancestry for the majority of breeds within the subcluster. The grouping of Tsigai in the same subcluster confirms the assumption that this breed was strongly influenced by Zackel (e.g. see ). Most of the populations of the Composite cluster also represent genetic diversity of local origins as the upgrading was performed on the basis of local sheep populations, mixing them with a number of improved breeds of foreign ancestry to combine desired production and robustness characteristics.
The Fat-tailed cluster hosted very variable native coarse-wool populations, living under a variety of climatic conditions, ranging from semi-desert and steppe regions around the Caspian Sea and Central Asia to Caucasian mountain terrains. The differentiation of fat-tailed sheep from the others indicates restricted gene flow between steppe or mountain environments in central Eurasia and cooler and moister northern areas of the continent. The gene pool of the fat-tailed sheep divided into the mountain type sheep (e.g. Andi and Lezgian) and steppe-desert types (e.g. Gala and Karakul). However, the majority of fat-tailed breeds have their ancestries in both of these subclusters (Figure (Figure1),1), which together with low differentiation estimates indicates substantial gene flow between them. This agrees with the traditional sheep breeding practices in the Caucasus, which promote gene flow through the long-distance nomadic pasturing of animals. Grouping of Romanov sheep within the Fat-tailed cluster (Figure (Figure1)1) should be regarded cautiously.
Decisions on adaptation conservation should largely be based on reliable phenotypic evaluations. In humans, genetic and phenotypic diversity agree , but selection might affect phenotypes reducing correlation between phenotypic divergence and general genomic relatedness [27,28]. This is particularly true for livestock which would imply need for testing (ecological) exchangeability (as in ). Unfortunately this is very difficult. A large proportion of the necessary phenotypic information exists only as informal knowledge of local breeders. Even the more rigidly collected data is rarely comparable between environments.
Molecular data can have a role in pointing out potential conservation gaps when phenotypic knowledge is limited. The usability of approaches based on molecular marker data in setting conservation priorities can be greatly improved by genome-wide surveys of molecular variation . For example, scanning tens of thousands of SNP markers has the potential to identify selected loci  and allow comparison of the conservation values of several populations, both in the neutral and non-neutral context . However, even with full genome sequences, valuation of populations can prove to be difficult due to incomplete understanding of the biology of the organisms and poorly definable conservation goals.
We used neutral molecular data for a specific set of populations and applied the method of Caballero and Toro  to calculate optimal contributions of Eurasian sheep breeds to the core set, which would minimize the mean kinship in the set and maximize Ne and genetic diversity of the species. While giving more emphasis to divergence has theoretical appeal, it did not increase ecological or phenotypic heterogeneity in the preferred set of breeds compared with the maximization of global diversity (and Ne). Maximization of global diversity prioritized a more diverse set of breeds originating from a range of biogeographic environments and having different genetic histories. Though the set looks reasonable, we acknowledge that it is based on incomplete data and we are hesitant to conclude that this particular design is optimal.
Neutral variation suggested a general rule of thumb to favour breeds from the diversity hotspot regions in the first phase of in situ and ex situ conservation actions. In the final design, however, approximately equal population presentation across environments is recommended, but still higher per population emphasis in areas of high diversity is suggested. A comprehensive valuation of breeds, particularly within each physical environment, should consider production systems, important biological characteristics and available genetic information, as well as consideration of the probability of success and the extinction risk of breeds.
In total, 1675 animals representing 52 sheep breeds were studied (Additional file 2: Table S2). Sheep were sampled from three geographical regions: The Caucasus, Asia, and the eastern fringe of Europe, including central and western Russia. Each geographical region was further subdivided into regional groups. The Caucasian area was composed of the southern Caucasus (the following breeds were sampled: Azerbaijan Mountain Merino, Bozakh, Gala, Karabakh, Mazekh, Tushin), northern Caucasus (Andi, Dagestan local, Dagestan Mountain Merino, Karachai, Lezgian), Stavropol (Caucasian, North Caucasian Mutton-Wool, Stavropol), and the Caspian depression (Aksaraisk type of Soviet Mutton-Wool, Grozny, Volgograd). The Asian area was subdivided into the Kazakhstan and east of the Caspian Sea group (Degeres Mutton-Wool, Kazakh Arkhar-Merino, Kazakh Edilbai, Kazakh Finewool, Russian Edilbai, Russian Karakul), Altay (Gorno-Altay local, Kulunda), and the Buryatia group (Baidarak, Transbaikal Finewool). The remaining nine groups covered the eastern fringe of Europe: the Volga region (Kuibyshev, Oparin), western Russia (Kuchugur, Romanov, Russian Romney Marsh), Ukraine (Carpathian Mountain, Sokolsk), southeast Europe (Moldavial Karakul, Moldavial Tsigai, Pramenka, Russian Tsigai), Poland (Olkuska, Swiniarka, Wrzosowka), Finland (Finnsheep, Finnish Grey Landrace), Scandinavia (Swedish Rya Sheep, Swedish Gottland Sheep, Swedish Gute Sheep, Norwegian Rygja Sheep, Norwegian Cheviot, Norwegian Feral Sheep), Denmark (Danish Texel), and Iceland and the Faeroe Islands (Icelandic Sheep, Faeroe Island Sheep). Unrelated animals were sampled based on pedigree records (two previous generations) or farmers' knowledge.
Genomic DNA was extracted from blood as described in , or from skin samples using DNeasy Tissue Kit (Qiagen, Crawley, West Sussex, UK). Prior to DNA extraction, skin samples stored in ethanol were washed twice with phosphate buffered saline to remove fixatives.
The polymerase chain reactions (PCR) for 20 microsatellites (Additional file 1: Table S1) were performed as described in  and genotyped using the MegaBACE™ 500 DNA Sequencer (Amersham Biosciences). Fragment sizing was performed using the MegaBaceTM Genetic Profiler 2.2 or Fragment Profiler 1.2 (Amersham Biosciences). Genotypes for 20 microsatellites were available in the earlier studies for the Romanov sheep  and for the 11 breeds from Finland, Scandinavia, Denmark, Iceland and the Faeroe Islands .
The microsatellite loci were characterized by the total number of alleles, expected heterozygosity or total gene diversity , sample-size-corrected allelic richness  corresponding here expected allele number in a sample of nine diploid individuals, and F-statistics using FSTAT v2.93 . F-statistics were estimated using Weir and Cockerham  method where f and θ correspond to Wright's coefficients FIS and FST, respectively. The genetic relationships among breeds were analyzed using principal coordinate analysis (PCoA) as implemented in PAST v1.73  using the Chord distance .
A model-based Bayesian clustering analysis was used to infer population structure and the level of admixture in the sheep breeds implemented in STRUCTURE v2.2 . The STRUCTURE algorithm assumes K populations, each of which is in Hardy-Weinberg and linkage equilibrium and characterized by a set of allele frequencies at each locus. Analysis was performed with a burn-in length of 20,000 followed by 100,000 Markov chain Monte Carlo iterations for each of K = 1 to 10, with ten replicate runs for each K using independent allele frequencies and an admixture model. Results across ten runs at each K were compared based on similarity coefficients (SC) as previously described in . The breeds were assigned to wide clusters based on major ancestry and submitted to a second round of STRUCTURE analysis performed within each wide cluster.
A linear regression analysis was performed to study the influence of breed ancestry diversity (admixture) on the level of genetic diversity. Ancestry diversity for each breed was calculated as 1-Σ(qk) 2, where qk is an average fraction of the breed's genetic ancestry from the k separate genetic clusters at the optimal K, identified in STRUCTURE analysis. To examine the significance of mixed ancestries as sources of within-breed diversity, the obtained ancestry diversity values were compared with the unbiased expected heterozygozity estimates.
For the geographical plotting of genetic diversity parameters, latitude and longitude values for each breed were obtained from the center of the sample distribution. The ArcView GIS v9.1 (Environmental Systems Research Institute, ESRI, Redlands, CA, USA) was used to map the allelic richness and expected heterozygosity for each breed and the surface was extrapolated to a full rectangle. This was based on the Inverse Distance Weighted interpolation method , which assumes each input point to have a local influence that diminishes with distance. A synthetic map for the distribution of local total gene diversity (HT) and θ calculated for the geographically neighboring triplets of populations was done similarly. Population triplets were formed using Delaunay triangulation method implemented in the program Triangle .
Components of within- and between-breed genetic diversity were calculated based on the molecular coancestry for populations following the method described by Caballero and Toro . The molecular coancestry between two individuals is the probability that two alleles at the locus taken at random from each individual are alike in state. In a structured population with n breeds the molecular coancestry between breeds i and j (fij) is the average across loci and across individuals. Defining the within-breed average coancestry as , the total population coancestry as , Nei's minimum distance as and the average Nei's minimum distance as , then the total gene diversity or expected heterozygosity is partitioned into components within breeds and another between breeds .
The importance of different breeds has been calculated based on the contribution of each breed to a pool of animals or a core set that would maximize its genetic diversity (e.g. [15,44]). In the present study, the core set refers to the smallest set of sheep breeds that still encompasses the neutral genetic diversity in the species using the co-ancestry measure detailed above. These optimal contributions can also be applied with a weighted (λ) combination of within- and between-breed components of gene diversity . Maximizing global diversity is achieved by giving equal weights to within- and between-breed diversity (λ = 1), while maximizing between-breed variation is achieved by ignoring within-breed diversity (λ = 0). Two intermediate λ values were recommended in earlier studies. Piyasatian and Kinghorn  suggested giving five times weight to the between breed variation as to the within-breed variation (λ = 0.2), reflecting the speed by which genetic change can be made across populations compared with selection within one large mixed population. Bennewitz and Meuwissen  proposed a weighting based on maximizing the total genetic variance of a hypothetical quantitative trait, which is equivalent by using a weighting factor of λ = 0.5. These four λ values were applied in estimating the optimal contributions using a simulated annealing algorithm .
MT supervised the molecular analysis, consistency of allele calling, coordinated or performed statistical analysis and wrote the final drafts of the paper. MO did the genotyping and most of the writing and statistical analyses for the first draft. IT had significant contribution both to statistical analyses and manuscript writing. MAT contributed to analysis design and molecular co-ancestry based analyses. NM, MC, GG, TK and MM have collaborated in study design, sampling and interpretation of the results. In addition, MC and TK did part of the molecular analyses. JK was in charge of the overall study including it's design, sample collection, statistical analysis, manuscript writing and coordinating the author contributions. All authors read and approved the final manuscript.
Table S1 - Marker diversity parameters. PDF file with list of microsatellites and their chromosomal location, total number of alleles, expected unbiased heterozygosity, and estimates of within-population (f) and among-population (θ) fixation indices.
Table S2 - Table of the name of sheep breeds, their origin, demographic status and diversity parameters. PDF file with data on per population sample size, expected heterozygosity, within-breed fixation index (f), allelic richness, and number of private alleles.
Figure S1 - Additional synthetic maps. PDF file synthetic maps for within-breed diversity and breed differentiation.
Table S3 - Breed-wise optimal contributions to a core-set for different weightings of the within-breed variation. PDF file with detailed data summarized in Table Table33.
Table S4 - Distribution of core-set contributions using genetic clustering. PDF file with table similar to Table Table3,3, but using genetic clusters instead of regional groups to categorize breeds.
Table S5 - Breeds, having equal contributions to the core set when the number of breeds conserved is fixed. PDF file with table of included breeds when the number of included breeds is fixed at 5, 10, 15 or 20.
This work was financially supported by the Academy of Finland and the Finnish Ministry of Agriculture and Forestry (the SUNARE and Russian in Flux programs). We thank A. Virta, M. Saura and J. Fernández for technical assistance. Our thanks are also to I.A. Kalashnikov, V. Togmitova, and N. Nikolaeva for their help in collecting Buryatian samples. Comments given by Dr Meng-Hua Li are acknowledged. The International Livestock Research Institute (ILRI) in Nairobi, Kenya is acknowledged for providing office space for IT when working on the project.