Before calculating FST, we used a PCA to determine how the animals should be allocated to groups and the degree of divergence between samples. Using the individual genotypes as the data in the PCA (Figure ), PC1 explained 6.1% and PC2 explained 3.7% of the variance. We found that PC1 separated the zebu Brahman breed from the taurine breeds. Animals with known recent part zebu ancestry, such as the Santa Gertrudis and the Australian Friesian Sahiwal, were also separated from the taurine breeds along the PC1 axis, although not to the same extent as the Brahman. The Belmont Red, which in principle does not have recent zebu ancestry but in practice may have a small percentage of Brahman ancestry, was also separated along PC1 to the same extent as the Santa Gertrudis. The taurine breeds were partly separated along PC2. In general the animals of one taurine breed clustered together, but the clusters partly overlapped those of other breeds located nearby. Animals of the Holstein dairy breed were located at one end and animals of the Angus beef breed were located at the other end of the spread of breeds along PC2. The Hereford breed occurred at the intersection of the axes of PC1 and PC2. The locations of the Angus, Hereford and Holstein are consistent with the process of SNP discovery, where most SNP in the 10 K SNP set were obtained by comparing the Holstein and Angus to the Hereford. The complete overlap of the Angus and Murray Grey was expected given the role of the Angus in the development of the Murray Grey. The slight separation between the Angus and Murray Grey breeds and the other taurine breeds may be due to the absence of a full range of breeds rather than any particular distinctness of the Angus and Murray Grey from the others. The breeds that were furthest apart on these two axes in this study are the Angus, Brahman and Holstein.
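As an illustration of how the proportion of variance explained by each component can be obtained from individual genotypes, the following is a minimal sketch. It assumes genotypes coded as 0, 1 or 2 copies of one allele with no missing values, and uses a singular value decomposition of the column-centred matrix. The function name, the coding and the toy data are hypothetical and are not taken from the analysis reported here.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of an (animals x SNP) matrix of 0/1/2 allele counts.

    Returns the PC scores for each animal and the fraction of the total
    variance explained by each retained component.
    """
    g = np.asarray(genotypes, dtype=float)
    centred = g - g.mean(axis=0)  # centre each SNP column
    # SVD of the centred matrix: u * s gives the PC scores
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]

# Toy data (hypothetical): two groups of 20 animals typed at 200 SNP
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=200)
freq_b = np.clip(freq_a + rng.normal(0, 0.3, size=200), 0.05, 0.95)
group_a = rng.binomial(2, freq_a, size=(20, 200))
group_b = rng.binomial(2, freq_b, size=(20, 200))
scores, explained = genotype_pca(np.vstack([group_a, group_b]))
```

Plotting the two columns of `scores` against each other gives the kind of PC1 by PC2 scatter described above, and `explained` holds the corresponding variance fractions.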
Based on the PCA, the dairy animals were lumped into four groups of reasonable size for the FST analysis, as indicated in Table . The Australian Friesian Sahiwal were excluded from the FST calculations because of their small sample size and lack of affinity to other breeds.
Animals clustered on the basis of principal components of genotypic variation. The crossbred dairy samples cluster mainly with the Brown Swiss and Channel Island breeds.
Comparing this plot of PC1 and PC2 to the plot of the first two components in the Bovine HapMap study, PC1 also separated the zebu from the taurine breeds in that study while PC2 partly separated the taurine breeds as a series of partly overlapping clusters. PC1 clearly separated the one sanga breed, the N'Dama, from the zebu breeds, to the same distance as that between the zebu and taurine breeds. PC2 clearly separated the taurine from sanga breeds. There were no purebred sanga animals among the Australian samples. The one major difference was the Hereford breed in the Bovine HapMap study, which was located well away from the other breeds as a loose, flat cluster. Of the other taurine breeds in the Bovine HapMap study, the Angus and Holstein were the furthest apart in the plot of PC1 and PC2.
To characterise the differences in per locus FST between this and the Bovine HapMap study, we compared several groupings of breeds and loci (Table ). When all populations were used, a per locus FST value was available for a minimum of 32224 SNP in the Bovine HapMap data and 8644 SNP in the Australian data. The Australian data showed lower mean per locus FST values than the Bovine HapMap data, with less dispersion around the mean. In both studies, the mean per locus FST was larger when only the three divergent breeds Angus, Brahman and Holstein were used. In the Bovine HapMap data this was so whether all loci or only the loci common to both studies (N = 7298) were used. The difference in mean per locus FST value between the Australian sample and the Bovine HapMap sample was smaller when only the three divergent breeds were compared. Although these mean differences are small, they are all significant because of the large samples of loci used. The dispersion of FST values was also greater in the three breed comparisons than when all breeds were used. To determine if removing some breeds would make a difference to the estimates, we removed the two African breeds, N'Dama and Sheko, and the two zebu breeds, Gir and Nelore, from the Bovine HapMap data. Removing the N'Dama and Sheko removed animals that were highly divergent on PC2 of the Bovine HapMap data, leaving only the partly overlapping European breeds. Removing the Gir and Nelore reduced the number of zebu breeds to one. This made the breed composition more similar in the two studies, while still including the Angus, Brahman and Holstein. For all loci in this reduced Bovine HapMap data set, the average FST = 0.126, S.D. = 0.0722 (N = 32470). This was a reduction in the mean per locus FST of nearly half the difference between the mean per locus FST values of the full Bovine HapMap and Australian samples.
This reduced set of breeds still showed the lower dispersion in per locus FST values found in the full sample of breeds compared to the three breed estimates.
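To make the per locus quantity concrete, the sketch below computes FST at a single locus as the variance in allele frequency among populations divided by p(1 − p), where p is the mean allele frequency. This uncorrected ratio is an assumption chosen for simplicity. It makes no adjustment for sample size, cannot return negative values, and may differ from the estimator used for the values reported here. The function name and the example frequencies are hypothetical.

```python
def per_locus_fst(freqs):
    """Uncorrected FST at one locus.

    freqs: allele frequencies of the same allele, one value per population.
    Returns None when the locus is fixed in every population.
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    denom = p_bar * (1.0 - p_bar)
    if denom <= 0.0:
        return None  # no variation anywhere: FST is undefined
    # variance of allele frequency among populations (divisor n)
    var_p = sum((p - p_bar) ** 2 for p in freqs) / n
    return var_p / denom

# Hypothetical frequencies in three populations
divergent = per_locus_fst([0.1, 0.5, 0.9])
uniform = per_locus_fst([0.5, 0.5, 0.5])
```

With these inputs the strongly differentiated locus gives a value near 0.43 and the locus with identical frequencies gives 0, which shows why restricting a comparison to a few divergent breeds tends to raise the mean.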
The per locus FST values in different data sets
To determine whether the global pattern of FST across the genome was the same in both this and the Bovine HapMap study, the windowed FST was plotted against location in the genome (Figure ). There were obvious differences in the locations of the major peaks. The difference in height along the ordinate was consistent with the average difference found for FST (cf. above). In particular, locations on chromosomes 7, 10, 12, 14 and the X were obviously different. The windowed FST values between the two data sets at each locus were correlated with r = 0.346, N = 7298, P = 0. This indicated that genome location explained 12.0% of the variance in that comparison.
Figure 2 Genome wide picture of positive selection. The distribution of FST for all breeds calculated in a sliding 8 SNP window along the chromosomes, with the Bovine HapMap values plotted on the same axes as the values calculated in the Australian cattle sample.
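A sliding window over 8 adjacent SNP can be sketched as follows. The sketch assumes that the windowed value is the arithmetic mean of the per locus values in the window and that the window advances one SNP at a time within a chromosome. Both are assumptions made for illustration, and the function name and input values are hypothetical.

```python
def windowed_fst(fst, window=8):
    """Mean FST over each run of `window` adjacent SNP on one chromosome.

    fst: per locus FST values in map order.
    Returns one value per window; an empty list if there are too few SNP.
    """
    return [
        sum(fst[i:i + window]) / window
        for i in range(len(fst) - window + 1)
    ]

# Ten hypothetical per locus values give three overlapping windows
values = [i / 10 for i in range(10)]
smoothed = windowed_fst(values)
```

Because each window is defined by a count of SNP rather than by physical distance, two data sets with different SNP densities will average over different stretches of the genome at the same reference point.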
The relationship between genome location and FST was explored in more detail by using subsets of breeds and subsets of loci to determine which had the more important influence on whether a signal appeared in a particular location. Using only the loci common to both data sets, with all breeds included, the un-windowed FST values showed a correlation of r = 0.615 (Figure ). Because of the large number of loci or windowed points in common (N = 7298), all of the correlations reported below are highly significant, with P << 0.0001. For the subset of three divergent breeds the un-windowed FST values showed a correlation of r = 0.787, or 63.5% of the variance in FST (Figure ). The three breed comparison had a broader range of FST values. Comparison of the three breed to the all breed correlation for un-windowed FST values showed that changing the breed composition in this experiment resulted in a 40.4% decrease in the amount of the variance in FST explained across the genome.
The FST values in the sample of Australian cattle plotted against those from the Bovine HapMap study. The values for all breeds are in black and the values for the Angus, Brahman and Holstein are in blue.
To determine the effect of the specific loci on the distribution of FST, we compared the windowed FST values in this study to the windowed FST values in the Bovine HapMap study. The windows are for 8 adjacent loci, so the composition and density of loci contributing to each reference point along the genome differed in the two data sets. For all loci the correlation was r = 0.346 for the all breed FST values and r = 0.391 for the three breed FST values. Comparing the correlation of three breed windowed FST values of the Bovine HapMap data and the Australian data to the correlation of the un-windowed values for those data shows a reduction in the variance explained of 76% (r = 0.391 vs r = 0.787). Comparing the all breed windowed to un-windowed FST values in the same way shows a reduction in the variance explained of 68% (r = 0.346 vs r = 0.615). This reduction was smaller than in the three breed comparison, but the full breed comparison includes not only differences in SNP between studies but also differences in the number and composition of breeds.
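The variance explained by a correlation is its square, and the reduction quoted for the all breed comparison is the relative drop in that squared value. The short sketch below works through the all breed figures given in the text; the function names are hypothetical.

```python
def variance_explained(r):
    """Proportion of variance explained by a correlation coefficient."""
    return r * r

def relative_reduction(r_windowed, r_unwindowed):
    """Relative drop in variance explained when moving from the
    un-windowed to the windowed comparison."""
    return 1.0 - variance_explained(r_windowed) / variance_explained(r_unwindowed)

# All breed comparison: windowed r = 0.346, un-windowed r = 0.615
windowed_share = variance_explained(0.346)        # about 0.120
all_breed_drop = relative_reduction(0.346, 0.615)  # about 0.68
```

Here 0.346 squared is about 0.120 and 0.615 squared is about 0.378, so the windowed comparison explains roughly a third of what the un-windowed comparison explains.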
To determine the importance of differences in SNP density between the two studies, windowed FST values for BTA 6, 14 and 25 were compared to the other chromosomes. These three chromosomes have 2–3 times higher density than the other chromosomes in the Bovine HapMap data, which in their turn have a 2–3 times higher density in the Bovine HapMap data than in the Australian data. For all breeds, comparing the Australian to the Bovine HapMap data, the correlations were r = 0.382 for comparisons of BTA6, 14 and 25 combined and r = 0.344 for the other chromosomes combined. If the loci that are windowed are only the common ones between the two studies, that is, no difference in density, then the correlation between windowed FST values is r = 0.640, essentially the same as the un-windowed values (r = 0.615).
We calculated the confidence interval for the per locus FST to be considered significant using bootstrap sampling. The 99.9% confidence interval for the per locus FST in our data was 0.094 ± 0.062 while the confidence interval in the Bovine HapMap data for the same loci was 0.141 ± 0.121. In the Australian sample, the top 2.5% corresponded to a threshold FST = 0.224 and the bottom 2.5% corresponded to a threshold FST = 0.015, both outside the confidence interval for that dataset. In the Bovine HapMap data, the top 2.5% corresponded to a threshold FST = 0.284, which is outside the 99.9% confidence interval and the bottom 2.5% corresponded to a threshold FST = 0.039, which is outside a 99% confidence interval for that dataset. There were 94 loci that had an FST above the upper thresholds in both data sets, or 1.28% of 7298 SNP. There were 35 loci that had an FST below the lower thresholds in both data sets. For the SNP above the threshold in both data sets, the 94 SNP were located in 71 genomic regions of 1 Mb containing one or more SNP with high FST (Additional File 2). For the SNP below the threshold in both data sets, almost none of those SNP were close to another SNP with low FST. Some of the low FST values are negative, which may occur when estimating an FST value near zero, but which may also occur in some cases where the expected variance calculated from the average allele frequency for the entire sample is sufficiently less than the sample variance in allele frequencies across populations, or which may occur in some cases where the heterozygote frequencies within populations are higher than expected given the allele frequencies.
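The step of finding loci that are extreme in both data sets can be sketched as below. The sketch takes the top and bottom 2.5% of each data set as thresholds and counts loci beyond the thresholds in both. The bootstrap confidence interval is not reproduced because its details are not given in the text. The quantile rule (linear interpolation between order statistics) and all names are assumptions made for illustration.

```python
def quantile(values, q):
    """Linearly interpolated quantile of a list, for 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

def shared_extremes(fst_a, fst_b, tail=0.025):
    """Count loci in the upper tail of both data sets and in the lower
    tail of both. fst_a and fst_b hold values for the same loci, in the
    same order."""
    lo_a, hi_a = quantile(fst_a, tail), quantile(fst_a, 1.0 - tail)
    lo_b, hi_b = quantile(fst_b, tail), quantile(fst_b, 1.0 - tail)
    high = sum(1 for a, b in zip(fst_a, fst_b) if a > hi_a and b > hi_b)
    low = sum(1 for a, b in zip(fst_a, fst_b) if a < lo_a and b < lo_b)
    return high, low

# Hypothetical data: 101 loci with evenly spaced values
same_order = [i / 100 for i in range(101)]
reversed_order = same_order[::-1]
agree = shared_extremes(same_order, same_order)
disagree = shared_extremes(same_order, reversed_order)
```

When the two data sets rank loci identically, every locus in a tail of one is in the same tail of the other; when the ranking is reversed, none are shared.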
The average δFST between the SNP pairs was strongly influenced by the distance between them. The δFST increased quickly with distance and reached a plateau after approximately 20 kb (Figure ). The average δFST associated with the bin sizes of <1 kb, 1–10 kb and 10–20 kb were all significantly different from each other, and each increase in average distance was accompanied by an increase in average δFST until the plateau was reached. The plot showed broader standard errors for comparisons in the 20–100 kb region, but this was due to smaller numbers of SNP pairs separated by those gaps. For example, there were 2061 SNP pairs separated by <1 kb in that plot, 402 pairs separated by 10–20 kb, and between 73 and 129 pairs in each of the 10 kb bins from 20–100 kb. The plot shows that the bins between 20–100 kb had similar means. For the combined bin 20–100 kb, corresponding to a mean separation of 56.7 kb between SNP pairs, the mean δFST = 0.051, SE = 0.0017, N = 734.
The increase in difference in FST between adjacent SNP as the distance between adjacent SNP increases in the Australian cattle sample. The mean value of each bin is plotted with its standard error.
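The binning of δFST by distance can be sketched as follows. The sketch assumes that δFST is the absolute difference in FST between adjacent SNP on the same chromosome and that each bin includes its lower edge and excludes its upper edge. These are assumptions, and the function name and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def delta_fst_by_distance(positions, fst, bin_edges):
    """Summarise |FST difference| between adjacent SNP by the gap between them.

    positions: base pair coordinates on one chromosome, in ascending order.
    fst: per locus FST values in the same order.
    bin_edges: ascending gap sizes in bp; bin i covers [edge i, edge i+1).
    Returns one (n_pairs, mean, standard_error) tuple per bin.
    """
    bins = [[] for _ in range(len(bin_edges) - 1)]
    for i in range(len(positions) - 1):
        gap = positions[i + 1] - positions[i]
        delta = abs(fst[i + 1] - fst[i])
        for b in range(len(bins)):
            if bin_edges[b] <= gap < bin_edges[b + 1]:
                bins[b].append(delta)
                break
    summary = []
    for values in bins:
        n = len(values)
        avg = mean(values) if n else None
        se = stdev(values) / sqrt(n) if n > 1 else None
        summary.append((n, avg, se))
    return summary

# Hypothetical SNP: gaps of 500, 5000, 15000 and 500 bp
positions = [0, 500, 5500, 20500, 21000]
fst = [0.10, 0.12, 0.20, 0.05, 0.06]
summary = delta_fst_by_distance(positions, fst, [0, 1000, 10000, 20000])
```

The standard error shrinks with the square root of the number of pairs in a bin, which is why bins holding fewer pairs show wider error bars.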
Fourteen of the 129 SNP with extreme FST in both data sets had an effect on one of the three traits (RFI, yield and IMF), six of which had an effect on more than one trait (Table ). The same homozygote in one of the six SNP increased both yield and IMF, and the same homozygotes in the remaining five SNP increased both IMF and RFI. The effects that were significant for more than one trait were all in the same breed type. Three of the six SNP were located in one small region of bovine chromosome 2. Counting these three as one independent locus, four (i.e. 6 − 2) of the 12 (i.e. 14 − 2) independent SNP had effects on more than one trait, or 33%. Only 13.9% of the SNP in the entire experiment that had an effect on RFI also had an effect on IMF and only 7.0% that had an effect on yield also had an effect on IMF. This difference in frequency, while large, was not significant (χ² = 3.78, 1 d.f., n.s.). All of these SNP had FST values greater than the upper threshold, representing divergence between the breeds. The observed frequency of 0 low FST versus 14 high FST, compared to the expected frequency of 35:94, was significant (goodness of fit χ² = 5.21, 1 d.f., P < 0.05). These loci all had much larger FST values in the three breed sub-sample than in the full breed sample.
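The goodness of fit statistic for the low versus high FST comparison can be worked through from the counts given in the text: 0 low and 14 high observed, against an expected ratio of 35:94. The sketch below scales the ratio to the observed total and sums the usual (observed − expected)² / expected terms; the function name is hypothetical.

```python
def goodness_of_fit_chi2(observed, expected_ratio):
    """Chi-square statistic for observed counts against an expected ratio."""
    total = sum(observed)
    ratio_total = sum(expected_ratio)
    # scale the expected ratio to the observed total
    expected = [total * r / ratio_total for r in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed: 0 low FST and 14 high FST; expected ratio 35 low : 94 high
chi2 = goodness_of_fit_chi2([0, 14], [35, 94])
```

The expected counts are about 3.80 low and 10.20 high, giving a statistic of about 5.21, which exceeds the 5% critical value of 3.841 for one degree of freedom.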
Trait associations and high FST values
The region on BTA2:64.7 Mb that showed effects on two traits was examined in greater detail. There are eight SNP in this region separated by a total of 83.3 kb, set within a larger region containing 31 SNP separated by a total of 9.86 Mb. Thirteen of the loci (Table ) showed a significant additive or allele substitution effect on RFI or IMF in the temperate, tropical or combined sample. Six of the 13 SNP showed an association but had an FST value within the confidence interval for the data set. Nine of the 13 SNP showed an association and had average allele frequencies of between 0.1 and 0.9 across all breeds. Only the three SNP with the highest FST values had effects on both traits.
Significant associations to traits across 10 Mb of BTA2