Identifying positive genomic selection in domestic animals is a major challenge in contemporary agricultural research. To date, only a small number of studies have successfully identified genomic regions subject to positive selection in domestic animals [1
]. Increasing the understanding of positive selection and how it shapes genetic variation in domestic animals has the potential to provide powerful insights into the mechanisms involved in evolution, help target loci for selection and possibly highlight the genetic basis of phenotypic diversity for complex traits [5
]. Domestic animals provide a unique opportunity to detect positive selection due to their extensive diversity amongst breeds, increasing availability of sequence data and large databases of polymorphisms that are accruing in domestic species like Bos taurus
Data on polymorphisms can provide evidence of selection if the patterns in the data are incompatible with a neutral model [12
]. For instance, the neutral model with constant effective population size predicts that most polymorphisms will have one common allele and one rare allele. More specifically, if p is the frequency of one of the two alleles chosen at random and f(p) is the distribution or spectrum of all polymorphisms where one allele has frequency p, then f(p) = k/(p(1-p)) where k is a constant. Tajima's D statistic [13
] measures the extent to which real data differ from this theoretical expectation. Tajima [13
] suggests that changes in the frequency spectrum of neutral polymorphic alleles can be used to detect a hitchhiking effect due to the spread of linked advantageous mutations. Under this interpretation, high values of D indicate that common polymorphisms are more frequent than expected under the neutral theory, which may be a result of genetic hitchhiking. However, polymorphisms are discovered by methods that tend to find common variants, and this ascertainment bias can also generate an excess of polymorphisms with intermediate allele frequency.
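As an illustration of the statistic discussed above, the following is a minimal sketch of Tajima's D using the standard published formula. The function name `tajimas_d` and the input format (one allele copy count per segregating site) are assumptions made for this example and are not taken from the analysis in this paper.

```python
from math import sqrt


def tajimas_d(site_counts, n):
    """Tajima's D from allele counts at segregating sites.

    site_counts: for each segregating site, the number of sampled
                 chromosomes carrying one of the two alleles (1..n-1).
    n:           number of sampled chromosomes (hypothetical input format).
    """
    s = len(site_counts)
    if s == 0 or n < 4:
        raise ValueError("need at least one segregating site and n >= 4")

    # Harmonic sums used by Watterson's estimator and the variance terms.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))

    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * c * (n - c) / (n * (n - 1)) for c in site_counts)

    # Constants from Tajima's variance approximation.
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Rare variants only: D is negative. Intermediate frequencies only: D is positive.
d_rare = tajimas_d([1] * 10, n=10)
d_common = tajimas_d([5] * 10, n=10)
```

In this toy input, a sample dominated by intermediate-frequency sites gives a positive D, which is the same direction of departure that ascertainment bias towards common variants would produce.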
The test for departure from expectation can be made more powerful if it is possible to distinguish the ancestral allele from the derived or mutant allele at each locus. If p is the frequency of the derived allele, then the distribution of all derived alleles is f(p) = k/p. Fay and Wu measure departure from this expectation with their H statistic [14
]. If derived alleles are found at high frequency more often than expected, then H will be positive. They suggest that selection causes a positive H statistic, because selection sometimes drives the derived allele to high frequency. This can occur if the polymorphisms observed are subject to selection themselves, but can also occur at neutral loci as a result of hitchhiking caused by selection acting on linked loci. This makes H a very useful test for selection because most polymorphisms are discovered randomly and few of them are likely to be directly subject to selection.
Unfortunately, both D and H can depart from expectation for reasons other than selection [13
]. The way in which polymorphisms are discovered usually means that low-frequency polymorphisms are less likely to be discovered than those with alleles at intermediate frequency. D and H are also affected by changing effective population size (Ne). If Ne declines, polymorphisms with one rare allele become less frequent and the frequency spectrum becomes flatter. In this way a decline in Ne (i.e. inbreeding) can mimic selection [16
]. Therefore, detecting unambiguous examples of positive selection has been difficult because many methods cannot differentiate between positive selection and demographic history. This is of particular concern in domestic species, where SNP discovery typically involves some ascertainment bias and where demographic fluctuations coupled with strong directional (artificial) selection have played important roles in the formation of domestic breeds [19
Ascertainment bias will result in an observed allele frequency spectrum that is flatter than that predicted by theory. However, it is possible to construct a test that is not affected by this ascertainment bias if derived and ancestral alleles can be distinguished. Since f(p) = k/p for derived alleles with frequency p [14
], and a locus whose ancestral allele has frequency p has its derived allele at frequency 1-p, the frequency spectrum for ancestral alleles with frequency p is f(1-p) = k/(1-p). The spectrum for all loci whose two alleles have frequencies p and 1-p is then f(p) + f(1-p), which is equal to k/(p(1-p)), the spectrum given above for an allele chosen at random. So neutrality predicts that the proportion of these loci where the ancestral allele has frequency p is f(1-p)/[f(p)+f(1-p)], which is equal to p. Assuming that the polymorphism discovery method cannot distinguish ancestral and derived alleles, loci with the same pair of allele frequencies are equally likely to be discovered whichever allele is ancestral, so this expectation for different p intervals is not affected by the ascertainment bias. It has only been tested for p from 0 to 0.5, since the value of f(1-p)/[f(p)+f(1-p)] at p is one minus its value at 1-p. Also, because selection does not typically affect all parts of the genome equally, whereas demographic phenomena do, the effects of selection and demography can be compared. For instance, a selected allele can drag closely linked derived alleles to high frequencies by hitchhiking. Therefore, selection should cause an autocorrelation of high-frequency derived alleles between one locus and the next on the chromosome. To test whether the observed autocorrelation could be due to inbreeding, we have used a simulation study to demonstrate the effect of inbreeding in the absence of selection and compared the results with those found in real data.
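The neutral expectation derived above can be checked numerically. The sketch below evaluates f(1-p)/[f(p)+f(1-p)] with f(p) = k/p, and computes the corresponding observed proportion within minor allele frequency bins. The function names, the input format (a list of derived allele frequencies) and the bin edges are assumptions made for this illustration, not the procedure used in this paper.

```python
def neutral_ancestral_fraction(p, k=1.0):
    """Expected fraction of loci with allele frequencies (p, 1-p) in which
    the ancestral allele is the one at frequency p, given f(p) = k/p."""
    derived_at_p = k / p              # f(p): derived allele at frequency p
    ancestral_at_p = k / (1.0 - p)    # f(1-p): derived at 1-p, ancestral at p
    return ancestral_at_p / (derived_at_p + ancestral_at_p)


def observed_ancestral_minor_fraction(derived_freqs, edges):
    """For each minor allele frequency bin [lo, hi), the fraction of loci
    whose ancestral allele is the minor allele (derived frequency > 0.5).

    derived_freqs: derived allele frequency per locus (hypothetical input).
    edges:         increasing bin edges between 0 and 0.5.
    Returns None for bins that contain no loci.
    """
    fractions = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [d for d in derived_freqs if lo <= min(d, 1.0 - d) < hi]
        if not in_bin:
            fractions.append(None)
            continue
        fractions.append(sum(d > 0.5 for d in in_bin) / len(in_bin))
    return fractions


# The constant k cancels, so the expectation is simply p.
expected = [neutral_ancestral_fraction(p) for p in (0.1, 0.2, 0.4)]

# Toy data: four loci in the lowest bin, one with the derived allele common.
observed = observed_ancestral_minor_fraction([0.1, 0.9, 0.15, 0.12],
                                             [0.0, 0.25, 0.5])
```

Comparing `observed` with the expectation at each bin's typical frequency gives the test described in the text: an excess of loci with a common derived allele appears as an observed fraction above p.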
Recently it has become possible to assay large numbers of polymorphisms in cattle, and this offers a new source of data with which to detect evidence of selection. In this paper we use data from two breeds of cattle (Angus and Holstein), each genotyped for over 7,500 SNPs using the Parallele/Affymetrix platform. By also genotyping these SNPs on three species related to Bos taurus (Bison, Yak and Banteng), we have been able to distinguish the derived and ancestral allele at each locus and use this information to test for deviations from neutrality.
The comparison between the allele frequencies in the Angus and Holstein breeds might also contain evidence of selection, since the two breeds have been selected for different traits. However, their allele frequencies also differ due to genetic drift caused by finite population size or inbreeding. The difference in allele frequencies can be quantified by the statistic Fst. Inbreeding should affect all loci equally, and genetic drift should affect loci randomly without showing any linkage disequilibrium between adjacent loci. In contrast, we hypothesise that selection will drive linked derived alleles to high frequency in one breed but not the other. Therefore, selection should cause higher values of Fst among loci where the derived allele is common than among loci where the ancestral allele is common. We examine how Fst between Angus and Holstein changes with allele frequency and compare the result to that obtained with the simulated data.
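To make the Fst comparison concrete, the following is a minimal sketch for a single biallelic locus in two equally weighted populations, using the basic heterozygosity form Fst = (HT - HS)/HT. The text does not state which Fst estimator was used, so this choice, the function names and the 0.5 threshold used to class a derived allele as common are all assumptions made for illustration.

```python
def fst_two_pops(p1, p2):
    """Fst at one biallelic locus from the derived allele frequencies in two
    populations, weighted equally. Returns None if the locus is monomorphic
    across both populations (HT = 0)."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)                          # HT
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # HS
    if h_total == 0.0:
        return None
    return (h_total - h_within) / h_total


def mean_fst_by_derived_class(loci):
    """Mean Fst for loci where the derived allele is common (mean frequency
    above 0.5) versus loci where the ancestral allele is common.

    loci: list of (p1, p2) derived allele frequencies (hypothetical input).
    """
    groups = {"derived_common": [], "ancestral_common": []}
    for p1, p2 in loci:
        value = fst_two_pops(p1, p2)
        if value is None:
            continue
        key = "derived_common" if (p1 + p2) / 2.0 > 0.5 else "ancestral_common"
        groups[key].append(value)
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}


# Toy frequencies only; these are not values from the Angus or Holstein data.
summary = mean_fst_by_derived_class([(0.9, 0.5), (0.2, 0.3), (1.0, 1.0)])
```

Under the hypothesis in the text, selection would show up as a larger mean in the `derived_common` group than in the `ancestral_common` group, beyond what the simulated data produce.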