In the era of comparative genomics, strong signals of conservation across multiple species serve as signposts that can indicate regions where evolutionary forces may be preserving functional elements that are subject to purifying selection (e.g. 
). By contrast, signals of positive selection pointing to adaptive changes in one lineage are harder to find, and detecting them often requires sets of polymorphic sequences from multiple individuals of the same species. We exploited two recently developed techniques, genomic enrichment and high-throughput sequencing, to characterize the polymorphism in a single human population across 40kb neighborhoods of the 49 HARs (harseq regions). We investigated the harseq regions because the HARs were defined based on a presumption that the human lineage-specific fixed differences therein might have arisen due to adaptive evolutionary forces. On the other hand, some have emphasized that the presumably evolutionarily neutral mechanism of BGC can influence the frequency spectra at polymorphic positions, or cause fixation of alleles in a way that partially mimics the action of adaptive evolution. Indeed, fixation bias was noted in connection with the limited set of human-specific alleles for some of the HARs when they were first described.
With the extensive novel polymorphism in our samples, we were able to carefully characterize fixation bias, both historical and ongoing, in the harseq regions and to conduct tests for recent selective sweeps across these regions. Our deep resequencing data are noteworthy because they eliminate issues of SNP ascertainment bias that could have skewed previous investigations of polymorphism near HARs. We applied several established population genetic tests, as well as an application of the MWU test, to identify differences in the fixation patterns of W2S and S2W mutations.
Consistent with published reports 
, we find evidence of historical W2S fixation bias in harseq regions. Using an MK test, we compared the proportion of W2S mutations among already fixed substitutions on the human or chimp lineage to that among the still segregating sites in our samples. We found that 11 of our 49 regions show statistically significant evidence of historical bias in allele fixation, with all but one favoring W2S fixation. These results strengthen and expand previous findings by identifying signals for W2S bias in much larger regions flanking the core HAR regions in an ascertainment-free population sample.
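The comparison described above amounts to a 2x2 contingency table: fixed versus segregating sites, each split into W2S and S2W. The following is a minimal sketch of such a test using a two-sided Fisher exact test written with the Python standard library. The function name and all counts are hypothetical illustrations, and the exact statistic used for the MK test in this study may differ.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Illustrative layout (an assumption, not taken from the study):
      row 1 = fixed substitutions  (a = W2S, b = S2W)
      row 2 = segregating sites    (c = W2S, d = S2W)
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Hypothetical counts for a single region.
fixed_w2s, fixed_s2w = 30, 10
seg_w2s, seg_s2w = 20, 40
p_value = fisher_exact_two_sided(fixed_w2s, fixed_s2w, seg_w2s, seg_s2w)
print(f"W2S fraction fixed: {fixed_w2s / (fixed_w2s + fixed_s2w):.2f}, "
      f"segregating: {seg_w2s / (seg_w2s + seg_s2w):.2f}, p = {p_value:.2e}")
```

An excess of W2S among fixed substitutions relative to segregating sites, as in these made-up counts, is the pattern that would indicate historical W2S fixation bias.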
This study goes beyond previous approaches by also looking at ongoing W2S fixation bias in the segregating site frequency spectrum. We performed an MWU test using only sites that are still segregating in the human population, separating out W2S from S2W mutations. This second test is designed to detect a bias that is currently driving W2S mutations to higher frequency in the population than S2W mutations. We found statistically significant evidence for this bias (and none in the opposite direction) in the regions flanking 11 of 49 HARs.
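As an illustration of the frequency-spectrum comparison, the sketch below computes a Mann-Whitney U statistic on derived allele frequencies of W2S versus S2W segregating sites, with a one-sided p-value from the normal approximation (tie-corrected, with continuity correction). The function name and the frequencies are hypothetical, and the one-sided normal approximation is an assumption rather than a description of the exact procedure used in this study.

```python
from math import erfc, sqrt

def mann_whitney_greater(x, y):
    """Return (U for x, one-sided p-value that x tends to exceed y)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0        # tied group occupies ranks i+1..j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                         # every value tied: no information
        return u, 1.0
    z = (u - n1 * n2 / 2.0 - 0.5) / sqrt(var_u)   # continuity correction
    return u, 0.5 * erfc(z / sqrt(2))

# Hypothetical derived allele frequencies in one region.
w2s_freqs = [0.6, 0.7, 0.8, 0.9, 0.95]
s2w_freqs = [0.05, 0.1, 0.2, 0.3, 0.4]
u_stat, p_value = mann_whitney_greater(w2s_freqs, s2w_freqs)
print(f"U = {u_stat}, one-sided p = {p_value:.4f}")
```

Swapping the two arguments tests for bias in the opposite direction, which corresponds to the check reported in parentheses above.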
For both of our tests, we showed that the core HAR element is generally not the main source of the signal that we detected, since the signal usually remains strong even when we mask out the central 1kb or even 5kb of the region. This is not consistent with BGC due to a recombination hot spot that has remained in the same location for millions of years, because the length scale of the effect of BGC is set by the length of the heteroduplex tract formed during recombination that needs to be repaired, which is thought to be
500bp (e.g. 
). However, it is consistent with a model in which the locations of recombination hotspots drift fairly rapidly over evolutionary time scales, but in which hotspots may be denser in some regions.
It is noteworthy that there was little overlap between the regions identified by these two tests, one for older W2S fixations and the other for present-day forces toward fixation, with a total of 20 regions identified by one or the other. This is consistent with the hypothesis that the regional focus of BGC, which may be recombination hotspots, drifts significantly on a time scale of many hundreds of thousands or millions of years. However, we also found from simulations of GC-biased evolution over these time scales that the relatively minimal overlap between the tests is not unexpected.
Another explanation for W2S fixation bias near HARs is selection for increased GC-content or for individual fitness-improving GC alleles. To investigate these hypotheses and to attempt to disentangle the possible roles of BGC and positive selection in shaping the HARs, we applied a powerful, recently developed method for detecting selective sweeps. Selection was previously investigated in much larger (
500kb) regions using more sparse polymorphic loci 
. That study found 101 regions with strong evidence for a selective sweep within 100kb of a known gene. Here, we found only 5 possible candidates for such sweeps among our 49 target regions (and none that were significant after correction for multiple hypothesis testing). Three of these candidates overlap regions with significant evidence of historical (2) or ongoing (1) W2S bias. As we are dealing with a lineage-specific evolutionary period of about 5 million years, and these tests can only see back a few hundred thousand years, it is quite possible that the original signal for selective sweeps in these regions has already decayed beyond our ability to recognize it in human population genetic data. That is, the lack of evidence for recent sweeps does not rule out the possibility that some of the excess substitutions in HARs were fixed by older selection. Similarly, the evidence for GC-biased evolution based on current population genetic data may not fully reflect patterns of polymorphism in the past.
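To make the effect of multiple-testing correction concrete, the sketch below applies a Bonferroni correction to per-region p-values. The text does not state which correction was used, so Bonferroni is an assumption chosen for simplicity, and the p-values, the function name, and the 0.05 level are all hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests that remain significant at family-wise level alpha."""
    threshold = alpha / len(p_values)   # Bonferroni-adjusted per-test cutoff
    return [p <= threshold for p in p_values]

# Hypothetical per-region sweep p-values: 5 nominal candidates among 49 regions.
p_values = [0.01, 0.02, 0.03, 0.04, 0.045] + [0.5] * 44
nominal = sum(p <= 0.05 for p in p_values)
corrected = sum(bonferroni_significant(p_values))
print(f"nominally significant: {nominal}, after correction: {corrected}")
```

With 49 tests the per-test cutoff drops to roughly 0.001, so candidates that are only nominally significant do not survive, which mirrors the situation described above.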
Consistent with the idea that HAR regions may have experienced positive selection too long ago to be detected with population genetic methods, very few positively selected regions in the human lineage have been identified to date, despite the existence of numerous public databases. Selective sweeps that have been identified have typically been the product of very recent events in human history, such as dairy farming affecting the lactase gene 
or climate differences influencing a salt sensitivity variant 
. Such environmental or cultural changes result in differences in the genetic makeup of disparate human populations, and such differences can be exploited to find evidence of recent, possibly still ongoing, selective sweeps.
An alternative hypothesis that deserves consideration is that HARs may have an unusually high level of recent substitution due to a recent relaxation in purifying selection along the human lineage (e.g. 
). Using previously described methods 
, we compared estimates of the rates of substitution in the 49 HAR elements to the neutral rate. We found that the human substitution rate exceeds the expected neutral rate in all 49 HARs, while this is true for the chimp substitution rate in only 10 HARs. Furthermore, in 33 HARs the human substitution rate significantly exceeds the neutral rate (Poisson p-value
) while none of the chimp substitution rates significantly exceed the neutral rate. This evidence argues against the hypothesis that these HAR elements are the product of relaxed selection.
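One simple way to ask whether an observed substitution count exceeds the neutral expectation is an upper-tail Poisson probability. The sketch below is an illustration only: the function name, the counts, and the expected value are hypothetical, and the previously described methods referred to above may compute the p-value differently.

```python
from math import exp, lgamma, log

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    # Sum the probability mass below k in log space, then take the complement.
    cdf_below = sum(exp(-lam + i * log(lam) - lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - cdf_below)

# Hypothetical element: 10 observed substitutions where 2.0 are expected
# under the neutral rate.
observed, expected_neutral = 10, 2.0
p_value = poisson_upper_tail(observed, expected_neutral)
print(f"P(X >= {observed} | lambda = {expected_neutral}) = {p_value:.2e}")
```

A small tail probability indicates more substitutions than the neutral rate would predict, whereas relaxed purifying selection alone would be expected to raise the rate only up to the neutral level.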
We have focused in our study on 40kb neighborhoods of 49 HAR elements (and 13 similar control regions) because of their intrinsic interest but also because the scope of our study was appropriate to the state of the art of recently emerged enrichment and sequencing technologies. As larger data sets become available we will be able to apply our analysis on a genome-wide scale. Such analysis should give us insights into the properties associated with genomic regions that display this ongoing W2S fixation bias and their potential biological consequences.
Despite the evidence that the unusually high level of recent substitution in the more extreme HAR elements, such as HAR1 and HAR2, could be due to the process of BGC, there is ample evidence that these genomic elements remain functional, and thus the effect of BGC was to mutationally stress but not destroy these elements. HAR1 shows a very strong pattern of compensatory substitutions within its RNA helix structures, indicating a selective force to maintain these helix structures. The W2S substitutions all strengthen the RNA helices of HAR1, and in one case, a substitution appears to extend one of them. Human HAR1 and HAR2 both show evidence of specific function, the former by its highly specific expression pattern during neurodevelopment and the latter by its ability to enhance gene expression during limb development. Whether the human-specific evolutionary changes to these elements reflect a process that was essentially like swimming upstream against an onslaught of non-selective BGC just to stay in place on the fitness landscape, or whether the mutational stress pushed these elements into a configuration that enabled some positive selection for higher fitness in humans, remains to be seen.