Consistent with previous studies, we find that positive selection is acting on loci associated with complex disease when viewed from the perspective of Genome-Wide Association Studies (). While one might expect positive selection to eradicate risk-associated mutations and favor protective variants, as we find for CD (), we also find, surprisingly, the strongest selection working in favor of risk-associated alleles in both T1D and RA ( and ).
It may be the case that some risk alleles are positively selected individually or as components of an abstract biological function due to a presently unknown benefit they confer on the host. This implies that there may be hidden benefits associated with some of the RA and T1D positively selected risk alleles. Positive selection in favor of risk alleles may occur if these alleles lead to an advantageous trait in an alternative environmental context (e.g. protection from pathogens). It is worth mentioning that there must be some reason why the ancestral allele characterized ancient populations in the first place in the cases where the derived allele shows strong signs of positive selection. There are many possible explanations for this. Possibilities include genetic drift driving a benign mutation to fixation in ancient populations, only to have this benign mutation become deleterious relative to a different allele in a modern environment. Another possibility is that an ancient environment may have also caused the ancestral allele to undergo positive selection, only for a modern environment to induce the ancestral allele to undergo negative selection.
There is recent evidence suggesting that protection from pathogens helps explain why positive selection has occurred for T1D susceptibility alleles. Rare variants in the antiviral response gene IFIH1 have recently been shown to be protective against T1D. The protective alleles are functionally deleterious to IFIH1, effectively reducing the ability to mediate an immune response against enterovirus infection. Recent immunological studies of beta cells in patients recently diagnosed with T1D have shown an abundance of enteroviral capsid proteins in the islet cells of affected patients, whereas the protein is found to be scarce among the beta cells of healthy controls. It is plausible that signatures of recent positive selection in T1D and other autoimmune diseases are due to an overactive immune system driven at least in part by an adaptive immune response to viruses. Due to the evolutionary trajectory of T1D favoring susceptibility alleles and the severe effect on fitness in afflicted individuals, we would expect that this evolutionary event would have happened recently in human evolution. While T1D is rapidly fatal without insulin therapy, there was likely a net selective pressure favoring intense immune responses to enterovirus, even with T1D as an occasional consequence.
RA has been found to originate from Native American populations of the Green River region in west central Kentucky. There are verified cases of RA in this population as far back as 6,500 years ago. No signs of RA were found at 63 archaeological sites bordering this original area in central Kentucky
. Yet, there is documented spread in America over time. The first evidence of RA outside the original “catchment” area occurs in western Ohio about 1,100 to 800 years ago. At the same time, virtually no incidence of RA in other parts of the world has been found towards the end of the pre-Columbian era in 1785. This suggests that some environmental factor, perhaps a microorganism or allergen, might play a critical role in the cause of RA 
. Our analysis reveals that there is a huge disparity in positive selection scores between alleles increasing and decreasing susceptibility to RA. Susceptibility alleles show very strong signs of positive selection, while alleles decreasing susceptibility are nearly devoid of any signs of positive selection (). The history of RA helps explain why susceptibility alleles show signs of positive selection in European-derived populations (). Since RA was non-existent in these populations during the pre-Columbian era, there were probably no disadvantages to selecting for the genetic basis of RA. Indeed, there may have been many benefits associated with selecting for RA susceptibility alleles. Tuberculosis has been responsible for millions of deaths worldwide in recent human history, with one in four deaths caused by tuberculosis in Western Europe in the 19th century alone. It is suspected that this disease has historically acted as a powerful selective force. There is a stark inverse correlation between the two diseases: populations with a higher incidence of tuberculosis have a lower incidence of RA, and vice versa. It has been speculated that genetic variants enhancing resistance to tuberculosis underwent positive selection and provide the genetic basis for RA susceptibility today 
. Our analysis is completely compatible with this theory, since we produce evidence that RA susceptibility alleles have undergone positive selection. In addition, tumor necrosis factor inhibitors alleviate symptoms of RA while simultaneously increasing the risk of infection from tuberculosis, Mycobacterium marinum tenosynovitis, fungal infection, and other opportunistic infections 
. It is clear that factors increasing susceptibility to RA also decrease susceptibility to infectious disease. RA and T1D are known to share associated variants 
. This may partially explain why there is a small, but detectable signal of positive selection in alleles decreasing susceptibility for RA. The evolutionary history of RA is unique in that a precise date of introduction of RA into European-derived populations has been established. We have shown that strong positive selection of RA susceptibility alleles is observed, most likely due to altered ability to fight infectious disease without increasing the risk of RA itself until the pre-Columbian era ended.
Not much is known about the history of Crohn's Disease (CD) because, unlike RA, it does not leave unambiguous signs in skeletal remains. It is known that the incidence of CD increased during the 19th century in industrialized countries. The rate of CD increases as under-developed countries become more industrialized (e.g. Japan and Brazil) 
. Many bacteria are implicated in CD, including anaerobic organisms, paratuberculosis, Boeck's sarcoid, and mycobacteria. Mycobacterial paratuberculosis infection of the terminal ileum in cattle (Johne's disease) closely resembles Crohn's disease, which has suggested possible bacterial associations with CD 
. Unlike RA and T1D, CD shows more positive selection for alleles decreasing susceptibility to disease than for those increasing susceptibility. It may be the case that CD is in fact an ancient disease, the incidence of which was reduced by natural selection against CD, only to see a resurgence with the advent of modern environments. However, many other possible scenarios could explain our findings, including shared genetic variants with a disease or trait that has undergone negative selection. CD is unique in the sense that, while selection is detected as in RA and T1D, it is the alleles decreasing susceptibility to CD that are under positive selection, indicating a very different evolutionary history.
We acknowledge several limitations in our analysis. Controlling for LD by selecting only one SNP in each haplotype block after partitioning the genome may have complications in some regions of the genome. Haplotype blocks intuitively capture LD, but lack of complete haplotype block coverage (the fraction of the genome that is found neatly within haplotype blocks) complicates this approach 
. More complex methods to control for LD will be considered for future work requiring a similar analysis. Another complicating issue is that both iHS and LRH belong to the same class of analytical methods for detecting selection, so it is not surprising that they give similar results. Yet, it has been shown that these two methods are in some ways complementary, as they are better at detecting selected SNPs at different allele frequencies 
. Overall, the results under iHS match the results produced with LRH with some changes in the magnitude of selection pressures on some diseases leading to more diseases appearing in the random neutral region in versus Figure S1
(more details on limitations in Materials and Methods S1
). We acknowledge that it is unknown whether the most strongly associated SNPs are causative, which may lead to confusion when we discuss selection for risk-associated alleles, because the causal SNP may show the opposite selection pattern, that is, stronger selection of the protective allele. While this is certainly a possibility, it is unlikely to occur often. If a SNP has a very low p-value of association with a disease because of its proximity to the causative allele, this implies strong LD between the two SNPs. Because of this strong LD, the risk-associated alleles of the non-causative and the causative SNPs are more likely to be on the same haplotype block, making the risk-associated allele of a SNP in strong LD with the causative SNP an appropriate proxy. In addition, it should be noted that all discussion of the reasons for positive selection acting on these diseases necessarily remains speculation.
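As an illustration of the LD-control step discussed in this paragraph (retaining a single SNP per haplotype block), a minimal Python sketch is given here. It is not the pipeline used in this study: the record layouts (`Snp`, `Block`), the function name `one_snp_per_block`, and the rule of keeping the SNP with the lowest association p-value in each block are all assumptions made for illustration, and the example values are invented. SNPs that fall outside every block are reported separately, which makes the incomplete block coverage noted above explicit.

```python
from collections import namedtuple

# Hypothetical record layouts; field names are illustrative only.
Snp = namedtuple("Snp", ["rsid", "chrom", "pos", "p_value"])
Block = namedtuple("Block", ["chrom", "start", "end"])


def one_snp_per_block(snps, blocks):
    """Keep one SNP per haplotype block (assumed rule: lowest p-value).

    Returns (kept, uncovered), where `uncovered` lists the SNPs that do
    not fall inside any haplotype block.
    """
    best = {}       # block -> best SNP seen so far in that block
    uncovered = []  # SNPs outside every block
    for snp in snps:
        # Find the first block that contains this SNP, if any.
        block = next(
            (b for b in blocks
             if b.chrom == snp.chrom and b.start <= snp.pos <= b.end),
            None,
        )
        if block is None:
            uncovered.append(snp)
        elif block not in best or snp.p_value < best[block].p_value:
            best[block] = snp
    return list(best.values()), uncovered


# Toy input with made-up identifiers, positions, and p-values.
blocks = [Block("1", 100, 200), Block("1", 300, 400)]
snps = [
    Snp("snp_a", "1", 120, 1e-4),
    Snp("snp_b", "1", 180, 1e-9),
    Snp("snp_c", "1", 350, 1e-6),
    Snp("snp_d", "1", 250, 1e-8),  # lies between the two blocks
]
kept, uncovered = one_snp_per_block(snps, blocks)
```

In this toy input, `snp_b` is retained for the first block, `snp_c` for the second, and `snp_d` is flagged as uncovered rather than silently dropped. How such uncovered SNPs should be handled is a separate design decision that this sketch does not address.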
In summary, we observed stark heterogeneity in the overall patterns of positive selection across seven diseases. We find that the SNPs associated with T1D, RA, and CD show strong signs of positive selection. We also find that positive selection favors risk-associated alleles in T1D and protective alleles in CD, which is indicative of an evolutionary trajectory towards increasing and decreasing risk, respectively. In addition, we have demonstrated that selection analyses of GWAS results can complement and augment the basic p-value of association attributes () as many regions appear to exclusively favor selection of risk or protective alleles.