In this study, we have demonstrated that using a sex-specific approach to analyze results from genome-wide association studies can reveal polymorphisms that are significantly associated with disease in only one sex. We applied our approach to seven common complex diseases to discover novel loci that exhibit putative sex differences in disease association. We proposed a method, PMASE, which discovered sexually dimorphic SNPs in important, previously known disease-associated regions. These findings could shed new light on sex-related differences in disease mechanisms.
We find evidence of sex differences in SNPs in CAD and CD after correcting for false discovery. The CAD SNP rs7865618 is located on chromosome 9p21, an important region known to be involved in coronary artery disease (Helgadottir et al. 2007; McPherson 2010). We find that the association of this SNP with disease is male-specific, which is in agreement with the male bias in the incidence of CAD (Lerner and Kannel 1986). The two loci in CD are in an intronic region of the gene ATG16L1 (autophagy related 16-like 1), which has been demonstrated in multiple studies to confer increased risk for CD (Cotterill et al. 2010; Hampe et al. 2007; Lacher et al. 2009; Rioux et al. 2007; WTCCC 2007). Here we show that multiple loci in this gene have female-specific disease association. For rs3792106, the association in females is stronger than the association in males and females combined, and the association is not genome-wide significant in males alone. We previously discovered sexual dimorphism in ATG16L1 using a hypothesis-driven approach focusing on replicated CD risk loci (Liu et al. 2011), and we now show that a more agnostic approach also recovers this result. The SNPs showing evidence of sexual dimorphism in disease risk should be confirmed in studies with larger overall sample sizes; however, they represent potentially interesting findings, given the known sex differences in prevalence for these diseases.
While we report sex-specific associations that are genome-wide significant in combined-sex analysis, there are likely more SNPs with weaker overall associations but significant sex-specific associations. Our stringent filtering process prevents us from finding these SNPs. By focusing only on SNPs that are genome-wide significant in the combined analysis, our method may be underpowered for finding effects that are significant in only one sex. In some cases, the disease cohorts were not sufficiently powered to discover more weakly associated SNPs (some cohorts included only 400 individuals of one sex). It is interesting to note here that the single SNP reported by the WTCCC in their sex-differentiated analysis, rs11761231 in RA, did not meet our Bonferroni-corrected threshold for disease association (nor did it meet WTCCC’s threshold in the original study) and was not included in our analysis. A study that included balanced numbers of individuals and larger cohorts would enable a more thorough investigation of sex-specific effects at a truly genome-wide level. One explanation for the additional SNPs in RA discovered by regression analysis but not by PMASE is that RA is one of the most sex-unbalanced cohorts. By chance, our permutation method may be slightly more likely to generate sampling distributions with many more extreme group differences in these unbalanced cohorts, thus resulting in large PMASE p values (see Supplementary Fig. 1). A method for finding sex differences that takes unbalanced sample sizes and variances into consideration is Welch’s t test on sex-specific regression coefficients, as described in Heid et al. (2010). Future genome-wide association studies should consider recruiting large numbers of both male and female cases and controls to enable genome-wide sex-specific analysis.
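The comparison of sex-specific regression coefficients mentioned above can be sketched as follows. This is a minimal illustration, not the procedure of Heid et al. (2010): it assumes the male and female coefficients (log odds ratios) were estimated in independent strata, and it uses a large-sample normal approximation in place of the t distribution. The function name and the input values in the usage note are hypothetical.

```python
import math

def sex_difference_test(beta_male, se_male, beta_female, se_female):
    """Test whether two independently estimated coefficients differ.

    Assumption: the strata are independent, so the variance of the
    difference is the sum of the two squared standard errors (the
    unequal-variance form used by Welch-type tests).
    """
    diff = beta_male - beta_female
    se_diff = math.sqrt(se_male ** 2 + se_female ** 2)
    z = diff / se_diff
    # Two-sided p value under a standard normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

For example, `sex_difference_test(0.30, 0.05, 0.10, 0.05)` compares a hypothetical male log odds ratio of 0.30 with a female log odds ratio of 0.10. Because each stratum contributes its own standard error, a small or unbalanced stratum widens the denominator rather than distorting the null distribution.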
We could have included more SNPs in our analysis and chosen a less stringent threshold for LD filtering to discover possible secondary associations or SNP–SNP interactions. However, we chose the stringent thresholds to focus on known important regions and to control for multiple testing by choosing only one SNP per region. This could be expanded in future analyses. Because PMASE requires significant disease-associated SNPs as input, it was difficult to find SNPs with opposite effects in males and females, in which an allele is protective in one sex but increases disease risk in the other (similar to Sirota et al. 2009). These SNPs would not have had significant overall disease associations, since the opposite effects in males and females would cancel each other. Therefore, our approach is underpowered to discover flipped effects. Larger overall sample sizes and less stringent thresholds may allow for discovery of these flipped effects on a genome-wide scale. Simulation approaches have shown that if these flipped effects exist, adding sex analysis can increase, rather than decrease, the power of a study (Magi et al. 2010). Testing more SNPs with PMASE, however, would be more computationally intensive, and this is a limitation of the PMASE method in comparison to logistic regression or the Woolf method, which do not require permutation testing. The Woolf method is able to find flipped effects but at the cost of testing all SNPs and losing many possible findings to multiple testing correction. PMASE is most appropriate for a two-stage study design like the one we have taken here, in which some initial filtering of redundant hypotheses is first performed.
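The cancellation of flipped effects can be illustrated with a short sketch. It assumes that "the Woolf method" refers to Woolf's test for heterogeneity of odds ratios across strata; the carrier/non-carrier coding, the function names, and the counts in the usage note are hypothetical and are not taken from the study data.

```python
import math

def woolf_heterogeneity(tables):
    """Woolf-style heterogeneity statistic across 2x2 tables.

    Each table is (a, b, c, d) = (case carriers, case non-carriers,
    control carriers, control non-carriers); all cells must be > 0.
    Returns the inverse-variance pooled log odds ratio and the
    heterogeneity chi-square (degrees of freedom = strata - 1).
    """
    log_ors, weights = [], []
    for a, b, c, d in tables:
        log_ors.append(math.log((a * d) / (b * c)))
        weights.append(1.0 / (1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d))
    pooled = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
    chi2 = sum(w * (l - pooled) ** 2 for w, l in zip(weights, log_ors))
    return pooled, chi2

def chi2_p_two_strata(chi2):
    """Upper-tail p value for a chi-square with 1 df (two strata only)."""
    return math.erfc(math.sqrt(chi2 / 2.0))
```

With hypothetical counts `males = (300, 200, 200, 300)` and `females = (200, 300, 300, 200)`, the male odds ratio is 2.25 and the female odds ratio is its reciprocal, so the pooled log odds ratio is essentially zero while the heterogeneity statistic is large. A SNP like this would not pass a filter based on combined-sex significance, which is the limitation described above.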
We considered several alternative explanations for the sex differences we observed. First, we visually inspected the signal intensities from the genotyping chip, and found no sign of genotype calling errors (except for one SNP, which we excluded) or other unusual differences between males and females. Second, we considered the possibility that the significant differences between the two groups could be due to a confounding factor rather than an actual difference in disease association. A possible confounding factor in the context of sex-specific disease association is disease prevalence itself, as a large number of diseases have very different incidence rates in men and women. Given that the control individuals represent a sample of the general population, it is likely that a fraction of those control individuals have a genetic predisposition to the disease but have not yet been diagnosed or do not yet exhibit symptoms. This is a well-known phenomenon in the context of GWAS that leads to a decrease in the statistical power of the study. Therefore, if the prevalence of the disease is higher in one sex, the statistical power of the study in that sex would be lower, which could explain why a SNP shows stronger association in the other sex. However, the sex-specific associations we find for Crohn’s disease are stronger in females even though CD has a higher prevalence in females (Kappelman et al. 2007). This indicates that we have enough power to detect an effect in females despite the female controls possibly having a higher incidence of (and genetic predisposition to) CD. In other words, the large difference we observe in disease association is not likely confounded by a difference in disease prevalence. We find the same situation in the case of CAD, with higher prevalence in males (Lerner and Kannel 1986) and stronger disease association in males.
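The power argument above can be made concrete with a small sketch. It assumes a simple mixture model in which a fraction of population controls are undiagnosed cases who carry the risk allele at the case frequency; the function names and the allele frequencies in the usage note are hypothetical.

```python
def odds(p):
    """Convert a frequency in (0, 1) to odds."""
    return p / (1.0 - p)

def observed_odds_ratio(freq_cases, freq_true_controls, latent_fraction):
    """Allelic odds ratio when some controls are latent cases.

    Assumption: the control sample is a mixture of truly unaffected
    individuals and a fraction `latent_fraction` of undiagnosed cases.
    """
    freq_controls = ((1.0 - latent_fraction) * freq_true_controls
                     + latent_fraction * freq_cases)
    return odds(freq_cases) / odds(freq_controls)
```

Comparing `observed_odds_ratio(0.4, 0.3, 0.0)` with `observed_odds_ratio(0.4, 0.3, 0.1)` shows the odds ratio moving toward 1 as the latent fraction grows. Under this model, the sex with higher prevalence would be expected to show the weaker association, which is the opposite of what we observe for CD and CAD.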
An oft-mentioned limitation of GWAS is the “missing heritability” problem. The SNPs discovered using GWAS often explain only a tiny fraction of heritability for a particular disease, and have a small effect on increasing an individual’s disease risk (Manolio et al. 2009). However, new advances in using GWAS data along with environmental studies and gene expression analysis in mice are yielding new insights into gene–environment interactions (Cadwell et al. 2010). Studying sex differences can help provide a missing link in terms of different gene–endocrine and gene–environment interactions in males and females. These SNPs can lead to the discovery of genes and pathways that are under differential regulatory control and that cause different disease progression in males and females. For example, it is possible that these SNPs lie near binding sites for sex hormones and lead to differential transcriptional control of key genes. This could be investigated in future experiments and may provide insight into novel sex-specific disease mechanisms and help explain why some diseases have a higher prevalence in one sex. These results demonstrate that GWAS and other large-scale genetic association studies should take sex into account and report sex-specific results.
We have proposed a method to quantify sex differences in disease association in a genome-wide association study, and demonstrated that our approach finds 12 polymorphisms in CAD, CD, RA, and T1D that show sex-specific association with increased disease risk. After correcting for multiple hypothesis testing, three SNPs are significant using PMASE. We believe that this is one of the first systematic demonstrations of putative sexually dimorphic loci in multiple common complex diseases. Our approach can easily be generalized to any other GWAS in which a binary population feature (not necessarily sex) is recorded. We propose that the inclusion of sex in the analysis of genetic association studies is necessary to obtain a more complete picture of individual disease risk. A deeper understanding of molecular sex differences will aid in the development of more personalized prevention, diagnosis, and treatment of human disease.
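The generalization to an arbitrary binary feature can be sketched with a generic label-permutation test. This is an illustration only and is not the PMASE implementation: the choice of statistic (absolute difference in log odds ratio between the two groups), the 0.5 continuity correction, and all function names are assumptions made for the example.

```python
import math
import random

def log_odds_ratio(cases, carriers):
    """Log odds ratio from per-individual 0/1 case and carrier flags."""
    a = b = c = d = 0.5  # continuity correction avoids empty cells
    for is_case, is_carrier in zip(cases, carriers):
        if is_case and is_carrier:
            a += 1
        elif is_case:
            b += 1
        elif is_carrier:
            c += 1
        else:
            d += 1
    return math.log((a * d) / (b * c))

def group_difference(cases, carriers, groups):
    """Absolute difference in log odds ratio between groups 0 and 1."""
    split = {0: ([], []), 1: ([], [])}
    for is_case, is_carrier, group in zip(cases, carriers, groups):
        split[group][0].append(is_case)
        split[group][1].append(is_carrier)
    return abs(log_odds_ratio(*split[0]) - log_odds_ratio(*split[1]))

def permutation_p(cases, carriers, groups, n_perm=500, seed=0):
    """Permutation p value for the group difference in association."""
    rng = random.Random(seed)
    observed = group_difference(cases, carriers, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps group sizes, breaks group identity
        if group_difference(cases, carriers, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Here `groups` may encode sex or any other binary feature. Note that shuffling the group labels also breaks any relationship between group membership and case status, which is a simplification that may matter in unbalanced cohorts.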