Endometriosis (MIM131200) is a common gynecological disease associated with severe pelvic pain, affecting 6-10% of women in their reproductive years [3,4] and 20-50% of women with infertility [5]. Endometriosis risk is influenced by genetic factors and has an estimated heritability of around 51% [3].
Two large endometriosis GWA studies [1,2] have reported genome-wide significant associations. The first, in a Japanese sample of 1,423 cases and 1,318 controls obtained from BioBank Japan (BBJ), with 484 cases and 3,974 controls for replication, implicated a SNP (rs10965235) in the CDKN2BAS gene on chromosome 9p21.3 (overall odds ratio (OR) = 1.44, 95% CI 1.30–1.59; $P = 5.57 \times 10^{-12}$) [1]. The second, by the International Endogene Consortium (IEC) in a sample of European ancestry from Australia (2,270 cases and 1,870 controls) and the UK (924 cases and 5,190 controls), with 2,392 cases and 2,271 controls from the US for replication, identified an intergenic SNP (rs12700667) on 7p15.2 (overall OR = 1.20, 95% CI 1.13–1.27; $P = 1.4 \times 10^{-9}$) [2]. These two studies did not report replication of each other's top locus, partly because rs10965235 is monomorphic in Caucasian populations. The European study did find association with rs7521902 (OR = 1.16, 95% CI 1.08–1.25, $P = 9.0 \times 10^{-5}$) near the WNT4 gene on 1p36.12, which was reported to be suggestively associated in the Japanese study (OR = 1.20, 95% CI 1.11–1.29, $P = 2.2 \times 10^{-6}$).
Encouraged by the WNT4 association and with accumulating evidence for many complex traits that the number of discovered variants is strongly correlated with experimental sample size [6], we sought to increase the ratio of controls to cases in the Australian GWA cohort and to perform a formal meta-analysis of the Australian (QIMR), UK (OX) and Japanese (BBJ) GWA data.
To increase the power of the Australian GWA dataset, we matched the existing QIMR cases and controls [2] on ancestry to individuals from the Hunter Community Study (HCS) [7]. After stringent quality control (QC), the combined QIMRHCS GWA cohort consisted of 2,262 endometriosis cases and 2,924 controls, increasing the number of controls by 1,054 and the Australian effective sample size by 24%. We also performed more stringent QC incorporating the OX dataset, resulting in a revised OX GWA cohort of 919 endometriosis cases and 5,151 controls. All cases in the QIMRHCS and OX studies have surgically confirmed endometriosis and disease stage from surgical records using the rAFS classification system [8]; subjects are grouped into stage A (stage I or II disease or some ovarian disease with a few adhesions; n = 1,680, 52.8%), stage B (stage III or IV disease; n = 1,357, 42.7%) or unknown (n = 144, 4.5%). Details of the final GWA and independent replication case-control cohorts are summarized in Table 1 and a schematic of our study design is provided in .
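The quoted 24% gain in effective sample size can be checked with a short sketch. It assumes the common case-control convention N_eff = 4 / (1/N_cases + 1/N_controls); the text does not state which definition was used, so the formula and the function name `effective_sample_size` are illustrative assumptions. The cohort and stage counts are those quoted above.

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Effective sample size of a case-control cohort (assumed convention)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Cohort sizes quoted in the text.
n_eff_before = effective_sample_size(2270, 1870)  # original QIMR GWA cohort
n_eff_after = effective_sample_size(2262, 2924)   # combined QIMRHCS GWA cohort
relative_gain = n_eff_after / n_eff_before - 1.0

# Stage breakdown across the QIMRHCS and OX cases.
stage_counts = {"A": 1680, "B": 1357, "unknown": 144}
total_cases = sum(stage_counts.values())
stage_percent = {
    stage: round(100.0 * count / total_cases, 1)
    for stage, count in stage_counts.items()
}

print(f"effective N: {n_eff_before:.0f} -> {n_eff_after:.0f} (+{relative_gain:.0%})")
print(total_cases, stage_percent)
```

Under this assumed definition the gain comes out at about 24%, and the stage counts sum to the 2,262 + 919 cases in the two European cohorts.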
| Table 1. Summary of the endometriosis case-control cohorts |
Meta-analysis of all 4,604 endometriosis cases and 9,393 controls for the 407,632 SNPs overlapping in the QIMRHCS, OX and BBJ GWA data showed that the A allele of rs12700667 at the European 7p15.2 locus (OR = 1.22, 95% CI 1.13–1.31, $P = 7.2 \times 10^{-8}$) also replicates in the Japanese GWA data (OR = 1.22, 95% CI 1.07–1.39, $P = 3.6 \times 10^{-3}$), producing an overall OR of 1.22 (95% CI 1.14–1.30) and $P = 9.3 \times 10^{-10}$ in the GWA meta-analysis. We also confirmed association with allele A of rs7521902 at the 1p36.12 WNT4 locus (OR = 1.18, 95% CI 1.11–1.25, $P = 4.6 \times 10^{-8}$).
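As a rough illustration of how the two rs12700667 estimates combine, the sketch below applies a fixed-effect inverse-variance meta-analysis to the rounded ORs and confidence intervals quoted above. The choice of a fixed-effect model, and the recovery of standard errors from the published 95% CIs, are assumptions made for illustration; the study's actual analysis used the underlying genotype-level results. The helper names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return math.log(odds_ratio), se


def fixed_effect_meta(studies):
    """Inverse-variance weighted combination of (OR, ci_low, ci_high) tuples."""
    total_weight, weighted_sum = 0.0, 0.0
    for odds_ratio, ci_low, ci_high in studies:
        beta, se = log_or_and_se(odds_ratio, ci_low, ci_high)
        weight = 1.0 / se ** 2
        total_weight += weight
        weighted_sum += weight * beta
    beta = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided normal P
    return (
        math.exp(beta),
        math.exp(beta - Z_95 * se),
        math.exp(beta + Z_95 * se),
        p_value,
    )


# rs12700667: European GWA data, then Japanese GWA data, as quoted in the text.
or_meta, lo_meta, hi_meta, p_meta = fixed_effect_meta(
    [(1.22, 1.13, 1.31), (1.22, 1.07, 1.39)]
)
print(f"OR = {or_meta:.2f} (95% CI {lo_meta:.2f}-{hi_meta:.2f}), P = {p_meta:.1e}")
```

With these rounded inputs the combined OR and interval agree with the reported 1.22 (1.14–1.30). The P value is of the same order as the reported one but not identical, because the inputs are rounded to two decimals.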
| Table 2. Summary of the GWA and replication study results for the seven genome-wide significant loci |
The GWA meta-analysis identified a novel locus on 12q22 near the VEZT gene (allele C of rs10859871: OR = 1.18, 95% CI 1.12–1.25, $P = 5.5 \times 10^{-9}$). We also established association with allele G of rs13394619 in the GREB1 gene on 2p25.1 (OR = 1.12, 95% CI 1.06–1.18, $P = 2.1 \times 10^{-5}$), previously reported (OR = 1.35, 95% CI 1.17–1.56, $P = 3.8 \times 10^{-5}$) in a small independent Japanese GWA study of 696 cases and 825 controls by Adachi et al. (2010) [9]. The G allele of rs13394619 approached conventional genome-wide significance ($P \leq 5 \times 10^{-8}$) in combined analysis of the QIMRHCS, OX, BBJ, Adachi500K and Adachi6.0 GWA data (OR = 1.15, 95% CI 1.09–1.20, $P = 6.1 \times 10^{-8}$). In addition to the three genome-wide significant SNPs on chromosomes 1, 7 and 12 (rs7521902, rs12700667, rs10859871), the Manhattan plot of the all-endometriosis GWA meta-analysis results (Supplementary Fig. 1) showed 34 SNPs reaching genome-wide suggestive association ($P \leq 10^{-5}$).
Given the substantially greater genetic loading of moderate to severe (stage B) endometriosis (rAFS stage III or IV disease) compared with minimal (stage A) endometriosis (rAFS stage I or II disease) [2], we performed a secondary analysis of the SNPs reaching genome-wide suggestive association, in which the association results from QIMRHCS and OX stage B cases versus controls were meta-analyzed with the BBJ association results (stage information not available).
After excluding endometriosis cases with minimal (rAFS stage I-II) or unknown severity in the QIMRHCS and OX cohorts, GWA meta-analysis implicated novel loci on 2p14 (allele C of rs4141819: OR = 1.22, 95% CI 1.14–1.32, $P = 6.5 \times 10^{-8}$), 6p22.3 (allele T of rs7739264: OR = 1.21, 95% CI 1.13–1.30, $P = 5.8 \times 10^{-8}$) and 9p21.3 (allele C of rs1537377: OR = 1.22, 95% CI 1.14–1.30, $P = 1.0 \times 10^{-8}$) (Supplementary Fig. 2, Supplementary Tables 1-2 and Supplementary Note).
Annotated plots showing evidence for association in the combined QIMRHCS, OX and BBJ GWA data of genotyped SNPs across the seven implicated loci, from the analysis of all cases and of stage B cases only, are provided in Supplementary Figs. 3-9. Imputation up to the 1000 Genomes reference panel produced more significant P values and helped resolve the associated region at the 1p36.12 (rs56318008, $P_{\text{all}} = 1.3 \times 10^{-10}$), 2p25.1 (rs77294520, $P_{\text{stageB}} = 8.6 \times 10^{-8}$), 2p14 (rs2861694, $P_{\text{stageB}} = 7.9 \times 10^{-9}$), 6p22.3 (rs6901079, $P_{\text{all}} = 1.9 \times 10^{-8}$), 9p21.3 (rs7041895, $P_{\text{stageB}} = 5.1 \times 10^{-10}$) and 12q22 (rs11107968, $P_{\text{all}} = 3.9 \times 10^{-9}$) loci (Supplementary Figs. 10-16). Of particular note, the most significant imputed SNPs on 1p36.12, rs56318008 and rs3820282 ($P_{\text{all}} = 1.6 \times 10^{-10}$), are located 22 bp 5′ to, and within, the WNT4 gene, respectively.
Interestingly, the most associated genotyped SNP at 9p21.3 (rs1537377) is 55 kb centromeric to the genome-wide significant SNP reported in the original BBJ GWA study [1] (rs10965235), located in the CDKN2BAS gene, and 49 kb 3′ to the transcription end site of CDKN2BAS. SNP rs10965235 is monomorphic in Caucasian populations, so we investigated the independence of rs10965235 and rs1537377 in the BBJ GWA data. First, alleles of rs10965235 and rs1537377 are very weakly correlated in the BBJ GWA data, with linkage disequilibrium (LD) metrics of $r^2 = 0.028$ and $D' = 0.461$. Second, the allelic association P values for rs10965235 and rs1537377 are $P = 1.6 \times 10^{-4}$ and $P = 1.8 \times 10^{-2}$, respectively. After conditioning on rs10965235, weak residual association remains at rs1537377 ($P = 9.0 \times 10^{-2}$). Consequently, the data suggest there may be two independent genetic risk factors near the CDKN2BAS locus on 9p21.3. CDKN2BAS is a long non-coding RNA adjacent to and transcribed from the opposite strand to CDKN2B (p15), CDKN2A (p16) and ARF (p14). Loss of heterozygosity of CDKN2A and hypermethylation of the CDKN2A promoter have been reported in endometriosis [10,11].
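For readers less familiar with the two LD metrics quoted for rs10965235 and rs1537377, the sketch below computes $D$, $D'$ and $r^2$ from haplotype and allele frequencies using the standard definitions. The input frequencies are hypothetical values chosen only to show that a moderate $D'$ can coexist with a very small $r^2$; they are not taken from the BBJ data, and the function name `ld_metrics` is illustrative.

```python
def ld_metrics(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies (NOT from the BBJ GWA data).
d, d_prime, r2 = ld_metrics(p_ab=0.07, p_a=0.10, p_b=0.50)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r2:.3f}")
```

Because $r^2$ also depends on how well the allele frequencies match, two SNPs with unequal allele frequencies can show an appreciable $D'$ while one is still a poor proxy for the other.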
To further validate the seven SNPs implicated by the meta-analysis, we carried out a replication study using a cohort of 1,044 cases and 4,017 controls obtained from BioBank Japan, independent of the BBJ GWA cohort. As shown in the forest plots of risk allele effects estimated using all cases versus controls, the effects (ORs) were in the same direction for all seven implicated SNPs across the GWA and replication cohorts. With the exception of rs12700667, which was previously replicated ($P = 1.2 \times 10^{-3}$) in 2,392 cases and 2,271 controls from the US [2], and rs4141819 (with a marginal $P = 5.1 \times 10^{-2}$), all SNPs were replicated at the nominal $P < 0.05$ threshold. All seven SNPs surpass the conventional genome-wide significance threshold of $P \leq 5 \times 10^{-8}$ after combined analysis of the GWA and replication cases and controls. A conservative adjustment of the rs4141819 total P values ($P_{\text{all}} = 8.5 \times 10^{-8}$; $P_{\text{stageB}} = 4.1 \times 10^{-8}$) for performing two independent GWA studies (all and stage B endometriosis cases versus controls) would produce $P > 5 \times 10^{-8}$ ($P_{\text{all-adjusted}} = 1.7 \times 10^{-7}$; $P_{\text{stageB-adjusted}} = 8.2 \times 10^{-8}$). However, the accurately imputed (Rsq > 0.95) SNP rs2861694 ($P_{\text{stageB}} = 7.9 \times 10^{-9}$), in strong LD with rs4141819 ($r^2 = 0.981$, $D' = 1.0$ and $r^2 = 0.867$, $D' = 1.0$ in the 379 European and 286 Asian 1000 Genomes reference samples, respectively), would remain genome-wide significant ($P_{\text{stageB-adjusted}} = 1.6 \times 10^{-8}$).
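The adjusted P values quoted above are consistent with multiplying each P value by two, one factor per GWA analysis (all cases and stage B cases). The sketch below reproduces that arithmetic as a Bonferroni-style correction; treating the adjustment as a simple multiplication is an inference from the quoted numbers, and the dictionary keys are labels invented for this example.

```python
GENOME_WIDE_THRESHOLD = 5e-8
N_ANALYSES = 2  # all cases and stage B cases, each versus controls


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Unadjusted P values quoted in the text.
reported = {
    "rs4141819 all": 8.5e-8,
    "rs4141819 stageB": 4.1e-8,
    "rs2861694 stageB": 7.9e-9,
}
adjusted = {name: bonferroni(p, N_ANALYSES) for name, p in reported.items()}

for name, p_adj in adjusted.items():
    still_significant = p_adj <= GENOME_WIDE_THRESHOLD
    print(f"{name}: adjusted P = {p_adj:.1e}, genome-wide significant: {still_significant}")
```

Only the imputed SNP rs2861694 stays below $5 \times 10^{-8}$ after doubling, which matches the conclusion drawn in the text.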
The Q-Q plots for the QIMRHCS, OX and BBJ GWA data (Supplementary Fig. 17a-c) reflect our stringent quality control, while the GWA meta-analysis Q-Q plot (Supplementary Fig. 17d) reveals a significant preponderance of small P values ($P < 10^{-3}$), suggesting that many of these nominally significant SNPs likely represent true signals [12]. To further examine the shared genetic risk across our European and Japanese populations, we performed polygenic prediction analysis [13] to evaluate whether the aggregate effects of many variants of small effect in the BBJ GWA cohort could predict affection status in the European GWA cohorts. The BBJ-derived risk scores significantly predicted affection status in the QIMRHCS ($R^2 = 0.0064$; $P = 6.9 \times 10^{-7}$), OX ($R^2 = 0.0057$; $P = 9.6 \times 10^{-6}$) and combined QIMRHCS+OX all-endometriosis case-control sets ($R^2 = 0.0054$; $P = 8.8 \times 10^{-11}$). For the individual and combined QIMRHCS and OX case-control sets, the variance explained peaked in the SNP sets with BBJ GWA $P < 0.1$, both using all GWA meta-analysis SNPs and after excluding all SNPs within ±2,500 kb of the seven implicated SNPs listed in Table 2. Analogously, performing the prediction in reverse, the QIMRHCS+OX-derived risk scores significantly predicted affection status in the BBJ case-control set ($R^2 = 0.0106$; $P = 3.3 \times 10^{-6}$) (Supplementary Fig. 18 and Supplementary Note).
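The scoring step of a polygenic prediction analysis can be sketched as follows. This is a minimal illustration, assuming scores are formed as the sum of risk-allele counts weighted by the discovery-cohort log odds ratio for SNPs passing a discovery P-value threshold. The SNP identifiers, effect sizes and genotypes are toy values, the function name is hypothetical, and the later regression of case-control status on the score (which yields $R^2$ and its P value) is omitted.

```python
import math


def polygenic_scores(genotypes, discovery, p_threshold):
    """Score each target individual using discovery-cohort effect sizes.

    genotypes: list of dicts {snp_id: risk-allele count (0, 1 or 2)}
    discovery: dict {snp_id: (odds_ratio, p_value)} from the discovery cohort
    p_threshold: only SNPs with discovery P below this value contribute
    """
    weights = {
        snp: math.log(odds_ratio)
        for snp, (odds_ratio, p_value) in discovery.items()
        if p_value < p_threshold
    }
    return [
        sum(weight * person.get(snp, 0) for snp, weight in weights.items())
        for person in genotypes
    ]


# Toy discovery statistics and target genotypes (hypothetical values).
discovery = {"snp1": (1.20, 0.001), "snp2": (0.90, 0.05), "snp3": (1.05, 0.40)}
genotypes = [
    {"snp1": 2, "snp2": 0, "snp3": 1},
    {"snp1": 0, "snp2": 2, "snp3": 2},
]
scores = polygenic_scores(genotypes, discovery, p_threshold=0.1)
print(scores)
```

Repeating the scoring over a range of thresholds is what allows the variance explained to be compared across SNP sets, as in the $P < 0.1$ set described above.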
A gene-based GWA analysis using VEGAS [14], which accounts for gene size and LD between SNPs, revealed 1,184 genes with a combined $P \leq 0.05$ and showed the top three ranked genes associated with endometriosis to be WNT4 on 1p36.12 ($P = 5.0 \times 10^{-9}$), VEZT on 12q22 ($P = 5.7 \times 10^{-7}$) and GREB1 on 2p25.1 ($P = 2.5 \times 10^{-5}$) (Supplementary Table 3). Genome-wide significant SNPs lie near all three genes, and the WNT4 and VEZT genes also easily surpass our conservative gene-based significance threshold of $P \leq 2.85 \times 10^{-6}$ (calculated as 0.05 / 17,538 independent genes). WNT4 encodes wingless-type MMTV integration site family, member 4 and is important for the development of the female reproductive tract [15] and steroidogenesis [16]. VEZT encodes vezatin, an adherens junction transmembrane protein that is downregulated in gastric cancer [17]. GREB1 encodes growth regulation by estrogen in breast cancer 1, an early response gene in the estrogen regulation pathway involved in hormone-dependent breast cancer cell growth [18]. For the four remaining implicated regions on 2p14, 6p22.3, 7p15.2 and 9p21.3, no genes were significant ($P \leq 1.3 \times 10^{-3}$) after adjusting VEGAS results for testing 37 genes across all seven regions; see Supplementary Figs. 3-9 and Supplementary Table 4.
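The two gene-based thresholds quoted above follow from dividing 0.05 by the number of tests. The sketch below reproduces that arithmetic and checks the three top-ranked genes against the genome-wide gene-based threshold. The gene-level P values are those reported in the text; the variable names are illustrative.

```python
ALPHA = 0.05
N_INDEPENDENT_GENES = 17538  # genes tested genome-wide
N_REGION_GENES = 37          # genes tested across the seven implicated regions

gene_threshold = ALPHA / N_INDEPENDENT_GENES  # about 2.85e-6
region_threshold = ALPHA / N_REGION_GENES     # about 1.35e-3, quoted as 1.3e-3

# Gene-based P values reported for the three top-ranked genes.
vegas_p = {"WNT4": 5.0e-9, "VEZT": 5.7e-7, "GREB1": 2.5e-5}
significant = {gene: p <= gene_threshold for gene, p in vegas_p.items()}

print(f"gene-based threshold: {gene_threshold:.2e}")
print(f"regional threshold:   {region_threshold:.2e}")
print(significant)
```

WNT4 and VEZT pass the genome-wide gene-based threshold while GREB1 does not, in line with the statement that only the first two surpass it.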
In conclusion, given their high gene-based ranking, proximity to genome-wide significant SNPs, known pathophysiology and reported gene expression (Supplementary Note and Supplementary Fig. 19), the WNT4, VEZT and GREB1 genes are strong targets for further studies aimed at understanding the molecular pathogenesis of endometriosis. Our results also suggest that a considerable number of SNPs nominally implicated (e.g. $P < 0.1$) in the European and Japanese GWA cohorts represent true endometriosis risk loci. Moreover, the significant overlap in common polygenic risk for endometriosis indicates that genetic risk prediction and future targeted disease therapy may be transferable across these populations.