|Home | About | Journals | Submit | Contact Us | Français|
To identify novel loci for age at natural menopause, we performed a meta-analysis of 22 genome-wide association studies in 38,968 women of European descent, with replication in up to 14,435 women. In addition to four known loci, we identified 13 new age at natural menopause loci (P < 5 × 10−8). The new loci included genes implicated in DNA repair (EXO1, HELQ, UIMC1, FAM175A, FANCI, TLK1, POLG, PRIM1) and immune function (IL11, NLRP11, BAT2). Gene-set enrichment pathway analyses using the full GWAS dataset identified exodeoxyribonuclease, NFκB signalling and mitochondrial dysfunction as biological processes related to timing of menopause.
Menopause is the cessation of reproductive function of the human ovaries. This life stage is associated with one of the major hormonal changes of women characterized by a decline in secretion of estrogen, progesterone and, to a lesser degree, testosterone. It influences a woman’s well-being and is associated with several major age-related diseases including cardiovascular disease, breast cancer, osteoarthritis, and osteoporosis1. Ovarian aging is reflected by the continuous decline of the primordial follicle pool, which is established during fetal life, subsequently leading to endocrine changes due to loss of the negative feedback from ovarian hormones on the hypothalamic-pituitary axis. In addition to follicle loss, oocyte quality diminishes with increasing age, which is believed to be due to increased meiotic nondisjunction2. Oocyte quality may be controlled at the time germ cells are formed during fetal life, but may also reflect accumulated damage during reproductive life, and/or age-related changes in granulosa cell-oocyte communication3. Although both oocyte quantity and quality decline with increasing age, it is not clear whether they are controlled by the same mechanisms and whether they decline in parallel.
The average age at natural menopause in women of Northern European descent is 50–51 years (range 40–60 years)4. Heritability estimates from twin and family studies for age at natural menopause range from 44 to 65%5–8. Thus far most genetic association studies regarding age at menopause have focussed on candidate genes9 from the estrogen pathway10,11, or vascular components12,13. Recently two genome-wide association studies (GWAS) identified five novel loci associated with age at natural menopause on chromosomes 5, 6, 13, 19 and 2014,15. These loci, however, explained <1.5% of the phenotypic variation of age at natural menopause, suggesting additional loci of small effect are likely to be discovered in larger samples. Therefore, we conducted a two-stage GWAS of women of European ancestry, combining the women from the two previous GWAS studies14,15 with new participants for a total of 38,968 women from 22 studies in the discovery stage, and 14,435 women from 21 studies in the replication stage.
In our discovery stage of 38,968 women with natural menopause between the ages of 40 and 60 (Supplementary Table 1, Supplementary Table 2), we identified 20 regions with SNPs meeting the genome-wide significance criterion p<5 × 10−8 (Figure 1a). Four of these loci confirmed prior reports of associations on chromosomes 5, 6, 19, and 2014,15 (regions 5b, 6a, 19a, and 20, respectively, in Table 1) and 16 loci were novel. We failed to confirm one previously reported association on chromosome 13 (13q34, rs7333181, p=0.12). The overall genomic inflation factor was 1.03 (Figure 1b). Table 1 displays the SNP with the lowest p-value from each region. There was no between-study effect heterogeneity across discovery studies (p>0.05/20) for the 20 SNP associations presented. Within FHS, we tested for differences in effect size for the 20 SNPs in retrospectively and prospectively collected menopause age, and also found no significant differences (data not shown). The effect sizes ranged from 0.17 years (8.7 weeks) to nearly one year (50.5 weeks) per each copy of the minor allele. We computed the effect sizes for dichotomized age at natural menopause in the WGHS women, dichotomizing at age 45 (N=745) versus later for early menopause and at 54 (N=1632) versus earlier for late menopause. The estimated odds ratios for early menopause for the menopause decreasing allele ranged from 1.01 to 2.03. The estimated odds ratios for late menopause for the menopause decreasing allele ranged from 0.52 to 0.96 (Supplementary Table 3). The top SNPs in regions 2c, 5a, and 19b were more than 400kb but less than 1Mb from the top SNP in another region on the same chromosome. The top SNP in each of these primary regions had low linkage disequilibrium (r2<0.5) with the top SNP in the nearby region. To determine whether these associations were independent, we performed a conditional association analysis in the discovery study samples with the most significant SNP from each of the primary 17 regions included as covariates in the analysis. For regions 5a and 19b (rs890835, rs12461110), the effect estimates in the conditional analysis were unchanged compared to the discovery analysis (differences of 0.3% and 4%, respectively), and the p-values remained genome-wide significant. However, for region 2c the effect size was decreased by ~12.5% in the conditional analysis compared to the initial analysis, and the SNP p-value was no longer genome-wide significant (p=9.8×10−7) (Table 1), suggesting that the association with rs7606918 is not independent of the rs1018348 region 2b association. We attempted replication only for the 19 SNPs that represented independent regions that reached genome-wide significance (p<5 × 10−8) thus replication of rs7606918 was not pursued.
Twenty-one studies contributed 14,435 women for replication of the 19 SNPs defining the independent genome-wide significant regions from stage 1. Age at natural menopause was defined using the same criteria as in the discovery studies (Supplementary Table 1). Seventeen of these studies (N=6,639) were included in in silico replication (Supplementary Table 2); an additional 4 studies (N=7,796) contributed de novo genotypes for the 19 SNPs (Supplementary Table 2). Table 1 displays the effect sizes and p-values for replication and a combined meta-analysis of the discovery and replication samples. There was no evidence for effect heterogeneity among the replication studies (Table 1). Further, we tested for heterogeneity between the in silico and de novo genotyped studies, and found no evidence for heterogeneity of effect (data not shown), suggesting that for the significant SNPs, the genotype imputation methods did not result in significantly different effect size estimates than would have been obtained from direct genotyping. Seventeen of the 19 SNPs remained genome-wide significant and had lower p-values in combined meta-analysis of the discovery and replication samples. Regions 5a and 13a showed no evidence of association in the replication samples (p>0.50) and were not genome-wide significant in combined discovery and replication meta-analysis. Four of the 17 replicated regions were reported previously; thus our analysis identified 13 novel regions for age at natural menopause based on genome-wide significant discovery with replication. In the combined discovery and replication meta-analyses the effect estimates ranged from 8.2 to 49.3 weeks per minor allele. The estimated proportion of variance explained by the 17 replicated SNPs in the four replication studies with de novo genotyped SNPs ranged from 2.5% (Osteos) to 3.7% (EPOS and BWHHS) to 4.1% (PROSPECT-EPIC).
We used the largest study contributing data to our discovery GWAS (WGHS, N=11379) to explore whether substantial SNPxSNP interactions are present among the 17 replicated SNPs. We tested all 136 pairs of SNPs and found no evidence for interaction (all P>0.01).
All but two of the replicated SNPs are intronic or exonic to known genes (Table 2). The top SNPs in regions 6b, 12, 19b, and 20 are missense polymorphisms. Three of the four were predicted to have damaging protein function by SIFT16, and one by PolyPhen217. Using dbSNP and LocusZoom18, we identified the genes underlying the novel top regions. We used SCAN (see URLs) to identify all genes with SNPs that are in linkage disequilibrium (LD) (r2>0.5) with our SNPs (Table 2). We identified all SNPs with r2 ≥ 0.8 with our top SNPs and used several databases to determine if the SNPs were known to be associated with expression (Table 2).
The strongest novel signal was on chromosome 4 (region 4, rs4693089; P=2.4×10−19). The SNP is located in an intron of HELQ, the gene that encodes the protein HEL308, which is a DNA-dependent ATPase and DNA helicase 19. The second strongest novel signal was on chromosome 12 (region 12, rs2277339; P=2.5×10−19). This SNP is a non-synonymous variant in exon 1 of PRIM1. The top SNP was significantly associated with expression of PRIM1 in visual cortex, cerebellum, and pre-frontal cortex (Table 2).
Several other novel signals are located in introns of genes for which mouse models exist. These were region 8 in ASH2L (rs2517388; P=9.3×10−15), region 13b in POLG (rs2307449; P=3.6×10−13) and region 1b in EXO1 (rs1635501; P=8.5×10−10). ASH2L codes for a trithorax group protein, and is involved in X chromosome inactivation in women20. POLG encodes the catalytic subunit of mitochondrial DNA polymerase, the enzyme responsible for replication and repair21 of mitochondrial DNA. EXO1 is a member of the RAD2 nuclease family of proteins involved in DNA replication, repair and recombination, and the top hit is in LD (r2=0.83) with a functional polymorphism in EXO1 that affects a transcription factor binding site in the promoter. Region 11 (rs12294104; P=1.5×10−11) is near and in LD (r2=0.92) with SNPs in FSHB. Transcription of FSHB limits the rate of production of the heterodimeric follicle stimulating hormone (FSH), a key pituitary expressed hormone that stimulates maturation of follicles. Region 19a (rs11668344; P=1.5×10−59) is in tight LD with SNPs in IL11: this cytokine stimulates the T-cell-dependent development of immunoglobulin-producing B cells.
The top SNPs in two other novel regions are non-synonymous coding variants. Region 6b, rs1046089 (P=1.6×10−16) is in exon 22 of BAT2 and was associated with expression of several transcripts in the HLA region in several tissues (Table 2). Region 19b, rs12461110 (P=8.7 × 10−10) is in exon 5 of NLRP11. BAT2 encodes the gene for HLA-B associated transcript 2 and has several microsatellite repeats. NLRP11 encodes for the NLR family, pyrin domain containing 11 protein, which is implicated in the activation of proinflammatory caspases22.
Of the remaining five novel regions, the top SNPs for regions 1a, 2a, 2b, and 13b are located in introns. These were rs4246511 in RHBDL2 (0.24 years/minor allele, P=9.1×10−17), which is thought to function as intramembrane serine proteases, rs2303369 in FNDC4 coding for fibronectin type III domain containing 4 (P=2.3×10−12), rs10183486 in TLK1 (P=2.2×10−14) a nuclear serine/threonine kinases that is potentially involved in the regulation of chromatin assembly, and rs4886238 in TDRD3 (P=9.5×10−11). TDRD3 is a known binding partner for FMR1, which has been associated with primary ovarian insufficiency. The top SNP in the final novel region, 16, is within 60kb of three genes: TNFRSF17, GSPT1, and RUNDC2A. It is in LD (r2>0.5) with SNPs in these three genes as well as four others (rs10852344; P=1.0×10−11) (Table 2).
We used three independent pathway-based methods to identify connections among our single marker associations, and link them with broader biological processes. While all three approaches (Ingenuity Pathway Analysis(IPA, see URLs), MAGENTA23 and GRAIL24) are based on published data, linking the gene products of our top hits to each other in functional pathways, each uses substantially different methodology and take different aspects of our results as input. Thus, we expect complementary results from the three approaches.
We used IPA(See URLs), to identify potential biological pathways common to the 17 replicated SNPs. Based on the genes physically nearest the 17 loci, four major functional networks were identified based on direct interactions only (Supplementary Table 4). Network 1, related to “lipid metabolism, molecular transport, and small molecule biochemistry”, contained 14 of the genes nearest to the menopause loci (P=1×10−30). Central to this network is the HNF4A gene, which is known to play a role in diabetes. Network 2, containing 12 of the input genes, relates to “cell cycle, cell death, and cancer” (P=1×10−24). The ESR1 gene is central in this network, suggesting that genes in this network influence or are influenced by estrogen signaling. Network 3, also in part related to “cell death”, including TNF and NFκB (P=1×10−19). Network 4 relates to “infection mechanism, DNA replication, recombination, and repair, and gene expression” (P=1×10−12). Interestingly, several of the input genes included in network 1 (EXO1, HELQ) and network 2 (UIMC1, FANCI, TLK1) are also involved in DNA repair mechanisms.
We used a gene set enrichment analysis (GSEA) implemented in MAGENTA23 to explore pathway-based associations using the full GWAS results. Three pathways reached study-wide significance (FDR < 0.05): exodeoxyribonuclease (P = 0.0005), NFκB signalling (P = 0.0006) and mitochondrial dysfunction (P = 0.0001) (Supplementary Table 5).
Finally, we used the GRAIL method of literature based pathway analysis24 to explore the connections between the genes near our top SNPs. Genes are considered related if they share informative words. GRAIL scores for genes associated with 3 of the replicating genome-wide significant SNPs were significant: EXO1, FKBPL, and BRSK1. When applied to a deeper set of 66 SNPs from the discovery meta-analysis with significance meeting FDR<0.05, 12 genes had significant GRAIL scores: EXO1, MSH6, PARL, RHBDL2, FKBPL, TP53BP1, TLK1, RAD54L, CHEK2, H2AFX, APEX1, REV3L. In addition, BRSK1 was borderline significant with GRAIL FDR=0.06.
Within the discovery GWAS 18,327 SNPs were within 60kb of the start and end of transcription of 125 candidate genes selected because of a reported relationship with ovarian function (Supplementary Table 6). After multiple testing correction, 101 SNPs in or near five of the candidate genes (DMC1, EIF2B4, FSHB, POLG, RFPL4A) were significantly associated with age at natural menopause after multiple testing correction. SNPs in or near four of these genes were already identified as genome-wide significant (EIF2B4, region 2a; RFPL4A, region 19b; POLG, region 15; and FSHB, region 11). For the other gene, DMC1, the most significant SNP was rs763121, with nominal P=1.6×10−7, (P=0.0009 corrected for candidate gene SNP analyses); age at natural menopause decreases by ~0.18 years per copy of the minor allele. DMC1 encodes for a protein that is essential for meiotic homologous recombination and is regulated by NOBOX, mutations in which can cause Primary Ovarian Insufficiency (POI)25–27
We examined overlap of our significant regions against published GWAS results for other traits (GWAS catalog; see URLs). Twelve menopause loci were within 1Mb of a previously published genome-wide significant SNP, but most of the co-localised SNPs were in low LD (0 < r2<0.21) with our SNP in the region (Supplementary Table 7). The exception was at the GCKR locus on chromosome 2. Region 2a (rs2303369) was correlated (r2 ≈0.5) with four different SNPs reported to influence kidney function, type 2 diabetes, continuous glycaemic traits, as well as serum albumin, C reactive protein, serum urate, and triglycerides. These results increase the observed clustering of signals in complex trait genetics, whilst also adding to the increasing pleiotropy observed at the GCKR locus.
In this large 2-stage GWAS we have confirmed four previously established menopause loci and identified and replicated 13 novel loci associated with age at natural menopause. Of these 17 hits, all but two are intronic or exonic to known genes. On average for associated SNPs in GWAS studies, 40% are intergenic, while only2% of our hits are intergenic. Further, our study finds two times more non-synonymous top hits as typically seen in GWAS (24% vs 12%)28. The 17 replicated loci function in diverse pathways including hormonal regulation, immune function and DNA repair. Together, they explained 2.5–4.1% of the population variation in menopausal age in independent replication samples. Biological pathway analysis of the genetic associations with age at natural menopause in this study using distinct algorithms and databases were in close agreement in emphasizing general biological pathways for mitochondrial function, DNA repair, cell cycle and cell death, and immune response.
Aging is thought to result from the accumulation of somatic damage29. Analysis of gene expression patterns in aging organs, such as heart and brain, identified changes in genes involved in inflammatory response, oxidative stress, and genome stability30, processes also identified in analysis of age-related changes in mouse oocytes, including changes in mitochondrial function31. Comparisons of lifespans across species showed that there is a general relationship between longevity and DNA repair function32. This notion is reinforced in the Werner and Bloom syndromes, which involve genome instability due to mutations in 3′–5′ DNA helicases of the RecQ family members, and are characterized by both premature aging and premature menopause33. Similarly, an increase in meiotic errors is associated with an age-related decline in oocyte quality, compounding the progress toward menopause due to follicle depletion34.
In the biological pathways analysis, seven candidate genes identified by proximity to the 17 genome-wide significant associations with age at natural menopause are related to DNA damage repair and replication (EXO1, HELQ, UIMC1, FAM175A, FANCI, TLK1, POLG, PRIM1) (Supplementary Table 4). The protein encoded by UIMC1 physically interacts with BRCA1 and estrogen receptor α (ERα) and is thought to recruit BRCA1 to DNA damage sites and to initiate G2/M checkpoint control. PRIM1 (Primase) is involved in DNA replication by synthesising RNA primers for Okazaki fragments during discontinuous replication35. A mutation in POLG can segregate with primary ovarian insufficiency (POI)36. Polg knock-in mice show reduced lifespan, premature ageing, and reduced fertility37. FANCI, as a second gene at the same locus adjacent to POLG, is a member of the Fanconi anemia (FA) complementation group. FA is a recessive disorder characterized by cytogenetic instability and defective DNA repair. FA patients experience irregular menstruations with menopause occurring around the age of 3038. The functional polymorphism correlated to our top hit in EXO1, has been associated with longevity in female centenarians39. Male and female Exo1 knock-out mice are sterile, because the gene is essential for male and female meiosis40. In addition to the GWAS regions in/near genes previously associated with early menopause, we investigated a panel of candidate genes identified prior to the study, and found a SNP near the meiotic recombination gene DMC1 to be significantly associated with age at menopause. How the DNA repair pathways contribute to menopause remains unclear. It is possible that with altered DNA repair mechanisms damage accumulates rendering poor quality oocytes for selection. On the other hand, the number of damaged follicles may increase with ageing, resulting in an increased rate of follicle loss through atresia. Next to this, our top hit in this study, one of the four known hits, is a non-synonymous SNP in MCM8, was not included in the IPA results, probably because the exact function of this protein is still unknown. The MCM family, however, is a key component of the pre-replication complex and its main function is to restrict DNA replication to one round per cell cycle41.
The pathway analyses highlighted additional candidate genes with functions in DNA repair (exodeoxyribonuclease), but with sub-genome-wide levels of significance for association with age of natural menopause. These 12 candidates (Supplementary Table 5) included the Werner (WRN) helicase gene, mutations in which cause Werner syndrome, a classic progeria with advanced ageing phenotype, and ovarian aging42. Estrogen can enhance WRN expression preventing cell senescence, suggesting its involvement in menopause43. The identification of DNA repair as one of the biological pathways involved in menopause may also provide an explanation for the association between smoking and an earlier age at menopause. Damage caused by smoking activates several different DNA repair mechanisms. Indeed, a polymorphism in Exo1, one of our top loci, has been associated with colorectal adenomas in smokers only44. Future studies will reveal whether smoking status modifies the association between age at natural menopause and polymorphisms in DNA repair genes, as has been observed for various cancers.
The pathways-based analysis also emphasized that genes related to auto-immune disease also influence age at natural menopause. This link has not been reported before, however, in a proportion (2–10%) of women with POI, ovarian auto-immunity can play a role45. In addition, POI is frequently associated with additional autoimmune diseases, such as type 1 diabetes mellitus 46. The top SNP in region 19a is near IL11, which binds the IL11 receptor alpha chain (IL-11Rα). Female mice with null mutations in IL-11Rα are infertile due to defective uterine decidualization, the process necessary for successful embryo implantation47. NLRP11 (region 19b) is a member of the NLRP family of genes that play important roles in the innate immune system and reproductive system. Several NLRP genes show an oocyte specific expression pattern 46, while NLRP5 has been implicated in POI, and serves as an autoantigen in a mouse model of autoimmune POI48,49. Many autoimmune conditions are associated with a particular HLA type, but no such association has been reported for POI50,51. One of our top menopause associations (rs1046089) is a missense substitution in BAT2 (HLA-B associated transcript), which is in the HLA class III complex on chromosome 6 and has been associated with type I diabetes mellitus and rheumatoid arthritis. Multiple phenotypes have been associated with BAT2 SNPs in GWAS studies including BMI, neonatal lupus, HIV control and height (Supplementary Table 7), but the SNPs have low correlation with our top hit. Expression data for rs1046089 demonstrated that the polymorphism was associated with altered expression of HLA-DRB4 in monocytes and HLA-DQA1 in lymphoblastoid cell lines (Table 2). Thus, this gene is an excellent candidate for a pro-inflammatory component to oocyte depletion affecting menopause age. Indeed, the enrichment of the genes involved in NFκB signalling (TNF, TNFRSF17, CSNK2B) in the biological pathways analysis suggests that susceptibility to inflammation, which often accompanies immunosenescence in aging, may also affect ovarian aging. The finding that the innate immune response can be upregulated in response to DNA damage52 suggests that interplay between the two main pathways we have identified (DNA repair and inflammation), may contribute to variation in the age at natural menopause.
Three of the 17 regions can be linked to hormonal regulation, an additional route to follicle pool exhaustion. The top region 11 SNP (rs12294104) is in high LD with SNPs in FSHB (r2=0.92, Table 2), a gene which limits the rate of production of follicle stimulating hormone (FSH), a key pituitary expressed hormone that stimulates maturation of follicles. FSH-deficient female mice are infertile53. Transgenic mice with FSH over-expression show premature infertility due to postimplantation reduction of embryo-fetal survival54. FSH concentrations rise in women approaching menopause, which might be related to a decrease in growing follicles55. Mutations in FSHB cause hypogonadism and primary amenorrhea in women56 and raised FSH levels and infertility in males57. The latter observation is due to a promoter polymorphism that may be causal58 and is in LD (r2 = 0.7) with our most significant SNP. Although STAR, encoding steroidogenic acute regulatory protein (StAR) was not the nearest gene to the top SNP in region 8 (rs2517388), its functional role in cleavage of cholesterol to pregnenolone in response to tropic hormones makes it the likely functional candidate, and our top SNP is in high LD with SNPs in that gene (r2=0.81, Table 2). Pregnenolone is a precursor for several steroid hormones, such as estrogen and progesterone, and mutations in the StAR gene are associated with congenital lipoid adrenal hyperplasia and POI59. Furthermore STAR is a target of FOXL2, for which truncating mutations are preferentially associated with POI60. Similarly, known biological function suggests that BCAR4, encoding the breast cancer antiestrogen resistance 4 protein, is the best candidate gene near region 16. BCAR4 is expressed only in placenta and oocytes and may play a role in hormonal stimulation in the ovary. In breast cancer treatment, tumours expressing higher levels of BCAR4 are more resistant to tamoxifen treatment61, reinforcing its role in transduction of hormonal signals.
In summary, our findings demonstrate the role of genes which regulate DNA repair and immune function, as well as genes affecting neuroendocrine pathways of ovarian function, indicating the process of ageing as a shared player in both somatic and germ line ageing.
We expect a substantial number of additional common variants with small effects on ANM are yet to be identified, and that many of them will be in genes that are in the pathways identified in this study. Sequencing and exome chip studies to determine whether low frequency and rare variants of large effect also contribute to ANM are underway or being planned in many of the cohorts involved in this GWAS. A collaboration of several consortia is currently examining the contribution of common genetic variants to ANM in African American (AA) women, and will allow us to determine whether the genetic variation that affects ANM in AA women are the same or substantially different than for women of primarily European descent. We are currently conducting a study of women with primary ovarian insufficiency to determine whether the variants that are associated with ANM within the normal range also contribute to disease conditions related tothe early menopause phenotype.
Age at natural menopause was defined as the age at the last menstrual period which occured naturally with at least 12 consecutive months of amenorrhea. This analysis included women with natural menopause between the ages of 40 and 60 years. Women of self-reported non- European ancestry were excluded, as were women with menopause due to hysterectomy and/or bilateral ovariectomy, or chemotherapy/irradiation, if validated by medical records, and women using HRT before menopause. Most cohorts collected the age at natural menopause retrospectively; in the Framingham Offspring, ARIC, NHS and WGHS studies, some women became menopausal under study observation. Study specific questions, mean age at menopause and age at interview are shown in Supplementary Table 1. Genotyping and imputation information for the discovery cohorts are shown in Supplementary Table 2. Descriptions of each study are in the Supplementary Note.
A total of 14,435 women from 21 studies meeting the same inclusion and exclusion criteria as the discovery analysis women were included in the replication analysis. The women had mean and standard deviation of age at natural menopause similar to the discovery set (Supplementary Table 1). Genotyping and imputation methods for the in silico replication cohorts are shown in Supplementary Table 2. Genotyping information for the studies that genotyped the SNPs de novo is shown in Supplementary Table 2. Descriptions of each study are in the Supplementary Note.
The 19 independent genome-wide significant SNPs were tested for association with age at natural menopause using linear regression models.
Inverse variance weighted meta-analysis of the studies was performed using METAL using genomic control62. A SNP within a study was omitted if the minor allele frequency (MAF) was < 1% or imputation quality score was < 0.2. The discovery meta-analysis included 2,551,160 autosomal SNPs and 38,968 samples.
For each of the genome-wide significant menopause SNPs (Table 1), all proxy SNPs with r2>0.8 were determined in HapMap CEU release 22. Each SNP and its proxies were searched against a collected database of expression quantitative trait locus (eQTL) results, including the following tissues: fresh lymphocytes63, fresh leukocytes64, leukocyte samples in individuals with Celiac disease65, lymphoblastoid cell lines (LCL) derived from asthmatic children66, HapMap LCL from three populations67, a separate study on HapMap CEU LCL68, fibroblasts, T cells and LCL derived from cord blood69, two studies on peripheral blood monocytes70,71, CD4+ lymphocytes72, adipose and blood samples73, two studies on brain cortex70,74, three large studies of brain regions including prefrontal cortex, visual cortex and cerebellum (Emilsson, personal communication), a study of cerebellum, frontal cortex, temporal cortex and caudal pons75, a separate study on prefrontal cortex76, liver77, and osteoblasts78. The collected eQTL results met criteria for statistical significance for association with gene transcript levels as described in the original papers. Table 2 provides a summary of eQTL findings for replicated GWAS SNPs.
On each chromosome, the lowest p-value SNPs that met genome-wide significance were identified. Genome-wide significant SNPs more than 250,000 base pairs and less than 1 million base pairs apart that also had pairwise HapMap CEU linkage disequilibrium values of r2<0.5 were considered potentially independent regions. Potential independent regions that were within 1 million base pairs of a second region with more significant p-value were tested for independence using conditional analysis. In this analysis, the most significant SNP in the most significant region on each chromosome was used as a covariate in a genome-wide analysis. The second region on the chromosome was then re-tested for independent association.
Ingenuity pathway analysis (IPA) Knowledge Base 8.8 (see URLs) was used to explore the functional relationship between proteins encoded by the 17 replicated menopause loci. The IPA Knowledge Base contains millions of findings curated from the literature. All reference genes (n=61) within 60kb potentially encoded by the 17 loci (Table 2) were entered into the Ingenuity database. Fifty-one genes were eligible for pathway analysis. These eligible ‘focus genes’ were analyzed for direct interactions only. Networks were generated with a maximum size of 35 genes and shown as graphical representations of the molecular relationships between genes or gene products. Proteins are depicted as nodes in various shapes representing the functional class of the protein. Lines depict the biological relationships between nodes. To determine the probability of the analyzed gene to be found together in a network from Ingenuity Pathways Knowledge Base due to random chance alone, IPA applies a Fisher’s exact test. The network score or p-value represents the significance of the focus gene enrichment. Enrichment of focus genes to diseases and functional categories was also evaluated in the IPA Knowledge Base. The p-value, based on a right-tailed Fisher’s exact test, considers the number of identified focus genes and the total number of molecules known to be associated with these categories in the IPA knowledge database.
MAGENTA was used to explore pathway-based associations in the full GWAS dataset. MAGENTA implements a GSEA-based approach, the methodology of which has been previously described23. Briefly, each gene in the genome is mapped to a single index SNP with the lowest P value within a 110 kb upstream, 40 kb downstream window. This P value, representing a gene score, is then corrected for confounding factors such as gene size, SNP density and LD-related properties in a regression model. Genes within the HLA region were excluded from analysis due to difficulties in accounting for gene density and LD patterns. Each mapped gene in the genome is then ranked by its adjusted gene score. At a given significance threshold (95th and 75th percentiles of all gene scores), the observed number of gene scores in a given pathway, with a ranked score above the specified threshold percentile, is calculated. This observed statistic is then compared to 1,000,000 randomly permuted pathways of identical size. This generates an empirical GSEA P value for each pathway. Significance was determined when an individual pathway reached a false discovery rate < 0.05 in either analysis (Supplementary Table 5). In total, 2,580 pathways from Gene Ontology, PANTHER, KEGG and Ingenuity were tested for enrichment of multiple modest associations with age at natural menopause.
GRAIL is designed to provide evidence for related biological function among a set of candidate genes. The method is based on connections between gene names and informative words extracted from PubMed abstracts by automated language processing techniques. Genes are considered related, and achieve a high similarity score, if they share informative words. For this analysis, the input for GRAIL was a list of candidate SNPs associated with age at natural menopause. From among candidate genes mapping near the candidate SNPs, GRAIL identifies genes with associated informative words that are significantly similar to informative words from other candidate genes. Genes with significant similarity scores are thus consistent with the set of candidate genes as a whole in having greater sharing of informative words than expected by chance, which may suggest shared biological functions or even biological pathways. GRAIL was first applied to the lead SNPs from each of the replicating genome-wide significant loci using the database of genes and informative words from 2006. Separately, GRAIL was applied to a list of 66 SNPs, one from each locus that had at least one SNP meeting a false discovery rate (FDR) threshold of 0.05 from the QVALUE software in R79. For the age at natural menopause meta-analysis, the FDR < 0.05 threshold implied a p-value < 2.8×10−5.
We explored the association of natural age of menopause with 125 candidate genes selected because of a reported relationship with ovarian function, including animal models where gene mutations affect ovarian function (N=37), human studies of menopause or isolated primary ovarian insufficiency (N=48), syndromes which include ovarian failure (N=4), or genes expressed in the ovary or female germ cells (N=38) (see Supplementary Table 6). For each gene, the start and end of transcription was defined by the transcripts that span the largest portion of the genome. NCBI36/hg18 positions taken from the UCSC genome browser were used to define gene and SNP locations. Using the correlation measured from a set of ~850 independent Framingham Heart Study participants, we computed the effective number of independent SNPs for each chromosome 80, and used the total (5,774) in a Bonferroni correction for multiple testing.
Allelic pleiotropy was explored by comparing the genome-wide significant menopause signals to the online catalogue of published genome-wide association studies (GWAS catalog; see URLs). All reported associations that reached P<5×10−8 and were within 1Mb of the menopause signal were considered. LD estimates between the SNP pairs were assessed using HapMap (CEU, release#27). The results are presented in Supplementary Table 7.
The authors are very grateful to the study participants and staff from all cohorts involved in this study. Extended acknowledgements per cohort can be found in the Supplement.
GWAS catalog: www.genome.gov/gwastudies/,
Author ContributionsIndividual Study design/management: DIC, CHe, EMB, PK, JSB, MBo, EB, HC, SJC, UdF, IJD, GVZD, SE, JGE, LuF, ARF, PG, CJMWvG, CG, DEG, PH, SEH, AH, EI, SLRK, DAL, PKEM, MMar, NGM, VM, NCOM, GPa, ANP, NLP, PHMP, OP, BMP, KR, IR, JMS, RS, VJMP, VG, GE, TBH, LJL, YVS, AT, YTvdS, PMV, GWa, HEW, JFW, BHRW, LL, DIB, JEB, LC, FBH, DJH, GWM, BO, PMR, DS, TDS, KSte, EAS, MU, CMvD, HV, SB, ASa, AMe. Data collection: BZA, SB, LBL, JSB, MBo, AB, HC, PdA, UdF, IJD, GVZD, SE, JGE, LiF, LuF, ARF, DEG, PH, SEH, ACH, AH, ACJWJ, IK, SL, DAL, PKEM, NGM, IMK, ABN, NLP, PHMP, OP, BMP, KR, IR, ASc, SNS, JMS, MGS, US, BT, LT, SU, VJMP, VG, GE, YTvdS, CHvG, JMV, GWa, HEW, AFW, TZ, LZ, MCZ, MZ, LL, AMA, JEB, EAS, AGU, JMM, AMe. Genotyping: JK, LS, TE, EB, FJB, SJC, PdA, GD, PD, CHa, ACH, JLi, BCJMF, SEM, VM, PN, DRN, NCOM, AP, ANP, BMP, JIR, ASi, KSti, AT, KT, YTvdS, MV, EW, TZ, LL, FBH, GWM, BO, UT. Genotype prep: JRBP, MBa, NF, EP, SYS, WVZ, LBL, PdA, GD, CJMWvG, CG, CHa, EI, SLRK, PKEM, IMK, CM, SEM, PN, DRN, NCOM, AP, GPi, ER, CS, JAS, HS, NS, YVS, AT, MV, EW, TZ, LL, FBH, DJH, SS, KLL, MH. Phenotype preparation: JRBP, LS, CHe, MMan, MBa, LB, EMB, FE, NF, DFG, JJH, PK, GZ, WVZ, LBL, MBo, EJCdG, IJD, GVZD, MGS SE, JGE, CHvG, LiF, KF, MH, CHa, EI, ACJWJ, SLRK, IK, JLa, SL, TL, DAL, LML, PKEM, IMK, CM, PHMP, GPi, OP, ER, CS, ASc, JAS, BT, SU, RMvD, VG, GE, TA, YTvdS, HW, GWi, BHRW, AFW, MCZ, LL, AMA, EWD, AMe, KLL, JMM. Analysis plan development: LS, DIC, CHe, EMB, PK, TC, PG, DK, DPK, NGM, DT, DJH, GWM, KLL, JMM, AMu. Analysis plan review: LS, DIC, CHe, AVS, ADC, NG, DK, DPK, BM, ABN, BMP, AMA, DJH, KLL, JMM, AMu. Study Data Analysis: LS, DIC, CHe, MMan, MBa, LB, EMB, FE, TE, NF, DFG, JJH, PK, PFM, EP, SYS, AVS, SvW, GZ, WVZ, EA, CC, MCC, TC, NG, TH, CHa, ZK, JLa, DAL, LML, DM, NCOM, GPa, GPi, ASP, VE, ER, JAS, MGS, SU, YTvdS, MV, LMY, LZ, SS, UT. Reviewed/Interpret analyses: JRBP, LS, DIC, CHe, PS, MBa, EMB, NF, PK, PFM, AVS, BZA, SJC, ADC, IJD, SEH, EI, DK, SLRK, DPK, LML, PKEM, NGM, GPa, ASP, JAS, HS, LMY, JEB, EWD, FBH, DJH, GWM, PMR, CMvD, HV, KLL, JMM, AMu. Meta analyses: LS, PS, KLL. Pathway/other analyses: JRBP, DIC, CHe, ADJ, MMan, GPa, JAV. Meno Study Design: JRBP, LS, DIC, CHe, DEG, PHMP, YTvdS, CHvG, AMe, KLL, JMM, AMu. Manuscript Preparation: JRBP, LS, DIC, CHe, TE, NF, ADC, DK, DPK, EWD, AMe, KLL, JMM, AMu, JAV. Manuscript review: JRBP, LS, DIC, CHe, NF, PK, PFM, AVS, BZA, TA, LBL, JSB, MBo, FJB, HC, SJC, CC, MCC, ADC, GD, UdF, IJD, GE, BCJMF, MEG, NG, DEG, PH, SEH, CHa, EI, DK, DPK, SLRK, JAS, IK, ZK, TL, TE, ASa, AMe, JSEL, JLi, LML, YVL, PKEM, IMK, BM, VM, PN, ABN, NCOM, GPa, ANP, NLP, PHMP, OP, BMP, JIR, IR, HS, JMS, RS, AT, RMvD, YTvdS, CHvG, PMV, MV, GWa, JFW, BHRW, AFW, LMY, TZ, LZ, MCZ, VJMP, LL, AMA, DIB, JEB, EWD, VG, TBH, FBH, DJH, LJL, PMR, TDS, EAS, HV, KLL, JMM, AMu, JAV. Oversaw Consortium: KLL, JMM, AMu, JAV.
Some of the authors declare competing financial interests: details accompany the full-text HTML version of the paper