|Home | About | Journals | Submit | Contact Us | Français|
To identify loci for age at menarche, we performed a meta-analysis of 32 genome-wide association studies in 87,802 women of European descent, with replication in up to 14,731 women. In addition to the known loci at LIN28B (P=5.4×10−60) and 9q31.2 (P=2.2×10−33), we identified 30 novel menarche loci (all P<5×10−8) and found suggestive evidence for a further 10 loci (P<1.9×10−6). New loci included four previously associated with BMI (in/near FTO, SEC16B, TRA2B and TMEM18), three in/near other genes implicated in energy homeostasis (BSX, CRTC1, and MCHR2), and three in/near genes implicated in hormonal regulation (INHBA, PCSK2 and RXRG). Ingenuity and MAGENTA pathway analyses identified coenzyme A and fatty acid biosynthesis as biological processes related to menarche timing.
Menarche, the onset of first menstruation in girls, indicates the attainment of reproductive capacity and is a widely used marker of pubertal timing. Age at menarche varies widely between girls and is highly dependent on nutritional status1. Early menarche is associated with several adverse health outcomes, including breast cancer2, endometrial cancer3, obesity4, type 2 diabetes5 and cardiovascular disease6, and also shorter adult stature4. Studies of twins and extended families, albeit largely performed in populations free of nutritional deprivation, estimate that around 50% of the variance in menarche timing is attributable to genetic factors in such settings7.
Recently, common variants in LIN28B were associated with age at menarche in four independent genome-wide association studies (GWAS)8–11. LIN28B is a human homolog of lin-28 in C. elegans, which controls its rate of progression from larval stages to adult cuticle formation, indicating the possible conservation of specific micro-RNA regulatory mechanisms involved in developmental timing9. A second menarche locus was identified in an intergenic region at 9q31.28,10. These two loci together explained only 0.6% of the variance in age at menarche8. We anticipated that a much larger GWAS would substantially increase the yield of loci associated with age at menarche.
Here we report a much expanded meta-analysis of GWAS for age at menarche. By combining data from the previous studies8–11 plus several further studies to form the ReproGen Consortium, we have identified at least 30 novel loci associated with age at menarche at genome-wide significance levels. Our findings demonstrate a close link between the genetic regulation of energy homeostasis and pubertal timing, and suggest the presence of other diverse pathways.
This expanded GWAS includes data from 32 cohorts of European ancestry (N=87,802). In most studies age at menarche was determined by self-recall and the mean age at menarche in individual studies ranged from 12.4 to 13.6 years, excluding individuals with menarche <9 and >17 years (Methods, Supplementary Table 1 and Supplementary Note). Genome-wide SNP genotyping was performed using a variety of different platforms (Supplementary Table 2 and Supplementary Note). Therefore, after applying standard quality control measures, we imputed the genotypes for ~2.5 million autosomal SNPs in the HapMap CEU sample using Build 35/36 to allow inverse variance meta-analysis of additive genetic association results from each study. We also meta-analysed results from X chromosome SNPs in studies which had this data available (N=52,781). Test statistics from each cohort were adjusted using genomic control to avoid inflation of results due to population stratification.
There was strong deviation from the uniform distribution of P-values expected under the null hypothesis (Supplementary Fig. 1). This deviation was attenuated, but persisted, following removal of those signals associated with the two previously identified loci. In total, 945 SNPs representing 45 loci (r2 <0.05 based on HapMap in a 750 kb region) were associated with age at menarche at genome-wide significance levels (P<5×10−8) (Fig. 1; Supplementary Fig. 2). None of these were located on the X chromosome. These 45 loci included three apparent second signals (defined as two genome-wide significant SNPs in low LD (r2<0.05) in the same 750 kb region) at 2q33.1, 6q21 and 14q32.2. The second signal at 6q21 (rs314279) had a low minor allele frequency (MAF, 6%) and was not present in many studies. We therefore genotyped this SNP de novo in the InChianti cohort and found it was in LD with the top chromosome 6 signal (rs7759938, r2=0.3). In HapMap the r2 between the two chr 6 SNPs was 0.015, but the D’ was 1.0. To verify the independency of additional loci, we performed a conditional analysis and meta-analysis of all 32 studies using the top SNP at all the 42 genome-wide significant regions as covariates (in addition to birth year). In these conditional analyses the possible second signals on chromosomes 2 and 14 showed strong but sub-genome-wide significant associations with age at menarche (P<7.1×10−6), suggestive of, but not confirming, second independent signals in these two regions (Fig. 1; Supplementary Table 3).
The two most significant loci for age at menarche confirmed the previously reported associations at LIN28B (rs7759938; P=1.6×10−58) and 9q31.2 (rs2090409; P=4.4×10−33) (Table 1; Supplementary Fig. 3). In addition, there were genome-wide significant signals for a further 40 possible novel loci, of which 30 survived a second more stringent correction for the overall genomic control in the Stage 1 cohorts (λ=1.173) (Table 1; Fig. 1; Supplementary Fig. 3).
We sought confirmation of the 40 possible novel menarche loci in up to 14,731 women from 16 additional studies with in silico GWAS data and new genotyping data from one cohort (Supplementary Tables 4 & 5). This replication sample was substantially smaller than our Stage 1 sample, and therefore underpowered to confirm individual SNP associations (Supplementary Fig. 4). Nonetheless 37 of the 40 possible novel loci showed directionally consistent associations in both stages (Table 1; binomial sign test: P=9.7×10−9). A combined meta-analysis of the more stringent 2nd GC corrected Stage 1 results and replication cohorts gave confirmatory evidence for 30 novel menarche loci, leaving 10 unconfirmed possible menarche loci (Table 1; Fig. 1).
Based on combined Stage 1 plus replication results, the estimated magnitudes of per allele effects for the novel menarche loci ranged from 4.5 to 2.1 weeks per allele (Table 1) and demonstrated an inverse relationship with MAF (Supplementary Fig. 5). Among the four largest in silico replication cohorts (each comprising >800 women) the variance in age at menarche explained by all 42 known, confirmed and possible novel menarche loci ranged from 3.6% to 6.1% (Supplementary Table 6).
The strongest novel menarche signal was for rs1079866 (3.9 weeks/minor allele; 95% CI 2.9 to 5.0, P=5.5×10−14), approximately 250kb downstream of INHBA, which encodes the protein subunit Inhibin beta A. Heterodimers of Inhibin beta A and the Inhibin alpha subunit form the female reproductive hormone Inhibin A12. Inhibin A, produced by granulosa cells in the ovary, increases dramatically during pubertal development in girls13,14 and is involved in negative feedback regulation by inhibiting production of follicle stimulating hormone (FSH) by the pituitary and secretion of gonadotrophin releasing hormone (GnRH) from the hypothalamus15. Conversely, homodimers of Inhibin beta A form the hormone Activin A, which stimulates pituitary FSH production, and also exhibits a wide range of biological activities including the regulation of cellular proliferation and differentiation16.
The second strongest novel signal was for rs466639 (P=1.3×10−13); this SNP is intronic in RXRG, which encodes retinoid X receptor gamma, a nuclear receptor that forms dimers with the receptors for retinoic acid, thyroid hormone, and vitamin D, increasing both DNA binding and transcriptional function on their respective response elements17.
Four novel loci for menarche were previously identified by GWA studies for adult BMI18–20; these were rs9939609 (in/near FTO, P=3.1×10−8), rs633715 (SEC16B, P=2.1×10−8), rs2002675 (TRA2B, ETV5, P=1.2×10−9) and rs2947411 (TMEM18, P=1.7×10−8). Apart from rs2002675, these menarche signals were either identical to or in tight LD (r2>0.9) with those BMI loci, and in all cases the BMI-increasing allele was associated with earlier menarche. Variants at these four loci have also been associated with childhood BMI18–20, and these findings support a likely causal effect of childhood BMI on earlier pubertal timing.
Three novel menarche loci were found in/near further genes implicated in the regulation of energy homeostasis and body weight in animal models: rs6589964 (P=1.9×10−12) lies ~18kb from BSX; rs10423674 (P=5.9×10−9) is intronic in CRTC1; and rs4840046 (P=2.4×10−8) lies ~160kb from MCHR2. BSX encodes a DNA binding protein and transcriptional activator. In the mouse Bsx is expressed specifically in the pineal gland, telencephalic septum, hypothalamic pre-mammillary body and arcuate nucleus, and is necessary for postnatal growth, locomotory behaviour, expression of the genes Npy and Agrp, and for the hyperphagic phenotype in leptin deficiency21. CRTC1 encodes the CREB regulated transcription coactivator 1, an activator of cellular gene expression. Crtc1(−/−) mice are hyperphagic, obese and infertile, and females have low circulating luteinizing hormone levels22. Leptin potentiates the effects of Crtc1 transcriptional activity, and Crtc1 over-expression in hypothalamic cells increases expression of Kisspeptin, which in turn activates secretion of gonadotrophin releasing hormone Kiss1 gene. MCHR2 encodes the melanin concentrating hormone receptor 2, an orphan G protein-coupled receptor which shows high affinity binding to the hypothalamic neuropeptide melanin-concentrating hormone (MCH), which regulates nutrient intake and energy homeostasis via MCHR123. Furthermore, MCH directly inhibits GnRH neurons and thereby links energy balance to reproduction24.
rs852069 (P=3.3×10−8) lies ~84kb from PCSK2, which encodes proprotein convertase subtilisin/kexin type 2, an enzyme that cleaves latent precursor proteins, such as proinsulin and proopiomelanocortin, into their biologically active products. While rare deleterious mutations and common variants in PCSK1 are known to influence obesity risk, it is notable that PCSK2 differs from PCSK1 in that it additionally cleaves proluteinizing-hormone-releasing hormone and could therefore have a more direct influence on the reproductive hormone axis.
Remaining novel menarche loci were found in or near to genes that are involved in a seemingly diverse range of biological functions (Supplementary Table 7). We used Ingenuity Pathway Analysis (IPA) to identify potential biological pathways common to these identified loci. Based on direct interactions only, we identified two significant functional networks containing 16 and 11 genes, respectively, of those genes nearest to the novel menarche loci (Supplementary Fig. 6). Network 1, related to “Gene Expression, Cellular Growth and Proliferation, Cellular Function and Maintenance”, covers a wide and non-specific range of biological pathways. Functions in Network 2 relate to “Lipid Metabolism, Small Molecule Biochemistry and Molecular Transport” (Supplementary Table 8). Central to Network 2 are RXRG and several genes involved in fatty acid biosynthesis, including several fatty acid binding proteins and ACSL1, which encodes a enzyme that converts free long-chain fatty acids into fatty acyl-CoA esters.
To identify potential further biological pathways that influence menarche timing, we used a gene set enrichment analysis (GSEA) approach in MAGENTA, in which each gene in the genome is assigned an adjusted score that represents its association with age at menarche, and predefined pathways are tested for enrichment of multiple associations (see Methods). The most significant pathway was the biosynthesis of Coenzyme A, which is a carrier of acyl groups and is necessary for pyruvate oxidation and fatty acid synthesis and oxidation (Supplementary Table 9).
We explored the potentiallyfunctional impacts of our novel menarche loci in order to identify their likely genetic mechanisms. In addition, by particularly focusing on those groups of SNPs that have been identified as functional, we aimed to identify possible further menarche loci, which did not reach genome-wide significance in our primary meta-analysis.
Using data from a recent genomic map of Copy Number Variation (CNV)25, we established that none of the 42 known, confirmed or possible novel menarche loci were related to CNVs. Next we explored the 1,052 CNV tagging SNPs for association with age at menarche in our GWAS sample. Only one tag SNP was associated with age at menarche after Bonferroni correction (rs3101336, P=3×10−7; Supplementary Fig. 7). This SNP tags a CNV near to the NEGR1 gene locus, which has been previously associated with body mass index20.
None of the 42 known, confirmed or possible novel menarche variants were amino acid changing. However two were in strong LD (r2≥0.8) with non-synonymous (ns) variants. rs1862471 (intronic in OLFM2 at 19p13.2) is in LD (r2=0.8) with rs2303100, which encodes an Arg/Gln residue change in OLFM2. Secondly, rs4929923 (3′UTR of TRIM66 at 11p15.4) is in LD (r2=0.92) with rs11042023, which encodes a His/Arg residue change in TRIM66.
To identify possible further menarche loci we then explored the set of 12,062 ns-SNPs for association with age at menarche in our GWAS sample. Outside of the already associated regions, three ns-SNPs were associated with age at menarche after correction for multiple testing (Bonferroni threshold for 12,062 independent tests was P<4.1×10−6). These ns-SNPs were rs1254319 in C14orf39 (P=1.9×10−7), rs7653652 in C3orf38 (P=1.4×10−6) and rs913588 in JMJD2C (P=3.3×10−6).
Three of the 42 known, confirmed or possible novel menarche variants were highly significantly cis-associated with mRNA expression (P<1×10−6 for mRNA transcript abundance), based on publicly available data from lymphoblastoid cell lines on 400 children (mRNA by SNP Browser). These transcripts were in GAB2 (associated with rs10899489), RBM6 (rs6762477) and NARG2 (rs3743266) (Supplementary Table 10). As these genomic loci included a number of genes (Supplementary Fig. 3), these specific transcript associations inform the likely functional gene at each locus.
Given the likely close biological interaction between the regulation of age at menarche and adiposity, we hypothesized that adipose tissue eSNPs might show a preponderance of associations with age at menarche. Of the 5,184 adipose eSNPs identified in the Icelandic Family Adipose cohort26, 23 were significantly associated with age at menarche after correction for multiple testing (using a 1/n P-value threshold for 5,184 independent tests (P<1.9 × 10−4)) (Supplementary Table 11). Of these adipose eSNPs, rs10835211 (menarche P-value=9.4×10−6) is near BDNF, which is a BMI locus and is implicated in eating behavior and body weight regulation27,28. rs7160413 (menarche P-value=2.2×10−5) is near DLK1, a gene implicated in early onset puberty29. rs133934508 (menarche P-value=3.6×10−5) is associated with expression of PITX1, which encodes a pituitary transcriptional regulator30.
Candidate gene studies for age at menarche have largely focussed on genes involved in sex steroid-hormone biosynthesis and metabolism, highlighted through animal models or human cases with extreme delayed puberty or hypogonadotrophic hypogonadism31. We examined 8,770 SNPs in 16 candidate genes31–33 and their surrounding regions (+/−300kb) for association with age at menarche in our GWA meta-analysis sample (Supplementary Table 12). SNPs in the regions of TAC3R (top hit: rs17034046; P=3.4×10−7, ~19kb upstream of TAC3R) and ESR1 (top hit: rs9383922; P=2.2×10−6, 110kb upstream of ESR1) were significantly associated with age at menarche after correction for multiple testing (Bonferroni threshold for 8,770 independent tests was P<5.7×10−6). Rare deleterious mutations in TAC3R, encoding a receptor for Neurokinin B, and in its ligand TAC3 have been found in families affected by hypogonadotropic hypogonadism and pubertal failure31. ESR1 encodes an estrogen receptor that is essential for sexual development and reproductive function, and polymorphisms in ESR1 have previously been nominally associated with age at menarche33.
Family studies have suggested a substantial co-inheritance of the timing of puberty and BMI34, and this is supported by our finding of four established BMI variants among our novel menarche loci. We therefore systematically assessed whether established loci for adiposity-related traits (BMI, waist-hip ratio (WHR) and obesity), and adult height, were also associated with age at menarche. Nine of the 12 BMI loci and 2 of the 4 WHR loci tested were associated with age at menarche (Table 2 and Supplementary Table 13). In all cases the BMI- or WHR-increasing allele was associated with earlier menarche, which is consistent with the direction of association in epidemiological studies35. 11 of the 44 adult height loci were associated with age at menarche (Table 3 and Supplementary Table 14). However, for 7 of these loci, the adult height-increasing allele was associated with earlier menarche, which is in the opposite direction to the association in individual-level epidemiological studies35.
We then assessed the relevance of our novel menarche loci to adult BMI and height by exploring in silico data from the GIANT consortium. Nine of the 42 menarche loci were associated with adult BMI (at P<0.05; N=32,530); in all cases the allele associated with higher BMI was associated with earlier menarche (Supplementary Table 15). Eighteen of the menarche loci were associated with adult height (at P<0.05; N~130,000); although for three of these the direction of effect was opposite to that predicted from epidemiological studies (Supplementary Table 16). Despite these joint associations with body size, in ALSPAC mothers the combined influence of the menarche loci on age at menarche appeared to be completely unattenuated following adjustment for adult height and BMI (Supplementary Table 17) suggesting that in general these menarche loci have direct effects on age at menarche. However, we acknowledge that further large studies with childhood growth data are needed to establish the causal directions of effect of these loci.
In a large GWAS meta-analysis comprising over 87,000 women, we identified 30 novel loci for the timing of menarche, and provide evidence for a further 10 possible novel loci. These loci were in/near genes associated with cellular development, body weight regulation, hormonal regulation and with a wide varietyof other biological functions. Previous studies comprising up to 17,510 women had detected only one or two genome-wide significant signals8–11. We now show that those earlier signals at LIN28B and 9q31.2 represented the ‘low-hanging fruit’ with particularly large effect sizes relative to their MAF (Supplementary Fig. 5). The list of functions of those genes nearest to the menarche loci (Supplementary Table 7) and the results of pathway analyses indicate a wide diversity of biological processes that regulate the timing of female pubertal maturation.
Among the confirmed novel menarche loci were several loci implicated in body weight regulation, including four loci with established associations with BMI (in/near FTO, SEC16B, TRA2B and TMEM18). Furthermore, our systematic analysis of established BMI-related SNPs showed that the majority of alleles related to higher BMI and waist-hip ratio also showed at least nominal associations with earlier menarche (Table 2). It is noteworthy that three novel menarche loci are in/near genes implicated in energy homeostasis in animal models (BSX, CRTC1, and MCHR2). In the GIANT consortium data, we did not detect any associations between these loci and adult BMI however the BSX and MCHR2 loci were nominally associated with adult height. In order to robustly investigate whether menarche loci have pleiotropic effects on growth, or whether the association with menarche timing is driven through increased adiposity, measures of body fatness prior to menarche, or even prior to the onset of puberty would be required, but were unavailable in most studies. Further functional studies of these novel menarche loci may also help to clarify the biological mechanisms linking these traits. In addition to influencing the timing of pubertal initiation, sufficient adiposity is also required for the maintenance of normal hypothalamic-pituitary-gonadal function, via signalling by adipocytokines such as leptin36.Our pathway analyses highlighted Coenzyme A and fatty acid biosynthesis as biological pathways related to menarche timing. Hypothalamic levels of long-chain fatty acyl-Coenzyme As have been shown to regulate rodent feeding behaviour and glucose homeostasis37 and genetic variants in this pathway could therefore potentially alter central nutrient sensing.
Earlier age at menarche is related to shorter adult stature in large epidemiological studies35. We found that several adult height-increasing alleles were also associated with age at menarche (Table 3), but at different loci these alleles were associated with either earlier or later menarche. These paradoxical associations suggest a complex inter-play between growth and pubertal timing. Earlier menarche is associated with taller, rather than shorter, childhood height, and there are likely separate causal effects of rapid linear growth on earlier puberty, and of earlier pubertal maturation on earlier growth plate fusion and cessation of growth.
While our pathway analyses strongly identified potential new biological pathways involved in pubertal timing, we acknowledge that the ability to assign putative functions to these menarche loci is substantially limited by the lack of identification of the causal variant at each locus. Many of the strongest associated SNPs were located 100’s of kb distant to the nearest gene, and some menarche loci contained several plausible genes. Indeed, none of the top signals represented non-synonymous SNPs, and only two were in LD with such variants (in OLFM2 and TRIM66). Use of eQTLs helped to identify the likely causal genes (GAB2, RBM6 and NARG2) at three menarche loci that spanned multiple genes. However, much future work will be required to identify the causal variants and implicated genes related to these menarche loci.
Despite the large size of our meta-analysis and the substantial increase in the number of menarche loci, these together explained between 3.6–6.1% of the variance in age at menarche, equivalent to 7.2–12.2% of its heritability. The majority of menarche loci had estimated effect sizes of between 2 to 3 weeks per allele. Assuming the presence of many true menarche SNPs with an effect size of 2 weeks per allele, even our large meta-analysis would only have had sufficient power to detect half of those SNPs with a MAF of 50%, and only 1 in 10 of those SNPs with MAF of 10% (Supplementary Fig. 8).
We corrected for population stratification by applying the genomic control method38 to each of the individual study results. When we applied a more stringent second correction for the overall genomic control inflation factor across all 32 studies, 10 of the 40 possible novel menarche variants fell below genome-wide significance (Fig. 1; Table 1). However, our subsequent finding of confirmatory evidence (P<0.05) even in our limited replication studies for 4 of these 10 variants (in/near TRIM66, TMEM108, TMEM18 and NFAT5) suggests that the second correction for genomic control is likely to be over-conservative.
Our identification of strong associations with SNPs near to the candidate genes TAC3R and ESR1 supports the likely presence of further menarche loci, which did not meet the genome-wide significance threshold. Systematic assessment of functional genetic variants identified several further putative menarche loci. rs3101336, which tags a CNV near the BMI locus NEGR1, showed strong, but sub-genome-wide significant, association with age at menarche (P=3×10−7). Exploration of adipose tissue eQTLs also identified further putative menarche loci related to genes implicated in eating behaviour (BDNF), precocious puberty (DLK1) and pituitary function (PITX1). It has been suggested that lower levels of statistical significance may be applied to variants with prior biological candidacy, however this must be balanced against the desire to avoid false positives, and we suggest that these putative menarche loci require confirmation in further studies.
Notably all of the top menarche variants had MAF ≥7%. While it has been suggested that low-frequency variants have larger effects than common variants39, we were clearly underpowered to detect low-frequency variants (MAF <5%) with modest effect sizes. It is also possible that rare variants are not well captured using genome-wide chips. Future re-imputation using deep sequencing data from the 1000 Genomes Project may identify additional low frequency hits as well as refining the location of possible functional variants.
In the majority of studies contributing to this report, age at menarche was recalled several years later, and often to the nearest completed whole year. Although recalled age at menarche is valid40, and is unlikely to show systematic bias by genotype, any non-differential error would lead to reduced statistical power. Menarche indicates the completion of puberty in females and it is unclear whether our novel menarche loci also influence timing of other pubertal phenotypes. The known menarche locus in LIN28B was shown to also influence the onset of breast development in girls, the timing of pubic hair development and voice breaking in boys9, and the timing of the pubertal growth spurt in both boys and girls41. While our novel menarche loci might also regulate such wider pubertal processes, it is plausible that some (e.g. INHBA) might have sex-specific effects. Our study was restricted to cohorts of European ancestry and our results are therefore not generalized to other groups. African American girls tend to show earlier pubertal maturation compared to girls of White European ancestry42 and genetic studies in such populations might reveal different menarche loci.
In summary, we identified at least 30 novel loci for age at menarche. Our findings demonstrate the role of genes which regulate energy homeostasis and hormone pathways, and illustrate the complexity of the regulation of the timing of puberty.
Thirty-two studies contributed to the Stage 1 GWAS meta-analysis, comprising 87,802 women of White European ancestry. The consortium was made up of populations from the Age, Gene/Environment SusceptibilityStudy(AGES, n=1849), the Amish population (Amish, n=557), the Atherosclerosis Risk in Communities study (ARIC, n=4247), the British 1958 Birth Cohort (B58C-T1DGC and B58C-WTCCC, n=1584), CoLaus (n=2797), deCODE (n=15,864), the Danish National Birth Cohort (DNBC, n=1748), the Estonian Genome Center, University of Tartu (EGCUT, n=987), the European Prospective Investigation into Cancer and Nutrition (EPIC-obesity cases and cohort, n=1840), the Erasmus Rucphen Family Study (ERF, n=1103), the Framingham Heart Study (FHS, n=3801), the Helsinki Birth Cohort (HBCS, n=976), the Health 2000 study (Health 2000 cases and controls, n=922), InCHIANTI (n=597), the Indiana University premenopausal Caucasian women peak BMD study (Indiana, n=1497), the Nurse’s Heath Studies (NHS, n=5360), the Northern Finland Birth cohort (NFBC, n=2648), the Netherlands Twin Register (NTR, n=1051), the Queensland Institute of Medical Research (QIMR, n=3528), the Rotterdam studies (RS1, RS2 and RS3, n=5406), the Study of Addiction: Genetics and Environment (SAGE, n=1376), the SardiNIA study (n=2158), Twins UK I, II and III (n=3962) and the Women’s Genome Health Study (WGHS, n=22,028). Full details can be found in the Supplementary note. All studies were approved by local ethics committees and all participants provided written informed consent.
Age at menarche recalled by the participant was recorded in each study. Specific questions asked can be found in Supplementary Table 1. Only women of White European ancestry with a valid age at menarche between 9 and 17 years were included in this analysis, since this represents the normal physiological range. Information on birth year was also collected in each study.
The 32 Stage 1 studies were genotyped using a variety of Affymetrix (6.0, GeneChip 500K, 250K, MIP50K and 10K) and Illumina (HumanHap 550K, 318K, HumanHap 300K, HumanHap 370K CNV, HumanHap610 quad, Human660W-Quad BeadChip, 6K and Human 1Mv1_C) genotyping arrays. Genotyping call rate cut-offs were at least 90% and SNPs were filtered for those with a minor allele frequency of greater than 1%. More details on the filtering criteria for genotypes in each individual study can be found in Supplementary Table 2.
In order to increase genomic coverage and allow the evaluation of the same SNPs across as many study populations as possible, each study imputed genotype data based on the HapMap CEU Build 35 or 36. Algorithms were used to infer unobserved genotypes in a probabilistic manner in either MACH, IMPUTE43, or software that was developed by the researchers. As a quality control measure, we excluded non-genotyped SNPs with an imputation quality less than 0.3 (for observed versus expected variance in MACH) or 0.4 (for IMPUTE’s proper info statistic) from the meta-analysis.
Each study performed genome-wide association testing for age at menarche across approximately 2.5 million SNPs, based on linear regression under an additive genetic model. Analyses were adjusted for birth year in order to remove the effect of the temporal decline in age at menarche. Studies used PLINK, ProABEL, MACH2QTL, SNPTEST, R-packages or MERLIN-fastassoc for the association testing. The results from individual studies were corrected by their respective genomic inflation factors (lambda) (Supplementary Table 1) according to the genomic control (GC) method to correct for population stratification38.
We used an inverse-variance meta-analysis to test the effects of each genetic variant on age at menarche across the 32 studies. Fixed effects models were used, although in the absence of significant heterogeneity choice of model has little impact on the results. In order to correct for potential relatedness between two Icelandic cohorts (AGES and deCODE), the corrected association results for these cohorts were first meta-analysed and the GC method was re-applied to the results of the combined sample. These results were then meta-analysed with the remaining 30 studies. We also display further results following a second correction for GC using the overall genomic inflation factor calculated from the meta-analysis of all 32 studies. All meta-analyses were conducted using the METAL software package. We considered P-values <5×10−8 to indicate genome-wide significance.
We also meta-analysed results from X chromosome SNPs in a subset of studies with this data available. This included seven imputed datasets and one directly genotyped. Total sample size was ~60% of the autosomal meta-analysis (N=52,781), with the same statistical model tested.
In order to establish whether genome-wide significant SNPs with low LD in the same chromosomal region (defined as r2<0.05 in a 750 kb region) were independent loci, we carried out a conditional analysis. Each study performed a genome-wide analysis for age at menarche using linear regression adjusting for the top signal at each of the 42 associated regions to determine whether potential second signals remained significant even after adjusting for these variants. Birth year was also included as a covariate. Results from each individual study were meta-analysed to determine whether these potential second signals were truly independent (i.e. if P<5.0×10−8).
In order to confirm our possible novel menarche loci, we tested our 42 top hits for in silico association with age at menarche in 8,669 women from 16 studies with GWAS data, and which were not included in the first stage meta-analysis (Supplementary Table 4). In addition, new genotype data was generated for 30 of the 42 menarche loci and tested for association with age at menarche in up to 6,118 women from the Avon Longitudinal Study of Parents and Children (ALSPAC). Genotyping was performed by KBiosciences (Hoddesdon, UK) using their own novel system of fluorescence-based competitive allele-specific PCR (KASPar). As in stage 1, analyses were restricted to women reporting age at menarche between 9 and 17 years, and adjustment was made for birth year. Mean age at menarche ranged from 12.4 to 13.5 years, consistent with studies in the stage 1 meta-analysis. Linear regression was used to test the association between each variant and age at menarche in an additive genetic model. These results were then meta-analysed with GC-adjusted statistics from our stage 1 meta-analysis using inverse-variance fixed effects models.
In order to calculate the overall variance explained by these menarche loci in each of the replication cohorts, we calculated the r2 value from a model including all 42 known, confirmed and possible novel menarche variants and birth year, and compared this to a model including birth year alone. We only included cohorts with >800 women in their full model analyses, as sample sizes smaller than this may give spurious results.
Ingenuity Pathway Analysis (IPA) Knowledge Base 8.5 (Ingenuity Systems, CA, USA) was used to explore the functional relationship between proteins encoded by the 42 known, confirmed and possible novel menarche loci. The IPA Knowledge base contains millions of findings curated from the literature. Genes or nearest genes to the 42 loci (as listed in Table 1) were entered into the Ingenuity database. These “focus genes” were analyzed for direct interactions only. Networks were generated with a maximum size of 35 genes and shown as graphical representations of the molecular relationships between genes and gene products. Proteins are depicted as nodes in various shapes representing the functional class of the protein. The biological relationships between nodes are depicted by lines. To determine the probability of the analyzed gene to be found together in a network from Ingenuity Pathways Knowledge Base due to random chance alone, IPA applies a Fisher’s exact test. The network score or P-value represents the significance of the focus gene enrichment. There are 25 diseases and disorders categories and 32 molecular and cellular function categories in the IPA Knowledge base. Enrichment of focus genes to these diseases and functional categories was also evaluated. The P-value, based on a right-tailed Fisher’s exact test, considers the number of identified focus genes and the total number of molecules known to be associated with these categories in the IPA knowledge database.
Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA) was used to explore pathway-based associations in the full GWAS dataset. MAGENTA implements a GSEA-based approach, the methodology of which is described in Segrè et al.44. Briefly, each gene in the genome is mapped to a single index SNP with the lowest P-value within a 110kb upstream, 40kb downstream window. This P-value, representing a gene score, is then corrected for confounding factors such as gene size, SNP density and LD-related properties in a regression model. Genes within the HLA-region were excluded from analysis due to difficulties in accounting for gene density and LD patterns. Each mapped gene in the genome is then ranked by its adjusted gene score. At a given significance threshold (95th and 75th percentiles of all gene scores), the observed number of gene scores in a given pathway, with a ranked score above the specified threshold percentile, is calculated. This observed statistic is then compared to 1,000,000 randomly permuted pathways of identical size. This generates an empirical GSEA P-value for each pathway. Significance was determined when an individual pathway reached a false discovery rate (FDR) < 0.05 in either analysis (Supplementary table 9). In total, 2529 pathways from Gene Ontology, PANTHER, KEGG and Ingenuity were tested for enrichment of multiple modest associations with age at menarche.
We tested the association between 5,184 adipose tissue eSNPs identified in the Icelandic Family Adipose (IFA) cohort (n=673) with age at menarche in our stage 1 meta-analysis sample. The IFA cohort dataset included the expression of 23,720 transcripts representing 84% of the 20,060 protein-coding genes annotated in the Ensembl database (v 33)26.
Academy of Finland (Finnish Centre of Excellence in Complex Disease Genetics 129680, 120315, 129287, 129494); Affymetrix (N02-HL-6-4278); Agency of Science, Technology and Research of Singapore (A*STAR); Althingi (the Icelandic Parliament); American Heart Association (0855082E); Amgen; Augustinus Foundation; Australian National Health and Medical Research Council (241944, 339462, 389927, 389875, 389891, 389892, 389938, 442915, 442981, 496739, 552485, 552498, 619667); Australian Research Council (A7960034, A79801419, A79906588, DP0212016, DP0343921, DP0770096); Baltimore Veterans Administration Medical Center Geriatrics Research; Canadian Institutes of Health Research (Grant ID 166067); Cancer Research United Kingdom; the Cariplo Foundation; Center for Disease Control and Prevention (USA), Centre for Medical Systems Biology (CMSB); the Chief Scientist Office of the Scottish Government; Clinical Nutrition Research Unit of Maryland (P30 DK072488); Danish National Research Foundation; Danish Pharmacists’ Fund; the Egmont Foundation; Erasmus Medical Center; Erasmus University; Estonian Government (SF0180142s08); the European Commision (212111 BBMRI, 205419 ECOGENE, 201413 ENGAGE, HEALTH-F2-2008-201865 - GEFOS, HEALTH-F2-2008-35627, HEALTH-F4-2007-201413, HEALTH-F4-2007-201550 HYPERGENES, 245536 OPENGENE, TREAT-OA, GenomEUtwin Project QLG2-CT-2002-01254, EU/QLRT-2001-01254; ERC-230374); European Union framework program 6 EUROSPAN project (LSHG-CT-2006-018947, LSHM-CT-2003-503041); European Regional Development Fund; Faculty of Biology and Medicine of Lausanne, Switzerland; Fondazione Compagnia di San Paolo Health Ministry, Framingham Heart Study (N01-HC-25195); GENEVA Coordinating Center (U01HG004446); German Federal Ministry of Education and Research; German National Genome Research Network (NGFN-2 and NGFNPlus: 01GS0823); Giorgi-Cavaglieri Foundation; GlaxoSmithKline; Health Fund of the Danish Health Insurance Societies; Helmholtz Zentrum München - German Research Center for Environmental Health; Hjartavernd (the Icelandic Heart Association); Italian Ministry of Health (ICS110.1/RF97.71, RF-FSR-2007-647201); Juvenile Diabetes Research Foundation International (JDRF); Leenaards Foundation; March of Dimes Birth Defects Foundation; the Medical Research Council (G0000934, G0500539, G0701863); National Cancer Institute (CA40356, CA98233, P01CA087969, P01CA055075, CA047988, CA63464, CA54281, CA136792, CA089392, CA104021); Munich Center of Health Sciences (MC Health, LMUinnovativ); National Health and Medical Research Council of Australia (572613 and 003209); National Heart, Lung, and Blood Institute (HL 043851, HL087679, HL69757, N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, R01HL087641, R01HL59367, R01HL086694, RC2 HL102419,); National Human Genome Research Institute (NHGRI); National Institute of Aging (263 MD 9164, 263 MD 821336, N.1-AG-1-1, N.1-AG-1-2111, N01-AG-5-0002, AG-16592, Genetics of Reproductive Life Period and Health Outcomes - R21AG032598); National Institute of Arthritis Musculoskeletal and Skin Diseases and National Institute on Aging ((NIAMS/NIA) R01 AR/AG 41398); National Institute of Child Health and Human Development (HD-061437); National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK, R01DK058845, U01 DK062418); National Institute on Alcohol Abuse and Alcoholism (NIAAA); National Institute of Allergy and Infectious Diseases (NIAID); National Institute on Drug Abuse (NIDA); National Institutes of Health (AA07535, AA10248, AA13320, AA13321, AA13326, AA14041, HHSN268200782096C, M01 RR-00750, MH66206, N01-AG-12100, N01-AG-1-2109, P01-AG-18397, R01-088119, R01-DA013423, RFAHG006033, U01DE018993, U01DE018903, U01HG004422, U01HG004446, U01HL72515, U01 HL84756, U01HG004399, U01HG004402, U01HG004415, U01HG004423, U01HG004436, U01HG004438, U01HG004402, U01HG004446, U01HG004726, U01HG004728, U01HG004729, U01HG004735, U01HG004738, U01HG04424, U10AA008401, U54RR025204-01, UL1RR025005, UL1 RR025774, Z01CP010200, HHSN268200625226C); the National Institute for Health Research Cambridge Biomedical Research Centre; Netherlands Organization for Scientific Research (904-61-193, 575-25-006, 480-04-004, 56-464-14192, NWO 480-05-003, 175.010.2005.011, 911-03-012, 014-93-015); Netherlands Genomic Initiative (NGI, 050-060-810), NIA Intramural Research Program; NIA IRP Laboratory of Neurogenetics; NIH Genes, Environment and Health Initiative; Republic of Croatia Ministry of Science, Education and Sports (108-1080315-0302); Robert Dawson Evans Endowment; the Royal Society, Royal Swedish Academy of Science; State of Bavaria, Germany; Susan G. Komen Breast Cancer Foundation; Swedish Foundation for Strategic Research (SSF); Swedish Heart-Lung Foundation; Swedish Ministry of Higher Education and Research; Swedish Research Council; Swedish Society of Medicine; Swiss National Science Foundation (Ref: 33CSCO-122661, 3100AO-116323/1); the University of Bristol, University of Maryland General Clinical Research Center (M01 RR 16500); the Wellcome Trust (068545/Z/02, 076467/Z/05/Z, 077016/Z/05/Z, 89061/Z/09/Z, strategic award 079895); Western Australian DNA Bank; Western Australian Genetic Epidemiology Resource (see Supplementary note for expanded acknowledgements).
mRNA by SNP Browser, http://www.sph.umich.edu/csg/liang/asthma/
Author contributionsWriting group: DIB, HAB, DIC, DLC, LC, EWD, MJE, CEE, TE, BF, NF, FG, DFG, CHe, TBH, DJH, DLK, KKL, MMan, MMar, JMM, AMe, AMu, KKO, JRBP, LS, EAS, PS, UT, AGU, CMvD, SWvW, JAV, EW, GZ
Analysis working group: CEE, AMu, KKO, JRBP, PS
Secondary analyses: JAV, JRBP, ADJ, DL, ASP, VE, AVSe
Oversight of contributing cohorts: DIB, HAB, LJB, LC, CMvD, DFE, MJE, UdF, PG, AH, DJH, PH, TBH, KK, MMar, AMe, AMu, GWM, JMM, SSM, VM, CEP, PP, IR, PMR, EAS, DPS, GDS, KS, NJS, TDS, DT, AGU, AFW, DMW, EW, HEW, JFW, NJW
Analytical lead of contributing studies: EA, EMB, DIC, DLC, TC, CEE, TE, BF, NF, DFG, CHa, CHe, JJH, EI, DLK, ZK, KLL, PL, RJFL, MMan, PFM, PKEM, KN, EP, JRBP, AVSm, ENS, LS, PS, SS, SU, JAV, NMW, SHvW, JHZ, LZ.
Genotyping and phenotyping of contributing studies: HA, NA, Pd’A, TA, GSB, HB, IB, EB, FB, JEB, SBa, SBe, ADC, DC, HC, MCC, SJC, WC, AD, RMvD, PD, GE, JE, ACH, SHE, ARF, CG, LiF, LuF, TF, FG, EJCdG, MG, VG, DGH, FH, TI, MRJ, DK, DPK, IK, PK, TOK, JL, JSEL, LJL, SL, JBJvM, JCM, JMM, MMe, NGM, WLM, ARN, FN, MN, MAN, PN, KKO, BAO, APa, APo, ANP, CEP, GP, LP, LJP, MP, NLP, OP, FR, JPR, SMR, TR, AS, ARS, CS, DS, NS, SRS, US, VS, ET, KT, LT, MLT, JT, MU, PV, GWa, GWi, MNW, LY, GZ, WVZ.
Full author contributions and financial disclosures are listed in the Supplementary Note.