Members of the WTCCC and DGI are listed in the Supplementary Note.
M.N.W., G.L., R.M.F. and C.M.L. performed the analyses and wrote the first draft of the paper. J.R.B.P., K.S.E., B.S., E.Z., H.L., N.J.T. and N.W.R. were responsible for production, quality control and cleaning of the GWA data from the WTCCC samples. B.F.V., C.G., V.L., N.P.B. and R.S. were responsible for the production, quality control checking and cleaning of the GWA data from the DGI samples. B.F.V. also performed association testing and EIGENSTRAT analyses for height on the DGI data set. S.M.R. performed the data linking and quality control analysis for the ALSPAC study. R.H. and C.G. genotyped rs1042725 in the FINRISK97 cohort. K.A. is principal investigator for the near-extreme height panel. L.P. and V.S. are principal investigators for FINRISK97. L.C.G. is principal investigator for the Botnia study. G.D.S. is principal investigator of the ALSPAC study. J.H.T. and A.R.N. generated DXA measures in the ALSPAC children. C.N.A.P. and A.D.M. are principal investigators of the Tayside UKT2D-GCC study. A.T.H., M.I.M., J.N.H. and T.M.F. designed and led the study. All authors read and approved the final manuscript.
Human height is a classic, highly heritable quantitative trait. To begin to identify genetic variants influencing height, we examined genome-wide association data from 4,921 individuals. Common variants in the HMGA2 oncogene, exemplified by rs1042725, were associated with height (P = 4 × 10−8). HMGA2 is also a strong biological candidate for height, as rare, severe mutations in this gene alter body size in mice and humans, so we tested rs1042725 in additional samples. We confirmed the association in 19,064 adults from four further studies (P = 3 × 10−11, overall P = 4 × 10−16, including the genome-wide association data). We also observed the association in children (P = 1 × 10−6, N = 6,827) and a tall/short case-control study (P = 4 × 10−6, N = 3,207). We estimate that rs1042725 explains ~0.3% of population variation in height (~0.4 cm increased adult height per C allele). There are few examples of common genetic variants reproducibly associated with human quantitative traits; these results represent, to our knowledge, the first consistently replicated association with adult and childhood height.
Adult height is a classic polygenic trait. The genetics of height were central to the mendelian versus biometrician debate in the early part of the twentieth century that was resolved by Fisher, who proposed that height and other human phenotypes showed multifactorial inheritance1. Twin, family and adoption studies suggest that up to 90% of normal variation in human height within populations is due to genetic variation2-6. Severe mutations in several genes cause rare syndromes with extreme stature; however, these cannot explain normal population height variation7. Many regions of the genome have been linked with height based on numerous genome-wide linkage scans, with some overlap between studies6, but thus far there have not been any examples of gene variants that are reproducibly associated with height variation in the general population.
The recent flood of data from many genome-wide association (GWA) studies offers new opportunities to identify genes influencing adult height. The identification of such genes will probably provide important insights into how best to dissect the genetics of polygenic quantitative traits. The identification of genes influencing growth may also have important medical implications. Height is associated with several common disorders, including a number of cancers8,9.
To begin to identify gene variants influencing adult height, we analyzed GWA data from a total of 4,921 participants. These included 1,896 UK individuals with type 2 diabetes from the Wellcome Trust Case Control Consortium (WTCCC10; height data on the controls was unavailable) and 3,025 Swedish or Finnish participants (1,496 individuals with type 2 diabetes and 1,529 nondiabetic controls from the Diabetes Genetics Initiative (DGI); ref. 11 and Supplementary Table 1 online). All participants were of self-reported European ancestry. All DNA samples were genotyped using the Affymetrix GeneChip Human Mapping 500K platform. After performing quality control to exclude poorly performing SNPs, we conducted a meta-analysis of sex- and age-adjusted height Z scores for the 364,301 autosomal SNPs common across the data sets. These SNPs provided 64% coverage of CEU HapMap SNPs (minor allele frequency (MAF) > 5% and r2 > 0.8; see Supplementary Methods online).
We created a quantile-quantile (QQ) plot for the meta-analysis (Fig. 1). In the individual GWA studies, the most associated SNPs were rs4552313 (P = 2 × 10−6), rs7316173 (P = 3 × 10−6) and rs10804515 (P = 1 × 10−6) in the WTCCC participants, the DGI affected individuals and the DGI controls, respectively, but these did not replicate at P < 0.05 across studies, and the combined P was > 4 × 10−5 for all three SNPs. The two SNPs most strongly associated with height in our meta-analysis (Table 1 and Fig. 2), rs1042725 (P = 4 × 10−8) and rs7968682 (P = 7 × 10−8), were in linkage disequilibrium (LD) with each other (r2 of 0.87 and 0.92 for WTCCC and DGI, respectively) and were the only SNPs to reach a level of statistical significance strongly suggestive of true association (P < 5 × 10−7)10, after allowing for multiple testing. Association studies of height and other traits may be susceptible to false positive results from genotyping artifacts or population substructure12,13. However, several lines of evidence suggest that the observed association is not artifactual. First, the similar results obtained with two highly correlated SNPs suggest that technical problems in genotyping are unlikely to explain our results. Second, the availability of methods that use dense genome-wide SNP data to estimate and account for ancestry allows us to be confident that the associations are not explained by population stratification. Adjusting for residual population structure using EIGENSTRAT14 did not substantially alter the strength of the association (WTCCC: unadjusted P = 9 × 10−4; adjusted P = 1 × 10−3; DGI, unrelated individuals only: unadjusted P = 1 × 10−3; adjusted P = 2 × 10−3 for rs1042725), and the genomic control inflation factor15 was only 1.11 from the meta-analysis.
Third, stratification by geographical region-of-ascertainment did not reduce the strength of the association (rs1042725 meta-analysis P = 1 × 10−6 with and without stratification, using only unrelated subjects from DGI). Fourth, a purely family-based analysis of the sibship-based portion of the DGI sample also contributed evidence for association (982 individuals in 406 sibships; P = 0.06, with the same direction of effect). Finally, 13 SNPs that are strongly correlated with the major axis of population differentiation in UK samples did not show association with height (all P > 0.05)10.
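As an illustration of the genomic control inflation factor mentioned above, the following minimal Python sketch computes the usual median-based estimate: each P value is converted to a 1-d.f. chi-square statistic, and the median is divided by the median of the null 1-d.f. chi-square distribution (about 0.455). The function name and the example P values are hypothetical, and the text does not state whether this exact estimator was the one used.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a 1-d.f. chi-square distribution (about 0.4549).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation_factor(p_values):
    """Median-based genomic control lambda from two-sided P values in (0, 1]."""
    # A two-sided P value p corresponds to z = inv_cdf(p / 2); chi-square = z squared.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical P values, evenly spread as expected under the null hypothesis.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(f"lambda = {genomic_inflation_factor(null_like):.3f}")
```

Values close to 1 indicate little inflation of the test statistics; values well above 1 would suggest stratification or other systematic bias.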
In addition to showing strong statistical evidence for association, the two SNPs lie in (rs1042725) and 12 kb downstream (rs7968682) of the 3′-UTR region of the high mobility group-A2 (HMGA2) gene, which is a strong biological candidate for influencing height. Pygmy mutant mice, which bear homozygous deletions of the orthologous Hmgic gene, are short in length16, whereas mice expressing truncated Hmgic, including only the first three exons under the regulatory control of the cytomegalovirus (CMV) promoter, develop gigantism and lipomatosis17. Furthermore, the autosomal dwarf (adw) phenotype has been mapped to the syntenic region in the chicken18. In humans, an individual with a severe overgrowth syndrome (stature of 169 cm at age 8 years, >7 s.d. above the mean) carries a chromosomal inversion that truncates the HMGA2 gene product19.
A combination of this strong statistical and biological evidence led us to investigate the role of these SNPs in additional population-based studies. We used primarily rs1042725 in the additional studies because it had a marginally stronger significance level in the initial analysis. We genotyped an additional 29,098 individuals of European ancestry from five studies, including three population-based studies (15,167 adults and 6,827 children), a type 2 diabetes case-control set (3,897 adults) and a collection of 3,207 adults sampled from the near-extremes of the height distribution (the 5th to 10th and 90th to 95th percentiles; Supplementary Table 1).
In the replication studies of adults sampled from across the height distribution, each copy of the C allele at rs1042725 was associated with an increase of 0.07 in the adult height Z score (95% confidence interval (c.i.) 0.05–0.09), equivalent to ~0.4 cm (P = 3 × 10−11; Table 1 and Supplementary Table 2 online). This result provides a strong replication of the original association. When we combined these replication data with the initial results from the genome-wide studies, the statistical evidence in favor of association increased (P = 4 × 10−16). There was little evidence of heterogeneity (I2 = 0%)20 across all adult studies, no evidence that the effect size was different between males and females (P = 0.63) and no evidence for deviation from an additive inheritance model (P = 0.93 for likelihood ratio test of additive versus full two–degree of freedom (2-d.f.) model). The C-allele frequency was similar across the different studies (0.48–0.54), further adding to the evidence that population stratification does not explain the observed association. When we used the adults sampled from the near-extremes of the height distribution, we found that each copy of the rs1042725 C allele increased the odds of being in the group of tall individuals (odds ratio = 1.27, 95% c.i. 1.15–1.40, P = 4 × 10−6).
As an initial search for other variants in this region that might be associated with adult height, we genotyped 11 additional SNPs in the population-based FINRISK97 sample (N = 6,533) and in the European American adult height panel (N = 3,207) drawn from the near-extremes of the height distribution. These 11 SNPs and 20 specific multimarker haplotypes were selected to capture the 42 variants in HapMap phase II build 21a (CEU population) with frequency >1% that lie within the region of strong LD surrounding rs1042725 (Supplementary Table 3 online, Fig. 2 and Supplementary Methods). None of these single markers or multimarker haplotypes was more significantly associated with height than rs1042725, and none remained associated after conditioning on rs1042725. Therefore, rs1042725 remains the best explanation for the observed association with height at the HMGA2 locus. However, which precise SNP(s) in HMGA2 are functional and how these SNP(s) might alter the expression or function of HMGA2 is not yet known.
To determine the age at which the association with height appears, we also analyzed the Avon Longitudinal Study of Parents and Children (ALSPAC) birth cohort of 6,079 children, for whom growth measures were available from birth through to early adolescence, and 748 children from the Exeter Family Study of Childhood Health (EFSOCH), for whom birth measures were available. There was no evidence that rs1042725 altered birth length (P ≥ 0.37), but there was a strong association with height at age 7 years, with an increased height of 0.5 cm per C allele (95% c.i. 0.3–0.7, P = 1 × 10−6); the association persisted at ages 9, 10 and 11 years (Table 2). Using all data from the children at ages 7–11 years, and taking into account the correlation between measurements at the varying time points, the overall effect in the children was 0.07 s.d. per allele (95% c.i. 0.04–0.10; P = 3 × 10−5), equivalent to ~0.4 cm. These results, showing normal birth measures and increased postnatal growth, are consistent with the normal birth length of the individual with a homozygous disruption of HMGA2 (ref. 19).
The growth of the spine and the long bones in the limbs proceeds by different mechanisms in humans. Therefore, we examined whether rs1042725 is associated with variation in limb growth, spine growth or both. In 7-year-old children from the ALSPAC study, each C allele of rs1042725 was associated with both an increase of 0.3 cm in leg length (95% c.i. 0.2–0.4, P = 8 × 10−7) and an increase of 0.2 cm in sitting height, a measure of the spinal portion of the skeleton (95% c.i. 0.1–0.3, P = 0.0002). This was maintained at ages 8, 9, 10 and 11 years (Supplementary Table 4 online), suggesting that the effect is on general longitudinal skeletal growth. As expected, we observed an association of the rs1042725 C allele with lean mass (P = 0.007) and a consistent trend with bone mass (P = 0.092) as assessed by dual-energy X-ray absorptiometry (DXA) in 5,289 9-year-old children.
Mice lacking a copy of the HMGA2 homolog (Hmgic) have greatly reduced fat mass and are resistant to diet-induced obesity16. This raised the question of whether the SNPs associated with height also affect body mass index (BMI). We did not find any evidence that the height association signal also affects BMI in adults (Supplementary Table 5 online) or BMI or DXA-assessed fat mass (aged 9) in children (Supplementary Table 6 online).
High-mobility group (HMG) proteins are DNA-binding proteins, often with low-affinity binding sites, and are thought to be involved in altering chromatin structure for regulation of gene expression21. Rearrangement of HMGA2 is a common feature of mesenchymal tumors, most notably lipomas19. It is not known whether variation at SNP rs1042725 is also associated with variation in cancer risk.
There are still relatively few examples of common gene variants that influence quantitative population-based traits with convincing evidence in humans. Our data provide an example of a strongly replicated association with a quantitative trait. As the best results from our combined GWA meta-analysis only just reached P < 1 × 10−7 with nearly 5,000 individuals, this study highlights the need for using many thousands of individuals to identify variants underlying polygenic traits with appropriate statistical support. Furthermore, the expected ‘winner’s curse’ phenomenon22, illustrated in our data by the larger effect size estimates in the GWA data than in the replication samples, aided in our discovery of this variant, suggesting that other variants with similar effect sizes may also be present but with lower levels of significance in the GWA data. Their discovery will require larger sample sizes and/or more aggressive replication efforts. This study also provides additional insights into the genetic architecture of a classic complex trait. Although height is an accurately measured phenotype with a very strong genetic component, individual common variants have modest effects on this complex trait. We estimate the percentage of the variance of height explained by rs1042725 to be only ~0.3%; even taking into account the winner’s curse, the number of as-yet-undiscovered common variants with similar or larger effect sizes must be low, suggesting that hundreds of loci of even smaller effect will ultimately be shown to comprise the genetic basis of height. Fortunately, an increasing amount of high-quality GWA data is becoming available; in combination with well-powered replication efforts, such data should permit the identification of these additional loci for height and other quantitative traits.
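To make the scale of the variance-explained estimate concrete, the sketch below applies the standard additive-model approximation, in which a biallelic SNP in Hardy-Weinberg equilibrium with allele frequency p and per-allele effect β (in s.d. units of the trait) explains about 2p(1 − p)β² of the trait variance. This is an assumption for illustration only: the exact calculation behind the ~0.3% figure is not described in the text, and the function name is hypothetical.

```python
def variance_explained(allele_freq, beta_sd):
    """Fraction of trait variance explained by one additive SNP.

    allele_freq: frequency of the effect allele (0 to 1).
    beta_sd: per-allele effect in standard-deviation units of the trait.
    Assumes Hardy-Weinberg equilibrium and a purely additive effect.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must lie between 0 and 1")
    return 2.0 * allele_freq * (1.0 - allele_freq) * beta_sd ** 2


# Replication effect size (0.07 s.d. per C allele) with an assumed
# C-allele frequency of 0.5 (reported frequencies were 0.48-0.54).
fraction = variance_explained(0.5, 0.07)
print(f"variance explained: {100 * fraction:.3f}%")
```

With these inputs the approximation gives roughly a quarter of one percent, the same order of magnitude as the reported ~0.3%.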
In conclusion, common variation in the HMGA2 gene is associated with adult height in multiple studies at very high levels of statistical confidence. The effect on growth is present in individuals as early as 7 years of age. To our knowledge, these results represent the first reproducible association of a common variant with human stature and suggest that by analyzing data from many thousands of individuals, it will now finally be possible to dissect the genetics of this highly heritable polygenic trait. Insights gained from the study of height are likely to have general implications for the study of other complex traits and common diseases.
As detailed elsewhere10, samples in the WTCCC genome-wide scan had type 2 diabetes. All had four grandparents of exclusively British and/or Irish origin. All subjects used in this study gave written informed consent, and the project protocols were approved by the local research ethics committees. Anthropometric measurements were taken as described previously23. The DGI GWA study for type 2 diabetes has been described previously11 (see also URLs section below). Participants were of European ancestry from Finland and Sweden. In both studies, extensive quality control steps were taken to exclude poorly performing samples and those of non-European descent10,11.
The UKT2D GCC study has been described previously24. All subjects were self-reported “white” and of European descent, living in the Tayside region of Dundee, UK. Height and weight measurements were made as for the WTCCC samples. This study was approved by the Tayside Medical Ethics Committee, and informed consent was obtained from all subjects.
ALSPAC recruited pregnant women with expected delivery dates between April 1991 and December 1992 from Bristol, UK25. Self-reported “non-white” individuals were excluded from all analyses. The mothers’ height and weight data were self-reported. Where the data set included singleton siblings born to the same mother, only the first born was included in the analyses. All multiple births and individuals born before 36 weeks’ gestation were excluded from birth length analyses. For the analyses of children 7 to 11 years old, only the first born of each twin pair was included. Birth measurement protocols have been described previously26. At the ages of 7 to 11 years, anthropometric measurements were taken27. At age 9, 7,470 children underwent a whole-body dual-energy X-ray absorptiometry (DXA) scan26.
All aspects of the study were reviewed and approved by the ALSPAC Law and Ethics Committee and by local research ethics committees. Parents gave written consent for children in this study.
EFSOCH is a prospective study of parents and children from a consecutive birth cohort28. Subjects were recruited from a postcode-defined region of Exeter, UK between 2000 and 2004 and were of self-reported “white” European descent. Parental height and weight were measured by the research midwife at 28 weeks’ gestation. Maternal pre-pregnant weight was self-reported. Ethical approval was given by the North and East Devon Local Research Ethics Committee, and informed consent was obtained from the parents of the newborns.
FINRISK1997 is a population-based risk factor survey carried out by the National Public Health Institute of Finland29 and was approved by the Ethical Committee of the National Public Health Institute on 30 October 1996 (decision number 38/96). The sample was drawn from the national population register for five geographical areas in Finland.
These height case-control samples have been described previously12. For both panels, all individuals were self-described “white” or “Caucasian.” For the US panel, all subjects were born in the US, and all of their grandparents were born in either the US or Europe. All subjects in the Polish panel were born in Poland, and all grandparents were born in Europe or Russia. All subjects gave informed consent, and approval was obtained from the Institutional Review Board of Children’s Hospital, Boston.
We used 393,453 SNPs from the Affymetrix GeneChip Human Mapping 500K platform, which was used in a recent report of type 2 diabetes association using the same WTCCC type 2 diabetes samples24. For the DGI data, SNP quality control and exclusion criteria are reported in detail elsewhere11 and resulted in the use of 386,731 SNPs. We report the 364,301 autosomal SNPs common across the studies.
For each GWA study, summary statistics from linear regression using Z scores were generated using PLINK30. We obtained a combined result for each SNP using inverse variance meta-analysis from the summary statistic beta and standard error (s.e.m.).
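The pooling step described above can be sketched in a few lines of Python. This is a minimal fixed-effect inverse-variance calculation from per-study beta and standard error values; the function name and the example numbers are hypothetical and are not taken from the study data.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance pooling of per-study estimates.

    betas: per-study effect estimates (for example, Z-score units per allele).
    ses: matching standard errors.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and equal in length")
    weights = [1.0 / (se * se) for se in ses]  # weight = 1 / variance
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return pooled_beta, pooled_se, z, p


# Hypothetical estimates from two studies for a single SNP.
beta, se, z, p = inverse_variance_meta([0.10, 0.06], [0.03, 0.02])
print(f"beta = {beta:.4f}, s.e. = {se:.4f}, z = {z:.2f}, P = {p:.1e}")
```

Each study is weighted by the inverse of its sampling variance, so more precise studies contribute more to the combined estimate.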
Height was normally distributed in all cohorts. For WTCCC, UKT2D GCC, ALSPAC and EFSOCH, gender-specific height Z scores were generated within each study, and age was included as a covariate in subsequent analyses. FINRISK Z scores were generated by correcting for gender, age and regions of recruitment. For all studies, individuals with heights greater than 4 s.d. from the mean were excluded. We examined the associations between genotype and quantitative traits using linear regression. BMI and DXA fat mass were log10 transformed before analysis, and gestational age was included as a covariate in birth length analyses. To obtain an overall estimate for the association between height and rs1042725 genotype in the children from the ALSPAC study, we performed a linear regression using a generalized estimating equation (GEE) to account for the correlation between height measurements performed repeatedly on each subject.
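As a rough illustration of the Z-score step, the sketch below standardizes height within each sex and drops individuals beyond a chosen number of s.d. from the sex-specific mean. The record layout and function name are hypothetical, and two details are assumptions: the outlier cutoff is applied to the same Z scores that are returned, and age adjustment is left to the later regression, as the text describes for most cohorts.

```python
from collections import defaultdict
from statistics import mean, stdev


def sex_specific_z_scores(records, max_abs_z=4.0):
    """Standardize height within each sex and drop extreme outliers.

    records: iterable of (sample_id, sex, height_cm) tuples.
    Returns {sample_id: z} for individuals with |z| <= max_abs_z.
    Each sex group needs at least two individuals with differing heights.
    """
    by_sex = defaultdict(list)
    for sample_id, sex, height in records:
        by_sex[sex].append((sample_id, height))

    z_scores = {}
    for rows in by_sex.values():
        heights = [h for _, h in rows]
        mu, sd = mean(heights), stdev(heights)
        for sample_id, h in rows:
            z = (h - mu) / sd
            if abs(z) <= max_abs_z:  # exclusion rule, 4 s.d. by default
                z_scores[sample_id] = z
    return z_scores


# Hypothetical toy data, for illustration only.
toy = [("m1", "M", 170.0), ("m2", "M", 180.0), ("m3", "M", 190.0),
       ("f1", "F", 160.0), ("f2", "F", 165.0), ("f3", "F", 170.0)]
print(sex_specific_z_scores(toy))
```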
Meta-analysis statistics and plots were generated using StataSE version 9 (StataCorp). We used the inverse variance method to pool continuous data (Z score units). The I2 statistic20 was used to estimate between-study heterogeneity.
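For readers unfamiliar with the I2 statistic, the following Python sketch shows the usual definition: Cochran's Q is computed around the inverse-variance pooled estimate, and I2 = max(0, (Q − d.f.)/Q) × 100%. The function name and example values are hypothetical; the published analysis used Stata, not this code.

```python
def i_squared(betas, ses):
    """Cochran's Q and the I2 heterogeneity statistic (as a percentage).

    betas: per-study effect estimates; ses: matching standard errors.
    Returns (q, i2_percent).
    """
    if len(betas) < 2 or len(betas) != len(ses):
        raise ValueError("need at least two studies with matching lengths")
    weights = [1.0 / (se * se) for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    if q <= df:
        return q, 0.0  # no more variation than expected by chance
    return q, 100.0 * (q - df) / q


# Hypothetical per-study estimates with very similar effects.
q, i2 = i_squared([0.07, 0.08, 0.06], [0.02, 0.03, 0.02])
print(f"Q = {q:.3f}, I2 = {i2:.1f}%")
```

An I2 of 0%, as reported for the adult studies, means the observed spread of study estimates is no larger than sampling error alone would produce.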
For both GWA studies, EIGENSTRAT14 was run on the full set of markers (~390,000 SNPs) using genotypes from unrelated individuals only. Similar results were obtained when we used the first three or ten main eigenvectors. For the WTCCC sample, an LD-pruned set of 104,766 SNPs (generated using PLINK30) produced similar results.
For the tall/short case-control comparison, statistical analysis was performed using a Cochran-Mantel-Haenszel test, as implemented in PLINK30. The data set was stratified according to the country of origin of the grandparents to account for population stratification within the European American height panel12.
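The logic of a stratified Cochran-Mantel-Haenszel test can be sketched as follows. Each stratum contributes one 2 × 2 table of allele counts, and the test combines the within-stratum deviations from expectation. This is a generic textbook version without continuity correction, not a reimplementation of PLINK; the function name, table layout and counts are hypothetical.

```python
import math


def cmh_test(tables):
    """Cochran-Mantel-Haenszel test across strata of 2 x 2 tables.

    tables: iterable of (a, b, c, d) counts per stratum, where
      a = effect alleles in tall group, b = other alleles in tall group,
      c = effect alleles in short group, d = other alleles in short group.
    Returns (chi2, p_value, common_odds_ratio); 1 d.f., no continuity correction.
    """
    sum_diff = sum_var = or_num = or_den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # stratum carries no information
        sum_diff += a - (a + b) * (a + c) / n  # observed minus expected
        sum_var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    if sum_var == 0.0:
        raise ValueError("no informative strata")
    chi2 = sum_diff ** 2 / sum_var
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # 1-d.f. chi-square tail
    odds_ratio = or_num / or_den if or_den > 0 else math.inf
    return chi2, p_value, odds_ratio


# Hypothetical allele counts in two strata.
chi2, p, odds = cmh_test([(20, 10, 10, 20), (20, 10, 10, 20)])
print(f"chi2 = {chi2:.2f}, P = {p:.1e}, OR = {odds:.2f}")
```

Because expected counts are computed within each stratum, allele frequency differences between strata do not by themselves generate an association signal.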
Using the related individuals in the DGI sample (982 individuals in 406 sibships), we performed a family-based test of association. We used the QFAM-WITHIN method, as implemented in PLINK30.
The genotyping of the initial genome-wide association studies is described elsewhere10,11. Genotyping of the UKT2D GCC and EFSOCH samples was performed using TaqMan SNP genotyping assays (Applied Biosystems) according to the manufacturer’s protocol. Genotyping of the ALSPAC cohort was performed by KBiosciences (Hoddesdon) using their own system of fluorescence-based competitive allele-specific PCR (KASPar). Genotyping of FINRISK97 and the GCI Extreme Panel was performed using the Sequenom iPLEX MassARRAY platform. SNP rs1042725 was in HWE in all studies (P > 0.05). The duplicate concordance rate was >99.4% for each study. The genotype success rate was >95% in all studies.
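A Hardy-Weinberg equilibrium check of the kind reported above can be illustrated with a simple 1-d.f. chi-square goodness-of-fit test on genotype counts. The text does not say which HWE test was applied (exact tests are also common), so this is only one plausible version; the function name and counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg equilibrium at a biallelic SNP.

    n_aa, n_ab, n_bb: observed counts of the three genotypes.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # 1-d.f. chi-square tail
    return chi2, p_value


# Hypothetical genotype counts for illustration.
chi2, p_value = hwe_chi_square(260, 490, 250)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

A P value above 0.05, as reported for rs1042725 in every study, gives no evidence of the genotype-calling problems that often show up as departures from HWE.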
HapMap: http://www.hapmap.org; DGI GWA study, http://www.broad.mit.edu/diabetes/; ALSPAC: http://www.alspac.bris.ac.uk. Details of the KBiosciences fluorescence-based competitive allele-specific PCR (KASPar) assay design are available at http://www.kbioscience.co.uk. Details of the iPLEX Sequenom MassARRAY are available at http://www.sequenom.com/Assets/pdfs/appnotes/8876-006.pdf
For the UK-based studies, collection of the type 2 diabetes cases was supported by Diabetes UK, BDA Research and the UK Medical Research Council (MRC) (Biomedical Collections Strategic Grant G0000649). The UK Type 2 Diabetes Genetics Consortium collection was supported by the Wellcome Trust (Biomedical Collections Grant GR072960). The ALSPAC study was supported by The UK MRC, the Wellcome Trust and the University of Bristol. The Exeter Family Study of Childhood Health was supported by UK National Health Service Research and Development and the Wellcome Trust. We also thank the Exeter University Foundation for funding. The UK GWA genotyping was supported by the Wellcome Trust (076113), and replication genotyping was supported by the Wellcome Trust, Diabetes UK, European Commission (EURODIA LSHG-CT-2004-518153) and the Peninsula Medical School. Personal funding comes from the Wellcome Trust (A.T.H.; Research Leave Fellow; Research Career Development Fellow); UK MRC (J.R.B.P.); Diabetes UK (R.M.F.) and the Throne-Holst Foundation (C.M.L.). M.N.W. is Vandervell Foundation Research Fellow at the Peninsula Medical School. C.M.L. is a University of Oxford Nuffield Department of Medicine Scientific Leader Fellow. C.N.A.P. and A.D.M. are supported by the Scottish executive as part of the Generation Scotland Initiative. We acknowledge the assistance of many colleagues involved in sample collection, phenotyping and DNA extraction in all the different studies. We thank K. Parnell, C. Kimber, A. Murray and K. Northstone for technical assistance. We thank S. Howell, M. Murphy and A. Wilson (Diabetes UK) for their long-term support for these studies. We also acknowledge the efforts of J. Collier, P. Robinson, S. Asquith and others at KBiosciences for their rapid and accurate large-scale genotyping. Finally, we acknowledge all participants in the various studies.
For the studies using the Scandinavian, US and Polish samples, the work was supported by a March of Dimes grant (#6-FY04-61) to J.N.H. and by a grant from The Center of Excellence in Complex Disease Genetics of the Academy of Finland (EU Projects GenomEUtwin, QLG2-CT-2002-01254) (L.P.). The FINRISK study was supported by the Sigrid Juselius Foundation. We thank members of our laboratories and of the Altshuler and Daly laboratories for helpful discussion, and we gratefully acknowledge all of the participants in the studies. We also thank J. Butler for excellent technical assistance and M. Kuokkanen for logistical help with the FINRISK97 cohort.
The whole-genome genotyping and analysis in the DGI genome scan (see Supplementary Note online for contributors) was supported by Novartis Institutes for BioMedical Research (to D. Altshuler), with additional support from The Richard and Susan Smith Family Foundation/American Diabetes Association Pinnacle Program Project Award (to D. Altshuler, J.N.H. and M.J. Daly). R.S. is supported by a US National Institutes of Health (NIH) Research Service Award. G.L. is supported by a March of Dimes research grant (6-FY04-61). Members of the DGI study group acknowledge support from an NIH/National Heart, Lung, and Blood Institute grant (U01 HG004171), the Burroughs Wellcome Fund and the Doris Duke Charitable Foundation. L.C.G. and members of the Botnia Study were funded by the Sigrid Juselius Foundation, the Finnish Diabetes Research Foundation, the Folkhalsan Research Foundation and Clinical Research Institute HUCH. The Malmö Study was funded by a Linné grant from the Swedish Research Council. L.C.G. is supported principally by the Sigrid Juselius Foundation, the Finnish Diabetes Research Foundation, The Folkhalsan Research Foundation and Clinical Research Institute HUCH. Work in Malmö, Sweden was also funded by a Linné grant from the Swedish Research Council. We thank the Botnia and Skara research teams for clinical contributions, and colleagues at Massachusetts General Hospital, Harvard, Broad and Novartis for discussions.