Some of the SNPs reported here fall in or near strong candidate height genes, such as HMGA2 and GDF5-UQCC, whose associations with height were recently described, whereas others identify previously unsuspected loci. Together, these associations highlight biological pathways that are important in regulating human growth.
Hedgehog-interacting protein (HHIP; rs1492820) is a transcriptional target and an antagonist of Hedgehog signaling; it binds with high affinity to the three mammalian Hedgehog proteins22. The mouse homolog, Hhip, is expressed in the perichondrium, including regions flanking Indian hedgehog (Ihh) expression in the appendicular and axial skeleton. Ectopic overexpression of Hhip in mouse cartilage causes severe skeletal defects, including short-limbed dwarfism, a feature reminiscent of the phenotype observed in Ihh
We identified several associated SNPs in or near genes related to chromatin structure. In addition to HMGA2, which encodes a chromatin-binding protein, we found associations with a SNP (rs10946808) in a histone cluster on chromosome 6, a SNP (rs12986413) in the histone methyltransferase DOT1L gene and a SNP (rs724016) in an intron of the methyl-DNA-binding transcriptional repressor gene ZBTB38. It is currently unclear how genetic variation at these loci modulates height, but there is a precedent for a connection between regulation of chromatin structure and stature: Sotos syndrome (MIM117550), characterized by extreme tall stature, is caused by mutations and deletions in the histone methyltransferase gene NSD1. It would be interesting to test whether the height variants at HMGA2, the chromosome 6 histone cluster, DOT1L and ZBTB38 modify clinical outcome in Sotos syndrome, or whether severe mutations in these genes, particularly DOT1L, could cause a Sotos syndrome-like phenotype.
That the variant rs1042725, strongly associated with adult and childhood height12, falls in the 3′ UTR of HMGA2 is notable in part because HMGA2 is the human gene with the greatest number of validated let-7 microRNA binding sites24,25. In fact, rs1042725 is 13 base pairs away from a let-7 site, suggesting a possible mechanism of action whereby the SNP alters microRNA binding and therefore expression of HMGA2. When we examined our list of 12 height loci, we were somewhat surprised to find three additional previously described targets of let-7: the cell cycle regulator CDK6, the histone methyltransferase DOT1L and the gene LIN28B
, a locus with a combined P = 1.2 × 10⁻⁶ in our study, also contains a predicted let-7 site. Thus, genes that influence height seem to be enriched for validated or potential let-7 targets: 5 of the 16 (31%) confirmed or suggestive loci associated with height have let-7 binding sites, compared with 2% of the genes in the human genome (Fisher’s exact test P = 3 × 10⁻⁵). Because microRNAs can co-regulate genes involved in the same biological process, it will be interesting to test whether the other targets of let-7, or let-7 itself, are regulators of adult height.
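The enrichment comparison above can be illustrated with a short sketch of a one-sided Fisher's exact test. The text gives only the proportions (5 of 16 loci versus 2% of genes), so the genome-wide counts used here (20,000 genes, 400 of them with let-7 sites) are assumptions chosen for illustration, and the function name is hypothetical. With these inputs the tail probability is on the order of 10⁻⁵, the same order of magnitude as the reported value, but it is not a reproduction of the original calculation.

```python
from math import comb


def fisher_exact_greater(k, n, K, N):
    """One-sided Fisher's exact test via the hypergeometric upper tail.

    N: total number of genes, K: genes carrying a let-7 site,
    n: number of height loci, k: height loci carrying a let-7 site.
    Returns P(X >= k) when n genes are drawn at random from N.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# ASSUMPTION: 20,000 genes in the genome, 2% (400) with let-7 sites.
# Only the 5-of-16 count and the 2% background rate come from the text.
p_value = fisher_exact_greater(k=5, n=16, K=400, N=20_000)
print(f"one-sided P = {p_value:.2e}")
```

A two-sided test or different background counts would shift the result somewhat, which is one reason the sketch is not expected to match the published figure exactly.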
There were also noteworthy candidate genes among the variants that showed strong but as yet less conclusive levels of significance for association with height in our meta-analysis of GWA studies and replication cohorts. A SNP (rs1662845) 28 kb upstream of PRKG2, which encodes the cGMP-dependent protein kinase II (cGKII), showed strong association with height in our meta-analysis of GWA scans (P = 5.7 × 10⁻⁵), and in the same direction in the European American height panel (P = 8.5 × 10⁻⁶) and the FUSION stage 2 sample (P = 0.001), but not in the FINRISK97 (P = 0.93, opposite direction) and PPP (P = 0.16, opposite direction) panels. This locus is a very strong candidate for a role in height variation. First, Prkg2−/− mice developed dwarfism that is caused by a severe defect in endochondral ossification at the growth plates29. Second, the naturally occurring Komeda miniature rat Ishikawa mutant, which has general longitudinal growth retardation, results from a deletion in the rat gene encoding cGKII30. Therefore, in rodents, it is clear that cGKII has a role in skeletal growth, acting as a molecular switch between chondrocyte proliferation and differentiation. We predict that larger replication studies will demonstrate that common genetic variation at the PRKG2 locus does contribute to height variation in humans, but it seems possible that there will also be heterogeneity among studies.
Several newly identified loci associated with height are located near genes with less immediately apparent connections to stature, including the G protein-coupled receptor gene GPR126, a locus that encompasses the thyroid hormone receptor interactor TRIP11 and the ataxin ATXN3 genes, a locus with the Huntingtin-interacting gene SH3GL3 and the glycoprotein metalloprotease gene ADAMTSL3 (the latter often mutated in colon cancer31), a locus with the gene CHCHD7, frequently fused to the PLAG1 oncogene in salivary gland adenomas32, and the epidermal retinal dehydrogenase 2 gene RDHE2. Because of LD (Supplementary Fig. 2), it is possible that the causal alleles at these loci are not located in these genes; fine-mapping in larger cohorts or in populations of different ancestry may be required to pinpoint the relevant gene and functional variant(s). Alternatively, these genes may themselves influence height, and further work will be needed to elucidate the relevant pathways and mechanisms.
We note that the accompanying manuscript by Weedon et al. identifies association with height for several of the loci reported in our study (ZBTB38, HMGA2, GDF5-UQCC, HHIP, SH3GL3-ADAMTSL3, CDK6), and reports, as we do, a suggestive association for a SNP at the FUBP3 locus (P = 7.5 × 10⁻⁷ in our study; P = 2.0 × 10⁻⁵ in Weedon et al.). FUBP3, a gene implicated in c-myc regulation, is therefore likely to represent an additional locus associated with height.
The variants associated with height that we validated do not have strong enough effects to generate detectable linkage signals33. Three of our loci lie under previously reported height linkage peaks8
, lod score 2.03; TRIP11-ATXN3, lod score 2.01; and CDK6, lod score 2.26. However, because 17.6% of the genome overlaps with a height linkage peak with lod score >2, the number of such co-localizations is not greater than expected by chance (3 observed versus 2.12 expected). It remains possible that some genes harbor both common and rare variants that influence height, so some overlaps may yet emerge between associated and linked loci that have a real genetic basis. Furthermore, regions of linkage may indicate the locations of rare variants or other types of genetic variation that are not well captured by our current association methods.
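The chance expectation quoted above can be checked with a simple binomial sketch. It assumes, for illustration only, that each of the 12 validated loci independently falls under a linkage peak with probability equal to the 17.6% genome coverage; the function and variable names are hypothetical. Under that assumption the expected count is about 2.1, close to the 2.12 reported (the small difference presumably reflects rounding of the coverage figure), and the probability of seeing three or more overlaps is roughly 0.36.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_loci = 12        # validated height loci (from the text)
coverage = 0.176   # fraction of genome under a linkage peak with lod > 2 (from the text)
observed = 3       # loci observed under a linkage peak (from the text)

# ASSUMPTION: loci fall under peaks independently, each with probability `coverage`.
expected = n_loci * coverage
p_at_least_observed = binomial_upper_tail(observed, n_loci, coverage)

print(f"expected overlaps = {expected:.2f}")
print(f"P(at least {observed} overlaps) = {p_at_least_observed:.2f}")
```

A probability this large is consistent with the statement that the observed co-localizations do not exceed chance.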
As expected, the estimated effect sizes in the GWA meta-analysis were generally larger than the effect sizes observed in replication samples, because of the well-known ‘winner’s curse’ phenomenon34. Perhaps less well appreciated is that the magnitude of the winner’s curse effect depends on the underlying distribution of effect sizes: the greater the number of variants with small effects, the more likely it is that one or more of these variants will approach genome-wide significance even in a study that is not well powered to detect these very modest effects35. Such variants will then prove difficult to convincingly replicate, unless very large replication cohorts are used. Thus, it is possible that even some of the initial associations that we failed to replicate will eventually be validated.
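The winner's curse described above can be illustrated with a small simulation. This is a minimal sketch, not the analysis used in this study: the standard error (0.15 cm) and the number of simulated studies are arbitrary assumptions, and the function name is hypothetical. Each simulated study estimates the same true effect with normal sampling noise, and only estimates that pass the significance threshold are retained, as happens when discovery hits are carried forward to replication.

```python
import random
from statistics import NormalDist, mean


def simulate_winners_curse(true_beta, se, alpha, n_studies, seed=0):
    """Compare the mean of all effect estimates with the mean of significant ones."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    estimates = [rng.gauss(true_beta, se) for _ in range(n_studies)]
    significant = [b for b in estimates if abs(b / se) > z_crit]
    mean_significant = mean(significant) if significant else None
    return mean(estimates), mean_significant, len(significant)


# ASSUMPTIONS: true effect 0.4 cm per allele, standard error 0.15 cm,
# threshold P = 1e-5, 20,000 simulated underpowered studies.
mean_all, mean_sig, n_sig = simulate_winners_curse(0.4, 0.15, 1e-5, 20_000)
print(f"mean of all estimates:         {mean_all:.3f} cm")
print(f"mean of significant estimates: {mean_sig:.3f} cm ({n_sig} studies)")
```

The mean over all studies stays close to the true effect, whereas the mean over the significant subset is inflated, because an estimate must exceed the threshold to be selected at all.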
Given the modest effect sizes observed for the validated variants associated with height (average = 0.4 cm per additional allele), it is not surprising that the quantile-quantile plots for the individual GWA studies are essentially indistinguishable from the null expectation. Indeed, we calculate that a study of 3,000 unrelated individuals has 1% power to detect a variant (minor allele frequency 10%) that increases height by 0.4 cm at a statistical threshold of P = 1 × 10⁻⁵. In comparison, a study of 16,000 individuals has 72% power to identify the same variant (in fact, there is a slight loss in power when using meta-analytic methods to combine results). Our discovery of valid associations by combining individual studies with nearly null P-value distributions highlights the importance of using large datasets to find common variants with small effects. When we remove the 12 validated height variants (and nearby correlated SNPs) from the meta-analysis results, the number of low P values still exceeds the null expectation (filled squares). Furthermore, the 10,000 SNPs with the best P values also showed excess evidence of association in an independent meta-analysis18, even when all loci known to be associated with height were excluded. These results indicate that there are other associations with common alleles yet to be discovered, but that our meta-analysis is not sufficiently powered to identify these associations because the effect sizes are small.
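The dependence of power on sample size can be sketched with a normal approximation for an additive quantitative-trait test. This is an illustration under stated assumptions, not the calculation used in this study: the residual standard deviation of height is not given in this section, so `sd_cm` is a free, hypothetical parameter, and the absolute power values depend strongly on it and will not necessarily match the 1% and 72% quoted above. What the sketch does show is the steep gain in power between 3,000 and 16,000 individuals.

```python
from math import sqrt
from statistics import NormalDist


def power_additive(n, maf, beta_cm, sd_cm, alpha):
    """Approximate power of a two-sided test for an additive SNP effect.

    Assumes Hardy-Weinberg equilibrium and a normally distributed test statistic.
    """
    # Fraction of trait variance explained by the SNP.
    var_explained = 2 * maf * (1 - maf) * beta_cm**2 / sd_cm**2
    expected_z = sqrt(n * var_explained)
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return (1 - std_normal.cdf(z_crit - expected_z)) + std_normal.cdf(-z_crit - expected_z)


# From the text: MAF 10%, effect 0.4 cm per allele, threshold P = 1e-5.
# ASSUMPTION: residual SD of height; two hypothetical values are shown.
for sd_cm in (4.5, 6.5):
    p_small = power_additive(3_000, 0.10, 0.4, sd_cm, 1e-5)
    p_large = power_additive(16_000, 0.10, 0.4, sd_cm, 1e-5)
    print(f"SD = {sd_cm} cm: power(n=3,000) = {p_small:.3f}, power(n=16,000) = {p_large:.3f}")
```

Whatever value of the standard deviation is assumed, the smaller study has very little power at this threshold, which is consistent with the nearly null P-value distributions of the individual GWA studies.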
Our results have several implications. First, they outline a role for multiple genes and biological pathways that were previously not known to regulate height, substantiating the ability of unbiased genetic approaches to yield new biological insights. The identification of these genes not only expands our knowledge of human growth but also promotes these genes as candidates for as yet unexplained syndromes of severe tall or short stature. Second, these findings convincingly confirm the polygenic nature of height, a classic complex trait, and demonstrate that, at least for this trait, increasingly large GWA studies can uncover increasing numbers of associated loci. Third, each variant makes only a small contribution to phenotypic variation (although determining the total contribution of each of the loci reported here requires much more comprehensive resequencing and genotyping); thus, either many hundreds of common variants influence complex traits such as height, or other genetic contributors (for example, gene-gene or gene-environment interactions, rare variants with large effects, or uncaptured genomic features such as structural polymorphisms) play a significant role, or both. In particular, because the quality-control criteria used in the GWA studies analyzed here would have removed SNPs affected by copy number polymorphisms, we cannot conclude anything regarding the role of these variants in adult height. With the development of new platforms and improved analytical tools applicable to large cohorts, it is likely that the role of common structural variants in human complex traits such as adult height will soon be elucidated. Finally, if height is indeed a good model for other complex traits, these results suggest that large meta-analyses of GWA studies will provide insights not only into human growth but also into the underlying biological mechanisms of common disease.