Human stature, as an important physical index in clinical practice and a usual covariate in gene mapping of complex disorders, is a highly heritable complex trait. To identify specific genes underlying stature, a genome-wide association study was performed in 1000 unrelated homogeneous Caucasian subjects using Affymetrix 500K arrays. A group of seven contiguous markers in the region of SBF2 gene (Set-binding factor 2) are associated with stature, significantly so at the genome-wide level after false discovery rate (FDR) correction (FDR q = 0.034–0.042). Three SNPs in another SNP group in the Filamin B (FLNB) gene were also associated with stature, significantly so with FDR q = 0.042–0.048. In follow-up independent replication studies, rs10734652 in the SBF2 gene was significantly (P = 0.036) and suggestively (P = 0.07) associated with stature in Caucasian families and 1306 unrelated Caucasian subjects, respectively, and rs9834312 in the FLNB gene was also associated with stature in such two independent Caucasian populations (P = 0.008 in unrelated sample and P = 0.049 in family sample). Particularly, additional significant replication association signals were detected in Chinese, an ethnic population different from Caucasian, between rs9834312 and stature in 619 unrelated northern Chinese subjects (P = 0.017), as well as between rs10734652 and stature in 2953 unrelated southern Chinese subjects (P = 0.048). This study also provides additional replication evidence for some of the already published stature loci. These results, together with the known functional relevance of the SBF2 and FLNB genes to skeletal linear growth and bone formation, support that two regions containing FLNB and SBF2 genes are two novel loci underlying stature variation.
Human growth is a highly complicated process, and human height is often recorded as an important physical index to reflect the processes of growth and development in clinical practice. Adult stature has significant genetic determination, is relatively stable and can be measured easily and accurately with little phenotyping error. Consequently, it is commonly used as a classical example for the genetic study of complex traits, with the goal of providing novel insights into genetic mechanisms that may be generally applicable to other human complex traits/diseases.
The heritability of human stature is generally above 0.75 (1–3). Recent technological advances in single nucleotide polymorphism (SNP) genotyping, particularly microarray technology, have provided platforms that assay hundreds of thousands of SNPs simultaneously. These technological platforms provide powerful tools for genome-wide association scan (GWAS) studies to rapidly and systematically identify/confirm functional loci underlying human stature variation. Recent GWAS studies have identified several common genetic variants associated with human stature variation (4–8). Collectively, however, these genetic variants explain <10% of the population variation in stature. Other loci influencing variations of human stature are largely unknown.
To identify additional potential loci underlying variation of human stature, we performed a GWAS study on 1000 unrelated Caucasian subjects using highly dense Affymetrix 500K SNP arrays that examined ~500 000 SNPs with a relatively even distribution across the entire human genome. In this study, we identified two novel loci associated with human stature by the initial GWAS study and by follow-up replication studies with four independent populations.
The basic characteristics of subjects for the GWAS study are listed in Table 1. We created a quantile–quantile (Q–Q) plot for the distribution of P-values involving 379 319 eligible SNPs in our sample (Fig. 1). The observed P-values for stature matched the expected P-values over the range of 1 < −log₁₀(P) < 4.0. Departure from the expected distribution was observed only at the extreme tail (−log₁₀(P) > 4.0) of the distribution of test statistics for stature, suggesting that the associations identified are likely due to true variants rather than potential biases such as genotyping error.
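The construction of such a Q–Q plot can be sketched as follows. This is a minimal illustration, not the procedure used to draw Fig. 1: the function name `qq_points` and the plotting position (rank − 0.5)/n are assumptions, and the input is toy data.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs for a Q-Q plot."""
    n = len(p_values)
    observed = sorted(p_values)  # smallest P-value first
    points = []
    for rank, p in enumerate(observed, start=1):
        # Assumed plotting position for the rank-th smallest P-value
        # under the uniform null distribution.
        expected_p = (rank - 0.5) / n
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points

# Toy input: P-values that follow the null exactly fall on the diagonal.
null_like = [(i - 0.5) / 8 for i in range(1, 9)]
for expected, observed in qq_points(null_like):
    print(f"{expected:.3f}  {observed:.3f}")
```

Points that rise above the diagonal at large expected values correspond to the departure at the extreme tail described above.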
We observed strong statistical signals (P < 10⁻⁴) of association for stature with 134 SNPs. Of these 134 SNPs, 108 were more or less randomly distributed across the genome. The remaining 26 SNPs, however, were relatively tightly clustered into two distinct regions, and these two regions will be the focus of the next several paragraphs. We found that a group of 18 contiguous SNPs (namely, SNP group 1) spanning a ~230 kb region harboring the set-binding factor 2 gene (SBF2) had raw point-wise testing P-values < 0.001, ranging from 6.53 × 10⁻⁴ to 1.58 × 10⁻⁶. After false discovery rate (FDR) correction for multiple testing, seven contiguous SNPs (rs12288355, rs7119000, rs11042617, rs10734652, rs11042666, rs1867138 and rs2920151) were significantly (q = 0.034–0.042) associated with human stature (Table 2). We also found a distinct group of contiguous SNPs (namely, SNP group 2) in the region of the Filamin B (FLNB) gene, containing eight SNPs with P-values ranging from 5.94 × 10⁻⁴ to 1.04 × 10⁻⁵. Among them, three SNPs (rs1718460, rs1658342 and rs839232) were individually significantly associated with human stature, with genome-wide FDR q values ranging from 0.042 to 0.048 (Table 2). We also compared the associations reported in Table 2, calculated by HelixTree, with those obtained by PLINK using the same data set. Overall, the association signals from the two software packages are consistent and comparable, except for a few SNPs whose P-values differ by an order of magnitude.
Figure 2 demonstrates the linkage disequilibrium (LD) patterns of 18 SNPs in SNP group 1 and their haplotype block structure, which were analyzed and plotted by the Haploview program (9). We found that LD signals within this SNP group were generally quite strong. The weak LD between rs1867138 and rs1372809 breaks the SNP group into two haplotype blocks (Fig. 2). The first block (namely block 1), containing 7 SNPs, covers a ~150 kb genomic region, and the other block (namely block 2), containing 11 SNPs, spans a ~80 kb region. As shown in Figure 2, the two haplotype blocks were highly significantly associated with human stature, with raw P-values of 3.99 × 10⁻⁶ and 5.22 × 10⁻⁷ for blocks 1 and 2, respectively. The SNPs in the region of FLNB gene (SNP group 2) also formed two haplotype blocks (namely, block 3 and block 4) that were strongly associated with human stature (P = 0.0012 for block 3 and P = 0.011 for block 4) (Fig. 3).
The most significant SNP, rs1867138 in block 1, is located in intron 1 of the SBF2 gene. The distribution of stature in Caucasians for different genotypes of rs1867138 in the GWAS study is shown in Figure 4. Subjects who were homozygous TT at rs1867138 were taller (by 2.65 cm) than those who were homozygous CC.
To test for potential population stratification in our sample, we randomly selected 200 unlinked markers to cluster our subjects, and found that all 1000 subjects were tightly clustered together. When 2000 and 10 000 markers were used, under all the assigned values of k, the vast majority (>98%) of subjects were tightly clustered together. The ‘inflation factor’ λ calculated by Genomic Control (10) is 1.007, indicating that potential population stratification in this homogeneous US Caucasian population is minimal. We also performed association analyses using the principal component analysis method implemented in EIGENSTRAT (11). The analyses by EIGENSTRAT confirmed, qualitatively, our main results presented above (Table 2).
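The inflation factor can be illustrated with a short sketch. It assumes that each P-value comes from a 1-df test and uses the common median-based estimator (median observed chi-square divided by the null median); the text does not state which variant of Genomic Control was applied, and the function name `inflation_factor` is hypothetical.

```python
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution (about 0.455), obtained from
# the standard normal quantile at 0.75.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def inflation_factor(p_values):
    """Median-based genomic-control lambda; P-values must lie in (0, 1]."""
    z = NormalDist()
    # Convert each two-sided P-value back to a 1-df chi-square statistic.
    chi2 = [z.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

# Toy input: evenly spaced P-values mimic a null distribution.
uniform_grid = [i / 1000 for i in range(1, 1000)]
print(round(inflation_factor(uniform_grid), 3))
```

A value close to 1, such as the 1.007 reported above, indicates little inflation of the test statistics.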
Based on the strong statistical evidence presented above, along with biological evidence to be detailed in the Discussion, four SNPs (rs10734652, rs1867138, rs11607174 and rs9834312) were selected to test their replication association with stature in four additional and independent populations (Table 3). In these replication studies, rs10734652 in the SBF2 gene was significantly (P = 0.036) and suggestively (P = 0.07) associated with stature in Caucasian family population (n = 1972) and unrelated Caucasian population (n = 1306), respectively, and rs9834312 in the FLNB gene was also associated with stature in such two independent Caucasian samples (P = 0.008 in unrelated sample, and P = 0.049 in family sample). Particularly, additional significant replication association signals were detected in Chinese, an ethnic population different from Caucasian, between rs9834312 and stature in 619 unrelated northern Chinese subjects (P = 0.017), as well as between rs10734652 and stature in 2953 unrelated southern Chinese subjects (P = 0.048). All the association signals for each SNP in either initial GWAS or follow-up replication studies are in the same direction. In particular, although the allele with minor frequency for rs9834312 in Caucasian (allele A) versus Chinese (allele G) populations differed, which suggested differences in ethnic genetic backgrounds in such two populations, the direction of the association effect was still the same, i.e. subjects homozygous for allele A had lower stature than those with the GG genotype in both populations. Importantly, Fisher’s combined p analyses (12), which combined the P-values from association tests in the study populations, showed that rs10734652 and rs9834312 have significant combined P-values either from the entire five study populations or from the four replication study populations (Table 3).
Potential functional analyses using the FASTSNP program (13) suggested that 12 SNPs in the genomic region of SNP groups 1 and 2 (rs12288355, rs10734652, rs4323860 and rs1867138 in block 1 and rs1372809, rs11042702, rs7108358, rs6484147, rs11042714, rs10500724, rs11042717 and rs11607174 in block 2) were possible transcriptional binding sites for intronic enhancers. Additionally, three SNPs in the genomic region of SNP group 2 (rs865726, rs839232 and rs3772993) may act as intronic enhancers. Therefore, these SNPs may regulate transcription by altering binding sites for transcription factors or by increasing or decreasing the affinity of binding for transcription factors.
Using the genotyped and imputed genotypes in our sample of 1000 Caucasian subjects with GWAS data, we analyzed the associations between stature and 58 variants identified in previous GWAS studies. We confirmed 13 variants associated with stature in our sample (e.g. rs1042725 in HMGA2 gene, P = 0.014), and the associated signal P-values ranged from 0.007 to 0.05 (Table 4). However, the associations of about 45 SNPs were not confirmed in the present study (Supplementary Material, Table S1).
The present GWAS study represents an effort to detect additional genes underlying human stature variation and replication evidence for previously identified stature loci. The most important result from this study is the finding that two genomic regions containing SBF2 and FLNB appear to be two novel loci that determine variation of stature. This study also provides additional replication evidence for some of the already published stature loci.
Previous GWAS studies of human stature have identified genetic variants that, collectively, explain <10% of the population variation in stature; thus most of the genetic basis for variations in stature remains unexplained. Results of the current GWAS study were able to confirm approximately a dozen previously identified genetic variants for their associations with stature (4–8), but a large number of previously identified genetic variants failed to be replicated for their associations with stature. Most of the previously published loci explained <0.3% of stature variation. Using a threshold P = 0.05, the statistical power in a sample size of 1000, estimated by the software Genetic Power Calculator (http://pngu.mgh.harvard.edu/~purcell/gpc/qtlassoc.html), is <50% for detecting a gene that accounts for 0.3% of stature variation. Therefore, the most likely reason why so many loci did not replicate is a lack of robust statistical power. Another potential explanation for such failure to replicate is that some of the previously identified loci may be population specific.
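The order of magnitude of this power estimate can be checked with a simple approximation. The sketch below is not the algorithm of the Genetic Power Calculator; it assumes a 1-df test whose statistic is approximately normal with non-centrality n·h²/(1 − h²), where h² is the fraction of variance explained, and the function name `qtl_power` is hypothetical.

```python
from statistics import NormalDist

def qtl_power(n, variance_explained, alpha):
    """Approximate two-sided power of a 1-df association test for a QTL."""
    z = NormalDist()
    # Assumed non-centrality parameter of the 1-df chi-square test.
    ncp = n * variance_explained / (1.0 - variance_explained)
    shift = ncp ** 0.5
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the shifted normal statistic exceeds the
    # critical value in either tail.
    return (1.0 - z.cdf(crit - shift)) + z.cdf(-crit - shift)

# Settings quoted in the text: n = 1000, 0.3% of variance, P = 0.05.
print(f"{qtl_power(1000, 0.003, 0.05):.2f}")
```

Under this approximation the power stays below 50%, in line with the estimate quoted above.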
We compared the associations for the four SNPs selected for replication studies (listed in Table 3) with the freely available height association results deposited by the Wellcome Trust Case Control Consortium (WTCCC) and published on-line from the British 1958 Birth Cohort DNA Collection (www.b58cgene.sgul.ac.uk/), but the signals for these SNPs are negative [P = 0.92 (β = 0.0248), 0.88 (β = −0.0466), 0.61 (β = −0.1219) and 0.59 (β = −0.1352), and one-tailed P = 0.46, 0.56, 0.68 and 0.70, for rs10734652, rs1867138, rs11607174 and rs9834312, respectively]. Several factors could account for this lack of replication. First, the effect sizes of variants observed in our data may be very small and thus easily lead to failure of replication. Using the observed one-tailed P-values in the public data as thresholds, the estimated power (by the software Genetic Power Calculator) is >91.7% to replicate a variant that accounts for 0.3% of stature variation. However, our estimated effect sizes for the identified markers in our initial GWAS study are from 0.43 to 1.04% (Table 3). Therefore, the public data seem to suggest that our effect sizes are overestimated. Secondly, the difference in gene–gene or gene–environment interactions between the two data sets may result in inconsistency in replication. Thirdly, the association results for the four SNPs have a 3.4–8% chance of being false positives, as the estimated FDR q values are 0.034, 0.034, 0.071 and 0.08 for rs10734652, rs1867138, rs11607174 and rs9834312, respectively, at the genome-wide level (Table 2).
Since gender is an important factor influencing stature variation, we performed gender-specific GWAS analyses (data not shown). The association results in the total sample can be generally replicated by the results in each gender group. However, the association signals are generally weaker, which may be largely due to the smaller sample sizes in gender-specific analyses, leading to lower statistical power. Additionally, in the association analyses we have used gender as an important covariate to correct for its potential confounding effects on human stature.
A cluster of eight contiguous SNPs in the FLNB gene had P-values between 5.94 × 10⁻⁴ and 1.04 × 10⁻⁵, and three of these SNPs were significantly associated with human stature. FLNB has previously been shown to regulate intracellular signaling pathways associated with skeletal development (14–16). Interestingly, mutations in the FLNB gene have been found to cause four human skeletal disorders (spondylocarpotarsal syndrome, autosomal dominant Larsen syndrome, type I atelosteogenesis and type III atelosteogenesis), characterized by a wide diversity of skeletal abnormalities, including short stature, block fusions, epiphyseal delay, disharmonious bone mineralization, etc. (14). Further, functional studies observed strong FLNB expression in condensing chondrocytes within vertebral bodies of sectioned embryos, and in the epiphyseal growth plate (14). Another study confirmed that mutations of FLNB may cause chondrocyte defects in skeletal development (15). These findings indicate that FLNB plays a pivotal role in vertebral patterning and skeletal morphogenesis. With further support from the current significant association results and from previous linkage results (17), it appears highly likely that FLNB is a novel gene involved in regulating human stature.
Another cluster of 18 contiguous SNPs in the region containing SBF2 had raw point-wise testing P-values < 0.001, and each of seven contiguous SNPs in this region was significantly associated with human stature. SBF2 is a member of the myotubularin-related protein family. To the best of our knowledge, no studies have demonstrated direct relevance of SBF2 to bone growth or developmental processes, both apparently associated with stature. However, mutation of the SBF2 gene has been shown to cause an autosomal recessive Charcot–Marie–Tooth disease type 4B, characterized by foot deformities and distal muscle weakness and atrophy (18); this suggests possible involvement of the SBF2 gene in biological processes related to bone and muscle growth. This observation, together with the present significant association findings and linkage evidence from a previous study (17), supports the concept that the SBF2 gene is a novel candidate gene underlying human stature.
In summary, we identified two novel genomic regions of about 230 and 160 kb, containing the SBF2 and FLNB genes, which were both significantly associated with human stature in Caucasians. These associations were supported by four independent replication studies in Caucasian and even in Chinese, an ethnic population different from Caucasian. These results, together with the known functions of FLNB and SBF2 genes related to growth processes, support that the two regions containing FLNB and SBF2 genes are two novel loci underlying human stature. Based on our findings, further identification of potential causative variants in the two novel loci will be pursued via genotyping denser SNPs or re-sequencing the novel genomic region containing the genes, plus potential in-depth functional studies.
The study was approved by the necessary Institutional Review Board or Research Administration of the involved institutions. Signed informed-consent documents were obtained from all study participants before entering the study.
A total of 1000 random samples (age: 50.3 ± 18.3 years) were identified from our established and expanding database containing more than 7000 subjects. All of the identified subjects were US Caucasians of European origin, living in mid-western US in Omaha, NE.
Genomic DNA was extracted from whole human blood using a commercial isolation kit (Gentra Systems, Minneapolis, MN, USA) according to the protocols of the kit. Genotyping with the Affymetrix Mapping 250k Nsp and Affymetrix Mapping 250k Sty arrays was performed at the Vanderbilt Microarray Shared Resource at Vanderbilt University Medical Center, Nashville, TN, using the standard protocol recommended by the manufacturer. Fluorescence intensities were quantified using an Affymetrix array scanner 3000 7G. Data management and analyses were performed using the Affymetrix GeneChip Operating System. Genotyping calls were determined from fluorescent intensities using the DM algorithm with a 0.33 P-value setting (19) as well as the B-RLMM algorithm (20). DM calls were used for quality control while the B-RLMM calls were used for all subsequent data analysis. B-RLMM clustering was performed with 94 samples per cluster.
According to Affymetrix's guidelines, a DM call rate of 93% was used as the quality control (QC) criterion in our genotyping experiment. Specifically, subjects with a DM call rate <93% were subject to re-genotyping. Finally, 99% of all the subjects passed this QC standard. The final average B-RLMM call rate across the entire sample reached the high level of 99.14%. However, out of the initial full set of 500 568 SNPs, we discarded 32 961 SNPs with sample call rates <95%, another 36 965 SNPs with allele frequencies deviating extremely from Hardy–Weinberg equilibrium (P < 0.001) and 51 323 SNPs with minor allele frequency (MAF) <1%. Therefore, the final SNP set maintained in the subsequent analyses contained 379 319 SNPs, yielding an average marker spacing of ~7.9 kb throughout the human genome.
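The SNP bookkeeping above can be verified with a few lines of arithmetic. The counts are taken from the text; the genome length of about 3.0 × 10⁹ bp used for the spacing is an assumption, since the text does not state the value used.

```python
initial_snps = 500_568

# SNPs discarded at each QC step, as reported in the text.
removed = {
    "sample call rate < 95%": 32_961,
    "HWE P < 0.001": 36_965,
    "MAF < 1%": 51_323,
}

final_snps = initial_snps - sum(removed.values())

# Assumption: approximate human genome length, not given in the text.
GENOME_LENGTH_BP = 3.0e9
spacing_kb = GENOME_LENGTH_BP / final_snps / 1000

print(final_snps, round(spacing_kb, 1))
```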
For the replication studies, the SNP genotyping success rate was >97%, and the duplicate concordance rate was >99% in each individual study.
Gender and age, two significant covariates, were used to adjust the raw stature values for subsequent analyses. HelixTree 5.3.1 (Golden Helix, Bozeman, MT) was used to perform genotypic association analyses and haplotype association analyses. Genotypic association analyses were used to compare the difference of mean stature values among three genotypic groups for each SNP. Haplotype association or block association detected the different mean stature values among haplotype groups formed from a group of SNPs.
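A comparison of mean adjusted stature among the three genotype groups can be sketched as a one-way ANOVA. This is an assumption made for illustration: the exact test implemented in HelixTree is not described here, the function name `one_way_anova_f` is hypothetical, and the residuals below are toy values.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of value groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy covariate-adjusted stature residuals (cm) for genotypes CC, CT, TT.
residuals_by_genotype = [
    [-1.0, 0.0, 1.0],
    [0.0, 1.0, 2.0],
    [1.0, 2.0, 3.0],
]
print(one_way_anova_f(residuals_by_genotype))
```

The F statistic would then be referred to an F distribution with the returned degrees of freedom to obtain a P-value.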
The LD [standardized D’ (D/Dmax)] patterns for genes of interest were analyzed and plotted using the Haploview program (9) (http://www.broad.mit.edu/mpg/haploview/). Focused association analyses on certain interesting SNPs, as well as other statistical analyses, were performed using the software packages SAS (SAS Institute Inc., Cary, NC) and Minitab (Minitab Inc., State College, PA).
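For reference, the standardized coefficient D’ = D/Dmax for a pair of biallelic SNPs can be computed as in the sketch below. It assumes that the haplotype frequency is already known (in practice it is estimated from unphased genotypes), reports the absolute value by convention, and uses the hypothetical name `d_prime`.

```python
def d_prime(p_ab, p_a, p_b):
    """Absolute standardized LD coefficient |D'| for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2)
    """
    d = p_ab - p_a * p_b
    # Dmax depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return abs(d) / d_max

print(d_prime(0.5, 0.5, 0.5))   # complete LD
print(d_prime(0.25, 0.5, 0.5))  # linkage equilibrium
```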
We used the software QVALUE (http://genomine.org/qvalue/) (23) to calculate a FDR-based q value to measure the statistical significance at the genome-wide level for association results. The cutoff of significant association at the whole genome level was set at q value <0.05. We did not use Bonferroni correction in this GWAS study because it is overly conservative for multiple-testing adjustment in a GWAS study.
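The idea of an FDR-based adjustment can be illustrated with the Benjamini-Hochberg procedure. This is a simplified stand-in, not the QVALUE method: QVALUE additionally estimates the proportion of true null hypotheses, whereas the sketch below corresponds to fixing that proportion at 1 and is therefore somewhat more conservative. The function name `bh_adjust` and the input are illustrative.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy input: four raw P-values.
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

SNPs whose adjusted value falls below 0.05 would be declared significant under the cutoff stated above.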
To detect population stratification that may lead to spurious association results, we used the software Structure 2.2 (http://pritch.bsd.uchicago.edu/software.html) to investigate the potential substructure of our sample. The program uses a Markov chain Monte Carlo algorithm to cluster individuals into different cryptic sub-populations on the basis of multi-locus genotype data (24). To ensure the robustness of our results, we performed nine independent analyses under each combination of two conditions that included the assumed number of population strata, k, (set at 2, 3 or 4), and three different sets of un-linked markers (containing 200, 2000 or 10 000 markers) selected randomly genome-wide. EIGENSTRAT was employed to perform principal component analysis to correct for stratification in genome-wide association studies. We used ~370 000 SNPs to calculate the principal components and the 10 default main eigenvectors were used in the association analysis with the EIGENSTRAT program (11).
Different genotyping platforms were used in our current GWAS study and in previous GWAS studies for stature. To compare the associations at the same SNPs, we imputed genotypes for ~2 500 000 HapMap SNPs in our GWAS sample based upon a set of known haplotypes and an estimated fine-scale recombination map using the program IMPUTE (25). The imputed genotype for each SNP was expressed as a genotype probability. Association analysis was performed between the imputed SNPs (genotype dosage score) and stature, with sex and age as covariates, using the program SNPTEST (25).
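A genotype dosage score can be derived from the three imputed genotype probabilities as the expected count of one allele. The sketch below assumes this common definition; the exact coding used by SNPTEST is not described here, and the function name `dosage` is hypothetical.

```python
def dosage(prob_aa, prob_ab, prob_bb):
    """Expected number of B alleles given the genotype probabilities."""
    total = prob_aa + prob_ab + prob_bb
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    # Heterozygotes carry one B allele, BB homozygotes carry two.
    return prob_ab + 2.0 * prob_bb

# Toy imputed genotype: mostly BB, with some uncertainty.
print(dosage(0.1, 0.3, 0.6))
```

The resulting score ranges from 0 to 2 and can be used as a quantitative predictor in a regression on stature.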
In the Caucasian family sample, we conducted the family-based association test (FBAT) (www.biostat.harvard.edu/~fbat/) of the SNPs identified in the GWAS for their association with the quantitative trait stature residuals adjusted by significant covariates (age and sex). In other replication study populations, we tested the associations between genotypes and stature by the likelihood ratio test and the Wald test using age and sex as covariates. Fisher’s combined p method (12) was used to combine our association tests done individually using the formula:
$$X^2 = -2 \sum_{i=1}^{k} \ln(p_i),$$
where $p_i$ is the P-value of the $i$-th test, 2k is the degrees of freedom of the $X^2$ statistic and k is the number of tests being combined.
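Fisher's method sums −2 ln(p) over the k tests and refers the total to a chi-square distribution with 2k degrees of freedom. A minimal sketch follows; it uses the closed-form upper tail that exists for an even number of degrees of freedom, and the function name `fisher_combined_p` is hypothetical. The input holds three of the replication P-values quoted in the Results for rs9834312; the remaining values are in Table 3 and are not reproduced here, so the output does not reproduce the published combined P-value.

```python
import math

def fisher_combined_p(p_values):
    """Return (X2, combined P) for k independent P-values."""
    k = len(p_values)
    x2 = -2.0 * sum(math.log(p) for p in p_values)
    half = x2 / 2.0
    # Upper tail of a chi-square with 2k df:
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, tail = 1.0, 0.0
    for j in range(k):
        tail += term
        term *= half / (j + 1)
    return x2, math.exp(-half) * tail

# P-values for rs9834312 quoted in the text (two Caucasian samples and
# the northern Chinese sample).
x2, combined = fisher_combined_p([0.008, 0.049, 0.017])
print(f"X2 = {x2:.2f}, combined P = {combined:.2e}")
```

The method assumes that the combined tests are independent, which holds here because the study populations do not overlap.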
Conflict of Interest statement. None declared.
Investigators of this work were partially supported by grants from the NIH (R01 AR050496-01, R21 AG027110, R01 AG026564, and P50 AR055081). The study also benefited from grants from the National Science Foundation of China (30600364), Huo Ying Dong Education Foundation, HuNan Province, Xi’an Jiaotong University and the Ministry of Education of China. We acknowledge use of genotype data from the British 1958 Birth Cohort DNA collection, funded by the Medical Research Council grant G0000934 and the Wellcome Trust grant 068545/Z/02.