BMC Proc. 2011; 5(Suppl 9): S34.
Published online 2011 November 29. doi:  10.1186/1753-6561-5-S9-S34
PMCID: PMC3287870
Estimating heritability using family and unrelated individuals data
Priya B Shetty,1 Huaizhen Qin,1 Junghyun Namkung,1 Robert C Elston,1 and Xiaofeng Zhu1 (corresponding author)
1Case Western Reserve University School of Medicine, 2103 Cornell Road, Cleveland, OH 44106, USA
Priya B Shetty: priya.shetty/at/case.edu; Huaizhen Qin: hxq21/at/case.edu; Junghyun Namkung: jxn138/at/case.edu; Robert C Elston: robert.elston/at/case.edu; Xiaofeng Zhu: zhu1/at/darwin.epbi.cwru.edu
Supplement
Genetic Analysis Workshop 17: Unraveling Human Exome Data
S Ghosh, H Bickeböller, J Bailey, JE Bailey-Wilson, R Cantor, W Daw, AL DeStefano, CD Engelman, A Hinrichs, J Houwing-Duistermaat, IR König, J Kent Jr., N Pankratz, A Paterson, E Pugh, Y Sun, A Thomas, N Tintle, X Zhu, JW MacCluer and L Almasy
Conference
Genetic Analysis Workshop 17
13-16 October 2010
Boston, MA, USA
For the family data from Genetic Analysis Workshop 17, we obtained heritability estimates of quantitative traits Q1 and Q4 using the ASSOC program in the S.A.G.E. software package. ASSOC is a family-based method that estimates heritability through the estimation of variance components. The covariate-adjusted mean heritability was 0.650 for Q1 and 0.745 for Q4. For the unrelated individuals data, we estimated the heritability of Q1 as the proportion of total variance that can be accounted for by all single-nucleotide polymorphisms under an additive model. We examined a novel ordinary least-squares method, a naïve restricted maximum-likelihood method, and a calibrated restricted maximum-likelihood method. We applied the different methods to all 200 replicates for Q1. We observed that the ordinary least-squares method yielded many estimates outside the interval [0, 1]. The restricted maximum-likelihood estimates were more stable than the ordinary least-squares estimates. The naïve restricted maximum-likelihood method yielded an average estimate of 0.462 ± 0.1, and the calibrated restricted maximum-likelihood method yielded an average of 0.535 ± 0.121. Our results demonstrate discrepancies in heritability estimates using the family data and the unrelated individuals data.
The heritability of a trait is usually estimated from family data. For most complex traits, the genetic variants identified through genome-wide association studies account for only a small portion of the heritability estimated from family data [1]. This discrepancy, the missing heritability, is of great interest because its sources are still unknown [1]. Recently, Yang et al. [2], using a novel statistical method, suggested that the missing heritability can be recovered using the genome-wide associations of unrelated samples. Because the Genetic Analysis Workshop 17 (GAW17) data set included family data and unrelated individuals data for the same traits [3], we estimated the heritability of Q1 with the unrelated individuals data and the heritability of Q1 and Q4 with the family data.
For the family data, the heritability is the narrow-sense heritability, estimated with a polygenic effect model; we applied a George-Elston transformation [4] to the trait before estimating the heritability. For the unrelated data, the heritability is the proportion of the total variance in a phenotype that can be explained by all single-nucleotide polymorphisms (SNPs) under an additive model; we estimated it using the ordinary least-squares (OLS) method suggested by Yang et al. [2], a naïve restricted maximum-likelihood (REML) method, and a calibrated REML method. In all our analyses, the heritability estimates were obtained after adjustments for age, sex, and smoking status.
PEDINFO and ASSOC
For the family data, we analyzed quantitative traits Q1 and Q4 in four randomly selected data set replicates (Table 1). We used the PEDINFO and ASSOC programs of the Statistical Analysis for Genetic Epidemiology (S.A.G.E.) software package. The PEDINFO program calculates summary statistics for the family data set. The ASSOC program performs a family-based association test using a polygenic mixed effects model for a quantitative trait, and it estimates the heritability as the proportion of the total trait variance attributable to the polygenic component. In our analysis, the heritability estimates were obtained after adjustments for age, sex, and smoking status. The George-Elston transformation was applied to normalize the residual distribution [4]. We did not include any genotype variables in the model.
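As a rough illustration of what "the proportion of the total trait variance attributable to the polygenic component" means, the sketch below builds the trait covariance implied by a generic polygenic model for one nuclear family, using the standard relationship coefficients (twice the kinship coefficient). It is not the ASSOC implementation; the function names, the family, and the variance values are hypothetical.

```python
import numpy as np

def polygenic_covariance(relationship, var_polygenic, var_residual):
    """Trait covariance under a simple polygenic model:
    V = R * var_polygenic + I * var_residual, where R = 2 * kinship."""
    relationship = np.asarray(relationship, dtype=float)
    n = relationship.shape[0]
    return relationship * var_polygenic + np.eye(n) * var_residual

def narrow_sense_heritability(var_polygenic, var_residual):
    """Proportion of the total trait variance due to the polygenic component."""
    return var_polygenic / (var_polygenic + var_residual)

# Hypothetical nuclear family: father, mother, and two full siblings.
# Entries are expected relationship coefficients (2 * kinship).
R = np.array([
    [1.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.5, 0.5],
    [0.5, 0.5, 1.0, 0.5],
    [0.5, 0.5, 0.5, 1.0],
])

# Hypothetical variance components, chosen only for illustration.
V = polygenic_covariance(R, var_polygenic=0.65, var_residual=0.35)
h2 = narrow_sense_heritability(0.65, 0.35)
```

In a variance-components analysis the two variances are unknown and are estimated from the observed trait covariance among relatives; the ratio is then reported as the heritability.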
Table 1
Heritability estimates for Q1 and Q4 using the family data
OLS and REML estimates
For the unrelated data, we used the OLS method suggested by Yang et al. [2] and the two REML methods to estimate the heritability of Q1 with all 200 data set replicates. Here, the heritability refers to the proportion of the variance in Q1 that can be accounted for by all SNPs under an additive model [2]. We fitted the mixed effects model:
$$y = X\beta + Zu + \varepsilon \qquad (1)$$
where $y = (y_1, \ldots, y_n)'$ consists of trait values of n unrelated individuals, $X$ is the design matrix whose ith row is $(1, x_i)$, where $x_i = (x_{i1}, \ldots, x_{i3})$ consists of the sex, age, and smoking status of the ith individual, respectively, $\beta = (\beta_0, \beta_1, \ldots, \beta_3)'$ consists of the effect sizes of the covariates, $Z = (z_1', \ldots, z_n')'$ summarizes genotype data of m unknown causal variants such that $z_i = (z_{i1}, \ldots, z_{im})$, and $z_{ij} = -2f_j/\sqrt{2f_j(1-f_j)}$, $(1-2f_j)/\sqrt{2f_j(1-f_j)}$, or $2(1-f_j)/\sqrt{2f_j(1-f_j)}$ if the genotype of the ith individual at the jth causal variant is aa, aA, or AA, respectively, $f_j$ is the frequency of allele A, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$ is the vector of residuals. Here the prime indicates the transpose of a vector or matrix.
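The genotype coding in this model can be sketched in a few lines. The example assumes the standardization of Yang et al. [2], in which the allele count (0, 1, or 2 copies of allele A) is centered at twice the allele frequency and scaled by the binomial standard deviation, and it forms a relationship matrix as ZZ'/m. The function names, the simulated genotypes, and the decision to drop monomorphic SNPs are assumptions of this sketch, not part of the original analysis.

```python
import numpy as np

def standardize_genotypes(counts):
    """Standardize allele counts (0, 1, 2 copies of allele A) per SNP.

    z_ij = (x_ij - 2 f_j) / sqrt(2 f_j (1 - f_j)), with f_j estimated from
    the sample. Monomorphic SNPs are dropped (an assumption of this sketch).
    """
    counts = np.asarray(counts, dtype=float)
    freq = counts.mean(axis=0) / 2.0
    keep = (freq > 0.0) & (freq < 1.0)
    counts, freq = counts[:, keep], freq[keep]
    return (counts - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))

def relationship_matrix(z):
    """Genetic relationship matrix ZZ'/m from standardized genotypes."""
    return z @ z.T / z.shape[1]

# Simulated genotypes for 100 individuals and 500 SNPs (hypothetical data).
rng = np.random.default_rng(0)
true_freq = rng.uniform(0.05, 0.5, size=500)
genotypes = rng.binomial(2, true_freq, size=(100, 500))

Z = standardize_genotypes(genotypes)
A = relationship_matrix(Z)
```

Because the scaling divides by the square root of 2f(1 - f), SNPs with very low allele frequency receive large weights, so a relationship matrix built mostly from rare variants can be noisy.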
Let the effects of the m causal variants be:
$$u \sim N(0, \sigma_u^2 I_m) \qquad (2)$$
where $\sigma_u^2$ is the variance of the causal variant effects, and let the residuals be:
$$\varepsilon \sim N(0, \sigma_\varepsilon^2 I_n) \qquad (3)$$
where $\sigma_\varepsilon^2$ is the residual variance and $I_n$ is the identity matrix of order n.
Then the variance-covariance matrix of y is:
$$\operatorname{Var}(y) = ZZ'\sigma_u^2 + I_n\sigma_\varepsilon^2 = G\sigma_g^2 + I_n\sigma_\varepsilon^2 \qquad (4)$$
where $G = ZZ'/m$ is the genetic relationship matrix of causal SNPs and $\sigma_g^2 = m\sigma_u^2$. Let X have rank r (= 4 for the GAW17 unrelated individuals data), and let $P = (p_1, \ldots, p_{n-r})$, where $p_1, \ldots, p_{n-r}$ are all orthogonal unit eigenvectors corresponding to eigenvalue 1 of the idempotent matrix $I_n - X(X'X)^{-1}X'$. Let $\tilde{y} = P'y$, $\tilde{Z} = P'Z$, and $\tilde{\varepsilon} = P'\varepsilon$. It follows that:
$$\tilde{y} = \tilde{Z}u + \tilde{\varepsilon} \qquad (5)$$
where:
$$u \sim N(0, \sigma_u^2 I_m) \qquad (6)$$
and
$$\tilde{\varepsilon} \sim N(0, \sigma_\varepsilon^2 I_{n-r}) \qquad (7)$$
Note that, for $i \neq j$,
$$\operatorname{Var}(\tilde{y}_i - \tilde{y}_j) = \sigma_g^2\,(p_i - p_j)'G(p_i - p_j) + 2\sigma_\varepsilon^2 \qquad (8)$$
Thus the slope and intercept of the regression of:
$$(\tilde{y}_i - \tilde{y}_j)^2 \qquad (9)$$
on $(p_i - p_j)'G(p_i - p_j)$ are $\sigma_g^2$ and $2\sigma_\varepsilon^2$, respectively. Because G is unknown, it is replaced with an estimate. One naïve estimate is A, the genetic relationship matrix of genome-wide SNPs. Yang et al. [2] established an unbiased estimate A* for G by calibrating the prediction error of the genetic relationship G of unobserved causal SNPs. Replacing G with A* in the regression, we can estimate the heritability as:
$$\hat{h}^2 = \frac{\hat{\sigma}_g^2}{\hat{\sigma}_g^2 + \hat{\sigma}_\varepsilon^2} \qquad (10)$$
Because this estimate is based on OLS, it does not need iteration. By replacing G with A and A* in the model given by $\operatorname{Var}(P'y) = P'GP\,\sigma_g^2 + I_{n-r}\,\sigma_\varepsilon^2$, we can construct the naïve and calibrated REML estimates by maximizing the likelihood of $P'y$.
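The OLS procedure described above can be sketched as follows. The sketch assumes that X has full column rank, takes the columns of P from a complete QR decomposition of X, and regresses the squared pairwise differences of the transformed trait on the corresponding quadratic forms of a supplied relationship matrix; the heritability is then taken as slope / (slope + intercept / 2). The function names and the simulated data are hypothetical, and the calibration that produces A* from A is not reproduced here.

```python
import numpy as np

def residual_basis(X):
    """Orthonormal columns spanning the orthogonal complement of col(X).

    Assumes X has full column rank; these columns are eigenvectors with
    eigenvalue 1 of the projection I - X (X'X)^{-1} X'.
    """
    X = np.asarray(X, dtype=float)
    q, _ = np.linalg.qr(X, mode="complete")
    return q[:, X.shape[1]:]

def ols_heritability(y, X, A):
    """Regress squared differences of P'y on quadratic forms of A."""
    P = residual_basis(X)
    y_t = P.T @ y                       # covariate-free transformed trait
    A_t = P.T @ A @ P                   # transformed relationship matrix
    i, j = np.triu_indices(len(y_t), k=1)
    response = (y_t[i] - y_t[j]) ** 2
    d = np.diag(A_t)
    predictor = d[i] + d[j] - 2.0 * A_t[i, j]   # (p_i - p_j)' A (p_i - p_j)
    slope, intercept = np.polyfit(predictor, response, 1)
    var_g, var_e = slope, intercept / 2.0
    return var_g / (var_g + var_e), var_g, var_e

# Simulated example (hypothetical data, true heritability 0.5).
rng = np.random.default_rng(1)
n, m = 200, 1000
freq = rng.uniform(0.1, 0.5, size=m)
counts = rng.binomial(2, freq, size=(n, m)).astype(float)
Z = (counts - 2 * freq) / np.sqrt(2 * freq * (1 - freq))
A = Z @ Z.T / m
X = np.column_stack([np.ones(n), rng.integers(0, 2, n),
                     rng.normal(50, 10, n), rng.integers(0, 2, n)])
u = rng.normal(0.0, np.sqrt(0.5 / m), size=m)
y = X @ np.array([1.0, 0.2, 0.01, -0.3]) + Z @ u + rng.normal(0.0, np.sqrt(0.5), n)

h2, var_g, var_e = ols_heritability(y, X, A)
```

Nothing in the regression constrains the slope or the intercept to be nonnegative, so the resulting ratio is free to fall below 0 or above 1.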
Heritability estimates using the family data
In the family data, 697 individuals (202 founders and 495 nonfounders) form eight pedigrees. The pedigrees all have four generations of family members and a mean size of 87.13 individuals (range, 73–128). The pedigrees include 194 sibships with a mean size of 2.55 (range, 1–9). In the four randomly selected replicates, the heritability estimates for Q1 ranged from 0.608 to 0.698 with an average of 0.650; the heritability estimates for Q4 ranged from 0.687 to 0.773 with an average of 0.745 (Table 1).
Heritability estimates using the unrelated individuals data
The unrelated individuals data consist of genotypes of 24,487 SNPs and 200 replicates of Q1 for 697 individuals. The OLS estimates of the heritability were unstable (Figure 1): many of them were outside the interval [0, 1]. We computed the mean and standard deviation of all 200 heritability estimates, including those greater than 1 or less than 0. Over the 200 replicates, the average heritability estimate for Q1 was μ = 0.555 with standard deviation σ = 0.480 after correcting for age, sex, and smoking status.
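The summary over replicates can be computed as in the following sketch, which keeps out-of-range estimates in the mean and standard deviation and counts how many fall outside [0, 1]. The function name and the example values are hypothetical (they are not the GAW17 estimates), and the use of the sample standard deviation (ddof=1) is an assumption.

```python
import numpy as np

def summarize_estimates(estimates):
    """Mean, sample SD, and out-of-range count for heritability estimates."""
    estimates = np.asarray(estimates, dtype=float)
    outside = (estimates < 0.0) | (estimates > 1.0)
    return {
        "mean": float(estimates.mean()),
        "sd": float(estimates.std(ddof=1)),      # assumption: sample SD
        "n_outside": int(outside.sum()),
        "fraction_outside": float(outside.mean()),
    }

# Hypothetical estimates for illustration only.
example = [-0.2, 0.1, 0.4, 0.6, 0.9, 1.3]
summary = summarize_estimates(example)
```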
Figure 1
OLS estimates of the heritability of Q1. Many of the 200 replicate estimates were greater than 1 or less than 0. Over the 200 estimates, the average heritability estimate for Q1 was μ = 0.5549 with standard deviation σ = 0.4803.
We found that the REML estimates for Q1 were more stable than the estimates obtained using the OLS method (Figure 2). After accounting for age, sex, and smoking status, the 200 naïve REML estimates yielded an average heritability estimate of 0.462 ± 0.100, and the calibrated REML estimates yielded an average heritability estimate of 0.535 ± 0.121 for Q1.
Figure 2
REML estimates of heritability of Q1. (a) The relationship A of genome-wide SNPs was used to estimate the relationship G at unobserved causal SNPs. Over the 200 replicates, the average heritability estimate was μ = 0.4618 with standard error …
We were unable to obtain REML estimates for Q4 because the REML procedure converged extremely slowly. We attribute this convergence failure to the fact that, in the simulated model, no SNP contributed to the phenotypic variation of Q4 [3].
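For orientation, REML estimation on the transformed trait can be sketched with a one-dimensional profile likelihood. This is a swapped-in technique chosen for brevity (an eigendecomposition of the transformed relationship matrix followed by a grid search over the heritability), not the iterative procedure whose convergence is discussed above. The function name, the grid, and the simulated data, which use an intercept as the only covariate, are hypothetical.

```python
import numpy as np

def reml_profile_heritability(y_t, A_t, grid=None):
    """Grid-search REML estimate of h2 = var_g / (var_g + var_e).

    y_t : transformed trait P'y (fixed effects already projected out)
    A_t : transformed relationship matrix P'AP
    """
    if grid is None:
        grid = np.linspace(0.0, 0.999, 1000)
    eigvals, eigvecs = np.linalg.eigh(A_t)
    w2 = (eigvecs.T @ y_t) ** 2          # squared rotated trait values
    n = len(y_t)
    best_h2, best_ll = None, -np.inf
    for h2 in grid:
        # Variance of each rotated component, up to the total-variance scale.
        d = h2 * eigvals + (1.0 - h2)
        if np.any(d <= 0.0):
            continue
        var_total = np.mean(w2 / d)      # total variance profiled out
        ll = -0.5 * (n * np.log(var_total) + np.sum(np.log(d)) + n)
        if ll > best_ll:
            best_h2, best_ll = h2, ll
    return best_h2, best_ll

# Simulated example (hypothetical data, true heritability 0.5).
rng = np.random.default_rng(2)
n, m = 150, 600
freq = rng.uniform(0.1, 0.5, size=m)
counts = rng.binomial(2, freq, size=(n, m)).astype(float)
Z = (counts - 2 * freq) / np.sqrt(2 * freq * (1 - freq))
A = Z @ Z.T / m
y = 3.0 + Z @ rng.normal(0, np.sqrt(0.5 / m), m) + rng.normal(0, np.sqrt(0.5), n)

# Project out the intercept, the only covariate in this sketch.
q, _ = np.linalg.qr(np.ones((n, 1)), mode="complete")
P = q[:, 1:]
h2_hat, loglik = reml_profile_heritability(P.T @ y, P.T @ A @ P)
```

Unlike the regression estimate, this search is confined to the grid, so the estimate always lies in [0, 1).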
In our analyses, we estimated heritability using both the family data for Q1 and Q4 and the unrelated individuals data for Q1. The heritability estimates for Q1 and Q4 using the family data appeared stable and reasonable. In the simulation, Q1 has a heritability of 0.575, where 0.135 is due to the 39 causal SNPs and 0.440 is due to a polygenic component, and Q4 has a heritability of 0.70 resulting from a polygenic effect. The mean heritability estimates for Q1 and Q4 with the family data were 0.650 and 0.745, respectively.
The heritability estimates using the unrelated individuals data seem less reasonable. The OLS method did not work well for the GAW17 unrelated individuals data because the method was designed for genome-wide common SNPs. In the GAW17 unrelated individuals data, most of the SNPs are rare variants and only a few of them are causal variants. The genetic relationships estimated using many rare variants may be unreliable, which leads to the instability of the OLS estimates. The REML approaches appear to be more stable than the OLS method for Q1. We observed that the heritability estimates using the unrelated individuals data were, on average, lower than those using the family data. For example, the mean of the heritability estimates for Q1 for the unrelated individuals data was 0.462 (by naïve REML), which was 0.188 less than the mean for the family data (0.650). One possible reason is that the polygenic component (0.440) in Q1 is not due to any SNPs in the GAW17 sequence data set, so we should not be able to uncover the polygenic effect using unrelated samples. However, the mean naïve REML estimate (0.462) is much larger than the heritability attributable to the causal SNPs (0.135). A likely reason is that we used all 24,487 SNPs to estimate the relationships among individuals. There might also be other sources contributing to the heritability estimates.
Finally, we failed to estimate the heritability for Q4 using the unrelated samples because of the convergence problem, which arose because none of the genotyped exonic SNPs in the data contributed to the phenotypic variation.
Competing interests
The authors declare that there are no competing interests.
Authors’ contributions
PBS performed the statistical analysis of the family data, and HQ performed the statistical analysis of the unrelated individuals data. PBS, HQ, JN, and XZ drafted and revised the manuscript. XZ conceived the project; RCE critiqued and edited the manuscript. All authors read and approved the final manuscript.
Acknowledgments
The Genetic Analysis Workshop is supported by National Institutes of Health (NIH) grant R01 GM031575 from the National Institute of General Medical Sciences. This work was supported by National Cancer Institute grant P30 CAD43703 and NIH grants HL074166, HL086718, R01 HG003054 and R01 HG005854. Some of the results of this paper were obtained by using the program package S.A.G.E., which is supported by U.S. Public Health Service Resource Grant RR03655 from the National Center for Research Resources. We thank the other members of Xiaofeng Zhu’s laboratory for their critiques and comments.
This article has been published as part of BMC Proceedings Volume 5 Supplement 9, 2011: Genetic Analysis Workshop 17. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/5?issue=S9.
References
  1. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753. doi: 10.1038/nature08494.
  2. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–569. doi: 10.1038/ng.608.
  3. Almasy LA, Dyer TD, Peralta JM, Kent JW Jr, Charlesworth JC, Curran JE, Blangero J. Genetic Analysis Workshop 17 mini-exome simulation. BMC Proc. 2011;5(suppl 9):S2.
  4. George V, Elston RC. Generalized modulus power transformations. Commun Stat Theory Meth. 1988;17:2933–2952. doi: 10.1080/03610928808829781.
Articles from BMC Proceedings are provided here courtesy of
BioMed Central