Traditionally, family-based samples have been used for genetic analyses of single-gene traits caused by rare but highly penetrant risk variants. The utility of family-based genetic data for analyzing common complex traits is unclear, and their use poses numerous challenges. To assess this utility and to address these challenges, members of Genetic Analysis Workshop 16 Group 15 analyzed Framingham Heart Study data using family-based designs ranging from parent–offspring trios to large pedigrees. We investigated different methods, including traditional linkage tests, family-based association tests, population-based tests that correct for relatedness between subjects, and tests to detect parent-of-origin effects. The analyses presented an assortment of positive findings. One contribution found increased power to detect epistatic effects through linkage when sibships were ascertained on the basis of extreme quantitative values or the presence of disease associated with the quantitative value. Another contribution found four SNPs showing a maternal effect, two SNPs with an imprinting effect, and one SNP having both effects on a binary high blood pressure trait. Finally, three contributions illustrated the advantage of using population-based methods to detect association with complex binary or quantitative traits. Our findings highlight the contribution of family-based samples to the genetic dissection of complex traits.
In recent years, the effort to identify genes affecting common diseases and complex traits has been accelerated through the use of genome-wide association studies (GWAS). The most popular and straightforward design for whole-genome association studies is undoubtedly the independent subjects (case-control) design. Sampling independent subjects requires less ascertainment cost and time [Baron, 2001]. The most notable drawback of this design is that it is susceptible to confounding due to population stratification. Furthermore, there can always be cryptic relatedness in the sample, especially among the cases.
The use of family-based samples has advantages over independent population samples. The primary advantage of family-based association studies is the robustness of the design to the effects of population stratification. Second, familial cases are from an enriched familial set and therefore may be more informative for genetic research [Antoniou and Easton, 2003]. Indeed, the frequency of the causative polymorphism is expected to be higher among familial than among unselected cases, therefore increasing the likelihood of detecting association. Several family-based samples may already have been collected for linkage studies, and some samples may have contributed to identifying chromosomal regions with positive linkage to the disease. Using cases from such family-samples may be a powerful design. Indeed, the strength of the genetic effects that are underlying the linkage signals should be, in principle, substantial and the easiest to detect through association methods, unless there is allelic heterogeneity. Finally, family-based designs allow for the genetic analyses of complex traits that cannot be done using unrelated individuals, such as testing for parent-of-origin effects, testing whether a genetic variant is inherited or de novo, performing combined linkage and association analysis, and controlling for the effects of shared environment.
The traditional family-based association methods, such as the transmission-disequilibrium test (TDT), are robust but lack power because these tests only use the informative subset of family data. In contrast, the population-based tests that incorporate between-family information for family data may be more powerful, but have the same weakness as the usual population-based association methods using independent subjects.
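To make concrete why the TDT draws on only part of the family data, the following Python sketch computes the standard McNemar-type TDT statistic from parent-to-child transmissions. The input layout (pairs of parental genotype and transmitted allele), the function name `tdt`, and the example counts are illustrative assumptions and are not taken from the GAW16 analyses.

```python
import math

def tdt(parent_transmissions, allele="A"):
    """McNemar-type TDT for one biallelic marker.

    parent_transmissions: iterable of (genotype, transmitted) pairs, one per
    parent of an affected child, e.g. ("Aa", "A"). This layout is assumed
    here for illustration only.
    """
    b = c = 0  # b: het parent transmits `allele`; c: transmits the other allele
    for genotype, transmitted in parent_transmissions:
        if genotype[0] == genotype[1]:
            continue  # homozygous parents are uninformative and are dropped
        if transmitted == allele:
            b += 1
        else:
            c += 1
    if b + c == 0:
        return b, c, 0.0, 1.0
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return b, c, chi2, p_value

# Hypothetical counts: 40 heterozygous parents and 50 homozygous parents
example = [("Aa", "A")] * 30 + [("Aa", "a")] * 10 + [("AA", "A")] * 50
print(tdt(example))
```

In this toy input, more than half of the parents contribute nothing to the statistic, which is the loss of information referred to above.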
Here, we summarize Genetic Analysis Workshop 16 (GAW16) contributions that investigate methods to address some challenges, as well as to highlight the advantages, of using family-based samples in the genetic dissection of complex traits. All five contributions used Framingham Heart Study (FHS) pedigree sample data (real or simulated) to examine varying ascertainment criteria in the context of a quantitative linkage analysis [Huang et al., 2009]; to propose a new method for detecting causative variants with imprinting and/or maternal effects [Yang and Lin, 2009]; to compare association methods in family data for quantitative traits [Saint Pierre et al., 2009]; and to extend existing association methods for genetic analyses of dichotomous traits [Knight et al., 2009; Uh et al., 2009]. No group performed genetic testing at the genome-wide level due to time limitations. All studies limited the investigation either to all single-nucleotide polymorphisms (SNPs) from a given chromosome [Knight et al., 2009; Uh et al., 2009; Yang and Lin, 2009] or to the functional variants [Huang et al., 2009; Saint Pierre et al., 2009]. All studies concluded that family-based approaches are useful for dissecting genetic determinism of complex disorders.
A full description of the GAW16 Problem 2 (real) and Problem 3 (simulated) data is provided in the GAW16 proceedings [Cupples et al., 2009; Kraja et al., 2009]. Briefly, the FHS dataset included pedigree, genotype, and phenotype data. Demographic data and a subset of phenotypes from FHS and traditional risk factors for coronary heart disease were provided. Out of 7,130 subjects with phenotype data, 6,879 were members of 780 pedigrees, and 251 were unrelated. Dense SNP genotyping data was available from two chips: the Human Mapping 500k Array Set and the 50k Human Gene Focused panel. A total of 6,834 subjects were genotyped, and 6,583 of the genotyped subjects were members of pedigrees. The GAW16 Problem 3 dataset was generated under a semi-simulated approach: FHS pedigree and genotype data were kept as given in the real FHS data and phenotypes only were simulated on the observed genetic variation. Several quantitative traits related to lipid metabolism and one disease qualitative trait were simulated under complex genetic models. Problem 3 included 200 replicates of FHS pedigrees and unrelated subjects.
Table I contains a summary of the methods and results for the five contributions. The contributions of Group 15 can be separated into two categories. The first category includes the two contributions that used family-based methods to perform genetic analyses beyond the usual simple main-effects models. One of these contributions used the simulated FHS data to examine power of a multivariable linkage test for a quantitative trait under varying sampling schemes [Huang et al., 2009]. The second contribution developed a model to identify polymorphisms with imprinting or maternal effects on a qualitative outcome and applied the new approach using the real FHS data [Yang and Lin, 2009].
The second category includes three contributions that focused on ways to improve the power of association analysis when using family samples. Using the simulated FHS data, Saint Pierre et al. [2009] estimated the power of several association methods for family data on two quantitative traits. Two other papers compared case-control association techniques, corrected to account for related individuals, for qualitative outcomes in the real FHS data [Knight et al., 2009; Uh et al., 2009]. These two papers also proposed new extensions.
Huang et al. [2009] studied coronary artery calcification (CAC), a quantitative disease endophenotype for myocardial infarction (MI), to assess the power of linkage analysis using three family selection designs: 1) randomly chosen nuclear families; 2) selection of nuclear families through a proband with a CAC value in the top 10%; and 3) selection of nuclear families through a disease-affected proband whose offspring has had an MI event. Univariate and multivariate linkage analyses, the latter allowing for the consideration of epistasis, were conducted with the Haseman-Elston regression method [Haseman and Elston, 1972].
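For orientation, the original univariate Haseman-Elston idea can be sketched in a few lines of Python: the squared trait difference of each sib pair is regressed on the proportion of alleles the pair shares identical by descent (IBD) at a marker, and a negative slope suggests linkage. This is only a sketch of the classical regression; it does not reproduce the multivariate, epistasis-aware analyses of Huang et al., and the function name and toy data are assumptions.

```python
def haseman_elston(sib_pairs):
    """Ordinary least squares of squared sib-pair trait difference on IBD sharing.

    sib_pairs: list of (trait_sib1, trait_sib2, ibd_proportion) tuples, where
    ibd_proportion is in [0, 1]. Returns (slope, intercept).
    """
    x = [pi for _, _, pi in sib_pairs]
    y = [(t1 - t2) ** 2 for t1, t2, _ in sib_pairs]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data (hypothetical): pairs sharing more alleles IBD have more similar traits
pairs = [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
slope, intercept = haseman_elston(pairs)
print(slope, intercept)  # a negative slope is the signal of interest
```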
Yang and Lin [2009] studied imprinting and maternal effects simultaneously for the binary high blood pressure trait. The existing statistical methods that differentiate these two effects using family-based data from retrospective studies only use data on affected siblings [Weinberg, 1999; Weinberg et al., 1998; Wilcox et al., 1998]. As an alternative, Yang and Lin proposed a new likelihood-based method that models genotypes and offspring's disease status jointly to detect imprinting and maternal effects simultaneously, using data from both affected and unaffected siblings in nuclear families from prospective studies. The method also incorporates possible heterogeneity of maternal effects by adding a random component on the link scale of the penetrance.
The purpose of the three contributions in the second category was to examine ways to improve the power of association analysis when using family samples. Power is a concern with family-based samples because traditional family-based tests use only a subset of the family data, whereas population-based association tests may lose power once they account for the correlation among relatives. Failing to adjust for this correlation has been found to affect the type I error rates, and the effect increases as the size of the family and the trait heritability increase [McArdle et al., 2007]. Association methods that account for the relatedness of familial data fall into two broad categories: family-based association analysis, in which the unit of interest is the family (e.g., TDT, quantitative TDT (QTDT), and family-based association tests (FBAT)), and population-based association analysis, in which the unit of interest is the individual and the test is adjusted to account for relatedness (e.g., measured genotype, generalized estimating equations, weighted likelihood approaches, and variance-corrected Cochran-Armitage trend test).
Saint Pierre et al. [2009] examined power and type I error rates of three association tests using family data. They evaluated two family-based association tests: QTDT [Abecasis et al., 2000] and its modification, the quantitative linkage disequilibrium test (QTLD) [Havill et al., 2005], which use information about transmission of alleles and are based on the orthogonal decomposition of the marker effects. They also studied a population-based association test, the measured genotype (MG) test [Boerwinkle et al., 1986], which accounts for relatedness among subjects through estimation of residual polygenic effects. All three approaches (QTDT, QTLD, and MG) were applied to the association analysis of quantitative traits in extended pedigrees, but they differ in the amount and type of marker information used for testing association.
Knight et al. [2009] proposed a new method that builds on the previously proposed idea of assigning weights to individuals in pedigrees for use in analysis [Browning et al., 2005]. Browning's method uses a pairwise measure of sharing, the kinship coefficient, to assign weights to pedigree cases, whereas Knight et al. [2009] used simulation to determine individual weights based on the average simultaneous sharing of individuals in the pedigree. They compared Cochran-Armitage trend test p-values obtained with both weighting algorithms, with a naïve approach (assuming independence of all observations), and with empirical results (considered the gold standard).
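The naïve comparator mentioned above, the Cochran-Armitage trend test under independence, can be sketched as follows. This snippet implements only the textbook statistic with additive scores (0, 1, 2); it does not implement the kinship-based or simulation-based weighting of Browning et al. or Knight et al., and the function name and genotype counts are hypothetical.

```python
import math

def cochran_armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Naive Cochran-Armitage trend test (all subjects treated as independent).

    cases, controls: genotype counts ordered by number of risk alleles,
    e.g. (n_aa, n_Aa, n_AA). Returns (chi2, p_value) with 1 df.
    """
    totals = [r + s for r, s in zip(cases, controls)]
    R, S = sum(cases), sum(controls)
    N = R + S
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * n for t, n in zip(scores, totals))
    sum_t2n = sum(t * t * n for t, n in zip(scores, totals))
    numerator = N * (N * sum_tr - R * sum_tn) ** 2
    denominator = R * S * (N * sum_t2n - sum_tn ** 2)
    chi2 = numerator / denominator
    # Survival function of a chi-square with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical genotype counts for one SNP
print(cochran_armitage_trend(cases=(10, 20, 30), controls=(30, 20, 10)))
```

When the subjects are related, this statistic ignores the correlation among them, which is why its p-values are compared against weighted and empirical alternatives.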
Uh et al. [2009] extended the MQLS test proposed by Thornton and McPeek, which also uses phenotype information from non-genotyped relatives to up-weight genotyped relatives. The original MQLS, an allelic test, was extended for genotypic testing assuming a multiplicative model (gMQLS) [Sasieni, 1997]. To examine X-linked traits, Uh et al. [2009] used an allelic MQLS test stratified by sex (because males contribute one allele and females two alleles) and then combined the two chi-square statistics to form a two-degree-of-freedom test (xMQLS). The authors compared the results of these modified tests to generalized estimating equations (GEE), the variance-adjusted trend test, and naïve analyses.
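The combination step for the sex-stratified test can be illustrated with a short Python sketch. It assumes that the male and female statistics are independent chi-square variables with one degree of freedom each, so that their sum is referred to a chi-square distribution with two degrees of freedom; whether xMQLS computes its p-value in exactly this way is an assumption, and the input values are made up.

```python
import math

def combine_sex_strata(chi2_males, chi2_females):
    """Sum two independent 1-df chi-square statistics into a 2-df test.

    For 2 degrees of freedom the chi-square survival function has the
    closed form P(X > x) = exp(-x / 2).
    """
    combined = chi2_males + chi2_females
    p_value = math.exp(-combined / 2)
    return combined, p_value

# Hypothetical per-sex allelic test statistics for one X-linked SNP
print(combine_sex_strata(3.0, 5.0))
```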
Huang et al. [2009] found that, based on comparison of the mean square root of the LOD scores, no sampling design was clearly the most powerful in the univariate analyses. However, under multivariate linkage analyses, the two selected designs (selection of nuclear families through a proband with a CAC value in the top 10% and through a disease-affected proband whose offspring has had an MI event) showed similar power and were much more powerful than the non-selection design, especially for detecting linkage of an epistatic factor.
Yang and Lin [2009] scanned 230k SNPs on chromosomes 1 to 6 and detected nine SNPs that may be associated with high blood pressure through a minor allele, imprinting, or maternal effect. For SNPs with a significant minor allele effect, they further examined the inferred direction of the imprinting effect (maternal or paternal). Where maternal effect estimates were significant, heterogeneity of the maternal effects was tested, and the minimum Akaike information criterion (AIC) was then used to determine whether the maternal effect is heterogeneous. Five SNPs were detected to have varying degrees of maternal effects, of which three appear to be heterogeneous among the families and one has a simultaneous heterogeneous maternal effect and imprinting effect. They also reported that the association between the nine detected SNPs and blood pressure has been established in either human- or rat-based studies, according to the Genetic Association Studies of Complex Diseases and Disorders section of the Genome Browser [Kent et al., 2002].
In the analyses of Saint Pierre et al. [2009], the three association tests (QTDT, QTLD, and MG) were found to have similar type I error rates, and in general these rates were lower than or close to the nominal values. Interestingly, in these data, departure from normality did not yield inflated error rates, except in a few instances with QTDT. Across the three association models, power was lowest for the functional SNP with the smallest effect size and for the less heritable trait. The direction of the association parameters was consistent across the three association models. Although the authors noted that the effective sample sizes varied little across the tested variants, large drops in power and marked differences in the performance of the models were observed. Overall, the results showed that MG outperformed the two orthogonal-based association models (QTLD, QTDT), even after accounting for population stratification. QTDT had the lowest power.
Knight et al. [2009] found that the two weighting algorithms gave similar results, although the results of the new weighting algorithm correlated more highly with the empirical results than those of the Browning method. Results from the naïve approach, in which the pedigree cases and controls were treated as independent, were always anti-conservative compared with the empirical results, which would inflate the type I error rate. However, the naïve results were highly correlated (r=0.99) with the empirical results. The high correlation and unidirectional relationship between the two sets of results suggest that the naïve method can be used as the first pass of a two-stage design. First-stage results that fall below a relaxed significance threshold can then be followed by a second-stage analysis that accounts for the familial correlation to determine accurate significance.
Uh et al. [2009] found that for the autosomal SNPs the MQLS results were similar to those found by using GEE and the variance-corrected Cochran-Armitage trend test. For all three tests, the analyses using nuclear families had increased significance over analyses using only the sibship data. This might be due to the increase in the effective sample size. Although only a small number of parents (n=323) were added, the proportion of cases among them (20%) was relatively large compared with that in the sibling-only data (6%). In this setting, the gMQLS test might be more efficient because it incorporates all phenotypic information available, including ungenotyped parents with coronary heart disease. For the X-linked analysis, the MQLS gave highly significant results (p < 10⁻⁶) compared with the GEE analysis and PLINK analysis.
Our group has shown that family-based approaches are useful designs and can make important contributions to genetic analyses that could not be made using independent samples. These contributions include detecting epistatic linkage effects and identifying imprinting and parent-of-origin effects. Furthermore, we have shown ways to address the challenges of family-based designs effectively through the development and modification of statistical association methods. For any genetic study it is crucial to find ways to improve power. Our group found that the use of population-based approaches improved the power of family-based designs. For example, MG had greater power than the two orthogonal-based tests, as quantified using the simulated Problem 3 data [Saint Pierre et al., 2009]. Both Knight et al. [2009] and Uh et al. [2009] showed that their modifications and extensions of existing methods resulted in increased significance using the real FHS data. While it is possible that this increase in significance might also reflect an increase in type I error, the authors' methods gave results similar to those of previously validated methods. Further evaluations of these methods are needed.
Group 15 contributions suggest that population-based approaches may be a powerful tool to analyze family-based samples. However, these methods may remain sensitive to the presence of population stratification, as in the case of unrelated data. There are ways to adjust for population stratification. For instance, methods developed for analyzing unrelated samples can be applied to family data [Kathiresan et al., 2009]. Only one of our contributions attempted to account for this stratification effect. Saint Pierre et al. [2009] accounted for it by testing whether there was a significant difference between the within-family and between-family components. They found the MG test still to be the most powerful. Note, however, that in these GAW16 simulated data, Saint Pierre et al. [2009] found minimal population stratification. Thus, it remains unclear whether the MG test will still outperform the other tests in samples with substantial admixture across the pedigrees. Clearly, more work is needed to evaluate the use of such approaches in the context of family samples with hidden population stratification and admixture.
One of the limitations of our contributions is the lack of a complete genome-wide scan. However, all of the approaches used are suitable for GWAS. In one of the papers [Uh et al., 2009], a two-stage design was used for dimension reduction, to decrease computation time. In the first stage, a naïve method was used to select the set of markers with the lowest p-values as potential candidate markers. In the second stage, the candidate markers were properly tested by accounting for the correlation in the data. A similar approach was also suggested by Knight et al. [2009], who identified a high correlation and a unidirectional relationship between their naïve and empirical results.
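The two-stage logic described above can be sketched generically in Python. The function name, the threshold values, and the toy p-values are hypothetical, and the two callables stand in for whichever naïve and correlation-aware tests an analyst chooses; this is not the code used by either contribution.

```python
def two_stage_scan(markers, naive_p, corrected_p,
                   stage1_alpha=1e-3, final_alpha=1e-5):
    """Screen all markers with a cheap naive test, then retest the survivors.

    naive_p, corrected_p: callables mapping a marker to a p-value. The
    thresholds are illustrative assumptions, not values from the source.
    """
    # Stage 1: relaxed threshold on the fast test that ignores relatedness
    candidates = [m for m in markers if naive_p(m) <= stage1_alpha]
    # Stage 2: run the expensive, correlation-aware test on candidates only
    stage2 = {m: corrected_p(m) for m in candidates}
    hits = sorted(m for m, p in stage2.items() if p <= final_alpha)
    return candidates, hits

# Toy p-values (hypothetical); naive values are smaller, i.e. anti-conservative
naive = {"snp1": 1e-8, "snp2": 5e-4, "snp3": 0.2}
corrected = {"snp1": 1e-6, "snp2": 1e-2, "snp3": 0.3}
print(two_stage_scan(naive.keys(), naive.get, corrected.get))
```

The screen is only safe if the naïve test is never more conservative than the corrected one, so that no marker the second stage would accept is discarded in the first. That is the role of the unidirectional relationship noted above.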
In conclusion, family-based samples allow for analyses, such as linkage or parent-of-origin effects, that are only possible with family data [Huang et al., 2009; Yang and Lin, 2009]. They are also able to identify functional variants with complex and weak effects, through linkage or association tests [Huang et al., 2009; Saint Pierre et al., 2009]. There is, however, a need to examine the sensitivity of these population-based association tests to the existence of population stratification in family samples and to further develop methods to correct for population stratification. It is also clear that further work is needed to fully investigate the feasibility of GWAS using family data. Despite the need for future research, our main conclusion is that family-based samples are very appealing for the genetic dissection of complex traits.
The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. Group 15 primary contributing authors include: C. Huang, S. Knight, S. Lin, M. Martinez, N.R. Mendell, A. Saint Pierre, H.-W. Uh, and J. Yang.