Genet Epidemiol. Author manuscript; available in PMC 2010 October 25.
Published in final edited form as:
Genet Epidemiol. 2009; 33(Suppl 1): S99–104.
doi:  10.1002/gepi.20480
PMCID: PMC2962938
NIHMSID: NIHMS219641

Summary of Contributions to GAW Group 15: Family-Based Samples Are Useful in Identifying Common Polymorphisms Associated with Complex Traits

Abstract

Traditionally, family-based samples have been used for genetic analyses of single-gene traits caused by rare but highly penetrant risk variants. The utility of family-based genetic data for analyzing common complex traits is less clear, and their use poses numerous challenges. To assess this utility and to address these challenges, members of Genetic Analysis Workshop 16 Group 15 analyzed Framingham Heart Study data using family-based designs ranging from parent–offspring trios to large pedigrees. We investigated different methods, including traditional linkage tests, family-based association tests, population-based tests that correct for relatedness between subjects, and tests to detect parent-of-origin effects. The analyses presented an assortment of positive findings. One contribution found increased power to detect epistatic effects through linkage using ascertainment of sibships based on extreme quantitative values or presence of disease associated with the quantitative value. Another contribution found four SNPs showing a maternal effect, two SNPs with an imprinting effect, and one SNP having both effects on a binary high blood pressure trait. Finally, three contributions illustrated the advantage of using population-based methods to detect association with complex binary or quantitative traits. Our findings highlight the contribution of family-based samples to the genetic dissection of complex traits.

Keywords: Linkage, association, parent-of-origin effect

Introduction

In recent years, the effort to identify genes affecting common diseases and complex traits has been accelerated through the use of genome-wide association studies (GWAS). The most popular and straightforward design for whole-genome association studies is undoubtedly the independent subjects (case-control) design. Sampling independent subjects requires less ascertainment cost and time [Baron, 2001]. The most notable drawback of this design is that it is susceptible to confounding due to population stratification. Furthermore, there can always be cryptic relatedness in the sample, especially among the cases.

The use of family-based samples has advantages over independent population samples. The primary advantage of family-based association studies is the robustness of the design to the effects of population stratification. Second, familial cases are from an enriched familial set and therefore may be more informative for genetic research [Antoniou and Easton, 2003]. Indeed, the frequency of the causative polymorphism is expected to be higher among familial than among unselected cases, therefore increasing the likelihood of detecting association. Several family-based samples may already have been collected for linkage studies, and some samples may have contributed to identifying chromosomal regions with positive linkage to the disease. Using cases from such family samples may be a powerful design. Indeed, the genetic effects underlying the linkage signals should be, in principle, substantial and the easiest to detect through association methods, unless there is allelic heterogeneity. Finally, family-based designs allow for genetic analyses of complex traits that cannot be done using unrelated individuals, such as testing for parent-of-origin effects, testing whether a genetic variant is inherited or de novo, performing combined linkage and association analysis, and controlling for the effects of shared environment.

The traditional family-based association methods, such as the transmission-disequilibrium test (TDT), are robust but lack power because these tests only use the informative subset of family data. In contrast, the population-based tests that incorporate between-family information for family data may be more powerful, but have the same weakness as the usual population-based association methods using independent subjects.
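The reason the TDT uses only part of the family data can be seen from its usual McNemar-type form, which counts transmissions from heterozygous parents only. The following is a minimal sketch of that textbook statistic, not code from any of the contributions; the function name and the example counts are illustrative assumptions.

```python
import math


def tdt(transmitted, untransmitted):
    """McNemar-type TDT computed from heterozygous parents only.

    transmitted   -- number of times a heterozygous parent passed the
                     allele of interest to an affected offspring
    untransmitted -- number of times the other allele was passed instead
    Returns (chi-square statistic with 1 df, p-value).
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no heterozygous parents, so no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi2, p_value = tdt(60, 40)
print(chi2, p_value)
```

Homozygous parents never enter the counts, which is why the test is robust to stratification but discards much of the sample.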

Here, we summarize Genetic Analysis Workshop 16 (GAW16) contributions that investigate methods to address some challenges, as well as to highlight the advantages, of using family-based samples in the genetic dissection of complex traits. All five contributions used Framingham Heart Study (FHS) pedigree sample data (real or simulated) to examine varying ascertainment criteria in the context of a quantitative linkage analysis [Huang et al., 2009]; to propose a new method for detecting causative variants with imprinting and/or maternal effects [Yang and Lin, 2009]; to compare association methods in family data for quantitative traits [Saint Pierre et al., 2009]; and to extend existing association methods for genetic analyses of dichotomous traits [Knight et al., 2009; Uh et al., 2009]. No group performed genetic testing at the genome-wide level due to time limitations. All studies limited the investigation either to all single-nucleotide polymorphisms (SNPs) from a given chromosome [Knight et al., 2009; Uh et al., 2009; Yang and Lin, 2009] or to the functional variants [Huang et al., 2009; Saint Pierre et al., 2009]. All studies concluded that family-based approaches are useful for dissecting the genetic determinants of complex disorders.

GAW 16 Data

A full description of the GAW16 Problem 2 (real) and Problem 3 (simulated) data is provided in the GAW16 proceedings [Cupples et al., 2009; Kraja et al., 2009]. Briefly, the FHS dataset included pedigree, genotype, and phenotype data. Demographic data and a subset of phenotypes from FHS and traditional risk factors for coronary heart disease were provided. Out of 7,130 subjects with phenotype data, 6,879 were members of 780 pedigrees, and 251 were unrelated. Dense SNP genotyping data were available from two chips: the Human Mapping 500k Array Set and the 50k Human Gene Focused panel. A total of 6,834 subjects were genotyped, and 6,583 of the genotyped subjects were members of pedigrees. The GAW16 Problem 3 dataset was generated under a semi-simulated approach: FHS pedigree and genotype data were kept as given in the real FHS data, and only the phenotypes were simulated on the observed genetic variation. Several quantitative traits related to lipid metabolism and one qualitative disease trait were simulated under complex genetic models. Problem 3 included 200 replicates of FHS pedigrees and unrelated subjects.

Methods

Table I contains a summary of the methods and results for the five contributions. The contributions of Group 15 can be separated into two categories. The first category includes the two contributions that used family-based methods to perform genetic analyses beyond the usual simple main-effects models. One of these contributions used the simulated FHS data to examine power of a multivariable linkage test for a quantitative trait under varying sampling schemes [Huang et al., 2009]. The second contribution developed a model to identify polymorphisms with imprinting or maternal effects on a qualitative outcome and applied the new approach using the real FHS data [Yang and Lin, 2009].

Table I
Summary of Group 15 contributions

The second category includes three contributions that focused on ways to improve the power of association analysis when using family samples. Using the simulated FHS data, Saint Pierre et al. [2009] estimated the power of several family-based association methods for two quantitative traits. Two other papers compared different case-control association analysis techniques that are corrected to account for related individuals for qualitative outcomes in the real FHS data [Knight et al., 2009; Uh et al., 2009]. The last two papers also proposed new extensions.

Beyond simple main-effects models

Power of varying sampling designs for linkage analysis

Huang et al. [2009] studied coronary artery calcification (CAC), a quantitative disease endophenotype for myocardial infarction (MI), to assess the power of linkage analysis using three family selection designs: 1) randomly chosen nuclear families; 2) selection of nuclear families through a proband with a CAC value in the top 10%; and 3) selection of nuclear families through a disease-affected proband whose offspring has had an MI event. Univariate and multivariate linkage analyses, the latter allowing for the consideration of epistasis, were conducted with the Haseman-Elston regression method [Haseman and Elston, 1972].
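In its original single-locus form, Haseman-Elston regression regresses the squared trait difference of each sib pair on the proportion of alleles the pair shares identical by descent (IBD) at a marker; a negative slope is evidence for linkage. The sketch below illustrates only this univariate form with ordinary least squares. It is not the multivariate analysis used by Huang et al., and the function name and toy data are assumptions made for illustration.

```python
def haseman_elston_slope(sib_pairs):
    """Least-squares fit of squared sib-pair trait differences on IBD sharing.

    sib_pairs -- list of (trait_sib1, trait_sib2, ibd_proportion) tuples,
                 with ibd_proportion between 0 and 1
    Returns (slope, intercept); a negative slope suggests linkage.
    """
    y = [(t1 - t2) ** 2 for t1, t2, _ in sib_pairs]
    x = [pi for _, _, pi in sib_pairs]
    n = len(sib_pairs)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("IBD sharing does not vary across sib pairs")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data: pairs sharing more alleles IBD have more similar trait values.
toy_pairs = [(0.0, 2.0, 0.0), (0.0, 1.0, 0.5), (0.0, 0.0, 1.0)]
slope, intercept = haseman_elston_slope(toy_pairs)
print(slope, intercept)
```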

Detection of imprinting and heterogeneous maternal effects

Yang and Lin [2009] studied imprinting and maternal effects simultaneously for the binary high blood pressure trait. The existing statistical methods that differentiate these two effects using family-based data from retrospective studies only use data on affected siblings [Weinberg, 1999; Weinberg et al., 1998; Wilcox et al., 1998]. Alternatively, these authors proposed a new likelihood-based method that models genotypes and offspring's disease status jointly to detect both imprinting and maternal effects simultaneously using data from both affected and unaffected siblings in nuclear families in prospective studies. It also incorporates possible heterogeneity of maternal effects by adding a random component on the link scale of the penetrance.

Association testing for common polymorphisms in family-based samples

The purpose of these three contributions was to examine ways to improve the power of association analysis when using family samples. Improving power is a concern when using family-based samples because traditional family-based tests use only a subset of family data, and the population-based association tests may have less power when accounting for the use of correlated data. Not adjusting for the correlation has been found to affect the type I error rates, and this effect increases as the size of the family and the trait heritability increase [McArdle et al., 2007]. Association methods that account for the relatedness of familial data fall into two broad categories: family-based association analysis, in which the unit of interest is the family (e.g., TDT, quantitative TDT (QTDT), and family-based association tests (FBAT)), and population-based association analysis, in which the unit of interest is the individual and the analysis is adjusted to account for relatedness (e.g., measured genotype, generalized estimating equations, weighted likelihood approaches, and variance-corrected Cochran-Armitage trend test).
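Relatedness between two pedigree members is commonly quantified by the kinship coefficient, which several of the population-based corrections rely on. The sketch below uses the standard recursive definition and assumes that every parent is listed before their children and that founders are unrelated and non-inbred. The pedigree encoding and the identifiers are hypothetical, not taken from the FHS data.

```python
def kinship_matrix(pedigree):
    """Kinship coefficients for all pairs in a pedigree.

    pedigree -- list of (individual, father, mother) tuples, parents listed
                before their children; founders have father = mother = None
    Returns a dict mapping (id_a, id_b) to the kinship coefficient.
    """
    phi = {}
    processed = []
    for ind, father, mother in pedigree:
        is_founder = father is None or mother is None
        for other in processed:
            if is_founder:
                k = 0.0  # assumption: founders are unrelated to everyone earlier
            else:
                # Average of the parents' kinship with the other individual.
                k = 0.5 * (phi[(father, other)] + phi[(mother, other)])
            phi[(ind, other)] = k
            phi[(other, ind)] = k
        # Self-kinship is 1/2 plus half the kinship between the parents.
        phi[(ind, ind)] = 0.5 if is_founder else 0.5 * (1.0 + phi[(father, mother)])
        processed.append(ind)
    return phi


# Hypothetical three-generation pedigree.
toy_pedigree = [
    ("dad", None, None),
    ("mom", None, None),
    ("sib1", "dad", "mom"),
    ("sib2", "dad", "mom"),
    ("spouse", None, None),
    ("grandchild", "spouse", "sib1"),
]
phi = kinship_matrix(toy_pedigree)
print(phi[("sib1", "sib2")], phi[("grandchild", "dad")])
```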

Family-based versus population-based association tests

Saint Pierre et al. [2009] examined power and type I error rates of three association tests using family data. They evaluated two family-based association tests: QTDT [Abecasis et al., 2000] and its modification, the quantitative trait linkage disequilibrium test (QTLD) [Havill et al., 2005], which use information about transmission of alleles and are based on the orthogonal decomposition of the marker effects. They also studied a population-based association test, a measured genotype (MG) test [Boerwinkle et al., 1986], which accounts for relatedness among subjects through estimation of residual polygenic effects. All three approaches, QTDT, QTLD, and MG, were applied to the association analysis of quantitative traits in extended pedigrees, but they differ in the amount and type of marker information used for testing association.
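The orthogonal decomposition mentioned above splits each genotype score into a between-family component and a within-family deviation. As a rough illustration, the sketch below handles the simplest case, a sibship without parental genotypes, where the between component is taken as the sibship mean. This simplification, the function name, and the 0/1/2 allele-count coding are assumptions for illustration and do not cover extended pedigrees.

```python
def decompose_genotypes(sibship_genotypes):
    """Split sibship genotype scores into between and within components.

    sibship_genotypes -- list of allele counts (0, 1 or 2), one per sibling
    Returns (between, within): between is the sibship mean, and within holds
    each sibling's deviation from that mean.
    """
    if not sibship_genotypes:
        raise ValueError("empty sibship")
    between = sum(sibship_genotypes) / len(sibship_genotypes)
    within = [g - between for g in sibship_genotypes]
    return between, within


between, within = decompose_genotypes([0, 1, 2])
print(between, within)
```

Because the within deviations sum to zero in each family, they are unaffected by differences in allele frequency between families, which is what makes tests built on them robust to stratification.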

Weighting individuals in pedigrees for association analysis

Knight et al. [2009] proposed a new method that builds on the previously proposed idea of assigning weights to individuals in pedigrees for use in analysis [Browning et al., 2005]. Browning's method uses a pairwise measurement of sharing, kinship coefficients, to assign weights to pedigree cases, while Knight et al. [2009] used simulation to determine individual weights based on the average simultaneous sharing of individuals in the pedigree. They compared the Cochran-Armitage test for trend p-values using both weighting algorithms, a naïve approach (assuming independence for all observations) and empirical results (considered as the gold standard).
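For reference, the naïve comparison treats all subjects as independent and applies the standard Cochran-Armitage trend test to the 2 x 3 table of genotype counts. The sketch below implements only that unweighted textbook statistic with additive scores. It does not reproduce how either weighting algorithm enters the test, and the counts shown are invented for illustration.

```python
import math


def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test assuming independent observations.

    cases, controls -- genotype counts ordered by number of risk alleles
    scores          -- per-genotype scores (additive coding by default)
    Returns (chi-square statistic with 1 df, p-value).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * t for x, t in zip(scores, totals))
    sum_x2n = sum(x * x * t for x, t in zip(scores, totals))
    denom = n_cases * (n - n_cases) * (n * sum_x2n - sum_xn ** 2)
    if denom == 0:
        raise ValueError("table has no variation in genotype or in status")
    chi2 = n * (n * sum_xr - n_cases * sum_xn) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Invented genotype counts for 60 cases and 60 controls.
chi2, p_value = trend_test(cases=[10, 20, 30], controls=[30, 20, 10])
print(chi2, p_value)
```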

Modified quasi-likelihood score test (MQLS)

Uh et al. [2009] extended the MQLS proposed by Thornton and McPeek [2007], which also uses phenotype information from non-genotyped relatives to up-weight genotyped relatives. The original MQLS, an allelic test, was extended to be used in genotypic testing assuming a multiplicative model (gMQLS) [Sasieni, 1997]. To examine X-linked traits, Uh et al. [2009] used an allelic MQLS test stratified by sex (because males would contribute one allele and females two alleles) and then combined the chi-square statistics to form a two-degree-of-freedom test (xMQLS). The authors compared the results of these modified tests to generalized estimating equations (GEE), variance-adjusted trend test, and naïve analyses.
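Combining sex-stratified statistics can be illustrated with a small calculation: if each stratum yields a chi-square statistic with one degree of freedom and the two are treated as independent under the null hypothesis, their sum is referred to a chi-square distribution with two degrees of freedom, whose upper tail is exp(-x/2). The sketch below shows only this combination step. The independence of the strata is an assumption stated here for illustration, and the function name and example values are hypothetical.

```python
import math


def combine_sex_strata(chi2_males, chi2_females):
    """Sum two 1-df chi-square statistics into a 2-df test.

    Assumes the male and female statistics are independent under the null.
    Returns (combined statistic, p-value).
    """
    total = chi2_males + chi2_females
    # Survival function of a chi-square distribution with 2 df.
    p_value = math.exp(-total / 2.0)
    return total, p_value


# Hypothetical stratum statistics.
total, p_value = combine_sex_strata(3.0, 3.0)
print(total, p_value)
```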

Results

Beyond simple main-effects models

Power of varying sampling designs for linkage analysis

Huang et al. [2009] found that, based on comparison of the mean square root of the LOD scores, no sampling design had the greatest power for the univariate analyses. However, under multivariate linkage analyses, the two selected designs (selection of nuclear families through a proband with a CAC value in the top 10% and through a disease-affected proband whose offspring have had an MI event) showed similar power and were much more powerful than the non-selection design, especially for detecting linkage of an epistatic factor.

Detection of imprinting and heterogeneous maternal effects

Yang and Lin [2009] scanned 230k SNPs on chromosomes 1 to 6 and detected nine SNPs that may be associated with high blood pressure through minor allele, imprinting, or maternal effect. For SNPs that have a significant minor allele effect, they further looked at the direction-inferred maternal or paternal imprinting effect. When maternal effect estimates were found to be significant, heterogeneity of the maternal effects was tested. The minimum Akaike information criterion (AIC) was then used to determine whether the maternal effect is heterogeneous. Five SNPs were detected to have varying degrees of maternal effects, of which three appear to be heterogeneous among the families and one has a simultaneous heterogeneous maternal effect and imprinting effect. They also reported that the association between the nine detected SNPs and blood pressure has been established in either human- or rat-based studies listed in the Genetic Association Studies of Complex Diseases and Disorders section of the Genome Browser [Kent et al., 2002].
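Model choice by minimum AIC amounts to comparing 2k - 2 log L across candidate models, where k is the number of free parameters and L the maximized likelihood. The sketch below compares a model with a homogeneous maternal effect against one with an extra parameter for heterogeneity. The log-likelihoods and parameter counts are made-up numbers used only to show the arithmetic, not values from Yang and Lin.

```python
def aic(log_likelihood, n_parameters):
    """Akaike information criterion: smaller values indicate a better trade-off."""
    return 2 * n_parameters - 2 * log_likelihood


def prefer_heterogeneous(loglik_homog, k_homog, loglik_heterog, k_heterog):
    """True when the heterogeneous-effect model has the smaller AIC."""
    return aic(loglik_heterog, k_heterog) < aic(loglik_homog, k_homog)


# Made-up fits: the extra parameter must raise the log-likelihood by more
# than 1 for the heterogeneous model to be preferred.
print(prefer_heterogeneous(-1205.0, 4, -1200.0, 5))
print(prefer_heterogeneous(-1205.0, 4, -1204.5, 5))
```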

Association testing for common polymorphisms in family-based samples

Family-based versus population-based association tests

The three association tests (QTDT, QTLD, and MG) were found to have similar type I error rates, and in general, these rates were lower than or close to the nominal values. Interestingly, in these data, departure from normality did not yield inflated error rates, except in a few instances with QTDT. Across the three association models, the power was lowest for the functional SNP with the smallest effect size and for the less heritable trait. The direction of the association parameters was found to be consistent across the three association models. Although the authors noted that the effective sample sizes varied little across the tested variants, large power drops and marked differences in the performance of the models were observed. Overall, the results showed that MG outperformed the two orthogonal-based association models (QTLD, QTDT) even after accounting for population stratification. QTDT had the lowest power.

Weighting individuals in pedigrees for association analysis

Knight et al. [2009] found that the results of the two weighting algorithms were similar, yet the results of the new weighting algorithm had a higher correlation with the empirical results than those of the Browning method. Results using a naïve approach, in which the pedigree cases and controls were treated as independent, were always anti-conservative compared with the empirical results. This would result in an inflated type I error rate. However, the naïve approach results were highly correlated (r=0.99) with the empirical results. The high correlation and unidirectional relationship between the two sets of results suggest that the naïve method can be used in the first pass of a two-stage design. The first-stage results below a lowered significance threshold can be followed by a second-stage analysis that accounts for the familial correlation to determine accurate significance.

MQLS extended testing nuclear families

Uh et al. [2009] found that for the autosomal SNPs the MQLS results were similar to those found by using GEE and the variance-corrected Cochran-Armitage trend test. For all three tests, the analyses using nuclear families had increased significance over analyses using only the sibship data. This might be due to the increase in the effective sample size. Although only a small number of parents (n=323) were added, the proportion of cases added (20%) was relatively large compared with that in the sibling-only data (6%). In this setting, the gMQLS test might be more efficient because it incorporates all phenotypic information available, including un-genotyped parents with coronary heart disease. For the X-linked analysis, the MQLS gave highly significant results (p < 10⁻⁶) compared with the GEE analysis and PLINK analysis.

Discussion

Our group has shown that family-based approaches are useful designs and can make important contributions to genetic analyses that could not be made using independent samples. These contributions include detecting epistatic linkage effects and identifying imprinting and parent-of-origin effects. Furthermore, we have shown ways to address the challenges of family-based designs effectively through the development and modification of statistical association methods. For any genetic study it is crucial to find ways to improve power. Our group found that the use of population-based approaches improved the power of family-based designs. For example, MG had an increase in power over two orthogonal-based tests, as quantified using the simulated Problem 3 data [Saint Pierre et al., 2009]. Both Knight et al. [2009] and Uh et al. [2009] showed that their modifications and extensions of existing methods did result in increased significance using the real FHS data. While it is possible that this increase in significance might also lead to an increase in type I error, the authors' methods had similar results to previously validated methods. Further evaluations of these methods are needed.

Group 15 contributions suggest that population-based approaches may be a powerful tool to analyze family-based samples. However, these methods may remain sensitive to the presence of population stratification, as in the case of unrelated data. There are ways to adjust for population stratification. For instance, methods developed for analyzing unrelated samples can be applied to family data [Kathiresan et al., 2009]. Only one of our contributions attempted to account for this stratification effect. Saint Pierre et al. [2009] accounted for it by testing whether there was a significant difference between the within and between components. They found the MG test still to be the most powerful. Note, however, that in these GAW16 simulated data, Saint Pierre et al. [2009] found minimal population stratification. Thus, it remains unclear whether the outperformance of the MG test will still be observed in samples with substantial admixture across the pedigrees. Clearly, more work is needed to evaluate the use of such approaches in the context of family samples with hidden population stratification and admixture.

One of the limitations of our contributions is the lack of a complete genome-wide scan. However, all of the approaches used are suitable for GWAS. In one of the papers [Uh et al., 2009], a two-stage design was used for the purpose of dimension reduction to decrease computation time. In the first stage, a naïve method was used to select the set of markers with the lowest p-values as potential candidate markers. In the second stage, the candidate markers were properly tested by accounting for the correlation in the data. A similar approach was also suggested by Knight et al. [2009], who identified a high correlation and unidirectional relationship between the naïve and empirical results.
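The two-stage idea can be sketched as a simple filter: a cheap naïve test screens every marker at a lenient threshold, and only the survivors are retested with a method that accounts for familial correlation. The screening step is safe only to the extent that the naïve p-values are never larger than the corrected ones, as the unidirectional relationship reported above suggests. The function, thresholds, and marker names below are hypothetical and stand in for whatever tests a study would actually use.

```python
def two_stage_scan(markers, naive_test, corrected_test,
                   stage1_threshold, final_threshold):
    """Screen markers with a naive test, then retest survivors properly.

    markers          -- iterable of marker identifiers
    naive_test       -- function(marker) -> p-value ignoring relatedness
    corrected_test   -- function(marker) -> p-value accounting for relatedness
    stage1_threshold -- lenient cut-off applied to the naive p-values
    final_threshold  -- significance cut-off applied to corrected p-values
    Returns a dict of markers passing both stages with corrected p-values.
    """
    candidates = [m for m in markers if naive_test(m) <= stage1_threshold]
    # The expensive test runs only on the markers that passed the screen.
    retested = {m: corrected_test(m) for m in candidates}
    return {m: p for m, p in retested.items() if p <= final_threshold}


# Hypothetical p-values standing in for real test results.
naive_p = {"marker_a": 1e-6, "marker_b": 0.5, "marker_c": 1e-4}
corrected_p = {"marker_a": 1e-4, "marker_b": 0.6, "marker_c": 0.02}
hits = two_stage_scan(naive_p, naive_p.get, corrected_p.get,
                      stage1_threshold=1e-3, final_threshold=1e-3)
print(hits)
```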

In conclusion, family-based samples allow for analyses, such as linkage or parent-of-origin effects, that are only possible with family data [Huang et al., 2009; Yang and Lin, 2009]. They are also able to identify functional variants with complex and weak effects, through linkage or association tests [Huang et al., 2009; Saint Pierre et al., 2009]. There is, however, a need to examine the sensitivity of these population-based association tests to the existence of population stratification in family samples and to further develop methods to correct for population stratification. It is also clear that further work is needed to fully investigate the feasibility of GWAS using family data. Despite the need for future research, our main conclusion is that family-based samples are very appealing for the genetic dissection of complex traits.

Acknowledgments

The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. Group 15 primary contributing authors include: C. Huang, S. Knight, S. Lin, M. Martinez, N.R. Mendell, A. Saint Pierre, H.-W. Uh, and J. Yang.

References

  • Abecasis GR, Cardon LR, Cookson WO. A general test of association for quantitative traits in nuclear families. Am J Hum Genet. 2000;66:279–92.
  • Antoniou AC, Easton DF. Polygenic inheritance of breast cancer: Implications for design of association studies. Genet Epidemiol. 2003;25:190–202.
  • Baron M. The search for complex disease genes: Fault by linkage or fault by association? Mol Psychiatry. 2001;6:143–9.
  • Boerwinkle E, Chakraborty R, Sing CF. The use of measured genotype information in the analysis of quantitative phenotypes in man. I. Models and analytical methods. Ann Hum Genet. 1986;50:181–94.
  • Browning SR, Briley JD, Briley LP, Chandra G, Charnecki JH, Ehm MG, Johansson KA, Jones BJ, Karter AJ, Yarnall DP, Wagner MJ. Case-control single-marker and haplotypic association analysis of pedigree data. Genet Epidemiol. 2005;28:110–22.
  • Cupples LA, Heard-Costa N, Lee M, Atwood LD, Framingham Heart Study Investigators. Genetic Analysis Workshop 16 Problem 2: Framingham Heart Study data. BMC Proc. 2009;3(Suppl 7):S2.
  • Haseman JK, Elston RC. The investigation of linkage between a quantitative trait and a marker locus. Behav Genet. 1972;2:3–19.
  • Havill LM, Dyer TD, Richardson DK, Mahaney MC, Blangero J. The quantitative trait linkage disequilibrium test: A more powerful alternative to the quantitative transmission disequilibrium test for use in the absence of population stratification. BMC Genet. 2005;6(Suppl 1):S91.
  • Huang C, Li K, Saint Fleur R, Chang SW, Choi SH, Shen T, Shin SY, Finch SJ, Mendell NR. Family-based analysis of a myocardial infarction endophenotype: Comparison of sampling designs. BMC Proc. 2009;3(Suppl 7):S120.
  • Kathiresan S, Willer CJ, Peloso GM, Demissie S, Musunuru K, Schadt EE, Kaplan L, Bennett D, Li Y, Tanaka T, Voight BF, Bonnycastle LL, Jackson AU, Crawford G, Surti A, Guiducci C, Burtt NP, Parish S, Clarke R, Zelenika D, Kubalanza KA, Morken MA, Scott LJ, Stringham HM, Galan P, Swift AJ, Kuusisto J, Bergman RN, Sundvall J, Laakso M, Ferrucci L, Scheet P, Sanna S, Uda M, Yang Q, Lunetta KL, Dupuis J, de Bakker PI, O'Donnell CJ, Chambers JC, Kooner JS, Hercberg S, Meneton P, Lakatta EG, Scuteri A, Schlessinger D, Tuomilehto J, Collins FS, Groop L, Altshuler D, Collins R, Lathrop GM, Melander O, Salomaa V, Peltonen L, Orho-Melander M, Ordovas JM, Boehnke M, Abecasis GR, Mohlke KL, Cupples LA. Common variants at 30 loci contribute to polygenic dyslipidemia. Nat Genet. 2009;41:56–65.
  • Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. The human genome browser at UCSC. Genome Res. 2002;12:996–1006.
  • Knight S, Abo RP, Wong J, Thomas A, Camp NJ. Pedigree association: Assigning individual weights to pedigree members for genetic association analysis. BMC Proc. 2009;3(Suppl 7):S121.
  • Kraja AT, Culverhouse R, Daw EW, Wu J, Van Brunt A, Province MA, Borecki IB. The Genetic Analysis Workshop 16 Problem 3: Simulation of heritable longitudinal cardiovascular phenotypes based on actual genome-wide single-nucleotide polymorphisms in the Framingham Heart Study. BMC Proc. 2009;3(Suppl 7):S4.
  • McArdle PF, O'Connell JR, Pollin TI, Baumgarten M, Shuldiner AR, Peyser PA, Mitchell BD. Accounting for relatedness in family based genetic association studies. Hum Hered. 2007;64:234–42.
  • Saint Pierre A, Vitezica Z, Martinez M. A comparative study of three methods for detecting association of quantitative traits in samples of related subjects. BMC Proc. 2009;3(Suppl 7):S122.
  • Sasieni PD. From genotypes to genes: Doubling the sample size. Biometrics. 1997;53:1253–61.
  • Thornton T, McPeek MS. Case-control association testing with related individuals: A more powerful quasi-likelihood score test. Am J Hum Genet. 2007;81:321–37.
  • Uh HW, van der Wijk HJ, Houwing-Duistermaat JJ. Testing for genetic association taking into account phenotypic information of relatives. BMC Proc. 2009;3(Suppl 7):S123.
  • Weinberg CR. Methods for detection of parent-of-origin effects in genetic studies of case-parents triads. Am J Hum Genet. 1999;65:229–35.
  • Weinberg CR, Wilcox AJ, Lie RT. A log-linear approach to case-parent-triad data: Assessing effects of disease genes that act either directly or through maternal effects and that may be subject to parental imprinting. Am J Hum Genet. 1998;62:969–78.
  • Wilcox AJ, Weinberg CR, Lie RT. Distinguishing the effects of maternal and offspring genes through studies of “case-parent triads”. Am J Epidemiol. 1998;148:893–901.
  • Yang J, Lin S. Detection of imprinting and heterogeneous maternal effects on high blood pressure using Framingham Heart Study data. BMC Proc. 2009;3(Suppl 7):S125.