Recently, an indirect genetic association approach that compares genotype frequencies in offspring of long-lived subjects and offspring from random families has been introduced to study gene-longevity associations. Although the indirect approach has certain advantages over the direct association approach, which compares genotype frequencies between centenarians and young controls, its power has been of concern. This paper reports a power study of the indirect approach using computer simulation. We generate individual lifespans from the current Danish population life table and a proportional hazard model. Family genotype data are generated using a genetic linkage program for a given SNP allele frequency. Power is estimated by setting the type I error rate at 0.05 and calculating Armitage’s chi-squared test statistic for 200 replicate samples for each setting of the allele risk and frequency parameters under different modes of inheritance and for different sample sizes. The indirect genetic association analysis is a valid approach for studying gene-longevity association, but the sample size requirement is about 3–4 times larger than for the direct approach. It also has low power in detecting genes with non-additive effects. Indirect genetic association using offspring from families with both parents as nonagenarians is nearly as powerful as using offspring from families with one centenarian parent. In conclusion, the indirect design can be a good choice for studying longevity in comparison with other alternatives when a relatively large sample size is available.
As a complex trait, human longevity involves a large number of genetic and non-genetic factors together with their interactions. In recent years, high-throughput genotyping of single nucleotide polymorphisms (SNPs) has enabled genetic association analysis for fine mapping of genes that contribute to human complex traits. In longevity studies, genetic association analysis using the popular case–control design has been conducted frequently: genotypic information is collected from centenarians or nonagenarians (cases) and young subjects (controls), and genotype frequencies are compared to infer association. As in genetic association studies of human diseases, the case–control design in longevity studies requires that the two groups be well matched for potential confounding factors. However, the case–control design for longevity studies fails to account for the important birth cohort effect that arises from constantly improving living standards and healthcare, which have helped greatly to extend human lifespan. Moreover, as with any complex disease phenotype, the multi-factorial nature of human longevity means that complex interaction between genes and the environment can be an important contributor to extreme survival. In this case, the changing environment reflected by the birth cohort effects could result in a biased estimate of the true genetic model.
Instead of directly comparing the long-lived subjects with young controls, who are taken from different birth cohorts, Barzilai et al. introduced an indirect genetic association analysis of the cholesteryl ester transfer protein gene by comparing genotype frequencies between centenarian offspring and their age-matched controls. Their analysis detected a significant increase of the homozygous genotype of the 405 valine allele in the centenarian offspring, a pattern also revealed by a direct comparison between centenarians and young controls. Similar indirect analyses have been done on health outcomes by Barzilai and by Adams et al. The merits of offspring of long-lived subjects in studying exceptional longevity have been demonstrated in the literature [5–13]. For example, a very recent study reported that centenarian offspring are more likely to age in better cardiovascular health and with a lower mortality than their peers. In another study, Rose et al. reported that centenarians and their offspring show a significantly higher level of heteroplasmy in the mtDNA control region than the controls. All these observations indicate that offspring of long-lived subjects could be ideal samples for studying human longevity.
In this paper, we validate the indirect genetic association approach for studying longevity using computer simulation. Efficiency of the approach is examined by power estimation for given parameters (allele relative risk, frequency, mode of inheritance) and for given sample sizes under two sampling schemes (LP1: at least one centenarian parent; LP2: both parents over age 90) when the type I error rate is fixed at α = 0.05. Individual lifespan data are generated according to the current population survival to ensure that the simulated lifespan distribution complies with the observed population data. Power estimates for the indirect association are compared with our published power estimates for the direct association approach, and advantages and disadvantages are discussed.
We introduce the latest life table for Denmark in our data simulation. With the observed population survival from the Danish life table, we are able to generate our data that follow the current mortality rate in the Danish population, without imposing any parametric function for the survival distribution. The Danish life table was taken from the Human Life-Table Database maintained at the Max-Planck Institute for Demographic Research in Rostock, Germany under http://www.lifetable.de/data/MPIDR/DNK_2005-2006.pdf. According to the life table, life-expectancy at birth for males is 76 years and for females 80 years. The mean survival for the two sexes was taken for the simulation.
For a given SNP allele with frequency p and relative risk r (r < 1 beneficial and r > 1 harmful to survival; the other allele is defined as the baseline allele), we decompose the observed population survival at age x from the Danish life table, S(x), into genotype-specific survivals weighted by the Hardy-Weinberg genotype frequencies,
$$S(x) = p^2 s_2(x) + 2p(1-p)\, s_1(x) + (1-p)^2 s_0(x), \tag{1}$$
where s2(x), s1(x) and s0(x) are genotype-specific survival functions for individuals carrying 2, 1 and 0 copies of the allele. Genotype-specific survivals depend on the relative risk parameter, the number of risky alleles carried by the genotype, and the mode of inheritance. In a simple proportional hazard model, we assume that the risk of an allele is constant over the ages (for example, the effect of the apolipoprotein E gene as reported by Gerdes et al.) so that the hazard function corresponding to a genotype-specific survival function, for example s1(x), can be written as μ1(x) = rμ0(x). For carriers of one allele with r < 1, the hazard of death is thus reduced by 100(1 − r) percent. Given the existence of multiple unobserved factors or hidden frailty that also contribute to individual survival by increasing or reducing the hazard of death, we introduce a gamma-frailty model (mean of frailty = 1, variance = σ²) for defining the genotype-specific survival functions so that we have
$$s_k(x) = \left[1 + r_k\left(s_0(x)^{-\sigma^2} - 1\right)\right]^{-1/\sigma^2}, \quad k = 1, 2, \tag{2}$$
where r_k is the relative risk of the genotype carrying k copies of the allele (for example, r_1 = r and r_2 = r² under the multiplicative mode).
Here s0(x) is the baseline survival function and σ² is set to 0.1 according to our experience in fitting frailty models to the Danish life table data. Introducing (2) into (1), we can numerically solve Eq. (1) to obtain a non-parametric baseline survival function s0(x) for given risk and frequency parameters and consequently obtain the genotype-specific survival functions in (2). Individual lifespan can then be generated for given genotypes.
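The numerical step described here can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' program: the population survival is taken as a Hardy-Weinberg mixture of genotype-specific survivals, each genotype follows a gamma-frailty model with mean 1 and variance σ² = 0.1, the mode of inheritance is multiplicative (relative risk r for one copy, r² for two), and a hypothetical Gompertz curve (`toy_life_table`) stands in for the Danish life table. All function names are illustrative.

```python
import math
import random

SIGMA2 = 0.1  # frailty variance, the value quoted in the text

def genotype_survival(s0, risk, sigma2=SIGMA2):
    """Gamma-frailty survival of a genotype with relative risk `risk`."""
    return (1.0 + risk * (s0 ** (-sigma2) - 1.0)) ** (-1.0 / sigma2)

def population_survival(s0, p, r, sigma2=SIGMA2):
    """Hardy-Weinberg mixture of genotype survivals (multiplicative mode assumed)."""
    q = 1.0 - p
    return (p * p * genotype_survival(s0, r * r, sigma2)
            + 2.0 * p * q * genotype_survival(s0, r, sigma2)
            + q * q * s0)

def solve_baseline(s_obs, p, r, sigma2=SIGMA2, tol=1e-12):
    """Bisection: find s0 in (0, 1) whose mixture survival equals s_obs."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if population_survival(mid, p, r, sigma2) < s_obs:
            lo = mid   # mixture too low, baseline must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

def toy_life_table(max_age=110):
    """Hypothetical Gompertz survival curve, NOT the Danish life table."""
    a, b = 2e-5, 0.1
    return [math.exp(-(a / b) * (math.exp(b * x) - 1.0))
            for x in range(max_age + 1)]

def sample_lifespan(surv, rng):
    """Inverse-transform draw: first age at which survival falls to u or below."""
    u = rng.random()
    for age, s in enumerate(surv):
        if s <= u:
            return age
    return len(surv) - 1

observed = toy_life_table()
p, r = 0.2, 0.8                       # allele frequency and relative risk
baseline = [solve_baseline(s, p, r) for s in observed]
carrier = [genotype_survival(s0, r) for s0 in baseline]   # one copy
rng = random.Random(0)
lifespans = [sample_lifespan(carrier, rng) for _ in range(5)]
```

Bisection is used because the mixture survival increases monotonically with the baseline survival, so the root at each age is unique. With r = 1 the solver returns the observed survival itself, which is a convenient sanity check.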
Family data and individual genotypes are simulated using the linkage program Merlin . The program first randomly generates parental genotypes based on the specified allele frequency and then offspring genotypes are assigned based on their parental genotypes. Both parental and offspring genotypes are used for simulating their lifespan data. However, only offspring genotypes are used for indirect association analysis by frequency comparison between offspring of long-lived parents (probands) and their age-matched controls who are offspring from random families. The maximum age gap between the long-lived parents and their offspring is set to 35 years.
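The two-step genotype generation described above (parents drawn from the allele frequency, offspring assigned from their parents) can be sketched as follows. This is a plain Mendelian illustration that assumes Hardy-Weinberg proportions for the parents; it is not Merlin's implementation or interface, and the function names are hypothetical.

```python
import random

def draw_genotype(p, rng):
    """Number of allele copies (0, 1 or 2) under Hardy-Weinberg proportions."""
    return int(rng.random() < p) + int(rng.random() < p)

def transmit(genotype, rng):
    """Allele (1 = the studied allele, 0 = baseline) passed on by one parent."""
    if genotype == 1:
        return int(rng.random() < 0.5)   # heterozygote: either allele, 50:50
    return genotype // 2                  # homozygote: always the same allele

def simulate_trio(p, rng):
    """Genotypes of father, mother and one offspring."""
    father = draw_genotype(p, rng)
    mother = draw_genotype(p, rng)
    child = transmit(father, rng) + transmit(mother, rng)
    return father, mother, child

rng = random.Random(1)
trios = [simulate_trio(0.2, rng) for _ in range(20000)]
child_freq = sum(child for _, _, child in trios) / (2 * len(trios))
```

Without selection on parental lifespan, the offspring allele frequency stays close to the population value p. The enrichment studied in the paper appears only after families are selected on parental survival.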
We choose Armitage’s trend test as given by Sasieni as the test statistic for comparing genotype frequencies between offspring of probands and of random families. Following Sasieni, Armitage’s test statistic is calculated using the following formula,
$$\chi^2 = \frac{N\left[N(R_1 + 2R_2) - R(N_1 + 2N_2)\right]^2}{R(N-R)\left[N(N_1 + 4N_2) - (N_1 + 2N_2)^2\right]}. \tag{3}$$
Here, N1 and N2 are the number of heterozygous and homozygous allele carriers in the total sample of size N, R1 and R2 are the number of heterozygous and homozygous allele carriers in the R offspring of the long-lived parents (Table 1).
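As an illustration, the trend statistic in Sasieni's notation can be computed from the genotype counts defined above. The counts in the example are made up for demonstration and are not taken from the paper's simulations; the function name is illustrative.

```python
def armitage_trend(r1, r2, R, n1, n2, N):
    """Armitage trend chi-squared (1 df) in Sasieni's notation.

    r1, r2: heterozygous / homozygous carriers among the R proband offspring
    n1, n2: heterozygous / homozygous carriers in the total sample of size N
    """
    numerator = N * (N * (r1 + 2 * r2) - R * (n1 + 2 * n2)) ** 2
    denominator = R * (N - R) * (N * (n1 + 4 * n2) - (n1 + 2 * n2) ** 2)
    return numerator / denominator

# Hypothetical counts: 100 proband offspring with genotypes (0, 1, 2 copies)
# = (50, 40, 10) and 100 control offspring with (60, 35, 5).
chi2 = armitage_trend(r1=40, r2=10, R=100, n1=75, n2=15, N=200)
print(round(chi2, 4))  # 2.8169, below the 1-df critical value of 3.84
```

When the two groups have identical genotype counts, the numerator vanishes and the statistic is zero.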
In our simulation, equal numbers of offspring are drawn from long-lived parents and from random families, so that R = S = 0.5N, where S is the number of offspring from random families. The test statistic follows a chi-squared distribution with 1 degree of freedom. Power of the test is calculated as the proportion of significant tests among the 200 replicates generated in the simulation. By setting the type I error rate to α = 0.05, we calculate the power as
$$\text{Power} = \frac{1}{B}\sum_{j=1}^{B} I\left(\chi^2_j > \chi^2_{1,\,1-\alpha}\right). \tag{4}$$
In (4), B is the total number of replicates set to B = 200, χ²_j is the test statistic for the jth replicate, I(·) is the indicator function, and χ²_{1,1−α} = 3.84 is the critical value of the chi-squared distribution with 1 degree of freedom at α = 0.05.
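The power calculation amounts to counting the replicates whose statistic exceeds the critical value. The sketch below assumes a list of already computed trend statistics. It also adds a binomial standard error, which is not part of the paper's description but helps gauge Monte Carlo precision at B = 200.

```python
import math

CHI2_1DF_95 = 3.841  # 95th percentile of the chi-squared distribution, 1 df

def estimate_power(statistics, critical=CHI2_1DF_95):
    """Proportion of replicates whose test statistic exceeds the critical value."""
    significant = sum(1 for s in statistics if s > critical)
    return significant / len(statistics)

def power_standard_error(power, replicates):
    """Binomial standard error of a simulated power estimate."""
    return math.sqrt(power * (1.0 - power) / replicates)

# With B = 200 replicates, an estimated power of 0.80 carries a standard
# error of roughly 0.028, so small differences between settings are noise.
se = power_standard_error(0.80, 200)
```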
In Fig. 1, we show the frequency of a beneficial allele in 1,500 offspring with at least one centenarian parent (LP1 offspring) when the allele frequency at birth is 0.2 in the simulated samples. Each of the 95% confidence intervals (CIs) is estimated from an independent simulation with an assigned risk of the allele (0.7, 0.75, 0.8, 0.85, and 0.9). The allele frequency estimates deviate significantly from 0.2, and the deviation increases rapidly with the percentage of hazard reduction (from the lowest reduction of 10% for r = 0.9 to the highest reduction of 30% for r = 0.7). The message from Fig. 1 is that the frequency of alleles that contribute to human longevity is higher in the offspring of centenarians than in the general population. This also means that offspring of the long-lived can be used for indirect genetic association analysis of human longevity.
Next we examine the power of the indirect approach using different settings of allele risk (from 0.6 to 0.9) and frequency parameters (from 0.05 to 0.8) for various sample sizes (from 200 to 3,000) under different modes of inheritance (multiplicative or log additive, dominant and recessive). Table 2 shows the power estimates for comparing genotype frequencies of LP1 offspring with offspring from random families for additive SNP alleles. For a sample size of 3,000 (i.e. 1,500 centenarian offspring), the power for detecting an allele of r = 0.9 (10% hazard reduction) is 82% when the allele frequency is 0.2, 96% when the frequency is 0.5 and 72% when the frequency is 0.8. When the sample size is reduced to 1,600, the model still has high power (>81%) in capturing common SNP alleles that reduce the hazard by 15% (r = 0.85). For a sample size of 1,000, only common alleles of big effect (r = 0.8 or 20% hazard reduction) can be mapped with enough power (>81%). For large effect alleles (>20% hazard reduction), a sample size of 400–600 can be used. A small sample of 200 subjects does not have enough power unless genes of extremely large effect exist, which is unlikely.
As shown in Tables 3 and 4, sufficient power (>80%) can only be achieved with large samples of centenarian offspring in testing strong effect SNP alleles having over 15% hazard reduction for dominant alleles with frequency <0.5 and for recessive alleles with frequency >0.5. These results indicate that the indirect association is actually a weak approach for studying genes with non-additive effects.
In addition to sampling centenarian offspring, we simulated another sampling scheme that collects genotype information for offspring whose parents both lived past 90 years (LP2 offspring). Power estimates indicate that such a sampling scheme has high power (>86%) in identifying common SNP alleles with over 15% hazard reduction for large sample sizes (>3,000) (Table 5). For a smaller sample size of 1,000, the approach has acceptable power in detecting common SNP alleles with over 20% hazard reduction. Comparing the power estimates in Table 5 with those in Table 2, one can see that although the LP2 offspring are generally less informative than the LP1 offspring, the major difference is only for the rare SNPs (frequency of 0.05). For very high frequency alleles, power estimates are very close, especially for large sample sizes.
We have shown through computer simulation that indirect genetic association analysis is a valid method for studying genetic association with human longevity. The estimated power is highly dependent on the parameters specified (frequency, risk, mode of inheritance) and on the sampling scheme (size of study, selection of proband). A relatively large sample size (over 1,000 centenarian offspring) is required for mapping genes with low to modest additive effects. For genes with non-additive effects, the power is generally low. The power is especially low for detecting high frequency dominant and low frequency recessive genes. For high frequency dominant alleles, the low power can be due to the risky genotypes, with frequency p² + 2p(1 − p), making up nearly the whole population; for low frequency recessive alleles, it can be due to the risky genotype, with frequency p², being very rare in offspring from both proband and random families.
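A short calculation makes the argument concrete. The sketch below assumes Hardy-Weinberg proportions and uses the extreme allele frequencies from the simulated range (0.05 and 0.8) purely as illustrations. The function name is hypothetical.

```python
def risk_genotype_frequency(p, mode):
    """Population frequency of the genotypes that carry the allele's effect."""
    if mode == "dominant":
        return p * p + 2.0 * p * (1.0 - p)   # one or two copies
    if mode == "recessive":
        return p * p                          # two copies only
    raise ValueError("mode must be 'dominant' or 'recessive'")

common_dominant = risk_genotype_frequency(0.8, "dominant")    # about 0.96
rare_recessive = risk_genotype_frequency(0.05, "recessive")   # about 0.0025
```

With a dominant allele at frequency 0.8, about 96% of both groups carry an effect genotype, leaving little contrast to detect. With a recessive allele at frequency 0.05, only about 0.25% do, so very few informative subjects are sampled.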
As shown in Tables 2 and 5, except for rare SNP alleles, the power in testing common SNPs using offspring from LP1 (centenarian as proband) and LP2 (nonagenarian as proband) families is comparable. According to our power estimates, this means that offspring from LP2 families are nearly as useful as those from LP1 families and thus can be sampled and analyzed jointly. Joint sampling can help researchers achieve larger sample sizes and thus more power for their studies.
It is necessary to compare our power estimates for the indirect approach with those from the direct approach. For any fixed parameter setting and sample size, the indirect genetic association exhibits lower power than the direct approach, and thus larger sample sizes are needed to obtain power comparable to that of direct association studies. In general, there is a three- to fourfold difference in sample size requirement between the two approaches. Note that the reported power for the direct association does not take into account the birth cohort effect that constantly reduces mortality over time. However, we emphasize the following three points.
First, the offspring from both proband and random families, who are genotyped in the indirect association studies, are of relatively young ages (over age 65 in LP1 families and 55 in LP2 families). Their genotype information can be re-used when these individuals are followed up in cohort studies on, for example, aging-related diseases or longevity. According to Hjelmborg et al., the genetic influences on lifespan are minimal prior to age 60 but increase thereafter. This means that follow-up studies on the offspring from the indirect approach can be highly informative. As offspring of centenarians or long-lived subjects are reported to inherit significantly better health, important results can be expected from follow-up studies on these already genotyped samples.
Second, most of the genetic association studies using centenarians are of small scale, because centenarians are rare samples. However, the indirect design genotypes centenarian offspring instead of the centenarians themselves. Since the indirect design does not require genotypic data from centenarians, the sampling scope can be largely expanded.
Third, because of the rarity of centenarians, many case–control association analyses have been done using nonagenarians or even octogenarians as cases instead of centenarians. Tan et al. reported that the case–control association using nonagenarians requires a more than fivefold increase in sample size compared to using centenarians. In this case, a better alternative would be to conduct the indirect genetic association analysis, given the above-mentioned advantages.
Obtaining sufficient samples has been a major obstacle in longevity studies. The small sample sizes used result in a lack of power and account for the inconsistent results in gene-longevity association studies. In this respect, the indirect association design offers a good alternative although it requires larger sample sizes. It is encouraging that international consortia have been established for collecting data on long-lived families (for example, the Long Life Family Study at https://dsgweb.wustl.edu/llfs/, the Genetics of Healthy Aging Project at http://www.geha.unibo.it/). Large scale genotype data will be collected for performing both direct and indirect genetic association analyses for identifying or replicating genetic variations that affect human longevity.
Our simulation focuses on nuclear families with only one offspring. In practice, multiple siblings from each family of long-lived parents can be sampled. In this case, inference on the statistical significance of frequency differences needs to take into account the genetic correlation among siblings within each family. Statistical models able to handle correlated data are available, for example the generalized estimating equation model that treats the siblings in each family as one cluster with an exchangeable correlation structure.
Our computer simulation has shown that the indirect case–control association design, using centenarian offspring, is a valid approach for studying human longevity. Compared with the direct design that is based on centenarians, a three- to fourfold increase in sample size is required to achieve comparable power. However, given the rarity of centenarians and the usefulness of genotype data of centenarian offspring, the indirect design can be a good choice for studying longevity in comparison with other alternatives.
This work was supported by the National Institutes of Health [U01AG023712] and the National Institute on Aging [P01-AG08761].
Qihua Tan, Epidemiology, Institute of Public Health, University of Southern Denmark, Winsløws Vej 9B, 5000 Odense, Denmark, Department of Biochemistry, Pharmacology and Genetics, Odense University Hospital, Odense, Denmark.
Jing Hua Zhao, MRC Epidemiology Unit, Institute of Metabolic Science, Addenbrooke’s Hospital, Cambridge, UK.
Shuxia Li, Epidemiology, Institute of Public Health, University of Southern Denmark, Winsløws Vej 9B, 5000 Odense, Denmark.
Torben A. Kruse, Department of Biochemistry, Pharmacology and Genetics, Odense University Hospital, Odense, Denmark.
Kaare Christensen, Danish Aging Research Center, Institute of Public Health, University of Southern Denmark, Odense, Denmark.