Familial studies suggest that roughly 25% of the inter-individual differences in YL can be attributed to genetic factors 
. Although linkage and association studies have reported several variants associated with human lifespan and aging-related traits 
, the individual effects of these variants is usually small and our ability to use genetic information to predict human lifespan remains very limited. Recent studies 
suggest that WGP is effective at predicting complex traits. Here, we developed a WGP model for the prediction of YL and evaluated its predictive power using data from the Framingham longitudinal study.
When genetic markers were added to a model accounting for age at entry, sex, smoking, and BMI, the increase in
obtained by adding 80K SNPs (~9–10% of inter-individual differences in YL) was greater than the increase obtained by adding smoking and BMI (~6% of inter-individual differences in YL), indicating that genetic markers are making a relatively important contribution to predictive ability. Similar results were obtained when prediction accuracy was evaluated using longitudinal AUC's.
As anticipated, our results suggest that the genetic basis of YL involves a large number of variants. The observation that DIC and prediction accuracy improved with marker density suggests that a large number of markers spread across the genome are needed to account for differences at QTLs affecting YL, and this is consistent with what one would expect for a trait that conforms to an “infinitesimal” model 
. This pattern is also consistent with empirical evidence obtained for traits that conform to the infinitesimal model, such as human height 
or production traits in dairy cattle 
Our results are also consistent with those of Yashin et al. 
who, using a subset of the dataset used here (1,173 individuals of the original cohort), found that a sizable proportion of inter-individual differences in YL (20% in the training dataset) can be explained by the joint influence of 168 small-effect genetic variants which were pre-selected using p-values
derived from single-marker regressions. Although the study by Yashin et al. 
and the one presented here both suggest that a large number of variants is needed to account for interindividual differences in YL, the two studies differ in many respects: (a) our study uses a larger sample size (N
5,117, versus N
1,173) and incorporates both uncensored and censored observations, (b) unlike the Yashin study, where markers were pre-selected using statistics derived from single marker regression, here we used a much larger number of markers (up to 80K), spread along the whole genome, (c) although the two studies used an additive linear score to predict YL, the two scores are different. In the Yashin study the score consist of a sum of so-called “longevity alleles”, while in our study the predictive score is a weighted sum of allele dosage, with weights given by estimates of marker effects, (d) in some of our models we account for the effects of BMI and smoking, while these covariates were not accounted for in the Yashin study. Finally (d) in our study we focused on prediction accuracy of yet-to be observed outcomes, while the study by Yashin et al. reports the proportion of interindividual differences in YL that could be accounted for in the same dataset that was used to derive the predictive score. Nevertheless, despite the differences in the datasets and methods used, both studies provide consistent evidence that an important proportion of differences in YL can be predicted using genomic information and that capturing those patterns requires considering a large number of small-effects variants.
In addition to demonstrating that a sizable proportion differences in YL can be predicted using genomic information, we found that most of the gains in prediction accuracy obtained with use of genetic information came from increased accuracy of prediction of early mortality. Further examination of the distribution of causes of death by age at death reveled that cancer was the leading cause of death for people dying early in life. Therefore, a possible explanation of our results is that the ability of our WGP to capture cancer risk (indirectly through YL) was higher than for other death-related disorders. Further studies, using disorder-specific responses (e.g., presence/absence or onset of cancer) and case-control datasets will be needed to confirm this conjecture.
The Framingham dataset has a familial design and exhibits some level of population structure, much of which can be described through PCA of genome-wide SNPs. Whole-Genome Prediction exploits multi-locus LD between markers and QTL. These patterns of LD are likely to change across sub-groups in the population and because of this, models fitted using WGP cannot be regarded as ‘universal equations’. The validity across sub-groups of the patterns captured by a WGP model will depend on the extent to which genetic features (e.g., stratification) present in training samples are also present in those used for validation.
Including the first two marker-derived PCs increased prediction accuracy markedly, indicating that YL covariates with ancestry, as described by the first 2 PCs. However, the level of prediction accuracy attained by models using the first two marker-derived PCs was substantially lower than that of the model using 80K genome-wide SNPs, suggesting that the genetic factors affecting YL cannot be fully described by features such as population structure. The effects of familial relationships on the prediction accuracy of WGP are well established 
. However, in our study, the pedigree is relatively sparse and when families with more than one subject exist the offspring are highly likely to be censored; therefore, familiar relationships are not very informative to begin with, explaining why in this study familial relationships did not show a strong effect on the prediction accuracy of WGP.