Many rodent experiments have assessed effects of diets, drugs, genes, and other factors on life span. A challenge with such experiments is their long duration, typically over 3.5 years given rodent life spans, thus requiring a significant investment of time before answers are obtained. We collected longevity data from 15 rodent studies and artificially truncated them at 2 years to assess the extent to which one will obtain the same answer regarding mortality effects. When truncated, the point estimates were not significantly different in any study, implying that in most cases, truncated studies yield similar estimates. The median ratio of variances of coefficients for truncated to full-length studies was 3.4, implying that truncated studies with roughly 3.4 times as many rodents will often have equivalent or greater power. Cost calculations suggest that shorter studies will be more expensive, but perhaps not so much more that the reduction in time is not worth the added cost.
Rodent longevity studies remain a staple of experimental aging research, having been used to evaluate the effects of diets, drugs, genetic factors, toxins, or other factors on life span. One of the greatest obstacles with such experiments is their long duration, typically requiring 3.5 years or more to observe the complete life span of all study rodents (1). In addition to the significant financial burden of such studies, the time invested until answers can be obtained can represent a significant fraction of an investigator's research career, limiting the number of experiments an investigator can perform in their lifetime.
Interventional effects on longevity are commonly estimated through Cox proportional hazards (PHs) regression models (2) or such parametric formulations as the Gompertz model (3). Under the Cox PH model, the relative hazard rate (the instantaneous probability of death occurring at a given moment, conditional on death not happening prior to that moment) between two factors is assumed to have a fixed ratio over time. It is important to note that the Cox PH model does not restrict the hazard for a particular group to be constant as a function of time, instead entailing the inherent flexibility to be congruent with the biological expectation of increasing hazard during the course of normal aging. Similarly, the Gompertz model assumes that the relative acceleration in mortality is constant as a function of age. Because these models assume constant hazard ratios (HRs)/relative increases in mortality rate over time, their use implicitly assumes that any relative effects on mean or median life span are also present for so-called “maximum life span,” a quantity of great interest to gerontologists (4).
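For reference, the two models described above are often written in the following textbook forms. This is a sketch only: the symbols $h_0$, $\beta$, $x$, $a$, and $b$ are notation assumed here for illustration and do not come from the analyses reported in this article.

```latex
% Cox PH model: the baseline hazard h_0(t) is left unspecified and may rise with age,
% but the hazard ratio between groups (x = 1 vs x = 0) does not depend on t.
h(t \mid x) = h_0(t)\, e^{\beta x},
\qquad
\mathrm{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = e^{\beta}

% Gompertz model: the log hazard increases linearly with age,
% so the relative acceleration in mortality is the constant b.
h(t) = a\, e^{b t},
\qquad
\frac{d}{dt} \log h(t) = b
```

In the Cox form, a treatment changes the hazard by the same multiplicative factor at every age, which is the property that a truncated study relies on.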
Recognizing the significant time investment required for full longevity studies along with the application of models that assume constant effects across time invites the question as to whether the rodent mortality experience in these settings is consistent with the assumption of PH. Put another way, is it possible to reduce the length of longevity experiments by running studies for only τ years, where τ is less than full follow-up and still achieve comparable effect estimates to what one would obtain if one continued the study until all rodents died? Truncating longevity experiments prior to observing death for all rodents reduces statistical power due to the inherent loss of information. Thus, a related question is whether this loss of power can be overcome by increasing sample size, permitting shorter studies in terms of calendar time that still achieve the same expected inference concerning effects on mortality rate.
The goal of the present study is to empirically evaluate the assumption of PH using a collection of 15 rodent longevity experiments. Motivated by the design of long-term rodent toxicology (typically carcinogenicity) experiments, we also consider the situation of truncating longevity studies at 2 years of follow-up. We evaluate whether these shorter term studies could be expected to yield, on average, the same result with respect to effects on mortality rate that would be obtained from a full lifetime study. For situations in which the observed effects on mortality rate would be roughly the same in a shorter versus life-long follow-up study (ie, PH is a valid assumption), we estimate the required increase in sample size needed to achieve the same statistical power and precision as would a full lifetime study. Finally, we also compare the total study costs for a truncated follow-up experiment versus a full-length longevity study.
We utilized mortality data from a convenience sample of 15 rodent life-span studies available to us from our own work or provided by close collaborators (Table 1). These life-span studies evaluated a broad range of factors, including sex, genotype, drugs, diet, exercise, and surgical interventions, though we do not claim that they represent an exhaustive sampling of rodent longevity studies. Available studies in which the intervention changed during the course of the study were excluded, such as the methionine study of Miller and colleagues (4). The 15 included studies had an average length of 3.6 years (SD = 0.8) and an average sample size of 363 rodents (SD = 50).
For each available study, we estimated the mortality effects of the various treatments, interventions, and other factors considered in the original experiment via Cox PH regression. In order to assess evidence for deviations from the assumption of PH, we performed statistical tests of the PH assumption by including time-based effects in the regression model for each study (5). The inclusion of time-varying effects for each predictor variable is a standard statistical approach to assess the evidence for HRs that change over time, such as could occur if a given treatment leads to increased early mortality yet offers later protective effects (6). For each study, we performed an overall F test for all time-based effects in the full study data to examine the PH assumption. We utilized a jackknife procedure to conduct more focused tests for differences between the HRs obtained from analyzing the full longevity data versus truncating each study at 2 years (Appendix 1). We report uncorrected p values, so that readers may choose to apply the multiple testing or false discovery rate correction of their choice (7) or simply apply a nominal α = .05 significance level (8).
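One common way to express a test of this kind, given here as a sketch and not as the exact specification used in these analyses, is to add an interaction between each predictor and a function of time to the PH model. The symbols $h_0$, $\beta$, and $\gamma$, and the function $g(t)$ (for example, $g(t) = \log t$) are assumptions introduced for illustration.

```latex
% Extended model with a time-based effect for predictor x
h(t \mid x) = h_0(t)\, \exp\{\beta x + \gamma\, x\, g(t)\}

% Implied hazard ratio, which varies with t unless gamma = 0
\frac{h(t \mid x = 1)}{h(t \mid x = 0)} = \exp\{\beta + \gamma\, g(t)\}

% The PH assumption corresponds to the null hypothesis
H_0 : \gamma = 0
```

Under this formulation, a treatment with increased early mortality and later protection would show a hazard ratio that starts above 1 and falls below 1 as $t$ increases, which requires $\gamma \neq 0$.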
To estimate the required increase in sample size for truncated life-span studies, we assumed that, asymptotically, the variance of estimated effects is proportional to 1/N. That is, if we double the sample size, then the standard errors of estimated effects will shrink by a factor of $\sqrt{2}$. If the data from the full study yielded an estimate with variance $V_F$ and the truncated data yielded an estimate with variance $V_T$, then a truncated study will need a larger sample size by a factor of approximately $V_T/V_F$ to achieve equivalent power.
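The 1/N scaling argument can be illustrated with a short sketch. It uses the truncated-to-full variance ratio, the same ratio plotted in Figure 1, as the sample size multiplier. The function names and the standard errors in the example are hypothetical and are not taken from any of the 15 studies.

```python
import math


def variance_ratio(se_truncated: float, se_full: float) -> float:
    """Ratio of truncated to full-study variance, (SE_truncated / SE_full)^2.

    Under the assumption that variance is proportional to 1/N, this is also
    the factor by which the truncated study's sample size must grow to match
    the precision of the full-length study.
    """
    if se_truncated <= 0 or se_full <= 0:
        raise ValueError("standard errors must be positive")
    return (se_truncated / se_full) ** 2


def required_sample_size(n_full: int, se_truncated: float, se_full: float) -> int:
    """Sample size for a truncated study, rounded up to a whole rodent."""
    return math.ceil(n_full * variance_ratio(se_truncated, se_full))


# Hypothetical example: the truncated analysis has twice the standard error.
factor = variance_ratio(se_truncated=2.0, se_full=1.0)
n_needed = required_sample_size(n_full=100, se_truncated=2.0, se_full=1.0)
print(f"inflation factor = {factor:.1f}, rodents needed = {n_needed}")
```

Doubling the standard error therefore calls for four times as many rodents, and a standard error larger by a factor of $\sqrt{2}$ calls for twice as many.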
Based on the estimated sample size increases for truncated longevity studies, we also estimated the costs associated with performing a hypothetical larger truncated experiment for each of the 15 studies. Based on the National Center for Research Resources rate-setting manual (9) and consultation with experts on rodent life-span experiments, we assumed the following in cost calculations. Both mice and rats incur an initial cost of $20.00 per animal to purchase, with loaded cage maintenance costs of $1.27 and $2.14 per day for mice (up to five per cage) and rats (up to three per cage), respectively. We also assume that a lab technician (salary of $40,000 per year) can manage up to 1,000 rodents. Because, in practice, surviving rodents are not reassigned to new cages once their cage mates have died, the costs of full survival studies are likely underestimated to some degree. All analyses were performed using SAS v9.1.3 (SAS, Cary, NC).
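A minimal sketch of how these cost assumptions might be combined is shown below. The unit costs come from the paragraph above. Two steps are assumptions made here for illustration and are not stated in the text: rodent-days are converted to cage-days by dividing by cage capacity (that is, cages are treated as always full), and technicians are counted in whole persons paid for the full study duration. The function name and example inputs are hypothetical.

```python
import math

PURCHASE_COST = 20.00                         # dollars per animal (from the text)
CAGE_RATE = {"mouse": 1.27, "rat": 2.14}      # dollars per cage per day (from the text)
CAGE_CAPACITY = {"mouse": 5, "rat": 3}        # animals per cage (from the text)
TECH_SALARY = 40_000                          # dollars per year (from the text)
RODENTS_PER_TECH = 1_000                      # animals per technician (from the text)


def study_cost(species: str, n_rodents: int,
               total_rodent_days: float, study_years: float) -> dict:
    """Estimate study cost components under the stated assumptions."""
    purchase = PURCHASE_COST * n_rodents
    # Assumption: cages stay full, so cage-days = rodent-days / capacity.
    cage_days = total_rodent_days / CAGE_CAPACITY[species]
    maintenance = cage_days * CAGE_RATE[species]
    # Assumption: whole technicians, employed for the whole study.
    technicians = math.ceil(n_rodents / RODENTS_PER_TECH)
    technician_cost = technicians * TECH_SALARY * study_years
    return {
        "technicians": technicians,
        "technician_cost": technician_cost,
        "purchase_cost": purchase,
        "maintenance_cost": maintenance,
        "total_cost": purchase + maintenance + technician_cost,
    }


# Hypothetical example: 100 mice, 50,000 rodent-days, 2-year study.
example = study_cost("mouse", 100, 50_000, 2.0)
print({k: round(v, 2) for k, v in example.items()})
```

Dividing the total cost of a truncated design by that of the full-length design gives a cost ratio of the kind reported for each study.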
In Table 2, we report the uncorrected results of testing for departures from PH. After considering either the Bonferroni or Holm's (10) multiple-testing corrections, none of the studies gave statistically significant evidence of deviation from a PH model (Table 2). Two studies, 5 and 8, did exhibit nominal departure from PH. For Study 5, which tested selectively bred high-runner mice at different exercise levels against control mice, the control mice initially died off at a slower rate than the other two treatment groups but then died off rapidly toward the end (11). For Study 8, the mice that were administered thyroxine started the study with similar survival rates but then experienced accelerating death rates later in the study (12).
Table 3 displays the results of truncating each of the studies at 2 years, censoring each rodent at 730 days for those with deaths that occurred after that point. The jackknife tests for differences in the estimated HRs between the truncated and full-length studies suggest that in 43 of 49 comparisons (88%), the value of the estimated coefficients did not differ significantly between the truncated and full-length studies at the α = .05 level; at the Bonferroni-corrected α level of .05/49, none of the differences were significant. Of the six treatment groups whose coefficients differed at the nominal .05 level, four were for dietary interventions.
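The truncation step described above amounts to administrative censoring at a fixed cutoff. The sketch below shows one way to express it. The function names and the example ages at death are hypothetical, and treating a death on exactly day 730 as observed is an assumption consistent with censoring only deaths that occurred after that point.

```python
from typing import List, Sequence, Tuple


def truncate_follow_up(ages_at_death: Sequence[float],
                       cutoff_days: float = 730) -> List[Tuple[float, int]]:
    """Return (time, event) pairs after censoring at cutoff_days.

    event = 1 means death was observed on or before the cutoff;
    event = 0 means the rodent was still alive and is censored at the cutoff.
    """
    return [(min(age, cutoff_days), 1 if age <= cutoff_days else 0)
            for age in ages_at_death]


def fraction_dead(truncated: Sequence[Tuple[float, int]]) -> float:
    """Proportion of rodents whose death was observed before truncation."""
    return sum(event for _, event in truncated) / len(truncated)


# Hypothetical ages at death, in days.
ages = [500, 730, 900, 1100]
truncated = truncate_follow_up(ages)
print(truncated)
print(fraction_dead(truncated))
```

The resulting (time, event) pairs are the usual input for a Cox regression on the truncated data, and the observed fraction of deaths indicates how much information remains after truncation.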
Figure 1 displays variance ratios for the estimated HRs from truncated and full-length studies, defined as $(SE_{2yr}/SE_{full})^2$. Based on truncating studies at 2 years, the variances of the estimates became inflated by factors ranging from 1.2 to 34.0. As expected, the proportion of rodents that died before 2 years was strongly correlated with the level of variance inflation (Figure 1). On average, the variance of the estimates increased by a factor of 5.1 (SD = 5.6, median = 3.4). The largest increases in the variance ratios were in studies where less than 20% of rodents died before 2 years due to long life or small variation in life span. Under the assumption that truncated studies with five times as many rodents allocated to each arm would have equivalent power to a full-length study with one fifth of the total sample size, Appendix 2 displays the estimated total study cost for each type of design for each of the 15 data sets considered. The costs of running shorter experiments ranged from 1.8 to 4.0 times the cost of the full-length study (Table 4). Full details of the cost computations are shown in Appendix 2.
Our analysis indicates that, in most cases, interventions or other factors that influence rodent longevity induce effects consistent with PH models. Truncating the study length to 2 years did not significantly affect the estimated effect; only the variance of this estimate and hence its statistical significance were influenced, a result of reduced sample size and power. This implies that the Cox PH regression model (13) is sufficient for detecting differences in longevity, even when studies are cut short. We propose that increasing the number of rodents in the study by a factor derived from Figure 1, by a factor near 5 on average (mean) but in half the cases by a factor of no more than 3.4 (median), can offer power equivalent to a full-length study. In situations where time is critical and the impact of discovery is large, the additional cost burden of a shorter experiment may be justified. It may also be worth commenting on whether diet, strain, environment, cohort, etc. might have the most impact on achieving similarity of results in truncated versus full longevity studies, particularly given that each of these factors can affect both the overall life-span trends and specific mortality trajectories at earlier timepoints. Of course, the truncated approach will not provide full information on diseases of aging compared with a full longevity/disease experiment.
The current results suggest empirical adherence of mouse and rat longevity studies to PH, implying similar effects on mortality rate across the full life span. Thus, although the risk of death (hazard rate) may certainly accelerate with advanced age, it appears that differences in acceleration between groups commonly occur by a proportional factor. This further suggests that cases where early, mean, and median life span are extended, extensions in so-called maximal life span would also be the expected norm. However, exceptions to this general rule may be possible, in principle and observation. It is commonly observed and cited that in addition to mean and median life span, maximal life span is extended in response to calorie restriction, setting it apart from interventions that may increase mean or median life span independent of maximum life span (14,15). It is theoretically possible that an intervention that extends both mean and median life span has no benefit on maximal life span or that maximal life span may be increased independent of mean or median benefits. What is less commonly reported is the case where higher early- to mid-life mortality is followed by a subsequent extension of maximal life span. Although such an instance has been reported (4) and others may exist, it should be noted that the constant effects on longevity would only be expected in response to a single intervention that was maintained for the duration of the longevity study (16). Therefore, the alteration of a single intervention, as occurred with a methionine restriction protocol that showed increased early-life mortality (using the initial dietary formulation) followed by subsequent extension (as the diet was reformulated at two interim study points), does not contradict the expected proportionality of life span (4). 
Whether other studies showing potential differences in early-life or late-life mortality effects contradict the predicted proportionality has not been formally tested. For example, the life span of calorie-restricted wild-derived mice was not extended at early- and mid-life but was extended at the 90th percentile (17); resveratrol was reported to increase the early- and mid-life span of high fat–fed mice, with no benefit on maximal life span (18).
Part of the question regarding maximal life-span effects may be related to the lack of proper statistical analyses of reported maximal longevity results. Clearly, truncating studies at any point prior to the maximal observed life span prevents assessment of any late-life specific effects that might occur. Yet, the limited sample size defining the maximum life span (eg, beyond the 90th percentile, 4 animals of a cohort of 40) results in low-powered comparisons between groups, and it is often unclear whether observed increases in maximum life span are statistically significant rather than simply numerically larger. The application of a standardized method, such as those described in (19) and (20) for maximal life-span assessment between studies, would be useful to determine whether the early- and mid-life extension observed in longevity studies consistently occurs independent of late-life (maximal) extension. Another point of interest related to the proportionality of survival/mortality is whether the cause of mortality may differ at various points along the survival curve. This is often likely to be the case, at least for comparisons of some groups (eg, sex differences); if so, an intervention that affects a specific cause of death isolated to late life in rodents would go unobserved in a truncated study. Of course, our approach assumes that the observed mortality was due to natural causes in shorter lived mice; in practice, the investigator will need to check this assumption by carefully examining the causes of death.
The current results lend support to several practical applications. The first is the potential for statistical analyses of existing data sets in order to hypothesize about the expected survival impact that may occur in response to a particular diet, compound, or intervention. In particular, we envision the use of toxicology studies of at least 2-year duration where greater than 20% mortality has occurred, potentially advancing aging research without additional study-related costs. Second, although we have specifically evaluated truncated 2-year life-span experiments in order to parallel existing toxicology resources, our results in no way imply that 2-year studies or designs with a fixed stopping point are optimal. Rather than specifying a deterministic study length, an alternative implication from PH would be the potential for applying group sequential or similar clinical trial designs that permit interim analyses of a life-span experiment (21). For example, one could envision a life-span study employing planned interim analyses at 3- to 6-month intervals beginning at 2 years using such approaches as conditional power methodology (22) or continual monitoring using Bayesian decision-theoretic techniques to evaluate the need to continue the experiment (23). This methodology may be particularly beneficial, given that recent publications indicate that exploration of factors affecting life span in rodents remains a staple of experimental aging research (24–28). Although similar analyses have been applied to toxicology experiments in clinical drug testing, our analyses illustrate the potential to apply similar methodologies to a multifactorial endpoint of longevity. However, meticulous adherence to the planned study design and proper analytic techniques is certainly advised to prevent improper interpretations, such as could occur by simply ending a study when a statistically significant result is obtained.
This work was supported in part by National Institutes of Health grants: P30DK056336, R01DK076771, T32HL072757, T32DK062710, and T32HL079888. T.G. was supported by NSF grant IOB-0543429.
D.B.A. has received grants, honoraria, donations, book royalties, and consulting fees from numerous food, beverage, dietary supplement, pharmaceutical companies, litigators and other commercial, government and nonprofit entities with interests in obesity, and related topics.
We would like to thank the following individuals for their generosity in sharing rodent data: George Roth, Chief Executive Officer, GeroScience Inc.; Richard Miller, Professor of Pathology, University of Michigan at Ann Arbor; Donald K. Ingram, Professor of Nutritional Neuroscience and Aging Laboratory, Pennington Biomedical Research Center; Julie Mattison, Laboratory of Experimental Gerontology, National Institute on Aging, National Institutes of Health (NIH); Arlan Richardson, Senior Research Career Scientist (STVHCS) Director, University of Texas Health Science Center; and Kyle Grimes, Professor of English at the University of Alabama at Birmingham. We would also like to thank Junior Bazile, Nigel Rozario, and Huichien Kuo for assistance with data compilation and standardization. The opinions expressed are those of the authors and not necessarily those of the NIH or any other organization with which the authors are affiliated.
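Let $\theta$ denote the difference between the coefficient estimated from the full-length data and the coefficient estimated from the data truncated at 2 years, with corresponding estimator $\hat{\theta}$ based on the full data set ($i = 1, 2, \ldots, N$ observations). The jackknife procedure then consists of the following steps (29).
For $i = 1$ to $N$:
Remove the $i$th subject from the data set.
Estimate $\theta$ in the reduced sample, and denote this estimate as $\hat{\theta}_{(i)}$.
The jackknife estimate of $\theta$ is then
$$\hat{\theta}_{J} = N\hat{\theta} - (N-1)\,\hat{\theta}_{(\cdot)},$$
where $\hat{\theta}_{(\cdot)} = \frac{1}{N}\sum_{i=1}^{N}\hat{\theta}_{(i)}$. The jackknife estimate of the standard error is
$$\widehat{SE}_{J} = \sqrt{\frac{N-1}{N}\sum_{i=1}^{N}\left(\hat{\theta}_{(i)} - \hat{\theta}_{(\cdot)}\right)^{2}}.$$
Thus, a test statistic under the null hypothesis $\theta = 0$ can be constructed as
$$T = \frac{\hat{\theta}_{J}}{\widehat{SE}_{J}},$$
where $T$ follows an approximate $t$ distribution with $N - 1$ degrees of freedom.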
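The leave-one-out steps above can be sketched generically as follows, using the standard bias-corrected jackknife estimate and jackknife standard error. This is an illustration under assumptions and not the SAS code used for the analyses: the `estimator` argument is a hypothetical callable. In the setting of this article it would refit the Cox model on the full and truncated versions of the reduced sample and return the difference in coefficients. The example below substitutes the sample mean only to keep the code self-contained.

```python
import math
from typing import Callable, Sequence, Tuple


def jackknife(data: Sequence, estimator: Callable[[Sequence], float]
              ) -> Tuple[float, float, float]:
    """Return (jackknife estimate, jackknife SE, T statistic).

    T is intended to be compared with a t distribution on N - 1 degrees
    of freedom under the null hypothesis theta = 0.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    theta_full = estimator(data)
    # Leave-one-out estimates: drop subject i, re-estimate theta.
    loo = [estimator(list(data[:i]) + list(data[i + 1:])) for i in range(n)]
    theta_dot = sum(loo) / n
    # Bias-corrected jackknife estimate.
    theta_jack = n * theta_full - (n - 1) * theta_dot
    # Jackknife standard error.
    se_jack = math.sqrt((n - 1) / n * sum((t - theta_dot) ** 2 for t in loo))
    t_stat = theta_jack / se_jack if se_jack > 0 else float("nan")
    return theta_jack, se_jack, t_stat


def sample_mean(values: Sequence[float]) -> float:
    """Stand-in estimator used only for this illustration."""
    return sum(values) / len(values)


theta_j, se_j, t = jackknife([1.0, 2.0, 3.0, 4.0, 5.0], sample_mean)
print(theta_j, se_j, t)
```

For the sample mean, the jackknife standard error reduces to the familiar $s/\sqrt{N}$, which provides a convenient check on the implementation.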
| Study | Length | N | Mean Longevity | Maximum Longevity | Total Rodent Days | # Technicians Required | Technician Cost | Rodent Overhead Cost | Rodent Maintenance Cost | Total Cost | Cost Ratio |