The MASS was set up to assess whether or not screening for AAA was beneficial in terms of long-term mortality (Thompson et al., 2009). Between 1997 and 1999, men aged 65–74 years were recruited from family doctor lists in four UK centres. Of the 33883 men who were invited to screening, 26875 had a visualized abdominal ultrasound scan and 1334 aneurysms (diameter 30 mm or greater) were detected. For this analysis of growth rates, data are taken from 1046 subjects who had a diameter of 30–54 mm at their first screen and at least one follow-up ultrasound measurement. The current aneurysm diameter determined the time of the next examination: individuals who measured 30–44 mm were rescanned a year later, whereas those with diameters of 45–54 mm were rescanned after 3 months. In total, the data contain 8941 ultrasound examinations. The average duration of follow-up was 4.9 years, with a mean of 8.5 ultrasound scans per person.
4.1. Follow-up and censoring
Individual series are terminated by surgery (36%), death (21%), loss to follow-up (26%) or the administrative censoring date of March 31st, 2008 (17%), whichever comes first. Individuals whose aneurysm diameter measured 55 mm or greater at any examination or who showed rapid expansion (defined as observed growth of 10 mm or more in 1 year) were considered for elective surgery. Those who were deemed unsuitable for surgery had continued surveillance of their aneurysm. A series that is terminated because the patient underwent elective surgery will tend to be biased towards a larger diameter on the final measurement due to measurement error (Brady et al., 2004). However, patients who drop out on the basis of their observed measurement history define a random, and hence ignorable, drop-out mechanism, if a likelihood-based analysis is used (see pages 283–285 and equation 13.2.3 of Diggle et al. (2002)).
Fig. 2 shows four ‘spaghetti’ plots of individual growth series, grouped by the mode of termination, together with the empirical mean AAA diameter profiles. In Fig. 2 only measurements that were taken close to an anniversary of screening are used, since 3-monthly rescans were only undertaken in individuals with diameters of 45–54 mm, and could distort mean values. It can clearly be seen that on average AAA diameters are larger in the group who eventually go for surgery, and that those who become lost to follow-up have on average smaller AAAs. This latter observation is not entirely unexpected, since many of the measurements in the lost to follow-up group are below 30 mm, and essentially the AAA is no longer confirmed in this size range. This may explain why these patients drop out of the study. Hence there is good reason to suspect that drop-out due to both surgery and loss to follow-up is mainly dependent on observed AAA diameters, and for this analysis we therefore assume (missing at) random drop-out.
Fig. 2 Trajectories of AAA growth given the type of censoring for all yearly AAA observations (the yearly mean AAA diameter with 95% non-parametric bootstrap confidence intervals is superimposed on the plots): (a) administrative censoring; (b) died; (c) lost …
Of the 3846 non-final ultrasounds that measured less than 45 mm, 3158 (82%) had a repeat measurement within 9–15 months, broadly following the protocol. Of the 4041 non-final ultrasounds that measured 45 mm or more, 2913 (72%) had a repeat measurement within 1–5 months. The protocol was therefore not always strictly adhered to, either because the patient did not attend or because the appointment was not scheduled. The effect of these missed appointments on the analysis should be minor, since these data are only intermittently missing and there is no reason to suspect that the missingness depends on unobserved AAA diameters.
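The surveillance rules quoted in this section can be summarised as a small rule set. The Python sketch below is illustrative only: the function names `next_action` and `followed_protocol` are hypothetical and are not taken from the MASS. It encodes the rescan intervals (1 year for 30–44 mm, 3 months for 45–54 mm), the referral criteria (55 mm or greater, or observed growth of 10 mm or more in 1 year) and the adherence windows (9–15 months and 1–5 months) used above.

```python
from typing import Optional


def next_action(diameter_mm: float, growth_last_year_mm: Optional[float] = None) -> str:
    """Return the protocol step implied by the latest ultrasound measurement."""
    # Rapid expansion: observed growth of 10 mm or more in 1 year.
    rapid = growth_last_year_mm is not None and growth_last_year_mm >= 10
    if diameter_mm >= 55 or rapid:
        return "consider elective surgery"
    if diameter_mm >= 45:
        return "rescan in 3 months"
    if diameter_mm >= 30:
        return "rescan in 12 months"
    # The text does not state an interval for measurements below 30 mm.
    return "not specified"


def followed_protocol(diameter_mm: float, months_to_next_scan: float) -> bool:
    """True if the repeat scan fell inside the adherence window used in the text."""
    low, high = (9, 15) if diameter_mm < 45 else (1, 5)
    return low <= months_to_next_scan <= high
```

For example, `next_action(47)` gives a 3-month rescan, and a repeat scan 8 months after a 50 mm measurement would be counted as outside the 1–5 month window.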
4.2. Estimation and convergence
Both classical restricted maximum likelihood and Bayesian MCMC methods are used to obtain estimates of the parameters. Non-informative priors are used for the Bayesian models. The population mean parameters ($\beta$) are given vague independent $N(0, \tau^2)$ priors with $\tau = 1000$. The within-subject variance is assigned an inverse gamma prior, IG(0.001, 0.001). To ensure that $\Sigma$ is positive definite, an inverse Wishart prior distribution is used with degrees of freedom equal to 1 plus the dimension of $\Sigma$, i.e. 3 for the linear model and 4 for the quadratic model. This has the effect of placing a uniform distribution on each of the correlation parameters (Gelman and Hill, 2007). Inferences are based on two parallel chains, each with 10000 iterations, of which the first 500 are discarded as burn-in. The convergence diagnostic $\hat{R}$ (Brooks and Gelman, 1998) is assessed for each parameter, with a value close to 1 indicating good convergence properties. We obtained values of $\hat{R}$ close to 1 for all parameters in all models. Posterior medians (with standard deviations) from the Bayesian analyses are interpreted as equivalent to estimates (with standard errors (SEs)) from the classical analyses. The WinBUGS code for the models that are presented in this paper is available at http://www.mrc-bsu.cam.ac.uk
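To illustrate how a convergence diagnostic of this kind behaves, the following Python sketch computes the basic variance-based potential scale reduction factor for two chains. It is an assumption-laden illustration: it omits the corrections discussed by Brooks and Gelman (1998), it may differ in detail from the diagnostic that WinBUGS reports, and the chains are simulated stand-ins rather than output from the models in this paper. The function name `potential_scale_reduction` is hypothetical.

```python
import random
import statistics


def potential_scale_reduction(chains):
    """Basic R-hat for equal-length chains (each a list of draws of one parameter)."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean(statistics.variance(c) for c in chains)  # W
    between = n * statistics.variance(chain_means)                     # B
    pooled = (n - 1) / n * within + between / n                        # pooled variance estimate
    return (pooled / within) ** 0.5


rng = random.Random(1)
# Two chains of 9500 retained draws each (10000 iterations minus 500 burn-in).
# Simulated values only: both chains sample the same distribution ...
mixed = [[rng.gauss(2.2, 0.07) for _ in range(9500)] for _ in range(2)]
# ... or the chains are stuck around different values.
stuck = [[rng.gauss(mu, 0.07) for _ in range(9500)] for mu in (2.0, 2.4)]

r_mixed = potential_scale_reduction(mixed)
r_stuck = potential_scale_reduction(stuck)
print(round(r_mixed, 3), round(r_stuck, 3))
```

Chains that explore the same distribution give a value very close to 1, whereas chains centred on different values give a value well above 1.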
4.3. Timescale for analysis
There are two possible choices for the timescale that is used in the longitudinal model: time since screening and age. Time since screening is relevant since at baseline the population is constrained to have diameters within 30–54 mm, the inclusion policy of the MASS. This is also the inclusion criterion for the UK national screening programme, and hence this timescale is highly relevant for predictions. However, using age as the timescale may be more relevant for general predictions of aneurysm growth, where the time of screening is an irrelevant quantity. A comparison of models using each timescale was first made. In a hierarchical model, the choice of timescale is important as shrinkage of the random effects can result in different estimates of mean growth and can change predictions. This is seen in the table below, where restricted maximum likelihood estimates from classical linear and quadratic growth models, using either time since screening or age as the timescale, are presented (linear models, L1-time and L1-age; quadratic models, Q1-time and Q1-age). The estimates of mean AAA growth are quite different between the models L1-time and L1-age, and between Q1-time and Q1-age.
Parameter estimates (with SEs in parentheses) from classical restricted maximum likelihood linear and quadratic growth models, using either time since screening or age as the timescale
The models can be further compared by studying the Akaike information criterion (AIC). Clearly a non-linear trend provides a better fit, as AIC decreases dramatically in the two quadratic models. Furthermore, the use of time since screening as the timescale provides a better fit to the MASS data. In terms of prediction, AIC helps us to choose a model that will give good predictions for a new individual recruited in the same way as the sample, and clearly the models that use time since screening are better in this respect. Time since screening is therefore used as the timescale in all following models, but to make relevant predictions for the national screening programme at age 65 years we also consider including baseline age as a covariate in Section 5. This allows predictions to be made for a number of possible ages at screening, and in particular age 65 years.
4.4. Bayesian models
Parameter estimates for the standard linear (L1) and quadratic (Q1) models, fitted by using Bayesian MCMC sampling, are tabulated below. Compared with the estimates that were obtained from the classical fit, the Bayesian models produce almost identical parameter estimates, suggesting that the priors that were chosen in the Bayesian models are indeed effectively non-informative. The same table also shows the posterior mean deviance $\bar{D}$, the effective number of parameters $p_D$ and the deviance information criterion DIC (Spiegelhalter et al., 2002). From model L1, the average diameter at first screen is 37.5 mm (SE 0.2), with an average growth rate of 2.2 mm year⁻¹ (SE 0.07). There is considerable between-patient variation both in AAA diameters at first screen and in growth rates, and these are positively correlated. As with the classical models, there is evidence that AAA growth is non-linear since the quadratic model (Q1) has a lower DIC.
Parameter estimates from Bayesian linear and quadratic random-effects models
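As a reminder of how the three model comparison quantities relate, the sketch below follows the definitions of Spiegelhalter et al. (2002): the effective number of parameters is the posterior mean deviance minus the deviance evaluated at the posterior mean of the parameters, and DIC is the posterior mean deviance plus the effective number of parameters. The function name `dic_summary` and the deviance values are hypothetical and are not results from the MASS models.

```python
import statistics


def dic_summary(deviance_draws, deviance_at_posterior_mean):
    """Return the posterior mean deviance, effective number of parameters and DIC."""
    d_bar = statistics.fmean(deviance_draws)       # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean       # effective number of parameters
    return {"Dbar": d_bar, "pD": p_d, "DIC": d_bar + p_d}


# Hypothetical deviance values sampled along an MCMC run.
draws = [1012.0, 1008.0, 1010.0, 1014.0, 1006.0]
summary = dic_summary(draws, deviance_at_posterior_mean=1000.0)
print(summary)
```

Because DIC adds the complexity term to a quantity that is already averaged over the posterior, a model with more effective parameters needs a correspondingly lower mean deviance to be preferred.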
Fig. 3 shows the distribution of measured aneurysm diameters at first screen. Clearly the distribution is skewed and non-normal, indicating that the standard model may be inadequate. To avoid making a parametric assumption concerning the distribution of diameters at first screen, a further model (L2) allows the individual-specific intercepts to be independent, and entirely unrelated, parameters. Each individual's intercept is given an independent uniform U(0,1000) prior. Hence this model estimates 1046 separate intercepts with no shrinkage towards their overall mean. The random slopes are then modelled conditionally on the intercepts by assuming that the conditional distribution is Gaussian, with the coefficient that links each slope to its intercept given a uniform U(−5,5) prior.
Fig. 3 Histogram of aneurysm diameters 30–54 mm at first screen in the MASS (n=1046)
Since the population of the intercepts is not specified, and hence $\beta_0$ and $\sigma_0$ are not parameters of the model, the unweighted empirical means and variances of the intercepts are presented in the table of Bayesian estimates. The standard deviation $\sigma_0$ is higher than estimated in model L1, reflecting the fact that no shrinkage of the intercepts is taking place. Conversely, the standard deviation of the slopes $\sigma_1$ is smaller, as is the empirical correlation between intercepts and slopes. Interestingly, the effective number of parameters increases by only 53 compared with the random-effects model L1, and the posterior mean deviance $\bar{D}$ is actually higher in model L2, as is DIC. One possible reason for the increase in $\bar{D}$ is that, since the deviance is averaged over its posterior distribution, $\bar{D}$ already incorporates a degree of penalty for model complexity. Indeed the relatively small increase in the effective number of parameters (compared with the 1046 individuals) suggests that there is not much shrinkage of the intercepts under model L1. Nevertheless, the smaller DIC in model L1 indicates that this is the preferred model.
There is evidence from residual plots that the within-patient variation is more heavy tailed than Gaussian. We therefore consider a further model that specifies a t-distribution for the within-patient variation. The degrees of freedom of this distribution are estimated, with a U(2,1000) prior placed on the degrees-of-freedom parameter. Results from this extension to the linear model, labelled L1-T, are given alongside those for the other Bayesian models. The degrees of freedom are estimated to be close to 4, suggesting a heavy-tailed distribution, and DIC has decreased substantially compared with model L1.
Predictions for a specific individual whose AAA diameter at screening (t=0) is either 35 mm or 50 mm are tabulated below. All models estimate a similar true growth rate when y=35 at first screen, at approximately 2 mm year⁻¹. The predicted growth rates when y=50 at first screen are, however, higher, at approximately 3.5 mm year⁻¹. The estimated time for the underlying process to cross 55 mm is similar across all models, although the wide credible intervals limit the practical usefulness of this quantity. The probabilities of crossing the 55 mm threshold within 3 months and 1 year are practically 0 for a screening diameter of 35 mm and are very similar between models L1, Q1 and L1-T for a screening diameter of 50 mm, whereas the probabilities from model L2 are higher. Fig. 4 shows predicted aneurysm growth given a single measurement at screening of either 35 mm or 50 mm.
Predictions (with 95% intervals) for an individual with a single AAA diameter measurement at baseline (t=0) according to the four models (L1, Q1, L2 and L1-T)
Fig. 4 Predicted AAA diameter given a current diameter of either (a)–(d) 35 mm or (e)–(h) 50 mm taken at the time of screening, according to the four models (posterior medians and pointwise 95% credible intervals are presented): (a), …
In general, predictions are remarkably similar between the fitted models, with the quadratic model Q1 showing slight curvature for an individual with a diameter of 35 mm at screening. Interestingly, for an individual who measures 50 mm at screening, the predicted average AAA diameter 3 months later is slightly less than 50 mm for all models except L2. This occurs because the intercepts from all these models are shrunk towards the population mean intercept, resulting in slightly lower predictions, whereas there is no shrinkage of intercepts in model L2. By way of explanation, these random-intercept models assume that an imperfectly measured baseline diameter of 50 mm is more likely to be an outlier, since it lies far from the mean baseline diameter in the population. Hence the model predicts that a subsequent measurement would on average be less than 50 mm. For an individual with a diameter of 35 mm at screening, predictions are more similar to the observed diameter, since it is closer to the population mean diameter and is therefore shrunk less.
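The shrinkage argument can be made concrete with a deliberately simplified calculation. The sketch below assumes a normal random-intercept model with a single noisy baseline measurement and ignores the random slopes and their correlation with the intercepts, so it is not the fitted model L1. The population mean of 37.5 mm is taken from model L1, but the between-patient and within-patient standard deviations (6 mm and 2 mm) are hypothetical values chosen only for illustration, as is the function name `shrunk_intercept`.

```python
def shrunk_intercept(observed_mm, pop_mean_mm=37.5, sd_between=6.0, sd_within=2.0):
    """Posterior mean of the true baseline diameter given one noisy measurement."""
    # Weight given to the observation; the remainder goes to the population mean.
    weight = sd_between**2 / (sd_between**2 + sd_within**2)
    return pop_mean_mm + weight * (observed_mm - pop_mean_mm)


for y in (35.0, 50.0):
    estimate = shrunk_intercept(y)
    print(f"observed {y:.1f} mm -> estimated true diameter {estimate:.2f} mm")
```

Under these assumed variances an observed 50 mm is pulled down by more than 1 mm, whereas an observed 35 mm barely moves, which mirrors the pattern described above.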
In terms of planning monitoring intervals for AAA, a key aim is to limit the probability that the next observation is greater than or equal to the 55 mm threshold. Such probabilities can be easily calculated from the predictive distributions in an MCMC framework, and Fig. 5 shows how these depend on the baseline AAA diameter and can be controlled by choosing the time of the next measurement. Both the linear and the quadratic models are shown in Fig. 5 for probability limits of 1%, 5% and 10%. For example, if we wish for fewer than 1% of individuals to have a diameter over the threshold at their second scan, a screening interval of 2.5 years or less would be sufficient for those who measured 35 mm at baseline. In contrast, this interval would need to be 5 months or less for an individual who measured 45 mm at baseline. For individuals who measured 50 mm at baseline there is already a chance greater than 1% that an immediate remeasurement would result in an observed diameter that is 55 mm or more. The linear and quadratic models give very similar results.
Fig. 5 Probability of an observed AAA diameter being greater than or equal to 55 mm at rescreening given baseline AAA diameter, using (a) the first linear and (b) the quadratic models: 10% probability; 5% probability; …
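The calculation behind such probability limits can be sketched as follows. Given posterior draws for an individual, the probability that the next observed diameter is 55 mm or more is the share of draws whose simulated future observation crosses the threshold, and the longest acceptable interval is found by scanning a grid of rescan times. Everything numerical here is an assumption: `fake_draws` is a hypothetical stand-in for real MCMC output, a linear growth model is used, and the resulting intervals are not expected to reproduce the values read from Fig. 5.

```python
import random

THRESHOLD_MM = 55.0


def prob_exceeds(draws, t_years, rng):
    """Share of posterior draws whose predicted observation at time t is >= 55 mm."""
    hits = 0
    for b0, b1, sigma_w in draws:
        # New observation = underlying linear growth + within-subject error.
        y_new = b0 + b1 * t_years + rng.gauss(0.0, sigma_w)
        hits += y_new >= THRESHOLD_MM
    return hits / len(draws)


def longest_interval(draws, limit, rng, grid_months=range(1, 61)):
    """Longest rescan interval (in months) that keeps the probability below `limit`."""
    best = 0
    for months in grid_months:
        if prob_exceeds(draws, months / 12, rng) >= limit:
            break
        best = months
    return best


def fake_draws(baseline_mm, rng, n=5000):
    # Hypothetical draws of (true baseline diameter, growth rate, within-subject SD).
    return [(rng.gauss(baseline_mm, 1.0), rng.gauss(2.2, 0.8), 1.5) for _ in range(n)]


rng = random.Random(42)
interval_35 = longest_interval(fake_draws(35.0, rng), 0.01, rng)
interval_45 = longest_interval(fake_draws(45.0, rng), 0.01, rng)
print(interval_35, interval_45)
```

The qualitative behaviour matches the text: a larger baseline diameter leaves less room below the threshold, so the interval that satisfies a given probability limit is shorter.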
The accuracy of the models in predicting the probability of exceeding the 55 mm threshold is investigated by forming a second, prediction, data set consisting of each individual's first $k$ scans, for $k = 1, 2, 3$. We treat each individual in the prediction data set as a new patient, independent from the analysis data set, for which new random effects are estimated. The posterior predicted probability of a measurement being greater than or equal to 55 mm is then calculated for each individual at years 1, 2, 3, 4 and 5 after screening. However, these probabilities cannot be directly compared with the observed data, because individuals with high measurements drop out of the study and hence leave a non-representative sample. Instead, multiply imputed complete data sets, as suggested by Gelman et al. (2005), are used as the comparator. Each multiply imputed data set is constructed as follows. For each individual at year $x$ ($x = 1, \ldots, 5$), the measurement and observation time closest to year $x$ is used. However, if no scans are taken within 6 months of year $x$, the measurement is imputed from the individual's posterior predictive distribution at year $x$ (using all available data). The multiply imputed data set therefore consists of a mixture of observed data and imputed data. The percentage of missing, and hence imputed, data at years 1–5 is 3%, 10%, 23%, 34% and 49% respectively. Over all (19000 MCMC) imputed data sets the mean proportion of measurements that were 55 mm or greater was calculated as 18.5% by using model L1. This compares with predicted proportions of 17.9%, 18.4% and 18.4% when the first one, two or three scans were used for prediction respectively, suggesting an overall good predictive performance. Similar results were obtained for the other models.
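The construction of one completed series can be sketched as below. For each of years 1 to 5 the scan closest to that year is used if it lies within 6 months, and otherwise a value is imputed. The function `completed_series`, the example scans and the placeholder `impute` callable are all hypothetical; in the analysis described above the imputed value would be a draw from the individual's posterior predictive distribution, not the simple linear rule used here.

```python
from typing import Callable, List, Tuple

Scan = Tuple[float, float]  # (years since screening, diameter in mm)


def completed_series(scans: List[Scan], impute: Callable[[int], float]):
    """Return one (time, diameter, was_imputed) triple for each of years 1..5."""
    out = []
    for year in range(1, 6):
        nearest = min(scans, key=lambda s: abs(s[0] - year), default=None)
        if nearest is not None and abs(nearest[0] - year) <= 0.5:
            # A scan was taken within 6 months of this anniversary: use it as observed.
            out.append((nearest[0], nearest[1], False))
        else:
            # No scan close enough: fill in from the supplied imputation rule.
            out.append((float(year), impute(year), True))
    return out


# Hypothetical individual whose last scan is about 2.35 years after screening.
scans = [(0.0, 41.0), (1.02, 43.0), (2.1, 45.5), (2.35, 46.0)]
series = completed_series(scans, impute=lambda year: 41.0 + 2.2 * year)

share_over_threshold = sum(d >= 55.0 for _, d, _ in series) / len(series)
print(series, share_over_threshold)
```

Repeating this over many imputations and averaging the share of values at or above 55 mm gives the kind of comparator proportion reported above.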
4.5. Predictors of abdominal aortic aneurysm growth
We consider extending model L1 to include possible predictors of AAA growth. We chose to extend this model because of its simplicity and because its predictions were very similar to those of the more complex models. At the first repeat scan, individuals were asked about their current smoking habits. Ninety-seven individuals reported never smoking, compared with 585 previous smokers and 317 current smokers; smoking data were missing for 47 individuals. The population parameters for this model are very similar to those for model L1, although there is strong evidence that previous and current smokers have on average larger diameters at baseline than non-smokers, by 2.4 mm (SE 0.8) and 2.5 mm (SE 0.9) respectively, and faster growth than non-smokers, by 0.53 mm year⁻¹ (SE 0.21) and 0.82 mm year⁻¹ (SE 0.22) respectively. The age of an individual at baseline was also considered as a predictor of aneurysm growth. There was no evidence of an association between age and AAA size at screening (−0.08 mm per year of baseline age; SE 0.08), and only a small association between age and the rate of AAA growth (−0.04 mm year⁻¹ per year of baseline age; SE 0.02). The surprising negative coefficient, suggesting slower AAA growth in the older population, may be due to the MASS selection process. One hypothesis is that fast growers in the older population will have diameters that are too large to be included in the MASS, whereas slow growers in the younger population have diameters that are too small for selection. Such a selection bias could produce an apparently negative association between age and growth.
The accompanying figure shows how predictions vary depending on the number and pattern of previous observations and the smoking status of an individual. All predictions are shown for individuals aged 65 years at screening who have a 40-mm-diameter aneurysm observed 2 years after screening. The reference prediction is based on a single 40-mm-diameter measurement taken 2 years after screening. A second scenario presents an individual with two 40 mm measurements, at t=1 and t=2, and predictions are slightly higher in this scenario. Further scenarios show predictions for ‘fast growers’ who have observed growth rates of 6 mm year⁻¹ (approximately 2 standard deviations above the population mean), and for individuals whose AAA is observed to ‘shrink’ at a rate of −2 mm year⁻¹ (approximately 2 standard deviations below the population mean). Predictions change only very slightly between smoking categories, despite the highly significant effect of including this variable as a covariate in the model. In contrast, previously observed AAA diameters do alter predictions, suggesting that the whole history is important, not just the final diameter. Since the average observed diameter for individuals who ‘shrink’ is greater than that for the ‘fast growers’, predictions are actually higher for these individuals. Surprisingly, the number of measurements does not appear to alter the precision of the predicted diameter, although the predicted size of AAA does change slightly between patients who have two measurements and those who have three. Finally, all the predicted growth curves appear to pass close to the average observed diameter at the average observation time.