A four-class mixture model was selected as the optimal model. Explanatory variables enter the model in two ways. First, within each class, EQ-5D is predicted by HAQ and HAQ², pain, age and age². Second, the probability that any patient's observation belongs to each of the four classes depends on HAQ, pain and pain². The optimal linear regression model included HAQ and HAQ², pain, age and age². However, this model fitted very poorly, particularly at the extremes of good and poor health.
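The two ways in which the explanatory variables enter the model can be sketched as follows. This is a minimal illustration, assuming a multinomial-logit specification for class membership (with class 1 as the reference) and a linear predictor within each class. The overall expected EQ-5D is then the class-probability-weighted average of the four class means. Every coefficient value and function name below is a hypothetical placeholder, not an estimate from this study.

```python
import numpy as np

# HYPOTHETICAL coefficients for illustration only (NOT this study's estimates).
# Within-class linear predictor columns: const, HAQ, HAQ^2, pain, age, age^2
BETA = np.array([
    [0.60, -0.10,  0.00, -0.002, 0.004, -0.00002],  # class 1
    [0.90, -0.05, -0.04, -0.006, 0.000,  0.00000],  # class 2
    [0.50, -0.20,  0.03, -0.003, 0.000,  0.00000],  # class 3
    [1.00, -0.08,  0.00,  0.000, 0.000,  0.00000],  # class 4
])
# ASSUMED multinomial-logit membership coefficients: const, HAQ, pain, pain^2.
# Class 1 is the reference category, so its row is fixed at zero.
GAMMA = np.array([
    [ 0.0,  0.0,  0.00, 0.0000],
    [-4.0,  1.0,  0.00, 0.0000],
    [-5.0,  0.8,  0.03, 0.0001],
    [ 0.0, -2.0, -0.05, 0.0000],
])

def class_probabilities(haq, pain):
    """Probability of each of the four classes given HAQ, pain and pain^2."""
    z = np.array([1.0, haq, pain, pain ** 2])
    eta = GAMMA @ z
    eta -= eta.max()  # subtract the max for a numerically stable softmax
    w = np.exp(eta)
    return w / w.sum()

def class_means(haq, pain, age):
    """Within-class linear predictions of EQ-5D from HAQ, HAQ^2, pain, age, age^2."""
    x = np.array([1.0, haq, haq ** 2, pain, age, age ** 2])
    return BETA @ x

def expected_eq5d(haq, pain, age):
    """Unconditional mean: class means weighted by class-membership probabilities."""
    return float(class_probabilities(haq, pain) @ class_means(haq, pain, age))
```

Keeping the membership equation separate from the within-class equations mirrors the text: pain² affects only which class an observation is likely to fall in, while HAQ², age and age² affect only the within-class predictions.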
The mixture model outperformed the linear model on all summary fit measures. AIC and BIC were both lower (indicating better fit) for the mixture model, and MAE and RMSE improved by 9.6% and 3.4%, respectively. Importantly, the improvement in fit was greatest at the extremes of very poor and very good health. For patients with an HAQ either between 0 and 1 or between 2 and 3, MAE improved by more than 11%. At a pain score of 0, MAE falls from 0.13 to 0.08, a 35% improvement; at pain scores above 95, it falls from 0.23 to 0.18, a 22% improvement. These features are evident in , which plots the mean EQ-5D versus (a) HAQ and (b) pain for the observed data, the linear regression model and the preferred mixture model. Results for this model are reported in .
Mean observed and predicted values for linear and mixture models: (a) HAQ vs EQ-5D and (b) pain vs EQ-5D.
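For reference, the fit measures compared above can be computed as in the following minimal sketch. It uses the standard definitions of MAE, RMSE, AIC and BIC and expresses an improvement as the percentage reduction relative to the linear model. The function names are illustrative only.

```python
import math

def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def pct_improvement(old, new):
    """Percentage reduction in an error measure when moving from `old` to `new`."""
    return 100.0 * (old - new) / old

def aic(log_lik, n_params):
    """Akaike information criterion; lower values indicate better fit."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion; penalises each parameter by ln(n)."""
    return n_params * math.log(n_obs) - 2 * log_lik

# MAE at pain scores above 95 (rounded values quoted in the text):
print(round(pct_improvement(0.23, 0.18), 1))  # 21.7
```

Because BIC's penalty grows with the sample size, a lower BIC is the stricter test of whether the extra parameters of a four-class model are justified.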
Results from the four-class mixture model
The first class is by far the largest, with a mean probability of class membership of 0.73. In this class, HAQ and pain are negatively related to EQ-5D (P < 0.001) (). HAQ² is not significant. Age and age² are both positively related to EQ-5D, although the coefficient on age² is not statistically significant (P = 0.230). The average characteristics of the patients most likely to be in this class are very similar to those of the dataset as a whole. Notably, these are less severely affected patients, with a mean HAQ of approximately 1, a mean EQ-5D of 0.67 and a mean disease duration of 17 years. Panel (a) of the simulated-distribution figure illustrates that this component has a peak around 0.7 that coincides with that of the observed data in . This component also contributes to the mass of data at EQ-5D equal to 1, but does not contribute significantly to the lower end of the distribution.
Distribution of simulated values from the four-component mixture and linear models: (a)–(d) for each component individually, (e) four-class combined and (f) linear model.
The mean probability of an observation being in the second class is 0.05, making it the smallest class. This component has a large spread, covering both patients in the most severe EQ-5D health states and those in full health, as panel (b) shows. The coefficients on HAQ and HAQ² indicate that EQ-5D decreases, by increasing amounts, as HAQ worsens. The impact of pain on EQ-5D is more pronounced in this class than in any other. Among the patients most likely to be assigned to this class, the mean HAQ is 2.76 (s.d. 0.23) and the mean EQ-5D is 0.33 (s.d. 0.32), but pain is relatively mild at 10.3 (s.d. 11.2). These patients have an average RA duration of more than 31 years.
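Including a squared term means that the effect of HAQ on EQ-5D depends on the level of HAQ. The worked expressions below illustrate this. The β symbols are generic placeholders for the within-class coefficients of class k, not the reported estimates.

```latex
\mathrm{E}[\text{EQ-5D}_i \mid \text{class } k] = \beta_{0k} + \beta_{1k}\,\text{HAQ}_i + \beta_{2k}\,\text{HAQ}_i^2 + \beta_{3k}\,\text{pain}_i + \beta_{4k}\,\text{age}_i + \beta_{5k}\,\text{age}_i^2

\frac{\partial\,\mathrm{E}[\text{EQ-5D}_i \mid \text{class } k]}{\partial\,\text{HAQ}_i} = \beta_{1k} + 2\beta_{2k}\,\text{HAQ}_i
```

A negative β2k, as the text implies for the second class, makes this slope more negative as HAQ rises, so each further increase in HAQ lowers EQ-5D by more. Conversely, when β1k < 0 and β2k > 0, as for the class centred around an EQ-5D of 0.2, the slope flattens as HAQ rises. It remains negative across the HAQ range of 0 to 3 provided β1k + 6β2k < 0.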
Panel (c) shows that the third component is centred around an EQ-5D of 0.2 and accounts in part for the second peak of the bi-modal EQ-5D distribution. Seven per cent of patients are most likely to be assigned to this component. The coefficient on HAQ is negative and much larger in magnitude than the positive coefficient on HAQ². Pain is also negatively associated with EQ-5D. This class is made up of patients with poor functional status: the mean HAQ is 2.03 (s.d. 0.44). These patients also have the most severe average pain score of the four classes, at 87.8 (s.d. 7.4).
The fourth class shows no statistically significant relationship between EQ-5D and either age or pain. HAQ is negatively related to EQ-5D (P < 0.05); HAQ² is not statistically significant. This class, comprising 14% of the dataset, is made up of patients with mild or no symptoms: the mean HAQ is 0.15 (s.d. 0.27), mean pain is 2.3 (s.d. 2.5) and mean EQ-5D is 0.93 (s.d. 0.11). Panel (d) illustrates how this component contributes predominantly to the mass of values at EQ-5D equal to 1.
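The class profiles above describe "patients most likely to be in" each class. This phrase corresponds to assigning each observation to the class with the highest posterior probability. A minimal sketch using Bayes' rule follows. It assumes a normal within-class density for EQ-5D, which is an illustrative simplification, and it uses hypothetical function names and toy inputs.

```python
import numpy as np

def normal_pdf(y, mean, sd):
    """Normal density, ASSUMED here as the within-class distribution of EQ-5D."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def posterior_class_probs(y, prior, means, sds):
    """Bayes' rule: P(class k | y, x) is proportional to pi_k(x) * f_k(y | x).

    y:     (n,) observed EQ-5D
    prior: (n, K) class-membership probabilities from the membership equation
    means: (n, K) within-class predicted EQ-5D
    sds:   (K,) within-class standard deviations
    """
    lik = normal_pdf(y[:, None], means, sds[None, :])
    joint = prior * lik
    return joint / joint.sum(axis=1, keepdims=True)

def modal_class_summary(post, covariate):
    """Assign each patient to the most likely class; return shares and covariate means."""
    assigned = post.argmax(axis=1)
    K = post.shape[1]
    shares = np.array([(assigned == k).mean() for k in range(K)])
    cov_means = np.array([covariate[assigned == k].mean() if (assigned == k).any() else np.nan
                          for k in range(K)])
    return assigned, shares, cov_means

# Toy usage with two classes and three patients (hypothetical numbers).
y = np.array([1.0, 0.2, 0.7])
prior = np.full((3, 2), 0.5)
means = np.tile([0.7, 0.2], (3, 1))
sds = np.array([0.1, 0.1])
post = posterior_class_probs(y, prior, means, sds)
assigned, shares, haq_means = modal_class_summary(post, np.array([0.1, 2.5, 1.0]))
print(assigned)  # [0 1 0]
```

Modal assignment explains why the share of patients assigned to a class need not equal that class's mean membership probability.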
Panel (e) shows that the bespoke mixture model replicates the key features of the EQ-5D data distribution (): a mass of observations at 1, a gap between full health and the next set of feasible values, a tri-modal shape, and no predicted values outside the feasible range at either the top or the bottom. The linear regression model has none of these features, as panel (f) shows.
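Simulated distributions such as the combined mixture in panel (e) can be generated with a two-stage draw. First, a class is drawn for each observation from its class-membership probabilities. Second, EQ-5D is drawn from that class's distribution. The sketch below shows only this two-stage mechanism. It assumes normal within-class distributions with hypothetical parameters, and it does not reproduce how the model in this study respects the mass at 1, the gap below it or the feasible range.

```python
import numpy as np

def simulate_mixture(probs, means, sds, rng):
    """Two-stage draw from a finite mixture of normals (ASSUMED within-class form).

    probs: (n, K) class-membership probabilities for each observation
    means: (n, K) within-class predicted EQ-5D
    sds:   (K,) within-class standard deviations
    Returns the simulated values and the class each draw came from.
    """
    n, K = probs.shape
    # Stage 1: draw a class per observation by inverting the cumulative probabilities.
    u = rng.random(n)
    cls = (u[:, None] > probs.cumsum(axis=1)).sum(axis=1)
    cls = np.minimum(cls, K - 1)  # guard against round-off when the cumsum falls just short of 1
    # Stage 2: draw from the selected class's normal distribution.
    y = rng.normal(means[np.arange(n), cls], sds[cls])
    return y, cls

# Hypothetical inputs: four classes with fixed probabilities for every observation.
rng = np.random.default_rng(0)
n = 10_000
probs = np.tile([0.7, 0.1, 0.1, 0.1], (n, 1))
means = np.tile([0.7, 0.5, 0.2, 0.95], (n, 1))
sds = np.array([0.1, 0.3, 0.1, 0.05])
y, cls = simulate_mixture(probs, means, sds, rng)
```

Plotting `y[cls == k]` separately for each k gives per-component panels analogous to (a) to (d), and plotting all of `y` gives the combined distribution.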