Growth mixture modeling (GMM) has the potential to uncover important information about classes of responders and non-responders in clinical trials. It extends the above models to longitudinal settings, where not only the end-point outcome but also the trajectory throughout the trial is considered. GMM combines random effects modeling in conventional repeated measures analysis with finite mixture modeling, using latent class variables to represent qualitatively different classes of trajectories ([19]). GMM is currently used in a wide variety of settings; see, e.g., [20] for an application to the joint study of PSA development and prostate cancer survival, [21] for an application to identifying trajectories of positive affect and negative events following myocardial infarction, and [22] for an application to growth modeling with non-ignorable dropout in a depression trial.
The 4-class model discussed above will now be presented in GMM terms. Let $c_{pi}$ and $c_{di}$ be the binary latent class variables for individual $i$ in the placebo and drug group, respectively. The probability of latent class membership is modeled by logistic regressions in which $z_{pi}$ is a covariate influencing class membership of the placebo latent class variable $c_{pi}$, and $z_{di}$ is a covariate influencing class membership of the drug latent class variable $c_{di}$. Let 1 refer to the non-responder class, and 2 the responder class. The relationship between $c_{pi}$ and $c_{di}$ is expressed via the log odds ratio.
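The class-membership model described in this paragraph can be sketched as follows. This is a reconstruction under stated assumptions, not the paper's exact notation: $c_{pi}$ and $c_{di}$ stand for the placebo and drug latent class variables, the coefficient names $\beta$ are hypothetical, and the symbol $\omega$ for the log odds ratio is likewise an assumption.

```latex
\begin{align*}
P(c_{pi} = 2 \mid z_{pi}) &= \frac{1}{1 + \exp\{-(\beta_{0p} + \beta_{1p} z_{pi})\}}, \\
P(c_{di} = 2 \mid z_{di}) &= \frac{1}{1 + \exp\{-(\beta_{0d} + \beta_{1d} z_{di})\}}, \\
\omega &= \log \frac{P(c_{pi} = 2, c_{di} = 2)\, P(c_{pi} = 1, c_{di} = 1)}
                    {P(c_{pi} = 2, c_{di} = 1)\, P(c_{pi} = 1, c_{di} = 2)}.
\end{align*}
```

Under this sketch, $\omega = 0$ would correspond to placebo response and drug response being independent, and $\omega > 0$ to responders under one condition being more likely to respond under the other.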
It may be noted that this model assumes that treatment status does not influence latent class membership. Class membership is conceptualized as a quality characterizing a subject before entering the trial. As an alternative, one may hypothesize that class membership arises as a function of treatment, with a single class during the pre-treatment period. This approach will not be explored here. If treatment influences only the class membership probabilities and not the random effect means directly, the distinction between the four hypothesized latent classes of subjects cannot be made. Also, if the model allows treatment to influence class membership, the principal stratification interpretation of [8] referred to in Section 3 is not valid.
Consider the depression outcome $y_{ti}$ observed at time point $t$ for individual $i$. Let $\eta$ denote random effects, let $a_t$ denote time, and let $\varepsilon_t$ denote residuals containing measurement error and time-specific variation. In line with the real-data analysis in Section 4.1, it is assumed that the outcome is observed at two pre-randomization time points. For the first, pre-randomization piece, the means of the random effects vary as a function of the combination of placebo latent class $k$ ($k = 1, 2$) and drug latent class $l$ ($l = 1, 2$), with the time score set to 0 to center at baseline. With only two pre-randomization time points, the model is simplified by specifying a non-random slope for identification purposes. All pre-randomization parameters are assumed to be equal across the placebo and drug groups.
Assume for simplicity a single drug and denote the treatment status for individual $i$ by the dummy variable $w_i$ ($w_i = 0$ for the placebo group and $w_i = 1$ for the drug group). For the second, post-randomization piece, a quadratic growth model is specified for $t = 3, 4, \ldots, T$, where the $a_t$ values are set according to the distance in timing of measurements and $c$ is a constant such as the average of $a_t$. The random effects are allowed to be influenced by the group dummy covariate $w$, their distributions varying as a function of the combination of trajectory classes $k$ and $l$.
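A sketch of the post-randomization piece, consistent with the description above, is given below. The $\alpha$ and $\gamma$ parameters are those named in the text; the subscripting of the random effects $\eta$ and residuals $\zeta$, and the centered form $(a_t - c)$, are assumptions made for illustration rather than the paper's exact display.

```latex
\begin{align*}
y_{ti} &= \eta_{0i} + \eta_{1i}\,(a_t - c) + \eta_{2i}\,(a_t - c)^2 + \varepsilon_{ti},
  \qquad t = 3, 4, \ldots, T, \\
\eta_{ji} &= \alpha_{jkl} + \gamma_{jkl}\, w_i + \zeta_{ji},
  \qquad j = 0, 1, 2,
\end{align*}
```

for an individual $i$ in placebo latent class $k$ and drug latent class $l$. With $w_i = 0$ the random effect means reduce to $\alpha_{jkl}$, and with $w_i = 1$ they are shifted by $\gamma_{jkl}$.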
The residuals $\zeta_i$ in the first and second piece have a $4 \times 4$ covariance matrix $\Psi_{k,l}$, here taken to be constant across the $k, l$ classes. The residuals $\varepsilon_{ti}$ of the two pieces have a $T \times T$ covariance matrix $\Theta_{k,l}$, here taken to be constant across classes as well. All residuals are assumed i.i.d. and normally distributed. For simplicity, $\Psi_{k,l}$ and $\Theta_{k,l}$ are assumed not to vary across treatment groups, although this can be relaxed. In the actual analyses in Section 4.1 and Section 4.2, the drug status covariate is represented by yet another latent class variable, where the latent status is known (this adds one extra class probability parameter, which could be ignored, but is included in the reporting of all the models). This creates a total of eight classes where the variances can be allowed to vary over subsets of those classes.
As seen in (15), the placebo group ($w_i = 0$) consists of subjects that vary in the means of the growth factors, which are represented by $\alpha_{0kl}$, $\alpha_{1kl}$, and $\alpha_{2kl}$. This gives the average development in the absence of medication for each of the four types of subjects. Because of randomization, the placebo and drug groups are assumed to be statistically equivalent at the first two time points. Drug effects are described in the second piece by $\gamma_{0kl}$, $\gamma_{1kl}$, and $\gamma_{2kl}$ as a change in the development that can be different for the four types of subjects.
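To make the roles of the α and γ parameters concrete, the class-specific mean curve implied by a quadratic growth model can be sketched in Python. This is a minimal illustration, not the authors' Mplus setup: the function names and all numeric values are hypothetical, and the centered form (a_t − c) is an assumption about how the constant c enters the model.

```python
def class_mean(a_t, alpha, gamma, w, c=0.0):
    """Mean outcome at time score a_t for one (k, l) class.

    alpha: (alpha0, alpha1, alpha2), growth factor means under placebo.
    gamma: (gamma0, gamma1, gamma2), drug-induced shifts in those means.
    w:     0 for the placebo group, 1 for the drug group.
    c:     centering constant (assumed to be subtracted from a_t).
    """
    d = a_t - c
    eta = [a + g * w for a, g in zip(alpha, gamma)]
    return eta[0] + eta[1] * d + eta[2] * d ** 2


def drug_effect(a_t, gamma, c=0.0):
    """Drug-minus-placebo difference in the class mean at time score a_t."""
    d = a_t - c
    return gamma[0] + gamma[1] * d + gamma[2] * d ** 2


# Hypothetical values for a single class; these are not estimates from the paper.
alpha = (20.0, -0.5, 0.02)
gamma = (0.0, -1.5, 0.05)
placebo_curve = [class_mean(t, alpha, gamma, w=0) for t in range(0, 11)]
drug_curve = [class_mean(t, alpha, gamma, w=1) for t in range(0, 11)]
```

Setting all three γ values to zero makes the placebo and drug curves of a class coincide, while nonzero values let the drug effect change over time.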
This model allows the assessment of drug response in the presence of placebo response both in terms of $\gamma_{0kl}$, $\gamma_{1kl}$, and $\gamma_{2kl}$ and in terms of the probabilities of (7), (8), and (9), giving the prevalence of each of the four types of subjects.
The analysis may use the above model in an exploratory way or by restricting the parameters of the second growth piece to correspond to the hypothesized non-responder and responder classes. Used in an exploratory way, the resulting four types of subjects may not have the hypothesized interpretation. For example, instead of the Placebo Only Responder type, two sets of Drug Only Responder types may be found, differing in their non-response/response characteristics. Restrictions on the parameters can be applied, e.g., by forcing the estimated outcome mean at the last time point to be less than a certain value for classes indicating response, and greater than a certain value for classes indicating non-response.
An equivalent way to formulate the model is as in [12], using a single latent class variable that has four categories corresponding to the four types of subjects. This approach does not emphasize the hypothesis that the four types arise as a combination of an individual being prone to placebo and/or drug response. It also does not enable separate covariates for the latent class variables as in (7) and (8). Nevertheless, the single latent class variable approach is used in the present analyses for simplicity given that no covariates are included. As mentioned, the placebo-drug dummy variable is handled via an additional latent class variable with known class status.
The models discussed may be estimated by maximum likelihood using the Mplus program [23]. Mplus was used for both the real-data and Monte Carlo analyses. Mplus scripts for the analyses are available from the first author. For a technical description, see [12] and [13].
The choice of the number of latent classes in mixture modeling is often guided by the minimum of the Bayesian information criterion (BIC), which penalizes models with many parameters ([24]),

$$\mathrm{BIC} = -2 \log L + r \log n,$$

where $\log L$ is the loglikelihood, $r$ is the number of free parameters in the model, and $n$ is the sample size. The lower the BIC value, the better the model. BIC, however, is not always reliable for small sample sizes and may underestimate the number of classes for samples of size 200 and below ([26]). Classification of subjects into the latent classes can be carried out based on the estimated posterior probabilities of class membership ([17]). A summary measure of the classification quality is given by the entropy measure (see, e.g., [27]),

$$E_K = 1 - \frac{\sum_{i} \sum_{k} \left(-\hat{p}_{ik} \ln \hat{p}_{ik}\right)}{n \ln K},$$

where $\hat{p}_{ik}$ denotes the estimated posterior probability for individual $i$ in class $k$ and $K$ is the number of classes. Entropy values range from zero to one, where values close to one indicate clear classification, in that the entropy decreases for probability values that are not close to zero or one. Values of at least 0.8 typically represent good classification quality.
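The two summary measures just described, together with classification by most likely class, can be sketched in a few lines of Python. This is an illustrative sketch rather than the Mplus implementation: the function names are hypothetical, the entropy is the standard normalized form (one minus the summed posterior entropy divided by n ln K), and the posterior probabilities below are made-up values.

```python
import math


def bic(loglik, n_params, n):
    """Bayesian information criterion: -2 log L + r log n (lower is better)."""
    return -2.0 * loglik + n_params * math.log(n)


def entropy(post):
    """Normalized entropy of an n x K matrix of posterior class probabilities.

    Returns a value between 0 and 1; values near 1 indicate clear classification.
    Terms with probability 0 contribute nothing (0 * log 0 is taken as 0).
    """
    n, n_classes = len(post), len(post[0])
    total = 0.0
    for row in post:
        for p in row:
            if p > 0.0:
                total += -p * math.log(p)
    return 1.0 - total / (n * math.log(n_classes))


def modal_class(row):
    """Most likely class (numbered from 1) for one subject's posterior probabilities."""
    return max(range(len(row)), key=row.__getitem__) + 1


# Hypothetical posterior probabilities for four subjects and three classes.
posterior = [
    [0.95, 0.04, 0.01],
    [0.10, 0.85, 0.05],
    [0.02, 0.03, 0.95],
    [0.40, 0.35, 0.25],
]
assignments = [modal_class(row) for row in posterior]
quality = entropy(posterior)
```

The last subject has probabilities far from zero and one, which pulls the entropy down even though a modal class can still be assigned.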
4.1 GMM applied to the antidepressant trial
The growth mixture models will here be applied to the repeated measures data from the antidepressant trial, whereas the next section treats the schizophrenia trial. For the antidepressant trial the GMM approach uses a quadratic growth function for the second, post-randomization piece. As a first step, the drug and placebo groups are analyzed separately to show the trajectory features.
4.1.1 Separate GMM analysis of drug and placebo groups
The table summarizing the separate analyses shows the results of growth mixture modeling of the drug and placebo groups analyzed separately. For the drug group, BIC points to three classes. As mentioned earlier, however, the small sample size may make BIC less trustworthy and lead it to suggest too few classes. The 3-class solution has an entropy of 0.87. The posterior probabilities are used to classify subjects into their most likely class, and the observed trajectories for each of the 3 classes are shown in the figure of observed trajectories. The mean curves of the 3-class solution for the drug group are shown in the top part of the figure of estimated mean curves. Class 1 is a drug responder class containing 67% of the subjects. Class 2 is a drug non-responder class containing 28%. Class 3 is a drug non-responder class with volatile development, containing 5%. The 4-class solution gives two classes very similar in shape and prevalence to class 1 and class 3, whereas class 2 is split into two non-responder classes.
Table: Summary of separate analyses of drug and placebo groups
Figure: Observed trajectories divided into 3 classes for the drug group (class 1 is top left, class 2 is top right, class 3 is at the bottom)
Figure: Estimated mean curves for 3-class model for drug group (top) and placebo group (bottom)
Judged by BIC, the placebo group analyses suggest that a conventional, single-class growth model is sufficient. Again, the small sample size may cause BIC to underestimate the number of classes. The mean curves for the 3-class solution for the placebo group are shown in the bottom part of the figure of estimated mean curves. For the placebo group only 1/3 are in the responder class, in contrast with the 2/3 in the responder class for the drug group. What is not clear from these analyses, however, is what portion of the drug responders and drug non-responders would have been responders and non-responders under placebo. For this, the joint analysis of both groups is needed.
4.1.2 Joint GMM analysis of drug and placebo groups
Four models are fitted, as summarized in the bottom part of the table of antidepressant analysis results, under the label Growth mixture analysis. Judging by BIC, the parsimonious model 7 with 3 classes and 2 sets of means is better than model 6, the ITT 1-class random effect repeated measures model. For model 7, the Drug Only Responder prevalence is estimated as 35%, with a week 10 treatment effect of 14 units on the Hamilton D28 scale. The entropy is 0.67. The results for model 7 are not far from those of models 2, 3, and 5.
The figure for model 7 shows the estimated mean trajectories for the 3-class model with 2 sets of means. The modeling uses the approach of letting the treatment dummy covariate be represented by a latent class variable with known classes, as mentioned in Section 4. This results in a total of 6 latent classes: classes 1–3 are for the placebo group and classes 4–6 are the corresponding classes for the drug group. The top three curves are identical and the bottom three curves are identical, but the curves have been jittered here to show the class membership. It is seen that classes 1 and 4 represent Never Responders (non-response in both groups), classes 2 and 5 represent Drug Only Responders (non-response in placebo group and response in drug group), and classes 3 and 6 represent Always Responders (response in both groups).
Figure: Estimated mean curves for model 7: 3-class model, 2 sets of means
The figure for model 8 shows the estimated mean trajectories for the 3-class model with 4 sets of means. The entropy is 0.91. As for the previous figure, classes 1–3 are for the placebo group and classes 4–6 are the corresponding classes for the drug group. Classes 1 and 4 represent Never Responders and classes 3 and 6 represent Always Responders. For classes 2 and 5, however, the outcome is unclear. Although the class 5 trajectory for the drug group ends at a lower week 10 value than the corresponding class 2 trajectory for the placebo group, the class 5 mean trajectory still ends with a high value at week 10. It is therefore unclear if this can be characterized as a Drug Only Responder class. The class percentages for this solution are also quite different from those of the other models, with the Never Responder class prevalence estimated as only 4%, which does not seem plausible. As for the week 10 analysis using model 4, the 3-class, 4-mean model is therefore a questionable representation of the data.
Figure: Estimated mean curves for model 8: 3-class model, 4 sets of means (AIR model)
Model 9, the 4-class model with 8 sets of means, does not have a better BIC than the other models, but as noted in Section 4, BIC tends to underestimate the number of latent classes in small samples. The entropy is 0.84. The prevalence table gives the estimated class prevalences, and the figure for model 9 shows the estimated mean curves for the four types of subjects divided into the placebo and drug groups, resulting in eight classes of curves.
Table: Antidepressant trial prevalence of four types of subjects under 4-class model with 8 sets of means
Figure: Estimated mean curves for model 9: 4-class model, 8 sets of means
The figure for model 9 shows that the Never Responder subjects are found in Class 2 for the placebo group and in Class 6 for the drug group. Their week 10 means are around 15. As seen in the prevalence table, the prevalence of this type of subject is estimated as 28%.
The Drug Only Responder subjects are found in Class 3 for the placebo group and Class 7 for the drug group. The prevalence of this type of subject is estimated as 26%. The estimated week 10 treatment effect is 18 (corresponding to an estimated mean of 5 for the drug group and an estimated mean of 23 for the placebo group), which corresponds to a little over two standard deviations. The 95% confidence interval for the treatment effect is 15.2 – 21.5.
The Placebo Only Responder subjects are found in Class 1 for the placebo group and Class 5 for the drug group. It is seen that the placebo response is temporary, limited to weeks 4 – 7, with a later upswing in depression. This type of subject has the highest baseline score of about 28 and a more volatile development with a sharp increase in depression around week 1. The prevalence of this type of subject is estimated as only 4%.
The Always Responder subjects are found in Class 4 for the placebo group and Class 8 for the drug group. Their estimated mean at week 10 is only around 4. The prevalence of this type of subject is estimated as 42%.
The prevalence table also gives the estimated marginal response rates for placebo and drug. The placebo response rate is 46%, whereas the drug response rate is 68%.
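The marginal response rates follow directly from the four estimated prevalences, as the short Python check below illustrates. The dictionary keys are hypothetical labels. The log odds ratio at the end is computed from the rounded prevalences purely as an illustration of how the two latent class variables are associated; it is not a value reported in the paper.

```python
import math

# Estimated prevalences of the four types of subjects under model 9.
prevalence = {
    "never": 0.28,         # non-response under both placebo and drug
    "drug_only": 0.26,     # response under drug only
    "placebo_only": 0.04,  # response under placebo only
    "always": 0.42,        # response under both
}

# A subject responds under placebo if Placebo Only or Always Responder,
# and responds under drug if Drug Only or Always Responder.
placebo_response_rate = prevalence["placebo_only"] + prevalence["always"]
drug_response_rate = prevalence["drug_only"] + prevalence["always"]

# Illustrative log odds ratio between placebo and drug response status.
log_odds_ratio = math.log(
    (prevalence["always"] * prevalence["never"])
    / (prevalence["drug_only"] * prevalence["placebo_only"])
)
```

The sums reproduce the reported 46% and 68%, and the positive log odds ratio reflects that subjects who respond under placebo are estimated to be likely to respond under drug as well.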
The antidepressant analysis results are summarized in the table of antidepressant analysis results, which lists the models in order of appearance. All models are estimated using maximum likelihood except model 3, which uses the moment estimator of Section 3.1. With the exception of the two maximum-likelihood estimated 3-class, 4-mean AIR models, model 4 and model 8, the estimates are on the whole rather close. Only model 9, the 4-class GMM with 8 sets of means, uncovers the 4 hypothesized classes of subjects. For this model the drug effect of 18 at week 10 for the Drug Only Responder class corresponds to a little over two standard deviations in terms of the total variation at week 10.
Previous attempts to isolate placebo response in antidepressant trials by statistical modeling include [28], where five trajectory categories were hypothesized for an individual treated with an active drug and where placebo subjects can fall into only one of the first three categories: A. Nonresponders, B. Nonresponders with initial placebo effect, C. Placebo responders, D. True drug responders, and E. Mixture effects responders. This classification does not have the clarity of the potential outcomes - principal stratification approach used in the current paper. For example, the requirement that placebo subjects cannot occupy category D., corresponding to Drug Only Response, is in contrast with the view of the current paper that the categories exist as principal strata before randomization. Also, it is not clear if, for example, treatment subjects in category D. would fall in category A. if they had been given placebo. Nevertheless, the trajectory types of the first four categories are found in the current paper, including the category B. trajectory type seen for the Placebo Only Responder class. The fifth category E. ("subjects who have an initial improvement due to nonspecific effects and then experience a drug effect") represents a more fine-grained distinction than used here. More recently, [29] used infinite mixtures in an attempt at isolating drug effects in the presence of placebo effects.
4.2 GMM applied to the schizophrenia trial
The schizophrenia trial example offers two new growth mixture features. First, the outcome is binary instead of continuous. Second, the fact that the outcome is categorical makes it possible to test the model against data using conventional likelihood-ratio χ2 testing against the unrestricted model represented by a multinomial distribution for the corresponding frequency table. Also, in this trial the sample size is larger (n = 437) so that the use of BIC to help decide on the number of classes is more reliable.
For the schizophrenia trial data the ITT model, the 3-class, 2-mean model, and the 4-class, 8-mean model will be discussed. Here, a linear logistic growth model with random intercept and random slope is applied (time was specified in weeks, not taking the square root as in previous growth analyses of these data). The conventional, ITT single-class random effects model gives LL = −858 with 8 parameters, and BIC = 1765. The week 10 estimated probabilities are 0.60 and 0.27 for the placebo and drug groups, respectively. The 3-class, 2-mean model gives LL = −840 with 11 parameters, and a better BIC of 1748. The entropy is 0.75. The 4-class, 8-mean model gives LL = −836 with 19 parameters, and a worse BIC of 1788. The entropy is 0.72. Given the binary outcomes, a likelihood-ratio χ2 test for the frequency table of all response patterns is also available for evaluating the fit of the three models. The 1-class model gives χ2 = 77 with 23 degrees of freedom, whereas the 3-class model improves the model fit to χ2 = 42 with 20 degrees of freedom. The 4-class model obtains χ2 = 33 with 12 degrees of freedom.
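The BIC comparison for the three schizophrenia trial models can be retraced from the reported loglikelihoods, parameter counts, and sample size. The sketch below assumes the usual form BIC = −2 log L + r log n; because it uses the rounded loglikelihood values given in the text, the recomputed BICs can differ from the reported ones by about one unit. The model labels are hypothetical dictionary keys.

```python
import math

N = 437  # sample size of the schizophrenia trial

# (loglikelihood, number of free parameters) as reported in the text.
models = {
    "1-class ITT": (-858.0, 8),
    "3-class, 2-mean": (-840.0, 11),
    "4-class, 8-mean": (-836.0, 19),
}


def bic(loglik, n_params, n):
    """Bayesian information criterion: -2 log L + r log n (lower is better)."""
    return -2.0 * loglik + n_params * math.log(n)


bics = {name: bic(ll, r, N) for name, (ll, r) in models.items()}
best_model = min(bics, key=bics.get)
```

The 4-class model has the highest loglikelihood, but its 19 parameters are penalized heavily enough that the 3-class, 2-mean model is preferred by BIC.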
The prevalences for the 3-class model are estimated as: Never Responder class 45%, Drug Only Responder class 27%, and Always Responder class 28%. The 4-class model does not show a Placebo Only Responder class, but instead two non-responder classes. The prevalences are Never Responder class 60%, Drug Only Responder class 27%, and Always Responder class 13%.
The estimated mean probability curves for the 3-class model are shown in the figure for the 3-class, 2-mean model. Curves are shown for 6 classes, where the first 3 are for the placebo group and the next 3 are the corresponding classes for the drug group. Classes 1 and 4 have the same curves, as do classes 3 and 6, but these curves are jittered slightly so that all of them are visible. The Drug Only Responders appear as class 2, showing non-response in the placebo group, and as class 5, showing response in the drug group. The figure shows that Drug Only Responders have a quicker improvement than Always Responders (class 3 and 6 for placebo and drug groups, respectively). Although the model has only responder and non-responder mean parameters, the linear logistic growth model produces this differential improvement due to different starting points at week 0.
Figure: Schizophrenia trial: 3-class, 2-mean model
Growth mixture modeling of these data was also carried out in [30] (the outcome was kept in ordinal form and the square root of week was used). The model is similar in that it allows for responders and non-responders in both the treatment and placebo groups. The model is different in that only two classes are used and that it does not allow for different drug effects in the classes. Investigating their model, it was found that the drug effect was significantly different across the classes. This model resulted in LL = −842 with 11 parameters, and BIC = 1751, which is a slightly worse BIC value than for the proposed 3-class, 2-mean model. More importantly, the modeling in [30] makes no attempt at causal inference regarding potential outcomes and therefore does not make a distinction between the classes of Never Responders, Drug Only Responders, Placebo Only Responders, and Always Responders. Because of this, it cannot make inference about the Drug Only Responder rate.