A number of simulation studies were conducted. In each case, 100 data sets with three ‘latent’ classes (η={0.05,0.15,0.80}) were generated with sample sizes of 500 and 5000. Here, the initial ηs refer to means over the population of the covariates. Both the latent class prevalences and conditional probabilities (π1={0.75,0.75,0.75,0.25,0.25,0.25}, π2={0.25,0.25,0.25,0.75,0.75,0.75}; π3={0.05,0.05,0.05,0.05,0.05,0.05}) were chosen to reflect what might be expected in a scenario where a disorder has two distinctively different potential profiles, with the majority of the sample having no disease. Runs were first conducted with a range of λ values from 0 (no penalty) to 10 with an increment of 1 (‘gross mapping’). Subsequent ‘further mapping’, with an increment of 0.1, was then done in the most promising region (based on cross-validation of log-likelihood loss, R) identified by gross mapping, or in increments of 1, 10, or 100 beyond the initial region.
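The two-stage search over λ described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the width of the fine-mapping window (`half_width`), and the toy loss standing in for the cross-validated log-likelihood loss R are all assumptions.

```python
import numpy as np

def gross_grid(start=0.0, stop=10.0, step=1.0):
    """Coarse grid of penalty values: 0 (no penalty) to 10 in steps of 1."""
    n = int(round((stop - start) / step)) + 1
    return np.round(start + step * np.arange(n), 10)

def fine_grid(best, half_width=1.0, step=0.1):
    """Finer grid (step 0.1) around the best coarse value, truncated at 0.

    half_width is an assumed window size; the text does not specify one.
    """
    lo = max(0.0, best - half_width)
    hi = best + half_width
    n = int(round((hi - lo) / step)) + 1
    return np.round(lo + step * np.arange(n), 10)

def pick_lambda(grid, cv_loss):
    """Return the grid value with the smallest cross-validated loss."""
    losses = np.array([cv_loss(lam) for lam in grid])
    return float(grid[int(np.argmin(losses))])

def toy_cv_loss(lam):
    """Hypothetical stand-in for the cross-validated loss R(lambda)."""
    return (lam - 3.4) ** 2

coarse_best = pick_lambda(gross_grid(), toy_cv_loss)            # gross mapping
fine_best = pick_lambda(fine_grid(coarse_best), toy_cv_loss)    # further mapping
```

In a real analysis `toy_cv_loss` would be replaced by a routine that refits the penalized model on training folds and evaluates the log-likelihood loss on held-out folds.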
In order to assess the effects of penalization on external validity, external distal outcome variables were simulated from a Uniform (0,1) distribution and then dichotomized using latent class-specific outcome prevalences (0.70, 0.05, 0.05), such that individuals in class 1 would be expected to have the distal outcome 70 per cent of the time, and individuals in class 2 or 3 would be expected to have the distal outcome 5 per cent of the time. Following each simulation, parameter estimates corresponding to λ=0 and each λ value evaluated in the ‘further mapping’ phase were used to calculate posterior probabilities of class membership for each simulated individual. These posterior probabilities, along with the simulated outcome statuses, were then used to calculate the AUC for each λ value.
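The distal-outcome step can be illustrated with a short sketch. It assumes that the AUC is computed from a single score per individual (for example, the posterior probability of class 1 membership) and uses the true class indicator as a hypothetical stand-in for that score, because no fitted model is available here. The rank-based AUC formula and all variable names are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5000
class_prev = np.array([0.05, 0.15, 0.80])     # eta, from the text
outcome_prev = np.array([0.70, 0.05, 0.05])   # class-specific outcome prevalences

# True class membership, coded 0, 1, 2 for classes 1, 2, 3.
true_class = rng.choice(3, size=n, p=class_prev)

# Uniform(0,1) draw, dichotomized at the class-specific prevalence.
u = rng.uniform(0.0, 1.0, size=n)
outcome = (u < outcome_prev[true_class]).astype(int)

def auc(score, label):
    """Rank-based (Mann-Whitney) AUC, using average ranks for tied scores."""
    score = np.asarray(score, dtype=float)
    label = np.asarray(label, dtype=int)
    _, inverse, counts = np.unique(score, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)                  # highest rank in each tie group
    avg_rank = last_rank - (counts - 1) / 2.0      # average rank of the tie group
    ranks = avg_rank[inverse]
    n_pos = int(label.sum())
    n_neg = len(label) - n_pos
    u_stat = ranks[label == 1].sum() - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)

# Hypothetical score: an "oracle" indicator of class 1 membership.
oracle_score = (true_class == 0).astype(float)
oracle_auc = auc(oracle_score, outcome)
```

In the simulations described in the text, the score would instead come from the posterior probabilities computed under each λ, giving one AUC per λ value.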
The first set of simulations was intended to examine the effects of penalization in the context of varying strengths of association between the class of interest (class 1) and a single covariate (β11=0.69 or 1.79) in the case of a correctly specified latent class regression model, i.e. where all model assumptions are met. In this case, penalization of non-intercept βs (βpj, p≠0) to minimize prediction error may add precision in identifying classes but is not expected to meaningfully improve over analyses without the penalty. The log odds ratio (β12) for association between the single covariate and class 2 was always 0. The covariate itself was generated from a Uniform (−0.5,0.5) distribution. A number of summary measures were calculated for the coefficient estimates: bias, empirical and estimated precision, accuracy of the precision, and mean-squared error (MSE). Table I shows fine-mapping results with sample size equal to 5000 and a ridge penalty. As expected, non-zero values of λ resulted in small increases in precision, at the expense of small increases in bias. These effects were larger in identical simulations (not shown) with the smaller sample size of 500. Standard errors accurately estimated the coefficient sampling distributions in both cases. In no case did AUC (not shown) vary with λ.
| Table I. Ridge penalty: variations of β11 and δ (N = 5000). |
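The summary measures named above can be computed across simulation replicates along the following lines. This is a sketch under stated assumptions: "accuracy of the precision" is taken here to mean the ratio of the mean estimated standard error to the empirical standard deviation of the estimates, which the text does not define explicitly, and the function and key names are illustrative.

```python
import numpy as np

def summarize(estimates, std_errors, truth):
    """Summarize one coefficient across simulation replicates.

    estimates  : point estimates, one per simulated data set
    std_errors : model-based standard errors, one per data set
    truth      : the simulated (true) coefficient value
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    empirical_se = est.std(ddof=1)          # empirical precision
    mean_est_se = se.mean()                 # estimated precision
    return {
        "bias": est.mean() - truth,
        "empirical_se": empirical_se,
        "mean_estimated_se": mean_est_se,
        # Assumed definition: values near 1 mean the standard errors are accurate.
        "se_ratio": mean_est_se / empirical_se,
        "mse": np.mean((est - truth) ** 2),
    }

summary = summarize([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], truth=1.5)
```

Under this definition, a ratio above 1 corresponds to standard errors that overestimate the sampling variability, and a ratio below 1 to standard errors that underestimate it.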
The next set of simulations was conducted to determine the effects of penalization in the context of differential measurement that was not accounted for in the model. Differential measurement is present when there is dependence between one or more covariates and one or more latent class indicators after conditioning on latent class membership. In the presence of such differential measurement, the standard latent class regression model (1) would be misspecified, as its form implies independence of covariates and latent class indicators after conditioning on class membership. Typically, differential measurement is ascertained by creating a set of ‘pseudoclass assignments’ based on individuals’ vectors of posterior probabilities of class membership, and then testing for dependence within each pseudoclass [5]. This set of simulations regarding differential measurement is of primary concern for our methodological development, which aims to trade off between construct validating assumptions that may not all be correct. Data sets were simulated with a range of log odds ratios (δ={−1.79,0.0,0.69,1.79}) for the conditional dependence between the covariate and the first of the six latent class indicators; this δ was constant across classes. Conditional probabilities and latent class prevalences were as in the previous set of simulations, and β11=1.79. We will refer to instances where the conditional dependence between the covariate and the latent class indicator and the relationship between the covariate and class membership are in the same direction as ‘positive differential measurement’ (δ={0.69,1.79}), and to instances where they are in opposite directions (δ={−1.79}) as ‘negative differential measurement’. The second half of Table I shows results from these simulations using the ridge penalty and a sample size of 5000. Comparing bias across values of δ where λ=0 (no penalty), it is clear that the presence of positive differential measurement leads to overestimation of β11. This is because the relationship between the latent class of interest and the indicator (here, π11=0.75) and between the covariate and the indicator (δ) are both in the same direction. As in the previous set of simulations, imposition of the penalty decreased empirical variances, but here bias also decreased. In this case, the typical downward biasing effect of the penalty (partially) neutralized the upward biasing effect of the differential measurement. In the case of negative differential measurement, unpenalized estimation was biased downward, and imposition of the ridge penalty exacerbated that bias. It is therefore important that users considering use of a penalty first conduct diagnostics on the unpenalized model to determine the degree and direction of differential measurement. Here, the standard errors overestimated the coefficient sampling distributions, both with and without penalization. In cases where differential measurement is suspected, users may want to employ an alternate method of standard error estimation, such as bootstrap or Huber–White.
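A data-generating sketch for the differential measurement scenario is given below. Several details are assumptions made for illustration and are not stated in the text: class 3 is used as the reference class, the intercepts are set so that the prevalences equal η at a covariate value of 0 (only approximately the population means), and the conditional dependence is introduced by adding δ times the covariate to the logit of the first indicator's conditional probability in every class.

```python
import numpy as np

def simulate_lcr(n, beta11=1.79, beta12=0.0, delta=1.79, seed=0):
    """Simulate a covariate, latent classes, and six binary indicators."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-0.5, 0.5, size=n)

    # Multinomial logit for class membership (class 3 assumed as reference).
    alpha = np.log(np.array([0.05, 0.15]) / 0.80)
    lin_class = np.column_stack(
        [alpha[0] + beta11 * x, alpha[1] + beta12 * x, np.zeros(n)]
    )
    p_class = np.exp(lin_class)
    p_class /= p_class.sum(axis=1, keepdims=True)

    # Draw class labels (0, 1, 2) by inverting the cumulative probabilities.
    cum = p_class.cumsum(axis=1)
    u = rng.uniform(size=n)
    cls = np.minimum((u[:, None] > cum).sum(axis=1), 2)  # guard against rounding

    # Conditional probabilities pi_1, pi_2, pi_3 from the text.
    pi = np.array([
        [0.75, 0.75, 0.75, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.75, 0.75, 0.75],
        [0.05, 0.05, 0.05, 0.05, 0.05, 0.05],
    ])
    lin_ind = np.log(pi / (1.0 - pi))[cls]   # shape (n, 6), logit scale
    lin_ind[:, 0] += delta * x               # assumed form of differential measurement
    prob = 1.0 / (1.0 + np.exp(-lin_ind))
    y = (rng.uniform(size=(n, 6)) < prob).astype(int)
    return x, cls, y

x, cls, y = simulate_lcr(5000)
```

Setting `delta=0.0` recovers a correctly specified model, while positive or negative values of `delta` with positive `beta11` give the positive and negative differential measurement cases, respectively.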
In none of the simulations we performed did the conditional probability estimates change dramatically as a result of penalization, though there were subtle alterations. The corresponding figure shows scatter plots of the posterior probabilities of class membership based on parameter estimates for λ=0 and λ=0.1 from a simulation analogous to that in Table I where β11=1.79 and δ=1.79, but with a sample size of 500. Black points represent simulated cases where the first latent class indicator (the indicator with the independent relationship with the covariate) is 0, while gray points represent cases where it is 1. Points above the diagonal line represent cases where the posterior probability of membership in that class increased as a result of the penalty, and those below represent cases where that probability decreased. The pattern of posterior probabilities suggests that use of the ridge penalty ‘moved’ some individuals with a positive value for the first indicator out of the first class and into the second, and likewise some individuals with a negative value for the first indicator out of the second class and into the first.
Given that traditionally both the ridge and LASSO penalty are useful in situations with large numbers of covariates, we next explored scenarios with nine dichotomous covariates. These nine covariates were generated from a multivariate normal distribution with means of 0, variances of 1, and covariances of either 0 (first set) or 0.9 (second set). These were then dichotomized to produce covariates with specified frequencies, φ=(0.5,0.2,0.3,0.1,0.2,0.3,0.4,0.5,0.6). Simulated β values for the nine covariates were β•1=(1.79,0,0,–1.39,1.39,–0.40,0.40,–0.69,0.69), and β•2=(0,0.69,–0.69,0.40,–0.40,1.39,–1.39,0.40,–0.40). Table II shows results using a ridge penalty and sample size of 5000. As in the previous simulations, imposition of the penalty increased bias but improved precision, resulting in a modest improvement in MSE.
| Table II. Ridge penalty: nine covariates, variations of ρ (N = 5000). |
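The covariate-generation step for the nine-covariate scenarios can be sketched as below. The text states only that the normal variates were dichotomized to give the frequencies φ; cutting each variate at its upper-tail quantile is an assumption of this sketch, as are the function and argument names.

```python
import numpy as np
from statistics import NormalDist

def simulate_binary_covariates(n, rho, freqs, seed=0):
    """Draw correlated standard normal variates and dichotomize them.

    rho   : common covariance between every pair of variates (0 or 0.9 in the text)
    freqs : target frequency of a value of 1 for each covariate
    """
    rng = np.random.default_rng(seed)
    p = len(freqs)
    cov = np.full((p, p), float(rho))
    np.fill_diagonal(cov, 1.0)
    z = rng.multivariate_normal(np.zeros(p), cov, size=n)
    # Assumed rule: covariate = 1 when the variate exceeds the (1 - freq) quantile.
    cuts = np.array([NormalDist().inv_cdf(1.0 - f) for f in freqs])
    return (z > cuts).astype(int)

phi = [0.5, 0.2, 0.3, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
covariates = simulate_binary_covariates(5000, rho=0.9, freqs=phi)
```

Note that a covariance of 0.9 between the underlying normal variates does not translate into a correlation of 0.9 between the dichotomized covariates; the induced correlation depends on the cut points.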
We have previously described penalization from a Bayesian perspective, and next wished to compare our penalization approach to a Bayesian analysis, both in terms of parameter estimation and computational burden. To achieve this, Bayesian estimations of the latent class regression models were performed using WinBUGS. Priors for the conditional probabilities within classes were defined on the logit scale as logit(πmj)~N(0,5). On the probability scale, this provides a rather flat prior over the range of 0.05–0.95, suggesting a relatively uninformative prior, but it bounds estimates away from the boundaries of the parameter space. The same prior distribution was used for the intercept in the regression portion of the model, and each of the remaining regression coefficients was specified with a prior that was ~N(0,1/λ), to be consistent with the ridge penalty. A fully Bayesian analysis might also have included a hyperprior on λ itself, but this was not undertaken here, as such a hyperprior would have altered the interpretation of the penalty itself [17], and would have produced parameter estimates which were not directly comparable to those obtained via penalization. For each simulated data set, a burn-in of 10 000 iterations was performed. Based on exploration of chains, we found that convergence occurred within 5000 iterations so that doubling the burn-in should have ensured convergence for all simulated data sets. The chain was then run for an additional 50 000 iterations with every 10th iteration saved for inferences. It is possible for ‘label switching’ to occur among the classes during each Bayesian estimation. To correct this, chains were post-processed so that class definitions were consistent throughout the chains using an approach similar to that described by Stephens [18]. Additionally, we inspected histograms of final estimates and scatterplots of standard errors versus parameter estimates across all simulated data sets and found no evidence of lack of convergence. The same simulated data files were used for both the Bayesian and penalized estimation to facilitate comparisons between the two methods.
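The link between the N(0,1/λ) prior and the ridge penalty can be checked numerically. The sketch below assumes that 1/λ denotes the prior variance, so that the negative log prior density equals (λ/2)β² plus a constant that does not depend on β; whether the penalty in the penalized likelihood is scaled as λβ² or (λ/2)β² is a convention not settled by this passage. The function names are illustrative.

```python
import math

def neg_log_normal_prior(beta, lam):
    """Negative log density of N(0, variance = 1/lam) evaluated at beta."""
    var = 1.0 / lam
    return 0.5 * math.log(2.0 * math.pi * var) + beta ** 2 / (2.0 * var)

def ridge_term(beta, lam):
    """Ridge penalty on one coefficient, using the (lam / 2) scaling."""
    return 0.5 * lam * beta ** 2

lam = 2.0
# The gap between the two quantities is the same for every beta, so the
# posterior mode under this prior coincides with the ridge-penalized estimate.
gap_small = neg_log_normal_prior(0.3, lam) - ridge_term(0.3, lam)
gap_large = neg_log_normal_prior(1.7, lam) - ridge_term(1.7, lam)
```

The equivalence concerns the posterior mode; the Bayesian posterior means reported in the comparison need not coincide exactly with penalized estimates.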
Simulated β values, Bayesian posterior means, and penalized parameter estimates were compared for the scenario presented in Table II with nine highly correlated covariates, but with a sample size of 500. Comparing the penalized estimates to the Bayesian posterior means, it appears that the penalized estimates were more accurate. Empirical standard errors were similar between the two methods, although those for penalization were larger for class one and smaller for class two. We believe it worth noting that, notwithstanding the different patterns observed, the two methods produced remarkably similar estimates in all scenarios, with absolute differences not exceeding 0.04 and generally considerably lower. In addition to potential gains in accuracy, the penalized estimation was far less time-consuming. Using the same computer (2.13 GHz, 2 M RAM), the Bayesian analyses took an average of 35.55 h for each λ value considered. By contrast, penalized estimation took an average of 0.95 h per λ value.
In the simulations described so far, imposition of either the ridge or LASSO penalty resulted in a downward bias in the regression coefficients. Ideally, though, one would like to maximize separation between the classes, such that only the coefficient for the class that, in truth, was not associated with the covariate was biased downwards. In the context of a strong a priori hypothesis, it might be appropriate to employ class-specific penalization. Table III includes results from simulations with a sample size of 5000, β11 of 0.69 or 1.79, and the log odds ratio between the covariate and the other non-reference class, β12, equal to 0 in both cases. Here, the ridge penalty was imposed only on β12. On comparing these results to those from Table I, where both β11 and β12 were penalized, one sees that much higher values of λ are now optimal, that neither β̂11 nor β̂12 is substantially biased, and that imposition of the penalty still results in modest improvements in MSE. Table IV shows the results from analogous simulations that used the LASSO penalty.
| Table III. Class-specific ridge penalization (only β12 penalized) (N = 5000). |
| Table IV. Class-specific LASSO penalization (only β12 penalized) (N = 5000). |
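Class-specific penalization amounts to applying the penalty to a subset of the non-intercept coefficients. A minimal sketch is given below, assuming the coefficients are arranged as a covariates-by-classes array and the penalty is written as λ times a sum of squares (ridge) or absolute values (LASSO); any constant factor such as 1/2 is a convention not specified in this passage, and the function and argument names are hypothetical.

```python
import numpy as np

def penalty(beta, lam, kind="ridge", penalized=None):
    """Penalty term for non-intercept coefficients.

    beta      : array of shape (covariates, non-reference classes)
    lam       : penalty weight (lambda)
    kind      : "ridge" (squared terms) or "lasso" (absolute values)
    penalized : optional boolean mask, same shape as beta; True marks the
                coefficients that receive the penalty (default: all of them)
    """
    beta = np.asarray(beta, dtype=float)
    if penalized is None:
        mask = np.ones(beta.shape, dtype=bool)
    else:
        mask = np.asarray(penalized, dtype=bool)
    if kind == "ridge":
        terms = beta ** 2
    elif kind == "lasso":
        terms = np.abs(beta)
    else:
        raise ValueError("kind must be 'ridge' or 'lasso'")
    return lam * float(terms[mask].sum())

# One covariate, two non-reference classes: columns hold beta_11 and beta_12.
beta = np.array([[1.79, 0.5]])
only_beta12 = np.array([[False, True]])

ridge_class_specific = penalty(beta, lam=2.0, kind="ridge", penalized=only_beta12)
lasso_class_specific = penalty(beta, lam=2.0, kind="lasso", penalized=only_beta12)
ridge_all = penalty(beta, lam=2.0, kind="ridge")
```

With the mask shown, β11 contributes nothing to the penalty regardless of λ, which is why only the class 2 coefficient is shrunk in this configuration.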
A concern with our approach is that spurious relationships between covariates and class membership may be created, or legitimate ones obscured. With class-specific penalization, this could manifest as an artificial separation between classes. Included in Tables III and IV are additional simulations in which β11 was 1.79, but β12 was now also non-zero. For both the ridge and LASSO penalties, optimal values of λ were much smaller than when β12 was 0, and bias in the coefficient estimates for these optimal λ values is not large. With the ridge penalty, in the case where both β11 and β12 were 1.79, the optimal value of λ was always 0, providing reassurance that class-specific ridge penalization does not create an artificial separation between classes with respect to the regression coefficient estimates.