First, consider the continuous results presented in Additional File 1. Across all six models, the software programs converged on nearly identical results: with few exceptions, the unweighted parameter estimates and their standard errors were nearly identical across programs. The estimates may not have agreed perfectly because of the relatively large cluster sizes in these data. Large cluster sizes may limit the performance of quadrature estimation, and MQL methods may work better. [23] As Rabe-Hesketh et al. suggest, [23] analysts should check the adequacy of the quadrature points in any given situation by estimating models with increasing numbers of quadrature points. In these analyses, two models required increasing the quadrature points from 8 to 16 to achieve estimates in line with the other programs. Overall, however, the results were markedly similar across programs.
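The quadrature check described above can be illustrated with a minimal sketch. It is not the procedure used by any of the programs compared here. It assumes a random-intercept logistic model with fixed, hypothetical parameter values (`beta0 = 0`, `sigma = 1`), uses ordinary (non-adaptive) Gauss-Hermite quadrature, and evaluates one cluster's marginal log-likelihood at those fixed values instead of re-estimating the model. The function names and the two example clusters are illustrative assumptions.

```python
import math


def gauss_hermite(n, tol=1e-13, max_iter=50):
    """Nodes and weights for the weight function exp(-x^2), found by Newton's method."""
    pim4 = math.pi ** -0.25
    x = [0.0] * n
    w = [0.0] * n
    z = 0.0
    for i in range((n + 1) // 2):
        # Heuristic starting guesses for the roots, from largest to smallest.
        if i == 0:
            z = math.sqrt(2 * n + 1) - 1.85575 * (2 * n + 1) ** (-1 / 6)
        elif i == 1:
            z -= 1.14 * n ** 0.426 / z
        elif i == 2:
            z = 1.86 * z - 0.86 * x[0]
        elif i == 3:
            z = 1.91 * z - 0.91 * x[1]
        else:
            z = 2.0 * z - x[i - 2]
        for _ in range(max_iter):
            # Recurrence for orthonormal Hermite polynomials evaluated at z.
            p1, p2 = pim4, 0.0
            for j in range(n):
                p1, p2 = (z * math.sqrt(2.0 / (j + 1)) * p1
                          - math.sqrt(j / (j + 1)) * p2), p1
            pp = math.sqrt(2.0 * n) * p2  # derivative of the degree-n polynomial
            dz = p1 / pp
            z -= dz
            if abs(dz) <= tol:
                break
        x[i], x[n - 1 - i] = z, -z
        w[i] = w[n - 1 - i] = 2.0 / (pp * pp)
    return x, w


def log_sigmoid(eta):
    """Numerically stable log of the logistic function."""
    if eta >= 0:
        return -math.log1p(math.exp(-eta))
    return eta - math.log1p(math.exp(eta))


def cluster_loglik(successes, size, beta0, sigma, n_points):
    """Marginal log-likelihood of one cluster (binomial constant omitted)."""
    nodes, weights = gauss_hermite(n_points)
    terms = []
    for x, w in zip(nodes, weights):
        eta = beta0 + math.sqrt(2.0) * sigma * x  # random intercept at this node
        ll = successes * log_sigmoid(eta) + (size - successes) * log_sigmoid(-eta)
        terms.append(math.log(w) + ll)
    top = max(terms)  # log-sum-exp to avoid underflow in large clusters
    return top + math.log(sum(math.exp(t - top) for t in terms)) - 0.5 * math.log(math.pi)


def quadrature_check(successes, size, points=(4, 8, 16, 32)):
    """Approximate the same log-likelihood with increasing numbers of points."""
    return {q: cluster_loglik(successes, size, 0.0, 1.0, q) for q in points}


small = quadrature_check(successes=1, size=5)      # hypothetical small cluster
large = quadrature_check(successes=150, size=750)  # hypothetical large cluster
for q in small:
    print(q, round(small[q], 4), round(large[q], 4))
```

In this sketch, values that stop changing as the number of points grows suggest the quadrature is adequate, while values that keep shifting (as one would expect for the large cluster, whose integrand is sharply peaked between nodes) signal that more points or a different estimation method may be needed.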

With regard to the weighted analyses, the programs achieved nearly identical weighted results across the fixed and random effects, with two exceptions. MLwiN estimated a smaller residual variance and residual variance standard error using weight method B than either Mplus or GLLAMM. Likewise, MLwiN's estimate of the slope for state poverty and its standard error diverged slightly (but consistently) from Mplus and GLLAMM at the second decimal place under all scaled weighting analyses. To investigate the source of these differences, I reran these analyses with increasingly stringent convergence criteria. In all cases, MLwiN arrived at the same estimate of the residual variance. This suggests that the discrepancy results from estimation differences and not from convergence issues. In this case, the small difference led to *no* inferential differences across the software packages or weighting methods. For example, consider the final model. Across all weighting methods and software programs, one would conclude that, while variance does exist across states in the relationship between family income and months uninsured, the proportion of families in poverty in a state does not appear to affect this relationship.
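As a reminder of what the scaled weights do, the sketch below rescales level-one weights within a single cluster. The definitions are an assumption based on how scaling methods A and B are commonly described in the multilevel weighting literature, since this section does not restate them: under method A the scaled weights sum to the cluster sample size, and under method B they sum to the effective cluster sample size. The function name and example weights are hypothetical.

```python
def scale_weights(weights, method):
    """Rescale the level-one sampling weights of one cluster.

    Assumed definitions (not restated in this section):
      method "A": scaled weights sum to the cluster sample size n_j
      method "B": scaled weights sum to the effective cluster size,
                  (sum of w)^2 / (sum of w^2)
    """
    total = sum(weights)
    if method == "A":
        factor = len(weights) / total
    elif method == "B":
        factor = total / sum(w * w for w in weights)
    else:
        raise ValueError("method must be 'A' or 'B'")
    return [w * factor for w in weights]


raw = [1.0, 2.0, 3.0, 4.0]  # hypothetical design weights for one cluster
scaled_a = scale_weights(raw, "A")
scaled_b = scale_weights(raw, "B")
print(sum(scaled_a), sum(scaled_b))
```

Both methods preserve the relative sizes of the weights within a cluster and change only their total, which is one reason the two can yield similar fixed-effect estimates while differing in variance components.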

For the categorical outcome presented in Additional File 2, a similar pattern resulted. Across all six models, the software programs converged on similar results. Without exception, the unweighted parameters and their standard errors were similar across programs. The weighted analyses followed the same pattern: across the fixed and random effects, the programs achieved nearly identical weighted results, though MLwiN consistently estimated a marginally larger variance in the intercepts across states. Again, the observed differences led to no differences in the inferential conclusions. For example, consider the final model. Regardless of weighting method or program, one would conclude that, while variance does exist across states in the relationship between family income and the likelihood that a child will go uninsured, the proportion of families in poverty in a state does not appear to affect this relationship.

Somewhat surprisingly, though the standard errors for the scaled-weighted analyses were somewhat larger than those for the unweighted analyses, the standard errors for the unweighted and scaled-weighted methods were remarkably consistent. This may have occurred because of the large cluster sizes in the NS-CSHCN (approximately 750 individuals in each cluster). It may also have occurred because of a relatively small intraclass correlation coefficient (a measure of the proportion of variance in the outcome attributable to clustering alone) for this outcome (e.g., 0.01 for months uninsured). It also suggests that, in these data, for these outcomes and these predictors, the sampling weights are not particularly informative (Table 1 presents the results of single-level analyses ignoring the sampling design for comparison). However, this need not be the case. For situations with informative sampling weights (i.e., where the design weights correlate with the outcome), the findings could diverge greatly. [1] The weights lead to more representative population estimates, but failure to include them did not bias inferential decisions. This set of findings highlights the importance of conducting both weighted and unweighted analyses. With both sets of results, an analyst can compare differences across the approaches and evaluate the impact of different approaches on estimates and inferences. Without conducting analyses across scaling methods, it would be unclear whether the estimation process, type of outcome, or other factors biased the results. One should not simply choose a single method without exploring similarities and differences across methods.
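The intraclass correlation coefficient mentioned above can be computed directly from a model's variance components. The sketch below is illustrative only: the variance values are hypothetical and chosen to reproduce an ICC of 0.01, and the logistic version assumes the common latent-variable formulation in which the level-one residual variance is fixed at π²/3.

```python
import math


def icc_continuous(var_between, var_within):
    """Share of total variance attributable to clustering (continuous outcome)."""
    return var_between / (var_between + var_within)


def icc_logistic(var_between):
    """Latent-variable ICC for a random-intercept logistic model.

    Assumes the level-one residual variance equals pi^2 / 3.
    """
    return var_between / (var_between + math.pi ** 2 / 3)


# Hypothetical variance components, not estimates from this study.
print(icc_continuous(var_between=1.0, var_within=99.0))
print(icc_logistic(var_between=0.1))
```

An ICC near zero means that observations within a cluster are only weakly more alike than observations from different clusters.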

| **Table 1** Single-level continuous and categorical outcome parameter and standard error estimates. |