In this section, we carried out simulation studies under different levels of the diagnostic performance of doctors as well as different prevalence rates in the different classes of a symptom to assess the performance of the nonparametric method and the overall diagnostic accuracy described in the previous section. For simplicity, we chose *K* = 5 and *D* = *J* = 4 as in our real data set. Since in practice, it is difficult to collect the data in which one patient is diagnosed by many doctors, the total number of patients *N* could not be very large. In our simulation, we chose the number of patients as 20 and 40. We considered 3 different prevalence rates for each sample size. The first scenario assumed equal portions for patients with different symptom severity. The second one assumed that more patients were of heavier symptom status. And the last assumption considered *U*-shaped prevalence rates, where most patients are in the asymptomatic and severe groups, while less patients are in the 2 groups with mild or moderate symptom status.

We simulated 5000 data sets under each setting. We then implemented the proposed method to obtain the parameter estimates. As for the choice of initial values, we avoided equal

_{0kj} =

_{1kj} =

=

_{Dkj} for all

*k* and

*j* as mentioned previously. To avoid reaching a local maximum, we used a number of initial values, then compared the local log-likelihood maxima obtained for each of the chosen initial values before deriving the global ML estimates. In addition, we also used the initial values given by experts of TCM and the results were the same. We focus on assessing the bias and mean squared error (MSE) of the estimates of the diagnostic accuracy indicators under different settings. The results were shown in . To help better evaluate the performance of all estimates, the biases and MSEs of all

_{dkj} with

*d* = 0,1,…,3,

*k* = 1,2,…,5, and

*j* = 0,1,…,3 were also summarized in the last 3 columns of Table 1.

| **Table 1.**Results from 5000 simulations with *D* = 3, *K* = 5, and *J* = 3 under various prevalence rates and parameter settings |

From the results, we can see that the proposed method yields ML estimates for the overall diagnostic accuracy with small biases and MSEs regardless of the true diagnostic performance of doctors and true prevalence rates of the statuses of a symptom. The summary result of all

_{djk} also support our method. We can see that even the maximum bias and MSE are not very large. In general, the results were slightly better when the prevalence rates of different symptom statuses were similar. And the larger the number of patients was, the smaller were the estimated bias and MSEs of the estimates.

The conditional independence assumption is crucial in the proposed method. Although it is reasonable in our real data problem, it is helpful to evaluate the robustness of the method when this assumption is violated. Suppose except for the true symptom status, there were some doctor-level covariates *R* that would influent the doctors' decision. As a result, instead of conditional independence giving the true symptom status, the diagnosis results were correlated unless providing both true symptom status *S* and covariates *R*. In these simulations, we suppose doctors' decision was influenced by the covariate *R* via coefficient *b* through a probit model, where *R* follows a normal distribution with mean 0 and standard deviation 0.5. Specifically, *P*(*T*_{ik} = *j*|*S*_{i} = *d*,*R*_{k} = *r*) = Φ(*a*_{dkj} + *b**r*), for *i* = 1,…,*N*, *d* = 0,1,…,*D*, and *k* = 1,2,…,*K*. We chose sample size *N* = 34 as in our real data set and simulated 5000 data sets under the 3 prevalence scenarios as before, i.e. equal portions with *p*_{0} = 0.25,*p*_{1} = 0.25,*p*_{3} = 0.25,*p*_{4} = 0.25; increasing prevalence *p*_{0} = 0.1,*p*_{1} = 0.2,*p*_{3} = 0.3,*p*_{4} = 0.4; and *U*-shaped prevalence *p*_{0} = 0.25,*p*_{1} = 0.1,*p*_{3} = 0.15,*p*_{4} = 0.5. In each setting, we considered 3 different correlation level by setting *b* equal to 0.2, 0.5, and 1. The results on bias of the estimates were shown in , with MSEs in the parenthesis.

| **Table 2.**Robustness performance from 5000 simulations when *D* = 3, *K* = 5, and *J* = 3 and patient number *N* = 34. Covariate *R* follows normal distribution with mean 0 and standard deviation 0.5, and influent diagnosis with coefficient *b* |

From the results, we can see that with correlations induced by the covariates, both the biases and MSEs are larger than those when conditional independence assumption holds. However, when the influence of the covariates to the diagnosis results were not very large, i.e. when the dependence was mild, the results were still acceptable. But as we expected, when the covariates have bigger influences, the biases and MSEs were increased. This suggests that the conditional assumption is necessary for the proposed method. And it is not suitable in the situation where the diagnostic results are highly correlated even conditional on the true symptom status.

In addition, we also performed additional simulation to assess the estimators' performance on the boundary of the parameter space, i.e. some of the diagnosis probabilities

_{dkj} were zero. This question was first raised by the results of the real data analysis, where some of these quantities were estimated as zero. The parameter setup in the simulation used the estimated values of these parameters from the real data study. Results were shown in Section B of the

supplementary material available at

*Biostatistics* online. The results do not suggest that the proposed method have any different performance in these settings from the settings before.