We did a preliminary analysis by maximum likelihood estimation. First, we estimated the model without covariates (model I; nine parameters) given by equation (6) and used the prevalence estimates as starting values in the maximization for the model with all the covariates (model III; 17 parameters) as given above. Next, we selected those covariates in model III that were significant according to the univariate Wald test at the 10% significance level: Sex and Advantage for sample 1; Advantage for sample 2. This restricted version of model III is called model II and it has 12 parameters. Model I can be fitted to observed frequencies. Owing to the sample-specific SP no parameters τ_{1} and τ_{2}, model I induces a perfect fit for the frequencies for category *K*=8 (the 222-category) in both the RR sample and the DQ sample.

To compare the models, we use the Bayesian information criterion (BIC), which provides a rough approximation to the Bayes factor that is independent of the priors (Carlin and Louis (2009), page 53). Information criteria are given in Table 2. In accordance with the theory, minus twice the log-likelihood decreases when the number of parameters increases. The BIC takes the increase in the number of parameters into account and shows a clear preference for model II. Following the BIC, we shall discuss Bayesian inference for model I and model II.

| **Table 2** Information criteria for the maximum likelihood estimation: minus two times the log-likelihood (−2LL) and the BIC |
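As a rough illustration of how the BIC trades fit against model size, the following sketch applies the standard definition BIC = −2LL + k ln(n). The parameter counts (9, 12 and 17) are those stated in the text; the −2LL values and the sample size `n_obs` are made-up placeholders, not the entries of Table 2.

```python
import math


def bic(minus_two_loglik: float, n_params: int, n_obs: int) -> float:
    """Standard BIC: -2LL plus a penalty of ln(n) per parameter."""
    return minus_two_loglik + n_params * math.log(n_obs)


# Parameter counts as stated in the text.
n_params = {"I": 9, "II": 12, "III": 17}

# HYPOTHETICAL values for illustration only (not taken from the paper).
minus_two_ll = {"I": 2000.0, "II": 1960.0, "III": 1955.0}
n_obs = 1000

bic_values = {m: bic(minus_two_ll[m], n_params[m], n_obs) for m in n_params}
best_model = min(bic_values, key=bic_values.get)  # smallest BIC is preferred

for model, value in bic_values.items():
    print(f"model {model}: BIC = {value:.2f}")
print("preferred:", best_model)
```

With these placeholder numbers the largest model has the smallest −2LL but pays the largest penalty, so the intermediate model attains the smallest BIC.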

For the priors we choose vague uniform priors, i.e. for the individual λs and βs we specify a uniform distribution on the interval (−10,10). For example, for β_{se.1} a value of 10 would mean that the odds on cheating increase multiplicatively by exp(10) when men are compared with women. This is rather unlikely. We call the prior that is specified by the interval (−10,10) vague because the interval is sufficiently wide to include all realistically possible values of the parameters without favouring specific values. The same reasoning applies to parameter λ_{7}, which is the three-factor interaction and describes how the odds ratio between two variables changes across categories of the third. For example, define OR_{11.c} as the odds ratio for questions 1 and 2 given category *c* of question 3; we have OR_{11.1}/OR_{11.2} = exp(8λ_{7}) and a value of ±10 for λ_{7} is quite extreme.
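The factor exp(8λ_{7}) can be checked numerically. The sketch below assumes a saturated log-linear model for the three binary questions with effect coding (+1 for category 1, −1 for category 2); this coding is an assumption made here for illustration, and the function names and λ values are hypothetical. Under that coding, the ratio of the two conditional odds ratios for questions 1 and 2 equals exp(8λ_{7}), whatever the other λs are.

```python
import math
from itertools import product


def cell_probs(lam):
    """Cell probabilities of a 2x2x2 log-linear model (ASSUMED effect coding)."""
    l1, l2, l3, l12, l13, l23, l123 = lam
    code = {1: 1.0, 2: -1.0}  # category 1 -> +1, category 2 -> -1
    weights = {}
    for i, j, k in product((1, 2), repeat=3):
        x1, x2, x3 = code[i], code[j], code[k]
        eta = (l1 * x1 + l2 * x2 + l3 * x3
               + l12 * x1 * x2 + l13 * x1 * x3 + l23 * x2 * x3
               + l123 * x1 * x2 * x3)
        weights[(i, j, k)] = math.exp(eta)
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}


def conditional_odds_ratio(pi, c):
    """Odds ratio between questions 1 and 2 given category c of question 3."""
    return (pi[(1, 1, c)] * pi[(2, 2, c)]) / (pi[(1, 2, c)] * pi[(2, 1, c)])


# Arbitrary illustrative parameter values; the last entry plays the role of lambda_7.
lam = (0.3, -0.2, 0.5, 0.4, -0.1, 0.2, 0.25)
pi = cell_probs(lam)
ratio = conditional_odds_ratio(pi, 1) / conditional_odds_ratio(pi, 2)

print(ratio, math.exp(8 * lam[-1]))  # the two numbers agree
print(math.exp(10))                  # roughly 22026: the odds multiplier at the prior bound
```

At the prior bound λ_{7} = 10 the same ratio would be exp(80), which illustrates why the interval (−10,10) comfortably covers all realistic values.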

The models are mixture models and long Markov chain Monte Carlo (MCMC) chains are recommended. We used a burn-in of 50 000 simulations and 50 000 updates. Convergence was checked by assessing the chain visually, by looking at the auto-correlation, and by Geweke's convergence diagnostic (Geweke, 1992) as implemented in the R package coda (Plummer *et al.*, 2006). Given the Bayesian framework, transformations from λs to πs or from βs to τs are direct and credible intervals (CIs) are readily derived. Results for model I and model II are presented in Table 3.

| **Table 3** Bayesian inference for models without covariates (model I) and with covariates (model II) |
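The idea behind Geweke's diagnostic can be sketched as follows: compare the mean of an early segment of the chain with the mean of a late segment and standardize the difference. The version below is a deliberate simplification that uses plain sample variances, which is reasonable only for nearly uncorrelated draws; the implementation referred to above estimates the variances from the spectral density instead. The function name and the simulated chains are illustrative assumptions.

```python
import math
import random
import statistics


def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke z-score: early-segment mean versus late-segment mean.

    ASSUMPTION: naive variance estimates (no autocorrelation correction).
    """
    n = len(chain)
    early = chain[: int(first * n)]
    late = chain[n - int(last * n):]
    var_early = statistics.variance(early) / len(early)
    var_late = statistics.variance(late) / len(late)
    diff = statistics.fmean(early) - statistics.fmean(late)
    return diff / math.sqrt(var_early + var_late)


rng = random.Random(1)
# A stationary chain of independent draws, and the same chain with a slow drift.
stationary = [rng.gauss(0.0, 1.0) for _ in range(5000)]
drifting = [x + 0.001 * t for t, x in enumerate(stationary)]

z_stationary = geweke_z(stationary)
z_drifting = geweke_z(drifting)
print(z_stationary, z_drifting)
```

A z-score of moderate size is compatible with a chain that has reached its stationary distribution, whereas the drifting chain produces a very large absolute value.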

First we discuss model I. This is the model without the covariates and it can be assessed on the level of the frequencies. As a consequence, it is easy to investigate the goodness of fit by posterior predictive checking (Gelman *et al.* (2004), section 6.2). Denote the 16 observed frequencies (eight in sample 1; eight in sample 2) generically by **n**^{*}, and the model parameter vector by **θ**. The Pearson χ^{2}-statistic for observed frequencies **n**^{*} and estimated frequencies derived from **θ** yields a posterior predictive *p*-value that is equal to 0.085. This shows that the model fits the data, though the evidence is not overwhelming and further modelling seems worthwhile.
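A posterior predictive check of this kind can be sketched in a few lines. The sketch below is a simplified, single-sample version: it assumes that each posterior draw has already been converted into a vector of expected cell probabilities, whereas in the analysis above the expected frequencies for both samples are derived from **θ**. All names, the Dirichlet-style posterior draws and the observed counts are hypothetical.

```python
import random


def pearson_chi2(counts, probs):
    """Pearson chi-squared discrepancy between counts and expected n * probs."""
    n = sum(counts)
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, probs))


def sample_multinomial(n, probs, rng):
    """Draw one multinomial count vector with n trials."""
    counts = [0] * len(probs)
    for cell in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[cell] += 1
    return counts


def posterior_predictive_pvalue(observed, posterior_probs, rng):
    """Fraction of draws in which replicated data fit worse than the observed data."""
    n = sum(observed)
    exceed = 0
    for probs in posterior_probs:
        replicated = sample_multinomial(n, probs, rng)
        if pearson_chi2(replicated, probs) >= pearson_chi2(observed, probs):
            exceed += 1
    return exceed / len(posterior_probs)


def dirichlet_draw(alpha, rng):
    """One Dirichlet draw via normalized gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(42)
# HYPOTHETICAL posterior draws concentrated around (0.5, 0.3, 0.2).
draws = [dirichlet_draw([500.0, 300.0, 200.0], rng) for _ in range(2000)]

p_good = posterior_predictive_pvalue([50, 30, 20], draws, rng)  # data agree with the model
p_bad = posterior_predictive_pvalue([20, 30, 50], draws, rng)   # data contradict the model
print(p_good, p_bad)
```

A *p*-value close to 0 signals that the observed discrepancy is larger than almost all replicated discrepancies, so a value such as 0.085 is on the low side without being decisive.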

For the estimated model I, note that the posterior mean of τ_{1} is smaller than the posterior mean of τ_{2} and that the CIs do not overlap. This means that the probability of cheating in sample 1 with RR is smaller than the probability of cheating in sample 2 with DQ. This is in accordance with the basic idea of RR: when sensitive questions are asked, a technique that protects the privacy of respondents leads to improved compliance with the design of the survey.

Next we discuss model II, which we consider to be the final model. Monitoring the mean of the individually estimated cheating probabilities yields posterior means 0.155 for sample 1 and 0.568 for sample 2, which are close to the posterior means of the cheating parameters for model I given by 0.157 and 0.536 respectively. Furthermore, according to the estimation of β_{ad.1} and β_{ad.2}, when an individual states that it is not advantageous to violate a benefit rule, he or she is more likely to cheat in the survey (the posterior means of β_{ad.1} and β_{ad.2} are negative and the CIs do not include zero). The posterior distribution of β_{se.1} shows that men are less likely to cheat in the RR sample than women.
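Monitoring the mean of the individually estimated cheating probabilities amounts to the following computation for each MCMC draw: transform the linear predictor of every individual into a probability and average over individuals. The sketch assumes a logistic link, which is consistent with the interpretation of the βs as effects on the odds of cheating; the covariate rows, the β draws and the function names are hypothetical placeholders.

```python
import math
import statistics


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def mean_cheating_probability(beta_draw, covariates):
    """Average individual cheating probability for one draw (intercept first).

    ASSUMPTION: logit(tau_i) = beta_0 + sum_j beta_j * x_ij.
    """
    taus = [
        expit(beta_draw[0] + sum(b * x for b, x in zip(beta_draw[1:], row)))
        for row in covariates
    ]
    return statistics.fmean(taus)


def posterior_summary(values, level=0.95):
    """Posterior mean and an equal-tailed interval from sorted draws."""
    ordered = sorted(values)
    last = len(ordered) - 1
    lower = ordered[int((1.0 - level) / 2.0 * last)]
    upper = ordered[int(math.ceil((1.0 + level) / 2.0 * last))]
    return statistics.fmean(ordered), (lower, upper)


# HYPOTHETICAL binary covariates (for example Sex, Advantage) for four individuals.
covariates = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0)]
# HYPOTHETICAL posterior draws of (intercept, beta_se, beta_ad).
beta_draws = [(-1.0, -0.5, -1.2), (-0.8, -0.4, -1.0), (-1.2, -0.6, -1.4)]

monitored = [mean_cheating_probability(b, covariates) for b in beta_draws]
post_mean, interval = posterior_summary(monitored)
print(post_mean, interval)
```

Because the transformation is applied draw by draw, the posterior mean and the credible interval of the averaged probability follow directly from the simulated values.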

Given that not following the rules can indeed often be advantageous, we think that the attitude question is a sensitive question. Individuals who do not follow the RR rules also do not honestly answer the attitude question. In other words, denying that violating the benefit rules can be advantageous is a proxy for cheating in the RR design.

The prevalence estimates are also given in Table 3. Comparing the posterior means for **π**=(π_{111},…,π_{222})^{T} shows that the prevalence estimates are robust with regard to model selection. The probability of complying with all the benefit regulations is 0.768 with 95% CI (0.707, 0.819), whereas the probability of violating all the regulations is 0.017 (0.006, 0.029). It is interesting to see that there is a relatively large probability that individuals violate the first regulation but follow the second and the third: 0.120 (95% CI (0.086, 0.158)). The corresponding figure brings out the strength of the Bayesian framework: the asymmetrical distributions of some of the prevalence parameters are nicely captured by the MCMC results. Distributions that are not close to the boundary of the parameter space (for instance, those of π_{111} and π_{121}) resemble the shape of normal distributions.