Results from the survey
Using the full dataset (n = 318), no gender × region interaction effect was observed for social projection (F(1,310) = 1.547, p = 0.211; partial η² = 0.004) or for perceived harm (F(1,308) = 1.242, p = 0.266; partial η² = 0.005). Participants in the metropolitan area gave significantly higher estimates of the proportion of others using Mephedrone (F(1,310) = 16.90, p < 0.001), but there was no difference by gender (F(1,310) = 0.506, p = 0.478; Cohen's d = 0.100). The main effects of gender and region on perceived harm were both significant (F(1,308) = 5.237, p = 0.023 and F(1,308) = 5.000, p = 0.026, respectively). The slight discrepancies in sample sizes are due to missing values. Means and standard deviations by area and gender are shown in Table . Opinion on the legal status of Mephedrone overwhelmingly favoured control (81.7%), independently of area (Fisher's Exact Test = 2.104, p = 0.370) but not of gender (Fisher's Exact Test = 7.731, p = 0.011): the preference for Mephedrone not being controlled was higher amongst males (21.8%) than amongst females (11.6%).
Social projection (0%: nobody to 100%: everybody) and perceived harm (1: not harmful at all to 10: very harmful)
The higher prevalence estimates given by participants in the metropolitan area are likely to reflect different descriptive norms arising from their social context. Declared drug use among the active population (aged 16-59) in England and Wales is consistently around twice as high among males as among females, and higher prevalence rates of last-year use have been documented for urban than for rural areas, with a similar but slightly more ambiguous trend for the 16-24 age group [40].
Biased social projection is one of the most intriguing areas of social cognition research. On the one hand, the repeatedly observed association between self-reported behaviour or personality characteristics and estimates of their prevalence in others has been explained by an egocentric bias (i.e. finding comfort in false consensus) [43], which is in keeping with the Bayesian approach [11]. On the other hand, particularly for sensitive and/or transgressive behaviours, it has been suggested that a distorted perception of others' behaviour eventually leads to a behavioural choice congruent with this perception [44]. Conversely, recent research provides evidence that estimates of population prevalence relate to the behaviour or characteristics respondents wish to project about themselves, rather than to their actual behaviour [46].
Age was significantly and negatively related to the prevalence estimate (Spearman's r = -.150, p = 0.01), suggesting that younger people consider Mephedrone to be more prevalent. This is in line with the notion that Mephedrone is a drug for the young [33]. The correlation between age and the belief that Mephedrone is harmful was positive and significant (Spearman's r = .190, p = 0.001). As regional differences were not significant, the data from the two collection sites were combined and treated as a single sample in subsequent analyses.
Estimation using the Forced Response model
After the questionnaires had been completed, the prevalence rate for Mephedrone use was calculated with the formula suggested by Tourangeau & Yan [6], as follows:
π1 = probability that the respondent is forced to say 'yes'
π2 = probability that the respondent is forced to answer a sensitive question honestly
λ = observed percent that responded 'yes'
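From the dice instructions, π1 = 1/6 and π2 = 3/4. Under the forced response model, the expected proportion of 'yes' answers is λ = π1 + π2π, where π is the prevalence of Mephedrone use, so the prevalence is estimated as
$$\hat{\pi} = \frac{\lambda - \pi_1}{\pi_2}.$$
There were 74 'yes' responses out of n = 318, so λ = 74/318 = 0.232704, and
$$\hat{\pi} = \frac{0.232704 - 1/6}{3/4} = 0.0881.$$
The estimated prevalence rate for Mephedrone use is therefore 8.81%. The variance and standard error of this estimator are calculated as:
$$\widehat{\mathrm{Var}}(\hat{\pi}) = \frac{\lambda(1-\lambda)}{n\,\pi_2^2} = \frac{0.232704 \times 0.767296}{318 \times (3/4)^2} = 0.000998, \qquad \mathrm{SE}(\hat{\pi}) = \sqrt{0.000998} = 0.031594.$$
A 95% CI for the prevalence rate of Mephedrone use is the estimated prevalence rate ± the product of the z_{α/2} value and the standard error: 1.96 × 0.031594 = 0.061925, yielding a 95% CI of (0.026125, 0.149975). Thus, the prevalence rate as determined by the Forced Response model, with a standard error of 0.031594, is estimated to be between 2.6% and 15.0%.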
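The forced response arithmetic can be checked with a short script. This is a minimal sketch in Python (standard library only); the function and parameter names are illustrative rather than taken from the original analysis, and the standard error uses the variance formula λ(1 − λ)/(nπ2²), which is the one consistent with the reported interval half-width of 0.061925.

```python
import math


def forced_response_estimate(yes, n, p_forced_yes, p_honest, z=1.96):
    """Forced response prevalence estimate with a Wald confidence interval.

    Response model: P(yes) = lambda = pi1 + pi2 * pi, hence pi_hat = (lambda - pi1) / pi2.
    """
    lam = yes / n                                   # observed proportion of 'yes' answers
    pi_hat = (lam - p_forced_yes) / p_honest        # invert the response model
    se = math.sqrt(lam * (1 - lam) / n) / p_honest  # SE(lambda_hat) scaled by 1 / pi2
    return pi_hat, se, (pi_hat - z * se, pi_hat + z * se)


# Study values: pi1 = 1/6, pi2 = 3/4, 74 'yes' answers out of 318
pi_hat, se, (lo, hi) = forced_response_estimate(74, 318, 1 / 6, 3 / 4)
print(f"pi_hat = {pi_hat:.4f}, SE = {se:.6f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

Changing `p_forced_yes` and `p_honest` shows how the dice design trades privacy for precision: for a given λ, the standard error is inversely proportional to π2.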
Among the 154 available hair samples, Mephedrone was detected in six, giving a positive rate of 3.9%. As the quantity of substance potentially used and the time of exposure are not known, the actual positive rate may well be higher than 3.9%. Hair analysis is likely to capture only drug use in the three months preceding sample collection, and it could detect neither a single exposure nor any use in the two weeks immediately before collection, when the corresponding hair is still within the scalp. This period is therefore considered a 'blind period' for hair analysis.
Combining these positive samples with use disclosed in the questionnaire, where respondents inadvertently revealed it either by giving a total of five on the Single Sample Count/Unmatched list (i.e. answering 'yes' to every item, including the sensitive one) or by answering each question on the list individually, raises the prevalence rate to 5.7% (9/157). Two of the nine known positive cases overlap between the analytical and questionnaire results.
The simplified SSC algorithm
The fuzzy response SSC model is a new method that uses the known population prevalence of innocuous items to estimate the proportion of affirmative answers to the sensitive question. As such, it is a simplified and more economical version of the Unmatched List Count that uses only one (experimental) sample. To avoid the need for a control sample (which inevitably leads to a 50% loss of the sample), we embedded the target sensitive question in a set of four questions, each with a 50:50 probability of a 'yes' answer, and benchmarked the observed total number of 'yes' responses against the expected total number of 'yes' responses for the four questions.
The benchmark questions were:
• My birthday is in the first 6 months (January - June) of the year.
• My house number is an even number.
• The last digit of my phone number is even.
• My mother's birthday falls between July and December.
The probability of a 'yes' answer to each of the four questions is therefore taken to be 50%, so the expected average (the total number of 'yes' responses divided by the number of respondents) is two. Any upward deviation from this benchmark figure is the estimated proportion of 'yes' answers to the target question.
The target research question was:
• I have taken Mephedrone at least once in the previous three months
Respondents were instructed to indicate only the total number of their affirmative answers to the five questions without revealing which ones.
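To illustrate why the benchmark of two works, the sketch below simulates the SSC design under stated assumptions: each of the four innocuous items is answered 'yes' with probability exactly 0.5, answers are independent, respondents report only their total, and the true three-month prevalence (`true_d`, here 5%) and the sample size are hypothetical values chosen for illustration, not study data.

```python
import random


def simulate_ssc(k, true_d, seed=1):
    """Simulate SSC totals: four 50:50 innocuous items plus one sensitive item.

    true_d is a hypothetical prevalence; only the per-respondent total is kept,
    mirroring the instruction to report the number of 'yes' answers only.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(k):
        innocuous = sum(rng.random() < 0.5 for _ in range(4))  # four benchmark items
        sensitive = int(rng.random() < true_d)                  # hypothetical true status
        totals.append(innocuous + sensitive)                    # only the sum is disclosed
    return totals


def ssc_estimate(totals):
    """Simple SSC estimator: mean number of 'yes' answers minus the benchmark of 2."""
    return sum(totals) / len(totals) - 2


totals = simulate_ssc(k=100_000, true_d=0.05)
print(round(ssc_estimate(totals), 3))
```

With a large simulated sample the estimate settles close to `true_d`. With k = 237, as in this study, the standard error of the mean total is roughly √(1.05/237) ≈ 0.066, so a small true prevalence can easily be masked by sampling noise.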
Based on the nature of the four non-sensitive questions, it was assumed that the answers to each question follow a binomial distribution with a 50% probability of an honest 'yes'. The total number of 'yes' responses to the non-sensitive questions is then distributed as B(4k, 0.5), where k is the sample size. Assuming equal numbers of 'yes' and 'no' responses to each of these four non-sensitive questions, the expected value of responses for the baseline non-sensitive questions is
$$E = 4k \times 0.5 = 2k,$$
that is, an expected mean of two 'yes' answers per respondent.
Thus, if the probability distributions are exactly the same for all non-sensitive questions individually (assumed to be 0.5 in this case), the mean response for the four non-sensitive questions is expected to equal two, and a mean response greater than two indicates the prevalence rate for the sensitive question. The prevalence rate is estimated as
$$\hat{d} = \frac{\lambda}{n} - 2,$$
where d is the population proportion of 'yes' answers to the sensitive question, λ is the observed number of 'yes' answers, and n is the sample size. The observed probability distribution of the number of 'yes' answers is shown in Table .
Observed probability distribution of X = the number of 'yes' answers
The three-month prevalence rate and 95% CI for Mephedrone use were calculated with the SSC method as follows:
The observed number of 'yes' answers is the sum of two random variables with distributions B(4 × 237, 0.5) and B(237, d), where d is the population prevalence for the sensitive key question and 237 is the number of respondents in the sample. The observed number of 'yes' answers in the sample was 469.
Whilst the sum of these two random variables does not have a simple closed-form distribution, we can use the normal approximation to the binomial distribution. A common rule of thumb is that the normal approximation is applicable if np > 5 and n(1 − p) > 5, where n and p are the two binomial parameters; for B(237, d), this requires d > 0.021 and d < 0.979. The normal approximation has mean np and variance np(1 − p). Thus B(4 × 237, 0.5) is approximately N(2 × 237, 237), and B(237, d) is approximately N(237d, 237d(1 − d)). Since the maximum likelihood estimate of the mean of a normal distribution is the sample mean, 237(d + 2) = 469, hence d = -0.021097. Note that the estimated d is negative because the observed number of 'yes' responses (469) is lower than the number expected for the non-sensitive questions alone (474). This does not mean that the prevalence rate for Mephedrone is negative, only that random fluctuations in the sample were large enough to mask the expected upward shift in the number of observed 'yes' responses. We can nevertheless calculate a 95% CI. For the total number of 'yes' answers, it is 469 ± z × √(237(1 + d(1 − d))), where z = 1.959964 is the two-sided 95% critical value of the standard normal distribution, i.e. 469 ± 29.8465. Dividing by 237 gives the 95% CI for d: -0.021097 ± 0.125935 = (-0.147032, 0.104838). Since a prevalence cannot be negative, the lower bound is truncated at zero, and the estimated prevalence rate for Mephedrone use is between 0 and 10.5%.
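The simple SSC calculation can be sketched in Python as follows. This is an illustrative reimplementation, not the authors' code; the function name is hypothetical, and the variance plugs the (possibly negative) point estimate of d into 237(1 + d(1 − d)), as in the normal-approximation formula, before the lower bound is truncated at zero.

```python
import math


def ssc_simple_ci(yes_total, k, z=1.959964):
    """Simple SSC estimate assuming four 50:50 benchmark items.

    Normal approximation of the total number of 'yes' answers:
    B(4k, 0.5) -> N(2k, k) plus B(k, d) -> N(kd, kd(1 - d)).
    """
    d_hat = yes_total / k - 2                       # solve k * (d + 2) = yes_total
    var_total = k * (1 + d_hat * (1 - d_hat))       # plug-in variance of the total count
    half_width = z * math.sqrt(var_total) / k       # convert count half-width to d scale
    lo, hi = d_hat - half_width, d_hat + half_width
    return d_hat, (lo, hi), (max(lo, 0.0), hi)      # prevalence cannot be below zero


d_hat, raw_ci, truncated_ci = ssc_simple_ci(469, 237)
print(f"d_hat = {d_hat:.6f}, raw CI = ({raw_ci[0]:.6f}, {raw_ci[1]:.6f}), "
      f"truncated CI = ({truncated_ci[0]:.1f}, {truncated_ci[1]:.4f})")
```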
A one-sample t-test indicated that the mean SSC score (1.9789, 95% CI 1.85 to 2.11) did not differ significantly from 2 (t(236) = -0.3113, p = 0.7558, Cohen's d = 0.041); thus there was no evidence that the prevalence rate for Mephedrone use in the population differs from zero. This non-significant result may be explained by the relatively small sample size. Notably, the prevalence was nonetheless estimated to be between 0 and approximately 10%.
The above calculation holds if the probability of a 'yes' answer is the same for every baseline question (e.g. 50/50 in all four cases), in which case the sum of the binomial variables is also binomial. However, the sum of binomial variables is not necessarily binomial if the probabilities vary among the questions. In such cases, the normal approximation is therefore calculated individually for each question before the baseline distributions are added together, since the sum of independent normal variables is also normally distributed.
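The point about unequal probabilities can be made concrete by computing the exact distribution of a respondent's total for independent items with different 'yes' probabilities (the Poisson binomial distribution). In this minimal sketch the probabilities are hypothetical and deliberately unequal: the mean matches that of B(4, 0.5), but the variance, Σp_i(1 − p_i), is smaller than 4 × 0.5 × 0.5, so the sum is not binomial.

```python
def poisson_binomial_pmf(probs):
    """Exact pmf of a sum of independent Bernoulli(p_i) answers (Poisson binomial),
    built by convolving one item at a time."""
    pmf = [1.0]                         # P(total = 0) before any item is added
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for j, mass in enumerate(pmf):
            nxt[j] += mass * (1 - p)    # answer 'no' to this item
            nxt[j + 1] += mass * p      # answer 'yes' to this item
        pmf = nxt
    return pmf


def mean_var(pmf):
    """Mean and variance of a distribution on 0, 1, 2, ... given as a list."""
    mean = sum(j * m for j, m in enumerate(pmf))
    var = sum((j - mean) ** 2 * m for j, m in enumerate(pmf))
    return mean, var


probs = [0.3, 0.5, 0.5, 0.7]            # hypothetical, deliberately unequal
exact = poisson_binomial_pmf(probs)
m, v = mean_var(exact)
p_bar = m / len(probs)
binom_var = len(probs) * p_bar * (1 - p_bar)   # variance of B(4, p_bar)
print(f"mean = {m:.2f}, exact variance = {v:.2f}, binomial variance = {binom_var:.2f}")
```

The normal approximation sidesteps this by matching each item's mean and variance separately, which is the approach taken in the two-step calculation.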
SSC algorithm taking the divergence from the 50/50 distribution into consideration
To test whether the estimate from the simplified SSC algorithm differs significantly from an estimate that takes the observed distributions of the four innocuous questions into consideration, we calculated d in a two-step process.
Firstly, we did not assume that the probabilities of the innocuous binomial variables were equal, and instead estimated the probability for each baseline question independently. To do so, we used the following datasets. For house and phone numbers, we used a dataset of 7,500,000 UK residential records (usable records: n = 6,859,957 for house numbers and n = 6,895,960 for phone numbers) purchased from a commercial provider; for birthdays, we used anonymised datasets from two UK universities (n = 495,870 and n = 11,157). The subsequent analysis used the larger UK university dataset (n = 495,870) for birthdays. Details are presented in Table .
House numbers (including apartment/flat numbers in the absence of a house number) were split into 3,405,322 even (p = 0.4964057) and 3,454,635 odd (p = 0.5035943) numbers; the proportion of even numbers differed significantly from 0.5 (t = -18.828, df = 6859956, p < 2.2e-16, 95% CI: 0.4960316, 0.4967799). Among the listed phone numbers, the last digit was even in 3,429,497 cases (p = 0.4973197) and odd in 3,466,463 cases (p = 0.5026803). Birthdays fell in the first half of the year in 247,447 cases (p = 0.4990159) and in the second half in 248,423 cases (p = 0.5009841). One-sample t-tests of H0: p = 0.5 for the four innocuous questions gave the following results:
1. My birthday is in the first 6 months (January - June) of the year (t = -1.386, df = 495869, p = 0.1657; estimated probability 0.4990159, 95% CI: 0.4976242, 0.5004075).
2. My house number is an even number (t = -18.6633, df = 6952970, p < 0.001; estimated probability 0.49646115, 95% CI: 0.4960895, 0.4968328).
3. The last digit of my phone number is even (t = -14.077, df = 6895959, p < 0.001; estimated probability 0.4973197, 95% CI: 0.496946, 0.4976929).
4. My mother's birthday falls between July and December (t = 1.386, df = 495869, p = 0.1657; estimated probability 0.5009841, 95% CI: 0.4995925, 0.5023758).
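The test statistics can be approximately reproduced from the counts given in the text. This sketch assumes each test is a one-sample t-test on 0/1 indicators (so the standard deviation is that of the indicators) and, because n runs to hundreds of thousands or millions, uses the normal distribution for the p-value and CI; the function name is hypothetical. It uses the house-number counts given earlier (n = 6,859,957).

```python
import math


def one_sample_test_vs_half(successes, n, z=1.959964):
    """t-test of H0: p = 0.5 for 0/1 data summarised by a count of 'yes' cases.

    For very large n the t distribution is indistinguishable from the normal,
    so the two-sided p-value and the CI use normal quantiles.
    """
    p_hat = successes / n
    sd = math.sqrt(p_hat * (1 - p_hat) * n / (n - 1))  # sample SD of the 0/1 indicators
    se = sd / math.sqrt(n)
    t = (p_hat - 0.5) / se
    p_value = math.erfc(abs(t) / math.sqrt(2))         # two-sided normal tail probability
    return p_hat, t, p_value, (p_hat - z * se, p_hat + z * se)


counts = {
    "birthday January-June": (247_447, 495_870),
    "even house number": (3_405_322, 6_859_957),
    "even last phone digit": (3_429_497, 6_895_960),
}
results = {name: one_sample_test_vs_half(*c) for name, c in counts.items()}
for name, (p_hat, t, p_value, ci) in results.items():
    print(f"{name}: p = {p_hat:.7f}, t = {t:.3f}, p-value = {p_value:.4g}, "
          f"95% CI = ({ci[0]:.7f}, {ci[1]:.7f})")
```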
We therefore used these empirically derived probabilities in the normal approximation.
The number of 'yes' answers for the
1st question is binomial, B(k, 0.4990159) → N(k*0.4990159, k*0.4990159*0.5009841)
2nd question is binomial, B(k, 0.4964611) → N(k*0.4964611, k*0.4964611*0.5035389)
3rd question is binomial, B(k, 0.4973197) → N(k*0.4973197, k*0.4973197*0.5026803)
4th question is binomial, B(k, 0.5009841) → N(k*0.5009841, k*0.4990159*0.5009841)
Sensitive question is binomial, B(k, d) → N(k*d, k*d*(1-d))
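Therefore, adding these approximations together (treating the answers as independent), the total number of 'yes' answers is approximately distributed as
$$N\big(k(1.9937808 + d),\; k(0.9999784 + d(1-d))\big),$$
where 1.9937808 and 0.9999784 are, respectively, the sums of the four empirical probabilities p_i and of the corresponding variances p_i(1 − p_i).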
The Mephedrone dataset contained 469 'yes' answers from 237 respondents; therefore k = 237, and 237(1.9937808 + d) = 469, so d = -0.0148779. The 95% CI for the number of 'yes' answers, with the above estimated mean and variance, is (439.0453, 498.9547), so d lies between -0.1412 and 0.1115. Consequently, once the negative lower bound is truncated at zero, d (the estimated prevalence of Mephedrone use) lies between 0% and 11%, which is in keeping with the estimate obtained using the simple algorithm, which assumes p = 0.5 for 'yes' answers to all baseline non-sensitive questions. Therefore, by the principle of Occam's razor, the simpler algorithm should be preferred.
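As with the simple algorithm, the calculation with empirical benchmark probabilities can be sketched in Python. The four probabilities are those listed above; the function name is hypothetical, answers are treated as independent, and the sensitive item's variance term uses the negative point estimate of d, as in the text. Under these assumptions the sketch reproduces d = -0.0148779 and bounds of about -0.1412 and 0.1115; the count interval it gives differs from the reported one by about 0.01.

```python
import math

# Empirical 'yes' probabilities for the four benchmark items (values from the text)
BENCHMARK_P = (0.4990159, 0.4964611, 0.4973197, 0.5009841)


def ssc_empirical_ci(yes_total, k, probs=BENCHMARK_P, z=1.959964):
    """SSC estimate allowing unequal benchmark probabilities.

    Each item B(k, p_i) is approximated by N(k p_i, k p_i (1 - p_i)); the sensitive
    item adds N(k d, k d (1 - d)). Independent normals add in mean and variance.
    """
    mean_per_resp = sum(probs)                            # about 1.9937808
    var_per_resp = sum(p * (1 - p) for p in probs)        # about 0.9999784
    d_hat = yes_total / k - mean_per_resp                 # solve k * (sum(p) + d) = yes_total
    var_total = k * (var_per_resp + d_hat * (1 - d_hat))  # plug-in variance, as in the text
    half = z * math.sqrt(var_total)
    count_ci = (yes_total - half, yes_total + half)       # CI for the number of 'yes' answers
    d_ci = tuple(c / k - mean_per_resp for c in count_ci) # same CI on the d scale
    return d_hat, count_ci, d_ci


d_hat, count_ci, d_ci = ssc_empirical_ci(469, 237)
print(f"d_hat = {d_hat:.7f}, count CI = ({count_ci[0]:.2f}, {count_ci[1]:.2f}), "
      f"d CI = ({d_ci[0]:.4f}, {d_ci[1]:.4f})")
```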