To describe temporal trends in concordance, sensitivity, and specificity and to explore demographic trends in concordance in two outpatient treatment studies for cocaine dependence.
We obtained 2229 urine drug screens from 129 individuals, along with accompanying self-use reports. Paired self-use reports and urine drug screens were considered concordant if the two measures of cocaine use were in agreement. The sensitivity and specificity of the self-use reports in predicting the urine drug screen were also estimated. To model concordance, sensitivity, and specificity as a function of time, generalized estimating equations were used. Demographic effects on concordance among subjects who achieved 100% concordance and subjects who achieved a recently proposed 70% concordance threshold were tested.
Over the course of our studies, both sensitivity and concordance decreased significantly, yet specificity remained relatively constant. Median concordance for all subjects was 88%. Among all subjects, concordance varied significantly by gender, with females achieving significantly higher concordance than males (96% vs. 86%). Similarly, females were almost twice as likely to achieve 100% concordance as males (42% vs. 22%). Finally, 80% of participants achieved the 70% concordance threshold, and no differences among demographic groups with regard to the 70% concordance threshold were observed.
Temporal effects of concordance and sensitivity may have profound repercussions when using self-use reports to gauge efficacy of an experimental intervention. Furthermore, gender may differentially affect concordance. Finally, a substance abuse outcome measure that reliably combines objective and self-report data is promising, but further research is needed.
Drug treatment studies have long been plagued by the challenge of accurately monitoring subjects' drug use. Typically, both self-use reports (SUR) and urine drug screens (UDS) are collected in cocaine treatment studies, and the concordance between these two measures has been extensively researched [1, 2]. In a clinical drug treatment context, concordance is defined as the probability that an individual's SUR result matches his UDS result for a given time, meaning both are positive or both are negative. Following convention, UDS results are taken to be the “gold standard,” and SUR results are compared to these “accepted” UDS results. Concordance has two components: sensitivity and specificity. Sensitivity is the conditional probability that a subject self-reports cocaine use, given that his urine screen is positive; specificity is the conditional probability that a subject self-reports no cocaine use, given that his urine screen is negative.
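As a minimal sketch of these definitions, the following Python snippet computes the three quantities from paired binary results, treating UDS as the reference measure. The function name and the sample data are hypothetical and are not drawn from the study.

```python
def agreement_measures(pairs):
    """Compute sensitivity, specificity, and concordance of SUR against UDS.

    pairs: list of (sur, uds) tuples coded 1 = use, 0 = no use.
    Returns None for a measure whose denominator is zero.
    """
    tp = sum(1 for s, u in pairs if s == 1 and u == 1)  # reported use, positive screen
    fn = sum(1 for s, u in pairs if s == 0 and u == 1)  # denied use, positive screen
    tn = sum(1 for s, u in pairs if s == 0 and u == 0)  # denied use, negative screen
    fp = sum(1 for s, u in pairs if s == 1 and u == 0)  # reported use, negative screen
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "concordance": (tp + tn) / total if total else None,
    }


# Illustrative data only: 3 positive screens, 5 negative screens.
example_pairs = [(1, 1), (1, 1), (0, 1), (0, 0), (0, 0), (0, 0), (1, 0), (0, 0)]
measures = agreement_measures(example_pairs)
print(measures)
```

In this toy data set the subject reports use at two of three positive screens and denies use at four of five negative screens, so six of the eight pairs agree.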
It is clear that self-use reports are subjective, and thus cannot be assumed to be an entirely accurate account of drug use. The validity of self-reported data varies from individual to individual and is entirely based on an individual's willingness and ability to accurately recount use patterns, which may be confounded by denial, paranoia, desire to please, cognitive impairment, or unknown motivations. On the other hand, although UDS are generally regarded as the standard measure and as highly objective, they do have some pitfalls. Urine drug screening assumes an average metabolic rate, and thus individual variations may affect the validity of UDS. For instance, slow metabolizers may test positive even though they used prior to the standard 72-hour window. Conversely, rapid metabolizers may test negative, even though they used within the standard 72-hour window. Furthermore, the practical utility of UDS is diminished for drugs with a short half-life, such as cocaine, since subjects must report to the clinic every 2 to 3 days to accurately monitor use. This required frequency results in a substantial amount of missing data, which may be confounded by subjects systematically missing urine screens when they are using. Thus, both SUR and UDS have notable drawbacks.
Clearly, in the ideal cocaine treatment study, UDS would be collected thrice weekly from all subjects (with no missing data), providing a complete picture of use as measured by the gold standard. However, in the typical study, there is extensive missing data among the UDS, and thus the quality of the SUR data becomes crucial. Concordance can be used to quantify the reliability of self-reported data in relation to the accepted standard, urine drug screens. Basically, an unconfirmed SUR from a subject who has “high” individual concordance should be more likely to be judged “reliable” than an unconfirmed SUR from a subject with “low” individual concordance. Clearly, in order to utilize SUR in any fashion more sophisticated than merely taking all SUR at face value, terms such as “high” and “reliable” must be explicitly defined.
Some addiction researchers have proposed the use of a composite outcome, which would benefit from both the completeness of SUR and the objectivity of UDS. However, such an imputation scheme would only be desirable if the SUR data used for imputation were not biased. As proposed by Somoza et al., individual concordance scores are used as an indication of the validity (thus bias) of SUR data. If the subject has a high concordance, defined by Somoza et al. as at least 0.70, then SUR data are accepted in place of missing UDS data. However, if the subject has a concordance less than 0.70, then SUR data are not accepted and the missingness persists.
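The threshold rule described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the procedure of Somoza et al.: the function name is hypothetical, SUR and UDS are paired visit by visit rather than over a 72-hour window, and a subject with no paired observations is treated as not meeting the threshold.

```python
def composite_outcome(uds, sur, threshold=0.70):
    """Fill missing UDS values with SUR values for sufficiently concordant subjects.

    uds, sur: equal-length lists for one subject, coded 1 = use, 0 = no use,
    None = missing. Returns a new list; the inputs are not modified.
    """
    # Concordance is estimated only from visits where both measures exist.
    paired = [(s, u) for s, u in zip(sur, uds) if s is not None and u is not None]
    if not paired:
        return list(uds)  # assumption: no evidence of reliability, so no imputation
    concordance = sum(1 for s, u in paired if s == u) / len(paired)
    if concordance < threshold:
        return list(uds)  # missingness persists
    return [u if u is not None else s for u, s in zip(uds, sur)]


# Illustrative subjects only.
reliable = composite_outcome(uds=[1, 0, None, 0, None], sur=[1, 0, 1, 0, 0])
unreliable = composite_outcome(uds=[1, 1, None, 1], sur=[0, 0, 0, 1])
print(reliable, unreliable)
```

The first subject agrees on all three paired visits, so both gaps are filled from the self-report; the second agrees on only one of three, so the missing screen stays missing.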
Again, use of SUR as a primary endpoint in addiction studies would only be desirable if the SUR were not biased. Concordance is one attempt at quantifying the bias in SUR. Many investigators have previously examined concordance in cocaine abusing and dependent populations, and have reported sensitivity (of SUR as confirmed by UDS) ranging from 28% to 75%. Furthermore, there is a well-documented temporal bias in concordance, since subjects systematically under-report use more often at the end of treatment than at admission [5, 6, 7, 8, 9]. For example, Hindin et al. report a baseline sensitivity of 89% and a post-treatment sensitivity of only 51% among a treatment-seeking, cocaine dependent population.
Demographic trends in concordance have also been explored, yet the findings in the literature have been mixed. Myrick et al. report that age, gender, and race were not associated with concordance among a cocaine-dependent population. However, they found that later onset of cocaine use, heavier use, and presence of an affective illness were associated with concordance, and that completion status was highly associated with concordance. Nyamathi et al. found that Hispanic ethnicity was associated with lower concordance. Additionally, Tassiopoulos et al. reported that the racial/ethnic pairing of a subject and interviewer can influence concordance, especially for black subjects. Ideally, there would be no temporal or demographic trends in concordance, since such trends raise the possibility of systematic bias among the SUR.
Thus, the goal of this study was to describe both temporal and demographic trends in concordance, sensitivity, and specificity for two outpatient treatment studies for cocaine dependence. Furthermore, a concordance of 0.70 has recently been proposed as a threshold for subject reliability by Somoza et al., and we assessed the feasibility and validity of this proposal in relation to our study population. Finally, we characterized subjects who demonstrated a concordance of 1.0, a fundamentally intriguing group of subjects.
Observations were drawn from two outpatient clinical trials of pharmacological treatments for cocaine dependence in the Center for Drug and Alcohol Programs at the Medical University of South Carolina - one evaluating the efficacy of Modafinil and the second evaluating the efficacy of N-acetylcysteine (R01 DA019903, R01 DA016368).
Participants for these two clinical trials were recruited within a 50-mile radius of the Medical University of South Carolina through clinical referrals, flyers, word of mouth, and television, radio, and newspaper advertisements. Qualifying participants were males or females of any race between the ages of 18–65, who met DSM-IV criteria for cocaine dependence, as assessed by the Structured Clinical Interview for DSM-IV. Participants were excluded if they met current dependence criteria for any psychoactive substance other than cocaine, alcohol, nicotine, or marijuana, or demonstrated physiological dependence on alcohol requiring medical detoxification. Participants were also excluded if they met DSM-IV criteria for any current psychiatric disorders. Prior to randomization, each participant received a thorough medical history and physical examination, electrocardiogram, and blood work including a hematology and chemistry panel to ensure physical stability. All participants were actively seeking treatment for cocaine dependence at the time of admission to the programs.
Self-use reports were collected through Timeline Follow Back interviews administered by research staff at admission, treatment, and follow-up visits. A UDS and a 30-day retrospective SUR were collected at admission. During treatment, a UDS and a calendar-based SUR were collected at each clinical visit (scheduled thrice weekly). Additionally, a UDS and a 30-day retrospective SUR were collected at both the four-week and eight-week follow-up visits; however, these two follow-up visits were not included in our analysis. Self-reported use was queried using questions such as “Did you use any cocaine on Monday the 19th?”, and each date since the last visit was queried.
Urine samples were assessed for benzoylecgonine (BE) with an immunoassay, and BE levels of 300 ng/ml or higher constituted a cocaine-positive specimen. To determine concordance of SUR with a given UDS result, a subject's SUR for the 72 hours prior to the UDS were analyzed. Note that 72 hours is the standard, conservative estimate of BE persistence. If the given UDS was negative, all SUR within 72 hours must be negative to be concordant. If the UDS was positive, then at least one SUR within 72 hours must be positive to be concordant.
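The concordance rule for a single screen can be sketched as follows. This is an illustrative reading of the rule rather than the study's actual code: the function name is hypothetical, the 72-hour window is approximated by the three full calendar days before the screen (self-reports were collected in full-day units), and days with no recorded self-report are assumed to be days without reported use.

```python
from datetime import date, timedelta


def is_concordant(uds_date, uds_positive, daily_sur, window_days=3):
    """Apply the 72-hour concordance rule to one urine drug screen.

    daily_sur: dict mapping a date to True (use reported) or False (no use reported).
    A negative screen is concordant only if no use was reported in the window;
    a positive screen is concordant if use was reported on at least one day.
    """
    window = [uds_date - timedelta(days=k) for k in range(1, window_days + 1)]
    reported_use = any(daily_sur.get(day, False) for day in window)
    return reported_use == uds_positive


# Illustrative self-reports only: use reported on the 19th, none on the 20th or 21st.
sur = {date(2009, 1, 19): True, date(2009, 1, 20): False, date(2009, 1, 21): False}

print(is_concordant(date(2009, 1, 22), True, sur))   # window covers the 19th to the 21st
print(is_concordant(date(2009, 1, 23), True, sur))   # window covers the 20th to the 22nd
```

Because both branches of the rule reduce to "any reported use in the window equals the screen result," a single comparison is enough.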
A generalized linear model (GLM) framework was used to estimate the concordance of the self-report and urine screen data. The identity link and binomial distribution were specified for the GLMs, and careful specification of the dependent variable and predictors was used to estimate the sensitivity and specificity of the self-report data. Both SUR and UDS were coded as binary variables, where 1 defined use and 0 defined no use. When the GLM is constructed to model the probability of SUR = 1, the intercept alone represents the false positive rate (i.e., probability of SUR = 1 given UDS = 0), and the sum of the intercept and the urine screen parameter estimates represents the sensitivity (i.e., probability of SUR = 1 given UDS = 1). Conversely, when the GLM is constructed to model the probability of SUR = 0, the intercept alone represents specificity (i.e., probability of SUR = 0 given UDS = 0), and the sum of the intercept and the urine screen parameter estimates represents the false negative rate (i.e., probability of SUR = 0 given UDS = 1). The analysis was performed for three time periods: the baseline visit, the final visit, and the overall study duration. When analyzing the overall study data, generalized estimating equations methods (as opposed to frequency table analysis) were necessary to account for repeated measures through the course of the study, as well as to generate statistically appropriate confidence intervals for these estimates. The correlation of repeated measures within subject was accounted for in the generalized linear model by invoking the robust variance used in generalized estimating equations (GEE).
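To make the parameter interpretation concrete, the sketch below recovers the intercept and slope of the model P(SUR = 1 | UDS) = b0 + b1 × UDS directly from cell proportions. With a single binary predictor the model is saturated, so the maximum likelihood point estimates coincide with the observed proportions. The function name and data are hypothetical, and the sketch yields point estimates only; it does not compute the robust GEE variance described above.

```python
def identity_link_estimates(pairs):
    """Point estimates for P(SUR = 1 | UDS) = b0 + b1 * UDS.

    pairs: list of (sur, uds) tuples coded 1 = use, 0 = no use; both UDS
    groups must be non-empty. Returns (b0, b1).
    """
    sur_when_negative = [s for s, u in pairs if u == 0]
    sur_when_positive = [s for s, u in pairs if u == 1]
    p_neg = sum(sur_when_negative) / len(sur_when_negative)  # P(SUR=1 | UDS=0)
    p_pos = sum(sur_when_positive) / len(sur_when_positive)  # P(SUR=1 | UDS=1)
    return p_neg, p_pos - p_neg


# Illustrative data only.
pairs = [(1, 1), (1, 1), (0, 1), (0, 0), (0, 0), (0, 0), (1, 0), (0, 0)]
b0, b1 = identity_link_estimates(pairs)

false_positive_rate = b0   # intercept alone
sensitivity = b0 + b1      # intercept plus urine screen parameter
specificity = 1 - b0       # equals the intercept of the model for SUR = 0
print(false_positive_rate, sensitivity, specificity)
```

Modeling SUR = 0 instead simply swaps the roles: the intercept becomes the specificity and the sum becomes the false negative rate.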
Concordance was calculated as the percentage of observations for which self-use report and UDS results agreed (either in the positive or the negative). Standard confidence intervals were calculated for baseline and last visit, since repeated measures by subject were not a factor. However, a GEE regression model with subject ID as the repeated measure variable was used to determine the confidence interval for the overall study concordance. Additionally, to assess statistical significance of trends in sensitivity, specificity, and concordance over time, we used separate GEE regression models to regress sensitivity, specificity, and concordance on number of days in study from randomization. For all GEE models, we specified a compound symmetry covariance matrix structure, a conservative approach since the data include a large number of non-uniformly spaced visits per subject.
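The role of the robust variance can be illustrated with a simplified, dependency-free sketch for the overall concordance estimate. Note the swap: the analysis above used a compound symmetry working correlation, whereas this sketch assumes an independence working correlation with an intercept-only model, for which the point estimate is the pooled proportion and the sandwich variance has a closed form. No small-sample correction is applied, and the function name and data are hypothetical.

```python
import math


def overall_concordance_robust(by_subject):
    """Pooled concordance with a cluster-robust (sandwich) standard error.

    by_subject: dict mapping subject ID to a list of 0/1 agreement indicators
    (1 = SUR and UDS agreed at that screen).
    Returns (estimate, standard_error, (lower, upper)) using a 95% normal interval.
    """
    n_obs = sum(len(v) for v in by_subject.values())
    p_hat = sum(sum(v) for v in by_subject.values()) / n_obs
    # Residuals are summed within subject before squaring, so correlated
    # observations from the same subject are not treated as independent.
    meat = sum(sum(y - p_hat for y in v) ** 2 for v in by_subject.values())
    se = math.sqrt(meat) / n_obs
    return p_hat, se, (p_hat - 1.96 * se, p_hat + 1.96 * se)


# Illustrative data only: three subjects with unequal numbers of screens.
toy = {"A": [1, 1, 1, 1], "B": [1, 0, 0], "C": [1, 1, 0]}
estimate, se, ci = overall_concordance_robust(toy)
print(estimate, se, ci)
```

Under a compound symmetry structure with unequal visit counts, the point estimate would generally differ from the pooled proportion, so this sketch should be read as a conceptual aid rather than a reproduction of the analysis.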
Furthermore, we sought to describe the demographics of two particularly interesting subgroups, subjects who were 100% concordant (i.e., self-reported use aligned perfectly with UDS results) and subjects who were at least 70% concordant. Seventy percent concordance has recently been proposed as a threshold for sufficient reliability in the literature, yet has not been widely validated. The primary demographic characteristics of interest were race, gender, completion status, and level of baseline cocaine use. For both the 100% and at least 70% concordant classifications, we calculated the percentage of subjects who fell in each demographic category. To test for significance between the concordance classifications with respect to demographic variables, the Wilcoxon rank-sum test, a nonparametric alternative to the independent-groups Student's t-test, and chi-square tests were used for continuous and categorical variables, respectively. Additionally, since our data arise from two separate studies, we also conducted a stratified analysis under a Mantel-Haenszel framework, in order to adjust for the potential confounding effects attributable to the two clinical trials. All analyses were conducted with SAS version 9.1. The type I error rate was specified at 0.05 and the reported p-values have not been adjusted for multiple comparisons.
Table 1 presents the demographic data for the two clinical trials used in this analysis. The median age for study participants was 42, although the subjects in the Modafinil study were statistically significantly younger than the subjects in the N-acetylcysteine study. Subjects were evenly split between white and black individuals; the two Hispanic subjects were included in the “white” category for purposes of analysis. Eighty percent of subjects were male, and the median number of days of cocaine use in the month preceding admission was 12. At the time of analysis, 66 subjects were enrolled in the N-acetylcysteine study and 63 in the Modafinil study, two studies with nearly identical protocols. Fifty-six percent of all subjects completed the eight-week treatment phase, meaning they attended at least one study visit during the eighth week. Overall, the study populations from these two studies are highly comparable.
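One summary commonly reported within a Mantel-Haenszel framework is the pooled odds ratio across strata, sketched below with the two trials as strata. The text does not state which Mantel-Haenszel statistic was computed, so this is only an illustration of the stratification idea; the function name is hypothetical and the counts are invented for the example, not taken from Table 1.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata.

    tables: list of (a, b, c, d) counts, one 2x2 table per stratum, e.g.
        a = females meeting the criterion, b = females not meeting it,
        c = males meeting the criterion,   d = males not meeting it.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Invented counts for two strata (one per trial); not study data.
toy_tables = [(4, 6, 5, 15), (3, 7, 4, 16)]
print(mantel_haenszel_or(toy_tables))
```

Each stratum contributes its own cross-products weighted by its size, so a difference between the trials in overall concordance does not masquerade as a gender effect.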
We examined sensitivity and specificity of SUR for all 129 subjects at baseline and at final study visit. Baseline represents first study visit, during the screening phase, and final study visit represents the final treatment visit a subject completed, excluding post-treatment follow-up visits.
In our study population, sensitivity decreased from 83.5% at baseline to 63.2% at final study visit, suggesting the need to more formally examine temporal changes using a repeated measures analysis. Equivalently, the false negative rate increased from 16.5% at baseline to 36.8% at final study visit. Specificity was 90.6% at baseline and 95.2% at final study visit.
In order to determine whether the observed changes in concordance rates from baseline to final visit were statistically significant, we constructed a series of generalized estimating equation (GEE) regression models, which regressed concordance, sensitivity, and specificity, respectively, on the number of days the subject had been enrolled in the study. The day a subject was randomized was taken to be day 0, and thus days in screening (which occurs prior to randomization) were assigned a negative value, while active treatment days were assigned a positive value. Overall, specificity (and hence the false positive rate) remained constant throughout the study, yet concordance and sensitivity declined significantly (Figure 1).
We introduce a description of concordance throughout the study, to complement the baseline and final visit “snapshots.” Overall concordance measures that utilize all self-reports and urine drug data can be calculated by using generalized estimating equation (GEE) methods to account for the variability unique to repeated measures data.
We observe the highest concordance at the beginning of the study (85.3%), lower concordance for the entire study duration, and the lowest concordance (73.6%) at the final visit. Note that all subject visits were included when calculating overall concordance measures, which naturally weighted these estimates to favor subjects who remained in the study the longest. Of our 129 subjects, 56% of subjects completed the study, meaning that they attended at least one visit during treatment week 8.
Currently, there is no general consensus in the literature defining a reliable level of concordance. Recently, Somoza et al. proposed that subjects who demonstrate an overall concordance of at least 0.70 should be considered to be “reliable.”  To assess feasibility and validity of the proposed 0.70 concordance threshold, we analyzed demographic trends among this group of subjects.
First, median concordance for all subjects was 88%, meaning that 50% of subjects achieved at least 88% concordance between their SUR data and their UDS data. The mean concordance for all subjects was 83%, indicating that concordance data for this study population are somewhat left-skewed by subjects with very low concordance. Indeed, the minimum concordance achieved by a given subject was 6%, whereas the maximum concordance was 100%. For overall concordance, we observe a significant gender effect: median male concordance is 86%, whereas median female concordance is 96% (p-value = 0.047). However, completion status, race, baseline cocaine use, and study assignment do not appear to influence concordance.
Second, 80% of subjects exhibited an overall concordance of at least 0.70 throughout their participation in the study. In comparison to the 0.70 concordance threshold, 65% of subjects achieved an overall concordance of at least 0.80, and 43% of subjects achieved an overall concordance of at least 0.90. Study assignment did significantly affect whether or not a subject met the 0.70 reliability criterion, with a higher percentage of subjects in the N-acetylcysteine study achieving at least 0.70 concordance. However, race, gender, completion status, and level of baseline cocaine use were not associated with meeting the 0.70 reliability criterion.
Finally, we were interested in classifying the ideal study participants whose SUR results were consistently in agreement with their UDS results throughout the entire study, and thus achieved a concordance level of 1.0. Twenty-six percent of all subjects achieved 100% concordance, as confirmed by UDS. Furthermore, we again observed a significant gender effect, with 42% of female subjects achieving 100% concordance in comparison to 22% of male subjects.
Overall, sensitivity of self-use reports (SUR) decreased (and the false negative rate rose) in a statistically significant fashion over the course of an eight-week pharmacological treatment study for cocaine dependence. A significant gender trend with regard to concordance of SUR with urine drug screens (UDS) was observed among all subjects, and among subjects who achieved 100% concordance. Among all subjects, median concordance among females was 96%, yet only 86% among males. Similarly, 42% of female subjects achieved 100% concordance, while only 22% of male subjects did.
The decreasing temporal trend in sensitivity of SUR and in concordance between SUR and UDS is consistent with other investigators' findings. The results of this study support the hypothesis that subjects are most motivated to report drug use accurately at the beginning of a study, and have declining motivation as the study progresses. Subjects may be more willing to admit drug use at the beginning since current dependence is a required eligibility criterion; furthermore, subjects may believe that an accurate description of their use patterns is essential to successful treatment. As treatment progresses, subjects may be motivated to deny use out of desire to show progress or in an attempt to preserve self-esteem by denying relapse. Furthermore, at study completion, subjects may be reluctant to admit use since the protective atmosphere of the treatment program is terminating. Note that an alternative explanation for a decreasing trend in concordance is that subjects with high concordance drop out early in the study, and subjects with lower concordance complete. However, we analyzed individual concordance rates and individual duration of treatment, and these variables were not strongly associated. That is, subjects with shorter treatment durations did not have statistically higher concordance rates. Thus, we dismiss this hypothesis as an explanation for the observed temporal trend.
A significant gender trend among all subjects and among subjects who achieve 100% concordance was observed; in both cases, females averaged a higher concordance than males. However, the proportion of males and females achieving at least 70% concordance did not statistically differ. Thus, care must be taken when determining a reliability threshold, since demographic differences may exist at certain concordance levels. Although 70% concordance appears robust to demographic factors in our study, it is unclear whether 70% concordance is truly clinically meaningful. Clearly, it is of clinical relevance if a subject is 100% concordant, but to what degree is partial concordance meaningful? Indeed, any adopted concordance threshold may have very little clinical relevance, and only serve as a technical method for addressing missing data. Thus, great care must be taken to establish a set threshold a priori, rather than in a post-hoc, data-driven fashion. Finally, this gender difference may be a result of variations in subject-interviewer gender pairing, a phenomenon we plan on investigating further.
In an outpatient clinical trial setting, no perfect measure for assessing illicit drug use currently exists. Since self-report and urine drug screens are the most commonly used measures to gauge illicit drug use, it is imperative to understand the limitations associated with each. The major limitation of UDS is the potential for missing data; furthermore, missing data may be more likely when the subject is using. Thus, from an analysis perspective, when UDS data are missing, it is reasonable to assume that these data are not always missing completely at random (MCAR). When missing data are not MCAR, it is possible that parameter estimates, such as treatment effect, may be biased. However, if subject characteristics (including concordance and SUR) can predict missingness in the UDS, then the missing data mechanism may be considered missing at random (MAR), in which case standard likelihood methods would yield unbiased estimates, provided that the predictive characteristics are included in the model. Thus, this research primarily aims to describe a meaningful relationship between SUR results and UDS results, as measured by concordance, with the ultimate goal of developing effective and unbiased methods to handle missing data in addiction clinical trial settings.
Several important limitations of this study deserve mention. First, due to the ongoing nature of these two blinded clinical trials, tests for a treatment group effect on concordance could not be conducted. Additionally, all research involving benzoylecgonine (BE) drug screens is confounded by the fact that the half-life of BE varies by individual, yet UDS results are interpreted in the context of a standard 72-hour detection window. As such, one must be careful to not directly associate concordance with “truthfulness.” However, discordance is a clinically relevant phenomenon that should be investigated and described in detail. As previously mentioned, the use of a standard metabolic window for BE may result in either false positives or false negatives for subjects with nonstandard metabolic rates. Additionally, in our study self-reported use is assessed in full-day units, and does not cover the fractional day elapsed during the day of treatment. Thus, if a subject has used the morning of treatment, but not the previous three days, his true self-report will be negative. Therefore, some of the false negatives may be due to study design, rather than subject inaccuracy. Finally, even if a subject achieves concordance between his urine screens and self-report, he may still be underreporting. If a subject consistently has positive urine screens and reports use once every three days, yet is using every day, he will achieve concordance yet be underreporting his use by 66%.
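The final point can be checked with a short simulation. The scenario below is hypothetical: a subject uses on every one of 30 days, reports use only on every third day, and every screen is positive. The 72-hour rule is approximated by the three full days preceding each screen.

```python
days = range(1, 31)

true_use = {d: True for d in days}          # subject uses every day
reported = {d: d % 3 == 0 for d in days}    # but reports use only every third day

# Any screen from day 4 onward has three full prior days on record. Each
# screen is positive, so it is concordant if at least one of those days
# carries a report of use.
all_screens_concordant = all(
    any(reported[d - k] for k in (1, 2, 3)) for d in range(4, 31)
)

# Fraction of true use days that go unreported.
underreporting = 1 - sum(reported.values()) / sum(true_use.values())

print(all_screens_concordant, round(underreporting, 2))
```

Every three-day window contains exactly one reported use day, so the subject is perfectly concordant while two thirds of actual use days are never reported.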
Missing UDS data compromises the results of clinical drug trials. SUR data could potentially be used as a surrogate outcome for missing UDS results; however, the accuracy of SUR may be affected by potential biases. If a standardized method for extracting reliable information regarding use patterns from SUR data could be established, this composite approach would be of great value to clinical drug study research. Missing data methods involving concordance may prove effective, yet more research is needed to describe and validate concordance-based methods.
This research was supported by grants # R01 DA019903 and # R01 DA016368 from the National Institute on Drug Abuse.