We report here on the results of a randomized, controlled trial evaluating the efficacy of a semi-automated performance improvement system (“Patient Feedback”) that enables real-time monitoring of patient outcomes in outpatient substance abuse treatment clinics. The study involved 118 clinicians working at 20 community-based outpatient substance abuse treatment clinics in the northeast United States. Ten clinics received 12 weeks of the Patient Feedback performance improvement intervention and ten clinics received no intervention during the 12 weeks. Over 1500 patients provided anonymous ratings of therapeutic alliance, treatment satisfaction, and drug/alcohol use. There was no evidence of an intervention effect on the primary drug and alcohol use scales. There was also no evidence of an intervention effect on secondary measures of therapeutic alliance. Clinician-rated measures of organizational functioning and job satisfaction also showed no intervention effect. Possible insights from these findings are discussed, and alternative methods of utilizing feedback reports to enhance clinical outcomes are proposed.
Quality improvement methods to improve the performance of clinicians working in substance abuse treatment facilities have received increasing attention in recent years (McCarty et al., in press). One method of implementing quality improvement in the mental health field has been the provision of feedback to clinicians. The value of such feedback-based performance improvement systems in the mental health field has been demonstrated in a series of randomized clinical trials conducted by Lambert and colleagues. These studies have shown that providing feedback to mental health clinicians about the progress of individual patients can improve outcomes compared to not providing feedback (Lambert, Hansen, & Fitch, 2001; Lambert et al., 2003; Lambert, Whipple et al., 2001; Hawkins, Lambert et al., 2004; Harmon, Lambert et al. 2007; Whipple, Lambert et al. 2003).
In the addiction field, various kinds of performance improvement methods have been implemented in clinical settings. These include a Methadone Treatment Quality Assurance System that provided performance improvement reports on a quarterly basis to supervisors in 70 Veterans Affairs (VA) clinics (Ducharme & Luckey, 2000; Phillips et al., 1995) and the Quality Enhancement Research Initiative, which provided performance monitoring, feedback, and dissemination of best practice guidelines to administrators and clinicians in VA settings (Finney, Willenbring, & Moos, 2000). Another outcomes assessment system, described for use in VA substance abuse clinics, has the potential to provide feedback at the program level by collecting baseline and six-month follow-up data (Tiet et al., 2006). One study has been published evaluating performance improvement systems for substance abuse counselors. In that study, providing feedback reports on patient attendance data to clinicians in a substance abuse treatment clinic resulted in improvements in attendance (McCaul & Svikis, 1991).
We have previously reported on the feasibility of implementing a comprehensive performance improvement system in outpatient substance abuse treatment clinics (Forman et al., 2007). This system, called Patient Feedback (PF), employs near real-time monitoring of therapeutic alliance and treatment satisfaction by clinicians and supervisors working in outpatient substance abuse treatment clinics. Therapeutic alliance was chosen because it consistently predicts the outcome of psychotherapy and counseling (Martin, Garske, & Davis, 2000) and has also been found to be associated with outcome in substance abuse settings (Gillaspy, Wright, Campbell, Stokes, & Adinoff, 2002). Treatment satisfaction was chosen because it is commonly an element of quality monitoring systems in addiction treatment programs (National Treatment Center Study, 2005). By monitoring therapeutic alliance and treatment satisfaction, the PF system is designed to give clinicians an interim view of average outcomes across their full caseload of patients so that they can make modifications if needed. We also hypothesized that regular assessment of alliance and treatment satisfaction might influence clinician behavior by signaling that alliance and satisfaction are priorities of the organization (Alvero et al., 2001; Berwick et al., 1990; Nicol & Hantula, 2001).
The purpose of the current article is to report on the results of a randomized controlled trial evaluating the efficacy of the PF system on both patient and clinician outcomes. We hypothesized that utilization of the PF system would result in greater improvements in average patient drug and alcohol use, attendance at group counseling sessions, and alliance. It was also hypothesized that use of the PF system would be associated with more favorable clinician views of their organization, as reflected in perceived increases in the organization’s motivation for change, adequacy of institutional resources, staff attributes (e.g., potential for professional growth, confidence in counseling skills, adaptability), organizational climate, quality of supervisor-employee relations, and clinician job satisfaction.
The study was a randomized, 12-week, controlled trial conducted in community-based substance abuse treatment clinics. Data collection began in January 2007 and ended in October 2008. Participating clinics were randomly assigned to use the PF performance improvement system either immediately (PF group) or after a delay (control group). Clinics randomized to the delayed group had subsequent access to the PF system after the 12-week study period.
The study took place in 20 community-based non-methadone maintenance outpatient substance abuse treatment clinics in the Philadelphia and New York areas. To be eligible to participate in the study, the clinics had to have at least four clinicians who were currently conducting group counseling sessions (at least once a week) and were able to attend a monthly staff team meeting. Clinics also needed to have internet access for their clinicians and supervisors. All study materials, procedures, and consent forms were approved by all relevant Institutional Review Boards for each participating clinic.
Group counseling was the primary clinical modality at all participating clinics, with all patients expected to participate in the groups. All participating clinics had clinical policies for the regular implementation of biological testing, and typically also specifically tested patients who were suspected of using drugs or alcohol.
To be eligible for participation in the study, clinicians had to be conducting group counseling sessions on a weekly basis in one of the 20 enrolled clinics. Clinicians were considered the human subjects in this study, thus they provided written informed consent.
To be eligible to participate in the study, patients had to be receiving group counseling for substance abuse problems at one of the 20 clinics enrolled in the study. Informed consent by patients was not required by the participating Institutional Review Boards because the patient survey was anonymous and there was minimal perceived risk. However, patients were oriented to the study and their participation was voluntary. During a given study week, all patients who were attending participating clinicians’ eligible group counseling sessions during that week were recruited to complete the PF survey regardless of how long they had been in treatment. The Week 1 (baseline) study assessment was therefore not necessarily at the beginning of the course of treatment for each patient. Additionally, at each subsequent assessment point during the study, the group of patients within each clinic could be different from that of the prior assessment because former patients may have ended treatment and new patients may have been admitted.
The Patient Feedback Survey form is a one-sided sheet of paper with optical character recognition marks, instructions to the respondent, and 14 survey items. The PF Survey was originally designed in English, and additional Spanish and Polish translations of the survey were distributed as needed in this study for selected clinics. The PF Survey was developed in accordance with QI indicator development guidelines (JCAHO, 1998; Wilkinson et al., 1998; Meyer, 1994) and consisted of four item domains: a) therapeutic alliance; b) group treatment satisfaction; c) demographics; and d) self-reported substance use.
The therapeutic alliance scale (Crits-Christoph et al., 2004) was a brief four-item scale derived from the California Psychotherapy Alliance Scale (CALPAS; Gaston, 1991), with each of the items assessing one of four main dimensions of the alliance (bond, agreement on tasks, agreement on goals, therapist understanding). The total of the four items was found to correlate 0.80 with the CALPAS total score deleting the four items (Crits-Christoph et al., 2004). Using Week 1 scores from the current study, the brief alliance scale was found to have an internal consistency (Cronbach’s alpha) of 0.87. Each item was rated on a 5-point scale from 1 (not at all) to 5 (very much so).
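The internal consistency figure reported above can be made concrete with a short sketch. The code below computes Cronbach’s alpha for four item-score columns; the ratings are hypothetical examples of the 1-to-5 alliance items, not data from the study.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for k item-score columns over the same respondents.

    items: list of k lists, each holding one item's scores across n respondents.
    """
    k = len(items)
    # Sample variance of each item across respondents
    item_vars = [variance(col) for col in items]
    # Variance of each respondent's total (sum) score
    totals = [sum(scores) for scores in zip(*items)]
    total_var = variance(totals)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings on the four 1-to-5 alliance items (6 respondents)
items = [
    [5, 4, 3, 5, 4, 2],  # bond
    [5, 4, 3, 4, 4, 2],  # agreement on tasks
    [4, 5, 3, 5, 3, 1],  # agreement on goals
    [5, 4, 2, 5, 4, 2],  # therapist understanding
]
print(round(cronbach_alpha(items), 2))  # 0.95 for these made-up ratings
```

Alpha rises when the items covary strongly relative to their individual variances, which is why highly correlated items such as these yield a high value.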
Demographic and length of stay items were incorporated into the survey so that feedback reports could display data separately by patient gender, ethnicity, and length of stay in the treatment program. For feedback reports, length of stay was categorized as less than one month, one to three months, or greater than three months. Separate displays of clinician performance based on each of these demographic categories were provided so that clinicians and supervisors could focus their performance improvement efforts on specific patient subgroups.
Two substance use items were included: number of days (in past 7) of alcohol use and number of days (in past 7) of drug use. These two substance use items were adapted from the Drug and Alcohol sections of the Addiction Severity Index (McLellan et al., 1980). In contrast to the other PF items, responses to these items were not provided to supervisors or clinicians; instead, they were used in our appraisal of the impact of the PF system (i.e., as primary outcome measures).
Prior to randomization, clinic supervisors and clinicians were informed of the study procedures and the responsibilities they would have if they chose to participate. The clinic supervisors and clinicians then provided consent for participation. Each program designated a project assistant whose task it was to facilitate communication between clinic staff and research personnel as well as assist with some weekly research administrative responsibilities.
Once consent was obtained, clinics were randomly assigned to either the PF or control study group. Random assignment was stratified based on the clinic size. Random assignment of clinics instead of clinicians was necessary in order to allow clinics to function as teams.
Following randomization, site initiation began. During this time, clinicians and supervisors from both the PF and control groups completed the baseline staff measures (described below). The clinic supervisor and the project assistant were trained on the project procedures (including orientation of patients, collection of attendance data, survey distribution to patients, and survey collection and faxing to the data center).
Participating patients at the clinics in the delayed (control) condition completed the PF survey during Week 1 and then again at Week 12 (not during Weeks 2–11). In addition, patient attendance data was collected by the project assistant for all 12 weeks of the delay. At the end of the delay, clinicians in the delayed condition were asked to complete the follow-up staff measures. Following the 12-week delay, the clinics assigned to this study condition received the PF intervention.
Clinics that were randomly assigned to the PF (immediate) group completed the site initiation procedures as described above. This process was followed by clinician training in use of the PF system. This training was conducted via an internet-based conference. Clinic staff viewed a web-based presentation of slides while listening to the oral presentation over the telephone. The internet-based conference provided training on the PF System and included: 1) an outline of the study with a review of study procedures; 2) the principles of quality improvement as they apply to individuals and organizations; 3) a discussion of the importance of the confidentiality of feedback; 4) a review of PF survey item content; 5) an explanation of PF Report interpretation utilizing sample data; 6) the careful consideration of the strategies and goals of the Team Meeting; 7) a description and review of the PF Manual; 8) a tour of the PF Website; and 9) a sample of the Patient Feedback Newsletter to preview. Once the training was complete, a start date for Week 1 survey distribution was agreed upon. At this point, clinics in the PF (immediate) group would proceed with patient orientation, survey distribution, survey collection and weekly faxing to the data center; attendance data collection and calculation; and viewing feedback reports at the individual clinician and clinic levels. This occurred each week for the 12 weeks of the intervention. During this period, team meetings were scheduled to occur once every four weeks, around Weeks 3, 7, and 11. During these meetings, supervisors reviewed treatment strategies, set goals for improvement, and utilized the clinic reports to stimulate discussion. Following the 12 weeks of intervention, the clinic staff completed the follow-up clinician measures.
Data were extracted from each clinic’s administrative record to provide information on attendance of patients within each clinician’s caseload of participating patients. Project assistants entered those data onto an attendance form that was then faxed to the data center. The primary attendance measure for each clinician was the ratio of the total number of patients who attended group sessions to the total number of patients scheduled to attend group sessions each week. These attendance data were collected throughout the 12-week study period for both the PF and control groups.
Potential patient participants were approached by their clinician or the project assistant and were oriented via a written document to the nature of the study. Patients were made aware that participation was voluntary and fully confidential (no patient names or code numbers were on the survey). Guidelines for completing the survey were provided. Patients were informed that the surveys were maintained in a confidential manner and stored in a locked box, to which only the project assistant and research staff (not the clinicians) had access.
The PF survey was distributed to all eligible patients at the end of the group counseling sessions (weekly in the PF group; Weeks 1 and 12 for the delayed group). The procedures dictated that the clinician would not be present while patients completed the survey. Clinicians instructed patients to only complete one survey per week if the patient attended more than one group in a week with that clinician. Occasionally, patients may have attended two (or more) groups led by different clinicians in a given week, and therefore may have completed more than one survey that week.
The project assistant collected the PF surveys, completed the attendance forms, and faxed the material to the central data center for the study. The faxed surveys and attendance forms were read by optical scanning software on a fax server, and the data were automatically entered into the database. Custom software analyzed the data and calculated scale scores for the therapeutic alliance scale (averaging item scores for each scale). These data were then plotted on time-series graphs as part of the PF report (see below).
Data gleaned from the PF survey enabled the PF data system to automatically produce two types of feedback reports: 1) clinic reports that aggregated PF survey data from the entire clinic (from all participating clinicians); and 2) caseload reports that represented data from individual clinician caseloads. The feedback reports included both graphs and tables of the data for easy viewing. Each report was posted for viewing on the internet within one week (usually the same day as the faxed data was received). Clinicians and supervisors had password-protected access to the reports. To protect clinician confidentiality, supervisors were not given access to the individual clinician caseload reports; they only had access to the clinic reports. The feedback reports were not intended to enable supervisors to evaluate individual clinician’s performance but instead to be utilized as a performance improvement device for the clinic as a whole. Caseload reports could be viewed by the relevant individual clinician so that the clinician could assess for themselves areas of strength or weakness.
As the intervention progressed through the 12 weeks, data were added to the graphs and tables to develop a time-series presentation. Graphs included in the feedback reports were: 1) therapeutic alliance by length of stay; 2) therapeutic alliance by ethnicity; 3) therapeutic alliance by gender; 4) treatment satisfaction by length of stay; 5) treatment satisfaction by ethnicity; 6) treatment satisfaction by gender; and 7) percent of actual to scheduled attendance. To protect patients’ confidentiality, a threshold was established for a minimum number of patient surveys that would be displayed in the feedback reports. If fewer than five patient surveys were available for a given week’s graph/table, that graph/table was not produced so as to prevent a clinician from connecting the data to a particular patient. A similar level of protection was applied for the clinicians, assuming that if too few clinicians provided patient data, the supervisor could possibly surmise which clinician provided data of interest. If fewer than four clinicians contributed data for a given clinic report, that feedback would not be provided at the clinic level for that week. In addition to the reports already described, graphed data were also provided, representing average patient attendance data for each clinic and clinician.
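In rough pseudocode terms, the confidentiality rules described above amount to two threshold checks. The sketch below is illustrative only; the function names and constants follow the text, but this is not the study’s actual reporting software.

```python
# Report-suppression thresholds as described in the text:
# a caseload graph is withheld if fewer than 5 patient surveys arrived that
# week; a clinic report is withheld if fewer than 4 clinicians contributed.
MIN_SURVEYS_PER_GRAPH = 5      # protects patient confidentiality
MIN_CLINICIANS_PER_CLINIC = 4  # protects clinician confidentiality

def show_caseload_graph(n_surveys: int) -> bool:
    """True only if enough surveys exist to prevent identifying a patient."""
    return n_surveys >= MIN_SURVEYS_PER_GRAPH

def show_clinic_report(n_contributing_clinicians: int) -> bool:
    """True only if enough clinicians contributed to mask any one caseload."""
    return n_contributing_clinicians >= MIN_CLINICIANS_PER_CLINIC

print(show_caseload_graph(4))  # False: graph withheld that week
print(show_clinic_report(4))   # True: clinic report produced
```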
Clinicians and supervisors were trained during the internet-based conference to read the graphs, and they were encouraged to utilize the PF Manual for additional guidance when considering their reports. Clinicians and supervisors alike were encouraged to review the clinic report as a team during their monthly meetings; individual clinicians were encouraged to examine their own caseload reports, and if desired, they could discuss their caseload reports with other clinicians and/or their supervisor on a voluntary basis. Clinic report data could be utilized by supervisors to inform decisions about staff training, supervision, and resource distribution.
Monetary incentives were provided for accessing the feedback reports. Because many clinics do not allow individual clinicians to receive compensation from outside entities, the incentive was provided to the clinic. The PF website was programmed so that each time a caseload report was downloaded for the first time, the clinician was notified of having earned $1, $5, $20, $50 or $100 for the clinic. In total, clinicians could each earn up to $100 for the clinic if they accessed every available caseload report during the study. The total rewards earned by all clinicians were aggregated into a total clinic reward sum that appeared on the PF website homepage for all clinicians at a particular clinic to see. At the end of the study, the clinical team could decide how to spend the clinic reward money.
As previously mentioned, clinic supervisors led monthly team meetings during which clinic staff viewed and discussed the clinic reports. Staff identified quality indicators that needed attention (e.g., ratings of therapeutic alliance by patients in treatment for less than one month), action steps they intended to engage during the next month, and who would be responsible for implementing the action steps. A PF Team Meeting form was completed by the supervisor during the meeting, documenting the participants in the team meeting, the indicators selected, and the action steps planned. The forms were faxed each month along with the feedback surveys to the Data Management Unit. The meetings were structured according to JCAHO quality improvement guidelines (JCAHO, 1998); the PF Manual and internet-based conference training sessions provided instruction to the staff on how to conduct and benefit from these meetings.
Each month, an electronic newsletter was published by the research team. It was emailed to the participating clinic staff at all 20 clinics and posted on the website (only accessible to participants who were active in the intervention phase). This monthly newsletter, Patient Feedback Newsletter, accomplished the following: 1) highlighted the accomplishments of participating clinic teams; 2) encouraged the use of evidence-based practices; and 3) disseminated information about quality improvement and the PF System. In each issue of Patient Feedback Newsletter, team innovations and the accomplishments of individual clinics were recognized. Anecdotal information and opinions derived from interviews with clinic staff along with team photographs and quotes were included in cover stories (all with permission) in an effort to share the different experiences the clinics had with each other.
All participating clinic supervisors and clinicians completed the clinician measures before randomization and again after the 12-week study period (intervention or delayed) was completed.
The ORC assesses organizational characteristics along four major domains, comprised of 18 subscales of organizational readiness to change (Lehman, Greener, & Simpson, 2002). The four major domains are motivation for change (perceived program needs, training needs, and pressure for change, with higher scores indicating greater needs/pressure for change), institutional resources (adequacy of office space, staffing, training resources, computer access, and use of e-mail and Internet; higher scores indicate greater availability of resources), staff attributes (potential for professional growth, confidence in counseling skills, ability to influence coworkers, adaptability; higher scores indicate more positive staff attributes), and organizational climate (clarity of mission and goals, staff cohesiveness, trust and cooperation among staff, staff autonomy, management’s openness to communication from staff, perceived stress, and openness to change; higher scores indicate a more favorable organizational climate). The reliability and validity of this self-administered measure was established in a study involving over 500 clinicians from more than 100 programs (Lehman, Greener, & Simpson, 2002).
The LMX-7 is a 7-item widely-used self-administered instrument designed to measure the quality of the working relationship between supervisors and employees (Graen & Scandura, 1985; Graen and Uhl-Bien, 1995). A total score of the seven items is computed, with a higher score indicating a more positive working relationship.
The MSQ is a widely-used measure of job satisfaction (Weiss, Dawis, English, & Lofquist, 1967). The short form contains 20 self-administered items using a 5-point scale. The responses are scored to produce extrinsic, intrinsic and general satisfaction scores. High scores indicate greater satisfaction.
The primary efficacy measures for the study were the drug and alcohol use scores from the PF survey at Week 12. These scores were nested within clinicians, who were in turn nested within the clinics. The primary analysis included only patients who had been in treatment for one month or less. This specification was based on the results of the feasibility study (Forman et al., 2007), which indicated that for patients who had been in treatment longer than one month, drug/alcohol use was very low. These floor effects were less pronounced in the feasibility study among patients who were in treatment one month or less.
Secondary analyses examined (1) all patients regardless of treatment duration, and (2) only those patients who were present for both the Week 1 and Week 12 assessment (although patients did not put a name or identifying number on the survey, we were able to uniquely identify a subset of patients who completed surveys based on their month and year of birth and demographic variables).
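The record-linkage step in (2) can be sketched as building a composite key from month/year of birth plus demographic variables and retaining only keys that are unique within both assessment waves, so that a match is unambiguous. The field names and records below are hypothetical, not the study’s actual linkage code.

```python
from collections import Counter

def key(survey):
    """Composite pseudo-identifier from birth month/year plus demographics
    (hypothetical field names)."""
    return (survey["birth_month"], survey["birth_year"],
            survey["gender"], survey["ethnicity"])

def link_waves(week1, week12):
    """Pair Week 1 and Week 12 surveys whose composite key appears exactly
    once in each wave; duplicated keys are dropped as ambiguous."""
    c1, c12 = Counter(map(key, week1)), Counter(map(key, week12))
    unique = {k for k in c1 if c1[k] == 1 and c12.get(k) == 1}
    return [(s1, s12) for s1 in week1 for s12 in week12
            if key(s1) == key(s12) and key(s1) in unique]

# Hypothetical records: one patient present at both assessments
week1 = [{"birth_month": 3, "birth_year": 1970, "gender": "M",
          "ethnicity": "Caucasian", "alcohol_days": 2}]
week12 = [{"birth_month": 3, "birth_year": 1970, "gender": "M",
           "ethnicity": "Caucasian", "alcohol_days": 0}]
print(len(link_waves(week1, week12)))  # 1 matched pair
```

Requiring uniqueness in both waves trades recall for precision: some genuine repeaters are lost, but no record is linked to the wrong person.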
The primary drug and alcohol use measures revealed a preponderance of zeros (> 80%). To account for this high level of zeros, a mixed-effects, zero-inflated Negative Binomial (ZINB) model was used for the analysis of the primary outcomes (Hedeker & Gibbons, 2006; Lambert, 1992). The ZINB model, which is an extension of the more familiar zero-inflated Poisson (ZIP) model, specifies the conditional distribution as Negative Binomial and is thereby a non-linear analog to the linear mixed-effects, repeated-measures model. There are two components to the model: a logistic regression component to account for the zero state and a Negative Binomial regression component to account for the non-zero state. Unlike the Poisson regression model, the Negative Binomial regression model relaxes the assumption that the mean equals the variance; violations of this assumption are referred to as overdispersion in the Poisson regression and ZIP models (Lambert, 1992; Atkins and Gallop, 2007). To implement the model with our data, we initially included a random intercept for clinic, and a random intercept and slope for clinician within clinic. The fixed-effects part of the model included the binary intervention group factor. In addition, the models examining study group differences in Week 12 scores included clinicians’ baseline (Week 1) level on the respective drug or alcohol use item as a covariate. Comparable ZINB models were conducted on the Week 1 scores alone to assess any baseline differences between the study groups. To fit these models, we used SAS PROC NLMIXED in SAS 9.1.3 (Littell et al., 2006). Initial analyses using the ZINB models described above revealed that the variance component for the clinic term went to zero. The clinic term was therefore dropped from all analyses.
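The analyses themselves were fit in SAS PROC NLMIXED; as an illustrative sketch of the distribution underlying these models, the Python code below defines the ZINB probability mass function, combining a logistic (structural-zero) component with a Negative Binomial count component. The parameter values are hypothetical and the mixed-effects machinery is omitted.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative Binomial pmf with mean mu and dispersion alpha > 0.

    Variance is mu + alpha * mu**2, relaxing the Poisson's
    mean = variance constraint (alpha -> 0 recovers the Poisson).
    """
    r = 1.0 / alpha  # "size" parameterization
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(y, pi, mu, alpha):
    """Zero-inflated NB: with probability pi the count is a structural zero
    (the logistic component); otherwise it follows the NB component."""
    if y == 0:
        return pi + (1 - pi) * nb_pmf(0, mu, alpha)
    return (1 - pi) * nb_pmf(y, mu, alpha)

# Hypothetical values: 70% structural zeros, mean 2 days of use among
# users, moderate overdispersion.
pi, mu, alpha = 0.7, 2.0, 0.5
total = sum(zinb_pmf(y, pi, mu, alpha) for y in range(200))
print(round(total, 6))                                    # 1.0: valid pmf
print(zinb_pmf(0, pi, mu, alpha) > nb_pmf(0, mu, alpha))  # True: zeros inflated
```

The two components mirror the two tests reported in the Results: the logistic part compares any use vs. no use, while the NB part compares the amount of use among those reporting any.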
The distribution of the alliance scores was also non-normal, with a relatively high proportion (about 35%) of scores at the maximum (5 on the 1-to-5 scale). A Box-Cox power transformation (Box & Cox, 1964) revealed that a cube transformation was most appropriate to address the non-normal distribution. The transformed Week 12 scores were analyzed using a mixed effects model with a nested random effects structure (clinician within clinic). All patients (i.e., not only those selected based on treatment duration) were included in these analyses. Week 1 scores were used as covariates. PROC MIXED in SAS was used for these analyses. The attendance measure was collected at each of the 12 weeks of the study and therefore an additional level of repeated measures over time was included in the mixed effects analysis of this variable. The clinician measures (ORC, LMX-7, MSQ) at Week 12 were analyzed using analyses of covariance with baseline levels and treatment group as terms in the model.
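The Box-Cox selection step can be illustrated with a small sketch: the transform family and its profile log-likelihood (Box & Cox, 1964) are evaluated over a few candidate values of lambda, and the maximizing value is chosen. The scores below are hypothetical; for data piled near the scale maximum like these, a cube (lambda = 3) transform maximizes the criterion, mirroring the choice reported above.

```python
import math
from statistics import pvariance

def boxcox(x, lam):
    """Box-Cox power transform for x > 0; lam = 0 gives the log."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam

def profile_loglik(data, lam):
    """Profile log-likelihood of the normal model for a given lambda
    (constants dropped); the best lambda maximizes this."""
    n = len(data)
    z = [boxcox(x, lam) for x in data]
    # Jacobian term plus -n/2 * log of the ML residual variance
    return ((lam - 1) * sum(math.log(x) for x in data)
            - 0.5 * n * math.log(pvariance(z)))

# Hypothetical alliance-like scores piled near the 5-point maximum
data = [5, 5, 5, 4.75, 4.5, 4.5, 4.25, 4, 3.5, 3, 2.5, 2]
best = max([0, 1, 2, 3], key=lambda lam: profile_loglik(data, lam))
print(best)             # 3: the cube transform wins for these scores
print(boxcox(2, 3))     # (2**3 - 1) / 3
```

A cube transform stretches out the values bunched at the top of the scale, pulling the ceiling-heavy distribution toward normality before the mixed-effects analysis.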
Across the 20 clinics, there were a total of 123 clinicians who qualified to participate in the study as a treating clinician (additional clinicians functioned as supervisors but not as treating clinicians). Of these 123, 118 (96%) consented to participate in the study. Within the sample of 118 clinicians, 72.1% (85/118) were female and 12.0% (14/117) reported their ethnicity as Hispanic or Latino. The clinicians’ racial demographics were 74.4% (87/117) Caucasian, 18.8% (22/117) African-American, 0.9% (1/117) Bi-racial, and 6.0% (7/117) other. The average age of the participating clinicians was 41.9 (with a range of 22 to 78). In terms of experience, 2.5% (3/118) of the clinicians had worked 0–6 months, 11.9% (14/118) had worked 6–11 months, 26.3% (31/118) had worked 1–3 years, 11.0% (13/118) had worked 3–5 years, and 48.3% (57/118) had worked more than 5 years as a clinician. In terms of highest level of education, 1.7% (2/118) of the clinicians had a high school diploma, 9.3% (11/118) had an associate’s degree, 32.2% (38/118) had a bachelor’s degree, 55.0% (65/118) had a master’s degree, and 1.7% (2/118) had a doctoral degree.
In the PF group, there were 11 clinicians (out of 66) who began the trial but did not complete the study; in the delayed group, there were 9 (of 52) who dropped out before completing the study. In addition, six clinicians in the PF group and one clinician in the delayed group were consented for the study but were not treating patients during study Week 1 (e.g., due to vacations, illnesses, etc.); however, they did treat patients subsequent to Week 1 and were included in analyses of attendance data and change in clinician measures.
At the Week 1 (baseline) assessment, the mean alliance score for clinicians was 4.27 (SD = 0.83), reflecting very good alliance (on the 1 to 5 scale) for the average patient in clinicians’ caseload.
The anonymous nature of the PF survey and changing patient samples over time preclude an exact description of all of the patients who participated. However, using data obtained from the Week 1 assessment only (N = 1,584), the general nature of the patient population could be summarized on the available descriptors. The patient sample was approximately 66% men and 34% women; 49% identified themselves as Caucasian, 38% African-American, 10% Latino, and 3% other. In terms of treatment duration, 7% of patients had been in treatment less than one week; 22% from one to four weeks; 26% from one to three months; and 45% more than three months.
At study Week 1, focusing only on those patients who had been in treatment for one month or less, the logistic component of the ZINB model indicated that there was no significant difference between the study groups in any usage vs. no usage for alcohol (t(234) = −0.001, p = 0.99) or drug use (t(232) = 1.29, p = 0.20). The PF group had 81.9% of patients reporting no use in the past seven days at Week 1, while the delayed group had 74.5% of patients reporting no use in the past seven days at Week 1. There was, however, a significant difference in the amount of use between the study groups (Negative Binomial regression component of the ZINB model) for those reporting any alcohol use (t(234) = −2.17, p = 0.031). For those with any alcohol use, the PF group had a mean alcohol use of 1.76 (SD = 1.21) days in the past week and the delayed group had a mean alcohol use of 2.34 (SD = 1.74) days in the past week. The study groups did not differ significantly (t(232) = 0.29, p = 0.77) in the average amount of drug use at Week 1 for those patients with any drug use (PF: 2.78 days [SD = 2.1]; Delayed: 2.63 days [SD = 2.07]).
Analysis that included all patients at Week 1, regardless of treatment duration, revealed that the study groups differed significantly on both the alcohol (t(609) = 2.19, p = 0.029) and drug use items (t(602) = 4.88, p < 0.0001) in regard to any use vs. no use. In the PF group, 88.0% of patients reported no alcohol use and 91.4% reported no drug use. In the delayed group, 82.4% reported no alcohol use and 82.4% reported no drug use.
These baseline differences between the treatment groups, and low overall levels of drug and alcohol use, prompted us to examine clinic variability in patient report of alcohol and drug use. For those patients in treatment one month or less, there was a significant difference among the clinics in the percent of patients reporting use vs. no use of both alcohol (χ²(15) = 40.5, p < 0.0004) and drugs (χ²(15) = 29.7, p = 0.013). A similar between-clinic effect was evident when all the data were examined (regardless of duration of treatment) for both use vs. no use of alcohol (χ²(18) = 130.6, p < 0.0001) and drugs (χ²(18) = 119.91, p < 0.0001).
The average number of days of use for each of the 20 participating clinics (all patients), shown in Table 1, suggested that several clinics were outliers on these scales, either displaying considerably higher use than most clinics or, in the case of three clinics, displaying no drug use for any participating patients. The inclusion of clinics with no drug use at baseline presented an obvious problem: such clinics could show no improvement on this measure. No drug use within a clinic also likely contributed to the confounding of clinician and clinic (i.e., all clinicians within these clinics had the same scores) that led the variance components for the clinic factor to go to zero in the ZINB mixed effects models. Therefore, additional exploratory analyses of change in drug and alcohol use were conducted with the outlier clinics removed. The outliers included one clinic (Clinic 16 in Table 1) that had no drug use for any patients at Week 1 and three clinics that had relatively much higher drug or alcohol use than the other clinics (Clinics 2, 9, and 10 in Table 1).
In the primary sample (those in treatment for one month or less at Week 1), there were no significant differences between the PF and delayed groups on alcohol use for the use vs. no use logistic portion of the ZINB model (t(82) = 1.35, p = 0.18) or for the ZINB test of the amount of use among those using alcohol (t(82) = 0.32, p = 0.75). Similarly, for drug use, neither the logistic parameter (t(82) = 1.01, p = 0.32) nor the test of amount of use among those that use (t(82) = −0.13, p = 0.90) was statistically significant. In the PF group, the average alcohol abstinence rate increased from 81.9% at Week 1 to 85.0% at Week 12; in the delayed group, alcohol abstinence increased from 74.5% to 75%. Abstinence from drugs decreased slightly from 89.8% (Week 1) to 87.2% (Week 12) in the PF group; the delayed group increased from 75.7% to 80.1%.
The secondary analyses that included all patients (regardless of treatment duration) also revealed non-significant logistic components in the ZINB models for both alcohol (t(94) = 1.66, p = 0.10) and drug use (t(94) = 1.69, p = 0.095). The tests for amount of use among those who used were not significant for either alcohol (t(94) = −0.32, p = 0.75) or drugs (t(94) = −1.02, p = 0.31). In the PF group, Week 1 abstinence rates were 88.0% (alcohol) and 91.4% (drug), and Week 12 abstinence rates were 91.2% (alcohol) and 92.9% (drug). The comparable rates for the delayed group were 82.4% (alcohol) and 82.4% (drug) at Week 1, and 83.0% (alcohol) and 85.0% (drug) at Week 12.
We also conducted analyses on a subset of patients who could be uniquely identified using site, gender, ethnicity, month of birth, and year of birth and who completed the PF survey at both Week 1 and Week 12. For this subgroup (n = 92), there was no evidence that the intervention group had greater change on alcohol use (use vs. no use: t(108) = −0.89, p = 0.38; amount of use among those using: t(108) = −0.72, p = 0.47) or drug use (use vs. no use: t(180) = 0.01, p = 0.99; amount of use among those using: t(180) = −0.01, p = 0.99).
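Re-identifying the same anonymous respondent across waves, as described above, amounts to matching on a composite key and discarding keys that are not unique within a wave. A hypothetical sketch (the record structure and field names are invented for illustration):

```python
from collections import Counter

# Hypothetical anonymous survey records keyed by site, gender, ethnicity,
# and month/year of birth, mirroring the matching fields described above.
week1 = [
    {"site": 3, "gender": "F", "ethnicity": "H", "mob": 4, "yob": 1975, "days_alcohol": 2},
    {"site": 3, "gender": "M", "ethnicity": "W", "mob": 9, "yob": 1981, "days_alcohol": 0},
    {"site": 7, "gender": "M", "ethnicity": "B", "mob": 1, "yob": 1990, "days_alcohol": 5},
]
week12 = [
    {"site": 3, "gender": "F", "ethnicity": "H", "mob": 4, "yob": 1975, "days_alcohol": 0},
    {"site": 7, "gender": "M", "ethnicity": "B", "mob": 1, "yob": 1990, "days_alcohol": 3},
]

def key(rec):
    return (rec["site"], rec["gender"], rec["ethnicity"], rec["mob"], rec["yob"])

def unique_index(records):
    """Map composite key -> record, dropping keys shared by more than one respondent."""
    counts = Counter(key(r) for r in records)
    return {key(r): r for r in records if counts[key(r)] == 1}

idx1, idx12 = unique_index(week1), unique_index(week12)
matched = [(idx1[k], idx12[k]) for k in idx1.keys() & idx12.keys()]
print(len(matched))  # number of uniquely matched respondents
```

Dropping non-unique keys trades sample size for confidence that the same person is being compared across waves, which is one reason the matched subgroup (n = 92) is much smaller than the full sample.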
Analyses that excluded outlier sites, whether or not restricted to the subgroup of patients who were in treatment for one month or less, also revealed no significant differences between the treatment groups in drug or alcohol use outcomes.
There was no significant difference between the PF and delayed groups on Week 12 alliance scores (F(1,115) = 0.34, p = 0.56) in the full sample of patients. The mean alliance scores at Week 12 were 4.40 (SD = 0.84) for the PF group and 4.31 (SD = 0.85) for the delayed group. Similarly, no significant study group differences (F(1,104) = 2.11, p = 0.15) were evident in analyses conducted with the subsample of patients present at both Weeks 1 and 12.
The attendance data revealed no indication that the PF system had an impact (Table 2). There was no significant overall time effect (F (1,120) = 3.08, p = 0.08), no significant effect for intervention group (F(1,120) = 2.33, p = 0.13), and no significant time by treatment interaction (F(1,120) = 0.06, p = 0.81) for attendance.
There were no significant differences between the study groups in change from baseline to Week 12 in the LMX-7 total score or in the three scales of the MSQ (Table 3). T-tests examining change from pre- to post-treatment within each treatment group revealed no significant improvements in either group on the LMX-7 or MSQ scales.
Analysis of the four overall dimensions from the ORC revealed one significant difference between the study groups in change from baseline to Week 12 (Table 3). This significant difference was on the Institutional Resources scale. However, this effect was due to a small mean decrease in the PF group (indicating less availability of resources at Week 12 compared to Week 1) combined with a small increase in the delayed group.
The overall finding of this study was that there is no evidence for the effectiveness of the PF system. The PF intervention group did not show significantly greater improvement compared to the delayed condition (control group) on clinicians’ average levels of patient-reported drug and alcohol use, therapeutic alliance, or patient attendance at group counseling sessions. In addition, the PF intervention did not improve (either over time or relative to the delayed treatment control group) clinicians’ views of their job satisfaction, the quality of the working relationship between supervisors and clinicians, or their organization’s readiness to change. Thus, the current study casts doubt on the clinical use of the PF system and other performance improvement systems that are similar in design to that used in this study.
One of the major reasons for the lack of intervention effects on drug/alcohol use and alliance may have been the lack of room for improvement on these measures. In general, over 75% of patients reported no drug or alcohol use at baseline (Week 1). Based on our feasibility study (Forman et al., 2007), we anticipated this finding and accordingly restricted our primary outcome analyses to those patients who had been in treatment for one month or less, among whom greater drug and alcohol use was expected. However, even with this restriction, the proportion of patients with no drug or alcohol use at Week 1 was very high. Variability across clinics on these measures was also apparent, with some clinics showing near-zero use for all patients and a few other clinics displaying much higher than average use. The relative lack of drug and alcohol use among patients in outpatient substance abuse treatment programs is a largely unrecognized aspect of the community-based treatment system. Many community-based clinical trials of substance abuse treatments enroll only patients who are currently using drugs or alcohol at baseline, and therefore these studies have not reported such low levels of use (e.g., Ball et al., 2007; Carroll et al., 2006; Pierce et al., 2006; Petry et al., 2005).
A variety of factors are likely responsible for such low drug and alcohol use. One is that some patients are mandated to drug treatment after a legal problem and are already drug-free before starting treatment, with staying drug-free a requirement of their mandate. Other individuals may have been convicted of driving under the influence and required to participate in outpatient treatment despite not being regular heavy drinkers. An overriding factor is the nature of substance use disorders: most patients can readily stop using for brief periods; treatment is needed more to sustain recovery over the longer term. Many patients may enter treatment following a binge, rapidly become abstinent, and then stay in treatment in order to prevent future drug/alcohol use. A final factor may have been the self-report nature of the drug and alcohol use measures. Although relatively high concordance between self-report and biological assessments of drug and alcohol use has been seen in clinical trials (e.g., Ball et al., 2007; Dillon et al., 2005; Crits-Christoph et al., 1999; Project MATCH Research Group, 1997), there have also been reports of relatively low concordance in clinical practice settings (Chermack et al., 2000; Ehrman et al., 1997; Harris et al., 2008; Morral et al., 2000). Despite the fact that all of the participating clinics used drug and alcohol screens as part of their normal clinical procedures, and that our patient survey was anonymous, patients may still have underreported drug and alcohol use, and this underreporting may have limited our ability to detect an intervention effect.
The therapeutic alliance measure also displayed a “ceiling” effect. This effect, however, was not as pronounced as with the drug and alcohol outcomes (approximately 34% of patients reported the maximum score of 5 at Week 1), so there was some room for improvement. Patient-rated alliance scores near the top of the scale have often been found in psychotherapy studies (Barber et al., 2001; Connors et al., 1997; Feldstein & Forcehimes, 2007), including community-based studies (Crits-Christoph et al., in press), so our finding is not unusual. Despite these typically high average alliance levels, it has been possible to demonstrate improvement in clinicians’ average alliances, at least among relatively less experienced psychotherapists (Crits-Christoph et al., 2006). Furthermore, our attendance measure did not show a “ceiling” effect. Thus, the lack of any evidence for an effect of the PF intervention seems more likely due to an ineffectual intervention than to measurement problems.
We can speculate on possible reasons for the lack of efficacy of the PF system. One contributing factor may have been that performance improvement is largely a concern of program administrators, supervisors, and policy makers, rather than clinicians. The PF system was designed so that program administrators and supervisors could not identify which clinicians were performing relatively poorly. This constraint was implemented so that the research would not negatively affect clinicians’ employment. Moreover, research has shown that supervisors who are not skilled in delivering feedback can demotivate performance through improper use of performance data and feedback systems (Anderson, Crowell, Doman, & Howard, 1988; Daniels & Daniels, 2004). However, it is exactly this individual performance information, withheld from supervisors, that is of greatest interest to them. Thus, supervisors may have had little motivation to enhance the efficacy of the intervention through active problem solving during PF team meetings. Clinicians, on the other hand, may be primarily concerned about the clinical well-being of individual patients. As long as their jobs are not threatened, their motivation for general performance improvement may be marginal. This may be particularly true among highly experienced counselors who believe that they already perform at an advanced level. In the current study, 57% of clinicians had a master’s or doctoral degree and almost half had worked as a clinician for more than 5 years, suggesting that many of the clinicians in the sample were at a relatively advanced level. Moreover, the design of the research study, with assessments of clinician performance conducted every week for 12 weeks, may have been seen as a burden by clinicians, further decreasing their enthusiasm for attending to the feedback.
Although the PF system included a variety of elements oriented towards engaging clinicians (e.g., downloadable clinically useful forms from the PF website; links to useful addiction-related information on the website; newsletters that highlighted individual clinics and provided information about performance improvement), without a strong motivation to improve one’s performance, clinician-level feedback may not be effective.
The lack of change on clinicians’ views of their job, their supervisor, and their agency’s readiness to change may have been due to larger organizational problems at these clinics that are not adequately addressed by an intervention like the PF system. Problems such as a lack of adequate funding to provide training and other resources, excessive clinical hours for counselors, and low salaries would lead to low job satisfaction and affect clinicians’ perspectives on their supervisors and agency.
The findings from this study, while negative, point to some potentially fruitful directions for further research. The low level of alcohol and drug use at baseline suggests that performance improvement studies aimed at enhancing the effectiveness of substance abuse treatment need to either select only patients and/or agencies with higher baseline levels of drug/alcohol use or rely on outcomes other than recent substance use. The ideal clinical outcome might be the proportion of patients who relapse. However, high levels of attrition from treatment would undermine the ability to measure this outcome for large proportions of a clinic population. In addition, the usefulness of providing feedback on such long-term outcomes, in contrast to regular (e.g., weekly) feedback, might be limited. Another strategy might be to target a performance improvement intervention only to less highly trained and experienced clinicians, who may be more in need of performance improvement. Additionally, to increase motivation, offering clinicians incentives for improving performance may produce greater changes than those seen in the current study.
Another alternative to the approach taken in the current study would be to provide feedback on individual patients measured at baseline and weekly throughout the course of treatment. This in fact has been the strategy taken by Lambert and colleagues in studies of the efficacy of feedback to psychotherapists working with mental health patients (Lambert, Hansen, & Fitch, 2001; Lambert et al., 2003; Lambert, Whipple et al., 2001; Hawkins, Lambert et al., 2004; Harmon, Lambert et al., 2007; Whipple, Lambert et al., 2003). Unlike the average caseload feedback that we provided to clinicians in the current study, this type of individual patient feedback gives clinicians the opportunity to alter the treatment plan on an ongoing basis to potentially improve outcomes for an individual patient. Lambert et al. (2005) have found that such feedback is effective for improving treatment in only about 20% of patients, in particular the subgroup of patients who have not been improving at an expectable rate (compared to other patients who begin treatment at a similar level of severity). Whether the Lambert feedback approach, using the Outcomes Questionnaire for feedback reports, would be successful with a substance-using population is a question for further research.
In addition to the limitations mentioned above (ceiling effects; self-report measures of drug and alcohol use), there are a number of other limitations of the current study that should be acknowledged. A variety of decisions regarding the design of the PF intervention package were made to enhance the potential clinical sustainability of the system if it turned out to be effective. The Patient Survey was very brief; a longer survey might have provided more reliable scores and a broader range of outcomes. The strategy of assessing all patients at a clinic within a given week, regardless of how long patients had been in treatment, allowed for a quick snapshot of clinicians’ performance, but it resulted in a potentially different set of patients at each of the 12 assessments. Because we did not target individual patients for feedback, and because most change for individual patients occurs over the first few weeks of treatment, the PF system did not capture the bulk of the change process for individual patients. A further potential limitation was that the monthly team meetings may have been too infrequent to generate ideas that could be implemented to enhance performance. Another possible limitation is that factors other than therapeutic alliance and treatment satisfaction may be most relevant to treatment retention and decreased drug and alcohol use; feedback systems that target other outcomes may be more effective than the one examined here. A final limitation is that the PF system relied upon team meetings to generate ideas about how to respond to feedback reports so that outcomes could be improved. Several alternative ways of generating ideas for improving patient outcomes might have been more successful.
These include one-on-one supervisory feedback, the use of independent testing or consultation on difficult cases, or the provision of clinical decisional supports as part of feedback reports (Whipple et al., 2003; Harmon et al., 2007).
The preparation of this manuscript was funded in part by NIDA grants R01-DA020799 and R01-DA020809. We wish to thank the participating program administrators, clinicians, and clinic staff from the following programs: In the Philadelphia region: Northeast Treatment Center; Rehab After Work – Northeast; Rehab After Work - Center City; Rehab After Work – Lansdale; Rehab After Work – Paoli; CHANCES; Brandywine Counseling (in Delaware); Princeton House Behavioral Health (in New Jersey); In New York: Outreach Development – Greenpoint; Turning Point; Odyssey House; Lexington Center – Poughkeepsie; Lexington Center - Mt. Kisco; Lexington Center - New Rochelle; Addiction Research and Treatment Corporation; Horizon - Main Amherst; Horizon - Hertel Elmwood; Horizon - Bailey LaSalle; Horizon - Niagara Falls; and Horizon - Boulevard.