Mental health intervention research requires clear and accurate specification of treatment conditions in intervention studies. Measures are increasingly available for community-based interventions for persons with serious mental illnesses. Measures must go beyond structural features to assess critical processes in interventions. They must also balance effectiveness, adequate coverage of active treatment elements, with efficiency, the degree to which measures may be used cost-effectively. The context of their use is changing with the emergence of new frameworks for implementation research and quality improvement.
The focus, content, and results of preliminary studies of four recently developed fidelity measures are described. Measures respectively assess fidelity to case management, cognitive therapy for psychosis, illness management and recovery, and assertive community treatment.
Fidelity measures described assess interventions in a range of treatment contexts from dyads to teams. Each measure focuses assessment resources on the elements critical to the respective intervention. Each has demonstrated coverage of its target intervention and satisfactory psychometric properties and is related to outcomes. Measures have been used for training, quality improvement, or certification. Current fidelity measures assess domains and have uses beyond their nominal position in implementation and quality frameworks.
Process components in community-based interventions can be effectively assessed in fidelity measures. Omission of elements assessing potentially critical, active treatment components poses risk to both research and practice until there is evidence to demonstrate they are non-essential. Further development of fidelity measurement theory and approaches should articulate with development of theory and methods in implementation science.
Mental health intervention research requires clear and accurate specification of independent variables actually operating in studies. Inferences about effects in experimental studies, for example, depend upon fidelity: therapists’ adherence to the intended treatment, their competence to apply it, and sufficient differentiation across conditions (1, 2). Although empirical verification of fidelity has been reported infrequently in psychological treatment research (3), fidelity has recently received greater attention in research on community-based psychosocial interventions for persons with serious mental illnesses. These program-based interventions are inherently more complex and less amenable to full specification in manuals than interventions delivered by a single clinician, often including such elements as aspects of organization, caseload, types of treatments and other services provided, as well as interactions with other programs (4). Fidelity measurement strategies and measures have been developed and used with a wide range of such programs for research and practice (5–11).
The principal research uses of program fidelity measures are to monitor and ensure adherence to particular interventions and to identify their critical ingredients. They also serve as operational syntheses of prior research and as vehicles to disseminate information to the field about essential features of evidence-based practices (4). The demands of multiple uses pose significant challenges for fidelity measure design. One is selection of features to include. Critical ingredients are identified through theory and empirical research on the active mechanisms that are expected to yield intended outcomes; with potentially multiple organizational levels, contributing mechanisms, and potential assessment points, program-based models present developers with a multitude of options. A second challenge is the need to balance effectiveness, the degree to which fidelity measures and methods capture the essential features of an intervention reliably and validly, with efficiency, the degree to which the tools can be applied cost-effectively such that the real gains from use in ordinary settings warrant the effort required to use them (12). Corresponding to the complexity of treatments, contexts, and possible uses, developers of fidelity measures have made a wide range of choices in balancing effectiveness and efficiency.
Consideration of a number of conceptual frameworks within which fidelity measures operate suggests why there is such variation. Within the classic structure-process-outcome quality framework (13), fidelity measures typically include both structure and process elements. Although fidelity measures have sometimes emphasized more accessible structural features – e.g., group size, duration of treatment – less tangible processes may be essential to program integrity (7), and there are risks to both research and practice posed by over-emphasizing structure (14–16). Such misplaced emphasis can follow from weak theory, since fidelity measures have been described as representing program theory, or theory of action of the intervention (14, 17). What is included in a fidelity measure will thus depend upon what actions and at what level the intervention is defined. In some cases, this represents a departure from program theory as such, which specifies mechanisms of change, to include implementation theory, which specifies how a program is carried out (18). A more recent model of implementation research, with primary domains of intervention strategies, implementation strategies, and three types of outcome domains – implementation, service, and client – would place fidelity as one of a number of implementation outcomes (19). However, some fidelity measures, including two described here, have addressed implementation features in three or four of these five domains. This implementation research framework itself draws on a model for assessing change at four levels – individual, group/team, organization, and larger system/environment (20); again, fidelity measures can span multiple levels. Finally, a recent heuristic model for ensuring quality of implementation of evidence-based practices proposes four main strategic categories: policy and administration; training and consultation; team operations; and program evaluation (21). Fidelity assessment is placed in the last category, but it can also support other strategies.
We describe four recent fidelity measures for community-based interventions for people with serious mental illness to illustrate a range of approaches within this context. Overall they reflect advances in effective measurement of critical processes, but they differ in terms of where and how they focus within those frameworks for theory, quality, and implementation. These measures are summarized in Table 1 and listed along a continuum of complexity of program levels.
Between 25 and 40 percent of people with a schizophrenia-spectrum disorder experience persistent psychotic symptoms (22, 23), which are associated with high levels of distress, functional impairment, and increased vulnerability to relapses (24, 25). To address this problem, cognitive-behavioral therapy (CBT) for psychosis was adapted based on the principles of CBT that were initially developed for the treatment of depression and anxiety (26, 27) and that emphasize treatment components such as the “normalization” of psychotic symptoms, teaching effective coping strategies for persistent symptoms, and critically examining and challenging thoughts and beliefs underlying psychotic symptoms (28–30). Over the past two decades, over 30 randomized controlled trials have been conducted to evaluate the effects of CBT for psychosis, with the results indicating significant effects on the reduction of psychotic, negative, mood, and social anxiety symptoms (31). CBT for psychosis is a recommended treatment for schizophrenia both in the most recent NICE guidelines from Great Britain (32) and the PORT recommendations in the U.S. (33).
In order to evaluate therapist adherence to the elements of CBT for psychosis defined by Fowler and colleagues (29), Startup and colleagues (34) developed the Cognitive Therapy for Psychosis Adherence Scale (CTPAS), which included 12 items, each rated on 7-point Likert scales, with assessments based on audiotapes of treatment sessions. Ratings on the scale pertain to specific therapist behaviors, such as “Assessing psychotic experiences” and “Validity testing.” Startup and colleagues (34) demonstrated that reliable ratings could be obtained with the CTPAS. A principal components factor analysis indicated two factors, corresponding to focus on problems and focus on delusions. This scale was used to document therapist fidelity to CBT for psychosis in two clinical trials (35, 36).
The CTPAS was subsequently revised (R-CTPAS) by adding nine items and changing the rating scale in order to provide separate ratings of therapist adherence to the CBT for psychosis model and of the frequency of specific therapist activities (37). Adherence is conceptualized as therapist activities described in the manual (29) that are delivered competently, as defined by practices that are individualized to the client’s presenting problems, matched to the person’s understanding, and carried out collaboratively. Frequency items are recorded for specific therapist activities, regardless of whether they are adherent or not. High inter-rater reliability ratings were obtained. A principal components factor analysis of the presence of specific therapist activities demonstrating adherence to the model yielded three factors corresponding to “engagement and assessment,” “relapse prevention,” and “formulation and schema work.” Concurrent validity was shown by demonstrating moderate associations between ratings on the CTPAS and the Cognitive Therapy Scale (38), which was developed to evaluate fidelity to CBT for depression. The CTPAS has been used to ensure adherence of therapists delivering CBT for psychosis in randomized controlled trials (39), and to compare the skills of clinicians working on a research project with those providing routine clinical practice (37).
The purpose of the Strengths Model of case management, first formulated in the early 1980s, is to help people with psychiatric disabilities to attain the goals that they set themselves by identifying, securing and sustaining the range of resources, both environmental and personal, needed to live, play and work in a normally interdependent way in the community (40). The focus is on individual and community strengths and assets in the service of goal achievement. The Strengths Model has been the subject of four experimental/quasi-experimental studies (41–44) and five non-experimental studies (45–49). Results have been consistently positive, with reduction in symptoms and improved social functioning being the most frequent findings. This body of research has been criticized for small sample sizes and the varied measures employed (50). Of particular concern is the lack of systematic monitoring of intervention implementation.
The impetus to develop a Strengths Model Fidelity Scale (SM-FS) was threefold. First, future research on the Strengths Model needed a reliable method for monitoring the intervention implementation. Second, the mental health authority in Kansas created an enhanced Medicaid reimbursement rate for providers who delivered high fidelity Strengths Model case management and other states were pursuing similar arrangements. They needed a reliable method for ascertaining fidelity. Third, since the idea of strengths-based practice has gained such currency, there was a need to distinguish between the rhetoric of programs and actual practice.
The SM-FS contains three major domains: structure (e.g., caseload size, use of group supervision), supervision (e.g., field mentoring, review and feedback on the use of clinical tools), and clinical practice (e.g., use of the strengths assessment and personal recovery plan, use of naturally occurring community resources, hope-inducing behavior) (51). The measure uses the five-point anchored scale format used in many fidelity measures (11). Scores can range from 11 to 55, with 45 defined as “good” fidelity. It uses multiple sources of data, including case records; interviews with consumers, case managers, and supervisors; and direct observation of practice. The SM-FS has face validity, established through expert item reviews.
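The scoring arithmetic described above can be illustrated with a minimal Python sketch. It assumes that the total is the simple sum of 11 item ratings on the 1-to-5 scale, which is consistent with the stated range of 11 to 55, and that the cutoff of 45 is inclusive. The function names are hypothetical and are not part of the published scale.

```python
from typing import Sequence

N_ITEMS = 11                    # number of SM-FS items (from the text)
MIN_RATING, MAX_RATING = 1, 5   # five-point anchored scale (from the text)
GOOD_FIDELITY_CUTOFF = 45       # "good" fidelity; treating it as inclusive is an assumption


def smfs_total(ratings: Sequence[int]) -> int:
    """Sum the item ratings after checking count and range (assumed scoring rule)."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item ratings, got {len(ratings)}")
    for rating in ratings:
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"rating {rating} is outside {MIN_RATING}-{MAX_RATING}")
    return sum(ratings)


def is_good_fidelity(total: int) -> bool:
    """Return True when the total meets the assumed inclusive cutoff."""
    return total >= GOOD_FIDELITY_CUTOFF
```

Under these assumptions, a team rated 4 on every item would total 44 and fall just short of the cutoff, while a single item raised to 5 would bring it to 45.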
One study showed the predictive validity of SM-FS for team performance in terms of consumer outcomes (52). The core outcomes included psychiatric hospitalization, competitive employment, involvement in higher education, and independent living. Fidelity reviews were conducted at baseline and then every six months during the first 18 months of implementation. Each review was conducted by at least two consultant-trainers. Inter-rater reliability (intraclass correlation) between the two raters of the fidelity scale was .97, representing a high level of agreement. Internal consistency (Cronbach’s alpha) for the 11 items was .98. Consumer outcomes were reported by the participating team case managers when fidelity reviews occurred. The data covered 14 case management teams representing 10 agencies serving an average of 953 consumers diagnosed with a serious mental illness over an 18-month period. The study revealed that consumer outcomes improved over time and that the improvement was explained by the increase in the fidelity score, indicating predictive validity. Concurrent correlations between the fidelity score and outcomes were in the expected directions, which also supports the association between fidelity and outcomes.
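For readers unfamiliar with the internal consistency statistic reported above, the following Python sketch shows the standard Cronbach’s alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). It is an illustration only: the study’s own software and data layout are not described here, and the function name and the rows-by-items input format are assumptions.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for scores[i][j] = rating of item j in fidelity review i."""
    if len(scores) < 2:
        raise ValueError("need at least two reviews")
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every review must rate the same number of items")

    # Sample variance of each item across reviews.
    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of the total score across reviews.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)
```

A value near 1, such as the .98 reported, indicates that items rise and fall together across reviews. As the later discussion of validation notes, this statistic may apply poorly where a measure covers domains that do not represent a single underlying construct.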
The Illness Management and Recovery (IMR) program was developed in order to teach illness self-management skills to people with severe mental illness (53). A comprehensive review of illness self-management strategies was first conducted, which identified psychoeducation, behavioral tailoring for medication adherence, developing a relapse prevention plan, coping skills training, and social skills training to improve social support as empirically supported interventions (54). These strategies were then incorporated into a comprehensive, integrated program including ten different “modules” or topic areas aimed at teaching illness self-management strategies in the service of helping clients achieve personally meaningful, “recovery” goals. The IMR program can be implemented either individually or in groups, and generally requires 4–5 months of twice weekly meetings, or 9–10 months of weekly meetings to complete.
The IMR Fidelity Scale (IMR-FS) was developed to evaluate the adherence of clinicians to the principles of the IMR Program. In contrast to the R-CTPAS, which focuses on evaluating the fidelity of individual clinicians to the CBT for psychosis treatment model, the IMR-FS focuses on the evaluation of an overall program (i.e., all the clinicians together) to the principles and defining elements of IMR. The IMR-FS includes 13 items, each rated on 5-point behaviorally anchored scales, which tap a combination of specific structural aspects regarding how the program should be delivered (number of people in sessions, program length, comprehensiveness of curriculum, provision of handouts), the provision of specific empirically supported components in IMR sessions (psychoeducation, behavioral tailoring, relapse prevention plan, coping skills training, and social skills training), and adherence to specific principles that guide implementation of the overall IMR program (goal setting and follow-up; use of educational, motivational, and cognitive-behavioral teaching strategies; involvement of significant others). Ratings are usually conducted by two assessors, based on a combination of inspection of charts, meetings with clinicians, clients, and supervisors, and direct (limited) observation of IMR sessions.
Good inter-rater reliability has been shown for the IMR-FS, which was also found to be sensitive to change over two years following training and consultation in the IMR program across 12 community mental health centers participating in the National Implementing Evidence-Based Practices project (11). The IMR-FS has also been used to document fidelity to the IMR model in three randomized controlled trials comparing IMR to usual services (55–57). Interestingly, in one of these studies IMR was implemented at 12 sites, of which 9 showed high fidelity to the IMR program. When analyses were restricted to the 9 high fidelity sites, somewhat stronger effects were found than in the intent-to-treat analyses that included all 12 sites (56).
Validation work on the IMR-FS has yet to be conducted, although there are several possible approaches. Research could be conducted to evaluate whether total scores on the IMR-FS at different agencies providing the IMR program are related to improvements in domains targeted by the program, such as illness self-management, hospitalizations, or functioning. In addition, research could evaluate whether ratings on some of the items of the IMR-FS are significantly related to independent fidelity measures tapping those same constructs. For example, one would expect that higher scores on the motivational teaching strategies item of the IMR-FS would be related to greater clinical competence on the motivational interviewing subscale of the Yale Adherence and Competence Scale (58).
Assertive community treatment (ACT) was developed as a comprehensive program to provide the full array of treatments, services, and supports needed by persons with severe mental disorders and significant psychiatric disabilities to establish and maintain fulfilling lives in the community (59, 60). The program is the single point of responsibility for enrolled consumers, has a small caseload of approximately 100 consumers shared across a 10–12-member multidisciplinary team, and provides highly individualized, integrated services in vivo, whenever, wherever, and for as long as needed in consumers’ daily lives. The model incorporates carefully specified procedures to track and respond to consumer needs, deploying staff as needed. As definitions for optimal treatment and expectations for treatment goals have changed over time, the practice of ACT has also evolved, incorporating other evidence-based practices in treatment (10) within an overall recovery orientation (61).
Following development of preliminary fidelity measures (6, 62), the Dartmouth Assertive Community Treatment Scale (7), though developed for a particular study (63), became the standard fidelity measure for ACT and has been used widely in studies up to the present (11, 64, 65). Because it was available prior to publication of the first ACT manual (60) and had a clear and accessible format and protocol, it was frequently used as a guide to implementing the program despite its authors’ assertions that some key processes were not assessed. While not critical in its original application (66), the emphasis on structural features and omission of some critical processes risked weaker implementation and research inferences elsewhere, especially as the ACT model evolved.
The Tool for Measurement of ACT (TMACT) (16) was designed to address these issues. It assesses use of evidence-based practices – e.g., supported employment, integrated dual disorder treatment – within the ACT model, includes items for consumer recovery orientation, and strengthens measurement of team functioning. It has 47 items in six subscales respectively defining operations and structure, core team, specialist team, core practices, evidence-based practices, and person-centered planning and practices. A protocol specifies the fidelity assessment process and provides interview questions, rules for scoring all items, and formats for collecting data and providing feedback. Items on each evidence-based practice are derived from respective full fidelity scales (11). ACT staff function as both specialists and generalists informed by others’ specialist services, so staff roles are assessed relative to other staff as well as consumers. Recovery orientation is built into items assessing person-centered planning and practices and is more generally reflected throughout the measure in assessing the focus of treatment and interactions with consumers. DACTS and TMACT scores were compared for 10 teams over 18 months; significant differences between the two measures varied over time and were a function of lower fidelity in key areas not measured by the DACTS, confirming the TMACT as a more comprehensive and higher standard than the DACTS and more sensitive to change (16).
Advances in research on community-based interventions will depend in part upon advances in our ability to measure whether they are being delivered as intended. The fidelity measures described above for four different intervention models for people with severe mental disorders represent improvements in this respect. While all of these measures include structural elements, they also include assessment of specific processes demonstrated or hypothesized as critical to successful delivery of the intervention. They were designed for use in a number of research purposes, such as validating inclusion of sites or practitioners in studies, indicating strength of intervention, or identifying critical ingredients. And their intended uses go beyond research: one or more is used to accredit programs for enhanced reimbursement rates, to certify individual clinicians, or as a tool for training and quality improvement.
They differ in important respects following from differences in program and implementation theory underlying their respective interventions. At the programmatically simplest level, the CBT for psychosis intervention is specified strictly in terms of dyadic interaction. At the other extreme of programmatic complexity, the ACT model includes specifications for program-level structures and processes theoretically required to ensure optimal delivery of services at the dyadic level. Current fidelity measurement as exemplified by these recent measures expands in practice beyond the respective niches suggested by recent frameworks for implementation science and quality improvement (19–21).
This broad practical and conceptual scope in what we currently define as fidelity measurement suggests an important future need. There are calls for refinement in program and implementation theory, as well as development of measures of implementation fidelity (17, 19). The field would gain from greater clarity in concept and definition. The term fidelity has merit as representing a general concept, but we would benefit from articulation of a typology of fidelity measurement linked to emerging frameworks for implementation and quality.
The role of fidelity measures in research could also be better clarified. For example, McGrew faulted the TMACT authors for including items that had not been individually demonstrated to predict outcomes in ACT (67). The authors’ response was that fidelity measures had rarely been pre-validated at the item level and that the TMACT was just the sort of refinement of program theory called for, based on related evidence, and necessary to move the science of this intervention forward (17, 68). Further consideration of this issue is warranted.
Validation is a related need. The four measures described here have made variable use of one or more of the approaches described by Mowbray and colleagues: reliability, structural analysis, known groups, convergent validity, and outcome prediction; but these must be used judiciously (14). Internal consistency, for example, may apply poorly to measurement of domains not representing a single underlying construct; outcome prediction may be uninformative or misleading where program variation is insufficient; and overall test-retest and inter-rater reliability may present practical challenges for program-wide assessments. However, several of these, especially convergent validity, could more routinely apply to measure components. And validation of choice of method would be important and feasible in complex programs, for example, by evaluating program-level items against aggregated results from individual-level items.
In the absence of the suggested theoretical and empirical work it would be difficult to judge the respective choices made within the four fidelity measures in balancing effectiveness and efficiency. Respective descriptions and Table 1 indicate an adaptation of both coverage and methods to the domains assessed. In the normal scientific context of testing and refinement, increasing effectiveness should also yield improved efficiency over time. All four entail considerable effort, albeit of varying types, suggesting the importance of additional work to quantify value added and establish cost-effectiveness. How much effort in fidelity assessment is warranted, and to what degree of precision, is unclear. However, there is substantial evidence that higher fidelity is generally correlated with outcome; establishment of high fidelity early on should yield substantial benefit. Even a full TMACT assessment with consultative feedback requires less than 0.5% of the annual effort of a team, a modest marginal cost for expected treatment improvement from assessment of either a new team or one showing intermediate performance, and there is reasonable concern and some evidence that diluted fidelity measurement in an environment of complex incentives may weaken both practice and research findings (16, 68). Further work is needed on development of low-risk strategies for titration of ongoing fidelity assessment effort (64).
Fidelity measurement in mental health services research is at a promising if uncertain point. New measures are being developed to measure and guide fidelity of emerging and enhanced practices in serving persons with serious mental illnesses in the community. Four recent measures illustrate this progress. At the same time, the context of use is rapidly changing as an emerging implementation science begins to articulate frameworks for addressing the compelling translational challenge of developing the necessary knowledge to establish and maintain evidence-based practices in usual care settings. Further refinement and clarification of the science and practice of fidelity measurement, along with an expanded view of its useful place in these frameworks, should be a part of that development.
Disclosures: None for any author.