Background: People with schizophrenia demonstrate considerable discrepancy between self-reported functioning and informant reports. It is not clear whether these discrepancies originate from the instruments used or from the perspectives of different informants. The goal of the Validation of Everyday Real-World Outcomes (VALERO) Study is to enhance the measurement of real-world (RW) outcomes in the social, residential, and vocational domains through selection of optimal scales and informants using a multistep process similar to the Measurement and Treatment Research to Improve Cognition in Schizophrenia (MATRICS) initiative. Methods: Forty-eight experts provided their opinions regarding the best scales measuring RW outcomes. Fifty-nine measures were nominated. The investigators selected the 11 scales that were the most highly nominated, had the most published validity data, and best represented the domains of interest. Information was provided to other experts who served as RAND panelists. Panelists rated each measure for its suitability across multiple a priori domains. Discrepant ratings were discussed until consensus was reached. Results: Following the RAND Panel, the 2 scales that scored highest across the various criteria for each of the classes of scales (hybrid, social functioning, and everyday living skills) were selected for use in the first substudy of VALERO. The scales selected were the Quality-of-Life Scale, Specific Levels of Functioning Scale, Social Behavior Schedule, Social Functioning Scale, Independent Living Skills Survey, and Life Skills Profile. Discussion: The results show that although current scales for assessing RW outcome in schizophrenia have significant limitations, consensus is possible. Further, several existing instruments were rated as useful for measuring social, residential, and vocational outcomes.
Deficits in the performance of critical everyday functional skills, including social and occupational functioning, residential maintenance, medication management, and basic self-care, are present in many neuropsychiatric conditions.1 These impairments are particularly salient in schizophrenia.2 Disability in schizophrenia occurs even following successful treatment of the clinical symptoms of the illness3 and often sets in immediately after the first episode.4 Throughout the course of the illness, the majority of schizophrenic patients experience some form of impairment in everyday functioning, whether in employment, independent living, or social functioning.5 As a result, disability reduction has the potential to benefit nearly every patient with schizophrenia, yet current treatments for the illness are notably ineffective at reducing disability.6
While disability in schizophrenia appears to be related to the failure to perform critical functions in the real world, this disability is likely caused by multiple factors. Failure to perform may be due to skill deficits, motivational deficits, interfering symptoms, and/or limited opportunities or personal resources.7 Thus, what one does in the real world may not be the perfect index of what one can do, but what one can do under optimal conditions is likely an index of maximal real-world (RW) potential.
We argue that RW functioning is just one element of a more global functional outcomes construct. Factors that influence potential, such as cognitive impairments indexed by neuropsychological (NP) test scores and functional capacity (FC; ie, ability or competence in the performance of everyday living skills), as well as other individual differences such as demographic factors and symptoms (including positive, negative, and depressive symptoms), have been shown to predict individuals’ RW functioning in schizophrenia.8 However, reports of RW outcomes vary across informants and contain elements of error, which can be indexed as well. Even “objective” milestones such as employment and marriage are influenced by measurable factors other than ability, such as opportunities and societal incentives and disincentives, and they are often reported inconsistently across informants, as described below. Thus, each element of the functional outcomes construct is measured by error-laden indices, and there is no “strict operational” definition of what “RW functional outcome” is.
Arguably, the most consistent element of the functional outcomes construct is NP performance, as measured in recent treatment studies by the MATRICS Consensus Cognitive Battery (MCCB).8 The battery was developed through expert nominations from the field and a RAND Appropriateness Panel that selected measures in several domains for subsequent comparison in a formal psychometric study.9–11 The final consensus battery consists of 10 tests, including a measure of social cognition, which met comprehensive standards for criterion-referenced validity and test-retest reliability.
Several recent studies have highlighted the variability in convergence between NP, FC, and RW performance measures. In particular, these studies11–15 examined these convergences in patients with schizophrenia using the University of California San Diego Performance-Based Skills Assessment (UPSA) as the measure of FC. Interestingly, despite the use of different NP performance measures in each of the studies, the correlation between NP performance and the total UPSA score was remarkably consistent, ranging from r=0.60 to r=0.65. Across the same studies, however, the correlation between RW outcomes and UPSA performance varied considerably, ranging from r=0.04 to r=0.50. The studies also showed considerable variation in the correlation between NP performance and RW outcomes, which ranged from r=0.05 to r=0.54. The lowest correlations were found in studies using only self-report of RW outcomes, and the highest correlations, for both domains, came from a study in which the RW outcome was residential independence measured with a comprehensive assessment drawing on multiple sources of information. These data suggest 2 conclusions: first, performance-based measures of NP performance and FC are highly convergent with each other regardless of the NP battery employed; and second, different RW outcome measures yield widely varying correlations with corresponding performance-based measures. Because the correlation between the performance-based NP and FC measures was so consistent, the variability in the correlations of RW measures with these domains implicates shortcomings of the RW outcome measures themselves.
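One way to read these ranges is as shared variance, the square of the correlation coefficient (r²). The short Python sketch below is only an arithmetic aid built on the correlation bounds quoted above; the helper name `shared_variance` is hypothetical and not taken from the cited studies. For example, r=0.04 corresponds to less than 1% of shared variance, whereas r=0.50 corresponds to 25%.

```python
# Convert Pearson correlations into shared variance (r squared).
# Illustrative helper; the name shared_variance is hypothetical.


def shared_variance(r: float) -> float:
    """Return r**2, the proportion of variance two measures have in common."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# Correlation bounds reported in the text for the UPSA studies.
REPORTED_RANGES = {
    "NP vs UPSA": (0.60, 0.65),
    "RW vs UPSA": (0.04, 0.50),
    "NP vs RW": (0.05, 0.54),
}

if __name__ == "__main__":
    for pair, (low, high) in REPORTED_RANGES.items():
        print(f"{pair}: {shared_variance(low):.1%} to {shared_variance(high):.1%} of variance shared")
```

Seen this way, NP and UPSA scores share roughly 36% to 42% of their variance, while some RW measures share almost none with the performance-based measures.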
The overriding goal in treating cognitive deficits in schizophrenia is reducing functional disability. However, if the overlap between NP performance and RW outcome is as small as it initially appears, even when NP performance is measured with a highly reliable and valid assessment battery, it is questionable whether successful treatment of cognition can realistically improve RW outcomes. One possibility, explored below, is that current instruments assessing RW outcomes have intrinsic limitations, at least in the hands of certain informants. The most global and arguably most significant aspects of RW outcome can be measured with high reliability, admittedly with certain limitations. These include marriage or an equally stable relationship, full-time employment, and self-supported living. However, these outcomes are rare and develop over time; hence, they are impractical as outcome variables in treatment studies, even in trials of treatment effectiveness. Measurement of more subtle aspects of RW outcome in neuropsychiatric conditions (eg, household management, social contacts, and job-seeking activities) is rarely direct, and many research studies rely on self-report for these aspects.
However, recent research has shown that self-reports by patients with schizophrenia may be unreliable when compared with other sources of information; schizophrenia seems to induce types and degrees of self-report deficits that exceed those found in the general population.16 Patients with schizophrenia showed substantial problems in self-reporting their cognitive impairments when their ratings on a structured scale were compared with their performance on an NP assessment.17 Further, the convergence of case manager reports and patient self-reports, even of supposedly objective outcomes such as living situation and time spent working in the past week, has been found to be minimal, accounting for as little as 4% of joint variance.18 In that same study, patients’ self-reports of their functioning were less strongly associated with their performance on NP and FC measures than were case manager reports, suggesting that case manager reports have greater evidence of validity than patient self-reports.
The modifiable sources of reduced validity in rating RW outcomes are at least 2-fold: first, the characteristics of the informant and, second, the RW outcome rating scale selected. Informant reports can vary with the amount of contact the informant has with the subject and with the situation specificity of the observation. In the case of self-report, variation can be influenced by patients’ competence in evaluating both the quality and the quantity of their own performance (see Bowie et al18 for an example). It is entirely possible that a substantially greater correlation exists between NP performance and aspects of RW outcomes than has been detected in previous studies, in which the RW outcome measures may have been deficient. For example, in the Twamley et al15 study, the RW outcome was based on a comprehensive assessment of residential independence, and the correlation between NP performance and this outcome was the highest of any of the studies cited above. Therefore, the next step in the construct validation process is to evaluate candidate measures of RW outcomes with a rigorous process similar to the one used to select the MCCB.
The Validation of Everyday Real-World Outcomes (VALERO) in schizophrenia project represents a joint effort between researchers at Emory University and the University of California, San Diego. The main goal of the project is to improve the assessment of RW functioning and hopefully apply the findings to future treatment studies of schizophrenia. To do so, VALERO will examine the convergence between a wide range of existing RW rating scales with performance-based measures, including NP test scores and FC assessments. Researchers will identify the existing RW outcomes scale (or subscales from existing scales) that is most highly convergent in a longitudinal design. Next, the identity of the informant whose ratings are most convergent with the rest of the outcomes construct will be investigated. Candidate informants include patient self-report, a relative/caregiver, a case manager or other high-contact clinician, and a medical prescriber. Further, the VALERO project will systematically study factors possibly associated with discrepancies between self-appraisal and informant appraisal of RW functional outcomes (such as depression, metacognitive skills, and emotional intelligence) in order to inform later research attempting to increase congruence of appraisals.
The VALERO Study will complete these goals in 3 substudies. Study 1 will use assessment scales selected by a RAND Appropriateness Panel to obtain RW functional status ratings and examine the convergence of those ratings with each other and with NP and FC scores. Study 2 will attempt to determine the best informant of patient functioning, and Study 3 will examine how demographic factors, psychiatric symptoms, and other features of illness affect the convergence of self-report and informant ratings of patients’ RW functional skills performance.
In this article, we report the first step in this process. The current study used an expert survey and RAND Panel, such as those employed in the MATRICS process,9 to select the most suitable current RW outcomes measures for entry into the validation study.
A list of experts was compiled by the grant authors based on personal experience, literature searches, and professional networks. Feedback on the expert list was also provided during 2 rounds of review by a National Institute of Mental Health (NIMH) study section. The experts were selected because they conducted research or performed high-level clinical activity in an area that would inform the nomination process, and for the breadth of their professional activities. Researchers and leading clinicians in academia, the pharmaceutical industry, rehabilitation medicine, and occupational therapy were surveyed, because the ultimate goal of this project is to inform outcome measurement in a large-scale cognitive enhancement trial.
In September 2007, e-mails providing an overview of the study and defining the concept of everyday outcomes as operationalized in this study were sent to 46 researchers and professionals. These experts were asked to “nominate the scales that you think best measure everyday outcomes in schizophrenia. The outcomes may include social, vocational, independent living, self-care, or any combination of these.” In addition, the 9 individuals selected to compose the RAND Panel were also asked to submit their own nominations. The nomination process concluded in November 2007, after each expert had received 2 reminder e-mails.
Upon conclusion of the nomination process, the investigators (Drs P.D.H., R.K.H., and T.L.P.) examined the most frequently nominated scales and identified all those that met the a priori criteria for continuation to the next stage. These broad criteria were that (a) the scale was nominated by the experts surveyed, (b) the scale had available data (published or unpublished) regarding its psychometric qualities, and (c) the scale assessed social functioning, everyday living skills, or both these areas (“hybrid” scales).
Once the investigators had eliminated scales ineligible for review by the RAND Panel, they established the scale characteristics that would be provided to and rated by the panelists. The characteristics chosen were similar to those deemed important in the MATRICS process.19 The entire citation history of the original published article for each scale was retrieved from 2 search engines: Web of Knowledge and Google Scholar. All articles citing the scale were retrieved and examined for information regarding the domains chosen by the investigators. The final rating domains were reliability (test-retest and interrater); convergence with other measures of the functional outcomes construct (performance-based measures of FC and NP performance); sensitivity to treatment effects; usefulness for multiple informants (eg, self, friend or relative, case manager, or prescriber); relationships with symptom measures; practicality and tolerability for people with low education levels; and convergence with other measures of RW functional outcomes (either other rating scales or achievement milestones). Final definitions of each of these domains are described in the “Results” section.
Data in these areas were compiled in a summary sheet along with a brief description of the scale that included the time required for administration, the reference period of RW functioning assessed, and additional information on how the scale should be administered. These data, along with copies of each scale and its citation history, were distributed to the 8 panelists and the chairperson (Stephen R. Marder, MD). The panel included schizophrenia researchers who study functional outcomes, provide psychosocial treatments for disability, or conduct pharmacological treatment studies, as well as experts on pharmacological treatments (see Appendix for a full list). No member of the panel reported a real or potential conflict of interest with the outcome of the process.
All panelists were given 1 month to review the information and were asked to submit preliminary ratings on the scales before they met at the Panel. Scales were rated on a 9-point (1–9) scale, where scores of 1–3 were poor, 4–6 were fair to good, and 7–9 were very good to superb. Preliminary results were compiled in each scale domain for each of the functional outcomes scales. These results were assembled into summary tables for each scale showing the mean, range, and SD of the preliminary survey results. These summary tables were provided to the panelists at the RAND Panel meeting.
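The summary tables described above (mean, range, and SD per scale and domain) can be sketched in Python as follows. The ratings in the example are hypothetical, not data from this panel. The text does not say whether a sample or population SD was used, so the sample SD from `statistics.stdev` is an assumption, as is the handling of non-integer means (cutoffs at 4 and 7) when mapping them onto the verbal anchors.

```python
from statistics import mean, stdev


def rating_band(score: float) -> str:
    """Map a rating on the 1-9 scale onto its verbal anchors.

    Integers 1-3, 4-6, and 7-9 follow the text; treating non-integer means
    with cutoffs at 4 and 7 is an assumption.
    """
    if not 1 <= score <= 9:
        raise ValueError("ratings must lie between 1 and 9")
    if score < 4:
        return "poor"
    if score < 7:
        return "fair to good"
    return "very good to superb"


def summarize(ratings: list[int]) -> dict:
    """Mean, range, and SD of the panel's ratings of one scale in one domain."""
    if len(ratings) < 2:
        raise ValueError("need at least 2 ratings to compute an SD")
    if any(not 1 <= r <= 9 for r in ratings):
        raise ValueError("ratings must lie between 1 and 9")
    avg = mean(ratings)
    return {
        "mean": avg,
        "range": (min(ratings), max(ratings)),
        "sd": stdev(ratings),  # sample SD (assumed; the text does not specify)
        "band": rating_band(avg),
    }


# Hypothetical preliminary ratings of one scale in one domain.
print(summarize([5, 6, 6, 7, 4, 5, 6, 7]))
```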
The RAND Panel meeting was open to NIMH staff and other interested parties, with only scale developers being recused. During the RAND Panel meeting, 2 NIMH observers attended the panel but did not submit formal ratings of the scales. The panel focused on resolving discrepant ratings. Panelists discussed each item for each rating scale that had an SD of greater than 2 points until a consensus of a 1-point range around a mean value could be reached (ie, rating of 3 ± 1). Panelists then submitted their final ratings within this range.
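The discussion rule amounts to two checks: whether an item's preliminary ratings exceed the 2-point SD threshold, and whether the final ratings all fall within 1 point of the agreed value. A minimal Python sketch follows; it assumes the sample SD, uses hypothetical ratings, and the function names are illustrative only.

```python
from statistics import stdev

DISCREPANCY_SD = 2.0  # items with an SD greater than 2 points were discussed


def needs_discussion(preliminary: list[float]) -> bool:
    """True when the preliminary ratings' SD exceeds the 2-point threshold (sample SD assumed)."""
    return stdev(preliminary) > DISCREPANCY_SD


def within_consensus(final: list[float], agreed_value: float, half_width: float = 1.0) -> bool:
    """True when every final rating lies within agreed_value +/- half_width (eg, 3 +/- 1)."""
    return all(abs(r - agreed_value) <= half_width for r in final)


# Hypothetical item: widely split preliminary ratings, then final ratings around 3.
preliminary = [2, 3, 8, 9, 3, 2, 7, 4]
final = [3, 3, 4, 2, 3, 3, 4, 3]
print(needs_discussion(preliminary))            # prints True
print(within_consensus(final, agreed_value=3))  # prints True
```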
Thirty-one e-mails led to a usable nomination (67.4% response rate) from the expert nominators, and an additional 7 experts returned an e-mail declining to participate or referring us to contact someone else. Five of the total 38 people who returned e-mails containing nominations were in the pharmaceutical industry, while the remaining 33 were in academia. Of those nominations returned, 27 e-mails contained nominations of scales that met the general criteria of the investigators.
Upon conclusion of the nominations, the experts surveyed had suggested 59 different measures. The investigators selected 2 hybrid measures, 2 social functioning measures, and 5 everyday living scales that they felt best met the agreed-upon criteria, and conducted the literature search process for these scales. Table 1 briefly describes these scales and lists their primary citations and nomination histories.
Following the panelists’ preliminary review of the scale information, descriptive statistics summarizing their ratings were compiled in each domain for each of the 9 rating scales. These data are shown in table 2. In the initial ratings, panelists disagreed (SD > 2) on 6 items. It was noted at the meeting that discrepancies were more prevalent in the domains of practicality, usefulness for multiple raters, and comprehensiveness. Ratings in these domains varied considerably because panelists interpreted their definitions differently, so significant time was spent during the meeting refining those definitions. Therefore, each of these domains, regardless of degree of discrepancy, was re-examined during the review of the scales to ensure that the ratings matched the revised definitions (see table 3). Following clarification of the definitions, the panelists reached consensus on all items. No mean scale rating at the close of the panel differed significantly from the original rating (at P < .05).
During the panel, it was determined that one of the everyday living scales, the Role Functioning Scale (RFS), acted as a summary scale rather than an actual rating instrument, because it assessed only global functioning in 4 broad areas of functional outcomes rather than asking specific questions about the patient’s functioning in each area. The scale was therefore excluded from the panel discussion. The final descriptive statistics of the panel's consensus ratings are presented in table 4.
The Quality-of-Life Scale (QLS) is a 21-item semistructured interview assessing functioning in schizophrenia. The scale addresses functioning across 4 domains: (1) intrapsychic foundations, (2) interpersonal relations, (3) instrumental role category, and (4) common objects and activities. The QLS is administered by a trained interviewer or clinician, takes about 45 minutes to complete, and assesses functioning over the past 4 weeks. Each of the 21 items is rated on a 7-point scale according to the interviewer’s judgment of the patient's functioning, with higher scores indicating higher levels of functioning. Item scores within a domain can be summed to create a subscale score, and all items can be summed to create an overall QLS score that ranges from 0 to 126.
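The QLS scoring arithmetic can be sketched in Python. The 0–6 item coding is inferred from the stated 0–126 total (21 × 6 = 126). The item-to-domain assignment is defined by the scale itself and is not given here, so the sketch takes it as a caller-supplied mapping; the domain names and indices in the example are hypothetical.

```python
QLS_ITEMS = 21
ITEM_MIN, ITEM_MAX = 0, 6  # 7-point items; 0-6 coding inferred from the 0-126 total range


def _check_items(item_scores: list[int]) -> None:
    """Reject a response set with the wrong length or out-of-range scores."""
    if len(item_scores) != QLS_ITEMS:
        raise ValueError(f"expected {QLS_ITEMS} item scores, got {len(item_scores)}")
    if any(not ITEM_MIN <= s <= ITEM_MAX for s in item_scores):
        raise ValueError("each item is scored 0-6")


def qls_total(item_scores: list[int]) -> int:
    """Overall QLS score: the sum of all 21 items (0-126, higher = better)."""
    _check_items(item_scores)
    return sum(item_scores)


def qls_subscales(item_scores: list[int], domains: dict[str, list[int]]) -> dict[str, int]:
    """Subscale scores: the sum of the items assigned to each domain.

    `domains` maps a domain name to 0-based item indices; the real mapping
    must be supplied by the caller (the example below is hypothetical).
    """
    _check_items(item_scores)
    return {name: sum(item_scores[i] for i in indices) for name, indices in domains.items()}


# Hypothetical interview: every item rated 3, with a toy two-domain split.
scores = [3] * QLS_ITEMS
print(qls_total(scores))  # prints 63
print(qls_subscales(scores, {"domain_a": [0, 1, 2], "domain_b": [3, 4]}))
```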
The Specific Levels of Functioning (SLOF) Scale is a 43-item multidimensional behavioral survey administered in person to the caseworker or caregiver of a schizophrenic patient. The scale assesses the patient's current functioning and behavior across 6 domains: (1) physical functioning, (2) personal care skills, (3) interpersonal relationships, (4) social acceptability, (5) activities of community living, and (6) work skills. Each item is rated on a 5-point Likert scale, so total scores range from 43 to 215, with higher totals indicating better overall functioning. The time frame over which functioning is assessed is not specified. The scale also includes an open-ended question asking the informant whether any other areas of functioning not covered by the instrument may be important in assessing this patient. Each informant is also asked to rate how well they know the patient on a 5-point Likert scale ranging from “not well at all” to “very well.”
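A single SLOF administration can be represented as a small record that validates the 43 item ratings and the informant's familiarity rating and derives the total. This is a hypothetical Python structure, not an official scoring tool; coding the 5-point Likert anchors as 1–5 is an assumption consistent with the stated 43–215 range (43 × 1 to 43 × 5).

```python
from dataclasses import dataclass

SLOF_ITEMS = 43
LIKERT_MIN, LIKERT_MAX = 1, 5  # assumed numeric coding of the 5-point Likert items


@dataclass
class SlofRating:
    """One informant's SLOF ratings for one patient (hypothetical structure)."""

    items: list[int]  # 43 item ratings, higher = better functioning
    familiarity: int  # 1 ("not well at all") to 5 ("very well"), assumed coding

    def __post_init__(self) -> None:
        if len(self.items) != SLOF_ITEMS:
            raise ValueError(f"expected {SLOF_ITEMS} item ratings, got {len(self.items)}")
        if any(not LIKERT_MIN <= score <= LIKERT_MAX for score in self.items):
            raise ValueError("each item must be rated 1-5")
        if not LIKERT_MIN <= self.familiarity <= LIKERT_MAX:
            raise ValueError("familiarity must be rated 1-5")

    @property
    def total(self) -> int:
        """Sum of item ratings; 43 items rated 1-5 give the 43-215 range."""
        return sum(self.items)


# Hypothetical informant: every item rated 4, knows the patient well.
rating = SlofRating(items=[4] * SLOF_ITEMS, familiarity=4)
print(rating.total)  # prints 172
```

Keeping the familiarity rating alongside the items makes it possible to weight or filter informants later, which is relevant to the question of which informant reports are most valid.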
The Social Behavior Schedule (SBS) is a 30-item measure used to assess social functioning in chronic (community or hospital) psychiatric populations. The survey assesses a patient's functioning over the past month in 21 areas. It is administered to an informant as a semistructured interview and takes approximately 15 minutes. Most items are rated on a 5-point scale, with higher scores representing lower levels of functioning. Scores in each of the 21 areas can be used alone as indicators of functioning, or a total SBS score can be used. In addition, 2 summary scores can be derived: the severe behavior problems score (BSS) and the mild and severe behavior problems score (BSM). Areas rated 3 or 4 count toward the BSS, and areas rated 2, 3, or 4 count toward the BSM.
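The BSS and BSM rules can be sketched in Python. The sketch assumes that all 21 area ratings use a 0–4 coding of the 5-point scale (higher = more problematic) and that each summary score is a count of areas meeting its threshold; both are assumptions, since the text states only which ratings count toward each score.

```python
SBS_AREAS = 21
RATING_MIN, RATING_MAX = 0, 4  # assumed 0-4 coding of the 5-point scale; higher = more problematic


def sbs_problem_scores(area_ratings: list[int]) -> tuple[int, int]:
    """Return (BSS, BSM) as counts of problem areas (counting is an assumption).

    BSS counts areas rated 3 or 4 (severe problems); BSM counts areas rated
    2, 3, or 4 (mild and severe problems), so BSM is always >= BSS.
    """
    if len(area_ratings) != SBS_AREAS:
        raise ValueError(f"expected {SBS_AREAS} area ratings, got {len(area_ratings)}")
    if any(not RATING_MIN <= r <= RATING_MAX for r in area_ratings):
        raise ValueError("each area must be rated 0-4")
    bss = sum(1 for r in area_ratings if r >= 3)
    bsm = sum(1 for r in area_ratings if r >= 2)
    return bss, bsm


# Hypothetical informant ratings: mostly no problems, a few mild or severe ones.
ratings = [0] * 15 + [1, 2, 2, 3, 4, 4]
print(sbs_problem_scores(ratings))  # prints (3, 5)
```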
The Birchwood Social Functioning Scale (SFS) was developed to assess social adjustment in schizophrenic patients. The 79-item measure assesses social functioning across 7 domains: (1) social engagement/withdrawal, (2) interpersonal behavior/communication, (3) prosocial activities, (4) recreation, (5) independence—competence, (6) independence—performance, and (7) employment/occupation.
The SFS takes approximately 30–45 minutes to administer and can be used as a self-report or informant interview, although it is generally administered to an informant. Items are scored on a 4-point scale with higher scores indicating a higher level of functioning. Raw scores on each of the subscales are converted to a scale score. The reference period for this scale is unspecified.
The original version of the Life Skills Profile (LSP) is a 39-item informant survey assessing a patient's level of functioning. Family members, psychiatric professionals, or case workers can be used as the informant in the interview. Multiple informants can be used to create a mean informant score for each patient. The scale assesses functioning in 5 areas: (1) self-care, (2) nonturbulence, (3) social contact, (4) communication, and (5) responsibility.
Items are rated on a 4-point scale, with higher scores reflecting lower functioning. The mean of the item scores in each subscale represents the patient's functioning in that area. The time frame over which the LSP assesses functioning is unspecified; however, most studies have used a 3-month period.
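The LSP scoring described above (subscale means per informant, optionally averaged across informants) can be sketched in Python. The real item-to-subscale assignments come from the instrument and are not reproduced here, so the mapping is caller-supplied and the one in the example is a hypothetical toy split; coding the 4-point items as 1–4 is also an assumption.

```python
from statistics import mean

LSP_ITEMS = 39
ITEM_MIN, ITEM_MAX = 1, 4  # assumed numeric coding of the 4-point items


def lsp_subscale_means(items: list[int], subscales: dict[str, list[int]]) -> dict[str, float]:
    """Mean item score per subscale for one informant (higher = lower functioning).

    `subscales` maps a subscale name to 0-based item indices; the mapping
    must be supplied by the caller.
    """
    if len(items) != LSP_ITEMS:
        raise ValueError(f"expected {LSP_ITEMS} items, got {len(items)}")
    if any(not ITEM_MIN <= score <= ITEM_MAX for score in items):
        raise ValueError("each item must be rated 1-4")
    return {name: mean(items[i] for i in indices) for name, indices in subscales.items()}


def mean_informant_profile(profiles: list[dict[str, float]]) -> dict[str, float]:
    """Average each subscale mean across several informants for the same patient."""
    if not profiles:
        raise ValueError("need at least one informant profile")
    return {name: mean(p[name] for p in profiles) for name in profiles[0]}


# Hypothetical two-subscale mapping and two informants' ratings.
toy_subscales = {"subscale_a": [0, 1, 2], "subscale_b": [3, 4]}
relative = lsp_subscale_means([2, 2, 3] + [1] * 36, toy_subscales)
case_manager = lsp_subscale_means([1, 2, 3] + [1] * 36, toy_subscales)
print(mean_informant_profile([relative, case_manager]))
```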
The Independent Living Skills Survey (ILSS) is a checklist measure of basic functioning for individuals with severe and persistent mental illnesses. There are 2 versions of the ILSS, the self-report version (ILSS-SR) and the informant version (ILSS-I). Both versions can be administered in person or on paper and rate the patients on their functioning over the past 30 days.
The ILSS-I is a 103-item scale assessing basic community living skills such as appearance and care of clothing, personal hygiene, care of personal possessions and living space, food preparation, eating behaviors, care of one's own health and safety, money management, transportation, leisure and recreational activities, job seeking, job maintenance, and social interactions. Informants rate the patient on a 5-point scale ranging from never to always. The ILSS-I takes 20 to 35 minutes to administer. Item scores are averaged within each functional area to give the patient's level of functioning in that area, with higher scores indicating higher functioning.
The ILSS-SR is a 61-item scale measuring appearance and care of clothing, personal hygiene, care of personal possessions, food preparation and storage, health maintenance, money management, transportation, leisure and community, job seeking, and job management. When it is given in interview format, the interviewer also answers 9 questions about the patient's appearance. Like the ILSS-I, the self-report version asks about the completion of basic tasks, but patients answer yes or no rather than rating frequency. Answers are summed (no=0, yes=1) and averaged per area. The ILSS-SR takes approximately 20–30 minutes to administer.
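Both ILSS versions reduce to per-area averages of coded item responses. The Python sketch below uses the yes=1/no=0 coding stated for the ILSS-SR; coding the ILSS-I's never-to-always anchors as 1–5 is an assumption. Area names are taken from the text, but the item responses are hypothetical.

```python
from statistics import mean


def _area_means(scores: dict[str, list[int]]) -> dict[str, float]:
    """Average the coded item scores within each functional area."""
    for area, values in scores.items():
        if not values:
            raise ValueError(f"no items for area {area!r}")
    return {area: mean(values) for area, values in scores.items()}


def ilss_sr_area_scores(answers: dict[str, list[bool]]) -> dict[str, float]:
    """ILSS-SR: code yes=1 and no=0, then average per area (proportion of tasks completed)."""
    return _area_means({area: [1 if yes else 0 for yes in items] for area, items in answers.items()})


def ilss_i_area_scores(ratings: dict[str, list[int]]) -> dict[str, float]:
    """ILSS-I: average the 5-point never-to-always ratings per area (assumed coded 1-5)."""
    for area, items in ratings.items():
        if any(not 1 <= r <= 5 for r in items):
            raise ValueError(f"ratings for {area!r} must be 1-5")
    return _area_means(ratings)


# Hypothetical responses for a few areas named in the text.
self_report = {"money management": [True, False, True, True], "transportation": [True, True]}
informant = {"personal hygiene": [5, 4, 5, 4]}
print(ilss_sr_area_scores(self_report))
print(ilss_i_area_scores(informant))
```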
An examination of the preliminary ratings shows discrepancies on certain measures and domains. In general, ratings varied more in the domains of practicality, comprehensiveness, and usefulness for multiple raters, suggesting variability in interpretations of the definitions, as discussed above, as well as variation across the scales in their approaches to assessing RW outcomes. The social functioning class (which included the SFS, SBS, and Social Adjustment Scale II) showed the greatest variability in the mean rating of each measure. This could suggest either that the panelists had the hardest time fitting these measures into the domains or that they rated the utility of the various social functioning measures differently. The mean ratings of the social scales, however, were quite high, suggesting that the panelists considered them as useful as the hybrid scales. The mean ratings of the everyday living scales were often at least a point lower (the lowest being the RFS) than those of the highest rated social and hybrid scales (SBS and QLS, respectively). Unlike the social functioning scales, the everyday living scales showed little variability in the SDs of their mean ratings, suggesting that panelists held similar overall opinions of these scales.
The final panel ratings followed a trend similar to the preliminary ratings. The mean ratings of the hybrid and social functioning scales were often higher than those of the everyday living scales, although less so than in the preliminary ratings. Ratings of the LSP remained exactly the same from the preliminary to the final ratings. Ratings of the SFS and ILSS changed in only one domain each (practicality and symptoms, respectively), and only slightly. Interestingly, the ILSS and LSP were the highest rated everyday living scales in the preliminary ratings and remained so in the final ratings. Overall, the QLS scored highest across all constructs in the final ratings. The SFS and LSP scored the highest in their respective constructs. No particular domain appeared to have more variance than the others in the final ratings.
Following review of the final ratings, the investigators selected 2 scales from each of the classes (hybrid, everyday living, and social functioning) for use in the first validation study. The Heinrichs-Carpenter QLS and the SLOF Scale were selected for validation. Although the Multidimensional Scale of Independent Functioning (MSIF) received higher ratings than the SLOF at the conclusion of the RAND Panel, the investigators opted to use the SLOF because the RAND Panel noted that the MSIF had been used reliably with bipolar patients but lacked extensive data on patients with schizophrenia, and that the panel's ratings of it were based on these bipolar data. The 2 social functioning measures with the highest ratings by the experts, the Birchwood SFS and the SBS, were selected to represent that class. Likewise, the LSP and ILSS were the highest rated in their construct and were chosen to represent the everyday living skills scales. Interestingly, the Multnomah Community Ability Scale was the most frequently nominated scale by the experts but was not the most highly rated by our panelists, some of whom had used this scale in their research. That the panelists did not simply rate scales by popularity may indicate a lack of bias on their part.
The results of this study reflect the current consensus in the field with regard to functional outcomes scales. No scale received a mean total rating above 6 or below 4, suggesting that all current scales are viewed as moderately useful in their current versions, with some meeting minimal criteria for acceptability as currently configured. These moderate ratings did not stem from poor performance in previous studies. Rather, many of these scales lack critical data regarding basic reliability across raters and relationships with other elements of the functional outcomes construct, including NP and FC performance. Further, although each of the selected scales has evidence of sensitivity to RW milestones, such as independent living and social outcomes, many of the previous studies used very broad indices of these outcomes (institutionalized vs ambulatory) as the outcome variables. Ratings for usefulness across multiple raters were also quite low, partly because many of these scales do not have alternate forms that attempt to capture the differing perspectives of different raters. The panelists’ consensus thus indicates that the field of schizophrenia research has not yet identified an entirely effective measure of the RW outcomes component of the functional outcomes construct, but that some measures are likely to be suitable in the interim. The VALERO Study will attempt to determine the best scale or compilation of subscales in order to create an RW functional outcomes measure that could serve as an outcome measure in future clinical trials concerned with improving cognition and functional disability in patients with schizophrenia. This first component of the VALERO project demonstrates the need for a scale that scores highly on all the domains investigated in this study and also serves as a practical and informative outcome measure in schizophrenia research.
The first phase of the VALERO Study will directly examine the 2 most problematic aspects of the current group of scales: their temporal stability and reliability, and their usefulness across multiple informants who report on the same person with schizophrenia.
National Institute of Mental Health (linked grants NIMH MH78775 to P.D.H., MH78737 to T.L.P.).
In the past 3 years, Dr Harvey has served as an advisor or consultant to: Astra-Zeneca Pharmaceuticals; Dainippon Sumitomo Pharmaceuticals America; Eli Lilly and Company; Johnson and Johnson, Inc; Merck and Company; Novartis Pharmaceuticals; Pfizer, Inc; Solvay/Wyeth Alliance; and the Sanofi-aventis group. Dr Harvey received grant or contract support from Astra-Zeneca Pharmaceuticals and Johnson and Johnson, Inc. All other authors report no other outside relationships. Articles in the reference section numbered from 20 to 32 describe nominated scales.