Neuropsychiatric symptoms (NPS) affect almost all patients with dementia and are a major focus of study and treatment. Accurate assessment of NPS through valid, sensitive and reliable measures is crucial. Although current NPS measures have many strengths, they also have some limitations (e.g. data are obtained only from informants or caregivers, and there is limited depth of items specific to moderate dementia). Therefore, we developed a revised version of the Neuropsychiatric Inventory (NPI), known as the NPI-C. The NPI-C includes expanded domains and items, and a clinician-rating methodology. This study evaluated the reliability and convergent validity of the NPI-C at ten international sites (seven languages).
Face validity for 78 new items was obtained through a Delphi panel. A total of 128 dyads (caregivers/patients) from three severity categories of dementia (mild = 58, moderate = 49, severe = 21) were interviewed separately by two trained raters using two rating methods: the original NPI interview and a clinician-rated method. Rater 1 also administered four additional, established measures: the Apathy Evaluation Scale, the Brief Psychiatric Rating Scale, the Cohen-Mansfield Agitation Index, and the Cornell Scale for Depression in Dementia. Intraclass correlations were used to determine inter-rater reliability. Pearson correlations between the four relevant NPI-C domains and their corresponding outside measures were used for convergent validity.
Inter-rater reliability was strong for most items. Convergent validity was moderate (apathy and agitation) to strong (hallucinations and delusions; agitation and aberrant vocalization; and depression) for clinician ratings in NPI-C domains.
Overall, the NPI-C shows promise as a versatile tool which can accurately measure NPS and which uses a uniform scale system to facilitate data comparisons across studies.
Neuropsychiatric symptoms (NPS) affect almost everyone with dementia (Lyketsos et al., 1999; 2000; Schreinzer et al., 2005; Steinberg et al., 2006). Although patients with dementia may have NPS at any disease stage, and the prevalence of NPS may increase over time (Lopez et al., 2003), certain NPS are more common in mild dementia (e.g. depression, anxiety, irritability and apathy) (Feldman et al., 2004), while others are more frequent in later stages (e.g. aberrant vocalizations, delusions, hallucinations and disinhibition) (Lopez et al., 2003; Mayer et al., 2006). Given their frequent occurrence, NPS are a major focus of study and treatment (Tariot et al., 1995; Steinberg et al., 2004; 2006; Sink et al., 2005; Lyketsos, 2007; Gauthier et al., 2010). NPS measurement is therefore a crucial aspect of dementia research, especially in clinical trials, where change (or lack thereof) is a key indicator of treatment effect (Lyketsos, 2007; Ballard et al., 2008).
The accurate assessment of NPS through valid, sensitive, and reliable measures is key to interpreting results from individual investigations. The use of uniform measures is important to the field’s ability to compare and subsequently interpret results across studies. Multiple NPS measures have been developed, each with strengths and weaknesses, and different measures have been used in different studies. These measures differ in what they measure (i.e. symptoms within one particular NPS domain or several domains), in scale properties such as the targeted informant (e.g. caregiver, clinician, patient), and in rating approach (e.g. rating frequency alone, severity, other), all of which can complicate interpretation of findings, especially in clinical trials targeting specific NPS (Perrault et al., 2000). For example, Sink and colleagues (2005), in their meta-analysis of NPS-related randomized controlled trials, reported conflicting results among several trials of cholinesterase inhibitors and memantine attributable to the use of several different scales and subscales. Such differences can also increase the chance of Type I errors. A uniform measure that can be used in a range of different studies involving NPS is needed.
Commonly used NPS measures include the Alzheimer’s Disease Assessment Scale non-cognitive section (ADAS-noncog; Rosen et al., 1984) and the Behavioral Pathology in Alzheimer’s Disease Rating Scale (BEHAVE-AD; Reisberg et al., 1987). The ADAS-noncog, one of the first measures developed for this purpose, consists of ten items rated for frequency by a knowledgeable informant. Although good reliability and validity have been reported (Weyer et al., 1997), the ADAS-noncog does not rate some common NPS, such as aggressiveness and anxiety, and includes other items not related to emotional behavior (e.g. tremors). The BEHAVE-AD is a 25-item questionnaire developed for pharmaceutical trials (Reisberg et al., 1996) in which a knowledgeable informant provides a severity rating for each item over the past two weeks. As with the ADAS-noncog, its reliability and validity have been established. However, the scale’s brevity provides only limited insight into NPS occurrence, and it may not be effective in monitoring behavior change (Perrault et al., 2000).
For about a decade, the Neuropsychiatric Inventory (NPI; Cummings et al., 1994) and its slightly modified version, the Neuropsychiatric Inventory Questionnaire (NPIQ; Kaufer et al., 2000), have been widely accepted as the standard measure of NPS in clinical trials for “anti-dementia therapies” and in several population-based studies in dementia and related disorders (Lyketsos et al., 2002; Geda et al., 2008). The NPI is a brief, quick-to-administer questionnaire consisting of a screening question and seven to nine items for each of 12 domains. A knowledgeable informant (usually a caregiver) indicates, in response to the screening question, whether the patient has experienced any domain-related NPS in the past month. If the informant answers the screening question affirmatively, he or she is then asked whether each item within the domain has occurred in the past month and provides a single global rating of frequency, severity and caregiver distress for all items in the domain together (not item by item).
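The screening logic described above can be illustrated with a minimal Python sketch. The `DomainRecord` structure and `administer_npi_domain` function are hypothetical names introduced only for illustration; they are not part of any published NPI software, and details such as how a domain is handled when the screening question is endorsed but no item is confirmed are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class DomainRecord:
    """Hypothetical record of one NPI domain from a single informant interview."""
    screened_positive: bool
    items_endorsed: List[bool] = field(default_factory=list)
    frequency: Optional[int] = None  # one global rating for the whole domain
    severity: Optional[int] = None
    distress: Optional[int] = None


def administer_npi_domain(
    screening_answer: bool,
    item_questions: List[str],
    ask_item: Callable[[str], bool],
    ask_global_ratings: Callable[[], Tuple[int, int, int]],
) -> DomainRecord:
    """Sketch of the NPI screening gate (hypothetical helper, not official software)."""
    # A "no" to the screening question skips every item in the domain.
    if not screening_answer:
        return DomainRecord(screened_positive=False)
    # Otherwise, ask whether each item occurred in the past month.
    endorsed = [ask_item(q) for q in item_questions]
    # Frequency, severity and caregiver distress are rated once for the
    # domain as a whole, not item by item.
    frequency, severity, distress = ask_global_ratings()
    return DomainRecord(True, endorsed, frequency, severity, distress)


# Usage with canned informant answers (illustrative only).
answers = {"item 1": True, "item 2": False, "item 3": True}
record = administer_npi_domain(
    True, list(answers), answers.__getitem__, lambda: (3, 2, 1)
)
print(record)
```

The point of the sketch is the gate: a negative screening answer removes the whole domain from the interview, which is what makes the NPI quick to administer in routine use.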
The NPI has established several important methodological approaches to NPS measurement. These include: (1) items that are behavior-based and observable, which facilitates report of frequency and severity of symptoms by a knowledgeable informant; (2) items that are grouped into domains with a screening question which enables quick completion and interpretation of results; (3) items that are specific to populations with dementia unlike other assessments that may apply to a nursing home population or to adults in general; and (4) standard ratings of domain frequency, severity and caregiver distress, unlike other assessments which may capture frequency or severity but not both.
Despite its many strengths, however, the NPI has certain weaknesses: (1) data are acquired from informants rather than directly from patients. As a result, neuropsychiatric data obtained from caregivers are susceptible to recall bias. Caregiver reports may be influenced by caregiver mood (e.g. he or she may be depressed), cultural beliefs (e.g. the caregiver’s views of how he or she should appropriately respond or of what is “normal” for older people to experience), denial (the caregiver’s minimization of symptoms) and/or the caregiver’s education; (2) the reliability of ratings for individual items, as opposed to global domain ratings, is unknown; (3) few items are specific to severe or mild dementia; (4) items in specific domains have limited depth, such that individual domains cannot be used as stand-alone measures in studies targeting individual NPS (e.g. depression, agitation); and (5) sensitivity to change is limited compared with measures that incorporate clinician judgment (Mayer et al., 2006).
To address limitations of the NPI and other measures of NPS, we have developed a revised and expanded version of the NPI, known as the NPI-C (“clinician rated”), which includes additional domains and individual items within various domains, and a clinician rating methodology. Here, we report on the development of the NPI-C and the results from an international validation study in which we examined inter-rater reliability and convergent validity for the NPI-C domains of apathy, delusions and hallucinations, agitation, aberrant vocalization and depression. We hypothesized a strong correlation between raters (intraclass correlations: ICC > 0.50) and moderate to strong convergent validity (r ≥ 0.40) between the NPI-C domain and a validated outside measure. We anticipate that the NPI-C will provide a single versatile method of rating for NPS that can be used in a range of clinical studies as either a “broad spectrum” measure of several types of NPS or as a domain-specific measure when a study calls for in-depth monitoring of a limited set of domains, such as apathy or agitation.
The NPI-C was developed and validated by an international collaborative group of researchers in this area, led out of The Johns Hopkins University by two of the authors (KdM and CGL), working closely with the original developer of the NPI (JC). The foundation of the new instrument is the original NPI, whose items and domains have been expanded as described below, and whose rating approach has been modified to include a clinician rating methodology. Additionally, several translations have been developed to make the new instrument available for international studies. Translated versions were developed using the language-specific validated NPI translation for original NPI items and translation/back translation for newly added NPI-C items. The NPI-C languages include English, French, Greek, Hungarian, Italian, Portuguese and Spanish.
Expansion of the domains was based on comprehensive review of 19 existing NPS measures. A master table of all symptoms was constructed and items were sorted based on existing NPI domains or on face similarities (e.g. vocalization) if an applicable domain did not exist. Items not corresponding to observable NPS and duplicate items were removed. A total of 78 new items were added to the nine domains of the NPI (see Table 1). No additional items were added to three NPI domains: hallucinations, delusions and elation. The face validity for the new structure and items was obtained through a Delphi panel of eight experts in dementia research (CL, FT, JK, PR, SG, HB, FS and Jacobo Mintzer, Professor of Neurosciences and Psychiatry at the Medical University of South Carolina) through an in-person meeting and email correspondence. (For the list of items, see Appendix A2 published as supplementary material online attached to the electronic version of this paper at http://www.journals.cambridge.org/ipg.) For the apathy domain, new items were chosen to fit as well as possible with the new diagnostic criteria for apathy reported by Robert and colleagues (2009).
We made two changes to the existing NPI domains based on the prevalence of specific NPS. The reported prevalence rates for agitation, measured separately from aggression, range from 28% to 53% (Brodaty et al., 2001). “Aggression” alone has been estimated to occur in 15%–20% of people with dementia (Lyketsos et al., 1999). Consequently, the NPI-C splits the NPI domain of “Agitation/aggression” into two separate domains, with new items added to each.
The domain “Aberrant vocalization” was added to capture symptoms most evident in advanced dementia. Although the prevalence of disruptive vocalization in long-term care residents ranges from 11% to 31% (Dwyer and Byrne, 2000), only one symptom related to vocalization is present in the NPI. Eleven items representing various types of vocal behavior were included in this domain after review and approval by the Delphi panel.
There are two important rating changes in the NPI-C. The first is item-by-item scoring. In the NPI, a knowledgeable informant provides a global domain rating for frequency, severity, and level of caregiver distress related to the group of items within a given domain. The domain score is the product of the ratings for domain frequency and severity. In the NPI-C, ratings for frequency, severity and distress are provided for each item and summed to create a total domain score. Item scores are better suited to clinical trials where assessment of change with finer detail is needed.
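The difference between the two scoring approaches can be sketched as follows. This is an illustration under stated assumptions, not an official scoring routine: the function names are hypothetical, the NPI ranges (frequency 1–4, severity 1–3) are the usual ones for the original instrument, and the NPI-C domain score is assumed here to be the sum of item-level clinician severity ratings (0 to 3), the quantity correlated with outside measures in this study.

```python
from typing import Sequence


def npi_domain_score(frequency: int, severity: int) -> int:
    """Original NPI: a single global frequency x severity product per domain."""
    if not (1 <= frequency <= 4 and 1 <= severity <= 3):
        raise ValueError("expected NPI frequency 1-4 and severity 1-3")
    return frequency * severity


def npic_domain_score(item_severities: Sequence[int]) -> int:
    """NPI-C-style (assumed): sum of item-level clinician severity ratings, 0-3 each."""
    if any(not 0 <= r <= 3 for r in item_severities):
        raise ValueError("clinician severity ratings are expected to be 0-3")
    return sum(item_severities)


print(npi_domain_score(3, 2))           # 6: one global rating for the domain
print(npic_domain_score([3, 3, 0, 0]))  # 6: item-level profile A
print(npic_domain_score([1, 1, 2, 2]))  # 6: item-level profile B, same total
```

The two item profiles receive the same total, but only the item-level record shows which symptoms drive the score; retaining that detail is what allows change to be tracked item by item in clinical trials.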
The second important rating change is the use of ratings based on expert clinical judgment using a “LEAD” standard (longitudinal data, expert rater, all data) to make severity ratings for individual items (Wilberg et al., 2000). In the clinician rating approach, the knowledgeable informant is first asked to provide frequency, severity and distress ratings for items as described above. Separately, the rater also interviews the patient. The importance of involving the patient in ratings of NPS in individuals with mild or moderate dementia cannot be overemphasized. Even if the patient lacks the ability or insight to describe experiences of NPS accurately, the interview gives the clinician rater an opportunity to compare the knowledgeable informant’s insights to the patient’s perceptions.
To rate various item responses, the clinician can ask for additional details during either interview and may consult other sources of information, such as the patient’s chart or other caregivers familiar with the case, in order to provide an overall severity rating for a given domain item. Such an approach has been successfully implemented in large multi-center clinical trials to rate depression in dementia using the Cornell Scale for Depression in Dementia (CSDD), where the clinician-rated CSDD demonstrated greater sensitivity to change in depressive symptoms than the NPI (Mayer et al., 2006). Clinical judgment also becomes important in distinguishing domains that may seem similar to a non-clinician, such as apathy and depression, but that a knowledgeable clinician can tell apart (Starkstein, 2000). The clinician rating approach also reduces bias arising from informant interviews, in which informants’ own experience with depression, cognitive decline, or other factors may affect their ability to report the patient’s symptoms accurately.
As part of the development of the NPI-C, the authors developed a training methodology for clinicians rating this measure. An instructional DVD which features complete NPI-C interviews with dyads and an accompanying workbook were used to illustrate how to obtain additional information from informants and patients effectively, and presented strategies to reconcile conflicting information in order to provide an overall severity rating for each item. A completed NPI-C worksheet based on the DVD interviews allowed the trainee to see the actual scores produced from the interview. A brief discussion with the training rater also provided detailed explanations regarding rating choices.
Ten sites from eight countries participated in the cross-sectional validation study. Sites were located in Argentina, Brazil, Canada, France, Greece, Hungary, Italy and the U.S.A. (three sites). A total of 128 pairs of participants (knowledgeable informants/patients) in three categories of dementia severity (mild, moderate and severe) were recruited. The majority of patients (79.5%) lived at home with a spouse or child; the remainder resided in various care institutions (e.g. assisted living, nursing homes, residential hospitals). Most knowledgeable informants were family members (46.4% spouses, 35.7% children, and 8.9% other relatives); only 9% were professional caregivers. All patients met criteria for probable Alzheimer’s disease (McKhann et al., 1984). Inclusion criteria for knowledgeable informants were the ability to comment accurately on NPS in the patient over the past month and verbal contact with the patient at least three times per week during the past three months. Inclusion criteria for patients were the presence of a knowledgeable informant and a medical diagnosis of probable Alzheimer’s disease. All sites obtained approval from their respective ethics review boards prior to the start of the study. Written informed consent was obtained from all participants.
Dementia severity was determined by the Mini-mental State Examination (MMSE; Folstein et al., 1975) and the Global Deterioration Scale (GDS; Reisberg et al., 1982), both of which were used to rate all patients. Mild dementia was defined by an MMSE of 18–26 and a GDS of 0–3, moderate dementia by an MMSE of 10–17 and a GDS of 4–5, and severe dementia by an MMSE < 10 and a GDS ≥ 6; these MMSE ranges correspond to those commonly accepted in clinical trials (Lopez et al., 2003; Perneczky et al., 2006). In cases where either the MMSE or the GDS score did not fall within the dementia severity range, the rating clinician determined the appropriate severity category.
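The severity rules above can be expressed as a small decision function. This sketch assumes that a patient is assigned a category automatically only when the MMSE and GDS bands agree, and that any disagreement or out-of-range score is passed to the rating clinician; that is one reading of the rule stated in the text, and the function names are hypothetical.

```python
from typing import Optional


def mmse_band(mmse: int) -> Optional[str]:
    """Severity band implied by the MMSE ranges used in this study."""
    if 18 <= mmse <= 26:
        return "mild"
    if 10 <= mmse <= 17:
        return "moderate"
    if 0 <= mmse < 10:
        return "severe"
    return None  # e.g. an MMSE above 26 falls outside every study range


def gds_band(gds: int) -> Optional[str]:
    """Severity band implied by the GDS ranges used in this study."""
    if 0 <= gds <= 3:
        return "mild"
    if 4 <= gds <= 5:
        return "moderate"
    if gds >= 6:
        return "severe"
    return None


def severity_category(mmse: int, gds: int,
                      clinician_category: Optional[str] = None) -> Optional[str]:
    """Assign a category when both instruments agree; otherwise defer to the clinician."""
    by_mmse, by_gds = mmse_band(mmse), gds_band(gds)
    if by_mmse is not None and by_mmse == by_gds:
        return by_mmse
    # Disagreement or an out-of-range score: the rating clinician decides.
    return clinician_category


print(severity_category(22, 3))              # both instruments say "mild"
print(severity_category(16, 3, "moderate"))  # bands disagree; clinician's call
```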
The NPI-C is to be administered by an experienced clinician. Of the 21 raters who participated in the study, 11 were physicians, 2 were research nurses, 7 were researchers with a master’s degree or higher, and 1 was a clinical social worker. Seven sites attended in-person training at Johns Hopkins in September 2008 and were provided with training notebooks and an NPI-C training DVD. The other three sites (Argentina, Hungary and Italy) were trained remotely using the same training notebooks and DVDs used at the in-person meeting, along with email support from the lead investigators.
To estimate the inter-rater reliability of the NPI-C and to compare its ratings with those of the NPI, each informant/patient dyad was interviewed at different times (no more than one week apart) by two independent, trained raters. Both raters completed the NPI (Cummings et al., 1994) and the NPI-C. All NPI and NPI-C questions were asked regardless of the caregiver’s response to the screening question. Because every question had to be asked for the purposes of the validation study, administration times would not reflect “real world use,” in which entire domains could be omitted based on the response to the screening question; administration times were therefore not recorded. When administering the NPI-C, raters first asked caregivers to rate the frequency and severity of each item in each domain, as well as their own level of distress for each endorsed item. Patients were also interviewed. Clinicians then rated severity (0 to 3) for each item based on the caregiver and patient interviews and any additional clinical information. Caregiver distress ratings were used to inform the clinician rating for each item but were not analyzed as a stand-alone rating in this study.
To estimate the convergent validity of specific domains of the NPI-C, one of the two raters also administered four additional measures: the Apathy Evaluation Scale (AES; Marin et al., 1991); the Cohen-Mansfield Agitation Index (CMAI; Cohen-Mansfield et al., 1989); the Cornell Scale for Depression in Dementia (CSDD; Alexopoulos et al., 1988); and the Brief Psychiatric Rating Scale (BPRS; Ventura et al., 1993).
Inter-rater reliabilities for clinician ratings per domain were determined by estimating intraclass correlations (ICC) (form (1,1); Shrout and Fleiss, 1979). New NPI-C items (but not original NPI items) with ICCs <0.5 were removed prior to analyses of convergent validity. Pearson correlations between four NPI-C and NPI domains and their respective validated and established measures of the corresponding NPS were used to determine convergent validity.
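For readers less familiar with the Shrout and Fleiss (1979) form (1,1), the following sketch computes a one-way random-effects, single-rater ICC from an n × k table (n dyads, k raters) using the standard mean-square formula ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) × MSW), where MSB and MSW are the between-target and within-target mean squares. It is a plain-Python illustration, not the study’s analysis code; returning `None` when the ratings show no variability at all mirrors the situation in which an ICC cannot be calculated.

```python
from typing import Optional, Sequence


def icc_1_1(ratings: Sequence[Sequence[float]]) -> Optional[float]:
    """One-way random-effects, single-rater ICC (Shrout and Fleiss form 1,1).

    ratings[i][j] is rater j's score for target i (here, a dyad's item rating).
    Returns None when there is no variability at all, so the ICC is undefined.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two rated targets")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("every target needs the same number (>= 2) of ratings")

    row_means = [sum(row) / k for row in ratings]
    grand_mean = sum(row_means) / n

    # Between-target and within-target mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        return None
    return (ms_between - ms_within) / denominator


# Two raters' clinician severity ratings (0-3) for one item across five dyads.
item_ratings = [[0, 0], [1, 1], [2, 3], [3, 3], [0, 1]]
print(icc_1_1(item_ratings))
```

In this study each item would contribute one such table of 128 dyads by 2 raters; an item that every rater scored 0 for every dyad yields `None`.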
A total of 128 patient/caregiver dyads completed NPI-C interviews with two raters, yielding 256 observations. Table 2 includes descriptive statistics of the sample by dementia severity for selected demographic characteristics and scores on the MMSE, GDS, AES, BPRS, CMAI and CSDD. By dementia severity, the sample comprised 58 mild, 49 moderate and 21 severe cases. By language, there were 37 English, 16 French, 27 Greek, 5 Italian, 10 Hungarian, 15 Portuguese and 18 Spanish dyads. Of the patients, 83 were married, 18 widowed, 10 divorced and 18 never married.
Inter-rater reliabilities based on clinician ratings of each NPI-C item were generally moderate to strong (Table 3). Eight existing NPI items and nine newly added NPI-C items (including three items in the new domain “aberrant vocalizations”) had ICC values <0.50. For seven items, ICCs could not be calculated because of lack of variability. Regardless of ICC value, original NPI items were not removed prior to the convergent validity analysis. A total of 17 new items with missing ICCs or ICCs <0.50 were removed, leaving a total of 142 NPI-C items (61 more items than in the original NPI). Table 1 includes a count of total items per domain for the NPI, the initial NPI-C and the NPI-C after items with low ICCs were removed. It also includes the percentage of items per domain for the NPI and NPI-C that were indicated (i.e. had a non-zero rating).
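The item-retention rule and the resulting item counts can be checked with a few lines of Python. `retain_item` is a hypothetical helper encoding the rule stated in the statistical analysis (original NPI items are always kept; new items are kept only if their ICC could be calculated and is at least 0.50). The original-NPI and initial NPI-C totals are derived from the figures reported in the text, not reported separately.

```python
from typing import Optional


def retain_item(is_original_npi: bool, icc: Optional[float],
                threshold: float = 0.50) -> bool:
    """Hypothetical helper encoding the study's item-retention rule."""
    if is_original_npi:
        return True  # original NPI items were kept regardless of ICC
    # New NPI-C items need a calculable ICC of at least the threshold.
    return icc is not None and icc >= threshold


# Item counts reported in the text, plus two derived totals.
new_items_added = 78
new_items_removed = 17
items_retained = 142
extra_items_vs_npi = 61

original_npi_items = items_retained - extra_items_vs_npi   # derived: 81
initial_npic_items = original_npi_items + new_items_added  # derived: 159

# The reported counts are internally consistent.
assert initial_npic_items - new_items_removed == items_retained
assert new_items_added - new_items_removed == extra_items_vs_npi
```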
Table 4 includes Pearson correlations and confidence intervals for four NPI-C domains and their corresponding outside measure. Correlations for the sum of clinician ratings of NPI-C domains and original NPI domain scores (global rating of frequency × severity) are also presented. Correlations for all NPI-C domains and corresponding assessments were moderate to strong, with “depression/CSDD” having the strongest correlation (r = 0.61). Except for the correlations between “delusions” and “hallucinations” and the psychosis items of BPRS, where there was little difference between the clinician rating (0.60) and the sum of the two global domain caregiver ratings (0.56), the clinician ratings had stronger correlations with the validated measures. This could be attributable to the addition of new items, the clinician rating method, or both.
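The following sketch shows one common way to obtain a Pearson correlation and an approximate 95% confidence interval using the Fisher z-transformation. The text does not state how the reported intervals were computed, so the choice of method is an assumption for illustration, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation between paired scores."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r: float, n: int,
              z_crit: float = 1.959963984540054) -> Tuple[float, float]:
    """Approximate 95% CI for r via the Fisher z-transformation."""
    if not -1 < r < 1 or n <= 3:
        raise ValueError("need |r| < 1 and n > 3")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


low, high = fisher_ci(0.31, 128)
print(f"r = 0.31, n = 128: 95% CI {low:.2f} to {high:.2f}")
```

With these inputs the interval is roughly 0.14 to 0.46; intervals computed by other methods, or with a smaller n such as the apathy analysis after the Brazil site was excluded, will differ.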
Of special interest is the difference in strength of correlation with the CMAI between the NPI-C agitation domain (r = 0.40) and the traditional NPI global rating of agitation items (r = 0.19). The correlation between the NPI and the CMAI strengthened when the global ratings of “agitation” and “aggression,” which form a single domain on the NPI, were summed (r = 0.31; CI: 0.14 to 0.48). The correlation strengthened considerably for the NPI-C when the ratings of “agitation” and “aberrant vocalization” were summed (r = 0.60). The rationale for summing these two domains is that the CMAI includes items about verbal aggression. “Aberrant vocalization” was added as a new domain to the NPI-C and is not found in the NPI.
In this paper, we report on the development of a new state-of-the-art instrument to measure NPS in people with dementia and the results from an international validation study. The NPI-C builds on the existing strengths of the NPI and addresses several of its weaknesses. Overall, the NPI-C is responsive to shortcomings in NPS measures and has the potential to function as a single tool that could be used in clinical trials either as a “broad spectrum” assessment across many domains or for “in-depth” monitoring of a limited set of domains (Lyketsos, 2007).
We first note that nested within the NPI-C are the original NPI items. This provided us with the opportunity to examine ICCs and convergent validity using clinician ratings for this widely used measure. Twelve original items had ICC values <0.5, the largest share of which (n = 5) were in the domain of hallucinations. One hallucinations item (item 4) lacked variability, and three (items 5–7) had an ICC value of 0.00. This discrepancy may reflect conflicting responses by caregivers and/or patients regarding the occurrence of various types of hallucination-related behaviors and/or the relative infrequency of these behaviors, in which case any small discrepancy would affect the ICC value. Other NPI domains with low-ICC items were delusions (item 7), elation (item 5), disinhibition (item 4), aberrant motor behavior (items 4 and 6) and sleep disturbances (item 6).
For the NPI-C items that were not part of the original NPI, the domain with the highest number of items with ICC <0.50 was “aggression.” Five items lacked any variability, suggesting that their occurrence may be too rare for them to be worth retaining. Three new items in “agitation” also had ICC values <0.50. Overall, a total of 17 items were removed (see Table 1). ICCs for the remaining items were moderate to strong. Although three items in the new domain of aberrant vocalizations were removed (ICC = 0.23), the eight remaining items had values ranging from 0.70 to 0.96. This suggests that the new domain may provide useful and previously unreported information on this behavior, which is relevant for people in the moderate to severe stages of dementia.
Convergent validity was moderate to strong for the four domains and their corresponding measures. “Apathy” had the weakest correlation (r = 0.31) for the NPI-C. We note that the correlation for the NPI global domain rating was lower still (r = 0.22). One factor possibly contributing to the weaker correlation is the absence of data from the Brazil site: inconsistencies in the translation of the AES necessitated the exclusion of data from this site (n = 15). Although Camozzato and colleagues (2008) reported reliability data for the Brazilian Portuguese version of the NPI, they did not examine validity, specifically whether the cultural interpretation of apathy-related items by Brazilian caregivers was similar to that of caregivers using other languages. It is interesting to note that Camozzato et al. found higher severity and distress scores in the apathy domain than in any other NPI domain.
The correlation between “agitation” and the CMAI for the NPI-C was moderate (r = 0.40) but increased substantially when the domain of aberrant vocalization was added (r = 0.60). The correlation was very weak, however, for the NPI agitation domain rating (r = 0.19). Although the correlation for the NPI strengthened with the addition of aggression (r = 0.31), it was still weaker than the correlation between the NPI-C agitation domain alone and the CMAI. This points to the potential usefulness of the NPI-C agitation domain as a “stand-alone” measure in trials. Another domain that shows promise as a stand-alone measure is the depression/dysphoria domain of the NPI-C (r = 0.61), which correlated considerably more strongly with its outside measure than the same NPI domain did (r = 0.31).
At this stage of scale development, the NPI-C has not yet been incorporated into any clinical trials, and its sensitivity to change is unknown. The raters in the current study were experienced in dementia research and in the assessment of NPS, and came from different clinical and research backgrounds. They included nurses, physicians, gerontologists, social workers and others with research expertise. The performance of the NPI-C in multicenter trials with raters of varying levels of expertise requires further investigation. Because of small sample sizes at each site, we were not able to assess reliability across languages and sites at this time, but we will address this shortcoming in future studies.
Overall, the study results demonstrate the utility of the NPI-C as a measuring tool of NPS in clinical trials. There are several notable advantages to this measurement approach. The NPI-C allows the flexibility of simultaneously administering the NPI. Since original NPI items are included, researchers can record NPI-C data in addition to NPI scores, which will facilitate cross-trial and site comparisons. Several NPI-C domains also show promise as stand-alone measures, which will also facilitate study comparison and eliminate the need to include other outside measures. This can improve uniformity in study design and reduce error and administration time. The availability of the NPI-C in several languages through this validation study is another added benefit.
In summary, the NPI-C is a versatile tool that can accurately measure several NPS. It uses a uniform scale system, which will facilitate data comparisons across studies. The NPI-C may be useful in several settings, including clinical trials, observational studies, and potentially clinical practice.
The research reported in this paper was partially funded by an educational grant from Forest Pharmaceuticals, Inc. (Dr. de Medeiros) and grant P50AG005 from the Johns Hopkins Alzheimer’s Disease Research Center (Dr. Lyketsos). Dr. Geda was supported by NIMH (K01 MH068351) and Harold Amos Scholar (RWJ Foundation) grants. Dr. Leoutsakos was supported by the National Institute on Aging (RAG031348Z NIH/NIA). Dr. Robert received support from Lundbeck France. Dr. Taragano was supported by Lina Esevich grant #310618 to the CEMIC University Hospital Dementia Research Unit.
Conflict of interest
Description of authors’ roles
K. de Medeiros and C. Lyketsos co-designed the study, supervised the data collection and wrote the paper. P. Robert, S. Gauthier, F. Taragano, J. Kremer, A. Porsteinsson, Y. Geda and F. Stella contributed to the study design, participated in the Delphi panel, supervised data collection at their sites and assisted with writing the paper. H. Brodaty contributed to the study design, participated in the Delphi panel and assisted with writing the paper. A. Politis, A. Brugnolo and G. Gazdag supervised data collection at their sites and assisted with writing the paper. J. Leoutsakos contributed to the statistical design and data analysis and to writing the paper. J. Cummings assisted with study content and design.