Objective: To address the need for brief, reliable, valid, and standardized quality of life (QOL) assessment applicable across neurologic conditions.
Methods: Drawing from larger calibrated item banks, we developed short measures (8–9 items each) of 13 different QOL domains across physical, mental, and social health and evaluated their validity and reliability. Three samples were used during short form development: a general population sample (Internet-based, n = 2,113), a clinical panel (Internet-based, n = 553), and a clinical outpatient sample (clinic-based, n = 581). All short forms are expressed as T scores with a mean of 50 and SD of 10.
Results: Internal consistency (Cronbach α) of the 13 short forms ranged from 0.85 to 0.97. Correlations between short form and full-length item bank scores ranged from 0.88 to 0.99 (0.82–0.96 after removing common items from the banks). Online respondents were asked whether they had any of 19 different chronic health conditions, and whether those reported conditions interfered with their ability to function normally. All short forms, across physical, mental, and social health, separated people who reported no health condition from those who reported 1–2 conditions or 3 or more. In addition, scores on all 13 domains were worse for people who acknowledged being limited by the health conditions they reported than for those who reported conditions but were not limited by them.
Conclusions: These 13 brief measures of self-reported QOL are reliable and show preliminary evidence of concurrent validity inasmuch as they differentiate people based on the number of reported health conditions and on whether those conditions impede normal function.
In neurology clinical research, traditional outcome measures of disease status often fail to represent the full impact of disease and treatment. The patient's experience of disease symptoms, treatment side effects, functioning, and well-being—commonly referred to as health-related quality of life (QOL)—is often not included in a systematic evaluation of clinical benefit. Yet, the patient's experience of disease and treatment can be the key driver of treatment impact, acceptability, or value.1 While many QOL scales are available to the neurology clinical researcher, some have questionable validity or may be difficult to interpret. In addition, different instruments tend to be used in different neurologic conditions, rendering cross-disease evaluations of QOL burden or benefit impossible.2–6 Even within a given condition, there is seldom consensus on common measures, which impedes cross-study comparisons of relative disease burden, benefits of different treatments, or other factors.
In an effort to address these limitations, the National Institute of Neurological Disorders and Stroke (NINDS) sponsored a multisite project to develop a clinically relevant and psychometrically robust QOL assessment tool for adults and children with neurologic disorders.7,8 This effort, Neuro-QOL, enables clinical researchers to compare the QOL impact of different interventions within and across various conditions. In this article, we summarize the development and validation of the first generation of 13 brief adult Neuro-QOL short forms for use in clinical neurology research.
To build Neuro-QOL item banks, we followed a series of steps designed to ensure clinical and psychometric validity. These steps included identifying the needs of the clinical research community,8,9 ensuring clinical and patient-driven evidence of importance and relevance of the selected QOL domains, and an expert consensus-based selection of priority conditions.10 We combined input from patient and caregiver focus groups11,12 with expert input and a literature review, to determine the QOL domains to include in Neuro-QOL.8 We then conducted large-scale testing to calibrate item response theory (IRT)–based13 item banks across physical, mental, and social domains of QOL.14,15 Each Neuro-QOL bank includes a large collection of items (questions and their response options) that have been evaluated and tested to ensure their relevance, clarity, fit with the concept being measured, and informativeness.15 This produced 13 adult QOL item banks (anxiety; depression; fatigue; upper extremity function–fine motor, activities of daily living; lower extremity function–mobility; applied cognition–general concerns; applied cognition–executive function; emotional and behavioral dyscontrol; positive affect and well-being; sleep disturbance; ability to participate in social roles and activities; satisfaction with social roles and activities; stigma).
Neuro-QOL item banks enable researchers to select or design static short form measures or to administer a dynamic computerized adaptive test (CAT).16,17 The Neuro-QOL CAT can be tailored to each respondent, selecting the most informative next question based on previous responses. In general, CATs provide the most precise estimate of patients' health status with the fewest questions, because only the most informative items are selected iteratively.18 A second option is the use of short forms, or subsets of questions from the bank. Short forms can be of any length, ranging from 1 question to 20 or more. In each case, the score generated for the respondent is expressed on a common metric or scale. For Neuro-QOL, we report scores on a T-score metric, with the mean of the reference population set to 50 and the SD set to 10 units. Based upon the samples required to obtain stable item statistics, Neuro-QOL T scores are anchored either to the general US population (designated “GPT” scores) or to a clinical population (designated “CT” scores).
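On this metric, converting an IRT ability estimate to a T score is a simple linear rescaling. The sketch below is a generic illustration (the function name is mine, not part of any Neuro-QOL software), assuming theta estimates standardized to the reference population (mean 0, SD 1):

```python
def theta_to_t(theta):
    """Convert a standardized IRT theta estimate (reference-population
    mean 0, SD 1) to the T-score metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta

# A respondent at the reference-population mean scores T = 50;
# one SD above the mean scores T = 60.
print(theta_to_t(0.0))   # 50.0
print(theta_to_t(1.0))   # 60.0
```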
All research activities reported here received Institutional Review Board approval and all participants provided informed consent.
The samples studied for this work are described in detail elsewhere.14,15 Briefly, we engaged 3 adult samples for item pool testing. The US general population sample included a total of 3,123 English-speaking and Spanish-speaking respondents recruited through an online panel company; data from the subset of 2,113 English speakers were used for calibrating item parameters and setting the central (average) location for each T score (mean = 50; SD = 10). This subset was 50% male (mean age = 52.7 years; SD = 15.5 years). These 2,113 participants were divided into 4 blocks of at least 500, with each block given 1 of 4 item pools: physical function, emotional health, social function, and cognitive function. A second clinical panel sample of 553 people with a physician-confirmed diagnosis of epilepsy, stroke, amyotrophic lateral sclerosis (ALS), multiple sclerosis (MS), or Parkinson disease (PD) was recruited by a different online panel company to calibrate the stigma bank and disease-targeted scales (because such scales cannot be answered by people without a medical diagnosis). The clinical panel sample was 53% male with a mean age of 56.2 years (SD = 12.8 years).15 Finally, a third sample of 581 outpatient neurology patients was drawn from collaborating neurologists around the United States and Puerto Rico, and included patients with physician-diagnosed epilepsy, stroke, ALS, MS, or PD. Comorbidity was not an exclusionary criterion. This sample, the clinical outpatient sample, was 46% male (mean age = 55.2 years; SD = 14.3 years). These patients were included in the initial item calibration study11 to obtain reliable calibrations for the physical function, applied cognition, and sleep item banks. Demographic details on each sample are provided in table 1.
We asked respondents in both online samples the following yes/no question: “Have you ever been told by a doctor or health professional that you have <condition/disease>?” We queried for 19 conditions: hypertension; chest pain; coronary artery disease; heart failure or congestive heart failure; heart attack; stroke; migraines or severe headaches; diabetes, high blood sugar, or sugar in urine; cancer (other than nonmelanoma skin cancer); depression; anxiety; alcohol or drug problem; sleep disorder; HIV or AIDS; spinal cord injury; MS; PD; epilepsy; and ALS. For each condition endorsed, we then asked: “Are any of your current activities limited by your <condition/disease>?” (yes or no). Respondents were then sorted into 3 groups by number of reported conditions (0; 1–2; 3 or more), and all respondents who reported at least 1 condition were further sorted, within those categories, into “yes” or “no” activity limitation.
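The sorting described above amounts to a simple binning rule. A minimal sketch, with hypothetical function names of my own (not from the study's analysis code):

```python
def comorbidity_group(n_conditions):
    """Bin a respondent by number of reported chronic conditions
    into the groups described in the text: "0", "1-2", or "3+"."""
    if n_conditions == 0:
        return "0"
    return "1-2" if n_conditions <= 2 else "3+"

def classify(limitation_answers):
    """limitation_answers holds one yes/no (True/False) limitation
    answer per endorsed condition. A respondent counts as activity-
    limited if ANY reported condition limits current activities;
    limitation status is undefined (None) with no reported conditions."""
    n = len(limitation_answers)
    limited = any(limitation_answers) if n else None
    return comorbidity_group(n), limited
```

Note the rule from the text that a respondent limited by only a subset of reported conditions still counts as “yes” for activity limitation, hence the `any()`.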
We hypothesized that respondents who reported more conditions, and those who reported activity limitation, would have worse QOL.
Each short form was constructed using the same approach. Starting with item statistics generated from the IRT item calibrations (response category threshold and slope parameters),15 we ranked items by the amount of information they provided across the range of what was being measured (e.g., applied cognition). We also ran CAT simulations to identify items selected early in the procedure. Because the CAT algorithm weights information heavily in item selection, there was overlap between information ranks and CAT ranks, although some items were ranked highly in one but not the other criterion. Ten doctoral level clinical and measurement experts (3 neurologists; 4 clinical psychologists; 1 occupational therapist; 1 social worker; 1 neuropsychologist) then reviewed each candidate item for relevance and appeal based on item content only. Item performance (information and CAT rank order) was not shared with experts prior to their ratings. Experts identified their 5 most-preferred and 5 least-preferred items in the calibrated bank. Individual preferences of each rater were then presented along with item performance statistics.
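The information-based ranking can be illustrated with a toy example. Neuro-QOL items are polytomous (graded response model), but the same idea is easiest to see for a dichotomous 2PL item, whose Fisher information at ability θ is I(θ) = a²P(θ)(1 − P(θ)). The item names and parameters below are hypothetical, chosen only for illustration:

```python
import math

def info_2pl(a, b, theta):
    """Fisher information of a dichotomous 2PL item at ability theta:
    I(theta) = a^2 * P * (1 - P), with P the logistic response curve."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Hypothetical (slope a, threshold b) parameters for three items.
items = {"item_A": (2.0, 0.0), "item_B": (1.2, 0.5), "item_C": (0.8, -1.0)}

# Rank items by information at the center of the scale (theta = 0),
# analogous to the information ranking described in the text.
ranked = sorted(items, key=lambda k: info_2pl(*items[k], 0.0), reverse=True)
print(ranked)   # ['item_A', 'item_B', 'item_C']
```

A CAT uses the same quantity dynamically: at each step it administers the unanswered item with maximal information at the current theta estimate, which is why information ranks and CAT selection order overlap but need not coincide.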
With the above information tabulated, we (D.C., D.V., S.C., J.-S.L., C.N., D.M., N.R.) identified items with strong psychometric characteristics (IRT model fit; highly informative; selected early by CAT) and high appeal to clinical raters. We discussed marginal item choices (e.g., high clinical appeal but relatively weak psychometric performance) until we reached consensus regarding item inclusion in the short form. We also considered 2 other goals. One was minimizing respondent burden: if 1 of 2 nearly equal items shared the same response options as the items already selected, that item was chosen. The other was inclusion of items from the Patient-Reported Outcomes Measurement Information System (PROMIS; www.nihpromis.org), provided they were calibrated with other items in the Neuro-QOL bank and were not ranked very low by either source of input. This extra step was taken to maximize the probability that Neuro-QOL can be linked (cross-walked) to the PROMIS item banks.
Using the samples described above, we produced T scores for each of the 13 short forms (the sleep disturbance bank comprises only 8 items, so it is used in its entirety rather than shortened). We computed the Cronbach α coefficient to evaluate internal consistency, and correlated each short form score with the score derived from the full item bank (correcting for item overlap by removing short form items from the full bank before correlating scores). We evaluated the ability of each short form to 1) differentiate people who reported no health condition from those who reported 1–2 conditions or 3 or more conditions, and 2) differentiate people who reported activity limitation due to a health condition from those who reported no limitation. Finally, we compared the results of these 2 comparisons with results of the same comparisons using the corresponding full item banks.
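Cronbach α can be computed directly from a respondents-by-items score matrix using the standard formula α = k/(k−1) · (1 − Σ item variances / total-score variance). A minimal pure-Python sketch (illustrative only; real analyses would use a statistical package):

```python
def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix
    (list of rows, one row per respondent), using population
    variances throughout."""
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(rows[0])                                      # number of items
    item_vars = [var([r[j] for r in rows]) for j in range(k)]
    total_var = var([sum(r) for r in rows])               # variance of sum scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

The overlap-corrected bank correlation reported above follows the same spirit: recompute the bank score after dropping the short form's items, then correlate that reduced-bank score with the short form score.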
We began with 13 item banks (table 2). Applying the approach described above, we developed 8- or 9-item short forms for banks that included more than 10 items. We then calculated internal consistency of each short form using Cronbach α coefficient, and evaluated correlations between each short form and its respective bank, using Spearman rank order correlation for ordinal data. However, because this association can be inflated by redundancy of items in both scales, especially with smaller banks, we also report Spearman correlations between short forms and banks after excluding items that comprised the actual short forms. These results can be found in table 2.
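The Spearman rank-order correlation used here is simply the Pearson correlation of rank-transformed scores (with tied values sharing average ranks). A pure-Python stand-in for illustration (in practice one would use a library routine such as scipy.stats.spearmanr):

```python
def average_ranks(xs):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because ranks depend only on order, a monotone relation between short form and bank scores yields rho near 1 even when the raw score scales differ.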
After establishing their internal consistency reliability and strong association with full bank scores, we examined the association of each short form with all other short forms. Our purpose was to ensure relatively higher correlations with related concepts (e.g., depression and anxiety, lower extremity and upper extremity physical function), and relatively lower correlations with unrelated concepts (e.g., depression and lower extremity function). These results are available in table 3.
To evaluate the validity of the 13 developed short forms, we tested their ability to differentiate subgroups of participants hypothesized to have poorer QOL. Using 13 separate one-way analyses of variance, we compared participants who reported 0, vs 1–2, vs 3 or more comorbid conditions. In 3 of the 13 comparisons (fatigue, emotional and behavioral dyscontrol, stigma) there were no respondents with “0” diagnoses reported, because this sample (the “clinical panel sample”) was selected on the basis of having a neurologic disorder (PD, MS, epilepsy, stroke, or ALS). In every analysis, a significant F value was obtained with mean differences between groups in the hypothesized direction (more conditions associated with worse QOL). Usually, the difference in mean score was greater between the 1–2 vs the 3 or more group than between the 0 and the 1–2 group (table 4).
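Each of these known-groups tests is a standard one-way ANOVA, whose F statistic is the ratio of between-group to within-group mean squares. A minimal pure-Python version for illustration (a real analysis would use a statistical package and also report the p value from the F distribution):

```python
def one_way_f(groups):
    """F statistic for a one-way ANOVA across k groups:
    F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Groups with identical means give F = 0; the further apart the group means are relative to within-group spread, the larger F becomes.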
Within each comorbidity group (1–2 and 3 or more), we next divided participants into those who reported activity limitation from their health condition vs those who did not. If a patient with multiple conditions reported activity limitation for only a subset of those conditions, he or she was counted as “yes” with regard to activity limitation from disease. The results of these comparisons, including t tests and effect sizes for group differences, are also reported in table 4. For each of the 2 comorbidity groups (1–2 and 3 or more), across all 13 short forms (i.e., 26 comparisons), the t test was significant. Effect sizes for the difference in score between those with vs without activity limitation from their health condition were moderate to very large (effect size range = 0.43–1.58). As a final check on the performance of the short forms relative to the full item banks, we plotted short form scores against full bank scores for every subgroup represented in table 4, and for the overall group. Correlations between short form and full bank scores ranged from 0.97 to 1.00 within each subgroup; the overall correlation was 0.91. These relationships are plotted in the figure.
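The effect sizes reported here are standardized mean differences. The paper does not state its exact formula, so treat the following as illustrative; it shows the common Cohen's d with a pooled SD:

```python
def cohens_d(a, b):
    """Cohen's d for two independent groups, using a pooled SD
    (sample variances with n-1 denominators)."""
    def mean(xs):
        return sum(xs) / len(xs)

    def svar(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    na, nb = len(a), len(b)
    pooled_sd = (((na - 1) * svar(a) + (nb - 1) * svar(b))
                 / (na + nb - 2)) ** 0.5
    return (mean(a) - mean(b)) / pooled_sd
```

On this convention, the reported range of 0.43–1.58 corresponds to group mean differences of roughly half an SD up to more than one and a half SDs.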
We report on the development and initial validation of 13 brief measures of QOL for adults with neurologic disorders. Each Neuro-QOL short form comprises a set of items carefully selected from its item bank to enhance estimation of a patient's health status. Each short form contains 8 to 9 items and can be completed in less than 2 minutes by the typical patient.19 A profile of 6 selected domains, for example, would require approximately 10 minutes to complete. This compares favorably with other QOL instruments in common use, which can take 20 minutes or more. Scoring look-up tables are available in appendix e-1 on the Neurology® Web site at www.neurology.org. Researchers may also design their own short forms by selecting items from the item banks; in that case, scoring and conversion to the T-score metric can be done with directions also provided in appendix e-1. The short forms reported here provide a practical opportunity for multidimensional assessment in neurologic clinical research or practice. Over time, as published experience accumulates, the interpretability of specific scores and score changes will improve. For now, interpretation of these scales rests on the reference point provided by the T score. Specifically, the following 4 banks were referenced against a clinical neurology population: sleep disturbance, fatigue, emotional and behavioral dyscontrol, and stigma. T scores from the other 9 item banks are referenced against the US general population. Therefore, when interpreting sleep disturbance, fatigue, emotional and behavioral dyscontrol, or stigma scores, one should consider a score of 60, for example, to be 1 SD higher (worse) than the average of the clinical neurology sample described here and elsewhere.11 When interpreting scores on the other 9 item banks or short forms, one should consider the reference group to be the US general population.
That same score of 60 would be 1 SD higher (worse) than the average US resident (rather than neurology patient) on anxiety or depression, and 1 SD higher (better) than the average US resident on upper and lower extremity function, applied cognition, positive affect and well-being, ability to participate in social roles and activities, and satisfaction with social roles and activities.
The Neuro-QOL measurement system is intended to be brief, reliable, valid, responsive, and consistent enough across the selected conditions to allow cross-disease comparison, yet flexible enough to capture condition-specific QOL issues. However, there are limitations in the current work that can be addressed in future research. First, the calibration samples were essentially samples of convenience, with most respondents recruited through Internet panel companies. The impact of this sampling strategy is likely negligible with regard to the integrity of the item statistics (“calibrations”), because what matters most for calibration is obtaining a full range of responses to the items administered. When the general population sample did not provide sufficient responses in the most impaired response options (as occurred for the physical function, applied cognition, and sleep banks), we supplemented with cases from the clinical outpatient sample to obtain stable item parameter statistics. However, the predominant use of an Internet panel sample raises questions about the generalizability of the results and the interpretation of T scores. Further research with populations that are not regular Internet users and with those of limited reading ability will be important. In addition, it will be very important to evaluate use of these short forms with patients who have limited functional or expressive ability and with proxies. Finally, although we developed Spanish-language equivalents of all 13 QOL measures, they have not been formally tested or evaluated.
Standardized QOL evaluation systems such as Neuro-QOL can inform health care accountability, from patient care to health care policy. They do so by improving assessment of patient-reported outcomes and disease burden in neurologic diseases, increasing measurement consistency across neurologic clinical research, and offering a common metric with which to express burdens of disease and benefits of treatment. Over time, accumulated experience and published results with Neuro-QOL measures will support their use in a variety of applications, from clinical trials to broader comparative effectiveness research, cross-sectional and longitudinal observational cohort studies, health care delivery observational and intervention studies, and population-based research.
The authors thank their project manager, Vitali Ustsinovich, MA.
Statistical analyses were conducted by R. Bode, S. Choi, J.-S. Lai, N. McKinney, and T. Podrabsky.
Dr. Cella has received research support from the National Institute of Neurological Disorders and Stroke (NINDS) contract number HHSN265200423601C. Dr. Lai reports no disclosures. Dr. Nowinski receives or has received research support from the NIH (contracts #HHSN265200423601C and #HHSN260200600007C) and Teva Pharmaceuticals. She has also received honoraria for writing an article for Medlink. Dr. Victorson holds stock options in Eli Lilly and Company, received honoraria for serving on the Steering Committee of the Reeve Neuro-Recovery Network, was funded by NIH contracts #HHSN265200423601C and #HHS-N-260–2006-00007-C and grants #R01HD054569–02NIDRR, #1U01NS056975–01, and #R01 CA104883, received support from the American Cancer Society (national and Illinois Division) for research in prostate cancer, received institutional support from NorthShore University HealthCare System for research in prostate cancer, received institutional support from the Medical University of South Carolina for sarcoidosis research, and received institutional support from the Northwestern Medical Faculty Foundation for urology research. Dr. Peterman receives royalties from an online entry in UpToDate; was supported by NIH Contract #HHSN265200423601C for the study on which this manuscript reports; and received institutional funding for research on spirituality and health. Dr. Miller has received research support from NIH contract HHSN26520043601C and TEVA NeuroSciences. She has received consulting fees from Biogen Idec and the Consortium of Multiple Sclerosis Centers. She serves on the editorial board of Quality of Life Research. Dr. Bethoux receives honoraria for consulting from Medtronic Inc., Biogen Idec, and IMPAX Laboratories.
Dr. Heinemann receives salary support from a variety of federal research and training grants: Enhancing Quality of Prosthetic and Orthotic Services with Process and Outcome Information, National Institute on Disability and Rehabilitation Research (H133E080009); Rehabilitation Research and Training Center on Improving Measurement of Medical Rehabilitation Outcomes (H133B090024), National Institute on Disability and Rehabilitation Research; Midwest Regional Spinal Cord Injury Care System (H133N060014), National Institute on Disability and Rehabilitation Research; Midwest Regional Traumatic Brain Injury Care System, National Institute on Disability and Rehabilitation Research (H133A080045); Development of Quality Measures for Post-Stroke Rehabilitation, NIDRR/USDE. Dr. Rubin, Dr. Cavazos, and Dr. Reder report no disclosures. Dr. Sufit has served on Data Safety and Monitoring Boards for the NINDS and Pfizer Pharmaceuticals. He has received honoraria for speaking engagements from Hill-Rom. He has served as an expert in medical malpractice litigation. Dr. Simuni reports no disclosures. Dr. Holmes serves on the Advisory Board for Questcor Pharmaceuticals, Sunovion Pharmaceuticals, and Upsher-Smith Laboratories. Dr. Siderowf is supported by a Morris K. Udall Parkinson's Disease Research Center of Excellence grant from NINDS (NS-053488), and has been supported by SAP4100027296, a health research grant awarded by the Department of Health of the Commonwealth of Pennsylvania from the Tobacco Master Settlement Agreement under Act 2001–77. Dr. Wojna has received personal compensation for activities with GlaxoSmithKline as a speaker and research support from Biogen Idec, and is funded by NIH grants #2U54NS43011, #U54RR022762 (pilot study), and #S11NS46278. Dr. Bode, N. McKinney, T. Podrabsky, K. Wortman, and Dr. Choi report no disclosures.
Dr. Gershon has received personal compensation for activities as a speaker and consultant with Sylvan Learning, Rockman, and the American Board of Podiatric Surgery. He has several grants awarded by NIH: N01-AG-6–0007, 1U5AR057943–01, HHSN260200600007, 1U01DK082342–01, AG-260–06-01, HD05469, NINDS: U01 NS 056 975 02, NHLBI K23: K23HL085766, NIA: 1RC2AG036498–01, NIDRR: H133B090024, OppNet: N01-AG-6–0007 (PI: David Cella). Dr. Rothrock receives royalties from the publication of “Evaluation of Health-Related Quality of Life” in UpToDate. She receives research support from the National Institutes of Health and previously the Centers for Disease Control and Prevention. Dr. Moy reports no disclosures. Go to Neurology.org for full disclosures.