Identifying criteria for the acceptance of neurology HRQL measures
An early task was to gain understanding of what the neurology research community required in an HRQL measure in order to be interested in using it. This involved identifying objective criteria that should be met by the system. It also included an evaluation of investigator attitudes and beliefs that might need to be addressed in order to facilitate adoption. Since little is known about the factors influencing the use of HRQL measures in neurology, we modified an existing survey originally developed to examine use of HRQL data in oncology practice,1,2 and used it to gather empirical information about the perspectives of neurologists and affiliated professionals regarding HRQL and HRQL instruments.
Drawing names from our consultant pool, a list of NINDS reviewers and grantees, and members of the American Academy of Neurology and the American Congress of Rehabilitation Medicine, we submitted a request for information to 719 neurology professionals. We received 103 responses (14%), with complete data available for item-level analysis on 89. The 89 responders reported a median age of 51 (33–89), were primarily male (70%), and had practiced a median of 22 years; the largest proportions came from the professions of Neurology (47%) and Physiatry (15%). Sixty-seven (78%) experts saw only adult patients, 9% saw only pediatric patients, and 13% saw both. The vast majority (93%) had experience as an investigator in a clinical trial, and 54% reported having used HRQL measures.
Sixty-six respondents provided qualitative data indicating HRQL measures should: 1) possess satisfactory psychometric properties (50% of all respondents); 2) be easy to administer and use (50%); 3) contain content reflecting the patient perspective and the diversity of symptoms and HRQL domains impacted by neurological disorders (27%); and 4) be clinically relevant and directly applicable to patient care (17%). Factor analysis of quantitative responses revealed two major perspectives (which we labeled Enthusiasm and Reluctance) that reflected positive or negative viewpoints toward HRQL. A median split on the enthusiasm and reluctance scales created four separate groups: high enthusiasm, low enthusiasm, high reluctance and low reluctance. Cross tabulations on these groups revealed four distinct patterns of respondents: enthusiastic (high enthusiasm/low reluctance; n=25); reluctant (high reluctance/low enthusiasm; n=33); uncommitted (low reluctance/low enthusiasm; n=14); and reluctantly enthusiastic (high reluctance/high enthusiasm; n=17). Using a general linear model and Scheffé’s post-hoc tests, we compared these four groups to determine the nature of any differences.
When compared to other groups, those who were enthusiastic believed that HRQL can be objectively measured (p=.01) and reported finding HRQL data more helpful in understanding their patients (p<.001) and more useful in changing their practice (p=.001). Compared to other groups, reluctant respondents preferred focusing on clinical care over HRQL issues (p<.001). The uncommitted and reluctantly enthusiastic groups were more likely to report willingness to use HRQL measures if they could be shown to be clinically relevant (p<.01). Finally, reluctantly enthusiastic respondents were most likely to acknowledge that HRQL confirms clinical experience (p<.01) and to say that their use of HRQL measures would increase if the measures were easier to understand.
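The median-split grouping described above can be illustrated with a short sketch. This is not the authors' analysis code: the function name, the input format (one enthusiasm score and one reluctance score per respondent), and the handling of scores that fall exactly on the median (treated here as "low") are all assumptions made for illustration.

```python
from statistics import median


def classify_respondents(scores):
    """Assign each respondent to one of four groups via a median split.

    `scores` is a list of (enthusiasm, reluctance) tuples (hypothetical format).
    Assumption: a score equal to the median counts as "low".
    """
    enthusiasm_median = median(e for e, _ in scores)
    reluctance_median = median(r for _, r in scores)

    labels = []
    for enthusiasm, reluctance in scores:
        high_e = enthusiasm > enthusiasm_median
        high_r = reluctance > reluctance_median
        if high_e and not high_r:
            labels.append("enthusiastic")
        elif high_r and not high_e:
            labels.append("reluctant")
        elif high_e and high_r:
            labels.append("reluctantly enthusiastic")
        else:
            labels.append("uncommitted")
    return labels
```

Cross-tabulating the resulting labels gives the four group sizes; the group comparisons themselves (general linear model with post-hoc tests) are not reproduced in this sketch.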
Taken together, these survey data suggested that incorporating the criteria identified from the qualitative review (in particular, ensuring that the Neuro-QOL system is clinically relevant, useful, and easy to understand and use) will help support those who already feel generally positive toward HRQL measures and could help persuade those who are uncommitted or outright reluctant to use HRQL instruments.
Selection of target conditions
A key element of the Neuro-QOL development strategy was the selection of the pediatric and adult conditions that would be used to test the assessment platform. We understood that this selection process needed to be inclusive and transparent, with significant input from the neurological research community. We intended to include neurological conditions that manifest across the normal human life span and had varying rates of morbidity and mortality. Results from each stage of this multi-step process are reported in .
Expert Nominations and Rationale for Selection of Neurological Conditions as Research Priorities
The first step in the condition selection process involved an extensive literature review of neurological conditions in MEDLINE, PubMed, ScienceDirect and Wiley InterScience from 1996 to 2005 (when the review was completed). The search was conducted using combinations of key words including HRQL, neurological disorders, measurement issues and known disease-specific characteristics. This literature review was synthesized to identify conditions by their typical time of onset, common health-related quality of life concerns, disease-specific concerns, and the likely impact of the condition on normal life span. Independent of this literature review, interviews were conducted with 44 experts in neurological disorders and/or health-related quality of life to obtain their opinions about the 5 neurological conditions for which they felt it was most important to assess HRQL (see ). They were not asked to specify whether they were nominating pediatric or adult conditions.
An expert consensus panel composed of 13 pediatric and adult neurology experts from across the country was convened in March, 2005, to establish and apply a set of criteria for selecting, per the NINDS contract, 5 adult and 2 pediatric conditions on which to build Neuro-QOL. After reviewing the results of the literature review and recommendations from the 44 individual expert reviews, members of this panel established criteria for selecting the 7 conditions which included: prevalence, individual impact, effective treatments, multiple domains affected, chronicity, and likelihood of HRQL change. Before the close of the consensus meeting, the panel nominated 5 adult and 2 pediatric conditions. An additional source of expert consultation was obtained when the results of the consensus meeting were presented to the American Academy of Neurology (AAN) for their comment. The recommended conditions from each step (interviews, consensus meeting and AAN) are presented in .
A final review of the recommended conditions was conducted with the NINDS staff and was reconciled with their historic grant portfolio. The final set of diseases, including their basis for inclusion, is presented in .
Bank and Scale Development
Identification of HRQL Domains and Sub-Domains
The next step in our process was to determine which areas of HRQL to assess with the Neuro-QOL measures. We identified domains through multiple methods and data sources including a literature review, expert interviews, patient and caregiver focus groups and a keyword search.
First, we identified domains by completing an extensive Medline literature review of 24 major neurological conditions using key words such as health-related quality of life (HRQL), specific names of neurological disorders, measurement, as well as disease-specific characteristics, from 1996 to the present. This literature review summarized major neurological disorders and their impact upon HRQL, beginning with those typical to childhood onset followed by those most common in adults and advancing age. From this review, our initial list of domains included: emotional distress, perceived cognitive functioning, social functioning, physical functioning, fatigue, pain, communication/language difficulty, positive psychological functioning, sexual functioning, bowel/bladder function, sleep disturbance and personality/behavioral changes.
We obtained expert input through two waves of expert interviews (n=44 and n=63 experts) and through the previously mentioned Request for Information (n=89) (see ).
Expert Background and Experience
Experts were asked to identify domains or areas of HRQL that are affected by neurological disorders and their treatments. Experts were informed that their responses could include important symptoms (e.g., pain), areas of function (e.g., mobility), or anything else that was deemed important to consider when thinking of the people with neurological disorders. Experts were first asked to list all the domains they believed would be important to cover in an HRQL questionnaire that could be given to patients with neurological disorders (i.e., general and disease-specific). After that, they were asked to list domains that might be important in one of the disorders they named previously, but that weren’t necessarily common to all disorders. During the individual interviews, experts provided greater depth and elaboration of content for given domains. For example, when the domain Physical Function was mentioned, experts may have elaborated further by mentioning activities of daily living, balance, fine motor skills, gait, hemiparesis, etc. Overall, these interviews confirmed domains that had been identified from the literature review and they also revealed the following new areas: behavior/personality change, driving, memory, attention, executive function, aggression/irritability, psychotic symptoms, meaning/spirituality and mastery/control.
Patient and Caregiver Focus Groups
We conducted eight focus groups with patients (total n=64) and three with caregivers (total n=19) to assess the impact of neurological conditions on HRQL domains. We began with broad questions, such as “what do you think of when I say the phrase ‘quality of life’?” or “how has your life been affected by X condition?”, allowing participants to freely describe their definition of quality of life as it relates to their health. We then progressed to questions regarding specific domains that have been shown to be relevant in the literature, such as physical function, emotional function, social aspects, and treatment effects. The three caregiver focus groups (caregivers of patients with Alzheimer’s disease, stroke, and pediatric epilepsy) were conducted to gather important proxy perspectives. Responses were qualitatively analyzed using NVivo software to determine the frequencies of each domain and sub-domain per disease.3
Key Word Search
Because new domains arose from these different sources, we also conducted a comprehensive keyword literature search (from 1996 to 2005) using the OVID search engine with previous and newly identified domains and Neuro-QOL diseases to best estimate the number of published studies in a given area. We used these approximate totals to provide an overall quantification of how important certain domains were within different neurological conditions (see ).
Example of Keyword Literature Search
Selection of HRQL Domains and Sub-Domains
After identifying the range of important domains and sub-domains, we selected the most important areas for item bank development. Working groups were formed for each of the seven Neuro-QOL conditions (stroke, adult epilepsy, ALS, Parkinson’s disease, multiple sclerosis, muscular dystrophy, and pediatric epilepsy). Each group reviewed all data sources and extracted the most frequently-named and most relevant domains for item bank consideration.
Each source of data was analyzed using largely qualitative approaches. This process primarily entailed identifying and coding content derived from the previously described data sources. These codes were converted into percentages, which were calculated as the number of times a particular theme or code was applied over the total number of all codes applied from each data source. For example, using this approach it was possible to understand how frequently physical function was mentioned in ALS, within the context of all other domains that were mentioned for ALS. This permitted a greater understanding of occurrence (and by association, importance) of certain domains either across all conditions or as a unique aspect of one disease. Frequent comparison to the literature and other sources of informant data were applied to enhance the data collection process.
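The percentage calculation described above (the number of times a code was applied divided by the total number of codes applied in a data source) can be sketched as follows. The function name and the input format are hypothetical, and the example labels are illustrative only, not study data.

```python
from collections import Counter


def code_percentages(applied_codes):
    """Return each code's share of all codes applied in one data source.

    `applied_codes` is a list with one entry per code application,
    e.g. every domain code applied to ALS material from one source
    (hypothetical format).
    """
    counts = Counter(applied_codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: 100.0 * n / total for code, n in counts.items()}
```

For instance, if "physical function" accounted for three of four codes applied for a disease in a given source, its percentage for that source would be 75%.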
Within each disease, domain percentages were calculated and recorded on a chart that was populated by information obtained from the various sources mentioned previously. For the expert input, to minimize experimenter demand and acquiescence biases, we included only the open-ended, spontaneously generated expert responses (vs. information experts suggested only after being asked to elaborate on a specific domain we provided them). If a domain was mentioned across all five data sources (e.g., literature review, 3 types of expert input, focus groups, key word search), it received a score of “5”; if it was mentioned across four data sources, it received a score of “4”, and so on. These 0–5 counts were then compared across diseases. If a domain was counted as ≥3 on at least 50% of the diseases (e.g., 4/7 diseases) it was considered to be a generic concept. Targeted domains were those that summed ≥2 in at least one disease, but were not necessarily prevalent across the majority of diseases. In the event that certain disease-specific domains “tied” either within or between conditions, we consulted our expert panel for their input. See for generic and targeted domains. After reviewing the findings of this comprehensive identification and selection process, the generic domains that were chosen for item bank development were: Physical, Social, Emotional and Cognitive Function.
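The counting and classification rules in this paragraph can be sketched in a few lines. This is an illustrative reading of the rules, not the authors' procedure: the function names and input formats are hypothetical, the targeted rule is assumed to mean a count of at least 2 in at least one disease, and the expert-panel tie-breaking step is not modeled.

```python
def source_count(mentioned_by_source):
    """Count how many data sources mentioned a domain for one disease.

    `mentioned_by_source` maps a source name to True/False (hypothetical format).
    """
    return sum(1 for mentioned in mentioned_by_source.values() if mentioned)


def classify_domain(count_by_disease, generic_min=3, generic_share=0.5, targeted_min=2):
    """Label a domain as generic, targeted, or not selected.

    `count_by_disease` maps each disease to the domain's 0-5 source count.
    Generic: count >= generic_min in at least generic_share of the diseases.
    Targeted (assumed reading): count >= targeted_min in at least one disease.
    """
    counts = list(count_by_disease.values())
    if not counts:
        return "not selected"
    share_high = sum(1 for c in counts if c >= generic_min) / len(counts)
    if share_high >= generic_share:
        return "generic"
    if any(c >= targeted_min for c in counts):
        return "targeted"
    return "not selected"
```

Under these assumptions, a domain with counts of 3 or more in 4 of 7 diseases is labeled generic, whereas one reaching that level in only 3 of 7 diseases falls back to the targeted rule.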
Domains and Importance Scores Across Diseases
Next, we identified domain co-chairs from the Neuro-QOL Executive Committee and co-investigator panel. Each co-chair team was assigned one of the four generic domains previously selected, and one pair was assigned to oversee the targeted domains. Each dyad was charged with reviewing the aforementioned data sources and extracting the most relevant subdomains for item bank consideration. Due to funding constraints, the Executive Committee decided to develop and test up to three targeted banks, and to develop but not test others, thus providing future investigators with item pools that could be subsequently advanced. Frequent checks back with NINDS to keep the project anchored to the original scope afforded us useful feedback regarding relevance, vis-à-vis the original purpose of the project, which was to create psychometrically robust patient-reported outcomes of HRQL that could be used by neurology clinical trials researchers. Data were analyzed using the approaches described below.
Using data from expert interview domain elaborations, we calculated the percentage of times a particular code was applied within a domain. This helped us estimate which codes might carry additional importance for a particular domain within a disease based on how often they were discussed among experts. The total number of applied codes was tallied both across and within conditions. The number of applied codes across conditions was used to determine which diseases shared similar codes relative to one another as well as which codes were unique to a particular disorder. If an issue was present across a majority of diseases, it was labeled as generic. The following generic sub-domains were selected for item bank development in adults: Physical (Self-care/Upper Extremity, Mobility/Ambulation), Social (Role Participation, Role Satisfaction), Emotion (Depression, Anxiety, Positive Psychological Function), Cognitive (Perceived, Applied). In pediatrics, the following generic sub-domains were selected for item bank development: Physical (Self-care/Upper Extremity, Mobility/Ambulation), Social, Emotion (Emotional Health, Stigma).
Based on feedback from experts, as well as considering the complexity of issues surrounding these conditions, we decided to develop and field test one (1) targeted scale per condition, and also develop (but not field test) additional targeted scales as indicated by the unique circumstances of each condition. To determine which scales would be field tested, we summarized and examined data from the data sources for which domain elaborations were available. Using these data we made preliminary decisions regarding which targeted scales should be developed, and for which disease(s). This led to the identification of a select number of candidate domains, which were presented to disease-specific experts involved in the Neuro-QOL study. Because the targeted domains presented to experts varied by disease (e.g., adult epilepsy experts were asked to rank fatigue, pain, bowel and bladder and stigma, while Parkinson’s experts were asked to rank sleep, sexual function and personality/behavioral changes), it was not possible to rank each using the same denominator; instead, we examined each disease group individually. Using these expert rankings, focus group frequency counts, and the total number of coded targeted domain issues within each disease, we identified our candidate targeted scales to develop and field test per disease, as well as additional targeted scales for development only (see ).
Targeted Scales for Development and Field Testing
When reviewing these data to make targeted scale decisions, we referred to the total number of codes by disease as a rough indicator of which diseases were comparatively more affected by certain issues in a given domain. When applicable, we gave greater importance to domain-condition relationships when there was a sizeable difference in total codes among conditions. For example, in , ALS, MD, MS and PD all appear to have greater numbers of coded bowel and bladder issues than adult/pediatric epilepsy and stroke.
Identifying and selecting existing items
For each of the domains and sub-domains selected as a critical part of the HRQL universe for neurological disorders, large pools of relevant items were identified from a variety of sources. An extensive, iterative process took place with the goals of obtaining comprehensive coverage of each content area, then selecting a “best set” of items for field testing.
Candidate items for the generic item banks and targeted scales were identified from our existing item banking projects and affiliated studies, Rasch analysis of several large external datasets, and additional generic and disease-specific questionnaires that have been used in neurological conditions. Permission from outside principal investigators and primary scale authors was obtained for the latter two activities. These data were evaluated by examining the content and dimensionality of the constituent items in these preliminary banks.
From these various data sources, a centralized Neuro-QOL Item Library was created. Over 3,000 items were entered into this Library according to elements such as item order, context, time frame, item stem and response options. An extensive “binning” and “winnowing” process was then undertaken. This iterative, multi-step process involved at least three domain experts. Two of these independent raters worked collaboratively to assign items to “bins” according to primary domain. After this, a third rater reconciled any discrepancies. As the number of items (many redundant) was quite large, all items were reviewed to determine if they should proceed through detailed item review/revision/testing. Items were then grouped together according to each domain’s hierarchy of sub-domains, factors and facets. Once all items were assigned to a domain, content experts “winnowed” (i.e., systematically removed) items from item pools. Items were removed for a variety of reasons, including semantic redundancy, availability of a superior alternative, inconsistency with domain definition, wrong domain assignment, vague or confusing language, gender inappropriateness, narrow applicability, and likelihood of problems in cultural/linguistic translation. Remaining items were then reviewed by two Neuro-QOL investigators and several outside content experts. Most items needed revision for general consistency across banks. Re-writing or generating new items was done to assure comprehensiveness in measuring the domain; clear, understandable and precise language; and ease of translation.
Qualitative item review and cognitive interviews
The comprehensive item pool for each HRQL domain was then subjected to a qualitative item review (QIR) process. Similar to scale development processes, item preparation through QIR creates new items and adapts existing items based on two key sources: expert opinion (expert item review; EIR) and patients/potential research participants (cognitive interviews). Our previous expert interviews and patient focus groups helped provide input to conceptual gaps in the domain definitions, which led to the identification of new items, especially where it was judged that existing items did not provide adequate coverage. Cognitive interviews in English and Spanish helped ensure that items selected for testing would be understood as intended by respondents, especially those with neurological disorders and/or low literacy.
Expert item review (EIR)
Before cognitive interviews were conducted with patients, every item in the comprehensive pool was reviewed by at least three experts for clarity, precision, acceptability to respondents, adaptation to computerized testing, format of responses, preferred response options and similarity of timeframe. Two Neuro-QOL domain experts then evaluated that information and made decisions about the need for review or modification of individual items. Expert collaborators: a) signed off on items that appeared to need no further revision; and b) suggested revisions to items that still needed improvement. The final item pools were approved after review by members of the Neuro-QOL Executive Committee.
After identifying approximately the 50 best items per generic item bank or disease-specific scale, cognitive interviews were conducted by telephone with 63 adult and pediatric patients with Neuro-QOL conditions, as well as four pediatric caregivers. During these interviews, patients reviewed each item in a one-on-one semi-structured interview that focused on item comprehension and relevance. The interviewer asked questions to assess the content validity of items, concept clarity, language refinement and ease of using the response options. Respondents also identified areas for new item development and creation. When there were “gaps” in the newly created banks and scales, the Neuro-QOL domain experts either identified a relevant item on an existing HRQL questionnaire or within our other item banking projects, or wrote a new item to cover the gap.
Final steps to creation of field test-ready item banks and scales
Because the items would be translated into Spanish, it was important to consider problems that might arise during that translation. Accordingly, translation science experts provided feedback about the ease of translating all items and potential item response categories (e.g., “not at all” to “very much”): this information was used to modify items, when possible; to remove items that appeared to be particularly problematic for translation; and to choose the final response categories for the various types of items (e.g., frequency, severity).
Each domain working group carefully reviewed all the input from neurology experts, patients and translation scientists and made appropriate changes. The proposed final, field test-ready item banks and scales were reviewed by all working group and domain chairs. The Neuro-QOL Executive Committee gave final approval prior to the first field test.
Spanish language version
From the outset, one of this project’s aims was to make all of the item banks/scales readily available for use in the Spanish-speaking population. Input was obtained from native Spanish-speaking patients with neurological disorders in all the previous steps for which patient input was solicited. A rigorous forward-backward translation process4 was undertaken to translate the field test-ready item banks and scales described above. Following this extensive work to obtain a high quality linguistic translation, the items were cognitively debriefed with 30 adults and 30 children. Each subject was asked to first answer a subset of the translated items independently. Next, a Spanish-speaking interviewer asked the subject about the meaning of specific words within the item stem, the overall meaning of the item, or why they had chosen a specific answer. For some items, the subjects were also asked to consider alternative wording for those items. On the basis of the cognitive interviews, some revisions were made to the original translations.