Throughout the course of their disease, individuals with systemic lupus erythematosus (SLE) face considerable physical, psychological and social challenges. The disease has profound effects on health-related quality-of-life (HRQoL), which have been documented extensively in the literature (1). Capturing decrements and improvements in HRQoL has therefore become important in clinical research in SLE, and is advocated by both the U.S. Food and Drug Administration (FDA) in providing guidance to SLE clinical trialists as well as the Outcome Measures in Rheumatology Clinical Trials (OMERACT) group (2, 3). Here I review three measures designed to ascertain HRQoL in SLE, the Lupus Quality of Life (LupusQoL), SLE-specific Quality of Life questionnaire (SLEQoL) and SLE Quality of Life Questionnaire (L-QoL) (Table 1). These measures were chosen because they were developed and specifically designed as patient-reported outcome measures to assess quality of life in SLE and have all had some published validation testing to date.
Most studies examining HRQoL in SLE have employed generic measures, such as the Medical Outcomes Study Short Form (SF-36) (4). An advantage of generic instruments is that they allow comparison of the HRQoL in SLE to other related conditions or to population norms, something that has been useful in documenting that SLE has similar or worse HRQoL decrements compared to other severe chronic conditions (5). In addition, many generic instruments have undergone extensive validation testing and are adapted in multiple languages and cultures.
However, a disadvantage of employing generic instruments alone in SLE is that they may not adequately capture symptoms or issues that are specific to the disease. This may reduce their sensitivity to detect meaningful changes over time. For example, some, but not all, studies suggest that the SF-36 is insufficiently responsive in longitudinal studies or trials in SLE (6, 7), and may lack domains that are particularly relevant to a population with SLE, such as fatigue or sleep (8). The three SLE-specific instruments reviewed here have been developed to address some of these potential limitations. As discussed below, preliminary validation work is available for each of these instruments in defined populations.
To measure disease-specific HRQoL in adult SLE. The original development and validation study was performed in the United Kingdom and published by McElhone et al. in 2007 (9).
Eight domains are covered, including physical health, emotional health, body image, pain, planning, fatigue, intimate relationships, and burden to others.
34 items total. Individual subscales include the following: physical health (8 items), emotional health (6 items), body image (5 items), pain (3 items), planning (3 items), fatigue (4 items), intimate relationships (2 items), burden to others (3 items).
Questionnaire has a 5-point Likert response format (0=all the time, 1=most of the time, 2=a good bit of the time, 3=occasionally, and 4=never).
Prior four weeks.
The LupusQoL has been used for research purposes in clinical cohorts in both the United Kingdom and the United States (10, 11). It has not yet been used in a clinical trial in SLE. The U.K. sample was predominantly Caucasian and had less severe disease, while the U.S. sample was predominantly African-American and had more severe disease. Median domain values for the LupusQoL in these two cohorts are presented in Table 2.
Available on the Arthritis Care & Research Web site at http://www.interscience.wiley.com/jpages/0004-3591:1/suppmat/index.html. A website has been launched with information regarding obtaining permissions to use the instrument, instructions for scoring and other useful information (www.lupusqol.com).
Written and electronic versions of questionnaire available.
Each domain is scored from the items that were answered. The mean raw domain score is calculated by totaling the item response scores of the answered items and dividing by the number of answered items. This mean raw domain score is then transformed to a score ranging from 0 (worst HRQoL) to 100 (best HRQoL) by dividing by 4 and then multiplying by 100; the result is the transformed score for that domain. The authors suggest that transformed domain scores are obtainable when at least 50% of the items are answered. A non-applicable response is treated as unanswered and the domain score is calculated as indicated above.
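The scoring rule described above can be sketched in a few lines of Python. This is an illustration only, not the developers' scoring software: the function name `lupusqol_domain_score` and the use of `None` for unanswered or non-applicable items are assumptions made for the example.

```python
from typing import Optional, Sequence


def lupusqol_domain_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Return the transformed 0-100 score for one domain.

    Each response is an integer from 0 to 4; None marks an unanswered or
    non-applicable item (an assumption of this sketch). Returns None when
    fewer than 50% of the domain's items were answered.
    """
    answered = [r for r in responses if r is not None]
    if any(r < 0 or r > 4 for r in answered):
        raise ValueError("item responses must be between 0 and 4")
    # The domain is scored only if at least half of its items are answered.
    if not responses or len(answered) * 2 < len(responses):
        return None
    # Mean raw score over answered items, then rescale to 0-100.
    mean_raw = sum(answered) / len(answered)
    return mean_raw / 4 * 100
```

Returning `None` rather than a number for an unscorable domain keeps missing domains from being mistaken for a score of 0, which on this scale means the worst HRQoL.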
0 (worst HRQoL) to 100 (best HRQoL).
Time to complete is <10 minutes. No information on reading level required is provided (the educational attainment of the UK validation cohort was 13.8 ± 3.1 years).
Time to score is <5 minutes.
A Spanish language version has been adapted and validated (12). A version adapted and validated for a U.S. population is also available (13). Translations into 77 languages from 51 countries are available (see website), although these translations do not yet have published psychometric information.
The original measure was developed and validated by using a mixed qualitative and quantitative approach. Briefly, 30 individuals with SLE participated in semi-structured interviews and a combination of thematic analysis from these interviews as well as expert panel feedback was used to generate items. Feedback was sought again from a group of 20 patients to revise draft items. Subscales were generated using principal component analysis. A written survey (either mailed or administered in the clinic) was then used to assess validity and reliability.
It is important to note that the U.S. validation study found a different factor structure for the LupusQoL, with only five of the eight factors having eigenvalues >1 in the analysis (13); eigenvalues are used to measure how much of the variance each successive factor extracts, and only values >1 are generally retained in analyses (14).
Information on readability is not provided, but item response rates were very high (<2% of domains were not scored because of missing responses). However, it is important to note that some domains (i.e. intimate relationships) were not applicable to all respondents (7.3% missing). Floor and ceiling effects are reported for each domain and are reasonable; for all domains except intimate relationships, the percentage of individuals with a score of 0 was <10% (range 2.2–8.6%), and the percentage of individuals with a maximum score of 100 was <30% (range 6.2–28.2%).
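As a rough illustration of how the percentages at the extremes of a scale are computed, the sketch below counts respondents at the minimum and maximum possible scores. The function name and the default 0 to 100 bounds are assumptions chosen to match the transformed LupusQoL domain scores; the calculation is not taken from the validation study itself.

```python
from typing import Sequence, Tuple


def percent_at_extremes(
    scores: Sequence[float], minimum: float = 0.0, maximum: float = 100.0
) -> Tuple[float, float]:
    """Return (% of respondents at the minimum score, % at the maximum score)."""
    if not scores:
        raise ValueError("at least one score is required")
    n = len(scores)
    at_min = sum(1 for s in scores if s == minimum)
    at_max = sum(1 for s in scores if s == maximum)
    return 100 * at_min / n, 100 * at_max / n
```

Whether the minimum is described as the floor or the ceiling depends on the direction of the scale, so the function reports both ends by position rather than by label.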
Individual domains demonstrated good internal consistency (Cronbach’s α ranging from 0.88–0.96) in the original validation study as well as in the U.S. and Spanish adaptations. Test-retest reliability of the original LupusQoL was evaluated in a subset of 83 respondents and was good with intraclass correlation coefficients between 0.72–0.93 for the individual domains.
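For readers unfamiliar with the statistic, Cronbach's α for a domain of k items is k/(k−1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. A minimal sketch follows; the function name and the row-per-respondent data layout are assumptions of the example, and complete data (no missing items) is assumed.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data: one row per respondent, one column per item."""
    if len(rows) < 2:
        raise ValueError("at least two respondents are required")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("every respondent needs the same set of two or more items")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```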
Concurrent validity was assessed by comparing domain scores of the LupusQoL with other comparable domains of the SF-36, with good correlation (r=0.71 to 0.79). Similar results were obtained in the U.S. and Spanish validation studies. Several recent follow-up studies performed in the United Kingdom, United States and Spain demonstrated that the LupusQoL has discriminant validity in that it functions relatively independently as an outcome measure in SLE. These studies found no or weak associations with factors such as disease duration, disease activity and damage (10–12). To assess construct validity, the developers examined LupusQoL scores in relation to disease activity (as measured by the British Isles Lupus Assessment Group or BILAG) and damage (Systemic Lupus International Collaborating Clinics/American College of Rheumatology damage index or SDI) (9). Patients with more active disease generally reported poorer HRQoL across all domains except fatigue, although the relationship with damage, as measured by the SDI was less clear.
Sensitivity to change (responsiveness) and minimally clinically important difference are not yet available, but are subjects of an ongoing study.
Of the available instruments to assess HRQoL, the LupusQoL has undergone the most validation process and has been modified to be culturally appropriate for the U.S. and Spanish populations. Translations are available in numerous languages, although psychometric evaluations of these translations have not yet been published. The importance of performing such evaluations is evidenced by the differences noted in the U.K. and U.S. validation studies of the LupusQoL, including the different factor structures identified. The reasons for these differences remain unclear, and further studies are needed to assess the optimal factor structure of the instrument.
Currently, the measure would be most appropriate for cross-sectional evaluations of HRQoL in SLE in the populations in which the measure is validated. Future studies examining the responsiveness of the LupusQoL will elucidate its role in treatment studies of SLE. For longitudinal assessments in observational studies, information about additional psychometric properties, such as response shift bias, may also be useful.
To assess quality-of-life in individuals with SLE. The original development and validation study of the English language survey was conducted in Singapore by Leong et al. (6).
Six domains including physical functioning, activities, symptoms, treatment, mood and self-image.
40 items, including physical functioning (6 items), activities (9 items), symptoms (8 items), treatment (4 items), mood (4 items) and self-image (9 items).
7-point response scale (subsections have different anchors, including “not difficult at all” to “extremely difficult”, “not at all” to “extremely troubled”, and “not at all” to “extremely often”).
A summary score is derived from the sum of all responses across the domains; alternatively the authors suggest that a summary score can be obtained by taking the mean of each of the six subsections. Item weighting is not available and needs to be addressed in future studies given that the current scoring system places greater emphasis on domains with a greater number of items. No specific instruction for dealing with missing values is provided.
Scores range from 40–280, with higher values corresponding to worse quality-of-life.
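The two scoring options described above can be sketched as follows. Several points are assumptions rather than published instructions: items are taken to be coded 1 to 7 (inferred from the 40–280 range across 40 items), the alternative score is read as the mean of the six subsection means, and incomplete questionnaires are rejected because no rule for missing values is provided. All function and variable names are illustrative.

```python
from typing import Dict, Sequence

# Number of items in each subsection, as listed in the text (40 in total).
SLEQOL_DOMAIN_SIZES = {
    "physical functioning": 6,
    "activities": 9,
    "symptoms": 8,
    "treatment": 4,
    "mood": 4,
    "self-image": 9,
}


def _validate(responses: Dict[str, Sequence[int]]) -> None:
    """Check that every subsection is complete and coded 1-7 (assumed coding)."""
    for domain, n_items in SLEQOL_DOMAIN_SIZES.items():
        items = responses[domain]
        if len(items) != n_items:
            raise ValueError(f"{domain}: expected {n_items} items")
        if any(r < 1 or r > 7 for r in items):
            raise ValueError(f"{domain}: responses must be between 1 and 7")


def sleqol_sum_score(responses: Dict[str, Sequence[int]]) -> int:
    """Summary score as the sum of all 40 responses (40-280, higher is worse)."""
    _validate(responses)
    return sum(sum(responses[d]) for d in SLEQOL_DOMAIN_SIZES)


def sleqol_mean_of_subsections(responses: Dict[str, Sequence[int]]) -> float:
    """Alternative summary: the mean of the six subsection means (one reading of the text)."""
    _validate(responses)
    means = [sum(responses[d]) / len(responses[d]) for d in SLEQOL_DOMAIN_SIZES]
    return sum(means) / len(means)
```

The second function gives each subsection equal weight regardless of its number of items, which is the weighting concern raised above for the simple sum.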
Time to complete is <5 minutes for both the SLEQoL and SLEQoL-C.
Time to score is not reported.
A Chinese language version is available (SLEQoL-C). This version was derived by translation and back-translation, and content validity was examined through interviews with 7 bilingual patients with SLE in Singapore. The study did not demonstrate differential item functioning (DIF) in the responses of English and Chinese-speaking patients, suggesting successful translation into Chinese (17). Psychometric testing of the SLEQoL-C is not yet available. The SLEQoL has also been culturally adapted and undergone preliminary validation testing in Brazilian Portuguese using a clinical cohort of 107 patients (16). Inter- and intra-observer reliability for the adaptation was found to be high, and the measure had good internal consistency. The measure correlated well with the SF-36, suggesting construct validity, and poorly with lupus disease activity and damage measures, suggesting discriminant validity.
An unspecified number of rheumatologists and nurse clinicians familiar with SLE management generated an initial list of items. Feedback was elicited from 100 patients on these draft items; however, patients were not involved in generation of the items originally. Factor analysis and Rasch model analyses were used to compose the final questionnaire and create subscales. Psychometric properties were tested using responses obtained during routine clinical visits in 275 patients. The characteristics of this clinical cohort included a disease duration of approximately 9 years, a mean SLEDAI of 2.7 (SD 4.8) and mean SDI of 0.67 (SD 1.1). Patients were from Singapore and English-speaking. A subset of patients had repeat data collection to allow investigation of test-retest reliability and responsiveness.
A minority of participants in the original SLEQoL validation study had low educational attainment (10.5% had no formal education or a primary education only); this number was significantly higher for the SLEQoL-C (44.7% of the sample had no formal education or a primary education only). However, no specific information on readability is provided in the Singapore studies.
Research assistants ensured that patients completed items so no missing responses were reported.
An analysis of floor and ceiling effects revealed that the SLEQoL had significant floor effects (good perceived QoL), with three of the subsections having between 39 and 44% of individuals reporting good perceived QoL. Ceiling effects were not observed. The SF-36 in the same sample had fewer floor effects, but more significant ceiling effects; for four domains, between 28–59% of respondents reported poor QoL.
Internal consistency was good (Cronbach’s alpha was 0.95 for the summary score, and ranged from 0.76–0.93 for specific subsections).
Test-retest reliability was assessed in 51 patients who repeated the instrument at a 2-week interval. The intraclass correlation coefficient was 0.83 for the summary score, indicating good reliability. However, four of the six individual domains had intraclass correlation coefficients of <0.6, which indicates only moderate reliability. Reliability in the Brazilian Portuguese culturally adapted version was high (intraobserver correlation coefficient 0.97 and interobserver correlation coefficient 0.99) (16).
Although items were generated entirely by health professionals, patient feedback was solicited to add and modify items to assess content validity (6, 18). Construct validity was investigated by comparing scores on the SLEQoL to the SF-36, Rheumatology Attitudes Index and its helplessness subscale, commonly used physician-assessed disease activity (Systemic Lupus Erythematosus Disease Activity Index or SLEDAI and Systemic Lupus Activity Measure or SLAM) and damage indices (SDI). Absent or very weak correlations were demonstrated for the summary score for most SF-36 domains (the strongest correlation being between the SLEQoL physical functioning domain and the SF-36 physical functioning domain at 0.234), suggesting relatively low concurrent validity. Correlations were also weak or absent with the SLAM, SLEDAI, and SDI. However, these data provide evidence of discriminant validity, as the SLEQoL appears to be capturing constructs that are independent of traditional disease activity and damage measures.
Construct validity was supported by an analysis demonstrating that the SLEQoL summary score varied appropriately with self-perceived changes in global QoL.
Responsiveness was assessed in a subset of 95 patients who had return clinical visits within a three-month window. Participants were asked to rate the global change in QoL using a scale anchored from −7 to 7 (−7 representing ‘a very great deal worse’ and 7 representing ‘a very great deal better’). Few participants reported significant QoL deterioration (n=12), and therefore this group was not analyzed. Among individuals who reported QoL improvements or reported no change, responsiveness was assessed using multiple techniques, including the standardized response mean (SRM), effect size, Guyatt’s coefficient and relative efficiency (RE). All methods yielded similar results, with the SLEQoL demonstrating greater responsiveness than the individual domains of the SF-36. However, the SLEQoL also demonstrated greater variation of scores in participants who reported unchanged QoL compared to the SF-36, indicating decreased specificity.
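Two of the responsiveness statistics named above have simple standard definitions: the standardized response mean is the mean change divided by the standard deviation of the change, and the effect size is the mean change divided by the standard deviation of the baseline scores. The sketch below uses those textbook definitions; it is not a reconstruction of the original analysis, and the function names are illustrative.

```python
from statistics import mean, stdev
from typing import List, Sequence


def _changes(baseline: Sequence[float], follow_up: Sequence[float]) -> List[float]:
    """Per-patient change scores (follow-up minus baseline) for paired data."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least two patients")
    return [f - b for b, f in zip(baseline, follow_up)]


def standardized_response_mean(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean change divided by the standard deviation of the change scores."""
    changes = _changes(baseline, follow_up)
    return mean(changes) / stdev(changes)


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean change divided by the standard deviation of the baseline scores."""
    changes = _changes(baseline, follow_up)
    return mean(changes) / stdev(baseline)
```

Because higher SLEQoL scores indicate worse quality of life, improvement appears as a negative change under this sign convention; the magnitude is what is compared across instruments.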
Minimal clinically important difference (MCID) was derived using a distributional approach in which SLEQoL scores were anchored to the patient global ratings of changes in their QoL. By taking the mean of the absolute difference of SLEQoL scores in the group of patients who rated their global QoL change as +2 to +3 (‘a little better’ or ‘moderately better’) and −2 to −3 (‘a little worse’ or ‘moderately worse’), the MCID was calculated at approximately 25.
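The calculation described above, the mean absolute change in score among patients whose global rating was 2 or 3 points from zero in either direction, can be sketched as follows. The function name, argument layout and `small_change` parameter are assumptions made for illustration.

```python
from statistics import mean
from typing import Sequence, Tuple


def mcid_from_global_ratings(
    score_changes: Sequence[float],
    global_ratings: Sequence[int],
    small_change: Tuple[int, ...] = (2, 3),
) -> float:
    """Mean absolute score change among patients reporting a small global change.

    score_changes: per-patient change in questionnaire score between visits.
    global_ratings: per-patient global rating of change on the -7 to 7 scale.
    small_change: absolute rating values treated as a small but noticeable change.
    """
    if len(score_changes) != len(global_ratings):
        raise ValueError("score_changes and global_ratings must be paired")
    selected = [
        abs(change)
        for change, rating in zip(score_changes, global_ratings)
        if abs(rating) in small_change
    ]
    if not selected:
        raise ValueError("no patients reported a change in the selected range")
    return mean(selected)
```

Taking absolute values pools patients who improved slightly with those who worsened slightly, which matches the description of a single MCID rather than separate thresholds for improvement and deterioration.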
The strengths of the SLEQoL, which primarily assesses HRQoL, include that information is available on its responsiveness and the minimal clinically important difference. The instrument has good discriminant validity as it appears to function independently from commonly used measures of disease activity, damage, and disease-related attitudes.
Additional studies will be required to further assess and confirm psychometric properties. Psychometric testing of the Chinese language version (SLEQoL-C) is not available. Reliability for the individual domains was only moderate in the original validation study, which suggests that these scores should be used with caution given possible instability. Concurrent validity with the SF-36 is relatively poor, suggesting that the instrument should be used primarily in conjunction with other validated measures of HRQoL. In addition, floor effects should be considered, and as the developers note, the instrument may best be used with a companion generic instrument that does not have substantial floor effects.
To provide a needs based assessment of quality-of-life in SLE. The L-QoL was developed by Doward et al. in 2008 (19).
The questionnaire is based on the needs-based QoL model, which posits that life gains its quality from the ability and capacity of individuals to satisfy their needs. Items assess the overall effect of SLE and its treatment on QoL.
25 items in scale, including items assessing self-care, fatigue, and emotional reactions.
Dichotomous “true/not true” response format.
The instrument has not yet been used in published clinical or observational studies of SLE. The mean value for the L-QoL in the original validation study performed in the United Kingdom was 6.7 (SD 6.1).
The instrument is available from the University of Leeds; registration is required. Further information is provided on University of Leeds Psychometric laboratory website http://www.leeds.ac.uk/medicine/rehabmed/psychometric/Scales3.htm.
The score is a count of symptoms, and a higher score on the L-QoL indicates worse QoL. There are no specific instructions for dealing with missing values.
Score range is 0–25, with higher scores indicating worse QoL.
Time to score is not reported.
Published adaptations are not available.
The L-QoL was developed through a multi-step process that started with the use of qualitative interviews with 50 individuals with SLE in the United Kingdom. Analysis of this qualitative data was used to construct items that were 1) relevant to the needs model, and 2) applicable to all potential respondents. Draft items were revised based on feedback elicited during cognitive interviews with 16 patients. Scaling and psychometric properties were then tested through the use of two postal surveys (n=95 and 93, respectively). Rasch analysis was conducted to confirm unidimensionality and the absence of differential item functioning (DIF).
The readability of the survey is not reported, nor is the educational attainment of the development and validation samples. Overall response rate for the first postal survey was 76%. Missing data were encountered in 14/95 (14.7%) of responses, although the number of missing items per respondent was relatively low (mean 2.9 ± SD 2.7). The presence or absence of floor or ceiling effects is not explicitly analyzed, although the authors provide the range of scores obtained (0–22), the mean (6.7 ± SD 6.1) and the median (5.0, IQR 1.0–11.0).
Test-retest reliability was assessed by postal surveys administered 2 weeks apart. The intraclass correlation coefficient was 0.95, indicating excellent reliability. Internal consistency, assessed using person-separation reliability, was 0.91–0.92.
Items were derived from patient interviews and were largely phrased in the patients’ own words to maximize content validity. Construct validity was demonstrated through examining the relationship between the L-QoL and other measures of disease activity and severity; those with higher perceived disease activity (rated as perceived current disease flare yes/no), higher perceived disease severity (rated on a scale mild/moderate/quite severe), and fair/poor ratings of their general health had statistically significantly worse L-QoL scores. Individuals who were unemployed also had lower L-QoL scores, and this reached statistical significance in the second postal sample (but not in the first). In addition, moderate correlations were observed between the L-QoL and Nottingham Health Profile scores (between 0.48 and 0.80).
A Rasch analysis was performed to determine unidimensionality of the scale. This method builds a hypothetical line along which items are located. Items falling close to this line contribute to the single dimension being examined, while those that fall far from the line are discarded since these items indicate construct-irrelevant variance. The fit of the final 25-item L-QoL to the Rasch model was good (overall item fit was −0.124 (SD 0.82) and overall person fit was −0.701 (SD 0.66)). The items showed invariance of the scale across the trait.
Unlike many instruments that measure HRQoL using multi-dimensional constructs that yield a profile of scores, the L-QoL provides a single unidimensional score and is based on the needs-based model of QoL. Although testing in the original development and validation study shows good reliability and validity, additional testing is required to confirm these initial findings. In particular, the original validation study examined construct validity in relation to a self-report measure of disease activity (flare) and a non-validated self-reported measure of disease severity. Administration of the instrument to a clinical cohort wherein physician-assessed measures of both disease activity and damage are available will yield further insight into both construct validity and also discriminant validity, or the independence of the L-QoL from other disease assessments in SLE. In addition, information on responsiveness is not available and will be needed to assess whether the measure might be applied to treatment studies of SLE. Finally, validation of the instrument in other populations, including patients with more severe disease phenotypes, will be useful.