We sought to develop and test an interviewer-administered measure of damage in SLE, the Brief Index of Lupus Damage (BILD), for use in epidemiological studies in which administration of the SLICC/ACR Damage Index (SDI) by trained physicians is not possible or feasible. In addition, we compared the BILD to another recently developed patient-reported measure, the Lupus Damage Index Questionnaire (LDIQ), which was designed as a written survey.
A sample of 81 patients from two university-affiliated SLE clinics was used to test the criterion validity of the BILD and the LDIQ. A second sample, the Lupus Outcomes Study (LOS, n=728), was used to ascertain the construct validity of the BILD.
We found good agreement between most BILD items and corresponding SDI items, and moderately high overall Spearman rank correlations for SDI with BILD (0.64) and with LDIQ (0.54). BILD scores were higher among older individuals, those with longer disease duration, and those with higher mean disease activity in the preceding four years. In addition, higher BILD scores were associated with poorer self-rated health and functional status, greater unemployment and work disability, and increased health care utilization.
We developed and performed a preliminary validation study demonstrating content, criterion and construct validity of a new practical patient-reported instrument of SLE disease damage. Although further studies are needed to examine reliability and to document psychometric properties in other populations, the BILD appears to represent a promising tool for studies of SLE outside the clinical setting.
As survival in patients with systemic lupus erythematosus (SLE) continues to improve, measuring outcomes beyond mortality has become a focus of epidemiologic research. In addition to disease activity, the concept of cumulative damage has emerged as an important outcome in SLE. Damage predicts not only mortality in SLE (1-5), but also a wide range of other outcomes, such as physical function (6, 7), health care utilization (8), and disability (9, 10).
Studies examining disease damage in SLE have traditionally relied on a validated physician-assessed measure, the Systemic Lupus International Collaborating Clinics/American College of Rheumatology Damage Index (SDI) (11). The SDI has been used widely over the last decade, with studies supporting its criterion, discriminant, and construct validity (1-5, 11, 12), as well as its reliability (13). Because the SDI requires a trained physician to complete a 41-item questionnaire, it has largely been used in research conducted in centers with resources and expertise to study SLE. In recent years, several large community-based studies in SLE using alternative data collection strategies, such as surveys administered by mail or by telephone, have been launched (14, 15). These studies have attempted to broaden clinical research in SLE to include individuals cared for outside of specialty centers. In doing so, a need has arisen to expand the tools available to measure important SLE-related outcomes to include patient-reported measures. A patient-reported measure of SLE disease activity, the Systemic Lupus Activity Questionnaire (SLAQ), has been developed and initially validated for this purpose (16, 17).
Similarly, researchers have recently developed a patient-reported proxy for the SDI, the Lupus Damage Index Questionnaire (LDIQ) (18). The questionnaire has been tested for criterion, content, and construct validity. An international examination of criterion validity in French, Spanish and Portuguese has also been completed (19). The LDIQ has 56 questions to assess all of the original SDI items and was designed for administration as a written survey. In a concurrent effort, we developed and tested a shorter patient-reported proxy for the SDI meant for interviewer administration in-person or on the telephone. The aims of the current study were threefold: 1) To develop and test the criterion validity of a new patient-reported damage index, the Brief Index of Lupus Damage (BILD); 2) To examine the BILD’s construct validity in a large, observational cohort of individuals with SLE; and 3) To compare the criterion validity of the BILD to the LDIQ, an instrument that was undergoing validation testing at the time the BILD development was initiated.
There were two sources of data for the study. One sample included 81 patients from two university-affiliated SLE clinics and was used to test the criterion validity of the BILD against the SDI. In addition, the clinic-based sample was used to test criterion validity of the newly developed LDIQ. All continuing patients seen between February and September 2009 at one clinic or between December 2010 and February 2011 at the other with a diagnosis of SLE for at least one year were eligible for this study.
Recruitment for the clinic-based study took place in the University of California, San Francisco (UCSF) lupus center and at the San Francisco General Hospital lupus clinic, which is staffed by UCSF physicians. Purposive sampling was used to recruit study participants. Patients were initially approached before their regular SLE clinic appointments about participating in a short study of SLE manifestations. Those responding affirmatively were given the BILD questionnaire over the telephone at home prior to their next clinic appointment. Basic demographic questions were included in the interviews, which averaged 10 minutes. Upon returning to the clinic, patients completed the LDIQ in the waiting room prior to their appointment. During that appointment, their rheumatologist completed the SDI. This protocol was designed to ensure that physician queries regarding SDI items occurred last, and therefore did not influence patient responses on either the BILD or the LDIQ. The mean time between the telephone interview and the clinic appointment was four months. Of 109 patients approached, eight declined to participate, eleven could not be reached by telephone, and nine did not return to the clinic before the end of the data collection period. Thus, 81 (74%) patients completed the study. Seven patients were unable to complete the LDIQ, primarily due to time constraints in the clinic.
A sample size calculation was performed to determine the minimum sample size needed to assess the BILD's criterion validity. A 0.05 one-sided Fisher's z test of the null hypothesis that the Spearman correlation coefficient rho=0 was estimated to have 80% power to detect an alternative rho of 0.28 when the sample size was 80; moreover, with n=80, power was estimated at 82% to detect an alternative rho of 0.50 versus the null of rho=0.25.
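The power figures above can be approximately reproduced with the standard normal approximation to Fisher's z transform. The sketch below is an illustration, not the authors' software: the function name `fisher_z_power` and its parameters are assumptions, and the standard error 1/sqrt(n − 3) is the usual large-sample approximation, which may differ slightly from the procedure actually used.

```python
from math import atanh, sqrt
from statistics import NormalDist


def fisher_z_power(rho_alt, n, rho_null=0.0, alpha=0.05):
    """Approximate power of a one-sided Fisher z test of rho_null vs. rho_alt.

    Hypothetical helper for illustration; assumes rho_alt > rho_null.
    """
    std_normal = NormalDist()
    # Fisher's transform atanh(r) is roughly normal with SE 1/sqrt(n - 3).
    effect = (atanh(rho_alt) - atanh(rho_null)) * sqrt(n - 3)
    # One-sided critical value on the standard normal scale.
    critical = std_normal.inv_cdf(1 - alpha)
    # Probability that the test statistic exceeds the critical value
    # when the true correlation equals rho_alt.
    return 1 - std_normal.cdf(critical - effect)


# Scenario 1: null rho = 0, alternative rho = 0.28, n = 80.
print(round(fisher_z_power(0.28, 80), 2))
# Scenario 2: null rho = 0.25, alternative rho = 0.50, n = 80.
print(round(fisher_z_power(0.50, 80, rho_null=0.25), 2))
```

Under this approximation both scenarios come out at slightly above 80% power, in line with the reported 80% and 82%.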
The second data source was the Lupus Outcomes Study (LOS), an ongoing cohort of individuals with SLE, interviewed annually by telephone. Recruitment for the LOS took place in several settings, including university-based rheumatology clinics (25%), community rheumatology offices (11%), and non-clinical sources, including patient support groups and conferences (26%), and other forms of media (38%). All patients have a diagnosis of SLE from a physician, confirmed by a formal review of the medical record to document American College of Rheumatology criteria for SLE. Details of the LOS methodology have been published previously (14). The LOS (N=728) was used to test the construct validity of the BILD. The BILD items were included as part of the fifth annual wave of the LOS, which also included validated self-report measures of disease activity, general health status, employment status, work disability, health care utilization, and demographics. Eleven LOS participants who were also part of the clinic sample were excluded, leaving 717 for the present independent analysis.
The BILD was designed for administration by telephone interview as part of a longer survey. Investigators modified the existing SDI items to be comprehensible to a lay respondent. The goal was not to replicate the SDI item for item, but to develop a reasonable proxy measure that would distinguish between greater and lesser degrees of SLE damage. Therefore, not all items in the SDI were included, because the investigators deemed the manifestation either too rare to be likely to contribute meaningfully to the score (e.g., shrinking lung), or not likely to be interpreted with enough specificity to capture the concept of damage (e.g., alopecia). The initial set of questions was reviewed by three patients with SLE for acceptability, feasibility, and understanding; comments were solicited on the clarity and rationale of the directions, the meaning of the items, and the appropriateness of the response choices. Based on this feedback, several questions were revised for clarity. Next, the instrument was included in the LOS interview. During the first month of the survey wave, interviewers recorded all questions that 77 respondents raised about the BILD items. These questions were evaluated by study rheumatologists, which resulted in several revisions to item prompts and wording. The resulting BILD instrument contained 28 questions that captured information on 26 of the original SDI items (Appendix A). The final BILD survey was then administered to the entire LOS sample.
When the BILD was administered to the larger LOS sample, interviewers were instructed to provide clarifications on items if respondents were unclear on their meaning using scripted comments (parenthetical explanations that appear on the instrument). Occasionally, respondents had additional questions that were recorded as free text notes. A rheumatologist adjudicated these notes when necessary.
Study protocols were approved by the UC San Francisco Committee on Human Research. All participants gave their informed consent to be part of the study.
In the clinic sample, we compared the BILD and LDIQ item responses to the corresponding SDI responses. Given the low prevalence of individual items, we calculated the item-by-item percent observed agreement (po) with the SDI for each of the two proxy measures rather than a kappa statistic. The kappa coefficient is strongly influenced by the prevalence of attributes, and its magnitude is difficult to interpret if attributes are either very common or very rare, resulting in the so-called kappa paradox of high observed agreement but low kappa (20). The prevalence-adjusted bias-adjusted kappa (PABAK = 2po − 1) has been proposed as a better measure of agreement than kappa when prevalence varies or when the prevalence of each method or instrument differs (21). Like kappa, PABAK ranges from −1 (perfect disagreement) to 1 (perfect agreement); a value of 0 corresponds to 50% observed agreement, that is, agreement no better than chance.
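As a minimal sketch of the agreement statistics described above, the following computes observed agreement, PABAK, and Cohen's kappa for a single yes/no item. The function names and the ten-patient response vectors are hypothetical and are not study data; they are chosen only to show how a rare item can produce high observed agreement with a noticeably lower kappa.

```python
def observed_agreement(proxy, reference):
    """Proportion of patients whose proxy and reference responses match."""
    if len(proxy) != len(reference) or not proxy:
        raise ValueError("need two equal-length, non-empty response lists")
    return sum(p == r for p, r in zip(proxy, reference)) / len(proxy)


def pabak(p_o):
    """Prevalence-adjusted bias-adjusted kappa for a two-category item."""
    return 2 * p_o - 1


def cohen_kappa(proxy, reference):
    """Cohen's kappa for responses coded 0 (absent) or 1 (present).

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(proxy)
    p_o = observed_agreement(proxy, reference)
    proxy_yes = sum(proxy) / n
    ref_yes = sum(reference) / n
    # Agreement expected by chance from the two marginal prevalences.
    p_e = proxy_yes * ref_yes + (1 - proxy_yes) * (1 - ref_yes)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical rare item in ten patients (illustrative values only).
proxy_item = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # patient-reported
sdi_item = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]     # physician-assessed

p_o = observed_agreement(proxy_item, sdi_item)
print(p_o, round(pabak(p_o), 2), round(cohen_kappa(proxy_item, sdi_item), 2))
```

In this made-up example nine of ten responses agree, so PABAK is 0.8, while kappa is about 0.62 because most of the agreement is on the item being absent.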
We also compared the distributions of the overall SDI, LDIQ, and BILD scores, calculating the Spearman rank correlation coefficients (rs) for both proxy measures with the SDI.
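For readers unfamiliar with the statistic, a Spearman rank correlation is a Pearson correlation computed on ranks, with tied values sharing their average rank (ties are common in damage scores, where many patients score 0 or 1). The sketch below is a plain-Python illustration under that definition; the helper names and the six score pairs are hypothetical and are not drawn from the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical total scores for six patients (illustrative values only).
sdi_scores = [0, 1, 1, 2, 3, 5]
proxy_scores = [0, 0, 1, 2, 2, 4]
print(round(spearman(sdi_scores, proxy_scores), 2))
```

In practice a statistics package would normally be used; the hand-rolled version is shown only to make the tie handling explicit.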
The LOS was used to assess the construct validity of the BILD and to quantify its acceptability. Because the BILD does not have a normal distribution, we divided the scores from the LOS sample into quartiles to examine its correspondence with demographic, SLE status, general health, and health care utilization measures found in the literature to relate to disease damage measured by the SDI. Sociodemographic measures included age, gender, race/ethnicity (nonwhite vs. white), education (high school or less vs. some college education or more), household income (at or below 125% of the Federal Poverty Threshold), and employment status. SLE measures included disease duration and the Systemic Lupus Activity Questionnaire (SLAQ) (16), averaged over four prior interviews. General health status measures included global health (categorized as excellent/very good/good vs. fair/poor), the SF-36 physical and mental component scores, and work disability status. Health care utilization measures included the annual mean number of outpatient medical visits for SLE over the first five years of the study, as well as the total number of hospitalizations during that time. Distributions of categorical measures were compared across quartiles of BILD using 1 degree of freedom chi-square trend tests. Continuous measures were compared using ANOVA F-tests. Finally, to explore the independent association of the sociodemographic measures with BILD, we modeled the BILD score (top quartile versus not, bottom quartile versus not, and raw score) as a function of age, disease duration, gender, race/ethnicity, and education, using logistic regression.
Both samples consisted mainly of women and were fairly well-educated (Table 1). The clinic sample, however, was younger than the LOS survey sample, had been diagnosed more recently, and had a larger proportion of racial/ethnic minorities, likely due to recruitment from urban health care settings. Survey sample participants had moderate disease activity levels over the past four years and an average SF-36 score of 38, typical of a chronically ill population (general population mean = 50; higher scores represent better health). Fewer than half of the sample were employed, and 30% reported work disability.
The percent agreement between each SDI item and corresponding BILD and LDIQ items is displayed in Table 2. Of the 26 SDI items assessed in the BILD, two items were not reported by any patients: pulmonary fibrosis and osteomyelitis. Of the 42 SDI items assessed in the LDIQ, only chronic peritonitis was not reported. Observed agreement between BILD items and the SDI ranged from 75-100%, while prevalence-adjusted bias-adjusted kappas (PABAKs) ranged from 0.70-1.00, except for deforming or erosive arthritis (PABAK=0.68) and extensive scarring/panniculum (PABAK=0.51). Observed agreement between SDI and LDIQ for only the items retained in the BILD ranged from 77-100% with PABAKs from 0.68-1.00, except for deforming or erosive arthritis (PABAK=0.61), extensive scarring/panniculum (PABAK=0.54), and cognitive impairment (PABAK=0.54). Observed agreement between SDI and LDIQ for all LDIQ items ranged from 53-100% with PABAKs from 0.05 to 1.00, including 7 items with PABAK from 0.05 to 0.61; four of the 14 items in LDIQ but not in BILD had PABAKs in that range. Analysis of the correspondence between the BILD and SDI showed that two items (erosive arthritis and extensive scarring of the skin) were so commonly reported as to render them uninformative to the BILD score as a whole. Because their PABAK scores were also below 0.70, both items were dropped from the subsequent calculation of the BILD score.
Despite differences in the item-by-item comparisons of the SDI and the BILD, the distributions of these two scores were similar, while the LDIQ scores were considerably higher, as expected. The BILD and SDI had a moderately high Spearman rank correlation (rs) of 0.64 (p<0.001). In our sample, the rs of the SDI and the LDIQ was 0.54 (p<0.001, Table 3), comparable to that reported in the original LDIQ validation paper (rs=0.48) (18).
The next phase of the study involved administering the BILD to participants in the LOS. The acceptability of the BILD for LOS respondents was very high. Only four items had more than 1% of individuals who did not respond (resulting in missing values); these included history of angina or bypass (n=12), retinal disease (n=11), peritonitis (n=10), and interstitial lung disease (n=9). In the LOS, the median BILD score was 1 with an interquartile range of 0 to 3 and a maximum score of 6 (data not shown), identical to the clinic sample.
In Table 4, we evaluated the construct validity of the BILD by comparing demographic, health status, and health utilization characteristics of LOS participants to their quartile of BILD (quartile scores were 0, 1, 2-3, and 4+ points). As BILD quartiles increased (reflecting greater damage), respondent age increased, as did the percent with incomes below poverty and the percent reporting being unemployed (all p<0.001). No clear relationship was seen by race/ethnicity or gender. Individuals in higher BILD quartiles also had longer disease duration, higher 4-year mean disease activity scores, lower SF-36 PCS and MCS scores, and higher percentages with poor self-rated health and work disability (all p<0.001), with each of these demonstrating monotonic relationships. Finally, individuals in the higher BILD quartiles also had more hospitalizations and a greater number of physician visits for SLE over the previous five years (p<0.001).
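The quartile grouping used in Table 4 (scores of 0, 1, 2-3, and 4 or more) can be expressed as a simple mapping from raw score to group. The sketch below assumes non-negative integer scores; the function name `bild_group` and the example scores are hypothetical and are not study data.

```python
from collections import Counter


def bild_group(score):
    """Map a raw BILD score to the quartile groups reported for the LOS."""
    if score < 0:
        raise ValueError("BILD scores cannot be negative")
    if score == 0:
        return "0"
    if score == 1:
        return "1"
    if score <= 3:
        return "2-3"
    return "4+"


# Hypothetical raw scores for eight respondents (illustrative values only).
scores = [0, 0, 1, 1, 2, 3, 4, 6]
group_counts = Counter(bild_group(s) for s in scores)
print(group_counts)
```

Because the score distribution is discrete and skewed, these cut points approximate rather than exactly equalize the four group sizes.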
To further examine the relationship between sociodemographic characteristics and damage, we constructed several multivariable models, where the outcome was the BILD score (top quartile versus not, bottom quartile versus not, and raw score), and predictors included age, gender, race/ethnicity, disease duration, and poverty status. All analyses yielded similar conclusions, with disease duration and poverty emerging as significant predictors of damage; no statistically significant effect was seen by race/ethnicity.
Extending scientific research outside the clinical setting in SLE remains a challenging task. The relative rarity and complexity of the disease remain barriers for population studies, as does the lack of suitable case finding and disease assessment tools. Although measures of general health, such as the SF-36, provide some insight into disease status in epidemiologic studies, more specific tools have the potential to better detect health-related outcome changes. The development and validation of patient-reported instruments hold promise in addressing this gap. In this study, we report our methods for developing and performing an initial validation study of a patient-reported instrument, the BILD, designed to assess disease damage. Our findings suggest that the BILD is acceptable to respondents, is efficient to administer, and has content, criterion and construct validity.
We designed the BILD to capture the overall concept of damage in SLE for epidemiological research. It is important to note that it is not a direct substitute for the SDI, since the BILD omits many items from the SDI that were either not suitable for patient self-report or were not informative because of their frequent reporting by patients. Instead, among a group of patients (rather than in any individual patient), the BILD is able to differentiate between those with high or low degrees of SLE damage. We found good agreement between items in the BILD and the corresponding items in the SDI, and an overall moderately high correlation between the two measures (0.64), suggesting criterion validity. In addition, through both pilot testing and administration of the instrument to over 700 individuals with SLE, we found that the instrument was acceptable to patients, as evidenced by the very high individual item response rate.
To ascertain construct validity, we compared patients in the four quartiles of BILD scores on measures found in previous literature to relate to disease damage as measured by the SDI. Consistent with studies of the SDI, we found that BILD scores in the LOS were higher among older individuals, those with longer disease duration, and those living below the federal poverty level (9, 22-26). Some previous studies have suggested greater damage among certain racial/ethnic minority groups (22), but many have not, once poverty was accounted for (27-29); we also did not find a statistically significant association between damage and race/ethnicity in our multivariable analyses. As expected, those with a higher mean disease activity score over a four-year period had higher damage scores (24). Higher BILD scores were also associated with worse self-rated health, a lower SF-36 physical component score, work disability, and unemployment (6, 9, 10). Finally, individuals with higher BILD scores had significantly greater health care utilization, including a greater number of hospitalizations and physician visits over the previous five years. This is consistent with health care utilization studies involving the SDI (8, 30).
A written survey to assess patient-reported damage, the LDIQ, was developed concurrently with our effort to develop and test the BILD for telephone or interviewer administration. LDIQ investigators allowed us to simultaneously test the criterion validity of that instrument in our clinic sample, providing a second U.S. validation for that instrument. We found that both the LDIQ and BILD correlated acceptably with the SDI (rs for LDIQ=0.54, rs for BILD=0.64). The correlation of the LDIQ and SDI in our study sample was similar to that in the published LDIQ criterion validity assessment performed in the United States (rs=0.48). Important differences between the BILD and LDIQ include the mode of administration (LDIQ is a written survey, BILD is designed for administration by an interviewer in-person or on the telephone) and length (LDIQ has 56 items, the final BILD instrument has 26 items). Because it has been tested in four large international patient samples, the LDIQ has undergone significantly more extensive criterion validity testing; the BILD will require further testing to confirm criterion validity in larger, independent samples. Construct validity testing for the two instruments has been comparable, with each tested in a large community-based sample (the National Databank of Rheumatic Diseases for the LDIQ and the Lupus Outcomes Study for the BILD) (18).
Although the analyses presented here support the content, criterion validity and construct validity of the BILD, it is important to note that characterization of the other psychometric properties of the instrument will require further research. For example, we did not assess the reliability of the BILD (either test-retest, or inter-interviewer reliability). Assessment of external validity in an independent sample with different sociodemographic or clinical characteristics should also be performed. The clinic-based sample used to assess criterion validity and the LOS sample differed significantly based on race/ethnicity, disease duration and age. Theoretically, the BILD could correlate differently among these subgroups with the physician-assessed SDI, and future studies of criterion validity with a larger, more heterogeneous sample should investigate this possibility. Finally, an important strength of the SDI is its association with significant long-term clinical outcomes, such as mortality. It remains to be seen whether either of the two newly developed patient-reported measures of disease damage will have similar predictive validity.
In summary, we have developed and performed a preliminary validation study of a new patient-reported instrument of disease damage in SLE. The BILD, which is designed for telephone or interviewer administration, had content, criterion and construct validity in this study. Although further studies are needed to examine its reliability and to document its psychometric properties in populations with different sociodemographic or clinical characteristics, the BILD appears to represent a promising tool for studies of SLE outside the clinical setting.
Funding: Supported by NIAMS P60-AR-053308. Additional support from the Arthritis Foundation, AHRQ/NIAMS 2 R01 HS013893, NIAMS 5R01AR56476, State of California Lupus Fund and the Rosalind Russell Medical Research Center for Arthritis. The study was also carried out in part in the General Clinical Research Center, Moffitt Hospital, University of California, San Francisco, with funds provided by the National Center for Research Resources, 5 M01 RR-00079, U.S. Public Health Service.