This study estimates the minimum important difference (MID) for the Urinary Distress Inventory (UDI), the UDI-stress subscale of the Pelvic Floor Distress Inventory (PFDI), and the Urinary Impact Questionnaire (UIQ) of the Pelvic Floor Impact Questionnaire (PFIQ).
We calculated MID using anchor- and distribution-based approaches from a randomized trial for non-surgical stress incontinence treatment. Anchors included a global impression of change, incontinence episodes from a urinary diary, and the Incontinence Severity Index. Effect size and standard error of measurement were the distribution methods employed.
Anchor-based MIDs ranged from −22.4 to −6.4 points for the UDI, −16.5 to −4.6 points for the UDI-stress, and −17.0 to −6.5 points for the UIQ. These data were supported by two distribution-based estimates.
Reasonable estimates of MID are 11, 8, and 16 points for the UDI, UDI-stress subscale and UIQ, respectively. Statistically significant improvements that meet these thresholds should be considered clinically important.
Psychometrically robust instruments for measuring health-related quality of life (HRQOL) are essential to evaluate women with pelvic floor disorders.1,2 The Pelvic Floor Distress Inventory (PFDI) and Pelvic Floor Impact Questionnaire (PFIQ) are two complementary condition-specific HRQOL instruments for women with urinary incontinence, fecal incontinence and pelvic organ prolapse.3 Each has three scales: urinary, colorectal/anal, and prolapse. The PFDI and PFIQ have each been shown to be psychometrically valid, reliable, and responsive.3–5
The smallest change in score associated with a clinically meaningful change in a questionnaire has been called the minimum important difference (MID).6 Because statistically significant differences may not be clinically meaningful changes, the MID of a questionnaire is essential to interpret questionnaire results when assessing within-group or between-group differences. The MIDs of the scales of the PFDI and PFIQ have been roughly estimated but have not been specifically determined.4
In this analysis, we estimated the MID of the urinary scales of the PFDI and PFIQ, the Urinary Distress Inventory (UDI) and the Urinary Impact Questionnaire (UIQ), in women receiving non-surgical management of stress urinary incontinence (SUI) as part of a multi-center, randomized clinical trial. The results of this analysis should aid in interpreting and planning studies that use the PFDI and PFIQ to evaluate the impact of urinary incontinence on HRQOL.
This is an ancillary analysis of the Ambulatory Treatments for Leakage Associated with Stress (ATLAS) trial, a multi-center randomized trial conducted by the Pelvic Floor Disorders Network (PFDN) comparing behavioral therapy, incontinence pessary, and a combination of the two for treatment of stress urinary incontinence. The PFDN is sponsored by the National Institute of Child Health and Human Development and consists of seven clinical sites and a data coordinating center.
Women were eligible for the ATLAS trial if they were ≥18 years, reported stress urinary incontinence for at least 3 months, did not have advanced pelvic organ prolapse (> stage 2) and desired non-surgical treatment. Additionally, participants had predominant stress incontinence and at least 2 stress incontinence episodes on a 7-day bladder diary. Subjects were randomized to receive one of three interventions: an incontinence pessary, a 12-week behavioral therapy program, or both. The behavioral program consisted of 4 clinic visits at 2-week intervals and included pelvic floor muscle training and bladder control strategies. A detailed description of the methods of the ATLAS trial has been reported.7
At baseline and 3, 6 and 12 months after the start of treatment, subjects completed a 7-day bladder diary8 and several self-administered questionnaires to assess pelvic floor symptoms and HRQOL, including the Incontinence Severity Index (ISI)9, the PFDI and the PFIQ. A global index of change was also completed at the 3, 6 and 12-month visits. ATLAS patients who completed their baseline and 3-month evaluations are the subject of this study. The three intervention groups were combined for this analysis and treatment assignment remains blinded. Comparisons between the interventions were not performed and will be the subject of future publications.
The 46-item PFDI symptom inventory measures the degree of bother caused by a broad array of pelvic symptoms. The urinary scale, the Urinary Distress Inventory (UDI; range 0–300), has three subscales: stress, irritative, and obstructive/discomfort (range 0–100 for each). The PFIQ is a functional status measure that assesses the degree to which a subject’s bowel, bladder and/or pelvic symptoms impact 31 different activities of daily living, social relationships or emotions. The urinary scale of the PFIQ is the Urinary Impact Questionnaire (UIQ; range 0–400). For the scales and subscales of the PFDI and the PFIQ, a higher score indicates worse symptom bother or poorer HRQOL.
The ISI is a two-item index that characterizes urinary incontinence severity and has demonstrated reliability, validity and responsiveness.9–12 The index is calculated by multiplying reported incontinence frequency (five levels) by the amount of leakage (3 levels). Based on the resultant ISI value (0–12) subjects were further classified into “dry” (0), “slight” (1–2), “moderate” (3–6), “severe” (8–9) and “very severe” (12). The ISI has been validated against a 24-hour pad test and bladder diary with increasing level of severity corresponding to clinically relevant differences in these measures.10, 12 For the global index of change, participants completed the statement “Compared to how your urinary incontinence was before treatment, do you feel you are” using a 7-point scale (very much better, much better, better, about the same, worse, much worse, very much worse).
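The ISI scoring described above can be sketched in a few lines. This is an illustration only: the numeric coding of the two items (frequency as 0–4 with 0 meaning no leakage, amount as 1–3) and the function names are assumptions made here, not taken from the instrument's documentation.

```python
# Illustrative sketch of ISI scoring as described in the text.
# ASSUMPTION: frequency is coded 0-4 (0 = never leaks) and amount is coded 1-3.
SEVERITY_ORDER = ["dry", "slight", "moderate", "severe", "very severe"]


def isi_score(frequency: int, amount: int) -> int:
    """Multiply the frequency level by the amount level (range 0-12)."""
    if frequency not in range(0, 5):
        raise ValueError("frequency must be an integer from 0 to 4")
    if amount not in range(1, 4):
        raise ValueError("amount must be an integer from 1 to 3")
    return frequency * amount


def isi_category(score: int) -> str:
    """Map an ISI value to the severity categories given in the text."""
    if score == 0:
        return "dry"
    if 1 <= score <= 2:
        return "slight"
    if 3 <= score <= 6:
        return "moderate"
    if 8 <= score <= 9:
        return "severe"
    if score == 12:
        return "very severe"
    raise ValueError(f"{score} is not a valid ISI value")


def levels_improved(baseline: str, follow_up: str) -> int:
    """Number of severity levels gained (positive = improvement)."""
    return SEVERITY_ORDER.index(baseline) - SEVERITY_ORDER.index(follow_up)
```

Under these assumptions, a subject moving from "severe" to "moderate" has `levels_improved` equal to 1, which corresponds to the one-level improvement used later as an anchor.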
MID can be estimated using both anchor- and distribution-based approaches and neither is known to be superior.6 Anchor-based methods assess responsiveness in relation to an independent measure (e.g., external event or rating) to quantify the meaning of a particular degree of change in the health construct.6, 13, 14 One way to assess the usefulness of an anchor is to determine if the anchor has at least a moderate correlation with the HRQOL measures (i.e., correlation coefficient, r ≥ 0.3).6, 13 Using Cohen’s guidelines, r = 0.1 is small, and r = 0.3 is moderate.15
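As a sketch of the anchor screening step, the correlation between an anchor and the change in an HRQOL score can be computed and compared with the 0.3 criterion. The function names are hypothetical, and taking the absolute value of r is an assumption made here because the sign depends on how the anchor is coded.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def anchor_is_usable(r, threshold=0.3):
    """ASSUMPTION: the criterion is applied to |r|, ignoring direction."""
    return abs(r) >= threshold
```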
Distribution-based methods rely on the distribution of scores within a population and relate clinical significance to a change in magnitude at least equal to a statistical parameter of group data such as variability (e.g., standard deviation) or reliability (e.g., Cronbach’s α). Consistent with current recommendations, we assessed the MID of the UDI, UDI-stress subscale and UIQ using both anchor-based and distribution-based techniques.6, 14, 16
For each approach, we evaluated the change in score from baseline to 3 months in the UDI, UDI-stress subscale and UIQ. Anchors used included the global rating of change, the ISI, and incontinence episodes (IE) on the 7-day bladder diary. Consistent with current recommendations, the MID for the global rating of change was defined as the difference between the mean change in UDI, UDI-stress and UIQ scores reported by women who indicated they were “better” at 3 months relative to the start of treatment and the mean change reported by those who indicated they were “about the same”.6, 16 Using criteria similar to those used to calculate the MID of the Incontinence Quality of Life (I-QOL)17, we determined the mean change in UDI, UDI-stress subscale and UIQ scores that corresponded to three outcomes assessed by the bladder diary: “worse” ≥ 25% increase in number of IE; “no change” = a change in any direction between 0 and 24%; and “better” ≥ 25% decrease in the number of IE. We defined the MID as the difference in mean change in scores between those who were “better” and those who demonstrated “no change.” Total incontinence episodes per week, regardless of type, were used for the MID determination for this anchor for each of the scales. For the UDI-stress scale, we also investigated whether considering only the stress incontinence episodes recorded on the bladder diary impacted this MID estimate. For the ISI, improvement was defined as an improvement of one severity level from baseline to the 3-month evaluation (e.g., severe to moderate, slight to dry), and we determined the mean change in score of the UDI, UDI-stress subscale and UIQ that corresponded to each of these transitions. We defined the MID using the ISI as the anchor as the difference in change in score between those subjects who had a one-level improvement on the ISI and those whose ISI category did not change from baseline.
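The anchor-based calculation described above reduces to classifying each subject on the anchor and subtracting the mean score change of the unchanged group from that of the improved group. The following is a minimal sketch under stated assumptions: the function names and the example data are hypothetical, and changes of less than 25% in either direction are treated as "no change".

```python
from statistics import mean


def classify_diary_change(baseline_ie: float, follow_up_ie: float) -> str:
    """Classify the percent change in incontinence episodes (IE) per week.

    ASSUMPTION: any change smaller than 25% in either direction is 'no change'.
    """
    if baseline_ie <= 0:
        raise ValueError("baseline IE must be positive to compute a percent change")
    pct_change = (follow_up_ie - baseline_ie) / baseline_ie * 100
    if pct_change <= -25:
        return "better"
    if pct_change >= 25:
        return "worse"
    return "no change"


def anchor_mid(changes_by_group, improved="better", reference="no change"):
    """MID = mean score change in the improved group minus that in the reference group.

    changes_by_group maps an anchor category to a list of score changes
    (follow-up minus baseline, so negative values are improvements).
    """
    return mean(changes_by_group[improved]) - mean(changes_by_group[reference])


# Hypothetical score changes, for illustration only (not trial data).
example_changes = {
    "better": [-30, -20, -25],
    "no change": [-10, -5, -15],
}
example_mid = anchor_mid(example_changes)
```

The same `anchor_mid` function applies to the global rating of change (with "better" versus "about the same") and to the ISI (one-level improvement versus no change in category) by passing the appropriate group labels.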
Distribution-based measures of MID included effect size and standard error of measurement (SEM).6 The effect size was based on the baseline standard deviation of each outcome; MID was calculated as 0.5 SD (medium effect size) and 0.2 SD (small effect size).15 SEM was calculated as baseline standard deviation multiplied by the square root of (1-Cronbach’s α) for each scale under consideration; 1 SEM was considered an estimate of MID.18 95% confidence intervals (CI) of the various MID estimates were calculated. For anchor-based methods, the CI was based on the standard 95% CI for difference between two population means; for distribution-based methods, the CI was based on the standard 95% CI for the population variance (effect size method)19 or the bootstrap CI (SEM).20
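The two distribution-based estimates follow directly from the formulas stated above. This sketch uses hypothetical function names and hypothetical input values (a baseline SD of 40 and a Cronbach's α of 0.84); these are not the trial's values.

```python
import math


def effect_size_mid(baseline_sd: float, effect_size: float = 0.5) -> float:
    """MID as a fraction of the baseline SD (0.5 = medium, 0.2 = small effect)."""
    if baseline_sd < 0:
        raise ValueError("standard deviation cannot be negative")
    return effect_size * baseline_sd


def standard_error_of_measurement(baseline_sd: float, cronbach_alpha: float) -> float:
    """SEM = baseline SD multiplied by the square root of (1 - Cronbach's alpha)."""
    if not 0 <= cronbach_alpha <= 1:
        raise ValueError("Cronbach's alpha is expected to lie between 0 and 1")
    return baseline_sd * math.sqrt(1 - cronbach_alpha)


# Hypothetical inputs, for illustration only.
hypothetical_sd = 40.0
hypothetical_alpha = 0.84

medium_effect = effect_size_mid(hypothetical_sd, 0.5)   # half of the SD
small_effect = effect_size_mid(hypothetical_sd, 0.2)    # one fifth of the SD
one_sem = standard_error_of_measurement(hypothetical_sd, hypothetical_alpha)
```

With these inputs the medium effect size is 20 points, the small effect size is 8 points, and 1 SEM is approximately 16 points; a more reliable scale (higher α) yields a smaller SEM for the same SD.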
MID estimates and 95% CIs of each anchor- and distribution-based approach were compared, and recommendations for the MID of the UDI, UDI-stress subscale and UIQ were made by consensus consistent with the recommendations of Revicki et al.16
Of the 445 subjects who were randomized in the ATLAS trial, about 75% of subjects completed both the baseline and 3-month evaluations and are the subject of this analysis (76% for UDI, 75% for UDI-Stress and 74% for UIQ). Baseline demographics are listed in Table 1. Three months after initiating treatment, significant improvements in UDI, UDI-stress and UIQ scores were noted (Table 2). Similarly, there were significant improvements in IE on the 7-day diary and ISI scores. Mean (SD) changes from baseline to 3 months after treatment in UDI, UDI-stress and UIQ scores were −34 (39), −19 (22), and −34 (44) points, respectively. The correlations between all anchor-based measures and HRQOL measures exceeded the recommended criteria of r ≥ .3.
Table 3 presents the change in UDI, UDI-stress and UIQ by global rating of change response category. The MID (95% CI) based on the global rating was −6.4 (−19.4, 6.5), −4.6 (−12.7, 3.5) and −6.5 (−22.8, 9.8) for the UDI, UDI-stress and UIQ, respectively. MID determined by the other anchor-based methods (ISI and bladder diary) were greater than those for the global rating and similar to the distribution-based findings.
Seventy-one percent of subjects satisfactorily completed the 7-day bladder diary at baseline and at 3 months. Of this group, 86% demonstrated a 25% or greater decrease in number of IE on diary from baseline to 3 months, meeting our predefined criterion of “Improved” (Table 4). The MIDs (95% CI) based on the bladder diary were −22.4 (−36.5, −8.2), −16.5 (−24.5, −8.3), and −17.0 (−32.9, −1.1) for the UDI, UDI-stress and UIQ, respectively. We explored other potential cut-points in improvement in incontinence episode frequency for differentiating “Improved” from “No change.” Regardless of the dichotomous cut-point chosen, MID estimates were not substantially different from those of our predefined cut-point of 25%. For example, if subjects with 50% or greater improvement in incontinence episode frequency are considered “Improved” while those with less than 50% improvement are categorized as “No change,” the MID values are −20.9, −16.8 and −22.8 for the UDI, UDI-stress and UIQ, respectively. Considering only stress incontinence episodes recorded on the bladder diary for the UDI-stress scale did not significantly change the MID estimate for this scale (data not shown).
The relationship between change in the urinary scales of the PFDI and PFIQ and changes in incontinence severity category as measured by the ISI is shown in Table 5. For each increasing level of improvement in ISI severity category, there is a stepwise improvement in UDI, UDI-stress and UIQ. The MID (95% CI) using ISI as the anchor were −11.1 (−19.8, −2.3), −7.5 (−12.7, −2.3), and −16.0 (−26.4, −5.7) for the UDI, UDI-stress and UIQ, respectively.
MID estimates (95% CI) based on the distribution-based criteria for the UDI, UDI-stress subscale and UIQ, respectively, are as follows: 0.5 SD corresponds to an improvement in score of −20.5 (−18.8, −21.9), −9.8 (−9.1, −10.6), and −28.7 (−26.7, −31.1) points; 0.2 SD corresponds to an improvement of −8.1 (−8.8, −7.5), −3.9 (−4.2, −3.6), and −11.5 (−12.4, −10.7) points; and 1 SEM corresponds to −15.3 (−14.2, −16.4), −13.1 (−12.3, −13.9), and −11.7 (−10.9, −12.6) points. Figure 1 shows the comparison of MID estimates with 95% CIs from the anchor- and distribution-based approaches.
This study used three anchor-based approaches and two distribution-based approaches to establish the MID for the urinary scales of the PFDI and PFIQ. The anchor-based approaches included one measure of the patient’s perspective (global rating of change) and two clinical measures of incontinence severity (ISI and number of incontinence episodes recorded on the bladder diary). Using anchor-based methods, the range of MID for the UDI was −22.4 to −6.4 points, for the UDI-stress subscale was −16.5 to −4.6 points, and for the UIQ was −17.0 to −6.5 points. These data were supported by two distribution-based estimates, effect size (½ SD) and 1 SEM.
When multiple approaches are used to determine the MID of a scale, a range of values rather than a single point estimate is expected, as was seen in this study. Clinically, a narrower range or even a single point of MID for each urinary scale would be more helpful than the somewhat broad range determined by our calculations. It is recommended that a single MID value or narrow range be selected for a given scale by integrating the results from the multiple approaches in a systematic way, sometimes called triangulation.16, 21 Using this method, MIDs derived from anchor-based methods that reflect patient-rated and disease-specific variables are given the most weight.16 The global rating of change, the most commonly used anchor for MID determination, represents the best measure of the significance of change from an individual perspective.6
For the scales considered in this study, the MID estimates from the patient’s perspective are considerably smaller than when clinical criteria are used. When determining the MID, it is recommended that the patient’s perspective be given the most weight, although the clinician’s perspective should also be considered.16 Retrospective self-reports like a global rating of change are subject to recall bias and have a tendency to reflect a subject’s current health state more than a change from baseline.14
Therefore, when examining the MID estimates and 95% CIs from all anchor-based approaches and considering the potential for some recall bias on our global health rating, we propose MID thresholds of −11, −8, and −16 for the UDI, UDI-stress subscale and UIQ, respectively. These values represent the MID determined by a clinically relevant anchor (a one-level change in the ISI) and are higher than the MID estimates obtained from the global rating of change. Thus, values that meet or exceed these thresholds represent the minimal scores that can be considered clinically relevant to both the patient and the clinician. We recognize, based on MID estimates of the global rating of change, that smaller scores may be clinically relevant to some patients; however, it is unclear whether these smaller scores would be relevant from a physician’s perspective. MID estimates obtained from the bladder diary represent more conservative estimates of clinically meaningful change than the ISI or global rating of change but, in our opinion, do not represent the minimum change clinically relevant to patients and clinicians.
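To illustrate how the proposed thresholds might be applied to an observed change in score, the sketch below compares the magnitude of an improvement with the threshold for each scale. The function name and the sign convention (change = follow-up minus baseline, so improvement is negative) are assumptions made here. The check covers only magnitude; whether the improvement is statistically significant must be established separately.

```python
# Proposed MID thresholds (points of improvement) from the text.
MID_THRESHOLDS = {"UDI": 11, "UDI-stress": 8, "UIQ": 16}


def meets_mid(scale: str, change: float) -> bool:
    """Return True if an improvement meets or exceeds the proposed MID.

    ASSUMPTION: change = follow-up score minus baseline score, so a negative
    value is an improvement (lower scores indicate less bother or impact).
    These thresholds apply to improvement only, not to deterioration.
    """
    if scale not in MID_THRESHOLDS:
        raise KeyError(f"no MID threshold defined for scale {scale!r}")
    return -change >= MID_THRESHOLDS[scale]
```

For example, `meets_mid("UDI", -12)` is true, whereas `meets_mid("UIQ", -10)` is false because a 10-point improvement falls short of the 16-point UIQ threshold.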
In MID determination, distribution-based methods can support and help interpret estimates from anchor-based methods and be used in situations where anchor-based approaches are unavailable.16 There are increasing data and growing consensus that an effect size of 0.5 (or change of ½ SD) is a conservative estimate that is likely to be clinically significant across different patient-reported questionnaires.16, 22 Sloan et al proposed that, in the absence of other information, ½ SD is a reasonable and scientifically supportable estimate of meaningful effect.22 The MIDs for the scales of the PFDI and PFIQ were previously estimated using this approach in a cohort of patients undergoing surgery for pelvic organ prolapse with and without SUI.4 Our study improves on this because our participants were not undergoing surgery (a treatment expected to show a large difference and thus overestimate the MID)6 and because all had urinary incontinence, so that measures assessing incontinence were more appropriate. Trials of non-surgical SUI therapies, such as ATLAS, are particularly suited for MID determination because mild to moderate treatment effects are more common. Not surprisingly, MID estimates corresponding to ½ SD were considerably smaller in magnitude in our study than those derived from the surgical cohort (−20 versus −30 points for the UDI and −29 versus −49 points for the UIQ).
Although the ½ SD approach provides scores that are certainly clinically significant and meaningful, these scores are not necessarily minimal. Consistent with other studies, MID estimates corresponding to ½ SD in our study represent the upper boundary of the range of MID estimates identified.16 There is also increasing evidence that a change in score equivalent to 1 SEM is a valid alternative for estimating MID in patient-reported health outcome measures.18 In this study, the 1 SEM criterion corresponded to MID estimates within the range of the anchor-based approaches for the UDI, UDI-stress subscale and UIQ.
As the method of scoring and range of possible scores differ between instruments, it is not meaningful to directly compare MID across different measures. However, one can compare the percentage of the total scale score represented by the MID (i.e. dividing the MID by the highest possible scale score).23 The range of anchor-based MID estimates seen in this study represents 2.1 to 7.5% change of the total score for the UDI, 1.6% to 4.3% for the UIQ and 4.6 to 16.5% for the UDI-Stress subscale. These are consistent with results of several other HRQOL questionnaires including the Medical Outcomes Study Short-form 36 (SF-36) (range 3 to 6%)24, the Functional Assessment of Chronic Illness Therapy (FACIT) scales (3.7 to 12.5%)25, the Incontinence Quality of Life (I-QOL) scale (2.5%)26 and Kings Health Questionnaire scales (5%).27
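The percentages quoted above can be reproduced by dividing each anchor-based MID by the maximum possible score of its scale. The sketch below uses the scale ranges and anchor-based MID ranges reported in this paper; the variable and function names are hypothetical.

```python
# Maximum possible scores and anchor-based MID ranges as reported in the text.
SCALE_MAX = {"UDI": 300, "UDI-stress": 100, "UIQ": 400}
ANCHOR_MID_RANGE = {
    "UDI": (-6.4, -22.4),
    "UDI-stress": (-4.6, -16.5),
    "UIQ": (-6.5, -17.0),
}


def mid_as_percent_of_scale(mid: float, scale_max: float) -> float:
    """Express the magnitude of an MID as a percentage of the maximum scale score."""
    if scale_max <= 0:
        raise ValueError("scale maximum must be positive")
    return abs(mid) / scale_max * 100


for scale, (smallest, largest) in ANCHOR_MID_RANGE.items():
    low = mid_as_percent_of_scale(smallest, SCALE_MAX[scale])
    high = mid_as_percent_of_scale(largest, SCALE_MAX[scale])
    print(f"{scale}: {low:.1f}% to {high:.1f}% of the total score")
```

The computed values agree with the percentages given in the text to within rounding.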
The strengths of this study include the use of multiple approaches to triangulate MID estimates following currently recommended guidelines, the use of validated and widely accepted SUI outcome measures as anchors, the large sample size, and the wide breadth of treatment response allowing an assessment of minimal change. A limitation is that we only determined MID for the urinary scales of the PFDI and PFIQ; MID estimates for the colorectal anal and prolapse scales will require further study using different patient populations. Until such estimates are available, a change of ½ SD or greater seems a reasonable conservative estimate for a clinically important change. An additional limitation of the study is that there were too few subjects who had a decline in urinary function to provide estimates for an MID for deterioration. The MID values proposed from this study represent MID for improvement only. It is also worth noting that MID estimates can vary across populations.16
In conclusion, MID can help researchers and clinicians understand whether HRQOL score differences between treatment groups or changes within one group over time are clinically meaningful.23 From this study, we recommend 11, 8, and 16 points as reasonable estimates of MID for the UDI, UDI-stress subscale and UIQ, respectively. Statistically significant improvements that meet or exceed these thresholds should be considered clinically important. However, some patients with changes in scores smaller than these estimates may perceive clinically important improvements. MID values should be confirmed based on accumulating evidence from multiple studies; with increasing evidence, the MID values for the urinary scales of the PFDI and PFIQ will become more precise.
Supported by grants from the National Institute of Child Health and Human Development and the NIH Office of Research on Women’s Health (U01 HD41249, U10 HD41250, U10 HD41261, U10 HD41267, U10 HD54136, U10 HD54214, U10 HD54215, and U10 HD54241).
Mathew D. Barber, MD, MHS, Principal Investigator
Marie Fidela R. Paraiso, MD, Co-Investigator
Mark D. Walters, MD, Co-Investigator
J. Eric Jelovsek, MD, Co-Investigator
Firouz Daneshgari, Co-Investigator
Linda McElrath, RN, Research Nurse Coordinator
Donel Murphy, RN, MSN, Research Nurse
Cheryl Williams, Research Assistant
Anthony G. Visco, MD, Principal Investigator
Jennifer Wu, MD, Co-Investigator
Alison Weidner, MD, Co-Investigator
Cindy Amundsen, MD, Co-Investigator
Mary J. Loomis, RN, BSN, Research Coordinator
Linda Brubaker, MD, MS, Principal Investigator
Kimberly Kenton, MD, MS, Investigator
MaryPat FitzGerald, MD, MS, Investigator
Elizabeth Mueller, MD, MSME, Investigator
Kathy Marchese, RN, Study Coordinator
Mary Tulke, RN, Study Coordinator
Holly E. Richter, PhD, MD, Principal Investigator
R. Edward Varner, MD, Co-Investigator
Robert L. Holley, MD, Co-Investigator
Thomas L. Wheeler, MD, Co-Investigator
Patricia S. Goode, MD, Co-Investigator
L. Keith Lloyd, MD, Co-Investigator
Alayne D. Markland, DO, Co-Investigator
Velria Willis, RN, BSN, Research Coordinator
Nancy Saxon, BSN, Research Nurse Clinician
LaChele Ward, LPN, Research Specialist
Lisa S. Pair, CRNP
Morton B. Brown, PhD, Co-Investigator
Cathie Spino, PhD, Principal Investigator
John T. Wei, MD, MS, Co-Principal Investigator
Beverly Marchant, RN, BS, Project Manager
Donna DiFranco, BS, Clinical Monitor
John O.L. DeLancey, MD, Co-Investigator
Dee Fenner, MD, Co-Investigator
Nancy K. Janz, PhD, Co-Investigator
Wen Ye, PhD, Statistician
Zhen Chen, MS, Statistician
Yang Wang Casher, MS, Database Programmer
Joseph Schaffer MD – Principal Investigator
Clifford Wai, MD - Co-Investigator
Marlene Corton, MD - Co-Investigator
Gary Lemack, MD - Co-Investigator
Kelly Moore - Research Coordinator
David Rahn, MD
Amanda White, MD
Shanna Atnip, NP
Margaret Hull, NP
Pam Martinez, NP
Deborah Lawson, NP
Ingrid Nygaard, MD, Principal Investigator
Peggy Norton, MD, Co-Investigator
Linda Freeman, RN, Research Coordinator
Susan Meikle, MD
This study was presented as an oral presentation at the 2008 American Urogynecologic Society Annual Meeting in Chicago IL, September 4–6.
This trial is registered at clinicaltrials.gov under Registration # NCT00270998
Condensation: The Minimum Important Differences for urinary scales of the Pelvic Floor Distress Inventory and Pelvic Floor Impact Questionnaire are developed from an incontinence treatment trial.