This pooled analysis was motivated by questions regarding the validity of traditional IBS endpoints, with particular focus on binary endpoints.5
The Rome Foundation Outcomes and Endpoints Committee combined data from over 9000 patients in 12 randomized controlled drug trials involving 5 separate investigational treatments with different mechanisms of action. Our goal was to leverage the power of this harmonized database to explicitly test key psychometric properties of binary endpoints, and to compare the performance of binary endpoints with the “50% improvement” criterion suggested as an alternative metric.5
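To make the two responder definitions concrete, the following is a minimal Python sketch. It is not the committee's analysis code, and the specific rules are assumptions: a patient is treated as a binary responder if they answer "yes" to a weekly global question in at least half of the treatment weeks, and as a "50% improvement" responder if mean on-treatment pain severity falls by at least 50% from baseline. The `Patient` class, its fields, the helper function names, and the 0-10 pain scale are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Patient:
    # Hypothetical record: weekly yes/no answers to a global binary question,
    # baseline pain severity, and weekly on-treatment pain (0-10 scale assumed).
    weekly_binary: list[bool]
    baseline_pain: float
    weekly_pain: list[float]


def is_binary_responder(p: Patient, min_fraction: float = 0.5) -> bool:
    """Assumed rule: 'yes' in at least min_fraction of treatment weeks."""
    return sum(p.weekly_binary) / len(p.weekly_binary) >= min_fraction


def is_50pct_responder(p: Patient) -> bool:
    """Assumed rule: mean on-treatment pain at least 50% below baseline."""
    if p.baseline_pain <= 0:
        return False  # no baseline pain to improve on; treat as non-responder
    reduction = (p.baseline_pain - mean(p.weekly_pain)) / p.baseline_pain
    return reduction >= 0.5


a = Patient([True, True, False, True], 6.0, [3.0, 2.5, 3.5, 2.0])
print(is_binary_responder(a), is_50pct_responder(a))  # True True

# Same absolute 2-point drop in pain, different baselines.
mild = Patient([True, False], 4.0, [2.0, 2.0])
severe = Patient([True, False], 8.0, [6.0, 6.0])
print(is_50pct_responder(mild), is_50pct_responder(severe))  # True False
```

Because the 50% rule is relative, the same absolute drop of 2 points qualifies a patient with baseline pain of 4 but not one with baseline pain of 8. This arithmetic is one way baseline severity could, in principle, influence a percentage-based endpoint, which is why its association with response status is worth testing.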
Regarding the impact of baseline severity on endpoint performance, we found that the relationship between baseline severity and binary responder status was not statistically significant in the adjusted analysis, and it also failed to meet our criteria for clinical significance, which were applied independently of p-values. In contrast, the “50% improvement” criterion for pain severity was significantly associated with baseline severity in the adjusted analysis, particularly among patients receiving investigational treatment. However, the clinical relevance of this association was minimal across all treatment groups. The observation that there is no important relationship between baseline pain severity and response status, whether response is defined by a binary endpoint or by the “50% improvement” criterion, is consistent with previous studies6–8 and contrary to the results of Whitehead et al.5
One possible explanation for this discrepancy is that the community sample analyzed by Whitehead et al., drawn from a health maintenance organization, spanned a broader range of IBS symptom severity than the subjects enrolled in the clinical trials included in this pooled analysis.
We further tested the construct validity of both endpoints against a range of IBS illness domains. In short, both the binary response and 50% improvement endpoints demonstrated excellent construct validity across a wide range of variables. Both endpoints detect MCIDs in key bowel symptoms, including bloating, abdominal pain, stool consistency, urgency, and hard stools. They also detect MCIDs in worker productivity, visceral hypersensitivity scores, and fatigue scores. Although both endpoints detect MCIDs in overall HRQOL, they are less able to detect MCIDs in the individual HRQOL components. Thus, both endpoints track key components of IBS illness severity, neither is clearly superior to the other, and both perform as expected.
Both the binary response and 50% improvement endpoints performed similarly in discriminating between MCID responders and non-responders for bowel symptoms within IBS sub-groups. Of note, both endpoints appear to discriminate better in the IBS-D than in the IBS-C subgroup. This has potential implications for studies that seek to establish differences in response rates between treatment and placebo groups: the data suggest that both endpoints may be better suited to the IBS-D population. In IBS-C patients, the percentage achieving an MCID is numerically smaller, suggesting that more sensitive endpoints might be necessary in this group. A corollary is that drugs that failed to show large effect sizes in IBS-C might have been hampered by the psychometric properties of the binary endpoints. It is possible that a “one size fits all” approach to endpoints does not apply in IBS, and that different sub-groups are better captured with tailored endpoints. This raises the question of whether clinical trialists should employ different endpoints for IBS-C and IBS-D, which would represent a notable change in our approach to endpoint measurement in IBS. Our study cannot determine why endpoints may behave differently by sub-group; it merely raises the question. Future research should aim to explain this difference and, in the meantime, should test current and future endpoints separately in the IBS-C and IBS-D subgroups to establish whether their psychometric properties are similar or different in these phenotypically distinct populations.
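One way to quantify how well a responder endpoint discriminates between patients who do and do not achieve an MCID is to cross-tabulate the two within each IBS sub-group. The sketch below assumes sensitivity, specificity, and Youden's J as the discrimination measures; this section does not state which statistic the pooled analysis used, and the record layout and the function name `discrimination_by_subgroup` are hypothetical. The toy records are illustrative only and are not study results.

```python
from collections import defaultdict


def discrimination_by_subgroup(records):
    """Cross-tabulate endpoint responder status against MCID achievement per sub-group.

    records: iterable of (subgroup, endpoint_responder, achieved_mcid) tuples.
    Returns sensitivity, specificity, and Youden's J for each sub-group.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for subgroup, responder, achieved_mcid in records:
        c = counts[subgroup]
        if achieved_mcid:
            c["tp" if responder else "fn"] += 1  # MCID achieved: did the endpoint catch it?
        else:
            c["fp" if responder else "tn"] += 1  # no MCID: did the endpoint stay negative?

    results = {}
    for subgroup, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        sens = c["tp"] / positives if positives else float("nan")
        spec = c["tn"] / negatives if negatives else float("nan")
        results[subgroup] = {"sensitivity": sens, "specificity": spec,
                             "youden_j": sens + spec - 1}
    return results


# Toy records (illustrative only, not study data).
toy = [
    ("IBS-D", True, True), ("IBS-D", True, True),
    ("IBS-D", False, True), ("IBS-D", False, False),
    ("IBS-C", True, True), ("IBS-C", False, True),
    ("IBS-C", True, False), ("IBS-C", False, False),
]
for group, stats in discrimination_by_subgroup(toy).items():
    print(group, stats)
```

A higher Youden's J in one sub-group than another would correspond to the kind of difference in discrimination described above; in practice an area-under-the-curve statistic or a model-based comparison could serve the same purpose.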
These data add to previous conclusions that global binary endpoints are useful in IBS,2 based on the collective clinical trial experience in almost 20,000 IBS patients with at least five different medications (alosetron, cilansetron, tegaserod, lubiprostone, dextofisopam) tested with binary endpoints. Binary endpoints have been devalued because of their relative lack of psychometric validation to date. Yet even before this pooled analysis, previous investigators had demonstrated that binary endpoints were acceptable to patients and that binary responses were driven by the patients’ most bothersome symptom.18, 19
Based on a systematic review of 12 pre-specified criteria, Bijkerk et al. concluded that the weight of evidence was in favor of using “adequate relief” – a binary endpoint – among the different available endpoints used in IBS trials.3
Drugs shown to be effective on binary response endpoints have also been found to improve general or disease-specific quality of life.20
Based on these collective data, the Rome III guidance on IBS clinical trials endorsed using a global measure that integrates the symptom data into a single numerical index, measured either as a binary endpoint or a continuous integrative symptom questionnaire such as the IBS-SSS.10
We have now expanded and confirmed these collective results and conclusions by demonstrating excellent construct validity of the binary endpoints against a wide range of patient symptoms, psychosocial illness experiences, visceral sensitivity reports, HRQOL, and even work productivity. Moreover, we found that a binary endpoint performs psychometrically on par with monitoring pain severity on a continuous scale and applying the “50% improvement” criterion recommended by Whitehead et al.5
Based on our data, coupled with extensive pre-existing data supporting the validity of binary endpoints, it is reasonable to conclude that the use of binary endpoints in IBS clinical trials is rational and valid. No endpoint can be fully validated; establishing the validity of a PRO is an ongoing, iterative effort. Our results contribute to this effort and further confirm that binary endpoints perform as expected. This conclusion is important because it supports the validity of existing studies, reinforces the efficacy findings for therapies originally tested in trials employing binary endpoints, and indicates that future studies can use these endpoints without undue concern.
Our study has several strengths. First, the sample size of this analysis is large, and pooling patient-level data is a more powerful way to synthesize multiple studies than conventional meta-analysis. This provides considerable power to investigate the psychometric properties of IBS endpoints. Second, because large sample sizes can yield statistically significant relationships that are not clinically relevant, we applied a priori criteria for clinical relevance and reported results that were both statistically significant and clinically relevant. Third, we conducted sub-analyses across key groups, including IBS sub-groups (i.e. IBS-C vs. IBS-D) and treatment groups (active vs. placebo), which supports the generalizability of our results across these populations. Finally, we measured a range of key psychometric properties using multiple clinical anchors, which allows us to triangulate the validity of the endpoints from several perspectives.
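The point that large samples can make trivial associations statistically significant can be illustrated with a simple correlation test. This Python sketch computes a two-sided p-value for a Pearson correlation using the Fisher z approximation and reports it alongside a hypothetical clinical-relevance threshold of |r| ≥ 0.3. That threshold, and the function names, are assumptions for illustration only; they are not the a priori criteria or methods used in this analysis.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0, via the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def assess_association(r: float, n: int, alpha: float = 0.05,
                       min_relevant_r: float = 0.3) -> dict:
    """Report statistical significance and (hypothetical) clinical relevance separately."""
    p = correlation_p_value(r, n)
    return {
        "p_value": p,
        "statistically_significant": p < alpha,
        # Hypothetical threshold: |r| >= 0.3 stands in for an a priori relevance criterion.
        "clinically_relevant": abs(r) >= min_relevant_r,
    }


large = assess_association(0.05, 9000)  # pooled-analysis-sized sample
small = assess_association(0.05, 100)   # single-trial-sized sample
print(large)
print(small)
```

With r = 0.05, a sample of about 9000 yields a p-value far below 0.05, whereas a sample of 100 does not; in neither case does the association meet the illustrative relevance threshold. Keeping the two judgments separate is what allows a large pooled analysis to report only findings that are both significant and meaningful.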
Our study has limitations. First, as with any meta-analysis, we had to combine disparate data from different studies, each with its own inclusion and exclusion criteria, disease characteristics, and endpoint evaluations. However, we have acknowledged these variations, as described in our methods section, and have attempted to balance the power of harmonizing large datasets against the inevitable methodological shortcomings of combining disparate data. Second, patients in randomized controlled trials may differ systematically from other populations of IBS patients. However, this is precisely the population in question, since PRO measures are currently used mainly in clinical trials testing the effect of pharmacologic interventions in IBS. As PROs are increasingly adopted in everyday clinical practice, further validation studies will be needed in populations outside clinical trials. Third, our measure of “IBS severity” was limited to “pain severity.” We were unable to employ multi-attribute severity scales such as the IBS-SSS because there were inadequate data for this purpose. However, pain is a cardinal symptom of IBS,11, 12 and on average it drives overall illness severity more than any other symptom. In short, there is sufficient rationale and precedent to use pain severity as a surrogate for overall IBS illness severity, as we have done here.
In conclusion, this large patient-level meta-analysis shows that the binary and 50% improvement endpoints have equivalent psychometric properties. Neither is meaningfully affected by baseline severity, and both demonstrate excellent construct validity. They appear to perform best in the IBS-D population but are also valid in IBS-C.