Critically review the validity of symptom-based criteria (Manning, Rome I, Rome II, and Rome III) for irritable bowel syndrome (IBS).
Two kinds of validation are reported: (1) studies testing whether symptom criteria discriminate patients with structural disease at colonoscopy from patients without structural disease; and (2) studies testing whether symptom criteria discriminate patients presumed to have IBS by positive diagnosis from healthy subjects or patients with other functional and structural disorders.
The first study type addresses an important clinical management question but cannot provide meaningful information on sensitivity or positive predictive value because IBS is defined only by exclusion of structural disease. Specificity is modest (about 0.7), but can be improved to about 0.9 by adding red flag signs and symptoms. The second type of study judges validity by whether the symptom criteria consistently perform as predicted by theory. Here (1) factor analysis confirms consistent clusters of symptoms corresponding to IBS; (2) symptom-based criteria agree reasonably well (sensitivity 0.4–0.9) with clinical diagnoses made by experienced clinicians; (3) patients with a clinical diagnosis of IBS who fulfill Rome II criteria have greater symptom severity and poorer quality of life than patients with a clinical diagnosis of IBS who do not fulfill Rome II criteria; and (4) somatization does not explain endorsement of the symptom-based criteria for IBS. There are no consistent differences in sensitivity or specificity between Manning, Rome I, and Rome II.
Both study types support the validity of symptom-based IBS criteria. Tests of Rome III are needed.
The irritable bowel syndrome (IBS), one of several functional gastrointestinal (GI) disorders1, is a highly prevalent and potentially disabling condition that is diagnosed by the pattern of symptoms that patients present to their physicians. Some critics have questioned whether IBS and other functional gastrointestinal disorders truly exist, since they lack defining structural features2. The question of their existence is of little relevance to clinicians and patients who have to manage these symptoms on a day-to-day basis. For research purposes, however, we need a reliable, standard method of selecting patients, both to learn about the specific pathophysiological features underlying these symptoms and to develop optimal treatments for targeted patient groups. For this reason, the Rome Foundation has fostered the use of symptom-based criteria1, as has been done in psychiatry3, to make the diagnosis of these common disorders more precise for research and clinical care. This paper presents a critical review of the evidence that the existing criteria for IBS and other functional gastrointestinal disorders are valid.
Several reviews and meta-analyses of studies that address the validity of symptom-based diagnostic criteria for the functional gastrointestinal disorders have recently appeared4–7. Nevertheless, controversy persists6, 7 and there are repeated assertions that insufficient research has been done on the validity of these diagnostic criteria6. The goals of this review are to discuss the designs of these studies and the disease models on which they are based, to show how deficiencies in some of these models have led to inconclusive results, and to suggest that the use of alternative disease models initially developed for the validation of diagnostic criteria for psychiatric disorders may be more appropriate for the functional gastrointestinal disorders than the biological models used to date. Published studies based on these disease models will be summarized with a discussion of their implications for (a) the validity of the Rome symptom-based criteria, (b) the importance of considering the context in which the symptom criteria are queried (i.e., whether they are combined with physical examination and additional medical history findings), and (c) recommendations for future studies on the validity of the Rome criteria.
IBS has an ICD-9 diagnostic code (564.1)8 and is regarded by most clinicians and most researchers as a discrete medical disorder whose pathophysiological bases are being identified. There is an acknowledged overlap with other functional gastrointestinal disorders9, 10, somatic disorders9, and psychiatric disorders11, 12, and this observation has led some to speculate that IBS is not a distinct diagnostic entity13. However, most investigators regard this overlap as an indication that IBS shares some etiologic features with other disorders9, 11, 14 rather than as a disconfirmation of the basic assumption that IBS exists as a distinct medical disorder.
The model used in most validation studies is described as “criterion-related validity” or alternatively as “concurrent validity”15. These studies are designed to test how well a diagnostic marker or set of markers can distinguish between two groups of subjects, one of which is known to have the disorder in question (true positives) and the other of which does not (true negatives). An example would be a test of the accuracy of breath hydrogen testing for small bowel bacterial overgrowth (SBBO), which assesses the ability of the breath test to distinguish a group of subjects known to have SBBO, based on aspiration of fluid from the small bowel, from a second group known not to have bacterial overgrowth by the same objective standard16. This is a powerful design that permits the calculation of test statistics such as sensitivity, specificity, positive and negative predictive values, and positive and negative likelihood ratios.
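For readers less familiar with these indices, the sketch below computes them from the four cells of a 2 × 2 table (test result versus gold-standard status). The class name, function name, and counts are hypothetical illustrations, not data from the breath-test study cited above; the formulas are the standard definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical counts from a validation study against a gold standard."""
    tp: int  # test positive, disorder present (true positives)
    fp: int  # test positive, disorder absent (false positives)
    fn: int  # test negative, disorder present (false negatives)
    tn: int  # test negative, disorder absent (true negatives)


def accuracy_stats(t: TwoByTwo) -> dict[str, float]:
    """Standard diagnostic accuracy indices for a 2 x 2 table."""
    sensitivity = t.tp / (t.tp + t.fn)  # detected among those with the disorder
    specificity = t.tn / (t.tn + t.fp)  # negative among those without the disorder
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),  # disorder present among test positives
        "npv": t.tn / (t.tn + t.fn),  # disorder absent among test negatives
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Hypothetical counts, for illustration only.
stats = accuracy_stats(TwoByTwo(tp=40, fp=15, fn=10, tn=35))
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

The positive likelihood ratio is undefined when specificity equals 1, and the predictive values, unlike sensitivity and specificity, depend on the proportion of subjects with the disorder in the study sample.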
The difficulty with applying this model of concurrent validation to the symptom-based diagnostic criteria for IBS (or most other functional gastrointestinal disorders) is that there is no independently verifiable gold standard diagnosis (i.e., there is no biomarker). In the absence of a gold standard, investigators have adopted either of the following two strategies:
The most commonly employed strategy is to test whether the diagnostic criteria for IBS allow one to correctly identify the subgroup of patients who are likely to have organic disease, as distinct from those who are unlikely to have an organic diagnosis if tested by endoscopy or another imaging modality4, 5, 17. This design addresses a very important concern of the clinician, namely not to miss another diagnosis requiring a different treatment, such as inflammatory bowel disease or cancer, but it has the effect of treating IBS as a diagnosis of exclusion rather than as a discrete diagnostic entity. With this method it is not clear that the calculated sensitivity, positive likelihood ratio, or positive predictive value has any meaning, since these indices will be determined primarily by the inclusion criteria selected for the study, for which there is no standard and which can vary across investigators. On the other hand, this design may have value for investigating specificity (where specificity is defined as identifying for exclusion patients who have a significant likelihood of endoscopically diagnosable organic disease), or for calculating the negative likelihood ratio or the negative predictive value of the diagnostic criteria. However, this approach does not address the clinician’s dilemma that there is no endpoint with which to establish a positive diagnosis. Diagnosing by exclusion can lead to cost-ineffective physician behaviors that are driven by the intensity of patient symptom reports and by the physician’s degree of frustration and uncertainty18.
Other study designs are based on the assumption that IBS is a “real” medical disorder for which (a) pathophysiological features exist, albeit not fully delineated and (b) it is currently possible to use positive symptom criteria to identify which subjects have this disorder. A successful example of this approach is found within psychiatry where symptom based criteria identify diagnostic entities (e.g., post-traumatic stress disorder or anorexia nervosa) that “breed true” across clinical studies and cultures3, 19, 20. Although the pathophysiological determinants are not yet fully delineated, these conditions are readily identified for research and can be specifically identified in clinical practice and successfully treated. The developers of the Rome criteria view IBS and other functional GI disorders from this perspective and have consequently sought to identify symptom criteria which can distinguish IBS patients from healthy individuals as well as from patients with organic diseases or other functional disorders21.
A challenge that confronts investigators who believe that IBS patients have a specific medical disorder, qualitatively different from health, is that the symptoms used to identify patients with IBS also occur in healthy individuals: symptoms such as abdominal pain or discomfort and the occurrence of hard or loose stools in association with changes in abdominal pain or discomfort. Thus, the challenge is to identify thresholds for the frequency or intensity of these symptoms that are clinically meaningful and that can distinguish patients presumed to have IBS from healthy controls. Attempts to address this challenge led the developers of the Rome III diagnostic criteria to measure these symptoms on frequency scales rather than on the presence/absence scales used for Rome II and previous versions of the symptom-based diagnostic criteria22. However, data are still needed from large age- and sex-stratified samples to provide an empirical basis for defining what is abnormal.
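One way such normative data could be used is to set provisional cutoffs at an upper percentile of the symptom-frequency distribution in healthy controls, computed separately for each age and sex stratum. The sketch below assumes this approach; the 95th-percentile default, the stratum labels, and the pain-day counts are all hypothetical and are not Rome III thresholds.

```python
import statistics


def upper_reference_limit(healthy_values: list[float], percentile: int = 95) -> float:
    """Return the given percentile (1-99) of a healthy reference sample.

    The 95th percentile is an illustrative default, not a Rome-specified cutoff.
    """
    # quantiles(n=100) returns the 1st..99th percentiles as 99 cut points.
    cut_points = statistics.quantiles(healthy_values, n=100, method="inclusive")
    return cut_points[percentile - 1]


def stratified_limits(samples: dict[tuple[str, str], list[float]],
                      percentile: int = 95) -> dict[tuple[str, str], float]:
    """Compute a separate reference limit for each (sex, age band) stratum."""
    return {stratum: upper_reference_limit(values, percentile)
            for stratum, values in samples.items()}


def exceeds_reference(value: float, limit: float) -> bool:
    """Flag a symptom frequency above the healthy reference limit."""
    return value > limit


# Hypothetical days per month with abdominal pain reported by healthy controls.
healthy = {
    ("female", "18-39"): [0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8],
    ("male", "18-39"): [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5],
}
limits = stratified_limits(healthy)
print(limits)
print(exceeds_reference(10, limits[("female", "18-39")]))
```

Real reference samples would need to be far larger than these toy lists for an upper percentile to be stable.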
As noted, IBS is more similar to a psychiatric disorder diagnosed from symptom features than to a structural disorder such as hepatitis or ulcerative colitis, since no discrete morphological features or biomarkers exist. For such disorders it is not possible to identify “true positives”, and efforts to validate diagnostic criteria must rely on indirect strategies collectively referred to as “criterion validity” or sometimes simply as “predictive validity”15. The basic concept is that if a diagnostic indicator or questionnaire performs as predicted from the premise that IBS is a discrete medical disorder, this supports the validity of the indicator. When multiple predictions are confirmed, the evidence for the validity of the diagnostic test is stronger. Consistent with this concept, the Rome Foundation’s approach began with expert clinicians and investigators who classified and categorized diagnostic entities based on their clinical features23 using a consensus-based “Delphi” approach24, 25. Once identified, the conditions were refined and can now be validated by a variety of methods, including: (1) epidemiological studies showing agreement in the prevalence of defined symptom groups across populations when the same criteria are used26, 27; (2) statistical methods such as factor analysis that identify frequently occurring clusters of symptoms, which can be compared with the Rome criteria28, 29; and (3) tests of whether the symptom criteria identify patients given a clinical diagnosis of IBS by experienced clinicians and discriminate them from patients diagnosed with other GI disorders30 or from healthy controls (see below).
More novel methods that can be considered for validating symptom criteria include (4) examining whether, among patients given a clinical diagnosis of IBS by their physicians, those with poorer health-related quality of life, more severe symptoms, and greater health care utilization are more likely to fulfill symptom-based diagnostic criteria than those with less impaired health status, and (5) testing discriminant validity, where the goal is to show that endorsement of the symptom-based diagnostic criteria is not explained by a general tendency to notice and report multiple somatic symptoms (a behavioral trait referred to as “somatization”, which in its extreme form identifies patients with a psychiatric diagnosis of somatization disorder)9, 11. Thus, a process for validating symptom-based criteria is already in place.
Two meta-analyses of the accuracy of symptom-based criteria for IBS have appeared in the past two years4, 5, and a third appeared 8 years ago17. A critical review of this evidence was included in the report of the American College of Gastroenterology IBS Task Force31. Because the Rome III criteria were published only in 2006, these analyses can summarize validation studies of symptom-based diagnostic criteria for IBS only up to and including Rome II (published in 2000). These meta-analyses used quality criteria to select studies for inclusion and had two independent judges rate them. The meta-analysis by Ford et al4 used restrictive criteria, including the requirement that patients be assessed for gastrointestinal symptoms before the colonoscopy, barium enema, or computed tomographic (CT) colonography examinations used to exclude organic diagnoses; they identified 10 studies eligible for analysis. Because Jellema and colleagues5 wished to include studies carried out in primary care, their meta-analysis was more inclusive, allowing studies in which an organic diagnosis was inferred from alarm symptoms; they analyzed 25 studies, including all 10 studies analyzed by Ford’s team.
The Ford4 and Jellema5 meta-analyses yielded similar conclusions, namely that symptom-based criteria alone are only moderately helpful at identifying patients in whom there is a low likelihood of finding organic disease: Jellema et al found the median specificity to be 0.69 for the Manning criteria, 0.70 for the Rome I criteria, and 0.66 for the Rome II criteria. Medians for sensitivity were 0.67 for Manning, 0.72 for Rome I, and 0.69 for Rome II. Ford et al reported an overall specificity for the Manning criteria of 0.72 and an overall sensitivity of 0.78. Ford’s team reviewed a single study employing the Rome I criteria32 which had a specificity of 0.85 and sensitivity of 0.71, but studies meeting their inclusion criteria were not available for Rome II. Thus, summary test statistics from these two reviews were very similar to each other and showed no significant differences between Manning, Rome I, and Rome II criteria.
Combining “red flag” or alarm signs that suggest structural disease (e.g., blood in stool, abnormal blood test) along with gastrointestinal symptoms improves accuracy for discriminating patients with a higher likelihood of having structural disease from patients with a low probability of such disease, compared to symptom criteria alone. This concept was first proposed by Kruis over 25 years ago33, and was replicated in multiple studies34–40. In the Vanner study40, for example, when patients with red flags (36% of otherwise eligible patients) were excluded from the analysis, specificity of the Rome I criteria was 100% and sensitivity was 65%. In summarizing the studies that have employed this combined strategy, Jellema et al’s5 meta-analysis showed a median specificity of 0.92 and sensitivity of 0.67, and Ford’s4 meta-analysis showed a median specificity of 0.87 and a sensitivity of 0.84. It should be noted, however, that when this combined strategy is used, most of the variance in risk of organic disease is related to the red flags and abnormal physical findings, with minimal additional predictive power attributable to the addition of the symptom criteria41.
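The combined strategy can be expressed as a serial rule: a patient is classified as low risk (IBS) only if the symptom criteria are met and the red-flag screen is also negative. Under the simplifying assumption, made here for illustration only, that the two components err independently given true disease status, the sensitivities multiply and so do the false-positive rates. The function name and input values are hypothetical; the inputs are round numbers chosen near the symptom-only figures summarized above.

```python
def serial_and_rule(sens_symptoms: float, spec_symptoms: float,
                    sens_no_flags: float, spec_no_flags: float) -> tuple[float, float]:
    """Accuracy of 'classify as IBS only if criteria are met AND no red flags are present'.

    Sensitivity: fraction of patients without organic disease classified as IBS.
    Specificity: fraction of patients with organic disease NOT classified as IBS.
    Assumes the two components are conditionally independent given true status.
    """
    sensitivity = sens_symptoms * sens_no_flags
    # An organic case is misclassified only if it slips past both components.
    specificity = 1 - (1 - spec_symptoms) * (1 - spec_no_flags)
    return sensitivity, specificity


sens, spec = serial_and_rule(0.70, 0.70, 0.80, 0.80)  # hypothetical inputs
print(f"combined sensitivity {sens:.2f}, specificity {spec:.2f}")
```

The independence assumption is unlikely to hold exactly, and in the studies cited most of the predictive power comes from the red flags themselves, so this sketch shows only the direction of the trade-off expected under such a rule (higher specificity, lower sensitivity), not the size observed in any study.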
Since alarm signs and symptoms can be easily incorporated into the office-based diagnostic assessment of patients in primary care, this strategy for deciding which patients with lower gastrointestinal symptoms should be referred for endoscopic evaluation and which may be safely managed without these tests is appealing. However, as shown by Whitehead et al39, most red flag symptoms greatly overestimate the likelihood of organic disease; these investigators found that 84% of patients whom clinicians ultimately diagnosed as having IBS reported one or more red flags. Because it may not be feasible to perform screening colonoscopies in all such patients, additional studies are needed that examine the sensitivity of individual red flags, alone and in combination, for predicting organic disease. It should also be remembered that in clinical practice, patients with evident “red flags” are not usually the ones in whom the diagnosis of IBS is being weighed; it is in patients without red flag symptoms that positive symptom criteria become paramount in making the diagnosis, as noted below.
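The tendency of an “any red flag” rule to flag most patients follows from simple probability: even if each flag is individually uncommon, the chance that a patient reports at least one grows quickly with the number of flags checked. The sketch below assumes, for illustration only, that flags occur independently; the function name and flag prevalences are hypothetical and are not taken from Whitehead et al39.

```python
import math


def prob_any_flag(flag_prevalences: list[float]) -> float:
    """P(at least one red flag), assuming flags occur independently."""
    # The complement of "no flags at all" is the product of each flag being absent.
    return 1 - math.prod(1 - p for p in flag_prevalences)


# Hypothetical prevalence of five individual red flags in each group.
in_functional_patients = [0.25, 0.20, 0.15, 0.10, 0.10]
in_organic_patients = [0.40, 0.35, 0.30, 0.25, 0.20]

flagged_functional = prob_any_flag(in_functional_patients)  # non-organic patients referred
flagged_organic = prob_any_flag(in_organic_patients)        # organic cases caught by the rule
print(f"any-flag rule flags {flagged_functional:.0%} of functional and "
      f"{flagged_organic:.0%} of organic patients")
```

Comparing each flag's prevalence in the two groups, one flag at a time, is the kind of analysis of individual red flags that the studies called for above would need to provide.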
Some of the most compelling evidence for the validity of the symptom-based criteria for IBS comes from factor analysis studies. In 1990 Whitehead et al29 first reported that when community samples of adults are given gastrointestinal symptom checklists to complete, their symptoms tend to aggregate in clusters, i.e., there are some symptoms that are more strongly correlated with each other than with any other gastrointestinal symptoms. In this study and in multiple other studies by our group28, 42, 43 and by others44, 45, the three symptoms which define the symptom criteria for IBS have been found to cluster together. This cluster of symptoms is the same in males and females43, African Americans as well as Caucasian Americans43, subjects from different countries with distinct languages and cultures28, 45, and patients from GI clinics as well as population samples28. Factor analysis is also generally consistent with the symptom-based criteria for other functional gastrointestinal disorders28, 46, 47, but the most robust evidence exists for IBS.
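To make the idea of symptom clustering concrete, the sketch below simulates checklist responses driven by two hypothetical latent factors of different strength, extracts two principal components from the item correlation matrix, and applies a varimax rotation. Principal components with varimax rotation are used here as a simple stand-in for factor analysis; the published studies applied their own factor-analytic methods, and real checklists yield categorical rather than continuous responses. The item names and loadings are assumptions for illustration, and the code requires NumPy.

```python
import numpy as np


def principal_loadings(data: np.ndarray, n_factors: int) -> np.ndarray:
    """Unrotated principal-component loadings from the item correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Orthogonal varimax rotation, which seeks a simple loading structure."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag(np.sum(rotated**2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        previous, criterion = criterion, np.sum(s)
        if previous != 0 and criterion / previous < 1 + tol:
            break
    return loadings @ rotation


rng = np.random.default_rng(0)
n = 2000
bowel_pain = rng.normal(size=n)  # hypothetical latent factor behind the three IBS items
upper_gi = rng.normal(size=n)    # hypothetical latent factor behind the other items
items = ["pain_relieved_by_defecation", "pain_with_change_in_frequency",
         "pain_with_change_in_form", "nausea", "early_satiety", "postprandial_fullness"]
data = np.column_stack(
    [0.8 * bowel_pain + 0.6 * rng.normal(size=n) for _ in range(3)]
    + [0.6 * upper_gi + 0.8 * rng.normal(size=n) for _ in range(3)]
)
rotated = varimax(principal_loadings(data, n_factors=2))
for name, row in zip(items, rotated):
    print(f"{name:32s} {row[0]:+.2f} {row[1]:+.2f}")
```

In the printed loadings the three pain-related bowel items load mainly on one rotated component and the three upper GI items on the other. This is the pattern that factor-analytic validation looks for, although here it is built into the simulated data.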
Four studies have reported the agreement between symptom-based diagnostic criteria and physician diagnosis (Table 1). In our view, clinical diagnosis cannot be viewed as a reliable gold standard because clinicians differ based on their training and experience in how they define IBS, and it is known that academic gastroenterologists who see a small fraction of IBS patients are very familiar with the Manning and Rome criteria while primary care physicians who manage an estimated 80% of IBS patients are mostly unaware of symptom-based diagnostic criteria for IBS48, 49. Thus, one would not anticipate the agreement between symptom-based criteria and clinical diagnosis to be greater than about 0.70 because the agreement between pairs of clinicians is unlikely to be greater than this. Nevertheless, as one of several measures of criterion validity, the agreement between clinical diagnosis and the symptom-based criteria is a reasonable choice.
The sensitivity of the Rome II criteria compared to clinical diagnosis (Table 1) ranged from 0.47 to 0.73, and specificity ranged from 0.47 to 0.63. Two studies49, 50 compared Rome I to Rome II criteria and showed significantly greater sensitivity for Rome I compared to Rome II. Specificity was not evaluated in these two studies. In 3 studies39, 48, 50, patients were diagnosed primarily or exclusively in primary care while the fourth study49 came from a tertiary gastroenterology practice. There is some suggestion that sensitivity of the Rome II criteria was greater in tertiary care. Three other studies38, 51, 52 which employed symptom-based diagnostic criteria in clinical settings or epidemiological surveys were reviewed, but they did not provide estimates of the agreement between symptom-based diagnosis and clinical diagnosis.
If the Rome II criteria identify patients with a true IBS disorder, then patients with a clinical diagnosis of IBS who fulfill Rome II criteria should have poorer health status than patients with a clinical diagnosis of IBS who do not fulfill Rome II criteria. In Table 2 we show previously unpublished analyses of the study reported by Whitehead et al39. Only patients who received a clinical diagnosis of IBS from their health care provider at Group Health Cooperative are included in these analyses. As predicted, patients fulfilling Rome II criteria for IBS have more severe IBS symptoms and poorer IBS-specific quality of life. The hypothesis that patients who fulfill Rome II criteria would make more health care visits for gastrointestinal complaints was not confirmed; however, the absence of a difference in number of visits may reflect the easy access to care and absence of additional costs in an HMO setting, compared with a fee-for-service model where more severe symptoms may drive more health care visits.
IBS patients are known to score higher than healthy controls on questionnaires measuring the trait of somatization, and they report multiple comorbid psychiatric and somatic symptoms11, 53; it has been suggested that the cognitive and psychological mechanisms that lead some individuals to report excess numbers of somatic symptoms in the absence of disease might account for their endorsing the symptom criteria for IBS13. If this is the case, one would predict that statistically adjusting for the trait of somatization would eliminate or significantly attenuate the association between the Rome II symptom criteria and a clinical diagnosis of IBS. We used logistic regression to test this prediction in the data collected from the HMO population by Whitehead et al39. When only the Rome II criteria were entered into the regression predicting clinical diagnosis of IBS (as distinct from a diagnosis of abdominal pain, constipation, or diarrhea), the Rome criteria were a highly significant predictor of clinical diagnosis (Exp(B)=1.970, p<0.001). When scores on the somatization scale of the Brief Symptom Inventory were entered in the second step, they did not add significantly to the prediction (Exp(B)=1.005, p=0.307), and the Rome II symptom criteria remained a strong predictor (Exp(B)=1.970, p<0.001). This suggests that somatization does not explain the agreement between the Rome II symptom criteria and a clinical diagnosis of IBS. This conclusion is supported by other work from our laboratory showing that, although IBS patients frequently report excess numbers of comorbid somatic disorders, this characterizes only a minority of patients with a clinical diagnosis of IBS; most IBS patients do not report excess numbers of comorbid disorders9.
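A two-step analysis of this kind is a hierarchical logistic regression in which Exp(B) is the odds ratio for a predictor. The sketch below mirrors its structure on simulated data using a minimal Newton-Raphson fit in NumPy. The sample size, the assumed true odds ratio of 2.0, and the simulated somatization scores are hypothetical and are not the Group Health data. Because somatization is generated independently of the outcome, the likelihood-ratio chi-square for adding it should be small, and the Rome odds ratio should change little between steps.

```python
import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, max_iter: int = 50) -> tuple[np.ndarray, float]:
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X must already contain an intercept column. Returns (coefficients, log-likelihood).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        information = X.T @ (X * (p * (1 - p))[:, None])  # observed information matrix
        step = np.linalg.solve(information, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    p = 1 / (1 + np.exp(-X @ beta))
    log_likelihood = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, log_likelihood


rng = np.random.default_rng(42)
n = 3000
rome = rng.binomial(1, 0.5, size=n)     # 1 = meets Rome II criteria (simulated)
somatization = rng.normal(size=n)       # standardized score, unrelated to outcome by construction
true_logit = -0.5 + np.log(2.0) * rome  # assumed true odds ratio for Rome = 2.0
clinical_ibs = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

ones = np.ones(n)
beta1, ll1 = fit_logistic(np.column_stack([ones, rome]), clinical_ibs)                # step 1
beta2, ll2 = fit_logistic(np.column_stack([ones, rome, somatization]), clinical_ibs)  # step 2

or_rome_step1 = np.exp(beta1[1])
or_rome_step2 = np.exp(beta2[1])
or_somatization = np.exp(beta2[2])
deviance_change = 2 * (ll2 - ll1)  # likelihood-ratio chi-square with 1 df

print(f"Step 1: Exp(B) Rome = {or_rome_step1:.3f}")
print(f"Step 2: Exp(B) Rome = {or_rome_step2:.3f}, Exp(B) somatization = {or_somatization:.3f}")
print(f"Added by somatization: chi2(1) = {deviance_change:.2f} (0.05 critical value 3.84)")
```

In an applied analysis, a standard statistics package would also report standard errors and p-values. The point here is only the logic of the test: if somatization drove endorsement of the criteria, adding it in step 2 would shrink the Rome odds ratio toward 1.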
Further evidence for the discriminant validity of the Rome symptom criteria for IBS comes from a factor analysis study by Robbins and Kirmayer, who found that IBS is associated with a cluster of symptoms that is independent of fibromyalgia, chronic fatigue syndrome, anxiety, and depression54. These data support the discriminant validity of the Rome II criteria relative to somatization. More recent studies suggest that somatization and other psychological measures are more closely related to the severity of IBS, reflecting CNS amplification of incoming visceral and somatic signals, than to the disorder itself14.
Only one22 published study has addressed the validity of the Rome III criteria; as reviewed above, most studies have investigated the Manning criteria, with a smaller number investigating the Rome I and Rome II criteria. Can one generalize from these studies and draw conclusions about the validity of the Rome III criteria? It should be understood that the symptom-based diagnostic criteria have evolved from one another: the same cluster of three pain-related symptoms forms the core of the Rome I, II, and III criteria, and these symptoms are also present in the Manning criteria. Thus all these diagnostic criteria would be expected to perform similarly. However, it is difficult to compare across studies because prevalence estimates differ depending on which criteria are used55–57, and the sensitivity and specificity of the criteria differ when compared with clinical diagnosis49, 50 and, to a lesser extent, when compared with the results of endoscopic examinations4, 5. Thus, when the Rome III criteria were compared with the Rome II criteria, significantly different prevalence estimates were seen55, 56, and there were differences in which patients were identified as having IBS55. However, one study comparing Rome II and Rome III criteria for IBS and its subtypes (IBS-C, IBS-D, IBS-M) within the same patient cohort showed excellent agreement (86.5%, kappa 0.79), and these patients behaved similarly in terms of subtype prevalence and stability over a one-year period58. Validation studies performed specifically on the Rome III criteria will therefore be needed to fully address this question. Nevertheless, the similarity of the question items and the evidence from at least one study comparing criteria in the same clinical cohort provide some support for extrapolating the evidence for the validity of Rome I and Rome II to Rome III.
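Agreement statistics such as those quoted above can be computed from a cross-classification of the same patients under two criterion sets. The sketch below implements Cohen's kappa for a square agreement table; the function name and the 2 × 2 example counts are hypothetical. It also rearranges the definition of kappa to show the chance agreement implied by the reported figures (86.5% observed agreement, kappa 0.79).

```python
def cohens_kappa(table: list[list[int]]) -> tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for a square agreement table.

    table[i][j] counts patients placed in category i by one criterion set
    and in category j by the other.
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance if the two classifications were independent.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return observed, (observed - expected) / (1 - expected)


# Hypothetical counts: rows = Rome II (IBS, not IBS), columns = Rome III (IBS, not IBS).
p_observed, kappa = cohens_kappa([[20, 5], [10, 15]])
print(f"observed agreement {p_observed:.2f}, kappa {kappa:.2f}")

# Rearranging kappa = (p_o - p_e) / (1 - p_e) gives p_e = (p_o - kappa) / (1 - kappa).
reported_agreement, reported_kappa = 0.865, 0.79
implied_chance_agreement = (reported_agreement - reported_kappa) / (1 - reported_kappa)
print(f"implied chance agreement {implied_chance_agreement:.2f}")
```

The same function accepts larger square tables, for example one row and column per IBS subtype plus a "not IBS" category.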
The focus of most studies purporting to validate symptom-based diagnostic criteria for functional gastrointestinal disorders has been to test whether symptomatic patients at high risk for inflammatory bowel disease, gastrointestinal cancer, or malabsorption can be discriminated from those at low risk. This study design defines IBS as the absence of structural diseases. We believe this renders meaningless any estimates of the sensitivity of the symptom-based criteria for a positive diagnosis of IBS because the inclusion criteria largely determine who is defined as having IBS. Studies using this model demonstrate specificities of about 0.7. In the primary care setting where most IBS patients are managed and where the base rate of organic disease is known to be low, it is considered appropriate, in the absence of alarm symptoms or age over 50, to make a firm diagnosis based on symptom-based criteria without further physiological testing or colonoscopy. In secondary and tertiary care settings where the base rate of organic disease is much higher, often approaching 50%4, these test statistics provide less reassurance and diagnostic testing is often performed.
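The role of the base rate can be illustrated with Bayes’ theorem. The sketch below computes the probability that a patient who meets the symptom criteria nevertheless has organic disease. It uses sensitivity and specificity of about 0.7 as rough round figures and treats the base rates as assumptions: 5% is a hypothetical primary care value, and 50% corresponds to the upper end cited for referral settings. The function name is hypothetical. Because the calculation uses sensitivity, it inherits the limitations of that index in exclusion-based designs discussed earlier.

```python
def prob_organic_given_criteria_met(sensitivity: float, specificity: float,
                                    base_rate_organic: float) -> float:
    """Post-test probability of organic disease in a patient who meets IBS criteria.

    Here sensitivity is the fraction of patients without organic disease who meet
    the criteria, and specificity is the fraction with organic disease who do not.
    """
    met_and_organic = (1 - specificity) * base_rate_organic
    met_and_functional = sensitivity * (1 - base_rate_organic)
    return met_and_organic / (met_and_organic + met_and_functional)


for base_rate in (0.05, 0.20, 0.50):
    risk = prob_organic_given_criteria_met(0.7, 0.7, base_rate)
    print(f"base rate {base_rate:.0%}: probability of organic disease if criteria met {risk:.1%}")
```

With these inputs the residual risk is roughly 2% at a 5% base rate but 30% at a 50% base rate. This is why identical test statistics provide more reassurance in primary care than in referral practice.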
The specificity of symptom-based criteria can be improved by combining them with laboratory tests, especially screening tests for inflammation and blood in stools35–37, 59–62. In these contexts, however, the laboratory tests are far more robust than the symptom criteria as predictors of who is at risk for organic disease. Incorporating patient-reported alarm symptoms into the diagnostic algorithm also yields small gains in specificity38, 40, but excluding all patients with alarm symptoms from a diagnosis of IBS is impractical because 84% of patients ultimately diagnosed with IBS have one or more alarm symptoms39.
Validating positive symptom criteria for IBS on the assumption that it is a distinct medical disorder is difficult because there is no objective marker of IBS. IBS is a theoretical construct similar to psychiatric disorders, and the “criterion validity” methods used to validate diagnostic methods in psychiatry are also appropriate for functional gastrointestinal disorders. This method relies on a converging set of studies testing the performance of the symptom-based diagnostic criteria against theoretical predictions. The symptom-based criteria for IBS are supported by (a) moderately good agreement with clinical diagnosis; (b) evidence from factor analysis for clustering of symptoms consistent with the Rome criteria; (c) evidence that among patients with a medical diagnosis of IBS, those meeting Rome II criteria have worse health outcomes than those who do not; and (d) evidence that endorsement of the Rome II criteria is not explained by the psychological trait of somatization.
Future research on the validity of the Rome III criteria should pursue both of these validation models in parallel: On the one hand, we should continue to seek ways of identifying patients at low risk of organic disease by combining symptom-based criteria with inexpensive laboratory tests available to the primary care physician as well as medical history items such as age, family history, symptomatic exacerbation following meals, and duration of illness. On the other hand, we should also pursue criterion validation of positive symptom criteria for IBS against new targets, which may include a variety of biomarkers such as visceral hyperalgesia, mucosal inflammatory markers, genetic factors, alterations in microflora and a variety of other possibilities yet to be determined. In addition, we need more data on the frequency of occurrence of symptoms such as abdominal pain that are used to diagnose functional gastrointestinal disorders; these symptoms also occur in healthy individuals, and empirical data for defining clinically meaningful deviations from the healthy range are needed to better inform symptom-based diagnostic criteria.
Validation of symptom-based criteria is a process; it is not etched in stone and will change as new data on the underlying pathophysiology emerge. Furthermore, the Rome Foundation is committed to supporting ongoing validation studies of symptom-based criteria in order ultimately to help our patients with these disorders.
Supported by grants R01 DK031369 and R24 DK067674