Barrett’s esophagus (BE) is the only known precursor to esophageal adenocarcinoma. As definitive diagnosis requires costly endoscopic investigation, we sought to develop a risk prediction model to aid in deciding which patients with gastroesophageal reflux (GER) symptoms to refer for endoscopic screening for BE.
The study included data from patients with incident nondysplastic BE (n=285) and endoscopy control patients with esophageal inflammatory changes without BE (“inflammation controls”, n=313). We used two phases of stepwise backwards logistic regression to identify the important predictors for BE in men and women separately: firstly including all significant covariates from univariate analyses; then fitting non-significant covariates from univariate analyses to identify those effects detectable only after adjusting for other factors. The final model pooled these predictors and was externally validated for discrimination and calibration using data from a BE study conducted in western Washington State, USA.
The final risk model included terms for age, sex, smoking status, body mass index, highest level of education, and frequency of use of acid suppressant medications (area under the ROC curve, 0.70, 95%CI 0.66–0.74). The model had moderate discrimination in the external dataset (area under the ROC curve, 0.61, 95%CI 0.56–0.66). The model was well calibrated (Hosmer-Lemeshow test, p=0.75), with predicted probability and observed risk highly correlated.
The prediction model performed reasonably well and has the potential to be an effective and useful clinical tool in selecting patients with GER symptoms to refer for endoscopic screening for BE.
Esophageal cancer is the sixth leading cause of cancer-related mortality worldwide; in 2008, there were an estimated 482,000 new cases and 407,000 deaths from the disease (1). In the UK, the US and most other Western countries, esophageal adenocarcinoma (EAC) is the predominant histological subtype of esophageal cancer (2). From 1975 to 2001, the rate of increase in incidence of EAC was among the highest for any cancer in Western countries, with incidence increasing by a factor of 6 in the US (3). In 2007, the UK reported the world’s highest rate of EAC incidence (approximately 5 per 100,000) (4). Survival from EAC is poor: most patients present with metastatic disease, and fewer than 20% survive for 5 years (5).
Almost all cases of EAC are thought to arise from underlying Barrett’s esophagus (BE), progressing through low- to high-grade dysplasia (6); however, the risk of progression from BE to EAC is unclear. Initial studies suggested that the absolute rate of progression was between 5 and 10 per 1000 person-years (7), but a recent study in Denmark reported a much lower estimate of 1.2 per 1000 person-years (8). It is widely accepted that gastroesophageal reflux (GER) is the principal underlying cause of BE (9), although a number of large population-based studies have also identified other factors associated with BE, including central obesity (10, 11) and smoking (12).
The identification of patients with BE holds the greatest potential, at least in theory, for reducing mortality from EAC. The challenge, however, remains to determine which patients presenting with GER symptoms are more likely to have underlying, undiagnosed BE, based on their history and symptoms. Current guidelines from both the British Society of Gastroenterology (13) and the American Gastroenterological Association (14) do not recommend endoscopic BE screening of the general GER population. The American Gastroenterological Association does recommend endoscopic BE screening for those with multiple risk factors for EAC (including age 50 years or older, male sex, white race, chronic GER, hiatal hernia, and elevated body mass index). However, these guidelines were developed from expert opinion and weak evidence and have not been validated (14, 15).
There is growing interest in developing risk prediction models to assist practitioners in identifying patients who may benefit most from investigations and interventions. Models have been developed and used for a variety of conditions ranging from cardiovascular diseases (16), to cancers of the breast (17–20), lung (21–23), prostate (24), and bladder (25), and to melanoma (26). These models have been shown to be clinically useful and reliable at the population level, providing a cost-effective approach to disease prevention and treatment. While models have been developed to predict BE in symptomatic patients, they have focused on only a restricted list of factors (e.g., age, sex and upper gastrointestinal symptoms) and did not consider phenotypic and environmental factors which may explain a high proportion of BE cases not attributable to GER (27, 28). If a comprehensive risk model could be developed, it could have the potential to influence clinical decision-making in the care of patients with GER.
We aimed to develop and validate a risk prediction model for quantifying the probability of BE in patients with frequent GER symptoms and to demonstrate the potential utility of this model in the clinical setting for selection of patients for endoscopic screening for the presence of BE.
To derive the prediction model, we used data from participants in the Study of Digestive Health, a population-based study of BE and reflux-related conditions conducted in Brisbane, Australia. The study population has been described in full previously (12). We then validated the model in a separate case-control study of BE conducted in western Washington State, USA (29).
In brief, eligible cases for the Study of Digestive Health were residents of metropolitan Brisbane aged 18–79 years with a new (incident) histologically confirmed diagnosis of BE (for nondysplastic BE cases) or dysplasia (for dysplastic BE cases) between 1 February 2003 and 30 June 2006. BE was defined as the presence of specialized intestinal metaplasia (that is, columnar epithelium with goblet cells) in an esophageal biopsy taken from the tubular esophagus by upper gastrointestinal endoscopy, irrespective of the length of involvement. BE patients were prospectively identified from the private and public pathology laboratories servicing metropolitan Brisbane. Of 1714 patients with presumptive BE, we gained permission to contact 1096 (64% response rate). Of these, 614 were ineligible (487 were ‘prevalent cases’ with a previous BE diagnosis, 86 had only intestinal metaplasia of the gastroesophageal junction, 30 invalid address, 6 too old, 5 other) and 89 were excluded from the study (3 too ill, 5 unable to complete an English language questionnaire, 5 unable to be contacted, 76 failed to return a completed questionnaire). Thus, 393 BE cases (285 nondysplastic, 108 dysplastic) completed the study.
Two separate control groups were recruited for the Study of Digestive Health: ‘inflammation controls’ (i.e., patients who underwent endoscopy but for whom the histology report identified only esophageal inflammatory changes consistent with GER and no other pathological or macroscopic changes including no evidence of BE), and population controls. For the purposes of developing risk prediction models to discriminate patients with BE from GERD patients without BE, the present analyses compared the BE cases to the inflammation controls only. In total, 706 of 1354 patients approached as inflammation controls gave permission to be contacted (52% response rate). Of these, 57 refused to participate, 317 were ineligible (304 previous diagnosis of inflammation of the esophagus, 11 invalid address, 1 too old, 1 other), 19 were excluded (6 uncontactable, 1 psychological problems, 1 too ill, 3 unable to complete an English language questionnaire, 8 other), and 313 completed the study.
Both cases and inflammation controls were ineligible for the study if they had a previous diagnosis of BE or cancer. Approval to undertake the Study of Digestive Health was obtained from the human research ethics committees of the Queensland Institute of Medical Research and all participating hospitals, and written informed consent was obtained from all participants.
Candidate predictor variables were selected a priori from the literature and practitioner input, and included: age (years); highest level of education (school only, technical college/diploma, university); body mass index (BMI) 1 year prior to diagnosis (<25, 25–29.9, ≥30 kg/m2); smoking status (never smoker, ex-smoker, current smoker); cumulative smoking exposure (never smoker, 0–29.9, ≥30 pack-years); smoking duration (never smoker, <15, 15–24, 25–34, ≥35 years); average lifetime alcohol consumption (non-drinker, <1, 1–6, 7–20, ≥21 drinks/week); frequency of use of acid suppressant medications (including proton pump inhibitors and H2-receptor antagonists) in the past 5 years (never, ever); frequency of use of aspirin and other non-steroidal anti-inflammatory drugs (NSAIDs) in the past 5 years (never, less than weekly, at least weekly); physical activity levels (low, medium, high); average fruit (<2, ≥2 serves/day) and vegetable (<3, ≥3 serves/day) consumption; and number of co-morbidities (categories defined by Charlson et al. (30)). A standardized health and lifestyle questionnaire was used to collect detailed information on these variables for each participant. Most items in the questionnaire showed excellent repeatability after four months (31). Furthermore, we conducted a follow-up interview with the BE cases up to seven years after diagnosis and found similar self-reports of key characteristics (κ ranged from 0.65 to 0.80), suggesting good reproducibility for these measures. We imputed data for the small proportion of participants with missing values. We compared the model fitted to the imputed data with a complete-case analysis and found similar model coefficients, but more precise estimates with the imputed data.
The prediction model was externally validated using data from a community-based case-control study of BE conducted in western Washington State, USA (29). BE cases were defined as residents aged 20–80 years newly diagnosed with BE (i.e., specialized intestinal metaplasia in an esophageal biopsy). Of the 208 patients diagnosed with BE, 193 (92.8%) were successfully interviewed. We subsequently excluded 18 cases who were simultaneously diagnosed with esophageal adenocarcinoma (n=2) and/or dysplasia (n=16) from the validation analysis. GERD controls were a random sample of patients (~50%) who underwent endoscopy for reflux symptoms, but who were biopsy-proven negative for BE. Of the 463 patients selected to be GERD controls, 418 (90.8%) were successfully interviewed and were included in the validation analysis.
We used basic descriptive statistics to characterize the study populations. For comparisons between BE cases and inflammation controls, we used the χ2 test for categorical variables and the Student’s t-test for continuous variables. Statistical computations were performed using SAS software (version 9.2; SAS Institute, Cary, NC), and all tests for statistical significance were two-sided at α = 0.05.
We developed separate risk models to predict nondysplastic BE and dysplastic BE in patients with GER symptoms. However, we were unable to externally validate the results for dysplastic BE and report only the nondysplastic BE model here (see supplementary material for the dysplastic BE risk model). As there is a sex difference in the incidence of BE and as some risk factors appear to have different associations with BE in men and women, we identified the important sex-specific predictors for BE and then pooled these in an overall risk model which included a term for sex. The predictors were identified using two phases of stepwise backwards logistic regression. In the first phase, we included in the multivariate model those variables that were statistically significantly associated with BE at the 5% level in univariate analyses and performed a backwards stepwise regression procedure, whereby factors that lost their significance in the multivariate analysis were dropped. In the second phase, factors not significant in the univariate analyses were subsequently fitted to the multivariate model to identify effects detectable only after adjusting for the major risk factors. There was no evidence of multicollinearity among the final predictor variables (all variance inflation factors <10), and no interaction terms were statistically significant (all p-values for the type III analysis of effects for interaction terms were >0.10).
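The two-phase selection procedure described above can be sketched as follows. This is a minimal, standard-library Python illustration under stated assumptions, not the authors' SAS analysis: it fits logistic regression by Newton-Raphson, uses two-sided Wald p-values for individual coefficients (rather than type III tests), treats each candidate as a single numeric column (the study's categorical predictors would each expand to several indicator columns), and reads phase 2 as adding each univariately non-significant candidate in turn and keeping it if p < 0.05. The function names and the toy data are hypothetical.

```python
import math


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        d = m[c][c]
        m[c] = [v / d for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [v - f * pv for v, pv in zip(m[r], m[c])]
    return [row[n:] for row in m]


def fit_logistic(X, y, iters=25):
    """Newton-Raphson logistic regression; each row of X starts with 1.0 (intercept).
    Returns coefficients and standard errors (sqrt of inverse-information diagonal)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for row, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    info[j][k] += mu * (1.0 - mu) * row[j] * row[k]
        cov = invert(info)
        beta = [b + sum(cov[j][k] * grad[k] for k in range(p)) for j, b in enumerate(beta)]
    return beta, [math.sqrt(cov[j][j]) for j in range(p)]


def pvalues(columns, y, names):
    """Fit a model on `names` and return {name: two-sided Wald p-value}."""
    X = [[1.0] + [columns[n][i] for n in names] for i in range(len(y))]
    beta, se = fit_logistic(X, y)
    return {n: math.erfc(abs(beta[j + 1] / se[j + 1]) / math.sqrt(2.0))
            for j, n in enumerate(names)}


def two_phase_selection(columns, y, alpha=0.05):
    # Univariate screen: test each candidate on its own.
    uni = {n: pvalues(columns, y, [n])[n] for n in columns}
    kept = [n for n in columns if uni[n] < alpha]
    # Phase 1: backward elimination, dropping the least significant term each round.
    while kept:
        p = pvalues(columns, y, kept)
        worst = max(kept, key=p.get)
        if p[worst] < alpha:
            break
        kept.remove(worst)
    # Phase 2: refit each univariately non-significant candidate with the
    # phase-1 model and keep it if it is significant after adjustment.
    for n in columns:
        if uni[n] >= alpha and pvalues(columns, y, kept + [n])[n] < alpha:
            kept.append(n)
    return kept


# Hypothetical toy data: binary "a" raises the odds of BE; "z" is balanced noise.
columns, y = {"a": [], "z": []}, []
for a, z, cases, controls in [(1, 0, 35, 15), (1, 1, 35, 15), (0, 0, 15, 35), (0, 1, 15, 35)]:
    for outcome, count in ((1, cases), (0, controls)):
        columns["a"] += [a] * count
        columns["z"] += [z] * count
        y += [outcome] * count
print(two_phase_selection(columns, y))  # ['a']
```

In this toy example "z" is exactly balanced within every level of "a", so its adjusted coefficient is zero and it is never retained; a real analysis would rely on an established statistics package rather than this hand-rolled fitter.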
The accuracy of the model was assessed using measures of discrimination and calibration (32). We evaluated predictive discrimination using the area under the receiver operating characteristic curve (AUC; also known as the c-statistic) and its 95% confidence interval (95% CI). The AUC can be interpreted as the probability that the model assigns a higher predicted probability of BE to a randomly chosen patient with BE than to a randomly chosen patient without BE; more simply, it measures the ability of the model to separate cases from inflammation controls. An AUC of 0.5 indicates discrimination no better than chance, whereas an AUC of 1.0 indicates perfect discrimination. The second measure was calibration, which compares predicted probabilities with observed risk. A model is well calibrated when the average predicted risk within each decile of predicted risk matches the observed proportion of cases. We evaluated calibration using the Hosmer-Lemeshow goodness-of-fit statistic (33), for which a high p-value indicates no evidence of poor calibration. Calibration curves were also plotted to illustrate the model’s fit across the range of predicted risk for BE compared with the observed outcome.
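Both accuracy measures can be computed directly from predicted probabilities and observed outcomes. The sketch below is a minimal standard-library Python illustration, not the SAS procedure used in the study: `auc` counts concordant case-control pairs (ties count one half), which is exactly the probability interpretation given above, and `hosmer_lemeshow` splits patients into risk-ordered groups of roughly equal size (deciles by default) and refers the statistic to a chi-square distribution with groups minus 2 degrees of freedom. The function names are hypothetical.

```python
import math


def auc(probs, outcomes):
    """c-statistic: share of (case, control) pairs in which the case has the
    higher predicted probability; ties count one half."""
    cases = [p for p, y in zip(probs, outcomes) if y == 1]
    controls = [p for p, y in zip(probs, outcomes) if y == 0]
    score = 0.0
    for pc in cases:
        for pn in controls:
            score += 1.0 if pc > pn else 0.5 if pc == pn else 0.0
    return score / (len(cases) * len(controls))


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    h = x / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))
    tail = math.erfc(math.sqrt(h))
    return tail + math.exp(-h) * sum(h ** (i - 0.5) / math.gamma(i + 0.5)
                                     for i in range(1, (df - 1) // 2 + 1))


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic over risk-ordered groups; returns (stat, p)."""
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        # (O - E)^2 / (N * pbar * (1 - pbar)), with pbar = E / N for the group.
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / len(chunk)))
    return stat, chi2_sf(stat, groups - 2)


print(auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

In the example, three of the four case-control pairs are correctly ordered, giving an AUC of 0.75. The confidence interval for the AUC is not computed here.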
Epidemiologic data were available for the derivation analysis from 285 nondysplastic BE patients (cases) and 313 inflammation controls (Table 1). BE cases were more likely to be men (64% vs. 47%, p<0.001) and were, on average, 5 years older than inflammation controls. The majority (96%–97%) of cases and inflammation controls reported being Caucasian (p=0.50).
All candidate predictors and their univariate associations with BE are presented in Table 2. In the univariate analyses among men (181 cases, 147 inflammation controls), highest level of education, BMI, tobacco smoking, and frequency of use of acid suppressant medications were all statistically significantly associated with BE risk. In women (104 cases, 166 inflammation controls), only highest level of education and frequency of use of acid suppressant medications were significantly associated with BE risk in the univariate analyses. Alcohol consumption, frequency of use of NSAIDs, physical activity, fruit and vegetable consumption, and the presence of co-morbidities were not statistically significantly associated with BE for men or women in the univariate analyses.
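As a worked check, the sex comparison reported above (64% vs. 47% men, p<0.001) can be reproduced from the sex-specific counts (181 men and 104 women among cases; 147 men and 166 women among inflammation controls). This minimal sketch assumes a Pearson χ2 test without continuity correction, since the text does not state which variant was used; the function name is hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].
    Returns (statistic, p-value with 1 degree of freedom)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Rows: BE cases, inflammation controls; columns: men, women.
stat, p = pearson_chi2_2x2(181, 104, 147, 166)
print(f"chi2 = {stat:.2f}")  # chi2 = 16.49
print(p < 0.001)  # True
print(f"men: {181 / 285:.0%} of cases vs {147 / 313:.0%} of controls")  # 64% vs 47%
```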
The variables retained in the final multivariate risk model included age, sex, smoking status, BMI, highest level of education and frequency of use of acid suppressant medications (Table 3). There were no statistically significant interactions between sex and the other variables in the final model.
The risk prediction model for nondysplastic BE had good discrimination, with an AUC of 0.70 (95%CI 0.66–0.74) in the development dataset. Discrimination in the validation dataset was more moderate, with an AUC of 0.61 (95%CI 0.56–0.66). The model showed good calibration by the Hosmer-Lemeshow goodness-of-fit test (p=0.75), and the calibration curve (Figure 1) shows good agreement between predicted probabilities and observed BE risk across the observed range of risk.
To assess the potential effects of using the prediction model to guide referral for endoscopic screening for BE, we calculated the proportion of patients that would be referred for endoscopy at different probability thresholds for BE (Table 4). The first row gives the scenario of referring every patient with GER symptoms for endoscopy and therefore identifying all patients with GER symptoms who have BE (that is, sensitivity is 100%). If patients are referred for endoscopy only if their predicted probability is, for example, 50% or more, the proportion of patients referred for endoscopy will be reduced to 46%. At that threshold however, about 41% of BE cases (sensitivity, 59%) will not be referred for endoscopy. As the threshold increases, the number of referrals is reduced; as a consequence, however, the number of patients with BE who will not be referred for endoscopy increases.
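The calculation behind a threshold table of this kind can be sketched as follows. For each candidate cut point, the share of all patients whose predicted probability meets the threshold gives the proportion referred, and the share of BE cases meeting it gives the sensitivity. This is a minimal Python sketch: the function name is hypothetical and the probabilities are made-up illustrative values, not study data, so the output does not reproduce Table 4.

```python
def referral_table(probs, outcomes, thresholds):
    """For each threshold, return (threshold, proportion referred, sensitivity).
    A patient is referred when the predicted probability is at or above the threshold."""
    n_cases = sum(outcomes)
    rows = []
    for t in thresholds:
        referred = [y for p, y in zip(probs, outcomes) if p >= t]
        rows.append((t, len(referred) / len(probs), sum(referred) / n_cases))
    return rows


# Hypothetical predicted probabilities (1 = BE case, 0 = inflammation control).
probs = [0.82, 0.71, 0.64, 0.55, 0.48, 0.41, 0.37, 0.29, 0.22, 0.15]
outcomes = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
for t, referred, sens in referral_table(probs, outcomes, [0.0, 0.3, 0.5, 0.7]):
    print(f"threshold {t:.0%}: refer {referred:.0%}, "
          f"sensitivity {sens:.0%}, cases missed {1 - sens:.0%}")
```

A threshold of 0% reproduces the refer-everyone scenario (100% referred, 100% sensitivity). In this toy data, raising the threshold from 30% to 50% cuts referrals from 70% to 40% but lowers sensitivity from 80% to 60%, the same trade-off described above.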
In this study, we developed and externally validated a clinical risk prediction model for BE based on existing data from a large, population-based study. We used a rigorous statistical approach to determine the most important panel of risk factors to predict the presence of BE in patients with GER symptoms. As recent epidemiological studies have shown that some risk factors, notably obesity, appear to have different associations with BE in men and women, we identified sex-specific predictors and then pooled these in an overall model. The final risk model included terms for age, sex, smoking status, BMI, highest level of education and frequency of use of acid suppressant medications. External validation of the model showed that it performed moderately well in discriminating between patients with nondysplastic BE complicating their GER (cases) and those with no BE (inflammation controls), and the predicted risk correlated well with the observed risk.
As BE is assumed to be an intermediate step in the development of EAC, screening to identify people with BE may, at least in theory, be an effective strategy to prevent progression to cancer. Given the high prevalence of GER symptoms in the population and the low prevalence of BE in these patients, endoscopic screening for BE in all patients with GER symptoms is not recommended in current international guidelines (13, 14, 34). The American Gastroenterological Association guideline recommends screening in selected populations with multiple risk factors for EAC (age 50 years or older, male sex, white race, chronic GER, hiatal hernia, and elevated BMI) (14). The evidence base underpinning these guidelines is not strong, however, and adherence to the guidelines is likely to be incomplete. Data for Australia are limited; however, a New Zealand study showed that approximately 50% of indications for endoscopy in 2003 were for heartburn or dyspepsia (i.e., to exclude BE/EAC), significantly more than in 1997 (35). In healthcare systems throughout the world, there is an increasing need for evidence-based strategies, including an effective means of risk stratification for endoscopic BE screening among patients with GER symptoms.
Risk prediction can be used in clinical settings to stratify individuals into homogeneous risk groups. Risk prediction models are used in public health to quantify the probability of disease based on a combination of risk factors (36). So far, risk prediction approaches have been used extensively for cardiovascular diseases (16), and more recently there has been a focus on deriving cancer risk prediction models (17–26). These models can complement clinical assessment, and they also make decision-making more uniform across different centers by reducing reliance on any individual clinician’s personal experience (37).
Previous efforts to develop a risk model for BE focused on identifying BE among GER patients using gastrointestinal symptoms (27, 28). While these models performed well (AUCs > 0.70), they have not been externally validated. Our study used similar methods to derive a risk prediction model for BE using phenotypic and environmental risk factors, and tested its performance in an external population. The variables included in the model are all important, though not necessarily causal, correlates of BE and are supported by published findings from our own and other case-control and cohort studies of BE (10, 12, 38–41). Furthermore, to encourage generalizability, we emphasized risk factors for which information can be obtained by practitioners in the office setting during routine healthcare. Importantly, the discriminatory accuracy of our model (AUC=0.61 in the external validation dataset) is comparable to that of established cancer risk prediction models, such as the Gail (42) model for breast cancer risk (0.58) (43), and the LLP (22), Spitz (23), and Bach (21) lung cancer risk models (0.69, 0.69, and 0.62, respectively) (44).
Applying this model to all patients with GER symptoms currently being considered for endoscopy, together with a predefined risk threshold for referral, could reduce the number of unnecessary endoscopies performed to exclude BE. The model also makes explicit the proportion of BE cases who would be missed because their predicted risk lies below the threshold. Because of the case-control study design, we were unable to determine the positive and negative predictive values of the model precisely. However, if we assume a 5% prevalence of BE among persons with GER symptoms, our best estimate is that 5–17% of persons referred for endoscopy will have BE (depending on the cut point chosen for referral) and 95–100% of persons not referred will not have BE. Assuming a 15% prevalence, 15–41% of persons referred for endoscopy will have BE and 85–100% of persons not referred will not have BE. In general, choosing an acceptable threshold involves a trade-off between sensitivity and specificity. In screening for a lethal cancer, for example, high sensitivity is desirable, whereas for less severe diseases, lower sensitivity can be tolerated. For our BE model, because the absolute risk of progression to cancer in BE patients is low (8), a threshold at which fewer investigations are performed at the cost of missing more BE cases (i.e., higher specificity and lower sensitivity) may be desirable.
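The predictive-value estimates above follow from Bayes' theorem. With sensitivity Se, specificity Sp and an assumed prevalence π, PPV = Se·π / (Se·π + (1 - Sp)(1 - π)) and NPV = Sp(1 - π) / (Sp(1 - π) + (1 - Se)π). The minimal Python sketch below applies these formulas; the function name is hypothetical. The sensitivity of 0.59 is the value reported for the 50% threshold, while the specificity of 0.65 is a hypothetical placeholder, because the text does not report specificity, so the printed values are illustrative only.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' theorem."""
    tp = sensitivity * prevalence              # referred and has BE
    fp = (1 - specificity) * (1 - prevalence)  # referred, no BE
    tn = specificity * (1 - prevalence)        # not referred, no BE
    fn = (1 - sensitivity) * prevalence        # not referred but has BE
    return tp / (tp + fp), tn / (tn + fn)


# Sensitivity 0.59 (50% threshold); specificity 0.65 is a hypothetical value.
for prevalence in (0.05, 0.15):
    ppv, npv = predictive_values(0.59, 0.65, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

With these inputs, the PPV is about 8% and the NPV about 97% at 5% prevalence, and about 23% and 90% at 15% prevalence. When every patient is referred (sensitivity 100%), the PPV equals the prevalence and the NPV is 100%, which matches the ends of the ranges quoted above.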
Our study had strengths and limitations. The large samples of patients newly diagnosed with BE in the two settings were recruited prospectively, and comparable, consistent and standardized criteria were used for histologic and endoscopic definitions of BE. Both the derivation sample and the validation sample only included patients who were selected for endoscopy; the large (but unknown) proportion of patients with GER symptoms not referred for screening had already been triaged away from endoscopy by clinicians using their own internal algorithms. Presumably, the clinicians had decided that those patients were at such low risk of significant pathology that there was no net benefit from undergoing endoscopic investigation. As such, it is likely that had those low risk patients been included in the two samples, our prediction models would have performed even better. While our modeling assumes that endoscopy is performed in the setting of GER symptoms solely to exclude BE, endoscopy may be undertaken for other indications in this clinical setting. If so, then this would tend to attenuate the predictive value of the models we have derived, since those patients being referred for other indications would presumably be at lower risk of BE than those being referred to confirm the clinical diagnosis.
A limitation of the Australian study was the relatively low participation rate, raising concerns about possible selection bias among cases and controls. Because BE cases and inflammation controls were sampled from the same population, navigated the same clinical pathways and were recruited using identical methods, it is unlikely that systematically biased selection of either group explains our findings. Moreover, BE cases and inflammation controls were not informed about the hypotheses being tested, so while biased recall of non-reflux exposures is possible, we consider it unlikely to account for our observations. Although there were 108 dysplastic BE cases in our development dataset, we were unable to obtain a validation dataset with dysplastic BE cases; the best available estimate of the model’s performance for predicting BE with dysplasia is an AUC of 0.87 (95%CI 0.83–0.91) in the development dataset. Central obesity has recently been found to be more strongly associated with BE than BMI; however, measures of central obesity (e.g., waist-to-hip ratio) were not collected for this study, and replacing BMI with such measures may improve predictive accuracy. Finally, our sample was predominantly white (~97%), and thus the models may not be applicable to other ethnic groups.
This parsimonious model could, however, be considered a starting point for further development, as a number of risk factors were not included and genetic information may also be important in predicting risk of BE. Including other environmental risk factors and extending the model to incorporate biomarkers may further improve performance. However, a recent study has shown that breast cancer risk prediction does not improve significantly when genetic information is included in the risk model (45).
In summary, we have derived and externally validated a risk prediction model which estimates the likelihood of undiagnosed BE in patients with GER symptoms being considered for upper gastrointestinal endoscopy. The prediction model has the potential to be a useful tool in the clinical setting for decisions regarding investigation and treatment of patients with GER.
APT is supported by an Australian Postgraduate Award (University of Queensland) and the Cancer Council NSW STREP grant 08-04. NP and DCW are supported by Fellowships from the National Health and Medical Research Council of Australia and Australian Research Council, respectively.
We gratefully acknowledge the cooperation of the following institutions:
Sullivan and Nicolaides Pathology (Brisbane); Queensland Medical Laboratory (Brisbane); Queensland Health Pathology Services (Brisbane). We also acknowledge the contribution of the study nurses and research assistants and would like to thank all of the people who participated in the study.
Funding for the Study of Digestive Health (5 RO1 CA001833-02) and the validation study (R01 CA072866, K05 CA124911, R01 CA136725) was received from the National Cancer Institute. The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of the National Cancer Institute.
Queensland Institute of Medical Research, Brisbane, Australia: David C Whiteman MBBS, PhD; Adele C Green MBBS, PhD; Nicholas K Hayward PhD; Peter G Parsons PhD; Sandra J Pavey PhD, David M Purdie PhD; Penelope M Webb DPhil.
University of Queensland, Brisbane, Australia: David Gotley FRACS; Mark Smithers FRACS.
The University of Adelaide, Adelaide, Australia: Glyn G Jamieson FRACS.
Flinders University, Adelaide, Australia: Paul Drew PhD; David I Watson FRACS.
Envoi Pathology, Brisbane, Australia: Andrew Clouston PhD, FRCPA.
Project Manager: Suzanne O’Brien (QIMR); Data Manager: Troy Sadkowsky (QIMR);
Research Nurses: Andrea McMurtrie, Linda Terry, Michael Connard, Lea Jackman, Susan Perry, Marcia Davis.
Ian Brown (Envoi Pathology), Neal Walker (Envoi Pathology).
Disclosure of potential conflicts of interest:
The authors disclose no conflicts.
Author contributions: APT was responsible for conception and design, analysis and interpretation of data, and drafting and revising the manuscript critically for important intellectual and statistical content. BK and DCW contributed to the conception and design, interpretation of data and critical revision of the manuscript for important intellectual content. NP assisted in interpretation of data and revision of the statistical content. TLV contributed to conception and design of the validation study, and interpretation of data. All authors read and approved the final version of the manuscript.