Postoperative pancreatic fistula remains a major complication after pancreatic surgery, despite improvements in surgical technique and perioperative management. We sought to systematically review and critically assess the conduct and reporting of methods used to develop risk prediction models for postoperative pancreatic fistula. We conducted a systematic search of the PubMed and EMBASE databases to identify articles published before January 1, 2015, that described the development of models to predict the risk of postoperative pancreatic fistula. For each study, we extracted information on model development, including study design, sample size and number of events, definition of postoperative pancreatic fistula, risk predictor selection, missing data, model-building strategies, and model performance. Seven studies, each developing one risk prediction model, were included. In three studies (43 %), the number of events per variable was less than 10. The number of candidate risk predictors ranged from 9 to 32. Five studies (71 %) reported using univariate screening, a practice not recommended for building multivariable models, to reduce the number of risk predictors. Six risk prediction models (86 %) were developed by categorizing all continuous risk predictors. The treatment and handling of missing data were not mentioned in any of the studies. We found the use of inappropriate methods that could undermine model development, including univariate pre-screening of variables, categorization of continuous risk predictors, and inadequate model validation. The use of inappropriate methods affects the reliability and accuracy of the probability estimates for predicting postoperative pancreatic fistula.
Improvements in both surgical technique and perioperative management have reduced surgical morbidity and mortality after resective pancreatic surgery in high-volume centers. However, postoperative pancreatic fistula (POPF) is still regarded as the most relevant complication after resective pancreatic surgery (distal pancreatectomy and pancreatoduodenectomy) because it potentially leads to deleterious secondary complications, increased health care costs, and prolonged hospital stay [1, 2].
Studies using the strict definition applied by the International Study Group of Pancreatic Surgery (ISGPS) report that POPF rates range from 3 % after pancreatic head resection [2, 4] to as high as 30 % following distal pancreatectomy [5–7]. Many risk factors for POPF have been identified, such as gender, body mass index (BMI) [9, 10], status of the pancreatic parenchyma, diameter of the main pancreatic duct (MPD), and the underlying disease. These risk factors have prompted the use of specific techniques or perioperative precautions to minimize POPF rates [5, 14, 15]. However, it remains difficult to integrate the various risk factors into an accurate prediction of POPF. Recently, multiple studies have constructed perioperative scoring systems to predict POPF. Ideally, a predictive model should be based on easily available perioperative parameters, allowing the surgeon to adopt strategies on an individual basis.
The aim of this study was to review perioperative predictive models for POPF with respect to their methodology and reporting quality, and to inform and prompt further improvements in model building.
We attempted to identify all observational studies that developed predictive scoring systems for postoperative pancreatic fistula and were indexed in the PubMed and EMBASE databases before January 1, 2015.
The search strategy for PubMed was “postoperative pancreatic fistula” and (“risk prediction model” or “predictive model” or “predictive equation” or “prediction model” or “risk calculator” or “prediction rule” or “risk model” or “risk factor” or “scoring system” or “statistical model” or “Cox model” or “multivariable”) not (review [publication type] or bibliography [publication type] or editorial [publication type] or letter [publication type] or meta-analysis [publication type] or news [publication type]).
The search strategy for EMBASE was risk prediction model or predictive model or predictive equation or prediction model or risk calculator or prediction rule or risk model or risk factor or scoring system or statistical model or Cox model or multivariable and postoperative pancreatic fistula not letter not review not editorial not conference not book.
Additionally, the references of the primary and review articles were examined to identify publications not retrieved by the electronic searches. Finally, we attempted to identify any forthcoming or unpublished material relevant to this topic through clinical trial registries. Titles and abstracts of all citations were screened independently by two reviewers. Discrepancies between the reviewers’ opinions were resolved by a third reviewer. Articles were restricted to the English-language literature.
A study was included in the systematic review if it provided the perioperative predictive scoring system or model for postoperative pancreatic fistula. Postoperative pancreatic fistula was diagnosed according to the International Study Group on Pancreatic Fistula (ISGPF) definition.
Articles were excluded if (1) they included only validation of a preexisting risk prediction model (that is, the article did not develop a model), (2) participants were children, and (3) the authors developed a genetic risk prediction model.
Data were extracted by two reviewers and checked by a third. Data items extracted for this article included study design, sample size and number of events, outcome definition, risk predictor selection and coding, missing data, model-building strategies, and aspects of model performance. The data extraction form for this article was based largely on two previous reviews of prognostic models of cancer and can be obtained on request from the first author. Any discrepancy was resolved by discussion with a fourth reviewer and reanalysis of the publication.
We reported our systematic review in accordance with the PRISMA guideline, with the exception of items relating to meta-analysis, as our study includes no formal meta-analysis.
The search string retrieved 199 articles in PubMed and 42 articles in EMBASE; after removing duplicates, our database search yielded 216 articles (see Fig. 1). Seven articles met our inclusion criteria. In total, seven studies published between September 2009 and October 2011 were eligible for review (Table 1) [8, 10–13, 17, 18]. Postoperative pancreatic fistula predicted by the models was defined according to the International Study Group for Pancreatic Fistula. One study described the development of a predictive model for POPF based on perioperative risk factors. Three studies derived predictive models for POPF based on preoperative risk factors [8, 13, 17]. Intraoperative risk factors were used in one study to derive a predictive model of POPF. Two studies built predictive models for POPF based on postoperative risk factors [11, 18].
The number of participants included in developing each risk prediction model was clearly reported in all articles. The median number of participants included in model development was 244 (interquartile range (IQR) 50 to 387). The median number of events of postoperative pancreatic fistula was 69 (IQR 25 to 197). Five studies reported the number of events by postoperative pancreatic fistula grade based on the ISGPF [5, 8, 11, 12, 18]; the median number of events was 47 for grade A (IQR 6 to 76), 17 for grade B (IQR 10 to 94), and 8 for grade C (IQR 5 to 25). Two studies did not report the number of events by grade [10, 17].
A median of 11 risk predictors (IQR 10 to 16, range 9 to 32) were considered as candidates. The rationale or references for including candidate risk predictors were provided in all seven studies [8, 10–13, 17, 18]. The final reported prediction models included a median of four risk predictors (IQR 3 to 5, range 3 to 10). In total, 18 different risk predictors appeared in the final risk prediction models (see Fig. 2). The most commonly included were fatty pancreas (n=3), pancreatitis (n=2), MPD index (n=2), pancreatic fibrosis (n=2), pancreatic duct size (n=2), and diagnosis of pancreatic cancer (n=2). The other 12 risk predictors appeared only once in the final risk prediction models.
Three risk prediction models (43 %) were developed with fewer than 10 events per variable [10, 12, 13]. Overall, the median number of events per variable was 10 (IQR 7 to 39, range 5 to 175).
One prediction model (14 %) retained a continuous risk predictor as continuous, while six risk prediction models (86 %) dichotomized or categorized all continuous risk predictors [8, 10–13, 18].
None of the seven studies mentioned missing data (Table 2); it can thus only be assumed that a complete case analysis was conducted or that data on all risk predictors (including candidate risk predictors) were available for all participants.
Five studies (71 %) reported using univariate screening to reduce the number of risk predictors [8, 10, 11, 13, 18], while it was unclear how the risk predictors were reduced prior to development of the risk prediction model in two studies (29 %) [12, 17]. Two studies (29 %) included all risk predictors in the multivariable analysis [11, 17].
Two studies (29 %) reported using automatic variable selection procedures to derive the final multivariable model [8, 13] (Table 2). One study reported using stepwise backward elimination, and the other reported using forward stepwise selection.
All seven studies clearly stated the type of model used to derive the prediction model. The final models were based on logistic regression in four articles [8, 10, 13, 18], linear regression in one article, and univariate analysis in one article.
Two studies (29 %) split their cohort into development and validation cohorts [8, 13]. All studies conducted and published an internal validation of their risk prediction models within the same article and used two or more data sets in an attempt to demonstrate the internal validity of the risk prediction model [8, 10–13, 17, 18].
The type of performance measure assessed varied across the risk prediction models (Table 3). Six studies (86 %) reported C-statistics, calculated on internal validation data sets [8, 10–12, 17, 18]. One study used Spearman’s rank correlation to validate its risk prediction model. Only two studies (29 %) assessed how well the predicted risks compared with the observed risks (calibration); the investigators in both chose the Hosmer-Lemeshow test [10, 11].
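As a minimal sketch on synthetic numbers (not data from any reviewed study), the C-statistic is the probability that a randomly chosen patient with POPF receives a higher predicted risk than a randomly chosen patient without it:

```python
def c_statistic(predicted_risks, outcomes):
    """Concordance (C-statistic / AUC) over all event vs non-event pairs."""
    events = [p for p, y in zip(predicted_risks, outcomes) if y == 1]
    nonevents = [p for p, y in zip(predicted_risks, outcomes) if y == 0]
    concordant = ties = 0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1
            elif e == n:
                ties += 1
    return (concordant + 0.5 * ties) / (len(events) * len(nonevents))

# Four hypothetical patients: predicted POPF risks and observed outcomes.
print(c_statistic([0.9, 0.7, 0.4, 0.2], [1, 0, 1, 0]))  # prints 0.75
```

A value of 0.5 corresponds to chance discrimination and 1.0 to perfect discrimination; note that the C-statistic describes discrimination only and says nothing about calibration.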
Five studies (71 %) derived simplified scoring systems from their risk prediction models [8, 10–13]. One study used the combination of serum albumin level and leukocyte count on postoperative day 4 as its risk prediction model, and another used preoperative measurements of pancreatic fat by magnetic resonance imaging.
In the present article, we have highlighted the methods currently used to develop risk prediction models for POPF and the poor reporting of those methods. The quality of risk prediction models depends on study design and statistical methods. Developed models also need to provide accurate and validated estimates of the probability of POPF. Pancreatic surgeons should understand the principles of these statistical methods, choose an appropriate risk prediction model, and thereby improve individual outcomes and the cost-effectiveness of care.
When developing a risk prediction model, one of the problems researchers encounter is overfitting. Overfitting generally occurs when a model is excessively complex, such as having too many candidate risk predictors relative to the number of events. A rule of thumb is that models should be developed with 10 to 20 events per variable (EPV). Of the studies included in this review, 43 % had fewer than 10 EPV. A risk prediction model that has been overfit will generally have poor predictive performance, as it exaggerates minor fluctuations in the data. Other investigators have reported similar findings (EPV < 10) when appraising the development of multivariable risk prediction models [21, 22].
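The EPV check itself amounts to a single division. A minimal sketch, using the median counts reported in this review (69 events, 11 candidate predictors) purely for illustration:

```python
# Rule-of-thumb check for overfitting risk: events per variable (EPV).
# The inputs below are the median event and candidate-predictor counts
# reported in this review; they are illustrative, not any one study's data.
def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    return n_events / n_candidate_predictors

epv = events_per_variable(69, 11)
print(round(epv, 1))  # prints 6.3, below the 10-20 EPV rule of thumb
```

Note that the divisor should count all *candidate* predictors examined, not just those retained in the final model; dividing by the final predictor count makes an overfit model look deceptively safe.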
Another key component affecting the performance of the final risk prediction model is how continuous variables are treated: whether they are kept as continuous measurements or categorized into two or more categories. Common approaches include dichotomizing at the median value or choosing an “optimal” cutoff point that minimizes a P value. Regardless of the approach used, the practice of artificially treating a continuous risk predictor as categorical should be avoided [23, 24]; yet this is often done in the development of risk prediction models [22, 24–27]. In this review, 86 % of the studies dichotomized or categorized all or some continuous risk predictors. Dichotomizing continuous variables causes an inevitable loss of information and of power to detect real relationships, equivalent to losing a third of the data. If the predictor is exponentially distributed, the loss associated with dichotomization at the median is even larger. Continuous risk predictors (such as age) should be retained in the risk prediction model as continuous variables; if a risk predictor has a nonlinear relationship with the outcome, the use of splines or fractional polynomial functions is recommended.
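This information loss can be illustrated with a small simulation on synthetic data (a hedged sketch, not any cohort from this review): dichotomizing a predictor at its median visibly weakens its observed association with the outcome.

```python
import random
import statistics

def pearson(a, b):
    """Pearson correlation coefficient from population moments."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (statistics.pstdev(a) * statistics.pstdev(b))

random.seed(0)
x = [random.gauss(0, 1) for _ in range(2000)]      # continuous predictor
y = [xi + random.gauss(0, 1) for xi in x]          # outcome linearly related to x
cut = statistics.median(x)
x_dich = [1.0 if xi > cut else 0.0 for xi in x]    # dichotomized at the median

r_cont, r_dich = pearson(x, y), pearson(x_dich, y)
assert r_dich < r_cont  # the dichotomized predictor carries less information
```

With this setup, the correlation of the median-split version comes out noticeably smaller than that of the continuous predictor, consistent with the loss of roughly a third of the effective information described above.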
Missing data, a serious problem in studies deriving risk prediction models, is a potential source of bias when analyzing clinical data sets. There are many possible reasons for missing data (e.g., patient refusal to continue in the study, treatment failures or successes, adverse events, and patients moving away). Regardless of study design, collecting data on all risk predictors for all individuals is a difficult task. There is no universal methodological approach for handling missing data. A common approach is to exclude individuals with missing values and conduct a complete case analysis. However, a complete case analysis, in addition to discarding useful information, is not recommended, as it has been shown to yield biased results [30–32]. None of the studies in our review reported on missing data. It is advisable to report the completeness of the data so that readers can judge its quality.
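A minimal sketch of what a complete case analysis does (the field names and values are hypothetical, chosen only to mirror the kinds of predictors discussed here):

```python
# Hypothetical patient records; None marks a missing predictor value.
records = [
    {"bmi": 31.0, "duct_mm": 2.5, "popf": 1},
    {"bmi": None, "duct_mm": 3.0, "popf": 0},   # missing BMI -> row dropped
    {"bmi": 24.0, "duct_mm": None, "popf": 1},  # missing duct size -> dropped
    {"bmi": 27.5, "duct_mm": 4.0, "popf": 0},
]

# Complete case analysis keeps only rows with no missing values,
# shrinking both the sample size and the number of events.
complete = [r for r in records if all(v is not None for v in r.values())]
n_dropped = len(records) - len(complete)
print(len(complete), sum(r["popf"] for r in complete), n_dropped)  # prints: 2 1 2
```

Here half the cohort, and half the events, vanish silently, which is exactly why reporting data completeness (or using an imputation method) is preferable to dropping rows without comment.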
In the development of risk prediction models, multivariable analysis is widely used to assess the importance of risk factors for an outcome. To identify the risk factors most strongly associated with an outcome, univariate analysis is frequently performed first. In this screening approach, if the p value of a risk factor in univariate analysis is greater than an arbitrary threshold (often p=0.05), the factor is not allowed to compete for inclusion in the multivariable analysis. However, predictors should not be selected solely on the basis of statistical significance during model development, as it is crucial to retain risk predictors known from the literature to be important but which may not reach statistical significance in a given data set. Univariate pre-screening has been shown to be inappropriate, as it can wrongly reject potentially important variables when the relationship between an outcome and a risk factor is confounded and the confounder is not properly controlled, thus leading to an unreliable model [30, 33]. Five studies in this review reduced the initial number of candidate risk predictors prior to fitting the final model; however, two studies failed to provide sufficient detail on how this was carried out.
Automated variable selection methods (forward selection, backward elimination, or stepwise selection) are frequently used to derive final risk prediction models (29 % in our review). However, automated selection methods are data-driven approaches based on statistical significance, without reference to the literature or clinical relevance, and they have been shown to be unstable and poorly reproducible, to produce biased estimates of regression coefficients, and to yield poor predictions [29, 34, 35].
After a risk prediction model has been derived, it is essential to assess its performance. Several approaches have been suggested to estimate a risk prediction model’s optimism: (1) internal validation using bootstrapping, cross-validation, or split-sampling techniques; (2) temporal validation, which evaluates the performance of a model on subsequent patients from the same center(s). Temporal validation is no different in principle from splitting a single data set by time, but it is a prospective evaluation of the model; it can be considered external in time and thus intermediate between internal and external validation; and (3) external validation, which addresses the accuracy of a risk prediction model in patients from different centers or locations. Investigators in all of the studies in our review reported an internal validation on their cohorts.
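The resampling step of bootstrap internal validation can be sketched as follows (a hedged sketch: the cohort is a stand-in list, and the model refitting that would occur on each resample is indicated only in comments):

```python
import random

def bootstrap_samples(cohort, n_boot):
    """Draw bootstrap resamples: patients sampled with replacement,
    each resample the same size as the original cohort."""
    return [[random.choice(cohort) for _ in cohort] for _ in range(n_boot)]

random.seed(7)
cohort = list(range(100))                  # stand-in for 100 patient records
boots = bootstrap_samples(cohort, n_boot=200)

# For each resample one would: (1) refit the model on the resample,
# (2) measure performance (e.g., the C-statistic) on both the resample
# and the original cohort, and (3) average the differences. Subtracting
# that average ("optimism") from the apparent performance gives an
# internally validated estimate.
assert all(len(b) == len(cohort) for b in boots)
```

Unlike a single development/validation split, bootstrapping uses every patient for both fitting and testing, which is why it is often preferred when the cohort is small, as in most of the studies reviewed here.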
Few prediction models are routinely used in clinical practice, probably because most have not been externally validated. A risk score or prediction model should be clinically credible, accurate (well calibrated, with good discriminative ability), generalizable (externally validated), and ideally shown to be clinically effective, that is, to provide useful additional information to clinicians that improves therapeutic decision making and thus patient outcomes. It is crucial to quantify the performance of a prognostic model on a new series of patients, ideally in a different location, before applying the model in daily practice to guide patient care. Although still rare, temporal and external validation studies do seem to be becoming more common. Based on the data and detail provided at each step of model building, we observed poor reporting in all aspects of developing the risk prediction models.
The definition of POPF is also important for developing a POPF risk prediction model. The ISGPF classification system was used to define POPF in all seven studies. However, because of its retrospective character, the ISGPF classification may have limitations for clinical decision making. To improve clinical decision making in the management of patients, the ISGPF classification system needs to be merged with newer clinical data.
An ideal prediction model for pancreatic fistula should include objective characteristics that can be identified preoperatively, intraoperatively, or postoperatively and are easy to assess. Such characteristics include BMI, acoustic radiation force impulse, signal intensity of the pancreas on magnetic resonance imaging, histological assessment of pancreatic steatosis and fibrosis, pancreatic duct width, and inflammatory cytokines and chemokines. The model should also be simple and quick to use, supporting the surgeon’s intraoperative decision on whether to perform an anastomosis in critical individual cases, and it should be validated in prospective multicenter clinical trials. An individual assessment of the risk of postoperative pancreatic fistula enhances preoperative counseling and patient selection for pancreatic surgery. Furthermore, such a model may prompt changes to established clinical practice and strategies to decrease pancreatic fistula among patients at high risk. Thus, further, larger standardized studies are required.
Despite the systematic search and inclusion of the most recent publications, this systematic review was limited to English-language articles and did not consider the grey literature. Therefore, we may have missed some studies.
In conclusion, we found that published risk prediction models for POPF were often characterized both by the use of inappropriate methods for developing multivariable models and by poor reporting. Additionally, these models were limited by the lack of studies based on prospective data sets of sufficient sample size to avoid overfitting. There is an urgent need for investigators to use appropriate methods and report them effectively. An appropriate risk prediction model would help surgeons estimate the objective probability of POPF, treat POPF promptly, and ultimately benefit patients.
The study was supported by the National Natural Science Foundation of China (Grant No. 81560387).
The authors declare that they have no conflict of interest.
Zhang Wen and Ya Guo contributed equally to this work.