1.  The magnitude and temporal changes of response in the placebo arm of surgical randomized controlled trials: a systematic review and meta-analysis 
Trials  2016;17:589.
Understanding changes in the placebo arm is essential for correct design and interpretation of randomized controlled trials (RCTs). It is assumed that placebo response, defined as the total improvement in the placebo arm of surgical trials, is large; however, its precise magnitude and properties remain to be characterized. To the best of our knowledge, the temporal changes in the placebo arm have not been investigated. The aim of this paper was to determine, in surgical RCTs, the magnitude of placebo response and how it is affected by duration of follow-up.
The databases of MEDLINE, EMBASE and the Cochrane Central Register of Controlled Trials were searched from their inception to 20 October 2015 for studies comparing the efficacy of a surgical intervention with placebo. Inclusion was not limited to any particular condition, intervention, outcome or patient population. The magnitude of placebo response was estimated using standardized mean differences (SMDs). Study estimates were pooled using random effects meta-analysis. Potential sources of heterogeneity were evaluated using stratification and meta-regression.
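The pooling step can be illustrated with a minimal sketch. It assumes the DerSimonian-Laird estimator of between-study variance, which is one common choice for random effects meta-analysis; the abstract does not say which estimator was used. The function name and the input values are hypothetical and serve only as an illustration.

```python
import math


def pool_random_effects(effects, variances):
    """Pool study-level effects with the DerSimonian-Laird estimator.

    Returns (pooled effect, 95% confidence interval, between-study variance).
    """
    k = len(effects)
    # Inverse-variance (fixed effect) weights and pooled estimate.
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w
    # Cochran's Q measures the observed heterogeneity between studies.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = total_w - sum(w ** 2 for w in weights) / total_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random effects weights add tau2 to each study's sampling variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Hypothetical SMDs and sampling variances, for illustration only.
smds = [0.2, 0.8]
smd_variances = [0.04, 0.04]
pooled, ci, tau2 = pool_random_effects(smds, smd_variances)
print(f"pooled SMD = {pooled:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, tau^2 = {tau2:.2f}")
```

When the studies disagree more than their sampling variances can explain, tau2 becomes positive, the weights become more equal and the confidence interval widens.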
Database searches returned 88 studies, but for 41 studies SMDs could not be calculated, leaving 47 trials (involving 1744 participants) eligible for inclusion. There were no temporal changes in placebo response within the analysed trials. Meta-regression analysis showed that duration of follow-up did not have a significant effect on the magnitude of the placebo response and that the strongest predictor of placebo response was subjectivity of the outcome. The pooled effect in the placebo arm of studies with subjective outcomes was large (0.64, 95% CI 0.5 to 0.8) and remained significantly different from zero regardless of the duration of follow-up, whereas for objective outcomes, the effect was small (0.11, 95% CI 0.04 to 0.26) or non-significant across all time points.
This is the first study to investigate the temporal changes of placebo response in surgical trials and the first to investigate the sources of heterogeneity of placebo response. Placebo response in surgical trials was large for subjective outcomes, persisting as a time-invariant effect throughout blinded follow-up. Therefore, placebo response cannot be minimized in these types of outcomes through their appraisal at alternative time points. The analyses suggest that objective outcomes may be preferable as trial end-points. Where subjective outcomes are of primary interest, a placebo arm is necessary to control for placebo response.
Electronic supplementary material
The online version of this article (doi:10.1186/s13063-016-1720-7) contains supplementary material, which is available to authorized users.
PMCID: PMC5154040  PMID: 27955685
Surgery; Placebos; Randomized controlled trials; Systematic review; Meta-analysis
2.  No rationale for 1 variable per 10 events criterion for binary logistic regression analysis 
Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies.
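As a minimal sketch of the quantity under discussion, EPV can be computed as follows. The sketch assumes the usual convention that "events" means the less frequent of the two outcome categories and that all candidate predictors are counted; the function name and the data are hypothetical.

```python
def events_per_variable(outcomes, n_candidate_predictors):
    """Return the events per variable (EPV) for a binary outcome.

    outcomes: iterable of 0/1 values.
    n_candidate_predictors: number of candidate predictor parameters.
    """
    outcomes = list(outcomes)
    events = sum(1 for y in outcomes if y == 1)
    non_events = len(outcomes) - events
    # By convention, the less frequent outcome category limits the model.
    limiting = min(events, non_events)
    return limiting / n_candidate_predictors


# Hypothetical data: 200 participants, 40 events, 5 candidate predictors.
example_outcomes = [1] * 40 + [0] * 160
epv = events_per_variable(example_outcomes, 5)
print(epv)
```

In this hypothetical data set the EPV is 8, which would fall short of the 10 EPV criterion that the paper examines.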
The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared.
The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation.
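Separation is easiest to see with a single covariate. The sketch below is a deliberately simplified check, not the procedure used in the paper: it only detects complete separation by one continuous covariate at a time, whereas separation in a multivariable model can also arise from a linear combination of covariates. The function name and the data are hypothetical.

```python
def is_completely_separated(x, y):
    """Check whether one covariate perfectly splits a binary outcome.

    Returns True when every x value in one outcome group lies strictly
    below every x value in the other group.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome categories must be present")
    return max(x0) < min(x1) or max(x1) < min(x0)


# Hypothetical simulated data sets.
separated = is_completely_separated([1, 2, 3, 4], [0, 0, 1, 1])
overlapping = is_completely_separated([1, 3, 2, 4], [0, 0, 1, 1])
print(separated, overlapping)
```

For a data set flagged in this way the maximum likelihood estimate of the coefficient does not exist, which is why the way such data sets are identified and handled can dominate a simulation's results.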
The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
PMCID: PMC5122171  PMID: 27881078
EPV; Bias; Separation; Logistic regression; Sample size; Simulations
3.  COmmunity-based Rehabilitation after Knee Arthroplasty (CORKA): study protocol for a randomised controlled trial 
Trials  2016;17:501.
The number of knee arthroplasties performed each year is steadily increasing. Although the outcome is generally favourable, up to 15% fail to achieve a satisfactory clinical outcome, which may indicate that the existing model of rehabilitation after surgery may not be the most efficacious. Given the increasing number of knee arthroplasties, the relatively limited physiotherapy resources available and the increasing age and frailty of patients receiving arthroplasty surgery, it is important that we concentrate our rehabilitation resources on those patients who most need help to achieve a good outcome. This pragmatic randomised controlled trial will investigate the clinical and cost-effectiveness of a community-based multidisciplinary rehabilitation intervention in comparison to usual care.
The trial is designed as a prospective, single-blind, two-arm randomised controlled trial (RCT). A bespoke algorithm to predict which patients are at risk of poor outcome will be developed to screen patients for inclusion into an RCT using existing datasets. Six hundred and twenty patients undergoing knee arthroplasty, and assessed as being at risk of poor outcome using this algorithm, will be recruited and randomly allocated to one of two rehabilitation strategies: usual care or an individually tailored community-based rehabilitation package. The primary outcome is the Late Life Function and Disability Instrument measured at 1 year after surgery. Secondary outcomes include the Oxford Knee Score, the Knee injury and Osteoarthritis Outcome Score quality of life subscale, the Physical Activity Scale for the Elderly, the EQ-5D-5L and physical function measured by three performance-based tests: figure of eight, sit to stand and single-leg stand. A nested qualitative study will explore patient experience and perceptions and a health economic analysis will assess whether a home-based multidisciplinary individually tailored rehabilitation package represents good value for money when compared to usual care.
There is a lack of consensus about what constitutes the optimum package of rehabilitation after knee arthroplasty surgery. There is also a need to tailor rehabilitation to the needs of those predicted to do least well by focussing on interventions that target the elderly and frailer population receiving arthroplasty surgery.
Trial registration
ISRCTN 13517704, registered on 12 February 2015.
PMCID: PMC5064916  PMID: 27737685
Randomised controlled trial; Knee arthroplasty; Physiotherapy; Occupational therapy; Rehabilitation; Community; Elderly; Frail
5.  Study protocol: first nationwide comparative audit of acute lower gastrointestinal bleeding in the UK 
BMJ Open  2016;6(8):e011752.
Acute lower gastrointestinal bleeding (LGIB) is a common indication for emergency hospitalisation worldwide. In contrast to upper GIB, patient characteristics, modes of investigation, transfusion, treatment and outcomes are poorly described. There are minimal clinical guidelines to inform care pathways, and the use of endoscopy (including diagnostic and therapeutic yields), interventional radiology and surgery is poorly defined. As a result, there is potential for wide variation in practice and clinical outcomes.
Methods and analysis
The UK Lower Gastrointestinal Bleeding Audit is a large nationwide audit of adult patients acutely admitted with LGIB or those who develop LGIB while hospitalised for another reason. Consecutive, unselected presentations with LGIB will be enrolled prospectively over a 2-month period at the end of 2015 and detailed data will be collected on patient characteristics, comorbidities, use of anticoagulants, transfusion, timing and modalities of diagnostic and therapeutic procedures, clinical outcome, length of stay and mortality. These will be audited against predefined minimum standards of care for LGIB. It is anticipated that over 80% of all acute hospitals in England and some hospitals in Scotland, Wales and Northern Ireland will participate. Data will be collected on the availability and organisation of care, provision of diagnostic and therapeutic GI endoscopy, interventional radiology, surgery and transfusion protocols.
Ethics and dissemination
This audit will be conducted as part of the national comparative audit programme of blood transfusion through collaboration with specialists in gastroenterology, surgery and interventional radiology. Individual reports will be provided to each participant site as well as an overall report and disseminated through specialist societies. Results will also be published in peer-reviewed journals. The study has been funded by National Health Services (NHS) Blood and Transplant and the Bowel Disease Research Foundation and endorsed by the Association of Coloproctology of Great Britain and Ireland.
PMCID: PMC4985848  PMID: 27491671
6.  Adequate sample size for developing prediction models is not simply related to events per variable 
The choice of an adequate sample size for a Cox regression analysis is generally based on the rule of thumb derived from simulation studies of a minimum of 10 events per variable (EPV). One simulation study suggested scenarios in which the 10 EPV rule can be relaxed. The effect of a range of binary predictors with varying prevalence, reflecting clinical practice, has not yet been fully investigated.
Study Design and Setting
We conducted an extended resampling study using a large general-practice data set, comprising over 2 million anonymized patient records, to examine the EPV requirements for prediction models with low-prevalence binary predictors developed using Cox regression. The performance of the models was then evaluated using an independent external validation data set. We investigated both fully specified models and models derived using variable selection.
Our results indicated that an EPV rule of thumb should be data driven and that EPV ≥ 20 generally eliminates bias in regression coefficients when many low-prevalence predictors are included in a Cox model.
Higher EPV is needed when low-prevalence predictors are present in a model to eliminate bias in regression coefficients and improve predictive accuracy.
PMCID: PMC5045274  PMID: 26964707
Events per variable; Cox model; External validation; Predictive modeling; Sample size; Resampling study
7.  An external validation of models to predict the onset of chronic kidney disease using population-based electronic health records from Salford, UK 
BMC Medicine  2016;14:104.
Chronic kidney disease (CKD) is a major and increasing constituent of disease burdens worldwide. Early identification of patients at increased risk of developing CKD can guide interventions to slow disease progression, initiate timely referral to appropriate kidney care services, and support targeting of care resources. Risk prediction models can extend laboratory-based CKD screening to earlier stages of disease; however, to date, only a few of them have been externally validated or directly compared outside development populations. Our objective was to validate published CKD prediction models applicable in primary care.
We synthesised two recent systematic reviews of CKD risk prediction models and externally validated selected models for a 5-year horizon of disease onset. We used linked, anonymised, structured (coded) primary and secondary care data from patients resident in Salford (population ~234 k), UK. All adult patients with at least one record in 2009 were followed-up until the end of 2014, death, or CKD onset (n = 178,399). CKD onset was defined as repeated impaired eGFR measures over a period of at least 3 months, or physician diagnosis of CKD Stage 3–5. For each model, we assessed discrimination, calibration, and decision curve analysis.
Seven relevant CKD risk prediction models were identified. Five models also had an associated simplified scoring system. All models discriminated well between patients developing CKD or not, with c-statistics around 0.90. Most of the models were poorly calibrated to our population, substantially over-predicting risk. The two models that did not require recalibration were also the ones that had the best performance in the decision curve analysis.
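The combination of good discrimination and poor calibration reported here can be illustrated with a small sketch. It simplifies the setting to a binary outcome without censoring, which is an assumption; the study itself followed patients over a 5-year horizon. The function names and the data are hypothetical.

```python
def c_statistic(risks, outcomes):
    """Proportion of case/non-case pairs in which the case has the higher risk.

    Tied predictions count as half a concordant pair.
    """
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    concordant = 0.0
    for rc in cases:
        for rn in non_cases:
            if rc > rn:
                concordant += 1.0
            elif rc == rn:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


def expected_observed_ratio(risks, outcomes):
    """Ratio of predicted to observed events; values above 1 mean over-prediction."""
    return sum(risks) / sum(outcomes)


# Hypothetical predicted risks and observed outcomes for four patients.
predicted = [0.9, 0.8, 0.6, 0.4]
observed = [1, 0, 0, 0]
c_stat = c_statistic(predicted, observed)
eo_ratio = expected_observed_ratio(predicted, observed)
print(f"c-statistic = {c_stat:.2f}, expected/observed = {eo_ratio:.2f}")
```

In this toy example the single case receives the highest predicted risk, so discrimination is perfect, yet the model predicts 2.7 events where only one occurred. The c-statistic depends only on the ranking of risks, so it cannot reveal this kind of over-prediction.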
Included CKD prediction models showed good discriminative ability but over-predicted the actual 5-year CKD risk in English primary care patients. QKidney, the only UK-developed model, outperformed the others. Clinical prediction models should be (re)calibrated for their intended uses.
Electronic supplementary material
The online version of this article (doi:10.1186/s12916-016-0650-2) contains supplementary material, which is available to authorized users.
PMCID: PMC4940699  PMID: 27401013
Chronic kidney disease; Clinical prediction models; eGFR; Decision support; Electronic health records; Model validation; Model calibration
8.  Guidelines for Accurate and Transparent Health Estimates Reporting: the GATHER statement 
PLoS Medicine  2016;13(6):e1002056.
Gretchen Stevens and colleagues present the GATHER statement, which seeks to promote good practice in the reporting of global health estimates.
PMCID: PMC4924581  PMID: 27351744
9.  External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges 
The BMJ  2016;353:i3140.
Access to big datasets from e-health records and individual participant data (IPD) meta-analysis is signalling a new advent of external validation studies for clinical prediction models. In this article, the authors illustrate novel opportunities for external validation in big, combined datasets, while drawing attention to methodological challenges and reporting issues.
PMCID: PMC4916924  PMID: 27334381
10.  Prediction models for cardiovascular disease risk in the general population: systematic review 
The BMJ  2016;353:i2416.
Objective To provide an overview of prediction models for risk of cardiovascular disease (CVD) in the general population.
Design Systematic review.
Data sources Medline and Embase until June 2013.
Eligibility criteria for study selection Studies describing the development or external validation of a multivariable model for predicting CVD risk in the general population.
Results 9965 references were screened, of which 212 articles were included in the review, describing the development of 363 prediction models and 473 external validations. Most models were developed in Europe (n=167, 46%), predicted risk of fatal or non-fatal coronary heart disease (n=118, 33%) over a 10 year period (n=209, 58%). The most common predictors were smoking (n=325, 90%) and age (n=321, 88%), and most models were sex specific (n=250, 69%). Substantial heterogeneity in predictor and outcome definitions was observed between models, and important clinical and methodological information was often missing. The prediction horizon was not specified for 49 models (13%), and for 92 (25%) crucial information was missing to enable the model to be used for individual risk prediction. Only 132 developed models (36%) were externally validated and only 70 (19%) by independent investigators. Model performance was heterogeneous and measures such as discrimination and calibration were reported for only 65% and 58% of the external validations, respectively.
Conclusions There is an excess of models predicting incident CVD in the general population. The usefulness of most of the models remains unclear owing to methodological shortcomings, incomplete presentation, and lack of external validation and model impact studies. Rather than developing yet another similar CVD risk prediction model, in this era of large datasets, future research should focus on externally validating and comparing head-to-head promising CVD risk models that already exist, on tailoring or even combining these models to local settings, and investigating whether these models can be extended by addition of new predictors.
PMCID: PMC4868251  PMID: 27184143
11.  Lean Participative Process Improvement: Outcomes and Obstacles in Trauma Orthopaedics 
PLoS ONE  2016;11(4):e0152360.
To examine the effectiveness of a “systems” approach using Lean methodology to improve surgical care, as part of a programme of studies investigating possible synergy between improvement approaches.
A controlled before-after study using the orthopaedic trauma theatre of a UK Trust hospital as the active site and an elective orthopaedic theatre in the same Trust as control.
All staff involved in surgical procedures in both theatres.
A one-day “lean” training course delivered by an experienced specialist team was followed by support and assistance in developing a 6 month improvement project. Clinical staff selected the subjects for improvement and designed the improvements.
Outcome Measures
We compared technical and non-technical team performance in theatre using WHO checklist compliance evaluation, “glitch count” and Oxford NOTECHS II in a sample of directly observed operations, and patient outcome (length of stay, complications and readmissions) for all patients. We collected observational data for 3 months and clinical data for 6 months before and after the intervention period. We compared changes in measures using 2-way analysis of variance.
We studied 576 cases before and 465 after intervention, observing the operation in 38 and 41 cases respectively. We found no significant changes in team performance or patient outcome measures. The intervention theatre staff focused their efforts on improving first patient arrival time, which improved by 20 minutes after intervention.
This version of “lean” system improvement did not improve measured safety processes or outcomes. The study highlighted an important tension between promoting staff ownership and providing direction, which needs to be managed in “lean” projects. Space and time for staff to conduct improvement activities are important for success.
PMCID: PMC4849765  PMID: 27124012
12.  Pulmonary function in an international sample of HIV-positive, treatment-naïve adults with CD4 counts >500 cells/μL: a substudy of the INSIGHT Strategic Timing of AntiRetroviral Treatment trial 
HIV medicine  2015;16(0 0):119-128.
To describe the prevalence and correlates of chronic obstructive pulmonary disease (COPD) in a multicentre international cohort of persons living with HIV (PLWH).
We performed a cross-sectional analysis of adult PLWH, naïve to HIV treatment, with baseline CD4 cell count >500 cells/μL enrolled in the Pulmonary Substudy of the Strategic Timing of AntiRetroviral Treatment trial. We collected standardised, quality-controlled spirometry. COPD was defined as forced expiratory volume in one second/forced vital capacity (FEV1/FVC) ratio less than the lower limit of normal.
Among 1026 participants from 80 sites and 20 countries, median (IQR) age was 36 (30, 44) years, 29% were female, and time since HIV diagnosis was 1.2 (0.4, 3.5) years. Baseline CD4 cell count was 648 (583, 767) cells/μL, viral load was 4.2 (3.5, 4.7) log10 copies/mL, and 10% had viral load ≤400 copies/mL despite lack of HIV treatment. Current/former/never smokers comprised 28%/11%/61% of the cohort, respectively. COPD was present in 6.8% of participants, and varied by age, smoking status, and region. 48% of those with COPD reported lifelong nonsmoking. In multivariable regression, age and pack-years of smoking had the strongest associations with FEV1/FVC ratio (p<0.0001). There were significant differences between the effect of region on FEV1/FVC ratio (p=0.010).
Our data suggest that among PLWH, naïve to HIV treatment and with CD4 cell count >500 cells/μL, smoking and age are important factors related to COPD. Smoking cessation should remain a high global priority for clinical care and research in PLWH.
PMCID: PMC4341938  PMID: 25711330
HIV; pulmonary disease; spirometry; smoking; START trial
13.  Feasibility of surgical randomised controlled trials with a placebo arm: a systematic review 
BMJ Open  2016;6(3):e010194.
To find evidence, either corroborating or refuting, for many persisting beliefs regarding the feasibility of carrying out surgical randomised controlled trials with a placebo arm, with emphasis on the challenges related to recruitment, funding, anaesthesia or blinding.
Systematic review.
Data sources and study selection
The analysis involved studies published between 1959 and 2014 that were identified during an earlier systematic review of benefits and harms of placebo-controlled surgical trials published in 2014.
63 trials were included in the review. The main problem reported in many trials was a very slow recruitment rate, mainly due to the difficulty in finding eligible patients. Existing placebo trials were funded equally often from commercial and non-commercial sources. General anaesthesia or sedation was used in 41% of studies. Among the reviewed trials, 81% were double-blinded, and 19% were single-blinded. Across the reviewed trials, 96% (range 50–100%) of randomised patients completed the study. The withdrawal rate during the study was similar in the surgical and in the placebo groups.
This review demonstrated that placebo-controlled surgical trials are feasible, at least for procedures with a lower level of invasiveness, but also that recruitment is difficult. Many of the presumed challenges to undertaking such trials, for example, funding, anaesthesia or blinding of patients and assessors, were not reported as obstacles to completion in any of the reviewed trials.
PMCID: PMC4800115  PMID: 27008687
SURGERY; Randomised Controlled Trials; Placebos
14.  Quality Improvement in Surgery Combining Lean Improvement Methods with Teamwork Training: A Controlled Before-After Study 
PLoS ONE  2015;10(9):e0138490.
To investigate the effectiveness of combining teamwork training and lean process improvement, two distinct approaches to improving surgical safety. We conducted a controlled interrupted time series study in a specialist UK Orthopaedic hospital incorporating a plastic surgery team (which received the intervention) and an Orthopaedic theatre team acting as a control.
Study Design
We used a 3 month intervention with 3 months data collection period before and after it. A combined teamwork training and lean process improvement intervention was delivered by an experienced specialist team. Before and after the intervention we evaluated team non-technical skills using NOTECHS II, technical performance using the glitch rate and WHO checklist compliance using a simple 3 point scale. We recorded complication rate, readmission rate and length of hospital stay data for 6 months before and after the intervention.
In the active group, but not the control group, full compliance with WHO Time Out (T/O) increased from 14 to 71% (p = 0.032), Sign Out attempt rate (S/O) increased from 0% to 50% (p<0.001) and Oxford NOTECHS II scores increased after the intervention (P = 0.058). Glitch rate decreased in the active group and increased in the control group (p = 0.001). Complications and length of stay appeared to rise in the control group and fall in the active group.
Combining teamwork training and systems improvement enhanced both technical and non-technical operating team process measures, and was associated with a trend to better safety outcome measures in a controlled study comparison. We suggest that approaches which address both system and culture dimensions of safety may prove valuable in reducing risks to patients.
PMCID: PMC4575036  PMID: 26381643
15.  Markers of inflammation and activation of coagulation are associated with anemia in antiretroviral-treated HIV disease 
AIDS (London, England)  2014;28(12):1791-1796.
To determine the relationship between inflammatory (IL-6 and hsCRP) and coagulation (D-dimer) biomarkers and the presence and type of anemia among HIV+ individuals.
Cross-sectional study.
cART-treated adults participating in an international HIV trial with hemoglobin and mean corpuscular volume (MCV) measurements at entry were categorized by presence of anemia (hemoglobin ≤ 14 g/dL in men and ≤ 12 g/dL in women) and, for those with anemia, by type (microcytic [MCV < 80 fL], normocytic [80–100 fL], macrocytic [> 100 fL]). We analyzed the association between inflammation (IL-6 and hsCRP) and coagulation (D-dimer) and hemoglobin, controlling for demographics (age, race, and gender), body mass index, HIV plasma RNA levels, CD4+ T cell counts (nadir and baseline), Karnofsky score, previous AIDS diagnosis, hepatitis B/C co-infection and use of zidovudine.
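The categorization rule stated above can be written as a short sketch. The thresholds are those given in the abstract; the function name, the argument names and the sex labels are hypothetical choices made for illustration.

```python
def classify_anemia(hemoglobin_g_dl, mcv_fl, sex):
    """Classify anemia status and type using the thresholds in the abstract.

    Anemia: hemoglobin <= 14 g/dL in men and <= 12 g/dL in women.
    Type: microcytic (MCV < 80 fL), normocytic (80-100 fL), macrocytic (> 100 fL).
    """
    if sex == "male":
        threshold = 14.0
    elif sex == "female":
        threshold = 12.0
    else:
        raise ValueError("sex must be 'male' or 'female'")
    if hemoglobin_g_dl > threshold:
        return "no anemia"
    if mcv_fl < 80:
        return "microcytic"
    if mcv_fl <= 100:
        return "normocytic"
    return "macrocytic"


# The same hypothetical measurements are classified differently by sex.
print(classify_anemia(13.5, 105, "male"))
print(classify_anemia(13.5, 105, "female"))
```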
Among 1,410 participants, 313 (22.2%) had anemia. Of these, 4.1%, 27.2% and 68.7% had microcytic, normocytic and macrocytic anemia, respectively. When compared with participants with normal hemoglobin values, those with anemia were more likely to be older, black, male and on zidovudine. They also had lower baseline CD4+ T cell counts and lower Karnofsky scores. Adjusted relative odds of anemia per twofold higher biomarker levels were 1.22 (P = 0.007) for IL-6, 0.99 (P = 0.86) for hsCRP and 1.35 (P < 0.001) for D-dimer. Similar associations were seen in those with normal and high MCV values.
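Odds ratios "per twofold higher" biomarker level are commonly obtained by entering the log-transformed biomarker into a logistic regression model. The abstract does not describe the transformation used, so the sketch below is an assumption about how such an estimate could be derived from a fitted coefficient; the function name is hypothetical.

```python
import math


def odds_ratio_per_doubling(beta, log_base=2.0):
    """Convert a logistic regression coefficient into an odds ratio per doubling.

    beta is the coefficient for the biomarker after taking its logarithm
    to the given base. Doubling the biomarker raises that logarithm by
    log(2) / log(log_base), so the odds are multiplied by exp(beta * that step).
    """
    step = math.log(2.0) / math.log(log_base)
    return math.exp(beta * step)


# With a log2-transformed biomarker, exp(beta) is already the odds ratio
# per doubling: a coefficient of ln(1.22) gives the 1.22 reported for IL-6.
print(round(odds_ratio_per_doubling(math.log(1.22)), 2))
```

With a natural-log transformation the same function applies with `log_base=math.e`, in which case the result equals 2 raised to the power of the coefficient.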
Persistent inflammation and hypercoagulation appear to be associated with anemia. Routine measurements of hemoglobin might provide insights into the inflammatory state of treated HIV infection.
PMCID: PMC4499102  PMID: 25003720
HIV; coagulation; D-dimer; CRP; inflammation; IL-6; anemia
16.  The effect of teamwork training on team performance and clinical outcome in elective orthopaedic surgery: a controlled interrupted time series study 
BMJ Open  2015;5(4):e006216.
To evaluate the effectiveness of aviation-style teamwork training in improving operating theatre team performance and clinical outcomes.
3 operating theatres in a UK district general hospital, 1 acting as a control group and the other 2 as the intervention group.
72 operations (37 intervention, 35 control) were observed in full by 2 trained observers during two 3-month observation periods, before and after the intervention period.
A 1-day teamwork training course for all staff, followed by 6 weeks of weekly in-service coaching to embed learning.
Primary and secondary outcome measures
We measured team non-technical skills using Oxford NOTECHS II (evaluating the whole team and the surgical, anaesthetic and nursing subteams), and evaluated technical performance using the Glitch count. We evaluated compliance with the WHO checklist by recording whether time-out (T/O) and sign-out (S/O) were attempted, and whether T/O was fully complied with. We recorded complications, re-admissions and duration of hospital stay using hospital administrative data. We compared the before–after change in the intervention and control groups using 2-way analysis of variance (ANOVA) and regression modelling.
Mean NOTECHS II score increased significantly from 71.6 to 75.4 in the active group but remained static in the control group (p=0.047). Among staff subgroups, the nursing score increased significantly (p=0.006), but the anaesthetic and surgical scores did not. The attempt rate for WHO T/O procedures increased significantly in both active and control groups, but full compliance with T/O improved only in the active group (p=0.003). Mean glitch rate was unchanged in the control group but increased significantly (7.2–10.2/h, p=0.002) in the active group.
Teamwork training was associated with improved non-technical skills in theatre teams but also with a rise in operative glitches.
PMCID: PMC4410121  PMID: 25897025
SURGERY; Quality improvement; Patient safety
18.  Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement 
BMC Medicine  2015;13:1.
Prediction models are developed to aid health care providers in estimating the probability or risk that a specific disease or condition is present (diagnostic models) or that a specific event will occur in the future (prognostic models), to inform their decision making. However, the overwhelming evidence shows that the quality of reporting of prediction model studies is poor. Only with full and clear reporting of information on all aspects of a prediction model can risk of bias and potential usefulness of prediction models be adequately assessed. The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Initiative developed a set of recommendations for the reporting of studies developing, validating, or updating a prediction model, whether for diagnostic or prognostic purposes. This article describes how the TRIPOD Statement was developed. An extensive list of items based on a review of the literature was created, which was reduced after a Web-based survey and revised during a 3-day meeting in June 2011 with methodologists, health care professionals, and journal editors. The list was refined during several meetings of the steering group and in e-mail discussions with the wider group of TRIPOD contributors. The resulting TRIPOD Statement is a checklist of 22 items, deemed essential for transparent reporting of a prediction model study. The TRIPOD Statement aims to improve the transparency of the reporting of a prediction model study regardless of the study methods used. The TRIPOD Statement is best used in conjunction with the TRIPOD explanation and elaboration document. To aid the editorial process and readers of prediction model studies, it is recommended that authors include a completed checklist in their submission (also available at
Editors’ note: In order to encourage dissemination of the TRIPOD Statement, this article is freely accessible on the Annals of Internal Medicine Web site ( and will be also published in BJOG, British Journal of Cancer, British Journal of Surgery, BMC Medicine, British Medical Journal, Circulation, Diabetic Medicine, European Journal of Clinical Investigation, European Urology, and Journal of Clinical Epidemiology. The authors jointly hold the copyright of this article. An accompanying Explanation and Elaboration article is freely available only on; Annals of Internal Medicine holds copyright for that article.
PMCID: PMC4284921  PMID: 25563062
Prediction models; Prognostic; Diagnostic; Model development; Validation; Transparency; Reporting
19.  Exploration of Analysis Methods for Diagnostic Imaging Tests: Problems with ROC AUC and Confidence Scores in CT Colonography 
PLoS ONE  2014;9(10):e107633.
Different methods of evaluating diagnostic performance when comparing diagnostic tests may lead to different results. We compared two such approaches, sensitivity and specificity with area under the Receiver Operating Characteristic Curve (ROC AUC) for the evaluation of CT colonography for the detection of polyps, either with or without computer assisted detection.
In a multireader multicase study of 10 readers and 107 cases we compared sensitivity and specificity, using radiological reporting of the presence or absence of polyps, to ROC AUC calculated from confidence scores concerning the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, showing that differences in results are due to statistical methods.
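The two approaches being compared use different inputs, which the following sketch makes explicit. It computes sensitivity and specificity from a reader's binary reports and a nonparametric (empirical) ROC AUC from the same reader's confidence scores. The empirical AUC is only one of several ROC methods and is an assumption here; the function names and the data are hypothetical.

```python
def sensitivity_specificity(reported, truth):
    """Sensitivity and specificity from binary reports against a reference standard."""
    tp = sum(1 for r, t in zip(reported, truth) if r == 1 and t == 1)
    tn = sum(1 for r, t in zip(reported, truth) if r == 0 and t == 0)
    positives = sum(truth)
    negatives = len(truth) - positives
    return tp / positives, tn / negatives


def empirical_auc(scores, truth):
    """Empirical ROC AUC: probability that a positive case scores higher
    than a negative case, with ties counted as one half."""
    pos = [s for s, t in zip(scores, truth) if t == 1]
    neg = [s for s, t in zip(scores, truth) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical reader: zero confidence is recorded when no polyp is reported.
reference = [1, 1, 1, 0, 0, 0]
reports = [1, 1, 0, 0, 0, 1]
confidence = [90, 70, 0, 0, 0, 60]
sens, spec = sensitivity_specificity(reports, reference)
auc = empirical_auc(confidence, reference)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, AUC = {auc:.2f}")
```

In this toy data set the missed polyp and the two true negatives all carry a score of zero, so those comparisons are ties, and the single false positive with a score of 60 accounts for the only fully lost comparison.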
Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. There were problems using confidence scores: in assigning scores to all cases; in the use of zero scores when no polyps were identified; in the bimodal non-normal distribution of scores; in fitting ROC curves, due to extrapolation beyond the study data; and in the undue influence of a few false positive results. Variation due to use of different ROC methods exceeded differences between test results for ROC AUC.
The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies using confidence scores. We found sensitivity and specificity were a more reliable and clinically appropriate method to compare diagnostic tests.
PMCID: PMC4212964  PMID: 25353643
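As a rough illustration of how the two approaches in the abstract above can diverge, the following sketch computes sensitivity and specificity from binary reports and a nonparametric (Mann-Whitney) ROC AUC from confidence scores. The data are invented for illustration and are not taken from the study; the function names are hypothetical, and the nonparametric AUC is only one of several ROC methods, not necessarily one the authors used.

```python
def sensitivity_specificity(reported, truth):
    """Sensitivity and specificity from binary reports (1 = polyp reported)."""
    tp = sum(1 for r, t in zip(reported, truth) if r and t)
    fn = sum(1 for r, t in zip(reported, truth) if not r and t)
    tn = sum(1 for r, t in zip(reported, truth) if not r and not t)
    fp = sum(1 for r, t in zip(reported, truth) if r and not t)
    return tp / (tp + fn), tn / (tn + fp)


def empirical_auc(scores, truth):
    """Nonparametric AUC: share of (positive, negative) pairs in which the
    positive case has the higher score, counting ties as one half."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical reader: 4 cases with polyps, 6 without.
# A score of 0 is recorded whenever no polyp was identified.
truth    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores   = [90, 80, 0, 0, 0, 0, 0, 0, 60, 0]
reported = [1 if s > 0 else 0 for s in scores]

sens, spec = sensitivity_specificity(reported, truth)
auc = empirical_auc(scores, truth)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.3f}")
```

In this toy data set most scores are zero, so many positive-negative pairs are tied, and the single false positive with a score of 60 outranks both missed polyps. This mirrors, in a simplified way, the zero-score and false-positive issues the abstract describes.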
20.  Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies: The CHARMS Checklist 
PLoS Medicine  2014;11(10):e1001744.
Carl Moons and colleagues provide a checklist and background explanation for critically appraising and extracting data from systematic reviews of prognostic and diagnostic prediction modelling studies.
PMCID: PMC4196729  PMID: 25314315
21.  Identifying patients with undetected pancreatic cancer in primary care: an independent and external validation of QCancer® (Pancreas) 
The British Journal of General Practice  2013;63(614):e636-e642.
Despite its rarity, pancreatic cancer has a very poor prognosis and is a major cause of cancer mortality; it is ranked fourth in the world and has one of the worst survival rates of any cancer.
To evaluate the performance of QCancer® (Pancreas) for predicting the absolute risk of pancreatic cancer in an independent UK cohort of patients, from general practice records.
Design and setting
Prospective cohort study to evaluate the performance of the QCancer (Pancreas) prediction models in 364 practices from the UK, contributing to The Health Improvement Network (THIN) database.
Records were extracted from the THIN database for 2.15 million patients registered with a general practice surgery between 1 January 2000 and 30 June 2008, aged 30–84 years (3.74 million person-years), with 618 pancreatic cancer cases. Pancreatic cancer was defined as incident diagnosis of pancreatic cancer during the 2 years after study entry.
The results from this independent and external validation of QCancer (Pancreas) demonstrated good performance data on a large cohort of general practice patients. QCancer (Pancreas) had very good discrimination properties, with areas under the receiver operating characteristic curve of 0.89 and 0.92 for females and males respectively. QCancer (Pancreas) explained 60% and 67% of the variation in females and males respectively. QCancer (Pancreas) over-predicted risk in both females and males, notably in older patients.
QCancer (Pancreas) is potentially useful for identifying undetected cases of pancreatic cancer in primary care in the UK.
PMCID: PMC3750803  PMID: 23998844
pancreatic cancer; primary care; risk prediction; validation
22.  Reassessing the approach to informed consent: the case of unrelated hematopoietic stem cell transplantation in adult thalassemia patients 
The informed consent process is the legal embodiment of the fundamental right of the individual to make decisions affecting his or her health, and the patient’s permission is a crucial form of respect for freedom and dignity. It is therefore extremely important to enhance the patient’s understanding and recall of the information given by the physician. This statement acquires additional weight when the proposed medical treatment can potentially be detrimental or even fatal. This is the case for thalassemia patients in class 3 of the Pesaro classification, for whom allogeneic hematopoietic stem cell transplantation (HSCT) remains the only potentially curative treatment. Unfortunately, this kind of intervention is burdened by an elevated transplantation-related mortality risk (TRM: all deaths considered related to transplantation), equal to 30% according to published reports. In thalassemia, the role of the patient in the informed consent process leading up to HSCT has not been fully investigated. This study investigated the hypothesis that information provided by physicians in the medical scenario of HSCT is not fully understood by patients and that misunderstanding and communication biases may affect the clinical decision-making process.
A questionnaire was either mailed or given personally to 25 patients. A second questionnaire was administered to the 12 physicians attending the patients enrolled in this study. Descriptive statistics were used to evaluate the communication factors.
The results pointed out the difference between the risks communicated by physicians and the risks perceived by patients. In addition, the study highlighted the mortality risk considered acceptable by patients and that considered acceptable by physicians.
Several solutions have been suggested to reduce the gap between communicated and perceived data. A multi-disciplinary approach may possibly help to attenuate some aspects of communication bias. Several tools have also been proposed to fill or to attenuate the gap between communicated and perceived data. But the most important tool is the ability of the physician to comprehend the right place of conscious consent in the relationship with the patient.
PMCID: PMC4136633  PMID: 25115172
Patient-doctor relationship; Communication bias; Conscious consent
23.  Impact of peer review on reports of randomised trials published in open peer review journals: retrospective before and after study 
The BMJ  2014;349:g4145.
Objective To investigate the effectiveness of open peer review as a mechanism to improve the reporting of randomised trials published in biomedical journals.
Design Retrospective before and after study.
Setting BioMed Central series medical journals.
Sample 93 primary reports of randomised trials published in BMC-series medical journals in 2012.
Main outcome measures Changes to the reporting of methodological aspects of randomised trials in manuscripts after peer review, based on the CONSORT checklist, corresponding peer reviewer reports, the type of changes requested, and the extent to which authors adhered to these requests.
Results Of the 93 trial reports, 38% (n=35) did not describe the method of random sequence generation, 54% (n=50) concealment of allocation sequence, 50% (n=46) whether the study was blinded, 34% (n=32) the sample size calculation, 35% (n=33) specification of primary and secondary outcomes, 55% (n=51) results for the primary outcome, and 90% (n=84) details of the trial protocol. The number of changes between manuscript versions was relatively small; most involved adding new information or altering existing information. Most changes requested by peer reviewers had a positive impact on the reporting of the final manuscript—for example, adding or clarifying randomisation and blinding (n=27), sample size (n=15), primary and secondary outcomes (n=16), results for primary or secondary outcomes (n=14), and toning down conclusions to reflect the results (n=27). Some changes requested by peer reviewers, however, had a negative impact, such as adding additional unplanned analyses (n=15).
Conclusion Peer reviewers fail to detect important deficiencies in reporting of the methods and results of randomised trials. The number of changes requested by peer reviewers was relatively small. Although most had a positive impact, some were inappropriate and could have a negative impact on reporting in the final publication.
PMCID: PMC4077234  PMID: 24986891
24.  Use of placebo controls in the evaluation of surgery: systematic review 
The BMJ  2014;348:g3253.
Objective To investigate whether placebo controls should be used in the evaluation of surgical interventions.
Design Systematic review.
Data sources We searched Medline, Embase, and the Cochrane Controlled Trials Register from their inception to November 2013.
Study selection Randomised clinical trials comparing any surgical intervention with placebo. Surgery was defined as any procedure that both changes the anatomy and requires a skin incision or use of endoscopic techniques.
Data extraction Three reviewers (KW, BJFD, IR) independently identified the relevant trials and extracted data on study details, outcomes, and harms from included studies.
Results In 39 out of 53 (74%) trials there was improvement in the placebo arm and in 27 (51%) trials the effect of placebo did not differ from that of surgery. In 26 (49%) trials, surgery was superior to placebo but the magnitude of the effect of the surgical intervention over that of the placebo was generally small. Serious adverse events were reported in the placebo arm in 18 trials (34%) and in the surgical arm in 22 trials (41.5%); in four trials authors did not specify in which arm the events occurred. However, in many studies adverse events were unrelated to the intervention or associated with the severity of the condition. The existing placebo controlled trials investigated only less invasive procedures that did not involve laparotomy, thoracotomy, craniotomy, or extensive tissue dissection.
Conclusions Placebo controlled trials are a powerful, feasible way of showing the efficacy of surgical procedures. The risks of adverse effects associated with the placebo are small. In half of the studies, the results provide evidence against continued use of the investigated surgical procedures. Without well designed placebo controlled trials of surgery, ineffective treatment may continue unchallenged.
PMCID: PMC4029190  PMID: 24850821
25.  External validation of multivariable prediction models: a systematic review of methodological conduct and reporting 
Before considering whether to use a multivariable (diagnostic or prognostic) prediction model, it is essential that its performance be evaluated in data that were not used to develop the model (referred to as external validation). We critically appraised the methodological conduct and reporting of external validation studies of multivariable prediction models.
We conducted a systematic review of articles describing some form of external validation of one or more multivariable prediction models indexed in PubMed core clinical journals published in 2010. Study data were extracted in duplicate on design, sample size, handling of missing data, reference to the original study developing the prediction models and predictive performance measures.
11,826 articles were identified and 78 were included for full review, which described the evaluation of 120 prediction models in participant data that were not used to develop the model. Thirty-three articles described both the development of a prediction model and an evaluation of its performance on a separate dataset, and 45 articles described only the evaluation of an existing published prediction model on another dataset. Fifty-seven percent of the prediction models were presented and evaluated as simplified scoring systems. Sixteen percent of articles failed to report the number of outcome events in the validation datasets. Fifty-four percent of studies made no explicit mention of missing data. Sixty-seven percent did not report evaluating model calibration, whilst most studies evaluated model discrimination. It was often unclear whether the reported performance measures were for the full regression model or for the simplified models.
The vast majority of studies describing some form of external validation of a multivariable prediction model were poorly reported, with key details frequently not presented. The validation studies were characterised by poor design and by inappropriate handling and acknowledgement of missing data, and one of the key performance measures of prediction models, calibration, was often omitted from the publication. It may therefore not be surprising that an overwhelming majority of developed prediction models are not used in practice, when there is a dearth of well-conducted and clearly reported (external validation) studies describing their performance on independent participant data.
PMCID: PMC3999945  PMID: 24645774
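To make concrete why reporting discrimination without calibration can be misleading, the sketch below evaluates a hypothetical set of predicted risks on a small invented validation sample. It computes a concordance (c) statistic for discrimination and an observed/expected (O/E) ratio as a simple summary of calibration. The data and function names are assumptions made for illustration; they are not drawn from any of the reviewed studies, and the O/E ratio is only one of several ways to summarise calibration.

```python
def c_statistic(risks, outcomes):
    """Discrimination: probability that a participant with the event was
    assigned a higher predicted risk than one without (ties count as half)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    concordant = sum(1.0 if e > n else 0.5 if e == n else 0.0
                     for e in events for n in non_events)
    return concordant / (len(events) * len(non_events))


def observed_expected_ratio(risks, outcomes):
    """Calibration summary: observed events divided by the sum of predicted
    risks. Values below 1 indicate over-prediction, above 1 under-prediction."""
    return sum(outcomes) / sum(risks)


# Hypothetical validation sample: 10 participants, 2 events.
outcomes = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
risks    = [0.8, 0.6, 0.5, 0.4, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1]

c = c_statistic(risks, outcomes)
oe = observed_expected_ratio(risks, outcomes)
print(f"c-statistic={c:.2f} O/E ratio={oe:.2f}")
```

Here the model ranks every event above every non-event, yet it expects 3.5 events where only 2 occurred, so it over-predicts risk. A report giving only the c-statistic would not reveal this.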

