1.  No rationale for 1 variable per 10 events criterion for binary logistic regression analysis 
Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies.
The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared.
The results show that the problems associated with low EPV depend not only on EPV but also on other factors, such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets in which the covariates perfectly predict the outcome (‘separation’). We reveal that different approaches to identifying and handling separation lead to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and to alleviate the problems associated with separation.
The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
PMCID: PMC5122171  PMID: 27881078
EPV; Bias; Separation; Logistic regression; Sample size; Simulations
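The EPV criterion and the separation problem discussed in item 1 can be made concrete with a small sketch. It assumes the common convention that EPV is the size of the smaller outcome category divided by the number of candidate predictor parameters, and it is restricted to a single binary covariate. In that special case separation shows up as a zero cell in the 2×2 table, and Firth's penalised estimate of the log odds ratio reduces to adding 0.5 to every cell. The function names are hypothetical; this is not the simulation code used in the study, and multivariable Firth regression needs dedicated software.

```python
import math


def events_per_variable(n_events: int, n_nonevents: int, n_parameters: int) -> float:
    """EPV: size of the smaller outcome category divided by the number of
    candidate predictor parameters (a common convention, assumed here)."""
    return min(n_events, n_nonevents) / n_parameters


def has_separation(a: int, b: int, c: int, d: int) -> bool:
    """For ONE binary covariate: a, b = events / non-events among exposed,
    c, d = events / non-events among unexposed. A zero cell means the
    covariate perfectly predicts the outcome within one group."""
    return 0 in (a, b, c, d)


def ml_log_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Maximum likelihood log odds ratio; it diverges under separation."""
    if a * d > 0 and b * c > 0:
        return math.log((a * d) / (b * c))
    if a * d > 0:       # b or c is zero: the estimate runs off to +infinity
        return math.inf
    if b * c > 0:       # a or d is zero: the estimate runs off to -infinity
        return -math.inf
    return math.nan     # zeros on both diagonals: not identifiable


def firth_log_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Firth's penalised estimate in the single-binary-covariate case,
    which is equivalent to adding 0.5 to every cell of the table."""
    return math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))


if __name__ == "__main__":
    print(events_per_variable(20, 80, 4))       # 5.0, below the 10 EPV rule
    table = (5, 0, 3, 12)                        # no non-events among exposed
    print(has_separation(*table))                # True
    print(ml_log_odds_ratio(*table))             # inf
    print(firth_log_odds_ratio(*table))          # finite, about 3.67
```

The example table is hypothetical: the ML estimate is infinite, while the penalised estimate stays finite, which is the behaviour the abstract describes for Firth's correction.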
2.  STARD 2015 guidelines for reporting diagnostic accuracy studies: explanation and elaboration 
BMJ Open  2016;6(11):e012799.
Diagnostic accuracy studies are, like other clinical studies, at risk of bias due to shortcomings in design and conduct, and the results of a diagnostic accuracy study may not apply to other patient groups and settings. Readers of study reports need to be informed about study design and conduct, in sufficient detail to judge the trustworthiness and applicability of the study findings. The STARD statement (Standards for Reporting of Diagnostic Accuracy Studies) was developed to improve the completeness and transparency of reports of diagnostic accuracy studies. STARD contains a list of essential items that can be used as a checklist, by authors, reviewers and other readers, to ensure that a report of a diagnostic accuracy study contains the necessary information. STARD was recently updated. All updated STARD materials, including the checklist, are available at Here, we present the STARD 2015 explanation and elaboration document. Through commented examples of appropriate reporting, we clarify the rationale for each of the 30 items on the STARD 2015 checklist, and describe what is expected from authors in developing sufficiently informative study reports.
PMCID: PMC5128957
Reporting quality; Sensitivity and specificity; Diagnostic accuracy; Research waste; Peer review; Medical publishing
3.  STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies 
The BMJ  2015;351:h5527.
Incomplete reporting has been identified as a major source of avoidable waste in biomedical research. Essential information is often not provided in study reports, impeding the identification, critical appraisal, and replication of studies. To improve the quality of reporting of diagnostic accuracy studies, the Standards for Reporting Diagnostic Accuracy (STARD) statement was developed. Here we present STARD 2015, an updated list of 30 essential items that should be included in every report of a diagnostic accuracy study. This update incorporates recent evidence about sources of bias and variability in diagnostic accuracy and is intended to facilitate the use of STARD. As such, STARD 2015 may help to improve completeness and transparency in reporting of diagnostic accuracy studies.
PMCID: PMC4623764  PMID: 26511519
4.  High Hospitalization Rates in Survivors of Childhood Cancer: A Longitudinal Follow-Up Study Using Medical Record Linkage 
PLoS ONE  2016;11(7):e0159518.
Hospitalization rates of childhood cancer survivors (CCS) over time provide insight into the burden that unfavorable health conditions place on CCS and on health care resources. The objective of our study was to examine trends in hospitalizations of CCS, and their risk factors, in comparison with the general population. We performed a medical record linkage study, linking a cohort of 1564 CCS who had survived at least five years with national registers. For each CCS retrieved, we obtained a random sample of the general population matched on year of birth, gender and calendar year. We quantified and compared hospitalization rates of CCS and reference persons from 1995 until 2005, and we analyzed risk factors for hospitalization within the CCS cohort with multivariable Poisson models. We retrieved hospitalization information from 1382 CCS and 25583 reference persons. The overall relative hospitalization rate (RHR) was 2.2 (95%CI:1.9–2.5) for CCS compared to reference persons. CCS with central nervous system and solid tumors had the highest RHRs. Hospitalization rates in CCS were increased compared to reference persons up to at least 30 years after primary diagnosis, with highest rates 5–10 and 20–30 years after primary cancer. RHRs were highest for hospitalizations due to neoplasms (10.7; 95%CI:7.1–16.3) and endocrine/nutritional/metabolic disorders (7.3; 95%CI:4.6–11.7). Female gender (P<0.001), radiotherapy to head and/or neck (P<0.001) or thorax and/or abdomen (P = 0.03) and surgery (P = 0.01) were associated with higher hospitalization rates in CCS. In conclusion, CCS have increased hospitalization rates compared to the general population, up to at least 30 years after primary cancer treatment. These findings imply a high and long-term burden of unfavorable health conditions after childhood cancer on survivors and health care resources.
PMCID: PMC4951023  PMID: 27433937
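Item 4 reports relative hospitalization rates. As a rough illustration of what a rate ratio is, the sketch below computes a crude incidence rate ratio from event counts and person-time, with a Wald confidence interval on the log scale. The counts are hypothetical and the function name is illustrative; the study itself used matched reference persons and multivariable Poisson models, which this does not reproduce.

```python
import math


def rate_ratio_ci(events_1: int, person_time_1: float,
                  events_0: int, person_time_0: float, z: float = 1.96):
    """Crude incidence rate ratio (group 1 vs group 0) with a Wald
    confidence interval computed on the log scale."""
    rr = (events_1 / person_time_1) / (events_0 / person_time_0)
    se_log_rr = math.sqrt(1 / events_1 + 1 / events_0)  # SE of log(RR)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


if __name__ == "__main__":
    # Hypothetical: 60 admissions in 1000 person-years vs 300 in 10000.
    print(rate_ratio_ci(60, 1000, 300, 10000))  # RR about 2.0, CI about 1.52 to 2.64
```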
5.  External Validation of Prediction Models for Pneumonia in Primary Care Patients with Lower Respiratory Tract Infection: An Individual Patient Data Meta-Analysis 
PLoS ONE  2016;11(2):e0149895.
Pneumonia remains difficult to diagnose in primary care. Prediction models based on signs and symptoms (S&S) serve to minimize the diagnostic uncertainty. External validation of these models is essential before implementation into routine practice. In this study all published S&S models for prediction of pneumonia in primary care were externally validated in the individual patient data (IPD) of previously performed diagnostic studies.
Methods and Findings
S&S models for diagnosing pneumonia in adults presenting to primary care with lower respiratory tract infection, and IPD for validation, were identified through a systematic search. Six prediction models and IPD of eight diagnostic studies (N total = 5308, prevalence of pneumonia 12%) were included. Models were assessed on discrimination and calibration. Discrimination was measured using the pooled Area Under the Curve (AUC) and delta AUC, representing the performance of an individual model relative to the average dataset performance. Prediction models by van Vugt et al. and Heckerling et al. demonstrated the highest pooled AUCs, 0.79 (95% CI 0.74–0.85) and 0.72 (0.68–0.76), respectively. Other models by Diehr et al., Singal et al., Melbye et al., and Hopstaken et al. demonstrated pooled AUCs of 0.65 (0.61–0.68), 0.64 (0.61–0.67), 0.56 (0.49–0.63) and 0.53 (0.5–0.56), respectively. A similar ranking was present based on the delta AUCs of the models. Calibration demonstrated close agreement of observed and predicted probabilities in the models by van Vugt et al. and Singal et al.; other models lacked such correspondence. The absence of predictors in the IPD at dataset level hampered a systematic comparison of model performance and could be a limitation of the study.
The model by van Vugt et al. demonstrated the highest discriminative accuracy coupled with reasonable to good calibration across the IPD of different study populations. This model is therefore the main candidate for primary care use.
PMCID: PMC4769284  PMID: 26918859
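The discrimination measure in item 5, the AUC (c-statistic), can be read as the probability that a randomly chosen patient with pneumonia receives a higher predicted risk than a randomly chosen patient without pneumonia. A minimal sketch of that definition for a single dataset follows; the scores are hypothetical, and pooling AUCs across IPD datasets or computing delta AUC is not shown.

```python
def c_statistic(case_scores, noncase_scores):
    """AUC as the probability that a random case scores higher than a
    random non-case; ties count as one half. Quadratic in sample size,
    which is fine for small illustrative data."""
    concordant = 0.0
    for p in case_scores:
        for q in noncase_scores:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(case_scores) * len(noncase_scores))


if __name__ == "__main__":
    # Hypothetical predicted risks for 3 pneumonia cases and 4 non-cases.
    print(c_statistic([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.4]))  # 0.875
```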
6.  Anticipating missing reference standard data when planning diagnostic accuracy studies 
Results obtained using a reference standard may be missing for some participants in diagnostic accuracy studies. This paper looks at methods for dealing with such missing data when designing or conducting a prospective diagnostic accuracy study.
PMCID: PMC4772780  PMID: 26861453
7.  Assessing variability in results in systematic reviews of diagnostic studies 
To describe approaches used in systematic reviews of diagnostic test accuracy studies for assessing variability in estimates of accuracy between studies and to provide guidance in this area.
Meta-analyses of diagnostic test accuracy studies published between May and September 2012 were systematically identified. Information on how the variability in results was investigated was extracted.
Of the 53 meta-analyses included in the review, most (n=48; 91 %) presented variability in diagnostic accuracy estimates visually, through either forest plots or ROC plots, and the majority (n=40; 75 %) presented a test or statistical measure for the variability. Twenty-eight reviews (53 %) tested for variability beyond chance using Cochran’s Q test, and 31 (58 %) quantified it with I². Seven reviews (13 %) presented between-study variance estimates (τ²) from random effects models, and 3 of these presented a prediction interval or ellipse to facilitate interpretation. About half of the meta-analyses specified what was considered a significant amount of variability (n=24; 49 %).
Approaches to assessing variability in estimates of accuracy varied widely between diagnostic test accuracy reviews and there is room for improvement. We provide initial guidance, complemented by an overview of the currently available approaches.
Electronic supplementary material
The online version of this article (doi:10.1186/s12874-016-0108-4) contains supplementary material, which is available to authorized users.
PMCID: PMC4714528  PMID: 26772804
Meta-analysis; Diagnostic techniques and procedures/standards; Sensitivity and specificity; Data interpretation; Statistical; Bias (epidemiology)
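The heterogeneity statistics counted in item 7 (Cochran's Q, I² and τ²) can be sketched for a univariate inverse-variance meta-analysis, for example of log diagnostic odds ratios. The sketch uses the DerSimonian-Laird moment estimator for τ², which is one common choice among several; it does not cover bivariate or HSROC models or prediction regions, and the input values are made up.

```python
def heterogeneity(estimates, variances):
    """Cochran's Q, I^2 and the DerSimonian-Laird tau^2 for a univariate
    inverse-variance meta-analysis (e.g. of log diagnostic odds ratios)."""
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]          # inverse-variance weights
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0   # share of variation beyond chance
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)                   # between-study variance
    return {"pooled": pooled, "Q": q, "df": df, "I2": i2, "tau2": tau2}


if __name__ == "__main__":
    # Three hypothetical studies with equal within-study variance.
    print(heterogeneity([1.0, 2.0, 3.0], [0.1, 0.1, 0.1]))
    # pooled 2.0, Q 20.0, df 2, I2 0.9, tau2 0.9
```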
8.  Internet-Based Early Intervention to Prevent Posttraumatic Stress Disorder in Injury Patients: Randomized Controlled Trial 
Posttraumatic stress disorder (PTSD) develops in 10-20% of injury patients. We developed a novel, self-guided Internet-based intervention (called Trauma TIPS) based on techniques from cognitive behavioral therapy (CBT) to prevent the onset of PTSD symptoms.
To determine whether Trauma TIPS is effective in preventing the onset of PTSD symptoms in injury patients.
Adult, level 1 trauma center patients were randomly assigned to receive the fully automated Trauma TIPS Internet intervention (n=151) or to receive no early intervention (n=149). Trauma TIPS consisted of psychoeducation, in vivo exposure, and stress management techniques. Both groups were free to use care as usual (nonprotocolized talks with hospital staff). PTSD symptom severity was assessed at 1, 3, 6, and 12 months post injury with a clinical interview (Clinician-Administered PTSD Scale) by blinded trained interviewers and self-report instrument (Impact of Event Scale—Revised). Secondary outcomes were acute anxiety and arousal (assessed online), self-reported depressive and anxiety symptoms (Hospital Anxiety and Depression Scale), and mental health care utilization. Intervention usage was documented.
The mean number of intervention logins was 1.7, SD 2.5, median 1, interquartile range (IQR) 1-2. Thirty-four patients in the intervention group did not log in (22.5%), 63 (41.7%) logged in once, and 54 (35.8%) logged in multiple times (mean 3.6, SD 3.5, median 3, IQR 2-4). On clinician-assessed and self-reported PTSD symptoms, both the intervention and control group showed a significant decrease over time (P<.001) without significant differences in trend. PTSD at 12 months was diagnosed in 4.7% of controls and 4.4% of intervention group patients. There were no group differences on anxiety or depressive symptoms over time. Post hoc analyses using latent growth mixture modeling showed a significant decrease in PTSD symptoms in a subgroup of patients with severe initial symptoms (n=20) (P<.001).
Our results do not support the efficacy of the Trauma TIPS Internet-based early intervention in the prevention of PTSD symptoms for an unselected population of injury patients. Moreover, uptake was relatively low since one-fifth of individuals did not log in to the intervention. Future research should therefore focus on innovative strategies to increase intervention usage, for example, adding gameplay, embedding it in a blended care context, and targeting high-risk individuals who are more likely to benefit from the intervention.
Trial Registration
International Standard Randomized Controlled Trial Number (ISRCTN): 57754429; (Archived by WebCite at
PMCID: PMC3742408  PMID: 23942480
early intervention; prevention; Internet; posttraumatic stress disorder; cognitive behavior therapy
9.  The need to balance merits and limitations from different disciplines when considering the stepped wedge cluster randomized trial design 
Various papers have addressed the pros and cons of the stepped wedge cluster randomized trial design (SWD). However, some issues have been addressed only partially or not at all. Our aim was to provide a comprehensive overview of all merits and limitations of the SWD to assist researchers, reviewers and medical ethics committees when deciding on the appropriateness of the SWD for a particular study.
We performed an initial search to identify articles with a methodological focus on the SWD, and categorized and discussed all reported advantages and disadvantages of the SWD. Additional aspects were identified during multidisciplinary meetings in which ethicists, biostatisticians, clinical epidemiologists and health economists participated. All aspects of the SWD were compared to the parallel group cluster randomized design. We assigned the merits and limitations of the SWD to distinct phases in the design and conduct of such studies, highlighting that their impact may vary depending on the context of the study, or that benefits in one phase may be offset by drawbacks in another. Furthermore, a real-life illustration is provided.
New aspects were identified within all disciplines. Examples of newly identified aspects of an SWD are: the possibility of measuring a treatment effect in each cluster to examine the (in)consistency of effects across clusters, the detrimental effect of lower-than-expected inclusion rates, deviation from the ordinary informed consent process, and the question of whether studies using the SWD are likely to have sufficient social value. Discussions are provided on, e.g., clinical equipoise, social value, health economic decision making, number of study arms, and interim analyses.
Deciding on the use of the SWD involves aspects and considerations from different disciplines, not all of which have been discussed before. The pros and cons of this design should be weighed against those of other feasible design options so as to choose the optimal design for a particular intervention study.
PMCID: PMC4627408  PMID: 26514920
Epidemiologic research design; Stepped wedge design; Cluster randomized trial; Health economics; Research ethics; Biostatistics
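To make the design in item 9 concrete, here is a sketch of a stepped wedge schedule: every cluster starts in the control condition, groups of clusters cross over to the intervention at successive steps in a randomised order, and by the last period all clusters are exposed. Because each cluster contributes both control and intervention periods, an effect can in principle be estimated per cluster. The function name, the single baseline period and the (roughly) equal group sizes are assumptions of this illustration, not part of the paper.

```python
import math
import random


def stepped_wedge_schedule(clusters, n_steps, seed=None):
    """Return {cluster: [0/1 per period]} with 0 = control, 1 = intervention.
    Period 0 is an all-control baseline; one group crosses over per step."""
    order = list(clusters)
    random.Random(seed).shuffle(order)            # randomise crossover order
    per_step = math.ceil(len(order) / n_steps)    # clusters switching per step
    schedule = {}
    for position, cluster in enumerate(order):
        crossover = position // per_step + 1      # first intervention period
        schedule[cluster] = [int(period >= crossover) for period in range(n_steps + 1)]
    return schedule


if __name__ == "__main__":
    for cluster, row in sorted(stepped_wedge_schedule("ABCDEF", 3, seed=1).items()):
        print(cluster, row)
```

Once a cluster switches to the intervention it never switches back, which is what distinguishes this design from a crossover trial.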
11.  Studying Hospitalizations and Mortality in the Netherlands: Feasible and Valid Using Two-Step Medical Record Linkage with Nationwide Registers 
PLoS ONE  2015;10(7):e0132444.
In the Netherlands, the postal code is needed to study hospitalizations of individuals in the nationwide hospitalization register. Studying hospitalizations longitudinally becomes troublesome if individuals change address. We aimed to report on the feasibility and validity of a two-step medical record linkage approach to examine longitudinal trends in hospitalizations and mortality in a study cohort. First, we linked a study cohort of 1564 survivors of childhood cancer with the Municipal Personal Records Database (GBA), which has postal code history and mortality data available. Within GBA, we sampled a reference population matched on year of birth, gender and calendar year. Second, we extracted hospitalizations from the Hospital Discharge Register (LMR) with a date of discharge during unique follow-up (based on date of birth, gender and postal code in GBA). We calculated the agreement of death and being hospitalized in survivors according to the registers and to available cohort data. We retrieved 1477 (94%) survivors from GBA. Median percentages of unique/potential follow-up were 87% (survivors) and 83% (reference persons). Characteristics of survivors and reference persons contributing to unique follow-up were comparable. Agreement of hospitalization during unique follow-up was 94% and agreement of death was 98%. In the absence of unique identifiers in the Dutch hospitalization register, it is feasible and valid to study hospitalizations and mortality of individuals longitudinally using a two-step medical record linkage approach. Cohort studies in the Netherlands have the opportunity to study mortality and hospitalization rates over time. These outcomes provide insight into the burden of clinical events and healthcare use in studies on patients at risk of long-term morbidities.
PMCID: PMC4493069  PMID: 26147988
12.  Circulating antigen tests and urine reagent strips for diagnosis of active schistosomiasis in endemic areas 
Point-of-care (POC) tests for diagnosing schistosomiasis include tests based on circulating antigen detection and urine reagent strip tests. If they had sufficient diagnostic accuracy they could replace conventional microscopy as they provide a quicker answer and are easier to use.
To summarise the diagnostic accuracy of: a) urine reagent strip tests in detecting active Schistosoma haematobium infection, with microscopy as the reference standard; and b) circulating antigen tests for detecting active Schistosoma infection in geographical regions endemic for Schistosoma mansoni or S. haematobium or both, with microscopy as the reference standard.
Search methods
We searched the electronic databases MEDLINE, EMBASE, BIOSIS, MEDION, and Health Technology Assessment (HTA) without language restriction up to 30 June 2014.
Selection criteria
We included studies that used microscopy as the reference standard: for S. haematobium, microscopy of urine prepared by filtration, centrifugation, or sedimentation methods; and for S. mansoni, microscopy of stool by Kato-Katz thick smear. We included studies on participants residing in endemic areas only.
Data collection and analysis
Two review authors independently extracted data, assessed quality of the data using QUADAS-2, and performed meta-analysis where appropriate. Given the variability in test thresholds, we used the hierarchical summary receiver operating characteristic (HSROC) model for all eligible tests (except the circulating cathodic antigen (CCA) POC test for S. mansoni, for which the bivariate random-effects model was more appropriate). We investigated heterogeneity, and carried out indirect comparisons where data were sufficient. Results for sensitivity and specificity are presented as percentages with 95% confidence intervals (CI).
Main results
We included 90 studies; 88 from field settings in Africa. The median S. haematobium infection prevalence was 41% (range 1% to 89%) and 36% for S. mansoni (range 8% to 95%). Study design and conduct were poorly reported against current standards.
Tests for S. haematobium
Urine reagent test strips versus microscopy
Compared to microscopy, the detection of microhaematuria on test strips had the highest sensitivity and specificity (sensitivity 75%, 95% CI 71% to 79%; specificity 87%, 95% CI 84% to 90%; 74 studies, 102,447 participants). For proteinuria, sensitivity was 61% and specificity was 82% (82,113 participants); and for leukocyturia, sensitivity was 58% and specificity 61% (1532 participants). However, overall test accuracy did not differ significantly between the urine reagent strips for microhaematuria and proteinuria, either when we compared separate populations (P = 0.25) or when direct comparisons within the same individuals were performed (paired studies; P = 0.21).
When tests were evaluated against the higher quality reference standard (when multiple samples were analysed), sensitivity was marginally lower for microhaematuria (71% vs 75%) and for proteinuria (49% vs 61%). The specificity of these tests was comparable.
Antigen assay
Compared to microscopy, the CCA test showed considerable heterogeneity; meta-analytic sensitivity estimate was 39%, 95% CI 6% to 73%; specificity 78%, 95% CI 55% to 100% (four studies, 901 participants).
Tests for S. mansoni
Compared to microscopy, the CCA test meta-analytic estimates for detecting S. mansoni at a single threshold of trace positive were: sensitivity 89% (95% CI 86% to 92%); and specificity 55% (95% CI 46% to 65%; 15 studies, 6091 participants). Against a higher quality reference standard, the sensitivity results were comparable (89% vs 88%) but specificity was higher (66% vs 55%). For the CAA test, sensitivity ranged from 47% to 94%, and specificity from 8% to 100% (4 studies, 1583 participants).
Authors' conclusions
Among the evaluated tests for S. haematobium infection, microhaematuria correctly detected the largest proportions of infections and non-infections identified by microscopy.
The CCA POC test for S. mansoni detects a very large proportion of infections identified by microscopy, but it misclassifies a large proportion of microscopy negatives as positives in endemic areas with a moderate to high prevalence of infection, possibly because the test is more sensitive than microscopy.
Plain Language Summary
How well do point-of-care tests detect Schistosoma infections in people living in endemic areas?
Schistosomiasis, also known as bilharzia, is a parasitic disease common in the tropics and subtropics. Point-of-care tests and urine reagent strip tests are quicker and easier to use than microscopy. We estimate how well these point-of-care tests are able to detect schistosomiasis infections compared with microscopy.
We searched for studies published in any language up to 30 June 2014, and we considered the study’s risk of providing biased results.
What do the results say?
We included 90 studies involving almost 200,000 people, with 88 of these studies carried out in Africa in field settings. Study design and conduct were poorly reported against current expectations. Based on our statistical model, we found:
• Among the urine strips for detecting urinary schistosomiasis, the strips for detecting blood were better than those detecting protein or white cells (sensitivity and specificity for blood 75% and 87%; for protein 61% and 82%; and for white cells 58% and 61%, respectively).
• For urinary schistosomiasis, the parasite antigen test performed worse (sensitivity 39% and specificity 78%) than the urine strips for detecting blood.
• For intestinal schistosomiasis, the parasite antigen urine test detected many infections identified by microscopy but wrongly labelled many uninfected people as sick (sensitivity 89% and specificity 55%).
What are the consequences of using these tests?
If we take 1000 people, of whom 410 have urinary schistosomiasis on microscopy testing, then using the strip detecting blood in the urine would misclassify 77 uninfected people as infected, who thus may receive unnecessary treatment; and it would wrongly classify 102 infected people as uninfected, who thus may not receive treatment.
If we take 1000 people, of whom 360 have intestinal schistosomiasis on microscopy testing, then the antigen test would misclassify 288 uninfected people as infected. These people may be given unnecessary treatment. This test would also wrongly classify 40 infected people as uninfected, who thus may not receive treatment.
Conclusion of review
For urinary schistosomiasis, the urine strip for detecting blood leads to some infected people being missed and some non-infected people being diagnosed with the condition, but is better than the protein or white cell tests. The parasite antigen test is not accurate.
For intestinal schistosomiasis, the parasite antigen urine test can wrongly classify many uninfected people as infected.
PMCID: PMC4455231  PMID: 25758180
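The "1000 people" figures in the plain language summary of item 12 follow directly from prevalence, sensitivity and specificity. A minimal sketch of that arithmetic is below (the function name is illustrative). With the summary estimates it gives 76.7 false positives and 102.5 false negatives for the blood strip, and 288 false positives and 39.6 false negatives for the CCA test, close to the review's rounded figures of 77, 102, 288 and 40.

```python
def natural_frequencies(n: int, n_diseased: int, sensitivity: float, specificity: float):
    """Expected 2x2 counts for n people, n_diseased of whom are positive
    on the reference standard (here microscopy)."""
    n_healthy = n - n_diseased
    true_pos = n_diseased * sensitivity
    false_pos = n_healthy * (1 - specificity)  # uninfected labelled infected
    return {
        "true_pos": true_pos,
        "false_neg": n_diseased - true_pos,    # infected labelled uninfected
        "false_pos": false_pos,
        "true_neg": n_healthy - false_pos,
    }


if __name__ == "__main__":
    print(natural_frequencies(1000, 410, 0.75, 0.87))  # blood strip, urinary
    print(natural_frequencies(1000, 360, 0.89, 0.55))  # CCA test, intestinal
```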
13.  Small-study effects and time trends in diagnostic test accuracy meta-analyses: a meta-epidemiological study 
Systematic Reviews  2015;4:66.
Small-study effects and time trends have been identified in meta-analyses of randomized trials. We evaluated whether these effects are also present in meta-analyses of diagnostic test accuracy studies.
A systematic search identified test accuracy meta-analyses published between May and September 2012. In each meta-analysis, the strength of the associations between estimated accuracy of the test (diagnostic odds ratio (DOR), sensitivity, and specificity) and sample size and between accuracy estimates and time since first publication were evaluated using meta-regression models. The regression coefficients over all meta-analyses were summarized using random effects meta-analysis.
Forty-six meta-analyses and their corresponding primary studies (N = 859) were included. There was a non-significant relative change in the DOR of 1.01 per 100 additional participants (95% CI 1.00 to 1.03; P = 0.07). In the subgroup of imaging studies, there was a relative increase in sensitivity of 1.13 per 100 additional diseased subjects (95% CI 1.05 to 1.22; P = 0.002). The relative change in DOR with time since first publication was 0.94 per 5 years (95% CI 0.80 to 1.10; P = 0.42). Sensitivity was lower in studies published later (relative change 0.89, 95% CI 0.80 to 0.99; P = 0.04).
Small-study effects and time trends do not seem to be as pronounced in meta-analyses of test accuracy studies as they are in meta-analyses of randomized trials. Small-study effects seem to be reversed in imaging, where larger studies tend to report higher sensitivity.
Electronic supplementary material
The online version of this article (doi:10.1186/s13643-015-0049-8) contains supplementary material, which is available to authorized users.
PMCID: PMC4450491  PMID: 25956716
Diagnostic test accuracy; Sensitivity; Specificity; Meta-analyses; Publication bias; Small-study effects; Time trends; Systematic reviews
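Item 13 analyses the diagnostic odds ratio (DOR) and reports relative changes per 100 participants or per 5 years. The sketch below computes the DOR from a sensitivity and specificity pair and, assuming the meta-regression was fitted on the log scale (the abstract does not state this explicitly), converts a slope into a multiplicative change over a given number of units. Both helpers are hypothetical names for illustration.

```python
import math


def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Odds of a positive test in the diseased divided by the odds of a
    positive test in the non-diseased."""
    return (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity)


def relative_change_per(slope_per_unit: float, units: float) -> float:
    """Hypothetical helper: multiplicative change in an outcome modelled on
    the log scale, over `units` (e.g. 100 extra participants)."""
    return math.exp(slope_per_unit * units)


if __name__ == "__main__":
    print(diagnostic_odds_ratio(0.75, 0.87))             # about 20.1
    print(relative_change_per(math.log(1.01) / 100, 100))  # about 1.01
```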
14.  Clinical decision-making of cardiologists regarding admission and treatment of patients with suspected unstable angina or non-ST-elevation myocardial infarction: protocol of a clinical vignette study 
BMJ Open  2015;5(4):e006441.
Cardiologists face the difficult task of rapidly distinguishing cardiac-related chest pain from other conditions, and of thoroughly considering whether invasive diagnostic procedures or treatments are indicated. The use of cardiac risk-scoring instruments has been recommended in international cardiac guidelines. However, it is unknown to what degree cardiac risk scores and other clinical information influence cardiologists’ decision-making. This paper describes the development of a binary choice experiment using realistic descriptions of clinical cases. The study aims to determine the importance cardiologists put on different types of clinical information, including cardiac risk scores, when deciding on the management of patients with suspected unstable angina or non-ST-elevation myocardial infarction.
Methods and analysis
Cardiologists were asked, in a nationwide survey, to weigh different clinical factors in decision-making regarding patient admission and treatment using realistic descriptions of patients in which specific characteristics are varied in a systematic way (ie, web-based clinical vignettes). These vignettes represent patients with suspected unstable angina or non-ST-elevation myocardial infarction. Associations between several clinical characteristics and cardiologists’ management decisions will be analysed using generalised linear mixed models.
Ethics and dissemination
The study has received ethics approval and informed consent will be obtained from all participating cardiologists. The results of the study will provide insight into the relative importance of cardiac risk scores and other clinical information in cardiac decision-making. Further, the results will indicate cardiologists’ adherence to the European Society of Cardiology guideline recommendations. In addition, the detailed description of the method of vignette development applied in this study could assist other researchers or clinicians in creating future choice experiments.
PMCID: PMC4390690  PMID: 25854966
case scenarios; acute coronary syndromes; risk assessment; decision making
15.  Are novel non-invasive imaging techniques needed in patients with suspected prosthetic heart valve endocarditis? A systematic review and meta-analysis 
European Radiology  2015;25(7):2125-2133.
Multimodal non-invasive imaging plays a key role in establishing a diagnosis of prosthetic heart valve (PHV) endocarditis. The objective of this study was to provide a systematic review of the literature and a meta-analysis of the diagnostic accuracy of transthoracic echocardiography (TTE), transesophageal echocardiography (TEE), and multidetector computed tomography (MDCT) in patients with (suspected) PHV endocarditis.
Studies published between 1985 and 2013 were identified via search and cross-reference of PubMed/Embase databases. Studies were included if (1) they reported on the non-invasive index tests TTE, TEE, or MDCT; (2) data were provided on PHV endocarditis as the condition of interest; and (3) imaging results were verified against either surgical inspection/autopsy or clinical follow-up reference standards, thereby enabling the extraction of 2-by-2 tables.
Twenty articles (including 496 patients) met the inclusion criteria for PHV endocarditis. TTE, TEE, and MDCT + TEE had a pooled sensitivity/specificity for vegetations of 29/100 %; 82/95 %, and 88/94 %, respectively. The pooled sensitivity/specificity of TTE, TEE, and MDCT + TEE for periannular complications was 36/93 %, 86/98 %, and 100/94 %, respectively.
TEE showed good sensitivity and specificity for establishing a diagnosis of PHV endocarditis. Although MDCT data are limited, this review showed that MDCT in addition to TEE may improve sensitivity in detecting life-threatening periannular complications.
Key Points
• Multimodal imaging is an important ingredient of diagnostic workup for PHV endocarditis.
• Transthoracic and transesophageal echography may miss life-threatening periannular complications.
• MDCT can improve sensitivity for the detection of life-threatening periannular complications.
Electronic supplementary material
The online version of this article (doi:10.1007/s00330-015-3605-7) contains supplementary material, which is available to authorized users.
PMCID: PMC4457913  PMID: 25680715
Echocardiography; Computed tomography; Endocarditis; Prosthetic heart valve; Systematic review
16.  Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement 
BMC Medicine  2015;13:1.
Prediction models are developed to aid health care providers in estimating the probability or risk that a specific disease or condition is present (diagnostic models) or that a specific event will occur in the future (prognostic models), to inform their decision making. However, overwhelming evidence shows that the quality of reporting of prediction model studies is poor. Only with full and clear reporting of information on all aspects of a prediction model can risk of bias and potential usefulness of prediction models be adequately assessed. The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Initiative developed a set of recommendations for the reporting of studies developing, validating, or updating a prediction model, whether for diagnostic or prognostic purposes. This article describes how the TRIPOD Statement was developed. An extensive list of items based on a review of the literature was created, which was reduced after a Web-based survey and revised during a 3-day meeting in June 2011 with methodologists, health care professionals, and journal editors. The list was refined during several meetings of the steering group and in e-mail discussions with the wider group of TRIPOD contributors. The resulting TRIPOD Statement is a checklist of 22 items, deemed essential for transparent reporting of a prediction model study. The TRIPOD Statement aims to improve the transparency of the reporting of a prediction model study regardless of the study methods used. The TRIPOD Statement is best used in conjunction with the TRIPOD explanation and elaboration document. To aid the editorial process and readers of prediction model studies, it is recommended that authors include a completed checklist in their submission (also available at
Editors’ note: In order to encourage dissemination of the TRIPOD Statement, this article is freely accessible on the Annals of Internal Medicine Web site ( and will also be published in BJOG, British Journal of Cancer, British Journal of Surgery, BMC Medicine, British Medical Journal, Circulation, Diabetic Medicine, European Journal of Clinical Investigation, European Urology, and Journal of Clinical Epidemiology. The authors jointly hold the copyright of this article. An accompanying Explanation and Elaboration article is freely available only on; Annals of Internal Medicine holds copyright for that article.
PMCID: PMC4284921  PMID: 25563062
Prediction models; Prognostic; Diagnostic; Model development; Validation; Transparency; Reporting
17.  Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies: The CHARMS Checklist 
PLoS Medicine  2014;11(10):e1001744.
Carl Moons and colleagues provide a checklist and background explanation for critically appraising and extracting data from systematic reviews of prognostic and diagnostic prediction modelling studies.
PMCID: PMC4196729  PMID: 25314315
18.  Latent class bivariate model for the meta-analysis of diagnostic test accuracy studies 
Several types of statistical methods are currently available for the meta-analysis of studies on diagnostic test accuracy. One of these methods is the Bivariate Model which involves a simultaneous analysis of the sensitivity and specificity from a set of studies. In this paper, we review the characteristics of the Bivariate Model and demonstrate how it can be extended with a discrete latent variable. The resulting clustering of studies yields additional insight into the accuracy of the test of interest.
A Latent Class Bivariate Model is proposed. This model captures the between-study variability in sensitivity and specificity by assuming that studies belong to one of a small number of latent classes. This yields a description of the heterogeneity between studies that is both easier to interpret and more precise. Latent classes may differ not only with respect to the average sensitivity and specificity, but also with respect to the correlation between sensitivity and specificity.
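To make the latent-class idea concrete, the sketch below computes, for a single study, the posterior probability of belonging to each latent class, given each class's weight, mean logit sensitivity and specificity, standard deviations and within-class correlation. It is a simplified illustration under stated assumptions, not the authors' estimation procedure: the class parameters are treated as known (in practice they would be estimated from the data), within-study binomial sampling error is ignored, and a 0.5 continuity correction is applied. All function names, parameter names and numbers are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def bivariate_normal_pdf(x, y, mx, my, sx, sy, rho):
    """Density of a bivariate normal with means (mx, my), SDs (sx, sy), correlation rho."""
    zx = (x - mx) / sx
    zy = (y - my) / sy
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * sx * sy * math.sqrt(1.0 - rho * rho))


def class_posteriors(tp, fn, tn, fp, classes):
    """Posterior probability that one study belongs to each latent class.

    Each class is a dict with hypothetical keys: weight, mean_sens, mean_spec
    (both on the logit scale), sd_sens, sd_spec and rho.
    """
    # Observed logit sensitivity and specificity, with a 0.5 continuity correction
    x = logit((tp + 0.5) / (tp + fn + 1.0))
    y = logit((tn + 0.5) / (tn + fp + 1.0))
    joint = [
        c["weight"] * bivariate_normal_pdf(
            x, y, c["mean_sens"], c["mean_spec"], c["sd_sens"], c["sd_spec"], c["rho"]
        )
        for c in classes
    ]
    total = sum(joint)
    return [j / total for j in joint]


# Two hypothetical classes: high sensitivity/moderate specificity vs the reverse
classes = [
    {"weight": 0.5, "mean_sens": logit(0.9), "mean_spec": logit(0.7),
     "sd_sens": 0.5, "sd_spec": 0.5, "rho": -0.3},
    {"weight": 0.5, "mean_sens": logit(0.6), "mean_spec": logit(0.9),
     "sd_sens": 0.5, "sd_spec": 0.5, "rho": 0.2},
]
posteriors = class_posteriors(tp=45, fn=5, tn=35, fp=15, classes=classes)
```

A study with roughly 89% sensitivity and 70% specificity is assigned almost entirely to the first class; differences in the class-specific `rho` are how the model lets classes differ in the sensitivity-specificity correlation.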
The Latent Class Bivariate Model identifies clusters of studies with their own estimates of sensitivity and specificity. Our simulation study demonstrated excellent parameter recovery and good performance of the model selection statistics typically used in latent class analysis. Application in a real data example on coronary artery disease showed that the inclusion of latent classes yields interesting additional information.
Our proposed new meta-analysis method can lead to a better fit of the data set of interest, less biased estimates and more reliable confidence intervals for sensitivities and specificities. More importantly, it may also serve as an exploratory tool for subsequent sub-group meta-analyses.
PMCID: PMC4105799  PMID: 25015209
Meta-analysis; Meta-regression; Bivariate model; Latent class model
19.  Comparing Screening Instruments to Predict Posttraumatic Stress Disorder 
PLoS ONE  2014;9(5):e97183.
Following traumatic exposure, a proportion of trauma victims develop posttraumatic stress disorder (PTSD). Early PTSD risk screening requires sensitive instruments to identify all individuals at risk of developing PTSD who need diagnostic follow-up.
This study compares the accuracy of the 4-item SPAN, 10-item Trauma Screening Questionnaire (TSQ) and 22-item Impact of Event Scale-Revised (IES-R) in predicting chronic PTSD at a minimum sensitivity of 80%.
Injury patients admitted to a level-I trauma centre (N = 311) completed the instruments at a median of 23 days and were clinically assessed for PTSD at 6 months. Areas under the curve and specificities at 80% sensitivity were compared between instruments.
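Comparing instruments at a fixed minimum sensitivity amounts to choosing, for each instrument, the cut-off that gives the highest specificity while keeping sensitivity at or above 80%. A minimal sketch, assuming higher scores indicate higher PTSD risk and that a score at or above the cut-off counts as screen-positive; the function name and the toy data are hypothetical, not the study's.

```python
def specificity_at_sensitivity(scores, labels, min_sensitivity=0.80):
    """Highest cut-off whose sensitivity is at least min_sensitivity.

    Assumes higher scores mean higher risk and score >= cut-off is screen-positive.
    labels: 1 = PTSD at follow-up, 0 = no PTSD.
    Returns (cut-off, sensitivity, specificity).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case and one non-case")
    # Scan cut-offs from highest to lowest; specificity can only fall as the cut-off drops
    for cutoff in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= cutoff for s in positives) / len(positives)
        if sensitivity >= min_sensitivity:
            specificity = sum(s < cutoff for s in negatives) / len(negatives)
            return cutoff, sensitivity, specificity
    return None


# Hypothetical screening scores and 6-month PTSD status for ten patients
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
result = specificity_at_sensitivity(scores, labels)
```

Because the scan runs from the highest cut-off downward, the first cut-off that reaches the sensitivity target is also the one with the highest specificity.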
Areas under the curve in all instruments were adequate (SPAN: 0.83; TSQ: 0.82; IES-R: 0.83) with no significant differences. At 80% sensitivity, specificities were 64% for SPAN, 59% for TSQ and 72% for IES-R.
The SPAN, TSQ and IES-R show similar accuracy in early detection of individuals at risk for PTSD, despite differences in number of items. The modest specificities and low positive predictive values found for all instruments could lead to a relatively large number of false-positive cases when applied in clinical practice.
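The link between modest specificity and a low positive predictive value (PPV) follows from Bayes' theorem: PPV = sens × prev / (sens × prev + (1 − spec) × (1 − prev)). The abstract does not report the PTSD prevalence, so the sketch below assumes a hypothetical 10% prevalence purely for illustration, combined with the reported specificities at 80% sensitivity.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


ASSUMED_PREVALENCE = 0.10  # hypothetical; not reported in the abstract
reported_specificity = {"SPAN": 0.64, "TSQ": 0.59, "IES-R": 0.72}  # at 80% sensitivity
ppv = {
    name: positive_predictive_value(0.80, spec, ASSUMED_PREVALENCE)
    for name, spec in reported_specificity.items()
}
```

Under this assumed prevalence the PPVs are roughly 0.20 (SPAN), 0.18 (TSQ) and 0.24 (IES-R), so more than three-quarters of screen-positive patients would be false positives with any of the three instruments.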
PMCID: PMC4016271  PMID: 24816642
20.  Incorporating quality assessments of primary studies in the conclusions of diagnostic accuracy reviews: a cross-sectional study 
Drawing conclusions from systematic reviews of test accuracy studies without considering the methodological quality (risk of bias) of included studies may lead to unwarranted optimism about the value of the test(s) under study. We sought to identify to what extent the results of quality assessment of included studies are incorporated in the conclusions of diagnostic accuracy reviews.
We searched MEDLINE and EMBASE for test accuracy reviews published between May and September 2012. We examined the abstracts and main texts of these reviews to see whether and how the results of quality assessment were linked to the accuracy estimates when drawing conclusions.
We included 65 reviews, of which 53 contained a meta-analysis. Sixty articles (92%) had formally assessed the methodological quality of included studies, most often using the original QUADAS tool (n = 44, 68%). Quality assessment was mentioned in 28 abstracts (43%), most of which (n = 21) mentioned it in the methods section. In only 5 abstracts (8%) were results of quality assessment incorporated in the conclusions. Thirteen reviews (20%) presented results of quality assessment in the main text only, without further discussion. Forty-seven reviews (72%) discussed results of quality assessment, most often as limitations in assessing quality (n = 28). Only 6 reviews (9%) further linked the results of quality assessment to their conclusions, 3 of which did not conduct a meta-analysis because of limitations in the quality of included studies. Of the reviews with a meta-analysis, 19 (36%) incorporated quality in the analysis. Eight reported significant effects of quality on the pooled estimates, but none of them factored these effects into their conclusions.
While almost all recent diagnostic accuracy reviews evaluate the quality of included studies, very few consider the results of quality assessment when drawing conclusions. The practice of reporting systematic reviews of test accuracy should improve if readers are to be informed not only about the limitations of the available evidence but also about the associated implications for the performance of the evaluated tests.
PMCID: PMC3942773  PMID: 24588874
Diagnostic tests; Test accuracy; Systematic reviews; Meta-analysis; Quality; QUADAS; Risk of bias
21.  Use of Expert Panels to Define the Reference Standard in Diagnostic Research: A Systematic Review of Published Methods and Reporting 
PLoS Medicine  2013;10(10):e1001531.
Loes C. M. Bertens and colleagues survey the published diagnostic research literature for use of expert panels to define the reference standard, characterize components and missing information, and recommend elements that should be reported in diagnostic studies.
Please see later in the article for the Editors' Summary
In diagnostic studies, a single and error-free test that can be used as the reference (gold) standard often does not exist. One solution is the use of panel diagnosis, i.e., a group of experts who assess the results from multiple tests to reach a final diagnosis in each patient. Although panel diagnosis, also known as consensus or expert diagnosis, is frequently used as the reference standard, guidance on preferred methodology is lacking. The aim of this study is to provide an overview of methods used in panel diagnoses and to provide initial guidance on the use and reporting of panel diagnosis as reference standard.
Methods and Findings
PubMed was systematically searched for diagnostic studies applying a panel diagnosis as reference standard published up to May 31, 2012. We included diagnostic studies in which the final diagnosis was made by two or more persons based on results from multiple tests. General study characteristics and details of panel methodology were extracted. Eighty-one studies were included, of which most reported on psychiatric (37%) and cardiovascular (21%) diseases. Data extraction was hampered by incomplete reporting; one or more pieces of critical information about panel reference standard methodology were missing in 83% of studies. In most studies (75%), the panel consisted of three or fewer members. Panel members were blinded to the index test results in 31% of studies. Reproducibility of the decision process was assessed in 17 (21%) studies. Reported details on panel constitution, information for diagnosis and methods of decision making varied considerably between studies.
Methods of panel diagnosis varied substantially across studies, and many aspects of the procedure were either unclear or not reported. On the basis of our review, we identified areas for improvement and developed a checklist and flow chart to provide initial guidance for researchers conducting and reporting studies involving panel diagnosis.
Editors' Summary
Before any disease or condition can be treated, a correct diagnosis of the condition has to be made. Faced with a patient with medical problems and no diagnosis, a doctor will ask the patient about their symptoms and medical history and generally will examine the patient. On the basis of this questioning and examination, the clinician will form an initial impression of the possible conditions the patient may have, usually with a most likely diagnosis in mind. To support or reject the most likely diagnosis and to exclude the other possible diagnoses, the clinician will then order a series of tests and diagnostic procedures. These may include laboratory tests (such as the measurement of blood sugar levels), imaging procedures (such as an MRI scan), or functional tests (such as spirometry, which tests lung function). Finally, the clinician will use all the data s/he has collected to reach a firm diagnosis and will recommend a program of treatment or observation for the patient.
Why Was This Study Done?
Researchers are continually looking for new, improved diagnostic tests and multivariable diagnostic models—combinations of tests and characteristics that point to a diagnosis. Diagnostic research, which assesses the accuracy of new tests and models, requires that each patient involved in a diagnostic study has a final correct diagnosis. Unfortunately, for most conditions, there is no single, error-free test that can be used as the reference (gold) standard for diagnosis. If an imperfect reference standard is used, errors in the final disease classification may bias the results of the diagnostic study and may lead to a new test being adopted that is actually less accurate than existing tests. One widely used solution to the lack of a reference standard is “panel diagnosis” in which two or more experts assess the results from multiple tests to reach a final diagnosis for each patient in a diagnostic study. However, there is currently no formal guidance available on the conduct and reporting of panel diagnosis. Here, the researchers undertake a systematic review (a study that uses predefined criteria to identify research on a given topic) to provide an overview of the methodology and reporting of panel diagnosis.
What Did the Researchers Do and Find?
The researchers identified 81 published diagnostic studies that used panel diagnosis as a reference standard. 37% of these studies reported on psychiatric diseases, 21% reported on cardiovascular diseases, and 12% reported on respiratory diseases. Most of the studies (64%) were designed to assess the accuracy of one or more diagnostic tests. Notably, one or more critical pieces of information on methodology were missing in 83% of the studies. Specifically, information on the constitution of the panel was missing in a quarter of the studies, and information on the decision-making process (whether, for example, a diagnosis was reached by discussion among panel members or by combining individual panel members' assessments) was incomplete in more than two-thirds of the studies. In three-quarters of the studies for which information was available, the panel consisted of only two or three members; different fields of expertise were represented in the panels in nearly two-thirds of the studies. In a third of the studies for which information was available, panel members made their diagnoses without access to the results of the test being assessed. Finally, the reproducibility of the decision-making process was assessed in a fifth of the studies.
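The review distinguishes panels that reach a diagnosis by discussion from those that combine individual members' assessments, and notes that reproducibility was rarely assessed. As a hedged illustration of one possible combination rule and one possible reproducibility check, the sketch below uses a simple majority vote (ties returned as undecided, for example to be settled by discussion) and Cohen's kappa between two rounds of panel decisions on the same patients. Neither method is prescribed by the review; the function names and data are hypothetical.

```python
from collections import Counter


def majority_vote(assessments):
    """Combine individual panel members' 0/1 diagnoses for one patient.

    Returns 1 or 0 for a strict majority, or None for a tie (to be resolved,
    for example, by discussion among panel members).
    """
    counts = Counter(assessments)
    if counts[1] > counts[0]:
        return 1
    if counts[0] > counts[1]:
        return 0
    return None


def cohens_kappa(first_round, second_round):
    """Chance-corrected agreement between two sets of 0/1 panel decisions
    made on the same patients (e.g. an original and a repeated panel round)."""
    n = len(first_round)
    observed = sum(a == b for a, b in zip(first_round, second_round)) / n
    p1 = sum(first_round) / n
    p2 = sum(second_round) / n
    expected = p1 * p2 + (1 - p1) * (1 - p2)
    if expected == 1:
        raise ValueError("kappa is undefined when both rounds use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical example: a three-member panel in round one, a repeated decision in round two
first_round = [majority_vote(v) for v in ([1, 1, 0], [1, 0, 1], [0, 0, 1], [0, 0, 0])]
second_round = [1, 0, 0, 0]
kappa = cohens_kappa(first_round, second_round)
```

With an odd number of members a binary majority vote can never tie, which is one practical argument for three-member rather than two-member panels when decisions are combined rather than discussed.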
What Do These Findings Mean?
These findings indicate that the methodology of panel diagnosis varies substantially among diagnostic studies and that reporting of this methodology is often unclear or absent. Both the methodology and reporting of panel diagnosis could, therefore, be improved substantially. Based on their findings, the researchers provide a checklist and flow chart to help guide the conduct and reporting of studies involving panel diagnosis. For example, they suggest that, when designing a study that uses panel diagnosis as the reference standard, the number and background of panel members should be considered, and they provide a list of options that should be considered when planning the decision-making process. Although more research into each of the options identified by the researchers is needed, their recommendations provide a starting point for the development of formal guidelines on the methodology and reporting of panel diagnosis for use as a reference standard in diagnostic research.
Additional Information
Wikipedia has a page on medical diagnosis (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
The Equator Network is an international initiative that seeks to improve the reliability and value of medical research literature by promoting transparent and accurate reporting of research studies; its website includes information on a wide range of reporting guidelines, including the STAndards for the Reporting of Diagnostic accuracy studies (STARD), an initiative that aims to improve the accuracy and completeness of reporting of studies of diagnostic accuracy
PMCID: PMC3797139  PMID: 24143138
22.  Development and validation of a model to predict the risk of exacerbations in chronic obstructive pulmonary disease 
Prediction models for exacerbations in patients with chronic obstructive pulmonary disease (COPD) are scarce. Our aim was to develop and validate a new model to predict exacerbations in patients with COPD.
Patients and methods
The derivation cohort consisted of patients aged 65 years or over with a COPD diagnosis, who were followed up over 24 months. The external validation cohort consisted of another cohort of COPD patients, aged 50 years or over. Exacerbations of COPD were defined as symptomatic deterioration requiring pulsed oral steroid use or hospitalization. Logistic regression analysis including backward selection and shrinkage was used to develop the final model and to adjust for overfitting. The adjusted regression coefficients were applied in the validation cohort to assess calibration of the predictions and to quantify changes in discrimination using C-statistics.
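The abstract does not specify how shrinkage was applied. One common approach is uniform shrinkage: all regression slopes are multiplied by a single factor below 1, and the intercept is then re-estimated so that the mean predicted risk matches the observed event rate. The sketch below assumes that approach, with a hypothetical shrinkage factor, hypothetical coefficients and toy predictor values; it is not the authors' model.

```python
import math


def predicted_risk(intercept, coefs, x):
    """Logistic-model probability for one patient's predictor values x."""
    linear_predictor = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def shrink_and_recalibrate(coefs, shrinkage, X, observed_rate, iterations=100):
    """Uniformly shrink slopes, then find the intercept by bisection so that the
    mean predicted risk in X equals observed_rate."""
    shrunk = [shrinkage * b for b in coefs]
    lo, hi = -20.0, 20.0  # mean predicted risk increases monotonically with the intercept
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        mean_risk = sum(predicted_risk(mid, shrunk, x) for x in X) / len(X)
        if mean_risk < observed_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0, shrunk


# Hypothetical: two predictors, shrinkage factor 0.85, 29% observed event rate
X = [[0, 10], [1, 20], [2, 35], [0, 5], [3, 40]]
intercept, slopes = shrink_and_recalibrate([0.6, 0.03], 0.85, X, observed_rate=0.29)
```

Shrinking the slopes pulls extreme predictions toward the mean, which counteracts the overfitting expected when a model is developed in a small derivation cohort.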
The derivation and validation cohorts consisted of 240 and 793 patients with COPD, respectively, of whom 29% and 28% experienced an exacerbation during follow-up. The final model included four easily assessable variables: exacerbations in the previous year, pack years of smoking, level of obstruction, and history of vascular disease, with a C-statistic of 0.75 (95% confidence interval [CI]: 0.69–0.82). Predictions were well calibrated in the validation cohort, with a small loss in discrimination potential (C-statistic 0.66 [95% CI 0.61–0.71]).
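For a binary outcome, the C-statistic is the probability that a randomly chosen patient who had an exacerbation received a higher predicted risk than a randomly chosen patient who did not, with ties counted as one half; it equals the area under the ROC curve. A minimal pairwise sketch with hypothetical predictions (quadratic in sample size, which is fine for cohorts of a few hundred patients):

```python
def c_statistic(predicted, outcomes):
    """Concordance between predicted risks and observed 0/1 outcomes."""
    events = [p for p, y in zip(predicted, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted, outcomes) if y == 0]
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0   # event ranked above non-event
            elif pe == pn:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted risks and exacerbation status for four patients
auc = c_statistic([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
```

Here three of the four event/non-event pairs are correctly ordered, so the C-statistic is 0.75.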
Our newly developed prediction model can help clinicians to predict the risk of future exacerbations in individual patients with COPD, including those with mild disease.
PMCID: PMC3797610  PMID: 24143086
exacerbation of COPD; risk prediction; external validation; vascular disease
23.  The impact of the HEART risk score in the early assessment of patients with acute chest pain: design of a stepped wedge, cluster randomised trial 
Chest pain remains a diagnostic challenge: physicians do not want to miss an acute coronary syndrome (ACS), but they also wish to avoid unnecessary additional diagnostic procedures. In approximately 75% of patients presenting with chest pain at the emergency department (ED), there is no underlying cardiac cause. Therefore, diagnostic strategies focus on identifying patients in whom an ACS can be safely ruled out based on findings from history, physical examination and early cardiac marker measurement. The HEART score, a clinical prediction rule, was developed to provide the clinician with a simple, early and reliable predictor of cardiac risk. We set out to quantify the impact of the use of the HEART score in daily practice on patient outcomes and costs.
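The abstract does not list the components of the HEART score. In its published form it sums five elements, History, ECG, Age, Risk factors and Troponin, each scored 0, 1 or 2, for a total of 0 to 10; totals of 0 to 3 are usually labelled low risk, 4 to 6 intermediate and 7 to 10 high. The sketch below assumes the clinician has already assigned points for history, ECG, risk factors and troponin (only the age points are derived automatically) and simply adds them up and assigns the risk category. The function names are hypothetical.

```python
def age_points(age_years):
    """Age element of the HEART score: <45 -> 0, 45-64 -> 1, >=65 -> 2."""
    if age_years < 45:
        return 0
    if age_years < 65:
        return 1
    return 2


def heart_score(history, ecg, age_years, risk_factors, troponin):
    """Sum the five HEART elements (each 0-2) and return (total, risk category)."""
    elements = (history, ecg, age_points(age_years), risk_factors, troponin)
    if any(points not in (0, 1, 2) for points in elements):
        raise ValueError("each HEART element must be scored 0, 1 or 2")
    total = sum(elements)
    if total <= 3:
        category = "low"
    elif total <= 6:
        category = "intermediate"
    else:
        category = "high"
    return total, category


# Hypothetical 50-year-old: moderately suspicious history, normal ECG,
# one or two risk factors, troponin within the normal limit
print(heart_score(history=1, ecg=0, age_years=50, risk_factors=1, troponin=0))
```

In the trial's intervention arm, a "low" result like this one would point toward reassurance and discharge, and a "high" result toward intensive monitoring and early intervention.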
We designed a prospective, multi-centre, stepped wedge, cluster randomised trial. Our aim is to include a total of 6600 unselected chest pain patients presenting at the ED in 10 Dutch hospitals during an 11-month period. All clusters (i.e. hospitals) start with a period of ‘usual care’ and are randomised in their timing of the switch to ‘intervention care’. The latter involves calculating the HEART score in each patient to guide clinical decisions, notably reassurance and discharge of patients with low scores, and intensive monitoring and early intervention in patients with high HEART scores. The primary outcome is the occurrence of major adverse cardiac events (MACE), including acute myocardial infarction, revascularisation or death within 6 weeks after presentation. Secondary outcomes include occurrence of MACE in low-risk patients, quality of life, use of health care resources and costs.
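A stepped wedge design can be pictured as a hospital-by-period table in which every hospital starts in usual care and crosses over to intervention care at a randomly assigned step, never switching back. The sketch below assumes, for illustration only, one-month periods and one hospital crossing over per step, which fits 10 hospitals into 11 months; the actual step timing is not given in the abstract, and the hospital labels are hypothetical.

```python
import random


def stepped_wedge_schedule(hospitals, n_periods, seed=None):
    """Randomise the order in which hospitals cross over from usual care to intervention.

    Period 1 is usual care everywhere; the k-th hospital in the random order
    (k = 0, 1, ...) switches at period k + 2 and stays in the intervention.
    """
    if n_periods < len(hospitals) + 1:
        raise ValueError("need at least one period per step plus a baseline period")
    order = list(hospitals)
    random.Random(seed).shuffle(order)
    schedule = {}
    for k, hospital in enumerate(order):
        switch_period = k + 2
        schedule[hospital] = [
            "usual" if period < switch_period else "intervention"
            for period in range(1, n_periods + 1)
        ]
    return schedule


schedule = stepped_wedge_schedule([f"H{i}" for i in range(1, 11)], n_periods=11, seed=2013)
for hospital, periods in sorted(schedule.items()):
    # U = usual care, I = intervention care, one character per month
    print(hospital, "".join("I" if p == "intervention" else "U" for p in periods))
```

Each hospital contributes both usual-care and intervention months, which is what allows outcomes to be compared within as well as across hospitals.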
Stepped wedge designs are increasingly used to evaluate the real-life effectiveness of non-pharmacological interventions because of the following potential advantages: (a) each hospital has both a usual care and an intervention period, so outcomes can be compared within and across hospitals; (b) each hospital will have an intervention period, which enhances participation in the case of a promising intervention; (c) all hospitals generate data about potential implementation problems. This large impact trial will generate evidence on whether the anticipated benefits (in terms of safety and cost-effectiveness) of using the HEART score will indeed be achieved in real-life clinical practice.
Trial registration: 80-82310-97-12154.
PMCID: PMC3849098  PMID: 24070098
HEART score; Chest pain; Clinical prediction rule; Risk score implementation; Impact; Stepped wedge design; Cluster randomised trial
24.  Variation of a test’s sensitivity and specificity with disease prevalence 
Anecdotal evidence suggests that the sensitivity and specificity of a diagnostic test may vary with disease prevalence. Our objective was to investigate the associations between disease prevalence and test sensitivity and specificity using studies of diagnostic accuracy.
We used data from 23 meta-analyses, each of which included 10–39 studies (416 total). The median prevalence per review ranged from 1% to 77%. We evaluated the effects of prevalence on sensitivity and specificity using a bivariate random-effects model for each meta-analysis, with prevalence as a covariate. We estimated the overall effect of prevalence by pooling the effects using the inverse variance method.
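Pooling the per-meta-analysis prevalence effects with the inverse variance method weights each estimate by 1/SE², so more precise estimates count more. A minimal fixed-effect sketch with hypothetical estimates and standard errors; the abstract does not say whether a fixed-effect or random-effects variant was used.

```python
import math


def inverse_variance_pool(estimates, standard_errors):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical per-meta-analysis effects of prevalence on logit specificity
effects = [-0.8, -0.2, -0.5]
ses = [0.4, 0.2, 0.5]
pooled, pooled_se = inverse_variance_pool(effects, ses)
```

The middle estimate, with the smallest standard error, receives a weight of 25 against 6.25 and 4 for the others, so the pooled effect (about −0.34) lies closest to it.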
Within a given review, a change in prevalence from the lowest to the highest value resulted in a corresponding change in sensitivity or specificity of 0 to 40 percentage points. This effect was statistically significant (p < 0.05) for either sensitivity or specificity in 8 meta-analyses (35%). Overall, specificity tended to be lower with higher disease prevalence; there was no such systematic effect for sensitivity.
The sensitivity and specificity of a test often vary with disease prevalence; this effect is likely to be the result of mechanisms, such as patient spectrum, that affect prevalence, sensitivity and specificity. Because it may be difficult to identify such mechanisms, clinicians should use prevalence as a guide when selecting studies that most closely match their situation.
PMCID: PMC3735771  PMID: 23798453
25.  A decision rule to aid selection of patients with abdominal sepsis requiring a relaparotomy 
BMC Surgery  2013;13:28.
Accurate and timely identification of patients in need of a relaparotomy is challenging, since there are no readily available objective criteria to rely on. The aim of this study is to develop a prediction model to help decide in which patients to perform a relaparotomy.
Data from a randomized trial comparing surgical strategies for relaparotomy were used. Variables were selected based on previous reports and clinical common sense, and screened in univariable regression analysis to identify those associated with the need for relaparotomy. Variables with the strongest association were considered for the prediction model, which was constructed using backward elimination in a multivariable regression analysis. The discriminatory capacity of the model was expressed as the area under the curve (AUC). A cut-off analysis was performed to illustrate the consequences in clinical practice.
One hundred and eighty-two patients were included; 46 were considered cases requiring a relaparotomy. A prediction model containing 6 variables was built. This final model had an AUC of 0.80, indicating good discriminatory capacity. However, acceptable sensitivity would require a low threshold for relaparotomy, leading to an unacceptable rate of negative relaparotomies (63%). Therefore, the prediction model was incorporated into a decision rule in which the interval until re-assessment and the use of computed tomography depend on the outcome of the model.
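The cut-off analysis trades sensitivity against the share of negative relaparotomies. The sketch below assumes that a patient is flagged for relaparotomy when the model's predicted probability is at or above the cut-off, and defines the negative relaparotomy rate as the proportion of flagged patients who did not need one; both the definition and the toy data are assumptions, not taken from the study.

```python
def cutoff_table(predicted, needed_relaparotomy, cutoffs):
    """For each cut-off: (cut-off, sensitivity, negative relaparotomy rate).

    A patient is flagged when predicted probability >= cut-off. The negative
    relaparotomy rate is the share of flagged patients who did not need one.
    """
    total_cases = sum(needed_relaparotomy)
    rows = []
    for cutoff in cutoffs:
        flagged = [y for p, y in zip(predicted, needed_relaparotomy) if p >= cutoff]
        true_positives = sum(flagged)
        sensitivity = true_positives / total_cases
        negative_rate = (len(flagged) - true_positives) / len(flagged) if flagged else 0.0
        rows.append((cutoff, sensitivity, negative_rate))
    return rows


# Hypothetical predicted probabilities and true need for relaparotomy
predicted = [0.10, 0.20, 0.30, 0.60, 0.70, 0.90]
needed_relaparotomy = [0, 0, 1, 0, 1, 1]
rows = cutoff_table(predicted, needed_relaparotomy, [0.25, 0.65])
for cutoff, sens, neg in rows:
    print(f"cut-off {cutoff:.2f}: sensitivity {sens:.2f}, negative relaparotomies {neg:.0%}")
```

Even in this toy example, lowering the cut-off to catch every case raises the negative relaparotomy rate from 0% to 25%, the same tension that led the authors to embed the model in a decision rule rather than use it as a single yes/no threshold.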
Constructing a prediction model that provides a definitive answer on whether or not to perform a relaparotomy seems utopian. However, our prediction model can be used to stratify patients by their underlying risk and could guide further monitoring of patients with abdominal sepsis in order to identify patients with suspected ongoing peritonitis in a timely fashion.
PMCID: PMC3750491  PMID: 23870702
Secondary peritonitis; Abdominal sepsis; Relaparotomy; On-demand; Prediction model; Decision rule
