The hospitalized elderly are at risk of functional decline. We evaluated the effects and care costs of a specialized geriatric rehabilitation program aimed at preventing functional decline among at-risk hospitalized elderly.
The prospective nonrandomized controlled trial reported here was performed in three hospitals in the Netherlands. One hospital implemented the Prevention and Reactivation Care Program (PReCaP), while two other hospitals providing usual care served as control settings. Within the PReCaP hospital we compared patients pre-implementation with patients post-implementation of the PReCaP (“within-hospital analysis”), while our nonrandomized controlled trial compared patients of the PReCaP hospital post-implementation with patients from the two control hospitals providing usual care (“between-hospital analysis”). Hospitalized patients 65 years or older and at risk of functional decline were interviewed at baseline and at 3 and 12 months using validated questionnaires to score functioning, depression, and health-related quality of life (HRQoL). We estimated costs per unit of care from hospital information systems and national data sources. We used adjusted general linear mixed models to analyze functioning and HRQoL.
Between-hospital analysis showed no difference in activities of daily living (ADL) or instrumental activities of daily living (IADL) between PReCaP patients and control groups. PReCaP patients did have slightly better cognitive functioning (Mini Mental State Examination; 0.4 [95% confidence interval (CI) 0.2–0.6]), lower depression (Geriatric Depression Scale 15; −0.9 [95% CI −1.1 to −0.6]), and higher perceived health (Short-Form 20; 5.6 [95% CI 2.8–8.4]) than control patients. Analyses within the PReCaP hospital comparing patients pre- and post-implementation of the PReCaP showed no improvement over time in functioning, depression, or HRQoL. One-year health care costs were higher for PReCaP patients, both for the within-hospital analysis (+€7,000) and the between-hospital analysis (+€2,500).
We did not find any effect of the PReCaP on ADL or IADL. The PReCaP may provide some benefit to hospitalized patients at risk of functional decline with respect to cognitive functioning, depression, and perceived health. Further evaluations of integrated intervention programs to limit functional decline are therefore required.
functional decline; geriatric rehabilitation; health-related quality of life; activities of daily living
elderly; clopidogrel; glycoprotein IIb/IIIa blockers
To investigate whether the beneficial and harmful effects of platelet glycoprotein IIb/IIIa receptor blockers in non‐ST elevation acute coronary syndromes (NSTE‐ACS) depend on age.
A meta‐analysis of six trials of platelet glycoprotein IIb/IIIa receptor blockers in patients with NSTE‐ACS (PRISM, PRISM‐PLUS, PARAGON‐A, PURSUIT, PARAGON‐B, GUSTO IV‐ACS; n = 31 402) was performed. We applied multivariable logistic regression analyses to evaluate the drug effects on death or non‐fatal myocardial infarction at 30 days, and on major bleeding, by age subgroups (<60, 60–69, 70–79, ⩾80 years). We quantified the reduction of death or myocardial infarction as the number needed to treat (NNT), and the increase of major bleeding as the number needed to harm (NNH).
Subgroups had 11 155 (35%), 9727 (31%), 8468 (27%) and 2049 (7%) patients, respectively. The relative benefit of platelet glycoprotein IIb/IIIa receptor blockers did not differ significantly (p = 0.5) between age subgroups (OR (95% CI) for death or myocardial infarction: 0.86 (0.74 to 0.99), 0.90 (0.80 to 1.02), 0.97 (0.86 to 1.10) and 0.90 (0.73 to 1.16); overall 0.91 (0.86 to 0.99)). ORs for major bleeding were 1.9 (1.3 to 2.8), 1.9 (1.4 to 2.7), 1.6 (1.2 to 2.1) and 2.5 (1.5 to 4.1). Overall NNT was 105, and overall NNH was 90. The oldest patients had larger absolute increases in major bleeding, but also the largest absolute reductions in death or myocardial infarction. Patients ⩾80 years had half the NNT and a third of the NNH of patients <60 years.
In patients with NSTE‐ACS, the relative reduction of death or non‐fatal myocardial infarction with platelet glycoprotein IIb/IIIa receptor blockers was independent of patient age. Larger absolute outcome reductions were seen in older patients, but with a higher risk of major bleeding. Close monitoring of these patients is warranted.
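The NNT and NNH figures follow directly from absolute risk differences. A minimal sketch in Python; the event risks below are invented purely so that the arithmetic reproduces the reported overall NNT of 105 and NNH of 90, and are not the pooled trial risks:

```python
def nnt(control_risk, treated_risk):
    """Number needed to treat: reciprocal of the absolute risk reduction."""
    return 1.0 / (control_risk - treated_risk)

def nnh(control_risk, treated_risk):
    """Number needed to harm: reciprocal of the absolute risk increase."""
    return 1.0 / (treated_risk - control_risk)

# Hypothetical risks, chosen only to match the reported overall figures
print(round(nnt(0.1150, 0.1055)))  # treat ~105 patients to prevent one event
print(round(nnh(0.0200, 0.0311)))  # one extra major bleed per ~90 treated
```

The age-subgroup finding follows the same logic: because older patients have higher baseline risk, the same relative effect yields a larger absolute risk difference and hence a smaller NNT (and, for bleeding, a smaller NNH).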
Clinical prediction models provide risk estimates for the presence of disease (diagnosis) or an event in the future course of disease (prognosis) for individual patients. Although publications that present and evaluate such models are becoming more frequent, the methodology is often suboptimal. We propose that seven steps should be considered in developing prediction models: (i) consideration of the research question and initial data inspection; (ii) coding of predictors; (iii) model specification; (iv) model estimation; (v) evaluation of model performance; (vi) internal validation; and (vii) model presentation. The validity of a prediction model is ideally assessed in fully independent data, where we propose four key measures to evaluate model performance: calibration-in-the-large, or the model intercept (A); calibration slope (B); discrimination, with a concordance statistic (C); and clinical usefulness, with decision-curve analysis (D). As an application, we develop and validate prediction models for 30-day mortality in patients with an acute myocardial infarction. This illustrates the usefulness of the proposed framework to strengthen the methodological rigour and quality for prediction models in cardiovascular research.
Prediction model; Non-linearity; Missing values; Shrinkage; Calibration; Discrimination; Clinical usefulness
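The four validation measures proposed above (A-D) can each be computed from a vector of predicted risks and observed binary outcomes. A minimal pure-Python sketch on invented toy data (not the myocardial-infarction case study); calibration-in-the-large and the calibration slope are estimated here by small Newton-Raphson logistic fits:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def c_statistic(p, y):
    """(C) Discrimination: chance that a random event receives a higher
    predicted risk than a random non-event (ties count 1/2)."""
    ev = [pi for pi, yi in zip(p, y) if yi]
    ne = [pi for pi, yi in zip(p, y) if not yi]
    conc = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in ev for b in ne)
    return conc / (len(ev) * len(ne))

def calibration_in_the_large(p, y, iters=25):
    """(A) Intercept a of logit(y) ~ a + offset(logit(p)), slope fixed at 1;
    0 means predictions are correct on average."""
    x = [logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):  # one-parameter Newton-Raphson
        g = h = 0.0
        for xi, yi in zip(x, y):
            mu = 1 / (1 + math.exp(-(a + xi)))
            g += yi - mu
            h += mu * (1 - mu)
        a += g / h
    return a

def calibration_slope(p, y, iters=25):
    """(B) Slope b of logit(y) ~ a + b*logit(p); 1 is ideal, <1 suggests
    overfitting. Two-parameter Newton-Raphson."""
    x = [logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1 / (1 + math.exp(-(a + b * xi)))
            w = mu * (1 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b

def net_benefit(p, y, t):
    """(D) Clinical usefulness: net benefit of treating everyone with
    predicted risk >= t (one point on a decision curve)."""
    n = len(y)
    tp = sum(1 for pi, yi in zip(p, y) if pi >= t and yi)
    fp = sum(1 for pi, yi in zip(p, y) if pi >= t and not yi)
    return tp / n - (t / (1 - t)) * fp / n

# Toy validation data (invented for illustration)
p = [0.1, 0.2, 0.3, 0.8, 0.9, 0.15, 0.7, 0.25, 0.6, 0.4]
y = [0, 0, 1, 1, 1, 0, 1, 0, 0, 0]
print(c_statistic(p, y), net_benefit(p, y, 0.3))
```

In practice these quantities would be computed on fully independent validation data, and the net benefit would be traced over a range of clinically relevant thresholds to form a decision curve.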
The choice of disease-specific versus generic scales is common to many fields of medicine. In the area of traumatic brain injury, evidence is coming forward that disease-specific prognostic models and disease-specific scoring systems are preferable in the intensive care setting. In monitoring prognosis, the use of a calibration belt in validation studies potentially provides accurate and intuitively attractive insight into performance. This approach deserves further empirical evaluation of its added value as well as its limitations.
To assess the impact of a clinical decision model for febrile children at risk for serious bacterial infections (SBI) attending the emergency department (ED).
Randomized controlled trial with 439 febrile children, aged 1 month to 16 years, attending the pediatric ED of a Dutch university hospital during 2010-2012. Febrile children were randomly assigned to the intervention (clinical decision model; n=219) or the control group (usual care; n=220). The clinical decision model included clinical symptoms, vital signs, and C-reactive protein, and classified children as high or low risk for “pneumonia” and “other SBI”. Nurses were guided by the intervention to initiate additional tests for high-risk children. The clinical decision model was evaluated by 1) the area under the receiver operating characteristic curve (AUC), to indicate discriminative ability, and 2) feasibility, measured as nurses’ compliance with model recommendations. The primary patient outcome was correct SBI diagnosis. Secondary process outcomes were length of stay, diagnostic tests, antibiotic treatment, hospital admission, revisits, and medical costs.
The decision model had good discriminative ability for both pneumonia (n=33; AUC 0.83 (95% CI 0.75-0.90)) and other SBI (n=22; AUC 0.81 (95% CI 0.72-0.90)). Compliance with model recommendations was high (86%). No differences in correct SBI determination were observed. Application of the clinical decision model resulted in fewer full blood counts (14% vs. 22%, p<0.05) and more urine-dipstick testing (71% vs. 61%, p<0.05).
Contrary to our expectations, no substantial impact on patient outcome was observed. The clinical decision model did, however, show good discriminative ability to detect SBI, achieved good compliance among nurses, and resulted in a more standardized diagnostic approach to febrile children, with fewer full blood counts and more appropriate urine-dipstick testing.
Nederlands Trial Register NTR2381
To develop and validate 10-year cumulative incidence functions of intracerebral hemorrhage (ICH) and ischemic stroke (IS).
We used data on 27,493 participants from 3 population-based cohort studies: the Atherosclerosis Risk in Communities Study, median age 54 years, 45% male, median follow-up 20.7 years; the Rotterdam Study, median age 68 years, 38% male, median follow-up 14.3 years; and the Cardiovascular Health Study, median age 71 years, 41% male, median follow-up 12.8 years. Among these participants, 325 ICH events, 2,559 IS events, and 9,909 nonstroke deaths occurred. We developed 10-year cumulative incidence functions for ICH and IS using stratified Cox regression and competing risks analysis. Basic models including only established nonlaboratory risk factors were extended with diastolic blood pressure, total cholesterol/high-density lipoprotein cholesterol ratio, body mass index, waist-to-hip ratio, and glomerular filtration rate. The cumulative incidence functions' performances were cross-validated in each cohort separately by Harrell C-statistic and calibration plots.
A high total cholesterol/high-density lipoprotein cholesterol ratio was associated with decreased ICH rates but increased IS rates (p for difference across stroke types <0.001). For both the ICH and IS models, C statistics increased more with model extension in the Atherosclerosis Risk in Communities and Cardiovascular Health Study cohorts than in the Rotterdam Study. Improvements in C statistics were reproduced by cross-validation. Models were well calibrated in all cohorts. Correlations between 10-year ICH and IS risks were moderate in each cohort.
We developed and cross-validated cumulative incidence functions for separate prediction of 10-year ICH and IS risk. These functions can be useful to further specify an individual's stroke risk.
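The competing-risks machinery behind such cumulative incidence functions is the nonparametric Aalen-Johansen estimator, which accumulates cause-specific event mass weighted by overall event-free survival; unlike one-minus-Kaplan-Meier, it does not overstate stroke risk when a competing event such as nonstroke death is frequent. A minimal pure-Python sketch with an invented six-subject example (the event codes are illustrative, not those of the study data):

```python
def cumulative_incidence(times, causes, cause):
    """Aalen-Johansen cumulative incidence for one event type.
    times: follow-up times; causes: 0 = censored, otherwise an event code
    (e.g. 1 = ICH, 2 = IS; codes are illustrative).
    Returns [(time, CIF at that time), ...]."""
    data = sorted(zip(times, causes))
    n = len(data)
    surv = 1.0      # overall event-free survival just before current time
    cif = 0.0
    at_risk = n
    curve = []
    i = 0
    while i < n:
        t = data[i][0]
        d_cause = d_any = removed = 0
        while i < n and data[i][0] == t:   # group ties at the same time
            if data[i][1] != 0:
                d_any += 1
                d_cause += data[i][1] == cause
            removed += 1
            i += 1
        cif += surv * d_cause / at_risk    # cause-specific hazard x S(t-)
        surv *= 1 - d_any / at_risk
        at_risk -= removed
        curve.append((t, cif))
    return curve

# Tiny worked example: 6 subjects, two competing event types
times = [1, 2, 2, 3, 4, 5]
causes = [1, 2, 0, 1, 0, 2]
print(cumulative_incidence(times, causes, cause=1)[-1])  # (5, 0.3888...) = 7/18
```

Summing the cumulative incidence functions over all causes plus the event-free survival always yields 1, which is exactly the decomposition that separate ICH and IS risk functions exploit.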
Measurement of health-related quality of life (HRQL) is essential to quantify the subjective burden of traumatic brain injury (TBI) in survivors. We performed a systematic review of HRQL studies in TBI to evaluate study design, instruments used, methodological quality, and outcome. Fifty-eight studies were included, showing large variation in HRQL instruments and assessment time points used. The Short Form-36 (SF-36) was most frequently used. A high prevalence of health problems during and after the first year of TBI was a common finding of the studies included. In the long term, patients with a TBI still showed large deficits from full recovery compared to population norms. Positive results for internal consistency and interpretability of the SF-36 were reported in validity studies. The Quality of Life after Brain Injury instrument (QOLIBRI), European Brain Injury Questionnaire (EBIQ), Child Health Questionnaire (CHQ), and the World Health Organization Quality of Life short version (WHOQOL-BREF) showed positive results, but evidence was limited. Meta-analysis of SF-36 showed that TBI outcome is heterogeneous, encompassing a broad spectrum of HRQL, with most problems reported in the physical, emotional, and social functioning domains. The use of SF-36 in combination with a TBI-specific instrument, i.e., QOLIBRI, seems promising. Consensus on preferred methodologies of HRQL measurement in TBI would facilitate comparability across studies, resulting in improved insight into recovery patterns and better estimates of the burden of TBI.
Electronic supplementary material
The online version of this article (doi:10.1186/s12963-015-0037-1) contains supplementary material, which is available to authorized users.
Traumatic brain injury; Systematic review; Health-related quality of life; Functional outcome; Methodology
The Glasgow Coma Scale (GCS) and pupillary reactivity are well-known prognostic factors in traumatic brain injury (TBI). The aim of this study was to compare the GCS motor score and pupillary reactivity assessed in the field and at hospital admission, and to assess their prognostic value for 6-month mortality in patients with moderate or severe TBI. We studied 445 patients with moderate or severe TBI admitted to hospital in Austria between 2009 and 2012. The area under the curve (AUC) and Nagelkerke's R2 were used to evaluate the predictive ability of GCS motor score and pupillary reactivity assessed in the field and at admission. Uni- and multivariable analyses (adjusting for age and other clinical and computed tomography findings) were performed using combinations of field and admission GCS motor score and pupillary reactivity. Motor scores generally deteriorated from the field to admission, whereas pupillary reactivity was similar. GCS motor score assessed in the field (AUC=0.754; R2=0.273) and pupillary assessment at admission (AUC=0.662; R2=0.214) performed best as predictors of 6-month mortality in the univariable analysis. This combination also showed the best performance in the adjusted analyses (AUC=0.876; R2=0.508), but the performance of both predictors assessed at admission was not much worse (AUC=0.857; R2=0.460). Field GCS motor score combined with pupillary reactivity at hospital admission, compared with other combinations of these parameters, had the best prognostic value for 6-month mortality in patients with moderate-to-severe TBI. Given that the differences in prognostic performance are small, both the field and admission values of GCS motor score and pupillary reactivity may reasonably be used in multivariable prediction models of 6-month outcome.
assessment at admission; Glasgow Coma Scale; prehospital assessment; pupillary reactivity; traumatic brain injuries
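The two performance measures used in the study above can both be computed directly from predicted risks and observed outcomes: the AUC summarizes rank discrimination, while Nagelkerke's R2 rescales the likelihood-based Cox-Snell R2 so its maximum is 1. A minimal pure-Python sketch on invented toy data (not the Austrian cohort):

```python
import math

def auc(p, y):
    """Area under the ROC curve = concordance of predicted risks."""
    ev = [pi for pi, yi in zip(p, y) if yi]
    ne = [pi for pi, yi in zip(p, y) if not yi]
    s = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in ev for b in ne)
    return s / (len(ev) * len(ne))

def nagelkerke_r2(p, y):
    """Nagelkerke's R2: Cox-Snell R2 rescaled so the maximum is 1."""
    n = len(y)
    ll = lambda q: sum(math.log(qi if yi else 1 - qi) for qi, yi in zip(q, y))
    ll0 = ll([sum(y) / n] * n)   # null model: overall event rate for everyone
    r2_cs = 1 - math.exp(2 / n * (ll0 - ll(p)))
    return r2_cs / (1 - math.exp(2 / n * ll0))

# Toy predictions (invented): near-perfect risk estimates
p = [0.9, 0.9, 0.1, 0.1]
y = [1, 1, 0, 0]
print(auc(p, y))                      # -> 1.0
print(round(nagelkerke_r2(p, y), 3))  # -> 0.922
```

Note that the AUC depends only on the ordering of the risks, whereas Nagelkerke's R2 also rewards how close the predicted probabilities are to the observed outcomes, which is why the two measures can rank predictors differently.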
Modern modelling techniques may potentially provide more accurate predictions of binary outcomes than classical techniques. We aimed to study the predictive performance of different modelling techniques in relation to the effective sample size (“data hungriness”).
We performed simulation studies based on three clinical cohorts: 1282 patients with head and neck cancer (46.9% 5-year survival), 1731 patients with traumatic brain injury (22.3% 6-month mortality) and 3181 patients with minor head injury (7.6% with CT scan abnormalities). We compared three relatively modern modelling techniques: support vector machines (SVM), neural nets (NN), and random forests (RF), and two classical techniques: logistic regression (LR) and classification and regression trees (CART). We created three large artificial databases with 20-fold, 10-fold and 6-fold replication of subjects, in which we generated dichotomous outcomes according to different underlying models. We applied each modelling technique to increasingly larger development parts (100 repetitions). The area under the ROC curve (AUC) indicated the performance of each model in the development part and in an independent validation part. Data hungriness was defined by plateauing of the AUC and small optimism (difference between the mean apparent AUC and the mean validated AUC <0.01).
We found that a stable AUC was reached by LR at approximately 20 to 50 events per variable, followed by CART, SVM, NN and RF models. Optimism decreased with increasing sample sizes and the same ranking of techniques. The RF, SVM and NN models showed instability and a high optimism even with >200 events per variable.
Modern modelling techniques such as SVM, NN and RF may need over 10 times as many events per variable as classical modelling techniques such as LR to achieve a stable AUC and small optimism. This implies that such modern techniques should only be used in medical prediction problems when very large data sets are available.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2288-14-137) contains supplementary material, which is available to authorized users.
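The learning-curve procedure behind "data hungriness" can be sketched in a few lines: fit a model on development samples of increasing size and track optimism, the gap between apparent performance (on the development data) and validated performance (on independent data). A simplified pure-Python sketch with simulated data and a gradient-ascent logistic regression standing in for the five techniques compared in the paper (the `simulate` setup and all numbers are illustrative, not the study's cohorts):

```python
import math, random

def fit_logistic(X, y, lr=0.5, steps=300):
    """Multivariable logistic regression by batch gradient ascent."""
    w = [0.0] * (len(X[0]) + 1)          # intercept + coefficients
    for _ in range(steps):
        g = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            r = yi - 1 / (1 + math.exp(-z))
            g[0] += r
            for j, xj in enumerate(xi):
                g[j + 1] += r * xj
        w = [wj + lr * gj / len(y) for wj, gj in zip(w, g)]
    return w

def predict(w, X):
    return [w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)) for xi in X]

def auc(scores, y):
    ev = [s for s, yi in zip(scores, y) if yi]
    ne = [s for s, yi in zip(scores, y) if not yi]
    c = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in ev for b in ne)
    return c / (len(ev) * len(ne))

def simulate(n, p=10):
    """Only the first of p predictors is truly associated with the outcome."""
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [int(random.random() < 1 / (1 + math.exp(-xi[0]))) for xi in X]
    return X, y

random.seed(7)
X_val, y_val = simulate(1000)            # large independent validation part
for n_dev in (50, 200, 800):             # increasingly large development parts
    X_dev, y_dev = simulate(n_dev)
    w = fit_logistic(X_dev, y_dev)
    optimism = auc(predict(w, X_dev), y_dev) - auc(predict(w, X_val), y_val)
    print(n_dev, round(optimism, 3))     # optimism should shrink with n_dev
```

The study's design repeats exactly this loop 100 times per technique and per cohort, and declares a technique stable once the validated AUC plateaus and the mean optimism falls below 0.01.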
Research in Traumatic Brain Injury (TBI) is challenging because of many differences between patients. Advances in basic science have failed to translate into successful clinical treatments and the evidence underpinning guideline recommendations is low. Clinical research has been hampered by lack of standardized data collection, limited multidisciplinary collaboration and by insensitive approaches to classification and efficacy analyses. Multidisciplinary collaborations are now being fostered. Approaches for dealing with heterogeneity have been developed by the IMPACT study group. These can increase statistical power in clinical trials by up to 50% and are also relevant to other heterogeneous neurological diseases, such as stroke and subarachnoid hemorrhage. Rather than trying to limit heterogeneity, we may also be able to exploit it by analyzing differences in treatment and outcome between countries and centers in comparative effectiveness designs. This concept offers an additional research approach with great potential to advance the care in TBI.
External validation on different TBI populations is important to assess the generalizability of prognostic models to different settings. We aimed to externally validate recently developed models for the prediction of six-month unfavourable outcome and six-month mortality.
The International Neurotrauma Research Organization – Prehospital dataset (INRO-PH) was collected within an observational study in Austria between 2009 and 2012 and includes 778 patients with TBI and GCS ≤12. Three sets of prognostic models were externally validated: the IMPACT core and extended models, the CRASH basic models, and the Nijmegen models developed by Jacobs et al., all for the prediction of six-month unfavourable outcome and six-month mortality. The external validity of the models was assessed by discrimination (area under the receiver operating characteristic curve, AUC) and calibration (calibration statistics and plots).
Median age in the validation cohort was 50 years and 44% had an admission GCS motor score of 1-3. Six-month mortality was 27%. Mortality was better predicted (AUCs around 0.85) than unfavourable outcome (AUCs around 0.80). Calibration plots showed that the observed outcomes were systematically better than predicted for all models considered. The best performance was noted for the original Nijmegen model, but refitting led to similar performance for the IMPACT Extended, CRASH Basic, and Nijmegen models.
In conclusion, all the prognostic models we validated in this study possess good discriminative ability for the prediction of six-month outcome in patients with moderate or severe TBI, but outcomes were systematically better than predicted. After adjustment for this miscalibration in locally adapted models, they may well be used for recent TBI patients.
Prognostic model; External validation; Traumatic brain injury; Outcome prediction
Hospital readmission rates are increasingly used for both quality improvement and cost control. However, the validity of readmission rates as a measure of quality of hospital care is not evident. We aimed to give an overview of the different methodological aspects in the definition and measurement of readmission rates that need to be considered when interpreting readmission rates as a reflection of quality of care.
We conducted a systematic literature review, using the bibliographic databases Embase, Medline OvidSP, Web-of-Science, Cochrane central and PubMed for the period of January 2001 to May 2013.
The search resulted in 102 included papers. We found that definition of the context in which readmissions are used as a quality indicator is crucial. This context includes the patient group and the specific aspects of care of which the quality is aimed to be assessed. Methodological flaws like unreliable data and insufficient case-mix correction may confound the comparison of readmission rates between hospitals. Another problem occurs when the basic distinction between planned and unplanned readmissions cannot be made. Finally, the multi-faceted nature of quality of care and the correlation between readmissions and other outcomes limit the indicator's validity.
Although readmission rates are a promising quality indicator, several methodological concerns identified in this study need to be addressed, especially when the indicator is intended for accountability or pay for performance. We recommend investing resources in accurate data registration, improved indicator description, and bundling outcome measures to provide a more complete picture of hospital care.
Treatment decisions can be difficult in men with low-risk prostate cancer (PCa).
To evaluate the ability of a panel of four kallikrein markers in blood—total prostate-specific antigen (PSA), free PSA, intact PSA, and kallikrein-related peptidase 2—to distinguish between pathologically insignificant and aggressive disease on pathologic examination of radical prostatectomy (RP) specimens as well as to calculate the number of avoidable surgeries.
Design, setting, and participants
The cohort comprised 392 screened men participating in rounds 1 and 2 of the Rotterdam arm of the European Randomized Study of Screening for Prostate Cancer. Patients were diagnosed with PCa because of an elevated PSA ≥3.0 ng/ml and were treated with RP between 1994 and 2004.
Outcome measurements and statistical analysis
We calculated the accuracy (area under the curve [AUC]) of statistical models to predict pathologically aggressive PCa (pT3–T4, extracapsular extension, tumor volume >0.5 cm3, or any Gleason grade ≥4) based on clinical predictors (age, stage, PSA, biopsy findings) with and without levels of four kallikrein markers in blood.
Results and limitations
A total of 261 patients (67%) had significant disease on pathologic evaluation of the RP specimen. While the clinical model had good accuracy in predicting aggressive disease, reflected in a corrected AUC of 0.81, the four kallikrein markers enhanced the base model, with an AUC of 0.84 (p < 0.0005). The model retained its ability in patients with low-risk and very-low-risk disease and in comparison with the Steyerberg nomogram, a published prediction model. Clinical application of the model incorporating the kallikrein markers would avoid surgery in 135 of 1000 patients overall and in 110 of 334 patients with pathologically insignificant disease. A limitation of the present study is that clinicians may be hesitant to make recommendations against active treatment on the basis of a statistical model.
Our study provided proof of principle that predictions based on levels of four kallikrein markers in blood distinguish between pathologically insignificant and aggressive disease after RP with good accuracy. In the future, clinical use of the model could potentially reduce rates of immediate unnecessary active treatment.
Prostate-specific antigen/blood; Prostatic neoplasms; Mass screening; Radical prostatectomy; Kallikrein-related peptidases
Despite advances in operative repair, ruptured abdominal aortic aneurysm (rAAA) remains associated with high mortality and morbidity rates, especially in elderly patients. The purpose of this study was to evaluate the outcomes of emergency endovascular aneurysm repair (eEVAR), conventional open repair (OPEN), and conservative treatment in elderly patients with rAAA.
We conducted a retrospective study of all rAAA patients treated with OPEN or eEVAR between January 2005 and December 2011 in the vascular surgery department at Amphia Hospital, the Netherlands. The outcome in patients treated for rAAA by eEVAR or OPEN repair was investigated. Special attention was paid to patients who were admitted and did not receive operative intervention due to serious comorbidity, extremely advanced age, or poor physical condition. We calculated the 30-day rAAA-related mortality for all rAAA patients admitted to our hospital.
Twelve patients did not receive emergency operative repair due to extreme frailty (mean age 87 years, median time to mortality 27 hours). Twenty-three patients had eEVAR and 82 had OPEN surgery. The 30-day mortality rate in operated patients was 30% (7/23) in the eEVAR group versus 26% (21/82) in the OPEN group (P=0.64). No difference in mortality was noted between eEVAR and OPEN over 5 years of follow-up. There were more cardiac adverse events in the OPEN group (n=25, 31%) than in the eEVAR group (n=2, 9%; P=0.035). Reintervention after discharge was more frequent in patients who received eEVAR (35%) than in patients who had OPEN (6%, P<0.001). Advancing age was associated with increasing mortality (hazard ratio 1.05 per year [95% confidence interval 1.01–1.09]) for patients who received operative repair, with 67%, 76%, and 100% 5-year mortality rates in the 34 patients aged <70 years, 59 patients aged 70–79 years, and 12 octogenarians, respectively; 30-day rAAA-related mortality was also associated with increasing age (21%, 30%, and 61%, respectively; P=0.008).
The 30-day and 5-year mortality in patients operated for rAAA did not differ between the treatment options of eEVAR and OPEN. Particularly frail and very elderly patients did not receive operative repair. The decision to intervene in rAAA should not be made on the basis of patient age alone, but also in relation to comorbidity and patient preference.
ruptured abdominal aneurysm repair; clinical decision-making; emergency endovascular aneurysm repair; open repair
Background & Aims
Individuals with a family history of colorectal cancer (CRC) have a higher risk of developing CRC than the general population, and studies have shown that they are more likely to undergo CRC screening. We assessed the overall and race- and ethnic-specific effects of a family history of CRC on screening.
We analyzed data from the 2009 California Health Interview Survey to estimate overall and race- and ethnicity-specific odds ratios (ORs) for the association between family history of CRC and CRC screening.
The unweighted and weighted sample sizes were 23,837 and 8,851,003, respectively. Individuals with a family history of CRC were more likely to participate in any form of screening (OR, 2.3; 95% confidence limit [CL], 1.7–3.1) and in colonoscopy screening (OR, 2.7; 95% CL, 2.2–3.4) than those without a family history, but this association varied among racial and ethnic groups. The magnitude of the association between family history and colonoscopy screening was highest among Asians (OR, 6.1; 95% CL, 3.1–11.9), lowest among Hispanics (OR, 1.4; 95% CL, 0.67–2.8), and comparable between non-Hispanic Whites (OR, 3.1; 95% CL, 2.6–3.8) and non-Hispanic Blacks (OR, 2.6; 95% CL, 1.2–5.7) (P for interaction <0.001).
The effects of a family history of CRC on participation in screening vary among racial and ethnic groups, with the weakest effect among Hispanics compared with other groups. Interventions to promote CRC screening among Hispanics with a family history should therefore be considered.
population study; database analysis; early detection; colon cancer prevention
Preventive measures are essential to limit the spread of new viruses; their uptake is key to their success. However, the vaccination uptake in pandemic outbreaks is often low. We aim to elicit how disease and vaccination characteristics determine preferences of the general public for new pandemic vaccinations.
In an internet-based discrete choice experiment (DCE), a representative sample of 536 participants (49% participation rate) from the Dutch population was asked for their preferences for vaccination programs in hypothetical communicable disease outbreaks. We used scenarios based on two disease characteristics (susceptibility to and severity of the disease) and five vaccination program characteristics (effectiveness, safety, advice regarding vaccination, media attention, and out-of-pocket costs). The DCE design was based on a literature review, expert interviews and focus group discussions. A panel latent class logit model was used to estimate which trade-offs individuals were willing to make.
All of the above-mentioned characteristics influenced respondents’ preferences for vaccination. Preference heterogeneity was substantial. Females who stated that they were never in favor of vaccination made different trade-offs than males who stated that they were (possibly) willing to get vaccinated. As expected, respondents preferred and were willing to pay more for more effective vaccines, especially if the outbreak was more serious (€6–€39 for a 10% more effective vaccine). Changes in effectiveness, out-of-pocket costs, and the body advising on vaccination all substantially influenced the predicted uptake.
We conclude that various disease and vaccination program characteristics influence respondents’ preferences for pandemic vaccination programs. Agencies responsible for preventive measures during pandemics can use the knowledge that out-of-pocket costs and the way advice is given affect vaccination uptake to improve their plans for future pandemic outbreaks. The preference heterogeneity shows that information regarding vaccination needs to be targeted differently depending on gender and willingness to get vaccinated.
George Peat and colleagues review and discuss current approaches to transparency and published debates and concerns about efforts to standardize prognosis research practice, and make five recommendations.
In patients with acute ischemic stroke, early treatment with recombinant tissue plasminogen activator (rtPA) improves functional outcome by effectively reducing disability and dependency. Timely thrombolysis, within 1 h, is a vital aspect of acute stroke treatment, and is reflected in the widely used performance indicator ‘door-to-needle time’ (DNT). DNT measures the time from the moment the patient enters the emergency department until intravenous rtPA is administered. The purpose of this study was to measure quality improvement from the first implementation of thrombolysis in stroke patients in a university hospital in the Netherlands. We further aimed to identify specific interventions that affected the DNT.
We included all patients with acute ischemic stroke consecutively admitted to a large university hospital in the Netherlands between January 2006 and December 2012, and focused on those treated with thrombolytic therapy on admission. Data were collected routinely for research purposes and internal quality measurement (the Erasmus Stroke Study). We used a retrospective interrupted time series design to study the trend in DNT, analyzed by means of segmented regression.
Between January 2006 and December 2012, 1,703 patients with ischemic stroke were admitted and 262 (17%) were treated with rtPA. Patients treated with thrombolysis were on average 63 years old at the time of the stroke and 52% were male. Mean age (p = 0.58) and sex distribution (p = 0.98) did not change over the years. The proportion treated with thrombolysis increased from 5% in 2006 to 22% in 2012. In 2006, none of the patients were treated within 1 h; by 2012, this had increased to 81%. In a logistic regression analysis, this trend was significant (OR 1.6 per year, 95% CI 1.4-1.8). The median DNT was reduced from 75 min in 2006 to 45 min in 2012 (p < 0.001 in a linear regression model). In this period, a 12% annual decrease in DNT was achieved (95% CI 8-16%). We could not find a significant association between any specific intervention and the trend in DNT.
Conclusion and Implications
The DNT steadily improved from the first implementation of thrombolysis. Specific explanations for this improvement require further study, and may relate to the combined impact of a series of structural and logistic interventions. Our results support the use of performance measures for internal communication. Median DNT should be used on a monthly or quarterly basis to inform all professionals treating stroke patients of their achievements.
Acute ischemic stroke; Acute stroke care; Door-to-needle time; Recombinant tissue plasminogen activator; Performance indicator; Quality of care; Process indicators; Quality improvement
For the evaluation and comparison of markers and risk prediction models, various novel measures have recently been introduced as alternatives to the commonly used difference in the area under the ROC curve (ΔAUC). The Net Reclassification Improvement (NRI) is increasingly popular to compare predictions with one or more risk thresholds, but decision-analytic approaches have also been proposed.
We aimed to identify the mathematical relationships between novel performance measures for the situation that a single risk threshold T is used to classify patients as having the outcome or not.
We considered the NRI and three utility-based measures that take misclassification costs into account: the difference in Net Benefit (ΔNB), the difference in Relative Utility (ΔRU), and the weighted NRI (wNRI). We illustrate the behavior of these measures in 1938 women suspected of having ovarian cancer (prevalence 28%).
The three utility-based measures appear to be transformations of each other, and hence always lead to consistent conclusions. On the other hand, conclusions may differ when using the standard NRI, depending on the adopted risk threshold T, prevalence P and the obtained differences in sensitivity and specificity of the two models that are compared. In the case study, adding the CA-125 tumor marker to a baseline set of covariates yielded a negative NRI yet a positive value for the utility-based measures.
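The Net Benefit underlying ΔNB weighs false positives by the odds of the risk threshold T, so that a model is credited only when its true positives outweigh the harm of its false positives. A minimal sketch (the data in the test are hypothetical):

```python
def net_benefit(y, risk, threshold):
    """Net Benefit at risk threshold T: NB = TP/n - FP/n * T/(1 - T).
    y: 0/1 outcomes; risk: predicted probabilities."""
    n = len(y)
    tp = sum(1 for yi, ri in zip(y, risk) if ri >= threshold and yi == 1)
    fp = sum(1 for yi, ri in zip(y, risk) if ri >= threshold and yi == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def delta_nb(y, risk_new, risk_old, threshold):
    """Difference in Net Benefit between two models at the same threshold."""
    return net_benefit(y, risk_new, threshold) - net_benefit(y, risk_old, threshold)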
The decision-analytic measures are each appropriate to indicate the clinical usefulness of an added marker or compare prediction models, since these measures each reflect misclassification costs. This is of practical importance as these measures may thus adjust conclusions based on purely statistical measures. A range of risk thresholds should be considered in applying these measures.
The use of alternative modeling techniques for predicting patient survival is complicated by the fact that some alternative techniques cannot readily deal with censoring, which is essential for analyzing survival data. In the current study, we aimed to demonstrate that pseudo values enable statistically appropriate analyses of survival outcomes when used in seven alternative modeling techniques.
In this case study, we analyzed survival of 1282 Dutch patients with newly diagnosed Head and Neck Squamous Cell Carcinoma (HNSCC) with conventional Kaplan-Meier and Cox regression analysis. We subsequently calculated pseudo values to reflect the individual survival patterns. We used these pseudo values to compare recursive partitioning (RPART), neural nets (NNET), logistic regression (LR), general linear models (GLM), and three variants of support vector machines (SVM) with respect to dichotomous 60-month survival, and continuous pseudo values at 60 months or estimated survival time. We used the area under the ROC curve (AUC) and the root mean squared error (RMSE) to compare the performance of these models using bootstrap validation.
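The jackknife construction of pseudo values, θ̂ᵢ = n·Ŝ(t) − (n−1)·Ŝ₋ᵢ(t), can be sketched as follows. This is a minimal illustration, not the study's implementation; real applications would use an established Kaplan-Meier routine.

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t); events: 1 = death, 0 = censored.
    At tied times, deaths are handled before censorings."""
    order = sorted(range(len(times)), key=lambda i: (times[i], 1 - events[i]))
    s, at_risk = 1.0, len(times)
    for i in order:
        if times[i] > t:
            break
        if events[i]:
            s *= 1.0 - 1.0 / at_risk
        at_risk -= 1
    return s

def pseudo_values(times, events, t):
    """Jackknife pseudo values: n*S(t) - (n-1)*S_-i(t) per subject."""
    n = len(times)
    s_full = km_survival(times, events, t)
    out = []
    for i in range(n):
        s_loo = km_survival(times[:i] + times[i + 1:],
                            events[:i] + events[i + 1:], t)
        out.append(n * s_full - (n - 1) * s_loo)
    return out
```

Each subject, censored or not, thereby receives a single outcome value at the time horizon of interest, which standard regression and machine-learning techniques can model directly.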
Of a total of 1282 patients, 986 patients died during a median follow-up of 66 months (60-month survival: 52% [95% CI: 50%−55%]). The LR model had the highest optimism corrected AUC (0.791) to predict 60-month survival, followed by the SVM model with a linear kernel (AUC 0.787). The GLM model had the smallest optimism corrected RMSE when continuous pseudo values were considered for 60-month survival or the estimated survival time followed by SVM models with a linear kernel. The estimated importance of predictors varied substantially by the specific aspect of survival studied and modeling technique used.
The use of pseudo values makes it readily possible to apply alternative modeling techniques to survival problems, to compare their performance, and to search further for promising alternative techniques for analyzing survival time.
We aimed to determine the validity of two risk scores for patients with non-muscle invasive bladder cancer in different European settings, in patients with primary tumours.
We included 1,892 patients with primary stage Ta or T1 non-muscle invasive bladder cancer who underwent a transurethral resection in Spain (n = 973), the Netherlands (n = 639), or Denmark (n = 280). We evaluated recurrence-free survival and progression-free survival according to the European Organisation for Research and Treatment of Cancer (EORTC) and the Spanish Urological Club for Oncological Treatment (CUETO) risk scores for each patient and used the concordance index (c-index) to indicate discriminative ability.
The 3 cohorts were comparable with respect to age and sex, but the Danish cohort included a larger proportion of patients with high stage and grade at diagnosis (p < 0.01). At least one recurrence occurred in 839 (44%) patients and 258 (14%) patients had a progression during a median follow-up of 74 months. Patients from Denmark had the highest 10-year recurrence and progression rates (75% and 24%, respectively), whereas patients from Spain had the lowest rates (34% and 10%, respectively). The EORTC and CUETO risk scores both predicted progression better than recurrence, with c-indices ranging from 0.72 to 0.82 for progression versus 0.55 to 0.61 for recurrence.
The EORTC and CUETO risk scores can reasonably predict progression, while prediction of recurrence is more difficult. New prognostic markers are needed to better predict recurrence of tumours in primary non-muscle invasive bladder cancer patients.
According to population-based cohort studies, CT coronary calcium score (CTCS), carotid intima-media thickness (cIMT), high-sensitivity C-reactive protein (CRP), and ankle-brachial index (ABI) are promising novel risk markers for improving cardiovascular risk assessment. However, their impact in the U.S. general population is uncertain. Our aim was to estimate the predictive value of these four novel cardiovascular risk markers for the U.S. general population.
Methods and Findings
Risk profiles, CRP and ABI data of 3,736 asymptomatic subjects aged 40 or older from the National Health and Nutrition Examination Survey (NHANES) 2003–2004 exam were used along with predicted CTCS and cIMT values. For each subject, we calculated 10-year cardiovascular risks with and without each risk marker. Event rates adjusted for competing risks were obtained by microsimulation. We assessed the impact of updated 10-year risk scores by reclassification and C-statistics. In the study population (mean age 56±11 years, 48% male), 70% (80%) were at low (<10%), 19% (14%) at intermediate (≥10–<20%), and 11% (6%) at high (≥20%) 10-year CVD (CHD) risk. Net reclassification improvement was highest after updating 10-year CVD risk with CTCS: 0.10 (95%CI 0.02–0.19). The C-statistic for 10-year CVD risk increased from 0.82 by 0.02 (95%CI 0.01–0.03) with CTCS. Reclassification occurred most often in those at intermediate risk: with CTCS, 36% (38%) moved to low and 22% (30%) to high CVD (CHD) risk. Improvements with other novel risk markers were limited.
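The categorical Net Reclassification Improvement used above can be sketched as follows: subjects are assigned to fixed risk categories under the old and updated model, and upward and downward moves are credited separately in events and nonevents. This is a minimal sketch with hypothetical inputs, not the study's code.

```python
def categorical_nri(y, risk_old, risk_new, thresholds):
    """Net Reclassification Improvement over fixed risk categories:
    NRI = P(up|event) - P(down|event) + P(down|nonevent) - P(up|nonevent)."""
    def category(r):
        return sum(r >= t for t in thresholds)
    up_e = down_e = up_ne = down_ne = 0
    n_events = sum(y)
    n_nonevents = len(y) - n_events
    for yi, ro, rn in zip(y, risk_old, risk_new):
        old, new = category(ro), category(rn)
        if yi:
            up_e += new > old
            down_e += new < old
        else:
            up_ne += new > old
            down_ne += new < old
    return (up_e - down_e) / n_events + (down_ne - up_ne) / n_nonevents
```

With the 10% and 20% cut-offs used in the study, a positive NRI indicates that adding the marker moves events toward higher and nonevents toward lower risk categories on balance.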
Only CTCS appeared to have significant incremental predictive value in the U.S. general population, especially in those at intermediate risk. In future research, cost-effectiveness analyses should be considered for evaluating novel cardiovascular risk assessment strategies.
This multicenter study examines the performance of the Manchester Triage System (MTS) after changing discriminators, and with the additional use of abnormal vital signs, in patients presenting to pediatric emergency departments (EDs).
Design: International multicenter study.
Setting: EDs of two hospitals in the Netherlands (2006–2009), one in Portugal (November–December 2010), and one in the UK (June–November 2010).
Patients: Children (<16 years) who presented at the ED and were triaged with the MTS.
Changes to discriminators (MTS 1) and the value of including abnormal vital signs (MTS 2) were studied to test whether these modifications would decrease the number of incorrect urgency assignments. Hospital admission rates under the modified versions were compared with those under the original MTS. Likelihood ratios, diagnostic odds ratios (DORs), and c-statistics were calculated as measures of performance and compared with the original MTS. To calculate likelihood ratios and DORs, the MTS had to be dichotomized into low-urgent and high-urgent categories.
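For a triage scale dichotomized into high urgent versus low urgent, with hospital admission as the reference outcome, the likelihood ratios and DOR follow directly from the 2x2 table. A minimal sketch (the counts in the test are hypothetical, not study data):

```python
def triage_performance(tp, fp, fn, tn):
    """Likelihood ratios and diagnostic odds ratio for a dichotomized
    triage system: 'test positive' = high urgent, reference outcome =
    hospital admission."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1.0 - spec)   # positive likelihood ratio
    lr_neg = (1.0 - sens) / spec   # negative likelihood ratio
    dor = lr_pos / lr_neg          # equals (tp * tn) / (fp * fn)
    return sens, spec, lr_pos, lr_neg, dor
```

The DOR summarizes both likelihood ratios in a single number, which is why an increase from 4.8 to 6.2 reflects better overall separation of admitted from non-admitted children.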
A total of 60,375 patients were included, of whom 13% were admitted. When MTS 1 was used, admission to hospital increased from 25% to 29% among MTS ‘very urgent’ patients and remained similar in lower MTS urgency levels. The diagnostic odds ratio improved from 4.8 (95%CI 4.5–5.1) to 6.2 (95%CI 5.9–6.6) and the c-statistic remained 0.74. MTS 2 did not improve the performance of the MTS.
MTS 1 performed slightly better than the original MTS. The use of vital signs (MTS 2) did not improve the MTS performance.
The discriminative ability of a risk model is often measured by Harrell’s concordance-index (c-index). The c-index estimates for two randomly chosen subjects the probability that the model predicts a higher risk for the subject with poorer outcome (concordance probability). When data are clustered, as in multicenter data, two types of concordance are distinguished: concordance in subjects from the same cluster (within-cluster concordance probability) and concordance in subjects from different clusters (between-cluster concordance probability). We argue that the within-cluster concordance probability is most relevant when a risk model supports decisions within clusters (e.g. who should be treated in a particular center). We aimed to explore different approaches to estimate the within-cluster concordance probability in clustered data.
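The concordance probability described above can be estimated by comparing all usable pairs of subjects; a minimal pairwise sketch of Harrell's c-index for right-censored data (the inputs in the test are hypothetical):

```python
def harrell_c_index(times, events, risk):
    """Harrell's c-index for right-censored data: among usable pairs,
    the fraction where higher predicted risk accompanies the earlier
    observed event (ties in predicted risk count as 0.5)."""
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = (i, j) if times[i] < times[j] else (j, i)
            # pair usable only if the earlier time is an observed event
            if times[a] == times[b] or not events[a]:
                continue
            usable += 1
            if risk[a] > risk[b]:
                concordant += 1.0
            elif risk[a] == risk[b]:
                concordant += 0.5
    return concordant / usable
```

For the within-cluster concordance probability, this computation is restricted to pairs of subjects from the same cluster, i.e. the c-index is computed per center.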
We used data of the CRASH trial (2,081 patients clustered in 35 centers) to develop a risk model for mortality after traumatic brain injury. To assess the discriminative ability of the risk model within centers we first calculated cluster-specific c-indexes. We then pooled the cluster-specific c-indexes into a summary estimate with different meta-analytical techniques. We considered fixed effect meta-analysis with different weights (equal; inverse variance; number of subjects, events or pairs) and random effects meta-analysis. We reflected on pooling the estimates on the log-odds scale rather than the probability scale.
The cluster-specific c-index varied substantially across centers (IQR 0.70–0.81; I² = 0.76, 95% confidence interval 0.66 to 0.82). Summary estimates resulting from fixed effect meta-analysis ranged from 0.75 (equal weights) to 0.84 (inverse variance weights). With random effects meta-analysis – accounting for the observed heterogeneity in c-indexes across clusters – we estimated a mean of 0.77, a between-cluster variance of 0.0072 and a 95% prediction interval of 0.60 to 0.95. The normality assumptions for derivation of a prediction interval were better met on the probability scale than on the log-odds scale.
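Random effects pooling of cluster-specific c-indexes can be sketched with the DerSimonian-Laird estimator, which is one common choice (the study does not specify its estimator, and the values in the test are hypothetical):

```python
import math

def dersimonian_laird(estimates, variances):
    """Random-effects pooling (DerSimonian-Laird) of cluster-specific
    estimates, e.g. per-center c-indexes with their sampling variances.
    Returns (pooled mean, between-cluster variance tau^2, standard error)."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * e for wi, e in zip(w_star, estimates)) / sum(w_star)
    return mean, tau2, math.sqrt(1.0 / sum(w_star))
```

The estimated between-cluster variance tau² is what widens a prediction interval for a new cluster relative to the confidence interval for the pooled mean.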
When assessing the discriminative ability of risk models used to support decisions at cluster level we recommend meta-analysis of cluster-specific c-indexes. Particularly, random effects meta-analysis should be considered.
Clustered data; Concordance; Discrimination; Meta-analysis; Prediction; Risk model