Prediction models are developed to aid health care providers in estimating the probability or risk that a specific disease or condition is present (diagnostic models) or that a specific event will occur in the future (prognostic models), to inform their decision making. However, the overwhelming evidence shows that the quality of reporting of prediction model studies is poor. Only with full and clear reporting of information on all aspects of a prediction model can risk of bias and potential usefulness of prediction models be adequately assessed. The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Initiative developed a set of recommendations for the reporting of studies developing, validating, or updating a prediction model, whether for diagnostic or prognostic purposes. This article describes how the TRIPOD Statement was developed. An extensive list of items based on a review of the literature was created, which was reduced after a Web-based survey and revised during a 3-day meeting in June 2011 with methodologists, health care professionals, and journal editors. The list was refined during several meetings of the steering group and in e-mail discussions with the wider group of TRIPOD contributors. The resulting TRIPOD Statement is a checklist of 22 items, deemed essential for transparent reporting of a prediction model study. The TRIPOD Statement aims to improve the transparency of the reporting of a prediction model study regardless of the study methods used. The TRIPOD Statement is best used in conjunction with the TRIPOD explanation and elaboration document. To aid the editorial process and readers of prediction model studies, it is recommended that authors include a completed checklist in their submission (also available at www.tripod-statement.org).
Editors’ note: In order to encourage dissemination of the TRIPOD Statement, this article is freely accessible on the Annals of Internal Medicine Web site (www.annals.org) and will also be published in BJOG, British Journal of Cancer, British Journal of Surgery, BMC Medicine, British Medical Journal, Circulation, Diabetic Medicine, European Journal of Clinical Investigation, European Urology, and Journal of Clinical Epidemiology. The authors jointly hold the copyright of this article. An accompanying Explanation and Elaboration article is freely available only on www.annals.org; Annals of Internal Medicine holds copyright for that article.
Prediction models; Prognostic; Diagnostic; Model development; Validation; Transparency; Reporting
Different methods of evaluating diagnostic performance when comparing diagnostic tests may lead to different results. We compared two such approaches, sensitivity and specificity versus the area under the receiver operating characteristic curve (ROC AUC), for the evaluation of CT colonography for the detection of polyps, either with or without computer-assisted detection.
In a multireader, multicase study of 10 readers and 107 cases, we compared sensitivity and specificity, using radiological reporting of the presence or absence of polyps, with ROC AUC calculated from confidence scores concerning the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, so that differences in results could be attributed to the statistical methods rather than to the readers.
Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. There were several problems with the use of confidence scores: in assigning scores to all cases; in the use of zero scores when no polyps were identified; in the bimodal, non-normal distribution of scores; in fitting ROC curves, owing to extrapolation beyond the study data; and in the undue influence of a few false positive results. Variation due to the use of different ROC methods exceeded the differences between test results for ROC AUC.
The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies using confidence scores. We found sensitivity and specificity to be a more reliable and clinically appropriate method for comparing these diagnostic tests.
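To make the contrast between the two approaches concrete, the following minimal Python sketch computes sensitivity and specificity from binary reports and ROC AUC from per-case confidence scores for a single hypothetical reader. All data are synthetic and scikit-learn is assumed; nothing here reproduces the study's figures.

```python
# Minimal sketch contrasting the two evaluation approaches on synthetic data;
# none of these numbers come from the study, and scikit-learn is assumed.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Reference standard: polyp present (1) or absent (0) for 107 hypothetical cases.
truth = rng.integers(0, 2, size=107)

# Binary radiological reports, correct roughly 85% of the time.
reports = np.where(rng.random(107) < 0.85, truth, 1 - truth)

# Sensitivity and specificity need only the binary present/absent reports.
sensitivity = np.mean(reports[truth == 1] == 1)
specificity = np.mean(reports[truth == 0] == 0)

# ROC AUC instead requires a per-case confidence score; here we fabricate
# bimodal scores with many zeros for "no polyp seen", mimicking the
# distributional problem the abstract describes.
scores = np.where(reports == 1, rng.uniform(50, 100, size=107), 0.0)
auc = roc_auc_score(truth, scores)

print(f"sensitivity={sensitivity:.2f}  specificity={specificity:.2f}  AUC={auc:.2f}")
```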
Carl Moons and colleagues provide a checklist and background explanation for critically appraising and extracting data from systematic reviews of prognostic and diagnostic prediction modelling studies.
Although pancreatic cancer is rare, its prognosis is very poor; it is a major cause of cancer mortality, ranked fourth in the world, and has one of the worst survival rates of any cancer.
To evaluate the performance of QCancer® (Pancreas) for predicting the absolute risk of pancreatic cancer in an independent UK cohort of patients, from general practice records.
Design and setting
Prospective cohort study to evaluate the performance of the QCancer (Pancreas) prediction models in 364 UK general practices contributing to The Health Improvement Network (THIN) database.
Records were extracted from the THIN database for 2.15 million patients registered with a general practice surgery between 1 January 2000 and 30 June 2008, aged 30–84 years (3.74 million person-years), with 618 pancreatic cancer cases. Pancreatic cancer was defined as an incident diagnosis of pancreatic cancer during the 2 years after study entry.
The results from this independent and external validation of QCancer (Pancreas) demonstrated good performance in a large cohort of general practice patients. QCancer (Pancreas) had very good discrimination, with areas under the receiver operating characteristic curve of 0.89 and 0.92 for females and males, respectively. QCancer (Pancreas) explained 60% and 67% of the variation in females and males, respectively. QCancer (Pancreas) over-predicted risk in both females and males, notably in older patients.
QCancer (Pancreas) is potentially useful for identifying undetected cases of pancreatic cancer in primary care in the UK.
pancreatic cancer; primary care; risk prediction; validation
The informed consent process is the legal embodiment of the fundamental right of the individual to make decisions affecting his or her health. Because the patient's permission is a crucial form of respect for freedom and dignity, it becomes extremely important to enhance the patient's understanding and recall of the information given by the physician. This statement acquires additional weight when the proposed medical treatment can potentially be detrimental or even fatal. This is the case for thalassemia patients in class 3 of the Pesaro classification, for whom allogeneic hematopoietic stem cell transplantation (HSCT) remains the only potentially curative treatment. Unfortunately, this kind of intervention is burdened by an elevated transplantation-related mortality risk (TRM: all deaths considered related to transplantation), equal to 30% according to published reports. In thalassemia, the role of the patient in the informed consent process leading up to HSCT has not been fully investigated. This study investigated the hypothesis that the information provided by physicians in the medical scenario of HSCT is not fully understood by patients, and that misunderstanding and communication biases may affect the clinical decision-making process.
A questionnaire was either mailed or given personally to 25 patients. A second questionnaire was administered to the 12 physicians attending the patients enrolled in this study. Descriptive statistics were used to evaluate the communication factors.
The results pointed out the difference between the risks communicated by physicians and the risks perceived by patients. The study also highlighted the gap between the mortality risk considered acceptable by patients and that considered acceptable by physicians.
Several solutions have been suggested to reduce the gap between communicated and perceived data; a multidisciplinary approach may help to attenuate some aspects of communication bias, and several tools have been proposed to narrow the gap. The most important tool, however, is the ability of the physician to comprehend the proper place of conscious consent in the relationship with the patient.
Patient-doctor relationship; Communication bias; Conscious consent
Before considering whether to use a multivariable (diagnostic or prognostic) prediction model, it is essential that its performance be evaluated in data that were not used to develop the model (referred to as external validation). We critically appraised the methodological conduct and reporting of external validation studies of multivariable prediction models.
We conducted a systematic review of articles describing some form of external validation of one or more multivariable prediction models indexed in PubMed core clinical journals published in 2010. Data on study design, sample size, handling of missing data, reference to the original study developing the prediction models, and predictive performance measures were extracted in duplicate.
11,826 articles were identified and 78 were included for full review, describing the evaluation of 120 prediction models in participant data that were not used to develop the model. Thirty-three articles described both the development of a prediction model and an evaluation of its performance on a separate dataset, and 45 articles described only the evaluation of an existing published prediction model on another dataset. Fifty-seven percent of the prediction models were presented and evaluated as simplified scoring systems. Sixteen percent of articles failed to report the number of outcome events in the validation datasets. Fifty-four percent of studies made no explicit mention of missing data. Sixty-seven percent did not report evaluating model calibration, whilst most studies evaluated model discrimination. It was often unclear whether the reported performance measures were for the full regression model or for the simplified models.
The vast majority of studies describing some form of external validation of a multivariable prediction model were poorly reported, with key details frequently not presented. The validation studies were characterised by poor design, inappropriate handling and acknowledgement of missing data, and omission of calibration, one of the key performance measures of prediction models, from the publication. It may therefore not be surprising that an overwhelming majority of developed prediction models are not used in practice, when there is a dearth of well-conducted and clearly reported (external validation) studies describing their performance on independent participant data.
We previously developed and validated the Oxford NOTECHS rating system for evaluating the non-technical skills of an entire operating theatre team. Experience with the scale identified the need for greater discrimination between levels of performance within the normal range. We report here the development of a modified scale (Oxford NOTECHS II) to facilitate this. The new measure uses an eight-point instead of a four-point scale to measure each dimension of non-technical skills, and begins with a default rating of 6 for each element. We evaluated this new scale in 297 operations at five NHS sites in four surgical specialities. Measures of theatre process reliability (glitch count) and compliance with the WHO surgical safety checklist were scored contemporaneously, and relationships with Oxford NOTECHS II scores explored.
The mean team Oxford NOTECHS II score was 73.39 (range 37–92). Mean scores for the surgical, anaesthetic and nursing sub-teams were 24.61 (IQR 23, 27), 24.22 (IQR 23, 26) and 24.55 (IQR 23, 26), respectively. Oxford NOTECHS II showed good inter-rater reliability between human factors and clinical observers in each of the four domains. Teams with high WHO compliance had higher mean Oxford NOTECHS II scores (74.5) than those with low compliance (71.1) (p = 0.010). We observed only a weak correlation between Oxford NOTECHS II scores and glitch count; r = −0.26 (95% CI −0.36 to −0.15). Oxford NOTECHS II scores did not vary significantly between the five hospital sites, but a significant difference was seen between specialities (p = 0.001).
Oxford NOTECHS II provides good discrimination between teams while retaining reliability and correlation with other measures of teamwork performance, and is not confounded by technical performance. It is therefore suitable for combined use with a technical performance scale to provide a global description of operating theatre team performance.
To measure Irish opinion on a range of assisted human reproduction (AHR) treatments.
A nationally representative sample of Irish adults (n=1,003) was anonymously surveyed by telephone.
Most participants (77%) agreed that any fertility services offered internationally should also be available in Ireland, although only a small minority of the general Irish population had personal familiarity with AHR or infertility. There was substantial agreement (63%) that the Government of Ireland should introduce legislation covering AHR. Support for gamete donation in Ireland ranged from 53% to 83%, depending on how donor privacy and disclosure policies were presented. For example, donation where the donor agrees to be contacted by the child born following donation, and anonymous donation where donor privacy is completely protected by law, were supported by 68% and 66% of respondents, respectively. The least popular (53%) donor gamete treatment type was donation where the donor consents to be involved in the future life of any child born as a result of donor fertility treatment. Respondents in social class ABC1 (58%), aged 18 to 24 (62%), aged 25 to 34 (60%), or without children (61%) were more likely to favour this donor treatment policy.
This is the first nationwide assessment of Irish public opinion on the advanced reproductive technologies since 2005. Access to a wide range of AHR treatment was supported by all subgroups studied. Public opinion concerning specific types of AHR treatment varied, yet general support for the need for national AHR legislation was reported by 63% of this national sample. Contemporary views on AHR remain largely consistent with the Commission for Assisted Human Reproduction recommendations from 2005, although further research is needed to clarify exactly how popular opinion on these issues has changed. It appears that legislation allowing for the full range of donation options (and not mandating disclosure of donor identity at a stipulated age) would better align with current Irish public opinion.
Assisted fertility; Legislation; Public policy; In vitro fertilization; Ireland
To develop a sensitive, reliable tool for enumerating and evaluating technical process imperfections during surgical operations.
Prospective cohort study with direct observation.
Operating theatres on five sites in three National Health Service Trusts.
Staff taking part in elective and emergency surgical procedures in orthopaedics, trauma, vascular and plastic surgery; including anaesthetists, surgeons, nurses and operating department practitioners.
Reliability and validity of the glitch count method; frequency, type, temporal pattern and rate of glitches in relation to site and surgical specialty.
The glitch count has construct and face validity, and category agreement between observers is good (κ=0.7). Redundancy between pairs of observers significantly improves the sensitivity over a single observation. In total, 429 operations were observed and 5742 glitches were recorded (mean 14 per operation, range 0–83). Specialty-specific glitch rates varied from 6.9 to 8.3/h of operating (ns). The distribution of glitch categories was strikingly similar across specialties, with distractions the commonest type in all cases. The difference in glitch rate between specialty teams operating at different sites was larger than that between specialties (range 6.3–10.5/h, p<0.001). Forty per cent of glitches occurred in the first quarter of an operation, and only 10% occurred in the final quarter.
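The category agreement quoted above (κ=0.7) is a chance-corrected kappa statistic. As a hedged illustration of how such a figure is computed from two observers' glitch-category labels (the labels below are invented; scikit-learn is assumed):

```python
# Chance-corrected agreement (Cohen's kappa) between two observers'
# glitch-category labels. The labels are invented for illustration only;
# "distraction" echoes the commonest category named in the abstract.
from sklearn.metrics import cohen_kappa_score

observer_a = ["distraction", "equipment", "distraction", "planning", "equipment"]
observer_b = ["distraction", "equipment", "planning",    "planning", "equipment"]

kappa = cohen_kappa_score(observer_a, observer_b)
print(f"kappa = {kappa:.2f}")  # ~0.71 on this toy data; 1.0 would be perfect agreement
```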
The glitch method allows collection of a rich dataset suitable for analysing the changes following interventions to improve process safety, and appears reliable and sensitive. Glitches occur more frequently in the early stages of an operation. Hospital environment, culture and work systems may influence the operative process more strongly than the specialty.
SURGERY; patient safety; quality improvement; process of care
For a randomized trial, the primary publication is usually the one which reports the results of the primary outcome and provides consolidated data from all study centers. Other aspects of a randomized trial’s findings (that is, non-primary results) are often reported in subsequent publications.
We carried out a cross-sectional review of the characteristics and type of information reported in non-primary reports (n = 69) of randomized trials (indexed in PubMed core clinical journals in 2009) and whether they report pre-specified or exploratory analyses. We also compared consistency of information in non-primary publications with that reported in the primary publication.
The majority (n = 56; 81%) of non-primary publications reported on large, multicenter trials and were published in specialty journals. Most reported subgroup analyses (n = 27; 39%), analyzing a specific subgroup of patients from the randomized trial, or secondary outcomes (n = 29; 42%); 19% (n = 13) reported extended follow-up. Less than half reported details of trial registration (n = 30; 43%) or the trial protocol (n = 27; 39%), and in 41% (n = 28) it was unclear from reading the abstract that the report was not the primary publication for the trial. Non-primary publications often analyzed and reported multiple different outcomes (16% reported >20 outcomes), and in 10% (n = 7) it was unclear how many outcomes had actually been assessed; in 42% (n = 29) it was unclear whether the reported analyses were pre-specified or exploratory. Only 39% (n = 27) of non-primary publications described the primary outcome of the randomized trial, 6% (n = 4) reported its numerical results, and 9% (n = 6) gave details of how participants were randomized.
Non-primary publications often lack important information about the randomized trial, the type of analyses conducted, and whether these were pre-specified or exploratory, which readers need in order to accurately identify the report and assess the validity and reliability of the study findings. We provide recommendations for what information authors should include in non-primary reports of randomized trials.
Randomized controlled trial; Non-primary publication; Subgroup analyses; Secondary outcomes
Chronic kidney disease is a major health concern that, if left untreated, may progress to end-stage kidney failure (ESKF). Identifying individuals at an increased risk of kidney disease and who might benefit from a therapeutic or preventive intervention is an important challenge.
To evaluate the performance of the QKidney® scores for predicting 5-year risk of developing moderate-severe kidney disease and ESKF in an independent UK cohort of patients from general practice records.
Design and setting
Prospective cohort study to evaluate the performance of two risk scores for kidney disease in 364 UK general practices contributing to The Health Improvement Network (THIN) database.
Data were obtained from 1.6 million patients registered with a general practice surgery between 1 January 2002 and 1 July 2008, aged 35–74 years, with 43 186 incident cases of moderate-severe kidney disease and 2663 incident cases of ESKF. Outcomes were the first recorded evidence of moderate-severe chronic kidney disease and of ESKF in the general practice records.
The results from this independent and external validation of the QKidney scores indicate that both scores performed well for both moderate-severe kidney disease and ESKF in a large cohort of general practice patients. Discrimination and calibration statistics were better for the models including serum creatinine; however, there were considerable amounts of missing data for serum creatinine. QKidney scores both with and without serum creatinine were well calibrated.
QKidney scores have been shown to be useful tools to predict the 5-year risk of moderate-severe kidney disease and ESKF in the UK.
kidney disease; primary care; risk prediction; validation
Chlorthalidone (CTD) reduces 24-hour blood pressure more effectively than hydrochlorothiazide (HCTZ), but whether this influences electrocardiographic left ventricular hypertrophy (LVH) is uncertain. One source of comparative data is the Multiple Risk Factor Intervention Trial (MRFIT), which randomly assigned 8,012 hypertensive men to special intervention (SI) or usual care (UC). SI participants could use CTD or HCTZ initially; previous analyses have grouped clinics by their main diuretic used (C-clinics: CTD; H-clinics: HCTZ). After 48 months, SI participants receiving HCTZ were recommended to switch to CTD, in part, because higher mortality was observed for SI compared to UC participants in H-clinics, while the opposite was found in C-clinics. In this analysis, we examined change in continuous measures of electrocardiographic LVH using both an ecologic analysis by previously-reported C- or H-clinic groupings, and an individual participant analysis where use of CTD or HCTZ by SI participants was considered and updated annually. Through 48 months, differences between SI and UC in LVH were larger for C-clinics compared to H-clinics (Sokolow-Lyon: −93.9 vs −54.9 μV, P=0.049; Cornell voltage: −68.1 vs −35.9 μV, P=0.019; Cornell voltage product: −4.6 vs −2.2 μV/ms, P=0.071; left ventricular mass: −4.4 vs −2.8 gm, P=0.002). At the individual participant level, Sokolow-Lyon and left ventricular mass were significantly lower for SI men receiving CTD compared to HCTZ through 48 months and 84 months of follow-up. Our findings on LVH support the idea that greater blood pressure reduction with CTD than HCTZ may have led to differences in mortality observed in MRFIT.
hydrochlorothiazide; chlorthalidone; left ventricular hypertrophy; hypertension; blood pressure; electrocardiography
During in vitro fertilization (IVF), fertility patients are expected to self-administer many injections. While newer medications have been developed to substantially reduce the number of these injections, such agents are typically much more expensive. Given these differences in both cost and number of injections, this study compared patient preferences between GnRH-agonist and GnRH-antagonist based protocols in IVF.
Data were collected by voluntary, anonymous questionnaire at first consultation appointment. Patient opinion concerning total number of s.c. injections as a function of non-reimbursed patient cost associated with GnRH-agonist [A] and GnRH-antagonist [B] protocols in IVF was studied.
Completed questionnaires (n = 71) revealed a mean ± SD patient age of 34 ± 4.1 years. Most (83.1%) had no prior IVF experience; 2.8% reported another medical condition requiring self-administration of subcutaneous medication(s). When out-of-pocket costs for [A] and [B] were identical, 50.7% of patients preferred [B]. The tendency to favor protocol [B] was weaker among patients with a health occupation. Estimated patient costs for [A] and [B] were $259.82 ± 11.75 and $654.55 ± 106.34, respectively (p < 0.005). Measured patient preference for [B] diminished as the cost difference increased.
This investigation found consistently higher non-reimbursed direct medication costs for GnRH-antagonist IVF protocols than for GnRH-agonist protocols. A conditional preference to minimize downregulation (using GnRH-antagonist) was noted among some, but not all, IVF patient sub-groups. The preference for GnRH-antagonist was weaker among IVF patients with a health occupation than among other patients. While reducing the total number of injections by using GnRH-antagonist is a desirable goal, this advantage is not perceived equally by all IVF patients, and its utility is likely discounted heavily when non-reimbursed medication costs reach a critical level.
GnRH-antagonist; IVF; Preference; Patient cost; Health economics
Thalassemia is a common disorder worldwide, with a predominant incidence in Mediterranean countries, North Africa, the Middle East, India, Central Asia, and Southeast Asia. Whilst substantial progress has been made towards the improvement of health-related quality of life (HRQoL) in western countries, scarce evidence-based data exist on the HRQoL of thalassemia children and adolescents living in developing countries.
We studied 60 thalassemia children from Middle Eastern countries with a median age of 10 years (range 5 to 17 years). HRQoL was assessed with the Pediatric Quality of Life Inventory (PedsQL) 4.0. The questionnaire was completed at baseline by all patients and their parents. The agreement between child-self and parent-proxy HRQoL reports and the relationship between HRQoL profiles and socio-demographic and clinical factors were investigated.
The scores of parents were generally lower than those of their children for Emotional Functioning (mean 75 vs 85; p = 0.002), the Psychosocial Health Summary (mean 70.3 vs 79.1; p = 0.015) and the Total Summary Score (mean 74.3 vs 77.7; p = 0.047). HRQoL was not associated with ferritin levels, hepatomegaly or frequency of transfusions or iron chelation therapy. Multivariate analysis showed that a delayed start of iron chelation had a negative impact on total PedsQL scores of both children (p = 0.046) and their parents (p = 0.007).
The PedsQL 4.0 is a useful tool for the measurement of HRQoL in pediatric thalassemia patients. This study shows that delayed start of iron chelation has a negative impact on children’s HRQoL.
Quality of life; Thalassemia; PEDsQL 4.0
During IVF, non-transferred embryos are usually selected for cryopreservation on the basis of morphological criteria. This investigation evaluated an application of array comparative genomic hybridization (aCGH) in the assessment of surplus embryos prior to cryopreservation.
First-time IVF patients undergoing elective single embryo transfer and having at least one extra non-transferred embryo suitable for cryopreservation were offered enrollment in the study. Patients were randomized into two groups: Patients in group A (n=55) had embryos assessed first by morphology and then by aCGH, performed on cells obtained from trophectoderm biopsy on post-fertilization day 5. Only euploid embryos were designated for cryopreservation. Patients in group B (n=48) had embryos assessed by morphology alone, with only good morphology embryos considered suitable for cryopreservation.
Among biopsied embryos in group A (n=425), euploidy was confirmed in 226 (53.1%). After fresh single embryo transfer, 64 (28.3%) surplus euploid embryos were cryopreserved for 51 patients (92.7%). In group B, 389 good morphology blastocysts were identified and a single top quality blastocyst was selected for fresh transfer. All group B patients (48/48) had at least one blastocyst remaining for cryopreservation. A total of 157 (40.4%) blastocysts were frozen in this group, a significantly larger proportion than was cryopreserved in group A (p=0.017, by chi-squared analysis).
While aCGH and subsequent frozen embryo transfer are currently used to screen embryos, this is the first investigation to quantify the impact of aCGH specifically on embryo cryopreservation. Incorporation of aCGH screening significantly reduced the total number of cryopreserved blastocysts compared to when suitability for freezing was determined by morphology only. IVF patients should be counseled that the benefits of aCGH screening will likely come at the cost of sharply limiting the number of surplus embryos available for cryopreservation.
Fertilization in vitro; Comparative genomic hybridization; Preimplantation genetic diagnosis; Cryopreservation
Single embryo transfer (SET) remains underutilized as a strategy to reduce multiple gestation risk in IVF, and its overall lower pregnancy rate underscores the need for improved techniques to select one embryo for fresh transfer. This study explored use of comprehensive chromosomal screening by array CGH (aCGH) to provide this advantage and improve pregnancy rate from SET.
First-time IVF patients with a good prognosis (age <35, no prior miscarriage) and normal karyotype seeking elective SET were prospectively randomized into two groups: In Group A, embryos were selected on the basis of morphology and comprehensive chromosomal screening via aCGH (from d5 trophectoderm biopsy) while Group B embryos were assessed by morphology only. All patients had a single fresh blastocyst transferred on d6. Laboratory parameters and clinical pregnancy rates were compared between the two groups.
For patients in Group A (n = 55), 425 blastocysts were biopsied and analyzed via aCGH (7.7 blastocysts/patient). Aneuploidy was detected in 191/425 (44.9%) of blastocysts in this group. For patients in Group B (n = 48), 389 blastocysts were microscopically examined (8.1 blastocysts/patient). Clinical pregnancy rate was significantly higher in the morphology + aCGH group compared to the morphology-only group (70.9 and 45.8%, respectively; p = 0.017); ongoing pregnancy rate for Groups A and B were 69.1 vs. 41.7%, respectively (p = 0.009). There were no twin pregnancies.
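The reported p = 0.017 is consistent with a chi-squared test with continuity correction applied to counts back-calculated from the percentages (39/55 = 70.9% and 22/48 = 45.8%). A sketch under that assumption, using SciPy:

```python
# Chi-squared comparison of clinical pregnancy rates between the two groups.
# Counts are back-calculated from the reported percentages (39/55 = 70.9%,
# 22/48 = 45.8%), so this is a reconstruction rather than source data.
from scipy.stats import chi2_contingency

table = [[39, 55 - 39],   # Group A (morphology + aCGH): pregnant, not pregnant
         [22, 48 - 22]]   # Group B (morphology only):   pregnant, not pregnant

chi2, p, dof, expected = chi2_contingency(table)  # Yates correction applied for 2x2 tables
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")          # p ~ 0.017, matching the abstract
```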
Although aCGH followed by frozen embryo transfer has been used to screen at risk embryos (e.g., known parental chromosomal translocation or history of recurrent pregnancy loss), this is the first description of aCGH fully integrated with a clinical IVF program to select single blastocysts for fresh SET in good prognosis patients. The observed aneuploidy rate (44.9%) among biopsied blastocysts highlights the inherent imprecision of SET when conventional morphology is used alone. Embryos randomized to the aCGH group implanted with greater efficiency, resulted in clinical pregnancy more often, and yielded a lower miscarriage rate than those selected without aCGH. Additional studies are needed to verify our pilot data and confirm a role for on-site, rapid aCGH for IVF patients contemplating fresh SET.
Reporting of the flow of participants through each stage of a randomized trial is essential to assess the generalisability and validity of its results. We assessed the type and completeness of information reported in CONSORT (Consolidated Standards of Reporting Trials) flow diagrams published in current reports of randomized trials.
A cross-sectional review of all primary reports of randomized trials which included a CONSORT flow diagram indexed in PubMed core clinical journals (2009). We assessed the proportion of parallel group trial publications reporting specific items recommended by CONSORT for inclusion in a flow diagram.
Of 469 primary reports of randomized trials, 263 (56%) included a CONSORT flow diagram, of which 89% (237/263) were published in a CONSORT-endorsing journal. Reports published in CONSORT-endorsing journals were more likely to include a flow diagram (62% (237/380) versus 29% (26/89)). Ninety percent (236/263) of reports that included a flow diagram had a parallel group design, of which 49% (116/236) evaluated drug interventions, 58% (137/236) were multicentre, and 79% (187/236) compared two study groups, with a median sample size of 213 participants. Eighty-one percent (191/236) reported the overall number of participants assessed for eligibility, 71% (168/236) the number excluded prior to randomization, and 98% (231/236) the overall number randomized. Reasons for exclusion prior to randomization were more poorly reported. Ninety-four percent (223/236) reported the number of participants allocated to each arm of the trial. However, only 40% (95/236) reported the number who actually received the allocated intervention, 67% (158/236) the number lost to follow-up in each arm of the trial, 61% (145/236) whether participants discontinued the intervention during the trial, and 54% (128/236) the number included in the main analysis.
Over half of published reports of randomized trials included a diagram showing the flow of participants through the trial. However, information was often missing from published flow diagrams, even in articles published in CONSORT-endorsing journals. If important information is not reported, it can be difficult and sometimes impossible to know whether the conclusions of the trial are justified by the data presented.
To report on relationships among baseline serum anti-Müllerian hormone (AMH) measurements, blastocyst development and other selected embryology parameters observed in non-donor oocyte IVF cycles.
Pre-treatment AMH was measured in patients undergoing IVF (n = 79) and retrospectively correlated with in vitro embryo development noted during culture.
Mean (± SD) age in this study group was 36.3 ± 4.0 years (range 28-45), and mean (± SD) terminal serum estradiol during IVF was 5929 ± 4056 pmol/l. A moderate positive correlation (0.49; 95% CI 0.31 to 0.65) was noted between basal serum AMH and the number of MII oocytes retrieved. Similarly, a moderate positive correlation (0.44; 95% CI 0.24 to 0.61) was observed between serum AMH and the number of early cleavage-stage embryos, suggesting a relationship between serum AMH and embryo development in IVF. Of note, baseline serum AMH levels were significantly different for patients who did and did not undergo blastocyst transfer (15.6 vs. 10.9 pmol/l; p = 0.029).
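Confidence intervals for correlations, such as the 0.49 (95% CI 0.31 to 0.65) above, are commonly derived with Fisher's z-transformation. The sketch below (NumPy assumed; the authors' exact method is not stated) approximately reproduces the reported interval with n = 79:

```python
# Approximate 95% CI for a Pearson correlation via Fisher's z-transformation.
# n = 79 is the study's sample size; the authors' exact method is not stated,
# so this reconstruction is only approximate.
import numpy as np

def correlation_ci(r: float, n: int, z_crit: float = 1.959964) -> tuple[float, float]:
    z = np.arctanh(r)              # Fisher z-transform of the correlation
    se = 1.0 / np.sqrt(n - 3)      # approximate standard error on the z scale
    return float(np.tanh(z - z_crit * se)), float(np.tanh(z + z_crit * se))

lo, hi = correlation_ci(0.49, 79)
print(f"r = 0.49, 95% CI ({lo:.2f}, {hi:.2f})")  # ~(0.30, 0.64), close to the reported (0.31, 0.65)
```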
While serum AMH has found increasing application as a predictor of ovarian reserve before IVF, its role in estimating in vitro embryo morphology and the potential of embryos to advance to the blastocyst stage has not been extensively investigated. These data suggest that baseline serum AMH determinations can help forecast blastocyst development during IVF. Serum AMH measured before treatment may assist patients, clinicians and embryologists as the scheduling of embryo transfer is planned. Additional studies are needed to confirm these correlations and to better define the role of baseline serum AMH level in the prediction of blastocyst formation.
serum AMH; IVF; embryo development; blastocyst transfer
Clinical prediction rules (CPRs) are tools that quantify the contribution of symptoms, clinical signs and available diagnostic tests, and in doing so stratify patients according to the probability of having a target outcome or needing a specified treatment. Most CPR studies focus on the derivation stage, with only a minority progressing to validation and very few undergoing impact analysis. Impact analysis studies remain the most efficient way of assessing whether incorporating CPRs into the decision-making process improves patient care. However, there is a lack of clear methodology for the design of high-quality impact analysis studies.
We have developed a sequential four-phased framework based on the literature and the collective experience of our international working group to help researchers identify and overcome the specific challenges in designing and conducting an impact analysis of a CPR.
There is a need to shift emphasis from deriving new CPRs to validating and implementing existing CPRs. The proposed framework provides a structured approach to this topical and complex area of research.
The World Health Organisation estimates that by 2030 there will be approximately 350 million people with type 2 diabetes. Type 2 diabetes is associated with renal complications, heart disease, stroke and peripheral vascular disease, so the early identification of patients with undiagnosed type 2 diabetes, or of those at an increased risk of developing it, is an important challenge. We sought to systematically review and critically assess the conduct and reporting of methods used to develop risk prediction models for predicting the risk of having undiagnosed (prevalent) or future risk of developing (incident) type 2 diabetes in adults.
We conducted a systematic search of PubMed and EMBASE databases to identify studies published before May 2011 that describe the development of models combining two or more variables to predict the risk of prevalent or incident type 2 diabetes. We extracted key information that describes aspects of developing a prediction model including study design, sample size and number of events, outcome definition, risk predictor selection and coding, missing data, model-building strategies and aspects of performance.
Thirty-nine studies comprising 43 risk prediction models were included. Seventeen studies (44%) reported the development of models to predict incident type 2 diabetes, whilst 15 studies (38%) described the derivation of models to predict prevalent type 2 diabetes. In nine studies (23%), the number of events per variable was less than ten, whilst in fourteen studies there was insufficient information reported for this measure to be calculated. The number of candidate risk predictors ranged from four to sixty-four, and in seven studies it was unclear how many risk predictors were considered. Univariate screening of risk predictors by statistical significance, a method that is not recommended for selecting predictors for inclusion in the multivariable model, was carried out in eight studies (21%), whilst the selection procedure was unclear in ten studies (26%). Twenty-one risk prediction models (49%) were developed by categorising all continuous risk predictors. The treatment and handling of missing data were not reported in 16 studies (41%).
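Events per variable (EPV), the measure referred to above, is simply the number of outcome events divided by the number of candidate predictor parameters, with EPV below ten the conventional warning threshold. A minimal sketch with invented figures:

```python
# Events per variable (EPV): outcome events divided by the number of candidate
# predictor parameters considered during model building. EPV below 10 is the
# conventional warning threshold. The figures below are invented.
def events_per_variable(n_events: int, n_candidate_parameters: int) -> float:
    return n_events / n_candidate_parameters

epv = events_per_variable(n_events=120, n_candidate_parameters=15)
print(f"EPV = {epv:.1f} ({'adequate' if epv >= 10 else 'below the usual threshold of 10'})")
```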
We found widespread use of poor methods that could jeopardise model development, including univariate pre-screening of variables, categorisation of continuous risk predictors and poor handling of missing data. The use of poor methods affects the reliability of the prediction model and ultimately compromises the accuracy of the probability estimates of having undiagnosed type 2 diabetes or the predicted risk of developing type 2 diabetes. In addition, many studies were characterised by a generally poor level of reporting, with many key details needed to objectively judge the usefulness of the models often omitted.
Background & Aims
Liver disease is a major cause of morbidity and mortality among HIV-infected persons. We evaluated the prevalence, etiology, and factors associated with liver dysfunction in patients during the highly active antiretroviral therapy (HAART) era.
We performed tests for liver function (baseline and after a 6-month follow-up period) in HIV-infected patients treated at a large clinic. Comprehensive laboratory and ultrasound analyses were performed. Factors associated with liver test abnormalities were assessed using multivariate logistic regression models.
Eighty of 299 HIV-positive patients (27%) had abnormal liver test results during the 6-month study period. The majority of abnormalities were grade 1. Of those with liver test abnormalities, the most common diagnosis was nonalcoholic fatty liver disease (NAFLD, 30%), followed by excessive alcohol use (13%), chronic hepatitis B (9%), chronic active hepatitis C (5%), and other (hemochromatosis and autoimmune hepatitis, 2%); 8 participants (10%) had more than 1 diagnosis. In total, 39 HIV patients with abnormal liver tests (49%) had a defined underlying liver disease. Despite laboratory tests and ultrasound examination, 41 abnormal liver test results (51%) were unexplained. Multivariate analyses of this group found that increased total cholesterol levels (odds ratio 1.6 per 40 mg/dl increase, p=0.01) were associated with liver abnormalities.
Liver test abnormalities are common among HIV patients during the HAART era. The most common diagnosis was NAFLD. Despite laboratory and radiologic investigations into the cause of liver dysfunction, 51% were unexplained, but might be related to unrecognized fatty liver disease.
HIV; liver test abnormalities; liver disease; NAFLD; antiretroviral medications
Objective To evaluate the performance of the QFractureScores for predicting the 10 year risk of osteoporotic and hip fractures in an independent UK cohort of patients from general practice records.
Design Prospective cohort study.
Setting 364 UK general practices contributing to The Health Improvement Network (THIN) database.
Participants 2.2 million adults registered with a general practice between 27 June 1994 and 30 June 2008, aged 30-85 (13 million person years), with 25 208 osteoporotic fractures and 12 188 hip fractures.
Main outcome measures First (incident) diagnosis of osteoporotic fracture (vertebra, distal radius, or hip) and incident hip fracture recorded in general practice records.
Results Results from this independent and external validation of QFractureScores indicated good performance data for both osteoporotic and hip fracture end points. Discrimination and calibration statistics were comparable to those reported in the internal validation of QFractureScores. The hip fracture score had better performance data than the osteoporotic fracture score, for both women and men. It explained 63% of the variation in women and 60% of the variation in men, with areas under the receiver operating characteristic curve of 0.89 and 0.86, respectively. The risk score for osteoporotic fracture explained 49% of the variation in women and 38% of the variation in men, with corresponding areas under the receiver operating characteristic curve of 0.82 and 0.74. QFractureScores were well calibrated, with predicted risks closely matching observed risks across all tenths of risk and all age groups.
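For readers unfamiliar with these two kinds of performance statistic, the sketch below illustrates discrimination (area under the ROC curve) and calibration across tenths of predicted risk on simulated data. NumPy, pandas and scikit-learn are assumed; none of the data relate to QFractureScores:

```python
# Illustration of discrimination (ROC AUC) and calibration by tenths of
# predicted risk, the two kinds of performance statistic reported above.
# All data are simulated; nothing here comes from the QFracture validation.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
predicted = rng.beta(1, 20, size=100_000)        # simulated 10-year predicted risks
observed = rng.random(100_000) < predicted       # outcomes drawn at the predicted rate

print(f"AUC = {roc_auc_score(observed, predicted):.2f}")  # discrimination

# Calibration: mean predicted vs observed risk within each tenth of predicted risk.
tenths = pd.qcut(predicted, 10, labels=False)
calibration = (pd.DataFrame({"predicted": predicted, "observed": observed})
                 .groupby(tenths)[["predicted", "observed"]].mean())
print(calibration)  # well calibrated by construction: the two columns closely agree
```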
Conclusion QFractureScores are useful tools for predicting the 10 year risk of osteoporotic and hip fractures in patients in the United Kingdom.
The PELICAN Multidisciplinary Team Total Mesorectal Excision (MDT-TME) Development Programme aimed to improve clinical outcomes for rectal cancer by educating colorectal cancer teams in precision surgery and related aspects of multidisciplinary care. The Programme reached almost all colorectal cancer teams across England. We took the opportunity to assess the impact of participating in this novel team-based Development Programme on the working lives of colorectal cancer team members.
The impact of participating in the Programme on team members' self-reported job stress, job satisfaction and team performance was assessed in a pre-post course study. Of the 568 team members from the 75 multidisciplinary teams who attended the final year of the Programme, 333 (59%) completed questionnaires pre-course and 6-8 weeks post-course.
Across all team members, the main sources of job satisfaction related to working in multidisciplinary teams, whilst feeling overloaded was the main source of job stress. Surgeons and clinical nurse specialists reported higher levels of job satisfaction than team members who do not provide direct patient care, whilst MDT coordinators reported the lowest levels of job satisfaction and job stress. Both job stress and satisfaction decreased after participating in the Programme for all team members. There was a small improvement in team performance.
Participation in the Development Programme had a mixed impact on the working lives of team members in the immediate aftermath of attending. The decrease in team members' job stress may reflect the improved knowledge and skills conferred by the Programme. The decrease in job satisfaction may be the consequence of being unable to apply these skills immediately in clinical practice because of a lack of required infrastructure and/or equipment. In addition, whilst the Programme raised awareness of the challenges of teamworking, a greater focus on tackling these issues may have improved working lives further.
Objective To evaluate the performance of the QRISK2 score for predicting 10-year cardiovascular disease in an independent UK cohort of patients from general practice records and to compare it with the NICE version of the Framingham equation and QRISK1.
Design Prospective cohort study to validate a cardiovascular risk score.
Setting 365 practices from the United Kingdom contributing to The Health Improvement Network (THIN) database.
Participants 1.58 million patients registered with a general practice between 1 January 1993 and 20 June 2008, aged 35-74 years (9.4 million person years) with 71 465 cardiovascular events.
Main outcome measures First diagnosis of cardiovascular disease (myocardial infarction, angina, coronary heart disease, stroke, and transient ischaemic attack) recorded in general practice records.
Results QRISK2 offered improved prediction of a patient’s 10-year risk of cardiovascular disease over the NICE version of the Framingham equation. Discrimination and calibration statistics were better with QRISK2. QRISK2 explained 33% of the variation in men and 40% in women, compared with 29% and 34% respectively for the NICE Framingham equation and 32% and 38% respectively for QRISK1. The incidence rate of cardiovascular events (per 1000 person years) among men in the high risk group was 27.8 (95% CI 27.4 to 28.2) with QRISK2, 21.9 (21.6 to 22.2) with NICE Framingham, and 24.8 (22.8 to 26.9) with QRISK1. Similarly, the incidence rate of cardiovascular events (per 1000 person years) among women in the high risk group was 24.3 (23.8 to 24.9) with QRISK2, 20.6 (20.1 to 21.0) with NICE Framingham, and 21.8 (18.9 to 24.6) with QRISK1.
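The incidence rates above are events per 1000 person-years. A hedged sketch of the calculation with a normal-approximation confidence interval (the event and person-year counts below are invented so that the output resembles the high-risk men figure; the paper's exact interval method is not stated):

```python
# Incidence rate per 1000 person-years with a normal-approximation 95% CI.
# The event and person-year counts are invented for illustration; the abstract
# does not report them, and the paper's interval method may differ.
import math

def rate_per_1000(events: int, person_years: float) -> tuple[float, float, float]:
    rate = events / person_years
    se = math.sqrt(events) / person_years      # Poisson standard error of the rate
    half = 1.959964 * se
    return 1000 * rate, 1000 * (rate - half), 1000 * (rate + half)

rate, lo, hi = rate_per_1000(events=18_500, person_years=665_000.0)
print(f"{rate:.1f} per 1000 person-years (95% CI {lo:.1f} to {hi:.1f})")  # ~27.8 (27.4 to 28.2)
```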
Conclusions QRISK2 is more accurate in identifying a high risk population for cardiovascular disease in the United Kingdom than the NICE version of the Framingham equation. Differences in performance between QRISK2 and QRISK1 were marginal.