Related Articles

1.  Training Improves Interobserver Reliability for the Diagnosis of Scaphoid Fracture Displacement 
Background
The diagnosis of displacement in scaphoid fractures is notorious for poor interobserver reliability.
Questions/purposes
We tested whether training can improve interobserver reliability and sensitivity, specificity, and accuracy for the diagnosis of scaphoid fracture displacement on radiographs and CT scans.
Methods
Sixty-four orthopaedic surgeons rated a set of radiographs and CT scans of 10 displaced and 10 nondisplaced scaphoid fractures for the presence of displacement, using a web-based rating application. Before rating, observers were randomized to a training group (34 observers) and a nontraining group (30 observers). The training group received an online training module before the rating session, and the nontraining group did not. Interobserver reliability for training and nontraining was assessed by Siegel’s multirater kappa and the Z-test was used to test for significance.
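As a rough illustration of how a multirater kappa is computed, the sketch below implements the common Fleiss-style formula. It is an assumption that this matches the Siegel formulation the authors used, which may differ in detail. The function name and the toy counts are hypothetical and are not taken from the study.

```python
def fleiss_kappa(counts):
    """Fleiss-style multirater kappa.

    counts[i][j] is the number of raters who assigned subject i to category j.
    Every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    total = n_subjects * n_raters

    # Overall proportion of all ratings that fall in each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Per-subject agreement: share of rater pairs that agree on that subject.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_obs = sum(p_subj) / n_subjects   # mean observed agreement
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical data: 4 fractures, 3 raters, categories (displaced, nondisplaced).
toy_counts = [[3, 0], [2, 1], [0, 3], [1, 2]]
print(round(fleiss_kappa(toy_counts), 3))
```

A value of 1 indicates complete agreement, and values near 0 indicate agreement no better than chance.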
Results
There was a small but significant difference in interobserver reliability for displacement ratings in favor of the training group compared with the nontraining group. Ratings of radiographs and CT scans combined resulted in moderate agreement for both groups. The average sensitivity, specificity, and accuracy of diagnosing displacement of scaphoid fractures were, respectively, 83%, 85%, and 84% for the nontraining group and 87%, 86%, and 87% for the training group. Assuming a 5% prevalence of fracture displacement, the positive predictive value was 0.23 in the nontraining group and 0.25 in the training group. The negative predictive value was 0.99 in both groups.
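The predictive values quoted above follow from the reported sensitivity, specificity, and the assumed 5% prevalence via Bayes' rule. The sketch below reproduces that arithmetic; the function name is hypothetical, and the inputs are the rounded percentages from the abstract, so small rounding differences from the authors' own calculation are possible.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as reported, prevalence assumed to be 5%.
nontraining = predictive_values(0.83, 0.85, 0.05)
training = predictive_values(0.87, 0.86, 0.05)

print("nontraining PPV, NPV:", [round(v, 2) for v in nontraining])
print("training PPV, NPV:", [round(v, 2) for v in training])
```

At low prevalence, false positives outnumber true positives even with specificity around 85%, which is why the positive predictive value stays near 0.25 while the negative predictive value is close to 1.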
Conclusions
Our results suggest training can improve interobserver reliability and sensitivity, specificity and accuracy for the diagnosis of scaphoid fracture displacement, but the improvements are slight. These findings are encouraging for future research regarding interobserver variation and how to reduce it further.
doi:10.1007/s11999-012-2260-4
PMCID: PMC3369105  PMID: 22290132
2.  Use of Expert Panels to Define the Reference Standard in Diagnostic Research: A Systematic Review of Published Methods and Reporting 
PLoS Medicine  2013;10(10):e1001531.
Loes C. M. Bertens and colleagues survey the published diagnostic research literature for use of expert panels to define the reference standard, characterize components and missing information, and recommend elements that should be reported in diagnostic studies.
Please see later in the article for the Editors' Summary
Background
In diagnostic studies, a single and error-free test that can be used as the reference (gold) standard often does not exist. One solution is the use of panel diagnosis, i.e., a group of experts who assess the results from multiple tests to reach a final diagnosis in each patient. Although panel diagnosis, also known as consensus or expert diagnosis, is frequently used as the reference standard, guidance on preferred methodology is lacking. The aim of this study is to provide an overview of methods used in panel diagnoses and to provide initial guidance on the use and reporting of panel diagnosis as reference standard.
Methods and Findings
PubMed was systematically searched for diagnostic studies applying a panel diagnosis as reference standard published up to May 31, 2012. We included diagnostic studies in which the final diagnosis was made by two or more persons based on results from multiple tests. General study characteristics and details of panel methodology were extracted. Eighty-one studies were included, of which most reported on psychiatric (37%) and cardiovascular (21%) diseases. Data extraction was hampered by incomplete reporting; one or more pieces of critical information about panel reference standard methodology were missing in 83% of studies. In most studies (75%), the panel consisted of three or fewer members. Panel members were blinded to the index test results in 31% of studies. Reproducibility of the decision process was assessed in 17 (21%) studies. Reported details on panel constitution, information for diagnosis and methods of decision making varied considerably between studies.
Conclusions
Methods of panel diagnosis varied substantially across studies and many aspects of the procedure were either unclear or not reported. On the basis of our review, we identified areas for improvement and developed a checklist and flow chart as initial guidance for researchers conducting and reporting studies involving panel diagnosis.
Editors' Summary
Background
Before any disease or condition can be treated, a correct diagnosis of the condition has to be made. Faced with a patient with medical problems and no diagnosis, a doctor will ask the patient about their symptoms and medical history and generally will examine the patient. On the basis of this questioning and examination, the clinician will form an initial impression of the possible conditions the patient may have, usually with a most likely diagnosis in mind. To support or reject the most likely diagnosis and to exclude the other possible diagnoses, the clinician will then order a series of tests and diagnostic procedures. These may include laboratory tests (such as the measurement of blood sugar levels), imaging procedures (such as an MRI scan), or functional tests (such as spirometry, which tests lung function). Finally, the clinician will use all the data s/he has collected to reach a firm diagnosis and will recommend a program of treatment or observation for the patient.
Why Was This Study Done?
Researchers are continually looking for new, improved diagnostic tests and multivariable diagnostic models—combinations of tests and characteristics that point to a diagnosis. Diagnostic research, which assesses the accuracy of new tests and models, requires that each patient involved in a diagnostic study has a final correct diagnosis. Unfortunately, for most conditions, there is no single, error-free test that can be used as the reference (gold) standard for diagnosis. If an imperfect reference standard is used, errors in the final disease classification may bias the results of the diagnostic study and may lead to a new test being adopted that is actually less accurate than existing tests. One widely used solution to the lack of a reference standard is “panel diagnosis” in which two or more experts assess the results from multiple tests to reach a final diagnosis for each patient in a diagnostic study. However, there is currently no formal guidance available on the conduct and reporting of panel diagnosis. Here, the researchers undertake a systematic review (a study that uses predefined criteria to identify research on a given topic) to provide an overview of the methodology and reporting of panel diagnosis.
What Did the Researchers Do and Find?
The researchers identified 81 published diagnostic studies that used panel diagnosis as a reference standard. 37% of these studies reported on psychiatric diseases, 21% reported on cardiovascular diseases, and 12% reported on respiratory diseases. Most of the studies (64%) were designed to assess the accuracy of one or more diagnostic tests. Notably, one or more critical pieces of information on methodology were missing in 83% of the studies. Specifically, information on the constitution of the panel was missing in a quarter of the studies and information on the decision-making process (whether, for example, a diagnosis was reached by discussion among panel members or by combining individual panel members' assessments) was incomplete in more than two-thirds of the studies. In three-quarters of the studies for which information was available, the panel consisted of only two or three members; different fields of expertise were represented in the panels in nearly two-thirds of the studies. In a third of the studies for which information was available, panel members made their diagnoses without access to the results of the test being assessed. Finally, the reproducibility of the decision-making process was assessed in a fifth of the studies.
What Do These Findings Mean?
These findings indicate that the methodology of panel diagnosis varies substantially among diagnostic studies and that reporting of this methodology is often unclear or absent. Both the methodology and reporting of panel diagnosis could, therefore, be improved substantially. Based on their findings, the researchers provide a checklist and flow chart to help guide the conduct and reporting of studies involving panel diagnosis. For example, they suggest that, when designing a study that uses panel diagnosis as the reference standard, the number and background of panel members should be considered, and they provide a list of options that should be considered when planning the decision-making process. Although more research into each of the options identified by the researchers is needed, their recommendations provide a starting point for the development of formal guidelines on the methodology and reporting of panel diagnosis for use as a reference standard in diagnostic research.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001531.
Wikipedia has a page on medical diagnosis (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
The Equator Network is an international initiative that seeks to improve the reliability and value of medical research literature by promoting transparent and accurate reporting of research studies; its website includes information on a wide range of reporting guidelines, including the STAndards for the Reporting of Diagnostic accuracy studies (STARD), an initiative that aims to improve the accuracy and completeness of reporting of studies of diagnostic accuracy
doi:10.1371/journal.pmed.1001531
PMCID: PMC3797139  PMID: 24143138
3.  Analysis of the inter- and intra-observer agreement in radiographic evaluation of wrist fractures using the multimedia messaging service 
Hand (New York, N.Y.)  2011;6(4):384-389.
Background
Orthopaedic surgeons are often asked to evaluate X-rays of patients admitted to the Accident and Emergency Department with a suspected wrist fracture or, in the case of an evident fracture, to decide the correct treatment. The aim of this study was to evaluate whether images of injured wrists can be interpreted correctly on the screen of a latest-generation mobile phone, that is, whether the specialist could make the right diagnosis and choose the correct treatment.
Methods
Five orthopaedic surgeons and one hand surgeon evaluated the X-rays of 67 patients who had sustained a wrist injury. In the case of fracture, they were asked to classify it according to the AO and Mayo classification systems. The images were evaluated both on the PACS and on a mobile phone, at different times. To assess inter- and intra-observer reliability, the same procedure was repeated a few months later.
Results
Agreement between observers on the mobile phone was essentially the same as on the PACS. Inter- and intra-observer reliability worsened as the number of variables considered by a classification system increased.
Conclusions
The present paper confirms that a latest-generation mobile phone can already be used in the clinical practice of orthopaedic surgeons on call, who could use it in remote or poorly served areas for rapid and economical consultation.
Level of Evidence
The level of evidence of this study is economic and decision analysis, level 2.
doi:10.1007/s11552-011-9362-4
PMCID: PMC3213258  PMID: 23204964
Wrist fracture; Telemedicine; Inter-observer agreement; Intra-observer agreement
4.  Reliability of Magnetic Resonance Imaging Readings for Lumbar Disc Herniation in the Spine Patient Outcomes Research Trial (SPORT) 
Spine  2008;33(9):991-998.
Study Design
Assessment of the reliability of standardized magnetic resonance imaging (MRI) interpretations and measurements.
Objective
To determine the intra- and inter-reader reliability of MRI parameters relevant to patients with intervertebral disc herniation (IDH), including disc morphology classification, degree of thecal sac compromise, grading of nerve root impingement, and measurements of cross-sectional area of the spinal canal, thecal sac, and disc fragment.
Summary of Background Data
MRI is increasingly used to assess patients with sciatica and IDH, but the relationship between specific imaging characteristics and patient outcomes remains uncertain. Although other studies have evaluated the reliability of certain MRI characteristics, comprehensive evaluation of the reliability of readings of herniated disc features on MRI is lacking.
Methods
Sixty randomly selected MR images from patients with IDH enrolled in the Spine Patient Outcomes Research Trial were each rated according to defined criteria by 4 independent readers (3 radiologists and 1 orthopedic surgeon). Quantitative measurements were performed separately by 2 other radiologists. A sample of 20 MRIs was re-evaluated by each reader at least 1 month later. Agreement for rating data was assessed with kappa statistics using linear weights. Reliability of the quantitative measurements was assessed using intraclass correlation coefficients (ICCs) and summaries of measurement error.
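To make the linearly weighted kappa concrete, the following sketch computes it for a single pair of readers rating on an ordinal scale. This is a minimal illustration, not the authors' analysis code: the function name and the toy ratings are hypothetical, and how the study pooled reader pairs into an overall kappa is not described here.

```python
def linear_weighted_kappa(ratings_a, ratings_b, n_categories):
    """Cohen's kappa with linear weights for two raters.

    Ratings are integer category indices 0 .. n_categories - 1 on an ordinal
    scale. Disagreement is penalized in proportion to the distance |i - j|.
    """
    n = len(ratings_a)
    k = n_categories

    # Joint table of counts: rows are rater A, columns are rater B.
    table = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        table[a][b] += 1

    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed += weight * table[i][j] / n
            expected += weight * row_tot[i] * col_tot[j] / (n * n)
    return 1 - observed / expected


# Hypothetical ratings of six images on a 3-level ordinal scale.
reader_1 = [0, 0, 1, 1, 2, 2]
reader_2 = [0, 1, 1, 2, 2, 2]
print(round(linear_weighted_kappa(reader_1, reader_2, 3), 3))
```

With linear weights, a one-step disagreement on a three-level scale counts half as much as a two-step disagreement, so near misses reduce kappa less than gross disagreements.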
Results
Inter-reader reliability was substantial for disc morphology [overall kappa 0.81 (95% confidence interval (CI): 0.78, 0.85)], moderate for thecal sac compression [overall kappa 0.54 (95% CI: 0.37, 0.68)], and moderate for grading nerve root impingement [overall kappa 0.47 (95% CI: 0.36, 0.56)]. Quantitative measures showed high ICCs of 0.87 to 0.96 for spinal canal and thecal sac cross-sectional areas. Measures of disc fragment area had moderate ICCs of 0.65 to 0.83. Mean absolute differences between measurements ranged from approximately 15% to 20%.
Conclusion
Classification of disc morphology showed substantial intra- and inter-reader agreement, whereas thecal sac and nerve root compression showed more moderate reader reliability. Quantitative measures of canal and thecal sac area showed good reliability, whereas measurement of disc fragment area showed more modest reliability.
doi:10.1097/BRS.0b013e31816c8379
PMCID: PMC2745940  PMID: 18427321
disc herniation; MRI; reliability study
5.  Promotional Tone in Reviews of Menopausal Hormone Therapy After the Women's Health Initiative: An Analysis of Published Articles 
PLoS Medicine  2011;8(3):e1000425.
Adriane Fugh-Berman and colleagues analyzed a selection of published opinion pieces on hormone therapy and show that there may be a connection between receiving industry funding for speaking, consulting, or research and the tone of such opinion pieces.
Background
Even after the Women's Health Initiative (WHI) found that the risks of menopausal hormone therapy (hormone therapy) outweighed benefit for asymptomatic women, about half of gynecologists in the United States continued to believe that hormones benefited women's health. The pharmaceutical industry has supported publication of articles in medical journals for marketing purposes. It is unknown whether author relationships with industry affect promotional tone in articles on hormone therapy. The goal of this study was to determine whether promotional tone could be identified in narrative review articles regarding menopausal hormone therapy and whether articles identified as promotional were more likely to have been authored by those with conflicts of interest with manufacturers of menopausal hormone therapy.
Methods and Findings
We analyzed tone in opinion pieces on hormone therapy published in the four years after the estrogen-progestin arm of the WHI was stopped. First, we identified the ten authors with four or more MEDLINE-indexed reviews, editorials, comments, or letters on hormone replacement therapy or menopausal hormone therapy published between July 2002 and June 2006. Next, we conducted an additional search using the names of these authors to identify other relevant articles. Finally, after author names and affiliations were removed, 50 articles were evaluated by three readers for scientific accuracy and for tone. Scientific accuracy was assessed based on whether or not the findings of the WHI were accurately reported using two criteria: (1) Acknowledgment or lack of denial of the risk of breast cancer diagnosis associated with hormone therapy, and (2) acknowledgment that hormone therapy did not benefit cardiovascular disease endpoints. Determination of promotional tone was based on the assessment by each reader of whether the article appeared to promote hormone therapy. Analysis of inter-rater consistency found moderate agreement for scientific accuracy (κ = 0.57) and substantial agreement for promotional tone (κ = 0.65). After discussion, readers found 86% of the articles to be scientifically accurate and 64% to be promotional in tone. Themes that were common in articles considered promotional included attacks on the methodology of the WHI, arguments that clinical trial results should not guide treatment for individuals, and arguments that observational studies are as good as or better than randomized clinical trials for guiding clinical decisions. The promotional articles we identified also implied that the risks associated with hormone therapy have been exaggerated and that the benefits of hormone therapy have been or will be proven. 
Of the ten authors studied, eight were found to have declared payment for speaking or consulting on behalf of menopausal hormone manufacturers or for research support (seven of these eight were speakers or consultants). Thirty of 32 articles (90%) evaluated as promoting hormone therapy were authored by those with potential financial conflicts of interest, compared to 11 of 18 articles (61%) by those without such conflicts (p = 0.0025). Articles promoting the use of menopausal hormone therapy were 2.41 times (95% confidence interval 1.49–4.93) as likely to have been authored by authors with conflicts of interest as by authors without conflicts of interest. In articles from three authors with conflicts of interest some of the same text was repeated word-for-word in different articles.
Conclusion
There may be a connection between receiving industry funding for speaking, consulting, or research and the publication of promotional opinion pieces on menopausal hormone therapy.
Editors' Summary
Background
Over the past three decades, menopausal hormones have been heavily promoted for preventing disease in women. However, the Women's Health Initiative (WHI) study—which enrolled more than 26,000 women in the US and which was published in 2004—found that estrogen-progestin and estrogen-only formulations (often prescribed to women around the age of menopause) increased the risk of stroke, deep vein thrombosis, dementia, and incontinence. Furthermore, this study found that the estrogen-progestin therapy increased rates of breast cancer. In fact, the estrogen-progestin arm of the WHI study was stopped in 2002 due to harmful findings, and the estrogen-only arm was stopped in 2004, also because of harmful findings. In addition, the study also found that neither therapy reduced cardiovascular risk or markedly benefited health-related quality of life measures.
Despite these results, two years after the results of the WHI study were published, a survey of over 700 practicing gynecologists in the US (the specialists who prescribe the majority of menopausal hormone therapies) found that almost half did not find the findings of the WHI study convincing and that 48% disagreed with the decision to stop the trial early. Furthermore, follow-up surveys found similar results.
Why Was This Study Done?
It is unclear why gynecologists and other physicians continue to prescribe menopausal hormone therapies despite the results of the WHI. Some academics argue that published industry-funded reviews and commentaries may be designed to convey specific, but subtle, marketing messages, and several academic analyses have used internal industry documents disclosed in litigation cases. This study was therefore conducted to investigate whether hormone therapy–promoting tone could be identified in narrative review articles and, if so, whether these articles were more likely to have been authored by people who had accepted funding from hormone manufacturers.
What Did the Researchers Do and Find?
The researchers conducted a comprehensive literature search that identified 340 relevant articles published between July 2002 and June 2006, the four years following the cessation of the estrogen-progestin arm of the Women's Health Initiative study. Ten authors had published four to six articles, 47 authored two or three articles, and 371 authored one article each. The researchers focused on authors who had published four or more articles in the four-year period under study and, after author names and affiliations were removed, 50 articles were evaluated by three readers for scientific accuracy and for tone. After individually analyzing a batch of articles, the readers met to provide their initial assessments, to discuss them, and to reach consensus on tone and scientific accuracy. Then, after the papers were evaluated, each author was identified and the researchers searched for authors' potential financial conflicts of interest, defined as publicly disclosed information that the authors had received payment for research, speaking, or consulting on behalf of a manufacturer of menopausal hormone therapy.
Common themes in the 50 articles included arguments that clinical trial results should not guide treatment for individuals and suggestions that the risks associated with hormone therapy have been exaggerated and that the benefits of hormone therapy have been or will be proven. Furthermore, of the ten authors studied, eight were found to have received payment for research, speaking or consulting on behalf of menopausal hormone manufacturers, and 30 of 32 articles evaluated as promoting hormone therapy were authored by those with potential financial conflicts of interest. Articles promoting the use of menopausal hormone therapy were more than twice as likely to have been written by authors with conflicts of interest as by authors without conflicts of interest. In addition, three authors who were identified as having financial conflicts of interest were authors on articles where sections of their previously published articles were repeated word-for-word without citation.
What Do These Findings Mean?
The findings of this study suggest that there may be a link between receiving industry funding for speaking, consulting, or research and the publication of apparently promotional opinion pieces on menopausal hormone therapy. Furthermore, such publications may encourage physicians to continue prescribing these therapies to women of menopausal age. Therefore, physicians and other health care providers should interpret the content of review articles with caution. In addition, medical journals should follow the International Committee of Medical Journal Editors Uniform Requirements for Manuscripts, which require that all authors submit signed statements of their participation in authorship and full disclosure of any conflicts of interest.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000425.
The US National Heart, Lung, and Blood Institute has more information on the Women's Health Initiative
The US National Institutes of Health provide more information about the effects of menopausal hormone replacement therapy
The Office of Women's Health, U.S. Department of Health and Human Services provides information on menopausal hormone therapy
The International Committee of Medical Journal Editors Uniform Requirements for Manuscripts presents Uniform Requirements for Manuscripts published in biomedical journals
The National Women's Health Network, a consumer advocacy group that takes no industry money, has factsheets and articles about menopausal hormone therapy
PharmedOut, a Georgetown University Medical Center project, has many resources on pharmaceutical marketing practices
doi:10.1371/journal.pmed.1000425
PMCID: PMC3058057  PMID: 21423581
6.  Diagnostic accuracy of cone beam computed tomography and conventional multislice spiral tomography in sheep mandibular condyle fractures 
Dentomaxillofacial Radiology  2010;39(6):336-342.
Objectives
The aim of this study was to compare diagnostic accuracy of cone beam CT (CBCT) and multislice CT in artificially created fractures of the sheep mandibular condyle.
Methods
63 full-thickness sheep heads were used in this study. Two surgeons created the fractures, which were either displaced or non-displaced. CBCT images were acquired by the NewTom 3G® CBCT scanner (NIM, Verona, Italy) and CT imaging was performed using the Toshiba Aquillon® multislice CT scanner (Toshiba Medical Systems, Otawara, Japan). Two-dimensional (2D) cross-sectional images and three-dimensional (3D) reconstructions were evaluated by two observers who were asked to determine the presence or absence of fracture and displacement, the type of fracture, anatomical localization and type of displacement. The naked-eye inspection during surgery served as the gold standard. Inter- and intra-observer agreements were calculated with weighted kappa statistics. The receiver operating characteristics (ROC) curve analyses were used to compare statistically the area under the curve (AUC) of both imaging modalities.
Results
Kappa coefficients for intra- and interobserver agreement ranged from 0.56 to 0.98, values classified as moderate and excellent, respectively. There was no statistically significant difference between the imaging modalities, which were both sensitive and specific for the diagnosis of sheep condylar fractures.
Conclusions
This study confirms that CBCT is similar to CT in the diagnosis of different types of experimentally created sheep condylar fractures and can provide a cost- and dose-effective diagnostic option.
doi:10.1259/dmfr/29930707
PMCID: PMC3520235  PMID: 20729182
cone beam computed tomography; multislice computed tomography; condyle; fracture; sheep
7.  Prediction accuracy of a sample-size estimation method for ROC studies 
Academic radiology  2010;17(5):628-638.
Rationale and Objectives
Sample-size estimation is an important consideration when planning a receiver operating characteristic (ROC) study. The aim of this work was to assess the prediction accuracy of a sample-size estimation method using the Monte Carlo simulation method.
Materials and Methods
Two ROC ratings simulators characterized by low reader and high case variabilities (LH) and high reader and low case variabilities (HL) were used to generate pilot data sets in 2 modalities. Dorfman-Berbaum-Metz multiple-reader multiple-case (DBM-MRMC) analysis of the ratings yielded estimates of the modality-reader, modality-case and error variances. These were input to the Hillis-Berbaum (HB) sample-size estimation method, which predicted the number of cases needed to achieve 80% power for 10 readers and an effect size of 0.06 in the pivotal study. Predictions that generalized to readers and cases (random-all), to cases only (random-cases) and to readers only (random-readers) were generated. A prediction-accuracy index defined as the probability that any single prediction yields true power in the range 75% to 90% was used to assess the HB method.
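The prediction-accuracy index defined above can be read as a simple proportion: among many simulated pilot studies, the fraction whose sample-size prediction leads to a true power between 75% and 90%. The sketch below shows only that final counting step, under the assumption that the true power achieved by each prediction has already been obtained by simulation. The function name and the example values are hypothetical.

```python
def prediction_accuracy(true_powers, low=0.75, high=0.90):
    """Fraction of predictions whose true power lies in [low, high]."""
    hits = sum(1 for power in true_powers if low <= power <= high)
    return hits / len(true_powers)


# Hypothetical true powers achieved by four sample-size predictions.
simulated_true_powers = [0.70, 0.78, 0.85, 0.92]
print(prediction_accuracy(simulated_true_powers))
```

Predictions that land below the window give an underpowered pivotal study, and those above it call for more cases than needed, so both count as misses.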
Results
For random-case generalization the HB-method prediction-accuracy was reasonable, ~ 50% for 5 readers in the pilot study. Prediction-accuracy was generally higher under low reader variability conditions (LH) than under high reader variability conditions (HL). Under ideal conditions (many readers in the pilot study) the DBM-MRMC based HB method overestimated the number of cases. The overestimates could be explained by the observed large variability of the DBM-MRMC modality-reader variance estimates, particularly when reader variability was large (HL). The largest benefit of increasing the number of readers in the pilot study was realized for LH, where 15 readers were enough to yield prediction accuracy > 50% under all generalization conditions, but the benefit was lesser for HL where prediction accuracy was ~ 36% for 15 readers under random-all and random-reader conditions.
Conclusion
The HB method tends to overestimate the number of cases. Random-case generalization had reasonable prediction accuracy. Provided about 15 readers were used in the pilot study the method performed reasonably under all conditions for LH. When reader variability was large, the prediction-accuracy for random-all and random-reader generalizations was compromised. Study designers may wish to compare the HB predictions to those of other methods and to sample-sizes used in previous similar studies.
doi:10.1016/j.acra.2010.01.007
PMCID: PMC2867097  PMID: 20380980
ROC; sample-size; methodology assessment; statistical power; DBM; MRMC; simulation; Monte Carlo
8.  Reliability of Readings of Magnetic Resonance Imaging Features of Lumbar Spinal Stenosis 
Spine  2008;33(14):1605-1610.
Study Design
A reliability assessment of standardized magnetic resonance imaging (MRI) interpretations and measurements.
Objective
To determine the intra- and inter-reader reliability of MRI features of lumbar spinal stenosis (SPS), including severity of central, subarticular, and foraminal stenoses, grading of nerve root impingement, and measurements of cross-sectional area of the spinal canal and thecal sac.
Summary of Background Data
MRI is commonly used to assess patients with spinal stenosis. Although a number of studies have evaluated the reliability of certain MRI characteristics, comprehensive evaluation of the reliability of MRI readings in spinal stenosis is lacking.
Methods
Fifty-eight randomly selected MR images from patients with SPS enrolled in the Spine Patient Outcomes Research Trial were evaluated. Qualitative ratings of imaging features were performed according to defined criteria by 4 independent readers (3 radiologists and 1 orthopedic surgeon). A sample of 20 MRIs was reevaluated by each reader at least 1 month later. Weighted κ statistics were used to characterize intra- and inter-reader reliability for qualitative rating data. Separate quantitative measurements were performed by 2 other radiologists. Intraclass correlation coefficients and summaries of measurement error were used to characterize reliability for quantitative measurements.
Results
Intra-reader reliability was higher than inter-reader reliability for all features. Inter-reader reliability in assessing central stenosis was substantial, with an overall κ of 0.73 (95% CI 0.69-0.77). Foraminal stenosis and nerve root impingement showed moderate to substantial agreement with overall κ of 0.58 (95% CI 0.53-0.63) and 0.51 (95% CI 0.42-0.59), respectively. Subarticular zone stenosis yielded the poorest agreement (overall κ 0.49; 95% CI 0.42-0.55) and showed marked variability in agreement between reader pairs. Quantitative measures showed inter-reader intraclass correlation coefficients ranging from 0.58 to 0.90. The mean absolute difference between readers in measured thecal sac area was 128 mm² (13%).
Conclusion
The imaging characteristics of spinal stenosis assessed in this study showed moderate to substantial reliability; future studies should assess whether these findings have prognostic significance in SPS patients.
doi:10.1097/BRS.0b013e3181791af3
PMCID: PMC2754786  PMID: 18552677
spinal stenosis; MRI; reliability
9.  Radiographic union score for hip substantially improves agreement between surgeons and radiologists 
Background
Despite the prominence of hip fractures in orthopedic trauma, the assessment of fracture healing using radiographs remains subjective. The variability in the assessment of fracture healing has important implications for both clinical research and patient care. With little existing literature regarding reliable consensus on hip fracture healing, this study was conducted to determine inter-rater reliability between orthopedic surgeons and radiologists on healing assessments using sequential radiographs in patients with hip fractures. Secondary objectives included evaluating a checklist designed to assess hip fracture healing and determining whether agreement improved when reviewers were aware of the timing of the x-rays in relation to the patients’ surgery.
Methods
A panel of six reviewers (three orthopedic surgeons and three radiologists) independently assessed fracture healing using sequential radiographs from 100 patients with femoral neck fractures and 100 patients with intertrochanteric fractures. During their independent review they also completed a previously developed radiographic checklist, the Radiographic Union Score for Hip (RUSH). Inter- and intra-rater reliability scores were calculated. Data from the current study were compared with the findings of a previously conducted study in which the same reviewers, unaware of the timing of the x-rays, completed the RUSH score.
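To make the reliability calculation more concrete, here is a minimal sketch of one common form of the intraclass correlation coefficient, ICC(2,1) (two-way random effects, absolute agreement, single rater). The abstract does not say which ICC form was used, so this choice, the function name, and the small score table are assumptions for illustration only and are not data from the study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` holds one row per subject, each row with one score per rater.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects and >= 2 raters in a full table")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative table: 6 subjects scored by 4 raters (not study data).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))
```

Because this form measures absolute agreement, a rater who scores systematically higher than the others lowers the coefficient even when all raters rank the subjects identically.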
Results
The agreement between surgeons and radiologists for fracture healing was moderate for “general impression of fracture healing” in both femoral neck (ICC = 0.60, 95% CI: 0.42-0.71) and intertrochanteric fractures (0.50, 95% CI: 0.33-0.62). Using a standardized checklist (RUSH), agreement was almost perfect in both femoral neck (ICC = 0.85, 95% CI: 0.82-0.87) and intertrochanteric fractures (0.88, 95% CI: 0.86-0.90). We also found a high degree of correlation between healing and the total RUSH score: in a Receiver Operating Characteristic (ROC) analysis, the area under the curve was 0.993 for femoral neck cases and 0.989 for intertrochanteric cases. Agreement within the radiologist group and within the surgeon group did not significantly differ in our analyses. In all cases, radiographs in which the time from surgery was known resulted in higher agreement scores compared to those from the previous study in which reviewers were unaware of the time the radiograph was obtained.
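The area under the ROC curve reported above can be read as the probability that a randomly chosen healed fracture has a higher total score than a randomly chosen unhealed one. The sketch below computes the AUC directly from that definition; the function name and the scores are hypothetical, and the study's own ROC procedure may have differed.

```python
def roc_auc(scores_healed, scores_not_healed):
    """AUC as P(healed score > not-healed score), counting ties as one half."""
    if not scores_healed or not scores_not_healed:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for pos in scores_healed:
        for neg in scores_not_healed:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_healed) * len(scores_not_healed))


# Hypothetical total checklist scores for healed and not-healed radiographs.
healed = [26, 28, 30, 24]
not_healed = [14, 18, 24, 12]
print(roc_auc(healed, not_healed))
```

An AUC close to 1, as in the abstract, means that nearly every healed case scored higher than nearly every unhealed case.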
Conclusions
Agreement in hip fracture radiographic healing may be improved with the use of a standardized checklist and appears highly influenced by the timing of the radiograph. These findings should be considered when evaluating patient outcomes and in clinical studies involving patients with hip fractures. Future research initiatives are required to further evaluate the RUSH checklist.
doi:10.1186/1471-2474-14-70
PMCID: PMC3599458  PMID: 23442540
Hip fractures; Reliability; Fracture healing; Radiographs
10.  Classification and treatment of proximal humerus fractures: inter-observer reliability and agreement across imaging modalities and experience 
Summary
Proximal humerus fractures (PHF) are common injuries, but previous studies have documented poor inter-observer reliability in fracture classification. This disparity has been attributed to multiple variables including poor imaging studies and inadequate surgeon experience. The purpose of this study is to evaluate whether inter-observer agreement can be improved with the application of multiple imaging modalities including X-ray, CT, and 3D CT reconstructions, stratified by physician experience, for both classification and treatment of PHFs.
Methods
Inter-observer agreement was measured for classification and treatment of PHFs. A total of sixteen fractures were imaged by plain X-ray (scapular AP and lateral), CT scan, and 3D CT reconstruction, yielding 48 randomized image sets. The observers consisted of 16 orthopaedic surgeons (4 upper extremity specialists, 4 general orthopedists, 4 senior residents, 4 junior residents), who were asked to classify each image set using the Neer system, and recommend treatment from four pre-selected choices. The results were evaluated by kappa reliability coefficients for inter-observer agreement between all imaging modalities and sub-divided by: fracture type and observer experience.
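With 16 observers classifying each image set, a multi-rater agreement coefficient is needed. The abstract only says "kappa reliability coefficients", so the sketch below uses Fleiss' κ as one plausible choice; this is an assumption, and the function name and the count table are illustrative rather than taken from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a table of category counts.

    `counts[i][j]` is the number of observers who put case i in category j.
    Every case must be rated by the same number of observers.
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    # Per-case agreement: share of observer pairs that chose the same category.
    per_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_case) / n_cases
    # Chance agreement from the overall share of ratings in each category.
    shares = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    p_chance = sum(s * s for s in shares)
    return (p_observed - p_chance) / (1.0 - p_chance)


# Illustrative counts: 3 cases, 4 observers, 3 categories (not study data).
table = [
    [4, 0, 0],
    [2, 2, 0],
    [0, 1, 3],
]
print(round(fleiss_kappa(table), 3))
```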
Results
All kappa values ranged from "slight" to "moderate" (k = .03 to .57) agreement. For overall classification and treatment, no advanced imaging modality had significantly higher scores than X-ray. However, when sub-divided by experience, 3D reconstruction and CT scan both had significantly higher agreement on classification than X-ray, among upper extremity specialists. Agreement on treatment among upper extremity specialists was best with CT scan. No other experience sub-division had significantly different kappa scores. When sub-divided by fracture type, CT scan and 3D reconstruction had higher scores than X-ray for classification only in 4-part fractures. Agreement on treatment of 4 part fractures was best with CT scan. No other fracture type sub-division had significantly different kappa scores.
Conclusions
Although 3D reconstruction showed a slight improvement in inter-observer agreement for fracture classification among specialized upper extremity surgeons compared with the other imaging modalities, fracture types, and experience levels, overall all imaging modalities continue to yield low inter-observer agreement for both classification and treatment, regardless of physician experience.
doi:10.1186/1749-799X-6-38
PMCID: PMC3162565  PMID: 21801370
11.  Satisfaction of Search in Multi-trauma Patients: Severity of Detected Fractures 
Academic radiology  2007;14(6):711-722.
Purpose
Satisfaction of search (SOS) occurs when an abnormality is missed because another abnormality has been detected. This research studied whether the severity of a detected fracture determines whether subsequent fractures are overlooked.
Materials and Methods
Each of seventy simulated multi-trauma patients presented examinations of three anatomic areas. Readers evaluated each patient under two experimental conditions: when the images of the first anatomic area included a fracture (the SOS condition), and when it did not (the control condition). The SOS effect was measured on detection accuracy for subtle test fractures presented on examinations of the second and third anatomic areas. In an experiment with twelve radiology readers, the initial SOS radiographs showed non-displaced fractures of extremities, fractures associated with low morbidity. In another experiment with twelve different radiology readers, the initial examination, usually a CT, showed cervical and pelvic fractures of the type associated with high morbidity. Because of their more direct role in patient care, the experiment using high morbidity SOS fractures was repeated with seventeen orthopedic readers.
Results
Detection of subtle test fractures was substantially reduced when fractures of low morbidity were added (p<0.01). No similar SOS effect was observed in either experiment in which added fractures were associated with high morbidity.
Conclusion
The satisfaction of search effect in skeletal radiology was replicated, essentially doubling the evidence for SOS in musculoskeletal radiology, and providing an essential contrast to the absence of SOS from high morbidity fractures.
doi:10.1016/j.acra.2007.02.016
PMCID: PMC1978092  PMID: 17502261
12.  Intra-and inter-reader reliability of semi-automated quantitative morphometry measurements and vertebral fracture assessment using lateral scout views from computed tomography 
Summary
Intra- and inter-reader reliability of semi-automated quantitative vertebral morphometry measurements was determined using lateral computed tomography (CT) scout views. The method requires less time than conventional morphometry. Reliability was excellent for vertebral height measurements, good for height ratios, and comparable to semi-quantitative grading by radiologists for identification of vertebral fractures.
Introduction
Underdiagnosis and undertreatment of vertebral fracture (VFx) is a well-known problem worldwide. Thus, new methods are needed to facilitate identification of VFx. This study aimed to determine intra- and inter-reader reliability of semi-automated quantitative vertebral morphometry based on shape-based statistical modeling (SpineAnalyzer, Optasia Medical, Cheadle, UK).
Methods
Two non-radiologists independently assessed vertebral morphometry from CT lateral scout views at two time points in 96 subjects (50 men, 46 women, 70.3±8.9 years) selected from the Framingham Heart Study Offspring and Third Generation Multi-Detector CT Study. VFxs were classified based solely on morphometry measurements using Genant’s criteria. Intraclass correlation coefficients (ICCs), root mean squared coefficient of variation (RMS CV) and kappa (k) statistics were used to assess reliability.
Results
We analyzed 1,246 vertebrae in 96 subjects. The analysis time averaged 5.4±1.7 min per subject (range, 3.2–9.1 min). Intra- and inter-reader ICCs for vertebral heights were excellent (>0.95) for all vertebral levels combined. Intra- and inter-reader RMS CV for height measurements ranged from 2.5% to 3.9% and 3.3% to 4.4%, respectively. Reliability of vertebral height ratios was good to fair. Based on morphometry measurements alone, readers A and B identified 51–52 and 46–59 subjects with at least one prevalent VFx, respectively, and there was good intra- and inter-reader agreement (k=0.59–0.69) for VFx identification.
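The RMS CV figures above summarize how much repeated height measurements of the same vertebra scatter relative to their mean. A minimal sketch of the usual calculation follows, assuming each vertebra has two or more repeated measurements and using the sample standard deviation; the function name and the heights are hypothetical.

```python
import math
import statistics


def rms_cv_percent(repeated_measurements):
    """Root mean squared coefficient of variation, in percent.

    `repeated_measurements` holds one sequence per vertebra, each with
    two or more repeated measurements of the same quantity.
    """
    squared_cvs = []
    for values in repeated_measurements:
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample SD, requires >= 2 values
        squared_cvs.append((sd / mean) ** 2)
    return 100.0 * math.sqrt(statistics.mean(squared_cvs))


# Hypothetical anterior heights (mm) measured twice on three vertebrae.
heights = [(24.1, 24.9), (26.3, 25.8), (22.0, 22.6)]
print(round(rms_cv_percent(heights), 2))
```

Averaging the squared CVs before taking the root gives more weight to vertebrae with poor repeatability than a plain mean of CVs would.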
Conclusions
Semi-automated quantitative vertebral morphometry measurements from CT lateral scout views are convenient and reproducible, and may facilitate assessment of VFx.
doi:10.1007/s00198-011-1530-4
PMCID: PMC3650637  PMID: 21271340
Computed tomography; Reliability; Semi-automated vertebral morphometry; Vertebral fracture
13.  Quantification of Lower Leg Arterial Calcifications by High-Resolution Peripheral Quantitative Computed Tomography 
Bone  2013;58:42-47.
Vascular calcifications and bone health seem to be etiologically linked via common risk factors such as aging and subclinical chronic inflammation. Epidemiologic studies have shown significant associations between low bone mineral density (BMD), fragility fractures and calcifications of the coronary arteries and the abdominal aorta. In the last decade, high-resolution peripheral quantitative computed tomography (HR-pQCT) has emerged as an in vivo research tool for the assessment of peripheral bone geometry, density, and microarchitecture. Although vascular calcifications are frequently observed as incidental findings in HR-pQCT scans, they have not yet been incorporated into quantitative HR-pQCT analyses. We developed a semi-automated algorithm to quantify lower leg arterial calcifications (LLAC), captured by HR-pQCT. The objective of our study was to determine validity and reliability of the LLAC measure.
HR-pQCT scans were downscaled to a voxel size of 250 µm. After subtraction of bone volumes from the scans, LLAC were detected and contoured by a semi-automated, dual-threshold seed-point segmentation. LLAC mass (in mg hydroxyapatite; HA) was calculated as the product of voxel-based calcification volume (mm³) and mean calcification density (mgHA/cm³), divided by 1000. To determine validity, we compared LLAC to coronary artery calcifications (CAC), as quantified by multi-detector computed tomography (MDCT) and Agatston scoring in forty-six patients on chronic hemodialysis. Moreover, we investigated associations of LLAC with age, time on dialysis, type-2 diabetes mellitus, history of stroke, and myocardial infarction. In a second step, we determined intra- and inter-reader reliability of the LLAC measure.
In the validity study, LLAC were present (>0 mgHA) in 76% of patients, and 78% of patients had CAC (>0 mgHA). Median LLAC was 6.65 (0.08 – 24.40) mgHA and median CAC as expressed by Agatston score was 266.3 (15.88 – 1877.28). We found a significant positive correlation between LLAC and CAC (rho=0.6; p<0.01). Dialysis patients with type-2 diabetes mellitus (DM; 35%) and history of stroke (13%) had higher median LLAC than patients without those conditions (DM 20.0-fold greater, p=0.006; stroke 5.1-fold greater, p=0.047). LLAC was positively correlated with time on dialysis (rho=0.337, p=0.029), and there was a trend towards a positive association of LLAC and age (rho=0.289, p=0.053). The reliability study yielded excellent intra- and inter-reader agreement of the LLAC measure (intra-reader ICC=0.999, 95% CI=0.998–1.000; inter-reader ICC=0.998, 95% CI=0.994–0.999).
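The mass formula above involves a unit conversion that is easy to get wrong: volume is in mm³ while density is per cm³, hence the division by 1000. The sketch below works through it for a cubic voxel with a 250 µm edge; the function names, the voxel count, and the density value are hypothetical.

```python
VOXEL_EDGE_MM = 0.25  # 250 µm voxel size after downscaling, as stated above


def calcification_volume_mm3(n_voxels, voxel_edge_mm=VOXEL_EDGE_MM):
    """Voxel-based volume: number of segmented voxels times one voxel's volume."""
    return n_voxels * voxel_edge_mm ** 3


def llac_mass_mg(volume_mm3, mean_density_mg_per_cm3):
    """Mass in mgHA = volume (mm^3) x density (mgHA/cm^3) / 1000 mm^3 per cm^3."""
    return volume_mm3 * mean_density_mg_per_cm3 / 1000.0


# Hypothetical segmentation: 1000 voxels with a mean density of 400 mgHA/cm^3.
volume = calcification_volume_mm3(1000)
print(volume, llac_mass_mg(volume, 400.0))
```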
Our study indicates that the LLAC measure has good validity and excellent reliability. The use of HR-pQCT for the simultaneous evaluation of arterial calcifications, peripheral bone geometry, bone density, and bone microarchitecture should facilitate future research on osteo-vascular interactions and potential associations with cardiovascular events.
doi:10.1016/j.bone.2013.08.006
PMCID: PMC4042679  PMID: 23954758
HR-pQCT; Lower Leg Arterial Calcifications; Quantification; Agatston-Score
14.  A Review on the Article: Role of Common Biochemical Markers for the Assessment of Fracture Union 
The article does not state the criteria for patient selection, the demographic comparison between the two groups, the number of patients in each group (normal union or defective union), how the observers were blinded to the samples, or the control group measurements in the graphs. No data on actual measurement levels are given, and nowhere is it stated whether the values in the tables or diagrams are averages of all normal union or defective union cases. The treatment methodology could have been more focused, covering either conservative or surgical treatment, since a displaced fracture cannot be compared with an osteotomy. The cases should be followed up for a longer period. Some suggestions on how to blind the observers are given. For the one-year period there is no mention of treatment failure or complications in any of the 36 cases, nor of any dropouts from follow-up. There is also no mention of any case that was initially managed conservatively and then changed to surgical management. The authors could have had one group of patients taking this food stuff and another group who refused it. The statistical test used to compare the levels of the factors is not mentioned; a mere statement that ‘p’ values were significant does not benefit the reader. Failure to produce X-rays for even a single case weakens the study. X-rays are needed to confirm the diagnosis of a fracture and the position of implants and fracture fragments. Once the fracture and implant position have been confirmed, the case should be followed for the remaining period with marker estimation only. Once the desired increase or tapering of marker levels is achieved, X-rays can be taken to correlate with the clinical findings. One group should be called union and the second group non-union or delayed union. The second group (the poor callus group) is referred to as malunion, possibly by oversight; the authors probably meant non-union, delayed union, or defective union.
This should preferably be written as non-union or delayed union. Malunion means that the fracture has actually united and the union process is complete. In one group of fractures (hypertrophic non-unions), callus formation is excessive, yet the fracture remains ununited. Thus enzymes or markers alone cannot show whether union is complete; they can only herald bone formation.
doi:10.1007/s12291-012-0233-8
PMCID: PMC3477447  PMID: 24082473
Blinding; Marker estimation; X-rays; Non-union; Malunion
15.  School Playground Surfacing and Arm Fractures in Children: A Cluster Randomized Trial Comparing Sand to Wood Chip Surfaces 
PLoS Medicine  2009;6(12):e1000195.
In a randomized trial of elementary schools in Toronto, Andrew Howard and colleagues show that granitic sand playground surfaces reduce the risk of arm fractures from playground falls when compared with wood fiber surfaces.
Background
Playground injuries, especially fractures, are prevalent in children, and can result in emergency room treatment and hospital admissions. Fall height and surface area are major determinants of playground fall injury risk. The primary objective was to determine if there was a difference in playground upper extremity fracture rates in school playgrounds with wood fibre surfacing versus granite sand surfacing. Secondary objectives were to determine if there were differences in overall playground injury rates or in head injury rates in school playgrounds with wood fibre surfacing compared to school playgrounds with granite sand surfacing.
Methods and Findings
The cluster randomized trial comprised 37 elementary schools in the Toronto District School Board in Toronto, Canada with a total of 15,074 students. Each school received qualified funding for installation of new playground equipment and surfacing. The risk of arm fracture from playground falls onto granitic sand versus onto engineered wood fibre surfaces was compared, with an outcome measure of estimated arm fracture rate per 100,000 student-months. Schools were randomly assigned by computer generated list to receive either a granitic sand or an engineered wood fibre playground surface (Fibar), and were not blinded. Schools were visited to ascertain details of the playground and surface actually installed and to observe the exposure to play and to periodically monitor the depth of the surfacing material. Injury data, including details of circumstance and diagnosis, were collected at each school by a prospective surveillance system with confirmation of injury details through a validated telephone interview with parents and also through collection (with consent) of medical reports regarding treated injuries. All schools were recruited together at the beginning of the trial, which is now closed after 2.5 years of injury data collection. Compliant schools included 12 schools randomized to Fibar that installed Fibar and seven schools randomized to sand that installed sand. Noncompliant schools were added to the analysis to complete a cohort type analysis by treatment received (two schools that were randomized to Fibar but installed sand and seven schools that were randomized to sand but installed Fibar). Among compliant schools, an arm fracture rate of 1.9 (95% confidence interval [CI] 0.04–6.9) per 100,000 student-months was observed for falls into sand, compared with an arm fracture rate of 9.4 (95% CI 3.7–21.4) for falls onto Fibar surfaces (p≤0.04905). 
Among all schools, the arm fracture rate was 4.5 (95% CI 0.26–15.9) per 100,000 student-months for falls into sand compared with 12.9 (95% CI 5.1–30.1) for falls onto Fibar surfaces. No serious head injuries and no fatalities were observed in either group.
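The outcome measure here is an incidence rate: fractures divided by exposure time, scaled to 100,000 student-months. The following sketch shows that calculation and the ratio of the two rates reported for compliant schools. The abstract does not give the student-month denominators, so the exposure figure in the first example is purely hypothetical, as are the function names.

```python
def rate_per_100k(events, student_months):
    """Incidence rate expressed per 100,000 student-months of exposure."""
    if student_months <= 0:
        raise ValueError("exposure must be positive")
    return events / student_months * 100_000


def rate_ratio(rate_exposed, rate_reference):
    """How many times higher one incidence rate is than another."""
    return rate_exposed / rate_reference


# Hypothetical exposure: 2 fractures over 105,000 student-months.
print(round(rate_per_100k(2, 105_000), 1))

# Rates reported above for compliant schools (Fibar vs. sand).
print(round(rate_ratio(9.4, 1.9), 1))
```

With so few events in the sand group, the ratio is imprecise, which is consistent with the wide confidence intervals reported in the abstract.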
Conclusions
Granitic sand playground surfaces reduce the risk of arm fractures from playground falls when compared with engineered wood fibre surfaces. Upgrading playground surfacing standards to reflect this information will prevent arm fractures.
Trial Registration
Current Controlled Trials ISRCTN02647424
Editors' Summary
Background
Playgrounds and outdoor play equipment provide children with a place to let off steam, play creatively, socialize, and learn new skills. And, in a world where childhood obesity is a burgeoning problem, playgrounds provide a place where children can be encouraged to exercise. But playgrounds are not without hazards. Even in well-maintained and well-run facilities, children can hurt themselves by falling off climbing frames, monkey bars, and other equipment or by falling from standing height during playground games such as tag. In the US alone, more than 200,000 children are treated in emergency departments for injuries sustained in playgrounds every year and about 6,400 children are admitted to hospitals because of playground injuries, most of which are bone fractures (broken bones). In fact, playground injuries in the US are more severe and have a higher hospital admission rate than any other sort of child injury except those involving vehicles.
Why Was This Study Done?
Children who fall off playground equipment are nearly four times as likely to break a bone (often in an arm) as children who fall from standing height. To reduce the number of fractures that occur in playgrounds, some governments have limited the height of playground equipment. Some have also set standards for the type of surfaces installed in playgrounds and for the depth of sand or engineered wood fiber in loose fill surfaces. These standards are based on laboratory tests in which headforms (artificial heads) are dropped onto surfaces. However, these tests provide no information about the ability of different surfaces to prevent broken arms and other specific injuries in the real world. In this cluster randomized trial (a study in which groups of people are randomly assigned to receive different interventions), the researchers compare the rates of arm fractures in elementary (primary) school playgrounds in Toronto (Canada) that have wood fiber surfacing with the rates in playgrounds that have granite sand surfacing.
What Did the Researchers Do and Find?
The researchers randomly assigned 37 elementary schools that had qualified for school board funding for replacement playground equipment to receive either wood fiber (19 schools) or granite sand surfacing (18 schools) in their playgrounds. 19 of the schools complied with their randomization (12 installed fiber and seven installed sand); two schools installed sand although they were randomized to install fiber and seven schools installed fiber instead of sand. The researchers evaluated the playgrounds and their surfaces several times during the 2.5-year study and collected data on how playground injuries happened and types of injuries from the schools, parents, and medical reports. Among the schools that complied with randomization, falls from height into sand resulted in 1.9 arm fractures per 100,000 student-months whereas falls into fiber resulted in 9.4 arm fractures per 100,000 student-months. Arm fracture rates and other injury rates were also higher for falls from height into fiber than into sand when all the schools that had installed new surfaces were considered. However, the rates of arm fracture and other injuries that did not involve a fall from height did not vary between surfaces.
What Do These Findings Mean?
The accuracy of these findings is limited by the small number of arm fractures that occurred during the trial—only 20 children who fell into fiber and two who fell into sand broke an arm. The accuracy of the findings may also be limited by the failure of many schools to comply with randomization although the researchers found no obvious differences between the schools that did and did not comply with randomization that might have affected the trial's outcome. However, even with these limitations, the findings of this real-world study indicate that granitic sand surfaces substantially reduce the risk of arm fractures and other injuries caused by falls from playground equipment when compared with wood fiber surfaces. Thus, because falls from playground equipment are more likely to cause a fracture than falls from standing height, if playground surfacing standards are adjusted to reflect the findings of this study (that is, if sand surfaces are recommended in preference to wood fiber surfaces), many arm fractures in children should be prevented.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000195.
Safe Kids Canada provides information about playground safety and other aspects of childhood safety (in English and French)
Safe Kids Worldwide is a global network of organizations whose mission is to prevent accidental childhood injury (in English and Spanish)
The Nemours Foundation, a nonprofit organization for child health, provides information for parents on playground safety
The Royal Society for the Prevention of Accidents provides information on the safety of indoor and outdoor play areas
The US Centers for Disease Control and Prevention provides fact sheets on playground injuries
The US Consumer Product Safety Commission also has information on playground safety, including resources designed for children such as The Further Adventures of Kidd Safety and Little Big Kids, a booklet on play safety written by children for children
doi:10.1371/journal.pmed.1000195
PMCID: PMC2784292  PMID: 20016688
16.  Reliability of vertebral fracture assessment using multidetector CT lateral scout views: the Framingham Osteoporosis Study 
Summary
Two radiologists evaluated images of the spine from computed tomography (CT) scans on two occasions to diagnose vertebral fracture in 100 individuals. Agreement was fair to good for mild fractures, and agreement was good to excellent for more severe fractures. CT scout views are useful to assess vertebral fracture.
Introduction
We investigated inter-reader agreement between two radiologists and intra-reader agreement between duplicate readings for each radiologist, in assessment of vertebral fracture using a semi-quantitative method from lateral scout views obtained by CT.
Methods
Participants included 50 women and 50 men (age 50-87 years, mean 70 years) in the Framingham Study. T4-L4 vertebrae were assessed independently by two radiologists on two occasions using a semi-quantitative scale as normal, mild, moderate, or severe fracture.
Results
Vertebra-specific prevalence of grade ≥1 (mild) fracture ranged from 3% to 5%. We found fair (κ=56-59%) inter-reader agreement for grade ≥1 vertebral fractures and good (κ=68-72%) inter-reader agreement for grade ≥2 fractures. Intra-reader agreement for grade ≥1 vertebral fracture was fair (κ=55%) for one reader and excellent for another reader (κ=77%), whereas intra-reader agreement for grade ≥2 vertebral fracture was excellent for both readers (κ=76% and 98%). Thoracic vertebrae were more difficult to evaluate than the lumbar region, and agreement was lowest (inter-reader κ=43%) for fracture at the upper (T4-T9) thoracic levels and highest (inter-reader κ=76-78%) for the lumbar spine (L1-L4).
Conclusions
Based on a semi-quantitative method to classify vertebral fractures using CT scout views, agreement within and between readers was fair to good, with the greatest source of variation occurring for fractures of mild severity and for the upper thoracic region. Agreement was good to excellent for fractures of at least moderate severity. Lateral CT scout views can be useful in clinical research settings to assess vertebral fracture.
doi:10.1007/s00198-010-1290-6
PMCID: PMC2964444  PMID: 20495902
Computed tomography; Lateral scout; Reliability; Scoutviews; Semiquantitative; Vertebral fracture
17.  World's first telepathology experiments employing WINDS ultra-high-speed internet satellite, nicknamed “KIZUNA” 
Background:
Recent advances in information technology have allowed the development of a telepathology system involving high-speed transfer of high-volume histological figures via fiber optic landlines. However, at present there are geographical limits to landlines. The Japan Aerospace Exploration Agency (JAXA) has developed the “Kizuna” ultra-high speed internet satellite and has pursued its various applications. In this study we experimented with telepathology in collaboration with JAXA using Kizuna. To measure the functionality of the Wideband InterNetworking engineering test and Demonstration Satellite (WINDS) ultra-high speed internet satellite in remote pathological diagnosis and consultation, we examined whether its data transfer speed and stability were adequate to conduct telepathology (both diagnosis and conferencing) with functionality and ease similar or equal to telepathology using fiber-optic landlines.
Materials and Methods:
We performed experiments for 2 years. In year 1, we tested the usability of the WINDS for telepathology with real-time video and virtual slide systems. These are state-of-the-art technologies requiring massive volumes of data transfer. In year 2, we tested the usability of the WINDS for three-way teleconferencing with virtual slides. Facilities in Iwate (northern Japan), Tokyo, and Okinawa were connected via the WINDS and voice conferenced while remotely examining and manipulating virtual slides.
Results:
Network function parameters measured using ping and Iperf were within acceptable limits. However, stage movement, zoom, and conversation suffered a lag of approximately 0.8 s when using real-time video, and a delay of 60-90 s was experienced when accessing the first virtual slide in a session. No significant lag or inconvenience was experienced during diagnosis and conferencing, and the results were satisfactory. Our hypothesis was confirmed for both remote diagnosis using real-time video and virtual slide systems, and also for teleconferencing using virtual slide systems with voice functionality.
Conclusions:
Our results demonstrate the feasibility of ultra-high-speed internet satellite networks for use in telepathology. Because communications satellites have fewer geographical and infrastructural requirements than landlines, ultra-high-speed internet satellite telepathology represents a major step toward alleviating regional disparity in the quality of medical care.
doi:10.4103/2153-3539.119002
PMCID: PMC3815045  PMID: 24244882
KIZUNA (絆); optical fiber; real-time video system; telepathology; ultra-high-speed internet satellite; virtual slide system
18.  Inter-observer reliability assessment of the Schatzker, AO/OTA and three-column classification of tibial plateau fractures 
Background
The purpose of our study was to evaluate the inter-observer reliability of the Three-Column classification of tibial plateau fractures in comparison with the conventional Schatzker and AO/OTA classifications.
Methods
Fifty cases covering all fracture patterns were collected from 278 consecutive patients with tibial plateau fractures who underwent internal fixation in the Department of Orthopedics and Trauma III at Shanghai Sixth People’s Hospital. The series was arranged randomly and numbered 1 to 50. Four observers were chosen to classify these cases. Before the study, a classification training session was held for each observer. They were given as much time as they required to evaluate the radiographs accurately and independently. The classification choices made at the first viewing were not available during the second viewing. The observers were not provided with any feedback after the first viewing. The kappa statistic was used to analyze the inter-observer reliability of the three fracture classifications made by the four observers.
Results
The mean kappa value for inter-observer reliability of the Schatzker classification was 0.567 (range: 0.513–0.589), representing “moderate agreement”. The mean kappa value for inter-observer reliability of the AO/ASIF classification system was 0.623 (range: 0.510–0.710), representing “substantial agreement”. The mean kappa value for inter-observer reliability of the Three-Column classification system was 0.766 (range: 0.706–0.890), representing “substantial agreement”.
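The verbal labels attached to these kappa values match the widely used Landis and Koch bands. The abstract does not name the scale, so treating it as Landis and Koch is an assumption; the sketch below simply maps a kappa value to its band, with a hypothetical function name.

```python
def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch descriptive agreement band."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0.0:
        return "poor"
    # Upper bounds of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Mean kappa values reported above for the three classification systems.
for name, value in [("Schatzker", 0.567), ("AO/ASIF", 0.623), ("Three-Column", 0.766)]:
    print(name, value, landis_koch_label(value))
```

Note that two systems can fall in the same band while differing noticeably in kappa, as the AO/ASIF and Three-Column values do here, so the band alone understates the difference.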
Conclusion
The Three-Column classification, which depends on understanding the fracture from CT scans and 3D reconstruction, can identify the posterior column fracture or fragment. It showed “substantial agreement” in the assessment of inter-observer reliability, higher than the conventional Schatzker and AO/OTA classifications. We conclude that the Three-Column classification provides higher agreement among different surgeons and could be popularized and widely practiced in other clinical centers.
doi:10.1186/1752-2897-7-7
PMCID: PMC3848575  PMID: 24025650
Inter-observer reliability; Tibial plateau fracture; Classification
19.  Advice from a Medical Expert through the Internet on Queries about AIDS and Hepatitis: Analysis of a Pilot Experiment 
PLoS Medicine  2006;3(7):e256.
Background
Advice from a medical expert on concerns and queries expressed anonymously through the Internet by patients and later posted on the Web, offers a new type of patient–doctor relationship. The aim of the current study was to perform a descriptive analysis of questions about AIDS and hepatitis made to an infectious disease expert and sent through the Internet to a consumer-oriented Web site in the Spanish language.
Methods and Findings
Questions were e-mailed and the questions and answers were posted anonymously in the “expert-advice” section of a Web site focused on AIDS and hepatitis. We performed a descriptive study and a temporal analysis of the questions received in the first 12 months after the launch of the site. A total of 899 questions were received from December 2003 to November 2004, with a marked linear growth pattern. Questions originated in Spain in 68% of cases and 32% came from Latin America (the Caribbean, Central America, and South America). Eighty percent of the senders were male. Most of the questions concerned HIV infection (79%) with many fewer on hepatitis (17%). The highest numbers of questions were submitted just after the weekend (37% of questions were made on Mondays and Tuesdays). Risk factors for contracting HIV infection were the most frequent concern (69%), followed by the window period for detection (12.6%), laboratory results (5.9%), symptoms (4.7%), diagnosis (2.7%), and treatment (2.2%).
Conclusions
Our results confirm a great demand for this type of “ask-the-expert” Internet service, at least for AIDS and hepatitis. Anonymity, free access, and immediate answers have been key factors in its success.
Editors' Summary
Background.
Although substantial progress has been made in the fight against HIV/AIDS, in terms of developing new treatments and understanding factors that cause the disease to worsen, putting this knowledge into practice can be difficult. Two main barriers exist that can prevent individuals seeking information or treatment. The first is the considerable social stigma still associated with HIV; the second is the poverty of the developing countries—such as those in Latin America—where the disease has reached pandemic proportions. In addition, the disease, which used to be spread mainly through the sharing of injecting drug needles or through sex between men, has now entered the general population. When healthcare services are limited, people are often unable to seek information about HIV, and even when services do exist, the cost of accessing them can be too high. The same is true for other diseases such as hepatitis infection, which often co-exists with HIV. The Internet has the potential to go some way to filling this health information gap. And, many patients seek information on the Internet before consulting their doctor.
Why Was This Study Done?
In 2003, the Madrid-based newspaper El Mundo launched an HIV and hepatitis information resource situated in the health section of its existing Web site. One aspect of this resource was an “ask-the-expert” section, in which readers could anonymously e-mail questions about HIV and hepatitis that would be answered by an infectious disease expert. These ranged from how the diseases can be transmitted and who is most at risk, to what to do if an individual thinks they might have the disease. There seems to be a clear need for this Spanish-language service; in Latin America, 2.1 million people are infected with HIV, with 230,000 new cases in 2005. In the Caribbean, AIDS is the leading cause of death in people aged 15–44 years. In Spain, 71,000 people were infected with HIV in 2005. Although the Internet contains a vast store of health information, and many aspects of patient–doctor interactions have been made electronic, little is known about what format is ideal. The researchers, who included employees of the newspaper, decided to investigate the effectiveness of the question–answer format used by El Mundo.
What Did the Researchers Do and Find?
In the first 12 months after the service was launched, the researchers recorded several details: what day of the week questions were sent, what the questions were about, and whether they were sent by the person needing the information or by a family member or friend. They also noted demographic information, such as the age, sex, and country of origin of the person e-mailing the question.
Of 899 questions sent to the Web site between December 2003 and November 2004, most (80%) were sent by males. Most questions came from Spain, followed by Latin America, and most questions were sent on Mondays and Tuesdays. Some e-mails were from people who felt they had been waiting too long for an answer to their first e-mail—despite the mean time for answering a question being fewer than seven days. Messages of support for the Web site rose during the year from 2% to 22%.
What Do These Findings Mean?
The messages of support and encouragement sent in by users indicated that the service was well-received and useful. Most of the questions were about HIV rather than about hepatitis, which the researchers say could represent the more prominent media coverage of HIV. However, despite the disease's high profile, the questions about HIV were very basic. It could also mean that people hold a false impression that hepatitis is a less serious illness or that they have more information about it than about HIV.
Since most questions were sent in at the start of the week, the researchers believe that many individuals wrote in after engaging in potentially risky sexual behaviour over the weekend.
The researchers also found that existing information on the Web site already answered many of the new questions, indicating that people prefer a question-and-answer model over ready-prepared information. The anonymity, free access, and immediacy of the Internet-based service suggest this could be a model for providing other types of health information.
The findings also suggest that such a service can highlight the needs and concerns of specific populations and can help health planners and policymakers respond to those needs in their countries.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0030256.
• The AIDSinfo Web site from the US Department of Health and Human Services provides information on all aspects of HIV/AIDS treatment and prevention and has sections specially written for patients and the general public
• AVERT, an international AIDS charity, has a section on HIV in Latin America that includes details of transmission, infection rates, and treatment
Marco and colleagues analyzed questions sent by the public to a Spanish language "ask-the-expert" Internet site, and found that 70% of queries were about risk factors for acquiring HIV.
doi:10.1371/journal.pmed.0030256
PMCID: PMC1483911  PMID: 16796404
20.  Evaluation of two novel thoracolumbar trauma classification systems 
Indian Journal of Orthopaedics  2007;41(4):322-326.
Background:
Despite numerous attempts at classifying thoracolumbar spinal injuries, there remains no consensus on a single unifying algorithm of management. The ideal system should provide diagnostic and prognostic information, exhibit adequate reliability and validity and be easily applicable to clinical practice. The purpose of this study is to assess the reliability and validity of two novel classification systems for thoracolumbar fractures – the Thoracolumbar Injury Severity Score (TLISS) and the Thoracolumbar Injury Classification and Severity Score (TLICS) – and also to discuss potential efforts towards research in the future.
Materials and Methods:
Seventy-one patients with thoracolumbar fractures were prospectively assessed by surgeons with different levels of training and experience (attending orthopedic surgeon, attending neurosurgeon, spine fellows, senior level and junior level residents) at a single institution. Plain radiographs, CT and MRI imaging were used to classify these injuries using the TLISS system. Seven months later, 25 consecutive injuries were prospectively assessed with the TLISS and TLICS systems. Unweighted Cohen's kappa coefficients and Spearman's correlation values were calculated to assess inter-observer reliability and validity at each point in time.
Results:
For both the TLISS and TLICS algorithms, the inter-rater kappa statistics for all of the subgroups demonstrated moderate-to-substantial reliability (0.45-0.74), although there were no significant differences among the shared subgroups. The kappa score of the TLISS system was greater than that of the TLICS system for injury mechanism/morphology. Correlation values were also greater across all subgroups (P ≤ 0.01). Statistically significant improvements in TLISS inter-observer reliability were observed across all TLISS fields (P < 0.05). The TLISS and TLICS schemes both demonstrated excellent validity.
Conclusion:
The TLISS and TLICS scales both exhibited substantial reliability and validity. However, the TLISS system displayed greater inter-observer correlation than did the TLICS and demonstrated significant improvements in reliability over time.
doi:10.4103/0019-5413.36995
PMCID: PMC2989501  PMID: 21139786
Injury classification; injury severity score; spinal cord injury; thoracolumbar dislocation; thoracolumbar fracture
21.  Performance of Double Reading Mammography in an Iranian Population and Its Effect on Patient Outcome 
Iranian Journal of Radiology  2013;10(2):51-55.
Background
Considering the importance and responsibility of reporting mammography and the necessity to notice details with a high degree of precision, double reading mammography has been introduced and recommended.
Objectives
This study aimed to assess the performance of double reading of mammograms and its effect on patient outcomes.
Patients and Methods
In this cross-sectional study, 1284 digitized mammographic views of 642 breasts belonging to 339 women (303 bilateral and 36 unilateral mammographies) were included. Two independent radiologists interpreted these mammograms and the BI-RADS categories of both reports were compared. Discordant results were identified and considered significant if they fell in the positive (BI-RADS 0, 4, 5) versus negative (BI-RADS 1, 2, 3) groups, and significant discordant cases were then followed up to determine a benign versus malignant final diagnosis. The recall rate was calculated for each reader. Inter-observer agreement in breast density was determined by the Kappa test.
Results
Readers had consensus on BI-RADS categories in 459 breasts (71%), but diverse categories were used for 183 breasts (29%), including 132 significant and 51 non-significant discrepancies. According to the weighted Kappa test, agreement between the two readers in positive or negative reports was 0.78 (95% CI=0.73-0.83) and in parenchymal density, it was 0.73 (95% CI=0.7-0.77). Most of the discrepancies were between category zero versus categories 1 and 2 (63.4%). The recall rate was 36% for the first and 44% for the second reader. Among 132 significant discordant results, one case had the final diagnosis of malignancy and the others had benign or negative diagnosis. There was a 0.2% increase in cancer detection rate by double reading.
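The counts in this abstract can be cross-checked with simple arithmetic. The sketch below uses only figures quoted in the abstract and assumes two views per breast, which the abstract implies but does not state.

```python
def percent(part, whole):
    """Share of `whole` made up by `part`, rounded to a whole percent."""
    return round(100 * part / whole)


# Counts quoted in the abstract.
bilateral, unilateral = 303, 36
women = bilateral + unilateral           # 339 women
breasts = 2 * bilateral + unilateral     # 642 breasts
views = 2 * breasts                      # 1284 views, assuming 2 views per breast

concordant = 459
significant, non_significant = 132, 51
discordant = significant + non_significant   # 183 breasts

# Concordant and discordant breasts together account for every breast.
assert concordant + discordant == breasts

print(women, breasts, views)             # 339 642 1284
print(percent(concordant, breasts))      # 71
print(percent(discordant, breasts))      # 29
```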
Conclusion
This study shows no significant improvement in the cancer detection rate by double reading; however, a lower recall rate could be a more helpful consequence.
doi:10.5812/iranjradiol.11729
PMCID: PMC3767012  PMID: 24046778
Double Reading; Mammography; Recall Rate
22.  Selection in Reported Epidemiological Risks: An Empirical Assessment 
PLoS Medicine  2007;4(3):e79.
Background
Epidemiological studies may be subject to selective reporting, but empirical evidence thereof is limited. We empirically evaluated the extent of selection of significant results and large effect sizes in a large sample of recent articles.
Methods and Findings
We evaluated 389 articles of epidemiological studies that reported, in their respective abstracts, at least one relative risk for a continuous risk factor in contrasts based on median, tertile, quartile, or quintile categorizations. We examined the proportion and correlates of reporting statistically significant and nonsignificant results in the abstract and whether the magnitude of the relative risks presented (coined to be consistently ≥1.00) differs depending on the type of contrast used for the risk factor. In 342 articles (87.9%), ≥1 statistically significant relative risk was reported in the abstract, while only 169 articles (43.4%) reported ≥1 statistically nonsignificant relative risk in the abstract. Reporting of statistically significant results was more common with structured abstracts, and was less common in US-based studies and in cancer outcomes. Among 50 randomly selected articles in which the full text was examined, a median of nine (interquartile range 5–16) statistically significant and six (interquartile range 3–16) statistically nonsignificant relative risks were presented (p = 0.25). Paradoxically, the smallest presented relative risks were based on the contrasts of extreme quintiles; on average, the relative risk magnitude was 1.41-, 1.42-, and 1.36-fold larger in contrasts of extreme quartiles, extreme tertiles, and above-versus-below median values, respectively (p < 0.001).
Conclusions
Published epidemiological investigations almost universally highlight significant associations between risk factors and outcomes. For continuous risk factors, investigators selectively present contrasts between more extreme groups, when relative risks are inherently lower.
An evaluation of published articles reporting epidemiological studies found that they almost universally highlight significant associations between risk factors and outcomes.
Editors' Summary
Background.
Medical and scientific researchers use statistical tests to try to work out whether their observations—for example, seeing a difference in some characteristic between two groups of people—might have occurred as a result of chance alone. Statistical tests cannot determine this for sure, rather they can only give a probability that the observations would have arisen by chance. When researchers have many different hypotheses, and carry out many statistical tests on the same set of data, they run the risk of concluding that there are real differences where in fact there are none. At the same time, it has long been known that scientific and medical researchers tend to pick out the findings on which to report in their papers. Findings that are more interesting, impressive, or statistically significant are more likely to be published. This is termed “publication bias” or “selective reporting bias.” Therefore, some people are concerned that the published scientific literature might contain many false-positive findings, i.e., findings that are not true but are simply the result of chance variation in the data. This would have a serious impact on the accuracy of the published scientific literature and would tend to overestimate the strength and direction of relationships being studied.
Why Was This Study Done?
Selective reporting bias has already been studied in detail in the area of randomized trials (studies where participants are randomly allocated to receive an intervention, e.g., a new drug, versus an alternative intervention or “comparator,” in order to understand the benefits or safety of the new intervention). These studies have shown that very many of the findings of trials are never published, and that statistically significant findings are more likely to be included in published papers than nonsignificant findings. However, much medical research is carried out that does not use randomized trial methods, either because that method is not useful to answer the question at hand or is unethical. Epidemiological research is often concerned with looking at links between risk factors and the development of disease, and this type of research would generally use observation rather than experiment to uncover connections. The researchers here were concerned that selective reporting bias might be just as much of a problem in epidemiological research as in randomized trials research, and wanted to study this specifically.
What Did the Researchers Do and Find?
In this investigation, searches were carried out of PubMed, a database of biomedical research studies, to extract epidemiological studies that were published between January 2004 and October 2005. The researchers wanted to specifically look at studies reporting the effect of continuous risk factors and their effect on health or disease outcomes (a continuous risk factor is something like age or glucose concentration in the blood, is a number, and can have any value on a sliding scale). Three hundred and eighty-nine original research studies were found, and the researchers pulled out from the abstracts and full text of these papers the relative risks that were reported along with the results of statistical tests for them. (Relative risk is the chance of getting an outcome, say disease, in one group as compared to another group.) The researchers found that nearly 90% of these studies had one or more statistically significant risks reported in the abstract, but only 43% reported one or more risks that were not statistically significant. When looking at all of the findings reported anywhere in the full text for 50 of these studies, the researchers saw that papers overall reported more statistically significant risks than nonsignificant risks. Finally, it seemed that in the set of papers studied here, the way in which statistical analyses were done produced a bias towards more extreme findings: for datasets showing small relative risks, papers were more likely to report a comparison between extreme subsets of the data so as to report larger relative risks.
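To make the relative risk definition concrete, here is a minimal sketch using hypothetical counts. The helper `coin_to_at_least_one` is an illustrative name for the convention mentioned in the abstract, where relative risks were expressed so as to be consistently at least 1.00; taking the reciprocal of values below 1 is an assumption about how that was done.

```python
def relative_risk(events_exposed, n_exposed, events_reference, n_reference):
    """Risk in the exposed group divided by risk in the reference group."""
    risk_exposed = events_exposed / n_exposed
    risk_reference = events_reference / n_reference
    return risk_exposed / risk_reference


def coin_to_at_least_one(rr):
    """Express a relative risk as >= 1 by inverting values below 1."""
    return rr if rr >= 1.0 else 1.0 / rr


# Hypothetical example: 30 of 200 people in the top quintile of a risk
# factor develop the disease, versus 20 of 200 in the bottom quintile.
rr = relative_risk(30, 200, 20, 200)
print(round(rr, 2))                    # 1.5

# A protective association (RR = 0.5) is reported as 2.0 after coining.
print(coin_to_at_least_one(0.5))       # 2.0
```

Coining in this way lets protective and harmful associations be compared on the same magnitude scale.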
What Do These Findings Mean?
These findings suggest that there is a tendency among epidemiology researchers to highlight statistically significant findings and to avoid highlighting nonsignificant findings in their research papers. This behavior may be a problem, because many of these significant findings could in future turn out to be “false positives.” At present, registers exist for researchers to describe ongoing clinical trials, and to set out the outcomes that they plan to analyze for those trials. These registers will go some way towards addressing some of the problems described here, but only for clinical trials research. Registers do not yet exist for epidemiological studies, and therefore it is important that researchers and readers are aware of and cautious about the problem of selective reporting in epidemiological research.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0040079.
Wikipedia entry on publication bias (note: Wikipedia is an internet encyclopedia that anyone can edit)
The International Committee of Medical Journal Editors gives guidelines for submitting manuscripts to its member journals, and includes comments about registration of ongoing studies and the obligation to publish negative studies
ClinicalTrials.gov and the ISRCTN register are two registries of ongoing clinical trials
doi:10.1371/journal.pmed.0040079
PMCID: PMC1808481  PMID: 17341129
23.  Trends in the Surgical Treatment of Pathologic Proximal Femur Fractures Among Musculoskeletal Tumor Society Members 
Background
Several strategies for the treatment of pathologic proximal femur fractures are practiced but treatment outcomes have not been rigorously compared.
Questions/purposes
Major variations in the use of intramedullary fixation, extramedullary/plate-screw fixation, and endoprosthetic reconstruction techniques for pathologic proximal femur fractures in patients with skeletal metastases are reported. The clinical and surgical variables that influence this choice differ among treating surgeons. To characterize the technique preferences and to identify areas of consensus regarding specific clinical presentations, we administered an online survey to the Musculoskeletal Tumor Society (MSTS) membership. We also tested whether responses correlated with the respondents’ years in practice and asked about the indications for wide tumor resection and the role of tumor debulking and adjuvant cementation.
Methods
A 10-minute, web-based survey was sent via email to 244 MSTS members. The survey queried participants’ musculoskeletal oncology training and experience and presented case scenarios illustrating different combinations of four variables that influence decision-making: cancer type, estimated patient survival, fracture displacement, and anatomic region of involvement.
Results
Forty-one percent (n = 98) of MSTS members completed the survey. Intramedullary nail fixation (IMN; 45%) and proximal femur resection and reconstruction (34%) were the most commonly recommended techniques followed by long-stem cemented hemiarthroplasty/cemented hemiarthroplasty (15%) and open reduction and internal fixation (7%). Most respondents (56%) recommended use of cementation with IMN. Differences of opinion on recommended treatment were associated with variations in cancer type, fracture displacement, and anatomic region of involvement.
Conclusions
Our online survey showed a trend among MSTS members for selecting IMN and arthroplasty-related techniques to treat pathologic fractures of the proximal femur, but major differences in preferred operative technique exist. Prospective studies are needed to develop consistent, evidence-based treatment recommendations.
Electronic supplementary material
The online version of this article (doi:10.1007/s11999-012-2724-6) contains supplementary material, which is available to authorized users.
doi:10.1007/s11999-012-2724-6
PMCID: PMC3706680  PMID: 23247815
24.  Evaluation of low‐cost computer monitors for the detection of cervical spine injuries in the emergency room: an observer confidence‐based study 
Emergency Medicine Journal : EMJ  2006;23(11):850-853.
Background
To compare the diagnostic value of low‐cost computer monitors and a Picture Archiving and Communication System (PACS) workstation for the evaluation of cervical spine fractures in the emergency room.
Methods
Two groups of readers blinded to the diagnoses (2 radiologists and 3 orthopaedic surgeons) independently assessed digital radiographs of the cervical spine (anterior–posterior, oblique and trans‐oral‐dens views). The radiographs of 57 patients who arrived consecutively to the emergency room in 2004 with clinical suspicion of a cervical spine injury were evaluated. The diagnostic values of these radiographs were scored on a 3‐point scale (1 = diagnosis not possible/bad image quality, 2 = diagnosis uncertain, 3 = clear diagnosis of fracture or no fracture) on a PACS workstation and on two different liquid crystal display (LCD) personal computer monitors. The images were randomised to avoid memory effects. We used logistic mixed‐effects models to determine the possible effects of monitor type on the evaluation of x‐ray images. To determine the overall effects of monitor type, this variable was used as a fixed effect, and the image number and reader group (radiologist or orthopaedic surgeon) were used as random effects on display quality. Group‐specific effects were examined, with the reader group and additional fixed effects as terms. A significance level of 0.05 was established for assessing the contribution of each fixed effect to the model.
Results
Overall, the diagnostic score did not differ significantly between standard personal computer monitors and the PACS workstation (both p values were 0.78).
Conclusion
Low‐cost LCD personal computer monitors may be useful in establishing a diagnosis of cervical spine fractures in the emergency room.
doi:10.1136/emj.2006.036822
PMCID: PMC2464403  PMID: 17057136
25.  AO spine injury classification system: a revision proposal for the thoracic and lumbar spine 
European Spine Journal  2013;22(10):2184-2201.
Purpose
The AO Spine Classification Group was established to propose a revised AO spine injury classification system. This paper provides details on the rationale, methodology, and results of the initial stage of the revision process for injuries of the thoracic and lumbar (TL) spine.
Methods
In a structured, iterative process involving five experienced spine trauma surgeons from various parts of the world, consecutive cases with TL injuries were classified independently by members of the classification group, and analyzed for classification reliability using the Kappa coefficient (κ) and for accuracy using latent class analysis. The reasons for disagreements were examined systematically during review meetings. In four successive sessions, the system was revised until consensus and sufficient reproducibility were achieved.
Results
The TL spine injury system is based on three main injury categories adapted from the original Magerl AO concept: A (compression), B (tension band), and C (displacement) type injuries. Type-A injuries include four subtypes (wedge-impaction/split-pincer/incomplete burst/complete burst); B-type injuries are divided between purely osseous and osseo-ligamentous disruptions; and C-type injuries are further categorized into three subtypes (hyperextension/translation/separation). There is no subgroup division. The reliability of injury types (A, B, C) was good (κ = 0.77). The surgeons’ pairwise Kappa ranged from 0.69 to 0.90. Kappa coefficients κ for reliability of injury subtypes ranged from 0.26 to 0.78.
Conclusions
The proposed TL spine injury system is based on clinically relevant parameters. Final evaluation data showed reasonable reliability and accuracy. Further validation of the proposed revised AO Classification requires follow-up evaluation sessions and documentation by more surgeons from different countries and backgrounds and is subject to modification based on clinical parameters during subsequent phases.
doi:10.1007/s00586-013-2738-0
PMCID: PMC3804719  PMID: 23508335
Spinal injury classification; Thoracolumbar; Consensus development; Reliability; Accuracy
