Related Articles

1.  Training Improves Interobserver Reliability for the Diagnosis of Scaphoid Fracture Displacement 
Background
The diagnosis of displacement in scaphoid fractures is notorious for poor interobserver reliability.
Questions/purposes
We tested whether training can improve interobserver reliability and sensitivity, specificity, and accuracy for the diagnosis of scaphoid fracture displacement on radiographs and CT scans.
Methods
Sixty-four orthopaedic surgeons rated a set of radiographs and CT scans of 10 displaced and 10 nondisplaced scaphoid fractures for the presence of displacement, using a web-based rating application. Before rating, observers were randomized to a training group (34 observers) and a nontraining group (30 observers). The training group received an online training module before the rating session, and the nontraining group did not. Interobserver reliability for training and nontraining was assessed by Siegel’s multirater kappa and the Z-test was used to test for significance.
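As a rough illustration of a multirater agreement statistic, the sketch below computes Fleiss' kappa, a closely related measure for many raters and nominal categories. Whether it matches the exact variant of Siegel's multirater kappa used in the study is an assumption, and the function name and the toy count matrix are hypothetical, not taken from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories matrix of rating counts.

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every row must sum to the same number of raters (>= 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    total_ratings = n_subjects * n_raters

    # Overall proportion of ratings falling in each category.
    p_category = [
        sum(row[j] for row in counts) / total_ratings
        for j in range(n_categories)
    ]
    # Per-subject agreement: share of rater pairs that agree.
    p_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_subject) / n_subjects
    p_expected = sum(p * p for p in p_category)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data: 4 cases, 3 raters, categories (displaced, nondisplaced).
toy_counts = [[3, 0], [0, 3], [3, 0], [0, 3]]
kappa_perfect = fleiss_kappa(toy_counts)
```

With complete agreement on every case, as in the toy matrix, the statistic equals 1; values near 0 indicate agreement no better than chance.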
Results
There was a small, but significant difference in the interobserver reliability for displacement ratings in favor of the training group compared with the nontraining group. Ratings of radiographs and CT scans combined resulted in moderate agreement for both groups. The average sensitivity, specificity, and accuracy of diagnosing displacement of scaphoid fractures were, respectively, 83%, 85%, and 84% for the nontraining group and 87%, 86%, and 87% for the training group. Assuming a 5% prevalence of fracture displacement, the positive predictive value was 0.23 in the nontraining group and 0.25 in the training group. The negative predictive value was 0.99 in both groups.
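The predictive values quoted above follow from Bayes' rule applied to the reported average sensitivity and specificity and the assumed 5% prevalence. The sketch below reproduces that arithmetic; the function name is hypothetical, and the inputs are the rounded percentages from the abstract, so the results are approximate.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded values from the abstract, with the assumed 5% prevalence.
ppv_nontraining, npv_nontraining = predictive_values(0.83, 0.85, 0.05)
ppv_training, npv_training = predictive_values(0.87, 0.86, 0.05)

print(round(ppv_nontraining, 2), round(npv_nontraining, 2))  # 0.23 0.99
print(round(ppv_training, 2), round(npv_training, 2))        # 0.25 0.99
```

The low positive predictive values reflect the low assumed prevalence: even with specificity near 85%, false positives among the 95% of nondisplaced fractures outnumber true positives.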
Conclusions
Our results suggest training can improve interobserver reliability and sensitivity, specificity and accuracy for the diagnosis of scaphoid fracture displacement, but the improvements are slight. These findings are encouraging for future research regarding interobserver variation and how to reduce it further.
doi:10.1007/s11999-012-2260-4
PMCID: PMC3369105  PMID: 22290132
2.  Use of Expert Panels to Define the Reference Standard in Diagnostic Research: A Systematic Review of Published Methods and Reporting 
PLoS Medicine  2013;10(10):e1001531.
Loes C. M. Bertens and colleagues survey the published diagnostic research literature for use of expert panels to define the reference standard, characterize components and missing information, and recommend elements that should be reported in diagnostic studies.
Please see later in the article for the Editors' Summary
Background
In diagnostic studies, a single and error-free test that can be used as the reference (gold) standard often does not exist. One solution is the use of panel diagnosis, i.e., a group of experts who assess the results from multiple tests to reach a final diagnosis in each patient. Although panel diagnosis, also known as consensus or expert diagnosis, is frequently used as the reference standard, guidance on preferred methodology is lacking. The aim of this study is to provide an overview of methods used in panel diagnoses and to provide initial guidance on the use and reporting of panel diagnosis as reference standard.
Methods and Findings
PubMed was systematically searched for diagnostic studies applying a panel diagnosis as reference standard published up to May 31, 2012. We included diagnostic studies in which the final diagnosis was made by two or more persons based on results from multiple tests. General study characteristics and details of panel methodology were extracted. Eighty-one studies were included, of which most reported on psychiatric (37%) and cardiovascular (21%) diseases. Data extraction was hampered by incomplete reporting; at least one piece of critical information about panel reference standard methodology was missing in 83% of studies. In most studies (75%), the panel consisted of three or fewer members. Panel members were blinded to the index test results in 31% of studies. Reproducibility of the decision process was assessed in 17 (21%) studies. Reported details on panel constitution, information for diagnosis and methods of decision making varied considerably between studies.
Conclusions
Methods of panel diagnosis varied substantially across studies and many aspects of the procedure were either unclear or not reported. On the basis of our review, we identified areas for improvement and developed a checklist and flow chart as initial guidance for researchers conducting and reporting studies involving panel diagnosis.
Editors' Summary
Background
Before any disease or condition can be treated, a correct diagnosis of the condition has to be made. Faced with a patient with medical problems and no diagnosis, a doctor will ask the patient about their symptoms and medical history and generally will examine the patient. On the basis of this questioning and examination, the clinician will form an initial impression of the possible conditions the patient may have, usually with a most likely diagnosis in mind. To support or reject the most likely diagnosis and to exclude the other possible diagnoses, the clinician will then order a series of tests and diagnostic procedures. These may include laboratory tests (such as the measurement of blood sugar levels), imaging procedures (such as an MRI scan), or functional tests (such as spirometry, which tests lung function). Finally, the clinician will use all the data s/he has collected to reach a firm diagnosis and will recommend a program of treatment or observation for the patient.
Why Was This Study Done?
Researchers are continually looking for new, improved diagnostic tests and multivariable diagnostic models—combinations of tests and characteristics that point to a diagnosis. Diagnostic research, which assesses the accuracy of new tests and models, requires that each patient involved in a diagnostic study has a final correct diagnosis. Unfortunately, for most conditions, there is no single, error-free test that can be used as the reference (gold) standard for diagnosis. If an imperfect reference standard is used, errors in the final disease classification may bias the results of the diagnostic study and may lead to a new test being adopted that is actually less accurate than existing tests. One widely used solution to the lack of a reference standard is “panel diagnosis” in which two or more experts assess the results from multiple tests to reach a final diagnosis for each patient in a diagnostic study. However, there is currently no formal guidance available on the conduct and reporting of panel diagnosis. Here, the researchers undertake a systematic review (a study that uses predefined criteria to identify research on a given topic) to provide an overview of the methodology and reporting of panel diagnosis.
What Did the Researchers Do and Find?
The researchers identified 81 published diagnostic studies that used panel diagnosis as a reference standard. 37% of these studies reported on psychiatric diseases, 21% reported on cardiovascular diseases, and 12% reported on respiratory diseases. Most of the studies (64%) were designed to assess the accuracy of one or more diagnostic tests. Notably, one or more critical pieces of information on methodology were missing in 83% of the studies. Specifically, information on the constitution of the panel was missing in a quarter of the studies and information on the decision-making process (whether, for example, a diagnosis was reached by discussion among panel members or by combining individual panel members' assessments) was incomplete in more than two-thirds of the studies. In three-quarters of the studies for which information was available, the panel consisted of only two or three members; different fields of expertise were represented in the panels in nearly two-thirds of the studies. In a third of the studies for which information was available, panel members made their diagnoses without access to the results of the test being assessed. Finally, the reproducibility of the decision-making process was assessed in a fifth of the studies.
What Do These Findings Mean?
These findings indicate that the methodology of panel diagnosis varies substantially among diagnostic studies and that reporting of this methodology is often unclear or absent. Both the methodology and reporting of panel diagnosis could, therefore, be improved substantially. Based on their findings, the researchers provide a checklist and flow chart to help guide the conduct and reporting of studies involving panel diagnosis. For example, they suggest that, when designing a study that uses panel diagnosis as the reference standard, the number and background of panel members should be considered, and they provide a list of options that should be considered when planning the decision-making process. Although more research into each of the options identified by the researchers is needed, their recommendations provide a starting point for the development of formal guidelines on the methodology and reporting of panel diagnosis for use as a reference standard in diagnostic research.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001531.
Wikipedia has a page on medical diagnosis (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
The Equator Network is an international initiative that seeks to improve the reliability and value of medical research literature by promoting transparent and accurate reporting of research studies; its website includes information on a wide range of reporting guidelines, including the STAndards for the Reporting of Diagnostic accuracy studies (STARD), an initiative that aims to improve the accuracy and completeness of reporting of studies of diagnostic accuracy
doi:10.1371/journal.pmed.1001531
PMCID: PMC3797139  PMID: 24143138
3.  Reliability of Magnetic Resonance Imaging Readings for Lumbar Disc Herniation in the Spine Patient Outcomes Research Trial (SPORT) 
Spine  2008;33(9):991-998.
Study Design
Assessment of the reliability of standardized magnetic resonance imaging (MRI) interpretations and measurements.
Objective
To determine the intra- and inter-reader reliability of MRI parameters relevant to patients with intervertebral disc herniation (IDH), including disc morphology classification, degree of thecal sac compromise, grading of nerve root impingement, and measurements of cross-sectional area of the spinal canal, thecal sac, and disc fragment.
Summary of Background Data
MRI is increasingly used to assess patients with sciatica and IDH, but the relationship between specific imaging characteristics and patient outcomes remains uncertain. Although other studies have evaluated the reliability of certain MRI characteristics, comprehensive evaluation of the reliability of readings of herniated disc features on MRI is lacking.
Methods
Sixty randomly selected MR images from patients with IDH enrolled in the Spine Patient Outcomes Research Trial were each rated according to defined criteria by 4 independent readers (3 radiologists and 1 orthopedic surgeon). Quantitative measurements were performed separately by 2 other radiologists. A sample of 20 MRIs was re-evaluated by each reader at least 1 month later. Agreement for rating data was assessed with kappa statistics using linear weights. Reliability of the quantitative measurements was assessed using intraclass correlation coefficients (ICCs) and summaries of measurement error.
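To make "kappa statistics using linear weights" concrete, the sketch below computes a linearly weighted Cohen's kappa for two readers rating the same images on an ordinal scale. It is a minimal two-rater illustration with hypothetical names and toy data; how the study pooled four readers into an overall kappa is not described here and is not modeled.

```python
def weighted_kappa_linear(ratings_a, ratings_b, n_categories):
    """Linearly weighted Cohen's kappa for two raters.

    Ratings are integer category indices 0..n_categories-1 on an ordinal
    scale. Disagreement weights are |i - j| / (n_categories - 1).
    """
    n = len(ratings_a)
    k = n_categories
    # Joint proportion table of the two raters' categories.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n
    row_marginal = [sum(observed[i]) for i in range(k)]
    col_marginal = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row_marginal[i] * col_marginal[j]
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical toy ratings on a 3-level scale (0 = none, 1 = mild, 2 = severe).
reader_1 = [0, 1, 2, 1]
reader_2 = [0, 2, 2, 1]
kappa_w = weighted_kappa_linear(reader_1, reader_2, 3)
```

Linear weights penalize a two-step disagreement twice as heavily as a one-step disagreement; with only two categories the statistic reduces to the unweighted Cohen's kappa.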
Results
Inter-reader reliability was substantial for disc morphology [overall kappa 0.81 (95% confidence interval (CI): 0.78, 0.85)], moderate for thecal sac compression [overall kappa 0.54 (95% CI: 0.37, 0.68)], and moderate for grading nerve root impingement [overall kappa 0.47 (95% CI: 0.36, 0.56)]. Quantitative measures showed high ICCs of 0.87 to 0.96 for spinal canal and thecal sac cross-sectional areas. Measures of disc fragment area had moderate ICCs of 0.65 to 0.83. Mean absolute differences between measurements ranged from approximately 15% to 20%.
Conclusion
Classification of disc morphology showed substantial intra- and inter-reader agreement, whereas thecal sac and nerve root compression showed more moderate reader reliability. Quantitative measures of canal and thecal sac area showed good reliability, whereas measurement of disc fragment area showed more modest reliability.
doi:10.1097/BRS.0b013e31816c8379
PMCID: PMC2745940  PMID: 18427321
disc herniation; MRI; reliability study
4.  Reliability of Readings of Magnetic Resonance Imaging Features of Lumbar Spinal Stenosis 
Spine  2008;33(14):1605-1610.
Study Design
A reliability assessment of standardized magnetic resonance imaging (MRI) interpretations and measurements.
Objective
To determine the intra- and inter-reader reliability of MRI features of lumbar spinal stenosis (SPS), including severity of central, subarticular, and foraminal stenoses, grading of nerve root impingement, and measurements of cross-sectional area of the spinal canal and thecal sac.
Summary of Background Data
MRI is commonly used to assess patients with spinal stenosis. Although a number of studies have evaluated the reliability of certain MRI characteristics, comprehensive evaluation of the reliability of MRI readings in spinal stenosis is lacking.
Methods
Fifty-eight randomly selected MR images from patients with SPS enrolled in the Spine Patient Outcomes Research Trial were evaluated. Qualitative ratings of imaging features were performed according to defined criteria by 4 independent readers (3 radiologists and 1 orthopedic surgeon). A sample of 20 MRIs was reevaluated by each reader at least 1 month later. Weighted κ statistics were used to characterize intra- and inter-reader reliability for qualitative rating data. Separate quantitative measurements were performed by 2 other radiologists. Intraclass correlation coefficients and summaries of measurement error were used to characterize reliability for quantitative measurements.
Results
Intra-reader reliability was higher than inter-reader reliability for all features. Inter-reader reliability in assessing central stenosis was substantial, with an overall κ of 0.73 (95% CI 0.69-0.77). Foraminal stenosis and nerve root impingement showed moderate to substantial agreement with overall κ of 0.58 (95% CI 0.53-0.63) and 0.51 (95% CI 0.42-0.59), respectively. Subarticular zone stenosis yielded the poorest agreement (overall κ 0.49; 95% CI 0.42-0.55) and showed marked variability in agreement between reader pairs. Quantitative measures showed inter-reader intraclass correlation coefficients ranging from 0.58 to 0.90. The mean absolute difference between readers in measured thecal sac area was 128 mm² (13%).
Conclusion
The imaging characteristics of spinal stenosis assessed in this study showed moderate to substantial reliability; future studies should assess whether these findings have prognostic significance in SPS patients.
doi:10.1097/BRS.0b013e3181791af3
PMCID: PMC2754786  PMID: 18552677
spinal stenosis; MRI; reliability
5.  Analysis of the inter- and intra-observer agreement in radiographic evaluation of wrist fractures using the multimedia messaging service 
Hand (New York, N.Y.)  2011;6(4):384-389.
Background
Orthopaedic surgeons are often asked to evaluate X-rays of patients admitted to the Accident and Emergency Department with a suspected wrist fracture or, in the case of an evident fracture, to decide the correct treatment. The aim of this study was to evaluate whether images of injured wrists can be interpreted correctly on the screen of a latest-generation mobile phone, so that the specialist can make the right diagnosis and choose the correct treatment.
Methods
Five orthopaedic surgeons and one hand surgeon evaluated the X-rays of 67 patients who had sustained a wrist injury. In the case of fracture, they were asked to classify it according to the AO and Mayo classification systems. The images were evaluated through the PACS and on a mobile phone, at different times. To check inter- and intra-observer reliability, the same procedure was repeated after a few months.
Results
Agreement between observers was essentially the same on the mobile phone. The results also highlighted that inter- and intra-observer reliability worsened as the number of variables considered by a classification system increased.
Conclusions
The present paper confirms that a latest-generation mobile phone can already be used in the clinical practice of on-call orthopaedic surgeons, who could use it in remote or poorly served areas for rapid and economical consultation.
Level of Evidence
The level of evidence of this study is economic and decision analysis, level 2.
doi:10.1007/s11552-011-9362-4
PMCID: PMC3213258  PMID: 23204964
Wrist fracture; Telemedicine; Inter-observer agreement; Intra-observer agreement
6.  Radiographic union score for hip substantially improves agreement between surgeons and radiologists 
Background
Despite the prominence of hip fractures in orthopedic trauma, the assessment of fracture healing using radiographs remains subjective. The variability in the assessment of fracture healing has important implications for both clinical research and patient care. With little existing literature regarding reliable consensus on hip fracture healing, this study was conducted to determine inter-rater reliability between orthopedic surgeons and radiologists on healing assessments using sequential radiographs in patients with hip fractures. Secondary objectives included evaluating a checklist designed to assess hip fracture healing and determining whether agreement improved when reviewers were aware of the timing of the x-rays in relation to the patients’ surgery.
Methods
A panel of six reviewers (three orthopedic surgeons and three radiologists) independently assessed fracture healing using sequential radiographs from 100 patients with femoral neck fractures and 100 patients with intertrochanteric fractures. During their independent review they also completed a previously developed radiographic checklist (Radiographic Union Score for Hip (RUSH)). Inter- and intra-rater reliability scores were calculated. Data from the current study were compared to the findings from a previously conducted study in which the same reviewers, unaware of the timing of the x-rays, completed the RUSH score.
Results
The agreement between surgeons and radiologists for fracture healing was moderate for “general impression of fracture healing” in both femoral neck (ICC = 0.60, 95% CI: 0.42-0.71) and intertrochanteric fractures (ICC = 0.50, 95% CI: 0.33-0.62). Using a standardized checklist (RUSH), agreement was almost perfect in both femoral neck (ICC = 0.85, 95% CI: 0.82-0.87) and intertrochanteric fractures (ICC = 0.88, 95% CI: 0.86-0.90). We also found a high degree of correlation between healing and the total RUSH score: in a receiver operating characteristic (ROC) analysis, the area under the curve was 0.993 for femoral neck cases and 0.989 for intertrochanteric cases. Agreement within the radiologist group and within the surgeon group did not significantly differ in our analyses. In all cases, radiographs in which the time from surgery was known resulted in higher agreement scores compared to those from the previous study in which reviewers were unaware of the time the radiograph was obtained.
Conclusions
Agreement in hip fracture radiographic healing may be improved with the use of a standardized checklist and appears highly influenced by the timing of the radiograph. These findings should be considered when evaluating patient outcomes and in clinical studies involving patients with hip fractures. Future research initiatives are required to further evaluate the RUSH checklist.
doi:10.1186/1471-2474-14-70
PMCID: PMC3599458  PMID: 23442540
Hip fractures; Reliability; Fracture healing; Radiographs
7.  Intra- and inter-reader reliability of semi-automated quantitative morphometry measurements and vertebral fracture assessment using lateral scout views from computed tomography 
Summary
Intra- and inter-reader reliability of semi-automated quantitative vertebral morphometry measurements was determined using lateral computed tomography (CT) scout views. The method requires less time than conventional morphometry. Reliability was excellent for vertebral height measurements, good for height ratios, and comparable to semi-quantitative grading by radiologists for identification of vertebral fractures.
Introduction
Underdiagnosis and undertreatment of vertebral fracture (VFx) is a well-known problem worldwide. Thus, new methods are needed to facilitate identification of VFx. This study aimed to determine intra- and inter-reader reliability of semi-automated quantitative vertebral morphometry based on shape-based statistical modeling (SpineAnalyzer, Optasia Medical, Cheadle, UK).
Methods
Two non-radiologists independently assessed vertebral morphometry from CT lateral scout views at two time points in 96 subjects (50 men, 46 women, 70.3±8.9 years) selected from the Framingham Heart Study Offspring and Third Generation Multi-Detector CT Study. VFxs were classified based solely on morphometry measurements using Genant’s criteria. Intraclass correlation coefficients (ICCs), root mean squared coefficient of variation (RMS CV) and kappa (k) statistics were used to assess reliability.
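The root mean squared coefficient of variation (RMS CV) summarizes precision across subjects by squaring each subject's coefficient of variation, averaging, and taking the square root. The sketch below shows that calculation under the assumption that each subject contributes a short list of repeated measurements; the function name and the toy heights are hypothetical, and the study's exact computation may differ.

```python
import math
from statistics import mean, stdev


def rms_cv_percent(repeated_measurements):
    """Root mean squared coefficient of variation, in percent.

    repeated_measurements is a list of per-subject lists, each holding
    two or more repeated measurements of the same quantity.
    """
    squared_cvs = []
    for values in repeated_measurements:
        cv = stdev(values) / mean(values)  # sample SD relative to the mean
        squared_cvs.append(cv * cv)
    return 100.0 * math.sqrt(mean(squared_cvs))


# Hypothetical vertebral heights (mm) measured twice for three vertebrae.
toy_heights = [[24.0, 24.6], [22.1, 21.7], [26.3, 26.3]]
precision = rms_cv_percent(toy_heights)
```

Averaging squared CVs rather than the CVs themselves gives more weight to subjects with poor repeatability, so a few badly reproduced measurements are not masked by many good ones.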
Results
We analyzed 1,246 vertebrae in 96 subjects. The analysis time averaged 5.4±1.7 min per subject (range, 3.2–9.1 min). Intra- and inter-reader ICCs for vertebral heights were excellent (>0.95) for all vertebral levels combined. Intra- and inter-reader RMS CV for height measurements ranged from 2.5% to 3.9% and 3.3% to 4.4%, respectively. Reliability of vertebral height ratios was good to fair. Based on morphometry measurements alone, readers A and B identified 51–52 and 46–59 subjects with at least one prevalent VFx, respectively, and there was good intra- and inter-reader agreement (k=0.59–0.69) for VFx identification.
Conclusions
Semi-automated quantitative vertebral morphometry measurements from CT lateral scout views are convenient and reproducible, and may facilitate assessment of VFx.
doi:10.1007/s00198-011-1530-4
PMCID: PMC3650637  PMID: 21271340
Computed tomography; Reliability; Semi-automated vertebral morphometry; Vertebral fracture
8.  Promotional Tone in Reviews of Menopausal Hormone Therapy After the Women's Health Initiative: An Analysis of Published Articles 
PLoS Medicine  2011;8(3):e1000425.
Adriane Fugh-Berman and colleagues analyzed a selection of published opinion pieces on hormone therapy and show that there may be a connection between receiving industry funding for speaking, consulting, or research and the tone of such opinion pieces.
Background
Even after the Women's Health Initiative (WHI) found that the risks of menopausal hormone therapy (hormone therapy) outweighed benefit for asymptomatic women, about half of gynecologists in the United States continued to believe that hormones benefited women's health. The pharmaceutical industry has supported publication of articles in medical journals for marketing purposes. It is unknown whether author relationships with industry affect promotional tone in articles on hormone therapy. The goal of this study was to determine whether promotional tone could be identified in narrative review articles regarding menopausal hormone therapy and whether articles identified as promotional were more likely to have been authored by those with conflicts of interest with manufacturers of menopausal hormone therapy.
Methods and Findings
We analyzed tone in opinion pieces on hormone therapy published in the four years after the estrogen-progestin arm of the WHI was stopped. First, we identified the ten authors with four or more MEDLINE-indexed reviews, editorials, comments, or letters on hormone replacement therapy or menopausal hormone therapy published between July 2002 and June 2006. Next, we conducted an additional search using the names of these authors to identify other relevant articles. Finally, after author names and affiliations were removed, 50 articles were evaluated by three readers for scientific accuracy and for tone. Scientific accuracy was assessed based on whether or not the findings of the WHI were accurately reported using two criteria: (1) Acknowledgment or lack of denial of the risk of breast cancer diagnosis associated with hormone therapy, and (2) acknowledgment that hormone therapy did not benefit cardiovascular disease endpoints. Determination of promotional tone was based on the assessment by each reader of whether the article appeared to promote hormone therapy. Analysis of inter-rater consistency found moderate agreement for scientific accuracy (κ = 0.57) and substantial agreement for promotional tone (κ = 0.65). After discussion, readers found 86% of the articles to be scientifically accurate and 64% to be promotional in tone. Themes that were common in articles considered promotional included attacks on the methodology of the WHI, arguments that clinical trial results should not guide treatment for individuals, and arguments that observational studies are as good as or better than randomized clinical trials for guiding clinical decisions. The promotional articles we identified also implied that the risks associated with hormone therapy have been exaggerated and that the benefits of hormone therapy have been or will be proven. 
Of the ten authors studied, eight were found to have declared payment for speaking or consulting on behalf of menopausal hormone manufacturers or for research support (seven of these eight were speakers or consultants). Thirty of 32 articles (90%) evaluated as promoting hormone therapy were authored by those with potential financial conflicts of interest, compared to 11 of 18 articles (61%) by those without such conflicts (p = 0.0025). Articles promoting the use of menopausal hormone therapy were 2.41 times (95% confidence interval 1.49–4.93) as likely to have been authored by authors with conflicts of interest as by authors without conflicts of interest. In articles from three authors with conflicts of interest some of the same text was repeated word-for-word in different articles.
Conclusion
There may be a connection between receiving industry funding for speaking, consulting, or research and the publication of promotional opinion pieces on menopausal hormone therapy.
Please see later in the article for the Editors' Summary
Editors' Summary
Background
Over the past three decades, menopausal hormones have been heavily promoted for preventing disease in women. However, the Women's Health Initiative (WHI) study—which enrolled more than 26,000 women in the US and which was published in 2004—found that estrogen-progestin and estrogen-only formulations (often prescribed to women around the age of menopause) increased the risk of stroke, deep vein thrombosis, dementia, and incontinence. Furthermore, this study found that the estrogen-progestin therapy increased rates of breast cancer. In fact, the estrogen-progestin arm of the WHI study was stopped in 2002 due to harmful findings, and the estrogen-only arm was stopped in 2004, also because of harmful findings. In addition, the study also found that neither therapy reduced cardiovascular risk or markedly benefited health-related quality of life measures.
Despite these results, two years after the results of WHI study were published, a survey of over 700 practicing gynecologists—the specialists who prescribe the majority of menopausal hormone therapies—in the US found that almost half did not find the findings of the WHI study convincing and that 48% disagreed with the decision to stop the trial early. Furthermore, follow-up surveys found similar results.
Why Was This Study Done?
It is unclear why gynecologists and other physicians continue to prescribe menopausal hormone therapies despite the results of the WHI. Some academics argue that published industry-funded reviews and commentaries may be designed to convey specific, but subtle, marketing messages and several academic analyses have used internal industry documents disclosed in litigation cases. So this study was conducted to investigate whether hormone therapy–promoting tone could be identified in narrative review articles and if so, whether these articles were more likely to have been authored by people who had accepted funding from hormone manufacturers.
What Did the Researchers Do and Find?
The researchers conducted a comprehensive literature search that identified 340 relevant articles published between July 2002 and June 2006, the four years following the cessation of the estrogen-progestin arm of the Women's Health Initiative study. Ten authors had published four to six articles, 47 authored two or three articles, and 371 authored one article each. The researchers focused on authors who had published four or more articles in the four-year period under study and, after author names and affiliations were removed, 50 articles were evaluated by three readers for scientific accuracy and for tone. After individually analyzing a batch of articles, the readers met to provide their initial assessments, to discuss them, and to reach consensus on tone and scientific accuracy. Then after the papers were evaluated, each author was identified and the researchers searched for authors' potential financial conflicts of interest, defined as publicly disclosed information that the authors had received payment for research, speaking, or consulting on behalf of a manufacturer of menopausal hormone therapy.
Common themes in the 50 articles included arguments that clinical trial results should not guide treatment for individuals and suggestions that the risks associated with hormone therapy have been exaggerated and that the benefits of hormone therapy have been or will be proven. Furthermore, of the ten authors studied, eight were found to have received payment for research, speaking or consulting on behalf of menopausal hormone manufacturers, and 30 of 32 articles evaluated as promoting hormone therapy were authored by those with potential financial conflicts of interest. Articles promoting the use of menopausal hormone therapy were more than twice as likely to have been written by authors with conflicts of interest as by authors without conflicts of interest. In addition, three authors who were identified as having financial conflicts of interest were authors on articles in which sections of their previously published articles were repeated word-for-word without citation.
What Do These Findings Mean?
The findings of this study suggest that there may be a link between receiving industry funding for speaking, consulting, or research and the publication of apparently promotional opinion pieces on menopausal hormone therapy. Furthermore, such publications may encourage physicians to continue prescribing these therapies to women of menopausal age. Therefore, physicians and other health care providers should interpret the content of review articles with caution. In addition, medical journals should follow the International Committee of Medical Journal Editors Uniform Requirements for Manuscripts, which require that all authors submit signed statements of their participation in authorship and full disclosure of any conflicts of interest.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000425.
The US National Heart, Lung, and Blood Institute has more information on the Women's Health Initiative
The US National Institutes of Health provide more information about the effects of menopausal hormone replacement therapy
The Office of Women's Health, U.S. Department of Health and Human Services provides information on menopausal hormone therapy
The International Committee of Medical Journal Editors presents its Uniform Requirements for Manuscripts published in biomedical journals
The National Women's Health Network, a consumer advocacy group that takes no industry money, has factsheets and articles about menopausal hormone therapy
PharmedOut, a Georgetown University Medical Center project, has many resources on pharmaceutical marketing practices
doi:10.1371/journal.pmed.1000425
PMCID: PMC3058057  PMID: 21423581
9.  Diagnostic accuracy of cone beam computed tomography and conventional multislice spiral tomography in sheep mandibular condyle fractures 
Dentomaxillofacial Radiology  2010;39(6):336-342.
Objectives
The aim of this study was to compare diagnostic accuracy of cone beam CT (CBCT) and multislice CT in artificially created fractures of the sheep mandibular condyle.
Methods
63 full-thickness sheep heads were used in this study. Two surgeons created the fractures, which were either displaced or non-displaced. CBCT images were acquired by the NewTom 3G® CBCT scanner (NIM, Verona, Italy) and CT imaging was performed using the Toshiba Aquilion® multislice CT scanner (Toshiba Medical Systems, Otawara, Japan). Two-dimensional (2D) cross-sectional images and three-dimensional (3D) reconstructions were evaluated by two observers who were asked to determine the presence or absence of fracture and displacement, the type of fracture, anatomical localization and type of displacement. The naked-eye inspection during surgery served as the gold standard. Inter- and intra-observer agreements were calculated with weighted kappa statistics. Receiver operating characteristic (ROC) curve analyses were used to compare statistically the area under the curve (AUC) of both imaging modalities.
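As a rough illustration of the weighted kappa statistic mentioned above, the following is a minimal sketch, not the authors' analysis. It assumes two observers rating the same items on an ordinal scale; the abstract does not state which weighting scheme was used, so the linear and quadratic weights here, as well as the function name `weighted_kappa` and its parameters, are assumptions made for illustration.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordered category scale.

    `categories` lists the scale in order; `weights` is "linear" or
    "quadratic" (an assumed choice, not taken from the abstract).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions: rows are rater A, columns are rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions give the chance-expected joint distribution.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    obs = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    exp = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    if exp == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - obs / exp


# Hypothetical ratings on a three-level scale (0, 1, 2); not study data.
reader_1 = [0, 0, 1, 1, 2, 2]
reader_2 = [0, 1, 1, 1, 2, 2]
kappa = weighted_kappa(reader_1, reader_2, categories=[0, 1, 2])
```

With only two categories the weighted statistic reduces to the ordinary unweighted Cohen's kappa, which is one reason a weighted form is typically reserved for graded scales.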
Results
Kappa coefficients of intra- and interobserver agreement scores ranged from 0.56 to 0.98, which were classified as moderate and excellent, respectively. There was no statistically significant difference between the imaging modalities, which were both sensitive and specific for the diagnosis of sheep condylar fractures.
Conclusions
This study confirms that CBCT is similar to CT in the diagnosis of different types of experimentally created sheep condylar fractures and can provide a cost- and dose-effective diagnostic option.
doi:10.1259/dmfr/29930707
PMCID: PMC3520235  PMID: 20729182
cone beam computed tomography; multislice computed tomography; condyle; fracture; sheep
10.  Reliability of vertebral fracture assessment using multidetector CT lateral scout views: the Framingham Osteoporosis Study 
Summary
Two radiologists evaluated images of the spine from computed tomography (CT) scans on two occasions to diagnose vertebral fracture in 100 individuals. Agreement was fair to good for mild fractures, and agreement was good to excellent for more severe fractures. CT scout views are useful to assess vertebral fracture.
Introduction
We investigated inter-reader agreement between two radiologists and intra-reader agreement between duplicate readings for each radiologist, in assessment of vertebral fracture using a semi-quantitative method from lateral scout views obtained by CT.
Methods
Participants included 50 women and 50 men (age 50-87 years, mean 70 years) in the Framingham Study. T4-L4 vertebrae were assessed independently by two radiologists on two occasions using a semi-quantitative scale as normal, mild, moderate, or severe fracture.
Results
Vertebra-specific prevalence of grade ≥1 (mild) fracture ranged from 3% to 5%. We found fair (κ=56-59%) inter-reader agreement for grade ≥1 vertebral fractures and good (κ=68-72%) inter-reader agreement for grade ≥2 fractures. Intra-reader agreement for grade ≥1 vertebral fracture was fair (κ=55%) for one reader and excellent for another reader (κ=77%), whereas intra-reader agreement for grade ≥2 vertebral fracture was excellent for both readers (κ=76% and 98%). Thoracic vertebrae were more difficult to evaluate than the lumbar region, and agreement was lowest (inter-reader κ=43%) for fracture at the upper (T4-T9) thoracic levels and highest (inter-reader κ=76-78%) for the lumbar spine (L1-L4).
Conclusions
Based on a semi-quantitative method to classify vertebral fractures using CT scout views, agreement within and between readers was fair to good, with the greatest source of variation occurring for fractures of mild severity and for the upper thoracic region. Agreement was good to excellent for fractures of at least moderate severity. Lateral CT scout views can be useful in clinical research settings to assess vertebral fracture.
doi:10.1007/s00198-010-1290-6
PMCID: PMC2964444  PMID: 20495902
Computed tomography; Lateral scout; Reliability; Scoutviews; Semiquantitative; Vertebral fracture
11.  World's first telepathology experiments employing WINDS ultra-high-speed internet satellite, nicknamed “KIZUNA” 
Background:
Recent advances in information technology have allowed the development of a telepathology system involving high-speed transfer of high-volume histological figures via fiber optic landlines. However, at present there are geographical limits to landlines. The Japan Aerospace Exploration Agency (JAXA) has developed the “Kizuna” ultra-high speed internet satellite and has pursued its various applications. In this study we experimented with telepathology in collaboration with JAXA using Kizuna. To measure the functionality of the Wideband InterNetworking engineering test and Demonstration Satellite (WINDS) ultra-high speed internet satellite in remote pathological diagnosis and consultation, we examined whether its data transfer speed and stability were adequate to conduct telepathology (both diagnosis and conferencing) with functionality and ease similar or equal to telepathology using fiber-optic landlines.
Materials and Methods:
We performed experiments for 2 years. In year 1, we tested the usability of the WINDS for telepathology with real-time video and virtual slide systems. These are state-of-the-art technologies requiring massive volumes of data transfer. In year 2, we tested the usability of the WINDS for three-way teleconferencing with virtual slides. Facilities in Iwate (northern Japan), Tokyo, and Okinawa were connected via the WINDS and voice conferenced while remotely examining and manipulating virtual slides.
Results:
Network function parameters measured using ping and Iperf were within acceptable limits. However, stage movement, zoom, and conversation suffered a lag of approximately 0.8 s when using real-time video, and a delay of 60-90 s was experienced when accessing the first virtual slide in a session. No significant lag or inconvenience was experienced during diagnosis and conferencing, and the results were satisfactory. Our hypothesis was confirmed for both remote diagnosis using real-time video and virtual slide systems, and also for teleconferencing using virtual slide systems with voice functionality.
Conclusions:
Our results demonstrate the feasibility of ultra-high-speed internet satellite networks for use in telepathology. Because communications satellites have fewer geographical and infrastructural requirements than landlines, ultra-high-speed internet satellite telepathology represents a major step toward alleviating regional disparity in the quality of medical care.
doi:10.4103/2153-3539.119002
PMCID: PMC3815045  PMID: 24244882
KIZUNA (絆); optical fiber; real-time video system; telepathology; ultra-high-speed internet satellite; virtual slide system
12.  Advice from a Medical Expert through the Internet on Queries about AIDS and Hepatitis: Analysis of a Pilot Experiment 
PLoS Medicine  2006;3(7):e256.
Background
Advice from a medical expert on concerns and queries expressed anonymously through the Internet by patients and later posted on the Web, offers a new type of patient–doctor relationship. The aim of the current study was to perform a descriptive analysis of questions about AIDS and hepatitis made to an infectious disease expert and sent through the Internet to a consumer-oriented Web site in the Spanish language.
Methods and Findings
Questions were e-mailed and the questions and answers were posted anonymously in the “expert-advice” section of a Web site focused on AIDS and hepatitis. We performed a descriptive study and a temporal analysis of the questions received in the first 12 months after the launch of the site. A total of 899 questions were received from December 2003 to November 2004, with a marked linear growth pattern. Questions originated in Spain in 68% of cases and 32% came from Latin America (the Caribbean, Central America, and South America). Eighty percent of the senders were male. Most of the questions concerned HIV infection (79%) with many fewer on hepatitis (17%). The highest numbers of questions were submitted just after the weekend (37% of questions were made on Mondays and Tuesdays). Risk factors for contracting HIV infection were the most frequent concern (69%), followed by the window period for detection (12.6%), laboratory results (5.9%), symptoms (4.7%), diagnosis (2.7%), and treatment (2.2%).
Conclusions
Our results confirm a great demand for this type of “ask-the-expert” Internet service, at least for AIDS and hepatitis. Factors such as anonymity, free access, and immediate answers have been key factors in its success.
Editors' Summary
Background.
Although substantial progress has been made in the fight against HIV/AIDS, in terms of developing new treatments and understanding factors that cause the disease to worsen, putting this knowledge into practice can be difficult. Two main barriers exist that can prevent individuals seeking information or treatment. The first is the considerable social stigma still associated with HIV; the second is the poverty of the developing countries, such as those in Latin America, where the disease has reached pandemic proportions. In addition, the disease, which used to be spread mainly through the sharing of injecting drug needles or through sex between men, has now entered the general population. When healthcare services are limited, people are often unable to seek information about HIV, and even when services do exist, the cost of accessing them can be too high. The same is true for other diseases such as hepatitis infection, which often co-exists with HIV. The Internet has the potential to go some way to filling this health information gap. Moreover, many patients seek information on the Internet before consulting their doctor.
Why Was This Study Done?
In 2003, the Madrid-based newspaper El Mundo launched an HIV and hepatitis information resource situated in the health section of its existing Web site. One aspect of this resource was an “ask-the-expert” section, in which readers could anonymously e-mail questions about HIV and hepatitis that would be answered by an infectious disease expert. These ranged from how the diseases can be transmitted and who is most at risk, to what to do if an individual thinks they might have the disease. There seems to be a clear need for this Spanish-language service; in Latin America, 2.1 million people are infected with HIV, with 230,000 new cases in 2005. In the Caribbean, AIDS is the leading cause of death in people aged 15–44 years. In Spain, 71,000 people were infected with HIV in 2005. Although the Internet contains a vast store of health information, and many aspects of patient–doctor interactions have been made electronic, little is known about what format is ideal. The researchers, who included employees of the newspaper, decided to investigate the effectiveness of the question–answer format used by El Mundo.
What Did the Researchers Do and Find?
In the first 12 months after the service was launched, the researchers recorded several details: what day of the week questions were sent, what the questions were about, and whether they were sent by the person needing the information or by a family member or friend. They also noted demographic information, such as the age, sex, and country of origin of the person e-mailing the question.
Of 899 questions sent to the Web site between December 2003 and November 2004, most (80%) were sent by males. Most questions came from Spain, followed by Latin America, and most questions were sent on Mondays and Tuesdays. Some e-mails were from people who felt they had been waiting too long for an answer to their first e-mail, despite the mean time for answering a question being less than seven days. Messages of support for the Web site rose during the year from 2% to 22%.
What Do These Findings Mean?
The messages of support and encouragement sent in by users indicated that the service was well-received and useful. Most of the questions were about HIV rather than about hepatitis, which the researchers say could represent the more prominent media coverage of HIV. However, despite the disease's high profile, the questions about HIV were very basic. It could also mean that people hold a false impression that hepatitis is a less serious illness or that they have more information about it than about HIV.
Since most questions were sent in at the start of the week, the researchers believe that many individuals wrote in after engaging in potentially risky sexual behaviour over the weekend.
The researchers also found that existing information on the Web site already answered many of the new questions, indicating that people prefer a question-and-answer model over ready-prepared information. The anonymity, free access, and immediacy of the Internet-based service suggest this could be a model for providing other types of health information.
The findings also suggest that such a service can highlight the needs and concerns of specific populations and can help health planners and policymakers respond to those needs in their countries.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0030256.
• The AIDSinfo Web site from the US Department of Health and Human Services provides information on all aspects of HIV/AIDS treatment and prevention and has sections specially written for patients and the general public
• AVERT, an international AIDS charity, has a section on HIV in Latin America that includes details of transmission, infection rates, and treatment
Marco and colleagues analyzed questions sent by the public to a Spanish language "ask-the-expert" Internet site, and found that 70% of queries were about risk factors for acquiring HIV.
doi:10.1371/journal.pmed.0030256
PMCID: PMC1483911  PMID: 16796404
13.  Classification and treatment of proximal humerus fractures: inter-observer reliability and agreement across imaging modalities and experience 
Summary
Proximal humerus fractures (PHF) are common injuries, but previous studies have documented poor inter-observer reliability in fracture classification. This disparity has been attributed to multiple variables including poor imaging studies and inadequate surgeon experience. The purpose of this study is to evaluate whether inter-observer agreement can be improved with the application of multiple imaging modalities including X-ray, CT, and 3D CT reconstructions, stratified by physician experience, for both classification and treatment of PHFs.
Methods
Inter-observer agreement was measured for classification and treatment of PHFs. A total of sixteen fractures were imaged by plain X-ray (scapular AP and lateral), CT scan, and 3D CT reconstruction, yielding 48 randomized image sets. The observers consisted of 16 orthopaedic surgeons (4 upper extremity specialists, 4 general orthopedists, 4 senior residents, 4 junior residents), who were asked to classify each image set using the Neer system, and recommend treatment from four pre-selected choices. The results were evaluated by kappa reliability coefficients for inter-observer agreement between all imaging modalities and sub-divided by: fracture type and observer experience.
Results
All kappa values ranged from "slight" to "moderate" (κ = 0.03 to 0.57) agreement. For overall classification and treatment, no advanced imaging modality had significantly higher scores than X-ray. However, when sub-divided by experience, 3D reconstruction and CT scan both had significantly higher agreement on classification than X-ray, among upper extremity specialists. Agreement on treatment among upper extremity specialists was best with CT scan. No other experience sub-division had significantly different kappa scores. When sub-divided by fracture type, CT scan and 3D reconstruction had higher scores than X-ray for classification only in 4-part fractures. Agreement on treatment of 4-part fractures was best with CT scan. No other fracture type sub-division had significantly different kappa scores.
Conclusions
Although 3D reconstruction showed a slight improvement in the inter-observer agreement for fracture classification among specialized upper extremity surgeons compared to all imaging modalities, fracture types, and surgeon experience, overall all imaging modalities continue to yield low inter-observer agreement for both classification and treatment regardless of physician experience.
doi:10.1186/1749-799X-6-38
PMCID: PMC3162565  PMID: 21801370
14.  Prediction accuracy of a sample-size estimation method for ROC studies 
Academic radiology  2010;17(5):628-638.
Rationale and Objectives
Sample-size estimation is an important consideration when planning a receiver operating characteristic (ROC) study. The aim of this work was to assess the prediction accuracy of a sample-size estimation method using the Monte Carlo simulation method.
Materials and Methods
Two ROC ratings simulators characterized by low reader and high case variabilities (LH) and high reader and low case variabilities (HL) were used to generate pilot data sets in 2 modalities. Dorfman-Berbaum-Metz multiple-reader multiple-case (DBM-MRMC) analysis of the ratings yielded estimates of the modality-reader, modality-case and error variances. These were input to the Hillis-Berbaum (HB) sample-size estimation method, which predicted the number of cases needed to achieve 80% power for 10 readers and an effect size of 0.06 in the pivotal study. Predictions that generalized to readers and cases (random-all), to cases only (random-cases) and to readers only (random-readers) were generated. A prediction-accuracy index defined as the probability that any single prediction yields true power in the range 75% to 90% was used to assess the HB method.
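The prediction-accuracy index defined above can be sketched in a few lines. This is an illustrative sketch only: it assumes that the true power achieved by each predicted sample size has already been estimated (for example by simulation), that the 75% to 90% range is inclusive at both ends, and the function name `prediction_accuracy_index` and the input values are hypothetical rather than taken from the study.

```python
def prediction_accuracy_index(true_powers, low=0.75, high=0.90):
    """Fraction of predictions whose true power lies in [low, high].

    `true_powers` holds one true-power value (as a proportion) per
    sample-size prediction; inclusive bounds are an assumption.
    """
    if not true_powers:
        raise ValueError("at least one prediction is required")
    hits = sum(1 for p in true_powers if low <= p <= high)
    return hits / len(true_powers)


# Hypothetical true-power values for four pilot-study predictions.
example_powers = [0.70, 0.80, 0.85, 0.95]
index = prediction_accuracy_index(example_powers)
```

A prediction that lands below the range underestimated the number of cases needed, and one that lands above it overestimated that number, so the index penalizes both kinds of error.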
Results
For random-case generalization the HB-method prediction-accuracy was reasonable, ~ 50% for 5 readers in the pilot study. Prediction-accuracy was generally higher under low reader variability conditions (LH) than under high reader variability conditions (HL). Under ideal conditions (many readers in the pilot study) the DBM-MRMC based HB method overestimated the number of cases. The overestimates could be explained by the observed large variability of the DBM-MRMC modality-reader variance estimates, particularly when reader variability was large (HL). The largest benefit of increasing the number of readers in the pilot study was realized for LH, where 15 readers were enough to yield prediction accuracy > 50% under all generalization conditions, but the benefit was lesser for HL where prediction accuracy was ~ 36% for 15 readers under random-all and random-reader conditions.
Conclusion
The HB method tends to overestimate the number of cases. Random-case generalization had reasonable prediction accuracy. Provided about 15 readers were used in the pilot study the method performed reasonably under all conditions for LH. When reader variability was large, the prediction-accuracy for random-all and random-reader generalizations was compromised. Study designers may wish to compare the HB predictions to those of other methods and to sample-sizes used in previous similar studies.
doi:10.1016/j.acra.2010.01.007
PMCID: PMC2867097  PMID: 20380980
ROC; sample-size; methodology assessment; statistical power; DBM; MRMC; simulation; Monte Carlo
15.  Evaluation of low‐cost computer monitors for the detection of cervical spine injuries in the emergency room: an observer confidence‐based study 
Emergency Medicine Journal : EMJ  2006;23(11):850-853.
Background
To compare the diagnostic value of low‐cost computer monitors and a Picture Archiving and Communication System (PACS) workstation for the evaluation of cervical spine fractures in the emergency room.
Methods
Two groups of readers blinded to the diagnoses (2 radiologists and 3 orthopaedic surgeons) independently assessed digital radiographs of the cervical spine (anterior–posterior, oblique and trans‐oral‐dens views). The radiographs of 57 patients who arrived consecutively to the emergency room in 2004 with clinical suspicion of a cervical spine injury were evaluated. The diagnostic values of these radiographs were scored on a 3‐point scale (1 = diagnosis not possible/bad image quality, 2 = diagnosis uncertain, 3 = clear diagnosis of fracture or no fracture) on a PACS workstation and on two different liquid crystal display (LCD) personal computer monitors. The images were randomised to avoid memory effects. We used logistic mixed‐effects models to determine the possible effects of monitor type on the evaluation of x ray images. To determine the overall effects of monitor type, this variable was used as a fixed effect, and the image number and reader group (radiologist or orthopaedic surgeon) were used as random effects on display quality. Group‐specific effects were examined, with the reader group and additional fixed effects as terms. A significance level of 0.05 was established for assessing the contribution of each fixed effect to the model.
Results
Overall, the diagnostic score did not differ significantly between standard personal computer monitors and the PACS workstation (both p values were 0.78).
Conclusion
Low‐cost LCD personal computer monitors may be useful in establishing a diagnosis of cervical spine fractures in the emergency room.
doi:10.1136/emj.2006.036822
PMCID: PMC2464403  PMID: 17057136
16.  Trends in the Surgical Treatment of Pathologic Proximal Femur Fractures Among Musculoskeletal Tumor Society Members 
Background
Several strategies for the treatment of pathologic proximal femur fractures are practiced but treatment outcomes have not been rigorously compared.
Questions/purposes
Major variations in the use of intramedullary fixation, extramedullary/plate-screw fixation, and endoprosthetic reconstruction techniques for pathologic proximal femur fractures in patients with skeletal metastases are reported. The clinical and surgical variables that influence this choice differ among treating surgeons. To characterize the technique preferences and to identify areas of consensus regarding specific clinical presentations, we administered an online survey to the Musculoskeletal Tumor Society (MSTS) membership. We also tested whether responses correlated with the respondents’ years in practice and asked about the indications for wide tumor resection and the role of tumor debulking and adjuvant cementation.
Methods
A 10-minute, web-based survey was sent via email to 244 MSTS members. The survey queried participants’ musculoskeletal oncology training and experience and presented case scenarios illustrating different combinations of four variables that influence decision-making: cancer type, estimated patient survival, fracture displacement, and anatomic region of involvement.
Results
Forty-one percent (n = 98) of MSTS members completed the survey. Intramedullary nail fixation (IMN; 45%) and proximal femur resection and reconstruction (34%) were the most commonly recommended techniques followed by long-stem cemented hemiarthroplasty/cemented hemiarthroplasty (15%) and open reduction and internal fixation (7%). Most respondents (56%) recommended use of cementation with IMN. Differences of opinion on recommended treatment were associated with variations in cancer type, fracture displacement, and anatomic region of involvement.
Conclusions
Our online survey showed a trend among MSTS members for selecting IMN and arthroplasty-related techniques to treat pathologic fractures of the proximal femur, but major differences in preferred operative technique exist. Prospective studies are needed to develop consistent, evidence-based treatment recommendations.
Electronic supplementary material
The online version of this article (doi:10.1007/s11999-012-2724-6) contains supplementary material, which is available to authorized users.
doi:10.1007/s11999-012-2724-6
PMCID: PMC3706680  PMID: 23247815
17.  Episodic memory retrieval for story characters in high-functioning autism 
Molecular Autism  2013;4:20.
Background
The objective of this study was to examine differences in episodic memory retrieval between individuals with autism spectrum disorder (ASD) and typically developing (TD) individuals. Previous studies have shown that personality similarities between readers and characters facilitated reading comprehension. Highly extraverted participants read stories featuring extraverted protagonists more easily and judged the outcomes of such stories more rapidly than did less extraverted participants. Similarly, highly neurotic participants judged the outcomes of stories with neurotic protagonists more rapidly than did participants with low levels of neuroticism. However, the impact of the similarity effect on memory retrieval remains unclear. This study tested our ‘similarity hypothesis’, namely that memory retrieval is enhanced when readers with ASD and TD readers read stories featuring protagonists with ASD and with characteristics associated with TD individuals, respectively.
Methods
Eighteen Japanese individuals (one female) with high-functioning ASD (aged 17 to 40 years) and 17 age- and intelligence quotient (IQ)-matched Japanese (one female) TD participants (aged 22 to 40 years) read 24 stories; 12 stories featured protagonists with ASD characteristics, and the other 12 featured TD protagonists. Participants read a single sentence at a time and pressed a spacebar to advance to the next sentence. After reading all 24 stories, they were asked to complete a recognition task about the target sentence in each story.
Results
To investigate episodic memory in ASD, we analyzed encoding based on the reading times for and readability of the stories and retrieval processes based on the accuracy of and response times for sentence recognition. Although the results showed no differences between ASD and TD groups in encoding processes, they did reveal inter-group differences in memory retrieval. Although individuals with ASD demonstrated the same level of accuracy as did TD individuals, their patterns of memory retrieval differed with respect to response times.
Conclusions
Individuals with ASD more effectively retrieved ASD-congruent than ASD-incongruent sentences, and TD individuals retrieved stories with TD more effectively than stories with ASD protagonists. Thus, similarity between reader and story character had different effects on memory retrieval in the ASD and TD groups.
doi:10.1186/2040-2392-4-20
PMCID: PMC3695882  PMID: 23800273
High-functioning autism; Narrative comprehension; Recognition; Memory retrieval; Similarity
18.  AO spine injury classification system: a revision proposal for the thoracic and lumbar spine 
European Spine Journal  2013;22(10):2184-2201.
Purpose
The AO Spine Classification Group was established to propose a revised AO spine injury classification system. This paper provides details on the rationale, methodology, and results of the initial stage of the revision process for injuries of the thoracic and lumbar (TL) spine.
Methods
In a structured, iterative process involving five experienced spine trauma surgeons from various parts of the world, consecutive cases with TL injuries were classified independently by members of the classification group, and analyzed for classification reliability using the Kappa coefficient (κ) and for accuracy using latent class analysis. The reasons for disagreements were examined systematically during review meetings. In four successive sessions, the system was revised until consensus and sufficient reproducibility were achieved.
Results
The TL spine injury system is based on three main injury categories adapted from the original Magerl AO concept: A (compression), B (tension band), and C (displacement) type injuries. Type-A injuries include four subtypes (wedge-impaction/split-pincer/incomplete burst/complete burst); B-type injuries are divided between purely osseous and osseo-ligamentous disruptions; and C-type injuries are further categorized into three subtypes (hyperextension/translation/separation). There is no subgroup division. The reliability of injury types (A, B, C) was good (κ = 0.77). The surgeons’ pairwise Kappa ranged from 0.69 to 0.90. Kappa coefficients κ for reliability of injury subtypes ranged from 0.26 to 0.78.
Conclusions
The proposed TL spine injury system is based on clinically relevant parameters. Final evaluation data showed reasonable reliability and accuracy. Further validation of the proposed revised AO Classification requires follow-up evaluation sessions and documentation by more surgeons from different countries and backgrounds and is subject to modification based on clinical parameters during subsequent phases.
doi:10.1007/s00586-013-2738-0
PMCID: PMC3804719  PMID: 23508335
Spinal injury classification; Thoracolumbar; Consensus development; Reliability; Accuracy
19.  Online health information – what the newspapers tell their readers: a systematic content analysis 
BMC Public Health  2014;14:1316.
Background
This study investigated the nature of newspaper reporting about online health information in the UK and US. Internet users frequently search for health information online, although the accuracy of the information retrieved varies greatly and can be misleading. Newspapers have the potential to influence public health behaviours, but information has been lacking in relation to how newspapers portray online health information to their readers.
Methods
The newspaper database Nexis®UK was searched for articles published from 2003 to 2012 relating to online health information. Systematic content analysis of articles published in the highest circulation newspapers in the UK and US was performed. A second researcher coded a 10% sample to establish inter-rater reliability of coding.
Results
In total, 161 newspaper articles were included in the analysis. Publication was most frequent in 2003, 2008 and 2009, which coincided with global threats to public health. UK broadsheet newspapers were significantly more likely to cover online health information than UK tabloid newspapers (p = 0.04) and only one article was identified in US tabloid newspapers. Articles most frequently appeared in health sections. Among the 79 articles that linked online health information to specific diseases or health topics, diabetes was the most frequently mentioned disease, cancer the commonest group of diseases and sexual health the most frequent health topic. Articles portrayed benefits of obtaining online health information more frequently than risks. Quotations from health professionals portrayed mixed opinions regarding public access to online health information. 108 (67.1%) articles directed readers to specific health-related web sites. 135 (83.9%) articles were rated as having balanced judgement and 76 (47.2%) were judged as having excellent quality reporting. No difference was found in the quality of reporting between UK and US articles.
Conclusions
Newspaper coverage of online health information was low during the 10-year period 2003 to 2012. Journalists tended to emphasise the benefits and understate the risks of online health information and the quality of reporting varied considerably. Newspapers directed readers to sources of online health information during global epidemics although, as most articles appeared in the health sections of broadsheet newspapers, coverage was limited to a relatively small readership.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2458-14-1316) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2458-14-1316
PMCID: PMC4326503  PMID: 25532562
Newspapers; Newspaper article; Internet; Health information; Online health information
20.  Satisfaction of Search in Multi-trauma Patients: Severity of Detected Fractures 
Academic radiology  2007;14(6):711-722.
Purpose
Satisfaction of search (SOS) occurs when an abnormality is missed because another abnormality has been detected. This research studied whether the severity of a detected fracture determines whether subsequent fractures are overlooked.
Materials and Methods
Each of seventy simulated multi-trauma patients presented examinations of three anatomic areas. Readers evaluated each patient under two experimental conditions: when the images of the first anatomic area included a fracture (the SOS condition), and when it did not (the control condition). The SOS effect was measured on detection accuracy for subtle test fractures presented on examinations of the second and third anatomic areas. In an experiment with twelve radiology readers, the initial SOS radiographs showed non-displaced fractures of extremities, fractures associated with low morbidity. In another experiment with twelve different radiology readers, the initial examination, usually a CT, showed cervical and pelvic fractures of the type associated with high morbidity. Because of their more direct role in patient care, the experiment using high morbidity SOS fractures was repeated with seventeen orthopedic readers.
Results
Detection of subtle test fractures was substantially reduced when fractures of low morbidity were added (p<0.01). No similar SOS effect was observed in either experiment in which added fractures were associated with high morbidity.
Conclusion
The satisfaction of search effect in skeletal radiology was replicated, essentially doubling the evidence for SOS in musculoskeletal radiology, and providing an essential contrast to the absence of SOS from high morbidity fractures.
doi:10.1016/j.acra.2007.02.016
PMCID: PMC1978092  PMID: 17502261
21.  Quantification of Lower Leg Arterial Calcifications by High-Resolution Peripheral Quantitative Computed Tomography 
Bone  2013;58:42-47.
Vascular calcifications and bone health seem to be etiologically linked via common risk factors such as aging and subclinical chronic inflammation. Epidemiologic studies have shown significant associations between low bone mineral density (BMD), fragility fractures and calcifications of the coronary arteries and the abdominal aorta. In the last decade, high-resolution peripheral quantitative computed tomography (HR-pQCT) has emerged as an in vivo research tool for the assessment of peripheral bone geometry, density, and microarchitecture. Although vascular calcifications are frequently observed as incidental findings in HR-pQCT scans, they have not yet been incorporated into quantitative HR-pQCT analyses. We developed a semi-automated algorithm to quantify lower leg arterial calcifications (LLAC) captured by HR-pQCT. The objective of our study was to determine the validity and reliability of the LLAC measure.
HR-pQCT scans were downscaled to a voxel size of 250 µm. After subtraction of bone volumes from the scans, LLAC were detected and contoured by a semi-automated, dual-threshold seed-point segmentation. LLAC mass (in mg hydroxyapatite; HA) was calculated as the product of voxel-based calcification volume (mm³) and mean calcification density (mgHA/cm³), divided by 1000. To determine validity, we compared LLAC to coronary artery calcifications (CAC), as quantified by multi-detector computed tomography (MDCT) and Agatston scoring in forty-six patients on chronic hemodialysis. Moreover, we investigated associations of LLAC with age, time on dialysis, type-2 diabetes mellitus, history of stroke, and myocardial infarction. In a second step, we determined intra- and inter-reader reliability of the LLAC measure.
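The mass calculation described above is a unit conversion: volume in cubic millimetres times density in mgHA per cubic centimetre, divided by 1000 because 1 cm³ equals 1000 mm³. A minimal sketch follows. The function names, the assumption of isotropic (cubic) voxels, and the example inputs are illustrative and are not taken from the study.

```python
def volume_from_voxels(n_voxels: int, voxel_size_um: float = 250.0) -> float:
    """Calcification volume in mm^3 from a voxel count.

    Assumes cubic voxels with the given edge length in micrometres
    (the abstract reports a 250 um voxel size; isotropy is our assumption).
    """
    edge_mm = voxel_size_um / 1000.0
    return n_voxels * edge_mm ** 3


def llac_mass_mg_ha(volume_mm3: float, mean_density_mg_ha_per_cm3: float) -> float:
    """LLAC mass in mg hydroxyapatite.

    Dividing by 1000 converts the density from mgHA/cm^3 to mgHA/mm^3,
    since 1 cm^3 = 1000 mm^3.
    """
    return volume_mm3 * mean_density_mg_ha_per_cm3 / 1000.0


# Hypothetical example values, not measurements from the study.
example_volume = volume_from_voxels(640)              # 640 voxels of 250 um
example_mass = llac_mass_mg_ha(example_volume, 500.0)  # density of 500 mgHA/cm^3
print(example_volume, example_mass)
```

With these hypothetical inputs, 640 voxels correspond to 10 mm³, and a mean density of 500 mgHA/cm³ gives a mass of 5 mgHA.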
In the validity study, LLAC were present (>0 mgHA) in 76% of patients, and 78% of patients had CAC (>0 mgHA). Median LLAC was 6.65 (0.08 – 24.40) mgHA and median CAC as expressed by Agatston score was 266.3 (15.88 – 1877.28). We found a significant positive correlation between LLAC and CAC (rho=0.6; p<0.01). Dialysis patients with type-2 diabetes mellitus (DM; 35%) and history of stroke (13%) had higher median LLAC than patients without those conditions (DM 20.0 fold greater, p=0.006; stroke 5.1 fold greater, p=0.047). LLAC was positively correlated with time on dialysis (rho=0.337, p=0.029), and there was a trend towards a positive association of LLAC and age (rho=0.289, p=0.053). The reliability study yielded excellent intra- and inter-reader agreement of the LLAC measure (intra-reader ICC=0.999, 95% CI=0.998–1.000; inter-reader ICC=0.998, 95% CI=0.994–0.999).
Our study indicates that the LLAC measure has good validity and excellent reliability. The use of HR-pQCT for the simultaneous evaluation of arterial calcifications, peripheral bone geometry, bone density, and bone microarchitecture should facilitate future research on osteo-vascular interactions and potential associations with cardiovascular events.
doi:10.1016/j.bone.2013.08.006
PMCID: PMC4042679  PMID: 23954758
HR-pQCT; Lower Leg Arterial Calcifications; Quantification; Agatston-Score
22.  A feasibility trial of computer-aided diagnosis for enteric lesions in capsule endoscopy 
AIM: To evaluate the feasibility of computer-aided screening diagnosis of enteric lesions in capsule endoscopy (CE).
METHODS: After developing a series of algorithms for the screening diagnosis of the enteric lesions in CE based on their characteristic colors and contours, the normal and abnormal images obtained from 289 patients were respectively scanned and diagnosed by the CE readers and by the computer-aided screening for the enteric lesions with the image-processed software (IPS). The enteric lesions shown by the images included esoenteritis, mucosal ulcer and erosion, bleeding, space-occupying lesions, angioectasia, diverticula, parasites, etc. The images for the lesions or the suspected lesions confirmed by the CE readers and the computers were collected, and the effectiveness rate of the screening and the number of the scanned images were evaluated, respectively.
RESULTS: Compared with the diagnostic results obtained by the CE readers, the total effectiveness rate (sensitivity) of IPS in screening for commonly encountered enteric lesions varied from 42.9% to 91.2%, with a median of 74.2%, though the specificity and accuracy rates were still low, and images of rarely encountered lesions were difficult to differentiate from normal images. However, the number of images retained after IPS screening was 5000 on average, only 10%-15% of the original images. As a result, a large number of normal images were excluded, and the reading time decreased from 5 h to 1 h on average.
CONCLUSION: Though the total accuracy and specificity rates of computer-aided screening for enteric lesions with IPS are much lower than those of the CE readers, computer-aided screening can exclude a large number of normal images and confine the enteric lesions to 5000 images on average, which can reduce the readers' workload in scanning the images. This computer-aided screening technique can make a correct diagnosis as efficiently as possible in most patients.
doi:10.3748/wjg.14.6929
PMCID: PMC2773855  PMID: 19058327
Enteric lesions; Image processing; Capsule endoscopy; Diagnosis
23.  Accuracy of MR Elastography and Anatomic MR Imaging Features in the Diagnosis of Severe Hepatic Fibrosis and Cirrhosis 
Purpose
To compare the diagnostic accuracy of MR elastography and anatomic MR imaging features in the diagnosis of severe hepatic fibrosis and cirrhosis.
Materials and Methods
Three readers independently assessed presence of morphological changes associated with hepatic fibrosis in 72 patients with liver biopsy including: caudate to right lobe ratios, nodularity, portal venous hypertension (PVH) stigmata, posterior hepatic notch, expanded gallbladder fossa and right hepatic vein caliber. Three readers measured shear stiffness values using quantitative shear stiffness maps (elastograms). Sensitivity, specificity and diagnostic accuracy of stiffness values and each morphological feature were calculated. Inter-reader agreement was summarized using weighted kappa statistics. Intra-class correlation coefficient was used to assess inter-reader reproducibility of stiffness measurements. Binary logistic regression was used to assess inter-reader variability for dichotomized stiffness values and each morphological feature.
Results
Using 5.9 kPa as a cut-off for differentiating F3–F4 from F0–F2 stages, overall sensitivity, specificity and diagnostic accuracy for MR elastography were 85.4%, 88.4% and 87%, respectively. Overall inter-reader agreement for stiffness values was substantial, with no significant difference (p=0.74) in the frequency of differentiating F3–F4 from F0–F2 fibrosis. Only hepatic nodularity and PVH stigmata showed moderately high overall accuracy of 69.4% and 72.2%. Inter-reader agreement was substantial only for PVH stigmata, moderate for C/R m, deep notch and expanded gallbladder fossa. Only posterior hepatic notch (p=0.82) showed no significant difference in reader rating.
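Sensitivity, specificity and diagnostic accuracy such as those reported above are derived from a 2×2 table of test result against the biopsy reference. The sketch below shows the standard definitions. The abstract does not report the underlying counts, so the numbers in the example are hypothetical and do not reproduce the study's values; the function name is also illustrative.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard 2x2-table metrics for a dichotomized test.

    tp/fn: reference-positive cases (e.g. F3-F4) called positive/negative.
    tn/fp: reference-negative cases (e.g. F0-F2) called negative/positive.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts for illustration only (not data from the study).
metrics = diagnostic_metrics(tp=8, fn=2, tn=9, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

In the study, a stiffness value at or above the 5.9 kPa cut-off would count as a positive test; how ties at the cut-off were handled is not stated in the abstract.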
Conclusion
MR elastography is a non-invasive, accurate and reproducible technique compared with conventional features of detecting severe hepatic fibrosis.
doi:10.1002/jmri.23585
PMCID: PMC3495186  PMID: 22246952
liver; fibrosis; cirrhosis; magnetic resonance elastography; morphological features
24.  Effect of Improving the Quality of Radiographic Interpretation on the Ability to Predict Pulmonary Tuberculosis Relapse 
Academic radiology  2009;17(2):157-162.
Rationale and Objectives
Chest radiographic findings are important for diagnosis and management of tuberculosis. The reliability of these findings is therefore of interest. We sought to describe interobserver reliability of chest radiographic findings in pulmonary tuberculosis, and to understand how the reliability of these findings might affect the utility of radiographic findings in predicting tuberculosis relapse.
Materials and Methods
Three blinded readers independently reviewed chest radiographs from a randomly selected group of 10% of HIV-seronegative subjects participating in a tuberculosis treatment trial. The three readers then arrived at a fourth, consensus radiographic interpretation.
Results
A total of 241 films obtained from 99 patients were reviewed. Agreement among the independent readers was very good for the findings of bilateral disease (κ = 0.71–0.86 among readers) and cavitation (κ = 0.66–0.73). The original interpretation was reasonably sensitive and specific (compared to the consensus interpretation) for bilateral disease, but the sensitivity for cavity decreased from 81% for the 2-month film to 47% at end of treatment (P = 0.013). Substituting the consensus interpretation for the original interpretation increased the odds ratio for the association between cavitation on early chest radiograph and subsequent tuberculosis relapse from 4.97 to 8.97.
Conclusion
Radiographic findings were reasonably reliable between independent reviewers and the original interpretations. The original investigators, who knew the patient’s clinical course, were less likely to identify cavitation on the end of treatment chest radiograph. Improving the reliability of these findings could improve the utility of chest radiographs for predicting tuberculosis relapse.
doi:10.1016/j.acra.2009.08.013
PMCID: PMC3791332  PMID: 19910216
Tuberculosis; radiography; thoracic; reliability and validity
25.  Influence of Nodule Detection Software on Radiologists’ Confidence in Identifying Pulmonary Nodules With Computed Tomography 
Journal of thoracic imaging  2011;26(1):48-53.
Purpose
With advances in technology, detection of small pulmonary nodules is increasing. Nodule detection software (NDS) has been developed to assist radiologists with pulmonary nodule diagnosis. Although it may increase sensitivity for small nodules, often there is an accompanying increase in false-positive findings. We designed a study to examine the extent to which computed tomography (CT) NDS influences the confidence of radiologists in identifying small pulmonary nodules.
Materials and Methods
Eight radiologists (readers) with different levels of experience examined thoracic CT scans of 131 cases and identified all the clinically relevant pulmonary nodules. The reference standard was established by an expert, dedicated thoracic radiologist. For each nodule, the readers recorded nodule size, density, location, and confidence level. Two weeks (or more) later, the readers reinterpreted the same scans; however, this time they were provided marks, when present, as indicated by NDS and asked to reassess their level of confidence. The effect of NDS on changes in reader confidence was assessed using multivariable generalized linear regression models.
Results
A total of 327 unique nodules were identified. Declines in confidence were significantly (P<0.05) associated with the absence of an NDS mark and smaller nodules (odds ratio=71.0, 95% confidence interval=14.8–339.7). Among nodules with pre-NDS confidence less than 100%, increases in confidence were significantly (P<0.05) associated with the presence of an NDS mark (odds ratio=6.0, 95% confidence interval=2.7–13.6) and larger nodules. Secondary findings showed that NDS did not improve reader diagnostic accuracy.
Conclusion
Although in this study NDS does not seem to enhance reader accuracy, the confidence of the radiologists in identifying small pulmonary nodules with CT is greatly influenced by NDS.
doi:10.1097/RTI.0b013e3181d73a8f
PMCID: PMC3119348  PMID: 20498624
clinical decision making; computed tomography scan; diagnostic imaging; lung neoplasm; diagnostic errors
