
Related Articles

1.  Polysomnography in Patients With Obstructive Sleep Apnea 
Executive Summary
Objective
The objective of this health technology policy assessment was to evaluate the clinical utility and cost-effectiveness of sleep studies in Ontario.
Clinical Need: Target Population and Condition
Sleep disorders are common and obstructive sleep apnea (OSA) is the predominant type. Obstructive sleep apnea is the repetitive complete obstruction (apnea) or partial obstruction (hypopnea) of the collapsible part of the upper airway during sleep. The syndrome is associated with excessive daytime sleepiness or chronic fatigue. Several studies have shown that OSA is associated with hypertension, stroke, and other cardiovascular disorders; many researchers believe that these cardiovascular disorders are consequences of OSA. This has generated increasing interest in recent years in sleep studies.
The Technology Being Reviewed
There is no ‘gold standard’ for the diagnosis of OSA, which makes it difficult to calibrate any test for diagnosis. Traditionally, polysomnography (PSG) in an attended setting (sleep laboratory) has been used as a reference standard for the diagnosis of OSA. Polysomnography measures several sleep variables, one of which is the apnea-hypopnea index (AHI) or respiratory disturbance index (RDI). The AHI is defined as the sum of apneas and hypopneas per hour of sleep; apnea is defined as the absence of airflow for ≥ 10 seconds; and hypopnea is defined as reduction in respiratory effort with ≥ 4% oxygen desaturation. The RDI is defined as the sum of apneas, hypopneas, and abnormal respiratory events per hour of sleep. Often the two terms are used interchangeably. The AHI has been widely used to diagnose OSA, although with different cut-off levels, the basis for which is often unclear or arbitrarily determined. Generally, an AHI of more than five events per hour of sleep is considered abnormal and the patient is considered to have a sleep disorder. An abnormal AHI accompanied by excessive daytime sleepiness is the hallmark for OSA diagnosis. For patients diagnosed with OSA, continuous positive airway pressure (CPAP) therapy is the treatment of choice. Polysomnography may also be used for titrating CPAP to individual needs.
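The AHI definition above reduces to a simple rate. The following is a minimal sketch only: the function names are hypothetical and the event counts are invented for illustration. It computes the index and applies the "more than five events per hour of sleep" threshold mentioned in the text.

```python
def apnea_hypopnea_index(apneas: int, hypopneas: int, hours_of_sleep: float) -> float:
    """AHI = (apneas + hypopneas) / hours of sleep."""
    if hours_of_sleep <= 0:
        raise ValueError("hours_of_sleep must be positive")
    return (apneas + hypopneas) / hours_of_sleep


def is_abnormal(ahi: float, cutoff: float = 5.0) -> bool:
    """True when the AHI exceeds the cut-off (default: more than 5 events per hour)."""
    return ahi > cutoff


# Illustrative numbers only (not from the source):
# 18 apneas and 24 hypopneas recorded over 6 hours of sleep.
ahi = apnea_hypopnea_index(18, 24, 6.0)
print(ahi, is_abnormal(ahi))
```

Passing a different `cutoff` (for example 10 or 15) shows how the same recording can be classified differently depending on the threshold chosen.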
In January 2005, the College of Physicians and Surgeons of Ontario published the second edition of Independent Health Facilities: Clinical Practice Parameters and Facility Standards: Sleep Medicine, commonly known as “The Sleep Book.” The Sleep Book states that OSA is the most common primary respiratory sleep disorder and a full overnight sleep study is considered the current standard test for individuals in whom OSA is suspected (based on clinical signs and symptoms), particularly if CPAP or surgical therapy is being considered.
Polysomnography in a sleep laboratory is time-consuming and expensive. With the evolution of technology, portable devices have emerged that measure more or less the same sleep variables in sleep laboratories as in the home. Newer CPAP devices also have auto-titration features and can record sleep variables including AHI. These devices, if equally accurate, may reduce the dependency on sleep laboratories for the diagnosis of OSA and the titration of CPAP, and thus may be more cost-effective.
Difficulties arise, however, when trying to assess and compare the diagnostic efficacy of in-home PSG versus in-lab PSG. The AHI measured from portable devices in-home is the sum of apneas and hypopneas per hour of time in bed, rather than of sleep, and the absolute diagnostic efficacy of in-lab PSG is unknown. To compare in-home PSG with in-lab PSG, several researchers have used correlation coefficients or sensitivity and specificity, while others have used Bland-Altman plots or receiver operating characteristic (ROC) curves. All these approaches, however, have potential pitfalls. Correlation coefficients do not measure agreement; sensitivity and specificity are not helpful when the true disease status is unknown; and Bland-Altman plots measure agreement (but are helpful only when the range of clinical equivalence is known). Lastly, receiver operating characteristic curves are generated using logistic regression with the true disease status as the dependent variable and test values as the independent variable. Thus, each value of the test is used as a cut-point to measure sensitivity and specificity, which are then plotted on an x-y plane. The cut-point that maximizes both sensitivity and specificity is chosen as the cut-off level to discriminate between disease and no-disease states. In the absence of a gold standard to determine the true disease status, ROC curves are of minimal value.
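The cut-point search described above can be sketched empirically, without the logistic regression step. Everything in this sketch is an assumption made for illustration: the data are invented, the function names are hypothetical, and the Youden index (sensitivity + specificity - 1) stands in for "the cut-point that maximizes both sensitivity and specificity", since the source does not name a criterion. The sketch also requires a known true disease status, which is exactly what is missing for OSA.

```python
def sens_spec(values, diseased, cutoff):
    """Sensitivity and specificity when 'test positive' means value >= cutoff."""
    tp = sum(1 for v, d in zip(values, diseased) if d and v >= cutoff)
    fn = sum(1 for v, d in zip(values, diseased) if d and v < cutoff)
    tn = sum(1 for v, d in zip(values, diseased) if not d and v < cutoff)
    fp = sum(1 for v, d in zip(values, diseased) if not d and v >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(values, diseased):
    """Try every observed value as a cut-point; keep the one with the largest Youden index."""
    best = None
    for c in sorted(set(values)):
        se, sp = sens_spec(values, diseased, c)
        j = se + sp - 1  # Youden index (assumed criterion)
        if best is None or j > best[1]:
            best = (c, j, se, sp)
    return best


# Hypothetical AHI values and a hypothetical "true" disease status.
ahi_values = [2, 4, 6, 8, 12, 15, 20, 30]
truth = [False, False, False, True, False, True, True, True]

cutoff, youden, sensitivity, specificity = best_cutoff(ahi_values, truth)
print(cutoff, sensitivity, specificity)
```

If the `truth` labels are themselves derived from an imperfect reference test, the chosen cut-point only reflects agreement with that reference, which is the limitation the paragraph points out.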
At the request of the Ontario Health Technology Advisory Committee (OHTAC), MAS has thus reviewed the literature on PSG published over the last two years to examine new developments.
Methods
Review Strategy
There is a large body of literature on sleep studies and several reviews have been conducted. Two large cohort studies, the Sleep Heart Health Study and the Wisconsin Sleep Cohort Study, are the main sources of evidence in the sleep literature.
To examine new developments on PSG published in the past two years, MEDLINE, EMBASE, MEDLINE In-Process & Other Non-Indexed Citations, the Cochrane Database of Systematic Reviews and Cochrane CENTRAL, INAHTA, and websites of other health technology assessment agencies were searched. Any study that reported results of in-home or in-lab PSG was included. All articles that reported findings from the Sleep Heart Health Study and the Wisconsin Sleep Cohort Study were also reviewed.
Diffusion of Sleep Laboratories
To estimate the diffusion of sleep laboratories, a list of sleep laboratories licensed under the Independent Health Facility Act was obtained. The annual number of sleep studies per 100,000 individuals in Ontario from 2000 to 2004 was also estimated using administrative databases.
Summary of Findings
Literature Review
A total of 315 articles were identified that were published in the past two years; 227 were excluded after reviewing titles and abstracts. A total of 59 articles were identified that reported findings of the Sleep Heart Health Study and the Wisconsin Sleep Cohort Study.
Prevalence
Based on cross-sectional data from the Wisconsin Sleep Cohort Study of 602 men and women aged 30 to 60 years, it is estimated that the prevalence of sleep-disordered breathing is 9% in women and 24% in men, on the basis of more than five AHI events per hour of sleep. Among the women with sleep-disordered breathing, 22.6% had daytime sleepiness and among the men, 15.5% had daytime sleepiness. Based on this, the prevalence of OSA in the middle-aged adult population is estimated to be 2% in women and 4% in men.
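The 2% and 4% figures appear to follow from multiplying the prevalence of sleep-disordered breathing by the fraction of those individuals who report daytime sleepiness. A minimal sketch of that arithmetic, using only the percentages quoted above (the function name is hypothetical):

```python
def syndrome_prevalence(sdb_prevalence: float, sleepy_fraction: float) -> float:
    """Prevalence of sleep-disordered breathing times the fraction with daytime sleepiness."""
    return sdb_prevalence * sleepy_fraction


women = syndrome_prevalence(0.09, 0.226)  # about 0.020
men = syndrome_prevalence(0.24, 0.155)    # about 0.037

print(f"women: {women:.1%}, men: {men:.1%}")
```

Rounded to whole percentages, the products match the 2% and 4% estimates given in the text.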
Snoring is present in 94% of OSA patients, but not all snorers have OSA. Women report daytime sleepiness less often compared with their male counterparts (of similar age, body mass index [BMI], and AHI). Prevalence of OSA tends to be higher in older age groups compared with younger age groups.
Diagnostic Value of Polysomnography
It is believed that PSG in the sleep laboratory is more accurate than in-home PSG. In the absence of a gold standard, however, claims of accuracy cannot be substantiated. In general, there is poor correlation between PSG variables and clinical variables. A variety of cut-off points of AHI (> 5, > 10, and > 15) are arbitrarily used to diagnose and categorize severity of OSA, though the clinical importance of these cut-off points has not been determined.
Recently, a study of the use of a therapeutic trial of CPAP to diagnose OSA was reported. The authors studied habitual snorers with daytime sleepiness in the absence of other medical or psychiatric disorders. Using PSG as the reference standard, the authors calculated the sensitivity of this test to be 80% and its specificity to be 97%. Further, they concluded that PSG could be avoided in 46% of this population.
Obstructive Sleep Apnea and Obesity
Obstructive sleep apnea is strongly associated with obesity. Obese individuals (BMI >30 kg/m2) are at higher risk for OSA compared with non-obese individuals and up to 75% of OSA patients are obese. It is hypothesized that obese individuals have large deposits of fat in the neck that cause the upper airway to collapse in the supine position during sleep. The observations reported from several studies support the hypothesis that AHIs (or RDIs) are significantly reduced with weight loss in obese individuals.
Obstructive Sleep Apnea and Cardiovascular Diseases
Associations have been shown between OSA and comorbidities such as diabetes mellitus and hypertension, which are known risk factors for myocardial infarction and stroke. Patients with more severe forms of OSA (based on AHI) report poorer quality of life and increased health care utilization compared with patients with milder forms of OSA. From animal models, it is hypothesized that sleep fragmentation results in glucose intolerance and hypertension. There is, however, no evidence from prospective studies in humans to establish a causal link between OSA and hypertension or diabetes mellitus. It is also not clear that the associations between OSA and other diseases are independent of obesity; in most of these studies, patients with higher values of AHI had higher values of BMI compared with patients with lower AHI values.
A recent meta-analysis of bariatric surgery has shown that weight loss in obese individuals (mean BMI = 46.8 kg/m2; range = 32.30–68.80) significantly improved their health profile. Diabetes was resolved in 76.8% of patients, hypertension was resolved in 61.7% of patients, hyperlipidemia improved in 70% of patients, and OSA resolved in 85.7% of patients. This suggests that obesity leads to OSA, diabetes, and hypertension, rather than OSA independently causing diabetes and hypertension.
Health Technology Assessments, Guidelines, and Recommendations
In April 2005, the Centers for Medicare and Medicaid Services (CMS) in the United States published its decision and review regarding in-home and in-lab sleep studies for the diagnosis and treatment of OSA with CPAP. In order to cover CPAP, CMS requires that a diagnosis of OSA be established using PSG in a sleep laboratory. After reviewing the literature, CMS concluded that the evidence was not adequate to determine that unattended portable sleep study was reasonable and necessary in the diagnosis of OSA.
In May 2005, the Canadian Coordinating Office of Health Technology Assessment (CCOHTA) published a review of guidelines for referral of patients to sleep laboratories. The review included 37 guidelines and associated reviews that covered 18 applications of sleep laboratory studies. The CCOHTA reported that the level of evidence for many applications was of limited quality, that some cited studies were not relevant to the recommendations made, that many recommendations reflect consensus positions only, and that there was a need for more good quality studies of many sleep laboratory applications.
Diffusion
As of the time of writing, there are 97 licensed sleep laboratories in Ontario. In 2000, the number of sleep studies performed in Ontario was 376/100,000 people. There was a steady rise in sleep studies in the following years such that in 2004, 769 sleep studies per 100,000 people were performed, for a total of 96,134 sleep studies. Based on prevalence estimates of the Wisconsin Sleep Cohort Study, it was estimated that 927,105 people aged 30 to 60 years have sleep-disordered breathing. Thus, there may be a 10-fold rise in the rate of sleep tests in the next few years.
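The figures in this paragraph can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers quoted above. Reading the "10-fold" estimate as the ratio of people with sleep-disordered breathing to studies performed in 2004 is an interpretation, not something the source states.

```python
studies_2004 = 96_134      # total sleep studies in 2004
rate_2000 = 376            # studies per 100,000 people in 2000
rate_2004 = 769            # studies per 100,000 people in 2004
sdb_estimate = 927_105     # people aged 30 to 60 with sleep-disordered breathing

# Population implied by the 2004 rate and the 2004 total.
implied_population = studies_2004 / rate_2004 * 100_000

# Growth in the testing rate between 2000 and 2004.
rate_growth = rate_2004 / rate_2000

# Ratio of estimated affected people to studies performed in 2004.
potential_fold_rise = sdb_estimate / studies_2004

print(round(implied_population), round(rate_growth, 2), round(potential_fold_rise, 1))
```

The implied population is about 12.5 million, the testing rate roughly doubled over four years, and the ratio of about 9.6 is consistent with the "10-fold" figure.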
Economic Analysis
In 2004, approximately 96,000 sleep studies were conducted in Ontario at a total cost of ~$47 million (Cdn). Since obesity is associated with sleep-disordered breathing, MAS compared the costs of sleep studies to the cost of bariatric surgery. The cost of bariatric surgery is $17,350 per patient. In 2004, Ontario spent $4.7 million per year for 270 patients to undergo bariatric surgery in the province, and $8.2 million for 225 patients to seek out-of-country treatment. Using a Markov model, it was concluded that shifting costs from sleep studies to bariatric surgery would benefit more patients with OSA and may also prevent health consequences related to diabetes, hypertension, and hyperlipidemia. It is estimated that the annual cost of treating comorbid conditions in morbidly obese patients often exceeds $10,000 per patient. Thus, the downstream cost savings could be substantial.
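The unit costs behind these totals can be derived directly from the quoted figures. The sketch below is back-of-the-envelope arithmetic only. It does not reproduce the Markov model, whose structure and inputs are not given in this summary.

```python
sleep_study_total = 47_000_000   # approximate total cost of sleep studies in 2004 (Cdn $)
sleep_study_count = 96_000       # approximate number of sleep studies in 2004
surgery_unit_cost = 17_350       # quoted cost of bariatric surgery per patient

in_province_total, in_province_patients = 4_700_000, 270
out_of_country_total, out_of_country_patients = 8_200_000, 225

cost_per_sleep_study = sleep_study_total / sleep_study_count
in_province_per_patient = in_province_total / in_province_patients
out_of_country_per_patient = out_of_country_total / out_of_country_patients

# Number of surgeries the sleep-study budget would cover at the quoted unit cost.
surgeries_fundable = sleep_study_total // surgery_unit_cost

print(round(cost_per_sleep_study), round(in_province_per_patient),
      round(out_of_country_per_patient), surgeries_fundable)
```

The in-province spending works out to roughly $17,400 per patient, close to the quoted $17,350, while out-of-country treatment is about twice as expensive per patient.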
Considerations for Policy Development
Weight loss is associated with a decrease in OSA severity. Treating and preventing obesity would also substantially reduce the economic burden associated with diabetes, hypertension, hyperlipidemia, and OSA. Promotion of healthy weights may be achieved by a multisectorial approach as recommended by the Chief Medical Officer of Health for Ontario. Bariatric surgery has the potential to help morbidly obese individuals (BMI > 35 kg/m2 with an accompanying comorbid condition, or BMI > 40 kg/m2) lose weight. In January 2005, MAS completed an assessment of bariatric surgery, based on which OHTAC recommended an improvement in access to these surgeries for morbidly obese patients in Ontario.
Habitual snorers with excessive daytime sleepiness have a high pretest probability of having OSA. These patients could be offered a therapeutic trial of CPAP to diagnose OSA, rather than a PSG. A majority of these patients are also obese and may benefit from weight loss. Individualized weight loss programs should, therefore, be offered and patients who are morbidly obese should be offered bariatric surgery.
That said, and in view of the still evolving understanding of the causes, consequences and optimal treatment of OSA, further research is warranted to identify which patients should be screened for OSA.
PMCID: PMC3379160  PMID: 23074483
2.  A Systematic Review of Statistical Methods Used to Test for Reliability of Medical Instruments Measuring Continuous Variables 
Objective(s): Reliability measures precision or the extent to which test results can be replicated. This is the first systematic review to identify statistical methods used to measure reliability of equipment measuring continuous variables. This study also aims to highlight inappropriate statistical methods used in reliability analysis and their implications for medical practice.
Materials and Methods: In 2010, five electronic databases were searched between 2007 and 2009 to look for reliability studies. A total of 5,795 titles were initially identified. Only 282 titles were potentially related, and finally 42 fitted the inclusion criteria.
Results: The Intra-class Correlation Coefficient (ICC) is the most popular method, used in 25 (60%) studies, followed by comparison of means (8 studies, or 19%). Out of 25 studies using the ICC, only 7 (28%) reported the confidence intervals and types of ICC used. Most studies (71%) also tested the agreement of instruments.
Conclusion: This study finds that the Intra-class Correlation Coefficient is the most popular method used to assess the reliability of medical instruments measuring continuous outcomes. There are also inappropriate applications and interpretations of statistical methods in some studies. It is important for medical researchers to be aware of this issue, and be able to correctly perform analysis in reliability studies.
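Because the review notes that few studies reported which type of ICC they used, it may help to see one specific form written out. The sketch below computes the one-way random-effects, single-measurement ICC, often written ICC(1,1), from the one-way ANOVA mean squares. The choice of this form, the function name, and the example data are assumptions for illustration. The review does not prescribe a particular ICC.

```python
def icc_oneway(ratings):
    """ICC(1,1): one-way random-effects, single measurement.

    `ratings` is a list of subjects, each a list of k repeated measurements.
    """
    n = len(ratings)
    k = len(ratings[0])
    if any(len(row) != k for row in ratings):
        raise ValueError("every subject needs the same number of measurements")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]

    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, subject_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical data: four subjects, each measured twice with the same instrument.
example = [[10, 11], [20, 19], [30, 31], [40, 40]]
print(icc_oneway(example))
```

Other ICC forms (two-way models, average-measure versions) use different mean squares and can give different values for the same data, which is why reporting the type matters.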
PMCID: PMC3758037  PMID: 23997908
ICC; Intra-class correlation coefficient; Reliability; Statistical method; Validation study
3.  Lack of agreement between tonometric and gastric juice partial carbon dioxide tension 
Critical Care  2000;4(4):249-254.
Our goal was to compare measurement of tonometered saline and gastric juice partial carbon dioxide tension (PCO2). In this prospective observational study, 112 pairs of measurements were simultaneously obtained under various hemodynamic conditions, in 15 critical care patients. Linear regression analysis showed a significant correlation between the two methods of measuring PCO2 (r² = 0.43; P < 0.0001). However, gastric juice PCO2 was systematically higher (mean difference 51 mmHg). The 95% limits of agreement were 315 mmHg and the dispersion increased as the values of PCO2 increased. Tonometric and gastric juice PCO2 cannot be used interchangeably. Gastric juice PCO2 measurement should be interpreted with caution.
Introduction:
In recent years there has been growing interest in tonometric estimation of gastric intramucosal pH (pHi). More recently, attention has focused on the gradient between intraluminal and arterial PCO2. pHi appears to be a useful diagnostic and prognostic tool in critically ill patients, and may also be used as a therapeutic guide. However, intraluminal PCO2 is the parameter measured to calculate pHi, and it is assumed to be equivalent to the PCO2 of the upper layers of the gastric mucosa.
Direct measurement of PCO2 in gastric juice might offer advantages over tonometry. Tonometer costs could be saved, and equilibration time would no longer be necessary. Additionally, preanalytic factors that account for poor reproducibility, such as inadequate volume of saline in the tonometer, errors in the dwell time of the sample or in the technique used to aspirate saline, mixing of the sample with tonometer dead space and delay in analysis, could be prevented. Nevertheless, to our knowledge few experimental or clinical studies have examined PCO2 in gastric juice. Moreover, no comparison with simultaneous tonometric samples has been performed. Our goal was to compare simultaneous measurement of PCO2 in gastric juice and in saline samples from a tonometer. Data from the present study show that gastric juice PCO2 is systematically higher. Furthermore, differences widen at high PCO2 values, and data dispersion becomes even more striking. Therefore, tonometric PCO2 and gastric juice PCO2 are not interchangeable.
Patients and methods:
The present study was approved by the local ethics committee, and informed consent was obtained from the next of kin of each patient.
We studied 15 consecutive mechanically ventilated patients from a medical/surgical intensive care unit, in whom tonometric monitoring was indicated by attending physicians. All patients were receiving 50 mg intravenous ranitidine every 8 h. Gastric tonometers were filled with saline, which was extracted after 90 min of equilibration time. At the same time, gastric juice was anaerobically extracted from the aspiration port of the tonometer. The initial 20 ml was discarded. PCO2 in both samples was measured using a blood gas analyzer (AVL 945; AVL List GmbH, Graz, Austria). These measurements were taken at various time points in each patient, and under various haemodynamic and oxygen transport conditions. All measurements were performed with the patient fasted. Agreement between the two measurements was examined using the Bland-Altman technique.
We also performed an in vitro study to quantify the precision and bias for the AVL 945. For this purpose, a stable PCO2 in saline solution was achieved by bubbling 5% carbon dioxide calibration gas.
Results:
We performed 112 pairs of measurements in 15 patients. Table 1 shows clinical data and the first values of arterial, tonometered and gastric juice PCO2 for each patient. Regression analysis demonstrated a significant correlation between both methods of measuring PCO2 (r² = 0.43; gastric juice PCO2 = -28.79 + [2.55 × tonometric PCO2]; P < 0.0001; Fig. 1). However, the bias calculated as the mean difference of gastric juice and tonometric PCO2 was 51 mmHg. The 95% limits of agreement were 315 mmHg (Fig. 2). For mean PCO2 values less than 100 mmHg, the bias and the 95% limits of agreement were 19 and 102 mmHg, respectively. As mean PCO2 increased, the scattering of differences widened (r² = 0.71; P < 0.0001).
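The fitted regression line can be evaluated to see why a slope above 1 implies a gap that grows with PCO2. This is a minimal sketch: the function name is hypothetical and the tonometric values plugged in are illustrative, not study data. With r² = 0.43 the line explains less than half of the variance, so individual pairs may deviate substantially from these predictions.

```python
def predicted_gastric_pco2(tonometric_pco2: float) -> float:
    """Fitted line from the Results: gastric = -28.79 + 2.55 * tonometric (mmHg)."""
    return -28.79 + 2.55 * tonometric_pco2


# Tonometric value at which the fitted line predicts equal readings:
# -28.79 + 2.55 * t = t  =>  t = 28.79 / 1.55
crossover = 28.79 / (2.55 - 1)

for t in (40, 60, 80):  # illustrative tonometric values in mmHg
    g = predicted_gastric_pco2(t)
    print(t, round(g, 2), round(g - t, 2))
```

Above the crossover (about 18.6 mmHg), the predicted difference increases by 1.55 mmHg for every 1 mmHg rise in tonometric PCO2.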
In an effort to prevent the bias related to multiple measurements per patient, we performed Bland-Altman analysis with the first measurement of each patient. After this the results remained similar (bias 55 mmHg, 95% limits of agreement 216 mmHg).
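The bias and limits of agreement reported here come from a Bland-Altman analysis. A minimal sketch of the conventional calculation follows, assuming limits of bias ± 1.96 standard deviations of the paired differences. The source reports the limits as a single figure and does not spell out its formula, so this convention is an assumption. The function name and the paired values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired readings in mmHg (gastric juice vs. tonometric).
gastric = [62, 75, 90, 140, 210]
tonometric = [50, 55, 60, 70, 85]

bias, lower, upper = bland_altman(gastric, tonometric)
print(round(bias, 1), round(lower, 1), round(upper, 1))
```

Taking only the first measurement per patient, as the authors did, avoids treating repeated readings from one patient as independent pairs.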
The AVL 945 blood gas analyzer showed a negative bias of 0.97 mmHg and a precision of 2.13 mmHg. This bias was considered negligible, so no further correction was made to saline tonometric values.
Discussion:
The results of the present study show that tonometric PCO2 and gastric juice PCO2 are not interchangeable. Gastric juice PCO2 is systematically higher. At high PCO2 values the differences widen, and data dispersion becomes even more marked.
There is no clear cause for these observations. A possible explanation might be that tonometric PCO2 is generated over a time interval, whereas gastric juice PCO2 might reflect rapid changes in mucosal metabolism. Different equilibration times could also account for data dispersion, but not for the positive bias for gastric juice, because rapid changes should occur in both directions.
Another potential confounding factor is the ability of blood gas analyzers to measure PCO2 in gastric juice. Measurement of PCO2 in 0.9% saline is an important source of error in the estimation of pHi. Variation in PCO2 values may occur with different PCO2 equilibration solutions. For example, bias is -66.5% when the Nova Stat Profile 7 blood gas analyzer (Nova Biomedical, Waltham, MA, USA) measures concentration of 1.95% of CO2 equilibrated in normal saline. However, bias changes to +45.4% when 1.95% CO2 is equilibrated in human albumin solution 4.5%.
It would not be surprising if gastric juice components such as proteins, mucopolysaccharides and others interfere with CO2 solubility and its subsequent measurement by blood gas analyzers. In this way, intersubject and intrasubject variation in gastric juice composition could also account for data dispersion. Fiddian-Green et al [1] measured PCO2 in gastric contents of anaesthetized dogs. They isolated the stomach from the oesophagus and the duodenum with ligatures, and washed it through a catheter with saline. Then, they instilled 250 ml 0.9% saline and took samples to measure PCO2 and to estimate pHi. Simultaneously, mucosal pH was recorded with a microglass probe. They found a statistically significant correlation between both methods. However, data dispersion in the graph was considerable.
We were able to exclude analyzer underestimation of PCO2 in saline as the cause for the present results. In vitro performance of the AVL 945 in blood was good. It showed a negative bias less than 1 mmHg and a precision of about 2 mmHg.
We cannot infer from the present data the technique that should be the gold standard for measuring PCO2 in gastric mucosa. However, the studies that have established the normal values for pHi, prognostic changes and its uses as a therapeutic index have been performed with tonometry. Hence, more data are needed for the routine measurement of PCO2 in gastric juice.
Figure 1. Correlation between gastric juice and tonometric PCO2. We performed 112 pairs of measurements of gastric juice and tonometric PCO2 in 15 critical care patients under different haemodynamic and oxygen transport conditions. The linear regression coefficient is significant. However, the slope value indicates systematic overestimation of gastric juice PCO2 in relation to saline PCO2.
Figure 2. Bland-Altman analysis of the differences between gastric juice and tonometric PCO2. The bias calculated as the mean difference of gastric juice and tonometric PCO2 was 51 mmHg. The 95% limits of agreement were 315 mmHg. The bias and the scattering of differences widened as PCO2 increased.
Table 1. Clinical characteristics and first value of arterial, tonometer and gastric juice PCO2
ARDS, acute respiratory distress syndrome.
PMCID: PMC29045  PMID: 11056754
gastric tonometry; intramucosal partial carbon dioxide tension; intramucosal pH
4.  A quantitative definition of scaphoid union: determining the inter-rater reliability of two techniques 
Background
Despite extensive literature supporting the use of computerized tomography (CT) scans in evaluating scaphoid fractures, there has not been a consensus on the methodology for defining and quantifying union. The purpose of this study was to test the inter-observer reliability of two methods of quantifying scaphoid union.
Methods
The CT scans of 50 non-operatively treated scaphoid fractures were reviewed by four blinded observers. Each was asked to classify union into one of three categories, united, partially united, or tenuously united, based on their general impression. Each reviewer then carefully analyzed each CT slice and quantified union based on two methods, the mean percentage union and the weighted mean percentage union. The estimated percentage of scaphoid union for each scan was recorded, and inter-observer reliability for both methods was assessed using a Bland-Altman plot to calculate the 95% limits of agreement. Kappa statistic was used to measure the degree of agreement for the categorical assessment of union.
Results
There was very little difference in the percentage of union calculated between the two methods (mean difference between the two methods was 1.2 ± 4.1%), with each reviewer demonstrating excellent agreement between the two methods based on the Bland-Altman plot. The kappa score indicated very good agreement (κ = 0.80) between the consultant hand surgeon and the musculoskeletal radiologist, and good agreement (κ = 0.62) between the consultant hand surgeon and the hand fellow for the categorical assessment of union.
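For the categorical assessment, the kappa statistic corrects observed agreement for the agreement expected by chance. The sketch below implements unweighted Cohen's kappa for two raters. The abstract says only "Kappa statistic", so the unweighted two-rater form is an assumption, and the function name and ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (observed - expected) / (1 - expected)."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must score the same non-empty set of cases")

    # Proportion of cases on which the two raters give the same category.
    observed = sum(1 for x, y in zip(rater_a, rater_b) if x == y) / n

    # Chance agreement from each rater's marginal category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    return (observed - expected) / (1 - expected)


# Hypothetical classifications of four scans by two observers.
surgeon = ["united", "united", "partial", "partial"]
radiologist = ["united", "united", "partial", "tenuous"]

print(cohen_kappa(surgeon, radiologist))
```

Because the three union categories are ordered, a weighted kappa that penalizes near-misses less than distant disagreements would be a reasonable alternative. Note also that the formula is undefined when chance agreement equals 1, for example if both raters use a single category throughout.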
Conclusions
This study describes two methods of quantifying and defining scaphoid union, both with high inter-rater reliability. This indicates that either method can be reliably used, making it an important tool both for clinical use and for research purposes in future studies of scaphoid fractures, particularly those that use union or time to union as their endpoint.
Level of evidence
Diagnostic, level III
doi:10.1186/1749-799X-8-28
PMCID: PMC3765287  PMID: 23961919
Inter-rater reliability; Partial union; Scaphoid; Union
5.  Percutaneous Vertebroplasty for Treatment of Painful Osteoporotic Vertebral Compression Fractures 
Executive Summary
Objective of Analysis
The objective of this analysis is to examine the safety and effectiveness of percutaneous vertebroplasty for treatment of osteoporotic vertebral compression fractures (VCFs) compared with conservative treatment.
Clinical Need and Target Population
Osteoporosis and associated fractures are important health issues in ageing populations. Vertebral compression fracture secondary to osteoporosis is a cause of morbidity in older adults. VCFs can affect both genders, but are more common among elderly females and can occur as a result of a fall or a minor trauma. The fracture may occur spontaneously during a simple activity such as picking up an object or rising up from a chair. Pain originating from the fracture site frequently increases with weight bearing. It is most severe during the first few weeks and decreases with rest and inactivity.
Traditional treatment of painful VCFs includes bed rest, analgesic use, back bracing and muscle relaxants. The comorbidities associated with VCFs include deep venous thrombosis, acceleration of osteopenia, loss of height, respiratory problems and emotional problems due to chronic pain.
Percutaneous vertebroplasty is a minimally invasive surgical procedure that has gained popularity as a new treatment option in the care for these patients. The technique of vertebroplasty was initially developed in France to treat osteolytic metastasis, myeloma, and hemangioma. The indications were further expanded to painful osteoporotic VCFs and subsequently to treatment of asymptomatic VCFs.
The mechanism of pain relief, which occurs within minutes to hours after vertebroplasty, is still not known. Pain pathways in the surrounding tissue appear to be altered in response to mechanical, chemical, vascular, and thermal stimuli after the injection of the cement. It has been suggested that mechanisms other than mechanical stabilization of the fracture, such as thermal injury to the nerve endings, result in immediate pain relief.
Percutaneous Vertebroplasty
Percutaneous vertebroplasty is performed with the patient in prone position and under local or general anesthesia. The procedure involves fluoroscopic imaging to guide the injection of bone cement into the fractured vertebral body to support the fractured bone. After injection of the cement, the patient is placed in supine position for about 1 hour while the cement hardens.
Cement leakage is the most frequent complication of vertebroplasty. The leakages may remain asymptomatic or cause symptoms of nerve irritation through compression of nerve roots. There are several reports of pulmonary cement embolism (PCE) following vertebroplasty. In some cases, the PCE may remain asymptomatic. Symptomatic PCE can be recognized by their clinical signs and symptoms such as chest pain, dyspnea, tachypnea, cyanosis, coughing, hemoptysis, dizziness, and sweating.
Research Methods
Literature Search
A literature search was performed on Feb 9, 2010 using OVID MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations, EMBASE, the Cumulative Index to Nursing & Allied Health Literature (CINAHL), the Cochrane Library, and the International Agency for Health Technology Assessment (INAHTA) for studies published from January 1, 2005 to February 9, 2010.
Studies were initially reviewed by titles and abstracts. For those studies meeting the eligibility criteria, full-text articles were obtained and reviewed. Reference lists were also examined for any additional relevant studies not identified through the search. Articles with an unknown eligibility were reviewed with a second clinical epidemiologist and then a group of epidemiologists until consensus was established. Data extraction was carried out by the author.
Inclusion Criteria
Study design: Randomized controlled trials (RCTs) comparing vertebroplasty with a control group or other interventions
Study population: Adult patients with osteoporotic vertebral fractures
Study sample size: Studies included 20 or more patients
English language full-reports
Published between Jan 1, 2005 and Feb 9, 2010
(eligible studies identified through the Auto Alert function of the search were also included)
Exclusion Criteria
Non-randomized studies
Studies on conditions other than VCF (e.g. patients with multiple myeloma or metastatic tumors)
Studies focused on surgical techniques
Studies lacking outcome measures
Results of Evidence-Based Analysis
A systematic search yielded 168 citations. The titles and the abstracts of the citations were reviewed and full text of the identified citations was retrieved for further consideration. Upon review of the full publications and applying the inclusion and exclusion criteria, 5 RCTs were identified. Of these, two compared vertebroplasty with sham procedure, two compared vertebroplasty with conservative treatment, and one compared vertebroplasty with balloon kyphoplasty.
Randomized Controlled Trials
Recently, the results of two blinded randomized placebo-controlled trials of percutaneous vertebroplasty were reported. These trials, providing the highest quality of evidence available to date, do not support the use of vertebroplasty in patients with painful osteoporotic vertebral compression fractures. Based on the results of these trials, vertebroplasty offers no additional benefit over usual care and is not risk free.
In these trials the treatment allocation was blinded to the patients and outcome assessors. The control group received a sham procedure simulating vertebroplasty to minimize the effect of expectations and to reduce the potential for bias in self-reporting of outcomes. Both trials applied stringent exclusion criteria so that the results are generalizable to the patient populations that are candidates for vertebroplasty. In both trials vertebroplasty procedures were performed by highly skilled interventionists. Multiple valid outcome measures including pain, physical, mental, and social function were employed to test the between group differences in outcomes.
Prior to these two trials, there were two open randomized trials in which vertebroplasty was compared with conservative medical treatment. In the first randomized trial, patients were allowed to cross over to the other arm, and the trial had to be stopped after two weeks because of the high number of patients crossing over. The other study did not allow crossover and recently published the results of 12 months of follow-up.
The following is the summary of the results of these 4 trials:
Two blinded RCTs on vertebroplasty provide the highest level of evidence available to date. Results of these two trials are supported by findings of an open randomized trial with 12 months follow-up. Blinded RCTs showed:
No significant differences in pain scores of patients who received vertebroplasty and patients who received a sham procedure as measured at 3 days, 2 weeks and 1 month in one study and at 1 week, 1 month, 3 months, and 6 months in the other.
The observed differences in pain scores between the two groups were neither statistically significant nor clinically important at any time points.
The above findings were consistent with the findings of an open RCT in which patients were followed for 12 months. This study showed that improvement in pain was similar between the two groups at 3 months and was sustained to 12 months.
In the blinded RCTs, physical, mental, and social functioning were measured at the above time points using 4-5 of the following 7 instruments: RDQ, EQ-5D, SF-36 PCS, SF-36 MCS, AQoL, QUALEFFO, SOF-ADL
There were no significant differences in any of these measures between patients who received vertebroplasty and patients who received a sham procedure at any of the above time points (with a few exceptions in favour of control intervention).
These findings were also consistent with the findings of an open RCT which demonstrated no significant between group differences in scores of EQ-5D, SF-36 PCS, SF-36 MCS, DPQ, Barthel, and MMSE, which measure physical, mental, and social functioning (with a few exceptions in favour of control intervention).
One small (n=34) open RCT with a two week follow-up detected a significantly higher improvement in pain scores at 1 day after the intervention in vertebroplasty group compared with conservative treatment group. However, at 2 weeks follow-up, this difference was smaller and was not statistically significant.
Conservative treatment was associated with fewer clinically important complications
Risk of new VCFs following vertebroplasty was higher than that with conservative treatment, but this finding requires further investigation.
PMCID: PMC3377535  PMID: 23074396
6.  Epidemiology and Reporting Characteristics of Systematic Reviews 
PLoS Medicine  2007;4(3):e78.
Background
Systematic reviews (SRs) have become increasingly popular with a wide range of stakeholders. We set out to capture a representative cross-sectional sample of published SRs and examine them in terms of a broad range of epidemiological, descriptive, and reporting characteristics, including emerging aspects not previously examined.
Methods and Findings
We searched Medline for SRs indexed during November 2004 and written in English. Citations were screened and those meeting our inclusion criteria were retained. Data were collected using a 51-item data collection form designed to assess the epidemiological and reporting details and the bias-related aspects of the reviews. The data were analyzed descriptively. In total 300 SRs were identified, suggesting a current annual publication rate of about 2,500, involving more than 33,700 separate studies including one-third of a million participants. The majority (272 [90.7%]) of SRs were reported in specialty journals. Most reviews (213 [71.0%]) were categorized as therapeutic, and included a median of 16 studies involving 1,112 participants. Funding sources were not reported in more than one-third (122 [40.7%]) of the reviews. Reviews typically searched a median of three electronic databases and two other sources, although only about two-thirds (208 [69.3%]) of them reported the years searched. Most (197/295 [66.8%]) reviews reported information about quality assessment, while few (68/294 [23.1%]) reported assessing for publication bias. A little over half (161/300 [53.7%]) of the SRs reported combining their results statistically, of which most (147/161 [91.3%]) assessed for consistency across studies. Few (53 [17.7%]) SRs reported being updates of previously completed reviews. No review had a registration number. Only half (150 [50.0%]) of the reviews used the term “systematic review” or “meta-analysis” in the title or abstract. There were large differences between Cochrane reviews and non-Cochrane reviews in the quality of reporting several characteristics.
Conclusions
SRs are now produced in large numbers, and our data suggest that the quality of their reporting is inconsistent. This situation might be improved if more widely agreed upon evidence-based reporting guidelines were endorsed and adhered to by authors and journals. These results substantiate the view that readers should not accept SRs uncritically.
Data were collected on the epidemiological, descriptive, and reporting characteristics of recent systematic reviews. A descriptive analysis found inconsistencies in the quality of reporting.
Editors' Summary
Background.
In health care it is important to assess all the evidence available about what causes a disease or the best way to prevent, diagnose, or treat it. Decisions should not be made simply on the basis of—for example—the latest or biggest research study, but after a full consideration of the findings from all the research of good quality that has so far been conducted on the issue in question. This approach is known as “evidence-based medicine” (EBM). A report that is based on a search for studies addressing a clearly defined question, a quality assessment of the studies found, and a synthesis of the research findings, is known as a systematic review (SR). Conducting an SR is itself regarded as a research project and the methods involved can be quite complex. In particular, as with other forms of research, it is important to do everything possible to reduce bias. The leading role in developing the SR concept and the methods that should be used has been played by an international network called the Cochrane Collaboration (see “Additional Information” below), which was launched in 1992. However, SRs are now becoming commonplace. Many articles published in journals and elsewhere are described as being systematic reviews.
Why Was This Study Done?
Since systematic reviews are claimed to be the best source of evidence, it is important that they should be well conducted and that bias should not have influenced the conclusions drawn in the review. Just because the authors of a paper that discusses evidence on a particular topic claim that they have done their review “systematically,” it does not guarantee that their methods have been sound and that their report is of good quality. However, if they have reported details of their methods, then it can help users of the review decide whether they are looking at a review with conclusions they can rely on. The authors of this PLoS Medicine article wanted to find out how many SRs are now being published, where they are being published, and what questions they are addressing. They also wanted to see how well the methods of SRs are being reported.
What Did the Researchers Do and Find?
They picked one month and looked for all the SRs added to the main list of medical literature in that month. They found 300, on a range of topics and in a variety of medical journals. They estimate that about 20% of reviews appearing each year are published by the Cochrane Collaboration. They found many cases in which important aspects of the methods used were not reported. For example, about a third of the SRs did not report how (if at all) the quality of the studies found in the search had been assessed. An important assessment, which analyzes for “publication bias,” was reported as having been done in only about a quarter of the cases. Most of the reporting failures were in the “non-Cochrane” reviews.
What Do These Findings Mean?
The authors concluded that the standards of reporting of SRs vary widely and that readers should, therefore, not accept the conclusions of SRs uncritically. To improve this situation, they urge that guidelines be drawn up regarding how SRs are reported. The writers of SRs and also the journals that publish them should follow these guidelines.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0040078.
An editorial discussing this research article and its relevance to medical publishing appears in the same issue of PLoS Medicine
A good source of information on the evidence-based approach to medicine is the James Lind Library
The Web site of the Cochrane Collaboration is a good source of information on systematic reviews. In particular there is a newcomers' guide and information for health care “consumers”. From this Web site, it is also possible to see summaries of the SRs published by the Cochrane Collaboration (readers in some countries can also view the complete SRs free of charge)
Information on the practice of evidence-based medicine is available from the US Agency for Healthcare Research and Quality and the Canadian Agency for Drugs and Technologies in Health
doi:10.1371/journal.pmed.0040078
PMCID: PMC1831728  PMID: 17388659
7.  A multivariate hierarchical Bayesian approach to measuring agreement in repeated measurement method comparison studies 
Background
Assessing agreement in method comparison studies depends on two fundamentally important components: validity (the between method agreement) and reproducibility (the within method agreement). The Bland-Altman limits of agreement technique is one of the favoured approaches in medical literature for assessing between method validity. However, few researchers have adopted this approach for the assessment of both validity and reproducibility. This may be partly due to the lack of flexible, easily implemented and readily available statistical machinery to analyse repeated measurement method comparison data.
Methods
Adopting the Bland-Altman framework, but using Bayesian methods, we present this statistical machinery. Two multivariate hierarchical Bayesian models are advocated, one which assumes that the underlying values for subjects remain static (exchangeable replicates) and one which assumes that the underlying values can change between repeated measurements (non-exchangeable replicates).
Results
We illustrate the salient advantages of these models using two separate datasets that have been previously analysed and presented: (i) assuming static underlying values, analysed using both multivariate hierarchical Bayesian models, and (ii) assuming each subject's underlying value is a continually changing quantity, analysed using the non-exchangeable replicate multivariate hierarchical Bayesian model.
Conclusion
These easily implemented models allow for full parameter uncertainty and simultaneous method comparison, handle unbalanced or missing data, and provide estimates and credible regions for all the parameters of interest. Computer code for the analyses is also presented, provided in the freely available and currently cost free software package WinBUGS.
doi:10.1186/1471-2288-9-6
PMCID: PMC2645135  PMID: 19161599
8.  Validation and comparison of EuroQoL-5 dimension (EQ-5D) and Short Form-6 dimension (SF-6D) among stable angina patients 
Objectives
Several preference-based health-related quality of life (HRQoL) instruments have been published and widely used in different populations. However, no consensus has emerged regarding the most appropriate instrument in the therapeutic area of stable angina. This study compared and validated the psychometric properties of two generic preference-based instruments, the EQ-5D and SF-6D, among Chinese stable angina patients.
Methods
Convergent validity of the EQ-5D and SF-6D was examined with eight a priori hypotheses from stable angina patients in conjunction with Seattle Angina Questionnaire (SAQ). Responsiveness was compared using the effect size (ES), relative efficiency (RE) and receiver operating characteristic (ROC) curves. Agreement between the EQ-5D and SF-6D was tested using intra-class correlation coefficient (ICC) and Bland-Altman plot. Factors affecting utility difference were explored with multiple linear regression analysis.
Results
In 411 patients (mean age 68.08 ± 11.35), mean utility scores (SD) were 0.78 (0.15) for the EQ-5D and 0.68 (0.12) for the SF-6D. Validity was demonstrated by the moderate to strong correlation coefficients (range: 0.368-0.594, P < 0.001) for five of the eight hypotheses in both the EQ-5D and SF-6D. There were no serious floor effects for the EQ-5D and SF-6D, but ceiling effects for the EQ-5D were large. The areas under the ROC curves all exceeded 0.5 (0.660-0.814, P < 0.001). The SF-6D showed a better discriminative capacity (ES: 0.573 to 1.179) between groups with different stable-angina-specific health status than the EQ-5D (ES: 0.426 to 1.126). RE suggested that the SF-6D (RE: 44.8 to 177.8%) was more efficient than the EQ-5D except for physical function. Poor agreement between them was observed with ICC (0.448, P < 0.001) and Bland-Altman plot analysis. Multiple linear regression showed that clinical variables significantly (P < 0.05) influenced differences in utility scores between the EQ-5D and SF-6D.
Conclusions
Both EQ-5D and SF-6D are valid and sensitive preference-based HRQoL instruments in Chinese stable angina patients. The SF-6D may be a more effective tool with lower ceiling effect and greater sensitivity. Further study is needed to compare other properties, such as reliability and longitudinal response.
doi:10.1186/s12955-014-0156-6
PMCID: PMC4213514  PMID: 25343944
Quality of life; Stable angina; EQ-5D; SF-6D; Utility; China
9.  Reliability and Validity of the Transport and Physical Activity Questionnaire (TPAQ) for Assessing Physical Activity Behaviour 
PLoS ONE  2014;9(9):e107039.
Background
No current validated survey instrument allows a comprehensive assessment of both physical activity and travel behaviours for use in interdisciplinary research on walking and cycling. This study reports on the test-retest reliability and validity of physical activity measures in the transport and physical activity questionnaire (TPAQ).
Methods
The TPAQ assesses time spent in different domains of physical activity and using different modes of transport for five journey purposes. Test-retest reliability of eight physical activity summary variables was assessed using intra-class correlation coefficients (ICC) and Kappa scores for continuous and categorical variables respectively. In a separate study, the validity of three survey-reported physical activity summary variables was assessed by computing Spearman correlation coefficients using accelerometer-derived reference measures. The Bland-Altman technique was used to determine the absolute validity of survey-reported time spent in moderate-to-vigorous physical activity (MVPA).
Results
In the reliability study, ICC for time spent in different domains of physical activity ranged from fair to substantial for walking for transport (ICC = 0.59), cycling for transport (ICC = 0.61), walking for recreation (ICC = 0.48), cycling for recreation (ICC = 0.35), moderate leisure-time physical activity (ICC = 0.47), vigorous leisure-time physical activity (ICC = 0.63), and total physical activity (ICC = 0.56). The proportion of participants estimated to meet physical activity guidelines showed acceptable reliability (k = 0.60). In the validity study, comparison of survey-reported and accelerometer-derived time spent in physical activity showed strong agreement for vigorous physical activity (r = 0.72, p<0.001), fair but non-significant agreement for moderate physical activity (r = 0.24, p = 0.09) and fair agreement for MVPA (r = 0.27, p = 0.05). Bland-Altman analysis showed a mean overestimation of MVPA of 87.6 min/week (p = 0.02) (95% limits of agreement −447.1 to +622.3 min/week).
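The Bland-Altman figures reported above follow the usual construction: the bias is the mean of the paired differences, and the 95% limits of agreement are the bias plus or minus 1.96 standard deviations of those differences. The sketch below is a minimal illustration under that assumption, not the authors' analysis code; the function names are hypothetical. The second helper only back-calculates the bias and standard deviation implied by a reported pair of limits.

```python
import statistics

def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements."""
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def implied_bias_and_sd(lower_limit, upper_limit):
    """Back-calculate bias and SD from reported 95% limits of agreement."""
    bias = (lower_limit + upper_limit) / 2
    sd = (upper_limit - lower_limit) / (2 * 1.96)
    return bias, sd

# Limits of agreement reported in the abstract (min/week of MVPA).
bias, sd = implied_bias_and_sd(-447.1, 622.3)
print(round(bias, 1), round(sd, 1))
```

Applied to the reported limits, the midpoint reproduces the stated mean overestimation of 87.6 min/week, and the implied standard deviation of the differences is roughly 273 min/week, which is why the limits are so wide relative to the bias.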
Conclusion
The TPAQ provides a more comprehensive assessment of physical activity and travel behaviours and may be suitable for wider use. Its physical activity summary measures have comparable reliability and validity to those of similar existing questionnaires.
doi:10.1371/journal.pone.0107039
PMCID: PMC4162566  PMID: 25215510
10.  The construct validity and responsiveness of the EQ-5D, SF-6D and Diabetes Health Profile-18 in type 2 diabetes 
Background
Interest in the measurement of health related quality of life and psychosocial functioning from the patient’s perspective in diabetes mellitus has grown in recent years. The aim of this study is to investigate the psychometric performance of and agreement between the generic EQ-5D and SF-6D and diabetes specific DHP-18 in Type 2 diabetes. This will support the future use of the measures by providing further evidence regarding their psychometric properties and the conceptual overlap between the instruments. The results will inform whether the measures can be used with confidence alongside each other to provide a more holistic profile of people with Type 2 diabetes.
Methods
A large longitudinal dataset (n = 1,184) of people with Type 2 diabetes was used for the analysis. Convergent validity was tested by examining correlations between the measures. Known group validity was tested across a range of clinical and diabetes severity indicators using ANOVA and effect size statistics. Agreement was examined using Bland-Altman plots. Responsiveness was tested by examining floor and ceiling effects and standardised response means.
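As a rough illustration of the responsiveness statistic named above, the standardised response mean (SRM) is commonly computed as the mean change in score divided by the standard deviation of the change. The sketch below assumes that definition and uses the sample standard deviation; the function name and any scores passed to it are hypothetical and do not come from the study.

```python
import statistics

def standardised_response_mean(baseline, follow_up):
    """SRM = mean(change) / SD(change) for paired scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    sd = statistics.stdev(changes)  # sample SD of the change scores
    if sd == 0:
        raise ValueError("SRM is undefined when every change is identical")
    return statistics.mean(changes) / sd
```

A larger absolute SRM indicates that the instrument registers more change relative to the variability of that change.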
Results
Correlations between the measures indicate that there is overlap in the constructs assessed (with correlations between 0.1 and 0.7 reported), but there is some level of divergence between the generic and condition specific instruments. Known group validity was generally good but was not consistent across all indicators included (with effect sizes from 0 to 0.74 reported). The EQ-5D and SF-6D displayed a high level of agreement, but there was some disagreement between the generic measures and the DHP-18 dimensions across the severity range. Responsiveness was higher in those who self-reported change in health (SRMs between 0.06 and 0.25).
Conclusions
The psychometric assessment of the relationship between the EQ-5D, SF-6D and DHP-18 shows that all have a level of validity for use in Type 2 diabetes. This suggests that the measures can be used alongside each other to provide a more holistic assessment of the quality of life impacts of Type 2 diabetes.
doi:10.1186/1477-7525-12-42
PMCID: PMC4304018  PMID: 24661350
EQ-5D; SF-6D; DHP; Psychometrics; Validity
11.  Interobserver reliability and intraobserver reproducibility of powers ratio for assessment of atlanto-occipital junction: comparison of plain radiography and computed tomography 
European Spine Journal  2009;18(4):577-582.
Powers ratio, as assessed on plain radiographs or computed tomography (CT) images, appears to have clinical and prognostic value. To date, the validation of this assessment tool has been limited to a small number of observers at a single site. No study has examined the intraobserver reproducibility and interobserver reliability of the Powers ratio measurement on plain radiographs or CT images among a large cohort of spine surgeons. This type of validation is critical to allow for the broader use of the Powers ratio methodology in research studies and clinical applications. Plain radiographs and spiral CT images of the cervical spine of 32 patients were assessed, and the Powers ratio was determined by five spine surgeons. Each surgeon performed three readings, 7 months apart. In the first round of measurements, the observers used only the Powers’ method of instruction. The second and third measurement sets were obtained after an interactive teaching session on the methodology. The order of the images was altered for the second and third set of measurements. The coefficient of variation (Cv) was calculated to determine the intraobserver repeatability and interobserver reliability for each imaging technique. A Bland-Altman plot was then used to assess the agreement between the two imaging techniques. For interobserver reliability, the mean Cv of the Powers ratio was 9.09 and 4.31% for plain radiographs and CT, respectively. The Cv mean value for intraobserver reproducibility averaged 4.95% (range 1.39–9.08) when CT scans were used and 14.17% (range 7.54–34.30) when plain radiographs were used. For intraobserver reproducibility, the lowest and highest Cv mean value of five raters was 1.39 and 9.08% using CT scans and 7.54 and 34.3% using plain radiographs. The Bland-Altman plot demonstrated that the two methods were in close agreement on the −0.8 and 0.89% interval for limits of agreement (bias ± 1.96σ). 
The intraobserver reproducibility and interobserver reliability of Powers ratio measurement was acceptable (<5%) with CT scans but not with plain radiographs. However, despite the statistically inferior reliability and repeatability, the Bland-Altman plot analysis showed that given the −0.8 and 0.89% limits of agreement, the two methods may be used interchangeably in clinical practice.
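The coefficient of variation used above is the standard deviation of repeated readings expressed as a percentage of their mean. The following sketch assumes the sample standard deviation (the abstract does not say which estimator was used) and applies the 5% acceptability threshold quoted in the text; the function names are hypothetical.

```python
import statistics

def coefficient_of_variation(readings):
    """Cv (%) = 100 * SD / mean for repeated readings of one image."""
    mean = statistics.mean(readings)
    if mean == 0:
        raise ValueError("Cv is undefined for a mean of zero")
    return 100 * statistics.stdev(readings) / mean

def is_acceptable(cv_percent, threshold=5.0):
    """Apply the <5% acceptability criterion quoted in the abstract."""
    return cv_percent < threshold

# Mean interobserver Cv values reported in the abstract.
print(is_acceptable(4.31))  # CT
print(is_acceptable(9.09))  # plain radiographs
```

With the reported means, the CT value passes the criterion and the plain radiograph value does not, which matches the authors' conclusion.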
doi:10.1007/s00586-008-0877-5
PMCID: PMC2899465  PMID: 19165510
Powers ratio; Interobserver reliability; Intraobserver reproducibility; Atlanto-occipital junction
12.  Point-of-Care International Normalized Ratio (INR) Monitoring Devices for Patients on Long-term Oral Anticoagulation Therapy 
Executive Summary
Subject of the Evidence-Based Analysis
The purpose of this evidence based analysis report is to examine the safety and effectiveness of point-of-care (POC) international normalized ratio (INR) monitoring devices for patients on long-term oral anticoagulation therapy (OAT).
Clinical Need: Target Population and Condition
Long-term OAT is typically required by patients with mechanical heart valves, chronic atrial fibrillation, venous thromboembolism, myocardial infarction, stroke, and/or peripheral arterial occlusion. It is estimated that approximately 1% of the population receives anticoagulation treatment and, by applying this value to Ontario, there are an estimated 132,000 patients on OAT in the province, a figure that is expected to increase with the aging population.
Patients on OAT are regularly monitored and their medications adjusted to ensure that their INR scores remain in the therapeutic range. This can be challenging due to the narrow therapeutic window of warfarin and variation in individual responses. Optimal INR scores depend on the underlying indication for treatment and patient level characteristics, but for most patients the therapeutic range is an INR score of between 2.0 and 3.0.
The current standard of care in Ontario for patients on long-term OAT is laboratory-based INR determination with management carried out by primary care physicians or anticoagulation clinics (ACCs). Patients also regularly visit a hospital or community-based facility to provide venous blood samples (venipuncture) that are then sent to a laboratory for INR analysis.
Experts, however, have commented that there may be under-utilization of OAT due to patient factors, physician factors, or regional practice variations and that sub-optimal patient management may also occur. There is currently no population-based Ontario data to permit the assessment of patient care, but recent systematic reviews have estimated that less than 50% of patients receive OAT on a routine basis and that patients are in the therapeutic range only 64% of the time.
Overview of POC INR Devices
POC INR devices offer an alternative to laboratory-based testing and venipuncture, enabling INR determination from a fingerstick sample of whole blood. Independent evaluations have shown POC devices to have an acceptable level of precision. They permit INR results to be determined immediately, allowing for more rapid medication adjustments.
POC devices can be used in a variety of settings including physician offices, ACCs, long-term care facilities, pharmacies, or by the patients themselves through self-testing (PST) or self-management (PSM) techniques. With PST, patients measure their INR values and then contact their physician for instructions on dose adjustment, whereas with PSM, patients adjust the medication themselves based on pre-set algorithms. These models are not suitable for all patients and require the identification and education of suitable candidates.
Potential advantages of POC devices include improved convenience to patients, better treatment compliance and satisfaction, more frequent monitoring and fewer thromboembolic and hemorrhagic complications. Potential disadvantages of the device include the tendency to underestimate high INR values and overestimate low INR values, low thromboplastin sensitivity, inability to calculate a mean normal PT, and errors in INR determination in patients with antiphospholipid antibodies with certain instruments. Although treatment satisfaction and quality of life (QoL) may improve with POC INR monitoring, some patients may experience increased anxiety or preoccupation with their disease with these strategies.
Evidence-Based Analysis Methods
Research Questions
1. Effectiveness
Does POC INR monitoring improve clinical outcomes in various settings compared to standard laboratory-based testing?
Does POC INR monitoring impact patient satisfaction, QoL, compliance, acceptability, convenience compared to standard laboratory-based INR determination?
Settings include primary care settings with use of POC INR devices by general practitioners or nurses, ACCs, pharmacies, long-term care homes, and use by the patient either for PST or PSM.
2. Cost-effectiveness
What is the cost-effectiveness of POC INR monitoring devices in various settings compared to standard laboratory-based INR determination?
Inclusion Criteria
English-language RCTs, systematic reviews, and meta-analyses
Publication dates: 1996 to November 25, 2008
Population: patients on OAT
Intervention: anticoagulation monitoring by POC INR device in any setting including anticoagulation clinic, primary care (general practitioner or nurse), pharmacy, long-term care facility, PST, PSM or any other POC INR strategy
Minimum sample size: 50 patients
Minimum follow-up period: 3 months
Comparator: usual care defined as venipuncture blood draw for an INR laboratory test and management provided by an ACC or individual practitioner
Outcomes: Hemorrhagic events, thromboembolic events, all-cause mortality, anticoagulation control as assessed by proportion of time or values in the therapeutic range, patient reported outcomes including satisfaction, QoL, compliance, acceptability, convenience
Exclusion criteria
Non-RCTs, before-after studies, quasi-experimental studies, observational studies, case reports, case series, editorials, letters, non-systematic reviews, conference proceedings, abstracts, non-English articles, duplicate publications
Studies where POC INR devices were compared to laboratory testing to assess test accuracy
Studies where the POC INR results were not used to guide patient management
Method of Review
A search of electronic databases (OVID MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, EMBASE, The Cochrane Library, and the International Agency for Health Technology Assessment [INAHTA] database) was undertaken to identify evidence published from January 1, 1998 to November 25, 2008. Studies meeting the inclusion criteria were selected from the search results. Reference lists of selected articles were also checked for relevant studies.
Summary of Findings
Five existing reviews and 22 articles describing 17 unique RCTs met the inclusion criteria. Three RCTs examined POC INR monitoring devices with PST strategies, 11 RCTs examined PSM strategies, one RCT included both PST and PSM strategies and two RCTs examined the use of POC INR monitoring devices by health care professionals.
Anticoagulation Control
Anticoagulation control is measured by the percentage of time INR is within the therapeutic range or by the percentage of INR values in the therapeutic range. Due to the differing methodologies and reporting structures used, it was deemed inappropriate to combine the data and estimate whether the difference between groups would be significant. Instead, the results of individual studies were weighted by the number of person-years of observation and then pooled to calculate a summary measure.
Across most studies, patients in the intervention groups tended to have a higher percentage of time and values in the therapeutic target range in comparison to control patients. When the percentage of time in the therapeutic range was pooled across studies and weighted by the number of person-years of observation, the difference between the intervention and control groups was 4.2% for PSM, 7.2% for PST and 6.1% for POC use by health care practitioners. Overall, intervention patients were in the target range 69% of the time and control patients were in the therapeutic target range 64% of the time leading to an overall difference between groups of roughly 5%.
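The pooling described above can be sketched as a weighted mean in which each study's percentage of time in the therapeutic range is weighted by its person-years of observation. This is an illustrative reading of the method, not the report's actual calculation; the function name and the study figures in the usage lines are hypothetical.

```python
def pooled_percentage(studies):
    """Person-year-weighted mean of per-study percentages.

    studies: iterable of (percent_time_in_range, person_years) pairs.
    """
    studies = list(studies)
    total_person_years = sum(py for _, py in studies)
    if total_person_years <= 0:
        raise ValueError("total person-years must be positive")
    weighted_sum = sum(pct * py for pct, py in studies)
    return weighted_sum / total_person_years

# Hypothetical studies: (percent time in range, person-years).
intervention = [(70.0, 100), (60.0, 300)]
control = [(65.0, 100), (55.0, 300)]
difference = pooled_percentage(intervention) - pooled_percentage(control)
print(difference)
```

Weighting by person-years lets longer and larger studies contribute more to the summary, but it yields only a descriptive pooled difference and no test of statistical significance, consistent with the caveat in the text.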
Major Complications and Deaths
There was no statistically significant difference in the number of major hemorrhagic events between patients managed with POC INR monitoring devices and patients managed with standard laboratory testing (OR = 0.74; 95% CI: 0.52 - 1.04). This difference was non-significant for all POC strategies (PSM, PST, health care practitioner).
Patients managed with POC INR monitoring devices had significantly fewer thromboembolic events than usual care patients (OR = 0.52; 95% CI: 0.37 - 0.74). When divided by POC strategy, PSM resulted in significantly fewer thromboembolic events than usual care (OR = 0.46; 95% CI: 0.29 - 0.72). The observed difference in thromboembolic events for PSM remained significant when the analysis was limited to major thromboembolic events (OR = 0.40; 95% CI: 0.17 - 0.93), but was non-significant when the analysis was limited to minor thromboembolic events (OR = 0.73; 95% CI: 0.08 - 7.01). PST and GP/Nurse strategies did not result in significant differences in thromboembolic events; however, there were only a limited number of studies examining these interventions.
No statistically significant difference was observed in the number of deaths between POC intervention and usual care control groups (OR = 0.67; 95% CI: 0.41 - 1.10). This difference was non-significant for all POC strategies. Only one study reported on survival with 10-year survival rate of 76.1% in the usual care control group compared to 84.5% in the PSM group (P = 0.05).
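For readers less familiar with the notation, an odds ratio and its 95% confidence interval can be computed from a single two-by-two table as sketched below, using the standard log-odds (Woolf) approximation. This is only an illustration: the report does not state which pooling method produced its meta-analytic estimates, and the function and argument names here are hypothetical. The second helper applies the usual reading that an interval excluding 1 indicates a statistically significant difference.

```python
import math

def odds_ratio_ci(events_a, no_events_a, events_b, no_events_b, z=1.96):
    """Odds ratio of group A versus group B with a Woolf 95% CI.

    All four cell counts must be positive.
    """
    odds_ratio = (events_a * no_events_b) / (no_events_a * events_b)
    se_log_or = math.sqrt(
        1 / events_a + 1 / no_events_a + 1 / events_b + 1 / no_events_b
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper

def excludes_one(lower, upper):
    """True when the confidence interval does not contain 1."""
    return lower > 1 or upper < 1

# Intervals reported in the text above.
print(excludes_one(0.52, 1.04))  # major hemorrhagic events
print(excludes_one(0.37, 0.74))  # thromboembolic events
```

Read this way, the hemorrhage interval spans 1 and the thromboembolism interval does not, in line with the non-significant and significant findings reported.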
Summary Results of Meta-Analyses of Major Complications and Deaths in POC INR Monitoring Studies
Patient Satisfaction and Quality of Life
Quality of life measures were reported in eight studies comparing POC INR monitoring to standard laboratory testing using a variety of measurement tools. It was thus not possible to calculate a quantitative summary measure. The majority of studies reported favourable impacts of POC INR monitoring on QoL and found better treatment satisfaction with POC monitoring. Results from a pre-analysis patient and caregiver focus group conducted in Ontario also indicated improved patient QoL with POC monitoring.
Quality of the Evidence
Studies varied with regard to patient eligibility, baseline patient characteristics, follow-up duration, and withdrawal rates. Differential drop-out rates were observed such that the POC intervention groups tended to have a larger number of patients who withdrew. There was a lack of consistency in the definitions and reporting for OAT control and definitions of adverse events. In most studies, the intervention group received more education on the use of warfarin and performed more frequent INR testing, which may have overestimated the effect of the POC intervention. Patient selection and eligibility criteria were not always fully described and it is likely that the majority of the PST/PSM trials included a highly motivated patient population. Lastly, a large number of trials were also sponsored by industry.
Despite the observed heterogeneity among studies, there was a general consensus in findings that POC INR monitoring devices have beneficial impacts on the risk of thromboembolic events, anticoagulation control and patient satisfaction and QoL (ES Table 2).
GRADE Quality of the Evidence on POC INR Monitoring Studies
CI refers to confidence interval; Interv, intervention; OR, odds ratio; RCT, randomized controlled trial.
Economic Analysis
Using a 5-year Markov model, the health and economic outcomes associated with four different anticoagulation management approaches were evaluated:
Standard care: consisting of a laboratory test with a venipuncture blood draw for an INR;
Healthcare staff testing: consisting of a test with a POC INR device in a medical clinic comprised of healthcare staff such as pharmacists, nurses, and physicians following protocol to manage OAT;
PST: patient self-testing using a POC INR device and phoning in results to an ACC or family physician; and
PSM: patient self-managing using a POC INR device and self-adjustment of OAT according to a standardized protocol. Patients may also phone in to a medical office for guidance.
The primary analytic perspective was that of the MOHLTC. Only direct medical costs were considered and the time horizon of the model was five years - the serviceable life of a POC device.
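The mechanics of a Markov cohort model over a five-year horizon can be sketched as follows. This is a minimal illustration only: the three health states, the annual cycle length, the transition probabilities, and the per-cycle costs are all hypothetical assumptions, not the structure or inputs of the model used in the analysis, and discounting is omitted.

```python
def run_markov(start, transition, cycles):
    """Propagate a cohort distribution through a Markov chain and return the trace."""
    trace = [list(start)]
    for _ in range(cycles):
        prev = trace[-1]
        size = len(prev)
        # New share in state j is the sum over i of share in i times P(i -> j).
        nxt = [sum(prev[i] * transition[i][j] for i in range(size)) for j in range(size)]
        trace.append(nxt)
    return trace

# Hypothetical states and annual transition probabilities (each row sums to 1).
STATES = ["stable", "post_event", "dead"]
P = [
    [0.95, 0.03, 0.02],
    [0.00, 0.90, 0.10],
    [0.00, 0.00, 1.00],
]
# Hypothetical direct medical cost per patient per cycle in each state.
COST = [500.0, 3000.0, 0.0]

trace = run_markov([1.0, 0.0, 0.0], P, cycles=5)
# Expected cost per patient: state occupancy times state cost, summed over cycles 1 to 5.
total_cost = sum(sum(p * c for p, c in zip(dist, COST)) for dist in trace[1:])
print(f"Expected 5-year cost per patient: {total_cost:.2f}")
```

Comparing strategies would amount to running the same function with a different transition matrix and cost vector per strategy.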
The economic analysis found that POC strategies are cost-effective compared to traditional INR laboratory testing. In particular, the healthcare staff testing strategy can derive potential cost savings from the use of one device for multiple patients. The PSM strategy, however, seems to be the most cost-effective method, in that patients are more inclined to adjust their INRs readily (as opposed to allowing INRs to fall out of range).
Considerations for Ontario Health System
Although the use of POC devices continues to diffuse throughout Ontario, not all OAT patients are suitable or have the ability to practice PST/PSM. The use of POC is currently concentrated at the institutional setting, including hospitals, ACCs, long-term care facilities, physician offices and pharmacies, and is much less common at the patient level. It is, however, estimated that 24% of OAT patients (representing approximately 32,000 patients in Ontario) would be suitable candidates for PST/PSM strategies and willing to use a POC device.
There are several barriers to the use and implementation of POC INR monitoring devices, including lack of physician familiarity with the devices, resistance to changing established laboratory-based methods, lack of an approach for identifying suitable patients, and inadequate resources for effective patient education and training. Issues of cost and insufficient reimbursement strategies may also hinder implementation, and effective quality assurance programs would need to be developed to ensure that INR measurements are accurate and precise.
Conclusions
For a select group of patients who are highly motivated and trained, PSM resulted in significantly fewer thromboembolic events compared to conventional laboratory-based INR testing. No significant differences were observed for major hemorrhages or all-cause mortality. PST and GP/Nurse use of POC strategies are just as effective as conventional laboratory-based INR testing for thromboembolic events, major hemorrhages, and all-cause mortality. POC strategies may also result in better OAT control as measured by the proportion of time INR is in the therapeutic range, and there appear to be beneficial impacts on patient satisfaction and QoL. The use of POC devices should factor in patient suitability, patient education and training, health system constraints, and affordability.
Keywords
anticoagulants, International Normalized Ratio, point-of-care, self-monitoring, warfarin.
PMCID: PMC3377545  PMID: 23074516
13.  Information from Pharmaceutical Companies and the Quality, Quantity, and Cost of Physicians' Prescribing: A Systematic Review 
PLoS Medicine  2010;7(10):e1000352.
Geoff Spurling and colleagues report findings of a systematic review looking at the relationship between exposure to promotional material from pharmaceutical companies and the quality, quantity, and cost of prescribing. They fail to find evidence of improvements in prescribing after exposure, and find some evidence of an association with higher prescribing frequency, higher costs, or lower prescribing quality.
Background
Pharmaceutical companies spent $57.5 billion on pharmaceutical promotion in the United States in 2004. The industry claims that promotion provides scientific and educational information to physicians. While some evidence indicates that promotion may adversely influence prescribing, physicians hold a wide range of views about pharmaceutical promotion. The objective of this review is to examine the relationship between exposure to information from pharmaceutical companies and the quality, quantity, and cost of physicians' prescribing.
Methods and Findings
We searched for studies of physicians with prescribing rights who were exposed to information from pharmaceutical companies (promotional or otherwise). Exposures included pharmaceutical sales representative visits, journal advertisements, attendance at pharmaceutical sponsored meetings, mailed information, prescribing software, and participation in sponsored clinical trials. The outcomes measured were quality, quantity, and cost of physicians' prescribing. We searched Medline (1966 to February 2008), International Pharmaceutical Abstracts (1970 to February 2008), Embase (1997 to February 2008), Current Contents (2001 to 2008), and Central (The Cochrane Library Issue 3, 2007) using the search terms developed with an expert librarian. Additionally, we reviewed reference lists and contacted experts and pharmaceutical companies for information. Randomized and observational studies evaluating information from pharmaceutical companies and measures of physicians' prescribing were independently appraised for methodological quality by two authors. Studies were excluded where insufficient study information precluded appraisal. The full text of 255 articles was retrieved from electronic databases (7,185 studies) and other sources (138 studies). Articles were then excluded because they did not fulfil inclusion criteria (179) or quality appraisal criteria (18), leaving 58 included studies with 87 distinct analyses. Data were extracted independently by two authors and a narrative synthesis performed following the MOOSE guidelines. Of the set of studies examining prescribing quality outcomes, five found associations between exposure to pharmaceutical company information and lower quality prescribing, four did not detect an association, and one found associations with lower and higher quality prescribing. 38 included studies found associations between exposure and higher frequency of prescribing and 13 did not detect an association. 
Five included studies found evidence for association with higher costs, four found no association, and one found an association with lower costs. The narrative synthesis finding of variable results was supported by a meta-analysis of studies of prescribing frequency that found significant heterogeneity. The observational nature of most included studies is the main limitation of this review.
Conclusions
With rare exceptions, studies of exposure to information provided directly by pharmaceutical companies have found associations with higher prescribing frequency, higher costs, or lower prescribing quality or have not found significant associations. We did not find evidence of net improvements in prescribing, but the available literature does not exclude the possibility that prescribing may sometimes be improved. Still, we recommend that practitioners follow the precautionary principle and thus avoid exposure to information from pharmaceutical companies.
Editors' Summary
Background
A prescription drug is a medication that can be supplied only with a written instruction (“prescription”) from a physician or other licensed healthcare professional. In 2009, 3.9 billion drug prescriptions were dispensed in the US alone and US pharmaceutical companies made US$300 billion in sales revenue. Every year, a large proportion of this revenue is spent on drug promotion. In 2004, for example, a quarter of US drug revenue was spent on pharmaceutical promotion. The pharmaceutical industry claims that drug promotion—visits from pharmaceutical sales representatives, advertisements in journals and prescribing software, sponsorship of meetings, mailed information—helps to inform and educate healthcare professionals about the risks and benefits of their products and thereby ensures that patients receive the best possible care. Physicians, however, hold a wide range of views about pharmaceutical promotion. Some see it as a useful and convenient source of information. Others deny that they are influenced by pharmaceutical company promotion but claim that it influences other physicians. Meanwhile, several professional organizations have called for tighter control of promotional activities because of fears that pharmaceutical promotion might encourage physicians to prescribe inappropriate or needlessly expensive drugs.
Why Was This Study Done?
But is there any evidence that pharmaceutical promotion adversely influences prescribing? Reviews of the research literature undertaken in 2000 and 2005 provide some evidence that drug promotion influences prescribing behavior. However, these reviews only partly assessed the relationship between information from pharmaceutical companies and prescribing costs and quality and are now out of date. In this study, therefore, the researchers undertake a systematic review (a study that uses predefined criteria to identify all the research on a given topic) to reexamine the relationship between exposure to information from pharmaceutical companies and the quality, quantity, and cost of physicians' prescribing.
What Did the Researchers Do and Find?
The researchers searched the literature for studies of licensed physicians who were exposed to promotional and other information from pharmaceutical companies. They identified 58 studies that included a measure of exposure to any type of information directly provided by pharmaceutical companies and a measure of physicians' prescribing behavior. They then undertook a “narrative synthesis,” a descriptive analysis of the data in these studies. Ten of the studies, they report, examined the relationship between exposure to pharmaceutical company information and prescribing quality (as judged, for example, by physician drug choices in response to clinical vignettes). All but one of these studies suggested that exposure to drug company information was associated with lower prescribing quality or no association was detected. In the 51 studies that examined the relationship between exposure to drug company information and prescribing frequency, exposure to information was associated with more frequent prescribing or no association was detected. Thus, for example, 17 out of 29 studies of the effect of pharmaceutical sales representatives' visits found an association between visits and increased prescribing; none found an association with less frequent prescribing. Finally, eight studies examined the relationship between exposure to pharmaceutical company information and prescribing costs. With one exception, these studies indicated that exposure to information was associated with a higher cost of prescribing or no association was detected. So, for example, one study found that physicians with low prescribing costs were more likely to have rarely or never read promotional mail or journal advertisements from pharmaceutical companies than physicians with high prescribing costs.
What Do These Findings Mean?
With rare exceptions, these findings suggest that exposure to pharmaceutical company information is associated with either no effect on physicians' prescribing behavior or with adverse effects (reduced quality, increased frequency, or increased costs). Because most of the studies included in the review were observational studies (the physicians in the studies were not randomly selected to receive or not receive drug company information), it is not possible to conclude that exposure to information actually causes any changes in physician behavior. Furthermore, although these findings provide no evidence for any net improvement in prescribing after exposure to pharmaceutical company information, the researchers note that it would be wrong to conclude that improvements do not sometimes happen. The findings support the case for reforms to reduce negative influence on prescribing from pharmaceutical promotion.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000352.
Wikipedia has pages on prescription drugs and on pharmaceutical marketing (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
The UK General Medical Council provides guidelines on good practice in prescribing medicines
The US Food and Drug Administration provides information on prescription drugs and on its Bad Ad Program
Healthy Skepticism is an international nonprofit membership association that aims to improve health by reducing harm from misleading health information
The Drug Promotion Database was developed by the World Health Organization Department of Essential Drugs & Medicines Policy and Health Action International Europe to address unethical and inappropriate drug promotion
doi:10.1371/journal.pmed.1000352
PMCID: PMC2957394  PMID: 20976098
14.  A comparison of four different approaches to measuring health utility in depressed patients 
Background
A variety of instruments are used to measure health related quality of life. Few data exist on the performance and agreement of different instruments in a depressed population. The aim of this study was to investigate agreement between, and suitability of, the EQ-5D-3L, EQ-5D Visual Analogue Scale (EQ-5D VAS), SF-6D and SF-12 new algorithm for measuring health utility in depressed patients.
Methods
The intraclass correlation coefficient (ICC) and Bland and Altman approaches were used to assess agreement. Instrument sensitivity was analysed by: (1) plotting utility scores for the instruments against one another; (2) correlating utility scores and depressive symptoms (Beck Depression Inventory (BDI)); and (3) using Tukey’s procedure. Receiver Operating Characteristic (ROC) analysis assessed instrument responsiveness to change. Acceptability was assessed by comparing instrument completion rates.
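The Bland and Altman approach mentioned above summarizes agreement between two instruments by the mean of the paired differences (the bias) and the 95% limits of agreement around it. The following is a minimal sketch; the paired utility scores are hypothetical, and the 1.96 multiplier assumes approximately normally distributed differences.

```python
import statistics

def bland_altman(x, y, z=1.96):
    """Return the bias and the lower and upper 95% limits of agreement."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd

# Hypothetical utility scores from two instruments for the same five patients.
instrument_a = [0.70, 0.55, 0.80, 0.62, 0.90]
instrument_b = [0.65, 0.60, 0.70, 0.60, 0.75]
bias, lower, upper = bland_altman(instrument_a, instrument_b)
print(f"bias = {bias:.3f}, limits of agreement = ({lower:.3f}, {upper:.3f})")
```

Wide limits relative to the scale of the measure indicate that the two instruments cannot be used interchangeably, even when the bias is small.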
Results
The overall ICC was 0.57. Bland and Altman plots showed wide limits of agreement for each pairwise comparison, except between the SF-6D and SF-12 new algorithm. Plots of utility scores displayed ‘ceiling effects’ in the EQ-5D-3L index and ‘floor effects’ in the SF-6D and SF-12 new algorithm. All instruments showed a negative monotonic relationship with BDI, but the EQ-5D-3L index and EQ-5D VAS could not differentiate between depression severity sub-groups. The SF-based instruments were better able to detect changes in health state over time. There was no difference in completion rates of the four instruments.
Conclusions
There was a lack of agreement between utility scores generated by the different instruments. According to the criteria of sensitivity, responsiveness and acceptability that we applied, the SF-6D and SF-12 may be more suitable for the measurement of health related utility in a depressed population than the EQ-5D-3L, which is the instrument currently recommended by NICE.
doi:10.1186/1477-7525-11-81
PMCID: PMC3663709  PMID: 23659557
Depression; EQ-5D; SF-6D; Health related utility; QALYs
15.  Patient-Nurse Interrater Reliability and Agreement of the Richards-Campbell Sleep Questionnaire 
Background
The Richards-Campbell Sleep Questionnaire (RCSQ) is a simple, validated survey instrument for measuring sleep quality in intensive care patients. Although both patients and nurses can complete the RCSQ, interrater reliability and agreement have not been fully evaluated.
Objectives
To evaluate patient-nurse interrater reliability and agreement of the RCSQ in a medical intensive care unit.
Methods
The instrument included 5 RCSQ items plus a rating of nighttime noise, each scored by using a 100-mm visual analogue scale. The mean of the 5 RCSQ items comprised a total score. For 24 days, the night-shift nurses in the medical intensive care unit completed the RCSQ regarding their patients’ overnight sleep quality. Upon awakening, all conscious, nondelirious patients completed the RCSQ. Neither nurses nor patients knew the others’ ratings. Patient-nurse agreement was evaluated by using mean differences and Bland-Altman plots. Reliability was evaluated by using intraclass correlation coefficients.
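An intraclass correlation coefficient for paired ratings can be computed from a one-way analysis of variance, as sketched below. The abstract does not state which ICC form was used, so the one-way random-effects, single-rating form, ICC(1,1), is an assumption here, and the patient-nurse score pairs are hypothetical.

```python
def icc_oneway(ratings):
    """ICC(1,1): one-way random effects, single rating.

    `ratings` is a list of rows, one per subject, each holding k ratings.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 subjects, each with the same k >= 2 ratings")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-subject and within-subject mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical [patient, nurse] scores on a 0-100 visual analogue scale.
pairs = [[48, 67], [60, 68], [35, 50], [80, 85], [55, 70]]
icc = icc_oneway(pairs)
print(f"ICC(1,1) = {icc:.2f}")
```

Because this form penalizes systematic differences between raters, a consistent upward shift in one rater's scores lowers the coefficient even when the two sets of ratings rise and fall together.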
Results
Thirty-three patients had a total of 92 paired patient-nurse assessments. For all RCSQ items, nurses’ scores were higher (indicating “better” sleep) than patients’ scores, with significantly higher ratings for sleep depth (mean [SD], 67 [21] vs 48 [35], P = .001), awakenings (68 [21] vs 60 [33], P = .03), and total score (68 [19] vs 57 [28], P = .01). The Bland-Altman plots also showed that nurses’ ratings were generally higher than patients’ ratings. Intraclass correlation coefficients of patient-nurse pairs ranged from 0.13 to 0.49 across the survey questions.
Conclusions
Patient-nurse interrater reliability on the RCSQ was “slight” to “moderate,” with nurses tending to overestimate patients’ perceived sleep quality.
doi:10.4037/ajcc2012111
PMCID: PMC3667655  PMID: 22751369
16.  Assessment of nutritional status and health-related quality of life before and after liver transplantation 
BMC Gastroenterology  2015;15:6.
Background
Patients with chronic liver disease frequently suffer from malnutrition, together with a decline in their health-related quality of life.
This study was carried out with the aim of evaluating the nutritional status, complications of medical and surgical care, anxiety, health-related quality of life and dependence level on basic and instrumental activities of daily living in pre- and post-liver transplant patients.
Methods/Design
A prospective observational study with follow-up of patients on the waiting list for liver transplants who subsequently received a transplant at the University Hospital Complex in A Coruña during the period 2012–2014 (n = 110).
All the patients will be followed up for a maximum of 6 months. For survivors, assessments will be repeated at one, three and six months post-transplant.
Informed consent of the patient and ethical review board approval were obtained (Code: 2010/081 and 2010/082).
The following variables will be studied: socio-demographic data, reason for the transplant, comorbidity (Charlson Score), analytical parameters, time on transplant waiting list and post-transplant complications. A trained nurse will evaluate the following for each patient: nutritional indices, anthropometric variables and handgrip strength. Validated questionnaires will be used to determine the patients’ nutritional status (Subjective Global Assessment), anxiety (STAI questionnaire), Health-Related Quality of Life (LDQoL 1.0 questionnaire), dependence (Barthel Index and Lawton-Brody Scale), nursing diagnoses (NANDA) and post-transplant quality indicators.
Multiple linear/logistic regression models will be used to identify variables associated with the events of interest. Changes in nutritional status, quality of life and dependence over time will be analysed with linear mixed-effects regression models.
Actuarial survival analysis using Kaplan-Meier curves, Cox regression and competing-risks analysis will be performed.
Concordance between the different scores that assess nutritional status and interobserver agreement regarding nursing diagnoses will be studied using the Kappa statistic and the Bland-Altman method.
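For categorical judgements such as nursing diagnoses, the Kappa index corrects the observed proportion of agreement for the agreement expected by chance. The sketch below implements the unweighted Cohen's kappa for two observers; the protocol does not say whether a weighted variant is planned, and the observer labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)

# Hypothetical presence/absence of one nursing diagnosis as judged by two observers.
observer_1 = ["yes", "yes", "no", "no"]
observer_2 = ["yes", "no", "no", "no"]
kappa = cohens_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")
```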
Discussion
The risk of malnutrition can be considered as a possible prognostic factor in transplant outcomes, associated with anxiety, health-related quality of life and dependence.
For this reason, we consider it worthwhile to perform a prospective follow-up study of patients who require a transplant to survive, studying their nutritional status and health-related quality of life.
Electronic supplementary material
The online version of this article (doi:10.1186/s12876-015-0232-3) contains supplementary material, which is available to authorized users.
doi:10.1186/s12876-015-0232-3
PMCID: PMC4310167  PMID: 25608608
Liver transplantation; Nutrition status; Anxiety; Quality of life; Dependence
17.  Comparison of Retinal Nerve Fiber Layer Measurements Using Time Domain and Spectral Domain Optical Coherent Tomography 
Ophthalmology  2009;116(7):1271-1277.
Purpose
To determine the agreement between peripapillary retinal nerve fiber layer (RNFL) thickness measurements from Stratus time domain optical coherence tomography (OCT) and Cirrus spectral domain OCT (Carl Zeiss Meditec, Dublin, CA) in normal subjects and glaucoma patients.
Design
Evaluation of diagnostic test or technology.
Participants
One hundred thirty eyes from 130 normal subjects and glaucoma patients were analyzed. The subjects were divided into Normal (n=29), Glaucoma Suspect (n=12), Mild Glaucoma (n=41), Moderate Glaucoma (n=18), and Severe Glaucoma (n=30) by visual field criteria.
Methods
Peripapillary RNFL thickness was measured with Stratus Fast RNFL and Cirrus 200 x 200 Optic Disc Scan on the same day in one eye of each subject to determine agreement. Two operators used the same instruments for all scans.
Main Outcome Measures
Student paired t-testing, Pearson’s correlation coefficient, and Bland-Altman analysis of RNFL thickness measurements.
Results
The glaucoma group was significantly older, with an average age of 68.3±12.3 years versus 55.7±12.1 years. The average RNFL thickness (mean ± SD, in μm) for each severity group with Stratus OCT was 99.4 ± 13.2, 94.5 ± 15.0, 79.0 ± 14.5, 62.7 ± 10.2, and 51.0 ± 8.9, corresponding to normal, suspects, mild, moderate, and severe subjects, respectively. For Cirrus OCT, the corresponding measurements were 92.0 ± 10.8, 88.1 ± 13.5, 73.3 ± 11.8, 60.9 ± 8.3, and 55.3 ± 6.6. All Stratus-Cirrus differences were statistically significant by paired t-testing (p < 0.001) except for the moderate group (p = 0.11). For average RNFL, there was also a highly significant linear relationship between the Stratus-minus-Cirrus difference and RNFL thickness (p < 0.001). Bland-Altman plots showed a systematic difference: Stratus measurements are smaller than Cirrus at thinner RNFL values but larger at thicker RNFL values.
Conclusions
RNFL thickness measurements between Stratus OCT and Cirrus OCT cannot be directly compared. Clinicians should be aware that measurements are generally higher with Stratus than Cirrus except when the RNFL is very thin as in severe glaucoma. This difference must be taken into account if comparing measurements made with a Stratus instrument to those of a Cirrus instrument.
doi:10.1016/j.ophtha.2008.12.032
PMCID: PMC2713355  PMID: 19395086
18.  Pitfalls in the statistical examination and interpretation of the correspondence between physician and patient satisfaction ratings and their relevance for shared decision making research 
Background
The correspondence of satisfaction ratings between physicians and patients can be assessed on different dimensions. One may examine whether they differ between the two groups or focus on measures of association or agreement. The aim of our study was to evaluate methodological difficulties in calculating the correspondence between patient and physician satisfaction ratings and to show the relevance for shared decision making research.
Methods
We utilised a structured tool for cardiovascular prevention (arriba™) in a pragmatic cluster-randomised controlled trial. Correspondence between patient and physician satisfaction ratings after individual primary care consultations was assessed using the Patient Participation Scale (PPS). We used the Wilcoxon signed-rank test, the marginal homogeneity test, Kendall's tau-b, weighted kappa, percentage of agreement, and the Bland-Altman method to measure differences, associations, and agreement between physicians and patients.
Results
Statistical measures signal large differences between patient and physician satisfaction ratings with more favourable ratings provided by patients and a low correspondence regardless of group allocation. Closer examination of the raw data revealed a high ceiling effect of satisfaction ratings and only slight disagreement regarding the distributions of differences between physicians' and patients' ratings.
Conclusions
Traditional statistical measures of association and agreement are not able to capture a clinically relevant appreciation of the physician-patient relationship by both parties in skewed satisfaction ratings. Only the Bland-Altman method for assessing agreement augmented by bar charts of differences was able to indicate this.
Trial registration
ISRCTN: ISRCTN71348772
doi:10.1186/1471-2288-11-71
PMCID: PMC3120809  PMID: 21592337
19.  Comparison of 3 Body Size Descriptors in Critically Ill Obese Children and Adolescents: Implications for Medication Dosing 
OBJECTIVE: To compare 3 methods of weight determination for medication dose calculations in obese children and to discuss feasibility for use in routine care.
METHODS: This was a patient safety and quality improvement study evaluating patients (2–19 years old) admitted to the pediatric intensive care unit during a 13-month period (July 2010–July 2011). Patients identified as obese (≥95th percentile body mass index [BMI] for age), including severely obese (≥99th percentile BMI for age), were included in the weight method comparison portion of this study. Lean body mass estimations, using equations derived by the Peters and Foster methods, were compared to ideal body weight estimates by using the BMI method. Absolute differences between values generated by the 3 methods, intraclass correlation (ICC), and Bland-Altman plots were calculated.
RESULTS: A total of 1369 patients met initial criteria; 176 met criteria for the dosing weight comparison (age ± SD = 9.28 ± 5 years; actual weight ± SD = 55.5 ± 33.9 kg; 46% female). Sixty were severely obese and 116 were obese. Mean ICC between methods was 0.968 (95% Confidence interval (CI): 0.959, 0.975). The Peters method estimated higher weights than the Foster or BMI method. Bland-Altman plots illustrated good agreement between methods in children with weight below 50 kg, but decreased agreement above 50 kg, which was influenced by sex.
CONCLUSIONS: All methods demonstrated strong correlation and acceptable agreement in children below 50 kg. Systematic biases were identified in children above 50 kg where variance was higher. The BMI method was least complex to calculate and the most feasible method for daily use.
doi:10.5863/1551-6776-19.2.103
PMCID: PMC4093662  PMID: 25024670
body mass index; lean body mass; ideal body weight; obesity; pediatric
20.  Validation of the Arabic Version of the Epworth Sleepiness Scale in Oman 
Oman Medical Journal  2013;28(6):454-456.
Objectives
The Epworth sleepiness scale is a self-administered eight-item questionnaire that was developed as a tool to measure subjective sleepiness in adults. The Epworth sleepiness scale has been validated and tested in different populations and ethnic groups. However, it has yet to be validated or tested in an Omani or other Arabic-speaking population. Thus, the aim of this study is to test the validity and reproducibility of the Epworth sleepiness scale in an Omani population.
Methods
Subjects were recruited from the general population and were asked to participate in the study. The study enrolled 97 Omani volunteers and was conducted between May and October 2008. An Arabic version of the original English questionnaire was used. The study was approved by the Research and Ethics committee of the institution. Lin’s concordance correlation coefficient along with Bland-Altman plots were used to test the agreement between the Arabic and English versions of the Epworth sleepiness scale.
Results
The study included a total of 37 males (38%) and 60 females (62%), aged 18 to 75 years. Concordance correlation results revealed a substantial concordance (RhoC) of 0.914 (95% CI: 0.881, 0.947), though one that does not approach 1. This shortfall results from the lack of perfect correlation (Pearson’s r = 0.914) rather than from bias, since the bias correction factor indicated none (C_b = 1.000). The Bland-Altman analysis gave a mean difference of 0.000 (95% limits of agreement: -2.684, 2.684), indicating an insignificant average departure from agreement between the two versions of the Epworth sleepiness scale.
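Lin's concordance correlation coefficient factors into Pearson's r (precision) multiplied by the bias correction factor C_b (accuracy), which is why the reported values satisfy 0.914 × 1.000 = 0.914. The sketch below computes all three quantities from paired scores; the data are hypothetical and chosen so that the effect of a constant shift between versions is easy to see.

```python
import math
import statistics

def lins_ccc(x, y):
    """Return (concordance coefficient, Pearson r, bias correction factor C_b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    # Lin's definition uses 1/n (population) moments.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    ccc = 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)
    pearson_r = cov_xy / math.sqrt(var_x * var_y)  # undefined if either series is constant
    return ccc, pearson_r, ccc / pearson_r

# Hypothetical scores: the second version is the first shifted upward by 2 points.
version_1 = [1, 2, 3, 4, 5]
version_2 = [3, 4, 5, 6, 7]
ccc, pearson_r, c_b = lins_ccc(version_1, version_2)
print(f"CCC = {ccc:.3f}, r = {pearson_r:.3f}, C_b = {c_b:.3f}")
```

In this hypothetical case the correlation is perfect and the whole shortfall from 1 is due to bias, which is the mirror image of the pattern reported above, where C_b is 1.000 and the shortfall is due to imperfect correlation.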
Conclusion
The results indicate agreement between the English and Arabic versions of the ESS.
doi:10.5001/omj.2013.126
PMCID: PMC3815862  PMID: 24223253
Sleepiness; Arabic ESS; Bland Altman
21.  What Do Evaluation Instruments Tell Us About the Quality of Complementary Medicine Information on the Internet? 
Background
Developers of health information websites aimed at consumers need methods to assess whether their website is of “high quality.” Due to the nature of complementary medicine, website information is diverse and may be of poor quality. Various methods have been used to assess the quality of websites, the two main approaches being (1) to compare the content against some gold standard, and (2) to rate various aspects of the site using an assessment tool.
Objective
We aimed to review available evaluation instruments to assess their performance when used by a researcher to evaluate websites containing information on complementary medicine and breast cancer. In particular, we wanted to see if instruments used the same criteria, agreed on the ranking of websites, were easy to use by a researcher, and if use of a single tool was sufficient to assess website quality.
Methods
Bibliographic databases, search engines, and citation searches were used to identify evaluation instruments. Instruments were included that enabled users with no subject knowledge to make an objective assessment of a website containing health information. The elements of each instrument were compared to nine main criteria defined by a previous study. Google was used to search for complementary medicine and breast cancer sites. The first six results and a purposive six from different origins (charities, sponsored, commercial) were chosen. Each website was assessed using each tool, and the percentage of criteria successfully met was recorded. The ranking of the websites by each tool was compared. The use of the instruments by others was estimated by citation analysis and Google searching.
Results
A total of 39 instruments were identified, 12 of which met the inclusion criteria; the instruments contained between 4 and 43 questions. When applied to 12 websites, there was agreement of the rank order of the sites with 10 of the instruments. Instruments varied in the range of criteria they assessed and in their ease of use.
Conclusions
Comparing the content of websites against a gold standard is time consuming and only feasible for very specific advice. Evaluation instruments offer gateway providers a method to assess websites. The checklist approach has face validity when results are compared to the actual content of “good” and “bad” websites. Although instruments differed in the range of items assessed, there was fair agreement between most available instruments. Some were easier to use than others, but these were not necessarily the instruments most widely used to date. Combining some of the better features of instruments to provide fewer, easy-to-use methods would be beneficial to gateway providers.
doi:10.2196/jmir.961
PMCID: PMC2483844  PMID: 18244894
Consumer Health Informatics; Internet; quality of information; complementary medicine
22.  Comparison between EQ-5D and SF-6D Utility in Rural Residents of Jiangsu Province, China 
PLoS ONE  2012;7(7):e41550.
Background
The SF-6D and EQ-5D are widely used generic index measures of health-related quality of life. We assessed within-subject agreement between SF-6D and EQ-5D utilities with different preference weights, and their validity in Chinese rural residents, before and after score standardization.
Methodology/Principal Findings
Rural residents over 18 years old were interviewed using EQ-5D and SF-6D in Jiangsu Province, China. EQ-5D utilities were scored using three conversion tables, from the United Kingdom, Japan, and the United States. Validity, sensitivity, and agreement between instruments were computed and compared. Factors affecting the utility difference were explored with multiple linear regression models. The same analyses were then repeated on scores standardized to the interval 0–1 in both instruments. Among 929 respondents, relative efficiency statistics and receiver operating characteristic curve analysis showed SF-6D to be the more efficient, followed by EQ-5D with Japan weights. Bland–Altman plot analysis showed that the paired SF-6D/EQ-5D with UK weights had better agreement. Though some risk factors were found, multiple linear regression showed that most coefficients were weaker than 0.2, and all R2 values were less than 0.06. Standardization did not significantly influence these results, apart from the score values themselves.
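As a rough illustration of the Bland–Altman agreement analysis mentioned above, the sketch below computes the mean difference (bias) and the conventional 95% limits of agreement for paired utilities. The paired values are invented for illustration, the function name `bland_altman` is hypothetical, and the abstract does not state how the authors computed their limits.

```python
from statistics import mean, stdev

def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean of the pairwise differences a - b; the limits of
    agreement are bias +/- 1.96 sample standard deviations of those
    differences (the conventional choice, assumed here).
    """
    if len(a) != len(b):
        raise ValueError("paired inputs must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical paired utilities for four respondents (not study data).
sf6d = [0.70, 0.80, 0.65, 0.90]
eq5d = [0.75, 0.85, 0.60, 1.00]

bias, lower, upper = bland_altman(sf6d, eq5d)
print(round(bias, 4), round(lower, 4), round(upper, 4))
```

Narrower limits of agreement indicate that the two instruments give closer utilities for the same respondent.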
Conclusions/Significance
SF-6D, followed by EQ-5D with Japan weights, could be used for Chinese rural residents. Further research with a larger sample is needed to determine the feasibility of score standardization.
doi:10.1371/journal.pone.0041550
PMCID: PMC3407238  PMID: 22848526
23.  Reporting and Methods in Clinical Prediction Research: A Systematic Review 
PLoS Medicine  2012;9(5):e1001221.
Walter Bouwmeester and colleagues investigated the reporting and methods of prediction studies in 2008, in six high-impact general medical journals, and found that the majority of prediction studies do not follow current methodological recommendations.
Background
We investigated the reporting and methods of prediction studies, focusing on aims, designs, participant selection, outcomes, predictors, statistical power, statistical methods, and predictive performance measures.
Methods and Findings
We used a full hand search to identify all prediction studies published in 2008 in six high impact general medical journals. We developed a comprehensive item list to systematically score conduct and reporting of the studies, based on recent recommendations for prediction research. Two reviewers independently scored the studies. We retrieved 71 papers for full text review: 51 were predictor finding studies, 14 were prediction model development studies, three addressed an external validation of a previously developed model, and three reported on a model's impact on participant outcome. Study design was unclear in 15% of studies, and a prospective cohort was used in most studies (60%). Descriptions of the participants and definitions of predictor and outcome were generally good. Despite many recommendations against doing so, continuous predictors were often dichotomized (32% of studies). The number of events per predictor as a measure of statistical power could not be determined in 67% of the studies; of the remainder, 53% had fewer than the commonly recommended value of ten events per predictor. Methods for a priori selection of candidate predictors were described in most studies (68%). A substantial number of studies relied on a p-value cut-off of p<0.05 to select predictors in the multivariable analyses (29%). Predictive model performance measures, i.e., calibration and discrimination, were reported in 12% and 27% of studies, respectively.
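The events-per-predictor measure mentioned above is a simple ratio, sketched below against the commonly recommended minimum of ten cited in the abstract. The function names and the example counts are hypothetical and are not taken from any reviewed study.

```python
def events_per_predictor(n_events, n_candidate_predictors):
    """Number of outcome events divided by the number of candidate predictors."""
    if n_candidate_predictors <= 0:
        raise ValueError("need at least one candidate predictor")
    return n_events / n_candidate_predictors

def meets_rule_of_thumb(n_events, n_candidate_predictors, minimum=10.0):
    """True if the ratio reaches the commonly recommended minimum (ten by default)."""
    return events_per_predictor(n_events, n_candidate_predictors) >= minimum

# Hypothetical study: 120 outcome events and 15 candidate predictors.
epp = events_per_predictor(120, 15)
print(epp, meets_rule_of_thumb(120, 15))
```

The ratio can only be computed when a paper reports both the number of events and the number of candidate predictors, which may explain why it could not be determined for many of the studies.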
Conclusions
The majority of prediction studies in high impact journals do not follow current methodological recommendations, limiting their reliability and applicability.
Please see later in the article for the Editors' Summary
Editors' Summary
Background
There are often times in our lives when we would like to be able to predict the future. Is the stock market going to go up, for example, or will it rain tomorrow? Being able to predict future health is also important, both to patients and to physicians, and there is an increasing body of published clinical “prediction research.” Diagnostic prediction research investigates the ability of variables or test results to predict the presence or absence of a specific diagnosis. So, for example, one recent study compared the ability of two imaging techniques to diagnose pulmonary embolism (a blood clot in the lungs). Prognostic prediction research investigates the ability of various markers to predict future outcomes such as the risk of a heart attack. Both types of prediction research can investigate the predictive properties of patient characteristics, single variables, tests, or markers, or combinations of variables, tests, or markers (multivariable studies). Both types of prediction research can also include studies that build multivariable prediction models to guide patient management (model development), or that test the performance of models (validation), or that quantify the effect of using a prediction model on patient and physician behaviors and outcomes (impact assessment).
Why Was This Study Done?
With the increase in prediction research, there is an increased interest in the methodology of this type of research because poorly done or poorly reported prediction research is likely to have limited reliability and applicability and will, therefore, be of little use in patient management. In this systematic review, the researchers investigate the reporting and methods of prediction studies by examining the aims, design, participant selection, definition and measurement of outcomes and candidate predictors, statistical power and analyses, and performance measures included in multivariable prediction research articles published in 2008 in several general medical journals. In a systematic review, researchers identify all the studies undertaken on a given topic using a predefined set of criteria and systematically analyze the reported methods and results of these studies.
What Did the Researchers Do and Find?
The researchers identified all the multivariable prediction studies meeting their predefined criteria that were published in 2008 in six high impact general medical journals by browsing through all the issues of the journals (a hand search). They then scored the methods and reporting of each study using a comprehensive item list based on recent recommendations for the conduct of prediction research (for example, the reporting recommendations for tumor marker prognostic studies—the REMARK guidelines). Of 71 retrieved studies, 51 were predictor finding studies, 14 were prediction model development studies, three externally validated an existing model, and three reported on a model's impact on participant outcome. Study design, participant selection, definitions of outcomes and predictors, and predictor selection were generally well reported, but other methodological and reporting aspects of the studies were suboptimal. For example, despite many recommendations, continuous predictors were often dichotomized. That is, rather than using the measured value of a variable in a prediction model (for example, blood pressure in a cardiovascular disease prediction model), measurements were frequently assigned to two broad categories. Similarly, many of the studies failed to adequately estimate the sample size needed to minimize bias in predictor effects, and few of the model development papers quantified and validated the proposed model's predictive performance.
What Do These Findings Mean?
These findings indicate that, in 2008, most of the prediction research published in high impact general medical journals failed to follow current guidelines for the conduct and reporting of clinical prediction studies. Because the studies examined here were published in high impact medical journals, they are likely to be representative of the higher quality studies published in 2008. However, reporting standards may have improved since 2008, and the conduct of prediction research may actually be better than this analysis suggests because the length restrictions that are often applied to journal articles may account for some of the reporting omissions. Nevertheless, despite some encouraging findings, the researchers conclude that the poor reporting and poor methods they found in many published prediction studies are a cause for concern and are likely to limit the reliability and applicability of this type of clinical research.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001221.
The EQUATOR Network is an international initiative that seeks to improve the reliability and value of medical research literature by promoting transparent and accurate reporting of research studies; its website includes information on a wide range of reporting guidelines including the REMARK recommendations (in English and Spanish)
A video of a presentation by Doug Altman, one of the researchers of this study, on improving the reporting standards of the medical evidence base, is available
The Cochrane Prognosis Methods Group provides additional information on the methodology of prognostic research
doi:10.1371/journal.pmed.1001221
PMCID: PMC3358324  PMID: 22629234
24.  The proxy problem anatomized: child-parent disagreement in health related quality of life reports of chronically ill adolescents 
Background
Discrepancy between self-reports and parent-proxy reports of adolescent health-related quality of life (HRQoL) has been repeatedly acknowledged in the literature as the proxy problem. However, little is known about the extent and direction of this discrepancy. The purpose of this study is to explore to what extent and in what direction HRQoL self-reports of adolescents with chronic conditions and those of their parents differ.
Methods
A cross-sectional survey was conducted among adolescents suffering from chronic conditions and their parents. Socio-demographic and disease-related characteristics were collected and information about consequences of the chronic condition was assessed. HRQoL was measured with KIDSCREEN-10 and DISABKIDS condition generic measure (DCGM-10). Agreement was analysed through defining a threshold of agreement based on half of the standard deviation of the HRQoL score with the highest variance. Agreement occurred if the difference between adolescent and parent scores was less than or equal to half of the standard deviation. Intra-class correlation coefficients and Bland-Altman plots were also computed. The characteristics associated with direction of disagreement were statistically tested with one-way ANOVA and Chi-square tests.
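The agreement rule described above can be sketched as follows: the threshold is half the standard deviation of whichever score set has the highest variance, and a pair agrees when the absolute difference is at or below that threshold. The scores and function names are hypothetical, and using the sample standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import stdev

def half_sd_threshold(*score_sets):
    """Half the standard deviation of the score set with the highest variance."""
    return 0.5 * max(stdev(scores) for scores in score_sets)

def classify_pair(adolescent, parent, threshold):
    """Label one adolescent-parent pair by agreement and direction."""
    diff = adolescent - parent
    if abs(diff) <= threshold:
        return "agree"
    return "adolescent lower" if diff < 0 else "adolescent higher"

# Hypothetical paired HRQoL scores (not study data).
adolescent_scores = [50, 60, 70, 80]
parent_scores = [55, 60, 60, 95]

threshold = half_sd_threshold(adolescent_scores, parent_scores)
labels = [classify_pair(a, p, threshold)
          for a, p in zip(adolescent_scores, parent_scores)]
print(round(threshold, 2), labels)
```

Counting the labels gives the kind of percentages reported in the Results: the share of pairs in agreement and the share of disagreement in each direction.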
Results
584 paired HRQoL scores were obtained. Ratings from both adolescents and parents were high, compared to European norm data. Differences between adolescents and parents were statistically significant, yet relatively small. Disagreement existed in both directions: in 24.5% (KIDSCREEN-10) and 16.8% (DCGM-10) of the cases adolescents rated their HRQoL lower than did their parent, while in 32.2% (KIDSCREEN-10) and 31.7% (DCGM-10) of the cases the opposite was true. Adolescent's age, educational level and type of education, parent's educational level, number of hospital admissions and several other disease-related factors influenced direction of disagreement.
Conclusions
In a reasonable proportion of cases the adolescent and parent agreed on the adolescent's HRQoL (43-51% of the cases) and most disagreement tended to be minor. Thus, the proxy problem may be smaller than presented in the literature and its extent may differ per population. As adolescents are expected to become partners in their own health care, it is recommended to focus on adolescents' own perceptions of HRQoL.
doi:10.1186/1477-7525-10-10
PMCID: PMC3299605  PMID: 22276974
Adolescent; Chronic Illness; Self Report; Quality of Life; Parent; Proxy Report; KIDSCREEN-10; DCGM-10
25.  Behavioural Interventions for Type 2 Diabetes 
Executive Summary
In June 2008, the Medical Advisory Secretariat began work on the Diabetes Strategy Evidence Project, an evidence-based review of the literature surrounding strategies for successful management and treatment of diabetes. This project came about when the Health System Strategy Division at the Ministry of Health and Long-Term Care subsequently asked the secretariat to provide an evidentiary platform for the Ministry’s newly released Diabetes Strategy.
After an initial review of the strategy and consultation with experts, the secretariat identified five key areas in which evidence was needed. Evidence-based analyses have been prepared for each of these five areas: insulin pumps, behavioural interventions, bariatric surgery, home telemonitoring, and community based care. For each area, an economic analysis was completed where appropriate and is described in a separate report.
To review these titles within the Diabetes Strategy Evidence series, please visit the Medical Advisory Secretariat Web site at http://www.health.gov.on.ca/english/providers/program/mas/mas_about.html
Diabetes Strategy Evidence Platform: Summary of Evidence-Based Analyses
Continuous Subcutaneous Insulin Infusion Pumps for Type 1 and Type 2 Adult Diabetics: An Evidence-Based Analysis
Behavioural Interventions for Type 2 Diabetes: An Evidence-Based Analysis
Bariatric Surgery for People with Diabetes and Morbid Obesity: An Evidence-Based Summary
Community-Based Care for the Management of Type 2 Diabetes: An Evidence-Based Analysis
Home Telemonitoring for Type 2 Diabetes: An Evidence-Based Analysis
Application of the Ontario Diabetes Economic Model (ODEM) to Determine the Cost-effectiveness and Budget Impact of Selected Type 2 Diabetes Interventions in Ontario
Objective
The objective of this report is to determine whether behavioural interventions are effective in improving glycemic control in adults with type 2 diabetes.
Background
Diabetes is a serious chronic condition affecting millions of people worldwide and is the sixth leading cause of death in Canada. In 2005, an estimated 8.8% of Ontario’s population had diabetes, representing more than 816,000 Ontarians. The direct health care cost of diabetes was $1.76 billion in the year 2000 and is projected to rise to a total cost of $3.14 billion by 2016. Much of this cost arises from the serious long-term complications associated with the disease including: coronary heart disease, stroke, adult blindness, limb amputations and kidney disease.
Type 2 diabetes accounts for 90–95% of diabetes and while type 2 diabetes is more prevalent in people aged 40 years and older, prevalence in younger populations is increasing due to a rise in obesity and physical inactivity in children.
Data from the United Kingdom Prospective Diabetes Study (UKPDS) has shown that tight glycemic control can significantly reduce the risk of developing serious complications in type 2 diabetics. Despite physicians’ and patients’ knowledge of the importance of glycemic control, Canadian data has shown that only 38% of patients with diabetes have HbA1C levels in the optimal range of 7% or less. This statistic highlights the complexities involved in the management of diabetes, which is characterized by extensive patient involvement in addition to the support provided by physicians. An enormous demand is, therefore, placed on patients to self-manage the physical, emotional and psychological aspects of living with a chronic illness.
Despite differences in individual needs to cope with diabetes, there is general agreement for the necessity of supportive programs for patient self-management. While traditional programs were didactic models with the goal of improving patients’ knowledge of their disease, current models focus on behavioural approaches aimed at providing patients with the skills and strategies required to promote and change their behaviour.
Several meta-analyses and systematic reviews have demonstrated improved health outcomes with self-management support programs in type 2 diabetics. They have all, however, either looked at a specific component of self-management support programs (i.e. self-management education) or have been conducted in specific populations. Most reviews are also qualitative and do not clearly define the interventions of interest, making findings difficult to interpret. Moreover, heterogeneity in the interventions has led to conflicting evidence on the components of effective programs. Policymakers thus face much uncertainty regarding the optimal design and delivery of these programs.
Evidence-Based Analysis of Effectiveness
Research Questions
Are behavioural interventions effective in improving glycemic control in adults with type 2 diabetes?
Is the effectiveness of the intervention impacted by intervention characteristics (e.g. delivery of intervention, length of intervention, mode of instruction, interventionist etc.)?
Inclusion Criteria
English Language
Published between January 1996 and August 2008
Type 2 diabetic adult population (>18 years)
Randomized controlled trials (RCTs)
Systematic reviews, or meta-analyses
Describing a multi-faceted self-management support intervention as defined by the 2007 Self-Management Mapping Guide (1)
Reporting outcomes of glycemic control (HbA1c) with extractable data
Studies with a minimum of 6-month follow up
Exclusion Criteria
Studies with a control group other than usual care
Studies with a sample size <30
Studies without a clearly defined intervention
Outcomes of Interest
Primary outcome: glycemic control (HbA1c)
Secondary outcomes: systolic blood pressure (SBP) control, lipid control, change in smoking status, weight change, quality of life, knowledge, self-efficacy, managing psychosocial aspects of diabetes, assessing dissatisfaction and readiness to change, and setting and achieving diabetes goals.
Search Strategy
A search was performed in OVID MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations, EMBASE, the Cumulative Index to Nursing & Allied Health Literature (CINAHL), The Cochrane Library, and the International Agency for Health Technology Assessment (INAHTA) for studies published between January 1996 and August 2008. Abstracts were reviewed by a single author and studies meeting the inclusion criteria outlined above were obtained. Data on population characteristics, glycemic control outcomes, and study design were extracted. Reference lists were also checked for relevant studies. The quality of the evidence was assessed as being either high, moderate, low, or very low according to the GRADE methodology.
Summary of Findings
The search identified 638 citations published between 1996 and August 2008, of which 12 met the inclusion criteria and one was a meta-analysis (Gary et al. 2003). The remaining 11 studies were RCTs (9 were used in the meta-analysis) and only one was defined as small (total sample size N=47).
Summary of Participant Demographics across studies
A total of 2,549 participants were included in the 11 identified studies. The mean age of participants reported was approximately 58 years and the mean duration of diabetes was approximately 6 years. Most studies reported gender with a mean percentage of females of approximately 67%. Of the eleven studies, two focused only on women and four included only Hispanic individuals. All studies evaluated type 2 diabetes patients exclusively.
Study Characteristics
The studies were conducted between 2002 and 2008. Approximately six of 11 studies were carried out within the USA, with the remaining studies conducted in the UK, Sweden, and Israel (sample size ranged from 47 to 824 participants). The quality of the studies ranged from moderate to low with four of the studies being of moderate quality and the remaining seven of low quality (based on the CONSORT checklist). Differences in quality were mainly due to methodological issues such as inadequate description of randomization, sample size calculation, allocation concealment, blinding and uncertainty of the use of intention-to-treat (ITT) analysis. Patients were recruited from several settings: six studies from primary or general medical practices, three studies from the community (e.g. via advertisements), and two from outpatient diabetes clinics. A usual care control group was reported in nine of 11 of the studies and two studies reported some type of minimal diabetes care in addition to usual care for the control group.
Intervention Characteristics
All of the interventions examined in the studies were mapped to the 2007 Self-management Mapping Guide. The interventions most often focused on problem solving, goal setting and encouraging participants to engage in activities that protect and promote health (e.g. modifying behaviour, change in diet, and increase physical activity). All of the studies examined comprehensive interventions targeting at least two self-care topics (e.g. diet, physical activity, blood glucose monitoring, foot care, etc.). Despite the homogeneity in the aims of the interventions, there was substantial clinical heterogeneity in other intervention characteristics such as duration, intensity, setting, mode of delivery (group vs. individual), interventionist, and outcomes of interest (discussed below).
Duration, Intensity and Mode of Delivery
Intervention durations ranged from 2 days to 1 year, with many falling into the range of 6 to 10 weeks. The rest of the interventions fell into categories of ≤ 2 weeks (2 studies), 6 months (2 studies), or 1 year (3 studies). Intensity of the interventions varied widely from 6 hours over 2 days, to 52 hours over 1 year; however, the majority consisted of interventions of 6 to 15 hours. Both individual and group sessions were used to deliver interventions. Group counselling was used in five studies as a mode of instruction, three studies used both individual and group sessions, and one study used individual sessions as its sole mode of instruction. Three studies also incorporated the use of telephone support as part of the intervention.
Interventionists and Setting
The following interventionists were reported (highest to lowest percentage, categories not mutually exclusive): nurse (36%), dietician (18%), physician (9%), pharmacist (9%), peer leader/community worker (18%), and other (36%). The ‘other’ category included interventionists such as consultants and facilitators with unspecified professional backgrounds. The setting of most interventions was community-based (seven studies), followed by primary care practices (three studies). One study described an intervention conducted in a pharmacy setting.
Outcomes
Duration of follow up of the studies ranged from 6 months to 8 years with a median follow-up duration of 12 months. Nine studies followed up patients at a minimum of two time points. Despite clear reporting of outcomes at follow up time points, there was poor reporting on whether the follow up was measured from participant entry into study or from end of intervention. All studies reported measures of glycemic control, specifically HbA1c levels. BMI was measured in five studies, while body weight was reported in two studies. Cholesterol was examined in three studies and blood pressure reduction in two. Smoking status was only examined in one of the studies. Additional outcomes examined in the trials included patient satisfaction, quality of life, diabetes knowledge, diabetes medication reduction, and behaviour modification (i.e. daily consumption of fruits/vegetables, exercise etc). Meta-analysis of the studies identified a moderate but significant reduction in HbA1c levels (-0.44%; 95% CI: -0.60, -0.29) for behavioural interventions in comparison to usual care for adults with type 2 diabetes. Subgroup analyses suggested the largest effects in interventions which were of at least 1 year in duration and interventions in diabetics with higher baseline HbA1c (≥9.0). The quality of the evidence according to GRADE for the overall estimate was moderate and the quality of evidence for the subgroup analyses was identified as low.
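To show how the pooled HbA1c estimate (-0.44%, 95% CI -0.60 to -0.29) relates to statistical significance, the sketch below back-calculates an approximate standard error and z statistic from the reported interval. It assumes a symmetric normal-approximation interval, which the summary does not confirm, and the function name `se_from_ci` is hypothetical.

```python
from statistics import NormalDist

def se_from_ci(lower, upper, level=0.95):
    """Approximate standard error implied by a symmetric normal-theory CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return (upper - lower) / (2 * z_crit)

# Pooled estimate and interval as reported in the summary.
estimate, lower, upper = -0.44, -0.60, -0.29

se = se_from_ci(lower, upper)
z = estimate / se
p_two_sided = 2 * NormalDist().cdf(-abs(z))
print(round(se, 3), round(z, 2), p_two_sided < 0.001)
```

Under these assumptions the interval excludes zero by a wide margin, consistent with the reduction being described as significant.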
Summary of Meta-Analysis of Studies Investigating the Effectiveness of Behavioural Interventions on HbA1c in Patients with Type 2 Diabetes.
Conclusions
Based on moderate quality evidence, behavioural interventions as defined by the 2007 Self-management mapping guide (Government of Victoria, Australia) produce a moderate reduction in HbA1c levels in patients with type 2 diabetes compared with usual care.
Based on low quality evidence, the interventions with the largest effects are those:
- in diabetics with higher baseline HbA1c (≥9.0)
- in which the interventions were of at least 1 year in duration
PMCID: PMC3377516  PMID: 23074526
