1.  Polysomnography in Patients With Obstructive Sleep Apnea 
Executive Summary
Objective
The objective of this health technology policy assessment was to evaluate the clinical utility and cost-effectiveness of sleep studies in Ontario.
Clinical Need: Target Population and Condition
Sleep disorders are common and obstructive sleep apnea (OSA) is the predominant type. Obstructive sleep apnea is the repetitive complete obstruction (apnea) or partial obstruction (hypopnea) of the collapsible part of the upper airway during sleep. The syndrome is associated with excessive daytime sleepiness or chronic fatigue. Several studies have shown that OSA is associated with hypertension, stroke, and other cardiovascular disorders; many researchers believe that these cardiovascular disorders are consequences of OSA. This has generated increasing interest in sleep studies in recent years.
The Technology Being Reviewed
There is no ‘gold standard’ for the diagnosis of OSA, which makes it difficult to calibrate any diagnostic test. Traditionally, polysomnography (PSG) in an attended setting (sleep laboratory) has been used as a reference standard for the diagnosis of OSA. Polysomnography measures several sleep variables, one of which is the apnea-hypopnea index (AHI) or respiratory disturbance index (RDI). The AHI is defined as the sum of apneas and hypopneas per hour of sleep; apnea is defined as the absence of airflow for ≥ 10 seconds; and hypopnea is defined as reduction in respiratory effort with ≥ 4% oxygen desaturation. The RDI is defined as the sum of apneas, hypopneas, and abnormal respiratory events per hour of sleep. Often the two terms are used interchangeably. The AHI has been widely used to diagnose OSA, although with different cut-off levels, the basis for which is often unclear or arbitrarily determined. Generally, an AHI of more than five events per hour of sleep is considered abnormal and the patient is considered to have a sleep disorder. An abnormal AHI accompanied by excessive daytime sleepiness is the hallmark of OSA diagnosis. For patients diagnosed with OSA, continuous positive airway pressure (CPAP) therapy is the treatment of choice. Polysomnography may also be used to titrate CPAP to individual needs.
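The AHI and RDI defined above are simple event rates, so the choice of denominator matters. The sketch below, using hypothetical event counts, computes both indices and applies the commonly used (but, as noted, arbitrary) cut-off of more than five events per hour. It also divides the same events by time in bed, which is what in-home portable devices report according to the discussion that follows, to show how a patient can fall on either side of the cut-off depending on the denominator. The function names are illustrative only.

```python
def events_per_hour(event_count: int, hours: float) -> float:
    """Events per hour; `hours` is the denominator (sleep time or time in bed)."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return event_count / hours


def ahi(apneas: int, hypopneas: int, sleep_hours: float) -> float:
    """Apnea-hypopnea index: apneas plus hypopneas per hour of sleep."""
    return events_per_hour(apneas + hypopneas, sleep_hours)


def rdi(apneas: int, hypopneas: int, other_events: int, sleep_hours: float) -> float:
    """Respiratory disturbance index: apneas, hypopneas and other abnormal events per hour of sleep."""
    return events_per_hour(apneas + hypopneas + other_events, sleep_hours)


def is_abnormal(index: float, cutoff: float = 5.0) -> bool:
    """Flag an index above the commonly used cut-off of 5 events per hour."""
    return index > cutoff


# Hypothetical night: 13 apneas and 20 hypopneas, 6 h of sleep, 7.5 h in bed.
per_hour_of_sleep = ahi(13, 20, sleep_hours=6.0)
per_hour_in_bed = events_per_hour(13 + 20, 7.5)
print(per_hour_of_sleep, is_abnormal(per_hour_of_sleep))  # 5.5 True
print(per_hour_in_bed, is_abnormal(per_hour_in_bed))      # 4.4 False
```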
In January 2005, the College of Physicians and Surgeons of Ontario published the second edition of Independent Health Facilities: Clinical Practice Parameters and Facility Standards: Sleep Medicine, commonly known as “The Sleep Book.” The Sleep Book states that OSA is the most common primary respiratory sleep disorder and a full overnight sleep study is considered the current standard test for individuals in whom OSA is suspected (based on clinical signs and symptoms), particularly if CPAP or surgical therapy is being considered.
Polysomnography in a sleep laboratory is time-consuming and expensive. With the evolution of technology, portable devices have emerged that measure, in the home, more or less the same sleep variables as are measured in sleep laboratories. Newer CPAP devices also have auto-titration features and can record sleep variables, including AHI. If equally accurate, these devices may reduce dependency on sleep laboratories for the diagnosis of OSA and the titration of CPAP, and thus may be more cost-effective.
Difficulties arise, however, when trying to assess and compare the diagnostic efficacy of in-home PSG versus in-lab PSG. The AHI measured by portable devices in the home is the sum of apneas and hypopneas per hour of time in bed, rather than per hour of sleep, and the absolute diagnostic efficacy of in-lab PSG is unknown. To compare in-home PSG with in-lab PSG, several researchers have used correlation coefficients or sensitivity and specificity, while others have used Bland-Altman plots or receiver operating characteristic (ROC) curves. All these approaches, however, have potential pitfalls. Correlation coefficients do not measure agreement; sensitivity and specificity are not helpful when the true disease status is unknown; and Bland-Altman plots measure agreement but are helpful only when the range of clinical equivalence is known. Lastly, ROC curves are generated using logistic regression with the true disease status as the dependent variable and test values as the independent variable. Each value of the test is used as a cut-point to calculate sensitivity and specificity, which are then plotted on an x-y plane. The cut-point that maximizes both sensitivity and specificity is chosen as the cut-off level to discriminate between disease and no-disease states. In the absence of a gold standard to determine the true disease status, ROC curves are of minimal value.
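The point that correlation coefficients do not measure agreement can be shown with a small sketch. It assumes hypothetical paired AHI values in which a portable device always reads exactly double the in-lab value: the Pearson correlation is perfect, while a Bland-Altman analysis (bias as the mean difference, 95% limits of agreement as bias ± 1.96 SD of the differences, the conventional choice) exposes the systematic over-reading. Whether the resulting limits are acceptable still depends on a clinically defined range of equivalence, which the code cannot supply.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of paired measurements x and y."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(x, y):
    """Bias (mean of y - x) and 95% limits of agreement (bias +/- 1.96 SD)."""
    diffs = [b - a for a, b in zip(x, y)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired AHI values: the device always reads exactly double.
in_lab = [2.0, 6.0, 10.0, 20.0, 35.0]
in_home = [2 * v for v in in_lab]

r = pearson_r(in_lab, in_home)
bias, lower, upper = bland_altman(in_lab, in_home)
print(f"r = {r:.3f}")  # r = 1.000, i.e. perfect correlation
print(f"bias = {bias:.1f} events/h, limits of agreement {lower:.1f} to {upper:.1f}")
```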
At the request of the Ontario Health Technology Advisory Committee (OHTAC), the Medical Advisory Secretariat (MAS) reviewed the literature on PSG published over the last two years to examine new developments.
Methods
Review Strategy
There is a large body of literature on sleep studies, and several reviews have been conducted. Two large cohort studies, the Sleep Heart Health Study and the Wisconsin Sleep Cohort Study, are the main sources of evidence in the sleep literature.
To examine new developments on PSG published in the past two years, MEDLINE, EMBASE, MEDLINE In-Process & Other Non-Indexed Citations, the Cochrane Database of Systematic Reviews and Cochrane CENTRAL, INAHTA, and websites of other health technology assessment agencies were searched. Any study that reported results of in-home or in-lab PSG was included. All articles that reported findings from the Sleep Heart Health Study and the Wisconsin Sleep Cohort Study were also reviewed.
Diffusion of Sleep Laboratories
To estimate the diffusion of sleep laboratories, a list of sleep laboratories licensed under the Independent Health Facility Act was obtained. The annual number of sleep studies per 100,000 individuals in Ontario from 2000 to 2004 was also estimated using administrative databases.
Summary of Findings
Literature Review
A total of 315 articles were identified that were published in the past two years; 227 were excluded after reviewing titles and abstracts. A total of 59 articles were identified that reported findings of the Sleep Heart Health Study and the Wisconsin Sleep Cohort Study.
Prevalence
Based on cross-sectional data from the Wisconsin Sleep Cohort Study of 602 men and women aged 30 to 60 years, the prevalence of sleep-disordered breathing, defined as an AHI of more than five events per hour of sleep, is estimated to be 9% in women and 24% in men. Among the women with sleep-disordered breathing, 22.6% had daytime sleepiness; among the men, 15.5% had daytime sleepiness. On this basis, the prevalence of OSA in the middle-aged adult population is estimated to be 2% in women and 4% in men.
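The OSA prevalence figures appear to be the product of two proportions: the prevalence of sleep-disordered breathing and the fraction of those affected who report daytime sleepiness. The short calculation below reproduces them under that assumption; the men's figure comes out at 3.7%, which the summary rounds to 4%.

```python
# OSA prevalence = P(sleep-disordered breathing) * P(daytime sleepiness | SDB),
# using the Wisconsin Sleep Cohort proportions quoted above.
sdb_prevalence = {"women": 0.09, "men": 0.24}
sleepy_given_sdb = {"women": 0.226, "men": 0.155}

osa_prevalence = {
    sex: sdb_prevalence[sex] * sleepy_given_sdb[sex] for sex in sdb_prevalence
}

for sex, p in osa_prevalence.items():
    print(f"{sex}: {p:.1%}")
# women: 2.0%
# men: 3.7%
```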
Snoring is present in 94% of OSA patients, but not all snorers have OSA. Women report daytime sleepiness less often compared with their male counterparts (of similar age, body mass index [BMI], and AHI). Prevalence of OSA tends to be higher in older age groups compared with younger age groups.
Diagnostic Value of Polysomnography
It is believed that PSG in the sleep laboratory is more accurate than in-home PSG. In the absence of a gold standard, however, claims of accuracy cannot be substantiated. In general, there is poor correlation between PSG variables and clinical variables. A variety of cut-off points of AHI (> 5, > 10, and > 15) are arbitrarily used to diagnose and categorize severity of OSA, though the clinical importance of these cut-off points has not been determined.
Recently, a study of the use of a therapeutic trial of CPAP to diagnose OSA was reported. The authors studied habitual snorers with daytime sleepiness in the absence of other medical or psychiatric disorders. Using PSG as the reference standard, the authors calculated the sensitivity of this test to be 80% and its specificity to be 97%. Further, they concluded that PSG could be avoided in 46% of this population.
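Sensitivity and specificity here are computed against PSG as the reference standard, using the usual 2×2-table definitions. The sketch below uses hypothetical counts (not the study's data) purely to show the arithmetic; the results inherit whatever error PSG itself has, which is the limitation raised earlier about sensitivity and specificity when true disease status is unknown.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of reference-positive patients (by PSG) that the index test calls positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of reference-negative patients (by PSG) that the index test calls negative."""
    return tn / (tn + fp)


# Hypothetical 2x2 table for a CPAP-trial result against PSG:
# 45 true positives, 5 false negatives, 80 true negatives, 20 false positives.
print(sensitivity(tp=45, fn=5))   # 0.9
print(specificity(tn=80, fp=20))  # 0.8
```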
Obstructive Sleep Apnea and Obesity
Obstructive sleep apnea is strongly associated with obesity. Obese individuals (BMI > 30 kg/m²) are at higher risk for OSA than non-obese individuals, and up to 75% of OSA patients are obese. It is hypothesized that obese individuals have large deposits of fat in the neck that cause the upper airway to collapse in the supine position during sleep. Observations from several studies support the hypothesis that AHIs (or RDIs) are significantly reduced with weight loss in obese individuals.
Obstructive Sleep Apnea and Cardiovascular Diseases
Associations have been shown between OSA and comorbidities such as diabetes mellitus and hypertension, which are known risk factors for myocardial infarction and stroke. Patients with more severe forms of OSA (based on AHI) report poorer quality of life and increased health care utilization compared with patients with milder forms of OSA. From animal models, it is hypothesized that sleep fragmentation results in glucose intolerance and hypertension. There is, however, no evidence from prospective studies in humans to establish a causal link between OSA and hypertension or diabetes mellitus. It is also not clear that the associations between OSA and other diseases are independent of obesity; in most of these studies, patients with higher values of AHI had higher values of BMI compared with patients with lower AHI values.
A recent meta-analysis of bariatric surgery has shown that weight loss in obese individuals (mean BMI = 46.8 kg/m²; range = 32.30–68.80 kg/m²) significantly improved their health profile. Diabetes was resolved in 76.8% of patients, hypertension was resolved in 61.7% of patients, hyperlipidemia improved in 70% of patients, and OSA resolved in 85.7% of patients. This suggests that obesity leads to OSA, diabetes, and hypertension, rather than OSA independently causing diabetes and hypertension.
Health Technology Assessments, Guidelines, and Recommendations
In April 2005, the Centers for Medicare and Medicaid Services (CMS) in the United States published its decision and review regarding in-home and in-lab sleep studies for the diagnosis and treatment of OSA with CPAP. In order to cover CPAP, CMS requires that a diagnosis of OSA be established using PSG in a sleep laboratory. After reviewing the literature, CMS concluded that the evidence was not adequate to determine that unattended portable sleep study was reasonable and necessary in the diagnosis of OSA.
In May 2005, the Canadian Coordinating Office of Health Technology Assessment (CCOHTA) published a review of guidelines for referral of patients to sleep laboratories. The review included 37 guidelines and associated reviews that covered 18 applications of sleep laboratory studies. The CCOHTA reported that the level of evidence for many applications was of limited quality, that some cited studies were not relevant to the recommendations made, that many recommendations reflect consensus positions only, and that there was a need for more good quality studies of many sleep laboratory applications.
Diffusion
As of the time of writing, there are 97 licensed sleep laboratories in Ontario. In 2000, the number of sleep studies performed in Ontario was 376/100,000 people. There was a steady rise in sleep studies in the following years such that in 2004, 769 sleep studies per 100,000 people were performed, for a total of 96,134 sleep studies. Based on prevalence estimates of the Wisconsin Sleep Cohort Study, it was estimated that 927,105 people aged 30 to 60 years have sleep-disordered breathing. Thus, there may be a 10-fold rise in the rate of sleep tests in the next few years.
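The diffusion figures can be cross-checked with a few ratios. The sketch assumes that the "10-fold" statement compares the estimated number of people aged 30 to 60 with sleep-disordered breathing to the number of sleep studies performed in 2004; that reading is an interpretation, not something the summary states explicitly. The implied provincial population of about 12.5 million follows from the 2004 count and rate.

```python
# Back-of-envelope checks on the diffusion figures quoted above (Ontario, 2004).
studies_2004 = 96_134          # sleep studies performed
rate_2000 = 376                # studies per 100,000 people
rate_2004 = 769                # studies per 100,000 people
sdb_aged_30_to_60 = 927_105    # estimated people with sleep-disordered breathing

implied_population = studies_2004 / rate_2004 * 100_000   # about 12.5 million
growth_2000_to_2004 = rate_2004 / rate_2000               # about a 2-fold rise
sdb_to_studies_ratio = sdb_aged_30_to_60 / studies_2004   # about 9.6, i.e. ~10-fold

print(f"implied population: {implied_population:,.0f}")
print(f"rate growth 2000-2004: {growth_2000_to_2004:.2f}x")
print(f"people with SDB per study performed: {sdb_to_studies_ratio:.1f}")
```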
Economic Analysis
In 2004, approximately 96,000 sleep studies were conducted in Ontario at a total cost of ~$47 million (Cdn). Since obesity is associated with sleep-disordered breathing, MAS compared the costs of sleep studies with the cost of bariatric surgery. The cost of bariatric surgery is $17,350 per patient. In 2004, Ontario spent $4.7 million for 270 patients to undergo bariatric surgery in the province, and $8.2 million for 225 patients to seek out-of-country treatment. Using a Markov model, it was concluded that shifting costs from sleep studies to bariatric surgery would benefit more patients with OSA and may also prevent health consequences related to diabetes, hypertension, and hyperlipidemia. The annual cost of treating comorbid conditions in morbidly obese patients is estimated to often exceed $10,000 per patient. Thus, the downstream cost savings could be substantial.
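The cost figures support a few simple ratios. The sketch below is plain arithmetic on the numbers quoted above, not a reconstruction of the Markov model used in the analysis: it gives the average cost per sleep study, the per-patient spend for in-province and out-of-country bariatric surgery (the in-province figure is consistent with the stated $17,350 per patient), and how many surgeries the 2004 sleep-study budget would cover at that price.

```python
# Simple ratios from the 2004 cost figures (Canadian dollars).
# Arithmetic only; this is not the Markov model used in the analysis.
sleep_study_budget = 47_000_000
sleep_studies = 96_000
surgery_cost = 17_350
in_province_spend, in_province_patients = 4_700_000, 270
out_of_country_spend, out_of_country_patients = 8_200_000, 225

cost_per_sleep_study = sleep_study_budget / sleep_studies                     # ~ $490
in_province_per_patient = in_province_spend / in_province_patients           # ~ $17,400
out_of_country_per_patient = out_of_country_spend / out_of_country_patients  # ~ $36,400
surgeries_fundable = sleep_study_budget // surgery_cost                       # whole surgeries

print(f"cost per sleep study: ${cost_per_sleep_study:,.0f}")
print(f"in-province surgery per patient: ${in_province_per_patient:,.0f}")
print(f"out-of-country surgery per patient: ${out_of_country_per_patient:,.0f}")
print(f"surgeries covered by the sleep-study budget: {surgeries_fundable:,}")
```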
Considerations for Policy Development
Weight loss is associated with a decrease in OSA severity. Treating and preventing obesity would also substantially reduce the economic burden associated with diabetes, hypertension, hyperlipidemia, and OSA. Promotion of healthy weights may be achieved by a multisectoral approach, as recommended by the Chief Medical Officer of Health for Ontario. Bariatric surgery has the potential to help morbidly obese individuals (BMI > 35 kg/m² with an accompanying comorbid condition, or BMI > 40 kg/m²) lose weight. In January 2005, MAS completed an assessment of bariatric surgery, on the basis of which OHTAC recommended improved access to these surgeries for morbidly obese patients in Ontario.
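The morbid obesity definition above can be written as a small rule on top of the BMI formula (weight in kilograms divided by height in metres squared). This is a sketch of the stated definition only, with a hypothetical patient; it is not a surgical eligibility tool, and treating OSA itself as the qualifying comorbid condition is an assumption for the example.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def is_morbidly_obese(bmi_value: float, has_comorbidity: bool) -> bool:
    """Definition quoted above: BMI > 40, or BMI > 35 with a comorbid condition."""
    return bmi_value > 40 or (bmi_value > 35 and has_comorbidity)


# Hypothetical patient: 120 kg, 1.75 m tall; the comorbidity flag is assumed (e.g. OSA).
value = bmi(120, 1.75)
print(f"BMI = {value:.1f} kg/m^2")                      # BMI = 39.2 kg/m^2
print(is_morbidly_obese(value, has_comorbidity=True))   # True
print(is_morbidly_obese(value, has_comorbidity=False))  # False
```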
Habitual snorers with excessive daytime sleepiness have a high pretest probability of having OSA. These patients could be offered a therapeutic trial of CPAP to diagnose OSA, rather than a PSG. A majority of these patients are also obese and may benefit from weight loss. Individualized weight loss programs should, therefore, be offered and patients who are morbidly obese should be offered bariatric surgery.
That said, given the still-evolving understanding of the causes, consequences, and optimal treatment of OSA, further research is warranted to identify which patients should be screened for OSA.
PMCID: PMC3379160  PMID: 23074483
2.  A Systematic Review of Statistical Methods Used to Test for Reliability of Medical Instruments Measuring Continuous Variables 
Objective(s): Reliability measures precision, or the extent to which test results can be replicated. This is the first systematic review to identify the statistical methods used to measure the reliability of equipment measuring continuous variables. The study also aims to highlight inappropriate statistical methods used in reliability analysis and their implications for medical practice.
Materials and Methods: In 2010, five electronic databases were searched for reliability studies published between 2007 and 2009. A total of 5,795 titles were initially identified. Only 282 titles were potentially relevant, and 42 finally met the inclusion criteria.
Results: The Intra-class Correlation Coefficient (ICC) was the most popular method, used in 25 (60%) studies, followed by comparison of means (8 studies, 19%). Of the 25 studies using the ICC, only 7 (28%) reported the confidence intervals and the type of ICC used. Most studies (71%) also tested the agreement of instruments.
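Reporting the type of ICC matters because the forms answer different questions. As one illustration, the sketch below computes ICC(2,1) in Shrout and Fleiss's notation (two-way random effects, absolute agreement, single measurement) from the ANOVA mean squares; the summary does not say which forms the reviewed studies used. With a hypothetical second instrument that reads exactly one unit higher, ICC(2,1) falls to 0.667, whereas a consistency-type ICC such as ICC(3,1) would still be 1 because the error term is zero.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` has one row per subject and one column per instrument (k >= 2),
    with no missing values.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between instruments
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


ratings_identical = [[1, 1], [2, 2], [3, 3]]
ratings_offset = [[1, 2], [2, 3], [3, 4]]  # second instrument reads 1 unit higher
print(f"{icc_2_1(ratings_identical):.3f}")  # 1.000
print(f"{icc_2_1(ratings_offset):.3f}")     # 0.667
```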
Conclusion: This study finds that the Intra-class Correlation Coefficient is the most popular method used to assess the reliability of medical instruments measuring continuous outcomes. There are also inappropriate applications and interpretations of statistical methods in some studies. It is important for medical researchers to be aware of this issue, and be able to correctly perform analysis in reliability studies.
PMCID: PMC3758037  PMID: 23997908
ICC; Intra-class correlation coefficient; Reliability; Statistical method; Validation study
3.  Lack of agreement between tonometric and gastric juice partial carbon dioxide tension 
Critical Care  2000;4(4):249-254.
Our goal was to compare measurements of tonometered saline and gastric juice partial carbon dioxide tension (PCO2). In this prospective observational study, 112 pairs of measurements were simultaneously obtained under various hemodynamic conditions in 15 critical care patients. Linear regression analysis showed a significant correlation between the two methods of measuring PCO2 (r² = 0.43; P < 0.0001). However, gastric juice PCO2 was systematically higher (mean difference 51 mmHg). The 95% limits of agreement were 315 mmHg and the dispersion increased as the values of PCO2 increased. Tonometric and gastric juice PCO2 cannot be used interchangeably. Gastric juice PCO2 measurement should be interpreted with caution.
Introduction:
In recent years there has been growing interest in tonometric estimation of gastric intramucosal pH (pHi). More recently, attention has focused on the gradient between intraluminal and arterial PCO2. pHi appears to be a useful diagnostic and prognostic tool in critically ill patients, and may also be used as a therapeutic guide. However, intraluminal PCO2 is the parameter measured to calculate pHi, and it is assumed to be equivalent to the PCO2 of the upper layers of the gastric mucosa.
Direct measurement of PCO2 in gastric juice might offer advantages over tonometry. Tonometer costs could be saved, and equilibration time would no longer be necessary. Additionally, preanalytic factors that account for poor reproducibility, such as inadequate volume of saline in the tonometer, errors in the dwell time of the sample or in the technique used to aspirate saline, mixing of the sample with tonometer dead space and delay in analysis, could be prevented. Nevertheless, to our knowledge few experimental or clinical studies have examined PCO2 in gastric juice. Moreover, no comparison with simultaneous tonometric samples has been performed. Our goal was to compare simultaneous measurement of PCO2 in gastric juice and in saline samples from a tonometer. Data from the present study show that gastric juice PCO2 is systematically higher. Furthermore, differences widen at high PCO2 values, and data dispersion becomes even more striking. Therefore, tonometric PCO2 and gastric juice PCO2 are not interchangeable.
Patients and methods:
The present study was approved by the local ethics committee, and informed consent was obtained from the next of kin of each patient.
We studied 15 consecutive mechanically ventilated patients from a medical/surgical intensive care unit in whom tonometric monitoring was indicated by the attending physicians. All patients were receiving 50 mg intravenous ranitidine every 8 h. Gastric tonometers were filled with saline, which was extracted after 90 min of equilibration time. At the same time, gastric juice was anaerobically extracted from the aspiration port of the tonometer; the initial 20 ml was discarded. PCO2 in both samples was measured using a blood gas analyzer (AVL 945; AVL List GmbH, Graz, Austria). These measurements were taken at various time points in each patient and under various haemodynamic and oxygen transport conditions. All measurements were performed with the patient fasted. Agreement between the two measurements was examined using the Bland-Altman technique.
We also performed an in vitro study to quantify the precision and bias for the AVL 945. For this purpose, a stable PCO2 in saline solution was achieved by bubbling 5% carbon dioxide calibration gas.
Results:
We performed 112 pairs of measurements in 15 patients. Table 1 shows clinical data and the first values of arterial, tonometered and gastric juice PCO2 for each patient. Regression analysis demonstrated a significant correlation between the two methods of measuring PCO2 (r² = 0.43; gastric juice PCO2 = -28.79 + [2.55 × tonometric PCO2]; P < 0.0001; Fig. 1). However, the bias, calculated as the mean difference between gastric juice and tonometric PCO2, was 51 mmHg. The 95% limits of agreement were 315 mmHg (Fig. 2). For mean PCO2 values less than 100 mmHg, the bias and the 95% limits of agreement were 19 and 102 mmHg, respectively. As mean PCO2 increased, the scatter of the differences widened (r² = 0.71; P < 0.0001).
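The reported regression line helps explain why the differences widen at high PCO2: with a slope of 2.55, the fitted gastric juice value exceeds the tonometric value above a crossover point and the gap grows with PCO2. The sketch below evaluates the published line at a few illustrative tonometric values; it describes the fitted line only and should not be read as a calibration for converting one measurement into the other.

```python
# Fitted line reported above: gastric juice PCO2 = -28.79 + 2.55 * tonometric PCO2 (mmHg).
INTERCEPT = -28.79
SLOPE = 2.55


def predicted_gastric_pco2(tonometric_pco2: float) -> float:
    """Gastric juice PCO2 predicted by the reported regression line."""
    return INTERCEPT + SLOPE * tonometric_pco2


# Tonometric PCO2 at which the fitted line crosses the identity line (y = x).
crossover = -INTERCEPT / (SLOPE - 1)   # about 18.6 mmHg

for tono in (30, 40, 60, 80):
    gastric = predicted_gastric_pco2(tono)
    print(f"tonometric {tono} mmHg -> fitted gastric {gastric:.1f} mmHg "
          f"(difference {gastric - tono:.1f} mmHg)")
```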
To avoid the bias related to multiple measurements per patient, we also performed Bland-Altman analysis using only the first measurement from each patient. The results remained similar (bias 55 mmHg, 95% limits of agreement 216 mmHg).
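Restricting the analysis to the first measurement per patient keeps each patient from contributing many correlated pairs. A minimal sketch of that step follows, assuming records are stored as (patient_id, tonometric PCO2, gastric juice PCO2) tuples in chronological order; the record layout, patient IDs and values are hypothetical. Bias and limits use the conventional bias ± 1.96 SD of the differences.

```python
import statistics


def first_per_patient(records):
    """Keep only the first (tonometric, gastric) pair for each patient.

    `records` is a list of (patient_id, tonometric_pco2, gastric_pco2) tuples,
    assumed to be in chronological order within each patient.
    """
    first = {}
    for patient_id, tonometric, gastric in records:
        first.setdefault(patient_id, (tonometric, gastric))
    return list(first.values())


def bias_and_limits(pairs):
    """Bland-Altman bias (mean of gastric - tonometric) and bias +/- 1.96 SD."""
    diffs = [gastric - tonometric for tonometric, gastric in pairs]
    bias = statistics.fmean(diffs)
    half_width = 1.96 * statistics.stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical records in mmHg; patient p1 was sampled twice.
records = [
    ("p1", 40, 52),
    ("p1", 45, 110),
    ("p2", 38, 60),
    ("p3", 50, 70),
]
pairs = first_per_patient(records)   # [(40, 52), (38, 60), (50, 70)]
bias, lower, upper = bias_and_limits(pairs)
print(f"bias {bias:.1f} mmHg, limits {lower:.1f} to {upper:.1f} mmHg")
```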
The AVL 945 blood gas analyzer showed a negative bias of 0.97 mmHg and a precision of 2.13 mmHg. This bias was considered negligible, so no further correction was made to saline tonometric values.
Discussion:
The results of the present study show that tonometric PCO2 and gastric juice PCO2 are not interchangeable. Gastric juice PCO2 is systematically higher. At high PCO2 values the differences widen, and data dispersion becomes even more marked.
There is no clear cause for these observations. One possible explanation is that tonometric PCO2 is generated over a time interval, whereas gastric juice PCO2 might reflect rapid changes in mucosal metabolism. Differences in equilibration time could also account for data dispersion, but not for the positive bias of gastric juice, because rapid changes should occur in both directions.
Another potential confounding factor is the ability of blood gas analyzers to measure PCO2 in gastric juice. Measurement of PCO2 in 0.9% saline is an important source of error in the estimation of pHi. Variation in PCO2 values may occur with different PCO2 equilibration solutions. For example, bias is -66.5% when the Nova Stat Profile 7 blood gas analyzer (Nova Biomedical, Waltham, MA, USA) measures concentration of 1.95% of CO2 equilibrated in normal saline. However, bias changes to +45.4% when 1.95% CO2 is equilibrated in human albumin solution 4.5%.
It would not be surprising if gastric juice components such as proteins, mucopolysaccharides and others interfered with CO2 solubility and its subsequent measurement by blood gas analyzers. In this way, intersubject and intrasubject variation in gastric juice composition could also account for data dispersion. Fiddian-Green et al [1] measured PCO2 in the gastric contents of anaesthetized dogs. They isolated the stomach from the oesophagus and the duodenum with ligatures and washed it through a catheter with saline. They then instilled 250 ml 0.9% saline and took samples to measure PCO2 and to estimate pHi. Simultaneously, mucosal pH was recorded with a microglass probe. They found a statistically significant correlation between the two methods; however, data dispersion in the graph was considerable.
We were able to exclude analyzer underestimation of PCO2 in saline as the cause for the present results. In vitro performance of the AVL 945 in blood was good. It showed a negative bias less than 1 mmHg and a precision of about 2 mmHg.
We cannot infer from the present data which technique should be the gold standard for measuring PCO2 in the gastric mucosa. However, the studies that established the normal values for pHi, its prognostic changes and its use as a therapeutic index were performed with tonometry. Hence, more data are needed before PCO2 in gastric juice can be measured routinely.
Figure 1. Correlation between gastric juice and tonometric PCO2. We performed 112 pairs of measurements of gastric juice and tonometric PCO2 in 15 critical care patients under different haemodynamic and oxygen transport conditions. The linear regression coefficient is significant. However, the slope value indicates systematic overestimation of gastric juice PCO2 in relation to saline PCO2.
Figure 2. Bland-Altman analysis of the differences between gastric juice and tonometric PCO2. The bias, calculated as the mean difference between gastric juice and tonometric PCO2, was 51 mmHg. The 95% limits of agreement were 315 mmHg. The bias and the scatter of the differences increased as PCO2 increased.
Table 1. Clinical characteristics and first values of arterial, tonometer and gastric juice PCO2
ARDS, acute respiratory distress syndrome.
PMCID: PMC29045  PMID: 11056754
gastric tonometry; intramucosal partial carbon dioxide tension; intramucosal pH
4.  Epidemiology and Reporting Characteristics of Systematic Reviews 
PLoS Medicine  2007;4(3):e78.
Background
Systematic reviews (SRs) have become increasingly popular among a wide range of stakeholders. We set out to capture a representative cross-sectional sample of published SRs and examine them in terms of a broad range of epidemiological, descriptive, and reporting characteristics, including emerging aspects not previously examined.
Methods and Findings
We searched Medline for SRs indexed during November 2004 and written in English. Citations were screened and those meeting our inclusion criteria were retained. Data were collected using a 51-item data collection form designed to assess the epidemiological and reporting details and the bias-related aspects of the reviews. The data were analyzed descriptively. In total 300 SRs were identified, suggesting a current annual publication rate of about 2,500, involving more than 33,700 separate studies including one-third of a million participants. The majority (272 [90.7%]) of SRs were reported in specialty journals. Most reviews (213 [71.0%]) were categorized as therapeutic, and included a median of 16 studies involving 1,112 participants. Funding sources were not reported in more than one-third (122 [40.7%]) of the reviews. Reviews typically searched a median of three electronic databases and two other sources, although only about two-thirds (208 [69.3%]) of them reported the years searched. Most (197/295 [66.8%]) reviews reported information about quality assessment, while few (68/294 [23.1%]) reported assessing for publication bias. A little over half (161/300 [53.7%]) of the SRs reported combining their results statistically, of which most (147/161 [91.3%]) assessed for consistency across studies. Few (53 [17.7%]) SRs reported being updates of previously completed reviews. No review had a registration number. Only half (150 [50.0%]) of the reviews used the term “systematic review” or “meta-analysis” in the title or abstract. There were large differences between Cochrane reviews and non-Cochrane reviews in the quality of reporting several characteristics.
Conclusions
SRs are now produced in large numbers, and our data suggest that the quality of their reporting is inconsistent. This situation might be improved if more widely agreed upon evidence-based reporting guidelines were endorsed and adhered to by authors and journals. These results substantiate the view that readers should not accept SRs uncritically.
Data were collected on the epidemiological, descriptive, and reporting characteristics of recent systematic reviews. A descriptive analysis found inconsistencies in the quality of reporting.
Editors' Summary
Background.
In health care it is important to assess all the evidence available about what causes a disease or the best way to prevent, diagnose, or treat it. Decisions should not be made simply on the basis of—for example—the latest or biggest research study, but after a full consideration of the findings from all the research of good quality that has so far been conducted on the issue in question. This approach is known as “evidence-based medicine” (EBM). A report that is based on a search for studies addressing a clearly defined question, a quality assessment of the studies found, and a synthesis of the research findings, is known as a systematic review (SR). Conducting an SR is itself regarded as a research project and the methods involved can be quite complex. In particular, as with other forms of research, it is important to do everything possible to reduce bias. The leading role in developing the SR concept and the methods that should be used has been played by an international network called the Cochrane Collaboration (see “Additional Information” below), which was launched in 1992. However, SRs are now becoming commonplace. Many articles published in journals and elsewhere are described as being systematic reviews.
Why Was This Study Done?
Since systematic reviews are claimed to be the best source of evidence, it is important that they be well conducted and that bias should not have influenced the conclusions drawn in the review. The fact that the authors of a paper discussing evidence on a particular topic claim to have done their review “systematically” does not guarantee that their methods were sound and that their report is of good quality. However, if they have reported details of their methods, users of the review can more easily decide whether its conclusions can be relied on. The authors of this PLoS Medicine article wanted to find out how many SRs are now being published, where they are being published, and what questions they are addressing. They also wanted to see how well the methods of SRs are being reported.
What Did the Researchers Do and Find?
They picked one month and looked for all the SRs added to the main list of medical literature in that month. They found 300, on a range of topics and in a variety of medical journals. They estimate that about 20% of reviews appearing each year are published by the Cochrane Collaboration. They found many cases in which important aspects of the methods used were not reported. For example, about a third of the SRs did not report how (if at all) the quality of the studies found in the search had been assessed. An important assessment, which analyzes for “publication bias,” was reported as having been done in only about a quarter of the cases. Most of the reporting failures were in the “non-Cochrane” reviews.
What Do These Findings Mean?
The authors concluded that the standards of reporting of SRs vary widely and that readers should, therefore, not accept the conclusions of SRs uncritically. To improve this situation, they urge that guidelines be drawn up regarding how SRs are reported. The writers of SRs and also the journals that publish them should follow these guidelines.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0040078.
An editorial discussing this research article and its relevance to medical publishing appears in the same issue of PLoS Medicine.
A good source of information on the evidence-based approach to medicine is the James Lind Library.
The Web site of the Cochrane Collaboration is a good source of information on systematic reviews. In particular there is a newcomers' guide and information for health care “consumers”. From this Web site, it is also possible to see summaries of the SRs published by the Cochrane Collaboration (readers in some countries can also view the complete SRs free of charge).
Information on the practice of evidence-based medicine is available from the US Agency for Healthcare Research and Quality and the Canadian Agency for Drugs and Technologies in Health.
doi:10.1371/journal.pmed.0040078
PMCID: PMC1831728  PMID: 17388659
5.  Point-of-Care International Normalized Ratio (INR) Monitoring Devices for Patients on Long-term Oral Anticoagulation Therapy 
Executive Summary
Subject of the Evidence-Based Analysis
The purpose of this evidence-based analysis report is to examine the safety and effectiveness of point-of-care (POC) international normalized ratio (INR) monitoring devices for patients on long-term oral anticoagulation therapy (OAT).
Clinical Need: Target Population and Condition
Long-term OAT is typically required by patients with mechanical heart valves, chronic atrial fibrillation, venous thromboembolism, myocardial infarction, stroke, and/or peripheral arterial occlusion. Approximately 1% of the population is estimated to receive anticoagulation treatment; applying this figure to Ontario gives an estimated 132,000 patients on OAT in the province, a number that is expected to increase with the aging population.
Patients on OAT are regularly monitored and their medications adjusted to ensure that their INR remains in the therapeutic range. This can be challenging because of warfarin's narrow therapeutic window and variation in individual responses. Optimal INR values depend on the underlying indication for treatment and on patient-level characteristics, but for most patients the therapeutic range is an INR of 2.0 to 3.0.
The current standard of care in Ontario for patients on long-term OAT is laboratory-based INR determination, with management carried out by primary care physicians or anticoagulation clinics (ACCs). Patients regularly visit a hospital or community-based facility to provide venous blood samples (venipuncture), which are then sent to a laboratory for INR analysis.
Experts, however, have commented that OAT may be under-utilized because of patient factors, physician factors, or regional practice variations, and that patient management may be sub-optimal. There are currently no population-based Ontario data with which to assess patient care, but recent systematic reviews have estimated that less than 50% of patients receive OAT on a routine basis and that patients are in the therapeutic range only 64% of the time.
Overview of POC INR Devices
POC INR devices offer an alternative to laboratory-based testing and venipuncture, enabling INR determination from a fingerstick sample of whole blood. Independent evaluations have shown POC devices to have an acceptable level of precision. They permit INR results to be determined immediately, allowing for more rapid medication adjustments.
POC devices can be used in a variety of settings including physician offices, ACCs, long-term care facilities, pharmacies, or by the patients themselves through self-testing (PST) or self-management (PSM) techniques. With PST, patients measure their INR values and then contact their physician for instructions on dose adjustment, whereas with PSM, patients adjust the medication themselves based on pre-set algorithms. These models are not suitable for all patients and require the identification and education of suitable candidates.
Potential advantages of POC devices include improved convenience for patients, better treatment compliance and satisfaction, more frequent monitoring, and fewer thromboembolic and hemorrhagic complications. Potential disadvantages include a tendency to underestimate high INR values and overestimate low INR values, low thromboplastin sensitivity, inability to calculate a mean normal prothrombin time (PT), and, with certain instruments, errors in INR determination in patients with antiphospholipid antibodies. Although treatment satisfaction and quality of life (QoL) may improve with POC INR monitoring, some patients may experience increased anxiety or preoccupation with their disease.
Evidence-Based Analysis Methods
Research Questions
1. Effectiveness
Does POC INR monitoring improve clinical outcomes in various settings compared with standard laboratory-based testing?
Does POC INR monitoring affect patient satisfaction, QoL, compliance, acceptability, and convenience compared with standard laboratory-based INR determination?
Settings include primary care settings with use of POC INR devices by general practitioners or nurses, ACCs, pharmacies, long-term care homes, and use by the patient either for PST or PSM.
2. Cost-effectiveness
What is the cost-effectiveness of POC INR monitoring devices in various settings compared to standard laboratory-based INR determination?
Inclusion Criteria
English-language RCTs, systematic reviews, and meta-analyses
Publication dates: 1996 to November 25, 2008
Population: patients on OAT
Intervention: anticoagulation monitoring by POC INR device in any setting including anticoagulation clinic, primary care (general practitioner or nurse), pharmacy, long-term care facility, PST, PSM or any other POC INR strategy
Minimum sample size: 50 patients
Minimum follow-up period: 3 months
Comparator: usual care defined as venipuncture blood draw for an INR laboratory test and management provided by an ACC or individual practitioner
Outcomes: Hemorrhagic events, thromboembolic events, all-cause mortality, anticoagulation control as assessed by proportion of time or values in the therapeutic range, patient reported outcomes including satisfaction, QoL, compliance, acceptability, convenience
Exclusion criteria
Non-RCTs, before-after studies, quasi-experimental studies, observational studies, case reports, case series, editorials, letters, non-systematic reviews, conference proceedings, abstracts, non-English articles, duplicate publications
Studies where POC INR devices were compared to laboratory testing to assess test accuracy
Studies where the POC INR results were not used to guide patient management
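The trial-level criteria above can be read as a single screening predicate. The following is a minimal Python sketch, assuming a hypothetical `CandidateTrial` record (its field names are illustrative, not taken from the review) and assuming the 50-patient and 3-month minimums apply to each RCT. Publication dates and the review-level designs (systematic reviews, meta-analyses) are left out for brevity.

```python
from dataclasses import dataclass

# Hypothetical record of the fields a reviewer would extract from one candidate trial
@dataclass
class CandidateTrial:
    design: str                  # e.g. "RCT", "observational", "case series"
    language: str
    n_patients: int
    follow_up_months: float
    accuracy_study_only: bool    # POC device compared with the lab only to assess test accuracy
    poc_guides_management: bool  # POC INR results were used to guide patient management

def meets_trial_criteria(t: CandidateTrial) -> bool:
    """Apply the design, language, sample-size, follow-up and exclusion rules listed above."""
    return (
        t.design == "RCT"
        and t.language == "English"
        and t.n_patients >= 50
        and t.follow_up_months >= 3
        and not t.accuracy_study_only
        and t.poc_guides_management
    )

trial = CandidateTrial("RCT", "English", 120, 6, False, True)
print(meets_trial_criteria(trial))
```

Writing the rules as one boolean expression makes it easy to see that a trial must pass every inclusion rule and trigger none of the exclusions.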
Method of Review
A search of electronic databases (OVID MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, EMBASE, The Cochrane Library, and the International Network of Agencies for Health Technology Assessment [INAHTA] database) was undertaken to identify evidence published from January 1, 1998 to November 25, 2008. Studies meeting the inclusion criteria were selected from the search results. Reference lists of selected articles were also checked for relevant studies.
Summary of Findings
Five existing reviews and 22 articles describing 17 unique RCTs met the inclusion criteria. Three RCTs examined POC INR monitoring devices with PST strategies, 11 RCTs examined PSM strategies, one RCT included both PST and PSM strategies and two RCTs examined the use of POC INR monitoring devices by health care professionals.
Anticoagulation Control
Anticoagulation control is measured by the percentage of time INR is within the therapeutic range or by the percentage of INR values in the therapeutic range. Due to the differing methodologies and reporting structures used, it was deemed inappropriate to combine the data and estimate whether the difference between groups would be significant. Instead, the results of individual studies were weighted by the number of person-years of observation and then pooled to calculate a summary measure.
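The two measures of anticoagulation control can be contrasted in a short Python sketch. The proportion of INR values in range is a simple count. For time in range, the sketch assumes linear interpolation of INR between consecutive tests (the Rosendaal method); the review does not state which method each study used, so this is one common option rather than the studies' own calculation. The test days and INR values are hypothetical, and the 2.0 to 3.0 range is the typical target described earlier.

```python
from typing import List, Tuple

LOW, HIGH = 2.0, 3.0  # typical therapeutic INR range for most patients

def fraction_values_in_range(inrs: List[float], low: float = LOW, high: float = HIGH) -> float:
    """Share of individual INR measurements that fall inside [low, high]."""
    return sum(low <= v <= high for v in inrs) / len(inrs)

def _segment_fraction(v0: float, v1: float, low: float, high: float) -> float:
    """Fraction of a straight-line INR segment (v0 -> v1) that lies inside [low, high]."""
    if v0 == v1:
        return 1.0 if low <= v0 <= high else 0.0
    # Positions along the segment (0 = first test, 1 = next test) where INR crosses each bound
    a = (low - v0) / (v1 - v0)
    b = (high - v0) / (v1 - v0)
    start, end = min(a, b), max(a, b)
    return max(0.0, min(1.0, end) - max(0.0, start))

def fraction_time_in_range(tests: List[Tuple[float, float]], low: float = LOW,
                           high: float = HIGH) -> float:
    """Share of follow-up time in range, interpolating INR linearly between (day, INR) tests."""
    in_range = total = 0.0
    for (d0, v0), (d1, v1) in zip(tests, tests[1:]):
        days = d1 - d0
        in_range += days * _segment_fraction(v0, v1, low, high)
        total += days
    return in_range / total

# Hypothetical patient: INR tested every 10 days
tests = [(0, 1.5), (10, 2.5), (20, 2.8), (30, 3.4)]
print(fraction_values_in_range([inr for _, inr in tests]))  # 0.5
print(round(fraction_time_in_range(tests), 3))  # 0.611
```

The two measures diverge here because interpolation credits the portions of each interval during which INR is estimated to be moving into or out of the range, while the value count only looks at the test days themselves.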
Across most studies, patients in the intervention groups tended to spend a higher percentage of time, and to have a higher percentage of values, in the therapeutic target range than control patients. When the percentage of time in the therapeutic range was pooled across studies and weighted by the number of person-years of observation, the difference between the intervention and control groups was 4.2 percentage points for PSM, 7.2 for PST, and 6.1 for POC use by health care practitioners. Overall, intervention patients were in the target range 69% of the time and control patients 64% of the time, an overall difference between groups of roughly 5 percentage points.
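The pooling step, in which each study's result is weighted by its person-years of observation, is a weighted mean. A minimal Python sketch with hypothetical study values (not data from the reviewed trials); like the review's summary measure, it yields a pooled difference but no test of significance.

```python
from typing import List, Tuple

def pooled_weighted_mean(studies: List[Tuple[float, float]]) -> float:
    """Pool per-study percentages, weighting each by its person-years of observation.

    Each tuple is (percent_time_in_range, person_years)."""
    total_weight = sum(person_years for _, person_years in studies)
    return sum(pct * person_years for pct, person_years in studies) / total_weight

# Hypothetical studies: (percent of time in therapeutic range, person-years observed)
intervention = [(72.0, 150.0), (65.0, 50.0)]
control = [(66.0, 150.0), (60.0, 50.0)]

difference = pooled_weighted_mean(intervention) - pooled_weighted_mean(control)
print(f"Pooled difference: {difference:.2f} percentage points")  # 5.75
```

A study contributing three times as many person-years pulls the pooled value three times as strongly, so large, long trials dominate the summary.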
Major Complications and Deaths
There was no statistically significant difference in the number of major hemorrhagic events between patients managed with POC INR monitoring devices and patients managed with standard laboratory testing (OR = 0.74; 95% CI: 0.52-1.04). The difference was non-significant for every POC strategy (PSM, PST, health care practitioner).
Patients managed with POC INR monitoring devices had significantly fewer thromboembolic events than usual care patients (OR = 0.52; 95% CI: 0.37-0.74). When the analysis was divided by POC strategy, PSM resulted in significantly fewer thromboembolic events than usual care (OR = 0.46; 95% CI: 0.29-0.72). The difference for PSM remained significant when the analysis was limited to major thromboembolic events (OR = 0.40; 95% CI: 0.17-0.93), but was non-significant when it was limited to minor thromboembolic events (OR = 0.73; 95% CI: 0.08-7.01). PST and GP/nurse strategies did not result in significant differences in thromboembolic events; however, only a limited number of studies examined these interventions.
No statistically significant difference was observed in the number of deaths between the POC intervention and usual care control groups (OR = 0.67; 95% CI: 0.41-1.10), and the difference was non-significant for every POC strategy. Only one study reported on survival, with a 10-year survival rate of 76.1% in the usual care control group compared with 84.5% in the PSM group (P = 0.05).
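The significance statements above follow the standard reading of odds ratios for adverse outcomes: an OR below 1.0 means fewer events with POC monitoring, and a 95% CI that excludes 1.0 corresponds to a statistically significant difference at the 5% level. A short Python sketch applying that rule to the pooled estimates reported above:

```python
from dataclasses import dataclass

@dataclass
class OddsRatio:
    label: str
    estimate: float
    ci_low: float
    ci_high: float

    def is_significant(self) -> bool:
        """A 95% CI lying entirely below or entirely above 1.0 excludes 'no difference'."""
        return self.ci_high < 1.0 or self.ci_low > 1.0

# Pooled estimates as reported above (POC monitoring vs. usual care)
results = [
    OddsRatio("Major hemorrhage, all POC", 0.74, 0.52, 1.04),
    OddsRatio("Thromboembolic events, all POC", 0.52, 0.37, 0.74),
    OddsRatio("Thromboembolic events, PSM", 0.46, 0.29, 0.72),
    OddsRatio("Major thromboembolic events, PSM", 0.40, 0.17, 0.93),
    OddsRatio("Minor thromboembolic events, PSM", 0.73, 0.08, 7.01),
    OddsRatio("All-cause mortality, all POC", 0.67, 0.41, 1.10),
]

for r in results:
    status = "significant" if r.is_significant() else "not significant"
    print(f"{r.label}: OR {r.estimate:.2f} (95% CI {r.ci_low:.2f}-{r.ci_high:.2f}) -> {status}")
```

The minor-thromboembolism estimate shows why a point estimate below 1.0 is not enough on its own: its interval (0.08 to 7.01) is so wide that it is compatible with both benefit and harm.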
Summary Results of Meta-Analyses of Major Complications and Deaths in POC INR Monitoring Studies
Patient Satisfaction and Quality of Life
Quality of life measures were reported in eight studies comparing POC INR monitoring to standard laboratory testing using a variety of measurement tools. It was thus not possible to calculate a quantitative summary measure. The majority of studies reported favourable impacts of POC INR monitoring on QoL and found better treatment satisfaction with POC monitoring. Results from a pre-analysis patient and caregiver focus group conducted in Ontario also indicated improved patient QoL with POC monitoring.
Quality of the Evidence
Studies varied with regard to patient eligibility, baseline patient characteristics, follow-up duration, and withdrawal rates. Differential drop-out rates were observed, with more patients tending to withdraw from the POC intervention groups. There was a lack of consistency in the definitions and reporting of OAT control and in the definitions of adverse events. In most studies, the intervention group received more education on the use of warfarin and performed more frequent INR testing, which may have led to an overestimate of the effect of the POC intervention. Patient selection and eligibility criteria were not always fully described, and it is likely that most of the PST/PSM trials included a highly motivated patient population. Lastly, many of the trials were sponsored by industry.
Despite the observed heterogeneity among studies, there was a general consensus in findings that POC INR monitoring devices have beneficial impacts on the risk of thromboembolic events, anticoagulation control and patient satisfaction and QoL (ES Table 2).
GRADE Quality of the Evidence on POC INR Monitoring Studies
CI refers to confidence interval; Interv, intervention; OR, odds ratio; RCT, randomized controlled trial.
Economic Analysis
Using a 5-year Markov model, the health and economic outcomes associated with four different anticoagulation management approaches were evaluated:
Standard care: consisting of a laboratory test with a venipuncture blood draw for an INR;
Healthcare staff testing: consisting of a test with a POC INR device in a medical clinic staffed by healthcare professionals such as pharmacists, nurses, and physicians, who follow a protocol to manage OAT;
PST: patient self-testing using a POC INR device and phoning in results to an ACC or family physician; and
PSM: patient self-managing using a POC INR device and self-adjustment of OAT according to a standardized protocol. Patients may also phone in to a medical office for guidance.
The primary analytic perspective was that of the Ontario Ministry of Health and Long-Term Care (MOHLTC). Only direct medical costs were considered, and the time horizon of the model was five years, the serviceable life of a POC device.
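A minimal Python sketch of how a cohort Markov model of this kind works. The three health states, annual cycle length, transition probabilities, costs, and utilities are all hypothetical placeholders chosen for illustration; they are not the inputs of the Ontario model, which compared four strategies. Costs and QALYs are tallied at the end of each cycle, without discounting or half-cycle correction.

```python
from typing import Dict, List, Tuple

STATES = ["stable", "post_event", "dead"]  # hypothetical health states

def run_markov(transitions: List[List[float]], annual_cost: Dict[str, float],
               utility: Dict[str, float], cycles: int = 5) -> Tuple[float, float]:
    """Follow a cohort through annual cycles; return (cost, QALYs) per patient, undiscounted."""
    for row in transitions:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of transition probabilities must sum to 1")
    cohort = [1.0, 0.0, 0.0]  # everyone starts in the stable state
    total_cost = total_qalys = 0.0
    for _ in range(cycles):
        # Redistribute the cohort: new share in state j = sum over i of share_i * P[i][j]
        cohort = [sum(cohort[i] * transitions[i][j] for i in range(len(STATES)))
                  for j in range(len(STATES))]
        for share, state in zip(cohort, STATES):
            total_cost += share * annual_cost[state]
            total_qalys += share * utility[state]
    return total_cost, total_qalys

# Hypothetical inputs: PSM is assumed to lower the annual event risk but add device costs
standard = [[0.90, 0.07, 0.03], [0.0, 0.90, 0.10], [0.0, 0.0, 1.0]]
psm = [[0.93, 0.04, 0.03], [0.0, 0.90, 0.10], [0.0, 0.0, 1.0]]
cost_standard = {"stable": 500.0, "post_event": 8000.0, "dead": 0.0}
cost_psm = {"stable": 900.0, "post_event": 8000.0, "dead": 0.0}
utility = {"stable": 0.85, "post_event": 0.65, "dead": 0.0}

c0, q0 = run_markov(standard, cost_standard, utility)
c1, q1 = run_markov(psm, cost_psm, utility)
d_cost, d_qaly = c1 - c0, q1 - q0
print(f"Standard care: cost {c0:.0f}, QALYs {q0:.3f}")
print(f"PSM:           cost {c1:.0f}, QALYs {q1:.3f}")
if d_cost <= 0 and d_qaly >= 0:
    print("PSM dominates standard care (no more costly, at least as effective)")
elif d_qaly > 0:
    print(f"Incremental cost per QALY gained: {d_cost / d_qaly:.0f}")
else:
    print("PSM yields no QALY gain over standard care")
```

Comparing strategies this way, rather than by cost alone, lets a higher up-front device cost be offset by fewer expensive downstream events.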
The economic analysis found that POC strategies are cost-effective compared with traditional laboratory INR testing. In particular, the healthcare staff testing strategy can generate cost savings because one device is used for multiple patients. The PSM strategy, however, appears to be the most cost-effective, as patients are more inclined to adjust their dosing promptly rather than allowing their INR to fall out of range.
Considerations for Ontario Health System
Although the use of POC devices continues to diffuse throughout Ontario, not all OAT patients are suitable for, or able to practise, PST/PSM. POC use is currently concentrated in institutional settings, including hospitals, ACCs, long-term care facilities, physician offices, and pharmacies, and is much less common at the patient level. It is, however, estimated that 24% of OAT patients (approximately 32,000 patients in Ontario) would be suitable candidates for PST/PSM strategies and willing to use a POC device.
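As a quick check, applying the 24% suitability estimate to the approximately 132,000 Ontario patients on OAT cited earlier reproduces the rounded figure:

```latex
0.24 \times 132\,000 = 31\,680 \approx 32\,000 \text{ patients}
```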
There are several barriers to the use and implementation of POC INR monitoring devices, including lack of physician familiarity with the devices, resistance to changing established laboratory-based methods, the lack of an approach for identifying suitable patients, and inadequate resources for effective patient education and training. Cost and insufficient reimbursement strategies may also hinder implementation, and effective quality assurance programs would need to be developed to ensure that INR measurements are accurate and precise.
Conclusions
For a select group of highly motivated and trained patients, PSM resulted in significantly fewer thromboembolic events than conventional laboratory-based INR testing. No significant differences were observed for major hemorrhages or all-cause mortality. PST and GP/nurse use of POC devices were just as effective as conventional laboratory-based INR testing with respect to thromboembolic events, major hemorrhages, and all-cause mortality. POC strategies may also result in better OAT control, as measured by the proportion of time INR is in the therapeutic range, and there appear to be beneficial impacts on patient satisfaction and QoL. Decisions on the use of POC devices should take into account patient suitability, patient education and training, health system constraints, and affordability.
Keywords
anticoagulants, International Normalized Ratio, point-of-care, self-monitoring, warfarin.
PMCID: PMC3377545  PMID: 23074516
6.  Continuous Subcutaneous Insulin Infusion (CSII) Pumps for Type 1 and Type 2 Adult Diabetic Populations 
Executive Summary
In June 2008, the Medical Advisory Secretariat began work on the Diabetes Strategy Evidence Project, an evidence-based review of the literature on strategies for the successful management and treatment of diabetes. The project began when the Health System Strategy Division at the Ministry of Health and Long-Term Care asked the secretariat to provide an evidentiary platform for the Ministry’s newly released Diabetes Strategy.
After an initial review of the strategy and consultation with experts, the secretariat identified five key areas in which evidence was needed. Evidence-based analyses have been prepared for each of these five areas: insulin pumps, behavioural interventions, bariatric surgery, home telemonitoring, and community based care. For each area, an economic analysis was completed where appropriate and is described in a separate report.
To review these titles within the Diabetes Strategy Evidence series, please visit the Medical Advisory Secretariat Web site, http://www.health.gov.on.ca/english/providers/program/mas/mas_about.html
Diabetes Strategy Evidence Platform: Summary of Evidence-Based Analyses
Continuous Subcutaneous Insulin Infusion Pumps for Type 1 and Type 2 Adult Diabetics: An Evidence-Based Analysis
Behavioural Interventions for Type 2 Diabetes: An Evidence-Based Analysis
Bariatric Surgery for People with Diabetes and Morbid Obesity: An Evidence-Based Summary
Community-Based Care for the Management of Type 2 Diabetes: An Evidence-Based Analysis
Home Telemonitoring for Type 2 Diabetes: An Evidence-Based Analysis
Application of the Ontario Diabetes Economic Model (ODEM) to Determine the Cost-effectiveness and Budget Impact of Selected Type 2 Diabetes Interventions in Ontario
Objective
The objective of this analysis is to review the efficacy of continuous subcutaneous insulin infusion (CSII) pumps compared with multiple daily injections (MDI) in adults with type 1 or type 2 diabetes.
Clinical Need and Target Population
Insulin therapy is an integral component of the treatment of many individuals with diabetes. Type 1, or juvenile-onset, diabetes is a life-long disorder that commonly manifests in children and adolescents, but onset can occur at any age. It represents about 10% of the total diabetes population and involves immune-mediated destruction of the insulin-producing cells in the pancreas. The loss of these cells reduces insulin production, which in turn necessitates exogenous insulin therapy.
Type 2, or ‘maturity-onset’, diabetes represents about 90% of the total diabetes population and is marked by insulin resistance or insufficient insulin secretion. The risk of developing type 2 diabetes increases with age, obesity, and lack of physical activity. The condition tends to develop gradually and may remain undiagnosed for many years. Approximately 30% of patients with type 2 diabetes eventually require insulin therapy.
CSII Pumps
In conventional therapy programs for diabetes, insulin is injected once or twice a day in some combination of short- and long-acting insulin preparations. Some patients require intensive therapy regimens known as multiple daily injection (MDI) programs, in which insulin is injected three or more times a day. MDI is time-consuming and usually requires an injection of slow-acting basal insulin in the morning or evening and frequent doses of short-acting insulin before meals. The most commonly used slower-acting insulin is neutral protamine Hagedorn (NPH), which reaches peak activity 3 to 5 hours after injection. There are some concerns about night-time use of NPH, because nocturnal hypoglycemia may occur if it is injected immediately before bed. To combat nocturnal hypoglycemia and other issues related to absorption, alternative insulins have been developed, such as the slow-acting insulin glargine. Glargine has no peak action time and instead acts consistently over a 24-hour period, helping to reduce the frequency of hypoglycemic episodes.
Alternatively, intensive therapy regimens can be administered by continuous subcutaneous insulin infusion (CSII) pumps. These devices attempt to closely mimic the behaviour of the pancreas, continuously providing a basal level of insulin with additional boluses at mealtimes. Modern CSII pumps consist of a small battery-driven pump that administers insulin subcutaneously through the abdominal wall via a butterfly needle. The insulin dose is adjusted in response to measured capillary glucose values, much as with MDI, and CSII is often seen as preferable to multiple injection therapy. There are, however, still risks associated with CSII pumps, and despite their increased use, there is uncertainty about their effectiveness compared with MDI for improving glycemic control.
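The basal-plus-bolus delivery pattern described above comes down to simple arithmetic. A minimal Python sketch, assuming a pump programmed with hourly basal rates in time blocks plus meal-time boluses; the function name and every rate and dose below are hypothetical and purely illustrative, not clinical settings.

```python
from typing import List, Tuple

def daily_insulin(basal_profile: List[Tuple[int, int, float]],
                  boluses: List[float]) -> Tuple[float, float, float]:
    """Return (basal_units, bolus_units, total_units) delivered over 24 hours.

    basal_profile: (start_hour, end_hour, units_per_hour) blocks whose lengths sum to 24 h.
    boluses: units given as meal-time boluses."""
    hours = sum(end - start for start, end, _ in basal_profile)
    if hours != 24:
        raise ValueError("basal blocks must cover exactly 24 hours")
    basal = sum((end - start) * rate for start, end, rate in basal_profile)
    bolus = sum(boluses)
    return basal, bolus, basal + bolus

# Hypothetical pump settings, for illustration only
profile = [(0, 4, 0.6), (4, 8, 0.9), (8, 22, 0.8), (22, 24, 0.7)]
meal_boluses = [5.0, 6.0, 7.0]  # breakfast, lunch, dinner

basal, bolus, total = daily_insulin(profile, meal_boluses)
print(f"basal {basal:.1f} U + bolus {bolus:.1f} U = {total:.1f} U per day")
```

The pump's basal rate can be programmed per time block, whereas under MDI the basal component comes from injections of a slower-acting insulin.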
Part A: Type 1 Diabetic Adults (≥19 years)
An evidence-based analysis on the efficacy of CSII pumps compared to MDI was carried out on both type 1 and type 2 adult diabetic populations.
Research Questions
Are CSII pumps more effective than MDI for improving glycemic control in adults (≥19 years) with type 1 diabetes?
Are CSII pumps more effective than MDI for improving additional outcomes related to diabetes such as quality of life (QoL)?
Literature Search
Inclusion Criteria
Randomized controlled trials, systematic reviews, meta-analyses, and/or health technology assessments from MEDLINE, EMBASE, CINAHL
Adults (≥ 19 years)
Type 1 diabetes
Study evaluates CSII vs. MDI
Published between January 1, 2002 – March 24, 2009
Patient currently on intensive insulin therapy
Exclusion Criteria
Studies with <20 patients
Studies <5 weeks in duration
CSII applied only at night time and not 24 hours/day
Mixed group of diabetes patients (children, adults, type 1, type 2)
Pregnancy studies
Outcomes of Interest
The primary outcomes of interest were glycosylated hemoglobin (HbA1c) levels, mean daily blood glucose, glucose variability, and frequency of hypoglycemic events. Other outcomes of interest were insulin requirements, adverse events, and quality of life.
Search Strategy
The literature search strategy employed keywords and subject headings to capture the concepts of:
1) insulin pumps, and
2) type 1 diabetes.
The search was run on July 6, 2008 in the following databases: OVID MEDLINE (1996 to June Week 4 2008), OVID MEDLINE In-Process and Other Non-Indexed Citations, EMBASE (1980 to 2008 Week 26), OVID CINAHL (1982 to June Week 4 2008), the Cochrane Library, and the Centre for Reviews and Dissemination/International Network of Agencies for Health Technology Assessment. A search update was run on March 24, 2009, and studies published before 2002 were also examined for inclusion in the review. Parallel search strategies were developed for the remaining databases. Search results were limited to human studies published in English between January 2002 and March 24, 2009. Abstracts were reviewed, and studies meeting the inclusion criteria outlined above were obtained. Reference lists were also checked for relevant studies.
Summary of Findings
The database search identified 519 relevant citations published between 1996 and March 24, 2009. Of the 519 abstracts reviewed, four RCTs and one abstract met the inclusion criteria outlined above. While efficacy outcomes were reported in each of the trials, a meta-analysis was not possible because standard deviations of change values were missing, as were data for the first period of the crossover arm of the trial. Meta-analysis was also not possible for other outcomes (quality of life, insulin requirements, frequency of hypoglycemia) because of differences in reporting.
HbA1c
In studies where no baseline data were reported, final values were used. Two studies (Hanaire-Broutin et al. 2000, Hoogma et al. 2005) reported slight reductions in HbA1c of 0.35% and 0.22%, respectively, for CSII pumps compared with MDI. DeVries et al. (2002) reported a slightly larger reduction of 0.84%; however, this was the only study to include patients with poor glycemic control, marked by higher baseline HbA1c levels. One study (Bruttomesso et al. 2008) showed no difference between CSII pumps and MDI in HbA1c levels and was the only study using insulin glargine (consistent with the results of a parallel RCT reported in abstract form by Bolli 2004). While there was a statistically significant reduction in HbA1c in three of four trials, there is no evidence to suggest that these results are clinically significant.
Mean Blood Glucose
Three of four studies reported a statistically significant reduction in mean daily blood glucose for patients using CSII pumps, though these results were not clinically significant. One study (DeVries et al. 2002) did not report data on mean blood glucose but noted that the differences were not statistically significant. Study findings are difficult to interpret because blood glucose was measured differently across studies: three of four studies used a glucose diary, while one used a memory meter. In addition, the frequency of self-monitoring of blood glucose (SMBG) varied from four to nine times per day. The measurements used to compare mean daily blood glucose between the CSII pump and MDI groups at clinic visits were also collected at varying time points. Two studies used measurements from the last day before the final visit (Hoogma et al. 2005, DeVries et al. 2002), one used measurements taken during the last 30 days, and another used measurements taken during the 14 days before the final visit of each treatment period.
Glucose Variability
All four studies showed a statistically significant reduction in glucose variability for patients using CSII pumps compared with those using MDI, though one (Bruttomesso et al. 2008) showed a significant reduction only at the morning time point. Bruttomesso et al. also used alternative measures of glucose variability and found that both the lability index and the mean amplitude of glycemic excursions (MAGE) agreed with the findings based on the standard deviation (SD) of mean blood glucose, but the average daily risk range (ADRR) showed no difference between the CSII pump and MDI groups.
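The standard-deviation approach to glucose variability can be sketched in Python. The readings are hypothetical seven-point daily profiles (within the four to nine daily SMBG measurements described earlier), and the sketch assumes the sample SD; the trials' exact calculations, and the lability index, MAGE, and ADRR, are not reproduced here.

```python
import statistics
from typing import List, Tuple

def mean_and_sd(readings: List[float]) -> Tuple[float, float]:
    """Mean blood glucose and its sample standard deviation for one set of SMBG readings."""
    if len(readings) < 2:
        raise ValueError("at least two readings are needed for a standard deviation")
    return statistics.mean(readings), statistics.stdev(readings)

# Hypothetical seven-point profiles (mmol/L): before/after each meal and at bedtime
csii_day = [6.8, 8.9, 7.2, 9.4, 7.0, 8.6, 7.5]
mdi_day = [5.1, 11.8, 6.2, 12.4, 5.9, 10.9, 8.3]

for label, day in (("CSII", csii_day), ("MDI", mdi_day)):
    mean, sd = mean_and_sd(day)
    print(f"{label}: mean {mean:.1f} mmol/L, SD {sd:.1f} mmol/L")
```

A lower SD means the readings swing less around the patient's own mean, which is what the glucose variability outcome captures separately from the mean level.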
Hypoglycemic Events
There is conflicting evidence concerning the efficacy of CSII pumps in decreasing both mild and severe hypoglycemic events. For mild hypoglycemic events, DeVries et al. observed more events per patient-week in the CSII pump group than in the MDI group, while Hoogma et al. observed more events per patient-year in the MDI group. The remaining two studies found no difference between the groups in the frequency of mild hypoglycemic events. For severe hypoglycemic events, Hoogma et al. found more events per patient-year among MDI patients; however, the other RCTs showed no difference between the groups.
Insulin Requirements and Adverse Events
In all four studies, insulin requirements were significantly lower in patients receiving CSII pump treatment than in those receiving MDI. Adverse events were reported in three studies. DeVries et al. found no difference in ketoacidotic episodes between CSII pump and MDI users. Bruttomesso et al. reported no adverse events during the study. Hanaire-Broutin et al. found that, out of a total of 256 patients, 30 patients experienced 58 serious adverse events (SAEs) during MDI and 23 patients had 33 SAEs during CSII pump treatment. Most events were related to severe hypoglycemia and diabetic ketoacidosis.
Quality of Life and Patient Preference
QoL was measured in three studies and patient preference in one. All three studies found an improvement in QoL for CSII users compared with MDI users, although different instruments were used across the studies and possible reporting bias was evident, as non-positive outcomes were not consistently reported. Moreover, two of the studies using the Diabetes Treatment Satisfaction Questionnaire (DTSQ) reported conflicting results: DeVries et al. reported no difference in treatment satisfaction between CSII pump users and MDI users, while Bruttomesso et al. reported that treatment satisfaction improved among CSII pump users.
Patient preference for CSII pumps was demonstrated in just one study (Hanaire-Broutin et al. 2000), and there are considerable limitations in interpreting these data, as they were gathered through interviews and 72% of the patients who preferred CSII pumps had been on CSII pump therapy before the study. As all studies were industry sponsored, findings on QoL and patient preference must be interpreted with caution.
Quality of Evidence
Overall, the body of evidence was downgraded from high to low because of study quality and issues with directness, as identified using the GRADE quality assessment tool (see Table 1). While blinding of patients to intervention or control was not feasible in these studies, blinding of study personnel during outcome assessment and allocation concealment were generally lacking. Trials reported consistent results for HbA1c, mean blood glucose, and glucose variability, but the directness (generalizability) of the studies was questionable, as most trials used highly motivated populations with fairly good glycemic control. In addition, the study populations varied with respect to prior treatment regimens and may not be representative of the population eligible for pumps in Ontario. For hypoglycemic events, the evidence was further downgraded to very low because of conflicting evidence between studies on the frequency of mild and severe hypoglycemic events in patients using CSII pumps compared with MDI (see Table 2). The GRADE quality of evidence for the use of CSII in adults with type 1 diabetes is therefore low to very low, and any estimate of effect is uncertain.
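The downgrading described above follows GRADE's stepwise logic: evidence from RCTs starts at high quality (observational evidence at low) and moves down one level for each serious concern, or two for a very serious one. A simplified Python sketch of that bookkeeping; the function name is hypothetical, and the real assessment is a structured judgment that this does not replace.

```python
LEVELS = ["very low", "low", "moderate", "high"]

def grade_quality(randomized: bool, levels_down: int, levels_up: int = 0) -> str:
    """Simplified GRADE bookkeeping: RCT evidence starts at 'high', observational at 'low';
    each serious concern removes one level (two if very serious), clamped to the scale."""
    start = LEVELS.index("high") if randomized else LEVELS.index("low")
    final = max(0, min(len(LEVELS) - 1, start - levels_down + levels_up))
    return LEVELS[final]

# HbA1c, mean blood glucose, glucose variability: downgraded for study quality and directness
print(grade_quality(randomized=True, levels_down=2))  # low
# Hypoglycemic events: additionally downgraded for inconsistency
print(grade_quality(randomized=True, levels_down=3))  # very low
```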
GRADE Quality Assessment for CSII pumps vs. MDI on HbA1c, Mean Blood Glucose, and Glucose Variability for Adults with Type 1 Diabetes
Study quality: Inadequate or unknown allocation concealment (3/4 studies); unblinded assessment (all studies), although blinding was not feasible given the nature of the intervention; no ITT analysis (2/4 studies); possible bias in SMBG (all studies)
Consistency: HbA1c: 3/4 studies show consistency, but the magnitude of effect varies greatly, and a single study uses insulin glargine instead of NPH; mean blood glucose: 3/4 studies show consistency, but the magnitude of effect varies between studies; glucose variability: all studies show consistency, but 1 study showed a significant effect only in the morning
Directness: Generalizability is in question because of varying populations: highly motivated populations, an educational component of interventions or run-in phases, insulin pen use in 2/4 studies, and varying levels of baseline glycemic control and experience with intensified insulin therapy, pumps, and MDI.
GRADE Quality Assessment for CSII pumps vs. MDI on Frequency of Hypoglycemic Events
Study quality: Inadequate or unknown allocation concealment (3/4 studies); unblinded assessment (all studies), although blinding was not feasible given the nature of the intervention; no ITT analysis (2/4 studies); possible bias in SMBG (all studies)
Consistency: Conflicting evidence with respect to mild and severe hypoglycemic events reported in studies
Directness: Generalizability is in question because of varying populations: highly motivated populations, an educational component of interventions or run-in phases, insulin pen use in 2/4 studies, and varying levels of baseline glycemic control and experience with intensified insulin therapy, pumps, and MDI.
Economic Analysis
One article from the economic literature scan was included in the analysis. Inclusion was limited to English-language articles comparing CSII with MDI in adults with type 1 diabetes using quality-adjusted life-years (QALYs) as the outcome. Four other economic evaluations were identified but did not meet these criteria: two did not compare CSII with MDI, and the other two used summary estimates from a mixed population with type 1 and type 2 diabetes in their economic microsimulations to estimate costs and effects over time.
One study identified a subset of people with type 1 diabetes who may be suitable for, and benefit from, insulin pumps. There are, however, limited data in the literature on the cost-effectiveness of insulin pumps versus MDI in type 1 diabetes, and longer-term models are required to estimate the long-term costs and effects of pumps compared with MDI in this population.
Conclusions
CSII pumps for the treatment of adults with type 1 diabetes
Based on low-quality evidence, CSII pumps confer a statistically significant but not clinically significant reduction in HbA1c and mean daily blood glucose compared with MDI in adults with type 1 diabetes (≥19 years).
CSII pumps also confer a statistically significant reduction in glucose variability compared with MDI in adults with type 1 diabetes (≥19 years); however, the clinical significance of this reduction is unknown.
There is indirect evidence that the use of newer long-acting insulins (e.g., insulin glargine) in MDI regimens results in a smaller difference between MDI and CSII than when older insulins are used.
There is conflicting evidence regarding both mild and severe hypoglycemic events in this population when CSII pumps are compared with MDI. These findings are based on very low-quality evidence.
Quality of life is improved for patients using CSII pumps compared with MDI; however, this evidence has limitations.
Significant limitations of the literature exist specifically:
All studies sponsored by insulin pump manufacturers
All studies used crossover design
Prior treatment regimens varied
Types of insulins used in study varied (NPH vs. glargine)
Generalizability of the studies is in question, as populations were highly motivated and half of the studies used insulin pens as the mode of delivery for MDI
One short-term study concluded that pumps are cost-effective, although this was based on limited data and longer term models are required to estimate the long-term costs and effects of pumps compared to MDI in adults with type 1 diabetes.
Part B: Type 2 Diabetic Adults
Research Questions
Are CSII pumps more effective than MDI for improving glycemic control in adults (≥19 years) with type 2 diabetes?
Are CSII pumps more effective than MDI for improving other outcomes related to diabetes such as quality of life?
Literature Search
Inclusion Criteria
Randomized controlled trials, systematic reviews, meta-analyses, and/or health technology assessments from MEDLINE, Excerpta Medica Database (EMBASE), and Cumulative Index to Nursing & Allied Health Literature (CINAHL)
Any person with type 2 diabetes requiring intensive insulin treatment
Published between January 1, 2000 – August 2008
Exclusion Criteria
Studies with <10 patients
Studies <5 weeks in duration
CSII applied only at night time and not 24 hours/day
Mixed group of diabetes patients (children, adults, type 1, type 2)
Pregnancy studies
Outcomes of Interest
The primary outcome of interest was a reduction in glycosylated hemoglobin (HbA1c) levels. Other outcomes of interest were mean blood glucose level, glucose variability, insulin requirements, frequency of hypoglycemic events, adverse events, and quality of life.
Search Strategy
A comprehensive literature search was performed in OVID MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations, EMBASE, CINAHL, The Cochrane Library, and the International Network of Agencies for Health Technology Assessment (INAHTA) for studies published between January 1, 2000 and August 15, 2008. Studies meeting the inclusion criteria were selected from the search results. Data on the study characteristics, patient characteristics, primary and secondary treatment outcomes, and adverse events were abstracted. Reference lists of selected articles were also checked for relevant studies. The quality of the evidence was assessed as high, moderate, low, or very low according to the GRADE methodology.
Summary of Findings
The database search identified 286 relevant citations published between 1996 and August 2008. Of the 286 abstracts reviewed, four RCTs met the inclusion criteria outlined above. On closer examination, two of these studies were excluded from the meta-analysis: Berthe et al., because of small sample size and missing data, and Wainstein et al., because of outlier status and a high dropout rate. These exclusions are consistent with previously reported meta-analyses on this topic (Jeitler et al. 2008; Fatourechi et al. 2009).
HbA1c
The primary outcome in this analysis was reduction in HbA1c. Both studies showed that CSII pumps and MDI each reduce HbA1c, but neither treatment modality was superior to the other. A random-effects meta-analysis gave a mean difference in HbA1c of -0.14 (95% CI, -0.40 to 0.13) between the two groups, which was neither statistically nor clinically significant. No statistical heterogeneity was observed between the two studies (I² = 0%).
Forest plot of two parallel RCTs comparing CSII with MDI in type 2 diabetes
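The pooled HbA1c estimate above comes from a random-effects model. The report does not state which estimator was used; the sketch below assumes the common DerSimonian-Laird method and shows how a pooled mean difference, its 95% confidence interval, the between-study variance tau², and I² follow from per-study mean differences and standard errors. The function name and the example inputs are hypothetical and are not the data from Herman et al. or Raskin et al.

```python
import math


def dersimonian_laird(effects, std_errs, z=1.959964):
    """Pool per-study mean differences with a DerSimonian-Laird random-effects model.

    effects  : per-study mean differences (e.g., CSII minus MDI change in HbA1c)
    std_errs : standard errors of those mean differences
    Returns (pooled estimate, (lower, upper) CI, tau^2, I^2 in percent).
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and estimate
    w = [1.0 / se ** 2 for se in std_errs]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted dispersion of studies around the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # I^2: share of total variation due to heterogeneity, floored at 0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add tau^2 to each study's within-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errs]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - z * se_pooled, pooled + z * se_pooled), tau2, i2


# Hypothetical inputs for two studies (mean difference, standard error)
pooled, ci, tau2, i2 = dersimonian_laird([-0.1, -0.2], [0.2, 0.2])
```

When Q does not exceed its degrees of freedom, tau² and I² are both truncated to zero and the random-effects result coincides with the fixed-effect one, which is the situation an I² of 0% describes.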
Secondary Outcomes
Mean Blood Glucose and Glucose Variability
Mean blood glucose was used as an efficacy outcome in only one study (Raskin et al. 2003). The authors found that the only time point at which blood glucose values were consistently lower in the CSII group than in the MDI group was 90 minutes after breakfast. Glucose variability was not examined in either study, and the authors reported no difference in weight gain between the CSII pump and MDI groups at the end of the study. The two studies reported conflicting results on injection site reactions: Herman et al. reported no difference between groups in the number of subjects experiencing site problems, while Raskin et al. reported no injection site reactions in the MDI group but 15 such episodes among 8 participants in the CSII pump group.
Frequency of Hypoglycemic Events and Insulin Requirements
Both studies reported no difference in the number of mild hypoglycemic events between patients on CSII pumps and those on MDI. Herman et al. also reported no difference in the number of severe hypoglycemic events, and Raskin et al. reported no severe hypoglycemic events in either group during the study. Insulin requirements were examined only in Herman et al., who found that daily insulin requirements were equal in the CSII pump and MDI groups.
Quality of Life
Quality of life (QoL) was measured by Herman et al. using the Diabetes Quality of Life Clinical Trial Questionnaire (DQOLCTQ); no differences were reported between CSII and MDI users in treatment satisfaction, diabetes impact, or worry-related scores. Raskin et al. measured patient satisfaction with a patient satisfaction questionnaire; patients in the CSII pump group had significantly greater improvement in overall treatment satisfaction at the end of the study than the MDI group. Patient preference was also reported, but only in the CSII pump group; results indicating a greater preference for CSII pumps in this group (compared with prior injectable insulin regimens) are therefore biased and must be interpreted with caution.
Quality of Evidence
Overall, the body of evidence was downgraded from high to low because of study quality and issues with directness, as identified using the GRADE quality assessment tool (see Table 3). While blinding patients to intervention or control is not feasible in these studies, blinding of study personnel during outcome assessment and allocation concealment were generally lacking. Intention-to-treat (ITT) analysis was not clearly explained in one study, and heterogeneity between study populations was evident from participants’ treatment regimens before study initiation. Although the trials reported consistent results for HbA1c, the directness of the evidence, particularly the generalizability of the trial populations to the broader diabetic population, was questionable because the trials required patients to adhere to an intensive SMBG regimen, which suggests that patients were highly motivated. In addition, because prior treatment regimens varied between participants (there was no requirement for patients to be on MDI), study findings may not be generalizable to the population eligible for a pump in Ontario. The GRADE quality of evidence for the use of CSII in adults with type 2 diabetes is, therefore, low, and any estimate of effect is uncertain.
GRADE Quality Assessment for CSII pumps vs. MDI on HbA1c Adults with Type 2 Diabetes
Inadequate or unknown allocation concealment (all studies); unblinded assessment (all studies), although lack of blinding is inherent to the nature of the study; ITT not well explained in 1 of 2 studies
Indirect: limited generalizability, since participants varied with respect to prior treatment regimens and the intensive SMBG regimen suggests highly motivated trial populations.
Economic Analysis
An economic analysis of CSII pumps was carried out using the Ontario Diabetes Economic Model (ODEM) and has been previously described in the report entitled “Application of the Ontario Diabetes Economic Model (ODEM) to Determine the Cost-effectiveness and Budget Impact of Selected Type 2 Diabetes Interventions in Ontario”, part of the diabetes strategy evidence series. Based on the analysis, CSII pumps are not cost-effective for adults with type 2 diabetes, either for the age 65+ sub-group or for all patients in general. Details of the analysis can be found in the full report.
Conclusions
CSII pumps for the treatment of adults with type 2 diabetes
Low-quality evidence indicates that CSII pumps are not superior to MDI in efficacy for adults with type 2 diabetes.
There were no differences in the number of mild and severe hypoglycemic events in patients on CSII pumps versus MDI.
There are conflicting findings with respect to an improved quality of life for patients using CSII pumps as compared to MDI.
The literature has significant limitations, specifically:
All studies sponsored by insulin pump manufacturers
Prior treatment regimens varied
Types of insulins used in study varied (NPH vs. glargine)
Generalizability of studies in question as populations may not reflect eligible patient population in Ontario (participants not necessarily on MDI prior to study initiation, pen used in one study and frequency of SMBG required during study was high suggesting highly motivated participants)
Based on ODEM, insulin pumps are not cost-effective for adults with type 2 diabetes either for the age 65+ sub-group or for all patients in general.
PMCID: PMC3377523  PMID: 23074525
7.  Validation and comparison of EuroQoL-5 dimension (EQ-5D) and Short Form-6 dimension (SF-6D) among stable angina patients 
Objectives
Several preference-based health-related quality of life (HRQoL) instruments have been published and widely used in different populations. However, no consensus has emerged on the most appropriate instrument in the therapeutic area of stable angina. This study compared and validated the psychometric properties of two generic preference-based instruments, the EQ-5D and SF-6D, in Chinese patients with stable angina.
Methods
Convergent validity of the EQ-5D and SF-6D was examined against eight a priori hypotheses in stable angina patients, using the disease-specific Seattle Angina Questionnaire (SAQ). Responsiveness was compared using the effect size (ES), relative efficiency (RE), and receiver operating characteristic (ROC) curves. Agreement between the EQ-5D and SF-6D was tested using the intraclass correlation coefficient (ICC) and a Bland-Altman plot. Factors affecting the utility difference were explored with multiple linear regression analysis.
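The abstract names ES and RE but does not give their formulas. The sketch below assumes common definitions: ES as Cohen's d (difference in group means over the pooled standard deviation) between known groups, and RE as the squared ratio of two-sample t statistics for a candidate instrument versus a reference instrument, expressed as a percentage (some studies use F statistics instead). The function names and any score lists passed to them are hypothetical.

```python
import math
from statistics import mean, stdev


def pooled_sd(a, b):
    """Pooled standard deviation of two independent groups."""
    na, nb = len(a), len(b)
    var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return math.sqrt(var)


def effect_size(a, b):
    """Standardized mean difference (Cohen's d) between two known groups."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def t_statistic(a, b):
    """Two-sample Student t statistic using the pooled variance."""
    return (mean(a) - mean(b)) / (pooled_sd(a, b) * math.sqrt(1 / len(a) + 1 / len(b)))


def relative_efficiency(candidate, reference):
    """RE (%) of a candidate instrument relative to a reference instrument.

    Each argument is a pair of utility-score lists (group_1, group_2) for the
    same known-groups contrast; RE = (t_candidate / t_reference) ** 2 * 100.
    Values above 100% mean the candidate discriminates the groups more efficiently.
    """
    return (t_statistic(*candidate) / t_statistic(*reference)) ** 2 * 100
```

With the EQ-5D as the reference, an RE above 100% for the SF-6D on a given contrast would correspond to the abstract's statement that the SF-6D was the more efficient instrument.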
Results
In 411 patients (mean age 68.08 ± 11.35 years), mean utility scores (SD) were 0.78 (0.15) for the EQ-5D and 0.68 (0.12) for the SF-6D. Validity was demonstrated by moderate to strong correlation coefficients (range 0.368 to 0.594, P < 0.001) for five of the eight hypotheses for both the EQ-5D and SF-6D. Neither instrument showed serious floor effects, but ceiling effects for the EQ-5D were large. The areas under the ROC curves all exceeded 0.5 (0.660 to 0.814, P < 0.001). The SF-6D showed better discriminative capacity (ES: 0.573 to 1.179) between groups with different stable-angina-specific health status than the EQ-5D (ES: 0.426 to 1.126). RE suggested that the SF-6D (RE: 44.8% to 177.8%) was more efficient than the EQ-5D except for physical function. Poor agreement between the two instruments was observed with the ICC (0.448, P < 0.001) and Bland-Altman plot analysis. Multiple linear regression showed that clinical variables significantly (P < 0.05) influenced differences in utility scores between the EQ-5D and SF-6D.
Conclusions
Both EQ-5D and SF-6D are valid and sensitive preference-based HRQoL instruments in Chinese stable angina patients. The SF-6D may be a more effective tool with lower ceiling effect and greater sensitivity. Further study is needed to compare other properties, such as reliability and longitudinal response.
doi:10.1186/s12955-014-0156-6
PMCID: PMC4213514  PMID: 25343944
Quality of life; Stable angina; EQ-5D; SF-6D; Utility; China
8.  Level of agreement between patient-reported EQ-5D responses and EQ-5D responses mapped from the SF-12 in an injury population 
Background
Comparing health-related quality of life (HRQL) outcomes between studies is difficult due to the wide variety of instruments used. Comparing study outcomes and facilitating pooled data analyses requires valid “crosswalks” between HRQL instruments. Algorithms exist to map 12-item Short Form Health Survey (SF-12) responses to EQ-5D item responses and preference weights, but none have been validated in populations where disability is prevalent, such as injury.
Methods
Data were extracted from the Validating and Improving injury Burden Estimates Study (Injury-VIBES) for 10,166 adult hospitalized trauma patients with both three-level EQ-5D (EQ-5D-3L) and SF-12 responses at six and 12 months post-injury. Agreement between actual (patient-reported) and estimated (mapped from SF-12) EQ-5D-3L item responses and preference weights was assessed using Kappa and Prevalence-Adjusted Bias-Adjusted Kappa (PABAK) statistics and Bland-Altman plots.
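Item-level agreement between reported and mapped EQ-5D-3L levels can be summarized with Cohen's kappa and PABAK. The sketch below uses the standard kappa formula and the k-category PABAK, (k·p_o − 1)/(k − 1), with k = 3 for EQ-5D-3L items. The response lists are hypothetical (levels 1 to 3 for one item) and are not Injury-VIBES data; unweighted kappa is assumed because the abstract does not say whether a weighted version was used.

```python
from collections import Counter


def observed_agreement(actual, mapped):
    """Proportion of respondents whose mapped level equals their reported level."""
    return sum(a == m for a, m in zip(actual, mapped)) / len(actual)


def cohen_kappa(actual, mapped):
    """Unweighted Cohen's kappa for two sets of categorical responses."""
    n = len(actual)
    po = observed_agreement(actual, mapped)
    ca, cm = Counter(actual), Counter(mapped)
    # Chance agreement computed from the two marginal distributions
    pe = sum(ca[c] * cm[c] for c in set(ca) | set(cm)) / n ** 2
    return (po - pe) / (1 - pe)  # undefined when pe == 1 (a single shared level)


def pabak(actual, mapped, n_levels=3):
    """Prevalence-adjusted bias-adjusted kappa for k categories: (k*po - 1) / (k - 1)."""
    po = observed_agreement(actual, mapped)
    return (n_levels * po - 1) / (n_levels - 1)


# Hypothetical reported vs. mapped levels for one EQ-5D-3L item
reported = [1, 1, 2, 3]
mapped = [1, 2, 2, 3]
k, pk = cohen_kappa(reported, mapped), pabak(reported, mapped)
```

PABAK depends only on observed agreement, so it is unaffected by skewed response distributions (for example, most respondents reporting level 1 for self-care), which is where plain kappa can look misleadingly low.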
Results
Moderate agreement was observed for usual activities, pain/discomfort, and anxiety/depression. Agreement was substantial for mobility and self-care items. The mean differences in preference weights were -0.024 and -0.012 at six and 12 months (p < 0.001), respectively. The Bland-Altman plot limits of agreement were large compared to the range of valid preference weight values (-0.56 to 1.00). Estimated EQ-5D-3L responses under-reported disability for all items except pain/discomfort.
Conclusions
Caution should be taken when using EQ-5D-3L responses mapped from the SF-12 to describe patient outcomes or to undertake economic evaluation, because mapped values underestimate disability. When combining data from studies that use either instrument, the findings from this study could be used to adjust EQ-5D-3L preference weights estimated from SF-12 item responses.
Electronic supplementary material
The online version of this article (doi:10.1186/s12963-015-0047-z) contains supplementary material, which is available to authorized users.
doi:10.1186/s12963-015-0047-z
PMCID: PMC4474565  PMID: 26097435
Injury; Agreement; SF-12; EQ-5D; Quality of life
9.  An empirical comparison of the OPQoL-Brief, EQ-5D-3 L and ASCOT in a community dwelling population of older people 
Background
This study examined the relationships between a newly developed older person-specific, non-preference-based quality of life (QoL) instrument (the Older People’s Quality of Life brief questionnaire [OPQoL-Brief]) and two generic preference-based instruments (the three-level EQ-5D [EQ-5D-3L] and the Adult Social Care Outcomes Toolkit [ASCOT]) in a community-dwelling population of Australian older people receiving aged care services.
Methods
We formulated hypotheses about convergent validity between the instruments (examined with Wilcoxon-Mann-Whitney, Kruskal-Wallis, and Spearman correlation tests) and about levels of agreement (assessed using the intraclass correlation coefficient [ICC] and modified Bland-Altman plots based on normalized Z-scores of the EQ-5D-3L and ASCOT utilities and the OPQoL-Brief summary scores).
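Because the three instruments are on different scales, agreement was assessed on normalized Z-scores. The abstract does not describe exactly how the Bland-Altman plots were modified; the sketch below assumes the simplest version: standardize each instrument's scores, take pairwise differences, form 95% limits of agreement as the mean difference ± 1.96 SD, and report the share of differences outside them. The function names and inputs are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize scores to mean 0 and sample SD 1 so instruments share a scale."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def bland_altman_z(scores_a, scores_b, z=1.96):
    """Bland-Altman agreement on Z-normalized scores from two instruments.

    Returns (bias, (lower, upper) limits of agreement, fraction outside limits).
    After normalization the bias is ~0 by construction, so the width of the
    limits and the share of points outside them carry the information.
    """
    za, zb = z_scores(scores_a), z_scores(scores_b)
    diffs = [a - b for a, b in zip(za, zb)]
    bias = mean(diffs)
    sd = stdev(diffs)
    lower, upper = bias - z * sd, bias + z * sd
    outside = sum(d < lower or d > upper for d in diffs) / len(diffs)
    return bias, (lower, upper), outside
```

A plot of each difference against the mean of the two Z-scores would then give the scatter the abstract describes; the plotting step is omitted here.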
Results
The utilities and summary scores for 87 participants (aged 65–93 years) were positively and moderately correlated. Moderate convergent validity was evident for a number of instrument dimensions, with the strongest relationship (r = 0.57) between ‘enjoy life’ (OPQoL-Brief) and ‘social contact’ (ASCOT). The overall ICC was 0.54, and Bland-Altman scatter plots showed 3–6% of normalized Z-scores outside the 95% limits of agreement, suggesting moderate agreement between all three instruments (highest between the OPQoL-Brief and the ASCOT).
Conclusions
Our results suggest that the OPQoL-Brief, the ASCOT and the EQ-5D-3L are suitable for measuring quality of life outcomes in community-dwelling populations of older people. Given the different constructs underpinning these instruments, we recommend that the choice of instrument be guided by the context in which it is being applied. Currently, the OPQoL-Brief is not suitable for use in cost-utility analyses because it is not preference-based. Given their different perspectives, we recommend that the ASCOT and the EQ-5D be applied simultaneously to capture broader aspects of quality of life and health status in cost-utility analyses within the aged care sector. Future research directed towards developing a new single preference-based instrument that incorporates both health status and broader aspects of quality of life in quality-adjusted life year calculations for older people would be beneficial.
doi:10.1186/s12955-015-0357-7
PMCID: PMC4588872  PMID: 26420314
EQ-5D-3 L; OPQoL-Brief; ASCOT; Convergent validity; Level of agreement; Community-dwelling; Older people
10.  Internet-Based Device-Assisted Remote Monitoring of Cardiovascular Implantable Electronic Devices 
Executive Summary
Objective
The objective of this Medical Advisory Secretariat (MAS) report was to conduct a systematic review of the available published evidence on the safety, effectiveness, and cost-effectiveness of Internet-based device-assisted remote monitoring systems (RMSs) for therapeutic cardiac implantable electronic devices (CIEDs) such as pacemakers (PMs), implantable cardioverter-defibrillators (ICDs), and cardiac resynchronization therapy (CRT) devices. The MAS evidence-based review was performed to support public financing decisions.
Clinical Need: Condition and Target Population
Sudden cardiac death (SCD) is a major cause of fatalities in developed countries. In the United States, almost half a million people die of SCD annually, more than the deaths from stroke, lung cancer, breast cancer, and AIDS combined. In Canada, more than 40,000 people die each year from a cardiovascular-related cause; approximately half of these deaths are attributable to SCD.
Most cases of SCD occur in the general population, typically in people without a known history of heart disease. Most SCDs are caused by cardiac arrhythmia, an abnormal heart rhythm resulting from malfunction of the heart’s electrical system. Up to half of patients with significant heart failure (HF) also have advanced conduction abnormalities.
Cardiac arrhythmias are managed by a variety of drugs, ablative procedures, and therapeutic CIEDs. The range of CIEDs includes pacemakers (PMs), implantable cardioverter-defibrillators (ICDs), and cardiac resynchronization therapy (CRT) devices. Bradycardia is the main indication for PMs and individuals at high risk for SCD are often treated by ICDs.
Heart failure (HF) is also a significant health problem and is the most frequent cause of hospitalization in those over 65 years of age. Patients with moderate to severe HF may also have cardiac arrhythmias, although the cause may be related more to heart pump or haemodynamic failure. The presence of HF, however, increases the risk of SCD five-fold, regardless of aetiology. Patients with HF who remain highly symptomatic despite optimal drug therapy are sometimes also treated with CRT devices.
With an increasing prevalence of age-related conditions such as chronic HF and the expanding indications for ICD therapy, the rate of ICD placement has increased dramatically. The appropriate indications for ICD placement, as well as the rate of ICD placement, are increasingly a concern. In the United States, after the introduction of expanded coverage of ICDs, a national ICD registry was created in 2005 to track these devices. A recent survey based on this national ICD registry reported that 22.5% (25,145) of patients had received a non-evidence-based ICD and that these patients experienced significantly higher in-hospital mortality and post-procedural complications.
In addition to the increased ICD device placement and the upfront device costs, there is the need for lifelong follow-up or surveillance, placing a significant burden on patients and device clinics. In 2007, over 1.6 million CIEDs were implanted in Europe and the United States, which translates to over 5.5 million patient encounters per year if the recommended follow-up practices are considered. A safe and effective RMS could potentially improve the efficiency of long-term follow-up of patients and their CIEDs.
Technology
In addition to being therapeutic devices, CIEDs have extensive diagnostic abilities. All CIEDs can be interrogated and reprogrammed during an in-clinic visit using an inductive programming wand. Remote monitoring would allow patients to transmit information recorded in their devices from their own homes. Currently, most ICDs also have the capability to be monitored remotely. Remote monitoring (RM) can be used to check system integrity, to alert clinicians to arrhythmic episodes, and potentially to replace in-clinic follow-ups and manage disease remotely. CIEDs cannot currently be reprogrammed remotely, although this feature is being tested in pilot settings.
Every RMS is designed by a manufacturer specifically for its own cardiac implant devices. For Internet-based device-assisted RMSs, this customization includes details such as the web application, multiplatform sensors, custom algorithms, programming information, and the types and methods of alerting patients and/or physicians. The addition of peripherals for monitoring weight and pressure, or for communicating with patients through the onsite communicators, also varies by manufacturer. Internet-based device-assisted RMSs for CIEDs are intended to function as a surveillance system rather than an emergency system.
Health care providers therefore need to learn each application, and as more than one application may be used at one site, multiple applications may need to be reviewed for alarms. All RMSs deliver system integrity alerting; however, some systems seem to be better geared to fast arrhythmic alerting, whereas other systems appear to be more intended for remote follow-up or supplemental remote disease management. The different RMSs may therefore have different impacts on workflow organization because of their varying frequency of interrogation and methods of alerts. The integration of these proprietary RM web-based registry systems with hospital-based electronic health record systems has so far not been commonly implemented.
Currently there are 2 general types of RMSs: those that transmit device diagnostic information automatically and without patient assistance to secure Internet-based registry systems, and those that require patient assistance to transmit information. Both systems employ the use of preprogrammed alerts that are either transmitted automatically or at regular scheduled intervals to patients and/or physicians.
The current web applications, programming, and registry systems differ greatly between the manufacturers of transmitting cardiac devices. In Canada there are currently 4 manufacturers—Medtronic Inc., Biotronik, Boston Scientific Corp., and St Jude Medical Inc.—which have regulatory approval for remote transmitting CIEDs. Remote monitoring systems are proprietary to the manufacturer of the implant device. An RMS for one device will not work with another device, and the RMS may not work with all versions of the manufacturer’s devices.
All Internet-based device-assisted RMSs have common components. The implanted device is equipped with a micro-antenna that communicates with a small external device (at bedside or wearable) commonly known as the transmitter. Transmitters are able to interrogate programmed parameters and diagnostic data stored in the patients’ implant device. The information transfer to the communicator can occur at preset time intervals with the participation of the patient (waving a wand over the device) or it can be sent automatically (wirelessly) without their participation. The encrypted data are then uploaded to an Internet-based database on a secure central server. The data processing facilities at the central database, depending on the clinical urgency, can trigger an alert for the physician(s) that can be sent via email, fax, text message, or phone. The details are also posted on the secure website for viewing by the physician (or their delegate) at their convenience.
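The central-server step described above (every transmission is posted to the secure website, and clinically urgent findings additionally trigger an alert by email, fax, text message, or phone) can be sketched as a simple routing rule. Everything in this sketch is hypothetical: the urgency tiers, the channel assignments, and the names `Urgency`, `Transmission`, and `route` are illustrative assumptions rather than any manufacturer's RMS logic, and encryption and transport are omitted.

```python
from dataclasses import dataclass
from enum import Enum


class Urgency(Enum):
    # Hypothetical urgency tiers; real systems define their own alert categories.
    ROUTINE = 1
    ELEVATED = 2
    CRITICAL = 3


# Hypothetical channel policy; in practice this is configured per manufacturer and clinic.
ALERT_CHANNELS = {
    Urgency.ROUTINE: [],
    Urgency.ELEVATED: ["email", "fax"],
    Urgency.CRITICAL: ["text_message", "phone"],
}


@dataclass
class Transmission:
    """A decrypted device transmission as received by the (hypothetical) central server."""
    patient_id: str
    finding: str
    urgency: Urgency


def route(tx: Transmission) -> list:
    """Post every transmission to the secure website; push alerts only when urgent."""
    actions = ["post_to_secure_website"]
    actions += [f"notify_by_{channel}" for channel in ALERT_CHANNELS[tx.urgency]]
    return actions
```

Separating "always post" from "push by urgency" mirrors the surveillance (not emergency) role described for these systems: routine data wait on the website for review, while only selected findings interrupt the clinician.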
Research Questions
The research directions and specific research questions for this evidence review were as follows:
To identify the Internet-based device-assisted RMSs available for follow-up of patients with therapeutic CIEDs such as PMs, ICDs, and CRT devices.
To identify the potential risks, operational issues, or organizational issues related to Internet-based device-assisted RM for CIEDs.
To evaluate the safety, acceptability, and effectiveness of Internet-based device-assisted RMSs for CIEDs such as PMs, ICDs, and CRT devices.
To evaluate the safety, effectiveness, and cost-effectiveness of Internet-based device-assisted RMSs for CIEDs compared to usual outpatient in-office monitoring strategies.
To evaluate the resource implications or budget impact of RMSs for CIEDs in Ontario, Canada.
Research Methods
Literature Search
The review included a systematic review of published scientific literature and consultations with experts and manufacturers of all 4 approved RMSs for CIEDs in Canada. Information on CIED cardiac implant clinics was also obtained from Provincial Programs, a division within the Ministry of Health and Long-Term Care with a mandate for cardiac implant specialty care. Various administrative databases and registries were used to outline the current clinical follow-up burden of CIEDs in Ontario. The provincial population-based ICD database developed and maintained by the Institute for Clinical Evaluative Sciences (ICES) was used to review the current follow-up practices with Ontario patients implanted with ICD devices.
Search Strategy
A literature search was performed on September 21, 2010, using OVID MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations, EMBASE, the Cumulative Index to Nursing & Allied Health Literature (CINAHL), the Cochrane Library, and the International Network of Agencies for Health Technology Assessment (INAHTA) for studies published from 1950 to September 2010. Search alerts were generated and reviewed for additional relevant literature until December 31, 2010. Abstracts were reviewed by a single reviewer, and full-text articles were obtained for studies meeting the eligibility criteria. Reference lists were also examined for any additional relevant studies not identified through the search.
Inclusion Criteria
published between 1950 and September 2010;
English language full-reports and human studies;
original reports including clinical evaluations of Internet-based device-assisted RMSs for CIEDs in clinical settings;
reports including standardized measurements on outcome events such as technical success, safety, effectiveness, cost, measures of health care utilization, morbidity, mortality, quality of life or patient satisfaction;
randomized controlled trials (RCTs), systematic reviews and meta-analyses, cohort and controlled clinical studies.
Exclusion Criteria
non-systematic reviews, letters, comments and editorials;
reports not involving standardized outcome events;
clinical reports not involving Internet-based device assisted RM systems for CIEDs in clinical settings;
reports involving studies testing or validating algorithms without RM;
studies with small samples (<10 subjects).
Outcomes of Interest
The outcomes of interest included: technical outcomes, emergency department visits, complications, major adverse events, symptoms, hospital admissions, clinic visits (scheduled and/or unscheduled), survival, morbidity (disease progression, stroke, etc.), patient satisfaction, and quality of life.
Summary of Findings
The MAS evidence review was performed to review available evidence on Internet-based device-assisted RMSs for CIEDs published until September 2010. The search identified 6 systematic reviews, 7 randomized controlled trials, and 19 reports for 16 cohort studies—3 of these being registry-based and 4 being multi-centered. The evidence is summarized in the 3 sections that follow.
1. Effectiveness of Remote Monitoring Systems of CIEDs for Cardiac Arrhythmia and Device Functioning
In total, 15 reports on 13 cohort studies involving 4 different RMSs for CIEDs in cardiology implant clinic groups were identified in the review. The 4 RMSs were: Care Link Network® (Medtronic Inc., Minneapolis, MN, USA); Home Monitoring® (Biotronik, Berlin, Germany); House Call II® (St Jude Medical Inc., St Paul, MN, USA); and a manufacturer-independent RMS. Eight of these reports were on the Home Monitoring® RMS (12,949 patients), 3 on the Care Link® RMS (167 patients), 1 on the House Call II® RMS (124 patients), and 1 on a manufacturer-independent RMS (44 patients). All of the studies except 2 in the United States (1 with Home Monitoring® and 1 with House Call II®) were performed in European countries.
The RMSs in the studies were evaluated with different cardiac implant device populations: ICDs only (6 studies), ICD and CRT devices (3 studies), PM and ICD and CRT devices (4 studies), and PMs only (2 studies). The patient populations were predominantly male (range, 52%–87%) in all studies, with mean ages ranging from 58 to 76 years. One study population was unique in that RMSs were evaluated for ICDs implanted solely for primary prevention in young patients (mean age, 44 years) with Brugada syndrome, an inherited condition that increases the risk of sudden cardiac death in young adults.
Most of the cohort studies reported on the feasibility of RMSs in clinical settings with limited follow-up. In the short follow-up periods of the studies, the majority of the events were related to detection of medical events rather than system configuration or device abnormalities. The results of the studies are summarized below:
The interrogation of devices on the web platform, both for continuous and scheduled transmissions, was significantly quicker with remote follow-up, both for nurses and physicians.
In a case-control study focusing on a Brugada population–based registry with patients followed-up remotely, there were significantly fewer outpatient visits and greater detection of inappropriate shocks. One death occurred in the control group not followed remotely and post-mortem analysis indicated early signs of lead failure prior to the event.
Two studies examined the role of RMSs in following ICD leads under regulatory advisory in a European clinical setting and noted:
– Fewer inappropriate shocks were administered in the RM group.
– Urgent in-office interrogations and surgical revisions were performed within 12 days of remote alerts.
– No signs of lead fracture were detected at in-office follow-up; all were detected at remote follow-up.
Only 1 study reported evaluating quality of life in patients followed up remotely at 3 and 6 months; no values were reported.
Patient satisfaction was evaluated in 5 cohort studies, all with short-term follow-up: 1 for the Home Monitoring® RMS, 3 for the Care Link® RMS, and 1 for the House Call II® RMS.
– Patients reported receiving a sense of security from the transmitter, a good relationship with nurses and physicians, positive implications for their health, and satisfaction with RM and organization of services.
– Although patients reported that the system was easy to implement and required less than 10 minutes to transmit information, a variable proportion of patients (range, 9%–39%) reported that they needed the assistance of a caregiver for their transmission.
– The majority of patients would recommend RM to other ICD patients.
– Patients with hearing impairment or other physical or mental conditions hindering use of the system were excluded from studies, but the frequency of such exclusions was not reported.
Physician satisfaction was evaluated in 3 studies, all with the Care Link® RMS:
– Physicians reported an ease of use and high satisfaction with a generally short-term use of the RMS.
– Physicians reported being able to address the problems in unscheduled patient transmissions or physician initiated transmissions remotely, and were able to handle the majority of the troubleshooting calls remotely.
– Both nurses and physicians reported a high level of satisfaction with the web registry system.
2. Effectiveness of Remote Monitoring Systems in Heart Failure Patients for Cardiac Arrhythmia and Heart Failure Episodes
Remote follow-up of HF patients implanted with ICD or CRT devices, generally managed in specialized HF clinics, was evaluated in 3 cohort studies: 1 involved the Home Monitoring® RMS and 2 involved the Care Link® RMS. In these RMSs, in addition to the standard diagnostic features, the cardiac devices continuously assess other variables such as patient activity, mean heart rate, and heart rate variability. Intra-thoracic impedance, a proxy measure for lung fluid overload, was also measured in the Care Link® studies. The overall diagnostic performance of these measures cannot be evaluated, as the information was not reported for patients who did not experience intra-thoracic impedance threshold crossings or did not undergo interventions. The trial results involved descriptive information on transmissions and alerts in patients experiencing high morbidity and hospitalization in the short study periods.
3. Comparative Effectiveness of Remote Monitoring Systems for CIEDs
Seven RCTs were identified evaluating RMSs for CIEDs: 2 were for PMs (1276 patients) and 5 were for ICD/CRT devices (3733 patients). Studies performed in the clinical setting in the United States involved both the Care Link® RMS and the Home Monitoring® RMS, whereas all studies performed in European countries involved only the Home Monitoring® RMS.
3A. Randomized Controlled Trials of Remote Monitoring Systems for Pacemakers
Two trials, both multicenter RCTs, were conducted in different countries with different RMSs and study objectives. The PREFER trial was a large trial (897 patients) performed in the United States examining the ability of Care Link®, an Internet-based remote PM interrogation system, to detect clinically actionable events (CAEs) sooner than the current in-office follow-up supplemented with transtelephonic monitoring transmissions, a limited form of remote device interrogation. The trial results are summarized below:
In the 375-day mean follow-up, 382 patients were identified with at least 1 CAE—111 patients in the control arm and 271 in the remote arm.
The event rate detected per patient for every type of CAE, except for loss of atrial capture, was higher in the remote arm than the control arm.
The median time to first detection of CAEs (4.9 vs. 6.3 months) was significantly shorter in the RMS group compared to the control group (P < 0.0001).
Additionally, only 2% (3/190) of the CAEs in the control arm were detected during a transtelephonic monitoring transmission (the rest were detected at in-office follow-ups), whereas 66% (446/676) of the CAEs in the remote arm were detected during remote interrogation.
The second study, the OEDIPE trial, was a smaller trial (379 patients) performed in France evaluating the ability of the Home Monitoring® RMS to shorten PM post-operative hospitalization while preserving the safety of conventional management of longer hospital stays.
Implementation and operationalization of the RMS was reported to be successful in 91% (346/379) of the patients and represented 8144 transmissions.
In the RM group, 6.5% of patients failed to send messages (10 due to improper use of the transmitter, 2 due to unmanageable stress). Of the 172 patients transmitting, 108 sent a total of 167 warnings during the trial, with a greater proportion of warnings attributed to medical rather than technical causes.
Forty percent of patients had no warning message transmission; among these, 6 patients experienced a major adverse event and 1 patient experienced a non-major adverse event. Of the 6 patients with a major adverse event, 5 contacted their physician.
The mean medical reaction time was faster in the RM group (6.5 ± 7.6 days vs. 11.4 ± 11.6 days).
The mean duration of hospitalization was significantly shorter (P < 0.001) for the RM group than the control group (3.2 ± 3.2 days vs. 4.8 ± 3.7 days).
Quality of life, as measured by the SF-36 questionnaire, was similar in the 2 groups at 1-month follow-up.
3B. Randomized Controlled Trials Evaluating Remote Monitoring Systems for ICD or CRT Devices
The 5 studies evaluating the impact of RMSs with ICD/CRT devices were conducted in the United States and in European countries and involved 2 RMSs: Care Link® and Home Monitoring®. The objectives of the trials varied, and 3 of the trials were smaller pilot investigations.
The first of the smaller studies (151 patients) evaluated patient satisfaction, achievement of patient outcomes, and the cost-effectiveness of the Care Link® RMS compared to quarterly in-office device interrogations with 1-year follow-up.
Individual outcomes such as hospitalizations, emergency department visits, and unscheduled clinic visits were not significantly different between the study groups.
Except for a significantly higher detection of atrial fibrillation in the RM group, data on ICD detection and therapy were similar in the study groups.
Health-related quality of life evaluated by the EuroQoL at 6-month or 12-month follow-up was not different between study groups.
Patients were more satisfied with their ICD care in the clinic follow-up group than in the remote follow-up group at 6-month follow-up, but were equally satisfied at 12-month follow-up.
The second small pilot trial (20 patients) examined the impact of RM follow-up with the House Call II® system on work schedules and cost savings in patients randomized to 2 study arms differing in the degree of remote follow-up.
The total time including device interrogation, transmission time, data analysis, and physician time required was significantly shorter for the RM follow-up group.
The in-clinic waiting time was eliminated for patients in the RM follow-up group.
The physician talk time was significantly reduced in the RM follow-up group (P < 0.05).
The time for the actual device interrogation did not differ in the study groups.
The third small trial (115 patients) examined the impact of RM with the Home Monitoring® system compared to scheduled trimonthly in-clinic visits on the number of unplanned visits, total costs, health-related quality of life (SF-36), and overall mortality.
There was a 63.2% reduction in in-office visits in the RM group.
Hospitalizations or overall mortality (values not stated) were not significantly different between the study groups.
Patient-induced visits were higher in the RM group than the in-clinic follow-up group.
The TRUST Trial
The TRUST trial was a large multicenter RCT conducted at 102 centers in the United States involving the Home Monitoring® RMS for ICD devices for 1450 patients. The primary objectives of the trial were to determine if remote follow-up could be safely substituted for in-office clinic follow-up (3 in-office visits replaced) and still enable earlier physician detection of clinically actionable events.
Adherence to the protocol follow-up schedule was significantly higher in the RM group than the in-office follow-up group (93.5% vs. 88.7%, P < 0.001).
Actionability of trimonthly scheduled checks was low (6.6%) in both study groups. Overall, actionable causes were reprogramming (76.2%), medication changes (24.8%), and lead/system revisions (4%), and these were not different between the 2 study groups.
The overall mean number of in-clinic and hospital visits was significantly lower in the RM group than the in-office follow-up group (2.1 per patient-year vs. 3.8 per patient-year, P < 0.001), representing a 45% visit reduction at 12 months.
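The 45% figure can be checked as a relative reduction, taking the in-office follow-up rate as the baseline. This is a worked check that uses only the two rates reported above:

```latex
\text{relative reduction} = \frac{3.8 - 2.1}{3.8} = \frac{1.7}{3.8} \approx 0.447 \approx 45\%
```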
The median time from onset of first arrhythmia to physician evaluation was significantly shorter (P < 0.001) in the RM group than in the in-office follow-up group for all arrhythmias (1 day vs. 35.5 days).
The median time to detect clinically asymptomatic arrhythmia events—atrial fibrillation (AF), ventricular fibrillation (VF), ventricular tachycardia (VT), and supra-ventricular tachycardia (SVT)—was also significantly shorter (P < 0.001) in the RM group compared to the in-office follow-up group (1 day vs. 41.5 days) and was significantly quicker for each of the clinical arrhythmia events—AF (5.5 days vs. 40 days), VT (1 day vs. 28 days), VF (1 day vs. 36 days), and SVT (2 days vs. 39 days).
System-related problems occurred infrequently in both groups—in 1.5% of patients (14/908) in the RM group and in 0.7% of patients (3/432) in the in-office follow-up group.
The overall adverse event rate over 12 months was not significantly different between the 2 groups and individual adverse events were also not significantly different between the RM group and the in-office follow-up group: death (3.4% vs. 4.9%), stroke (0.3% vs. 1.2%), and surgical intervention (6.6% vs. 4.9%), respectively.
The 12-month cumulative survival was 96.4% (95% confidence interval [CI], 95.5%–97.6%) in the RM group and 94.2% (95% CI, 91.8%–96.6%) in the in-office follow-up group, and was not significantly different between the 2 groups (P = 0.174).
The CONNECT Trial
The CONNECT trial, another major multicenter RCT, involved the Care Link® RMS for ICD/CRT devices in a 15-month follow-up study of 1,997 patients at 133 sites in the United States. The primary objective of the trial was to determine whether automatically transmitted physician alerts decreased the time from the occurrence of clinically relevant events to medical decisions. The trial results are summarized below:
Of the 575 clinical alerts generated in the study, 246 did not result in an automatic physician alert. These transmission failures were related to technical issues, such as the alert not being programmed or not being reset, and/or to patient factors, such as the patient not being at home or the monitor not being plugged in or set up.
The overall mean time from the clinically relevant event to the clinical decision was significantly shorter (P < 0.001) by 17.4 days in the remote follow-up group (4.6 days for 172 patients) than the in-office follow-up group (22 days for 145 patients).
– The median time to a clinical decision was shorter in the remote follow-up group than in the in-office follow-up group for an atrial tachycardia/atrial fibrillation (AT/AF) burden of 12 hours or more (3 days vs. 24 days) and for a fast ventricular rate during AT/AF of 120 beats per minute or more (4 days vs. 23 days).
Events involving low battery and VF detection/therapy turned off were infrequent, with similarly low numbers in both groups. More alerts, however, were noted for out-of-range lead impedance in the RM group (18 vs. 6 patients), and the time to detect these critical events was significantly shorter in the RM group (same day vs. 17 days).
Total in-office clinic visits were reduced by 38% from 6.27 visits per patient-year in the in-office follow-up group to 3.29 visits per patient-year in the remote follow-up group.
Health care utilization visits (N = 6,227) that included cardiovascular-related hospitalization, emergency department visits, and unscheduled clinic visits were not significantly higher in the remote follow-up group.
The overall mean length of hospitalization was significantly shorter (P = 0.002) for those in the remote follow-up group (3.3 days vs. 4.0 days) and was shorter both for patients with ICD (3.0 days vs. 3.6 days) and CRT (3.8 days vs. 4.7 days) implants.
Mortality did not differ significantly between the follow-up groups for patients with ICDs (P = 0.31) or with CRT devices with defibrillator (P = 0.46).
Conclusions
There is limited clinical trial information on the effectiveness of RMSs for PMs. However, for RMSs for ICD devices, multiple cohort studies and 2 large multicenter RCTs demonstrated feasibility and significant reductions in in-office clinic follow-ups with RMSs in the first year post implantation. The detection rates of clinically significant events (and asymptomatic events) were higher, and the time to a clinical decision for these events was significantly shorter, in the remote follow-up groups than in the in-office follow-up groups. The earlier detection of clinical events in the remote follow-up groups, however, was not associated with lower morbidity or mortality rates in the 1-year follow-up. The substitution of almost all first-year in-office clinic follow-ups with RM was also not associated with increased health care utilization such as emergency department visits or hospitalizations.
The follow-up in the trials was generally short-term, up to 1 year, and therefore provided only a limited assessment of potential longer-term device/lead integrity complications or issues. None of the studies compared the different RMSs, particularly the different RMSs involving patient-scheduled transmissions or automatic transmissions. Patients’ acceptance of and satisfaction with RM were reported to be high, but the impact of RM on patients’ health-related quality of life, particularly the psychological aspects, was not evaluated thoroughly. Patients who are not technologically competent, or who have hearing or other physical/mental impairments, were identified as potentially disadvantaged with remote surveillance. Cohort studies consistently identified subgroups of patients who preferred in-office follow-up. Costs and workflow impact on the health care system were evaluated only in a limited way, and only in European or American clinical settings.
Internet-based device-assisted RMSs involve a new approach to monitoring patients, their disease progression, and their CIEDs. Remote monitoring also has the potential to improve the current postmarket surveillance systems of evolving CIEDs and their ongoing hardware and software modifications. At this point, however, there is insufficient information to evaluate the overall impact to the health care system, although the time saving and convenience to patients and physicians associated with a substitution of in-office follow-up by RM is more certain. The broader issues surrounding infrastructure, impacts on existing clinical care systems, and regulatory concerns need to be considered for the implementation of Internet-based RMSs in jurisdictions involving different clinical practices.
PMCID: PMC3377571  PMID: 23074419
11.  A quantitative definition of scaphoid union: determining the inter-rater reliability of two techniques 
Background
Despite extensive literature supporting the use of computerized tomography (CT) scans in evaluating scaphoid fractures, there has not been a consensus on the methodology for defining and quantifying union. The purpose of this study was to test the inter-observer reliability of two methods of quantifying scaphoid union.
Methods
The CT scans of 50 non-operatively treated scaphoid fractures were reviewed by four blinded observers. Each was asked to classify union into one of three categories (united, partially united, or tenuously united) based on their general impression. Each reviewer then carefully analyzed each CT slice and quantified union using two methods: the mean percentage union and the weighted mean percentage union. The estimated percentage of scaphoid union for each scan was recorded, and inter-observer reliability for both methods was assessed using a Bland-Altman plot to calculate the 95% limits of agreement. The kappa statistic was used to measure the degree of agreement for the categorical assessment of union.
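The following is a minimal sketch of the Bland-Altman 95% limits of agreement described above. It assumes paired percentage-union scores from the two methods for the same scans. Differences are taken as the first method minus the second, and they are assumed to be approximately normally distributed, which is the usual basis for the 1.96 multiplier. The function name `bland_altman_limits` and the five example scores are hypothetical and illustrative only, not data from the study. The plot itself (difference against mean) is not drawn here.

```python
import statistics


def bland_altman_limits(method_a, method_b):
    """Return (mean difference, lower limit, upper limit) for paired measurements.

    Differences are method_a - method_b; the 95% limits of agreement are
    mean difference +/- 1.96 * sample SD of the differences.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Hypothetical percentage-union scores for five scans (illustrative only).
mean_pct = [100.0, 85.0, 60.0, 40.0, 95.0]
weighted_pct = [98.0, 84.0, 58.0, 42.0, 94.0]
bias, lower, upper = bland_altman_limits(mean_pct, weighted_pct)
print(f"bias={bias:.2f}, 95% LoA=({lower:.2f}, {upper:.2f})")
```

A narrow interval centred near zero, as the study reports for its two methods, indicates that the methods can be used interchangeably within that margin.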
Results
There was very little difference in the percentage of union calculated between the two methods (mean difference between the two methods was 1.2 ± 4.1%), with each reviewer demonstrating excellent agreement between the two methods based on the Bland-Altman plot. The kappa score indicated very good agreement (κ = 0.80) between the consultant hand surgeon and the musculoskeletal radiologist, and good agreement (κ = 0.62) between the consultant hand surgeon and the hand fellow for the categorical assessment of union.
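Cohen's kappa corrects the observed proportion of agreement for the agreement expected by chance from each rater's marginal category frequencies. The sketch below assumes the unweighted form for two raters, because the abstract does not say whether a weighted kappa was used. `cohens_kappa` and the six example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and paired")
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    categories = set(counts1) | set(counts2)
    # Chance agreement: product of the two raters' marginal proportions per category.
    expected = sum((counts1[c] / n) * (counts2[c] / n) for c in categories)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of six scans into the study's three union categories.
surgeon = ["united", "united", "partially united",
           "tenuously united", "partially united", "united"]
radiologist = ["united", "partially united", "partially united",
               "tenuously united", "partially united", "united"]
print(round(cohens_kappa(surgeon, radiologist), 3))
```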
Conclusions
This study describes two methods of quantifying and defining scaphoid union, both with high inter-rater reliability. This indicates that either method can be used reliably, making them important tools for both clinical use and research, particularly in future studies of scaphoid fractures that use union or time to union as their endpoint.
Level of evidence
Diagnostic, level III
doi:10.1186/1749-799X-8-28
PMCID: PMC3765287  PMID: 23961919
Inter-rater reliability; Partial union; Scaphoid; Union
12.  A multivariate hierarchical Bayesian approach to measuring agreement in repeated measurement method comparison studies 
Background
Assessing agreement in method comparison studies depends on two fundamentally important components: validity (the between-method agreement) and reproducibility (the within-method agreement). The Bland-Altman limits of agreement technique is one of the favoured approaches in the medical literature for assessing between-method validity. However, few researchers have adopted this approach for the assessment of both validity and reproducibility. This may be partly due to a lack of flexible, easily implemented and readily available statistical machinery to analyse repeated measurement method comparison data.
Methods
Adopting the Bland-Altman framework, but using Bayesian methods, we present this statistical machinery. Two multivariate hierarchical Bayesian models are advocated, one which assumes that the underlying values for subjects remain static (exchangeable replicates) and one which assumes that the underlying values can change between repeated measurements (non-exchangeable replicates).
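The Bayesian models themselves need an MCMC engine such as WinBUGS and are not reproduced here. As a much simpler classical point of comparison for the within-method (reproducibility) component, the sketch below computes the pooled within-subject standard deviation s_w and the Bland-Altman repeatability coefficient, 1.96·√2·s_w. It assumes exchangeable replicates whose underlying value does not change between readings. This is a swapped-in non-Bayesian technique, not the authors' method, and the function names and example data are hypothetical.

```python
import math
import statistics


def within_subject_sd(replicates_by_subject):
    """Pooled within-subject SD from repeated readings by one method.

    replicates_by_subject: one inner list of replicate readings per subject;
    subjects may have different numbers of replicates (unbalanced data).
    Equals the square root of the residual mean square of a one-way ANOVA
    with subject as the factor.
    """
    ss_within = 0.0
    df_within = 0
    for readings in replicates_by_subject:
        if len(readings) < 2:
            continue  # a single reading carries no within-subject information
        m = statistics.mean(readings)
        ss_within += sum((x - m) ** 2 for x in readings)
        df_within += len(readings) - 1
    if df_within == 0:
        raise ValueError("need at least one subject with two or more replicates")
    return math.sqrt(ss_within / df_within)


def repeatability_coefficient(replicates_by_subject):
    """1.96 * sqrt(2) * s_w: about 95% of absolute differences between two
    readings on the same subject are expected to fall below this value."""
    return 1.96 * math.sqrt(2) * within_subject_sd(replicates_by_subject)


# Hypothetical unbalanced data: three subjects with 3, 2 and 1 replicates.
data = [[10.0, 12.0, 11.0], [20.0, 21.0], [15.0]]
print(round(within_subject_sd(data), 3), round(repeatability_coefficient(data), 3))
```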
Results
We illustrate the salient advantages of these models using two separate datasets that have been previously analysed and presented: (i) one assuming static underlying values, analysed using both multivariate hierarchical Bayesian models, and (ii) one assuming that each subject's underlying value is a continually changing quantity, analysed using the non-exchangeable replicate multivariate hierarchical Bayesian model.
Conclusion
These easily implemented models allow for full parameter uncertainty and simultaneous method comparison, handle unbalanced or missing data, and provide estimates and credible regions for all the parameters of interest. Computer code for the analyses is also presented, written for the freely available and currently cost-free software package WinBUGS.
doi:10.1186/1471-2288-9-6
PMCID: PMC2645135  PMID: 19161599
13.  Reliability and Validity of the Transport and Physical Activity Questionnaire (TPAQ) for Assessing Physical Activity Behaviour 
PLoS ONE  2014;9(9):e107039.
Background
No current validated survey instrument allows a comprehensive assessment of both physical activity and travel behaviours for use in interdisciplinary research on walking and cycling. This study reports on the test-retest reliability and validity of physical activity measures in the transport and physical activity questionnaire (TPAQ).
Methods
The TPAQ assesses time spent in different domains of physical activity and using different modes of transport for five journey purposes. Test-retest reliability of eight physical activity summary variables was assessed using intra-class correlation coefficients (ICC) and Kappa scores for continuous and categorical variables respectively. In a separate study, the validity of three survey-reported physical activity summary variables was assessed by computing Spearman correlation coefficients using accelerometer-derived reference measures. The Bland-Altman technique was used to determine the absolute validity of survey-reported time spent in moderate-to-vigorous physical activity (MVPA).
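The abstract does not state which ICC form was used. As an illustration only, the sketch below assumes the one-way random-effects, single-measure ICC (often labelled ICC(1,1)), computed from ANOVA mean squares for test-retest data. Other forms (two-way, absolute agreement or consistency) give different values. `icc_oneway` and the example minutes per week are hypothetical.

```python
import statistics


def icc_oneway(scores):
    """One-way random-effects, single-measure ICC, often written ICC(1,1).

    scores: one row per participant, each row holding the same number k >= 2
    of repeated measurements (e.g. test and retest).
    """
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 participants, each with the same k >= 2 measurements")
    grand_mean = statistics.mean(x for row in scores for x in row)
    row_means = [statistics.mean(row) for row in scores]
    # Between-participant and within-participant sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(scores, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical test-retest minutes/week of walking for transport (illustrative only).
scores = [[60, 70], [120, 110], [30, 40], [200, 190]]
print(round(icc_oneway(scores), 3))
```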
Results
In the reliability study, ICC for time spent in different domains of physical activity ranged from fair to substantial for walking for transport (ICC = 0.59), cycling for transport (ICC = 0.61), walking for recreation (ICC = 0.48), cycling for recreation (ICC = 0.35), moderate leisure-time physical activity (ICC = 0.47), vigorous leisure-time physical activity (ICC = 0.63), and total physical activity (ICC = 0.56). The proportion of participants estimated to meet physical activity guidelines showed acceptable reliability (κ = 0.60). In the validity study, comparison of survey-reported and accelerometer-derived time spent in physical activity showed strong agreement for vigorous physical activity (r = 0.72, p<0.001), fair but non-significant agreement for moderate physical activity (r = 0.24, p = 0.09) and fair agreement for MVPA (r = 0.27, p = 0.05). Bland-Altman analysis showed a mean overestimation of MVPA of 87.6 min/week (p = 0.02) (95% limits of agreement −447.1 to +622.3 min/week).
Conclusion
The TPAQ provides a more comprehensive assessment of physical activity and travel behaviours and may be suitable for wider use. Its physical activity summary measures have comparable reliability and validity to those of similar existing questionnaires.
doi:10.1371/journal.pone.0107039
PMCID: PMC4162566  PMID: 25215510
14.  Percutaneous Vertebroplasty for Treatment of Painful Osteoporotic Vertebral Compression Fractures 
Executive Summary
Objective of Analysis
The objective of this analysis is to examine the safety and effectiveness of percutaneous vertebroplasty for treatment of osteoporotic vertebral compression fractures (VCFs) compared with conservative treatment.
Clinical Need and Target Population
Osteoporosis and associated fractures are important health issues in ageing populations. Vertebral compression fracture secondary to osteoporosis is a cause of morbidity in older adults. VCFs can affect both genders, but are more common among elderly females and can occur as a result of a fall or a minor trauma. The fracture may occur spontaneously during a simple activity such as picking up an object or rising up from a chair. Pain originating from the fracture site frequently increases with weight bearing. It is most severe during the first few weeks and decreases with rest and inactivity.
Traditional treatment of painful VCFs includes bed rest, analgesic use, back bracing and muscle relaxants. The comorbidities associated with VCFs include deep venous thrombosis, acceleration of osteopenia, loss of height, respiratory problems and emotional problems due to chronic pain.
Percutaneous vertebroplasty is a minimally invasive surgical procedure that has gained popularity as a new treatment option in the care of these patients. The technique of vertebroplasty was initially developed in France to treat osteolytic metastasis, myeloma, and hemangioma. The indications were further expanded to painful osteoporotic VCFs and subsequently to treatment of asymptomatic VCFs.
The mechanism of pain relief, which occurs within minutes to hours after vertebroplasty, is still not known. Pain pathways in the surrounding tissue appear to be altered in response to mechanical, chemical, vascular, and thermal stimuli after the injection of the cement. It has been suggested that mechanisms other than mechanical stabilization of the fracture, such as thermal injury to the nerve endings, result in immediate pain relief.
Percutaneous Vertebroplasty
Percutaneous vertebroplasty is performed with the patient in prone position and under local or general anesthesia. The procedure involves fluoroscopic imaging to guide the injection of bone cement into the fractured vertebral body to support the fractured bone. After injection of the cement, the patient is placed in supine position for about 1 hour while the cement hardens.
Cement leakage is the most frequent complication of vertebroplasty. The leakages may remain asymptomatic or cause symptoms of nerve irritation through compression of nerve roots. There are several reports of pulmonary cement embolism (PCE) following vertebroplasty. In some cases, the PCE may remain asymptomatic. Symptomatic PCE can be recognized by its clinical signs and symptoms, such as chest pain, dyspnea, tachypnea, cyanosis, coughing, hemoptysis, dizziness, and sweating.
Research Methods
Literature Search
A literature search was performed on Feb 9, 2010 using OVID MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations, EMBASE, the Cumulative Index to Nursing & Allied Health Literature (CINAHL), the Cochrane Library, and the International Network of Agencies for Health Technology Assessment (INAHTA) for studies published from January 1, 2005 to February 9, 2010.
Studies were initially reviewed by titles and abstracts. For those studies meeting the eligibility criteria, full-text articles were obtained and reviewed. Reference lists were also examined for any additional relevant studies not identified through the search. Articles with an unknown eligibility were reviewed with a second clinical epidemiologist and then a group of epidemiologists until consensus was established. Data extraction was carried out by the author.
Inclusion Criteria
Study design: Randomized controlled trials (RCTs) comparing vertebroplasty with a control group or other interventions
Study population: Adult patients with osteoporotic vertebral fractures
Study sample size: Studies included 20 or more patients
English language full-reports
Published between Jan 1, 2005 and Feb 9, 2010
(eligible studies identified through the Auto Alert function of the search were also included)
Exclusion Criteria
Non-randomized studies
Studies on conditions other than VCF (e.g. patients with multiple myeloma or metastatic tumors)
Studies focused on surgical techniques
Studies lacking outcome measures
Results of Evidence-Based Analysis
A systematic search yielded 168 citations. The titles and the abstracts of the citations were reviewed and full text of the identified citations was retrieved for further consideration. Upon review of the full publications and applying the inclusion and exclusion criteria, 5 RCTs were identified. Of these, two compared vertebroplasty with sham procedure, two compared vertebroplasty with conservative treatment, and one compared vertebroplasty with balloon kyphoplasty.
Randomized Controlled Trials
Recently, the results of two blinded randomized placebo-controlled trials of percutaneous vertebroplasty were reported. These trials, providing the highest quality of evidence available to date, do not support the use of vertebroplasty in patients with painful osteoporotic vertebral compression fractures. Based on the results of these trials, vertebroplasty offers no additional benefit over usual care and is not risk free.
In these trials the treatment allocation was blinded to the patients and outcome assessors. The control group received a sham procedure simulating vertebroplasty to minimize the effect of expectations and to reduce the potential for bias in self-reporting of outcomes. Both trials applied stringent exclusion criteria so that the results are generalizable to the patient populations that are candidates for vertebroplasty. In both trials vertebroplasty procedures were performed by highly skilled interventionists. Multiple valid outcome measures including pain, physical, mental, and social function were employed to test the between group differences in outcomes.
Prior to these two trials, there were two open randomized trials in which vertebroplasty was compared with conservative medical treatment. In the first randomized trial, patients were allowed to cross over to the other arm, and the trial had to be stopped after two weeks due to the high number of patients crossing over. The other study did not allow crossover and recently published the results of 12 months of follow-up.
The following is the summary of the results of these 4 trials:
Two blinded RCTs on vertebroplasty provide the highest level of evidence available to date. Results of these two trials are supported by findings of an open randomized trial with 12 months follow-up. Blinded RCTs showed:
No significant differences in pain scores of patients who received vertebroplasty and patients who received a sham procedure as measured at 3 days, 2 weeks and 1 month in one study and at 1 week, 1 month, 3 months, and 6 months in the other.
The observed differences in pain scores between the two groups were neither statistically significant nor clinically important at any time points.
The above findings were consistent with the findings of an open RCT in which patients were followed for 12 months. This study showed that improvement in pain was similar between the two groups at 3 months and was sustained to 12 months.
In the blinded RCTs, physical, mental, and social functioning were measured at the above time points using 4-5 of the following 7 instruments: RDQ, EQ-5D, SF-36 PCS, SF-36 MCS, AQoL, QUALEFFO, SOF-ADL
There were no significant differences in any of these measures between patients who received vertebroplasty and patients who received a sham procedure at any of the above time points (with a few exceptions in favour of control intervention).
These findings were also consistent with the findings of an open RCT which demonstrated no significant between-group differences in scores of EQ-5D, SF-36 PCS, SF-36 MCS, DPQ, Barthel, and MMSE, which measure physical, mental, and social functioning (with a few exceptions in favour of control intervention).
One small (n=34) open RCT with a two-week follow-up detected a significantly greater improvement in pain scores at 1 day after the intervention in the vertebroplasty group compared with the conservative treatment group. However, at 2 weeks of follow-up, this difference was smaller and was not statistically significant.
Conservative treatment was associated with fewer clinically important complications
The risk of new VCFs was higher following vertebroplasty than with conservative treatment, but this finding requires further investigation.
PMCID: PMC3377535  PMID: 23074396
15.  Validation of the howRu and howRwe questionnaires at the individual patient level 
Background
The howRu and howRwe are new short questionnaires which are meant to measure health-related quality of life and patient experience. However, validation at the individual patient level has not yet taken place. We aimed to investigate the validity of both questionnaires at the individual patient level.
Methods
In this prospective validation study, patients were asked to complete both questionnaires and comment on their answers in a semi-structured in-depth interview. Based on the transcribed interviews, a panel of 45 general practitioners and 45 patients filled out the questionnaires as they thought the patients had completed them. The questionnaires were considered valid instruments when a reliable and acceptable level of agreement was reached between the patient’s score and the score of a review panel, defined as a concordance correlation coefficient (CCC) of ≥0.70. Bland-Altman plots were also made.
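Lin's concordance correlation coefficient is the agreement statistic used for the ≥0.70 validity threshold. Unlike Pearson's r, it penalizes both scatter and systematic shift between two sets of paired scores. The sketch below is minimal. It assumes population (divide-by-n) moments as in Lin's original definition and uses hypothetical total scores for five patients. `lins_ccc` is an illustrative name.

```python
import statistics


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between paired scores x and y.

    rho_c = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), with
    divide-by-n (population) variances and covariance.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical total scores: patient self-report vs. review-panel estimate.
patient = [8, 10, 6, 12, 9]
panel = [9, 11, 6, 12, 10]
ccc = lins_ccc(patient, panel)
print(f"CCC = {ccc:.2f}; meets the >= 0.70 threshold: {ccc >= 0.70}")
```

In this toy example the panel scores run slightly higher than the patients' own. The squared mean difference in the denominator pulls the CCC below the Pearson correlation of the same data.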
Results
Ninety patients were included. The CCC of the howRu total score of the review panel and patients was 0.80 (95 % CI 0.73 to 0.86). Bland-Altman plots showed a mean difference of −0.96 and the limits of agreement ranged from −2.87 to 0.95. The CCC of the howRwe total score was 0.57 (95 % CI 0.42 to 0.69). The mean difference on the Bland-Altman plots was −0.54 and the limits of agreement ranged from −3.59 to 2.52.
Conclusions
The howRu seems to be a valid questionnaire for measuring health-related quality of life at the individual patient level. We do not advise using the tested version of the howRwe questionnaire for assessing patient experience at the individual patient level.
Trial registration
The study was registered at clinicaltrials.gov NCT01830803.
Registration date: 5 April 2013.
Electronic supplementary material
The online version of this article (doi:10.1186/s12913-015-1093-8) contains supplementary material, which is available to authorized users.
doi:10.1186/s12913-015-1093-8
PMCID: PMC4592573  PMID: 26431695
16.  Behavioural Interventions for Type 2 Diabetes 
Executive Summary
In June 2008, the Medical Advisory Secretariat began work on the Diabetes Strategy Evidence Project, an evidence-based review of the literature surrounding strategies for successful management and treatment of diabetes. This project came about when the Health System Strategy Division at the Ministry of Health and Long-Term Care asked the secretariat to provide an evidentiary platform for the Ministry’s newly released Diabetes Strategy.
After an initial review of the strategy and consultation with experts, the secretariat identified five key areas in which evidence was needed. Evidence-based analyses have been prepared for each of these five areas: insulin pumps, behavioural interventions, bariatric surgery, home telemonitoring, and community based care. For each area, an economic analysis was completed where appropriate and is described in a separate report.
To review these titles within the Diabetes Strategy Evidence series, please visit the Medical Advisory Secretariat Web site (http://www.health.gov.on.ca/english/providers/program/mas/mas_about.html):
Diabetes Strategy Evidence Platform: Summary of Evidence-Based Analyses
Continuous Subcutaneous Insulin Infusion Pumps for Type 1 and Type 2 Adult Diabetics: An Evidence-Based Analysis
Behavioural Interventions for Type 2 Diabetes: An Evidence-Based Analysis
Bariatric Surgery for People with Diabetes and Morbid Obesity: An Evidence-Based Summary
Community-Based Care for the Management of Type 2 Diabetes: An Evidence-Based Analysis
Home Telemonitoring for Type 2 Diabetes: An Evidence-Based Analysis
Application of the Ontario Diabetes Economic Model (ODEM) to Determine the Cost-effectiveness and Budget Impact of Selected Type 2 Diabetes Interventions in Ontario
Objective
The objective of this report is to determine whether behavioural interventions are effective in improving glycemic control in adults with type 2 diabetes.
Background
Diabetes is a serious chronic condition affecting millions of people worldwide and is the sixth leading cause of death in Canada. In 2005, an estimated 8.8% of Ontario’s population had diabetes, representing more than 816,000 Ontarians. The direct health care cost of diabetes was $1.76 billion in the year 2000 and is projected to rise to a total cost of $3.14 billion by 2016. Much of this cost arises from the serious long-term complications associated with the disease, including coronary heart disease, stroke, adult blindness, limb amputations, and kidney disease.
Type 2 diabetes accounts for 90–95% of all diabetes cases. Although type 2 diabetes is more prevalent in people aged 40 years and older, its prevalence in younger populations is increasing because of rising obesity and physical inactivity in children.
Data from the United Kingdom Prospective Diabetes Study (UKPDS) have shown that tight glycemic control can significantly reduce the risk of developing serious complications in people with type 2 diabetes. Despite physicians’ and patients’ knowledge of the importance of glycemic control, Canadian data have shown that only 38% of patients with diabetes have HbA1c levels in the optimal range of 7% or less. This statistic highlights the complexity of diabetes management, which requires extensive patient involvement in addition to the support provided by physicians. Patients therefore face an enormous demand to self-manage the physical, emotional, and psychological aspects of living with a chronic illness.
Although individual needs for coping with diabetes differ, there is general agreement on the need for supportive programs for patient self-management. Traditional programs were didactic models aimed at improving patients’ knowledge of their disease. Current models instead take behavioural approaches that give patients the skills and strategies they need to change their behaviour.
Several meta-analyses and systematic reviews have demonstrated improved health outcomes with self-management support programs in type 2 diabetes. All of them, however, either looked at a single component of self-management support programs (e.g., self-management education) or were conducted in specific populations. Most reviews are also qualitative and do not clearly define the interventions of interest, which makes their findings difficult to interpret. Moreover, heterogeneity in the interventions has produced conflicting evidence on the components of effective programs. Policymakers therefore face considerable uncertainty about the optimal design and delivery of these programs.
Evidence-Based Analysis of Effectiveness
Research Questions
Are behavioural interventions effective in improving glycemic control in adults with type 2 diabetes?
Is the effectiveness of the intervention affected by intervention characteristics (e.g., mode of delivery, length of intervention, mode of instruction, interventionist)?
Inclusion Criteria
English Language
Published between January 1996 and August 2008
Type 2 diabetic adult population (>18 years)
Randomized controlled trials (RCTs)
Systematic reviews or meta-analyses
Describing a multi-faceted self-management support intervention as defined by the 2007 Self-Management Mapping Guide (1)
Reporting outcomes of glycemic control (HbA1c) with extractable data
Studies with a minimum of 6-month follow up
Exclusion Criteria
Studies with a control group other than usual care
Studies with a sample size <30
Studies without a clearly defined intervention
Outcomes of Interest
Primary outcome: glycemic control (HbA1c)
Secondary outcomes: systolic blood pressure (SBP) control, lipid control, change in smoking status, weight change, quality of life, knowledge, self-efficacy, managing psychosocial aspects of diabetes, assessing dissatisfaction and readiness to change, and setting and achieving diabetes goals.
Search Strategy
A search was performed in OVID MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations, EMBASE, the Cumulative Index to Nursing & Allied Health Literature (CINAHL), The Cochrane Library, and the International Network of Agencies for Health Technology Assessment (INAHTA) database for studies published between January 1996 and August 2008. Abstracts were reviewed by a single author, and studies meeting the inclusion criteria outlined above were obtained. Data on population characteristics, glycemic control outcomes, and study design were extracted. Reference lists were also checked for relevant studies. The quality of the evidence was rated as high, moderate, low, or very low according to the GRADE methodology.
Summary of Findings
The search identified 638 citations published between January 1996 and August 2008. Of these, 12 met the inclusion criteria: one meta-analysis (Gary et al. 2003) and 11 RCTs, 9 of which were used in the meta-analysis. Only one RCT was defined as small (total sample size N=47).
Summary of Participant Demographics across studies
A total of 2,549 participants were included in the 11 identified studies. The mean age of participants was approximately 58 years, and the mean duration of diabetes was approximately 6 years. Most studies reported gender, and the mean percentage of female participants was approximately 67%. Of the 11 studies, two focused only on women and four included only Hispanic individuals. All studies evaluated patients with type 2 diabetes exclusively.
Study Characteristics
The studies were conducted between 2002 and 2008. Approximately six of the 11 studies were carried out in the USA, and the remaining studies were conducted in the UK, Sweden, and Israel. Sample sizes ranged from 47 to 824 participants. Study quality ranged from moderate to low: four studies were of moderate quality and the remaining seven were of low quality, based on the CONSORT checklist. Differences in quality were mainly due to methodological issues, such as inadequate description of randomization, sample size calculation, allocation concealment, and blinding, and uncertainty about the use of intention-to-treat (ITT) analysis. Patients were recruited from several settings: six studies recruited from primary or general medical practices, three from the community (e.g., via advertisements), and two from outpatient diabetes clinics. Nine of the 11 studies reported a usual care control group. In the other two studies, the control group received some type of minimal diabetes care in addition to usual care.
Intervention Characteristics
All of the interventions examined in the studies were mapped to the 2007 Self-Management Mapping Guide. The interventions most often focused on problem solving, goal setting, and encouraging participants to engage in activities that protect and promote health (e.g., modifying behaviour, changing diet, and increasing physical activity). All of the studies examined comprehensive interventions that targeted at least two self-care topics (e.g., diet, physical activity, blood glucose monitoring, foot care). Despite the similarity in the aims of the interventions, there was substantial clinical heterogeneity in other intervention characteristics, including duration, intensity, setting, mode of delivery (group vs. individual), interventionist, and outcomes of interest (discussed below).
Duration, Intensity and Mode of Delivery
Intervention durations ranged from 2 days to 1 year, with many falling in the range of 6 to 10 weeks. The remaining interventions lasted ≤ 2 weeks (2 studies), 6 months (2 studies), or 1 year (3 studies). Intensity varied widely, from 6 hours over 2 days to 52 hours over 1 year; however, most interventions consisted of 6 to 15 hours. Interventions were delivered in both individual and group sessions: five studies used group counselling, three used both individual and group sessions, and one used individual sessions as its sole mode of instruction. Three studies also incorporated telephone support as part of the intervention.
Interventionists and Setting
The following interventionists were reported (highest to lowest percentage; categories not mutually exclusive): nurse (36%), other (36%), dietician (18%), peer leader/community worker (18%), physician (9%), and pharmacist (9%). The ‘other’ category included interventionists such as consultants and facilitators with unspecified professional backgrounds. Most interventions were community-based (seven studies), followed by those delivered in primary care practices (three studies). One study described an intervention conducted in a pharmacy setting.
Outcomes
Duration of follow-up ranged from 6 months to 8 years, with a median of 12 months. Nine studies followed up patients at two or more time points. Outcomes at follow-up time points were clearly reported. However, studies reported poorly whether follow-up was measured from participant entry into the study or from the end of the intervention. All studies reported measures of glycemic control, specifically HbA1c levels. BMI was measured in five studies and body weight in two. Cholesterol was examined in three studies and blood pressure reduction in two. Only one study examined smoking status. Additional outcomes included patient satisfaction, quality of life, diabetes knowledge, reduction in diabetes medication, and behaviour modification (e.g., daily consumption of fruits and vegetables, exercise). Meta-analysis of the studies identified a moderate but significant reduction in HbA1c levels (−0.44%; 95% CI: −0.60 to −0.29) for behavioural interventions compared with usual care in adults with type 2 diabetes. Subgroup analyses suggested the largest effects for interventions of at least 1 year in duration and for diabetics with higher baseline HbA1c (≥9.0). According to GRADE, the quality of the evidence was moderate for the overall estimate and low for the subgroup analyses.
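The summary does not say which pooling model produced the −0.44% estimate. The sketch below therefore assumes a simple fixed-effect, inverse-variance model, which is one standard way to combine per-study mean differences. A random-effects model (e.g., DerSimonian-Laird) would give a wider interval when studies are heterogeneous. The per-study inputs are hypothetical, and the helper for back-calculating a standard error assumes a symmetric 95% CI.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Back-calculate a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def pool_fixed_effect(estimates, ses, z=1.96):
    """Inverse-variance (fixed-effect) pooled estimate with its 95% CI."""
    weights = [1 / se ** 2 for se in ses]  # more precise studies get more weight
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1 / total)
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se


# Hypothetical per-study HbA1c mean differences (%) and standard errors.
effects = [-0.6, -0.3, -0.45]
errors = [0.15, 0.12, 0.20]
print(pool_fixed_effect(effects, errors))
print(round(se_from_ci(-0.60, -0.29), 3))  # SE implied by the reported CI
```

The reported interval is not exactly symmetric around −0.44 (0.16 below, 0.15 above), probably because of rounding. Any standard error back-calculated from it is therefore approximate.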
Summary of Meta-Analysis of Studies Investigating the Effectiveness of Behavioural Interventions on HbA1c in Patients with Type 2 Diabetes.
Conclusions
Based on moderate-quality evidence, behavioural interventions as defined by the 2007 Self-Management Mapping Guide (Government of Victoria, Australia) produce a moderate reduction in HbA1c levels in patients with type 2 diabetes compared with usual care.
Based on low-quality evidence, the interventions with the largest effects are those:
- delivered to diabetics with higher baseline HbA1c (≥9.0)
- lasting at least 1 year
PMCID: PMC3377516  PMID: 23074526
17.  Children’s Quality of Life Based on the KIDSCREEN-27: Child Self-Report, Parent Ratings and Child-Parent Agreement in a Swedish Random Population Sample 
PLoS ONE  2016;11(3):e0150545.
Background
The KIDSCREEN-27 is a measure of child and adolescent quality of life (QoL) with excellent psychometric properties. It is available in child-report and parent-rating versions in 38 languages. This study provides child-reported and parent-rated norms for the KIDSCREEN-27 among Swedish 11- to 16-year-olds and reports child-parent agreement. Sociodemographic correlates of self-reported and parent-rated wellbeing were also examined.
Methods
A random population sample of 600 children aged 11–16 years (100 per age group) and one parent of each child (N = 1200) was invited to complete the self-reported and parent-rated versions of the KIDSCREEN-27. Parents were also asked about their education and employment status, and they rated their own QoL on the 26-item WHOQOL-Bref. From the final sampling pool of 1158 persons, 403 individuals responded (a 34.8% response rate). Respondents comprised 175 child-parent pairs, 27 child singleton responders, and 26 parent singletons. Gender and age differences in parent ratings and child-reported data were analyzed using t-tests and the Mann-Whitney U-test. Post-hoc Dunn tests were conducted for pairwise comparisons when the p-value for a specific subscale was 0.05 or lower. Child-parent agreement was tested item by item using the Prevalence- and Bias-Adjusted Kappa (PABAK) coefficient for ordinal data (PABAK-OS). Dimensional and total score agreement was evaluated with the PABAK, using dichotomous cut-offs for lower wellbeing. Total continuous scores were evaluated using Bland-Altman plots.
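The dichotomous PABAK used for dimension and total scores can be sketched in a few lines of Python. PABAK replaces the chance-agreement term of Cohen's kappa with the agreement expected if all k categories were equally likely, so it depends only on observed agreement. The ordinal variant (PABAK-OS) used for the item-level analysis is not reproduced here. The cut-off coding and ratings are hypothetical.

```python
def pabak(ratings_a, ratings_b, k=2):
    """Prevalence- and bias-adjusted kappa for k unordered categories.

    PABAK = (k * p_o - 1) / (k - 1), where p_o is the observed proportion of
    agreement; with k = 2 this reduces to 2 * p_o - 1.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)
    return (k * p_o - 1) / (k - 1)


# Hypothetical dichotomised scores: 1 = below a "lower wellbeing" cut-off.
child = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
parent = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(pabak(child, parent))  # 9 of 10 child-parent pairs agree
```

Because PABAK ignores the marginal distributions, it avoids the low kappa values that Cohen's kappa can produce when most children fall on one side of the cut-off. This is presumably why a prevalence-adjusted statistic was chosen, although the abstract does not say so.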
Results
Compared with European norms, Swedish children in this sample scored lower on Physical wellbeing (Sweden 48.8 vs. Europe 49.94) but higher on the other KIDSCREEN-27 dimensions: Psychological wellbeing (53.4 vs. 49.77), Parent relations and autonomy (55.1 vs. 49.99), Social support and peers (54.1 vs. 49.94), and School (55.8 vs. 50.01). Older children self-reported lower wellbeing than younger children. There were no significant self-reported gender differences, and parent ratings showed no gender or age differences. Item-by-item child-parent agreement was slight for 14 items (51.9%), fair for 12 items (44.4%), and less than chance for one item (3.7%). However, agreement on all dimensions and on the total score was substantial according to the PABAK-OS. On visual inspection of the Bland-Altman plot, parents rated total wellbeing relatively higher than their children when children's average wellbeing score was lower. As children's average wellbeing score increased, parents tended to rate it relatively lower. Children living with both parents had higher wellbeing than those living with only one parent.
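The agreement labels used above (slight, fair, substantial, less than chance) match the widely used Landis and Koch (1977) benchmarks for kappa-type coefficients. The abstract does not name its scale, so the mapping below is an assumption. Landis and Koch call negative values "poor", and the abstract's wording "less than chance" is used here instead. The loop at the end only re-derives the reported item percentages from the counts out of 27 items.

```python
def landis_koch_label(kappa):
    """Map a kappa-type coefficient to the Landis and Koch (1977) descriptors."""
    if kappa > 1:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0:
        return "less than chance"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")):
        if kappa <= upper:
            return label


# Share of the 27 KIDSCREEN items in each agreement category, as reported above.
for label, count in (("slight", 14), ("fair", 12), ("less than chance", 1)):
    print(label, round(100 * count / 27, 1))
# prints: slight 51.9 / fair 44.4 / less than chance 3.7 (one per line)
```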
Conclusions
The results agreed with European findings that adolescent wellbeing decreases with age. They contrasted, however, with some prior Swedish research that found better wellbeing for boys on all dimensions except Social support and peers. The study highlights the importance of considering children’s own reports rather than relying only on parental or other informant ratings. Future research should be conducted at regular intervals and include larger samples.
doi:10.1371/journal.pone.0150545
PMCID: PMC4784934  PMID: 26959992
18.  Interobserver reliability and intraobserver reproducibility of powers ratio for assessment of atlanto-occipital junction: comparison of plain radiography and computed tomography 
European Spine Journal  2009;18(4):577-582.
Powers ratio, as assessed on plain radiographs or computed tomography (CT) images, appears to have clinical and prognostic value. To date, validation of this assessment tool has been limited to a small number of observers at a single site. No study has examined the intraobserver reproducibility and interobserver reliability of Powers ratio measurement on plain radiographs or CT images among a large cohort of spine surgeons. This type of validation is critical for broader use of the Powers ratio methodology in research studies and clinical applications. Plain radiographs and spiral CT images of the cervical spine of 32 patients were assessed, and the Powers ratio was determined by five spine surgeons. Each surgeon performed three readings, 7 months apart. In the first round of measurements, the observers used only the instructions of Powers’ method. The second and third measurement sets were obtained after an interactive teaching session on the methodology, and the order of the images was changed for these sets. The coefficient of variation (Cv) was calculated to determine intraobserver repeatability and interobserver reliability for each imaging technique. A Bland-Altman plot was then used to assess agreement between the two imaging techniques. For interobserver reliability, the mean Cv of the Powers ratio was 9.09% for plain radiographs and 4.31% for CT. For intraobserver reproducibility, the mean Cv averaged 4.95% with CT scans and 14.17% with plain radiographs. Across the five raters, the mean Cv ranged from 1.39% to 9.08% with CT scans and from 7.54% to 34.30% with plain radiographs. The Bland-Altman plot demonstrated close agreement between the two methods, with limits of agreement (bias ± 1.96σ) of −0.8% to 0.89%. 
The intraobserver reproducibility and interobserver reliability of Powers ratio measurement were acceptable (<5%) with CT scans but not with plain radiographs. However, despite this statistically inferior reliability and repeatability, the Bland-Altman analysis showed that, given limits of agreement of −0.8% to 0.89%, the two methods may be used interchangeably in clinical practice.
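The coefficient of variation used above can be sketched as follows. The abstract does not describe the exact computation, so this Python sketch makes an assumption. It computes a Cv (sample SD divided by the mean, in percent) for each patient's repeated readings and averages these values across patients. Passing one rater's three readings per patient gives an intraobserver figure, and passing the five raters' readings of each patient gives an interobserver figure. The Powers ratio values are hypothetical.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation: sample SD divided by the mean, in percent."""
    return 100 * statistics.stdev(values) / statistics.fmean(values)


def mean_cv(groups):
    """Average Cv over patients; each group holds repeated readings of one patient."""
    return statistics.fmean([cv_percent(g) for g in groups])


# Hypothetical Powers ratios: three readings by one rater for each of three patients.
readings = [
    [0.80, 0.82, 0.78],
    [0.95, 0.90, 0.92],
    [0.70, 0.74, 0.72],
]
cv = mean_cv(readings)
print(f"mean intraobserver Cv = {cv:.2f}%; below the 5% threshold: {cv < 5}")
```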
doi:10.1007/s00586-008-0877-5
PMCID: PMC2899465  PMID: 19165510
Powers ratio; Interobserver reliability; Intraobserver reproducibility; Atlanto-occipital junction
19.  The construct validity and responsiveness of the EQ-5D, SF-6D and Diabetes Health Profile-18 in type 2 diabetes 
Background
Interest in the measurement of health-related quality of life and psychosocial functioning from the patient’s perspective in diabetes mellitus has grown in recent years. The aim of this study is to investigate the psychometric performance of, and agreement between, the generic EQ-5D and SF-6D and the diabetes-specific DHP-18 in Type 2 diabetes. This will support future use of the measures by providing further evidence on their psychometric properties and on the conceptual overlap between the instruments. The results will indicate whether the measures can be used with confidence alongside each other to provide a more holistic profile of people with Type 2 diabetes.
Methods
A large longitudinal dataset (n = 1,184) of people with Type 2 diabetes was used for the analysis. Convergent validity was tested by examining correlations between the measures. Known group validity was tested across a range of clinical and diabetes severity indicators using ANOVA and effect size statistics. Agreement was examined using Bland-Altman plots. Responsiveness was tested by examining floor and ceiling effects and standardised response means.
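Two of the responsiveness statistics named above can be sketched in Python. The standardised response mean (SRM) is the mean of the paired changes divided by the SD of those changes. Floor and ceiling effects are the percentages of respondents at the lowest and highest possible scores. The instrument bounds and scores below are hypothetical; the real EQ-5D, SF-6D, and DHP-18 scoring ranges differ from the 0 to 1 scale assumed here.

```python
import statistics


def standardised_response_mean(baseline, follow_up):
    """SRM: mean of the paired changes divided by the SD of those changes."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return statistics.fmean(changes) / statistics.stdev(changes)


def floor_ceiling(scores, minimum, maximum):
    """Percentage of respondents scoring at the instrument's minimum and maximum."""
    n = len(scores)
    floor = 100 * sum(s == minimum for s in scores) / n
    ceiling = 100 * sum(s == maximum for s in scores) / n
    return floor, ceiling


# Hypothetical index scores on a 0-1 scale at two time points.
before = [0.62, 0.80, 1.00, 0.71, 0.55]
after = [0.70, 0.85, 1.00, 0.69, 0.66]
print(round(standardised_response_mean(before, after), 2))
print(floor_ceiling(after, 0.0, 1.0))
```

A large ceiling effect limits responsiveness, because respondents already at the maximum cannot register improvement. This is why the two checks are reported together.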
Results
Correlations between the measures indicate that there is overlap in the constructs assessed (correlations between 0.1 and 0.7 were reported). However, there is also some divergence between the generic and condition-specific instruments. Known group validity was generally good but was not consistent across all indicators included (effect sizes from 0 to 0.74 were reported). The EQ-5D and SF-6D displayed a high level of agreement, but there was some disagreement between the generic measures and the DHP-18 dimensions across the severity range. Responsiveness was higher in those who self-reported a change in health (SRMs between 0.06 and 0.25).
Conclusions
The psychometric assessment of the relationship between the EQ-5D, SF-6D, and DHP-18 shows that all three have a level of validity for use in Type 2 diabetes. This suggests that the measures can be used alongside each other to provide a more holistic assessment of the quality of life impacts of Type 2 diabetes.
doi:10.1186/1477-7525-12-42
PMCID: PMC4304018  PMID: 24661350
EQ-5D; SF-6D; DHP; Psychometrics; Validity
20.  What Do Evaluation Instruments Tell Us About the Quality of Complementary Medicine Information on the Internet? 
Background
Developers of health information websites aimed at consumers need methods to assess whether their website is of “high quality.” Due to the nature of complementary medicine, website information is diverse and may be of poor quality. Various methods have been used to assess the quality of websites, the two main approaches being (1) to compare the content against some gold standard, and (2) to rate various aspects of the site using an assessment tool.
Objective
We aimed to review available evaluation instruments to assess their performance when used by a researcher to evaluate websites containing information on complementary medicine and breast cancer. In particular, we wanted to see whether the instruments used the same criteria, agreed on the ranking of websites, and were easy for a researcher to use. We also wanted to see whether a single tool was sufficient to assess website quality.
Methods
Bibliographic databases, search engines, and citation searches were used to identify evaluation instruments. Instruments were included if they enabled users with no subject knowledge to make an objective assessment of a website containing health information. The elements of each instrument were compared with nine main criteria defined by a previous study. Google was used to search for complementary medicine and breast cancer sites. The first six results and a purposive sample of six sites from different origins (charities, sponsored, commercial) were chosen. Each website was assessed with each tool, and the percentage of criteria successfully met was recorded. The rankings of the websites produced by each tool were then compared. Use of the instruments by others was estimated by citation analysis and Google searching.
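The scoring and ranking steps described above can be illustrated with a Python sketch. Each website's score under an instrument is the percentage of that instrument's criteria it meets. Two instruments' rankings can then be compared with a Spearman rank correlation, using average ranks for ties. The abstract does not say how rank agreement was judged, so Spearman's rho is an assumed choice here, and the data are hypothetical. The correlation is undefined if one instrument gives every site the same score.

```python
def percent_met(criteria_results):
    """Percentage of an instrument's criteria that a website satisfies (1/0 or True/False)."""
    return 100 * sum(criteria_results) / len(criteria_results)


def average_ranks(values):
    """Ranks (1 = lowest), with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical results for four websites under two instruments.
tool_a = [percent_met(r) for r in ([1, 1, 0, 1], [1, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1])]
tool_b = [62.5, 25.0, 87.5, 50.0]
print(round(spearman_rho(tool_a, tool_b), 3))
```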
Results
A total of 39 instruments were identified, 12 of which met the inclusion criteria; the instruments contained between 4 and 43 questions. When the instruments were applied to the 12 websites, 10 of them agreed on the rank order of the sites. Instruments varied in the range of criteria they assessed and in their ease of use.
Conclusions
Comparing the content of websites against a gold standard is time consuming and only feasible for very specific advice. Evaluation instruments offer gateway providers a method to assess websites. The checklist approach has face validity when results are compared to the actual content of “good” and “bad” websites. Although instruments differed in the range of items assessed, there was fair agreement between most available instruments. Some were easier to use than others, but these were not necessarily the instruments most widely used to date. Combining some of the better features of instruments to provide fewer, easy-to-use methods would be beneficial to gateway providers.
doi:10.2196/jmir.961
PMCID: PMC2483844  PMID: 18244894
Consumer Health Informatics; Internet; quality of information; complementary medicine
21.  Risk of Bias in Systematic Reviews of Non-Randomized Studies of Adverse Cardiovascular Effects of Thiazolidinediones and Cyclooxygenase-2 Inhibitors: Application of a New Cochrane Risk of Bias Tool 
PLoS Medicine  2016;13(4):e1001987.
Background
Systematic reviews of the effects of healthcare interventions frequently include non-randomized studies. These are subject to confounding and a range of other biases that are seldom considered in detail when synthesizing and interpreting the results. Our aims were to assess the reliability and usability of a new Cochrane risk of bias (RoB) tool for non-randomized studies of interventions and to determine whether restricting analysis to studies with low or moderate RoB made a material difference to the results of the reviews.
Methods and Findings
We selected two systematic reviews of population-based, controlled non-randomized studies of the relationship between the use of thiazolidinediones (TZDs) and cyclooxygenase-2 (COX-2) inhibitors and major cardiovascular events. Two epidemiologists applied the Cochrane RoB tool and made assessments across the seven specified domains of bias for each of 37 component studies. Inter-rater agreement was measured using the weighted Kappa statistic. We grouped studies according to overall RoB and performed statistical pooling for (a) all studies and (b) only studies with low or moderate RoB. Kappa scores across the seven bias domains ranged from 0.50 to 1.0. In the COX-2 inhibitor review, two studies had low overall RoB, 14 had moderate RoB, and five had serious RoB. In the TZD review, six studies had low RoB, four had moderate RoB, four had serious RoB, and two had critical RoB. The pooled odds ratios for myocardial infarction, heart failure, and death for rosiglitazone versus pioglitazone remained significantly elevated when analyses were confined to studies with low or moderate RoB. However, the estimate for myocardial infarction declined from 1.14 (95% CI 1.07–1.24) to 1.06 (95% CI 0.99–1.13) when analysis was confined to studies with low RoB. Estimates of pooled relative risks of cardiovascular events with COX-2 inhibitors compared with no nonsteroidal anti-inflammatory drug changed little when analyses were confined to studies with low or moderate RoB. The exception was a rise in the relative risk associated with ibuprofen from 1.07 (95% CI 0.97–1.18) to 1.14 (95% CI 1.03–1.26). The main limitation of our study was testing the instrument on a narrow range of pharmacoepidemiological studies; we cannot assume our findings extend to a broader range of interventions and settings.
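Inter-rater agreement on ordered RoB judgements (low, moderate, serious, critical) can be measured with a weighted kappa, sketched below in Python. The abstract does not say whether linear or quadratic weights were used, so both are offered, and the ratings are hypothetical. The function divides by the expected disagreement, which is zero if both raters use a single category, so it fails in that case. For a cross-check, scikit-learn's `cohen_kappa_score` accepts `weights="linear"` or `weights="quadratic"`.

```python
def weighted_kappa(rater_a, rater_b, categories, weighting="linear"):
    """Cohen's weighted kappa for two raters judging the same items on ordered categories."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # 0 on the diagonal, 1 for the most distant categories.
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d ** 2

    cells = [(i, j) for i in range(k) for j in range(k)]
    obs = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    exp = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    return 1 - obs / exp


levels = ["low", "moderate", "serious", "critical"]
rater_1 = ["low", "moderate", "moderate", "serious", "critical", "low"]
rater_2 = ["low", "moderate", "serious", "serious", "serious", "low"]
print(round(weighted_kappa(rater_1, rater_2, levels), 2))
```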
Conclusions
The Cochrane RoB tool highlighted a wide range of risks of bias in studies included in two widely cited reviews and had the potential to change the conclusions of the reviews. Systematic reviews that incorporate non-randomized studies of medical interventions should include a detailed assessment of RoB for each included study.
David Henry and colleagues re-evaluate findings from two systematic reviews using the new ACROBAT-NRSI bias assessment tool for non-randomized studies.
Editors' Summary
Background
In the past, clinicians used their own experience to help them make decisions about the best treatments (interventions) for their patients. Nowadays, “evidence-based medicine”—largely based on findings from randomized controlled trials (RCTs)—guides most clinical decisions. RCTs—studies that compare outcomes in groups of patients chosen at random to receive different interventions—are the best way to assess the efficacy of an intervention (the performance of a treatment under ideal conditions), but individual trials often fail to show a statistically significant difference (a difference unlikely to have arisen by chance) between two interventions. Significant differences between interventions can be detected, however, by undertaking a systematic review (a study that identifies all the RCTs on a given intervention using predefined criteria) and a meta-analysis (a statistical technique for combining, or “synthesizing,” the findings from several independent RCTs).
Why Was This Study Done?
Systematic reviews of healthcare interventions can also include non-randomized studies, which use administrative databases to identify people receiving different interventions and electronic health records to determine clinical outcomes. However, non-randomized studies of interventions are prone to many “biases” that affect the accuracy of their findings. For example, a potential bias in non-randomized studies is “confounding,” the possibility that an unmeasured characteristic shared by the people receiving a specific intervention, rather than the intervention itself, is responsible for the observed outcome. When undertaking systematic reviews and meta-analyses, it is essential to measure the risk of bias (RoB) in each individual study included in the review and meta-analysis. But, although a widely used tool is available for measuring RoB in RCTs, bias is seldom considered in detail when synthesizing the results of non-randomized studies of interventions. Here, the researchers assess the reliability and usability of ACROBAT-NRSI, a tool developed by Cochrane (an organization that promotes evidence-informed health decision-making) for the assessment of RoB in non-randomized intervention studies. ACROBAT-NRSI assists authors in identifying potential concerns across seven bias domains and assesses the overall RoB of individual non-randomized intervention studies.
What Did the Researchers Do and Find?
Two of the researchers independently applied the ACROBAT-NRSI process to 37 papers included in two widely cited systematic reviews. These reviews covered non-randomized studies of the relationship between the use of thiazolidinediones (drugs used to treat diabetes, such as rosiglitazone and pioglitazone) or cyclooxygenase-2 (COX-2) inhibitors (nonsteroidal anti-inflammatory drugs [NSAIDs] such as ibuprofen) and major cardiovascular events (heart attack [myocardial infarction] and heart failure). The two researchers largely agreed on their RoB assessments (good inter-rater agreement). After training and early experience, each study took roughly 2.5 hours to assess. In the thiazolidinedione review, six studies had low overall RoB, four had moderate RoB, four had serious RoB, and two had critical RoB. In the COX-2 inhibitor review, two studies had low overall RoB, fourteen had moderate RoB, and five had serious RoB. When the researchers restricted meta-analysis to studies with low or moderate RoB, estimates of the pooled relative risks of cardiovascular events with COX-2 inhibitors (compared with no NSAID) changed little, except for a rise in the relative risk associated with ibuprofen. For rosiglitazone compared with pioglitazone, the risk estimates for myocardial infarction, heart failure, and death remained significantly raised when analyses were confined to studies with low or moderate RoB. However, there was no significantly increased risk of myocardial infarction when the analysis was confined to studies with low RoB.
What Do These Findings Mean?
These findings show that there was considerable variability in RoB among the studies included in two systematic reviews of non-randomized intervention studies. Although all 37 studies included in these reviews were originally considered to be of sufficiently high quality for inclusion using less comprehensive—or less RoB-focused—critical appraisal tools, only eight were judged to have low RoB using ACROBAT-NRSI. Notably, exclusion of studies with moderate, serious, or critical RoB resulted in clinically important changes to some of the conclusions of the original reviews. Because the researchers considered only two systematic reviews, their findings may not be generalizable—ACROBAT-NRSI needs further testing across a range of study types. Moreover, because the tool is designed to be used within a team setting, studies are needed to investigate whether the performance of the tool depends on the team’s skill mix. Importantly, however, these findings highlight the importance of including a detailed RoB assessment for each study included in systematic reviews of non-randomized studies of medical interventions.
Additional Information
This list of resources contains links that can be accessed when viewing the PDF on a device or via the online version of the article at http://dx.doi.org/10.1371/journal.pmed.1001987.
More information about ACROBAT-NRSI (A Cochrane Risk of Bias Assessment Tool for Non-Randomized Studies of Interventions) is available; the main Cochrane website provides information about Cochrane and its work; the Cochrane Handbook for Systematic Reviews of Interventions has a chapter on including non-randomized studies in systematic reviews
Wikipedia has pages on evidence-based medicine, clinical trials, systematic review, and meta-analysis (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
ClinicalTrials.gov, the US National Institutes of Health clinical trials registry, provides additional background information about clinical trials
doi:10.1371/journal.pmed.1001987
PMCID: PMC4821619  PMID: 27046153
22.  Information from Pharmaceutical Companies and the Quality, Quantity, and Cost of Physicians' Prescribing: A Systematic Review 
PLoS Medicine  2010;7(10):e1000352.
Geoff Spurling and colleagues report findings of a systematic review looking at the relationship between exposure to promotional material from pharmaceutical companies and the quality, quantity, and cost of prescribing. They fail to find evidence of improvements in prescribing after exposure, and find some evidence of an association with higher prescribing frequency, higher costs, or lower prescribing quality.
Background
Pharmaceutical companies spent $57.5 billion on pharmaceutical promotion in the United States in 2004. The industry claims that promotion provides scientific and educational information to physicians. While some evidence indicates that promotion may adversely influence prescribing, physicians hold a wide range of views about pharmaceutical promotion. The objective of this review is to examine the relationship between exposure to information from pharmaceutical companies and the quality, quantity, and cost of physicians' prescribing.
Methods and Findings
We searched for studies of physicians with prescribing rights who were exposed to information from pharmaceutical companies (promotional or otherwise). Exposures included pharmaceutical sales representative visits, journal advertisements, attendance at pharmaceutical sponsored meetings, mailed information, prescribing software, and participation in sponsored clinical trials. The outcomes measured were quality, quantity, and cost of physicians' prescribing. We searched Medline (1966 to February 2008), International Pharmaceutical Abstracts (1970 to February 2008), Embase (1997 to February 2008), Current Contents (2001 to 2008), and Central (The Cochrane Library Issue 3, 2007) using search terms developed with an expert librarian. Additionally, we reviewed reference lists and contacted experts and pharmaceutical companies for information. Randomized and observational studies evaluating information from pharmaceutical companies and measures of physicians' prescribing were independently appraised for methodological quality by two authors. Studies were excluded where insufficient study information precluded appraisal. From 7,185 studies identified in electronic databases and 138 from other sources, the full text of 255 articles was retrieved. Of these, 179 were excluded because they did not fulfil inclusion criteria and 18 because they did not meet quality appraisal criteria, leaving 58 included studies with 87 distinct analyses. Data were extracted independently by two authors, and a narrative synthesis was performed following the MOOSE guidelines. Of the studies examining prescribing quality outcomes, five found associations between exposure to pharmaceutical company information and lower quality prescribing, four did not detect an association, and one found associations with both lower and higher quality prescribing. Thirty-eight included studies found associations between exposure and higher frequency of prescribing, and 13 did not detect an association. Five included studies found evidence for an association with higher costs, four found no association, and one found an association with lower costs. The narrative synthesis finding of variable results was supported by a meta-analysis of studies of prescribing frequency that found significant heterogeneity. The observational nature of most included studies is the main limitation of this review.
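The abstract reports that a meta-analysis of prescribing-frequency studies found significant heterogeneity but does not say which statistic was used. As a minimal sketch, assuming a fixed-effect inverse-variance model, Cochran's Q and the I² statistic can be computed as below; the function name `heterogeneity` and the example effect sizes are hypothetical and are not data from the review.

```python
def heterogeneity(effects, variances):
    """Return (pooled effect, Cochran's Q, degrees of freedom, I^2).

    Assumes a fixed-effect inverse-variance model: `effects` are per-study
    estimates (e.g. log odds ratios) and `variances` their squared standard
    errors. Illustrative sketch only, not the review's own analysis.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with one variance each")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between-study differences
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, df, i2


# Hypothetical log odds ratios from three equally precise studies
pooled, q, df, i2 = heterogeneity([0.1, 0.5, 0.9], [0.01, 0.01, 0.01])
print(f"pooled={pooled:.2f} Q={q:.1f} df={df} I2={i2:.0%}")
```

To test whether heterogeneity is significant, Q is conventionally compared with a chi-square distribution on df degrees of freedom; that step is left out here to keep the sketch free of third-party dependencies.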
Conclusions
With rare exceptions, studies of exposure to information provided directly by pharmaceutical companies have found associations with higher prescribing frequency, higher costs, or lower prescribing quality or have not found significant associations. We did not find evidence of net improvements in prescribing, but the available literature does not exclude the possibility that prescribing may sometimes be improved. Still, we recommend that practitioners follow the precautionary principle and thus avoid exposure to information from pharmaceutical companies.
Editors' Summary
Background
A prescription drug is a medication that can be supplied only with a written instruction (“prescription”) from a physician or other licensed healthcare professional. In 2009, 3.9 billion drug prescriptions were dispensed in the US alone and US pharmaceutical companies made US$300 billion in sales revenue. Every year, a large proportion of this revenue is spent on drug promotion. In 2004, for example, a quarter of US drug revenue was spent on pharmaceutical promotion. The pharmaceutical industry claims that drug promotion—visits from pharmaceutical sales representatives, advertisements in journals and prescribing software, sponsorship of meetings, mailed information—helps to inform and educate healthcare professionals about the risks and benefits of their products and thereby ensures that patients receive the best possible care. Physicians, however, hold a wide range of views about pharmaceutical promotion. Some see it as a useful and convenient source of information. Others deny that they are influenced by pharmaceutical company promotion but claim that it influences other physicians. Meanwhile, several professional organizations have called for tighter control of promotional activities because of fears that pharmaceutical promotion might encourage physicians to prescribe inappropriate or needlessly expensive drugs.
Why Was This Study Done?
But is there any evidence that pharmaceutical promotion adversely influences prescribing? Reviews of the research literature undertaken in 2000 and 2005 provide some evidence that drug promotion influences prescribing behavior. However, these reviews only partly assessed the relationship between information from pharmaceutical companies and prescribing costs and quality and are now out of date. In this study, therefore, the researchers undertake a systematic review (a study that uses predefined criteria to identify all the research on a given topic) to reexamine the relationship between exposure to information from pharmaceutical companies and the quality, quantity, and cost of physicians' prescribing.
What Did the Researchers Do and Find?
The researchers searched the literature for studies of licensed physicians who were exposed to promotional and other information from pharmaceutical companies. They identified 58 studies that included a measure of exposure to any type of information directly provided by pharmaceutical companies and a measure of physicians' prescribing behavior. They then undertook a “narrative synthesis,” a descriptive analysis of the data in these studies. Ten of the studies, they report, examined the relationship between exposure to pharmaceutical company information and prescribing quality (as judged, for example, by physician drug choices in response to clinical vignettes). All but one of these studies found either that exposure to drug company information was associated with lower prescribing quality or that no association could be detected. The 51 studies that examined the relationship between exposure to drug company information and prescribing frequency found either that exposure was associated with more frequent prescribing or that no association could be detected. Thus, for example, 17 out of 29 studies of the effect of pharmaceutical sales representatives' visits found an association between visits and increased prescribing; none found an association with less frequent prescribing. Finally, eight studies examined the relationship between exposure to pharmaceutical company information and prescribing costs. With one exception, these studies found either that exposure to information was associated with a higher cost of prescribing or that no association could be detected. So, for example, one study found that physicians with low prescribing costs were more likely than physicians with high prescribing costs to have rarely or never read promotional mail or journal advertisements from pharmaceutical companies.
What Do These Findings Mean?
With rare exceptions, these findings suggest that exposure to pharmaceutical company information is associated either with no effect on physicians' prescribing behavior or with adverse effects (reduced quality, increased frequency, or increased costs). Because most of the studies included in the review were observational studies (the physicians in the studies were not randomly selected to receive or not receive drug company information), it is not possible to conclude that exposure to information actually causes any changes in physician behavior. Furthermore, although these findings provide no evidence for any net improvement in prescribing after exposure to pharmaceutical company information, the researchers note that it would be wrong to conclude that improvements never happen. The findings support the case for reforms to reduce the negative influence of pharmaceutical promotion on prescribing.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000352.
Wikipedia has pages on prescription drugs and on pharmaceutical marketing (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
The UK General Medical Council provides guidelines on good practice in prescribing medicines
The US Food and Drug Administration provides information on prescription drugs and on its Bad Ad Program
Healthy Skepticism is an international nonprofit membership association that aims to improve health by reducing harm from misleading health information
The Drug Promotion Database was developed by the World Health Organization Department of Essential Drugs & Medicines Policy and Health Action International Europe to address unethical and inappropriate drug promotion
doi:10.1371/journal.pmed.1000352
PMCID: PMC2957394  PMID: 20976098
23.  Effects of saliva collection using cotton swabs on melatonin enzyme immunoassay 
Background
Although various acceptable and easy-to-use devices have been used for saliva collection, cotton swabs are among the most common. Previous studies reported that cotton swabs yield lower detected melatonin levels. However, the statistical methods used in those studies are not adequate for assessing agreement between cotton saliva collection and passive saliva collection, and a test for bias is needed. Furthermore, the effects of cotton swabs have not been examined at lower melatonin levels, which is the range in which melatonin is used to assess circadian rhythms by dim light melatonin onset (DLMO). In the present study, we used Bland-Altman plots to estimate the effect of cotton swabs on the results of salivary melatonin assay at lower melatonin levels.
Methods
Nine healthy males were recruited and each provided four saliva samples on a single day to yield a total of 36 samples. Saliva samples were directly collected in plastic tubes using plastic straws, and subsequently pipetted onto cotton swabs (cotton saliva collection) and into clear sterile tubes (passive saliva collection). The melatonin levels were analyzed in duplicate using commercially available ELISA kits.
Results
The mean melatonin concentration in cotton saliva collection samples was significantly lower than that in passive saliva collection samples at higher melatonin levels (>6 pg/mL). The Bland-Altman plot indicated that cotton swabs cause relative and proportional biases in the assay results. At lower melatonin levels (<6 pg/mL), although the Bland-Altman plots did not show proportional or relative bias, there was no significant correlation between passive and cotton saliva collection samples.
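The Bland-Altman approach compares two collection methods through their paired differences rather than their correlation. A minimal sketch follows, assuming each cotton sample is paired with a passive sample from the same saliva; the function name `bland_altman` and the concentrations are hypothetical. The 1.96 multiplier gives conventional 95% limits of agreement, and regressing the difference on the pair mean is one common screen for proportional bias, assumed here rather than confirmed as the authors' exact procedure.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (mean difference, 95% limits of agreement, slope of difference on mean).

    Differences are method_a - method_b for each paired sample.
    """
    if len(method_a) != len(method_b) or len(method_a) < 3:
        raise ValueError("need at least three paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)   # systematic (fixed) difference between methods
    sd = statistics.stdev(diffs)    # sample SD of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    # Least-squares slope of difference on mean: a slope far from 0 suggests the
    # disagreement changes with concentration, i.e. proportional bias.
    mean_x = statistics.mean(means)
    sxx = sum((m - mean_x) ** 2 for m in means)
    if sxx == 0:
        raise ValueError("pair means do not vary; slope undefined")
    slope = sum((m - mean_x) * (d - bias) for m, d in zip(means, diffs)) / sxx
    return bias, limits, slope


# Hypothetical melatonin concentrations (pg/mL) for four paired samples
cotton = [9.0, 18.0, 27.0, 36.0]
passive = [10.0, 20.0, 30.0, 40.0]
bias, (lower, upper), slope = bland_altman(cotton, passive)
print(f"bias={bias:.2f} LoA=({lower:.2f}, {upper:.2f}) slope={slope:.3f}")
```

In this made-up example the negative mean difference and negative slope describe a swab that reads lower, increasingly so at higher concentrations; formal tests of whether the bias and slope differ from zero are not shown.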
Conclusion
Our findings indicate an interference effect of cotton swabs on salivary melatonin assay results at lower melatonin levels. Cotton-based collection devices may therefore not be suitable for assessment of DLMO.
doi:10.1186/1740-3391-9-1
PMCID: PMC3024305  PMID: 21219623
24.  Patient-Nurse Interrater Reliability and Agreement of the Richards-Campbell Sleep Questionnaire 
Background
The Richards-Campbell Sleep Questionnaire (RCSQ) is a simple, validated survey instrument for measuring sleep quality in intensive care patients. Although both patients and nurses can complete the RCSQ, interrater reliability and agreement have not been fully evaluated.
Objectives
To evaluate patient-nurse interrater reliability and agreement of the RCSQ in a medical intensive care unit.
Methods
The instrument included 5 RCSQ items plus a rating of nighttime noise, each scored by using a 100-mm visual analogue scale. The mean of the 5 RCSQ items formed the total score. For 24 days, the night-shift nurses in the medical intensive care unit completed the RCSQ regarding their patients’ overnight sleep quality. Upon awakening, all conscious, nondelirious patients completed the RCSQ. Neither nurses nor patients knew each other’s ratings. Patient-nurse agreement was evaluated by using mean differences and Bland-Altman plots. Reliability was evaluated by using intraclass correlation coefficients.
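The abstract does not state which form of intraclass correlation coefficient was used. As an illustration only, the sketch below assumes the two-way random-effects, absolute-agreement, single-rater form usually labeled ICC(2,1) after Shrout and Fleiss, with each row holding one patient-nurse pair of scores on the 100-mm scale; the function name `icc_2_1` and the ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per subject, each row holding the score
    from every rater (here: [patient_score, nurse_score]).
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical pairs in which the nurse always scores 10 mm higher
pairs = [[40.0, 50.0], [60.0, 70.0], [80.0, 90.0]]
print(round(icc_2_1(pairs), 3))
```

Because this form measures absolute agreement, a constant nurse-over-patient offset lowers the coefficient below 1 even when both raters rank patients identically, which mirrors the pattern of nurses scoring sleep higher than patients.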
Results
Thirty-three patients had a total of 92 paired patient-nurse assessments. For all RCSQ items, nurses’ scores were higher (indicating “better” sleep) than patients’ scores, with significantly higher ratings for sleep depth (mean [SD], 67 [21] vs 48 [35], P = .001), awakenings (68 [21] vs 60 [33], P = .03), and total score (68 [19] vs 57 [28], P = .01). The Bland-Altman plots also showed that nurses’ ratings were generally higher than patients’ ratings. Intraclass correlation coefficients of patient-nurse pairs ranged from 0.13 to 0.49 across the survey questions.
Conclusions
Patient-nurse interrater reliability on the RCSQ was “slight” to “moderate,” with nurses tending to overestimate patients’ perceived sleep quality.
doi:10.4037/ajcc2012111
PMCID: PMC3667655  PMID: 22751369
25.  Reporting and Methods in Clinical Prediction Research: A Systematic Review 
PLoS Medicine  2012;9(5):e1001221.
Walter Bouwmeester and colleagues investigated the reporting and methods of prediction studies published in 2008 in six high-impact general medical journals and found that the majority of prediction studies do not follow current methodological recommendations.
Background
We investigated the reporting and methods of prediction studies, focusing on aims, designs, participant selection, outcomes, predictors, statistical power, statistical methods, and predictive performance measures.
Methods and Findings
We used a full hand search to identify all prediction studies published in 2008 in six high-impact general medical journals. We developed a comprehensive item list to systematically score conduct and reporting of the studies, based on recent recommendations for prediction research. Two reviewers independently scored the studies. We retrieved 71 papers for full text review: 51 were predictor finding studies, 14 were prediction model development studies, three addressed an external validation of a previously developed model, and three reported on a model's impact on participant outcome. Study design was unclear in 15% of studies, and a prospective cohort was used in most studies (60%). Descriptions of the participants and definitions of predictor and outcome were generally good. Despite many recommendations against doing so, continuous predictors were often dichotomized (32% of studies). The number of events per predictor as a measure of statistical power could not be determined in 67% of the studies; of the remainder, 53% had fewer than the commonly recommended value of ten events per predictor. Methods for a priori selection of candidate predictors were described in most studies (68%). A substantial number of studies (29%) relied on a p-value cut-off of p<0.05 to select predictors in the multivariable analyses. Predictive model performance measures, i.e., calibration and discrimination, were reported in 12% and 27% of studies, respectively.
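Discrimination and calibration, the two performance measures the review looked for, can be illustrated for a binary outcome as below. This is a sketch under stated assumptions: discrimination is summarized by the c-statistic (the probability that a participant with the outcome has a higher predicted risk than one without, with ties counted as one half), and calibration only by calibration-in-the-large (observed event rate minus mean predicted risk); a fuller calibration assessment would also look at the calibration slope or a calibration plot. Function names and data are hypothetical.

```python
def c_statistic(predicted_risk, outcome):
    """Concordance (c) statistic for a binary outcome coded 1 (event) / 0 (no event)."""
    if len(predicted_risk) != len(outcome):
        raise ValueError("inputs must have the same length")
    events = [p for p, y in zip(predicted_risk, outcome) if y == 1]
    non_events = [p for p, y in zip(predicted_risk, outcome) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:          # compare every event/non-event pair
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0   # correctly ranked pair
            elif p_event == p_non:
                concordant += 0.5   # tied predictions count as half
    return concordant / (len(events) * len(non_events))


def calibration_in_the_large(predicted_risk, outcome):
    """Observed event rate minus mean predicted risk (0 means agreement on average)."""
    if len(predicted_risk) != len(outcome) or not outcome:
        raise ValueError("need matching, non-empty inputs")
    return sum(outcome) / len(outcome) - sum(predicted_risk) / len(predicted_risk)


# Hypothetical predicted risks and observed outcomes for four participants
risk = [0.9, 0.8, 0.3, 0.2]
observed = [1, 0, 1, 0]
print(c_statistic(risk, observed), round(calibration_in_the_large(risk, observed), 3))
```

The pairwise loop is quadratic in sample size, which is fine for illustration; a c-statistic of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking.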
Conclusions
The majority of prediction studies in high-impact journals do not follow current methodological recommendations, limiting their reliability and applicability.
Editors' Summary
Background
There are often times in our lives when we would like to be able to predict the future. Is the stock market going to go up, for example, or will it rain tomorrow? Being able to predict future health is also important, both to patients and to physicians, and there is an increasing body of published clinical “prediction research.” Diagnostic prediction research investigates the ability of variables or test results to predict the presence or absence of a specific diagnosis. So, for example, one recent study compared the ability of two imaging techniques to diagnose pulmonary embolism (a blood clot in the lungs). Prognostic prediction research investigates the ability of various markers to predict future outcomes such as the risk of a heart attack. Both types of prediction research can investigate the predictive properties of patient characteristics, single variables, tests, or markers, or combinations of variables, tests, or markers (multivariable studies). Both types of prediction research can also include studies that build multivariable prediction models to guide patient management (model development), that test the performance of models (validation), or that quantify the effect of using a prediction model on patient and physician behaviors and outcomes (impact assessment).
Why Was This Study Done?
With the increase in prediction research, there is an increased interest in the methodology of this type of research because poorly done or poorly reported prediction research is likely to have limited reliability and applicability and will, therefore, be of little use in patient management. In this systematic review, the researchers investigate the reporting and methods of prediction studies by examining the aims, design, participant selection, definition and measurement of outcomes and candidate predictors, statistical power and analyses, and performance measures included in multivariable prediction research articles published in 2008 in several general medical journals. In a systematic review, researchers identify all the studies undertaken on a given topic using a predefined set of criteria and systematically analyze the reported methods and results of these studies.
What Did the Researchers Do and Find?
The researchers identified all the multivariable prediction studies meeting their predefined criteria that were published in 2008 in six high-impact general medical journals by browsing through all the issues of the journals (a hand search). They then scored the methods and reporting of each study using a comprehensive item list based on recent recommendations for the conduct of prediction research (for example, the reporting recommendations for tumor marker prognostic studies, the REMARK guidelines). Of 71 retrieved studies, 51 were predictor finding studies, 14 were prediction model development studies, three externally validated an existing model, and three reported on a model's impact on participant outcome. Study design, participant selection, definitions of outcomes and predictors, and predictor selection were generally well reported, but other methodological and reporting aspects of the studies were suboptimal. For example, despite many recommendations against the practice, continuous predictors were often dichotomized. That is, rather than using the measured value of a variable in a prediction model (for example, blood pressure in a cardiovascular disease prediction model), measurements were frequently assigned to two broad categories. Similarly, many of the studies failed to adequately estimate the sample size needed to minimize bias in predictor effects, and few of the model development papers quantified and validated the proposed model's predictive performance.
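The events-per-predictor check used in this review as a measure of statistical power is simple arithmetic. The sketch below assumes a binary outcome, treats the smaller outcome group as the limiting "events" count (a common convention), divides by the number of candidate predictors examined rather than only those retained, and compares the result with the commonly recommended minimum of ten; the function name `events_per_variable` and the counts are hypothetical.

```python
def events_per_variable(n_with_outcome, n_without_outcome, n_candidate_predictors,
                        minimum=10):
    """Return (EPV, meets_minimum) for a binary-outcome prediction study.

    The limiting count is the smaller of the two outcome groups; the divisor is
    the number of candidate predictors examined, not just those retained.
    """
    if n_candidate_predictors <= 0:
        raise ValueError("need at least one candidate predictor")
    limiting = min(n_with_outcome, n_without_outcome)
    epv = limiting / n_candidate_predictors
    return epv, epv >= minimum


# Hypothetical cohort: 120 events, 880 non-events, 15 candidate predictors
print(events_per_variable(120, 880, 15))  # → (8.0, False)
```

With 15 candidate predictors, this hypothetical study would need at least 150 participants in its smaller outcome group to reach ten events per predictor.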
What Do These Findings Mean?
These findings indicate that, in 2008, most of the prediction research published in high-impact general medical journals failed to follow current guidelines for the conduct and reporting of clinical prediction studies. Because the studies examined here were published in high-impact medical journals, they are likely to be representative of the higher quality studies published in 2008. However, reporting standards may have improved since 2008, and the conduct of prediction research may actually be better than this analysis suggests because the length restrictions that are often applied to journal articles may account for some of the reporting omissions. Nevertheless, despite some encouraging findings, the researchers conclude that the poor reporting and poor methods they found in many published prediction studies are a cause for concern and are likely to limit the reliability and applicability of this type of clinical research.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001221.
The EQUATOR Network is an international initiative that seeks to improve the reliability and value of medical research literature by promoting transparent and accurate reporting of research studies; its website includes information on a wide range of reporting guidelines including the REMARK recommendations (in English and Spanish)
A video of a presentation by Doug Altman, one of the researchers of this study, on improving the reporting standards of the medical evidence base, is available
The Cochrane Prognosis Methods Group provides additional information on the methodology of prognostic research
doi:10.1371/journal.pmed.1001221
PMCID: PMC3358324  PMID: 22629234
