Related Articles

1.  A robust method using propensity score stratification for correcting verification bias for binary tests 
Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified.
PMCID: PMC3276270  PMID: 21856650
Diagnostic test; Model misspecification; Propensity score; Sensitivity; Specificity
2.  Estimation of the ROC Curve under Verification Bias 
The ROC (Receiver Operating Characteristic) curve is the most commonly used statistical tool for describing the discriminatory accuracy of a diagnostic test. Classical estimation of the ROC curve relies on data from a simple random sample from the target population. In practice, estimation is often complicated because not all subjects undergo a definitive assessment of disease status (verification). Estimation of the ROC curve based on data only from subjects with verified disease status may be badly biased. In this work we investigate the properties of the doubly robust (DR) method for estimating the ROC curve under verification bias originally developed by Rotnitzky et al. (2006) for estimating the area under the ROC curve. The DR method can be applied to continuous-scale tests and allows for a nonignorable process of selection to verification. We develop the estimator's asymptotic distribution and examine its finite sample properties via a simulation study. We exemplify the DR procedure for estimation of ROC curves with data collected on patients undergoing electron beam computed tomography, a diagnostic test for calcification of the arteries.
PMCID: PMC3475535  PMID: 19588455
Diagnostic test; Nonignorable; Semiparametric model; Sensitivity analysis; Sensitivity; Specificity
3.  Clinical Utility of Serologic Testing for Celiac Disease in Ontario 
Executive Summary
Objective of Analysis
The objective of this evidence-based evaluation is to assess the accuracy of serologic tests in the diagnosis of celiac disease in subjects with symptoms consistent with this disease. Furthermore the impact of these tests in the diagnostic pathway of the disease and decision making was also evaluated.
Celiac Disease
Celiac disease is an autoimmune disease that develops in genetically predisposed individuals. The immunological response is triggered by ingestion of gluten, a protein that is present in wheat, rye, and barley. The treatment consists of strict lifelong adherence to a gluten-free diet (GFD).
Patients with celiac disease may present with a myriad of symptoms such as diarrhea, abdominal pain, weight loss, iron deficiency anemia, dermatitis herpetiformis, among others.
Serologic Testing in the Diagnosis of Celiac Disease
There are a number of serologic tests used in the diagnosis of celiac disease.
Anti-gliadin antibody (AGA)
Anti-endomysial antibody (EMA)
Anti-tissue transglutaminase antibody (tTG)
Anti-deamidated gliadin peptides antibodies (DGP)
Serologic tests are automated with the exception of the EMA test, which is more time-consuming and operator-dependent than the other tests. For each serologic test, either immunoglobulin A (IgA) or immunoglobulin G (IgG) can be measured; however, IgA is the standard antibody measured in celiac disease.
Diagnosis of Celiac Disease
According to celiac disease guidelines, the diagnosis of celiac disease is established by small bowel biopsy. Serologic tests are used to initially detect and to support the diagnosis of celiac disease. A small bowel biopsy is indicated in individuals with a positive serologic test. In some cases an endoscopy and small bowel biopsy may be required even with a negative serologic test. The diagnosis of celiac disease must be performed on a gluten-containing diet since the small intestine abnormalities and the serologic antibody levels may resolve or improve on a GFD.
Since IgA measurement is the standard for the serologic celiac disease tests, false negatives may occur in IgA-deficient individuals.
Incidence and Prevalence of Celiac Disease
The incidence and prevalence of celiac disease in the general population and in subjects with symptoms consistent with or at higher risk of celiac disease based on systematic reviews published in 2004 and 2009 are summarized below.
Incidence of Celiac Disease in the General Population
Adults or mixed population: 1 to 17/100,000/year
Children: 2 to 51/100,000/year
In one of the studies, a stratified analysis showed that there was a higher incidence of celiac disease in younger children compared to older children, i.e., 51 cases/100,000/year in 0 to 2 year-olds, 33/100,000/year in 2 to 5 year-olds, and 10/100,000/year in children 5 to 15 years old.
Prevalence of Celiac Disease in the General Population
The prevalence of celiac disease reported in population-based studies identified in the 2004 systematic review varied between 0.14% and 1.87% (median: 0.47%, interquartile range: 0.25%, 0.71%). According to the authors of the review, the prevalence did not vary by age group, i.e., adults and children.
Prevalence of Celiac Disease in High Risk Subjects
Type 1 diabetes (adults and children): 1 to 11%
Autoimmune thyroid disease: 2.9 to 3.3%
First degree relatives of patients with celiac disease: 2 to 20%
Prevalence of Celiac Disease in Subjects with Symptoms Consistent with the Disease
The prevalence of celiac disease in subjects with symptoms consistent with the disease varied widely among studies, i.e., 1.5% to 50% in adult studies, and 1.1% to 17% in pediatric studies. Differences in prevalence may be related to the referral pattern as the authors of a systematic review noted that the prevalence tended to be higher in studies whose population originated from tertiary referral centres compared to general practice.
Research Questions
What is the sensitivity and specificity of serologic tests in the diagnosis of celiac disease?
What is the clinical validity of serologic tests in the diagnosis of celiac disease? The clinical validity was defined as the ability of the test to change diagnosis.
What is the clinical utility of serologic tests in the diagnosis of celiac disease? The clinical utility was defined as the impact of the test on decision making.
What is the budget impact of serologic tests in the diagnosis of celiac disease?
What is the cost-effectiveness of serologic tests in the diagnosis of celiac disease?
Literature Search
A literature search was performed on November 13th, 2009 using OVID MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations, EMBASE, the Cumulative Index to Nursing & Allied Health Literature (CINAHL), the Cochrane Library, and the International Agency for Health Technology Assessment (INAHTA) for studies published between January 1st, 2003 and November 13th, 2010. Abstracts were reviewed by a single reviewer and, for those studies meeting the eligibility criteria, full-text articles were obtained. Reference lists were also examined for any additional relevant studies not identified through the search. Articles with unknown eligibility were reviewed with a second clinical epidemiologist, then a group of epidemiologists until consensus was established. The quality of evidence was assessed as high, moderate, low or very low according to GRADE methodology.
Inclusion Criteria
Studies that evaluated diagnostic accuracy, i.e., both sensitivity and specificity of serology tests in the diagnosis of celiac disease.
Study population consisted of untreated patients with symptoms consistent with celiac disease.
Studies in which both serologic celiac disease tests and small bowel biopsy (gold standard) were used in all subjects.
Systematic reviews, meta-analyses, randomized controlled trials, prospective observational studies, and retrospective cohort studies.
At least 20 subjects included in the celiac disease group.
English language.
Human studies.
Studies published from 2000 on.
Clearly defined cut-off value for the serology test. If more than one test was evaluated, only those tests for which a cut-off was provided were included.
Description of small bowel biopsy procedure clearly outlined (location, number of biopsies per patient), unless it was specified that celiac disease diagnosis guidelines were followed.
Patients in the treatment group had untreated CD.
Exclusion Criteria
Studies on screening of the general asymptomatic population.
Studies that evaluated rapid diagnostic kits for use either at home or in physician’s offices.
Studies that evaluated diagnostic modalities other than serologic tests such as capsule endoscopy, push enteroscopy, or genetic testing.
Cut-off for serologic tests defined based on controls included in the study.
Study population defined based on positive serology or subjects pre-screened by serology tests.
Celiac disease status known before study enrolment.
Sensitivity or specificity estimates based on repeated testing for the same subject.
Non-peer-reviewed literature such as editorials and letters to the editor.
The population consisted of adults and children with untreated, undiagnosed celiac disease with symptoms consistent with the disease.
Serologic Celiac Disease Tests Evaluated
Anti-gliadin antibody (AGA)
Anti-endomysial antibody (EMA)
Anti-tissue transglutaminase antibody (tTG)
Anti-deamidated gliadin peptides antibody (DGP)
Combinations of some of the serologic tests listed above were evaluated in some studies
Both IgA and IgG antibodies were evaluated for the serologic tests listed above.
Outcomes of Interest
Positive and negative likelihood ratios
Diagnostic odds ratio (OR)
Area under the sROC curve (AUC)
Small bowel biopsy was used as the gold standard in order to estimate the sensitivity and specificity of each serologic test.
Statistical Analysis
Pooled estimates of sensitivity, specificity and diagnostic odds ratios (DORs) for the different serologic tests were calculated using a bivariate, binomial generalized linear mixed model. Statistical significance for differences in sensitivity and specificity between serologic tests was defined by P values less than 0.05, where “false discovery rate” adjustments were made for multiple hypothesis testing. The bivariate regression analyses were performed using SAS version 9.2 (SAS Institute Inc.; Cary, NC, USA). Using the bivariate model parameters, summary receiver operating characteristic (sROC) curves were produced using Review Manager 5.0.22 (The Nordic Cochrane Centre, The Cochrane Collaboration, 2008). The area under the sROC curve (AUC) was estimated using a bivariate mixed-effects binary regression modeling framework. Model specification, estimation and prediction were carried out with xtmelogit in Stata release 10 (StataCorp, 2007). Statistical tests for the differences in AUC estimates could not be carried out.
The study results were stratified according to patient or disease characteristics such as age and severity of Marsh grade abnormalities, among others, if reported in the studies. The literature indicates that the diagnostic accuracy of serologic tests for celiac disease may be affected in patients with chronic liver disease; therefore, the studies identified through the systematic literature review that evaluated the diagnostic accuracy of serologic tests for celiac disease in patients with chronic liver disease were summarized. The effect of the GFD in patients diagnosed with celiac disease was also summarized if reported in the studies eligible for the analysis.
Summary of Findings
Published Systematic Reviews
Five systematic reviews of studies that evaluated the diagnostic accuracy of serologic celiac disease tests were identified through our literature search. Seventeen individual studies identified in adults and children were eligible for this evaluation.
In general, the studies included evaluated the sensitivity and specificity of at least one serologic test in subjects with symptoms consistent with celiac disease. The gold standard used to confirm the celiac disease diagnosis was small bowel biopsy. Serologic tests evaluated included tTG, EMA, AGA, and DGP, using either IgA or IgG antibodies. Indirect immunofluorescence was used for the EMA serologic tests whereas enzyme-linked immunosorbent assay (ELISA) was used for the other serologic tests.
Common symptoms described in the studies were chronic diarrhea, abdominal pain, bloating, unexplained weight loss, unexplained anemia, and dermatitis herpetiformis.
The main conclusions of the published systematic reviews are summarized below.
IgA tTG and/or IgA EMA have a high accuracy (pooled sensitivity: 90% to 98%, pooled specificity: 95% to 99% depending on the pooled analysis).
Most reviews found that AGA (IgA or IgG) are not as accurate as IgA tTG and/or EMA tests.
A 2009 systematic review concluded that DGP (IgA or IgG) seems to have a similar accuracy compared to tTG; however, since only 2 of the studies identified evaluated its accuracy, the authors believe that additional data are required to draw firm conclusions.
Two systematic reviews also concluded that combining two serologic celiac disease tests has little contribution to the accuracy of the diagnosis.
MAS Analysis
The pooled analysis performed by MAS showed that IgA tTG has a sensitivity of 92.1% [95% confidence interval (CI): 88.0, 96.3], compared to 89.2% (83.3, 95.1; p=0.12) for IgA DGP, 85.1% (79.5, 94.4; p=0.07) for IgA EMA, and 74.9% (63.6, 86.2; p=0.0003) for IgA AGA. Among the IgG-based tests, the results suggest that IgG DGP has a sensitivity of 88.4% (95% CI: 82.1, 94.6), compared to 44.7% (30.3, 59.2) for IgG tTG and 69.1% (56.0, 82.2) for IgG AGA. The difference was significant when IgG DGP was compared to IgG tTG but not when it was compared to IgG AGA. Combining serologic celiac disease tests yielded a slightly higher sensitivity compared to individual IgA-based serologic tests.
IgA deficiency
The prevalence of total or severe IgA deficiency was low in the studies identified varying between 0 and 1.7% as reported in 3 studies in which IgA deficiency was not used as a referral indication for celiac disease serologic testing. The results of IgG-based serologic tests were positive in all patients with IgA deficiency in which celiac disease was confirmed by small bowel biopsy as reported in four studies.
The MAS pooled analysis indicates a high specificity across the different serologic tests, including the combination strategy; pooled estimates ranged from 90.1% to 98.7% depending on the test.
Likelihood Ratios
According to the likelihood ratio estimates, both IgA tTG and serologic test combinations were considered very useful tests (positive likelihood ratio above ten and negative likelihood ratio below 0.1).
Moderately useful tests included IgA EMA, IgA DGP, and IgG DGP (positive likelihood ratio between five and ten and the negative likelihood ratio between 0.1 and 0.2).
Somewhat useful tests: IgA AGA, IgG AGA, generating small but sometimes important changes from pre- to post-test probability (positive LR between 2 and 5 and negative LR between 0.2 and 0.5)
Not Useful: IgG tTG, altering pre- to post-test probability to a small and rarely important degree (positive LR between 1 and 2 and negative LR between 0.5 and 1).
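The usefulness categories above follow from the standard relationship between likelihood ratios and sensitivity and specificity. The sketch below is illustrative only: the function names are hypothetical, the 95% specificity and the 5% pre-test probability are assumed values (the summary reports only a specificity range of 90.1% to 98.7%), and requiring both the positive and the negative LR to meet a band is an assumption about how mixed cases are handled.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) from sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_prob, lr):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


def usefulness(lr_pos, lr_neg):
    """Map a pair of likelihood ratios to the categories used in the text.

    Assumption: both LRs must fall in (or beyond) a band to earn its label.
    """
    if lr_pos > 10 and lr_neg < 0.1:
        return "very useful"
    if lr_pos >= 5 and lr_neg <= 0.2:
        return "moderately useful"
    if lr_pos >= 2 and lr_neg <= 0.5:
        return "somewhat useful"
    return "not useful"


# IgA tTG pooled sensitivity from the text; the specificity is ASSUMED.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.921, specificity=0.95)
category = usefulness(lr_pos, lr_neg)

# Hypothetical 5% pre-test probability of celiac disease.
p_after_positive = post_test_probability(0.05, lr_pos)
p_after_negative = post_test_probability(0.05, lr_neg)
print(category, round(lr_pos, 2), round(lr_neg, 3))
```

Working in odds rather than probabilities keeps the update a single multiplication, which is why the likelihood ratio is a convenient summary for moving from pre-test to post-test probability.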
Diagnostic Odds Ratios (DOR)
Among the individual serologic tests, IgA tTG had the highest DOR, 136.5 (95% CI: 51.9, 221.2). The statistical significance of the difference in DORs among tests was not calculated; however, considering the wide confidence intervals obtained, the differences may not be statistically significant.
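For reference, the DOR can be written in terms of the cells of a 2x2 table or, equivalently, in terms of sensitivity and specificity. This is the standard textbook definition rather than a formula quoted from the report; TP, FP, FN and TN denote true positives, false positives, false negatives and true negatives.

```latex
\mathrm{DOR}
  = \frac{TP \cdot TN}{FP \cdot FN}
  = \frac{\mathrm{LR}^{+}}{\mathrm{LR}^{-}}
  = \frac{\text{sensitivity} \times \text{specificity}}
         {(1 - \text{sensitivity})\,(1 - \text{specificity})}
```

A DOR of 1 corresponds to a test that does not discriminate between diseased and non-diseased subjects; larger values indicate better discrimination.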
Area Under the sROC Curve (AUC)
The sROC AUCs obtained ranged between 0.93 and 0.99 for most IgA-based tests with the exception of IgA AGA, with an AUC of 0.89.
Sensitivity and Specificity of Serologic Tests According to Age Groups
Serologic test accuracy did not seem to vary according to age (adults or children).
Sensitivity and Specificity of Serologic Tests According to Marsh Criteria
Four studies observed a trend towards a higher sensitivity of serologic celiac disease tests when Marsh 3c grade abnormalities were found in the small bowel biopsy compared to Marsh 3a or 3b (statistical significance not reported). The sensitivity of serologic tests was much lower when Marsh 1 grade abnormalities were found in small bowel biopsy compared to Marsh 3 grade abnormalities. The statistical significance of these findings was not reported in the studies.
Diagnostic Accuracy of Serologic Celiac Disease Tests in Subjects with Chronic Liver Disease
A total of 14 observational studies that evaluated the specificity of serologic celiac disease tests in subjects with chronic liver disease were identified. All studies evaluated the frequency of false positive results (1-specificity) of IgA tTG; however, IgA tTG test kits using different substrates were used, i.e., human recombinant, human, and guinea pig substrates. The gold standard, small bowel biopsy, was used to confirm the result of the serologic tests in only 5 studies. The studies do not seem to have been designed or powered to compare the diagnostic accuracy among different serologic celiac disease tests.
The results of the studies identified in the systematic literature review suggest that there is a trend towards a lower frequency of false positive results if the IgA tTG test using human recombinant substrate is used compared to the guinea pig substrate in subjects with chronic liver disease. However, the statistical significance of the difference was not reported in the studies. When IgA tTG with human recombinant substrate was used, the number of false positives seems to be similar to what was estimated in the MAS pooled analysis for IgA-based serologic tests in a general population of patients. These results should be interpreted with caution since most studies did not use the gold standard, small bowel biopsy, to confirm or exclude the diagnosis of celiac disease, and since the studies were not designed to compare the diagnostic accuracy among different serologic tests. The sensitivity of the different serologic tests in patients with chronic liver disease was not evaluated in the studies identified.
Effects of a Gluten-Free Diet (GFD) in Patients Diagnosed with Celiac Disease
Six studies identified evaluated the effects of GFD on clinical, histological, or serologic improvement in patients diagnosed with celiac disease. Improvement was observed in 51% to 95% of the patients included in the studies.
Grading of Evidence
Overall, the quality of the evidence ranged from moderate to very low depending on the serologic celiac disease test. Reasons to downgrade the quality of the evidence included the use of a surrogate endpoint (diagnostic accuracy) since none of the studies evaluated clinical outcomes, inconsistencies among study results, imprecise estimates, and sparse data. The quality of the evidence was considered moderate for IgA tTG and IgA EMA, low for IgA DGP and serologic test combinations, and very low for IgA AGA.
Clinical Validity and Clinical Utility of Serologic Testing in the Diagnosis of Celiac Disease
The clinical validity of serologic tests in the diagnosis of celiac disease was considered high in subjects with symptoms consistent with this disease due to
High accuracy of some serologic tests.
Serologic tests detect possible celiac disease cases and avoid unnecessary small bowel biopsy if the test result is negative, unless an endoscopy/ small bowel biopsy is necessary due to the clinical presentation.
Serologic tests support the results of small bowel biopsy.
The clinical utility of serologic tests for the diagnosis of celiac disease, as defined by its impact on decision making, was also considered high in subjects with symptoms consistent with this disease given the considerations listed above and since celiac disease diagnosis leads to treatment with a gluten-free diet.
Economic Analysis
A decision analysis was constructed to compare costs and outcomes between the tests based on the sensitivity, specificity and prevalence summary estimates from the MAS Evidence-Based Analysis (EBA). A budget impact was then calculated by multiplying the expected costs and volumes in Ontario. The outcome of the analysis was expected costs and false negatives (FN). Costs were reported in 2010 CAD$. All analyses were performed using TreeAge Pro Suite 2009.
Four strategies made up the efficiency frontier: IgG tTG, IgA tTG, EMA, and small bowel biopsy. All other strategies were dominated. IgG tTG was the least costly and least effective strategy ($178.95, FN avoided = 0). Small bowel biopsy was the most costly and most effective strategy ($396.60, FN avoided = 0.1553). The cost per FN avoided was $293, $369, and $1,401 for EMA, IgA tTG, and small bowel biopsy, respectively. One-way sensitivity analyses did not change the ranking of strategies.
All testing strategies with small bowel biopsy are cheaper than biopsy alone; however, they also result in more FNs. The most cost-effective strategy will depend on the decision makers’ willingness to pay. Findings suggest that IgA tTG was the most cost-effective and feasible strategy based on its Incremental Cost-Effectiveness Ratio (ICER) and convenience to conduct the test. The potential impact of the IgA tTG test in the province of Ontario would be $10.4M, $11.0M and $11.7M, respectively, in the following three years based on past volumes and trends in the province and basecase expected costs.
The panel of tests is the commonly used strategy in the province of Ontario; therefore, the impact to the system would be $13.6M, $14.5M and $15.3M, respectively, in the next three years based on past volumes and trends in the province and basecase expected costs.
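The cost per FN avoided is an incremental ratio: the difference in expected cost divided by the difference in FNs avoided relative to a comparator. The sketch below uses only the two strategies for which the summary reports both cost and effect; the function name is hypothetical, and treating IgG tTG as the comparator is an assumption, although it reproduces the $1,401 reported for small bowel biopsy.

```python
def cost_per_fn_avoided(cost, fn_avoided, ref_cost, ref_fn_avoided):
    """Incremental cost per false negative avoided versus a reference strategy."""
    delta_effect = fn_avoided - ref_fn_avoided
    if delta_effect <= 0:
        raise ValueError("strategy avoids no additional false negatives")
    return (cost - ref_cost) / delta_effect


# Values reported in the summary (2010 CAD$).
igg_ttg = {"cost": 178.95, "fn_avoided": 0.0}
biopsy = {"cost": 396.60, "fn_avoided": 0.1553}

biopsy_ratio = cost_per_fn_avoided(
    biopsy["cost"], biopsy["fn_avoided"],
    igg_ttg["cost"], igg_ttg["fn_avoided"],
)
print(f"Biopsy vs IgG tTG: ${biopsy_ratio:,.0f} per FN avoided")
```

The expected costs and FNs avoided for EMA and IgA tTG are not given in the summary, so their ratios ($293 and $369) cannot be recomputed here.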
The clinical validity and clinical utility of serologic tests for celiac disease were considered high in subjects with symptoms consistent with this disease, as they aid in the diagnosis of celiac disease and some tests present a high accuracy.
The study findings suggest that IgA tTG is the most accurate and the most cost-effective test.
AGA test (IgA) has a lower accuracy compared to other IgA-based tests
Serologic test combinations appear to be more costly with little gain in accuracy. In addition there may be problems with generalizability of the results of the studies included in this review if different test combinations are used in clinical practice.
IgA deficiency seems to be uncommon in patients diagnosed with celiac disease.
The generalizability of study results is contingent on performing both the serologic test and small bowel biopsy in subjects on a gluten-containing diet as was the case in the studies identified, since the avoidance of gluten may affect test results.
PMCID: PMC3377499  PMID: 23074399
4.  Positron Emission Tomography for the Assessment of Myocardial Viability 
Executive Summary
In July 2009, the Medical Advisory Secretariat (MAS) began work on Non-Invasive Cardiac Imaging Technologies for the Assessment of Myocardial Viability, an evidence-based review of the literature surrounding different cardiac imaging modalities to ensure that appropriate technologies are accessed by patients undergoing viability assessment. This project came about when the Health Services Branch at the Ministry of Health and Long-Term Care asked MAS to provide an evidentiary platform on effectiveness and cost-effectiveness of non-invasive cardiac imaging modalities.
After an initial review of the strategy and consultation with experts, MAS identified five key non-invasive cardiac imaging technologies that can be used for the assessment of myocardial viability: positron emission tomography, cardiac magnetic resonance imaging, dobutamine echocardiography, dobutamine echocardiography with contrast, and single photon emission computed tomography.
A 2005 review conducted by MAS determined that positron emission tomography was more sensitive than dobutamine echocardiography and single photon emission tomography and dominated the other imaging modalities from a cost-effectiveness standpoint. However, there was inadequate evidence to compare positron emission tomography and cardiac magnetic resonance imaging. Thus, this report focuses on this comparison only. For both technologies, an economic analysis was also completed.
The Non-Invasive Cardiac Imaging Technologies for the Assessment of Myocardial Viability project is made up of the following reports, which can be publicly accessed at the MAS website:
Positron Emission Tomography for the Assessment of Myocardial Viability: An Evidence-Based Analysis
Magnetic Resonance Imaging for the Assessment of Myocardial Viability: An Evidence-Based Analysis
The objective of this analysis is to assess the effectiveness and safety of positron emission tomography (PET) imaging using F-18-fluorodeoxyglucose (FDG) for the assessment of myocardial viability. To evaluate the effectiveness of FDG PET viability imaging, the following outcomes are examined:
the diagnostic accuracy of FDG PET for predicting functional recovery;
the impact of PET viability imaging on prognosis (mortality and other patient outcomes); and
the contribution of PET viability imaging to treatment decision making and subsequent patient outcomes.
Clinical Need: Condition and Target Population
Left Ventricular Systolic Dysfunction and Heart Failure
Heart failure is a complex syndrome characterized by the heart’s inability to maintain adequate blood circulation through the body leading to multiorgan abnormalities and, eventually, death. Patients with heart failure experience poor functional capacity, decreased quality of life, and increased risk of morbidity and mortality.
In 2005, more than 71,000 Canadians died from cardiovascular disease, of which 54% were due to ischemic heart disease. Left ventricular (LV) systolic dysfunction due to coronary artery disease (CAD) is the primary cause of heart failure, accounting for more than 70% of cases. The prevalence of heart failure was estimated at one percent of the Canadian population in 1989. Since then, the increase in the older population has undoubtedly resulted in a substantial increase in cases. Heart failure is associated with a poor prognosis: one-year mortality rates were 32.9% and 31.1% for men and women, respectively, in Ontario between 1996 and 1997.
Treatment Options
In general, there are three options for the treatment of heart failure: medical treatment, heart transplantation, and revascularization for those with CAD as the underlying cause. Concerning medical treatment, despite recent advances, mortality remains high among treated patients, while heart transplantation is affected by the limited availability of donor hearts and consequently has long waiting lists. The third option, revascularization, is used to restore the flow of blood to the heart via coronary artery bypass grafting (CABG) or through minimally invasive percutaneous coronary interventions (balloon angioplasty and stenting). Both methods, however, are associated with important perioperative risks including mortality, so it is essential to properly select patients for this procedure.
Myocardial Viability
Left ventricular dysfunction may be permanent if a myocardial scar is formed, or it may be reversible after revascularization. Reversible LV dysfunction occurs when the myocardium is viable but dysfunctional (reduced contractility). Since only patients with dysfunctional but viable myocardium benefit from revascularization, the identification and quantification of the extent of myocardial viability is an important part of the work-up of patients with heart failure when determining the most appropriate treatment path. Various non-invasive cardiac imaging modalities can be used to assess patients in whom determination of viability is an important clinical issue, specifically:
dobutamine echocardiography (echo),
stress echo with contrast,
SPECT using either technetium or thallium,
cardiac magnetic resonance imaging (cardiac MRI), and
positron emission tomography (PET).
Dobutamine Echocardiography
Stress echocardiography can be used to detect viable myocardium. During the infusion of low-dose dobutamine (5 to 10 μg/kg/min), an improvement of contractility in hypokinetic and akinetic segments is indicative of the presence of viable myocardium. Alternatively, a low-high dose dobutamine protocol can be used in which a biphasic response, characterized by improved contractile function during the low-dose infusion followed by a deterioration in contractility due to stress-induced ischemia during the high-dose dobutamine infusion (dobutamine dose up to 40 μg/kg/min), represents viable tissue. Newer techniques including echocardiography using contrast agents, harmonic imaging, and power Doppler imaging may help to improve the diagnostic accuracy of echocardiographic assessment of myocardial viability.
Stress Echocardiography with Contrast
Intravenous contrast agents, which are high molecular weight inert gas microbubbles that act like red blood cells in the vascular space, can be used during echocardiography to assess myocardial viability. These agents allow for the assessment of myocardial blood flow (perfusion) and contractile function (as described above), as well as the simultaneous assessment of perfusion to make it possible to distinguish between stunned and hibernating myocardium.
SPECT can be performed using thallium-201 (Tl-201), a potassium analogue, or technetium-99m labelled tracers. When Tl-201 is injected intravenously into a patient, it is taken up by the myocardial cells through regional perfusion, and Tl-201 is retained in the cell due to sodium/potassium ATPase pumps in the myocyte membrane. The stress-redistribution-reinjection protocol involves three sets of images. The first two image sets (taken immediately after stress and then three to four hours after stress) identify perfusion defects that may represent scar tissue or viable tissue that is severely hypoperfused. The third set of images is taken a few minutes after the re-injection of Tl-201 and after the second set of images is completed. These re-injection images identify viable tissue if the defects exhibit significant fill-in (> 10% increase in tracer uptake) on the re-injection images.
The other common Tl-201 viability imaging protocol, rest-redistribution, involves SPECT imaging performed at rest five minutes after Tl-201 is injected and again three to four hours later. Viable tissue is identified if the delayed images exhibit significant fill-in of defects identified in the initial scans (> 10% increase in uptake) or if defects are fixed but the tracer activity is greater than 50%.
There are two technetium-99m tracers: sestamibi (MIBI) and tetrofosmin. The uptake and retention of these tracers are dependent on regional perfusion and the integrity of cellular membranes. Viability is assessed using one set of images at rest and is defined by segments with tracer activity greater than 50%.
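The SPECT viability thresholds described above can be written as simple decision rules. The following is a minimal sketch, not a clinical tool. The function names are hypothetical, uptake is assumed to be expressed as a percentage of peak tracer activity (0 to 100), the "> 10% increase" is assumed to mean an absolute increase in percentage points, and the 50% threshold for fixed defects is assumed to apply to the delayed images; none of these details is specified in the text.

```python
def tl201_rest_redistribution_viable(initial_uptake, delayed_uptake):
    """Rest-redistribution rule for Tl-201 (illustrative only).

    Both arguments are percentages of peak tracer activity (assumption).
    A segment is called viable if the delayed images show fill-in of more
    than 10 percentage points, or if tracer activity exceeds 50%.
    """
    fill_in = delayed_uptake - initial_uptake
    return fill_in > 10.0 or delayed_uptake > 50.0


def tc99m_rest_viable(rest_uptake):
    """Single rest-image rule for technetium-99m tracers (illustrative only)."""
    return rest_uptake > 50.0


# Hypothetical segments: (initial uptake %, delayed uptake %)
segments = [(40.0, 55.0), (30.0, 35.0), (30.0, 42.0)]
for initial, delayed in segments:
    print(initial, delayed, tl201_rest_redistribution_viable(initial, delayed))
```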
Cardiac Magnetic Resonance Imaging
Cardiac magnetic resonance imaging (cardiac MRI) is a non-invasive, x-ray free technique that uses a powerful magnetic field, radio frequency pulses, and a computer to produce detailed images of the structure and function of the heart. Two types of cardiac MRI are used to assess myocardial viability: dobutamine stress magnetic resonance imaging (DSMR) and delayed contrast-enhanced cardiac MRI (DE-MRI). DE-MRI, the most commonly used technique in Ontario, uses gadolinium-based contrast agents to define the transmural extent of scar, which can be visualized based on the intensity of the image. Hyper-enhanced regions correspond to irreversibly damaged myocardium. As the extent of hyper-enhancement increases, the amount of scar increases, so the likelihood of functional recovery is lower.
Cardiac Positron Emission Tomography
Positron emission tomography (PET) is a nuclear medicine technique used to image tissues based on the distinct ways in which normal and abnormal tissues metabolize positron-emitting radionuclides. Radionuclides are radioactive analogs of common physiological substrates such as sugars, amino acids, and free fatty acids that are used by the body. The only licensed radionuclide used in PET imaging for viability assessment is F-18 fluorodeoxyglucose (FDG).
During a PET scan, the radionuclides are injected into the body and as they decay, they emit positively charged particles (positrons) that travel several millimetres into tissue and collide with orbiting electrons. This collision results in annihilation where the combined mass of the positron and electron is converted into energy in the form of two 511 keV gamma rays, which are then emitted in opposite directions (180 degrees) and captured by an external array of detector elements in the PET gantry. Computer software is then used to convert the radiation emission into images. The system is set up so that it only detects coincident gamma rays that arrive at the detectors within a predefined temporal window, while single photons arriving without a pair or outside the temporal window do not activate the detector. This allows for increased spatial and contrast resolution.
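The coincidence logic described above can be illustrated with a simplified sketch. It assumes each detected photon is recorded as a (timestamp in nanoseconds, detector id) pair and that the temporal window is 6 ns; the window value, the event format, and the function name are assumptions for illustration. Real scanners handle multiple coincidences, energy windows, and random events in ways not shown here.

```python
def find_coincidences(events, window_ns=6.0):
    """Pair single-photon events that arrive within a temporal window.

    events: iterable of (timestamp_ns, detector_id) tuples (assumed format).
    Returns a list of (detector_a, detector_b) pairs; unpaired single
    events are discarded, mirroring the behaviour described in the text.
    """
    ordered = sorted(events)  # sort by arrival time
    pairs = []
    i = 0
    while i < len(ordered) - 1:
        t_first, det_first = ordered[i]
        t_second, det_second = ordered[i + 1]
        if t_second - t_first <= window_ns and det_first != det_second:
            pairs.append((det_first, det_second))
            i += 2  # both photons consumed by this coincidence
        else:
            i += 1  # the first photon is a single; drop it
    return pairs


# Hypothetical event stream: two true pairs and one unpaired photon.
events = [(0.0, 3), (2.0, 40), (100.0, 7), (250.0, 12), (253.0, 30)]
print(find_coincidences(events))
```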
Evidence-Based Analysis
Research Questions
What is the diagnostic accuracy of PET for detecting myocardial viability?
What is the prognostic value of PET viability imaging (mortality and other clinical outcomes)?
What is the contribution of PET viability imaging to treatment decision making?
What is the safety of PET viability imaging?
Literature Search
A literature search was performed on July 17, 2009 using OVID MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations, EMBASE, the Cochrane Library, and the International Agency for Health Technology Assessment (INAHTA) for studies published from January 1, 2004 to July 16, 2009. Abstracts were reviewed by a single reviewer and, for those studies meeting the eligibility criteria, full-text articles were obtained. In addition, published systematic reviews and health technology assessments were reviewed for relevant studies published before 2004. Reference lists of included studies were also examined for any additional relevant studies not already identified. The quality of the body of evidence was assessed as high, moderate, low or very low according to GRADE methodology.
Inclusion Criteria
Criteria applying to diagnostic accuracy studies, prognosis studies, and physician decision-making studies:
English language full-reports
Health technology assessments, systematic reviews, meta-analyses, randomized controlled trials (RCTs), and observational studies
Patients with chronic, known CAD
PET imaging using FDG for the purpose of detecting viable myocardium
Criteria applying to diagnostic accuracy studies:
Assessment of functional recovery ≥3 months after revascularization
Raw data available to calculate sensitivity and specificity
Gold standard: prediction of global or regional functional recovery
Criteria applying to prognosis studies:
Mortality studies that compare revascularized patients with non-revascularized patients and patients with viable and non-viable myocardium
Exclusion Criteria
Criteria applying to diagnostic accuracy studies, prognosis studies, and physician decision-making studies:
PET perfusion imaging
< 20 patients
< 18 years of age
Patients with non-ischemic heart disease
Animal or phantom studies
Studies focusing on the technical aspects of PET
Studies conducted exclusively in patients with acute myocardial infarction (MI)
Duplicate publications
Criteria applying to diagnostic accuracy studies
Gold standard other than functional recovery (e.g., PET or cardiac MRI)
Assessment of functional recovery occurs before patients are revascularized
Outcomes of Interest
Diagnostic accuracy studies
Sensitivity and specificity
Positive and negative predictive values (PPV and NPV)
Positive and negative likelihood ratios
Diagnostic accuracy
Adverse events
Prognosis studies
Mortality rate
Functional status
Exercise capacity
Quality of Life
Influence of PET viability imaging on physician decision making
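The diagnostic accuracy outcomes listed above all derive from a 2×2 table of test result against the reference standard. A minimal sketch using the standard definitions follows; the function name and the counts in the example are hypothetical and are not taken from any included study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard diagnostic accuracy measures from 2x2 counts.

    tp, fp, fn, tn: true positives, false positives, false negatives,
    true negatives. Assumes each row and column total is non-zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)
    false_negative_rate = fn / (fn + tp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Likelihood ratios are unbounded when the denominator is zero.
        "lr_positive": (sensitivity / false_positive_rate
                        if false_positive_rate > 0 else float("inf")),
        "lr_negative": (false_negative_rate / specificity
                        if specificity > 0 else float("inf")),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts for illustration only.
for name, value in diagnostic_metrics(tp=90, fp=30, fn=10, tn=70).items():
    print(f"{name}: {value:.3f}")
```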
Statistical Methods
Pooled estimates of sensitivity and specificity were calculated using a bivariate, binomial generalized linear mixed model. Statistical significance was defined by P values less than 0.05, where “false discovery rate” adjustments were made for multiple hypothesis testing. Using the bivariate model parameters, summary receiver operating characteristic (sROC) curves were produced. The area under the sROC curve was estimated by numerical integration with a cubic spline (default option). Finally, pooled estimates of mortality rates were calculated using weighted means.
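Two of the simpler steps in this paragraph can be sketched in code. The text does not state which false discovery rate procedure or which weights were used, so the sketch assumes the Benjamini-Hochberg step-up adjustment and sample-size weights; both choices, the function names, and the example numbers are assumptions. The bivariate mixed model and sROC integration are not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def weighted_mean(values, weights):
    """Weighted mean, e.g. study mortality rates weighted by sample size."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical inputs for illustration only.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
print(weighted_mean([0.10, 0.20], [100, 300]))
```

With sample-size weights, the weighted mean of study-level rates equals total deaths divided by total patients, which is one reason this weighting is a common default.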
Quality of Evidence
The quality of evidence assigned to individual diagnostic studies was determined using the QUADAS tool, a list of 14 questions that address internal and external validity, bias, and generalizability of diagnostic accuracy studies. Each question is scored as “yes”, “no”, or “unclear”. The quality of the body of evidence was then assessed as high, moderate, low, or very low according to the GRADE Working Group criteria. The following definitions of quality were used in grading the quality of the evidence:
Summary of Findings
A total of 40 studies met the inclusion criteria and were included in this review: one health technology assessment, two systematic reviews, 22 observational diagnostic accuracy studies, and 16 prognosis studies. The available PET viability imaging literature addresses two questions: 1) what is the diagnostic accuracy of PET imaging for the assessment of myocardial viability; and 2) what is the prognostic value of PET viability imaging. The diagnostic accuracy studies use regional or global functional recovery as the reference standard to determine the sensitivity and specificity of the technology. While regional functional recovery was most commonly used in the studies, global functional recovery is more important clinically. Due to differences in reporting and thresholds, however, it was not possible to pool global functional recovery.
Functional recovery, however, is a surrogate reference standard for viability and consequently, the diagnostic accuracy results may underestimate the specificity of PET viability imaging. For example, regional functional recovery may take up to a year after revascularization depending on whether it is stunned or hibernating tissue, while many of the studies looked at regional functional recovery 3 to 6 months after revascularization. In addition, viable tissue may not recover function after revascularization due to graft patency or re-stenosis. Both issues may lead to false positives and underestimate specificity. Given these limitations, the prognostic value of PET viability imaging provides the most direct and clinically useful information. This body of literature provides evidence on the comparative effectiveness of revascularization and medical therapy in patients with viable myocardium and patients without viable myocardium. In addition, the literature compares the impact of PET-guided treatment decision making with SPECT-guided or standard care treatment decision making on survival and cardiac events (including cardiac mortality, MI, hospital stays, unintended revascularization, etc).
The main findings from the diagnostic accuracy and prognosis evidence are:
Based on the available very low quality evidence, PET is a useful imaging modality for the detection of viable myocardium. The pooled estimates of sensitivity and specificity for the prediction of regional functional recovery as a surrogate for viable myocardium are 91.5% (95% CI, 88.2% – 94.9%) and 67.8% (95% CI, 55.8% – 79.7%), respectively.
Based on the available very low quality of evidence, an indirect comparison of pooled estimates of sensitivity and specificity showed no statistically significant difference in the diagnostic accuracy of PET viability imaging for regional functional recovery using perfusion/metabolism mismatch with FDG PET plus either a PET or SPECT perfusion tracer compared with metabolism imaging with FDG PET alone.
FDG PET + PET perfusion metabolism mismatch: sensitivity, 89.9% (83.5% – 96.4%); specificity, 78.3% (66.3% – 90.2%);
FDG PET + SPECT perfusion metabolism mismatch: sensitivity, 87.2% (78.0% – 96.4%); specificity, 67.1% (48.3% – 85.9%);
FDG PET metabolism: sensitivity, 94.5% (91.0% – 98.0%); specificity, 66.8% (53.2% – 80.3%).
Given these findings, further higher quality studies are required to determine the comparative effectiveness and clinical utility of metabolism and perfusion/metabolism mismatch viability imaging with PET.
Based on very low quality of evidence, patients with viable myocardium who are revascularized have a lower mortality rate than those who are treated with medical therapy. Given the quality of evidence, however, this estimate of effect is uncertain so further higher quality studies in this area should be undertaken to determine the presence and magnitude of the effect.
While revascularization may reduce mortality in patients with viable myocardium, current moderate quality RCT evidence suggests that PET-guided treatment decisions do not result in statistically significant reductions in mortality compared with treatment decisions based on SPECT or standard care protocols. The PARR II trial by Beanlands et al. found a significant reduction in cardiac events (a composite outcome that includes cardiac deaths, MI, or hospital stay for cardiac cause) between the adherence to PET recommendations subgroup and the standard care group (hazard ratio, 0.62; 95% confidence interval, 0.42 – 0.93; P = 0.019); however, this post-hoc sub-group analysis is hypothesis generating and higher quality studies are required to substantiate these findings.
The use of FDG PET plus SPECT to determine perfusion/metabolism mismatch to assess myocardial viability increases the radiation exposure compared with FDG PET imaging alone or FDG PET combined with PET perfusion imaging (total-body effective dose: FDG PET, 7 mSv; FDG PET plus PET perfusion tracer, 7.6 – 7.7 mSv; FDG PET plus SPECT perfusion tracer, 16 – 25 mSv). While the precise risk attributed to this increased exposure is unknown, there is increasing concern regarding lifetime multiple exposures to radiation-based imaging modalities, although the incremental lifetime risk for patients who are older or have a poor prognosis may not be as great as for healthy individuals.
PMCID: PMC3377573  PMID: 23074393
5.  A model for adjusting for nonignorable verification bias in estimation of ROC curve and its area with likelihood-based approach 
Biometrics  2010;66(4):1119-1128.
In estimation of the ROC curve, when the true disease status is subject to nonignorable missingness, the observed likelihood involves the missing mechanism given by a selection model. In this paper, we proposed a likelihood-based approach to estimate the ROC curve and the area under the ROC curve when the verification bias is nonignorable. We specified a parametric disease model in order to make the nonignorable selection model identifiable. With the estimated verification and disease probabilities, we constructed four types of empirical estimates of the ROC curve and its area based on imputation and reweighting methods. In practice, a reasonably large sample size is required to estimate the nonignorable selection model in our settings. Simulation studies showed that all four estimators of the ROC area performed well, and imputation estimators were generally more efficient than the other estimators proposed. We applied the proposed method to a data set from Alzheimer’s disease research.
PMCID: PMC3618959  PMID: 20222937
Alzheimer’s disease; nonignorable missing data; ROC curve; verification bias
6.  Covariate adjustment in estimating the area under ROC curve with partially missing gold standard 
Biometrics  2013;69(1):91-100.
In ROC analysis, covariate adjustment is advocated when the covariates impact the magnitude or accuracy of the test under study. Meanwhile, for many large scale screening tests, the true condition status may be subject to missingness because it is expensive and/or invasive to ascertain the disease status. The complete-case analysis may end up with a biased inference, also known as “verification bias”. To address the issue of covariate adjustment with verification bias in ROC analysis, we propose several estimators for the area under the covariate-specific and covariate-adjusted ROC curves (AUCx and AAUC). The AUCx is directly modelled in the form of binary regression, and the estimating equations are based on the U statistics. The AAUC is estimated from the weighted average of AUCx over the covariate distribution of the diseased subjects. We employ reweighting and imputation techniques to overcome the verification bias problem. Our proposed estimators are initially derived assuming that the true disease status is missing at random (MAR), and then with some modification, the estimators can be extended to the not-missing-at-random (NMAR) situation. The asymptotic distributions are derived for the proposed estimators. The finite sample performance is evaluated by a series of simulation studies. Our method is applied to a data set in Alzheimer's disease research.
PMCID: PMC3622116  PMID: 23410529
Alzheimer's disease; area under ROC curve; covariate adjustment; U statistics; verification bias; weighted estimating equations
7.  Direct Estimation of the Area Under the Receiver Operating Characteristic Curve in the Presence of Verification Bias 
Statistics in medicine  2009;28(3):361-376.
The area under a receiver operating characteristic (ROC) curve (AUC) is a commonly used index for summarizing the ability of a continuous diagnostic test to discriminate between healthy and diseased subjects. If all subjects have their true disease status verified, one can directly estimate the AUC nonparametrically using the Wilcoxon statistic. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Because estimators of the AUC based only on verified subjects are typically biased, it is common to estimate the AUC from a bias-corrected ROC curve. The variance of the estimator, however, does not have a closed-form expression and thus resampling techniques are used to obtain an estimate. In this paper, we develop a new method for directly estimating the AUC in the setting of verification bias based on U-statistics and inverse probability weighting. Closed-form expressions for the estimator and its variance are derived. We also show that the new estimator is equivalent to the empirical AUC derived from the bias-corrected ROC curve arising from the inverse probability weighting approach.
PMCID: PMC2626141  PMID: 18680124
Diagnostic test; Inverse probability weighting; Missing at random; U-statistic
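The idea in the abstract above, a Wilcoxon-type AUC computed on verified subjects with each subject weighted by the inverse of its verification probability, can be sketched as follows. This is one common form of an inverse-probability-weighted AUC and is not claimed to be identical to the estimator derived in the paper; the function name and inputs are hypothetical, and the verification probabilities are assumed to be known or estimated elsewhere.

```python
def ipw_auc(test_values, disease, verified, verification_prob):
    """Inverse-probability-weighted AUC using verified subjects only.

    test_values: continuous test results.
    disease: 1 (diseased), 0 (healthy), or None when not verified.
    verified: booleans, True if disease status was ascertained.
    verification_prob: P(verified | test result, covariates), assumed > 0.
    """
    diseased, healthy = [], []
    for t, d, v, p in zip(test_values, disease, verified, verification_prob):
        if not v:
            continue  # unverified subjects contribute only through the weights
        (diseased if d == 1 else healthy).append((t, 1.0 / p))

    numerator = denominator = 0.0
    for t_d, w_d in diseased:
        for t_h, w_h in healthy:
            w = w_d * w_h
            denominator += w
            if t_d > t_h:
                numerator += w
            elif t_d == t_h:
                numerator += 0.5 * w  # ties count one half
    if denominator == 0:
        raise ValueError("need at least one verified diseased and one verified healthy subject")
    return numerator / denominator


# Hypothetical data: the fifth subject was not verified.
print(ipw_auc([3, 1, 2, 0, 5], [1, 1, 0, 0, None],
              [True, True, True, True, False], [1.0, 0.5, 1.0, 0.5, 0.5]))
```

When every subject is verified with probability 1, all weights equal 1 and the function reduces to the usual Wilcoxon (Mann-Whitney) estimate of the AUC.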
8.  Self-Administered Outpatient Antimicrobial Infusion by Uninsured Patients Discharged from a Safety-Net Hospital: A Propensity-Score-Balanced Retrospective Cohort Study 
PLoS Medicine  2015;12(12):e1001922.
Outpatient parenteral antimicrobial therapy (OPAT) is accepted as safe and effective for medically stable patients to complete intravenous (IV) antibiotics in an outpatient setting. Since, however, uninsured patients in the United States generally cannot afford OPAT, safety-net hospitals are often burdened with long hospitalizations purely to infuse antibiotics, occupying beds that could be used for patients requiring more intensive services. OPAT is generally delivered in one of four settings: infusion centers, nursing homes, at home with skilled nursing assistance, or at home with self-administered therapy. The first three—termed healthcare-administered OPAT (H-OPAT)—are most commonly used in the United States by patients with insurance funding. The fourth—self-administered OPAT (S-OPAT)—is relatively uncommon, with the few published studies having been conducted in the United Kingdom. With multidisciplinary planning, we established an S-OPAT clinic in 2009 to shift care of selected uninsured patients safely to self-administration of their IV antibiotics at home. We undertook this study to determine whether the low-income mostly non-English-speaking patients in our S-OPAT program could administer their own IV antimicrobials at home with outcomes as good as, or better than, those receiving H-OPAT.
Methods and Findings
Parkland Hospital is a safety-net hospital serving Dallas County, Texas. From 1 January 2009 to 14 October 2013, all uninsured patients meeting criteria were enrolled in S-OPAT, while insured patients were discharged to H-OPAT settings. The S-OPAT patients were trained through multilingual instruction to self-administer IV antimicrobials by gravity, tested for competency before discharge, and thereafter followed at designated intervals in the S-OPAT outpatient clinic for IV access care, laboratory monitoring, and physician follow-up. The primary outcome was 30-d all-cause readmission, and the secondary outcome was 1-y all-cause mortality. The study was adequately powered for readmission but not for mortality. Clinical, sociodemographic, and outcome data were collected from the Parkland Hospital electronic medical records and the US census, constituting a historical prospective cohort study. We used multivariable logistic regression to develop a propensity score predicting S-OPAT versus H-OPAT group membership from covariates. We then estimated the effect of S-OPAT versus H-OPAT on the two outcomes using multivariable proportional hazards regression, controlling for selection bias and confounding with the propensity score and covariates.
Of the 1,168 patients discharged to receive OPAT, 944 (81%) were managed in the S-OPAT program and 224 (19%) by H-OPAT services. In multivariable proportional hazards regression models controlling for confounding and selection bias, the 30-d readmission rate was 47% lower in the S-OPAT group (adjusted hazard ratio [aHR], 0.53; 95% CI 0.35–0.81; p = 0.003), and the 1-y mortality rate did not differ significantly between the groups (aHR, 0.86; 95% CI 0.37–2.00; p = 0.73). The S-OPAT program shifted a median 26 d of inpatient infusion per patient to the outpatient setting, avoiding 27,666 inpatient days. The main limitation of this observational study—the potential bias from the difference in healthcare funding status of the groups—was addressed by propensity score modeling.
S-OPAT was associated with similar or better clinical outcomes than H-OPAT. S-OPAT may be an acceptable model of treatment for uninsured, medically stable patients to complete extended courses of IV antimicrobials at home.
In a propensity score-balanced retrospective cohort study, Kavita Bhavan and colleagues compare health outcomes for patients undergoing self-administered versus healthcare-administered outpatient parenteral antimicrobial therapy.
Editors' Summary
Patients sometimes need lengthy courses of antimicrobial agents to treat life-threatening infections. For example, patients who develop endocarditis (an infection of the inner lining of the heart usually caused by bacteria entering the blood and traveling to the heart) need to be given antimicrobial drugs for up to six weeks. Initially, these patients require intensive diagnostic and therapeutic care in the hospital. But once the antimicrobial treatment starts to work, most patients only need regular intravenous antimicrobial infusions. Patients who stay in the hospital to receive this low intensity care occupy beds that could be used for patients requiring more intensive care. Moreover, they are at risk of catching a hospital-acquired, antibiotic-resistant infection. For these reasons, and because long-term administration of antimicrobial agents in the hospital is costly, outpatient parenteral (injected or infused) antimicrobial therapy (OPAT) is increasingly being used as a safe and effective way for medically stable patients to complete a course of intravenous antibiotics outside the hospital.
Why Was This Study Done?
In the US, OPAT is usually delivered in infusion centers, in nursing homes, or at home by visiting nurses. But healthcare-administered OPAT (H-OPAT) is available only to insured patients (in the US, medical insurance provided by employers or by the government-run Medicare and Medicaid programs funds healthcare). Uninsured people cannot usually afford H-OPAT and have to stay in safety-net hospitals (public hospitals that provide care to low-income, uninsured populations) for intravenous antibiotic treatment. In this propensity-score-balanced retrospective cohort study, the researchers investigate whether uninsured patients discharged from a safety-net hospital in Texas to self-administer OPAT at home (S-OPAT) can achieve outcomes as good as or better than those achieved by patients receiving H-OPAT. A retrospective cohort study compares recorded clinical outcomes in groups of patients who received different treatments. Because the patients were not chosen at random, such studies are subject to selection bias and confounding. Propensity score balancing is used to control for selection bias—the possibility that some members of the population are less likely to be included in a study than others. Adjustment for covariates (patient characteristics that may affect the outcome under study) is used to control for confounding—the possibility that unknown characteristics shared by patients with a specific outcome, rather than any treatment, may be responsible for that outcome.
What Did the Researchers Do and Find?
Between 2010 and 2013, 944 uninsured patients were enrolled in the hospital’s S-OPAT program, and 224 insured patients were discharged to an H-OPAT program. Patients in the S-OPAT group were trained to self-administer intravenous antimicrobials, tested for their ability to treat themselves before discharge, and then monitored by weekly visits to the S-OPAT outpatient clinic. The researchers estimated the effect of S-OPAT versus H-OPAT on 30-day all-cause readmission and one-year all-cause mortality (the primary and secondary outcomes, respectively) after adjusting for covariates and controlling for selection bias with a propensity score developed using baseline clinical and sociodemographic information collected from the patients. The 30-day readmission rate was 47% lower in the S-OPAT group than in the H-OPAT group (a significant result unlikely to have arisen by chance), and the one-year mortality rate did not differ significantly between the two groups. Notably, because the S-OPAT program resulted in patients spending fewer days having inpatient infusions, 27,666 inpatient days were avoided over the study period.
What Do These Findings Mean?
These findings indicate that, after adjusting for preexisting differences between those patients receiving S-OPAT and those receiving H-OPAT and for potential confounders, the risk of readmission within 30 days of discharge was lower in the S-OPAT group than in the H-OPAT group and the risk of dying within one year of hospital discharge did not differ significantly between the two groups (the study did not include enough participants to detect any subtle difference that might have existed for this end point). Thus, S-OPAT was associated with similar or better outcomes than H-OPAT. Note that there may be residual selection bias and confounding by characteristics not included in the propensity score. This study did not address whether S-OPAT actually improves outcomes for patients compared with H-OPAT; a randomized controlled trial in which patients are randomly assigned to receive the two treatments is needed to do this. Nevertheless, these findings suggest that S-OPAT might make it possible for uninsured, medically stable patients to have extended courses of intravenous antimicrobials at home rather than remaining in the hospital until their treatment is complete.
Additional Information
This list of resources contains links that can be accessed when viewing the PDF on a device or via the online version of the article at
The UK National Health Service Choices website provides basic information about the use of antibiotics, including information about when intravenous antibiotics are needed and about endocarditis
The US National Heart, Lung, and Blood Institute also provides information about endocarditis and its treatment
The Infectious Diseases Society of America provides clinical guidelines for the use of OPAT
The OPAT Initiative of the British Society for Antimicrobial Chemotherapy is a multi-stakeholder project that supports the establishment of standardized OPAT services throughout the UK; it also provides guidelines for the use of OPAT
Wikipedia has a page on propensity score matching (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
PMCID: PMC4686020  PMID: 26671467
9.  Semiparametric estimation of the covariate-specific ROC curve in presence of ignorable verification bias 
Biometrics  2011;67(3):906-916.
Covariate-specific ROC curves are often used to evaluate the classification accuracy of a medical diagnostic test or a biomarker, when the accuracy of the test is associated with certain covariates. In many large-scale screening tests, the gold standard is subject to missingness due to high cost or harmfulness to the patient. In this paper, we propose a semiparametric estimation of the covariate-specific ROC curves with a partial missing gold standard. A location-scale model is constructed for the test result to model the covariates’ effect, but the residual distributions are left unspecified. Thus the baseline and link functions of the ROC curve both have flexible shapes. With the gold standard missing at random (MAR) assumption, we consider weighted estimating equations for the location-scale parameters, and weighted kernel estimating equations for the residual distributions. Three ROC curve estimators are proposed and compared, namely, imputation-based, inverse probability weighted and doubly robust estimators. We derive the asymptotic normality of the estimated ROC curve, as well as the analytical form of the standard error estimator. The proposed method is motivated by, and applied to, data from Alzheimer's disease research.
PMCID: PMC3596883  PMID: 21361890
Alzheimer's disease; covariate-specific ROC curve; ignorable missingness; verification bias; weighted estimating equations
10.  Optimal cut-off criteria for duplex ultrasound for the diagnosis of restenosis in stented carotid arteries: Review and protocol for a diagnostic study 
BMC Neurology  2009;9:36.
Carotid angioplasty with stenting is a relatively new, increasingly used, less-invasive treatment for symptomatic carotid artery stenosis. It is being evaluated in ongoing and nearly finished randomized trials. An important factor in the evaluation of stents is the occurrence of in-stent restenosis. An un-stented carotid artery is likely to have a more elastic vessel wall than a stented one, even if stenosis is present. Therefore, duplex ultrasound cut-off criteria for the degrees of an in-stent stenosis, based on blood velocity parameters, are probably different from the established cut-offs used for un-stented arteries. Routine criteria cannot be applied to stented arteries, so new criteria need to be established for this particular purpose.
Current literature was systematically reviewed. From the selected studies, the following data were extracted: publication year, population size, whether the study was prospective, duplex ultrasound cut-off criteria reported, which reference test was used, and whether there was an indication of selection bias, and of verification bias in particular.
In general, the velocity cut-off values for stenosis measurements in stented arteries were higher than those reported for unstented arteries. Previous studies often were retrospective, or the reference test (DSA or CTA) was carried out only when a patient was suspected of having restenosis at DUS, which may result in verification bias.
To address the deficiencies of the existing studies, we propose a prospective cohort study nested within the International Carotid Stenting Study (ICSS), an international multi-centre trial in which over 1,700 patients have been randomised between stenting and CEA. In this cohort we will enrol a minimum of 300 patients treated with a stent. All patients undergo regular DUS examination at the yearly follow-up visit according to the ICSS protocol. To avoid verification bias, an additional computed tomography angiography (CTA) will be performed as a reference test in all consecutive patients, regardless of the degree of stenosis on the initial DUS test.
PMCID: PMC2722571  PMID: 19624830
11.  Diagnosing Severe Falciparum Malaria in Parasitaemic African Children: A Prospective Evaluation of Plasma PfHRP2 Measurement 
PLoS Medicine  2012;9(8):e1001297.
Arjen Dondorp and colleagues investigate whether the plasma level of Plasmodium falciparum histidine-rich protein 2 can be used to distinguish between severe malaria and other severe febrile illness in African children with malaria.
In African children, distinguishing severe falciparum malaria from other severe febrile illnesses with coincidental Plasmodium falciparum parasitaemia is a major challenge. P. falciparum histidine-rich protein 2 (PfHRP2) is released by mature sequestered parasites and can be used to estimate the total parasite burden. We investigated the prognostic significance of plasma PfHRP2 and used it to estimate the malaria-attributable fraction in African children diagnosed with severe malaria.
Methods and Findings
Admission plasma PfHRP2 was measured prospectively in African children (from Mozambique, The Gambia, Kenya, Tanzania, Uganda, Rwanda, and the Democratic Republic of the Congo) aged 1 month to 15 years with severe febrile illness and a positive P. falciparum lactate dehydrogenase (pLDH)-based rapid test in a clinical trial comparing parenteral artesunate versus quinine (the AQUAMAT trial, ISRCTN 50258054). In 3,826 severely ill children, Plasmodium falciparum PfHRP2 was higher in patients with coma (p = 0.0209), acidosis (p<0.0001), and severe anaemia (p<0.0001). Admission geometric mean (95%CI) plasma PfHRP2 was 1,611 (1,350–1,922) ng/mL in fatal cases (n = 381) versus 1,046 (991–1,104) ng/mL in survivors (n = 3,445, p<0.0001), without differences in parasitaemia as assessed by microscopy. There was a U-shaped association between log10 plasma PfHRP2 and risk of death. Mortality increased 20% per log10 increase in PfHRP2 above 174 ng/mL (adjusted odds ratio [AOR] 1.21, 95%CI 1.05–1.39, p = 0.009). A mechanistic model assuming a PfHRP2-independent risk of death in non-malaria illness closely fitted the observed data and showed malaria-attributable mortality less than 50% with plasma PfHRP2≤174 ng/mL. The odds ratio (OR) for death in artesunate versus quinine-treated patients was 0.61 (95%CI 0.44–0.83, p = 0.0018) in the highest PfHRP2 tertile, whereas there was no difference in the lowest tertile (OR 1.05; 95%CI 0.69–1.61; p = 0.82). A limitation of the study is that some conclusions are drawn from a mechanistic model, which is inherently dependent on certain assumptions. However, a sensitivity analysis of the model indicated that the results were robust to a plausible range of parameter estimates. Further studies are needed to validate our findings.
Plasma PfHRP2 has prognostic significance in African children with severe falciparum malaria and provides a tool to stratify the risk of “true” severe malaria-attributable disease as opposed to other severe illnesses in parasitaemic African children.
Please see later in the article for the Editors' Summary.
Editors' Summary
Malaria is a life-threatening disease caused by parasites that are transmitted to people through the bites of infected mosquitoes. In 2010, malaria caused an estimated 655,000 deaths worldwide, mostly in Africa, where according to the World Health Organization, one African child dies every minute from the disease. There are four Plasmodium parasite species that cause malaria in humans, with one species, Plasmodium falciparum, causing the most severe disease. However, diagnosing severe falciparum malaria in children living in endemic areas is problematic, as many semi-immune children may have the malaria parasites in their blood (described as being parasitaemic) but do not have clinical disease. Therefore, a positive malaria blood smear may be coincidental and not be diagnostic of severe malaria, and unfortunately, neither are the clinical symptoms of severe malaria, such as shock, acidosis, or coma, which can also be caused by other childhood infections. For these reasons, the misdiagnosis of falciparum malaria in severely ill children is an important problem in sub-Saharan Africa, and may result in unnecessary child deaths.
Why Was This Study Done?
Previous studies have suggested that a parasite protein—P. falciparum histidine-rich protein-2 (PfHRP2)—is a measure of the total number of parasites in the patient. Unlike the circulating parasites detected on a blood film, which do not represent the parasites that get stuck in vital organs, PfHRP2 is distributed equally through the total blood plasma volume, and so can be considered a measure of the total parasite burden in the previous 48 hours. So in this study, the researchers assessed the prognostic value of plasma PfHRP2 in African children with severe malaria and whether this protein could distinguish children who really do have severe malaria from those who have severe febrile illness but coincidental parasitaemia, who may have another infection.
What Did the Researchers Do and Find?
The researchers assessed levels of plasma PfHRP2 in 3,826 out of a possible 5,425 African children who participated in a large multinational trial (in Mozambique, The Gambia, Rwanda, Tanzania, Kenya, Uganda, and the Democratic Republic of Congo) that compared the anti-malarial drugs quinine and artesunate for the treatment of severe malaria. All children had a clinical diagnosis of severe malaria confirmed by a rapid diagnostic test, and the researchers used clinical signs to define the severity of malaria. The researchers assessed the relationship between plasma PfHRP2 concentrations and risk of death taking other well established predictors of death, such as coma, convulsions, hypoglycaemia, respiratory distress, and shock, into account.
The researchers found that PfHRP2 was detectable in 3,800/3,826 (99%) children with severe malaria and that the average plasma PfHRP2 level was significantly higher in the 381 children who died from malaria than in children who survived (1,611 ng/mL versus 1,046 ng/mL). Plasma PfHRP2 was also significantly higher in children with severe malaria signs and symptoms such as coma, acidosis, and severe anaemia. Importantly, the researchers found that high death rates were associated with either very low or very high values of plasma PfHRP2: the odds (chance) of death were 20% higher per log10 unit increase in PfHRP2 above a specific threshold (174 ng/mL), but below this concentration, the risk of death increased with decreasing levels, probably because at lower levels disease was caused by a severe febrile disease other than malaria, like septicemia. Finally, the researchers found that in children within the highest PfHRP2 tertile, the odds ratio for death when treated with the antimalarial drug artesunate versus quinine was 0.61 but that there was no difference in death rates in the lowest tertile, which supports the view that patients with very low plasma PfHRP2 have a severe febrile illness other than malaria. The researchers used mathematical modeling to provide cut-off values for plasma PfHRP2 denoting the proportion of patients with a diagnosis other than severe malaria.
What Do These Findings Mean?
These findings suggest that in areas of moderate or high malaria transmission where a high proportion of children are parasitaemic, plasma PfHRP2 levels taken on admission to hospital can differentiate children at highest risk of death from severe falciparum malaria from those likely to have alternative causes of severe febrile illness. Therefore, plasma PfHRP2 could be considered a valuable additional diagnostic tool and prognostic indicator in African children with severe falciparum malaria. This finding is important for clinicians treating children with severe febrile illnesses in malaria-endemic countries: while high levels of plasma PfHRP2 are indicative of severe malaria, which needs urgent antimalarial treatment, low levels suggest that another severe infective disease should be considered, warranting additional investigations and urgent treatment with antibiotics.
Additional Information
Please access these Web sites via the online version of this summary at
A previous small study in PLOS ONE explores the relationship between plasma PfHRP2 and severe malaria in Tanzanian children
The WHO website and the website of Malaria No More have comprehensive information about malaria
PMCID: PMC3424256  PMID: 22927801
12.  Polysomnography in Patients With Obstructive Sleep Apnea 
Executive Summary
The objective of this health technology policy assessment was to evaluate the clinical utility and cost-effectiveness of sleep studies in Ontario.
Clinical Need: Target Population and Condition
Sleep disorders are common and obstructive sleep apnea (OSA) is the predominant type. Obstructive sleep apnea is the repetitive complete obstruction (apnea) or partial obstruction (hypopnea) of the collapsible part of the upper airway during sleep. The syndrome is associated with excessive daytime sleepiness or chronic fatigue. Several studies have shown that OSA is associated with hypertension, stroke, and other cardiovascular disorders; many researchers believe that these cardiovascular disorders are consequences of OSA. This has generated increasing interest in recent years in sleep studies.
The Technology Being Reviewed
There is no ‘gold standard’ for the diagnosis of OSA, which makes it difficult to calibrate any test for diagnosis. Traditionally, polysomnography (PSG) in an attended setting (sleep laboratory) has been used as a reference standard for the diagnosis of OSA. Polysomnography measures several sleep variables, one of which is the apnea-hypopnea index (AHI) or respiratory disturbance index (RDI). The AHI is defined as the sum of apneas and hypopneas per hour of sleep; apnea is defined as the absence of airflow for ≥ 10 seconds; and hypopnea is defined as reduction in respiratory effort with ≥ 4% oxygen desaturation. The RDI is defined as the sum of apneas, hypopneas, and abnormal respiratory events per hour of sleep. Often the two terms are used interchangeably. The AHI has been widely used to diagnose OSA, although with different cut-off levels, the basis for which is often unclear or arbitrarily determined. Generally, an AHI of more than five events per hour of sleep is considered abnormal and the patient is considered to have a sleep disorder. An abnormal AHI accompanied by excessive daytime sleepiness is the hallmark for OSA diagnosis. For patients diagnosed with OSA, continuous positive airway pressure (CPAP) therapy is the treatment of choice. Polysomnography may also be used for titrating CPAP to individual needs.
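The AHI definition above can be sketched in a few lines of Python. This is only an illustration: the function names and the event counts are hypothetical, and the default cut-off of five events per hour is the general threshold quoted in the text, not a validated diagnostic rule.

```python
def events_per_hour(apneas: int, hypopneas: int, hours_of_sleep: float) -> float:
    """Apnea-hypopnea index: (apneas + hypopneas) per hour of sleep."""
    if hours_of_sleep <= 0:
        raise ValueError("hours_of_sleep must be positive")
    return (apneas + hypopneas) / hours_of_sleep


def is_abnormal(ahi: float, cutoff: float = 5.0) -> bool:
    """Flag an AHI above the cut-off (more than five events per hour by default)."""
    return ahi > cutoff


# Hypothetical night: 18 apneas and 24 hypopneas over 6 hours of sleep.
ahi = events_per_hour(apneas=18, hypopneas=24, hours_of_sleep=6.0)
print(f"AHI = {ahi:.1f} events/hour, abnormal: {is_abnormal(ahi)}")
```

Keeping the cut-off as a parameter reflects the point made above that different cut-off levels are in use.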
In January 2005, the College of Physicians and Surgeons of Ontario published the second edition of Independent Health Facilities: Clinical Practice Parameters and Facility Standards: Sleep Medicine, commonly known as “The Sleep Book.” The Sleep Book states that OSA is the most common primary respiratory sleep disorder and a full overnight sleep study is considered the current standard test for individuals in whom OSA is suspected (based on clinical signs and symptoms), particularly if CPAP or surgical therapy is being considered.
Polysomnography in a sleep laboratory is time-consuming and expensive. With the evolution of technology, portable devices have emerged that measure, in the home, more or less the same sleep variables as those measured in sleep laboratories. Newer CPAP devices also have auto-titration features and can record sleep variables including AHI. These devices, if equally accurate, may reduce the dependency on sleep laboratories for the diagnosis of OSA and the titration of CPAP, and thus may be more cost-effective.
Difficulties arise, however, when trying to assess and compare the diagnostic efficacy of in-home PSG versus in-lab PSG. The AHI measured from portable devices in-home is the sum of apneas and hypopneas per hour of time in bed, rather than of sleep, and the absolute diagnostic efficacy of in-lab PSG is unknown. To compare in-home PSG with in-lab PSG, several researchers have used correlation coefficients or sensitivity and specificity, while others have used Bland-Altman plots or receiver operating characteristic (ROC) curves. All these approaches, however, have potential pitfalls. Correlation coefficients do not measure agreement; sensitivity and specificity are not helpful when the true disease status is unknown; and Bland-Altman plots measure agreement, but are helpful only when the range of clinical equivalence is known. Lastly, ROC curves are generated using logistic regression with the true disease status as the dependent variable and test values as the independent variable. Thus, each value of the test is used as a cut-point to measure sensitivity and specificity, which are then plotted on an x-y plane. The cut-point that maximizes both sensitivity and specificity is chosen as the cut-off level to discriminate between disease and no-disease states. In the absence of a gold standard to determine the true disease status, ROC curves are of minimal value.
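The cut-point scan described above can be sketched as follows. The data are hypothetical, and the code assumes a reference standard for the true disease status is available, which is exactly what the text notes is missing for OSA. Youden's J (sensitivity + specificity - 1) is used here as one common way of choosing the cut-point that "maximizes both"; the source does not name a specific criterion.

```python
def sens_spec(values, diseased, cutoff):
    """Sensitivity and specificity when 'value > cutoff' is called test-positive."""
    tp = sum(1 for v, d in zip(values, diseased) if d and v > cutoff)
    fn = sum(1 for v, d in zip(values, diseased) if d and v <= cutoff)
    tn = sum(1 for v, d in zip(values, diseased) if not d and v <= cutoff)
    fp = sum(1 for v, d in zip(values, diseased) if not d and v > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(values, diseased):
    """Try every observed value as a cut-point; keep the one with the largest Youden's J."""
    best = None
    for c in sorted(set(values)):
        se, sp = sens_spec(values, diseased, c)
        j = se + sp - 1
        if best is None or j > best[1]:
            best = (c, j, se, sp)
    return best


# Hypothetical AHI values with a hypothetical reference-standard disease status.
ahi_values = [2, 4, 6, 8, 12, 20, 3, 5, 7, 15]
reference = [False, False, True, False, True, True, False, False, True, True]

cutoff, j, se, sp = best_cutoff(ahi_values, reference)
print(f"cut-off > {cutoff}: sensitivity {se:.2f}, specificity {sp:.2f}, J {j:.2f}")
```

Each (1 - specificity, sensitivity) pair produced by the scan is one point on the ROC curve; without a trustworthy `reference` list the whole calculation has nothing to stand on.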
At the request of the Ontario Health Technology Advisory Committee (OHTAC), MAS has thus reviewed the literature on PSG published over the last two years to examine new developments.
Review Strategy
There is a large body of literature on sleep studies and several reviews have been conducted. Two large cohort studies, the Sleep Heart Health Study and the Wisconsin Sleep Cohort Study, are the main sources of evidence on sleep literature.
To examine new developments on PSG published in the past two years, MEDLINE, EMBASE, MEDLINE In-Process & Other Non-Indexed Citations, the Cochrane Database of Systematic Reviews and Cochrane CENTRAL, INAHTA, and websites of other health technology assessment agencies were searched. Any study that reported results of in-home or in-lab PSG was included. All articles that reported findings from the Sleep Heart Health Study and the Wisconsin Sleep Cohort Study were also reviewed.
Diffusion of Sleep Laboratories
To estimate the diffusion of sleep laboratories, a list of sleep laboratories licensed under the Independent Health Facility Act was obtained. The annual number of sleep studies per 100,000 individuals in Ontario from 2000 to 2004 was also estimated using administrative databases.
Summary of Findings
Literature Review
A total of 315 articles were identified that were published in the past two years; 227 were excluded after reviewing titles and abstracts. A total of 59 articles were identified that reported findings of the Sleep Heart Health Study and the Wisconsin Sleep Cohort Study.
Based on cross-sectional data from the Wisconsin Sleep Cohort Study of 602 men and women aged 30 to 60 years, it is estimated that the prevalence of sleep-disordered breathing is 9% in women and 24% in men, on the basis of more than five AHI events per hour of sleep. Among the women with sleep-disordered breathing, 22.6% had daytime sleepiness and among the men, 15.5% had daytime sleepiness. Based on this, the prevalence of OSA in the middle-aged adult population is estimated to be 2% in women and 4% in men.
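The arithmetic behind these OSA estimates can be reproduced as a quick check. The sketch assumes, in line with the figures quoted, that OSA prevalence is the prevalence of sleep-disordered breathing multiplied by the fraction of those individuals who report daytime sleepiness; the variable names are illustrative.

```python
# Figures quoted from the Wisconsin Sleep Cohort Study.
sdb_prevalence = {"women": 0.09, "men": 0.24}     # AHI > 5 events per hour
sleepy_fraction = {"women": 0.226, "men": 0.155}  # daytime sleepiness among those with SDB

# Assumed relationship: OSA = sleep-disordered breathing AND daytime sleepiness.
osa_prevalence = {sex: sdb_prevalence[sex] * sleepy_fraction[sex] for sex in sdb_prevalence}

for sex, p in osa_prevalence.items():
    print(f"{sex}: {p:.1%}")
```

The products are about 2.0% for women and 3.7% for men, which round to the 2% and 4% reported.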
Snoring is present in 94% of OSA patients, but not all snorers have OSA. Women report daytime sleepiness less often compared with their male counterparts (of similar age, body mass index [BMI], and AHI). Prevalence of OSA tends to be higher in older age groups compared with younger age groups.
Diagnostic Value of Polysomnography
It is believed that PSG in the sleep laboratory is more accurate than in-home PSG. In the absence of a gold standard, however, claims of accuracy cannot be substantiated. In general, there is poor correlation between PSG variables and clinical variables. A variety of cut-off points of AHI (> 5, > 10, and > 15) are arbitrarily used to diagnose and categorize severity of OSA, though the clinical importance of these cut-off points has not been determined.
Recently, a study of the use of a therapeutic trial of CPAP to diagnose OSA was reported. The authors studied habitual snorers with daytime sleepiness in the absence of other medical or psychiatric disorders. Using PSG as the reference standard, the authors calculated the sensitivity of this test to be 80% and its specificity to be 97%. Further, they concluded that PSG could be avoided in 46% of this population.
Obstructive Sleep Apnea and Obesity
Obstructive sleep apnea is strongly associated with obesity. Obese individuals (BMI >30 kg/m2) are at higher risk for OSA compared with non-obese individuals and up to 75% of OSA patients are obese. It is hypothesized that obese individuals have large deposits of fat in the neck that cause the upper airway to collapse in the supine position during sleep. The observations reported from several studies support the hypothesis that AHIs (or RDIs) are significantly reduced with weight loss in obese individuals.
Obstructive Sleep Apnea and Cardiovascular Diseases
Associations have been shown between OSA and comorbidities such as diabetes mellitus and hypertension, which are known risk factors for myocardial infarction and stroke. Patients with more severe forms of OSA (based on AHI) report poorer quality of life and increased health care utilization compared with patients with milder forms of OSA. From animal models, it is hypothesized that sleep fragmentation results in glucose intolerance and hypertension. There is, however, no evidence from prospective studies in humans to establish a causal link between OSA and hypertension or diabetes mellitus. It is also not clear that the associations between OSA and other diseases are independent of obesity; in most of these studies, patients with higher values of AHI had higher values of BMI compared with patients with lower AHI values.
A recent meta-analysis of bariatric surgery has shown that weight loss in obese individuals (mean BMI = 46.8 kg/m2; range = 32.30–68.80) significantly improved their health profile. Diabetes was resolved in 76.8% of patients, hypertension was resolved in 61.7% of patients, hyperlipidemia improved in 70% of patients, and OSA resolved in 85.7% of patients. This suggests that obesity leads to OSA, diabetes, and hypertension, rather than OSA independently causing diabetes and hypertension.
Health Technology Assessments, Guidelines, and Recommendations
In April 2005, the Centers for Medicare and Medicaid Services (CMS) in the United States published its decision and review regarding in-home and in-lab sleep studies for the diagnosis and treatment of OSA with CPAP. In order to cover CPAP, CMS requires that a diagnosis of OSA be established using PSG in a sleep laboratory. After reviewing the literature, CMS concluded that the evidence was not adequate to determine that unattended portable sleep study was reasonable and necessary in the diagnosis of OSA.
In May 2005, the Canadian Coordinating Office of Health Technology Assessment (CCOHTA) published a review of guidelines for referral of patients to sleep laboratories. The review included 37 guidelines and associated reviews that covered 18 applications of sleep laboratory studies. The CCOHTA reported that the level of evidence for many applications was of limited quality, that some cited studies were not relevant to the recommendations made, that many recommendations reflect consensus positions only, and that there was a need for more good quality studies of many sleep laboratory applications.
As of the time of writing, there are 97 licensed sleep laboratories in Ontario. In 2000, the number of sleep studies performed in Ontario was 376/100,000 people. There was a steady rise in sleep studies in the following years such that in 2004, 769 sleep studies per 100,000 people were performed, for a total of 96,134 sleep studies. Based on prevalence estimates of the Wisconsin Sleep Cohort Study, it was estimated that 927,105 people aged 30 to 60 years have sleep-disordered breathing. Thus, there may be a 10-fold rise in the rate of sleep tests in the next few years.
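As a rough check on the projected 10-fold rise, the quoted figures can be compared directly. This sketch assumes the projection rests on one sleep study for each person with sleep-disordered breathing; that assumption is a reading of the text, not something the report states.

```python
studies_2004 = 96_134          # sleep studies performed in Ontario in 2004
rate_per_100k_2004 = 769       # studies per 100,000 people in 2004
people_with_sdb = 927_105      # estimated people aged 30 to 60 with sleep-disordered breathing

# Population implied by the 2004 count and rate.
implied_population = studies_2004 / rate_per_100k_2004 * 100_000

# Assumption: every person with sleep-disordered breathing receives one sleep study.
fold_increase = people_with_sdb / studies_2004

print(f"implied population: {implied_population:,.0f}")
print(f"potential increase: {fold_increase:.1f}-fold")
```

The ratio comes out at roughly 9.6, consistent with the "10-fold" figure.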
Economic Analysis
In 2004, approximately 96,000 sleep studies were conducted in Ontario at a total cost of ~$47 million (Cdn). Since obesity is associated with sleep-disordered breathing, MAS compared the costs of sleep studies to the cost of bariatric surgery. The cost of bariatric surgery is $17,350 per patient. In 2004, Ontario spent $4.7 million for 270 patients to undergo bariatric surgery in the province, and $8.2 million for 225 patients to seek out-of-country treatment. Using a Markov model, it was concluded that shifting costs from sleep studies to bariatric surgery would benefit more patients with OSA and may also prevent health consequences related to diabetes, hypertension, and hyperlipidemia. It is estimated that the annual cost of treating comorbid conditions in morbidly obese patients often exceeds $10,000 per patient. Thus, the downstream cost savings could be substantial.
Considerations for Policy Development
Weight loss is associated with a decrease in OSA severity. Treating and preventing obesity would also substantially reduce the economic burden associated with diabetes, hypertension, hyperlipidemia, and OSA. Promotion of healthy weights may be achieved by a multisectorial approach as recommended by the Chief Medical Officer of Health for Ontario. Bariatric surgery has the potential to help morbidly obese individuals (BMI > 35 kg/m2 with an accompanying comorbid condition, or BMI > 40 kg/m2) lose weight. In January 2005, MAS completed an assessment of bariatric surgery, based on which OHTAC recommended an improvement in access to these surgeries for morbidly obese patients in Ontario.
Habitual snorers with excessive daytime sleepiness have a high pretest probability of having OSA. These patients could be offered a therapeutic trial of CPAP to diagnose OSA, rather than a PSG. A majority of these patients are also obese and may benefit from weight loss. Individualized weight loss programs should, therefore, be offered and patients who are morbidly obese should be offered bariatric surgery.
That said, and in view of the still evolving understanding of the causes, consequences and optimal treatment of OSA, further research is warranted to identify which patients should be screened for OSA.
PMCID: PMC3379160  PMID: 23074483
13.  A systematic approach to statistical analysis in dosimetry and patient-specific IMRT plan verification measurements 
In the presence of random uncertainties, the radiation treatment doses delivered to a patient likely follow a statistical distribution. The expected dose and variance of this distribution are unknown and are most likely not equal to the planned values, since current treatment planning systems cannot exactly model and simulate the treatment machine. Relevant clinical questions are 1) how to quantitatively estimate the expected delivered dose and extrapolate the expected dose to the treatment dose over a treatment course and 2) how to evaluate the treatment dose relative to the corresponding planned dose. This study presents a systematic approach to address these questions and applies this approach to patient-specific IMRT (PSIMRT) plan verifications.
The expected delivered dose in the patient and its variance are quantitatively estimated using the Student's t-distribution and the chi distribution, respectively, based on pre-treatment QA measurements. Relationships between the expected dose and the delivered dose over a treatment course and between the expected dose and the planned dose are quantified with mathematical formalisms. The requirement and evaluation of the pre-treatment QA measurement results are also quantitatively related to the desired treatment accuracy and to the to-be-delivered treatment course itself. The developed methodology was applied to PSIMRT plan verification procedures for both QA result evaluation and treatment quality estimation.
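A minimal sketch of this kind of estimate, under stated assumptions: it uses the textbook Student's t interval for the mean and the textbook chi-squared interval for the variance of normally distributed measurements, which may differ from the authors' own formalism. The measurement values are hypothetical, and the critical values are standard table entries for 4 degrees of freedom, passed in as constants so the example needs only the standard library.

```python
import math
import statistics


def mean_ci(samples, t_crit):
    """Two-sided confidence interval for the expected value (Student's t)."""
    n = len(samples)
    m = statistics.mean(samples)
    s = statistics.stdev(samples)          # sample standard deviation (n - 1 denominator)
    half_width = t_crit * s / math.sqrt(n)
    return m - half_width, m + half_width


def variance_ci(samples, chi2_lower, chi2_upper):
    """Two-sided confidence interval for the variance (chi-squared, n - 1 degrees of freedom)."""
    n = len(samples)
    s2 = statistics.variance(samples)
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower


# Hypothetical QA measurements: percent deviation of measured dose from planned dose.
deviations = [0.8, -0.4, 1.2, 0.3, 0.6]

# Tabulated 95% critical values for 4 degrees of freedom.
T_975_DF4 = 2.776
CHI2_025_DF4 = 0.484
CHI2_975_DF4 = 11.143

mean_low, mean_high = mean_ci(deviations, T_975_DF4)
var_low, var_high = variance_ci(deviations, CHI2_025_DF4, CHI2_975_DF4)
print(f"expected deviation: {mean_low:.2f}% to {mean_high:.2f}%")
print(f"variance: {var_low:.2f} to {var_high:.2f} (%^2)")
```

With only five measurements both intervals are wide, which illustrates why a single QA measurement says little about the expected dose or its spread.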
Statistically, the pre-treatment QA measurement process was dictated not only by the corresponding plan but also by the delivered dose deviation, number of measurements, treatment fractionation, potential uncertainties during patient treatment, and desired treatment accuracy tolerance. For the PSIMRT QA procedures, in theory, more than one measurement had to be performed to evaluate whether the to-be-delivered treatment course would meet the desired dose coverage and treatment tolerance.
By acknowledging and considering the statistical nature of multi-fractional delivery of radiation treatment, we have established a quantitative methodology to evaluate the PSIMRT QA results. Both the statistical parameters associated with the QA measurement procedure and treatment course need to be taken into account to evaluate the QA outcome and to determine whether the plan is acceptable and whether additional measures should be taken to reduce treatment uncertainties. The result from a single QA measurement without the appropriate statistical analysis can be misleading. When the required number of measurements is comparable to the planned number of fractions and the variance is unacceptably high, action must be taken to either modify the plan or adjust the beam delivery system.
PMCID: PMC3852372  PMID: 24074185
Dose measurement; IMRT QA; Uncertainty; Statistical analysis
14.  A Method for the Evaluation of Thousands of Automated 3D Stem Cell Segmentations 
Journal of microscopy  2015;260(3):363-376.
There is no segmentation method that performs perfectly with any data set in comparison to human segmentation. Evaluation procedures for segmentation algorithms therefore become critical for their selection. The problems associated with segmentation performance evaluations and visual verification of segmentation results are exacerbated when dealing with thousands of 3D image volumes because of the amount of computation and manual inputs needed.
We address the problem of evaluating 3D segmentation performance when segmentation is applied to thousands of confocal microscopy images (z-stacks). Our approach is to incorporate experimental imaging and geometrical criteria, and map them into computationally efficient segmentation algorithms that can be applied to a very large number of z-stacks. This is an alternative approach to considering existing segmentation methods and evaluating most state-of-the-art algorithms. We designed a methodology for 3D segmentation performance characterization that consists of design, evaluation and verification steps. The characterization integrates manual inputs from projected surrogate “ground truth” of statistically representative samples and from visual inspection into the evaluation. The novelty of the methodology lies in (1) designing candidate segmentation algorithms by mapping imaging and geometrical criteria into algorithmic steps, and constructing plausible segmentation algorithms with respect to the order of algorithmic steps and their parameters, (2) evaluating segmentation accuracy using samples drawn from probability distribution estimates of candidate segmentations, and (3) minimizing human labor needed to create surrogate “truth” by approximating z-stack segmentations with 2D contours from three orthogonal z-stack projections and by developing visual verification tools.
We demonstrate the methodology by applying it to a dataset of 1253 mesenchymal stem cells. The cells reside on 10 different types of biomaterial scaffolds, and are stained for actin and nucleus yielding 128 460 image frames (on average 125 cells/scaffold × 10 scaffold types × 2 stains × 51 frames/cell). After constructing and evaluating six candidate 3D segmentation algorithms, the most accurate 3D segmentation algorithm achieved an average precision of 0.82 and an accuracy of 0.84 as measured by the Dice similarity index where values greater than 0.7 indicate a good spatial overlap. The probability of segmentation success was 0.85 based on visual verification, and the computation time to process all z-stacks was 42.3 h. While the most accurate segmentation technique was 4.2 times slower than the second most accurate algorithm, it consumed on average 9.65 times less memory per z-stack segmentation.
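For readers unfamiliar with the Dice similarity index, a minimal sketch follows. It represents each segmentation as a set of foreground voxel coordinates; the two tiny masks are hypothetical and are not taken from the paper's data.

```python
def dice(a: set, b: set) -> float:
    """Dice similarity index: 2 * |A intersect B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical foreground voxels as (z, y, x) coordinates.
automated = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0)}
manual = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1)}

score = dice(automated, manual)
print(f"Dice = {score:.2f}, good overlap: {score > 0.7}")
```

Here four of the five voxels in each mask coincide, giving a Dice index of 0.8, above the 0.7 level the abstract describes as good spatial overlap.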
PMCID: PMC4888372  PMID: 26268699
3D segmentation; segmentation evaluation; sampling; visual verification; confocal imaging; stem cells
15.  Neuroimaging for the Evaluation of Chronic Headaches 
Executive Summary
The objectives of this evidence based review are:
i) To determine the effectiveness of computed tomography (CT) and magnetic resonance imaging (MRI) scans in the evaluation of persons with a chronic headache and a normal neurological examination.
ii) To determine the comparative effectiveness of CT and MRI scans for detecting significant intracranial abnormalities in persons with chronic headache and a normal neurological exam.
iii) To determine the budget impact of CT and MRI scans for persons with a chronic headache and a normal neurological exam.
Clinical Need: Condition and Target Population
Headache disorders are generally classified as either primary or secondary with further sub-classifications into specific headache types. Primary headaches are those not caused by a disease or medical condition and include i) tension-type headache, ii) migraine, iii) cluster headache and, iv) other primary headaches, such as hemicrania continua and new daily persistent headache. Secondary headaches include those headaches caused by an underlying medical condition. While primary headache disorders are far more frequent than secondary headache disorders, there is an urge to carry out neuroimaging studies (CT and/or MRI scans) out of fear of missing uncommon secondary causes and often to relieve patient anxiety.
Tension type headaches are the most common primary headache disorder and migraines are the most common severe primary headache disorder. Cluster headaches are a type of trigeminal autonomic cephalalgia and are less common than migraines and tension type headaches. Chronic headaches are defined as headaches present for at least 3 months and lasting greater than or equal to 15 days per month. The International Classification of Headache Disorders states that for most secondary headaches the characteristics of the headache are poorly described in the literature and for those headache disorders where it is well described there are few diagnostically important features.
The global prevalence of headache in general in the adult population is estimated at 46%, for tension-type headache it is 42% and 11% for migraine headache. The estimated prevalence of cluster headaches is 0.1% or 1 in 1000 persons. The prevalence of chronic daily headache is estimated at 3%.
Computed Tomography
Computed tomography (CT) is a medical imaging technique used to aid diagnosis and to guide interventional and therapeutic procedures. It allows rapid acquisition of high-resolution three-dimensional images, providing radiologists and other physicians with cross-sectional views of a person’s anatomy. CT scanning poses a risk of radiation exposure. A conventional CT scanner may deliver an effective dose of 2-4 mSv for a typical head CT.
Magnetic Resonance Imaging
Magnetic resonance imaging (MRI) is a medical imaging technique used to aid diagnosis but unlike CT it does not use ionizing radiation. Instead, it uses a strong magnetic field to image a person’s anatomy. Compared to CT, MRI can provide increased contrast between the soft tissues of the body. Because of the persistent magnetic field, extra care is required in the magnetic resonance environment to ensure that injury or harm does not come to any personnel while in the environment.
Research Questions
What is the effectiveness of CT and MRI scanning in the evaluation of persons with a chronic headache and a normal neurological examination?
What is the comparative effectiveness of CT and MRI scanning for detecting significant intracranial abnormality in persons with chronic headache and a normal neurological exam?
What is the budget impact of CT and MRI scans for persons with a chronic headache and a normal neurological exam?
Research Methods
Literature Search
Search Strategy
A literature search was performed on February 18, 2010 using OVID MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations, EMBASE, the Cumulative Index to Nursing & Allied Health Literature (CINAHL), the Cochrane Library, and the International Agency for Health Technology Assessment (INAHTA) for studies published from January, 2005 to February, 2010. Abstracts were reviewed by a single reviewer and, for those studies meeting the eligibility criteria, full-text articles were obtained. Reference lists were also examined for any additional relevant studies not identified through the search. Articles with an unknown eligibility were reviewed with a second clinical epidemiologist and then a group of epidemiologists until consensus was established.
Inclusion Criteria
Systematic reviews, randomized controlled trials, observational studies
Outpatient adult population with chronic headache and normal neurological exam
Studies reporting likelihood ratio of clinical variables for a significant intracranial abnormality
English language studies
Exclusion Criteria
Studies which report outcomes for persons with seizures, focal symptoms, recent/new onset headache, change in presentation, thunderclap headache, and headache due to trauma
Persons with abnormal neurological examination
Case reports
Outcomes of Interest
Primary Outcome
Probability for intracranial abnormality
Secondary Outcome
Patient relief from anxiety
System service use
System costs
Detection rates for significant abnormalities in MRI and CT scans
Summary of Findings
One systematic review, 1 small RCT, and 1 observational study met the inclusion and exclusion criteria. The systematic review completed by Detsky et al. reported the likelihood ratios of specific clinical variables to predict significant intracranial abnormalities. The RCT completed by Howard et al. evaluated whether neuroimaging persons with chronic headache increased or reduced patient anxiety. The prospective observational study by Sempere et al. provided evidence for the pre-test probability of intracranial abnormalities in persons with chronic headache as well as minimal data on the comparative effectiveness of CT and MRI to detect intracranial abnormalities.
Outcome 1: Pre-test Probability.
The pre-test probability is usually related to the prevalence of the disease and can be adjusted depending on the characteristics of the population. The study by Sempere et al. determined the pre-test probability (prevalence) of significant intracranial abnormalities in persons with chronic headaches, defined as headache experienced for at least a 4 week duration, with a normal neurological exam. There is a pre-test probability of 0.9% (95% CI 0.5, 1.4) in persons with chronic headache and normal neurological exam. The highest pre-test probability, 5%, was found in persons with cluster headaches. The second highest, 3.7%, was reported in persons with indeterminate type headache. There was a 0.75% rate of incidental findings.
Likelihood ratios for detecting a significant abnormality
Clinical findings from the history and physical examination may be used as a screening test to predict abnormalities on neuroimaging. How well a clinical variable predicts an abnormality can be captured by its likelihood ratio, which estimates how much a test result changes the odds of having a disease or condition. The positive likelihood ratio (LR+) indicates how much the odds of having the disease increase when a test is positive. The negative likelihood ratio (LR-) indicates how much the odds of having the disease decrease when the test is negative.
Detsky et al. determined the likelihood ratios for specific clinical variables from 11 studies. There were 4 clinical variables with both statistically significant positive and negative likelihood ratios. These included: abnormal neurological exam (LR+ 5.3, LR- 0.72), undefined headache (LR+ 3.8, LR- 0.66), headache aggravated by exertion or Valsalva (LR+ 2.3, LR- 0.70), and headache with vomiting (LR+ 1.8, LR- 0.47). There were 2 clinical variables with a statistically significant positive likelihood ratio and a non-significant negative likelihood ratio. These included: cluster-type headache (LR+ 11, LR- 0.95) and headache with aura (LR+ 12.9, LR- 0.52). Finally, there were 8 clinical variables with both statistically non-significant positive and negative likelihood ratios. These included: headache with focal symptoms, new onset headache, quick onset headache, worsening headache, male gender, headache with nausea, increased headache severity, and migraine-type headache.
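A likelihood ratio is applied by converting the pre-test probability to odds, multiplying by the LR, and converting back to a probability. The sketch below illustrates that arithmetic only. The function name is hypothetical, and pairing the 0.9% pre-test probability from Sempere et al. with likelihood ratios pooled by Detsky et al. is an assumption made for illustration, since the two estimates come from different study populations.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre_test_prob must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)   # probability -> odds
    post_odds = pre_odds * likelihood_ratio            # apply the LR
    return post_odds / (1.0 + post_odds)               # odds -> probability


# Illustrative inputs: 0.9% baseline, LR values quoted for two clinical variables.
baseline = 0.009
examples = {
    "aggravated by exertion or Valsalva, present (LR+ 2.3)": 2.3,
    "aggravated by exertion or Valsalva, absent (LR- 0.70)": 0.70,
    "headache with vomiting, present (LR+ 1.8)": 1.8,
    "headache with vomiting, absent (LR- 0.47)": 0.47,
}
for label, lr in examples.items():
    print(f"{label}: {post_test_probability(baseline, lr):.2%}")
```

With a baseline this low, even the larger positive likelihood ratios leave the post-test probability in the low single digits of percent, which is why the pre-test probability matters as much as the LR itself.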
Outcome 2: Relief from Anxiety
Howard et al. completed an RCT of 150 persons to determine whether neuroimaging for headaches was anxiolytic or anxiogenic. Persons were randomized to receive either an MRI scan or no scan for investigation of their headache. The study population was stratified into those with a Hospital Anxiety and Depression Scale (HADS) score > 11 (the high anxiety and depression group) and those with a score < 11 (the low anxiety and depression group), so that there were 4 groups:
Group 1: High anxiety and depression, no scan group
Group 2: High anxiety and depression, scan group
Group 3: Low anxiety and depression, no scan group
Group 4: Low anxiety and depression, scan group
There was no evidence for any overall reduction in anxiety at 1 year as measured by a visual analogue scale of ‘level of worry’ when analysed by whether the person received a scan or not. Similarly, there was no interaction between anxiety and depression status and whether a scan was offered or not on patient anxiety. Anxiety did not decrease at 1 year to any statistically significant degree in the high anxiety and depression group (HADS positive) compared with the low anxiety and depression group (HADS negative).
There are serious methodological limitations in this study design that may have contributed to these negative results. First, in the comparison of ‘scan’ vs. ‘no scan’ groups, 12 people (16%) in the ‘no scan’ group actually received a scan within the follow-up year. If scanning does reduce anxiety, this contamination of the ‘no scan’ group may have diluted the difference between the groups, resulting in a non-significant difference in anxiety scores between the ‘scan’ and ‘no scan’ groups. Second, the sample size at 1-year follow-up was inadequate in each of the 4 groups, which may have contributed to a Type II statistical error (missing a difference when one exists) when comparing scan vs. no scan by anxiety and depression status. Therefore, based on the results and study limitations, it is inconclusive whether scanning reduces anxiety.
Outcome 3: System Services
Howard et al. considered services used and system costs as secondary outcomes. These were determined by examining primary care case notes at 1 year for consultation rates, symptoms, further investigations, and contact with secondary and tertiary care.
System Services
The authors report that the use of neurologist and psychiatrist services was significantly higher for those persons not offered a scan, regardless of their anxiety and depression status (P<0.001 for neurologist, and P=0.033 for psychiatrist).
Outcome 4: System Costs
System Costs
There was evidence of statistically significantly lower system costs if persons with high levels of anxiety and depression (Hospital Anxiety and Depression Scale score >11) were provided with a scan (P=0.03 including inpatient costs, and 0.047 excluding inpatient costs).
Comparative Effectiveness of CT and MRI Scans
One study reported the detection rate for significant intracranial abnormalities using CT and MRI. In a cohort of 1876 persons with a non-acute headache, defined as any type of headache that had begun at least 4 weeks before enrolment, Sempere et al. reported that the detection rate was 19/1432 (1.3%) using CT and 4/444 (0.9%) using MRI. Of 119 persons with a normal CT scan, 2 (1.7%) had a significant intracranial abnormality on MRI. The 2 cases were a small meningioma and an acoustic neurinoma.
The evidence presented can be summarized as follows:
Pre-test Probability
Based on the results of Sempere et al., there is a low pre-test probability of intracranial abnormalities in persons with chronic headaches (defined as headaches experienced for a minimum of 4 weeks) and a normal neurological exam. The Grade quality of evidence supporting this outcome is very low.
Likelihood Ratios
Based on the systematic review by Detsky et al., there is a statistically significant positive and negative likelihood ratio for the following clinical variables: abnormal neurological exam, undefined headache, headache aggravated by exertion or valsalva, headache with vomiting. Grade quality of evidence supporting this outcome is very low.
Based on the systematic review by Detsky et al., there is a statistically significant positive likelihood ratio but a statistically non-significant negative likelihood ratio for the following clinical variables: cluster headache and headache with aura. The Grade quality of evidence supporting this outcome is very low.
Based on the systematic review by Detsky et al., there is a non significant positive and negative likelihood ratio for the following clinical variables: headache with focal symptoms, new onset headache, quick onset headache, worsening headache, male gender, headache with nausea, increased headache severity, migraine type headache. The Grade quality of evidence supporting this outcome is very low.
Relief from Anxiety
Based on the RCT by Howard et al., it is inconclusive whether neuroimaging scans in persons with a chronic headache are anxiolytic. The Grade quality of evidence supporting this outcome is low.
System Services
Based on the RCT by Howard et al., scanning persons with chronic headache, regardless of their anxiety and/or depression level, reduces service use. The Grade quality of evidence is low.
System Costs
Based on the RCT by Howard et al., scanning persons with a score greater than 11 on the Hospital Anxiety and Depression Scale reduces system costs. The Grade quality of evidence is moderate.
Comparative Effectiveness of CT and MRI Scans
There is sparse evidence to determine the relative effectiveness of CT compared with MRI scanning for the detection of intracranial abnormalities. The Grade quality of evidence supporting this is very low.
Economic Analysis
Ontario Perspective
Volumes for neuroimaging of the head i.e. CT and MRI scans, from the Ontario Health Insurance Plan (OHIP) data set were used to investigate trends in the province for Fiscal Years (FY) 2004-2009.
Assumptions were made in order to investigate neuroimaging of the head for the indication of headache. From the literature, 27% of all CT and 13% of all MRI scans of the head were assumed to include an indication of headache. From that same retrospective chart review and personal communication with the author, 16% of CT scans and 4% of MRI scans of the head were for the sole indication of headache. From the Ministry of Health and Long-Term Care (MOHLTC) wait times data, 73% of all CT and 93% of all MRI scans in the province, irrespective of indication, were outpatient procedures.
The expenditure for each FY reflects the volume for that year. Because volumes have increased over the past 6 FYs, expenditure has also increased. In FY 08/09, the pay-out reached 3.0M and 2.8M for CT and MRI services of the head, respectively, where headache was among the indications, and 1.8M and 0.9M for CT and MRI services of the head, respectively, where headache was the only indication.
Cost per Abnormal Finding
The yield of abnormal findings for CT and MRI scans of the head for the indication of headache only is 2% and 5%, respectively. Based on these yields, a high-level estimate of the cost per abnormal finding with neuroimaging of the head for headache only can be calculated for each FY. In FY 08/09 there were 37,434 CT and 16,197 MRI scans of the head for headache only. These volumes would generate 749 and 910 abnormal findings with CT and MRI, respectively. The expenditure for FY 08/09 was 1.8M and 0.9M for CT and MRI services, respectively. Therefore the cost per abnormal finding would be $2,409 for CT and $957 for MRI. These cost-per-abnormal-finding estimates are limited because they do not factor in comparators or the consequences associated with an abnormal reading or false negatives (FNs). The estimates consider only the cost of the neuroimaging procedure and the yield of abnormal findings with the respective procedure.
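The cost per abnormal finding is simply expenditure divided by the expected number of abnormal findings (scan volume multiplied by yield). The sketch below works through the CT figures quoted above. The function name is hypothetical, and because the quoted expenditure is rounded to 1.8M, the result is expected to land near the reported estimate rather than reproduce it exactly.

```python
def cost_per_abnormal_finding(expenditure: float, n_scans: int, yield_rate: float) -> float:
    """Expenditure divided by the expected number of abnormal findings."""
    expected_findings = n_scans * yield_rate
    if expected_findings <= 0:
        raise ValueError("expected number of abnormal findings must be positive")
    return expenditure / expected_findings


# CT scans of the head for headache only, FY 08/09, using the rounded
# figures quoted in the text (1.8M expenditure, 37,434 scans, 2% yield).
ct_scans = 37_434
ct_yield = 0.02
ct_expenditure = 1_800_000  # rounded input, so the output is approximate

ct_findings = ct_scans * ct_yield
ct_cost = cost_per_abnormal_finding(ct_expenditure, ct_scans, ct_yield)
print(f"Expected abnormal CT findings: {ct_findings:.0f}")
print(f"Approximate cost per abnormal CT finding: ${ct_cost:,.0f}")
```

The same function can be applied to the MRI volume, yield, and expenditure; as the text notes, such an estimate ignores comparators and downstream consequences.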
PMCID: PMC3377587  PMID: 23074404
16.  Disease Models for Event Prediction 
The objective of this manuscript is to present a systematic review of biosurveillance models that operate on select agents and can forecast the occurrence of a disease event.
One of the primary goals of this research was to characterize the viability of biosurveillance models to provide operationally relevant information to decision makers, in order to identify areas for future research. Two critical characteristics differentiate this work from other infectious disease modeling reviews [1,2]. First, we reviewed models that attempted to predict the disease event, not merely its transmission dynamics. Second, we considered models involving pathogens of concern as determined by the US National Select Agent Registry.
Background: A rich and diverse field of infectious disease modeling has emerged over the past 60 years and has advanced our understanding of population- and individual-level disease transmission dynamics, including risk factors, virulence and spatio-temporal patterns of disease spread. Recent modeling advances include biostatistical methods, and massive agent-based population, biophysical, ordinary differential equation, and ecological-niche models. Diverse data sources are being integrated into these models as well, such as demographics, remotely-sensed measurements and imaging, environmental measurements, and surrogate data such as news alerts and social media. Yet, there remains a gap in the sensitivity and specificity of these models not only in tracking infectious disease events but also predicting their occurrence.
We searched dozens of commercial and government databases and harvested Google search results for eligible models, utilizing terms and phrases provided by public health analysts relating to biosurveillance, remote sensing, risk assessments, spatial epidemiology, and ecological niche modeling. This returned 13,767 webpages and 12,152 citations. After de-duplication and removal of extraneous material, a core collection of 6,503 items was established; these publications and their abstracts are presented in a semantic wiki. Next, PNNL’s IN-SPIRE visual analytics software was used to cross-correlate these publications with the definition for a biosurveillance model. As a result, we systematically reviewed 44 papers, and the results are presented in this analysis.
The models were classified as one or more of the following types: event forecast (9%), spatial (59%), ecological niche (64%), diagnostic or clinical (14%), spread or response (20%), and reviews (7%). The distribution of transmission modes in the models was: direct contact (55%), vector-borne (34%), water- or soil-borne (16%), and non-specific (7%). The parameters (e.g., etiology, cultural) and data sources (e.g., remote sensing, NGO, epidemiological) for each model were recorded. A highlight of this review is the analysis of verification and validation procedures employed by (and reported for) each model, if any. All models were classified as either a) Verified or Validated (89%), or b) Not Verified or Validated (11%; which for the purposes of this review was considered a standalone category).
The verification and validation (V&V) of these models is discussed in detail. The vast majority of models studied were verified or validated in some form or another, which was a surprising observation made from this portion of the study. We subsequently focused on those models which were not verified or validated in an attempt to identify why this information was missing. One reason may be that the V&V was simply not reported upon within the paper reviewed for those models. A positive observation was the significant use of real epidemiological data to validate the models. Even though ‘Validation using Spatially and Temporally Independent Data’ was one of the smallest classification groups, validation through the use of actual data versus predicted data represented approximately 33% of these models. We close with initial recommended operational readiness level guidelines, based on established Technology Readiness Level definitions.
PMCID: PMC3692832
Disease models; Event prediction; Operational readiness
17.  Personal Exposure to Mixtures of Volatile Organic Compounds: Modeling and Further Analysis of the RIOPA Data 
Emission sources of volatile organic compounds (VOCs) are numerous and widespread in both indoor and outdoor environments. Concentrations of VOCs indoors typically exceed outdoor levels, and most people spend nearly 90% of their time indoors. Thus, indoor sources generally contribute the majority of VOC exposures for most people. VOC exposure has been associated with a wide range of acute and chronic health effects; for example, asthma, respiratory diseases, liver and kidney dysfunction, neurologic impairment, and cancer. Although exposures to most VOCs for most persons fall below health-based guidelines, and long-term trends show decreases in ambient emissions and concentrations, a subset of individuals experience much higher exposures that exceed guidelines. Thus, exposure to VOCs remains an important environmental health concern.
The present understanding of VOC exposures is incomplete. With the exception of a few compounds, concentration and especially exposure data are limited; and like other environmental data, VOC exposure data can show multiple modes, low and high extreme values, and sometimes a large portion of data below method detection limits (MDLs). Field data also show considerable spatial or interpersonal variability, and although evidence is limited, temporal variability seems high. These characteristics can complicate modeling and other analyses aimed at risk assessment, policy actions, and exposure management. In addition to these analytic and statistical issues, exposure typically occurs as a mixture, and mixture components may interact or jointly contribute to adverse effects. However, most pollutant regulations, guidelines, and studies remain focused on single compounds, and thus may underestimate cumulative exposures and risks arising from coexposures. In addition, the composition of VOC mixtures has not been thoroughly investigated, and mixture components show varying and complex dependencies. Finally, although many factors are known to affect VOC exposures, many personal, environmental, and socioeconomic determinants remain to be identified, and the significance and applicability of the determinants reported in the literature are uncertain.
To help answer these unresolved questions and overcome limitations of previous analyses, this project used several novel and powerful statistical modeling and analysis techniques and two large data sets. The overall objectives of this project were (1) to identify and characterize exposure distributions (including extreme values), (2) to evaluate mixtures (including dependencies), and (3) to identify determinants of VOC exposure.
VOC data were drawn from two large data sets: the Relationships of Indoor, Outdoor, and Personal Air (RIOPA) study (1999–2001) and the National Health and Nutrition Examination Survey (NHANES; 1999–2000). The RIOPA study used a convenience sample to collect outdoor, indoor, and personal exposure measurements in three cities (Elizabeth, NJ; Houston, TX; Los Angeles, CA). In each city, approximately 100 households with adults and children who did not smoke were sampled twice for 18 VOCs. In addition, information about 500 variables associated with exposure was collected. The NHANES used a nationally representative sample and included personal VOC measurements for 851 participants. NHANES sampled 10 VOCs in common with RIOPA. Both studies used similar sampling methods and study periods.
Specific Aim 1
To estimate and model extreme value exposures, extreme value distribution models were fitted to the top 10% and 5% of VOC exposures. Health risks were estimated for individual VOCs and for three VOC mixtures. Simulated extreme value data sets, generated for each VOC and for fitted extreme value and lognormal distributions, were compared with measured concentrations (RIOPA observations) to evaluate each model’s goodness of fit.
Mixture distributions were fitted with the conventional finite mixture of normal distributions and the semi-parametric Dirichlet process mixture (DPM) of normal distributions for three individual VOCs (chloroform, 1,4-DCB, and styrene). Goodness of fit for these full distribution models was also evaluated using simulated data.
Specific Aim 2
Mixtures in the RIOPA VOC data set were identified using positive matrix factorization (PMF) and by toxicologic mode of action. Dependency structures of a mixture’s components were examined using mixture fractions and were modeled using copulas, which address correlations of multiple components across their entire distributions. Five candidate copulas (Gaussian, t, Gumbel, Clayton, and Frank) were evaluated, and the performance of fitted models was evaluated using simulation and mixture fractions. Cumulative cancer risks were calculated for mixtures, and results from copulas and multivariate lognormal models were compared with risks based on RIOPA observations.
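A copula models the dependence between components separately from their marginal distributions. As a minimal sketch of that idea, the code below samples from a bivariate Clayton copula (one of the five candidates named above) by conditional inversion, checks the sample against the known relationship Kendall's tau = theta / (theta + 2), and then maps the uniform pairs onto lognormal marginals. The function names, the value of theta, and the lognormal parameters are all hypothetical and are not taken from the study, which reports its own fitted copulas.

```python
import math
import random
from statistics import NormalDist


def sample_clayton(theta, n, rng):
    """Draw n pairs (u, v) from a bivariate Clayton copula (theta > 0)."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    eps = 1e-12  # keep draws strictly inside (0, 1)
    pairs = []
    for _ in range(n):
        u = rng.uniform(eps, 1.0 - eps)
        w = rng.uniform(eps, 1.0 - eps)
        # Invert the conditional distribution C(v | u) = w for v.
        inner = u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0
        pairs.append((u, inner ** (-1.0 / theta)))
    return pairs


def kendall_tau(pairs):
    """Kendall's tau by direct pair counting (fine for small samples)."""
    n = len(pairs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


def to_lognormal(u, mu, sigma):
    """Map a uniform value onto a lognormal marginal via the quantile function."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(u))


rng = random.Random(42)
theta = 2.0  # hypothetical dependence parameter
pairs = sample_clayton(theta, 500, rng)
print(f"sample tau = {kendall_tau(pairs):.2f}, theoretical tau = {theta / (theta + 2):.2f}")

# Hypothetical marginals for two mixture components.
exposures = [(to_lognormal(u, 0.0, 1.0), to_lognormal(v, 0.5, 0.8)) for u, v in pairs]
print("first simulated pair:", exposures[0])
```

The marginals can be swapped without touching the dependence structure, which is the property that makes copulas convenient for mixtures whose components have different distributions.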
Specific Aim 3
Exposure determinants were identified using stepwise regressions and linear mixed-effects models (LMMs).
Specific Aim 1
Extreme value exposures in RIOPA typically were best fitted by three-parameter generalized extreme value (GEV) distributions, and sometimes by the two-parameter Gumbel distribution. In contrast, lognormal distributions significantly underestimated both the level and likelihood of extreme values. Among the VOCs measured in RIOPA, 1,4-dichlorobenzene (1,4-DCB) was associated with the greatest cancer risks; for example, for the highest 10% of measurements of 1,4-DCB, all individuals had risk levels above 10⁻⁴, and 13% of all participants had risk levels above 10⁻².
Of the full-distribution models, the finite mixture of normal distributions with two to four clusters and the DPM of normal distributions had superior performance in comparison with the lognormal models. DPM distributions provided slightly better fit than the finite mixture distributions; the advantages of the DPM model were avoiding certain convergence issues associated with the finite mixture distributions, adaptively selecting the number of needed clusters, and providing uncertainty estimates. Although the results apply to the RIOPA data set, GEV distributions and mixture models appear more broadly applicable. These models can be used to simulate VOC distributions, which are neither normally nor lognormally distributed, and they accurately represent the highest exposures, which may have the greatest health significance.
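To make the extreme value fitting concrete, the sketch below fits the two-parameter Gumbel distribution mentioned above and uses it to estimate a tail exceedance probability. It is an illustration under stated assumptions: the data are synthetic, the function names are hypothetical, and the method of moments is used here purely for simplicity; the abstract does not state which estimation method the study used.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def sample_gumbel(loc, scale, rng):
    """Draw one value from a Gumbel (maximum) distribution by inverse CDF."""
    u = rng.uniform(1e-12, 1.0 - 1e-12)
    return loc - scale * math.log(-math.log(u))


def fit_gumbel_moments(values):
    """Method-of-moments estimates of Gumbel location and scale."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    scale = sd * math.sqrt(6.0) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale


def gumbel_exceedance(x, loc, scale):
    """P(X > x) under a Gumbel (maximum) distribution."""
    return 1.0 - math.exp(-math.exp(-(x - loc) / scale))


# Synthetic "upper tail" data standing in for the top fraction of exposures.
rng = random.Random(7)
tail_sample = [sample_gumbel(10.0, 2.0, rng) for _ in range(2000)]

loc_hat, scale_hat = fit_gumbel_moments(tail_sample)
threshold = 18.0  # hypothetical concentration of interest
empirical = sum(x > threshold for x in tail_sample) / len(tail_sample)
print(f"fitted loc = {loc_hat:.2f}, scale = {scale_hat:.2f}")
print(f"P(X > {threshold}): fitted = {gumbel_exceedance(threshold, loc_hat, scale_hat):.4f}, "
      f"empirical = {empirical:.4f}")
```

Comparing the fitted and empirical exceedance fractions is a simple version of the goodness-of-fit check described in the text, where simulated data are compared with measured concentrations.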
Specific Aim 2
Four VOC mixtures were identified and apportioned by PMF; they represented gasoline vapor, vehicle exhaust, chlorinated solvents and disinfection byproducts, and cleaning products and odorants. The last mixture (cleaning products and odorants) accounted for the largest fraction of an individual’s total exposure (average of 42% across RIOPA participants). Often, a single compound dominated a mixture but the mixture fractions were heterogeneous; that is, the fractions of the compounds changed with the concentration of the mixture.
Three VOC mixtures were identified by toxicologic mode of action and represented VOCs associated with hematopoietic, liver, and renal tumors. Estimated lifetime cumulative cancer risks exceeded 10⁻³ for about 10% of RIOPA participants. The dependency structures of the VOC mixtures in the RIOPA data set fitted Gumbel (two mixtures) and t copulas (four mixtures). These copula types emphasize dependencies found in the upper and lower tails of a distribution. The copulas reproduced both risk predictions and exposure fractions with a high degree of accuracy and performed better than multivariate lognormal distributions.
Specific Aim 3
In an analysis focused on the home environment and the outdoor (close to home) environment, home VOC concentrations dominated personal exposures (66% to 78% of the total exposure, depending on VOC); this was largely the result of the amount of time participants spent at home and the fact that indoor concentrations were much higher than outdoor concentrations for most VOCs.
In a different analysis focused on the sources inside the home and outside (but close to the home), it was assumed that 100% of VOCs from outside sources would penetrate the home. Outdoor VOC sources accounted for 5% (d-limonene) to 81% (carbon tetrachloride [CTC]) of the total exposure. Personal exposure and indoor measurements had similar determinants depending on the VOC. Gasoline-related VOCs (e.g., benzene and methyl tert-butyl ether [MTBE]) were associated with city, residences with attached garages, pumping gas, wind speed, and home air exchange rate (AER). Odorant and cleaning-related VOCs (e.g., 1,4-DCB and chloroform) also were associated with city, and a residence’s AER, size, and family members showering. Dry-cleaning and industry-related VOCs (e.g., tetrachloroethylene [or perchloroethylene, PERC] and trichloroethylene [TCE]) were associated with city, type of water supply to the home, and visits to the dry cleaner. These and other relationships were significant, explained from 10% to 40% of the variance in the measurements, and are consistent with known emission sources and those reported in the literature. Outdoor concentrations of VOCs had only two determinants in common: city and wind speed. Overall, personal exposure was dominated by the home setting, although a large fraction of indoor VOC concentrations was due to outdoor sources.
City of residence, personal activities, household characteristics, and meteorology were significant determinants.
Concentrations in RIOPA were considerably lower than levels in the nationally representative NHANES for all VOCs except MTBE and 1,4-DCB. Differences between RIOPA and NHANES results can be explained by contrasts between the sampling designs and staging in the two studies, and by differences in demographics, smoking, employment, occupations, and home locations. A portion of these differences is due to the nature of the convenience (RIOPA) and representative (NHANES) sampling strategies used in the two studies.
Accurate models for exposure data, which can feature extreme values, multiple modes, data below the MDL, heterogeneous interpollutant dependency structures, and other complex characteristics, are needed to estimate exposures and risks and to develop control and management guidelines and policies. Conventional and novel statistical methods were applied to data drawn from two large studies to understand the nature and significance of VOC exposures. Both extreme value distributions and mixture models were found to provide excellent fit to single VOC compounds (univariate distributions), and copulas may be the method of choice for VOC mixtures (multivariate distributions), especially for the highest exposures, which fit parametric models poorly and which may represent the greatest health risk. The identification of exposure determinants, including the influence of both certain activities (e.g., pumping gas) and environments (e.g., residences), provides information that can be used to manage and reduce exposures. The results obtained using the RIOPA data set add to our understanding of VOC exposures and further investigations using a more representative population and a wider suite of VOCs are suggested to extend and generalize results.
PMCID: PMC4577247  PMID: 25145040
18.  Estimates of sensitivity and specificity can be biased when reporting the results of the second test in a screening trial conducted in series 
Cancer screening reduces cancer mortality when early detection allows successful treatment of otherwise fatal disease. There are a variety of trial designs used to find the best screening test. In a series screening trial design, the decision to conduct the second test is based on the results of the first test. Thus, the estimates of diagnostic accuracy for the second test are conditional, and may differ from unconditional estimates. The problem is further complicated when some cases are misclassified as non-cases due to incomplete disease status ascertainment.
For a series design, we assume that the second screening test is conducted only if the first test had negative results. We derive formulae for the conditional sensitivity and specificity of the second test in the presence of differential verification bias. For comparison, we also derive formulae for the sensitivity and specificity for a single test design, both with and without differential verification bias.
Both the series design and differential verification bias have strong effects on estimates of sensitivity and specificity. In both the single test and series designs, differential verification bias inflates estimates of sensitivity and specificity. In general, for the series design, the inflation is smaller than that observed for a single test design.
The degree of bias depends on disease prevalence, the proportion of misclassified cases, and on the correlation between the test results for cases. As disease prevalence increases, the observed conditional sensitivity is unaffected. However, there is an increasing upward bias in observed conditional specificity. As the proportion of correctly classified cases increases, the upward bias in observed conditional sensitivity and specificity decreases. As the agreement between the two screening tests becomes stronger, the upward bias in observed conditional sensitivity decreases, while the specificity bias increases.
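The distinction between conditional and unconditional sensitivity can be illustrated with the joint distribution of the two test results among true cases. The sketch below is a simplification that ignores verification bias entirely: it only shows how the sensitivity of the second test, computed among cases who tested negative on the first test, depends on the correlation between the tests. The function name and all probabilities are hypothetical.

```python
def second_test_sensitivities(p_pos_pos, p_pos_neg, p_neg_pos, p_neg_neg):
    """Sensitivities of test 2 from joint (test 1, test 2) results among cases.

    Each argument is P(test 1 result, test 2 result | disease present).
    Returns (unconditional sensitivity, sensitivity given test 1 negative).
    """
    total = p_pos_pos + p_pos_neg + p_neg_pos + p_neg_neg
    if abs(total - 1.0) > 1e-9:
        raise ValueError("joint probabilities must sum to 1")
    p_first_negative = p_neg_pos + p_neg_neg
    if p_first_negative <= 0:
        raise ValueError("no cases test negative on the first test")
    unconditional = p_pos_pos + p_neg_pos          # P(T2+ | D+)
    conditional = p_neg_pos / p_first_negative     # P(T2+ | T1-, D+)
    return unconditional, conditional


# Independent tests, each with sensitivity 0.8 (hypothetical values).
print(second_test_sensitivities(0.64, 0.16, 0.16, 0.04))

# Same marginal sensitivities, but the tests tend to agree on cases.
print(second_test_sensitivities(0.75, 0.05, 0.05, 0.15))
```

When the tests are independent the two quantities coincide; when they agree strongly, the cases missed by the first test are largely the ones the second test also misses, so the conditional sensitivity falls well below the unconditional value.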
In a series design, estimates of sensitivity and specificity for the second test are conditional estimates. These estimates must always be described in context of the design of the trial, and the study population, to prevent misleading comparisons. In addition, these estimates may be biased by incomplete disease status ascertainment.
PMCID: PMC2819240  PMID: 20064254
Neuro-Oncology  2014;16(Suppl 2):ii42.
Biomarkers, genetic alterations and epigenetic marks are gaining importance as prognostic and predictive markers, both after primary diagnosis and in recurrent disease. However, tissue samples are needed throughout the longitudinal course of the disease in order to adjust therapy to changing conditions. A remaining question is whether an open tumor resection is superior to a small stereotactic biopsy for pathological and molecular analysis. We compared the diagnostic potential of stereotactic biopsies with that of tissue samples from gross total resections. Sixteen stereotactic 1-mm tissue samples (2 diffuse astrocytomas, 2 anaplastic astrocytomas, 9 glioblastomas, 3 other tumors) were analyzed with an approach combining histopathologic diagnosis with molecular genetic analysis adjusted for small sample size. Serial biopsies were taken throughout the tumor. Separate specimens were taken for each analysis (e.g., 1 for immunohistochemistry, 1 for genomic DNA extraction, 1 for RNA and protein extraction). This procedure is justified because the literature reports a homogeneous distribution of biomarkers throughout tumors. We report that, for every patient, microscopic analysis, various immunohistochemical stainings (e.g., GFAP, S100, Vimentin, MAP2, EGFR, IDH1, p53, and MIB-1), and biochemical and molecular analyses could be performed with material obtained from a single stereotactic biopsy. The biochemical analysis was done by Western blot using various antibodies (e.g., ALDH1, AHR, GFAP, beta-actin). Furthermore, molecular techniques were applied, including determination of the methylation status of the MGMT promoter by bisulfite conversion of the DNA followed by PCR, and verification of IDH1/2 mutation status by pyrosequencing. Genomic DNA could be extracted from all stereotactic specimens.
RNA extraction was also possible, but in 14 of the 16 samples the amount of cDNA obtained from the RNA of stereotactic samples was significantly lower than the amount normally used for quantitative RT-PCR from open operations. There was no biopsy-related morbidity. We conclude that determination of many different immunohistochemical, biochemical, and molecular parameters is possible using the stereotactic biopsy technique. Moreover, the procedure is safe and reliable. The stereotactic biopsy enables decision making for optimizing the treatment of brain tumors without the necessity of an open resection. This finding has even more clinical impact in elderly patients or in patients with eloquent tumor localizations.
PMCID: PMC4185502
20.  Low-Density Lipoprotein Apheresis 
Executive Summary
To assess the effectiveness and safety of low-density lipoprotein (LDL) apheresis performed with the heparin-induced extracorporeal LDL precipitation (HELP) system for the treatment of patients with refractory homozygous (HMZ) and heterozygous (HTZ) familial hypercholesterolemia (FH).
Background on Familial Hypercholesterolemia
Familial hypercholesterolemia is a genetic autosomal dominant disorder that is caused by several mutations in the LDL-receptor gene. The reduced number or absence of functional LDL receptors results in impaired hepatic clearance of circulating low-density lipoprotein cholesterol (LDL-C) particles, which results in extremely high levels of LDL-C in the bloodstream. Familial hypercholesterolemia is characterized by excess LDL-C deposits in tendons and arterial walls, early onset of atherosclerotic disease, and premature cardiac death.
Familial hypercholesterolemia occurs in both HTZ and HMZ forms.
Heterozygous FH is one of the most common monogenic metabolic disorders in the general population, occurring in approximately 1 in 500 individuals. Nevertheless, HTZ FH is largely undiagnosed and an accurate diagnosis occurs in only about 15% of affected patients in Canada. Thus, it is estimated that there are approximately 3,800 diagnosed and 21,680 undiagnosed cases of HTZ FH in Ontario.
In HTZ FH patients, half of the LDL receptors do not work properly or are absent, resulting in plasma LDL-C levels 2- to 3-fold higher than normal (range 7-15 mmol/L or 300-500 mg/dL). Most HTZ FH patients are not diagnosed until middle age, when either they or one of their siblings present with symptomatic coronary artery disease (CAD). Without lipid-lowering treatment, 50% of males die before the age of 50 and 25% of females die before the age of 60 from myocardial infarction or sudden death.
In contrast to the HTZ form, HMZ FH is rare (occurring in 1 case per million persons) and more severe, with a 6- to 8-fold elevation in plasma LDL-C levels (range 15-25mmol/L or 500-1000mg/dL). Homozygous FH patients are typically diagnosed in infancy, usually due to the presence of cholesterol deposits in the skin and tendons. The main complication of HMZ FH is supravalvular aortic stenosis, which is caused by cholesterol deposits on the aortic valve and in the ascending aorta. The average life expectancy of affected individuals is 23 to 25 years. In Ontario, it is estimated that there are 13 to 15 cases of HMZ FH. An Ontario clinical expert confirmed that 9 HMZ FH patients have been identified to date.
There are 2 accepted sets of clinical diagnostic criteria for FH: the Simon Broome FH Register criteria from the United Kingdom and the Dutch Lipid Network criteria from the Netherlands. Both sets of criteria supplement cholesterol levels with clinical history, physical signs and family history. DNA-based mutation-screening methods permit a definitive diagnosis of HTZ FH to be made. However, given that there are over 1000 identified mutations in the LDL receptor gene and that the detection rates of current techniques are low, genetic testing becomes problematic in countries with high genetic heterogeneity, such as Canada.
The primary aim of treatment in both HTZ and HMZ FH is to reduce plasma LDL-C levels in order to reduce the risk of developing atherosclerosis and CAD.
The first line of treatment is dietary intervention; however, it alone is rarely sufficient for the treatment of FH patients. Patients are frequently treated with lipid-lowering drugs such as resins, fibrates, niacin, statins and cholesterol absorption-inhibiting drugs (ezetimibe). Most HTZ FH patients require a combination of drugs to achieve or approach target cholesterol levels.
A small number of HTZ FH patients are refractory to treatment or intolerant to lipid-lowering medication. According to clinical experts, the prevalence of refractory HTZ FH in Ontario is between 1% and 5%. Using the midpoint of 3%, it is estimated that there are approximately 765 refractory HTZ FH patients in Ontario, of whom 115 are diagnosed and 650 are undiagnosed.
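The refractory estimates follow from simple arithmetic on the Ontario case counts quoted earlier (3,800 diagnosed and 21,680 undiagnosed HTZ FH cases) and the 3% figure. A minimal Python sketch of that calculation is given here; the variable names are illustrative only, and the published values appear to be rounded to the nearest 5.

```python
# Illustrative check of the refractory HTZ FH estimate (variable names are hypothetical).
diagnosed = 3_800        # diagnosed HTZ FH cases in Ontario (from the text)
undiagnosed = 21_680     # undiagnosed HTZ FH cases in Ontario (from the text)
refractory_rate = 0.03   # 3% figure taken from the 1% to 5% expert range

total = diagnosed + undiagnosed
diagnosed_share = diagnosed / total                  # text: "about 15%"
refractory_total = total * refractory_rate           # text: approximately 765
refractory_diagnosed = diagnosed * refractory_rate   # text: 115
refractory_undiagnosed = undiagnosed * refractory_rate  # text: 650

print(f"total HTZ FH cases:       {total}")
print(f"share diagnosed:          {diagnosed_share:.1%}")
print(f"refractory (all):         {refractory_total:.1f}")
print(f"refractory (diagnosed):   {refractory_diagnosed:.1f}")
print(f"refractory (undiagnosed): {refractory_undiagnosed:.1f}")
```

The unrounded results sit within a patient or so of the figures reported above.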
Drug therapy is less effective in HMZ FH patients since the effects of the majority of cholesterol-lowering drugs are mediated by the upregulation of LDL receptors, which are often absent or function poorly in HMZ FH patients. Some HMZ FH patients may still benefit from drug therapy; however, this rarely reduces LDL-C to target levels.
Existing Technology: Plasma Exchange
An option currently available in Ontario for FH patients who do not respond to standard diet and drug therapy is plasma exchange (PE). Patients are treated with this lifelong therapy on a weekly or biweekly basis with concomitant drug therapy.
Plasma exchange is nonspecific and eliminates virtually all plasma proteins such as albumin, immunoglobulins, coagulation factors, fibrinolytic factors and HDL-C, in addition to acutely lowering LDL-C by about 50%. Blood is removed from the patient, plasma is isolated, discarded and replaced with a substitution fluid. The substitution fluid and the remaining cellular components of the blood are then returned to the patient.
The major limitation of PE is its nonspecificity. The removal of HDL-C prevents successful vascular remodeling of the areas stenosed by atherosclerosis. In addition, there is an increased susceptibility to infections, and costs are incurred by the need for replacement fluid. Adverse events can be expected to occur in 12% of procedures.
Other Alternatives
Surgical alternatives for FH patients include portocaval shunt, ileal bypass and liver transplantation. However, these are risky procedures and are associated with a high morbidity rate. Results with gene therapy are not convincing to date.
The Technology Being Reviewed: LDL Apheresis
An alternative to PE is LDL apheresis. Unlike PE, LDL apheresis is a selective treatment that removes LDL-C and other atherogenic lipoproteins from the blood while minimally impacting other plasma components such as HDL-C, total serum protein, albumin and immunoglobulins. As with PE, FH patients require lifelong therapy with LDL apheresis on a weekly/biweekly basis with concomitant drug therapy.
Heparin-Induced Extracorporeal LDL Precipitation
Heparin-induced extracorporeal LDL precipitation (HELP) is one of the most widely used methods of LDL apheresis. It is a continuous closed-loop system that processes blood extracorporeally. It operates on the principle that at a low pH, LDL and lipoprotein (a) [Lp(a)] bind to heparin and fibrinogen to form a precipitate which is then removed by filtration. In general, the total duration of treatment is approximately 2 to 3 hours.
Results from early trials indicate that LDL-C concentration is reduced by 65% to 70% immediately following treatment in both HMZ and HTZ FH and then rapidly begins to rise. Typically, patients with HTZ FH are treated every 2 weeks while patients with HMZ FH require weekly therapy. Heparin-induced extracorporeal LDL precipitation also produces small transient decreases in HDL-C; however, levels generally return to baseline within 2 days. After several months of therapy, long-term reductions in LDL-C and increases in HDL-C have been reported.
In addition to having an impact on plasma cholesterol concentrations, HELP lowers plasma fibrinogen, a risk factor for atherosclerosis, and reduces concentrations of cellular adhesion molecules, which play a role in early atherogenesis.
In comparison with PE, HELP LDL apheresis does not have major effects on essential plasma proteins and does not require replacement fluid, thus decreasing susceptibility to infections. One study noted that adverse events were documented in 2.9% of LDL apheresis treatments using the HELP system compared with 12% using PE. As per the manufacturer, patients must weigh at least 30 kg to be eligible for treatment with HELP.
Regulatory Status
The H.E.L.P.® System (B.Braun Medizintechnologie GmbH, Germany) has been licensed by Health Canada since December 2000 as a Class 3 medical device (Licence # 26023) for performing LDL apheresis to acutely remove LDL from the plasma of 3 high-risk patient populations for whom diet has been ineffective and maximum drug therapy has either been ineffective or not tolerated. The 3 patient groups are as follows:
Functional hypercholesterolemic homozygotes with LDL-C >500 mg/dL (>13mmol/L);
Functional hypercholesterolemic heterozygotes with LDL-C >300 mg/dL (>7.8mmol/L);
Functional hypercholesterolemic heterozygotes with LDL-C >200 mg/dL (>5.2mmol/L) and documented CAD
No other LDL apheresis system is currently licensed in Canada.
Review Strategy
The Medical Advisory Secretariat systematically reviewed the literature to assess the effectiveness and safety of LDL apheresis performed with the HELP system for the treatment of patients with refractory HMZ and HTZ FH. A standard search methodology was used to retrieve international health technology assessments and English-language journal articles from selected databases.
The GRADE approach was used to systematically and explicitly make judgments about the quality of evidence and strength of recommendations.
Summary of Findings
The search identified 398 articles published from January 1, 1998 to May 30, 2007. Eight studies met the inclusion criteria. Five case series, 2 case series nested within comparative studies, and one retrospective review, were included in the analysis. A health technology assessment conducted by the Alberta Heritage Foundation for Medical Research, and a review by the United States Food and Drug Administration were also included.
Large heterogeneity among the studies was observed. Studies varied in inclusion criteria, baseline patient characteristics and methodology.
Overall, the mean acute relative decrease in LDL-C with HELP LDL apheresis ranged from 53 to 77%. The mean acute relative reductions ranged as follows: total cholesterol (TC) 47 to 64%, HDL-C +0.4 to -29%, triglycerides (TG) 33 to 62%, Lp(a) 55 to 68% and fibrinogen 56 to 65%.
The mean chronic relative decreases in LDL-C and TC with HELP LDL apheresis ranged from 9 to 46% and 5 to 34%, respectively. Familial hypercholesterolemia patients treated with HELP did not achieve the target LDL-C value set by international guidelines (LDL-C < 2.5mmol/L, 100mg/dL). The chronic mean relative increase in HDL-C ranged from 12 to 27%. The ratio of LDL:HDL and the ratio of TC:HDL are 2 measures that have been shown to be important risk factors for cardiac events. In high-risk patients, the recommended target LDL:HDL ratio is less than or equal to 2, and the target TC:HDL ratio is less than 4. In the studies that reported chronic lipid changes, the LDL:HDL and TC:HDL ratios exceeded targeted values.
Three studies investigated the effects of HELP on coronary outcomes and atherosclerotic changes. One noted that twice as many lesions displayed regression in comparison to those displaying progression. The second study found that there was a decrease in Agatston scores and in the volume of coronary calcium. The last study noted that 2 of 5 patients showed regression of coronary atherosclerosis, and 3 of the 5 patients showed no change as assessed by a global change score.
Adverse effects were typically mild and transient, and the majority of events were related to problems with vascular access. Of the 3 studies that provided quantitative information, the proportion of adverse events ranged from 2.9 to 5.1%.
GRADE Quality of Evidence
In general, studies were of low quality, i.e., case series studies (Tables 1-3). No controlled studies were identified and no studies directly compared the effectiveness of the HELP system with PE or with diet and drug therapy. Conducting trials with a sufficiently large control group would not have been feasible or acceptable given that HELP represents a last alternative in these patients who are resistant to conventional therapeutic strategies.
A major limitation is that there is limited evidence on the effectiveness and safety of HELP apheresis in HMZ FH patients. However, it is unlikely that better-quality evidence will become available, given that HMZ FH is rare and LDL apheresis is a last therapeutic option for these patients.
Lastly, there are limited data on the long-term effects of LDL apheresis in FH patients. No studies with HELP were identified that examined long-term outcomes such as survival and cardiovascular events. The absence of these data may be attributed to the rarity of the condition, and the large number of subjects and long duration of follow-up that would be needed to conduct such trials.
Homozygous Familial Hypercholesterolemia - Lipid Outcomes
Heterozygous Familial Hypercholesterolemia - Lipid Outcomes
Heterozygous Familial Hypercholesterolemia - Coronary Artery Disease Outcomes
Economic Analysis
A budget-impact analysis was conducted to forecast future costs for PE and HELP apheresis in FH patients. All costs are reported in Canadian dollars. Based on epidemiological data of 13 HMZ, 115 diagnosed HTZ and 765 cases of all HTZ patients (diagnosed + undiagnosed), the annual cost of weekly treatment was estimated to be $488,025, $4,332,227 and $24,758,556 respectively for PE. For HELP apheresis, the annual cost of weekly treatment was estimated to be $1,025,338, $9,156,209 and $60,982,579 respectively. Costs for PE and HELP apheresis were halved with a biweekly treatment schedule.
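The budget figures can be put on a per-patient footing with a few lines of arithmetic. The Python sketch below uses only the patient counts and annual weekly-treatment costs stated in the paragraph above; the dictionary keys and variable names are illustrative, and the biweekly totals simply apply the stated halving.

```python
# Annual cost of weekly treatment in Canadian dollars, copied from the text.
# Group labels and variable names are illustrative only.
patients = {"HMZ": 13, "HTZ diagnosed": 115, "HTZ all": 765}
pe_cost = {"HMZ": 488_025, "HTZ diagnosed": 4_332_227, "HTZ all": 24_758_556}
help_cost = {"HMZ": 1_025_338, "HTZ diagnosed": 9_156_209, "HTZ all": 60_982_579}

summary = {}
for group, n in patients.items():
    summary[group] = {
        "pe_per_patient": pe_cost[group] / n,
        "help_per_patient": help_cost[group] / n,
        "help_to_pe_ratio": help_cost[group] / pe_cost[group],
        # The text states that costs are halved on a biweekly schedule.
        "pe_biweekly_total": pe_cost[group] / 2,
        "help_biweekly_total": help_cost[group] / 2,
    }

for group, row in summary.items():
    print(
        f"{group}: PE ${row['pe_per_patient']:,.0f}/patient, "
        f"HELP ${row['help_per_patient']:,.0f}/patient, "
        f"ratio {row['help_to_pe_ratio']:.2f}"
    )
```

For the HMZ and diagnosed HTZ groups the HELP-to-PE ratio comes out slightly above 2; for the all-HTZ group the quoted totals imply a somewhat higher ratio.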
The cost per coronary artery disease death avoided over a 10-year period in HTZ FH-diagnosed patients was also calculated and estimated to be $37.5 million and $18.7 million for weekly and biweekly treatment respectively, when comparing HELP apheresis with PE and with no intervention. Although HELP apheresis costs twice as much as PE, it helped to avoid 12 deaths compared with PE and 22 deaths compared with no intervention, over a period of 10 years.
Ontario Health System Impact Analysis
Low-density lipoprotein apheresis using the HELP system is currently being funded by the provinces of Quebec and Alberta. The program in Quebec has been in operation since 2001 and is limited to the treatment of HMZ FH patients. The Alberta program is relatively new and is currently treating HMZ FH patients, but it is expanding to include refractory HTZ FH patients.
Low-density lipoprotein apheresis is a lifelong treatment and requires considerable commitment on the part of the patient, and the patient’s family and physician. In addition, the management of FH continues to evolve. With the advent of new more powerful cholesterol-lowering drugs, some HTZ patients may be able to sufficiently control their hypercholesterolemia. Nevertheless, according to clinical experts, HMZ patients will likely always require LDL apheresis.
Given the substantial costs associated with LDL apheresis, treatment has been limited to HMZ FH patients. However, LDL apheresis could be applied to a much larger population, which would include HTZ FH patients who are refractory to diet and drug therapy. HTZ FH patients are generally recruited in a more advanced state, demonstrate a longer natural survival than HMZ FH patients and are older.
For HMZ FH patients, the benefits of LDL apheresis clearly outweigh the risks and burdens. According to GRADE, the recommendation would be graded as strong, with low- to very low-quality evidence (Table 4).
In both HMZ and HTZ FH patients, there is evidence of overall clinical benefit of LDL apheresis from case series studies. Low-density lipoprotein apheresis has several advantages over the current treatment of PE, including decreased exposure to blood products, decreased risk of adverse events, conservation of nonatherogenic and athero-protective components, such as HDL-C, and lowering of other atherogenic components, such as fibrinogen.
In contrast to HMZ FH patients, there remains a lot of uncertainty in the social/ethical acceptance of this technology for the treatment of refractory HTZ FH patients. In addition to the substantial costs, it is unknown whether the current health care system could cope with the additional demand. There is uncertainty in the estimates of benefits, risks and burdens. According to GRADE, the recommendation would be graded as weak with low- to very-low-quality evidence (Table 5).
GRADE Recommendation - Homozygous Patients
GRADE of recommendation: Strong recommendation, low-quality or very-low-quality evidence
Benefits clearly outweigh risk and burdens
Case series study designs
Strong, but may change when higher-quality evidence becomes available
GRADE Recommendation - Heterozygous Patients
GRADE of recommendation: Weak recommendation, low-quality or very-low-quality evidence
Uncertainty in the estimates of benefits, risks and burdens, which may be closely balanced
Case series study designs
Very weak; other alternatives may be equally reasonable
PMCID: PMC3377562  PMID: 23074505
21.  Estimating the Number of Paediatric Fevers Associated with Malaria Infection Presenting to Africa's Public Health Sector in 2007 
PLoS Medicine  2010;7(7):e1000301.
Peter Gething and colleagues compute the number of fevers likely to present to public health facilities in Africa and the estimated number of these fevers likely to be infected with Plasmodium falciparum malaria parasites.
As international efforts to increase the coverage of artemisinin-based combination therapy in public health sectors gather pace, concerns have been raised regarding their continued indiscriminate presumptive use for treating all childhood fevers. The availability of rapid diagnostic tests to support practical and reliable parasitological diagnosis provides an opportunity to improve the rational treatment of febrile children across Africa. However, the cost effectiveness of diagnosis-based treatment policies will depend on the presumed numbers of fevers harbouring infection. Here we compute the number of fevers likely to present to public health facilities in Africa and the estimated number of these fevers likely to be infected with Plasmodium falciparum malaria parasites.
Methods and Findings
We assembled first administrative-unit level data on paediatric fever prevalence, treatment-seeking rates, and child populations. These data were combined in a geographical information system model that also incorporated an adjustment procedure for urban versus rural areas to produce spatially distributed estimates of fever burden amongst African children and the subset likely to present to public sector clinics. A second data assembly was used to estimate plausible ranges for the proportion of paediatric fevers seen at clinics positive for P. falciparum in different endemicity settings. We estimated that, of the 656 million fevers in African 0–4 y olds in 2007, 182 million (28%) were likely to have sought treatment in a public sector clinic of which 78 million (43%) were likely to have been infected with P. falciparum (range 60–103 million).
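The percentages quoted in this paragraph can be reproduced directly from the headline counts. The short Python sketch below does only that arithmetic; it is not the authors' geographical model, and the variable names are illustrative.

```python
# Headline estimates for African 0-4 year olds in 2007, in millions (from the text).
total_fevers = 656
public_clinic_fevers = 182
infected_fevers = 78
infected_low, infected_high = 60, 103   # reported range, in millions

# Share of all fevers that sought treatment in a public sector clinic.
clinic_share = public_clinic_fevers / total_fevers
# Share of clinic-presenting fevers likely infected with P. falciparum.
infected_share = infected_fevers / public_clinic_fevers
# The same share evaluated at the ends of the reported range.
infected_share_range = (
    infected_low / public_clinic_fevers,
    infected_high / public_clinic_fevers,
)

print(f"presenting to public clinics: {clinic_share:.0%}")
print(f"infected among presenters:    {infected_share:.0%}")
print(f"range: {infected_share_range[0]:.0%} to {infected_share_range[1]:.0%}")
```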
Spatial estimates of childhood fevers and care-seeking rates can be combined with a relational risk model of infection prevalence in the community to estimate the degree of parasitemia in those fevers reaching public health facilities. This quantification provides an important baseline comparison of malarial and nonmalarial fevers in different endemicity settings that can contribute to ongoing scientific and policy debates about optimum clinical and financial strategies for the introduction of new diagnostics. These models are made publicly available with the publication of this paper.
Editors' Summary
Malaria, an infectious parasitic disease transmitted to people through the bite of an infected mosquito, kills about one million people (mainly children living in sub-Saharan Africa) every year. Although several parasites cause malaria, Plasmodium falciparum is responsible for most of these deaths. For the past 50 years, the main treatments for P. falciparum malaria have been chloroquine and sulfadoxine/pyrimethamine. Unfortunately, parasitic resistance to these “monotherapies” is now widespread and there has been a global upsurge in the illness and deaths caused by P. falciparum. To combat this increase, the World Health Organization recommends artemisinin combination therapy (ACT) for P. falciparum malaria in all regions with drug-resistant malaria. In ACT, artemisinin derivatives (new, fast-acting antimalarial drugs) are used in combination with another antimalarial to reduce the chances of P. falciparum becoming resistant to either drug.
Why Was This Study Done?
All African countries at risk of P. falciparum have now adopted ACT as first-line therapy for malaria in their public clinics. However, experts are concerned that ACT is often given to children who don't actually have malaria because, in many parts of Africa, health care workers assume that all childhood fevers are malaria. This practice, which became established when diagnostic facilities for malaria were very limited, increases the chances of P. falciparum becoming resistant to ACT, wastes limited drug stocks, and means that many ill children are treated inappropriately. Recently, however, rapid diagnostic tests for malaria have been developed and there have been calls to expand their use to improve the rational treatment of African children with fever. Before such an expansion is initiated, it is important to know how many African children develop fever each year, how many of these ill children attend public clinics, and what proportion of them is likely to have malaria. Unfortunately, this type of information is incompletely or unreliably collected in many parts of Africa. In this study, therefore, the researchers use a mathematical model to estimate the number of childhood fevers associated with malaria infection that presented to Africa's public clinics in 2007 from survey data.
What Did the Researchers Do and Find?
The researchers used survey data on the prevalence (the proportion of a population with a specific disease) of childhood fever and on treatment-seeking behavior and data on child populations to map the distribution of fever among African children and the likelihood of these children attending public clinics for treatment. They then used a recent map of the distribution of P. falciparum infection risk to estimate what proportion of children with fever who attended clinics were likely to have had malaria in different parts of Africa. In 2007, the researchers estimate, 656 million cases of fever occurred in 0–4-year-old African children, 182 million were likely to have sought treatment in a public clinic, and 78 million (just under half of the cases that attended a clinic with fever) were likely to have been infected with P. falciparum. Importantly, there were marked geographical differences in the likelihood of children with fever presenting at public clinics being infected with P. falciparum. So, for example, whereas nearly 60% of the children attending public clinics with fever in Burkina Faso were likely to have had malaria, only 15% of similar children in Kenya were likely to have had this disease.
What Do These Findings Mean?
As with all mathematical models, the accuracy of these findings depends on the assumptions included in the model and on the data fed into it. Nevertheless, these findings provide a map of the prevalence of malarial and nonmalarial childhood fevers across sub-Saharan Africa and an indication of how many of the children with fever reaching public clinics are likely to have malaria and would therefore benefit from ACT. The finding that in some countries more than 80% of children attending public clinics with fever probably don't have malaria highlights the potential benefits of introducing rapid diagnostic testing for malaria. Furthermore, these findings can now be used to quantify the resources needed for and the potential clinical benefits of different policies for the introduction of rapid diagnostic testing for malaria across Africa.
Additional Information
Please access these Web sites via the online version of this summary at
Information is available from the World Health Organization on malaria (in several languages) and on rapid diagnostic tests for malaria
The US Centers for Disease Control and Prevention provide information on malaria (in English and Spanish)
MedlinePlus provides links to additional information on malaria (in English and Spanish)
Information on the global mapping of malaria is available at the Malaria Atlas Project
Information is available from the Roll Back Malaria Partnership on the global control of malaria (in English and French) and on artemisinin combination therapy
PMCID: PMC2897768  PMID: 20625548
22.  Cancer Screening with Digital Mammography for Women at Average Risk for Breast Cancer, Magnetic Resonance Imaging (MRI) for Women at High Risk 
Executive Summary
The purpose of this review is to determine the effectiveness of 2 separate modalities, digital mammography (DM) and magnetic resonance imaging (MRI), relative to film mammography (FM), in the screening of women asymptomatic for breast cancer. A third analysis assesses the effectiveness and safety of the combination of MRI plus mammography (MRI plus FM) in screening of women at high risk. An economic analysis was also conducted.
Research Questions
How do the sensitivity and specificity of DM compare with those of FM?
How do the sensitivity and specificity of MRI compare with those of FM?
How do the recall rates compare among these screening modalities, and what effect might this have on radiation exposure? What are the risks associated with radiation exposure?
How do the sensitivity and specificity of the combination of MRI plus FM compare with those of either MRI or FM alone?
What are the economic considerations?
Clinical Need
The effectiveness of FM with respect to breast cancer mortality in the screening of asymptomatic average-risk women over the age of 50 has been established. However, based on a Medical Advisory Secretariat review completed in March 2006, screening is not recommended for women between the ages of 40 and 49 years. Guidelines published by the Canadian Task Force on Preventive Care recommend mammography screening every 1 to 2 years for women aged 50 years and over, hence, the inclusion of such women in organized breast cancer screening programs. In addition to the uncertainty of the effectiveness of mammography screening from the age of 40 years, there is concern over the risks associated with mammographic screening for the 10 years between the ages of 40 and 49 years.
The lack of effectiveness of mammography screening starting at the age of 40 years (with respect to breast cancer mortality) is based on the assumption that the ability to detect cancer decreases with increased breast tissue density. As breast density is highest in the premenopausal years (approximately 23% of postmenopausal and 53% of premenopausal women having at least 50% of the breast occupied by high density), mammography screening is not promoted in Canada nor in many other countries for women under the age of 50 at average risk for breast cancer. It is important to note, however, that screening of premenopausal women (i.e., younger than 50 years of age) at high risk for breast cancer by virtue of a family history of cancer or a known genetic predisposition (e.g., having tested positive for the breast cancer genes BRCA1 and/or BRCA2) is appropriate. Thus, this review will assess the effectiveness of breast cancer screening with modalities other than film mammography, specifically DM and MRI, for both pre/perimenopausal and postmenopausal age groups.
International estimates of the epidemiology of breast cancer show that the incidence of breast cancer is increasing for all ages combined whereas mortality is decreasing, though at a slower rate. The observed decreases in mortality rates may be attributable to screening, in addition to advances in breast cancer therapy over time. Decreases in mortality attributable to screening may be a result of the earlier detection and treatment of invasive cancers, in addition to the increased detection of ductal carcinoma in situ (DCIS), of which certain subpathologies are less lethal. Evidence from the Surveillance, Epidemiology and End Results (better known as SEER) cancer registry in the United States, indicates that the age-adjusted incidence of DCIS has increased almost 10-fold over a 20 year period, from 2.7 to 25 per 100,000.
There is a 4-fold lower incidence of breast cancer in the 40 to 49 year age group than in the 50 to 69 year age group (approximately 140 per 100,000 versus 500 per 100,000 women, respectively). The sensitivity of FM is also lower among younger women (approximately 75%) than for women aged over 50 years (approximately 85%). Specificity is approximately 80% for younger women versus 90% for women over 50 years. The increased density of breast tissue in younger women is likely responsible for the decreased accuracy of FM.
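To make the sensitivity and specificity figures concrete, the following Python sketch converts them into expected counts per 100,000 women screened with FM. It rests on a simplifying assumption that is not in the source: the quoted incidence is treated as the proportion of women with cancer at the time of screening. The function and variable names are hypothetical.

```python
def expected_screen_results(cases_per_100k, sensitivity, specificity):
    """Expected counts per 100,000 women screened.

    Assumption (not from the source): cases_per_100k is the number of
    women with breast cancer among the 100,000 screened.
    """
    with_cancer = cases_per_100k
    without_cancer = 100_000 - cases_per_100k
    return {
        "detected": sensitivity * with_cancer,            # true positives
        "missed": (1 - sensitivity) * with_cancer,        # false negatives
        "false_positive": (1 - specificity) * without_cancer,
    }

# Approximate FM figures quoted in the paragraph above.
younger = expected_screen_results(140, sensitivity=0.75, specificity=0.80)
older = expected_screen_results(500, sensitivity=0.85, specificity=0.90)

for label, result in (("40 to 49 years", younger), ("50 years and over", older)):
    print(label, {name: round(value) for name, value in result.items()})
```

Under this assumption, lower incidence combined with lower specificity in the younger group yields far more false positives per cancer detected, which is one way to read the concern about FM accuracy in younger women.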
Treatment options for breast cancer vary with the stage of disease (based on tumor size, involvement of surrounding tissue, and number of affected axillary lymph nodes) and its pathology, and may include a combination of surgery, chemotherapy and/or radiotherapy. Surgery is the first-line intervention for biopsy-confirmed tumors. The subsequent use of radiation, chemotherapy or hormonal treatments is dependent on the histopathologic characteristics of the tumor and the type of surgery. There is controversy regarding the optimal treatment of DCIS, which is considered a noninvasive tumour.
Women at high risk for breast cancer are defined as genetic carriers of the more commonly known breast cancer genes (BRCA1, BRCA2, TP53), first-degree relatives of carriers, women with varying degrees of high-risk family histories, and/or women with greater than 20% lifetime risk for breast cancer based on existing risk models. Genetic carriers for this disease, primarily women with BRCA1 or BRCA2 mutations, have a lifetime probability of approximately 85% of developing breast cancer. Preventive options for these women include surgical interventions such as prophylactic mastectomy and/or oophorectomy, i.e., removal of the breasts and/or ovaries. Therefore, it is important to evaluate the benefits and risks of different screening modalities, to identify additional options for these women.
This Medical Advisory Secretariat review is the second of 2 parts on breast cancer screening, and concentrates on the evaluation of both DM and MRI relative to FM, the standard of care. Part I of this review (March 2006) addressed the effectiveness of screening mammography in 40 to 49 year old average-risk women. The overall objective of the present review is to determine the optimal screening modality based on the evidence.
Evidence Review Strategy
The Medical Advisory Secretariat followed its standard procedures and searched the following electronic databases: Ovid MEDLINE, EMBASE, Ovid MEDLINE In-Process & Other Non-Indexed Citations, Cochrane Central Register of Controlled Trials, Cochrane Database of Systematic Reviews and The International Network of Agencies for Health Technology Assessment database. The subject headings and keywords searched included breast cancer, breast neoplasms, mass screening, digital mammography, magnetic resonance imaging. The detailed search strategies can be viewed in Appendix 1.
This review includes articles specific to screening and does not include evidence on diagnostic mammography. The search was further restricted to English-language articles published between January 1996 and April 2006. Excluded were case reports, comments, editorials, nonsystematic reviews, and letters.
Digital Mammography: In total, 224 articles specific to DM screening were identified. These were examined against the inclusion/exclusion criteria described below, resulting in the selection and review of 5 health technology assessments (HTAs) (plus 1 update) and 4 articles specific to screening with DM.
Magnetic Resonance Imaging: In total, 193 articles specific to MRI were identified. These were examined against the inclusion/exclusion criteria described below, resulting in the selection and review of 2 HTAs and 7 articles specific to screening with MRI.
The evaluation of the addition of FM to MRI in the screening of women at high risk for breast cancer was also conducted within the context of standard search procedures of the Medical Advisory Secretariat, as outlined above. The subject headings and keywords searched included the concepts of breast cancer, magnetic resonance imaging, mass screening, and high risk/predisposition to breast cancer. The search was further restricted to English-language articles published between September 2007 and January 15, 2010. Case reports, comments, editorials, nonsystematic reviews, and letters were not excluded.
MRI plus mammography: In total, 243 articles specific to MRI plus FM screening were identified. These were examined against the inclusion/exclusion criteria described below, resulting in the selection and review of 2 previous HTAs, and 1 systematic review of 11 paired design studies.
Inclusion Criteria
English-language articles, and English or French-language HTAs published from January 1996 to April 2006, inclusive.
Articles specific to screening of women with no personal history of breast cancer.
Studies in which DM or MRI were compared with FM, and where the specific outcomes of interest were reported.
Randomized controlled trials (RCTs) or paired studies only for assessment of DM.
Prospective, paired studies only for assessment of MRI.
Exclusion Criteria
Studies in which outcomes were not specific to those of interest in this report.
Studies in which women had been previously diagnosed with breast cancer.
Studies in which the intervention (DM or MRI) was not compared with FM.
Studies assessing DM with a sample size of less than 500.
Intervention
Digital mammography.
Magnetic resonance imaging.
Comparator
Screening with film mammography.
Outcomes of Interest
Breast cancer mortality (although no studies were found with such long follow-up).
Recall rates.
Summary of Findings
Digital Mammography
There is moderate quality evidence that DM is significantly more sensitive than FM in the screening of asymptomatic women aged less than 50 years, those who are premenopausal or perimenopausal, and those with heterogeneously or extremely dense breast tissue (regardless of age).
It is not known what effect these differences in sensitivity will have on the more important effectiveness outcome measure of breast cancer mortality, as there was no evidence of such an assessment.
Other factors have been set out to promote DM, for example, issues of recall rates and reading and examination times. Our analysis did not show that recall rates were necessarily improved in DM, though examination times were lower than for FM. Other factors including storage and retrieval of screens were not the subject of this analysis.
Magnetic Resonance Imaging
There is moderate quality evidence that the sensitivity of MRI is significantly higher than that of FM in the screening of women at high risk for breast cancer based on genetic or familial factors, regardless of age.
Radiation Risk Review
Cancer Care Ontario conducted a review of the evidence on radiation risk in mammographic screening of women at high risk for breast cancer. This review of recent literature, together with a risk assessment that considered the potential impact of screening mammography in cohorts of women who start screening at an earlier age or who are at increased risk of developing breast cancer due to genetic susceptibility, supports the following conclusions:
For women over 50 years of age, the benefits of mammography greatly outweigh the risk of radiation-induced breast cancer irrespective of the level of a woman’s inherent breast cancer risk.
Annual mammography for women aged 30 – 39 years who carry a breast cancer susceptibility gene or who have a strong family breast cancer history (defined as a first degree relative diagnosed in their thirties) has a favourable benefit:risk ratio. Mammography is estimated to detect 16 to 18 breast cancer cases for every one induced by radiation (Table 1). Initiation of screening at age 35 for this same group would increase the benefit:risk ratio to an even more favourable level of 34-50 cases detected for each one potentially induced.
Mammography for women under 30 years of age has an unfavourable benefit:risk ratio due to the challenges of detecting cancer in younger breasts, the aggressiveness of cancers at this age, the potential for radiation susceptibility at younger ages and a greater cumulative radiation exposure.
Mammography used in combination with MRI for women who carry a strong breast cancer susceptibility (e.g., BRCA1/2 carriers), if begun at age 35 and continued for 35 years, may confer greatly improved benefit:risk ratios, estimated at about 220 to one.
While there is considerable uncertainty in the risk of radiation-induced breast cancer, the risk expressed in published studies is almost certainly conservative as the radiation dose absorbed by women receiving mammography recently has been substantially reduced by newer technology.
A CCO update of the mammography radiation risk literature for 2008 and 2009 identified one article, by Berrington de Gonzalez et al., published in 2009 (Berrington de Gonzalez et al., 2009, JNCI, vol. 101: 205-209). This article focuses on estimating the risk of radiation-induced breast cancer for mammographic screening of young women at high risk for breast cancer (with BRCA gene mutations). Based on an assumption of a 15% to 25% or less reduction in mortality from mammography in these high risk women, the authors conclude that such a reduction is not substantially greater than the risk of radiation-induced breast cancer mortality when screening before the age of 34 years. That is, there would be no net benefit from annual mammographic screening of BRCA mutation carriers at ages 25-29 years; the net benefit would be zero or small if screening occurs in 30-34 year olds, and there would be some net benefit at age 35 years or older.
The Addition of Mammography to Magnetic Resonance Imaging
The effects of the addition of FM to MRI screening of high risk women were also assessed, with inclusion and exclusion criteria as follows:
Inclusion Criteria
English-language articles and English or French-language HTAs published from September 2007 to January 15, 2010.
Articles specific to screening of women at high risk for breast cancer, regardless of the definition of high risk.
Studies in which accuracy data for the combination of MRI plus FM are available to be compared to that of MRI and FM alone.
RCTs or prospective, paired studies only.
Studies in which women were previously diagnosed with breast cancer were also included.
Exclusion Criteria
Studies in which outcomes were not specific to those of interest in this report.
Studies in which there was insufficient data on the accuracy of MRI plus FM.
Intervention
Both MRI and FM.
Comparators
Screening with MRI alone and FM alone.
Outcomes of Interest
Summary of Findings
Magnetic Resonance Imaging Plus Mammography
There is moderate GRADE-level evidence that the sensitivity of MRI plus mammography is significantly higher than that of MRI or FM alone, although the specificity remains either unchanged or decreases, in the screening of women at high risk for breast cancer based on genetic/familial factors, regardless of age.
These studies include women at high risk defined as BRCA1/2 or TP53 carriers, first degree relatives of carriers, women with varying degrees of high risk family histories, and/or >20% lifetime risk based on existing risk models. This definition of high risk accounts for approximately 2% of the female adult population in Ontario.
PMCID: PMC3377503  PMID: 23074406
23.  Predictive values derived from lower wisdom teeth developmental stages on orthopantomograms to calculate the chronological age in adolescence and young adults as a prerequisite to obtain age-adjusted informed patient consent prior to elective surgical procedures in young patients with incomplete or mismatched personal data 
Introduction: Surgical procedures require informed patient consent, which is mandatory prior to any procedure. These requirements apply in particular to elective surgical procedures. The communication with the patient about the procedure has to be comprehensive and based on mutual understanding. Furthermore, the informed consent has to take into account whether a patient is of legal age. As a result of large-scale migration, some patients scheduled for medical procedures have a chronological age that cannot be assessed reliably by physical inspection alone. Age determination based on assessing wisdom tooth development stages can be used to help determine whether individuals involved in medical procedures are of legal age, i.e., responsible and accountable. At present, the assessment of wisdom tooth developmental stages allows only a crude estimate of an individual’s age. This study explores possibilities for more precise predictions of the age of individuals with emphasis on the legal age threshold of 18 years.
Material and Methods: 1,900 dental orthopantomograms (female 938, male 962, age: 15–24 years), taken between the years 2000 and 2013 for diagnosis and treatment of diseases of the jaws, were evaluated. 1,895 orthopantomograms (female 935, male 960) of 1,804 patients (female 872, male 932) met the inclusion criteria. The archives of the Department of Diagnostic Radiology in Dentistry, University Medical Center Hamburg-Eppendorf, and of an oral and maxillofacial office in Rostock, Germany, were used to collect a sufficient number of radiographs. An effort was made to achieve almost equal distribution of age categories in this study group; ‘age’ was given on a particular day. The radiological criteria of lower third molar investigation were: presence and extension of periodontal space, alveolar bone loss, emergence of tooth, and stage of tooth mineralization (according to Demirjian). Univariate and multivariate general linear models were calculated. Using hierarchical multivariate analyses a formula was derived quantifying the development of the four parameters of wisdom tooth over time. This model took repeated measurements of the same persons into account and is only applicable when a person is assessed a second time. The second approach investigates a linear regression model in order to predict the age. In a third approach, a classification and regression tree (CART) was developed to derive cut-off values for the four parameters, resulting in a classification with estimates for sensitivity and specificity.
Results: No statistically significant differences were found between parameters related to wisdom tooth localization (right or left side). In univariate analyses, being of legal age was associated with consecutive stages of wisdom tooth development, the obliteration of the periodontal space, and tooth emergence, as well as with alveolar bone loss; no association was found with tooth mineralization. Multivariate models without repeated measurements yielded imprecise estimates because of the unknown individual-related variability. The precision of these models is thus not very good, although it improves with advancing age. In the CART analysis, a receiver operating characteristic area under the curve of 78% was achieved; when maximizing both specificity and sensitivity, a Youden’s index of 47% was achieved (with 73% specificity and 74% sensitivity).
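As a quick check on the figures reported above, Youden's index is conventionally defined as sensitivity plus specificity minus one. The following minimal sketch (the function name is illustrative and does not come from the study) recomputes the index from the stated 74% sensitivity and 73% specificity.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Values as reported in the abstract (74% sensitivity, 73% specificity).
j = youden_index(sensitivity=0.74, specificity=0.73)
print(f"Youden's index: {j:.2f}")
```

An index of 0 corresponds to a test no better than chance and 1 to a perfect test, which is consistent with the authors' reading of the CART model as only moderately accurate.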
Discussion: This study provides a basis to help determine whether a person is 18 years or older in individuals who are assumed to be between 15 and 24 years old. From repeated measurements, we found a linear effect of age on the four parameters in the individuals. However, this information cannot be used for prognosis, because of the large intra-individual variability. Thus, although the development of the four parameters can be estimated over time, a direct conclusion with regard to age cannot be drawn from the parameters without previous biographic information about a person. While a single parameter is of limited value for calculating the target age of 18 years, combining several findings that can be determined on a standard radiograph may potentially be a more reliable diagnostic tool for estimating the target age in both sexes. However, a high degree of precision cannot be achieved. The reason for persistent uncertainty lies in the wide chronological range of wisdom tooth development, which stretches from well below to above the 18th life year. The regression approach thus does not seem optimal. Although sensitivity and specificity of the CART model are moderately high, this model is still not reliable as a diagnostic tool. Our findings could have an impact, e.g., on elective surgeries for young individuals with unknown biography. However, these results cannot replace social engagement, in particular thorough physical examination of patients and careful registration of their histories. Further studies on the use of this calculation method in different ethnic groups would be desirable.
PMCID: PMC5141618  PMID: 27975042
informed consent to medical treatment; biometry; age determination by teeth; wisdom tooth development; forensic odontology
24.  Diagnostic accuracy of fine needle aspiration biopsy for detection of malignancy in pediatric thyroid nodules: protocol for a systematic review and meta-analysis 
Systematic Reviews  2015;4:120.
Fine needle aspiration biopsy (FNAB) is an accurate test commonly used to determine whether thyroid nodules are malignant in adults. However, less is known about its diagnostic accuracy for this purpose in children, where conduct of FNAB is less frequent, more technically challenging, and pre-test probabilities of malignancy are often higher. The purpose of this systematic review is to evaluate the diagnostic accuracy of FNAB for the detection of malignancy in pediatric thyroid nodules.
We will search electronic bibliographic databases (MEDLINE, EMBASE, the Cochrane Library, and Evidence-Based Medicine) from their date of inception, reference lists of included articles, proceedings from relevant conferences, and the table of contents of the Journal of Pediatric Surgery (January 2007–present). Two reviewers will independently screen titles and abstracts and identify diagnostic accuracy studies involving FNAB of the thyroid in children. We will include studies comparing FNAB to a reference standard of surgical histopathology or clinical follow-up for detection of malignancy in pediatric thyroid nodules. Two investigators will independently extract data and assess risk of bias using the Quality of Diagnostic Accuracy Studies-II tool. Pooled estimates of sensitivity, specificity, and positive and negative likelihood ratios will be calculated using bivariate random-effects and hierarchical summary receiver operating characteristic models. In the presence of between-study heterogeneity, we will conduct stratified meta-analyses and meta-regression to determine whether diagnostic accuracy estimates vary by country of origin, use of ultrasound guidance during FNAB, qualifications of the individuals performing/interpreting FNAB, adherence to the Bethesda criteria for cytology classification, length of clinical follow-up, timing of data collection, patient selection methods, and presence of verification bias.
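To make the accuracy measures named above concrete, here is a minimal sketch of how sensitivity, specificity, and the positive and negative likelihood ratios follow from a single study's 2x2 table. The counts are hypothetical and chosen only for illustration; the class and function names are assumptions, and this per-study calculation is not the bivariate random-effects or hierarchical summary ROC pooling the protocol describes.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from one diagnostic accuracy study (index test vs. reference)."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent


def accuracy_measures(t: TwoByTwo) -> dict:
    """Per-study accuracy measures; assumes specificity is neither 0 nor 1."""
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_positive": sens / (1.0 - spec),   # how much a positive result raises the odds
        "lr_negative": (1.0 - sens) / spec,   # how much a negative result lowers the odds
    }


# Hypothetical counts, not taken from any study in the review.
measures = accuracy_measures(TwoByTwo(tp=45, fp=10, fn=5, tn=90))
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Pooling such per-study estimates across studies requires a model that accounts for the correlation between sensitivity and specificity, which is why the protocol specifies bivariate and hierarchical methods rather than simple averaging.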
This meta-analysis will determine the diagnostic accuracy of FNAB for detection of malignancy in pediatric thyroid nodules and explore whether heterogeneity observed across studies may be explained by variations in patient population, FNAB technique or interpretation, and/or study-level risks of bias. This will be the first study to determine the accuracy of Bethesda cytological classification levels of FNAB (benign, atypical, follicular, suspicious, malignant). We expect that our results will help in guiding clinical decision-making in children with thyroid nodules.
Systematic review registration
PROSPERO No. CRD42014007140
PMCID: PMC4581518  PMID: 26399232
Fine needle biopsy; Pediatric; Thyroid nodule; Thyroid cancer; Meta-analysis; Systematic review; Diagnostic accuracy; Sensitivity; Specificity; Likelihood ratio
25.  Double robust and efficient estimation of a prognostic model for events in the presence of dependent censoring 
Biostatistics (Oxford, England)  2015;17(1):165-177.
In longitudinal data arising from observational or experimental studies, dependent subject drop-out is a common occurrence. If the goal is estimation of the parameters of a marginal complete-data model for the outcome, biased inference will result from fitting the model of interest with only uncensored subjects. For example, investigators are interested in estimating a prognostic model for clinical events in HIV-positive patients, under the counterfactual scenario in which everyone remained on ART (when in reality, only a subset had). Inverse probability of censoring weighting (IPCW) is a popular method that relies on correct estimation of the probability of censoring to produce consistent estimation, but is an inefficient estimator in its standard form. We introduce sequentially augmented regression (SAR), an adaptation of the Bang and Robins (2005. Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962–972.) method to estimate a complete-data prediction model, adjusting for longitudinal missing at random censoring. In addition, we propose a closely related non-parametric approach using targeted maximum likelihood estimation (TMLE; van der Laan and Rubin, 2006. Targeted maximum likelihood learning. The International Journal of Biostatistics 2(1), Article 11). We compare IPCW, SAR, and TMLE (implemented parametrically and with Super Learner) through simulation and the above-mentioned case study.
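The weighting idea behind IPCW can be illustrated with a deliberately simplified sketch: a single time point, known probabilities of remaining uncensored, and a weighted mean as the target. All data and names below are hypothetical, and the sketch does not implement SAR or TMLE. Each uncensored subject is weighted by the inverse of their probability of remaining uncensored, so that they also stand in for similar subjects who dropped out.

```python
def ipcw_mean(outcomes, observed, p_uncensored):
    """Normalized IPCW estimate of the mean outcome.

    outcomes      -- outcome per subject (ignored when censored)
    observed      -- True if the subject was not censored
    p_uncensored  -- probability of being uncensored (assumed known here;
                     in practice it is estimated from covariates)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for y, r, p in zip(outcomes, observed, p_uncensored):
        if r:
            w = 1.0 / p          # inverse probability of censoring weight
            weighted_sum += w * y
            weight_total += w
    return weighted_sum / weight_total


# Hypothetical data: subjects with events are more likely to drop out.
outcomes = [1, 1, 0, None, 0, None]
observed = [True, True, True, False, True, False]
p_uncensored = [0.5, 0.5, 1.0, 0.5, 1.0, 0.5]

naive = sum(y for y, r in zip(outcomes, observed) if r) / sum(observed)
weighted = ipcw_mean(outcomes, observed, p_uncensored)
print(f"naive mean among uncensored: {naive:.3f}")
print(f"IPCW-weighted mean:          {weighted:.3f}")
```

In this toy setting the naive complete-case mean understates the event rate because high-risk subjects are censored more often; the weights correct for that only if the censoring probabilities are right, which is the dependence on correct model specification that motivates the doubly robust alternatives.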
PMCID: PMC4679073  PMID: 26224070
Inverse probability of censoring weighting; Longitudinal; Marginal structural model; Prediction; Targeted maximum likelihood estimation; Targeted minimum loss-based estimation