1.  A robust method using propensity score stratification for correcting verification bias for binary tests 
Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified.
doi:10.1093/biostatistics/kxr020
PMCID: PMC3276270  PMID: 21856650
Diagnostic test; Model misspecification; Propensity score; Sensitivity; Specificity
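The stratification approach described in this abstract lends itself to a compact illustration. Below is a minimal Python sketch, not the authors' code: it simulates a binary test with MAR verification, estimates the verification propensity separately for positive and negative tests, stratifies on the fitted score, and recovers bias-corrected sensitivity and specificity. All variable names, model forms, and the simulated data are illustrative assumptions.

```python
# Minimal sketch of verification-bias correction by propensity-score
# stratification under MAR; simulated data and names are illustrative.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                                   # covariate
d = rng.binomial(1, 1 / (1 + np.exp(-(-1 + x))))          # true disease status
t = rng.binomial(1, np.where(d == 1, 0.9, 0.1))           # binary test result
# Verification depends on the test result and covariate (MAR), not directly on d.
p_verify = 1 / (1 + np.exp(-(-1 + 2 * t + 0.5 * x)))
v = rng.binomial(1, p_verify)

df = pd.DataFrame({"x": x, "d": d, "t": t, "v": v})

# Fit the verification propensity separately for positive and negative tests,
# then stratify each group into quintiles of the fitted score.
parts = []
for _, g in df.groupby("t"):
    ps = LogisticRegression().fit(g[["x"]], g["v"]).predict_proba(g[["x"]])[:, 1]
    parts.append(g.assign(stratum=pd.qcut(ps, 5, labels=False, duplicates="drop")))
df = pd.concat(parts)

# Within each (test result, stratum) cell, estimate P(D=1) from verified subjects
# and impute expected disease counts for the whole cell.
cell = df.groupby(["t", "stratum"]).apply(
    lambda g: pd.Series({"n": len(g), "p_d": g.loc[g.v == 1, "d"].mean()})
).reset_index()
cell["diseased"] = cell["n"] * cell["p_d"]
cell["nondiseased"] = cell["n"] * (1 - cell["p_d"])

sens = cell.loc[cell.t == 1, "diseased"].sum() / cell["diseased"].sum()
spec = cell.loc[cell.t == 0, "nondiseased"].sum() / cell["nondiseased"].sum()
print(f"bias-corrected sensitivity ~ {sens:.3f}, specificity ~ {spec:.3f}")
```

With these simulated parameters the corrected estimates should land near the true values (0.9 for both), whereas a complete-case analysis of verified subjects alone would be biased.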
2.  Estimation of the ROC Curve under Verification Bias 
Summary
The ROC (Receiver Operating Characteristic) curve is the most commonly used statistical tool for describing the discriminatory accuracy of a diagnostic test. Classical estimation of the ROC curve relies on data from a simple random sample from the target population. In practice, estimation is often complicated because not all subjects undergo a definitive assessment of disease status (verification). Estimation of the ROC curve based on data only from subjects with verified disease status may be badly biased. In this work we investigate the properties of the doubly robust (DR) method for estimating the ROC curve under verification bias originally developed by Rotnitzky et al. (2006) for estimating the area under the ROC curve. The DR method can be applied to continuous-scale tests and allows for a nonignorable selection-to-verification process. We develop the estimator's asymptotic distribution and examine its finite sample properties via a simulation study. We exemplify the DR procedure for estimation of ROC curves with data collected on patients undergoing electron beam computed tomography, a diagnostic test for calcification of the arteries.
doi:10.1002/bimj.200800128
PMCID: PMC3475535  PMID: 19588455
Diagnostic test; Nonignorable; Semiparametric model; Sensitivity analysis; Sensitivity; Specificity
3.  A model for adjusting for nonignorable verification bias in estimation of ROC curve and its area with likelihood-based approach 
Biometrics  2010;66(4):1119-1128.
Summary
In estimation of the ROC curve, when the true disease status is subject to nonignorable missingness, the observed likelihood involves the missing mechanism given by a selection model. In this paper, we propose a likelihood-based approach to estimate the ROC curve and the area under the ROC curve when the verification bias is nonignorable. We specify a parametric disease model in order to make the nonignorable selection model identifiable. With the estimated verification and disease probabilities, we construct four types of empirical estimates of the ROC curve and its area based on imputation and reweighting methods. In practice, a reasonably large sample size is required to estimate the nonignorable selection model in our settings. Simulation studies showed that all four estimators of the ROC area performed well, and the imputation estimators were generally more efficient than the other estimators proposed. We applied the proposed method to a data set from Alzheimer's disease research.
doi:10.1111/j.1541-0420.2010.01397.x
PMCID: PMC3618959  PMID: 20222937
Alzheimer’s disease; nonignorable missing data; ROC curve; verification bias
4.  Clinical Utility of Serologic Testing for Celiac Disease in Ontario 
Executive Summary
Objective of Analysis
The objective of this evidence-based evaluation is to assess the accuracy of serologic tests in the diagnosis of celiac disease in subjects with symptoms consistent with this disease. Furthermore, the impact of these tests on the diagnostic pathway of the disease and on decision making was also evaluated.
Celiac Disease
Celiac disease is an autoimmune disease that develops in genetically predisposed individuals. The immunological response is triggered by ingestion of gluten, a protein that is present in wheat, rye, and barley. The treatment consists of strict lifelong adherence to a gluten-free diet (GFD).
Patients with celiac disease may present with a myriad of symptoms such as diarrhea, abdominal pain, weight loss, iron deficiency anemia, dermatitis herpetiformis, among others.
Serologic Testing in the Diagnosis of Celiac Disease
There are a number of serologic tests used in the diagnosis of celiac disease.
Anti-gliadin antibody (AGA)
Anti-endomysial antibody (EMA)
Anti-tissue transglutaminase antibody (tTG)
Anti-deamidated gliadin peptides antibodies (DGP)
Serologic tests are automated with the exception of the EMA test, which is more time-consuming and operator-dependent than the other tests. For each serologic test, either immunoglobulin A (IgA) or immunoglobulin G (IgG) can be measured; however, IgA is the standard antibody measured in celiac disease.
Diagnosis of Celiac Disease
According to celiac disease guidelines, the diagnosis of celiac disease is established by small bowel biopsy. Serologic tests are used to initially detect and to support the diagnosis of celiac disease. A small bowel biopsy is indicated in individuals with a positive serologic test. In some cases an endoscopy and small bowel biopsy may be required even with a negative serologic test. The diagnosis of celiac disease must be performed on a gluten-containing diet since the small intestine abnormalities and the serologic antibody levels may resolve or improve on a GFD.
Since IgA measurement is the standard for the serologic celiac disease tests, false negatives may occur in IgA-deficient individuals.
Incidence and Prevalence of Celiac Disease
The incidence and prevalence of celiac disease in the general population and in subjects with symptoms consistent with or at higher risk of celiac disease based on systematic reviews published in 2004 and 2009 are summarized below.
Incidence of Celiac Disease in the General Population
Adults or mixed population: 1 to 17/100,000/year
Children: 2 to 51/100,000/year
In one of the studies, a stratified analysis showed that there was a higher incidence of celiac disease in younger children compared to older children, i.e., 51 cases/100,000/year in 0 to 2 year-olds, 33/100,000/year in 2 to 5 year-olds, and 10/100,000/year in children 5 to 15 years old.
Prevalence of Celiac Disease in the General Population
The prevalence of celiac disease reported in population-based studies identified in the 2004 systematic review varied between 0.14% and 1.87% (median: 0.47%, interquartile range: 0.25%, 0.71%). According to the authors of the review, the prevalence did not vary by age group, i.e., adults and children.
Prevalence of Celiac Disease in High Risk Subjects
Type 1 diabetes (adults and children): 1 to 11%
Autoimmune thyroid disease: 2.9 to 3.3%
First degree relatives of patients with celiac disease: 2 to 20%
Prevalence of Celiac Disease in Subjects with Symptoms Consistent with the Disease
The prevalence of celiac disease in subjects with symptoms consistent with the disease varied widely among studies, i.e., 1.5% to 50% in adult studies, and 1.1% to 17% in pediatric studies. Differences in prevalence may be related to the referral pattern as the authors of a systematic review noted that the prevalence tended to be higher in studies whose population originated from tertiary referral centres compared to general practice.
Research Questions
What is the sensitivity and specificity of serologic tests in the diagnosis of celiac disease?
What is the clinical validity of serologic tests in the diagnosis of celiac disease? The clinical validity was defined as the ability of the test to change diagnosis.
What is the clinical utility of serologic tests in the diagnosis of celiac disease? The clinical utility was defined as the impact of the test on decision making.
What is the budget impact of serologic tests in the diagnosis of celiac disease?
What is the cost-effectiveness of serologic tests in the diagnosis of celiac disease?
Methods
Literature Search
A literature search was performed on November 13th, 2009 using OVID MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations, EMBASE, the Cumulative Index to Nursing & Allied Health Literature (CINAHL), the Cochrane Library, and the International Agency for Health Technology Assessment (INAHTA) for studies published from January 1st, 2003 to November 13th, 2010. Abstracts were reviewed by a single reviewer and, for those studies meeting the eligibility criteria, full-text articles were obtained. Reference lists were also examined for any additional relevant studies not identified through the search. Articles with unknown eligibility were reviewed with a second clinical epidemiologist, and then a group of epidemiologists, until consensus was established. The quality of evidence was assessed as high, moderate, low, or very low according to GRADE methodology.
Inclusion Criteria
Studies that evaluated diagnostic accuracy, i.e., both sensitivity and specificity of serology tests in the diagnosis of celiac disease.
Study population consisted of untreated patients with symptoms consistent with celiac disease.
Studies in which both serologic celiac disease tests and small bowel biopsy (gold standard) were used in all subjects.
Systematic reviews, meta-analyses, randomized controlled trials, prospective observational studies, and retrospective cohort studies.
At least 20 subjects included in the celiac disease group.
English language.
Human studies.
Studies published from 2000 on.
Clearly defined cut-off value for the serology test. If more than one test was evaluated, only those tests for which a cut-off was provided were included.
Description of the small bowel biopsy procedure clearly outlined (location, number of biopsies per patient), unless it was specified that celiac disease diagnosis guidelines were followed.
Patients in the treatment group had untreated CD.
Exclusion Criteria
Studies on screening of the general asymptomatic population.
Studies that evaluated rapid diagnostic kits for use either at home or in physician’s offices.
Studies that evaluated diagnostic modalities other than serologic tests such as capsule endoscopy, push enteroscopy, or genetic testing.
Cut-off for serologic tests defined based on controls included in the study.
Study population defined based on positive serology or subjects pre-screened by serology tests.
Celiac disease status known before study enrolment.
Sensitivity or specificity estimates based on repeated testing for the same subject.
Non-peer-reviewed literature such as editorials and letters to the editor.
Population
The population consisted of adults and children with untreated, undiagnosed celiac disease with symptoms consistent with the disease.
Serologic Celiac Disease Tests Evaluated
Anti-gliadin antibody (AGA)
Anti-endomysial antibody (EMA)
Anti-tissue transglutaminase antibody (tTG)
Anti-deamidated gliadin peptides antibody (DGP)
Combinations of some of the serologic tests listed above were evaluated in some studies.
Both IgA and IgG antibodies were evaluated for the serologic tests listed above.
Outcomes of Interest
Sensitivity
Specificity
Positive and negative likelihood ratios
Diagnostic odds ratio (OR)
Area under the sROC curve (AUC)
Small bowel biopsy was used as the gold standard in order to estimate the sensitivity and specificity of each serologic test.
Statistical Analysis
Pooled estimates of sensitivity, specificity and diagnostic odds ratios (DORs) for the different serologic tests were calculated using a bivariate, binomial generalized linear mixed model. Statistical significance for differences in sensitivity and specificity between serologic tests was defined by P values less than 0.05, where “false discovery rate” adjustments were made for multiple hypothesis testing. The bivariate regression analyses were performed using SAS version 9.2 (SAS Institute Inc.; Cary, NC, USA). Using the bivariate model parameters, summary receiver operating characteristic (sROC) curves were produced using Review Manager 5.0.22 (The Nordic Cochrane Centre, The Cochrane Collaboration, 2008). The area under the sROC curve (AUC) was estimated within a bivariate mixed-effects binary regression modeling framework. Model specification, estimation, and prediction were carried out with xtmelogit in Stata release 10 (StataCorp, 2007). Statistical tests for the differences in AUC estimates could not be carried out.
The study results were stratified according to patient or disease characteristics, such as age and severity of Marsh grade abnormalities, among others, if reported in the studies. The literature indicates that the diagnostic accuracy of serologic tests for celiac disease may be affected in patients with chronic liver disease; therefore, the studies identified through the systematic literature review that evaluated the diagnostic accuracy of serologic tests for celiac disease in patients with chronic liver disease were summarized. The effect of the GFD in patients diagnosed with celiac disease was also summarized if reported in the studies eligible for the analysis.
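As a companion to the Statistical Analysis description above, the sketch below shows, under assumed parameter values rather than the MAS estimates, how a summary ROC curve and its AUC can be traced from the fitted means, variance, and covariance of logit(sensitivity) and logit(1 − specificity) produced by a bivariate model.

```python
# Minimal sketch (assumed parameter values, not the MAS estimates) of deriving a
# summary ROC curve and its AUC from bivariate-model parameters on the logit scale.
import numpy as np

def expit(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical bivariate random-effects estimates.
mu_sens, mu_fpr = 2.5, -3.0     # mean logit(sensitivity), mean logit(1 - specificity)
var_fpr, cov = 0.6, 0.3         # between-study variance of logit(fpr), covariance

# sROC: expected logit(sensitivity) given logit(fpr), from the fitted covariance structure.
fpr_grid = np.linspace(0.001, 0.999, 500)
logit_fpr = np.log(fpr_grid / (1 - fpr_grid))
sens_curve = expit(mu_sens + (cov / var_fpr) * (logit_fpr - mu_fpr))

# AUC by trapezoidal integration over the false-positive-rate axis.
auc = np.sum(np.diff(fpr_grid) * (sens_curve[1:] + sens_curve[:-1]) / 2)
print(f"summary sensitivity {expit(mu_sens):.3f}, "
      f"summary specificity {1 - expit(mu_fpr):.3f}, sROC AUC ~ {auc:.3f}")
```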
Summary of Findings
Published Systematic Reviews
Five systematic reviews of studies that evaluated the diagnostic accuracy of serologic celiac disease tests were identified through our literature search. Seventeen individual studies in adults and children were eligible for this evaluation.
In general, the studies included evaluated the sensitivity and specificity of at least one serologic test in subjects with symptoms consistent with celiac disease. The gold standard used to confirm the celiac disease diagnosis was small bowel biopsy. Serologic tests evaluated included tTG, EMA, AGA, and DGP, using either IgA or IgG antibodies. Indirect immunofluorescence was used for the EMA serologic tests, whereas enzyme-linked immunosorbent assay (ELISA) was used for the other serologic tests.
Common symptoms described in the studies were chronic diarrhea, abdominal pain, bloating, unexplained weight loss, unexplained anemia, and dermatitis herpetiformis.
The main conclusions of the published systematic reviews are summarized below.
IgA tTG and/or IgA EMA have a high accuracy (pooled sensitivity: 90% to 98%, pooled specificity: 95% to 99% depending on the pooled analysis).
Most reviews found that AGA (IgA or IgG) are not as accurate as IgA tTG and/or EMA tests.
A 2009 systematic review concluded that DGP (IgA or IgG) seems to have an accuracy similar to that of tTG; however, since only 2 of the studies identified evaluated its accuracy, the authors believe that additional data are required to draw firm conclusions.
Two systematic reviews also concluded that combining two serologic celiac disease tests contributes little to the accuracy of the diagnosis.
MAS Analysis
Sensitivity
The pooled analysis performed by MAS showed that IgA tTG has a sensitivity of 92.1% [95% confidence interval (CI) 88.0, 96.3], compared to 89.2% (83.3, 95.1, p=0.12) for IgA DGP, 85.1% (79.5, 94.4, p=0.07) for IgA EMA, and 74.9% (63.6, 86.2, p=0.0003) for IgA AGA. Among the IgG-based tests, the results suggest that IgG DGP has a sensitivity of 88.4% (95% CI: 82.1, 94.6), compared to 44.7% (30.3, 59.2) for IgG tTG and 69.1% (56.0, 82.2) for IgG AGA. The difference was significant when IgG DGP was compared to IgG tTG but not to IgG AGA. Combining serologic celiac disease tests yielded a slightly higher sensitivity than the individual IgA-based serologic tests.
IgA deficiency
The prevalence of total or severe IgA deficiency was low in the studies identified, varying between 0 and 1.7%, as reported in 3 studies in which IgA deficiency was not used as a referral indication for celiac disease serologic testing. The results of IgG-based serologic tests were positive in all patients with IgA deficiency in whom celiac disease was confirmed by small bowel biopsy, as reported in four studies.
Specificity
The MAS pooled analysis indicates a high specificity across the different serologic tests, including the combination strategy; pooled estimates ranged from 90.1% to 98.7% depending on the test.
Likelihood Ratios
According to the likelihood ratio estimates, both IgA tTG and the serologic test combinations were considered very useful tests (positive likelihood ratio above ten and negative likelihood ratio below 0.1).
Moderately useful tests included IgA EMA, IgA DGP, and IgG DGP (positive likelihood ratio between five and ten and the negative likelihood ratio between 0.1 and 0.2).
Somewhat useful tests: IgA AGA, IgG AGA, generating small but sometimes important changes from pre- to post-test probability (positive LR between 2 and 5 and negative LR between 0.2 and 0.5)
Not Useful: IgG tTG, altering pre- to post-test probability to a small and rarely important degree (positive LR between 1 and 2 and negative LR between 0.5 and 1).
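The likelihood ratio categories above follow directly from sensitivity and specificity. The short sketch below, using hypothetical values rather than the pooled MAS estimates, shows how LR+ and LR− are computed and how they move a pre-test probability to a post-test probability on the odds scale.

```python
# Small sketch (hypothetical numbers, not the MAS estimates) of likelihood ratios
# and the resulting pre- to post-test probability update via Bayes' theorem.
def likelihood_ratios(sens, spec):
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return lr_pos, lr_neg

def post_test_prob(pre_test_prob, lr):
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

sens, spec = 0.92, 0.97              # illustrative values for an IgA-based test
lr_pos, lr_neg = likelihood_ratios(sens, spec)
for pre in (0.05, 0.30):             # e.g., general practice vs. referral setting
    print(f"pre-test {pre:.0%}: "
          f"post-test if positive {post_test_prob(pre, lr_pos):.1%}, "
          f"post-test if negative {post_test_prob(pre, lr_neg):.2%}")
```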
Diagnostic Odds Ratios (DOR)
Among the individual serologic tests, IgA tTG had the highest DOR, 136.5 (95% CI: 51.9, 221.2). The statistical significance of the difference in DORs among tests was not calculated, however, considering the wide confidence intervals obtained, the differences may not be statistically significant.
Area Under the sROC Curve (AUC)
The sROC AUCs obtained ranged between 0.93 and 0.99 for most IgA-based tests with the exception of IgA AGA, with an AUC of 0.89.
Sensitivity and Specificity of Serologic Tests According to Age Groups
Serologic test accuracy did not seem to vary according to age (adults or children).
Sensitivity and Specificity of Serologic Tests According to Marsh Criteria
Four studies observed a trend towards a higher sensitivity of serologic celiac disease tests when Marsh 3c grade abnormalities were found in the small bowel biopsy compared to Marsh 3a or 3b (statistical significance not reported). The sensitivity of serologic tests was much lower when Marsh 1 grade abnormalities were found in the small bowel biopsy compared to Marsh 3 grade abnormalities. The statistical significance of these findings was not reported in the studies.
Diagnostic Accuracy of Serologic Celiac Disease Tests in Subjects with Chronic Liver Disease
A total of 14 observational studies that evaluated the specificity of serologic celiac disease tests in subjects with chronic liver disease were identified. All studies evaluated the frequency of false positive results (1-specificity) of IgA tTG, however, IgA tTG test kits using different substrates were used, i.e., human recombinant, human, and guinea-pig substrates. The gold standard, small bowel biopsy, was used to confirm the result of the serologic tests in only 5 studies. The studies do not seem to have been designed or powered to compare the diagnostic accuracy among different serologic celiac disease tests.
The results of the studies identified in the systematic literature review suggest that there is a trend towards a lower frequency of false positive results if the IgA tTG test using human recombinant substrate is used compared to the guinea pig substrate in subjects with chronic liver disease. However, the statistical significance of the difference was not reported in the studies. When IgA tTG with human recombinant substrate was used, the number of false positives seems to be similar to what was estimated in the MAS pooled analysis for IgA-based serologic tests in a general population of patients. These results should be interpreted with caution since most studies did not use the gold standard, small bowel biopsy, to confirm or exclude the diagnosis of celiac disease, and since the studies were not designed to compare the diagnostic accuracy among different serologic tests. The sensitivity of the different serologic tests in patients with chronic liver disease was not evaluated in the studies identified.
Effects of a Gluten-Free Diet (GFD) in Patients Diagnosed with Celiac Disease
Six studies identified evaluated the effects of GFD on clinical, histological, or serologic improvement in patients diagnosed with celiac disease. Improvement was observed in 51% to 95% of the patients included in the studies.
Grading of Evidence
Overall, the quality of the evidence ranged from moderate to very low depending on the serologic celiac disease test. Reasons to downgrade the quality of the evidence included the use of a surrogate endpoint (diagnostic accuracy), since none of the studies evaluated clinical outcomes, inconsistencies among study results, imprecise estimates, and sparse data. The quality of the evidence was considered moderate for IgA tTG and IgA EMA, low for IgA DGP and serologic test combinations, and very low for IgA AGA.
Clinical Validity and Clinical Utility of Serologic Testing in the Diagnosis of Celiac Disease
The clinical validity of serologic tests in the diagnosis of celiac disease was considered high in subjects with symptoms consistent with this disease due to:
High accuracy of some serologic tests.
Serologic tests detect possible celiac disease cases and avoid unnecessary small bowel biopsy if the test result is negative, unless an endoscopy/small bowel biopsy is necessary due to the clinical presentation.
Serologic tests support the results of small bowel biopsy.
The clinical utility of serologic tests for the diagnosis of celiac disease, as defined by their impact on decision making, was also considered high in subjects with symptoms consistent with this disease, given the considerations listed above and since a celiac disease diagnosis leads to treatment with a gluten-free diet.
Economic Analysis
A decision analysis was constructed to compare costs and outcomes between the tests based on the sensitivity, specificity and prevalence summary estimates from the MAS Evidence-Based Analysis (EBA). A budget impact was then calculated by multiplying the expected costs and volumes in Ontario. The outcome of the analysis was expected costs and false negatives (FN). Costs were reported in 2010 CAD$. All analyses were performed using TreeAge Pro Suite 2009.
Four strategies made up the efficiency frontier: IgG tTG, IgA tTG, EMA, and small bowel biopsy. All other strategies were dominated. IgG tTG was the least costly and least effective strategy ($178.95, FN avoided=0). Small bowel biopsy was the most costly and most effective strategy ($396.60, FN avoided=0.1553). The costs per FN avoided were $293, $369, and $1,401 for EMA, IgA tTG, and small bowel biopsy, respectively. One-way sensitivity analyses did not change the ranking of strategies.
All strategies that combine serologic testing with small bowel biopsy are cheaper than biopsy alone; however, they also result in more FNs. The most cost-effective strategy will depend on the decision makers’ willingness to pay. Findings suggest that IgA tTG was the most cost-effective and feasible strategy based on its incremental cost-effectiveness ratio (ICER) and the convenience of conducting the test. The potential impact of the IgA tTG test in the province of Ontario would be $10.4M, $11.0M, and $11.7M, respectively, in the following three years, based on past volumes and trends in the province and base-case expected costs.
The panel of tests is the strategy commonly used in the province of Ontario; therefore, the impact to the system would be $13.6M, $14.5M, and $15.3M, respectively, in the next three years, based on past volumes and trends in the province and base-case expected costs.
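One plausible reading of the cost-per-FN-avoided figures above is that each strategy is compared against the least costly one (IgG tTG); the biopsy figure is consistent with that reading. The sketch below illustrates the calculation; the EMA and IgA tTG cost and FN values are assumed for illustration, since only the IgG tTG and biopsy figures are reported in this summary.

```python
# Toy sketch of cost per false negative (FN) avoided relative to the least costly
# strategy. EMA and IgA tTG values are assumed; only IgG tTG and biopsy figures
# come from the summary above.
reference = ("IgG tTG", 178.95, 0.0000)     # (name, expected cost in 2010 CAD$, FN avoided)
others = [
    ("EMA", 210.00, 0.1060),                # assumed
    ("IgA tTG", 220.00, 0.1113),            # assumed
    ("Small bowel biopsy", 396.60, 0.1553),
]

ref_cost, ref_fn = reference[1], reference[2]
for name, cost, fn_avoided in others:
    icer = (cost - ref_cost) / (fn_avoided - ref_fn)
    print(f"{name}: cost per additional FN avoided vs. {reference[0]} ~ ${icer:,.0f}")
```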
Conclusions
The clinical validity and clinical utility of serologic tests for celiac disease was considered high in subjects with symptoms consistent with this disease as they aid in the diagnosis of celiac disease and some tests present a high accuracy.
The study findings suggest that IgA tTG is the most accurate and the most cost-effective test.
The IgA AGA test has a lower accuracy compared to other IgA-based tests.
Serologic test combinations appear to be more costly with little gain in accuracy. In addition there may be problems with generalizability of the results of the studies included in this review if different test combinations are used in clinical practice.
IgA deficiency seems to be uncommon in patients diagnosed with celiac disease.
The generalizability of study results is contingent on performing both the serologic test and small bowel biopsy in subjects on a gluten-containing diet as was the case in the studies identified, since the avoidance of gluten may affect test results.
PMCID: PMC3377499  PMID: 23074399
5.  Covariate adjustment in estimating the area under ROC curve with partially missing gold standard 
Biometrics  2013;69(1):91-100.
Summary
In ROC analysis, covariate adjustment is advocated when the covariates impact the magnitude or accuracy of the test under study. Meanwhile, for many large scale screening tests, the true condition status may be subject to missingness because it is expensive and/or invasive to ascertain the disease status. The complete-case analysis may end up with a biased inference, also known as “verification bias”. To address the issue of covariate adjustment with verification bias in ROC analysis, we propose several estimators for the area under the covariate-specific and covariate-adjusted ROC curves (AUCx and AAUC). The AUCx is directly modelled in the form of binary regression, and the estimating equations are based on the U statistics. The AAUC is estimated from the weighted average of AUCx over the covariate distribution of the diseased subjects. We employ reweighting and imputation techniques to overcome the verification bias problem. Our proposed estimators are initially derived assuming that the true disease status is missing at random (MAR), and then with some modification, the estimators can be extended to the not-missing-at-random (NMAR) situation. The asymptotic distributions are derived for the proposed estimators. The finite sample performance is evaluated by a series of simulation studies. Our method is applied to a data set in Alzheimer's disease research.
doi:10.1111/biom.12001
PMCID: PMC3622116  PMID: 23410529
Alzheimer's disease; area under ROC curve; covariate adjustment; U statistics; verification bias; weighted estimating equations
6.  Direct Estimation of the Area Under the Receiver Operating Characteristic Curve in the Presence of Verification Bias 
Statistics in medicine  2009;28(3):361-376.
SUMMARY
The area under a receiver operating characteristic (ROC) curve (AUC) is a commonly used index for summarizing the ability of a continuous diagnostic test to discriminate between healthy and diseased subjects. If all subjects have their true disease status verified, one can directly estimate the AUC nonparametrically using the Wilcoxon statistic. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Because estimators of the AUC based only on verified subjects are typically biased, it is common to estimate the AUC from a bias-corrected ROC curve. The variance of the estimator, however, does not have a closed-form expression and thus resampling techniques are used to obtain an estimate. In this paper, we develop a new method for directly estimating the AUC in the setting of verification bias based on U-statistics and inverse probability weighting. Closed-form expressions for the estimator and its variance are derived. We also show that the new estimator is equivalent to the empirical AUC derived from the bias-corrected ROC curve arising from the inverse probability weighting approach.
doi:10.1002/sim.3388
PMCID: PMC2626141  PMID: 18680124
Diagnostic test; Inverse probability weighting; Missing at random; U-statistic
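The estimator described in this abstract combines a Wilcoxon-type U-statistic with inverse probability weighting. The sketch below, with simulated data and an assumed verification model rather than the authors' code, contrasts an IPW-weighted AUC with the naive complete-case AUC.

```python
# Minimal sketch of an inverse-probability-weighted Wilcoxon/U-statistic AUC when
# disease status is verified only for a subset (MAR given test result and covariate).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 4000
x = rng.normal(size=n)
d = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + x))))
y = rng.normal(loc=1.2 * d, scale=1.0)                       # continuous test result
p_verify = 1 / (1 + np.exp(-(-0.5 + 1.5 * (y > 0) + 0.5 * x)))
v = rng.binomial(1, p_verify)

# Estimate verification probabilities (the weights' denominator) from observed data.
features = np.column_stack([y, x])
pi_hat = LogisticRegression().fit(features, v).predict_proba(features)[:, 1]

# Weighted Mann-Whitney statistic over verified diseased/non-diseased pairs.
ver = v == 1
w, y_v, d_v = 1.0 / pi_hat[ver], y[ver], d[ver]
w1, y1 = w[d_v == 1], y_v[d_v == 1]
w0, y0 = w[d_v == 0], y_v[d_v == 0]
pair_w = np.outer(w1, w0)
concordant = (y1[:, None] > y0[None, :]) + 0.5 * (y1[:, None] == y0[None, :])
auc_ipw = np.sum(pair_w * concordant) / np.sum(pair_w)

# Naive complete-case AUC for comparison (typically biased under verification bias).
auc_naive = concordant.mean()
print(f"IPW AUC ~ {auc_ipw:.3f}, complete-case AUC ~ {auc_naive:.3f}")
```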
7.  Semiparametric estimation of the covariate-specific ROC curve in presence of ignorable verification bias 
Biometrics  2011;67(3):906-916.
Summary
Covariate-specific ROC curves are often used to evaluate the classification accuracy of a medical diagnostic test or a biomarker when the accuracy of the test is associated with certain covariates. In many large-scale screening tests, the gold standard is subject to missingness due to high cost or harmfulness to the patient. In this paper, we propose a semiparametric estimation of the covariate-specific ROC curves with a partially missing gold standard. A location-scale model is constructed for the test result to model the covariates’ effect, but the residual distributions are left unspecified. Thus the baseline and link functions of the ROC curve both have flexible shapes. With the gold standard missing at random (MAR) assumption, we consider weighted estimating equations for the location-scale parameters, and weighted kernel estimating equations for the residual distributions. Three ROC curve estimators are proposed and compared, namely, imputation-based, inverse probability weighted, and doubly robust estimators. We derive the asymptotic normality of the estimated ROC curve, as well as the analytical form of the standard error estimator. The proposed method is motivated by and applied to data from Alzheimer's disease research.
doi:10.1111/j.1541-0420.2011.01562.x
PMCID: PMC3596883  PMID: 21361890
Alzheimer's disease; covariate-specific ROC curve; ignorable missingness; verification bias; weighted estimating equations
8.  Diagnosing Severe Falciparum Malaria in Parasitaemic African Children: A Prospective Evaluation of Plasma PfHRP2 Measurement 
PLoS Medicine  2012;9(8):e1001297.
Arjen Dondorp and colleagues investigate whether the plasma level of Plasmodium falciparum histidine-rich protein 2 can be used to distinguish between severe malaria and other severe febrile illness in African children with malaria.
Background
In African children, distinguishing severe falciparum malaria from other severe febrile illnesses with coincidental Plasmodium falciparum parasitaemia is a major challenge. P. falciparum histidine-rich protein 2 (PfHRP2) is released by mature sequestered parasites and can be used to estimate the total parasite burden. We investigated the prognostic significance of plasma PfHRP2 and used it to estimate the malaria-attributable fraction in African children diagnosed with severe malaria.
Methods and Findings
Admission plasma PfHRP2 was measured prospectively in African children (from Mozambique, The Gambia, Kenya, Tanzania, Uganda, Rwanda, and the Democratic Republic of the Congo) aged 1 month to 15 years with severe febrile illness and a positive P. falciparum lactate dehydrogenase (pLDH)-based rapid test in a clinical trial comparing parenteral artesunate versus quinine (the AQUAMAT trial, ISRCTN 50258054). In 3,826 severely ill children, Plasmodium falciparum PfHRP2 was higher in patients with coma (p = 0.0209), acidosis (p<0.0001), and severe anaemia (p<0.0001). Admission geometric mean (95%CI) plasma PfHRP2 was 1,611 (1,350–1,922) ng/mL in fatal cases (n = 381) versus 1,046 (991–1,104) ng/mL in survivors (n = 3,445, p<0.0001), without differences in parasitaemia as assessed by microscopy. There was a U-shaped association between log10 plasma PfHRP2 and risk of death. Mortality increased 20% per log10 increase in PfHRP2 above 174 ng/mL (adjusted odds ratio [AOR] 1.21, 95%CI 1.05–1.39, p = 0.009). A mechanistic model assuming a PfHRP2-independent risk of death in non-malaria illness closely fitted the observed data and showed malaria-attributable mortality less than 50% with plasma PfHRP2≤174 ng/mL. The odds ratio (OR) for death in artesunate versus quinine-treated patients was 0.61 (95%CI 0.44–0.83, p = 0.0018) in the highest PfHRP2 tertile, whereas there was no difference in the lowest tertile (OR 1.05; 95%CI 0.69–1.61; p = 0.82). A limitation of the study is that some conclusions are drawn from a mechanistic model, which is inherently dependent on certain assumptions. However, a sensitivity analysis of the model indicated that the results were robust to a plausible range of parameter estimates. Further studies are needed to validate our findings.
Conclusions
Plasma PfHRP2 has prognostic significance in African children with severe falciparum malaria and provides a tool to stratify the risk of “true” severe malaria-attributable disease as opposed to other severe illnesses in parasitaemic African children.
Please see later in the article for the Editors' Summary.
Editors' Summary
Background
Malaria is a life-threatening disease caused by parasites that are transmitted to people through the bites of infected mosquitoes. In 2010, malaria caused an estimated 655,000 deaths worldwide, mostly in Africa, where according to the World Health Organization, one African child dies every minute from the disease. There are four Plasmodium parasite species that cause malaria in humans, with one species, Plasmodium falciparum, causing the most severe disease. However, diagnosing severe falciparum malaria in children living in endemic areas is problematic, as many semi-immune children may have the malaria parasites in their blood (described as being parasitaemic) but do not have clinical disease. Therefore, a positive malaria blood smear may be coincidental and not be diagnostic of severe malaria, and unfortunately, neither are the clinical symptoms of severe malaria, such as shock, acidosis, or coma, which can also be caused by other childhood infections. For these reasons, the misdiagnosis of falciparum malaria in severely ill children is an important problem in sub-Saharan Africa, and may result in unnecessary child deaths.
Why Was This Study Done?
Previous studies have suggested that a parasite protein—P. falciparum histidine-rich protein-2 (PfHRP2)—is a measure of the total number of parasites in the patient. Unlike the circulating parasites detected on a blood film, which do not represent the parasites that get stuck in vital organs, PfHRP2 is distributed equally through the total blood plasma volume, and so can be considered a measure of the total parasite burden in the previous 48 hours. So in this study, the researchers assessed the prognostic value of plasma PfHRP2 in African children with severe malaria and whether this protein could distinguish children who really do have severe malaria from those who have severe febrile illness but coincidental parasitaemia, who may have another infection.
What Did the Researchers Do and Find?
The researchers assessed levels of plasma PfHRP2 in 3,826 out of a possible 5,425 African children who participated in a large multinational trial (in Mozambique, The Gambia, Rwanda, Tanzania, Kenya, Uganda, and the Democratic Republic of Congo) that compared the anti-malarial drugs quinine and artesunate for the treatment of severe malaria. All children had a clinical diagnosis of severe malaria confirmed by a rapid diagnostic test, and the researchers used clinical signs to define the severity of malaria. The researchers assessed the relationship between plasma PfHRP2 concentrations and risk of death taking other well established predictors of death, such as coma, convulsions, hypoglycaemia, respiratory distress, and shock, into account.
The researchers found that PfHRP2 was detectable in 3,800/3,826 (99%) children with severe malaria and that the average plasma PfHRP2 level was significantly higher in the 381 children who died from malaria than in children who survived (1,611 ng/mL versus 1,046 ng/mL). Plasma PfHRP2 was also significantly higher in children with severe malaria signs and symptoms such as coma, acidosis, and severe anaemia. Importantly, the researchers found that high death rates were associated with either very low or very high values of plasma PfHRP2: the odds (chance) of death were 20% higher per log10 unit increase in PfHRP2 above a specific threshold (174 ng/mL), but below this concentration, the risk of death increased with decreasing levels, probably because at lower levels disease was caused by a severe febrile disease other than malaria, such as septicemia. Finally, the researchers found that in children within the highest PfHRP2 tertile, the odds ratio for death when treated with the antimalarial drug artesunate versus quinine was 0.61, but that there was no difference in death rates in the lowest tertile, which supports the conclusion that patients with very low plasma PfHRP2 have a severe febrile illness other than malaria. The researchers used mathematical modeling to provide cut-off values for plasma PfHRP2 denoting the proportion of patients with a diagnosis other than severe malaria.
What Do These Findings Mean?
These findings suggest that in areas of moderate or high malaria transmission, where a high proportion of children are parasitaemic, plasma PfHRP2 levels taken on admission to hospital can differentiate children at highest risk of death from severe falciparum malaria from those likely to have alternative causes of severe febrile illness. Therefore, plasma PfHRP2 could be considered a valuable additional diagnostic tool and prognostic indicator in African children with severe falciparum malaria. This finding is important for clinicians treating children with severe febrile illnesses in malaria-endemic countries: while a high level of plasma PfHRP2 is indicative of severe malaria, which needs urgent antimalarial treatment, a low level suggests that another severe infective disease should be considered, warranting additional investigations and urgent treatment with antibiotics.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001297.
A previous small study in PLOS ONE explores the relationship between plasma PfHRP2 and severe malaria in Tanzanian children
The WHO website and the website of Malaria No More have comprehensive information about malaria
doi:10.1371/journal.pmed.1001297
PMCID: PMC3424256  PMID: 22927801
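To make the reported effect size concrete, the sketch below applies the quoted adjusted odds ratio of 1.21 per log10 increase above 174 ng/mL to an assumed baseline mortality; the baseline value and the example concentrations are illustrative only.

```python
# Small sketch translating an odds ratio of 1.21 per log10-unit increase in PfHRP2
# above 174 ng/mL into predicted mortality; baseline mortality is assumed.
import numpy as np

aor_per_log10 = 1.21
threshold = 174.0                      # ng/mL
baseline_mortality = 0.08              # assumed mortality at the threshold

base_odds = baseline_mortality / (1 - baseline_mortality)
for pfhrp2 in (174, 500, 1600, 5000):
    log10_units_above = max(0.0, np.log10(pfhrp2 / threshold))
    odds = base_odds * aor_per_log10 ** log10_units_above
    print(f"PfHRP2 {pfhrp2:>5} ng/mL: predicted mortality ~ {odds / (1 + odds):.1%}")
```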
9.  Optimal cut-off criteria for duplex ultrasound for the diagnosis of restenosis in stented carotid arteries: Review and protocol for a diagnostic study 
BMC Neurology  2009;9:36.
Background
Carotid angioplasty with stenting is a relatively new, increasingly used, less invasive treatment for symptomatic carotid artery stenosis. It is being evaluated in ongoing and nearly finished randomized trials. An important factor in the evaluation of stents is the occurrence of in-stent restenosis. An un-stented carotid artery is likely to have a more elastic vessel wall than a stented one, even if stenosis is present. Therefore, duplex ultrasound cut-off criteria for the degrees of an in-stent stenosis, based on blood velocity parameters, are probably different from the established cut-offs used for un-stented arteries. Routine criteria cannot be applied to stented arteries; new criteria need to be established for this particular purpose.
Methods/Design
Current literature was systematically reviewed. From the selected studies, the following data were extracted: publication year, population size, whether the study was prospective, the duplex ultrasound cut-off criteria reported, which reference test was used, and whether there was an indication of selection bias, and of verification bias in particular.
Results
In general, the velocity cut-off values for stenosis measurements in stented arteries were higher than those reported for unstented arteries. Previous studies often were retrospective, or the reference test (DSA or CTA) was carried out only when a patient was suspected of having restenosis at DUS, which may result in verification bias.
Discussion
To address the deficiencies of the existing studies, we propose a prospective cohort study nested within the International Carotid Stenting Study (ICSS), an international multi-centre trial in which over 1,700 patients have been randomised between stenting and CEA. In this cohort we will enrol a minimum of 300 patients treated with a stent. All patients undergo regular DUS examination at the yearly follow-up visit according to the ICSS protocol. To avoid verification bias, an additional computed tomography angiography (CTA) will be performed as a reference test in all consecutive patients, regardless of the degree of stenosis on the initial DUS test.
doi:10.1186/1471-2377-9-36
PMCID: PMC2722571  PMID: 19624830
10.  A systematic approach to statistical analysis in dosimetry and patient-specific IMRT plan verification measurements 
Purpose
In the presence of random uncertainties, delivered radiation treatment doses in a patient likely exhibit a statistical distribution. The expected dose and variance of this distribution are unknown and are most likely not equal to the planned values, since current treatment planning systems cannot exactly model and simulate the treatment machine. Relevant clinical questions are 1) how to quantitatively estimate the expected delivered dose and extrapolate the expected dose to the treatment dose over a treatment course, and 2) how to evaluate the treatment dose relative to the corresponding planned dose. This study presents a systematic approach to address these questions and applies it to patient-specific IMRT (PSIMRT) plan verification.
Methods
The expected delivered dose in the patient and its variance are quantitatively estimated using the Student t distribution and the chi distribution, respectively, based on pre-treatment QA measurements. Relationships between the expected dose and the delivered dose over a treatment course, and between the expected dose and the planned dose, are quantified with mathematical formalisms. The requirement and evaluation of the pre-treatment QA measurement results are also quantitatively related to the desired treatment accuracy and to the to-be-delivered treatment course itself. The developed methodology was applied to PSIMRT plan verification procedures for both QA result evaluation and treatment quality estimation.
Results
Statistically, the pre-treatment QA measurement process was dictated not only by the corresponding plan but also by the delivered dose deviation, number of measurements, treatment fractionation, potential uncertainties during patient treatment, and desired treatment accuracy tolerance. For the PSIMRT QA procedures, in theory, more than one measurement had to be performed to evaluate whether the to-be-delivered treatment course would meet the desired dose coverage and treatment tolerance.
Conclusion
By acknowledging and considering the statistical nature of multi-fractional delivery of radiation treatment, we have established a quantitative methodology to evaluate the PSIMRT QA results. Both the statistical parameters associated with the QA measurement procedure and treatment course need to be taken into account to evaluate the QA outcome and to determine whether the plan is acceptable and whether additional measures should be taken to reduce treatment uncertainties. The result from a single QA measurement without the appropriate statistical analysis can be misleading. When the required number of measurements is comparable to the planned number of fractions and the variance is unacceptably high, action must be taken to either modify the plan or adjust the beam delivery system.
doi:10.1186/1748-717X-8-225
PMCID: PMC3852372  PMID: 24074185
Dose measurement; IMRT QA; Uncertainty; Statistical analysis
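The abstract above describes interval estimation of the expected delivered dose and its spread from repeated pre-treatment QA measurements. The sketch below is a generic illustration of that idea, not the authors' formalism: a Student-t interval for the mean dose and a chi-square-based interval for the measurement standard deviation, using made-up measurement values.

```python
# Generic sketch (illustrative numbers, not the authors' formalism): t-based interval
# for the expected dose and chi-square-based interval for its standard deviation.
import numpy as np
from scipy import stats

measured_dose = np.array([2.01, 1.98, 2.03, 1.99, 2.02])   # Gy per fraction, assumed
n = measured_dose.size
mean, sd = measured_dose.mean(), measured_dose.std(ddof=1)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
mean_ci = (mean - t_crit * sd / np.sqrt(n), mean + t_crit * sd / np.sqrt(n))

chi2_lo = stats.chi2.ppf(alpha / 2, df=n - 1)
chi2_hi = stats.chi2.ppf(1 - alpha / 2, df=n - 1)
sd_ci = (sd * np.sqrt((n - 1) / chi2_hi), sd * np.sqrt((n - 1) / chi2_lo))

print(f"expected dose {mean:.3f} Gy, 95% CI ({mean_ci[0]:.3f}, {mean_ci[1]:.3f})")
print(f"measurement SD {sd:.3f} Gy, 95% CI ({sd_ci[0]:.3f}, {sd_ci[1]:.3f})")
```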
11.  Disease Models for Event Prediction 
Objective
The objective of this manuscript is to present a systematic review of biosurveillance models that operate on select agents and can forecast the occurrence of a disease event.
Introduction
One of the primary goals of this research was to characterize the viability of biosurveillance models to provide operationally relevant information to decision makers, in order to identify areas for future research. Two critical characteristics differentiate this work from other infectious disease modeling reviews [1,2]. First, we reviewed models that attempted to predict the disease event, not merely its transmission dynamics. Second, we considered models involving pathogens of concern as determined by the US National Select Agent Registry.
Background: A rich and diverse field of infectious disease modeling has emerged over the past 60 years and has advanced our understanding of population- and individual-level disease transmission dynamics, including risk factors, virulence and spatio-temporal patterns of disease spread. Recent modeling advances include biostatistical methods, and massive agent-based population, biophysical, ordinary differential equation, and ecological-niche models. Diverse data sources are being integrated into these models as well, such as demographics, remotely-sensed measurements and imaging, environmental measurements, and surrogate data such as news alerts and social media. Yet, there remains a gap in the sensitivity and specificity of these models not only in tracking infectious disease events but also predicting their occurrence.
Methods
We searched dozens of commercial and government databases and harvested Google search results for eligible models utilizing terms and phrases provided by public health analysts relating to biosurveillance, remote sensing, risk assessments, spatial epidemiology, and ecological niche modeling. This returned 13,767 webpages and 12,152 citations. After de-duplication and removal of extraneous material, a core collection of 6,503 items was established; these publications and their abstracts are presented in a semantic wiki at http://BioCat.pnnl.gov. Next, PNNL’s IN-SPIRE visual analytics software was used to cross-correlate these publications with the definition of a biosurveillance model. As a result, we systematically reviewed 44 papers, and the results are presented in this analysis.
Results
The models were classified as one or more of the following types: event forecast (9%), spatial (59%), ecological niche (64%), diagnostic or clinical (14%), spread or response (20%), and reviews (7%). The distribution of transmission modes in the models was: direct contact (55%), vector-borne (34%), water- or soil-borne (16%), and non-specific (7%). The parameters (e.g., etiology, cultural) and data sources (e.g., remote sensing, NGO, epidemiological) for each model were recorded. A highlight of this review is the analysis of verification and validation procedures employed by (and reported for) each model, if any. All models were classified as either a) Verified or Validated (89%), or b) Not Verified or Validated (11%; which for the purposes of this review was considered a standalone category).
Conclusions
The verification and validation (V&V) of these models is discussed in detail. The vast majority of models studied were verified or validated in some form, which was a surprising observation made from this portion of the study. We subsequently focused on those models that were not verified or validated in an attempt to identify why this information was missing. One reason may be that the V&V was simply not reported in the papers reviewed for those models. A positive observation was the significant use of real epidemiological data to validate the models. Even though ‘Validation using Spatially and Temporally Independent Data’ was one of the smallest classification groups, validation through the use of actual data versus predicted data represented approximately 33% of these models. We close with initial recommended operational readiness level guidelines, based on established Technology Readiness Level definitions.
PMCID: PMC3692832
Disease models; Event prediction; Operational readiness
12.  Estimating the Number of Paediatric Fevers Associated with Malaria Infection Presenting to Africa's Public Health Sector in 2007 
PLoS Medicine  2010;7(7):e1000301.
Peter Gething and colleagues compute the number of fevers likely to present to public health facilities in Africa and the estimated number of these fevers likely to be infected with Plasmodium falciparum malaria parasites.
Background
As international efforts to increase the coverage of artemisinin-based combination therapy in public health sectors gather pace, concerns have been raised regarding their continued indiscriminate presumptive use for treating all childhood fevers. The availability of rapid-diagnostic tests to support practical and reliable parasitological diagnosis provides an opportunity to improve the rational treatment of febrile children across Africa. However, the cost effectiveness of diagnosis-based treatment polices will depend on the presumed numbers of fevers harbouring infection. Here we compute the number of fevers likely to present to public health facilities in Africa and the estimated number of these fevers likely to be infected with Plasmodium falciparum malaria parasites.
Methods and Findings
We assembled first administrative-unit level data on paediatric fever prevalence, treatment-seeking rates, and child populations. These data were combined in a geographical information system model that also incorporated an adjustment procedure for urban versus rural areas to produce spatially distributed estimates of fever burden amongst African children and the subset likely to present to public sector clinics. A second data assembly was used to estimate plausible ranges for the proportion of paediatric fevers seen at clinics positive for P. falciparum in different endemicity settings. We estimated that, of the 656 million fevers in African 0–4 y olds in 2007, 182 million (28%) were likely to have sought treatment in a public sector clinic of which 78 million (43%) were likely to have been infected with P. falciparum (range 60–103 million).
Conclusions
Spatial estimates of childhood fevers and care-seeking rates can be combined with a relational risk model of infection prevalence in the community to estimate the degree of parasitemia in those fevers reaching public health facilities. This quantification provides an important baseline comparison of malarial and nonmalarial fevers in different endemicity settings that can contribute to ongoing scientific and policy debates about optimum clinical and financial strategies for the introduction of new diagnostics. These models are made publicly available with the publication of this paper.
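The estimation chain described above (child population × fever prevalence × treatment-seeking rate × parasite positivity) can be illustrated with a toy calculation; the administrative units and all rates below are invented for illustration and are not the study's inputs.

```python
# Toy sketch of the chained estimation: population x fever prevalence x public-sector
# treatment seeking gives fevers presenting to clinics, and multiplying by local
# P. falciparum positivity gives the likely infected subset. All values are made up.
admin_units = [
    # (name, children 0-4 y, fever prevalence, public-sector seeking rate, P(falciparum | fever))
    ("Unit A", 1_200_000, 0.25, 0.30, 0.55),
    ("Unit B", 800_000, 0.18, 0.40, 0.15),
]

total_presenting = total_infected = 0.0
for name, pop, fever_prev, seek, pf_pos in admin_units:
    presenting = pop * fever_prev * seek
    infected = presenting * pf_pos
    total_presenting += presenting
    total_infected += infected
    print(f"{name}: {presenting:,.0f} fevers presenting, {infected:,.0f} likely P. falciparum")
print(f"Total: {total_presenting:,.0f} presenting, {total_infected:,.0f} infected")
```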
Please see later in the article for the Editors' Summary
Editors' Summary
Background
Malaria, an infectious parasitic disease transmitted to people through the bite of an infected mosquito, kills about one million people (mainly children living in sub-Saharan Africa) every year. Although several parasites cause malaria, Plasmodium falciparum is responsible for most of these deaths. For the past 50 years, the main treatments for P. falciparum malaria have been chloroquine and sulfadoxine/pyrimethamine. Unfortunately, parasitic resistance to these “monotherapies” is now widespread and there has been a global upsurge in the illness and deaths caused by P. falciparum. To combat this increase, the World Health Organization recommends artemisinin combination therapy (ACT) for P. falciparum malaria in all regions with drug-resistant malaria. In ACT, artemisinin derivatives (new, fast-acting antimalarial drugs) are used in combination with another antimalarial to reduce the chances of P. falciparum becoming resistant to either drug.
Why Was This Study Done?
All African countries at risk of P. falciparum have now adopted ACT as first-line therapy for malaria in their public clinics. However, experts are concerned that ACT is often given to children who don't actually have malaria because, in many parts of Africa, health care workers assume that all childhood fevers are malaria. This practice, which became established when diagnostic facilities for malaria were very limited, increases the chances of P. falciparum becoming resistant to ACT, wastes limited drug stocks, and means that many ill children are treated inappropriately. Recently, however, rapid diagnostic tests for malaria have been developed and there have been calls to expand their use to improve the rational treatment of African children with fever. Before such an expansion is initiated, it is important to know how many African children develop fever each year, how many of these ill children attend public clinics, and what proportion of them is likely to have malaria. Unfortunately, this type of information is incompletely or unreliably collected in many parts of Africa. In this study, therefore, the researchers use a mathematical model to estimate the number of childhood fevers associated with malaria infection that presented to Africa's public clinics in 2007 from survey data.
What Did the Researchers Do and Find?
The researchers used survey data on the prevalence (the proportion of a population with a specific disease) of childhood fever and on treatment-seeking behavior and data on child populations to map the distribution of fever among African children and the likelihood of these children attending public clinics for treatment. They then used a recent map of the distribution of P. falciparum infection risk to estimate what proportion of children with fever who attended clinics were likely to have had malaria in different parts of Africa. In 2007, the researchers estimate, 656 million cases of fever occurred in 0–4-year-old African children, 182 million were likely to have sought treatment in a public clinic, and 78 million (just under half of the cases that attended a clinic with fever) were likely to have been infected with P. falciparum. Importantly, there were marked geographical differences in the likelihood of children with fever presenting at public clinics being infected with P. falciparum. So, for example, whereas nearly 60% of the children attending public clinics with fever in Burkina Faso were likely to have had malaria, only 15% of similar children in Kenya were likely to have had this disease.
What Do These Findings Mean?
As with all mathematical models, the accuracy of these findings depends on the assumptions included in the model and on the data fed into it. Nevertheless, these findings provide a map of the prevalence of malarial and nonmalarial childhood fevers across sub-Saharan Africa and an indication of how many of the children with fever reaching public clinics are likely to have malaria and would therefore benefit from ACT. The finding that in some countries more than 80% of children attending public clinics with fever probably don't have malaria highlights the potential benefits of introducing rapid diagnostic testing for malaria. Furthermore, these findings can now be used to quantify the resources needed for and the potential clinical benefits of different policies for the introduction of rapid diagnostic testing for malaria across Africa.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000301.
Information is available from the World Health Organization on malaria (in several languages) and on rapid diagnostic tests for malaria
The US Centers for Disease Control and Prevention provide information on malaria (in English and Spanish)
MedlinePlus provides links to additional information on malaria (in English and Spanish)
Information on the global mapping of malaria is available at the Malaria Atlas Project
Information is available from the Roll Back Malaria Partnership on the global control of malaria (in English and French) and on artemisinin combination therapy
doi:10.1371/journal.pmed.1000301
PMCID: PMC2897768  PMID: 20625548
13.  Estimates of sensitivity and specificity can be biased when reporting the results of the second test in a screening trial conducted in series 
Background
Cancer screening reduces cancer mortality when early detection allows successful treatment of otherwise fatal disease. There are a variety of trial designs used to find the best screening test. In a series screening trial design, the decision to conduct the second test is based on the results of the first test. Thus, the estimates of diagnostic accuracy for the second test are conditional, and may differ from unconditional estimates. The problem is further complicated when some cases are misclassified as non-cases due to incomplete disease status ascertainment.
Methods
For a series design, we assume that the second screening test is conducted only if the first test had negative results. We derive formulae for the conditional sensitivity and specificity of the second test in the presence of differential verification bias. For comparison, we also derive formulae for the sensitivity and specificity for a single test design, both with and without differential verification bias.
Results
Both the series design and differential verification bias have strong effects on estimates of sensitivity and specificity. In both the single test and series designs, differential verification bias inflates estimates of sensitivity and specificity. In general, for the series design, the inflation is smaller than that observed for a single test design.
The degree of bias depends on disease prevalence, the proportion of misclassified cases, and on the correlation between the test results for cases. As disease prevalence increases, the observed conditional sensitivity is unaffected. However, there is an increasing upward bias in observed conditional specificity. As the proportion of correctly classified cases increases, the upward bias in observed conditional sensitivity and specificity decreases. As the agreement between the two screening tests becomes stronger, the upward bias in observed conditional sensitivity decreases, while the specificity bias increases.
Conclusions
In a series design, estimates of sensitivity and specificity for the second test are conditional estimates. These estimates must always be described in the context of the design of the trial and the study population, to prevent misleading comparisons. In addition, these estimates may be biased by incomplete disease status ascertainment.
doi:10.1186/1471-2288-10-3
PMCID: PMC2819240  PMID: 20064254
14.  Rapid Diagnosis of Tuberculosis with the Xpert MTB/RIF Assay in High Burden Countries: A Cost-Effectiveness Analysis 
PLoS Medicine  2011;8(11):e1001120.
A cost-effectiveness study by Frank Cobelens and colleagues reveals that Xpert MTB/RIF is a cost-effective method of tuberculosis diagnosis that is suitable for use in low- and middle-income settings.
Background
Xpert MTB/RIF (Xpert) is a promising new rapid diagnostic technology for tuberculosis (TB) that has characteristics that suggest large-scale roll-out. However, because the test is expensive, there are concerns among TB program managers and policy makers regarding its affordability for low- and middle-income settings.
Methods and Findings
We estimate the impact of the introduction of Xpert on the costs and cost-effectiveness of TB care using decision analytic modelling, comparing the introduction of Xpert to a base case of smear microscopy and clinical diagnosis in India, South Africa, and Uganda. The introduction of Xpert increases TB case finding in all three settings, from 72%–85% to 95%–99% of the cohort of individuals with suspected TB, compared to the base case. Diagnostic costs (including the costs of testing all individuals with suspected TB) also increase: from US$28–US$49 to US$133–US$146 and US$137–US$151 per TB case detected when Xpert is used “in addition to” and “as a replacement of” smear microscopy, respectively. The incremental cost-effectiveness ratios (ICERs) for using Xpert “in addition to” smear microscopy, compared to the base case, range from US$41–$110 per disability-adjusted life year (DALY) averted. Likewise, the ICERs for using Xpert “as a replacement of” smear microscopy range from US$52–$138 per DALY averted. These ICERs are below the World Health Organization (WHO) willingness to pay threshold.
Conclusions
Our results suggest that Xpert is a cost-effective method of TB diagnosis, compared to a base case of smear microscopy and clinical diagnosis of smear-negative TB, in low- and middle-income settings, where its ability to substantially increase case finding gives it important potential for improving TB diagnosis and control. The extent of the cost-effectiveness gain to TB programmes from deploying Xpert is primarily dependent on current TB diagnostic practices. Further work is required during scale-up to validate these findings.
Please see later in the article for the Editors' Summary
Editors' Summary
Background
Tuberculosis (TB) is a bacterial disease that infects one-third of the world's population. The disease is caused by Mycobacterium tuberculosis, a bacterium that most commonly infects the lungs (known as pulmonary TB) and is transmitted from person to person when an infected individual coughs, sneezes, or talks. The symptoms of TB include chest pain, weight loss, fever, and a persistent cough that sometimes contains blood. Only 5%–10% of people who are infected with TB become sick or infectious, but people with weakened immune systems, such as individuals who are HIV-positive, are more likely to develop the disease. TB is estimated to have killed 1.7 million people in 2009 and is currently the leading cause of death among people infected with HIV.
Why Was This Study Done?
Although TB can be treated with a six-month course of antibiotics, effectively diagnosing TB is not always straightforward and drug resistance is becoming an increasing problem. One of the most common and simple methods to diagnose TB is a technique called sputum smear microscopy, which involves examining matter from the lungs under a microscope for the presence of TB-causing bacteria. However, despite being cheap and relatively simple, the test does not always detect active TB (smear-negative) and cannot determine whether the TB-causing bacteria are resistant to antibiotics. The World Health Organization has recently endorsed a new rapid test, called Xpert MTB/RIF (referred to as Xpert), for the initial diagnosis of TB. The test uses DNA amplification methods to reliably and quickly detect TB and whether infecting bacteria are resistant to the antibiotic rifampicin. The new test is expensive so there are concerns that the test might not be cost-effective in low- and middle-income countries.
What Did the Researchers Do and Find?
The researchers used a technique called modeling to simulate the outcome of 10,000 individuals with suspected TB as they went through a hypothetical diagnostic and treatment pathway. The model compared the costs associated with the introduction of Xpert to a base case for two different scenarios. In the base case all individuals with suspected TB had two sputum smear microscopy examinations followed by clinical diagnosis if they were smear-negative. For the different scenarios Xpert was either used in addition to the two sputum smear microscopy examinations (if the patient was smear-negative) or Xpert was used as a replacement for sputum smear microscopy for all patients. Different input parameters, based on country-specific estimates, were applied so that the model reflected the implementation of Xpert in India, South Africa, and Uganda.
In the researchers' model, the introduction of Xpert increased the proportion of TB-infected patients who were correctly diagnosed with TB in all three settings. However, the cost per TB case detected increased by approximately US$100 in both scenarios. Although the cost of detection increased significantly, the cost of treatment increased only moderately because the number of false-positive cases was reduced. For example, the percentage of treatment costs spent on false-positive diagnoses in India was predicted to fall from 22% to 4% when Xpert was used to replace sputum smear microscopy. The model was used to calculate incremental cost-effectiveness ratios (ICERs—the additional cost of each disability-adjusted life year [DALY] averted) for the different scenarios of Xpert implementation in the different settings. In comparison to the base case, introducing Xpert in addition to sputum smear microscopy produced ICERs ranging from US$41 to US$110 per DALY averted, while introducing Xpert instead of sputum smear microscopy yielded ICERs ranging from US$52 to US$138 per DALY averted.
What Do These Findings Mean?
The findings suggest that the implementation of Xpert in addition to, or instead of, sputum smear microscopy will be cost-effective in low- and middle-income countries. The calculated ICERs are below the World Health Organization's “willingness to pay threshold” for all settings. That is, the incremental cost of each DALY averted by the introduction of Xpert is below the gross domestic product per capita for each country ($1,134 for India, $5,786 for South Africa, and $490 for Uganda in 2010). However, the authors note that achieving ICERs below the “willingness to pay threshold” does not necessarily mean that countries have the resources to implement the test. The researchers also note that there are limitations to their study: additional unknown costs associated with the scale-up of Xpert and some parameters, such as patient costs, were not included in the model. Although the model strongly suggests that Xpert will be cost-effective, the researchers caution that initial roll-out of Xpert should be carefully monitored and evaluated before full scale-up.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001120.
The World Health Organization provides information on all aspects of tuberculosis, including tuberculosis diagnostics and the Stop TB Partnership (some information is in several languages)
The US Centers for Disease Control and Prevention has information about tuberculosis, including information on the diagnosis of tuberculosis disease
MedlinePlus has links to further information about tuberculosis (in English and Spanish)
doi:10.1371/journal.pmed.1001120
PMCID: PMC3210757  PMID: 22087078
15.  Evidence of bias and variation in diagnostic accuracy studies 
Background
Studies with methodologic shortcomings can overestimate the accuracy of a medical test. We sought to determine and compare the direction and magnitude of the effects of a number of potential sources of bias and variation in studies on estimates of diagnostic accuracy.
Methods
We identified meta-analyses of the diagnostic accuracy of tests through an electronic search of the databases MEDLINE, EMBASE, DARE and MEDION (1999–2002). We included meta-analyses with at least 10 primary studies without preselection based on design features. Pairs of reviewers independently extracted study characteristics and original data from the primary studies. We used a multivariable meta-epidemiologic regression model to investigate the direction and strength of the effect of 15 study features on estimates of diagnostic accuracy.
Results
We selected 31 meta-analyses with 487 primary studies of test evaluations. Only 1 study had no design deficiencies. The quality of reporting was poor in most of the studies. We found significantly higher estimates of diagnostic accuracy in studies with nonconsecutive inclusion of patients (relative diagnostic odds ratio [RDOR] 1.5, 95% confidence interval [CI] 1.0–2.1) and retrospective data collection (RDOR 1.6, 95% CI 1.1–2.2). The estimates were highest in studies that had severe cases and healthy controls (RDOR 4.9, 95% CI 0.6–37.3). Studies that selected patients based on whether they had been referred for the index test, rather than on clinical symptoms, produced significantly lower estimates of diagnostic accuracy (RDOR 0.5, 95% CI 0.3–0.9). The variance between meta-analyses of the effect of design features was large to moderate for type of design (cohort v. case–control), the use of composite reference standards and the use of differential verification; the variance was close to zero for the other design features.
Interpretation
Shortcomings in study design can affect estimates of diagnostic accuracy, but the magnitude of the effect may vary from one situation to another. Design features and clinical characteristics of patient groups should be carefully considered by researchers when designing new studies and by readers when appraising the results of such studies. Unfortunately, incomplete reporting hampers the evaluation of potential sources of bias in diagnostic accuracy studies.
doi:10.1503/cmaj.050090
PMCID: PMC1373751  PMID: 16477057
16.  On Estimating Diagnostic Accuracy From Studies With Multiple Raters and Partial Gold Standard Evaluation 
We are often interested in estimating sensitivity and specificity of a group of raters or a set of new diagnostic tests in situations in which gold standard evaluation is expensive or invasive. Numerous authors have proposed latent modeling approaches for estimating diagnostic error without a gold standard. Albert and Dodd showed that, when modeling without a gold standard, estimates of diagnostic error can be biased when the dependence structure between tests is misspecified. In addition, they showed that choosing between different models for this dependence structure is difficult in most practical situations. While these results caution against using these latent class models, the difficulties of obtaining gold standard verification remain a practical reality. We extend two classes of models to provide a compromise that collects gold standard information on a subset of subjects but incorporates information from both the verified and nonverified subjects during estimation. We examine the robustness of diagnostic error estimation with this approach and show that choosing between competing models is easier in this context. In our analytic work and simulations, we consider situations in which verification is completely at random as well as settings in which the probability of verification depends on the actual test results. We apply our methodological work to a study designed to estimate the diagnostic error of digital radiography for gastric cancer.
doi:10.1198/016214507000000329
PMCID: PMC2755302  PMID: 19802353
Diagnostic error; Latent class models; Misclassification; Semilatent class models
17.  Bias in trials comparing paired continuous tests can cause researchers to choose the wrong screening modality 
Background
To compare the diagnostic accuracy of two continuous screening tests, a common approach is to test the difference between the areas under the receiver operating characteristic (ROC) curves. After study participants are screened with both screening tests, the disease status is determined as accurately as possible, either by an invasive, sensitive and specific secondary test, or by a less invasive, but less sensitive approach. For most participants, disease status is approximated through the less sensitive approach. The invasive test must be limited to the fraction of the participants whose results on either or both screening tests exceed a threshold of suspicion, or who develop signs and symptoms of the disease after the initial screening tests.
The limitations of this study design lead to a bias in the ROC curves we call paired screening trial bias. This bias reflects the synergistic effects of inappropriate reference standard bias, differential verification bias, and partial verification bias. The absence of a gold reference standard leads to inappropriate reference standard bias. When different reference standards are used to ascertain disease status, it creates differential verification bias. When only suspicious screening test scores trigger a sensitive and specific secondary test, the result is a form of partial verification bias.
Methods
For paired screening tests with bivariate normally distributed scores, we give formulae and programs to quantify the effect of paired screening trial bias on a paired comparison of area under the curves. We fix the prevalence of disease, and the chance a diseased subject manifests signs and symptoms. We derive the formulas for true sensitivity and specificity, and those for the sensitivity and specificity observed by the study investigator.
Results
The observed area under the ROC curves is quite different from the true area under the ROC curves. The typical direction of the bias is a strong inflation in sensitivity, paired with a concomitant slight deflation of specificity.
Conclusion
In paired trials of screening tests, when area under the ROC curve is used as the metric, bias may lead researchers to make the wrong decision as to which screening test is better.
doi:10.1186/1471-2288-9-4
PMCID: PMC2657218  PMID: 19154609
18.  Biosurveillance Ecosystem (BSVE) Workflow Analysis 
Introduction
The Defense Threat Reduction Agency Chemical and Biological Technologies Directorate (DTRA CB) has initiated the Biosurveillance Ecosystem (BSVE) research and development program. Operational biosurveillance capability gaps were analyzed and the required characteristics of new technology were outlined; the results of this analysis are described in this contribution.
Methods
Work process flow diagrams, with associated explanations and historical examples, were developed based on in-person, structured interviews with public health and preventative medicine analysts from a variety of Department of Defense (DoD) organizations, and with one organization in the Department of Health and Human Services (DHHS) and with a major U.S. city health department. The particular nuanced job characteristics of each organization were documented and subsequently validated with the individual analysts. Additionally, the commonalities across different organizations were described in meta-workflow diagrams and descriptions.
Results
Two meta-workflows were evident from the analysis. In the first type, epidemiologists identify and characterize health-impacting events, determine their cause, and determine community-level responses to the event. Analysts of this type monitor information (primarily statistical case information) from syndromic or disease reporting systems or other sources to determine whether there are any unusual diseases or clusters of disease outbreaks in the jurisdiction. This workflow involves three consecutive processes: triage, analysis and reporting. In many circumstances, investigation and response processes for disease outbreaks run in parallel and overlap. In the second meta-workflow type, analysts monitor for a potential health event through text-based sources and data reports within their particular area of responsibility. This surveillance activity is often interspersed with other activities required of their job. They may generate a daily/weekly/monthly report or only report when an event is detected that requires notification/response. This workflow has triage, analysis and reporting stages similar to the first meta-workflow type, but in contrast these analysts are focused on informing leadership and response in the form of policy modification. They must also answer leadership-driven biosurveillance queries.
Conclusions
In these interviews, analysts described the shortcomings of various technologies that they use, or technology features that they wish were available. These can be grouped into the following feature categories:
Data: Analysts want rapid access to all relevant data sources, advisories for data that may be relevant to their interests, and availability of information at the appropriate level for their analysis (e.g., output of interpretations from expert analysts instead of raw data).
Enhanced search: Analysts would like customization of information based on relevance, selective filtering of sources, prioritization of search topics, and the ability to view other analysts' searches.
Verification: Analysts want indications of information that has been verified or discarded by other analysts, a trail of information history and uses, and automatic verification (e.g., data quality editing) if possible.
Analytics: Analysts want access to forecasting models, services to suggest analysis methods, pointers to other analysts’ expertise, methods, and reports, and tools for “big data” exploitation.
Collaboration and communication: Analysts want assistance identifying people who may have needed information, real-time chat, the ability to compare analyses with colleagues, and the ability to shield data, results, or collaborations from selected others.
Archival: Analysts want automation to provide lessons learned, methods and outcomes for related events, the ability to automatically improve baselines with analyzed data, and assistance with reporting on interim analytic decisions.
The current understanding of the biosurveillance analyst's functions and processes, based on the results of these interviews, will continue to evolve as further dialog with analysts is combined with the results of evaluations during subsequent phases of the new BSVE program.
PMCID: PMC3692935
biosurveillance; workflow; collaboration; operations
19.  Erectile Dysfunction Severity as a Risk Marker for Cardiovascular Disease Hospitalisation and All-Cause Mortality: A Prospective Cohort Study 
PLoS Medicine  2013;10(1):e1001372.
In a prospective Australian population-based study linking questionnaire data from 2006–2009 with hospitalisation and death data to June 2010 for 95,038 men aged ≥45 years, Banks and colleagues found that more severe erectile dysfunction was associated with higher risk of cardiovascular disease.
Background
Erectile dysfunction is an emerging risk marker for future cardiovascular disease (CVD) events; however, evidence on dose response and specific CVD outcomes is limited. This study investigates the relationship between severity of erectile dysfunction and specific CVD outcomes.
Methods and Findings
We conducted a prospective population-based Australian study (the 45 and Up Study) linking questionnaire data from 2006–2009 with hospitalisation and death data to 30 June and 31 Dec 2010 respectively for 95,038 men aged ≥45 y. Cox proportional hazards models were used to examine the relationship of reported severity of erectile dysfunction to all-cause mortality and first CVD-related hospitalisation since baseline in men with and without previous CVD, adjusting for age, smoking, alcohol consumption, marital status, income, education, physical activity, body mass index, diabetes, and hypertension and/or hypercholesterolaemia treatment. There were 7,855 incident admissions for CVD and 2,304 deaths during follow-up (mean time from recruitment, 2.2 y for CVD admission and 2.8 y for mortality). Risks of CVD and death increased steadily with severity of erectile dysfunction. Among men without previous CVD, those with severe versus no erectile dysfunction had significantly increased risks of ischaemic heart disease (adjusted relative risk [RR] = 1.60, 95% CI 1.31–1.95), heart failure (8.00, 2.64–24.2), peripheral vascular disease (1.92, 1.12–3.29), “other” CVD (1.26, 1.05–1.51), all CVD combined (1.35, 1.19–1.53), and all-cause mortality (1.93, 1.52–2.44). For men with previous CVD, corresponding RRs (95% CI) were 1.70 (1.46–1.98), 4.40 (2.64–7.33), 2.46 (1.63–3.70), 1.40 (1.21–1.63), 1.64 (1.48–1.81), and 2.37 (1.87–3.01), respectively. Among men without previous CVD, RRs of more specific CVDs increased significantly with severe versus no erectile dysfunction, including acute myocardial infarction (1.66, 1.22–2.26), atrioventricular and left bundle branch block (6.62, 1.86–23.56), and (peripheral) atherosclerosis (2.47, 1.18–5.15), with no significant difference in risk for conditions such as primary hypertension (0.61, 0.16–2.35) and intracerebral haemorrhage (0.78, 0.20–2.97).
Conclusions
These findings give support for CVD risk assessment in men with erectile dysfunction who have not already undergone assessment. The utility of erectile dysfunction as a clinical risk prediction tool requires specific testing.
Please see later in the article for the Editors' Summary
Editors' Summary
Background
Erectile dysfunction is the medical term used when a man is unable to achieve or sustain an erection of his penis suitable for sexual intercourse. Although a sensitive topic that can cause much embarrassment and distress, erectile dysfunction is very common, with an estimated 40% of men over the age of 40 years experiencing frequent or occasional difficulties. The most common causes of erectile dysfunction are medications, chronic illnesses such as diabetes, and drinking too much alcohol. Stress and mental health problems can also cause or worsen erectile dysfunction. There is also increasing evidence that erectile dysfunction may actually be a symptom of cardiovascular disease—a leading cause of death worldwide—as erectile dysfunction could indicate a problem with blood vessels or poor blood flow commonly associated with cardiovascular disease.
Why Was This Study Done?
Although previous studies have suggested that erectile dysfunction can serve as a marker for cardiovascular disease in men not previously diagnosed with the condition, few studies to date have investigated whether erectile dysfunction could also indicate worsening disease in men already diagnosed with cardiovascular disease. In addition, previous studies have typically been small and have not graded the severity of erectile dysfunction or investigated the specific types of cardiovascular disease associated with erectile dysfunction. In this large study conducted in Australia, the researchers investigated the relationship of the severity of erectile dysfunction with a range of cardiovascular disease outcomes among men with and without a previous diagnosis of cardiovascular disease.
What Did the Researchers Do and Find?
The researchers used information from the established 45 and Up Study, a large cohort study that includes 123,775 men aged 45 and over, selected at random from the general population of New South Wales, a large region of Australia. A total of 95,038 men were included in this analysis. The male participants completed a postal questionnaire that included a question on erectile functioning, which allowed the researchers to define erectile dysfunction as none, mild, moderate, or severe. Using information captured in the New South Wales Admitted Patient Data Collection—a complete record of all public and private hospital admissions, including the reasons for admission and the clinical diagnosis—and the government death register, the researchers were able to determine health outcomes of all study participants. They then used a statistical model to estimate hospital admissions for cardiovascular disease events for different levels of erectile dysfunction.
The researchers found that the rates of severe erectile dysfunction among study participants were 2.2% for men aged 45–54 years, 6.8% for men aged 55–64 years, 20.2% for men aged 65–74 years, 50.0% for men aged 75–84 years, and 75.4% for men aged 85 years and over. During the study period, the researchers recorded 7,855 hospital admissions related to cardiovascular disease and 2,304 deaths. The researchers found that among men without previous cardiovascular disease, those with severe erectile dysfunction were more likely to develop ischemic heart disease (risk 1.60), heart failure (risk 8.00), peripheral vascular disease (risk 1.92), and other causes of cardiovascular disease (risk 1.26) than men without erectile dysfunction. The risks of heart attacks and heart conduction problems were also increased (1.66 and 6.62, respectively). Furthermore, the combined risk of all cardiovascular disease outcomes was 1.35, and the overall risk of death was also higher (risk 1.93) in these men. The researchers found that these increased risks were similar in men with erectile dysfunction who had previously been diagnosed with cardiovascular disease.
What Do These Findings Mean?
These findings suggest that, compared to men without erectile dysfunction, the risk of ischemic heart disease, peripheral vascular disease, and death from all causes rises with increasing severity of erectile dysfunction. The authors emphasize that erectile dysfunction is a risk marker for cardiovascular disease, not a risk factor that causes cardiovascular disease. These findings add to previous studies and highlight the need to consider erectile dysfunction in relation to the risk of different types of cardiovascular disease, including heart failure and heart conduction disorders. However, the study's reliance on the answer to a single self-assessed question on erectile functioning limits the findings. Nevertheless, these findings provide useful information for clinicians: men with erectile dysfunction are at higher risk of cardiovascular disease, and the worse the erectile dysfunction, the higher the risk of cardiovascular disease. Men with erectile dysfunction, even at mild or moderate levels, should be screened and treated for cardiovascular disease accordingly.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001372.
Wikipedia defines erectile dysfunction (note that Wikipedia is a free online encyclopedia that anyone can edit)
MedlinePlus also has some useful patient information on erectile dysfunction
The Mayo Clinic has patient-friendly information on the causes of, and treatments for, erectile dysfunction, and also includes information on the link with cardiovascular disease
The National Heart Foundation of Australia provides information for health professionals, patients, and the general public about how to prevent and manage cardiovascular disease, including assessment and management of cardiovascular disease risk
doi:10.1371/journal.pmed.1001372
PMCID: PMC3558249  PMID: 23382654
20.  Collaborative Double Robust Targeted Maximum Likelihood Estimation* 
Collaborative double robust targeted maximum likelihood estimators represent a fundamental further advance over standard targeted maximum likelihood estimators of a pathwise differentiable parameter of a data generating distribution in a semiparametric model, introduced in van der Laan, Rubin (2006). The targeted maximum likelihood approach involves fluctuating an initial estimate of a relevant factor (Q) of the density of the observed data, in order to make a bias/variance tradeoff targeted towards the parameter of interest. The fluctuation involves estimation of a nuisance parameter portion of the likelihood, g. TMLE has been shown to be consistent and asymptotically normally distributed (CAN) under regularity conditions, when either one of these two factors of the likelihood of the data is correctly specified, and it is semiparametric efficient if both are correctly specified.
In this article we provide a template for applying collaborative targeted maximum likelihood estimation (C-TMLE) to the estimation of pathwise differentiable parameters in semi-parametric models. The procedure creates a sequence of candidate targeted maximum likelihood estimators based on an initial estimate for Q coupled with a succession of increasingly non-parametric estimates for g. In a departure from current state of the art nuisance parameter estimation, C-TMLE estimates of g are constructed based on a loss function for the targeted maximum likelihood estimator of the relevant factor Q that uses the nuisance parameter to carry out the fluctuation, instead of a loss function for the nuisance parameter itself. Likelihood-based cross-validation is used to select the best estimator among all candidate TMLE estimators of Q0 in this sequence. A penalized-likelihood loss function for Q is suggested when the parameter of interest is borderline-identifiable.
We present theoretical results for “collaborative double robustness,” demonstrating that the collaborative targeted maximum likelihood estimator is CAN even when Q and g are both mis-specified, provided that g solves a specified score equation implied by the difference between Q and the true Q0. This marks an improvement over the current definition of double robustness in the estimating equation literature.
We also establish an asymptotic linearity theorem for the C-DR-TMLE of the target parameter, showing that the C-DR-TMLE is more adaptive to the truth, and, as a consequence, can even be super efficient if the first stage density estimator does an excellent job itself with respect to the target parameter.
This research provides a template for targeted efficient and robust loss-based learning of a particular target feature of the probability distribution of the data within large (infinite dimensional) semi-parametric models, while still providing statistical inference in terms of confidence intervals and p-values. This research also breaks with a taboo (e.g., in the propensity score literature in the field of causal inference) on using the relevant part of likelihood to fine-tune the fitting of the nuisance parameter/censoring mechanism/treatment mechanism.
doi:10.2202/1557-4679.1181
PMCID: PMC2898626  PMID: 20628637
asymptotic linearity; coarsening at random; causal effect; censored data; crossvalidation; collaborative double robust; double robust; efficient influence curve; estimating function; estimator selection; influence curve; G-computation; locally efficient; loss-function; marginal structural model; maximum likelihood estimation; model selection; pathwise derivative; semiparametric model; sieve; super efficiency; super-learning; targeted maximum likelihood estimation; targeted nuisance parameter estimator selection; variable importance
21.  Does Finasteride Affect the Severity of Prostate Cancer? A Causal Sensitivity Analysis 
In 2003 Thompson and colleagues reported that daily use of finasteride reduced the prevalence of prostate cancer by 25% compared to placebo. These results were based on the double-blind randomized Prostate Cancer Prevention Trial (PCPT) which followed 18,882 men with no prior or current indications of prostate cancer annually for seven years. Enthusiasm for the risk reduction afforded by the chemopreventative agent and adoption of its use in clinical practice, however, was severely dampened by the additional finding in the trial of an increased absolute number of high-grade (Gleason score ≥ 7) cancers on the finasteride arm. The question arose as to whether this finding truly implied that finasteride increased the risk of more severe prostate cancer or was a study artifact due to a series of possible post-randomization selection biases, including differences among treatment arms in patient characteristics of cancer cases, differences in biopsy verification of cancer status due to increased sensitivity of prostate-specific antigen under finasteride, differential grading by biopsy due to prostate volume reduction by finasteride, and nonignorable drop-out. Via a causal inference approach implementing inverse probability weighted estimating equations, this analysis addresses the question of whether finasteride caused more severe prostate cancer by estimating the mean treatment difference in prostate cancer severity between finasteride and placebo for the principal stratum of participants who would have developed prostate cancer regardless of treatment assignment. We perform sensitivity analyses that sequentially adjust for the numerous potential post-randomization biases conjectured in the PCPT.
doi:10.1198/016214508000000706
PMCID: PMC2880822  PMID: 20526381
Causal inference; principal stratification; treatment effects; selection bias
22.  Population Pharmacokinetics of Sifalimumab, an Investigational Anti-Interferon-α Monoclonal Antibody, in Systemic Lupus Erythematosus 
Clinical Pharmacokinetics  2013;52(11):1017-1027.
Background and Objectives
Sifalimumab is a fully human immunoglobulin G1κ monoclonal antibody that binds to and neutralizes a majority of the subtypes of human interferon-α. Sifalimumab is being evaluated as a treatment for systemic lupus erythematosus (SLE). The primary objectives of this analysis were (a) to develop a population pharmacokinetic model for sifalimumab in SLE; (b) to identify and quantitate the impact of patient/disease characteristics on pharmacokinetic variability; and (c) to evaluate fixed versus body weight (WT)-based dosing regimens.
Methods
Sifalimumab serum concentration-time data were collected from a phase Ib study (MI-CP152) designed to evaluate the safety and tolerability of sifalimumab in adult patients with SLE. Sifalimumab was administered every 14 days as a 30- to 60-minute intravenous infusion with escalating doses of 0.3, 1.0, 3.0, and 10 mg/kg, and serum concentrations were collected over 350 days. A total of 120 patients provided evaluable pharmacokinetic data with a total of 2,370 serum concentrations. Sifalimumab serum concentrations were determined using a validated colorimetric enzyme-linked immunosorbent assay (ELISA) with a lower limit of quantitation of 1.25 μg/mL. Population pharmacokinetic modeling of sifalimumab was performed using a non-linear mixed effects modeling approach with NONMEM VII software. The impact of patient demographics, clinical indices, and biomarkers on pharmacokinetic parameters was explored using a stepwise forward selection and backward elimination approach. The appropriateness of the final model was tested using visual predictive check (VPC). The impact of body WT-based and fixed dosing of sifalimumab was evaluated using a simulation approach. The final population model was utilized for phase IIb dosing projections.
Results
Sifalimumab pharmacokinetics were best described using a two-compartment linear model with first order elimination. Following intravenous dosing, the typical clearance (CL) and central volume of distribution (V1) were estimated to be 176 mL/day and 2.9 L, respectively. The estimates (coefficient of variation) of between-subject variability for CL and V1 were 28 and 31 %, respectively. Patient baseline body WT, interferon gene signature from 21 genes, steroid use, and sifalimumab dose were identified as significant covariates for CL, whereas only baseline body WT was a significant covariate for V1 and peripheral volume of distribution (V2). Although the above-mentioned covariates were statistically significant, they did not explain variability in pharmacokinetic parameters to any relevant extent (<7 %). Thus, no dosing adjustments are necessary. VPC confirmed good predictability of the final population pharmacokinetic model. Simulation results demonstrate that both fixed and body WT-based dosing regimens yield similar median steady state concentrations and overall variability. Fixed sifalimumab doses of 200, 600, and 1,200 mg monthly (with a loading dose at Day 14) were selected for a phase IIb clinical trial.
Conclusion
A two-compartment population pharmacokinetic model adequately described sifalimumab pharmacokinetics. The estimated typical pharmacokinetic parameters were similar to other monoclonal antibodies without target mediated elimination. Although the population pharmacokinetic analysis identified some statistically significant covariates, they explained <7 % between-subject variability in pharmacokinetic parameters indicating that these covariates are not clinically relevant. The population pharmacokinetic analysis also demonstrated the feasibility of switching to fixed doses in phase IIb clinical trials of sifalimumab.
doi:10.1007/s40262-013-0085-2
PMCID: PMC3824374  PMID: 23754736
23.  A fast Monte Carlo EM algorithm for estimation in latent class model analysis with an application to assess diagnostic accuracy for cervical neoplasia in women with AGC 
Journal of applied statistics  2013;40(12):2699-2719.
In this article we use a latent class model (LCM) with prevalence modeled as a function of covariates to assess diagnostic test accuracy in situations where the true disease status is not observed, but observations on three or more conditionally independent diagnostic tests are available. A fast Monte Carlo EM (MCEM) algorithm with binary (disease) diagnostic data is implemented to estimate parameters of interest; namely, sensitivity, specificity, and prevalence of the disease as a function of covariates. To obtain standard errors for confidence interval construction of estimated parameters, the missing information principle is applied to adjust information matrix estimates. We compare the adjusted information matrix-based standard error estimates with the bootstrap standard error estimates, both obtained using the fast MCEM algorithm, in an extensive Monte Carlo study. Simulation demonstrates that the adjusted information matrix approach estimates the standard error similarly to the bootstrap methods under certain scenarios. The bootstrap percentile intervals have satisfactory coverage probabilities. We then apply the LCM analysis to a real data set of 122 subjects from a Gynecologic Oncology Group (GOG) study of significant cervical lesion (S-CL) diagnosis in women with atypical glandular cells of undetermined significance (AGC) to compare the diagnostic accuracy of a histology-based evaluation, a CA-IX biomarker-based test and a human papillomavirus (HPV) DNA test.
doi:10.1080/02664763.2013.825704
PMCID: PMC3806648  PMID: 24163493
adjusted information matrix; bootstrap standard errors; diagnostic accuracy; imperfect gold standard; latent class model; MCEM estimation
24.  Bias associated with delayed verification in test accuracy studies: accuracy of tests for endometrial hyperplasia may be much higher than we think! 
BMC Medicine  2004;2:18.
Background
To empirically evaluate bias in estimation of accuracy associated with delay in verification of diagnosis among studies evaluating tests for predicting endometrial hyperplasia.
Methods
Systematic reviews of all published research on accuracy of miniature endometrial biopsy and endometrial ultrasonography for diagnosing endometrial hyperplasia identified 27 test accuracy studies (2,982 subjects). Of these, 16 had immediate histological verification of diagnosis while 11 had verification delayed > 24 hrs after testing. The effect of delay in verification of diagnosis on estimates of accuracy was evaluated using meta-regression with diagnostic odds ratio (dOR) as the accuracy measure. This analysis was adjusted for study quality and type of test (miniature endometrial biopsy or endometrial ultrasound).
Results
Compared to studies with immediate verification of diagnosis (dOR 67.2, 95% CI 21.7–208.8), those with delayed verification (dOR 16.2, 95% CI 8.6–30.5) underestimated the diagnostic accuracy by 74% (95% CI 7%–99%; P value = 0.048).
Conclusion
Among studies of miniature endometrial biopsy and endometrial ultrasound, diagnostic accuracy is considerably underestimated if there is a delay in histological verification of diagnosis.
doi:10.1186/1741-7015-2-18
PMCID: PMC419332  PMID: 15137911
25.  Quasi-Likelihood Techniques in a Logistic Regression Equation for Identifying Simulium damnosum s.l. Larval Habitats Intra-cluster Covariates in Togo 
The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc.). In this research, the geographical locations of multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially, the data were aggregated in PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data were then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed, also in ArcGIS, using the georeferenced ground coordinates of high- and low-density clusters stratified by Annual Biting Rates (ABR). These data were overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. Univariate and non-linear regression-based models (i.e., logistic, Poisson, and negative binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process, using PROC AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high- and low-ABR-stratified clusters. The analyses also revealed that the estimators, levels of turbidity and presence of rocks, were statistically significant for the high-ABR-stratified clusters, while the estimators, distance between habitats and floating vegetation, were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms, and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators.
The asymptotic distribution of the resulting residual-adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established, while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
doi:10.1080/10095020.2012.714663
PMCID: PMC3595116  PMID: 23504576
Simulium damnosum s.l.; cluster covariates; QuickBird; onchocerciasis; annual biting rates; Bayesian; Togo
