In trials designed to estimate rates of perinatal mother-to-child transmission of HIV, HIV assays are scheduled at multiple time points. Still, infection status for some infants at some time points may be unknown, particularly when interim analyses are conducted.
Logistic regression models are commonly used to estimate covariate-adjusted transmission rates, but the way these models handle missing data may be inadequate. Here we propose coarsened multinomial regression models to estimate cumulative and conditional rates of HIV transmission. Through simulation, we compare the proposed models with standard logistic models in terms of bias, mean squared error, coverage probability, and power. We consider a range of treatment-effect and visit-process scenarios, while allowing for imperfect sensitivity of the assay and contamination of the endpoint by early breastfeeding transmission. We illustrate the approach through analysis of data from a clinical trial designed to prevent perinatal transmission.
The proposed cumulative and conditional models performed well compared with their logistic counterparts. Performance of the proposed cumulative model was particularly strong under scenarios in which treatment was assumed to increase the risk of in utero transmission but decrease the risk of intrapartum and overall perinatal transmission, and under scenarios designed to represent interim analyses. Power to estimate intrapartum and perinatal transmission was consistently higher for the proposed models.
Coarsened multinomial regression models are preferred to standard logistic models for estimating perinatal mother-to-child transmission of HIV, particularly when assays are missing or occur off-schedule for some infants.
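As a rough sketch of the coarsening idea (not the authors' implementation), the following Python fragment fits a three-category multinomial logit in which an infant with unresolved infection status contributes the total probability of all categories consistent with its assays; the categories, coefficients, and coarsening rates are hypothetical, and coarsening at random is assumed.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
K = 3  # 0 = uninfected, 1 = in utero, 2 = intrapartum (hypothetical coding)

def category_probs(beta, X):
    """Multinomial-logit probabilities with category 0 as reference."""
    eta = np.column_stack([np.zeros(len(X)), X @ beta.T])   # (n, K)
    eta -= eta.max(axis=1, keepdims=True)                   # numerical stability
    p = np.exp(eta)
    return p / p.sum(axis=1, keepdims=True)

def neg_loglik(theta, X, masks):
    beta = theta.reshape(K - 1, X.shape[1])
    p = category_probs(beta, X)
    # Coarsened likelihood: each infant contributes the total probability of
    # every category still consistent with its observed assays (CAR assumed).
    return -np.log((p * masks).sum(axis=1)).sum()

# Simulate a small trial: intercept plus randomized-arm indicator.
n = 2000
X = np.column_stack([np.ones(n), rng.integers(0, 2, n)])
beta_true = np.array([[-2.0, 0.4],    # in utero vs uninfected
                      [-1.5, -0.6]])  # intrapartum vs uninfected
y = np.array([rng.choice(K, p=pi) for pi in category_probs(beta_true, X)])

masks = np.zeros((n, K), dtype=bool)
masks[np.arange(n), y] = True
timing_unknown = (y > 0) & (rng.random(n) < 0.3)  # infected, timing unresolved
masks[timing_unknown, 1:] = True
status_unknown = rng.random(n) < 0.1              # e.g. missed visit at interim
masks[status_unknown] = True

fit = minimize(neg_loglik, np.zeros(2 * X.shape[1]), args=(X, masks), method="BFGS")
print(fit.x.reshape(K - 1, X.shape[1]))           # should be close to beta_true
```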
Missing covariate data are common in observational studies of time to an event, especially when covariates are repeatedly measured over time. Failure to account for the missing data can lead to bias or loss of efficiency, especially when the data are non-ignorably missing. Previous work has focused on fixed covariates rather than those repeatedly measured over the follow-up period, so here we present a selection model that allows for proportional hazards regression with time-varying covariates when some covariates may be non-ignorably missing. We develop a fully Bayesian model and obtain posterior estimates of the parameters via the Gibbs sampler in WinBUGS. We illustrate our model with an analysis of post-diagnosis weight change and survival after breast cancer diagnosis in the Long Island Breast Cancer Study Project (LIBCSP) follow-up study. Our results indicate that post-diagnosis weight gain is associated with lower all-cause and breast cancer-specific survival among women diagnosed with new primary breast cancer. Our sensitivity analysis showed only slight differences between models with different assumptions on the missing data mechanism, whereas the complete-case analysis yielded markedly different results.
proportional hazards regression; non-ignorably missing data; missing covariates; selection model
We extend the standard multivariate mixed model by incorporating a smooth time effect and relaxing distributional assumptions, proposing a semiparametric Bayesian approach to multivariate longitudinal data based on a mixture of Polya trees prior distribution. Usually, the distribution of random effects in a longitudinal data model is assumed to be Gaussian. However, the normality assumption may be suspect, particularly if the estimated longitudinal trajectory parameters exhibit multimodality and skewness. The mixture of Polya trees prior density addresses these limitations of the parametric random-effects distribution. We illustrate the methodology by analyzing data from a recent HIV/AIDS study.
Conditional predictive ordinate; Longitudinal data; Mixture of Polya trees; Penalized spline
Latent class models (LCMs) are used increasingly for addressing a broad variety of problems, including sparse modeling of multivariate and longitudinal data, model-based clustering, and flexible inferences on predictor effects. Typical frequentist LCMs require estimation of a single finite number of classes, which does not increase with the sample size, and have a well-known sensitivity to parametric assumptions on the distributions within a class. Bayesian nonparametric methods have been developed to allow an infinite number of classes in the general population, with the number represented in a sample increasing with sample size. In this article, we propose a new nonparametric Bayes model that allows predictors to flexibly impact the allocation to latent classes, while limiting sensitivity to parametric assumptions by allowing class-specific distributions to be unknown subject to a stochastic ordering constraint. An efficient MCMC algorithm is developed for posterior computation. The methods are validated using simulation studies and applied to the problem of ranking medical procedures in terms of the distribution of patient morbidity.
Factor analysis; Latent variables; Mixture model; Model-based clustering; Nested Dirichlet process; Order restriction; Random probability measure; Stick breaking
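The stick-breaking construction named in the keywords can be illustrated in a few lines. The sketch below draws truncated Dirichlet-process weights and shows how the concentration parameter governs the number of non-negligible classes; it is a generic illustration, not the authors' posterior algorithm.

```python
import numpy as np

def stick_breaking(alpha, k_max, rng):
    """Truncated Dirichlet-process weights: w_k = v_k * prod_{j<k}(1 - v_j),
    with v_k ~ Beta(1, alpha)."""
    v = rng.beta(1.0, alpha, size=k_max)
    return v * np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])

rng = np.random.default_rng(1)
for alpha in (0.5, 5.0):
    w = stick_breaking(alpha, 50, rng)
    print(alpha, np.sort(w)[::-1][:5].round(3))
# Small alpha concentrates mass on a few classes; larger alpha spreads it,
# and the number of classes seen in a sample grows with the sample size.
```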
The Spectrum program is used to estimate key HIV indicators from the trends in incidence and prevalence estimated by the Estimation and Projection Package or the Workbook. These indicators include the number of people living with HIV, new infections, AIDS deaths, AIDS orphans, the number of adults and children needing treatment, the need for prevention of mother-to-child transmission and the impact of antiretroviral treatment on survival. The UNAIDS Reference Group on Estimates, Models and Projections regularly reviews new data and information needs, and recommends updates to the methodology and assumptions used in Spectrum.
The latest update to Spectrum was used in the 2009 round of global estimates. This update contains new procedures for estimating: the age and sex distribution of adult incidence, new child infections occurring around delivery or through breastfeeding, the survival of children by timing of infection and the number of double orphans.
HIV; modelling; AIDS; estimates; epidemiology
Assumptions about survival of HIV-infected children in Africa without antiretroviral therapy need to be updated to inform ongoing UNAIDS modelling of paediatric HIV epidemics. Improved estimates of infant survival by timing of HIV infection (perinatal or postnatal) are thus needed.
A pooled analysis was conducted of individual data from all available intervention cohorts and randomized trials on prevention of HIV mother-to-child transmission in Africa. Studies were right-censored at the time of infant antiretroviral initiation. The overall mortality rate per 1000 child-years of follow-up was calculated by selected maternal and infant characteristics. The Kaplan-Meier method was used to estimate survival curves by the child's HIV infection status and timing of HIV infection. Individual data from 12 studies were pooled, covering 12,112 children of HIV-infected women. Mortality rates per 1000 child-years of follow-up were 39.3 and 381.6 for HIV-uninfected and HIV-infected children, respectively. One year after acquisition of HIV infection, an estimated 26% of postnatally and 52% of perinatally infected children would have died; by age 1 year, 4% of uninfected children would have died. Mortality was independently associated with maternal death (adjusted hazard ratio 2.2, 95% CI 1.6–3.0), maternal CD4 count <350 cells/μl (1.4, 1.1–1.7), and postnatal (3.1, 2.1–4.1) or peri-partum HIV infection (12.4, 10.1–15.3).
These results update previous work and inform future UNAIDS modelling by providing survival estimates for HIV-infected untreated African children by timing of infection. We highlight the urgent need for the prevention of peri-partum and postnatal transmission and timely assessment of HIV infection in infants to initiate antiretroviral care and support for HIV-infected children.
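For readers unfamiliar with the quantities reported here, the sketch below shows how a mortality rate per 1000 child-years and a Kaplan-Meier survival curve are computed; all counts and follow-up times are hypothetical, since the pooled analysis reports only the resulting rates.

```python
import numpy as np

def km_survival(time, event):
    """Kaplan-Meier: S(t) = prod over event times u <= t of (1 - d_u / n_u)."""
    uniq = np.unique(time[event == 1])
    n_at_risk = np.array([(time >= u).sum() for u in uniq])
    n_deaths = np.array([((time == u) & (event == 1)).sum() for u in uniq])
    return uniq, np.cumprod(1.0 - n_deaths / n_at_risk)

# A mortality rate per 1000 child-years is 1000 * deaths / child-years of
# follow-up; the counts below are hypothetical, for illustration only.
deaths, child_years = 118, 3000.0
print(1000 * deaths / child_years)            # about 39.3 per 1000 child-years

rng = np.random.default_rng(0)
t = rng.exponential(2.0, 200).clip(max=3.0)   # follow-up censored at 3 years
e = (t < 3.0).astype(int)                     # death indicator
print(km_survival(t, e)[1][-1])               # estimated S(3)
```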
We propose Bayesian parametric and semiparametric partially linear regression methods to analyze outcome-dependent follow-up data, in which the random time of a follow-up measurement of an individual depends on the history of both observed longitudinal outcomes and previous measurement times. We begin by investigating the simplifying assumptions of Lipsitz, Fitzmaurice, Ibrahim, Gelber, and Lipshultz, and present a new model for analyzing such data that allows subject-specific correlations for the longitudinal response and introduces a subject-specific latent variable to accommodate the association between the longitudinal measurements and the follow-up times. An extensive simulation study shows that our Bayesian partially linear regression method facilitates accurate estimation of the true regression line and the regression parameters. We illustrate our new methodology using data from a longitudinal observational study.
Bayesian cubic smoothing spline; Latent variable; Partially linear model
The need for nucleic-acid testing (NAT) to diagnose HIV infection in children under age 18 months poses a barrier to HIV testing of exposed children in resource-constrained settings. The ultrasensitive HIV p24 antigen (Up24) assay is cheaper and easier to perform, and is sensitive (84–98%) and specific (98–100%). The cut-point optical density (OD) selected for discriminating between positive and negative samples may need assessment because of regional differences in mother-to-child HIV transmission rates.
We used receiver operating characteristic (ROC) curves and logistic regression analyses to assess the effect of various cut-points on the diagnostic performance of Up24 for determining HIV infection status among HIV-exposed children. Positive and negative predictive values at different disease prevalence levels were also estimated.
A study of Up24 testing on dried blood spot (DBS) samples collected from 278 HIV-exposed Haitian children, 3–24 months of age, in whom HIV infection status was determined by NAT on the same DBS card.
The sensitivity and specificity of Up24 varied with the cut-point OD selected. At a cut-point OD of 8 times the standard deviation of the negative control (NCSD), sensitivity and specificity of Up24 were maximized [87.8% (95% CI, 83.9–91.6) and 92% (95% CI, 88.8–95.2), respectively]. In lower-prevalence settings (5%), positive and negative predictive values of Up24 were maximal (75.9% and 98.8%, respectively) at a cut-point OD of 15 times the NCSD.
In low-prevalence settings, a high degree of specificity can be achieved with Up24 testing of HIV-exposed children when a higher cut-point OD is used; a feature that may facilitate more frequent use of Up24 antigen testing for HIV-exposed children.
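The cut-point analysis can be mimicked as follows on simulated optical densities (the distributions and NCSD below are hypothetical, not the Haitian DBS data); the sketch scans cut-points defined as multiples of the negative-control SD and shows how PPV and NPV shift with prevalence.

```python
import numpy as np

rng = np.random.default_rng(2)
nc_sd = 0.05                                   # SD of the negative control
od_neg = rng.normal(0.10, 0.08, 150).clip(0)   # ODs in uninfected children
od_pos = rng.normal(1.20, 0.90, 128).clip(0)   # ODs in infected children

for k in (4, 8, 15):                           # cut-point = k-fold the NC SD
    cut = k * nc_sd
    sens = (od_pos >= cut).mean()
    spec = (od_neg < cut).mean()
    for prev in (0.05, 0.25):                  # PPV/NPV depend on prevalence
        ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
        npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
        print(f"k={k:2d} prev={prev:.2f} sens={sens:.2f} spec={spec:.2f} "
              f"ppv={ppv:.2f} npv={npv:.2f}")
```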
Understanding temporal change in human behavior and psychological processes is a central issue in the behavioral sciences. With technological advances, intensive longitudinal data (ILD) are increasingly generated by studies of human behavior that repeatedly administer assessments over time. ILD offer unique opportunities to describe temporal behavioral changes in detail and identify related environmental and psychosocial antecedents and consequences. Traditional analytical approaches impose strong parametric assumptions about the nature of change in the relationship between time-varying covariates and outcomes of interest. This paper introduces time-varying effect models (TVEM) that explicitly model changes in the association between ILD covariates and ILD outcomes over time in a flexible manner. We describe unique research questions that the TVEM addresses, outline the model-estimation procedure, share a SAS macro for implementing the model, demonstrate model utility with a simulated example, and illustrate model applications in ILD collected as part of a smoking-cessation study, exploring the relationship between smoking urges and self-efficacy over the pre- and post-cessation period.
intensive longitudinal data; time-varying effect model; non-parametric; P-spline; applications
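A minimal, unpenalized regression-spline version of a TVEM can be sketched in Python (the paper's P-spline estimator adds a roughness penalty, and the authors provide a SAS macro); the simulated outcome, covariate, and time-varying coefficient below are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
# Simulated ILD: self-efficacy outcome, urge as time-varying covariate, with
# a coefficient on urge that changes smoothly over time (hypothetical truth).
n_obs = 4000
t = rng.uniform(0, 1, n_obs)                      # time since quit attempt
urge = rng.normal(size=n_obs)
beta1 = -1.0 + 1.5 * t                            # true time-varying effect
y = 2.0 + np.sin(2 * np.pi * t) + beta1 * urge + rng.normal(0, 0.5, n_obs)
d = pd.DataFrame({"y": y, "t": t, "urge": urge})

# Both the intercept function b0(t) and the covariate effect b1(t) get a
# B-spline basis; the interaction term carries the time-varying coefficient.
fit = smf.ols("y ~ bs(t, df=6) + bs(t, df=6):urge + urge", data=d).fit()
grid = pd.DataFrame({"t": np.linspace(0.01, 0.99, 5), "urge": 1.0})
# b1(t) = E[y | urge=1, t] - E[y | urge=0, t]
print((fit.predict(grid) - fit.predict(grid.assign(urge=0.0))).round(2).to_list())
```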
A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box–Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial.
Bayesian methods; hierarchical models; multivariate analysis; receiver operating characteristic (ROC) curve; Box–Cox transformation
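Two building blocks highlighted here, Box-Cox transformation of skewed outcomes and a discriminant-function composite compared against a logistic one, can be sketched in frequentist miniature (the BMHTM itself is Bayesian and hierarchical); the data below are simulated, not the prostate biopsy data.

```python
import numpy as np
from scipy import stats
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
n = 400
d = rng.integers(0, 2, n)                     # binary gold standard
# Two skewed biomarkers shifted by disease status (hypothetical data).
raw = np.exp(rng.normal(0.8 * d[:, None] + np.array([0.0, 0.3]), 0.6, (n, 2)))
# Box-Cox maps each outcome toward normality before modeling.
z = np.column_stack([stats.boxcox(raw[:, j])[0] for j in range(2)])

# Composite tests: discriminant-function score vs. logistic-regression score.
lda = LinearDiscriminantAnalysis().fit(z, d)
lr = LogisticRegression().fit(z, d)
print(roc_auc_score(d, lda.decision_function(z)),
      roc_auc_score(d, lr.decision_function(z)))
```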
Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified.
Diagnostic test; Model misspecification; Propensity score; Sensitivity; Specificity
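A sketch of the propensity-score stratification idea on simulated data follows; the models and parameters are hypothetical, and the true sensitivity of 0.85 is known only because the data are simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=n)
d = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + x))))        # true disease status
t = rng.binomial(1, np.where(d == 1, 0.85, 0.10))         # test result
# Verification depends on the test result and x, not on d itself (MAR).
v = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 2.0 * t + 0.8 * x))))
df = pd.DataFrame({"x": x, "d": d, "t": t, "v": v})

est = {}
for tt in (0, 1):              # propensity model fit separately by test result
    g = df[df.t == tt].copy()
    ps = sm.Logit(g.v, sm.add_constant(g[["x"]])).fit(disp=0).predict()
    g["stratum"] = pd.qcut(ps, 5, labels=False, duplicates="drop")
    nk = g.groupby("stratum").size()                # all subjects per stratum
    pk = g[g.v == 1].groupby("stratum").d.mean()    # verified-only prevalence
    # Within propensity strata, verified subjects stand in for everyone.
    est[tt] = {"n": len(g), "p_d": float(np.average(pk.reindex(nk.index), weights=nk))}

# Corrected sensitivity P(T=1 | D=1) via Bayes' rule on stratified estimates.
num = est[1]["n"] * est[1]["p_d"]
print(round(num / (num + est[0]["n"] * est[0]["p_d"]), 3))  # ~0.85 (truth)
print(round(df[df.v == 1].groupby("d").t.mean()[1], 3))     # naive, biased
```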
We propose a new general Bayesian latent class model for evaluating the performance of multiple diagnostic tests in situations where no gold standard test exists, based on a computationally intensive approach. The modeling represents a suitable alternative to models with complex structures involving the general case of several conditionally independent diagnostic tests, covariates, and strata with different disease prevalences. Stratifying the population according to different disease prevalence rates does not add marked complexity to the modeling, but it makes the model more flexible and interpretable. To illustrate the general model, we evaluate the performance of six diagnostic screening tests for Chagas disease, taking some epidemiological variables into account. Serology at the time of donation (negative, positive, inconclusive) was used as a stratification factor in the model. The general model with stratification of the population performed better than its competitors without stratification. The test pair from the Biomanguinhos FIOCRUZ kit (c-ELISA and rec-ELISA) is the best option for the confirmation process, with a false-negative rate of 0.0002% under the serial testing scheme: in that scheme, negative results on both tests effectively rule out infection, and positive results on both effectively confirm it.
The proportional odds model may serve as a useful alternative to the Cox proportional hazards model for studying the association between covariates and survival functions in medical studies. In this article, we study an extended proportional odds model that incorporates so-called "external" time-varying covariates. In the extended model, regression parameters have a direct interpretation in terms of comparing survival functions, without specifying the baseline survival odds function. Semiparametric and maximum likelihood estimation procedures are proposed for the extended model. Our methods are demonstrated by Monte Carlo simulations and applied to a landmark randomized clinical trial of short-course nevirapine (NVP) for mother-to-child transmission (MTCT) of human immunodeficiency virus type 1 (HIV-1). An additional application is the analysis of the well-known Veterans Administration (VA) Lung Cancer Trial.
Counting process; Estimating function; HIV/AIDS; Maximum likelihood estimation; Semiparametric model; Time-varying covariate
We consider the estimation of the parameters indexing a parametric model for the conditional distribution of a diagnostic marker given covariates and disease status. Such models are useful for evaluating whether, and to what extent, a marker's ability to accurately detect or discard disease depends on patient characteristics. A frequent problem complicating the estimation of the model parameters is that estimation must be conducted from observational studies, in which not all patients undergo the gold standard assessment of disease and the decision as to whether a patient undergoes verification is not controlled by study design. In such scenarios, maximum likelihood estimators based on subjects with observed disease status are generally biased. In this paper, we propose estimators for the model parameters that adjust for selection to verification depending on measured patient characteristics and additionally adjust for an assumed degree of residual association. Such estimators may be used as part of a sensitivity analysis for plausible degrees of residual association. We describe a doubly robust estimator with the attractive feature of being consistent if either a model for the probability of selection to verification or a model for the probability of disease among the verified subjects (but not necessarily both) is correct.
Missing at Random; Nonignorable; Missing Covariate; Sensitivity Analysis; Semiparametric; Diagnosis
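The double-robustness property is easiest to see in the generic augmented inverse-probability-weighted (AIPW) estimator of a mean, sketched below for disease prevalence; this illustrates the principle only and is not the paper's estimator for the marker-distribution parameters.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 20000
t = rng.binomial(1, 0.3, n)                       # test result
x = rng.normal(size=n)                            # covariate
d = rng.binomial(1, 1 / (1 + np.exp(-(-1.5 + 1.8 * t + 0.5 * x))))
v = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 1.2 * t + 0.6 * x))))  # verified

Z = sm.add_constant(np.column_stack([t, x]))
pi = sm.Logit(v, Z).fit(disp=0).predict(Z)                  # selection model
m = sm.Logit(d[v == 1], Z[v == 1]).fit(disp=0).predict(Z)   # disease model
# AIPW: consistent if either the selection or the disease model is right.
aipw = np.mean(v * d / pi - (v - pi) / pi * m)
print(round(aipw, 3),              # doubly robust estimate of P(D = 1)
      round(d.mean(), 3),          # truth (known only in simulation)
      round(d[v == 1].mean(), 3))  # naive verified-only estimate, biased
```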
Researchers modeling historical heights have typically relied on the restrictive assumption of a normal distribution, only the mean of which is affected by age, income, nutrition, disease, and similar influences. To avoid these restrictive assumptions, we develop a new semiparametric approach in which covariates are allowed to affect the entire distribution without imposing any parametric shape. We apply our method to a new database of height distributions for Italian provinces, drawn from conscription records, of unprecedented length and geographical disaggregation. Our method allows us to standardize distributions to a single age and calculate moments of the distribution that are comparable through time. Our method also allows us to generate counterfactual distributions for a range of ages, from which we derive age-height profiles. These profiles reveal how the adolescent growth spurt (AGS) distorts the distribution of stature, and they document the earlier and earlier onset of the AGS as living conditions improved over the second half of the nineteenth century. Our new estimates of provincial mean height also reveal a previously unnoticed “regime switch” from regional convergence to divergence in this period.
Modelling is fundamental to many fields of science and engineering. A model can be thought of as a representation of possible data one could predict from a system. The probabilistic approach to modelling uses probability theory to express all aspects of uncertainty in the model. The probabilistic approach is synonymous with Bayesian modelling, which simply uses the rules of probability theory in order to make predictions, compare alternative models, and learn model parameters and structure from data. This simple and elegant framework is most powerful when coupled with flexible probabilistic models. Flexibility is achieved through the use of Bayesian non-parametrics. This article provides an overview of probabilistic modelling and an accessible survey of some of the main tools in Bayesian non-parametrics. The survey covers the use of Bayesian non-parametrics for modelling unknown functions, density estimation, clustering, time-series modelling, and representing sparsity, hierarchies, and covariance structure. More specifically, it gives brief non-technical overviews of Gaussian processes, Dirichlet processes, infinite hidden Markov models, Indian buffet processes, Kingman’s coalescent, Dirichlet diffusion trees and Wishart processes.
probabilistic modelling; Bayesian statistics; non-parametrics; machine learning
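As a taste of the first tool surveyed, the following sketch draws random functions from a Gaussian-process prior with a squared-exponential kernel; the kernel choice and hyperparameters are arbitrary.

```python
import numpy as np

def rbf_kernel(s, t, length=0.2, var=1.0):
    """Squared-exponential covariance k(s, t) = var * exp(-(s - t)^2 / (2 l^2))."""
    return var * np.exp(-0.5 * ((s[:, None] - t[None, :]) / length) ** 2)

# A Gaussian process is a prior over functions: any finite set of input
# points gets a joint Gaussian distribution with covariance from the kernel.
rng = np.random.default_rng(7)
x = np.linspace(0.0, 1.0, 200)
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))   # jitter for numerical stability
draws = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
print(draws.shape)                              # three smooth random functions
```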
The timing of mother-to-child transmission (MTCT) of HIV is critical to understanding the dynamics of MTCT and has important implications for developing effective treatment or prevention strategies for such transmissions. In this paper, we develop an imputation method to analyze censored MTCT timing in the presence of auxiliary information. Specifically, we first propose a statistical model based on the hazard functions of the MTCT timing that reflects three MTCT modes: in utero, during delivery, and via breastfeeding, with different shapes of the baseline hazard that vary between infants. The model also allows for the possibility that the majority of infants are immune to MTCT of HIV. The model is fitted by MCMC to obtain marginal inferences via multiple imputation. Moreover, we propose a simple and straightforward approach to take imperfect assay sensitivity into account in the imputation step, and study appropriate censoring techniques to account for weaning. Our method is assessed by simulations and applied to a large trial designed to assess the use of antibiotics in preventing MTCT of HIV.
HIV/AIDS; mixture models; mother to child transmission of HIV; multiple imputation
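The final pooling step of any multiple-imputation analysis uses Rubin's rules; a minimal sketch follows, with hypothetical imputed estimates standing in for quantities such as the fitted hazard-model parameters.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool M multiple-imputation estimates by Rubin's rules:
    total variance T = W + (1 + 1/M) * B (within plus inflated between)."""
    q = np.asarray(estimates)
    u = np.asarray(variances)
    M = len(q)
    W, B = u.mean(), q.var(ddof=1)
    return q.mean(), np.sqrt(W + (1.0 + 1.0 / M) * B)

# Hypothetical example: five imputed-data estimates of a hazard-model
# parameter and their squared standard errors.
print(rubin_pool([0.42, 0.37, 0.45, 0.40, 0.44], [0.02] * 5))
```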
To evaluate the probabilities of a disease state, ideally all subjects in a study should be diagnosed by a definitive diagnostic or gold standard test. However, since definitive diagnostic tests are often invasive and expensive, it is generally unethical to apply them to subjects whose screening tests are negative. In this article, we consider latent class models for screening studies with two imperfect binary diagnostic tests and a definitive categorical disease status measured only for those with at least one positive screening test. Specifically, we discuss a conditionally independent latent class model and three homogeneous conditionally dependent latent class models, and assess the impact of misspecifying the dependence structure on the estimation of disease category probabilities using frequentist and Bayesian approaches. Interestingly, the three homogeneous dependent models can provide identical goodness-of-fit but substantively different estimates for a given study. However, the parametric form of the assumed dependence structure itself is not "testable" from the data, and thus the dependence structure modeling considered here can only be viewed as a sensitivity analysis concerning a more complicated non-identifiable model potentially involving a heterogeneous dependence structure. Furthermore, we discuss Bayesian model averaging, together with its limitations, as an alternative way to partially address this particularly challenging problem. The methods are applied to two cancer screening studies, and simulations are conducted to evaluate the performance of these methods. In summary, further research is needed to reduce the impact of model misspecification on the estimation of disease prevalence in such settings.
maximum likelihood; Bayesian inference; diagnostic test; dependence; screening; latent class models
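The identifiability issue can be made concrete with a forward computation: adding a within-class covariance between the two tests changes the joint cell probabilities while leaving each test's marginal sensitivity and specificity untouched. The sketch below uses arbitrary accuracy values, not estimates from the cancer screening studies.

```python
import numpy as np

def joint_cell_probs(prev, se, sp, cov_d=0.0, cov_n=0.0):
    """P(T1=i, T2=j) for two imperfect binary tests with a within-class
    covariance between them; cov_d = cov_n = 0 is conditional independence."""
    se1, se2 = se
    sp1, sp2 = sp
    cells = np.zeros((2, 2))
    for i in (0, 1):
        for j in (0, 1):
            sgn = 1 if i == j else -1
            p_d = (se1 if i else 1 - se1) * (se2 if j else 1 - se2) + sgn * cov_d
            p_n = ((1 - sp1) if i else sp1) * ((1 - sp2) if j else sp2) + sgn * cov_n
            cells[i, j] = prev * p_d + (1 - prev) * p_n
    return cells

# Same prevalence and marginal accuracies, two dependence assumptions:
print(joint_cell_probs(0.10, (0.90, 0.80), (0.95, 0.90)))
print(joint_cell_probs(0.10, (0.90, 0.80), (0.95, 0.90), cov_d=0.03))
# The covariance terms cancel in every margin, so both parameterizations
# imply identical sensitivities and specificities for each test.
```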
We present a semi-parametric deconvolution estimator for the density function of a random variable X that is measured with error, a common challenge in many epidemiological studies. Traditional deconvolution estimators rely only on assumptions about the distribution of X and the error in its measurement, and ignore information available in auxiliary variables. Our method assumes the availability of a covariate vector statistically related to X by a mean–variance function regression model, where regression errors are normally distributed and independent of the measurement errors. Simulations suggest that the estimator achieves a much lower integrated squared error than the observed-data kernel density estimator when models are correctly specified and the assumption of normal regression errors is met. We illustrate the method using anthropometric measurements of newborns to estimate the density function of newborn length.
density estimation; measurement error; mean–variance function model
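Under the simplest version of the stated assumptions (a linear mean, constant variance, and a known measurement-error SD; all values below hypothetical), the estimator reduces to averaging normal densities centered at the fitted regression mean, as sketched here.

```python
import numpy as np

rng = np.random.default_rng(8)
n, sigma_u = 2000, 0.8                        # measurement-error SD assumed known
z = rng.normal(size=n)                        # covariate, e.g. gestational age
x = 1.0 + 0.9 * z + 0.5 * rng.normal(size=n)  # true (unobserved) variable
w = x + sigma_u * rng.normal(size=n)          # error-prone measurement

# Since E[W|Z] = E[X|Z] and Var(W|Z) = Var(X|Z) + sigma_u^2, a regression of
# W on Z identifies the mean-variance model for X (here linear, homoscedastic).
A = np.column_stack([np.ones(n), z])
coef, *_ = np.linalg.lstsq(A, w, rcond=None)
var_x = max((w - A @ coef).var(ddof=2) - sigma_u**2, 1e-8)

def f_x(xgrid):
    """Density estimate: average of normal densities N(m(z_i), var_x)."""
    m = A @ coef
    u = (xgrid[:, None] - m[None, :]) / np.sqrt(var_x)
    return np.exp(-0.5 * u**2).mean(axis=1) / np.sqrt(2 * np.pi * var_x)

grid = np.linspace(-3, 5, 9)
print(np.round(f_x(grid), 3))  # compare with the true marginal N(1, 0.9^2 + 0.5^2)
```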
We study a mixed-effects model in which the response and the main covariate are linked by position. While the covariate corresponding to the observed response is not directly observable, there exists a latent covariate process that represents the underlying positional features of the covariate. When the positional features and the underlying distributions are parametric, the expectation-maximization (EM) algorithm is the most commonly used procedure. Without the parametric assumptions, however, the practical feasibility of a semiparametric EM algorithm and the corresponding inference procedures remain to be investigated. In this paper, we propose a semiparametric approach and identify the conditions under which the semiparametric estimators share the same asymptotic properties as the unachievable estimators that use the true values of the latent covariate; that is, the oracle property is achieved. We propose a Monte Carlo graphical evaluation tool to assess whether the sample size is adequate for achieving the oracle property. The semiparametric approach is then applied to data from a colon carcinogenesis study on the effects of cell DNA damage on the expression level of the oncogene bcl-2. The graphical evaluation shows that, with a moderate number of subunits, the numerical performance of the semiparametric estimator is very close to the asymptotic limit, indicating that a more complex EM-based implementation would at most achieve minimal improvement and is thus unnecessary.
Carcinogenesis; Consistency; Generalized estimating equation; Local linear smoothing; Mixed-effects model
Exposure lagging and exposure-time-window analysis are 2 widely used approaches to allow for induction and latency periods in analyses of exposure-disease associations. Exposure lagging implies a strong parametric assumption about the temporal evolution of the exposure-disease association. An exposure-time-window analysis allows a more flexible description of temporal variation in exposure effects but may yield unstable risk estimates that are sensitive to how the windows are defined. The authors describe a hierarchical regression approach that combines time-window analysis with a parametric latency model. They illustrate this approach using data from 2 occupational cohort studies of lung cancer mortality, among 1) asbestos textile workers and 2) uranium miners. For each cohort, an exposure-time-window analysis was compared with a hierarchical regression analysis with shrinkage toward a simpler, second-stage parametric latency model. In each cohort analysis, substantial stability is gained in the time-window-specific estimates of association by using the hierarchical regression approach. The proposed hierarchical regression model couples a time-window analysis with a parametric latency model; this approach provides a way to stabilize risk estimates derived from a time-window analysis and to reduce bias arising from misspecification of a parametric latency model.
cohort studies; hierarchical model; latency; neoplasms; regression
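An empirical-Bayes caricature of the proposed approach is sketched below: window-specific estimates are shrunk toward a fitted second-stage parametric latency curve. The estimates, variances, latency-model form, and second-stage variance are all hypothetical, and the paper's fully hierarchical fit is not reproduced.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical first-stage time-window estimates b (log rate ratios) with
# sampling variances v, at window midpoints t (years since exposure).
t = np.array([5.0, 15.0, 25.0, 35.0])
b = np.array([0.10, 0.55, 0.30, 0.05])
v = np.array([0.04, 0.02, 0.03, 0.09])

# Second stage: fit a simple log-normal-shaped latency curve g(t; theta).
g = lambda t, a, mu, s: a * np.exp(-0.5 * ((np.log(t) - mu) / s) ** 2)
theta, _ = curve_fit(g, t, b, p0=[0.5, np.log(15.0), 0.5], sigma=np.sqrt(v))

tau2 = 0.01               # assumed between-window (second-stage) variance
w = tau2 / (tau2 + v)     # precision-based shrinkage weights
print(np.round(w * b + (1 - w) * g(t, *theta), 3))  # stabilized window estimates
```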
Outcome-dependent sampling (ODS) study designs are commonly implemented with rare diseases or when prospective studies are infeasible. In longitudinal data settings, when a repeatedly measured binary response is rare, an ODS design can be highly efficient for maximizing statistical information subject to resource limitations that prohibit covariate ascertainment for all observations. This manuscript details an ODS design in which individual observations are sampled with probabilities determined by an inexpensive, time-varying auxiliary variable that is related to, but not equal to, the response. With the goal of validly estimating marginal model parameters from the resulting biased sample, we propose a semi-parametric, sequential offsetted logistic regressions (SOLR) approach. The SOLR strategy first estimates the relationship between the auxiliary variable and the response and covariate data using an offsetted logistic regression analysis, where the offset adjusts for the biased design. Results from the auxiliary variable model are then combined with the known or estimated sampling probabilities to formulate a second offset that corrects for the biased design in the ultimate target model relating the longitudinal binary response to covariates. Because the target model offset is estimated with SOLR, we detail asymptotic standard error estimates that account for uncertainty associated with the auxiliary variable model. Motivated by an analysis of the BioCycle Study (Gaskins et al., Effect of daily fiber intake on reproductive function: the BioCycle Study. American Journal of Clinical Nutrition 2009; 90(4): 1061–1069), which aims to describe the relationship between reproductive health (determined by luteinizing hormone levels) and fiber consumption, we examine properties of SOLR estimators and compare them with other common approaches.
outcome-dependent sampling; biased sampling; study design; generalized estimating equations; longitudinal data analysis; binary data
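The two offsetted regressions can be sketched on simulated data as follows; sampling probabilities, model forms, and coefficients are hypothetical, and the naive standard errors from step 2 ignore step-1 uncertainty, which the paper's asymptotic results correct.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 30000
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(-3.0 + 1.0 * x))))  # rare binary response
a = rng.binomial(1, 1 / (1 + np.exp(-(-2.0 + 3.0 * y + 0.5 * x))))  # auxiliary
pi1, pi0 = 0.90, 0.05              # known sampling probabilities given A=1, A=0
s = rng.binomial(1, np.where(a == 1, pi1, pi0)) == 1       # sampled units

xs, ys, as_ = x[s], y[s], a[s]
# Step 1: logistic regression of A on (Y, X) in the biased sample, with the
# offset log(pi1/pi0) removing the design bias, estimates P(A=1 | Y, X).
Z1 = sm.add_constant(np.column_stack([ys, xs]))
aux = sm.GLM(as_, Z1, family=sm.families.Binomial(),
             offset=np.full(s.sum(), np.log(pi1 / pi0))).fit()

def p_a1(y_val, x_arr):
    eta = aux.params @ np.array([np.ones_like(x_arr),
                                 np.full_like(x_arr, float(y_val)), x_arr])
    return 1 / (1 + np.exp(-eta))

# Step 2: the sampling probability given (Y, X) mixes over A; its log ratio
# is the second offset, after which the target model of Y on X is unbiased.
rho1 = pi1 * p_a1(1, xs) + pi0 * (1 - p_a1(1, xs))
rho0 = pi1 * p_a1(0, xs) + pi0 * (1 - p_a1(0, xs))
target = sm.GLM(ys, sm.add_constant(xs), family=sm.families.Binomial(),
                offset=np.log(rho1 / rho0)).fit()
print(target.params.round(2))      # should approach the true (-3.0, 1.0)
```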
With today’s rapid advances in technology and understanding of disease, more screening and diagnostic tests have become available in a variety of sociodemographic and clinical settings. This analysis quantifies the impact of varying prevalence rates on test performance for given sensitivity and specificity values.
Using a worked example of latent tuberculosis infection, we compared true-positive (TP) and false-positive (FP) results while varying prevalence and test sensitivity and specificity. We took sensitivity (81%, QuantiFERON®-TB Gold In-Tube; 88%, T-SPOT®.TB) and specificity (99%; 88%) estimates from the published literature, and we used World Health Organization data to estimate disease prevalence in five countries.
Varying sensitivity impacted outcomes most in high-prevalence settings; changes in specificity had greater impact in low-prevalence settings. In switching from QuantiFERON-TB to T-SPOT.TB (higher sensitivity, lower specificity), the trade-off between identifying additional cases (TPs) and incurring unnecessary treatments (FPs) varied dramatically with prevalence. Lower-prevalence settings paid a greater "price" of more FPs for each TP gained, with 37.7 FPs per TP in the United States (5% prevalence) versus 2.5 in the Ivory Coast (55% prevalence).
Prevalence affects test performance for given sensitivity and specificity values. To optimize test performance, disease prevalence should be incorporated into testing decisions, and sensitivity and specificity requirements should be set locally, not globally. In lower-prevalence settings, using highly specific assays may optimize outcomes.
Disease prevalence; Testing; Screening; Test thresholds; Sensitivity; Specificity; Tuberculosis; Outcomes
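The "price" calculation is simple enough to verify directly. With the rounded accuracy values quoted above, the ratios come out lower (roughly 30 and 1.3) than the published 37.7 and 2.5, which rest on the study's exact unrounded inputs; the sketch below illustrates the direction and magnitude of the prevalence effect, not the precise figures.

```python
# Extra false positives per extra true positive when switching from the more
# specific to the more sensitive test, as a function of prevalence.
def fp_per_tp_gained(prev, se_old, sp_old, se_new, sp_new):
    extra_tp = prev * (se_new - se_old)          # additional cases found
    extra_fp = (1 - prev) * (sp_old - sp_new)    # additional false alarms
    return extra_fp / extra_tp

for prev in (0.05, 0.55):   # US-like vs Ivory Coast-like prevalence
    print(prev, round(fp_per_tp_gained(prev, 0.81, 0.99, 0.88, 0.88), 1))
# The trade-off deteriorates sharply as prevalence falls.
```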
Accuracy of rapid diagnostic tests for dengue infection has been repeatedly estimated by comparing those tests with reference assays. We hypothesized that those estimates might be inaccurate if the accuracy of the reference assays is not perfect. Here, we investigated this using statistical modeling.
Data from a cohort study of 549 patients suspected of dengue infection presenting at Colombo North Teaching Hospital, Ragama, Sri Lanka, that described the application of our reference assay (a combination of Dengue IgM antibody capture ELISA and IgG antibody capture ELISA) and of three rapid diagnostic tests (Panbio NS1 antigen, IgM antibody and IgG antibody rapid immunochromatographic cassette tests) were re-evaluated using Bayesian latent class models (LCMs). The estimated sensitivity and specificity of the reference assay were 62.0% and 99.6%, respectively. Prevalence of dengue infection (24.3%), and sensitivities and specificities of the Panbio NS1 (45.9% and 97.9%), IgM (54.5% and 95.5%) and IgG (62.1% and 84.5%) estimated by Bayesian LCMs were significantly different from those estimated by assuming that the reference assay was perfect. Sensitivity, specificity, PPV and NPV for a combination of NS1, IgM and IgG cassette tests on admission samples were 87.0%, 82.8%, 62.0% and 95.2%, respectively.
Our reference assay is an imperfect gold standard. In our setting, the combination of NS1, IgM and IgG rapid diagnostic tests could be used on admission to rule out dengue infection with a high level of accuracy (NPV 95.2%). Further evaluation of rapid diagnostic tests for dengue infection should include the use of appropriate statistical models.
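The quoted predictive values follow from Bayes' rule, taking the LCM-estimated prevalence of 24.3% as the pre-test probability; the check below reproduces the reported NPV of 95.2% (and 61.9% versus the reported 62.0% for PPV, a rounding difference).

```python
# Bayes'-rule check of the quoted predictive values, using the LCM-estimated
# accuracy of the combined rapid tests and a pre-test probability of 24.3%.
se, sp, prev = 0.870, 0.828, 0.243
ppv = se * prev / (se * prev + (1 - sp) * (1 - prev))
npv = sp * (1 - prev) / (sp * (1 - prev) + (1 - se) * prev)
print(round(100 * ppv, 1), round(100 * npv, 1))   # 61.9, 95.2
```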
The diagnosis of canine echinococcosis can be a challenge in surveillance studies because there is no perfect gold standard that can be used routinely. However, unknown test specificities and sensitivities can be overcome using latent-class analysis with appropriate data.
We utilised a set of faecal and purge samples used previously to explore the epidemiology of canine echinococcosis on the Tibetan plateau. Previously only the purge results were reported and analysed in a largely deterministic way. In the present study, additional diagnostic tests of copro-PCR and copro-antigen ELISA were undertaken on the faecal samples. This enabled a Bayesian analysis in a latent-class model to examine the diagnostic performance of a genus specific copro-antigen ELISA, species-specific copro-PCR and arecoline purgation. Potential covariates including co-infection with Taenia, age and sex of the dog were also explored. The dependence structure of these diagnostic tests could also be analysed.
The most parsimonious result, indicated by the deviance information criterion, suggested that co-infection with Taenia spp. was a significant covariate of Echinococcus infection. The copro-PCRs had estimated sensitivities of 89% and 84%, respectively, for the diagnosis of Echinococcus multilocularis and E. granulosus, with specificities estimated at 93% and 83%, respectively. The copro-antigen ELISA had sensitivities of 55% and 57% for the diagnosis of E. multilocularis and E. granulosus, with specificities of 71% and 69%, respectively. Arecoline purgation, with an assumed specificity of 100%, had estimated sensitivities of 76% and 85%, respectively.
This study also shows that incorporating diagnostic uncertainty (in other words, assuming no perfect gold standard) and including potential covariates such as sex or Taenia co-infection in the epidemiological analysis may give different results than if the diagnosis of infection status is assumed to be deterministic; this approach should therefore be used whenever possible.
Dogs are a key definitive host of Echinococcus spp.; hence, accurate diagnosis in dogs is important for the surveillance and control of echinococcosis. A perfect diagnostic test would detect every infected dog (100% sensitivity) whilst never giving a false positive reaction in non-infected dogs (100% specificity). Since no such test exists, it is important to understand the performance of available diagnostic techniques. We used the results of a study that applied three diagnostic tests to dogs from the Tibetan plateau, where E. granulosus and E. multilocularis are co-endemic. In this study, copro-antigen ELISA and copro-PCR diagnostic tests were undertaken on faecal samples from all animals. The dogs were also purged with arecoline hydrobromide to recover adult parasites, a highly specific but relatively insensitive third diagnostic test. We used a statistical approach (Bayesian latent-class models) to estimate simultaneously the sensitivities of all three tests and the specificities of the copro-antigen and copro-PCR tests. We also analysed how some determinants of infection can affect parasite prevalence. This approach provides a robust framework for increasing the accuracy of surveillance and epidemiological studies of echinococcosis by overcoming the problems of poor diagnostic test performance.