In trials designed to estimate rates of perinatal mother to child transmission of HIV, HIV assays are scheduled at multiple points in time. Still, infection status for some infants at some time points may be unknown, particularly when interim analyses are conducted.
Logistic regression models are commonly used to estimate covariate-adjusted transmission rates, but their methods for handling missing data may be inadequate. Here we propose using coarsened multinomial regression models to estimate cumulative and conditional rates of HIV transmission. Through simulation, we compare the proposed models to standard logistic models in terms of bias, mean squared error, coverage probability, and power. We consider a range of treatment effect and visit process scenarios, while including imperfect sensitivity of the assay and contamination of the endpoint due to early breastfeeding transmission. We illustrate the approach through analysis of data from a clinical trial designed to prevent perinatal transmission.
The proposed cumulative and conditional models performed well when compared to their logistic counterparts. Performance of the proposed cumulative model was particularly strong under scenarios where treatment was assumed to increase the risk of in utero transmission but decrease the risk of intrapartum and overall perinatal transmission and under scenarios designed to represent interim analyses. Power to estimate intrapartum and perinatal transmission was consistently higher for the proposed models.
Coarsened multinomial regression models are preferred to standard logistic models for estimation of perinatal mother to child transmission of HIV, particularly when assays are missing or occur off-schedule for some infants.
Missing covariate data is common in observational studies of time to an event, especially when covariates are repeatedly measured over time. Failure to account for the missing data can lead to bias or loss of efficiency, especially when the data are non-ignorably missing. Previous work has focused on the case of fixed covariates rather than those that are repeatedly measured over the follow-up period, so here we present a selection model that allows for proportional hazards regression with time-varying covariates when some covariates may be non-ignorably missing. We develop a fully Bayesian model and obtain posterior estimates of the parameters via the Gibbs sampler in WinBUGS. We illustrate our model with an analysis of post-diagnosis weight change and survival after breast cancer diagnosis in the Long Island Breast Cancer Study Project (LIBCSP) follow-up study. Our results indicate that post-diagnosis weight gain is associated with lower all-cause and breast cancer specific survival among women diagnosed with new primary breast cancer. Our sensitivity analysis showed only slight differences between models with different assumptions on the missing data mechanism yet the complete case analysis yielded markedly different results.
proportional hazards regression; non-ignorably missing data; missing covariates; selection model
We extend the standard multivariate mixed model by incorporating a smooth time effect and relaxing distributional assumptions. We propose a semiparametric Bayesian approach to multivariate longitudinal data using a mixture of Polya trees prior distribution. Usually, the distribution of random effects in a longitudinal data model is assumed to be Gaussian. However, the normality assumption may be suspect, particularly if the estimated longitudinal trajectory parameters exhibit multimodality and skewness. In this paper we propose a mixture of Polya trees prior density to address the limitations of the parametric random effects distribution. We illustrate the methodology by analyzing data from a recent HIV-AIDS study.
Conditional predictive ordinate; Longitudinal data; Mixture of Polya trees; Penalized spline
Latent class models (LCMs) are used increasingly for addressing a broad variety of problems, including sparse modeling of multivariate and longitudinal data, model-based clustering, and flexible inferences on predictor effects. Typical frequentist LCMs require estimation of a single finite number of classes, which does not increase with the sample size, and have a well-known sensitivity to parametric assumptions on the distributions within a class. Bayesian nonparametric methods have been developed to allow an infinite number of classes in the general population, with the number represented in a sample increasing with sample size. In this article, we propose a new nonparametric Bayes model that allows predictors to flexibly impact the allocation to latent classes, while limiting sensitivity to parametric assumptions by allowing class-specific distributions to be unknown subject to a stochastic ordering constraint. An efficient MCMC algorithm is developed for posterior computation. The methods are validated using simulation studies and applied to the problem of ranking medical procedures in terms of the distribution of patient morbidity.
Factor analysis; Latent variables; Mixture model; Model-based clustering; Nested Dirichlet process; Order restriction; Random probability measure; Stick breaking
The Spectrum program is used to estimate key HIV indicators from the trends in incidence and prevalence estimated by the Estimation and Projection Package or the Workbook. These indicators include the number of people living with HIV, new infections, AIDS deaths, AIDS orphans, the number of adults and children needing treatment, the need for prevention of mother-to-child transmission and the impact of antiretroviral treatment on survival. The UNAIDS Reference Group on Estimates, Models and Projections regularly reviews new data and information needs, and recommends updates to the methodology and assumptions used in Spectrum.
The latest update to Spectrum was used in the 2009 round of global estimates. This update contains new procedures for estimating: the age and sex distribution of adult incidence, new child infections occurring around delivery or through breastfeeding, the survival of children by timing of infection and the number of double orphans.
HIV; modelling; AIDS; estimates; epidemiology
Bayesian Poisson log-linear multilevel models scalable to epidemiological studies are proposed to investigate population variability in sleep state transition rates. Hierarchical random effects are used to account for pairings of subjects and repeated measures within those subjects, as comparing diseased to non-diseased subjects while minimizing bias is of importance. Essentially, non-parametric piecewise constant hazards are estimated and smoothed, allowing for time-varying covariates and segment of the night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming exponentially distributed survival times. Such re-derivation allows synthesis of two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed. Supplementary material includes the analyzed data set as well as the code for a reproducible analysis.
multi-state models; recurrent event; competing risks; survival analysis; frailties; sleep; hypnogram
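The likelihood equivalence mentioned in the abstract above can be sketched as follows. This is a standard textbook derivation written under assumed notation, not the authors' own re-derivation: $t_i$ is the observed time at risk, $d_i \in \{0,1\}$ the transition indicator, $x_i$ a covariate vector, and $\lambda_i = \exp(x_i^\top \beta)$ a constant hazard.

```latex
% Exponential survival log-likelihood with hazard \lambda_i = \exp(x_i^\top\beta)
\ell_{\mathrm{exp}}(\beta)
  = \sum_i \bigl[ d_i \log \lambda_i - \lambda_i t_i \bigr]
  = \sum_i \bigl[ d_i\, x_i^\top\beta - t_i \exp(x_i^\top\beta) \bigr]

% Poisson log-likelihood for d_i with mean \mu_i = t_i \lambda_i,
% i.e. \log \mu_i = \log t_i + x_i^\top\beta  (offset \log t_i)
\ell_{\mathrm{Pois}}(\beta)
  = \sum_i \bigl[ d_i \log \mu_i - \mu_i - \log d_i! \bigr]
  = \sum_i \bigl[ d_i\, x_i^\top\beta - t_i \exp(x_i^\top\beta) \bigr]
    + \sum_i d_i \log t_i

% Since d_i \in \{0,1\}, \log d_i! = 0, and therefore
\ell_{\mathrm{Pois}}(\beta) = \ell_{\mathrm{exp}}(\beta) + \sum_i d_i \log t_i
```

The extra term does not involve $\beta$, so both likelihoods give the same estimates and the same curvature in $\beta$. Applying the same argument within each time segment gives the piecewise constant hazard case.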
Assumptions about survival of HIV-infected children in Africa without antiretroviral therapy need to be updated to inform ongoing UNAIDS modelling of paediatric HIV epidemics. Improved estimates of infant survival by timing of HIV infection (perinatally or postnatally) are thus needed.
A pooled analysis was conducted of individual data of all available intervention cohorts and randomized trials on prevention of HIV mother-to-child transmission in Africa. Studies were right-censored at the time of infant antiretroviral initiation. Overall mortality rate per 1000 child-years of follow-up was calculated by selected maternal and infant characteristics. The Kaplan-Meier method was used to estimate survival curves by child's HIV infection status and timing of HIV infection. Individual data from 12 studies were pooled, with 12,112 children of HIV-infected women. Mortality rates per 1,000 child-years follow-up were 39.3 and 381.6 for HIV-uninfected and infected children respectively. One year after acquisition of HIV infection, an estimated 26% postnatally and 52% perinatally infected children would have died; and 4% uninfected children by age 1 year. Mortality was independently associated with maternal death (adjusted hazard ratio 2.2, 95%CI 1.6–3.0), maternal CD4<350 cells/ml (1.4, 1.1–1.7), postnatal (3.1, 2.1–4.1) or peri-partum HIV-infection (12.4, 10.1–15.3).
These results update previous work and inform future UNAIDS modelling by providing survival estimates for HIV-infected untreated African children by timing of infection. We highlight the urgent need for the prevention of peri-partum and postnatal transmission and timely assessment of HIV infection in infants to initiate antiretroviral care and support for HIV-infected children.
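The two summary quantities used in the pooled analysis above, a mortality rate per 1,000 child-years and a Kaplan-Meier survival curve, can be illustrated with a minimal sketch. The function names and the toy data are assumptions made for illustration; this is not the study's analysis code, and no study data are reproduced.

```python
from collections import Counter


def rate_per_1000(n_deaths, child_years):
    """Crude mortality rate per 1,000 child-years of follow-up."""
    return 1000.0 * n_deaths / child_years


def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times  : follow-up time for each child (death or censoring time)
    events : 1 if the child died at that time, 0 if right-censored
    Returns a list of (time, survival probability) at each death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)          # everyone leaving the risk set at t
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk   # product-limit step
            curve.append((t, surv))
        at_risk -= exits[t]             # deaths and censored both leave
    return curve


# Hypothetical toy data: months of follow-up and death indicators.
toy_times = [2, 3, 3, 5, 8, 12]
toy_events = [1, 1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Children censored at a given time are counted as at risk for deaths at that same time, which is the usual convention.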
We propose Bayesian parametric and semiparametric partially linear regression methods to analyze the outcome-dependent follow-up data when the random time of a follow-up measurement of an individual depends on the history of both observed longitudinal outcomes and previous measurement times. We begin with the investigation of the simplifying assumptions of Lipsitz, Fitzmaurice, Ibrahim, Gelber, and Lipshultz, and present a new model for analyzing such data by allowing subject-specific correlations for the longitudinal response and by introducing a subject-specific latent variable to accommodate the association between the longitudinal measurements and the follow-up times. An extensive simulation study shows that our Bayesian partially linear regression method facilitates accurate estimation of the true regression line and the regression parameters. We illustrate our new methodology using data from a longitudinal observational study.
Bayesian cubic smoothing spline; Latent variable; Partially linear model
Nucleic acid testing (NAT) to diagnose HIV infection in children under 18 months of age presents a barrier to HIV testing of exposed children in resource-constrained settings. The ultrasensitive HIV p24 antigen (Up24) assay is cheaper and easier to perform and is sensitive (84–98%) and specific (98–100%). The cut-point optical density (OD) selected for discriminating between positive and negative samples may need assessment due to regional differences in mother-to-child HIV transmission rates.
We used receiver operating characteristic (ROC) curves and logistic regression analyses to assess the effect of various cut-points on the diagnostic performance of Up24 for HIV-infection status among HIV-exposed children. Positive and negative predictive values at different rates of disease prevalence were also estimated.
A study of Up24 testing on dried blood spot (DBS) samples collected from 278 HIV-exposed Haitian children, 3–24-months of age, in whom HIV-infection status was determined by NAT on the same DBS card.
The sensitivity and specificity of Up24 varied by the cut-point-OD value selected. At a cut-point-OD of 8-fold the standard deviation of the negative control (NCSD), sensitivity and specificity of Up24 were maximized [87.8% (95% CI, 83.9–91.6) and 92% (95% CI, 88.8–95.2), respectively]. In lower prevalence settings (5%), positive and negative predictive values of Up24 were maximal (75.9% and 98.8%, respectively) at a cut-point-OD that was 15-fold the NCSD.
In low prevalence settings, a high degree of specificity can be achieved with Up24 testing of HIV-exposed children when a higher cut-point OD is used, a feature that may facilitate more frequent use of Up24 antigen testing for HIV-exposed children.
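The dependence of predictive values on prevalence follows from Bayes' rule and can be sketched in a few lines. The function name is an assumption made for illustration. The inputs in the usage lines are the sensitivity and specificity reported above for the 8-fold NCSD cut-point, evaluated at 5% prevalence.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity at the 8-fold NCSD cut-point, 5% prevalence.
ppv_8fold, npv_8fold = predictive_values(0.878, 0.92, 0.05)
```

With these inputs the PPV is roughly 0.37 and the NPV roughly 0.99. The 75.9% PPV quoted above refers to the 15-fold cut-point, whose sensitivity and specificity are not given in the abstract, so it cannot be reproduced here. The comparison does show why a more specific cut-point matters when prevalence is low: at 5% prevalence, false positives from the uninfected majority dominate the PPV.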
Understanding temporal change in human behavior and psychological processes is a central issue in the behavioral sciences. With technological advances, intensive longitudinal data (ILD) are increasingly generated by studies of human behavior that repeatedly administer assessments over time. ILD offer unique opportunities to describe temporal behavioral changes in detail and identify related environmental and psychosocial antecedents and consequences. Traditional analytical approaches impose strong parametric assumptions about the nature of change in the relationship between time-varying covariates and outcomes of interest. This paper introduces time-varying effect models (TVEM) that explicitly model changes in the association between ILD covariates and ILD outcomes over time in a flexible manner. In this article, we describe unique research questions that the TVEM addresses, outline the model-estimation procedure, share a SAS macro for implementing the model, demonstrate model utility with a simulated example, and illustrate model applications in ILD collected as part of a smoking-cessation study to explore the relationship between smoking urges and self-efficacy during the course of the pre- and post-cessation period.
intensive longitudinal data; time-varying effect model; non-parametric; P-spline; applications
A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box–Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial.
Bayesian methods; hierarchical models; multivariate analysis; receiver operating characteristic (ROC) curve; Box–Cox transformation
Estimates of the sensitivity and specificity for new diagnostic tests based on evaluation against a known gold standard are imprecise when the accuracy of the gold standard is imperfect. Bayesian latent class models (LCMs) can be helpful under these circumstances, but the necessary analysis requires expertise in computational programming. Here, we describe open-access web-based applications that allow non-experts to apply Bayesian LCMs to their own data sets via a user-friendly interface.
Applications for Bayesian LCMs were constructed on a web server using R and WinBUGS programs. The models provided (http://mice.tropmedres.ac) include two Bayesian LCMs: the two-tests in two-population model (Hui and Walter model) and the three-tests in one-population model (Walter and Irwig model). Both models are available with simplified and advanced interfaces. In the former, all settings for Bayesian statistics are fixed as defaults. Users input their data set into a table provided on the webpage. Disease prevalence and accuracy of diagnostic tests are then estimated using the Bayesian LCM, and provided on the web page within a few minutes. With the advanced interfaces, experienced researchers can modify all settings in the models as needed. These settings include correlation among diagnostic test results and prior distributions for all unknown parameters. The web pages provide worked examples with both models using the original data sets presented by Hui and Walter in 1980, and by Walter and Irwig in 1988. We also illustrate the utility of the advanced interface using the Walter and Irwig model on a data set from a recent melioidosis study. The results obtained from the web-based applications were comparable to those published previously.
The newly developed web-based applications are open-access and provide an important new resource for researchers worldwide to evaluate new diagnostic tests.
Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified.
Diagnostic test; Model misspecification; Propensity score; Sensitivity; Specificity
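Verification bias and its correction under MAR can be illustrated with a minimal sketch. This is the classical correction that conditions on the test result alone (commonly attributed to Begg and Greenes), not the propensity-score stratification method proposed in the abstract above. The function names and the counts are hypothetical.

```python
def naive_accuracy(vp_dis, vp_nodis, vn_dis, vn_nodis):
    """Sensitivity and specificity computed from verified subjects only."""
    sens = vp_dis / (vp_dis + vn_dis)
    spec = vn_nodis / (vn_nodis + vp_nodis)
    return sens, spec


def corrected_accuracy(n_pos, n_neg, vp_dis, vp_nodis, vn_dis, vn_nodis):
    """Correct for verification bias, assuming verification depends only on
    the test result (MAR given the test result).

    n_pos, n_neg     : all subjects testing positive / negative
    vp_dis, vp_nodis : verified test-positives with / without disease
    vn_dis, vn_nodis : verified test-negatives with / without disease
    """
    p_pos = n_pos / (n_pos + n_neg)               # P(T+), from everyone
    p_dis_pos = vp_dis / (vp_dis + vp_nodis)      # P(D+ | T+), verified only
    p_dis_neg = vn_dis / (vn_dis + vn_nodis)      # P(D+ | T-), verified only
    # Bayes' rule: P(T+ | D+) and P(T- | D-)
    sens = p_dis_pos * p_pos / (p_dis_pos * p_pos + p_dis_neg * (1 - p_pos))
    spec = ((1 - p_dis_neg) * (1 - p_pos)
            / ((1 - p_dis_neg) * (1 - p_pos) + (1 - p_dis_pos) * p_pos))
    return sens, spec


# Hypothetical study: 200 test-positives (160 verified) and
# 800 test-negatives (only 80 verified).
naive = naive_accuracy(120, 40, 8, 72)
corrected = corrected_accuracy(200, 800, 120, 40, 8, 72)
```

In this made-up example, test-positives are verified far more often than test-negatives. The naive estimates therefore overstate sensitivity and understate specificity relative to the corrected ones.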
We propose a new general Bayesian latent class model, based on a computationally intensive approach, for evaluating the performance of multiple diagnostic tests in situations in which no gold standard test exists. The modeling represents an interesting and suitable alternative to models with complex structures that involve the general case of several conditionally independent diagnostic tests, covariates, and strata with different disease prevalences. The technique of stratifying the population according to different disease prevalence rates does not add further marked complexity to the modeling, but it makes the model more flexible and interpretable. To illustrate the general model proposed, we evaluate the performance of six diagnostic screening tests for Chagas disease considering some epidemiological variables. Serology at the time of donation (negative, positive, inconclusive) was considered as a factor of stratification in the model. The general model with stratification of the population performed better than its competitors without stratification. The group formed by the testing laboratory Biomanguinhos FIOCRUZ-kit (c-ELISA and rec-ELISA) is the best option in the confirmation process, presenting a false-negative rate of 0.0002% under the serial scheme. We are 100% sure that the donor is healthy when these two tests have negative results and that the donor is chagasic when they have positive results.
The proportional odds model may serve as a useful alternative to the Cox proportional hazards model to study association between covariates and their survival functions in medical studies. In this article, we study an extended proportional odds model that incorporates the so-called “external” time-varying covariates. In the extended model, regression parameters have a direct interpretation of comparing survival functions, without specifying the baseline survival odds function. Semiparametric and maximum likelihood estimation procedures are proposed to estimate the extended model. Our methods are demonstrated by Monte-Carlo simulations, and applied to a landmark randomized clinical trial of a short course Nevirapine (NVP) for mother-to-child transmission (MTCT) of human immunodeficiency virus type-1 (HIV-1). Additional application includes analysis of the well-known Veterans Administration (VA) Lung Cancer Trial.
Counting process; Estimating function; HIV/AIDS; Maximum likelihood estimation; Semiparametric model; Time-varying covariate
Statistical methods such as latent class analysis can estimate the sensitivity and specificity of diagnostic tests when no perfect reference test exists. Traditional latent class methods assume a constant disease prevalence in one or more tested populations. When the risk of disease varies in a known way, these models fail to take advantage of additional information that can be obtained by measuring risk factors at the level of the individual. We show that by incorporating complex field-based epidemiologic data, in which the disease prevalence varies as a continuous function of individual-level covariates, our model produces more accurate sensitivity and specificity estimates than previous methods. We apply this technique to a simulated population and to actual Chagas disease test data from a community near Arequipa, Peru. Results from our model estimate that the first-line enzyme-linked immunosorbent assay has a sensitivity of 78% (95% CI: 62–100%) and a specificity of 100% (95% CI: 99–100%). The confirmatory immunofluorescence assay is estimated to be 73% sensitive (95% CI: 65–81%) and 99% specific (95% CI: 96–100%).
Chagas disease; latent class analysis; Trypanosoma cruzi
Researchers modeling historical heights have typically relied on the restrictive assumption of a normal distribution, only the mean of which is affected by age, income, nutrition, disease, and similar influences. To avoid these restrictive assumptions, we develop a new semiparametric approach in which covariates are allowed to affect the entire distribution without imposing any parametric shape. We apply our method to a new database of height distributions for Italian provinces, drawn from conscription records, of unprecedented length and geographical disaggregation. Our method allows us to standardize distributions to a single age and calculate moments of the distribution that are comparable through time. Our method also allows us to generate counterfactual distributions for a range of ages, from which we derive age-height profiles. These profiles reveal how the adolescent growth spurt (AGS) distorts the distribution of stature, and they document the earlier and earlier onset of the AGS as living conditions improved over the second half of the nineteenth century. Our new estimates of provincial mean height also reveal a previously unnoticed “regime switch” from regional convergence to divergence in this period.
We consider the estimation of the parameters indexing a parametric model for the conditional distribution of a diagnostic marker given covariates and disease status. Such models are useful for the evaluation of whether and to what extent a marker’s ability to accurately detect or discard disease depends on patient characteristics. A frequent problem that complicates the estimation of the model parameters is that estimation must be conducted from observational studies. Often, in such studies not all patients undergo the gold standard assessment of disease. Furthermore, the decision as to whether a patient undergoes verification is not controlled by study design. In such scenarios, maximum likelihood estimators based on subjects with observed disease status are generally biased. In this paper, we propose estimators for the model parameters that adjust for selection to verification that may depend on measured patient characteristics and additionally adjust for an assumed degree of residual association. Such estimators may be used as part of a sensitivity analysis for plausible degrees of residual association. We describe a doubly robust estimator that has the attractive feature of being consistent if either a model for the probability of selection to verification or a model for the probability of disease among the verified subjects (but not necessarily both) is correct.
Missing at Random; Nonignorable; Missing Covariate; Sensitivity Analysis; Semiparametric; Diagnosis
Modelling is fundamental to many fields of science and engineering. A model can be thought of as a representation of possible data one could predict from a system. The probabilistic approach to modelling uses probability theory to express all aspects of uncertainty in the model. The probabilistic approach is synonymous with Bayesian modelling, which simply uses the rules of probability theory in order to make predictions, compare alternative models, and learn model parameters and structure from data. This simple and elegant framework is most powerful when coupled with flexible probabilistic models. Flexibility is achieved through the use of Bayesian non-parametrics. This article provides an overview of probabilistic modelling and an accessible survey of some of the main tools in Bayesian non-parametrics. The survey covers the use of Bayesian non-parametrics for modelling unknown functions, density estimation, clustering, time-series modelling, and representing sparsity, hierarchies, and covariance structure. More specifically, it gives brief non-technical overviews of Gaussian processes, Dirichlet processes, infinite hidden Markov models, Indian buffet processes, Kingman’s coalescent, Dirichlet diffusion trees and Wishart processes.
probabilistic modelling; Bayesian statistics; non-parametrics; machine learning
We present a semi-parametric deconvolution estimator for the density function of a random variable X that is measured with error, a common challenge in many epidemiological studies. Traditional deconvolution estimators rely only on assumptions about the distribution of X and the error in its measurement, and ignore information available in auxiliary variables. Our method assumes the availability of a covariate vector statistically related to X by a mean–variance function regression model, where regression errors are normally distributed and independent of the measurement errors. Simulations suggest that the estimator achieves a much lower integrated squared error than the observed-data kernel density estimator when models are correctly specified and the assumption of normal regression errors is met. We illustrate the method using anthropometric measurements of newborns to estimate the density function of newborn length.
density estimation; measurement error; mean–variance function model
The timing of mother-to-child transmission (MTCT) of HIV is critical in understanding the dynamics of MTCT. It has important implications for developing effective treatment or prevention strategies for such transmissions. In this paper, we develop an imputation method to analyze the censored MTCT timing in the presence of auxiliary information. Specifically, we first propose a statistical model based on the hazard functions of the MTCT timing to reflect three MTCT modes: in utero, during delivery and via breastfeeding, with different shapes of the baseline hazard that vary between infants. This model also allows for the possibility that the majority of infants may be immune to MTCT of HIV. Then, the model is fitted by MCMC to explore marginal inferences via multiple imputation. Moreover, we propose a simple and straightforward approach to take into account the imperfect sensitivity in the imputation step, and study appropriate censoring techniques to account for weaning. Our method is assessed by simulations, and applied to a large trial designed to assess the use of antibiotics in preventing MTCT of HIV.
HIV/AIDS; mixture models; mother to child transmission of HIV; multiple imputation
To evaluate the probabilities of a disease state, ideally all subjects in a study should be diagnosed by a definitive diagnostic or gold standard test. However, since definitive diagnostic tests are often invasive and expensive, it is generally unethical to apply them to subjects whose screening tests are negative. In this article, we consider latent class models for screening studies with two imperfect binary diagnostic tests and a definitive categorical disease status measured only for those with at least one positive screening test. Specifically, we discuss a conditional independent and three homogeneous conditional dependent latent class models and assess the impact of misspecification of the dependence structure on the estimation of disease category probabilities using frequentist and Bayesian approaches. Interestingly, the three homogeneous dependent models can provide identical goodness-of-fit but substantively different estimates for a given study. However, the parametric form of the assumed dependence structure itself is not “testable” from the data, and thus the dependence structure modeling considered here can only be viewed as a sensitivity analysis concerning a more complicated non-identifiable model potentially involving heterogeneous dependence structure. Furthermore, we discuss Bayesian model averaging together with its limitations as an alternative way to partially address this particularly challenging problem. The methods are applied to two cancer screening studies, and simulations are conducted to evaluate the performance of these methods. In summary, further research is needed to reduce the impact of model misspecification on the estimation of disease prevalence in such settings.
maximum likelihood; Bayesian inference; diagnostic test; dependence; screening; latent class models
We study a mixed-effects model in which the response and the main covariate are linked by position. While the covariate corresponding to the observed response is not directly observable, there exists a latent covariate process that represents the underlying positional features of the covariate. When the positional features and the underlying distributions are parametric, the expectation-maximization (EM) algorithm is the most commonly used procedure. Without the parametric assumptions, however, the practical feasibility of a semiparametric EM algorithm and the corresponding inference procedures remains to be investigated. In this paper, we propose a semiparametric approach, and identify the conditions under which the semiparametric estimators share the same asymptotic properties as the unachievable estimators using the true values of the latent covariate; that is, the oracle property is achieved. We propose a Monte Carlo graphical evaluation tool to assess the adequacy of the sample size for achieving the oracle property. The semiparametric approach is later applied to data from a colon carcinogenesis study on the effects of cell DNA damage on the expression level of oncogene bcl-2. The graphical evaluation shows that, with a moderate number of subunits, the numerical performance of the semiparametric estimator is very close to the asymptotic limit. This indicates that a complex EM-based implementation may at most achieve minimal improvement and is thus unnecessary.
Carcinogenesis; Consistency; Generalized estimating equation; Local linear smoothing; Mixed-effects model
Exposure lagging and exposure-time window analysis are 2 widely used approaches to allow for induction and latency periods in analyses of exposure-disease associations. Exposure lagging implies a strong parametric assumption about the temporal evolution of the exposure-disease association. An exposure-time window analysis allows for a more flexible description of temporal variation in exposure effects but may result in unstable risk estimates that are sensitive to how windows are defined. The authors describe a hierarchical regression approach that combines time window analysis with a parametric latency model. They illustrate this approach using data from 2 occupational cohort studies: studies of lung cancer mortality among 1) asbestos textile workers and 2) uranium miners. For each cohort, an exposure-time window analysis was compared with a hierarchical regression analysis with shrinkage toward a simpler, second-stage parametric latency model. In each cohort analysis, there is substantial stability gained in time window-specific estimates of association by using a hierarchical regression approach. The proposed hierarchical regression model couples a time window analysis with a parametric latency model; this approach provides a way to stabilize risk estimates derived from a time window analysis and a way to reduce bias arising from misspecification of a parametric latency model.
cohort studies; hierarchical model; latency; neoplasms; regression
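The two exposure summaries contrasted in the abstract above can be sketched as follows. The function names, the window boundaries, and the exposure history are hypothetical. The sketch covers only how lagged and window-specific exposures are computed, not the hierarchical regression itself.

```python
def lagged_cumulative(exposure_by_year, at_year, lag):
    """Cumulative exposure at `at_year`, ignoring exposure received
    during the most recent `lag` years (exposure lagging)."""
    return sum(dose for year, dose in exposure_by_year.items()
               if year <= at_year - lag)


def window_exposure(exposure_by_year, at_year, start, end):
    """Exposure received at least `start` and fewer than `end` years
    before `at_year` (one exposure-time window)."""
    return sum(dose for year, dose in exposure_by_year.items()
               if start <= at_year - year < end)


# Hypothetical exposure history: calendar year -> dose in arbitrary units.
history = {1950: 2.0, 1955: 1.0, 1960: 3.0, 1968: 4.0}

lag10 = lagged_cumulative(history, 1970, lag=10)
windows = [window_exposure(history, 1970, a, b)
           for a, b in [(0, 10), (10, 20), (20, 30)]]
```

A 10-year lag amounts to giving the most recent window a weight of zero and all earlier windows equal weight. This is the sense in which lagging is a strong parametric assumption, whereas a window analysis estimates a separate effect for each window.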
Recent trends toward earlier access to anti-retroviral treatment underline the importance of accurate HIV diagnosis. The WHO HIV testing strategy recommends the use of two or three rapid diagnostic tests (RDTs) combined in an algorithm and assumes a population is serologically stable over time. Yet RDTs are prone to cross reactivity, which can lead to false positive or discordant results. This paper uses discordancy data from Médecins Sans Frontières (MSF) programmes to test the hypothesis that the specificity of RDTs changes over place and time.
Data were drawn from all MSF test centres in 2007-8 using a parallel testing algorithm. A Bayesian approach was used to derive estimates of disease prevalence, and of test sensitivity and specificity using the software WinBUGS. A comparison of models with different levels of complexity was performed to assess the evidence for changes in test characteristics by location and over time.
106,035 individuals were included from 51 centres in 10 countries using 7 different RDTs. Discordancy patterns were found to vary by location and time. Model fit statistics confirmed this, with improved fit to the data when test specificity and sensitivity were allowed to vary by centre and over time. Two examples show evidence of variation in specificity between different testing locations within a single country. Finally, within a single test centre, variation in specificity was seen over time with one test becoming more specific and the other less specific.
This analysis demonstrates the variable specificity of multiple HIV RDTs over geographic location and time. This variability suggests that cross reactivity is occurring and indicates a higher than previously appreciated risk of false positive HIV results using the current WHO testing guidelines. Given the significant consequences of false HIV diagnosis, we suggest that current testing and evaluation strategies be reviewed.