
- Summary
- 1. Introduction
- 2. Maximum Likelihood Estimators for Models with Homogeneous Dependence
- 3. Bayesian Estimation for Models with Homogeneous Dependence
- 4. Two Case Studies
- 5. Simulation Studies
- 6. Discussion
- Reference List

Stat Med. Author manuscript; available in PMC 2011 May 20. Published in final edited form as: Stat Med. 2010 May 20; 29(11): 1206–1218. doi: 10.1002/sim.3862. PMCID: PMC2879599. NIHMSID: NIHMS202362.


To evaluate the probabilities of a disease state, ideally all subjects in a study should be diagnosed by a definitive diagnostic or gold standard test. However, since definitive diagnostic tests are often invasive and expensive, it is generally unethical to apply them to subjects whose screening tests are negative. In this article, we consider latent class models for screening studies with two imperfect binary diagnostic tests and a definitive categorical disease status measured only for those with at least one positive screening test. Specifically, we discuss a conditional independence model and three homogeneous conditional dependence latent class models, and assess the impact of misspecification of the dependence structure on the estimation of disease category probabilities using frequentist and Bayesian approaches. Interestingly, the three homogeneous dependence models can provide identical goodness-of-fit but substantively different estimates for a given study. However, the parametric form of the assumed dependence structure itself is not “testable” from the data, and thus the dependence structure modeling considered here can only be viewed as a sensitivity analysis concerning a more complicated non-identifiable model potentially involving a heterogeneous dependence structure. Furthermore, we discuss Bayesian model averaging, together with its limitations, as an alternative way to partially address this particularly challenging problem. The methods are applied to two cancer screening studies, and simulations are conducted to evaluate the performance of these methods. In summary, further research is needed to reduce the impact of model misspecification on the estimation of disease prevalence in such settings.

Screening for a specific disease or condition is a fundamental component of human disease control and prevention. The objective of screening is to classify asymptomatic people as likely or unlikely to have the disease or condition of interest. People who appear likely to have the disease or condition are examined further for a diagnosis, and those people who are diagnosed with the disease are treated. Therefore, screening can reduce the morbidity and mortality of the disease among people screened and can enable early treatment for diagnosed cases. Screening programs for cancer and heart diseases are well established in many countries. In many screening programs, a population with known size *n* is screened by two imperfect binary diagnostic tests. If the results of both diagnostic tests are negative, no further screening is undertaken. If either of the two diagnostic tests is positive, then a full evaluation of the disease using a gold standard classification is undertaken [1].

For estimating diagnostic accuracy without a gold standard, it is well known that if the conditional independence assumption is incorrectly assumed, parameter estimates may be biased [2–4]. When the disease status D is a binary random variable, Albert and Dodd [5] showed that the estimation of diagnostic accuracy and prevalence is sensitive to the choice of dependence structure for studies with multiple diagnostic tests. The dependence structure was specified using a Gaussian random effects model [6,7] and a finite mixture model [8]. They showed that it is difficult to distinguish between different dependence structures in the absence of a gold standard test in most practical situations (i.e., unless there are more than 10 tests). Albert [9] proposed methods for estimating the diagnostic accuracy of multiple binary tests with an imperfect reference standard when information about the diagnostic accuracy of the imperfect test is available from external data sources. Furthermore, using the same dependence structure, Albert and Dodd [10] examined the effect of model misspecification on the estimation of test accuracy and prevalence when a binary gold standard is partially verified. They showed that for extremely biased sampling the estimation is sensitive to the choice of dependence structure. Other latent class models with a focus on diagnostic accuracy have also been considered in a single study [11,12] as well as in a meta-analysis [13]. In addition, Black and Craig [14] discussed the estimation of disease prevalence in a scenario involving two imperfect tests in the absence of a gold standard and proposed Bayesian model averaging for inference over the conditional independence and dependence models. However, those dependence models are not directly applicable in the setting that we are considering because there are only two diagnostic tests, and more importantly, if both diagnostic tests are negative, no further gold standard classification will be applied.

Let T_{1}, T_{2}, and D be the random variables denoting the two screening tests and the disease status, respectively. In this article, we consider T_{1} and T_{2} to be binary variables with value 1 indicating test positive and 0 indicating test negative, and D to be a categorical variable with value d = *1, 2,*…*, K* indicating the classes of disease. Let
${x}_{ij}^{d}$ be the frequency with *D* = *d*, *T*_{1} = *i* and *T*_{2} = *j* (*i = 0, 1* and *j = 0, 1*), observed only when *i* + *j* > 0 because disease status is not verified when both tests are negative,
${x}_{ij}={\displaystyle \sum _{d}}{x}_{ij}^{d}$ be the observed frequency with *T*_{1} = *i* and *T*_{2} = *j*, and
$n={\displaystyle \sum _{i}}{\displaystyle \sum _{j}}{x}_{ij}$ be the total number of observations. Furthermore, let ${\pi}_{ij}=Pr({T}_{1}=i,{T}_{2}=j)$ be the marginal probability of the pair of test results and ${\pi}_{ij}^{d}=Pr(D=d\mid {T}_{1}=i,{T}_{2}=j)$ be the conditional probability of disease class *d* given the test results.

One way to write the likelihood function (ignoring constant terms) in this setting is in terms of ${P}_{d}=Pr(D=d)$ and
${P}_{ij}^{d}=Pr({T}_{1}=i,{T}_{2}=j\mid D=d)$ (*d* = *1, 2,* …*, K*; *i, j* = *0, 1*):

$${x}_{00}log\left(\sum _{d}{P}_{00}^{d}{P}_{d}\right)+\sum _{d}{x}_{11}^{d}log({P}_{11}^{d}{P}_{d})+\sum _{d}{x}_{10}^{d}log({P}_{10}^{d}{P}_{d})+\sum _{d}{x}_{01}^{d}log({P}_{01}^{d}{P}_{d}).$$

(1)

This parameterization involves a mixture likelihood in the first term and prevents a closed-form solution for the maximum likelihood estimators (MLEs). It contains 4*K*−1 free parameters. Without further assumptions, the parameters in equation (1) are not identifiable. However, this parameterization allows for direct specification of commonly used assumptions, usually specified through constraints on
${P}_{ij}^{d}$. For example, the frequently used conditional independence assumption [15,16] assumes that the two tests T_{1} and T_{2} are independent conditional on the disease status D, i.e., T_{1} ⊥ T_{2} | D, and the number of free parameters in equation (1) is reduced to 3*K*−1, giving model identification, since
${P}_{11}^{d}={P}_{1+}^{d}{P}_{+1}^{d},{P}_{10}^{d}={P}_{1+}^{d}(1-{P}_{+1}^{d}),{P}_{01}^{d}={P}_{+1}^{d}(1-{P}_{1+}^{d})$, and
${P}_{00}^{d}=(1-{P}_{1+}^{d})(1-{P}_{+1}^{d})$, where ${P}_{1+}^{d}$ and ${P}_{+1}^{d}$ denote the marginal probabilities of T_{1} = 1 and T_{2} = 1 given *D* = *d*, respectively. For convenience, we denote the conditional independence model as the ⊥ model, with
${\widehat{P}}_{d}^{\perp}$ as the corresponding MLEs. Under the homogeneous dependence assumptions (i.e., the α, θ, and ρ models that will be discussed in Sections 2 and 3), the number of free parameters in equation (1) is reduced to 3*K*, and the models become saturated and equivalent to the alternative parameterization below [17].
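As a quick numerical check of the conditional independence factorization, the sketch below (plain Python, with illustrative marginal probabilities not taken from the case studies) builds the four cell probabilities for one disease class and confirms that they sum to one and force the conditional odds ratio to one:

```python
def indep_cells(p1plus, pplus1):
    """Cell probabilities P_ij^d under conditional independence:
    each cell factorizes into its two conditional margins."""
    p11 = p1plus * pplus1
    p10 = p1plus * (1 - pplus1)
    p01 = (1 - p1plus) * pplus1
    p00 = (1 - p1plus) * (1 - pplus1)
    return p11, p10, p01, p00

# Illustrative margins for one disease class (hypothetical values)
p11, p10, p01, p00 = indep_cells(0.9, 0.8)

total = p11 + p10 + p01 + p00           # probabilities sum to one
odds_ratio = (p11 * p00) / (p10 * p01)  # conditional independence forces 1
```

This makes concrete why the independence model drops one free parameter per class: the odds ratio is no longer free.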

An alternative parameterization of the log-likelihood function can be written in terms of *π _{ij}* (*i, j* = *0, 1*) and ${\pi}_{ij}^{d}$ (*i* + *j* > 0):

$$\sum _{i}\sum _{j}{x}_{ij}log({\pi}_{ij})+\sum _{d}{x}_{11}^{d}log({\pi}_{11}^{d})+\sum _{d}{x}_{10}^{d}log({\pi}_{10}^{d})+\sum _{d}{x}_{01}^{d}log({\pi}_{01}^{d}).$$

(2)

This representation relates to previous work in other settings [18–20]. This model is a saturated model with 3*K* parameters. The maximum likelihood equations are tractable and yield MLEs in closed form. Omitting the algebra, we obtain ${\widehat{\pi}}_{ij}={x}_{ij}/n$ (*i, j* = *0, 1*) and ${\widehat{\pi}}_{ij}^{d}={x}_{ij}^{d}/{x}_{ij}$ (*i* + *j* > 0).
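A minimal sketch of these closed-form MLEs (observed cell proportions, and class proportions within each verified cell), using a small hypothetical table of counts that is not taken from the case studies:

```python
# Hypothetical counts x_ij^d for K = 2 disease classes (i + j > 0 cells only),
# plus the double-negative count x_00 whose disease class is unverified.
x = {(1, 1): [30, 10], (1, 0): [25, 5], (0, 1): [20, 4]}
x00 = 906
n = x00 + sum(sum(v) for v in x.values())

# pi_ij = x_ij / n  (marginal cell probabilities, including the (0,0) cell)
pi = {ij: sum(v) / n for ij, v in x.items()}
pi[(0, 0)] = x00 / n

# pi_ij^d = x_ij^d / x_ij  (class probabilities within each verified cell)
pi_d = {ij: [c / sum(v) for c in v] for ij, v in x.items()}
```

Each set of proportions is a separate multinomial MLE, which is why the solution is closed-form.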

In similar settings where the gold standard was only measured on those who screened positive, Cheng et al. [22] and Pepe and Alonzo [23] have examined the potentially overwhelming impact of the correlation between the two screening tests on the estimation of absolute test accuracy parameters. Both suggest using relative test accuracy for comparing disease screening tests. However, to our knowledge, no one has assessed the impact of misspecification of the conditional dependence structure, which can be specified by a homogeneous dependence parameter for two diagnostic tests, on the estimation of disease class probabilities in such screen-positive ascertained studies.

In this article, we empirically assess the impact of misspecification of the conditional dependence structure on the estimation of disease class probabilities through two case studies and simulations. Specifically, in Sections 2.1 and 2.2, we define the MLEs for the homogeneous dependence α and θ models, and in Section 2.3 we propose the ρ model, a homogeneous conditional dependence model based on the correlation coefficient. Bayesian approaches, which incorporate prior beliefs about dependence, are developed for the three models in Section 3 as an alternative to the maximum likelihood methods. Furthermore, we discuss Bayesian model averaging in Section 3.4 as an alternative way to address this challenging estimation problem, since the three homogeneous dependence models can provide the same goodness-of-fit for the data but substantively different estimates [17], and the dependence structure itself is not “testable”. In Section 4, we compare the results for the two case studies using both the maximum likelihood methods and the Bayesian approaches. The two case studies were reanalyzed recently by Böhning and Patilea [1] using a capture-recapture approach under the α and θ model assumptions. Our focus here is to compare the estimates under the α, θ and ρ model assumptions using both the maximum likelihood methods and Bayesian approaches. A simulation study is conducted in Section 5 and a brief discussion is presented in Section 6.

In Sections 2.1 and 2.2, we will briefly introduce the homogeneous conditional dependence α and θ models, recently proposed by Böhning and Patilea [1] using a capture-recapture approach. Using the alternative parameterization of the likelihood as presented in equation (2), Chu and Nie [17] presented closed-form maximum likelihood solutions under the α and θ model assumptions.

Under this model, the association of the two tests T_{1} and T_{2} conditional on the disease status D, as measured by the odds ratio, is assumed to be homogeneous over all disease categories, i.e.,
${\alpha}_{d}=\frac{{P}_{11}^{d}{P}_{00}^{d}}{{P}_{01}^{d}{P}_{10}^{d}}=\alpha $ for all *d* = *1, 2,* …*, K*. By Bayes’ theorem, we obtain
${\alpha}_{d}=\frac{{\pi}_{11}{\pi}_{00}}{{\pi}_{01}{\pi}_{10}}\times \frac{{\pi}_{11}^{d}{\pi}_{00}^{d}}{{\pi}_{01}^{d}{\pi}_{10}^{d}}$. With
$\sum _{d}{\pi}_{00}^{d}=1$ and simple algebra, we obtain the solution of
$\alpha =\frac{{\pi}_{11}{\pi}_{00}}{{\pi}_{01}{\pi}_{10}}\times {\left[{\displaystyle \sum _{d}}\frac{{\pi}_{01}^{d}{\pi}_{10}^{d}}{{\pi}_{11}^{d}}\right]}^{-1}$ under this homogeneity assumption. Thus, by plugging in the closed-form MLEs of ${\pi}_{ij}$ (*i, j* = *0, 1*) and
${\pi}_{ij}^{d}\phantom{\rule{0.16667em}{0ex}}(i+j>0)$ from equation (2), the closed-form MLEs of α and
${P}_{d}^{\alpha}$ are

$$\widehat{\alpha}={x}_{00}{\left(\sum _{d}\frac{{x}_{01}^{d}{x}_{10}^{d}}{{x}_{11}^{d}}\right)}^{-1},{\widehat{P}}_{d}^{\alpha}=\frac{1}{n}\left[{x}_{11}^{d}+{x}_{10}^{d}+{x}_{01}^{d}+{x}_{00}\frac{{x}_{01}^{d}{x}_{10}^{d}}{{x}_{11}^{d}}{\left(\sum _{d}\frac{{x}_{01}^{d}{x}_{10}^{d}}{{x}_{11}^{d}}\right)}^{-1}\right],$$

(3)

where the superscript α indicates the homogeneous odds ratio assumption.
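These estimators can be sketched in plain Python by plugging the closed-form MLEs of equation (2) into the homogeneous odds ratio constraint. The counts below are hypothetical, and `A` is a local name for the allocation weight ${x}_{01}^{d}{x}_{10}^{d}/{x}_{11}^{d}$ that distributes the unverified double negatives across disease classes:

```python
# Hypothetical verified counts x_ij^d for K = 2 disease classes,
# plus the unverified double-negative count x00 (illustrative values).
x11 = [30, 10]; x10 = [25, 5]; x01 = [20, 4]; x00 = 906
n = x00 + sum(x11) + sum(x10) + sum(x01)
K = len(x11)

# A_d = x01^d * x10^d / x11^d: weight allocating x00 across classes
A = [x01[d] * x10[d] / x11[d] for d in range(K)]
sum_A = sum(A)

# Homogeneous conditional odds ratio and class probabilities
alpha_hat = x00 / sum_A
P_alpha = [(x11[d] + x10[d] + x01[d] + x00 * A[d] / sum_A) / n
           for d in range(K)]
```

Note that the allocated shares of `x00` add back up to `x00`, so the estimated class probabilities sum to one by construction.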

Under this model, the ratio of the conditional probability (conditional on test T_{2} being positive) to the unconditional probability of test T_{1} being positive is assumed to be homogeneous over all disease categories, i.e.,
${\theta}_{d}=\frac{{P}_{1\mid 1}^{d}}{{P}_{1+}^{d}}=\theta $ for all *d* = *1, 2,* …*, K*. By Bayes’ theorem, we obtain
${\theta}_{d}=\frac{{\pi}_{11}{\pi}_{11}^{d}}{({\pi}_{01}{\pi}_{01}^{d}+{\pi}_{11}{\pi}_{11}^{d})({\pi}_{10}{\pi}_{10}^{d}+{\pi}_{11}{\pi}_{11}^{d})}\times {P}_{d}$. With
$\sum _{d}{P}_{d}=1$ and simple algebra, we obtain the solution of
$\theta ={\left[{\displaystyle \sum _{d}}\frac{1}{{\pi}_{11}{\pi}_{11}^{d}}({\pi}_{10}{\pi}_{10}^{d}+{\pi}_{11}{\pi}_{11}^{d})({\pi}_{01}{\pi}_{01}^{d}+{\pi}_{11}{\pi}_{11}^{d})\right]}^{-1}$. Thus, the closed-form MLEs of θ and
${P}_{d}^{\theta}$ are

$$\widehat{\theta}=n{\left[\sum _{d}\frac{1}{{x}_{11}^{d}}({x}_{10}^{d}+{x}_{11}^{d})({x}_{01}^{d}+{x}_{11}^{d})\right]}^{-1},{\widehat{P}}_{d}^{\theta}=\frac{{x}_{1+}^{d}{x}_{+1}^{d}}{{x}_{11}^{d}}{\left(\sum _{d}\frac{{x}_{1+}^{d}{x}_{+1}^{d}}{{x}_{11}^{d}}\right)}^{-1},$$

(4)

where ${x}_{1+}^{d}={x}_{11}^{d}+{x}_{10}^{d}$, ${x}_{+1}^{d}={x}_{11}^{d}+{x}_{01}^{d}$, and the superscript θ indicates the homogeneous relative risk assumption.
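The θ-model estimators admit a similarly short sketch: plugging the MLEs of equation (2) into the θ homogeneity constraint gives a normalized Lincoln–Petersen-type quantity per class. The counts are hypothetical, and `u` is a local name for ${x}_{1+}^{d}{x}_{+1}^{d}/{x}_{11}^{d}$:

```python
# Hypothetical verified counts x_ij^d for K = 2 classes (illustrative values).
x11 = [30, 10]; x10 = [25, 5]; x01 = [20, 4]; x00 = 906
n = x00 + sum(x11) + sum(x10) + sum(x01)
K = len(x11)

# u_d = x_{1+}^d * x_{+1}^d / x_11^d, with x_{1+}^d = x11^d + x10^d, etc.
u = [(x11[d] + x10[d]) * (x11[d] + x01[d]) / x11[d] for d in range(K)]
sum_u = sum(u)

theta_hat = n / sum_u                        # homogeneous ratio theta
P_theta = [u[d] / sum_u for d in range(K)]   # sums to one by construction
```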

In this section, we propose an alternative homogeneous conditional dependence model, the ρ model. Under this model, the correlation of the two tests T_{1} and T_{2} conditional on the disease status D is assumed to be homogeneous over all disease categories, i.e., ${\rho}_{d}=\frac{{P}_{11}^{d}-{P}_{1+}^{d}{P}_{+1}^{d}}{\sqrt{{P}_{1+}^{d}(1-{P}_{1+}^{d}){P}_{+1}^{d}(1-{P}_{+1}^{d})}}=\rho $ for all *d* = *1, 2,* …*, K*. Because ρ is a correlation between two binary variables, it must lie within the feasible range determined by the marginal positive probabilities ${P}_{1+}^{d}$ and ${P}_{+1}^{d}$ of all disease categories:

$$\underset{d}{max}\left\{-\sqrt{\frac{{P}_{+1}^{d}{P}_{1+}^{d}}{(1-{P}_{+1}^{d})(1-{P}_{1+}^{d})}},-\sqrt{\frac{(1-{P}_{+1}^{d})(1-{P}_{1+}^{d})}{{P}_{+1}^{d}{P}_{1+}^{d}}}\right\}\le \rho \le \underset{d}{min}\left\{\sqrt{\frac{(1-{P}_{+1}^{d}){P}_{1+}^{d}}{{P}_{+1}^{d}(1-{P}_{1+}^{d})}},\sqrt{\frac{{P}_{+1}^{d}(1-{P}_{1+}^{d})}{(1-{P}_{+1}^{d}){P}_{1+}^{d}}}\right\}.$$

(5)

Let the MLEs of ρ and *P _{d}* be denoted as $\widehat{\rho}$ and
${\widehat{P}}_{d}^{\rho}$, where the superscript ρ indicates the homogeneous correlation coefficient assumption. They do not have closed-form solutions under this model.
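Assuming the standard bounds for the correlation of two binary variables with given margins, the feasible range in equation (5) can be computed directly. The sketch below uses a hypothetical helper `rho_bounds` and illustrative margins:

```python
from math import sqrt

def rho_bounds(margins):
    """Feasible range of a common correlation rho given per-class
    marginal positivity probabilities (P_{1+}^d, P_{+1}^d)."""
    lows, highs = [], []
    for p, q in margins:  # p = P_{1+}^d, q = P_{+1}^d
        a = sqrt((p * q) / ((1 - p) * (1 - q)))
        b = sqrt((p * (1 - q)) / (q * (1 - p)))
        lows.append(max(-a, -1 / a))   # lower bound for class d
        highs.append(min(b, 1 / b))    # upper bound for class d
    return max(lows), min(highs)       # intersection over classes

# Illustrative margins for two disease classes (hypothetical values)
lo, hi = rho_bounds([(0.9, 0.8), (0.3, 0.2)])
```

Because the common ρ must be feasible for every class simultaneously, the range is the intersection of the per-class ranges, which can be considerably narrower than [−1, 1].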

All three models assume a homogeneous dependence structure, which is a rather strong assumption. However, because all three homogeneous dependence models are already saturated, heterogeneous dependence models are not identifiable without additional constraints on the test accuracy parameters (e.g., assuming the test accuracy parameters are the same for the two diagnostic tests, which is a much stronger assumption in general). Furthermore, because the three homogeneous dependence models can provide the same goodness-of-fit for the data but substantively different estimates [17], a natural way of addressing this problem might be through frequentist model-averaged estimators [24]. Let *w _{α}*, *w _{θ}*, and *w _{ρ}* be nonnegative weights summing to one; a model-averaged estimator then takes the form ${\widehat{P}}_{d}={w}_{\alpha}{\widehat{P}}_{d}^{\alpha}+{w}_{\theta}{\widehat{P}}_{d}^{\theta}+{w}_{\rho}{\widehat{P}}_{d}^{\rho}$.

In practice, it is often of interest to test the difference between the estimated probabilities of disease states under different dependence assumptions (i.e., the α, θ or ρ model). Since the closed-form maximum likelihood solutions for the α and θ models are based on the likelihood function as presented in equation (2), a Wald-type test comparing ${\widehat{P}}_{d}^{\alpha}-{\widehat{P}}_{d}^{\theta}$ is directly available, with the standard error $se\left({\widehat{P}}_{d}^{\alpha}-{\widehat{P}}_{d}^{\theta}\right)$ obtained by the delta method. Due to the technical difficulty of computing the variance-covariance matrix between ( ${\widehat{P}}_{d}^{\alpha},{\widehat{P}}_{d}^{\theta}$) and ${\widehat{P}}_{d}^{\rho}$, comparing the MLEs of ( ${\widehat{P}}_{d}^{\alpha},{\widehat{P}}_{d}^{\theta}$) with ${\widehat{P}}_{d}^{\rho}$ is not straightforward. In practice, bootstrapping methods can be used as an alternative way to compute the corresponding p-values and 95% confidence intervals [25].
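A nonparametric bootstrap along these lines can be sketched as follows: resample the observed multinomial table, recompute the closed-form α- and θ-model estimates on each resample, and take percentile limits of their difference. All counts are hypothetical, and degenerate resamples with empty cells are simply skipped:

```python
import numpy as np

rng = np.random.default_rng(2010)

# Hypothetical observed table: verified counts per class for the three
# positive cells, plus the unverified double negatives (illustrative).
cells = {"x11": [30, 10], "x10": [25, 5], "x01": [20, 4]}
x00 = 906
K = 2

def estimates(c, b00, n):
    """Class probabilities under the alpha and theta homogeneity
    assumptions (closed forms from the equation (2) MLEs)."""
    A = [c["x01"][d] * c["x10"][d] / c["x11"][d] for d in range(K)]
    u = [(c["x11"][d] + c["x10"][d]) * (c["x11"][d] + c["x01"][d])
         / c["x11"][d] for d in range(K)]
    P_a = [(c["x11"][d] + c["x10"][d] + c["x01"][d]
            + b00 * A[d] / sum(A)) / n for d in range(K)]
    P_t = [u[d] / sum(u) for d in range(K)]
    return P_a, P_t

labels = [(k, d) for k in cells for d in range(K)] + [("x00", 0)]
counts = np.array([cells[k][d] for k in cells for d in range(K)] + [x00])
n = counts.sum()

diffs = []
for _ in range(500):  # bootstrap resamples of the full multinomial table
    boot = rng.multinomial(n, counts / n)
    c = {k: [0] * K for k in cells}
    b00 = 0
    for (k, d), v in zip(labels, boot):
        if k == "x00":
            b00 = v
        else:
            c[k][d] = v
    if any(v == 0 for k in cells for v in c[k]):
        continue  # skip degenerate resamples with empty verified cells
    P_a, P_t = estimates(c, b00, n)
    diffs.append(P_a[1] - P_t[1])  # difference in class-2 probability

ci = (np.percentile(diffs, 2.5), np.percentile(diffs, 97.5))
```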

We developed a SAS macro (SAS Institute, Cary, NC) to implement the models discussed above, parameterized both in terms of *P _{d}* and
${P}_{ij}^{d}$ as in equation (1) for the homogeneous ρ model, and in terms of ${\pi}_{ij}$ and ${\pi}_{ij}^{d}$ as in equation (2) for the closed-form MLEs of the homogeneous α and θ models.

In this section, we discuss the Bayesian approaches [27,28]. Because the Bayesian approach and the frequentist approach use different frameworks, they can be considered complementary. When relatively large studies are combined with weak prior distributions, inferences obtained by Bayesian and frequentist methods generally agree. However, the Bayesian framework is particularly attractive when suitable prior distributions can be constructed to incorporate known constraints and subject-matter knowledge on model parameters [29]. The Bayesian framework allows direct construction of 100(1−α)% equal tail and highest probability density (HPD) credible intervals of general functions of the estimated parameters without having to rely on asymptotic approximations. Furthermore, the Bayesian framework provides direct implementation of model averaging [30], which provides a natural way to address the problem of selecting a model from several competing models that give equal goodness-of-fit but potentially different inferences for a particular study.

To implement the constraints of
$\sum _{i}\sum _{j}{P}_{ij}^{d}=1$ and
${\alpha}_{d}=\frac{{P}_{11}^{d}{P}_{00}^{d}}{{P}_{01}^{d}{P}_{10}^{d}}=\alpha $ under the *α* model, we re-parameterize
${P}_{11}^{d},{P}_{01}^{d},{P}_{10}^{d}$ and
${P}_{00}^{d}$ as follows,

$$\begin{array}{l}{P}_{11}^{d}=\frac{\alpha exp({a}_{d}+{b}_{d})}{1+exp({a}_{d})+exp({b}_{d})+\alpha exp({a}_{d}+{b}_{d})},{P}_{10}^{d}=\frac{exp({a}_{d})}{1+exp({a}_{d})+exp({b}_{d})+\alpha exp({a}_{d}+{b}_{d})}\\ {P}_{01}^{d}=\frac{exp({b}_{d})}{1+exp({a}_{d})+exp({b}_{d})+\alpha exp({a}_{d}+{b}_{d})},{P}_{00}^{d}=\frac{1}{1+exp({a}_{d})+exp({b}_{d})+\alpha exp({a}_{d}+{b}_{d})}.\end{array}$$

(6)

Let *f*(*α*, *a _{d}*, *b _{d}*, *P _{d}*) be the joint prior distribution of (*α*, *a _{d}*, *b _{d}*, *P _{d}*) (*d* = *1, 2,* …*, K*). The joint posterior distribution is then proportional to

$${({P}_{d})}^{{x}_{11}^{d}+{x}_{10}^{d}+{x}_{01}^{d}}{({P}_{11}^{d})}^{{x}_{11}^{d}}{({P}_{10}^{d})}^{{x}_{10}^{d}}{({P}_{01}^{d})}^{{x}_{01}^{d}}{\left(\sum _{d}{P}_{d}{P}_{00}^{d}\right)}^{{x}_{00}}f(\alpha ,{a}_{d},{b}_{d},{P}_{d})=\frac{{({P}_{d})}^{{x}_{11}^{d}+{x}_{10}^{d}+{x}_{01}^{d}}{\alpha}^{{x}_{11}^{d}}exp\left[{a}_{d}({x}_{11}^{d}+{x}_{10}^{d})+{b}_{d}({x}_{11}^{d}+{x}_{01}^{d})\right]}{{\left[1+exp({a}_{d})+exp({b}_{d})+\alpha exp({a}_{d}+{b}_{d})\right]}^{{x}_{11}^{d}+{x}_{10}^{d}+{x}_{01}^{d}}}\times {\left(\sum _{d}\frac{{P}_{d}}{1+exp({a}_{d})+exp({b}_{d})+\alpha exp({a}_{d}+{b}_{d})}\right)}^{{x}_{00}}\times f\left(\alpha ,{a}_{d},{b}_{d},{P}_{d}\right)\times I\left(\alpha >0\right).$$

(7)
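A quick numerical check that the re-parameterization in equation (6) yields a proper probability distribution whose conditional odds ratio is exactly α (the parameter values below are illustrative):

```python
from math import exp

def alpha_cells(alpha, a, b):
    """Cell probabilities P_ij^d from equation (6): a logit-style
    re-parameterization that fixes the conditional odds ratio at alpha."""
    denom = 1 + exp(a) + exp(b) + alpha * exp(a + b)
    p11 = alpha * exp(a + b) / denom
    p10 = exp(a) / denom
    p01 = exp(b) / denom
    p00 = 1 / denom
    return p11, p10, p01, p00

# Illustrative values: alpha = 4 with class-specific a_d, b_d
p11, p10, p01, p00 = alpha_cells(4.0, 0.5, -0.3)
recovered_alpha = (p11 * p00) / (p10 * p01)
```

Because the common denominator cancels in the odds ratio, the homogeneity constraint holds for any choice of the class-specific parameters *a _{d}* and *b _{d}*.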

To implement the constraints of $\sum _{i}\sum _{j}{P}_{ij}^{d}=1$ and ${\theta}_{d}={P}_{1\mid 1}^{d}/{P}_{1+}^{d}=\theta $ under the θ model, we re-parameterize ${P}_{11}^{d},{P}_{01}^{d},{P}_{10}^{d}$ and ${P}_{00}^{d}$ as follows,

$${P}_{11}^{d}=\theta {P}_{1+}^{d}{P}_{+1}^{d},{P}_{10}^{d}={P}_{1+}^{d}(1-\theta {P}_{+1}^{d}),{P}_{01}^{d}={P}_{+1}^{d}(1-\theta {P}_{1+}^{d}),{P}_{00}^{d}=1-{P}_{1+}^{d}-{P}_{+1}^{d}+\theta {P}_{1+}^{d}{P}_{+1}^{d}.$$

Let
$f(\theta ,{P}_{1+}^{d},{P}_{+1}^{d},{P}_{d})$ be the joint prior distribution of (*θ*,
${P}_{1+}^{d},{P}_{+1}^{d}$, *P _{d}*) (*d* = *1, 2,* …*, K*). The joint posterior distribution is then proportional to

$${({P}_{d})}^{{x}_{11}^{d}+{x}_{10}^{d}+{x}_{01}^{d}}{({P}_{11}^{d})}^{{x}_{11}^{d}}{({P}_{10}^{d})}^{{x}_{10}^{d}}{({P}_{01}^{d})}^{{x}_{01}^{d}}{\left(\sum _{d}{P}_{d}{P}_{00}^{d}\right)}^{{x}_{00}}f(\theta ,{P}_{1+}^{d},{P}_{+1}^{d},{P}_{d})={({P}_{d})}^{{x}_{11}^{d}+{x}_{10}^{d}+{x}_{01}^{d}}{({P}_{1+}^{d})}^{{x}_{11}^{d}+{x}_{10}^{d}}{({P}_{+1}^{d})}^{{x}_{11}^{d}+{x}_{01}^{d}}{\theta}^{{x}_{11}^{d}}{(1-\theta {P}_{+1}^{d})}^{{x}_{10}^{d}}{(1-\theta {P}_{1+}^{d})}^{{x}_{01}^{d}}\times {\left[\sum _{d}{P}_{d}(1-{P}_{1+}^{d}-{P}_{+1}^{d}+\theta {P}_{1+}^{d}{P}_{+1}^{d})\right]}^{{x}_{00}}\times f(\theta ,{P}_{1+}^{d},{P}_{+1}^{d},{P}_{d})\times I(\theta >0)\times I(1-\theta {P}_{+1}^{d}>0)\times I(1-\theta {P}_{1+}^{d}>0)\times I(1-{P}_{1+}^{d}-{P}_{+1}^{d}+\theta {P}_{1+}^{d}{P}_{+1}^{d}>0).$$

(8)

The feasible range of θ is determined by the marginal probabilities of testing positive ${P}_{1+}^{d}$ and ${P}_{+1}^{d}$ and is implemented through the addition of the four indicator functions I(·) in equation (8).
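The θ-model re-parameterization above can be checked numerically in the same way as the α model; the values below are illustrative and must keep all four cells in [0, 1] (which is exactly what the indicator functions enforce):

```python
def theta_cells(theta, p1plus, pplus1):
    """Cell probabilities under the theta-model re-parameterization:
    P_11^d = theta * P_{1+}^d * P_{+1}^d, with the remaining cells
    filled in so that the two margins are preserved."""
    p11 = theta * p1plus * pplus1
    p10 = p1plus * (1 - theta * pplus1)
    p01 = pplus1 * (1 - theta * p1plus)
    p00 = 1 - p1plus - pplus1 + theta * p1plus * pplus1
    return p11, p10, p01, p00

# Illustrative values (theta must keep all four cells in [0, 1])
cells = theta_cells(1.1, 0.6, 0.5)
```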

To implement the constraints of
$\sum _{i}\sum _{j}{P}_{ij}^{d}=1$ and *ρ _{d}* = *ρ* under the *ρ* model, we re-parameterize ${P}_{ij}^{d}$ in terms of ${P}_{1+}^{d}$, ${P}_{+1}^{d}$, and ${\delta}_{d}=\rho \sqrt{{P}_{1+}^{d}(1-{P}_{1+}^{d}){P}_{+1}^{d}(1-{P}_{+1}^{d})}$, so that ${P}_{11}^{d}={P}_{1+}^{d}{P}_{+1}^{d}+{\delta}_{d}$, ${P}_{10}^{d}={P}_{1+}^{d}(1-{P}_{+1}^{d})-{\delta}_{d}$, ${P}_{01}^{d}={P}_{+1}^{d}(1-{P}_{1+}^{d})-{\delta}_{d}$, and ${P}_{00}^{d}=(1-{P}_{1+}^{d})(1-{P}_{+1}^{d})+{\delta}_{d}$. Let $f(\rho ,{P}_{1+}^{d},{P}_{+1}^{d},{P}_{d})$ be the joint prior distribution of (*ρ*, ${P}_{1+}^{d},{P}_{+1}^{d}$, *P _{d}*) (*d* = *1, 2,* …*, K*). The joint posterior distribution is then proportional to

$${({P}_{d})}^{{x}_{11}^{d}+{x}_{10}^{d}+{x}_{01}^{d}}{({P}_{11}^{d})}^{{x}_{11}^{d}}{({P}_{10}^{d})}^{{x}_{10}^{d}}{({P}_{01}^{d})}^{{x}_{01}^{d}}{\left(\sum _{d}{P}_{d}{P}_{00}^{d}\right)}^{{x}_{00}^{+}}f(\rho ,{P}_{1+}^{d},{P}_{+1}^{d},{P}_{d})={({P}_{d})}^{{x}_{11}^{d}+{x}_{10}^{d}+{x}_{01}^{d}}{({P}_{1+}^{d}{P}_{+1}^{d}+{\delta}_{d})}^{{x}_{11}^{d}}{({P}_{1+}^{d}-{P}_{1+}^{d}{P}_{+1}^{d}-{\delta}_{d})}^{{x}_{10}^{d}}{({P}_{+1}^{d}-{P}_{+1}^{d}{P}_{1+}^{d}-{\delta}_{d})}^{{x}_{01}^{d}}\times {\left\{\sum _{d}{P}_{d}\left[(1-{P}_{1+}^{d})(1-{P}_{+1}^{d})+{\delta}_{d}\right]\right\}}^{{x}_{00}}\times f(\rho ,{P}_{1+}^{d},{P}_{+1}^{d},{P}_{d})\times I({P}_{1+}^{d}{P}_{+1}^{d}+{\delta}_{d}>0)\times I({P}_{1+}^{d}-{P}_{1+}^{d}{P}_{+1}^{d}-{\delta}_{d}>0)\times I({P}_{+1}^{d}-{P}_{+1}^{d}{P}_{1+}^{d}-{\delta}_{d}>0)\times I(1-{P}_{1+}^{d}-{P}_{+1}^{d}+{P}_{1+}^{d}{P}_{+1}^{d}+{\delta}_{d}>0).$$

(9)

The feasible range of the correlation, determined by the marginal probabilities of testing positive ${P}_{1+}^{d}$ and ${P}_{+1}^{d}$ as in equation (5), is implemented through the addition of the four indicator functions I(·) in equation (9).

The homogeneous dependence models are saturated. Therefore, they provide the same goodness-of-fit for the data, but can provide substantively different estimates. Bayesian model averaging (BMA) provides a natural way to address this problem [30]. The posterior distribution of the quantity of interest *P _{d}* given data is

$$pr({P}_{d}\mid \mathit{Data})=\sum _{k=1}^{K}pr({P}_{d}\mid {M}_{k},\mathit{Data})pr({M}_{k}\mid \mathit{Data}),$$

(10)

where *M*_{1}, …, *M _{K}* are the models considered, and the posterior probability for model *M _{k}* is $pr({M}_{k}\mid \mathit{Data})=pr(\mathit{Data}\mid {M}_{k})pr({M}_{k})/{\sum}_{l=1}^{K}pr(\mathit{Data}\mid {M}_{l})pr({M}_{l})$, where $pr(\mathit{Data}\mid {M}_{k})$ is the marginal likelihood under model *M _{k}* and $pr({M}_{k})$ is its prior probability.
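Because the three dependence models are saturated and fit equally well, equal prior model probabilities lead to approximately equal posterior model probabilities; mixing posterior draws as in equation (10) can then be sketched as follows. All draws and probabilities below are synthetic placeholders, not MCMC output from the case studies:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-ins for posterior draws of P_d under each dependence
# model (in practice these come from the three MCMC runs) and for the
# posterior model probabilities pr(M_k | Data).
draws = {"alpha": rng.normal(0.048, 0.010, 5000),
         "theta": rng.normal(0.075, 0.010, 5000),
         "rho":   rng.normal(0.007, 0.002, 5000)}
model_prob = {"alpha": 1 / 3, "theta": 1 / 3, "rho": 1 / 3}

# Equation (10): draw a model index first, then a draw from that model
models = list(draws)
picks = rng.choice(len(models), size=5000, p=[model_prob[m] for m in models])
bma = np.array([draws[models[k]][i] for i, k in enumerate(picks)])

bma_median = float(np.median(bma))
```

The resulting BMA posterior is a mixture, so its spread reflects both within-model and between-model uncertainty.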

In the Bayesian models discussed above, computation was done using Markov chain Monte Carlo (MCMC) [31] in WinBUGS [32] and BRugs in R (http://www.r-project.org). Burn-in consisted of 50,000 iterations; 50,000 subsequent iterations were used for posterior summaries. Convergence of the Markov chains was assessed using the Gelman and Rubin convergence statistic [33,34]. To describe disease class prevalence and to implement the constraints 0 < *P _{d}* < 1 and
$\sum _{d}{P}_{d}=1$, we use the linear generalized logit model, which has inverse link function defined as
${P}_{d}=\frac{exp({\beta}_{d})}{{\displaystyle \sum _{d}}exp({\beta}_{d})}$ (*d* = *1, 2,* …*, K*) with *β _{K}* = 0 [26]. We selected proper but diffuse prior distributions for the hyperparameters [35]. Specifically, the hyper-priors for the parameters were assumed to be as follows: 1) Vague priors of N(0, 10
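The inverse link above is a softmax with the last category as the reference; a minimal sketch (the β values are illustrative):

```python
from math import exp

def class_probs(beta):
    """Generalized (multinomial) logit with the last class as reference:
    beta holds the K - 1 free parameters; beta_K is fixed at 0."""
    expb = [exp(b) for b in beta] + [1.0]  # exp(beta_K) = exp(0) = 1
    total = sum(expb)
    return [e / total for e in expb]

# Illustrative: K = 3 disease classes with beta = (0.4, -1.2, 0)
P = class_probs([0.4, -1.2])
```

Fixing *β _{K}* = 0 removes the redundancy that would otherwise make the K parameters non-identifiable, while guaranteeing 0 < *P _{d}* < 1 and a unit sum.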

For the purpose of comparing the performance of different models, we reanalyzed the data from two screening studies in which the disease status was evaluated only for those who tested positive on at least one of the two tests. The first study consists of data from the Health Insurance Plan Study for screening breast cancer in New York [36]. The study was carried out by the Health Insurance Plan, a prepaid comprehensive medical care plan with 750,000 subscribers enrolled in 31 medical groups. Periodic screening for breast cancer using mammography as well as clinical physical examination was performed for women aged 40 to 64 years who were chosen at random. In this study, 307 out of 20,211 women, who tested positive by either physical examination or mammography, underwent biopsy for the classification of two disease states: no cancer (*d* = 1) or cancer (*d* = 2). The second study is a multicenter study comparing cervicography with the standard pap smear cytology test for detecting cervical cancer between November 1991 and December 1992 [37]. In this study, 228 out of 5,192 women, who tested positive by either cervicography or the standard pap smear cytology test, underwent biopsy for the classification of three disease states: not present (*d* = 1), low grade (condyloma) (*d* = 2) and high grade (invasive cancer) (*d* = 3). Table 1 presents the observed frequencies in the two screening studies.

Table 2 presents the estimates of the conditional dependence parameters (i.e., α, θ and ρ) when using both the maximum likelihood method and the Bayesian method. We use the triple of percentiles, _{2.5}50_{97.5}, to display a parameter estimate (or posterior median) with its 95% confidence (or credible) interval, as suggested by Louis and Zeger [38]. In summary, both approaches suggest statistically significant dependence under all three models for the two studies. Tables 3 and 4 present the estimated probabilities of the disease classes under the three homogeneous dependence models as well as under the independence model, when using the maximum likelihood method and the Bayesian method, respectively. Twice the negative log-likelihood is presented in Table 3 for comparing the goodness-of-fit of the independence model and the homogeneous dependence α, θ, and ρ models, which demonstrates that the α, θ, and ρ models give exactly the same goodness-of-fit for both studies. In addition, the BMA estimates across the three conditional dependence models are presented in Table 4. In summary, the estimates were consistent between the maximum likelihood and Bayesian approaches except for the probabilities of not present and low grade cervical cancer in the multicenter study detecting cervical cancer using the ρ model, potentially due to the constraints implemented in the Markov chain Monte Carlo sampling. Specifically, the probability of low grade cervical cancer is estimated to be _{54}255_{883} per 1000 women using the Bayesian approach, but only _{0}115_{308} per 1000 women using the maximum likelihood method.

Summary of parameter estimates for conditional dependence using the maximum likelihood method and Bayesian model. The triple notation of _{L}P_{U} denotes the point estimate P with 95% Wald-type confidence limits (L, U).

Summary of parameter estimates using the maximum likelihood method under the assumption of homogenous dependence. The triple notation of _{L}P_{U} denotes the point estimate P with 95% Wald-type confidence limits (L, U). The estimates of the probabilities of **...**

Summary of posterior estimates using the Bayesian approach under the assumption of homogenous dependence. The triple notation of _{L}P_{U} denotes the posterior median P with 95% equal tailed credible limits (L, U). The posterior estimates of the probabilities **...**

As an interesting observation, we found that the difference between the estimated probabilities of disease states under different dependence assumptions (i.e., the α, θ or ρ model) can be statistically significant and practically meaningful. For example, in the Health Insurance Plan Study for breast cancer screening in New York, the estimated probability of having breast cancer using the maximum likelihood method is _{3}48_{93} per one thousand women assuming the α model, while the estimate is _{28}75_{122} per one thousand women assuming the θ model, and _{2}7_{11} per one thousand women assuming the ρ model. The difference between the estimated probabilities of having breast cancer assuming the α and θ models is _{14}27_{40} per one thousand women, with a p-value less than 0.001 by a Wald-type test. The non-overlapping 95% confidence intervals between the estimated probabilities assuming the ρ model and the α (or θ) model suggest a statistically significant difference at least at the 5% significance level. In addition, using the maximum likelihood approach, the estimated probability of having invasive high grade cervical cancer is _{28}61_{94} per one thousand in the multicenter study for detecting cervical cancer assuming the θ model, which is about eight times higher than the estimate of _{5}8_{11} per one thousand women assuming the ρ model, and the 95% confidence intervals do not overlap. The Bayesian approaches gave similar inferences to the frequentist approaches. This substantial difference in the estimated probability of high grade cervical cancer can have an impact on cancer surveillance and prevention. Unfortunately, the data do not contain any information to differentiate those dependence models, since they all give the same goodness-of-fit.
Thus, without some sensible assumptions, the disease prevalence may not be estimable from the data set, even with Bayesian model averaging, particularly if the proposed models in BMA do not contain the correct model (which is arguably true in practice, given that an infinitely large number of models exist and potentially many can give the same goodness-of-fit).

To further study how the disease status probability estimates vary with the assumed dependence model and to evaluate the impact of misspecification of the dependence model on the estimation of the probabilities of disease classes, we performed four sets of simulations assuming the independence model and the α, θ, and ρ dependence models, respectively. For ease of presentation and interpretation, we considered two disease strata. The simulation parameters are: the probabilities of disease classes *P _{d}* = (0.8,0.2), the marginal conditional probabilities of test T

Table 5 presents the means of the estimated disease prevalence across 2,000 replicates, using both the maximum likelihood and Bayesian approaches. For the Bayesian models, posterior medians were used as estimates of disease prevalence for a single replicate. If the true underlying model is the conditional independence model, fitting the α, θ and ρ dependence models still provides unbiased estimates of the disease prevalence. However, if the underlying model is one of the three dependence models, assuming independence provides biased estimates of disease prevalence. In addition, if the underlying model is a dependence model, assuming an incorrect dependence structure leads to biased estimates of disease prevalence. One interesting observation is that Bayesian model averaging (BMA) estimates tend to be less biased than the estimates under a misspecified dependence model. Furthermore, when the underlying model is the α dependence model, BMA leads to nearly unbiased estimates. For all the scenarios, the maximum likelihood and Bayesian approaches provide similar estimates.

The means of estimated disease prevalence (true value = 0.2) based on simulation studies with 2000 replicates. The bolded cells represent the correctly chosen model. For the Bayesian models, posterior medians were used as estimates for disease prevalence **...**

Table 6 presents the average length of the 95% confidence/credible intervals, a measure of the precision of the disease prevalence estimates, across 2,000 replicates when using the maximum likelihood and Bayesian approaches. We found that if the true underlying model is the conditional independence model, assuming the α and θ dependence models leads to intervals that are too wide. For example, the 95% confidence/credible interval length under the θ dependence model is about twice that under the true independence model. This suggests a substantive efficiency loss when conservatively assuming the α and θ dependence models. However, if the ρ dependence model is assumed, the average interval lengths are only slightly inflated. On the other hand, if the underlying model structure is one of the three dependence models, assuming independence leads to intervals that are too narrow (and biased). In addition, if the underlying model is the θ model, incorrectly assuming the α and ρ dependence models also leads to underestimation of the interval length. Furthermore, the results are highly concordant between the maximum likelihood and Bayesian approaches. Note that the average interval lengths of the BMA estimates are generally larger than those under any dependence model alone, regardless of whether the dependence model is correctly or incorrectly specified. This is because the BMA estimates incorporate the additional uncertainty from model specification.

The 95% confidence/credible interval length of disease prevalence based on simulation studies with 2000 replicates. The bolded cells represent the correctly chosen model.

Table 7 presents the coverage performance of the 95% confidence/credible intervals of the disease prevalence across 2,000 replicates using both the maximum likelihood and Bayesian approaches. The coverage under misspecification using the dependence models is still around 95% if the true underlying model is the conditional independence model, possibly due to the negligible bias and wider confidence/credible intervals under such misspecification, as suggested in Tables 5 and 6. However, if the underlying model structure is one of the three dependence models, the coverage under misspecification decreases as the degree of dependence increases and as the sample size increases. In addition, the results suggest that if the underlying model is the ρ model, adequate coverage is difficult to achieve when the model is misspecified. In general, the Bayesian 95% credible intervals show slightly better coverage than the maximum likelihood 95% confidence intervals. More importantly, the coverage of the BMA intervals generally exceeds 90%, which is much better than that of the intervals from any single misspecified model. One reason for the better coverage using BMA is that its intervals are generally wider than those under a single model alone, and the true underlying model is included in the model averaging.
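The replicate-level summaries reported in Tables 5-7 (mean estimate, average interval length, and interval coverage) are straightforward to compute once per-replicate estimates and interval endpoints are available. A minimal sketch, using simulated placeholder values rather than the actual fitted-model output (all variable names and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
true_prev = 0.2
n_rep = 2000

# Hypothetical per-replicate point estimates and 95% interval endpoints;
# in the actual study these come from the fitted ML or Bayesian models.
est = rng.normal(true_prev, 0.02, n_rep)
lo, hi = est - 0.04, est + 0.04

mean_est = est.mean()                                      # cf. Table 5
avg_len = (hi - lo).mean()                                 # cf. Table 6
coverage = ((lo <= true_prev) & (true_prev <= hi)).mean()  # cf. Table 7
```

With 2,000 replicates, the Monte Carlo standard error of a coverage estimate near 95% is roughly 0.5%, so observed coverage a few points below 95% already indicates a problem.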

For screening studies where a categorical disease status is verified only when at least one of the two binary screening tests is positive, we investigated three homogeneous dependence models (i.e., the α, θ, and ρ models), of which the ρ model is newly proposed in this paper, through two case studies and four sets of simulation studies. If the true underlying model is the conditional independence model, assuming the α and θ dependence models leads to intervals that are too wide (the 95% confidence/credible interval length under the θ model can be about twice that under the independence model), whereas the ρ dependence model only slightly inflates the average interval lengths. Through two real data analyses and simulation studies, we demonstrated that the three homogeneous dependence models can provide substantively different estimates for a given study despite identical goodness-of-fit. We discussed both the frequentist and Bayesian approaches, and evaluated the impact of model misspecification on the estimation of disease class probabilities. Furthermore, we discussed Bayesian model averaging as an alternative way to partially address this particularly challenging estimation problem. Although we focused on the inference of disease class probabilities in this article, the same conclusions apply to the inference of the cell probability for two negative tests, i.e., ${\pi}_{00}^{d}$, and the unknown cell frequency ${x}_{00}^{d}$. We did not discuss the impact of misspecification of dependence structures on the estimation of test accuracies because it has been well studied from a frequentist perspective [5,10,39]. It might be of interest to compare the performance of frequentist and Bayesian approaches for estimating test accuracy parameters under different settings, such as low, moderate and high sensitivities and specificities.

The results imply that large differences in the estimated disease class probabilities may occur when assuming different dependence models, which can have a substantial impact on disease surveillance and prevention. Other more robust statistical methods, e.g., generalized estimating equations [40,41], may be used to reduce the impact of misspecifying the dependence structure in this setting. We do not intend to suggest that these homogeneous dependence models are useless in practice merely because we cannot statistically differentiate between them based on the data alone. Caution against using these models due to possible misspecification should be balanced with the need to estimate disease status probabilities. Furthermore, we realize that there are many more potential dependence structures than those we have considered; e.g., one could argue that the tests are dependent only for the cases but independent for the controls [42]. Depending on the problem at hand, some assumptions may be justifiable and preferable. In addition, note that the indistinguishability of these models is based on goodness-of-fit statistics alone. We can always use additional information, such as expert opinion, historical information on the sensitivities and specificities of the two binary diagnostic tests, and/or the plausible range of the dependence parameters, to guide the selection of a homogeneous dependence model. For the Bayesian approach, this additional information can be formulated as informative priors to improve the posterior inference. However, how to solicit and formulate informative priors in this case deserves thorough investigation and is beyond our current scope.

A potential strategy to justify the homogeneity assumptions of the α, θ and ρ models is to incorporate a design element into the screening study that allows selection among homogeneous dependence models, for example by randomly selecting a subset of subjects who are negative on both tests for ascertainment by a gold standard. However, when the gold standard test is invasive and/or expensive, it is generally considered unethical to apply it to subjects whose screening tests are negative. In this case, if historical data or an additional sample from a set of confirmed cases and controls in a similar population is available for determining the test accuracy parameters, one can use those data to guide the selection of a homogeneous dependence model.

When there is no scientific justification to prefer a particular dependence model over the others, we suggest treating those dependence models (including the three homogeneous dependence models that we have considered) as sensitivity analyses and investigating how the dependence structure impacts the estimation of the disease class probabilities. If there is a clinically significant difference, caution should be taken with any statistical inference. As a last resort, if the dependence structure cannot be reasonably determined, Bayesian model averaging (BMA) may be preferable to any single model, but there is a heavy price to pay: 1) the computations become more complex, and 2) the credible intervals become much wider in some cases (to reflect the added uncertainty).
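When BMA is used, the combined posterior for the disease prevalence is simply a mixture of the model-specific posteriors weighted by the posterior model probabilities, which is why its credible intervals widen when the models disagree. A minimal sketch under assumed draws and weights (none of these numbers come from the case studies; real draws would come from the MCMC fits):

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 5000

# Hypothetical posterior draws of disease prevalence from the three
# homogeneous dependence models, deliberately disagreeing on location.
draws = {
    "alpha": rng.normal(0.18, 0.010, n_draws),
    "theta": rng.normal(0.22, 0.020, n_draws),
    "rho":   rng.normal(0.20, 0.010, n_draws),
}
post_prob = {"alpha": 0.3, "theta": 0.4, "rho": 0.3}  # assumed model weights

# The BMA posterior is a mixture of the model-specific posteriors,
# resampled in proportion to the posterior model probabilities.
pooled = np.concatenate([
    rng.choice(d, size=int(round(post_prob[m] * n_draws)))
    for m, d in draws.items()
])

bma_median = np.median(pooled)
bma_lo, bma_hi = np.percentile(pooled, [2.5, 97.5])
```

Because the mixture spreads mass across all three locations, the interval (bma_lo, bma_hi) is wider than any single model's 95% credible interval, reflecting the added model uncertainty.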

Assuming that the models used in the Bayesian averaging include the correctly specified model, the simulation results show that BMA inference generally performs better than any misspecified model alone, especially with respect to interval coverage. In practice, all candidate models can be misspecified, and thus one can argue that Bayesian model averaging may not be effective in reducing bias. Intuitively, if some models tend to overestimate and others tend to underestimate the parameters of interest, then Bayesian model averaging will reduce bias compared to a specific misspecified model. However, if all models tend to overestimate (or underestimate) the parameters of interest, or if the estimates from the incorrect models are far from those of the correct model, then Bayesian model averaging may not be effective in reducing bias. In addition, because the data do not contain information to distinguish between conditional dependence models, one should not expect the posterior model probabilities to be accurately estimated in practice, casting some doubt on the utility of the BMA estimate in this case.

In this article we considered only homogeneous dependence models, which are identifiable in our data setting. Some researchers [43,44] have argued that one can do better using a non-identifiable model with informative prior information than using a less realistic but identifiable model with strong assumptions. Further research on an expanded model, potentially with a heterogeneous dependence structure, may shed more light on the impact of prior misspecification versus model misspecification, and on the trade-off, for estimating disease prevalence in the setting we discussed, between an expanded non-identifiable model with weaker model assumptions but stronger prior assumptions and an identifiable model with stronger model assumptions but weaker prior assumptions.

We assumed that a perfect gold standard (or definitive) test exists, which may limit the applicability of the proposed methods, because arguably all diagnostic tests are imperfect, and even those with theoretically perfect properties can be rendered imperfect by laboratory or human error. It may be fruitful for further methodological research to incorporate measurement errors of the third-stage gold standard test, e.g., through a sensitivity analysis [45] or multiple imputation [46]. However, this is beyond our present scope.

Another important potential bias in the estimation of disease prevalence is selection bias with respect to who participates in the screening program. We acknowledge that the estimates from our method can be biased if those who participate in the screening program are not representative of the target population whose prevalence is being estimated. If information on who tends to participate in the screening program is available, further adjustment for the selection bias can be made by, e.g., multiple imputation or inverse probability weighting (i.e., weighting each participant by the inverse of his or her estimated probability of participating in the screening program).
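The inverse probability weighting adjustment mentioned above can be sketched as follows. The participation probabilities are treated as known here, whereas in practice they would be estimated, e.g., from a logistic regression on known covariates; all names and numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
true_prev = 0.2

# Hypothetical population in which cases are more likely to attend
# screening, so the naive prevalence among participants is biased upward.
disease = rng.binomial(1, true_prev, n)
p_participate = np.where(disease == 1, 0.9, 0.5)  # assumed known/estimated
participates = rng.binomial(1, p_participate).astype(bool)

# Naive estimate: prevalence among participants only (biased upward).
naive_prev = disease[participates].mean()

# IPW estimate: weight each participant by 1 / Pr(participation),
# which recovers (approximately) the target-population prevalence.
w = 1.0 / p_participate[participates]
ipw_prev = np.sum(w * disease[participates]) / np.sum(w)
```

Under this assumed participation mechanism the naive estimate is around 0.31, while the weighted estimate is close to the true 0.2; the quality of the correction depends entirely on how well the participation probabilities are modeled.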

Haitao Chu was supported in part by the Lineberger Cancer Center Core Grant CA16086 from the U.S. National Cancer Institute. The authors are grateful to the editor and two anonymous referees for their constructive comments and suggestions which have greatly improved this manuscript.

1. Bohning D, Patilea V. A capture-recapture approach for screening using two diagnostic tests with availability of disease status for the test positives only. Journal of the American Statistical Association. 2008;103:212–21. [PMC free article] [PubMed]

2. Vacek PM. The Effect of Conditional Dependence on the Evaluation of Diagnostic-Tests. Biometrics. 1985;41(4):959–68. [PubMed]

3. Torrance-Rynard VL, Walter SD. Effects of dependent errors in the assessment of diagnostic test performance. Statistics in Medicine. 1997;16(19):2157–75. [PubMed]

4. Dendukuri N, Joseph L. Bayesian approaches to modeling the conditional dependence between multiple diagnostic tests. Biometrics. 2001;57(1):158–67. [PubMed]

5. Albert PS, Dodd LE. A cautionary note on the robustness of latent class models for estimating diagnostic error without a gold standard. Biometrics. 2004;60(2):427–35. [PubMed]

6. Qu YS, Tan M, Kutner MH. Random effects models in latent class analysis for evaluating accuracy of diagnostic tests. Biometrics. 1996;52(3):797–810. [PubMed]

7. Qu YS, Hadgu A. A model for evaluating sensitivity and specificity for correlated diagnostic tests in efficacy studies with an imperfect reference test. Journal of the American Statistical Association. 1998;93(443):920–8.

8. Albert PS, McShane LM, Shih JH. Latent class modeling approaches for assessing diagnostic error without a gold standard: With applications to p53 immunohistochemical assays in bladder tumors. Biometrics. 2001;57(2):610–9. [PubMed]

9. Albert PS. Estimating diagnostic accuracy of multiple binary tests with an imperfect reference standard. Statistics in Medicine. 2009;28:780–97. [PMC free article] [PubMed]

10. Albert PS, Dodd LE. On Estimating Diagnostic Accuracy From Studies With Multiple Raters and Partial Gold Standard Evaluation. Journal of the American Statistical Association. 2008;103(481):61–73. [PMC free article] [PubMed]

11. Yang I, Becker MP. Latent variable modeling of diagnostic accuracy. Biometrics. 1997;53(3):948–58. [PubMed]

12. Espeland MA, Handelman SL. Using Latent Class Models to Characterize and Assess Relative Error in Discrete Measurements. Biometrics. 1989;45(2):587–99. [PubMed]

13. Chu H, Chen S, Louis TA. Random Effects Models in a Meta-Analysis of the Accuracy of Two Diagnostic Tests without a Gold Standard. Journal of the American Statistical Association. 2009;104:512–23. [PMC free article] [PubMed]

14. Black MA, Craig BA. Estimating disease prevalence in the absence of a gold standard. Statistics in Medicine. 2002;21(18):2653–69. [PubMed]

15. Hui SL, Walter SD. Estimating the Error Rates of Diagnostic-Tests. Biometrics. 1980;36(1):167–71. [PubMed]

16. Walter SD. Estimation of test sensitivity and specificity when disease confirmation is limited to positive results. Epidemiology. 1999;10(1):67–72. [PubMed]

17. Chu H, Nie L. A few remarks on “A capture-recapture approach for screening using two diagnostic tests with availability of disease status for the test positives only” by Böhning and Patilea. Journal of the American Statistical Association. 2008;103:1518–9. [PMC free article] [PubMed]

18. Satten GA, Kupper LL. Inferences About Exposure-Disease Associations Using Probability-Of-Exposure Information. Journal of the American Statistical Association. 1993;88(421):200–8.

19. Lyles RH. A note on estimating crude odds ratios in case-control studies with differentially misclassified exposure. Biometrics. 2002;58(4):1034–6. [PubMed]

20. Pepe MS, Janes H. Insights into latent class analysis of diagnostic test performance. Biostat. 2007;8(2):474–84. [PubMed]

21. Ibrahim JG, Lipsitz SR, Chen MH. Missing Covariates in Generalized Linear Models When the Missing Data Mechanism Is Non-Ignorable. Journal of the Royal Statistical Society Series B (Statistical Methodology) 1999;61(1):173–90.

22. Cheng H, Macaluso M, Waterbor J. Estimation of relative and absolute test accuracy. Epidemiology. 1999;10(5):566–7. [PubMed]

23. Pepe MS, Alonzo TA. Comparing disease screening tests when true disease status is ascertained only for screen positives. Biostat. 2001;2(3):249–60. [PubMed]

24. Hjort NL, Claeskens G. Frequentist model average estimators. Journal of the American Statistical Association. 2003;98(464):879–99.

25. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman & Hall/CRC; 1993.

26. Agresti A. Categorical data analysis. 2. John Wiley & Sons, Inc; 2002.

27. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. Chapman & Hall/CRC; 1995.

28. Carlin BP, Louis TA. Bayes and Empirical Bayes Methods for Data Analysis. 2. Chapman & Hall/CRC; 2000.

29. Davidian M, Giltinan DM. Nonlinear models for repeated measurement data: An overview and update. Journal of Agricultural Biological and Environmental Statistics. 2003;8(4):387–419.

30. Hoeting JA, Madigan D, Raftery AE, Volinsky CT. Bayesian model averaging: A tutorial. Statistical Science. 1999;14(4):382–401.

31. Gelfand AE, Smith AFM. Sampling-Based Approaches to Calculating Marginal Densities. Journal of the American Statistical Association. 1990;85(410):398–409.

32. Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B. 2002;64(4):583–639.

33. Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Statistical Science. 1992;7(4):457–72.

34. Brooks SP, Gelman A. Alternative methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics. 1998;7:434–55.

35. Natarajan R, McCulloch CE. Gibbs sampling with diffuse proper priors: A valid approach to data-driven inference? Journal of Computational and Graphical Statistics. 1998;7(3):267–77.

36. Strax P, Venet L, Shapiro S, Gross S. Mammography and Clinical Examination in Mass Screening for Cancer of Breast. Cancer. 1967;20(12):2184. [PubMed]

37. De Sutter P, Coibion M, Vosse M, Hertens D, Huet F, Wesling F, et al. A multicentre study comparing cervicography and cytology in the detection of cervical intraepithelial neoplasia. British Journal of Obstetrics and Gynaecology. 1998;105(6):613–20. [PubMed]

38. Louis TA, Zeger S. Effective Communication of Standard Errors and Confidence Intervals. Biostat. 2009 in press. [PMC free article] [PubMed]

39. Pepe MS. The statistical evaluation of medical tests for classification and prediction. Oxford: Oxford University Press; 2003.

40. Zeger SL, Liang KY. Longitudinal Data-Analysis for Discrete and Continuous Outcomes. Biometrics. 1986;42(1):121–30. [PubMed]

41. Liang KY, Zeger SL. Longitudinal Data-Analysis Using Generalized Linear-Models. Biometrika. 1986;73(1):13–22.

42. van der Merwe L, Maritz JS. Estimating the conditional false-positive rate for semi-latent data. Epidemiology. 2002;13(4):424–30. [PubMed]

43. Gustafson P. On model expansion, model contraction, identifiability and prior information: Two illustrative scenarios involving mismeasured variables. Statistical Science. 2005;20(2):111–29.

44. Gustafson P. The utility of prior information and stratification for parameter estimation with two screening tests but no gold standard. Statistics in Medicine. 2005;24(8):1203–17. [PubMed]

45. Chu H, Wang Z, Cole SR, Greenland S. Sensitivity analysis of misclassification: a graphical and a Bayesian approach. Ann Epidemiol. 2006;16(11):834–41. [PubMed]

46. Cole SR, Chu H, Greenland S. Multiple-imputation for measurement-error correction. International Journal of Epidemiology. 2006;35(4):1074–81. [PubMed]
