J Am Stat Assoc. Author manuscript; available in PMC 2010 June 9.
Published in final edited form as:
J Am Stat Assoc. 2008 December; 103(484): 1518–1519.
PMCID: PMC2882802
NIHMSID: NIHMS90014

# A few remarks on “A capture-recapture approach for screening using two diagnostic tests with availability of disease status for the test positives only” by Böhning and Patilea

Haitao Chu, Research Associate Professor and Lei Nie, Mathematical Statistician

Using a capture-recapture approach, Böhning and Patilea (2008) proposed two useful estimators for unobserved cell counts, assuming homogeneous association of the screening tests over disease status. However, they are mistaken in claiming that the maximum likelihood estimators (MLEs) are difficult to obtain. The point of this note is to present closed-form MLEs for, in their notation: 1) the α model, where $\alpha = p_{11}^{(i)} p_{00}^{(i)} / (p_{01}^{(i)} p_{10}^{(i)})$ is assumed to be identical for all i = 1, 2, …, d; and 2) the θ model, where $\theta = p_{1|1}^{(i)} / p_{1+}^{(i)}$ is assumed to be identical for all i.

One way to write the log-likelihood function (ignoring constant terms) in this setting is in terms of $q_i$ and $p_{jk}^{(i)}$ $(i = 1, 2, \ldots, d;\ j = 0, 1;\ k = 0, 1)$, as the authors did:

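$x_{00}^{(+)} \log\left(\sum_i p_{00}^{(i)} q_i\right) + \sum_i x_{11}^{(i)} \log\left(p_{11}^{(i)} q_i\right) + \sum_i x_{10}^{(i)} \log\left(p_{10}^{(i)} q_i\right) + \sum_i x_{01}^{(i)} \log\left(p_{01}^{(i)} q_i\right).$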
(1)

This parameterization involves a mixture likelihood, preventing a closed-form solution for the MLEs. To obtain closed-form MLEs, we consider an alternative parameterization in terms of $\pi_{jk}$ and $\pi_{jk}^{(i)}$ $(j, k = 0, 1$ and $i = 1, 2, \ldots, d)$, where $\pi_{jk} = P(T_1 = j, T_2 = k)$ and $\pi_{jk}^{(i)} = P(D = i \mid T_1 = j, T_2 = k)$. The log-likelihood function is (ignoring constant terms),

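$\begin{aligned}
\log L &= \sum_j \sum_k x_{jk}^{(+)} \log(\pi_{jk}) + \sum_i x_{11}^{(i)} \log\left(\pi_{11}^{(i)}\right) + \sum_i x_{10}^{(i)} \log\left(\pi_{10}^{(i)}\right) + \sum_i x_{01}^{(i)} \log\left(\pi_{01}^{(i)}\right) \\
&= x_{00}^{(+)} \log(\pi_{00}) + \sum_i x_{11}^{(i)} \left\{\log\left(\pi_{11}^{(i)}\right) + \log(\pi_{11})\right\} + \sum_i x_{10}^{(i)} \left\{\log\left(\pi_{10}^{(i)}\right) + \log(\pi_{10})\right\} + \sum_i x_{01}^{(i)} \left\{\log\left(\pi_{01}^{(i)}\right) + \log(\pi_{01})\right\} \\
&= x_{00}^{(+)} \log(\pi_{00}) + \sum_i x_{11}^{(i)} \log\left(\pi_{11}^{(i)} \pi_{11}\right) + \sum_i x_{10}^{(i)} \log\left(\pi_{10}^{(i)} \pi_{10}\right) + \sum_i x_{01}^{(i)} \log\left(\pi_{01}^{(i)} \pi_{01}\right)
\end{aligned}$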
(2)

This representation relates to previous work in some other settings (Satten and Kupper 1993; Lyles 2002; Pepe and Janes 2007). Note that $\pi_{jk}^{(i)} \pi_{jk} = P(D = i \mid T_1 = j, T_2 = k) P(T_1 = j, T_2 = k) = P(T_1 = j, T_2 = k \mid D = i) P(D = i) = p_{jk}^{(i)} q_i$, and $\pi_{00} = P(T_1 = 0, T_2 = 0) = \sum_i P(T_1 = 0, T_2 = 0, D = i) = \sum_i P(T_1 = 0, T_2 = 0 \mid D = i) P(D = i) = \sum_i p_{00}^{(i)} q_i$. Therefore equation (2) is equivalent to equation (1).

These equations are tractable and yield closed-form MLEs of $\pi_{jk}$ (j, k = 0, 1) and of $\pi_{jk}^{(i)}$ if j + k > 0. Omitting the algebra, we obtain the MLEs as $\hat{\pi}_{jk} = x_{jk}/n$ (j, k = 0, 1) and $\hat{\pi}_{jk}^{(i)} = x_{jk}^{(i)}/x_{jk}$ if j + k > 0, where $x_{jk}$ denotes the cell total $x_{jk}^{(+)}$. Therefore, the MLEs of the $q_i$, which can be written as functions of $\pi_{jk}$ (j, k = 0, 1) and $\pi_{jk}^{(i)}$ (j + k > 0) under the α or θ model assumptions, have closed-form solutions. The details are given below.
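As a minimal sketch of these closed-form MLEs, the following Python snippet computes the estimates of $\pi_{jk}$ and $\pi_{jk}^{(i)}$ from cell counts. The counts, the dictionary layout, and the function name `closed_form_mles` are hypothetical and chosen only for illustration; they are not data from any study discussed here.

```python
from fractions import Fraction

# Hypothetical counts for d = 2 disease classes (illustration only).
# x[(j, k)] lists the class-specific counts x_jk^(i) for T1 = j, T2 = k.
x = {(1, 1): [20, 5], (1, 0): [10, 15], (0, 1): [8, 12]}
x00_total = 930  # for double negatives only the total x_00^(+) is observed


def closed_form_mles(x, x00_total):
    """Return MLEs of pi_jk (all cells) and pi_jk^(i) (cells with j + k > 0)."""
    totals = {cell: sum(counts) for cell, counts in x.items()}
    totals[(0, 0)] = x00_total
    n = sum(totals.values())
    # pi_jk-hat = x_jk / n
    pi_hat = {cell: Fraction(t, n) for cell, t in totals.items()}
    # pi_jk^(i)-hat = x_jk^(i) / x_jk, defined only where disease status is observed
    pi_cond_hat = {
        cell: [Fraction(c, totals[cell]) for c in counts]
        for cell, counts in x.items()
    }
    return pi_hat, pi_cond_hat


pi_hat, pi_cond_hat = closed_form_mles(x, x00_total)
```

Exact fractions are used here only so that the estimated cell probabilities can be seen to sum to one without rounding error.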

Under the α model, $\alpha = \frac{p_{11}^{(i)} p_{00}^{(i)}}{p_{01}^{(i)} p_{10}^{(i)}}$ is assumed to be identical for all i = 1, 2, …, d; by Bayes’ rule,

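$\begin{aligned}
\alpha = \frac{p_{11}^{(i)} p_{00}^{(i)}}{p_{01}^{(i)} p_{10}^{(i)}} &= \frac{P(T_1 = 1, T_2 = 1 \mid D = i)\, P(T_1 = 0, T_2 = 0 \mid D = i)}{P(T_1 = 0, T_2 = 1 \mid D = i)\, P(T_1 = 1, T_2 = 0 \mid D = i)} \\
&= \frac{P(D = i \mid T_1 = 1, T_2 = 1)\, P(T_1 = 1, T_2 = 1)\, P(D = i \mid T_1 = 0, T_2 = 0)\, P(T_1 = 0, T_2 = 0)}{P(D = i \mid T_1 = 1, T_2 = 0)\, P(T_1 = 1, T_2 = 0)\, P(D = i \mid T_1 = 0, T_2 = 1)\, P(T_1 = 0, T_2 = 1)} \\
&= \frac{\pi_{11} \pi_{00}}{\pi_{01} \pi_{10}} \times \frac{\pi_{11}^{(i)} \pi_{00}^{(i)}}{\pi_{01}^{(i)} \pi_{10}^{(i)}},
\end{aligned}$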

Thus

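$\begin{aligned}
\alpha &= \frac{\pi_{11} \pi_{00}}{\pi_{01} \pi_{10}} \times \left[\sum_i \frac{\pi_{01}^{(i)} \pi_{10}^{(i)}}{\pi_{11}^{(i)}}\right]^{-1}, \\
\pi_{00\alpha}^{(i)} &= \frac{\pi_{01}^{(i)} \pi_{10}^{(i)}}{\pi_{11}^{(i)}} \times \left[\sum_i \frac{\pi_{01}^{(i)} \pi_{10}^{(i)}}{\pi_{11}^{(i)}}\right]^{-1}, \\
q_{i\alpha} &= \pi_{11} \pi_{11}^{(i)} + \pi_{10} \pi_{10}^{(i)} + \pi_{01} \pi_{01}^{(i)} + \pi_{00} \frac{\pi_{01}^{(i)} \pi_{10}^{(i)}}{\pi_{11}^{(i)}} \left[\sum_i \frac{\pi_{01}^{(i)} \pi_{10}^{(i)}}{\pi_{11}^{(i)}}\right]^{-1},
\end{aligned}$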

where the subscript α indicates the α model assumption. Since the MLEs of the parameters $\pi_{jk}$ and $\pi_{jk}^{(i)}$ are $\hat{\pi}_{jk} = x_{jk}/n$ (j, k = 0, 1) and $\hat{\pi}_{jk}^{(i)} = x_{jk}^{(i)}/x_{jk}$ if j + k > 0, the closed-form MLE of $n_{i\alpha}$ under the α model is

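$\hat{n}_{i\alpha} = n\hat{q}_{i\alpha} = x_{11}^{(i)} + x_{10}^{(i)} + x_{01}^{(i)} + x_{00} \frac{x_{01}^{(i)} x_{10}^{(i)}}{x_{11}^{(i)}} \left[\sum_i \frac{x_{01}^{(i)} x_{10}^{(i)}}{x_{11}^{(i)}}\right]^{-1},$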
(3)

which is essentially the same as equation (15) in Böhning and Patilea (2008) without the stability correction. In other words, the estimator obtained in equation (15) is the MLE under the α model assumption with the stability correction.
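Equation (3) can be sketched in a few lines of Python. The function name `alpha_model_counts` and the example counts are hypothetical; the sketch omits the stability correction and therefore assumes every $x_{11}^{(i)}$ is positive.

```python
def alpha_model_counts(x11, x10, x01, x00_total):
    """Estimate n_i under the alpha model, following equation (3).

    x11, x10, x01 are lists of class-specific counts x_jk^(i); x00_total is
    the observed number of double negatives. No stability correction is
    applied, so each entry of x11 must be positive.
    """
    # x01^(i) * x10^(i) / x11^(i) for each disease class i
    ratios = [b * c / a for a, b, c in zip(x11, x10, x01)]
    total = sum(ratios)
    # Observed positives plus the class-i share of the double negatives
    return [
        a + b + c + x00_total * r / total
        for a, b, c, r in zip(x11, x10, x01, ratios)
    ]


# Hypothetical counts for d = 2 disease classes (illustration only)
n_alpha = alpha_model_counts([20, 5], [10, 15], [8, 12], 930)
print(n_alpha)  # [131.0, 869.0]
```

The estimated class sizes sum to the total sample size n (here 1000), since the double negatives are simply apportioned across the classes.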

Under the θ model, $\theta = \frac{p_{1|1}^{(i)}}{p_{1+}^{(i)}}$ is assumed to be identical for all i = 1, 2, …, d; by Bayes’ rule,

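$\begin{aligned}
\theta = \frac{p_{1|1}^{(i)}}{p_{1+}^{(i)}} &= \frac{P(T_1 = 1 \mid T_2 = 1, D = i)}{P(T_1 = 1 \mid D = i)} \\
&= \frac{P(D = i \mid T_1 = 1, T_2 = 1)\, P(T_1 = 1, T_2 = 1)}{P(T_2 = 1, D = i)\, P(T_1 = 1, D = i)} \times P(D = i) \\
&= \frac{\pi_{11} \pi_{11}^{(i)}}{\left(\pi_{01} \pi_{01}^{(i)} + \pi_{11} \pi_{11}^{(i)}\right)\left(\pi_{10} \pi_{10}^{(i)} + \pi_{11} \pi_{11}^{(i)}\right)} \times P(D = i),
\end{aligned}$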

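Thus $\theta = \left[\sum_i \left(\frac{\pi_{10} \pi_{10}^{(i)}}{\pi_{11} \pi_{11}^{(i)}} + 1\right)\left(\pi_{01} \pi_{01}^{(i)} + \pi_{11} \pi_{11}^{(i)}\right)\right]^{-1}$ and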

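$\pi_{00\theta}^{(i)} = \frac{1}{\pi_{00}} \left\{\left(\frac{\pi_{10} \pi_{10}^{(i)}}{\pi_{11} \pi_{11}^{(i)}} + 1\right)\left(\pi_{01} \pi_{01}^{(i)} + \pi_{11} \pi_{11}^{(i)}\right)\left[\sum_i \left(\frac{\pi_{10} \pi_{10}^{(i)}}{\pi_{11} \pi_{11}^{(i)}} + 1\right)\left(\pi_{01} \pi_{01}^{(i)} + \pi_{11} \pi_{11}^{(i)}\right)\right]^{-1} - \pi_{11} \pi_{11}^{(i)} - \pi_{10} \pi_{10}^{(i)} - \pi_{01} \pi_{01}^{(i)}\right\}$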

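$q_{i\theta} = \left(\frac{\pi_{10} \pi_{10}^{(i)}}{\pi_{11} \pi_{11}^{(i)}} + 1\right)\left(\pi_{01} \pi_{01}^{(i)} + \pi_{11} \pi_{11}^{(i)}\right)\left[\sum_i \left(\frac{\pi_{10} \pi_{10}^{(i)}}{\pi_{11} \pi_{11}^{(i)}} + 1\right)\left(\pi_{01} \pi_{01}^{(i)} + \pi_{11} \pi_{11}^{(i)}\right)\right]^{-1},$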

where the subscript θ indicates the θ model assumption. Similarly, the closed-form MLE of $n_{i\theta}$ under the θ model is

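$\hat{n}_{i\theta} = n\hat{q}_{i\theta} = n\left(\frac{x_{10}^{(i)}}{x_{11}^{(i)}} + 1\right)\left(x_{01}^{(i)} + x_{11}^{(i)}\right)\left[\sum_i \left(\frac{x_{10}^{(i)}}{x_{11}^{(i)}} + 1\right)\left(x_{01}^{(i)} + x_{11}^{(i)}\right)\right]^{-1} = n\,\frac{x_{1+}^{(i)} x_{+1}^{(i)}}{x_{11}^{(i)}}\left[\sum_i \frac{x_{1+}^{(i)} x_{+1}^{(i)}}{x_{11}^{(i)}}\right]^{-1},$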
(4)

which is essentially the same as equation (10) in Böhning and Patilea (2008) without the stability correction.
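The θ-model estimator can be sketched the same way. The function name `theta_model_counts` and the counts are hypothetical. The sketch computes $\hat{q}_{i\theta}$ as the normalized weights $x_{1+}^{(i)} x_{+1}^{(i)} / x_{11}^{(i)}$ and then multiplies by n, following $\hat{n}_{i\theta} = n\hat{q}_{i\theta}$; it omits the stability correction and so assumes every $x_{11}^{(i)}$ is positive.

```python
def theta_model_counts(x11, x10, x01, x00_total):
    """Estimate n_i under the theta model as n * q_i-hat.

    x11, x10, x01 are lists of class-specific counts x_jk^(i); x00_total is
    the observed number of double negatives. No stability correction is
    applied, so each entry of x11 must be positive.
    """
    n = sum(x11) + sum(x10) + sum(x01) + x00_total
    # x_{1+}^(i) * x_{+1}^(i) / x_11^(i), with x_{1+} = x11 + x10, x_{+1} = x11 + x01
    weights = [(a + b) * (a + c) / a for a, b, c in zip(x11, x10, x01)]
    total = sum(weights)
    # q_i-hat is the normalized weight; scaling by n gives the class size
    return [n * w / total for w in weights]


# Hypothetical counts for d = 2 disease classes (illustration only)
n_theta = theta_model_counts([20, 5], [10, 15], [8, 12], 930)
```

With these hypothetical counts the per-class weights are 42 and 68, so the two classes receive 42/110 and 68/110 of the n = 1000 individuals, which differs markedly from what the α model gives for the same data.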

As a byproduct of this alternative parameterization, we can test the difference between $\hat{q}_{i\theta}$ and $\hat{q}_{i\alpha}$ (or equivalently, the difference between $\hat{n}_{i\theta}$ and $\hat{n}_{i\alpha}$) to make inference on whether these two assumptions provide statistically significantly different predictions for the probability (or equivalently, the number) of individuals in a given disease class i. Although the formula for $se(\hat{q}_{i\theta} - \hat{q}_{i\alpha})$ is tedious, its numerical value can be obtained easily through statistical software using the delta method. We note that the difference between estimated probabilities of disease classes under the α and θ models can be statistically significant and potentially meaningful for the same study. For example, in the Health Insurance Plan Study for breast cancer screening in New York (Strax, Venet, Shapiro and Gross 1967), the estimated probability of having cancer assuming the α model is 4.8% with a 95% confidence interval (CI) of 0.3% to 9.3%, while the estimated probability of having cancer assuming the θ model is 7.5% with a 95% CI of 2.8% to 12.2%. The difference is 2.7% (95% CI: 1.4% to 4%) with a p-value less than 0.001. This difference can have a big impact on cancer surveillance and prevention. Unfortunately, the data do not contain information to differentiate the α model from the θ model.
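One way the delta-method standard error could be obtained numerically is sketched below. This is not the authors' software. It assumes the 3d + 1 observed counts ($x_{00}$ and the class-specific $x_{11}^{(i)}$, $x_{10}^{(i)}$, $x_{01}^{(i)}$) follow a single multinomial distribution, approximates the gradient by central differences, and uses hypothetical counts; all function names are illustrative.

```python
import math


def q_alpha(p00, p11, p10, p01):
    """Class probabilities under the alpha model from joint cell probabilities."""
    r = [b * c / a for a, b, c in zip(p11, p10, p01)]
    s = sum(r)
    return [a + b + c + p00 * ri / s for a, b, c, ri in zip(p11, p10, p01, r)]


def q_theta(p00, p11, p10, p01):
    """Class probabilities under the theta model from joint cell probabilities."""
    w = [(a + b) * (a + c) / a for a, b, c in zip(p11, p10, p01)]
    s = sum(w)
    return [wi / s for wi in w]


def diff_and_se(x11, x10, x01, x00_total, i, eps=1e-6):
    """Return (q_theta_i - q_alpha_i, delta-method standard error)."""
    d = len(x11)
    counts = [x00_total] + list(x11) + list(x10) + list(x01)
    n = sum(counts)
    p = [c / n for c in counts]  # multinomial MLEs of the cell probabilities

    def g(v):
        p00, p11 = v[0], v[1:1 + d]
        p10, p01 = v[1 + d:1 + 2 * d], v[1 + 2 * d:]
        return q_theta(p00, p11, p10, p01)[i] - q_alpha(p00, p11, p10, p01)[i]

    # Central-difference gradient of g with respect to each cell probability
    grad = []
    for k in range(len(p)):
        up, dn = p.copy(), p.copy()
        up[k] += eps
        dn[k] -= eps
        grad.append((g(up) - g(dn)) / (2 * eps))

    # grad' Sigma grad with Sigma = (diag(p) - p p') / n (multinomial covariance)
    mean = sum(pk * gk for pk, gk in zip(p, grad))
    var = (sum(pk * gk * gk for pk, gk in zip(p, grad)) - mean ** 2) / n
    return g(p), math.sqrt(var)


# Hypothetical counts for d = 2 disease classes (illustration only)
est, se = diff_and_se([20, 5], [10, 15], [8, 12], 930, i=0)
```

A Wald-type interval `est ± 1.96 * se` could then be formed, under the usual large-sample assumptions.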

The alternative parameterization in (2) sheds light on maximum likelihood approaches in the setting considered here; the corresponding closed-form ML estimators under the α and θ models allow tests of the difference between the estimated probabilities of a specific disease class using the α versus the θ model. Our results complement the estimators obtained in equations (10) and (15) by Böhning and Patilea (2008) using a capture-recapture approach, and ensure the usual MLE properties.

## Acknowledgments

Dr. Chu was supported in part by the Lineberger Cancer Center Core Grant CA16086 from the U.S. National Cancer Institute. The authors are very grateful to the editor for his helpful comments and suggestions.

## Contributor Information

Haitao Chu, Department of Biostatistics and the Lineberger Comprehensive Cancer Center, The University of North Carolina, Chapel Hill, NC 27516 (Email: hchu@bios.unc.edu).

Lei Nie, Office of Biostatistics, Food and Drug Administration, Silver Spring, MD 20993 (Email: lei.nie@fda.hhs.gov).

## References

• Böhning D, Patilea V. A capture-recapture approach for screening using two diagnostic tests with availability of disease status for the test positives only. Journal of the American Statistical Association. 2008;103:212–221.
• Lyles RH. A note on estimating crude odds ratios in case-control studies with differentially misclassified exposure. Biometrics. 2002;58:1034–1036.
• Pepe MS, Janes H. Insights into latent class analysis of diagnostic test performance. Biostatistics. 2007;8:474–484.
• Satten GA, Kupper LL. Inferences about exposure-disease associations using probability-of-exposure information. Journal of the American Statistical Association. 1993;88:200–208.
• Strax P, Venet L, Shapiro S, Gross S. Mammography and clinical examination in mass screening for cancer of the breast. Cancer. 1967;20:2184–2188.
