Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
Stat Med. Author manuscript; available in PMC 2010 October 15.
Published in final edited form as:
Stat Med. 2009 October 15; 28(23): 2912–2928.
doi:  10.1002/sim.3678
PMCID: PMC2760016

Global goodness-of-fit tests for group testing regression models


In a variety of biomedical applications, particularly those involving screening for infectious diseases, testing individuals (e.g., blood/urine samples, etc.) in pools has become a standard method of data collection. This experimental design, known as group testing (or pooled testing), can provide a large reduction in testing costs and can offer nearly the same precision as individual testing. To account for covariate information on individual subjects, regression models for group testing data have been proposed recently. However, there are currently no tools available to check the adequacy of these models. In this paper, we present various global goodness-of-fit tests for regression models with group testing data. We use simulation to examine small-sample size and power properties of the tests for different pool composition strategies. We illustrate our methods using two infectious disease data sets, one from an HIV study in Kenya and one from the Infertility Prevention Project.

Keywords: Infertility Prevention Project, IOS test, latent binary response, logistic regression, pooled testing, score test


When screening subjects for sexually transmitted diseases such as HIV, it is often more practical, at least initially, to test subjects in pools rather than to test them individually. Whether the goal is to estimate the prevalence of infection or to identify those who are infected, the argument for testing subjects in pools through group testing has been made repeatedly in the statistical literature [1]–[5], and the infectious disease literature is replete with empirical evidence showing the profound advantages of pooling; e.g., [6]–[9]. The advantages of group testing for estimation and identification in drug discovery are also well known [10, 11].

In this paper, we consider regression models for a pooled binary response from group testing when covariate information is available for each subject in the pool. This type of model falls outside the usual binary regression model framework because the individual binary responses (e.g., infection statuses, etc.) are not necessarily observed. Motivated by a Kenyan HIV study [6], Vansteelandt et al. [1] propose a likelihood-based approach to model the probability of infection, as a function of fixed subject-specific covariates, using the observed initial pooled testing results. Xie [2] presents a conceptually different approach to fit fixed effects group testing regression models by treating individual statuses as latent and using the EM algorithm. This approach is more flexible than that in [1] because, among other things, one can incorporate testing results from retests; e.g., additional results obtained from retesting subsets of positive pools. More recently, Chen et al. [12] propose a generalization of the approach in [1] to include effects which are best regarded as random.

Modeling group testing data has been an important advance, because it allows one to take advantage of risk factors through the covariates while acknowledging population heterogeneity. This sharply contrasts with previous research in group testing which has largely required the individual probability of positivity to be the same for each subject. However, despite this methodological advance, there are currently no analytical tools one can use to assess the fit of the model. With this in mind, we propose global goodness-of-fit (GOF) tests tailored for use with the fixed effects group testing regression model proposed by Vansteelandt et al. [1]. These tests can provide needed guidance to assess the fit of a model, because graphical assessments of adequacy for this type of model are difficult to make. Our overarching approach in this paper is to consider analytical procedures which test whether or not a particular model is appropriate by detecting general departures from it.

Subsequent sections of this paper are organized as follows. In Section 2, we review the group testing regression modeling approach of Vansteelandt et al. [1]. In Section 3, we discuss the components of model adequacy and present four global GOF tests for group testing regression models. In Section 4, we consider different pool composition strategies and use simulation to characterize the performance of the tests. In Section 5, we apply our GOF procedures to two infectious disease data sets, one from the Kenyan HIV study in [1] and one from the Infertility Prevention Project. In Section 6, we summarize our findings, describe other approaches, and discuss future research in this area. Additional mathematical details are catalogued in the appendices.


Suppose that N individuals (e.g., blood/urine specimens, etc.) are drawn from a large population and that each individual is assigned to exactly one of n pools. Denote by ci the pool size for the ith pool and let Yij = 1 if the jth individual in the ith pool is positive, Yij = 0 otherwise, for i = 1, 2, …, n and j = 1, 2, …, ci, so that N=i=1nci. We assume throughout that the individual statuses Yij are independent and let xij = (1, xij1, xij2, …, xijp)′ denote the (p + 1) × 1 fixed covariate vector for the jth individual in the ith pool. For the model formulation in [1], an arbitrary link function between the mean of Yij and the covariate xij is allowed. We restrict attention to the logistic link function, although it is possible to extend our work to handle other link functions. Specifically, we write


where β is the (p + 1) × 1 fixed effects regression parameter that is to be estimated from the pooled testing results. For i = 1, 2, …, n, define Zi = 1, if the testing result for the ith pool indicates the presence of at least one positive individual in the pool and Zi = 0 otherwise. Similar to Vansteelandt et al. [1], we treat Zi as an observable whereas the individual statuses Yij are not observed. We assume that the sensitivity (γ1) and the specificity (γ2) of the diagnostic test (e.g., ELISA test, etc.) used to make pooled determinations are constants close to 1 and that they do not depend on ci so that


where xi=(xi1,xi2,,xici) and γ12 = 1 − γ1γ2. Because the latent statuses Yij are independent, the log-likelihood function of β, based on the observed data Z = (Z1, Z2, …, Zn)′, can be written as


One can maximize the log-likelihood in (3) to obtain the maximum likelihood estimate (MLE) for β, denoted throughout by [beta]. In practice, [beta] can be computed using the quasi-Newton method described in [1], which we implement using the R function optim.


Because the latent individual statuses Yij are independent, the group testing regression model in (2) is literally “induced” by the logistic regression model in (1), so we focus on assessing the validity of (1). The essential components of fit for the logistic model can be specified by the following assumptions [13]:

  • A1. The logit transformation is the correct function linking the covariates with the conditional mean.
  • A2. The linear predictor βxij is correct; that is, one does not need to include additional covariates, transformations of variables, interactions, etc.
  • A3. The variance of Yij, conditional on xij, is var(Yij|xij) = pij(β, xij) {1 − pij(β, xij)}.

Instead of checking A1–A3 individually (assumptions which are obviously confounded), we focus on the assessment of overall GOF with pooled testing response; that is, we aim to detect violations of any of the assumptions listed above. In doing so, we consider only the situation wherein there is no “natural grouping” of the individuals. This is likely the case when at least one of the covariates is continuous, so that the number of covariate patterns is close to or equal to N, the number of individuals.

We now describe four global GOF tests for (1) with pooled testing response. As the individual statuses Yij are not observed, GOF is assessed using the available pooled testing results Zi. Other testing approaches are discussed in Section 6.

3.1. Pearson χ2 test

The first GOF test statistic we consider is a Pearson χ2-type statistic, defined by


where pi([beta]) is an estimate of the probability that the ith pool tests positive, based on the assumed model. As with its individual data analogue, (4) can be viewed as the sum of squared Pearson residuals and is a measure of deviation of the observed data from the assumed model. For logistic regression models with individual data; i.e., when ci = 1, Hosmer et al. [13] show via simulation that the Pearson χ2 statistic, using a normal approximation, enjoys good size and power properties when compared to other commonly used GOF statistics for logistic regression. Also using individual data, Osius and Rojek [14] derive the asymptotic normal distribution of the Pearson χ2 statistic in very general settings.

One could mimic the approach of [14] and establish the asymptotic normality of our statistic defined in (4). However, we instead use the scaled chi-square distribution bχν2 as a reference, rather than a normal distribution, to achieve better distributional properties with finite samples [13, 15]. The bχν2 distribution is commonly used to approximate the distribution of nonnegative random variables [16]. The values b > 0 and ν > 0 are estimated by equating the first two (approximate) moments of G and those of the scaled χ2 distribution. The approach we use to obtain the moments of G follows the same spirit as the approach in [14], although the calculations are more complicated when considering group testing responses. It is understood that asymptotic properties apply when the number of pools n is large.

We sketch the derivation of the first two moments of G = G([beta]) here and provide full details in Appendix A. Suppose that the true value β0 is not on the boundary of the parameter space and denote the score function by S(β) = [partial differential]l(β|Z)/[partial differential]β. Expanding G([beta]) about β0, we have


Under mild regularity conditions, it can be shown that


and that var{G([beta])}≈ var{G(β0)} − C(β0)′x2110−1(β0)C(β0), where x2110(β0) = E {S(β0)S(β0)′} is the Fisher information matrix and


We provide closed-form expressions for var{G(β0)}, x2110(β0), and C(β0) in Appendix A. Denote by σ^G2 the estimate of var{G([beta])}, obtained by substituting [beta] for β0 in the above expressions. The testing procedure we propose is outlined as follows:

  1. Calculate the Pearson statistic G = G([beta]) and the approximate variance σ^G2.
  2. Estimate the scale parameter b and degrees of freedom ν for the scaled χ2 distribution by equating the first two moments of bχν2 and G; that is, set [mu]G = and σ^G2=2b2ν. Solving for b and ν, we get b=σ^G2/2μ^G and ν=2μ^G2/σ^G2.
  3. Because G/b follows an approximate χ2 distribution with ν degrees of freedom, the p-value for the test of GOF is p=pr(χν2G/b).

3.2. Score test

For individual testing data, Stukel [17] extends the logistic regression model by adding two additional parameters α1 and α2, on the logit scale, where α1(α2) controls the shape of the mean function for positive (negative) values of the linear predictor βxij. In logistic regression, positive values of βxij correspond to probabilities greater than 0.5, while negative values correspond to probabilities less than 0.5. However, in the low prevalence settings where group testing is commonly used, most subjects have small positive probabilities (often much smaller than 0.5). We thus modify Stukel’s [17] approach and introduce only one additional parameter. Specifically, we embed the logistic model in (1) into a larger class of models indexed by the (p + 2) × 1 parameter vector θ = (β′, α0)′. For notational convenience, we let ηij = βxij denote the linear predictor so that the extended model can be written as


where hα0 (ηij) = ηij when ηij > 0. When ηij ≤ 0,


It is worth emphasizing that pij(θ) depends on β only through the linear predictor ηij, whereas the extra parameter α0 controls the shape of the lower tail of the mean function pij(θ). A positive value of α0 produces a curve with a lighter lower tail than the logistic curve, whereas a negative value of α0 corresponds to a heavier lower tail. When α0 = 0, it is easy to see that the model in (6) reduces to the logistic model in (1). The adequacy of the logistic model in (1) can thus be assessed by testing H0: α0 = 0.

Similar to Stukel [17], we formulate a score statistic to test H0 for the group testing regression model in [1]. Specifically, we rewrite the mean of Zi, i = 1, 2, …, n, as


so that the log-likelihood based on the extended model is


The score function S(θ) is given by


The statistic we propose is TS = S([theta w/ hat]), where [theta w/ hat] = ([beta]′, 0)′ and [beta] is the MLE of β calculated under H0, as described in Section 2. It can be shown that the explicit form of TS is


where [eta w/ hat]ij = [beta]xij, I(·) is the indicator function, and [p with hat]ij and [p with hat]i are the model-predicted probabilities obtained by substituting θ = [theta w/ hat] into (6) and (7), respectively. When the number of pools n is large, under H0, TS is approximately normal with zero mean. To derive the variance of TS, denoted by σTS2, we partition S(θ) and the Fisher information matrix x2110(θ) = E{S(θ)S(θ)′} according to θ = (β′, α0)′, i.e., we write




so that σTS2=Iα0α0Iα0βIββ1Iβα0. To obtain an estimate of σTS2, we substitute [theta w/ hat] = ([beta]′, 0)′ for θ in x2110(θ). Denote the resulting matrix by x2110([theta w/ hat]) and the corresponding partitions by Îββ, Îβα0, Îα0β, and Îα0α0, respectively, so that σ^TS2=I^α0α0I^α0βI^ββ1I^βα0. A closed-form expression for x2110([theta w/ hat]) is provided in Appendix B. The score test for H0: α0 = 0 is a two sided test, with both positive and negative large values indicating lack of fit of the assumed model in (1). For implementation, we compare TS/[sigma with hat]TS to the An external file that holds a picture, illustration, etc.
Object name is nihms147356ig1.jpg(0, 1) reference distribution.

3.3. Hosmer-Lemeshow test

In logistic regression with individual data, the distribution of the Pearson χ2 statistic can be approximated by a χ2 distribution when the expected number of observations for each covariate pattern is not small. Unfortunately, this requirement is violated when the number of covariate patterns is close to or equal to the number of observations, the situation in which we are interested. One way to address this problem is to “group” the observed individual data so that asymptotic arguments can be applied to the “grouped data” [18]–[20]. We extend this idea to applications involving pooled binary responses.

When compared to other available GOF tests for individual testing data, the Hosmer-Lemeshow (HL) test is known to have lower power in detecting certain departures from the logistic model [13, 15]. However, the HL statistic remains widely used, so we reevaluate its potential use and include it for comparison purposes with pooled response data. For clarity, when we henceforth refer to the HL test, we exclusively use the term “pool” to identify an amalgamate of individuals for group testing. We use the term “group” to represent a set of the observed pools.

The HL statistic is computed as follows. First, we fit the pooled testing regression model defined in (1) and (2), obtaining the MLE [beta] and model-predicted pool probabilities [p with hat]i, i = 1, 2, …, n. Second, we partition the pooled testing responses Zi, i = 1, 2, …, n, into m subgroups g1, g2, …, gm according to the ranking of the n values of [p with hat]i. We denote the number of observations in each group by sk, k = 1, 2, …, m. The HL statistic based on the pooled responses is then given by


In general, we choose the subgroup sizes sk to be equal. If n is not a multiple of m, we let g1, g2, …, gm−1 each consist of [n/m] pooled responses and let gm consist of the remaining n − (m − 1) [n/m] pooled responses. Using extensive simulation, Hosmer and Lemeshow [18] demonstrate that when the number of distinct covariate patterns values is close to the sample size, the null distribution of the HL statistic can be well approximated by a χ2 distribution with m − 2 degrees of freedom, provided that [n/m] is not small. We use the same reference distribution to diagnose GOF with pooled testing responses.

3.4. IOS test

Presnell and Boos [21] propose a general test for misspecification in parametric models by comparing the maximized “in-sample” likelihood with the maximized “out-of-sample” likelihood. We now describe how their methodology can be applied to diagnose GOF for the group testing regression model in [1]. Let f(Zi; β) denote the probability mass function of Zi, i = 1, 2, …, n, that is, f (Zi; β) is a Bernoulli mass function with mean equal to (2). For convenience, in our notation of f(Zi; β) we have suppressed the dependence on the covariate vectors xij, j = 1, 2, …, ci. Let [beta] denote the MLE of β, as described in Section 2, and let [beta](i) denote the MLE of β when the ith (pooled) observation is deleted from the sample Z. The test statistic proposed by Presnell and Boos [21] is the logarithm of the “in-and-out-of-sample” (IOS) likelihood ratio


where l(Zi; β) = log f (Zi; β). Computing IOS can be extremely cumbersome for large n, so the authors in [21] derive an approximate version of the IOS statistic. For i = 1, 2, …, n, let l(Zi; β) = [partial differential]l(Zi; β)/[partial differential]β, and l(Zi; β) = [partial differential]2l(Zi; β)/[partial differential]β[partial differential]β′ denote the gradient and Hessian of l(Zi; β), respectively. The approximate form of the IOS statistic is


where I^(β^)=n1i=1n{l¨(Zi;β^)} is the average observed information matrix and B^(β^)=n1i=1nl.(Yi;β^)l.(Zi;β^) is the sample covariance of the score function. Note that IOSA does not require the leave-one-out estimates, so it is far simpler to compute when n is large.

Presnell and Boos [21] show that IOSA is approximately normal with mean equal to tr[E{Î(β0)}−1E{[B with circumflex](β0)}], where β0 is the probability limit of [beta]. When the assumed model is correct, this mean equals the dimension of β. However, the authors show that the asymptotic variance of IOSA is too difficult for “routine use” and note that the convergence to normality can be very slow. These findings were reinforced by Campanu and Presnell [22], who used simulation to investigate the use of IOSA with logistic regression for individual testing data. For implementation, the authors in [21] recommend using a parametric bootstrap to approximate the null distribution of IOSA; i.e., the distribution of IOSA under the assumed model. We adopt this recommendation in our consideration of IOSA for GOF with group testing data.


4.1. Pooling strategies

We consider two strategies to form pools for group testing. The first strategy is random pooling; i.e., individuals are assigned to pools at random regardless of their covariate values in xij. The second strategy we consider is to form the pools homogeneously; i.e., individuals are assigned to pools based on their values of xij, with “similar” individuals being pooled together. Vansteelandt et al. [1] showed that a more precise estimator of β results when pools consist of xij that are as similar as possible. We refer to this as homogeneous pooling. Bilder and Tebbs [23] discuss the practical aspects of using both pooling strategies in various group testing applications, including infectious disease screening.

4.2. Simulation models

We consider three types of model misspecification. For each of the simulation models we consider, we aim to emulate the low-prevalence settings where group testing is commonly used. To be specific, we choose parameter settings in each model so that the proportion of positive individuals is between 0.05 and 0.10. The first type of departure we consider is the omission of a quadratic term. We generate individual responses Yij from the model


where xij is a continuous covariate generated from a An external file that holds a picture, illustration, etc.
Object name is nihms147356ig2.jpg(−3, 3) distribution. The parameters β0, β1, and β2 are determined by equations pr(Yij = 1|xij = 3) = 0.40, pr(Yij = 1|xij = 0) = 0.0293 and pr(Yij = 1|xij = −3) = a with a = 0.0014, 0.01, 0.03, 0.05, 0.1. As a increases from 0.0014 to 0.1, the effect of the omitted quadratic term is progressively pronounced. The case a = 0.0014 corresponds to the null model with β2 = 0.

The second type of departure we consider is the omission of an interaction term between a binary and a continuous covariate. The true model is


where xij1 is generated from a Bernoulli(0.4) distribution and the continuous xij2 is generated from a An external file that holds a picture, illustration, etc.
Object name is nihms147356ig2.jpg(−3, 3) distribution. We choose β3 = 0, 0.3, 0.5, 0.8, and 1 so that the interaction term is progressively important. The null model corresponds to β3 = 0. For a given choice of β3, the other parameters β0, β1, β2 are determined by equations pr(Yij = 1|xij1 = 0, xij2 = 0) = 0.02, pr(Yij = 1|xij1 = 1, xij2 = 3) = 0.4, and pr(Yij = 1|xij1 = 1, xij2 = 0) = 0.1.

The third type of departure we consider is a misspecification of the link function. We specify models with different link functions using the modified Stukel’s generalized logistic model in (6). Individual samples are generated from


where hα0(·) is as defined in Section 3.2 and xij is a continuous covariate generated from a An external file that holds a picture, illustration, etc.
Object name is nihms147356ig2.jpg(0, 6) distribution. We choose (α0, β) = (0, −1.2), (−0.5, −3), (−1, −3), and (0.5, −1.2) so that GOF can be assessed for misspecified links taking on a variety of shapes. Among these parameter settings, (α0, β) = (0, −1.2) corresponds to the null case, (α0, β) = (−0.5, −3) and (−1, −3) represent link functions with a heavier lower tail than the logistic function, while (α0, β) = (0.5, −1.2) corresponds to link function with a lighter lower tail.

4.3. Simulation description

For each model in Section 4.2, we first simulate N = 1000 individual observations (Yij, xij) and create group testing samples (Zi, xi), i = 1, 2, …, n, based on the simulated individual observations. We then repeat this procedure 500 times; that is, our simulation results are based on M = 500 group testing data sets, each consisting of N = 1000 individuals. For simplicity, we assume a common pool size ci = c, for i = 1, 2, …, n, with c = 2, 5, and 10, so that n = 500, 200 and 100, respectively. We also assume that γ1 = γ2 = 1 (perfect testing), because the results are similar for other values of γ1 and γ2 that are reasonably close to 1. Imperfect tests are considered in Section 5.

When we form the pools for group testing, we use both random pooling (RP) and homogeneous pooling (HP), as described in Section 4.1. Random pools are created by assigning individual observations to pools in the order in which they were generated; homogeneous pools are created by sorting the covariates first and assigning individuals to pools based on this covariate ordering. HP is straightforward to implement when the simulation model has only one continuous covariate. For the model in (9) with dichotomous xij1 and continuous xij2, we first sort the individuals by xij1. We then sort by xij2, within the two levels of xij1, and construct quasi-homogeneous pools based on these ordered individual samples, following the recommendation in [1]. Bilder and Tebbs [23] discuss other approaches to form homogeneous pools. Once the group testing samples have been formed, the binary pooled responses Zi are determined by Zi=I(j=1cYij>0).

For the HL test, we take m = 10 (i.e., the decile of risk statistic). For the IOSA test, we calculate p-values using a parametric bootstrap with B = 200 resamples. That is, for each simulated group testing data set, we first fit the group testing regression model in (2). We then generate 200 bootstrap resamples, treating the MLE [beta] as the true value and keeping intact the individual covariate structure in each pool. The IOSA statistics are calculated based on the original simulated data set and the bootstrap samples, which we denote by IOSA and IOSA(b), for b = 1, 2, …, 200, respectively. The p-value is calculated by #{IOSA(b)IOSA}/200.

4.4. Simulation results

Tables 1, ,2,2, and and3,3, contain estimated sizes and powers from our simulations under models (8), (9), and (10), respectively. First, note that all four GOF statistics G, TS, THL, and IOSA confer the correct size for the null situations; estimated nominal α = 0.05 size tests are within the margin of Monte Carlo error for all simulations. Second, with few exceptions, tests for GOF under HP are more powerful than the same tests with RP. This is perhaps not unexpected, based on the findings of [1, 23], but it is interesting to see how large the gain in power can be when comparing HP to RP. If logistically feasible, it appears to be well worth the additional burden to construct homogeneous pools. Note that Bilder and Tebbs [23] describe situations where HP may be difficult or impossible to implement. Finally, smaller pool sizes generally provide larger power; however, using larger pool sizes does not affect the power for HP as much as it does for RP, especially for the first and second types of misspecification.

Table I
Simulation study for the omission of the quadratic term in model (8) with a = 0.0014, 0.01, 0.03, 0.05, 0.10, and γ1 = γ2 = 1. The null model corresponds to a = 0.0014 and the effect of misspecification becomes more pronounced as a increases. ...
Table II
Simulation study for the omission of the interaction term in model (9) with β3 = 0, 0.3, 0.5, 0.8, 1.0, and γ1 = γ2 = 1. The null model corresponds to β3 = 0 and the effect of misspecification becomes more pronounced as ...
Table III
Simulation study for the misspecification of the link function in model (10) with α0 = 0, 0.5, −0.5, −1.0, and γ1 = γ2 = 1. The null model corresponds to γ0 = 0. The nominal size is 0.05. Estimated sizes ...

Among the four GOF statistics, each has excellent power to detect the omission of a quadratic term (Table 1) when HP is used. The same can be said for detecting a missing interaction term (Table 2), except for IOSA. None of the four tests do particularly well at detecting a misspecified link function (Table 3). Our findings are not incongruous with those in [13, 22], where the performance of similar GOF tests is investigated for logistic regression with individual testing data. For the misspecified link function with α0 = 0.5 (Table 3), there are some convergence problems with HP when c = 10. This is largely due to the “complete separation” problem, as described in [23] for group testing regression models. In short, complete separation means that the covariate(s) perfectly discriminate(s) between the binary pooled responses, which creates problems when trying to maximize (3). For the α0 = 0.5, c = 10, HP setting in Table 3, the nonconvergence rate is about 30 percent, so we do not report the results. No other convergence problems were observed.


5.1. Kenya data

Verstraeten et al. [6] and Vansteelandt et al. [1] discuss a public-health study investigating HIV prevalence in rural Kenya. In this study, blood samples were collected from 740 pregnant women at four locations, and risk covariates were measured on each individual, including age, marital status, parity, and education level. Samples were first tested by an enzyme-linked immunosorbent assay (ELISA) test and positives were confirmed by a rapid assay test. Individual samples with a positive ELISA test result and a negative rapid assay test result were tested again with an ELISA to classify the sample as positive or negative. This sequential testing procedure was also implemented on pools of samples created at a central laboratory. Verstraeten et al. [6] report a significant cost reduction from using group testing while producing very similar prevalence estimates to those obtained from individual testing. To illustrate their regression methodology, Vansteelandt et al. [1] use the Kenya data and relate the individual prevalence to age through the logistic model


Our goal is to assess GOF for this model using pooled responses from the study.

After removing observations with missing values, as in [1], there are N = 706 individual observations available for analysis. We consider both “chronological” and “ordered” pool compositions. Chronological pools are constructed by the date of collection whereas ordered pools are constructed based on the values of the age covariate. As pointed out by the authors in [1], these compositions likely emulate random and homogeneous pooling, respectively. For both pooling strategies, we use n = 101 pools; 100 of size c = 7 and 1 of size c = 6, as in [1]. The sequential testing strategy used to elicit the pooled responses Zi provides highly sensitive and specific assessments; we use γ1 = 1 and γ2 = 0.9997, values used also in [1]. Table 4 displays the parameter estimates and standard errors for the model in (11) and the GOF results for the four tests in Section 3. For the HL test, we use m = 10; for the IOSA test, we use B = 1000 bootstrap resamples. All four GOF tests reach the same conclusion for both pooling strategies; specifically, there is no evidence of model misspecification.

Table IV
Kenya HIV data. Parameter estimates and standard errors from the model in (11) for chronological and ordered pooling strategies with γ1 = 1 and γ2 = 0.9997. Probability values for GOF are also provided. Results are based on 100 pools of ...

5.2. Infertility Prevention Project data

The state of Nebraska takes part in the nationwide Infertility Prevention Project (IPP) through its Sexually Transmitted Diseases and Infertility Control Program. At a number of clinic sites throughout the state, urine or swab specimens are collected on individuals and are then transported to the Nebraska Public Health Laboratory in Omaha for chlamydia and gonorrhea testing. Each year, about 30,000 tests are performed in the state; we use the individual testing data from the first quarter of 2006 for illustration purposes here. Specifically, the data set consists of chlamydia and gonorrhea infection statuses for 6,138 individual subjects as well as many potential risk covariates. Chen et al. [12] analyzes this same data set, treating the site effect as random. However, because we are interested in fixed effects models, we do not take this perspective and ignore the site effect. For each infection (chlamydia and gonorrhea, treated separately), we consider the fixed effects part of the model in [12] and relate the infection statuses Yij to four covariates via


Our goal is to assess GOF of this model for both infections using pooled responses.

For each infection, we construct artificial pooled testing responses Zi, as in [12], by grouping individuals into pools of size c = 10 (except one pool of size 8) according to the order of the testing date, thereby creating a pseudo-randomized composition of the covariates. To incorporate the effects of imperfect testing, pooled responses are simulated assuming that assay sensitivity and specificity are γ1 = 0.98 and γ2 = 0.95, respectively. These values are reasonable for both infections based on the empirical findings in [8, 9]. Table 5 gives the values of the four GOF statistics presented in Section 3 for both infections. For the HL test, we use m = 10; for the IOSA test, we use B = 1000 bootstrap resamples, as before. For chlamydia, each GOF statistic except THL provides a similar small p-value, indicating some evidence of model misspecification in (12). For gonorrhea, however, the GOF results suggest little evidence of misspecification.

Table V
Nebraska IPP data. GOF testing results for the model in (12) with γ1 = 0.98 and γ2 = 0.95. Results are based on 613 pools of size 10 and 1 pool of size 8.


We have developed four global GOF tests for group testing regression models and have illustrated the usefulness of these tests using data from two infectious disease applications. Among the four tests, the Pearson χ2 and modified score tests have the highest power, and we recommend them for use as a global check for model adequacy. Of course, due to the information loss from pooling, these tests may not have sufficient power to detect certain types of misspecification. This is especially true when random pooling is used. The computation time involved for all tests is not overly prohibitive, possibly with the exception of the IOSA test. Programs for data analysis are available by request.

Other GOF tests could be formulated for this problem; in fact, the four tests presented in this paper are only a subset of those we investigated. We initially considered formulating an unweighted sum of squares statistic, similar to the Ŝ statistic proposed in [13] for individual testing data. However, the statistic, when suitably normalized under group testing, did not provide a test with the correct size, and the test did not have higher power than the best tests herein. We also investigated a collection of bootstrapped statistics obtained from the predicted individual latent responses and the smoothed residual statistics discussed in [13, 15]. However, the bootstrap tests were not powerful, and there were technical difficulties in formulating the smoothed residuals tests with latent binary responses.

In this paper, we have taken a “global departure approach” to diagnose GOF with group testing models. It would be worthwhile to examine directed tests which target specific departures from A1–A3, although we leave this to future research. Future work could also include the generalization of our GOF tests to incorporate information from retesting subsets of positive pools or from other pooling algorithms [4]. Depending on the specific testing strategy used, some of the GOF tests discussed herein could be modified accordingly. However, the mathematical details could prove to be markedly more formidable. Detecting lack of fit in mixed effects group testing regression models [12] also remains as an open problem.


The authors would like to thank the Associate Editor and two anonymous referees for helpful comments on earlier versions of this article. We are grateful to Dr. Stijn Vansteelandt and his colleagues for providing us with the Kenya HIV data. We also thank Dr. Peter Iwen, Dr. Steven Hinrichs, and Philip Medina for their consultation on the Infertility Prevention Project. This research was supported by Grant R01 AI067373 from the National Institutes of Health.

Contract/grant sponsor: National Institutes of Health; contract/grant number: R01 AI067373


We provide additional details on the derivation of the first two approximate moments for G in Section 3.1. Suppose that the true value β0 is not on the boundary of the parameter space and denote the score function by S(β) = [partial differential]l(β|Z)/[partial differential]β. Expanding S([beta]) about β0, we have


where 0 is a (p + 1) × 1 vector of zeros. Under suitable regularity conditions and from (5), it follows from the Law of Large Numbers that


for large n. Approximating the observed information matrix −[partial differential]S(β)/[partial differential]β′|β= β0 by the Fisher information matrix x2110(β0) = E {S(β0)S(β0)′}, and combining (5), (A.1), and (A.2), we write


Our approximation of the first two moments of G is based on (A.3). Note that in (A.3), the only random quantities are G(β0) and S(β0), and all other terms are constants which depend on the true parameter β0. Let f (z; β) denote the density function for Z = (Z1, Z2, …, Zn)′. In what follows, we denote expectation with respect to the density function f(z; β) by Eβ(·) and expectation with respect to f(z; β0) by E(·). Since E {S(β0)} = 0, we have


The fact that E {G(β0)} does not depend on β0 implies Eβ {G(β)} = n, for all β. Therefore,


Interchanging integration and differentiation, the right-hand side of (A.4) can be written as


so that Eβ {G(β)S(β)} = −Eβ{[partial differential]G(β)/[partial differential]β}, for all β. In particular, when β, = β0,


Thus, it follows that


since E {S(β0)} = 0. From (A.3), the approximate variance of G is given by


where the first equality follows from (A.5) and the fact that cov{S(β0)} = x2110(β0). To derive an explicit (approximate) expression for var{G([beta])}, we need to calculate var{G(β0)}, C(β0), and the information matrix x2110(β0). Define the n × 1 vector


the (p + 1) × n matrix


and the n × n matrix


Using this notation, it is straightforward to show that var{G(β0)} = 1D(β0)1 − 4n, C(β0) = −Q(β0)a(β0), and x2110(β0) = Q(β0)D(β0)Q(β0)′, where 1 is an n × 1 vector of ones.


The Fisher information matrix in Section 3.2 is given by


where the penultimate equality holds because Zi are independent and E {Zipi(θ)} = 0. Substituting θ = [theta w/ hat] into the last expression and simplifying, we obtain


where wi=j=1cip^ij(xij,η^ij2I(η^ij<0)/2).


1. Vansteelandt S, Goetghebeur E, Verstraeten T. Regression models for disease prevalence with diagnostic tests on pools of serum samples. Biometrics. 2000;56:1126–1133. doi: 10.1111/j.0006-341X.2000.01126.x. [PubMed] [Cross Ref]
2. Xie M. Regression analysis of group testing samples. Statistics in Medicine. 2001;20:1957–1969. doi: 10.1002/sim.817. [PubMed] [Cross Ref]
3. Gastwirth J, Johnson W. Screening with cost effective quality control: Potential applications to HIV and drug testing. Journal of the American Statistical Association. 1994;89:972–981.
4. Kim H, Hudgens M, Dreyfuss J, Westreich D, Pilcher C. Comparison of group testing algorithms for case identification in the presence of testing error. Biometrics. 2007;63:1152–1163. doi: 10.1111/j.1541-0420.2007.00817.x. [PubMed] [Cross Ref]
5. Hung M, Swallow W. Use of binomial group testing in tests of hypotheses for classification or quantitative covariables. Biometrics. 2000;56:204–212. doi: 10.1111/j.0006-341X.2000.00204.x. [PubMed] [Cross Ref]
6. Verstraeten T, Farah B, Duchateau L, Matu R. Pooling sera to reduce the cost of HIV surveillance: A feasibility study in a rural Kenyan district. Tropical Medicine and International Health. 1998;3:747–750. doi: 10.1046/j.1365-3156.1998.00293.x. [PubMed] [Cross Ref]
7. Pilcher C, Fiscus S, Nguyen T, Foust E, Wolf L, Williams D, Ashby R, O’Dowd J, McPherson J, Stalzer B, Hightow L, Miller W, Eron J, Cohen M, Leone P. Detection of acute infections during HIV testing in North Carolina. New England Journal of Medicine. 2005;352:1873–1883. [PubMed]
8. Kacena K, Quinn S, Hartman S, Quinn T, Gaydos C. Pooling of urine samples for screening for Neisseria gonorrhoeae by ligase chain reaction: Accuracy and application. Journal of Clinical Microbiology. 1998;36:3624–3628. [PMC free article] [PubMed]
9. Kacena K, Quinn S, Howell M, Madico G, Quinn T, Gaydos C. Pooling urine samples for ligase chain reaction screening for genital Chlamydia trachomatis infection in asymptomatic women. Journal of Clinical Microbiology. 1998;36:481–485. [PMC free article] [PubMed]
10. Remlinger K, Hughes-Oliver J, Young S, Lam R. Statistical design of pools using optimal coverage and minimal collision. Technometrics. 2006;48:133–143. doi: 10.1198/004017005000000481. [Cross Ref]
11. Hughes-Oliver J. Pooling experiments for blood screening and drug discovery. In: Dean Angela, Lewis Susan., editors. Screening: Methods for Experimentation in Industry, Drug Discovery, and Genetics. Springer; New York: 2006.
12. Chen P, Tebbs J, Bilder C. Group testing regression models with fixed and random effects. Biometrics. 2009 doi: 10.1111/j.1541-0420.2008.01183.x. in press. [PMC free article] [PubMed] [Cross Ref]
13. Hosmer D, Hosmer T, le Cessie S, Lemeshow S. A comparison of goodness-of-fit tests for the logistic regression model. Statistics in Medicine. 1997;16:965–980. [PubMed]
14. Osius G, Rojek D. Normal goodness-of-fit tests for multinomial models with large degrees of freedom. Journal of the American Statistical Association. 1992;87:1145–1152.
15. le Cessie S, Van Houwelingen J. A goodness-of-fit test for binary regression models, based on smoothing methods. Biometrics. 1991;47:1267–1282.
16. Cox R, Hinkley D. Theoretical Statistics. Chapman and Hall/CRC; Boca Raton: 1974.
17. Stukel T. Generalized logistic models. Journal of the American Statistical Association. 1988;83:426–431.
18. Hosmer D, Lemeshow S. Goodness of fit tests for the multiple logistic regression model. Communications in Statistics: Theory and Methods. 1980;9:1043–1069.
19. Lemeshow S, Hosmer D. A review of goodness of fit statistics for use in the development of logistic regression models. American Journal of Epidemiology. 1982;115:92–106. [PubMed]
20. Hosmer D, Lemeshow S, Klar J. Goodness-of-fit testing for the multiple logistic regression analysis when the estimated probabilities are small. Biometrical Journal. 1988;30:911–924.
21. Presnell B, Boos D. The IOS test for model misspecification. Journal of the American Statistical Association. 2004;99:216–227. doi: 10.1198/016214504000000214. [Cross Ref]
22. Capanu M, Presnell B. Misspecification tests for binomial and beta-binomial models. Statistics in Medicine. 2008;27:2536–2554. doi: 10.1002/sim.3049. [PubMed] [Cross Ref]
23. Bilder C, Tebbs J. Bias, efficiency, and agreement in group-testing regression models. Journal of Statistical Computation and Simulation. 2009;79:67–80. doi: 10.1080/00949650701608990. [PMC free article] [PubMed] [Cross Ref]