For binary outcome data from epidemiological studies, this article investigates the interval estimation of several measures of interest in the absence or presence of categorical covariates. When covariates are present, the logistic regression model as well as the log-binomial model are investigated. The measures considered include the common odds ratio (OR) from several studies, the number needed to treat (NNT), and the prevalence ratio. For each parameter, confidence intervals are constructed using the concepts of generalized pivotal quantities and fiducial quantities. Numerical results show that the confidence intervals so obtained exhibit satisfactory performance in terms of maintaining the coverage probabilities even when the sample sizes are not large. An appealing feature of the proposed solutions is that they are not based on maximization of the likelihood, and hence are free from convergence issues associated with the numerical calculation of the maximum likelihood estimators, especially in the context of the log-binomial model. The results are illustrated with a number of examples. The overall conclusion is that the proposed methodologies based on generalized pivotal quantities and fiducial quantities provide an accurate and unified approach for the interval estimation of the various epidemiological measures in the context of binary outcome data with or without covariates.
This article investigates inferences for several epidemiological measures of practical interest, in the absence or presence of covariates. In the latter scenario, both the logistic regression model and the log-binomial model are considered. The logistic regression model plays a crucial role in the analysis of binary data arising from clinical trials and observational studies, and the focus of inference is very often the odds ratio (OR). Another widely used index is the relative risk, or the risk ratio (RR), which measures the strength of association between a risk factor (or an exposure variable) and disease. Other related indices are the risk difference (RD) and the relative risk difference (RRD). The odds ratio computed under the logistic regression model is known to be a good approximation to the risk ratio for a rare outcome, but not for an outcome that is common. Another epidemiological measure of interest is the prevalence ratio (PR), which measures the association between prevalence of the health outcome and an exposure variable or risk factor. The log-binomial model can be used to estimate the risk ratio in the presence of covariates, when the outcome is not rare. Yet another measure of interest is the number needed to treat (NNT), which is the average number of patients who need to be treated to prevent one additional adverse outcome; the NNT is simply the reciprocal of the absolute risk reduction. For randomized controlled trials with binary outcomes, the NNT is now widely used to measure the benefit of the treatment.
All of the above epidemiological measures are functions of the unknown parameters from the regression models, and inferences concerning them have been widely discussed in the literature, very often using standard likelihood based asymptotic methods [1,2,3,4]. We refer to these articles for background information and earlier literature on the point and interval estimation of the above epidemiological measures. The purpose of the present investigation is to explore the methodologies based on generalized confidence intervals and fiducial intervals for the interval estimation of the above quantities, and to assess their performance relative to the likelihood based large sample methods; performance in small sample scenarios is of particular interest. The generalized confidence interval methodology is due to Weerahandi, and it has found numerous applications in interval estimation problems, resulting in confidence intervals that exhibit satisfactory performance in small samples; see also the books by Weerahandi [6,7]. In the context of binary data, the methodology was adopted to obtain satisfactory confidence intervals in a quantal assay problem and in surrogate endpoint validation. Recently, the fiducial approach has seen a revival; in fact, some of the generalized confidence intervals are indeed fiducial intervals. We refer to [10,11] for very detailed discussions of the fiducial methodology. It is important to note that we use all of these intervals as confidence intervals in the usual frequentist sense, since it has been shown that they provide asymptotically correct frequentist coverage and have very good small sample properties.
We give a brief description of the generalized confidence intervals and fiducial intervals in the next section, and then explain their application for computing confidence intervals for the above epidemiological measures under the usual binomial model when covariates are absent, and under the logistic and log-binomial models when covariates are present. The fiducial approach was used for inferences concerning several parameters in the context of the binomial distribution (and also the Poisson distribution) in the absence of covariates . As will become clear, the confidence intervals that we have derived do not rely on the maximum likelihood estimators, and hence are free of the computational issues associated with the maximization of the likelihood. This is of particular interest in the log-binomial model, since it is known that the restricted parameter space under the log-binomial model presents numerical difficulties, and the models may fail to converge while maximizing the likelihood [13,14]. Our proposed methodology based on generalized confidence intervals and fiducial intervals can also be used for stratified studies, when one is interested in constructing a confidence interval for the (assumed) common OR, for example.
As will be seen, the confidence intervals that we have constructed for the above interval estimation problems are conceptually simple and straightforward to implement. The performance of the proposed confidence intervals is assessed based on simulations, and illustrated using several examples. In terms of maintaining the coverage probabilities, the proposed confidence intervals turn out to be quite satisfactory, regardless of the sample size. Our overall conclusion is that the generalized confidence interval approach and the fiducial approach have resulted in a unified methodology for the interval estimation of various epidemiological measures, and the resulting confidence intervals exhibit satisfactory performance and are preferable to the likelihood based methods available in the literature.
The computation of a generalized confidence interval is based on the concept of a generalized pivotal quantity (GPQ). Similarly, the computation of a fiducial interval (FI) is based on a fiducial quantity. In this section, we define these. The GPQ and the fiducial quantity are first introduced for a binomial setup without covariates (Section 2.1 and Section 2.2); they are then used to derive the corresponding quantities for the logistic regression model and the log-binomial model with categorical covariates.
In order to define a GPQ, let $X$ be a random sample from a distribution that depends on a parameter of interest $\theta$ and a nuisance parameter $\delta$. Let $x$ denote the observed value of $X$. A GPQ for $\theta$ is a function of $X$, $x$, $\theta$ and $\delta$, say $G(X; x, \theta, \delta)$, satisfying the following two conditions:
(i) given the observed value $x$, the distribution of $G(X; x, \theta, \delta)$ is free of any unknown parameters;
(ii) the observed value of $G(X; x, \theta, \delta)$, namely its value at $X = x$, is equal to $\theta$.
We note that when $G(X; x, \theta, \delta)$ is a GPQ for $\theta$, as defined above, then for any scalar valued function $h(\theta)$, a GPQ is given by $h(G(X; x, \theta, \delta))$, and the percentiles of $h(G(X; x, \theta, \delta))$ can be used to obtain confidence limits for $h(\theta)$. The resulting confidence intervals are referred to as generalized confidence intervals. Sometimes, the distributional property (i) given above will hold only approximately; in this case, we will get only an “approximate GPQ”. This is indeed the situation for the problems investigated in this article. In what follows, we will refer to these approximate GPQs simply as GPQs.
The starting point for the computation of generalized confidence intervals for the various epidemiological measures we have considered is based on an approximate GPQ for the binomial parameter [8,9], and is obtained as follows. For a binomial distribution with parameter p and sample size n, our approximate GPQ for p is based on the normal approximation:
where is the sample proportion. If denotes the observed value of , a GPQ for p is given by,
where Z is standard normal. Quantiles of can be used as confidence limits for p. We shall briefly explain the estimation of the required quantiles by simulation, since such a simulation will be necessary to compute confidence limits for the various parameters that we shall take up in later sections. The quantiles of can be estimated by proceeding as follows. Once data are available, compute the observed proportion . Now generate M times (M = 10,000, for example), say , i = 1, 2, ...., M, and let , i = 1, 2, ...., M. The 95th percentile of the sequence provides a 95% upper confidence limit for p.
However, we note that, with the above definition, . This undesirable feature can be taken care of by using the quantiles of a sequence obtained by concatenating and . In what follows, we shall use this approach.
Here, we shall not provide a general treatment of fiducial quantities; we refer to  for a very detailed discussion. We shall now exhibit two fiducial quantities for the binomial success probability p; the first one being an approximate fiducial quantity. In what follows, we will refer to these approximate fiducial quantities simply as fiducial quantities.
Consider a binomial random variable X with sample size n and success probability p. An approximate fiducial quantity for p, say , is given by:
where x denotes the observed value of X, is the rth order statistic based on a sample of size n from a uniform (0, 1) distribution, and W follows a uniform (0, 1) distribution, independent of and , where , , and . An efficient algorithm to generate these order statistics is described in the Appendix A. The quantiles of the fiducial quantities can be estimated by proceeding as in the case of the GPQ, mentioned in the previous sub-section.
Since inferences concerning the various epidemiological measures under the logistic regression model will be taken up in this article, we shall now exhibit GPQs and fiducial quantities for the logistic parameters. Thus, consider Bernoulli responses where the success probability depends on categorical covariates through the logistic regression model, and suppose we have data corresponding to m covariate vectors, say , i = 1, 2, ...., m, corresponding to the combinations of the values of the covariates. If denotes the probability of a positive response at the covariate vector , we thus have
, where is a vector of unknown parameters. Suppose there are responses corresponding to the covariate vector , and among these, let denote the sample proportion of positive responses. We note that could be equal to one, and is not available in this case. If , so that is available, let denote the GPQ of , as given in Equation (1). Consequently, is a GPQ for . If is the vector consisting of the s, then is a GPQ for . Using this observation, we construct the following two GPQs for the vector , denoted by and :
where , and V is a diagonal matrix whose ith diagonal element is an approximate variance of . Using the delta method, an approximate variance is given by .
Different fiducial quantities for can be similarly constructed using and given in Equations (2) and (3). We note that in order to be able to construct the GPQs and given in Equation (4), we require each to be larger than one, where is the number of Bernoulli responses corresponding to the covariate vector . However, in order to construct the fiducial quantities for , some (or all) s can be equal to one. While this is possible in principle, we noted that the performance of the resulting confidence intervals is not satisfactory, in terms of maintaining the coverage probability.
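The step from draws of the cell-level logits to draws of the regression coefficients can be sketched in code. The sketch below is illustrative only and rests on several assumptions that are not taken from the text: the two GPQs in Equation (4) are taken to be the ordinary and the V-weighted least squares solutions applied to the vector of logits, the delta-method variance of a sample logit is taken as 1/(n p̂(1 − p̂)), the draws for each cell probability come from a Beta(x + 0.5, n − x + 0.5) stand-in rather than from Equations (1) to (3), and the counts and the name `beta_draws` are hypothetical.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def beta_draws(X, eta_draws, V=None):
    """Map draws of the logit vector (M x m) to draws of beta (M x q).

    V=None  -> ordinary least squares, (X'X)^{-1} X' eta
    V given -> weighted least squares, (X'V^{-1}X)^{-1} X'V^{-1} eta
    (ASSUMPTION: this is how the two GPQs for beta are formed.)
    """
    X = np.asarray(X, dtype=float)
    if V is None:
        A = np.linalg.solve(X.T @ X, X.T)
    else:
        Vinv = np.diag(1.0 / np.diag(V))  # V is diagonal
        A = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv)
    return np.atleast_2d(eta_draws) @ A.T

# Hypothetical data: 4 covariate patterns, columns = (intercept, race, treatment)
X = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 0, 0]], dtype=float)
pos = np.array([12, 30, 10, 14])    # positive responses per pattern (made up)
n = np.array([100, 110, 60, 55])    # responses per pattern (made up), all > 1
p_hat = pos / n
V = np.diag(1.0 / (n * p_hat * (1.0 - p_hat)))  # delta-method variance of logit(p_hat)

rng = np.random.default_rng(0)
M = 10_000
# ASSUMPTION: Beta stand-in draws for each cell probability
p_draws = rng.beta(pos + 0.5, n - pos + 0.5, size=(M, len(n)))
b = beta_draws(X, logit(p_draws), V)
ci_treat = np.quantile(b[:, 2], [0.025, 0.975])  # 95% limits for the treatment coefficient
```

Because the mapping is linear in the logits, the same draws of the cell probabilities can be reused for any function of the coefficients, for example by exponentiating a column of `b` to get limits for an adjusted odds ratio.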
Under the log-binomial model, the probability $p_i$ of a positive response at the covariate vector $\mathbf{x}_i$ is given by:
$$\log(p_i) = \mathbf{x}_i'\boldsymbol{\beta},$$
where $\boldsymbol{\beta}$ is a vector of unknown parameters. GPQs and fiducial quantities can now be constructed in the same way as for the logistic case; simply replace the logit function with the natural logarithm.
Once a GPQ or a fiducial quantity is available for a parameter of interest, confidence limits can be obtained using the percentiles of the GPQ (or the fiducial quantity). This is precisely what is done in this section for the various epidemiological measures mentioned earlier. A property that we shall use is that if independent GPQs (or fiducial quantities) are available for several parameters, then a GPQ (or a fiducial quantity) for any function of the parameters can be obtained as the corresponding function of the GPQs (respectively, fiducial quantities). We start by considering the case of no covariates.
We now apply the GPQ methodology and the fiducial approach for computing confidence intervals for the odds ratio from a single contingency table, or for the common odds ratio from several independent contingency tables. The case of a single contingency table has been addressed using the fiducial solution .
Consider two independent binomial random variables $X_1$ and $X_2$ with respective success probabilities $p_1$ and $p_2$, and respective sample sizes $n_1$ and $n_2$. Let $\hat{p}_1$ and $\hat{p}_2$ denote the sample proportions. The odds ratio is then defined as $\frac{p_1/(1-p_1)}{p_2/(1-p_2)}$. In the absence of covariates, an approximate GPQ for the odds ratio can be easily constructed, and is given by $\frac{G_{p_1}/(1-G_{p_1})}{G_{p_2}/(1-G_{p_2})}$, where $G_{p_1}$ and $G_{p_2}$ are defined similar to the GPQ for $p$ in Equation (1). Fiducial quantities can be similarly defined for the odds ratio. Percentiles of the quantities so obtained provide confidence intervals for the odds ratio. For example, the 2.5th and 97.5th percentiles of the GPQ give 95% confidence limits for the odds ratio in the absence of covariates.
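The percentile recipe for a single table can be illustrated with a short simulation. This is a minimal sketch, not the authors' implementation: the draws for each success probability use a Beta(x + 0.5, n − x + 0.5) stand-in, which is an assumption made here for illustration and is not claimed to match Equations (1) to (3); the function name and the counts in the usage line are hypothetical.

```python
import numpy as np

def odds(p):
    return p / (1.0 - p)

def or_percentile_interval(x1, n1, x2, n2, level=0.95, M=10_000, seed=0):
    """Percentile interval for the odds ratio from simulated draws.

    ASSUMPTION: Beta(x + 0.5, n - x + 0.5) stand-in draws for each p.
    """
    rng = np.random.default_rng(seed)
    g1 = rng.beta(x1 + 0.5, n1 - x1 + 0.5, size=M)
    g2 = rng.beta(x2 + 0.5, n2 - x2 + 0.5, size=M)
    g_or = odds(g1) / odds(g2)      # same function of the draws as the OR is of (p1, p2)
    a = (1.0 - level) / 2.0
    lo, hi = np.quantile(g_or, [a, 1.0 - a])
    return lo, hi

# Hypothetical counts: 30/50 events versus 20/50 events (sample OR = 2.25)
lo, hi = or_percentile_interval(30, 50, 20, 50)
```

Any other way of generating draws for the two probabilities can be substituted for the two `rng.beta` lines without changing the rest of the function.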
Consider K independent studies (or strata from the same study), where from the kth study, we have observations for two independent binomial random variables $X_{1k}$ and $X_{2k}$ with respective success probabilities $p_{1k}$ and $p_{2k}$, and respective sample sizes $n_{1k}$ and $n_{2k}$, k = 1, 2, …, K. Thus, the odds ratio from the kth study is $\delta_k = \frac{p_{1k}/(1-p_{1k})}{p_{2k}/(1-p_{2k})}$, k = 1, 2, …, K. Assuming that the odds ratio is the same across the K studies, we have $\delta_1 = \delta_2 = \cdots = \delta_K = \delta$ (say).
An approximate GPQ for each , to be denoted by , can be constructed from the kth study, proceeding as mentioned in Section 3.1. We now combine these GPQs in order to obtain an approximate GPQ for the common odds ratio δ. For this, we propose a weighted average of the study-specific GPQs on the log scale. The weights that we shall use are motivated as follows. For i = 1, 2, if denote sample proportions from the kth study, and if , k = 1, 2, ...., K, then using the delta method, an approximate variance of , say , is given by:
where we have also used a continuity correction. Noting that , an approximate GPQ for the common odds ratio can be obtained from
The percentiles of can be used to obtain confidence intervals for the common odds ratio δ. Note that we have used data dependent weights that are meant to reflect the variability of each study-specific GPQ on the log scale. This is similar to the adaptive weights proposed in  in the context of robust meta-analysis using confidence distributions; see also , Section 6. Clearly, different choices are possible for the weights, as noted in [15,16], Section 6. Here, we have not investigated a comparison of the different choices for the weights. It is important to note that the approach described above for the common odds ratio can be easily extended to other measures such as the prevalence ratio and the relative risk.
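The weighted combination on the log scale can be sketched as follows. Several details are assumptions made for illustration and are not taken from the text: the weights are normalized inverse variances, the continuity correction adds 0.5 to each cell count in the delta-method variance of the log odds ratio, and the study-specific draws use a Beta(x + 0.5, n − x + 0.5) stand-in. The function names and the counts are hypothetical.

```python
import numpy as np

def log_or_variance(x1, n1, x2, n2, cc=0.5):
    """Delta-method variance of the sample log odds ratio.

    ASSUMPTION: continuity correction adds cc to every cell count.
    """
    return (1.0 / (x1 + cc) + 1.0 / (n1 - x1 + cc)
            + 1.0 / (x2 + cc) + 1.0 / (n2 - x2 + cc))

def common_or_interval(tables, level=0.95, M=10_000, seed=0):
    """Percentile interval for a common OR from K tables (x1, n1, x2, n2)."""
    rng = np.random.default_rng(seed)
    log_draws, inv_var = [], []
    for x1, n1, x2, n2 in tables:
        # ASSUMPTION: Beta stand-in draws for each study-specific probability
        g1 = rng.beta(x1 + 0.5, n1 - x1 + 0.5, size=M)
        g2 = rng.beta(x2 + 0.5, n2 - x2 + 0.5, size=M)
        log_draws.append(np.log(g1 / (1 - g1)) - np.log(g2 / (1 - g2)))
        inv_var.append(1.0 / log_or_variance(x1, n1, x2, n2))
    w = np.array(inv_var) / np.sum(inv_var)   # data-dependent weights, summing to 1
    combined = w @ np.vstack(log_draws)       # weighted average on the log scale
    a = (1.0 - level) / 2.0
    lo, hi = np.exp(np.quantile(combined, [a, 1.0 - a]))
    return lo, hi

# Hypothetical strata, each with sample odds ratio 2.25
tables = [(30, 50, 20, 50)] * 3
lo, hi = common_or_interval(tables)
```

The weights are computed once from the observed counts and held fixed across draws, so only the study-specific draws vary within the simulation.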
The procedures outlined above for constructing approximate GPQs for the odds ratio and the common odds ratio can easily be adapted for obtaining fiducial quantities for these parameters. In fact, we can obtain two fiducial quantities for each parameter, using Equations (2) and (3). The required derivations should be obvious and the details are omitted. It should be noted that for inferences concerning the odds ratio from a single study, a fiducial solution based on Equation (2) has been previously investigated .
It should be clear that proceeding along the lines of what has been done for the odds ratio and the common odds ratio, GPQs and fiducial quantities can be constructed for any scalar valued function of independent binomial parameters. In particular, if we have two independent binomial random variables $X_1$ and $X_2$ with respective success probabilities $p_1$ and $p_2$, and respective sample sizes $n_1$ and $n_2$, the relative risk is given by $p_1/p_2$, for which an approximate GPQ is given by $G_{p_1}/G_{p_2}$ (the notations are as before). Fiducial quantities can be similarly obtained. If $p_1$ and $p_2$ are the probabilities corresponding to an adverse event in a treatment group and a control group, respectively, then the number needed to treat (NNT) is given by $1/(p_2 - p_1)$. Thus, a confidence interval for the NNT can be obtained from a confidence interval for $p_2 - p_1$. An approximate GPQ as well as fiducial quantities can be used for computing confidence intervals for $p_2 - p_1$. In fact, fiducial quantities for these parameters based on Equation (2) are given in .
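Turning limits for a risk difference into limits for the NNT is a reciprocal mapping, with one caveat that a small sketch makes explicit: the mapping gives a single finite interval only when the interval for the difference excludes zero. The helper name `nnt_interval` is hypothetical, and returning `None` in the other case is a design choice made here, not something prescribed by the text.

```python
def nnt_interval(rd_lower, rd_upper):
    """Convert confidence limits for a risk difference into limits for 1/difference.

    Returns None when the interval for the difference contains zero,
    since the reciprocal is then not a single finite interval.
    """
    if rd_lower > rd_upper:
        raise ValueError("rd_lower must not exceed rd_upper")
    if rd_lower <= 0.0 <= rd_upper:
        return None
    # The reciprocal reverses the order of limits that share the same sign.
    return (1.0 / rd_upper, 1.0 / rd_lower)

# Hypothetical limits for the risk difference
limits = nnt_interval(0.25, 0.5)
```

For the hypothetical limits 0.25 and 0.5 on the difference, the NNT limits are 2 and 4.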
So far, our adaptation of the methodology based on GPQs and fiducial quantities has been for situations where covariates are absent. Clearly, the odds ratio, as well as the other epidemiological measures, have extensive practical applications in the context of binomial responses that depend on covariates. The logistic model is very often used to model the response probability. The log-binomial model is sometimes used to estimate the risk ratio in the presence of covariates, when the outcome is not rare. As noted in Section 2.4, under the log-binomial model, the binomial success probability $p$ satisfies $\log(p) = \mathbf{x}'\boldsymbol{\beta}$, where $\mathbf{x}$ is a covariate vector. Writing $\mathbf{x} = (1, x_1, \ldots, x_s)'$ and $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_s)'$, where s is the number of covariates, the parameter $\exp(\beta_j)$ is the prevalence ratio (PR) for a one unit increase in $x_j$, adjusted for the other covariates. We recall that GPQs and fiducial quantities for $\boldsymbol{\beta}$ are given in Section 2.3 and Section 2.4 for the logistic model and the log-binomial model, respectively. From this, GPQs and fiducial quantities can be constructed for any function of $\boldsymbol{\beta}$; in particular, for the various epidemiological measures, including the prevalence ratio.
The accuracy of the proposed procedures based on GPQs and fiducial quantities is assessed using simulations. Here, we have presented the results for only two scenarios: interval estimation of a common odds ratio (under binomial distributions without covariates), and the interval estimation of a prevalence ratio (under the log-binomial model). We refer to  for numerical results on the performance of fiducial intervals for a few other parameters, including that for the difference between binomial proportions. Note that coverage probability for the latter is equivalent to that for the NNT.
Table 1 gives the coverage probabilities of the confidence intervals based on different approaches for a common odds ratio from K = 5 studies, for a 95% nominal level. We also assume that, for the different studies, , and (k = 1, 2, …, 5), where we have used the notations in Section 3.2. The following confidence intervals are considered for the comparison: (i) confidence interval based on the Mantel–Haenszel estimator (denoted by MH in Table 1); see  for details; (ii) the Sato–Mantel–Haenszel confidence interval (denoted by SMH in Table 1); (iii) confidence interval based on the GPQ (denoted by GPQ in the table); (iv) confidence interval based on the fiducial quantity Equation (2) (denoted by F1 in the table); and (v) confidence interval based on the fiducial quantity Equation (3), denoted by F2 in the table; the computation of these intervals is explained in Section 3.2.2. The notation OR in the table refers to the true value of the common odds ratio. We have also computed the mean length and the median length of the different confidence intervals (given within brackets in Table 1). In terms of coverage probability and mean length (or median length), the confidence interval based on the fiducial quantity Equation (2) appears to perform better than the other approaches in the simulation setups considered. The mean length as well as the median length of the interval based on Equation (2) is substantially lower compared to those based on MH, SMH and GPQ, while satisfactorily maintaining the coverage probability. While the interval based on Equation (3) exhibits comparable performance in many cases, its coverage probability is not as satisfactory as that of the interval based on Equation (2). The satisfactory performance of the fiducial approach for inferences concerning the odds ratio from a single study was previously noted .
The simulation setup used here is motivated by Example 1 in . Here, apart from a treatment indicator, we have gender as a covariate. Male patients are assigned either to the treatment or to a placebo, and likewise for female patients, with a separate sample size for each of the four gender-by-arm groups. With Bernoulli outcomes for each patient, we assume a log-binomial model for the probability of a positive response. Thus, if $p$ is the probability of a positive response, we assume the model $\log(p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, where the β’s are unknown parameters, $x_1$ is a binary indicator for the treatment, and $x_2$ is a binary indicator for gender. Then, $\exp(\beta_1)$ is the prevalence ratio of interest. For various sample sizes and parameter choices, Table 2 gives the coverage probabilities of confidence intervals for $\exp(\beta_1)$ using the GPQ, using the two fiducial quantities, and using the asymptotic normality of the maximum likelihood estimator (denoted by ML in the table). The mean lengths and median lengths are also given (the numbers within parentheses in Table 2). It appears that all the approaches perform well in terms of coverage probabilities; the minor differences in mean and median lengths among the GPQ-based and fiducial-based solutions are perhaps due to the minor differences among the coverage probabilities. We also note that in terms of median lengths, the ML solution has a slight edge over the other solutions. However, its mean length is unusually large in a few cases. This could be a reflection of convergence problems while maximizing the likelihood; we suspect that the information matrix is becoming close to singular, resulting in wide intervals. Note that the solutions based on the GPQ approach and the fiducial approach are both free of this drawback.
We present four examples in this section in order to illustrate our interval estimation methodologies, and for making comparisons with other available intervals.
This example is based on data from a cross-sectional study of sleep disturbances among HIV-infected persons in an investigation of the association between depression and insomnia . Insomnia was assessed using the Pittsburgh Sleep Quality Index (PSQI) (with a global score greater than five taken as indication of insomnia). Depression was assessed using the Beck Depression Inventory (BDI). The problem is to estimate the NNT, or, more precisely, the number needed to expose (NNE). The data are reported in Table 3.
Among subjects with normal levels of depression (BDI ), 36.6% have insomnia (56/(56 + 97)), while among subjects with at least a mild level of depression (BDI ), 82.5% have insomnia (33/(33 + 7)).
Thus, the estimated NNE is 1/(0.825 – 0.366) = 2.18. This means that, on average, among approximately every two subjects with a level of depression mild or above, there will be one additional insomnia case relative to the normal group. Ninety-five percent confidence intervals for the NNE are reported in Table 4. We have also included the Wald–Yates and Agresti–Caffo intervals  for comparison.
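The point estimate can be reproduced directly from the counts quoted in the text. The short check below uses only those counts; the variable names are illustrative.

```python
# Counts quoted in the text (insomnia cases / group size)
insomnia_normal, total_normal = 56, 56 + 97        # normal level of depression
insomnia_depressed, total_depressed = 33, 33 + 7   # at least mild depression

p_normal = insomnia_normal / total_normal          # about 0.366
p_depressed = insomnia_depressed / total_depressed # 0.825

# Number needed to expose: reciprocal of the difference in proportions
nne = 1.0 / (p_depressed - p_normal)
print(round(p_normal, 3), round(p_depressed, 3), round(nne, 2))
```

The rounded values agree with the 36.6%, 82.5% and 2.18 reported above.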
We note that the intervals based on the GPQ, as well as those based on F1 and F2, are shorter compared to the other two intervals.
The U.S. Military HIV Natural History Study (NHS) is a prospective continuous enrollment cohort study of consenting military beneficiaries with HIV infection including active duty personnel, retirees, and dependents . In this example, we consider the subjects on highly active antiretroviral therapy (HAART) with at least one viral load value (VL) in the first year. A subject is considered viral suppressed (VS) if the VL value at the last visit during the first year is below 400 copies/mL. The goal is to compare the odds of VS between African-American (AA) subjects and Caucasian (C) subjects. Analyses are stratified by enrollment site to accommodate for potential difference in treatment practices. A total of 1796 subjects (AA and C) started HAART after January 1st 1996 at one of three sites, and have at least one VL value during the first year. Table 5 presents the VS status (counts) stratified by race and site. The counts represent those that are virally suppressed (under the Y column) and those that are not suppressed (under the N column).
The p-value for the Breslow–Day Test  for homogeneity of the odds ratios across the sites is 0.48. Thus, we proceed under the assumption of a common odds ratio.
The 95% confidence intervals for the common odds ratio, based on the different methods, are reported in Table 6.
This example is taken from , and the data provide counts on the presence or absence of symptoms among AIDS patients who are on the antiretroviral drug AZT, categorized by race (White or Black). Thus, race is a binary covariate. The data are reported in Table 7.
Following , we assume a model without an interaction between race and treatment. If p denotes the proportion having symptoms, we model it as logit, where is a binary covariate for race, and is a binary covariate that categorizes a patient as taking AZT or not taking it. Thus, if denotes the proportion having symptoms among the whites who take AZT, and , and similarly defined, the model for (logit(), logit(), logit(), logit( can be written as:
similar to what is presented in Section 2.3. Based on the second GPQ in (4), we computed confidence intervals for and , and they are given below. For comparison, we have also included the Wald interval. Results are similar: adjusted for race, treatment is effective in reducing the probability of developing AIDS symptoms (corresponding to ), while adjusted for treatment, there was no difference in outcome based on race (corresponding to ).
We now revisit Example 1 in , a clinical trial for treatment of migraine headaches; some details of the example are presented in Section 4.2 and will not be repeated here. Here, we have the log-binomial model $\log(p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$; we once again refer to Section 4.2 for an explanation of the notations. We shall consider the interval estimation of the prevalence ratio $\exp(\beta_1)$. Using the data in , the maximum likelihood estimators (MLEs) of the β’s are $\hat{\beta}_0$ = −1.398, $\hat{\beta}_1$ = 0.783 and $\hat{\beta}_2$ = −0.151. Thus, we have the estimated prevalence ratio $\exp(\hat{\beta}_1)$ = 2.189. The 95% confidence intervals for $\exp(\beta_1)$, obtained by different methods, are given in Table 8. We have also included the likelihood based interval.
We note that the intervals based on the different approaches are all very similar. This is also consistent with the numerical results in Table 2, since the different approaches (including the ML, when it converged) resulted in similar coverage probabilities and mean lengths (as well as the median lengths).
Interval estimation of various epidemiological measures is of considerable practical significance while analyzing data from epidemiological studies. The present work addresses this problem for a variety of measures when we have binary outcomes. This investigation has been motivated by two practical considerations: accuracy of the confidence intervals in terms of maintaining the coverage probability close to the nominal level (especially in small samples), and ease of computation. The concepts of generalized pivotal quantities and fiducial quantities appear to provide confidence intervals that meet both of these requirements for a variety of epidemiological measures. In short, the approaches described here appear to provide a unified methodology for obtaining accurate and easy to use confidence intervals for binary data under the logistic regression model, and also under the log-binomial model. The computational advantage could be especially interesting in the context of the log-binomial model, since the model is known to present computational challenges (lack of convergence) while trying to compute the MLEs; this issue came up in the context of the numerical results in Table 2. A major advantage of the methodologies proposed here is that they are not based on the MLEs, and there is no need to compute the MLEs.
Our work can be extended in several directions. First, the generalized and the fiducial quantities proposed and investigated herein are frequentist in nature, and therefore only frequentist methods were considered. It would be of interest to further compare them with Bayesian approaches . Second, other fiducial quantities can be considered. For example, a generalized fiducial quantity was obtained in  by solving a data-generating equation in the context of binary logistic item response models. The solution is not unique, but the impact of the selection rule is usually asymptotically negligible .
The log-binomial model imposes a natural constraint on $\boldsymbol{\beta}$, namely, $\mathbf{x}_i'\boldsymbol{\beta} \le 0$ for each covariate vector $\mathbf{x}_i$ (using the notation in Section 2.4), since the modeled probabilities cannot exceed one. Consequently, a GPQ (or a fiducial quantity) of $\boldsymbol{\beta}$ must satisfy the same inequality. The construction of the GPQ (as well as the fiducial quantity) described in this article is not guaranteed to meet this condition. One approach to have this constraint satisfied is to consider the projection of the GPQ onto the convex set defined by the constraint. Such a projection will also be a GPQ. However, this could result in a methodology that is computationally demanding, and we have not pursued it in the present investigation.
The generalized confidence interval approach and the fiducial approach provide a unified methodology for the interval estimation of various epidemiological measures. The resulting confidence intervals exhibit satisfactory performance in terms of maintaining the coverage probability close to the nominal level.
The authors would like to thank the academic editor and the two reviewers for thoughtful and constructive suggestions.
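Notice that for $r = 1, \ldots, n$, one has $U_{(r)} \sim \text{Beta}(r, n - r + 1)$. In addition, for $1 \le x \le n - 1$, the joint distribution of $(U_{(x)}, U_{(x+1)})$ is given by:
$$f(u, v) = \frac{n!}{(x-1)!\,(n-x-1)!}\; u^{x-1} (1-v)^{n-x-1}, \quad 0 < u < v < 1,$$
and one can use a conditional approach to simulate from this distribution as follows: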
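One standard conditional scheme, offered here as a sketch that may differ in detail from the authors' algorithm, first draws the xth order statistic from its Beta(x, n − x + 1) marginal. Given that value u, the n − x larger observations behave like independent uniform (u, 1) variables, so the next order statistic is their minimum, which equals u + (1 − u)B with B following a Beta(1, n − x) distribution. The function name and the boundary conventions U_(0) = 0 and U_(n+1) = 1 are assumptions made for this illustration.

```python
import numpy as np

def consecutive_order_stats(x, n, size, rng):
    """Draw (U_(x), U_(x+1)) for a uniform(0, 1) sample of size n.

    ASSUMPTION: boundary conventions U_(0) = 0 and U_(n+1) = 1.
    """
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    # Marginal step: U_(x) ~ Beta(x, n - x + 1) for x >= 1
    lower = rng.beta(x, n - x + 1, size=size) if x >= 1 else np.zeros(size)
    if x == n:
        upper = np.ones(size)
    else:
        # Conditional step: minimum of n - x iid uniform(lower, 1) variables
        gap = rng.beta(1, n - x, size=size)
        upper = lower + (1.0 - lower) * gap
    return lower, upper

rng = np.random.default_rng(1)
u_lo, u_hi = consecutive_order_stats(3, 10, 20_000, rng)
```

This avoids generating and sorting n uniforms for every draw, since each pair costs only two Beta variates regardless of n.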
Ionut Bebu, George Luta and Thomas Mathew developed the statistical methods; and Ionut Bebu and George Luta performed the numerical experiments. All authors contributed to the acquisition, analysis, and interpretation of the data, and to the preparation of the manuscript.
The authors declare no conflict of interest.