Table 2 presents the results from the four models for the all vaginal sex episodes outcome at 12 months, controlling for the baseline value of the outcome, six other covariates [age at baseline, white race (0 = no, 1 = yes), multiple racial (0 = no, 1 = yes), other ethnicity (0 = no, 1 = yes), Hispanic (0 = no, 1 = yes), and poverty (0 = no, 1 = yes)], and treatment condition (0 = control, 1 = intervention). For the two-component ZIP (ZINB) model, the table includes results from both the logit and Poisson (NB) modules. We tested for excess zeros by comparing the Poisson and NB models to the ZIP and ZINB models, respectively, using the Vuong test. The test statistics, V = 7.64 for ZIP versus Poisson and V = 3.24 for ZINB versus NB, show that both ZIP and ZINB provide a better fit than their one-component counterparts. The Lagrange multiplier test is also significant. Hence, there is evidence of overdispersion due to excess zeros. Further, the estimates of the dispersion parameter, α = 0.84 from ZINB and α = 1.39 from NB, also indicate overdispersion due to data clustering.
| Table 2. Estimated parameters (coefficients) of the Poisson, NB, ZIP, and ZINB models for the all vaginal sex episodes outcome at 12 months. |
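The Vuong comparison described above can be illustrated with a minimal sketch. It assumes that the per-observation log-likelihoods of the two competing models are already available; the function name and the numbers are hypothetical, and the statistic is written in its common uncorrected form with the sample standard deviation (software packages may apply additional corrections).

```python
import math

def vuong_statistic(loglik_a, loglik_b):
    """Uncorrected Vuong statistic from per-observation log-likelihoods.

    Large positive values favor model A, large negative values favor model B
    (commonly judged against standard normal critical values such as 1.96).
    """
    if len(loglik_a) != len(loglik_b):
        raise ValueError("both models must be evaluated on the same observations")
    n = len(loglik_a)
    # Pointwise log-likelihood ratios m_i = log f_A(y_i) - log f_B(y_i)
    m = [a - b for a, b in zip(loglik_a, loglik_b)]
    mean_m = sum(m) / n
    # Sample standard deviation of the pointwise ratios (n - 1 convention assumed)
    sd_m = math.sqrt(sum((x - mean_m) ** 2 for x in m) / (n - 1))
    return math.sqrt(n) * mean_m / sd_m

# Hypothetical per-observation log-likelihoods, for illustration only
loglik_zip = [-1.2, -0.8, -2.1, -0.5, -1.7, -0.9]
loglik_poisson = [-1.9, -1.0, -2.0, -1.4, -2.6, -1.1]

v = vuong_statistic(loglik_zip, loglik_poisson)
print(f"V = {v:.2f}")
```

Swapping the two arguments only flips the sign of the statistic, so the direction of the comparison (here ZIP versus Poisson) has to be stated alongside the value, as in the text.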
For the nested structure, both ZIP and ZINB had a much lower −2 log likelihood than the Poisson and NB, respectively (P values < 0.0001). Thus, likelihood ratio tests also favor ZIP over Poisson, and ZINB over NB.
The AIC obtained from the data were in the following order (see ):
The BICs showed the same order. Thus, under both AIC and BIC, ZINB appears to be the optimal model among the four models considered.
For the results at 3 and 6 months, the Vuong and likelihood ratio tests, and the AIC, also show that ZINB (NB) was a better fit to the data than ZIP (Poisson), with ZINB having the lowest AIC among the four models. In addition, both the dispersion parameter and the Lagrange multiplier (LM) tests implied the existence of overdispersion due to data clustering. The results from the BIC are consistent with those from the AIC, with the exception that NB at 6 months had a slightly smaller BIC than ZINB (3751.12 versus 3757.00).
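To make the AIC and BIC comparisons concrete, the following sketch computes both criteria from a model's −2 log likelihood using their standard definitions. The −2 log likelihoods, parameter counts, and sample size are hypothetical and are not the values from this study; they were chosen only to show how the heavier BIC penalty can favor the smaller model even when the AIC favors the larger one.

```python
import math

def aic(neg2_loglik, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return neg2_loglik + 2 * n_params

def bic(neg2_loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k * ln(n)."""
    return neg2_loglik + n_params * math.log(n_obs)

# Hypothetical fits: the two-component model uses twice as many parameters
n_obs = 400
nb = {"neg2_loglik": 3700.0, "n_params": 9}
zinb = {"neg2_loglik": 3670.0, "n_params": 18}

aic_nb = aic(nb["neg2_loglik"], nb["n_params"])
aic_zinb = aic(zinb["neg2_loglik"], zinb["n_params"])
bic_nb = bic(nb["neg2_loglik"], nb["n_params"], n_obs)
bic_zinb = bic(zinb["neg2_loglik"], zinb["n_params"], n_obs)

print(f"AIC: NB = {aic_nb:.2f}, ZINB = {aic_zinb:.2f}")
print(f"BIC: NB = {bic_nb:.2f}, ZINB = {bic_zinb:.2f}")
```

Because the BIC penalty per parameter is ln(n) rather than 2, it exceeds the AIC penalty whenever n is larger than about 7, which is why the two criteria can disagree for models that differ mainly in the number of parameters.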
Next, we compared the four models in terms of how well each captures the zeros in the data. Table 3 summarizes the percentage of zeros captured by the Poisson, NB, ZIP, and ZINB models. For the all vaginal sex episodes outcome at 3 and 12 months, the zeros fitted by ZIP were very close to the observed ones; at 6 months, ZINB was slightly better than ZIP in that regard. At all visits, Poisson was the worst at estimating the zeros. For the unprotected vaginal sex with steady partners outcome, the percentage of zeros estimated by ZIP matched the observed percentage almost exactly at 3, 6, and 12 months. The percentages of zeros estimated by NB and ZINB were slightly lower than those estimated by ZIP. For the unprotected vaginal sex with other partners outcome, NB, ZIP, and ZINB all performed well in estimating the zeros, with ZINB (ZIP) providing the best estimate at 3 (6 and 12) months. For the any unprotected vaginal sex with steady or other partners outcome, ZIP performed the best, while ZINB slightly overestimated the zeros at 12 months. Again, Poisson performed the worst.
| Table 3. Percentage of zeros captured by the POIS, NB, ZIP, and ZINB models. |
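As a rough illustration of how a fitted percentage of zeros can be obtained, the sketch below averages each model's probability of a zero over subjects. It assumes the NB2 parameterization (variance = μ + αμ²) and uses hypothetical fitted means, dispersion, and zero-inflation probabilities; none of these values come from the study.

```python
import math

def p_zero_poisson(mu):
    """P(Y = 0) under a Poisson model with mean mu."""
    return math.exp(-mu)

def p_zero_nb(mu, alpha):
    """P(Y = 0) under an NB2 model with mean mu and dispersion alpha > 0."""
    return (1.0 + alpha * mu) ** (-1.0 / alpha)

def p_zero_inflated(pi, p_zero_count):
    """P(Y = 0) under a zero-inflated model: structural zero or sampling zero."""
    return pi + (1.0 - pi) * p_zero_count

def percent_zeros(probabilities):
    """Average the per-subject zero probabilities and express as a percentage."""
    return 100.0 * sum(probabilities) / len(probabilities)

# Hypothetical fitted values for three subjects
mus = [2.0, 5.0, 8.0]      # fitted means of the count component
pis = [0.30, 0.20, 0.10]   # fitted probabilities of a structural zero
alpha = 1.0                # NB dispersion parameter

pct_poisson = percent_zeros([p_zero_poisson(m) for m in mus])
pct_nb = percent_zeros([p_zero_nb(m, alpha) for m in mus])
pct_zip = percent_zeros([p_zero_inflated(p, p_zero_poisson(m)) for p, m in zip(pis, mus)])
pct_zinb = percent_zeros([p_zero_inflated(p, p_zero_nb(m, alpha)) for p, m in zip(pis, mus)])

for name, pct in [("Poisson", pct_poisson), ("NB", pct_nb), ("ZIP", pct_zip), ("ZINB", pct_zinb)]:
    print(f"{name}: {pct:.1f}% zeros")
```

In a real analysis the fitted means of the one-component and two-component models would differ, so each model's percentage should be computed from its own fitted values; they are shared here only to keep the sketch short.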
Plots of observed versus fitted values are also helpful for visualizing model fit. For count data, we can compare the fitted and observed probabilities of each value of the count response under the assumed probability distribution. The plots compare the probabilities from the fitted models with the observed probabilities for the all vaginal sex episodes and the unprotected vaginal sex with steady partners outcomes at 3, 6, and 12 months. ZINB fit the observed data well at 3, 6, and 12 months, as compared to the other models. In terms of capturing the observed zeros, ZIP performed very well across all three visits, while ZINB had the best fit to the zeros at 6 months.
Generally, the two-component nature of ZIP and ZINB gives them a competitive edge in accurately representing the zeros in the data. Poisson exhibited the worst fit to both zero and positive counts, followed by ZIP. For example, for the all vaginal sex episodes outcome at the 3-month visit, Poisson underestimated zeros and small counts (e.g., 0 ≤ count ≤ 4) but overestimated intermediate counts (e.g., 6 ≤ count ≤ 12); ZIP also underestimated small counts (e.g., 1 ≤ count ≤ 6) and overestimated intermediate counts (e.g., 7 ≤ count ≤ 12), although it overestimated the intermediate counts less than the Poisson did. NB underestimated zeros and overestimated small counts (e.g., 1 ≤ count ≤ 5 in the same case), although with less bias than the Poisson.
ZINB was better than NB in estimating both the zeros and the small counts, but it still underestimated the number of zeros and overestimated the small counts (e.g., 1 ≤ count ≤ 5) at the 3-month visit; the fit improved at the 6-month visit. At 12 months, the ZINB and NB fits were identical, with both underestimating the number of zeros and overestimating the small counts (e.g., 1 ≤ count ≤ 5); the Poisson severely underestimated both zeros and small counts (1 ≤ count ≤ 5) but overestimated intermediate counts (7 ≤ count ≤ 22).
The performance of all four models improved as the number of zeros decreased and the range of counts became smaller. For the other outcomes, the plots comparing the fitted and observed data, and the conclusions drawn from them, are quite similar and thus are not discussed further.
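The comparison of observed and fitted probabilities described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the counts are hypothetical, the fitted model is an intercept-only Poisson, and the helper names are invented for the example. The same tabulation applies to NB, ZIP, or ZINB once the corresponding probability function is substituted.

```python
import math
from collections import Counter

def poisson_pmf(k, mu):
    """Poisson probability of count k for mean mu > 0, computed on the log scale."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def observed_vs_fitted(counts, fitted_means, max_count):
    """Tabulate (count, observed proportion, average fitted probability)."""
    n = len(counts)
    freq = Counter(counts)
    rows = []
    for k in range(max_count + 1):
        observed = freq.get(k, 0) / n
        # Average the model probability of count k over all subjects
        fitted = sum(poisson_pmf(k, mu) for mu in fitted_means) / n
        rows.append((k, observed, fitted))
    return rows

# Hypothetical counts with many zeros; intercept-only fit uses the sample mean
counts = [0, 0, 0, 1, 2, 3, 5, 8]
sample_mean = sum(counts) / len(counts)
fitted_means = [sample_mean] * len(counts)

rows = observed_vs_fitted(counts, fitted_means, max_count=8)
for k, observed, fitted in rows:
    print(f"count = {k}: observed = {observed:.3f}, fitted = {fitted:.3f}")
```

With these made-up data, the Poisson fitted probability of a zero falls well below the observed proportion, which is the kind of underestimation of zeros described in the text.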
Taken together, ZINB provides the best model fit, capturing the shape of the distribution of the observed values better than the other models, followed by NB, ZIP, and the Poisson. The results indicate that the data contain not only structural zeros but also data clustering. This conclusion is consistent with the goal of the intervention and the objectives of this study: to promote safer sex and abstinence from risky sexual behaviors. Thus, the better performance of the two-component ZIP and ZINB models over their respective one-component counterparts, Poisson and NB, is expected on conceptual grounds.
Having established the appropriate models, we now turn to the interpretation of the results within the specific context of the HIV prevention intervention study. As only ZIP and ZINB are appropriate for modeling the outcomes in this study, they were fit to the data at 3, 6, and 12 months for each outcome. For illustration purposes and space considerations, we focus on the intervention results for the all types of vaginal sex episodes outcome at 12 months.
Both models were fit while controlling for the baseline value of the outcome and the six covariates. We did not model all followup data simultaneously using longitudinal methods, since such an approach was unavailable in major software packages such as SAS, which we used to fit ZIP and ZINB in the current context. Rather, we modeled each followup visit one at a time, while controlling for the outcome of interest at baseline along with the covariates mentioned above. Also, we report only the results for the treatment condition, as the intervention effect is the main outcome of this randomized controlled trial.
Table 4 displays the estimates of the regression coefficients for the intervention effect from both components of the ZIP and ZINB models. Shown under the Poisson regression part of ZIP (negative binomial regression part of ZINB) are the coefficients for the treatment condition, with the control condition serving as the reference level, for the Poisson submodel of ZIP (negative binomial submodel of ZINB) at each of the followup visits for the all types of vaginal sex episodes outcome. Shown under the logistic regression part are the coefficients for the logistic regression submodel of ZIP (ZINB). As mentioned, the Poisson (negative binomial) component of ZIP (ZINB) models the effect of the intervention for the at-risk subgroup, while the logistic component models the intervention effect for the nonrisk subgroup.
| Table 4. Intervention effect for all types of vaginal sex episodes outcome from ZIP and ZINB. |
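For reference, the two-component structure behind these coefficients can be written out for ZIP in its usual textbook form. The symbols are assumptions introduced here for illustration: π_i is the probability that subject i belongs to the nonrisk (structural zero) subgroup, μ_i is the Poisson mean for the at-risk subgroup, x_i is the covariate vector, and γ and β are the logistic and Poisson coefficient vectors.

```latex
\begin{aligned}
P(Y_i = 0) &= \pi_i + (1 - \pi_i)\, e^{-\mu_i}, \\
P(Y_i = k) &= (1 - \pi_i)\, \frac{e^{-\mu_i}\, \mu_i^{k}}{k!}, \qquad k = 1, 2, \ldots, \\
\operatorname{logit}(\pi_i) &= x_i^{\top} \gamma, \qquad \log(\mu_i) = x_i^{\top} \beta .
\end{aligned}
```

Under this convention, a positive treatment coefficient in γ corresponds to higher odds of belonging to the nonrisk subgroup, and a negative treatment coefficient in β corresponds to a lower mean count among at-risk subjects. ZINB has the same structure with the Poisson probabilities replaced by negative binomial ones.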
Using ZIP, the effect of the intervention condition was statistically significant for the Poisson module over all the followup visits (P values < 0.0001). The negative sign of the coefficient indicates that the intervention reduced the mean frequency of this outcome for the subjects in the at-risk group who received the intervention, as compared to those within the control group. The reduction was 13.89% (1 − exp(−0.1495) = 0.1389), 12.83%, and 13.06% at 3, 6, and 12 months, respectively. The effect of the intervention condition was also observed for the negative binomial component of ZINB over all the followup visits, although the results were only significant at 3 and 12 months, with P values = 0.0055 and 0.0486, respectively. As compared with the control condition, the reduction was 19.87% (1 − exp(−0.2215) = 0.1987), 13.13%, and 17.40% at 3, 6, and 12 months, respectively.
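The percentage reductions quoted above follow from exponentiating the count-model coefficient, which gives the ratio of mean counts between the intervention and control groups. A minimal sketch of that arithmetic, using the two 3-month coefficients reported in the text (the function name is introduced here for illustration):

```python
import math

def percent_reduction(log_rate_ratio):
    """Percentage reduction in the mean count implied by a log rate ratio."""
    return 100.0 * (1.0 - math.exp(log_rate_ratio))

zip_3_months = percent_reduction(-0.1495)    # Poisson part of ZIP
zinb_3_months = percent_reduction(-0.2215)   # negative binomial part of ZINB

print(f"ZIP:  {zip_3_months:.2f}%")    # 13.89%
print(f"ZINB: {zinb_3_months:.2f}%")   # 19.87%
```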
The intervention effect was also statistically significant for the logistic component of ZIP and ZINB at 6 months, with P values = 0.0011 and 0.0044, respectively. The positive sign of the coefficient indicates that a significantly higher proportion of girls in the intervention group than in the control group stayed abstinent from the particular type of sex under consideration, with an odds ratio of 2.19 (log odds ratio = 0.7859) from the ZIP model and 2.74 (log odds ratio = 1.0068) from the ZINB model. Although the intervention effect did not reach statistical significance at 3 and 12 months, the positive signs of the coefficient at both visits from ZIP, and at the 12-month visit from ZINB, suggest that more girls in the intervention group exercised abstinence than those in the control group during the respective time periods.
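The odds ratios reported for the logistic component are the exponentiated log odds ratios. The sketch below reproduces them and, purely as an illustration, shows what an odds ratio of that size would mean for a hypothetical control-group abstinence probability of 0.30; that baseline probability and the function names are assumptions and do not come from the study.

```python
import math

def odds_ratio(log_odds_ratio):
    """Convert a logistic regression coefficient to an odds ratio."""
    return math.exp(log_odds_ratio)

def shifted_probability(p_control, log_odds_ratio):
    """Probability in the intervention group implied by a control probability."""
    odds = p_control / (1.0 - p_control) * odds_ratio(log_odds_ratio)
    return odds / (1.0 + odds)

or_zip = odds_ratio(0.7859)    # ZIP logistic part at 6 months
or_zinb = odds_ratio(1.0068)   # ZINB logistic part at 6 months

print(f"ZIP odds ratio:  {or_zip:.2f}")    # 2.19
print(f"ZINB odds ratio: {or_zinb:.2f}")   # 2.74

# Hypothetical baseline: 30% of control subjects abstinent
p_intervention = shifted_probability(0.30, 0.7859)
print(f"Implied intervention-group probability: {p_intervention:.3f}")
```

An odds ratio does not translate into a fixed difference in probabilities: the implied change depends on the baseline probability, which is why the second function takes the control-group value as an input.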