We analyzed baseline data from the Healthy Living Project (HLP) [14], a multi-site randomized controlled trial designed to determine the effect of a behavioral intervention on sexual risk behaviors in HIV-positive participants. Baseline data were collected on 2,845 HIV-positive individuals who were on ARVs. ARV adherence was assessed using the AIDS Clinical Trials Group (ACTG) measure of three-day adherence [15] for each ARV and averaged over a participant's complete ARV regimen. To demonstrate the use of the LR, hurdle, and ZINB models, we used age (categorized as ≤ 34, 35–44, and ≥ 45) as a previously reported predictor of ARV non-adherence, with higher age associated with lower levels of non-adherence [1, 16, 17]. We utilized Stata, version 11 (StataCorp, College Station, TX, USA) for all analyses.

Percent ARV adherence was transformed to percent ARV non-adherence by subtraction from 100%. In assessing age effects, data were dichotomized at a cut-off of 0% (1 = 0% non-adherence; 0 = non-adherence greater than 0%) and binary LR using the Stata command -logistic- was employed to determine the odds of 0% non-adherence between age categories in comparison to the reference category (i.e., 35–44). A 2-degree-of-freedom (2-df) Wald chi-square test with a two-sided alpha of 0.05 was used.
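The transformation and dichotomization described above can be illustrated with a minimal Python sketch. This is not the Stata code used in the analysis; the function names and the toy data are hypothetical. It relies on the fact that, with a single categorical predictor, the odds ratio estimated by binary LR equals the ratio of the observed odds in the two categories.

```python
import numpy as np

def to_zero_nonadherence_indicator(percent_adherence):
    """Convert percent adherence to percent non-adherence and dichotomize at 0%."""
    adherence = np.asarray(percent_adherence, dtype=float)
    non_adherence = 100.0 - adherence
    # 1 = 0% non-adherence; 0 = non-adherence greater than 0%
    indicator = (non_adherence == 0).astype(int)
    return non_adherence, indicator

def odds_ratio(indicator, group, level, reference):
    """Odds of 0% non-adherence in `level` relative to `reference`."""
    indicator = np.asarray(indicator)
    group = np.asarray(group)

    def odds(g):
        y = indicator[group == g]
        return y.sum() / (len(y) - y.sum())

    return odds(level) / odds(reference)

# Hypothetical toy data: four participants per age category
adherence = [100, 100, 90, 100, 50, 100, 100, 80]
age_group = ["35-44"] * 4 + [">=45"] * 4
non_adherence, is_zero = to_zero_nonadherence_indicator(adherence)
print(odds_ratio(is_zero, age_group, ">=45", "35-44"))
```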

Similarly, for the hurdle model, LR was used to analyze the odds of 0% non-adherence using the -logistic- command; additionally, for the subset of participants who had > 0% non-adherence, we used GLM-GL via the Stata command -glm-. GLM regression was conducted using a gamma distribution and a log link function relating mean non-adherence to the predictor (i.e., age), in order to determine the fold-change in mean non-adherence between each category of the predictor and the reference category (i.e., 35–44). We used a 2-df Wald chi-square test in each of the LR and GLM-GL components of the overall model, rejecting the null hypothesis if either p-value was less than a Bonferroni-corrected 0.025, in order to correct for multiple testing. In addition, we used an omnibus 4-df test for the combined effect of age in both components of the model using the -suest- command.
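As a rough illustration of the two-part logic, the Python sketch below encodes the Bonferroni rejection rule and the fold-change in mean non-adherence among participants with > 0% non-adherence. It is a sketch under stated assumptions rather than the analysis code: the function names and toy values are hypothetical, and the fold-change is computed as a ratio of group means, which matches the exponentiated gamma log-link coefficient only when the model contains a single categorical predictor.

```python
import numpy as np

# Two component tests (LR and GLM-GL), so alpha is split as described in the text
BONFERRONI_ALPHA = 0.05 / 2

def hurdle_rejects(p_logistic, p_gamma, alpha=BONFERRONI_ALPHA):
    """Reject the null if either component's 2-df p-value falls below alpha."""
    return p_logistic < alpha or p_gamma < alpha

def fold_change_in_mean(non_adherence, group, level, reference):
    """Ratio of mean non-adherence (positive values only) in `level` vs `reference`."""
    y = np.asarray(non_adherence, dtype=float)
    g = np.asarray(group)
    positive = y > 0  # the gamma component uses only participants with > 0% non-adherence
    mean_level = y[positive & (g == level)].mean()
    mean_reference = y[positive & (g == reference)].mean()
    return mean_level / mean_reference

# Hypothetical toy values
y = [0, 10, 30, 0, 20, 60]
g = ["a", "a", "a", "b", "b", "b"]
print(fold_change_in_mean(y, g, "b", "a"))
print(hurdle_rejects(0.03, 0.04), hurdle_rejects(0.01, 0.50))
```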

Lastly, for the ZINB model, we utilized the Stata command -zinb-. We rejected the null hypothesis if the omnibus 4-df Wald chi-square p-value for the combined effect of age in both parts of the model was less than 0.05. Throughout this paper, p-values where we compare across categories of the predictor (e.g., age) will be referred to as the “overall p-value” (2-df) and p-values where we combine across components of various models will be referred to as the “omnibus p-value” (4-df).
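To make the distinction between the 2-df overall and 4-df omnibus p-values concrete, the following Python sketch converts a Wald chi-square statistic into a p-value. The function name is hypothetical; it uses the closed-form upper-tail probability of the chi-square distribution, which holds only for even degrees of freedom (sufficient for the 2-df and 4-df tests used here).

```python
import math

def chi2_sf_even_df(statistic, df):
    """Upper-tail chi-square probability P(X > statistic) for even df.

    For df = 2m: exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form shown here requires a positive even df")
    half = statistic / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

# The same Wald statistic gives different p-values under 2 and 4 df
wald = 9.0  # hypothetical statistic
print(chi2_sf_even_df(wald, 2), chi2_sf_even_df(wald, 4))
```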

In order to further investigate conditions where the hurdle (using LR plus GLM-GL) or ZINB models would perform more or less well in comparison to binary LR alone, we conducted a simulation study using three scenarios. Our outcome variable consisted of ARV non-adherence and our predictor was a hypothetical variable with four categories (0, 1, 2, and 3). The allocation of observations in categories 0–3 was 40%, 35%, 19%, and 6%, respectively. This was based on other predictors of ARV non-adherence, such as depression. Model parameters (signifying 0% non-adherence and degree of non-adherence) were selected jointly to yield the desired distribution of data under both the hurdle and ZINB models. The zeroes were generated as a Bernoulli random draw for each simulated participant and had the expected variability across runs. Since simulated values from a gamma distribution are not constrained to be less than 100%, all values of non-adherence greater than 100% were set to 100%. This small collection of data points with 100% non-adherence was also noted in the actual HLP data because a small group of individuals had reported that they had not taken a single dose of their ARVs within the past three days.
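The data-generating steps for the hurdle case can be sketched in Python as follows. Only the category allocation (40%, 35%, 19%, 6%) and the truncation at 100% come from the description above; the zero probabilities, gamma means, and gamma shape are hypothetical placeholders and are not the parameters used in the simulation study. Generation under the ZINB model is not shown.

```python
import numpy as np

CATEGORY_PROBS = [0.40, 0.35, 0.19, 0.06]  # allocation to categories 0-3, from the text

# Hypothetical parameters for illustration only
P_ZERO = [0.70, 0.65, 0.60, 0.55]      # probability of 0% non-adherence by category
GAMMA_MEAN = [15.0, 18.0, 22.0, 26.0]  # mean % non-adherence among non-zeros
GAMMA_SHAPE = 1.0

def simulate_hurdle_dataset(n, rng):
    """Simulate one dataset: predictor category and percent non-adherence."""
    category = rng.choice(4, size=n, p=CATEGORY_PROBS)
    # Zeroes: one Bernoulli draw per simulated participant
    is_zero = rng.random(n) < np.take(P_ZERO, category)
    # Positive part: gamma draw with category-specific mean (mean = shape * scale)
    mean = np.take(GAMMA_MEAN, category)
    positive = rng.gamma(shape=GAMMA_SHAPE, scale=mean / GAMMA_SHAPE)
    # Gamma draws are unbounded above, so values over 100% are set to 100%
    non_adherence = np.where(is_zero, 0.0, np.minimum(positive, 100.0))
    return category, non_adherence

rng = np.random.default_rng(2024)
category, non_adherence = simulate_hurdle_dataset(2850, rng)
print((non_adherence == 0).mean(), non_adherence.max())
```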

For each of the scenarios described below, we generated 1,000 datasets under the hurdle model and 1,000 datasets under the ZINB model, then applied LR, hurdle, and ZINB analysis models to each of the resulting 2,000 datasets. We estimated power and type-I error by the proportion of datasets in which each of the models rejected the null hypothesis, using the testing procedures previously described. With 1,000 simulated datasets, margins of simulation error are approximately 1.4–3.1 percentage points.
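The stated margins of simulation error are consistent with a 95% normal-approximation interval for a proportion estimated from 1,000 datasets. The sketch below assumes that interpretation (the text does not specify the formula) and evaluates it at a rejection rate of 0.05, the nominal type-I error, and at 0.50, where binomial variability is largest; these give about 1.35 and 3.10 percentage points. The function name is hypothetical.

```python
import math

def simulation_margin_of_error(rejection_rate, n_datasets=1000, z=1.96):
    """Half-width, in percentage points, of a normal-approximation 95% interval."""
    standard_error = math.sqrt(rejection_rate * (1 - rejection_rate) / n_datasets)
    return 100 * z * standard_error

for rate in (0.05, 0.50):
    print(rate, round(simulation_margin_of_error(rate), 2))
```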

In Scenario 1, also called the Sample Size Scenario, we compared LR, hurdle, and ZINB models while focusing on the effect of increasing sample size from 200 to 3,000. Type-I error was estimated under the case of no differences with increasing sample size. For the power evaluation, we selected a situation in which the categorical predictor had moderate effects in both components of the data-generating hurdle or ZINB models.

**Table 1.** Logistic regression, hurdle, and ZINB estimates of age effects on ARV non-adherence in the Healthy Living Project

In Scenarios 2 (the Gamma Distribution Scenario) and 3 (the Binomial Distribution Scenario), we assessed the power of the LR, hurdle, and ZINB models under nine settings in which the predictor had small, moderate, or large effects in one or both components (i.e., 0% non-adherence or the degree of non-adherence) of the data-generating model. In these simulations, sample size was fixed at 2,850, approximately the size of the HLP sample.