
Ann Behav Med. Author manuscript; available in PMC 2008 July 10.

Published in final edited form as:

Ann Behav Med. 2003 October; 26(2): 76–103.

PMCID: PMC2452993

NIHMSID: NIHMS52211

Center for Health and Behavior, Syracuse University


Investigation of sexual behavior involves many challenges, including how to assess sexual behavior and how to analyze the resulting data. Sexual behavior can be assessed using absolute frequency measures (also known as “counts”) or with relative frequency measures (e.g., rating scales ranging from “never” to “always”). We discuss these two assessment approaches in the context of research on HIV risk behavior. We conclude that these two approaches yield non-redundant information and, more importantly, that only data yielding information about the absolute frequency of risk behavior have the potential to serve as valid indicators of HIV contraction risk. However, analyses of count data may be challenging due to non-normal distributions with many outliers. Therefore, we identify new and powerful data analytical solutions that have been developed recently to analyze count data, and discuss limitations of a commonly applied method (viz., ANCOVA using baseline scores as covariates).

Research on sexual behavior influences public policy as well as educational, clinical, and public health practice for a diverse range of health domains, including family planning, infertility, unintended pregnancy, sexual functioning, and sexually transmitted infections (STIs). The quality of the information yielded by sexual behavior research depends on the methodological rigor of that research. Because of the private (and often stigmatized) nature of sexual behavior, the dyadic (rather than individual) aspect, the multiple motives for sexual behavior, and the large intra- and inter-individual differences in behavioral frequency, research on sexual behavior involves many challenges for investigators (1, 2).

In this paper, we address two of the challenges that researchers confront when investigating sexual behavior, namely, decisions regarding (a) the assessment of sexual behavior (i.e., item content and scaling), and (b) the analysis of sexual risk behavior data. We focus on conceptual differences and data analytical problems that distinguish counts from relative frequency measures of condom use.^{i} We discuss these two challenges in the context of HIV research, with a focus on unprotected (vaginal or anal) intercourse because, as noted by Jemmott and Jemmott, unprotected intercourse is “the best indicator of risk of sexually transmitted infection inasmuch as it indicates the number of exposures to risk” (p. S50) (4). Our purposes are (a) to raise awareness about the need to differentiate between count and relative frequency measures, (b) to discuss options suitable for the analysis of count data, and (c) to identify needs for further methodological research.

Perhaps the most important decision that a sexual health researcher must make involves item content and scaling. Two major categories of sexual risk measures can be found in the literature, namely, count data and relative frequency measures.^{ii} In this section, we identify options for the measurement of unprotected intercourse, review recent trends in the assessment of sexual risk behavior, and discuss the rationale and utility of the most common measurement approaches.

Count measures and relative frequency measures are two distinct categories of sexual risk behavior measures. Most (but not all) assessment methods can be subsumed under these two major categories.^{iii} We define and discuss each of these measures.

Theoretically, count items represent measures of discrete events on a ratio scale. Count measures ask participants to indicate the exact number of times they engaged in a sexual risk behavior during a specified period of time. Items assessing counts typically employ an open response format and often assess unprotected intercourse with two or more related questions. For example, the respondent may be asked “How many times did you have vaginal sex during the past three months?” and “How many of these times did you use a condom?” The number of unprotected vaginal intercourse occasions is then computed as the difference between the total number of vaginal intercourse occasions and the number of times condoms were used. Alternatively, count measures can be collected by diary or Timeline Followback (TLFB) methods, which assess sexual risk behavior at the event level. That is, each single event is recorded as either protected or unprotected, and count measures are derived by summing all occasions of unprotected intercourse that a person reports.
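The two derivations just described can be sketched in code; this is a hypothetical illustration, and the function names are ours:

```python
# Hypothetical sketch of the two ways a count of unprotected intercourse
# is derived: from the two-question format, and from event-level records.

def unprotected_from_totals(total_occasions, condom_occasions):
    """Two-question format: unprotected = total minus condom-protected."""
    return total_occasions - condom_occasions

def unprotected_from_events(events):
    """Event-level (diary/TLFB) format: each event is True if a condom was used."""
    return sum(1 for condom_used in events if not condom_used)

print(unprotected_from_totals(15, 5))                       # 10
print(unprotected_from_events([True, False, False, True]))  # 2
```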

The primary disadvantage of count data is their usually extreme deviation from a Gaussian distribution, which creates difficulties for data analysis. A secondary disadvantage is that, when derived from event-level data (i.e., those obtained with diary, TLFB, or similar event-by-event reporting techniques), count data can require more time to obtain than measures that ask only for an overall frequency estimate for a specified time period.

Relative frequencies of unprotected intercourse emerge from four kinds of measures: (a) proportions, (b) percentage ratings, (c) categorical measures, and (d) dichotomies. The common feature shared by these measures is the assessment of unprotected intercourse relative to the total number of intercourse occasions.

Proportions or percentages are derived from count data; however, they are relative frequency measures because they represent the ratio of protected or unprotected intercourse to the total number of intercourse occasions. The proportion of condom-protected vaginal intercourse is computed as the number of condom-protected intercourse occasions divided by the total frequency of vaginal intercourse during the reference time interval. For example, a person reporting condom use on 5 of 15 occasions of intercourse would receive a value of .33 on the proportion scale, or a value of 33% on a percentage scale. Proportions derived from counts may also deviate from normal distributions. Occasionally, a negative kurtosis emerges with high frequencies on both ends of the distribution and low frequencies between the values of 0 and 1.
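A minimal sketch of the proportion computation described above, with a guard for respondents who report no intercourse (for whom the proportion is undefined); the function name is ours:

```python
def condom_use_proportion(condom_occasions, total_occasions):
    """Proportion of condom-protected intercourse; undefined if no intercourse occurred."""
    if total_occasions == 0:
        return None  # undefined, not zero risk
    return condom_occasions / total_occasions

# The worked example from the text: condom use on 5 of 15 occasions.
p = condom_use_proportion(5, 15)
print(round(p, 2), f"{p:.0%}")  # 0.33 33%
```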

Percentage ratings do not originate from count data but emerge when respondents rate their use of condoms on a percentage scale. Subjects may be asked: “How often did you use condoms when you had sex in the past three months?” They may, for example, respond on an 11-point scale ranging from 0 to 100 percent in ten percent increments. Percentage ratings are estimates only and provide ordinal data in the form of ordered categories. They, too, may be affected by skewness or negative kurtosis, with high frequencies on one or both ends of the distribution and lower frequencies in the intermediate categories.

Categorical measures of relative condom use comprise Likert scales and dichotomous measures. Both yield information similar to proportions and percentage ratings in that they ask participants to report the frequency of condom use relative to the frequency of intercourse. Likert scales work with multiple response options, such as using condoms “every time,” “sometimes,” or “never.” Regardless of the number of response options employed, the typical feature of Likert scales for safer sex is a range from “never” to “always.” Likert-type categories can be derived from proportions as well by dividing the sample into those who report condom use in 100% of their sexual encounters (always), those who report 0% condom use (never), and those who have percentages >0% and <100% (sometimes), which may be divided further into two or more categories, such as ≤50% and >50%. An advantage of Likert ratings is their greater approximation to a Gaussian distribution. Such data usually do not require transformation, and their more favorable distribution allows researchers to apply parametric significance tests, which tend to be more powerful than non-parametric analyses. Two disadvantages are their lower precision and their potential limitation as an indicator of sexual risk behavior, two issues that we discuss in more detail later.

Dichotomous measures are similar to ordinal measures, reduced to two categories. For example, a common dichotomy identifies “low-risk” individuals as those respondents who use condoms consistently (“always”) versus “high-risk” individuals who use condoms inconsistently or not at all (“not always”). Dichotomies can also be derived from ordinal data or count measures by data reduction. They have to be regarded as measures of relative condom use as well, as they provide information only about condom use relative to the total number of intercourse occasions. Dichotomous measures facilitate group comparisons (e.g., using odds ratios) to explore or test hypotheses regarding the correlates of high-risk sexual behavior. The primary disadvantage of dichotomous measures is the loss of quantitative information. Dividing a sample into two groups often leads to a heterogeneous group of “high-risk” individuals, including – for example – those who (a) have used condoms all but one time, (b) never use condoms because they believe they are in a mutually monogamous relationship, and (c) engage in extremely high-risk activities such as sex trading with multiple partners without using condoms. Even if sub-samples are analyzed separately, the results are still based on a rough measure of HIV contraction risk, reduced to two categories only.
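The categorical reductions described in the last two paragraphs can be sketched as follows; the labels and cut-points follow the text, but the helper names are ours:

```python
# Hedged sketch: deriving Likert-type and dichotomous categories from a
# proportion of condom-protected intercourse.

def likert_category(proportion):
    """Three-category scheme: 'always' / 'sometimes' / 'never' condom use."""
    if proportion == 1.0:
        return "always"
    if proportion == 0.0:
        return "never"
    return "sometimes"

def risk_dichotomy(proportion):
    """'Low-risk' = consistent (100%) condom use; anything less = 'high-risk'."""
    return "low-risk" if proportion == 1.0 else "high-risk"

# A person who used condoms on all but one occasion lands in the same
# "high-risk" group as someone who never uses them:
print(likert_category(0.93), risk_dichotomy(0.93))  # sometimes high-risk
print(likert_category(0.0), risk_dichotomy(0.0))    # never high-risk
```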

Given the many assessment options, investigators must decide whether to use count data, relative frequency data, or both. This choice is typically guided by factors such as the goals of a study, the level of precision desired, anticipated data collection costs, data analytical considerations, and feasibility. In order to determine how the choice of assessment methods is related to study goals, we reviewed studies published between 1995 and 2001. We searched PsycINFO and Medline using as keywords “HIV or AIDS” and “condom use” or “unprotected intercourse.” Further, we searched in the reference sections of identified studies and included publications from 1995 or later that matched the purpose of this review. We selected all studies that analyzed self-reported condom use as a distinct measure of sexual risk behavior. We excluded studies that did not (a) analyze condom use as a separate outcome (i.e., those studies that used unprotected intercourse only as part of a composite measure), (b) focus on condom use or sexual risk behavior as a primary outcome of interest, or (c) provide enough information to allow us to categorize the measure as a count or a relative frequency measure.

Table 1 lists the resulting sample of 116 studies that illustrate the range of measures used to assess unprotected intercourse (5–120). We classified the studies according to their purpose as (a) intervention studies, (b) correlational studies (i.e., non-experimental research investigating relationships between risk behavior and potential predictors), or (c) methodological studies (testing the reliability and validity of condom use data; see column b). We also distinguished among studies analyzing count measures, proportions, percentage ratings, and categorical/dichotomous measures (see column c).

Of the 116 studies listed in Table 1, the majority (*n* = 74) relied exclusively on relative frequency data, most often in the form of categorical or dichotomous data. Count data were also used commonly, either alone or in combination with proportional measures (*n* = 42). A closer look at the relationships between item category and study goals (see Table 2) reveals that count data were employed more often in intervention research and methodological studies (*n* = 34, 81%) and less often in correlational studies (*n* = 8, 19%). In contrast, relative frequency data were used more often in correlational studies (*n* = 50, 68%) than in intervention or methodological investigations (*n* = 24, 32%). Thus, the use of relative frequency and count data differs by study type, χ^{2} = 25.23, *df* = 1, *p* < .001.
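The reported chi-square can be reproduced from the cell counts given above, assuming the 2 × 2 layout of Table 2 implied by the text:

```python
# Reproducing the reported Pearson chi-square from the Table 2
# cross-tabulation. The 2 x 2 layout is inferred from the text:
#                      intervention/methodological   correlational
#   count data                    34                       8
#   relative frequency            24                      50

observed = [[34, 8], [24, 50]]
row = [sum(r) for r in observed]        # row totals: 42, 74
col = [sum(c) for c in zip(*observed)]  # column totals: 58, 58
n = sum(row)                            # 116 studies

chi2 = sum((observed[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
           for i in range(2) for j in range(2))
print(round(chi2, 2))  # 25.23 (uncorrected chi-square, df = 1)
```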

In comparison to earlier reviews (121–123) and a recent meta-analysis on correlates of condom use (124), the current review indicates increased use of count measures and proportions since 1995 (121, 122, 124, 125). In their meta-analysis, Sheeran et al. (124) identified 121 studies published between 1981 and 1996, which focused nearly exclusively on categorical and relative frequency measures of condom use. In contrast, the current review reveals that approximately 36% of the studies used counts, or a combination of counts and proportions. This trend might be interpreted as reflecting an increasing need for precision for HIV risk behavior research, particularly in intervention and methodological research. This pattern also suggests an emerging consensus that relative condom use measures may not suffice to inform about the extent of sexual risk behavior before and after a treatment and may not be useful to evaluate intervention success. We will discuss these hypotheses in detail in the following section.

The increasing interest in counts in intervention and methodological research, and the frequent use of categorical data and percentage ratings in correlational research, require explanation. The aforementioned trend toward including count measures has not affected all types of studies equally, suggesting that counts are not universally regarded as the more useful measures. In this section, we discuss the utility of count and relative frequency measures relative to study goals. First, we compare these measures as indicators of HIV contraction risk and markers of HIV intervention success. Second, we discuss their utility in studies testing theoretical models of health behavior.

The most crucial question to be answered in intervention research is whether the treatment effectively reduces the frequency of exposure to unprotected intercourse in the target population. If we agree that the focus in intervention studies needs to be the reduction of risk behavior rather than a mere increase in condom use, then the most important criterion in evaluating treatment effects needs to be a decrease in the absolute number of risk exposures. Relative frequency measures of condom use may not suffice to evaluate intervention success. We argue that (a) relative frequency measures are usually imprecise indicators of HIV contraction risk, (b) count data yield important and non-redundant information, and (c) results obtained with relative frequency measures may not be generalized beyond the limited information that they provide. Further, we discuss (d) situations in which relative frequency measures may be useful in intervention research.

Several recent publications indicate an increasing consensus that count data are needed in order to evaluate intervention effectiveness. According to Jemmott and Jemmott (4), unprotected intercourse is “the best indicator of risk of sexually transmitted infection inasmuch as it indicates the number of exposures to risk” (p. S50). Similarly, Jaccard concludes that “from a public health perspective, a major criterion of interest is the sheer number of instances of unprotected sexual intercourse that occurs in a population over a given time period” (50). Further, in a recent paper discussing biological and behavioral markers of intervention success, Fishbein and Pequegnat (126) come to the same conclusion, stating that “if one is truly interested in preventing disease or pregnancy, it is the number of unprotected sex acts and not the percentage of times condoms are used that should be the critical variable.” In general, the risk of HIV infection that an uninfected person takes increases as a function of the number of times this person exposes him- or herself to unprotected sex with an infected partner, all other factors held constant, as each single event adds to the risk of HIV contraction. If we control for co-factors such as the amount of risk behavior displayed by a sexual partner, infectiousness of a seropositive partner at a given time, a person’s biological vulnerability, and viral load, then the likelihood of becoming infected with HIV is proportional to the number of times he or she is exposed to the virus.

The need for count data in evaluating HIV contraction risk and intervention success becomes most apparent in formulas that try to quantify factual HIV risk based on absolute frequencies of unprotected intercourse. For example, the Vaginal Episode Equivalent (VEE) Index developed by Susser et al. (127) assumes that counts of unprotected intercourse are the best indicator of HIV contraction risk, which is expressed by giving weight to each single occasion of unprotected intercourse depending on the relative amount of risk connected with unprotected vaginal (ω=1), anal (ω=2), and oral (ω=.1) intercourse. Similarly, mathematical models for the prediction of HIV contraction risk and the spread of HIV, such as the Bernoulli Process Model (128, 129), require highly accurate and detailed information, including the exact number of times a person engages in unprotected sex; such data can be obtained only from count measures of sexual risk behavior. The same applies to the evaluation of HIV risk reduction programs and estimates of cost-effectiveness of interventions.
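Both quantifications can be sketched in code. The VEE weights are those attributed to Susser et al. (127) in the text; the per-act transmission probability in the Bernoulli-style example is an arbitrary placeholder, not an empirical estimate:

```python
# Sketches of the two count-based quantifications discussed above.

def vee_index(n_vaginal, n_anal, n_oral):
    """Vaginal Episode Equivalent: weighted sum of unprotected acts
    (weights 1, 2, and 0.1 for vaginal, anal, and oral intercourse)."""
    return 1.0 * n_vaginal + 2.0 * n_anal + 0.1 * n_oral

def cumulative_risk(per_act_probability, n_exposures):
    """Bernoulli-process idea: risk accumulates over independent exposures,
    so cumulative risk = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - per_act_probability) ** n_exposures

print(vee_index(10, 2, 5))                   # 14.5
print(round(cumulative_risk(0.001, 50), 4))  # placeholder per-act probability
```

Note that both functions take absolute frequencies as input; neither can be computed from a proportion or a Likert category alone.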

Accordingly, the primary task of HIV risk behavior interventions is to reduce the risk of HIV contraction, and this reduction needs to be evaluated by comparing the absolute frequencies of engagement in risky behaviors before and after treatment. Although relative condom use measures may supplement count measures in intervention research, we argue against relying on relative frequency measures in intervention studies for the following reasons.

An important disadvantage of relative frequency measures of condom use is that they usually do not inform about the absolute frequency of unprotected intercourse, as the following example shows. Imagine two cases: Person A reports two occasions of sexual intercourse, using a condom once; Person B reports 100 occasions, with condom use during 50 of those. In a count approach, A has a risk score of 1 and B a risk score of 50, reflecting the fact that B had 50 exposures to infection risk and thus behaved 50 times as riskily as A. In terms of proportions, however, both A and B receive the same value because each has had unprotected sex in 50% of their sexual encounters. The same applies to categorical measures of relative condom use. On a three-point scale, for example, ranging from “never” to “sometimes” and “always,” both persons are assigned to the same middle category (for a similar example, see Fishbein and Pequegnat (126)). This lack of precision in the evaluation of HIV contraction risk argues against the use of relative frequencies as quantitative measures of sexual risk behavior.
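The Person A / Person B example, in code: identical relative condom use, very different absolute risk exposure.

```python
# Two hypothetical respondents from the example above.
persons = {"A": {"total": 2, "protected": 1},
           "B": {"total": 100, "protected": 50}}

for name, d in persons.items():
    count_score = d["total"] - d["protected"]  # exposures to infection risk
    proportion = d["protected"] / d["total"]   # relative condom use
    print(name, count_score, proportion)
# A: 1 exposure, B: 50 exposures, yet both used condoms 50% of the time.
```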

This disadvantage of relative frequency measures may account for the fact that only about 50% of the intervention studies analyzing count data report proportions of condom use as a supplemental criterion of treatment effects. Thus, counts or total frequencies of unprotected intercourse seem to be regarded as the primary outcome of interest (18, 55, 62, 63, 69, 101, 105, 116, 130).

Further, as Table 1 reveals, authors using count data rarely report proportions of unsafe sex; instead, if proportions are analyzed, the proportion of protected intercourse is reported. Statistically, it should be irrelevant whether the proportion of protected or unprotected intercourse is reported, as the magnitude of one determines the other. The preference for proportions of protected intercourse may reflect the conviction that count data of condom-protected intercourse yield valuable information only relative to the total frequency of intercourse. In contrast to unprotected intercourse, count data of protected intercourse are not very informative without comparison to the total number of sexual events. If we know that person A reported 15 occasions of condom-protected intercourse, we still do not know what this means in terms of the consistency of A's self-protective behavior. However, if we know that A uses condoms 75% of the time, we have an idea how condom use relates to A's overall sexual behavior. Thus, whereas count measures of unprotected intercourse are better indicators of HIV contraction risk, relative condom use measures may be the better choice in evaluating condom use as a habit, that is, the conditional likelihood that a condom is used if a person engages in sexual intercourse.

In principle, relative frequency measures of condom use could substitute for count measures of unprotected intercourse in intervention research if we could rely on a strong association between the two measures. However, there exists no mathematical reason to assume a consistently high correlation between counts and relative frequency measures. Further, a variety of factors (e.g., partner type) are likely to influence the association between counts and relative frequency measures. Therefore, before results obtained with relative condom use can be generalized to counts of unprotected intercourse, the correlation between these two measures has to be determined. This, however, requires the simultaneous assessment of absolute frequencies – counts – of sexual risk behavior.

The uncertain association between counts and relative frequency measures can be demonstrated, as the following examples show. First, we created the hypothetical data depicted in Table 3 to illustrate how the correlation between an absolute and a relative frequency measure might range from −1.00 to 0.00 to +1.00. Second, using actual data from our ongoing and published studies (18, 19, 21), we calculated correlations between counts and proportions of protected and unprotected vaginal intercourse. Proportions were further transformed into various categorical condom use measures. Pearson correlations between (normalized) count data of unprotected intercourse and the various relative condom use measures ranged from *r* = −.23 to *r* = −.78, and the correlations between (normalized) counts of condom-protected intercourse and relative condom use ranged from *r* = −.04 to *r* = .82. Our conclusions regarding the divergence of relative and absolute frequency measures are further supported by several previously published studies reporting only low to moderate associations between these measures. Fishbein and Pequegnat (126) found correlations between *r* = −.20 and *r* = −.40 for the number of unprotected sex acts and the percentage of condom-protected intercourse, and O’Leary et al. (86) reported a correlation of *r* = .01 between the frequency of unprotected intercourse (i.e., counts) and the proportion of occasions in which a condom was used. In addition, several counter-intuitive findings in the literature provide evidence that caution is warranted in generalizing results obtained on the basis of the relative frequency of condom use. In a recent meta-analytic study, Sheeran et al. (124) found an overall negative correlation of *r* = −.18 between the frequency of condom use and the frequency of sexual intercourse based on a sample of studies that usually employed measures of relative condom use (categorical measures, dichotomies, percentage ratings). This result is extremely unlikely to occur with count data because a person with few occasions of sexual intercourse has fewer opportunities to display safer sex than a more sexually active person. Similarly, Kasprzyk et al. (65) reported a nonsignificant negative correlation of *r* = −.16 between unwanted pregnancy and the relative frequency of condom use in vaginal sex but a strong negative correlation of *r* = −.40 between the relative frequency of condom use in anal intercourse and unwanted pregnancy. Again, this finding is unlikely to occur if count measures of unprotected intercourse are used. Overall, then, this pattern of results indicates that relative frequency measures of condom use cannot be used as a proxy for count measures.
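The point that the count/proportion correlation can take any value can be illustrated with made-up numbers (not the actual Table 3 values): three tiny data sets suffice to produce correlations of +1, −1, and 0 between counts of unprotected intercourse and the corresponding proportions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation, computed from scratch to keep the sketch self-contained."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Each pair: (count of unprotected occasions, proportion unprotected).
# The implied total intercourse frequencies appear in the trailing comments.
datasets = {
    "+1": [(1, 0.1), (2, 0.2), (3, 0.3)],  # totals 10, 10, 10
    "-1": [(2, 0.5), (4, 0.4), (6, 0.3)],  # totals 4, 10, 20
    " 0": [(1, 0.2), (2, 0.4), (3, 0.2)],  # totals 5, 5, 15
}
for label, pairs in datasets.items():
    counts, props = zip(*pairs)
    print(label, round(pearson(counts, props), 2))
```

The driver is the distribution of total intercourse frequencies: the same counts yield entirely different proportions depending on the totals, so no fixed association can be assumed.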

Although our general perspective is that count data provide a more sensitive measure of HIV contraction risk and should, therefore, be preferred in HIV intervention research, we do not wish to imply that relative frequency measures of condom use are incapable of indicating HIV infection risk. Indeed, several partner studies reported a dose-response relationship between categories of condom use (e.g., “never,” “sometimes,” and “always”) and HIV seroconversion. For example, a study involving commercial sex workers in Kenya revealed an association between a categorical measure of condom use and HIV seroconversion (131). Similarly, a meta-analysis of HIV serodiscordant partner studies revealed a dose-response relationship among three categories of condom use (“never,” “sometimes,” and “always”) in predicting seroconversion (132). However, because these studies did not provide information about absolute frequencies of risk behavior, it is not possible to judge how relative condom use measures compare to count data as indicators of risk for HIV contraction. Indeed, we would assume a relationship with HIV contraction risk for any measure that distinguishes between “consistent” condom users and “inconsistent” or “never” users. However, the fact that such a relationship exists for categorical measures does not demonstrate that relative frequency data are superior to count data as indicators of HIV contraction risk.

Even with this evidence as background, the fact remains that relative condom use measures do not, by themselves, yield sufficient information about the extent of individual risk behavior, as our earlier examples demonstrated. Relative condom use measures may provide valuable information about the extent of risk behavior if combined with background information about the absolute frequency of engagement in sexual intercourse. For example, in a homogeneous sample of highly active sex workers, relative condom use measures are highly informative about the extent to which each individual engages in unprotected intercourse. However, homogeneity of behavior frequencies cannot be assumed without testing, which again points to the need for count data in evaluating intervention success. In intervention research, relative frequency measures of condom use may be employed in combination with information about absolute frequencies of intercourse. However, we discourage the use of relative frequency measures if they are not indicative of the absolute frequency of sexual risk behavior.

One final example may demonstrate the problem inherent in the exclusive use of relative condom use measures. In principle, it is possible that HIV risk interventions induce participants to engage more often in sexual activities. In this case, even if the relative frequency of condom use increases after treatment, the overall frequency of unprotected intercourse may remain unchanged or even increase at the same time. Thus, without additional background information about absolute frequencies of intercourse or unprotected intercourse (i.e., counts), relative frequency measures of condom use cannot be recommended for the evaluation of intervention success.

In sum, our review of intervention and methodological research indicates a recent trend toward inclusion of count measures, particularly in intervention trials. This trend reflects the view that studies focusing on unprotected intercourse as the main outcome and as quantitative indicator of individual HIV contraction risk require information about the actual numbers of risk behaviors enacted. The lack of conceptual and consistent empirical overlap between counts and relative condom use measures leads to the conclusion that results obtained with measures of relative condom use should not be generalized to research programs employing count measures without assuring that a strong empirical relationship between the two measures is present in the respective population.

Although count data of unprotected intercourse appear to be more sensitive indicators of HIV contraction risk, ratings of relative condom use have been preferred in correlational studies modeling safer sex in HIV risk populations. Count measures of unprotected intercourse are particularly unlikely to be used as the outcome in studies testing theoretical models of health behavior (5, 16, 34, 44, 65, 82, 84, 91, 97, 99, 104, 120, 133). For example, Bryan, Fisher, Fisher, and Murray (16) tested the Information-Motivation-Behavioral Skills (IMB) model of Fisher and Fisher (134) using a percentage rating of condom use as the outcome. Bryan, Aiken, and West (15) and Thompson, Anderson, Freedman, and Swan (111) tested the prediction of safer sex using percentage ratings of condom use as the criterion. Schroder, Hobfoll, Jackson, and Lavin (99) included ratings of relative condom use as one of the outcomes to test a model of safer sex among African- and European-American women. In their meta-analysis, Sheeran et al. (124) found strong correlations between psychosocial predictors and condom use, with almost all of the studies using percentage ratings, categorical measures, or dichotomous measures of relative condom use.

One explanation for the apparent preference for relative condom use measures in model testing research is that these measures may reflect a “latent disposition” to use condoms rather than being precise indicators of behavior frequencies. If interpreted (broadly) as latent tendencies toward safer sex, relative frequency measures can be expected to relate more strongly (than would count measures) to social-cognitive predictors. A person who reports always using a condom is more likely to believe in the effectiveness and utility of condoms, and less likely to anticipate negative effects of condom use. In testing such hypotheses, it is most reasonable to conceptualize safer sex as a “habit,” which may be indicated most precisely by the conditional likelihood of condom use. In contrast, the absolute frequency of condom use may provide less information about the motivational basis of a person’s sexual behavior. This is because the absolute frequency of intercourse as measured by counts is a function of many other individual and dyadic factors (e.g., opportunity, drive, social skills, time, general attitudes toward sex).

Although relative condom use may qualify as the better indicator of a “latent likelihood of safer sex behavior” and thus be preferable in model testing research, the question remains whether results obtained with these measures can be generalized to research programs targeting count measures of sexual risk behavior. Our earlier discussion of generalization issues does not support this notion. For that reason, we hope that future model testing studies take up the challenge and test the predictive power of their models simultaneously using count measures as outcomes. Model testing results obtained with counts of unprotected intercourse may offer a more appropriate empirical frame of reference for theory-based HIV prevention programs that employ these measures as the primary criterion for intervention success.

Analysis of sexual behavior data, especially count data, can be challenging even for experienced investigators. Next, we identify a variety of options for the analysis of count data and discuss the advantages and disadvantages of each approach. We focus on methods available for the analyses of counts. Count data of sexual behaviors are often characterized by extreme skewness, variance, and kurtosis, thus deviating strongly from normality. The analysis of count data requires a number of difficult decisions (e.g., how to define outliers in a given distribution or identify the most appropriate and powerful analytical strategy) for which there is no single “correct” choice. Solutions for the analysis of non-normal count data strive to balance the weights of high- and low-frequency cases, and to reduce the biases introduced by extreme skewness and variance.

Our goals in this section are (a) to raise awareness of the variety of data analytical options, and (b) to identify alternatives to traditional statistical methods. Because the need for count data is particularly apparent in intervention research, the following review focuses on methods suitable for the analysis of randomized controlled trials (RCTs). Further, because results with relative frequency measures of condom use cannot be generalized to counts of unprotected intercourse, we call for model testing research using count measures of sexual risk behavior. For that reason, the following overview includes correlational analytical methods as well. Our overview cannot provide the kind of in-depth discussion found in books devoted entirely to these analytic approaches, so we refer interested readers to such sources throughout (135–137).

As a general organizing framework for the following review, we refer to the Generalized Linear Model (GLM). The GLM provides a unifying approach to regression and experimental designs for both linear and non-linear regression models as well as normally and non-normally distributed data. The GLM requires that the distribution of the outcome belong to the exponential family of distributions, such as the Gaussian, binomial, Poisson, inverse normal, negative binomial, exponential, and gamma distributions (138). In our discussion, we overview (a) linear models that require normal distributions, (b) generalized linear regression models for the analysis of non-normal count data, (c) non-parametric data analytical options, and (d) distribution-free solutions developed specifically for non-normal interval- or ratio-scaled outcomes.

Linear models usually assume interval level data, normal distributions, and homoscedasticity, and lead to exact significance levels only under these conditions. Count data of sexual risk behavior may occasionally approximate these assumptions and assume a normal distribution if assessed in a homogeneous group of highly sexually active individuals (e.g., sex workers) in which zero-counts and extreme outliers may occur less often. However, behavioral count data often violate the assumptions of linear parametric tests. Data transformations may provide an approximation to a normal distribution but may not suffice due to some extreme outliers. Thus, if count data deviate from a normal distribution, usually two procedures precede linear parametric analyses: (a) data transformations, and (b) the treatment of outliers. Therefore we begin by commenting on these two preliminary steps that may be needed prior to using linear models for normally distributed data.

Data transformation aims at an approximation of non-normal data to a Gaussian distribution. However, with counts of sexual risk behavior characterized by a high number of zero-counts, data transformation may not suffice. Often it is assumed that violations of normality after data transformation may be tolerated if the sample is large; however, this assurance does not apply to the analysis of rare outcomes with a majority of zero-counts. One solution is the two-step approach used by Carey et al. (19). In this study, participants who reported no risk (i.e., never having unprotected sex) were first compared with participants who reported any unprotected intercourse using logistic regression. In a second step, the group reporting any unprotected intercourse was analyzed further, excluding non-risk cases.

The most common transformation applied in HIV research is the log_{10} (*x* + 1) transformation (e.g., see studies 18, 61, 69 in Table 1). However, log_{10} (*x* + 1) transformations do not repair all skewed distributions. The NIMH Multisite Prevention Trial Group (139) used square root transformations, and O'Leary et al. (86) applied a cubic root transformation in order to approximate their data to a normal distribution. In general, there is no universal transformation for normalizing extremely skewed behavioral count data. The degree of approximation to a Gaussian distribution accomplished by various transformations may be tested using the Kolmogorov-Smirnov test or similar statistics. Alternatively, the most effective approximation may be determined by the Box-Cox power transformation (implemented, for example, in SAS, Stata, and Statistica), which iterates the response variable in a regression model through a series of power functions until normality is maximized. Thus, the Box-Cox method identifies the optimal transformation parameter for the dependent variable, that is, the one that most improves the fit of the specified linear regression model.^{iv}^{,}^{v}
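A minimal sketch of these two options, using hypothetical skewed counts; the Kolmogorov-Smirnov statistic is used here only as a rough index of closeness to normality:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical skewed counts: many zeros, a few very high scores.
counts = rng.negative_binomial(1, 0.3, size=500).astype(float)

# The common log10(x + 1) transformation; the +1 keeps zero-counts defined.
log_scores = np.log10(counts + 1)

# Box-Cox requires strictly positive input, so shift the counts by 1 as well;
# boxcox() returns the transformed scores and the estimated lambda.
bc_scores, lam = stats.boxcox(counts + 1)

# Compare how closely each version approximates normality
# (a smaller Kolmogorov-Smirnov statistic indicates a closer fit).
ks_log = stats.kstest(stats.zscore(log_scores), "norm").statistic
ks_bc = stats.kstest(stats.zscore(bc_scores), "norm").statistic
```

With a large mass of zero-counts, neither transformation will produce a truly Gaussian distribution, which is exactly the situation in which the two-step or count-model approaches discussed in this article become preferable.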

It is sometimes recommended either to exclude extreme outliers or to reduce their impact by assigning a limit value. Tabachnick and Fidell (140), for example, recommend a *z*-score of ≥ +3.29 (or 3.40, one-sided) for the definition of univariate outliers, which corresponds to a likelihood of *p* < .001. For extremely skewed distributions, we recommend defining and treating outliers on the basis of previously normalized scores. In linear parametric analyses, only the distribution of the normalized scores is relevant, and it should not contain any extreme outliers.^{vi}
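The recommended sequence, normalize first and then screen on the normalized scores, can be sketched as follows (hypothetical data; the three planted extreme cases are ours):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

counts = rng.poisson(2.0, size=300).astype(float)
counts[:3] = [80.0, 95.0, 120.0]      # a few extreme high-frequency cases

# Normalize first, then screen: |z| >= 3.29 corresponds to p < .001.
normalized = np.log10(counts + 1)
z = stats.zscore(normalized)
outliers = np.abs(z) >= 3.29
```

Screening the raw counts instead would flag far more cases, because the extreme skew inflates the standard deviation and distorts the *z*-scores.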

Multivariate outliers (i.e., cases with an unusual combination of scores) can be identified using regression procedures that offer outlier diagnostics such as the Mahalanobis distance (i.e., the deviation of a case from the centroid of the remaining cases, established by the means of all variables involved), Cook’s distance, and the leverage of a case; the latter two provide information about the influence of a case on the regression coefficients. Cut-off scores recommended for defining multivariate outliers based on regression diagnostics can be found in Tabachnick and Fidell (140). Again, outlier diagnostics should be performed using the normalized scores of non-normally distributed count data, because these statistics are based on linear regression procedures and apply to Gaussian distributions.

The methods for defining and treating outliers in behavioral count measures have to be chosen with care. Depending on the distribution of the target behavior in the population, unusual and extreme cases have to be expected as naturally occurring events. Consequently, eliminating outliers may satisfy statistical needs at the expense of losing valid cases with the greatest risk of HIV contraction. For that reason, we cannot recommend the removal of outliers as a general strategy. There are, however, exceptions to consider. If, for example, test-retest correlations of behavioral measures produce an extreme outlier in the bivariate distribution, indicating that an individual is either unable or unwilling to provide reliable reports, removing this bivariate outlier, using Mahalanobis distance, Cook’s distance, and leverage statistics as criteria, might be preferable to accepting an overly influential and unreliable case. Decisions regarding the elimination of outliers should be based on validity considerations and should not be misused as a strategy to improve the psychometric properties of a measure or the outcomes of an intervention. The removal of an outlier needs to be reported together with evidence for the unreliability of the participant (20).

Alternatively, the impact of outliers can be reduced (winsorized) by assigning a limit value (e.g., the *z*-score corresponding to *p* ≤ .001) in the normalized distribution. Winsorizing extreme outliers reduces their disproportional weight and is more likely to preserve the results for the majority of the sample. An additional justification for treating single outliers is that outlier scores are likely to carry the highest measurement error; thus, it is preferable to give an outlier less weight than a lower but more reliable score. Outlier treatment is usually combined with other strategies for the analysis of skewed distributions (data transformations, non-linear analyses).
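Winsorizing by a fixed limit value amounts to clipping the normalized scores; a minimal sketch with one hypothetical extreme case:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Normalized (log-transformed, standardized) scores with one extreme case.
scores = stats.zscore(np.log10(rng.poisson(2.0, 200) + 1.0))
scores[0] = 6.0                        # a disproportionally influential outlier

# Winsorize: cap scores at the z value corresponding to p <= .001.
limit = 3.29
winsorized = np.clip(scores, -limit, limit)
```

The outlying case is retained in the sample but can no longer dominate means, variances, or regression estimates.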

Two potential problems may occur when assigning a limit score to the upper end of the distribution. First, relationships with connected variables (e.g., proportions of (un)protected intercourse, or behavior change in a longitudinal design) may not be preserved for the respective outliers. Outlier reduction requires decisions regarding the adjustment of related behavioral scores (e.g., how to adjust a score for protected intercourse if unprotected intercourse is set to a limit value, or whether and how to adjust post-intervention scores if a person’s pre-intervention scores were truncated). If outliers in the frequency distribution of unprotected intercourse need to be reduced, we recommend computing proportions of (un)protected intercourse prior to any treatment of outliers in order to preserve the information about relative condom use for those cases.

Second, statistical outlier diagnosis can indicate a high number of extreme values, specifically in distributions of behaviors that are uncommon in the target population. Reducing all statistically identified outliers to a defined limit value may lead to a bimodal distribution that has no advantages compared to the original distribution of the data.

In sum, the reduction of single outliers can be helpful if these cases exert a strong bias on study findings (i.e., if the results differ considerably depending on whether the outlier is included). The treatment of outliers should not be overused. Instead of reducing a high number of statistically defined outliers, it may be preferable to define outliers more liberally than usually recommended, perform distribution-free statistical tests, use robust estimation, or analyze the data on an ordinal or categorical level only (discussed later). Decisions regarding the treatment of outliers remain difficult and need to be considered case by case. If there is reason to believe that an outlier involves a disproportionally large measurement error (i.e., extreme over-reporting), treatment of this outlier (removal, reduction) is indicated. If an extreme outlier appears valid but merely represents an extremely unrepresentative case (e.g., a single sex worker in the sample), decisions regarding this outlier should consider whether and to what extent its inclusion or exclusion may bias inferences regarding the target population. A single overly influential case that outweighs the majority of other scores in the sample may be removed in order to reach valid conclusions for the remainder of the sample. Such a decision implies that the case is a member of a specific, unrepresentative sub-population that needs to be addressed in a different study, and the removal of this outlier requires explicit statements regarding the limited generalizability of the results. In test-retest reliability studies, an extreme bivariate outlier may falsely indicate unreliability of an instrument that otherwise provides highly reliable and valid results. Because decisions regarding such a case have an immediate effect on instrument evaluation, we recommend reporting the results both with and without the unrepresentative case (20).

Once an approximation to a normal distribution has been achieved, almost all linear parametric analyses can be applied. Instead of listing the many well-known data analytical options for normally distributed variables, we will focus in the remainder of this section on specific problems connected with the analysis of longitudinal data from RCTs. Our purposes are to discuss the available options and to identify specific problems regarding the use of repeated measures analysis of covariance (ANCOVA), which is still widely used to analyze intervention effects.

Randomized controlled trials with a pretest posttest control-group design are traditionally analyzed with Analysis of Variance (ANOVA) if the outcome variable approximates a Gaussian distribution. Even with non-normal count data, data transformations and subsequent ANOVA may be preferred over alternative methods (a) because of their simplicity, transparency, and ease of interpretation and (b) to perform multivariate analyses for which no equivalents exist among non-linear or non-parametric methods. Multivariate Analysis of Variance (MANOVA) offers a solution for multiple criteria of intervention success such as (a) an estimate of overall treatment effects without inflating the Type I error probability; (b) the possibility to find overall treatment effects even if each single dependent variable fails to indicate intervention effects; (c) accounting for correlations among the outcome variables, which are ignored in univariate analyses, and (d) reducing the number of outcome criteria by grouping them according to hypotheses or outcome types into a smaller number of multivariate outcome sets (141).

Controversy exists regarding the validity of results obtained with repeated measures ANOVA, difference score analysis, and Analysis of Covariance (ANCOVA) using the pretest scores of the outcome as a covariate (142–148). In the following discussion, we challenge the notion that difference scores are biased indicators of change; instead, we claim that the analysis of residuals in ANCOVA models, using pre-intervention scores of the outcome as a covariate, leads to biased results in RCTs and does not provide a sensitive approach to the question of behavior change. We discuss these topics in turn, beginning with change score analysis and repeated measures ANOVA, and ending with a critique of the ANCOVA approach.

In repeated measures ANOVA, the outcome of interest is the treatment-by-time interaction. A difference between the experimental and control group is expected after, but not before, the treatment. Alternatively, difference (or change) scores can be computed between pre- and post-intervention scores, using one-way ANOVA (or *t*-test) in order to test the main effect of group^{vii}.

It can be shown that, in a simple pretest-posttest design, the group-by-time interaction in repeated measures ANOVA leads to the same results as the main effect of group in analyzing change scores by one-way ANOVA or *t*-test (144, 145). However, current HIV intervention research often employs multiple post-intervention assessments to evaluate treatment effects over time. For analyses with two or more post-intervention assessments, the results of repeated measures ANOVA and difference score analyses will not be the same. In a multiple post-test design, the computation and analysis of difference scores, using repeated measures ANOVA, may have important advantages over pretest-posttest repeated measures ANOVA. First, in pretest-posttest ANOVA, group-by-time interactions are likely to become increasingly inflated with an increasing number of post-intervention assessments. Second, group-by-time interactions may not be clearly interpretable as they confound treatment-group interactions with interactions between group and post-intervention development. In contrast, computing difference scores between each single post-intervention assessment and pre-intervention scores allows clear interpretation of effects: Group differences in the change scores are interpretable as treatment effects, time effects indicate post-intervention change over time, and group-by-time interactions can be clearly interpreted as differential post-intervention change in the two groups.
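The difference-score approach for a multiple post-test design can be sketched as follows; the data, effect sizes, and variable names are hypothetical:

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(5)

# Hypothetical RCT with one pretest and two post-intervention assessments.
n = 60
group = np.repeat([0, 1], n // 2)
pre = rng.normal(10.0, 2.0, n)
post1 = pre - np.where(group == 1, 3.0, 0.5) + rng.normal(0.0, 1.0, n)
post2 = pre - np.where(group == 1, 2.5, 0.5) + rng.normal(0.0, 1.0, n)

# Difference scores between each post-assessment and the pretest.
df = pd.DataFrame({"group": group,
                   "change1": post1 - pre,
                   "change2": post2 - pre})

# The group difference in each change score is directly interpretable
# as the treatment effect at that follow-up.
t1 = stats.ttest_ind(df.loc[df.group == 1, "change1"],
                     df.loc[df.group == 0, "change1"])
```

Each change score isolates pre-to-post change, so the group effect at each follow-up, and any group-by-time interaction among the change scores, carries the clear interpretation described above.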

In the following discussion, we focus on a comparison between difference score analysis/repeated measures ANOVA and analysis of covariance (ANCOVA) controlling for pre-intervention scores of the outcome. Difference score analysis and repeated measures ANOVA have often been displaced by ANCOVA due to a widely accepted critique of a bias inherent in difference score measures (142–145). We challenge this critique and question instead the validity of the ANCOVA approach.

Although the difference score is an unbiased estimate of true change and follows a simple compelling logic, it has been applied infrequently due to criticism regarding (a) the unreliability of difference scores, (b) the correlation between difference scores and pre-measures, and (c) the claim that difference scores do not account for both an imperfect relationship between pre- and post-measures and the likelihood of a regression to the mean (142, 143). We address each of these criticisms briefly.

First, difference scores are said to be unreliable because they sum the measurement errors of both pre- and post-measures. Instead, the application of ANCOVA has been recommended, using pre-intervention scores of the dependent variable as a covariate. The ANCOVA approach attempts to control for “error” in terms of pre-existing individual differences. The part of the variance that can be explained by pre-intervention scores is removed, and the remaining residuals are analyzed for treatment-induced behavior change. The residual scores result from linear regression and represent a combination of “true change” in the relative position within the distribution of scores and an unexplained “rest” (i.e., measurement error, which should occur at random).

Although this criticism is widely cited, there is little support for the claim that residuals contain less error than change scores: Predictions of the true scores are based on the correlation between pre- and post-measures, which is strongly affected by the error in both measures. That change scores are not necessarily less reliable than residuals has been demonstrated by Llabre, Spitzer, Saab, Ironson, and Schneiderman (146), who showed that difference scores may be of similar and even higher reliability than residuals. Llabre et al. (146) as well as Rogosa, Brandt, and Zimowski (147) argue that the (statistical) reliability of difference scores is not a trustworthy indicator of the upper limit of their validity (see also Malgady (149)). Imagine a simple pre-post intervention design that is 100% successful in reducing risk behavior, reducing the variance in the post-intervention scores to zero. In this case, the retest reliability of the scores (and thus the statistical reliability of the difference scores) will also be zero, although the pre- and post-intervention scores and their difference may be perfectly accurate.
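The thought experiment at the end of this paragraph is easy to make concrete; a minimal sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(6)

# A (hypothetical) intervention that is 100% successful: risk behavior
# varies at pretest and is eliminated entirely at posttest.
pre = rng.poisson(5.0, 50).astype(float)
post = np.zeros(50)
diff = post - pre

# The posttest has zero variance, so the pre-post (retest) correlation --
# and hence the statistical reliability of the difference score -- is zero
# or undefined, although every difference score is perfectly accurate.
post_variance = post.var()
```

Reliability here is a property of the score distribution, not of the accuracy of the individual measurements, which is the point made by Llabre et al. (146) and Rogosa et al. (147).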

A second criticism of the difference score approach involves the correlation between pre-measures and difference scores. Cohen and Cohen (142) claim that a good indicator of change would remove pre-existing inter-individual variance in the true scores and, thus, should be uncorrelated with the pre-measures (assuming that measurement errors occur at random). However, this claim seems unreasonable and may not be applicable to intervention research, in which the measurement of change is most crucial. On the contrary: Because we are interested in the reduction of a behavioral outcome variable that has an absolute limit of zero (no risk behavior), we can assume a correlation between pre-measures and difference scores; it seems illogical to expect that the possible range of risk behavior reduction could be the same for high-risk and low-risk individuals. Thus, a good indicator of change should be allowed to correlate with the pre-measure.

The third criticism is that difference scores do not adjust for an imperfect correlation in defining the unit of change. Cohen and Cohen claim that “the trouble with using the simple change score is that it presumes that the regression of *x* on *y* has a slope of 1.00 instead of the actual *B_{xy}*,” which may be a regression coefficient of .60 instead of 1.00 (142). This means that one unit of change on the pre-measure is assumed to correspond to one unit of change on the post-measure, regardless of the empirically observed regression slope.

In sum, the rejection of absolute change scores results from an invalid application of regression theory to the difference score approach. It seems inappropriate to reject difference scores on account of regression theory because (in contrast to residual change score analysis) they are not based on the regression paradigm and thus do not need to satisfy its assumptions.

Based on our arguments in defense of difference scores, one might question whether the analysis of residuals can be regarded as an appropriate method of analyzing behavior change. Maris (148) criticized the use of ANCOVA in RCT studies. He argues that covariance adjustment is necessary if group assignment is based on pre-intervention scores; however, in RCTs, covariance adjustment leads to biased estimation of intervention effects, whereas difference scores are an unbiased estimator of change.

We agree with Maris’ critique for the following reasons. First, residuals of baseline scores do not exactly reflect “true change” within the unexplained rest left after removing the “true score variance.” This is because the pre-post correlation of the scores cannot be interpreted as a stability coefficient, due to the change induced in the treatment group. Further, change in the treatment group may not follow the assumption of a linear increase or decrease made by the ANCOVA approach. Thus, the correlation between pre- and post-test is likely to produce incorrect estimates of true individual change.

Second, ANCOVA assumes the regression coefficients to be the same in the groups being compared (141). However, this assumption contradicts the treatment hypothesis in RCTs: Stability (and thus a strong correlation between pre- and post-scores) is expected in the control group only; the treatment group is expected to show greater behavior change, meaning that the correlation with the pre-scores will be reduced. Thus, for both theoretical and empirical reasons, ANCOVA and similar methods accounting for the correlation between pre- and post-measures cannot be recommended for the analysis of RCT outcomes.
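The equal-slopes assumption can be checked empirically by testing the group-by-pretest interaction; a minimal sketch with hypothetical data in which the treatment weakens the pre-post relationship, as the argument above predicts:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)

# Hypothetical RCT in which treatment weakens the pre-post relationship.
n = 200
group = np.repeat([0, 1], n // 2)
pre = rng.normal(10.0, 2.0, n)
post = np.where(group == 1, 0.2 * pre, 0.9 * pre) + rng.normal(0.0, 1.5, n)

df = pd.DataFrame({"group": group, "pre": pre, "post": post})

# The group-by-pretest interaction tests ANCOVA's equal-slopes assumption;
# a significant interaction indicates that the assumption is violated.
fit = smf.ols("post ~ pre * C(group)", data=df).fit()
interaction_p = fit.pvalues["pre:C(group)[T.1]"]
```

A significant interaction is precisely the pattern the treatment hypothesis predicts, illustrating why the homogeneity assumption of ANCOVA sits uneasily with RCT data.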

In sum, the analysis of difference scores or repeated measures ANOVA should be preferred to the traditional covariance approach; from our point of view, it is the more appropriate strategy for evaluating intervention success.

The generalization of linear models to non-normal cross-sectional and panel data offers a variety of options for the analysis of normal and non-normal data. In the following section, we introduce log-linear techniques before turning to Generalized Estimating Equations (GEE) and Mixed Models, or Hierarchical Linear Models (HLM), as further options for longitudinal research. Although increasingly used by biomedical, behavioral, and public health researchers, the two latter strategies are still rarely used in HIV intervention research compared to more traditional ANCOVA and MANCOVA analyses (see Table 1). For comparison with ANOVA, we again focus on the analysis of intervention effects when discussing GEE and HLM.^{viii}

If count data are extremely skewed, as is often the case, analyses of linear relationships with other variables will not be appropriate. Ordinary linear regression can be substituted by Poisson or negative binomial regression. Both negative binomial and Poisson distributions fit the characteristics of discrete behavioral data with a disproportionally high number of zero scores and a low number of extremely high scores (137, 152). The Poisson model is defined by the assumption that the conditional mean (i.e., the mean of y at a given score of x) is equal to the conditional variance (i.e., equidispersion). Although this assumption accounts for a correlation between observed score and measurement error, which is typical for most count distributions but violates the assumptions of most parametric tests, the Poisson model rarely fits count data distributions, which often have a conditional variance greater than the conditional mean. The negative binomial model allows for overdispersion and is therefore more appropriate than the Poisson model in most cases. Several statistical procedures allow a specification of the distribution characteristics and can be performed based on the Poisson or negative binomial model. Both the Generalized Estimating Equations (GEE) approach and Hierarchical Linear Modeling (HLM) offer regression analyses for panel data with non-normal distributions.

Further, models for zero-truncated count distributions can be applied when zero-counts do not occur because a case can enter the sample only after the occurrence of at least one event (e.g., the recruitment of sexually active persons only). The frequency of zero-counts can be estimated based on the variance of the zero-truncated distribution and the application of a particular model for complete distributions (Poisson, negative binomial). This might be useful in some contexts (e.g., if the probability of the target event or behavior in an unknown population is being evaluated). Zero-truncated and zero-inflated models are described in detail by Long (137).

In longitudinal research with more than two assessments, repeated measures of discrete, normal and non-normal outcome variables can be analyzed using GEE. GEE and HLM have important advantages that include the possibility (a) to model non-normal outcome variables, (b) to account for individual differences in behavior change, and (c) to model the variance-covariance structure of the longitudinal data. We discuss each of these advantages in turn.

GEE is a particularly useful tool for longitudinal group comparisons with non-normal outcomes and multiple post-intervention assessments, as is needed in HIV intervention research. GEE assumes that a known transformation of the marginal distribution of the outcome (e.g., log_{10} (x+1)) is a linear function of the covariates or predictors. In contrast to the common fixed-effects models (e.g., ANOVA), GEE estimates “population-averaged” models^{ix}, using an extension of the quasi-likelihood approach. Quasi-likelihood makes few assumptions about the distribution of the dependent variable and, for that reason, is applicable to a wide variety of non-normally distributed outcome variables (153). The only requirement involves the specification of the mean-covariance structure. GEE uses an iterative procedure for the development of an estimator whose error has a mean of zero and is asymptotically multivariate Gaussian. However, this requires that missing observations be “missing at random” (153, 154). In GEE, the data are modeled by specifying the appropriate distribution family for the dependent variable (e.g., Poisson, negative binomial). If the data are not normally distributed, GEE is likely to yield considerably more test-power compared to repeated measures ANOVA with normalized variables. However, it is important to ensure that the specified distribution family provides a good fit for the dependent variable. A Poisson model is quite restrictive in its assumptions and may not be appropriate for most count measures. Falsely specifying a Poisson distribution may produce misleading results and indicate significant effects that may not apply to the true distribution of the outcome. In general, the significance achieved with the specification of a particular distribution family or correlation structure is not a valid indicator of the appropriateness of the model. 
The applicability of a distribution needs to be tested using diagnostics available in standard statistical software (e.g., Stata).

Even when working with normally distributed outcome variables, GEE might be preferred over classical ANOVA models because GEE treats individual change as a random variable. The advantage of a mixed design with a random individualized change variable and a fixed treatment-group factor lies in potentially increased test power. Further, GEE analyses are flexible in that they allow specification of the within-group correlation structure for the panels.

Despite its flexibility, increased test power, and easy implementation, GEE analyses are rarely applied in HIV intervention research. An exception is the study reported by Otto-Salaj, Kelly, Stevenson, Hoffmann, and Kalichman (89). As a consequence, many published results may underestimate intervention effects. Because of the advantages discussed, we recommend GEE or equivalent methods for the analysis of non-normal count data in longitudinal research.

Mixed models, also called hierarchical linear models (HLM), specify both fixed and random effects. Fixed effects apply to a factor whose levels are fully represented in a study. Random effects refer to a factor whose levels are considered a random sample of potential levels only, instead of a full representation of its levels. In intervention studies, the treatment and the time of assessment represent fixed effect factors, and individuals represent the random effects factor. This model can be described as a randomized block design with fixed treatment effects and random individual (block) effects. Individuals are treated as experimental units that are grouped into blocks (in this case the repeated observations within individuals), to which the treatments are randomly assigned. The advantage of mixed models is the estimation of individual effects over time compared to the averaging procedures offered in ANOVA models.

Mixed models share three features with GEE: (a) the possibility to model the covariance structure of the repeated measures, (b) the possibility to model time as a regression variable and to specify different regression effects for time (e.g., quadratic), and (c) the possibility to apply repeated measures models to non-normal data by specifying the correct distribution family. In addition, mixed models can be used for longitudinal data sets with missing data, thus offering a convenient alternative to listwise or pairwise deletion or missing value substitution; this requires that data be missing at random.
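A mixed-model analysis of this kind can be sketched as follows. The sketch uses a linear (Gaussian) mixed model on a normalized outcome with a random intercept per participant; generalized mixed models for raw counts require other estimation routines, and the simulated data and effect sizes are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(10)

# Hypothetical trial with a normalized outcome, fixed treatment-by-time
# effects, and a random effect for each individual.
n, waves = 100, 3
pid = np.repeat(np.arange(n), waves)
time = np.tile(np.arange(waves), n)
group = np.repeat(rng.integers(0, 2, n), waves)
person = np.repeat(rng.normal(0.0, 1.0, n), waves)   # random individual effect
y = 2.0 - 0.5 * group * time + person + rng.normal(0.0, 0.5, n * waves)

df = pd.DataFrame({"y": y, "id": pid, "time": time, "group": group})

# Linear mixed model: fixed treatment-by-time effects,
# random intercept per participant ("id").
fit = smf.mixedlm("y ~ time * group", data=df, groups="id").fit()
```

Here the treatment and time factors are fixed effects and individuals form the random factor, matching the randomized-block description above.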

The crucial difference between GEE and mixed models is that the random factor in mixed models yields estimates at the individual level (that is, for each person), whereas the population-averaged method used by GEE provides estimates for the “average person.” In practice, however, both estimators tend to deliver similar results. For a more detailed discussion of differences between population-averaged models and mixed models, we refer to Neuhaus (155) and Neuhaus, Kalbfleisch, and Hauck (156).

Higher-order hierarchical linear models (HLM) are a special case of mixed models (151, 157) needed when the data have a hierarchical structure. In randomized block designs, for example, longitudinal data are doubly nested, with repeated measurements nested within individuals, who are nested within groups defined by blocks (classrooms, schools) or clusters. At the first level, HLM treats repeated measures as nested within individuals. At higher levels, HLM evaluates random group or cluster effects (e.g., of intervention sites), which in turn can be predicted by higher-level independent variables. HLM thus resolves the dilemma of either ignoring the hierarchical structure and performing the analyses at the individual level (with *n* = number of subjects) or aggregating individual-level data to the higher level and performing the analyses with these blocks or clusters only (with *n* = number of blocks). Because individuals in the same block can be assumed to be more similar to one another than to individuals in other blocks, independence of observations cannot be assumed, which rules out analyses at the individual level. However, performing the analyses at the higher level only means losing (a) information available at the individual level and (b) test power through the reduction of *n* to the number of blocks. HLM allows testing the effects of a treatment at the individual level without ignoring the effects of higher-order organizing structures. Littell et al. (158), Raudenbush (151, 157), and Raudenbush and Bryk (157) provide more detailed information about mixed models and HLM.

In sum, the application of GEE and mixed models is most appropriate and efficient for the analysis of count data when two or more post-intervention assessments are available, when differences in individual change suggest taking individual trajectories into account, and when the data assume a non-normal distribution that can be modeled in the framework of the GLM (e.g., Poisson or negative binomial regression). Higher-order, multilevel models are indicated when intra-group similarities, or the effects of environmental clusters (e.g., multiple sites), have to be taken into account.

Non-parametric analyses come into consideration when the data quantify a variable at an ordinal level only, or when the distribution of a variable violates the assumptions of linear parametric significance tests. For the latter case, a variety of solutions have been developed that overcome some of the limitations of non-parametric analyses. For that reason, these alternative methods may be preferred over non-parametric tests for the analysis of count data. However, if there is reason to believe that the data carry high measurement error and that the counts reported do not approximate interval or ratio level, an investigator may choose to apply non-parametric analyses.

In general, we do not recommend the use of non-parametric methods because of the loss of the valuable quantitative information yielded by count data. However, they are a possible solution for dealing with extremely skewed data that are suspected to offer no more than rank-order information. In this section, we briefly discuss two options: (a) analyzing untransformed count data at an ordinal level, and (b) analyzing counts that are reduced into ordered categories.

Although non-parametric tests usually have lower test power, this is not necessarily true when working with data that deviate extremely from a normal distribution. Spearman correlations, for example, can be higher and more significant than Pearson correlations when the variables differ strongly in their distributions. Similarly, a Wilcoxon or a Mann-Whitney U-test may yield higher test power than a t-test when comparing sub-samples or experimental groups if the outcome variables are extremely skewed (159). A variety of non-parametric tests have been developed that allow multiple regression and analyses of variance with non-normal data (160, 161). Path analyses and complex theoretical model tests can be performed with LISREL (162) or EQS (163) using polychoric, polyserial, or Spearman rank correlation matrices instead of Pearson correlation or covariance matrices.
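The point about rank-based correlations can be made concrete numerically. The sketch below computes a Spearman coefficient by hand (Pearson correlation of midranks) for a hypothetical, strongly skewed but monotone relation; the data-generating choices are illustrative only.

```python
import numpy as np

def ranks(x):
    """Average ranks (midranks for tied values)."""
    order = np.argsort(x)
    r = np.empty(len(x))
    r[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):          # replace ranks of ties by their mean
        mask = x == v
        r[mask] = r[mask].mean()
    return r

def pearson(x, y):
    x = x - x.mean(); y = y - y.mean()
    return (x @ y) / np.sqrt((x @ x) * (y @ y))

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

# Hypothetical skewed counts with a monotone but non-linear relation
rng = np.random.default_rng(1)
x = rng.poisson(2, 200).astype(float)
y = np.exp(x) + rng.normal(0, 1, 200)   # extreme values dominate y
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

Here the rank-based coefficient recovers the monotone association that the extreme scores obscure in the product-moment coefficient.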

One strategy to deal with non-normal count data and count derivatives is to reduce the distribution into ordered categories. Similar to the distinction between counts and relative measures of condom use, categorizations of counts need to be distinguished from categorizations of proportions. The categorization of counts results in Likert-type answer options that define count intervals, with the highest category covering all scores that exceed its lower limit (e.g., ≥ 25 times). Thresholds of count distributions can be determined, for example, by the cumulative proportion of cases expected at certain percentiles of the normal distribution. This strategy can usually be applied only to the upper part of the distribution, because of the accumulation of scores at the lower end and because of the discrete nature of the scores. The resulting categories may, for example, be defined as 0 (= never), 1 (= 1–2 times), 2 (= 3–5 times), 3 (= 6–10 times), 4 (= 11–25 times), and 5 (= more than 25 times).
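The categorization scheme just described amounts to a simple binning step; the cut-points below mirror the example categories in the text and would, of course, be study-specific.

```python
import numpy as np

# Upper bounds of the count intervals described in the text:
# category 0 (= never), 1 (= 1–2), 2 (= 3–5), 3 (= 6–10),
# 4 (= 11–25), 5 (= more than 25)
bounds = [0, 2, 5, 10, 25]

def categorize(counts):
    """Map raw counts onto the 0–5 ordered categories."""
    counts = np.asarray(counts)
    # searchsorted returns, for each count, the index of the first
    # interval bound that is >= the count; counts above 25 fall in 5
    return np.searchsorted(bounds, counts, side="left")

cats = categorize([0, 1, 2, 3, 5, 6, 10, 11, 25, 26, 100])
# -> [0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

Note that the resulting variable is ordinal only: the interval widths grow with the category, which is exactly the information loss discussed below.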

Count data reduced to categories deliver no more information than a “count” measure that uses a categorical response scale to assess risk behavior. Thus, in choosing to reduce data into ordered categories, the question arises: Why were count measures of unprotected intercourse collected in the first place? We recommend considering whether this strategy threatens the validity of test results in a particular study. For example, categorization (as well as truncation of counts) may affect the distributions of several sub-samples differently, making them appear more similar or divergent than indicated by the counts, which may render the results of subsequent statistical group comparisons questionable. Further, the reduction of counts into categorical data involves losing quantitative information about the cases that are likely to bear the highest risk.

In the following section, we briefly discuss logit models and structural equation modeling with ordinal or categorical outcome variables without further commenting on the utility of reducing counts into categories.

Logit models are the most commonly used and recommended ordinal regression models but have been applied rarely in HIV risk behavior research. Logit models allow interpretation of parameters in terms of average odds for the likelihood of scoring in one of two categories of the ordered distribution^{x}. In general, the ordered logit model expresses relationships between predictors and ordered categories of the dependent variable as “average discrete change” in the predicted probability for each comparison between two categories of the outcome for a unit change in the predictor, holding all other variables constant (137). A variety of logit models are available that enhance flexibility in analyzing ordinal outcomes (164). For example, log odds can be defined for cumulative probabilities, informing about the odds of a response at or below a particular category simultaneously for all possible transitions of the outcome (“cumulative odds”). Alternatively, the probabilities of a response in a particular category in comparison to the next higher, adjacent category can be defined simultaneously for all transitions of the outcome. However, the assumption of parallelism (or proportionality of odds), which requires that the effects of the predictor variables be invariant across the entire scale of the ordered outcome, has to be tested before using ordered logit models. (In contrast to ordered logit regression, the ordered probit model is most often inappropriate for the analysis of categorized counts, as well as for assessments by Likert scales, because it requires normally distributed errors.) For more detailed information about logit models see Agresti (165, 166) and O’Connell (164).
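To make the “cumulative odds” idea concrete, the fragment below computes the log odds of responding at or below each category for a hypothetical ordered outcome. Fitting a full proportional-odds model would additionally require testing the parallelism assumption noted above; the frequencies here are invented for illustration.

```python
import numpy as np

# Hypothetical frequencies for an ordered outcome with 4 categories
freq = np.array([40, 30, 20, 10])
p = freq / freq.sum()                    # category probabilities
cum = np.cumsum(p)[:-1]                  # P(Y <= k) for the 3 lower categories
cum_logodds = np.log(cum / (1 - cum))    # cumulative log odds per cut-point
```

Under the proportional-odds assumption, a predictor shifts all of these cut-point log odds by the same amount, which is precisely what the parallelism test examines.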

SEM with non-normal data involves a number of problems including increased Chi-Square values, underestimation of fit indices, and, most importantly, a severe underestimation of standard errors of the parameter estimates (167). Coarse categorization of counts and proportions as well as Likert-type assessments of relative condom use may offer a solution in approximating a normal distribution. However, alternative methods and estimation techniques tailored to ordinal and non-normal data should be preferred, in particular if categorizations into ordinal outcome variables do not lead to the desired normal distribution and thus do not eliminate the problem of underestimated errors.

Muthen (168, 169) has developed a method for latent variable mixture modeling that can handle any combination of categorical, ordinal, and continuous variables. In contrast to linear models, the estimator developed by Muthen and implemented in MPLUS delivers unbiased, consistent, and efficient parameter estimates for categorical and ordinal outcomes. This method also allows modeling of growth curves that involve categorical and ordinal variables (growth mixture modeling), which can be understood as a multiple-group analysis, similar to simultaneous group comparisons, except that group membership is unobserved and has to be elicited from the data (169). A particularly interesting application of growth mixture modeling is “complier average causal effect estimation,” in which compliance is treated as a latent class (169, 170). Class membership (compliance) is observed in the treatment group only, whereas potential compliers in the control group have to be identified by an estimation procedure in order to allow comparisons with actual compliers.

Several “distribution-free” analytical methods have been developed specifically for interval- and ratio-level data that call for parametric analyses but do not meet the distributional assumptions required for their application. Similar to non-parametric analyses, these methods are distribution-free in the sense that they do not make any assumptions regarding the distribution of the outcomes. However, unlike truly non-parametric analyses, distribution-free estimators can be used to deliver parametric results. We briefly discuss two methods that may be applied to non-normal count data: (a) asymptotically distribution-free (ADF) estimation of effects, and (b) bootstrapping.

An option for the analysis of extremely skewed data is ADF estimation combined with parametric or non-parametric tests. For example, structural equation modeling can be performed using asymptotic covariance matrices of non-normal bivariate distributions instead of analyzing polychoric and polyserial (non-parametric) correlations (162, 171). Model fit estimation with maximum likelihood (ML), generalized least squares (GLS), or unweighted least squares (ULS) methods requires multivariate normality, an assumption that cannot be met if a variable is extremely skewed. Thus, ADF estimation in combination with the weighted least squares (WLS) method may be a solution for skewed count data if the sample size is sufficiently large. The derivation of asymptotic covariance matrices from raw data requires samples of at least 200 to 500 participants even for simple models, and may require several thousand for complex models, depending on the number of variables involved (172). However, EQS offers an ADF statistic, the Yuan-Bentler Corrected AGLS Chi-Square, which can be applied to smaller samples (167, 173).

Further, in testing group differences or correlations by non-parametric techniques, exact test options are offered for diverse statistical procedures by many statistical packages. These tests calculate exact significance levels in situations where standard asymptotic results would be unreliable, for example when the sample size is too small or otherwise violates the requirements of traditional significance tests or standard asymptotic analyses (174). The ADF method will lead to incorrect estimates of significance if samples or sub-samples are small, or if variables with a high proportion of zero counts are analyzed. The exact test option offers valid p-values for this kind of data (however, see the discussion of permutation tests).

Bootstrapping allows the application of parametric statistics that otherwise would not be appropriate with non-normal data. The most important feature of the bootstrap might be the estimation of standard errors without the need to meet specific assumptions regarding the distribution of the scores. The bootstrap substitutes for statistical inference based on sampling theory. Instead of inferring from sample data to population parameters and standard errors, sample data are used to “simulate” the sampling distribution. The desired statistic is computed many times by repeatedly drawing random samples of size *N* from the available data, with replacement. The resulting distribution of the sample statistics is used to provide information about the limit scores that determine the confidence interval of any desired size (usually 95%). A high number of repetitions ensures that the results are asymptotically correct.

In principle, the bootstrap can be used to estimate confidence intervals for any parametric test statistic that otherwise would require uni- or multivariate normal distributions and homoscedasticity (175, 176), and would be likely to deliver biased results with non-normal data (176–179). As a simple example, testing a mean or mean difference by *t*-tests is inappropriate with skewed variables. Bootstrapping solves this problem by simulating the sampling distributions of the means and providing model-free, distribution-based confidence intervals for the mean of each group, taking the given distribution characteristics (e.g., skewness) into account. Non-overlapping confidence intervals of the two means at the 95% level (or the 90% level in one-sided testing) indicate a significant difference. Similarly, the bootstrap can be used to test between-group or pre-post difference scores, correlation and regression coefficients, and ANOVA models without the need to rely on a pre-specified distribution model. The substitution of the bootstrap for parametric significance testing with non-normal data is justified because: (a) confidence intervals can be derived without any inference from and reference to normal theory; (b) confidence intervals derived with bootstrap re-sampling are adjusted to the skewness of the data; and (c) according to sampling theory, the sampling distribution (whether inferred or derived by an appropriate number of re-samplings) should approximate a normal distribution even for extremely skewed data with *N* ≥ 100 (thus allowing the application of parametric significance tests).
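The two-group mean comparison described above can be sketched as a percentile bootstrap. The group sizes, the negative binomial data-generating choices, and the 95% level below are illustrative assumptions, not prescriptions.

```python
import numpy as np

def bootstrap_ci(x, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean:
    resample x with replacement many times and take the middle
    `level` mass of the resulting distribution of sample means."""
    rng = np.random.default_rng(seed)
    means = rng.choice(x, size=(n_boot, len(x)), replace=True).mean(axis=1)
    alpha = (1 - level) / 2
    return np.quantile(means, [alpha, 1 - alpha])

# Hypothetical skewed counts for control and treatment groups
rng = np.random.default_rng(42)
control = rng.negative_binomial(1, 0.2, 150)    # population mean ~ 4
treated = rng.negative_binomial(1, 0.4, 150)    # population mean ~ 1.5
lo_c, hi_c = bootstrap_ci(control)
lo_t, hi_t = bootstrap_ci(treated)
# non-overlapping intervals suggest a significant group difference
```

Because the intervals are read off the empirical resampling distribution, they are automatically asymmetric when the underlying counts are skewed.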

In SEM applications, AMOS can be used to estimate structural paths with bootstrap standard errors and confidence intervals. The Bollen-Stine bootstrap estimator implemented in AMOS optimizes the conditions for unbiased estimates of global model fit under multivariate non-normality, even with smaller sample sizes (180).

The bootstrap has the disadvantage that confidence intervals for estimated parameters of extremely skewed data are usually large. As with parametric significance tests, bootstrap results may turn out to be unduly conservative (i.e., have less test power) than conventional non-parametric tests when applied to skewed count data, and may thereby mask systematic effects of a predictor. Also, the bootstrap cannot ensure generalization of the results. As with other significance tests, bootstrap estimates are subject to sampling error and cannot resolve the problem of bias that may be due to the impact of some extreme, unrepresentative outliers or to systematic drop-out. However, the use of the bootstrap for count distributions that have been transformed to approximate a normal distribution provides an alternative to the application of standard linear parametric tests if the transformed scores, as is often the case with count data, still deviate strongly from normality.

Permutation tests, also called randomization, re-randomization, or exact tests, provide exact significance levels and are an almost distribution-free alternative to parametric tests. Originally developed in the 1930s, before the advent of computer technology, permutation tests are computer-intensive statistical techniques, applied primarily to the analysis of very small samples (175, 181). Permutation tests have much in common with the bootstrap technique. As with the bootstrap, the desired test statistic is computed many times with varying drawings from the sample. However, unlike the bootstrap, permutation tests permute systematically through all possible combinations of the sample observations, thereby leading to an exhaustive distribution of all possible test results and their respective probabilities. Based on this exhaustive likelihood distribution of all possible test scores, exact significance levels can be derived. Thus, the bootstrap can be regarded as an approximation to a permutation test: Whereas the bootstrap leads to asymptotically exact results, the permutation test provides exact test statistics (181).
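An exhaustive two-sample permutation test of a mean difference can be sketched as follows: for small groups, every possible reassignment of the pooled observations to the two groups is enumerated, and the exact p-value is the share of permuted differences at least as large as the observed one. The two tiny groups below are invented for illustration.

```python
from itertools import combinations

def permutation_test(a, b):
    """Exact two-sided permutation test for a difference in means."""
    pooled = list(a) + list(b)
    n = len(a)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    count = total = 0
    # enumerate every way of assigning n of the pooled scores to group A
    for idx in combinations(range(len(pooled)), n):
        grp_a = [pooled[i] for i in idx]
        grp_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        diff = abs(sum(grp_a) / len(grp_a) - sum(grp_b) / len(grp_b))
        total += 1
        if diff >= observed - 1e-12:   # tolerance for float comparison
            count += 1
    return count / total               # exact two-sided p-value

p = permutation_test([0, 1, 1, 2], [5, 7, 9, 11])   # 70 reassignments
```

The enumeration grows combinatorially (here C(8, 4) = 70 reassignments), which is why exhaustive permutation has traditionally been limited to small samples.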

Permutation tests are extremely versatile and almost universally applicable because they can accommodate any kind of data, providing exact test results at the categorical, rank, interval, or ratio-scale level. In fact, common non-parametric statistics are usually realizations of permutation tests applied to categorical or ordinal data (181). For the analysis of interval- or ratio-level data, permutation tests are as powerful as their parametric counterparts. Their major disadvantage is the computational capacity required to perform exhaustive permutation with anything other than small samples. However, with the rapidly increasing capabilities of desktop computers, this disadvantage will soon be overcome. Thus, permutation tests are likely to gain popularity as a substitute for parametric analyses and bootstrap estimation in the future, even with large samples.

Permutation tests are not distribution-free, but very few assumptions have to be met for their application. For single-sample statistics, the distribution needs to be symmetric (181). Thus, in sexual risk behavior research, permutation tests may be especially helpful in analyzing pre-post difference scores or proportions of condom use derived from counts, which may assume a symmetric distribution but fail to approximate a normal distribution. For multiple samples, permutation tests lead to exact test results only under the assumption that observations are exchangeable, transformably exchangeable, or asymptotically exchangeable under the null hypothesis. Observations are exchangeable if they are independent and identically distributed, or if they are jointly normal with identical covariances (181, 182). Exchangeability is given if the joint distribution of the observations is invariant under permutations of the units.

Although these assumptions are markedly relaxed compared to the requirements of parametric tests, they may render the application of permutation tests to strongly skewed count data problematic. In the presence of single extreme outliers, permuted sample drawings may not lead to invariant distributions. Hayes (187), for example, has shown that the permutation test, although providing “statistically exact” results for the sample at hand, does not necessarily outperform parametric tests in drawing inferences about the population if distributions deviate strongly from their assumptions. For example, under conditions of extreme marginal non-normality, asymmetric marginal distributions, heteroscedasticity, or non-independent observations, the permutation test was as likely as its parametric counterpart to exceed the nominal alpha limits and falsely reject the null hypothesis.

Thus, although they provide the exact likelihood of sample results at the data level, permutation tests may not lead to better decisions regarding the validity of study hypotheses in the population. For example, in RCTs with an uneven distribution of extreme outliers across treatment and control groups (heteroscedasticity), the permutation test may lead to results as biased as parametric solutions, although homoscedasticity is not explicitly a requirement of this test (187). Further, as Hayes (187) has pointed out, exhaustive permutation of all observations may not apply to real-world scenarios, in which certain combinations of scores (such as a combination of anal and vaginal intercourse in a homosexual man) may never occur and thus need not be taken into consideration in decisions regarding the likelihood of the study results. In this case, exchangeability is violated, and the permutation test is likely to lead to biased results.

In sum, sexual behavior frequency data are likely to violate even the liberal assumptions of permutation tests. The statistically exact solutions provided by the permutation test may still remain of greatest advantage if samples are very small. Otherwise, for extremely non-normally distributed count data, parametric tests accommodating Poisson or negative binomial models may still be most appropriate.

In this article, we discussed count measures of sexual risk behavior and compared them to relative frequency measures of condom use. Further, we discussed a variety of data analytical options for the analysis of counts. Our discussion and review of the literature leads to the following conclusions:

First, count data and measures of relative condom use yield different information. Count data provide a more precise indicator of HIV contraction risk (and risk reduction), as needed for the evaluation of HIV risk reduction interventions and the calculation of cost-effectiveness of HIV prevention programs. Measures of relative condom use inform about the conditional likelihood of condom use (that is, the likelihood of safer sex *if* a person engaged in sexual intercourse) and are preferred in correlational studies and model-testing research. However, they do not yield sufficient information about the absolute frequency of unprotected intercourse. This conceptual difference is reflected by empirical findings showing a wide range of correlations between counts and relative frequency measures. With regard to their general utility, count data appear to be more versatile because they can be transformed into proportions and categorical data if needed. Because count data yield both absolute and relative frequency measures, they may be useful for both the quantification of HIV risk reduction in HIV intervention research as well as for testing theoretical models.

Second, because of the conceptual difference and lack of consistent empirical overlap between count and relative condom use measures, model-testing results obtained with the former may not be generalized to count measures of sexual risk behavior. Thus, although relative frequency measures of condom use may be well-suited to test theoretical models of health behavior, future model testing research should also attempt to predict absolute frequencies of unprotected intercourse with cognitive-motivational variables. Results obtained with counts would inform about the applicability of theoretical models to the outcome that may be of greatest interest in intervention evaluation and public health.

Third, count measures have several disadvantages that require sophisticated data preparation and analytical methods. Collecting and analyzing count data can be more time-consuming and expensive than using ratings or dichotomous measures, in particular if event-level data are collected that require the recall of each individual sexual occasion over a specified time period. Further, count data are more challenging to analyze due to their distributional properties (i.e., strong deviations from normality). A variety of methods have been developed that are well-suited for the analysis of non-normal count data, including data transformations in combination with ordinary parametric significance tests, log-linear regression, GEE, mixed models, ADF estimation, permutation tests and bootstrapping, non-parametric tests, and the analysis of ordered categories. In discussing these data analytical options, we hope to encourage more flexibility in choosing the specific analytic strategy that is most appropriate for the data to be analyzed rather than applying the most common and best-known approach.

In randomized controlled trials testing the effects of HIV interventions on risk behavior reduction, GEE as well as mixed models (HLM) are promising alternatives to repeated measures ANOVA or change score analysis. Both GEE and HLM take individual trajectories into account, allow specification of the appropriate distribution family, and offer flexibility in modeling the variance-covariance structure. In most cases, they may increase test power compared to traditional linear parametric techniques. These advantages of GEE and HLM apply if multiple post-intervention assessments are available. If only a single post-intervention assessment is used to evaluate treatment effects, and if the data approximate a normal distribution (or can be transformed accordingly), ANOVA models may be most suitable. However, we discourage the use of ANCOVA for the analysis of RCTs. ANCOVA cannot resolve the problems connected with pre-existing differences between treatment groups, and the analysis of residual change does not guarantee unbiased estimates of intervention effects, as is still widely believed.

Fourth, we discourage methods that involve a loss of information, such as reducing count measures to ordinal or categorical data. Categorization of counts discards the quantitative information about HIV contraction risk yielded by count measures. Another concern emerges from the possibility that categorizations and truncations of counts may affect the distributions in diverse sub-populations differentially, which may render the results of subsequent statistical group comparisons questionable. Further, the reduction of counts into categorical data loses quantitative information about precisely the cases that are likely to bear the highest risk.

We conclude by acknowledging the complexity of data preparation and analysis with count data. Because of this complexity and the many data analytical strategies available, it is imperative that authors communicate their strategy clearly. Readers of journal articles need to be able to understand authors' decisions and their rationale. This need will become more important in the future. The growing number of statistical options and the development of new procedures reduce the likelihood that analytical methods can be regarded as “standard knowledge.” The complexity of count data analysis renders clear communication of the methods employed an essential step of a high-quality research project.

All authors are with the Center for Health and Behavior, Syracuse University. Correspondence and requests for reprints to Michael P. Carey, Center for Health and Behavior, 430 Huntington Hall, Syracuse University, Syracuse, NY 13244-2340; email: mpcarey@syr.edu. This work was supported by grants from the National Institute of Mental Health to Michael P. Carey (# R01-MH54929 and K02-MH01582). The authors thank Kate B. Carey, Martin J. Sliwinski, Dan Neal, and the anonymous reviewers for their comments on earlier versions of this paper.

^{i}In a companion paper, we address other assessment questions, including what sources of error affect the accuracy of sexual behavior self-reports, to what degree we can rely on retrospective self-reports, and how the accuracy of the data is affected by the population, assessment instruments, the assessment interval, and computerized methods (3).

^{ii}The present paper focuses on conceptual differences between absolute and relative frequency data. The validity of sexual frequency self-reports and the influence of moderating factors (e.g., self-reporting intervals, and assessment modes) are discussed in the companion paper (3). Similarly, the current paper does not comment on the effects of item construction or question sequence on cognitive processing of item content or the use of different recall heuristics. For discussion of these matters, readers are referred to the work of Schwarz (183, 184, 186) and Rothman et al. (185).

^{iii}An option not addressed in this paper is the dichotomous assessment of condom use at last intercourse. This measure can neither be categorized as a count nor as a relative frequency measure of condom use because it refers to a single event only. Although assessment of last intercourse is likely to be very accurate, it may not be representative of a person’s sexual behavior and it provides no information regarding the cumulative risk of multiple events.

^{iv}The Box Cox power transformation may also be used to transform both predictors and outcomes in a regression model simultaneously, or to perform regressions based on the transformation of predictor variables only.

^{v}Stata also offers the “ladder” command, which indicates the alternative with the strongest approximation to normality among a number of possible transformations.

^{vi}We recommend this strategy also in preparation of non-linear parametric analyses, as this allows applying a statistically defined limit score – based on normal theory – to the retranslated, non-normal outcome.

^{vii}One of the most compelling arguments for using difference scores in the analysis of count data may be that they are more likely to approximate a normal distribution. Difference scores eliminate (or at least reduce) the problem of extreme skewness inherent in count measures of sexual risk behavior, although they do not necessarily improve the over-sized kurtosis and dispersion of the data. Transforming count data prior to the computation of difference scores cannot eliminate this problem but can improve the distribution.

^{viii}Latent curve analysis via Structural Equation Modeling (SEM) is a further option to be mentioned in the framework of the GLM (150, 151). However, this approach is unusual for the analysis of intervention effects on behavioral count data and seems inappropriate for absolute frequencies of manifest behavior in specified time intervals before and after treatment.

^{ix}An effect is random if the levels of the factor are considered to be a random sample from a larger population of potential levels.

^{x}Logistic and ordered logistic regression are non-parametric in the sense that they do not rely on Gaussian errors. However, they do rely on an underlying assumption of some distribution (Bernoulli); consequently, these procedures may, by some people's definition, be considered a parametric test.

1. di Mauro D. Sexuality research in the United States: An assessment of the social and behavioral sciences. New York: Social Science Research Council; 1995.

2. Ostrow DG, Kalichman SC. Methodological issues in HIV behavioral interventions. In: Peterson JL, DiClemente RJ, editors. Handbook of HIV prevention. New York: Kluwer Academic / Plenum; 2001. pp. 67–80.

3. Schroder KEE, Carey MP, Vanable PA. Methodological challenges in research on sexual risk behavior: II. Accuracy of self-reports. Annals of Behavioral Medicine. 2003 [PMC free article] [PubMed]

4. Jemmott JB, 3rd, Jemmott LS. HIV risk reduction behavioral interventions with heterosexual adolescents. AIDS. 2000;14 Supp. 2:S40–S52. [PubMed]

5. Abraham CS, Sheeran P, Abrams D, Spears SR. Health beliefs and teenage condom use: A prospective study. Psychology and Health. 1996;11:641–655. [PubMed]

6. Agnew CR, Loving TJ. Future time orientation and condom use attitudes, intentions, and behavior. Journal of Social Behavior and Personality. 1999;13:755–764.

7. Albarracin D, Fishbein M, Middlestadt SE. Generalizing behavioral findings across times, samples, and measures: A study of condom use. Journal of Applied Social Psychology. 1998;28:657–674.

8. Albarracin D, Ho RM, McNatt PS, et al. Structure of outcome beliefs in condom use. Health Psychology. 2000;19:458–468. [PubMed]

9. Artz L, Malacuso M, Brill I, et al. Effectiveness of an intervention promoting the female condom to patients at sexually transmitted disease clinics. American Journal of Public Health. 2000;90:237–244. [PubMed]

10. Baker SA, Morrison DM, Gillmore MR, Schock MD. Sexual behaviors, substance use, and condom use in a sexually transmitted disease clinic sample. The Journal of Sex Research. 1995;32:37–44.

11. Barrett DC, Bolan G, Joy D, et al. Coping strategies, substance use, sexual activity, and HIV sexual risks in a sample of gay male STD patients. Journal of Applied Social Psychology. 1995;25:1058–1072.

12. Belcher L, Kalichman S, Topping M, et al. A randomized trial of a brief HIV risk reduction counseling intervention for women. Journal of Consulting and Clinical Psychology. 1998;66:856–861. [PubMed]

13. Breakwell GM. Risk estimation and sexual behavior. Journal of Health Psychology. 1996;1:79–91. [PubMed]

14. Bryan AD, Aiken LS, West SG. Increasing condom use: Evaluation of a theory-based intervention to prevent sexually transmitted diseases in young women. Health Psychology. 1996;15:371–382. [PubMed]

15. Bryan AD, Aiken LS, West SG. Young women's condom use: the influence of acceptance of sexuality, control over the sexual encounter, and perceived susceptibility to common STDs. Health Psychology. 1997;16:468–479. [PubMed]

16. Bryan AD, Fisher JD, Fisher WA, Murray DM. Understanding condom use among heroin addicts in methadone maintenance using the information-motivation-behavioral skills model. Substance Use and Misuse. 2000;35:451–471. [PubMed]

17. Bryan AD, Fisher JD, Fisher WA. Tests of the mediational role of preparatory safer sexual behavior in the context of the Theory of Planned Behavior. Health Psychology. 2002;21:71–80. [PubMed]

18. Carey MP, Maisto SA, Kalichman SC, et al. Enhancing motivation to reduce the risk of HIV infection for economically disadvantaged urban women. Journal of Consulting and Clinical Psychology. 1997;65:531–541. [PMC free article] [PubMed]

19. Carey MP, Braaten LS, Maisto SA, et al. Using information, motivational enhancement, and skills training to reduce the risk of HIV infection for low-income urban women: a second randomized clinical trial. Health Psychology. 2000;19:3–11. [PubMed]

20. Carey MP, Carey KB, Maisto SA, Gordon CM, Weinhardt LS. Assessing sexual risk behavior with the Timeline Followback (TLFB) approach: Continued development and psychometric evaluation with psychiatric outpatients. International Journal of STD and AIDS. 2001;12:365–375. [PMC free article] [PubMed]

21. Carey MP, Carey KB, Maisto SA, et al. Reducing HIV risk behavior among adults with a severe and persistent mental illness: A randomized controlled trial. Syracuse, NY: Syracuse University; 2002. Unpublished manuscript.

22. Catania JA, Binson D, Dolcini MM, et al. Risk factors for HIV and other sexually transmitted diseases and prevention practices among US heterosexual adults: changes from 1990 to 1992. American Journal of Public Health. 1995;85:1492–1499. [PubMed]

23. Catania JA, Stone V, Binson D, Dolcini MM. Changes in condom use among heterosexuals in Wave 3 of the AMEN Survey. Journal of Sex Research. 1995;32:193–200.

24. Catania JA, Binson D, Stone V. Relationship of sexual mixing across age and ethnic groups to herpes simplex virus-2 among unmarried heterosexual adults with multiple sexual partners. Health Psychology. 1996;15:362–370. [PubMed]

25. Celentano DD, Bond KC, Lyles CM, et al. Preventive intervention to reduce sexually transmitted infections: A field trial in the Royal Thai Army. Archives of Internal Medicine. 2000;160:525–540. [PubMed]

26. Connor M, Graham S, Moore B. Alcohol and intentions to use condoms: Applying the Theory of Planned Behavior. Psychology and Health. 1999;14:795–812.

27. Cooper ML, Agocha VB, Powers AM. Motivations for condom use: Do pregnancy prevention goals undermine disease prevention among heterosexual young adults? Health Psychology. 1999;18:464–474. [PubMed]

28. Cooper ML, Orcutt HK. Alcohol use, condom use and partner type among heterosexual adolescents and young adults. Journal of Studies on Alcohol. 2000;61:413–419. [PubMed]

29. Coxon APM. Parallel accounts? Discrepancies between self-report (diary) and recall (questionnaire) measures of the same sexual behaviour. AIDS Care. 1999;11:221–234. [PubMed]

30. Crosby GM, Stall RD, Paul JP, Barrett DC, Midanik LT. Condom use among gay/bisexual male substance abusers using the Timeline Follow-back method. Addictive Behaviors. 1996;21:249–257. [PubMed]

31. Des Jarlais DC, Perlis T, Friedman SR, et al. Behavioral risk reduction in a declining HIV epidemic: Injection drug users in New York City, 1990–1997. American Journal of Public Health. 2000;90:1112–1116. [PubMed]

32. de Visser RO, Smith AMA. Inconsistent users of condoms: A challenge to traditional models of health behavior. Psychology, Health & Medicine. 2001;6:41–46.

33. de Vroome EMM, Stroebe W, Sandfort TGM, de Wit JBF. Safer sex in social context: Individualistic and relational determinants of AIDS-preventive behavior among gay men. Journal of Applied Social Psychology. 2000;30:2322–2340.

34. Diaz-Loving R, Villagran-Vazquez G. The Theory of Reasoned Action applied to condom use and request of condom use in Mexican government workers. Applied Psychology: An International Review. 1999;48:139–151.

35. DiClemente RJ, Wingood GM. A randomized controlled trial of an HIV sexual risk-reduction intervention for young African-American women. Journal of the American Medical Association. 1995;274:1271–1276. [PubMed]

36. Downey L, Ryan R, Roffman R, Kulich M. How could I forget? Inaccurate memories of sexually intimate moments. Journal of Sex Research. 1995;32:177–191.

37. Durant LE, Carey MP. Self-administered questionnaires versus face-to-face interviews in assessing sexual behavior in young women. Archives of Sexual Behavior. 2000;29:309–322. [PubMed]

38. Eldridge GD, St. Lawrence JS, Little CE, et al. Evaluation of an HIV risk reduction intervention for women entering inpatient substance abuse treatment. AIDS Education and Prevention. 1997;9 Supplement A:62–76. [PubMed]

39. Elkins D, Maticka-Tyndale E, Thicumporn K, Miller P, Haswell-Elkins M. Toward reducing the spread of HIV in Northeastern Thai villages: Evaluation of a village-based intervention. AIDS Education and Prevention. 1997;9:49–69. [PubMed]

40. Fernandez-Esquer ME, Krepcho MA, Freeman AC, et al. Predictors of condom use among African American males at high risk for HIV. Journal of Applied Social Psychology. 1997;27:58–74.

41. Fishbein M, Trafimow D, Middlestadt SE, et al. Using an AIDS KABP survey to identify determinants of condom use among sexually active adults from St. Vincent and The Grenadines. Journal of Applied Social Psychology. 1995;25:455–474.

42. Fishbein M, Guenther-Grey C, Johnson WD, et al. Using a theory-based community intervention to reduce AIDS risk behaviors: The CDC's AIDS Community Demonstration Projects. In: Oskamp S, Thompson SC, et al., editors. Understanding and preventing HIV risk behavior. Safer sex and drug use. Thousand Oaks, CA: Sage; 1996. pp. 177–206.

43. Fisher JD, Fisher WA, Misovich SJ, Kimble DL, Malloy TE. Changing AIDS risk behavior: Effects of an intervention emphasizing AIDS risk reduction information, motivation, and behavioral skills in a college student population. Health Psychology. 1996;15:114–123. [PubMed]

44. Fisher JD, Fisher WA. The Information-Motivation-Behavioral Skills Model of AIDS risk behavior change: Empirical support and application. In: Oskamp S, Thompson SC, editors. Understanding and preventing HIV risk behavior. Thousand Oaks: Sage; 1996. pp. 100–127.

45. Fisher JD, Fisher WA, Bryan AD, Misovich SJ. Information-Motivation-Behavioral skills model-based HIV risk behavior change intervention for inner-city high school youth. Health Psychology. 2002;21:177–186. [PubMed]

46. Ford K, Norris AE. Factors related to condom use with casual partners among urban African-American and Hispanic males. AIDS Education and Prevention. 1995;7:494–503. [PubMed]

47. Gerbert B, Bronstone A, McPhee S, Pantilat S, Allerton M. Development and testing of an HIV-risk screening instrument for use in health care settings. American Journal of Preventive Medicine. 1998;15:103–113. [PubMed]

48. Gillmore MR, Morrison DM, Richey CA, et al. Effects of a skill-based intervention to encourage condom use among high risk heterosexually active adolescents. AIDS Education and Prevention. 1997;9 Suppl. A:22–43. [PubMed]

49. Hobfoll SE, Jackson AP, Lavin J, Schroder KEE. The effects and generalizability of communally-oriented HIV/AIDS prevention versus general health promotion groups for single, inner-city women in urban clinics. Journal of Consulting and Clinical Psychology. In press. [PubMed]

50. Jaccard J, McDonald R, Wan CK. The accuracy of self-reports of condom use and sexual behavior. Journal of Applied Social Psychology. In press.

51. Jemmott JB, 3rd, Jemmott LS, Fong GT. Abstinence and safer sex HIV risk-reduction interventions for African American adolescents: A randomized controlled trial. Journal of the American Medical Association. 1998;279:1529–1536. [PubMed]

52. Jemmott JB, 3rd, Jemmott LS, Fong GT, McCaffree K. Reducing HIV risk-associated behavior among African American adolescents: Testing the generality of intervention effects. American Journal of Community Psychology. 1999;27:161–187. [PubMed]

53. Kalichman SC, Sikkema KJ, Kelly JA, Bulto M. Use of a brief behavioral skills intervention to prevent HIV infection among chronic mentally ill adults. Psychiatric Services. 1995;46:275–280. [PubMed]

54. Kalichman SC, Rompa D, Coley B. Experimental component analysis of a behavioral HIV-AIDS prevention intervention for inner-city women. Journal of Consulting and Clinical Psychology. 1996;64:687–693. [PubMed]

55. Kalichman SC, Rompa D, Coley B. Lack of positive outcomes from a cognitive-behavioral HIV and AIDS prevention intervention for inner-city men: lessons from a controlled pilot study. AIDS Education and Prevention. 1997;9:299–313. [PubMed]

56. Kalichman SC, Kelly JA, Rompa D. Continued high-risk sex among HIV seropositive gay and bisexual men seeking HIV prevention services. Health Psychology. 1997;16:369–373. [PubMed]

57. Kalichman SC, Kelly JA, Stevenson LY. Priming effects of HIV risk assessment on related perceptions and behavior: An experimental field study. Health Psychology. 1997;16:369–373. [PubMed]

58. Kalichman SC, Roffman RA, Picciano JF, Bolan M. Risk for HIV infection among bisexual men seeking HIV-prevention services and risks posed to their female partners. Health Psychology. 1998;17:320–327. [PubMed]

59. Kalichman SC, Nachimson D, Cherry C, Williams E. AIDS treatment advances and behavioral prevention setbacks: Preliminary assessment of reduced perceived threat of HIV-AIDS. Health Psychology. 1998;17:546–550. [PubMed]

60. Kalichman SC. Psychological and social correlates of high-risk sexual behaviour among men and women living with HIV/AIDS. AIDS Care. 1999;11:415–427. [PubMed]

61. Kalichman SC, Cherry C, Browne-Sperling F. Effectiveness of a video-based motivational skills-building HIV risk- reduction intervention for inner-city African American men. Journal of Consulting and Clinical Psychology. 1999;67:959–966. [PubMed]

62. Kalichman SC, Williams E, Nachimson D. Brief behavioural skills building intervention for female controlled methods of STD-HIV prevention: outcomes of a randomized clinical field trial. International Journal of STD and AIDS. 1999;10:174–181. [PubMed]

63. Kalichman SC, Rompa D, Cage M, et al. Effectiveness of an intervention to reduce HIV transmission risks in HIV-positive people. American Journal of Preventive Medicine. 2001;21:84–92. [PubMed]

64. Kamb ML, Fishbein M, Douglas JM, et al. Efficacy of risk-reduction counseling to prevent human immunodeficiency virus and sexually transmitted diseases. A randomized controlled trial. Journal of the American Medical Association. 1998;280:1161–1167. [PubMed]

65. Kasprzyk D, Montano DE, Fishbein M. Application of an integrated behavioral model to predict condom use: A prospective study among high HIV risk groups. Journal of Applied Social Psychology. 1998;28:1557–1583.

66. Kegeles SM, Hays RB, Coates TJ. The Mpowerment Project: A community-level HIV prevention intervention for young gay men. American Journal of Public Health. 1996;86:1129–1136. [PubMed]

67. Kelley JL, Petry NM. HIV risk behaviors in male substance abusers with and without antisocial personality disorder. Journal of Substance Abuse and Treatment. 2000;19:59–66. [PubMed]

68. Kelly JA, Murphy DA, Sikkema KJ, et al. Predictors of high and low levels of HIV risk behavior among adults with chronic mental illness. Psychiatric Services. 1995;46:813–818. [PubMed]

69. Kelly JA, McAuliffe TL, Sikkema KJ, et al. Reduction in risk behavior among adults with severe mental illness who learned to advocate for HIV prevention. Psychiatric Services. 1997;48:1283–1288. [PubMed]

70. Kelly JA, Murphy DA, Sikkema KJ, et al. Randomised, controlled, community-level HIV-prevention intervention for sexual-risk behaviour among homosexual men in US cities. Community HIV Prevention Research Collaborative. Lancet. 1997;350:1500–1505. [PubMed]

71. Kelly JA, Kalichman SC. Reinforcement value of unsafe sex as a predictor of condom use and continued HIV/AIDS risk behavior among gay and bisexual men. Health Psychology. 1998;17:328–335. [PubMed]

72. Kwiatkowski CF, Stober DR, Booth RE, Zhang Y. Predictors of increased condom use following HIV intervention with heterosexually active drug-users. Drug and Alcohol Dependence. 1999;54:57–62. [PubMed]

73. Lauby JL, Smith PJ, Stark M, Person B, Adams J. A community-level HIV prevention intervention for inner-city women: Results of the Women and Infants Demonstration Projects. American Journal of Public Health. 2000;90:216–222. [PubMed]

74. Lo Conte JS, O'Leary A, Labouvie E. Psychosocial correlates of HIV-related sexual behavior in an inner-city STD clinic. Psychology and Health. 1997;12:589–601.

75. Macaluso M, Demand MJ, Artz LM, Hook EW. Partner type and condom use. AIDS. 2000;14:537–546. [PubMed]

76. Mahoney CA, Thombs DL, Ford OJ. Health belief and self-efficacy models: Their utility in explaining college student condom use. AIDS Education and Prevention. 1995;7:32–49. [PubMed]

77. Maticka-Tyndale E, Herold ES. Condom use on spring-break vacation: The influence of intentions, prior use, and context. Journal of Applied Social Psychology. 1999;29:1010–1027.

78. Mayne TJ, Acree M, Chesney MA, Folkman S. HIV sexual risk behavior following bereavement in gay men. Health Psychology. 1998;17:403–411. [PubMed]

79. Mays V, Cochran SD. HIV/AIDS in the African-American community: Changing concerns, changing behaviors. In: Stein M, Baum A, editors. Chronic diseases. Mahwah, NJ: Erlbaum; 1995. pp. 259–272.

80. Metzler CW, Biglan A, Noell J, Ary DV, Ochs L. A randomized controlled trial of a behavioral intervention to reduce high-risk sexual behavior among adolescents in STD clinics. Behavior Therapy. 2000;31:27–54.

81. Morrill A, Ickovics JR, Golubchikov VV, Beren SE, Rodin J. Safer Sex: Social and psychological predictors of behavioral maintenance and change among heterosexual women. Journal of Consulting and Clinical Psychology. 1996;64:819–828. [PubMed]

82. Morrison DM, Rogers Gillmore M, Baker SA. Determinants of condom use among high-risk heterosexual adults: A test of the Theory of Reasoned Action. Journal of Applied Social Psychology. 1995;25:651–676.

83. National Institute of Mental Health (NIMH) Multisite HIV Prevention Trial Group. The NIMH Multisite HIV Prevention Trial: Reducing HIV sexual risk behavior. Science. 1998:1889–1894. [PubMed]

84. Norris AE, Ford K. Condom use by low-income African American and Hispanic Youth with a well-known partner: Integrating the Health Belief Model, Theory of Reasoned Action, and the Construct Accessibility Model. Journal of Applied Social Psychology. 1995;25:1801–1830.

85. O'Leary A, Jemmott LS, Goodhart F, Gebelt J. Effects of an institutional AIDS prevention intervention: Moderation by gender. AIDS Education and Prevention. 1996;8:516–528. [PubMed]

86. O'Leary A, Ambrose TK, Raffaelli M, et al. Effects of an HIV risk reduction project on sexual risk behavior of low- income STD patients. AIDS Education and Prevention. 1998;10:483–492. [PubMed]

87. Orr DP, Langefeld CD, Katz BP, Caine VA. Behavioral intervention to increase condom use among high-risk female adolescents. Journal of Pediatrics. 1996;128:288–295. [PubMed]

88. Otto-Salaj LL, Heckman TG, Stevenson LY, Kelly JA. Patterns, predictors and gender differences in HIV risk among severely mentally ill men and women. Community Mental Health Journal. 1998;34:175–190. [PubMed]

89. Otto-Salaj LL, Kelly JA, Stevenson LY, Hoffmann R, Kalichman SC. Outcomes of a randomized small-group HIV prevention intervention trial for people with serious mental illness. Community Mental Health Journal. 2001;37:123–144. [PubMed]

90. Parsons JT, Halkitis PN, Bimbi D, Borkowski T. Perceptions of the benefits and costs associated with condom use and unprotected sex among late adolescent college students. Journal of Adolescence. 2000;23:377–391. [PubMed]

91. Reinecke J, Schmidt P, Ajzen I. Application of the Theory of Planned Behavior to adolescents' condom use: A panel study. Journal of Applied Social Psychology. 1996;29:749–772.

92. Rosario M, Meyer-Bahlburg HFL, Hunter J, Gwadz M. Sexual risk behaviour of gay, lesbian, and bisexual youths in New York City: Prevalence and correlates. AIDS Education and Prevention. 1999;11:476–496. [PubMed]

93. Rotheram-Borus MJ, Gwadz M, Fernandez M, Srinivasan S. Timing of HIV interventions on reductions in sexual risk among adolescents. American Journal of Community Psychology. 1998;26:73–96. [PubMed]

94. Rotheram-Borus MJ, Lee MB, Murphy DA, et al. Efficacy of a preventive intervention for youths living with HIV. American Journal of Public Health. 2001;91:400–405. [PubMed]

95. Sabogal F, Catania JA. HIV risk factors, condom use, and HIV antibody testing among heterosexual Hispanics: The National AIDS Behavioral Survey (NABS). Hispanic Journal of Behavioral Sciences. 1996;18:367–391. [PubMed]

96. Sanderson CA, Jemmott JB. Moderation and mediation of HIV-prevention interventions: Relationship status, intentions, and condom use among college students. Journal of Applied Social Psychology. 1996;26:2073–2099.

97. Sanderson CA, Maibach EW. Predicting condom use in African American STD patients: The role of two types of outcome expectancies. Journal of Applied Social Psychology. 1996;26:1495–1509.

98. Scheidt DM, Windle M. Individual and situational markers of condom use and sex with nonprimary partners among alcoholic inpatients: Findings from the ATRISK study. Health Psychology. 1996;15:185–192. [PubMed]

99. Schroder KEE, Hobfoll SE, Jackson AP, Lavin J. Proximal and distal predictors of AIDS risk behaviors among inner-city African American and European American women. Journal of Health Psychology. 2001;6:169–190. [PubMed]

100. Sikkema KJ, Winett RA, Lombard DN. Development and evaluation of an HIV-risk reduction program for female college students. AIDS Education and Prevention. 1995;7:145–159. [PubMed]

101. Sikkema KJ, Kelly JA, Winett RA, et al. Outcomes of a randomized community-level HIV prevention intervention for women living in 18 low-income housing developments. American Journal of Public Health. 2000;90:57–63. [PubMed]

102. Sneed CD, Chin D, Rotheram-Borus MJ, et al. Test-retest reliability for self-reports of sexual behavior among Thai and Korean respondents. AIDS Education and Prevention. 2001;13:302–310. [PMC free article] [PubMed]

103. Sohler N, Colson PW, Meyer-Bahlburg HFL, Susser E. Reliability of self-reports about sexual risk behavior for HIV among homeless men with severe mental illness. Psychiatric Services. 2000;51:814–816. [PubMed]

104. Stein JA, Nyamathi A, Kington R, Corporation TR. Change in AIDS risk behaviors among impoverished minority women after a community-based cognitive-behavioral outreach program. Journal of Community Psychology. 1997;25:519–533.

105. St. Lawrence JS, Brasfield TL, Jefferson KW, et al. Cognitive-behavioral intervention to reduce African American adolescents' risk for HIV infection. Journal of Consulting and Clinical Psychology. 1995;63:221–237. [PubMed]

106. St. Lawrence JS, Scott CP. Examination of the relationship between African American adolescents' condom use at sexual onset and later sexual behavior: Implications for condom distribution programs. AIDS Education and Prevention. 1996;8:258–266. [PubMed]

107. Stone VE, Catania JA, Binson D. Measuring change in sexual behavior: Concordance between survey measures. Journal of Sex Research. 1999;36:102–108.

108. Susser E, Valencia E, Miller M, et al. Sexual behavior of homeless mentally ill men at risk for HIV. American Journal of Psychiatry. 1995;152:583–587. [PubMed]

109. Susser E, Valencia E, Berkman A, et al. Human immunodeficiency virus sexual risk reduction in homeless men with mental illness. Archives of General Psychiatry. 1998;55:266–272. [PubMed]

110. Voluntary HIV-1 Counseling and Testing Efficacy Study Group. Efficacy of voluntary HIV-1 counseling and testing in individuals and couples in Kenya, Tanzania, and Trinidad. Lancet. 2000;356:103–112. [PubMed]

111. Thompson SC, Anderson K, Freedman D, Swan J. Illusions of safety in a risky world: A study of college students' condom use. Journal of Applied Social Psychology. 1996;26:189–210.

112. Valleroy LA, MacKellar DA, Karon JM, et al. HIV prevalence and associated risks in young men who have sex with men. Journal of the American Medical Association. 2000;284:198–204. [PubMed]

113. Vanable PA, Ostrow DG, McKirnan DJ, Taywaditep KJ, Hope BA. Impact of combination therapies on HIV risk perceptions and sexual risk among HIV-positive and HIV-negative gay and bisexual men. Health Psychology. 2000;19:134–145. [PubMed]

114. Wang J, Siegal HA, Falck RS, Carlson RG, Rahman A. Evaluation of HIV risk reduction intervention programs via latent growth model. Evaluation Review. 1999;23:648–662. [PubMed]

115. Weinhardt LS, Carey MP, Carey KB. HIV-risk behavior and the public health context of HIV/AIDS among women living with a severe and persistent mental illness. Journal of Nervous and Mental Disease. 1998;186:276–282. [PubMed]

116. Weinhardt LS, Carey MP, Carey KB, Verdecias RN. Increasing assertiveness skills to reduce HIV risk among women living with a severe and persistent mental illness. Journal of Consulting and Clinical Psychology. 1998;66:680–684. [PubMed]

117. Weinhardt LS, Carey MP, Maisto SA, et al. Reliability of the timeline follow-back sexual behavior interview. Annals of Behavioral Medicine. 1998;20:25–30. [PMC free article] [PubMed]

118. Weinhardt LS, Carey KB, Carey MP. HIV risk sensitization following a detailed sexual behavior interview: a preliminary investigation. Journal of Behavioral Medicine. 2000;23:393–398. [PubMed]

119. Wilson TE, Jaccard J, Levinson RA, Minkoff H, Endias R. Testing for HIV and other sexually transmitted diseases: Implications for risk behavior in women. Health Psychology. 1996;15:252–260. [PubMed]

120. Zamboni BD, Crawford I, Williams PG. Examining communication and assertiveness as predictors of condom use: Implications for HIV prevention. AIDS Education and Prevention. 2000;12:492–504. [PubMed]

121. Catania JA, Gibson DR, Chitwood DD, Coates TJ. Methodological problems in AIDS behavioral research: Influences on measurement error and participation bias in studies of sexual behavior. Psychological Bulletin. 1990;108:339–362. [PubMed]

122. Catania JA, Binson D, van Der Straten A, Stone V. Methodological research on sexual behavior in the AIDS era. Annual Review of Sex Research. 1995;6:77–125.

123. Weinhardt LS, Forsyth AD, Carey MP, Jaworski BC, Durant LE. Reliability and validity of self-report measures of HIV-related sexual behavior: progress since 1990 and recommendations for research and practice. Archives of Sexual Behavior. 1998;27:155–180. [PMC free article] [PubMed]

124. Sheeran P, Abraham C, Orbell S. Psychosocial correlates of heterosexual condom use: A meta-analysis. Psychological Bulletin. 1999;125:90–132. [PubMed]

125. Weinhardt LS, Carey MP, Carey KB. HIV risk reduction for the seriously mentally ill: Pilot investigation and call for research. Journal of Behavior Therapy and Experimental Psychiatry. 1997;28:87–95. [PMC free article] [PubMed]

126. Fishbein M, Pequegnat W. Evaluating AIDS prevention interventions using behavioral and biological outcome measures. Sexually Transmitted Diseases. 2000;27:101–110. [PubMed]

127. Susser E, Desvarieux M, Wittkowski KM. Reporting sexual risk behavior for HIV: A practical risk index and a method for improving risk indices. American Journal of Public Health. 1998;88:671–674. [PubMed]

128. Pinkerton SD, Abramson PR. An alternative model of the reproductive rate of HIV infection. Formulation, evaluation, and implications for risk reduction interventions. Evaluation Review. 1994;18:371–388.

129. Pinkerton SD, Abramson PR. Evaluating the risks. A Bernoulli Process Model of HIV infection and risk reduction. Evaluation Review. 1993;17:504–528.

130. Carey MP, Morrison-Beedy D, Carey KB, et al. Psychiatric outpatients report their experiences as participants in a randomized clinical trial. Journal of Nervous and Mental Disease. 2001;189:299–306. [PMC free article] [PubMed]

131. Ngugi EN, Plummer FA, Simonsen JN, et al. Prevention of transmission of human immunodeficiency virus in Africa: Effectiveness of condom promotion and health education among prostitutes. Lancet. 1988;15:887–890. [PubMed]

132. Davis KR, Weller SC. The effectiveness of condoms in reducing heterosexual transmission of HIV. Family Planning Perspectives. 1999;31:272–279. [PubMed]

133. Fritz RB. AIDS knowledge, self-esteem, perceived AIDS risk, and condom use among female commercial sex workers. Journal of Applied Social Psychology. 1998;28:888–911.

134. Fisher JD, Fisher WA. Changing AIDS-risk behavior. Psychological Bulletin. 1992;111:455–474. [PubMed]

135. Cameron AC, Trivedi PK. Regression analysis of count data. New York, NY: Cambridge University; 1998.

136. Lindsey JK. Modeling frequency and count data. New York: Oxford University; 1995.

137. Long JS. Regression models for categorical and limited dependent variables. Thousand Oaks, CA: Sage; 1997.

138. Montgomery DC, Peck EA, Vining GG. Introduction to linear regression analysis. 3rd ed. New York: John Wiley & Sons; 2001.

139. National Institute of Mental Health (NIMH) Multisite HIV Prevention Trial Group. The NIMH Multisite HIV Prevention Trial: Reducing HIV sexual risk behavior. Science. 1998:1889–1894. [PubMed]

140. Tabachnick BG, Fidell LS. Using multivariate statistics. 2nd ed. New York: HarperCollins; 1996.

141. Stevens J. Applied multivariate statistics for the social sciences. 3rd ed. Mahwah, NJ: Lawrence Erlbaum Associates; 1996.

142. Cohen J, Cohen P. Applied multiple regression / correlation analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Erlbaum; 1983.

143. Cronbach LJ, Furby L. How should we measure "change" - or should we? Psychological Bulletin. 1970;74:68–80.

144. Huck SW, McLean RA. Using a repeated measures ANOVA to analyze the data from a pretest-posttest design: A potentially confusing task. Psychological Bulletin. 1975;82:511–518.

145. Jennings E. Models for pretest-posttest data: Repeated measures ANOVA revisited. Journal of Educational Statistics. 1988;13:273–280.

146. Llabre MM, Spitzer SB, Saab PG, Ironson GH, Schneiderman N. The reliability and specificity of delta versus residualized change as measures of cardiovascular reactivity to behavioral challenges. Psychophysiology. 1991;28:701–711. [PubMed]

147. Rogosa D, Brandt D, Zimowski M. A growth curve approach to the measurement of change. Psychological Bulletin. 1982;92:726–748.

148. Maris E. Covariance adjustment versus gain scores - revisited. Psychological Methods. 1998;3:309–327.

149. Malgady RG, Colon-Malgady G. Comparing the reliability of difference scores and residuals in analysis of covariance. Educational and Psychological Measurement. 1991;51:803–807.

150. Rovine MJ, Molenaar PCM. A structural equation modeling approach to the General Linear Mixed Model. In: Collins LM, Sayer AG, editors. New methods for the analysis of change. Washington, DC: American Psychological Association; 2001. pp. 67–96.

151. Raudenbush SW. Toward a coherent framework for comparing trajectories of individual change. In: Collins LM, Sayer AG, editors. New methods for the analysis of change. Washington, DC: American Psychological Association; 2001. pp. 33–64.

152. Gardner W, Mulvey EP, Shaw EC. Regression analyses of counts and rates: Poisson, overdispersed Poisson, and negative binomial models. Psychological Bulletin. 1995;118:392–404. [PubMed]

153. Zeger SL, Liang K-Y. Longitudinal data analysis for discrete and continuous outcomes. Biometrics. 1986;42:121–130. [PubMed]

154. Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.

155. Neuhaus JM. Statistical methods for longitudinal and clustered designs with binary responses. Statistical Methods in Medical Research. 1992;1:249–273. [PubMed]

156. Neuhaus JM, Kalbfleisch JD, Hauck WW. A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data. International Statistical Review. 1991;59:25–35.

157. Raudenbush SW, Bryk AS. Hierarchical linear models: Applications and data analysis methods. 2nd ed. Thousand Oaks: Sage; 2002.

158. Littell RC, Milliken GA, Stroup WW, Wolfinger RD. SAS system for mixed models. Cary, NC: SAS Institute Inc.; 1996.

159. Hays WL. Statistics. 5th ed. Ft Worth, TX: Harcourt Brace; 1994.

160. Fox J. Multiple and generalized nonparametric regression (Sage University Papers Series. Quantitative Applications in the Social Sciences, No. 131) London: Sage; 2000.

161. Green PJ, Silverman BW. Nonparametric regression and generalized linear models: A roughness penalty approach (Monographs on Statistics and Applied Probability, Vol 58) Boca Ratin: Chapman & Hall/CRC; 2000.

162. Joereskog K, Soerbom D. LISREL 8: User's reference guide. Chicago, IL: Scientific Software; 1996.

163. Bentler PM, Wu EJC. EQS for Windows users' guide. Encino, CA: Multivariate Software; 1995.

164. O'Connell AA. Methods for modeling ordinal outcome variables. Measurement and Evaluation in Counseling and Development. 2000;33:170–193.

165. Agresti A. An introduction to categorical data analysis. New York: Wiley; 1996.

166. Agresti A. Tutorial on modeling ordered categorical response data. Psychological Bulletin. 1989;105:290–301. [PubMed]

167. West SG, Finch JF, Curran PJ. Structural equation models with non-normal variables: Problems and remedies. In: Hoyle RH, editor. Structural equation modeling: Concepts, issues, and applications. Thousand Oaks: Sage; 1995. pp. 56–75.

168. Muthen BO. Latent variable mixture modeling. In: Marcoulides GA, Schumacker RE, editors. New developments and techniques in structural equation modeling. Mahwah, NJ: Lawrence Erlbaum Associates; 2001. pp. 1–33.

169. Muthen BO. Second-generation structural equation modeling with a combination of categorical and continuous latent variables: New opportunities for latent class - latent growth modeling. In: Collins LM, Sayer AG, editors. New methods for the analysis of change. Washington, DC: American Psychological Association; 2001. pp. 289–322.

170. Jo B, Muthen BO. Modeling of intervention effects with noncompliance: A latent variable approach for randomized trials. In: Marcoulides GA, editor. New developments and techniques in structural equation modeling. Mahwah, NJ: Lawrence Erlbaum Associates; 2001. pp. 57–87.

171. Joereskog K, Soerbom D. PRELIS 2: User's reference guide. Chicago, IL: Scientific Software; 1996.

172. Kline RB. Principles and practice of structural equation modeling. New York: Guilford; 1998.

173. Bentler PM. Causal modeling: New interfaces and new statistics. In: Adair JG, Belanger D, Dion KL, editors. Causal modeling: New interfaces and new statistics. Volume 1: Social, personal, and cultural aspects. Hove, England: Psychology Press/Erlbaum; 1998. pp. 353–370.

174. SPSS. SPSS Exact Tests 7.0 for Windows. Chicago, IL: SPSS; 1996.

175. Efron B, Tibshirani RJ. An introduction to the bootstrap. New York: Chapman & Hall; 1993.

176. Mooney CZ, Duval RD. Bootstrapping: A nonparametric approach to statistical inference. Newbury Park, CA: Sage; 1993.

177. Efron B, Gong G. A leisurely look at the bootstrap, the jackknife, and cross-validation. American Statistician. 1983;37:36–48.

178. Lunneborg CE, Tousignant JP. Efron's bootstrap with application to the repeated measures design. Multivariate Behavioral Research. 1985;20:161–178.

179. Lunneborg CE. Data analysis by resampling: Concepts and applications. Pacific Grove, CA: Duxbury; 1999.

180. Bollen KA, Stine RA. Bootstrapping goodness-of-fit measures in structural equation models. In: Bollen KA, Long JS, editors. Testing structural equation models. Thousand Oaks, CA: Sage; 1993. pp. 111–135.

181. Good P. Permutation Tests. A practical guide to resampling methods for testing hypotheses. 2nd ed. New York: Springer; 2000.

182. Good P. Extensions of the concept of exchangeability and their applications to testing hypotheses. Retrieved on 07/10/2002 from http://users.oco.net/drphilgood/index1.htm. (n. d.)

183. Belli R, Schwarz N, Singer E, Talarico J. Decomposition can harm the accuracy of behavioural frequency reports. Applied Cognitive Psychology. 2000;14:295–308.

184. Schwarz N. Self-reports. How the questions shape the answers. American Psychologist. 1999;54:93–105.

185. Rothman AJ, Haddock G, Schwarz N. "How many partners is too many?" Shaping perceptions of personal vulnerability. Journal of Applied Social Psychology. 2001;31:2195–2214.

186. Schwarz N, Oyserman D. Asking questions about behavior: Cognition, communication, and questionnaire construction. American Journal of Evaluation. 2001;22:127–160.

187. Hayes AF. Permutation test is not distribution-free: Testing H0: ρ = 0. Psychological Methods. 1996;1:184–198.