The trustworthiness of self-reported sexual behavior data has been questioned since Kinsey’s pioneering surveys of sexuality in the United States (Kinsey et al. 1948, 1953). In the era of HIV and AIDS, researchers and practitioners have employed a diversity of assessment techniques but they have not escaped the fundamental problem of measurement error. In this article, we review the empirical literature produced since Catania et al.’s (1990) review regarding reliability and validity of self-administered and automated questionnaires, face-to-face interviews, telephone interviews, and self-monitoring approaches. We also provide specific recommendations for improving sexual behavior assessment. It is imperative that standardized self-report instruments be developed and used for sexual risk-behavior assessment.
The fidelity of sexual behavior data obtained by self-report has been questioned repeatedly since Kinsey’s pioneering surveys of sexuality in the United States (Kinsey et al. 1948, 1953). In recent years, commentaries on sexual behavior research have suggested that behavioral data produced by self-report methods are worthless. For example, Lewontin (1995) argued that self-reports of sexual behavior are inherently unreliable and invalid due to multiple sources of bias, including under-reports of stigmatized behaviors and over-reports of normative behaviors. Brody (1995) questioned the validity of self-report data for sexual behaviors that confer risk for HIV infection, suggesting that participants in behavioral research are prone to intentional misrepresentation. Recognition of the importance of these and related concerns has led to a National Institutes of Health-sponsored conference on “The Science of Self-Report”, held in Washington, DC, in November, 1996.
Despite these criticisms and concerns, researchers and practitioners continue to rely on self-report methods to assess the topography of sexual behavior because ethical and practical considerations limit the use of more direct assessment methods. The goals of this paper are to describe the state-of-the-science and to provide recommendations for improving self-report assessment methodology. We begin by summarizing methodological issues presented by Catania et al. (1990); next, we summarize the research published since their review; then we make specific suggestions for the assessment of sexual behavior in research and practice.
Catania et al. (1990) published a thoughtful review of the methodological challenges faced by researchers attempting to assess sexual behavior. Issues such as participation and response biases (e.g., intentional misrepresentation, inaccurate recall) and the effects of different modes of administration on responding were discussed regarding their potential to introduce error into measurement of sexual behavior. Table I details the practical advantages and disadvantages of different modes of administration that emerge from their discussion.
Catania et al. (1990) concluded that although these methodological issues have been studied extensively in the assessment of other behaviors, rigorous research on assessment of sexual behavior is needed before decisions about the best measures for various purposes can be made with confidence. Because self-report sexual behavior data are used for tracking the spread of HIV, identifying populations at risk for HIV infection, and evaluating the effectiveness of HIV-risk-reduction interventions, the authors implored researchers to balance the urgency of the HIV epidemic with the methodological rigor necessary for quality assessment. A foundation of measure development is psychometric evaluation, which generally begins with evaluations of reliability and validity.
Establishing the reliability of a measure is prerequisite to assessing its validity. A common method for determining the reliability of self-reports of behavior is the test-retest reliability (TR) study, wherein the same measure is administered to the same participants twice and the results are compared for congruency. Well-designed TR studies can suggest some of the variables that affect reporting accuracy. Studies that compare the reliability of different modes of administration also aid in determining procedures that reduce inconsistent responding. Another method of examining the reliability of reports is to evaluate internal consistency. Behavioral questionnaires can include the same question more than one time, and the concordance of responses can be examined.
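The logic of a TR study can be sketched in a few lines of code. The data, the variable names, and the choice of a simple Pearson correlation are illustrative assumptions only; applied studies typically also report kappa coefficients or intraclass correlations, depending on the response scale:

```python
# Sketch of a test-retest reliability check: the same item is
# administered twice and the paired scores are compared.
# All data below are hypothetical.

def pearson_r(x, y):
    """Pearson correlation between paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Self-reported counts of a target behavior for the SAME reporting
# period, asked at time 1 and again one week later.
time1 = [0, 2, 5, 1, 0, 3, 8, 2]
time2 = [0, 2, 4, 1, 1, 3, 7, 2]

r = pearson_r(time1, time2)  # high r suggests consistent reporting
```

A high coefficient here indicates only consistency of reporting, not accuracy; both administrations could be consistently wrong, which is why reliability is a prerequisite for, but not evidence of, validity.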
One ever-present issue in assessment of sexual behavior is the absence of a “gold standard” with which to compare self-report data. As of 1990, there were no current data on the distribution of high-risk sexual behavior in the population of the United States. Even if such normative data were available, it would be difficult to draw conclusions on an individual basis regarding whether a participant has accurately reported his or her sexual behavior. Unlike health-risk behaviors that are directly observable (e.g., use of a bicycle helmet), leave permanent byproducts (e.g., cigarette butts), or are relatively unencumbered by societal proscription (e.g., sedentary lifestyle behavior), risky sexual behaviors are inherently private, frequently considered taboo, and situationally specific, characteristics that make them inherently inaccessible to direct assessment strategies. The fact that there are no physiological or psychophysiological data that correspond directly to frequency of sexual activity compounds the problem. Due to these obstacles, there have been few studies designed to examine evidence for the validity of self-report sexual behavior measures. However, methods such as comparing self-reports with partner reports and comparing self-reports of risk behavior to HIV serostatus, reinfection with bacterial STDs and other biochemical markers for sexual activity, have been employed in several studies.
We searched the psychological, psychiatric, and medical literature to obtain studies examining specifically the reliability and validity of self-report sexual behavior measures. We began with computerized searches of the psychological (PsycLIT) and medical (MedLine) databases. The following search terms were used: sexual behavior, HIV-risk behavior, reliability, validity, self-report, and assessment. In order to review research produced concurrent with and since Catania et al.’s (1990) review, we collected relevant articles published from January 1, 1990 to December 31, 1995 (or later if made available to us by the author). Articles were considered relevant if they described assessment of sexual behavior using a self-report method, and included analyses examining the reliability and/or validity of those self-reports. We then studied each reference section in those articles to identify additional relevant research. This process was repeated for each new reference until all cited references were included or eliminated because they provided redundant information. Table II describes the 30 studies included in this review.
Our first observation from this literature is that studies of self-report sexual behavior measures are methodologically heterogeneous and often at odds conceptually. Few articles document assessments of sexual behavior using the same questions, and the administration of many measures was unstandardized (see Darke et al., 1991; McKinnon et al., 1993; Needle et al., 1995 for exceptions). Studies were conducted with structured and semi-structured face-to-face interviews (FTFIs), self-administered questionnaires (SAQs), biological markers, and collateral interviews. Reporting periods ranged from three weeks to several years, and the number of sexual behavior items included ranged from 3 to 398. Behaviors commonly assessed include numbers of sexual partners; frequency of protected and unprotected oral, anal, and vaginal sex; non-consensual sexual experiences; and history of relationships. Assessment duration ranged from 6 to 90 minutes (see Boekeloo et al., 1994; Konings et al., 1995, respectively). Most measures we examined were appropriate for risk screening (e.g., dichotomous indication of risk behavior) and risk assessment (e.g., continuous indication of level of risk behavior).
We summarize evidence for reliability and validity in Table II. Evidence for the reliability of measures was most often reported as temporal stability resulting from test-retest studies and internal consistency in studies using a single administration. However, there was variation in the conceptualization of reliability. For example, researchers have compared the consistency of scores using different reporting periods, in effect measuring different behaviors, yet presented their results as an evaluation of test-retest reliability (e.g., McLaws et al., 1990). Validity was often demonstrated using convergent evidence, with comparison to other self-report data as the most common procedure. Some researchers compared partner and self-reports (e.g., Upchurch et al., 1991; Padian, 1990), and others compared self-reports to STD incidence (Doll et al., 1994; Cohen and Dent, 1992; Zenilman et al., 1995).
The measures were used in research on a variety of subpopulations in the U.S. and abroad, including ethnic minority groups (Boekeloo et al., 1994; Cohen & Dent 1992; Dowling-Guyer et al., 1993; Kalichman et al., 1997; Schopper et al., 1993; Wyatt et al., 1992; Zenilman et al., 1995), psychiatric patients (McKinnon et al., 1993; Sacks et al., 1990), gay men (Doll et al., 1994; McLaws et al., 1990; Schneider et al., 1991; Siegel et al., 1994), college students (Anderson and Pollack, 1994; McEwan et al., 1992; Schneider et al., 1991), STD clinic patients (Cohen and Dent, 1992; Zenilman et al., 1995; Upchurch et al., 1991), sex traders (McLaws et al., 1990; Konings et al., 1995), and intravenous drug users (IDU; Dowling-Guyer, 1994; McElrath et al., 1994; Needle et al., 1995).
Eight studies directly examined the effects of administration mode on reported frequency of sexual behavior. Five compared different modes of assessment (Boekeloo et al., 1994; Kalichman et al., 1997; McEwan et al., 1992; Nebot et al., 1994; Siegel et al., 1994) and three compared different versions of the same self-report assessment mode (Kauth et al., 1991; Konings et al., 1995; Weinhardt et al., in press). These studies are detailed in Table II. The following descriptions highlight four of the studies that examined different modes of assessment.
Boekeloo et al. (1994) conducted a mode-effect study of HIV-risk assessment that examined differences in reported risk behaviors between participants in a self-administered questionnaire condition and participants who responded to the same questions presented by audiotape. After the initial assessment, all participants completed face-to-face interviews based on the same questionnaire items. Boekeloo et al. concluded that audiotape administration of a culturally sensitive sexual behavior measure was preferable to written or face-to-face interview versions of the same measure because it resulted in fewer missing responses for several behaviors, including unprotected vaginal sex with steady or nonsteady partners, unprotected receptive anal sex with steady or nonsteady partners, multiple partners, and sex with a homosexual or bisexual man. The audiotape-administered questionnaire also resulted in more reports of risk factors, including unprotected vaginal sex with steady or nonsteady partners, unprotected receptive anal sex with steady partners, and sex with an HIV-positive partner. Although methodological confounds (e.g., length of assessment covaried with assessment mode) probably influenced their results, Boekeloo et al.’s article serves as a model for mode-effects studies in applied environments. This study also highlights the potential advantages of using technologically advanced assessment methods.
Kalichman et al. (1997) examined differences in response rates, reliability, and changes in HIV-preventive behavioral intentions between groups of women who were administered an HIV-risk behavior assessment that was either self-administered, displayed on an overhead projector, or conducted as a face-to-face interview. In each condition, participants were administered long or short forms of the measure. Results indicate that: (a) effects of the interview and SAQ methods were similar, and both enhanced sensitization, whereas the lack of time to reflect during the projected assessment minimized sensitization; and (b) longer assessments may have greater sensitizing effects (which may prepare people to change their behavior).
McEwan et al. (1992) compared a postal survey to a face-to-face interview. Results indicated that most responses and participation rates were similar across modes, but that the interviews were more costly to administer. However, the interviews yielded better responses to open-ended questions and resulted in fewer reported “socially unacceptable” behaviors. Although the two assessments were administered to different samples, the authors concluded that postal questionnaires are better for obtaining information on sexual behavior if in-depth answers are not needed. McEwan et al. recommended the postal method for most surveys in the HIV/AIDS field.
Finally, Kauth et al. (1991) reviewed duration of retrospective reporting periods used in sexual behavior research and found that many studies used reporting periods of 12 months or longer, and that reporting period varied widely across studies. The authors tested reliability of 2-week, 3-month, and 1-year reporting and found that participants tended to report reliably at 2-weeks and 3-months, but not at 12 months. Men tended to underreport all behaviors using the 12-month reporting period. The authors recommended the use of briefer reporting periods to improve the reliability of self-reports.
Few studies with direct implications for future assessment have been conducted. Most studies that have examined the psychometric properties of self-report sexual behavior measures have done so in the context of larger empirical investigations, where evidence of reliability and validity is reported only as part of the preliminary analyses. Although more attention has been paid to measurement error in sexual behavior research, it remains as challenging today to select an empirically evaluated self-report measure of sexual behavior as when Catania et al. (1990) issued their call to action. Further, there is little consensus among researchers regarding which administration modes yield the most reliable and valid sexual behavior data. Consequently, contemporary sexual behavior research may employ measures that yield low fidelity data. Additional research focusing on modes of administration with diverse populations will be necessary to reach such a consensus.
In addition to study design, statistical analysis of self-report data is worthy of greater attention. For example, optimal performance of standard tests of correlation and mean differences depends upon a number of important conditions that, when violated, limit confidence in the results. Among these is the assumption that observed scores come from normal distributions. It is widely recognized that the distributions for many sexual behaviors tend to be positively skewed; many people report zero or a few occasions of target behaviors whereas other participants report high frequencies. In some cases, transforming scores prior to examining their covariation with other variables may be appropriate if the transformation produces a distribution that approximates the normal (see Tukey, 1977, for a thorough discussion of data transformation). However, data for these behaviors are often sufficiently non-normal to suggest the need for alternative analytical approaches (e.g., negative binomial regression; Gardner et al., 1995) that are appropriate for highly skewed count data. Other issues worthy of attention include modeling data that contain a disproportionate number of specific values relative to the rest of the distribution (e.g., zeros or ones), the implications of relying on least squares estimation for modeling low base rate behavior, and the elaboration of alternative analytical strategies for modeling non-normally distributed sexual behavior data. Clearly, sexual behavior data require more sophisticated analytic procedures that yield test statistics that can be interpreted with greater confidence.
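The distributional problem described above can be made concrete with a short simulation. All data below are simulated, and the generating distribution and the log(x + 1) re-expression are illustrative assumptions, not recommendations for any particular dataset:

```python
import math
import random
import statistics

# Sketch: why normal-theory tests are risky for behavior-frequency
# counts. We simulate positively skewed counts: many zeros and small
# values, a few high-frequency reporters.
rng = random.Random(42)
counts = [min(int(rng.expovariate(0.5)), 30) for _ in range(500)]

def skewness(xs):
    """Sample skewness; positive values indicate a long right tail."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

raw_skew = skewness(counts)  # strongly positive for these data

# A log(x + 1) transform (one rung of Tukey's ladder of
# re-expressions) pulls in the right tail before correlational
# analysis; count models such as negative binomial regression are
# often a better choice than transforming.
logged = [math.log(x + 1) for x in counts]
log_skew = skewness(logged)
```

With skewed counts, the mean sits above the median and a handful of high-frequency reporters can dominate least-squares estimates; checking skewness before and after re-expression is a minimal diagnostic step.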
The quality of sexual-behavior assessment is an important but daunting challenge to HIV-related research. From the research produced to date, there is evidence suggesting that well-designed interviews and questionnaires can provide acceptable data when administered appropriately. However, the problematic issues discussed herein (see also Catania et al., 1990, 1995), coupled with the public health implications of the use of sexual self-report data, require active research that focuses on the design and evaluation of measures for a variety of populations. To improve the quality of self-report assessment of sexual behavior, we offer the following recommendations, which are based on the reviewed studies, information from literacy experts, and our experiences with survey design and administration. These recommendations are likely to be useful for selecting among existing measures, adapting existing measures for different applications, or designing new measures.
Table II provides psychometric evidence for the latest measures developed. Research on the Risk Behavior Assessment (RBA; Needle et al., 1995), which includes sexual and drug-related HIV risk factors, provides an example of how psychometric evaluation can be conducted systematically. Psychometric properties of the RBA measure have been evaluated with different populations. When adapting a measure for use in a population, or when creating a new measure, it is crucial to conduct appropriate psychometric evaluation. In studies reporting test-retest analyses, the focus should be on the consistency of reporting rather than the consistency of target behaviors. Consequently, investigators should be careful to assess behaviors occurring in the same reporting period at both administrations. As others have pointed out, shorter assessment intervals (e.g., one week) limit the impact that memory deterioration and other variables may have on retest coefficients.
There is an urgent need for a standardized method for validating findings from sexual behavior self-reports. One approach, suggested by Schopper et al. (1993), is to (i) use the concordance of numbers of regular and casual partners and sex acts across gender to identify gaps; (ii) use the degree of concordance between couples on the type of relationship (e.g., monogamous or polygamous), and on number of sex acts with the partner, to measure accuracy at the individual level; and (iii) compare patterns related to participant demographics and behavior with results from other surveys with similar samples. Seidman and Reider (1994) reviewed sexual behavior surveys conducted in the United States, and provided data on “normative” sexual behavior across the life cycle. A more comprehensive survey has recently been reported by Laumann et al. (1994). These normative data may allow researchers to compare aggregate patterns of behavior in their samples with appropriate population parameters. Other methods that can be used to provide evidence for the validity of self-report measures are biochemical evidence of sexual activity (e.g., reinfection with STDs) and comparison with concurrent self-monitoring data.
Because measurement instruments address different levels of behavioral specificity, attention to the needs of the assessment procedure is warranted. It may be useful to conceptualize measures of HIV-risk behavior as falling into three concentric categories: risk screening, risk assessment, and risk-event data. Risk screening describes dichotomously whether a respondent has engaged in risk behavior during the reporting period (e.g., Boekeloo et al., 1994). Risk assessment allows description of the level of risk, based on the frequency of risk behavior (e.g., Downey et al., 1995). Risk-event data are the most detailed, and allow event-level examination of the co-occurrence of potential risk factors with risk behaviors (e.g., Crosby et al., 1996). Clearly, a measure designed for risk screening may yield data that are inadequate if the goal of an investigation is to assess the level of risk reported by the target sample. Alternatively, although an event-level assessment provides risk-screening data, the additional resources required for assessment would be wasted if event-level analysis is not required.
Researchers interested in studying at-risk behaviors among under-served samples are likely to encounter problems associated with the terminology used in assessments. On the one hand, terms that are unfamiliar to participants may lead to increased measurement error; on the other hand, using slang terms from the argot of the target sample may reduce researchers’ credibility or be misinterpreted as condescension. One solution is to provide parenthetical descriptions of clinical terms. For example, “How many times in the past three months have you had vaginal sex (when a man puts his penis in a woman’s vagina)?” Careful pilot research with the target sample prior to measurement development and implementation is essential. Specifically, the literacy level of the population should be assessed and used to guide the vocabulary used in the questionnaire. Oral and written instructions should be as concise as possible, and should be reviewed with participants before they begin the questionnaire. If multiple questionnaires are used, response formats should be as consistent as possible. Finally, offering an audiotape-administered version may be advantageous for groups with low reading levels.
Just as consideration of the reading and comprehension level of the target sample is necessary, so too is attention to the lifestyles, prevailing customs, and religious and cultural traditions of target samples. For example, sexual behavior questionnaires should be designed without inherent sexual orientation bias. Caution should be exercised to include response options that are suitable for a wide variety of responses. Filler questions should be culturally relevant. Also, emphasizing the personal benefits of the research to participants and their community is likely to engender cooperation and elicit candid reporting. Conducting focus groups with men and women from the population to be assessed often reveals unanticipated cultural and contextual issues that can be used to inform sexual behavior assessment (Carey et al., 1997). The need for qualitative research during measurement development cannot be over-emphasized.
Croyle and Loftus (1993) detailed the influence of the constructive nature of memory on self-reports of sexual behavior. Simple forgetting, telescoping (distorting the recency of particularly memorable events), exposure to misleading information since the event, and the use of heuristics to estimate behavior frequencies are some of the factors that can contribute to inaccurate self-reports. Useful techniques include: (a) providing anchor dates for reporting periods, (b) encouraging participants to use appointment books and calendars to recall other memorable events during the reporting period, and (c) recalling extensive periods of abstinence or consistent sexual activities. Recent use of the timeline follow-back procedure (Sobell & Sobell, 1996), which utilizes many of these techniques, to assess sexual behavior has shown that it provides valuable event-level data (Crosby et al., 1996) and that these self-reports are reliable (Weinhardt et al., in press).
Assessment of risk should take place after a participant and interviewer have established rapport, and the interviewer has assured the participant of confidentiality. Specific sexual behavior assessment should always begin with an appropriate introduction for the participants (Carey, 1998). During this time the reasons for asking questions about sexual (and other socially sensitive) behaviors should be provided. For example, in a clinical context, one might say that a standard practice is to inquire about risk for HIV just as one routinely inquires about suicidal ideation, personal safety, and other important matters; thus, all participants get asked and no one will feel singled out as being at unique risk. In a public health or research context, investigators might identify the overall purpose of the research, including how this will improve public health or enhance the scientific knowledge base. After these introductory remarks, participants should be invited to ask any questions they might have.
If an interviewer or investigator appears embarrassed about or unsure of the appropriateness of the questions, participants may detect this and provide incomplete or ambiguous responses.
These assumptions reflect the preferred direction of error. Thus, for example, it is better to assume minimal understanding on the part of the participant so that language is clear and concrete. Other useful assumptions include: (i) participants will be embarrassed about and have difficulty discussing sexual matters; (ii) participants will not understand all sexual behavior terms, medical terminology, etc.; (iii) participants will be misinformed about sexual material, including STDs and other threats to sexual health; and (iv) questions will provoke concern regarding a participant’s health or well-being (thus, the interviewer should be prepared with information, referral sources, and other reassuring materials). As the interviewer or investigator learns more about the client, these assumptions can be adjusted.
Thus, questions about oral or vaginal sex might precede questions regarding sexual behaviors that are less socially approved (e.g., anal intercourse, fisting, or rimming).
Rather than ask “if” a client has engaged in a particular activity, ask “how many times have you ...” engaged in it. Use of open response formats on questionnaires is encouraged (Catania et al., 1990). Such an approach will communicate an investigator’s (or clinician’s) expectation that such behaviors do occur and are not abnormal.
Among these factors are the experimenter’s demeanor during interactions, the administration setting, the relevance of the study’s aims to participants’ lives, and perceptions of trust regarding the purpose and personnel of the study. Clearly, these “non-specific” variables may influence participants’ responses to the assessment protocol and affect measurement error.
Additional studies employing these steps will increase understanding of the issues affecting sexual behavior self-reports, and facilitate the development of standardized self-report measures that are appropriate for specific purposes in a variety of populations. Researchers and practitioners can then place more confidence in sexual behavior data gathered with self-report methods.