Despite the importance of reliable sexual information regarding GLB individuals, the current study represents, as far as we know, the first test-retest study of various aspects of sexuality among GLB youths. Overall, substantial to almost perfect reliability was obtained using the SERBAS-Y-HM among GLB youths on a variety of aspects of their sexuality, including lifetime sexual behavior, recent sexual behavior, unprotected sexual risk behavior, sexual identity, sexual orientation, and ages of psychosexual developmental milestones. The reliability found here is substantially higher than that reported in most past research with primarily heterosexual adolescents or GLB adults.
Two potential explanations exist for the strong reliability found in this study. First, the SERBAS-Y-HM includes strategies that have been recommended by experts in sexual behavior assessment to enhance the reliability and validity of the behaviors assessed, including: (1) defining sexual terms (e.g., what do you mean by “sex”; Wiederman, 2002), (2) avoiding technical jargon by exploring and using the youths’ own language and terms for sexual behaviors (e.g., “tossing salad”; Catania et al., 1990), (3) focusing on a short, three-month recall assessment (Schroder et al., 2003), (4) using participant-nominated events in order to personally anchor and clarify the assessment window (Weinhardt et al., 1998), (5) assessing behaviors with respect to each specific partner, and (6) utilizing qualitative research to inform item content and language (Weinhardt et al., 1998). Second, the interviewers were highly trained and experienced with the administration of the SERBAS-Y-HM, comfortable with discussing sexual topics, and comfortable with the GLB population. Unfortunately, it is impossible to determine which aspects of the SERBAS-Y-HM or the interviewer training played critical roles in the reliability of the reports assessed here. Nevertheless, researchers are encouraged to employ measures that, like the SERBAS-Y-HM, incorporate strategies to enhance the reliability and validity of self-reported sexual information.
Despite the generally high reliability found among these youths, some exceptions were noted. Although there were generally few observed differences in the reliability of male and female youths’ reports, instances of moderate or low reliability were often gender-specific. For example, female youths were found to have only fair agreement on the number of sexual partners in their lifetime, whereas male youths provided almost perfect reliability on this question. In contrast, male youths were found to provide poor reliability in their reports of anal sex while using alcohol or drugs (female youths were not asked about anal sex). This poor reliability may be due to the rarity of this behavior, which reduced the sample size and potential variability. Nevertheless, it should be noted that the number of moderate or low reliabilities observed was smaller than would be expected by chance. Future research must determine whether the low reliabilities are chance findings or indicate problematic measurement.
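The effect of rarity on a reliability coefficient can be illustrated with a minimal sketch. It assumes that a yes/no item is summarized with Cohen's kappa, and it uses two entirely hypothetical 2×2 test-retest tables (the counts below are invented for illustration and are not data from this study). The function name `cohen_kappa` and the variable names are likewise illustrative. Both tables have the same raw agreement (90%), yet kappa is far lower when the behavior is rarely endorsed.

```python
def cohen_kappa(table):
    """Cohen's kappa for a 2x2 test-retest table.

    table = [[yes_yes, yes_no],
             [no_yes,  no_no]]
    Rows are responses at test, columns are responses at retest.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    observed = (a + d) / n  # proportion of identical answers

    # Agreement expected by chance, from the marginal proportions.
    chance_yes = ((a + b) / n) * ((a + c) / n)
    chance_no = ((c + d) / n) * ((b + d) / n)
    expected = chance_yes + chance_no

    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts (NOT study data); 40 respondents, 36 consistent in each.
common_behavior = [[20, 2], [2, 16]]  # about half endorse the behavior
rare_behavior = [[1, 2], [2, 35]]     # very few endorse the behavior

print(f"common: {cohen_kappa(common_behavior):.2f}")  # common: 0.80
print(f"rare:   {cohen_kappa(rare_behavior):.2f}")    # rare:   0.28
```

Under these assumed counts, the lower coefficient for the rare behavior reflects the high chance agreement that arises when nearly everyone answers "no," rather than less consistent reporting.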
Unreliable findings also have serious methodological implications for sex research in general. For example, youths’ reports showed only moderate reliability (kappa = .77 for males and .60 for females) on whether they had any same-sex sexual behavior in the past three months. Although this would suggest that youths are only moderately able to recall their recent sexual behaviors, in fact, they provided highly reliable reports of the number of recent partners and the number of recent specific sexual acts (e.g., vaginal, oral, anal; with or without a condom). This inconsistency suggests that, despite our efforts to clarify what we meant by “sex,” some youths were confused by this general term but not by questions about specific behaviors. Thus, general questions may yield unreliable reports, and research should focus on specific sexual behaviors. This also implies that general questions should not be used to determine whether to skip a section of more detailed sexual inquiry; instead, specific behaviors should be assessed regardless of any response to more general sex questions.
Given the recent advances in computer-assisted interviewing (e.g., Audio-CASI), some may question whether the use of a face-to-face interview for the assessment of sexual behavior is a reliable and valid method of assessment. Indeed, many have suggested that the greater privacy afforded by Audio-CASI assessments would increase the reliability and validity of self-reported sexual behavior (see Schroder et al., 2003, for review). Although some research has indicated that Audio-CASI results in more reports of potentially stigmatizing sexual behaviors than do face-to-face interviews (Des Jarlais et al., 1999), most of the research has identified only a small number of differences between interviews and Audio-CASI in the reports of sexual behaviors (Ellen et al., 2002; Macalino, Celentano, Latkin, Strathdee, & Vlahov, 2002; Metzger et al., 2000; Williams et al., 2000). Indeed, some of these observed differences are in the opposite direction, with more sexual behaviors disclosed via face-to-face interviews than with Audio-CASI (Ellen et al., 2002; Jennings, Lucenko, Malow, & Devieux, 2002; Williams et al., 2000). Furthermore, at least some past research has suggested that test-retest reliability of sexual behavior is greater in face-to-face interviews than when using Audio-CASI (Williams et al., 2000). Although it is unclear whether Audio-CASI results in more reliable and valid assessments of sexual behavior, face-to-face interviews may have some potential advantages in some populations, such as among those with low educational background or those who are uncomfortable using computers. Face-to-face interviews have the added benefits of allowing the interviewer to explore the individuals’ own terms for various sexual behaviors, perceive possible confusion and clarify questions, explore potential logical inconsistencies, and build trust and rapport with the participant, none of which are adequately duplicated with the use of Audio-CASI. Indeed, this report provides evidence that sexual information can be reliably obtained via face-to-face interviews, and earlier reports from this study using the SERBAS-Y-HM provide evidence of the construct validity of this interviewer-administered assessment (e.g., Rosario, Hunter, Maguen, Gwadz, & Smith, 2001; Rosario, Mahler, Hunter, & Gwadz, 1999; Rosario, Schrimshaw, & Hunter, 2004).
The present sub-study has limitations. First, the sample size for the test-retest study was limited. Although we had a sufficient sample to examine reliability separately for male and female youths, we had insufficient numbers to examine potential ethnic/racial differences in reliability. A second limitation is that the sample was recruited from GLB-focused organizations in a major urban area. As such, these GLB youths may not be representative of the population of GLB youths. These youths may have been further along in the development of their GLB identity and more comfortable discussing their sexuality than youths who might not be involved in GLB organizations. As such, these youths’ reports may have been more reliable than might be found among samples less comfortable with their sexuality. Similarly, the findings from this ethnically diverse and urban sample may not generalize to other GLB populations. A third potential limitation is the use of a two-week test-retest period. Although the two-week retest is recommended by psychometric texts to prevent recall (e.g., Nunnally, 1978) and is sufficiently brief to help ensure that new behaviors did not occur between test and retest (thereby biasing the reliability estimates), this brief retest period might increase the possibility of participants recalling their original responses, artificially inflating their reliability coefficients. As such, future reliability research may wish to employ longer test-retest periods to determine whether the reliability of reports observed here is replicated over longer periods (but not so long as to assess behaviors in two non-overlapping time periods). Finally, although this report demonstrated that GLB youths were able to reliably report sexual information, it does not provide any information about the validity of these reports. Although reliability is necessary for validity, the reverse is not true. Thus, the high reliabilities identified here do not necessarily indicate that youths were accurate in their reports of sexual information. Future research into the validity of sexual reports is needed.
Despite these limitations, the findings provide preliminary but critical information regarding the reliability of self-reported sexual information among GLB youths. However, given the importance of reliable reports of sexual information and the scarcity of empirical reports examining reliability, future research is needed into the reliability of self-reported sexual information among all groups, including adolescents and GLB individuals.