On the basis of the results presented here, we conclude that telephone and in-person assessment with the PHQ-9 yield similar results. Additionally, the internal consistency of the telephone-administered PHQ-9 was similar to that of the self-administered PHQ-9. Thus, telephone administration of the PHQ-9 seems to be a reliable procedure for assessing depression in PC.
In the present study, questionnaire items were identical but the administration procedure differed. According to Helzer et al.,20 we can describe the present study as a study of procedural validity, having characteristics of both reliability, because the same measure was used twice, and validity, since telephone administration was compared with a gold standard (self-administration).
Intraclass correlation coefficients between the self-administered and telephone-administered PHQ-9 were excellent regardless of administration procedure order (ST or TS) or administration procedure (telephone- or self-administration). Moreover, item concordance analysis (weighted κ) of each group revealed good or moderate agreement for all items, showing adequate procedural validity for the telephone-administered PHQ-9 and good test–retest reliability for both the self- and telephone-administered PHQ-9. Additionally, the internal consistency of the telephone-administered items was high (between 0.79 and 0.85 depending on the group) and very close to that of the self-administered items.
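As an illustration of the two item-level statistics mentioned above, the following is a minimal sketch of how internal consistency (Cronbach's α) and a weighted κ could be computed for PHQ-9 item responses scored 0 to 3. It is not the analysis code used in this study: the function names, the choice of linear versus quadratic weights, and the toy data are assumptions made for illustration only.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                                # number of items
    item_variances = items.var(axis=0, ddof=1)        # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)


def weighted_kappa(first, second, n_categories=4, weights="linear"):
    """Weighted kappa between two administrations of one ordinal item.

    Responses are assumed to be integers in 0..n_categories-1.
    The weighting scheme is an assumption; the source does not state it.
    """
    first = np.asarray(first, dtype=int)
    second = np.asarray(second, dtype=int)

    # Observed joint distribution of (first, second) responses.
    observed = np.zeros((n_categories, n_categories))
    for i, j in zip(first, second):
        observed[i, j] += 1
    observed /= observed.sum()

    # Joint distribution expected by chance from the marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))

    # Disagreement weights grow with the distance between categories.
    idx = np.arange(n_categories)
    distance = np.abs(idx[:, None] - idx[None, :]).astype(float)
    if weights == "quadratic":
        distance = distance ** 2

    return 1.0 - (distance * observed).sum() / (distance * expected).sum()


if __name__ == "__main__":
    # Hypothetical responses of six participants to one item, given twice.
    self_administered = [0, 1, 2, 3, 0, 1]
    telephone = [0, 1, 2, 2, 0, 1]
    print(weighted_kappa(self_administered, telephone))
```

Weighted κ gives partial credit when the two responses fall in adjacent categories, which is why it is commonly preferred over unweighted κ for ordinal items such as those of the PHQ-9.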
A high and significant positive correlation was observed between the self-administered and telephone-administered PHQ-9. Furthermore, the correlation between the two procedures was even higher than that obtained by Kroenke et al.12 in their validation study of the PHQ-9 as a depression severity measure. While in that study the correlation between self-administration and a telephone reappraisal performed within 48 hours was 0.84, in the present study no group (i.e., ST, TS, TT, or SS) showed a correlation below 0.90.
PHQ-9 mean comparisons revealed a significant tendency toward lower scores for telephone administration. However, the differences were minor (0.60 points) and probably lacked clinical relevance. In fact, according to Kroenke et al.,12 depression severity measured with the PHQ-9 is considered to change qualitatively every 5 points. Thus, we should be cautious not to overstate this point.
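To make the scale of this difference concrete, the sketch below maps a PHQ-9 total score (0 to 27) to the commonly cited 5-point severity bands and expresses the observed 0.60-point mean difference as a fraction of one band. The function name and the band labels as written here are illustrative assumptions, not part of the present analysis.

```python
def phq9_severity(score):
    """Return the severity band for an integer PHQ-9 total score (0-27)."""
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 total score must be between 0 and 27")
    # Bands change every 5 points: 0-4, 5-9, 10-14, 15-19, 20-27.
    bands = ["minimal", "mild", "moderate", "moderately severe", "severe"]
    return bands[min(score // 5, len(bands) - 1)]


if __name__ == "__main__":
    band_width = 5          # points per severity band
    mean_difference = 0.60  # telephone vs. self-administration, from the text
    print(phq9_severity(4), phq9_severity(5))
    print(mean_difference / band_width)
```

A mean shift this small would move a respondent into a different severity band only if the score were already at a band boundary.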
Although both procedures used the same questions (items), it is possible that answering alone (i.e., the self-administered PHQ-9) enhanced personal acknowledgment of certain characteristics that, when answering to someone else (over the telephone), could have been inhibited, either because of distrust or lack of privacy. As Evans et al.21 pointed out, “it is less easy to ensure privacy in a telephone interview, because the interviewer does not know who else may be present, possibly inhibiting disclosure by the subject” (p. 161). In the same way, Rohde et al.22 suggested that when scheduling a telephone assessment, the interviewer should try to set up a time when the participant can talk in private.
Additionally, PHQ-9 mean comparisons also revealed a statistically significant tendency toward lower scores on reappraisal assessments, that is, participants showed lower scores on the second PHQ-9 assessment for the ST plus TS group, and for the TT group. Two studies comparing face-to-face and telephone interviews found the same tendency.22,23
According to Jorm et al.,24 when psychiatric symptoms or personality traits are assessed twice, a mean change in scores toward less psychopathology is often observed. This retest artifact does not seem to be related to the time lag between occasions and seems to be confined to measures that assess negative self-characteristics and are administered orally by an interviewer. Some hypotheses proposed to explain it are as follows: (1) regression to the mean, (2) therapeutic effects of the first interview, (3) participants trying to create a more favorable impression on retest, or (4) respondents taking the second evaluation less seriously. Any of these hypotheses is plausible for the present study. Unfortunately, our results do not allow us to clarify this point.
Limitations of the present study and of telephone interviewing must be acknowledged. First, participants were not randomized to the 4 groups, and while PCC patients formed the ST, TS, and TT groups, PCC staff members formed the SS group. This may explain the differences in socio-demographic characteristics among groups. For example, individuals in the SS group were younger and more educated, and most of them were currently working. However, because the analyses were conducted within groups, we believe that these differences do not represent a major methodological concern. Moreover, the ST and TS groups (those directly related to procedural validity testing) were more similar to each other, differing only in mean age and years of formal education.
Second, differences related to age, educational level, or gender were not considered when comparing the telephone- and self-administered PHQ-9 because of the small sample size in each socio-demographic category. Agreement between the telephone- and self-administered PHQ-9 may vary with these characteristics, and the telephone administration procedure could therefore be less valid for certain populations. For example, it has been reported that telephone responses from older people might differ from face-to-face assessments for the General Health Questionnaire.21
Third, as indicated by the low mean PHQ-9 scores (between 3.61 and 6.19), our sample included only a few participants with high levels of depression severity. Therefore, our results might not be representative of patient samples with higher levels of depression severity or with a wider range of depression severity. This potential “bottom effect” may limit the generalization of our findings.
Fourth, the brief time interval between assessments could have favored recall of initial answers. However, a brief interval could also represent an advantage, as it reduces the opportunity for true change within subjects.
Fifth, during a telephone interview it is harder to ensure privacy, because the interviewer does not know who else may be present, possibly inhibiting disclosure by the participant. The importance of developing a rapport between the interviewer and the participant before gathering sensitive information has been pointed out.22 This could be harder to do over the telephone. In the present study, PCPs asked their patients to participate as a way of fostering trust. In any case, we do not know the extent to which this was achieved.
Finally, telephone interviewing may selectively exclude participants who do not have a telephone, which could bias the results.
Future research concerning agreement between responses to self-administered and telephone-administered PHQ-9 or other scales could attempt to explore possible differences emerging from gender, educational level, or age. Additionally, reassuring the respondent regarding privacy as much as possible may favor the validity of the assessment.