In our analysis, the ISP-D was found to be a reliable and valid tool for Internet users. The reliability and validity of data collected through the Internet have been questioned because the data are based on information reported directly by patients and data gathering is administered remotely. In this study, the overall test-retest reliability of the ISP-D was excellent within a 2-week interval. Regarding the test-retest reliability of each diagnosis within a 2-week interval, the reliabilities of MDD and SSD were excellent. Only the reliability of MinD was low, which may be related to the relatively small number of cases of MinD. It is also possible that MinD was not a stable diagnosis or that it may have been a stage of another depressive disorder. The high reliability of SSD implies that this diagnosis is worthy of further studies of validity and psychopathology. As the test-retest interval increased to greater than 2 weeks, the test-retest reliabilities of MDD and SSD decreased. Because MinD has similar symptomatic criteria to MDD and SSD, the good reliabilities of MDD and SSD make it less likely that the low reliability of MinD stems from problems with the questionnaire design or from bias introduced by the Internet as a medium.
The analysis of test-retest reliability for each depressive symptom showed that the reliabilities for "diminished interest or pleasure", "fatigue or loss of energy", and "thoughts of death" were excellent and that those for other symptoms were fair within a 2-week interval. The reliabilities were poor only for "depressed mood" and "thoughts of death" with a time interval between 2 and 4 weeks, and were poor only for "decreased appetite", "activity disturbance", and "fatigue or loss of energy" with a time interval greater than 4 weeks. The test-retest reliability with a time interval of greater than 2 weeks may be affected by the fluctuating course of depressive disorders themselves. According to Judd et al. [35], the symptomatic course of MDD is dynamic and changeable, and MDD, MinD, and SSD symptom levels commonly alternate over time in the same patients as a symptomatic continuum of illness activity of a single clinical disease. Cuijpers and Smit found that the incidence of MDD in subjects with SSD is higher than in subjects without SSD [36]. Thus, in spite of a possible fluctuating course, the test-retest reliability of each depressive symptom in the ISP-D was fair to excellent within a 2-week interval.
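As an illustration of how test-retest agreement of this kind can be quantified, the following minimal sketch computes Cohen's kappa for two administrations of a categorical diagnosis. It is not the authors' analysis code: the function names, the example labels, and the data are hypothetical, and the interpretation bands (above 0.75 excellent, 0.40 to 0.75 fair to good, below 0.40 poor) are an assumption based on commonly used benchmarks, since the cut-offs behind "excellent", "fair", and "poor" are not stated in this section.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("both administrations need the same non-zero length")
    n = len(first)
    # Observed agreement: proportion of participants with identical labels.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: product of the marginal proportions, summed over labels.
    counts_1, counts_2 = Counter(first), Counter(second)
    labels = set(counts_1) | set(counts_2)
    expected = sum(counts_1[k] * counts_2[k] for k in labels) / (n * n)
    if expected == 1.0:
        # Only one label was ever used; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def interpret(kappa):
    """Assumed benchmark bands; the source does not state its cut-offs."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"


# Hypothetical diagnoses for eight participants at test and retest.
test_dx = ["MDD", "MDD", "none", "none", "SSD", "SSD", "MDD", "none"]
retest_dx = ["MDD", "MDD", "none", "SSD", "SSD", "SSD", "MDD", "none"]

kappa = cohen_kappa(test_dx, retest_dx)
print(round(kappa, 3), interpret(kappa))
```

Because kappa corrects for chance agreement using the marginal frequencies, a diagnosis with very few cases can yield a low kappa even when raw agreement is high, which is consistent with the explanation offered above for MinD.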
For the validity study, the M.I.N.I. was applied in the clinical interview. The concordance between the M.I.N.I. and SCID-P diagnoses was demonstrated to be very good (sensitivity 96%, specificity 88%, PPV 87%, and NPV 97%) [17]. κ values of inter-rater reliability for the M.I.N.I. and for test-retest reliability of MDD were previously reported to be 1.00 and 0.87, respectively [17], indicating that the M.I.N.I. is a good criterion for validity. To date, only one previous study has attempted to evaluate the validity of a Web-based instrument for depression diagnosis [25]. In that study, Farvolden et al. showed that the WB-DAT had sensitivity, specificity, PPV, and NPV of 79%, 89%, 75%, and 93%, respectively, relative to SCID-I/P diagnosis for MDD [25]. Our result is similar to that of Farvolden et al. and confirms that the Internet may be a valid medium for the assessment of depression. Farvolden et al.'s study differed from the present study in that their participants were recruited from several clinical research projects and the test was performed in a clinical environment. By contrast, our study participants were recruited remotely, directly from the Internet, and the ISP-D was used to evaluate depressive disorders of differing severities, including MDD, MinD, and SSD. All the other validity studies for depression have been conducted in writing (pen and paper tests). One such study by Haringsma et al. [37] used the same assessment tool as that used in our study (the M.I.N.I.) to assess the criterion validity of the CES-D in a sample of self-referred seniors. They found that with the optimal cut-off score of 25 for MDD, sensitivity was 85%, specificity 64%, and PPV 63%. In Bagby's review article [38], the mean sensitivity, specificity, PPV, and NPV of the Hamilton Depression Rating Scale from 7 studies were found to be 76%, 91%, 77%, and 92%, respectively. In Nyklicek's study [39] of the Edinburgh Depression Scale (EDS) with 951 randomly selected women of peri-menopausal age, test sensitivity, specificity, PPV, and NPV were 58.8%, 95.0%, 49.1%, and 91.7%, respectively, with a cut-off score of 12. The results of our study demonstrated that screening for depression via the Internet may have validity similar to that of screening tests conducted in writing, most of which have been reported to be highly sensitive and specific [40].
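For readers comparing the figures above, the four validity indices can be derived from a 2×2 table of instrument result against criterion diagnosis. The sketch below is illustrative only: the function names and the counts are invented for the example and are not data from this or any cited study. The second function applies Bayes' rule to show that PPV depends on the prevalence of the disorder in the sample, which is one reason PPV values from differently recruited samples are not directly comparable.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Validity indices from a 2x2 table (instrument vs. criterion diagnosis).

    tp: instrument positive, criterion positive
    fp: instrument positive, criterion negative
    fn: instrument negative, criterion positive
    tn: instrument negative, criterion negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # criterion cases detected
        "specificity": tn / (tn + fp),  # criterion non-cases correctly cleared
        "ppv": tp / (tp + fp),          # positives that are true cases
        "npv": tn / (tn + fn),          # negatives that are true non-cases
    }


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """PPV implied by Bayes' rule for a given prevalence of the disorder."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Invented counts for illustration: 50 criterion cases, 100 non-cases.
metrics = diagnostic_metrics(tp=40, fp=10, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")

# Same sensitivity and specificity, but a lower assumed prevalence of 10%.
print(f"ppv at 10% prevalence: {ppv_at_prevalence(0.8, 0.9, 0.10):.2f}")
```

Sensitivity and specificity are properties of the instrument against the criterion, whereas PPV and NPV also reflect how common the disorder is among those tested.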
Despite the satisfying results of the present examination of the reliability and validity of the ISP-D, the study has several limitations. The first limitation is the potential for a self-selection effect. The high prevalence of MDD, MinD, and SSD in the study may be due to self-selection of the participants. That is, people who had depressive symptoms may have been more motivated to participate in the study than non-depressed persons in the general population. On the other hand, many severely depressed patients may not have access to the Internet. The self-selection effect was apparent in the demographic characteristics of our study sample, with the majority of participants being young, single, well-educated women. This sampling bias limits our ability to generalize our findings to the general population. The second limitation is the relatively low response rate in the test-retest reliability study, a characteristic inherent to Internet studies [41]. Importantly, our statistical analysis showed that the sociodemographic characteristics did not affect the agreement of the test-retest reliability or the validity of the ISP-D, despite these limitations. The third limitation is that the kappa statistics for each diagnosis were recoded and recalculated. These values should be interpreted with caution because of possible recoding bias.
Because the mean Internet interview time for the ISP-D was relatively short, and participants did not need to go to a clinic, the ISP-D may be useful as an auxiliary tool for screening or follow-up of depression as an alternative to the face-to-face interview. There is little additional cost to online interviews beyond maintenance of the system on the server. Computer-aided interviewing gives standardized information about a patient's psychopathology and diagnosis. It allows patients to work at their own pace and is available whenever a computer terminal is available. Furthermore, results can be scored and presented to the patients and/or clinicians immediately. Indeed, the standardized manner of administration and scoring of computer-administered rating scales may actually improve reliability and ensure greater completeness of the information gathered [18].
The ISP-D can generate personalized reports, which summarize each individual's responses and possible diagnostic categories. The automatically generated final report was designed to be printed and shared with a health care professional. In addition, links to related Internet articles and information about resources such as clinics and hospitals, virtual clinics, and online groups can be provided when participants have positive findings.