Inter-rater reliability refers to the consistency with which a particular assessment method comes to the same conclusions when applied by different raters to the same body of information. Reliability is particularly important when there is no ‘gold standard’, as is the case in making psychiatric diagnoses. Where the result of rater judgement is a classification, two classes of agreement measure have been used: those based only on the percentage of subjects receiving the same classification from the raters (‘percentage agreement’) and those that also incorporate a ‘correction for chance agreement’. This study reports percentage agreement, Cohen’s kappa (the most widely used of the chance-corrected measures) and variants of this statistic incorporating adjustments for ‘bias’ and ‘prevalence’. Proportion (percentage) agreement varies between 0 and 1 (0 and 100) and is readily interpretable. Kappa can take negative values, but in practice it too mostly lies between 0 and 1; its interpretation is less straightforward. Byrt et al. (1993) showed that kappa may be expressed as a function of percentage agreement, the disparity between ‘yeses’ and ‘noes’ among the agreements (‘prevalence’) and the disparity in the proportions of yeses between the raters (‘bias’). Departure from a 50/50 split of yeses and noes among the agreements lowers kappa, while departure from equality of the raters’ proportions of yeses increases it. It is therefore wise to keep prevalence and bias in mind when interpreting kappa.
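To make these relationships concrete, the following is a minimal sketch (in Python, using illustrative cell proportions rather than data from this study) of how percentage agreement, kappa, the prevalence and bias indices, and the prevalence-adjusted bias-adjusted kappa (PABAK) of Byrt et al. (1993) can be computed from a two-by-two agreement table.

```python
# Sketch of the Byrt et al. (1993) decomposition for a 2x2 agreement table.
# Cell proportions (a + b + c + d = 1): a = both raters say 'yes',
# b = rater 1 only, c = rater 2 only, d = both say 'no'.

def agreement_stats(a: float, b: float, c: float, d: float) -> dict:
    po = a + d                           # observed (proportion) agreement
    p1, p2 = a + b, a + c                # each rater's proportion of 'yes'
    pe = p1 * p2 + (1 - p1) * (1 - p2)   # chance-expected agreement
    kappa = (po - pe) / (1 - pe)         # Cohen's kappa
    pi = a - d                           # prevalence index
    bi = b - c                           # bias index
    # Equivalently (Byrt et al. 1993):
    # kappa = (2*po - 1 - pi**2 + bi**2) / (1 - pi**2 + bi**2)
    pabak = 2 * po - 1                   # prevalence- and bias-adjusted kappa
    return {"percent_agreement": 100 * po, "kappa": kappa,
            "prevalence_index": pi, "bias_index": bi, "PABAK": pabak}

# Illustrative case: 90% agreement but a heavy imbalance of 'noes'.
print(agreement_stats(a=0.05, b=0.05, c=0.05, d=0.85))
```

In this illustrative case, despite 90% agreement, the strongly unbalanced prevalence (prevalence index −0.8) depresses kappa to about 0.44, whereas PABAK remains 0.80; this is why prevalence and bias are worth reporting alongside kappa.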
In considering the inter-rater reliability of psychiatric diagnoses in persons with ID, it is instructive to consider the corresponding reliability in persons without communication or cognitive difficulties, since this presumably sets the upper limit of what is achievable with persons with ID. Here, the present authors consider the available data relating to the diagnoses of psychosis and depression.
The DSM-IV Field Trial for Schizophrenia and Related Psychotic Disorders (American Psychiatric Association, 1992) examined the reliability and concordance of three alternative sets of options for diagnosing DSM-IV psychotic disorders, together with the criteria from DSM-III, DSM-III-R and ICD-10. In the international field trials for ICD-10, in which inter-rater agreement was based on diagnoses at the ‘two-character’ group level, kappa values of 0.82 for schizophrenic disorders and 0.66 for depressive episode were reported (Sartorius et al. 1993). However, subsequent studies using less structured formats have reported lower kappa values. Way et al. (1998) videotaped 30 emergency department psychiatric assessments, which were then re-rated by eight different psychiatrists; kappa values of 0.64 for psychosis and 0.48 for depression were reported.
In a review examining the accuracy of the clinical examination for diagnosing depression, Williams et al. (2002) found seven studies using the Structured Clinical Interview for DSM Diagnoses (Spitzer et al. 1979) in which inter-rater reliability for major depression was evaluated. Study designs ranged from multiple clinicians viewing a videotaped interview to paired clinicians conducting sequential interviews, with rater training varying from psychology trainees to experts in the field of mood disorders. Kappa values ranged from 0.64 to 0.93. They found a further seven studies evaluating the inter-rater reliability of DSM-IV diagnoses using non-standardized interviews, thus more closely simulating clinical practice, as in the current study. With the exception of one study using a videotaped interview, the study designs involved paired, generally blinded, interviewers conducting joint or sequential interviews. Here, the kappa values ranged from 0.55 to 0.74.
In a smaller study, Miller (2001) compared the inter-rater reliability of structured versus unstructured interviews and found that traditional, unstructured diagnostic assessment gave kappa values from 0.24 to 0.43, whereas inter-rater agreement using the SCID-CV, a computer-assisted diagnostic interview, gave a kappa of 0.75. A number of other studies (Keller et al. 1995; Roy et al. 1997; Shear et al. 2000; Simpson et al. 2002) support the finding that, in individuals without ID, inter-rater agreement for diagnoses of psychosis and depression yields kappas in the range 0.6 to 0.8, with structured assessments producing higher values than unstructured ones.
As previously noted, there has been little research on the inter-rater reliability of either the DSM or the ICD diagnostic system when used to assess persons with an ID, and there continues to be a relative lack of suitable alternative techniques for the detection and diagnosis of psychiatric morbidity in this population. Screening instruments, which tend to lack the depth required for accurate diagnosis, include the Psychopathology Instrument for Mentally Retarded Adults (PIMRA; Matson et al. 1984), the Reiss screen (see Sturmey et al. 1995) and the Diagnostic Assessment for the Severely Handicapped (DASH) scale (Matson et al. 1991). Moss & Goldberg (Moss et al. 1993) developed the Psychiatric Assessment Schedule for Adults with a Developmental Disability (PAS-ADD), a semi-structured interview for use with persons with an ID that is based on the diagnostic criteria used in ICD and DSM. From the PAS-ADD item set, 13 possible symptom syndromes, such as ‘depressed mood’ or ‘situational anxiety’, can be generated by the CATEGO computer algorithm. While a detailed account of the psychometric properties of these measures is outside the scope of this report, none has documented sufficient validity to warrant its use as a structured assessment protocol for the diagnosis of psychosis or depression in people with ID.
We sought to add to this evidence by measuring the extent to which experienced clinicians agreed about a diagnosis of depression or psychosis in individuals with ID.