The findings overall indicate that various personality features of movie characters can be rated reliably by relatively untrained raters with only very basic knowledge about the nature of personality and personality disorders. The perceived difficulty of rating personality disorder criteria declined somewhat across the successive movies.
The use of specific criteria resulted in better discrimination than the use of global rating scales, but even with global rating scales, agreement could be detected beyond chance, accounting for more than three-quarters of the observed variance.
In contrast to the assessment of real-life patients, all raters were able to observe exactly the same behaviour for each target, and the range of situations and contexts that raters could observe was far wider than would usually be available. In real-life settings, clinicians will often have to observe patients within a single limited setting (e.g., counselling, therapy, group therapy, milieu therapy), and that setting may not even be the same for different clinicians (e.g., observing the patient on different days). However, rating movie characters also differs from another common situation, in which a co-interviewer rates criteria based on a semi-structured clinical interview. In that situation, inter-rater agreement is expected to be much higher than what was observed in this study. Thus, a strength of this study is that it shows that personality disorder traits can be inferred even in the absence of direct questions about the specific criteria.
The rating of movie characters may serve to illustrate many points in the assessment of personality disorders.
First of all, raters get the point that some people, whilst clearly disturbed and stressed as a result of life-long patterns of maladaptive behaviour, such as Aileen Wuornos and Sarah Morton, do not fit any single prototypical personality disorder profile [26
Thirdly, raters get an impression of the relative precision (and lack thereof) of ratings of personality. Some students may perceive personality disorder diagnoses as nearly arbitrary or prejudice-driven labels, and the experience that agreement can be achieved may help them understand that there is something more than a label to personality disorders. Others with an unrealistic faith in the diagnostic system may find that rating personality disorders is more difficult than expected, and that although agreement is substantial, behaviours and reactions in the same person are sometimes experienced differently by different raters, even when the same diagnostic system is used to rate the behaviours.
Several limitations to this study must be acknowledged. The four movies selected did not represent a wide range of different personality traits and personality disorders. For instance, the variance of the TIPI Openness to Experience scale is nearly nil, and there are no high scorers on avoidant or dependent personality disorder, either by the rating scales or by the criteria. Therefore, the level of agreement that could be reached for these traits and disorders is constrained by the narrow range they represent. Thus, although it is tempting to suggest that differences in agreement between features are due to differences in the degree to which behaviours are easily observable (e.g., conscientiousness, extraversion, Cluster B personality disorder features), an equally justifiable interpretation is that the amount of variance for some personality traits (e.g., openness to experience, paranoid, schizoid, avoidant and dependent personality disorder) was simply too small to ensure reasonable agreement.
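The restriction-of-range argument can be illustrated with a small simulation. The sketch below is not the analysis used in this study: it assumes a one-way random-effects ICC(1,1), normally distributed true scores and rater noise, and 200 hypothetical targets (far more than the characters rated here) so that the estimates are stable. The function names `icc_oneway` and `simulate` are introduced for illustration only. The rater noise is kept constant, and only the spread of the true trait levels across targets is changed.

```python
import random


def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) for a targets x raters table.

    `ratings` is a list of rows, one row per target, one value per rater.
    """
    n = len(ratings)       # number of targets
    k = len(ratings[0])    # number of raters per target
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Mean square between targets and mean square within targets.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def simulate(true_sd, noise_sd=1.0, n_targets=200, n_raters=8, seed=0):
    """Simulate ratings = true score + rater noise and return the ICC.

    All parameter values are illustrative assumptions, not study data.
    """
    rng = random.Random(seed)
    table = []
    for _ in range(n_targets):
        true_score = rng.gauss(0.0, true_sd)
        table.append(
            [true_score + rng.gauss(0.0, noise_sd) for _ in range(n_raters)]
        )
    return icc_oneway(table)


# Same rater noise in both cases; only the spread of true scores differs.
# Population ICC = true_var / (true_var + noise_var).
wide = simulate(true_sd=2.0)    # population value 4 / (4 + 1) = 0.8
narrow = simulate(true_sd=0.3)  # population value 0.09 / 1.09, below 0.1

print(f"wide range ICC:   {wide:.2f}")
print(f"narrow range ICC: {narrow:.2f}")
```

Under these assumptions, raters who are equally precise in both conditions obtain a much lower ICC when the targets barely differ on the trait, which is why low agreement on a trait with little variance need not mean that the trait is harder to observe.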
Also, the perceived difficulty of rating the various areas must be interpreted with caution, given that all raters saw the films in the same order. Had the order been varied across raters, changes in perceived difficulty over time could have been separated from differences between the films and interpreted with much more confidence.
Thirdly, we do not know whether these raters were better or worse than the average psychology student at rating personality features. What we do know is that they did not consider themselves experts. The absence of a gold standard for the ratings makes it difficult to conclude anything about the movie characters, beyond simply stating that 8 independent raters reached similar results when rating these 4 movies.
Another limitation is the use of relatively untrained psychology students. With regard to agreement, however, it seems likely that experts could do little better in terms of Cluster B personality disorder features, where agreement is unlikely to exceed the values observed here (ICCs of 0.46–0.54 for global rating scales and 0.75–0.89 for criteria counts).
A next step in evaluating the use of fiction as a tool for practicing assessment of personality disorder would be to conduct a study of the inter-rater agreement on real patients before and after practice with fiction, and to identify factors in movies that foster the learning experience of rating personality features in movies. For instance, is it more helpful to assess ambiguous characters, easily rated characters, or a mixture of ambiguous and easily rated characters?