As predicted, we found that a single trait of autobiographical memory specificity seems to account for responses on the AMT. Across three samples, a one-factor model fit the data well, and its fit did not differ significantly from that of models that took cue word characteristics into account. Because all of the items in the AMT were indicative of a single construct, the best measure of autobiographical memory specificity is one that incorporates information from all of the items. Investigators should therefore be cautioned that so-called valence effects might be due not to the emotional associations of words, but rather to idiosyncratic psychometric properties of items.
In addition, IRT analyses demonstrated that when the traditional instructions are used, the AMT measures autobiographical memory specificity most precisely for individuals who are low on this trait. When minimal instructions were given, the AMT elicited more overgeneral memories and was most precise for people of average ability on the latent trait of memory specificity. Thus the AMT, as it is traditionally implemented, may have limited utility for discriminating individuals across a wide range of autobiographical memory specificity ability.
As described below, our IRT analyses show that item characteristics can vary within a cue word type, even if the words have been equated in the usual ways (e.g., matching on length and usage frequency). This finding is not surprising, and it has implications for examining valence effects. Researchers who find a main effect of valence may need to consider the possibility that their findings are due to idiosyncrasies in a particular pool of words. If an easy (in an IRT sense) set of negative cue words were compared to a difficult set of positive cue words, then one might find a main effect of word type. Consequently, it would be tempting to conclude that the negative content of the cue words elicited more specific memories. However, the possibility that the findings were due to differential cue word characteristics, rather than to the emotional content of the word, could not be ruled out.
To make a strong case for valence effects, researchers would need to examine a large comprehensive corpus of words in a large sample of participants. IRT could be employed to examine characteristics of individual cues. Different sets of words (e.g., positive and negative words) could then be equated on item-slopes and thresholds, which subsequently could provide a fair test of valence effects. Valence effects are inconsistent in the literature, so analyses that rule out alternative explanations, such as psychometric differences across positive and negative word sets, would be a valuable addition to the OGM literature.
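One way such equating might be approached is sketched below. The greedy nearest-neighbour rule, the distance tolerance, and all cue names and parameter values are hypothetical choices made for illustration; they are not an established equating procedure and not results from this study.

```python
import math

def match_items(positive, negative):
    """Greedily pair each positive cue with the closest unused negative cue.

    Closeness is Euclidean distance in (slope, threshold) space. This rule
    is an illustrative assumption, not a standard equating method.
    """
    unused = dict(negative)
    pairs = []
    for word, params in positive.items():
        if not unused:
            break  # no negative cues left to pair with
        best = min(unused, key=lambda w: math.dist(params, unused[w]))
        pairs.append((word, best, math.dist(params, unused[best])))
        del unused[best]
    return pairs

# Hypothetical cues with (slope, threshold) parameters.
positive = {"pos_1": (1.2, -1.4), "pos_2": (1.0, -0.6), "pos_3": (1.6, -0.9)}
negative = {"neg_1": (1.1, -0.5), "neg_2": (1.5, -1.0),
            "neg_3": (1.3, -1.3), "neg_4": (0.7, 0.2)}

pairs = match_items(positive, negative)
# Keep only pairs whose parameters are close (tolerance is arbitrary).
matched = [(p, n) for p, n, d in pairs if d <= 0.25]
```

Positive and negative sets built from the retained pairs would have similar slopes and thresholds, so that a remaining difference between the sets would be harder to attribute to psychometric properties alone.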
For each item in the three datasets using the traditional AMT instructions, the threshold that separated specific and non-specific memories was below zero on the latent trait of autobiographical memory specificity. Thus, each cue word was more likely than not to elicit a specific memory among people of average ability. This is also true for people of below average ability, to a point, because the highest threshold obtained in the samples that received the traditional instructions was −.33 (see –.). Thus individuals who are low on autobiographical memory specificity may perform well on an AMT that uses the traditional instructions. Indeed, Raes et al. (2007) noted the low frequency of overgeneral memories retrieved by non-clinical samples with the traditional AMT. Consequently, the traditional AMT may be less apt to find differences among samples with better memory functioning. Our IRT results confirm this observation, and offer suggestions for future research with the AMT.
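The reasoning about thresholds can be made concrete with a small worked example. The sketch assumes a logistic item response function with a slope of 1.0; both the functional form and the slope are assumptions made for illustration, and only the threshold of −.33 comes from the text.

```python
import math

def p_specific(theta, threshold, slope=1.0):
    """Probability of a specific memory at trait level theta.

    The logistic form and the default slope of 1.0 are assumptions.
    """
    return 1.0 / (1.0 + math.exp(-slope * (theta - threshold)))

highest_threshold = -0.33  # highest threshold under traditional instructions

p_average = p_specific(0.0, highest_threshold)         # average trait level
p_at_threshold = p_specific(-0.33, highest_threshold)  # exactly at threshold
p_low = p_specific(-1.0, highest_threshold)            # well below average
```

Under these assumptions, a respondent of average ability gives a specific memory to the most difficult cue with a probability of about .58, and the probability falls below .50 only for respondents whose trait level is lower than −.33.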
The results of IRT analyses would be helpful in piloting versions of the AMT for use with various samples. As one example, the threshold for specific memories in response to happy in the YEP dataset was low at a value of −1.40 on the latent trait (see ). Therefore this item may be too easy (in an IRT sense) to help distinguish between individuals who are high and low on autobiographical memory specificity. In contrast, calm had a higher threshold of −.60. Although this threshold is below zero, calm may be a more useful item in AMT research because it should elicit more diverse responses. Examination of item characteristic curves, such as in and , can help to select items that are likely to elicit specific response types that may be of special interest in depression research, such as categoric memories.
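The claim that a higher threshold should yield more diverse responses can be illustrated by comparing the variance of a binary response, p(1 − p), for the two cues among respondents of average ability. The sketch uses the two YEP thresholds quoted above but assumes a logistic response function with a common slope of 1.0, which is an illustrative assumption rather than an estimate.

```python
import math

def response_variance(theta, threshold, slope=1.0):
    """Variance p * (1 - p) of a specific/non-specific response at theta.

    The logistic form and the slope of 1.0 are assumptions.
    """
    p = 1.0 / (1.0 + math.exp(-slope * (theta - threshold)))
    return p * (1.0 - p)

thresholds = {"happy": -1.40, "calm": -0.60}  # thresholds quoted in the text

# Response variance among respondents at the mean of the trait (theta = 0).
variance_at_average = {
    word: response_variance(0.0, b) for word, b in thresholds.items()
}
```

With these assumptions, the variance is about .16 for happy and about .23 for calm, so responses to calm are expected to be more evenly split between specific and non-specific memories.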
The IRT analyses also have implications for how instructions are worded. As seen in , the threshold that separates specific and non-specific memories is higher for the minimal instructions group than for the traditional instructions group. Thus, the minimal instructions version of the AMT may be more useful in samples expected to have average memory functioning, such as college students, whereas the traditional instructions version may be useful in samples with poor memory functioning, such as severely depressed participants. This difference helps elucidate why the AMT is consistently related to depression in clinical samples, but less so in high functioning samples (e.g., Debeer et al., 2005; Raes et al., 2004). However, because the minimal instructions AMT does not explicitly require participants to generate a specific memory, it is possible that some respondents may exhibit a more overgeneral style of reporting memories on this test, even though they would be able to provide a specific memory if prompted to do so. Thus, the two versions of the AMT may be measuring different constructs. Within-participants research using both versions of the AMT would be useful in answering this question.
One limitation of this study is that a low rate of non-specific memories was obtained in our three samples of students. Like other investigators (e.g., Brittlebank et al., 1993; Scott, Williams, Brittlebank, & Ferrier, 1995), we were forced to collapse certain response categories in order to analyse our data. A second limitation is that our word sets were limited. Although we failed to find differences across cue word types, we cannot rule out the possibility that our findings are specific to the small corpus of words that we examined. It is possible that other sets of positive, negative, or other words would yield different results. A third limitation of this study is that depression was not examined as it relates to the structure of the AMT. The YEP dataset contained diagnostic and self-report measures of depression (not used in the current paper), but there were far too few cases of major depressive disorder (n = 58) to factor analyse them as a separate group. IRT analyses yield statistics that are relevant to the properties of a test, but it is unknown whether IRT analyses would yield different results in a clinical sample. Our results may not generalise to clinical samples, but IRT analyses using clinical samples would be a rich direction for future research. A different factor structure for positive and negative AMT cue words might be found in a clinical sample. Finally, our three datasets contained adolescents and young adults. Future studies should extend this research to adult samples.
Some investigators have examined the effects of cue word characteristics other than valence on autobiographical memory specificity. For example, Dalgleish et al. (2003) hypothesised that the effects of cue words on the AMT are due more to word meaning than to valence. Recent studies have begun to investigate this idea. At least three studies have shown that cues that are relevant to an individual’s concerns and self-concept are more likely to elicit non-specific memories than cues without such self-relevance (Barnhofer, Crane, Spinhoven, & Williams, 2007; Crane, Barnhofer, & Williams, 2007; Spinhoven, Bockting, Kremers, Schene, & Williams, 2007). Thus, characteristics of AMT cues may interact with individual difference variables (e.g., rumination) or idiosyncratic concerns. Future studies should examine whether certain cues perform differently in certain subgroups. In the IRT literature, these analyses are referred to as analyses of differential item functioning (Embretson & Reise, 2000). These analyses may yield interesting clinical insights, but such studies would require a range of cues and adequate variation in the traits being studied.
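As a rough illustration of what a screen for differential item functioning involves, the sketch below computes a Mantel-Haenszel common odds ratio for a single cue. This is a classical, non-IRT screening statistic used here only because it is compact; it is not the IRT-based approach referred to above. The grouping variable, the strata, and all counts are hypothetical.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for one cue.

    Each stratum groups respondents with similar overall AMT performance
    and is a tuple of hypothetical counts:
    (reference specific, reference non-specific,
     focal specific, focal non-specific).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_spec, ref_non, foc_spec, foc_non in strata:
        n = ref_spec + ref_non + foc_spec + foc_non
        if n == 0:
            continue  # skip empty strata
        numerator += ref_spec * foc_non / n
        denominator += ref_non * foc_spec / n
    return numerator / denominator

# Hypothetical counts for low, middle, and high overall specificity,
# e.g. low-rumination (reference) versus high-rumination (focal) groups.
strata_with_dif = [(20, 30, 10, 40), (35, 15, 25, 25), (45, 5, 40, 10)]
strata_without_dif = [(20, 30, 20, 30), (35, 15, 35, 15)]

or_with_dif = mantel_haenszel_odds_ratio(strata_with_dif)
or_without_dif = mantel_haenszel_odds_ratio(strata_without_dif)
```

A ratio near 1 suggests that the cue behaves similarly in both groups once overall performance is matched, whereas a ratio well above 1 (about 2.4 for the first set of counts) would suggest that the cue elicits specific memories more readily in the reference group.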
In addition to research on personal relevance, Williams et al. (1999) showed that highly imageable cue words elicit specific memories more often than less imageable cue words. Although studies of mean differences are informative, IRT allows for the examination of specific words within a cue type. Thus IRT can help to explore the effects of individual cue words, and to plan autobiographical memory studies by selecting discriminating cue words with thresholds that are appropriate to the population being studied.
Another possibility for future research would be to conduct a large-scale investigation of cue words to find cues that are maximally informative for different populations. Different sets of cue words could be identified that are discriminating for individuals in the range of the autobiographical memory specificity trait that is most relevant for a particular population, such as college students or patients with major depression. Such a project could establish a common word set to be used in studies with similar individuals, thus facilitating comparison across different studies.
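A selection step of this kind might look like the following sketch, which ranks cues by their information at a target trait level. It assumes a 2PL model; the cue names, parameters, and target trait levels for the two populations are all hypothetical placeholders rather than values from this study.

```python
import math

def item_information(theta, slope, threshold):
    """Fisher information of one binary item under an assumed 2PL model."""
    p = 1.0 / (1.0 + math.exp(-slope * (theta - threshold)))
    return slope ** 2 * p * (1.0 - p)

def select_cues(item_bank, target_theta, k):
    """Return the k cues that are most informative at target_theta."""
    ranked = sorted(
        item_bank,
        key=lambda word: item_information(target_theta, *item_bank[word]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical cue bank: word -> (slope, threshold).
item_bank = {
    "cue_a": (1.4, -1.4),
    "cue_b": (1.4, -0.6),
    "cue_c": (1.4, 0.0),
    "cue_d": (1.4, 0.5),
    "cue_e": (0.6, 0.0),
}

# Target trait levels are illustrative guesses for each population.
student_cues = select_cues(item_bank, target_theta=0.0, k=2)
patient_cues = select_cues(item_bank, target_theta=-1.5, k=2)
```

In this toy bank, the cues chosen for a population centred on the mean differ from those chosen for a population centred well below it, and the weakly discriminating cue is selected for neither.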
Summary and conclusions
To our knowledge, we are the first investigators to use CFA and IRT to examine the psychometric properties of the AMT. Our findings indicate that a one-factor model of autobiographical memory specificity provides a good conceptualisation of AMT performance, at least in non-clinical samples. This result is consistent with the finding that responses to positive and negative cue words are highly intercorrelated (van Vreeswijk & de Wilde, 2004). Additionally, a one-factor model of the AMT is congruent with the notion that overgenerality develops as an overall response style over time, as proposed by the functional avoidance hypothesis (Williams et al., 2007).
The findings of the current study also demonstrate that the AMT is not maximally informative for individuals at all levels of autobiographical memory specificity. Rather, certain characteristics of the AMT, such as particular cue words and the nature of the instructions, may influence the types of responses obtained. These results offer a number of suggestions for ways to modify the AMT to obtain the most relevant outcomes for a particular study. Future AMT studies may be well served by examining IRT parameters for individual cues.