Neuropsychology. Author manuscript; available in PMC 2010 September 3.
Published in final edited form as:
PMCID: PMC2933077

Data-driven methodology illustrating mechanisms underlying word list recall: Applications to clinical research



Word list learning tasks such as the California Verbal Learning Test (CVLT; Delis, Kramer, Kaplan & Ober, 1987) are widely used to investigate recall strategies. Participants who recall the most words generally employ semantic techniques, whereas those with poor recall, such as patients with schizophrenia, rely on serial techniques. However, these conclusions are based on formulas that assume categories reflect semantic associations, bind strategy to overall performance, and neglect strategy changes over the five trials. Therefore, we derived novel measures, independent of recall performance, to compute strategies across trials and identify whether diagnosis predicts recall strategy.


Participants were included based upon performance on the CVLT, namely total words recalled over the five trials. From a large sample, the 50 highest and 50 lowest performers among healthy volunteers (n=100) and also among patients with schizophrenia (n=100) were selected. Novel measures of recall and transition probability were calculated and analyzed by permutation tests.


Recall patterns and strategies of patients resembled controls with similar performance levels: Regardless of diagnosis, low performers were more likely to recall the first two and last four items from the list, and high performers increased engagement of semantic-based transitions across the five trials, while low performers did not.


Cognitive strategy must be considered independently of overall performance before attributing poor performance to degraded learning processes. Our results demonstrate the importance of departing from global scoring techniques, especially when working with clinical populations such as patients with schizophrenia for whom episodic memory deficits are a hallmark feature.


The California Verbal Learning Test (CVLT) is a widely used neuropsychological tool for evaluating verbal memory strategies (Delis et al., 1987). A list of 16 words comprising four semantic categories - spices, tools, fruits and clothes, each containing four words - is presented orally then freely recalled; this occurs five consecutive times per administration. The performance metrics include the number of words recalled during the first recall, words recalled during the fifth (final) trial, the improvement between the first and fifth trial, and overall recall across all five trials. The mnemonic strategies employed are assayed by semantic and serial “clustering scores,” with semantic clustering being operationalized as recall of categorically similar words one after another, and serial clustering defined as the sequential recall of words according to the presentation order.

In general, people who recall the most words employ a semantic clustering technique, whereas those who display poor recall tend to rely on serial techniques (Delis, Freeland, Kramer, & Kaplan, 1988). Indeed, among healthy individuals, the recall strategy employed predicts the number of words successfully recalled in the final trial on the CVLT, more so than performance on the first trial (Paulsen et al., 1995). An underlying assumption is that the semantic strategy during recall reflects to some extent the underlying semantic organization in memory, and indeed the CVLT was designed based upon the theory that semantic strategy is most likely to lead to encoding in long-term memory (Delis et al., 1987). That semantic recall techniques lead to better recall fits well with notions that semantic information is “deeply” encoded through stronger, longer-lasting bonds because of associations with pre-established information in long-term memory, while serial information is “shallowly” encoded in short-term memory because it is only related to the current task (Craik & Tulving, 1975).

While studies have emphasized the superiority of semantic mnemonic strategies, little has been said about the interaction of semantic and serial strategies (Delis et al., 1987). Moreover, opposing serial and semantic clustering patterns (one increasing when the other decreases) have been considered to reflect “the incompatibility of serial learning and semantic organizations strategies” (p.381; Brébion, David, Jones, & Pilowsky, 2004) which perpetuates ideas that verbal memory strategies compete rather than interact and support each other. In fact, classic studies of word list learning show that participants recall the most words when the order of the presented list of words is constant rather than varied for each trial, which suggests that the temporal order of words is an underlying aid in free recall (Jung & Skeebo, 1967). While an increasingly semantically-based strategy can reduce reliance on serial strategy, careful examination of the two approaches reveals that they can operate in concert. In a longitudinal study, healthy participants employed a less serially-based recall strategy over time while increasing semantically-based strategy, but patients with schizophrenia increasingly (though not significantly) relied on both serial and semantic recall strategies (Roofeh et al., 2006).

In order to better understand mechanisms of the strategies during free list recall, it is useful to examine verbal memory in a population for whom impairments are a hallmark feature, such as patients with schizophrenia who display deficits on a wide range of episodic verbal memory tasks (Paulsen et al., 1995). A recent study comparing verbal and visual episodic memory - using the Wechsler Memory Scale-Revised Logical Memory (story) and Visual Reproduction tasks (Wechsler, 1987) - found that patients with schizophrenia performed poorly on measures of verbal and visual learning, but their unaffected siblings performed poorly only on measures of verbal learning, therefore suggesting that verbal processing deficits, rather than memory deficits per se, represent a cognitive phenotype (Skelley, Goldberg, Egan, Weinberger, & Gold, 2008). Attempts to characterize the nature of verbal memory impairment that employ the CVLT report that strategy predicts performance for patients with schizophrenia. As with healthy participants, recall strategy predicted the number of words successfully recalled in the final trial, more so than performance on the first trial (Iddon, McKenna, Sahakian, & Robbins, 1998). However, while both patients and healthy control participants recall an increasing number of words across multiple trials and maintain their respective performance levels in delayed recall (Paulsen et al., 1995), patients employ a less semantically-based strategy, thus recalling fewer words overall (Delis et al., 1987; Hazlett et al., 2000; Hill, Beers, Kmiec, Keshavan, & Sweeney, 2004; Kareken, Moberg, & Gur, 1996; Roofeh et al., 2006). Still, no distinction has been made between the possibility that all patients rely less on semantically-based strategy and the possibility that fewer patients than controls choose to employ semantically-based strategy.
The former assumption fits well with notions that semantic deficits accompany schizophrenia, such as findings of large variance in semantic priming (Minzenberg, Ober, & Vinogradov, 2002; Pomarol-Clotet, Oh, Laws, & McKenna, 2008) or even “widened” category boundaries (Chen, Wilkins, & McKenna, 1994). However, other studies do not provide unequivocal support for the notion of semantic anomalies per se in schizophrenia (Elvevåg, Heit, Storms, & Goldberg, 2005; Elvevåg & Storms, 2003; Prescott, Newton, Mir, Woodruff, & Parks, 2006), and indeed suggest that - at the very least - some semantic knowledge is represented by patients as it is by healthy control participants (Cohen, Elvevåg, & Goldberg, 2005).

A study that parsed patients with schizophrenia into high-achievers, learners, and non-learners based on their performance on the first trial of the CVLT and improvement between the first and last trial scores (i.e., the learning slope) found that patients not only relied heavily on semantic clustering in recall, but that no matter their performance level, they simultaneously employed some degree of serial strategy, supporting the idea that semantic and serial strategies interact (Vaskinn et al., 2008). Examination of our own CVLT data - a large data set from the CBDB/NIMH Schizophrenia Sibling study (Egan et al., 2000) - revealed that high performers, regardless of their diagnostic group (patient or control), seemed to employ both serial and semantic strategies. In addition, strategies appeared to change as a function of learning trial. However, this detail was not captured by the current CVLT scoring method (we describe this in detail below), which prompted us to develop a different approach to analyzing recall that considered recall strategies simultaneously and that would relate these strategies to the changes in performance as a function of learning. Such an approach promises to provide a framework with which to re-examine not only the underlying cognitive mechanisms of poor verbal recall in general but also, crucially, the neurobiology as determined by both functional brain networks and genetic functional polymorphisms. Clearly such a tool would be of enormous utility for understanding individual differences in modulations of the verbal recall process by both illness and genes. Thus, we sought to examine CVLT recall and learning strategies in high and low performing healthy people, as well as in a group for which problems in the verbal memory domain are a hallmark feature, namely patients with schizophrenia.

The Original Formulas of Strategy in Free Recall of Word Lists in the CVLT

The original CVLT scoring rubric was, indeed, an improvement upon global scoring methods that only counted the overall number of words recalled since it developed sub-scores intended to evaluate strategy use (Delis et al., 1987). However, it also established rigid, non-interactive measures. For each participant, the CVLT generated a semantic and a serial Clustering Index (CI). In a given trial, serial clustering was quantified as the observed serial value (OBSser) divided by the expected serial value:


Here, OBSser is the observed number of times two words are recalled in the order they appear on the presentation list and nj is the number of words correctly recalled in trial j. The formula yields a minimum score of 0 and a maximum score of 24.27. Notably, the formula includes the participant’s overall performance score in the denominator. Therefore, serial strategy becomes an element within performance, rather than a factor that might influence one’s performance.
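As an illustration of the observed serial count only (not of the full index, whose expected value is not reproduced here), the following minimal Python sketch counts recalled pairs that follow presentation order. It assumes that a serial pair means two words recalled back to back that were also adjacent, in the same order, on the presentation list; the function name and the short four-word list are hypothetical and are not the CVLT materials.

```python
def observed_serial_pairs(recall, presentation):
    """Count back-to-back recalled pairs that are also adjacent, in the
    same order, on the presentation list (assumed reading of OBSser)."""
    position = {word: idx for idx, word in enumerate(presentation)}
    count = 0
    for first, second in zip(recall, recall[1:]):
        # Intrusions (words not on the list) can never form a serial pair.
        if first in position and second in position:
            if position[second] - position[first] == 1:
                count += 1
    return count


# Hypothetical four-word list, for illustration only.
presentation = ["drill", "paprika", "vest", "grapes"]
recall = ["paprika", "vest", "drill", "grapes"]
print(observed_serial_pairs(recall, presentation))
```

Only "paprika, vest" preserves the presentation order in this toy recall, so a participant who recalls many words in a reshuffled order can still earn a low observed serial count.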

The semantic CI was calculated from a similar formula, with comparable problems:


where OBSsem is the number of clusters of words from the same category within the recall list; for example, the OBSsem for “tangerine, grape, drill, plier” is 2, while for “tangerine, grape, apricot, drill” it is 1. In the denominator, i is the category, ni is the number of words correctly recalled from category i during a particular trial, and nj is the number of words recalled for that trial. A participant therefore can score from 0 to 4, with a higher score signifying greater semantic clustering. Again, the participant’s overall performance contributes substantially to the CI. Furthermore, while repetitions and intrusions contribute to the number of words recalled in each trial, they are not factored into the observed semantic clustering score. Participants' scores are lowered for repetitions and intrusions (both of which, incidentally, the instructions do not advise against). We consider the repetition of words to be a particular retrieval strategy and surmise that this omission in the CVLT formulas may lower the semantic scores of low performing participants. Full descriptions of the formulas can be found in the CVLT manual (Delis et al., 1987).
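The two worked examples of OBSsem can be reproduced with a short Python sketch. It assumes that a cluster is any unbroken run of two or more recalled words from the same category; the function name and the category mapping are illustrative and cover only the words used in the examples.

```python
from itertools import groupby

# Illustrative mapping limited to the words in the examples above.
CATEGORY = {
    "tangerine": "fruits", "grape": "fruits", "apricot": "fruits",
    "drill": "tools", "plier": "tools",
}


def observed_semantic_clusters(recall, category_of=CATEGORY):
    """Count runs of two or more consecutive same-category words
    (assumed reading of the original OBSsem)."""
    runs = groupby(recall, key=lambda word: category_of.get(word))
    # Runs keyed by None are intrusions and are skipped.
    return sum(
        1 for category, run in runs
        if category is not None and len(list(run)) >= 2
    )


print(observed_semantic_clusters(["tangerine", "grape", "drill", "plier"]))    # 2
print(observed_semantic_clusters(["tangerine", "grape", "apricot", "drill"]))  # 1
```

The second recall contains more same-category words in a row than the first yet scores lower, because a run counts once regardless of its length.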

These formulas were recently revised to address dependence on recall performance, disregard for the presentation list length and the method of counting semantic groups (Stricker, Brown, Wixted, Baldo, & Delis, 2002). The revised formulas subtract the expected from the observed scores. Thus, the expected score, including the participant’s overall recall score, has less weight on the strategy scores. In addition, the expected scores were modified to contain the number of words (OBSser) or number of categories (OBSsem) in the presented word list. The change, while mathematically small, is important conceptually. Basing the expected clustering scores on the words recalled rather than the words presented assumes that organizational strategies are executed after retrieval; basing the expected scores on the presentation list deems all words accessible to the same extent. While the revisions addressed these problems, overall performance scores remain central to the CVLT strategy formulas and continue to affect strategy scores.

An additional problem addressed in the formula revisions was that the original observed semantic calculation awarded one point to a cluster of two words from the same category as well as to a cluster of three or four words from the same category (e.g. “drill, pliers” and “tangerines, grapes, apricots”, respectively). Thus, disproportionate categorical groups were considered equal. As participants recall single words, not total categories, and it is intrinsically harder to remember a greater number of words, semantic clusters must also be calculated from single word relationships. This problem was resolved, in the revised semantic index, by awarding one point every time a word is recalled immediately after a word in the same category (from the example above, “tangerines, grapes” and “grapes, apricots” each receive one point). Not addressed in the revisions is the inclusion of repetitions and intrusions in overall performance despite their exclusion from the observed semantic clustering score.
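The revised pairwise count can be sketched in a few lines of Python. This is a hedged illustration of the observed count only, not of the full List-Based Clustering Index; the function name and category mapping are hypothetical, and how an immediate repetition of the same word should be scored is not specified above, so the sketch simply treats it like any other same-category pair.

```python
# Illustrative mapping limited to the words in the example above.
CATEGORY = {
    "tangerines": "fruits", "grapes": "fruits", "apricots": "fruits",
    "drill": "tools", "pliers": "tools",
}


def observed_semantic_pairs(recall, category_of=CATEGORY):
    """One point each time a word directly follows a word from the same
    category (revised observed semantic count, as described above)."""
    return sum(
        1
        for first, second in zip(recall, recall[1:])
        if first in category_of
        and category_of[first] == category_of.get(second)
    )


print(observed_semantic_pairs(["tangerines", "grapes", "apricots"]))  # 2
print(observed_semantic_pairs(["drill", "pliers"]))                   # 1
```

Under the original rule both recalls would have earned a single point; counting pairs lets the three-word run score higher than the two-word run.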

Based upon the changes made to the expected scores as related above, the revised measures are referred to as List-Based Clustering Indices (LBC). As before, measures are calculated for each trial and the average of the five values is the ratio used for analyses.


where OBSser is calculated as in the original serial clustering formula and nj is again the number of words correctly recalled in trial j. This yields a score range of −0.9375 to 14.0625, with a greater score when one recalls a large number of words closer to the presentation list order.

The semantic LBC is calculated from a similar formula where the range is −3 to 9:


Improvements notwithstanding, the new formulas remain problematic. Since serial and semantic ratios are averages of the raw LBC scores for each of the trials, the progression of the scores over time is not considered. Consequently, participants receive performance-based scores that fail to capture adaptations of strategy over time, and thus conceal critical clues about verbal learning processes and, importantly, result in conceptualizations of memory as unchanging over time. Additionally, the scoring system does not allow the investigation of whether serial strategies in initial trials result in more extensive semantic strategies and better recall in later trials. Essentially, the conventional CVLT scoring treats the retrieval process as driven by mutually exclusive recall strategies, namely semantic and serial clustering. Results indicating that serial recall is minimal in high performers could be taken as suggesting that serial strategies - as defined by traditional CVLT formulas - are not useful and should be avoided. However, a constant list-order across trials leads to higher scores, and categories occurring earlier in a list are remembered more often, indicating that serial strategies in fact underlie semantic strategies (Klein, Addis, & Kahana, 2005; Smith, D'Agnostino, & Reid, 1970; Waugh, 1961).

As operationally defined by the CVLT, the categories (spices, tools, clothing and fruits) contain concepts (e.g., food) that fall under the same superordinate class but have hitherto been assumed to be orthogonal. Crucially, if semantic scores do not reflect a participant’s actual grouping strategies, semantic strategy during recall is not being investigated. The CVLT semantic score does not allow for comparison between categories to verify that, within each category, the words are equally semantically related and recalled comparatively often. If particular sets of words share stronger semantic associations, they would contribute disproportionately more to semantic scores.1

A Novel Calculation of Strategy in Free Recall of Word Lists

In order to be agnostic about the possible orthogonalization of recall mechanisms, we adopted a data-driven approach of computing the similarity between participants’ recall orders across all learning trials. Our approach offers more specific information than past methods by calculating probabilities for every possible pair of words rather than for each trial or an average value across the five trials. This allows us to evaluate the effects of individual words, as well as the interaction of the predetermined categories and word order.

There is evidence that widely applicable, non-assumptive approaches are both consistent and sensitive (Ratcliff, Sheu, & Gronlund, 1992). A benefit of the current approach is that it can be modified easily to consider additional measures, such as whether nonadjacent words initiate semantic activation or otherwise affect recall order. However, the open ended nature of the technique also requires solid initial hypotheses. We developed the measures with the goal of investigating the interaction of multiple verbal memory strategies throughout the learning period and expected to find a distinct interaction of semantic and serial strategies. Additionally, we hypothesized that performance level would be as predictive of strategy as diagnosis. As in previous literature, we expected CVLT semantic scores would predict performance level regardless of group.



Participants (n=200) were selected out of approximately 450 patients with schizophrenia and 700 unrelated healthy volunteers from the NIMH Schizophrenia Sibling Study (Egan et al., 2000) based on their performance on the CVLT. We selected the 50 highest and 50 lowest performers among healthy volunteers (n=100) and also among patients (n=100). Performance was defined as total words recalled over the 5 trials in the CVLT, with the number of words recalled on trial 5 used as a secondary measure of performance when the number of words recalled over all trials was equal.
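The selection rule, total words recalled with the trial 5 score breaking ties, can be sketched as a sort on a two-part key. The function name, the participant identifiers, and the data layout (a dictionary of five per-trial counts) are assumptions made for illustration.

```python
def select_extremes(trial_scores, n=50):
    """Return (lowest n, highest n) participant ids.

    trial_scores maps a participant id to a list of five per-trial recall
    counts. Participants are ranked by total words recalled; ties are
    broken by the number of words recalled on trial 5.
    """
    ranked = sorted(
        trial_scores,
        key=lambda pid: (sum(trial_scores[pid]), trial_scores[pid][4]),
    )
    return ranked[:n], ranked[-n:]


# Hypothetical scores; "a" and "b" tie on total (35) and differ on trial 5.
scores = {
    "a": [5, 6, 7, 8, 9],
    "b": [7, 7, 7, 7, 7],
    "c": [10, 12, 13, 14, 15],
    "d": [3, 4, 4, 5, 5],
}
lowest, highest = select_extremes(scores, n=2)
print(lowest, highest)
```

The sketch would be applied once per diagnostic group. It does not guard against the two selections overlapping, which can only happen when a group has fewer than 2n members.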

At the time of administration, the participants were recruited from the community and through the National Institutes of Health (NIH) Normal Volunteer Office. Exclusionary criteria followed those of Egan et al. (2000). All participants were between 21 and 55 years of age. All were free from current alcohol or drug abuse or past dependence, had not suffered loss of consciousness, and did not have medical or neurological problems that might interfere with test performance, including signs of dyslexia. Patients received a DSM-IV (American Psychiatric Association, 1994) schizophrenia spectrum diagnosis as determined by a psychiatrist using the complete Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I; First, Spitzer, Miriam, & Williams, 1997a) and the Structured Clinical Interview for DSM-IV Axis II Personality Disorders (SCID-II; First, Spitzer, Miriam, & Williams, 1997b). Healthy volunteers were free of both Axis I and Axis II diagnoses. All procedures were approved by the National Institute of Mental Health (NIMH) Institutional Review Board and informed consent was obtained from all participants prior to testing. Table 1 lists participants’ scores and demographics.

Table 1
Basic Demographics of the Participants: Mean IQ (WAIS-R), Age, Education, and number of CVLT words recalled. A perfect score across all five CVLT trials is 80 words. Standard deviations are noted in parentheses.


The CVLT (Delis et al., 1987) is a free recall task consisting of two 16-word lists. We used only data from List A. The list comprised four semantic categories - spices, tools, fruits and clothes - each containing four words. The list presentation order was fixed and consistent across all 5 administrations. The list was presented orally then freely recalled, five times.

Formulas to Calculate Strategy During Free Recall of Word Lists

First, for each of the five trials the recall probability is the chance a word would be recalled in the same position as it existed in a given series. In our case, the reference series was the list presentation order and probabilities were calculated using the formula:


Where N is the number of transitions, i and j are unique words, t is the trial and p is the position within the recalled words. Essentially, we calculated how many times one word was recalled in a position and divided by the number of times any word occurred in that same position.
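The verbal description of the recall probability can be sketched in Python for a single trial. This is an interpretation rather than the published formula: it assumes the denominator counts every list word recalled in a given position, that intrusions are dropped before positions are assigned, and that output positions beyond the list length are ignored. The function name and the three-word list are hypothetical.

```python
def recall_probabilities(recalls, presentation):
    """Return probs[word_index][position] for one trial.

    recalls is a list of recall sequences, one per participant. Each cell
    is the number of times a word was recalled in a position divided by
    the number of times any list word was recalled in that position.
    """
    n = len(presentation)
    index = {word: i for i, word in enumerate(presentation)}
    counts = [[0] * n for _ in range(n)]
    for recall in recalls:
        # Assumption: intrusions are dropped, repetitions are kept.
        listed = [word for word in recall if word in index]
        for pos, word in enumerate(listed[:n]):
            counts[index[word]][pos] += 1
    probs = [[0.0] * n for _ in range(n)]
    for pos in range(n):
        total = sum(counts[w][pos] for w in range(n))
        if total:
            for w in range(n):
                probs[w][pos] = counts[w][pos] / total
    return probs


# Hypothetical three-word list and three participants.
presentation = ["drill", "paprika", "vest"]
recalls = [["drill", "vest"], ["paprika", "drill"], ["drill", "paprika", "vest"]]
for word, row in zip(presentation, recall_probabilities(recalls, presentation)):
    print(word, [round(p, 2) for p in row])
```

Each column of the result sums to 1 whenever at least one participant produced a word in that position, so a word's value in a column reads as its share of that output position.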

Second, transition probabilities were computed. We refer to the values as “transition probabilities” because each value was the probability that, upon hearing a certain word, a participant would recall - or transition in memory to - a particular word. The probabilities were calculated as the number of times a given word follows another word divided by the number of times the first word was recalled in any position:


Again, N is the number of transitions, i and j are unique words and t is the trial. For each word, the sum of probabilities across all positions is 1. For any given word, a probability of 1/16 indicates that it was recalled at the rate of chance because there were 16 words on the list.
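A minimal Python sketch of the transition probability for one trial follows, using the verbal definition: the number of times word j directly follows word i, divided by the number of times word i was recalled in any position. The function name and the three-word list are hypothetical. Pairs interrupted by an intrusion are not counted, which is an assumption, and self-transitions are left out to mirror the blank diagonal of the matrices.

```python
from collections import Counter


def transition_probabilities(recalls, presentation):
    """Return {(i, j): P(j directly follows i)} for one trial."""
    listed = set(presentation)
    pair_counts = Counter()
    word_counts = Counter()
    for recall in recalls:
        # Repetitions are kept; intrusions are not counted as words.
        word_counts.update(word for word in recall if word in listed)
        for first, second in zip(recall, recall[1:]):
            if first in listed and second in listed:
                pair_counts[(first, second)] += 1
    return {
        (i, j): pair_counts[(i, j)] / word_counts[i]
        for i in presentation
        for j in presentation
        if i != j and word_counts[i]
    }


# Hypothetical three-word list and three participants.
presentation = ["drill", "pliers", "grapes"]
recalls = [["drill", "pliers", "grapes"], ["drill", "grapes"], ["pliers", "drill"]]
probabilities = transition_probabilities(recalls, presentation)
print(probabilities[("drill", "pliers")])
```

In this toy data "drill" is recalled three times and is followed by "pliers" once, so the transition from "drill" to "pliers" has probability 1/3.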

Third, we computed mean transition probabilities within each CVLT category to investigate the degree to which participants followed the four categories (tools, spices, fruits and clothing) operationally defined as orthogonal categories by the creators of the CVLT. The probabilities were calculated as the number of times a given word follows another word divided by the number of times the first word was recalled in any position:


Variables are defined as above. Additionally, c signifies a particular category. For each word, the sum of probabilities across all positions was one. The mean of all the within category probabilities was the categorical transition probability for any given trial.
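The within-category summary can be sketched as an average over the cells of a transition matrix whose two words share a category. The sketch assumes a dictionary of transition probabilities keyed by word pair, as a hypothetical input format; the function name, words, and values are illustrative only.

```python
def within_category_means(transitions, category_of):
    """Return (per-category means, mean over all within-category cells).

    transitions maps (first_word, second_word) to a transition
    probability; category_of maps each word to its category.
    """
    sums, cells = {}, {}
    for (first, second), prob in transitions.items():
        if first != second and category_of[first] == category_of[second]:
            category = category_of[first]
            sums[category] = sums.get(category, 0.0) + prob
            cells[category] = cells.get(category, 0) + 1
    per_category = {c: sums[c] / cells[c] for c in sums}
    n_cells = sum(cells.values())
    overall = sum(sums.values()) / n_cells if n_cells else 0.0
    return per_category, overall


# Hypothetical probabilities for two categories plus one cross-category cell.
category_of = {"drill": "tools", "pliers": "tools",
               "grapes": "fruits", "apricots": "fruits"}
transitions = {
    ("drill", "pliers"): 0.5, ("pliers", "drill"): 0.3,
    ("grapes", "apricots"): 0.2, ("apricots", "grapes"): 0.0,
    ("drill", "grapes"): 0.1,
}
print(within_category_means(transitions, category_of))
```

With four categories of four words each, every category contributes twelve cells, so the overall mean equals the mean of the four category means.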

Scores were probabilities ranging from 0 to 1. For every measure, we included repetitions since the second instance of a word retains all properties the first instance possessed. However, we did not include intrusions because the strength of semantic relatedness could have varied and because there were not sufficient instances of each of the words in our sample.

Data Collection

Our probability scores were organized as matrices, with six matrices for each of the three measures: one per trial plus a matrix containing the trial means. The recall probability matrix rows were labeled by the list of words in presentation order. The columns were numbered one through sixteen, signifying the position in which a word was recalled. An additional column was the total, or probability a word would be recalled at some point during the trial.

For the transition probability matrices, both the rows and columns were labeled by the list of words in presentation order. Cells where the row and column word were identical were left blank and not used in any calculations. The category based transition probability matrices shared the same values as the transition probability matrices, but the rows and columns were sorted so that words in the same category appeared next to each other. As seen in formula 7, four values were generated, each the mean of transition probabilities for all words in the category (data available upon request).


Statistical comparisons were planned between the four performance and diagnostic based groups (low and high performing patients and low and high performing controls). While the two low performing groups (and, concordantly, the two high performing groups) are not equivalent in the total number of words they recalled over the five CVLT trials or in mean IQ, it was important to consider the extreme ends of cognitive ability in both diagnostic groups. The low performing controls and high performing patients had similar IQs, thus we directly compared these groups as a matched comparison to ensure any diagnostic-based differences were independent of IQ. We also compared all patients (high and low combined) to all controls to investigate the sensitivity of the current formula as compared to standard CVLT measures. Analyses were permutation tests (i.e., no distributional assumptions were made) and were carried out to 1000 iterations. Corresponding individual cells were permuted for each possible group comparison (e.g. the recall probability of “apricots” in position two was compared between high and low performing patients), generating twelve matrices containing raw probabilities and another set containing the corresponding p-values. The results of the permutation tests, where more than one cell was significant, are presented as the largest p-value of the cells accompanied by the mean and standard deviation of the raw probability values for those cells. Some words showed peak recall probability rates which could not be attributed to list position. In these cases, post-hoc t-tests of single means compared the particular words to the other words in the middle of the list; the initial two and final four words of the presentation list were excluded because of high recall rates presumably due to primacy and recency effects of serial strategy.
Repeated measures ANOVAs and Fisher's Least Significant Difference post-hoc comparisons were carried out to examine within category transition probabilities across the trials.
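For readers unfamiliar with permutation tests, the following Python sketch shows the general technique for a single cell with 1000 iterations. It is not the exact procedure used here: the test statistic (an absolute difference in group means), the per-participant input values, the function name, and the fixed random seed are all assumptions made for illustration.

```python
import random


def permutation_p_value(group_a, group_b, iterations=1000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Group labels are shuffled `iterations` times; the p-value is the share
    of shuffles whose absolute mean difference is at least as large as the
    observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(iterations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return (extreme + 1) / (iterations + 1)


# Hypothetical indicator data: 1.0 if a participant recalled a given word
# in a given position, 0.0 otherwise.
high = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0]
low = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
print(permutation_p_value(high, low))
```

Because the null distribution is built from the data themselves, no distributional form has to be assumed, which is the property the analysis above relies on.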

To establish a comparative basis to past research, we also calculated the revised CVLT semantic and serial clustering indices (Stricker et al., 2002) for our participants and analyzed them by factorial ANOVA using four groups rather than grouping only by diagnosis2.


Recall probabilities

First we examined the probability words were recalled as a function of the order in which they were presented. To evaluate the current formulas against the standard CVLT findings, we compared all patients with schizophrenia to all healthy control participants, regardless of performance level. There was little significant difference between the two groups. Controls recalled “vest” and “tangerine” more than patients (p<0.05; controls: M=0.063, SD=0.0085; patients: M=0.054, SD=0.013). Patients recalled the final two words of the presentation list order significantly more than controls (p<0.05; controls: M=0.072, SD=0.005; patients: M=0.094, SD=0.005).

In contrast, comparison of all high performers to all low performers yielded extensive differences. Low performers recalled the first two and last four words of the presentation list order significantly more often than the high performers (p<0.05; high: M=0.067, SD=0.004; low: M=0.093, SD=0.015). Excluding “paprika” and “tangerines,” high performers recalled the remainder of the words significantly more than low performers (p<0.005; high: M=0.059, SD=0.006; low: M=0.041, SD=0.012).

We then compared the four groups, as shown in Figure 1.

Figure 1
Recall probability, or the probability a word will be uttered by a participant during recall, across the five trials for each group. The sixteen individual points on each trial line sum to 1, with 0.06 (or 1/16) being the expected recall probability value. ...

High performing controls showed the most consistent rate of recall across the list. On the first trial, they most often recalled the initial and final words in the list (i.e., displayed primacy and recency effects), but for trials 2–5 recalled all words at a nearly perfect 1/16 rate (i.e., displayed no effects of serial ordering). In contrast, controls who comprised the low performing groups recalled the first and last two words from the presentation list significantly more often than high performing controls (p<0.001; high controls: M=0.065, SD=0.002; low controls: M=0.084, SD=0.009); the first and last words were recalled significantly more than the high performing patients (p<0.005; high patients: M=0.066, SD=0.002; low controls: M=0.095, SD=0.006). The influence of word presentation order partially subsided by the fifth trial as seen in Figure 1.

In the patient group, high performers looked similar to high performing controls in that there was an initial adherence to presentation order which dissipated throughout the trials. However, high performing patients relied on the presentation order more than high performing controls during recall, resulting in a significant difference for the first two and last two words on the list (p<0.005; high patients: M=0.072, SD=0.003; high controls: M=0.065, SD=0.002). Low performing patients had the greatest amount of presentation order based recall. They consistently recalled the initial and final words in the list at a rate far greater than 1/16, while recalling the middle words less than the other three groups. They recalled the three final words on the presentation list less than low performing controls (p<0.005; low controls: M=0.11, SD=0.015; low patients: M=0.08, SD=0.006); the first word and last four words were recalled more than by high performing patients (p<0.0001; high patients: M=0.07, SD=0.003; low patients: M=0.10, SD=0.016); the initial two and last four words were recalled more than by high performing controls (p<0.0001; high controls: M=0.06, SD=0.002; low patients: M=0.10, SD=0.016).

It is noteworthy that in all four groups we observed an exceptionally high recall probability of one word, namely “paprika.” This “paprika effect” was significant across all trials for both high performing groups (patients: t(14)=−2.87, p<0.05; controls: t(14)=−3.46, p<0.005). When considering only those words unaffected by primacy and recency effects (of which the low performing groups had greater effects), “paprika” was recalled more often than the remaining words on the list (p<0.001 for all four groups). The potency of one word illustrates the importance of screening words in the initial design of such word lists as well as the sensitivity of our approach. This word is not in many widely used databases assessing frequency (Wilson, 1988) besides the Kucera-Francis written frequency, which suggests it is unusual in some respects. Literature has shown that low frequency increases the probability a word will be remembered when non-serial recall is employed, such as in the high performing groups (Hulme et al., 1997).

Transition probabilities

Second we examined transition probabilities as a function of word presentation order in all 5 trials. All four groups employed semantically-based strategies to some degree (see Figure 2).

Figure 2
All four groups were more likely to recall two semantically related words in a row than they were to recall two unrelated words in a row. However, the degree to which semantic strategy is employed varied, with high performing controls showing the greatest ...

High performing controls recalled words from the same category (e.g. “apricots, grapes”) one after another significantly more often than any of the other three groups (as compared to high performing patients: t(94)= 3.57, p<0.001; low performing controls: t(94)= 9.84, p<0.001; low performing patients: t(94)= 12.17, p<0.001). High performing patients employed semantic relationships significantly more often than the two low performing groups (as compared to low performing controls: t(94)= 5.50, p<0.001; low performing patients: t(94)= 7.56, p<0.001). Overall, low performing controls recalled semantically related words one after another significantly more than low performing patients (t(94)= 2.41, p<0.05).

However, considering the probabilities of specific words recalled, there was less distinction between the low performing patient and control groups as to the degree of semantic strategy. Whereas in the high performing groups controls clearly relied on semantic strategies significantly more than patients, the combinations of “chisel, pliers” (p<0.05) and “drill, pliers” (p<0.05) occurred significantly more often in the low performing patient group than in low performing controls.
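A group comparison on a single word pair can be illustrated with a generic two-sided permutation test on the difference in group means. This is a sketch under stated assumptions: the function name, the add-one p-value correction, and the per-participant counts are hypothetical and do not reproduce the study's exact procedure or data.

```python
import random

def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose absolute mean difference is at least the observed one.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        # small tolerance guards against floating point ties
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # add-one correction keeps the estimate strictly above zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical counts: trials (0-5) on which a participant recalled
# "chisel" and "pliers" consecutively
toy_patients = [3, 2, 4, 1, 3, 2, 4, 3]
toy_controls = [1, 0, 2, 1, 0, 1, 2, 1]
observed_diff, p_value = permutation_test(toy_patients, toy_controls)
print(observed_diff, p_value)
```

Because the test only reshuffles group labels, it makes no distributional assumption about the counts, which suits sparse measures such as the frequency of one specific word pair.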

Within category mean transition probability

The four groups engaged significantly different levels of within category transitions from trial 1 to trial 5 (see Figure 3; F(3, 252)=3.26, p<0.05). High performers increased their reliance on semantic cues according to post-hoc comparisons (high performing controls: p<0.001; high performing patients: p<0.01), while low performers did not appear to utilize a semantically-based strategy. High performing controls increased their reliance on semantic strategies more quickly than did high performing patients, and in trial 1 there was no difference between low and high performing patients. However, there was greater similarity within the performance based groups than there was among the patients and among the controls, with each high performing group having significantly more within category transitions than both low performing groups (as shown in Figure 3).
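One way to picture a trial-by-trial, within category measure is to score each recall list by the share of its adjacent pairs that stay inside one category, then average that share per trial. The sketch below assumes a hypothetical word-to-category map and hypothetical recall lists; the names `CATEGORY`, `within_category_rate`, and `mean_rate_by_trial` are illustrative, and the study's own formula may differ.

```python
# Hypothetical word-to-category map, for illustration only
CATEGORY = {
    "apricots": "fruits", "grapes": "fruits",
    "drill": "tools", "pliers": "tools", "chisel": "tools",
    "paprika": "spices",
}

def within_category_rate(seq, category=CATEGORY):
    """Share of adjacent recall pairs whose two words share a category."""
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        return None  # fewer than two words recalled: nothing to score
    same = sum(category[a] == category[b] for a, b in pairs)
    return same / len(pairs)

def mean_rate_by_trial(trials_by_participant):
    """Average the per-participant rate separately for each trial."""
    n_trials = len(trials_by_participant[0])
    means = []
    for t in range(n_trials):
        rates = [within_category_rate(p[t]) for p in trials_by_participant]
        rates = [r for r in rates if r is not None]
        means.append(sum(rates) / len(rates) if rates else None)
    return means

# Two hypothetical participants, two trials each
toy_participants = [
    [["apricots", "drill", "grapes"], ["apricots", "grapes", "drill", "pliers"]],
    [["drill", "paprika"], ["drill", "pliers", "chisel"]],
]
toy_means = mean_rate_by_trial(toy_participants)
print(toy_means)
```

Dividing by the number of transitions actually made, rather than by a fixed list length, keeps the score comparable between participants who recall few words and those who recall many.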

Figure 3
Both high performing groups significantly increased their dependencies on semantic strategies over time (F(3,252)=3.26, p<0.05). Participants in the two low performing groups had consistently low sequential recall of semantically related words ...

Other approaches to conceptualizing “semantic category”

We applied the traditional CVLT formulas to our dataset to ensure it was comparable to findings in the literature by looking at serial ratio and semantic ratio (averaged across the five trials) as dependent variables in a MANOVA. There were main effects of diagnosis (F(2, 196)=45.43, p<0.0001) and performance group (F(2, 196)=195.55, p<0.0001), as well as an interaction of diagnosis and performance group (F(2, 195)=22.19, p<0.0001). Thus we see that both diagnosis and performance influence strategy use. Raw semantic and serial scores for each strategy, group, and trial illustrate these findings in Figure 4.

Figure 4
Using the commercially available CVLT measures (revised: formulas 3 and 4 in text; Stricker et al., 2002), we see the patterns noted in past literature. That is, patients generally have lower semantic scores and higher serial scores than healthy volunteers. ...

Calculations were made using serial and semantic ratios, which are the standardized scores provided by CVLT grading programs calculated by deriving the mean of the five trial LBC ratios (formulas 3 and 4). Post-hoc t-tests showed significant differences of serial clustering. Of all controls, high performers used significantly less serial strategy than low performers (t(98)=5.19, p<0.001). High performing controls used less serial strategy than both the high and low performing patient groups (t(98)=3.34, p<0.005; t(98)=4.62, p<0.001). There were also differences between all groups on the semantic scores. Of all groups, high performing controls employed the greatest degree of semantic clustering, followed by high performing patients, low performing controls, and low performing patients. That semantic strategy can increase without a significant decrease in serial ratio (based on the results of our novel approach) is evidence of the independence of the strategies, a key factor our measures aim to elucidate.

Performance & Intelligence

Full scale current IQ, as assessed by the Wechsler Adult Intelligence Scale -- Revised (WAIS-R; Wechsler, 1981), not surprisingly, was significantly different between the groups. It has often been demonstrated that patients with schizophrenia present with a lower IQ than healthy control participants (Weickert et al., 2000). Indeed, IQ was associated with diagnosis and performance, but the two factors did not interact. Patients had a lower IQ than healthy control participants (patients = 91.62 (SD=11.93); healthy control participants = 107.83 (SD=11.62); (F(1, 196)=90.44, p<0.001)) and low performers (irrespective of diagnosis) had a lower IQ than high performers (low performers = 93.17 (SD= 13.46); high performers = 106.28 (SD=11.93); (F(1, 196)=138.26, p<0.001)). Crucially, however, there was no interaction between diagnostic group and performance level (F(1, 196)=2.66, p>0.1). Furthermore, post-hoc t-tests revealed no difference between IQ in low performing controls and high performing patients (t(98)= −1.5, p=0.13). High performing patients recalled a greater number of words across the five trials and employed a higher level of semantic strategy and a lower level of serial strategy than low performing controls, yet the two groups had equivalent intelligence scores. This strongly suggests that, although IQ naturally underlies list recall, it is not predictive of one’s recall strategy on this word list task.


Free recall in the CVLT is an extensively used assay of verbal episodic memory which, in addition to being a portable, easy task to administer, produces a normally and broadly distributed range of results. However, the widely employed and commercially available analytic formulas do not sufficiently assess the specific areas of performance and strategies employed, which may be especially important to elucidate in clinical populations. We identified and addressed four central problems with these standard CVLT measures: 1) the assumption that recall performance is constrained by the categories and primacy/recency assignations; 2) the dependence of strategy measures on recall list length; 3) the disregard for temporal progression of strategy within a group; and 4) the discussion of serial and semantic strategies as converse, conflicting approaches in verbal memory.

The results from our novel approach confirm the existence of the identified issues: recall and transition probability measures reveal that, in free recall of a list, the number of words recalled is predictive of the strategy used, with a greater number of words recalled following a higher degree of semantic strategy. Though past studies have linked semantic strategy to good recall performance, they have also linked schizophrenia to poor recall performance (Delis et al., 1988; Iddon et al., 1998). High performing patients’ heavy dependence on semantic strategies suggests that semantic disorganization per se is not characteristic of patients with schizophrenia.

Boundaries of Categories and Order Effects

We investigated the categorical and primacy/recency boundaries by computing recall and transition probabilities for each cell within our matrices, allowing us to evaluate how appropriate the assumed categorical and primacy/recency groupings were. A main strength of the output of our analysis is that it does allow for literal visual examination of the occurrence of specific words over time, as seen in Figure 1. With recall probabilities, we examined the serial patterns of individual words, rather than assuming that primacy effects occur in the first four words and recency effects occur in the last four words of the presentation list, as done in the CVLT, which groups serial effects into primacy (the first four words), recency (the last four words) and middle (the remaining eight words). We found that primacy effects are generally restricted to the first two words. In a few cases, such as comparisons between high performing controls and low performing patients, the first three words of the list had significantly higher rates of recall for the low performing groups who tended to engage in higher serial-based recall.
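Word-level recall probabilities of the kind described above can be sketched as the fraction of recall lists containing each presented word, kept in presentation order so that primacy and recency can be read off directly instead of being fixed in advance at four words each. The six-word presentation order and the recall lists below are hypothetical (the actual list is longer), and the function name is an illustrative assumption.

```python
def recall_probability_by_position(presented, recalls):
    """Fraction of recall lists containing each presented word, in list order."""
    recall_sets = [set(seq) for seq in recalls]
    n = len(recall_sets)
    return [sum(word in s for s in recall_sets) / n for word in presented]

# Hypothetical presentation order and recall lists (not study data)
toy_presented = ["drill", "apricots", "paprika", "grapes", "chisel", "pliers"]
toy_position_recalls = [
    ["drill", "apricots", "pliers"],
    ["drill", "paprika", "pliers", "chisel"],
    ["apricots", "drill", "paprika"],
    ["drill", "pliers"],
]
toy_position_probs = recall_probability_by_position(
    toy_presented, toy_position_recalls
)
for position, (word, p) in enumerate(zip(toy_presented, toy_position_probs), 1):
    print(position, word, p)
```

Inspecting the per-position values lets the data show where primacy and recency effects end, which is how a primacy effect limited to the first two words would become visible.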

Generally, the category boundaries corresponded with participant recall patterns. “Herbs” and “spices” were more associated with each other than with the fruits, despite the two categories sharing the superordinate category of “food”. However, all groups recalled the word “paprika” significantly more than all other words. Though frequency data were not available for the entire list, the frequency of paprika was not the lowest among the words. We theorize that it may have a greater age of acquisition, but this information was not available (Wilson, 1988). That the low performing patients recalled “chisel, pliers” and “drill, pliers” more often than the low performing controls not only attests to the strategic similarities of low performers independent of diagnosis, but is further evidence that certain words within the CVLT list have stronger semantic associations than others.

Recall List Length

The dependence of the original (Delis et al., 1987) and revised (Stricker et al., 2002) CVLT formulas on recall performance was one of our central motivations in developing the current formulas and is resolved in our mathematical approach. Since patients generally have poorer recall, it was expected that the original performance dependent measures would not fully capture patients’ semantic strategy use. In a recent analysis of semantic fluency data, where participants have one minute to generate as many words as possible within one category (e.g., “animals”), the widely observed lower number of words generated by patients with schizophrenia resulted in the semantic relationships between patients’ words appearing sparser than controls’, but an approach independent of list length showed that patients and control participants in fact displayed similar content within their word lists (Prescott et al., 2006). In the current study, each cell value is a probability rather than a scaled value. Therefore, we are comparing the number of times a word was recalled to the number of times each other word was recalled and excluding the number of words recalled from our formulas. In a similar vein to the study by Prescott et al. (2006), we found comparable semantic grouping in patients and controls, suggesting that schizophrenia does not affect semantic organization of words per se.

Our findings relate to past genetic findings as verbal memory has proved to be a useful assay in unraveling the neurobiology and genetics of both episodic memory and schizophrenia (Cirillo & Seidman, 2003; Egan et al., 2004; Paunio et al., 2004). Indeed, the metabotropic glutamate receptor-modulating synaptic glutamate GRM3 genotype has been related generally to schizophrenia heritability (Egan et al., 2004) and specifically to overall performance on the CVLT in a large sample (n=217) of patients with schizophrenia and their first-degree relatives, but not for healthy volunteers (Egan et al., 2004). Also, in 168 families with a diagnosis of schizophrenia, the semantic clustering index derived from the original CVLT (Delis et al., 1987) has been associated with a linkage signal on 1q42 (Paunio et al., 2004), a chromosome identified as important among those genes potentially indicating susceptibility to schizophrenia (Ekelund et al., 2001). However, other results employing the original CVLT semantic clustering indices have not found these measures to be useful cognitive endophenotypes, thus suggesting that- as traditionally conceptualized- they are not useful in detecting strong biological signals (DeLisi et al., 2002).

Strategies Over Time

Whereas the original CVLT calculated strategies as averages across the five trials, by considering each trial we were able to identify changes in strategy over time. Though the participants who used the most semantically-based strategy overall used less serially-based strategy, all groups relied primarily on a serial strategy for the initial trial. Global usage of presentation order suggests that the use of temporal cues segues into other strategies. Though those with category based recall performed better, the importance of serial strategy is not negated. This is an important departure from previous CVLT literature, which investigates only the mean serial and semantic clustering across the five trials, neglecting to examine the degree to which serial strategy may be the initial, underlying support necessary to shift to semantic strategy.

The Interaction of Semantic and Serial Strategies

The final issue, of the literature’s representation of semantic and serial recall as opposite strategies, is mainly one of conceptualizing the results. As stated above, all groups began with serial-based recall and moved toward categorical clustering. High performing controls made the strategic transition earlier than high performing patients. Low performing controls reduced reliance on serial recall earlier than low performing patients. However, both low performing groups employed comparable degrees of semantic recall. As concluded from CVLT studies employing the standard formulas, semantic strategy does lead to superior performance. We surmise that it was primarily the disparity between groups on overall performance that drove previous results (with controls remembering more words). That the semantic values of patients’ and controls’ lists were similar adds evidence that schizophrenia does not affect one’s semantic organization (Prescott et al., 2006).

It must be noted that the context-free environment of word list free recall is not ideal for demonstrating participants’ semantic recall patterns because these patterns are likely vastly different from the semantic connections created and employed in natural language. Furthermore, we only considered two aspects of verbal memory during list free recall, while it is clear that other processes contribute to retention and recall (Hunt, 1981).

These limitations notwithstanding, the strategy measures are certainly linked to learning processes: in the CVLT, participants choose which recall strategies to employ without input from the experimenter. A positron emission tomography (PET) experiment demonstrated that instruction on which recall strategy to use and previous knowledge of a list’s categorical sub-groups greatly improved performance. Furthermore, the more a participant had to discover the best recall strategy themselves, the greater the activation in the left prefrontal cortex (Fletcher, Shallice, & Dolan, 1998). Together, these findings suggest it may be the choice to use a semantic organizational strategy and not the strategy itself that is difficult for low performers.

Participant Grouping

The current approach revealed little difference between the recall pattern of controls and patients. In contrast to findings that controls, more than patients, depend on semantic strategies (Iddon et al., 1998), high performance correlated to semantic strategy in our patients and controls, suggesting that dividing populations by diagnosis only does not reveal the entire story. Forming groups based on performance level did not fully account for the difference in our results as compared to the standard CVLT formulas, evidenced by the significant effect of performance group by the revised CVLT formulas. Yet, performance strongly influenced semantic organization during recall. There is evidence that a subgroup of patients with schizophrenia is unable to adopt learning approaches during tasks requiring strategy implication, even after explicit instruction (Vaskinn et al., 2008). Cognitive remediation involving training to help patients choose strategies to enhance episodic memory abilities may have long term benefits. Indeed, cognitive training of skills requiring working memory might also be associated with an underlying biological change (McNab et al., 2009), a finding taken as evidence that training in other cognitive domains, such as verbal episodic memory, might also correspond with biological changes. Crucially, because verbal memory is a strong predictor of patient outcome, successful therapeutic approaches to improving verbal memory could increase patients’ success in general. This would be extremely welcome given that superior verbal memory is correlated with an improvement in many general function measures, especially fewer relapses and a greater chance of employment and independent living (Green, Kern, Braff, & Mintz, 2000).


We have developed a novel approach to calculate recall strategy use without the shortcomings of the free recall formulas widely employed in neuropsychology. Using such a sensitive measure, we can conclude that the number of words recalled is predictive of the strategy used, with a greater number of words recalled following a greater reliance on semantically-based strategy. Moreover, high performing patients’ dependence on semantic strategies is indicative that, to the extent that this task assays semantic organization, it is intact in patients with schizophrenia. Crucially, these findings highlight that it is essential to consider cognitive strategy independent of overall performance before attributing poor performance to degraded learning processes. Indeed, examining other patient populations believed to have semantic processing deficits, such as Alzheimer’s or temporal lobe epilepsy, may further our understanding of strategy engagement during verbal learning. In conclusion, our results demonstrate the importance of departing from global scoring techniques, especially when working with clinical populations. Although our study compared semantic and serial strategies in the CVLT in patients with schizophrenia, our approach is widely applicable to other word list paradigms and clinical populations and is sensitive enough to explore specific word-to-word relationships and how they impact verbal recall strategy.


This research was supported by the Intramural Research Program of the National Institute of Mental Health.


Publisher's Disclaimer: The following manuscript is the final accepted manuscript. It has not been subjected to the final copyediting, fact-checking, and proofreading required for formal publication. It is not the definitive, publisher-authenticated version. The American Psychological Association and its Council of Editors disclaim any responsibility or liabilities for errors or omissions of this manuscript version, any version derived from this manuscript by NIH, or other third parties. The published version is available at

1 According to a popular cognitive scientific theory of word association - Latent Semantic Analysis (LSA) (Landauer & Dumais, 1997) - semantic clustering is greater for categories containing natural items (fruits, spices/herbs) than those that contain man-made objects (tools, clothing) (Laham, 1997).

2 Note that we are applying the CVLT-II formulas #3 and 4 (in text) to the CVLT-I word list.


  • American Psychiatric Association. DSM-IV: Diagnostic and Statistical Manual of Mental Disorders. The American Psychiatric Association; 1994.
  • Brébion G, David AS, Jones H, Pilowsky LS. Semantic organization and verbal memory efficiency in patients with schizophrenia. Neuropsychology. 2004;18(2):378–383. [PubMed]
  • Chen EYH, Wilkins AJ, McKenna PJ. Semantic memory is both impaired and anomalous in schizophrenia. Psychological Medicine. 1994;24(1):193–202. [PubMed]
  • Cirillo MA, Seidman LJ. Verbal declarative memory dysfunction in schizophrenia: from clinical assessment to genetics and brain mechanisms. Neuropsychology Review. 2003;13(2):43–77. [PubMed]
  • Cohen JR, Elvevåg B, Goldberg TE. Cognitive control and semantics in schizophrenia: An integrated approach. American Journal of Psychiatry. 2005;162(10):1969–1971. [PubMed]
  • Craik FI, Tulving E. Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General. 1975;104(3):268–294.
  • Delis DC, Freeland J, Kramer JH, Kaplan E. Integrating clinical assessment with cognitive neuroscience: construct validation of the California Verbal Learning Test. Journal of Consulting and Clinical Psychology. 1988;56(1):123–130. [PubMed]
  • Delis DC, Kramer JH, Kaplan E, Ober BA. California Verbal Learning Test. San Antonio, TX: The Psychological Corporation; 1987.
  • DeLisi LE, Shaw SH, Crow TJ, Shields G, Smith AB, Larach VW, Wellman RN, Loftus MB, Nanthakumar B, Razi K, Stewart J, Comazzi M, Vita A, Heffner T, Sherrington R. A genome-wide scan for linkage to chromosomal regions in 382 sibling pairs with schizophrenia or schizoaffective disorder. American Journal of Psychiatry. 2002;159(5):803–812. [PubMed]
  • Egan MF, Goldberg TE, Gscheidle T, Weirich M, Bigelow LB, Weinberger DR. Relative risk of attention deficits in siblings of patients with schizophrenia. American Journal of Psychiatry. 2000;157(8):1309–1316. [PubMed]
  • Egan MF, Straub RE, Goldberg TE, Yakub I, Callicott JH, Hariri AR, Mattay VS, Bertolino A, Hyde TM, Shannon-Weickert C, Akil M, Crook J, Vakkalanka RK, Balkissoon R, Gibbs RA, Kleinman JE, Weinberger DR. Variation in GRM3 affects cognition, prefrontal glutamate, and risk for schizophrenia. Proceedings of the National Academy of Sciences of the United States of America. 2004;101(34):12604–12609. [PubMed]
  • Ekelund J, Hovatta I, Parker A, Paunio T, Varilo T, Martin R, Suhonen J, Ellonen P, Chan G, Sinsheimer JS, Sobel E, Juvonen H, Arajärvi R, Partonen T, Suvisaari J, Lönnqvist J, Meyer J, Peltonen L. Chromosome 1 loci in Finnish schizophrenia families. Human Molecular Genetics. 2001;10(15):1611–1617. [PubMed]
  • Elvevåg B, Heit E, Storms G, Goldberg T. Category content and structure in schizophrenia: An evaluation using the instantiation principle. Neuropsychology. 2005;19(3):371–380. [PubMed]
  • Elvevåg B, Storms G. Scaling and clustering in the study of semantic disruptions in patients with schizophrenia: A re-evaluation. Schizophrenia Research. 2003;63(3):237–246. [PubMed]
  • First MB, Spitzer RL, Miriam G, Williams JBW. Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I) Washington, D.C.: American Psychiatric Press; 1997a.
  • First MB, Spitzer RL, Miriam G, Williams JBW. Structured Clinical Interview for DSM-IV Axis II Personality Disorders (SCID-II) Washington, D.C.: American Psychiatric Press; 1997b.
  • Fletcher PC, Shallice T, Dolan RJ. The functional roles of prefrontal cortex in episodic memory. I. Encoding. Brain. 1998;121(7):1239–1248. [PubMed]
  • Green MF, Kern RS, Braff DL, Mintz J. Neurocognitive deficits and functional outcome in schizophrenia: Are we measuring the 'right stuff'? Schizophrenia Bulletin. 2000;26(1):119–136. [PubMed]
  • Hazlett EA, Buchsbaum MS, Jeu LA, Nenadic I, Fleischman MB, Shihabuddin L, Haznedar MM, Harvey PD. Hypofrontality in unmedicated schizophrenia patients studied with PET during performance of a serial verbal learning task. Schizophrenia Research. 2000;43(1):33–46. [PubMed]
  • Hill SK, Beers SR, Kmiec JA, Keshavan MS, Sweeney JA. Impairment of verbal memory and learning in antipsychotic-naive patients with first-episode schizophrenia. Schizophrenia Research. 2004;68(2–3):127–136. [PubMed]
  • Hulme C, Roodenrys S, Brown GDA, Schweickert R, Martin S, Stuart G. Word-frequency effects on short-term memory tasks: Evidence for a reintegration process in immediate serial recall. Journal of Experimental Psychology: Learning Memory and Cognition. 1997;23(5):1217–1232. [PubMed]
  • Iddon JL, McKenna PJ, Sahakian BJ, Robbins TW. Impaired generation and use of strategy in schizophrenia: Evidence from visuospatial and verbal tasks. Psychological Medicine. 1998;28(5):1049–1062. [PubMed]
  • Jung J, Skeebo S. Multitrial free recall as a function of constant versus varied input orders and list length. Canadian Journal of Psychology. 1967;21(4):329–336. [PubMed]
  • Kareken DA, Moberg PJ, Gur RC. Proactive inhibition and semantic organization: Relationship with verbal memory in patients with schizophrenia. Journal of the International Neuropsychological Society. 1996;2(6):486–493. [PubMed]
  • Klein KA, Addis KM, Kahana MJ. A comparative analysis of serial and free recall. Memory and Cognition. 2005;33(5):833–839. [PubMed]
  • Laham D. Latent semantic analysis approaches to categorization. In: Shafto MG, Langley P, editors. Proceeding of the 19th Annual Conference of the Cognitive Science Society; Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.; 1997.
  • Landauer TK, Dumais ST. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review. 1997;104(2):211–240.
  • McNab F, Varrone A, Farde L, Jucaite A, Bystritsky P, Forssberg H, Klingberg T. Changes in cortical dopamine D1 receptor binding associated with cognitive training. Science. 2009;323(5915):800–802. [PubMed]
  • Minzenberg MJ, Ober BA, Vinogradov S. Semantic priming in schizophrenia: A review and synthesis. Journal of the International Neuropsychological Society. 2002;8(5):699–720. [PubMed]
  • Papassotiropoulos A, Stephan DA, Huentelman MJ, Hoerndli FJ, Craig DW, Pearson JV, Huynh K-D, Brunner F, Corneveaux J, Osborne D, Wollmer MA, Aerni A, Coluccia D, Hänggi J, Mondadori CRA, Buchmann A, Reiman EM, Caselli RJ, Henke K, de Quervain DJ-F. Common Kibra alleles are associated with human memory performance. Science. 2006;314(5798):475–478. [PubMed]
  • Paulsen JS, Heaton RK, Sadek JR, Perry W, Delis DC, Braff D, Kuck J, Zisook S, Jeste DV. The nature of learning and memory impairments in schizophrenia. Journal of the International Neuropsychological Society. 1995;1(1):88. [PubMed]
  • Paunio T, Tuulio-Henriksson A, Hiekkalinna T, Perola M, Varilo T, Partonen T, Cannon TD, Lönnqvist J, Peltonen L. Search for cognitive trait components of schizophrenia reveals a locus for verbal learning and memory on 4q and for visual working memory on 2q. Human Molecular Genetics. 2004;13(16):1693–1702. [PubMed]
  • Pomarol-Clotet E, Oh TMSS, Laws KR, McKenna PJ. Semantic priming in schizophrenia: Systematic review and meta-analysis. British Journal of Psychiatry. 2008;192(2):92–97. [PubMed]
  • Prescott TJ, Newton LD, Mir NU, Woodruff PWR, Parks RW. A new dissimilarity measure for finding semantic structure in category fluency data with implications for understanding memory organization in schizophrenia. Neuropsychology. 2006;20(6):685–699. [PubMed]
  • Ratcliff R, Sheu CF, Gronlund SD. Testing global memory models using ROC curves. Psychological Review. 1992;99(3):518–535. [PubMed]
  • Roofeh D, Cottone J, Burdick KE, Lencz T, Gyato K, Cervellione KL, Napolitano B, Kester H, Anderson B, Kumra S. Deficits in memory strategy use are related to verbal memory impairments in adolescents with schizophrenia-spectrum disorders. Schizophrenia Research. 2006;85(1–3):201–212. [PubMed]
  • Rosenberg SJ, Ryan JJ, Prifitera A. Rey auditory-verbal learning test performance of patients with and without memory impairment. Journal of Clinical Psychology. 1984;40(3):785–787. [PubMed]
  • Skelley SL, Goldberg TE, Egan MF, Weinberger DR, Gold JM. Verbal and visual memory: Characterizing the clinical and intermediate phenotype in schizophrenia. Schizophrenia Research. 2008;105(1):78–85. [PubMed]
  • Smith A, D'Agostino P, Reid L. Output interference in long-term memory. Canadian Journal of Psychology. 1970;24(2):85–89.
  • Snitz BE, MacDonald AW, Carter CS. Cognitive deficits in unaffected first-degree relatives of schizophrenia patients: a meta-analytic review of putative endophenotypes. Schizophrenia Bulletin. 2006;32(1):179. [PMC free article] [PubMed]
  • Stricker JL, Brown GG, Wixted J, Baldo JV, Delis DC. New semantic and serial clustering indices for the California Verbal Learning Test-Second Edition: Background, rationale, and formulae. Journal of the International Neuropsychological Society. 2002;8(3):425–435. [PubMed]
  • Vaskinn A, Sundet K, Friis S, Ueland T, Simonsen C, Birkenaes AB, Engh JA, Opjordsmoen S, Andreassen OA. Can learning potential in schizophrenia be assessed with the standard CVLT-II? An exploratory study. Scandinavian Journal of Psychology. 2008;49(2):179–186. [PubMed]
  • Waugh NC. Free versus serial recall. Journal of Experimental Psychology. 1961;62(5):496–502. [PubMed]
  • Wechsler D. Wechsler Adult Intelligence Scale- Revised. San Antonio, TX: Psychological Corporation; 1981.
  • Wechsler D. Wechsler Memory Scale- Revised. San Antonio, TX: Psychological Corporation; 1987.
  • Weickert TW, Goldberg TE, Gold JM, Bigelow LB, Egan MF, Weinberger DR. Cognitive impairments in patients with schizophrenia displaying preserved and compromised intellect. Archives of General Psychiatry. 2000;57(9):907–913. [PubMed]
  • Wilson MD. MRC Psycholinguistic Database: Machine Usable Dictionary. 1988 Version 2.00 (Publication. Retrieved January 13, 2009: