Working memory can be described as the small amount of information held in a readily accessible state, available to help in the completion of cognitive tasks. There has been considerable confusion among researchers regarding the definition of working memory, which can be attributed to the difficulty of reconciling descriptions from working memory researchers with very different theoretical orientations. Here I review theories of working memory and some of the main issues in the field, discuss current behavioral and neuropsychological research that can address these issues, and consider the implications for cognitive development.
Researchers would probably agree that working memory is the small amount of information that is kept in an easily retrievable state at one time, and that it is critical for successful cognition. (For recent reviews of working memory see Baddeley, 2007; Cowan, 2005; Jonides et al., 2008; Klingberg, 2009). One cannot comprehend language without keeping several points or concepts in mind at once, and one cannot solve a problem without keeping in mind the premises. Planning one’s activities and carrying out the plans requires working memory of the goals and steps involved. There has also been research on how the number of concepts in a media presentation must be kept to a minimum for adequate comprehension (Sweller, van Merrienboer, & Paas, 1998).
It is not surprising that working memory is a term bandied about frequently by behavioral and brain researchers, and even journalists, in these times of multimedia and complex technology everywhere. Anecdotal evidence that the concept is currently very popular is that, fortunately for me, most of the clinical or social researchers hired by my home department, Psychological Sciences, in the past few years have arrived with a strong research interest in working memory, and not due to influence by me or other cognitive researchers here.
I will first say more about the meaning of the term working memory, how I conceive of it given the vast literature on the topic, and what the empirical support is for its basic underlying mechanisms. Then I will describe some research that is beginning to clarify its neuropsychological functioning and development. The meaning of the term is unfortunately not so clear from the literature, and there seems to be a high demand from researchers in neighboring areas to have the meaning clarified. Once I attended a small conference on working memory culminating in the volume by Miyake and Shah (1999), in which each of the presenters was asked to define working memory. No two definitions agreed, and some of them were more like descriptions than definitions (which I see as problematic). In order to compare the various definitions that were offered, one’s working memory would have to be overtaxed.
It is of course possible to have some idea of a concept without being able to define it. For example, one can fathom what the linguistic concept of a “word” is without being able to define it. It would be unsatisfying to state that a word is just the unit separated by spaces in the written representation of language. What is more desirable is a definition based on meaningful principles, despite any difficulty of constructing it. A word is not quite the smallest unit of meaning – that is a morpheme – but perhaps a word is something like the most psychologically real unit of integrated meanings. This merely illustrates the difficulty of arriving at the best definition of a concept and, in the example, I actually cannot adequately define what a word is. The best I could do is to provide some insight. To understand what working memory is, likewise, we can consider the history of that term and the evolution of research on that topic.
Wilhelm Wundt, who is considered to have established the first laboratory of experimental psychology, in Leipzig, Germany, published prolifically on relevant topics but, amazingly, most of his writings were never translated into English; they were translated into Russian and Hungarian, I have been told, but not English. James (1890), who was familiar with Wundt’s work, offered the term primary memory to refer to the trailing edge of the conscious present. It was in contrast to secondary memory, the vast amount of information stored up through learning over a lifetime. From a more physiological point of view, Hebb (1949) invented the notion of a cell assembly to reflect a pattern of neural firing that could represent a specific idea in the brain, for as long as the appropriate cell assembly remained in an active state. These concepts are important precursors to working memory.
Miller (1956) effectively launched the field with his description of studies of immediate memory. It was measured by recall of a list of items as soon as the list ended, along with two other kinds of psychological tasks: absolute identification of an item from an ordered series, and the ability to know very quickly how many objects are in a display. One can recall lists of at most about 7 items, one can identify items with at most about 7 category choices, and one can tell very quickly how many objects are in a display of no more than about 7 objects. (Actually, this last estimate is a bit high, as subsequent work showed.) This paper impressed upon researchers how very limited the capacity for simultaneously-held information is within cognitive activities.
To my knowledge, the first use of the term working memory was by Miller, Galanter, and Pribram (1960) in a thoughtful essay on how we plan and carry out activities. Whereas Miller (1956) provided examples of limits in holding information from external stimuli in an active form, Miller et al. (1960) provided examples of limits in holding internally-generated concepts in an active form. For example, to drive to work one must keep in mind the main goal while satisfying various subgoals (putting on one’s coat if necessary, finding the car keys, loading work into a pack, etc.). It will not do if an important subgoal, or the main goal, is forgotten while another subgoal is met and the forgotten goal is not recovered in time. Each subgoal can have other, lower-level subgoals within it (such as opening the closet door as a subgoal to retrieve one’s coat) and the ostensible main goal is probably subservient to other, still higher-level goals (such as accomplishing something at work and earning money). The ability to retrieve the information needed, when it is needed, is the working memory concept here.
It is obviously not possible to keep the entire hierarchy of goals active in the conscious mind at the same time, so one must ask how humans can activate a subgoal and then return to the main goal to determine what subgoal should be met next, and so on. Several researchers have suggested mechanisms whereby we could use a secondary (or in modern language, long-term) memory structure, metaphorically speaking, as scaffolding that holds the hierarchy of goals and relevant information in order to allow the individual to use primary (or short-term) memory in a more focused manner to deal with one goal at a time. Cowan (1995) referred to virtual short-term storage, meaning the use of long-term storage in a way that makes items relevant to a particular topic easily retrievable, and Ericsson and Kintsch (1995) referred more specifically to data structures serving as what they termed long-term working memory. Their term has probably caused some confusion but it makes sense in functional terms, given the daunting amount of work that the working memory of Miller et al. (1960) was assigned to carry out.
The groundwork for this concept of long-term working memory was built into the essay by Miller (1956). He noted that the unit of immediate memory was not the binary choice or bit of information, as computer scientists might have expected, but rather the meaningful chunk. For example, it is difficult to remember a list of 9 random letters in their presented order but it is easy to remember the 9 letters IBM-CIA-FBI, if one recognizes the acronyms for a computer company and two U.S. government agencies. There is no practical limit on how much information can be remembered in this way. Ericsson, Chase, and Faloon (1980) took an individual who had memorized record athletic times and used those known chunks to train him to increase his digit span. He learned to use the known times and then learned to group sets of known times together, in the course of a year becoming able to repeat from memory lists of up to 80 digits or more. Even at the end of this time, though, his span for lists of letters remained at about 7, so the training was materials-specific.
In the period when the field of cognitive psychology was new, temporary forms of memory took a leading role within abstract flow diagrams of the entire process of cognition, known as information processing models. Broadbent (1958) sketched perhaps the first such model in which a wealth of information from the senses, which lasted only a matter of seconds, was fed very selectively into a limited-capacity processing system for further analysis using long-term memory information. Atkinson and Shiffrin (1968) took this notion further by concentrating on just what processes would have to be involved to shuttle information in and out of the limited-capacity processor, which they termed short-term memory.
A landmark work in the field of working memory is the chapter of that name by Baddeley and Hitch (1974). They developed a multi-component model of working memory in contrast to the many previous models. Summarizing across most of the previous models, Baddeley and colleagues often have referred to the modal model of the earlier period as one in which information flows from sensory memory to short-term memory and then long-term memory, the most important representative being the model of Atkinson and Shiffrin (1968) as shown in Figure 1a. Baddeley and Hitch noted, however, that short-term memory as it was usually measured had not lived up to its promise as a working memory for cognitive performance. They pointed especially to studies like those of Warrington and Shallice (1969) and Shallice and Warrington (1970), showing that an individual who had very impaired immediate recall of lists (recalling lists of no more than 2 items) nevertheless appeared normal on various tasks of learning, memory, and comprehension. If the modal model served the purpose of a working memory, one would expect a more severe impairment of cognitive function in such an individual. Similarly, a list to be remembered did not have a very large effect on comprehension or reasoning tasks.
Baddeley and Hitch (1974) also noticed that key characteristics of short-term memory seemed to be dissociated from one another. In particular, they pointed to the characteristics of the recency effect, the especially good memory for items toward the end of a list to be recalled. Glanzer and Cunitz (1966) found that the recency effect disappeared if a distracting task was placed after the list and before recall of the list, which suggested that it was a property of short-term storage. In contrast to the modal model, though, Baddeley and Hitch noted that a memory load from a second, separate list did not diminish the recency effect. Finally, dissociations were observed between types of memory and types of interference. Verbal, phonological interference tends to impair memory for other verbal and phonological material much more than it affects spatial arrangements of visual items and, conversely, spatial or visual interference tends to impair memory for other visual, spatial configurations much more than verbal material.
Baddeley and Hitch (1974) therefore proposed a multi-component model of working memory instead of a general short-term store. In the 1974 chapter, the model was in verbal form only. The authors essentially suggested that there is a separate module that can handle a certain amount of verbal material in a phonological form, with the help of verbal rehearsal (the phonological loop) and, more tentatively, another module that can handle a certain amount of visual material in a spatial form (the visuospatial sketch pad). Baddeley and Hitch also still assumed that there was a central store that could hold abstract information and that had to share resources between storage and processing: they said (pp. 75–77), “We would like to suggest that the core of the working memory system consists of a limited capacity ‘work space’ which can be divided between storage and control processing demands…the working memory system may contain both flexible work space and also a component that is dedicated to storage.”
At least by the time Baddeley (1986) wrote a book summarizing his ideas about processing, he decided that the storage aspect of the core was unnecessary, and he omitted it. As shown in Figure 1b, there was a central executive component that carried out processing but did not itself have a memory. The storage of information was to take place in the phonological loop and visuospatial sketchpad components, which were specialized and did not share their resources with processing. Baddeley could still account for the mild interference between storage and processing by assuming that mnemonic strategies involved in maintaining information relied to some extent on central executive processes, as did various tasks such as comprehension and problem-solving. It is this model, without the central storage component, that has been the predominant theory of working memory in the last quarter century.
The Baddeley (1986) model did well when the research agenda was narrowly defined to look for the phonological and visuospatial components. Eventually, though, Baddeley became dissatisfied with what the model omitted. It could not explain how information of verbal and spatial types could be combined, or how abstract information could be remembered. It was well known that people can extract ideas from a long sequence of coherent prose and somehow that information had to be held in working memory. Therefore, Baddeley (2000) extended the model to include a component called the episodic buffer. He was not highly specific about what is included in that component, preferring to let subsequent research help define the evolution of the model. Therefore, it is not clear if the episodic buffer is meant to be the same as the earlier central component of working memory. One point that has been investigated is whether attention is required to retain associations between different kinds of information in an episodic buffer, as one might expect if the episodic buffer shares resources with processing. This has been investigated in several studies by dividing attention between a working memory task and another attention-demanding task. Allen, Baddeley, and Hitch (2006) found that verbal distraction impaired memory for items in a visual array, but it did not impair memory of the associations between different features of objects, their shapes and colors, any more than it impaired memory for the features themselves. A conflicting result was obtained by Fougnie and Marois (2009), though, in a task in which both the items to be recalled and the distracting task were visual in nature. More work will be needed for a good understanding of the episodic buffer, if there is such a component of working memory.
A long time ago I (Cowan, 1988) wrote a review of the information processing system that was later expanded into a book (Cowan, 1995) and further clarified as a result of Miyake and Shah’s conference (Cowan, 1999). The goal was to summarize what I thought we knew about information processing and leave out whatever was unclear. Therefore, I departed from convention in a number of ways (some of which have more to do with attention than with working memory and will not be discussed here).
For one thing, it seemed to me that the phonological and visuospatial stores of Baddeley (1986) were over-specified. There was a difference between these two types of material but perhaps the true taxonomy was more complex. What predictions would we make for memory of, say, spatially arranged tones, or touch, smell, and taste information?
Second, the Baddeley model underemphasized the role of sensory information. That model owed a lot to the work of Conrad (1964) showing that memory confusions between printed letters were primarily acoustic in nature (with confusions between letters that sounded alike, such as b, d, and p), indicating that visual stimuli were recoded in a phonological form. Yet, the model seemed to ignore evidence that the modalities differ tremendously. Immediate memory for spoken items involves a much larger recency effect than immediate memory for printed items (for reviews see Cowan, 1988, 1995; Penney, 1989).
Third, it seemed to me that the term short-term memory was used ambiguously in the literature. Sometimes it was used to refer to information that was in the conscious mind, similar to James’ (1890) primary memory. Other times it was used to refer to information that was in a temporarily activated state, whether or not it was in the conscious mind, more like the cell assemblies of Hebb (1949).
To accommodate these points, Cowan (1988) offered a revised model of information processing that is depicted in a slightly simplified form in Figure 1c. In this model, the phonological and visuospatial stores are just considered instances of the temporary activation of long-term memory information. It can also include sensory information from all modalities. The evidence for separate phonological and visuospatial stores is replaced with the general principle that interference between items in short-term storage depends on the similarity of their features (cf. Nairne, 1990). In fact, activated memory might include a large pool of activated features, including sensory, phonological, orthographic, and semantic features.
The model also assumes that some of the activated information is in the focus of attention. The input into that part of the system is processed to a greater depth, with a detailed interpretation based on the information in long-term memory. The items concurrently held in the focus of attention also can be combined to form new, larger units (or in the terminology of Miller, 1956, chunks) that can then be entered into long-term memory as newly-learned items. For example, if you saw a fish and a hat you could form the image of a fish wearing a hat and in the future remember that image as a single chunk, rather than as two items. Central executive processes can search long-term memory and use it to enter new information into the focus of attention, or to replace items in the focus with other items. Information that is displaced from the focus of attention remains activated for some time. Deliberate actions are based on what is in the focus of attention. In this approach, the term working memory was used to indicate a functional level at which activated memory, the focus of attention, and central executive processes worked together in order to keep items temporarily in mind to assist in various cognitive tasks.
An important difference between the components of the model is in how they are supposed to be limited. The activated portion of long-term memory is assumed to be unlimited in terms of how many features can be active at once, but the activation is assumed to persist only a matter of seconds before it decays. In contrast, the focus of attention is assumed to maintain a limited number of items in a manner that is resistant to interference from other items and does not decay until attention wanders elsewhere.
Oberauer (2002) accepted the basic model of Cowan (1988) except that he believed that the focus of attention can handle only one item at a time. Therefore, he added another level of embedding to the model shown in Figure 1c. In his model there is a capacity-limited region that can hold only a small number of items (like Cowan’s focus of attention) but, within this region, he proposed a one-item focus of attention. To demonstrate the aptness of this model he presented two sets of characters that had to be remembered after they disappeared from the screen, an upper and a lower row of digits. Each row had 1 or 3 digits. On every trial, one digit had to be updated. For example, the appearance of “−2” in the middle of the bottom row meant that the subtraction was to be carried out on the corresponding digit and the new result was to be remembered at that location, while all of the other digits were retained in memory without change. In some trial blocks, digits from either row could be updated whereas, in other trial blocks, only digits in one row could be updated. The reaction times were measured. At the end of the trial block, the final sets of digits had to be recalled.
There was a reaction time cost for trials in trial blocks in which either row could be updated, as compared to blocks in which only digits from one row could be updated. This cost depended on how many digits were in the row that was not actually updated on the trial. This was taken to indicate that there was a capacity-limited region of working memory that held the items that possibly could have to be updated on the trial. Additionally, there was cost of switching from one digit location to another one on two adjacent trials, as compared to trials in which the same digit location was updated on two trials in a row. This was taken to indicate that there is a 1-item focus of attention that holds only the digit location last updated.
These findings would be compatible with an alternative model in which all of the items that could be updated are in the focus of attention, but at differing levels of priority. Oberauer and Bialkova (in press) recently carried out a study somewhat similar to the 2002 study, but in which two items had to be updated on each trial. In particular, in one experiment the two items were digits that had to be used in an equation together. In another experiment they were digits, one of which had to be moved to a location relative to the other. To simplify a complex story, the results suggested that people form a chunk out of the two updated elements and then hold only that single chunk in working memory. This was taken in support of the Oberauer (2002) model with a single-item focus of attention.
I would not necessarily doubt the interpretation of Oberauer and Bialkova (in press) regarding their data. However, these data may not provide a test of the capacity limit of the focus of attention. If two items had to be updated on each trial in a way that did not relate them to one another, there still might be an advantage for the two digits even if they do not form a chunk. Such a result would count as evidence against the Oberauer (2002) model.
So far we have seen the term working memory shift from a single entity (in the early models of Broadbent, 1958 and Atkinson & Shiffrin, 1968) to a collection of entities. Baddeley and Hitch (1974) considered working memory to include a system with general-purpose storage (similar to what was previously called short-term storage), dedicated code-specific stores, and control processes. Cowan (1988) considered working memory to consist of embedded processes: an attention-based store embedded within a more extensive field of activated memory elements, with working memory comprising both of these along with executive processes.
Some individual difference researchers have provided findings suggesting that the most important difference between people with high and low working memory spans is the ability to keep irrelevant items out of working memory (e.g., McNab & Klingberg, 2008; Vogel, McCollough, & Machizawa, 2005), whereas others find group differences in the capacity of working memory, not in the ability to keep out irrelevant items (Cowan et al., in press; Gold, Wilk, McMahon, Buchanan, & Luck, 2003). It is not yet clear whether this difference is a matter of the populations examined or the methods used but the question will be revisited in the section on development. Meanwhile, the following further discussion of individual differences focuses on an issue of special importance to understand theoretical models of working memory.
In psychology, a model can be many different things. The models shown in Figure 1 are conceptual models that summarize how one may assume different components of processing work together. In strong contrast, a model as used by psychometric researchers can refer to portions of the variance in task performance and how they are related to one another. In such a model, arrows between one shape and another do not refer directly to a transfer of information as in the conceptual models, but rather to an influence of one factor on another. The prime example of such a model is the structural equation model with latent variables. The translation from a conceptual model to a structural equation model testing that conceptual model can be the source of some confusion, as I will illustrate shortly, after explaining the findings to be modeled.
Individuals differ in many different traits that affect their performance on working memory tasks. A dominant trend of individual differences research has suggested, though, that some types of variation are much more important than others for higher-level cognitive tasks such as problem-solving and reasoning, i.e., the types of task that underlie sheer thinking ability or fluid intelligence. In particular, it has been argued that the attention-demanding components of working memory are more important for higher-level cognition. This point has been made in a series of studies using what can be called complex working memory tasks. These tasks require some processing in between the items that are to be retained in memory, and therefore they require simultaneous storage and processing. In the most popular versions of the task, participants do some processing, such as comprehending a sentence or carrying out an arithmetic operation, and then receive a word to be remembered. This sequence of events is repeated several times and then all the words that were to be remembered are to be recalled in order. The measure of span is basically how long the series of storage and processing tasks can be without the participant making an error. Tasks of this type generally correlate with fluid intelligence and other cognitive aptitude tasks considerably better than do simpler list recall tasks (e.g., Conway, Cowan, Bunting, Therriault, & Minkoff, 2002; Conway et al., 2005; Daneman & Carpenter, 1980; Daneman & Merikle, 1996; Engle, Tuholski, Laughlin, & Conway, 1999). 
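The structure of such a complex span trial sequence can be illustrated with a short sketch. This is a hypothetical toy, not any published task battery: the word pool and equation format are invented, and real administrations add timing constraints and accuracy criteria on the processing step.

```python
import random

def make_trial(list_length, words, rng):
    """Build one hypothetical operation-span trial: a sequence of
    (equation, equation_is_true, word_to_remember) events.

    Each to-be-remembered word is preceded by a processing step,
    here verifying a simple addition; half the equations are made
    false by adding 1 to the true sum. After the sequence, the
    participant would recall the words in presentation order.
    """
    events = []
    for _ in range(list_length):
        a, b = rng.randint(1, 9), rng.randint(1, 9)
        is_true = rng.random() < 0.5
        shown = a + b if is_true else a + b + 1
        events.append((f"{a} + {b} = {shown}", is_true, rng.choice(words)))
    return events

rng = random.Random(1)
for equation, is_true, word in make_trial(3, ["desk", "lamp", "frog", "rope"], rng):
    print(equation, is_true, word)
```

Span would then be scored as the longest list length at which the words are recalled in order without error, with longer and longer lists administered until performance breaks down.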
Other research shows that individuals with low span on this complex type of task have difficulty maintaining a goal according to instructions that go against natural responses, such as looking away from an object that suddenly appears on the computer screen rather than toward it (Kane, Bleckley, Conway, & Engle, 2001), or quickly naming the color in which a word is written in a Stroop task instead of reading aloud the conflicting color word that the letters spell out (Kane & Engle, 2003). Low spans also are more likely to complain that their minds wander when they are trying to pay attention (Kane et al., 2007). Dividing attention during a memory retrieval task tends to impair performance in high-span individuals more than in low spans, presumably because the high spans are more likely to apply attention-demanding strategies when their attention is available (Rosen & Engle, 1997).
Engle et al. (1999) basically accepted the conceptual model of Cowan (1988), and supported it with data analyzed using a structural equation model as illustrated in a somewhat simplified form in Figure 2a. Each arrow represents a significant path in the model. In this model, a theoretical short-term memory component (STM) governs performance on simple list recall tasks, and a theoretical working-memory component (WM) governs performance on complex span tasks. It is found that only the working-memory component is a significant potential influence on tests of fluid intelligence (gF). One could get the impression from a casual reading of this model that the STM and WM components are separate rather than embedded. That, however, would be a misinterpretation. The separation is a statistical convenience and the model works partly because it includes a correlation between the STM and WM components, depicted as a dashed two-headed arrow.
It is possible, though, to restructure the statistical model in a way that makes its relation to the underlying conceptual model clearer. Conway et al. (2002) did this, in a way illustrated in Figure 2b. Now it is directly shown that the WM component includes the STM component, in that the WM component governs all of the memory tasks and the STM component governs only the simple ones. Like Engle et al. (1999), the main finding was that only the WM component was a significant potential influence on fluid intelligence, and not the STM component. With this version of the structural equation model, though, the similarity to the embedded process model of Figure 1c is more apparent.
One can always find individuals who use the working-memory-related terms in an uncommon way (as discussed, for example, by Engle et al., 1999). Moreover, a final point of importance in understanding different investigators’ use of working memory terminology is that individuals may not be entirely self-consistent. In a discussion at a lucid moment or in a carefully-reviewed article, an investigator may use terms clearly and as intended (e.g., specifying a hierarchical relation between short-term and working memory); in a tired moment or in more informal writing, the same investigator may use terms in a less clear manner (e.g., talking as if there were separate, unrelated short-term and working memory stores). For the future, it is important to try to establish terms in a way that minimizes confusion both within an article and across the field at large, though we cannot expect a federal law to enforce consistent usage.
Although several different structures for the working memory system have been described and illustrated in Figure 1, they seem to hinge on certain basic mechanisms. It would be too ambitious to try to determine once and for all what the right structure of working memory is but I can address several of the underlying basic mechanisms: capacity limits, the focus of attention as a storage device, loss of information over time, and the modularity of storage mechanisms.
The field of cognitive psychology has often seemed quite ambivalent about the idea of working memory storage capacity limits. On one hand, everyone agrees that the capacity is not unlimited. On the other hand, there has been conventional wisdom in which the processes contributing to capacity limits are too complex for one to determine what the capacity will be in a particular circumstance. I have been particularly motivated to resolve this issue in order to address the possibility that the focus of attention itself provides capacity-limited storage.
Miller (1956) found that people could remember lists of about 7 meaningful items but this limit was rough. Baddeley, Thomson, and Buchanan (1975) soon afterward showed that the limit depends on the lengths of words to be recalled; lists of multisyllabic words are not recalled as well as lists of the same number of monosyllabic words, for example. Overall, the problem is that, if list recall recruits a complex system of working memory, one cannot tell from the result what a particular component is doing. The amount recalled may be enhanced by a process of grouping items together to form new chunks right on the spot; this may be why telephone numbers are reported in a grouped manner. Verbal rehearsal may help grouping processes or it may simply refresh the representations of items. Broadbent (1975) suggested that, in situations in which effects of such strategies are eliminated, the number of items that can be held in working memory is about 3, not 7. This would be the case, he suggested, if one looked at the number of items that can be repeated consistently without errors, as opposed to, say, the number of items that can be repeated half the time without error. Also, when one recalls from long-term memory, the results typically seem to come out in temporal bursts limited to about 3 items at a time. It is as if a limited-size bucket has to be dipped into a well repeatedly, and the bucket holds only 3 items. Cowan (2001) extended this review of Broadbent, finding various situations in which rehearsal and grouping seemed unlikely; the true capacity limit in normal adults was usually between 3 and 5 items.
Other studies were conducted to determine whether this capacity limit is independent of length limits or other extenuating factors. One striking result was obtained by Chen and Cowan (in press a). Like several previous studies, we taught participants pairs of words that previously would not have been strongly related (e.g., brick-fish). We then presented lists of various lengths that consisted of either pre-exposed singletons or learned pairs. On the basis of previous research, we assumed that serial order information is not part of the capacity-limited store. Therefore, we used a free recall task in which words were counted correct if recalled at all, regardless of the recall order. For some participants, we prevented rehearsal by requiring repetition of the word “the” twice per second throughout the printed stimulus presentation. Under these circumstances, participants recalled an average of about 3 chunks regardless of the list length. When the chunks were singletons, 3 of them were recalled, and when the chunks were learned pairs, 3 of them (6 words) were recalled.
For another reason as well, I am fairly confident of a fundamental capacity limit of 3 to 5 items. For change recognition procedures in which the items to be remembered are visual objects, a simple formula has been applied to estimate the number of items in working memory (Cowan, 2001). The basic procedure is a slight modification of that of Luck and Vogel (1997) and is illustrated in Figure 3. An array of colored spots is briefly presented. This array is followed by an interval to allow the items to be transferred into working memory and then, ideally, a mask to eliminate sensory memory (Saults & Cowan, 2007). Last, a probe item is presented and the question is whether that item has changed color from the array item presented in that location.
The formula (Cowan, 2001) is based on the assumption that if one’s working memory capacity is k items and the array has S items, then the probability that the probed array item is in working memory is k/S or 1, whichever is smaller. If the item is in working memory, one will know whether it has changed; otherwise, one will guess. The basic formula for the number of items loaded into working memory turns out to be k = S(h − fa), where h is the hit rate (the proportion of changes detected) and fa is the false alarm rate (the proportion of no-change trials in which the participant incorrectly indicated that there was a change). The resulting estimates show that adults typically hold no more than 3 to 4 such items in working memory regardless of the number of array items. Subsequent work with more sophisticated mathematical modeling has reinforced that estimate (see Cowan & Rouder, 2009; Rouder et al., 2008; Zhang & Luck, 2008; but for a different interpretation see Bays & Husain, 2008, 2009).
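For concreteness, the arithmetic behind this estimate can be sketched in a few lines of code; the function name and the example hit and false alarm rates below are illustrative inventions, not taken from any particular experiment.

```python
# A minimal sketch of Cowan's (2001) capacity estimate k = S(h - fa).
# The numbers below are illustrative, not actual experimental data.

def capacity_estimate(set_size, hit_rate, false_alarm_rate):
    """Estimate the number of items in working memory (k) from
    change-detection performance on an array of set_size (S) items.

    Derivation: with k of S items in memory, a changed probe is detected
    whenever the probed item is in memory (probability k/S) or, failing
    that, by a lucky guess at some rate g. So h = k/S + (1 - k/S)g and
    fa = (1 - k/S)g; subtracting cancels the guessing term: h - fa = k/S.
    """
    return set_size * (hit_rate - false_alarm_rate)

# Example: a 6-item array with 80% hits and 15% false alarms.
k = capacity_estimate(6, 0.80, 0.15)
print(round(k, 2))  # 3.9 -> within the typical 3-to-4-item range
```

Note that the subtraction of the false alarm rate is what removes the contribution of guessing, so the estimate reflects stored items rather than raw accuracy.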
Recall that, for verbal stimuli, verbal rehearsal had to be prevented in order for a fixed-capacity mechanism to be seen (Chen & Cowan, in press a). In the visual modality it is not yet clear whether there is any comparable rehearsal mechanism that would have to be prevented in order for a limited-capacity mechanism to be seen. Items can be refreshed using attention (Raye, Johnson, Mitchell, Greene, & Johnson, 2007) but it cannot yet be said with certainty whether that is a supplement to the basic attention-related capacity limit, or perhaps the underlying mechanism behind it.
A common complaint among cognitive psychologists is that neuroimaging research does not contribute much to an understanding of the abstract structure of cognition and rules of behavior. A good case can be made, however, that neuroimaging is furthering cognitive models when it comes to capacity limits. Recent neuroimaging studies have identified a few areas in the posterior cortex that respond to a visual working memory load in a manner similar to behavior. The estimate of the number of simple items held in working memory tends to level off at about 4 items (Cowan, 2001; Rouder et al., 2008). Similarly, in a neuroimaging study (Todd & Marois, 2005), activity in the intraparietal sulcus (IPS) was found to increase with the number of simple items in the array that are supposed to be held in working memory, and to level off when there are 4 array items.
The capacity is lower for more complex items, such as letters from an unfamiliar foreign alphabet. There has been a debate about whether one should think of the number of objects or the total complexity of the display as more fundamental. Xu and Chun (2006) have obtained results suggesting that both factors are important. In particular, they found that the inferior IPS responds to the number of objects, whereas the superior IPS and the lateral occipital complex depend on the complexity of the items. Without this neural result, parsimony would have urged us to choose between objects and complexity. In fact, it could be argued that the neuroscience has influenced the direction of the behavioral research. Awh, Barton, and Vogel (2007) tested memory for mixed arrays with both simple items, such as colored spots, and complex items, such as foreign letters or cube orientations. It was found that capacity was unaffected by the complexity of the items in the array provided that the changes being tested were large ones (e.g., a change from a foreign letter to a cube) rather than subtle ones (e.g., a change from one cube orientation to another). The complexity of an item therefore was said to affect the adequacy of the resolution of its representation in working memory, but not the number of items held in working memory.
A key unresolved issue is why the capacity limit exists. Cowan (2005) addressed this issue. The capacity limit can be viewed not only as a limitation of human cognition but also as a strength. Some mathematically-inclined investigators have carried out analyses indicating that the ideal group size is 3 or 4 items when one intends to search groups of items to find a given item, and when one takes into account the time of search for the correct group and search for the correct item within the group. This is consistent with the capacity limit and the assumption that it is easiest to form groupings of about that size (backed up by evidence: Ryan, 1969; Severin & Rigby, 1963). The notion would be that items present concurrently in the capacity-limited region of working memory (the focus of attention?) can form a new group.
In terms of neural mechanisms of capacity limits, Lisman and Idiart (1995) offered a theory in which cells carrying the feature representations of an object being represented in working memory have to fire in unison. Cells from different objects in working memory have to fire at slightly different times to avoid feature confusions between objects (e.g., a red square and green circle being miscoded as a green square and red circle), and the capacity limit emerges because representations of all objects must fire once every 100 msec or so in a repeating cycle to persist.
If one accepts that there is a fundamental capacity limit in the range of 3 to 5 separate items, the nature of the capacity-limited holding mechanism is still in question. There are several types of finding that support the belief that the limit is attention-related (Cowan, 2001). First, the capacity limit seems to apply to items across modalities, provided that contributions of sensory memory, verbal rehearsal, and chunking are minimized. Saults and Cowan (2007) presented on every trial both a visual array of colored squares and a spatial array of spoken digits in different voices from four loudspeakers (with the two modalities either simultaneous or in succession; the results were similar either way). They used a spatial array of spoken digits rather than a sequence to make it difficult for the digits to be rehearsed. In some trial blocks the participant was responsible for one modality only, whereas in other trial blocks the participant was responsible for both modalities. In some experiments, after time (e.g., 1 sec) for encoding of each array, a mask was presented that included multicolored squares and garbled digits, to eliminate sensory memory of the stimuli. The results indicated that participants could recall about 4 visual items when only that modality was attended. When both modalities were attended, fewer visual items were recalled but the total of visual plus auditory items recalled was about 4. (The auditory-only results were poorer because of perceptual limitations.) This result suggests that the capacity limit of about 4 items applies to abstract semantic information rather than sensory or code-specific information, and that maintenance of information in that store requires attention.
Cowan and Morey (2007) further demonstrated that the modality-independent capacity limit is in maintenance, not encoding. On every trial they presented two sets of stimuli to be encoded; either set could be either a spatial array of colored items or a list of spoken characters. Articulatory suppression was used to prevent rehearsal of the list. After encoding, there was a cue indicating which set or sets had to be retained for another 3 s until the test. Performance was impaired when two sets had to be retained compared to when one set could be forgotten and one had to be retained. The results were the same, though, no matter whether the two sets were in the same modality or different modalities. After encoding has been completed, retention in the central component of working memory is general across modalities and stimulus types (see also Morey & Cowan, 2004, 2005).
Another source of evidence is that resources are shared between storage and processing even when the nature of the storage and processing are very different. Stevanovski and Jolicoeur (2007) found that visual array memory was impaired by a tone identification task during the array maintenance period. Chen and Cowan (in press b) tested memory for printed or spoken digits with a nonverbal choice reaction time between digits, in which one of three keys was to be pressed as quickly as possible (depending on the location of an object appearing on the screen). The memory load impaired choice reaction time performance to an extent that increased as the load increased across serial positions of the list. Kane et al. (2004) carried out an individual difference study and found a component of variance for complex working memory tasks regardless of whether they were verbal or spatial in nature. Given that such tasks were previously associated with controlled attention (e.g., Kane et al., 2001), it seems reasonable to assume that the type of storage under discussion is attention-demanding.
Two brain research studies help to reinforce the point that the capacity-limited memory is attention-related. In an fMRI study, Majerus et al. (2006) examined the IPS and its functional connectivity to other regions in the brain. The stimuli were lists of printed words. The IPS was active whenever there was a memory load (compared to a no-load task) but it was functionally connected to different areas depending on whether retention of item or order information was required. Thus, the IPS seemed to function as an attention component to enhance activity in the mechanisms specific to the kind of memory required. Postle et al. (2006) established the differential roles of frontal versus parietal regions in working memory using both fMRI and transcranial magnetic stimulation (TMS). Both frontal and parietal regions were activated for working memory tasks involving storage and manipulation of the materials, and also for tasks involving only storage. TMS differentiated the regions, though. Parietal TMS interfered with both kinds of task, as it should if the parietal regions account for storage, whereas frontal TMS interfered only with the tasks requiring manipulation, as it should if the frontal regions account for executive processes but not storage.
In sum, the bimodal tradeoffs in behavioral results and the observed patterns of brain activity support the notion of a capacity-limited attentional mechanism such as the focus of attention. It is not yet clear whether the limit is literally in the number of representations concurrently active in the focus of attention (Cowan, 2001, 2005) or the rate at which attention can be used to refresh representations in a timely manner (Barrouillet, Bernardin, Portrat, Vergauwe, & Camos, 2007; Portrat, Barrouillet, & Camos, 2008), as these possibilities are only subtly different.
Information can be lost from a limited working memory system in two major ways: (1) if only a limited number of items can be held at once, and (2) if representations are somehow time-limited. Cowan (1988, 1995, 1999) described at length the evidence and rationale for a system in which the focus of attention and activated memory outside of that focus were said to have different limiting mechanisms. Activated memory outside of the focus of attention was said to be limited in time and susceptible to interference from incoming items that have features similar to those of the items already activated. In contrast, the focus of attention was said to protect items from those factors, although that protection can occur for only 3 to 5 separate chunks of information.
Loss of memory strictly as a function of time is termed decay. Even if memory persists over time when rehearsal is possible, a decay mechanism is still said to exist if memory loss over time can be observed when rehearsal is prevented. Surprisingly, the evidence in favor of a decay mechanism in memory of any kind is extremely weak. One challenge comes from researchers who found that it may be the relative rather than the absolute amount of time that matters. For example, let us reconsider the finding that the recency effect disappears after a distraction-filled delay of at most 20 seconds (Glanzer & Cunitz, 1966). When the items in the list are also separated by distracting tasks of similar length, the recency effect to some extent reemerges (Bjork & Whitten, 1974). This finding suggested that a distraction-filled retention interval erodes performance at least partly by making the items toward the end of the list temporally less distinct from one another, a function that depends also on how far apart in time the items are from one another. By a common analogy, if an observer were standing at the end of a row of telephone poles (the list items), the distinctness of the poles would be greatest for the last few and this distinctness would depend on the ratio of the repeating distance between poles to the distance from the last pole to the observer’s location.
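The ratio logic of the telephone-pole analogy can be made concrete with a minimal sketch. The particular distinctiveness measure below is a simplification invented for illustration, not Bjork and Whitten’s (1974) actual model.

```python
# Illustrative sketch of temporal distinctiveness as a ratio: the final
# item's distinctiveness depends on the spacing between items relative
# to how long ago the item occurred. The measure is an assumption made
# for this example, not a published model.

def distinctiveness(inter_item_interval, retention_interval):
    """Relative distinctiveness of the final list item: the spacing
    between items divided by the item's total temporal distance from
    the moment of recall."""
    return inter_item_interval / (inter_item_interval + retention_interval)

# Items 1 s apart, recalled immediately vs. after a 20-s filled delay:
print(round(distinctiveness(1.0, 0.0), 2))   # 1.0  -> strong recency
print(round(distinctiveness(1.0, 20.0), 2))  # 0.05 -> recency washed out
# Spacing the items 20 s apart restores the ratio, and with it recency:
print(round(distinctiveness(20.0, 20.0), 2))  # 0.5
```

On this view, what the filled delay does is shrink the ratio, so separating the list items by comparable distractor periods restores it, which is just what the reemergence of the recency effect suggests.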
This discussion about relative versus absolute amounts of time becomes a bit moot, however, if one accepts recent evidence that there is very little, if any, loss of memory as a function of time (for a review, see Lewandowsky, Oberauer, & Brown, 2008). The evidence on which this conclusion is based primarily involves delaying the serial recall of items by including shorter or longer distracting periods between items recalled in the response period. Even when both verbal rehearsal and the use of attention for refreshing are prevented, there is little difference between lists with short versus long periods between items in the response. This seems to eliminate the mechanism of decay.
The situation appears to be different in terms of memory for unanalyzed sensory features, though. For example, Cowan, Lichty, and Grove (1990) presented spoken consonant-vowel syllables at irregular intervals while participants read a novel. A cue to stop reading and identify the last spoken syllable occurred 1, 5, or 10 seconds after the last syllable. Identification declined markedly over that time even though the only interference was from silent reading. In another experiment, it was necessary to monitor the spoken channel for a target syllable while reading and, even though the monitoring was only 60% correct, that division of attention eliminated the decline in memory over 10 seconds. Taken together, the results suggest that there is time-based loss of memory, but only for the types of features that can be encoded without attention.
For the items that receive adequate attention at the time of encoding but are then dropped from the focus of attention, in what sense can we say that the items are still temporarily activated in memory? Even without a decay mechanism, this might be said in several ways. First, the passage of time might make every item less distinct from other items presented nearby in time, and that process might typically produce the loss of memory over 20 s or so that is typical in earlier research on working memory (Keppel & Underwood, 1962; Peterson & Peterson, 1959). Second, daily life is full of interference, and the typical loss of memory over time may result from the typical amount of similar interference.
Third, what decay of memory exists may not be exponential over time as one observes, for example, in the case of radioactive decay. An alternative notion of decay is one in which the cell assembly actively representing an item has its neural resources depleted by the activity until it suddenly shuts down. Variability in the amount of time until shutdown could sometimes give the impression of exponential decay. Cowan, Saults, and Nugent (1997) administered a two-tone comparison task with the trials timed to allow control of the temporal distinctiveness factor, and found memory that held fairly steady for about 12 seconds and then dropped somewhat precipitously, not exponentially. Winkler, Schröger, and Cowan (2001) found the same thing in the temporal course of an event-related potential component that responds to changes in sound, the mismatch negativity; the component was found in full amplitude for some seconds and then disappeared precipitously. Most impressively, Zhang and Luck (2009) examined memory for items within a visual array of simple objects (differing only in color) over 10 s of an unfilled retention interval and found that the loss was not a gradual decay involving decreasing precision of the representations over time but, rather, a sudden death of the representations of some items in the array by 10 s. They observed this by requiring recall of the probed color on a color wheel and by using a mathematical model in which one could know the item to some precision or else guess randomly.
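The contrast between gradual exponential decay and sudden death can be illustrated with a small sketch; all of the parameters below (the half-life, the holding time, and the post-loss guessing floor) are invented for the example rather than estimated from any study.

```python
# An illustrative contrast between exponential decay and the "sudden
# death" pattern described in the text. All parameters are invented
# for this sketch, not fitted to data.

def exponential_retention(t, half_life=6.0):
    """Memory strength that falls continuously from the moment of encoding,
    like radioactive decay."""
    return 0.5 ** (t / half_life)

def sudden_death_retention(t, hold=12.0, floor=0.1):
    """Memory held near ceiling for a while, then lost precipitously
    (cf. Cowan, Saults, & Nugent, 1997; Zhang & Luck, 2009)."""
    return 1.0 if t <= hold else floor

# Compare the two retention curves at a few delays (seconds):
for t in (0.0, 6.0, 12.0, 13.0):
    print(t, round(exponential_retention(t), 2), sudden_death_retention(t))
```

The point of the contrast is that averaging over trials with variable shutdown times could make the second, step-like function resemble the first, which is why the shape of decay is hard to settle from group data alone.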
It is even possible that this kind of memory loss, sudden death after a certain number of seconds, occurs even for semantic activation, though outside of the time range tested in the studies described by Lewandowsky et al. (2008).
The fact that I have discussed an undifferentiated activated area of long-term memory (Figure 1c) whereas Alan Baddeley has talked about more modular storage components (Figure 1b) should not be seen as evidence of a strong incompatibility between the models. Again, I simply felt unsure of what stores would be needed. For example, perhaps there are many different specialized storage areas that are partly overlapping. An auditory store may overlap with a phonological store, sharing some but not all of its neural apparatus, and a sound localization store may share some neural apparatus with an acoustic store and a visual-spatial store. It appears that the storage of item and order information occurs via partly separate mechanisms as well (Majerus et al., 2006). So within activated memory outside of the focus of attention, I favor a system that has multiple specializations but is, nevertheless, not cleanly modular.
We have now looked at working memory macroscopically in terms of the models that have been developed (Figure 1), and more microscopically in terms of some basic mechanisms that make these models tick. That should be a good basis to begin to discuss development.
The body of research on the developmental aspect of working memory seems immense and diverse. Although I have recently reviewed the development of working memory (Cowan & Alloway, 2009), I cannot do justice to the many different perspectives that have contributed to this topic for over 100 years. Moreover, because the child develops as a whole, many mechanisms change with age. Instead of surveying all potentially relevant research, I hope to provide some insight into how we might go about distinguishing which mechanisms are most critically involved in the developmental growth of working memory in childhood. I will do so by considering each of the underlying basic mechanisms that we have discussed.
Whereas Jean Piaget described cognitive development on the basis of the structure of rules about the world acquired by the growing child, neo-Piagetians tried to dig deeper to understand what processing capabilities might underlie the developing rule structure. Pascual-Leone (1970) developed a theoretical model in which the number of schemes that could be kept active at once (called M-space) was limited, and in which the active schemes had to be used both for storage of data and for processing routines used on these data. Any given problem could be solved only if the number of schemes available was adequate to hold the necessary data and also carry out the necessary processing of it.
One issue with this theoretical view is that it depended on a number of assumptions that were difficult to verify. In order to reach conclusions about developmental increases in M-space based on developmental improvements in performance on a memory task, one had to accept that the test was designed in such a way that the amount of space taken up by the task processing was equated across age groups. One also had to assume that the older participants were not better able to combine the task schemes or items to form fewer, larger chunks in order to get them to fit into the available capacity (cf. Miller, 1956). There were some successes. Burtis (1982), who worked under Pascual-Leone’s tutelage early on, carried out a careful study in which the processing was extremely simple and the items sometimes formed natural chunks, so that there was a strong basis on which to make these assumptions (e.g., letter pairs such as CC or CD as opposed to random pairs such as CG); nevertheless there was a developmental increase in the number of chunks recalled, and it was interpreted as a developmental growth in M-space or capacity.
Around the same time, however, Case, Kurland, and Goldberg (1982) reported results leading to the opposite conclusion. There were age differences in processing efficiency and, if age groups were experimentally equated on processing efficiency through stimulus manipulations, the age difference in recall vanished. Specifically, Case et al. found a strong correlation between the time necessary to repeat spoken words and the memory span for lists of these spoken words. They taught adults new words (nonsense words) and found that the adults’ speed of identification of, and memory for, these new words were both comparable to those of younger children using real words. Thus, Case et al. explained developmental improvements on the basis of increased efficiency in processing, rather than increased capacity or M-space. This is a tough issue given the difficulties of theoretically analyzing one’s experimental situation. Probably as a result, there has been little work on this topic since that of Case et al. (1982).
Recently, Gilchrist, Cowan, and Naveh-Benjamin (in press) have re-opened the issue in a study of memory for lists of sentences in children and adults. On every trial, a list of linguistically simple, spoken sentences was presented via computer. There were four types of lists: (1) 4 short, unrelated, 1-clause sentences, e.g., “Thieves took the painting; Our neighbor sells vegetables…” etc.; (2) 8 short, unrelated sentences of this type; (3) 4 long sentences, each composed of two clauses, e.g., “Our neighbor sells vegetables but he also makes fruit juice,” still with no relation between the 4 sentences; or (4) 4 random pseudo-sentences, e.g., “a close football your cheese”; each pseudo-sentence was presented in its own sentence-like intonation. The materials used for long sentences for some participants were divided into short sentences and random pseudo-sentences for other participants, and vice versa. Thus, nobody received the same materials in multiple conditions and the assignment of materials to conditions was counterbalanced across participants. Notice that Conditions 2 and 3 were of the same phonological length, as were Conditions 1 and 4; yet, the number of independent chunks in the stimuli differed between the conditions within these pairs of conditions.
This test situation yielded two different, special measures of performance based on the assumption that each clause would tend to hang together to form a single chunk. If at least one content word from a clause was recalled, the participant was said to access the clause, meaning that it was in some way represented in working memory. The completeness and coherence of that representation then could be measured by the completion of accessed clauses, meaning the proportion of words recalled from clauses from which at least one content word was recalled. The results are shown in Figure 4. Between children in Grades 1–2 and older participants (children in Grades 6–7 and college students) there were developmental increases in the number of clauses accessed (top panel of the figure). Yet, there were no developmental increases in the completion of accessed clauses (bottom panel). This suggests that processing and grouping in this task were comparable across age groups, and that the increase across age groups in clause access can be taken as an indication that the capacity of working memory has increased. Of course, the efficiency of processing might have improved across age groups in some way that we could not measure, so there is room for followup work. At present, our best guess is that the capacity and, in some situations, the efficiency of processing increase with age in childhood.
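The two measures can be illustrated with a hypothetical scoring routine. The function below follows the scoring rules described above (access: at least one content word of a clause recalled; completion: proportion of words recalled among accessed clauses), but the function name, the representation of clauses as word lists, and the miniature data set are assumptions made for the example.

```python
# Hedged sketch of the clause "access" and "completion" measures described
# in the text. The scoring representation and the example data are
# invented for illustration.

def score_recall(clauses, recalled_words):
    """clauses: list of clauses, each a list of its content words.
    recalled_words: set of content words the participant produced.
    Returns (access, completion)."""
    # A clause is "accessed" if at least one of its content words is recalled.
    accessed = [c for c in clauses if any(w in recalled_words for w in c)]
    access = len(accessed)
    # Completion: proportion of words recalled, among accessed clauses only.
    recalled_in_accessed = sum(
        sum(1 for w in c if w in recalled_words) for c in accessed)
    total_in_accessed = sum(len(c) for c in accessed)
    completion = recalled_in_accessed / total_in_accessed
    return access, completion

clauses = [["thieves", "took", "painting"],
           ["neighbor", "sells", "vegetables"],
           ["makes", "fruit", "juice"]]
recalled = {"thieves", "painting", "neighbor"}
print(score_recall(clauses, recalled))  # (2, 0.5): 2 clauses accessed,
# and 3 of the 6 words in those accessed clauses recalled
```

Separating the two measures in this way is what allows capacity (number of chunks accessed) to be distinguished from the completeness of each chunk’s representation.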
An argument against this conclusion is that the capacity of infants in visual array procedures already seems similar to that of adults (Ross-Sheehy, Oakes, & Luck, 2003). It remains quite possible, however, that the task designed for infants inflates the capacity estimate. A repeating array is presented on each trial and the infant must notice multiple changes in the repeating array in order to prefer that array over a non-changing array on the other side of the screen. Without going through a very nuanced argument for the rationale for interpreting the results of this task, we can simply say that it seems to include a questionable assumption: that there is a monotonic relation between the number of changes noticed and the looking time. If, instead, looking time is greater when only some of the changes are noticed (because the infant is still figuring out the situation?), then looking reaches a maximum when the array exceeds capacity by a certain amount. Studies of visual arrays have suggested developmental increases in ability later in childhood (Cowan et al., 2005; Riggs, McTaggart, Simpson, & Freeman, 2006).
Even if one accepts that there is a developmental increase in capacity, one can question the assumption (Cowan, 2001; Pascual-Leone, 1970) that this developmental increase occurs because of developmental changes in the focus of attention. The nature of some studies that have yielded developmental increases in capacity at least shows that the focus of attention has a lot to do with the task. Cowan, Nugent, Elliott, Ponomarev, and Saults (1999) presented on each trial a silent visual rhyming game that held the participant’s attention while a series of spoken digit lists was ignored. Occasionally, the rhyming game was interrupted and the child was quizzed on the last spoken list. This seems to require that the participant switch attention to an unanalyzed sensory or phonological memory of the sounds in order to pull a few of these sounds into the focus of attention to be recalled. The number of digits that could be recalled in this situation increased with age in childhood, with adults recalling the now-familiar amount of about 3.5 digits on average and children recalling fewer. Cowan et al. (2005) also found that this task, in college students, was an excellent predictor of high school grades.
From this study it would not be clear whether the developmental change was in the amount that could be transferred into the focus of attention, or in the speed with which attention could be switched to the formerly unattended speech sounds and the efficiency with which these materials could be encoded in the attention-dependent form of working memory that facilitates recall. Barrouillet, Gavens, Vergauwe, Gaillard, and Camos (2009), like Case et al. (1982), showed that the rate of processing changes with age and that it can account for improvements in recall. The faster an individual can carry out processing, the more time is freed up to refresh items in memory, preventing them from decaying. It is probably going to be very difficult to tell the difference between holding multiple items in the focus of attention at once and circulating items in and out of attention in order to carry out a process of refreshing (Raye et al., 2007). On a macroscopic level of analysis they look the same; they differ only in a very microscopic analysis.
Another, more fundamental challenge to the idea of a developmental growth of capacity comes from the hypothesis that it is not capacity that increases, but the ability to exclude from storage items that are irrelevant to the task (e.g., Lustig, Hasher, & Zacks, 2007; McNab & Klingberg, 2008; Vogel et al., 2005). Although that kind of change probably does take place, it cannot account for all of the developmental change. Cowan, Morey, AuBuchon, Zwilling, and Gilchrist (in press) showed this by adapting a visual array procedure with distracting items (Gold et al., 2006) to children in Grades 1–2, Grades 6–7, and college. The purpose was to determine whether young children performed more poorly because they remembered fewer items than adults, or because a higher proportion of the items that they remembered were distracting items rather than targets.
As shown in Figure 5, on every trial an array of colored disks and triangles was presented. After a short delay, there was a single-item probe and the required response was to indicate with a mouse click where the probe item shown had occurred in the array. It could have occurred at the same location or at a different location, or at no location in the array. A cover story was used in which the shapes represented children in a classroom and the task was to indicate the seat in which the probed shape-child belonged. If the shape-child was not anywhere in the classroom the correct response was to click the door, sending the shape-child to the principal. For the main analyses, the results were coded in terms of whether the child indicated a change or no change in the location shown by the probe, so that the capacity formula (Cowan, 2001) could be applied.
There was a continuum of attention conditions. In some trial blocks, there was only one shape. In others, there were two shapes but one was always tested. In still other trial blocks, the location of a shape that was to be ignored was nevertheless used as the probe location on 20% of the trials. Finally, there were trial blocks in which attention was to be divided equally between shapes and each one was tested half the time. These trial blocks yielded five different attention conditions for the tested item, ordered here with decreasing amounts of attention devoted to the item during encoding: (1) 1-set, trials from a block in which objects of only one shape were in the array; (2) 100%, trials in a block that had two shapes but in which only the one that was to be attended was in fact tested; (3) 80%, trials in which the to-be-attended shape was tested, but from a block in which that was the case only 80% of the time; (4) 50%, trials from the divided-attention block; and (5) 20%, trials in which the to-be-ignored shape was tested, taken from the same trial block as the 80%-attended trials.
In the test, these conditions existed for trials with 2 items in the tested shape (and except in the 1-set condition, 2 items in the other shape) and also 3 items in the tested shape (and except in the 1-set condition, 3 items in the other shape). The clearest results, shown here, occurred for the 2-and-2 condition. Figure 6 shows these results in terms of the capacity as a function of the attention condition. The fact that the three age groups produced parallel response functions suggests that they did not differ in the use of attention to exclude less-relevant items. The fact that the youngest age group nevertheless recalled far fewer items than the older groups suggests that they have a smaller capacity than the older groups. These results seem to suggest that the focus of attention can hold more (or allow more to be held) in older children or adults than in the younger children. With a higher memory load (in the 3-and-3 condition, not shown), young children start to fail at the process of filtering out irrelevant stimuli; storage affects processing.
Throughout the history of developmental psychology there have been very few studies of whether working memory persists for a shorter time in young children than in adults. That is perhaps not surprising, given that the entire concept of memory being lost as a function of time is in flux (e.g., Lewandowsky et al., 2008) and has been questioned for many years (e.g., Keppel & Underwood, 1962; Melton, 1963). We did, however, find memory loss over time for items outside of the focus of attention, and an age-related change in that memory loss. Cowan, Nugent, Elliott, and Saults (2000) presented lists of spoken digits in an unattended channel during a rhyming task and occasionally tested participants on the digits in the last spoken list, which ended 1, 5, or 10 s before the recall test. The results are shown in Figure 7. Second- and fifth-grade children had similar levels of performance with a 1-s delay but, at longer delays, the second-grade children’s memory declined much more. The age-related effect shown in the figure was found only for the final serial position of the list, for which a vivid, uninterrupted, auditory sensory memory is assumed to be present when the list ends. There may be an age difference in the persistence of sensory memory or in the ability to make sense of a degraded remnant of that memory.
For more categorical, attended stimuli, there is no known age-related difference in persistence of memory. Cowan et al. (2006) successfully trained children to speed up their responses in a memory span task, which should help to overcome any decay during recall, but no improvement in memory resulted. Theories like that of Barrouillet et al. (2009), that rely on the notion of attentional refreshing, and the earlier theories that rely on verbal rehearsal (e.g., Hulme & Tordoff, 1989) have hypothesized age differences in the speed of the refreshing or rehearsing process but not necessarily in the persistence of the information in memory if it is not refreshed.
There has been considerable developmental work done from the standpoint of the traditional tripartite model of Baddeley (1986) with a phonological store and rehearsal process, a visuospatial store and possible visual scan process, and an attention-related central executive process. For example, Gathercole, Pickering, Ambridge, and Wearing (2004) reported on the basis of means and structural equation models that these three components exist at least in children from 4 to 15 years of age, but with functional increases in the components over the years.
From my own standpoint, these data are also compatible with the possibility that the system is not so modular. Other tasks could divide the system differently, producing perhaps a different tripartite structure, and still the components would develop. In my opinion, the fundamental capabilities that develop and that underlie findings like those of Gathercole et al. could be (1) increasing knowledge that allows better storage and retrieval of verbal and spatial information, and (2) increasing capability of the attentional system. Together, these developments also allow an improvement in mnemonic strategies. Several examples will help to illustrate this possible developmental scenario.
Graham Hitch and his colleagues have suggested that young children rely on visual representations more than older children do, with a shift toward verbal representations as children mature (Hitch, Halliday, Dodd, & Littler, 1989; Hitch, Halliday, Schaafstal, & Schraagen, 1988). Does this suggest modularity in the system? Instead, it could be attributed to the helpfulness of a good match between the encoding and response representations in young children, given the types of experiments that were conducted. In older children there is a growing tendency to represent even visual materials in verbal form if possible, because that allows covert verbal rehearsal, which assists recall in many test contexts (especially serial recall). Covert verbal rehearsal becomes increasingly automatic with age (Guttentag, 1984). Young children do not easily benefit from it (Ornstein & Naus, 1978), and therefore they perform poorly, compared to older children, on materials that lend themselves to verbal rehearsal.
A similar analysis can be made in a study on children with specific language disorders (Gillam, Cowan, & Marler, 1998). Lists of items to be recalled were presented in spoken or printed form (or both together) and the responses involved speech or pointing to the correct words. The different modalities were presented in different blocks of trials so children always knew what kinds of stimuli to expect and what kind of responses to give. The children with disorders were impaired relative to control children only when printed input was paired with pointing responses. This can be understood via strategies. The children with language disorders may have failed to form a speech code to assist recall unless something about the situation forced them to form such a code: spoken input or expected spoken responses.
A final example of the effect of the development of strategies comes from Cowan, Saults, and Morey (2006), who examined verbal-spatial associations in children in Grade 3, Grade 6, and college. On each trial, 3–7 pentagons represented house locations, and printed names appeared and then vanished one at a time in the houses. Then a probe name appeared centrally and was to be dragged to the correct house at which it had previously occurred. Cowan et al. showed that there were two strategies that could be used. The verbal-spatial associations could be remembered, an effortful process; or the verbal sequence and the spatial route from house to house could be remembered separately, and these two sequences could be combined at the time of test. For example, if one were presented with a name recognized as second in the sequence of names, then the location second in the route from house to house would be the correct answer. It turned out that children in third grade predominantly used the verbal-spatial associative strategy (with sixth-grade children intermediate), whereas the college students predominantly used the combination of verbal sequence and spatial route. One could tell because of the effects of sequences in which the route doubled back upon itself, so that some locations were used for more than one name whereas other locations were not used at all. Such doubling-back was helpful to third-grade children but harmful to college students because it made the separate-storage strategy difficult to use. Suppressing rehearsal made college students’ pattern of data reverse to look very similar to that of third-grade children, though at a somewhat higher level of performance.
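The separate-storage strategy attributed to the college students amounts to a positional lookup: hold the name sequence and the spatial route as two independent lists, and at test recover the probe name’s serial position and read off the location at that same position in the route. A minimal sketch, in which the names and coordinates are invented purely for illustration (they are not stimuli from the study):

```python
# Hypothetical illustration of the separate-storage strategy described
# by Cowan, Saults, & Morey (2006): the verbal sequence and the spatial
# route are stored as two separate lists and combined only at test.
name_sequence = ["Amy", "Ben", "Cal", "Dee"]      # order in which names appeared
spatial_route = [(0, 1), (2, 3), (4, 0), (1, 2)]  # house visited at each step

def recall_location(probe_name):
    """Return the route position matching the probe's serial position:
    the nth name presented must belong to the nth house on the route."""
    position = name_sequence.index(probe_name)  # serial position of the name
    return spatial_route[position]              # location at that same position

print(recall_location("Cal"))  # → (4, 0)
```

The sketch also shows why a route that doubles back hurts this strategy: if the same house appears at two positions in `spatial_route`, the route is no longer a simple path and becomes much harder to encode as its own independent sequence.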
The suggestion from all these studies is that strategies developing throughout childhood can explain why children’s responses to verbal and spatial memoranda both improve with age and yet appear to depend on separate components. Changing strategies can include many things, such as the better use of perceptual grouping cues (e.g., Towse, Hitch, & Skeates, 1999) and the reference to semantic cues that can serve as reminders even in short-term memory tasks (e.g., Cowan et al., 2003). It seems likely that changing strategies and the growth in attention-demanding processes together go a long way toward explaining developmental improvements in working memory.
About fifty years after Broadbent’s (1958) rough sketch of the information processing system and the ideas by Miller and colleagues (Miller, 1956; Miller et al., 1960) about how working memory operates, the field has just begun to make good on the promise to define the limits of human cognitive ability and the reasons for those limits. Exciting work is taking place in both behavioral and brain research laboratories related to working memory. Developmental science can play an important role in that effort by showing what basic parameters change to allow children to grow into their adult states. In addition to the studies I have discussed, there are a few recent studies of working memory in the developing brain (e.g., Nelson et al., 2000; Olesen, Nagy, Westerberg, & Klingberg, 2003) and developmental studies of attention-related working memory training that appears to improve cognition in children (e.g., Klingberg et al., 2005; Rueda, Rothbart, Saccamanno, & Posner, 2005). There are profound practical consequences of poor working memory in children (Cowan & Alloway, 2009; Gathercole, Lamont, & Alloway, 2006) and the topic is of considerable theoretical interest (Cowan & Alloway, 2009; Cowan, Elliott, & Saults, 2002). For these reasons, it seems likely that the topic of working memory will stay in the research spotlight for some time.