Speech recognition is an active process that involves some form of predictive coding. This statement is relatively uncontroversial. What is less clear is the source of the prediction. The dual-stream model of speech processing suggests that there are two possible sources of predictive coding in speech perception: the motor speech system and the lexical-conceptual system. Here I provide an overview of the dual-stream model of speech processing and then discuss evidence concerning the source of predictive coding during speech recognition. I conclude that, in contrast to recent theoretical trends, the dorsal sensory-motor stream is not a source of forward prediction that can facilitate speech recognition. Rather, it is forward prediction coming out of the ventral stream that serves this function.
Readers will (1) be able to explain the dual route model of speech processing including the function of the dorsal and ventral streams in language processing, (2) understand how disruptions to certain components of the dorsal stream can cause conduction aphasia, (3) be able to explain the fundamental principles of state feedback control in motor behavior, and (4) understand the role of predictive coding in motor control and in perception and how predictive coding coming out of the two streams may have different functional consequences.
The dual route model of the organization of speech processing is grounded in the fact that the brain has to do two kinds of things with acoustic speech information. On the one hand, acoustic speech input must be linked to conceptual-semantic representations; that is, speech must be understood. On the other hand, the brain must link acoustic speech information to the motor speech system; that is, speech sounds must be reproduced with the vocal tract. These are different computational tasks. The set of operations involved in translating a sound pattern, [kæt], into a distributed representation corresponding to the meaning of that word [domesticated animal, carnivore, furry, purrs] must be non-identical to the set of operations involved in translating that same sound pattern into a set of motor commands [velar, plosive, unvoiced; front, voiced, etc.]. The endpoint representations are radically different; therefore the set of computations in the two types of transformations must be different. The neural pathways involved must also, therefore, be non-identical. The separability of these two pathways is demonstrated by the fact that (i) we can easily translate sound into motor speech without linking to the conceptual system, as when we repeat a pseudoword, and (ii) inability to produce speech as a result of acquired or congenital neurological disease or temporary deactivation of the motor speech system does not preclude the ability to comprehend spoken language (Bishop, Brown, & Robson, 1990; Hickok, in press; Hickok, Costanzo, Capasso, & Miceli, 2011; Hickok, Houde, & Rong, 2011; Hickok, et al., 2008; Rogalsky, Love, Driscoll, Anderson, & Hickok, 2011).
These facts are not new to language science – they were the foundations of Wernicke’s model of the neurology of language (Wernicke, 1874/1977) – nor are they unique to language. In the last two decades, a similar dual-stream model has been developed in the visual processing domain (Milner & Goodale, 1995). The idea has thus stood the test of time and has general applicability in understanding higher brain function.
In this article I have two goals: One is to summarize the dual route model for speech processing as it is currently understood, and the other is to consider how this framework might be useful for thinking about predictive coding in neural processing. Predictive coding is a notion that has received a lot of attention both in motor control and in perceptual processing, i.e., in the two processing streams, suggesting that a dual-stream framework for predictive coding may be useful in understanding possibly different forms of predictive coding.
The dual-route model (Fig. 1) holds that a ventral stream, which involves structures in the superior and middle portions of the temporal lobe, is involved in processing speech signals for comprehension. A dorsal stream, which involves structures in the posterior planum temporale region and posterior frontal lobe, is involved in translating acoustic speech signals into articulatory representations, which are essential for speech production. In contrast to the typical view that speech processing is mainly left hemisphere dependent, a wide range of evidence suggests that the ventral stream is bilaterally organized (although with important computational differences between the two hemispheres). The dorsal stream, on the other hand, is strongly left-dominant.
The ventral stream is bilaterally organized, although not computationally redundant in the two hemispheres. Evidence for bilateral organization comes from the observation that chronic or acute disruption of the left hemisphere due to stroke (Baker, Blumstein, & Goodglass, 1981; Rogalsky, et al., 2011; Rogalsky, Pitz, Hillis, & Hickok, 2008) or functional deactivation in Wada procedures (Hickok, et al., 2008) does not result in a dramatic decline in the ability of patients to process speech sound information during comprehension (Hickok & Poeppel, 2000, 2004, 2007). Bilateral lesions involving the superior temporal lobe do, however, result in severe speech perception deficits (Buchman, Garron, Trost-Cardamone, Wichter, & Schwartz, 1986; Poeppel, 2001).
Data from neuroimaging have been more controversial. One consistent and uncontroversial finding is that, when contrasted with a resting baseline, listening to speech activates the superior temporal gyrus (STG) bilaterally including the dorsal STG and superior temporal sulcus (STS). However, when listening to speech is contrasted against various acoustic baselines, some studies have reported left-dominant activation patterns that particularly implicate anterior temporal regions (Narain, et al., 2003; S.K. Scott, Blank, Rosen, & Wise, 2000), leading some authors to argue for a left-lateralized anterior temporal network for speech recognition (Rauschecker & Scott, 2009; S.K. Scott, et al., 2000). This claim is tempered by several facts, however. First, the studies that reported the unilateral anterior activation foci were severely underpowered. More recent studies with larger sample sizes have reported more posterior temporal lobe activations (Leff, et al., 2008; Spitsyna, Warren, Scott, Turkheimer, & Wise, 2006) and bilateral activation (Okada, et al., 2010). Second, the involvement of the anterior temporal lobe has been reported on the basis of studies that contrast listening to sentences with listening to unintelligible speech. Given such a coarse-grained comparison, it is impossible to tell what level of linguistic analysis is driving the activation. This is particularly problematic because the anterior temporal lobe has been specifically implicated in higher-level processes such as syntax- and sentence-level semantic integration (Friederici, Meyer, & von Cramon, 2000; Humphries, Binder, Medler, & Liebenthal, 2006; Humphries, Love, Swinney, & Hickok, 2005; Humphries, Willard, Buchsbaum, & Hickok, 2001; Vandenberghe, Nobre, & Price, 2002).
Given the ambiguity in functional imaging work, lesion data are particularly critical in adjudicating between the theoretical possibilities and, as noted above, favor a more bilateral model of speech recognition with posterior regions being the most critical for speech recognition (Bates, et al., 2003).
The hypothesis that phoneme-level processes in speech recognition are bilaterally organized does not imply that the two hemispheres are computationally identical. In fact, there is strong evidence for hemispheric differences in the processing of acoustic/speech information (Abrams, Nicol, Zecker, & Kraus, 2008; Boemio, Fromm, Braun, & Poeppel, 2005; Giraud, et al., 2007; Hickok & Poeppel, 2007; Zatorre, Belin, & Penhune, 2002). The basis of these differences is less clear. One view is that the difference turns on selectivity for temporal (left hemisphere) versus spectral (right hemisphere) resolution (Zatorre, et al., 2002). Another proposal is that the two hemispheres differ in terms of their sampling rate, with the left hemisphere operating at a faster rate (25–50 Hz) and the right hemisphere at a slower rate (4–8 Hz) (Poeppel, 2003). These two proposals are not incompatible as there is a relation between sampling rate and spectral vs. temporal resolution (Zatorre, et al., 2002). Further research is needed to sort out these details. For present purposes, the central point is that this asymmetry of function indicates that spoken word recognition involves parallel pathways -- at least one in each hemisphere -- in the mapping from sound to meaning (Hickok & Poeppel, 2007). Although this conclusion differs from standard models of speech recognition (Luce & Pisoni, 1998; Marslen-Wilson, 1987; McClelland & Elman, 1986), it is consistent with the fact that speech contains redundant cues to phonemic information and with behavioral evidence suggesting that the speech system can take advantage of these different cues (Remez, Rubin, Pisoni, & Carrell, 1981; Shannon, Zeng, Kamath, Wygonski, & Ekelid, 1995).
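To make the sampling-rate proposal concrete, the rates quoted above can be converted into the duration of one sampling cycle. The short sketch below does only that arithmetic; the function name `cycle_ms` is hypothetical, and treating one cycle as the effective integration window is an assumption made here for illustration.

```python
def cycle_ms(rate_hz: float) -> float:
    """Duration of one sampling cycle in milliseconds for a given rate in Hz."""
    return 1000.0 / rate_hz

# Rates quoted in the text (Poeppel, 2003)
left_fast = (cycle_ms(50), cycle_ms(25))   # 25-50 Hz -> 20 to 40 ms per cycle
right_slow = (cycle_ms(8), cycle_ms(4))    # 4-8 Hz  -> 125 to 250 ms per cycle

print("left hemisphere cycle (ms):", left_fast)
print("right hemisphere cycle (ms):", right_slow)
```

On this reading, the faster rate corresponds to cycles of a few tens of milliseconds and the slower rate to cycles of a few hundred milliseconds, which is one way to see the relation between sampling rate and temporal versus spectral resolution.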
Beyond the earliest stages of speech recognition, there is accumulating evidence that portions of the STS are important for representing and/or processing phonological information (Binder, et al., 2000; Hickok & Poeppel, 2004, 2007; Indefrey & Levelt, 2004; Liebenthal, Binder, Spitzer, Possing, & Medler, 2005; Price, et al., 1996). The STS is activated by language tasks that require access to phonological information - including both the perception and production of speech (Indefrey & Levelt, 2004), and during active maintenance of phonemic information (B. Buchsbaum, Hickok, & Humphries, 2001; Hickok, Buchsbaum, Humphries, & Muftuler, 2003). Portions of the STS seem to be relatively selective for acoustic signals that contain phonemic information when compared to complex non-speech signals (Hickok & Poeppel, 2007; Liebenthal, et al., 2005; Narain, et al., 2003; Okada, et al., 2010). STS activation can be modulated by the manipulation of psycholinguistic variables that tap phonological networks (Okada & Hickok, 2006), such as phonological neighborhood density (the number of words that sound similar to a target word) and this region shows neural adaptation effects to phonological level information (Vaden, Muftuler, & Hickok, 2009).
During auditory comprehension, the goal of speech processing is to use phonological information to access conceptual-semantic representations that are critical to comprehension. The dual-stream model holds that while conceptual-semantic representations are widely distributed throughout cortex, a more focal system serves as a computational interface that maps between phonological level representations and distributed conceptual representations (Hickok & Poeppel, 2000, 2004, 2007). This interface is not the site for storage of conceptual information. Instead, it is hypothesized to store information regarding the relation (or correspondences) between phonological information on the one hand and conceptual information on the other. Most authors agree that the temporal lobe(s) play a critical role in this process, but again there is disagreement regarding the role of anterior versus posterior regions. The evidence for both of these viewpoints is presented below.
Damage to posterior temporal lobe regions, particularly along the middle temporal gyrus, has long been associated with auditory comprehension deficits (Bates, et al., 2003; H. Damasio, 1991; N.F. Dronkers, Redfern, & Knight, 2000), an effect recently confirmed in a large-scale study involving 101 patients (Bates, et al., 2003). We can infer that these deficits are primarily post-phonemic in nature, as phonemic deficits following unilateral lesions to these areas are mild (Hickok & Poeppel, 2004). Data from direct cortical stimulation studies corroborate the involvement of the middle temporal gyrus in auditory comprehension but also indicate the involvement of a much broader network involving most of the superior temporal lobe (including anterior portions) and the inferior frontal lobe (Miglioretti & Boatman, 2003). Functional imaging studies have also implicated posterior middle temporal regions in lexical-semantic processing (Binder, et al., 1997; Rissman, Eliassen, & Blumstein, 2003; Rodd, Davis, & Johnsrude, 2005). These findings do not preclude the involvement of more anterior regions in lexical-semantic access, but they do make a strong case for significant involvement of posterior regions.
Anterior temporal lobe (ATL) regions have been implicated both in lexical-semantic and sentence-level processing (syntactic and/or semantic integration processes). Patients with semantic dementia, who have been used to argue for a lexical-semantic function (S.K. Scott, et al., 2000; Spitsyna, et al., 2006), have atrophy involving the ATL bilaterally, along with deficits on lexical tasks such as naming, semantic association, and single-word comprehension (Gorno-Tempini, et al., 2004). However, these deficits are not specific to the mapping between phonological and conceptual representations and indeed appear to involve more general semantic integration (Patterson, Nestor, & Rogers, 2007). Further, given that atrophy in semantic dementia involves a number of regions in addition to the lateral ATL, including bilateral inferior and medial temporal lobe, bilateral caudate nucleus, and right posterior thalamus, among others (Gorno-Tempini, et al., 2004), linking the deficits specifically to the ATL is difficult.
Higher-level syntactic and compositional semantic processing might involve the ATL. Functional imaging studies have found portions of the ATL to be more active while subjects listen to or read sentences rather than unstructured lists of words or sounds (Friederici, et al., 2000; Humphries, et al., 2005; Humphries, et al., 2001; Mazoyer, et al., 1993; Vandenberghe, et al., 2002). This structured-versus-unstructured effect is independent of the semantic content of the stimuli, although semantic manipulations can modulate the ATL response somewhat (Vandenberghe, et al., 2002). Damage to the ATL has also been linked to deficits in comprehending complex syntactic structures (N. F. Dronkers, Wilkins, Van Valin, Redfern, & Jaeger, 2004). However, data from semantic dementia is contradictory, as these patients are reported to have good sentence-level comprehension (Gorno-Tempini, et al., 2004).
In summary, there is strong evidence that lexical-semantic access from auditory input involves the posterior lateral temporal lobe. In terms of syntactic and compositional semantic operations, neuroimaging evidence is converging on the ATL as an important component of the computational network (Humphries, et al., 2006; Humphries, et al., 2005; Vandenberghe, et al., 2002); however, the neuropsychological evidence remains equivocal.
The earliest proposals regarding the dorsal auditory stream argued that this system was involved in spatial hearing - a “where” function (Rauschecker, 1998) - similar to the dorsal “where” stream proposal in the cortical visual system (Ungerleider & Mishkin, 1982). More recently, there has been a convergence on the idea that the dorsal stream supports auditory-motor integration (Hickok & Poeppel, 2000, 2004, 2007; Rauschecker & Scott, 2009; S. K. Scott & Wise, 2004; Wise, et al., 2001). Specifically, the idea is that the auditory dorsal stream supports an interface between auditory and motor representations of speech, a proposal similar to the claim that the dorsal visual stream has a sensory-motor integration function (Andersen, 1997; Milner & Goodale, 1995).
The idea of auditory-motor interaction in speech is not new. Wernicke’s classic model of the neural circuitry of language incorporated a link between sensory and motor representations of speech and argued explicitly that sensory systems participated in speech production (Wernicke, 1874/1969). More recently, research on motor control has revealed why this sensory-motor link is so critical. Motor acts aim to hit sensory targets. In the visual-manual domain, we identify the location and shape of a cup visually (the sensory target) and generate a motor command that allows us to move our limb toward that location and shape the hand to match the shape of the object. In the speech domain, the targets are not external objects but internal representations of the sound pattern (phonological form) of a word. We know that the targets are auditory in nature because manipulating one’s auditory feedback in speech production results in compensatory changes in motor speech acts (Houde & Jordan, 1998; Larson, Burnett, Bauer, Kiran, & Hain, 2001; Purcell & Munhall, 2006). For example, if a subject is asked to produce one vowel and the feedback that he or she hears is manipulated so that it sounds like another, the subject will change the vocal tract configuration so that the feedback sounds like the original vowel. In other words, talkers will readily modify their motor articulations to hit an auditory target, indicating that the goal of speech production is not a particular motor configuration but rather a speech sound (Guenther, Hampson, & Johnson, 1998). The role of auditory input is nowhere more apparent than in development, where the child must use acoustic information in his/her linguistic environment to shape vocal tract movements that must reproduce those sounds.
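The altered-feedback logic described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the talker is reduced to a single acoustic value (a formant frequency in Hz), the update rule is a simple proportional correction, and the names `compensate`, `gain`, and all numeric values are hypothetical rather than taken from the cited studies.

```python
def compensate(target_hz, shift_hz, gain=0.5, trials=20):
    """Simulate production under a constant feedback shift.

    Returns a list of (produced, heard) pairs, one per trial.
    Assumption: the talker nudges production in proportion to the
    mismatch between the auditory target and what was heard.
    """
    produced = target_hz              # start by producing the target accurately
    history = []
    for _ in range(trials):
        heard = produced + shift_hz   # experimenter-altered auditory feedback
        history.append((produced, heard))
        error = target_hz - heard     # mismatch relative to the auditory target
        produced += gain * error      # adjust articulation to reduce the mismatch
    return history


if __name__ == "__main__":
    # Hypothetical values: a 700 Hz target heard 100 Hz too high
    for trial, (produced, heard) in enumerate(compensate(700.0, 100.0)[:5]):
        print(f"trial {trial}: produced {produced:.1f} Hz, heard {heard:.1f} Hz")
```

In this sketch the produced value drifts away from the original motor configuration until the heard value returns to the target, which is the sense in which the goal of the act is a sound rather than a particular articulation.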
During the last decade, a great deal of progress has been made in mapping the neural organization of sensorimotor integration for speech. This work has identified a network of regions that include auditory areas in the superior temporal sulcus, motor areas in the left inferior frontal gyrus (parts of Broca’s area), a more dorsal left premotor site, and an area in the left planum temporale region, referred to as area Spt (Fig. 1) (B. Buchsbaum, et al., 2001; B. R. Buchsbaum, et al., 2005; Hickok, et al., 2003; Hickok, Houde, et al., 2011; Hickok, Okada, & Serences, 2009; Wise, et al., 2001). One current hypothesis is that the STS regions code sensory-based representations of speech, the motor regions code motor-based representations of speech, and area Spt serves as a sensory-motor integration system, computing a transform between sensory and motor speech representations (Hickok, 2012; Hickok, et al., 2003; Hickok, Houde, et al., 2011; Hickok, et al., 2009).
Lesion evidence is consistent with the functional imaging data implicating Spt as part of a sensorimotor integration circuit. Damage to auditory-related regions in the left hemisphere often results in speech production deficits (A. R. Damasio, 1992; H. Damasio, 1991), demonstrating that sensory systems participate in motor speech. More specifically, damage to the left temporal-parietal junction is associated with conduction aphasia, a syndrome that is characterized by good comprehension but frequent phonemic errors in speech production (Baldo, Klostermann, & Dronkers, 2008; H. Damasio & Damasio, 1980; H. Goodglass, 1992). Conduction aphasia has classically been considered to be a disconnection syndrome involving damage to the arcuate fasciculus. However, there is now good evidence that this syndrome results from cortical dysfunction (Anderson, et al., 1999; Hickok, et al., 2000). The production deficit is load-sensitive: Errors are more likely on longer, lower-frequency words, and on verbatim repetition of strings of speech with little semantic constraint (H. Goodglass, 1992; H. Goodglass, 1993). In the context of the above discussion, the effects of such lesions can be understood as an interruption of the system that serves as the interface between auditory targets and the motor speech actions that can achieve them (Hickok & Poeppel, 2000, 2004, 2007). The lesion distribution of conduction aphasia has been shown to overlap the location of auditory-motor integration area Spt (B. R. Buchsbaum, et al., 2011), consistent with the idea that conduction aphasia results from damage to this interface system.
Recent theoretical work has clarified the computational details underlying auditory-motor integration in the dorsal stream. Drawing on recent advances in understanding motor control generally, speech researchers have emphasized the role of internal forward models in speech motor control (Aliu, Houde, & Nagarajan, 2009; Golfinopoulos, Tourville, & Guenther, 2010; Hickok, Houde, et al., 2011). The basic idea is that the nervous system makes forward predictions about the future state of the motor articulators and the sensory consequences of the predicted actions to control action. The predictions are assumed to be generated by an internal model that receives efference copies of motor commands and integrates them with information about the current state of the system and past experience (learning) of the relation between particular motor commands and their sensory consequences. This internal model affords a mechanism for detecting and correcting motor errors, i.e., motor actions that fail to hit their sensory targets.
Several models have been proposed with similar basic assumptions, but slightly different architectures (Aliu, et al., 2009; Golfinopoulos, et al., 2010; Guenther, et al., 1998; Hickok, Houde, et al., 2011). One such model is shown in Figure 2 (Hickok, Houde, et al., 2011). Input to the system comes from a lexical-conceptual network as assumed by psycholinguistic models of speech production (Dell, Schwartz, Martin, Saffran, & Gagnon, 1997; Levelt, Roelofs, & Meyer, 1999). In between the input/output system is a phonological system that is split into two components, corresponding to sensory input and motor output subsystems, and is mediated by a sensorimotor translation system that corresponds to area Spt (B. Buchsbaum, et al., 2001; Hickok, et al., 2003; Hickok, et al., 2009). Parallel inputs to sensory and motor systems are needed to explain neuropsychological observations (Jacquemot, Dupoux, & Bachoud-Levi, 2007), such as conduction aphasia, as we will see immediately below. Inputs to the auditory-phonological network define the auditory targets of speech acts. As a motor speech unit (ensemble) begins to be activated, its predicted auditory consequences can be checked against the auditory target. If they match, then that unit will continue to be activated, resulting in an articulation that will hit the target. If there is a mismatch, then a correction signal can be generated to activate the correct motor unit.
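The check-and-correct cycle just described can be sketched in a few lines. This is a deliberately simplified illustration, not the model in Figure 2: the forward model is reduced to a lookup table, and the names `FORWARD_MODEL`, `select_motor_unit`, and the motor unit labels are hypothetical.

```python
# Hypothetical learned mapping from motor units to their predicted
# auditory consequences (a stand-in for the internal forward model).
FORWARD_MODEL = {
    "motor_kat": "kæt",
    "motor_kap": "kæp",
    "motor_bat": "bæt",
}


def select_motor_unit(auditory_target, initial_unit, forward_model=FORWARD_MODEL):
    """Check an activated motor unit against the auditory target.

    Returns (unit, corrected), where `corrected` is True if a mismatch
    triggered a correction signal.
    """
    predicted = forward_model[initial_unit]
    if predicted == auditory_target:
        # Match: the unit keeps being activated and articulation proceeds.
        return initial_unit, False
    # Mismatch: activate the unit whose predicted sound fits the target.
    for unit, sound in forward_model.items():
        if sound == auditory_target:
            return unit, True
    raise ValueError("no motor unit predicts the auditory target")


if __name__ == "__main__":
    print(select_motor_unit("kæt", "motor_kat"))  # prediction matches the target
    print(select_motor_unit("kæt", "motor_kap"))  # mismatch, so a correction occurs
```

The point of the sketch is that the comparison happens between a predicted sound and a target sound, so both the sensory and the motor side need access to the same lexical input.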
This model provides a natural explanation of the symptom pattern in conduction aphasia. A lesion to Spt would disrupt the ability to generate forward predictions in the auditory cortex and thereby the ability to perform internal feedback monitoring, making errors more frequent than in an unimpaired system (Fig. 2B). However, this would not disrupt the activation of auditory targets via the lexical semantic system, thus leaving the patient capable of detecting errors in their own speech, a characteristic of conduction aphasia. Once an error is detected, however, the correction signal will not be accurately translated to the internal model of the vocal tract due to disruption of Spt. The ability to detect but not accurately correct speech errors should result in repeated unsuccessful self-correction attempts, again a characteristic of conduction aphasia.
An examination of current discussions of speech perception in the literature gives the impression that forward prediction in speech perception is virtually axiomatic. It is demonstrably true that knowing what to listen for enhances our ability to perceive speech. What is less clear is the source of these predictions.
The idea that forward prediction plays a critical role in motor control, including speech, is well-established. Recently, several groups have suggested that forward prediction from the motor system may facilitate speech perception (Friston, Daunizeau, Kilner, & Kiebel, 2010; Hickok, Houde, et al., 2011; Rauschecker & Scott, 2009; Sams, Mottonen, & Sihvonen, 2005; Skipper, Nusbaum, & Small, 2005; van Wassenhove, Grant, & Poeppel, 2005; Wilson & Iacoboni, 2006). The logic behind this idea is the following: If the motor system can generate predictions for the sensory consequences of one’s own speech actions, then maybe this system can be used to make predictions about the sensory consequences of others’ speech and thereby facilitate perception. Research using transcranial magnetic stimulation (TMS) has been argued to provide evidence for this view by showing that stimulation of motor speech regions can modulate perception of speech under some circumstances (D'Ausilio, et al., 2009; Meister, Wilson, Deblieck, Wu, & Iacoboni, 2007; Mottonen & Watkins, 2009). There are both conceptual and empirical problems with this idea, however. Conceptually, the goal of forward prediction in the context of motor control is to detect deviation from prediction, i.e., prediction error, which can then be used to correct movement. If the prediction is correct, i.e., the movement has hit its intended sensory target and no correction is needed, the system can ignore the sensory input. This is not the kind of system one would want to engineer for enhancing perception. One would want a system designed to enhance the perception of predicted elements, not ignore the occurrence of predicted elements. Empirically, motor prediction in general tends to lead to a decrease in the perceptual response and indeed a decrease in perceptual sensitivity. 
The classic example is saccadic suppression: Self-generated eye movements do not result in the percept of visual motion even though a visual image is sweeping across the retina. The mechanism underlying this fact appears to be motor-induced neural suppression of motion signals (Bremmer, Kubischik, Hoffmann, & Krekelberg, 2009). A similar motor-induced suppression effect has been reported during self-generated speech (Aliu, et al., 2009; Ventura, Nagarajan, & Houde, 2009).
But even if there are general concerns about the feasibility of a motor-based prediction enhancing speech perception, one could point to the TMS literature as empirical evidence in favor of a facilitatory effect of motor prediction on speech perception. There are problems with this literature, however. One problem is that recent lesion-based work contradicts the TMS-based findings. Damage to the motor speech system does not cause corresponding deficits in speech perception as one would expect if motor prediction were critically important (Hickok, Costanzo, et al., 2011; Rogalsky, et al., 2011). A second problem is that the TMS studies used measures that are susceptible to response bias. It is therefore unclear whether TMS is modulating perception, as is typically assumed, or just modulating response bias. Raising concern in this respect is a recent study that used a motor fatigue paradigm to modulate the motor speech system and then examined the effects on speech perception. Using signal detection methods, this study reports that the motor manipulation affected response bias but not perceptual discrimination (d′) (Sato, et al., 2011). A third problem is that all of the TMS studies use tasks that require participants to consciously attend to phonemic information. Such tasks may draw on motor resources (e.g., phonological working memory) that are not normally engaged in speech perception during auditory comprehension (Hickok & Poeppel, 2000, 2004, 2007).
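The distinction between perceptual discrimination and response bias can be made concrete with the standard equal-variance signal detection formulas, d′ = z(H) − z(F) and criterion c = −[z(H) + z(F)]/2, where H is the hit rate and F the false-alarm rate. The sketch below uses invented rates chosen for illustration; they are not data from Sato et al. (2011) or any other cited study, and the function name `sdt_measures` is hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def sdt_measures(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) under the equal-variance Gaussian model."""
    z_hit = _STD_NORMAL.inv_cdf(hit_rate)
    z_fa = _STD_NORMAL.inv_cdf(false_alarm_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion


if __name__ == "__main__":
    # Invented rates: the second condition answers "yes" more often overall
    # (more hits and more false alarms) without any change in sensitivity.
    baseline = (_STD_NORMAL.cdf(1.0), _STD_NORMAL.cdf(-1.0))
    shifted = (_STD_NORMAL.cdf(1.5), _STD_NORMAL.cdf(-0.5))
    for label, (h, f) in (("baseline", baseline), ("shifted", shifted)):
        d, c = sdt_measures(h, f)
        print(f"{label}: H={h:.3f} F={f:.3f} d'={d:.2f} c={c:.2f}")
```

A measure such as percent correct or raw hit rate would differ between these two hypothetical conditions even though d′ is the same in both, which is why measures that do not separate the two quantities leave the interpretation of the TMS effects open.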
While sensory prediction coming out of the motor system (dorsal stream) is getting a lot of attention currently, there is another source of prediction: namely the ventral stream. Ventral stream prediction has a long history in perceptual processing and has been studied under several terminological guises such as priming, context effects, and top-down expectation. Behaviorally, increasing the predictability of a sensory event via context (G. A. Miller, Heise, & Lichten, 1951) or priming (Ellis, 1982; Jackson & Morton, 1984) correspondingly increases the detectability of the predicted stimulus, unlike the behavioral effects of motor-based prediction. A straightforward view of these observations is that forward prediction is useful for perceptual recognition only when it is coming out of the ventral stream.
One possible exception to this conclusion comes from research on audiovisual speech integration. Visual speech has a significant effect on speech recognition (Dodd, 1977; Sumby & Pollack, 1954), so much so that mismatched visual speech information can alter auditory speech percepts under some conditions (McGurk & MacDonald, 1976). Because visual speech provides information about the articulatory gestures generating speech sounds, the influence of visual speech has been linked to motor-speech function (Liberman & Mattingly, 1985) and more recently to motor-generated forward prediction (Sams, et al., 2005; Skipper, et al., 2005). Is this evidence that motor prediction can actually facilitate auditory speech recognition? Probably not. Recent work has suggested that auditory-related cortical areas in the superior temporal lobe, but not motor areas, exhibit the neurophysiological signatures of multisensory integration, namely an additive or supra-additive response to audiovisual speech (AV > A or V alone) and/or greater activity when audiovisual information is synchronized compared to when it is not (Calvert, Campbell, & Brammer, 2000; L. M. Miller & D'Esposito, 2005). Further, behavioral responses to audiovisual speech stimuli have been linked to modulations of activity in these superior temporal lobe regions (Beauchamp, Nath, & Pasalar, 2010; Nath & Beauchamp, 2011, 2012; Nath, Fava, & Beauchamp, 2011). Finally, there is evidence that motor-speech experience is not necessary for audiovisual integration (Burnham & Dodd, 2004; Rosenblum, Schmuckler, & Johnson, 1997; Siva, Stevens, Kuhl, & Meltzoff, 1995). The weight of the evidence suggests that the influence of visual speech information is not motor speech-mediated but rather is a function of cross-sensory integration in the ventral stream.
The dual stream model has proven to be a useful framework for understanding aspects of the neural organization of language (Hickok & Poeppel, 2000, 2004, 2007) and much progress has been made in understanding the neural architecture and computations of both the ventral and dorsal streams. I suggested here that it might also be fruitful to consider the notion of forward prediction in the context of a dual stream model; specifically that the dorsal and ventral streams represent two sources of forward prediction. Based on the evidence reviewed and consistent with their primary computational roles, I conclude that dorsal stream forward prediction primarily serves motor control functions and does not facilitate recognition of others’ speech, whereas ventral stream forward prediction functions to enhance speech recognition.
1. The dorsal speech stream primarily supports
2. The ventral speech stream primarily supports
3. Conduction aphasia results from
4. The ability to recognize speech is supported by
5. Predictive coding or forward prediction
Answer key: 1: A, 2: B, 3: D, 4: D, 5: E