The primary aim of this study was to identify the direction of influences exerted among left temporal and frontal areas activated by listening to intelligible speech. We focused on three key regions that were more responsive to hearing intelligible speech: pSTS, aSTS, and the anterior portion of the IFG, which we refer to as POrb. Having identified these areas, we then used a DCM analysis to exhaustively test all mathematically possible models of connectivity among these regions. We found that the optimal model exhibited a “forward” architecture, in the sense that auditory inputs entered the system exclusively via the most posterior area in the model, pSTS, and our experimental manipulation (i.e., intelligible vs time-reversed speech) modulated only the forward connections from the pSTS to both the aSTS and the IFG.
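To give a sense of the scale of such an exhaustive comparison, the toy enumeration below counts every on/off pattern of directed connections among three regions, crossed with a set of candidate input sites. This is a sketch only: the region names are taken from this study, but the model space shown is generic and is not the exact set of models tested here.

```python
from itertools import product

# Three regions of interest from the study.
regions = ["pSTS", "aSTS", "IFG"]

# All directed connections between distinct regions (6 in total).
connections = [(src, dst) for src in regions for dst in regions if src != dst]

# Candidate input configurations (illustrative list, not the study's exact one).
input_sites = [("pSTS",), ("aSTS",), ("pSTS", "aSTS"), ("IFG",)]

# Every subset of the 6 directed connections, crossed with each input choice.
model_space = [(mask, inp)
               for mask in product([0, 1], repeat=len(connections))
               for inp in input_sites]
print(len(model_space))  # 2**6 * 4 = 256
```

Even with only three regions, the model space grows quickly, which is why a principled model-selection procedure is needed rather than fitting a single hand-picked architecture.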
The areas that were activated by the intelligible speech compared with the time-reversed stimuli were strongly left lateralized and were in “higher-order” or “multimodal” sensory cortex within the STS; that is, in brain regions activated by meaningful stimuli of both a linguistic and nonlinguistic nature [e.g., faces, biological motion (Puce and Perrett, 2003
)]. This region is considered “heteromodal” in terms of language, because it is involved in processing both written and spoken inputs (Marinkovic et al., 2003
; Spitsyna et al., 2006
). Clearly this structure is important for processing meaning, and, although several current models stress its role in language processing linking posterior and anterior regions of the dominant temporal lobe (Scott et al., 2000
; Spitsyna et al., 2006
; Obleser et al., 2007
), our results demonstrate for the first time that the processing stream for intelligible speech runs from posterior to anterior STS. This conclusion is drawn from the results of our model comparisons. Because each model of interregional connections was crossed with three possible ways that the auditory stimuli (both meaningful and time-reversed stimuli) could enter the network, we were able to test where speech sounds are most likely to enter the system. As indicated by a group Bayes factor of 3.3 × 105
, there was strong evidence (across the group) in favor of the pSTS as opposed to the aSTS alone, both pSTS and aSTS, or the IFG alone.
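For readers unfamiliar with this statistic: under a fixed-effects assumption, the group Bayes factor comparing two models is the product of the subject-level Bayes factors, i.e., the exponential of the summed differences in log model evidence. The sketch below uses hypothetical log evidences for three subjects (illustrative values only, not those from this study):

```python
import math

# Hypothetical subject-level log model evidences (illustrative values only).
log_ev_input_pSTS = [-120.0, -118.5, -119.2]  # model: inputs enter via pSTS
log_ev_input_aSTS = [-124.1, -122.0, -123.3]  # model: inputs enter via aSTS

# Fixed-effects group Bayes factor: exp of the summed log-evidence differences.
log_gbf = sum(p - a for p, a in zip(log_ev_input_pSTS, log_ev_input_aSTS))
group_bayes_factor = math.exp(log_gbf)
print(group_bayes_factor)  # ~1.2e5 with these illustrative numbers
```

Because evidences are multiplied across subjects, even modest per-subject differences can accumulate into a very large group Bayes factor, which is why values such as 3.3 × 10⁵ are interpreted as strong group-level evidence.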
An important question is whether the architecture of the model identified as optimal enables one to infer the type of processing that occurs in the model system. For example, a dominant theme in many cognitive studies and in previous DCM analyses (Mechelli et al., 2005
; Noppeney et al., 2006
) is whether the identified architecture supports “bottom-up” (i.e., stimulus-driven) or “top-down” (i.e., driven by stimulus-unrelated variables such as task demands, cognitive set, etc.) processes. In the present study, the task demands were kept constant throughout the experiment (i.e., the subjects were asked to identify the speakers’ gender), whereas the stimulus properties varied. Thus, the strong influence that speech intelligibility had on the pSTS→aSTS and pSTS→IFG connections (corresponding to increases of 76% and 150% in connection strength, respectively) represents a bottom-up, feedforward process.
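The percentage figures quoted here follow from how DCM parameterizes modulatory effects. In the standard bilinear state equation (a textbook formulation, not notation specific to this study), an experimental input $u_j$ (here, intelligibility) adds a modulatory matrix $B^{(j)}$ to the fixed connectivity $A$, and the percentage change in the connection from region $k$ to region $i$ can be read as the ratio of the modulatory to the fixed parameter:

```latex
\dot{z} = \Bigl(A + \sum_j u_j B^{(j)}\Bigr) z + C u,
\qquad
\Delta_{ik} = 100 \times \frac{B^{(j)}_{ik}}{A_{ik}} \,\%
```

On this reading, a 150% modulation means that the intelligibility input more than doubles the effective strength of the pSTS→IFG connection relative to its baseline value.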
Concerning the general structure of our models, it should be noted that, in models of effective connectivity such as DCM, the presence of a connection, or direct input, does not necessarily imply the existence of a direct (i.e., monosynaptic) anatomical connection; instead, a connection represents a causal influence, which can be mediated vicariously via intermediate regions not included in the model (e.g., local interconnected subregions). Both the pSTS and aSTS are connected to primary auditory cortex (Heschl’s gyrus or A1) via polysynaptic pathways that spread out from A1 to secondary auditory cortex. In the primate literature, this progression is from the core auditory regions (A1) to surrounding lateral “belt” and thence to lateral “parabelt” regions in STG (Kaas and Hackett, 2000
). This serial processing model has been recapitulated recently in man using a DCM analysis of fMRI data collected while subjects listened to complex (nonspeech) sounds in which the spectral envelope was altered. The authors found that the optimum model for their data was serial, or hierarchical, linking A1 to STS via the STG (Kumar et al., 2007).
The anatomical evidence for the structural pathways linking pSTS to aSTS comes primarily from studies in the 1970s and 1980s using anterograde and retrograde tracer techniques in rhesus monkeys and from electrophysiological studies in macaques. These studies demonstrate that the upper bank of the STS receives inputs from multiple sensory cortical regions (visual, auditory, and somatosensory inputs: area TPO), whereas the lower bank is unimodal, receiving inputs from visual cortex only (area TEa) (Seltzer and Pandya, 1978
; Baylis et al., 1987
). The upper bank of rhesus STS has been subdivided into four regions of increasing cellular and laminar differentiation from anterior to posterior. These areas are reciprocally connected with both forward and backward connections; the forward connections usually originate in supragranular layers of cortex and terminate in layer IV of the more anterior region. The longer, intermediate-range fibers connecting posterior and anterior regions of the upper bank of the STS run in the middle longitudinal fasciculus (Schmahmann and Pandya, 2006
). There are at least two possible routes by which the human temporal lobe may be connected with the IFG, via either the uncinate fasciculus or a separate projection that passes through the extreme capsule (Seltzer and Pandya, 1989
; Catani et al., 2005
; Anwander et al., 2007).
Our finding that intelligible speech preferentially activates the pSTS, and that this drives additional processing in the aSTS, partially supports Hickok and Poeppel’s model of the ventral processing stream, although their model does not include a direct connection between pSTS and aSTS; rather, these areas influence each other through an intermediate region in inferolateral temporal cortex (Hickok and Poeppel, 2007
), not identified in our study. In contrast to our result, Scott et al. (2000)
found that the aSTS was more responsive than the pSTS to intelligible speech, with the pSTS being more involved in the processing of complex acoustic sounds. Although time-reversed speech retains phonetic elements, these stimuli clearly contain fewer native phonemes than the non-time-reversed stimuli; thus, the pSTS region may be supporting both phonemic and semantic processing. Hickok and Poeppel place regions mediating phonological processing abutting those concerned with lexico-semantics at the junction of the pSTS/MTG, and there is evidence from other studies that parts of Wernicke’s area, particularly in and around the pSTS, may have dissociable functions (Wise et al., 2001
). Our results are consistent with cognitive models predicting that complex auditory processing (in pSTS) precedes the recognition of intelligible speech (in aSTS).
The frontal region activated by intelligible speech in our study was in the most anterior and ventral part of the IFG, the POrb, equivalent to Brodmann area 47 (BA47). The POrb is distinct from Broca’s area, which lies immediately posterior and takes up the rest of the IFG (BA 45 and 44, or pars triangularis and pars opercularis, respectively) (Standring, 2004
). The cytoarchitectonics of these regions shows that the POrb is more granular than the subdivisions in Broca’s area (Bailey and von Bonin, 1951
) and has different projections from the mediodorsal nucleus of the thalamus (Fuster, 1997
). The POrb is associated with semantic processing of both auditory and visually presented words (Poldrack et al., 1999
; Vigneau et al., 2006
), whereas more dorsal and posterior regions in Broca’s area are involved in speech production. A recent study found that activation within left POrb, prompted by a semantic priming task, interacted with working memory demands such that there was more semantic task-related activity when the working memory component was low (Sabb et al., 2007
). Our result is consistent with this because POrb activation was probably implicitly driven by the high semantic content of our intelligible speech stimuli compared with their time-reversed counterparts.
In summary, using a DCM analysis, we report for the first time how directed connection strengths among key regions within the dominant temporofrontal cortex are modulated by intelligible speech. A forward model emerges, in which auditory speech inputs drive activity in the pSTS; in turn, this activity influences activity in the more anterior areas, the aSTS and POrb, and the degree of this influence changes depending on which speech stimuli are being processed (intelligible vs unintelligible speech). This type of analysis, which combines dynamic system models and Bayesian model selection, is likely to prove fruitful in additional studies on normal subjects. For example, it would be of interest to perform similar analyses to manipulate task demands or expectations to investigate top-down influences on semantic processing. Also, studies of patients with auditory speech processing disorders acquired as a result of stroke could investigate how functional connectivity is reorganized after damage.