The overall pattern of results suggests that intracortical white matter pathways, as well as pathways running between cortical areas and the brainstem, support the behaviors tested by the higher-order auditory tasks used in this study. Two results are particularly salient. The first is that all tasks correlated with FA of fibers within the centrum semiovale. However, FA in this region and task performance were positively correlated for one task (TC40) and negatively correlated for the remaining three (TC60, SCAN-C, BKBSIN). Second, the remaining white matter regions that correlated with task performance were largely independent between tasks. This directly addresses an ongoing controversy concerning whether tests for APD measure independent or dependent constructs (Domitz & Schow, 2000; McFarland & Cacace, 2002). The DTI data support the idea that the neurological bases for each of these tasks are at least partially independent.
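Since every result here is expressed in terms of FA, it may help to recall how FA is derived. Below is a minimal illustrative sketch of the standard formula, computed from the three eigenvalues of the fitted diffusion tensor; it is not the processing pipeline used in this study, and the example eigenvalues are hypothetical:

```python
import math

def fractional_anisotropy(evals):
    """Compute FA from the three eigenvalues of a diffusion tensor.

    FA ranges from 0 (fully isotropic diffusion, e.g. CSF) to 1
    (diffusion confined to a single axis, as in a coherent fiber bundle).
    """
    l1, l2, l3 = evals
    num = math.sqrt((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l1 - l3) ** 2)
    den = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    return math.sqrt(0.5) * num / den if den else 0.0

# Isotropic diffusion: FA = 0
print(fractional_anisotropy((1.0, 1.0, 1.0)))        # 0.0
# Strongly directional diffusion (illustrative values in mm^2/s): FA near 0.87
print(round(fractional_anisotropy((1.7e-3, 0.2e-3, 0.2e-3)), 2))  # 0.87
```

Higher FA in a white matter region is commonly interpreted as greater coherence or integrity of the fiber tracts passing through it, which is the sense in which FA is used throughout this discussion.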
Each of the tasks we studied requires effortful processing of linguistic information. What distinguishes these tasks is the manner in which processing was made effortful. For each test, the child's task was to repeat the input. However, the input varied in terms of whether sentences or words were presented (altering memory load and semantic support) and in terms of how the speech input was altered (added noise, filtering, or time compression). The results can be considered in light of these sources of task variation.
Correlations of FA with task performance were found in the dorsal prefrontal cortex (for the BKBSIN task, the centroid was nearest the middle frontal gyrus, BA 9) as well as the right ventral prefrontal cortex (for the low-pass filtered words and the 60% time-compressed sentences; the centroids of the regions were nearest the anterior cingulate, BA 32, and the medial frontal gyrus, BA 10, respectively). The dorsolateral aspect of the prefrontal cortex is involved in working memory (see Owen, McMillan, Laird, & Bullmore, 2005, for a review), and such top-down processes are known to support retrieval of information from degraded speech (Hannemann, Obleser, & Eulitz, 2007) by manipulating the degraded stimuli within short-term memory and then reconstructing them within the context of semantic predictability (Obleser, Wise, Alex Dresner, & Scott, 2007). Therefore, it is not unexpected to see dorsally located prefrontal white matter correlate with our speech-in-noise task (BKBSIN), which shares these task characteristics. An alternative explanation is that the prefrontal cortex is involved in directing attention to relevant auditory features, including monitoring and selection (Lebedev, Messinger, Kralik, & Wise, 2004). Differences in the regions of prefrontal white matter identified for each task may reflect differences in how attentional resources must be marshaled under different listening conditions. The right ventral prefrontal cortex, where FA correlated with performance on the low-pass filtered words (SCAN-C) and the 60% time-compressed sentences (TC60), may be involved in lexical selection among multiple word candidates (Kan & Thompson-Schill, 2004b). When individual low-pass filtered words were presented in the SCAN-C, it is possible that children had to select the actual word from their lexicon as opposed to similar-sounding lexical items. Selection within a set of closely related semantic items appears more closely associated with left than with right prefrontal activation (Kan & Thompson-Schill, 2004a); however, for the SCAN-C, selection occurs not from closely related semantic items but from phonetically similar items. The task may involve a conflict arising at the response level, necessitating inhibition of semantic associations. Conflicts at the response level and inhibition of targets have been found to recruit the right prefrontal cortex (Konishi et al., 1999; Milham et al., 2001).
The SCAN-C was also the only auditory processing task that displayed correlations with interhemispheric structures (the corpus callosum). In this case the subjects are likely using spectral information to help with lexical decision. As spectral information has been thought to be preferentially processed in the right hemisphere, interhemispheric transfer would play a key role in performance on this task. It should be pointed out, however, that the framework of the left hemisphere being dominant for processing fast temporal information and the right hemisphere being dominant for processing spectral information (Dogil et al., 2002; Obleser et al., 2008; Poeppel, 2003; Schonwiesner, Rubsamen, & von Cramon, 2005; Zatorre, Belin, & Penhune, 2002) has recently been challenged. A new hypothesis of multi-time-resolution processing has been proposed (Hickok & Poeppel, 2007) in which the hemispheres differ in terms of selectivity, with the right hemisphere dominant for integrating information over longer timescales and the left hemisphere less selective across different integration timescales (Boemio, Fromm, Braun, & Poeppel, 2005). Our finding of increased FA in interhemispheric regions such as the corpus callosum would also be consistent with this framework, in which processing of suprasegmental information is carried out over longer intervals (~300 ms) and in the right hemisphere.
Positive correlations between FA and white matter adjoining occipital or occipital-temporal areas were found for the SCAN-C filtered words and the time-compressed sentences at the lowest degree of compression. While the centroids of these regions were found by the Talairach Daemon to be closer to the posterior cingulate or precuneus (BA 31), visual inspection of the coverage of these regions together with the Talairach atlas revealed that these white matter regions were adjacent to Brodmann's area 39 (middle temporal gyrus/middle occipital gyrus). This region is held in the model of Hickok and Poeppel (2007) to be the more posterior region of the ventral stream, corresponding to the lexical interface linking phonological and semantic information. Our interpretation for the time-compressed sentences is that better conversion from phonological to semantic information, as reflected in higher FA values adjoining BA 39, results in better encoding and therefore better recall from working memory when subjects are tested. For the SCAN-C filtered words, working memory would not be expected to be a major factor affecting task performance. However, a more “automatic” conversion using the ventral stream would be expected to facilitate recognition of the words.
A very interesting finding is the set of correlations between FA in the centrum semiovale and task performance. For a few of these areas, the Talairach Daemon determined the nearest gray matter region to be the cingulate gyrus rather than the pre- or post-central gyrus. However, visual inspection of the Talairach atlas (Talairach & Tournoux, 1988) revealed the centroids of these regions to be very close anatomically (≤ 5 mm) to the position of the corticospinal tract as shown in the atlas. Thus, it is likely these regions reflect connections with motor regions. While these fibers (in the centrum semiovale) do not connect directly to the auditory cortex, connections with motor regions are possibly related to descending control (efferent pathways) over the auditory system. Bidirectional connectivity is present between premotor areas and the auditory cortex, as shown in a tracer study (Romanski et al., 1999). The auditory cortex in turn projects to the superior olivary complex and inferior colliculus (Peterson & Schofield, 2007), and these corticofugal pathways reach into the inner ear via efferents in the medial olivocochlear bundle (Guinan, 2006). This explanation is consistent with the well-known phenomenon of suppression of otoacoustic emissions at the level of the cochlea in the presence of contralateral noise (Timpe-Syverson & Decker, 1999), as well as the correlation between these emissions and performance on speech-in-noise tasks (de Boer & Thornton, 2007; Kim et al., 2006). However, further research will be necessary to confirm a relationship between connections with motor regions and top-down efferent processing.
Notably, these correlations were negative for three of the four tasks, and these three tasks were more difficult than the one task that showed a positive correlation. Therefore, even if our hypothesis of top-down efferent processing is correct, descending control of the cochlea alone, or modulation of ascending pathways from the superior olivary complex and inferior colliculus (Peterson & Schofield, 2007), may not be the predominant predictor of performance once a task has crossed a threshold of difficulty. Instead, other aspects of processing may become more prominent at high difficulty levels.
A recently proposed framework for speech processing (Hickok & Poeppel, 2007) may offer further insight. This framework places the premotor cortex in the “dorsal stream” used for auditory-motor mapping, a placement supported by functional imaging studies showing activation in the premotor cortex during speech perception (e.g., Wilson, Saygin, Sereno, & Iacoboni, 2004). Indeed, some theories of speech perception hold that speech recognition is facilitated by articulatory representations in the dorsal pathway (Kluender & Lotto, 1999; Liberman & Whalen, 2000). Several studies support this interpretation. The precentral gyrus was observed to be active during speech perception (Skipper, Nusbaum, & Small, 2005), although at a lower level during auditory-only presentation as compared with audio-visual presentation. Repetitive transcranial magnetic stimulation (rTMS) studies have shown disruptions in speech perception when the left premotor cortex was stimulated prior to presentation of speech (Meister, Wilson, Deblieck, Wu, & Iacoboni, 2007); interestingly, however, phoneme perception was improved when the portion of the motor cortex representing the articulator producing a particular sound was stimulated just before the auditory presentation (D’Ausilio et al., 2009). Greater activation in the motor cortex has been observed for non-native speech sounds as compared to native speech sounds (Wilson & Iacoboni, 2006). A causally delayed relationship was found between the left posterior supramarginal gyrus and the premotor cortex (Londei et al., 2007) for listening to words and (pronounceable) non-words, although not for reversed words, again supporting the hypothesis (Warren, Wise, & Warren, 2005) that the presentation of “do-able” sounds causes the interfacing of auditory-sensory representations with the articulatory-motor component.
It has further been hypothesized that stored auditory vocal templates, based on experience of one's own vocal output, are used in a template-matching process for interpreting auditory input, which enables recognition of speech sounds despite variations due to the characteristics of the particular speaker (Kuhl, 2004). Such a template-matching process would need bidirectional transfer between auditory and motor regions, facilitating top-down in addition to bottom-up processes, a hypothesis supported by recent functional imaging studies (e.g., Kohler et al., 2002; Wilson et al., 2004), as well as a tracer study in primates (Romanski et al., 1999). The correlations between FA values in corticospinal pathways and performance on the auditory recognition tasks used in this study are consistent with this model, as cortico-cortical connections between premotor and auditory areas are utilized.
However, in the presence of significantly degraded speech, whether by loss of frequency content (such as the SCAN-C filtered words), extreme temporal speed (such as the time-compressed sentences at 60% compression), or the presence of noise (such as the speech presented over four-talker babble), the template-matching process may fail because the degraded speech is too far from any target in the auditory-motor space. Humans cannot generate speech missing its high-frequency content, talk as fast as the speech presented in the time-compressed sentences, or generate more than one voice at a time. We hypothesize that the auditory-motor template-matching process is useful when children start to learn to speak and understand language, or perhaps when individuals learn a new language with unfamiliar phonemes. Over time, however, we hypothesize that children begin to use the more ventral processing stream for comprehension and speech recognition (Hickok & Poeppel, 2007). Children thus “directly” generate semantic information from the incoming auditory stream and rely less on the auditory-motor dorsal stream for speech perception. As the mean age of children included in our study (~10 years) falls within a period of active gray matter pruning (Giedd et al., 1999), connections to the auditory-motor dorsal stream may become less relevant and therefore weaker, reflected in the lower FA values seen.
A positive correlation between FA in the centrum semiovale and task performance was seen, however, for the time-compressed sentences with the lowest degree of compression. The range of subject scores was very limited, with even the worst-performing subjects still able to recall 57 out of the 60 words. Our interpretation is that the cognitive aspects of this task are heavily loaded toward short-term memory, as interpreting the speech is not that difficult at this level of compression. Therefore, the positive correlation with performance on this task reflects either auditory efferent control or subvocal articulatory representations carried by the dorsal pathway in order to maintain specific words in memory over a short period of time.
Our study has several limitations. The study sample is biased towards males. As differences in FA values have been shown between boys and girls (Schmithorst, Holland, & Dardzinski, 2008) and sex-related differences have been shown in the relation of FA values to cognitive function (Schmithorst, 2009), the results cannot necessarily be taken as representative for girls. Additionally, our study sample is biased towards the higher end in terms of overall intelligence and cognitive function, and so caution must be used in generalizing to a larger population. The study is also limited by the small sample size (N = 17) and the restricted range of scores, especially for the time-compressed sentences with 40% compression. Keith (2000) showed increased performance with age for the SCAN-C filtered words, as well as for auditory figure ground (another subtest of the SCAN-C test, which measures the ability to comprehend speech in noise). Our results for the SCAN-C filtered words and the BKB-SIN speech-in-noise test show correlations of magnitude around R = 0.4; these results are not significant due to the small sample size.
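That a correlation of r ≈ 0.4 fails to reach significance at N = 17 follows directly from the standard t test for a correlation coefficient. A quick illustrative check (the two-tailed critical value is taken from a standard t table; this is a sketch, not the analysis code used in the study):

```python
import math

def t_from_r(r, n):
    """t statistic for testing H0: rho = 0 given sample correlation r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = t_from_r(0.4, 17)
# Two-tailed critical value of Student's t with 15 df at alpha = .05
# is approximately 2.131 (standard t table).
print(round(t, 2))  # 1.69 < 2.131, so r = 0.4 is not significant with N = 17
```

By the same formula, a correlation of this magnitude would cross the .05 threshold only at a somewhat larger sample size (roughly N ≥ 25), which is consistent with the interpretation that the null results reflect limited power rather than the absence of an effect.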
It is also important to note that the estimates of partial correlation coefficients were obtained post hoc. As shown by Kriegeskorte, Simmons, Bellgowan, and Baker (2009), this can result in a biased estimator and overestimation of effect sizes. Nevertheless, the partial correlations are indicative of a very robust effect, as it was detected even with our small sample size of 17 subjects. Additionally, the ROIs were determined from the filtered datasets, while the effect sizes were estimated using the unfiltered datasets, likely minimizing the effects of any existing bias.
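For reference, a first-order partial correlation of the kind estimated here can be computed from the pairwise correlations. A minimal sketch using the standard formula, controlling for a single covariate (the variable names are illustrative, not those of the actual analysis):

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

The key property, and the reason partial correlations are used when relating FA to task performance, is that a raw correlation driven entirely by a shared covariate (e.g., age) shrinks toward zero once that covariate is partialled out.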
In conclusion, the relation between brain anatomical connectivity and performance on higher-order auditory processing tasks was investigated in a cohort of normal-hearing children ages 9-11. Positive correlations between FA and task performance were seen in white matter adjoining prefrontal regions for the auditory processing tasks of speech in noise, time-compressed sentences at high (60%) compression, and low-pass filtered words. For those three tasks, negative correlations between performance and FA were found in the centrum semiovale. For the low-pass filtered words, positive correlations with FA were also found in the corpus callosum. For time-compressed sentences at low (40%) compression, positive correlations between FA and performance were found in the centrum semiovale. Our results support a dual-stream (dorsal and ventral) model of auditory processing and suggest that higher-order processing tasks rely less on the dorsal stream related to articulatory networks and more on the ventral stream related to semantic comprehension.