One of the most difficult category learning problems for humans is learning nonnative speech categories. While feedback-based category training can enhance speech learning, the mechanisms underlying these benefits are unclear. In this functional magnetic resonance imaging study, we investigated neural and computational mechanisms underlying feedback-dependent speech category learning in adults. Positive feedback activated a large corticostriatal network including the dorsolateral prefrontal cortex, inferior parietal lobule, middle temporal gyrus, caudate, putamen, and the ventral striatum. Successful learning was contingent upon the activity of domain-general category learning systems: the fast-learning reflective system, involving the dorsolateral prefrontal cortex that develops and tests explicit rules based on the feedback content, and the slow-learning reflexive system, involving the putamen in which the stimuli are implicitly associated with category responses based on the reward value in feedback. Computational modeling of response strategies revealed significant use of reflective strategies early in training and greater use of reflexive strategies later in training. Reflexive strategy use was associated with increased activation in the putamen. Our results demonstrate a critical role for the reflexive corticostriatal learning system as a function of response strategy and proficiency during speech category learning.
Keywords: category learning, fMRI, corticostriatal systems, speech, putamen
What neural mechanisms underlie language acquisition in adulthood? Learning the speech sounds of a new language has been argued to be a difficult category learning problem for adults. For instance, native Japanese speakers find it difficult to learn to categorize English /r/ versus /l/ sounds (Iverson et al. 2003). This difficulty is likely due to the high variability and multidimensional nature of speech categories (Hillenbrand et al. 1995; Jongman et al. 2000; Vallabha et al. 2007; Holt and Lotto 2008, 2010). Adequate feedback can significantly enhance speech category learning in adults (McCandliss et al. 2002; McClelland and Patterson 2002; Norris et al. 2003; Goudbeek et al. 2008), and trial-by-trial feedback is therefore ubiquitous in speech training paradigms. However, little is known about the neural mechanisms underlying feedback-based error reduction in speech learning (Holt and Lotto 2008, 2010). Understanding these mechanisms is critical because subtle variations in feedback characteristics can significantly modulate speech learning rates (Chandrasekaran et al. 2014b). Such an understanding would also contribute to our general knowledge of the neural mechanisms involved in learning a second language.
Outside the speech domain, previous research examining visual category learning has identified at least two partially dissociable neural systems that process feedback: a reflective system, wherein processing is under conscious control, and a reflexive system that is not under conscious control (Ashby and Alfonso-Reese 1998; Poldrack and Packard 2003; Ashby and Ennis 2006; Nomura et al. 2007; Seger and Miller 2010). The reflective system, also referred to in the literature as the rule-based learning system, uses working memory and executive attention to develop and test verbalizable rules based on feedback (Maddox and Ashby 2004). It relies on an executive corticostriatal loop that primarily involves the dorsolateral prefrontal cortex (DLPFC), the head of the caudate nucleus, the anterior cingulate cortex, and the hippocampus. These brain regions contribute to the generation, selection, and maintenance of verbalizable rules. In contrast, the reflexive learning system, also referred to as the procedural-based learning system, is not consciously penetrable, is nonverbalizable, and operates by associating perception with actions that lead to immediate reward (Maddox and Chandrasekaran 2014; Chandrasekaran et al. 2014a; Maddox et al. 2014). During reflexive learning, a single medium-spiny neuron in the striatum implicitly associates an abstract motoric response with a group of sensory cells. Learning occurs within cortical–striatal synapses, wherein plasticity is facilitated by a reinforcement signal from the ventral striatum (Ashby and Ennis 2006; Seger 2008). A recent study examining visual category learning showed that the putamen is critical for reflexive learning (Waldschmidt and Ashby 2011). Animal research has shown that both the reflective and reflexive circuitries receive direct input from several auditory regions (Reale and Imig 1983; Yeterian and Pandya 1998). While the role of the reflective auditory loop has been extensively studied (Romanski et al. 1999; Rauschecker and Scott 2009), much less is known about the role of the reflexive learning system in speech processing.
In the current study, we examined the hypothesis that optimal speech category learning is mediated by the neural circuitry underlying the reflexive learning system. We hypothesized that reflective learning of speech categories is difficult due to the multidimensional nature and high variability of speech categories. In addition, the dimensions underlying speech categories are integral and often difficult to verbalize (Lisker 1986; Hillenbrand et al. 1995; Jongman et al. 2000; Vallabha et al. 2007; Holt and Lotto 2008, 2010). By definition, it is difficult to attend selectively to the individual dimensions of stimuli composed of integral dimensions (Shepard 1964; Garner 1974; Ashby 1992a). Indeed, when the mode of stimulus presentation and the nature of the trial-by-trial feedback were manipulated in a recent behavioral study of speech learning (Chandrasekaran et al. 2014b), learning was enhanced under conditions previously shown to augment reflexive learning in the visual domain (Maddox et al. 2003, 2008). Computational modeling of behavioral data collected in a similar learning paradigm revealed that optimal speech category learning is associated with initial use of reflective strategies followed by a transition to reflexive strategies (Maddox and Chandrasekaran 2014).
Despite this growing body of evidence suggesting that speech category learning is reflexive, there is currently no neural evidence on the relative roles of the two learning systems in speech categorization. To this end, we employ a combination of behavioral, neural, and computational modeling methods to evaluate the mechanisms underlying feedback-dependent speech category learning. Specifically, we predict that optimal speech category learning will be associated with increased processing in the putamen, which is hypothesized to be involved in a “motor loop” that implicitly associates stimuli with category responses within the motor cortex. We use an individual differences approach as well as computational modeling to assess the mechanistic link between learning and computations within the domain-general learning systems. Adult native speakers of English (N = 23) learned novel speech categories (Mandarin tone categories; Fig. 1) while blood oxygen level-dependent (BOLD) responses were collected. Participants made a category response to each stimulus, which resulted in positive or negative feedback. Neural activation during stimulus presentation and during feedback processing was estimated separately using an optimized rapid event-related design. Behavioral accuracies were calculated, and decision-bound models were applied at the level of individual participants to provide a window into cognitive processing and the computational strategies employed at different stages of category learning.
Native speakers of American English (age: 18–35; n = 25; 14 females) were recruited from the University of Texas at Austin community. Participants self-reported as being right-handed and passed a hearing screening examination (pure tone thresholds < 25 dB HL at 1, 2, and 4 kHz). Further, participants had no prior exposure to a tonal language, as determined by an abbreviated form of the LEAP-Q (Marian et al. 2007). Potential participants were excluded if they reported a current or past history of major psychiatric conditions, neurological disorders, hearing disorders, head trauma, or use of psychoactive drugs or psychotropic medication. Data from 2 male participants were excluded from all analyses due to file corruption or an incidental finding on the structural scan. The University of Texas at Austin IRB approved the experimental protocol.
Natural exemplars (N = 40) of the 4 Mandarin tones (high-flat, low-rising, high-falling, low-dipping) were produced in citation form by 2 native Mandarin speakers (originally from Beijing; 1 female) in the context of 5 monosyllabic Mandarin Chinese words (/bu/, /di/, /lu/, /ma/, /mi/). These syllables were chosen because they also exist in the American English inventory. The stimuli were normalized to an RMS amplitude of 70 dB and a duration of 0.4 s (Wong et al. 2009; Perrachione et al. 2011). Five independent native speakers identified the 4 tones with >95% accuracy and rated the stimuli as highly natural.
Participants performed a category learning task in the scanner while listening to the speech sounds through headphones. Visual stimuli, including the instructions and feedback, were displayed via the in-scanner projector and viewed through a mirror attached to the head coil. Participants held a 2-button response box in each hand. Prior to scanning, participants underwent a brief training procedure to familiarize themselves with the mapping of the keys to the 4 possible responses. Tone learning procedures closely followed a previous study of visual category learning in the scanner (Nomura et al. 2007). The experiment consisted of 6 contiguous scans, or “learning blocks.” Prior to each block, participants were instructed to attend to the fixation cross on the screen. On each trial, an auditory stimulus was presented for 445 ms, and participants were instructed to categorize the sound into 1 of 4 categories. They were encouraged to guess even if they did not know the answer. Following a jittered stimulus–feedback interval, corrective feedback (“RIGHT” versus “WRONG”) was displayed for 750 ms (Fig. 1). If the participant failed to respond within 2 s of stimulus onset, the response did not register and a cautionary feedback display (“TIME”) was presented. Each stimulus was presented once within each block. The presentation order of the stimuli was pseudorandomized into a sequence common to all participants but different across learning blocks.
The participants were scanned using the Siemens Magnetom Skyra 3T MRI scanner at the Imaging Research Center of the University of Texas at Austin. High-resolution whole-brain T1-weighted anatomical images were obtained via MPRAGE sequence (repetition time [TR] = 2.53 s; echo time [TE] = 3.37 ms; field of view [FOV] = 25 cm; 256 × 256 matrix; 1 × 1 mm voxels; 176 axial slices; slice thickness = 1 mm; distance factor = 0%). T2*-weighted whole-brain BOLD images were obtained using a gradient-echo multi-band EPI pulse sequence (flip angle = 60°; TR = 1.8 s; 166 repetitions; TE = 30 ms; FOV = 25 cm; 128 × 128 matrix; 2 × 2 mm voxels; 36 axial slices; slice thickness = 2 mm; distance factor = 50%) using GRAPPA with an acceleration factor of 2. To separately estimate neural responses to the stimulus from the response to the feedback, the stimulus–feedback and feedback–stimulus intervals were randomly jittered using samples from a uniform distribution (stimulus–feedback: 2–4 s; feedback–stimulus: 1–3 s; Fig. 1; Dale 1999; Liu et al. 2001; Birn et al. 2002).
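As an illustration of the jittered timing, the sketch below generates onsets for one block of 40 trials (each of the 40 stimuli presented once) with intervals drawn from the uniform distributions given above. It assumes, since the text does not specify, that the stimulus–feedback interval runs from stimulus onset to feedback onset and that the feedback–stimulus interval runs from feedback offset to the next stimulus onset; the function name `make_jittered_schedule` is hypothetical.

```python
import random


def make_jittered_schedule(n_trials, fb_dur=0.75,
                           sf_range=(2.0, 4.0), fs_range=(1.0, 3.0), seed=0):
    """Return a list of (stimulus_onset, feedback_onset) pairs in seconds.

    Assumption: the stimulus-feedback interval (SFI) is measured from
    stimulus onset to feedback onset, and the feedback-stimulus interval
    (FSI) from feedback offset to the next stimulus onset.
    """
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    t = 0.0
    schedule = []
    for _ in range(n_trials):
        stim_onset = t
        fb_onset = stim_onset + rng.uniform(*sf_range)   # jittered SFI
        schedule.append((stim_onset, fb_onset))
        t = fb_onset + fb_dur + rng.uniform(*fs_range)   # jittered FSI
    return schedule


trials = make_jittered_schedule(40)
print(f"last feedback offset: {trials[-1][1] + 0.75:.1f} s")
```

Jittering both intervals independently keeps the stimulus and feedback regressors from being perfectly collinear, which is what allows the two responses to be estimated separately.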
Each participant's response on each trial was coded as “correct” or “incorrect,” with missed trials also coded as incorrect. A mixed logit analysis was conducted using lmer (Bates et al. 2012) to estimate the log odds of producing a correct response. The fixed effect of interest was block number (1–6), mean-centered to 0 (−2.5, −1.5, −0.5, 0.5, 1.5, 2.5). The model included by-participant random slopes for block as well as random intercepts.
The model fitting approach closely followed the methodology published in Maddox and Chandrasekaran (2014) and in other applications to speech and vision (Maddox 2002; Maddox et al. 2013, 2014; Maddox and Filoteo 2011; Chandrasekaran et al. 2014a). We fit each model separately to the data from each participant on a block-by-block basis to avoid misleading interpretations arising from fits to aggregate data (Estes 1956; Ashby et al. 1994; Maddox 1999). We assumed that the 2-dimensional space (pitch height vs. pitch direction) displayed in Figure 1 accurately describes the perceptual representation of the stimuli. Previous multidimensional scaling studies suggest that these 2 dimensions explain a significant percentage of variance (Chandrasekaran et al. 2007). Based on the results of our earlier work (Maddox and Chandrasekaran 2014), we also assumed that participants applied category learning strategies separately to the male and female perceptual spaces (Fig. 1). We explored 3 classes of models: reflexive, reflective, and a random responder model. The model parameters were estimated using maximum likelihood procedures (Wickens 1982; Ashby 1992b). Model fits were compared using Akaike weights to determine the best fitting model for each participant in each block of trials (Wagenmakers and Farrell 2004). Modeling analyses were also conducted using the Bayesian Information Criterion (BIC); in every case, the results mirrored those obtained with the Akaike Information Criterion (AIC). The BIC results are provided in the Supplementary Material.
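The Akaike-weight comparison can be sketched as follows. AIC is computed from each model's maximized log-likelihood and number of free parameters, and the weights are the normalized relative likelihoods exp(−ΔAIC/2). The parameter counts are taken from the model descriptions in this section (summed over the male and female spaces); the negative log-likelihoods are hypothetical placeholders, not data from this study, and each variant of a unidimensional model would in practice enter the comparison as its own candidate.

```python
import math


def aic(neg_log_lik, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L = 2k + 2 * NLL."""
    return 2 * n_params + 2 * neg_log_lik


def akaike_weights(aics):
    """Map {model: AIC} to {model: Akaike weight}; weights sum to 1."""
    best = min(aics.values())
    rel = {m: math.exp(-(a - best) / 2) for m, a in aics.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


# (negative log-likelihood, free parameters over both spaces).
# NLL values are HYPOTHETICAL; parameter counts follow the text:
# SPC 6 per space, unidimensional 4 per space, random responder 3 per space.
fits = {
    "reflexive (SPC)": (38.0, 12),
    "unidimensional height": (45.0, 8),
    "unidimensional direction": (47.0, 8),
    "random responder": (55.0, 6),
}
weights = akaike_weights({m: aic(nll, k) for m, (nll, k) in fits.items()})
best_model = max(weights, key=weights.get)
print(best_model, round(weights[best_model], 3))
```

Because AIC penalizes each extra parameter, the 12-parameter SPC only wins when its likelihood advantage outweighs that penalty; BIC, which penalizes parameters more heavily for larger trial counts, provides the robustness check mentioned above.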
The reflexive learning system was modeled using the Striatal Pattern Classifier (SPC; Ashby and Alfonso-Reese 1998; Maddox et al. 2002; Seger and Cincotta 2005; Ashby and Ennis 2006; Nomura et al. 2007). The model reflects the many-to-one mapping from the primary and secondary auditory cortices along the superior temporal gyrus to the striatum (Yeterian and Pandya 1998), where a low-resolution map of the perceptual space is represented among different striatal units. Category learning involves associating each category label with a cluster of striatal medium-spiny neurons (Hikosaka et al. 1989; Wilson 1995; Arnauld et al. 1996; Yeterian and Pandya 1998; Ashby and Ennis 2006). We model this association by assuming that each category is represented by a striatal “unit” in the pitch height–pitch direction space. The SPC assumed 4 striatal units in the 2-dimensional pitch height–pitch direction space for the male speakers and a separate set of 4 striatal units in the corresponding space for the female speakers. The SPC contained 6 free parameters in each space: 5 that determine the location of the units and one that represents the noise associated with the placement of the striatal units. Versions of the SPC have previously been applied to an artificial auditory category learning task (Maddox et al. 2006), a vowel categorization task (Maddox et al. 2002), and Mandarin lexical tone learning (Maddox et al. 2013, 2014). It is important to note that the SPC is a computational model inspired by what is known about the neurobiology of the striatum. Accordingly, the striatal “units” are hypothetical and could be interpreted within the language of other computational models (e.g., as “prototypes” in a multiple-prototype model such as SUSTAIN; Love et al. 2004).
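A minimal sketch of the SPC decision rule, under stated assumptions: each tone category is represented by one unit in the (pitch height, pitch direction) plane, and a stimulus receives the label of the nearest unit. The unit coordinates below are hypothetical, on an arbitrary scale, and the model's noise parameter is represented here, as an assumption, by isotropic Gaussian noise added to the percept, with response probabilities estimated by Monte Carlo rather than computed analytically as in a full maximum likelihood fit.

```python
import math
import random

# HYPOTHETICAL unit locations (pitch height, pitch direction) for one
# speaker space; negative direction = falling, positive = rising.
HYPOTHETICAL_UNITS = {
    1: (0.8, 0.0),    # high-flat
    2: (0.4, 0.8),    # low-rising
    3: (0.2, -0.1),   # low-dipping
    4: (0.7, -0.8),   # high-falling
}


def spc_respond(percept, units):
    """Return the category label of the striatal unit nearest to `percept`."""
    return min(units, key=lambda label: math.dist(percept, units[label]))


def spc_response_probs(stimulus, units, noise_sd, n_samples=5000, seed=0):
    """Monte Carlo estimate of response probabilities when Gaussian noise
    (sd = noise_sd on each dimension) perturbs the percept."""
    rng = random.Random(seed)
    counts = {label: 0 for label in units}
    for _ in range(n_samples):
        noisy = (stimulus[0] + rng.gauss(0, noise_sd),
                 stimulus[1] + rng.gauss(0, noise_sd))
        counts[spc_respond(noisy, units)] += 1
    return {label: c / n_samples for label, c in counts.items()}


probs = spc_response_probs((0.6, 0.1), HYPOTHETICAL_UNITS, noise_sd=0.15)
print(probs)
```

The nearest-unit rule carves the space into regions whose boundaries are not aligned with either dimension, which is why the SPC can capture integration of pitch height and direction that a verbalizable single-dimension rule cannot.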
A series of unidimensional reflective models was also fit to the data. The unidimensional reflective models assumed that the participant set 3 criteria along the pitch height or pitch direction dimension, ignoring the other dimension. The unidimensional height model assumed that the 3 criteria along the pitch height dimension were used to separate the stimuli into low, medium-low, medium-high, or high pitch height, each of these being associated with one of the tone categories, while ignoring the pitch direction dimension. Although a large number of versions of this model are possible, we explored the 8 variants of the model that made the most reasonable assumptions regarding the assignment of category labels to the 4 response regions. Using the convention that the first, second, third, and fourth category labels are associated with low, medium-low, medium-high, and high pitch height, respectively, the 8 variants were: 3214, 3412, 3241, 3421, 2314, 4312, 2341, and 4321. The unidimensional direction model assumed that the 3 criteria along the pitch direction dimension were used to separate the stimuli into low, medium-low, medium-high, or high pitch direction, each of these being associated with one of the tone categories, while ignoring the pitch height dimension. Although a large number of versions of this model are possible, we explored the 2 variants of the model that made the most reasonable assumptions regarding the assignment of category labels to the 4 response regions. Using the convention that the first, second, third, and fourth category labels are associated with low, medium-low, medium-high, and high pitch direction, respectively, the 2 variants were: 4312 and 4132. The unidimensional models each contained 4 free parameters in each space: 3 criteria and one noise parameter. The random responder model assumed a fixed probability of responding tone 1, tone 2, tone 3, and tone 4, allowing for response biases. 
The model had 3 free parameters in each space, representing the predicted probabilities of responding “1,” “2,” and “3”; the probability of responding “4” was equal to 1 minus the sum of the other 3.
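The unidimensional rule and the random responder model described above can be sketched as follows. The unidimensional function places a value into one of 4 regions defined by 3 increasing criteria and reads the tone label from a 4-digit variant string such as "3214", using the low-to-high convention in the text. Criterion values are hypothetical, and the noise parameter is omitted, so this shows the deterministic rule rather than the full likelihood model. The random responder function returns the negative log-likelihood of a response sequence under fixed response probabilities.

```python
import math
from bisect import bisect_right


def unidimensional_respond(value, criteria, labels):
    """Classify a value on one dimension (height or direction).

    criteria: increasing (c1, c2, c3) splitting the dimension into low,
        medium-low, medium-high, and high regions (HYPOTHETICAL values).
    labels: 4-character variant string, e.g. "3214", giving the tone for
        each region from low to high.
    """
    region = bisect_right(criteria, value)  # region index 0..3
    return int(labels[region])


def random_responder_nll(responses, p123):
    """Negative log-likelihood of responses (tones 1-4) when the
    probabilities of "1", "2", "3" are p123 and "4" gets the remainder."""
    probs = {1: p123[0], 2: p123[1], 3: p123[2], 4: 1.0 - sum(p123)}
    return -sum(math.log(probs[r]) for r in responses)


crit = (0.25, 0.5, 0.75)
print([unidimensional_respond(v, crit, "3214") for v in (0.1, 0.4, 0.6, 0.9)])
```

For the random responder, the maximum likelihood estimates are simply the observed response proportions, so this model captures response biases without any dependence on the stimulus.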
Finally, a more complex conjunctive reflective model was considered in a secondary analysis. In previous behavioral pilot work, we elicited verbal descriptions of the 4 categories after the category learning task. No participant reported a conjunctive reflective strategy, although several described unidimensional strategies. However, because a conjunctive strategy is theoretically possible, we conducted a separate analysis that included this model. The model assumed that 2 criteria along the pitch direction dimension are used to separate the stimuli into falling, flat, or rising pitch direction. Items with falling pitch direction are classified as tone 4 and items with rising pitch direction as tone 2. If an item is classified as having flat pitch direction, the pitch height dimension is examined, and a single criterion along that dimension separates the stimuli into low and high pitch height. Flat items with high pitch height are classified as tone 1, and flat items with low pitch height as tone 3. This model contained 4 free parameters in each space: 3 criteria and one noise parameter. Including this model did not alter the main findings of the study; therefore, the findings of this secondary analysis are presented only in the Supplementary Material.
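The conjunctive rule is a small decision tree, sketched below. It assumes pitch direction is coded so that negative values are falling and positive values are rising; the criterion values are hypothetical, and, as in the unidimensional sketch, the noise parameter is omitted.

```python
def conjunctive_respond(height, direction, falling_crit, rising_crit, height_crit):
    """Conjunctive rule from the text (criterion values are HYPOTHETICAL).

    direction < falling_crit  -> tone 4 (falling)
    direction > rising_crit   -> tone 2 (rising)
    otherwise (flat): high height -> tone 1, low height -> tone 3
    """
    if direction < falling_crit:
        return 4
    if direction > rising_crit:
        return 2
    return 1 if height > height_crit else 3


# Usage with hypothetical criteria: direction split at +/-0.2, height at 0.5.
print(conjunctive_respond(0.8, 0.0, -0.2, 0.2, 0.5))
```

The 3 criteria (2 on direction, 1 on height) plus one noise term give the 4 parameters per space stated above.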
To assess strategy selection over the course of the learning blocks, a linear mixed effects analysis was applied to the set of best fitting models per block for each participant (Bates et al. 2012). The mean-centered block number was the dependent variable and the best fitting strategy was the fixed effect (reference level: reflexive model), with by-participant random intercepts. Finally, an analysis was run to examine whether reflexive strategy use was associated with better learning than nonreflexive strategy use. The dependent variable was trial-by-trial accuracy. The fixed effects were the mean-centered block number and whether the participant was using a reflexive strategy. The model included by-participant random intercepts as well as by-participant random slopes for the block-by-strategy interaction.
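The quantity the first mixed effects model estimates can be approximated descriptively: for each strategy, average the mean-centered block numbers of the blocks in which that strategy was the best fitting model. The sketch below does this for hypothetical (participant, block, best model) records; unlike the mixed model, it ignores by-participant random intercepts, so it is a simplification rather than the analysis itself.

```python
from collections import defaultdict
from statistics import mean

N_BLOCKS = 6
CENTER = (N_BLOCKS + 1) / 2  # 3.5, so blocks 1..6 map to -2.5..2.5


def mean_centered_block_by_strategy(assignments):
    """assignments: iterable of (participant, block, best_model) tuples.

    Returns {best_model: average mean-centered block}. Positive values mean
    the strategy tended to be used late in learning, negative values early.
    Random effects are ignored in this descriptive sketch.
    """
    blocks = defaultdict(list)
    for _participant, block, model in assignments:
        blocks[model].append(block - CENTER)
    return {model: mean(vals) for model, vals in blocks.items()}


# HYPOTHETICAL best-fitting-model assignments for two participants.
records = [("p1", 1, "random"), ("p1", 6, "reflexive"),
           ("p2", 2, "height"), ("p2", 5, "reflexive")]
print(mean_centered_block_by_strategy(records))
```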
fMRI data were analyzed using FMRIB's Software Library (FSL) Version 5.0 (Smith et al. 2004; Woolrich et al. 2009; Jenkinson et al. 2012). BOLD images were motion corrected using MCFLIRT (Jenkinson et al. 2002). All images were brain-extracted using BET (Smith 2002; Jenkinson et al. 2005). Registration to the high-resolution anatomical image (df = 6) and to the MNI 152 template (df = 12; Grabner et al. 2006) was conducted using FLIRT (Jenkinson and Smith 2001; Jenkinson et al. 2002). Six separate block-wise first-level analyses were run within each subject. The following prestatistics processing was applied: spatial smoothing using a Gaussian kernel (FWHM = 5 mm); grand-mean intensity normalization of the entire 4D dataset by a single multiplicative factor; and high-pass temporal filtering (Gaussian-weighted least-squares straight line fitting; σ = 50.0 s). Each event was modeled as an impulse convolved with a canonical double-gamma hemodynamic response function (phase = 0 s). Motion estimates were modeled as nuisance covariates. Temporal derivatives of each event regressor, including the motion estimates, were added. Time-series statistical analysis was carried out using FILM with local autocorrelation correction (Smith et al. 2004). The events of interest were stimulus, response, and feedback, each further subdivided according to accuracy: correct, incorrect, and missed. The missed trials were treated as nuisance variables.
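To make the event modeling concrete, the sketch below builds one regressor by placing an impulse at each event onset on a fine time grid, convolving it with a double-gamma hemodynamic response function, and sampling the result once per TR (1.8 s, 166 volumes). The HRF uses SPM-style canonical parameters (gamma shapes 6 and 16, undershoot ratio 1/6), which is an assumption: FSL's double-gamma parameterization may differ in detail. The function names are hypothetical.

```python
import math

import numpy as np


def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1 / 6):
    """SPM-style double-gamma HRF (ASSUMED parameters; scale = 1 s)."""
    t = np.asarray(t, dtype=float)

    def gamma_pdf(x, a):
        out = np.zeros_like(x)
        pos = x > 0
        out[pos] = x[pos] ** (a - 1) * np.exp(-x[pos]) / math.gamma(a)
        return out

    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, undershoot_shape)


def event_regressor(onsets, n_vols, tr=1.8, dt=0.1):
    """Impulses at event onsets (s) convolved with the HRF, sampled per TR."""
    n_fine = int(round(n_vols * tr / dt))
    impulses = np.zeros(n_fine)
    for onset in onsets:
        idx = int(round(onset / dt))
        if idx < n_fine:
            impulses[idx] = 1.0
    hrf = double_gamma_hrf(np.arange(0, 32, dt))   # 32 s HRF kernel
    fine = np.convolve(impulses, hrf)[:n_fine]
    step = int(round(tr / dt))                      # 18 fine samples per TR
    return fine[::step][:n_vols]


reg = event_regressor([10.0, 40.0], n_vols=166)
deriv = np.gradient(reg, 1.8)  # temporal-derivative regressor
```

Adding the temporal derivative lets the model absorb small shifts in response latency relative to the canonical shape.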
First-level analysis results were entered into a second-level fixed effects analysis with 3 regressors: group average, mean-centered block number, and mean-centered accuracy per block per participant. The latter 2 regressors were included as nuisance variables to counteract systematic trends in the data across blocks. Third-level group analysis was performed for each contrast using FLAME1 (Woolrich et al. 2009). Poststatistical analysis was performed using randomise in FSL to run permutation tests (n = 50 000) for the GLM and to yield threshold-free cluster enhancement (TFCE) estimates of statistical significance (Freedman and Lane 1983; Kennedy 1995; Bullmore et al. 1999; Anderson and Robinson 2001; Nichols and Holmes 2002; Hayasaka and Nichols 2003). Finally, to assess the activation patterns associated with optimal learning, the first-level analysis from the final block was entered into a second-level analysis with 2 regressors: group average and mean-centered accuracy for the final block (Table 1).
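The logic of a nonparametric group test can be illustrated at a single voxel. For a one-sample group design, one common permutation scheme flips the sign of each participant's contrast value at random, which is valid if the null distribution is symmetric around zero; this is conceptually similar to the sign flipping randomise uses for one-sample designs. The sketch below uses the mean as the test statistic and omits TFCE and the voxelwise maximum-statistic correction, so it illustrates the principle rather than reproducing randomise.

```python
import numpy as np


def sign_flip_test(values, n_perm=5000, seed=0):
    """One-sample permutation test of mean > 0 by random sign flipping.

    Under the null hypothesis, each participant's contrast value is equally
    likely to be positive or negative, so random sign flips generate the
    null distribution of the group mean.
    """
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    observed = values.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, values.size))
    null = (signs * values).mean(axis=1)
    # Count the observed statistic as one member of the null distribution.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)


# HYPOTHETICAL contrast values for 23 participants at one voxel.
p_value = sign_flip_test(np.linspace(0.5, 1.5, 23))
print(p_value)
```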
Four regions of interest (ROIs) were defined a priori: (1) the left and right DLPFC and (2) the left and right putamen. The DLPFC ROIs were anatomically defined using Brodmann areas 9/46 (Spence et al. 2000; Pochon et al. 2002; Curtis and D'Esposito 2003; Anderson et al. 2004) per the atlas included in the MRIcron package (Rorden 2007). The putamen was anatomically defined using the Harvard–Oxford Subcortical Atlas (Frazier et al. 2005; Desikan et al. 2006; Makris et al. 2006; Goldstein et al. 2007). The masks were linearly registered to the MNI152 space (Grabner et al. 2006) using FLIRT (Jenkinson and Smith 2001; Jenkinson et al. 2002; Fig. 5). Percent signal changes in the (correct − incorrect) contrast for feedback processing were calculated by first linearly registering the ROIs to each individual's BOLD space using FLIRT with the appropriate transformation matrices generated from the first-level analysis and nearest neighbor interpolation (Jenkinson and Smith 2001; Jenkinson et al. 2002). Then, the contrast parameter estimate images were masked with the transformed ROIs, multiplied by the height of the double-gamma function for a stimulus length of 1 s (0.0288), converted into percent scale, divided by the mean functional activation, and averaged within the ROI using fslmaths (Mumford 2007).
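The percent signal change computation described above reduces to a voxelwise formula, PSC = 100 × contrast estimate × regressor height / mean functional signal, averaged within the ROI. The sketch below applies it to in-memory arrays; the arrays are hypothetical stand-ins for the registered contrast image, mean functional image, and ROI mask (in practice these would be read from NIfTI files), and the height 0.0288 is the value given in the text.

```python
import numpy as np


def roi_percent_signal_change(cope, mean_func, roi_mask, height=0.0288):
    """Mean percent signal change within an ROI.

    cope: contrast parameter estimate image (correct - incorrect).
    mean_func: mean functional image in the same space.
    roi_mask: nonzero inside the ROI.
    height: regressor height for a 1-s event (value from the text).
    """
    roi = roi_mask.astype(bool)
    psc = 100.0 * cope[roi] * height / mean_func[roi]
    return float(psc.mean())


# HYPOTHETICAL 2x2x2 images; only the first slab lies inside the ROI.
cope = np.full((2, 2, 2), 50.0)
cope[1] = 999.0                       # outside the mask, so ignored
mean_func = np.full((2, 2, 2), 1000.0)
mask = np.zeros((2, 2, 2))
mask[0] = 1
print(roi_percent_signal_change(cope, mean_func, mask))
```

Scaling by the regressor height converts the parameter estimate, which is expressed per unit of regressor amplitude, into the signal change expected for a single 1-s event.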
The average performance in the initial block was 23% (standard deviation [SD] = 9%; 95% confidence interval [CI] [19%, 26%]), close to the chance level of 25%. By the final block, average performance was 54% (SD = 27%; 95% CI [42%, 66%]). Performance in the initial and final blocks was positively correlated, r(21) = 0.432, P = 0.040, 95% CI [0.024, 0.716]. A mixed effects analysis was conducted to assess learning progress. The dependent variable was trial-by-trial accuracy (correct vs. incorrect), and the fixed effect was the mean-centered block number. The intercept was not significant, b = −0.16, standard error (SE) = 0.23, z = −0.72, P = 0.47, 95% CI [−0.63, 0.30]. The effect of the mean-centered block number was significant, b = 0.32, SE = 0.070, z = 4.61, P < 0.0001, 95% CI [0.18, 0.47], indicating an overall learning effect across blocks (Fig. 2).
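The fixed effects above are on the log-odds scale. The sketch below converts them into predicted probabilities of a correct response for each block by applying the inverse logit to intercept + slope × centered block. This gives the prediction for a participant whose random effects are zero (roughly 0.28 at block 1 rising to roughly 0.65 at block 6), which need not match the observed block means because averaging probabilities across participants is not the same as transforming averaged log odds.

```python
import math


def inv_logit(x):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


INTERCEPT, SLOPE = -0.16, 0.32            # fixed effects reported above
blocks = range(1, 7)
centered = [b - 3.5 for b in blocks]      # -2.5 ... 2.5
pred = {b: inv_logit(INTERCEPT + SLOPE * c) for b, c in zip(blocks, centered)}
odds_ratio_per_block = math.exp(SLOPE)    # multiplicative change in odds per block

for b, p in pred.items():
    print(f"block {b}: P(correct) = {p:.3f}")
```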
Participants used a variety of strategies, and each of the models considered was observed over the course of learning. As described above, for each block, a strategy (reflexive, reflective pitch height, reflective pitch direction, or random responder) was assigned according to the best fitting model. Several notable patterns emerged from the mixed effects analysis estimating the average block number for each of the assigned strategies. Because the block numbers were mean-centered, a positive estimate for the reference (reflexive) strategy indicates that it was more likely to be used late in learning (block 4, 5, or 6), whereas a negative estimate indicates use early in learning (block 1, 2, or 3); the estimates for the other strategies express differences relative to the reflexive strategy. The mean block for the reflexive strategy (intercept) was significant, b = 0.79, SE = 0.25, t = 3.10, P = 0.0024, 95% CI [0.29, 1.28], indicating that the reflexive strategy was more likely to be used late in learning. The estimate for the random responder model was significant, b = −1.29, SE = 0.36, t = −3.60, P = 0.00044, 95% CI [−1.98, −0.59], indicating that the random responder strategy was more likely than the reflexive strategy to be used early in learning. The unidimensional reflective strategies were likewise used significantly earlier in learning than the reflexive strategy: pitch direction, b = −1.29, SE = 0.60, t = −2.13, P = 0.035, 95% CI [−2.46, −0.11]; pitch height, b = −0.94, SE = 0.35, t = −2.66, P = 0.009, 95% CI [−1.63, −0.25]. Taken together, these results indicate that the slow-learning reflexive strategy was more likely to be used late in learning, whereas the fast-learning reflective or random responder strategies were more likely to be used early in learning (Fig. 2).
In an analysis designed to test whether reflexive strategies yielded better learning outcomes, a logistic regression was conducted with trial-by-trial accuracy as the dependent variable and the mean-centered block number, block-by-block strategy, and their interaction as fixed effects. The block-by-block strategy term had 2 levels: reflexive versus nonreflexive (reference level). The interaction between block number and strategy was not significant, b = −0.96, SE = 0.84, z = −1.15, P = 0.25. Therefore, we focused on a model that included only the main effects. At the average block number (between 3 and 4), the log odds of producing an accurate rather than an inaccurate response under a nonreflexive strategy were negative, b = −0.37, SE = 0.17, z = −2.18, P = 0.030, indicating that the probability of an accurate response was significantly below 50%. The block effect was significant, b = 0.26, SE = 0.057, z = 4.58, P < 0.0001, indicating that the odds of an accurate response were higher for later blocks than for earlier blocks. The strategy effect was significant, b = 0.39, SE = 0.19, z = 2.07, P = 0.038, indicating that reflexive strategy use, compared with nonreflexive strategy use, was associated with increased odds of an accurate response. These results suggest that learning improved over time and that reflexive strategy use was associated with better learning than nonreflexive strategy use.
Averaging across correct and incorrect responses (correct + incorrect) did not yield any significant activations associated with feedback processing. Testing whether activation for correct trials was higher than for incorrect trials (correct − incorrect; Fig. 3) revealed areas associated with the corticostriatal loops involved in category learning (Seger 2008, 2010). The ventral striatum, including the nucleus accumbens, was activated, as was the anterior cingulate cortex. These 2 areas form part of the motivational loop that processes the reward value of feedback, which is greater for positive than for negative feedback. The left dorsolateral prefrontal cortex and the left head of the caudate were activated; these are parts of the executive loop that forms the basic circuitry underlying reflective learning. The putamen was activated bilaterally; it is involved in the categorization process via its connections to the motor regions. The left inferior parietal lobule, which functions as the sensorimotor interface that maps sensory speech information onto articulatory gestures (Hickok and Poeppel 2007), was also activated. Finally, the left middle temporal gyrus/superior temporal sulcus region was activated. During feedback processing, there was no meaningful auditory stimulus to be processed, and the level of auditory sensory input was identical for positive and negative feedback. Therefore, the activation in the superior temporal area and the inferior parietal lobule was presumably not driven by the auditory stimulus alone but rather reflected feedback-driven strengthening of the stimulus-to-response/category association (Weil et al. 2010). No brain region showed significantly higher activation for incorrect trials than for correct trials (incorrect − correct).
Averaging across accuracy (correct + incorrect), stimulus presentation elicited activation bilaterally in Heschl's gyrus, the planum temporale, and the posterior superior temporal gyrus, consistent with the auditory nature of the task. Activation for correct trials was higher than for incorrect trials in the right planum temporale, the insular cortex, and the left pre- and postcentral cortices (correct − incorrect; Fig. 4). The right inferior parietal lobule was also sensitive to accurate categorization, consistent with its proposed role as the sensorimotor interface between auditory processing and articulatory mapping (Hickok and Poeppel 2007). No brain regions showed higher activation for incorrect trials than for correct trials (incorrect − correct).
To assess the activation patterns associated with optimal learning, the final-block contrast of correct relative to incorrect trials (correct − incorrect) during stimulus perception was regressed against accuracy scores from the final block. Individual accuracy scores correlated positively with activation in the speech processing areas of bilateral Heschl's gyrus, the right inferior parietal lobule, the right inferior frontal gyrus, and the bilateral insula. Higher accuracy was also associated with increased activation in the bilateral putamen, the right caudate nucleus, the motor cortex, and the anterior cingulate cortex, suggesting that better performance in the final block was related to the involvement of the corticostriatal learning systems, in particular the motor loop encompassing the motor cortex and the putamen (Fig. 5). No brain regions showed a negative correlation with accuracy scores.
Averaging across correct and incorrect responses (correct + incorrect), the category response engaged several areas within extensive cortical networks. The bilateral pre- and postcentral areas were activated, reflecting the finger movements needed to make category responses. The decision-making network involving the left dorsolateral prefrontal cortex and the anterior cingulate cortex was activated, reflecting the categorization process during response selection. Activation for correct and incorrect trials did not differ significantly (correct − incorrect; incorrect − correct).
This analysis tested the hypothesis that the putamen is involved when category learning is mediated by the reflexive processing system. Participants were classified as reflexive versus nonreflexive (reflective or random) strategy users based on the best-fitting model in each block. Mixed effects analyses were performed on the putamen and DLPFC in the left and right hemispheres. The dependent measure was the percent signal change (correct − incorrect) during feedback processing in each block. The fixed effects were the mean-centered block number and strategy group (reference level: nonreflexive), with random intercepts for participants. In the left putamen, the block by strategy interaction was not significant, b = −0.15, SE = 0.082, t = −1.79, P = 0.076, 95% CI [−0.31, 0.013]. Therefore, we examined the model without the interaction. The strategy effect was significant, b = 0.18, SE = 0.076, t = 2.31, P = 0.023, 95% CI [0.027, 0.324], suggesting that reflexive strategy use was associated with increased putamen activation for positive relative to negative feedback, although Bonferroni correction for the number of ROIs (n = 4) renders this effect only marginally significant (corrected P = 0.091). The block effect was not significant, b = −0.058, SE = 0.035, t = −1.65, P = 0.10, 95% CI [−0.13, 0.011]. The intercept was not significant, b = 0.054, SE = 0.041, t = 1.33, P = 0.19, 95% CI [−0.025, 0.13]. No effects were significant in the other ROIs (Fig. 6).
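Each reported t statistic is an estimate divided by its standard error, and each 95% CI spans roughly ±1.96 standard errors, so the mixed-model statistics above can be cross-checked against one another. The short Python sketch below backs out the SE implied by a reported CI, recomputes t, and applies the Bonferroni correction for four ROIs. It assumes a symmetric normal-approximation (Wald) interval, which may differ slightly from the interval the mixed-model software actually reported. The helper names are hypothetical.

```python
def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """SE implied by a symmetric normal-approximation (Wald) confidence interval."""
    return (upper - lower) / (2 * z)


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-corrected P value, capped at 1."""
    return min(1.0, p * n_tests)


# Strategy effect in the left putamen (estimate and CI taken from the text).
se_strategy = se_from_ci(0.027, 0.324)
t_strategy = 0.18 / se_strategy

# Block-by-strategy interaction in the left putamen.
se_interaction = se_from_ci(-0.31, 0.013)
t_interaction = -0.15 / se_interaction

print(f"strategy:    SE ~ {se_strategy:.3f}, t ~ {t_strategy:.2f}")
print(f"interaction: SE ~ {se_interaction:.3f}, t ~ {t_interaction:.2f}")
print(f"Bonferroni-corrected P (4 ROIs): {bonferroni(0.023, 4):.3f}")
```

The implied SEs come out at about 0.076 and 0.082. The recomputed t values are about 2.38 and −1.82, close to the reported 2.31 and −1.79 once rounding of the estimates is taken into account. Multiplying the rounded P = 0.023 by 4 gives 0.092, consistent with the reported corrected value computed from the unrounded P.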
We examined the neural mechanisms underlying nonnative speech category learning in adults. Based on an extensive review of previous behavioral work (Ashby and Maddox 2005; Chandrasekaran et al. 2014b; Maddox et al. 2013, 2014; Maddox and Chandrasekaran 2014), we predicted that speech categories would be optimally learned via corticostriatal circuitry involved in reflexive learning (Seger 2008; Seger and Miller 2010). Computational modeling of behavioral response strategies revealed that, with experience, use of reflexive strategies increased while use of reflective or random strategies decreased. Reflexive strategy use was associated with increased activation in the putamen during feedback processing. Final-block categorization accuracy was associated with increased stimulus-related activation in auditory and speech-related areas previously implicated in speech category learning, including Heschl's gyrus (Wong et al. 2008), the inferior parietal lobule (Gandour, Dzemidzic, et al. 2003; Gandour, Wong, et al. 2003), and the insular cortex (Wong et al. 2004). Furthermore, individual learning success was associated with activation in the putamen and the motor cortex. These areas have not been directly implicated in speech processing but are considered key components of the corticostriatal motor loop that forms the reflexive category learning system (Seger 2008; Seger and Miller 2010). Together, these behavioral, computational modeling, and neuroimaging results help specify the mechanisms underlying feedback-dependent error reduction during speech learning. Whereas previous research has mostly viewed speech learning as a perceptually encapsulated process, our findings represent an important conceptual advance in understanding the neurobiological basis of domain-general learning systems during speech processing.
Positive feedback, relative to negative feedback, activated several functional loops within the corticostriatal system. These included the motivational loop, represented by the ventral striatum, which is critical in processing the reward value of corrective feedback (Seger 2008; Seger and Miller 2010). These results are consistent with previous work showing that the ventral striatum is more active during positive than negative feedback (Seger et al. 2010). In addition, the DLPFC, anterior cingulate, and putamen were more active during positive feedback. The DLPFC and the anterior cingulate are key components of the reflective executive loop, which is involved in the explicit processing of trial feedback (Seger 2008; Seger and Miller 2010). These areas have been found to be more active on correct categorization trials during visual category learning (Seger et al. 2010). The DLPFC is hypothesized to generate and store verbalizable rules, which the anterior cingulate cortex either retains or discards depending on the valence of the feedback (Ashby and Alfonso-Reese 1998; Ashby and Ell 2001; Maddox et al. 2003; Ashby and Maddox 2005).
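The rule-based account in this paragraph can be illustrated with a deliberately simplified hypothesis-testing learner. In that account, candidate rules are generated, retained after positive feedback, and discarded after negative feedback. In the Python sketch below, the candidate rules are one-dimensional thresholds on a hypothetical two-dimensional stimulus grid standing in for pitch height and pitch direction. Rule switching is a plain win-stay/lose-shift over rules. This is an assumption made for illustration, not the formal model of the reflective system used in the cited work or in this study, and all names are hypothetical.

```python
import random


def make_rules(stimuli):
    """Hypothetical candidate set of verbalizable, one-dimensional threshold rules."""
    rules = []
    for dim in (0, 1):
        values = sorted({s[dim] for s in stimuli})
        for crit in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for flip in (False, True):
                rules.append((dim, crit, flip))
    return rules


def apply_rule(rule, stimulus):
    """Category (0 or 1) assigned by a rule such as 'dimension 0 above crit -> 1'."""
    dim, crit, flip = rule
    return int((stimulus[dim] >= crit) != flip)


def train_reflective(stimuli, labels, epochs=50, seed=0):
    """Win-stay/lose-shift over rules; returns the final rule and per-epoch accuracy."""
    rng = random.Random(seed)
    rules = make_rules(stimuli)
    rule = rng.choice(rules)  # an initial rule is proposed
    order = list(range(len(stimuli)))
    accuracy = []
    for _ in range(epochs):
        rng.shuffle(order)
        n_correct = 0
        for i in order:
            if apply_rule(rule, stimuli[i]) == labels[i]:
                n_correct += 1  # positive feedback: retain the current rule
            else:               # negative feedback: discard it and try another
                rule = rng.choice([r for r in rules if r != rule])
        accuracy.append(n_correct / len(stimuli))
    return rule, accuracy


grid = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
rule, acc = train_reflective(grid, [int(x >= 0) for x, y in grid])
print(rule, acc[-1])
_, acc_ii = train_reflective(grid, [int(x + y >= 0) for x, y in grid])
print(max(acc_ii))
```

Trained on categories defined by a single dimension, the learner settles on the correct rule and stops making errors. Trained on categories that require integrating both dimensions (here x + y ≥ 0), no single-dimension rule is error-free, so the learner never completes an error-free epoch and keeps cycling through rules. This contrast is the intuition behind describing such category structures as reflexive-optimal.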
In addition to the DLPFC and the anterior cingulate cortex, which are parts of the reflective learning system, positive feedback also increased activation in the putamen. The putamen is considered part of the reflexive corticostriatal motor loop (Seger 2008; Seger and Miller 2010). It is involved in selecting appropriate motor responses based on prior experience and is therefore posited to be involved in procedural learning. Studies have shown that changing the button-to-category associations interferes with reflexive learning but not with reflective learning (Ashby et al. 2003; Maddox et al. 2004, 2010; Spiering and Ashby 2008). Indeed, a recent neuroimaging study suggested that the putamen is integral to reflexive learning of visual categories (Waldschmidt and Ashby 2011). The involvement of the motivational, executive (reflective), and motor (reflexive) loops is consistent with predictions from the visual category learning literature. Overall, these results demonstrate a functional role for domain-general corticostriatal category learning systems in speech learning. During feedback processing, the ventral striatum responds to the reward value of positive feedback, the DLPFC and the anterior cingulate cortex generate and select rules based on the content of the feedback, and the putamen is activated to map stimulus representations onto procedural responses.
Outside the corticostriatal category learning areas, the activation of speech-related auditory areas is particularly noteworthy. These areas include the left superior temporal sulcus/middle temporal gyrus (STS/MTG). Feedback was presented in the visual modality, and the level of auditory stimulation did not differ between positive and negative feedback. Similar positive-feedback-driven activation of sensory regions has previously been reported in the visual cortex when feedback was presented in the auditory modality (Weil et al. 2010). That visual cortex activation has been interpreted as evidence that the reward processing network modulates early sensory regions. Our results can be interpreted within the same framework. The left STS/MTG regions have been shown to be important for auditory speech processing (Hickok and Poeppel 2007; Rauschecker and Scott 2009). Activation of these regions during positive feedback may reflect a strengthening of the sensory representation of the rewarded stimulus, driven by the reward processing network. Future work would need to include more trials and effective connectivity analyses to test for a causal relationship (i.e., an influence of the reward processing network on sensory regions).
Positive feedback also activated the inferior parietal lobule (IPL), which is presumed to be an integral part of the phonological network (Hickok and Poeppel 2007). The IPL has previously been conceptualized as a temporary buffer in phonological working memory (Koelsch et al. 2009), especially for comparison and decision making (Strand et al. 2008). During feedback presentation, the auditory input is available only as a sensory memory trace (Sams et al. 1993; Haenschel et al. 2005). Thus, IPL activation during positive feedback may reflect the mapping of the stored representation of the auditory stimulus onto phonological categories (Buchsbaum and D'Esposito 2008; McGettigan et al. 2011). Because negative feedback does not directly provide stimulus-to-category information, we hypothesize that the IPL is less active during negative feedback. Thus, positive feedback engages the reward processing network and may provide the critical learning signal for stimulus-to-category mapping within the IPL.
To conclude, positive feedback activates a large corticostriatal network. The reward value of positive feedback is thought to be processed in the ventral striatum. Category learning likely occurs within the reflective (dorsolateral prefrontal cortex and the anterior cingulate cortex) and the reflexive (putamen and the motor cortex) networks (Seger 2008; Seger and Miller 2010). Finally, positive feedback may strengthen the sensory representation of the rewarded stimulus and may promote stimulus-to-category mapping within the phonological network.
Individual learners adopt different speech category learning strategies depending on the stage of learning and individual capacity (Maddox and Chandrasekaran 2014). Computational modeling enables direct assessment of this variability in individual response strategies. In this study, response strategies were modeled separately in each learning block for each participant. Multidimensional scaling studies have shown that Mandarin tone categories are most parsimoniously distinguished using 2 pitch dimensions (height and direction; Chandrasekaran et al. 2010). In the current study, the 40 stimuli were embedded in a 2-dimensional space defined by average pitch height and average pitch direction. We hypothesized that the optimal strategy is reflexive and requires predecisional integration of information across both dimensions. Nonoptimal strategies were also explored. These were reflective strategies, which rely on only one of the 2 dimensions, and a random-responder strategy. The modeling results revealed that participants' early learning was typically characterized by reflective or random-responder strategies. Their late learning was typically characterized by a reflexive strategy. In addition, reflexive strategy users, as determined on a block-by-block basis, were more accurate in the task. Thus, learners initially used reflective strategies but switched to the more optimal reflexive strategies as they gained expertise. This interpretation was supported by the mixed effects modeling result showing that reflexive strategy use was associated with better learning outcomes.
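The strategy modeling described here compares models that differ in how many stimulus dimensions they use. The Python sketch below illustrates that logic on a simplified two-category problem. Four models are fit to one block of responses by grid search: a random responder, two unidimensional (reflective) threshold models, and a linear bound that integrates both dimensions (reflexive). The model with the lowest BIC is selected. Several choices here are assumptions for illustration rather than the study's actual fitting procedure: the deterministic-bound-plus-lapse noise model, the grid search, the BIC criterion, the two-category simplification (the study used four tone categories), and all function names.

```python
import math


def _loglik(n_match: int, n: int) -> float:
    """Max log-likelihood of n responses when each agrees with a deterministic
    decision bound except for a lapse rate eps (hypothetical noise model)."""
    eps = min(max((n - n_match) / n, 1e-6), 0.5)
    return n_match * math.log(1 - eps) + (n - n_match) * math.log(eps)


def _thresholds(values):
    """Candidate criteria: midpoints between adjacent observed values, plus -inf."""
    u = sorted(set(values))
    return [-math.inf] + [(a + b) / 2 for a, b in zip(u, u[1:])]


def _best_loglik(projection, responses):
    """Grid-search the criterion on a 1-D projection of the stimuli."""
    n = len(responses)
    best = -math.inf
    for c in _thresholds(projection):
        hits = sum((p >= c) == r for p, r in zip(projection, responses))
        # Either response label may lie on either side of the bound.
        best = max(best, _loglik(hits, n), _loglik(n - hits, n))
    return best


def classify_strategy(stimuli, responses, n_angles=180):
    """Return the lowest-BIC model name for one block, plus all BIC values.

    stimuli: list of (dim1, dim2) pairs; responses: list of bools (category B?).
    """
    n = len(responses)
    bic = {"random": -2 * n * math.log(0.5)}  # no free parameters
    for dim in (0, 1):  # reflective: criterion + lapse = 2 parameters
        ll = _best_loglik([s[dim] for s in stimuli], responses)
        bic[f"reflective_dim{dim + 1}"] = 2 * math.log(n) - 2 * ll
    # Reflexive: a linear bound integrating both dimensions
    # (angle + criterion + lapse = 3 parameters).
    ll = max(
        _best_loglik([math.cos(a) * x + math.sin(a) * y for x, y in stimuli], responses)
        for a in (math.pi * k / n_angles for k in range(n_angles))
    )
    bic["reflexive"] = 3 * math.log(n) - 2 * ll
    return min(bic, key=bic.get), bic


grid = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
print(classify_strategy(grid, [x + y >= 0 for x, y in grid])[0])
print(classify_strategy(grid, [x >= 0 for x, y in grid])[0])
```

On this 7 × 7 grid of hypothetical stimuli, responses generated by the rule x + y ≥ 0 are best described by the reflexive model. Responses generated by x ≥ 0 are best described by the first-dimension reflective model: the integration model fits them equally well, but BIC penalizes its extra parameter.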
Reflexive category learning depends on mapping the perceptual experience of the stimulus onto the motor response associated with the appropriate category. Cortical sensory input is relayed to the striatum via many-to-one convergent connections, which give rise to a low-resolution stimulus representation (Wilson 1995; Ashby and Ennis 2006). These striatal units allow stimuli to be associated with category responses, and these corticostriatal connections form the basis of reflexive category learning. The putamen is a strong candidate site for this type of plasticity because it shows greater connectivity to the auditory association cortices than the caudate nucleus does (Di Martino et al. 2008). The putamen is also involved in perceptual processing of auditory stimuli (Geiser et al. 2012). Visual category learning research has also implicated it as critical to reflexive learning (Waldschmidt and Ashby 2011). Our results showed that reflexive strategy use was associated with increased activation in the left putamen during feedback processing. However, no pattern related to reflexive strategy use was found in the DLPFC, suggesting that optimal strategy use does not result from increased prefrontal reflective processing. The current study therefore supports the prediction that speech category learning is reflexive-optimal and that the reflexive strategy involves the putamen during feedback processing.
During stimulus perception, individual variability in learning performance was associated with the involvement of the corticostriatal motor loop. In the visual category learning literature, the relative dominance of the reflective and reflexive learning systems depends on the stage of learning. Early learning is dominated by the executive, reflective learning system (Smith et al. 2012a, 2012b), but later stages of learning are associated with increased automaticity and putamen activation (Haruno and Kawato 2006; Williams and Eskandar 2006; Seger 2009). During reflexive learning, a single striatal “unit,” presumed to be located within the putamen, implicitly associates an abstract cortical–motor response with a large group of sensory cells within the sensory association cortex (Matelli and Luppino 1996; Seger 2008; Seger et al. 2010; Waldschmidt and Ashby 2011). Synaptic plasticity in the striatal cell is facilitated by a dopamine-mediated reinforcement signal from positive feedback. This signal is processed via the motivational loop that connects the ventral striatum and the anterior cingulate cortex (Seger 2008; Seger et al. 2010). In later stages of learning, the dopamine-mediated reinforcement signal becomes more consistent, allowing a stronger association between the stimulus and an accurate category label. As discussed earlier, optimal speech category learning is thought to be reflexive, given the multidimensionality and high variability of speech categories. Therefore, optimal speech category learning necessitates a switch to the reflexive strategy (Chandrasekaran et al. 2014b), which is likely based on the activity of the loop between the putamen and the motor cortex. This prediction was reflected in the finding that individual variability in learning performance was associated with increased involvement of the putamen and the motor cortex for correct relative to incorrect trials during stimulus perception.
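The reinforcement mechanism described in this paragraph can be illustrated with a toy three-factor learning rule. Under such a rule, a synapse from a sensory unit onto the striatal unit that drove the response changes in proportion to presynaptic activity times a dopamine-like reward prediction error. The Python sketch below is loosely in the spirit of procedural-learning models of the striatum. The following are hypothetical simplifications, not the model or parameters of this study: the Gaussian sensory tuning, one striatal unit per category, the running-average reward baseline, the parameter values, and the function name.

```python
import math
import random


def train_procedural(stimuli, labels, n_categories=2, epochs=40, alpha=0.2,
                     width=0.5, baseline_rate=0.05, seed=0):
    """Reward-modulated stimulus-to-response learning; returns (weights, accuracy)."""
    rng = random.Random(seed)
    # Sensory "cortex": one Gaussian-tuned unit centred on each stimulus (assumption).
    feats = [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / width ** 2)
              for cx, cy in stimuli] for x, y in stimuli]
    # weights[k][j]: synapse from sensory unit k onto striatal unit j (one per category).
    weights = [[rng.uniform(0.0, 0.1) for _ in range(n_categories)] for _ in stimuli]
    expected_reward = 0.5  # running baseline against which dopamine is computed
    order = list(range(len(stimuli)))
    accuracy = []
    for _ in range(epochs):
        rng.shuffle(order)
        n_correct = 0
        for i in order:
            f = feats[i]
            # Each striatal unit sums its convergent input from all sensory units.
            striatal = [sum(fk * wk[j] for fk, wk in zip(f, weights))
                        for j in range(n_categories)]
            choice = max(range(n_categories), key=striatal.__getitem__)
            reward = 1.0 if choice == labels[i] else 0.0
            n_correct += choice == labels[i]
            dopamine = reward - expected_reward  # reward prediction error
            # Three-factor update: presynaptic activity x chosen unit x dopamine.
            for k, fk in enumerate(f):
                weights[k][choice] += alpha * dopamine * fk
            expected_reward += baseline_rate * (reward - expected_reward)
        accuracy.append(n_correct / len(stimuli))
    return weights, accuracy


grid = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
labels = [int(x + y >= 0) for x, y in grid]
_, acc = train_procedural(grid, labels)
print(f"first epoch {acc[0]:.2f}, last epoch {acc[-1]:.2f}")
```

Here the categories require integrating both dimensions (x + y ≥ 0 on a 7 × 7 grid). Accuracy still rises across epochs without any explicit rule, because learning depends only on which response was rewarded and on how surprising that reward was.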
The putamen, like other parts of the striatum, receives convergent input from the cortex (10 000 to 1; Wilson 1995). Afferents to the rostral putamen originate in the prefrontal cortex (Selemon and Goldman-Rakic 1985). Afferents to the caudal putamen originate in the motor and somatosensory areas (Alexander and DeLong 1985) and in the superior temporal auditory areas (Yeterian and Pandya 1998). In addition to motor processing, the putamen has been implicated in episodic memory, cognitive control, and category learning. In fact, the putamen has been suggested to be the ideal site for acquiring stimulus-to-response associations, in which sensory stimuli are mapped onto context-specific motor activity that leads to favorable outcomes (Ell et al. 2011). Despite the putamen's many functions, in the context of the current study its activation patterns are best interpreted as reflecting reflexive strategy use. Reflexive learning of speech categories requires implicit associations between speech sounds and behavioral category responses. The findings from this experiment indicate that increased putamen activation was associated with reflexive strategy use, as well as with learning success in the final block. However, caution is warranted when inferring a specific task function from putamen activation. Further studies are required to confirm the role of the putamen in reflexive learning of speech categories. Because the various functions of the putamen relate to different anatomical subregions (Ell et al. 2011), high-resolution mapping of the putamen may help clarify its specific role in speech learning.
Category learning plays a vital role in human cognition. Speech category learning in adulthood is difficult, but feedback-dependent training can lead to successful speech categorization. In this study, we examined the computational and neural mechanisms underlying feedback-dependent speech categorization, using a dual-systems approach developed in the visual domain. Given the complexity of speech categories, we hypothesized that optimal speech category learning would be associated with the reflexive system. Computational modeling revealed that learners were initially biased toward the reflective system but gradually abandoned it in favor of the reflexive system. Throughout learning, reflexive strategy use was associated with better learning performance. Positive feedback was associated with increased activation in reflective and reflexive circuitries. In addition, positive feedback activated the ventral striatum, a key component of the motivational loop, as well as several regions associated with auditory and speech processing. Furthermore, reflexive strategy use was associated with increased activation in the putamen. The putamen is part of the motor loop that implicitly maps stimuli onto category responses in the motor cortex. Finally, increased activation of this motor loop during stimulus perception was associated with more accurate categorization. Together, the neurocomputational and individual-differences approaches reveal that successful speech category learning critically depends on domain-general corticostriatal learning systems.
Research reported in this publication was supported by the National Institute On Deafness And Other Communication Disorders of the National Institutes of Health under Award Number R01DC013315 (BC), and by the National Institute on Drug Abuse under Award Number DA032457 (WTM).
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Conflict of Interest: None declared.