Accuracy and RT data from the identification and ABX tasks are shown in Figure 2. In the identification task, both PreP and PreN conditions showed a continuous, mostly linear change in identification accuracy along the continuum (Figure 2A). This is consistent with the suggestion that the participants did not have discrete representations for the sounds in either Pre condition and did not spontaneously perceive the P sine-wave stimuli as speech. In contrast, after the participants were informed about the phonetic nature of the P stimuli, their identification of these stimuli became more categorical, in that the two ends of the continuum were consistently identified as /ba/ or /da/. This was not the case for the PostN condition, in which there was no significant change from the PreN performance.
Figure 2 Performance of the subjects on the identification and discrimination tasks in Pre (naïve) and Post (informed) conditions. Error bars indicate the standard error of the mean. (A) Accuracy and (B) RT in milliseconds (msec). “Across” ...
The identification performances were assessed quantitatively by entering the β coefficients (slope parameters) obtained by logistic regression into a two-way repeated-measures analysis of variance with factors for training (Pre vs. Post) and sound type (P vs. N). There was a main effect of training [F(1, 27) = 4.82, p < .037], a main effect of sound type [F(1, 27) = 6.78, p < .015], and an interaction [F(1, 27) = 6.32, p < .019]. Post hoc comparisons with Tukey’s HSD tests revealed a significant increase in β from PreP to PostP (p < .013) but no change in β from PreN to PostN (p > .9).
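As an illustration of what the slope parameter β captures, the following minimal sketch fits a two-parameter logistic curve to identification counts. The fitting procedure is not described in this section, so the Newton-Raphson routine, the function name `fit_logistic`, the centered token positions, and the response counts are all illustrative assumptions, not the study's analysis code or data.

```python
import math

def fit_logistic(x, k, n, iters=100, tol=1e-10):
    """Fit P(response) = 1 / (1 + exp(-(a + b*x))) to binomial counts.

    x: token positions along the continuum
    k: number of "da" responses per token (hypothetical counts)
    n: number of trials per token
    Returns (a, b), where b is the slope parameter (beta).
    """
    a = b = 0.0
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, ki, ni in zip(x, k, n):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = ni * p * (1.0 - p)   # binomial variance weight
            r = ki - ni * p          # observed minus expected count
            g_a += r
            g_b += r * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 0.0:
            break
        # Newton-Raphson step: solve the 2x2 system H * step = gradient
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b

# Hypothetical 7-token continuum, centered on the middle token, 20 trials each.
tokens = [-3, -2, -1, 0, 1, 2, 3]
trials = [20] * 7
shallow = [4, 6, 8, 10, 12, 14, 16]   # near-linear, continuous identification
steep = [1, 1, 2, 10, 18, 19, 19]     # categorical identification

_, beta_shallow = fit_logistic(tokens, shallow, trials)
_, beta_steep = fit_logistic(tokens, steep, trials)
print(beta_shallow, beta_steep)
```

A near-linear identification function yields a small β, whereas a function with consistent labeling at both ends of the continuum yields a larger β, which is the sense in which an increase in β from Pre to Post indicates more categorical identification.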
On the discrimination task, performance did not vary in the Post condition compared to the Pre condition for either P or N sounds. AC accuracy was better than WC accuracy in the PostP condition, consistent with categorical perception in that condition. However, this difference was already present in the PreP condition, before subjects could categorize the P sounds. For the N sounds, AC and WC discrimination did not differ for either PreN or PostN conditions. A three-way repeated-measures analysis of variance was carried out with factors for sound type (P, N), training (Pre, Post), and contrast (AC, WC). There were main effects of sound type [F(1, 27) = 6.15, p < .020], training [F(1, 27) = 7.54, p < .011], and contrast [F(1, 27) = 59.79, p < 10⁻⁶]. There was also an interaction between sound type and contrast [F(1, 27) = 36.18, p < 10⁻⁵]. No other interactions were significant. Post hoc comparisons using Tukey’s HSD revealed that AC accuracy was higher than WC accuracy in both PreP and PostP conditions (both p < .0001). The increase in AC and WC accuracy from PreP to PostP was not significant (both p > .32). There was no difference between AC and WC accuracy for PreN or PostN conditions, and the change in accuracy from PreN to PostN was also not significant (all p > .76).
The overall improvement in discrimination accuracy from Pre to Post conditions did not differ significantly between P and N conditions. The mean improvement in P was 8.9% (SD = 22.9), and in N it was 6.3% (SD = 12.7) (p > .59).
The RT results (Figure 2B) largely mirrored the accuracy data. In the identification task, RT was similar across most of the continuum in the PreP and PreN conditions. In the PostP condition, there was a reduction in RT for tokens at either end of the continuum and an increase in RT at the middle of the continuum (Token ba4), corresponding to the category boundary indicated by the accuracy data. The discrimination RT results also mirrored the accuracy data, in that RT was lower for AC than for WC trials in both PreP and PostP conditions.
In summary, the behavioral data indicate that participants did not have discrete category representations for either the P or N sounds in the Pre phase, but divided the P continuum into two perceptual categories in the Post phase. This change in categorization from nonphonetic to phonetic perception was reflected by a shift in identification but not in discrimination curves. Verbal reports of the participants after the Pre scan indicated that no participant had recognized the sounds in the Pre conditions as speech.
Functional Magnetic Resonance Imaging
The fMRI results for various conditions and contrasts are shown in Figures 3 and 4. The Appendix lists peak and activation cluster information for the contrasts. Compared to the baseline, each condition showed extensive activation that included bilateral temporal, frontal, and parietal areas (Figure 3).
Figure 3 Activation for each experimental condition compared to the baseline, overlaid on sagittal slices of an anatomical image of one subject. Captions on each image indicate the lateral distance (mm) of the slice from the anterior-to-posterior commissure line ...
Figure 4 Activation maps for contrasts (A) PreP–PreN, (B) PostP–PreP, (C) PostN–PreN, (D) PostP–PostN, (E) Sound type × Scan interaction (PostP–PreP)–(PostN–PreN).
PreP versus PreN
No areas were found to be more active for PreP, whereas small clusters in the left and right posterior cingulate gyrus were found to be more active for PreN (Figure 4A).
PostP versus PreP
In this critical contrast, the only area activated more for the PostP condition was in the posterior left STG/STS. A number of areas showed higher activation for the PreP condition, including the bilateral posterior and anterior cingulate gyrus and the basal ganglia. The right superior and middle frontal gyri (SFG and MFG), precentral gyrus, and supramarginal gyrus (SMG), as well as a cluster on the left planum temporale, were also more active for the PreP condition (Figure 4B).
PostN versus PreN
No areas were more active for the PostN condition compared to the PreN condition. A number of areas were more active for the PreN condition, and these overlapped to a large degree with those activated for the PreP condition in the previous contrast. These included the bilateral anterior and posterior cingulate gyrus, intraparietal sulcus (IPS), basal ganglia, and MFG (right > left); the right precentral gyrus and fusiform gyrus; and the left STG/planum temporale (Figure 4C).
PostP versus PostN
The left inferior frontal gyrus (IFG), IPS, STG/STS, and precentral gyrus, as well as the bilateral MFG and precuneus, and the right SFG, were activated more in the PostP than in the PostN condition. No areas were activated more for the PostN condition (Figure 4D).
(PostP − PreP) versus (PostN − PreN)
Some of the difference in activation between Pre and Post scans could be due simply to differences in task difficulty as a result of practice and training. Some of this training effect can be removed by comparing Post–Pre activation in phonetic and nonphonetic conditions (i.e., the interaction between training and sound type). In this contrast, positively activated areas included the bilateral MFG and IFG (right > left) and the posterior left STG and STS (Figure 4E). By comparison with the other contrasts, it is apparent that the positive values in the left STG/STS are due to a greater increase of activity with training for the P than for the N sounds, whereas positive values in other areas are due to a larger decrease in activity with training for the N sounds than for the P sounds. The right anterior STG showed negative values in the interaction contrast, due mainly to an increase in activation from the PreN to PostN condition.
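The sign logic of the interaction contrast can be made concrete with a short sketch. The function name `interaction` and all activation values below are hypothetical (arbitrary units) and are not taken from the study; the sketch only shows how the same positive interaction value can arise from different underlying patterns.

```python
def interaction(pre_p, post_p, pre_n, post_n):
    """(PostP - PreP) - (PostN - PreN).

    Equivalent to weighting PostP, PreP, PostN, PreN by +1, -1, -1, +1.
    """
    return (post_p - pre_p) - (post_n - pre_n)

# Hypothetical activation levels in arbitrary units, not values from the study.

# Pattern 1: activity increases with training for P, no change for N.
rise_for_p = interaction(pre_p=1.0, post_p=1.5, pre_n=1.0, post_n=1.0)

# Pattern 2: activity decreases for both, but decreases more for N.
larger_drop_for_n = interaction(pre_p=1.0, post_p=0.8, pre_n=1.0, post_n=0.3)

# Pattern 3: no change for P, activity increases with training for N.
rise_for_n = interaction(pre_p=1.0, post_p=1.0, pre_n=1.0, post_n=1.5)

print(rise_for_p, larger_drop_for_n, rise_for_n)
```

Patterns 1 and 2 both give a positive interaction value, and pattern 3 gives a negative one. This is why the interaction map has to be read alongside the simple Post versus Pre contrasts to determine which pattern underlies a given cluster.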
Correlation with Behavioral Data
Behavioral performance on both identification and discrimination tasks varied across participants. Subjects also varied in their ability to hear the sine-wave sounds as speech, presumably leading to variation in the degree of categorical perception of the sounds. As noted in the Methods section, the CPI measures the change in degree of categorical identification, estimated by the change in slope of the logistic regression curve, from Pre to Post scans. High values indicate a change from continuous to categorical identification, whereas low values indicate little change. The mean CPI for the P sounds (CPIP) was 1.24 (SD = 2.63), whereas the mean CPI for the N sounds (CPIN) was −0.08 (SD = 0.94). CPIP and CPIN for individual subjects are shown in Figure 5. The individual variation in behavioral performance provided an opportunity to examine whether the degree of activation in the different brain regions was correlated with the degree of change in categorical identification exhibited behaviorally by the participants, as measured by the CPI.
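A minimal sketch of the index follows, under the assumption that the CPI is the simple difference between the Post and Pre logistic slopes. The exact formula is given in the Methods section, which is not reproduced here, so the difference form, the function name `cpi`, the subject labels, and the slope values are all illustrative.

```python
def cpi(beta_pre, beta_post):
    """Change in identification slope from Pre to Post (assumed: a difference)."""
    return beta_post - beta_pre

# Hypothetical (Pre, Post) slopes for three subjects, not data from the study.
slopes = {
    "s01": (0.4, 2.9),   # large shift toward categorical identification
    "s02": (0.5, 0.6),   # little change
    "s03": (0.7, 0.5),   # slightly shallower after training
}

cpi_by_subject = {name: cpi(pre, post) for name, (pre, post) in slopes.items()}

# Sort subjects by CPI, lowest first, as one would to rank-order them.
ranked = sorted(cpi_by_subject, key=cpi_by_subject.get)
print(ranked)
```

Under this assumed definition, a value near zero indicates little change in how categorical the identification function is, and a negative value indicates a shallower function after training.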
Figure 5 Categorical perception index (CPI) for phonetic and nonphonetic sounds for individual subjects, sorted according to the phonetic CPI. CPI measures the change in categorical perception from Pre to Post scans.
We correlated the CPIP for each participant with the level of activation in the PostP–PreP contrast in an ROI that included the bilateral lateral temporal lobes and the SMG. Two clusters in the left STG/STS and SMG were found to be correlated with the CPIP (Figure 6A). A scatterplot of the individual training-induced activation in the PostP–PreP contrast at the maximally correlated voxel in the posterior STS (Talairach coordinates −52, −34, 7) plotted against individual CPIP values is shown in Figure 7A. Because Spearman’s rank correlation was used, subjects’ rank is plotted on the x-axis.
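For readers unfamiliar with rank correlation, the sketch below computes Spearman's coefficient as the Pearson correlation of rank-transformed values, with tied values sharing their average rank. The function names and the example values are hypothetical. The sketch illustrates the statistic at a single voxel and is not the voxelwise analysis pipeline used in the study.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            out[order[t]] = avg
        i = j + 1
    return out

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Hypothetical per-subject values, not data from the study.
cpi_values = [-0.5, 0.1, 0.8, 2.0, 6.5]
activation = [-0.2, 0.0, 0.1, 0.15, 0.9]   # contrast value at one voxel

rho = spearman(cpi_values, activation)
print(rho)
```

Because only the ordering of subjects enters the calculation, a few subjects with very large CPI values cannot dominate the result, which is consistent with plotting subject rank rather than raw CPI on the x-axis.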
Figure 6 (A) Areas in the lateral temporal lobes in the PostP–PreP contrast that are correlated with the CPI for P sounds (CPIP). (B) Areas in the lateral temporal lobes in the PostN–PreN contrast that are correlated with the CPI for N sounds (CPI ...
Figure 7 Scatterplot of peak activation and CPI. (A) Activation at −52, −34, 7 in the PostP–PreP contrast and CPI for P sounds. (B) Activation at −56, −50, 18 in the PostN–PreN contrast and the CPI for N sounds.
We similarly correlated the CPIN with the level of activation in the PostN–PreN contrast in the same ROI. Although no additional information was provided about the N sounds prior to the Post scan, some variation in CPIN was observed. The mean and variance in CPIN were not as large as those of CPIP, as expected (across all subjects, β did not change significantly between PreN and PostN, as mentioned in the behavioral results). We were interested, however, in testing whether changes in categorical perception of nonspeech sounds might be correlated with the level of activation in temporal regions and whether the regions emerging in this analysis would overlap with those found to be sensitive to speech categorization. A cluster in the left SMG, extending into the posterior STS, was found to be correlated with the CPIN. Smaller clusters in the left anterior MTG and right SMG were also correlated with CPIN (Figure 6B). A scatterplot of the individual level of activation at the maximally correlated voxel in the left SMG (Talairach coordinates −56, −50, 18) plotted against the individual CPIN is shown in Figure 7B.
We then examined whether the area in the left posterior STG/STS that was correlated with CPIP in the PostP–PreP contrast was also correlated with CPIN in the PostN–PreN contrast. To this end, a spherical ROI with radius of 10 mm centered at the peak of the cluster in STS (Talairach coordinates −52, −34, 7) was created. Activation in the PostN–PreN contrast was correlated with CPIN (voxelwise p < .03, corrected p < .05) in a small cluster within this ROI (peak Talairach coordinates −51, −39, 10; cluster volume 94 μl).
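The construction of a spherical ROI can be sketched as follows. The 10-mm radius and the center coordinates come from the text, while the function name `spherical_roi`, the 1-mm isotropic grid, and the alignment of that grid to the peak voxel are assumptions made only for illustration.

```python
def spherical_roi(center, radius_mm=10.0, voxel_mm=1.0):
    """List voxel-center coordinates lying within radius_mm of center.

    Assumes an isotropic grid with spacing voxel_mm, aligned so that
    one voxel center coincides with the sphere center.
    """
    cx, cy, cz = center
    steps = int(radius_mm // voxel_mm)
    r2 = radius_mm * radius_mm
    roi = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            for k in range(-steps, steps + 1):
                dx, dy, dz = i * voxel_mm, j * voxel_mm, k * voxel_mm
                if dx * dx + dy * dy + dz * dz <= r2:
                    roi.append((cx + dx, cy + dy, cz + dz))
    return roi

# Sphere centered at the STS peak reported in the text (Talairach coordinates).
roi = spherical_roi((-52, -34, 7), radius_mm=10.0, voxel_mm=1.0)

# The CPIN-correlated peak reported in the text lies inside this sphere.
print((-51, -39, 10) in roi)
```

On such a grid, the peak at (−51, −39, 10) lies about 5.9 mm from the sphere center, well inside the 10-mm radius.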
Finally, we examined whether the correlations between CPI and the Pre to Post change in level of activation in the left posterior temporal region could be explained by the small general improvement in discrimination ability from Pre to Post scans rather than by changes in categorization ability. Overall change in discrimination (combining WC and AC trials) was calculated for each subject for both P and N sounds and correlated with the activation in the spherical ROI defined above, for PostP–PreP and PostN–PreN contrasts, respectively. No correlation between level of activation and overall change in discrimination was found in either analysis.