3.1. Behavioral Results
To ensure that behavioral responses to different trial types did not contribute to the decoding of words and semantic categories, we first analyzed the accuracy and response times of button presses (to large objects) for all subjects. Accuracy of behavioral responses ranged from 71.6 to 95.5% with a mean of 90.3±1.4% across subjects. Mean response times ranged from 760 to 1152ms with a cross-subject mean of 943±27ms. Mean accuracies for living and non-living object categories across subjects were 90.4±1.6% and 90.2±1.6%, respectively. Mean response times for living and non-living object categories were 947±30ms and 962±25ms. Accuracies and response times were not significantly different between living and non-living object trials for any of the subjects (Wilcoxon signed-rank, p>0.05). It is therefore unlikely that differential behavioral responses influenced subsequent decoding analyses. Accuracies were not significantly different between the SV and SA tasks (Wilcoxon, p>0.05), although mean response times were shorter for the visual task (SV: 864ms, SA: 1023ms, Wilcoxon, p<0.00001). As expected, response times were shorter for repeated versus novel words (repeated: 868ms, novel: 1023ms, Wilcoxon, p<0.001). Mean accuracies and response times were not significantly different between individual repeated words for any subject (ANOVA, p>0.05).
3.2. SVMs allow for decoding of semantic category
We first attempted to train an SVM to decode living versus non-living objects. The SVM was trained separately on EEG features, MEG features, and both combined. illustrates the decoding accuracies after averaging 5 trials (chance accuracy = 50%). When utilizing EEG features alone, data from 7 of the 9 subjects in the SV task and 6 of 9 in the SA task showed statistically significant decoding accuracy (permutation test, p<0.05). When utilizing MEG features alone, data from 8 of the 9 subjects in SV and 7 of 9 in SA showed significant decoding accuracy (permutation test, p<0.05). Statistically significant decoding accuracy was obtained in data from all subjects when utilizing combined EEG and MEG features in both the SV and SA tasks. When utilizing combined EEG and MEG features, accuracies ranged from 63–86% (mean±s.e. = 76±2%) for the SV task and 62–91% (mean±s.e. = 75±3%) for the SA task. Training on both MEG and EEG features increased accuracies by an average of 12% for the SV task and 10% for the SA task over using EEG features alone, and by 8% (SV) and 4% (SA) over MEG features alone (Wilcoxon signed-rank, p<0.05). Accuracies for the SV and SA tasks were not statistically different for any set of features when discriminating between living and non-living objects (Wilcoxon, p>0.05). These results suggest that high-dimensional machine-learning algorithms, such as SVMs, are able to robustly extract semantic category information from multichannel electro/magneto-physiological recordings.
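The combined-sensor classification described above can be sketched as follows. This is a minimal illustration using synthetic stand-in data: the trial counts, feature dimensions, and effect sizes are placeholders rather than the study's actual extracted features, and scikit-learn's `SVC` stands in for the SVM implementation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins: 100 averaged trials with 60 "EEG" and 150 "MEG"
# spatiotemporal features per trial (illustrative sizes only).
n_trials = 100
y = rng.integers(0, 2, n_trials)  # 0 = living, 1 = non-living
eeg = rng.normal(size=(n_trials, 60)) + 0.5 * y[:, None]
meg = rng.normal(size=(n_trials, 150)) + 0.5 * y[:, None]

# Combining sensor types amounts to concatenating feature vectors per trial.
combined = np.hstack([eeg, meg])

acc = {}
for name, X in [("EEG", eeg), ("MEG", meg), ("EEG+MEG", combined)]:
    acc[name] = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
```

Cross-validated accuracy is computed separately for each feature set, so the three numbers are directly comparable, mirroring the per-feature-set comparison in the text.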
Decoding accuracy when distinguishing between living and non-living objects or individual words
To explore the effect of the number of trials averaged on decoding accuracy, we also performed a leave-n-out cross-validation on all sets of features with all subjects ( inset panels). Not surprisingly, increasing the number of trials averaged resulted in increased decode performance in all cases. However, averaging more than approximately 7 trials resulted in only marginal additional increases in performance.
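The effect of trial averaging on decoding accuracy can be illustrated with the simplified sketch below (synthetic data; averaging is applied before cross-validation here, a simplification of the leave-n-out procedure used in the study).

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def average_trials(X, y, n_avg, rng):
    """Average disjoint groups of n_avg same-class trials to raise SNR."""
    Xa, ya = [], []
    for c in np.unique(y):
        Xc = X[y == c]          # fancy indexing copies, so shuffling is safe
        rng.shuffle(Xc)
        for i in range(0, len(Xc) - n_avg + 1, n_avg):
            Xa.append(Xc[i:i + n_avg].mean(axis=0))
            ya.append(c)
    return np.array(Xa), np.array(ya)

# Weak single-trial signal buried in noise (synthetic stand-in data).
n_trials, n_features = 400, 50
y = rng.integers(0, 2, n_trials)
X = rng.normal(size=(n_trials, n_features)) + 0.15 * y[:, None]

acc = {}
for n_avg in (1, 5, 10):
    Xa, ya = average_trials(X, y, n_avg, rng)
    acc[n_avg] = cross_val_score(SVC(), Xa, ya, cv=5).mean()
```

Averaging trades the number of available examples for per-example SNR, which is also why accuracy gains flatten (and can reverse) as the averaging factor grows.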
3.3. SVMs accurately decode individual word representations
We subsequently examined SVM decoding of individual word representations utilizing multiclass SVMs. Because the requirement for a motor action (a button press when the presented object was larger than one foot) could result in decoding of that volitional response, rather than word processing per se, when examining differences between all 10 words, we trained and tested classifiers on either the 5 repeated non-target (small object) or the 5 repeated target (large object) words (chance accuracy = 20%). The ability of the classifier to predict the observed word was statistically significant for all subjects after averaging 5 trials in at least one set of features (permutation test, p<0.05) (). Accuracies varied from 32–79% (mean±s.e. = 60±5%) using combined EEG/MEG features for the SV task. For the auditory task, accuracies varied from 66–97% (mean±s.e. = 83±4%). Training the SVM classifier on both EEG and MEG features increased average decode performance by 18% for the SV task and 29% for the SA task over using EEG features alone, and by 2% (SV) and 7% (SA) over MEG features alone (Wilcoxon, p<0.05). The decode accuracies of the SV and SA tasks when utilizing solely EEG features were not significantly different (Wilcoxon, p>0.05). However, utilizing MEG alone or both feature types resulted in significantly better performance on the SA data than utilizing the corresponding feature sets on the SV data (Wilcoxon, p<0.01).
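A minimal sketch of the 5-way word classification follows. The data are synthetic: each "word" is assigned a random spatiotemporal template plus trial noise, and the class count matches the 5-word setup (chance = 20%); scikit-learn's `SVC` handles the multiclass problem internally via one-vs-one voting.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# 5 word classes with synthetic spatiotemporal "signatures"
# (dimensions and noise level are illustrative only).
n_words, n_per_word, n_features = 5, 40, 80
y = np.repeat(np.arange(n_words), n_per_word)
templates = rng.normal(size=(n_words, n_features))
X = templates[y] + 2.0 * rng.normal(size=(len(y), n_features))

# SVC trains the multiclass decoder directly; chance accuracy is 1/5.
acc = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
```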
The SA task contained twice as many trials as the SV task (780 for SA versus 390 for SV) which may have resulted in the difference in decoding accuracy between the two presentation modalities. By utilizing only the first 390 trials of the SA task, accuracy of the multiclass decoder after averaging 5 trials (mean±s.e. = 61±4%) was not significantly different from SV performance (mean±s.e. = 60±5%) (Wilcoxon, p>0.05).
Again, increasing the number of trials averaged substantially increased decode performance ( inset panels). In the case of individual word decoding for the SV task, there is a slight decrease in accuracy when the number of trials averaged is increased from 6 to 8. This is likely because increasing the number of trials averaged causes a corresponding decrease in the number of trials available for training the SVM, leading to a less robust classifier. This effect is especially pronounced in the multiclass SV case because of the relatively smaller number of total trials per condition compared to the SA task. These data also illustrate that combining EEG and MEG features improves accuracy over either feature set alone. Taken together, these results demonstrate a surprisingly robust ability to decode individual words from spatiotemporal features computed from multichannel electrophysiology.
3.4. Linear probabilistic decoders are unable to handle high-dimensional data
While a decoding analysis is a powerful method for exploring electro/magneto-physiological data, not all classification algorithms are suited for such an analysis. To demonstrate the advantages of machine-learning techniques that are robust to high-dimensional data, we compared the decoding accuracy obtained with SVMs (sections 3.2 and 3.3) to that of a popular probabilistic classifier. Because traditional Fisher linear discriminant analysis and Bayesian decoders are unable to handle cases in which the number of features is close to, or exceeds, the number of trials, we utilized a naïve Bayes classifier. Naïve Bayes classifiers assume independence of features and can therefore be trained on this particular set of MEG/EEG features.
When classifying living/non-living category using MEG and EEG features, a naïve Bayes classifier resulted in average accuracies of 54±4% and 51±3% for SV and SA respectively (chance=50%). This was significantly lower than the SVM classification of the same data (76% for SV, 75% for SA, Wilcoxon signed-rank, p<0.005), and in fact not statistically different from chance. Similarly, when classifying individual words using MEG and EEG features, a naïve Bayes classifier yielded accuracies of 41±4% and 46±3% for SV and SA data respectively (chance=20%). This, again, was significantly lower than the classification using an SVM (60% for SV, 83% for SA, p<0.005). These results suggest that a decoding analysis of MEG/EEG data requires techniques that are robust to high-dimensional data. In this case, SVMs, when compared to a naïve Bayes classifier, are better able to handle such data and can provide insight into the spatiotemporal representations of semantic knowledge.
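One failure mode of the independence assumption can be shown with a toy example (this is a constructed illustration, not the study's data): when the class difference lies entirely in the *correlation* between two features, each feature's marginal distribution is identical across classes, so a feature-wise naïve Bayes model has nothing to work with, while a kernel SVM can still separate the classes.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

rng = np.random.default_rng(3)

# Class 0: x2 tracks +x1; class 1: x2 tracks -x1. Marginals of x1 and
# x2 are the same for both classes; only their correlation differs.
n = 500
y = rng.integers(0, 2, n)
x1 = rng.normal(size=n)
x2 = np.where(y == 0, x1, -x1) + 0.3 * rng.normal(size=n)
padding = rng.normal(size=(n, 8))   # uninformative extra channels
X = np.hstack([x1[:, None], x2[:, None], padding])

# GaussianNB models each feature independently, so it cannot exploit
# the correlation structure; an RBF-kernel SVM can.
nb_acc = cross_val_score(GaussianNB(), X, y, cv=5).mean()
svm_acc = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
```

The construction is deliberately adversarial to naïve Bayes; the broader point from the text is simply that correlated, high-dimensional MEG/EEG features favor methods that do not assume feature independence.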
3.5. SVM weights show bilateral distributed cortical areas contribute to classification
Examining the SVM weights allows us to determine the features that were most important in the generation of the final SVM classifier (). In the linear case, the weight of each feature dictates the importance of that feature in the final classification. Because the weights of a nonlinear classifier cannot be easily visualized, we utilized linear SVMs when examining classifier weights. The performance of nonlinear SVMs was greater than that of linear SVMs by 3.3% on average (Wilcoxon signed-rank, p<0.05); however, decoding accuracy remained high in the linear case. In all cases where the nonlinear SVM yielded statistically significant decoding accuracy, the linear SVM also yielded statistically significant accuracy. Thus, examining the linear SVM weights allows determination of the important spatiotemporal features in the classification.
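The weight-inspection idea can be sketched as follows with synthetic data: trials are flattened (sensor × time) feature vectors, only a small spatiotemporal patch is informative, and the linear SVM's `coef_` attribute is reshaped back to sensor × time to locate it. Sizes and the informative patch are arbitrary placeholders.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)

# Synthetic (sensor x time) features, flattened to one vector per trial;
# only a 2-sensor x 2-timepoint patch carries class information.
n_sensors, n_times, n_trials = 20, 15, 200
y = rng.integers(0, 2, n_trials)
X = rng.normal(size=(n_trials, n_sensors * n_times))
patch = np.array([s * n_times + t for s in (3, 4) for t in (7, 8)])
X[:, patch] += 0.8 * y[:, None]

clf = SVC(kernel="linear").fit(X, y)
# For a linear SVM, coef_ holds one weight per feature; reshaping back
# to (sensor, time) shows which sensors/latencies drive classification.
w = np.abs(clf.coef_.reshape(n_sensors, n_times))
patch_weight = w.ravel()[patch].mean()
rest_weight = np.delete(w.ravel(), patch).mean()
```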
Classifier weights show important times and locations for decoding
Averaged weights across subjects for the visual () and auditory () tasks show a broadly distributed pattern of information-specific activity. Large weights are seen at all sampled time points and across both hemispheres. In particular, bilateral anterior temporal and inferior frontal weights increase in response to non-living relative to living objects from 400–600ms. A concurrent increase of SVM weights in response to living over non-living objects is present at left inferior temporal-occipital sensors from 400–700ms. Interestingly, a temporal-occipital increase in weights to non-living objects is also seen at an earlier latency of 200ms. While left inferior temporal-occipital activation to animals has been previously observed, this earlier activation to non-living objects has not been reported.
When decoding individual word representations, the multiclass SVM generates one set of weights for each class. For visualization purposes, the variance of the SVM weights across words was computed and displayed for each time-sensor point (). Features with higher variances differ more across classes, generally making them more important in the final classification. These data also show a fairly distributed set of time-sensor points contributing to the decoding. The SV data showed an inferior occipital increase in weight variance from 300–400ms, and inferior temporal increases from 400–500ms (). The SA task showed increased weight variance in bilateral anterior temporal areas from 250–450ms, with increases in posterior sensors at 300 and 500ms ().
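The across-class weight-variance computation can be sketched as below (synthetic 5-class data; only the first 10 features carry word-specific signal). A one-vs-rest linear SVM yields a `coef_` matrix of shape (classes, features), and the per-feature variance across its rows flags features that differentiate the word classes.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(5)

# 5 synthetic word classes; only the first 10 of 120 features carry
# word-specific signal (illustrative sizes).
n_words, n_per_word, n_features = 5, 60, 120
y = np.repeat(np.arange(n_words), n_per_word)
templates = np.zeros((n_words, n_features))
templates[:, :10] = rng.normal(size=(n_words, 10))
X = templates[y] + rng.normal(size=(len(y), n_features))

# LinearSVC fits one weight vector per class (one-vs-rest), so coef_
# has shape (n_words, n_features).
clf = LinearSVC(C=0.1, max_iter=10000).fit(X, y)
w_var = clf.coef_.var(axis=0)      # variance of weights across classes
informative = w_var[:10].mean()
uninformative = w_var[10:].mean()
```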
3.6. Systematic errors in individual word decoding reveal semantic structure
Confusion matrices were constructed to analyze the errors generated when discriminating between all 10 repeated words (). The actual stimulus words are shown along the vertical axis, and the words predicted by the classifier along the horizontal axis. The colors along any given row (actual word) indicate the proportion of trials of that word classified as each of the possible choices (predicted words), i.e. the confusion rate. For example, if the classifier correctly classified the word “feather” in all cases, the first element in the row corresponding to “feather” would be 1 (“feather” was always classified as “feather”) and all other elements in that row would be 0 (“feather” was never classified as any other word). The diagonal elements of the matrix therefore display correctly classified trials.
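A row-normalized confusion matrix of this kind can be built as follows. The labels other than “feather” are placeholders, not the study's actual stimuli, and the hand-written predictions exist only to make the normalization concrete.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical predictions for 3 words ("feather" is from the text;
# "hammer" and "spoon" are placeholder labels).
words = ["feather", "hammer", "spoon"]
actual = ["feather"] * 4 + ["hammer"] * 4 + ["spoon"] * 4
predicted = (["feather"] * 4
             + ["hammer", "hammer", "hammer", "spoon"]
             + ["spoon", "spoon", "hammer", "spoon"])

counts = confusion_matrix(actual, predicted, labels=words)
# Row-normalize so entry (i, j) is the fraction of word-i trials
# classified as word j; the diagonal holds correct classifications.
rates = counts / counts.sum(axis=1, keepdims=True)
```

Each row sums to 1 after normalization, so the rows can be read directly as the per-word confusion rates described above.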
Individual word decoding confusion matrices
Visual examination of the confusion matrices confirms that decoding of the MEG auditory data yields the highest accuracy, followed by the EEG auditory data, followed by data from the visual task. The confusion matrices of combined EEG and MEG data were virtually identical to those generated from MEG data alone (data not shown). A larger confusion rate is visually apparent within the target (large object) and non-target (small object) classes (upper left and lower right corners) than between the two classes (lower left and upper right). The required motor response associated with the target trials may provide additional non-language information, allowing for a decreased error rate when decoding between all 10 repeated words (as discussed in section 3.3). Despite this, the ability to decode individual words is seen within the large and small object groups; this provides additional evidence that word-specific information is present in the neural signals being classified.
To quantify the effects of semantic category and large versus small objects on confusion rates, we performed a 3-way ANOVA on these data (). This was done to determine whether two words within the same class (e.g. both living objects, both small objects, etc.) had a higher confusion rate than two words in different classes. In other words, the ANOVA compares “within-class” confusion rates to “between-class” confusion rates. The analysis involved three factors (living/non-living, large/small, and subject), with two levels in each categorical factor (within-class or between-class) and 9 levels in the subject factor (one per subject).
For the SV task, the average large/small between-class confusion rate (mean±s.e. = 0.0472±0.027) was significantly smaller than large/small within-class confusion (0.125±0.045; F=45.72, p<0.00001). Average living/non-living object between-class confusion (0.074±0.037) was significantly smaller than living/non-living object within-class confusion (0.092±0.043; F=8.59, p<0.005). For the SA task, the average large/small between-class confusion (0.038±0.028) was significantly smaller than large/small within-class confusion (0.067±0.036; F=20.28, p<0.00001). Average living/non-living object between-class confusion (0.045±0.031) was also significantly smaller than living/non-living object within-class confusion (0.058±0.034; F=7.99, p<0.05). This shows that it is more difficult for the classifier to discriminate words within the same semantic category than words of different categories. This suggests semantically related words have similar neural representations, and provides further evidence of the natural distinction between living and non-living objects.
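Extracting the within- versus between-class confusion rates that feed such an analysis can be sketched as below. The confusion matrix here is a toy, hand-built example (not the study's data), and only the rate extraction is shown; the ANOVA itself would be run on these rates across subjects.

```python
import numpy as np

# Toy row-normalized 10x10 confusion matrix: words 0-4 are "large",
# words 5-9 "small", with inflated within-group confusions
# (synthetic numbers for illustration).
cm = np.full((10, 10), 0.02)
cm[:5, :5] += 0.06
cm[5:, 5:] += 0.06
np.fill_diagonal(cm, 0.5)
cm /= cm.sum(axis=1, keepdims=True)

is_large = np.arange(10) < 5
same_class = is_large[:, None] == is_large[None, :]
off_diag = ~np.eye(10, dtype=bool)

# Diagonal entries are correct classifications, so they are excluded
# from both confusion-rate averages.
within = cm[same_class & off_diag].mean()
between = cm[~same_class].mean()
```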
3.7. Decoding is not based on low-level stimulus properties
It is possible that the generated classifiers utilized neural activity related to low-level visual or auditory stimulus properties when decoding individual words. For example, the classifier may have decoded brain activity specific to the number of letters in the visual word or the number of syllables in the acoustic word, rather than the semantic information associated with the word. To test this, we performed a shuffling of trials based on stimulus properties. Within either the 5 target or 5 non-target words, we randomly swapped half of the trials between two words with equal numbers of letters or syllables, thus creating two categories with consistent sensory characteristics but scrambled lexical referents, while leaving the remaining three words unchanged. If the decoding ability were solely based on these visual or phonetic properties of the stimulus, we would see no change in accuracy. In fact, the decoding accuracy of these sensory-based categories dropped by 24% (letters) and 30% (syllables) (Wilcoxon signed-rank, p<0.01). Accuracies remained statistically above chance because the trials associated with 3 of the 5 words were left unchanged.
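The label-swapping step can be sketched as a small helper (a minimal interpretation of the procedure described above; label values and trial counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)

def swap_half(labels, word_a, word_b, rng):
    """Relabel half of word_a's trials as word_b and vice versa,
    scrambling the lexical referent while each label keeps the same
    mix of sensory characteristics (e.g. letter or syllable count)."""
    labels = labels.copy()
    idx_a = np.flatnonzero(labels == word_a)
    idx_b = np.flatnonzero(labels == word_b)
    labels[rng.choice(idx_a, len(idx_a) // 2, replace=False)] = word_b
    labels[rng.choice(idx_b, len(idx_b) // 2, replace=False)] = word_a
    return labels

# 3 words x 10 trials; swap half the trials of words 0 and 1,
# leaving word 2 untouched.
y = np.repeat([0, 1, 2], 10)
y_shuffled = swap_half(y, 0, 1, rng)
```

The swap preserves each label's trial count, so classifier retraining on the shuffled labels isolates the contribution of the scrambled word identity.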
Although these low-level properties were not solely responsible for the decode ability, if these stimulus characteristics contributed information to the decoding, shuffling trials between two words with different sensory characteristics would result in a larger drop in accuracy compared to shuffling between words with consistent sensory characteristics. The drop in performance when swapping trials between words with similar sensory characteristics was not significantly different from the performance when swapping trials between words with different sensory characteristics (25% for letters and 28% for syllables, Wilcoxon, p>0.05). This suggests that these sensory characteristics did not contribute significantly to the decoding of individual words in the visual version of the task.
We performed the same shuffling analysis for the SA task as well. The drop in performance was 23% when shuffling between words with the same number of syllables (Wilcoxon, p<0.01). This decrease in accuracy was not statistically different from the decrease in accuracy when shuffling between words with different numbers of syllables (20%, Wilcoxon, p>0.05).
To control for the possibility that frequency-related acoustic properties of the words affected the decoding analysis (in the SA task), we attempted to predict stimulus properties using the same set of neural features used in the individual word decoding. In this case, the SVM algorithm performed regression instead of classification to predict the power of the acoustic stimuli within five frequency bands (250–500Hz, 500Hz–1kHz, 1–2kHz, 2–4kHz, and 4–8kHz). If any of these acoustic properties contributed to the decoding of individual words, we would expect that an SVM trained on the previously used features would also be able to predict the power in these auditory frequency bands. To statistically test these results, a permutation distribution was computed by shuffling trials so that each trial was associated with a random set of stimulus band-power values for 2000 trainings of the SVM regression. The root-mean-squared error was computed for each of these repetitions, resulting in a distribution of errors under the null hypothesis that no information about stimulus band-power was present in the computed features. The root-mean-squared error of the regression was not statistically significant relative to this permutation distribution (p>0.05, Supplementary Figure S1). This result suggests that the decoding of individual words was not solely a result of differential representation of low-level properties of the auditory stimulus, such as acoustic power.
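The permutation test for the regression can be sketched as follows. The data are synthetic and, matching the null hypothesis, unrelated to the target; only 50 permutations are run here for speed (the study used 2000), and scikit-learn's `SVR` stands in for the SVM regression.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVR

rng = np.random.default_rng(8)

# Synthetic neural features with no relation to the stimulus
# band-power values (mirroring the null hypothesis).
n_trials, n_features = 120, 40
X = rng.normal(size=(n_trials, n_features))
band_power = rng.normal(size=n_trials)

pred = cross_val_predict(SVR(), X, band_power, cv=5)
rmse = np.sqrt(np.mean((pred - band_power) ** 2))

# Null distribution: RMSE after shuffling the trial <-> band-power
# pairing and re-running the cross-validated regression.
null_rmse = []
for _ in range(50):
    y_perm = rng.permutation(band_power)
    p = cross_val_predict(SVR(), X, y_perm, cv=5)
    null_rmse.append(np.sqrt(np.mean((p - y_perm) ** 2)))
null_rmse = np.array(null_rmse)

# One-sided p-value: how often a shuffled pairing yields an RMSE at
# least as low as the observed one.
p_value = (np.sum(null_rmse <= rmse) + 1) / (len(null_rmse) + 1)
```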
3.8. Inter-subject and inter-modality decoding show shared neural representations of semantic information
To investigate supramodal contributions to the generated classifiers, SVMs were trained on one stimulus modality and tested on the other. When training on visual data and testing on auditory data, statistically significant decoding accuracy was obtained in 3 of 9 subjects (), with a mean accuracy across all subjects of 57.5±3.0%. When training on the auditory modality and testing on the visual modality, data from 5 of 9 subjects showed significant decoding accuracy, with a mean accuracy across all subjects of 67.7±4.1%. This suggests that the models generated with features from either version of the task contain supramodal semantic information. This is more apparent in the case where the training set was larger and better able to produce a robust classifier (training on SA, testing on SV). As seen previously, increasing the number of trials averaged improves performance (Supplementary Figure S2).
Intermodality and intersubject classification shows word and category representation consistencies
We also investigated the ability to train a generalized, subject-nonspecific decoder by training an SVM on data from all but one subject, and testing on the final subject’s data. The accuracy obtained from such a cross-validation is an indication of the consistency of language-related representations between individuals. In the first case, an SVM was trained to discriminate between living and non-living object categories. Data from 5 of 9 subjects for SV and all subjects for SA showed statistically significant decoding performance (, p<0.05). Mean accuracies were 56.8±2.4% and 72.9±2.8% for SV and SA respectively.
A generalized SVM was also trained to discriminate between 5 large or small repeated words. indicates that in 6 of 9 cases for SV and all cases for SA, the decoding accuracy was significantly above chance levels. Mean accuracies were 30.2±3.7% for SV and 41.3±2.7% for SA (chance = 20%). Despite the fact that MEG sensor positions are variable between subjects, above-chance accuracies were obtained, suggesting that some word-specific information is consistent between individuals. Not surprisingly, however, subject-specific classifiers still yield significantly higher decode accuracies.
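The leave-one-subject-out scheme used for the generalized decoder can be sketched with `LeaveOneGroupOut`, treating each subject as one group. The data below are synthetic: the class signal is split into a component shared across "subjects" and a subject-specific component, with all sizes illustrative.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(9)

# 9 synthetic "subjects" whose class signal is partly shared across
# subjects and partly idiosyncratic (all numbers illustrative).
n_subjects, n_per_subject, n_features = 9, 40, 60
groups = np.repeat(np.arange(n_subjects), n_per_subject)
y = rng.integers(0, 2, n_subjects * n_per_subject)
shared = rng.normal(size=n_features)
idiosyncratic = rng.normal(size=(n_subjects, n_features))
signal = 0.4 * shared + 0.3 * idiosyncratic[groups]
X = signal * y[:, None] + rng.normal(size=(len(y), n_features))

# Each fold trains on 8 subjects and tests on the held-out subject.
scores = cross_val_score(SVC(kernel="linear"), X, y,
                         cv=LeaveOneGroupOut(), groups=groups)
mean_acc = scores.mean()
```

Above-chance held-out-subject accuracy in this construction comes entirely from the shared signal component, which is the consistency that the cross-subject analysis probes.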
3.9. Hierarchical tree decoding improves decoding performance
To explore the potential practical use of machine-learning algorithms to decode larger libraries of words, we used SVM classifiers within the larger construct of a hierarchical tree decoder (). Such a paradigm is easily scalable and may allow for the eventual decoding of a large number of individual words or concepts. Utilizing a hierarchical tree decoding construct allows for the incorporation of a priori knowledge about semantic classes and the features which best discriminate these categories.
Hierarchical tree decoding improves classification performance
The average accuracy of all branches of the tree for the SA task was over 80%, and accuracies at each level of the decoder were above 80% for all but 2 subjects (). By examining cumulative accuracies at each level of the tree, we find that errors propagate from earlier levels, as expected, but accuracy ultimately remains above 60% in all cases (). The mean overall accuracy of the tree decoder was 70%, significantly higher than the 67% accuracy of a single multiclass SVM trained on all 10 words (Wilcoxon signed-rank, p<0.05) (). Data from all subjects but subject 7 showed an improvement over the single SVM classifier when using the tree decoder. Thus, the hierarchical tree framework, by incorporating a priori knowledge of semantic properties, allows representations of individual word properties to be decoded more accurately than a single multiclass decoder that treats each word as an independent entity.
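A two-level version of such a tree decoder can be sketched as follows on synthetic data: a top-level SVM first decides large versus small, then a branch-specific SVM picks the word within that group. The word assignments, dimensions, and noise levels are placeholders, and the tree structure here is a minimal instance of the hierarchical construct.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(10)

# Words 0-4 are "large", 5-9 "small"; large words share an extra
# template component (synthetic, illustrative sizes).
n_words, n_per_word, n_features = 10, 30, 80
y = np.repeat(np.arange(n_words), n_per_word)
templates = rng.normal(size=(n_words, n_features))
templates[:5] += 1.0
X = templates[y] + 1.5 * rng.normal(size=(len(y), n_features))

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

is_large_tr = y_tr < 5
top = SVC(kernel="linear").fit(X_tr, is_large_tr)       # level 1
branch = {flag: SVC(kernel="linear").fit(X_tr[is_large_tr == flag],
                                         y_tr[is_large_tr == flag])
          for flag in (True, False)}                    # level 2

def tree_predict(x):
    """Route a trial down the tree: category first, then word."""
    flag = bool(top.predict(x[None])[0])
    return int(branch[flag].predict(x[None])[0])

tree_acc = np.mean([tree_predict(x) == t for x, t in zip(X_te, y_te)])
```

Because each branch classifier only ever sees words from its own category, a priori knowledge of the large/small split is built into the decoder rather than being relearned from scratch, which is the design advantage the text describes.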