Much remains to be learned about the neural architecture underlying word meaning. Fully distributed models of semantic memory predict that the sound of a barking dog will conjointly engage a network of distributed sensorimotor spokes. An alternative framework holds that modality-specific features additionally converge within transmodal hubs. Participants underwent functional MRI while covertly naming familiar objects versus newly learned novel objects from only one of their constituent semantic features (visual form, characteristic sound, or point-light motion representation). Relative to the novel object baseline, familiar concepts elicited greater activation within association regions specific to that presentation modality. Furthermore, visual form elicited activation within high-level auditory association cortex. Conversely, environmental sounds elicited activation in regions proximal to visual association cortex. Both conditions commonly engaged a putative hub region within lateral anterior temporal cortex. These results support hybrid semantic models in which local hubs and distributed spokes are dually engaged in service of semantic memory.
The binding problem reflects a longstanding question within the cognitive neurosciences regarding the manner(s) in which the human brain amalgamates features from disparate sensorimotor modalities into coherent conceptual representations. Object knowledge is comprised of complex conjunctions of features from visual, auditory, olfactory, haptic and motoric modalities, in addition to a range of other affective, episodic and lexical associations. Much of the early stage perceptual processing associated with these modalities occurs within distinct and remote regions of the brain. Accordingly, there exists a rich and long-standing debate regarding the mechanistic neural architecture that binds multimodal feature information into coherent wholes.
Perhaps the most historically dominant neurocognitive hypothesis of conceptualization is that it reflects auto-associated and conjoint activation of information stored in the multiple remote modality-specific regions of the brain (Wernicke, 1874; Hauk, Johnsrude, & Pulvermüller, 2004; Eggert, 1977; Gage & Hickok, 2005; Pulvermüller, 2001). For example, hearing the distinct ‘meow’ of a cat will activate auditory association cortices in addition to a range of associated features in other modalities (e.g., form, color) (See Figure 1, Panel 1). There is considerable evidence for these distributed-only accounts in the form of functional neuroimaging studies that have demonstrated that different types of concept (e.g., action concepts vs. object concepts) activate modality-specific association cortices (motor cortex vs. visual cortex) linked to their most crucial sources of defining information (motor sequences to perform them vs. the way they look) (Hoenig, Sim, Bochev, Herrnberger, & Kiefer, 2008; Kellenbach, Brett, & Patterson, 2001; Simmons, Martin, & Barsalou, 2005; Trumpp, Kliese, Hoenig, Haarmeier, & Kiefer, 2012).
Alternative perspectives argue the necessity for transmodal brain regions; tertiary cortical areas that transcend particular modalities. It has been hypothesized that one or more of these regions fulfill a central organizing principle by which multiple modalities can be drawn together for the process of conceptualization (see Figure 1, Panel 2). In particular, such a region constitutes a convergence zone that is massively reciprocally linked to the distributed network of sensorimotor regions and thus uniquely suited as a nexus point for coordination of polymodal feature processing (Damasio, 1989). It has been further proposed that such ‘hubs’ play a necessary role in the computations required for coherent conceptual representation (Lambon Ralph, Sage, Jones, & Mayberry, 2010; Rogers & McClelland, 2004).
Patterns of behavioral performance among patients with relatively focal versus diffuse neuropathologies present a compelling test of these competing hypotheses. A theoretical model with no center (i.e., distributed only) predicts that only catastrophic, diffuse, bilateral brain damage is sufficient to cause multi-modal semantic impairment. In contrast, the hub perspective holds that profound semantic memory disorders can result from relatively circumscribed damage focused upon regions of transmodal representational cortex. The case for the existence of a transmodal hub for conceptual representation has historically relied heavily on studies of patients with semantic dementia. This clinical population exhibits progressive, yet relatively circumscribed bilateral anterior temporal atrophy coupled with a selective dissolution of conceptual knowledge that transcends task and modality (Bozeat, Lambon Ralph, Patterson, Garrard, & Hodges, 2000; Coccia, Bartolini, Luzzi, Provinciali, & Lambon Ralph, 2004; Goll et al., 2010; Piwnica-Worms, Omar, Hailstone, & Warren, 2010). Decades worth of detailed neuropsychological investigations of semantic dementia provide a compelling foundation of support for the hypothesis that the bilateral anterior temporal lobes play key roles in computing transmodal conceptual representations (Lambon Ralph, 2014; Patterson, Nestor, & Rogers, 2007). These assertions primarily informed by patient-based dissociations have been bolstered by a growing body of convergent evidence from functional imaging and neurostimulation studies (Binney, Embleton, Jefferies, Parker, & Lambon Ralph, 2010; Halgren et al., 2006; Marinkovic et al., 2003; Mion et al., 2010; Pobric, Jefferies, & Lambon Ralph, 2007; Shimotake et al., 2014; Vandenberghe, Price, Wise, Josephs, & Frackowiak, 1996; Visser & Lambon Ralph, 2011).
Evidence for both distributed and local (hub) components of the neural architecture subserving semantic cognition is reconciled within a class of hybrid, pluralistic theories, the most influential, to date, being the Hub and Spoke model proposed by Patterson, Lambon Ralph, Rogers and colleagues (Hoffman, Jones, & Ralph, 2012; Lambon Ralph & Patterson, 2008; Rogers & McClelland, 2004). The architecture and functioning of the Hub and Spoke model are made explicit by the connectionist computational implementation of Rogers et al. (Rogers et al., 2004; see also Patterson et al., 2007), wherein modality-specific spokes are interfaced via a central transmodal hub. According to Lambon Ralph (2014) and colleagues, the spokes are the substrate of invariant representations important for recognition of perceptual objects within their respective modality and the coding of similarities between different perceptual objects. The surface similarities computed by the spokes are necessary but insufficient for the formation of coherent and generalizable concepts, providing only a fragmentary guide to meaning. Herein lies the importance of a hub. The additional representational layer afforded by the ATL hub provides a tertiary level of abstraction allowing for distillation of the highly complex, non-linear transmodal relationships between multi-modal features that comprise concepts (Lambon Ralph et al., 2010; Reilly, Peelle, Garcia, & Crutch, 2016). Importantly, this does not imply that conceptualization can be achieved solely by activation of the representations subserved by the hub. Instead, in the Rogers et al. model, the hub and spokes are bi-directionally connected and complete conceptualization arises from the conjoint action of both the transmodal hub and each of the modality-specific sources of information (Pobric, Jefferies, & Lambon Ralph, 2010).
This approach, therefore, predicts that conceptual processing will be reflected in activation of both transmodal representational cortex (in particular, the ATL) and the distributed network of modality-specific association regions, and that this full network will be activated regardless of the input modality through which the concept is probed. Neuroimaging techniques such as fMRI are useful tools to explore this hypothesis, given they enable visualization of activation across the whole brain when individuals are performing semantic tasks. Several prior fMRI and PET studies have compared activation associated with conceptual processing probed in different modalities (Taylor, Moss, Stamatakis, & Tyler, 2006; Thierry & Price, 2006; Tranel, Grabowski, Lyon, & Damasio, 2005) or probed via verbal versus non-verbal (pictures) representational formats (Jouen et al., 2014; Thierry & Price, 2006; Vandenberghe et al., 1996; Visser & Lambon Ralph, 2011). However, the prevailing emphasis in such studies has been upon identifying regions that are commonly activated across modalities in the hope of identifying putative transmodal ‘hub’ regions (Binder, Desai, Graves, & Conant, 2009). Less attention has been paid to the contribution of spoke regions. Indeed, there exists no clear operationalization of what constitutes a spoke region within the brain. As far as we are aware, there have been no explicit predictions under the hub and spoke framework beyond that they are higher-order modality-specific association regions that lie at the apex of each unimodal processing stream. Some predictions could be derived from a parallel set of functional imaging studies motivated by the distributed-only perspective that, as discussed above, holds that conceptual processing recruits the same modality-specific regions involved in perception and action.
Such studies have identified lateral and ventral temporo-occipital regions that show responses selective to processing of visual features of objects and object nouns, such as visual form, color and motion, and that certain sub-areas of these brain regions preferentially respond to certain semantic categories of objects such as faces, animals and tools (Chao, Haxby, & Martin, 1999; Chao, Weisberg, & Martin, 2002; Kanwisher & Yovel, 2006; Wheatley, Weisberg, Beauchamp, & Martin, 2005). Other studies have found similarly selective profiles of regions of the lateral temporal cortex for auditory features (e.g., environmental sounds) (Kiefer, Sim, Herrnberger, Grothe, & Hoenig, 2008), the ventral premotor cortex for object manipulability (Chao & Martin, 2000), and the orbito-frontal cortex for olfactory and gustatory features (Goldberg, Perfetti, & Schneider, 2006). There is also complementary evidence for a contribution of distributed sensorimotor association regions to concepts stemming from neuropsychological investigations and electrophysiological and neurostimulation studies (Reilly et al., 2011; Kemmerer, Castillo, Talavage, Patterson, & Wiley, 2008; Pulvermüller, 2013; Pulvermüller, Shtyrov, & Ilmoniemi, 2005; Shtyrov & Pulvermüller, 2007; Pobric et al., 2010). In the present study, we sought to investigate whether some of these potential candidate ‘spoke’ regions (See Methods) would show conjoint activation with the ATL hub region during a task involving semantic processing, and whether the differential roles of the hub and spoke regions can be observed by virtue of their response profiles. In particular, spoke regions should show an effect of modality in line with their unimodal function, but we were specifically interested in whether they exhibit a differential response according to a semantic manipulation which would indicate that they are involved in conceptual processing. 
Moreover, in line with the connectionist implementation of the hub and spoke model, we hypothesized that the spoke regions would show a semantic effect on their activation regardless of the modality of stimulus presentation. Regarding the ATL hub, we hypothesized an effect of the semantic manipulation but no effect of modality (i.e., all representational modalities ultimately converge upon an amodal semantic representation) in line with its purported transmodal function.
To the best of our knowledge, there have been no prior functional imaging studies that have attempted to simultaneously test hypotheses regarding interactivity between hub and spoke regions in this manner. This could reflect, at least in part, the considerable methodological challenge of isolating activation specific to semantic processes from that associated with sensorimotor perceptual processes per se. Conventional subtraction analyses in fMRI typically contrast activation associated with semantic tasks (unimodal or crossmodal) to that associated with non-semantic perceptual judgments (unimodal or cross-modal) in order to subtract out low-level perceptual processes or cross-modal integration per se (i.e., at a pre-semantic stage) and control for domain-general function such as decision-making processes. The baseline control tasks often use stimuli such as false fonts or noise vocoded speech that have been distorted or scrambled to remove their ‘meaningfulness’ but retain, at least partially, perceptual complexity and the decision-making element of the task. However, the complexity of stimuli and the task difficulty often remain variable between the semantic (experimental) and control (low level perceptual baseline) conditions and thus semantic processes continue to be confounded with other processes in the interpretation of activations. Moreover, control conditions for different modalities may vary in their effectiveness to control for low-level processes and, as such, comparisons between activations in one modality and another are difficult to interpret. Alternative approaches include parametric modulation of the ‘meaningfulness’ of stimuli, to which activation related to semantic processing should covary. We undertook a novel approach to addressing the role of ‘spoke’ regions in semantic representation by contrasting activation when subjects named familiar entities compared to newly learned object concepts. 
Our rationale for choosing newly learned objects as a baseline was that the familiar and novel stimuli would be well matched in terms of perceptual complexity and that task requirements would be identical for the two stimulus sets. Moreover, the familiar/novel exemplar distinction reflects a manipulation of ‘meaningfulness’ that may be used to yield activations associated with evocation of conceptual representations. The novel entities were fictional animals and artefacts with distinct form, motion and sounds. Thus, these objects might be considered meaningful novel exemplars of superordinate categories (e.g., animals) but with relatively impoverished representation as basic level concepts or unique exemplars/entities as they do not have the multiple episodic, affective or encyclopedic associations (to name a few) that enrich the conceptualization of familiar animals/artefacts. Contemplating connectionist frameworks, we hypothesized that the familiar concepts will have stronger connection weightings and more robust neural representations, and thus would evoke more robust activation across the neural network subserving conceptual representation. Under this assumption we attempted to use this contrast (familiar > novel) to yield semantic activations while controlling for pre- and post-semantic processes/activations. We examined activations associated with semantic processing when the input was constrained to one of three feature types (an object’s static visual form, point light motion or characteristic sound) which each have dissociable cortical regions associated with perceptual processing (See Methods). We used whole brain and targeted region of interest analyses to reveal differences and commonalities between the networks activated by conceptual processing probed by different stimulus presentation modalities.
We examined patterns of activation for a series of familiar object concepts (i.e., bird, dog, scissors, clock) relative to four newly learned object concepts (plufky, korbok, wilzig, blerga). During the learning phase, participants learned the names of these novel concepts by watching animated videos of the target concepts in action, moving in distinctive paths/manners and making unique animal/tool noises while a narrated voiceover announced each item’s name. Three days later the participants underwent the same exposure condition. One week after initiation of the learning phase, we scanned participants as they repeatedly named novel and familiar items upon exposure to only one of their modality-specific semantic features. We segregated the three input modalities (i.e., visual form, environmental sound, and point light motion) into separate scanning runs. For example, in the visual form session participants named DOG from a grayscale photograph. In the auditory session, participants named DOG from the sound of a barking dog, and in the motion session participants named DOG from a point-light video depicting the canonical motion of a dog walking. Our analyses were aimed at detecting both activations associated with each input modality and areas common across all the modalities. In contrasting familiar versus novel items, we aimed to isolate deep semantic processing via a cognitive subtraction. Novel objects were matched to familiar objects in terms of perceptual processing (items were approximately comparable in visual complexity; see Figure 2) and phonological encoding demands. Training of novel items was continued until 100% accuracy was achieved such that differences in activation between novel and familiar items were unlikely to be attributable to difficulty. As such, the familiar-novel contrast was intended to isolate enhanced activation associated with retrieval of conceptual knowledge associated with familiar objects.
Eighteen healthy young adults (14 females) were recruited from the University of Florida. Participants reported no prior neurological injury or learning disability (e.g., dyslexia), and were not on a current regimen of sedative medications. All were right-hand dominant, native English speakers. Mean age was 22.8 years and mean education was 16.1 years. Participants provided informed consent in accord with the Institutional Review Board of the University of Florida and were nominally compensated for taking part in the study.
The familiar objects included two animals (dog, bird) and two manufactured artifacts (clock, scissors). Items from both the living and non-living domain were included such that our results were not entangled with category-specific effects. Visual form presentation involved grayscale, cartoon-like photographs of the target items on a black background. Auditory stimuli included clips (e.g., ticking clock) originally obtained from online sound repositories which we subsequently trimmed to 2000ms, matched in sampling rate and normalized on amplitude using the GoldWave waveform editor. The motion stimuli involved 2000ms videos of white points placed along articulated joints and edges against a black background.
Four novel objects were learned one week before scanning. Participants learned these names by watching 20s long videos of the target objects moving in a distinctive path/manner and making a distinct sound while a male narrator announced each item’s name. Novel sounds were created by first subjecting real sounds to low-pass filtering and warbling effects using the GoldWave waveform editor. The audio clips were then all matched in duration and batch normalized for root mean square amplitude (i.e., perceived as roughly equal volume). Point light videos analogous to the familiar items were created using LightWave animation software. Figure 2 illustrates the stimuli characteristics.
On Day 1 of the experiment, participants learned the names of four novel concepts by watching videos depicting simultaneous motion, sound, and form. Participants passively viewed each video two times, and the order of video presentation was randomized. Upon completion of this learning sequence, we tested naming ability by showing truncated (2s) clips of each target item and asking the participant to name the item. For any participant who failed to name a single item, we repeated presentation of the entire video sequence (4 items * 2 repetitions) and probed naming until that participant achieved 100% accuracy. The learning session was terminated when participants named all items with 100% accuracy. We repeated this session using the exact parameters three days later.
On the day of the scanning session, participants were informed that they were to name the items they had learned a week before in addition to several untrained but familiar items but that only one of their constituent features would be presented in a given scanning run. Furthermore, they were instructed that overt naming was required on only a subset of trials and this would be cued at random during the experiment. Before entering the scanner, we conducted a brief (5 minute) familiarization procedure where participants performed the task with experimenter feedback in each of the test modalities (point light, sound, picture). Again, 100% accuracy for naming familiar and novel items was required before participants proceeded to scanning.
Participants were instructed to overtly name target stimuli as quickly and accurately as possible from the onset of the cue (a tilde presented in isolation following stimulus presentation). Probes were pseudorandomly interspersed over the duration of a scanning run, with 24 of 64 trials requiring an overt speech response. Participants were therefore unaware on any given trial of whether an overt response would be required, and were encouraged to identify the object on each stimulus presentation (covert naming).
Participants completed three functional scanning runs (8 minutes each) spaced over one hour, with approximately 2–4 minutes of inactivity between each. Each functional scanning run contained only features presented in one of the modalities (i.e., visual form, sound, or point-light motion). Each object was presented 8 times for a duration of ≈2000ms, for a total of 64 pseudo-randomized stimulus presentations per scanning run. Each object was cued for naming 3 times per scanning run. The interval between stimulus presentation and onset of the cue was 2s. An interstimulus interval of either 2s or 4s was pseudorandomly assigned across stimuli. Figure 3 illustrates the trial structure and timing parameters.
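The counts described above (8 objects, 8 presentations each, 3 cued trials per object) can be checked with a minimal Python sketch of one run's trial list. This is an illustration only: the function name `build_run`, the fixed seed, and the use of unconstrained shuffling are assumptions, since the actual pseudo-randomization constraints applied in E-Prime are not described here.

```python
import random

# Object names taken from the text; everything else is illustrative.
OBJECTS = ["dog", "bird", "clock", "scissors",
           "plufky", "korbok", "wilzig", "blerga"]
PRESENTATIONS_PER_OBJECT = 8
CUED_PER_OBJECT = 3


def build_run(seed=0):
    """Return a shuffled trial list for one scanning run (hypothetical helper)."""
    rng = random.Random(seed)
    trials = []
    for name in OBJECTS:
        # Exactly 3 of the 8 presentations of each object are cued for overt naming.
        cued_flags = ([True] * CUED_PER_OBJECT
                      + [False] * (PRESENTATIONS_PER_OBJECT - CUED_PER_OBJECT))
        rng.shuffle(cued_flags)
        for cued in cued_flags:
            # Assumption: the 2 s or 4 s interstimulus interval is drawn at random.
            trials.append({"object": name, "cued": cued,
                           "isi_s": rng.choice([2, 4])})
    rng.shuffle(trials)
    return trials


run = build_run()
n_trials = len(run)
n_cued = sum(t["cued"] for t in run)
print(n_trials, n_cued)
```

Under these assumptions every run contains 64 presentations, of which 24 are cued, matching the figures reported in the text.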
We presented stimuli using a laptop computer running E-Prime 2.0 Professional software. We synchronized scanning with stimulus presentation by timelocking the offset of each trial to the radiofrequency (RF) pulse using a TTL synchronization box (Nordic NeuroScan, Inc). Trial durations were a maximum of 2000ms (TR duration) but were sometimes slightly shorter due to receipt of the pulse signal. This procedure ensured that no cumulative timing errors were introduced into the experiment as a result of stimulus buffering.
Participants viewed all picture and motion stimuli on a monitor situated behind the scanner bore via a mirror slotted onto the head coil. Participants heard auditory stimuli over MR compatible pneumatic headphones (ScanSound Inc).
All imaging was performed on a Philips Achieva 3 Tesla scanner with a 32-channel SENSE head coil. We acquired a high resolution T1-weighted anatomical image using a magnetization-prepared rapid acquisition with gradient echo (MPRAGE) sequence with the following parameters: in-plane resolution = mm2, Field of View (FOV)=240mm2, slice thickness=1mm, Number of Slices=170, Flip angle=8°; Repetition Time (TR)=7.0ms; Echo Time (TE)=3.2ms. The gradient-echo echo-planar fMRI sequences were performed with the following parameters; in-plane resolution =3mm × 3mm, Field of View (FOV)=240mm2, Slice thickness=4mm, Number of Slices=38, Flip angle=90°; Repetition Time (TR)=2000ms; Echo Time (TE)=30ms, Slice Gap=None; Interleaved Slice Acquisition, Number of Dynamic Scans=242. We restricted FOV for temporal and inferior parietal lobe coverage, excluding the highest convexities of the dorsal cerebrum.
Analysis was carried out using statistical parametric mapping (SPM8) software (Wellcome Trust Centre of Neuroimaging, UK). Within each subject, the functional EPI volumes from across all three scanning sessions were re-aligned using a 6-parameter rigid body transform estimated using a least squares approach and a two-pass procedure. The aligned images and a mean volume were then re-sliced using fourth-degree B-spline interpolation. This procedure corrected for differences in subject positioning between sessions and minor motion artefacts within a session. The T1-weighted anatomical image was registered to the mean functional volume using a six-parameter rigid-body transform and subsequently subjected to SPM8’s unified tissue segmentation and normalization procedure in order to estimate a spatial transformation from subject space to a standard stereotactic space, according to the Montreal Neurological Institute (MNI) protocol, for inter-subject averaging. This transform was applied to each of the subject’s functional volumes, resampling to a 3 × 3 × 3 mm voxel size using trilinear interpolation and preserving the intensities of the original images (i.e., no “modulation”). These volumes were then smoothed with an 8 mm full-width half-maximum (FWHM) Gaussian filter.
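For readers less familiar with smoothing kernels, the 8 mm FWHM figure can be related to the standard deviation of the Gaussian through the standard identity FWHM = 2√(2 ln 2)·σ. The short Python sketch below is illustrative only; the helper name `fwhm_to_sigma` is hypothetical and is not part of SPM8.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian full-width at half-maximum to a standard deviation."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


sigma_mm = fwhm_to_sigma(8.0)   # kernel width reported in the text
sigma_vox = sigma_mm / 3.0      # expressed in 3 mm resampled voxels
print(round(sigma_mm, 3), round(sigma_vox, 3))
```

An 8 mm FWHM kernel therefore corresponds to a standard deviation of roughly 3.4 mm, a little over one resampled voxel.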
Data were analyzed using a general linear model approach (GLM). At the individual subject level, each scanning run (and thus presentation modality) was modeled within a separate fixed effects analysis. Our rationale for analyzing each run/modality in a separate model was that some participants did not complete all three runs/modalities due to technical difficulties with audio presentation. Within the model for a given scanning run, the presentation of familiar items (e.g., Dog) and newly learned items (e.g., Plufky) were modelled as two separate boxcar functions. Instances where subjects made an overt verbal response (probed trials only) were also modelled with a two-second boxcar function in order to capture speech-related changes in BOLD as an independent nuisance covariate. Rest periods were modelled implicitly. A set of six motion parameters for each scan session, which were estimated during the realignment step, were also included as nuisance regressors. Regressors were subsequently convolved with a canonical hemodynamic response function. Data were further treated with a high-pass filter with a cutoff of 128 s. Planned contrasts were calculated to assess differences in activation between naming familiar items and newly-learnt items [Familiar – Novel]. These contrast images were entered into subsequent multi-subject mixed-effect analyses directed at our a priori hypotheses, as follows.
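The construction of a single task regressor described above (a boxcar convolved with a hemodynamic response function) can be sketched as follows. This is a simplified illustration rather than SPM8's implementation: the double-gamma shape uses commonly cited default parameters (peak shape 6, undershoot shape 16, ratio 1/6), the response is sampled directly at the 2 s TR rather than at a finer time grid, and the onset times are hypothetical.

```python
import math

TR = 2.0        # repetition time in seconds (from the acquisition parameters)
N_SCANS = 242   # dynamic scans per run (from the acquisition parameters)


def double_gamma_hrf(t):
    """Double-gamma response at time t (s); parameter values are assumptions."""
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0


def boxcar(onsets_s, duration_s):
    """Return a 0/1 vector marking the scans covered by each event."""
    x = [0.0] * N_SCANS
    for onset in onsets_s:
        first = int(round(onset / TR))
        last = int(round((onset + duration_s) / TR))
        for i in range(first, min(last, N_SCANS)):
            x[i] = 1.0
    return x


def convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of the signal."""
    out = [0.0] * len(signal)
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(kernel):
            if i - k < 0:
                break
            acc += signal[i - k] * h
        out[i] = acc
    return out


hrf = [double_gamma_hrf(k * TR) for k in range(16)]   # 32 s of response
familiar_onsets = [10.0, 40.0, 70.0]                  # hypothetical onsets (s)
regressor = convolve(boxcar(familiar_onsets, 2.0), hrf)
```

In this sketch the predicted response peaks about 6 s after each event and dips below baseline afterwards, which is the shape the GLM fits against the measured BOLD time course.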
First, we examined activation for semantic access of familiar entities relative to novel entities in separate whole-brain analyses for each of the stimulus presentation modalities (one sample t-tests). This allowed for an initial assessment of brain-wide similarities and differences in the topology of activations recruited by semantic processing of stimuli in the different modalities. Statistical parametric maps were thresholded with an uncorrected voxel-height threshold of P<0.005 and a minimum cluster size of 30 contiguous voxels. Inferences were made on activations surviving this threshold if they occurred in a priori predicted regions. These predicted regions included occipito-temporal visual association cortex and superior temporal auditory association cortex and also regions implicated in the semantic neuroscience literature, namely, the anterior temporal lobe, the posterolateral temporal lobe, the frontal operculum and ventral parietal cortex. Activations outside of these predicted regions were assessed using a more stringent threshold corrected for the multiple comparisons problem using the topological false discovery rate as implemented in SPM8 (p<0.05). All reported coordinates are in Montreal Neurological Institute (MNI) space.
Our second planned analysis sought to identify regions activated by semantic processing irrespective of the feature type (form, sound or motion) that was presented, and that thus fulfill hypothesized characteristics of high-order regions involved in transmodal semantic processing, such as the ATL hub. In particular, we planned a conjunction analysis (within a one-way Analysis of Variance (ANOVA)) targeting voxels that are significantly activated independently in both conditions/modalities (Nichols et al., 2005). The threshold for this analysis was p<0.01, uncorrected, in effect being 0.001 for pairwise commonalities. We also performed post hoc analyses assessing main effects and interactions of modality and familiarity within a factorial ANOVA. This required calculation of contrasts at the first level for [Visual Familiar] only, [Visual Novel] only, [Auditory Familiar] only, and so on. Statistical parametric maps generated by these analyses were assessed following an application of a p<0.001, uncorrected voxel-height threshold and an FDR-corrected cluster extent threshold at p<0.05.
In addition to the more conservative whole-brain analysis, we applied a planned targeted region of interest (ROI) analysis using the MARSBAR toolbox (Brett, Anton, Valabregue, & Poline, 2002). This method is an alternative approach to overcoming the multiple comparisons problem and is ideal for regional hypothesis testing. We reasoned that differences in activation to familiar relative to novel concepts may be subtle and thus less conservative analyses may be required to detect them. Moreover, plotting the parameter estimates extracted from an ROI can provide a more intuitive means of interpreting differential effects compared to statistical maps. MARSBAR calculates a single summary value to represent activation across all voxels within a given ROI (in the present case, the median of the parameter estimates). Details on ROI selection and definition are provided below. We extracted an estimate for the [Familiar-Novel] contrast for each ROI in each modality and performed group analyses using statistical software outside of SPM. In the group level statistics (Figure 6), a positive value indicates greater activation for familiar items, a negative value indicates greater activation for novel items, and a zero or non-significant value indicates no differential activation between familiar and novel items.
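The logic of the ROI summary can be illustrated with a small Python sketch: each subject's voxel-wise [Familiar-Novel] estimates within an ROI are reduced to their median, and the resulting per-subject values are tested against zero. The data below are invented for illustration, and the one-sample t-test is an assumption, since the specific group-level test run outside SPM is not named here.

```python
import math
import statistics


def roi_summary(voxel_estimates):
    """Median of the contrast estimates across voxels in one ROI."""
    return statistics.median(voxel_estimates)


def one_sample_t(values):
    """t statistic and degrees of freedom for a test of the mean against zero."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean / (sd / math.sqrt(n)), n - 1


# Hypothetical [Familiar-Novel] estimates: one list of voxel values per subject.
subjects = [
    [0.2, 0.5, 0.4],
    [0.1, 0.3, 0.9],
    [-0.2, 0.0, 0.1],
    [0.6, 0.4, 0.5],
]
summaries = [roi_summary(v) for v in subjects]
t_value, df = one_sample_t(summaries)
print(summaries, round(t_value, 2), df)
```

A positive t value here would correspond to greater activation for familiar than for novel items, following the sign convention described in the text.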
Candidate ‘spoke’ regions of interest (ROIs) were defined on the basis of prior functional imaging data from studies investigating higher order and/or semantic processing of sensorimotor features within unimodal association cortices. Given the presentation modalities we investigated, we specifically targeted (i) superior temporal (anterior parabelt) auditory association cortex that is associated with processing of higher-order auditory objects, and (ii) the ventral occipito-temporal visual association cortex that is particularly associated with higher-order processing of object form. An ROI for motion-associated regions was omitted from this analysis, as was the motion condition data, because the whole-brain analysis revealed no effect in the predicted temporal or parietal regions, even at the more liberal threshold (see Results). A third ROI represented the ATL hub region. Each ROI was spherical with a radius of 10mm (see Figure 6 for a visual depiction).
The precise location of the superior temporal ROI was defined on the basis of work from Warren, Jennings and Griffiths (2005) who identified regions associated with the higher order and abstracted spectral analysis of sounds. We averaged the coordinates of peaks reported by these authors for the contrast they interpreted as relating to an abstraction of spectral shape, which may be relevant to the analysis of auditory sources or objects, such as voices. This average coordinate (57,−22,3) defined the center of mass for the superior temporal ROI. Given the 10mm radius of our ROI, it encompassed many of the individual peak coordinates. We had no prior hypotheses regarding lateralization of activation in this region, and thus included a left hemisphere homologue (−57,−22,3) ROI in the analysis.
It is well established that the ventral occipito-temporal cortex (vOTC) is involved in the processing of higher-level visual properties of objects such as global shape/form. Moreover, it has been demonstrated that portions of the mid-to-posterior fusiform gyrus preferentially respond to certain categories of objects, such as animals or tools (Chao et al., 1999; Chao & Martin, 2000; Wheatley et al., 2005), implying at least a near-conceptual level of processing but still within modality-specific cortex. These results are usually discussed as fitting the predictions of the distributed-only perspective, but they also suggest this region is a good candidate for a visual form ‘spoke’ in the context of the hub-and-spoke model. We used reported coordinates from the fMRI study of Chao et al. (1999) to define our ventral occipito-temporal cortex ROI. Given that we were not interested in distinctions between categories in the present study, we averaged their coordinates for the animal and tool selective regions (transformed from Talairach coordinates to MNI coordinates using the tal2icbm function; http://biad02.uthscsa.edu/icbm2tal/tal2icbm_spm.m). We included both left (−34 −56 −18) and right (35 −53 −20) hemisphere vOTC ROIs.
Recent refinements of hypotheses regarding ATL involvement in conceptual knowledge representation have pinpointed the ventral surface in particular as a transmodal substrate (Binney et al., 2010; Mion et al., 2010; Tyler et al., 2013). Conventional gradient-echo EPI imaging protocols, such as that used in the present study, suffer from signal loss and distortion in this region (Devlin et al., 2000; Embleton, Haroon, Morris, Ralph, & Parker, 2010) and, therefore, reliable BOLD measurements can be difficult to achieve. However, it is possible to obtain good signal within lateral ATL regions that have been equally implicated, bilaterally, on the basis of fMRI and TMS studies, albeit more variably and as a function of task demands or input/output modality (Binder, Desai, Graves, & Conant, 2009; Binney et al., 2010; Hoffman, Binney & Lambon Ralph, 2015; Lambon Ralph, Pobric, & Jefferies, 2009; Zahn et al., 2007). We calculated the average of the two sets of coordinates reported for the left lateral ATL in Binney et al.’s (2010) study and used this (−54,6,−28) and its right hemisphere homologue (54,6,−28) to define the center of mass for two ATL ROIs for the present study.
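The ROI geometry described in this section (10 mm spheres around the reported centers, with homologues obtained by mirroring across the midline) can be sketched as below. The centers and the radius are taken from the text; the helper names and the 2 mm grid with voxel centers at multiples of 2 mm are assumptions made for illustration only, and the sketch is not the MARSBAR routine used in the study.

```python
import math

RADIUS_MM = 10.0  # sphere radius stated in the text

# ROI centers in MNI space (mm), as reported in the text.
CENTRES = {
    "STG_right": (57, -22, 3),
    "vOTC_left": (-34, -56, -18),
    "vOTC_right": (35, -53, -20),
    "ATL_left": (-54, 6, -28),
}


def homologue(centre):
    """Mirror a coordinate into the other hemisphere (flip the sign of x)."""
    x, y, z = centre
    return (-x, y, z)


# Homologues that the text defines by mirroring.
CENTRES["STG_left"] = homologue(CENTRES["STG_right"])
CENTRES["ATL_right"] = homologue(CENTRES["ATL_left"])


def sphere_voxels(centre, radius=RADIUS_MM, voxel_size=2.0):
    """List the grid points (mm) lying within `radius` of `centre`.
    Assumes a hypothetical isotropic grid aligned to multiples of voxel_size."""
    lo = [math.ceil((c - radius) / voxel_size) for c in centre]
    hi = [math.floor((c + radius) / voxel_size) for c in centre]
    points = []
    for i in range(lo[0], hi[0] + 1):
        for j in range(lo[1], hi[1] + 1):
            for k in range(lo[2], hi[2] + 1):
                p = (i * voxel_size, j * voxel_size, k * voxel_size)
                if math.dist(p, centre) <= radius:
                    points.append(p)
    return points


rois = {name: sphere_voxels(c) for name, c in CENTRES.items()}
for name, voxels in rois.items():
    print(name, len(voxels))
```

Note that the two vOTC centers are entered separately because the text reports distinct left and right coordinates for that region rather than a mirrored pair.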
We acquired full datasets for 13 of 18 participants. For the remaining five participants, the auditory run failed due to an amplifier malfunction. Thus, in the analyses to follow we modeled activation via an unbalanced design including the visual form and point-light motion runs for all 18 participants and auditory runs for 13 participants.
Results of the whole brain analysis for familiar > novel entities in the visual form condition are displayed in Figure 4, Row A and Table 1. A large right-hemisphere cluster included several regions associated with transmodal semantic processing and language processing (albeit more so in the left hemisphere), namely the lateral anterior temporal lobe (peaking in the middle temporal gyrus), the inferior frontal gyrus, the posterolateral temporal lobe (including the MTG, STG and STS) and the ventral parietal lobe. This cluster extended ventromedially to reveal greater activation for familiar entities in the posterior and mid parahippocampal gyrus and visual association cortex within the bilateral lingual gyri. As such, familiar concepts evoke greater activation than novel entities in the association cortex tied to the modality of presentation, despite equivalent complexity in the perceptual features of stimuli.
The same cluster extended to auditory association cortex in the anterior superior temporal gyrus and the superior temporal sulcus and also to the posterior planum temporale adjacent to Heschl’s gyrus. Therefore, the increased activation for familiar entities extends beyond association cortex for the presentation modality and transmodal association cortex to association cortex of other modalities (in this case, auditory cortex). This large cluster also extended to the right hippocampus, the right insula, the right orbitofrontal cortex and across ventromedial frontal and anterior cingulate cortex bilaterally. A small cluster was also observed in the right thalamus, although it did not survive correction for multiple comparisons.
Four left hemisphere clusters mirrored many of the activations in the right hemisphere cluster. The largest occurred predominantly in the left parahippocampal gyrus and left ventral occipitotemporal visual cortex. Another cluster extended over a large swathe of the dorsal posterolateral temporal cortex, ascending to ventral parietal regions and descending to dorsolateral extrastriate visual cortex. A third cluster was observed in left orbitofrontal cortex, while a fourth cluster encompassed portions of left planum temporale and left insula, extending toward medial temporal regions. These final two clusters did not survive correction for multiple comparisons.
The familiar > novel entity contrast in the sound condition (Figure 4, Row B and Table 1) revealed a largely symmetrical distribution of activation across the hemispheres. Large clusters were observed bilaterally at the occipito-temporal-parietal junction encompassing the supramarginal gyri, the posterior superior temporal gyri/sulci and posterior middle temporal gyri. In the right hemisphere, the cluster extended rostrally to auditory cortex in the mid superior temporal gyrus and planum temporale. In the left hemisphere, primary and secondary auditory cortex activation was observed in two smaller clusters, one in the posterior superior temporal gyrus and another, more anterior, which stretched across the planum temporale to insular cortex. These clusters did not survive correction for multiple comparisons but nevertheless fell within a priori predicted regions.
In both hemispheres, the largest clusters also extended ventrally to extrastriate visual cortex from the posterior middle temporal gyrus. In the right hemisphere, an additional smaller cluster was observed within lateral occipital-temporal visual association cortex with an extension into ventral occipital-temporal regions. This cluster did not survive correction for multiple comparisons. In contrast to the visual form modality, parahippocampal cortex was not activated.
Three bilateral clusters were observed in posterior cingulate cortex, the precuneus including retrosplenial cortex (this cluster also encroached on the lingual gyri bilaterally) and medial prefrontal cortex.
In the point-light motion condition, the familiar minus novel contrast revealed two small clusters of activation (Figure 4, Row C and Table 1). These clusters were located within the ventral central sulcus and ventral postcentral gyrus in the left and right hemisphere, respectively. Neither cluster fell in a priori predicted regions, nor did they survive correction for multiple comparisons. In light of this null result, we excluded the point-light motion condition from subsequent analyses, reasoning that it would further produce a null result in a conjunction analysis.
The next set of analyses continued with a whole-brain approach but sought to identify brain regions activated by semantic processing irrespective of the feature type (visual form or sound) that was presented. In particular, we planned a conjunction analysis to reveal overlap of regions activated in the familiar-novel contrast in each modality. Before we present those findings, however, we will describe the result of a post-hoc analysis in which we used a factorial ANOVA to examine the main effect of Familiarity, as well as that of Modality and the interaction effect.
The results of an F-test on the main effect of Modality are presented in Table 2 and Figure 5, Row A. Modality-specific activations were observed in visual primary and association cortex from the occipital poles to medial occipito-temporal cortex bilaterally and in bilateral auditory cortex including Heschl’s gyrus, the planum temporale and the mid-to-posterior superior temporal gyrus. The results of a t-test on the positive effect of Familiarity (Familiar > Novel), a manipulation we hypothesized to reveal semantic processing, are presented in Table 3 and Figure 5, Row B. Greater activation for familiar objects was observed within the occipito-temporal-parietal junctions bilaterally, consistent with the results of the Familiar > Novel contrasts in the sound and visual form modalities separately (Figure 4, Rows A and B). In the left hemisphere, the cluster peaked in the posterior middle temporal gyrus where it abuts the extrastriate visual cortex. In the right hemisphere, the cluster peaked in the posterior superior temporal gyrus and extended more dorsally into the ventral parietal lobe than the left hemisphere cluster. The right lateralized cluster also extended rostrally into auditory cortex in the mid-to-posterior superior temporal gyrus. In the left hemisphere, auditory cortex was revealed in a separate cluster in the planum temporale that also extended to the posterior insula. A right hemisphere cluster was observed in the parahippocampal gyrus, an area associated with high-level visual processing and object recognition. Indeed, some of the regions showing an effect of Familiarity were located just anterior to those showing a main effect of Modality, suggesting that conceptual-level processing occurs upstream in the modality-specific pathways. In the superior temporal gyrus, the effect was observed in the most anterior aspect of the area showing the effect of Modality.
Familiarity also drove greater activation in lateral occipito-temporal regions adjacent to and including extrastriate regions around the posterior middle temporal gyrus, whereas effects of modality were in early visual cortex and medial occipito-temporal regions.
Consistent with a role for the region in transmodal semantic processing, there was a cluster in both the left and right anterior temporal lobe, peaking at the middle temporal gyrus, which responded to the familiarity manipulation (but did not reveal a modality effect). When each modality was considered separately, we only observed right ATL activation and for the visual form condition only. It is possible that the bilateral ATL activation in the present analysis reflects a greater statistical power to detect subtle effects owing to collapsing across the conditions. In particular, the absence of ATL activation in the sound condition may reflect lower statistical power than the visual form condition due to a lower number of subjects. Moreover, as observed in the above analyses (Section 3.1), there was also greater activation in the medial prefrontal cortex, the precuneus and the posterior cingulate cortex. While it is possible that the results of this analysis reflect common processing across the presentation modalities, they could be predominantly driven by strong activation in only one of the two conditions (Nichols, Brett, Andersson, Wager, & Poline, 2005). Therefore, a conjunction analysis that reveals only voxels that were activated independently in both conditions was necessary. The results of the conjunction analysis are displayed in Figure 5, Row C. It revealed largely the same clusters (albeit much smaller, due to the much more stringent nature of the analysis), with the exception of the posterior cingulate cortex, which was absent. Although not clearly shown in Figure 5, there was bilateral posterior parahippocampal gyrus activation. The left ATL cluster was greatly reduced in size to two small separate clusters in the anterior MTG. This is likely driven by the lack of supra-threshold activation in the auditory modality, which again may reflect reduced power due to there being fewer subjects than in the visual form modality.
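The difference between an effect pooled across modalities and a conjunction can be illustrated with a short sketch. It assumes a minimum-statistic style conjunction, in which a voxel counts only if it exceeds the threshold in each condition separately; whether this exact procedure was used is an assumption, and the t-values and threshold below are hypothetical. Averaging the two t-values is a deliberate simplification standing in for the pooled Familiarity effect.

```python
def conjunction(t_map_a, t_map_b, threshold):
    """A voxel passes only if the smaller of its two statistics
    exceeds the threshold, i.e. it is supra-threshold in both maps."""
    return [min(a, b) > threshold for a, b in zip(t_map_a, t_map_b)]


def averaged_effect(t_map_a, t_map_b, threshold):
    """Simplified stand-in for a pooled effect: threshold the mean
    of the two statistics (one strong map can carry the voxel)."""
    return [(a + b) / 2 > threshold for a, b in zip(t_map_a, t_map_b)]


# Hypothetical t-values for four voxels in each modality.
t_visual = [4.2, 6.0, 0.3, 3.5]
t_sound = [3.9, 0.5, 0.2, 3.4]
THRESHOLD = 3.1  # hypothetical threshold

conjunction_mask = conjunction(t_visual, t_sound, THRESHOLD)
averaged_mask = averaged_effect(t_visual, t_sound, THRESHOLD)
print(conjunction_mask)
print(averaged_mask)
```

In this toy example the second voxel passes the averaged test solely because of its strong visual response, but fails the conjunction, which is the situation the conjunction analysis is designed to exclude.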
No interactions between Familiarity and Modality were observed in the whole-brain analysis. A strong form of our hypothesis regarding contributions of modality-specific regions to conceptual processing would be that an interaction should be observed such that they will respond equally to novel and familiar concepts in the associated input modality, whereas other presentation modalities will evoke activation only for familiar concepts that are more robustly represented across ‘hub’ and ‘spoke’ regions. Statistically speaking, we believe this is unlikely to be observed, especially in a conservative whole-brain approach, as the effects of familiarity are in all probability much more subtle than modality effects and this has the potential to cloud an interaction effect. Additionally, the stimuli in each presentation modality differ not only in modality but in perceptual richness and task difficulty. These confounds are likely to introduce noise that makes direct comparisons between modalities difficult to interpret and interactions unlikely. As such, we focused primarily upon regional effects of familiarity independently within each modality.
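For readers less familiar with factorial contrasts, the main effects and the interaction in a 2 × 2 Familiarity × Modality design can be written as weights over the four cell means. The sketch below uses standard contrast coding, not the study's actual SPM model (which was unbalanced across modalities); the cell means are hypothetical and chosen so that the two effects are purely additive.

```python
# Cell order: (visual familiar, visual novel, sound familiar, sound novel)
familiarity = [1, -1, 1, -1]  # Familiar > Novel, pooled over modality
modality = [1, 1, -1, -1]     # Visual > Sound, pooled over familiarity
# The interaction weights are the element-wise product of the two main effects.
interaction = [f * m for f, m in zip(familiarity, modality)]


def apply_contrast(weights, cell_means):
    """Weighted sum of the cell means for one contrast."""
    return sum(w * m for w, m in zip(weights, cell_means))


# Hypothetical cell means with additive effects (no interaction).
cell_means = [2.0, 1.5, 1.0, 0.5]

print(interaction)
print(apply_contrast(familiarity, cell_means))
print(apply_contrast(modality, cell_means))
print(apply_contrast(interaction, cell_means))
```

With these additive toy values the interaction contrast is zero even though both main effects are present, which is the pattern reported for the whole-brain analysis.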
The location of ROIs and the results of this analysis are displayed in Figure 6. The results are largely consistent with those in the whole-brain analyses above (Sections 3.1 and 3.2). Both the left and the right ATL ROIs were significantly more active for familiar entities regardless of whether the stimulus was the visual form of or the sound made by objects. In the whole-brain analyses in Section 3.1, only the right ATL was activated and only in the visual form condition. The greater statistical power afforded by the ROI analyses therefore appeared to greatly improve sensitivity to effects of familiarity.
Likewise, the superior temporal gyrus (STG) ROI, positioned over auditory association cortex, was significantly more active for familiar than novel entities when probed in the auditory domain and when probed by the visual form, although the effect was smaller in the latter. This pattern held for both the left and right hemisphere homologues. In contrast, no significant effects of familiarity were observed in the ventral occipito-temporal cortex (vOTC) ROI, although there was a trend in the sound condition in the right hemisphere ROI (p = 0.11). This ROI was situated more posteriorly than the activation observed in the whole-brain analysis and thus it may not have been optimally situated to detect such an effect. The whole-brain results implicated the mid parahippocampal gyri and the lateral extrastriate visual cortex, further downstream in the visual processing hierarchy. This is consistent with prior observations that increased cortical activity of visual regions associated with recognition shifts more anteriorly with increased certainty of an object’s identity (Bar et al., 2001).
In this fMRI study, we set out to explore the roles of unimodal sensorimotor association areas and transmodal temporal association cortices in semantic processing. Distributed-only “embodied” theories propose that conceptualization is underpinned by reactivation of perception-action traces distributed across a network of remote unimodal association areas (Allport, 1985; Pulvermüller, Moseley, Egorova, Shebani, & Boulenger, 2014). An alternative theory suggests that in addition to the contribution of sensorimotor regions, or “spokes”, there is convergence of multi-modal information and further abstraction/computation within a centralized transmodal representational “hub”. Critically, the ‘hub-and-spoke’ model, as instantiated computationally by Rogers et al. (2004), requires the conjoint simultaneous action/computations of all the distributed ‘spokes’ and the ‘hub’ for conceptual representation. The hub is purported to be located specifically in the anterior temporal lobes, bilaterally. We tested a hypothesis congruent with the hub-and-spoke model, specifically that activation during semantic processing tasks should be observed both in a distributed set of unimodal association areas and the ATL. Moreover, this distributed pattern of activation should be observed irrespective of the input modality such that association areas not directly engaged by the stimulus should be activated in addition to those directly stimulated (e.g., auditory areas should be engaged by visual semantic stimuli). In order to disentangle activation related to semantic processing from that related to modality-specific perceptual processes, we employed a semantic manipulation of familiarity. Familiar and novel entities were closely matched in terms of perceptual complexity and therefore we hypothesized that any difference in activation between these groups of entities would reflect ‘depth’ of semantic encoding rather than pre-semantic perceptual differences.
Familiar object concepts are likely to have more coherent representations in the sensorimotor spokes, and as such recognition will evoke greater, more robust activations (see also Bar et al., 2001).
We interpret our findings as broadly supportive of the overarching predictions of the hub-and-spoke model. The bilateral ATL exhibited a greater response to familiar items compared to newly learned entities. The ROI analyses demonstrated that this ATL activation occurred irrespective of the input modality (environmental sound or visual form), a finding that is consistent with previous investigations delineating this cortical region as a transmodal hub (Vandenberghe et al., 1996; Visser, Jefferies, & Lambon Ralph, 2010). Our major finding relates to the contribution of unimodal spoke regions to semantic processing. We demonstrated that activation of auditory and visual association regions during a semantic task does not demand direct sensory stimulation. Auditory association cortex exhibited activation not only when semantic knowledge was probed by auditory stimuli but also, albeit less strongly, when it was probed via visual form. This result suggests that presentation of the visual form of, for example, a dog activates representations of its characteristic environmental sound. Our results regarding the contribution of visual association regions were less straightforward, although we believe they are not at odds with our hypothesis.
Our a priori hypothesis regarding the location of a visual spoke was derived from previous functional imaging studies that demonstrated semantic category-related patterns of activation in the ventral occipito-temporal cortex including the posterior fusiform gyrus (Aguirre, Zarahn, & D’Esposito, 1998; Chao et al., 1999; Kanwisher, McDermott, & Chun, 1997). This region, situated anterior to early visual cortex (and the effect of modality in the present study), is associated with high-level visual processing, including object identification. Under the distributed-only framework, these category-related effects have been interpreted as indicative of an exclusive role in visual semantic feature information, which renders it a candidate visual spoke region (e.g., Martin et al., 2000). However, we failed to observe any significant differential response to familiar objects compared to newly learned objects within this region of interest that would support a conceptual-level function, particularly one of representing object-specific information. Category-related patterns of activity may, therefore, reflect a function that is limited to global visual feature conjunctions shared across many category-exemplars. Tyler and colleagues (2004) have argued that there exists a gradient of specificity along the posterior-anterior axis of the ventral temporal lobe, such that posterior regions encode gross domain distinctions with progressive conjunctions of features occurring along the anterior axis, giving rise to specific person and object-knowledge distinctions in the most anterior portions of the ventral visual pathway, including perirhinal cortex and proximal regions such as parahippocampal cortex (see also Bar et al., 2001). Our whole-brain analyses revealed greater activation for familiar objects in the medial temporal cortex bilaterally. This raises the possibility that the visual spoke is located more anteriorly than our ROI.
The effect of familiarity in the medial temporal lobe was, however, only observed in the visual form condition. Therefore, our hypothesis that an object’s sound would activate visual association regions, reflecting recapitulation of its form, was not met in this region. Still, failure to reject the null hypothesis may stem from the lower number of subjects in the sound condition and a lack of statistical power to detect this effect in this region.
The lateral posterior temporal and ventral parietal cortex abutting the lateral extrastriate visual cortex exhibited an effect of familiarity in both the visual form and sound conditions. The lack of a modality-specific effect implicates some of this region as omni-modal in nature. Indeed, this area has been described as heteromodal cortex important for conceptual representation (Bonner, Peelle, Cook, & Grossman, 2013; Price, Bonner, Peelle, & Grossman, 2015) and the fact that it is located between dorsal auditory regions and ventral visual regions makes it ideally suited for cross-modal integration (Beauchamp, Argall, Bodurka, Duyn, & Martin, 2004; Binney, Parker, & Lambon Ralph, 2012). It has also been implicated as part of a network subserving an executive component of semantic processing (Jefferies, 2013; Noonan, Jefferies, Visser, & Lambon Ralph, 2013). On the other hand, this region has also been demonstrated to exhibit category-specific patterns of activation (Chao et al., 1999) and thus might also be considered a candidate spoke region, possibly even a visual spoke. More specifically, it has been suggested that it may be a site for stored information about object motion (Martin, 2000; Beauchamp et al., 2003). Activation in this region could therefore reflect retrieval of object motion features. However, the present results could feasibly be explained by any one of these accounts, and our study design cannot adjudicate between them.
Analysis of the point-light motion condition failed to reveal any significant effects of familiarity. This brings into question whether the stimuli were suited to detect neural responses associated with access to object-specific information. Point light displays have been successfully demonstrated to evoke responses in the lateral posterior temporal cortex (Beauchamp, Lee, Haxby, & Martin, 2003) but these responses were modulated by category-level distinctions (biological vs. non-biological motion) that we did not explicitly contrast in the present study. Moreover, the lateral posterior temporal cortex may be rather less tuned to finer-grained within-category distinctions probed by familiarity. It is unclear, however, why there was no clear response to familiar and newly learned concepts in the whole brain analysis (with the exception of a small motor cortex cluster). One possibility is that point light may reflect an ecologically unnatural means to probe object recognition and subsequent conceptualization compared to the form and sound conditions (i.e., we have no evolutionary precedent for naming objects depicted from point light). As such, point light stimuli may elicit high inter-subject and inter-item variability that consequently produces null results at the group level.
The cognitive subtraction of novel from familiar objects offers a unique approach to isolating semantic activation. We observed such activation in regions considered to be some of the key components of the core network subserving semantic cognition and language comprehension (Jefferies, 2013; Turken & Dronkers, 2011), including the bilateral ATL, the posterior middle temporal gyrus and the temporo-parietal junction, which supports the effectiveness of this approach. This also suggests that greater activation for familiar relative to novel objects observed in unimodal sensory association regions reflects semantic upregulation of these regions. It remains to be demonstrated, however, whether this reflects processes key to conceptualization or other cognitive or perceptual processes that are differentially engaged by familiar objects/concepts but not directly associated with conceptual processing (see also Mahon, 2014; Reilly et al., 2014). Limitations in the temporal resolution of fMRI preclude addressing these contingencies directly in the present study. Pending future investigations that will, some alternatives are worthy of consideration. Rather than reflecting a process of conceptualization, greater activation of a directly stimulated modality-specific region by familiar compared to newly learned objects could reflect greater semantic feedback to sensorimotor regions acting as a top-down influence on high-level perceptual processing (Hon, Thompson, Sigala, & Duncan, 2009). When the modality-specific region is not directly engaged by the stimulus, the differential activity could be interpreted as reflecting enhanced mental imagery or perceptual simulation (although not necessarily at the level of conscious awareness). This might follow from familiar items having more coherent representation in the sensorimotor domain and richer associations in episodic memory.
Indeed, we also observed greater activation for familiar objects in posterior ventral cingulate cortex which has been linked to episodic memory processes, including the internal generation and maintenance of mental imagery (Vann, Aggleton, & Maguire, 2009). This is to say that activation of action-perception traces and episodic associations could be a result of feedforward propagation and secondary to conceptual processing, playing an elaborative rather than integral role. Mahon (2014) likened this phenomenon to the domain of speech perception where a common finding is that as we hear words, their corresponding orthographic representations are rapidly activated. This form of ‘passive priming’ may occur, but few would argue that activation of orthography is a necessary pre- or co-requisite for effective speech perception.
Contrasting familiar and novel entities to reveal semantic activation may also come with pitfalls. In particular, naming familiar entities is likely to be easier than naming novel entities and thus contrasting the two has the potential to yield activation patterns not entirely associated with conceptual processing. Such contrasts may invoke regions considered to play a functional role in the Default Mode Network (DMN), for example. The DMN is an anatomically-defined network that shows deactivation during goal-directed tasks as compared to rest, with some component regions showing stronger task-induced deactivations as task difficulty increases (Humphreys, Hoffman, Visser, Binney, & Lambon Ralph, 2015; Seghier & Price, 2012). The DMN includes medial prefrontal cortex, the precuneus, the posterior cingulate cortex, the hippocampus and the temporo-parietal junction (including the angular gyrus) (Buckner, Andrews-Hanna, & Schacter, 2008). Many of these DMN regions were activated in the familiar > novel contrasts we reported here. One explanation for this is that during rest, the brain is afforded more opportunity for task-unrelated thought, which may engage processes such as internal speech and mental imagery (Binder et al., 1999). Along the same lines, if the familiar items are easier to name, then components of the DMN may appear to activate more during these trials than in novel item trials not because they are more engaged but because they are less engaged in the task. Although we trained participants on both familiar and novel items to 100% accuracy, it remains possible/probable that naming novel items was more cognitively demanding than naming familiar items. Thus, task difficulty poses an alternative explanation for the large areas of activation we observed in the posterior cingulate cortex, precuneus and medial frontal cortex.
In conclusion, we provide novel empirical support for the hypothesis that both the ATL hub and modality-specific spokes are engaged in conceptual processing. Future work will address some of the above outlined issues and determine whether the observed involvement of modality-specific ‘spoke’ regions is necessary for performance of semantic tasks. The question of necessity arguably cannot be answered through fMRI or any other single experimental method currently at our disposal. For example, in the current study, we cannot know whether spokes are engaged in parallel or are instead sequentially “turned on” by rapid distillation through a hub. An answer to the question of necessity for spoke activation will be gleaned through converging and perhaps tandem evidence from a variety of sources with varying scales of temporal and spatial resolution (e.g., MEG, TMS, neuropsychology, fMRI).
We thank Alison O’Donoughue for her assistance in scheduling and recruiting participants. This study was supported by US Public Health Service Grants R01 DC013063 (JR) and T32 AG020499 (AG).