FMRI studies have revealed three scene-selective regions in human visual cortex (the Parahippocampal Place Area (PPA), Transverse Occipital Sulcus (TOS) and RetroSplenial Cortex (RSC)), which have been linked to higher-order functions such as navigation, scene perception/recognition, and contextual association.
Here, we document corresponding (presumptively homologous) scene-selective regions in the awake macaque monkey, based on direct comparison to human maps, using identical stimuli and largely overlapping fMRI procedures. In humans, our results showed that the three scene-selective regions are centered near - but distinct from - the gyri/sulci for which they were originally named. In addition, all these regions are located within or adjacent to known retinotopic areas. Human RSC and PPA are located adjacent to the peripheral representation of primary and secondary visual cortex, respectively. Human TOS is located immediately anterior/ventral to retinotopic area V3A, within retinotopic regions LO-1, V3B, and/or V7.
Mirroring the arrangement of human regions FFA and PPA (which are adjacent to each other in cortex), the presumptive monkey homologue of human PPA is located adjacent to the monkey homologue of human FFA, near the posterior superior temporal sulcus. Monkey TOS includes the region predicted from the human maps (macaque V4d), extending into retinotopically-defined V3A. A possible monkey homologue of human RSC lies in the medial bank, near peripheral V1.
Overall, our findings suggest a homologous neural architecture for scene-selective regions in visual cortex of humans and non-human primates, analogous to the face-selective regions demonstrated earlier in these two species.
A sense of ‘place’, and the ability to recognize the environment and localize oneself within it, is crucial for survival in most animals. Although place-related cues take myriad forms across the animal kingdom, visual cues predominate in humans and other primates. In humans, functional MRI studies (Aguirre et al., 1996; 1998; Epstein and Kanwisher, 1998; Ishai et al., 1999; Maguire, 2001; Grill-Spector, 2003; Hasson et al., 2003; Bar and Aminoff, 2003) have described three visual cortical regions that are more active during the presentation of ‘places’ (typically, scenes or isolated houses) compared to the presentation of other visual stimuli such as faces, objects, body parts, or scrambled scenes. Typically, these human brain regions are named for nearby anatomical landmarks: 1) Parahippocampal Place Area (‘PPA’); 2) Transverse Occipital Sulcus (‘TOS’); and 3) RetroSplenial Cortex (‘RSC’).
Although all three regions respond well to scenes, recent fMRI studies have revealed intriguing functional differences between them. For instance, PPA reportedly processes the visual-spatial structure of scenes (Epstein and Kanwisher, 1998), responding to changes in viewpoint and to scene novelty, but not during navigation tasks, whereas RSC responds in the opposite way (Epstein et al., 1999; Epstein et al., 2003; Park and Chun, 2009).
Such evidence suggests that these regions form a network for scene processing, analogous to the well-known network for face processing. Based on human fMRI, this face-processing network includes several regions, including OFA, FFA, and the anterior face region (e.g. Kanwisher et al., 1997; Grill-Spector et al., 2004; Rajimehr et al., 2009). Recent studies have revealed neurobiological mechanisms underlying this network by studying homologous regions in macaque monkeys (Tsao et al., 2003, 2008a; Rajimehr et al., 2009). Primate studies have shown that: 1) at least some of these face-processing regions are anatomically interconnected, as shown by microstimulation combined with fMRI (Moeller et al., 2008); 2) these regions are organized hierarchically, based on physiological recordings (Freiwald and Tsao, 2010); and 3) this face-processing network extends to prefrontal cortex, as demonstrated by fMRI activation (Tsao et al., 2008b). Thus, studies of the face-processing network in monkeys have greatly expanded our understanding of the neurobiological substrates of face perception and recognition.
Analogously, our main goal here was to test for macaque homologues of human PPA, TOS and RSC, to enable subsequent studies of scene-processing mechanisms in macaque cortex. To generate an optimal reference map, we first defined the precise locations of these regions in human cortex. These maps indicated that all three scene-selective regions are centered near but not on the sulci/gyri for which they were named. Moreover, these ‘scene-selective’ regions are located within or adjacent to known retinotopic areas, including the lowest-tier areas V1 (adjacent to RSC) and V2 (adjacent to PPA). In macaques, the homologue of PPA is located adjacent to the FFA homologue, mirroring the topography of adjacent human regions FFA and PPA. The macaque fMRI also revealed a homologue of human TOS, which included V3A. Preliminary versions of this work have been presented previously (Devaney et al., 2008).
Seventeen normal human subjects (7 females, 22–33 y/o) with normal or corrected-to-normal vision were tested in 1–3 experimental sessions each (Table 1). Written informed consent was obtained from each subject before the experiments. All experimental procedures were approved by Massachusetts General Hospital protocols.
Seven juvenile male macaque monkeys (Macaca mulatta) were used in these studies (Table 2). Three of the monkeys (4–6 kg) were studied at the Massachusetts General Hospital (MGH), and four (5.0–8.5 kg) were studied at the National Institute of Mental Health (NIMH). Surgical details and the training procedures for the monkeys were similar across the two sites, and described in detail elsewhere (e.g. Vanduffel et al., 2001; Tsao et al., 2003; Bell et al., 2009). All experimental procedures conformed to NIH guidelines, and were approved by experimental protocols at MGH and NIMH, respectively.
Human subjects were scanned in a horizontal 3T Siemens Tim Trio MR imager at MGH. Gradient echo EPI sequences were used for functional imaging (TR 2000 ms, TE 30 ms, flip angle 90°, 3.0 mm isotropic voxels, and 33 axial slices). A 3D MP-RAGE sequence (1.0 mm isotropic) was used for high-resolution anatomical imaging from the same subjects.
Throughout the functional scans, all subjects continuously fixated a small fixation spot at the center of the visual display. To control attention level during the functional scanning, subjects reported an unpredictably timed color change for the fixation target, except as noted. Each session consisted of 10–15 functional runs, and each run contained 14 blocks (block duration = 16 or 24 s).
All primates were implanted with an MR-compatible headpost, and trained to work in the sphinx position in an MR-compatible horizontal restraint device. As in the human task, all monkey subjects were required to fixate a small spot at the center of the display screen, near-continuously. Eye position was monitored using an infrared pupil tracking system (ISCAN Inc.). Monkeys were rewarded with water or juice for maintaining fixation within a square-shaped central fixation window (typically, 2° × 2° in size) surrounding the fixation spot.
Primate scanning at MGH used the 3T scanner described above. A gradient echo EPI sequence was used for functional imaging (TR 2000 ms, TE 19 ms, flip angle 90°, 1.0 mm isotropic voxels, and 50 axial slices). Each monkey session consisted of 20–25 functional runs, with each run containing 14 blocks (block duration = 30 or 40 s). Each monkey was scanned for 2–5 sessions, and data from all sessions were averaged together. To increase functional sensitivity in the monkey scans (in part, to compensate for smaller voxels in the smaller primate brains), we used a gradient insert coil (Siemens AC88), parallel imaging with a four-channel phased array coil, and an exogenous contrast agent (MION; 8–10 mg/kg IV). Previous studies within the same animals (Vanduffel et al., 2001; Leite et al., 2002; Tsao et al., 2003) have confirmed that MION and BOLD label corresponding cortical areas, although within-area activity details may differ slightly (Smirnakis et al., 2007). For each monkey, structural scans were also acquired using a 3D MP-RAGE sequence (0.35 mm isotropic voxels), during anesthesia.
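The run lengths implied by these parameters can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that every block within a run has the same duration and that no extra dummy volumes are acquired, neither of which is stated in the text. The function name `run_timing` is hypothetical.

```python
def run_timing(n_blocks, block_s, tr_s=2.0):
    """Return (run duration in seconds, number of volumes) for one run.

    Assumes equal-length blocks and no dummy volumes (assumptions, not
    details reported in the text).
    """
    duration_s = n_blocks * block_s
    n_volumes = int(round(duration_s / tr_s))
    return duration_s, n_volumes


# Values stated in the text: 14 blocks per run, TR = 2000 ms.
designs = [
    ("human, 16 s blocks", 16),
    ("human, 24 s blocks", 24),
    ("monkey (MGH), 30 s blocks", 30),
    ("monkey (MGH), 40 s blocks", 40),
]
for label, block_s in designs:
    duration_s, n_volumes = run_timing(14, block_s)
    print(f"{label}: {duration_s} s per run, {n_volumes} volumes")
```

Under these assumptions, a human run lasts 224 or 336 s and a monkey run at MGH lasts 420 or 560 s.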
At NIMH, imaging data were collected using a 3T GE scanner. A gradient echo EPI sequence was used for functional imaging (TR 2000 ms, TE 17.9 ms, flip angle 90°, 1.5 mm isotropic voxels, 27 coronal slices) with an 8-channel surface coil array, based on MION (7–11 mg/kg IV). Each session consisted of 10–30 functional runs containing 3 blocks (block duration = 40 s). Each monkey was scanned for 2 sessions, and data from all sessions were averaged together. High-resolution T1-weighted whole-brain anatomical scans (voxel size: 0.5 mm³) were also acquired on a 4.7T Bruker scanner with a Modified Driven Equilibrium Fourier Transform (MDEFT) sequence.
For all human and monkey subjects, functional and anatomical data were preprocessed and analyzed using FreeSurfer (http://surfer.nmr.mgh.harvard.edu/). For each subject, the cortical surface was extracted and reconstructed, allowing analysis on both the ‘inflated’ and ‘flattened’ views.
All functional images were motion corrected, spatially smoothed (unless otherwise noted) using a 3D Gaussian kernel (2.5 mm HWHM in humans and 1 mm HWHM in monkeys), and normalized across scans. The estimated hemodynamic response was defined by a γ function, and then the average signal intensity maps were calculated for each condition. Voxel-wise statistical tests were based on a univariate general linear model. The significance levels were projected onto the inflated/flattened cortex after a rigid co-registration of functional and anatomical volumes. For monkey data, additional manual corrections were also applied to avoid possible misalignment between functional and structural scans. Using FreeSurfer, functional maps were spatially normalized across sessions (in monkeys) and across subjects (in humans and monkeys). Then, activity within individual human and monkey brains was transformed spatially onto the ‘averaged human’ and ‘averaged monkey’ brains, respectively (for details see Fischl et al., 1999), and averaged using a fixed-effects model.
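Smoothing kernels are often specified by FWHM or by standard deviation rather than HWHM, so it may help to convert the values above. The sketch below uses the standard Gaussian relations FWHM = 2 × HWHM and σ = HWHM / √(2 ln 2). The function names are hypothetical and are not part of FreeSurfer.

```python
import math


def hwhm_to_fwhm(hwhm_mm):
    """Full width at half maximum is twice the half width."""
    return 2.0 * hwhm_mm


def hwhm_to_sigma(hwhm_mm):
    """Standard deviation of a Gaussian with the given half width at half maximum."""
    return hwhm_mm / math.sqrt(2.0 * math.log(2.0))


# Kernel sizes stated in the text.
for species, hwhm in [("human", 2.5), ("monkey", 1.0)]:
    print(
        f"{species}: HWHM {hwhm} mm -> FWHM {hwhm_to_fwhm(hwhm)} mm, "
        f"sigma {hwhm_to_sigma(hwhm):.3f} mm"
    )
```

By this conversion, the human kernel corresponds to 5 mm FWHM (σ ≈ 2.12 mm) and the monkey kernel to 2 mm FWHM (σ ≈ 0.85 mm).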
As noted in different analyses, the averaged human cortical surface was based on either the 10 subjects participating in our main study, or 40 independent human subjects (FreeSurfer). For all monkeys, we generated an averaged anatomical surface based on the four NIMH monkeys, and projected the averaged activity onto those anatomical maps.
In human subjects, flattened maps were generated using largely automated routines in FreeSurfer. These procedures automatically created a number of cuts around the medial aspect of the inflated surface: one in a region around the corpus callosum to remove all midbrain structures, one down the fundus of the calcarine sulcus, a set of equally spaced radial cuts, and a sagittally oriented cut around the temporal pole. The resulting cut surface was projected onto a plane that was oriented perpendicular to the average surface normal at each cortical site. Further details of these procedures are described elsewhere (Fischl et al., 1999).
For all experiments (human and macaque) at MGH, stimuli were presented via an LCD projector (Sharp; 1024 × 768-pixels resolution, 60-Hz refresh rate) onto a rear-projection screen using a PC. Matlab 7.0 and Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) were used to program the experiments. The stimuli were presented in a blocked design. Within a given functional scan, the first and last blocks were always null epochs (i.e., a fixation point on a black background), to allow the hemodynamic response to reach a steady state. The remaining stimulus blocks were ordered pseudo-randomly, without a rest period between them. Within each block, stimuli (see below) were presented for 1 s.
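The block ordering described here (null epochs first and last, stimulus blocks shuffled in between) can be sketched as follows. This is not the Matlab code used in the experiments. It assumes two conditions with equal numbers of blocks and a seeded shuffle, and the function name and seed are hypothetical.

```python
import random


def make_block_order(conditions, n_blocks=14, seed=0):
    """Return a block sequence with null epochs first and last.

    The stimulus blocks in between are balanced across conditions and
    shuffled. Balancing and the seed are assumptions for illustration.
    """
    n_stim = n_blocks - 2  # first and last blocks are null epochs
    if n_stim % len(conditions) != 0:
        raise ValueError("stimulus blocks cannot be split evenly across conditions")
    stim_blocks = list(conditions) * (n_stim // len(conditions))
    rng = random.Random(seed)  # seeded so the order is reproducible
    rng.shuffle(stim_blocks)
    return ["null"] + stim_blocks + ["null"]


order = make_block_order(["scenes", "faces"])
print(order)
```

With 14 blocks per run, this leaves 12 stimulus blocks, i.e. 6 per condition in a two-condition comparison such as scenes vs. faces.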
Corresponding stimulus presentation details were similar for the monkeys tested at NIMH. There, stimuli were presented via a Sharp Notevision3 projector (resolution 1024 × 768), via Presentation software (12.2). Each block lasted 40 s, during which 20 images were presented for 2 s each, alternating with 20 s fixation blocks (neutral gray background). Individual scanning runs began and ended with a block of baseline fixation.
We used 4 different sets of scenes. Image set #1 included achromatic (gray-scaled) scenes, including 23 images of furnished or empty rooms, and 23 outdoor scenes (cities or natural landscapes). Set #2 included 8 naturally colored images of the scanning rooms that were all familiar to the subjects (Rajimehr et al., 2009). Set #3 were 8 achromatic scenes of familiar locations outside the scanning rooms, including both indoor and outdoor images. Set #4 included 8 achromatic scenes of unfamiliar places, including both indoor and outdoor images.
Three different sets of face images were used in this experiment. Image set #1 were 23 images of individual faces (contrasted with scene set #1). Set #2 included 8 colored face mosaics that included multiple equal-sized faces adjacent to each other (contrasted with scene set #2; Rajimehr et al., 2009), of equal retinotopic extent to the scene set. Set #3 included computer-generated (FaceGen) faces, similar to those used by Yue et al. (2011).
Set #1 included 8 unfamiliar computer-generated objects (‘blobs’) (Yue et al., 2010). Set #2 were 8 images of tools (Bell et al., 2009). Set #3 included 8 scrambled versions of the scene stimuli. The scrambled images were based on perturbing a random noise field at different scales, to match the original image statistics (Portilla and Simoncelli, 2000).
To map the retinotopic organization within the central ~half of the cortical representations (10° radius in the visual field), we used two complementary sets of retinotopic stimuli. Set #1 were scenes and face mosaics (Face set #2), which were presented within retinotopically-limited apertures, on a black background. The retinotopic apertures included: (1) a foveal disk (1.5° radius); (2) a peripheral annulus (5° inner radius and 10° outer radius); (3) an upper vertical meridian wedge (10° radius and 60° angle); (4) a lower vertical meridian wedge (10° radius and 60° angle); (5) a left horizontal meridian wedge (10° radius and 30° angle); (6) a right horizontal meridian wedge (10° radius and 30° angle). Set #2 were phase-encoded, contrast-reversing (1 Hz) checkerboards within continuously rotating rays or continuously expanding/contracting ring stimuli, as described previously (Sereno et al., 1995; Tootell et al., 1997).
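The geometry of the six apertures in retinotopic set #1 can be made explicit with a small point-in-aperture test. The sketch below assumes that each wedge has its apex at fixation and that the stated wedge angle is the full angular width, centered on the named meridian. These details are not spelled out in the text, and all names are hypothetical.

```python
import math

# Aperture sizes from the text, in degrees of visual angle.
FOVEAL_RADIUS = 1.5
ANNULUS_INNER, ANNULUS_OUTER = 5.0, 10.0
WEDGE_RADIUS = 10.0
# name -> (centre direction, full wedge angle); directions are measured
# counter-clockwise from the right horizontal meridian (assumed convention).
WEDGES = {
    "upper_vertical": (90.0, 60.0),
    "lower_vertical": (-90.0, 60.0),
    "left_horizontal": (180.0, 30.0),
    "right_horizontal": (0.0, 30.0),
}


def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def apertures_containing(x_deg, y_deg):
    """Return the set of aperture names that contain the point (x, y)."""
    ecc = math.hypot(x_deg, y_deg)
    polar = math.degrees(math.atan2(y_deg, x_deg))
    hits = set()
    if ecc <= FOVEAL_RADIUS:
        hits.add("foveal_disk")
    if ANNULUS_INNER <= ecc <= ANNULUS_OUTER:
        hits.add("peripheral_annulus")
    for name, (centre, width) in WEDGES.items():
        if ecc <= WEDGE_RADIUS and angular_distance(polar, centre) <= width / 2.0:
            hits.add(name)
    return hits


print(apertures_containing(0, 7))   # on the upper vertical meridian, 7 deg out
print(apertures_containing(7, 7))   # oblique location, ~9.9 deg out
```

Note that the apertures overlap by design: a point on the upper vertical meridian at 7° eccentricity falls in both the peripheral annulus and the upper vertical wedge.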
In one subject, we also mapped the representation of the far peripheral visual field, using radially-scaled, contrast-reversing checkerboards presented at a range of eccentricities from 70° to (and beyond) the visible limits of the visual field, centered on the vertical and horizontal meridians (retinotopic set #3).
Stimuli were identical to the human scene set #2, face set #2, and retinotopic set #1, described above.
The stimuli used at NIMH were achromatic photographs from three image categories, all relatively familiar to the monkeys. Set #1 were individually presented monkey faces, from the local colony. Set #2 were scenes of the NIMH scanning, training and housing rooms. Set #3 were objects from those environments. Retinotopic stimuli were not used in the monkeys at NIMH.
Figure 1A–E illustrates the group-averaged scene-selective activity from the main group of human subjects (n=10, Table 1), using faces as control images, in the folded (Figure 1A–B), inflated (Figure 1C–D) and flattened (Figure 1E) cortical surfaces. Consistent with previous studies (Epstein et al., 2007; Park and Chun, 2009), we found significantly higher responses to scenes in three main regions, bilaterally, in the vicinity of: 1) the parahippocampal gyrus (‘PPA’), 2) the transverse occipital sulcus (‘TOS’), and 3) the retrosplenial cortex (‘RSC’).
For comparison, Figure 1F shows an fMRI map from an awake fixating macaque monkey, in response to the same stimuli, displayed in the same cortical surface format. As in humans, multiple scene-biased regions were evident in the macaque. Regions that appear to correspond in the two species (presumptive homologues) are named accordingly in white (cf. Figure 1E and 1F). Below, this putative map correspondence was tested in detail.
For simplicity and historical continuity, we used the original names for the human scene-selective regions PPA, TOS and RSC (Epstein et al., 1999; Grill-Spector, 2003; and Maguire, 2001, respectively). We also extended the original naming scheme to indicate presumptive monkey homologues of these areas, by adding ‘m’ (i.e. mPPA, mTOS, mRSC). However because the present evidence revealed inaccuracies in all these names, a new set of names is proposed in the discussion section, which remain correct across both human and macaque cortex.
To clarify the functional maps of PPA, it is helpful to first document a detail in the anatomical maps. Generally, the fusiform gyrus is described as a single uninterrupted gyrus (e.g. Polyak, 1957; Duvernoy, 1999). However, one group (Chao et al., 1999; Haxby et al., 1999) distinguished fMRI activity on the ‘medial fusiform’ gyrus from that on the ‘lateral fusiform’ gyrus. Here we found that this functional subdivision has a rough anatomical correlate: the central portion of the fusiform gyrus is usually split along its length by a shallow sulcus. We named this the ‘middle fusiform sulcus’, separating the ‘medial fusiform’ gyrus from the ‘lateral fusiform’ gyrus.
Figure 2 shows this anatomical feature in the averaged MRI-based cortical surfaces from two independent subject pools: 1) the current group average (n = 10; Figure 2A–B) and 2) the averaged surfaces from the standard FreeSurfer average brain (n = 40; Figure 2C–D). This cortical surface analysis averages the cortical folding pattern (i.e. the gyri and sulci) without conventional volumetric (3D) blurring. However, note that the cortical folds in each individual surface are best fit to the group-averaged folding pattern, so the individual maps are subject to minor 2D misalignment relative to the group map (Fischl et al., 1999).
The middle fusiform sulcus (white arrow) is apparent in both group-averaged cortical surfaces (Figure 2B and D). In the n = 40 surface, the middle fusiform sulcus is only 2.5 mm deep, thus ~5 mm across the cortical surface. By contrast, the two sulci defining the external border of the fusiform gyrus (i.e. collateral and temporal occipital sulci) are much deeper, with maximum depth of ~10 and 6 mm, respectively. In our n=17 subject pool, the group values were similar: the depth and length of the middle fusiform sulcus in those individual surfaces ranged from 2–5 mm and 8–54 mm, respectively.
To confirm the presence of this sulcus in actual brains, we examined ex vivo brains from human autopsy. A middle fusiform sulcus was present in 20 of 24 hemispheres examined (83%). Examples are shown in Figure 3.
Figure 4 shows the location of scene-selective activity in this region (‘PPA’), from the main human dataset (n=10), based on group-averaged maps of the anatomy and function from a common set of subjects. Also shown is the center of fMRI activity (the voxel showing the highest statistical bias for scenes) in the group average (Figure 4C) and in the individual data comprising our group map (Figure 4D). Counter to expectations, we found that this ventral scene-selective region (the ‘parahippocampal place area’) was not centered on the parahippocampal gyrus. Instead, it was consistently centered near the lateral lip of the collateral sulcus, where it meets the medial fusiform gyrus, in both our group-averaged and the individual maps, in both hemispheres. Of course, a lower-amplitude activity bias could extend onto the parahippocampal gyrus, depending on the statistical threshold chosen, the levels of signal averaging and spatial filtering, and variations between individuals.
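The ‘center’ used here is simply the location of the maximum statistic within the region. A minimal sketch of that definition follows, using made-up values rather than data from this study. The function name and the dictionary layout are hypothetical.

```python
def peak_voxel(stat_map):
    """Return (index, value) of the largest entry in a {voxel_index: statistic} map."""
    index = max(stat_map, key=stat_map.get)
    return index, stat_map[index]


# Toy, made-up statistics (not data from this study), keyed by (i, j, k) voxel index.
toy_map = {
    (10, 22, 5): 3.1,
    (11, 22, 5): 6.4,
    (11, 23, 5): 5.9,
    (12, 23, 5): 2.0,
}
print(peak_voxel(toy_map))
```

Because the peak is a single location, it is unaffected by the statistical threshold, unlike the apparent extent of the activated region.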
There is no single, quantifiable stimulus comparison for localizing PPA. Instead, different studies have localized this region based on correspondingly different scenes or houses, contrasted with various sets of faces, objects, body parts, and/or scrambled scenes. Thus it could be argued that the location of ‘PPA’ varies with the stimuli used to localize it. This could occur if the optimal stimuli vary continuously (instead of area-wise) across the cortical sheet (e.g. Wang et al., 1996; but see Tootell et al., 2008). Alternatively, it could occur in some models of a distributed representation (e.g. Ishai et al., 2000a, b). Conceivably, either of these hypotheses could explain the presence here of a scene-selective patch of activity located lateral to, instead of on the parahippocampal gyrus.
To address this, we directly tested whether the location and topography of ‘PPA’ varies due to corresponding stimulus variations. Figure 5 shows the results produced by four different sets of scenes vs. natural and computer-generated faces, objects, or scrambled scenes (see Methods). Despite these wide stimulus variations, the topography of ‘PPA’ remained remarkably constant in comparisons within a common subject pool.
Thus the unexpected localization of the scene-selective region here (away from the crown of the parahippocampal gyrus) cannot be attributed to stimulus differences between the current vs. past studies. Instead, these results in ‘PPA’ are fully consistent with results in classic lower-level visual areas, such as V1, V2, MT: none of these areas change shape or move across the cortical map, dependent on object stimulus variations.
How does the unexpected PPA localization here compare with analogous localizations in the literature? To clarify this, the following meta-analysis was conducted. The centers of previously published scene-biased activity in this region were translated onto a common, standardized cortical surface (using FreeSurfer and its averaged human brain) based on Talairach coordinates (Talairach and Tournoux, 1988) reported in previous studies (Table 3). Coordinates were found in 12 neuroimaging comparisons of scenes or buildings, relative to faces, objects, or scrambled scenes. Each study was assigned a character, and that distribution is shown in Figure 6. Eleven studies were based on fMRI; one was based on PET.
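One simple way to place a reported volumetric coordinate on a cortical surface is to assign it to the nearest surface vertex. The sketch below illustrates that idea only. It is not necessarily the FreeSurfer procedure used for this meta-analysis, and the coordinates shown are hypothetical values chosen for illustration, not entries from Table 3.

```python
import math


def nearest_vertex(point_mm, vertices_mm):
    """Return (vertex index, distance in mm) of the vertex closest to point_mm."""
    best_index, best_dist = None, math.inf
    for index, vertex in enumerate(vertices_mm):
        dist = math.dist(point_mm, vertex)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist


# Hypothetical surface vertices and a hypothetical reported peak, in mm.
toy_vertices = [
    (24.0, -44.0, -10.0),
    (27.0, -45.0, -12.0),
    (30.0, -40.0, -14.0),
]
toy_peak = (26.0, -45.0, -11.0)
index, dist = nearest_vertex(toy_peak, toy_vertices)
print(index, round(dist, 3))
```

The returned distance gives a rough indication of how far a published coordinate lies from the surface, which is useful when coordinates from different studies were reported in slightly different reference spaces.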
The averaged center of PPA in the current data (asterisk, from Fig. 4C) lies squarely in the middle of these previously published sites; thus our data were representative. Among the previously published sites, five were located on the crown of the medial fusiform gyrus, but none was on the crown of the parahippocampal gyrus. This and prior descriptions (Haxby et al., 1999; Levy et al., 2004) suggest that the ‘parahippocampal place area’ is not centered on the parahippocampal gyrus (see Discussion); instead it is located lateral to that gyrus. However as noted above, sub-maximal activity beyond the center can extend onto adjacent regions of the cortical surface (including the parahippocampal gyrus), depending on thresholding and related factors.
All but one of the remaining sites were located along the lip of the collateral sulcus, which divides the medial fusiform gyrus from the parahippocampal gyrus. The confluence of published centers along the lip (but not within the depth) of the collateral sulcus may reflect signal contributions from the large vein that overlies the collateral sulcus (e.g. Menon et al., 1993; Kim et al., 1994), in addition to signal contributions arising from the gray matter itself.
Human PPA is located immediately adjacent to FFA in the cortical sheet, on the medial side, on the side closest to the splenium of the corpus callosum. Thus any candidate homologue of PPA in the monkey (‘mPPA’) should also lie immediately adjacent to monkey FFA (mFFA), on the side closest to the splenium.
To test that prediction, it was necessary to first localize mFFA as a reference landmark. Previously (Tsao et al., 2003; Rajimehr et al., 2009), the location of mFFA was defined based on quantitative transformation of cortical areas in the human and macaque maps, using fMRI and equivalent stimuli, based on maps from individual monkeys. In both reports, mFFA is the large, high-amplitude face-selective patch located ~mid-posteriorly along the length of the superior temporal sulcus, extending from the ventral bank onto the lip of the middle temporal gyrus (e.g. black asterisk in Figure 1F).
However, additional face-responsive patches have also been reported in this cortical region, which might confuse the accurate localization of mFFA. In the simplest account, both monkeys and humans have two main face patches in corresponding cortical regions of each hemisphere, with the more posterior patch comprising (m)FFA (Hadj-Bouziane et al., 2008; Rajimehr et al., 2009; Bell et al., 2009; Pinsk et al., 2005; 2009). Another account is more complex: the monkey has either three (Tsao et al., 2003) or six (Tsao et al., 2008a) face patches in each hemisphere, whereas humans have three (Tsao et al., 2008a) in this occipito-temporal region.
It is possible that this discrepancy arises in part from variation in the individual maps chosen for illustration. To date, group-averaged maps have not been calculated for the monkey face patches, which would reduce or eliminate such individual variations. To remedy this, group-averaged maps were first calculated from the fMRI data from three monkeys (Table 2) used throughout this study, based on the same localizing stimuli used in human subjects (faces vs. scenes). In the monkey experiments, we used an exogenous contrast agent (MION; see methods) which increased the spatial specificity of the MRI signal compared to the more conventional BOLD signal used in human studies (Mandeville et al., 1999; Vanduffel et al., 2001; Leite et al., 2002). These averaged data showed two main face patches in each hemisphere (Figure 7), consistent with those described earlier (Hadj-Bouziane et al., 2008; Rajimehr et al., 2009; Bell et al., 2009; Pinsk et al., 2005; 2009; see also Figure 1E).
To confirm this finding, we calculated a second group-averaged map based on an additional and independent set (n=4) of monkeys. This second set of activity maps was generated in a different laboratory (NIMH), using a different scanner, based on stimuli that were familiar to the monkeys (i.e. faces of conspecifics, scenes and objects from the laboratory) – as opposed to stimuli that were matched to the human localization studies, as tested first. Despite these technical differences, again the group averages showed two main face patches (black asterisks and arrowheads, Figure 8), as expected from previous reports (ibid.).
At lower thresholds, additional, smaller face-biased patches were sometimes found within a given monkey, as described elsewhere (Tsao et al., 2008; Ku et al., 2011). However the presence and location of such additional patches varied across animals, dependent on threshold level and other factors. Accordingly, those patches did not survive group averaging. Note also that face-selective activity in mFFA sometimes extended farther posteriorly (in or near V4d) as in the human maps (e.g. Figure 1F). However, in both species, retinotopic maps from the same subjects suggest that this variable posterior activity reflects a difference in stimulus size/position, not necessarily face selectivity per se.
Based on the cortical maps, a candidate mPPA should lie adjacent to this main face patch (mFFA) in the monkey cortical map, analogous to the relationship of FFA to PPA in the human map. Thus, in macaques, mPPA should lie on the crown of the middle temporal gyrus, slightly anterior and ventral to the posterior middle temporal sulcus.
Such a result has been shown in individual maps from two monkeys (Rajimehr et al., 2011). Here, that initial finding was confirmed in both sets of group-averaged data (Figures 7 and 8). In one hemisphere, scattered regions of scene-biased activity also extended into the region of the occipitotemporal sulcus (Figure 7D). However the latter activity was inconsistent in location, relative to the consistent peak of scene-selective activity in mPPA, in all four averaged hemispheres (Figures 7 and 8).
In humans, an additional focus of scene-selective activity is found in dorsal occipital cortex (Nakamura et al., 2000; Grill-Spector, 2003; Hasson et al., 2003; Epstein et al., 2005; Park and Chun, 2009) (Figures 1 and 9). Depending on experimental details, that dorsal patch can be as prominent as the one in PPA, in both amplitude and topographical extent. However, this dorsal occipital patch has received relatively little attention.
In the original report, the dorsal patch of scene-selective activity was localized on the transverse occipital sulcus; thus it was named ‘TOS’. However prior to that time, a classic retinotopically-defined area (‘V3A’) was also localized on the transverse occipital sulcus (Tootell et al., 1997). Thus, either: 1) the transverse occipital sulcus spans both activity-defined areas (i.e., V3A plus TOS); 2) the TOS region coincides with (or includes) V3A; or 3) the original localization of TOS is incorrect.
Our evidence supports the third hypothesis. When averaged across subjects and hemispheres, this scene-selective patch (‘TOS’) was centered on the crown of the lateral occipital gyrus (Figure 9), anterior and ventral to the transverse occipital sulcus. As in PPA, the centers of highest activity occurred on the edges of this gyrus, consistent with a contribution from the large veins overlying the adjacent sulci.
Human area V3A is easily defined based on retinotopic mapping stimuli, because it has a distinctive map of the complete contralateral visual field (Tootell et al., 1997). In Figure 10, we localized the scene-selective TOS region relative to retinotopically-defined area V3A, within all hemispheres in which V3A was unambiguously defined, based on two retinotopic criteria: 1) upper vs. lower field subdivisions, and 2) horizontal vs. vertical meridians (see Methods).
These data confirmed that TOS is consistently located immediately anterior and ventral to V3A, and dorsal to the confluent foveal representations in V1 through V3 (Figure 10). Thus TOS lies within explicitly retinotopic cortex – extending from V7 (Tootell et al., 1998) through V3B (Press et al., 2001) and LO-1 (Larsson and Heeger, 2003).
Next we tested whether a TOS homologue (‘mTOS’) exists in macaque visual cortex. When translated from the human maps to the macaque maps, a homologue for human TOS should lie immediately anterior to macaque V3A (Gattass et al., 1988), in macaque ‘V4d’, and/or the newly described retinotopic representations CIP-1, CIP-2 (Arcaro et al., 2011) and perhaps also the dorsal prelunate (DP) gyrus (Andersen et al., 1990; Heider et al., 2005).
However this specific human-to-monkey prediction is complicated by the existing maps of macaque V3A, which are not perfectly clear. The original single unit maps of V3A frequently showed a representation of the contralateral 180° on the anterior bank of the lunate sulcus, posterior to the prelunate gyrus (Van Essen and Zeki 1978; Gattass et al., 1988). However in some animals, the anterior (upper field) representation in V3A was less certain (Gattass et al., 1988). A similar uncertainty can be seen in fMRI maps of V3A in some macaques (e.g. the upper field representation in Figure 11). When defined by variations in polar angle, the fMRI maps of V3A in macaque consistently extend over the prelunate gyrus (Arcaro et al., 2011; see also Figure 11).
In all three animals in which the MR slice prescription included this region (MGH), we found patches of scene-selective activity in this general location, extending variably across both sides of the prelunate gyrus (e.g. Figure 1F, and Figure 7, black arrows). In two monkeys, we were also able to map the retinotopy (Figure 11). Direct comparison between the scene-biased and retinotopic maps showed that mTOS included area V4d, which is roughly the topographic equivalent of human areas V7, V3B and LO-1 (Figure 11C, D and F). However in macaques, this scene-selective activity also extended into area V3A, with some variability. In one hemisphere, mTOS was mainly in area V3A without any clear activity in area V4d (Figure 11E). Thus, ‘mTOS’ activity included V3A (as defined by the polar angle), plus areas more anterior to V3A (as in human TOS). Given the uncertainty in the definition of macaque V3A, it seems likely that the macaque TOS is homologous with human TOS.
A third patch of scene-selective fMRI activity was noted in human studies (Maguire et al., 1998; O’Craven and Kanwisher, 2000) and eventually attributed to ‘RSC’ (e.g. Maguire, 2001), referring to architectonically defined RetroSplenial Cortex (Brodmann, 1909). However, the fMRI-defined scene-selective ‘RSC’ has not been localized in detail.
In our human maps, scene-selective RSC was consistently located in the fundus of the parieto-occipital sulcus, bilaterally (Figure 12A–B). Extrapolating from many early architectonic studies, the scene-selective RSC region thus lies near the peripheral retinotopic representations of primary and secondary visual cortex, V1 and V2. To localize these regions in more detail, we first compared functional and anatomical maps based on group-averaged data (Figure 12). Scene-selective RSC was localized using our main group-averaged data based on faces vs. scenes, as described above. V1 was localized anatomically, based on increased myelination in the stria of Gennari (Hinds et al., 2008), as translated to the current brain surface using spherical coordinates (Fischl et al., 1999). The topography of V2 was based on two kinds of data: 1) previous fMRI studies of the retinotopy in human V2 (Sereno et al., 1995; Engel et al., 1997; DeYoe et al., 1996; Pitzalis et al., 2006; 2009) up to 60° eccentricity, and 2) flattened human cortical tissue stained for cytochrome oxidase (Tootell and Taylor 1995; Horton and Hocking, 1998) including the far peripheral representation, which reveals thin stripes that are known to span the width of V2 (Tootell et al., 1983; Horton, 1984).
According to these group data, RSC is located immediately adjacent to V1. The close proximity of RSC to V1 and V2 is somewhat surprising, given the higher-order properties reported for RSC (Epstein et al., 2007; Park and Chun, 2009; Vann et al., 2009) (see Discussion).
These maps also revealed a partially mirror-symmetrical topography in scene-selective regions PPA and RSC (Figure 12). Although PPA lies farther away from the border with V1, both RSC and PPA lie adjacent to the peripheral representation of V2: PPA is located adjacent to the representation of the upper visual field, while RSC lies adjacent to the representation of lower visual field.
Given these unexpected results in the group-averaged data, we conducted more detailed tests to confirm these conclusions within an individual subject. Figure 12D–F shows those results, based on patterns of fMRI activity produced by: 1) scenes vs. faces (set #2; to label RSC and PPA); 2) vertical vs. horizontal meridians in the central 20° (retinotopic set #1); 3) monocular activation of the visible limit of the ipsilateral far periphery (the monocular crescent) of the visual field, vs. the (invisible) farther periphery (see Methods). As a reference, we also included the group-averaged border of V1 based on the stria of Gennari.
Overall, we found a good match between the group-averaged data and the individual data. The retinotopically-defined border of V1/V2 (the vertical meridian representation) in the individual subject corresponded well with myelination boundaries in the group-averaged map (Figure 12E), within approximately the central half of V1, where both measures were available. In addition, the peripheral extent of checkerboard-driven activation in the individual map coincided with the peripheral border of V1 in the myelination map (Figure 12F). The peripheral extent of the checkerboard-driven activity spread slightly into adjacent areas, including presumptive V2 and the posterior portion of PPA. This spread of the checkerboard-driven activation was expected; previous studies have demonstrated that both V2 (e.g. Sereno et al., 1995; DeYoe et al., 1996; Engel et al., 1997) and PPA (Rajimehr et al., 2011) are strongly activated by flickering checkerboards.
As in the group map, RSC in this individual map was located immediately adjacent to the dorsal border of peripheral V1, thus occupying what would otherwise be the peripheral representation of V2. Also consistent with the group comparison, PPA was located adjacent to peripheral V2, at an eccentricity similar (or even more peripheral) to that of RSC.
Based on the translation of cortical maps across species, a presumptive macaque homologue of RSC should be located on the medial bank, in or adjacent to the parieto-occipital (medial) sulcus (POm; Pitzalis et al., 2006). In at least one of the monkeys, we confirmed the presence of that scene-biased patch, bilaterally (Figure 13). As in human RSC, this presumptive macaque homologue of RSC (‘mRSC’) was small in size and low in amplitude, in response to the localizer used here. This small size and amplitude may explain why mRSC did not reach threshold in the n=3 group map (Figure 7C–D).
The correspondence between scene-selective regions in human and macaque cortex is diagrammed in Figure 14.
We found that scene-selective fMRI activity in ‘PPA’ was typically centered on the lips of the collateral sulcus and adjacent medial fusiform gyrus, rather than on the parahippocampal gyrus per se. This was borne out in our MRI data (Figures 4 and 5), and in a meta-analysis of the literature (Figure 6). This finding is also consistent with a few reports describing functionally equivalent regions on the collateral sulcus (Levy et al., 2004) or medial fusiform gyrus (Chao et al., 1999; Haxby et al., 1999).
The discrepancy in localizing ‘PPA’ cannot be easily attributed to differences in experimental design or stimuli, relative to previous localizers. Although the size of PPA varied according to the stimuli we tested, the peak location and the topography of this area remained remarkably constant, within a given set of subjects (Figure 5).
In two independent group-averaged cortical surfaces (n=17 and n=40; Figure 2), and in 20 of 24 human brains from autopsy (Figure 3), we documented that a shallow sulcus (the middle fusiform sulcus) subdivides the fusiform gyrus into two parallel branches: the lateral and medial fusiform gyri. This middle fusiform sulcus roughly divides the scene-responsive fMRI activity (on the medial fusiform gyrus) from face-responsive activity (on the lateral fusiform gyrus). Since that middle fusiform sulcus was not considered in the original report (Epstein et al., 1998), it remains true that ‘PPA’ is located on the gyrus immediately medial to ‘FFA’, in both the present and the original accounts.
We compared maps across species in the cortical sheet, using functional landmarks, without considering the cortical folding patterns. This approach has become standard (Van Essen et al., 2001; Orban et al., 2004; Sereno and Tootell, 2005), partly because gyri and sulci vary enormously across species. For instance, macaques do not have a fusiform gyrus. Even when similar cortical folds exist, homologous areas vary in location relative to the cortical folds across species. For example, the well-established direction-selective area MT/V5 is located in the superior temporal sulcus in macaque, but in the inferior temporal sulcus in humans.
Previously (Rajimehr et al., 2011), we presented evidence for mPPA in two individual monkeys. Here we confirmed that finding in seven animals, in two independent group averages. In all cases, mPPA was defined as a patch of scene-responsive activity (Figures 7 and 8) centered exactly where a macaque homologue of human PPA should lie, adjacent to the most prominent face patch (mFFA). In the folded brain this location is ventral and slightly anterior to the posterior middle temporal sulcus (PMTS). Area TEO is centered roughly on the PMTS (Boussaoud et al., 1991); thus mPPA apparently lies immediately anterior to TEO. Like human PPA, mPPA is elongated along the posterior-to-anterior axis (Figures 1, 7 and 8). Thus by the local-neighborhood criterion, the human-to-macaque match is good. The more global comparison including areas much farther from PPA (e.g. anterior temporal lobe, the subiculum) may not match quite as well, consistent with the disproportionate expansion in some cortical regions in humans, relative to macaques (see Figure 14).
Our data (Figures 9 and 10) indicate that the human scene-selective region ‘TOS’ is actually centered on the nearby lateral occipital gyrus, rather than within its namesake, the transverse occipital sulcus. As shown previously (Tootell et al., 1997), the transverse occipital sulcus spans a different, retinotopically-defined area, V3A. Thus scene-selective ‘TOS’ should lie immediately anterior and lateral to retinotopically-defined V3A, in/near retinotopic human areas V7 (Tootell et al., 1998), V3B (Press et al., 2001) and/or LO-1 (Larsson and Heeger, 2006). That conclusion was confirmed here in six hemispheres (Figure 10), consistent with earlier illustrations in two hemispheres (Levy et al., 2004), and one of two hemispheres in Spiridon et al. (2006).
Macaque cortical maps showed a corresponding cluster of scene-selective patches in dorsal occipital cortex (‘mTOS’; Figures 1 and 11). As in human TOS, mTOS includes the area anterior to macaque V3A (i.e. area V4d). In macaques, mTOS also extends posteriorly into V3A (Figure 11), depending on how V3A is defined.
This possible posterior extension of mTOS in macaques (relative to humans) does not rule out the assumption of homology, because incremental changes occur naturally as cortical maps evolve across species. Moreover, if there is an inter-species shift in (m)TOS relative to V3A, this has a precedent in the existing literature. In humans, V3A shows high motion selectivity (Tootell et al., 1997). However in macaques, higher motion selectivity is instead found in area V3 (Van Essen et al., 1990). To the extent that mTOS includes V3A, the region of high scene selectivity would thus be located adjacent and anterior to the region of higher motion selectivity (Figure 11), in both humans and macaques. That is, both functional properties (sensitivity to motion and sensitivity to scenes) would have shifted by a single area.
A third scene-responsive area was named RSC, with reference to the architectonically defined RetroSplenial Cortex (areas 26, 29 and 30 of Brodmann, 1909). However, Brodmann’s report of small cytoarchitectonically-defined areas located posterior to the splenium (i.e. BA 26, 29 and 30) was not confirmed by subsequent anatomists (e.g. Economo, 1929; Bailey and Von Bonin, 1951), nor was an analogous area reported in macaque (Brodmann 1909). More importantly, the location of Brodmann areas 26, 29 and 30 does not overlap with the location of scene-selective ‘RSC’. Recently the original definition was blurred by widely broadening its borders (e.g. Epstein et al. 2007; Fenske et al., 2006) and/or the name itself (RetroSplenial ‘Complex’; Bar, 2007). In all of our data, scene-selective ‘RSC’ is a discrete region consistently located in the fundus of the parieto-occipital sulcus – roughly 1 cm from the original Brodmann areas.
Surprisingly, we also found that RSC is located immediately adjacent to V1, in what would otherwise be the peripheral representation of dorsal V2. Except for RSC, V1 is surrounded mainly by the second-tier cortical area V2. Thus RSC is unusual: it is an apparently higher-tier area (e.g. Park and Chun, 2009) that nevertheless borders the two lowest-level areas in the cortical visual hierarchy (Van Essen et al., 1990). Functionally similar areas are often located near each other (e.g. area MT/V5 and surrounding direction-selective areas), presumably because such adjacency can shorten the more numerous cortical connections between functionally related areas. However, counter-examples can also be cited, in which adjacent areas are not functionally similar. The proximity of RSC to V1/V2 may be an example of the latter.
The topography of these three areas supports certain observations in the literature. First, Gattass et al. (1988) reported that V2 does not include a representation of the far peripheral visual field, unlike that found in V1. Such a retinotopic difference would ‘make room’ for RSC along the V1 border, as reflected in our data. Secondly, our data are consistent with evidence for an asymmetry in dorsal versus ventral V2 in macaques (Van Essen et al., 1984; Felleman and Van Essen, 1991).
An even more restricted representation of eccentricity has also been reported in area V3 (Gattass et al., 1988; Van Essen et al., 1984). As described above, such an arrangement would ‘make room’ for PPA, adjacent to V2 (e.g. Figure 12).
The present data reveal numerous complications in the current names for scene-selective cortical regions. The human regions are not centered on the gyri/sulci for which they are named, and the human names cannot be accurately generalized to homologous areas in macaques. The latter discrepancies arise commonly in cross-species comparisons, because different species develop different sulci and gyri.
Above, we used the original names for the scene-selective regions, for historical continuity. However in Figure 14, we proposed a simple alternative naming scheme that would remain accurate across both humans and macaques. In the new scheme, regions PPA, TOS and RSC are re-named VS, DS and MS, respectively (for Ventral, Dorsal and Medial regions of scene-responsivity). Corresponding regions in humans and macaques would be distinguished using the prefix ‘h’ or ‘m’, yielding hVS, hDS, hMS in humans, and mVS, mDS and mMS in monkeys.
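For readers who want to relabel existing region tables, the proposed scheme can be written as a simple lookup. The following is a minimal illustrative sketch in Python, not part of the original analysis; the dictionary and function names (`PROPOSED_NAMES`, `SPECIES_PREFIX`, `proposed_label`) are hypothetical and introduced only for this example.

```python
# Illustrative sketch only: all names below are hypothetical, not from the study.

# Original landmark-based names mapped to the proposed position-based names
# (Ventral, Dorsal and Medial regions of scene-responsivity).
PROPOSED_NAMES = {"PPA": "VS", "TOS": "DS", "RSC": "MS"}

# Species prefixes used in the proposed scheme.
SPECIES_PREFIX = {"human": "h", "macaque": "m"}


def proposed_label(original_name: str, species: str) -> str:
    """Return the proposed label (e.g. 'hVS') for an original region name."""
    return SPECIES_PREFIX[species] + PROPOSED_NAMES[original_name]


if __name__ == "__main__":
    for species in ("human", "macaque"):
        print(species, [proposed_label(name, species) for name in PROPOSED_NAMES])
```

Under this sketch, `proposed_label("PPA", "human")` gives `hVS` and `proposed_label("RSC", "macaque")` gives `mMS`, matching the labels listed above.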
The demonstration of scene-selective regions in macaques enables future experiments using classical neurobiological techniques, to reveal common neural mechanisms underlying scene processing. For instance, what are the functional properties of single units in each of these scene-selective patches? Do the different scene-selective regions share specific neural connections with each other, and/or with higher-level brain regions implicated in place processing (e.g. hippocampus via entorhinal cortex), and/or spatial navigation (the dorsal stream)? An analogous proliferation of knowledge about neural mechanisms followed the demonstration of ‘face-selective’ patches in macaques based on fMRI (Tsao et al., 2003) – which were prompted in turn by fMRI studies on face-selective patches in humans (Kanwisher et al., 1997; Haxby et al., 2000; Rajimehr et al., 2009). Hopefully the current study will serve a similar purpose.
This work was supported by the National Institutes of Health (NIH Grants R01 MH67529 and R01 EY017081 to RBHT), the Martinos Center for Biomedical Imaging, the NCRR, the MIND Institute, Shared Instrumentation Grants 1S10RR023401, 1S10RR019, and 1S10RR023043, and the NIMH Intramural Research Program.