We have the remarkable ability to recognize thousands of visual objects in our daily environment, such as faces, bodies, cars, keys, shoes, animals, food, tools, and houses. Despite its complexity, visual categorization is executed rapidly and effortlessly by the human brain. These computations appear to be mainly carried out by the ventral visual pathway
[1],
[2], through neurons with increasingly larger receptive fields, responding to increasingly complex features of the stimuli as one moves up within the hierarchy. The physical features of the input image are generally assumed to be first extracted in lower-level cortical areas (i.e., V1, V2, V4)
[3],
[4] before they are projected to higher-level regions in the occipito-temporal cortex where complex patterns are processed
[5]–
[8] and a visual representation of the input image is formed
[9].
Functional neuroimaging studies (e.g. positron emission tomography, functional magnetic resonance imaging (fMRI)) in humans have examined the higher-level cortical regions involved in the visual perception of different objects. Faces
[10]–
[14], bodies
[15]–
[17], animals
[18],
[19], houses
[20]–
[22], tools
[18],
[19] and letter strings
[23]–
[25] have been shown to selectively activate focal regions of cortex. While the location of areas involved in object processing has been widely studied, the sequence and timing of activation of these areas are less well understood. The long-held assumption is that, at least during the first 100-ms, complex visual stimuli are generally processed in the same low-level areas
[26], and that category-specific cortical activation occurs at later stages.
For instance, intracranial recordings in patients have shown that during face perception the well-established face-selective area of the fusiform gyrus becomes strongly activated around 170-ms after stimulus onset
[24],
[26]–
[28]. This time course is corroborated by a prominent face-selective component around 170-ms
[29] in recordings of electrical (EEG) and magnetic (MEG) brain activity from the scalp in healthy volunteers, labeled the ‘N170’
[30] in EEG studies or ‘M170’ in MEG recordings.
However, this traditional model of object perception is challenged by recent psychophysical and electrophysiological findings suggesting that visual categorization processes may already take place at even earlier latencies
[31]–
[35], i.e. around 100-ms post stimulus onset. Thorpe and colleagues
[32],
[35] found evidence for rapid visual categorization (the detection of animals versus non-animals in natural images) taking place within the first 100–150-ms after stimulus onset. In addition, category specificity has been claimed for an earlier component that peaks around 100–120-ms after the onset of a visual stimulus at posterior sensors in EEG or MEG recordings, labeled the ‘P1’ and ‘M100’ components respectively, and for even earlier activity (30–110-ms post-stimulus). Most of these interpretations are, however, heavily debated, as they were based either on inter-categorical comparisons
[36]–
[39] which suffer from serious low-level confounds
[40], or on old-novel distinctions which may signal general repetition effects rather than face-recognition per se
[41]–
[44]. More convincing evidence for rapid face categorization was nevertheless provided by two studies free from low-level stimulus confounds. Liu and colleagues
[33] found that the M100 component is sensitive to the successful detection of faces embedded in noise. In addition, Debruille et al. found early differential responses between carefully matched photographs of known and unknown faces around 100-ms at frontocentral and centroparietal sites
[45].
The neuronal underpinnings of this proposed early phase of visual categorization remain, however, a puzzle. Reports on the neuronal origin of the P1/M100 response to faces have been inconsistent, as sources have been found in the retinotopic cortex of the medial occipital cortex
[36],
[46], posterior extrastriate cortex
[47],
[48], but also in high-level visual cortex of the mid-fusiform gyrus
[47].
We took advantage of the high temporal resolution of magnetoencephalography (MEG) combined with its relatively good spatial resolution to investigate whether category-specific cortical activation may already take place within the first 100-ms. Early visual electrophysiological responses are known to be extremely sensitive to the physical attributes of the stimulus
[40]. To avoid these low-level visual confounds, we did not contrast the responses to different stimulus categories directly, but instead examined inversion effects by comparing the responses to upright and inverted presentations of three stimulus categories: faces, bodies and houses. While physical stimulus properties remain unchanged, this simple procedure of stimulus inversion induces a large shift in the way some object classes are perceived, a phenomenon known as the inversion effect
[49]. This is presumably because presenting stimuli in a non-canonical orientation interferes with normal configural processing
[50].
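The logic of this within-category contrast can be illustrated with a minimal sketch. It assumes, purely for illustration, that single-trial responses for one category are available as NumPy arrays of shape (trials, time points); the function name `inversion_effect`, the array layout and the time window are hypothetical and are not taken from the analysis pipeline used in this study.

```python
import numpy as np

def inversion_effect(upright, inverted, times, window):
    """Mean inverted-minus-upright response within a time window.

    upright, inverted : arrays of shape (n_trials, n_times) for ONE category,
        so the contrast never compares physically different categories.
    times : array of shape (n_times,), seconds relative to stimulus onset.
    window : (start, end) in seconds; chosen by the analyst (hypothetical here).
    """
    start, end = window
    in_window = (times >= start) & (times <= end)
    # Average over trials first, then take the orientation difference.
    difference = inverted.mean(axis=0) - upright.mean(axis=0)
    # Summarize the difference wave within the window of interest.
    return difference[in_window].mean()
```

Calling `inversion_effect(upright, inverted, times, (0.08, 0.12))` separately for faces, bodies and houses would yield one value per category; the window shown is only a placeholder. Because each value is computed within a single category, differences between these values are less exposed to the low-level confounds of a direct face-versus-house comparison.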
For the present experiment, we selected faces as a stimulus category because they show a strong behavioral
[49]–
[53] and electrophysiological inversion effect, and the face-sensitive cortical areas have been carefully mapped out. EEG and MEG recordings have yielded a robust neural correlate of the face inversion effect, namely a delay and enhancement of the N170/M170 component
[30],
[46],
[48],
[54]–
[58]. It is therefore commonly assumed that the extraction of the overall stimulus configuration takes place during this stage. However, consistent with the aforementioned evidence of categorization taking place around 100-ms after stimulus onset
[32],
[33], there is now a growing number of observations of an earlier electrophysiological inversion effect occurring around 100–120-ms after picture onset for the P1/M100 component
[48],
[57],
[59].
Bodies were chosen as a second type of biologically salient stimuli with strong configurational properties and a special perceptual status somewhat similar to that of faces
[60]–
[63], similar electrophysiological correlates such as the elicitation of an N170 component
[61],
[64]–
[66] and an N170 inversion effect
[64] and a partially common neuro-anatomical substrate
[16],
[67]–
[69]. Houses were used as an example of a non-biological stimulus class with a clear canonical orientation but without a strong behavioral inversion effect
[50],
[63].
Participants viewed photographs of faces, bodies and houses presented in either their upright or inverted orientation or in a Fourier phase-scrambled version, and were asked to classify them accordingly, i.e. as upright, inverted or scrambled (see for examples of stimuli and the experimental paradigm). We used magnetoencephalography and anatomically-constrained distributed source modeling
[70] to monitor brain activity with millisecond resolution in order to examine early category-specific cortical activity related to the inversion effects during the M100 stage of visual processing. Our results show that category-specific processing in high-level category-sensitive cortical areas already occurs during the first 100-ms of visual processing, much earlier than previously thought, thereby shedding new light on the early neural mechanisms of visual object processing.
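For readers unfamiliar with Fourier phase scrambling, the sketch below shows one common way to generate such control images. It is an illustration only, not the procedure used to build the present stimulus set: the function name `phase_scramble` and the choice of adding the phase of a uniform noise image are assumptions.

```python
import numpy as np

def phase_scramble(image, rng):
    """Return a phase-scrambled copy of a 2-D grayscale image.

    The amplitude spectrum (and hence the spatial-frequency content) is
    preserved, while the phase spectrum is randomized, which destroys the
    recognizable structure of the picture.
    """
    spectrum = np.fft.fft2(image)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    # The phase of a real-valued noise image is conjugate-antisymmetric,
    # so adding it keeps the inverse transform real-valued.
    # Uniform noise in [0, 1) has a positive sum, so its DC phase is 0
    # and the mean luminance of the image is left unchanged.
    noise_phase = np.angle(np.fft.fft2(rng.random(image.shape)))
    scrambled = np.fft.ifft2(amplitude * np.exp(1j * (phase + noise_phase)))
    # Discard the negligible imaginary part left by floating-point rounding.
    return scrambled.real
```

A call such as `phase_scramble(img, np.random.default_rng(0))` returns an array of the same shape as `img`. Pixel values may fall slightly outside the original intensity range, so rescaling or clipping before display would be a separate, explicit step.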