In this experiment we found that, besides the P1 response (80-120 ms) over the lateral-occipital areas, attention to both spatial and non-spatial features modulated early sensory processing, as indexed by ERPs, at a latency and with a topography consistent with the earliest visual C1 response (60-100 ms) over mesial-occipital areas. In the first time window (60-80 ms), this earliest mesial activity showed an increase in negativity for shape-relevant stimulus pairs, independent of location relevance. It must be admitted that we cannot be absolutely certain that these earliest attention effects (60-80 ms) truly reflect the previously reported C1, which has its origin in the occipital calcarine fissure and inverts in polarity with stimulation of the upper and lower occipital cortical banks of this fissure. Rather than following this stimulation mode, our lateralized stimuli were centered on the horizontal meridian of the visual field and extended into both the upper and lower quadrants of the left or right side of the visual field. Despite this potential caveat, we believe that this earliest attentional modulation may actually reflect true C1 effects. Indeed, notwithstanding the uncrossed stimulation mode, there is a remarkable consistency between the precocious increase in negativity in the present study and the increase in negativity of the earliest C1 sensory response (N80, 40-80 ms) for attended compared with unattended spatial frequency gratings presented across the four quadrants of the visual field, as found in a recent report [
32]. In that study, a LORETA source inverse solution performed on the difference wave obtained by subtracting frequency-irrelevant from frequency-relevant stimuli also identified the active sources of the early attention effects in the primary visual cortex (BA17), the lateral occipital area (BA19), the superior parietal lobule (BA7), and various dorsolateral prefrontal regions. In the light of the consistency between the present earliest object-attention effects and the previous feature-related C1 attention effects in the crossed-quadrants study, we believe that, rather than bringing confounds to the debate on early attention modulation, the present results add to previous findings of C1 attention effects in the literature. They provide evidence that, besides stimulus features, attention to more complex targets, such as B/W drawings of familiar objects, can also modulate the activation of the striate cortex at the earliest level, thus probably enhancing the processing of objects' local details and enabling their precocious selection.
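As an illustration of how such attention effects are typically quantified, the following minimal sketch computes a difference wave (relevant minus irrelevant) and its mean amplitude in the 60-80 ms window. The waveforms are synthetic Gaussian bumps, and the sampling step, amplitudes, and the helper name `mean_amplitude` are assumptions made for illustration only; they are not the data or the analysis pipeline of the present study.

```python
import numpy as np

def mean_amplitude(erp_uv, times_ms, start_ms, end_ms):
    """Mean voltage of an averaged waveform within [start_ms, end_ms] (inclusive)."""
    window = (times_ms >= start_ms) & (times_ms <= end_ms)
    return float(erp_uv[window].mean())

# Hypothetical single-channel averages sampled every 2 ms (not data from the study).
times_ms = np.arange(-100, 402, 2)
c1_shape = np.exp(-0.5 * ((times_ms - 80) / 15.0) ** 2)  # bump peaking at 80 ms

erp_relevant = -1.5 * c1_shape    # assumed larger negativity for the relevant condition
erp_irrelevant = -1.0 * c1_shape  # assumed smaller negativity for the irrelevant condition

# Difference wave: relevant minus irrelevant.
difference_wave = erp_relevant - erp_irrelevant

# Attention effect in the earliest window discussed in the text (60-80 ms).
effect_60_80 = mean_amplitude(difference_wave, times_ms, 60, 80)
print("Mean difference, 60-80 ms:", effect_60_80)
```

With these assumed waveforms the mean difference in the window is negative, which is the sign pattern described above as an increase in negativity for relevant stimuli.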
The present data also showed a very early effect of location relevance on sensor activity, in terms of an increased positivity (P80) to location-relevant pairs, independent of their shape relevance or semantic category, starting as early as 60 ms post-stimulus, or earlier. It is interesting that, despite the use of complex shapes, this finding is highly consistent with previous findings by both our own [
29,
30,
32], and other research groups [
23,
28] using tasks requiring a voluntary allocation of spatial attention and simple stimulus features administered across the horizontal meridian. The early timing of this attentional modulation relative to stimulus onset is consistent with a source in the sensory occipital cortex, although this was unfortunately not directly corroborated here by any source reconstruction procedure. It is also worth noting that, despite the consistency of the present spatially-based C1 modulation with that found in a study by Fu et al. [
20], at least in terms of early timing, the involuntary (or reflexive) nature of the spatial attention allocation required by the task used in that study raises some questions about this consistency, and about the differential influences of voluntary and involuntary allocation of spatial attention on early visual sensory processing, which will have to be answered by further research.
It is interesting to note that shape and location relevance manifested with opposite polarities as far as the attentional modulation was concerned: paying attention to object features increased the negative voltage, whereas paying attention to spatial location increased the positive voltage of both the C1 and P1 components. These findings support the notion that the space-based and the object-based attentional mechanisms are partially anatomically and functionally segregated [
54,
55]. In agreement with previous findings [
34,
39,
30,
56,
30], the present data bring to light an intriguing dissociation: the ascending part of the C1 component was more sensitive to shape relevance per se at the mesial occipital sites, closer to the visual striate area, while the descending portion of the same component was more sensitive to the combination of the two features, as indicated by the interaction between shape and location relevance. Indeed, it is possible that the initial portion of the C1 component has a stronger striate contribution, whereas the second, later-latency portion of C1 reflects the later contribution of the extrastriate visual cortex, which is known to generate the P1 response and to be responsible for its space-based modulation [
57,
19]. Overall, these findings suggest that visual selective attention is able to modulate the neural processing of object features independently of spatial processing, leading to the conclusion of a neuro-anatomical distinction between the 'what' and 'where' neural pathways [58]. This conclusion is also strongly supported by empirical evidence of a significant interaction between voluntary visuospatial attention and perceptual load at the target discrimination processing level, reflected at the scalp surface by the N1 component, as a result of the activation of the temporoparietal-occipital (TPO) junction [
19].
An important theoretical issue to consider here is that our present findings clearly established that, as a result of the conjoined spatial and non-spatial attentional selection required by our task, visual cortex responsivity (including V1 activation) was cued to enhance the analysis of target shape attributes at both the relevant and irrelevant spatial locations, while spatial attention was simultaneously allocated to the relevant target location, from the earliest post-stimulus processing time. This runs counter to the traditional views of visual processing, which regard spatial attention as having a special status, and the spatial localization of visual input as preceding any feature or object selection carried out for the selective exploration of the outer world. In this regard, however, empirical evidence from visual brain studies has accumulated over the last two decades supporting the view that, on the one hand, the two selection mechanisms may operate in parallel right from the earliest levels of analysis, rather than being preceded by a space selection, and, on the other hand, feature-directed attention might directly precede attention to location, as repeatedly demonstrated in visual search studies. Indeed, there is a straightforward consistency between previous findings of our own research group in studies investigating the neural mechanisms of the conjoined selection of spatial and non-spatial features (i.e., spatial frequency) [
25,
29,
32], and the parallel processing view. Furthermore, evidence from single unit studies in monkeys demonstrated that object or feature-selection, rather than being preceded by a space selection, is centred "on line" on precise spatial coordinates allowing an "object-based space selection" [e. g., [
59]]. Additional single unit evidence by Motter [
60] and Treue and Martinez-Trujillo [
61] has indicated that attention may be allocated to non-spatial features in a location-independent manner. This finding has been confirmed in humans by means of functional magnetic resonance imaging (fMRI) by Saenz et al. [
62]. As for the view of a possible precedence of feature selection over spatial selection, combined event-related potential (ERP) and event-related magnetic field (ERMF) indexes of activation to task-relevant features, starting about 140 ms after stimulus onset and independent of location relevance, have most fascinatingly been reported in human volunteers performing a visual search task in which the spatial distribution of non-target items with relevant features was varied independently of the relevance of the location of the target. These activations were followed by a later lateralized response (the so-called N2pc component) reflecting the deployment of attention to the target location, which began at about 170 ms after stimulus onset [
63]. More recently, N2pc measurements during visual search tasks also revealed functional differences in the deployment of attention to objects and spatial locations, depending on whether the locations were marked by objects: attention was shifted to a cued location in anticipation of a target shape when the location was marked by a placeholder object, but not when the cued location was devoid of placeholders, thus indicating a deployment of attention directly to objects [
64].
It must be said that our C1 effects are somewhat earlier than both the feature-related response, starting at about 140 ms, and the N2pc effects (starting at 170 ms). However, inasmuch as visual search entails a larger, more time-consuming set of neural processes for detecting stimulus features before stimulus selection and recognition than our own conjoined selection task, and inasmuch as ERPs properly index the timing of neural processing, it seems plausible that our earlier-latency effects arise because our task required no prior search for the location of object features in order to perform the conjoined selection of spatial and non-spatial features.
A further point deserving discussion concerns our findings of independent effects for object-feature and spatial-location selection in the earliest, rising C1 time range (i.e., 60-80 ms), and of interactive effects of these conjoined features, already starting in the C1 peak and/or descending time range (i.e., 80-100 ms) and increasing as the attentional processing of the visual input progressed in time, as reflected at the scalp by the later-latency ERP components. Most fascinatingly, these findings seem to indicate that feature-conjoined selection results from the concurrent operation of multiple, progressively narrowing levels of task-related attentional selectivity, which have distinctive properties and are based on a progressively greater amount and a better quality of overall information about relevant input features. Some of these levels, of a higher order than C1, were mirrored here by the trends of the relatively late and late-latency components.
At the N1 level, the attentional effects not only reflected location-relevance, as reported by previous literature [
65,
29], but also object-feature effects. The latter effects were characterized by complex attention-related hemispheric asymmetries. In fact, over the right hemisphere, N1 was, overall, of greater amplitude for both the congruent shape-relevance (S+ and S-) conditions than for the mixed one (S+/-), thus plausibly reflecting the role of the right-sided ventral stream in the categorization of familiar shapes [66] independent of attention selection, consistent with the findings of a previous ERP study [
45] showing that the categorization effects between homomorphic, animal entities with faces and legs, and artefacts emerged at ~150 ms (N120-180 ms) over the right occipital-temporal sites. Conversely, over the left hemisphere an attentional selectivity between the relevant (S+) and irrelevant (S-) shapes was evident at the mesial- and lateral-occipital leads. Consistent with the earlier left-sided P1 shape-selection effects observed, the present N1 hemispheric asymmetries in attentional selectivity seem to suggest that the left hemisphere carries out a sharper sensory-perceptual selective processing across attentional relevance conditions than the right hemisphere. This meshes closely with the view of a predominance of the left hemisphere in attention selectivity [e.g., [
67]], in line with the accepted cognitive model of the latter hemisphere having a narrower attention focus and a more analytic attention strategy [
68,
69].
Most probably, the late positive (LP) component, reaching its maximum amplitude over the parietal sites, reflected the highest level of combined object and spatial processing, in terms of target selection and awareness, as well as decision-making processes. This component is likely to reflect stimulus categorization processes and the attentional effects due to the interaction between shape and location relevance. The lack of any attentional modulation for shape-relevant stimuli at the irrelevant location seems to indicate that, outside the focus of spatial attention, irrelevant stimuli are suppressed before being processed at the highest cognitive level [in this regard see [
29]].
The finding that location relevance affected LP amplitude at parietal sites fully meshes with both the classic electrophysiological reports of a larger late positive complex (LPC) to attended than unattended spatial targets [e. g., [
70,
29]], and the parietal activations shown by blood-flow studies during the covert shifting of visuospatial attention [e. g., [
71]]. However, activation of this same area has also been found in a feature conjunction search task [
72], and in a divided attention task involving global and local processing [
73], suggesting that this region is involved in more than shifting attention to a space location. In fact, it has also been demonstrated that the right superior medial parietal cortex is involved in overt and covert attention tasks of object- and space-based interactions [
74]. In agreement with these findings, our results for the N400 and LP components indicate an interaction of space and object feature processing over the parietal and central cortex. In particular, the N400 component may be conceived as a mismatch response sensitive to the processing of semantically incongruent stimuli [
75].
There are other important theoretical issues to be considered here. Although the differences found across the various shape-relevance conditions at C1 level in no way index object categorization processes
per se, these conditions being simply a reflection of the different degree of their attentional saliency, it has to be considered here that, besides differing in attentional relevance, the distracter pairs (S+/-) also differed somewhat in terms of stimulus-category features from the pairs of both the salient (S+) and inconspicuous (S-) attention conditions. In our view, then, it is reasonable to believe that the significant differences in neural processing levels found within the relevant location between the salient condition (L+S+) and the distracting one (L+S+/-), as well as the inconspicuous one (L+S-), might indicate that the visual system is somehow able to distinguish, at a first basic, unconscious level, between images of different semantic categories already at the earliest sensory processing level. Indeed, objective evidence in the literature seems to support this claim. On the one hand, man-made categories have more energy in 'cardinal' (i.e., vertical and horizontal) orientations compared to natural categories [
76]. On the other hand, animals have been indicated to be more 'homomorphic' (i.e., they all have heads and eyes that are generally round, and legs) than artefacts, which tend to contain more rectilinear strokes [
77]. Additionally, and most importantly, faces and man-made objects naturally vary in their Fourier spatial frequency amplitude spectrum (AS), with a steeper spectrum decrease for faces compared to natural images [
78]. Consistent with the latter evidence, it has also been shown that rapid image recognition can be biased by simply priming the amplitude spectrum information [
79,
80]. All in all, these indications strongly support the view that the human brain may be intrinsically tuned to this low-level information characterizing faces and body parts, thus facilitating the rapid detection of 'homomorphic' traits.
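To make the notion of a "steeper" Fourier amplitude spectrum concrete, the following minimal sketch estimates the log-log slope of the radially averaged amplitude spectrum of an image. It is an illustration under stated assumptions, not the procedure used in the cited studies: the function name `amplitude_spectrum_slope`, the integer radial binning, and the two synthetic test images (white noise and noise filtered to fall off as 1/f) are all hypothetical choices.

```python
import numpy as np

def amplitude_spectrum_slope(image):
    """Log-log slope of the radially averaged Fourier amplitude spectrum."""
    img = np.asarray(image, dtype=float)
    img = img - img.mean()                       # remove the mean luminance (DC term)
    amp = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    h, w = img.shape
    y, x = np.indices((h, w))
    # Integer radial frequency of each coefficient, measured from the shifted centre.
    r = np.hypot(y - h // 2, x - w // 2).astype(int)
    sums = np.bincount(r.ravel(), weights=amp.ravel())
    counts = np.bincount(r.ravel())
    radial = sums / np.maximum(counts, 1)        # mean amplitude per radial bin
    max_r = min(h, w) // 2
    freqs = np.arange(1, max_r)                  # skip DC, stay within the Nyquist radius
    slope, _ = np.polyfit(np.log(freqs), np.log(radial[1:max_r]), 1)
    return float(slope)

# Synthetic images (assumptions for illustration, not stimuli from the study).
rng = np.random.default_rng(0)
white = rng.standard_normal((64, 64))            # roughly flat amplitude spectrum
fy = np.fft.fftfreq(64)[:, None]
fx = np.fft.fftfreq(64)[None, :]
f = np.hypot(fy, fx)
f[0, 0] = 1.0                                    # avoid division by zero at DC
pink = np.real(np.fft.ifft2(np.fft.fft2(white) / f))  # amplitude falls off as 1/f

slope_white = amplitude_spectrum_slope(white)
slope_pink = amplitude_spectrum_slope(pink)
print("white noise slope:", slope_white)
print("1/f noise slope:", slope_pink)
```

A more negative slope means that amplitude falls off faster with spatial frequency, which is the sense in which one image category can have a steeper amplitude spectrum than another.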
In line with all these indications, it is not unlikely that, despite the compensation for average luminance, size and other visual features, our animal and artefact categories too, besides the distracter pairs, still differed from each other in all the aforementioned basic features, but, in all likelihood, mostly in their spatial frequency amplitude spectrum. It is possible, then, that these basic informational differences between natural and man-made objects per se may account for our early C1 effects for the animal vs. artefact discrimination. In this regard, far from being surprising, the attention effect found as early as 60 ms in C1 is, in our view, entirely consistent with the findings of earliest attentional effects by both our own [
29,
30,
32] and other groups' previous studies involving single features, such as spatial frequencies, spatial attention, emotional faces, etc., differing between a target and distracters.
Most importantly, strong support for the aforementioned claims derives from very recent ERP evidence. Indeed, to investigate how fast the human brain categorizes faces in comparison to other visual stimuli, Rossion and Caharel [
81] asked a sample of volunteers to discriminate between pictures of faces and cars, presented in both their intact and phase-scrambled versions, counterbalanced for luminance and other visual features. The authors found discriminative effects - a larger response to pictures of faces than cars independent of shape versions - at latency stages earlier than 100 ms (80-100 ms), indicated by them, in their own terms, as "a very early P1 level". However, a later N170 component was also larger for faces than cars, but for the intact shape versions only. Overall, they explained their early-latency P1 effects to faces as a brain response to low-level visual cues, namely the steeper Fourier amplitude spectrum (AS) of face images indicated by Keil [
78], and their N170 effects as a scalp-recorded reflection of a true face perception or categorization stage.
That the visual system might give signs of distinguishing between stimulus categories in the earliest 60-80 ms processing latency range, as we found, by no means implies that it has reached complete recognition of the different stimulus categories at a conscious level at this processing time. Quite the contrary, as supported by Rossion and Caharel [
81], besides our own findings, it may simply indicate that the system has begun the selection of salient perceptual information at an 'entry' level required, as a prerequisite, for detecting, identifying, and categorizing objects by means of different perceptual decisions, the latter being very probably based on different, successive levels of accumulation of salient information and different time scales. This would be consistent with both the views that basic-level categorization is an entry level of processing that precedes stages of categorization at other levels [
82] very likely carried out through feedforward processing [
48], and that conscious perception is possible only with recurrent processing of the stimulus input as advanced by Roelfsema and colleagues [
83,
84]. Indeed, counter to traditional views of visual object processing, evidence has emerged that our visual system is able to categorize images of natural visual scenes at remarkable speed [85]. In this regard, Kirchner and Thorpe [
86] have most impressively demonstrated that the participants of their study were able to perform a speeded saccade toward the one of two presented pictures that contained an animal target, as fast as 120 ms post-stimulation. On the other hand, counter to traditional views of object detection and categorization, a parallel line of research over the past few years has also shown that objects can often be successfully detected without being successfully categorized [e. g., [
87,
88]]. Besides, other studies have shown that stimulus material manipulations, such as stimulus inversion and image degradation, impair categorization at a more basic level but not object detection [e. g., [
89]]. In addition, and most fascinatingly, several recent studies have also demonstrated effects of feature-based attention on the processing of stimuli of which the participants were not at all aware [e. g., [
90,
91]], in line with the view proposed by different sources that different neurophysiological processes underlie attention and awareness [e. g., [
92,
93]].
It is worth noting that both our present earliest C1 and P1 shape-relevance effects, and the pre-100 ms P1 face effects observed in ERPs by Rossion and Caharel [
81], which are in all likelihood directly related to the low-level visual cues of the stimuli [
76-
78], and most probably their Fourier AS information [
78], mesh closely not only with the timing of the fastest saccadic behavioural response, but also with the differences observed between the detection and categorization processes, as well as between the neurophysiological processes underlying attention and awareness. For truth's sake, however, it must be noted that, notwithstanding these intriguing compatibilities, previous ERP studies indexing the timing of shape categorization have, on the whole, indicated category-divergence effects at the relatively later level of the N1 component (i.e., 140-160 ms) [
43-
45].
A further source of evidence supporting our present claims concerns our behavioural findings. Indeed, regardless of category, our participants' motor response times to both animal and artefact categories occurred not only much later than the C1 latency range, but also somewhat later than both the P300/N400 and LP processing levels. Moreover, participants' response errors (FAs) almost exclusively concerned distracter pairs at the relevant spatial location (L+S+/-). Overall, this seems to further support the viewpoint that the significant effects at the C1 processing level might reflect a first basic-level selection of salient object features and spatial localization used to drive a perceptual decision process, to which the relatively later timing of motor responses, indexing true conscious target recognition and categorization, would be related. These later processes would be based on the availability of a greater amount and a better quality of perceptual evidence [see [
87] for a review advancing such a theoretical hypothesis]. Despite the consistency with all this evidence from different lines of research on visual attentional and perceptual processing, we nevertheless wish to point out that these claims must be confirmed by further research.
A most important matter also deserves to be discussed. Indeed, there seems to be a close consistency between the present C1 shape effects and the C1 feature effects of a previous study of our group, which, using different spatial-frequency gratings presented in the four quadrants of the visual space during a spatial and featural conjoined-selection task [
32], showed that these C1 feature effects had a source in the primary, besides the secondary, visual areas. In the light of the aforementioned consistency, we are inclined to advance the intriguing hypothesis that there may be a similar involvement of the primary visual cortex in the shape-selection-related C1 effects obtained in the present study. Our hypothesis seems to draw strong support from some recent influential evidence in the literature. Notably, single-cell recordings in macaque V1 have demonstrated that the neuronal micro-networks of this low-level occipital area not only may actively merge the line and edge components of the visual scenes into perceptually unified wholes [e. g., [
94]], but are also involved in the top-down gating of horizontal connections of this area through feedback projections inhibiting some sets of lateral interactions and/or activating others during geometric selectivity, rather than a simple gain control or a simple reflection of higher sensory processing [e. g., [
95]]. Most interestingly, these distinct selectivity patterns between the task conditions would begin to develop between 70 ms and 120 ms after stimulus onset, and would reach maturity between 110 and 160 ms, these latency ranges being quite consistent with our earliest- and early-latency ERP effects.
In summary, our data provide evidence of an early modulation of brain activity (~ 60 ms) over the mesial and lateral occipital cortices for both location and shape attentional relevance. On the one hand, target processing increased brain bioelectrical activity, which appeared as an increased N80 response for shape-relevant pairs at mesial occipital sites (striate cortex), and as an increased P80 for location-relevant stimuli at lateral occipital sites. On the other hand, non-targets (both distracters and, especially, irrelevant pairs) elicited a decreased neural response, and stimuli falling outside the attentional focus were ignored at the highest cognitive levels immediately preceding the motor response.
All in all, this is one of the first pieces of experimental evidence in humans indicating that, besides other occipital areas, area V1 may also be directly involved in object selection. This involvement would start from the earliest post-stimulus processing latency, by contributing to the analysis of basic information (possibly curved vs. straight lines, presence of little circles, etc.) and, most of all, possibly, of the Fourier spatial frequency amplitude spectrum, thus suggesting that visual attention can start modulating visual processing at a much earlier stage than previously thought [
4].