The visual features of an object are processed by multiple, functionally specialized areas of cerebral cortex. When several objects are seen simultaneously, what mechanism preserves the association of features that belong to a single item? We address this question, known as the “binding problem,” by examining combinatorial feature selectivity of neurons in area V2. In recordings from anesthetized macaques, we estimate that dual selectivity for chromatic and spatiotemporal attributes is 50% more common (27% vs. 18% sampling frequency) in superficial and deep layer neurons receiving feedback connections from higher areas than in layer 3 and 4 neurons relaying ascending signals. Feedback pathways are thought to mediate attentional modulation of activity, which may achieve binding by selecting a single object for higher representation and filtering out competing objects. We propose that dual-selective neurons perform a “bridging” function, mediating the transfer of feedback-induced bias between feature dimensions. The bias can be propagated through V2 output neurons (of unitary selectivity) to higher levels of specialized processing and so promote selection of the target object's representation among multiple feature maps. The bridging function would thus act to unify the outcome of parallel, object-selective processes taking place along segregated visual pathways.
Area V2 of the macaque monkey, like area V1, processes all basic visual attributes, such as color, motion, shape, and depth. It is known for its modular organization, a cyclic series of stripes (revealed by staining for cytochrome oxidase [CO]) that segregate modular inputs from V1 and relay them to higher visual areas with comparatively narrower realms of specialization (DeYoe and Van Essen 1985; Shipp and Zeki 1985; Sincich and Horton 2005). The stripes are expressed in the tangential dimensions of the cortex and are evident within each layer. As initially predicted (Shipp and Zeki 1989), the regular juxtaposition of such functionally specialized modules within V2 affords considerable potential for integrative interactions, either to enable cue invariance (e.g., contours defined by motion, color, or luminance contrast) or “binding” of attributes (Roe and Ts'o 1995; Gegenfurtner et al. 1996; Shipp and Zeki 2002a). Evidence of integration is provided by the presence of dually selective neurons (e.g., combining selectivity for color and direction of motion). Indeed, some studies have concluded that there is an essentially random combination of attribute selectivities at the single-unit level (Burkhalter and Van Essen 1986; Gegenfurtner et al. 1996; Friedman et al. 2003). This might appear paradoxical—how can free association of color and spatiotemporal properties be consistent with the segregation of V2 into specific, functionally specialized modules (DeYoe and Van Essen 1985; Shipp and Zeki 1985, 1989, 2002a; Hubel and Livingstone 1987; Tootell and Hamilton 1989; Roe and Ts'o 1995; Ts'o et al. 2001; Tootell et al. 2004)? To resolve this issue, we set out to test the idea that association of properties might be related to laminar organization. 
It is notable that the physiological characteristics of CO stripes are more distinct in the middle layers (3 and 4) than in the superficial and deep layers (1, 2, 5, and 6) (Shipp and Zeki 2002a), a fact that tallies anatomically with a de-emphasis of modular organization in the distribution of “feedback” to these layers of V2. More specifically, the feedback to V2 from areas such as V4 and V5/MT is not only concentrated within the set of stripes that is the source of “ascending” input to each area but also distributed more diffusely across the intervening territory between the source stripes (Shipp and Zeki 1989; Zeki and Shipp 1989). A comparable account has been given of feedback from area MT to area V2 in the New World squirrel monkey (Krubitzer and Kaas 1989). Given these structural and functional indications for “demodularization” of feedback, the clear prediction is that dual-tuned units should be relatively more frequent in the feedback layers.
Outside of V1, physiological differences between cortical layers are poorly understood (Shipp 2007). The uniform laminar organization across sensory cortex forms the structural basis of ascending (feedforward) and descending (feedback) patterns of cortical connections (Rockland and Pandya 1979), permitting a hierarchical analysis of the interactions between areas within each of the visual, auditory, and somatosensory systems (Felleman and Van Essen 1991). Ascending connections arise with greatest density from layer 3B and target layers 3 and 4 of an area higher in the hierarchy. Descending connections originate mainly in layers 5 and 6, with some contribution from layers 2 and 3A, and their terminals concentrate in layers 1, 5, and 6, with minimal density in layer 4 (Felleman and Van Essen 1991; Rockland 1997). The ascending and descending inputs to an area are thus deployed in a basically complementary fashion. However, their laminar segregation is not absolute, and further integration is achieved via translaminar intrinsic axons and by the fact that dendrites (particularly the apical dendrites of pyramidal neurons) can traverse several layers. The microanatomy therefore implies that any subdivision of recording data into layer zones will inevitably be subject to mutual contamination. Our designation of “middle” and “feedback” layer samples (3 and 4 vs. 1, 2, 5, and 6) was intended to maximize the separation of ascending and descending influences, while anticipating some irreducible overlap of physiological characteristics. Nonetheless, statistical analysis of the relative frequency of dual selectivity confirmed a significant sampling difference between the 2 groups, suggesting 1) that combinatorial properties of neurons are a novel and useful index of laminar characterization and 2) that their distribution may be related to the interplay of the ascending and descending pathways.
Although the hypothesis regarding the laminar location of dual cells, as outlined above, can be derived from exclusively anatomical and physiological criteria, the motivation for exploring it has a more cognitive flavor, relating to theories of visual feature binding. The paper therefore gives a focused account of the laminar location of dual cells, followed by a functional interpretation drawing upon current models of feedback, attention, and feature binding (Duncan et al. 1997; Treisman 1998; Reynolds and Desimone 1999; Deco and Rolls 2004; Spratling and Johnson 2004; Hamker 2005).
Data from 15 male, juvenile cynomolgus macaques (Macaca fascicularis) contributed to this study. All procedures were in accordance with UK legislation under the Animals (Scientific Procedures) Act 1986. Animals were prepared for intracranial recording under anesthesia, with neuromuscular blockade to minimize residual eye movements, as described previously (Adams and Zeki 2001; Moutoussis and Zeki 2002). Extracellular recordings of monocularly driven activity were made with glass-coated tungsten microelectrodes, aimed to achieve either a short radial or a long tangential trajectory through V2 cortex within the lunate or parieto-occipital sulci (Shipp and Zeki 2002a). Recording sites were spaced at roughly 100 μm intervals, where possible. Spikes were gated by amplitude using a window discriminator (Neurolog) and counted in 20-ms bins. The largest spikes were routinely selected, but the requirement for regular, unbiased sampling (i.e., maximizing data volume) took priority over the sedulous isolation of single units. Stimulus parameters were integrated with the spike data and receptive field location to produce online peristimulus time histograms. Multiple penetrations were made over a period of 4–5 days.
At the termination of recording, animals were given a lethal dose of pentobarbitone and immediately perfused transcardially. Where tangential recordings were restricted to area V2, the occipital operculum was flattened and sections were cut in the plane of cortical lamination. Electrode tracks were reconstructed from digital images of CO- or Nissl-stained sections. The locations of electrolytic lesions and the boundaries between cortex, white matter, and sulci were used to scale recording depths, with local correction for uneven tissue shrinkage. Recording sites were assigned to 1 of 10 layer zones (1, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, and 6) as described previously (Shipp and Zeki 2002a), where intermediate zones are scaled proportionally between the architecturally recognizable layer boundaries. These include, at minimum, the 1/2, 3/4, 4/5, and 5/6 borders, each of which is visible in both types of stain. For the purposes of this report, the 2/3 and 4/5 borders are the most significant, as they define the boundaries of our middle and feedback layer zones. Although the 4/5 border is relatively sharply demarcated by cellular architecture and connectivity, the 2/3 border is not and relies on the scaling procedure. However, because the division between ascending and descending influences in the superficial layers is also indistinct, direct architectonic determination of this border is not crucial.
Computer (Amiga 2000)-generated bar stimuli were presented on a 19-inch Grundig BGC155 color monitor, placed 114 cm from the animal, at a refresh rate of 50 Hz. The operator estimated optimal settings (bar dimensions and color) by manual receptive field plotting. All reported units were tested for orientation/direction and color preferences, typically in that order and with 3 trials of each stimulus variant. A small minority of units (10%) with noisier activity were tested with 4–8 trials. Tests were repeated if it transpired that prior procedures had employed suboptimal fixed parameters.
Bars moved orthogonally to their orientation, in directions spaced at 30° (or, rarely, 45°) intervals through a range of 360°. The standard speed in cardinal directions (0°, 90°, 180°, and 270°) was set to 1 pixel per frame (2.5°/s) and adjusted if required. Speeds were slightly faster in oblique directions (e.g., by a factor of √2 for 45°, 135°, 225°, and 315°), as constrained by the pixelated display.
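The stated speeds are linked by simple display arithmetic. The sketch below assumes, as an inference rather than a reported specification, that one pixel subtends 0.05° (2.5°/s divided by 50 frames/s); the function names are hypothetical. It shows why a step of one pixel in both x and y moves the bar √2 times faster than a cardinal step.

```python
import math

FRAME_RATE_HZ = 50.0        # monitor refresh rate stated in the text
CARDINAL_SPEED_DEG_S = 2.5  # speed of a 1 pixel/frame step in a cardinal direction
VIEWING_DISTANCE_CM = 114.0

# Inferred, not stated: visual angle of one pixel = (deg/s) / (frames/s).
DEG_PER_PIXEL = CARDINAL_SPEED_DEG_S / FRAME_RATE_HZ


def bar_speed_deg_s(dx_pixels: int, dy_pixels: int) -> float:
    """Speed (deg/s) of a bar displaced by (dx, dy) whole pixels on every frame."""
    step_pixels = math.hypot(dx_pixels, dy_pixels)
    return step_pixels * DEG_PER_PIXEL * FRAME_RATE_HZ


def implied_pixel_pitch_cm(deg_per_pixel: float, distance_cm: float) -> float:
    """Physical pixel size implied by a given visual angle at a viewing distance."""
    return 2.0 * distance_cm * math.tan(math.radians(deg_per_pixel) / 2.0)


print(f"{DEG_PER_PIXEL:.3f} deg per pixel")             # 0.050
print(f"{bar_speed_deg_s(1, 0):.2f} deg/s cardinal")    # 2.50
print(f"{bar_speed_deg_s(1, 1):.2f} deg/s at 45 deg")   # 3.54
print(f"{implied_pixel_pitch_cm(DEG_PER_PIXEL, VIEWING_DISTANCE_CM):.3f} cm pixel pitch")
```

On this assumption the pixel pitch implied at 114 cm is about 1 mm; this is a consistency check on the stated numbers, not a measured property of the monitor.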
Six test colors (red, green, blue, yellow, magenta, and cyan) were selected from the 12-bit (4096 hues) display for minimal luminance difference from mid-gray by the minimum motion technique using human observers (Anstis and Cavanagh 1983). The CIE 1931 x, y coordinates were R = 0.60, 0.35; G = 0.31, 0.59; B = 0.15, 0.068; Y = 0.41, 0.51; M = 0.33, 0.18; C = 0.22, 0.30. These 6 hues plus mid-gray, white, or black formed the standard test set, displayed against either a white or black background with contrast exceeding 60% or 90%, respectively. Small departures from isoluminance among the 6 test hues may have been compounded by differences in spectral sensitivity between humans and macaques (the latter are less sensitive to long wavelengths; Dobkins et al. 2000); however, differential response to luminance contrast among the test stimuli should be minimized by the uniformly high background contrast, which would act to saturate any response based on luminance contrast (for analysis of this proposition, refer to Section 1 of Supplementary Material).
Bar dimensions and speed were set to approximately optimal values but were not systematically assessed. Stimulus bars varied in length from 0.3° to 17° (median 3.2°) and in aspect ratio from 1:1 to 150:1 (median 9:1). Bar speed varied from 1.1°/s to 6.0°/s (median 2.5°/s).
The strategy of the analysis was 1) to divide all recording sites into binary categories—that is, “selective” or “unselective” with regard to the color and the axis and direction of motion of a bar stimulus; 2) to find the frequencies of dually selective, singly selective, and doubly unselective sites in both color/axis and color/direction contingencies; 3) to evaluate statistically the relative frequency of dually selective sites across middle and feedback layer zones by means of 3-way G-tests for association (Sokal and Rohlf 1995).
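Steps 1 and 2 reduce every site to a pair of binary labels and then to a 2 × 2 table. A minimal sketch of step 2, assuming the selective/unselective labels from step 1 are already available as booleans; the function name and the example labels are hypothetical.

```python
from collections import Counter
from typing import Iterable


def tally_contingency(color_selective: Iterable[bool],
                      motion_selective: Iterable[bool]) -> Counter:
    """Count recording sites in the four cells of a color x motion contingency."""
    counts = Counter({"dual": 0, "color only": 0, "motion only": 0, "unselective": 0})
    for color, motion in zip(color_selective, motion_selective):
        if color and motion:
            counts["dual"] += 1
        elif color:
            counts["color only"] += 1
        elif motion:
            counts["motion only"] += 1
        else:
            counts["unselective"] += 1
    return counts


# Hypothetical per-site categories (not data from this study).
color = [True, True, False, False, True]
axis = [True, False, True, False, False]
print(tally_contingency(color, axis))
```

The same function serves both contingencies: pass axis labels for color/axis, or direction labels for color/direction.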
Stage 1 began with modeling each recorded response as the sum of specific and nonspecific components. The nonspecific component is an equal response to all test directions or colors. The specific motion response was modeled parametrically to find the amplitude, half width, and peak position of a (Gaussian) tuning curve centered on the preferred axis or direction of motion. The sampling density of color space was insufficient to permit a similar, parametric analysis of color tuning. Therefore, the amplitude of the color-specific component was defined as the incremental response to the preferred color over the nonspecific response (i.e., the mean response to the other 5 test colors). Sites with broader color tuning were alternatively modeled by comparing responses to the most and least effective triplets of test colors. The magnitude of the specific response was quantified 1) numerically, as a proportion of the maximal discharge rate, to yield a standard selectivity index (SI) and 2) statistically, to yield a P value in relation to the inherent variability of the discharge rate. Indices and P values were then used in concert to define selective and unselective categories.
Model tuning curves with 1, 4, 5, or 6 independent parameters were fitted to a unit's mean response (i.e., spike rate within receptive field) across trials, using least squares minimization. Model parameters are summarized in Table 1. M1 is a nonoriented model, with an equal response (or “gain,” G0) in all directions. M2 is the simplest direction-selective model, with 4 parameters: G0 plus an added Gaussian, tuned response with specified position (P), gain (G1), and tuning width (half width at half height, H1). M3 is the simplest orientation selective model, in which G1 and H1 recur at P and at P + 180°. As M2 is a “unilobed” model, its half width can be up to 90°; M3, being a “bilobed” model, has a maximum half width of 45°. M4 is an asymmetric version of M3 with unequal gains G1 and G2; hence, it also displays directional bias. M5 is an asymmetric version of M3 in which the half widths of the opposite lobes are allowed to differ (but only by an amount less than half the interval between test directions). M6 is “crooked,” with lobes of equal gain and half width at nonopposite positions; the maximum tolerated position discrepancy is again half the test interval. Finally, M7 and M8 are the directionally asymmetric variants of M5 and M6 (i.e., with unequal G1 and G2).
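The first four models can be written compactly. This sketch assumes each lobe is a Gaussian of the absolute angular distance from its peak, parameterized by half width at half height (so a lobe falls to half its gain at P ± H1); the exact functional form used for fitting is not given in the text, and the function names are illustrative. M5 to M8 follow from the general two-lobe form by letting the second lobe take its own half width or position, with or without unequal gains.

```python
import math


def circ_diff(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angular difference between two directions, in degrees."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def lobe(theta: float, pos: float, hwhh: float) -> float:
    """Unit-gain Gaussian lobe that falls to 0.5 at +/- hwhh degrees from pos."""
    sigma = hwhh / math.sqrt(2.0 * math.log(2.0))
    d = circ_diff(theta, pos)
    return math.exp(-d * d / (2.0 * sigma * sigma))


def m1(theta, g0):
    """Nonoriented: the same response G0 in every direction."""
    return g0


def m2(theta, g0, g1, p, h1):
    """Unilobed, direction selective (H1 up to 90 deg)."""
    return g0 + g1 * lobe(theta, p, h1)


def two_lobe(theta, g0, g1, g2, p1, p2, h1, h2):
    """General bilobed curve; M3-M8 are constrained special cases."""
    return g0 + g1 * lobe(theta, p1, h1) + g2 * lobe(theta, p2, h2)


def m3(theta, g0, g1, p, h1):
    """Symmetric bilobed (axial only): equal lobes at P and P + 180, H1 <= 45 deg."""
    return two_lobe(theta, g0, g1, g1, p, p + 180.0, h1, h1)


def m4(theta, g0, g1, g2, p, h1):
    """As M3 but with unequal gains G1 and G2, hence a directional bias."""
    return two_lobe(theta, g0, g1, g2, p, p + 180.0, h1, h1)


directions = range(0, 360, 30)  # 12 test directions
print([round(m4(t, 5.0, 20.0, 8.0, 90.0, 30.0), 1) for t in directions])
```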
Orientation and direction selectivity were treated as independent properties of the fitted model, directional selectivity expressing the maximum differential response to bar movement in test directions 180° apart, and orientation the differential response along axes at 90° (hence, the operational term “axial selectivity” is preferred to “orientation selectivity”). Note, however, that the actual preferred direction and preferred axis of a unit were not independent because they inevitably coincide. For this reason, the axis/direction contingency (unlike axis/color and direction/color) cannot sensibly be treated as a proxy for feature binding.
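Selectivity indices have the generic formula SI = [modulation of response]/[maximum response]. The axial selectivity index (ASI) and direction selectivity index (DSI) were calculated from the parameters of the fitted tuning models.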
In the case of model M1, ASI = DSI = 0. The improvement of fit provided by models M2–M8 with respect to M1 was evaluated by a series of F-tests, with subsequent model selection determined by the smallest resultant P value. The best-fitting axial model for a response was selected from M3–M8 and the best directional model from the subsets M2, M4, M7, and M8 (because M3, M5, and M6 yield DSI = 0). ASI and DSI were then calculated from the relevant parameters of the selected axial and directional models. The P value associated with each ASI was the result of the F-test comparing its parent model to the best nonaxial model (i.e., M1). The P value associated with each DSI resulted from the F-test comparing its parent model to the best, lower order nondirectional model (i.e., M1, M3, M5, or M6).
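The model comparison rests on the standard extra-sum-of-squares F-test for nested least-squares fits. The sketch below is self-contained, computing the F upper tail through the regularized incomplete beta function with a standard continued-fraction evaluation; with SciPy available, `scipy.stats.f.sf(F, df1, df2)` gives the same tail probability. The RSS values and function names are hypothetical, and n = 12 assumes one mean response per test direction. Only the axial selection is shown: each candidate from M3 to M8 is tested against M1, and the smallest P value wins.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Upper-tail probability P(F > f) of an F distribution."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def nested_f_test(rss_reduced, k_reduced, rss_full, k_full, n):
    """Extra-sum-of-squares F-test of a full model against a nested reduced one."""
    df1, df2 = k_full - k_reduced, n - k_full
    f = ((rss_reduced - rss_full) / df1) / (rss_full / df2)
    return f, f_sf(f, df1, df2)


# Hypothetical residual sums of squares for one unit (12 mean responses).
N_POINTS = 12
N_PARAMS = {"M1": 1, "M3": 4, "M4": 5, "M5": 5, "M6": 5, "M7": 6, "M8": 6}
rss = {"M1": 900.0, "M3": 300.0, "M4": 120.0, "M5": 280.0,
       "M6": 290.0, "M7": 115.0, "M8": 118.0}

# Axial model: test each of M3-M8 against M1 and keep the smallest P value.
p_axial = {m: nested_f_test(rss["M1"], N_PARAMS["M1"], rss[m], N_PARAMS[m], N_POINTS)[1]
           for m in ("M3", "M4", "M5", "M6", "M7", "M8")}
best_axial = min(p_axial, key=p_axial.get)
print(best_axial, f"P = {p_axial[best_axial]:.4f}")
```

For the directional index the reduced model would instead be the best lower-order nondirectional fit (M1, M3, M5, or M6), as the text describes.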
The purpose of the models was to optimize computation of ASI and DSI (not to erect 8 distinct classes of unit, characterized by subtle differences in their directional tuning curves). Quite commonly, the selected axial and directional models were the same (M4). Importantly, however, by allowing them to differ and by disallowing the selection of M1, the procedure obtains nonzero values of ASI and DSI for every unit tested, producing a smooth, continuous distribution for each index. This in turn allows subsequent determination of index thresholds for selectivity to be based on observed (potentially bimodal) population characteristics, as opposed to a preimposed statistical criterion for selecting a single “best model” that may then give ASI = 0 and/or DSI = 0.
The color selectivity index (CSI) was computed as
$$\mathrm{CSI} = \frac{B_s - M}{B_s}$$
where Bs is the best response to a single test hue and M is the mean response to the remaining 5 test hues. Alternatively, for units displaying broader tuning:
where Bt is the mean response to the best triplet of contiguous test hues (e.g., cyan, green, and blue) and Mt is the mean response to the remaining triplet. P values were determined by one-factor (test hue) analysis of variance and subsequent a priori t-tests for each of these contingencies. The smaller P value determined whether the associated singlet or triplet CSI was selected. (The adjustment of the computation of CSI for broadly tuned units is necessary to prevent the response modulation from being systematically underestimated and is procedurally analogous to using tuning curves of variable width in computing ASI and DSI of different units.) As with ASI and DSI, the procedure generates a nonzero value of CSI for every unit tested.
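A sketch of the two CSI variants and the selection step, assuming the singlet index takes the generic modulation/maximum form (Bs − M)/Bs and the triplet index (Bt − Mt)/Bt; the triplet denominator is our assumption. The hue ordering around the circle follows the order in which the test colors are listed elsewhere in the text, the function names and response values are hypothetical, and the P values would come from the ANOVA and a priori t-tests described above, which are not reproduced here.

```python
from statistics import mean

# Hue-circle order assumed from the listing of test colors in the text.
HUES = ("red", "yellow", "green", "cyan", "blue", "magenta")


def csi_singlet(resp: dict) -> float:
    """(Bs - M) / Bs: best single hue against the mean of the other five."""
    best = max(HUES, key=lambda h: resp[h])
    bs = resp[best]
    m = mean(resp[h] for h in HUES if h != best)
    return (bs - m) / bs


def csi_triplet(resp: dict) -> float:
    """(Bt - Mt) / Bt over contiguous hue triplets (denominator is an assumption)."""
    best_bt, best_mt = None, None
    for start in range(len(HUES)):
        trip = {HUES[(start + k) % len(HUES)] for k in range(3)}
        bt = mean(resp[h] for h in trip)
        mt = mean(resp[h] for h in HUES if h not in trip)
        if best_bt is None or bt > best_bt:
            best_bt, best_mt = bt, mt
    return (best_bt - best_mt) / best_bt


def choose_csi(resp: dict, p_singlet: float, p_triplet: float) -> float:
    """Keep whichever CSI has the smaller associated P value."""
    return csi_singlet(resp) if p_singlet <= p_triplet else csi_triplet(resp)


# Hypothetical spike rates (spikes/s) for a unit tuned broadly to green-cyan-blue.
rates = {"red": 6, "yellow": 9, "green": 24, "cyan": 28, "blue": 22, "magenta": 8}
print(round(csi_singlet(rates), 3), round(csi_triplet(rates), 3))
```

For this broadly tuned example the singlet index (about 0.51) is lower than the triplet index (about 0.69), which is the underestimation the triplet variant is meant to avoid.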
Criteria for defining selective or unselective responses were determined independently for each feature by reference to the population frequency distribution of its SI. The primary threshold index value (T1) was determined from indications of bimodality in the frequency distribution of each index. Nodal values (dips between peaks) were noted at ASI = 0.50 (42nd percentile), DSI = 0.7 (95th percentile), and CSI = 0.4 (69th percentile); see Figure S3 in Supplementary Material. Given that the modes of these distributions were broad and overlapping, T1 was coupled to a statistical threshold, P < 0.05. A secondary threshold index value (T2) was set at the 70th percentile of the subset of index values comprising the lower peak (i.e., subthreshold to T1), coupled to a stricter statistical significance criterion, P < 0.01. The dual threshold criteria for “selectivity” were thus: SI > T1 and P < 0.05, or SI > T2 and P < 0.01.
Frequency differences between layers were evaluated by means of 3-way G-tests (Sokal and Rohlf 1995) for association and are reported at a “1-tailed” level of significance because the test hypothesis predicts the direction of the imbalance. Subsequent “2-tailed,” 2-way G-tests were used to assess the frequency of dual-tuned units within a layer.
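The 2-way G statistic can be sketched as follows. This minimal version assumes a plain G = 2 Σ O ln(O/E) against a chi-square reference distribution, without the Williams correction that Sokal and Rohlf also describe; the function names and the example counts are hypothetical. For 1 degree of freedom the chi-square upper tail reduces to erfc(√(G/2)), so no statistics library is needed, and halving it gives a 1-tailed value when the direction of the imbalance is predicted in advance.

```python
import math


def g_test(table):
    """G-test of independence for an r x c table of counts given as a list of rows.

    Returns (G, df). Cells with a zero observed count contribute nothing to G.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    g = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:
                expected = row_tot[i] * col_tot[j] / n
                g += obs * math.log(obs / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return 2.0 * g, df


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical 2 x 2 table for one layer zone:
# rows = color selective / not, columns = axis selective / not.
counts = [[30, 10],
          [15, 25]]
g, df = g_test(counts)
p_two = chi2_sf_df1(g)
print(f"G = {g:.2f}, df = {df}, 2-tailed P = {p_two:.4f}, 1-tailed P = {p_two / 2:.4f}")
```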
The core data reported here are the color/orientation and color/direction contingencies recorded at 256 sites in identified laminae of area V2. An additional 276 sites, lacking laminar confirmation, were available to provide a larger sample size from which to determine separate population distributions for each property. To study each site, computer-generated chromatic or achromatic bar stimuli were presented at 12 different directions of motion on a black or white background. Our measure of orientation selectivity was based on the preferred axis of movement orthogonal to the bar's orientation (for which we use the operational term axial selectivity). The 6 test colors (red, yellow, green, cyan, blue, and magenta) were approximately equiluminant with mid-gray for human observers.
Quantitative assessment of stimulus selectivity in neural activity is typically accomplished by computing a SI, in which the modulation of response across one dimension of the test stimulus is scaled by the maximum firing rate. Ideally, the subsequent selection of a threshold index value to designate selective and unselective categories requires that the index should display a bimodal population distribution (Schiller et al. 1976a, 1976b). If so, the exact threshold value is likely to vary between stimulus features. Adopting this approach avoids the nomination of a single, arbitrary threshold (e.g., SI>0.7) serving several feature dimensions (DeYoe and Van Essen 1985; Burkhalter and Van Essen 1986; Gegenfurtner et al. 1996; Tamura et al. 1996).
As an alternative to an index, the modulation of the response can be assessed statistically, that is, in relation to the inherent variability of the response. We devised a dual threshold procedure for categorizing selective and unselective units that combines both measures. Briefly, for every recording site, we obtained a DSI, ASI, and CSI each coupled to a statistical P value. All 3 indices produced signs of a bimodal population distribution (see Supplementary Fig. S3), such that the dip between the upper and lower peaks could be used to set the primary threshold, T1 (i.e., selective if index > T1 and P<0.05). In addition, marginally subthreshold units were recruited if their index value surpassed a lesser secondary threshold (T2) with a more significant associated P value (i.e., selective if index > T2 and P<0.01). The resulting overall frequencies of selective units according to these criteria were color 37.9% (97/256), axis 62.5% (160/256), and direction 16.4% (42/256). Each of these frequencies is close to the median level obtained by a meta-analysis of 8 previous studies of V2 (Shipp and Zeki 2002a)—see Section 3 of Supplementary Material.
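The dual threshold rule can be sketched in a few lines. This is an illustration under stated assumptions: T2 is taken as a linearly interpolated 70th percentile of the index values at or below T1 (the interpolation method is not specified in the text), and the example index values and function names are hypothetical; only T1 = 0.50 for ASI comes from the reported nodal values.

```python
import math


def percentile(values, q):
    """Linearly interpolated q-th percentile (0-100) of a non-empty sequence."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty sequence")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def secondary_threshold(indices, t1, q=70.0):
    """T2: the q-th percentile of the index values at or below T1 (the lower peak)."""
    return percentile([v for v in indices if v <= t1], q)


def is_selective(index, p, t1, t2, p1=0.05, p2=0.01):
    """Dual criterion: above T1 with P < p1, or above T2 with the stricter P < p2."""
    return (index > t1 and p < p1) or (index > t2 and p < p2)


# Hypothetical ASI values; T1 = 0.50 is the nodal value reported for ASI.
asi = [0.05, 0.12, 0.20, 0.26, 0.31, 0.38, 0.44, 0.49, 0.62, 0.70, 0.81, 0.90]
t1 = 0.50
t2 = secondary_threshold(asi, t1)
print(f"T2 = {t2:.3f}")
print(is_selective(0.47, 0.004, t1, t2), is_selective(0.47, 0.03, t1, t2))
```

A marginally subthreshold unit (index 0.47) is recruited only when its P value meets the stricter criterion.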
Figure 1 shows the chromatic and bar motion tuning curves obtained in an illustrative radial electrode penetration, entering V2 from the subjacent white matter. Of the 12 units recorded, 4 were in the deep layers (layer zones 6.0, 5.5, 5.0, and 4.5), 4 in the middle layers (zones 4.0, 3.5, and 3.0), and 4 in the superficial layers (zones 2.5, 2.0, and 1). A relatively high proportion (50%) of units was direction selective, implying that the electrode had passed through a local cluster of such units, typically associated with the “thick” CO stripes (Shipp and Zeki 2002a). Four units (numbers 2, 3, 4, and 9), all located outside the middle layers, combined selectivity for color with selectivity for direction and/or axis of motion. Although radial penetrations provide an efficient means of sampling from all layers, the total sample of 256 fully characterized recording sites was somewhat skewed toward the middle layers (n = 142) due to the inclusion of several long tangential penetrations whose excursion was largely confined to this laminar zone. The other units (40 superficial and 74 in the deep layers) were pooled to form a feedback layer group (n = 114) for statistical analysis.
Analyzing each property separately, the sample from the middle layers was found to be slightly richer in selectivity for axis of motion (68% vs. 55%) but not for direction of motion (14% vs. 19%) or color (36% vs. 40%). In the crucial combinatorial analysis, 18% (25/142) of middle layer units were dual selective for color and axis and just 3% (4/142) for color and direction. Dual selectivity was more abundant in the feedback sample, with 25% (28/114) showing color/axis selectivity and 11% (13/114) color/direction selectivity. Hence, for either class of combination, the proportion of dual-selective units in the feedback layers exceeded that expected by chance association, whereas the proportion of dual units in the middle layers was less than expected (Fig. 2). If direction and axis classes are pooled, our overall sampling rate for dual color/spatiotemporal selectivity was proportionately 54% greater in the feedback layers.
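The comparison with chance association is simple arithmetic: if color and motion selectivity were independent, the expected number of dual units is the product of the two marginal counts divided by n. In the sketch below the marginal counts are our back-calculation from the rounded percentages quoted above (they reproduce each percentage and the overall totals of 97, 160, and 42 selective sites), not values taken from Table 2; the names are illustrative.

```python
# Per-layer counts back-calculated from the rounded percentages quoted in the
# text (an inference; the exact cell counts of Table 2 are not reproduced here).
SAMPLES = {
    #  layer      n  color axis dir  dual_c/a dual_c/d
    "middle":   (142, 51, 97, 20, 25, 4),
    "feedback": (114, 46, 63, 22, 28, 13),
}


def expected_dual(n: int, n_a: int, n_b: int) -> float:
    """Dual-selective count expected if the two selectivities were independent."""
    return n_a * n_b / n


for layer, (n, color, axis, direction, dual_ca, dual_cd) in SAMPLES.items():
    exp_ca = expected_dual(n, color, axis)
    exp_cd = expected_dual(n, color, direction)
    print(f"{layer:8s} color/axis: observed {dual_ca:2d}, expected {exp_ca:5.1f}; "
          f"color/direction: observed {dual_cd:2d}, expected {exp_cd:5.1f}")
```

With these reconstructed marginals the middle layers fall below chance for both combinations (25 vs. about 34.8; 4 vs. about 7.2) and the feedback layers lie above it (28 vs. about 25.4; 13 vs. about 8.9), matching the direction of the effects described above.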
Statistical analysis (Table 2) is provided by a 3-way contingency test—that is, to determine if the relationship of the 2 physiological factors is contingent upon a third factor, laminar location; if so, each laminar zone is subject to separate 2-way analysis. The 3-way test for layer imbalance was indeed significant (P≤0.005) for either feature combination, although differently weighted as shown by subsequent 2-way tests. Color selectivity and axial selectivity showed highly significant negative association in the middle layers and random association in the feedback layers. For color/direction, the negative association in the middle layers was less accentuated but complemented by an opposite trend (i.e., toward positive association) in the feedback layers, such that the net laminar difference attained significance.
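One standard way to ask whether the color/motion association depends on layer is a G-test of the no-three-way-interaction model for a 2 × 2 × 2 table (layer × color × motion), whose expected frequencies are obtained by iterative proportional fitting, a procedure covered in standard texts such as Sokal and Rohlf (1995). Whether this is exactly the computation behind Table 2 is our assumption; the sketch below, with hypothetical counts and illustrative names, shows the technique only. It uses 1 degree of freedom and reports a 2-tailed chi-square probability.

```python
import itertools
import math

CELLS = list(itertools.product((0, 1), repeat=3))   # (layer, color, motion)


def fit_no_three_way(obs, max_iter=500, tol=1e-12):
    """Expected counts under 'no three-way interaction', fitted by iterative
    proportional fitting to the three two-way margins of a 2x2x2 table."""
    exp = {c: 1.0 for c in CELLS}
    for _ in range(max_iter):
        worst = 0.0
        for axes in ((0, 1), (0, 2), (1, 2)):
            for a, b in itertools.product((0, 1), repeat=2):
                group = [c for c in CELLS if c[axes[0]] == a and c[axes[1]] == b]
                target = sum(obs[c] for c in group)
                current = sum(exp[c] for c in group)
                if current == 0:
                    continue
                factor = target / current
                for c in group:
                    exp[c] *= factor
                worst = max(worst, abs(factor - 1.0))
        if worst < tol:
            break
    return exp


def interaction_g(obs):
    """G = 2 * sum O ln(O/E) against the no-three-way-interaction fit (df = 1)."""
    exp = fit_no_three_way(obs)
    g = 2.0 * sum(o * math.log(o / exp[c]) for c, o in obs.items() if o > 0)
    return g, math.erfc(math.sqrt(max(g, 0.0) / 2.0))


def table(layer0, layer1):
    """Build {(layer, color, motion): count} from two 2x2 tables whose rows are
    color selective / not and whose columns are motion selective / not."""
    return {(k, i, j): t[i][j] for k, t in enumerate((layer0, layer1))
            for i in (0, 1) for j in (0, 1)}


# Hypothetical counts: negative association in layer 0, positive in layer 1.
obs = table([[10, 30], [40, 20]], [[25, 15], [15, 25]])
g, p = interaction_g(obs)
print(f"G = {g:.2f}, df = 1, P = {p:.4f}")
```

When both layers share the same odds ratio, the fitted counts reproduce the observed ones and G is essentially zero; opposite associations in the two layers drive G upward.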
The Supplementary Material includes several additional analyses, all reaffirming the above conclusion. Section 4 of Supplementary Material treats the superficial and deep feedback layer zones individually, showing that the frequency of dual-selective cells does not differ between them, although in each case it exceeds that in the middle layers. Section 5 of Supplementary Material describes a more conservative analysis discounting dual units when both selectivity indices are near threshold (i.e., where, effectively, a heterogeneous multiunit origin of the response cannot be ruled out). This leads to stronger indications of negative association in the middle layers and implies random association in the feedback layers for both color/axis and color/direction. Section 6 of Supplementary Material provides a reanalysis of an earlier V2 dataset (Shipp and Zeki 2002a), comprising over 800 units with qualitatively characterized responses to optimized bar stimuli but sharing the identical system for layer determination. Once again much the same pattern of results was obtained, including the fundamental observation that dual units, both color/axis and color/direction, were significantly less frequent in the middle layers.
The co-occurrence of color and spatiotemporal tuning can also be examined by charting the correlation of selectivity indices in each set of layers. Strictly, the hypothesis under test (that there is a paucity of middle layer dual-selective units) implies that a chart of CSI versus DSI (or CSI vs. ASI) should reveal a relative depopulation of the upper right quadrant for the middle layer data. The hypothesis is neutral with respect to the opposite quadrant. Because ASI and DSI are independent measures of the degree of spatiotemporal selectivity and either might influence the value of CSI, the sensitivity of the correlation analysis was maximized by pooling ASI and DSI to form a general index of spatiotemporal sensitivity. This “motion” selectivity index (MSI) simply took the higher value of DSI or ASI for each unit. Figure 3 shows plots of CSI against MSI for each set of layers. There is a significant negative correlation in the middle layers (Pearson r = −0.41) and virtually zero correlation in the feedback layers (r = 0.003). Nonparametric (Kendall) rank order correlation coefficients were τ = −0.28 and τ = 0.06, respectively, yielding P < 0.0001 (1-tailed) for the difference in CSI/MSI correlation between the 2 layer zones.
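The MSI and both correlation coefficients are easy to compute directly. A minimal sketch with hypothetical index values and illustrative names; it uses Kendall's tau-b, which corrects for ties, although the text does not say which tau variant was used, and it does not reproduce the test for the difference between layer zones.

```python
import math
from itertools import combinations


def msi(asi, dsi):
    """'Motion' selectivity index: the larger of ASI and DSI for each unit."""
    return [max(a, d) for a, d in zip(asi, dsi)]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_b(x, y):
    """Kendall's tau-b from concordant/discordant pair counts, with tie correction."""
    conc = disc = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from both corrections
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            conc += 1
        else:
            disc += 1
    denom = math.sqrt((conc + disc + tied_x_only) * (conc + disc + tied_y_only))
    return (conc - disc) / denom


# Hypothetical indices for a handful of units (not data from this study).
asi = [0.82, 0.35, 0.64, 0.20, 0.71, 0.15]
dsi = [0.30, 0.72, 0.25, 0.18, 0.40, 0.10]
csi = [0.12, 0.20, 0.28, 0.55, 0.18, 0.61]
motion = msi(asi, dsi)
print(round(pearson_r(motion, csi), 3), round(kendall_tau_b(motion, csi), 3))
```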
Overall, the results show that color and spatiotemporal processing in V2 are relatively independent in ascending pathways (originating in V1 and passing through layers 4 and 3 of V2) but reintegrated in the superficial and deep layers where they are subject to the influence of feedback from higher centers. A previous direct comparison of V1 and V2 has concluded that dual selectivity for color and direction is indeed more frequent in V2 (Tamura et al. 1996). Other studies of V2 have commonly encountered dual selectivity for color/orientation and color/direction, but there is no consensus as to whether these properties associate randomly (Burkhalter and Van Essen 1986; Gegenfurtner et al. 1996; Friedman et al. 2003) or not (Shipp and Zeki 2002a). The possibility of systematic laminar variation has not been previously examined, and the present results may offer some resolution of the discrepancy.
As discussed previously (Shipp and Zeki 2002a), reported frequencies for color, direction, and orientation selectivity in area V2 vary widely across studies, admitting no simple rationale dependent on the use of alert or anesthetized animals or qualitative versus quantitative data capture. Whatever factors are responsible, they may equally afflict the less often reported incidences of dual selectivity. Although the present results (concerning single feature selectivity) are close to a literature median (Table S1), it is apparent that the empirical level of selectivity is uncomfortably contingent upon the stimulus procedures and imposed definition. A better consensus is achieved by comparative indices, for example, frequency ratios of feature selectivity across CO-defined stripes (Shipp and Zeki 2002a): in this respect studies concur, for instance, that dark CO thin stripes show a relatively higher incidence of color selectivity and a lesser incidence of orientation selectivity than other CO stripes. Similarly, the results of comparison across layers, derived here, are better insulated against the vagaries of experimental technique and a step more reliable than the absolute levels of selectivity. Parenthetically, we do not include here a coanalysis of layers and CO stripes, partly because the data lack the statistical power to tackle the inflated number of anatomical compartments inherent in layer/stripe permutation. We note simply that layer and CO stripe classifications are orthogonal and that, whatever its nature, the cryptic pattern of distribution of dual tuning across stripes cannot invalidate our conclusions pertaining to laminar organization.
The layer zones we used to compartmentalize our data—ascending layers (3 and 4) versus feedback layers (1, 2, 5, and 6)—did not necessarily show a striking difference in the proportion of dual cells: 18% versus 25% for color/axis and 3% versus 11% for color/direction. However, several factors combine to suggest that these should be conservative estimates of a functional difference. First and foremost, there is no abrupt laminar demarcation between the influence of ascending and descending connections, especially in the transition from the base of layer 3 to superficial layer 1. The diffuse termination of axonal arbors, coupled to translaminar dendritic fields, dictates that we can only rather crudely sample the physiological influences of inputs with separate anatomical origins. Any shortfalls in electrode track reconstruction and laminar assignment of recordings will only compound this problem. We therefore rely on statistical inference: there is a prior hypothesis, and the outcome is significant. The fact that we can obtain an equivalent result in an older V2 dataset (studied qualitatively—see Section 6 of Supplementary Material) confirms our view that the differences in proportion, though mild, are reliable indicators of functional differentiation. The superficial and deep feedback zones appeared similar in their overall level of dual tuning; there might be some functional variation within these zones, but our sample size is a little small to adequately pursue a finer laminar analysis (see Section 4 of Supplementary Material).
To be clear, we do not infer that the feedback inputs directly create dual tuning. It is possible that some elements of feedback are severely attenuated by anesthesia—for example, recordings from V1 electrode implants under alert and anesthetized conditions suggest that anesthesia abolishes contextual responses that are likely to depend on feedback (Lamme et al. 1998). By contrast, basic tuning properties such as orientation selectivity were conserved in the anesthetized state (Lamme et al. 1998). In what follows, therefore, we assume that the laminar differences in dual tuning are largely a product of intrinsic wiring in V2 and we examine the structural relationship between the architecture of dual tuning and the function of feedback.
To place layer function and the role of feedback in the cognitive context of binding (Treisman 1996), we first need to explore other aspects of neural circuitry, concentrating (for economy of exposition) on the superficial layers. What, for instance, is the particular function of layer 2? In both V1 and V2, layer 2 is at best a very minor source of ascending output compared with layer 3 (Lund et al. 1981; Shipp and Zeki 1989; Zeki and Shipp 1989; Rockland 1992). Its essential role may therefore be to participate in feedback and intrinsic/integrative functions. In V1, the receptive fields of layer 2 neurons are reported to be larger but less precisely tuned than those of layer 3 and to show greater spontaneous activity (Gur and Snodderly 2008). These properties seem consistent with modulatory circuitry, in contrast to layer 3, which appears specialized to transmit specific, focal image features.
Critically, several elements of V2 cortical architecture also corroborate the role of layer 2 in modulatory, intrinsic duties. First, Golgi studies of area V2 show that all pyramidal neurons in layers 2 and 3 contribute axons to a fiber plexus coursing within layer 3B (here, layer “3.5”) (Valverde 1978; Lund et al. 1981) with a range of several millimeters demonstrable by focal injections of biocytin (Levitt et al. 1994). This distance is equivalent to a full cycle of V2’s stripe modules, permitting communication between all sites with overlapping receptive field locations (Roe and Ts'o 1995; Shipp and Zeki 2002b). Second, a key difference between pyramidal neurons in layer 2 and those in layer 3 is that the former bear apical dendrites with profusely spined collateral branches arborizing extensively in layer 1 (Lund et al. 1981; Peters et al. 1997). The pyramidal neurons of layer 3 (3B especially) may be larger, but their apical dendrites are reportedly less profuse; if the apical main shaft rises to layer 1, it bears few spines and arborizes relatively sparsely (Lund et al. 1981). Third, layer 1 is the primary target of feedback from higher, functionally specialized areas such as V4 and V5 (MT); these projections are organized relatively diffusely and are effectively nonreciprocal, invading the territory of all CO stripe modules, not just those that give rise to the ascending connection (Krubitzer and Kaas 1989; Shipp and Zeki 1989; Zeki and Shipp 1989). Individual feedback axons from V4 to V2 have recently been examined in detail (Anderson and Martin 2006). After giving off some collaterals in layer 6, they are observed to rise to layer 1 where branches may travel several millimeters. Synapses, examined by electron microscopy, are all asymmetric (excitatory) and 80% contact the spines of pyramidal neuron dendrites. Such a circuit, arising and terminating on glutamatergic pyramidal cells, probably exerts modulatory positive feedback (Bullier et al. 2001; Larkum et al. 2004; Spratling and Johnson 2004; Deco and Rolls 2005).
Feedback is considered, inter alia, to be an anatomical conduit for top–down effects relating to attention (Treisman 1998; Reynolds and Desimone 1999; Deco and Rolls 2004; Spratling and Johnson 2004; Hamker 2005), with the chain of feature-specific feedback relayed to V2 from prestriate cortex originating in a frontoparietal network of areas (Liu et al. 2003). The feature-similarity gain model (Treue and Martinez Trujillo 1999) posits that attention directed to a particular feature (e.g., green or upward motion) enhances the gain of all correspondingly tuned neurons across the visual field, irrespective of concurrent spatial attention, as evidenced by physiological (Motter 1994; Treue and Martinez Trujillo 1999; McAdams and Maunsell 2002; Martinez-Trujillo and Treue 2004; Bichot et al. 2005), functional imaging (O'Craven et al. 1997; Chawla et al. 1999; Saenz et al. 2002), and psychophysical (Saenz et al. 2003; Arman et al. 2006) studies. Similar modulation of dual-tuned neurons (through either of their preferred attributes) provides a plausible mechanism for cross-feature attentional effects, for example, the modulation of the motion aftereffect generated by moving dot fields contingent upon the match between dot color and attended color (Sohn et al. 2004). Heuristically, the attended dot field becomes the attended “object,” triggering multidimensional selection of the target object's features (Duncan et al. 1997; O'Craven et al. 1999; Schoenfeld et al. 2003; Sohn et al. 2004).
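A minimal sketch of feature-similarity gain, extended to dual-tuned cells, follows. It assumes multiplicative gain, rectified cosine direction tuning, a binary color match and an arbitrary modulation strength `BETA`; these choices, and the names `Neuron`, `feature_similarity_gain` and `response`, are illustrative assumptions rather than the published model's equations. The point it illustrates is that attending to a color changes a color/direction cell's response to motion, whereas a purely direction-selective cell is unaffected.

```python
import math
from dataclasses import dataclass
from typing import Optional

BETA = 0.3  # hypothetical attentional modulation strength


@dataclass
class Neuron:
    pref_dir_deg: Optional[float] = None  # preferred drift direction; None if not direction tuned
    pref_color: Optional[str] = None      # preferred color; None if not color tuned


def feature_similarity_gain(neuron: Neuron, attended_dir: Optional[float] = None,
                            attended_color: Optional[str] = None, beta: float = BETA) -> float:
    """Multiplicative gain set by the similarity between a cell's preferences and the
    attended feature. It does not depend on where the cell's receptive field lies."""
    gain = 1.0
    if attended_dir is not None and neuron.pref_dir_deg is not None:
        # Maximal enhancement for the attended direction, maximal suppression for the opposite one
        gain *= 1.0 + beta * math.cos(math.radians(neuron.pref_dir_deg - attended_dir))
    if attended_color is not None and neuron.pref_color is not None:
        gain *= (1.0 + beta) if neuron.pref_color == attended_color else (1.0 - beta)
    return gain


def response(neuron: Neuron, stim_dir: float, stim_color: str, **attention) -> float:
    """Stimulus drive (rectified cosine direction tuning times color match) scaled by attentional gain."""
    drive = 1.0
    if neuron.pref_dir_deg is not None:
        drive *= max(0.0, math.cos(math.radians(neuron.pref_dir_deg - stim_dir)))
    if neuron.pref_color is not None:
        drive *= 1.0 if stim_color == neuron.pref_color else 0.2  # weak response to non-preferred color
    return drive * feature_similarity_gain(neuron, **attention)


if __name__ == "__main__":
    dual = Neuron(pref_dir_deg=90.0, pref_color="red")   # red / upward-motion cell
    motion_only = Neuron(pref_dir_deg=90.0)              # upward-motion cell, no color tuning
    # Red dots drifting upward, with attention directed to a color
    print(response(dual, 90.0, "red", attended_color="red"))         # enhanced
    print(response(dual, 90.0, "red", attended_color="green"))       # suppressed
    print(response(motion_only, 90.0, "red", attended_color="red"))  # unchanged
```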
Figure 4 provides a schematic of the proposed mechanism. In this kind of display, where the stimuli are transparent, limited-lifetime dot fields (here with different combinations of dot color and drift direction; Fig. 4A), it is conventional to argue that selection cannot be based on object location because the 2 dot fields are precisely spatially superimposed. If so, the specificity of the cross-featural effect may well depend on spatiotemporal registration of features at the level of individual dots. In other words, bimodal neurons may play a “bridging” role at a local level, spreading top–down attentional bias from the target feature to the representation of other features with which the target feature is spatiotemporally conjoined. In V2, this would require feedback-enhanced activity to propagate from bimodal neurons in layer 2 to similarly selective (but unimodal) output neurons in layer 3 with overlapping receptive fields (Fig. 4B,C). The bridging mechanism becomes a perceptual binding mechanism in the context of the “integrated competition” theory of attention (Duncan et al. 1997), since it promotes consistency of object selection across higher, differently specialized areas of cortex. In the example of Figure 4, the combination of enhanced “red” and “up” outputs from V2 will influence the outcome of competition between object representations in separate color and motion processing streams (e.g., in areas V4 and V5/MT), leading to a predominant “red-up” percept.
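The bridging step can be caricatured in a few lines. In this hypothetical toy, all names and parameters (`beta` for the feedback gain, `w` for the intrinsic layer 2 to layer 3 weight) are illustrative assumptions: layer 2 conjunction cells are driven only by color/direction pairings that co-occur on the same dots and receive a feedback gain for the attended color, while layer 3 unimodal output cells sum their feedforward drive with input from layer 2 cells sharing their preference. Attending to red then raises the “up” output above the “down” output, although the direction-selective outputs are never themselves targeted by color attention.

```python
from itertools import product

COLORS = ("red", "green")
DIRECTIONS = ("up", "down")


def layer2_bimodal(stimulus: set[tuple[str, str]], attended_color: str,
                   beta: float = 0.5) -> dict[tuple[str, str], float]:
    """Hypothetical layer 2 color/direction conjunction cells.

    A cell is driven only if its color and direction co-occur on the same dots
    (spatiotemporal registration); feedback multiplies the activity of cells
    preferring the attended color by (1 + beta).
    """
    activity = {}
    for color, direction in product(COLORS, DIRECTIONS):
        drive = 1.0 if (color, direction) in stimulus else 0.0
        gain = 1.0 + beta if color == attended_color else 1.0
        activity[(color, direction)] = drive * gain
    return activity


def layer3_outputs(stimulus: set[tuple[str, str]], attended_color: str,
                   beta: float = 0.5, w: float = 0.5) -> dict[str, float]:
    """Hypothetical unimodal layer 3 output cells: feedforward drive plus intrinsic
    input (weight w) from layer 2 cells that share their preferred feature."""
    l2 = layer2_bimodal(stimulus, attended_color, beta)
    colors_present = {c for c, _ in stimulus}
    directions_present = {d for _, d in stimulus}
    out = {}
    for color in COLORS:
        feedforward = 1.0 if color in colors_present else 0.0
        out[color] = feedforward + w * sum(l2[(color, d)] for d in DIRECTIONS)
    for direction in DIRECTIONS:
        feedforward = 1.0 if direction in directions_present else 0.0
        out[direction] = feedforward + w * sum(l2[(c, direction)] for c in COLORS)
    return out


if __name__ == "__main__":
    # Two superimposed dot fields: red dots drifting up, green dots drifting down
    display = {("red", "up"), ("green", "down")}
    for attended in COLORS:
        out = layer3_outputs(display, attended)
        print(attended, {k: round(v, 2) for k, v in out.items()})
```

Only the layer 2 cells receive the feedback bias here, a deliberate simplification reflecting the layer 1 termination of feedback on layer 2 apical dendrites; the bias reaches the direction outputs solely through the conjunction cells.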
The bridging mechanism proposed above could also account for the enhanced sensitivity to coherent dot motion observed psychometrically (Croner and Albright 1997) and neurometrically in area V5 (Croner and Albright 1999) when the subset of coherently moving dots is made salient in color relative to the incoherent dots. This effect has been difficult to rationalize in terms of the properties of V5 neurons alone, which are not color selective and are completely chromatically insensitive in the presence of significant luminance contrast (Saito et al. 1989; Dobkins and Albright 1994; Gegenfurtner et al. 1994; Thiele et al. 1999; Barberini et al. 2005). Direction-selective neurons in V1, including identified V5-efferent neurons, also appear to lack color tuning (Movshon and Newsome 1996; Horwitz and Albright 2005). Although the color properties of V5-efferent neurons in V2 have not been tested so specifically, the bridging proposal would allow a color-specific motion bias to be injected into a pathway lacking any sign of intrinsic color selectivity.
The emergent “blueprint” for laminar physiology, potentially shared by sensory cortex beyond V2, is for neurons in the ascending pathways (layers 3 and 4) to be functionally specialized (i.e., relatively restricted in their feature dimensionality). Broader feature combinations are wired together in the innermost and outermost layers, generating multimodal neurons in laminar locations that are subject to feedback bias, including feature-specific effects. This modulation of activity is transmitted through intrinsic connections, so that the selected feature combination comes to be reflected in the pattern of activity across unimodal output neurons (see Fig. 4 and Supplementary Fig. S6). The superficial (and deep-lying) multimodal cells can thus be considered “bridge” neurons, acting to transfer attentional biases between feature dimensions. This would constitute a binding mechanism, promoting a unified outcome among the competitive, object-selective processes within separate cortical areas by spreading selection bias across all the features of the target object, as conceived by the integrated competition model (Duncan et al. 1997). A variety of bimodal neurons, with diverse pairings of sensitivities, would act to unify competition across a range of visual modalities while avoiding the combinatorial pitfall associated with “grandmother” neurons (Ballard et al. 1983).
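The downstream consequence, a unified outcome of competition in separate areas, can be sketched with a crude stand-in for object competition: iterated multiplicative normalization, which drives each area toward its most strongly driven object representation. This is not the integrated competition model itself; the object definitions, the `margin` threshold, the function names and the input values in the demonstration are hypothetical. With bias present in both the color and the motion inputs, both areas select the same object; with bias in the color input alone, the motion area remains unresolved.

```python
from typing import Optional

OBJECTS = {"A": ("red", "up"), "B": ("green", "down")}  # hypothetical objects: (color, direction)


def compete(inputs: dict[str, float], steps: int = 20) -> dict[str, float]:
    """Iterated multiplicative normalization: after n steps each share is
    proportional to input**n, so the most strongly driven representation dominates."""
    shares = {k: 1.0 / len(inputs) for k in inputs}
    for _ in range(steps):
        raw = {k: shares[k] * inputs[k] for k in inputs}
        total = sum(raw.values())
        shares = {k: v / total for k, v in raw.items()}
    return shares


def winner(shares: dict[str, float], margin: float = 0.1) -> Optional[str]:
    """Dominant representation, or None if no share leads the others by more than margin."""
    best = max(shares, key=shares.get)
    runner_up = max((v for k, v in shares.items() if k != best), default=0.0)
    return best if shares[best] - runner_up > margin else None


def area_selection(feature_drive: dict[str, float], feature_index: int) -> Optional[str]:
    """Competition among object representations inside one area.

    feature_index 0: a color-specialized area (e.g. V4); 1: a motion-specialized area (e.g. V5/MT).
    Each object's input is the drive for its own feature in that dimension.
    """
    inputs = {obj: feature_drive[feats[feature_index]] for obj, feats in OBJECTS.items()}
    return winner(compete(inputs))


if __name__ == "__main__":
    # Hypothetical V2 output drives: bias on both dimensions vs. bias on color only
    bridged = {"red": 1.75, "green": 1.5, "up": 1.75, "down": 1.5}
    color_only = {"red": 1.75, "green": 1.5, "up": 1.5, "down": 1.5}
    for label, drive in (("bridged", bridged), ("color only", color_only)):
        print(label, "color area:", area_selection(drive, 0),
              "motion area:", area_selection(drive, 1))
```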
Funding: Wellcome Trust (UK) to S.Z.
We thank Grant Wray for technical assistance and John Romaya for stimulus and data collection software. Conflict of Interest: None declared.