Perceptual systems can be altered by immersing observers in environments with statistical properties that differ from those naturally encountered. Here we present a novel method for placing observers in naturalistic audio-visual environments whose statistics can be manipulated in very targeted ways. We present the results of a case study that used this method. Observers were exposed to an environment where there was a novel statistical relationship between two simple visual patterns in otherwise natural scenes. Exposure to this altered environment strengthened perceptual interactions between the two patterns.
Our environment has a strong influence on the properties of our perceptual systems over a range of timescales. Classic experiments like the monocular deprivation experiments of Wiesel and Hubel (1963) and the “stripe-rearing” experiments of Blakemore and Cooper (1970) and Hirsch and Spinelli (1970) revealed critical periods in perceptual development during which exposure to a normal environment is crucial for the development of normal perception (see Barlow, 1975 for an early review). Exposure to unnatural environments shifts neural resources towards the distribution of features within the altered sensory input. For example, depriving developing animals of information at specific orientations reduces the number of cortical neurons that are selective for that orientation (Blakemore & Cooper, 1970, Blasdel, Mitchell, Muir & Pettigrew, 1977, Sengpiel, Stawinski & Bonhoeffer, 1999). Clinical studies similarly indicate that there are critical periods for the development of visual capabilities in humans (Fawcett, Wang & Birch, 2005, Olitsky, Nelson & Brooks, 2002) – at least for capabilities mediated in early cortical levels of the visual system (see, e.g. Ostrovsky, Andalman & Sinha (2006) for a case where mid to high levels appeared to recover from long term early blindness).
Input driven plasticity in adult humans has received a lot of attention recently, due in part to clinical implications. Information about adult plasticity is helpful in designing recovery programs, for example, following a stroke where some visual capability is lost or following cataract removal where capability is gained (Huxlin, 2008). Knowing about adult plasticity is made more important by the emergence of prosthetic and genetic technologies that may be used to restore low level sensory capabilities (see, for example, Mancuso, Hendrickson, Connor, Mauck, Kinsella, Hauswirth, Neitz & Neitz (2007), Weiland & Humayun (2006)), but which may require substantial amounts of adaptation on the part of the patient.
Successful research in plasticity will depend on appropriate methods for manipulating the statistics of the environment. Optical methods have featured predominantly in past studies. Early experiments involved full-field shifts of incoming natural signals beginning with the use of the inverting lenses worn by Stratton (1897). Others used similar optical devices to invert, displace, and otherwise distort visual input. These were worn for hours, days, or even weeks (see list of methods in Rock, 1966). There is no doubt that the adaptation effects seen in these cases involved remapping between sensory and motor systems – it is not so clear that there was a change in perception (Linden, Kallenbach, Heinecke, Singer & Goebel, 1999) (although note Gibson, 1933, Kohler, 1962). Another early study used full-field colour shifts. Using bi-coloured lenses, the left visual hemifield was tinted blue whilst the right was tinted yellow (Kohler, 1964). After weeks of adaptation, colour judgments with the lenses in place became as reliable as prior to adaptation. Remarkably, after removal of the coloured lenses the observer experienced a gaze contingent, bi-coloured visual environment – looking left tinted the world yellow and looking right made the world blue. This after effect lasted about a month. Simpler arrangements using lenses of a single colour have produced results that concord with this one (Neitz, Carroll, Yamauchi, Neitz & Williams, 2002). Optical methods such as these, as well as those producing spatial shifts, are useful but are clearly limited in their scope for manipulating specific statistical properties of the input.
There has been a recent trend to study environmentally driven adult perceptual plasticity in the lab. In these cases, computer generated stimuli have been used to study, for example, learning of associations between complex unnatural shapes (Fiser & Aslin, 2001, Fiser & Aslin, 2002a, Fiser & Aslin, 2002b), the ability to recruit cues to disambiguate scenes (Backus & Haijiang, 2007, Haijiang, Saunders, Stone & Backus, 2006), adaptation to associations between grating patterns (Carandini, Barlow, O'Keefe, Poirson & Movshon, 1997, Falconbridge & Badcock, 2006), and adaptation to colour and orientation contingencies (McCullough, 1965, Vul & MacLeod, 2006). The advantage of these laboratory manipulations is that the statistics are under tight control, but the use of unnatural inputs in unnatural settings limits the potential for long term exposure, and may make it difficult to translate findings to real world situations.
Here we present a novel method for placing observers in a naturalistic environment where specific statistical features have been altered by the experimenter. A case study, consisting of two experiments on a total of 35 observers, demonstrates the usefulness of the method. Our results reveal the exciting possibility that the adult visual system is capable of adjusting low-level aspects of its perceptual model of the world to match the statistics of the environment.
The case study addresses the specific question of whether the adult visual system is capable of learning relationships between low-level visual features. By low-level features we mean simple visual patterns that match the receptive field profiles of early stage cortical neurons in primates. We chose “Gabor” patterns, which are effective stimuli for activating simple cells in primary visual cortex (Jones & Palmer, 1987). We were motivated by two studies showing that the phenomenon of perceptual linking of co-linear Gabor-like patterns (Field, Hayes & Hess, 1993, Li & Gilbert, 2002) may be learned from the environment during development (Hou, Pettet, Sampath, Candy & Norcia, 2003, Kovacs, Kozma, Feher & Benedek, 1999). This conclusion is based on three key points: 1) infants do not exhibit differential EEG responses to co-linear versus, e.g., parallel elements, and young children do not perform well in contour integration tasks; 2) adults do both; and 3) co-linearity is prevalent in the natural environment (Geisler, Perry, Super & Gallogly, 2001, Sigman, Cecchi, Gilbert & Magnasco, 2001). Our aim was to test for the possibility of such environmentally driven learning in adults. We chose a relationship between pairs of Gabors that is less common in natural images, and boosted its occurrence in real world video sequences. Specifically, we exposed adult observers to a strong parallel relationship between Gabor features.
We assessed learning by measuring the effect of a Gabor flanker on the apparent contrast of a target Gabor. Exposure to the parallel relationship increased the strength of the flanker effect, when the configuration of target and flanker matched the parallel relationship present in the altered environment. The change was positive so that a high contrast parallel flanker increased the perceived contrast of the target. This result suggests that the adult visual system retains the ability to learn new relationships between low-level features.
Observers viewed episodes of a popular television show that were manipulated to boost the prevalence of a parallel relationship between local oriented Gabor features. The use of a popular television show was designed to engage the attention of the observers. Informal reports by observers following exposure indicated high levels of attention. The materials of choice were video episodes of the television program “The Office” (NBC's U.S. version). Seven episodes of “The Office” were converted to gray-scale (resolution: 480(h) × 720(w) pixels). A control group of observers viewed the gray-scale episodes as they were, whilst experimental observers viewed manipulated versions of the video, described below. All videos subtended 16.6 × 25.3 deg. visual angle. The original audio soundtrack was presented to all observers along with the video. A sample manipulated video sequence is presented in the Supplementary materials (Supplementary Material A).
Wherever a Gabor pattern (here called “g1”) occurred in any one of the original video frames, a second Gabor (“g2”) with the same properties, but offset spatially was added to the frame at the same intensity (see Figure 1a). g1 and g2 each subtended 24 minutes of arc viewing angle and were spaced 28 min. arc apart in a parallel arrangement. They were oriented at 135 deg, had zero phase, and had a peak spatial frequency of 4.3 cycles/deg. A formal description of the manipulation process follows.
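As a rough check on the geometry, the angular dimensions given above can be converted to pixels. This is a minimal sketch and not the authors' code: the function names are hypothetical, the pixel scale is assumed to follow from the horizontal extent of the video (720 pixels spanning 25.3 deg), and the Gaussian envelope width and the angle convention in `gabor_value` are assumptions because the text does not state them.

```python
import math

# Pixel scale assumed from the horizontal video extent given in the text.
PIXELS_PER_DEG = 720 / 25.3

def arcmin_to_pixels(arcmin, pixels_per_deg=PIXELS_PER_DEG):
    """Convert a visual angle in minutes of arc to pixels."""
    return arcmin / 60.0 * pixels_per_deg

def wavelength_pixels(cycles_per_deg, pixels_per_deg=PIXELS_PER_DEG):
    """Wavelength in pixels of a carrier with the given spatial frequency."""
    return pixels_per_deg / cycles_per_deg

def gabor_value(x, y, sigma, wavelength, theta_deg):
    """Zero-phase (cosine) Gabor at pixel offset (x, y) from its centre.

    sigma is the standard deviation of the Gaussian envelope in pixels
    (not stated in the text, so any value used is an assumption).
    theta_deg is taken here as the direction of luminance modulation,
    which is also an assumed convention.
    """
    theta = math.radians(theta_deg)
    u = x * math.cos(theta) + y * math.sin(theta)
    envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
    return envelope * math.cos(2.0 * math.pi * u / wavelength)

size_px = arcmin_to_pixels(24)     # extent of each Gabor
spacing_px = arcmin_to_pixels(28)  # spacing between g1 and g2
lam_px = wavelength_pixels(4.3)    # carrier wavelength
```

Under these assumptions each Gabor spans roughly 11 pixels, the g1-g2 spacing is roughly 13 pixels, and the carrier wavelength is roughly 6.6 pixels, so the spacing is close to two carrier wavelengths.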
The manipulated movie frame M is a weighted sum of the original frame O and an “added image” A:
\[ M = O + \alpha A \]
where α is a constant chosen to make the average amplitude of g2 equal to that of g1 for the first 20 movie frames of a given movie. To calculate A: let o represent the 2D fast Fourier transform of O (MATLAB's ‘fft2’ function was used), let b represent the fft2 of g1, and let c be the fft2 of g2. Then the fft2 of A is
\[ \mathrm{fft2}(A) = o \mathbin{.*} b \mathbin{.*} c \]
where .* represents point-wise multiplication. This is equivalent to filtering the original movie frame with g1, then convolving the resulting “amplitude map” with g2. This produces an image that consists of g2s added at each point in the array according to some constant times the amplitude of g1 at that point. As stated, the constant was chosen to equate the amplitude of the added Gabor and the pre-existing Gabor. An example g1 amplitude map is shown in Figure 1b.
Figure 1c shows the manipulated image corresponding to this map. Overall, the manipulation increased the conditional probability of finding significant g2 energy in the image given the presence of g1.
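The equivalence described above (filter the frame with g1, then convolve the resulting amplitude map with g2) can be sketched in the spatial domain using circular convolution, which corresponds to point-wise multiplication of 2D discrete Fourier transforms. This is a minimal pure-Python sketch intended for very small grids, not the authors' MATLAB implementation. The function names are hypothetical, and the final step assumes that the weighted sum takes the form M = O + αA.

```python
def circular_convolve(image, kernel):
    """2D circular convolution of two equal-sized grids (lists of rows).

    For real inputs this gives the same result as taking the inverse 2D
    DFT of the point-wise product of the two 2D DFTs.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for i in range(rows):
                for j in range(cols):
                    total += image[i][j] * kernel[(r - i) % rows][(c - j) % cols]
            out[r][c] = total
    return out

def shift_kernel(kernel, dr, dc):
    """Circularly shift a kernel by (dr, dc) pixels to make the offset copy."""
    rows, cols = len(kernel), len(kernel[0])
    return [[kernel[(r - dr) % rows][(c - dc) % cols] for c in range(cols)]
            for r in range(rows)]

def manipulate_frame(original, g1, g2, alpha):
    """Return original + alpha * A, where A = original (*) g1 (*) g2.

    The form of the weighted sum is an assumption; alpha is supplied by
    the caller rather than fitted to the first 20 frames as in the text.
    """
    amplitude_map = circular_convolve(original, g1)  # filter with g1
    added = circular_convolve(amplitude_map, g2)     # place g2 at each response
    return [[o + alpha * a for o, a in zip(o_row, a_row)]
            for o_row, a_row in zip(original, added)]
```

With a single-pixel stand-in for g1 and a copy shifted down by one row as g2, a frame containing one bright pixel comes back with a second pixel of amplitude alpha one row below it, which is the "add an offset copy wherever the feature occurs" behaviour in miniature.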
In order to understand the effect of the manipulation process on the relationship between our parallel features, one hundred movie frames were chosen randomly from an episode of The Office and another one hundred from a manipulated version of the same episode. g1 and g2 amplitude maps were produced by convolving the frames with g1 and g2 respectively (note that each g2 map was just a translated g1 map where the translation reflects the spatial relationship of g2 to g1). The correlations between corresponding points in the two maps were calculated for all 100 pairs of frames. The conditional probability of g2 given g1 as well as the g1/g2 joint probability were also calculated.
To avoid incorporating responses to clearly non-g1-like components (such as the top edge of the paper shredder in Figure 1b) in our calculations, a cutoff of 15% of the maximum amplitude was applied, and only amplitudes above this value were considered. 15% was chosen because it was clear of the background responses in the amplitude maps, but the results using 10%-30% cutoffs were similar.
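The three association measures can be sketched as follows for a single pair of amplitude maps. This is an illustrative sketch rather than the authors' analysis code: the function names are hypothetical, amplitudes are assumed to be non-negative and supplied as flat lists of corresponding points, the correlation is computed over all points, and the cutoff is taken relative to the larger of the two map maxima (the text says only "the maximum amplitude").

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def association_measures(g1_amps, g2_amps, cutoff_fraction=0.15):
    """Return (correlation, P(g2 | g1), P(g1 and g2)) for one frame.

    A point counts as containing a feature when its amplitude exceeds
    cutoff_fraction times the maximum amplitude (assumed here to be the
    larger of the two map maxima).
    """
    cutoff = cutoff_fraction * max(max(g1_amps), max(g2_amps))
    g1_on = [a > cutoff for a in g1_amps]
    g2_on = [a > cutoff for a in g2_amps]
    n_points = len(g1_amps)
    n_g1 = sum(g1_on)
    n_both = sum(1 for a, b in zip(g1_on, g2_on) if a and b)
    joint = n_both / n_points
    conditional = n_both / n_g1 if n_g1 else float("nan")
    return pearson(g1_amps, g2_amps), conditional, joint
```

Changing `cutoff_fraction` over the 0.10 to 0.30 range mentioned above is a one-argument change, which makes it easy to check how sensitive the probabilities are to the threshold.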
To assess the effects of exposure on the perception of the added Gabor pattern, observers performed a contrast matching task prior to and following each video presentation. Observers adjusted the contrast of one sinusoidal grating (the ‘test’) to match the contrast of a similar grating (the ‘target’) via left-right movement of a mouse. The distance between the left hand side test patch and the right hand side target was 2.8 degrees. Both gratings had the same size and spatial frequency as the Gabor pattern that was added to our movies (Figure 2). Grating patches rather than Gabors were used to avoid the use of a contrast dependent size cue in performing a match.
The target was either presented alone (baseline condition – see Figure 2a) or in the presence of a similar flanker. The flanker was either parallel to the target (Figure 2b), reproducing the g1-g2 configuration in the manipulated video, or was orthogonal to it. The orthogonal condition was included as a control to verify that learning affected only the parallel relationship to which the observer was exposed. An extension to this experiment consisting of two control conditions was also run. In this case, a separate group of subjects were exposed to non-manipulated versions of the same videos, and were tested using the parallel and orthogonal flank conditions.
In order to exert maximal influence on the perception of the mid-contrast target, the flanker was always presented at 100% contrast. The mouse button was pressed when observers were satisfied with a match. Typically, each trial lasted between 5 and 20s. The test stimuli remained on screen during this interval. All stimuli were presented on a linearized, 35×45 degree CRT monitor using MATLAB (R2006a) and Psychtoolbox (1.0.6) software. Observers set ten independent matches per condition per testing phase. Conditions were randomly interleaved.
Observers participated in a single experimental session. They first practiced the matching task (10min) in the presence of the experimenter who then left the room. After completion of a pre-test they viewed two episodes of “The Office” (42min total) and completed a mid- (between the two episodes) and post-test. Each test phase took about 8 min to complete, and all three were identical except for the random order in which trials were presented. All observers viewed the same two episodes.
Fixation was restricted to a region (within a gray box) rather than to a point to allow testing to occur at multiple retinal locations (varying fixation location within the region was encouraged but not enforced). By fixating at a single location during testing one runs the risk of learning effects incurred during movie viewing being washed out by exposure to the testing material in the retinal locations where the test stimuli are presented. The risk with using this fixation guide was that matching results would be noisy as the potentially different visual field locations used to do the matching may supply different percepts of the test stimuli.
A follow-up experiment focused on decreasing within- and between-observer variation at the risk of a smaller effect size (see Results section). This experiment used a more traditional fixation cross and a “top-up” procedure to limit decay of the learning effect during testing. Observers received top-up exposures to the manipulated video between tests (24s of top-up for every 5s of testing), which made the average pre-test exposure time similar to that in the first experiment. Individual test trials were as in the original experiment.
Three conditions were tested in both experiments: the baseline, parallel flanker, and orthogonal flanker conditions described above. In addition, in the first experiment, a “double wavelength” condition was included which was the same as the parallel condition, but the flanker had half the spatial frequency of the target. We expected to see a marginal learning effect in this case because our added Gabor features tended to occur down and to the left of edges. Edges contain a range of spatial frequencies so our manipulation could be expected to produce an intermediate increase in the association between our added feature and a “double wavelength” flanker.
All observers had normal, or corrected to normal visual acuity and provided written consent prior to participation. Eleven observers viewed the manipulated movies in the main experiment, and a control group of nine additional observers underwent the same testing but viewed non-manipulated versions of the same episodes. A further 15 observers took part in the follow-up experiment described above. Testing procedures were approved by the UCLA Office for The Protection of Human Observers.
The strength of the relationship between the original and added Gabors, g1 and g2, was measured in three ways. We first calculated correlation coefficients between the amplitudes of pairs of Gabor features with the specified spatial offset across each entire image. This correlation was 0.23 for the non-manipulated episode, indicating a modest pre-existing relationship between the two patterns. The value in the manipulated episodes was 0.44. The fact that this correlation is still far from perfect (1.0) may at first seem surprising, but recall that our manipulations can only “guarantee” that if there is amplitude in the g1 position, then there is also amplitude in the g2 position half of the time. This is because we applied our algorithm only once, and so there was no guarantee that an added Gabor in position g2 (which accounts for ~50% of the Gabor features in the manipulated movie) had a similar Gabor down and to the left of it. As a second measure of association, we calculated the conditional probability of moderate (>0.15*maximum) g2 amplitude given moderate g1 amplitude. This value was 0.33 for the non-manipulated movie and 0.55 for the manipulated. The joint probability of moderate g1 and g2 amplitudes was 0.07 for the non-manipulated movie and 0.15 for the manipulated. Thus, for three measures of association, the manipulations produced close to a doubling.
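The "close to a doubling" summary can be checked directly from the values reported above. The short sketch below only restates that arithmetic; the dictionary keys are labels chosen for illustration.

```python
# Values reported in the text: (non-manipulated, manipulated).
reported = {
    "correlation": (0.23, 0.44),
    "conditional": (0.33, 0.55),  # P(g2 | g1)
    "joint": (0.07, 0.15),        # P(g1 and g2)
}

# Ratio of the manipulated value to the non-manipulated value.
ratios = {name: after / before for name, (before, after) in reported.items()}
```

The ratios come out at roughly 1.9 for the correlation, 1.7 for the conditional probability, and 2.1 for the joint probability.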
To test for changes in the perceptual relationship between the parallel features, we measured the effect of a high contrast parallel flanker on the perceived contrast of a mid-contrast target. This was done by comparing the matched contrast for the target alone condition to the matched contrast for the flanker condition (see Figure 2). To quantify the effect of the flanker, we subtracted the target-alone result from the flanker condition result for each subject. This was done for all flanker conditions. The subtraction also had the effect of discounting individuals' biases to see one side of the display as having higher contrast. Exposure to the videos did not affect the matches made for the target alone condition, either pre-to-mid or pre-to-post exposure (mean = 0.2% and -0.18% contrast, p = 0.804 and 0.831 respectively, two-tailed paired t-test, df=19).
Exposure to the manipulated video increased the effect of the parallel flanker on the target (Figure 3a). This effect was positive - the target appeared higher contrast following exposure than it did preceding it - and was present in 10 out of 11 observers. The average change in contrast was 3.5% (which represents a 7.0% magnitude change in terms of the original target contrast). Note that the different starting points for different individuals in Figure 3a are characteristic of centre-surround interactions (e.g. Cannon & Fullenkamp, 1993).
To determine the reliability and specificity of these effects of learning, we computed change scores by subtracting the size of flanker effects prior to exposure from their size following exposure. Change scores were then entered into a repeated measures three-way ANOVA with factors Video Type (manipulated, nonmanipulated), Flanker (parallel, orthogonal) and Test Time (mid-pre, post-mid). We obtained a significant interaction between Video Type, Flanker and Test Time (F(1,18) = 7.424, p < 0.05). Change scores are shown in Figure 3b for the parallel condition and the three control conditions.
The interaction was due to a specific increase of the effect of the parallel flanker following viewing of the manipulated video. An ANOVA for just the parallel condition found a reliable effect of video type (F(1,18) = 5.549, p < 0.05), and planned contrasts revealed that this was due to a reliable increase in the effect of the flanker for the manipulated (p<0.005, two-tailed paired t-test, df=10), but not the nonmanipulated videos (p=0.874, df=8). For the orthogonal condition, neither this ANOVA, nor the planned contrasts for either the manipulated or original video types showed reliable effects (p > 0.1 in all cases).
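The quantities used in this analysis (flanker effect, change score, and the paired t statistic behind the planned contrasts) can be sketched as follows. This is an illustrative sketch with hypothetical function names and made-up numbers in the test; it does not reproduce the ANOVA or the reported p values.

```python
import math

def flanker_effect(flanker_match, target_alone_match):
    """Matched contrast with the flanker minus the target-alone match."""
    return flanker_match - target_alone_match

def change_scores(pre_effects, post_effects):
    """Per-observer flanker effect after exposure minus the effect before."""
    return [post - pre for pre, post in zip(pre_effects, post_effects)]

def paired_t(pre, post):
    """Paired t statistic and its degrees of freedom (n - 1)."""
    diffs = change_scores(pre, post)
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(variance / n), n - 1
```

The degrees of freedom returned by `paired_t` follow the same rule as the values quoted above: 11 observers give df=10 and 9 observers give df=8.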
As expected, there was a significant but intermediate increase in the influence of the double wavelength flanker (Figure 3c). In this case, the change in matching contrast relative to the baseline match was 1.9% which represents a 3.8% magnitude change in terms of the target contrast (p < 0.05). This supports our observation that our added Gabor features tended to occur alongside edges in the original movie frames which contain significant energy at a range of spatial frequencies, including a wavelength twice that of the target. The change in the double wavelength condition for the control group who watched non-manipulated movies was not significant (p > 0.1).
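The relationship between the absolute contrast changes and the relative magnitudes quoted in the Results can be made explicit. The sketch below assumes a 50% contrast target in the main experiment, which is the value implied by the pairs 3.5%/7.0% and 1.9%/3.8%; the function name is hypothetical.

```python
def relative_change(absolute_change_pct, target_contrast_pct):
    """Express an absolute change in matched contrast as a percentage
    of the target contrast."""
    return 100.0 * absolute_change_pct / target_contrast_pct

# Assumed 50% contrast target for the main experiment.
parallel_relative = relative_change(3.5, 50.0)
double_wavelength_relative = relative_change(1.9, 50.0)
```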
We were interested in whether the use of a fixation region rather than a traditional cross somehow affected our results. We tested a further fifteen observers using a cross in place of the box depicted in Figure 2 in both manipulated and original video conditions. The results confirmed those of the main experiment. Exposure to the videos did not affect the matches made for the target alone condition (mean = 0.06%, p = 0.841, two-tailed paired t-test, df=14). Planned contrasts showed a reliable increase in the parallel flanker effect (p < 0.005, two-tailed paired t-test, df=14) and no change in the orthogonal flanker effect (p > 0.1). See Figure 3d. The magnitude of the learning effect in the parallel condition was 4.0% (relative to the 30% contrast target that was used). This is somewhat smaller than the effect observed in the initial experiment (7.0%). This, along with the smaller within- and between-observer error (compare Figures 3b and 3d) were expected outcomes of using a fixation cross rather than a fixation region (see explanation in Method section).
We have demonstrated a novel method for immersing observers in audio-visual environments that are naturalistic except in targeted ways. The possibilities for such targeted manipulations appear to be numerous. For example, all of the optical manipulations discussed in the Introduction are implementable using this approach - although an observer is not free to interact with the manipulated environment in the way they are using optical methods (see the Future Research section below for a possible resolution to this problem). We have shown that it is possible to add patterns according to certain rules. This led to an association between features in our case. It is also possible to remove any features that are detectable by computer vision algorithms. For example, all vertical features could be removed from a movie, or correlations between particular local features could be disrupted. It is also possible to alter temporal and spatio-temporal properties. For example, a strobe effect could be produced or motion could be distorted. The method thus provides a tool for studying the influence of a wide range of specific statistical manipulations of the environment on perception.
This study is one of a growing number of psychophysical studies that use natural images as stimuli (e.g. Bex, Mareschal & Dakin, 2007). We predict a continuation of this trend based on strong arguments for using natural stimuli for understanding natural vision (Felsen & Dan, 2005, Olshausen & Field, 2005; cf. Rust & Movshon, 2005).
In our case study, we manipulated natural video by adding a simple, oriented feature alongside regions where that same feature was already present. After exposure to these videos the apparent contrast of a target “added” feature was boosted by the presence of a high contrast flanking feature when they were in a configuration that matched the relationship present in the videos. There was no effect of exposure in test conditions where the flanker was orthogonal to the target. There was also no effect for the parallel flanker when exposure was to non-manipulated movies. These basic findings were consistent across both experiments reported here, as well as several additional pilot studies (not reported).
Note that our methods were designed to compare perceived target contrasts under various flanker conditions, for example, the parallel-flanker condition was compared with the no-flanker condition. There may also have been non-flanker-dependent changes that were missed by our testing methods. We cannot rule out, for example, the possibility that our main result was due to a decrease in the perceived contrast of the isolated test pattern (perhaps as a result of exposure to a greater-than-usual concentration of such patterns) and no change in the perceived contrast of the target that is paired with a parallel flanker. In any case, the result of pairing with a parallel flanker is a higher perceived contrast for the target compared with pairing with an orthogonal flanker. This means there is greater facilitation by the parallel flanker regardless of other possible effects. This supports the claim that introducing relationships between patterns in the visual environment can increase perceptual associations between the patterns.
The settings made in the final test phase are unlikely to be a result of observers simply reproducing a target/flanker combination they saw in the manipulated movie (“template matching”). This is because the test stimuli differed from the exposed stimuli in a number of ways. First, the test stimuli consisted of isolated grating patterns. Instead of embedding the test stimuli within naturalistic backgrounds, which would introduce relatively uncontrolled contextual effects, an “average image” background was used (which equates to a uniform mean luminance grey if enough movie frames are averaged). Second, the test gratings were of much higher contrast than the average Gabor in the manipulated movie. For 100 randomly selected frames, we found that the distribution of contrasts for the Gabor pattern of interest was highly kurtotic (sharply peaked at zero, and heavy in the tails compared to a Gaussian distribution) as is characteristic for Gabor features in natural images. The example map of g1 amplitudes in Figure 1b is illustrative of this. Most areas have zero amplitude, but there are a significant number of relatively high amplitude areas at the same time. Considering only Gabors with 15% contrast or more, 13.1% of Gabors had a contrast between 25% and 35%, only 0.4% had a contrast between 45% and 55%, and the highest contrast Gabor had a contrast of 80.2%. Thus our target and flanker gratings represented rare to very rare features with respect to their contrast (30%, 50% and 100%). The fact that there was a learning effect means that the relationship between the low contrast g1 and g2 pairs in the manipulated environment was abstracted by the visual system and then made manifest under our testing conditions. It is possible that the learning effect may be more pronounced at lower contrasts, but our methods nevertheless allow us to conclude that it is a relationship that has been learned, and not a template.
It is well known that context affects the perception of targets. In general, high contrast patterns presented near a supra-threshold target pattern reduce the perceived contrast of the target (Cannon & Fullenkamp, 1991, Mizobe, Polat, Pettet & Kasamatsu, 2001, Xing & Heeger, 2000). Given our results, facilitation between supra-threshold co-linear Gabor patterns in normal adults might be expected as we have had a life-time of exposure to co-linearly arranged edge segments (Geisler et al., 2001). There is some evidence that, in addition to the general suppressive effect of surround patterns, there is some facilitation (or at least a decrease in suppression) by co-linear flankers (Zenger-Landolt & Koch, 2001) and greater levels of modulation by co-linear flankers (Hou et al., 2003) in humans, but at least two studies failed to find higher levels of facilitation by co-linear flankers compared to other flanker types (Williams & Hess, 1998, Xing & Heeger, 2000). This is somewhat surprising given that co-linear Gabors “pop-out” from backgrounds consisting of randomly oriented Gabor elements (Field et al., 1993, Li & Gilbert, 2002), that co-linear flankers facilitate detection of low-contrast targets (Geisler et al., 2001, Polat, Mizobe, Pettet, Kasamatsu & Norcia, 1998, Polat & Sagi, 1993, Polat & Sagi, 1994), and given the number of physiological studies demonstrating specific facilitation by co-linear flankers (e.g. Bauer & Heinze (2002), Polat et al. (1998)) - even some showing a strong correspondence between this neural facilitation and responses in contour detection tasks (Kapadia, Ito, Gilbert & Westheimer, 1995, Li, Piech & Gilbert, 2006). There are at least two possible ways to reconcile these findings. One is that perceptual facilitation effects are small and have been missed by the psychophysical studies cited above. The other is that facilitation does indeed exist in early perceptual mechanisms (e.g. those in V1), but that later mechanisms override the facilitation to ensure that conscious perception of contrast is close to veridical. Both facilitation to aid contour integration, and accuracy in apparent contrast might be considered desirable properties of vision. It is therefore possible that even longer exposure to our parallel relationship would cause a decrease in the size of the observed learning effect as the visual system attempts to restore veridical contrast perception across viewing conditions. Active neural suppression of a secondary, spatially offset image after years of exposure was seen in the case of an impaired vision patient (Fine, Smallman, Doyle & MacLeod, 2002).
Whatever co-linear effects are involved in contour integration, past work has suggested that they are learned from the environment (Geisler et al., 2001, Hou et al., 2003, Kovacs et al., 1999, Li, Piech & Gilbert, 2008). Our experiments demonstrate that relatively brief exposure to an environment where there is a relationship between a parallel target and flanker can lead to an increase in facilitation. This suggests the exciting possibility that some mechanisms of learning relationships between low-level features from the environment are active in adulthood.
Our results are consistent with studies that used traditional perceptual learning methods to strengthen existing associations between collinear elements lying on smooth contours (Kovacs et al., 1999, Li & Gilbert, 2002). In an attempt to understand the neural basis for such learning, a recent monkey study documented increases in neural facilitation between V1 cells whose receptive fields lay on contours with practice on a contour integration task (Li et al., 2008). Although the changes were measured in low-level visual areas, their results also support the involvement of higher level mechanisms. This is in agreement with human fMRI data from a shape learning study where signal growth was seen across a range of stages in cortical visual processing (Kourtzi, Betts, Sarkheil & Welchman, 2005). We extend this work by showing that environmentally driven learning, not tied to practice of a particular task, can increase facilitation between non-collinear elements. Although we measured changes in the relationship between low-level features, which are very likely to correspond to changes in low-level visual states, we cannot rule out the involvement of higher level cortical mechanisms in producing the change.
A question arising from this study is: Why didn’t exposure to our movies lead to the suppression of the added edge-like patterns? Such a result would be more consistent with classical adaptation effects which are characterized by decreases in the saliency of exposed features (Blakemore & Campbell, 1969, Crawford, 1947, Gilinsky, 1968, Graham, 1989, Pantle & Sekuler, 1968) including those made contingent on other features (Carandini et al., 1997, Falconbridge & Badcock, 2006, McCullough, 1965). At a general theoretical level, traditional adaptation may be best understood as optimizing processing of features or relationships that are already encoded by the visual system (Clifford, Webster, Stanley, Stocker, Kohn, Sharpee & Schwartz, 2007). Such optimization could include operations such as sensor gain control and the perceptual dissociation of features that are associated in the environment (Carandini et al., 1997, Falconbridge & Badcock, 2006). Rescaling neural output to maintain efficiency in the face of changes in the statistics of the input from the environment (Brenner, Bialek & de Ruyter van Steveninck, 2000) is considered an adaptation response. Thus, whilst adaptation reflects the visual system optimizing the representation of an environment it has already encoded (Clifford et al., 2007), our results may represent the visual system's attempt to learn something new about the environment itself, specifically the increased occurrence of a relationship between two parallel features.
The specific mechanisms that lead to learning in some instances and adaptation in others remain to be explored in future research. Our study suggests several factors that may be important to study. Traditional contrast adaptation studies generally present a relatively intense, isolated adapting pattern for a period of seconds or minutes. The present experiment displayed the adapting pattern relatively weakly and sparsely within a rich, naturalistic audio-visual environment, over a longer duration. The duration of exposure, the richness of the input, and/or the strength of the adapting pattern may have been critical for producing the results we observed. It is further possible that even longer durations of exposure may lessen the perceived contrast of uninformative patterns (Fine et al., 2002).
The method presented here might practically allow exposure to altered environments for hours at a time. To extend this to days and to further enhance the naturalness of the altered environment, it may be possible to implement on-line manipulations of the real visual world using a portable, video see-through augmented reality system. We are developing such a system, whose advantages include 1) the potential for longer exposure times as observers can go about most normal activities, 2) allowing observers to experience a more natural and immersive environment as they are free to engage with the seen world, and 3) providing a better tool for simulating clinical scenarios where a patient must engage in everyday activities.
We have presented a method that can be used to test the extent to which vision tunes itself to the statistics of the environment. An increasing number of theories posit that environmental statistics drive relatively low-level effects in vision (Geisler, 2008, Simoncelli, 2003), but past work has not shown that such effects can be learned by adults from the natural environment. Our methods allowed us to present observers with natural image statistics to which we added a carefully controlled statistical relationship between low-level visual features. Our data indicate that the adult visual system is capable of learning such regularities from simple exposure to an altered environment. These results suggest the exciting possibility that, even at relatively early stages of processing, the adult visual system can change its internal encoding of the world to reflect changes in the statistics of the external world.
There are no financial conflicts of interest on the part of any of the authors. We thank Alan Yuille, Don MacLeod and Patricia Cheng for support and helpful discussion. We gratefully acknowledge financial support from NIH EY11862 (SE), The Keck Foundation's UCLA Program for Vision and Image Science (MF, LS, SE), NIH EY01711 (MF), UCLA Graduate Division fellowship (DW), Naval Research Laboratory grant (LS), and UCLA Faculty Senate Grant (LS).
1 Filtering is equivalent to convolving but with the filter matrix rotated by 180°. As our filter matrix (g1) is the same after rotation by 180°, we can just convolve D with g1.
2 Note that g1 and g2 are identical Gabors, but the “g1” and “g2” labels are assigned to particular Gabors within a pair to depict their spatial relationship to one another.
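The equivalence stated in footnote 1 can be checked numerically. The following is a minimal sketch, not the code used in the study: it assumes that g1 is a cosine-phase (even-symmetric) Gabor, uses a random array as a stand-in for D, and picks the Gabor size, wavelength and envelope width purely for illustration. The helper functions are hypothetical names introduced here.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """'Filtering': slide the kernel over the image without flipping it."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d_valid(image, kernel):
    """Convolution is filtering with the kernel rotated by 180 degrees."""
    return correlate2d_valid(image, np.rot90(kernel, 2))

def even_gabor(size=9, wavelength=4.0, sigma=2.0):
    """Cosine-phase Gabor; all parameter values are illustrative assumptions."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * x / wavelength)

rng = np.random.default_rng(0)
D = rng.standard_normal((32, 32))  # random stand-in for D (assumption)
g1 = even_gabor()                  # stand-in for the filter g1 (assumption)

# g1 is unchanged by a 180-degree rotation ...
symmetric = np.allclose(g1, np.rot90(g1, 2))
# ... so filtering D with g1 and convolving D with g1 give the same output.
same_result = np.allclose(correlate2d_valid(D, g1), convolve2d_valid(D, g1))
```

Under these assumptions both `symmetric` and `same_result` should evaluate to `True`. For a kernel that is not symmetric under a 180° rotation (a sine-phase Gabor, for instance), the two operations would generally differ, which is why the footnote ties the shortcut to the symmetry of g1.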
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.