In the study of comparative cognition and perception, disparities in the diverse approaches that researchers take in studying behavior sometimes obscure the interpretation of a particular empirical finding. We describe an approach to the study of comparative cognition and perception which focuses on explaining the ways in which different biological systems solve the computational challenges that are posed by their natural environments. Within this investigative framework, the task of detecting correspondence between a three-dimensional object and its two-dimensional photographic representation falls outside the mainstream of most research in animal visual cognition and is of limited value for divulging the principles or mechanisms that underlie the visual abilities of animals. More productive pursuits seek to elucidate the principles and mechanisms of object recognition and categorization, and to illuminate how they contribute to the animal’s survival in the visual world.
Different species possess very different eyes and nervous systems. With such biological diversity, just what do other species see and understand about the world around them? And, are there any general principles of vision to be extracted from so much biological diversity? These questions inspire researchers in the behavioral and neural sciences to study vision in nonhuman animals.
Whether we can give satisfactory answers to these questions depends importantly on the way in which we approach them. Sometimes, disparities in the diverse approaches that researchers take in studying behavior obscure the interpretation of a particular empirical finding. In such cases, it is appropriate to appreciate that science is a systematic enterprise with very specific goals: to answer a research question or to test a theoretical notion. Empirical results should be interpreted in the light of these goals.
Because of the importance of these issues, in this paper, we explicitly characterize our objectives for the study of comparative cognition and perception, and we carefully detail the approach that we have adopted in pursuing these objectives. Our main aim is to discover the principles or mechanisms that underlie the visual abilities of pigeons and to determine whether they are like those of humans and other primates. A truly comparative science of visual cognition would, of course, include a much larger sample of species; but, we have focused our efforts on comparing birds and primates primarily because these are the two groups of animals that have evolved the most sophisticated biological visual systems (Husband & Shimizu, 2001; Shimizu, 2009; Shimizu & Bowers, 1999).
Our approach concentrates on specific behavioral tasks and how these tasks are solved by different visual systems. Thus, our research always begins by choosing a perceptual or cognitive task that is believed to be advantageous for the animal to master in order to adapt to its environment.
Once the task of interest has been determined, it must be characterized in terms of the range of possible inputs that the animal might receive in its natural environment as well as the range of possible behavioral outputs that could contribute to the animal’s adaptation. This characterization also constitutes the first step toward providing a computational or rational account of behavior in the task of interest (Anderson, 1990; Marr, 1982), which has the desirable consequence of providing links between comparative cognition and the larger enterprise of cognitive science. Comparative cognition can be seen as the study of the possible ways in which different biological systems solve the various computational tasks that are posed by their natural environments.
These links with cognitive science provide important theoretical tools to help interpret the results of our research efforts. For example, different species might solve the same computational task using different algorithms or neural structures. Thus, behavior might appear to be similar, but it could arise from different perceptual or cognitive mechanisms. It is these processes, and not mere descriptions of behavior, that should be the focus of our comparative science.
Also, similarities across species in the neurobiological structures that support behavior suggest the involvement of similar perceptual or cognitive mechanisms. Nevertheless, the same algorithm can be implemented in different ways by different nervous systems. Therefore, the lack of homology in the structures underlying a particular behavior does not mean that the mechanisms are algorithmically different.
It is, of course, true that studying animals in their natural environments increases the ecological and external validity of the research; but, doing so would not represent a prudent or practical investigative choice if our aim is to explicate the mechanisms and principles of visual cognition. This goal generally requires increasing the internal validity of the research to the detriment of its external or ecological validity; that is, it requires devising a laboratory model of the environmental task of interest, in order to manipulate and control the range of factors of interest.
The choices that we make in creating such a laboratory task will determine how much external and internal validity we can assign to our experimental research. This process commonly includes choosing the stimuli to be shown to the animal, the specific responses that the animal will be required to perform, the reward function that will be used in the experiment, and the key test that the animal must pass in order to document mastery of the task of interest. Importantly, by focusing on this key test and by specifying the conditions leading an animal to pass it, researchers hope to elucidate the perceptual and cognitive mechanisms that support performance on the task of interest.
In vision research, choosing the experimental stimuli for a laboratory task is an important matter that deserves thoughtful consideration. Stimulus selection must balance seemingly irreconcilable demands. On the one hand, we want to use stimuli that can be readily manipulated; on the other hand, we want to use stimuli that are as similar as possible to those in the animal’s natural environment, in order to faithfully reflect the complexity of the natural task that is being modeled.
We have suggested that photographs of real objects (“natural images”) more closely resemble the stimuli that animals encounter in nature than the stimulus patterns that are commonly presented in artificial categorization research (Soto & Wasserman, 2010). This suggestion is neither an assertion about the way that animals represent the stimuli nor a presumption about animals’ understanding of photographs as representations of real objects, but an objectively measurable fact.
The set of all possible images is enormous. Only a small fraction of the images that can be captured by an organism’s retina are actually encountered in natural scenes. Take as an example the image displayed in Figure 1A, which was created by independently and randomly assigning an intensity value to each pixel. Regardless of how many times this exercise is repeated, the likelihood of obtaining an image even remotely similar to anything found in nature is vanishingly small, which underscores how small the set of natural images is compared to the set of all possible images (Olshausen & Field, 2000).
Biological visual systems have evolved through exposure to this small set of natural images, a fact that has led researchers to become interested in providing a formal statistical description of the properties of this image set (for reviews, see Geisler, 2008; Ruderman, 1994; Simoncelli, 2005). Consider, for example, the image that is shown in Figure 1B. This image was generated semi-randomly, by sampling intensity values using a model that takes into account a single property of the spatial structure in natural images (an amplitude spectrum falling with spatial frequency f approximately as 1/f; Burton & Moorhead, 1987; Ruderman, 1997). The result is much more “natural” than the image that is shown in Figure 1A, looking similar to “clouds.”
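The contrast between the two kinds of image described above can be reproduced in miniature. The following is a minimal sketch, not the procedure used to generate Figure 1: it builds one image from independent random pixels and another by summing sinusoidal gratings whose amplitude falls as 1/f with random phases, and then compares how strongly neighboring pixels are correlated. The image size, the function names, and the neighbor-correlation measure are all illustrative assumptions.

```python
import math
import random


def white_noise_image(n, rng):
    """Each pixel gets an independent random intensity."""
    return [[rng.random() for _ in range(n)] for _ in range(n)]


def one_over_f_image(n, rng):
    """Sum gratings with amplitude 1/f and a random phase for each frequency."""
    half = n // 2 - 1  # stay below the Nyquist frequency
    img = [[0.0] * n for _ in range(n)]
    for kx in range(0, half + 1):
        for ky in range(-half, half + 1):
            # One half of the frequency plane is enough for a real-valued
            # image; the zero frequency (mean level) is skipped.
            if kx == 0 and ky <= 0:
                continue
            amplitude = 1.0 / math.hypot(kx, ky)
            phase = rng.uniform(0.0, 2.0 * math.pi)
            for y in range(n):
                for x in range(n):
                    angle = 2.0 * math.pi * (kx * x + ky * y) / n + phase
                    img[y][x] += amplitude * math.cos(angle)
    return img


def neighbor_correlation(img):
    """Pearson correlation between each pixel and its right-hand neighbor
    (the last column wraps around to the first)."""
    n = len(img)
    a = [img[y][x] for y in range(n) for x in range(n)]
    b = [img[y][(x + 1) % n] for y in range(n) for x in range(n)]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
white = white_noise_image(16, rng)
pink = one_over_f_image(16, rng)
print("white noise:", round(neighbor_correlation(white), 2))
print("1/f noise:  ", round(neighbor_correlation(pink), 2))
```

In this sketch the white-noise image should show a neighbor correlation near zero, whereas the 1/f image should show a clearly positive one. That spatial redundancy is one simple example of the kind of regularity that statistical descriptions of natural images aim to capture.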
Smaller subsets of natural images, such as those depicting an object from a particular category, can also be described in terms of their statistics. In some cases, simple statistics like the amplitude spectrum of an image can provide a basis for scene and object categorization (Torralba & Oliva, 2003).
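As a toy illustration of how a simple spectral statistic could support categorization, the sketch below computes the amplitude spectrum of small synthetic images and assigns each image to a category from a single number: the log ratio of spectral energy along the two frequency axes. The striped stimuli, the statistic, and the decision rule are hypothetical choices made for brevity; they are not the method of Torralba and Oliva (2003).

```python
import cmath
import math
import random


def amplitude_spectrum(img):
    """Amplitude of the 2-D discrete Fourier transform, computed directly."""
    n = len(img)
    spectrum = [[0.0] * n for _ in range(n)]
    for ky in range(n):
        for kx in range(n):
            total = 0j
            for y in range(n):
                for x in range(n):
                    angle = -2.0 * math.pi * (kx * x + ky * y) / n
                    total += img[y][x] * cmath.exp(1j * angle)
            spectrum[ky][kx] = abs(total)
    return spectrum


def orientation_statistic(img):
    """Log ratio of spectral energy on the kx axis to energy on the ky axis."""
    spec = amplitude_spectrum(img)
    n = len(img)
    energy_x = sum(spec[0][kx] ** 2 for kx in range(1, n))
    energy_y = sum(spec[ky][0] ** 2 for ky in range(1, n))
    return math.log(energy_x / energy_y)


def striped_image(n, vertical, rng):
    """Hypothetical stimulus: noisy stripes varying along x (vertical) or y."""
    img = []
    for y in range(n):
        row = []
        for x in range(n):
            position = x if vertical else y
            value = 0.5 + 0.5 * math.sin(2.0 * math.pi * 2 * position / n)
            row.append(value + rng.gauss(0.0, 0.2))
        img.append(row)
    return img


def categorize(img):
    """Assign a category from the sign of the spectral statistic alone."""
    return "vertical" if orientation_statistic(img) > 0 else "horizontal"


rng = random.Random(1)  # fixed seed so the sketch is repeatable
for label in ("vertical", "horizontal"):
    image = striped_image(8, label == "vertical", rng)
    print(label, "->", categorize(image))
```

Real object and scene categories are far less cleanly separated than these stripes, so the sketch only shows the general idea: images from different classes can differ systematically in their amplitude spectra, and a classifier can exploit that difference without any explicit representation of the objects themselves.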
Artificial stimuli do not generally conserve the statistics of natural stimulation. Researchers use them for strategic purposes only, and they construct them without taking into account the known statistical properties of natural images. Artificial categories also fail to capture the similarities and differences among those subsets of images that can be grouped into classes (see Figure 3 in Torralba & Oliva, 2003).
These facts may explain why artificial stimuli are more difficult for pigeons to categorize than natural stimuli (Lea, Wills, & Ryan, 2006). A likely possibility is that biological visual systems have evolved to code and transmit as much information as possible about retinal images, under the constraints that are imposed by the statistics of natural scenes (Atick, 1992; Barlow, 2001; Field, 1994; Geisler, 2008; Olshausen & Field, 2000; Simoncelli, 2003; Simoncelli & Olshausen, 2001). That is, visual systems may have evolved explicit mechanisms to exploit the statistical structure of natural scenes, and natural images may engage these mechanisms to a greater extent than artificial stimuli do (for neurophysiological evidence, see Felsen & Dan, 2005).
To be sure, it would be a mistake to conflate the stimulation that is provided by a photograph of an object with the stimulation that is provided by light that comes from the actual object; important visual information is lost in a photograph. However, it is perfectly reasonable to posit that the stimulation that is provided by photographs captures at least some of the structure that is present in the stimulation that emanates from actual objects. Specifically, photographs preserve the similarity relations within and between subsets of natural stimuli that can be grouped into classes, making them a sound choice for the study of categorization.
Stimuli must always be chosen with careful regard for the specific research goals and hypotheses under consideration. If we want to increase the external validity of our research, then we might decide to replace natural images with real objects. If we want to perform precise parametric stimulus manipulations, then we might decide to replace natural images with artificial stimuli.
These considerations may have prompted researchers to present pigeons with computer renderings of artificial three-dimensional objects (as reviewed by Spetch & Friedman, 2006) and to present zebra finches with pure tones (Sturdy & Weisman, 2006; Weisman, Williams, Cohen, Njegovan, & Sturdy, 2006), stimuli that do not correspond to anything that would be encountered by these birds in their natural environments. What then can research with such unnatural stimuli tell us about animal perception and cognition under more natural circumstances? It gives us vital information about the basic mechanisms underlying behavior, mechanisms which do not change when we move animals from the natural environment to the laboratory.
In many cases, the animals’ own behavior tells us if our experimental stimuli have effectively captured the environmental task that is being modeled. This point returns us to the importance of the target test in a laboratory task. In visual categorization, the fact that animals exhibit generalization of a learned discrimination to novel photographs suggests that the experimental stimuli are effectively engaging the mechanisms that are necessary to extract invariant visual information from varied retinal images. Here, what the pigeons understand about the stimuli that they observe is less important than how they solve the experimental task that is being presented to them.
From our perspective, the ultimate research goals of comparative cognition are, first, to discover the perceptual and cognitive mechanisms underlying an animal’s ability to solve a computational task and, second, to compare the mechanisms that are deployed by different species in solving the same computational task, in the hopes of drawing conclusions about the evolution of cognition and perception.
This goal can only be attained through research involving the careful manipulation of stimuli and other task parameters. Such research is important because its results constrain the range of possible explanations of the animal’s behavior on a particular task.
When several possible processes might explain the available data, other sources of interpretative constraint come into play. These include preferring explanations based on (often simpler) processes that are known to underlie related phenomena, building a biologically plausible theory, and taking into account the structure of the task and the nature of the input that is provided to the animal (Regier, 2003). The first of these constraints has been used in animal behavior research more often than in other areas of cognitive science, especially as expressed by Morgan’s familiar Canon of Parsimony (Morgan, 1894). The other sources of constraint have been used less often in comparative cognition research, a situation that we hope will change in the future.
Once we propose a hypothesis as to how an animal solves an environmental task, we must test it experimentally and refine our knowledge of the mediating processes. Only when we have reached a clear understanding can we work toward the second goal of comparative cognition: comparing the mechanisms of perception and cognition in different species.
Here, we must answer two important questions: Do different species solve the same computational problem using a common cognitive or perceptual process? Or do they use different processes? If the latter is true, then how do these processes differ and how can these differences be explained in an evolutionary framework?
In answering these questions, we must pay special attention to the fact that evolution involves shared ancestry (Papini, 2002). Thus, some of the mechanisms of visual cognition might be shared among even distantly related species, giving rise to general principles of perception, learning, and cognition (Bitterman, 2000). Although it is true that birds and primates have widely divergent visual nervous systems, it is also true that these systems have developed from a common ancestor (Husband & Shimizu, 2001; Jarvis et al., 2005; Shimizu & Bowers, 1999) through exposure to visual stimulation probably involving more similarities than disparities. Also, differences in implementation could hide striking similarities in the computational algorithms that are used to solve visual tasks, a possibility that is supported by the fact that avian and primate visual systems, although not structurally homologous, have been proposed to be analogous (Shimizu & Bowers, 1999). Finally, more than just vision is involved in high-level visual cognition. Visual categorization and object recognition also involve the selection of those outputs from the visual system that should control behavior. Error-driven learning plays an important role in these functions (Soto & Wasserman, 2010) and a key neural substrate for the computation of prediction error lies in the basal ganglia (Maia, 2009; Niv, 2009; Schultz & Dickinson, 2000)—a structure that is homologous in all amniotes (Reiner, 2002).
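The notion of error-driven learning invoked above can be made concrete with a generic delta rule, in which associative weights change in proportion to a prediction error. The following is a minimal sketch under stated assumptions: the two-feature stimuli, the learning rate, and the function name are hypothetical, and the code is a textbook rule rather than the specific model of Soto and Wasserman (2010).

```python
def train_delta_rule(trials, n_features, learning_rate=0.1):
    """Update one weight per feature in proportion to the prediction error."""
    weights = [0.0] * n_features
    for features, outcome in trials:
        prediction = sum(w * f for w, f in zip(weights, features))
        prediction_error = outcome - prediction
        weights = [
            w + learning_rate * prediction_error * f
            for w, f in zip(weights, features)
        ]
    return weights


# Hypothetical stimuli: feature 0 is category-specific, feature 1 is shared
# by both categories.
category_a = ([1.0, 1.0], 1.0)  # reinforced
category_b = ([0.0, 1.0], 0.0)  # not reinforced
trials = [category_a, category_b] * 300  # alternate the two trial types

weights = train_delta_rule(trials, n_features=2)
print([round(w, 3) for w in weights])
```

In this sketch the weight of the category-specific feature approaches 1 while the weight of the shared feature is driven back toward 0. It illustrates how a prediction error can select which outputs of the visual system come to control behavior.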
Comparison among species is facilitated by focusing on the computational tasks that are faced by animals in their natural environments. This strategy provides benchmarks against which an organism’s performance can be compared. Once a task has been clearly defined, it is possible to determine the optimal strategy that should be followed by a rational agent to solve the task (Geisler, 1987, 2004; Geisler, Perry, & Ing, 2008) and then to compare it to the performance of the organisms under study.
Two different species might approach optimality to different degrees and through different mechanisms, which could prove enlightening about the way(s) in which their disparate environments have shaped their respective visual systems (Geisler & Diehl, 2002). The case in which two different species deviate from optimality in similar ways may be particularly informative.
For example, we have recently found that both humans and pigeons heavily rely on nonaccidental image properties to discriminate renderings of simple three-dimensional shapes, whereas ideal computer observers rely more on other image properties (Gibson, Lazareva, Gosselin, Schyns, & Wasserman, 2007). Whether or not the pigeons in this study appreciated that each picture depicted a three-dimensional object, as the humans surely did, the discovery of such a basic bias in biological visual systems is especially informative about the way in which they solve the task of object recognition.
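The benchmark logic described above, comparing an organism's performance with that of an optimal agent, can be sketched for the simplest textbook case: two equally likely categories that differ along one Gaussian-distributed dimension with equal variance. This is an illustrative assumption and not the ideal-observer analysis used in the cited studies; the function names are hypothetical and the accuracy values are placeholders, not data.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def ideal_accuracy(d_prime):
    """Best achievable proportion correct when the two category means are
    separated by d_prime standard deviations and the priors are equal."""
    return STANDARD_NORMAL.cdf(d_prime / 2.0)


def observed_d_prime(proportion_correct):
    """Sensitivity implied by an observer's accuracy, assuming no bias."""
    return 2.0 * STANDARD_NORMAL.inv_cdf(proportion_correct)


def efficiency(proportion_correct, ideal_d_prime):
    """Squared ratio of observed to ideal sensitivity (1.0 means optimal)."""
    return (observed_d_prime(proportion_correct) / ideal_d_prime) ** 2


task_d_prime = 2.0  # hypothetical separation between the two categories
print("ideal accuracy:", round(ideal_accuracy(task_d_prime), 3))

# Placeholder accuracies for two hypothetical observers (not real data).
for name, accuracy in (("observer A", 0.75), ("observer B", 0.80)):
    print(name, "efficiency:", round(efficiency(accuracy, task_d_prime), 2))
```

Efficiency summarizes only how far each observer falls from the optimum. Whether two species deviate from it in the same way, as in the example above, requires examining which image properties control their choices.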
Having characterized our approach to the study of comparative cognition and perception, we turn to a point that is often raised when we describe our research to various audiences: do we believe that pigeons retrieve three-dimensional (object) representations from two-dimensional (pictorial) images? The answer is, of course, that we do not know. We suspect that they do not, for the simple reason that they lack the necessary picture-object training. Humans without such training might also fail.
That answer is not satisfactory for some, who would insist on such picture-object correspondence for research in animal vision to be ecologically or biologically meaningful (Weisman & Spetch, this issue). We disagree. From our perspective, the notion that we must establish that an animal appreciates the correspondence between two-dimensional images and three-dimensional objects in order to validate our studies of animal vision is misguided. All that is needed to establish a sound science of animal vision is that the animal has effectively solved a computational task that promotes its survival in the visual world.
The computational task in an object recognition or categorization setting is completely different from the task of detecting the correspondence between an object and its pictorial representation. Importantly, both humans and animals in their natural environments must solve the task of recovering the identity or the category of an object from a two-dimensional image projected on the retina. But, only humans living in the highly artificial environments that they themselves have devised must solve the task of determining that a photograph corresponds to an actual object.
Taking this point into account, the issue of picture-object correspondence may have some interest in its own right, but it is of little broad relevance to research in object recognition and categorization; picture-object correspondence is simply not a task, or a laboratory model of a task, that animals must master in their natural environments. Once we have determined the similarities and disparities in the way that two species solve the picture-object correspondence problem, how are we to interpret the results in an evolutionary framework? What does this tell us about the evolution of cognition and perception? Very little, other than that pigeons may be able to solve a task of limited ecological utility (Spetch & Friedman, 2006b; Watanabe, 1993).
Comparative vision science seeks to understand how biological systems process and use visual information to facilitate environmental adaptation. In some important ways, pigeons and people see eye to eye. Such salient similarities suggest that common processes can be discovered in the way that animals and humans perceive the world. Still, some disparities have been discovered (e.g., Young, Peissig, Wasserman, & Biederman, 2001) and many more may be revealed as we continue our research; after all, the last common ancestor of pigeons and people lived at least 300 million years ago!
This article was supported by National Institute of Mental Health Grant MH47313 and by National Eye Institute Grant EY019781.