Through a series of complex transformations, the pixel-like input to the retina is converted into rich visual perceptions that constitute an integral part of visual recognition. Multiple visual problems arise due to damage or developmental abnormalities in the cortex of the brain. Here, we provide an overview of how visual information is processed along the ventral visual cortex in the human brain. We discuss how neurophysiological recordings in macaque monkeys and in humans can help us understand the computations performed by visual cortex.
More is known about the functions and deficits of the eye than about the rest of the visual system. However, most visual processing occurs in the cortex of the brain. What we see is a heavily transformed version of the input provided by photons impinging on the photoreceptors. By and large, that transformation happens in the vast mass of neocortex devoted to visual processing in the primate brain, the visual cortex (Figure 1). Combining systematic studies of the neurophysiology and neuroanatomy of the visual system in human and non-human animal models is providing new understanding of some of the computations performed in the visual cortex. We hope that this emerging knowledge will play a critical role in the future to help the large number of subjects with visual impairment due to cerebral cortex damage.
Even though many visual deficits that are presented clinically relate to the eyes, the retina mediates only the initial steps in processing visual information. There are abundant reports in the literature of cases that involve visual deficits that are caused by damage to cerebral cortex. The appearance of visual problems after cortical damage is not surprising, given that a large part of the cerebral cortex serves visual processing functions (1–5).
Among the many functions of vision, object recognition is arguably one of the most crucial. Visual object recognition is essential for most everyday tasks, including reading, navigation, and face identification. In a small fraction of a second (approximately 150 ms), we can recognize complex shapes and categorize objects and scenes (3, 6–10). Visual object recognition depends on combining two key properties: visual selectivity and the robustness of recognition to object transformations. Visual selectivity refers to the ability to discriminate among similar objects. A key to vision is that we can successfully recognize objects despite transformations in size, position, rotation, and illumination, among others (Figure 2). Given that an object can cast an infinite number of projections on the retina, our ability to recognize objects in a manner that is robust to object transformations is likely to have played a key role in the evolution of the visual system. This robustness to image transformations constitutes one of the major challenges for computer-based visual recognition, because most of the image pixels can change dramatically while the content remains the same.
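The point that pixels can change dramatically while the content stays the same can be illustrated with a toy example. The sketch below is only an illustration under simple assumptions: the image size, the bar stimulus, and the helper names `make_bar` and `pixel_overlap` are hypothetical and do not come from the studies cited here.

```python
def make_bar(size, col, width=2):
    """Return a size x size binary image containing a vertical bar
    that starts at column `col` and is `width` pixels wide."""
    return [[1 if col <= x < col + width else 0 for x in range(size)]
            for _ in range(size)]


def pixel_overlap(a, b):
    """Fraction of the 'on' pixels in image a that are also 'on' in image b."""
    on = sum(v for row in a for v in row)
    shared = sum(va * vb for ra, rb in zip(a, b) for va, vb in zip(ra, rb))
    return shared / on


original = make_bar(16, col=4)
shifted = make_bar(16, col=7)  # the same bar, moved 3 pixels to the right

print(pixel_overlap(original, original))  # 1.0: identical images
print(pixel_overlap(original, shifted))   # 0.0: same object, no shared pixels
```

A recognition scheme that compared raw pixels would treat the shifted bar as a completely different stimulus, which is why tolerance to transformations has to be built up by later processing stages.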
Here, we provide an overview of the processing steps along the ventral visual cortex in the human brain, focusing on what is known (and what is not known) about the mechanisms responsible for visual recognition. Because it is difficult to directly access human visual cortex through invasive methodologies, much more is known about the macaque monkey visual cortex than about the human visual cortex. Throughout this Review, we discuss data generated from the analysis of both the macaque monkey and the human visual system. Our article necessarily covers only a small fraction of the studies of visual cortex. The goal here is not to be comprehensive, as there is insufficient space for this; rather, we hope that this article will arouse readers’ curiosity, and we encourage them to peruse more literature.
We start by providing an overview of neural connectivity in the primate visual system based on anatomical studies in macaque monkeys (Figure 1; for more details, see refs. 1 and 2). Vision starts when photons impinge on photoreceptors in the retina (for an overview of the retinal circuitry, see refs. 2, 11, 12). Information flows through the retinal circuit from the photoreceptors to retinal ganglion cells. These are the cells that provide the retinal output. Retinal ganglion cells project through axons in the optic nerve to a part of the thalamus called the lateral geniculate nucleus (LGN) (2). Neurons in the LGN project to the primary visual cortex (V1), which is located in the occipital neocortex. Like other parts of neocortex, V1 is composed of six layers, many of which can in turn be subdivided into sublayers (13–16). Neurons in V1 are organized into columns, approximately perpendicular to the cortical surface, where neurons share similar functional properties (16, 17). V1 is the first stage of visual processing, where information from the right and left eyes is combined (Figure 1).
Perhaps due to its relative accessibility, much more is known about V1 than about any other part of the visual cortex. Yet there are a large number of different visual areas in the neocortex outside of V1; these are collectively known as the “extrastriate visual cortex.” From V1, two main pathways can be distinguished: a dorsal stream and a ventral stream (1, 16, 18, 19). The dorsal stream includes the middle temporal area (area MT, also known as area V5) and the medial superior temporal area (MST) and is predominantly involved in processing spatial information, including detecting object motion, position, and stereovision, i.e., depth perception; the dorsal stream is often referred to as the “where/how” pathway (20, 21). The ventral stream includes areas such as V2, V4, and the inferior temporal cortex (ITC) and is predominantly involved in discriminating colors and shapes; the ventral stream is often referred to as the “what” pathway (3, 22–24). To a first approximation, we can think about the visual cortex as two parallel and hierarchical cascades (Figure 1). However, this is only a coarse approximation. There are also numerous interconnections between these two pathways. The V1 to V2, to V4, to ITC pathway is often thought of as a feed-forward pathway (whereby information flows “up” the visual hierarchy). In addition to these forward connections, there are backward connections at every stage of the visual system; the only exception to this rule is the retina: there are no known back-projections from the LGN to the retina. These back-projections are likely to play a key role in top-down influences in visual perception. Thus, the ITC projects back to V4; V4 projects back to V2; and so on. There are also numerous internal recurrent connections within each area. Furthermore, “bypass connections” link, for example, V1 directly to V4 and the ITC directly back to V2.
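One way to keep these connection types apart is to write them down as a small directed graph. The sketch below lists only the ventral-stream connections named explicitly in this section; the data layout and the helper `inputs_to` are hypothetical conveniences, and the real connectivity is far richer than this.

```python
# Directed connections (source, target) named in the text, grouped by type.
FEEDFORWARD = [("retina", "LGN"), ("LGN", "V1"), ("V1", "V2"),
               ("V2", "V4"), ("V4", "ITC")]
FEEDBACK = [("ITC", "V4"), ("V4", "V2")]  # the text adds "and so on"
BYPASS = [("V1", "V4"), ("ITC", "V2")]    # connections that skip a stage


def inputs_to(area):
    """Return the sorted list of areas that project to `area`."""
    all_edges = FEEDFORWARD + FEEDBACK + BYPASS
    return sorted({src for src, dst in all_edges if dst == area})


print(inputs_to("V4"))      # ['ITC', 'V1', 'V2']
print(inputs_to("retina"))  # []: no known back-projections to the retina
```

Even in this reduced form, V4 receives feed-forward input from V2, feedback from the ITC, and a bypass projection from V1, which shows why a strictly serial cascade is only a first approximation.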
The last purely visual stages in the “what” and “where/how” streams project to multiple nonvisual areas of the cortex. Among them, two important targets of these projections are areas of the prefrontal cortex and the medial temporal lobe, including the hippocampus and surrounding structures. The prefrontal cortex is likely to play a key role in the moment-to-moment and task-dependent interpretation of visual input to orchestrate cognitive decisions (25, 26), while the hippocampus and surrounding structures play a key role in the consolidation of information into long-term memories (27, 28).
The description in this section provides a simplified overview of the neuroanatomy of the visual system. The system is highly complex (with billions of interconnected neurons), and at this stage we probably know only a small fraction of the connections in neocortex. There has been much excitement recently in the field of connectomics, in which researchers are aiming to develop high-throughput tools to map connections at high resolution within a small part of neocortex.
There are multiple clinical cases where the patient does not have any apparent deficit in eye function and yet manifests profound visual deficits. In several cases, these deficits can be ascribed to damage to specific areas within the neocortex. The study of these patients has provided key insights about visual recognition by providing a correlate between brain areas and functions. We provide an overview of some of these studies, highlighting what we have learned about the visual cortex. We start the discussion by describing the functional consequences of damage to early visual cortex. Lesion and neurophysiological studies in cats, and subsequently in monkeys, led the way to uncover a multiplicity of visual areas outside V1 (1, 20, 22, 29–31). We summarize below some of these visual deficits caused by damage to extrastriate visual areas (e.g., refs. 22, 32–35), dividing the discussion into damage to dorsal visual cortex and damage to ventral visual cortex. Although the cases involving damage to ventral visual cortex are particularly relevant for our discussion on visual recognition, we describe some studies involving dorsal stream damage because they vividly illustrate the relationship between location and damage and also because of the multiple interconnections between the dorsal and ventral processing streams. Many of these conditions are described in single case studies, and it is generally difficult to rigorously quantify their prevalence; some studies suggest that the combination of visual deficits due to cortical damage is even more prevalent than the combination of visual deficits that can be ascribed to the eyes (32).
The fundamental role of V1 in vision was appreciated early in the last century. Studies in cats, dogs, and monkeys that involved complete lesion of the occipital cortex left animals with little if any visual function (for a historical review, see ref. 31). After World War I, the study of subjects who suffered limited occipital lobe lesions due to bullet wounds revealed localized scotomas (i.e., islands of impaired visual acuity) in the part of the visual field contralateral to the wound (36, 37). The more severe condition hemianopia refers to loss of vision in one-half of the field of vision. Effects similar to those caused by bullet wounds in the occipital lobe are often encountered through vascular damage, tumors, and trauma.
A few case studies have discussed a phenomenon referred to as “blindsight” (38, 39). As the name suggests, some subjects with lesions in the occipital cortex are still capable of certain visual behavior within the scotoma. For example, subjects may be able to tell light from dark, or even to detect the direction of a moving object. Several possibilities were proposed to account for these observations, including anatomical routes that bypass V1 and the presence of small intact islands in V1 despite the lesions. Although the phenomenon has been clearly demonstrated, the range of visual behaviors preserved within the scotoma is rather limited, and the subjects’ capacity for fine visual discrimination within it is clearly lost. V1 is absolutely necessary for most visual functions. The profound visual deficits caused by V1 lesions in both animals and humans, combined with the challenges in examining visual behavior in animals, led several prominent investigators to argue that V1 is not only necessary but also sufficient for visual perception. In an interesting historical overview, Gross cites several striking demonstrations of this narrow-minded perception that we now know to be completely wrong (31).
One of the conditions in which the extrastriate visual areas of the dorsal stream are damaged is akinetopsia, which involves a specific inability to discriminate visual motion (40, 41). Zihl et al. described a patient who suffered a bilateral cerebral vascular lesion, outside V1, and who had a specific impairment in her ability to see objects in motion (41). The lesion included the posterior portion of the middle temporal gyrus, the retrorolandic area, and the temporoparietal and occipital white matter. It is quite tempting to ascribe the deficits in this subject to area MT, as several investigators have described neurons that are sensitive to the direction of motion in that region of the macaque monkey visual cortex (21). As in other human lesion studies, although the lesion in this subject does include the likely human homolog of area MT (40), making a precise connection is challenging due to the relatively large region of cortex impaired in this patient. It is interesting to note that electrical stimulation through large grid electrodes of area MT of the human brain can induce direction-specific motion blindness, i.e., blindness to motion in one direction while perception of motion in other directions is completely normal (42). Dutton described the opposite defect in a woman who could not see immobile objects following bioccipital infarctions (32). While the dorsal pathway was apparently intact, she was able to navigate independently only when she moved her head or eyes to see the environment. These studies suggest that the ability to discriminate specific aspects of the visual image (motion or lack thereof) can be correlated with the presence of intact cortex within relatively circumscribed parts of visual cortex.
Damage to the parietal cortex is often associated with different variations of neglect syndrome. While this is not necessarily a strictly visual deficit, it is often manifest as partial blindness. Individuals with this condition are unable to perceive part of the environment unless they pay direct attention to the neglected area. The most common form is left hemispatial neglect (i.e., an inability to perceive objects in the left side of the visual field, caused by posterior parietal lobe lesions in the contralateral right hemisphere) (35, 43–46). A syndrome that is often associated with hemineglect is Balint syndrome, which is characterized by the triad of simultanagnosia, optic ataxia, and ocular apraxia. Simultanagnosia refers to the inability to identify several objects in a scene at once. Patients perceive the world in fragments and are unable to form the whole picture. They typically show difficulty in recognizing a toy in a cluttered box or a person in a crowd, but they do not have a problem identifying the person or toy when it is isolated. Playing team sports or coping in busy environments (e.g., a supermarket or a busy playground) can also be challenging. Optic ataxia and ocular apraxia are conditions in which the visual input to the eyes cannot be used for visually guided movements and coordination of eye movements, respectively. Patients with Balint syndrome are also characterized by an impaired ability to estimate distances. In six patients with Balint syndrome showing similar visual deficits, bilateral lesions in the splenium and angular and supramarginal gyri were observed. The etiology of Balint syndrome is thought to be most frequently bilateral parietal damage, although lesion localization is difficult due to the small number of cases (32, 47, 48).
Several profound visual deficits have also been described as occurring as a result of damage along the ventral visual stream. These conditions are particularly relevant to understanding how ventral cortex contributes to visual recognition. One such condition, known as achromatopsia, involves a specific inability to perceive colors (30). This cortical color blindness is distinct and dissociable from retinal color blindness, which can be linked to missing or abnormal color pigments in the retinal cones. Subjects with achromatopsia typically have relatively large lesions in the inferior ventromedial sector of the occipital lobe, including the lingual and fusiform gyri. Depending on the extent of the lesion, subjects may also display a scotoma within their visual field and more specific visual recognition deficits (35). These subjects show poor performance in hue discrimination tasks but perform well in tasks that involve verbal instructions such as answering the question “What color are bananas?” emphasizing the visual nature of the deficit.
An interesting dissociation between the dorsal and ventral streams was reported in a patient with profound ventral cortex damage, particularly within the temporal lobe, by Goodale and Milner (34). This subject, who had suffered from carbon monoxide poisoning, had severe impairment in object shape recognition. Yet despite her inability to recognize objects, she showed a rather remarkable ability to interact with many objects. She showed an appropriate reach response toward objects whose shapes she could not describe. She also showed correct behavioral performance in visuomotor tasks. For example, in one of the experiments, the subject was asked to indicate the orientation of a slit and performed essentially at chance levels. Subsequently, she was asked to push an envelope through the slit. Surprisingly, the subject was almost as good as controls in this visually guided motor task despite the fact that she was unable to describe the orientation of the slit. Goodale and Milner proposed that the dorsal pathway is particularly engaged in “vision for action,” the immediate use of visual information to carry out specific visually guided behaviors. In contrast to this action mode, they proposed that awareness about an object requires activity in the ventral stream and the temporal lobe in particular.
One of the earliest demonstrations that V1 could not be the entire story in terms of visual recognition and that there must be extrastriate visual function was the study of Kluver-Bucy syndrome (49). After bilateral removal of the temporal lobe in macaque monkeys, a variety of behavioral effects, including loss of visual discrimination, but also increased tameness, hypersexuality, and altered eating habits, were initially reported (49). The work was subsequently refined by inducing smaller lesions that were able to dissociate different aspects of Kluver-Bucy syndrome. These circumscribed lesions showed that the ITC plays a critical role in visual recognition (50–53). Bilateral removal of the ITC in macaque monkeys leads to impairment in learning visual shapes, as well as deficits in retaining information about visual discriminations that were learned before the lesions. The deficits are long lasting, and the severity of the deficit is typically correlated with task difficulty. In other words, monkeys can still perform “easy” visual discrimination tasks after bilateral ITC lesions. The behavioral deficits are restricted to the visual domain and do not affect discrimination based on tactile, olfactory, or auditory inputs. The other behavioral abnormalities in Kluver-Bucy syndrome were not present when the lesions were restricted to the ITC. This observation underscores the importance of studying spatially restricted lesions (whenever possible) to properly interpret behavioral deficits and highlights the multiple confounding factors that arise when considering lesions that involve multiple brain locations. In the same way that Kluver-Bucy syndrome could be fractionated by more detailed and circumscribed lesions, it is quite likely that in future studies, more specific lesions within the ITC will further fractionate the object recognition deficits prevalent after bilateral ITC ablation.
The lesion studies suggest that the ITC plays an important and necessary role in fine object discrimination.
In humans, to our knowledge, there is no evidence for a global object recognition deficit as a result of brain damage. Temporal lobe lesions typically lead to different forms of agnosias, an inability to recognize certain types of objects (22, 54). The variety of behavioral manifestations in the cases of agnosia described in the literature can be to a first approximation linked to the variety of brain regions damaged. Because these are not controlled lesion studies, as in the animal cases, the exact circuits that are affected vary from one patient to another. Perhaps the most widely discussed form of agnosia (although itself a rare condition) is prosopagnosia, a deficit in recognizing faces even though vision in general and recognition of other objects is not impaired (55–60). This condition is often caused by a stroke that affects the right posterior artery, which leads to obstruction of the fusiform and lingual gyri. Other etiologies for prosopagnosia have also been described (e.g., refs. 61–63).
The exact interpretation of the nature of the deficit associated with prosopagnosia has been a matter of debate in the neuroscience community. While some researchers emphasize that it is a deficit in recognizing the special category of face stimuli (55, 58, 64), other investigators consider this to be a specific instance of a more general impairment in recognizing specific exemplars that look alike within a category (65). In addition to acquired prosopagnosia, a few studies have described cases of apparent congenital prosopagnosia. The causes of congenital prosopagnosia are not clear. In these cases, patients sometimes may not even be aware of their deficit, as they easily recognize people by their clothes, voices, gait, and other clues.
Other forms of agnosia have also been described. For example, some studies have reported subjects that fail to discriminate between animate and inanimate objects (66). Other reports have emphasized specific types of impairment for certain object categories (35, 67–71). Topographic agnosia refers to the inability to navigate (32). The extent to which these cases of agnosia are restricted to the visual modality remains unclear. To show that a form of agnosia is purely visual, it is necessary to test recognition using other cues, and this has not been done in all studies. Thus, it is conceivable that at least some of these cases involve a more generic failure related to information retrieval or semantic interpretation that is linked to but goes beyond pure visual recognition. There is also debate about how to define such semantic categories (e.g., consider a definition of “inanimate”). Visual form agnosias have been reported following disseminated lesions or medial temporal stroke (34, 35, 72).
Impaired reading ability is also a common deficit, particularly in children. There is a very large variety of conditions that manifest as reading difficulties, only a few of which directly relate to cortical visual impairment (32, 73). Hemianopia, alexia (typically a disconnection between the right occipital lobe and the language centers), simultanagnosia, and temporal lobe dyslexia have all been associated with poor reading.
In order to understand the function of the neuronal circuits that process visual information, it is necessary to examine the activity of individual neurons in response to visual stimuli. The main method of recording the activity of individual neurons in awake behaving animals or humans is to use microwire electrodes (microelectrodes) that are implanted extracellularly and monitor the extracellular voltage at millisecond temporal resolution and neuronal spatial resolution. Due to technical limitations and the invasive nature of microwire recordings, little is known about the activity of individual neurons in the human cortex. A recent review summarized the properties of neurons in the human temporal lobe upon presentation of visual stimuli (ref. 5; see also refs. 7, 74).
More work has been done examining the activity of single neurons in the cat and monkey brains compared with the human brain (5). Neurophysiological recordings in retinal ganglion cells and LGN neurons have revealed that neuronal responses are constrained to a relatively small patch of the visual field known as the receptive field (75) (Figure 3). The receptive field of a neuron is functionally defined as the region of the visual field that triggers activity above baseline levels. There is a considerable degree of variability in the size of the receptive field among different neurons. In particular, neurons with receptive fields in the fovea (the most sensitive part of the retina) have very small receptive fields, typically well below one degree of visual angle, whereas neurons with receptive fields in the periphery can have receptive field sizes up to one degree of visual angle or more. For orientation, one degree of visual angle approximately corresponds to the size of the thumb at arm’s length. In addition to the size, receptive fields typically have an important substructure that indicates how light patterns influence the response of the neuron. The receptive fields of retinal ganglion cells and LGN cells are typically characterized by a center-surround architecture. Two main types of cells have been described: “on-center” cells enhance their activity when stimulated in the center of their receptive fields and are inhibited by illumination of the surround, while “off-center” cells show the opposite responses (2).
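A common way to model a center-surround receptive field is a difference of Gaussians: a narrow excitatory center minus a broad inhibitory surround. The sketch below assumes that model for an on-center cell. The widths, the grid, the three stimuli, and the function names are illustrative choices, not values taken from the recordings discussed here.

```python
import math


def dog_weight(x, y, sigma_center=1.0, sigma_surround=3.0):
    """Difference-of-Gaussians weight at (x, y): narrow excitatory center
    minus broad inhibitory surround (assumed on-center model)."""
    r2 = x * x + y * y
    center = math.exp(-r2 / (2 * sigma_center ** 2)) / (2 * math.pi * sigma_center ** 2)
    surround = math.exp(-r2 / (2 * sigma_surround ** 2)) / (2 * math.pi * sigma_surround ** 2)
    return center - surround


def response(stimulus, half_width=10):
    """Weighted sum of the stimulus over a square grid centered on the cell.
    `stimulus` is a function mapping (x, y) to a luminance value."""
    total = 0.0
    for y in range(-half_width, half_width + 1):
        for x in range(-half_width, half_width + 1):
            total += dog_weight(x, y) * stimulus(x, y)
    return total


def small_spot(x, y):
    """Light confined to the receptive field center."""
    return 1.0 if x * x + y * y <= 1 else 0.0


def annulus(x, y):
    """Light confined to the surround."""
    return 1.0 if 9 <= x * x + y * y <= 36 else 0.0


def full_field(x, y):
    """Uniform illumination of center and surround together."""
    return 1.0


print("spot on center:", response(small_spot))  # positive: excitation
print("ring on surround:", response(annulus))   # negative: inhibition
print("uniform light:", response(full_field))   # close to zero
```

Under these assumptions, an off-center cell can be modeled by flipping the sign of `dog_weight`. The near-zero response to uniform light reflects the fact that center and surround roughly cancel, so such a cell signals local contrast rather than overall brightness.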
Hubel and Wiesel pioneered the investigation of V1 through microwire recordings and opened the way for examining visual cortex neurophysiology (76). Initially working in cats, and subsequently in monkeys, they discovered that neurons in V1 typically respond to bars shown at a specific orientation within their receptive fields (17, 76, 77). The receptive field size for V1 neurons is typically below one degree of visual angle. Their responses are tuned to the orientation of the bar; this type of response is similar to the type of operations used in computer vision to extract the edges of an image (78, 79) (Figure 4). This has led to the notion that one of the initial stages in processing visual information in the cortex is related to the extraction of edges in the image. Neurons in V1 are retinotopically organized, arranged within columns where neurons share similar response preferences. Hubel and Wiesel also proposed a model of how the orientation tuning in V1 could arise by combining multiple on-center LGN cells in a feed-forward fashion, whereby the centers of the LGN cells follow the orientation preferred by the V1 neuron. Neurons in V1 are also activated by motion, and some neurons are sensitive to color (80).
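The feed-forward proposal can be sketched in a few lines: a model V1 unit sums the rectified outputs of several on-center units whose receptive field centers lie along a vertical line, and its response is then measured for bars at different orientations. Everything specific here is an assumption made for illustration: the difference-of-Gaussians model of the LGN units, the rectification, the spacing of the centers, the bar width, and all function names.

```python
import math


def dog(dx, dy, sigma_c=1.0, sigma_s=2.0):
    """On-center difference-of-Gaussians weight at offset (dx, dy)."""
    r2 = dx * dx + dy * dy
    return (math.exp(-r2 / (2 * sigma_c ** 2)) / (2 * math.pi * sigma_c ** 2)
            - math.exp(-r2 / (2 * sigma_s ** 2)) / (2 * math.pi * sigma_s ** 2))


def bar(x, y, angle_deg, half_width=1.5):
    """Luminance of a bright bar through the origin; the angle is measured
    from the x-axis, so 90 degrees is a vertical bar."""
    t = math.radians(angle_deg)
    # Perpendicular distance from (x, y) to the line along (cos t, sin t).
    dist = abs(-x * math.sin(t) + y * math.cos(t))
    return 1.0 if dist <= half_width else 0.0


def lgn_response(cx, cy, angle_deg, extent=12):
    """Rectified response of one model LGN unit centered at (cx, cy)."""
    total = 0.0
    for y in range(-extent, extent + 1):
        for x in range(-extent, extent + 1):
            total += dog(x - cx, y - cy) * bar(x, y, angle_deg)
    return max(total, 0.0)  # firing rates cannot be negative


# Centers of the LGN inputs, aligned vertically (an illustrative layout).
LGN_CENTERS = [(0, k) for k in (-6, -3, 0, 3, 6)]


def simple_cell_response(angle_deg):
    """Model V1 unit: sum of its aligned LGN inputs."""
    return sum(lgn_response(cx, cy, angle_deg) for cx, cy in LGN_CENTERS)


tuning = {angle: simple_cell_response(angle) for angle in range(0, 180, 30)}
preferred = max(tuning, key=tuning.get)
for angle, rate in tuning.items():
    print(f"{angle:3d} deg: {rate:.3f}")
print("preferred orientation:", preferred, "deg")
```

With this layout the vertical bar covers the centers of all five inputs at once, while bars at other orientations excite at most the middle unit and fall on the inhibitory surrounds of the others, so the summed response peaks at 90 degrees. This is a cartoon of the idea, not a fit to any recorded neuron.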
Although V1 remains by far the most studied part of visual cortex, multiple investigators have examined the responses of neurons in extrastriate visual areas. Overall, less is known about the properties of extrastriate neurons. Partly, this can be ascribed to the difficulty in examining the vast and high-dimensional visual space with limited recording time. Until recently, most neurophysiology experiments involved inserting and removing a single electrode, holding a recording for time periods of hours at best. More recently, several investigators have started to explore the possibility of using chronic recording technologies that may enable recording for prolonged periods of time. Given the limits in recording time and the vast number of possible stimuli, investigators often use a battery of visual stimuli that are presumed to be of interest to the neurons under study. A couple of general observations can be drawn about the responses in extrastriate visual cortex. As we ascend through the visual hierarchy (along both the dorsal and the ventral streams), there is a progressive increase in receptive field size, in response latencies, in the complexity of feature preferences, and in the degree of tolerance to image transformations. Additionally, neurons in higher visual areas typically show stronger modulation by attention compared with neurons in earlier visual areas (81, 82). That is, the activity of a neuron in response to a stimulus is substantially enhanced if the subject is paying attention to the receptive field of the neuron compared with the response to the same stimulus if the subject is paying attention elsewhere.
In the dorsal stream, V1 projects to area MT. Neurons in area MT show tuning to the direction of motion within their receptive fields, that is, they respond preferentially to certain directions of motion (21). MT neurons are also sensitive to disparity between the two eyes (see ref. 21). Electrical stimulation of groups of MT neurons has been shown to bias the perception of monkeys toward the preferred motion direction of the neuron (83). Lesions in area MT lead to motion perception deficits in monkeys (21). Combining the neurophysiological, electrical stimulation, and lesion findings in the macaque, one can speculate that area MT may play an important role in the condition known as akinetopsia mentioned in the previous section.
Investigators have described neurons in the ventral stream that respond to a variety of complex shapes in the ITC (22–24, 31). ITC neurons often respond vigorously to color, orientation, texture, direction of movement, and shape. Neurons in the posterior ITC (PIT) show a coarse retinotopic organization and an almost complete representation of the contralateral visual field. The receptive field sizes are typically larger than the ones found in V4 neurons. The receptive fields in anterior parts of the ITC are often large, but there is a wide range of estimates of the sizes of receptive fields in the literature, ranging from two degrees (84) to several tens of degrees (85, 86). Most ITC receptive fields include the fovea, i.e., the central pit in the retina, which shows maximal visual acuity.
Investigators have often found strong responses in ITC neurons elicited by all sorts of different stimuli. For example, several investigators have shown that ITC neurons can be stimulated when faces, hands, and other biological stimuli are presented (87–91). Figure 5 shows an example of a neurophysiological recording from a neuron in the human medial temporal lobe, which receives input from the ITC. This neuron showed a remarkable degree of selectivity and tolerance to transformations (92). Other investigators have used parametric shape descriptors of abstract shapes (93–95). Logothetis and Pauls trained monkeys to recognize paperclips forming different 3D shapes and subsequently found neurons that were selective for specific 3D configurations of paperclips (96). While this wide range of responses may appear puzzling at first, it is perhaps not too surprising given a simple model whereby ITC neurons are tuned to complex shapes. Our interpretation of the wide number of stimuli that can drive ITC neurons is that these units are sensitive to complex shapes that can be found in multiple different types of 2D visual stimuli, including fractal patterns, faces, and paperclips. This wide range of responses also emphasizes that we still do not understand the key principles and tuning properties of ITC neurons. Tanaka and others have shown that there is clear topography in the ITC response map. Advancement of an electrode in a trajectory approximately tangential to cortex has revealed that neurons within a tangential penetration show similar visual preferences (86, 97–99). They argue for the presence of “columns” and higher-order structures such as “hypercolumns” in the organization of shape preferences in ITC.
While each neuron shows a preference for some shapes over others, the amount of information conveyed by individual neurons about overall shape is limited (85). Additionally, there seems to be a substantial amount of “noise” in the neuronal responses in any given image presentation. Hung et al. recorded (sequentially) from hundreds of neurons and used statistical classifiers to decode the activity of a population of neurons in individual presentations (100). They found that a relatively small group of ITC neurons (approximately 200) could support object categorization and identification quite accurately (up to approximately 90% and approximately 70%, respectively), with a very short latency (approximately 100 ms after stimulus onset). Furthermore, the population response could extrapolate across changes in object scale and position. Thus, even when each neuron conveys only noisy information about shape differences, populations of neurons can be quite powerful in discriminating among visual objects in individual trials.
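The logic of population decoding can be illustrated with simulated data. The sketch below is not the classifier or the data of Hung et al.; it uses a nearest-centroid rule on made-up neurons, each weakly tuned to one of two hypothetical categories ("face" and "object") and corrupted by trial-to-trial noise. The tuning strength, noise level, trial counts, and function names are all assumptions.

```python
import random


def simulate_trial(means, rng, noise_sd=1.0):
    """One noisy population response: mean rate per neuron plus Gaussian noise."""
    return [m + rng.gauss(0.0, noise_sd) for m in means]


def centroid(trials):
    """Average response of each neuron across a list of trials."""
    n = len(trials)
    return [sum(column) / n for column in zip(*trials)]


def sq_dist(a, b):
    """Squared Euclidean distance between two response vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def decode_accuracy(n_neurons, tuning, rng, n_train=50, n_test=200):
    """Train a nearest-centroid decoder on the first n_neurons and
    return the fraction of held-out single trials classified correctly."""
    means = {label: rates[:n_neurons] for label, rates in tuning.items()}
    centroids = {
        label: centroid([simulate_trial(m, rng) for _ in range(n_train)])
        for label, m in means.items()
    }
    correct = 0
    for _ in range(n_test):
        true_label = rng.choice(sorted(means))
        trial = simulate_trial(means[true_label], rng)
        guess = min(centroids, key=lambda lab: sq_dist(trial, centroids[lab]))
        correct += guess == true_label
    return correct / n_test


rng = random.Random(0)  # fixed seed so the simulation is repeatable
N_NEURONS = 200
# Each neuron's mean response to each category: small differences, large noise.
tuning = {
    "face": [rng.gauss(0.0, 0.3) for _ in range(N_NEURONS)],
    "object": [rng.gauss(0.0, 0.3) for _ in range(N_NEURONS)],
}

accuracy = {n: decode_accuracy(n, tuning, rng) for n in (1, 10, 200)}
for n, acc in accuracy.items():
    print(f"{n:3d} neurons: {acc:.2f} correct")
```

In this toy setting a single neuron performs only modestly above chance, whereas pooling 200 such neurons makes single-trial classification nearly perfect, because independent noise averages out across the population while the small tuning differences add up.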
As emphasized above, a key property of visual recognition is the capacity to recognize objects despite image transformations at the pixel level. Several studies have shown that ITC neurons show a substantial degree of tolerance to object transformations. ITC neurons show similar responses despite large changes in the size of the stimuli (96, 101, 102). Even if the absolute firing rates are affected by the stimulus size, the rank order preferences among different objects can be maintained despite stimulus size changes (101). ITC neurons also show more tolerance to changes in object position than neuronal units in earlier parts of the ventral stream (96, 101, 102) and can tolerate a certain degree of depth rotation (22). They even show tolerance to the particular cue used to define the shape (such as luminance, motion, or texture) (103).
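One way to make the notion of preserved rank-order preferences concrete is a rank correlation between the responses to the same objects at two stimulus sizes. The following sketch uses made-up firing rates and a plain Spearman coefficient (no tie handling); it is an illustration of the idea, not the analysis used in ref. 101.

```python
def ranks(values):
    """Rank of each value (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical firing rates (spikes/s) of one unit to five objects.
rates_small = [42.0, 30.0, 12.0, 5.0, 18.0]
# Same objects at a larger size: absolute rates change, ordering does not.
rates_large = [0.5 * r + 2.0 for r in rates_small]

print(spearman(rates_small, rates_large))  # 1.0
```

A coefficient near 1 indicates that the unit's object preferences are maintained across the size change even though the absolute firing rates differ.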
There has been increased interest recently in the possibility of examining the human brain at the neurophysiological level in cases where electrodes are implanted for clinical reasons. The most common of these cases involves the study of epileptic patients (5, 7, 74, 104). These recordings have been performed in subjects that have forms of epilepsy resistant to pharmacological treatments. Subjects are implanted with electrodes in order to map the region of the brain responsible for the seizures and to examine cortical function for potential surgical treatment of epilepsy. This approach provides a rare opportunity to examine neurophysiological activity in the human brain at high spatial and temporal resolution. Recordings in the human hippocampus, entorhinal cortex, amygdala, and parahippocampal gyrus have shown strong responses to visual stimuli and a substantial degree of tolerance to object transformations (5, 105) (Figure 5). In another study recording from these areas, investigators found neurons that showed responses to multiple objects within a semantically defined object category such as faces, animals, and cars (105). While these areas receive input from the highest stages of the visual hierarchy (Figure 1), they are multimodal areas and do not seem to be directly involved in the visual deficits that are typically described in subjects with neocortical damage. It is also possible to examine the responses of visual areas in the human neocortex by recording intracranial field potentials, also in the context of epilepsy surgery (7, 106). These recordings have revealed selectivity to complex shapes and tolerance to object transformations (7, 106).
Recording in patients with epilepsy has provided an extreme example of tolerance to object transformations. Some neurons in the human hippocampus, parahippocampal gyrus, entorhinal cortex, and amygdala show a remarkable degree of selectivity to individual persons or landmarks. For example, one neuron in one subject showed a selective response to images in which former president Bill Clinton was present (Figure 5). Remarkably, the images that elicited a response in this neuron were quite distinct in terms of their pixel content, ranging from a black-and-white drawing to color photographs with different poses and views (92). As discussed above for the macaque ITC neurons, we still do not have any understanding of the circuits and mechanisms that give rise to this type of selectivity or tolerance to object transformations.
How is the selectivity to oriented bars at the level of V1 transformed into the responses to complex shapes and tolerance to object transformations at the level of ITC? The current line of thought is that shape preferences and tolerance to object transformations build up gradually as we ascend the visual hierarchy (24, 85, 107, 108). Consistent with this notion, neurons in areas V2 and V4 typically show larger receptive fields and responses to more complex shapes than do neurons in V1 but smaller receptive fields and less complexity and tolerance to object transformations than neurons in the ITC (109–112). Much more work is needed to elucidate the computations that lead to shape recognition along the ventral visual pathway.
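A toy example can illustrate how selectivity and tolerance might be combined in successive stages. The sketch below is a deliberately simplified, one-dimensional illustration under stated assumptions: the template, the signals, and the two operations (template matching followed by a maximum over positions) are hypothetical choices, not a description of how any cortical area actually computes.

```python
def simple_layer(signal, template):
    """Template match at every position (dot product over a sliding window)."""
    width = len(template)
    return [sum(s * t for s, t in zip(signal[i:i + width], template))
            for i in range(len(signal) - width + 1)]

def pooled_unit(signal, template):
    """Maximum over positions: keeps shape selectivity, discards position."""
    return max(simple_layer(signal, template))

def place(pattern, position, length=12):
    """Embed `pattern` in an otherwise empty signal at `position`."""
    signal = [0] * length
    signal[position:position + len(pattern)] = pattern
    return signal

template = [1, -1, 1]  # hypothetical preferred "shape"

print(pooled_unit(place([1, -1, 1], 2), template))  # 3: preferred shape
print(pooled_unit(place([1, -1, 1], 7), template))  # 3: same shape, shifted
print(pooled_unit(place([1, 1, 1], 2), template))   # 1: different shape
```

Each position-specific match is selective but not tolerant; taking the maximum across positions yields a unit that responds equally to the preferred pattern wherever it appears while still responding less to a different pattern. Stacking stages of this kind is one way to picture a gradual increase in both shape complexity and tolerance.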
As noted above, there are a wide variety of visual deficits that arise as a result of damage to different parts of the visual cortex. It is rather tempting to link the visual deficits described above (in “Visual deficits due to cerebral cortex lesions”) and the neurophysiological findings (described in “Neurophysiological studies of the visual cortex”). However, this requires solving several challenges. As mentioned earlier, most human lesions are not well circumscribed and involve multiple brain areas as well as neurons connecting different areas. It is therefore challenging to establish a one-to-one map between visual deficits and anatomical locations (it is also not clear whether there is a one-to-one map). The neuroanatomical connectivity in the human visual cortex is also much less well established than in the macaque (and even in the macaque, we only know some of the coarse connectivity patterns). Additionally, most of the neurophysiological data come from primate studies, and the homology of visual areas between humans and monkeys is not yet clearly established.
Some of these visual challenges may be subtle and hard to diagnose while others are dramatic and obvious. Dutton has provided several detailed strategies for the diagnosis of visual impairment due to cortical factors (32, 113). Although it might seem easier to treat visual deficits originating from the eye than from the brain, physicians can still help in cases of cerebral visual impairment. A useful and practically oriented review of therapeutic options can be found in ref. 113. Orientation, object recognition, and visually guided movements can be partly trained. The remaining partial function can be supported by several strategies, including enlarging text, reducing distractions, usage of good colors and contrasts, usage of specific visual cues, and other problem-specific solutions. For example, attentional deficits such as hemineglect can be tackled by clear markers such as colored stickers in the affected hemifield or approaches such as turning the plate 180° during every meal. Visual object recognition problems can be ameliorated by training recognition with nonvisual cues including labels and text. Even though most often there is no causal treatment of cerebral visual impairment, there is a lot to be done to improve the quality of life of the affected individuals. More research and effort is needed to translate the knowledge gained from the last four decades of studying visual cortex into methods to help patients who show visual impairment due to cortical damage or malfunction.
There are a large number of studies that use human neuroimaging, rather than single-unit recordings, to examine indirect correlates of human brain function. This is a promising research tool given the potential to examine the human brain in a noninvasive manner. As it stands, the spatial and temporal resolution is very poor compared with that of neurophysiological recordings. The spatial resolution is approximately 1 mm³, which probably encompasses at least tens of thousands of neurons as well as other cells (compare with neurophysiological recordings that examine the activity of individual neurons). The temporal resolution is on the order of seconds, which is about 1,000 times slower than the speed of the dynamical signals that neurons use to communicate with each other in cortex. Part of the problem arises because current imaging techniques only indirectly measure neural activity. In one of the most common techniques, magnetic resonance imaging signals are used to infer blood oxygenation levels; blood oxygenation levels are in turn indirectly related to neural activity (for a detailed review of functional imaging, see refs. 114–117). Several studies have compared functional imaging measurements against neurophysiological recordings. Some of these studies have revealed a rather striking correlation between the two types of measurements (e.g., refs. 118, 119). However, other studies have suggested that the relationship between these two signals is quite complex and have described several situations where functional imaging signals can appear in the absence of concomitant neural activity and vice versa (e.g., refs. 115, 117, 120–122). Therefore, caution must be taken in interpreting human neuroimaging studies. For examples of recent neuroimaging work on human visual cortex, see refs. 18, 58, 123–131.
In addition to functional studies, there is increasing enthusiasm regarding the usage of imaging techniques to examine other aspects of brain structure in the human brain. In particular, diffusion tensor imaging holds promise as a key tool to investigate connectivity in the human brain.
Several investigators are developing computational models of the ventral visual cortex. These computational models may play an important role in helping us quantitatively understand the complex and dynamic processes involved in visual recognition (e.g., refs. 3, 108, 132–134). In the future, one can imagine that we might be able to induce “lesions” in the models to examine the type of visual deficits caused by these lesions and to compare their output against the visual deficits that patients experience. While we still have a long way to go, combined with the neurophysiological, neurological, and imaging studies described above, this is a very promising line of investigation. In addition to these computational studies, much progress has also been made through the quantitative study of visual recognition in psychophysics (e.g., see ref. 135 for an overview).
Combining the knowledge and techniques described in this article, some investigators are trying to develop prosthetic devices to help the visually impaired and even blind people. The field of prosthetic devices has been particularly active in the context of paralyzed patients and motor deficits. In the motor domain, the idea is to “read out” cortical signals, decode them in real time, and use them to actuate a robotic device. In the visual domain, one idea is to use digital devices to capture light information and then “write in” this information to the subject’s brain. Investigators are thinking about neural prosthetic approaches at multiple different levels in the visual hierarchy, from the retina (retinal prosthesis) to primary visual cortex all the way to higher visual cortex (cortical prosthesis). Due to space constraints, we cannot describe these approaches in detail, but we would like to emphasize that the field of visual prosthetics is possible thanks to a combination of talent and dedication from researchers in multiple fields, including those involved in neurophysiological recordings, psychophysics, cognitive science, neurology, neurosurgery, neuroimaging, engineering, applied mathematics, and machine learning. This is a highly active and interdisciplinary area of research that may one day transform the lives of people with severe visual impairment that cannot be treated by other means.
We acknowledge support from the Massachusetts Lions Foundation, the NIH (1R21EY019710, 1DP2OD006461 to G. Kreiman), and the National Science Foundation (0954570 to G. Kreiman). We also acknowledge support from the Hamburger Foundation (J. Blumberg) and the German National Merit Foundation (J. Blumberg).
Conflict of interest: The authors have declared that no conflict of interest exists.
Citation for this article: J Clin Invest. 2010;120(9):3054–3063. doi:10.1172/JCI42161.