The vestibular system is vital for motor control and spatial self-motion perception. Afferents from the otolith organs and the semicircular canals converge with optokinetic, somatosensory and motor-related signals in the vestibular nuclei, which are reciprocally interconnected with the vestibulocerebellar cortex and deep cerebellar nuclei. Here, we review the properties of the many cell types in the vestibular nuclei, as well as some fundamental computations implemented within this brainstem–cerebellar circuitry. These include the sensorimotor transformations for reflex generation, the neural computations for inertial motion estimation, the distinction between active and passive head movements, as well as the integration of vestibular and proprioceptive information for body motion estimation. A common theme in the solution to such computational problems is the concept of internal models and their neural implementation. Recent studies have provided new insights into important organizational principles that closely resemble those proposed for other sensorimotor systems, where their neural basis has often been more difficult to identify. As such, the vestibular system provides an excellent model to explore common neural processing strategies relevant both for reflexive and for goal-directed, voluntary movement as well as perception.
Vision, hearing, smell, taste, and touch are the five senses we commonly recognize as providing us with information about our environment and our interaction with it. A less well recognized but exquisitely sensitive set of sensors, the vestibular organs in the inner ear, provide us with a vital sixth sense: the sense of our motion and orientation in space. In particular, three roughly orthogonal sets of semicircular canals measure how the head rotates in three dimensions (3D). They are complemented by two otolith organs (the utricle and saccule) that measure linear accelerations, including how the head translates and how it is positioned relative to gravity. Even creatures with relatively simple nervous systems (e.g., jellyfish, crustaceans) have basic graviceptors that provide information about orientation with respect to gravity that is critical for survival (Sandeman and Okajima 1972; Singla 1975).
The vestibular system plays a vital role in everyday life, contributing to gaze stabilization (Barnes 1993; Raphan and Cohen 2002; Angelaki 2004; Cullen and Roy 2004), balance and postural control (Inglis et al. 1995; Allum and Honegger 1998; Buchanan and Horak 2001; Horak et al. 2001; Cathers et al. 2005; Maurer et al. 2006; Stapley et al. 2006; Macpherson et al. 2007), spatial navigation (Andersen 1997; Stackman and Taube 1997; Page and Duffy 2003; Bremmer 2005; Day and Fitzpatrick 2005; Gu et al. 2007; Taube 2007), spatial perception and memory (Berthoz et al. 1995; Israel et al. 1997; Van Beuzekom et al. 2001; Stackman et al. 2002; Brandt et al. 2005; Klier et al. 2005; Li and Angelaki 2005; Klier and Angelaki 2008; Vingerhoets et al. 2008), voluntary movement planning (DiZio and Lackner 2001; Mars et al. 2003; Bresciani et al. 2005; Bockisch and Haslwanter 2007; Raptis et al. 2007), and autonomic function (Yates 1992; Balaban and Porter 1998; Yates and Bronstein 2005). However, unlike most other senses, we are typically not consciously aware of its contribution until uncertainty in interpreting vestibular signals or conflicts with other sensory cues give rise to illusions or motion sickness. Its essential contribution is felt most acutely when vestibular system function is compromised (e.g., due to vestibular hair cell loss, vestibular neuritis, central and peripheral lesions etc.) resulting in problems of disorientation, loss of balance and postural control, loss of visual acuity, and perceptual distortions (Curthoys et al. 1991; Halmagyi et al. 1991; Curthoys and Halmagyi 1995; Karnath and Dieterich 2006; Dieterich 2007). Being phylogenetically old, the vestibular system can also provide unique insights into the foundations upon which the computational strategies used widely by the brain are organized.
Much of the processing of vestibular signals occurs in the brainstem and cerebellum, where there is already strong multimodal convergence with optokinetic and proprioceptive information (Waespe and Henn 1977, 1981; Boyle and Pompeiano 1980, 1981; Boyle et al. 1985; Wilson et al. 1990; Buttner et al. 1991; Barmack and Shojaku 1995; McCrea et al. 1999; Wylie and Frost 1999; Gdowski and McCrea 2000; Barmack 2003). In addition, many of the secondary neurons receiving direct primary afferent inputs are also premotor cells that project directly to extraocular motoneurons (McCrea et al. 1980, 1987; Scudder and Fuchs 1992). Thus, beyond its obvious functional importance, the vestibular system also represents an ideal model system for studying broad principles of sensory processing ranging from multisensory integration for spatial motion estimation to the sensorimotor transformations required for motor control. While most recent reviews have concentrated on specific aspects of vestibular system function (e.g., gaze stabilization: Barnes 1993; Raphan and Cohen 2002; Angelaki 2004; Cullen and Roy 2004; Angelaki and Hess 2005; motor learning: du Lac et al. 1995; Raymond et al. 1996; Blazquez et al. 2004; Boyden et al. 2004; postural control and locomotion: Bent et al. 2005; Deliagina et al. 2008; spatial memory and visuo-spatial updating: Klier and Angelaki 2008; Smith et al. 2009; cortical multisensory integration: Andersen 1997; Fukushima 1997; Angelaki et al. 2009), the goal here is to focus on early (i.e., subcortical) vestibular processing (see also Angelaki and Cullen 2008) and how it has contributed to our understanding of neural computation.
Historically, computational approaches have always been an integral part of studies of the vestibular system. This trend was initiated early by pioneers who used control systems theory to establish the basic sensorimotor transformations by which vestibular signals are converted into the motor commands that drive compensatory eye movements (i.e., vestibulo-ocular reflexes, VOR) during head motion. The success of this approach was facilitated by at least four important factors: (1) vestibular stimuli can be precisely controlled, thus ensuring that the “input” can be easily quantified, and the same is true for the “output”, as eye movements can be very accurately measured (Robinson 1963); (2) all processing stages in the VOR, from primary afferents to extraocular motor neurons, take place within interconnected brainstem and cerebellar regions that are easily accessible for electrode recordings; (3) the eye represents a very simple motor plant both because it is a single joint system and because it carries a negligible load; (4) to a first approximation, the simplest processing in the VOR pathways is linear. As a result, it was possible not only to theoretically predict exactly which transformations need to take place to convert vestibular signals into an appropriate motor output, but also to identify experimentally the neural correlates for these transformations.
Such studies continue to provide new insights for sensorimotor control. However, more recently, research in the field has increasingly focused on more complex and often nonlinear or “context-dependent” computations. As reviewed below, the vestibular system provides an excellent model for identifying the neural correlates of contemporary principles of motor control (e.g., internal models, reafference versus exafference, and reference frame transformations) both because of its relative simplicity (e.g., as compared to the circuits for limb control) and because it is possible to precisely control and measure both the inputs to the system and its neural or behavioral outputs. Most recent work in the vestibular system has focused on the following research questions: (1) the sensorimotor transformations for reflex generation; (2) the neural computations for inertial motion estimation; (3) the computations to distinguish active from passive movements; (4) the integration of vestibular and proprioceptive signals for body motion estimation. Therefore, in this review we will first discuss experimental and theoretical evidence for internal models in the VOR and their neural correlates in vestibular nuclei (VN) neurons that are sensitive to eye movements. Then we will shift our attention to another group of VN neurons without sensitivity to eye movements and summarize their role in the computation of inertial motion and in the distinction between actively versus passively generated head movements. The last topic we will review is how vestibular signals can be used to estimate not only head but also body motion. Note that, throughout this review, we concentrate on the subcortical processing of vestibular information (for reviews about cortical processing, see Fukushima 1997; Guldin and Grusser 1998; Angelaki et al. 2009).
A common theme throughout is the concept of internal models. In recent years, the term “internal model” has been used in a variety of contexts to refer to anything from an explicit neural representation of the dynamic properties of a motor plant or sensor (Shidara et al. 1993; Shadmehr and Mussa-Ivaldi 1994; Wolpert and Miall 1996; Kawato 1999; Green et al. 2007) to the representation of a solution to a specific equation that needs to be solved (Merfeld et al. 1999; Angelaki et al. 2004; Zago et al. 2004, 2009; Green et al. 2005). Here, we use the term in its broadest sense to refer to any neural representation of a specific computation that needs to be performed. The internal model concept is emphasized here because, as reviewed below, this is perhaps the only sensorimotor system for which neural correlates of internal models have been explicitly identified.
An essential role of the vestibular system is to ensure stable viewing of the world by eliciting short-latency reflexive eye movements to compensate for head movement, known as the vestibulo-ocular reflexes (VORs). Early studies of the vestibulo-ocular pathways have provided the groundwork for understanding basic sensorimotor transformations and have elucidated principles that have broad application to all types of motor control. In particular, in any motor system, the brain must compute motor commands from signals that provide a representation of desired action. The required computations often rely on internal representations of the dynamic properties of the motor plant to be controlled. Such “internal models”, which now constitute a general theoretical concept in motor control, may be used either to transform desired action into appropriate motor commands (“inverse model”) or conversely, to predict the consequences of motor commands on behavior (“forward model”) (Shidara et al. 1993; Shadmehr and Mussa-Ivaldi 1994; Wolpert and Miall 1996; Kawato 1999). Some of the earliest and most parsimonious evidence for such models and their neural implementation comes from studies of sensorimotor processing in the vestibulo-ocular pathways. Next we describe the sensorimotor transformations in the rotational vestibulo-ocular reflex (RVOR) and the concepts that have emerged thus far.
The need for an inverse dynamic model in the RVOR (Fig. 1a) was first recognized by David Robinson and his colleagues in the 1970s (Skavenski and Robinson 1973). Their hypothesis, which has remained influential in motor control, was based on three basic observations: (1) afferents from the semicircular canals encode head velocity over a broad frequency range (> ~ 0.03 Hz); (2) having little inertia, the mechanics of the eyeball are dominated by visco-elastic forces such that the relationship between eye position and motoneuron firing rates can be approximated by a first-order low-pass filter with a bandwidth of ~ 0.5–0.6 Hz (Robinson 1964, 1965, 1970). As a result, if the semicircular canal afferent signals were simply projected in a feed-forward fashion directly to extraocular motoneurons, eye velocity would be proportional to head velocity only for frequencies above ~ 0.5 Hz (Fig. 1b; blue curve labeled “no inverse model”). Yet, (3) it has been shown experimentally that the compensatory RVOR bandwidth is broad, extending to very low frequencies (Fig. 1b; red curve labeled “with inverse model”; Buettner et al. 1981; Mizukoshi et al. 1983; Paige and Sargent 1991; Angelaki et al. 1996). The difference between the red and blue curves in Fig. 1b implies an additional processing stage (“inverse model” in Fig. 1a), whereby premotor circuits compensate for the dynamics of the eyeball by “filtering” canal afferent signals with an inverse dynamic model of the eye plant.
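As a rough numerical illustration of this argument, the following Python sketch compares the eye-velocity gain of the reflex with and without an inverse model. It assumes a first-order eye plant whose time constant is derived from the ~0.5 Hz bandwidth quoted above, an idealized canal signal equal to head velocity, and a feed-forward weight equal to the plant time constant (chosen so that the high-frequency gain is 1). The function names and parameter values are illustrative assumptions and are not taken from the cited studies.

```python
import math

# Assumed plant time constant (s), derived from the ~0.5 Hz bandwidth in the text.
EYE_PLANT_TC = 1.0 / (2.0 * math.pi * 0.5)


def plant(s, tc=EYE_PLANT_TC):
    """First-order low-pass eye plant: motoneuron drive -> eye position."""
    return 1.0 / (tc * s + 1.0)


def inverse_model(s, tc=EYE_PLANT_TC):
    """Velocity signal -> motoneuron drive that cancels the plant: (tc*s + 1)/s."""
    return (tc * s + 1.0) / s


def rvor_gain(freq_hz, with_inverse_model):
    """Magnitude of eye velocity per unit head velocity at one frequency."""
    s = 2j * math.pi * freq_hz
    # Without the inverse model, the canal signal is only scaled (assumed weight = tc).
    premotor = inverse_model(s) if with_inverse_model else EYE_PLANT_TC
    eye_position = premotor * plant(s)
    eye_velocity = s * eye_position
    return abs(eye_velocity)


if __name__ == "__main__":
    for f in (0.01, 0.05, 0.5, 5.0):
        print(f, round(rvor_gain(f, False), 3), round(rvor_gain(f, True), 3))
```

Under these assumptions the feed-forward case behaves as a high-pass filter with a corner near 0.5 Hz, whereas the inverse model flattens the gain across frequencies, which is the qualitative difference between the blue and red curves described above.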
Robinson and colleagues also pioneered the first plausible implementation of such an inverse dynamic model that became well-known as the “parallel-pathway” model (Fig. 1c; Skavenski and Robinson 1973; Robinson 1981): They proposed that velocity signals were conveyed to motoneurons (MN) both directly and indirectly via a “neural integrator” (∫ in Fig. 1c). Together the two pathways compensate for the viscoelastic properties of the eyeball and are thought to comprise an inverse dynamic model of a simplified (first-order) eye plant. In an alternative representation the integration was implemented in a distributed fashion via positive feedback loops through a forward model of the eye plant (Fig. 1d; Galiana and Outerbridge 1984; Galiana 1991). These two descriptions produce equivalent sensorimotor transformations and the same VOR response characteristics (i.e., same blue to red curve transformation in Fig. 1b; see Green et al. 2007, supplemental material, for details).
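The parallel-pathway idea can be sketched as a short time-domain simulation. The code below is a minimal illustration, assuming a first-order plant with unit static gain, a perfect (non-leaky) integrator, a direct-pathway weight equal to the plant time constant, and simple Euler integration; none of these values or names come from the cited models.

```python
import math


def simulate_rvor(freq_hz, use_integrator, plant_tc=0.32, dt=0.001, cycles=5):
    """Return peak eye velocity relative to a unit-amplitude head velocity.

    The peak is measured over the final stimulus cycle, after transients decay.
    plant_tc (s) is an assumed first-order eye plant time constant.
    """
    steps_per_cycle = round(1.0 / (freq_hz * dt))
    n_steps = cycles * steps_per_cycle
    eye = 0.0         # eye position (plant state)
    integrated = 0.0  # output of the neural integrator
    peak = 0.0
    for n in range(n_steps):
        head_vel = math.sin(2.0 * math.pi * freq_hz * n * dt)
        drive = -head_vel  # compensatory velocity signal from the canals
        # Direct pathway (weighted velocity) plus optional integrator pathway.
        command = plant_tc * drive + (integrated if use_integrator else 0.0)
        # First-order plant: plant_tc * d(eye)/dt + eye = command.
        eye_vel = (command - eye) / plant_tc
        eye += dt * eye_vel
        integrated += dt * drive
        if n >= n_steps - steps_per_cycle:
            peak = max(peak, abs(eye_vel))
    return peak


if __name__ == "__main__":
    for f in (0.05, 0.5, 5.0):
        print(f, round(simulate_rvor(f, False), 3), round(simulate_rvor(f, True), 3))
```

With both pathways active, the simulated eye velocity matches head velocity in amplitude even at low frequencies; with the direct pathway alone, low-frequency responses are strongly attenuated.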
Both implementations make an important prediction regarding the properties of the neurons driving the RVOR: there should exist premotor neurons whose firing rates are closely correlated with eye position, reflecting the output of the neural integrator in Fig. 1c or the output of the forward model in Fig. 1d. Further expanded and more complex models consistent with this notion also predicted the existence of neurons that encode various combinations of head velocity and eye movement-related signals (Cannon et al. 1983; Galiana and Outerbridge 1984; Cannon and Robinson 1985; Arnold and Robinson 1991; Cova and Galiana 1996; Green and Galiana 1996; Hazel et al. 2002). As described next, recordings from brainstem and cerebellar neurons have provided solid experimental evidence consistent with these predictions.
The first neurophysiological support for the existence of an inverse eye plant model that includes a neural integrator came from the discovery of “burst-tonic” and “tonic” neurons in the prepositus hypoglossi (PH) and adjacent medial vestibular nuclei (VN) (collectively referred to here as PH–BT cells). As shown in Fig. 2a, PH–BT neurons have firing rates that correlate closely with eye position during static fixation and low-frequency slow eye movements (Baker and Berthoz 1975; Lopez-Barneo et al. 1982; Escudero et al. 1992, 1996; McFarland and Fuchs 1992) and they do not respond to head movements in the absence of eye movement during fixation of a target that moves with the head (i.e., during RVOR suppression; McFarland and Fuchs 1992; Cullen et al. 1993; Green et al. 2007). Consequently, PH–BT neurons were thought to encode the eye position component of the inverse dynamic model (e.g., E* in Fig. 1c, d).
Other populations of VN neurons that became known as “position-vestibular-pause” (PVP, Fig. 2b) and “eye-head” (EH, Fig. 2c) cells were shown to carry different combinations of head velocity and eye position (and/or eye velocity) signals (King et al. 1976; Lisberger and Miles 1980; Chubb et al. 1984; Tomlinson and Robinson 1984; Scudder and Fuchs 1992; Cullen et al. 1993; Cullen and McCrea 1993; Lisberger et al. 1994c). Many PVP and EH neurons receive monosynaptic canal inputs and make direct projections to extraocular motoneurons, thus being identified as putative interneurons in the shortest-latency VOR pathways (McCrea et al. 1980, 1987; Scudder and Fuchs 1992). Depending on whether PVP and EH cells prefer contralaterally or ipsilaterally directed eye movements, they can be further subdivided into “eye-contra” and “eye-ipsi” cell types. Notably, only the eye-contra (also widely known as “type I”) PVP and EH subgroups appear to make the bulk of direct projections to motoneurons and are thus considered the main premotor VN neurons in the RVOR pathways (McCrea et al. 1980, 1987; Scudder and Fuchs 1992).
The PVP and EH cell types can be distinguished by the way they combine head and eye movement signals. PVP cells increase their activities for head rotation in one direction during RVOR suppression (i.e., stabilization of a target that moves with the head so that the eyes do not move) and for eye rotation in the opposite direction during head-stationary smooth target tracking (smooth pursuit); as a result, response modulation is largest during stable-gaze RVOR when the eyes move in the opposite direction to head motion as animals fixate a world-fixed target (Fig. 2b). In contrast, the preferences of EH cells for head rotation during RVOR suppression and for eye rotation during smooth pursuit are in the same direction, such that response modulation is reduced during stable-gaze RVOR (Fig. 2c; Scudder and Fuchs 1992). As EH cells typically exhibit larger responses during pursuit as compared to RVOR suppression, their modulation during stable gaze RVOR is often dominated by eye-movement-related activity (Scudder and Fuchs 1992; Cullen et al. 1993; Lisberger et al. 1994c). As a result, some EH cells with large pursuit responses show an apparent reversal in preferred direction during RVOR suppression (when the eyes do not move) as compared to RVOR stable gaze conditions (when compensatory eye movements are elicited; e.g., Fig. 2c). This “oppositely-directed” activity is presumed responsible for canceling out the strong PVP modulation (e.g., Fig. 2b) at the motoneuron level during RVOR suppression (Scudder and Fuchs 1992; Cullen et al. 1993; Cullen and McCrea 1993). Thus, in conjunction with PH–BT cells, PVP and EH cells are generally presumed to provide motoneurons with the correct combination of velocity and position-like signals to compensate for the plant dynamics during slow eye movements.
Learning and viewing context-related changes in VOR amplitude are often accompanied by significant changes in the depth of modulation of EH neurons. These cells are thus thought to play a particularly important role in the online contextual modulation of the VOR with viewing location (McConville et al. 1996; Chen-Huang and McCrea 1999a, b; Meng and Angelaki 2006) as well as in long-term adaptive reflex changes brought about by altered visual-vestibular mismatch stimuli (Lisberger et al. 1994b). A subset of EH (but not PVP) cells, known as floccular-target neurons (FTNs), receive direct inhibitory projections from the cerebellar flocculus and exhibit properties appropriate to drive changes in reflex gain during motor learning (Lisberger et al. 1994b, c). FTNs and their connectivity with the cerebellar flocculus/ventral paraflocculus have provided an excellent model system for studying the neural, cellular, and genetic basis of a simple form of motor learning (see reviews by Lisberger 1988; du Lac et al. 1995; Raymond et al. 1996; Blazquez et al. 2004; Boyden et al. 2004).
This brief summary emphasizes a widely accepted notion that the several types of eye-movement-sensitive premotor neurons collectively contribute to computing an inverse dynamic model of the eye plant. The distributed nature of this inverse model is supported both by the high level of neuronal interconnectivity and by eye-movement deficits consistent with a loss of integration after lesions to many brainstem and cerebellar areas (Zee et al. 1981; Cannon and Robinson 1987; Godaux et al. 1993; Mettens et al. 1994; Kaneko 1997, 1999). Recently, aspects of this theoretical construct have been reconsidered and extended, leading to new insights into the organization of the system that reveal close parallels with other motor systems (e.g., limb control). These insights have been brought about by considering another reflex type, the translational vestibulo-ocular reflex (TVOR), that generates compensatory eye movements during translation (e.g., during locomotion). Next we describe how differences between the RVOR and TVOR have helped probe the concepts of internal models and their neural implementation.
The TVOR differs from the RVOR in many respects (reviewed in Angelaki 2004; Angelaki and Hess 2005), including the basic dynamic transformations required to convert sensory signals to motor commands. In particular, unlike other types of eye movements including saccades, smooth target tracking, and the RVOR that are all driven by velocity-like signals, the sensory drive for the TVOR provided by otolith afferents is encoded in terms of linear acceleration (Fernandez and Goldberg 1976a, b). Behaviorally, the TVOR also has a much narrower dynamic range and is robust only at frequencies above the eye plant bandwidth (>~0.5–1 Hz; Paige and Tomko 1991a; Telford et al. 1997; Angelaki 1998).
These differences both at sensory and at motor levels imply that ultimately different sensorimotor processing is required for the TVOR versus the RVOR. But to what extent are common computational strategies employed? Recall that the broad RVOR bandwidth has been used as the main argument for the existence of an inverse dynamic model that compensates for the eye plant dynamics (Fig. 1a, b; Skavenski and Robinson 1973). However, using a similar logic, no such compensation is needed for the TVOR: the high-pass dynamics of the TVOR (e.g., similar to those in Fig. 1b, blue curve “no inverse model”) would either argue against an inverse plant model or at best suggest that such processing may be unnecessary (Green and Galiana 1998; Musallam and Tomlinson 1999; Angelaki et al. 2001). In principle, only an integrator is necessary for the TVOR to convert linear acceleration into the velocity-like signals required to drive the reflex at higher frequencies. Thus, one way that otolith signals could be processed is by only utilizing the integrator pathway in Robinson’s parallel pathway diagram (Fig. 3a; Green and Galiana 1998; Musallam and Tomlinson 1999; Angelaki et al. 2001).
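The integrator-only scheme can be written out as a short derivation. This is a sketch under stated assumptions: the otolith afferent signal is taken as proportional to linear acceleration $A(s) = s\,V(s)$, where $V(s)$ is head linear velocity; the eye plant is first-order with an assumed time constant $T_e$; and $k$ is a hypothetical lumped gain. Sign conventions are ignored.

```latex
\begin{align}
  M(s) &= \frac{k}{s}\,A(s) = k\,V(s)
       && \text{(integrator-only pathway to motoneurons)} \\
  E(s) &= \frac{1}{T_e s + 1}\,M(s)
       && \text{(first-order eye plant, eye position)} \\
  \frac{s\,E(s)}{V(s)} &= \frac{k\,s}{T_e s + 1}
       && \text{(eye velocity relative to head velocity)}
\end{align}
```

The resulting transfer function tends to zero at low frequencies and to $k/T_e$ for frequencies well above $1/(2\pi T_e)$, that is, a high-pass characteristic of the same form as the uncompensated curve discussed for the RVOR.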
While the scheme shown in Fig. 3a represents the most efficient strategy for processing otolith signals in the TVOR, it nonetheless has a disadvantage. As a common inverse model would not be shared by all sensorimotor systems that drive the same effector (the eyeball in this case), the way that premotor neurons encode information about eye movement would depend on the sensory stimulus (see Green and Galiana 1998; Green et al. 2007 for details). Alternatively, a common inverse model might be shared by multiple sensorimotor systems to ensure that at least some premotor neurons always encode a consistent eye movement representation even when the dynamics of both the motor output and the sensory input differ. In this case, however, the processing in the TVOR would be less efficient; otolith linear acceleration signals would need to be preprocessed first (i.e., upstream of the inverse plant model; “prefiltering” stage; Fig. 3b; Paige and Tomko 1991a, b; Telford et al. 1997) both to make them compatible with the velocity-like eye movement drive from other sensory sources as well as to provide the high-pass properties that are observed behaviorally in the TVOR. What strategy does the brain use? One that optimizes the use of existing circuitry to perform multiple distinct sensorimotor transformations (Fig. 3a) or one that relies on a common internal model, despite the need for additional processing, with the goal of maintaining consistent internal state estimates (Fig. 3b)?
Single unit recordings from PH–BT, PVP, and EH cells during both rotation and translation have revealed distinctions in the way the particular neural subpopulations encode rotational versus translational signals (Angelaki et al. 2001; Meng et al. 2005; Meng and Angelaki 2006; Green et al. 2007). Nonetheless, strong support has been provided for the prefiltering stage in Fig. 3b (Green et al. 2007). Both “eye-contra” PVP and PH–BT cells (but not EH and “eye-ipsi” PVP cells) exhibit modulations that lag eye velocity (i.e., are more closely in phase with eye position) at 0.5 Hz, suggesting that a second temporal integration of otolith signals must take place centrally in the TVOR pathways (i.e., compatible with a prefiltering stage). In addition, unlike all other cell types, PH–BT cells exhibited response dynamics relative to eye movement that were identical during rotation and translation. Thus, canal and otolith signals appear to be processed by a common inverse model to create a consistent estimate of the motor output at the level of PH–BT neurons (Green et al. 2007). Furthermore, as outlined next, a direct comparison of neural responses during rotation and translation with those of extraocular motor neurons has provided new insights regarding the neural basis of the inverse dynamic model and the role of PH–BT cells.
Recall that the prevailing theoretical conceptualizations emphasized the notion that PH–BT neurons encode an internal estimate of eye position, a signal representing the output of the neural integrator (E* in Fig. 1c) or of a forward eye plant model (E* in Fig. 1d). Yet, when the critical experiment to test such a presumption was performed, it was shown that PH–BT cell dynamics are identical to those of extraocular motoneurons (Fig. 4a; Green et al. 2007). Thus, PH–BT cells are not the output of the neural integrator portion of the inverse model, as previously assumed. Instead, they appear to represent the output of the inverse model itself, encoding an efference copy of the motor command signal (Green et al. 2007). In retrospect, this finding is not surprising. Motoneurons are only involved in generating the movement and the control of eye movements does not rely on on-line feedback from muscle spindles (Keller and Robinson 1971; Guthrie et al. 1983). Thus, PH–BT cells must play the important role of distributing an efference copy of the motor command (output of the inverse dynamic model) to different premotor and sensory areas, where it can be used for multiple purposes, including updating the brain about ongoing eye movements (McCrea and Baker 1985; Belknap and McCrea 1988; Green et al. 2007).
In particular, the requirement for dedicated neuronal populations that carry an efference copy of motor command signals is found in contemporary theories of limb control, which suggest that sensorimotor transformations may rely on complementary forward and inverse models of the sensors and motor actuators (e.g., Wolpert and Kawato 1998). Accordingly, on-going eye movement can be estimated by feeding the efference copy signal of the motor command (i.e., the output of the inverse model) through a forward model. The output of the forward model would then predict the estimated eye movement consequences of this motor command (Fig. 4b). Such a signal can be used to update the brain about ongoing eye movement and to correct online for any errors between predicted and desired action by subsequently refining the motor command.
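The efference-copy arrangement described here can be illustrated with a toy simulation. The sketch below assumes a first-order eye plant, an inverse model built from the brain's own estimate of the plant time constant, and a forward model that uses the same estimate; the function name, stimulus, and parameter values are hypothetical and serve only to show how a mismatch between model and plant produces a prediction error.

```python
import math


def peak_prediction_error(plant_tc=0.32, model_tc=0.32, freq_hz=0.2,
                          dt=0.001, duration=10.0):
    """Peak |predicted - actual| eye position for a sinusoidal desired velocity.

    plant_tc: assumed time constant (s) of the real eye plant.
    model_tc: time constant (s) used by both the inverse and forward models.
    """
    eye = 0.0         # actual eye position produced by the plant
    predicted = 0.0   # forward-model estimate of eye position
    integrated = 0.0  # integrator state inside the inverse model
    worst = 0.0
    for n in range(int(duration / dt)):
        desired_vel = 10.0 * math.cos(2.0 * math.pi * freq_hz * n * dt)
        # Inverse model: desired eye velocity -> motor command.
        command = model_tc * desired_vel + integrated
        efference_copy = command  # copy of the command sent to the forward model
        # Real plant and forward model both receive the same command.
        eye += dt * (command - eye) / plant_tc
        predicted += dt * (efference_copy - predicted) / model_tc
        integrated += dt * desired_vel
        worst = max(worst, abs(predicted - eye))
    return worst


if __name__ == "__main__":
    print(peak_prediction_error())               # model matches plant
    print(peak_prediction_error(plant_tc=0.20))  # model mismatched to plant
```

When the internal estimate matches the plant, the forward-model prediction coincides with the actual eye position; when it does not, the discrepancy is the kind of error signal that could in principle be used to refine the motor command.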
If PH–BT neurons in the PH and VN represent the output of the inverse dynamic model for slow eye movements, where might the proposed forward model be? Green et al. (2007) (also see Glasauer 2003) suggested the cerebellum as one likely site for the implementation of the forward model. Indeed, the cerebellum has been implicated in the implementation of forward and inverse dynamic models both for limb control (Ito 1970; Miall et al. 1993; Wolpert and Kawato 1998; Kawato et al. 2003) and for eye movements (Shidara et al. 1993; Gomi et al. 1998; Glasauer 2003; Ghasia et al. 2008; Lisberger 2009). In support of this notion, BT neurons in PH/VN and paramedian tract are known to project to the flocculus (Langer et al. 1985b; McCrea and Baker 1985; Belknap and McCrea 1988; Buttner-Ennever et al. 1989; Nakamagoe et al. 2000). Feedback from a presumed forward model in the flocculus could be used to update the brainstem motor command signal via Purkinje cell projections onto FTNs (i.e., EH-type cells) in the VN (Langer et al. 1985a; Lisberger et al. 1994c).
Support for such a proposed role for the cerebellar flocculus and its projections onto EH cells in computing a forward model comes from a recent study which examined how various cell populations encode 3D ocular kinematics during smooth pursuit eye movements (Ghasia et al. 2008). In particular, visually-guided eye movements in 3D are subject to kinematic constraints such that eye positions always lie in what is known as Listing’s plane. To achieve this, the axis of eye rotation during movements initiated from eccentric positions must tilt out of this plane in the same direction as gaze, by approximately half as much (half-angle rule) (Tweed and Vilis 1990). As extraocular motoneurons do not encode the half-angle rule (Ghasia and Angelaki 2005) this property appears to be generated by the mechanical characteristics of the eyeball (Miller 1989; Demer et al. 2000; Kono et al. 2002; Klier et al. 2006). As a result, there is a clear distinction between the motor command and the resulting eye movement, a difference that provides a unique opportunity to investigate the neural substrates for inverse versus forward models. Indeed, PH–BT cells, like motoneurons, showed little systematic eye position dependence consistent with the half-angle rule (Ghasia et al. 2008), providing further support for the proposal that PH–BT neurons represent the output of the inverse model.
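The half-angle rule follows from the kinematics of rotations when eye position is confined to Listing's plane, which can be checked numerically. The sketch below assumes the rotation-vector representation of eye position, a coordinate convention in which x is the torsional axis (so Listing's plane is the y-z plane), and the standard relation between angular velocity and the rotation-vector derivative, ω = 2(ṙ + r × ṙ)/(1 + |r|²). The function names are illustrative.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def angular_velocity(r, r_dot):
    """Angular velocity from a rotation vector r and its time derivative."""
    c = cross(r, r_dot)
    scale = 2.0 / (1.0 + sum(x * x for x in r))
    return tuple(scale * (d + ci) for d, ci in zip(r_dot, c))


def axis_tilt_deg(gaze_eccentricity_deg):
    """Tilt of the rotation axis out of Listing's plane, in degrees.

    The eye starts at the given eccentricity (rotation about the y axis) and
    moves so that its rotation vector stays in the plane (change along z only).
    """
    theta = math.radians(gaze_eccentricity_deg)
    r = (0.0, math.tan(theta / 2.0), 0.0)  # zero torsional (x) component
    r_dot = (0.0, 0.0, 1.0)                # position change within the plane
    w = angular_velocity(r, r_dot)
    return math.degrees(math.atan2(abs(w[0]), abs(w[2])))


if __name__ == "__main__":
    for ecc in (0.0, 10.0, 20.0, 40.0):
        print(ecc, axis_tilt_deg(ecc))
```

Under these assumptions the angular velocity axis tilts out of the plane by half the gaze eccentricity, so a command that does not itself contain this tilt would have to rely on the mechanics of the plant to produce it.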
Combined, the studies of Green et al. (2007) and Ghasia et al. (2008) thus show that the firing rates of PH–BT neurons are both dynamically (i.e., in terms of their frequency response characteristics; Fig. 4a) and kinematically (i.e., in terms of their firing properties during 3D eye movements) identical to the firing rates of extraocular motoneurons. In contrast to motoneurons and PH–BT cells, EH neurons showed a systematic dependency on eye position that might be consistent with the half-angle rule, suggesting that they carry signals more closely related to the actual executed eye velocity (Ghasia et al. 2008). As many EH cells receive projections from the cerebellar flocculus, such signals could be conveyed from a forward model in the cerebellum. Yet, there are reasons to suggest that this may not simply be a forward model of the eye plant. In particular, many Purkinje cells in the cerebellar flocculus do not simply encode eye velocity but rather seem to combine eye and head velocity signals to compute an estimate of gaze velocity (Lisberger and Fuchs 1978; Miles et al. 1980; Stone and Lisberger 1990; Lisberger et al. 1994a). This has led to the speculation that if the cerebellar flocculus computes a forward model it may in fact be a model of the combined eye-head gaze system (Lisberger 2009). At present, no firm conclusion has been reached regarding the nature of and neural correlates for such a hypothesized forward model.
These recent advances emphasize a conceptual organization for the vestibulo-ocular system that closely parallels those proposed for limb control. Interpretations of experimental data in this context (e.g., Green et al. 2007; Chen-Harris et al. 2008; Ethier et al. 2008; Ghasia et al. 2008; Lisberger 2009) are thus likely to provide valuable new insights into neural strategies for sensorimotor processing, motor control, and learning that are relevant for all types of reflexive and goal-directed voluntary movement. Yet, while the vestibular system's contribution to gaze stabilization is arguably one of its best-studied functions, the vestibular sensors also provide important sensory cues for spatial orientation and self-motion perception. Next, we describe how signals from the two vestibular sensors interact and we focus on another neuron type, known as “vestibular-only” (VO) cells, which are distinct from the cell populations described above in that they do not carry signals related to eye movement.
Early studies showed that VO cells in the VN and cerebellum also behave as a distributed neural integrator (Cohen et al. 1977, 1981; Raphan et al. 1977, 1979; Waespe and Henn 1977; Katz et al. 1991; Reisine and Raphan 1992; Yokota et al. 1992; Wearne et al. 1997a, 1998). The original function ascribed to this integrative network, which became popularly known as the “velocity storage” integrator, was to compensate for the high-pass dynamic properties of the semicircular canals, with the goal of improving or storing central estimates of angular velocity (Raphan et al. 1977, 1979; Robinson 1977). Thus, this network too appeared to be computing an inverse model, but this time not of the dynamics of the eye but instead of the semicircular canals. However, subsequent experiments revealed that this so-called “velocity storage” network also exhibits complex spatial properties that depend on head orientation with respect to gravity (Raphan et al. 1981; Harris 1987; Raphan and Cohen 1988; Dai et al. 1991; Merfeld et al. 1993b; Angelaki and Hess 1994, 1995; Wearne et al. 1997b, 1998). These observations pointed to a broader role for this VO-cell network in integrating multisensory signals (i.e., optokinetic, vestibular, and somatosensory) to compute internal estimates of inertial self-motion (Merfeld et al. 1993a; Angelaki and Hess 1994, 1995; Merfeld 1995; Glasauer and Merfeld 1997; Hess and Angelaki 1997; Zupan et al. 2002; Green and Angelaki 2003, 2004). In this regard, the nomenclature used to describe VO neurons is misleading, as many respond not only to vestibular stimuli but also to full-field optokinetic and/or proprioceptive stimulation (Waespe and Henn 1977, 1981; Boyle and Pompeiano 1980, 1981; Boyle et al. 1985; Kasper et al. 1988; Wilson et al. 1990; Buttner et al. 1991; Barmack and Shojaku 1995; McCrea et al. 1999; Wylie and Frost 1999; Gdowski and McCrea 2000; Barmack 2003; Bryan and Angelaki 2009).
Among the most important computations implemented by the VO neuron network is the resolution of an ambiguity in interpreting sensory otolith signals. Below we summarize evidence suggesting that VO cells within the VN, rostral fastigial nucleus of the cerebellum (rFN) and nodulus/uvula of the caudal cerebellar vermis (NU, lobules IX and X) implement an internal model of the solution to a fundamental physical law necessary to resolve this sensory ambiguity. We start with a brief description of the problem.
The ambiguity arises because (1) we move within a gravitational environment and (2) the otolith organs, like any other linear accelerometer, transduce both inertial (translational, t) and gravitational (g) accelerations, thereby providing information about net acceleration (a = t − g; Einstein’s equivalence principle; Einstein 1908). Thus, changes in the firing rate of otolith afferents are ambiguous in terms of the type of motion they encode: they could reflect either translation or a head reorientation relative to gravity (i.e., tilt) or combinations of these motions.
A key difference between translation and tilt is that as the head is reoriented relative to gravity it is also simultaneously rotated. Thus, tilts typically activate rotational sensors (e.g., the canals). In contrast, the semicircular canals are not stimulated during pure translation. In theory, therefore, the ambiguity can be resolved by combining otolith signals with estimates of head rotation (e.g., from semicircular canal, visual and/or proprioceptive cues). In recent years, a number of theoretical and behavioral studies have illustrated that rotational cues can be used to explicitly separate the net gravito-inertial acceleration signal sensed by the otoliths into central estimates of gravity and translational acceleration (Merfeld et al. 1993a, 1999; Merfeld 1995; Merfeld and Young 1995; Glasauer and Merfeld 1997; Angelaki et al. 1999; Bos and Bles 2002; Merfeld and Zupan 2002; Zupan et al. 2002; Green and Angelaki 2003, 2004; MacNeilage et al. 2007).
Theoretically, the way that rotational cues should be combined with net acceleration signals to resolve the sensory ambiguity is described by the following equation:

t = a − ∫ω × g dt    (1)
Equation 1 states that to estimate translation, t, the otolith net acceleration signal, a, must be combined with an independent estimate of head tilt (g = − ∫ω × g dt) computed from an extra-otolith rotation estimate, ω. The ∫ω × g term (where ∫ is an integration and × is a vector cross-product) describes the computations that take into account an initial estimate of head orientation (initial g state from static otolith and/or proprioceptive cues) to transform a head-referenced angular velocity signal, ω (e.g., from the canals) into an updated estimate of dynamic tilt relative to gravity, g.
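The computation described above can be illustrated with a minimal numerical sketch. It assumes the relations stated in the text (a = t − g, with the gravity estimate updated according to dg/dt = −ω × g), a simple forward-Euler integration step, and head-fixed coordinates. The function names, the sampling interval, and the roll-tilt stimulus are hypothetical choices made for illustration only; this is not a model of the neural implementation.

```python
import math

def cross(u, v):
    """Vector cross-product u x v for 3-element sequences."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def estimate_translation(a_samples, omega_samples, g_initial, dt):
    """Apply t = a + g sample by sample, updating g with dg/dt = -(omega x g).

    a_samples     : net otolith acceleration a = t - g (head coordinates, m/s^2)
    omega_samples : head angular velocity (head coordinates, rad/s)
    g_initial     : initial gravity estimate (e.g., from static otolith cues)
    dt            : sample interval in seconds
    """
    g = list(g_initial)
    estimates = []
    for a, omega in zip(a_samples, omega_samples):
        w_x_g = cross(omega, g)
        # Forward-Euler step of the integral (a simplifying assumption).
        g = [g[i] - w_x_g[i] * dt for i in range(3)]
        estimates.append(tuple(a[i] + g[i] for i in range(3)))
    return estimates

# Hypothetical stimulus: 1 s of pure roll tilt at 0.2 rad/s, no translation.
G = 9.81
dt = 0.001
roll_rate = 0.2
omega_samples = [(roll_rate, 0.0, 0.0)] * 1000
# Gravity in head coordinates after k steps is (0, -G sin, -G cos);
# for pure tilt the otoliths report a = t - g = -g.
a_samples = [(0.0,
              G * math.sin(roll_rate * k * dt),
              G * math.cos(roll_rate * k * dt)) for k in range(1, 1001)]

# With the rotation signal, the translation estimate stays near zero.
tilt_estimates = estimate_translation(a_samples, omega_samples, (0.0, 0.0, -G), dt)

# Without it (omega set to zero), the same tilt is misread as translation.
no_canal = estimate_translation(a_samples, [(0.0, 0.0, 0.0)] * 1000, (0.0, 0.0, -G), dt)
```

In this sketch the only difference between the two calls is the availability of the rotation estimate, which mirrors the logic of the behavioral experiments described next.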
Experimental support for a role for rotational signals in estimating translation (as predicted by Eq. 1) was provided in a series of elegant human and monkey behavioral studies. Merfeld and colleagues (Merfeld et al. 1999, 2001; Zupan et al. 2000) reasoned that if canal signals are inaccurate they would give rise to an inaccurate estimate of gravity (i.e., tilt) and consequently an inaccurate estimate of translation (i.e., an incorrect central estimate of g = − ∫ω × g in Eq. 1 results in an incorrect estimate of t). They then took advantage of the fact that the canals provide an inaccurate estimate of angular velocity at low frequencies to reveal a systematic pattern of “erroneous” ocular responses in humans consistent with the hypothesis that canal signals had contributed to an internal, albeit incorrect, estimate of translational motion (Merfeld et al. 1999; Zupan et al. 2000).
At about the same time, Angelaki and colleagues (Angelaki et al. 1999; Green and Angelaki 2003) used combinations of tilt and translation stimuli (e.g., Fig. 5a, top) at higher frequencies (>0.1 Hz), where canal estimates of angular velocity are accurate, to demonstrate that signals from the semicircular canals directly contribute to the generation of the TVOR in monkeys. Similar types of stimuli have subsequently been used to show that canal signals contribute to tilt/translation discrimination in human perceptual responses (Merfeld et al. 2005a, b) as well as to tilt perception in monkeys (Lewis et al. 2008). An exception to this finding is the human TVOR, where tilts and translations are not ideally distinguished. Instead, the human TVOR appears to rely predominantly on an alternative, but non-ideal, “filtering” strategy in which higher-frequency otolith stimuli are interpreted as translations while low-frequency stimuli are interpreted as tilts (Mayne 1974; Paige and Tomko 1991a; Merfeld et al. 2005a, b). More generally, a combination of “filtering” and “otolith-canal” convergence strategies is likely to be used to varying extents. In addition, contemporary theories based on Bayesian inference suggest that experimental findings consistent with the predictions of both strategies may be obtained using a zero inertial acceleration prior (i.e., it is more likely that we are stationary rather than moving; Laurens and Droulez 2007; MacNeilage et al. 2007; for a review, see Angelaki et al. 2010).
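The “filtering” strategy mentioned above can be sketched as a simple frequency split of a one-dimensional otolith signal. This is only an illustrative assumption: a first-order low-pass filter stands in for whatever dynamics the brain may use, and the function name and the 0.1 Hz cutoff are hypothetical values chosen for the example rather than measured parameters.

```python
import math

def filter_strategy(a_samples, dt, cutoff_hz):
    """Split a 1-D net acceleration signal into a low-pass part
    (interpreted as tilt) and the high-pass remainder (interpreted
    as translation), using a first-order low-pass filter."""
    tau = 1.0 / (2.0 * math.pi * cutoff_hz)   # filter time constant (s)
    alpha = dt / (tau + dt)                   # discrete smoothing factor
    low = a_samples[0]
    tilt_part, translation_part = [], []
    for a in a_samples:
        low += alpha * (a - low)
        tilt_part.append(low)
        translation_part.append(a - low)
    return tilt_part, translation_part

# Hypothetical 5 Hz stimulus sampled at 1 kHz for 2 s, cutoff at 0.1 Hz.
dt = 0.001
fast = [math.sin(2.0 * math.pi * 5.0 * k * dt) for k in range(2000)]
fast_tilt, fast_translation = filter_strategy(fast, dt, 0.1)

# A sustained (constant) acceleration is attributed entirely to tilt.
steady_tilt, steady_translation = filter_strategy([1.0] * 2000, dt, 0.1)
```

Because the split depends only on frequency content, a sustained translation would be misattributed to tilt, which is why the text describes this strategy as non-ideal.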
Importantly, however, under conditions where tilts and translations are appropriately distinguished, behavioral studies have confirmed that (1) semicircular canal signals play a critical role in the estimation of translational self-motion and (2) the dynamic processing of canal-derived rotational signals is consistent with the integration implied by Eq. 1 (Green and Angelaki 2003). As will be shown next, the otolith-canal convergence necessary to implement Eq. 1 takes place on VO cells within brainstem–cerebellar circuits that involve the VN, rFN, and NU (Angelaki et al. 2004; Green et al. 2005; Shaikh et al. 2005; Yakusheva et al. 2007).
To investigate how and where neurons combine canal and otolith signals to distinguish tilts and translations, neural responses were recorded during four stimuli: translation, roll tilt, and simultaneous combinations of the two motions in which translational and gravitational accelerations either summed (Tilt + Translation stimulus) or canceled one another out (Tilt − Translation stimulus; Fig. 5a, top). Unlike the responses of otolith afferents, which encode the net linear acceleration (Fig. 5a), many neurons in the VN and rFN modulated strongly during translation, and only marginally during tilt, as illustrated with the example VN cell in Fig. 5b. Note, in particular, that this VN neuron responded during Tilt − Translation motion, even though the dynamic linear acceleration stimulus to the otoliths was close to zero (see the lack of modulation of the otolith afferent, Fig. 5a). Thus, the robust response of central neurons to Tilt − Translation motion reveals nicely the underlying semicircular canal contribution to constructing an estimate of translation (Angelaki et al. 2004; Green et al. 2005; Shaikh et al. 2005; Yakusheva et al. 2007). Indeed, neural responses during Tilt − Translation disappeared after the semicircular canals were inactivated by plugging (Shaikh et al. 2005; Yakusheva et al. 2007).
The extent to which individual neurons in the VN, rFN, and NU reflected a neural coding of translation versus net acceleration is summarized in Fig. 5c, which compares the normalized correlation coefficients of the fits of each model to cell responses to the stimuli in Fig. 5a,b. Data points falling in the upper left quadrant represent neurons that were significantly more translation-coding. Cells in the lower right quadrant were significantly more net acceleration-coding. Whereas VN and rFN neurons spanned the whole range from translation to net acceleration coding, all NU Purkinje cell responses correlated best with translation (i.e., the output of an internal model of the solution to Eq. 1). Perhaps most importantly, quantitative analyses showed that for most neurons (including those that did not explicitly encode translation) the otolith and canal-derived signals converging onto each cell (terms a and − ∫ω × g, respectively) were spatially and temporally aligned, as necessary to implement an internal model of the solution to Eq. 1 (Angelaki et al. 2004; Green et al. 2005; Shaikh et al. 2005; Yakusheva et al. 2007).
Thus, in summary, brainstem and cerebellar neurons were shown to carry the appropriate signals to distinguish translation in the horizontal plane from small tilts from an upright orientation and to explicitly construct a central representation of translation. Under these conditions (i.e., translation and small tilts from upright), otolith signals are combined with both spatially matched and dynamically transformed (i.e., temporally integrated) canal signals to resolve the tilt/translation ambiguity (Green et al. 2005). However, the head is not always upright. As described next, the specific way that otolith and canal signals must combine to resolve the sensory ambiguity problem in 3D depends critically on head orientation.
Let us return to Eq. 1, which shows that the component of acceleration due to head reorientation relative to gravity must first be computed using rotational signals (i.e., the term − ∫ω × g). As emphasized in Fig. 6a, because the canals are fixed in the head whereas the gravity vector is fixed in space, different sets of canals signal a reorientation relative to gravity when the head is upright (i.e., vertical canals), as compared to when the head is pitched forward or backward (i.e., horizontal canals). Thus, in general, the way that otolith and canal signals must combine to distinguish tilts and translations is head-orientation-dependent (Green and Angelaki 2004, 2007; Green et al. 2005). This is exactly what is implied by the vector cross-product term g = − ∫ω × g in Eq. 1; it implies that the brain must combine head-centered rotational information, ω, nonlinearly (multiplicatively) with a current estimate of head orientation, g, to compute a new updated tilt estimate.
For small rotations from different static head orientations, this computation can be thought of as approximately equivalent to transforming a head-centered representation of angular velocity (e.g., from the canals) into a world-centered representation of the earth-horizontal rotation component (Green and Angelaki 2004, 2007; Green et al. 2005; Yakusheva et al. 2007). Specifically, as illustrated schematically in Fig. 6b, the rotation component about the earth-horizontal axis, ωEH, corresponds to the component of rotation that signals a change in head orientation with respect to gravity. Integration of this signal yields an estimate of dynamic tilt (gdyn ≈ ∫ωEH), which can then be combined with otolith signals to extract an estimate of translation, t. Accordingly, an important theoretical prediction for cells that encode the output of an internal model of Eq. 1 (i.e., the NU cells that encode translation) is that they should combine otolith signals with canal signals that have been transformed into a spatially-referenced signal (i.e., an estimate of ωEH). At present, this prediction indeed appears to hold for the simple spike responses of NU Purkinje cells, which exhibit a robust canal-derived ωEH signal during Tilt − Translation motion from an upright orientation (Fig. 6c) but do not respond to rotations about an earth-vertical axis (Fig. 6d; Yakusheva et al. 2007). That the responses of these neurons reflect the full vector cross-product computation of Eq. 1 required to estimate ωEH and compute translation in 3D remains to be explicitly shown by examining their responses across multiple head orientations.
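The orientation dependence described above can be made concrete with a small geometric sketch. It assumes that the earth-horizontal component is obtained by removing from ω its projection onto the current gravity direction, ωEH = ω − (ω · ĝ)ĝ, with all vectors expressed in head coordinates. The function names and the example gravity vectors are hypothetical and serve only to illustrate the geometry.

```python
import math

def cross(u, v):
    """Vector cross-product u x v for 3-element sequences."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def earth_horizontal_component(omega, g):
    """Remove from omega its projection onto the gravity direction.

    omega : head angular velocity in head coordinates
    g     : current gravity estimate in head coordinates
    """
    g_norm = math.sqrt(sum(c * c for c in g))
    g_hat = [c / g_norm for c in g]
    along_gravity = sum(w * c for w, c in zip(omega, g_hat))
    return tuple(w - along_gravity * c for w, c in zip(omega, g_hat))

# Rotation about the head z-axis (mainly signaled by the horizontal canals).
yaw = (0.0, 0.0, 1.0)

# Head upright: gravity lies along the head z-axis, so this rotation is
# earth-vertical and does not change orientation relative to gravity.
eh_upright = earth_horizontal_component(yaw, (0.0, 0.0, -9.81))

# Head pitched by 90 deg (assumed here to bring gravity onto the head
# x-axis): the same head-referenced rotation is now earth-horizontal.
eh_pitched = earth_horizontal_component(yaw, (-9.81, 0.0, 0.0))
```

Only the earth-horizontal component contributes to the cross-product ω × g, since the part of ω parallel to g drops out; in that sense the projection and the cross-product term of Eq. 1 carry the same orientation-dependent information.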
The types of context-dependent (in this case head-orientation-dependent) computations that are required to estimate inertial motion are similar to those required for many other sensorimotor problems, such as planning limb movements where the way muscles are activated for the same movement direction depends on starting limb posture (Buneo et al. 1997; Scott and Kalaska 1997; Scott et al. 1997; Sergio and Kalaska 2003; Buneo and Andersen 2006; Ajemian et al. 2008). A better understanding of how such computations take place within the VO cell network and the role of the cerebellum in this process is thus likely to be of broad general relevance for understanding the processing strategies employed across multiple sensorimotor systems. As will be shown next, VO cells also participate in another fundamental computation: that of distinguishing actively-generated from passively-applied head movements (see also review by Angelaki and Cullen 2008).
Until recently, the vestibular system had been exclusively studied in head-restrained animals, by moving the head and body together using an externally applied stimulus. As a result, our understanding of vestibular processing was limited to the neuronal encoding of vestibular exafference (i.e., vestibular signals arising from motions applied by the external environment). More recently, investigators in the field have compared neural responses during self-generated head movements to those during more traditional “passive” vestibular stimulation (McCrea et al. 1999; Roy and Cullen 2001). While vestibular afferents reliably encode head motion during active movements (Cullen and Minor 2002; Sadeghi et al. 2007; Jamali et al. 2009), neural responses in the VN can be dramatically attenuated (Fig. 7, compare a and b; see also Boyle et al. 1996; McCrea et al. 1999; Roy and Cullen 2001). What is even more striking is that these same vestibular neurons continue to selectively respond to passively applied head motion when a monkey generates active head-on-body movements (Fig. 7c; Roy and Cullen 2001; Cullen and Roy 2004). Furthermore, cognitive signals appear to play no role, as neural responses are not attenuated when the monkey uses a steering wheel to drive its own passive whole-body rotation (Roy and Cullen 2001). This selective suppression of self-generated vestibular activity during active head movements is specific to the class of VO neurons found in the VN and rFN regions that are interconnected with the NU (Cullen and Roy 2004). Notably, these are the same areas involved in computing inertial motion (described above), although whether the same neurons that extract such estimates also show a selective suppression of activity during active head movements remains to be determined.
These findings are of particular importance for understanding how the brain differentiates between sensory inputs that arise from changes in the world and those that result from our own voluntary actions. As pointed out by von Helmholtz (1925), this dilemma is notably experienced during eye movements: although targets rapidly jump across the retina as we move our eyes to make saccades, we never see the world move over our retina. Yet, tapping on the canthus of the eye to displace the retinal image (as during a saccadic eye movement) results in an illusory shift of the visual world.
The concept of internal models, outlined in previous sections, is ultimately tied to the dilemma of distinguishing sensory inputs that arise from external sources from those that result from self-generated movements. To address this problem von Holst and Mittelstaedt (1950) proposed the “Principle of Reafference”, where a copy of the expected sensory results of a motor command is subtracted from the actual sensory signal, thereby eliminating the portion of the sensory signal resulting from the motor command (termed “reafference”) to create a perception of the outside world (termed “exafference”). An internal estimate of the reafferent signal can be derived by processing a motor efference copy signal via an internal model of the motor system to create an internal prediction of the sensory consequences of that motor command (Wolpert et al. 1995; Decety 1996; Farrer et al. 2003; Fig. 8a).
Recently, a series of elegant experiments by Cullen and colleagues (Roy and Cullen 2004) has shown that such a mechanism underlies the selective elimination of sensitivity to active head movement. In principle, either neck proprioceptive signals or an efference copy of the neck motor command might be responsible. However, in the rhesus monkey, passive activation of neck proprioceptors did not significantly alter VN neural sensitivities to head rotation (Fig. 7d; Roy and Cullen 2001, 2004). Similarly, when head-restrained monkeys were encouraged to attempt to move their heads, even though they produced the motor commands to generate head torques comparable to those generated during large gaze shifts (i.e., when the head actually does move), this had no effect on neural responses (Roy and Cullen 2004). Thus, neither neck motor efference copy nor proprioceptive cues alone were sufficient to account for the elimination of neuronal sensitivity to active as compared to passive head rotation (i.e., compare Fig. 8b and c). Instead, by experimentally controlling the correspondence between intended and actual head movement (Fig. 8d; see legend for details), Roy and Cullen (2004) showed that a “cancellation signal” is generated only when the activation of neck proprioceptors matches the motor-generated expectation (Fig. 8a). In agreement with von Holst and Mittelstaedt’s (1950) original hypothesis, an internal model of the sensory consequences of active head motion is used to selectively suppress reafference at the level of the vestibular nuclei.
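The logic of this gated cancellation can be summarized in a toy sketch. It assumes that all signals are expressed as head velocities in deg/s, that the forward model is an identity mapping from the intended movement to the expected proprioceptive feedback, and that a fixed tolerance decides whether expectation and feedback match. The function name, the tolerance, and the numerical values are hypothetical; the sketch reproduces only the qualitative pattern described in the text, not recorded firing rates.

```python
def vn_response(head_in_space, intended_head_on_body, sensed_head_on_body,
                tolerance=1.0):
    """Toy VO-cell output (all inputs are head velocities in deg/s).

    head_in_space         : vestibular afferent signal
    intended_head_on_body : efference copy of the neck motor command
    sensed_head_on_body   : neck proprioceptive feedback
    tolerance             : hypothetical match criterion (deg/s)
    """
    # Stand-in forward model: expected feedback equals the intended movement.
    expected_proprioception = intended_head_on_body
    match = abs(sensed_head_on_body - expected_proprioception) <= tolerance
    # The cancellation signal is generated only when feedback matches expectation.
    cancellation = intended_head_on_body if match else 0.0
    return head_in_space - cancellation

active_only = vn_response(40.0, 40.0, 40.0)          # active head movement
passive_only = vn_response(40.0, 0.0, 0.0)           # passive whole-body rotation
attempted = vn_response(40.0, 30.0, 0.0)             # motor command, head restrained
neck_only = vn_response(40.0, 0.0, 25.0)             # passive neck stimulation
active_plus_passive = vn_response(60.0, 40.0, 40.0)  # passive motion during active movement
```

In this sketch the response is suppressed only in the first case, and only the passive component survives in the last case, in line with the selective suppression described above.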
The finding that vestibular reafference is suppressed early in sensory processing has clear analogies with other sensory systems, most notably the electrosensory system of the mormyrid fish: cerebellum-like electrosensory lobes provide the signal that is used to cancel the sensory response to self-generated stimulation (Bell 1981; Mohr et al. 2003; Sawtell et al. 2007; Bell et al. 2008). Identifying the neural representations of the cancellation signal for vestibular reafference promises to be an interesting area of investigation and the cerebellum is a likely site (see Cullen and Roy 2004 and Angelaki and Cullen 2008). Next we describe why vestibular/proprioceptive integration is also required to compute the motion of the body.
Although vestibular sensory cues are sufficient to estimate head motion and orientation, the ability to perform daily tasks, such as estimating our heading direction during locomotion and executing appropriate postural responses, requires knowledge of the orientation and motion of the body. In conjunction with proprioceptive signals, vestibular cues are known to contribute to such body motion estimates (Mergner et al. 1981, 1991; Blouin et al. 2007). To use vestibular sensory information to help estimate body motion, the brain is faced with two computational tasks (Fig. 9). The first (Fig. 9; “reference frame transformation”) arises because our vestibular sensors are fixed in the head. As a result, the way in which individual sensors are stimulated as the body moves depends critically on how the head is statically oriented with respect to the body. For example, during forward locomotion with the head also facing forward, the otoliths are stimulated along the axis between the nose and the back of the head (naso-occipital axis; Fig. 9, top inset, center panel). However, the same body motion with the head turned far to the left or to the right stimulates the otoliths mainly along the axis between the ears (i.e., interaural axis; Fig. 9, top inset, left and right panels). The problem is thus similar to that of using canal signals to estimate head tilt across different head orientations with respect to gravity. To correctly interpret the relationship between the pattern of sensory vestibular activation and actual motion, vestibular signals must undergo a reference frame transformation. In the case of estimating body motion, the transformation is from a head-centered to a body-centered reference frame. Such a computation requires a nonlinear interaction between dynamic vestibular estimates of head motion and neck proprioceptive estimates of static head orientation with respect to the body.
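For horizontal-plane translation, the head-to-body transformation described above amounts to a planar rotation by the static head-on-body angle. The sketch below assumes a hypothetical convention (x along the naso-occipital axis pointing forward, y along the interaural axis pointing leftward, head yaw positive to the left); the function name and example values are illustrative only.

```python
import math

def head_to_body(accel_head, head_yaw_deg):
    """Rotate a horizontal-plane acceleration from head to body coordinates.

    accel_head   : (naso-occipital, interaural) components in head coordinates
    head_yaw_deg : static head-on-body yaw angle, positive to the left
                   (e.g., as signaled by neck proprioceptors)
    """
    theta = math.radians(head_yaw_deg)
    x_h, y_h = accel_head
    # The sine and cosine of the head angle multiply the vestibular signal,
    # which is the nonlinear (multiplicative) interaction noted in the text.
    x_b = x_h * math.cos(theta) - y_h * math.sin(theta)
    y_b = x_h * math.sin(theta) + y_h * math.cos(theta)
    return (x_b, y_b)

# Forward body acceleration sensed with the head straight ahead
# (naso-occipital stimulation) ...
forward_head_straight = head_to_body((1.0, 0.0), 0.0)
# ... and with the head turned 90 deg to the left or right
# (interaural stimulation of opposite sign).
forward_head_left = head_to_body((0.0, -1.0), 90.0)
forward_head_right = head_to_body((0.0, 1.0), -90.0)
```

All three head-centered activation patterns map onto the same body-centered estimate, which is the signature expected of a cell that encodes motion in a body-centered reference frame.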
Recently, neurons whose responses are consistent with such a transformation have been identified in the rFN. Specifically, Shaikh et al. (2004) dissociated head and body motions by examining neural responses in the rFN and VN when a monkey was translated in different horizontal-plane directions with the head fixed at different static positions relative to the trunk. Cells which encode motion in a body-centered reference frame should respond preferentially to a given direction of body motion independently of head orientation. In contrast, if a cell encodes motion in a head-centered reference frame, its preferred movement direction with respect to the body should systematically shift as the head is reoriented to maintain alignment with a particular axis in head coordinates. Most neurons in the rostral VN demonstrated responses consistent with the shift expected for a head-centered reference frame. In contrast, most rFN neurons also showed a shift, but it was through a smaller angle than that of the head. As a result, their responses typically reflected encoding of motion in a frame intermediate between head- and body-centered.
Similar observations were made by Kleine et al. (2004) when body and head reference frames were dissociated during rotation by considering pitch and roll rotations for different static horizontal-plane head positions relative to the trunk. Responses were not consistent with encoding of motion in a head-centered reference frame but rather one that was closer to body-centered. These observations suggest a potential role for the rFN in transforming vestibular signals into the appropriate reference frame for estimating body motion. This is compatible with the fact that the rFN represents a major target for projections from the anterior vermis (Voogd and Glickstein 1998) which has been implicated in vestibular-proprioceptive interactions for limb and postural control (Manzoni et al. 1997, 1999; Bruschini et al. 2006).
Importantly, while showing that in the rFN vestibular signals have been at least partially transformed into body-centered coordinates is consistent with the hypothesis that they are being used to estimate body motion, it does not yet prove that this is what they indeed encode. To estimate body motion requires a second computational step: motions of the body must be distinguished from motions of the head with respect to the body (Fig. 9; “Body motion computation”). In particular, whereas vestibular sensors will be stimulated in a similar fashion regardless of whether the head moves alone or the head and body move in tandem, to estimate body motion the two must be distinguished. The latter computation requires the integration of vestibular signals with dynamic neck proprioceptive inputs.
Despite an early convergence of vestibular and proprioceptive signals in the vestibular nuclei (Boyle and Pompeiano 1981; Kasper et al. 1988; Wilson et al. 1990; Gdowski and McCrea 2000), an explicit neural correlate for body (i.e., trunk) motion has been difficult to identify. For example, during passive movements VN neurons in rhesus monkeys encode motion of the head rather than the body (Roy and Cullen 2001, 2004), although there is evidence for a more mixed representation in squirrel monkeys (Gdowski and McCrea 2000). However, an elegant recent study by Brooks and Cullen (2009) showed that a neural correlate for body motion indeed exists in the macaque rFN. In particular, they showed that approximately half of rFN neurons responded robustly either to vestibular stimulation alone when the head and body were moved in tandem (i.e., whole-body rotation; Fig. 10a) or to neck proprioceptive stimulation alone when the body was passively moved beneath the head (Fig. 10b). In contrast, when the head was passively moved relative to the stationary body, proprioceptive and vestibular signals combined to cancel one another out (Fig. 10c). Thus, these neurons specifically encoded body motion. Importantly, the authors also showed that neural sensitivities to neck proprioceptive stimulation during body-under-head rotation varied as a function of static head orientation with respect to the body (Fig. 10d). This modulation in sensitivity was closely matched by similar head-position-dependent changes in sensitivity to vestibular stimulation during whole-body rotation (Fig. 10e). As a result, it was shown that vestibular and proprioceptive signals are not simply summed linearly to estimate body motion. Rather, the brain takes into account the specific nonlinear processing of vestibular signals required to match them to proprioceptive signals across head positions and compute accurate estimates of body motion (Fig. 10f).
This latter observation (i.e., a dependence of yaw vestibular responses on yaw static head-re-body position) is of particular note because in the Brooks and Cullen (2009) study both the head and the body were always moved about a common axis (i.e., body/head yaw axis). This experimental manipulation differs fundamentally from the reference frame studies of Shaikh et al. (2004) and Kleine et al. (2004), where the direction of motion was systematically varied and spatial tuning curves were constructed at different static head positions with respect to the body (i.e., thereby dissociating head and body reference frames; first computational step in Fig. 9, top inset). Instead, by considering body and head motions under conditions where the head/body axes of rotation were always coincident, the Brooks and Cullen study unmasked an additional nonlinear processing step (i.e., the head-position-dependent processing within the second computational step in Fig. 9; see bottom inset).
At present, the reason for this second nonlinear computation step in estimating body motion remains unknown. Indeed, if both vestibular and proprioceptive inputs provided head-position-independent estimates of rotation, such nonlinear processing would not be required to compute body motion when the head and body are rotated about a common axis. There is no experimental evidence to suggest that the way semicircular canal afferents encode head motion depends on head orientation with respect to the body. Thus, it is logical to speculate that the nonlinear processing in the second computational step arises because of a nonlinear proprioceptive encoding of body motion. This might be a result of changes in the relative lengths of different neck muscles as the head is reoriented relative to the body. Consequently, to distinguish head from body motion, vestibular signals also need to be processed to encode motion in the same head-orientation-dependent way as neck proprioceptors.
While this interpretation remains speculative at present, it can again be related to the concept of internal models. Specifically, to combine multisensory signals that encode similar information in different ways, the brain must effectively implement the computations necessary to “match the codes up”. In the case of body-motion-encoding rFN cells, this might be accomplished by processing vestibular signals using an internal model of the way neck proprioceptors encode body motion (Fig. 9; bottom inset). This internal model might also be thought of as implementing a further transformation of body-centered vestibular motion estimates into a neck muscle-centered reference frame.
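The idea of “matching the codes up” can be illustrated with a deliberately simple sketch. It assumes that body-in-space velocity equals head-in-space velocity (vestibular) minus head-on-body velocity (neck proprioception), and that each input reaches the cell through a gain that may depend on static head position. The function name and all gain values are hypothetical; they are not estimates from the recordings discussed above.

```python
def rfn_response(head_in_space, head_on_body, vestibular_gain, proprioceptive_gain):
    """Toy body-motion cell: weighted difference of vestibular and neck inputs.

    head_in_space       : head velocity in space (deg/s), vestibular input
    head_on_body        : head velocity relative to the body (deg/s), neck input
    vestibular_gain     : hypothetical gain at the current head position
    proprioceptive_gain : hypothetical gain at the current head position
    """
    return vestibular_gain * head_in_space - proprioceptive_gain * head_on_body

# Matched gains (whatever their value at a given head position):
whole_body = rfn_response(30.0, 0.0, 0.5, 0.5)         # head and body move together
body_under_head = rfn_response(0.0, -30.0, 0.5, 0.5)   # body moves, head fixed in space
head_on_body_only = rfn_response(30.0, 30.0, 0.5, 0.5) # head moves, body stationary

# Mismatched gains: head-on-body motion no longer cancels.
mismatched = rfn_response(30.0, 30.0, 1.0, 0.5)
```

With matched gains the two body-motion conditions give the same response and head-on-body motion cancels; if only the proprioceptive gain changed with head position, a residual response would falsely signal body motion, which is one way to see why the vestibular input would need a matching head-position dependence.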
More generally, when the axes of body and head motion are different, a head-to-body (or muscle)-centered reference frame transformation of vestibular signals is required to match vestibular and proprioceptive motion codes before combining the two to estimate body motion (Fig. 9). Future work will be required to establish whether the neurons that show evidence for a body-centered representation of vestibular signals are the same neurons that encode body motion and whether in effect the two sets of computations occur simultaneously within a common population of neurons (i.e., as opposed to the distinct stages suggested in Fig. 9). Again, the cerebellar cortex (either anterior vermis or NU) represents a likely site. Furthermore, because at least some rFN neurons encode inertial self-motion (i.e., they encode translation as opposed to tilt; Angelaki et al. 2004) and distinguish passive from active movements (Brooks and Cullen 2007), it will be important to address the extent to which these populations overlap with those encoding body motion. Ultimately such investigations promise to shed new insights into how multisensory signals are integrated and processed by the CNS to create consistent motion representations for different behavioral and perceptual purposes.
A fundamental goal of systems neuroscience is to elucidate the strategies by which sensory signals are transformed into central representations that give rise to behavior, and how behavior in turn influences the interpretation of sensory information. Over the years, the vestibular system has served as an excellent model framework for investigating the neural correlates for such transformations. Among the earliest theoretical concepts promoted by studies of the vestibular system was the need for processing of sensory signals by an internal model of the dynamics of the motor effector, the eye plant. Since that time, studies of the vestibular system have continued to provide new insights into increasingly complex, and often nonlinear, computations involved in combining multisensory signals to create different motion representations that may serve a variety of motor and perceptual purposes. Here, we have summarized recent advances in elucidating the neural correlates for four computational problems: the sensorimotor transformations for reflex generation, the resolution of a sensory ambiguity for inertial motion estimation, the ability to distinguish active from passive movements, and the integration of vestibular and proprioceptive signals for body motion estimation. Each relates to the concept of the “internal model”, which has become popular in recent years as a means of describing particular classes of neural computations (e.g., representation of the dynamics of a sensor or effector) common to multiple sensorimotor systems. Understanding whether, how and where such models are implemented is thus of great importance for understanding sensorimotor processing, and the vestibular system has provided an excellent experimental model.
Perhaps the most widely accepted use of the “internal model” concept is in motor control (e.g., complementary forward and inverse models of the sensors and motor actuators; Wolpert and Kawato 1998). Implicit in these theories are the notions that: (1) motor commands are computed by processing sensory or behavioral goal-directed information via an inverse model of the effector to be controlled; (2) a common inverse model should be shared by all sensorimotor systems that drive the same effector; (3) there should exist populations of neurons at the output of such a model that encode an efference copy of the motor command; and (4) the efference copy should be conveyed to a forward model of the effector to generate a prediction of the consequences of that motor command, a signal that is critical for online refinement and updating of the motor command.
These concepts, which have been particularly influential in the field of limb control, have nonetheless remained mostly conceptual, largely due to the difficulty in identifying neural correlates. As reviewed here, support for such an organization and indeed direct neurophysiological correlates for many of these general concepts have been provided by studying the sensorimotor processing in the vestibular system. Recent studies have further emphasized that, regardless of the sensory drive, particular groups of neurons encode consistent information about the current or predicted state of the effector. Importantly, dedicated populations of neurons have been shown to explicitly encode an efference copy of the oculomotor command (Green et al. 2007), a concept that in other sensorimotor systems has been proposed in computational models but remained largely unconfirmed at the neurophysiological level (but also see Sommer and Wurtz 2002, 2008). Furthermore, distinct populations of cells known to receive projections from the cerebellar flocculus carry signals more closely kinematically related to the actual eye movement than the motor command (Ghasia et al. 2008), thus providing preliminary evidence for a forward model in the cerebellum. The explicit existence of such a forward eye plant model in the cerebellum remains hypothetical at present and provides an excellent direction for future work.
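The inverse model, efference copy, and forward model notions listed above can be illustrated with a deliberately simplified numerical sketch. It assumes, purely for illustration, that the eye plant behaves as a first-order lag with time constant `TAU`; the value of 0.2 s, the Euler integration, and all function names are assumptions of this sketch and are not taken from the studies cited. The forward model here is simply a copy of the assumed plant equation driven by the efference copy, so it is not a claim about how or where such a model is implemented neurally.

```python
TAU = 0.2   # assumed plant time constant in seconds (illustrative value)
DT = 0.001  # simulation time step in seconds


def plant_step(eye, command):
    """Advance the assumed plant, TAU * dE/dt + E = command, by one Euler step."""
    return eye + DT * (command - eye) / TAU


def inverse_model(desired_now, desired_next):
    """Command that makes the assumed plant follow the desired trajectory."""
    desired_velocity = (desired_next - desired_now) / DT
    return desired_now + TAU * desired_velocity


def simulate(desired):
    """Return (actual, predicted) eye positions for a desired trajectory."""
    actual = [desired[0]]
    predicted = [desired[0]]
    for now, nxt in zip(desired, desired[1:]):
        command = inverse_model(now, nxt)   # notion (1): inverse model
        efference_copy = command            # notion (3): copy of the command
        actual.append(plant_step(actual[-1], command))
        # notion (4): forward model driven by the efference copy
        predicted.append(plant_step(predicted[-1], efference_copy))
    return actual, predicted


desired = [10.0 * (k * DT) for k in range(501)]  # 10 deg/s ramp lasting 0.5 s
actual, predicted = simulate(desired)
tracking_error = max(abs(a - d) for a, d in zip(actual, desired))
```

Because the inverse model exactly compensates the lag of the assumed plant, `tracking_error` stays at the level of floating-point rounding, and the forward-model prediction coincides with the simulated eye position. A mismatch between the two would arise only if the internal model and the plant differed, which is the situation in which an online correction signal becomes useful.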
Support for the implementation of internal models as a general theoretical concept has also been provided by several recent studies characterizing the properties of a particularly interesting class of brainstem–cerebellar vestibular neurons, the activity of which is not correlated with eye movements (VO cells). Among the important computations that they perform is the distinction between sensory signals that result from our own actions (i.e., those that arise from self-generated behaviors, such as active voluntary movements) and those arising from changes in the external world (e.g., passive perturbations applied by the environment). Evidence has been provided that this computation involves processing efference copies of neck motor commands by a forward model (this time of the “neck plant”), the output of which reflects the expected sensory consequences of those commands, and comparing that output with actual sensory feedback from neck proprioceptors. When the prediction matches the sensory input, a signal appropriate to cancel vestibular reafference is generated (Cullen and Roy 2004; Roy and Cullen 2004). While the neural correlates for the proposed forward model and source of the “cancellation” signal remain to be explicitly identified, these observations provide strong neurophysiological support for well-established theoretical notions of sensorimotor system organization. These findings will undoubtedly help to guide future research in other areas, such as limb control, where the more complicated multijoint nature of the plant itself and its varied interactions with the environment (e.g., support of different loads and use of different tools) introduce additional complexities in elucidating basic organizational principles and their neural correlates.
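The gating logic described above can be sketched in a few lines. This is a minimal illustration under strong assumptions: the forward model of the neck plant is reduced to a hypothetical static gain, all signals are expressed as head velocities in deg/s rather than firing rates, and the matching tolerance is an arbitrary value. The function and parameter names are invented for this sketch.

```python
def forward_neck_model(neck_command, gain=1.0):
    """Hypothetical static forward model: expected head-on-body velocity (deg/s)."""
    return gain * neck_command


def vo_response(vestibular, neck_proprioception, neck_command, tolerance=5.0):
    """Head velocity signal (deg/s) left after gated reafference cancellation."""
    expected = forward_neck_model(neck_command)
    # The cancellation signal is generated only when the predicted and the
    # actual proprioceptive feedback agree to within the assumed tolerance.
    prediction_matches = abs(expected - neck_proprioception) <= tolerance
    cancellation = expected if prediction_matches else 0.0
    return vestibular - cancellation


active_only = vo_response(
    vestibular=40.0, neck_proprioception=40.0, neck_command=40.0)
passive_only = vo_response(
    vestibular=40.0, neck_proprioception=0.0, neck_command=0.0)
active_plus_passive = vo_response(
    vestibular=70.0, neck_proprioception=40.0, neck_command=40.0)
blocked_attempt = vo_response(
    vestibular=30.0, neck_proprioception=0.0, neck_command=40.0)
```

In this toy version the self-generated component is removed during a purely active movement, passive motion passes through unchanged, and only the passive component survives when the two are combined. When a commanded movement is not matched by proprioceptive feedback (`blocked_attempt`), no cancellation occurs and the vestibular signal is left intact.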
While internal models of the physical characteristics of a motor plant (or sensorimotor process) have been particularly influential in motor control theory, recent studies of the vestibular system have also emphasized the need for internal models to combine and transform multisensory signals into meaningful information about our interaction with the environment that can ultimately be used for both motor and perceptual purposes. One such example reviewed here is the implementation of an internal model of the computations to resolve the “tilt/translation” ambiguity that arises in interpreting ambiguous sensory signals from otolith afferents (Angelaki et al. 2004; Green et al. 2005).
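A one-axis sketch may help to make the tilt/translation computation concrete. It assumes that the otolith signal along the interaural axis is the sum of a gravitational component, `G * sin(tilt)`, and the translational acceleration (sign conventions differ between studies, so this choice is an assumption), and that the canal signal is an ideal measure of roll angular velocity. Real canal dynamics, the full 3D cross-product formulation, and noise are all omitted, and the names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def estimate_translation(otolith, canal_velocity, dt, initial_tilt=0.0):
    """Estimate interaural translational acceleration (m/s^2) at each time step.

    otolith: net acceleration samples along the interaural axis (m/s^2)
    canal_velocity: roll angular velocity samples (rad/s)
    """
    tilt = initial_tilt
    estimates = []
    for net_acceleration, angular_velocity in zip(otolith, canal_velocity):
        tilt += angular_velocity * dt            # integrate the canal signal
        gravity_component = G * math.sin(tilt)   # internal estimate of gravity
        estimates.append(net_acceleration - gravity_component)
    return estimates


dt = 0.01
steps = 100
# Otolith signal produced by a pure roll tilt at 0.2 rad/s.
tilt_angles = [0.2 * dt * (k + 1) for k in range(steps)]
otolith_signal = [G * math.sin(angle) for angle in tilt_angles]

# Same otolith signal, interpreted with and without a canal rotation signal.
tilt_estimate = estimate_translation(otolith_signal, [0.2] * steps, dt)
translation_estimate = estimate_translation(otolith_signal, [0.0] * steps, dt)
```

The same otolith input is attributed to tilt when the canals report a matching rotation (the translation estimate stays near zero) and to translation when they report none, which is the sense in which the rotational signal resolves the ambiguity.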
Similar considerations apply to the problem of distinguishing body and head motion. Whereas either vestibular or neck proprioceptive signals alone provide ambiguous information about whether the head, body or both are in motion, recent studies have shown that this problem can be resolved by combining vestibular and neck proprioceptive signals in a very specific fashion (Brooks and Cullen 2009; Kleine et al. 2004; Shaikh et al. 2004). A likely, although speculative, interpretation of recent findings is that to ensure that vestibular and proprioceptive signals combine correctly (i.e., the signals match up), vestibular signals must first be processed by an internal model of the nonlinear way that neck proprioceptors encode information about body motion with respect to the head.
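In its simplest form, the combination described above amounts to subtracting head-on-body motion (from neck proprioception) from head-in-space motion (from the vestibular sensors). The sketch below assumes a single shared rotation axis and linear coding of both signals in deg/s. It therefore leaves out exactly the reference frame transformation and the nonlinear proprioceptive encoding that the text identifies as the hard part, and the function name is hypothetical.

```python
def body_velocity(head_in_space, head_on_body):
    """Body-in-space velocity (deg/s) from vestibular and neck signals.

    head_in_space: head velocity relative to space (vestibular estimate)
    head_on_body: head velocity relative to the body (neck proprioception)
    """
    return head_in_space - head_on_body


head_turn_on_still_body = body_velocity(head_in_space=30.0, head_on_body=30.0)
whole_body_rotation = body_velocity(head_in_space=30.0, head_on_body=0.0)
body_turn_under_fixed_head = body_velocity(head_in_space=0.0, head_on_body=-30.0)
```

Each signal alone is ambiguous: a vestibular reading of 30 deg/s occurs both in the first case (body still) and in the second (body moving), and a silent vestibular signal in the third case does not mean the body is still. Only the combination separates the three situations.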
In summary, studies of the vestibular system have played an influential and important role not only in identifying what transformations need to be performed to solve specific problems, but also in explicitly providing neurophysiological evidence for the necessary computations. In so doing, these studies have provided support for general concepts of sensorimotor organization (e.g., implementation of forward/inverse models, concept of reafference, and reference frame transformations) that are relevant for all sensorimotor systems. Importantly, the solid neurophysiological foundation for such concepts provides unique opportunities to further investigate critical details regarding strategies for their implementation and use. For example, what are the specific roles of particular brain areas (e.g., cerebellum) in implementing aspects of the required computations (e.g., forward model representations, nonlinear context-dependent processing)? How are internal model representations learned and modified both over the long term and from moment to moment depending on behavioral context? Theories of motor skill learning in the limb control system suggest that the learning process involves changes within neural populations that compute inverse and/or forward models of the motor effector and the environment (e.g., a tool) with which it interacts (Shadmehr 2004). Yet, because the neural correlates for such internal models remain poorly established, it has been difficult to provide explicit neural evidence for such theories and to confirm which models are modified under a particular set of conditions (e.g., forward and/or inverse models; model of the effector vs. representation of its interaction with a particular tool; Wolpert and Kawato 1998; Haruno et al. 2001; Cothros et al. 2006; Kluzik et al. 2008; Wagner and Smith 2008; but see Li et al. 2001; Padoa-Schioppa et al. 2002).
In contrast, because significant progress has been made in identifying both the neural correlates for internal models and those for motor learning in the vestibular system, this task now becomes tangible. Lessons learned by studying the neural processing of vestibular signals for the control of eye and head movements are thus likely to provide new insights into salient strategies for motor skill learning in the more complicated limb control system.
Similarly, the multisensory integration strategies and nonlinear context-dependent computations (e.g., those that depend on head orientation with respect to gravity or the body) required to resolve problems such as the tilt/translation ambiguity or the computation of body motion have broad relevance to a wide variety of problems, ranging from the sensorimotor processing required to implement reference frame transformations (Salinas and Abbott 1995; Andersen 1997; Shaikh et al. 2004; Smith and Crawford 2005; Buneo and Andersen 2006; Batista et al. 2007; Green and Angelaki 2007; Yakusheva et al. 2007; Blohm et al. 2009) to the integration of multisensory signals to create meaningful representations of our environment (Driver and Noesselt 2008; Stein and Stanford 2008; Angelaki et al. 2009). The vestibular system represents a particularly good model system to study the neural correlates for some of these more complex computations because of the solid framework, built on the foundations of control system theory, for understanding much of the basic dynamic processing of sensory signals. Studies in the vestibular system will thus undoubtedly continue to provide important new insights into neural processing and computation in the brain.
Supported by NIH grants DC04260 and EY12814 and a chercheur boursier salary award from the Fonds de la recherche en santé du Québec (FRSQ).
Andrea M. Green, Dépt. de Physiologie, Université de Montréal, 2960 Chemin de la Tour, Rm. 4141, Montreal, QC H3T 1J4, Canada.
Dora E. Angelaki, Department of Anatomy and Neurobiology, Washington University School of Medicine, St. Louis, MO, USA.