In this paper, I address the question as to why participants tend to respond realistically to situations and events portrayed within an immersive virtual reality system. The idea is put forward, based on the experience of a large number of experimental studies, that there are two orthogonal components that contribute to this realistic response. The first is ‘being there’, often called ‘presence’, the qualia of having a sensation of being in a real place. We call this place illusion (PI). Second, plausibility illusion (Psi) refers to the illusion that the scenario being depicted is actually occurring. In the case of both PI and Psi the participant knows for sure that they are not ‘there’ and that the events are not occurring. PI is constrained by the sensorimotor contingencies afforded by the virtual reality system. Psi is determined by the extent to which the system can produce events that directly relate to the participant, the overall credibility of the scenario being depicted in comparison with expectations. We argue that when both PI and Psi occur, participants will respond realistically to the virtual reality.
The technology for immersive virtual reality (IVR) has existed for 40 years, initially as a demonstrated laboratory-based idea (the ultimate display) (Sutherland 1965) and for the past 20 years as practical, affordable and useful systems. The vast majority of research and development in this area has been to use it as a way to simulate physical reality. Yet it is a medium that has the potential to go far beyond anything that has been experienced before in terms of transcending the bounds of physical reality, through transforming your sense of place and through non-invasive alterations of the sense of our own body. In other words, virtual reality has rarely been seen as a medium in its own right, as something that can create new forms of experience, but rather as a means of simulating existing experience (Brooks Jr 1999). As has been mentioned before (Pausch et al. 1996), it is much like both cinema and television in their early days, which were used essentially as a medium for theatre. Our approach is to treat virtual reality as providing a fundamentally different type of experience, with its own unique conventions and possibilities, a medium in which people respond with their whole bodies, treating what they perceive as real.
This paper presents concepts that may help towards understanding how IVR has the power to transform place and even self-representation. These concepts are immersion, place illusion (PI), plausibility illusion (Psi) and the fusion of these last two in the notion of a virtual body. Throughout this paper I use the notation PI to represent place illusion and Psi to represent plausibility illusion. The reason for this is that we do not want the everyday meaning, for example, of ‘plausibility’, to intrude but rather only the specific meanings that we give to these concepts.
An ideal IVR system will typically consist of a set of displays (visual, auditory, haptic) and a tracking system. The computer maintains a dynamic database, which is a digital description of an environment, and the displays are continually rendered from this. The visual images displayed will be determined as a function of at least the position and orientation of the human participant's head, enabled through head tracking, and the system should ideally also include tactile, force-feedback, heat and smell displays so that all of the senses may be catered for. A typical IVR system today delivers stereo vision that is updated as a function of head tracking, possibly directional audio and sometimes some type of limited haptic interface. For example, the Cave (Cruz-Neira et al. 1993) is a system where between four and six walls of an approximately 3 m³ room are back-projected stereo projection screens. The images are determined as a function of head tracking so that, at least with respect to the visual system, participants can physically move through a limited space and orient their head arbitrarily in order to perceive (though not necessarily from all directions, depending on how many screens there are). Audio is typically delivered by a set of speakers in unobtrusive positions around the Cave.
In a head-mounted display (HMD), the displays are mounted close to the eyes and head tracking ensures that the left and right images are updated according to the head movements of the participant with respect to the underlying virtual environment. The separated left and right images for each eye ensure stereo vision. Audio would be delivered via earphones. The participant has the illusion of moving through a surrounding, three-dimensional environment that contains static and dynamic objects, including possibly representations of other people (sometimes real people in remote physical locations or virtual people controlled wholly by a computer program). Participants can effect changes in the environment—for example, if at least one hand is tracked, then the participant can grab objects and move them to different locations, or carry out a variety of other types of interaction. Further details of such systems can be found in Sanchez-Vives & Slater (2005).
Parameters that determine the quality of the experience include the graphics frame rate (how long it takes to graphically render the currently visible portion of the virtual environment), the overall extent of tracking (apart from head tracking, how much of the rest of body movement is tracked), tracking latency (how long it takes before a head movement results in the correct change in the displayed image), the quality of the images (how great the brightness, spatial, colour and contrast resolutions are), the field of view (how great the visual field of view is compared to what is possible in normal vision, and how much the displays surround the participant), the visual quality of the rendered scene (how closely objects geometrically resemble what they are supposed to depict, and how realistic the illumination is), the dynamics (how well the behaviour of objects conforms to expectations) and the range of sensory modalities accommodated (and within each sensory modality the fidelity of its displays). In previous work (Slater & Wilbur 1997) we defined the concept of immersion as a description of the characteristics of a system: by definition, one system would be more ‘immersive’ than another if it were superior on at least one of the characteristics above (for example, higher display resolution or more extensive tracking), other things being equal. Next I discuss a more conceptually useful way to classify the degree of immersion.
Immersive systems can be characterized by the sensorimotor contingencies (SCs) that they support. SCs refer to the actions that we know to carry out in order to perceive, for example, moving your head and eyes to change gaze direction, or bending down and shifting head and gaze direction in order to see underneath something (O'Regan & Noë 2001a,b; Noë 2004). The SCs supported by a system define a set of valid actions that are meaningful in terms of perception within the virtual environment depicted. For example, turn your head or bend forward and the rendered visual images ideally change the same as they would if you were in an equivalent physical environment. If head tracking was not enabled, then turning your head would have no effect, and therefore such an action could not be useful for perception. I define the set of valid sensorimotor actions with respect to a given IVR system to be those actions that consistently result in changes to images (in all sensory modalities) so that perception may be changed meaningfully. I define the set of valid effectual actions as those actions that the participant can take in order to effect changes in the environment. I call the union of these two sets the set of valid actions—the actions that a participant can take that can result in changes in perception or changes to the environment.
For example, consider an environment displayed visually through a head-tracked HMD. A participant in such an environment can usually quickly learn the effect of head movements on visual perception—the SCs. Such head movements will be valid sensorimotor actions. However, suppose the participant reaches out to touch a virtual object, but feels nothing because there is no haptics in this system. Here, the reaching out to touch something is not a valid sensorimotor action for this IVR. Now imagine an environment displayed visually on a large back-projected screen—again with head tracking. However, now when the participant looks far enough to one side visual elements from the surrounding real world would intrude into the field of view. Actions that result in perception from outside the virtual environment are also not valid sensorimotor actions. Suppose in either system the participant wears a tracked data glove, which is represented as a hand within the virtual environment. When this virtual hand intersects an object and the participant makes a grasping gesture, then the object might be selected and moved to another place. This would be an example of a valid effectual action. Without the data glove, the moving and grasping action would have no effect, and therefore would not be a valid effectual action. With today's generally available technology, participants will not experience a virtual reality system with generalized haptics, so that this dimension of SC will always fail if tested—for example, if a participant in a virtual reality touches some arbitrary virtual object they would feel nothing. The whole aspect of physicality is typically missing from virtual environment experiences—collisions do not typically result in haptic or even auditory sensations.
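The definitions and examples above can be sketched with ordinary sets. What follows is only an illustration under stated assumptions: the class `IVRSystem` and the action labels such as `turn_head` and `grasp_and_move_object` are hypothetical names chosen to mirror the HMD and data glove examples, and are not part of any real system or of the original formalism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IVRSystem:
    """Hypothetical model: a system described only by the actions it supports."""
    name: str
    sensorimotor: frozenset = frozenset()  # actions that change perception
    effectual: frozenset = frozenset()     # actions that change the environment

    @property
    def valid_actions(self) -> frozenset:
        # The set of valid actions is the union of the two sets.
        return self.sensorimotor | self.effectual


# A head-tracked HMD with no haptics and no hand tracking (assumed labels).
hmd = IVRSystem(
    name="head-tracked HMD",
    sensorimotor=frozenset({"turn_head", "bend_forward"}),
)

# The same HMD with a tracked data glove, which adds an effectual action.
hmd_with_glove = IVRSystem(
    name="head-tracked HMD with data glove",
    sensorimotor=hmd.sensorimotor,
    effectual=frozenset({"grasp_and_move_object"}),
)

for action in ("turn_head", "reach_to_touch", "grasp_and_move_object"):
    for system in (hmd, hmd_with_glove):
        status = "valid" if action in system.valid_actions else "not valid"
        print(f"{action!r} is {status} in {system.name}")
```

In this sketch, reaching out to touch is not valid in either system because neither provides haptics, while grasping and moving an object becomes valid only once the data glove is added.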
There is a fundamental difference between an immersive and non-immersive system: in an ideal immersive system it is possible in principle to fully simulate what it is like to go into a non-immersive system. For example, using a head-tracked HMD with appropriate haptics and sound, it is theoretically possible to construct a virtual environment in which a participant can virtually carry out all of the actions of sitting down at a desktop display, and experience that situation as a scenario within the virtual reality. However, this property of immersive systems is not symmetric—it is not possible inside a non-immersive display to simulate all of the actions of what it is like to go into an immersive system. The physical capabilities of these systems do not allow this. With an HMD without tracking it would not be possible to simulate all of the actions corresponding to going into another system, but it would be possible in the Cave or HMD with tracking to simulate what it is like to go into an HMD without tracking. Hence there is a natural set of equivalence classes, of systems that can be used to approximately simulate one another (e.g. in principle it is possible to approximately simulate the Cave in an HMD, and simulate an HMD in the Cave, so these are in the same equivalence class). They are all varying degrees of direct simulations of physical reality.
In a system such as the Cave it is possible to use its associated valid actions to produce the SCs necessary to ‘perceive’ a virtual environment displayed (for example) on a desktop system. It is not possible to use the valid actions that one must learn for perception of a virtual reality displayed through a desktop system to simulate the valid sensorimotor actions for perceiving a virtual reality displayed with the Cave.
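The asymmetry of simulation and the resulting equivalence classes can be sketched as a small relation over systems. This is a minimal illustration, assuming hypothetical system labels; the pairs in `CAN_SIMULATE` encode only the claims made in the text, and the grouping assumes that mutual simulation is transitive, so that comparing against one representative per class is enough.

```python
# Hypothetical system labels used only for this illustration.
SYSTEMS = ["cave", "tracked_hmd", "untracked_hmd", "desktop"]

# (a, b) means "a can, in principle, approximately simulate b".
CAN_SIMULATE = {
    ("cave", "tracked_hmd"),
    ("tracked_hmd", "cave"),
    ("cave", "untracked_hmd"),
    ("tracked_hmd", "untracked_hmd"),
    ("cave", "desktop"),
    ("tracked_hmd", "desktop"),
}


def simulates(a: str, b: str) -> bool:
    """Return True if a can simulate b; every system trivially simulates itself."""
    return a == b or (a, b) in CAN_SIMULATE


def equivalence_classes(systems: list[str]) -> list[list[str]]:
    """Group systems that can simulate one another in both directions."""
    classes: list[list[str]] = []
    for system in systems:
        for cls in classes:
            representative = cls[0]
            if simulates(system, representative) and simulates(representative, system):
                cls.append(system)
                break
        else:
            # No existing class matched, so this system starts a new one.
            classes.append([system])
    return classes


classes = equivalence_classes(SYSTEMS)
print(classes)
```

Under these assumptions the Cave and the tracked HMD fall into one class, while the untracked HMD and the desktop system each remain alone, since nothing in the relation lets them simulate the more immersive systems.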
In this view therefore, we describe immersion not by displays plus tracking, but as a property of the valid actions that are possible within the system. Generally, system A is at a higher level of immersion than system B if the valid actions of B form a proper subset of those of A. We refer to an IVR that has a set of valid actions that are approximations of reality as a first-order system, with corresponding SCs in at least one sensory modality. A second-order system is one that has valid actions as a proper subset of a first-order system, and so on for lower orders. Of course, unfortunately, true first-order systems do not exist today.
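The ordering by proper subset can be written down directly. The sketch below is illustrative only: the function name `more_immersive` and the three action sets are hypothetical, chosen to show one comparable pair and one pair that the definition leaves unordered.

```python
def more_immersive(a: frozenset, b: frozenset) -> bool:
    """True if the valid actions of B form a proper subset of those of A."""
    return b < a  # '<' on frozensets tests for a proper subset


# Hypothetical valid-action sets for three systems.
tracked_hmd = frozenset({"move_eyes", "turn_head", "bend_forward"})
untracked_hmd = frozenset({"move_eyes"})
glove_only = frozenset({"move_eyes", "grasp_object"})

print(more_immersive(tracked_hmd, untracked_hmd))  # tracked HMD above untracked HMD
print(more_immersive(untracked_hmd, tracked_hmd))  # the reverse does not hold
print(more_immersive(tracked_hmd, glove_only))     # neither set contains the other
print(more_immersive(glove_only, tracked_hmd))
```

Because proper subset is a partial order, some pairs of systems are simply not comparable under this definition: in the sketch, neither the tracked HMD nor the glove-only system is at a higher level of immersion than the other.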
In this framework, displays and interactive capabilities are inseparable. Consider for example the issue of display resolution. At first sight this may appear to have nothing to do with interaction or SCs, but in fact if the participant wants to examine an object very closely, then the extent to which this is possible will be limited by the resolution of the display. Relatively low visual display resolution will mean that the normal action of bringing an object closer in vision by moving the body, head and eyes closer to it will fail earlier than it would in physical reality, and at different times in different systems.
It should be noted that the level of immersion is completely determined by the physical properties of the system—what happens when a participant carries out a particular act in terms of changes to the displays is a question of physics—of how the overall physical virtual reality and computer systems function. In the next section, we discuss the consequences for participants of different levels of immersion embodied in different physical systems.
A system that supports SCs that approximate those of physical reality can give rise to the illusion that you are located inside the rendered virtual environment. In the literature this illusion of location has been referred to as telepresence or presence—the ‘sense of being there’ in the environment depicted by the virtual reality system (Held & Durlach 1992; Sheridan 1992; Barfield & Weghorst 1993; Sheridan 1996; Slater & Wilbur 1997; Draper et al. 1998; Bystrom et al. 1999; Sanchez-Vives & Slater 2005). The origin of the concept of presence as a ‘feeling of being there’ is rooted in teleoperator systems, and is the feeling of being at the place of a remote physical robot that the user is operating (telepresence) (Minsky 1980). In the early 1990s this idea was transplanted to virtual reality, where instead of being at the remote physical environment, the participant was in a virtual environment with a sense of being at the place depicted by the virtual displays (Held & Durlach 1992; Sheridan 1992).
We reserve the term ‘place illusion’ (PI) for the type of presence that refers to the sense of ‘being there’. This terminology is used in order to avoid confusion, to make it clear that we refer specifically and only to the strong illusion of being in a place and not to other multiple meanings that have since been attributed to the word ‘presence’. It is the strong illusion of being in a place in spite of the sure knowledge that you are not there. Since it is a qualia there is no way to directly measure it. However, indirect assessments based on questionnaires, physiological and behavioural responses have been used, all of which in some way compare responses with those that would have been expected in real experiences.
Note that a system with valid sensorimotor actions that have the same range as in physical reality is not sufficient to support SCs that approximate those of physical reality. Imagine a participant viewing a virtual environment through a head-tracked HMD, but with a very narrow field of view (say 10° horizontally instead of the more than 180° of natural vision). Since the HMD has head tracking, the participant can look around the scene and visually perceive using the full range of head movements possible with natural vision. But since the visual capture is significantly less than that in normal vision, the participant would have to learn how to perceive in a different way, with greater head movements and different patterns of head movements to obtain the same information as in normal vision. This is not to say that eventually, with sufficient movement and after a sufficiently long learning period, these SCs would not become ‘normal’.
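A rough calculation gives a feel for how much more head movement the narrow field of view demands. This is a simplified sketch under stated assumptions: it considers only the horizontal extent, ignores eye movements and any overlap between views, and takes 180° as the span to be covered; the function `views_to_cover` is a hypothetical helper.

```python
import math


def views_to_cover(span_deg: float, fov_deg: float) -> int:
    """Number of non-overlapping head orientations needed to see a horizontal span."""
    if fov_deg <= 0:
        raise ValueError("field of view must be positive")
    return max(1, math.ceil(span_deg / fov_deg))


narrow = views_to_cover(180, 10)    # the 10 degree HMD from the example
natural = views_to_cover(180, 180)  # a field of view as wide as the span
print(narrow, natural)
```

Under these assumptions the narrow display needs 18 separate head orientations to take in what a 180° field of view captures in one, which is one way to see why the participant must learn new patterns of head movement.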
What is left of the distinction between immersion and PI? PI occurs as a function of the range of normal SCs that are possible. But we have also defined immersion in terms of the range of SCs that are possible, and an immersion hierarchy formed by the extent to which one system can be used to simulate another. Apparently PI like immersion has become essentially a property of the physics of the situation. By definition if a person perceives the virtual world making use of motor actions to perceive in the same way as perceiving the real world, but on the other hand knows that this is a virtual reality, then this must give rise to PI (how could it not?). But this would also be the most immersive system. So, are PI and immersion now the same from this point of view?
Suppose there are two participants who each in turn enter into an immersive system (for example, the Cave system). Person A experiences a low level of PI and B a high level. Clearly, if immersion and PI were identical then this could not occur. But in fact it could occur (irrespective of individual differences between A and B). Suppose person B stands in more or less the same position and simply looks around, whereas person A moves around, looks closely at objects, touches them and so on. Person A will quickly reach the bounds of resolution of the system, and see pixels. Moreover, A will expect to touch objects but will feel nothing, except when bumping into the physical Cave walls, which is a break in PI since this is a perception from outside the virtual environment (Garau et al. 2008). Person A is probing the bounds of perception to a much greater extent than B, and therefore PI will have the opportunity to break more often.
Immersion provides the boundaries within which PI can occur. Only in physical reality is it not normally possible to break those boundaries (except in illness or brain damage that disrupts the motor and perceptual process). In any kind of virtual system there will always be limits beyond which SCs fail to be applicable. PI occurs to the extent to which participants probe the boundaries of the system—the more they probe, the greater the chance for PI-breaks. This is further discussed in §7.
Can PI occur in computer games as used on desktop systems? To what extent can you have a feeling of ‘being there’ with respect to a desktop virtual reality system? If we consider PI as based on the extent to which normal SCs apply to perception, then the answer is ‘you cannot’.
Recall that in principle it is possible to simulate the playing of a computer game itself inside an immersive system. In an immersive system you can pull up a (virtual) chair, sit down, switch on the computer, etc. The limitation is today's off-the-shelf technology, but with sufficient resources such an application could be built.
Consider one of the claims of the SC approach to perception: ‘The basic claim of the enactive approach is that the perceiver's ability to perceive is constituted (in part) by sensorimotor knowledge (i.e. by practical grasp of the way sensory stimulation varies as the perceiver moves).’ (Noë 2004, p. 12). This also happens in computer games, but the type of sensorimotor knowledge is different. To look to the left, users make use of a joystick or keyboard presses rather than turning their heads—these would be valid sensorimotor actions. A whole new array of knowledge is established, which establishes a particular set of valid actions.
Now return to the question as to whether it is possible, or indeed whether it makes sense, to talk of PI within a desktop system such as a computer game or online systems such as Second Life. Consider the following thought experiment (which as stated above is realizable today with sufficient resources). A participant enters an immersive system such as a tracked Cave or HMD system, and within that system approaches a (virtual) computer and starts to play a computer game or Second Life. Now how can we speak about PI in such a set-up? They can sit by the (virtual) computer, pay attention to its display and carry out valid actions with respect to the desktop system. But the ‘host’ environment, the one in which this is taking place, is also a virtual environment here. So where is PI in all this?
Just as immersion is bound to a particular set of valid actions that support perception and effectual action within a particular virtual reality, so it is reasonable to consider that the same is the case with respect to PI. PI is bound to the particular set of SCs available to allow perception within that environment. In other words we can only ever talk about conditional PI with respect to a particular type of system, or more specifically within a particular equivalence class of such systems (Slater et al. 1994). Moreover, the types of PI that are possible are qualitatively different for each equivalence class, and the responsive actions that each class can support will also be qualitatively different. When talking of a system that has SCs roughly equivalent to physical reality, a participant can experience the qualia ‘just like being there’, meaning that the participant carries out the same physical actions in order to achieve approximately the same changes in perception as in physical reality. It is an illusion of being there made possible wholly by the physical set-up of the system in conjunction with the extent to which the participant probes the system. Now compare this with a desktop computer game. People can report a feeling of ‘being there’ to the extent that they engage in additional mental recreation that transforms their actions for perception into the feeling of being in a space in which they are clearly not located according to the rules of real-world SCs.
One consequence that follows from this is that when a participant in a virtual reality is asked to report, say in a questionnaire, on their feeling of ‘being there’, the answers that they give are, strictly speaking, not comparable across systems with different levels of immersion, since they are not talking about the same qualia. In the case of a highly immersive system, the qualia of ‘being there’ is a direct illusion of the same type as many visual illusions—it just happens without trying or doing anything special, as soon as the participant enters the environment and especially as soon as they move. In the case of a desktop system the situation is quite different; the feeling reported as ‘being there’ if it comes at all is after much greater exposure, requires deliberate attention and is not automatic—it is not simply a function of how the perceptual system normally works, but is something that essentially needs to be learned, and may be regarded as more complex.
Evidence suggests that the use of the same questionnaire to measure presence across different systems is highly problematic. In Usoh et al. (2000) it was found that standard questionnaires could not discriminate ‘presence’ between physical reality and a low-resolution HMD-delivered virtual reality of the same scene. Hence, PI is the human response to a given level of immersion, and is bound by the set of SCs possible at that level of immersion. The illusion of ‘being there’ does not refer to the same qualia across different levels of immersion. The range of actions and responses that are possible are clearly bound to the SC set that defines a given level of immersion.
It may, however, make sense to compare experience between systems that are in the same immersion equivalence class. Imagine comparing different Cave systems or Caves with HMDs, large power walls and so on. Here we would be interested in how different features of these systems trade off against each other, for example higher resolution against larger space.
While PI is about how the world is perceived, the Psi is about what is perceived. Psi is the illusion that what is apparently happening is really happening (even though you know for sure that it is not). Based on evidence over many experiments, it appears that a key component of Psi is that events in the virtual environment over which you have no direct control refer directly to you.
Consider in a virtual reality that there is the appearance of a woman standing in front of you. Perceptually there is something there, in the same space as you; for example, as you shift your head from side to side, her image in your visual field moves as it would in reality, and you see things behind her that she had been obscuring. This is PI. Now she smiles at you and asks you a question, and you automatically find yourself smiling back and responding to her question, even though you know that no one is there. This scenario has been used in a study of how shy males respond to a forward (virtual) woman—a preliminary report is available in Pan & Slater (2007). She, the virtual woman, has looked you in the eye, and spoken to you. An event (that you did not cause) has related to you. As another example, suppose you move towards this virtual woman, and she steps backward. Again, an event over which you have no direct control (her step backwards) has related directly to you—this time to your action. Since you are as real as can be, and this external sensed world appears to be addressing you, the reality of that external world is itself enhanced. See also Heeter (1992), who talks about this in the context of social conventions such as handshakes.
Just as PI is maintained through synchronous correlations between the act of moving and concomitant changes in the images that form perception, so I posit that an important component leading to Psi is for the virtual reality to provide correlations between external events not directly caused by the participant and his/her own sensations (both exteroceptive and interoceptive). If we consider the avatar that looks the participant in the eye (the external event), this would be likely to cause a response in the participant that could be expressed by physiological changes such as changes in heart rate, skin temperature (blushing) and so on, which are changes associated with the internal feeling provoked by an entity that ‘looks at you’. Seeing a chair, say, angled in your direction, does not have the same associated feeling, because it is not something directed at you.
It is important to realize that Psi does not require physical realism—witness people's responses in the virtual reprise of the Stanley Milgram obedience experiment, where it was shown that people exhibit anxiety responses when causing pain to a relatively low fidelity virtual character in terms of both visual appearance and behaviour (Slater et al. 2006a). The correlational principle mentioned above produces a cause–effect relationship. In this case the experimental participant pressed a button on a supposed electric shock machine and the virtual character playing the ‘learner’ of Milgram's paradigm responded with expressions of hurt and anger. The experimental participants responded with increasing anxiety with each wrong answer given by the learner, administration of the shock and painful response of the learner. In the control condition, when this virtual learner response could not be seen or heard, the anxiety response did not occur. The virtual learner, although humanoid in appearance, could never be mistaken for a real person—with respect to neither appearance nor movements.
Similarly, people respond according to type when speaking to an audience of virtual characters—people with a fear of public speaking react with anxiety whereas confident speakers do not (Slater et al. 2006b). Paranoia, the complex of having persecutory thoughts, provides another example. Here, participants in an immersive virtual environment with virtual characters tend to report the same kinds of persecutory feelings that they would have in similar situations in reality (Freeman et al. 2003; Freeman et al. 2005a,b; Valmaggia et al. 2007). In these examples, virtual characters exhibiting neutral expressions occasionally look at the participant, in the context of a public place such as a train journey. It has been found in several studies that people with a tendency towards paranoia respond to these virtual characters with persecutory thoughts, and this has been demonstrated in a large sample (200) of the general population (Freeman et al. 2008).
The correlational principle as one of the major factors leading towards Psi appears to be critical. Here is another example. There have been various studies of the visual cliff scenario (Gibson & Walk 1960) implemented within a virtual environment that is usually called the ‘pit room’ (Slater et al. 1995). The participant is in an unusual room where the floor is just a narrow ledge around an open hole to another room 6 m below. In one group of experiments the task of the participants was to get to the other side of the room. The vast majority of them edge their way carefully around the ledge at the side rather than simply gliding across the non-existent (virtual) pit. The utility of this environment is that the expected responses are clear: people should show signs of anxiety.
Usoh et al. (1999) found that subjective reports of PI were enhanced by participants using their body to actually walk or simulate walking by walking in place, compared to simply using a pointing device and pressing a button to move forward in the virtual environment. Here appropriate SCs engendered by using the body (i.e. walk) to perceive led to the expected increase in PI. Meehan et al. (2002) found that when a physical plank was registered with the virtual plank, so that participants would feel the plank itself as they placed their foot on the edge, their anxiety response (measured here by a change in heart rate) was significantly higher than when they only saw the plank but felt nothing. Again, SCs (touching) correlated with vision could be said to have enhanced PI. However, in another between-groups experiment (Zimmons & Panter 2003) where different groups of participants were exposed to the pit room rendered with varying degrees of visual realism (from wire frame through to the realistic lighting distribution known as radiosity), the different rendering qualities made no difference at all to the responses.
To add further to this we investigated the impact of the correlational aspect of Psi (Slater et al. 2009). The environment was rendered with real-time ray tracing in an HMD, so that when the participants moved they could see shadows and reflections of their virtual body move in correlation. Here we compared two pit environments that were the same, except that one had moving shadows and reflections of the participant's (virtual) body that moved in real time as their body moved (ray tracing), and the other did not (ray casting). In a between-groups experiment we found that arousal and anxiety (as measured by skin conductance, heart rate and heart rate variability) were higher for the group that experienced the shadows and reflections than for the other group.
Psi is therefore an illusion akin to PI—one that occurs as an immediate feeling, produced by some fundamental evaluation by the brain of one's current circumstances—‘is this real?’ Of course, at a higher cognitive level participants know that nothing is ‘really’ happening, and they can consciously decide to modify their automatic behaviour accordingly. In the virtual reprise of the Milgram obedience experiment, more than half of the subjects said afterwards that it had occurred to them to stop and withdraw from the experiment, but they did not do so because they kept reminding themselves that it was not real.
In physical reality and first-order virtual reality, there is something very simple that you can do to physically establish your presence. Look down, and you will see your body, or see parts of it continuously in peripheral vision. For example, when you wear an HMD, a virtual body can be portrayed collocated with your real physical body, so that when you make the same movements that you would make in physical reality to look at your body, you instead see your virtual body. Using normal actions you know how to change your sensory stimulation as a function of your actions, and one action that is not special compared to other actions is to look down and see yourself.
The body is a focal point where PI and Psi are fused. As we have argued, the action involved in looking at your own body provides very powerful evidence for PI (your body is in the place you perceive yourself to be). However, this virtual body is not yours; it is a representation of you. In principle, you have no control over what it does. Now suppose you move your limbs and you see the limbs of this virtual body move in synchrony. This is a very powerful event in the external world that clearly relates to you: a correlation between proprioception and visual exteroception. Further, it is likely that there would be some degree of ownership over this virtual body, so that it comes to seem ‘really’ to be your body (even though you know it cannot be).
Recent research activity in cognitive neuroscience concerned with body ownership is based on the paradigm called the ‘rubber hand illusion’ (Botvinick & Cohen 1998; Pavani et al. 2000; Armel & Ramachandran 2003; Tsakiris & Haggard 2005). In this case, synchronous tactile stimulation on the person's hidden real hand and a visible rubber hand that is located in a plausible position in front of them results in the illusion that the rubber hand is their hand. This illusion is demonstrated behaviourally through proprioceptive drift, that is, participants blindly point to their felt hand position as the rubber hand position rather than to their true hand. If the tactile stimulation is not synchronous, then the illusion does not occur. This has been repeated in a virtual environment, where ownership of a virtual arm projecting in stereo out of the person's body was evoked when synchronous visual stimulation to the virtual hand and tactile stimulation to the real hand were provided (Slater et al. 2008).
This same type of technique has been extended to give illusions of whole body displacement (Ehrsson 2007; Lenggenhager et al. 2007; Petkova & Ehrsson 2008)—where the persons (to some extent) feel themselves located outside their bodies. These are illustrations of the powerful effects that the types of correlation that we consider as contributing to Psi can have. Virtual reality can transform not only your sense of place, and of reality, but also the apparent properties of your own body.
The main hypothesis of this paper is that a participant in an IVR will respond to a virtual reality as if it were real as a function of PI and Psi. If you are there (PI) and what appears to be happening is really happening (Psi), then this is happening to you! Hence you are likely to respond as if it were real. We call this ‘response-as-if-real’ (RAIR). This framework (immersion, PI, Psi, the virtual body) leaves open many important questions. First, what is the relationship of ‘reality’ to a first-order system? We said that a first-order system has SCs that approximate those of reality, but also that if one system can be used to simulate a second one, then the second is at a lower level of immersion than the first. So, is reality at a lower order of immersion than a first-order IVR? Clearly not, since there must always be actions that are possible within reality that are not valid actions of the IVR. If this were not the case, then the IVR would be indistinguishable from reality and the issue would disappear.
Second, is PI with respect to any particular (equivalence) class of systems measurable? We suggest that PI should be treated as binary: it is a qualia associated with an illusion. Either you get the illusion or you do not; you cannot partially get an illusion. But its measurement can still be continuous, providing a kind of fuzzy estimate of PI. Let us consider an example. Suppose a particular IVR supports SCs associated with horizontal head rotation, but no other head rotation. So provided that you look around while maintaining your eyes approximately in the same plane, the visual images update appropriately. In other words, the SCs work only for yaw rotation but not for pitch and roll. Other things being equal, if a participant looks around a scene only using yaw head rotation, there would be uninterrupted PI. However, as soon as pitch or roll rotations are made, PI would be broken. If we take as a measure the proportion of time that the participant makes only yaw head rotations, then this would be continuous, even though the underlying phenomenon is binary. This idea of a ‘breaks in presence’ measure was introduced by Slater & Steed (2000) using a simple Markov chain stochastic model of transitions between ‘present’ and ‘non-present’ states.
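The yaw-only example can be made concrete with a small sketch. It is an illustration under stated assumptions rather than the procedure used in any of the cited studies: the `HeadSample` record, the `tolerance_deg` threshold and both function names are hypothetical, and the two-state chain is the generic textbook form, not necessarily the specific model of Slater & Steed (2000).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HeadSample:
    """One hypothetical tracker sample: rotation since the previous sample."""
    dt: float       # time covered by this sample, in seconds
    d_yaw: float    # change in yaw, in degrees
    d_pitch: float  # change in pitch, in degrees
    d_roll: float   # change in roll, in degrees


def pi_time_fraction(samples: Sequence[HeadSample],
                     tolerance_deg: float = 1.0) -> float:
    """Proportion of time in which only yaw rotation was made.

    A sample counts as 'PI unbroken' when pitch and roll changes both stay
    within tolerance_deg (an assumed threshold, not taken from the paper).
    Each sample is binary, yet the resulting proportion is continuous.
    """
    total = sum(s.dt for s in samples)
    if total <= 0:
        raise ValueError("samples must cover a positive duration")
    unbroken = sum(
        s.dt for s in samples
        if abs(s.d_pitch) <= tolerance_deg and abs(s.d_roll) <= tolerance_deg
    )
    return unbroken / total


def stationary_present_probability(p_break: float, p_recover: float) -> float:
    """Long-run probability of the 'present' state in a two-state Markov chain.

    p_break:   per-step probability of moving from present to non-present
    p_recover: per-step probability of moving from non-present to present
    """
    if p_break + p_recover <= 0:
        raise ValueError("at least one transition probability must be positive")
    return p_recover / (p_break + p_recover)
```

For instance, a participant who makes yaw-only movements for two seconds and then pitches the head for two seconds would obtain a proportion of one half, while each individual sample remains a yes-or-no classification.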
Relatedly, I put forward the idea here that PI can be different in different modalities. Hence a person could be in a virtual environment moving through a virtual cityscape with unbroken PI in the visual sense, while simultaneously having a conversation with someone who is outside the virtual environment. If we observed the behaviours of the participant, they would probably exhibit strong signs of PI, for example moving through the environment making sure to avoid obstacles. Nevertheless, clearly in the auditory domain there is no PI. Similarly, one could have auditory PI (with eyes closed) or haptic PI. It is my contention that provided there are no inconsistencies between these modalities, there can be PI in one domain without there being PI in another. This point is important because I regard PI not as a cognitive but as a perceptual phenomenon. The illusion, given the right physical set-up and the appropriate SCs in a particular modality, is automatic, and it can coexist with different (but not contradictory) sensations in another modality.
Third, is Psi measurable? Since this is a new concept for virtual environments, this has not been previously attempted. However, it is likely that ‘breaks in Psi’ could be usefully employed: moments when the participant carries out actions that step outside the reality of what is occurring in the virtual reality. For example, a virtual character talks to the participant who ignores this, or a character breaks social norms such as those of proxemics, and moves too close to the participant, who does not respond in any measurable way. This would be considered a ‘break in Psi’. Moreover, experience from many past studies suggests that breaks in PI and breaks in Psi have different characteristics. When PI breaks it can quickly recover: in our example of yaw-only head rotations, if the participant steps outside this limitation then PI will break, but recover again as soon as yaw-only head rotations are resumed. This only emphasizes that PI is very much a perceptual illusion. On the other hand, when Psi breaks, it is unlikely to recover. Once you have ceased to accept the ‘reality’ of that virtual character, for example, Psi does not usually re-form. For examples, see the report by Garau et al. (2008).
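The different recovery characteristics of the two illusions can be sketched as follows. This is a deliberately simplified illustration, not a validated measure: it assumes a per-time-step record of violations, treats PI as broken only while a violation is in progress, and treats Psi as lost for good after the first violation, whereas the text claims only that recovery is unlikely. Both function names are hypothetical.

```python
from typing import List, Sequence


def pi_timeline(violations: Sequence[bool]) -> List[bool]:
    """PI state per time step: broken during a violation, restored afterwards."""
    return [not violated for violated in violations]


def psi_timeline(violations: Sequence[bool]) -> List[bool]:
    """Psi state per time step: intact until the first violation, then lost.

    Simplifying assumption: Psi never recovers once broken.
    """
    intact = True
    states = []
    for violated in violations:
        if violated:
            intact = False
        states.append(intact)
    return states
```

Given the same record with a single violation at the second step, `pi_timeline` reports PI as present again from the third step onward, whereas `psi_timeline` reports Psi as absent for the rest of the session.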
Finally, there is likely to be another layer of Psi that has not been considered in this paper. We have considered Psi as an automatic and rapid response of the human participant to the important issue—is this really happening? Now, of course, the participant knows that it is not happening, and this cognitive knowledge can certainly dampen down or entirely change responses away from the instinctive first reactions (to smile back at the character smiling at you, but then thinking—why am I doing this, there is no one there?—and consequently stop smiling and ignore the character). We here put forward the idea that the maintenance of these types of reactions may depend on the credibility of the events that are taking place—how much do they conform to expectations of what would normally happen in the circumstances portrayed? This layer of conformity with expectation, with prior knowledge and beliefs, must also play a very important role in maintaining the illusion that the events occurring are true. This is particularly important when the IVR is used to construct an environment that is intended to study the behaviour of people in specific circumstances. For example, we have recently started a project on the use of virtual reality for understanding people's responses to violence. Here RAIR is of fundamental importance—otherwise the enterprise would not be useful. We not only have to take into account Psi as achieved by events that are in relation to the participants (e.g. a person involved in a violent confrontation looks directly at the participant and speaks to him or her), but also of critical importance is the credibility of the scene itself. If unfolding violence, for example, in the context of a fight between rival football supporters, does not occur in a way that appears realistic to the participant, then they are not likely to exhibit RAIR. 
Therefore, in designing any virtual environment it is particularly important to model the events and interactions so that they depict a likely scenario.
In this sense Psi is far more difficult to achieve than PI. The latter relies on a set-up that supports essential SCs, a matter that depends on the physical system of tracking and displays, and the implementation of the algorithms needed to use these. Moreover, it is known that people can within limits be unaware of the exact consequences of their motor actions, so that there is some leeway in the accurate representation of their movements (Fourneret & Jeannerod 1998). However, Psi requires a credible scenario and plausible interactions between the participant and objects and virtual characters in the environment—with very little room for error. A conclusion is that the area of Psi is now a more fruitful and challenging research area than PI.
We have previously defined presence in virtual reality as the extent to which people respond realistically within a virtual environment, where response is taken at every level from low-level physiological to high-level emotional and behavioural responses (Sanchez-Vives & Slater 2005). Although we stand by this approach, we find the terminology problematic. The word ‘presence’ has come to have multiple meanings, and it is difficult to have any useful scientific discussion about it given this confusion. We suggest, for the sake of clarity, an alternative terminology. We have called PI the ‘being there’ qualia that was referred to as ‘presence’ in the original literature: it is the feeling of being in the place depicted by the virtual environment (even though you know that you are not there). We call Psi the illusion that what is happening is real (even though you know that it is not real).
Different levels of immersion are defined by the SCs for perception that they support. A first-order immersive system has SCs that are similar to those of everyday reality. A second-order system can be simulated within a first-order system and so on. In a first-order system PI is made possible as a consequence of the physics of the set-up. PI may temporarily break, and therefore the integrated overall level of PI may be higher or lower as a function of the activity of the participant (and of course individual differences between people). In a lower order system, PI may still be reported, but this is a consequence of additional creative mental processing. It does not refer to the same qualia as for the first-order systems. Therefore, there is no numerical scale that can compare ‘presence’ across different immersive systems, since they are referring to qualitatively different phenomena.
Psi is the second and orthogonal concept to consider. This is concerned with the ‘reality’ of the situation depicted. It is maintained through correlations between actions and reactions, and correlations between events that can be perceived within the VE that are directed at the participant. It also includes the notion of the credibility of events in comparison with what would be expected in reality in similar circumstances. Psi relies on the process of inference from evidence (perceptions) to issues concerning the ‘reality’ of the situation. Psi and PI may result in a participant within a virtual reality realistically responding to events and situations. The conceptual framework presented here is a starting point for much future research.
The ideas described in this paper arose out of work that was carried out in the EU FET Integrated Projects PRESENCCIA (IST-2006-27731) and IMMERSENCE (IST-2006-027141), with partial help from the Spanish Ministry of Science and Innovation. Thanks to Dr Maria V. Sanchez-Vives for comments on an earlier version of this paper.
One contribution of 17 to a Discussion Meeting Issue ‘Computation of emotions in man and machines’.