Human multisensory systems are known to bind inputs from the different sensory modalities into a unified percept, a process that leads to measurable behavioral benefits. This integrative process can be observed through multisensory illusions, including the McGurk effect and the sound-induced flash illusion, both of which demonstrate the ability of one sensory modality to modulate perception in a second modality. Such multisensory integration is highly dependent upon the temporal relationship of the different sensory inputs, with perceptual binding occurring within a limited range of asynchronies known as the temporal binding window (TBW). Previous studies have shown that this window is highly variable across individuals, but it is unclear how variations in the TBW relate to an individual’s ability to integrate multisensory cues. Here we provide evidence linking individual differences in multisensory temporal processes to differences in individuals’ audiovisual integration of illusory stimuli. Our data provide strong evidence that the temporal processing of multiple sensory signals and the merging of multiple signals into a single, unified percept are highly related. Specifically, the width of the right side of an individual’s TBW, where the auditory stimulus follows the visual, is significantly correlated with the strength of illusory percepts, as indexed by both an increased tendency to bind synchronous sensory signals and an improved ability to correctly dissociate asynchronous signals. These findings are discussed in terms of their possible neurobiological basis, their relevance to the development of sensory integration, and their possible importance for clinical conditions in which there is growing evidence that multisensory integration is compromised.
multisensory integration; cross-modal; McGurk; sound-induced flash illusion; perception; temporal processing
Several seconds of adaptation to a flickered stimulus causes a subsequent brief static stimulus to appear longer in duration. Non-sensory factors such as increased arousal and attention have been thought to mediate this flicker-based temporal-dilation aftereffect. Here we provide evidence that adaptation of low-level cortical visual neurons contributes to this aftereffect. The aftereffect was significantly reduced by a 45° change in Gabor orientation between adaptation and test. Because orientation-tuning bandwidths are smaller in lower-level cortical visual areas and are approximately 45° in human V1, the result suggests that flicker adaptation of orientation-tuned V1 neurons contributes to the temporal-dilation aftereffect. The aftereffect was abolished when the adaptor and test stimuli were presented to different eyes. Because eye preferences are strong in V1 but diminish in higher-level visual areas, the eye specificity of the aftereffect corroborates the involvement of low-level cortical visual neurons. Our results thus suggest that flicker adaptation of low-level cortical visual neurons contributes to expanding visual duration. Furthermore, this temporal-dilation aftereffect dissociates from the previously reported temporal-constriction aftereffect on the basis of the differences in their orientation and flicker-frequency selectivity, suggesting that the visual system possesses at least two distinct and potentially complementary mechanisms for adaptively coding perceived duration.
Understanding other people’s feelings in social interactions depends on the ability to map onto our own bodies the sensory experiences we observe on other people’s bodies. It has been shown that the perception of tactile stimuli on the face is improved when concurrently viewing a face being touched. This Visual Remapping of Touch (VRT) is enhanced the more similar others are perceived to be to the self, and is strongest when viewing one’s own face. Here, we ask whether altering self-other boundaries can in turn change the VRT effect. We used the enfacement illusion, which relies on synchronous interpersonal multisensory stimulation (IMS), to manipulate self-other boundaries. Following synchronous, but not asynchronous, IMS, the self-related enhancement of the VRT extended to the other individual. These findings suggest that shared multisensory experiences represent one key way to overcome the boundaries between self and others, as evidenced by changes in somatosensory processing of tactile stimuli on one’s own face while concurrently viewing another person’s face being touched.
Multisensory Interaction; Visual Remapping of Touch; Interpersonal Multisensory Stimulation; Self-recognition; Enfacement illusion
Because the environment often includes multiple sounds that overlap in time, listeners must segregate a sound of interest (the auditory figure) from other co-occurring sounds (the unattended auditory ground). We conducted a series of experiments to clarify the principles governing the extraction of auditory figures. We distinguish between auditory “objects” (relatively punctate events, such as a dog’s bark) and auditory “streams” (sounds involving a pattern over time, such as a galloping rhythm). In Experiments 1 and 2, on each trial two sounds, an object (a vowel) and a stream (a series of tones), were presented with one target feature that could be perceptually grouped with either source. In each block of these experiments, listeners were required to attend to one of the two sounds and report its perceived category. Across several experimental manipulations, listeners were more likely to allocate the feature to an impoverished object if the result of the grouping was a good, identifiable object. Perception of objects was quite sensitive to feature variation (noise masking), whereas perception of streams was more robust to feature variation. In Experiment 3, the number of sound sources competing for the feature was increased to three. This produced a shift toward relying more on spatial cues than on the potential contribution of the feature to an object’s perceptual quality. The results support a distinction between auditory objects and streams and provide new information about the way that the auditory world is parsed.
Auditory Figure; Auditory Scene Analysis; Auditory Perceptual Organization
We contrasted the effects of different types of working memory (WM) load on detection. Considering the sensory-recruitment hypothesis of visual short-term memory (VSTM) within load theory (e.g., Lavie, 2010) led us to predict that VSTM load would reduce visual-representation capacity, thus reducing detection sensitivity during maintenance, whereas load on WM cognitive control processes would reduce priority-based control, thus enhancing detection sensitivity for a low-priority stimulus. During the retention interval of a WM task, participants performed a visual-search task while also being asked to detect a masked stimulus in the periphery. Loading WM cognitive control processes (with the demand to maintain a random digit order [vs. a fixed order under low load]) enhanced detection sensitivity. In contrast, loading VSTM (with the demand to maintain the colors and positions of six squares [vs. one under low load]) reduced detection sensitivity, an effect comparable to that found when manipulating perceptual load in the search task. The results confirmed our predictions and established a new functional dissociation between the roles of different types of WM load in the fundamental visual perception process of detection.
visual working memory; executive cognitive control; selective attention; perceptual load; visual detection
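Detection sensitivity in paradigms like this is conventionally indexed by the signal-detection measure d′; the abstract does not state the exact measure used, so what follows is just the standard definition, computed from the hit rate H and false-alarm rate FA:

\[
  d' = \Phi^{-1}(\mathrm{H}) - \Phi^{-1}(\mathrm{FA})
\]

where \Phi^{-1} is the inverse of the standard normal cumulative distribution function; higher d′ means the masked peripheral stimulus is more discriminable from noise.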
According to one approach to speech perception, listeners perceive speech by applying general pattern-matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different explanations for the phenomenon of compensation for coarticulation (CfC). As an example of CfC, a gesture with a front place of articulation may be pulled slightly backwards when it follows a gesture with a back place of articulation, and listeners’ category boundaries shift (compensate) accordingly. The gestural account appeals to direct attunement to coarticulation to explain CfC, whereas the auditory account explains it by spectral contrast. In previous studies, spectral contrast and the gestural consequences of coarticulation have been correlated, such that both accounts made identical predictions. We identify a liquid context in Tamil that disentangles contrast and coarticulation, such that the two accounts make different predictions. In a standard CfC task in Experiment 1, gestural coarticulation rather than spectral contrast determined the direction of CfC. Experiments 2, 3, and 4 demonstrated that tone analogues of the speech precursors failed to produce the effects observed in Experiment 1, suggesting that simple spectral contrast cannot account for the findings of Experiment 1.
Compensation for coarticulation; speech perception; direct realism; articulatory; Tamil
Three experiments investigated whether 14- and 15-month-old infants use information for both friction and slant for prospective control of locomotion down slopes. In Experiment 1, high and low friction conditions were interleaved on a range of shallow and steep slopes. In Experiment 2, friction conditions were blocked. In Experiment 3, the low friction surface was visually distinct from the surrounding high friction surface. In all three experiments, infants could walk down steeper slopes in the high friction condition than in the low. Infants detected affordances for walking down slopes in the high friction condition, but in the low friction condition, they attempted impossibly slippery slopes and fell repeatedly. In both friction conditions, when infants paused to explore slopes, they were less likely to attempt slopes beyond their ability. Exploration was elicited by visual information for slant (Experiments 1 and 2) or a visually distinct surface that marked the change in friction (Experiment 3).
Infant locomotion; perception of affordances; friction; prospective control; perceptual exploration
Interference is reduced in mostly incongruent relative to mostly congruent lists. Classic accounts of this list-wide proportion congruence effect assume that list-level control processes strategically modulate word reading. Contemporary accounts posit that reliance on the word is modulated poststimulus onset by item-specific information (e.g., proportion congruency of the word). To adjudicate between these accounts, we used novel designs featuring neutral trials. In two experiments, we showed that the list-wide proportion congruence effect is accompanied by a change in neutral trial color-naming performance. Because neutral words have no item-specific bias, this pattern can be attributed to list-level control. Additionally, we showed that list-level attenuation of word reading led to a cost to performance on a secondary prospective memory task but only when that task required processing of the irrelevant, neutral word. These findings indicate that the list-wide proportion congruence effect at least partially reflects list-level control and challenge purely item-specific accounts of this effect.
list-wide proportion congruence; item-specific proportion congruence; cognitive control; prospective memory
When observers search for a target object, they incidentally learn the identities and locations of “background” objects in the same display. This learning can facilitate search performance, eliciting faster reaction times for repeated displays (Hout & Goldinger, 2010). Despite these findings, visual search has been successfully modeled using architectures that maintain no history of attentional deployments; they are amnesic (e.g., Guided Search Theory; Wolfe, 2007). In the current study, we asked two questions: (1) Under what conditions does such incidental learning occur? (2) What does viewing behavior reveal about the efficiency of attentional deployments over time? In two experiments, we tracked eye movements during repeated visual search, and we tested incidental memory for repeated non-target objects. Across conditions, we manipulated the consistency of search sets and spatial layouts to assess their respective contributions to learning. Using viewing behavior, we contrasted three potential accounts for faster searching with experience. The results indicate that learning does not result in faster object identification or greater search efficiency. Instead, familiar search arrays appear to allow faster resolution of search decisions, whether targets are present or absent.
Empirical work and models of visual word recognition have traditionally focused on group-level performance. Despite the emphasis on the prototypical reader, there is clear evidence that variation in reading skill modulates word recognition performance. In the present study, we examined differences between individuals who contributed to the English Lexicon Project (http://elexicon.wustl.edu), an online behavioral database containing nearly four million word recognition (speeded pronunciation and lexical decision) trials from over 1,200 participants. We observed considerable within- and between-session reliability across distinct sets of items, in terms of overall mean response time (RT), RT distributional characteristics, diffusion model parameters (Ratcliff, Gomez, & McKoon, 2004), and sensitivity to underlying lexical dimensions. This indicates reliably detectable individual differences in word recognition performance. In addition, higher vocabulary knowledge was associated with faster, more accurate word recognition performance, attenuated sensitivity to stimulus characteristics, and more efficient accumulation of information. Finally, in contrast to suggestions in the literature, we did not find evidence that individuals were trading off in their utilization of lexical and nonlexical information.
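For readers unfamiliar with the referenced model, the diffusion model treats two-choice word recognition decisions as noisy evidence accumulation. This is the standard formulation; the specific fitting choices of the study are not given in the abstract:

\[
  dx = v\,dt + s\,dW, \qquad x(0) = z,
\]

where evidence x drifts at rate v (accumulation efficiency) until it reaches 0 or the boundary a, s scales the Gaussian noise dW, and a nondecision parameter T_{er} absorbs encoding and response execution time. On this reading, “more efficient accumulation of information” for high-vocabulary participants corresponds to larger estimated drift rates v.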
There is growing evidence that individuation experience is necessary for the development of expert object discrimination that transfers to new exemplars. Individuation training in human studies has primarily used label-association tasks in which labels are learned at both the individual and the more abstract (basic) level, and the expertise criterion requires that individual-level judgments become as fast as basic-level judgments. However, there are training situations in which the use of labels is not practical (e.g., with animals or some clinical populations). Moreover, labeling itself can facilitate object discrimination, so it is unclear what role labels play in the acquisition of expertise in such training paradigms. Here, participants completed an online game that did not require labels, in which they interacted with novel objects (Greebles) or control objects (Yufos). Games required either individuation or categorization. We then assessed the impact of this exposure on an abridged Greeble training paradigm. As expected, participants who played Yufo games or Greeble categorization games showed a significant basic-level advantage for Greebles in the abridged training paradigm, typical of novices. However, participants who played the Greeble identity game showed a reduced basic-level advantage, suggesting that individuation without labels may be sufficient to acquire perceptual expertise.
The extent to which target words were predictable from prior context was varied: half of the target words were predictable and the other half were unpredictable. In addition, the length of the target word varied: the target words were short (4–6 letters), medium (7–9 letters), or long (10–12 letters). Length and predictability both yielded strong effects on the probability of skipping the target words and on the amount of time readers fixated the target words (when they were not skipped). However, there was no interaction in any of the measures examined for either skipping or fixation time. The results demonstrate that word predictability (due to contextual constraint) and word length have strong and independent influences on word skipping and fixation durations. Furthermore, since the long words extended beyond the word identification span, the data indicate that skipping can occur on the basis of partial information about word identity.
According to P. K. Kuhl (1991), a perceptual magnet effect occurs when discrimination accuracy is lower among better instances of a phonetic category than among poorer instances. Three experiments examined the perceptual magnet effect for the vowel /i/. In Experiment 1, participants rated some examples of /i/ as better instances of the category than others. In Experiment 2, no perceptual magnet effect was observed with materials based on Kuhl’s tokens of /i/ or with items normed for each participant. In Experiment 3, participants labeled the vowels developed from Kuhl’s test set. Many of the vowels in the nonprototype /i/ condition were not categorized as /i/. This finding suggests that the comparisons obtained in Kuhl’s original study spanned different phonetic categories.
For many years there has been a consensus that early linguistic experience exerts a profound and often permanent effect on the perceptual abilities underlying the identification and discrimination of stop consonants. It has also been concluded that selective modification of the perception of stop consonants cannot be accomplished easily and quickly in the laboratory with simple discrimination training techniques. In the present article we report the results of three experiments that examined the perception of a three-way voicing contrast by naive monolingual speakers of English. Laboratory training procedures were implemented with a small computer in a real-time environment to examine the perception of voiced, voiceless unaspirated, and voiceless aspirated stops differing in voice onset time. Three perceptual categories were present for most subjects after only a few minutes of exposure to the novel contrast. Subsequent perceptual tests revealed reliable and consistent labeling and categorical-like discrimination functions for all three voicing categories, even though one of the contrasts is not phonologically distinctive in English. The present results demonstrate that the perceptual mechanisms used by adults in categorizing stop consonants can be modified easily with simple laboratory techniques in a short period of time.
Perception is influenced by the perceiver’s ability to perform intended actions. For example, when people intend to reach with a tool to targets that are just beyond arm’s reach, the targets look closer than when they intend to reach without the tool (Witt, Proffitt, & Epstein, 2005). This is one of several examples demonstrating that behavioral potential affects perception. However, the action-specific processes that are involved in relating the person’s abilities to perception have yet to be explored. Four experiments are presented that implicate motor simulation as a mediator of these effects. When a perceiver intends to perform an action, the perceiver runs a motor simulation of that action. The perceiver’s ability to perform the action, as determined by the outcome of the simulation, influences perceived distance.
distance perception; motor simulation; affordances; perception-action coupling; intention
The aim of this study was to investigate the perception of possibilities for action (i.e., affordances) that depend on one’s movement capabilities, and more specifically, the passability of a shrinking gap between converging obstacles. We introduce a new optical invariant that specifies in intrinsic units the minimum locomotor speed needed to safely pass through a shrinking gap. Detecting this information during self-motion requires recovering a component of the obstacles’ local optical expansion due to obstacle motion, independent of self-motion. In principle, recovering the obstacle motion component could involve either visual or non-visual self-motion information. We investigated the visual and non-visual contributions in two experiments in which subjects walked through a virtual environment and made judgments about whether it was possible to pass through a shrinking gap. On a small percentage of trials, visual and non-visual self-motion information were independently manipulated by varying the speed with which subjects moved through the virtual environment. Comparisons of judgments on such catch trials with judgments on normal trials revealed both visual and non-visual contributions to the detection of information about minimum walking speed.
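The kinematic core of this affordance can be sketched in world units. This is elementary kinematics, not the authors’ optical invariant (which is expressed in intrinsic optical units and does not require knowing distances or rates in world units): if a gap of width w shrinks at a constant rate |ẇ| and becomes impassable when w reaches the walker’s body width b, then a walker at distance d from the gap must move at least at

\[
  v_{\min} = \frac{d}{t_c}, \qquad t_c = \frac{w - b}{|\dot{w}|},
\]

where t_c is the time remaining until the gap closes below passable width. The contribution of the study is showing that an equivalent quantity is available optically, provided the component of each obstacle’s optical expansion due to obstacle motion can be separated from the component due to self-motion.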
Listeners rapidly adapt to many forms of degraded speech. What level of information drives this adaptation, however, remains unresolved. The current study exposed listeners to sinewave-vocoded speech in one of three languages (German, Mandarin, or English), thereby manipulating the type of information shared between the training language and the testing language (English), in either an audio-visual (AV) modality or an audio plus still frames (A+Stills) modality. Three control groups were included to assess procedural learning effects. After training, listeners’ perception of novel sinewave-vocoded English sentences was tested. Listeners exposed to German AV materials performed equivalently to listeners exposed to English AV or A+Stills materials, and significantly better than two control groups. The Mandarin groups and the German A+Stills group showed an intermediate level of performance. These results suggest that full lexical access is not strictly necessary for adaptation to degraded speech, and that AV training in a language phonetically similar to the testing language can facilitate adaptation.
perceptual adaptation; vocoded speech; cross-language; degraded speech; speech perception
In 5 experiments, the authors investigated how listeners learn to recognize unfamiliar talkers and how experience with specific utterances generalizes to novel instances. Listeners were trained over several days to identify 10 talkers from natural, sinewave, or reversed speech sentences. The sinewave signals preserved phonetic and some suprasegmental properties while eliminating natural vocal quality. In contrast, the reversed speech signals preserved vocal quality while distorting temporally based phonetic properties. The training results indicate that listeners learned to identify talkers even from acoustic signals lacking natural vocal quality. Generalization performance varied across the different signals and depended on the salience of phonetic information. The results suggest similarities in the phonetic attributes underlying talker recognition and phonetic perception.
In a cross-modal matching task, participants were asked to match visual and auditory displays of speech based on the identity of the speaker. The present investigation used this task with acoustically transformed speech to examine the properties of sound that can convey cross-modal information. Word recognition performance was also measured under the same transformations. The authors found that cross-modal matching was only possible under transformations that preserved the relative spectral and temporal patterns of formant frequencies. In addition, cross-modal matching was only possible under the same conditions that yielded robust word recognition performance. The results are consistent with the hypothesis that acoustic and optical displays of speech simultaneously carry articulatory information about both the underlying linguistic message and indexical properties of the talker.
The brain exhibits remarkable facility in exerting attentional control in most circumstances, but it also suffers apparent limitations in others. Our goal is to construct a rational account for why attentional control appears sub-optimal under conditions of conflict, and what this implies about the underlying computational principles. The formal framework we employ is based on Bayesian probability theory, which provides a convenient language for delineating the rationale and dynamics of attentional selection. We illustrate these issues using the Eriksen flanker task, a classical paradigm that explores the effects of competing sensory inputs on response tendencies. We show how two distinctly formulated models, based on compatibility bias and spatial uncertainty principles, can account for the behavioral data. We also suggest novel experiments that may differentiate these models. In addition, we elaborate a simplified model that approximates optimal computation and may map more directly onto the underlying neural machinery. This approximate model uses conflict monitoring, putatively mediated by the anterior cingulate cortex, as a proxy for compatibility representation. We also consider how this conflict information might be disseminated and used to control processing.
Eriksen; conflict; attention; Bayesian; decision-making
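The compatibility-bias idea above lends itself to a compact computational illustration. The following Python sketch is a deliberately minimal, static simplification of that class of model, not the authors’ implementation: the function name flanker_posterior, the prior strength beta = 0.8, the noise level sigma, and the Gaussian-sample evidence scheme are all illustrative assumptions. It shows the core mechanism: a prior belief that flankers usually match the target lets incompatible flanker evidence pull early response tendencies toward the wrong answer.

import numpy as np

rng = np.random.default_rng(1)

def flanker_posterior(x_target, x_flank, beta=0.8, sigma=1.0):
    """Posterior probability that the target is '+1', marginalizing over
    whether the flankers are compatible with the target (prior probability
    beta) or incompatible (prior probability 1 - beta)."""
    def lik(samples, mu):
        return np.exp(-0.5 * np.sum((np.asarray(samples) - mu) ** 2) / sigma**2)

    joint = {}
    for target in (+1, -1):
        joint[target] = lik(x_target, target) * (
            beta * lik(x_flank, target)           # flankers compatible
            + (1 - beta) * lik(x_flank, -target)  # flankers incompatible
        )
    return joint[+1] / (joint[+1] + joint[-1])

# Incongruent trial: target identity +1, flanker identity -1.
target_samples = rng.normal(+1, 1.0, 8)
flank_samples = rng.normal(-1, 1.0, 8)

# Posterior after each successive noisy sample: the compatibility prior
# pulls early estimates toward the flanker identity (mimicking fast
# incongruent errors); accumulating evidence recovers the true target.
for t in range(1, 9):
    p = flanker_posterior(target_samples[:t], flank_samples[:t])
    print(f"after {t} samples: P(target = +1) = {p:.3f}")

On this scheme, conflict corresponds to the posterior hovering near 0.5, which is one way to read the proposal that a monitored conflict signal can stand in for an explicit compatibility representation.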
Two experiments examined parafoveal preview for words located in the middle of sentences and at sentence boundaries. Parafoveal processing was shown to occur for words at sentence-initial, mid-sentence, and sentence-final positions. Both Experiments 1 and 2 showed reduced effects of preview on regressions out for sentence-initial words. In addition, Experiment 2 showed reduced preview effects on first-pass reading times for sentence-initial words. These effects of sentence position on preview could result from reduced parafoveal processing for sentence-initial words, or other processes specific to word reading at sentence boundaries. In addition to the effects of preview, the experiments also demonstrate variability in the effects of sentence wrap-up on different reading measures, indicating that the presence and time course of wrap-up effects may be modulated by text-specific factors. We also report simulations of Experiment 2 using version 10 of E-Z Reader (Reichle, Warren, & McConnell, 2009), designed to explore the possible mechanisms underlying parafoveal preview at sentence boundaries.
reading; eye movements; E-Z Reader; parafoveal preview; wrap-up effects
When the auditory and visual components of spoken audiovisual nonsense syllables are mismatched, perceivers produce four different types of perceptual responses: auditory correct, visual correct, fusion (the so-called McGurk effect), and combination (i.e., two consonants are reported). Here, quantitative measures were developed to account for the distribution of types of perceptual responses to 384 different stimuli from four talkers. The measures included mutual information between the presented acoustic signal and the acoustic signal recorded with the presented video, and the correlation between the presented acoustic and video stimuli. In Experiment 1, open-set perceptual responses were obtained for acoustic /bA/ or /lA/ dubbed to video /bA, dA, gA, vA, zA, lA, wA, ðA/. The talker, the video syllable, and the acoustic syllable significantly influenced the type of response. In Experiment 2, the best predictors of response category proportions were a subset of the physical stimulus measures, with the variance accounted for in the perceptual response category proportions ranging from 17% to 52%. That audiovisual stimulus relationships can account for response distributions supports the possibility that internal representations are based on modality-specific stimulus relationships.
audiovisual speech perception; congruent and incongruent; quantitative stimulus measures; factor analysis
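Since mutual information is one of the stimulus measures, it may help to recall the standard information-theoretic definition; the abstract does not specify the estimator used, so this is just the textbook quantity, with X standing for the presented acoustic signal and Y for the acoustic signal recorded with the presented video:

\[
  I(X;Y) \;=\; \sum_{x,\,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
\]

The measure is zero when the two signals are statistically independent and grows as knowing one signal reduces uncertainty about the other, making it a natural index of how well a dubbed audiovisual pairing hangs together.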
Real-world objects can be viewed at a range of distances and thus can be experienced at a range of visual angles within the visual field. Given the large amount of visual size variation possible when observing objects, we examined how internal object representations represent visual size information. In a series of experiments that required observers to access existing object knowledge, we found that real-world objects have a consistent visual size at which they are drawn, imagined, and preferentially viewed. Importantly, this visual size is proportional to the logarithm of the assumed size of the object in the world, and is best characterized not as a fixed visual angle, but by the ratio of the object and the frame of space around it. Akin to the previous literature on canonical perspective, we term this consistent visual size information the canonical visual size.
canonical perspective; canonical viewpoint; visual size; physical size; object representation
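The size relationship reported above admits a compact schematic form; the notation here is ours, not the authors’, with s_object and s_frame denoting the drawn or imagined extents of the object and its surrounding frame, and S_world the assumed real-world size:

\[
  \frac{s_{\mathrm{object}}}{s_{\mathrm{frame}}} \;\propto\; \log S_{\mathrm{world}}
\]

That is, the canonical visual size is a frame-relative ratio rather than a fixed visual angle, and it grows with the logarithm of the object’s assumed size in the world.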
When we recognize an object, do we automatically know how big it is in the world? We employed a Stroop-like paradigm in which two familiar objects were presented at different visual sizes on the screen. Observers were faster to indicate which was bigger or smaller on the screen when the real-world size of the objects was congruent with the visual size than when it was incongruent, demonstrating a familiar-size Stroop effect. Critically, the real-world size of the objects was irrelevant to the task. This Stroop effect also emerged when only one item appeared at a congruent or incongruent visual size on the display. In contrast, no Stroop effect was observed for participants who simply learned a rule to categorize novel objects as big or small. These results show that people access the familiar size of objects without the intention of doing so, demonstrating that real-world size is an automatic property of object representation.
object representation; familiar size; real-world size; visual size
Previous research on perceiving spatial layout has found that people often exhibit normative biases in their perception of the environment. For instance, slant is typically overestimated and distance is usually underestimated. Surprisingly, however, the perception of height has rarely been studied. The present experiments examined the perception of height when viewed from the top (i.e., looking down) or from the bottom (i.e., looking up). Multiple measures were adapted from previous studies of horizontal extents to assess the perception of height. Across all of the measures, a large, consistent bias was found: vertical distances were greatly overestimated, especially from the top. Secondary findings suggest that the overestimation of distance and size when looking down from a high place correlates with reports of trait- and state-level fear of heights, suggesting that height overestimation may be due, in part, to fear.
height perception; distance perception; perception and emotion; fear of heights; acrophobia