People naturally dance to music, and research has shown that rhythmic auditory stimuli facilitate production of precisely timed body movements. If motor mechanisms are closely linked to auditory temporal processing, just as auditory temporal processing facilitates movement production, producing action might reciprocally enhance auditory temporal sensitivity. We tested this novel hypothesis with a standard temporal-bisection paradigm, in which the slope of the temporal-bisection function provides a measure of temporal sensitivity. The bisection slope for auditory time perception was steeper when participants initiated each auditory stimulus sequence via a keypress than when they passively heard each sequence, demonstrating that initiating action enhances auditory temporal sensitivity. This enhancement is specific to the auditory modality, because voluntarily initiating each sequence did not enhance visual temporal sensitivity. A control experiment ruled out the possibility that tactile sensation associated with a keypress increased auditory temporal sensitivity. Taken together, these results demonstrate a unique reciprocal relationship between auditory time perception and motor mechanisms. As auditory perception facilitates precisely timed movements, generating action enhances auditory temporal sensitivity.
Action; Auditory temporal sensitivity; Visual temporal sensitivity
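As a rough illustration of how the slope of a temporal-bisection function can serve as a sensitivity measure, the sketch below fits a straight line to the log-odds of "long" responses. The function names, the logit-regression shortcut, and the data are all assumptions made for illustration (the abstract does not specify the fitting procedure), and the numbers are synthetic rather than values from the study.

```python
import math


def logit(p, eps=1e-3):
    """Log-odds of p, clipped away from 0 and 1 so the log stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fit_bisection(durations_ms, p_long):
    """Least-squares line through logit(p_long) versus duration.

    Returns (slope, bisection_point). A larger slope means finer temporal
    sensitivity; the bisection point is the duration judged "long" on half
    of the trials.
    """
    ys = [logit(p) for p in p_long]
    n = len(durations_ms)
    mean_x = sum(durations_ms) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in durations_ms)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(durations_ms, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, -intercept / slope


def logistic(x, slope, midpoint):
    """Logistic psychometric function used here only to generate synthetic data."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))


# Synthetic proportions of "long" responses (hypothetical, not study data).
durations = [300, 400, 500, 600, 700]
shallow = [logistic(d, 0.010, 500) for d in durations]
steep = [logistic(d, 0.020, 500) for d in durations]

shallow_slope, shallow_bp = fit_bisection(durations, shallow)
steep_slope, steep_bp = fit_bisection(durations, steep)
```

Under these assumptions, the two synthetic conditions share a bisection point but differ in slope, which is the pattern the abstract interprets as a difference in temporal sensitivity.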
Behavioral and neuroimaging findings indicate that distinct cognitive and neural processes underlie solving problems with sudden insight. Moreover, people with less focused attention sometimes perform better on tests of insight and creative problem solving. However, it remains unclear whether different states of attention, within individuals, influence the likelihood of solving problems with insight or with analysis. In this experiment, participants (N = 40) performed a baseline block of verbal problems, then performed one of two visual tasks, each emphasizing a distinct aspect of visual attention, followed by a second block of verbal problems to assess change in performance. After participants engaged in a center-focused flanker task requiring relatively focused visual attention, they reported solving more verbal problems with analytic processing. In contrast, after participants engaged in a rapid object identification task requiring attention to broad space and weak associations, they reported solving more verbal problems with insight. These results suggest that general attention mechanisms influence both visual attention task performance and verbal problem solving.
verbal problem solving; visual attention; insight; creativity; focused attention; broadened attention
Auditory and visual signals generated by a single source tend to be temporally correlated, such as the synchronous sounds of footsteps and the limb movements of a walker. Continuous tracking and comparison of the dynamics of auditory-visual streams is thus useful for the perceptual binding of information arising from a common source. Although language-related mechanisms have been implicated in the tracking of speech-related auditory-visual signals (e.g., speech sounds and lip movements), it is not well known what sensory mechanisms generally track ongoing auditory-visual synchrony for non-speech signals in a complex auditory-visual environment. To begin to address this question, we used music and visual displays that varied in the dynamics of multiple features (e.g., auditory loudness and pitch; visual luminance, color, size, motion, and organization) across multiple time scales. Auditory activity (monitored using auditory steady-state responses, ASSR) was selectively reduced in the left hemisphere when the music and dynamic visual displays were temporally misaligned. Importantly, ASSR was not affected when attentional engagement with the music was reduced, or when visual displays presented dynamics clearly dissimilar to the music. These results suggest that left-lateralized auditory mechanisms are sensitive to auditory-visual temporal alignment, but perhaps only when the dynamics of auditory and visual streams are similar. These mechanisms may contribute to correct auditory-visual binding in a busy sensory environment.
How rapidly can one voluntarily influence percept generation? The time course of voluntary visual-spatial attention is well studied, but the time course of intentional control over percept generation is relatively unknown. We investigated the latter using “one-shot” apparent motion. When a vertical or horizontal pair of squares is replaced by its 90° rotated version, the bottom-up signal is ambiguous. From this ambiguous signal, it is known that people can intentionally generate a percept of rotation in a desired direction (clockwise or counterclockwise). To determine the time course of this intentional control, we instructed participants to voluntarily induce rotation in a pre-cued direction (clockwise rotation when a high-pitched tone is heard and counterclockwise rotation when a low-pitched tone is heard), and then to report the direction of rotation that was actually perceived. We varied the delay between the instructional cue and the rotated frame (cue-lead time) from 0 ms to 1067 ms. Intentional control became more effective with longer cue-lead times (asymptotically effective at 533 ms). Notably, intentional control was reliable even with a zero cue-lead time; control experiments ruled out response bias and the development of an auditory-visual association as explanations. This demonstrates that people can interpret an auditory cue and intentionally generate a desired motion percept surprisingly rapidly, entirely within the subjectively instantaneous moment in which the visual system constructs a percept of apparent motion.
intentional control; visual bistability; apparent motion; attentive tracking
Several seconds of adaptation to a flickered stimulus causes a subsequent brief static stimulus to appear longer in duration. Non-sensory factors such as increased arousal and attention have been thought to mediate this flicker-based temporal-dilation aftereffect. Here we provide evidence that adaptation of low-level cortical visual neurons contributes to this aftereffect. The aftereffect was significantly reduced by a 45° change in Gabor orientation between adaptation and test. Because orientation-tuning bandwidths are smaller in lower-level cortical visual areas and are approximately 45° in human V1, the result suggests that flicker adaptation of orientation-tuned V1 neurons contributes to the temporal-dilation aftereffect. The aftereffect was abolished when the adaptor and test stimuli were presented to different eyes. Because eye preferences are strong in V1 but diminish in higher-level visual areas, the eye specificity of the aftereffect corroborates the involvement of low-level cortical visual neurons. Our results thus suggest that flicker adaptation of low-level cortical visual neurons contributes to expanding visual duration. Furthermore, this temporal-dilation aftereffect dissociates from the previously reported temporal-constriction aftereffect on the basis of the differences in their orientation and flicker-frequency selectivity, suggesting that the visual system possesses at least two distinct and potentially complementary mechanisms for adaptively coding perceived duration.
Expressions of emotion are often brief, providing only fleeting images on which to base important social judgments. We sought to characterize the sensitivity and mechanisms of emotion detection and expression categorization when exposure to faces is very brief, and to determine whether these processes dissociate. Observers viewed 2 backward-masked facial expressions in quick succession, 1 neutral and the other emotional (happy, fearful, or angry), in a 2-interval forced-choice task. On each trial, observers attempted to detect the emotional expression (emotion detection) and to classify the expression (expression categorization). Above-chance emotion detection was possible with extremely brief exposures of 10 ms and was most accurate for happy expressions. We compared categorization among expressions using a d′ analysis, and found that categorization was usually above chance for angry versus happy and fearful versus happy, but consistently poor for fearful versus angry expressions. Fearful versus angry categorization was poor even when only negative emotions (fearful, angry, or disgusted) were used, suggesting that this categorization is poor independent of decision context. Inverting faces impaired angry versus happy categorization, but not emotion detection, suggesting that information from facial features is used differently for emotion detection and expression categorization. Emotion detection often occurred without expression categorization, and expression categorization sometimes occurred without emotion detection. These results are consistent with the notion that emotion detection and expression categorization involve separate mechanisms.
emotion detection; expression categorization; face-inversion effect; awareness; face processing
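For readers unfamiliar with d′, the sketch below shows one common way to compute it for a pairwise categorization such as angry versus happy, treating "angry" responses to angry faces as hits and "angry" responses to happy faces as false alarms. This mapping, the log-linear correction for extreme rates, and the trial counts are assumptions for illustration; the abstract does not state the exact procedure used.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Adds 0.5 to each count (log-linear correction, an assumed choice here)
    so that rates of exactly 0 or 1 do not yield infinite z-scores.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return z(hit_rate) - z(fa_rate)


# Hypothetical trial counts, not data from the study.
chance_level = d_prime(50, 50, 50, 50)        # no discrimination
above_chance = d_prime(80, 20, 20, 80)        # reliable discrimination
```

A d′ near zero corresponds to chance-level categorization, whereas larger values indicate that the two expressions were told apart more reliably, independent of any bias toward one response.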
The present study investigated the limits of semantic processing without awareness, during continuous flash suppression (CFS). We used compound remote associate word problems, in which three seemingly unrelated words (e.g., pine, crab, sauce) form a common compound with a single solution word (e.g., apple). During the first 3 s of each trial, the three problem words or three irrelevant words (control condition) were suppressed from awareness, using CFS. The words then became visible, and participants attempted to solve the word problem. Once the participants solved the problem, they indicated whether they had solved it by insight or analytically. Overall, the compound remote associate word problems were solved significantly faster after the problem words, as compared with irrelevant words, were presented during the suppression period. However, this facilitation occurred only when people solved with analysis, not with insight. These results demonstrate that semantic processing, but not necessarily semantic integration, may occur without awareness.
Awareness; Continuous flash suppression; Semantic processing; Semantic integration; Binocular rivalry; Problem solving
While perceiving speech, people see mouth shapes that are systematically associated with sounds. In particular, a vertically stretched mouth produces a /woo/ sound, whereas a horizontally stretched mouth produces a /wee/ sound. We demonstrate that hearing these speech sounds alters how we see aspect ratio, a basic visual feature that contributes to perception of 3D space, objects and faces. Hearing a /woo/ sound increases the apparent vertical elongation of a shape, whereas hearing a /wee/ sound increases the apparent horizontal elongation. We further demonstrate that these sounds influence aspect ratio coding. Viewing and adapting to a tall (or flat) shape makes a subsequently presented symmetric shape appear flat (or tall). These aspect ratio aftereffects are enhanced when associated speech sounds are presented during the adaptation period, suggesting that the sounds influence visual population coding of aspect ratio. Taken together, these results extend previous demonstrations that visual information constrains auditory perception by showing the converse – speech sounds influence visual perception of a basic geometric feature.
Auditory–visual; Aspect ratio; Crossmodal; Shape perception; Speech perception
visual spatial frequency; auditory amplitude-modulation rate; auditory-visual interactions
When attention is directed to the local or global level of a hierarchical stimulus, attending to that same scale of information is subsequently facilitated. This effect is called level-priming, and in its pure form, it has been dissociated from stimulus- or response-repetition priming. In previous studies, pure level-priming has been demonstrated using hierarchical stimuli composed of alphanumeric forms consisting of lines. Here, we test whether pure level-priming extends to hierarchical configurations of generic geometric forms composed of elements that can be depicted either outlined or filled-in. Interestingly, whereas hierarchical stimuli composed of outlined elements benefited from pure level-priming, for both local and global targets, those composed of filled-in elements did not. The results are not readily attributable to differences in spatial frequency content, suggesting that forms composed of outlined and filled-in elements are treated differently by attention and/or priming mechanisms. Because our results present a surprising limit on attentional persistence to scale, we propose that other findings in the attention and priming literature be evaluated for their generalizability across a broad range of stimulus classes, including outlined and filled-in depictions.
priming; local; global; attention; hierarchical stimuli
Reading comprehension depends on neural processes supporting the access, understanding, and storage of words over time. Examinations of the neural activity correlated with reading have contributed to our understanding of reading comprehension, especially for the comprehension of sentences and short passages. However, the neural activity associated with comprehending an extended text is not well-understood. Here we describe a current-source-density (CSD) index that predicts individual differences in the comprehension of an extended text. The index is the difference in CSD-transformed event-related potentials (ERPs) to a target word between two conditions: a comprehension condition with words from a story presented in their original order, and a scrambled condition with the same words presented in a randomized order. In both conditions participants responded to the target word, and in the comprehension condition they also tried to follow the story in preparation for a comprehension test. We reasoned that the spatiotemporal pattern of difference-CSDs would reflect comprehension-related processes beyond word-level processing. We used a pattern-classification method to identify the component of the difference-CSDs that accurately (88%) discriminated good from poor comprehenders. The critical CSD index was focused at a frontal-midline scalp site, occurred 400–500 ms after target-word onset, and was strongly correlated with comprehension performance. Behavioral data indicated that group differences in effort or motor preparation could not explain these results. Further, our CSD index appears to be distinct from the well-known P300 and N400 components, and CSD transformation seems to be crucial for distinguishing good from poor comprehenders using our experimental paradigm. Once our CSD index is fully characterized, this neural signature of individual differences in extended-text comprehension may aid the diagnosis and remediation of reading comprehension deficits.
reading comprehension; EEG/ERP; machine learning applied to neuroscience; current source density; working memory
How do the characteristics of sounds influence the allocation of visual-spatial attention? Natural sounds typically change in frequency. Here we demonstrate that the direction of frequency change guides visual-spatial attention more strongly than the average or ending frequency, and provide evidence suggesting that this cross-modal effect may be mediated by perceptual experience. We used a Go/No-Go color-matching task to avoid response compatibility confounds. Participants performed the task either with their heads upright or tilted by 90°, misaligning the head-centered and environmental axes. The first of two colored circles was presented at fixation and the second was presented in one of four surrounding positions in a cardinal or diagonal direction. Either an ascending or descending auditory-frequency sweep was presented coincident with the first circle. Participants were instructed to respond to the color match between the two circles and to ignore the uninformative sounds. Ascending frequency sweeps facilitated performance (response time and/or sensitivity) when the second circle was presented at the cardinal top position and descending sweeps facilitated performance when the second circle was presented at the cardinal bottom position; there were no effects of the average or ending frequency. The sweeps had no effects when circles were presented at diagonal locations, and head tilt entirely eliminated the effect. Thus, visual-spatial cueing by pitch change is narrowly tuned to vertical directions and dominates any effect of average or ending frequency. Because this cross-modal cueing is dependent on the alignment of head-centered and environmental axes, it may develop through associative learning during waking upright experience.
cross-modal perception; auditory-visual interactions; visual-spatial attention; implicit attentional processing; multi-modal cognition
Visual pattern processing becomes increasingly complex along the ventral pathway, from the low-level coding of local orientation in the primary visual cortex to the high-level coding of face identity in temporal visual areas. Previous research using pattern aftereffects as a psychophysical tool to measure activation of adaptive feature coding has suggested that awareness is relatively unimportant for the coding of orientation, but awareness is crucial for the coding of face identity. We investigated where along the ventral visual pathway awareness becomes crucial for pattern coding. Monoptic masking, which interferes with neural spiking activity in low-level processing while preserving awareness of the adaptor, eliminated open-curvature aftereffects but preserved closed-curvature aftereffects. In contrast, dichoptic masking, which spares spiking activity in low-level processing while wiping out awareness, preserved open-curvature aftereffects but eliminated closed-curvature aftereffects. This double dissociation suggests that adaptive coding of open and closed curvatures straddles the divide between weakly and strongly awareness-dependent pattern coding.
awareness; pattern adaptation; visual perception
Auditory and visual processes demonstrably enhance each other based on spatial and temporal coincidence. Our recent results on visual search have shown that auditory signals also enhance visual salience of specific objects based on multimodal experience. For example, we tend to see an object (e.g., a cat) and simultaneously hear its characteristic sound (e.g., “meow”), to name an object when we see it, and to vocalize a word when we read it, but we do not tend to see a word (e.g., cat) and simultaneously hear the characteristic sound (e.g., “meow”) of the named object. If auditory-visual enhancements occur based on this pattern of experiential associations, playing a characteristic sound (e.g., “meow”) should facilitate visual search for the corresponding object (e.g., an image of a cat), hearing a name should facilitate visual search for both the corresponding object and corresponding word, but playing a characteristic sound should not facilitate visual search for the name of the corresponding object. Our present and prior results together confirmed these experiential-association predictions. We also recently showed that the underlying object-based auditory-visual interactions occur rapidly (within 220 ms) and guide initial saccades towards target objects. If object-based auditory-visual enhancements are automatic and persistent, an interesting application would be to use characteristic sounds to facilitate visual search when targets are rare, such as during baggage screening. Our participants searched for a gun among other objects when a gun was presented on only 10% of the trials. The search time was speeded when a gun sound was played on every trial (primarily on gun-absent trials); importantly, playing gun sounds facilitated both gun-present and gun-absent responses, suggesting that object-based auditory-visual enhancements persistently increase the detectability of guns rather than simply biasing gun-present responses. Thus, object-based auditory-visual interactions that derive from experiential associations rapidly and persistently increase visual salience of corresponding objects.
Laughter is an auditory stimulus that powerfully conveys positive emotion. We investigated how laughter influenced visual perception of facial expressions. We simultaneously presented laughter with a happy, neutral, or sad schematic face. The emotional face was briefly presented either alone or among a crowd of neutral faces. We used a matching method to determine how laughter influenced the perceived intensity of happy, neutral, and sad expressions. For a single face, laughter increased the perceived intensity of a happy expression. Surprisingly, for a crowd of faces laughter produced an opposite effect, increasing the perceived intensity of a sad expression in a crowd. A follow-up experiment revealed that this contrast effect may have occurred because laughter made the neutral distracter faces appear slightly happy, thereby making the deviant sad expression stand out in contrast. A control experiment ruled out semantic mediation of the laughter effects. Our demonstration of the strong context dependence of laughter effects on facial expression perception encourages a re-examination of the previously demonstrated effects of prosody, speech content, and mood on face perception, as they may similarly be context dependent.
Crossmodal interaction; emotion; facial expressions; laughter
Frequency-following and frequency-doubling neurons are ubiquitous in both striate and extrastriate visual areas. However, responses from these two types of neural populations have not been effectively compared in humans because previous EEG studies have not successfully dissociated responses from these populations. We devised a light–dark flicker stimulus that unambiguously distinguished these responses as reflected in the first and second harmonics in the steady-state visual evoked potentials. These harmonics revealed the spatial and functional segregation of frequency-following (the first harmonic) and frequency-doubling (the second harmonic) neural populations. Spatially, the first and second harmonics in steady-state visual evoked potentials exhibited divergent posterior scalp topographies for a broad range of EEG frequencies. The scalp maximum was medial for the first harmonic and contralateral for the second harmonic, a divergence not attributable to absolute response frequency. Functionally, voluntary visual–spatial attention strongly modulated the second harmonic but had negligible effects on the simultaneously elicited first harmonic. These dissociations suggest an intriguing possibility that frequency-following and frequency-doubling neural populations may contribute complementary functions to resolve the conflicting demands of attentional enhancement and signal fidelity—the frequency-doubling population may mediate substantial top–down signal modulation for attentional selection, whereas the frequency-following population may simultaneously preserve relatively undistorted sensory qualities regardless of the observer’s cognitive state.
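To make the harmonic terminology concrete, the sketch below measures the amplitude of a signal at the flicker frequency (first harmonic) and at twice that frequency (second harmonic) using a single-frequency discrete Fourier sum. The function name, sampling rate, flicker frequency, and the synthetic waveform are all hypothetical; this is not the authors' analysis pipeline, only an illustration of how the two harmonics can be read out separately.

```python
import cmath
import math


def amplitude_at(signal, sample_rate_hz, freq_hz):
    """Amplitude of the sinusoidal component of `signal` at `freq_hz`.

    Uses a direct Fourier sum; exact when the signal spans a whole number
    of cycles of `freq_hz`.
    """
    n = len(signal)
    coeff = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
        for k, x in enumerate(signal)
    )
    return 2.0 * abs(coeff) / n


# Synthetic 1-s "recording" (hypothetical values): a response that follows
# the flicker plus a smaller response at double the flicker frequency.
sample_rate = 500      # samples per second (assumed)
flicker_hz = 10.0      # stimulus flicker frequency (assumed)
times = [k / sample_rate for k in range(sample_rate)]
signal = [
    1.0 * math.sin(2 * math.pi * flicker_hz * t)
    + 0.5 * math.sin(2 * math.pi * 2 * flicker_hz * t + 0.3)
    for t in times
]

first_harmonic = amplitude_at(signal, sample_rate, flicker_hz)
second_harmonic = amplitude_at(signal, sample_rate, 2 * flicker_hz)
```

Because the two components sit at different frequencies, each amplitude can be recovered without contamination from the other, which is what allows separate scalp topographies and attention effects to be examined for each harmonic.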
Visual spatial attention can be exogenously captured by a salient stimulus or can be endogenously allocated by voluntary effort. Whether these two attention modes serve distinctive functions is debated, but for processing of single targets the literature suggests superiority of exogenous attention (it is faster acting and serves more functions). We report that endogenous attention uniquely contributes to processing of multiple targets. For speeded visual discrimination, response times are faster for multiple redundant targets than for single targets due to probability summation and/or signal integration. This redundancy gain was unaffected when attention was exogenously diverted from the targets, but was completely eliminated when attention was endogenously diverted. This was not due to weaker manipulation of exogenous attention because our exogenous and endogenous cues similarly affected overall response times. Thus, whereas exogenous attention is superior for processing single targets, endogenous attention plays a unique role in allocating resources crucial for rapid concurrent processing of multiple targets.
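The probability-summation account of redundancy gain mentioned above can be illustrated with a small simulation: if each target triggers an independent detection process and the response is driven by whichever finishes first, mean response time falls even without any signal integration. The distribution, its parameters, and the function name below are hypothetical choices for illustration, and the sketch covers only the probability-summation account, not signal integration.

```python
import random


def simulate_rts(n_trials, seed=0):
    """Simulate single-target and redundant-target response times (ms).

    Each detection time is a 300-ms base plus an exponential delay with a
    100-ms mean (assumed values). On redundant trials, two independent
    detection processes race and the faster one determines the response.
    """
    rng = random.Random(seed)
    single, redundant = [], []
    for _ in range(n_trials):
        first = 300.0 + rng.expovariate(1.0 / 100.0)
        second = 300.0 + rng.expovariate(1.0 / 100.0)
        single.append(first)
        redundant.append(min(first, second))
    return single, redundant


single_rts, redundant_rts = simulate_rts(10_000)
mean_single = sum(single_rts) / len(single_rts)
mean_redundant = sum(redundant_rts) / len(redundant_rts)
redundancy_gain_ms = mean_single - mean_redundant
```

In this toy model the gain arises purely from taking the minimum of two independent samples, so any manipulation that prevents the second target from entering the race would remove the gain.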
When you are looking for an object, does hearing its characteristic sound make you find it more quickly? Our recent results supported this possibility by demonstrating that when a cat target, for example, was presented among other objects, a simultaneously presented “meow” sound (containing no spatial information) reduced the manual response time for visual localization of the target. To extend these results, we determined how rapidly an object-specific auditory signal can facilitate target detection in visual search. On each trial, participants fixated a specified target object as quickly as possible. The target’s characteristic sound speeded the saccadic search time within 215–220 ms and also guided the initial saccade toward the target, compared to presentation of a distractor’s sound or to no sound. These results suggest that object-based auditory-visual interactions rapidly increase the target object’s salience in visual search.
Although local interactions involving orientation and spatial frequency are well understood, less is known about spatial interactions involving higher level pattern features. We examined interactive coding of aspect ratio, a prevalent two-dimensional feature. We measured perception of two simultaneously flashed ellipses by randomly post-cueing one of them and having observers indicate its aspect ratio. Aspect ratios interacted in two ways. One manifested as an aspect-ratio-repulsion effect. For example, when a slightly tall ellipse and a taller ellipse were simultaneously flashed, the less tall ellipse appeared flatter and the taller ellipse appeared even taller. This repulsive interaction was long range, occurring even when the ellipses were presented in different visual hemifields. The other interaction manifested as a global assimilation effect. An ellipse appeared taller when it was a part of a global vertical organization than when it was a part of a global horizontal organization. The repulsion and assimilation effects temporally dissociated as the former slightly strengthened, and the latter disappeared when the ellipse-to-mask stimulus onset asynchrony was increased from 40 to 140 ms. These results are consistent with the idea that shape perception emerges from rapid lateral and hierarchical neural interactions.
aspect ratio; repulsion; assimilation; lateral interaction; hierarchical interaction; shape perception
Unconscious processing of stimuli with emotional content can bias affective judgments. Is this subliminal affective priming merely a transient phenomenon manifested in fleeting perceptual changes, or are long-lasting effects also induced? To address this question, we investigated memory for surprise faces 24 hours after they had been shown with 30-ms fearful, happy, or neutral faces. Surprise faces subliminally primed by happy faces were initially rated as more positive, and were later remembered better, than those primed by fearful or neutral faces. Participants likely to have processed primes supraliminally did not respond differentially as a function of expression. These results converge with findings showing memory advantages with happy expressions, though here the expressions were displayed on the face of a different person, perceived subliminally, and not present at test. We conclude that behavioral biases induced by masked emotional expressions are not ephemeral, but rather can last at least 24 hours.
subliminal priming; memory; emotion; facial expressions; consciousness; awareness; affect
Maintenance of stable central eye fixation is crucial for a variety of behavioral, electrophysiological, and neuroimaging experiments. Naïve observers in these experiments are not typically accustomed to fixating, requiring the use of cumbersome and costly eye-tracking or producing confounds in results. We devised a flicker display that produced an easily detectable visual phenomenon whenever the eyes moved. A few minutes of training using this display dramatically improved the accuracy of eye fixation while observers performed a demanding spatial attention cueing task. The same amount of training using control displays did not produce significant fixation improvements and some observers consistently made eye movements to the peripheral attention cue, contaminating the cueing effect. Our results indicate that (1) eye fixation can be rapidly improved in naïve observers by providing real-time feedback about eye movements, and (2) our simple flicker technique provides an easy and effective method for providing this feedback.
The ability to track multiple moving objects with attention has been the focus of much research. However, the literature is relatively inconclusive regarding two key aspects of this ability: (1) whether the distribution of attention among the tracked targets is fixed during a period of tracking or is dynamically adjusted, and (2) whether motion information (direction and/or speed) is used to anticipate target locations even when velocities constantly change due to inter-object collisions. These questions were addressed by analyzing target-localization errors. Targets in crowded situations (i.e., those in danger of being lost) were localized more precisely than were uncrowded targets. Furthermore, the response vector (pointing from the target location to the reported location) was tuned to the direction of target motion, and observers with stronger direction tuning localized targets more precisely. Overall, our results provide evidence that multiple-object tracking mechanisms dynamically adjust the spatial distribution of attention in a demand-based manner (allocating more resources to targets in crowded situations) and utilize motion information (especially direction information) to anticipate target locations.
attention; direction; localization; motion; multiple-object tracking; representational momentum; speed
High-level visual neurons in the ventral stream typically have large receptive fields, supporting position-invariant object recognition but entailing poor spatial resolution. Consequently, when multiple objects fall within their large receptive fields, unless selective attention is deployed, their responses are averages of responses to the individual objects. We investigated a behavioral consequence of this neural averaging in the perception of facial expressions. Two faces (7° apart) were briefly presented (100 ms, backward-masked) either within the same visual hemifield (within-hemifield condition) or in different hemifields (between-hemifield condition). Face pairs included happy, angry, and valence-neutral faces, and observers rated the emotional valence of a post-cued face. Perceptual averaging of facial expressions was predicted only for the within-hemifield condition because the receptive fields of ‘face-tuned’ neurons are primarily confined within the contralateral field; the between-hemifield condition served to control for post-perceptual effects. Consistent with averaging, valence-neutral faces appeared more positive when paired with a happy face than when paired with an angry face, and affective intensities of happy and angry faces were reduced by accompanying valence-neutral or opposite-valence faces, in the within-hemifield relative to the between-hemifield condition. We thus demonstrated within-hemifield perceptual averaging of a complex feature as predicted by neural averaging in the ventral visual stream.
object recognition; face recognition; shape and contour; ventral visual pathway; neural averaging; inferotemporal cortex
In a natural environment, objects that we look for often make characteristic sounds. A hiding cat may meow, or the keys in the cluttered drawer may jingle when moved. Using a visual search paradigm, we demonstrated that characteristic sounds facilitated visual localization of objects, even when the sounds carried no location information. For example, finding a cat was faster when participants heard a meow sound. In contrast, sounds had no effect when participants searched for names rather than pictures of objects. For example, hearing “meow” did not facilitate localization of the word cat. These results suggest that characteristic sounds cross-modally enhance visual (rather than conceptual) processing of the corresponding objects. Our behavioral demonstration of object-based cross-modal enhancement complements the extensive literature on space-based cross-modal interactions. When looking for your keys next time, you might want to play jingling sounds.